Quantcast
Channel: macOS/OS X (Mac) — Plex Forums
Viewing all articles
Browse latest Browse all 6828

HOWTO: query data from PLEX on a Mac

$
0
0
Hi everyone,

Here is a short tutorial on how to extract all sorts of information from the PLEX database using SQLite on a Mac. In fact it is very easy to export all your information to an Excel sheet or HTML file for example, but I could not find an easy-to-use guide anywhere. So I tried a bit of trial and error and found out by myself. Some people may just want a list of all their movies that they can pass on to friends. Some install a virtual machine software on their Mac, put Windows 7 on it and run a 120 MB database tool via the PLEX web interface. Actually, the same can be achieved in a second with one single command and you don't even need to install additional software. Sounds cool? Read on then and discover the possibilities.

Cheers
Smil



#Last updated Nov-27-2011#

Extract information from PLEX database the direct way
PLEX stores its information in an SQLite database. This database can be queried without even starting PLEX and you can access information that PLEX won't show you. The tools needed for queering SQLite database are already installed on your Mac by default. If you are willing to use a terminal window and execute some commands I walk you through the basics of SQLite and PLEX.

A word about databases
How does a database work? Every database consists of simple tables. Think of tables as Excel sheets. All database entries are rows in a table and each column contains one specific type of information. Information is usually spread over multiple tables in a database for the sake of compactness and performance. So if we are looking for a certain piece of information we need to know in which table it can be found.

A good old printed phone book is a good example of a database. Basically, it is nothing but a long table. It has the columns "Last Name", "First Name", "Street", "Street Number", "Phone Number". If we want to find all the people who live in a certain street we can, at least theoretically, consult the phone book. It would cost us some time to check the whole phone book row by row, but a computer can do such a thing in a split second. Phone books in Germany also include a table of street names along with a grid square reference like "A7" or "G14" that refers to the grid of a map (which is also included). So that type of phone book is actually a database with multiple tables.

Our PLEX database contains multiple tables too. The one that is most interesting for us is the one called "metadata_items". It contains the "scraper knowledge" - all the information that our media is tagged with like "title", "year" or "director". It is the information that we also find on web sites like www.imdb.com that relates to the movie as such, not a particular file on our disc. With this knowledge we are almost ready to go. But there is one more thing? Where is the database file on our hard disc? By default it is deeply buried in the directory tree of our home folder. In case of Mac OS X in can be found here (the sign "~" is a shortcut to your home folder):

~/Library/Application Support/Plex Media Server/Plug-in Support/Databases/com.plexapp.plugins.library.db


To open it we can use the database query tool "sqlite3" that comes with your Mac. It is a console program so you need to open a terminal window and navigate to the PLEX database directory with

cd ~/Library/Application\ Support/Plex\ Media\ Server/Plug-in\ Support/Databases


Mind the spaces in the directory path which is why we need to escape them with backslash (\). Alternatively you can look for the folder in Finder and drag it in an open terminal window.

Now open the database with the following command:

sqlite3 com.plexapp.plugins.library.db


You should see the SQLite prompt waiting for your commands:

SQLite version 3.6.12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>


Here you can enter your commands as will be explained below. I have written all commands in CAPITALS but you actually don't need to type them in that way. Some commands are preceded by a "." These are special commands that modify the behaviour of SQLite. The first one you should memorize is ".exit" which gets you out of SQLite.


Dump tables
Let us try something simple from the SQLite prompt with the PLEX database opened. The most used command in SQL is SELECT. As the name indicates it is for selecting something. SELECT shows us rows from tables matching certain criteria. Let's try that out:

SELECT * FROM metadata_items ;


This takes all rows and columns from the table called "metadata_items" and prints them to the screen. Your terminal window will be flooded with a stream of information. You just saw all the metadata from your database in a second. Wow... but... not really helpful, is it? So let us search for something more specific. Maybe we just want to dump all the titles we have:

SELECT title FROM metadata_items;


By the way, how did I know that the database had a table called "metadata_items" and a column called "title"? Is that documented somewhere? No, actually it is not documented that is why I write this tutorial. What I did was to use an SQLite Database Browser (http://sqlitebrowser.sourceforge.net) and look inside the database. It is like perusing a heap of Excel tables and finding out what they contain and how they are inter-connected.


Select data
Our first commands resulted in a long lists and they listed all sort of media - not just movies, but also music and pictures. I'm more interested in the movie-related stuff here so let us find a way to restrict queries to those sections of PLEX that only deal with movies. We add a restriction ("clause") with the keyword WHERE and choose a section of the PLEX library. In my PLEX library all movies are in the library_section_id with the number "1". This may be different for you, it depends on the order by which you added media directories to your PLEX installation. So in my case the command would be:

SELECT title FROM metadata_items WHERE library_section_id=1;


Your movies may be stored in a library_section_id with a different number. Let us find out where - we can easily look up this info from within SQLite:

SELECT name,id FROM library_sections;


This will print a list of all library sections and their IDs. Take a mental note of where your movies are and whenever I refer to library_section_id No. 1 you should use your number.


Sort data
That's cool for a start, but so far the lists were not sorted in any way. We can fix that using the command ORDER BY:

SELECT title FROM metadata_items WHERE library_section_id=1 ORDER BY title;



Combine information
Can we have the same list with some more information? Sure thing! We could add the year and director for example:

SELECT title,year,tags_director FROM metadata_items WHERE library_section_id=1 ORDER BY title;


Other nifty columns to query from "metadata_items" are

tags_star
tags_writer
summary
tags_genre
tags_collection 
user_thumb_url 
content_rating 
studio
original_title
tags_country



Format and export output
The output can be formatted in different ways. We can switch the output formatting with the command ".mode". Mind the dot (.) ".mode line" gives us one line per piece of information.

.mode line
SELECT title,year,tags_director FROM metadata_items WHERE library_section_id=1 ORDER BY title;


The command ".help" shows you the different modes that are available. It also shows you how we can write everything to a text file instead of simply seeing it on the screen: ".output filename_goes_here" will print the database query to a file. We could, for example, switch to the output mode CSV and create a table with all our movies and load it in Excel later:

.mode csv
.output $HOME/Desktop/mymovies.csv
SELECT title,year,tags_genre,tags_director,tags_rating,tags_collection FROM metadata_items WHERE library_section_id=1;



Finding specific information
The reason why I dived into this matter in the first place was because I wanted to find a simple method for showing me all movies that missed certain tags. Quite a lot of my movies had no "rating" and finding all those un-rated movies in PLEX is a pain if you have 1000+ movies. Here is how to do just that:

SELECT title,year,tags_director FROM metadata_items WHERE library_section_id=1 AND rating="" ORDER BY title;


This prints out a sorted list of all movies where the field "rating" is empty.

Similarly you could also use comparisons with "<" or ">" to print a list with all movies that have a rating lower than "4" for example. This would give you a list of garbage films ordered by rating:

SELECT title,year,tags_director,rating FROM metadata_items WHERE library_section_id=1 AND rating&lt;4 ORDER BY rating;


You could also generate a Top 10 or Bottom 10 list using the LIMIT keyword. LIMIT just stops delivering results after the indicated number of matches:

SELECT title,year,tags_director,rating FROM metadata_items WHERE library_section_id=1 ORDER BY rating ASC LIMIT 10;


The keyword ASC stands for "ascending sort order". DESC would be "descending sort order".

One thing that annoyed me a bit was the long floating point values for the ratings. Luckily SQLite is able to do some maths and can round the values for us:

SELECT title,year,tags_director,ROUND(rating,2) 'rating (rounded)' FROM metadata_items WHERE library_section_id=1 ORDER BY rating ASC LIMIT 10;


In the above you can also see how we inserted a custom text for the rounded result of "rating" by putting text in single quotes. Note that SQLite can do all sorts of basic calculations which comes in handy when you have to deal with movie runtimes that are given in seconds as a huge number.

Matching
What about finding all movies that have Angelina Jolie in it? We will need to find all entries where the column tags_star contains Angelina Jolie somewhere. Since the "tags_star" field usually contain multiple actors when cannot use WHERE tags_star="Angelina Jolie". That would not work because "=" means "totally identical". We need some sort of "column contains xyz" command. This command is LIKE:

SELECT title,year FROM metadata_items WHERE library_section_id=1 AND tags_star LIKE '%Angelina Jolie%' ;


The expression '%Angelina Jolie%' matches everything that contains "Angelina Jolie". The % signs stand for anything that may appear before or after the search term.


Counting
You can also count the number of matches if you want to do some statistical analysis. This example is just counting the number of movies:

SELECT COUNT(titles) 'Number of movies in database' FROM metadata_items;


That is a bit too simple for statistics, isn't it? We can make it more complex:

SELECT (SELECT COUNT(title) FROM metadata_items WHERE library_section_id=1 AND rating&gt;7)*1.00/(SELECT COUNT(title) FROM metadata_items WHERE library_section_id=1)*100 'Percentage of movies rated 7 or more' ;


What we see here is nesting of multiple SELECT commands. The above just means "number of good movies with a rating higher than7" divided by "number of all movies" multiplied by 100. The odd multiplication by *1.00 is a little trick to force SQLite into calculating floating point values instead of integers. There is also a CAST keyword for converting variables from one type to another should you ever need that.


Query info from multiple tables
So far all the info we wanted to know could be retrieved from one table "metadata_items". The PLEX database contains some more tables. Two other tables that may be interesting for us are "media_parts" and "media_items".

"media_parts" tells us where the actual files are stored, how big they are, when they were created, etc. The following columns are most useful:

file
size
duration
created_at
updated_at



There is also a table called "media_items" which contains technical specifications of all movie files:

width
height
size
duration
bitrate
container
video_codec
audio_codec
display_aspect_ration
frames_per_second
audio_channels
interlaced
display_offset



Whenever we are interested in the physical file on the disc we need to consult the "media_parts" and/or "media_items". These tables, however, do not contain movie metadata such as the movie title or year. That is a problem. If we want to have a list of all movie titles along with the location on our hard disc we need to extract information from more than one table at once. And obviously, if we need to look up the movie title in one table and the file path in another one, then both tables must use some kind of common referrer in order to identify the same movie in both tables. Usually this is some sort of ID number for the media file in question. This is the tricky part and it took me a while to get this sorted out by myself. Let's have a look at an example. I have a movie in my database that is called "Taxman".


Taxman in "media_parts" has id=78624 and media_item_id = 78467
Taxman in "media_items" has id=78467 and metadata_item_id= 114494
Taxman in "metadata_items" has id=114494


Oh... The same movie has an "id" in all three tables but it is not the same thing. "id" in metadata_items corresponds to metadata_item_id in media_items whereas "id" in media_items corresponds to "media_item_id" in media_parts. Confusing. If we need to extract data from metadata_items combined with data from media_parts we have to consult media_items because it is the only table that can 'translate' between the different types of IDs. A bit tricky indeed, but it can be done:

SELECT metadata_items.title,media_parts.file FROM metadata_items JOIN media_parts JOIN media_items WHERE metadata_items.library_section_id=1 AND metadata_items.id=media_items.metadata_item_id AND media_items.id=media_parts.media_item_id;


So what do we have there? First look at how we refer to a column in a specific table with "table_name.column_name". This avoids ambiguities between identically named columns in different tables (as is the case with "id" here). Then we use the command JOIN which extends the query to multiple tables. In the WHERE clause we define which IDs must match so that only lines are printed that all belong to the same movie.

This should be enough to get you going. Of course SQLite can do much much more and some of the above commands can be constructed in a shorter and more elegant way. I leave that to the experts.


Using SQLite queries from scripts
You might think 'how can you possibly remember all that stuff?' Well, actually I can't and you needn't either. I put what I need in a simple shell script and run that. As I mentioned before I had been looking for a simple way to generate a list with movies that require further tagging because important stuff is missing. You can also pipe the output of your script through the whole universe of command line utilities that UNIX offers. I use "sed" for example to beautify the output somewhat. You can also pipe the output to a command line IMDB scraper (yes, such things do exist) and do a lot of amazing things.


Editing the PLEX database
You can of course modify the data in your PLEX database using SQLite but I won't explain how to do it. Too many things can go wrong and I don't want to be flamed by someone who forgot a "," somewhere and now has a 5,000 title movie database where each movie is titled "SELECT," or so. LOL. Do make backups before you undertaking any such experiments.



Part II
The above is, of course, far from complete. One thing I have not mentioned so far is how "cast & crew" information is handled. The database layout is similar to what we see on the Internet Movie Database web site. The fields "tags_director", "tags_writer" and "tags_star" appear on top of the page and are stored in the "metadata_items" table in PLEX, but if you dig deeper IMDB also has an extra "Cast & Crew" information that contains a more detailed list of all the people involved in making a particular movie - partly repeating the same information.

PLEX does exactly the same, it uses an extra table for this type of information and it may be a little tricky to understand how it works.

Three tables are involved:

  • 'metadata_items':
    This is the table we already know with all the general info on a movie, where each movie has exactly one 'id'
  • 'tags':
    Think of the table "tags" as a people database for a moment. It contains the name of a person or category as 'text', the function of a person or category as 'tag_type'. Each person or category with one specific function has exactly one 'id' here. Some people can appear twice or even thrice with different IDs because they may appear in movies aswriters, actors and directors. Clint Eastwood would be a prominent example. Clint Eastwood as director would have tag_type=4, the actor would be tag_type=6,
    the writer would be tag_type=5
    [*]'taggings':
    this one brings movies and people together. Whenever Clint Eastwood appears in a movie and entry in taggings is generated (or should be) that consists of the ID from metadata_items (metadata_items_id), the ID from tags (tag_id) an optional text (text) that may contain the role he plays in that movie, and an index that gives the position in the cast & crew list.


table "taggings"
----------------------------------------------------
text                         = character played in movie
tag_id                       = id from table "tags"
metadata_item_id             = id from table "metadata_items"
index                        = Position in list	

table "tags"
-----------------------------------------------------
tag                           = name of person
tag_type                      =  1=genre,2=collection,4=director,5=writer,6=actor,8=country
id                            = tag_id from table "taggings"


Complicated? Well, not really, is it? What is the benefit of all this, why do we need this cast & crew list when we already have tags_director, tags_star and tags_writer? The reason is that the people's tags in "metadata_items" are limited: There is just one single row for each movie, but there may be more than one piece of information, i.e. multiple actors, multiple writers, multiple genres, etc. A database, however, stores one piece of information once in one place and refer to that same information from all other places. Actually, the information about actors, directors, writers, genres, collection in our table metadata_items are derived from other tables to speed up some often needed queries where we do not need the complete information.

If we want to know all movies that have Angelina Jolie in it, we can search all of the tags_star fields in the table metadata_items for that string (as discussed above). But the proper and more reliable way is to look up tag="Angelina Jolie" where tag_type=6 in the table "tags" and feed the found ids to the table "taggings" that returns the ids in the metadata_items. We could do it like this (clumsy):
SELECT title FROM metadata_items JOIN taggings JOIN tags WHERE tags.tag="Angelina Jolie" AND tags.tag_type=6 AND tags.id=taggings.tag_id AND taggings.metadata_item_id=metadata_items.id;

Or much more elegantly with nested SELECT commands and the keyword IN that you may already know from shell scripts, C, Python or whatever:
SELECT title FROM metadata_items WHERE id IN (SELECT metadata_item_id FROM taggings WHERE tag_id IN (SELECT id FROM tags WHERE tag="Angelina Jolie"));


"IN" is pretty cool. You can use it very often with nested commands where you pass on results from one table to query a second one.

That was the database READING perspective. If PLEX writes actors to the database it needs to
  1. check in the table 'tags' whether it already knows that actor and, if not, generate a new entry,
  2. generate a new entry in 'taggings' for every new movie the actor appears in
  3. optionally update the 'tags_star' for that movie too

Just as a hint for those who want to edit their databases themselves.

As a little piece of inspiration I have attached a bash script with some examples. If it works (and I don't know if it will for you) it sends a list of your movies to your Desktop and will display some more or less useful statistics. You just need to make the file executable (chmod a+x) and run it in the terminal. This for Mac. If you change the Plex library path in the bash script there is a chance that it might work for Linux too.


Part III - Images


How do I get hold of the image files associated with a movie (poster, fan art) using SQLite? Thanks to some helpful souls we are now in possession of this knowledge. It is not for the faint-hearted but if you are the type of guy who would open up each and every gadget in the household to explore how it worked only to find out that it did not (after trying to put it together again at least)... read on.

The images are stored in a "bundle" which is simply a directory in ~/Library/Application Support/Plex Media Server/Metadata/Movies. Each movie has such a directory or bundle of its own. And within those bundles, PLEX stores all sort of information it has retrieved from the depths of the Internet using its scrapers. You might find some interesting things in there, so don't hesitate to explore those directories. The directories, however, have cryptic names that have no resemblance whatsoever to the movie in question. We need to find out what movie belongs to what directory.

The ingredients required to reconstruct the path name of the directory associated with a particular movie are:

- the value of the column 'guid' in the table 'metadata_items'
- the value of the column 'user_thumb_url' in the table 'metadata_items'
- the static elements of the path that always remain the same for each movie

Okay, what is that "GUID" you might ask? We have not dealt with that one so far? It is a Globally Unique ID for the movie (in most cases it is derived from the IMDB title ID which has become a quasi standard for identifying movies).

Let's take a look at an example. On my computer, the movie poster for 'Inception' is stored here:

$HOME/Library/Application\ Support/Plex\ Media\ Server/Metadata/Movies
/7/bdb9c04ce18744eba67fd655757bdabf487ae9d.bundle/Contents/_stored/posters/ com.plexapp.agents.localmedia_92206304aa517b8756d458b05f3ffaebda9d044f

The red parts are fix and always stay the same (at least for movie posters). The violet part must be calculated from the 'guid' in 'metadata_items. We'll get to that soon. The blue part must be extracted from 'user_thumbs_url'.

Let's start with the 'guid'. Easy:

SELECT guid FROM metadata_items WHERE title="Inception";


This will return a string that, in my case, looks like this:
com.plexapp.agents.imdb://tt1375666?lang=en


It can be translated into a directory name (remember the violet part from above) using the so-called SHA1 algorithm for which we have a command line utility in Mac OS X. I can pipe the above into the command 'shasum' and it will return the actual directory name… almost, that is:
echo -n "com.plexapp.agents.imdb://tt1375666?lang=en" | shasum
7bdb9c04ce18744eba67fd655757bdabf487ae9d -


You see, not quite yet… We still need to insert an "/" after the first character to obtain the violet string. And there is also trailing garbage that we have no use for (the "-"). We can fix that all in one go using 'sed'. We could also use awk, bash script or whatever. But 'sed' is so horrible that I must show it to those who have never seen it. Stare at a line of 'sed' code for any prolonged amount of time and you run the risk of developing eyeball cancer.

echo 7bdb9c04ce18744eba67fd655757bdabf487ae9d | sed -e 's/^&#092;(.&#092;{1&#092;}&#092;)/&#092;1&#092;//' -e 's/ .*//'  
/7/bdb9c04ce18744eba67fd655757bdabf487ae9d


Now to the blue part. Quering the value of 'user_thumb_url' from SQLite should be piece of cake by now:

SELECT user_thumb_url FROM metadata_items WHERE title="Inception";


It returns this:
metadata://posters/com.plexapp.agents.localmedia_92206304aa517b8756d458b05f3ffaebda9d044f


We can throw away the whole part with "metadata:/" using the awful 'sed' command again:

echo "metadata://posters/com.plexapp.agents.localmedia_92206304aa517b8756d458b05f3ffaebda9d044f" | sed 's/^.*:&#092;///'
/posters/com.plexapp.agents.localmedia_92206304aa517b8756d458b05f3ffaebda9d044f


Nice, now we can assemble all the parts and the toaster will be up and running again. But this has been a space odyssey, hasn't it? No-one right in his mind would do that step by step in the terminal. In case of a script or programming language - be it Bash, Pearl, Python, C or whatever - we would define a function that does the whole trick swiftly and without further ado. I.e. we feed it a movie title or, better, the metadata_id (because it is unique) and it tells us where the image file is. That would be easy.

Here is how such a function could look like in a Bash script (also using some string substitution techniques built into Bash):

#&#33; /bin/bash

# --------------------------------------------------------------------------------------------------------------------
# Translates metadata_id into file name and path of  corresponding movie poster,
# takes metadata_id as mandatory argument.
# --------------------------------------------------------------------------------------------------------------------

plex_home="&#036;HOME/Library/Application Support/Plex Media Server"

find_movie_poster(){
  raw_path=&#036;(sqlite3 "&#036;plex_home/Plug-in Support/Databases/com.plexapp.plugins.library.db" "select guid,user_thumb_url from metadata_items where id=&#036;1" )
  directory=&#036;(echo -n &#036;{raw_path%|*} | shasum | sed -e 's/^&#092;(.&#092;{1&#092;}&#092;)/&#092;1&#092;//' -e 's/ .*//' )
  filename=&#036;(echo &#036;{raw_path#*|} | sed 's/^.*:&#092;///')
  real_path="&#036;plex_home"/Metadata/Movies/"&#036;directory".bundle/Contents/_stored"&#036;filename"
}

find_movie_poster &#036;1
echo Your movie poster is here:
echo "&#036;real_path"
qlmanage -p "&#036;real_path"


Yeah... you can do some pretty complex things in Bash + command line utilities. It is absolutely possible to export the PLEX database to an XBMC-importable XML file using a simple bash script (I know it because I've just finished scripting it).

Viewing all articles
Browse latest Browse all 6828

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>