I gave this talk at OSU’s Open Source Club on 2015-03-26.
Table of Contents
- Differences and features
- Getting started
- Installing PostgreSQL
- Connecting to PostgreSQL
- Setting up users & databases
- Interesting uses
- Document store
- GIS database
- Search engine
- Interface to other data stores
Differences and Features
PostgreSQL database model
Depending on what database you are used to, PostgreSQL structure and vocabulary might seem a little bit different. Firstly, the PostgreSQL model has four layers, as compared to MySQL’s three. A Postgres server will contain multiple databases, which contain schemas, which contain tables.
Server -> Database -> Schema -> Table
PostgreSQL has a few features which I’d like to highlight.
The Postgres developer community has always considered it very important to implement the full SQL standard. Therefore, Postgres is predictable and regular. Postgres does build on the standard, and in several cases, the extensions Postgres supports have become part of the standard later.
Several languages for stored procedures
Postgres supports several languages for functions and stored procedures. It supports the more common SQL and C, as well as:
- Python (which I usually prefer)
There are 42 different types built in to Postgres (at the time of writing - version 9.4), all of which are useful (a full list is available in the documentation). The non-standard types which I find most interesting:
- as well as date, time, and timestamptz (which includes timezone data, and can be retrieved in any timezone).
- I can talk for days on why UUIDs are the best datatype for primary keys, but basically:
- UUIDs can be generated, not incremented. If you generate the same UUID twice, make a note, because almost no one gets to experience that - they’re guaranteed to be unique.
- UUIDs do not imply order in your records, unlike the standard serial integer. This means that in order to get order in your records, you’ll have to use a timestamp column instead - which has much more information.
- I can talk for days on why UUIDs are the best datatype for primary keys, but basically:
- Postgres now natively speaks JSON (as of 9.4). We’ll use this in one of the applications.
- tsvector & tsquery
- These two types allow us to perform text search.
PostgreSQL is designed with extensibility it mind, which has caused its featureset to expand as users add more and more functionality.
Tables, views, etc.
PostgreSQL has some standard database functionality:
Postgres also has some more interesting database functionality:
- Foreign tables & foreign data wrappers
- Materialized views
- Generalized indexes (GiST and GIN)
I’ll talk about these in more detail later.
- Linux: your distro will have a package
- Mac: there’s a packaged app
- Windows: you’ll have to install it manually from the Windows installer
- Graphical: PGAdmin III
- CLI: psql (which is included with Postgres)
Connecting to Postgres
CLI, from server:
sudo su - postgres psql
PGAdmin (will require a user that can access the system remotely - if you’re following along, please use the psql client):
- Click the plug, and fill out the wizard
Set up UTF-8
By default, Postgres has two template databases are ASCII-encoded. We’ll want UTF-8, so we should convert our template databases over before we make any others.
update pg_database set encoding = 6, datcollate = 'en_US.UTF8', datctype = 'en_US.UTF8' where datname = 'template0'; update pg_database set encoding = 6, datcollate = 'en_US.UTF8', datctype = 'en_US.UTF8' where datname = 'template1';
Users and databases
Postgres is slightly different from other databases in that it does not distinguish between users and groups. A “role” in Postgres is a node that can hold permissions, and a “user” is just a special case of role (users can log in, but roles cannot). Any role can be granted to any other.
We don’t want to always access Postgres as the superuser, so let’s create some users.
create user jeff with password ‘hunter2’;
Generally you’ll want to create a database with the same name as your user.
psql will automatically connect to that database, but you can specify a different one with
psql -d <database>.
create database jeff with owner jeff;
A full list can be found in the official documentation, but these are enough to get you started.
|\c||Connect to a DB||
Foreign tables & FDWs
Foreign data wrappers allow you to use other data sources as if they were tables in Postgres. A “foreign table” is a table that is built on a foreign data wrapper.
Some cool foreign data wrappers:
I’ll go more in detail about how to use these later.
A standard view is defined by a query, and the query is run every time the view is called. Materialized views store the result of the query, and then work as a table, putting them midway between a table and a view. They can be refreshed manually right now (and concurrently with use, as of Postgres 9.4). Automatic, regular refreshing is coming with version 9.5.
Combining a matview with a foreign table allows us to keep a local, fast-to-access copy of the remote data.
Using Postgres as a document store (i.e. in place of MongoDB)
Using the UUID and JSON types, we can set up Postgres as a document store. Though there currently is no REST API for Postgres, our SQL query remains the same every time - and when you see the speed benefits, you’ll be sold.
There are two methods of setting up our database for use as a document store. The first one uses two columns, of type UUID and JSON.
create schema jeff; create table jeff.blog ( id uuid PRIMARY KEY, document json );
In order to use this table, we just do:
select document from jeff.blog where id = '00000000-0000-0000-0000-000000000000';
We can create an even simpler table, with just one column, using a Postgres JSON-specific operator (
->>) which accesses the value of a column in the JSON.
create schema jeff; create table jeff.blog ( document uuid );
The query now looks like this:
select document from jeff.blog where document->>'id' = '00000000-0000-0000-0000-000000000000';
Using Postgres as a GIS database
Using the PostGIS extension (which should be in your distro’s package repository, but can be downloaded from their website) Postgres becomes a GIS database - it handles geographical information (quite well I might add). PostGIS adds two types, geography and geometry, and some handy functions for querying those types.
PostGIS has integrated support for the [TIGER data]http://postgis.net/docs/Extras.html), which is freely available data about the geography and political boundaries of the United States. There are several builtin functions which generate bash scripts to download & load the TIGER data automatically. Using TIGER, you can geocode (convert to latitude & longitude) any address in the United States.
There a several open GIS programs which support the use of PostGIS as a backend, but I prefer QGIS.
Using Postgres as a search engine
If you have your site content stored in Postgres, you can use the tsvector and tsquery types to implement search. It gets the job done quickly if there is an index in place; it’s not as full featured as ElasticSearch, but it gets the job done and does it well.
This query demonstrates the functionality and operators:
select to_tsvector(‘hayhayhayhay needle hayhayhay’) @@ to_tsquery(‘needle’);
Using our blog table from before:
select document from jeff.blog where to_tsvector(document->>'content') @@ to_tsquery(‘needle’);
We can use the pg_trgm extension to provide search suggestions. Before we do, we’ll have to generate a table that contains all the unique words in our blog. Let’s assume it looks like this:
create table jeff.blog_words ( word varchar, document id references jeff.blog );
We’ll also have to add the pg_trgm extension in order to do this:
create extension pg_trgm;
If we get no results back using our needle, we can run a query like this:
select word from jeff.blog_words where similarity(word, needle_word) > 0.5 order by word <-> needle_word;
which will return a list of words used in my blog in order of similarity to the needle word.
The similarity function returns a number between 0 and 1 describing the number of matching trigrams in the words in relation to the total number of trigrams. The <-> operator returns the distance, which is 1 - similarity(). Ordering is done in ascending order by default, so ordering by <-> returns the most similar words first.
Using Postgresql as an interface to other data stores
Using FDWs we can connect PostgreSQL tables to other databases. First we’ll have set up our foreign data wrapper. Since the method for this differs, you’ll want to see the documentation for the specific FDW you intend to use. From there we have a standard method (I’ll use the built-in Postgres FDW as my example here):
- Create a server
This object holds the information regarding the connection to the remote server. For this example, we are authenticating with the same user we are currently logged in to. Further information is available in the official documentation.
create server jeffdb2 foreign data wrapper postgres_fdw options (host 'db2.jeffcasavant.com', dbname 'jeff');
- Create a foreign table using the server record
create foreign table jeff.passwords ( url varchar, username varchar, password varchar ) server jeffdb2 options (schema_name 'secure', tablename 'passwords') ;
From then on, the foreign table operates just like a local table - we can grant permissions, select, insert, update, and delete from it. The only thing we won’t want to do (assuming it’s another DB on the other end) is set any constraints. They will be handled by the entity on the other end of the foreign table.
Using the FDWs I mentioned before, and similar setup, we can set up just about anything as a foreign table. Multicorn is a library which allows you to write FDWs in Python, by implementing as few as two of the basic SQL functions.
Postgres is a very capable database. It’s fast and efficient with almost any workload, and it’s SQL Standard compliant. Its focus on extensibility has caused it’s feature set to blossom. There are many more capabilities that Postgres supports which I have not mentioned here (such as sharding, clustering, replication…) which are useful in high availability environments.
In short I recommend Postgres to everyone, for almost everything. It can handle anything you throw at it and does it well.