Importing spatial data to PostGIS
If you’re new to PostGIS and want to try it out, one of the first things you’ll want to do is to import some data into your database. I want to provide an overview of how to accomplish this. The post provides an introduction to some common data import terminology, details on where to find spatial data and tools for importing data to PostGIS, and specific instructions for how to use each tool to perform the import.
For an introduction to PostGIS, see my previous blog post here.
Introduction
If you’ve worked with other data types before, then most of the process for importing spatial data will be self-explanatory. However, for someone working with spatial data, acronyms like CRS and SRID might be confusing.
For data import, you should know the coordinate reference system (CRS) your data is in and which spatial reference identifier (SRID) number is used to reference that specific CRS. The SRID defines all the parameters of a data set’s geographic coordinate system and projection. Using an SRID is convenient because it packs all the information about a map projection into a single number. When creating spatial objects for insertion into the database, the SRID is required.
Where to find spatial data?
Before you can load any data to your database, you need to acquire it. I think that one of the most underrated skills for GIS experts is to know where to find suitable data. If you are using PostGIS to manage the data that you have in your organization already, you don’t have to start your process by searching new data, but rather working with your own. I have added here a few data sources that you can try out:
- OpenStreetMap: OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world. Many people refer to OSM as the Wikipedia of maps. If you seek easy ways to get an extract of the data, you can check for GeoFabrik for Shapefiles or osmdata.xyz for GeoPackages.
- Natural Earth Data: Natural Earth is a public domain map dataset available at 1:10m, 1:50m, and 1:110 million scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software. So, if you need country boundaries, states or railroads in the world on a very general level, this is your data of choice.
- Free GIS Data: The page contains a categorised list of links to over 500 sites providing freely available geographic datasets.
So, after you have loaded some files to your local disk, you can start looking at different ways on how to import them to PostGIS.
Tools for importing data to PostGIS
Although the official documentation states that there are two ways to get data into a PostGIS/PostgreSQL database, the ways to achieve this are much more diverse. The two recognized by the documentation are formatted SQL statements or using the Shape file loader/dumper (shp2pgsql), so we can start with those.
Formatted SQL
So loading with plain SQL is not recommended and nobody probably does it, but going through this method first, gives you a good idea of what is happening behind the scenes. It’s important to know how PostGIS handles spatial data types and the content of the geometry column, which can’t store strings or integers.
For example, a valid insert statement to create and insert an OGC spatial object would be:
INSERT INTO data.islands(geom, the_name ) VALUES ( ST_GeomFromText(‘POINT(0 0)’, 4326), ‘NULL ISLAND’);
Another example is a method where you already have latitude and longitude data as text or numbers in your table and you want to build a valid geometry object from those.
First you need to add a geometry column for your data:
SELECT AddGeometryColumn (‘data’,’islands’,’geom’,4326,’POINT’,2);
Next you could update the content of the geometry column based on the columns from the data
UPDATE data.islandsset geom = ST_SetSRID(ST_MakePoint(longitude, latitude),4326);
In a nutshell: a simple insert works, but remember to build your geometry in a valid way!
For better performance, you should use COPY statements instead of INSERT, as it results to much better performance.
shp2pgsql
The most common data format for spatial data has traditionally been the ESRI shapefile. shp2pgsql is a command line tool to import ESRI shapefiles to the database. Under Unix, you can use the following command for importing a new PostGIS table:
shp2pgsql -s <SRID> -c -D -I <path_to_shapefile> <schema>.<table> | \ psql -d <databasename> -h <hostname> -U <username>
On Windows the process is also simple. Additionally there’s also the shp2pgsql-utility which is a small GUI to import shapefiles to PostGIS.
ogr2ogr
The tool I use most often for loading data to PostGIS is probably ogr2ogr. It is a very powerful tool to convert data into and from PostGIS to almost all possible vector data formats. GDAL/OGR is a software package that powers a great variety of different geospatial software tools and it comes with QGIS already installed on the server computer. Conveniently ogr2ogr comes with the QGIS installation and Windows users can access the commands directly through OSGeo4W Shell. So for a single import of a shapefile, you would run the following command:
ogr2ogr -f “PostgreSQL” PG:”host=<hostname> dbname=<dbname> user=<yourusername> password=<yourpassword>” <dir>\yourdatafile.shp -lco SCHEMA=foo
The parameters used to run the ogr2ogr-command here are:
-f output file format name
-lco layer creation option
Combining ogr2ogr commands with simple bash allow you to load a folder full of shapefiles!
for %i in (*.shp) do ogr2ogr -update -append -f PostgreSQL PG:”host=port=5432 dbname= user= password= schemas=myshapefiles” %i
This small and simple script has been useful many times. This will create you a table automatically and append all the files to the correct tables.
Loading data with QGIS
QGIS allows a few different approaches when it comes to loading spatial data to PostGIS. In my previous blog post you can find some basic information about QGIS and how to connect to PostGIS. Inside QGIS you can do data import through DB Manager, but I recommend using Export to PostgreSQL, as it is using COPY statements in the background rather than INSERTS and performance is much better. You can load data layers from the project or from disk. Below you see a screenshot from the main dialog.
Besides ogr2ogr this is another tool for data imports that I use very often, as most of my workflows are strongly QGIS related.
Loading data with Python & psycopg2 or GeoPandas
If you are integrating PostGIS into an application and want to automate things, GUI-workflows are not viable options. Instead you might want to look into loading data with Python. The most obvious choice for making a connection from Python to PostGIS is psycopg2. The most important thing to understand when working with psycopg2 are the data types for geometries that were already mentioned earlier. Inserting latitude and longitude values as such to a geometry column will not work, but instead you have to build your insert value with the lat and long parameters as follows:
ST_SetSRID(ST_MakePoint(%s, %s), 4326)
Other aspects of psycopg2 workflows won’t differ much from normal Python-PostgreSQLdata pipelines.
If you are using Pandas library with your data, you might be also interested in knowing that there is a GeoPandas library especially for geospatial data. Just recently there was an addition to GeoPandas to import a GeoDataFrame to PostGIS! It supports the same functionalities as Pandas, so you can also use “replace” and “append” when pushing the GeoDataFrame to PostGIS.
raster2pgsql
If you are working with raster data, the benefits of moving your data from a file to a database are not nearly as obvious as with vector data. However PostGIS offers functionalities to store and analyze raster data and a wide range of raster functions. Check for example this blog post about working with raster in PostGIS. For loading raster to PostGIS, two main options are raster2pgsql tool or GDAL with Python. Just like shp2pgqsl, raster2pgsql comes packaged with a PostGIS bundle installation.
Basic structure of a raster2pgsql command is as follows:
raster2pgsql raster_options_go_here raster_file yourtable > out.sql
Just like shp2pgsql, this outputs a SQL file that you can then run in PostGIS.
Tools for OpenStreetMap data
OpenStreetMap is a slightly specific case, but worth going through here separately, as it is widely used. The OpenStreetMap data structure in a database is by default divided into lines, roads, points and polygons. Depending on the application, you might want to change the style the data is loaded. However, the default structure works well with many tools and services. Most Linux distributions include osm2pgsql, which is a good generic tool in importing a small piece of OSM data to PostGIS. osm2pgsql is actively maintained and widely used.
A basic way to load the data into PostGIS for rendering would be
osm2pgsql –create –database postgres data.osm.pbf
This will load the data from data.osm.pbf into the planet_osm_point, planet_osm_line, planet_osm_roads, and planet_osm_polygon tables.
A thorough walkthrough on the osm2pgsql import process can be found at this blog.
Imposm3, written in Go, is designed to create databases that are optimized for rendering. You need a json mapping to define the data schema. For a simple import with imposm you could try the following command:
imposm import -connection postgis://user:password@host/database \ -mapping mapping.json -read /path/to/osm.pbf -write
Conclusions
Now you might be overwhelmed with the number of tools available to achieve the same goal. But what is the best way of loading data to PostGIS?
It depends. Are you loading static files or something through a live API? Do you have PostGIS installed on your own server or do you have a hosted service? How will you be using the data? For example, with OpenStreetMap data, you must consider whether you’re updating the data or performing a one-time load. Note that also ETL tools such as FME and Deegree allow data loading, if those fit your purpose better than any of the tools I’ve presented here.
As a bonus, if you don’t want to load data at all, I should also mention the ogr_fdw. If you are familiar with traditional PostgreSQL foreign data wrappers, this is basically the same thing but for spatial data. Basically, the data can be in a ESRI shapefile or a shapefile, and stored to PostGIS as a foreign table.
Hopefully, this gives you a good overview about the different ways to get spatial data into your PostgreSQL tables. Happy exploring!