Loading Natural Earth data into PostGIS for Illustrator/MAPublisher

Natural Earth Data

The Natural Earth datasets are an incredible resource. If you’re a cartographer you probably already know about it, and if not, you should definitely check it out. It’s a project spearheaded by some fantastic folks affiliated with NACIS. NACIS is a great organization that does so much for the cartographic community–check them out! I use Natural Earth Data a lot so it made sense for me to get all of the data into a format where I could import/crop any dataset into my Illustrator/MAPublisher document without having to even touch a GIS. update: Version 4 just got released on 10.30.2017 and it’s better than ever!

MAPublisher Data Import Spatial Filters

One of the things that I’ve come to really appreciate when making maps with Illustrator/MAPublisher is the ability to import and crop data from a large dataset automatically via the MAPublisher advanced data import spatial filters. Being able to import data directly into your map document from large datasets (like OSM, NHD, GNIS, NaturalEarthData, etc.) without having to worry about clipping data and futzing with different CRS’s in QGIS is really nice. Prior to MAPublisher 9.8.1, the only data type that supported the spatial filters was File Geodatabase. This is fine if you have ESRI products, but I use GDAL/QGIS, and support for file geodatabases, both with the OpenFileGDB driver or the ESRI FileGDB driver, is sketchy at best. The OpenfileGDB driver is read-only and using the ESRI FileGDB driver seems to crash QGIS a lot–especially in an active edit session on a file geodatabase.. BUT, as of MAPublisher 9.8.1, imports from PostGIS with spatial filters is supported. This is fantastic because PostGIS is well supported with OSS and is extremely scalable.

So, now to get all of that nice data into a PostGIS database. This will explain the procedure I followed to import the entire suite of Natural Earth vector datasets into a single PostGIS database.

      1. Install QGIS.
        1. After downloading and opening the .dmg, you’ll get a volume with Qgis and some dependencies, including GDAL. Install all of them (1-4) in order. If you get an error saying that it can’t be installed because it’s unverified, enable the installation of apps from Anywhere in System Preferences>Security and Privacy.
      2. Setup GDAL utilities for the command line
        1. Open up a new terminal window (Applications>Utilities>Terminal.app) or use spotlight, my favorite tool: (cmd+space>type “ter”>Enter).
        2. Run the following three commands:
          • echo 'export PATH=/Library/Frameworks/GDAL.framework/Programs:$PATH' >> ~/.bash_profile
          • source ~/.bash_profile
          • gdalinfo --version
        3. You should see the GDAL version returned (2.1.2 in my case). Now you can use GDAL utilities from the command line, and since that line was added to your .bash_profile file, they will be available every time you start your computer.
      3. Install Postgres.app
        1. Download the .dmg and run it. Drag the elephant to the Applications folder, start up the app, and click “initialize”. That was easy! Now you have postgres (and PostGIS) installed. The server should be running look something like  this: 
      4. Download the NaturalEarth full vector dataset.
        1. On the download page, look for the link for ‘Download all vector these as SHP/GeoDB’

        2. Once you download and extract the .zip file, you should see bunch of folders. Six containing all of the themes broken up by category:

      5. cd over to the data.
        1. Fire up a terminal and ‘cd’, or change directory, over to your downloaded data. mine is in my ‘data’ folder in my home directory:
        2. cd ~/data/_Generic/NaturalEarth/natural_earth_vector
        3. pwd can be used to echo out your current location, and ls -lah will list the current folder with all the deets:

      6. Setup the PostGIS database.
        1. Fire up a psql window by clicking on the elephant in your menu bar>Open Postgres>double-click on any of the default databases:
        2. At the psql prompt, run the following to setup a new database with PostGIS and some schemas to organize the data with.
          CREATE DATABASE naturalearth;
          \c naturalearth
          CREATE EXTENSION postgis;
          CREATE SCHEMA z10m_cultural;
          CREATE SCHEMA z10m_physical;
          CREATE SCHEMA z50m_cultural;
          CREATE SCHEMA z50m_physical;
          CREATE SCHEMA z110m_cultural;
          CREATE SCHEMA z110m_physical;
          CREATE EXTENSION hstore;
        3. Setup DB search paths. This will allow MAPublisher to be able to index the extra schemas for use with the data import spatial filters.
          ALTER DATABASE naturalearth SET search_path = z10m_physical, z10m_cultural, z50m_cultural, z50m_physical, z110m_cultural, z110m_physical, public;
        4. That should be it. The database is all ready for the ogr2ogr imports.


      7. Import all .shp’s into PostGIS
        1. In a terminal (not the psql one) ‘cd’ over to your shapefiles:
        2. cd ~/data/_Generic/NaturalEarth/natural_earth_vector

          List the directory to see what we’ve got. you should see folders for the 6 main categories:

        3. now, cd into the first one, ’10m_cultural’
          cd 10m_cultural/
        4. Run the ogr2ogr import for all .shp’s in this theme:
          for f in *.shp; do PGCLIENTENCODING=LATIN1 ogr2ogr -update -append -progress -skipfailures -nlt GEOMETRY -lco GEOMETRY_NAME=geom -lco SCHEMA=z10m_cultural -lco PRECISION=NO -f "PostgreSQL" PG:"dbname=naturalearth user=username" $f; done;
        5. So, what are we actually doing here:
          for f in *.shp; do #search the folder for all files with .shp extension and do all the rest of this business to them.
          PGCLIENTENCODING=LATIN1 #set the encoding to play nice with some of the feature names with special characters. it's still not perfect.
          ogr2ogr #invoke GDAL/OGR. gdal for raster, ogr2ogr for vector.
          -update -append #append the data to an existing table if preset, not overwrite.
          -progress -skipfailures #show a progress bar. if there are problems, keep truckin.
          -nlt GEOMETRY

          #nlt=new layer type. typically this wouldn’t be necessary as the geometry type would be automatically detected. However, these shapefiles have mixed polyline/multipolyline and polygon/multipolygon geometries in the same .shp. PostGIS will default to “promote-to-multi”, which is usually preferred, but this will result in Illustrator/MAPublisher importing even simple lines and ‘single-ring’ polygons as compound paths. This is annoying when you later need ot break the paths, run join tools on them, etc. SO..specifying “GEOMETRY” will retain the single/multi designation of each feature as it is in the .shp and will be imported correctly into Illustrator/MAPublisher. If you were going to do a bunch of geoprocessing or analysis on this data, this would be no bueno, but since all we’re doing is importing into Illustrator, this is preferred.

          -lco GEOMETRY_NAME=geom

          #lco=layer creation option. Set the name of the geometry column to ‘geom’. This isn’t usually necessary either, but some versions of MAPublisher have the spatial filters hard coded to only detect the geometry properly if ‘geom’ is the name of the column containing the actual geometry.

          -lco SCHEMA=z10m_cultural

          #import all of these .shp’s into this schema. This is important to set correctly based on the name of the folder of shapefiles you are in. z10m_cultural in this case. You’ll change this for all six categories of the Natural Earth Data. the ‘z’ is simply because a schema cannot start with a number.

          -lco PRECISION=NO

          #some of the .shp’s have rounding errors on the fields when importing into PostGIS. Use this to make sure that even features which have these long rounding errors get imported too.

          -f "PostgreSQL" PG:"dbname=naturalearth user=username"

          #this is the data format that ogr2ogr is writing TO. PostGIS in this case. the dbname, and the username you setup. If you have Postgres running on a port other than 5432 or have a password setup, you’ll need to specify that here too.

          $f; done;

          #$f is each shapefile as the ‘for’ command loops through the folder, done; is to the way to end the loop.

      8. Results. After running the command you should see a bunch of progress bars and no errors:
      9. Run for the 5 other categories.
        1. After running that command for one of the folders, ‘cd’ back a directory (../) and then into another one of the directories, and run the command again. paying close attention to modify the schema to match the folder that you are in.
          cd ../10m_physical
          for f in *.shp; do PGCLIENTENCODING=LATIN1 ogr2ogr -update -append -progress -skipfailures -nlt GEOMETRY -lco GEOMETRY_NAME=geom -lco SCHEMA=z10m_physical -lco PRECISION=NO -f "PostgreSQL" PG:"dbname=naturalearth user=username" $f; done;

      10. Clean up the database
        1. Once you’ve done this for all six folders, switch back to your psql terminal and run a vacuum on the database. this will clean up all of the transactions:
          VACUUM FULL;

      11. Import at will!
        1. Once you’ve done all this, you should be able to access this database via Illustrator/MAPublisher. From Illustrator go Advanced Data Import>Add…>PostGIS>New Connection>and enter your connection details:

          Once connected to the database, you should see all of the tables organized in the schemas you setup:

          The best part is now you can be working on a map of any area in the world and have quick access to the entire Natural Earth vector data. Import any or all of the themes you want in a single go cropped and reprojected to your area of interest. Just make sure to use the spatial filter to limit the imported features to your specific map area:



Switch between two versions of GDAL on a Mac

I’m a big fan of Qgis and use it all the time. It’s gotten so stable and fast. It is such a joy to use now! On one of the recent versions that I upgraded to it went from using GDAL 1.11.x to using 2.1.x. This is great, but sometimes I need to be able to write to File Geodatabases. The FileGDB driver and the ESRI FileGDB API SDK that I installed for GDAL 1.11.x no longer works under GDAL 2.1, and as far as I can tell, there is no ESRI FileGDB API SDK that is compatible with GDAL 2.1.x.

So, GDAL 1.11.x is still installed on my machine with the FileGDB drivers, but how can I switch back and forth between 2.1 and 1.11 so that I can still use GDAL 2.1.x with QGIS, but be able to use GDAL 1.11.x on the occasions that I have to deal with file geodatabases? Luckily, there is a simple symlink that points to the current version and re-pointing the symlink is all we need to do.

So, here we go:

First, add GDAL to your PATH variable so you can use GDAL/OGR commands
in a terminal window. (skip this if you already use GDAL from the command line).

echo 'export PATH=/Library/Frameworks/GDAL.framework/Programs:$PATH' >>~/.bash_profile

source ~/.bash_profile

#cd over to the GDAL Framework ‘Versions’ directory

cd /Library/Frameworks/GDAL.framework/Versions/

#list the folder to see the installed versions and the ‘Current’ symlink.

ls -lah

you should see something like this, with the ‘Current’ Symlink pointing at version 2.1:

 #check the current GDAL/OGR version:

gdalinfo --version

 should be 2.1.x, the same as the symlink.

#just for kicks, show that we don’t have access to the ESRI FileGDB driver with 2.1.x:

ogrinfo --formats | grep "FileGDB"

 as expected, we only have the OpenFileGDB driver.

Ok, now let’s make the switch.

#unlink, and then re-point the symlink at the older version of GDAL:

unlink Current && ln -s 1.11 /Library/Frameworks/GDAL.framework/Versions/Current

#and list the folder again to show the modified symlink:

ls -lah

now your active GDAL version has changed to the older version. 1.11 in this case.

#check your GDAL version, it should be the older one now:

gdalinfo --version

#check the available formats, this should show the ESRI FileGDB drivers (if you’ve installed them):

ogrinfo --formats | grep "FileGDB"

Now you can use GDAL as normal. Your newer installation of QGIS might not work properly with this older version of GDAL, so just use it from the command line. Once you’re done messing with your file geodatabases, re-point the symlink back to GDAL 2.1.x:

#re-point the symlink back to GDAL 2.1.x:

unlink Current && ln -s 2.1 /Library/Frameworks/GDAL.framework/Versions/Current

#and check the version, which should be switched back to 2.1.x:

gdalinfo --version