One of the free software projects I’m involved in is SpatiaLite, which is a set of spatial (as in geospatial) extensions to SQLite.
Most users of spatialite probably don’t even notice – its often embedded, just as sqlite is. However if you are a user who is accessing it through SQL, then occasionally that SQL bit will become a bit confusing.
One user recently asked about why the “Disjoint” query didn’t work as expected. Hopefully the explanation might help other people avoid the same confusion.
I’ll demonstrate this with the SpatiaLite command line tool:
bradh@incana:~$ spatialite :memory: SpatiaLite version ..: 3.0.1 Supported Extensions: - 'VirtualShape' [direct Shapefile access] - 'VirtualDbf' [direct DBF access] - 'VirtualXL' [direct XLS access] - 'VirtualText' [direct CSV/TXT access] - 'VirtualNetwork' [Dijkstra shortest path] - 'RTree' [Spatial Index - R*Tree] - 'MbrCache' [Spatial Index - MBR cache] - 'VirtualSpatialIndex' [R*Tree metahandler] - 'VirtualFDO' [FDO-OGR interoperability] - 'SpatiaLite' [Spatial SQL - OGC] PROJ.4 version ......: Rel. 4.7.1, 23 September 2009 GEOS version ........: 3.3.2-CAPI-1.7.2 SQLite version ......: 3.7.4 Enter ".help" for instructions spatialite> CREATE TABLE areas (AreaName text); the SPATIAL_REF_SYS table already contains some row(s) spatialite> CREATE TABLE points (PointName text); spatialite> SELECT AddGeometryColumn('areas', 'Geometry', -1, "POLYGON", "XY"); 1 spatialite> SELECT AddGeometryColumn('points', 'Geometry', -1, "POINT", "XY"); 1
So now we have two tables (“areas” and “points”) with appropriate geometry columns. We’ll add some content:
spatialite> INSERT INTO 'areas' (AREANAME, Geometry) VALUES ("area1", GeomFromText("POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))")); spatialite> INSERT INTO 'areas' (AREANAME, Geometry) VALUES ("area2", GeomFromText("POLYGON((20 0, 30 0, 30 10, 20 10, 20 0))")); spatialite> INSERT INTO 'areas' (AREANAME, Geometry) VALUES ("area3", GeomFromText("POLYGON((10 0, 30 0, 30 10, 10 10, 10 0))"));
creates the areas, and
spatialite> INSERT INTO 'points' (PointName, Geometry) VALUES ("pointA", GeomFromText("POINT(2 2)")); spatialite> INSERT INTO 'points' (PointName, Geometry) VALUES ("pointB", GeomFromText("POINT(22 2)")); spatialite> INSERT INTO 'points' (PointName, Geometry) VALUES ("pointC", GeomFromText("POINT(12 2)")); spatialite> INSERT INTO 'points' (PointName, Geometry) VALUES ("pointD", GeomFromText("POINT(-10 -10)"));
creates some points.
Some of the points are within one or more areas, and one (“pointD”) is not. The problem is to find the points that are not within any areas.
Here is the original attempt:
spatialite> SELECT PointName, AsText(points.Geometry) FROM areas, points WHERE Disjoint(areas.Geometry, points.Geometry); pointB|POINT(22 2) pointC|POINT(12 2) pointD|POINT(-10 -10) pointA|POINT(2 2) pointC|POINT(12 2) pointD|POINT(-10 -10) pointA|POINT(2 2) pointD|POINT(-10 -10)
Well clearly we have PointD, but we also have other points, sometimes multiple times.
So what is the issue? Lets take away the WHERE clause, and look at what is happening:
spatialite> SELECT PointName, AreaName FROM areas, points; pointA|area1 pointB|area1 pointC|area1 pointD|area1 pointA|area2 pointB|area2 pointC|area2 pointD|area2 pointA|area3 pointB|area3 pointC|area3 pointD|area3
OK, its a simple SQL join. We are getting the product of each table.
So lets look at one area at a time:
spatialite> SELECT PointName, AsText(points.Geometry) FROM areas, points WHERE areas.AreaName is "area1" AND Disjoint(areas.Geometry, points.Geometry); pointB|POINT(22 2) pointC|POINT(12 2) pointD|POINT(-10 -10) spatialite> SELECT PointName, AsText(points.Geometry) FROM areas, points WHERE areas.AreaName is "area2" AND Disjoint(areas.Geometry, points.Geometry); pointA|POINT(2 2) pointC|POINT(12 2) pointD|POINT(-10 -10) spatialite> SELECT PointName, AsText(points.Geometry) FROM areas, points WHERE areas.AreaName is "area3" AND Disjoint(areas.Geometry, points.Geometry); pointA|POINT(2 2) pointD|POINT(-10 -10)
So when we do the WHERE Disjoint() part, we’re only considering whether this particular combination of point and area is disjoint (point is contained within area, for those still confused by what I meant earlier). So our original query was just the same as running the query on each area in turn.
So if that isn’t going to work, then how do we do “not in” for an arbitrary selection of areas and points?
There are probably a few ways to solve this. I came up with two.
Using Contains operation, and selecting anything missing.
Lets look at how to find any point that is contained in an area:
spatialite> SELECT points.rowid FROM areas, points WHERE Contains (areas.Geometry, points.Geometry); 1 2 2 3
So we can then just select any row in the points table that didn’t appear in the result of that “Contains”:
spatialite> SELECT PointName, AsText(points.Geometry) FROM points WHERE (points.rowid not in (SELECT points.rowid FROM areas, points WHERE Contains (areas.Geometry, points.Geometry))); pointD|POINT(-10 -10)
Building a union area
Instead of looking at each area, we could just build a single area that is the union of each of the areas table entries, and then check if a point appears in that union area:
spatialite> SELECT PointName, AsText(points.Geometry) FROM (SELECT ST_Union(areas.Geometry) AS AllAreas FROM areas), points WHERE Disjoint(AllAreas, points.Geometry); pointD|POINT(-10 -10)
Any more ideas?
Anyway, I hope that helps someone, sometime.