I’ve already run into a number of issues with MySQL 8.0. Most of them have been related to the GIS features (e.g., all functions are now named ST_, geometries must exactly match the column type, etc…). One new issue I ran into today is that my spatial indices are being ignored because I didn’t create the spatial column with an SRID (whatever, that is!).
SRID and Spatial Indices
Since my current database isn’t using the spatial indices (which are actually there!), I’m going to at least run an experiment while I fix the problem, which I will describe after the chart below. The experiment is to see how much speed-up spatial indexing provides by filling in the chart below with the before/after timing measurements.
Before (no spatial index) | After (spatial index) | |
ST_intersects short line medium line long line | 0.976s 1.508s 1.969s | .0037s .028s 1.960s |
ST_contains short line medium line long line | 0.957s 1.497s 1.958s | .002s .002s .002s |
ST_intersects and ST_contains short line medium line long line | 1.930s 2.995s 3.880s | .005s .030s 1.992s |
“short line” represents a LINESTRING of about 7.5 miles.
“medium line” represents a LINESTRING of about 500 miles.
“long line” represents a LINESTRING of about 3000 miles.
https://topocreator.com/beta/items/getdataset/5117/106 – short
https://topocreator.com/beta/items/getdataset/5891/106 – medium
https://topocreator.com/beta/items/getdataset/9437/106 – long
Note about the LONG line results … it’s entirely possible that the vast majority of the time was spent processing the extremely long query to represent all of the coordinates of the 3000 mile bike ride across the country (Race Across America).
Note about the ST_contains optimized results (identical). It’s likely that there is some optimization that quickly throws out bounds that are far too small to contain the medium and long routes by first looking at the “envelope” sizes. Therefore, it only takes a fraction of a section to throw out all 3350 counties as not being possible to “contain” the envelope of the 3000 mile route. It’s also possible that this is somewhat related to why the long line isn’t optimized very much with ST_intersects.
Starting with the MySQL 5.7 version, the GIS functions and data within MySQL were made more precise by the MySQL development team. You could no longer mix/match geometries within a geometry field. For example, you could no longer store a LINESTRING inside a field that was created as a POLYGON. Additionally, all GIS functions not prefixed with ST_ or MBR were deprecated. But even in MySQL 5.7, GIS data was still essentially cartesian (i.e., abstract coordinates).
Starting with MySQL 8.0, GIS data could now be associated with a Spatial Reference System (SRS) that identifies how to interpret the GIS data. Each SRS is referred to within MySQL 8.0 by its SRID. GPS decimal degree coordinates are SRID 4326. A mercator projection commonly used by Google Maps and other web map systems is specific by SRID 3857. This should be the number of meters away from the equator/prime meridian. This allows the ST_Distance function to properly take into account latitude when doing distance calculations on the embedded GIS data. All of this is explained very well on the MySQL developer blog.
While this is all fine and dandy and actually really useful, it does make a problem when you are already have a ton of SRID-less data and schema that has been imported from MySQL 5.6 all the way up MySQL 8.0. One particular problem that I addressed in previous blogs is that you have to manually (or at least via a custom script) ensure that all data within a field matches the type of that field. The import won’t even work until you do this as I discovered previously.
One thing I did not discover until yesterday is that even if you manage to get all the data imported, the previous spatial indices are now invalid. Another MySQL developer blog documents the process you must go through to upgrade the spatial indices, which can be summarized as verify that the spatial data is only using one SRID, tell MySQL which SRID to use for the index, and then recreate (drop/create) the index. 3/16 7:18AM
I wanted to go ahead and “upgrade” my coordinates so that they are SRID 4326 to take advantage of spatial computations which depend on latitude … but unfortunately, Mysql 8.0 made the decision to use longitude / latitude instead of latitude / longitude. This is unfortunate because there is far too much of my code (as well as the database itself) that have the points stored the other way. Thankfully, the spatial index should work fine using cartesian coordinates, and I already have php code for correctly calculating distances based on latitude (i.e., the earth is a sphere so that the farther away from the equator you are, the closer the longitude lines are together).
The other commands are shown below:
drop index bound on datasets; drop index bound_2 on datasets; alter table datasets modify column bound POLYGON not null SRID 0; create spatial index bound on datasets (bound);
Upon completing the indexing, I was able to finish populating the table above with my performance improvements. As a macro example, I created my lifetime overlay map which requires performing 5810 of these queries in a little over two minutes instead of several hours, which reflects the 100-300x improvement shown on an individual query (depending on which type of query). And here is the lifetime county map.
1 comment