MySQL 8.0

I’ve already run into a number of issues with MySQL 8.0. Most of them have been related to the GIS features (e.g., all functions are now named ST_, geometries must exactly match the column type, etc…). One new issue I ran into today is that my spatial indices are being ignored because I didn’t create the spatial column with an SRID (whatever, that is!).

SRID and Spatial Indices

Since my current database isn’t using the spatial indices (which are actually there!), I’m going to at least run an experiment while I fix the problem, which I will describe after the chart below. The experiment is to see how much speed-up spatial indexing provides by filling in the chart below with the before/after timing measurements.

Before
(no spatial index)
After
(spatial index)
ST_intersects
short line
medium line
long line

0.976s
1.508s
1.969s

.0037s
.028s
1.960s
ST_contains
short line
medium line
long line

0.957s
1.497s
1.958s

.002s
.002s
.002s
ST_intersects and ST_contains
short line
medium line
long line

1.930s
2.995s
3.880s

.005s
.030s
1.992s
Performance experiment to measure impact of spatial indexing on performance. The operations are being applied to a single column over a dataset that has approximately 3350 entries consisting of POLYGON data for US county boundaries.
“short line” represents a LINESTRING of about 7.5 miles.
“medium line” represents a LINESTRING of about 500 miles.
“long line” represents a LINESTRING of about 3000 miles.
https://topocreator.com/beta/items/getdataset/5117/106 – short
https://topocreator.com/beta/items/getdataset/5891/106 – medium
https://topocreator.com/beta/items/getdataset/9437/106 – long
Note about the LONG line results … it’s entirely possible that the vast majority of the time was spent processing the extremely long query to represent all of the coordinates of the 3000 mile bike ride across the country (Race Across America).
Note about the ST_contains optimized results (identical). It’s likely that there is some optimization that quickly throws out bounds that are far too small to contain the medium and long routes by first looking at the “envelope” sizes. Therefore, it only takes a fraction of a section to throw out all 3350 counties as not being possible to “contain” the envelope of the 3000 mile route. It’s also possible that this is somewhat related to why the long line isn’t optimized very much with ST_intersects.

Starting with the MySQL 5.7 version, the GIS functions and data within MySQL were made more precise by the MySQL development team. You could no longer mix/match geometries within a geometry field. For example, you could no longer store a LINESTRING inside a field that was created as a POLYGON. Additionally, all GIS functions not prefixed with ST_ or MBR were deprecated. But even in MySQL 5.7, GIS data was still essentially cartesian (i.e., abstract coordinates).

Starting with MySQL 8.0, GIS data could now be associated with a Spatial Reference System (SRS) that identifies how to interpret the GIS data. Each SRS is referred to within MySQL 8.0 by its SRID. GPS decimal degree coordinates are SRID 4326. A mercator projection commonly used by Google Maps and other web map systems is specific by SRID 3857. This should be the number of meters away from the equator/prime meridian. This allows the ST_Distance function to properly take into account latitude when doing distance calculations on the embedded GIS data. All of this is explained very well on the MySQL developer blog.

While this is all fine and dandy and actually really useful, it does make a problem when you are already have a ton of SRID-less data and schema that has been imported from MySQL 5.6 all the way up MySQL 8.0. One particular problem that I addressed in previous blogs is that you have to manually (or at least via a custom script) ensure that all data within a field matches the type of that field. The import won’t even work until you do this as I discovered previously.

One thing I did not discover until yesterday is that even if you manage to get all the data imported, the previous spatial indices are now invalid. Another MySQL developer blog documents the process you must go through to upgrade the spatial indices, which can be summarized as verify that the spatial data is only using one SRID, tell MySQL which SRID to use for the index, and then recreate (drop/create) the index. 3/16 7:18AM

I wanted to go ahead and “upgrade” my coordinates so that they are SRID 4326 to take advantage of spatial computations which depend on latitude … but unfortunately, Mysql 8.0 made the decision to use longitude / latitude instead of latitude / longitude. This is unfortunate because there is far too much of my code (as well as the database itself) that have the points stored the other way. Thankfully, the spatial index should work fine using cartesian coordinates, and I already have php code for correctly calculating distances based on latitude (i.e., the earth is a sphere so that the farther away from the equator you are, the closer the longitude lines are together).

MySQL 8.0 … booo! They have violated the original 4326 spec on imported data. Newly imported data using their ST_GEOMFROMTEXT function correctly expects latitude/longitude. But existing “cartesian” data which you try to convert to 4326 is expected to be in longitude/latitude order. I am sure this inconsistency made sense to them. I’ve always stored things as lat/lng because that is how Google Maps refers to everything in their Google Maps API. But I’ve always felt like existing GIS functionality was expecting the X (longitude) first, and I knew it would come back to bite me some day. Still, it would be nice if MySQL 8.0 was consistent and lat/lng was fine everywhere instead of just some places. Boo!

The other commands are shown below:

drop index bound on datasets;
drop index bound_2 on datasets; 
alter table datasets modify column bound POLYGON not null SRID 0;
create spatial index bound on datasets (bound);

Upon completing the indexing, I was able to finish populating the table above with my performance improvements. As a macro example, I created my lifetime overlay map which requires performing 5810 of these queries in a little over two minutes instead of several hours, which reflects the 100-300x improvement shown on an individual query (depending on which type of query). And here is the lifetime county map.

5,810 rides representing my lifetime of riding (with a GPS). There is a substantial number of pre-GPS rides that I cannot display on this map. But still, this is approaching 300,000 miles of riding over the past 20 years covering 875 counties.

1 comment

Leave a comment

Your email address will not be published. Required fields are marked *