An epic “merge”

I have a development system which has diverged quite a bit from the corresponding production system. I have deleted, added, tweaked, fixed, edited, and looked at code on both systems separately without committing on either system. This has dragged on for several years now as I dreaded the day I would need to merge the two. The day has come. Skip to the details of the merge if you don’t want to read my long treatise on the history of topocreator.com.

History

I’ve been working on topocreator.com for many, many years. It originated with a project I gave my software engineering students in 2007, after wondering on my bike ride into work exactly how tall the hills were that I could see stacked up against each other from a vantage point crossing Little Shades Creek on Rocky Ridge Rd.

I eventually got a site up and running and decided to separate my production website from a development version of the website where I wanted to experiment with new features. I also was using CVS as the version management system before eventually converting everything to Git some number of years ago that I no longer remember.

So at some point in the last 3-5 years, I had an identical copy of the working version of the software on the production system as well as the development system (XAMPP running as a VM on my Mac). I only worked on the project sporadically and mostly just fixed bugs as I discovered them on the production system.

I never thought that the two codebases were too far apart, and there were the troublesome config files and paths that were different on both systems, so I didn’t think too much about continuing to work on them separately. But a couple years ago I wanted to do a fairly major overhaul to the system and add some new features. I did this on the development system. Unfortunately, it seemed like it was going to be too much effort to get all those changes working on the production system – an underpowered remote Linode without enough storage to support the features I had added to the development system.

So what ended up happening is I had a limited feature version of the website on the production system I could use when away from home without network access to the home system … while continuing to use and improve my development system as my own personal production system (i.e., the one I used) so that I could access the more advanced features. This was not a great idea for a number of reasons as documented below.

First, this split my data between two different systems. The production system accessed the production database, but the development system accessed the development database. Both needed all my rides … so I needed to upload all the rides to both systems … separately … and at different times. Now when I’m ready to merge the two systems together, I will need to make sure that the ride data is also merged AND not duplicated. This shouldn’t be too hard, and I will feel much better about life once I have a single consistent dataset associated with my ride life.
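
A minimal sketch of how that “merged AND not duplicated” step could work. Everything here is an assumption for illustration: the function name `merge_rides` is made up, the rides are represented as plain dicts rather than database rows, and the fields `start_time` and `distance_m` are hypothetical stand-ins for whatever actually identifies a ride uniquely.

```python
def merge_rides(production, development, key_fields=("start_time", "distance_m")):
    """Combine two lists of ride dicts, keeping one copy of each ride.

    Two rides count as the same ride when every field in key_fields matches.
    The field names are hypothetical. When a ride exists on both systems,
    the production copy is kept because it is seen first.
    """
    merged = {}
    for ride in list(production) + list(development):
        key = tuple(ride[field] for field in key_fields)
        # setdefault only stores the ride if this key has not been seen yet
        merged.setdefault(key, ride)
    return list(merged.values())
```

The choice of key matters more than the code: if the key is too loose, two different rides collapse into one, and if it is too strict (for example, including an auto-increment id that differs between the two databases), every ride looks unique and nothing is de-duplicated.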

Secondly, the code itself diverged quite a bit. I fixed bugs on the production system as I ran into them. I even added some features there that I had already added on the development system (because I had either forgotten I had already written the code there … or because there were enough differences between the two that it made sense to just re-implement the feature on the production system). I also fixed bugs on the development system that may or may not have been fixed … or have been fixed independently … on the production system.

That pretty much sums up the challenge. I’m going to tackle the code merge first before tackling the data merge since it will also involve writing a bunch of code that I will want to be sure to include in the git repository and don’t want to have to merge that code later!

Code Merge using Git

2/20/22 11:06AM – I am starting out with a big picture view of the situation with a simple git status command.

Big picture view: development system on the left … production system on the right … with annoying .DS_Store files that somehow magically appear even when mounting the filesystem remotely over sshfs … thank you Mac OSX for cluttering up the filesystem with horribly named and capitalized files. Note that I have already added them to the .gitignore on the development machine (Mac – left), but haven’t added them to the production machine (Linux – right). This is one of the many inconsistencies I need to resolve … note that they both need committing!

Baby step

First, I thought I’d just tackle the .gitignore conflict since it would also resolve the “untracked files” on the production system. Seems simple enough. I have detailed the process with timestamps after outlining it first. I started the process on 2/22 8:55AM and I just now finished the last bits on 3/2 9:15AM.

  1. Commit my relatively sparse .gitignore as a standalone commit on both systems.
  2. Push the development system commit (which has more changes) to github. 2/22 8:59AM – surprise #1: the production system has a more recent commit than the development system. Going to resolve by git pull onto the development system, which is probably going to require a pretty big merge of all the bug fixes I have made over the years.
    1. Good news … only one file affected: Model/dataset.php has been modified on the development system, but it also has been modified and pushed on the production system.
    2. Ok, no significant changes on production system commit (had removed a blank line at the end of the file and also removed some already commented out code). Note that there are some MAJOR changes in Model/dataset.php on production that haven’t even been committed yet.
    3. Committed the changes on development system.
    4. Then pulled again from github and auto-merged the remote commit (which only listed Model/dataset.php as having a conflict).
    5. Attempted to push again, but it failed because I needed to pull again. Git pull led to an auto-merge, and then I had to go through git push / git pull / auto-merge / push twice, with my development system being 4 commits ahead of production. After the second git push, it says “up to date”. 2/22/22 9:15AM
  3. Pull on the production system, which should give a conflict that possibly can be merged without any work on my behalf. One potential surprise here is if I have committed from anywhere else (I actually have a bunch of different computers I’ve used as development over the years) that it might need to pull/merge multiple files instead of just the .gitignore.
    1. Interestingly, the conflict detected on production is Model/dataset.php and not .gitignore. Those are very important changes to support the new version of MySQL on production (but not development).
    2. After committing, I was able to git pull with no errors (auto-merged). Scary to do that on a production system, so I ran some basic tests afterwards. All passed.
      2/22/22 9:22AM
    3. Git push the changes and just out of curiosity, repull the changes on the development machine. 2/22/22 9:35AM
  4. Merge (hopefully automatically) the .gitignore changes and then recommit the merged .gitignore just to completely get it taken care of. Theoretically you could do this one at a time for all files that need merging. But I think it is helpful to resolve any surprise issues while only trying to merge a single file, rather than having to address a bunch of issues before either system is clean.
  5. Also, another advantage of just working with the .gitignore is that it should have NO IMPACT on the production system.
  6. Ready, set, go…after planning out everything above, I began the process at 2/22/22 8:55AM. Also apparently, today’s date has a lot of 2s in it. Too bad I didn’t start this at 2:22AM or PM.

Deciding to crawl

Originally, the plan was to do a bunch of the conflicting files at once, but after seeing how much it takes to resolve a single file, I’ve decided for at least the next few files to do them one at a time. I just finished up Config/util.php in about 15 minutes as follows:

  1. Production only had a small change, so I decided to commit and push it first. Then, knowing that I would receive an error if I tried to pull on development, I went straight to committing there, but mistakenly tried to push (I should have known that would error out given that it was the same file that I just pushed from production) …
  2. No problem, I simply went ahead and pulled and auto-merged, leading to a new auto-commit on the development machine. This meant I could go ahead and push right away since it was already auto-committed. No errors.
  3. I pulled on production to grab the new copy of util.php and it passed my ad hoc testing.
  4. Of note about the changes that I had made on production: I added a function to retrieve timezone info using the Google timezone API, and my API key is embedded in the function. This is not a problem if I use the same server for production and development, which is the current plan once I am able to migrate all the data from the development server. But that API key will not work on my development machine unless I add that machine to the list of allowed addresses, which is easy enough, but I just have to remember to do it.

Stepping it up slightly

I’ve decided to try doing an entire folder (Components) at once … It’s just two files … let’s see how it goes.

  1. The decision of which to commit first (development or production) is based on which is going to be easier to merge. Production only has a few small changes in both files according to the git diff command, so I’m going to start by commit/pushing those changes. 2/27 7:15pm
  2. No need to try to git pull on development. This will cause an error. So I’m going to go ahead and commit on development next … then pull to get the error and begin the merge process (possibly automatic). Definitely not automatically merged…

    2/27 7:24PM
  3. When this happens, Git automatically edits the file to highlight where the conflict is. The output is a little misleading … the messages above indicate that the RouteComponent file could not be auto-merged. Even though the “Automatic merge failed” message occurs immediately after the GlobalMapperComponent line, it refers to the entire operation, and there is no specific conflict message regarding the auto-merge of GlobalMapperComponent. I discovered this by grepping for “HEAD”, which only appeared in the RouteComponent file, similar to the example shown here (I forgot to take the screenshot) – https://stackoverflow.com/a/49591903
  4. As it turns out, I wanted the changes from both sides, so I manually edited the file, saved it, staged it with git add, committed, pushed, and then pulled on production. All “seems” good on production, as I did some impromptu testing of what I think are the affected features, and those worked fine. I did not run my usual tests as I’m going to do another couple folders quickly, so I just wanted to verify the updated code first.
    2/27 7:36PM
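
The search described in step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `find_conflicted_files` is hypothetical, the `*.php` pattern assumes the component files use that extension, and instead of searching for the word “HEAD” it looks for the `<<<<<<< ` marker that Git writes at the start of each conflicted region, which is less likely to match ordinary code.

```python
from pathlib import Path

# Git opens each conflicted region with a line such as "<<<<<<< HEAD".
CONFLICT_START = "<<<<<<< "


def find_conflicted_files(root, pattern="*.php"):
    """Return sorted relative paths under root that still contain a conflict marker."""
    root = Path(root)
    hits = []
    for path in root.rglob(pattern):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file is not valid UTF-8
        text = path.read_text(encoding="utf-8", errors="replace")
        if any(line.startswith(CONFLICT_START) for line in text.splitlines()):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

Run against the folder being merged, an empty list means every file was either auto-merged or has already been resolved by hand, regardless of where the “Automatic merge failed” line appeared in Git’s output.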

Next up: the model folder – only two changes on development, but LOTS of changes on production because I had to update nearly all the models to support MySQL 8.0 version of the GIS command ST_AsText instead of just AsText. My changes on development were quite a bit more substantial and hopefully on different lines. So I’m going to take the same approach … commit/push production … then deal with the conflict on development. Only one file couldn’t be auto-merged … easy fix. 2/27 7:47PM
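
For a sense of what the AsText to ST_AsText change looks like across many model files, here is a minimal sketch of the rename as a text substitution. It is an assumption-laden illustration, not the way the models were actually updated: the function name `upgrade_astext` is made up, the example table and column names are hypothetical, and the match is case-sensitive, so a lowercase `astext(` would be missed.

```python
import re

# Matches a call to the old function name. The \b keeps "ST_AsText(" from
# matching, because "_" and "A" are both word characters and there is no
# word boundary between them.
LEGACY_ASTEXT = re.compile(r"\bAsText\s*\(")


def upgrade_astext(source):
    """Return (new_source, count) with each AsText( rewritten as ST_AsText(."""
    return LEGACY_ASTEXT.subn("ST_AsText(", source)
```

Applied to a query string such as `SELECT AsText(geom) FROM routes`, this returns the rewritten query along with the number of replacements, and running it a second time changes nothing, since calls that already use ST_AsText are left alone.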

Now tackling the “View” subfolder starting with one of its subfolders “Elements” with changes ONLY on development. Easy commit/push followed by a pull/test on production. 2/27 7:50PM

Next up is the “View/Items” subfolder … entire process took less than five minutes and followed same strategy as “View/Elements” subfolder. Manual merge of menu changes on same line, but otherwise the auto-merge functionality worked perfectly. 3/1 8:25AM

I thought the merge of the GoogleMapV3Helper.php file would be relatively straightforward, but as it turns out I had made a bunch of changes on both sides. I had also added a feature independently on both sides that required support from several controller files, several view files, and several template files, which I needed to commit from development first before my route markers would display properly again on production. This greatly accelerated my jump to the end of the merge: I just went ahead and committed/pushed everything from production, merged it back into development, and then committed/pushed from development back into production.

One last hiccup was syncing all the config files (which probably shouldn’t be in github to begin with). Once I manually straightened all of that out with a series of git commit/push commit/pull/merge/pushes, I was able to obtain the following screenshot with my tests passing on production:
Note that I had to modify the config already on production to match my production setup … the dangers of including config files in your version management system. This should generally be avoided, but it’s how I currently have everything set up, so I’m just going to leave it that way for now.

Next steps: I am going to wipe out the development system I’ve been using and clone the working copy from production onto a more compatible development system so that the config changes between development and production are trivial.

As part of the creation of a new development setup, I am also going to tackle merging my development “pseudo-production” database with the actual production database as there is unique data on both that I consider “production” since I have been using the development system as my own personal “production” system for quite some time.

All finished: 3/2 9:15AM
