Engineering
Jun 17, 2022

Sharing the How, Why, and When of Database Engine Migration

Marketa Slukova
Senior Backend Engineer

Senior Backend Engineer Marketa Slukova recently led the Catalog team in migrating one of their databases to PostgreSQL. From hidden benefits to the value of deadlines, here’s what she learned.

I joined GetYourGuide as an Associate Backend Engineer in January 2020, and am based in the company’s Zurich office. My team’s main area of responsibility is the Catalog domain, which stores most of GetYourGuide’s activity and location data. Because this data sits right at the heart of our product, it’s crucial that the databases behind it run smoothly. For that reason, in 2021 we decided to migrate a central part of the Catalog domain – the management database – to the PostgreSQL database engine. Here’s how we did it, as well as some of the surprising benefits of making the change.

{{divider}}

The Challenge

The Catalog domain lies at the core of GetYourGuide’s product. It’s where we store all static activity data: things like text content, photos, and categorisation – everything except for pricing and availability. The data comes from suppliers creating products that can be edited by internal staff and eventually booked by customers on our website. This splits our domain into two main areas: supplier, or management; and customer, or delivery. While delivery is concerned with presenting data to customers, management handles supplier and internal traffic. We have separate databases and APIs for each.

Selecting a New Database Engine

The management database is fairly small, has heavy write traffic driven by a lot of asynchronous tasks, and contains a number of views supporting internal search functionality. It was running MySQL 5.7, a fairly outdated version that does not support features such as materialised views or more advanced JSON field handling. Due to certain infrastructural limitations, we were unable to upgrade to a newer version.
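To give a flavour of what we were missing on MySQL 5.7, here is a minimal sketch of a materialised view in PostgreSQL, using psycopg2 and purely illustrative table and column names rather than our actual schema:

```python
import psycopg2

conn = psycopg2.connect("dbname=management user=catalog")

# Create a precomputed search view once; the tables and joins are made up.
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS activity_search AS
        SELECT a.id, a.title, l.name AS location
        FROM activities a
        JOIN locations l ON l.id = a.location_id
    """)

# Re-run periodically (e.g. from an asynchronous task) to pick up new data.
with conn, conn.cursor() as cur:
    cur.execute("REFRESH MATERIALIZED VIEW activity_search")

conn.close()
```

Because the view is precomputed, internal search queries can read from it directly, while a background task decides how often it is refreshed.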

The schema of the management database changes often, and we were experiencing a lot of issues when running schema migrations with MySQL. One of the main differences between MySQL and PostgreSQL is how they handle schema changes: unlike MySQL, PostgreSQL runs DDL statements inside transactions, so a failed migration rolls back cleanly instead of leaving the schema half-applied.
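As a rough illustration of why that matters (again using psycopg2 and made-up table names, not our real migrations), a multi-statement schema change can be wrapped in a single transaction:

```python
import psycopg2

conn = psycopg2.connect("dbname=management user=catalog")
try:
    # One transaction for the whole migration: either every statement
    # applies, or none of them do.
    with conn, conn.cursor() as cur:
        cur.execute("ALTER TABLE activities ADD COLUMN duration_minutes integer")
        cur.execute("UPDATE activities SET duration_minutes = 60")
        cur.execute(
            "ALTER TABLE activities ALTER COLUMN duration_minutes SET NOT NULL"
        )
except psycopg2.Error:
    # On MySQL 5.7 each DDL statement commits implicitly, so a failure at this
    # point could leave the schema half-migrated; on PostgreSQL nothing is applied.
    raise
finally:
    conn.close()
```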

Most of our other databases – including the delivery database – were already using PostgreSQL with few issues. For these reasons we decided to migrate the management database to PostgreSQL.

Migrating a Database Engine: Takeaways, Learnings, and Process

Migrating between database engines is a process in which you recreate your existing database, with all its data, in a new engine and then point your service to the new database. The nature of your database and service defines the complexity of this project. For the Catalog team, the task was relatively straightforward thanks to the service’s capacity to handle downtime and a solid testing cluster infrastructure. Nevertheless, the experience offered valuable insights into how other organisations might handle their own database migrations, as well as the key considerations along the way.

1. Aim for Downtime

Migrating a database with downtime

The simplest scenario for a database engine migration is one where your service can accommodate downtime. Once your code changes are ready to deploy and the service is down, create the new database, copy the data over, and deploy the new service. Ideally you will have tested this whole flow beforehand in a testing cluster.
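A rough sketch of that flow might look like the following. The deployment names, connection strings, and choice of tools (kubectl for scaling the service, the pgloader CLI for the MySQL to PostgreSQL copy) are illustrative assumptions, not the exact scripts we used:

```python
import subprocess

def run(cmd):
    """Run a shell command and fail loudly if it fails."""
    print(">>", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Downtime starts: take the service offline.
run(["kubectl", "scale", "deployment/management-api", "--replicas=0"])

# Copy schema and data from the old MySQL database into PostgreSQL.
run(["pgloader", "mysql://catalog@old-host/management",
     "postgresql://catalog@new-host/management"])

# Deploy the PostgreSQL-aware version of the service and bring it back up.
run(["kubectl", "set", "image", "deployment/management-api",
     "api=registry.example.com/management-api:postgres"])
run(["kubectl", "scale", "deployment/management-api", "--replicas=3"])
```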

Migrating a database without downtime

If your service cannot accommodate downtime, you are dealing with a much more complex project. The added complexity comes from both databases having to be in exactly the same state when you deploy your code changes. This means that in addition to the initial data copy, you need to keep the data continuously synchronised across both databases right up to the point when your code changes are deployed.

In practice this introduces a large amount of overhead, code duplication, and complexity. On the flip side, this approach allows you to write to both databases prior to the switch – you can test your code while the old database is still kept up to date – which reduces the risk of data corruption and retains the option to roll back without data loss. However, the extra code and complexity you introduce to your service also comes with risks. Finally, this approach requires a significant amount of effort and time.
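The usual way to keep both databases in the same state is some form of dual writing. The sketch below shows the idea with a made-up repository class and illustrative SQL; a real implementation would also need to handle partial failures so the two writes cannot silently diverge:

```python
class ActivityRepository:
    """Writes go to the old MySQL database and, while the migration is in
    flight, are mirrored to the new PostgreSQL database."""

    def __init__(self, mysql_conn, postgres_conn, dual_write=True):
        self.mysql = mysql_conn
        self.postgres = postgres_conn
        self.dual_write = dual_write

    def update_title(self, activity_id, title):
        sql = "UPDATE activities SET title = %s WHERE id = %s"
        # The old database remains the source of truth until the switch.
        with self.mysql.cursor() as cur:
            cur.execute(sql, (title, activity_id))
        self.mysql.commit()
        if self.dual_write:
            # The shadow write keeps PostgreSQL in sync and testable before cutover.
            with self.postgres.cursor() as cur:
                cur.execute(sql, (title, activity_id))
            self.postgres.commit()
```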

Migrating a database without downtime on reads but downtime on writes

If you are not willing to make this investment, you might want to consider a partial downtime: letting reads go through, but stopping writes. This is especially easy with tools such as AWS Database Migration Service, which can create a copy of your database (with a different engine if needed) and then keep the two databases in sync with continuous replication. Once you stop writes from going through, you wait for the remaining data to be copied over, deploy your code changes, and enable writes again – all while serving read traffic.
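One way to make the “wait for the remaining data to be copied over” step concrete is to write a sentinel row as the very last write to the old database, then poll the new one until replication has delivered it. This sketch assumes writes are already disabled, continuous replication is still running, and uses a hypothetical migration_markers table:

```python
import time
import uuid

def wait_for_replication(mysql_conn, postgres_conn, timeout=600):
    """Block until the last write to the old database is visible in the new one."""
    marker = str(uuid.uuid4())

    # The final write to the old MySQL database before the cutover.
    with mysql_conn.cursor() as cur:
        cur.execute("INSERT INTO migration_markers (token) VALUES (%s)", (marker,))
    mysql_conn.commit()

    deadline = time.time() + timeout
    while time.time() < deadline:
        with postgres_conn.cursor() as cur:
            cur.execute("SELECT 1 FROM migration_markers WHERE token = %s", (marker,))
            if cur.fetchone():
                return  # replication has caught up; safe to deploy and re-enable writes
        time.sleep(5)
    raise TimeoutError("replication did not catch up within the downtime window")
```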

At GetYourGuide, our data is mostly edited during office hours, which gave us the flexibility to schedule a downtime window outside those hours without causing much disruption.

2. Weigh Risk vs Complexity

The biggest risk of a database migration is data loss or corruption. In theory you can completely mitigate this risk by making substantial code changes to your service. However, these changes also come with the possibility of introducing bugs. It’s important to weigh both the original risk and the risk introduced by the complexity of the solution that mitigates it.

In our case, we opted for a leaner approach that involved some risk of data corruption and loss. To mitigate it, we leveraged our testing cluster infrastructure and integration test coverage, and prepared a very detailed rollout plan with worst-case rollback instructions at each step. This worked out very well – we moved smoothly through the changes and switched our databases without any hiccups.

3. Consider the Codebase Benefits

One of the things that stood out to us during this project was how much the database engine matters for the quality of your codebase. We were able to significantly simplify the way we handle enum and JSON fields and, as a by-product, remove a lot of code duplication. In several cases we refactored old code into a much more readable, cleaner version while also making it work with the new database engine. When choosing a database engine, don’t simply look at its features; think about how its syntax and principles will fit your service – be it the nature of your service, its programming language, or the Object-Relational Mapping (ORM) tool you are using.
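As a small, hedged example of the kind of simplification we mean (the model and field names are illustrative, not our actual code), PostgreSQL’s native enum and JSONB types map cleanly onto an ORM such as SQLAlchemy:

```python
import enum

from sqlalchemy import Column, Integer, String
from sqlalchemy.dialects.postgresql import ENUM, JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class ActivityStatus(enum.Enum):
    DRAFT = "draft"
    LIVE = "live"

class Activity(Base):
    __tablename__ = "activities"

    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    # A real PostgreSQL enum type instead of bare strings plus validation code.
    status = Column(ENUM(ActivityStatus, name="activity_status"), nullable=False)
    # JSONB can be queried and indexed directly in the database, which removes
    # a lot of serialise/deserialise plumbing from the application layer.
    attributes = Column(JSONB, default=dict)
```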

4. Prioritise Migrations for Added Benefits

In engineering, the tools we use and interact with define a lot of our day-to-day work. It’s extremely valuable to reflect on how well these tools fit and to change them once they are no longer suitable. Repetitive, meaningless tasks, frustration at having to use the tools, and constantly having to find workarounds for their limitations are all signs that you might want to prioritise upgrading your technologies. Not only will it remove pain points and increase efficiency; your existing setup may also be making you miss out on product innovations that better tools enable. In our case we were able to leverage the PostgreSQL extension PostGIS to drastically improve our geospatial data handling and offer a new customer-facing API built on this data.
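To illustrate the kind of query PostGIS unlocks (the table, column names, and coordinates here are made up for the example), finding activities within a radius of a point becomes a single condition:

```python
import psycopg2

conn = psycopg2.connect("dbname=management user=catalog")
with conn.cursor() as cur:
    cur.execute("""
        SELECT id, title
        FROM activities
        WHERE ST_DWithin(
            location,                           -- geography(Point, 4326) column
            ST_MakePoint(%s, %s)::geography,    -- longitude, latitude
            %s                                  -- radius in metres
        )
    """, (8.5417, 47.3769, 5000))  # activities within 5 km of Zurich
    print(cur.fetchall())
conn.close()
```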

5. Set Meaningful Deadlines

Most of the work the Catalog team does revolves around contributing to the core GetYourGuide products, so we are rarely faced with launches or strict deadlines for our projects. The PostgreSQL migration was different: it involved a complex launch and, because we had to schedule a downtime window weeks ahead with both internal and external stakeholders, a strict deadline. This cutoff gave us a clear definition of when the project was done, pushed us to collaborate as a team to resolve critical issues before the launch, and gave us a reason to celebrate when it was finished. The experience taught us the value of meaningful deadlines, and the team has since used them in a number of internal projects.

Conclusion

Migrating a database engine creates unique technical and organisational challenges and, as such, can be a great learning opportunity. For the Catalog team at GetYourGuide, the key takeaways included the importance of prioritising technical migrations, the impact our choice of database engine had on the wider service, and the value of meaningful deadlines. Finally, it underlined the strength and success that come from great team collaboration.
