Engineering
Nov 15, 2021

Testing Legacy Backend Code in PHP Using an In-Memory Database

Constantin Șerban-Rădoi
Senior Backend Engineer

Constantin Șerban-Rădoi, Senior Backend Engineer in the FinTech team, explains how to effectively use an in-memory database to enable testing legacy backend code in PHP.

Problem - Legacy code is hard(er) to test

Testing a greenfield project is generally considered much simpler than testing legacy code that has been around for decades and has passed through the hands of dozens, if not hundreds, of engineers. A greenfield project often starts out well, with close to 100% test coverage. It doesn’t take long, though, until an engineer cuts corners and skips the test for a new class for lack of time. Soon the code becomes more and more difficult to test as new dependencies are added and “short-term hacks” are forgotten, with no concrete plans for improvement.

Testing such code “the clean way” generally becomes quite difficult without major refactoring or even rewriting. When an engineer encounters such an “untestable” component, they might become complacent and say “let’s add this new logic and skip the testing”. However, this not only increases the chances of breaking the code in the future without noticing, but also makes it even harder for the next engineer to extend or change that component with confidence.

With some effort and the right tools, though, new tests can be written for legacy code. This not only gives the engineer confidence that the component they changed works as expected, but also encourages the next people who change the code to add more tests. Sooner rather than later, code coverage improves and major refactorings no longer seem impossible, because the functionality is covered by tests and there is at least a baseline to start from.

In the next section we will cover possible solutions for testing legacy code and/or making the code more testable in general.

Possible strategies to test legacy code

There are a few possible approaches to testing legacy code, each with its own advantages and disadvantages. Let’s look at each one and compare them.

[Figure: Approaches to legacy code]

Depending on the use case, one approach or another may work better. In general, it is advisable to add tests together with newly introduced features so that regressions can be caught. If a code base has no tests at all and sits on a critical business path, it may be useful to start top-down with an integration test to provide at least some level of confidence.

Then, little by little, the smaller, more independent components become the next candidates for unit testing. Finally, the more complex parts of the system could benefit from being refactored or split into smaller, easier-to-test pieces. Remember, there is no silver bullet and no one-size-fits-all solution. Everything is a trade-off between the effort invested and the added confidence of changing well-tested code.

Deep dive - Using an in-memory database and static mocks to the rescue

Let’s now look at a use case we had for testing hard dependencies (calls to static methods from within the chain of methods under test). A simplified version of such a method, written in PHP and used in many places throughout the code base, looks as follows.

class Booking
{
    // Returns a fully loaded Booking; the database is a hard (static) dependency.
    public static function getById(string $id): self
    {
        $booking = new Booking(DatabasePool::instance());
        $booking->loadById($id); // loads the object using the db instance obtained above
        // ... Do other operations (e.g. caching the result)

        return $booking;
    }
}

One can argue that getById should not be a static method on the model class in the first place, but rather a non-static method of a BookingFactory class that receives the database in its constructor. However, this would require a risky refactoring of almost the entire code base, since getById is used all over the place and dependency injection was not available when the code base was initially created.
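For comparison, the factory-based alternative described above might look roughly like the following sketch. The `fetchRow` method on `iDatabase`, the `ArrayDatabase` stand-in, and the bodies of `loadById` and `getTourId` are assumptions made for illustration only, since the real interface and model code are not shown in this article. The sketch uses constructor promotion, so it assumes PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal database interface; the real iDatabase is not shown in the article.
interface iDatabase
{
    public function fetchRow(string $table, string $id): ?array;
}

// Hypothetical array-backed stand-in, used only to make this sketch runnable.
final class ArrayDatabase implements iDatabase
{
    /** @param array<string, array<string, array<string, mixed>>> $tables */
    public function __construct(private array $tables) {}

    public function fetchRow(string $table, string $id): ?array
    {
        return $this->tables[$table][$id] ?? null;
    }
}

class Booking
{
    private ?array $row = null;

    public function __construct(private iDatabase $db) {}

    // Assumed behaviour: read the booking row through the injected database.
    public function loadById(string $id): void
    {
        $this->row = $this->db->fetchRow('booking', $id);
    }

    public function getTourId(): ?string
    {
        return $this->row['tour_id'] ?? null;
    }
}

// The database is injected once; no static call to DatabasePool::instance() is needed.
final class BookingFactory
{
    public function __construct(private iDatabase $db) {}

    public function getById(string $id): Booking
    {
        $booking = new Booking($this->db);
        $booking->loadById($id);

        return $booking;
    }
}

$factory = new BookingFactory(
    new ArrayDatabase(['booking' => ['b1' => ['tour_id' => 't1']]])
);
$booking = $factory->getById('b1');
echo $booking->getTourId(), PHP_EOL;
```

With this shape a test can hand the factory any `iDatabase` implementation directly, which is exactly what the static `getById` prevents.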

A simpler step we can take to test such a method is to mock the static call to DatabasePool::instance and set up the mock the way we want before executing the method under test. This can be done with a simple call such as

$db = Mockery::mock(sprintf('alias:%s', DatabasePool::class));

Then the $db mock can be set up to return whatever data is required to test our getById method.

While this approach works, it runs into a big problem: it doesn’t scale well when many such hard dependencies are involved. Additionally, if one wants to test the actual SQL queries inside the various models’ methods (e.g. specific filters), that becomes even more difficult with a mocked database instance. To help engineers in these scenarios and keep the necessary refactoring to a minimum, we turned to an in-memory database. In code, we defined a wrapper class as

class InMemoryDatabase implements iDatabase

This class implements the same interface as the object returned by DatabasePool::instance(). Underneath, we opted for a simple SQLite database which is cleared out after every test execution. Now, instead of a mock, we have an actual database object which we can use to set up the data our test depends on.
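A minimal sketch of what such a wrapper could look like is shown below. The article does not list the methods of `iDatabase`, so `execute`, `fetchAll`, and `reset`, as well as the schema passed to the constructor, are hypothetical names chosen for this example. It assumes PHP 8.0 or later with the `pdo_sqlite` extension enabled.

```php
<?php
declare(strict_types=1);

// Hypothetical interface: the real iDatabase methods are not listed in the article.
interface iDatabase
{
    public function execute(string $sql, array $params = []): int;
    public function fetchAll(string $sql, array $params = []): array;
}

// Assumed wrapper around an in-memory SQLite connection.
final class InMemoryDatabase implements iDatabase
{
    private PDO $pdo;

    /** @param string[] $schema CREATE TABLE statements mirroring the real schema */
    public function __construct(private array $schema)
    {
        $this->reset();
    }

    // Opens a brand-new in-memory database, discarding all previous data.
    public function reset(): void
    {
        $this->pdo = new PDO('sqlite::memory:');
        $this->pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        foreach ($this->schema as $statement) {
            $this->pdo->exec($statement);
        }
    }

    // Runs a write statement and returns the id of the last inserted row.
    public function execute(string $sql, array $params = []): int
    {
        $this->pdo->prepare($sql)->execute($params);

        return (int) $this->pdo->lastInsertId();
    }

    // Runs a query and returns all rows as associative arrays.
    public function fetchAll(string $sql, array $params = []): array
    {
        $statement = $this->pdo->prepare($sql);
        $statement->execute($params);

        return $statement->fetchAll(PDO::FETCH_ASSOC);
    }
}

$db = new InMemoryDatabase([
    'CREATE TABLE tour (id INTEGER PRIMARY KEY, name TEXT NOT NULL)',
]);
$tourId = $db->execute('INSERT INTO tour (name) VALUES (?)', ['MY-TOUR']);
$rows = $db->fetchAll('SELECT name FROM tour WHERE id = ?', [$tourId]);

$db->reset(); // what a test tearDown() could call to clear the data
$rowsAfterReset = $db->fetchAll('SELECT name FROM tour');
```

Recreating the connection in `reset()` is one simple way to get the "cleared out after every test" behaviour, because an in-memory SQLite database disappears when its connection is closed. Truncating tables would be an alternative.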

This approach is top-down (third strategy in the previous section) in that it aims to test the major functional blocks without needing to change the existing code implementation. For example, a specific method on the Booking class might depend on another entity being present in the database (e.g. a Tour) which is obtained in a similar fashion, using a static method Tour::getById($this->tourId).

Now it becomes clear that we cannot just rely on a mocked database object, because it would require too much set-up. However, with the in-memory database it is as simple as creating real models and populating the in-memory database with those during the test set-up.

Then, when we execute the actual method on the Booking object which we wanted to test, the Booking object will fetch the already inserted Tour from the in-memory database and then we’ll be able to make our assertions as usual for the result of that method.

A simplified test of the scenario presented above might look as follows

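use PHPUnit\Framework\TestCase;

class BookingTest extends TestCase
{
    protected function tearDown(): void
    {
        Mockery::close(); // verify expectations and reset Mockery between tests
        parent::tearDown();
    }

    /**
     * Alias mocks replace the class globally, so Mockery recommends a separate process.
     *
     * @runInSeparateProcess
     * @preserveGlobalState disabled
     */
    public function testGetTourName(): void
    {
        // Route every static DatabasePool::instance() call to the in-memory database.
        $inMemoryDb = new InMemoryDatabase();
        $mockDatabasePool = Mockery::mock(sprintf('alias:%s', DatabasePool::class));
        $mockDatabasePool->shouldReceive('instance')->andReturn($inMemoryDb);

        // Set up the data the method under test depends on.
        $tour = new Tour($inMemoryDb);
        $tour->setName('MY-TOUR');
        $tourId = $tour->insert();

        $booking = new Booking($inMemoryDb);
        $booking->setTourId($tourId);
        $bookingId = $booking->insert();

        // Load a fresh Booking so that the values come from the database.
        $realBooking = new Booking($inMemoryDb);
        $realBooking->loadById($bookingId);

        // Asserts that the tour name from the tour is the same as
        // that returned by the booking getTourName method.
        $this->assertEquals('MY-TOUR', $realBooking->getTourName());
    }
}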

One can easily imagine that a model’s method might depend on several other models, so with the in-memory database approach it becomes relatively easy to add a test scenario without doing too much refactoring on the entire code base.

Conclusion

To summarize, the approach presented here should only be used with great consideration and only when a deeper refactoring is not feasible. In-memory databases add their own complexity to the testing set-up. The tests become slower to execute. The in-memory database schemas need to be kept in sync with the real database schemas. Such tests can still be relatively difficult to reason about if too many tables are involved, in which case it is recommended to consider refactoring and better separating the concerns in the code.

The main idea of this blog post is that having tests is always better than having no tests at all and that it doesn’t need to be a greenfield project to start introducing tests.

Glossary

  1. Integration tests - In the context of this blog post they represent tests that validate the interactions of two or more components within a system. For example, an integration test might set up a Booking object and the associated entry in the database and then perform a cancellation on that booking, verifying that at the end the Booking and other associated database entities are in a specific state.
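As a concrete illustration of the cancellation example in the glossary entry above, the following sketch runs such a check against an in-memory SQLite database. The table layout, the `free_seats` column, and the `cancelBooking` function are all invented for this example and do not reflect the real code base. It assumes the `pdo_sqlite` extension is available.

```php
<?php
declare(strict_types=1);

// Hypothetical schema and cancellation logic, invented purely for illustration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE tour (id INTEGER PRIMARY KEY, free_seats INTEGER NOT NULL)');
$pdo->exec(
    'CREATE TABLE booking (id INTEGER PRIMARY KEY, tour_id INTEGER NOT NULL, status TEXT NOT NULL)'
);

// Component under test: updates two tables inside one transaction.
function cancelBooking(PDO $pdo, int $bookingId): void
{
    $pdo->beginTransaction();
    $pdo->prepare("UPDATE booking SET status = 'cancelled' WHERE id = ?")
        ->execute([$bookingId]);
    $pdo->prepare(
        'UPDATE tour SET free_seats = free_seats + 1
         WHERE id = (SELECT tour_id FROM booking WHERE id = ?)'
    )->execute([$bookingId]);
    $pdo->commit();
}

// Arrange: one tour with four free seats and one confirmed booking on it.
$pdo->exec('INSERT INTO tour (id, free_seats) VALUES (1, 4)');
$pdo->exec("INSERT INTO booking (id, tour_id, status) VALUES (10, 1, 'confirmed')");

// Act
cancelBooking($pdo, 10);

// Assert: both the booking and the associated tour ended up in the expected state.
$status = $pdo->query('SELECT status FROM booking WHERE id = 10')->fetchColumn();
$freeSeats = (int) $pdo->query('SELECT free_seats FROM tour WHERE id = 1')->fetchColumn();
```

The point of the sketch is that the assertions look at the state of more than one entity after a single operation, which is what distinguishes this kind of test from a unit test of one class.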
