Implementing Looker at GetYourGuide.
In today’s post, Zoran Stipanicev, Senior Data Engineer, presents a high-level overview of our Looker implementation process. It could be useful to anyone evaluating Looker, as well as to those who have recently started migrating to it.
At GetYourGuide we like to say that we are a data-driven company. But what does this mean for us, and how does it show in our day-to-day work? It means that people are expected to make decisions based on data: they use reports to prioritize work and dig into data to find opportunities to improve business performance (profit, customer satisfaction, etc.). That, in turn, means we need a tool that supports both our reporting needs and the data exploration needs of our users.
Prior to implementing Looker, we used an open-source ROLAP (Relational Online Analytical Processing) cube for two years before we started to feel the limitations of that kind of solution. The number of users needing data on a daily basis kept increasing, and data volume started to grow at a faster rate. This led to decreasing query performance and an increase in time-outs, after which users would ask our Data Platform team to pull the data with ad-hoc queries. We also planned to start using one of the big data platforms and needed a reporting solution that could support it effectively. To address these issues, we started looking into available commercial products.
We chose Looker at the beginning of July 2016, after evaluating it and comparing it to several other tools. To begin, we set aside time every week to get to know Looker’s features in more detail and held regular sessions with our Looker analyst. These sessions were really useful: they gave us a chance to get answers to our technical questions, along with suggestions on which features to use and how to use them. Of course, we prepared in advance and had questions ready to make the most of each session. After a month of trying out features, we decided to write coding standards for Looker (I would highly recommend that you do that, too). They include basic guidelines on how to name various objects, how to use Looker features, and which feature to choose when there are multiple options.
After the second month of learning Looker, we started migrating our ROLAP cubes to Looker. This is where the speed of modelling with LookML really showed: the whole migration was done by a very small team in two months (alongside regular daily tasks). Looker allowed us to remove aggregate tables, which let us give our users access to data at the lowest granularity. With ROLAP cubes, our users couldn’t get the data for a single booking, for instance. By migrating to Looker, we improved our users’ experience of exploring the data, as they no longer needed to wait for us to fetch booking-level data. Another plus of ditching aggregate tables was having all of the measures and dimensions available to users, which meant we no longer had to choose what to add to aggregate tables.
An additional benefit of the speed of modelling with LookML became apparent after the initial migration was done. Our model was not perfect, and a lot of things had to be improved. These imperfections were a result of that speed: we hadn’t taken the time to step back and think about how our users would access or use the data. The process of improvement was ad-hoc. For example, a user would ask how to pull some data and, if the model had to be improved, in most cases it was done within a few minutes (usually less than five). This would be impossible with aggregate tables. The process is even faster now thanks to improvements to the LookML IDE. There were cases that required us to dig a bit deeper or think harder about some Looker features; we will cover those learnings in a future post.
What makes us so fast with LookML?
LookML is a query modelling language. This means you are not building a data model in Looker but defining how a query will be built. If you have good SQL skills (LookML can be viewed as a layer on top of SQL), modelling in Looker shouldn’t be a problem. The basic building block in Looker is a View, which can be based directly on a table or on a query. Views are joined together in an Explore to generate a query. If you are unsure how to build an Explore in Looker, which Views to use, or how to join them, we suggest writing out the query you want Looker to generate. Having a clear idea of the end result will help you see whether you have all the elements needed to build your Explore and identify any missing ones. Afterwards, to check that your model is correct, you can build a Look (a query/report) and inspect the generated SQL.
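To make this concrete, here is a minimal, hypothetical sketch of two Views joined in an Explore. All table and field names (analytics.bookings, analytics.customers, etc.) are illustrative, not part of our actual model:

```lookml
# A View based directly on a table.
view: bookings {
  sql_table_name: analytics.bookings ;;

  dimension: booking_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.booking_id ;;
  }

  dimension: customer_id {
    type: number
    sql: ${TABLE}.customer_id ;;
  }

  measure: total_revenue {
    type: sum
    sql: ${TABLE}.revenue ;;
  }
}

view: customers {
  sql_table_name: analytics.customers ;;

  dimension: customer_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.customer_id ;;
  }

  dimension: country {
    type: string
    sql: ${TABLE}.country ;;
  }
}

# The Explore defines how the Views join, so users never write joins.
explore: bookings {
  join: customers {
    type: left_outer
    sql_on: ${bookings.customer_id} = ${customers.customer_id} ;;
    relationship: many_to_one
  }
}
```

When a user selects, say, customers.country and bookings.total_revenue in this Explore, Looker generates a single SQL query with the join above, grouped by country.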
As we previously mentioned, an important element in verifying the models you define is seeing the generated queries, which you can easily do using the SQL tab in the Explore UI (the UI used to explore data or build reports, i.e. Looks, in Looker). We have enabled it for all our users, as it can help more advanced users learn SQL by checking what Looker generates. You can also reuse code with the help of extensions: extend Views to reuse dimension and measure definitions, or extend Explores to reuse relations and joins.
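The extension mechanism can be sketched as follows; again, the View names here are hypothetical, used only to show the pattern:

```lookml
# A base View holding shared definitions. "extension: required" means
# it cannot be queried directly, only extended.
view: bookings_base {
  extension: required

  dimension: booking_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.booking_id ;;
  }

  measure: booking_count {
    type: count
  }
}

# A child View inherits all fields from the base and adds its own.
view: cancelled_bookings {
  extends: [bookings_base]
  sql_table_name: analytics.cancelled_bookings ;;

  dimension_group: cancelled {
    type: time
    timeframes: [date, week, month]
    sql: ${TABLE}.cancelled_at ;;
  }
}
```

Explores support the same extends: [...] parameter, which lets you reuse an existing set of joins instead of redefining them.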
What can you do to make the implementation run more smoothly?
User education is a very important part of a successful migration to a new tool. You should educate your users even if they have used something similar before. Our initial user education consisted of one basic session and three advanced sessions. The purpose of the basic session was to introduce users to Looker and show its basic features. The advanced sessions each focused on one or more advanced features which we thought would enable our users to get better insights from the data. We kept the sessions short to make them easier to fit into our users’ schedules and to keep the focus. We also recorded the sessions to enable faster onboarding of new users. We recently started testing more hands-on sessions, which received positive feedback, so it’s definitely something we will continue doing.
Since we are constantly improving our model, we created a monthly newsletter to keep everyone up to date. The idea behind the newsletter is to present changes to the model and help our users make the most of them, by providing example reports and clarifying the terminology used, to reduce guesswork.
The effect of the newsletter has been better than we initially expected: our active user count grew by more than 10% week over week. An additional benefit of the newsletter is that we review changes to the Looker model at least once a month. And for me personally, it’s when I put on the “hat” of an end user and think about how they will access new data, which sometimes results in last-minute improvements to the model.
What we like about Looker.
On the modelling side, we like that Looker is explicit: it won’t try to join the data on its own. You must define all joins explicitly in the model, so users don’t have to think about joining the data and are free to focus on exploring or consuming it. For end users, there are no user-type limitations: all users can explore the data and create and consume prepared reports.
What could be better?
While Looker supports code versioning, not everything is versioned. Looks and dashboards that are not defined in LookML are not versioned, which can cause problems if someone (accidentally) changes an important report.
Like the sound of how we work? Come join our Engineering team.