In Part 1 of this article we explored the basics of collecting user data. In the end, we also came up with a scalable process:
Of course, there are downsides to this approach. If any team can track any number of events, there is no single source of truth regarding which event types are tracked and the circumstances in which they are triggered. This approach also creates confusion when onboarding new Product Managers and Data Analysts, since they need at least a bird's-eye view of the whole system before they can start answering business-related questions. Defining our own event types on a team-by-team basis, without any visibility to the outside, also causes teams to become too self-contained. The analytical knowledge exists in a silo instead of being distributed across teams.
To analyse the data, one has to ask the data warehouse a question in a particular manner. Doing so is called making a query (Fig. 1) - essentially a couple of lines of code. Depending on the complexity of the question, writing a query can take a couple of hours. In our situation, the independent teams will write their queries as they go, without knowing whether a similar query has already been written by someone else. This is not very efficient. So, how do we deal with the consequences?
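To make "a couple of lines of code" concrete, here is a minimal sketch of such a query. It uses Python's built-in sqlite3 as a stand-in for a real data warehouse; the `events` table, its columns, and the event names are illustrative assumptions, not our actual schema.

```python
import sqlite3

# Stand-in for a data warehouse: an in-memory SQLite database.
# The `events` table and its columns are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_type TEXT, user_id TEXT, occurred_at TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("signup_completed", "u1", "2021-03-01"),
        ("signup_completed", "u2", "2021-03-02"),
        ("checkout_started", "u1", "2021-03-02"),
    ],
)

# The business question, phrased as a query:
# how many distinct users triggered each event type?
query = """
    SELECT event_type, COUNT(DISTINCT user_id) AS users
    FROM events
    GROUP BY event_type
    ORDER BY event_type
"""
for event_type, users in conn.execute(query):
    print(event_type, users)
```

The query itself is only a few lines; the hard part, as described above, is knowing whether a colleague has already written an equivalent one.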
{{Divider}}
On the surface it may appear we've moved away from our initial quest - enabling everyone to come up with an insight. In reality, we are only two small steps away from our goal. What we need now is a single source of truth that allows the whole organization to speak the same language, and a shared, searchable repository of all queries that is easily accessible and open to contribution from everyone.
To address the latter, we have already found a great solution: the Databricks service. We keep all our queries there, shared and searchable, with an overview of the table formats and column names. To address the former, we have started an online Data Guide, written by the best technical writer we could find, so that the whole company shares the same vocabulary when writing queries.
Nevertheless, some questions still lingered:
The problem is that the answers to these questions will change faster than we can update our Data Guide. The manual solution does not scale, and, frankly, it would be almost unethical to employ anyone for such a Sisyphean task.
To avoid this problem, we instead implemented a system that generates documentation automatically. One tool creates a list of all the event types tracked in the last two weeks, along with all of the attributes they contain. Another tool matches the definitions with those attributes and outputs the result to a file, one per event type. Ultimately these files end up as pages in our Data Guide (Fig. 2).
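The generation pipeline described above can be sketched in a few lines. Everything here is hypothetical: the event names, attribute names, definitions, and page format are placeholders, and in the real pipeline the recent events come from the warehouse rather than a hardcoded list.

```python
from collections import defaultdict

# Hypothetical raw events observed over the last two weeks; in the real
# pipeline these would be pulled from the data warehouse.
recent_events = [
    {"type": "signup_completed", "plan": "free", "referrer": "ads"},
    {"type": "signup_completed", "plan": "pro"},
    {"type": "checkout_started", "cart_value": 42},
]

# Hand-written definitions that the generator matches against what was
# actually tracked (names and wording are illustrative).
definitions = {
    "signup_completed": "Fired when a user finishes registration.",
    "checkout_started": "Fired when a user opens the checkout page.",
}

# Step 1: collect every attribute seen per event type.
attributes = defaultdict(set)
for event in recent_events:
    attributes[event["type"]].update(k for k in event if k != "type")

# Step 2: render one documentation page per event type.
def render_page(event_type):
    attrs = ", ".join(sorted(attributes[event_type]))
    desc = definitions.get(event_type, "No definition yet.")
    return f"# {event_type}\n\n{desc}\n\nAttributes: {attrs}\n"

pages = {event_type: render_page(event_type) for event_type in attributes}
print(pages["checkout_started"])
```

In practice each rendered page would be written to its own file and published into the Data Guide, so the documentation refreshes itself as tracking changes.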
But why write when you can show? The possibilities are endless, especially if you employ a screenshot tool that snaps a gif of the website whenever it detects an event being fired. That way, all of the event triggers become clear even to those with little technical knowledge, enabling them to contribute to the generation of insights and analysis.
One of our ultimate goals is to enable our company to scale 10x as efficiently as possible. There is no way to reach this goal without perspectives from as many diverse viewpoints as we can gather. With this data analytics setup, we believe we are enabling everyone in the company today to provide us with an insight that might change the face of the industry tomorrow.
{{Divider}}