Engineering
Aug 16, 2021

How We Got Google Tag Manager Data Flowing Through Our Data Pipelines

Ben Ryves
Staff Software Engineer

Ben Ryves, senior data engineer on the Marketing Platform team, explains how his team implemented a solution to get Google Tag Manager data into their existing monitoring systems. The challenge? They had to do it without disrupting the team's existing setup, and without compromising or altering the critical path of existing tags.

Monitoring and observability are at the heart of all the engineering work we do at GetYourGuide. We have a saying: if we can't see it, it didn't happen. Effective monitoring and dashboards reduce our mean time to resolution (MTTR) and give us confidence that our services are running, even when we're not looking.

You may also be interested in: How we built our new modern ETL pipeline


If something breaks, our logging reports it to Sentry. Sentry tells Datadog, Datadog tells PagerDuty, and PagerDuty tells the person on call. If there's a blip, a problem can even fix itself: our systems can self-repair, restarting pods on Kubernetes, observing the results, and squashing firing alerts before they even need to reach a human operator.

This pillar of our infrastructure allows us to move faster, without the fear of breaking things. But until recently, one key part of our infrastructure was still stuck in the old ways — ticking along silently, without any automated systems, alerting, or observability. I’m talking about our integration with Google Tag Manager (GTM), a tool we use to make it easy for non-developers to add instrumentation to our customer site.

You may also be interested in: Inside our recommender system: Data pipeline execution and monitoring

We’ve been using GTM since the company started, before we had Datadog or Kubernetes. The tags there report critical business metrics:

  • tracking conversions and user journeys through the site
  • providing crucial information to Google’s bidding algorithms
  • giving us real time insights into the performance of the business

GTM tags are critical, but until recently they weren't monitored by default. Data flowed directly from the browser to the consumer, skipping our systems and our monitoring. If a tag broke, no one heard about it until someone noticed. And if we can't see it, it didn't happen.

In this article, I'll lay out how we implemented an end-to-end solution to get tag data into our systems without compromising or altering the critical path of existing tags.

What is GTM?

If you haven’t worked with GTM before, it’s best understood as a platform which allows users to add functionality to a site in a sandboxed environment. Tags run in a container separated from site logic, executing when different events are observed such as a page loading, or a user booking a product. Tag code is decoupled from the rest of the site, and technical knowledge isn’t required to add new tags.

When implementing a monitoring solution, we wanted to work within these constraints as much as possible — maintaining the separation between tag code and site code, and ensuring non-technical users would be able to add monitoring to tags in a frictionless, familiar manner.

Furthermore, we wanted to make tag data available in our data warehouse running on Databricks, so that users could incorporate data reported by tags in their data views, while also reducing our dependence on third parties where possible. Finally, we also wanted to make tag data available to our data pipelines, allowing data engineers to build enriched data flows and dashboards from streamed tag data, and allowing us to forward tag information to our data monitoring infrastructure.

In short, our requirements were:

  • Real-time visibility of tags
  • Enabling non-technical users to add monitoring to tags
  • Making tag data available to both end users (analytics) and developers

All that with zero disruption to our existing setup, while being able to scale to millions of users and tens of thousands of firing tags per second.

You may also be interested in: Laying the foundation of our open source ML platform with a modern CI/CD pipeline

Based on the above needs, the solution is split into two parts: how we fire monitoring information concurrently with a tag, and how we consume that information and make it available to end users. With these constraints in mind, we arrived at the architecture shown in the diagram below.

[Figure: GTM architecture]

From GTM, we dispatch data to one or more aggregating servers. The aggregator processes the message, forwarding the results to our internal Kafka broker. Finally, downstream apps postprocess the data on Kafka, deriving new data sources, forwarding information to Datadog, and storing tag data on S3 for consumption by Spark. The only state in the system is Kafka, meaning we can scale the aggregator linearly, adapting as our traffic expands and contracts.
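To make the flow concrete, the sketch below (in Scala) shows the kind of record a tag event might become once it reaches Kafka. The field names are illustrative only; they are not our actual schema.

```scala
// Illustrative only: a hypothetical shape for a tag event record on the Kafka topic.
// Our real schema differs; this just shows the kind of information a record carries.
final case class TagEvent(
  tagName: String,     // which GTM tag fired
  owningTeam: String,  // the team responsible for the tag
  firedAtMillis: Long, // client-side timestamp of the firing
  payload: String      // the URL-decoded JSON payload sent by the monitoring tag
)
```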

Firing monitoring tags

The main use case of GTM is adding pixels: small code snippets that dispatch fire-and-forget GET requests to an endpoint. Beyond a number of prebuilt tags, Google also supports creating your own tag blueprints through its custom template feature.

Based on this, we decided to implement the sending side of our solution as a programmable tag template which sends data to a server in our infrastructure. When researching and implementing this side of the solution, there were a few articles that were extremely helpful — namely Simo Ahava’s article on custom templates, his article on monitoring Google Tag Manager, and Jaroslaw Kijanowski’s article on sending data from GTM to Kafka.

Our approach differs slightly from the above articles in that our monitoring logic is decoupled from our non-monitoring tags. To do this, we implemented a programmable custom template that users can configure to forward information to our servers. On creating the tag, a user specifies what information from the data layer they want to monitor, and what endpoint the data should be sent to. This monitoring tag is then configured to fire after the tag it's monitoring, rather than as a callback.

The advantage of this approach is that no modification needs to be made to an existing tag to add monitoring, and any existing tag can be monitored regardless of implementation. Additionally, monitoring tags can send more information about the context of the tag they're monitoring, such as the owning team. Since our tag template is configurable and can send information just like a regular tag, we can also use it to test tags before we deploy them, by pointing it at our test infrastructure during development.

[Figure: Kafka tag]

Before continuing, it's worth noting that there are a few downsides to this approach. Having a separate tag adds a maintenance burden and means that monitoring tags have to be created by the owning team. Additionally, there is some performance impact resulting from adding an extra tag to each page, and using GET requests means that we have to send the JSON payload as a URL encoded string. We haven't run into an issue with the latter yet, since the payloads we send should be small to minimize the performance cost, but it does add some complexity to the server, which has to decode and extract the payload.
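To illustrate that constraint, here is a rough sketch in Scala of how a JSON payload ends up as a URL-encoded query parameter on a GET pixel. The endpoint, parameter name, and payload fields are all hypothetical; only the encoding step reflects what actually happens.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

object PixelRequestSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical monitoring payload; the real fields are configured per tag in GTM.
    val payloadJson =
      """{"tag":"booking-conversion","team":"marketing-platform","conversion":{"value":42.5}}"""

    // GET pixels carry the payload as a single URL-encoded query parameter.
    val encoded = URLEncoder.encode(payloadJson, StandardCharsets.UTF_8)

    // The host and the "payload" parameter name are placeholders, not our real endpoint.
    println(s"https://collect.example.com/v1/tags?payload=$encoded")
  }
}
```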

Catching the tags on the backend

Compared to sending tag information, receiving it is simple. We implemented a small stateless Akka server on our Kubernetes infrastructure with a couple of versioned endpoints. The server does the minimum work possible: extracting the payload from the request, decoding it, and forwarding it to our internal Kafka broker for downstream processing.

It has no state, meaning it's easy to scale horizontally, and it's simple enough to run on a developer's machine without much work. A single server can handle thousands of requests a second with minimal processing time, and we have autoscaling in place in case a spike in traffic threatens to overwhelm it.
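As a rough illustration of this shape of server (a sketch, not our actual code), a minimal Akka HTTP route that pulls the payload off the request, decodes it, and hands it to a Kafka producer could look like the following. The endpoint path, topic name, and broker address are placeholders.

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

import java.net.URLDecoder
import java.nio.charset.StandardCharsets
import java.util.Properties

object TagCollectorSketch extends App {
  implicit val system: ActorSystem = ActorSystem("tag-collector")

  // Placeholder broker address; real configuration lives outside the code.
  private val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  private val producer = new KafkaProducer[String, String](props)

  // Versioned endpoint: accept the fire-and-forget GET pixel, decode the payload,
  // forward it to Kafka, and respond immediately so the tag is not slowed down.
  val route =
    path("v1" / "tags") {
      get {
        parameter("payload") { raw =>
          val json = URLDecoder.decode(raw, StandardCharsets.UTF_8)
          producer.send(new ProducerRecord[String, String]("gtm-tag-events", json))
          complete("ok")
        }
      }
    }

  Http().newServerAt("0.0.0.0", 8080).bind(route)
}
```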

You may also be interested in: Enhancing the outbox pattern with Kafka Streams

We preferred this approach to a serverless solution mainly due to turnaround time. At the time of implementation, our setup didn't support sending data directly to Kafka from a lambda function, and we wanted to have a monitoring solution in place as soon as possible.

Also, having a constantly running server means we can take advantage of HotSpot's JIT compilation, and using Akka instead of Spring makes the server start incredibly fast when we do need to scale. All of this results in an extremely quick, robust solution: at the time of writing, our server has a P99 of ~2ms over the last day. Keeping response times low is important, because if tags take too long to execute, Google will terminate the connection, and tag data will be dropped before it reaches our systems.

[Figure: Latency]

Moving past Kafka

Once the data is in Kafka, we're done! Seriously: we have the information we want in our infrastructure, and we can do whatever we need with it. Kafka lets us attach as many consumers as we require, and apps can easily be set up to forward data to Datadog, write data to S3, and process and filter data for further analysis. We already have a number of downstream apps, mostly written by the Marketing Platform team, which cover several use cases:

  • reading tags off the topic, checking constraints and forwarding the results to Datadog
  • transforming the topic into a partitioned JSON table on S3 which can be read from Spark (a read is sketched just after this list)
  • visualising real time tag data as a stream
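For the S3 table, for example, reading the data back from Spark on Databricks is straightforward. The bucket path and partition column below are placeholders rather than our real layout.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().getOrCreate()

// Placeholder bucket path; the real layout and partitioning differ.
val tagEvents = spark.read.json("s3://example-bucket/gtm-tag-events/")

// Count events per tag for a single day's partition of data.
tagEvents
  .filter(col("date") === "2021-08-16")
  .groupBy("tag")
  .count()
  .show()
```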

The monitoring app

Being able to monitor tags was the key motivation for building the pipeline we've just described, so it's worth looking at how that specific app is implemented. As stated in our requirements, one thing we wanted to provide was the ability for non-engineers to monitor their tags without intervention from engineering teams. This functionality is accomplished by the real-time tag monitoring service, a Kafka Streams app which runs in Kubernetes.

The monitoring service is implemented as a rule-checking engine. It watches the incoming event topic for tags that have rules registered, and then examines those tags to see whether data in the tag payload violates any of those rules.

A rule is specified as a path which resolves to a value in the JSON payload, for instance .conversion.value, and a constraint, for instance x >= 0; in English, that rule reads "if the conversion value is less than 0, this rule fails". A rule also has a distinct name. If a rule fails, an event is sent to Datadog specifying which rule failed, and a message is sent to a downstream topic capturing which rules failed along with the original payload, so that we can dig into the issue later.
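To make that concrete, here is a heavily simplified sketch of the idea in plain Scala (not our actual Kafka Streams implementation): a rule pairs a dotted path with a predicate, and the engine resolves the path against the decoded payload and applies the predicate.

```scala
// Simplified sketch of the rule idea; the real engine runs as a Kafka Streams app
// and works on actual JSON, but the shape of a rule check is the same.
final case class Rule(name: String, path: String, constraint: Double => Boolean)

object RuleCheckSketch {
  // Resolve a dotted path such as "conversion.value" against a decoded payload,
  // represented here as nested Maps for simplicity.
  def resolve(payload: Map[String, Any], path: String): Option[Double] =
    path.split('.').foldLeft(Option[Any](payload)) {
      case (Some(m: Map[String, Any] @unchecked), key) => m.get(key)
      case _                                           => None
    }.collect { case d: Double => d }

  // A rule passes only if the path resolves and the constraint holds.
  def check(rule: Rule, payload: Map[String, Any]): Boolean =
    resolve(payload, rule.path).exists(rule.constraint)

  def main(args: Array[String]): Unit = {
    val rule    = Rule("conversion-value-non-negative", "conversion.value", _ >= 0)
    val payload = Map("conversion" -> Map("value" -> -5.0))

    // A failing check is where we would emit a Datadog event and write to the downstream topic.
    if (!check(rule, payload)) println(s"Rule failed: ${rule.name}")
  }
}
```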

As you can probably tell from reading, we didn’t quite meet our original goal. Implementing rules isn’t too complex, but still requires some technical understanding of how JSON is structured, and an ability to write limited expressions. However, the rule engine does decouple rules from program logic, and in the future we’re aiming to simplify things even further by adding a user interface for adding new rules, as well as examining the possibility of automating rule discovery through statistical models.

[Figure: Tag outcomes]

Overall, the improvement is significant: we can now monitor our tags, and we can easily add to and extend our setup as we need to. As a result of the new system, we can move faster, both when modifying our customer site and when adding new tags. We can also test things end to end, capturing regressions before they make it to production using canary analysis in our continuous integration pipelines. Since we implemented the system three months ago, it has already proven its value, catching two major regressions in our tag setup that would have had significant business impact had we not been able to detect and fix them immediately.
