looker-based-reporting-solution-for-external-users
Engineering
Jul 1, 2022

Looker-Based Reporting Solution for External Users

Pei Xu
Senior Data Engineer

Pei Xu, Senior Data Engineer, explains how the team built a Looker-based scalable reporting solution for sharing data and insights with thousands of external business partners.

GetYourGuide cooperates with many partners, including suppliers, online publishers, travel agents, business process outsourcing (BPOs) of customer service, and media agencies. In order to better understand their own business, partners often require data and insights from us. For example, understanding what are the most popular activities for optimizing their page content, or acquiring an overview of their general performance and potential income. Traditionally, there have been three main channels for external users to receive such information from GetYourGuide:

{{divider}}

Email

When sharing reports to external users, whether it’s for general performance reporting or product recommendations, our internal colleagues usually download relevant reports from our data analytics platform (Looker) in a preferred format (such as PDF or Excel), and share with external business partners manually.

Automated Sending

Since Looker supports automatic report sending, some reports are also being shared with external users through a scheduler with a predefined format, datetime, and destination (Email, sftp server, Amazon S3 buckets, etc.).

Partner-Dedicated Portal

For some partners, we provide a dedicated portal for them to login to view their performance and get insights under a pre-created analytics page. However, the reports shown on the portal often don't support the level of flexibility required by partners.

The Challenges

There are a few drawbacks to the traditional ways of sharing data and insights:

Efficiency

First of all, they require a high degree of manual effort. While downloading a report from Looker and sharing it with partners doesn’t require much effort, preparing a recommendation email or exposing a new report on the partner portal is definitely not a small task. The former requires people to spend time pulling data from Looker, then prepare the email content. The latter requires both backend and frontend engineering effort as essentially it’s a UI change.{{divider}}  

Scalability

Secondly, even with the automated sending option, if too many schedulers are created, it will also lead to high load in our database. For this reason, this method is not scalable.

Limited personalized data

Thirdly, with generic reports being sent to all targeted users, data slicing and dicing is limited. Although some partners can log in into the partner portal to view their performance, normally reports have limited options for switching between different views.{{divider}}

Data inconsistency

Since reports are being sent through various channels which might use different data sources or different logic in computing metrics, there’s also a data inconsistency problem.

Data privacy

It is common knowledge that there’s no easy way to avoid errors in manual data handling. If data is incorrectly compiled or sent to a wrong partner by mistake, we risk data privacy breaches. An automated solution eliminates such risks.

Looker-based Reporting Solution

With all these issues in mind, the team turned its attention to sourcing an alternative reporting solution. An initial step was defining our project goals:

Reduce manual effort

The new solution should help us improve operational efficiency when sharing data and insights to partners. In other words: no more manual report sharing.

Ability to easily launch new use cases

Since there are various kinds of partners who need access to the platform, the design of the system should facilitate onboarding new use cases.

More personalized data and insights

It should allow partners to acquire more personalized data and insights whenever there’s need, and provide more flexibility for partners to slice and dice data.

Always up-to-date

With the traditional ways of reporting there’s no guarantee that reports being shared reflect the current number in the system since it’s essentially a static one. The new solution should allow users to query our database tables and get the data on the fly.

Ensure data consistency

Ensure one single source of truth for all metrics.

Improve engagement

Having more powerful reports embedded on the portal will also encourage partners to login into the partner portal and get more insights for their business, thereby improving engagement.

Provide self-serve capabilities

Eventually enable self-serve for external partners to create reports on their own whenever and wherever they need – just like internal users.

Competitive advantage

Having a solid reporting platform could be a competitive advantage when it comes to acquiring new partners.

Why We Chose Looker

We know Looker has a well-known embed analytics solution that allows people to embed any dashboard / Look inside an iframe of a web page. Additionally, Looker has been very well integrated into the organization with many mature data models and various visualization features. Therefore, we decided to choose Looker as the reporting platform for our external partners.

System Architecture

Broadly speaking, there are two types of user scenarios:

  • Partners access Looker UI directly to view the dashboards;
  • Partners access Looker dashboard via an embedded URL on the partner portal.

For both scenarios, we use Snowflake as the database running behind Looker. The data in Snowflake is loaded via Apache Spark, with the same tables also being consumed by our internal users via different Warehouses as configured in Snowflake.

Snowflake

Technical Implementation

We split our project into four main pillars:

Project parts

System Management

System management is all about the high-level environment setups for enabling external analytics. We consider three components in this pillar: closed system configuration, DNS configuration, and database connection configuration.

Closed system configuration and DNS configuration are both done on Looker side, these are the preconditions for enabling embed analytics. With closed system, users will not be able to see one another or their content in Looker unless they share a common group. Since our partners are mostly from different companies, this is essential. In order to make the content accessible via embedding, we also need to make sure Looker is on the same subdomain as our partner portal where the embedding will be applied.

The Snowflake database connection setup covers all the necessary work for preparing the database environment, such as Snowflake warehouse creation, role, and permission setup.

User Management

We distinguish the user management between internal and external users.

For external users who need access to an embedded dashboard, we apply single sign-on (SSO) embedding provided by Looker. This embedding allows users to be authenticated through our own application (i.e., partner portal). The dashboard / Look will be embedded through a special URL (i.e., SSO URL) that is placed in an iframe on a page. The SSO URL is signed by a secret key provided to the user together with parameters such as permissions, allowed models, and first / last names of the user. With the SSO URL, the first time the user visits the embed page, their account will be automatically created by Looker with all the user settings as specified in the URL, and labeled as “Embed Users.”

Apart from embed users, we also need a set of permissions for managing our internal data users. They are mainly stakeholders involved in managing and creating dashboards for external users. For simplicity, we created the same roles and permission sets as our internal ones.

Content Management

With the launch of our new Looker instance for external analytics, we’d like to have a way to synchronize the content between our internal and external Looker instances for better maintenance and version control. After comparing different options, we decided to go for Looker Project Import. It allows us to import the LookML files under a specific project from instance A to instance B.

The general idea is to share the same base view files between both instances, and create model files (i.e., Explores) according to business needs. Using the same base view files ensures cases like a schema change of a table, or the definition change of a dimension / measure will be synchronized between both instances. The flow is as follows:  

Content management flow

After importing the base view files from remote, we didn’t use the view files directly for the following reasons:

  • Imported view files are mainly designed for our internal usage. Therefore, the majority of the dimensions / measures are unhidden. For external analytics, we only need a set of the dimensions / measures, so as to remove clutter for our external users.
  • Not all descriptions for dimensions / measures are applicable for our external users.

Hence, we decided to leverage the LookerML Refinements feature of Looker to refine the view files before using them. This allows us to keep the view file name and its content while being able to modify the content of the view file or adjust the definition of a dimension / measure.

In order to ensure format consistency, for every view file, even one that doesn't require refinement, we always create a refined view file with the same name in the external Looker. Take the dim_location.view as one example: the “+” sign indicates it’s a refined version of the original dim_location.view. Besides, for every view file, we also created a new field set called “external_analytics” to only include the fields which are relevant for the external analytics so as to avoid clutter.

Data Access Control

One of the most important security features that we need to implement is to ensure data privacy between embedded users. In other words, embed users should not see each other’s data unless they are from the same company. Therefore, we introduced the user_attribute parameter in generating SSO URL. This user attribute is then used as an access filter for the Explores. Below shows one usage example:

With the above access filter, every time a user runs a query on the Explore, a where clause will be added to the query according to the user_attribute value configured for each user.

Monitoring

The last step is to set up monitoring dashboards for understanding how external users interact with our system, how our system is performing, and how much cost is associated with it. The dashboards are all created in Looker.

Conclusion

In this article, we introduced how we built the Looker-based reporting resolution for our partners. Through analyzing different use cases to understand the challenges of traditional ways of sharing data and insights, to the implementation of the core pillars of the system, we have so far successfully onboarded around 9,000 partners to our external analytics solution.

The general feedback from partners is positive. The survey results after launching the first two reports show that 73% of partners find it extremely or very helpful (average rating 4.1/5) and 75% already took action or formulated plans based on the report.

We will continue scaling and improving our Looker external analytics system as the needs of our partners evolve, and share any valuable learnings.

Other articles from this series
No items found.

Featured roles

Marketing Executive
Berlin
Full-time / Permanent
Marketing Executive
Berlin
Full-time / Permanent
Marketing Executive
Berlin
Full-time / Permanent

Join the journey.

Our 800+ strong team is changing the way millions experience the world, and you can help.

Keep up to date with the latest news

Oops! Something went wrong while submitting the form.