Just Announced: Customize Extractions + Better Funnel Data

We’re excited to announce a couple of key updates to the Keen IO Data Explorer — Keen’s point-and-click tool for analyzing and visualizing data. Want to get started? Log in to your Keen IO account.

What’s new?

Customize your extraction fields

At Keen, we believe you should be able to do what you want with your data, which is why we support extractions. We’ve made extractions even easier by enabling you to select which fields you would like to view in an extraction. This makes the extraction of your most important metrics and KPIs painless and clean. Want to check it out? Use the Data Explorer to select the fields you want to extract from an event collection.

Use the Data Explorer to select the fields you want to extract
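Under the hood, selecting extraction fields is just a projection over your events. Here is a minimal local sketch of what that selection does (the event dicts and field names are hypothetical; the hosted API performs this server-side):

```python
# Sketch: selecting extraction fields is a projection over events.
# These event dicts are made up for illustration.

def extract_fields(events, property_names):
    """Keep only the chosen properties from each event."""
    return [{k: e[k] for k in property_names if k in e} for e in events]

events = [
    {"user": "sam", "plan": "pro", "referrer": "twitter"},
    {"user": "kim", "plan": "free", "referrer": "google"},
]

print(extract_fields(events, ["user", "plan"]))
# → [{'user': 'sam', 'plan': 'pro'}, {'user': 'kim', 'plan': 'free'}]
```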

Get the ‘actors’ from any step in a behavior funnel

Funnels are a powerful tool for understanding user-behavior flows and drop-off rates. We’ve added the ability to get the ‘actors’ from any step in a funnel, which will enable you to see who performed each step. Example use cases might be:

  • Which users made it all the way to the purchase form?
  • Which users watched our promotional video all the way to the end?

To use this feature, just check the “with actors” box on the desired step of your funnel query in the Data Explorer.
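Conceptually, a funnel narrows the set of actors at each step, and “with actors” returns who survived to a given step. A toy sketch with made-up step data (the Data Explorer computes this for you):

```python
# Toy funnel: each step keeps only the actors who completed all prior
# steps. The step data below is invented for illustration.

def funnel_with_actors(step_events, actor_key="user_id"):
    """Return (counts, actors_per_step) for a sequence of step event lists."""
    survivors = None
    counts, actors_per_step = [], []
    for events in step_events:
        step_actors = {e[actor_key] for e in events}
        survivors = step_actors if survivors is None else survivors & step_actors
        counts.append(len(survivors))
        actors_per_step.append(sorted(survivors))
    return counts, actors_per_step

steps = [
    [{"user_id": "a"}, {"user_id": "b"}, {"user_id": "c"}],  # visited pricing
    [{"user_id": "a"}, {"user_id": "c"}],                    # opened purchase form
    [{"user_id": "c"}],                                      # completed purchase
]
counts, actors = funnel_with_actors(steps)
print(counts)      # → [3, 2, 1]
print(actors[-1])  # who made it to the last step → ['c']
```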

Ready to get started? Log in to your Keen IO account or create a free account.

Questions? Feature requests? Reach out to us on Slack or at team@keen.io.

Happy Exploring!

A new way to debug your data models

We’re excited to announce the new and improved Streams Manager for inspecting the data schema of your event collections in Keen IO. We built the Streams Manager so you can ensure your data is structured well and set up to get the answers you need.

With Streams Manager you can:

  • Inspect and review the data schema for each of your event collections
  • Review the last 10 events for each of your event collections
  • Delete event collections that are no longer needed
  • Inspect the trends across your combined data streams over the last 30-day period

The Streams Manager can be found within the ‘Streams’ tab of your Project Console.

Inspect your data models with the Streams Manager

Ready to get started? Log in to your Keen IO account or create a new account to start streaming data.

Questions or feedback? Hit us up anytime on Slack.

How to Do Event Modeling vs Relational DB Modeling

Event data is a powerful data format that allows us to track and analyze things that happen around us. 

However, if you’re used to relational data modeling, event modeling can feel pretty foreign and awkward. 

Let’s compare and contrast these data formats so we can better understand their weaknesses and their superpowers. It’s not a contest. Most businesses use both, and they each have their place.

The simplest way I can think to contrast the two is this:

Event Data = Verbs
Relational Data “Rows” = Nouns

Event data answers questions about what things have done. Actions that happened.
Relational data answers questions about the state of things.

Questions that are trivial to answer with a relational data model, like “what class is Sam currently enrolled in?” can be maddeningly complicated to answer with event data. But on the flip side, event data can make other types of questions suddenly very simple and easy, like “how many classes has Sam enrolled in over the course of his life?”.

Something is Amiss

To illustrate why it’s important to get this right, let’s look at some real-world examples of trying to use the wrong data format. It’s like trying to use a screwdriver in place of a hammer.

Trying to use event data for a relational data problem

Let’s say you’re tracking events every time new products are added to your inventory. You’re also tracking events every time products are sold. This is useful data for many types of analysis. But it would be a really nightmarish way to keep track of which products are currently in inventory, something that would be trivial with a relational table.

How many hammers are currently in our inventory?

# relational model: oh, let's just go look at our inventory table
inventory table:
  nuts      8
  bolts     3
  hammers  89

# event data model: uhhh, let's count every time a hammer was ever added since the store opened, then subtract every time a hammer was ever sold, and theoretically that should tell us how many we have.
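That event-data arithmetic looks something like the following toy sketch (the event dicts are hypothetical):

```python
# Deriving current stock from events means replaying every add/sell
# since the store opened. Event fields here are invented.

def current_inventory(events):
    stock = {}
    for e in events:
        delta = e["quantity"] if e["action"] == "add" else -e["quantity"]
        stock[e["product"]] = stock.get(e["product"], 0) + delta
    return stock

events = [
    {"action": "add", "product": "hammers", "quantity": 100},
    {"action": "sell", "product": "hammers", "quantity": 11},
    {"action": "add", "product": "nuts", "quantity": 8},
]
print(current_inventory(events))  # → {'hammers': 89, 'nuts': 8}
```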

Trying to use relational data for an event data problem

A recent example that comes to mind is a training company providing reporting on course completions.

The symptom they noticed was that when looking back at historical training course completions, the numbers would sometimes change. Like, the number of courses completed last March would inexplicably change from 3842 to 3833 a month later. What the heck!?

Turns out, they were counting the course completions by joining data across several tables. If courses were ever deleted, which occasionally they were, the completions attached to those courses dropped out of the join, and the historical totals quietly shrank. An event data model, where each completion is recorded once and never mutated, wouldn’t have this problem.

Relational Data (aka Entity Data)

If you’ve ever worked with an application database, you know about entity data. It’s the standard format for the most common type of database, the relational database. Here’s an example:


Entity Table Example (Enemies Table)

Relational data is stored in tables. Entities are things like users, products, accounts, posts, levels, etc. There is a separate table for each type of entity, and each table has columns to hold properties about the entities. There is one row in the table for each entity. In this example, the entities are enemies.

Relational databases are really good for capturing the current state of your application: things like users, product inventories, accounts payable, etc. You can very quickly look up information about any entity.

One characteristic of entity databases is that they are normalized. Data is rarely duplicated. For example, you might have a table for Accounts, with attributes like the account name, type, category, etc. Accounts have many users associated with them, but you wouldn’t store information about those users in the Accounts table. Instead, you would include a key in each user record which links to its account. From a data storage (disk usage) perspective, this is very efficient.


Normalized relational data example from the excellent Wikipedia article on joins
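Sketched as Python dicts (with illustrative names), the normalized layout links each user to its account by key rather than duplicating the account’s fields:

```python
# Normalization sketch: user rows point at an account via a key
# instead of repeating the account's fields. Names are made up.

accounts = {1: {"name": "Acme", "type": "enterprise"}}
users = [
    {"id": 10, "name": "Sam", "account_id": 1},
    {"id": 11, "name": "Kim", "account_id": 1},
]

# A "join": look up each user's account by key.
joined = [{**u, "account": accounts[u["account_id"]]} for u in users]
print(joined[0]["account"]["name"])  # → Acme
```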

Event Data (aka Analytics Data)

Now let’s look at the characteristics of event data. Here’s an example:


Event data example: “publish” event

Event data doesn’t just describe entities; it describes actions performed by entities. This example above describes the action of publishing this blog post. You can imagine we have a collection of events called “publishes” where we track an event for each new post.

What makes this “event data”? Event data has three key pieces of information. I first saw these identified by Ben Johnson in his Speaker Deck presentation on Event Data (he calls it “Behavior Data”).

  1. Action
  2. Timestamp
  3. State

The action is the thing that’s happening (e.g. “publish”). The timestamp is self explanatory: the point in time the thing happened. The state refers to all of the other relevant information we know about this event, including information about entities related to the event, such as the author.
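As a rough illustration (the field names are made up), a “publish” event carrying all three pieces might look like:

```python
# Hypothetical "publish" event showing the three key pieces of
# information. Field names are invented for illustration.
publish_event = {
    "action": "publish",  # 1. action (often implied by the collection name)
    "keen": {"timestamp": "2017-03-02T17:32:11.000Z"},  # 2. timestamp
    # 3. state: everything else we know at the moment of the event,
    # including related entities like the author
    "post": {"title": "Event Modeling vs Relational DB Modeling",
             "word_count": 1800},
    "author": {"name": "Sam", "role": "editor"},
}
```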

Let’s look at a more complex event. This is a “death” type event:


Event data example: Minecraft death event

Here is an example data point for a player death in the game Minecraft. Imagine we are recording every player death that happens in the game.

There are a lot of ways in which the player can experience death: falling from great heights, starvation from not eating enough pork chops, drowning, clumsily stumbling into lava, zombies scaring the crap out of you in a cave, etc.

Let’s say we want to analyze these player deaths in Minecraft. Perhaps we want to find out the most common type of death, the average player age at the time of death, the most lethal enemies, or any number of death-related questions. Perhaps we are trying to find out if the game is too difficult on certain levels, or if the new villain we introduced is causing way more destruction than we’d imagined, or if there is any correlation between types of users and types or frequency of deaths.

We can find out all of these things using the simple event data model shown above. The event data model has a few special qualities:

1. The data is rich (has data about lots of relevant entities)

2. The data is denormalized (we often store the same data repeatedly on all the relevant events)

Additional perks of event data: it can be nested, and it has a much more flexible schema compared to the rigid tables of entity databases.
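Because each death event carries its own denormalized state, analyses like the ones above reduce to simple passes over the collection. A toy sketch with invented fields:

```python
from collections import Counter

# Toy analysis over denormalized death events. Field names and values
# are invented for illustration.
deaths = [
    {"cause": "lava", "player": {"age_days": 3}},
    {"cause": "zombie", "player": {"age_days": 10}},
    {"cause": "lava", "player": {"age_days": 5}},
]

most_common_cause = Counter(d["cause"] for d in deaths).most_common(1)[0][0]
avg_age = sum(d["player"]["age_days"] for d in deaths) / len(deaths)
print(most_common_cause, avg_age)  # → lava 6.0
```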


Interview with Michael Greer (CTO, TAPP TV, The Onion)

We recently interviewed Michael Greer, former CTO of The Onion and now Co-Founder and CTO of TAPP TV. We wanted to hear how he navigated the decision to build or buy analytics infrastructure.

In our CTO’s Guide to Getting Data Strategy Right white paper, we discuss the limitations of off-the-shelf analytics solutions, as well as the risks of building custom solutions with expensive internal resources. As we continue to navigate these discussions with our clients at Keen, we wanted to share some of their stories. TAPP decided to build their analytics capabilities in-house using Keen’s APIs, and they have been a Keen customer for several years.

According to Mike, his team of engineers has tested a variety of approaches including combinations of Segment, KissMetrics, and Google AdWords. “The reason we ended up increasingly relying on Keen was our ability to influence the metrics we were tracking with Keen — it turned out to be more engineer-friendly than anything else on the market,” says Greer.

TAPP uses a video content management system and a subscription system to allow their team to manage different video sites. These systems are also used for various internal dashboards and reporting on key business metrics. For example, reports embedded within the CMS help employees identify the most popular content, compare subscription rates across time or make revenue projections. “We run correlations, track whether users are more or less likely to subscribe when they look at a particular content piece, and much more,” explains Mike.

When asked how Mike would explain Keen’s API platform he says,

“Keen is the perfect 80% solution. It’s not turnkey and doesn’t give developers anything out of the box, but rather offers 80 percent of what’s needed and allows a company to build what they need, quickly.”

TAPP’s team also found Keen’s engineers and customer success team to be extremely helpful.

“I simply contact Keen’s customer service via chat. Engineers send us back example code which is extremely high quality. I’ve also reached out directly to the engineers who maintain the JavaScript library, so we could really see what was happening.”

Mike Greer found Keen’s pricing and platform easy to scale with the company’s needs. TAPP currently has over 30 people across the company consuming data in a variety of custom dashboards and reports specific to their workflows, all of which is maintained part-time by a small team of three.

Another consideration for the executive team was the investment risk inherent in choosing a technology for such a foundational, business-critical need (and in particular one that touches many parts of the business). Two factors influenced their decision here: Keen’s high data-portability reduced their lock-in risk, and the flexibility of the platform meant they weren’t married to a single prescriptive way of doing analytics.

“Keen is a platform that’s been created by builders for builders.”

Mike cited a few additional factors that made the choice to build his analytics infrastructure on Keen the most viable for TAPP:

  • Keen has great JavaScript SDKs, so it works well with their stack.
  • Emergent questions from company stakeholders are very easy to answer: “Keen is sufficiently flexible for us to always be able to offer additional capabilities.”
  • A much lighter burden for the engineering team: TAPP runs their entire analytics stack with no full-time headcount dedicated to analytics infrastructure and scalability.
  • New dashboards can be added on demand, which makes it easy to add and remove key metrics as needed.

Download our latest white paper to learn more about the “Build vs. Buy” debate. Keen IO helps companies accelerate deployment of intelligent data applications and embed intelligence throughout their business.

How should deadlines be used in software engineering?

How many conversations have you had about deadlines where at least one person scoffed at the concept? Pfft!

I have heard this a number of times — including from myself — and I want to fix this.

The world of software is very different from the world of print media, from which our current concept of deadlines is derived. However, just because there can be no universal definition of deadlines across all software projects doesn’t mean we shouldn’t use them or that they aren’t valuable.

Deadlines are extremely important for planning and parallelizing work. Without expectations about when dependent work will be completed, all teams would have to work in serial. This would drastically slow down delivery times.

At the same time, deadlines can be frustrating and counter-productive if people disagree about what they actually mean.

Problems and solutions

In my experience, these are the most pertinent problems with deadlines in engineering organizations, and the most promising ideas for how to solve them.

1) They are interpreted differently from person to person.

Person A: “The deadline is next week so I’m pulling a ton of extra hours.”

Person B: “Why? It’s a fake deadline and doesn’t matter.”

Person A: “But I don’t want to get fired!”

This shows that a deadline means vastly different things to Person A and Person B. This can lead to confusion and frustration on the team working towards that deadline.

Instead, deadlines need to be a rallying point. Everyone needs to understand why this deadline is important. They need to know what impact missing the deadline has on their world (that can be other teams, customers, or the business as a whole).

More importantly, deadlines that are met need to be celebrated. Wildly. This is often overlooked. Building a culture around celebration for meeting deadlines is a much better practice than berating people for missing them.

2) They are used too early in a project life cycle.

Person A: “Hey we need to accomplish [insert_really_hard_unknown_thing]. When can we have that done by?”

Person B: [quickly wikipedia’ing what that thing even is] “uhhhhh, I don’t know.”

Person A: “I need a timeline!”

Person B: “Thre–FOUR! mmmmo-wee-months. Four. Months.”

Person A: “Great, see you in four months.”

Asking for a deadline when EVERY aspect of a project is unknown is a recipe for disaster. It’s stressful for everyone involved and sets the project up for failure. So let’s take a deep breath. Wait a few days for some exploration to be done. We will be spending time to buy information, but then we can come up with a meaningful estimation of the work. This information will help us set deadlines that have a better chance of being accurate.

3) They aren’t updated often enough.

Person A: “Hey, deadline is in 5 days. Are we still on track?”

Person B: “We’re a little behind but yeah, we can make it.”

Person A: “Great!”

[ 4 days, 23 hours later ]

Person A: “Just checking back on that project. We good to go?”

Person B: “Uhm, no we’re not gonna make it. Something came up. It looks like another week of work.”

Person A: “$%@!*”

In this case, the deadline wasn’t moved or re-evaluated when new issues became known. Instead of raising that flag immediately, Person B waited until the deadline itself to inform others. Now Person A is affected as well AND the team gets to feel the stress of coming up on yet another deadline in the future.

Deadlines shouldn’t be set to force humans to do inhuman amounts of work. They should be used to set expectations externally and enable a sense of predictability for planning. They need to reflect reality as accurately as possible or trust is lost and they can no longer deliver that predictability. Now, I’m not advocating for changing deadlines hourly or daily. But perhaps weekly, or at the very least within a standard planning cadence.

Updating a deadline isn’t limited to extending the date. Scope reduction is also an option. Choosing which action to take (or a combination) is a conversation the engineering and product teams need to have.

4) All the “knowns” aren’t taken into account… just the fun ones.

Person A: “How long to ship this feature?”

Person B: “Two weeks.”

[ two weeks later ]

Person A: “Why isn’t this done?”

Person B: “Well, it technically IS done. Now we’re working on testing it, building a new deployment mechanism for it, and we’re gonna roll out a beta first. Also, I was on vacation last week.”

This deadline was made without a complete understanding of the work to be done and the time available to dedicate to it. (Not to mention Person B also threw problem #3 into the mix.)

We need to make sure we take all known challenges into consideration when determining a deadline. Will we be losing person-hours to a known cause? Vacation? Company offsite? Birthday party which will cause missed mornings due to hangovers?

Also, what un-sexy tasks are we potentially forgetting about? How many tests are we gonna write for this? How do we get this sucker into a production environment? Slow down and think carefully about your process and the availability of resources. It’ll make your deadlines much easier to deliver against AND it will make them easier to defend against scrutiny.

On estimation: uncomfortable but necessary

Deadlines that are set by engineering teams will largely be informed by estimating work. That means everyone on the team is going to have to get comfortable with being wrong. A lot. Committing to a number you have little confidence in can be a very difficult thing to do.

We need to have a collective understanding that it’s our best guess and that we’ll get better at it over time. Estimation is a skill that gets better with repeated use. In the early stages, it’s going to feel uncomfortable, but we need to do it.

Estimating tasks

Before we can define the delivery date of a large project, we should break the project down into tasks, where tasks are defined as things we believe we can accomplish in roughly 5 workdays or less.

Some helpful questions to ask when estimating a task:

  • Is the project green field or in a pre-existing area?
  • What is the quality of code in that area?
  • How well do I know the area of code?
  • How familiar am I with the programming languages involved?
  • What touchpoints or integration points with other pieces of code are there?
  • How good is existing test coverage?
  • Does this work touch any business-critical areas? (write path, billing, load balancers, signups)
  • Has anyone done this work before? What are their thoughts?
  • What are the tradeoffs I am making?
  • What is the objective of this task?
  • Does this task need to be done at all?

Estimating projects

Projects are typically defined as a larger body of work that multiple people can work on in parallel.

Some helpful questions to ask when estimating a project:

  • How much time will we actually be dedicating to working on it?
  • What is the objective of the project?
  • Do we have any known days off?
  • What are ALL the tasks required to be done?
  • Do we have any blocking dependencies on other teams?
  • Are any tasks blocking other tasks in the project?
  • Is any new infrastructure/hardware required for this project?
  • What are the Doneness Criteria for this project?

Doneness criteria

Even knowing when something is done is difficult. Different roles within the team can have different perspectives on “done” so we need to have specific criteria that can determine what that means for the project.

Some examples of typical Doneness Criteria:

  • Deployed in production
  • Tests fully automated
  • Communicated internally or externally
  • Spent a certain amount of time in an internal or external pilot
  • Documentation in production
  • Education of sales or marketing team complete
  • Landing page launched
  • Analytics and tracking
  • Operational runbook and observability


Deliverability is essential to any company as it grows and matures. Deadlines are a major tool in the toolbelt. When used properly, they are incredibly useful. But it takes time and practice to get better at using deadlines. So, I suggest that engineering organizations treat them as alive and breathing, continue to learn about them, and document shared experiences internally and with the engineering community.

If you have any tips or suggestions on using deadlines effectively, we’d love to hear them. You can drop me a line at ryan@keen.io or ping us on Slack.

Subscribe to stay up to date on future software engineering posts (no spam, ever):

Design Dashboards To Help Customers Love Your Product

You got users. Your conversion rates are looking pretty good. However, your users aren’t really using your product. You have a retention issue.

There are different ways of solving retention problems, and I want to look at how you can use dashboards and data to help your users understand and love your product.

When companies add native analytics or dashboards to their products, they usually start by building their “MVP dashboard”. This is usually a simple dashboard with a handful of charts.

Instead of stopping there, you can work to improve your dashboards into something that helps users become better engaged and better customers.

Designing analytics dashboards is a tricky task because you need to decide what data to show your users and what data they should ignore. You can’t just give them hundreds of charts and expect them to sort through the noise.

In this guest blog article, I want to walk you through 3 principles that you should keep in mind when designing analytics dashboards. You can then use these principles to analyze and deconstruct how other companies decided on their own dashboard designs.

Photo by #WOCinTech

Use Data to Close the Gap Between Metrics and Action

Let’s first understand why we want to create dashboards for our users.

Despite the ever-increasing abundance of data, users (and companies) still struggle to take action. This is perhaps best explained in the Paradox of Choice TED Talk, which argues that as you give people more choices, their ability to choose (or take action) decreases.

Barry Schwartz had a great quote on “learning to choose”:

“Learning to choose is hard. Learning to choose well is harder. And learning to choose well in a world of unlimited possibilities is harder still, perhaps too hard.” — Source: The Paradox of Choice: Why More Is Less

This is where analytics dashboards and data come in. We can help our users “choose” better options by presenting them with relevant data.

If done well, a great analytics dashboard can do the following:

  1. Inform and motivate on progress: You can remind your users how much they have accomplished and how much closer they are to their goal.
  2. Provide a complete and holistic picture: You can provide the user with a complete picture by pulling data from different sources. This tends to apply more to business use cases than consumer ones.
  3. Incentivize continued usage: You can also inspire your users to continue using your product while still getting value out of it.

Our goal isn’t to have all the data in the world. We simply need to find the data that will be relevant and useful to our users. Like most things, this is an iterative process where you get better with time.

Universal Design Principles Behind Actionable Dashboards

If you’re just starting to research how to track the right metrics that you will then show your users, then I recommend looking at getting a tracking plan in place. This simple document (usually a spreadsheet) can help you organize all of your events and properties in one place.

In this article, we will assume you’ve already completed the process of finding the right metrics for your users. Instead, let’s focus on the design and structure of our dashboards. For that, we can look at 3 principles that can help you decide what data to show and what to ignore.

Principle #1: Make It Visual

Numbers are great but visuals are even better. Visuals like charts allow us to communicate more information and make it easier to digest.

Let’s look at Mailchimp, an email marketing software. One of the core numbers that Mailchimp users want is the number of new newsletter signups.

Instead of simply showing the number of new signups, Mailchimp shows us the overall change in “audience”. This metric takes into account users who unsubscribed or who provided fake emails.

You have different options for visualizing your data including common chart types like bar, pie, and trend lines. You can also look at providing segments within a given chart e.g. newsletter audience change by list.

We can see the number in the top left and we can see a bar chart that shows us the changes broken down by day, week or month. The bar chart is showing us the trends within the selected data period.

Principle #2: Provide Context By Comparing Numbers Against Other Periods

Our second principle is about comparing data against other periods. Numbers by themselves don’t mean much. Is 15% good or bad? We simply don’t know until we add context and say “15% is good compared to 12% last month”.

Most companies will let you compare “this month vs last month” or “this week vs last week”, which is a great start. You could also compare against other date periods that give even more context, such as:

  • This month vs a rolling 6 month average
  • This month vs projections (set by user)
  • Greatest or smallest change

Mint.com does a great job with this kind of date comparison. For example, you can see the overall trends within a 12-month period in the graph below:

The average, greatest change and least spent provide context for your financial numbers. Am I projected to spend higher than my average? Am I now spending higher than the “Most Spent”?

Mint also sends out alerts when your numbers divert from the average as seen in the example below:

The examples and screenshots above show analytics that are embedded directly into your product and delivered to your customer’s inbox as an email — this in-app experience is what Keen IO calls “Native Analytics”. Both examples are great because they show how comparing numbers against other date periods adds context.

Principle #3: Provide an Overview, Then Allow Drilldowns

The first screen or page that your user sees is critical. It usually functions as an overview of the most important metrics. Users can then click into a given chart to dig deeper.

When designing your overview screen/page, keep a few things in mind:

  • What are the 4–5 things that I want my users to know? You might want them to know 20 things but the overview only gives you enough space for a handful of numbers.
  • What kind of drilldowns do I want my users to take? You told them something crucial and now you want them to dig deeper into that number.
  • What actions do I want my users to take? Besides the drilldowns, what do I want users to do after they learn about “X number”?

Remember that analytics data is all about taking action. Always keep thinking about what you want your users to do and what data they need to take that action.

Ahrefs does a good job of providing a great overview screen with the ability to do drilldowns on specific numbers.


Designing a great dashboard is half science and half art. The art part is all about trying to understand what your user wants to see while the science part is about tracking how your users are engaging and using your dashboard (and product). If you’re thinking about creating a custom embedded dashboard for your customer, Keen IO has created tools for building API-based computations and visualizations that make getting started easy.

Do you have any other useful tips for how to design analytics dashboards? Let me know in the comments or message me @ugarteruben.

Compute Performance Improvements & Magic of Caching

I’m happy to announce that we’ve rolled out some significant performance and reliability improvements to the Keen platform for all query types.

Improved Query Response Times

The overall response times have improved. Queries to Keen (via the API or through the Explorer) should be faster now. The following graph shows the impact of the changes.

95th Percentile Query Duration

Improved Query Consistency

We have also made our query processing more robust by fixing a bug in our platform that could cause query results to fluctuate (different results for the same query) during certain operational incidents like this one.

The Magic of Caching

These dramatic results were made possible by more effective caching of data within our query platform.

We’ve been working on improving query response times for many months. To understand the most recent update, it helps to have a little background on how Keen uses caching and how it has evolved over time.

Query Caching Evolution

At the lowest level we have a fleet of workers (within Apache Storm) responsible for computing query results. Any query can be considered as a function that processes events.

Query = function(events)

Workers pull pending queries from a queue, load the relevant events from the database, and apply the appropriate computation to get the result. The amount of data needed to process a query varies a lot, but some of the larger queries need to iterate over hundreds of millions of events in just a few seconds.
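That worker loop can be sketched roughly like this (the function names and the in-memory “database” are illustrative, not our actual implementation):

```python
# Sketch of "Query = function(events)": load the relevant events,
# then apply the requested computation. Names are illustrative.

def execute_query(query, load_events):
    events = load_events(query["collection"], query["timeframe"])
    if query["analysis"] == "count":
        return len(events)
    if query["analysis"] == "sum":
        return sum(e[query["target_property"]] for e in events)
    raise ValueError("unsupported analysis")

# A fake event store standing in for Cassandra:
store = {("purchases", "this_7_days"): [{"price": 5}, {"price": 7}]}
load = lambda collection, timeframe: store[(collection, timeframe)]

print(execute_query({"collection": "purchases",
                     "timeframe": "this_7_days",
                     "analysis": "count"}, load))  # → 2
```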

If you want to know more about how we handle queries of varying complexity and ensure consistent response times, I covered that in an earlier blog post.

Simplified view of a Query being processed


We started experimenting with caching about a year ago. Initially, we had a simple memcached-based cache running on each Storm worker for frequently accessed data. At this stage, the main problem we had to solve was invalidating data from the cache.

Cache Invalidation

We don’t store individual events as individual records in Cassandra because that wouldn’t be efficient. Instead, we group events (by collection and timestamp) into what we call ‘buckets’. These buckets sometimes get updated when new events come in, or when our background compaction process decides that the events need to be re-grouped for efficiency.

If we used a caching scheme that relied on a TTL or expiry, we would end up with queries showing stale or inconsistent results. Additionally, one instance of cache per worker means that different workers could have a different view of the same data.

This was not acceptable, and we needed to make sure the cache would never return data that had been updated. To solve this problem, we

  1. Added a last-updated-at timestamp to each cache entry, and
  2. Set-up memcached to evict data based on an LRU algorithm.

The scheme we used to store events was something like the following:

Cache Key = collection_name+bucket_id+bucket_last_updated_at

Cache Value = bucket (or an array of events)

The important thing here is that we use the timestamp bucket_last_updated_at as part of our cache key. The query processing code first reads a master index in our DB that gives it the list of buckets to read for that particular query. We made sure that the index also gets updated whenever a bucket is updated, so it always carries the latest timestamp. This way the query execution code knows the timestamp for each bucket it reads, and if the cache holds an older version, that entry is simply ignored and eventually evicted.
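A minimal sketch of the scheme, with a plain dict standing in for memcached (names are illustrative): because the authoritative timestamp is part of the key, a stale entry is never requested and simply ages out via LRU.

```python
# Timestamped-key caching sketch. A stale entry's key embeds an old
# timestamp, so lookups never hit it; it just ages out via LRU.
cache = {}  # stands in for memcached

def cache_key(collection, bucket_id, last_updated_at):
    return f"{collection}+{bucket_id}+{last_updated_at}"

def read_bucket(index_entry, load_from_db):
    """index_entry comes from the master index and carries the
    authoritative last-updated timestamp for the bucket."""
    key = cache_key(index_entry["collection"], index_entry["bucket_id"],
                    index_entry["last_updated_at"])
    if key not in cache:  # a key with an older timestamp simply misses
        cache[key] = load_from_db(index_entry["bucket_id"])
    return cache[key]
```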

So our first iteration of the cache looked something like the following:

Query Caching V1


This was successful in reducing load on Cassandra and worked well for many months, but we weren’t able to fully utilize the potential of caching because we were limited by the memory of a single Storm machine.

We went on to create a distributed caching fleet, using Twitter’s Twemproxy as a proxy in front of a number of memcached servers. Twemproxy handles sharding the data, dealing with server failures, and so on.

This configuration allows us to pool the spare memory on all our storm machines and create a big, distributed-cache cluster.

Query Caching V2


Once we rolled out the new configuration the impact was pretty dramatic. We saw a major increase in cache hit-rate and improvements in query performance.

Improved cache hit rate after distributed caching rollout


Improving Query Consistency

Keen’s platform uses Apache Cassandra, a highly available, scalable, distributed database. Our architecture and usage of Cassandra had a limitation that left us susceptible to reading incomplete data for queries during operational issues with the database.

Improved cache hit rates meant that most of the query requests were served out of cache and we were less sensitive to latency increases in our backend database. We used this opportunity to move to using a higher Consistency Level with Cassandra.

Previously, we read only one copy (out of the multiple replicas) of the data when evaluating queries. This was prone to errors caused by replication lag for newly written data, and was also affected by hardware failures on individual servers. We now read at least two copies each time we read from Cassandra.

This way, if a particular server doesn’t have the latest version of the data or is having problems, we are likely to get the latest version from another server, which improves the reliability of our query results.
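The intuition can be sketched as follows (illustrative only; this shows the general reconciliation idea behind higher read consistency levels, not Keen’s or Cassandra’s actual code): with copies from two or more replicas in hand, the reader keeps the one with the newest write timestamp, so a single lagging or failing server can no longer determine the result on its own.

```javascript
// Illustrative: given the copies read from multiple replicas,
// keep the freshest one (last-write-wins reconciliation).
function reconcile(copies) {
  return copies.reduce(function (best, copy) {
    return copy.writtenAt > best.writtenAt ? copy : best;
  });
}
```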

How modern products use embedded analytics to engage their users and keep them coming back


Spotify’s “year in music” uses listener stats to deliver compelling insights to users

Data is so ubiquitous, we are sometimes oblivious to just how much of it we interact with — and how many companies are making it a core part of their product. Whether you’re aware of it or not, product leaders across industries are using data to drive engagement and prove value to their end-users. From Fitbit and Medium to Spotify and Slack, data is being leveraged not just for internal decision-making, but as an external product offering and differentiator.

These data-as-product features, often displayed as user-facing dashboards, are known as embedded analytics, white-label analytics, or native analytics, because they are offered natively within the context of the customer experience. We’ve gathered 25 examples of native analytics in modern software to highlight their power and hopefully inspire their further adoption.

Ahrefs Lets Website Owners Drill Down on Referrers

Every day, Ahrefs crawls 4 billion web pages, delivering a dense but digestible array of actionable insights from 12 trillion known links to website owners (and competitors), including referrers, social mentions, keyword searches, and a variety of site rankings.

AirBnB Helps Hosts Improve their Ratings and Revenue

In addition to providing intimate housing options in 161 countries to 60M+ guests, Airbnb also reminds its more than 600,000 hosts of the fruits of their labors — with earnings reports — and gently nudges them to provide positive guest experiences — with response rates and guest ratings.

Etsy Helps Build Dream Businesses

The go-to online shop Etsy, which boasts 35M+ products, provides its 1.5M+ sellers with engagement and sales data to help them turn their passion into the business of their dreams.

Eventbrite Alerts Organizers to Sales and Check-ins

Event organizers use Eventbrite to process 4M tickets a month to 2M events in 187 countries. They also turn to Eventbrite for real-time information, to stay up to date with ticket sales and revenue, to track day-of check-ins, and to understand how to better serve and connect with their attendees.

Facebook Expands Reach of Paid Services

With Facebook striving to take a bigger bite out of Google’s share of online ad sales, its strategic use of data has spread beyond the already robust Facebook Ads Manager to comprehensive metrics for Pages, including, of course, key opportunities to “boost” posts.

Fitbit Helps Users Reach Their Fitness Goals

Fitbit’s robust app, connected to any of its eight activity trackers, allows its 17M+ worldwide active users to track steps, distance, and active minutes to help them stay fit; track weight change, calories, and water intake to stay on pace with weight goals; and track sleep stats to help improve energy levels.

GitHub Tracks Evolving Code Bases

GitHub, the world’s largest host of source code with 35M+ repositories, allows its 14M+ users to gain visibility into their evolving code bases by tracking clones, views, visitors, commits, weekly additions and deletions, and team member activity.

Intercom Targets Tools — and Data — to Users’ Needs

Intercom, the “simple, personal, fun” customer communications platform, delivers targeted data-driven insights depending on which of the platform’s three products a team uses: Acquire tracks open, click, and reply rates; Engage tracks user profiles and activity stats; and Resolve tracks conversations, replies, and response times.

Jawbone UP Enables Ecosystem of Fitness Apps with Open API

Jawbone’s four UP trackers help users hit fitness goals by providing insights related to heart rate, meals, mood, sleep, and physical activity, both in its award-winning app and through an extensive ecosystem of apps that draw data from the platform’s open API.

LinkedIn Premium Tracks Funnel Conversions

LinkedIn’s Premium suite of networking and brand-building tools helps demonstrate the ROI of sponsored campaigns by providing users with visibility into their engagement funnel — from impression, to click, to interaction, to acquired follower.

Medium Provides Publishers with Key Reader Metrics

Though Medium’s model is sometimes murky — publishing platform, publication, or social network? — it provides clear insights to its writers (or is that publishers?) in the form of views, reads, recommends, and referrers for published stories.

Mint Helps Users Budget and Save

Mint encourages users to make better financial decisions and save for big goals by giving them visibility into their spending trends, especially as they relate to personalized budgets.

Pinterest Allows Pinners to Track Engagement

The internet’s favorite mood board, Pinterest provides its 110M monthly active users with traffic and engagement stats including repins, impressions, reach, and clicks.

Pixlee Illuminates Its Unique Value Proposition

Pixlee helps brands build authentic marketing by making it easy to discover images shared by their customers, and then deploy them in digital campaigns. To help its clients understand the impact of this unique value proposition, Pixlee uses native analytics to serve up an on-brand, real-time dashboard that presents custom metrics like “lightbox engagement” alongside traditional metrics like pageviews and conversions.

Shopkeep Improves Business Decision Making

Shopkeep’s all-in-one point-of-sale platform uses a wide range of data — from best-selling items to top-performing staff — to help businesses make fact-based decisions that improve their bottom line.

Slack Delivers Visibility Into Internal Communications

The messaging app of choice for more than 60,000 teams — including 77 of the Fortune 100 companies — Slack delivers stats related to message frequency, type, and amount, plus storage and integrations.

Spotify Shares Stats as Stunning Visuals

Spotify’s stream-anywhere music service turns data insights into beautiful, bold visuals, informing their listeners of how many hours of songs they listened to in a year and ranking most-listened-to artists. They also help artists get the most from the platform by highlighting listeners by location and discovery sources.

Fan insights by Spotify

Square Zeros In On Peak Hours and Favorite Items

Going beyond credit card payments to comprehensive business solutions, Square provides business owners with real-time reports, including hourly sales by location, that help them home in on peak hours and preferred products.

Strava Turns Everyday Activities Into Global Competitions

Strava turns everyday activities into athletic challenges by comparing its users’ performance stats against the community’s for a given walk, run, or ride. The app also used its 136B data points to create the Strava Insights microsite, providing insight into cycling trends in 12 cities across the globe.

Swarm Updates the Foursquare Experience with New Gamified Features

Swarm adds additional gamification and social features to the original Foursquare check-in experience, providing users with their popular check-ins broken out by type, as well as friend rankings and leaderboards for nationwide “challenges.”

Triptease Builds Strong Relationships with Hotels

The Triptease smart widget allows hotels to display real-time prices for rooms listed by competing sites like Hotels.com to help convince guests to book directly and help the hotel build richer customer relationships. To keep a strong relationship with their own hotel-users, Triptease uses native analytics to show the impact on revenue of widget-enabled conversions, as well as the hotel’s real-time price rankings compared to other websites.

Twitter Beefs Up Its Business Case

As the internet’s 140-character collective consciousness positions itself more decisively as a boon for businesses, it has beefed up and beautified its analytics dashboard. Twitter’s dashboard now includes impressions, profile visits, mentions, and follower change for the past month, plus cards for Top Tweet, Top Follower, and Top Mention.

Vimeo Provides “Power” Stats in a Straightforward Interface

“We basically wanted to give users a power tool, but didn’t want them to feel like they needed a license to operate it,” explains Vimeo senior product designer Anthony Irwin of the video-hosting platform’s analytics tool. Today, Vimeo’s 100M+ users can dig deep — or stay high-level — on traffic, engagement, and viewer demographics.

Yelp Extrapolates Conversion-Generated Revenue

More than a ratings site for local businesses, Yelp also helps its 2.8M businesses engage and grow relationships with their customers. To highlight this value proposition, the company provides business users with a tally of customer leads generated through the platform, as well as a calculation of the estimated related revenue.

Zype Helps Users Track Video Revenue

With a single interface, Zype makes it easy to publish and monetize video content across various platforms. Core to its value is the ability to provide users with key stats including monthly earnings, new subscriptions, and successful revenue models.

Want to see your stats featured in our next post? Send us a note

Building analytics into your product? We can help with that! Check out Native Analytics.

Originally published at keen.io.

Introducing: Auto Collector for Web

Want to quickly test out Keen? Need to get a quick sense of the key interactions on your website or web app? You’re in luck! We just released an Auto Collector for Web.

What does it do?

  • Drop-in snippet that automatically collects key web interactions
  • Auto-tracks pageviews, clicks, and form submissions
  • Auto-enriches events with information like referrer, URL, geolocation, device type, and more

Ready to get started? Just drop in the snippet and start seeing web events flow in within seconds. You can also add your own custom tracking.

Check out our guide to learn more and head on over to your project settings to get started.

Happy Tracking!

How companies are delivering reporting and analytics to their customers

Today’s users of technology expect stats and charts in each and every one of their favorite apps and websites. Many companies are turning advanced analytics into a paid feature, while others are bundling analytics into their core product to improve engagement and retention. Keen’s Native Analytics lets every company differentiate with data and analytics that are truly native to their product.

In this on-demand webcast you’ll learn:

  • Key applications of Native Analytics, and how companies like Triptease, Bluecore, and SketchUp deliver analytics to their users right within their products to drive ROI
  • Why ease of use and the right capabilities are crucial to your success
  • Key considerations for a successful Native Analytics implementation

How companies deliver embedded analytics and real-time reporting for their customers

Don’t forget to download the Native Analytics checklist.

Thinking about adding Native Analytics to your product or want to improve your existing implementation? Contact us for a free consultation!

Originally published on Tumblr

Announcing our new podcast: Data Science Storytime!

We’re excited to announce the debut of Data Science Storytime, a podcast all about data, science, stories, and time.

In Episode 1, Kyle Wild (Keen IO Co-founder and CEO) and I brainstorm the concept of the show, debate the difference between data science and non-data science, and recount the story of the action-hero data scientist who skipped a meeting with Kyle to rescue a little girl trapped on a mountain (or so he assumes).

Tune in for all this and plenty more as we consider the many ways data shapes our lives and activates our imagination, today and in the future.

If you like what you hear, make sure to subscribe to get a new episode every two weeks. And follow us on Twitter @dsstorytime. Thanks, and enjoy the show!

IoT Analytics over time with Keen and Scriptr

This is a guest post written by Ed Borden, Evangelist at Scriptr.io, VP Ads at Soofa.

A large part of a typical Internet of Things application involves management operations: you want to know what your assets are doing right now and whether you need to react in some way.

I think about that ‘realtime’ domain as an area with a particular set of tools, challenges, and thought processes, and the ‘historical’ domain as another. In the historical domain of IoT, I think about what the high-value information will be in the form of questions, like:

  • How long was this parking space unoccupied last week?
  • Which truck in my fleet was in service the longest?
  • How long was this machine in power-saving mode?
  • What are the 5 best and worst performers in this group?

For these types of questions, Keen is my go-to. However, answering them requires a small shift in your architecture design.

You might typically push events to Keen as they are happening, but if you only push data on changes in a thing’s state (as is the common model for asset-tracking/management-type scenarios), you won’t have enough information to answer these types of questions, since you need to know how long the thing was in each state. So, when an event comes in, you need to:

  1. cache the timestamp and the state the thing is entering, and
  2. create an event from the previously cached state that was just exited, which must include the “duration” of that state.

Once this is done, Keen really shines at the rest! You can simply do a “sum” query on the durations of events, filtering by groups of devices and timeframes.

The below snippet using Keen IO will tell you how long a parking space was occupied:

var timeOccupied = new Keen.Query("sum", {
   event_collection: "deviceUpdates",
   target_property: "duration",
   timeframe: "this_7_days",
   filters: [
      { operator: "eq",
        property_name: "hardwareId",
        property_value: hardwareId },
      { operator: "eq",
        property_name: "deviceState",
        property_value: "occupied" }
   ]
});

If you want to sum all of the parking spots on the street, give each event a “streetId” and filter by that instead of “hardwareId”.

The below snippet will tell you how many parking spaces were occupied longer than an hour (because street parking is limited to one hour and you want to know where the most violations are occurring):

var violationsOccurred = new Keen.Query("count", {
   event_collection: "deviceUpdates",
   timeframe: "this_7_days",
   filters: [
      { operator: "gt",
        property_name: "duration",
        property_value: 60 },
      { operator: "eq",
        property_name: "deviceState",
        property_value: "occupied" }
   ]
});

I could do this all day! That’s because once you have this sort of infrastructure in place, the sky really is the limit on the types of high-value information you can extract. And you did this all without managing any database infrastructure or API surface of your own?!

So, how do we implement the complete system? Here Keen can use a little help from an IoT service called Scriptr.io. Scriptr.io has a web-based IDE which lets you write some Javascript, hit “Save”, and that code instantly becomes a hosted webservice with a URL. Using Scriptr.io’s fast local storage module and Keen connector, we can do some caching and light processing on that ‘in-flight’ datastream in a simple and expressive way that ALSO requires no devops/infrastructure! A match made in #NoDevOps heaven. It would look like this:

//Any POST body to your Scriptr script's URL can be accessed 
//with the 'request' object  
var eventData = JSON.parse(request.rawBody);

//The 'storage' object is a key/value store which we access with
//the current device's ID (a device's first event has no cached
//predecessor; we skip that check here for brevity)
var lastEventData = storage.local[eventData.hardwareId];

//In this example, we'll assume these are epoch times, otherwise we'd convert
var eventDuration = eventData.timestamp - lastEventData.timestamp; 

//Add the duration to the last event data object which we'll push to Keen
lastEventData.eventDuration = eventDuration; 

//This is the Scriptr.io -> Keen.io connector
var keenModule = require('../../modules/keenio/keenioclient'); 
var keen = new keenModule.Keenio("my_Keen_credentials");

//Next, record the Keen event (the method name here follows the
//Scriptr.io Keen connector's record call)
keen.recordEvent({
  collection: "deviceUpdates",
  data: lastEventData
});

//Cache the current event by the device's ID
storage.local[eventData.hardwareId] = eventData;

Below, you can see this in the Scriptr IDE:


There you go — Big IoT Data! You can learn more about the Scriptr.io platform here or the Scriptr -> Keen connector here.

How I Wrapped My Head Around Analytics-as-a-Service

In July I joined Keen IO, transitioning from working with organizations focused on the software development lifecycle (SDLC) to data analytics. Keen’s analytics SaaS is an end-to-end platform that allows organizations to approach data from an analytics-as-code perspective. To speed my learning process I began creating comparisons across the two industries to better understand the value and benefits that customers glean from using an API-centric analytics platform. (We call it the Intelligence API.)

In this post I’ll be sharing my thought process as I developed a deeper understanding of analytics and how it relates to SDLC concepts: specifically agile, continuous integration, continuous deployment, DevOps, and infrastructure-as-code. For a bit more background on each of these terms, feel free to read my previous blog post.

Applying Agile to Analytics

Agile is no longer just a software development term. It is now ubiquitous across marketing, sales, and organizational design. The primary principles of agile revolve around delivering what the customer needs through short sprints involving collaborative teams.

As I started contemplating analytics-as-a-service, I discovered that organizations want to be able to start collecting data on web, mobile, or IoT devices with ease and almost immediately, rather than having a drawn-out process of building an entire in-house analytics infrastructure from scratch.

The goal is that the heavy lifting should be taken care of on the backend by a service that can be responsive to peaks and valleys and the projected growth of the data collected. It doesn’t take an organization long to get started with an AWS server, and analytics should be no different, where you should be able to start making data-driven decisions to guide product direction, customer acquisition, and resource allocation strategy within days.

Achieving Continuous Integration

Continuous integration grew out of a need to add automation to agile practices, starting with running automated unit tests on a developer’s host machine. This quick feedback loop allowed developers to fix their code while it was still fresh in their minds and get instant feedback on quality.

By using an analytics-as-code approach via API, developers can automatically gather data from cached queries, allowing these organizations to quickly ask questions of large datasets and return near-instant results. This ability to run complex queries without a lag-time allows analysts and business leaders to build momentum on ideas and begin executing on their insights in the time it takes to run a CI build.

Continuous Delivery and Deployment of Data

Continuous delivery and deployment started to become ubiquitous as top web organizations began promoting their ability to roll out code to subsets of their user bases for A/B tests in production, with the ability to quickly deploy fixes and updates while production was still running.

I found this process similar to the way analytics-as-a-service provides flexibility with data models, allowing users to collect and store unstructured data that can be analyzed in the future alongside yet-to-be-collected data from emerging technologies such as Internet of Things, Virtual Reality (VR) and Augmented Reality (AR). In this way, organizations never have to limit their future analytics capabilities by being forced into an overly rigid data model today.

Analytics Through the Lens of DevOps

While all of the unit and production testing grew in importance, organizations realized that they needed to adapt their IT operations teams to be more flexible and able to respond to internal teams’ requests. This led to the prominence of DevOps, where traditionally siloed operation teams became part of agile development teams, with the goal of delivering configured infrastructure on demand.

With analytics-as-a-service, organizations don’t need to rely on specialized data scientists, data engineers, database architects, and datacenter architects to deliver powerful analytics capabilities. Instead, the ability to ask and answer complex questions via code allows every team, from sales and marketing to product and engineering, to begin discovering value.

If all of these teams can easily access, query, and customize their data views to match their specific needs, their day-to-day operations will be optimized; it’s essentially like having a data scientist on each team. One great thing I learned about Keen was that companies can even provide this ability to their own customers by integrating data visualizations directly into their products as a native part of the user experience.

From infrastructure-as-code to analytics-as-code

The most exciting evolution that unlocks all of the above capabilities for the SDLC is the adoption of infrastructure-as-code. Initially popularized by Amazon, infrastructure-as-code is now a core tenet of many IaaS or PaaS offerings, where you’re able to modify, setup, or tear down your infrastructure by utilizing an API or CLI.

Admittedly, not every analytics-as-a-service provider follows this model, but Keen has always maintained the API as the backbone of the service, with all analytics capabilities built on top of it: collection, storage, query, visualization, access, account provisioning, and more being added all the time. Since our customers are able to use our services in a programmatic fashion (a la analytics-as-code), they are able to scale from one to thousands of unique projects, collections, or analyses by writing a few lines of code. They don’t need to build up a huge in-house infrastructure or assemble a specialized IT and data science team to run the system. A developer is all that’s required.

Closing Thoughts

This post is by no means trying to make a comprehensive comparison of every aspect of these technologies. My goal was to walk through my own thought process in case it helps others with an SDLC background make sense of the comparatively new field of analytics-as-a-service.

If you’d like to discuss these ideas in more detail, or if you have any follow-up questions about Keen, analytics-as-a-service, or analytics-as-code, I’d be happy to chat. Feel free to reach out to me at vladi@keen.io.

Why we go crazy for data we love

Have you ever obsessively refreshed a dashboard to check your favorite stats? Fitness, finance, travel, sports, politics, gaming, trending cat-memes, whatever…

I’m guessing the answer is: Of course. Who hasn’t? I’m doing it in another tab right now!

At Keen, we talk about the business value of data all the time. For teams, customers, companies, decision-makers. Numbers make everyone smarter. Charts and graphs = insights! And that’s true, definitely. But I think there’s another piece of the data story, and this is it: people just love their data. They’re into it.


Maybe I should warn you that I’m trained as an English teacher, not a data scientist. But I believe the real reason people go crazy for data is because it’s a concrete manifestation of an abstract desire.

Okay, I know, you think I’m your crazy English teacher from junior year, but hear me out on this.

What are things that people want?

Success and Mastery: This one is obvious. All those Key Performance Indicators for company and personal growth. Users subscribed. Miles run. Levels upped. Retirement dollars saved. Success is a feeling but a bar chart is a rectangle, and a rectangle is real!

Love and Belonging: Love may be the most complex human emotion, but stats on Tinder are surprisingly precise. A friend recently showed me his dashboard and said he now knows, with mathematical certainty, what his type is (and whose type he is).

Significance and Impact: Who hasn’t watched the counter tick up and up on how many people like the picture of your child, or dog, or spouse, or brisket? How many re-tweets you got when you came up with just the right witticism about that thing that happened?

When it’s done right, data taps into some serious emotion.

So we’ve decided to share some of the data we love, why we love it, and what we can learn from it. To kick things off, here are the top two data obsessions on my list.

Data Obsession #1: Flightdiary.net

I love to travel. To be more specific, I love to rack up frequent flyer miles in creative ways and see how far they can take me. And I mean I want to see it!

That’s why I flipped when I found Flightdiary.net. It is an aviation geek’s dream. I enter my flights and then they get visually represented on a map of the world.

Colors represent how often I’ve flown a route: yellow for once, red for twice, purple or purpler for three or more, white for flights yet to be flown.

Why do I love this? Because, as Tears for Fears sang in the 1980s, everybody wants to rule the world. It used to be you needed an armada for that. Now I can just admire my colorful lines on this map and suddenly I am an explorer of continents, discoverer of destinations.

But what if I want to know more?

Of course I want to know more, and that’s why I can look at graphs of my top routes, longest routes, flights by airline, by aircraft, by seat location.

What’s in it for Flightdiary?

Sometimes I ask myself this. Mostly I ask because I want to make sure they survive so I can keep building my empire. And they don’t charge me any money for it, so what’s the deal?

Honestly, I don’t know. But I speculate that they are working to get me hooked, and then in the future they will use the data to show me ads or send targeted offers based on all the stuff they know about where I like to go.

That’s just a guess. But I wouldn’t mind, because they’re not just using my data for their own purposes. They’re letting me get value out of my data, too. In this case, the value is emotional, but what’s better than that? I like emotions.

Data Obsession #2: Querytracker.net

Like many English teachers and copywriters, I harbor an ambition to publish a book someday. And I have learned from past experience that writing the book is not the hardest part of the bargain. The most challenging part is getting an agent.

It sounds so Hollywood: getting an agent. Like something that happens by magic. But I don’t want to stake my dreams on magic. Dreams are dreamy enough as it is. I want to invest in data.

That’s where Querytracker.net comes in. This website maintains a database of all known literary agents. I can sort by genres they represent, whether they’re open to new clients, where they’re located, etc. I can save my favorites, keep track of all my queries and log the responses.

That’s amazing, and it’s all free!

But what if there’s a premium membership with even more data?

Well gee whiz, am I going to skimp on my dream? No way! I ponied up my 25 bucks like they were on fire. And here’s some of the stuff I got for it.

Data Timeline: This feature shows me all the data points (without user info) of other members who have queried a particular agent. So if an agent hasn’t responded to a single soul since 2013, I know to save my heartache for another day.

By contrast, if a particular agent seems to be lightning quick with rejections but more reflective about requests for pages, then I know the data is giving me permission to be hopeful about a slow reply.

Premium Reports: With these velvet-rope reports, I can see things like Submission Replies broken down by whether there was no response, a rejection, a full or partial manuscript requested, all the way up to the ultimate response: an offer of representation.

Learning from Data We Love

It’s only natural to analyze the value of data in purely numeric ways. Of course you should consider the numbers. And showing customers their data can absolutely pay off in very measurable ways: higher signups, referrals, advertising revenue, premium feature upgrades, and more.

But I think there’s a meta side to the whole analytics equation. By measuring and quantifying things your customers care about, you can get intangible benefits as a happy side bonus: loyalty, enthusiasm, buzz, and the kind of excitement that might make customers write blog posts about you. (Note: Flightdiary and Querytracker have no affiliation with me or with Keen — I just think they’re awesome.) As Nir Eyal puts it, you can use data to get your customers “hooked” on your product.

Do you want to use data to enhance customer love?

It’s actually pretty easy to build analytics into your app, site, game, or device so you can show customers the data they care most about. In fact, Bluecore built a customer-facing dashboard in less than 24 hours!

If you want to know more, you can check out our guide on building customer-facing analytics or drop us a line at team@keen.io

And if you have some data you love that you’d like to share on our blog, let us know or ping us on Slack.

Key Metrics for Successful AdTech Analytics

In today’s digital marketing landscape, AdTech companies are popping up everywhere. The ability to precisely target your audience online can reap great rewards for your marketing and revenue objectives, and entrepreneurs and founders are taking note.

But how do you know if your AdTech efforts are successful? How do you know if your customers’ campaigns are successful? More importantly, how do your customers know that your AdTech product is working for them?

That’s where AdTech Metrics and Analytics come in.

In this post, we’ll look at the top metrics AdTech companies need to measure — and display to their customers — to ensure success and get a competitive advantage.

Companies like AdRoll, Facebook, and Twitter all have ad platforms that display AdTech metrics to their customers. We’ll show you how you can build and embed these kinds of analytics into your products, too — with real-time results and unlimited scalability.

Spoiler alert: it’s actually pretty easy to do.

AdRoll’s AdTech Dashboard

Step 1. Focus on what’s unique

Building an AdTech platform can be quite complicated. From building responsive ad units to creating detailed customer segments, countless hours must be spent on the delivery of ads. So once an ad is running and racking up views, clicks, and revenue, you want to make sure you can store and display this data in a scalable and real-time manner.

To do this, first you’ll need a scalable data infrastructure for data storage and collection, and then you’ll need a way to query and display individualized data to each of your clients.

We’ll show you how you can do all of this, starting with the first order of business: figuring out which metrics you need to track and display.

AdTech Metrics that Matter

So you’ve built an AdTech platform that can place and serve up ads for your clients. Great! The next thing a client will want to know is “Are my campaigns performing?” To answer this question, it’s super important to know what metrics you should be surfacing for your clients.

To get you started, here are six key metrics used by leading advertising providers such as Facebook and Google:

  • Impressions: the number of times an ad has been seen, with a breakdown between Total Impressions (all views) and Unique Impressions (distinct viewers).
  • Exposure: the average number of times an ad is served per person.
  • Engagement Rate: the number of people who have “engaged” as a percentage of all ad views. For most ads, an engagement is typically a click-through to the advertiser’s site, but can be a video play or other interaction.
  • Conversion Rate: the percentage of people that convert on a desired outcome, such as becoming a paying customer, as a result of an engagement.
  • Relevance Score: a score between 1 and 10 that indicates an ad’s effectiveness and relevance to an audience, calculated by combining other metrics such as Conversion and Engagement Rates.
  • Revenue: total value of all purchases made as a result of an engagement with an ad or campaign.
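To make the arithmetic behind these metrics concrete, here is a minimal sketch in plain Python (no SDK) that derives them from raw counts. All parameter names are illustrative assumptions; Relevance Score is omitted because the scoring algorithms are proprietary.

```python
def ad_metrics(total_views, unique_viewers, engagements, conversions, revenue):
    """Derive the core AdTech metrics above from raw event counts.

    total_views    -- Total Impressions (every time the ad was served)
    unique_viewers -- Unique Impressions (distinct people who saw the ad)
    engagements    -- clicks, video plays, or other interactions
    conversions    -- desired outcomes (e.g. purchases) following an engagement
    revenue        -- total purchase value attributed to the ad
    """
    return {
        "total_impressions": total_views,
        "unique_impressions": unique_viewers,
        # Exposure: average number of times the ad was served per person
        "exposure": total_views / unique_viewers if unique_viewers else 0.0,
        # Engagement Rate: engaged views as a share of all views
        "engagement_rate": engagements / total_views if total_views else 0.0,
        # Conversion Rate: conversions as a share of engagements
        "conversion_rate": conversions / engagements if engagements else 0.0,
        "revenue": revenue,
    }

# Example: 10,000 views by 4,000 people, 250 clicks, 25 purchases
metrics = ad_metrics(total_views=10000, unique_viewers=4000,
                     engagements=250, conversions=25, revenue=1250.00)
print(metrics["exposure"])         # 2.5 views per person
print(metrics["engagement_rate"])  # 0.025, i.e. 2.5%
```

The point of the sketch is that every rate on the dashboard is just a ratio of two counters, which is why the lean three-event data model described next is enough to power all of them.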

Twitter’s AdTech dashboard

Using the right data model, you can produce these metrics with only three events:

  • Ad Views: runs once on ad load.
  • Engagement: runs each time a user engages with an ad.
  • Purchases: runs once after a purchase is completed on a client’s site.
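To make the data model concrete, here is a sketch of what each of the three event payloads might look like. Every field name here is a hypothetical example, not a required schema — structure the events however fits your platform.

```python
# Illustrative payloads for the three events above (all field names are
# assumptions for the sake of example).

ad_view = {
    "ad_id": "summer-sale-01",
    "campaign": "summer-sale",
    "viewer_id": "user-8841",        # distinguishes Unique vs Total Impressions
    "referrer": "news.example.com",  # supports Referral Sources
    "geo": {"country": "US"},        # supports User Locations
}

engagement = {
    "ad_id": "summer-sale-01",
    "viewer_id": "user-8841",
    "type": "click",                 # or "video_play", etc.
}

purchase = {
    "ad_id": "summer-sale-01",
    "viewer_id": "user-8841",
    "amount": 49.99,                 # rolls up into the Revenue metric
}
```

Because all three events share an `ad_id` and a `viewer_id`, they can be counted and joined to produce every metric in the previous section.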

These three events also support standard metrics like User Locations and Referral Sources if such information is needed.

Now that you’re familiar with what to track, you’re ready to learn why the above metrics are so important.

Why Certain AdTech Metrics are Important

You might be familiar with the metrics of Impressions, Exposure, Engagement Rate, Conversion Rate, Relevance Score, and Revenue, but here’s a quick refresher on why they matter:


Impressions

Guaranteed to be the largest number on a dashboard, Impressions, which are the total views of an ad, are crucial for quickly assessing the success (or failure) of an ad campaign.

Sure, Impressions alone don’t specify how many people interacted with an ad, but even Google understands the importance of being seen by thousands of potential customers, which can be very exciting for your clients. Where metrics like Engagement Rate show actual interactions, Impressions show the possibilities of untapped engagement, seemingly limited only by the size of the audience.

In addition, with the Interactive Advertising Bureau’s in-progress Viewability Standard, now more than ever an impression means someone has actually viewed a client’s ad.


Exposure

One of the goals of advertising on the web is to familiarize potential customers with a brand, so it makes sense that repeated exposure to a specific brand can be an effective way to improve an ad’s performance.

For example, Retargeting, a popular method for increasing exposure, is known for “high click-through rates and increased conversions,” as stated by AdRoll, a leader in Retargeting. Your clients can greatly benefit from this strategy, so displaying the exposure levels of their ads is critical.

Engagement & Conversion Rates

The usefulness of Engagement and Conversion Rates stems from their precision: both measure what percentage of users perform a specific action, so there is little room for ambiguity when interpreting the resulting data. Where Impressions provide an opportunity for estimating potential, Engagement and Conversion Rates evaluate what actually occurred, letting a client actively manage the performance of their ads.

Relevance Score

With numerous metrics vying for the attention of a client, large ad networks run by companies like Google and Facebook have created proprietary scoring systems that estimate an ad’s quality or relevance as a simple score between 1 and 10. While these scoring algorithms are not fully public, it is known that metrics like Impressions, Engagement Rate, and Exposure are used, in part, to calculate Relevance Scores.

To reward highly relevant ads, platforms like Facebook use an ad’s Relevance Score to determine what a client pays to have their ad displayed, with a higher score resulting in a lower price.

This sophisticated metric can be a difference-maker for your AdTech platform.


Revenue

While impressions and engagements are useful in measuring the success of an ad campaign, it’s important to pair those metrics with how much revenue an ad is generating.

For many clients, the end goal is sales, and the other metrics are just part of the advertising process, so accurately reporting revenue from an ad is very important.

Making it happen

Ready to dive in? Create a free Keen IO account and check out our full guide to get started with building your very own AdTech dashboard for your clients.

Have questions about AdTech analytics? Reach out to us on Slack or email us anytime!