Keen and GDPR

“A long-exposure shot of a busy London Underground station” by Anna Dziubinska on Unsplash

You’ve probably heard all about the EU’s new regulation, the General Data Protection Regulation (GDPR). The GDPR applies not only to EU-based businesses but also to any business that controls or processes data of EU citizens. Not only is GDPR an important step in protecting privacy for European citizens, it also raises the bar for data protection, security, and compliance in the industry.

At Keen, we’ve been hard at work to ensure that our own practices are GDPR-compliant. A big piece of that is ensuring that our product makes it easy for you to use Keen to handle data in compliance with GDPR requirements. In March 2018 we published a blog post that detailed the steps we would take in order to accomplish this.

Since that time, we’ve accomplished the following:

  • Appointed a Data Protection Officer and a data protection working team
  • Built a formal data map
  • Performed internal threat modeling and gap analysis (and set up a recurring schedule)
  • Adopted and formalized written policies around core areas, including (but not necessarily limited to): data protection, data backup, data retention, access management, and breach management and reporting
  • Conducted formal data protection training for all Keen employees
  • Encrypted data at rest (still in progress for some data)
  • Began working with a 3rd party auditor to schedule annual security audits
  • Completed legal paperwork to confirm that our Data Sub-processors (primarily Amazon) are GDPR-compliant
  • Began offering a Data Processor Agreement to our customers upon request
  • Received Privacy Shield certification

There are several additional security enhancements that we will continue to iterate on and improve over time:

  • More granular access controls, allowing Keen employees to be granted access according to the Principle of Least Privilege
  • Full customer data access audit history
  • Lockdown of Keen employee devices, and/or limiting access to customer data to certain approved devices

A note about data deletion

During our many conversations with customers about their GDPR compliance efforts and concerns, the most common theme was the need for various types of data deletion. Some examples that we’ve heard include:

  • specific property removal from all events
  • deletion (or anonymization) of all events matching certain filters (e.g. all events with a specific property value, for “right to be forgotten” requests)
  • one-time deletions of all data before some time threshold
  • on-going “expiration” of data older than some horizon

While the Keen delete API endpoint can handle some of these at small scale, for larger use cases we felt that a more powerful toolset was needed. That toolset is now under active, ongoing development and is used internally. It can be run on customers’ behalf on a case-by-case basis. If you have GDPR-related deletion needs, please contact us for more details.
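For the small-scale cases, a self-service deletion request might look roughly like the sketch below. It only builds the request and does not send it. The endpoint shape (a DELETE on a collection URL, with `filters` and `timeframe` query parameters and a master key in the `Authorization` header) is an assumption based on Keen's public API documentation. The project ID, key, collection name, property name, and dates are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_ROOT = "https://api.keen.io/3.0"  # assumed base URL


def build_delete_request(project_id, collection, master_key, filters=None, timeframe=None):
    """Build (but do not send) a DELETE request for events in one collection."""
    params = {}
    if filters:
        # Assumed format: a JSON-encoded list of
        # {property_name, operator, property_value} objects.
        params["filters"] = json.dumps(filters)
    if timeframe:
        # Assumed format: a JSON-encoded {"start": ..., "end": ...} object.
        params["timeframe"] = json.dumps(timeframe)
    url = f"{API_ROOT}/projects/{project_id}/events/{collection}"
    if params:
        url += "?" + urlencode(params)
    return Request(url, method="DELETE", headers={"Authorization": master_key})


# Hypothetical "right to be forgotten" request: remove every event for one user.
forget_user = build_delete_request(
    project_id="PROJECT_ID",
    collection="purchases",
    master_key="MASTER_KEY",
    filters=[{"property_name": "user.id", "operator": "eq", "property_value": "u-123"}],
)

# Hypothetical one-time deletion of everything before a time threshold.
expire_old = build_delete_request(
    project_id="PROJECT_ID",
    collection="purchases",
    master_key="MASTER_KEY",
    timeframe={"start": "2000-01-01T00:00:00Z", "end": "2017-05-25T00:00:00Z"},
)

print(forget_user.get_method(), forget_user.full_url)
# Sending would be: urllib.request.urlopen(forget_user)
```

Running the second request on a schedule with a moving end date would approximate the on-going “expiration” case, though only at small data volumes.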

Keep a lookout for more updates on our blog as we continue to make performance and security enhancements to Keen.

Keen and the EU General Data Protection Regulation (GDPR)



Update on Keen and GDPR Compliance

Keen is deeply committed to doing our part to ensure that personal data is adequately protected. As such, we are actively reviewing the requirements of EU Regulation 2016/679 (more commonly referred to as “GDPR”) and how they affect us and our customers. In this blog post we’ll try to provide as much information and guidance as possible for you to remain in GDPR compliance with Keen.

Our Data Protection Philosophy

Keen stores two different classes of data: (a) the account information of our direct customers, as provided to us via accounts on the website and/or through support channels such as e-mail or chat; and (b) data about our customers’ customers in the form of events submitted to our streams API.

We have designed our system to be resistant to attack against either class of data, but the second category (Keen’s customers’ event data) is more complicated due to the fact that we allow highly flexible content and cannot directly control what information is included or how personally identifiable or sensitive the information or data might be. For this reason we always recommend against the storage of any Personally Identifiable Information (PII) or otherwise sensitive data in event properties.

We believe that most use cases for Keen do not inherently rely on personal data and such data can be anonymized, pseudonymized, or omitted entirely without losing value. As such it is more valuable for our customer base as a whole for us to focus our engineering effort on other aspects of the product, rather than building high-assurance security protections that most customers do not need.

That said, we strive to be as secure as possible, and will continue to improve our security posture. We also recognize that some customers do have legitimate use cases for storing some amount of low-sensitivity PII (such as e-mail or IP addresses, for example), and those require a somewhat more rigorous data protection strategy than what we have in place now. So over the coming months we are making investments to move in that direction.

How Keen Secures Data Today

Our data protection strategy spans several dimensions: technology, people, and processes.


Technology

The most direct way that we protect data is by limiting access to it using standard industry best practices. All data is stored on hardware in Amazon’s AWS cloud, using a VPC to isolate all servers from the outside internet. These systems can only be accessed via a set of bastion hosts which are regularly updated with the latest security patches, and which can only be connected to using SSH channels secured by a select group of Keen employees’ cryptographic access keys. We’ve also adopted strict requirements around access to the AWS environment itself, including mandatory Multi-Factor Authentication (MFA) and complex passwords.

This structure makes direct access to our internal systems quite difficult for an unauthorized person, but it cannot protect our public-facing endpoints, such as our website and our API. We secure these via the access keys available in each Keen Project or Organization, which adhere to cryptographic best practices.

(Please note that we currently do not encrypt traffic between various internal services within our VPC, nor do we encrypt data at rest. Up to this point we have not felt that there was much value in doing so, since the only practical exploit of this would require direct physical access to Amazon infrastructure. However, we do plan to enable basic data-at-rest encryption soon; see roadmap below.)


People

The Keen web UI includes a mechanism by which authorized Keen employees can view customer data directly. This is used to help investigate and address any issues or questions reported to us by customers, as well as occasionally by our operational engineering team to diagnose and mitigate degradation of service. The mechanism is password-protected and limited to those who require it to provide customer support or to fulfill other responsibilities.

We also adhere to a policy of only using this access when it is necessary, and will seek permission before viewing customers’ raw event data. (In rare circumstances where the need is urgent, such as a system-wide outage, we may skip this step — but only as a last resort.)

Currently this “root” access is all-or-nothing, and we rely on our hiring and training processes to mitigate the risk of unnecessary access by a Keen employee. The build-out of a granular access control system is on our roadmap (see below).


Processes

We adhere to the following processes to help ensure that data is kept safe:

  • Access management: when a Keen employee leaves the company, we follow a checklist to ensure that all of their permissions are revoked.
  • Design and code reviews: all changes to the system are reviewed carefully by senior engineers, as well as tested in an isolated staging environment prior to deployment to production.
  • Threat modeling: periodically we review the threat model and try to identify gaps, assess risk, and determine what mitigations (if any) should be prioritized.
  • Automated backups: all data is automatically backed up to Amazon S3 to allow us to recover in the event of a catastrophic loss, whether due to malicious attack or other unexpected events. These backups age out over time, so any data which is removed from the source will eventually no longer appear in the backups. (We currently can’t offer any guarantees about how long it will be for any specific piece of data.)
  • Data retention: Keen stores data for as long as it is necessary to provide services to our customers and for an indefinite period after a customer stops using Keen. In most cases, data associated with a customer account will be kept until a customer requests deletion. (There is also a self-service delete API which is suitable for removing small amounts of data.)

Our Security and Privacy Roadmap

We will be making improvements to all of the above according to the following roadmap.

What we are intending to deliver by the GDPR deadline

GDPR goes into effect on May 25, 2018. Prior to that time Keen intends to:

  • Appoint a Data Protection Officer and a data protection working team
  • Build a formal data map
  • Perform internal threat modeling and gap analysis (and set up a recurring schedule)
  • Adopt and/or formalize written policies around core areas, including (but not necessarily limited to): data protection, data backup, data retention, access management, and breach management and reporting
  • Institute formal data protection training for all Keen employees
  • Encrypt data at rest
  • Schedule annual security audit with a 3rd party auditor (however the audit may not be completed until later in 2018)

We also intend to do the necessary legal paperwork to be able to confirm that our Data Sub-processors (primarily Amazon) are GDPR-compliant, and to be able to offer a Data Sub-processor Addendum to the contracts of customers who request it.

What we hope to improve over time

The following are examples of additional security enhancements that will not be addressed by the May 25 deadline:

  • More granular access controls, allowing Keen employees to be granted access according to the Principle of Least Privilege
  • Full data access audit history
  • Lockdown of Keen employee devices, and/or limiting access to customer data to certain approved devices
  • Integration with an intrusion detection system/service
  • Industry certifications

In addition, we expect that threat modeling and gap analysis (both our own and those done by a 3rd party auditor) will identify opportunities to further harden the system and provide redundant layers of risk mitigation. Those will be prioritized and incorporated into our roadmap as appropriate.

Next Steps

Ultimately our goal is to make Keen as valuable as possible to all of our customers. We appreciate your understanding, and also greatly value your input. If you have questions, concerns, or feedback about our approach or how it will affect your own GDPR compliance efforts, please reach out to us!


Order and Limit Results of Grouped Queries (Hooray!)

Greetings Keen community! I’d like to make a quick feature announcement that will (hopefully) make many of you happy 😊

At Keen IO we’ve created a platform for collecting and analyzing data. In addition to the ability to count the individuals who performed a particular action, the API includes the ability to group results by one or more properties of the events (similar to the GROUP BY clause in SQL). For example: count the number of individuals who made a purchase and group by the country they live in. This makes it possible to see who made purchases in the United States versus Australia or elsewhere.


Results of a group_by query, complete with dozens of annoying tiny slices

This grouping functionality can be very powerful, but there’s one annoying drawback: if there are many different values for your group_by property then the results can get quite large. (In the example above note all of the tiny slivers representing countries with only a handful of purchases.) What if I’m only interested in the top 5 or 10? Until now the only option was to post-process the response on the client (e.g. using Python or JavaScript) to sort and then discard the unwanted groups.
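For reference, that client-side clean-up might look something like this minimal sketch. The response shape (a list of objects under `result`, each holding the group value and its computed `result`) is an assumption, and the property name and numbers are made up for illustration.

```python
# Hypothetical response from a count query grouped by "user.country".
response = {
    "result": [
        {"user.country": "US", "result": 412},
        {"user.country": "AU", "result": 97},
        {"user.country": "IS", "result": 3},
        {"user.country": "DE", "result": 58},
        {"user.country": "NZ", "result": 11},
    ]
}


def top_groups(groups, n):
    """Sort groups by their "result" value, largest first, and keep the top n."""
    return sorted(groups, key=lambda group: group["result"], reverse=True)[:n]


top_three = top_groups(response["result"], 3)
print([group["user.country"] for group in top_three])
```

It works, but every client has to download the full set of groups and repeat the same sort-and-slice logic.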

Today I’m excited to announce that, by popular demand, we’ve made this much easier! We recently added a feature called order_by that allows you to rank and return only the results that you’re most interested in. (To those familiar with SQL: this works very much like the ORDER BY clause, as you might expect.)

The order_by parameter orders results returned by a group_by query. The feature includes the ability to specify ascending (ASC) or descending (DESC) ordering, and allows you to order by multiple properties and/or by the result of the analysis.

Most importantly the new order_by feature includes the ability to limit the number of groups that are returned (again, mirroring the SQL LIMIT clause). This type of analysis can help answer important questions such as:

  • Who are the top 100 game players in the US?
  • What are the top 10 most popular article titles from last week?
  • Which 5 authors submitted the most articles last week?
  • What are the top 3 grossing states based on the sum of purchases during Black Friday?

order_by can be used with any Keen query that has a group_by, which in turn can be used with most Keen analysis types. (limit can be used with any order_by query.) For more details on the exact API syntax please check out the order_by API docs.
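As a rough illustration, a “top N” query body could be assembled as in the sketch below. Treat the parameter shapes as assumptions to check against the order_by API docs: the sketch assumes `order_by` takes an object with `property_name` and `direction`, that the special name `result` orders by the computed value, and that `limit` is a plain integer. The collection, property names, and timeframe are hypothetical.

```python
import json


def build_top_n_query(collection, target_property, group_by, timeframe, limit, direction="DESC"):
    """Return a JSON-serializable body for a grouped query, ordered and truncated."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return {
        "event_collection": collection,
        "target_property": target_property,
        "group_by": group_by,
        "timeframe": timeframe,
        # Assumed: "result" orders by the computed value, not a grouped property.
        "order_by": {"property_name": "result", "direction": direction},
        "limit": limit,
    }


# Hypothetical example: top 3 grossing states by sum of purchase price.
body = build_top_n_query(
    collection="purchases",
    target_property="price",
    group_by="user.state",
    timeframe="this_7_days",
    limit=3,
)
print(json.dumps(body, indent=2))
```

The body would then be sent to the relevant analysis endpoint (a sum, in this example) in place of the earlier client-side sorting.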

There is one important caveat to call out: using order_by and limit in and of itself won’t make your queries faster or cheaper, because Keen still has to compute the full result in order to be able to sort and truncate it. But being able to have the API take care of this clean-up for you can be a real time saver; during our brief internal beta I’ve already come to rely on it as a key part of my Keen analysis toolbox.

I’d like to extend a huge thanks to our developer community for all the honest constructive feedback they’ve given us over the years (on this issue and many others). You’re all critical in helping us understand where we can focus our engineering efforts to provide the most value. On that note: we have many more product enhancements on the radar for 2018, so if you want to place your votes we’re all ears! Feedback (both positive and negative) on the order_by feature is also welcome, of course. Please reach out to us at any time 🚀

Kevin Litwack | Platform Engineer

Tracking GitHub Data with Keen IO

Today we’re announcing a new webhook-based integration with one of our favorite companies, GitHub!

We believe an important aspect of creating healthy, sustainable projects is having good visibility into how well the people behind them are collaborating. At Keen IO, we’re pretty good at capturing JSON data from webhooks and making it useful, which is exactly what we’ve done with GitHub’s event stream. By allowing you to track and analyze GitHub data, we’ve made it easy for open source maintainers, community managers, and developers to view and discover more information to quantify the success of their projects.

This integration records everything from pushes, pull requests, and comments, to administrative events like project creation, team member additions, and wiki updates.

Once the integration is set up, you can use Keen IO’s visualization tools like the Explorer, Dashboards, and Compute API to dig into granular workflow metrics, like:

  • Total number of first-time vs. repeat contributors over time
  • Average comments per issue or commits per pull request, segmented by repo
  • Pull request additions or deletions across all repositories, segmented by contributor
  • Total number of pull requests that are actually merged into a given branch

Number of comments per day on Keen IO’s JavaScript library repos

Number of pull requests per day merged in Keen IO’s repos (“false” represents not merged)

Percentage of different author associations of pull request reviews

Ready to try it out?

Assigning webhooks for each of these event types can be a tedious process, so we created a simple script to handle this setup work for you.

Check out the setup instructions here. With four steps, you will be set up and ready to rock in no time.
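To give a sense of the setup work involved, here is a minimal sketch of the request bodies needed to register one webhook per event type. This is not the setup script mentioned above. The body fields follow GitHub's “create a repository webhook” REST endpoint, the event names are a small sample of GitHub's webhook events, and the target URL is a hypothetical placeholder rather than a real Keen endpoint.

```python
import json

# A few GitHub webhook event names matching the activity described above
# ("gollum" is GitHub's name for wiki updates).
EVENT_TYPES = ["push", "pull_request", "issue_comment", "member", "gollum"]


def build_hook_payloads(target_url_template, event_types=EVENT_TYPES):
    """Return one "create webhook" request body per event type."""
    return [
        {
            "name": "web",
            "active": True,
            "events": [event],
            "config": {
                # Hypothetical: one destination URL per event type.
                "url": target_url_template.format(event=event),
                "content_type": "json",
            },
        }
        for event in event_types
    ]


payloads = build_hook_payloads("https://example.com/webhooks/{event}")
print(json.dumps(payloads[0], indent=2))
```

Each body would be POSTed to the repository's hooks endpoint with an authenticated request, once per repository you want to track.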

What metrics are you excited to discover?

We’d love to hear from you! What metrics and charts would you like to see in a dashboard? What challenges have you had working with GitHub data? We’ve talked to a lot of open source maintainers, but we want to hear more from you. Feel free to respond to this blog post or send us an email. Also, if you build anything with your GitHub data, we’d love to see it! ❤

Announcing Hacktoberfest 2017 with Keen IO

It’s October, which you probably already know! 👻 But more importantly, that means it is time for Hacktoberfest! Keen IO is happy to announce we will be joining Hacktoberfest this year.

What is Hacktoberfest?

Digital Ocean, together with GitHub, launched Hacktoberfest in 2014 to encourage contributions to open source projects. If you open four pull requests on any public GitHub repo, you get a free limited edition shirt from Digital Ocean. You can find issues in hundreds of different projects on GitHub using the hacktoberfest label. Last year, 29,616 registered participants opened at least four pull requests to complete Hacktoberfest successfully, which is amazing. 👏

Hacktoberfest with Keen IO

If you have ever seen our Twitter feed, you know at Keen IO we love sending our community t-shirts. So, we have something to sweeten the deal this year. If you open and get at least one pull request merged on any Keen IO repo, we will send you a free Keen IO shirt and sticker too.

You might wonder… What kind of issues are open on Keen IO GitHub repos? Most of them are on our SDK repos for JavaScript, iOS/Swift, Java/Android, Ruby, PHP, and .NET. Since we value documentation as a form of open source contribution, there’s a chunk of them that are related to documentation updates. We labeled issues with “hacktoberfest” that have a well-defined scope and are self-contained. You can search through them here.

If you have an issue in mind that doesn’t already exist, feel free to open an issue on a Keen IO repository and we can discuss if it is an issue that is a good fit for Hacktoberfest.

Now, how do you get your swag from Keen IO?

First, submit a pull request for any of the issues labeled “hacktoberfest”. It isn’t required, but it is also helpful to comment on the issue you are working on to say you want to complete it. This prevents other people from doing duplicate work.

If you are new to contributing to open source, this guide from GitHub is super helpful. We are always willing to walk you through it too. You can reach out in issues and pull requests, email us, or join our Community Slack.

Then, once you have submitted a pull request, go through the review process, and get your PR merged, we will ask you to fill out a form for your shirt.

Also, don’t forget to register for your limited edition Hacktoberfest shirt from Digital Ocean if you complete four pull requests on any public GitHub repository. They also have more details on the month-long event.

These candy corns are really excited about Hacktoberfest

Thank you! 💖

We really appreciate your interest in contributing to open source projects at Keen IO. Currently, we are working to make it easier to contribute to any of the Keen IO SDKs and are happy to see any interest in the projects. There’s an issue open for everyone from someone wanting to practice writing documentation to improving the experience of using the SDKs. Every contribution makes a difference and matters to us. At the same time, we are happy to help others try contributing to open source software. Can’t wait to see what you create!

See you on GitHub! 👋


P.S. Keen IO has an open source software discount that is available to any open source or open data project. We’d love to hear more about your project of any size and share more details about the discount. We’d especially like to hear about how you are using Keen IO or any analytics within your project. Please feel free to reach out for more info.

SendGrid and Keen IO have partnered to provide a robust email analytics solution

Today we’re announcing our partnership with SendGrid to provide the most powerful email analytics for SendGrid users.


SendGrid Email Analytics — Powered by Keen IO

Connect to Keen from your SendGrid account in seconds. Start collecting and storing email data for as long as you need it. No code or engineering work required!

The SendGrid Email Analytics App operates right out-of-the-box to provide the essential dashboards and metrics needed to compare and analyze email campaigns and marketing performance. Keen’s analytics includes capabilities for detailed drill down to understand users and their behavior.

Keen IO’s analytics with SendGrid enables you to:

  • Know who is receiving, opening, and clicking emails in realtime
  • Build targeted campaigns based on user behavior and campaign performance
  • Find your most or least engaged users
  • Extract lists of users for list-cleaning and segmentation
  • Drill in with a point-and-click data explorer to reveal exactly what’s happening with your emails
  • Keep ALL of your raw email event data (No forced archiving)
  • Build analytics for your customers directly into your SaaS platform
  • Programmatically query your email event data by API




The solution includes campaign reports, as well as an exploratory query interface, segmentation capabilities, and the ability to drill down into raw email data.

Interested in learning more? Check out the Keen IO Email Analytics Solution on SendGrid’s Partners Marketplace.

.NET Summer Hackfest Round One Recap

We kicked off the .NET Summer Hackfest with the goal of porting our existing Keen IO .NET SDK to .NET Standard 2.0, and I’m excited to say that we just about accomplished our goal! Our entire SDK, unit tests, and CI builds have been converted to run cross-platform on .NET Standard. All there is left to do is a little bit of clean up and some documentation updates that are in the works.

There are some big benefits to adopting .NET Standard 2.0. Here are some highlights:

  • The Keen .NET SDK can be used with .NET Core, which means it can be included in apps deployed on Linux, Mac OS, and cool stuff like Raspberry Pi
  • Mono-based projects will be officially supported in their next version. They may or may not have worked before, but now they will work for sure. This also means Unity can use the new .NET Standard library!
  • We can multi-target to reduce the size and complexity of the codebase
  • All the Xamarin variations will be supported in their next version

Everyone who contributed during this event was open, collaborative, and ready to learn and teach. We were very happy to be a part of this and look forward to future ‘hackfests’.

I’d like to give a special shoutout and thanks to our community contributors who jumped in on the project: Doni Ivanov & Tarun Pothulapati

I’d also like to thank Justin & Brian from our team, Jon & Immo from Microsoft, & Microsoft MVP Oren for all their work and support during our two week sprint.

How to Stop New Parents from Quitting Their Jobs

I didn’t realize how important my identity as a ‘high-performer’ was to me until I tried to return to work after having my first child. There were the predictable struggles of “why am I leaving my baby with a stranger?” panic and “ugh, it’s seriously difficult to concentrate with only 3 hours of sleep” exhaustion. What I didn’t predict was the surprising amount of self-induced shame in what I perceived as not performing at my old level of productivity.

In my old life, if I ever felt I wasn’t being particularly productive, I could easily log extra time. In my new life, those hours were directly taking time away from a new baby I was already struggling with being separated from. This sense of being a crap mom (because I was away so much) and a crap employee (because I wasn’t working the hours I used to) translated into a feeling of failing at everything. Intellectually I knew I was applying unrealistic expectations to myself but emotionally, the feeling was unshakeable.

The statistic that 43% of new moms quit their job within the first three months of returning to work suddenly made sense. Even 50% of men report they’ve passed up work opportunities, switched jobs or quit after babies arrive. When I saw that figure while pregnant, I thought it was shockingly high. Now I get it. I seriously wrestled with the temptation of quitting to escape the fear of being a low performer. I would have rather quit than suck at my job. You’re seeing into my psyche here and maybe wondering if I have self-confidence issues. But hearing similar stories, including stories of women turning down promotions after they returned from maternity leave, I realized I wasn’t alone in this struggle.

There are many factors that contribute to the 43% statistic, including cost of childcare, concerns about quality of childcare, desire to not be separated from child, and/or the stress of trying to juggle the new role as parent with a job. For some, leaving a paycheck may not be an option. However, these parents may choose to avoid demanding projects, not compete for promotion, avoid risky innovation, or disengage from their work in other ways.

What can be done for the new parents who want to remain a contributing member of a team, but have to navigate some very concrete logistical challenges of their new life? Many Bay Area companies are already savvy in terms of providing generous leave plans and amenities. Organizations need to explore the less obvious challenges of returning to work to move the needle on their retention statistics.

Why is it so hard?

New time constraints. Gone are the old days of working into the evening if there is a big project. Like many parents who’ve returned to work, my day has a new hard stop when I need to get my kid from our caregiver. Plus, while I am at work, I have to lock myself in an isolation chamber every three hours to pump. The actual pumping process takes 15 minutes, then another 15 minutes to wash bottles and put the milk in the fridge. This schedule makes the process of being productive a little maddening. (Pro tip: putting a refrigerator in the pumping room saves mom those 15–20 minutes, since she can put milk away in the same room and refrigerate her pump parts instead of washing them between pumping sessions.)

Lessened work relationships. The impact of this time constraint is that I’ve become cutthroat about time management. I am much less patient with people using me as a thought partner or trying to grab unscheduled time. It has also meant spending less time eating lunch with co-workers (time now dedicated to pumping) and far less time at the proverbial water cooler chatting about the weekend (never mind that I haven’t been outside my house past 7pm in eight months, so I wouldn’t have much to contribute). I don’t feel I can spend time on non-work topics. I realize diminishing connection time impacts trust and ability to collaborate, but it’s a trade-off I feel is necessary.


Photo by Andrew Branch

A belief that things can “Return to Normal.” While this one is applicable to both primary and secondary caregivers, I think returning fathers/secondary caregivers are hit with this one the hardest. While most people tend to remember that a woman has literally just birthed a child, there is a pervading sentiment that after a few weeks of leave, fathers/partners are ready to get back to work assuming the same role, hours, and priorities that were present before their child was born. And yet, they’ve taken on a whole new job that they are expected to assimilate flawlessly. Obviously figuring out how to suddenly juggle two jobs is a stressful experience for most!

A friend reminded me of the challenge of dealing with lowered self-esteem. As she eloquently put it:

When I returned to work, I literally felt slower and dumber than I ever have in my life. A number of factors contributed: being sleep deprived literally makes you dumber, being disconnected from work for a few weeks makes you feel behind, being on painkillers for 6 weeks turns off the intellectual part of your mind, a shifting sense of identity makes you feel like you’ve lost your footing, and having a weird lumpy body that leaks pee and milk everywhere is also a thing that affects confidence.

Indeed, in the early weeks after returning to work, I remember feeling like I had a secret second life. Outside of work, I was a ball of emotions, crying in frustration because we were out of toothpaste, or crumpling in tears hearing Joni Mitchell. And then, I’d put on my business clothes and try to project confidence. We all fake our personas to an extent, but it felt particularly exhausting in those early days.

The above examples are meant to illustrate and provide insight into the attrition statistic. The subtle challenges new parents face aren’t solved by extending parental leave from 12 weeks to 16 weeks. While the extra time is nice, in my experience, it doesn’t impact the real challenge of finding homeostasis in a new life. (Switching to a year of paid leave would be a different story.) So again, what can organizations do to become more family friendly, beyond the expected table stakes of paid leave and a mother’s room?

I found three things helped me significantly. These are things that cost little to nothing for organizations to adopt to help new parents with the transition back to work:

Three no-cost things your organization can do

1: Talk About It. As Brene Brown writes:

We all experience shame. We’re all afraid to talk about it. And, the less we talk about it, the more we have it…If we cultivate enough awareness about shame to name it and speak to it, we’ve basically cut it off at the knees.

Photo by Matthew Henry

Luckily for me, Keen has an established weekly “Introspection Happy Hour” where we literally sit in a circle and talk about our feelings. Even with this established ritual in place, it took a fair amount of time and courage to openly admit my struggles and fear that I wasn’t adequately contributing value to the company. But to my pleasant surprise, Brene Brown was right. Once I started talking about it, the shame diminished and I was able to concentrate on my work.

Perhaps your organization doesn’t have a dedicated practice of a feelings circle. You can still create socialization times that new parents can attend, such as team lunches, when parents aren’t rushing home to pick up kids from other caregivers. These times help to create the space for parents to chat about their experience with peers at work. Another way to encourage sharing of experiences is by creating a #parenting channel in Slack (or whatever interoffice communication you use). This helps normalize the return process by allowing parents to commiserate with one another and establish that this discussion is okay to have at work.

Providing access to a coach (who isn’t their boss) for parents to honestly talk through their struggles also provides a great deal of support to returning parents. Everyone at Keen IO has access to confidential coaching, a program consistently cited as the number one perk of working at our organization. If you don’t have a coaching program or budget, find the closest thing you have to a supportive confidant such as an HR Business Partner. You can also supplement benefit programs with additional resources designed specifically for new parents.

2: An Executive Champion. The first few months after my return to work, our Chief Data Scientist would periodically say things to me such as, “you are doing amazing.” One time I remember sharing a feeling of stress or fatigue; I don’t remember exactly what it was but I remember her response vividly. She responded by asking me, “Did you take on too much?” This was a key moment for me. She wasn’t placating me with a response of “you are doing fine”; she was acknowledging that I was struggling and suggesting that perhaps I’d chosen to make my scope of work too big, and that I could take on less and still be a valuable member of the team. This helped me stop trying to overcompensate by saying yes to everything. Having a leader communicate a message that I was doing fine and meeting or exceeding expectations greatly helped me stop applying an undue amount of pressure and stress on myself. This type of support and conscientiousness is something all great managers can offer their returning parents.

3: Flexibility to Customize to the Individual

Leave Customization. When I was out on maternity leave, I only went completely out of touch for 2 weeks. After that I started having once per week 1:1s on an alternating schedule with the two teams I sit on (HR & Leadership Team). This enabled me to be available as a thought partner to the people covering my areas and to stay in touch with the business. It also allowed me to spend the remaining 95% of my time concentrating on my new baby and not wondering or worrying what was going on at work. As a counter example, I have a mom friend who had to turn in her laptop during her maternity leave. For four months she was expected to go completely dark. This would have given me tremendous anxiety to be so out of touch. Maybe for others, being able to unplug completely would be ideal, but it wasn’t my friend’s choice, it was her only option. Creating choice allows the parent to tailor their leave to what will best meet their unique needs. While some people might do best with a big chunk of uninterrupted leave, others might thrive with an earlier return to work and shortened work week for a period of time.


Photo by Kari Shea

Location Customization. Keen’s culture supports working remotely. 38% of our staff is remote and we encourage our SF-based team to decide where they will be most productive. Many people tend to work in the office a few days per week for meetings and at home or in coffee shops one or more days per week when they need to do heads-down or creative work. Having the ability to work from home two days per week massively alleviated the stress of having to face what can be a daunting, seemingly black/white choice between my career or my baby. If I had been immediately faced with the prospect of going from being with my baby every day to being away from her 5 days per week, there is a very real chance I would have quit my job. I certainly would have been more distracted and distraught. My daughter is with a nanny during the two days I work from home. Instead of stopping work to pump like I do while at the office, I can spend those 20 minutes breastfeeding her and saying hello. These periodic visits and cuddle time make all the difference to me. Plus working from home saves me 3 hours of commute time, which I can invest in being productive instead.

Hours Customization. Allowing for a flexible schedule provides the new parent with the ability to adapt. This might mean replacing the working day of 9am — 6pm with a 9:30–4:30 (to drop off & pick up kids) supplemented by some early morning hours, some evenings, some weekends, etc. Even better if the organization focuses on the value created versus the hours clocked. As our CEO often says, “I care about outputs, not inputs.” Eliminating a “butts in chairs” culture provides new parents the opportunity to find the schedule that works for their family.


Photo by Nick Wilkes

Every new parent is different and their needs will be different. Having a first child is a big life change and some new parents may not know what they need right away. Flexibility is a massive support your organization can provide to help retain valuable employees as they find their footing as new parents.

With awareness of the subtle and not-so-subtle stresses affecting new parents, organizations can examine the benefits, culture, and return-to-work programs they provide to help support new parents. In return they’ll receive increased retention numbers and productive employees.

If this is a topic you want to learn or talk more about, check out this event on August 15th:

Insider Tips on Returning to Work After Parental Leave
Date: Tuesday, 8/15, 8–9:30am
Location: LUMINA, 338 Main St, San Francisco, CA 94105
RSVP here

9 Projects Showcased at Open Source Show and Tell 2017

The 4th annual Open Source Show & Tell wrapped up and we had such a great time experiencing and seeing some cool open source projects.


OSSAT 2017, Community-Submitted Talk: Musical Plushies: A STEAM-y Fusion by Ashley Qian

We got interactive with Ashley going on a journey building smart musical IoT plushies, and were wowed by Beth’s talk on unifying the .NET developer community.

Joel walked us through the inner workings of software development (the good, bad, and ugly), and showed us how the purely functional and open source package manager, Nix, can help with package and configuration management. Zach took us on a journey into why the open source project Steeltoe was built, and showed us how developers can write in .NET and still implement industry best practices when building services for the cloud.

We learned from Josh at Algolia how you can scale a developer community by creating webhooks for community support, and Sarah took us along a journey understanding open source’s role in cloud computing at companies like Google.

Julia presented about internationalizing if-me, an open source non-profit mental health communication platform maintained by contributors from many backgrounds and skill sets.

There were lots of other excellent talks about open source projects, like Babelfish, a self-hosted server for source code parsing, presented by Eiso, and Nicolas’s talk about helping people build better APIs following best practices via

Check out all of the topics and talks here.

Big thanks to GitHub, Google, and Microsoft for co-organizing and hosting. Looking forward to seeing you at Open Source Show and Tell next year!

We ❤ open source. We’d love to hear more about your project and share it with others. To help with any analytics needs, Keen IO has an open source software discount available to any open source or open data project. Please feel free to reach out for more info.

Visualizing your Keen IO Data with Python and Bokeh

In a previous post I wrote, we created a basic example that analyzed earthquakes using the Keen Python Client with Jupyter Notebook. In this post we’re going to be looking at creating visualizations in Python using a visualization library called Bokeh.

Getting Started

To install Bokeh, run pip install bokeh in your shell. After Bokeh has finished installing, open up Jupyter Notebook by running jupyter notebook. In the first cell, you’ll need to set up a Keen Client in Python:

import arrow
import keen
from keen.client import KeenClient
KEEN_PROJECT_ID = "572dfdae3831443195b2f30c"
KEEN_READ_KEY = "5de7f166da2e36f6c8617347a7a729cfda6d5413db8d88d7f696b61ddaa4fe1e5cdb7d019de9bb0ac846d91e83cdac01e973585d0fba43fadf92f06a695558b890665da824a0cf6a946ac09f5746c9102d228a1165323fdd0c52c92b80e78eca"
client = KeenClient(project_id=KEEN_PROJECT_ID, read_key=KEEN_READ_KEY)

We’ll need to run a query similar to the one we used last time. In the next cell, make a count_unique query on the earthquakes collection with a daily interval. This will return a list of dictionaries, one per day, each containing the number of earthquakes for that day and its timeframe.

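earthquakes_by_day = client.count_unique("earthquakes",
    target_property="id",  # assumption: a property that uniquely identifies each earthquake
    timeframe={
        "start": arrow.get(2017, 6, 12).format(),
        "end": arrow.get(2017, 7, 12).format()},
    interval="daily")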
Output of the “count_unique” query
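For readers without access to the screenshot, here is a rough sketch of the shape an interval query result takes and how it can be split into two series. The counts and timestamps below are made up for illustration (they are not real query output), and split_series is a hypothetical helper that only uses the standard library.

```python
from datetime import datetime

# Hypothetical sample in the shape of an interval query result.
# The counts are invented; real values come from the query above.
sample_result = [
    {"value": 12, "timeframe": {"start": "2017-06-12T00:00:00.000Z",
                                "end": "2017-06-13T00:00:00.000Z"}},
    {"value": 9, "timeframe": {"start": "2017-06-13T00:00:00.000Z",
                               "end": "2017-06-14T00:00:00.000Z"}},
]


def split_series(rows):
    """Return (dates, counts) lists from a list of interval rows."""
    dates = [
        datetime.strptime(row["timeframe"]["start"], "%Y-%m-%dT%H:%M:%S.%fZ")
        for row in rows
    ]
    counts = [row["value"] for row in rows]
    return dates, counts


dates, counts = split_series(sample_result)
```

Each entry carries one value plus the start and end of the interval it covers, which is why the plotting code in this post reads row["value"] and row["timeframe"]["start"].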

Let’s import Bokeh so we can visualize earthquakes_by_day. Run the code below in a new cell.

from bokeh.plotting import figure, show
from bokeh.io import output_notebook
# Draw plots inline, below the notebook cells
output_notebook()

The first line imports figure and show, two functions that will let us plot our data. The next line imports a function called output_notebook. We need to run this method before we start plotting data so our plots are drawn below our notebook cells.

Plot Our Data

A line graph would be a great choice to plot this data. In order to plot this, we need to pull out the number of earthquakes for a timeframe and the corresponding date.

y = list(map(lambda row: row["value"], earthquakes_by_day))
x = list(map(lambda row: arrow.get(row["timeframe"]["start"]).datetime.replace(tzinfo=None), earthquakes_by_day))
# Now we can plot!
# `figure` initializes the chart object
pl = figure(title="Number of Earthquakes per Day", x_axis_type="datetime")
# `line` takes lists of the x and y values and plots them.
pl.line(x, y)
# `show` draws the chart in the notebook
show(pl)
Line graph generated by Bokeh

In the code sample above, y is a list containing the counts per day, x a list of the datetime values, and figure initializes the chart object. The line method takes x and y and plots those values into a line. show(pl) is the method that actually draws the chart in our notebook.

Customize Our Chart

Bokeh even lets us add tooltips to our charts! We can import HoverTool by calling from bokeh.models import HoverTool in a new cell and pass an instance of HoverTool to our figure object.

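from bokeh.models import HoverTool
pl = figure(title="Number of Earthquakes per Day", x_axis_type="datetime")
pl.line(x, y)
hover = HoverTool(
    tooltips=[
        ("Date", "@x{%F}"),
        ("Count", "@y{int}")
    ],
    # Newer Bokeh releases expect the key "@x" here instead of "x"
    formatters={"x": "datetime"},
)
# Attach the hover tool to the figure, then draw it
pl.add_tools(hover)
show(pl)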
Tooltips in our graph!

We have to do a little bit of configuration in HoverTool to make sure the tooltips display the correct date and don’t display any values in between the data points (try removing the tooltips option and see what’s displayed). You can check out the Bokeh docs on HoverTool if you want the tooltips to look different.

Notice that there are a lot of earthquakes on 6/17! This might be an interesting place to dive deeper.

We pulled data from a Keen project using Python, drew a line graph for a month’s worth of data, and added interactivity to the chart we drew. The code for this example is available on GitHub. Try playing around with it yourself! If you want to use this example to visualize your own event data, sign up for your own Keen account and read how to get started!

Next time, we’ll plot the earthquakes that happened in that time period using Basemap and see if we can find anything interesting.

11 Beautiful Event Data Models

Photo by Samuel Zeller

One of the most common requests that I get here at Keen is for help with data modeling. After all, you’ve got to collect the right data in order to get any value out of it. Here’s an inventory of common, well-modeled events across a variety of industries:

  1. B2B SaaS (create_account, subscribe, payment, use_feature)
  2. E-Commerce (view_item, add_to_cart, purchase)
  3. Gaming (create_user, level_start, level_complete, purchase)

All of the examples are live code samples that you can run and test yourself by clicking “Edit in JSFiddle”.

B2B SaaS Event Data Models

Track what’s happening in your business so that you can make good decisions. With just a handful of key events, you have the foundation for the classic SaaS pirate metrics (AARR: Acquisition, Activation, Revenue, Retention).

Create Account Event (Acquisition)

Capture an event when someone signs up for the first time or creates an account in some other way.
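As a rough illustration in Python, a create_account event could be modeled as a nested dictionary. Every field name here (user, plan, referrer, signup_method) is an assumption chosen for the example, not a required schema, and create_account_event is a hypothetical helper.

```python
def create_account_event(user_id, plan, referrer, signup_method):
    """Build a hypothetical create_account event as a plain dict."""
    return {
        "user": {"id": user_id, "plan": plan},
        # Where the signup came from, useful for acquisition analysis
        "referrer": referrer,
        "signup_method": signup_method,
    }


event = create_account_event(
    user_id="u_123",
    plan="free_trial",
    referrer="newsletter",
    signup_method="email",
)

# With the Keen Python client, an event like this could be recorded with:
# client.add_event("create_account", event)
```

Grouping related properties under a parent key such as user keeps the event readable as more attributes are added later.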

Subscribe (Acquisition)

Track an event when someone subscribes to your newsletter, chatbot, etc.

Use Feature (Activation)

It’s really common for product managers and marketers to want to know who is doing what in their products, so they can make roadmap decisions and set up marketing automation. Here’s an example of an event where the user has used a feature called “Subscribe to SMS alerts”.

By including details about the feature on the event, you can provide yourself a nice dataset for later A/B testing and analysis. (e.g. did changing the button text increase or decrease usage?).
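A minimal sketch of that idea in Python, assuming hypothetical field names (feature.name, feature.button_text, feature.variant) and invented sample data. It shows how feature details stored on each event make a later A/B comparison a simple count.

```python
from collections import Counter


def use_feature_event(user_id, feature_name, button_text, variant):
    """Build a hypothetical use_feature event with details about the feature."""
    return {
        "user": {"id": user_id},
        "feature": {
            "name": feature_name,
            "button_text": button_text,
            "variant": variant,
        },
    }


# Invented sample events for two button-text variants
feature_events = [
    use_feature_event("u_1", "Subscribe to SMS alerts", "Subscribe", "A"),
    use_feature_event("u_2", "Subscribe to SMS alerts", "Get alerts", "B"),
    use_feature_event("u_3", "Subscribe to SMS alerts", "Get alerts", "B"),
]

# Count usage per variant to compare the two button texts
usage_by_variant = Counter(e["feature"]["variant"] for e in feature_events)
```

Because the button text travels with each event, the comparison does not depend on remembering which variant was live on which date.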

Invoice Payment (Revenue & Retention)

This is a simplified example of an invoice payment event. If you use Stripe for payments, you can consume their event firehose into Keen directly and don’t need to model it yourself.

You can see the full Stripe Invoice object here.

Check out more SaaS analytics uses and applications.

E-commerce Event Data Models

Track what’s happening in your store so that you can maximize sales, marketing investments, and provide detailed analytics to your vendors.

View Item / View Product Event

People checking out your goods? Good. Track it.

Add Item to Cart

Track every time someone adds a product to their cart, bag, or basket.

Successful Checkout Event

Track an event every time an order is successfully completed. Use this event to count the total number of orders that happen on your site(s).

Use the Product Purchase Event (below) to track trends in purchases of individual items.

Product Purchase Event

Track an individual event for each item purchased. That way you can include lots of rich details about the product and easily run trends on specific products.
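The split between one checkout event per order and one purchase event per item can be sketched in Python as follows. The field names, the cents-based prices, and the order_to_events helper are all assumptions made for this example.

```python
def order_to_events(order_id, customer_id, items):
    """Return (checkout_event, purchase_events) for a single order.

    `items` is a list of dicts with sku, name, price_cents, and quantity.
    """
    total_cents = sum(i["price_cents"] * i["quantity"] for i in items)
    checkout_event = {
        "order": {
            "id": order_id,
            "total_cents": total_cents,
            "item_count": sum(i["quantity"] for i in items),
        },
        "customer": {"id": customer_id},
    }
    # One purchase event per line item, carrying the full product details
    purchase_events = [
        {
            "order": {"id": order_id},
            "customer": {"id": customer_id},
            "product": dict(item),
        }
        for item in items
    ]
    return checkout_event, purchase_events


checkout, purchases = order_to_events(
    order_id="o_1001",
    customer_id="c_42",
    items=[
        {"sku": "mug-01", "name": "Mug", "price_cents": 1200, "quantity": 2},
        {"sku": "tee-07", "name": "T-shirt", "price_cents": 2500, "quantity": 1},
    ],
)
```

Counting checkout events answers "how many orders?", while the per-item events answer "which products are selling?" without unpacking arrays at query time.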

Gaming Event Data Models

Track what’s important in your game so that you can measure activation, engagement, retention, and purchasing behavior. Check out this related guide: Data Models & Code Samples for Freemium Gaming Analytics

New Player Event

Track every time a new player starts your game for the first time.

Level Start Event

Track each time a player starts a new level in your game.

Level Complete Event

Track each time a player successfully defeats a level in your game. The data model is the same as level_start, but you’ll have far fewer of these events depending on what type of game you’ve designed.
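Since level_start and level_complete share a data model, one helper can build both. The sketch below uses assumed field names and invented sample data to show how the two collections combine into a completion rate per level.

```python
def level_event(player_id, level_number, attempt):
    """Shared hypothetical shape for level_start and level_complete events."""
    return {
        "player": {"id": player_id},
        "level": {"number": level_number, "attempt": attempt},
    }


def completion_rate(starts, completes, level_number):
    """Fraction of started attempts on a level that were completed."""
    started = sum(1 for e in starts if e["level"]["number"] == level_number)
    completed = sum(1 for e in completes if e["level"]["number"] == level_number)
    return completed / started if started else 0.0


# Invented sample data: four attempts on level 1, two of them successful
level_starts = [
    level_event("p1", 1, 1),
    level_event("p2", 1, 1),
    level_event("p1", 1, 2),
    level_event("p3", 1, 1),
]
level_completes = [
    level_event("p1", 1, 2),
    level_event("p2", 1, 1),
]

rate = completion_rate(level_starts, level_completes, level_number=1)
```

A low completion rate on a particular level is a hint that it may be too hard, which is the kind of question these two events are meant to answer.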

In-Game Purchase Event

Track when players make a purchase in your game, so you know when you get that $.

A new way to debug your data models

We’re excited to announce the new and improved Streams Manager for inspecting the data schema of your event collections in Keen IO. We built the Streams Manager so you can ensure your data is structured well and set up to get the answers you need.

With Streams Manager you can:

  • Inspect and review the data schema for each of your event collections
  • Review the last 10 events for each of your event collections
  • Delete event collections that are no longer needed
  • Inspect the trends across your combined data streams over the last 30-day period

The Streams Manager can be found within the ‘Streams’ tab of your Project Console.

Streams Manager within the Streams Tab of your Keen IO Project Console
Inspect your data models with the Streams Manager

Ready to get started? Log-in to your Keen IO account or create a new account to start streaming data.

Questions or feedback? Hit us up anytime on Slack.

Architecture of Giants: Data Stacks at Facebook, Netflix, Airbnb, and Pinterest

Photo by Ondrej Supitar

Here at Keen IO, we believe that companies who learn to wield event data will have a competitive advantage. That certainly seems to be the case at the world’s leading tech companies. We continue to be amazed by the data engineering teams at Facebook, Amazon, Airbnb, Pinterest, and Netflix. Their work sets new standards for what software and businesses can know.

Because their products have massive adoption, these teams must continuously redefine what it means to do analytics at scale. They’ve invested millions into their data architectures, and have data teams that outnumber the entire engineering departments at most companies.

We built Keen IO so that most software engineering teams could leverage the latest large-scale event data technologies without having to set up everything from scratch. But, if you’re curious about what it would be like to be a giant, continue on for a collection of architectures from the best of them.


Netflix

With 93 million MAU, Netflix has no shortage of interactions to capture. As their engineering team describes in the Evolution of the Netflix Data Pipeline, they capture roughly 500 billion events per day, which translates to roughly 1.3 PB per day. At peak hours, they’ll record 8 million events per second. They employ over 100 people as data engineers or analysts.
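To put those figures in perspective, here is a quick back-of-the-envelope calculation in Python. It assumes decimal petabytes (1 PB = 10^15 bytes) and an even spread of events across the day, neither of which is stated in the source.

```python
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 500e9          # roughly 500 billion events per day
bytes_per_day = 1.3e15          # roughly 1.3 PB per day (decimal PB assumed)
peak_events_per_second = 8e6    # 8 million events per second at peak

# Average rate if events were spread evenly over the day
avg_events_per_second = events_per_day / SECONDS_PER_DAY

# Average payload size implied by the two daily totals
avg_bytes_per_event = bytes_per_day / events_per_day

# How much busier the peak is than the daily average
peak_to_average = peak_events_per_second / avg_events_per_second

print(f"{avg_events_per_second / 1e6:.2f} million events/second on average")
print(f"{avg_bytes_per_event:.0f} bytes per event on average")
print(f"peak is {peak_to_average:.2f}x the average rate")
```

Under these assumptions the average works out to a little under 6 million events per second at about 2.6 KB each, so the quoted peak is less than one and a half times the daily average.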

Here’s a simplified view of their data architecture from the aforementioned post, showing Apache Kafka, Elasticsearch, AWS S3, Apache Spark, Apache Hadoop, and EMR as major components.

Source: Evolution of Netflix Data Pipeline


Facebook

With over 1B active users, Facebook has one of the largest data warehouses in the world, storing more than 300 petabytes. The data is used for a wide range of applications, from traditional batch processing to graph analytics, machine learning, and real-time interactive analytics.

In order to do interactive querying at scale, Facebook engineering invented Presto, a custom distributed SQL query engine optimized for ad-hoc analysis. It’s used by over a thousand employees, who run more than 30,000 queries daily across a variety of pluggable backend data stores like Hive, HBase, and Scribe.


Airbnb

Airbnb supports over 100M users browsing over 2M listings, and their ability to intelligently make new travel suggestions to those users is critical to their growth. Their team runs an amazing blog, AirbnbEng, where they wrote about Data Infrastructure at Airbnb last year.

At a meetup we hosted last year, “Building a World-Class Analytics Team”, Elena Grewal, a Data Science Manager at Airbnb, mentioned that they had already scaled Airbnb’s data team to 30+ engineers. That’s a $5M+ annual investment on headcount alone.

Keen IO

Keen IO is an event data platform that my team built. It provides big data infrastructure as a service to thousands of companies. With APIs for capturing, analyzing, streaming, and embedding event data, we make it relatively easy for any developer to run world-class event data architecture, without having to staff a huge team and build a bunch of infrastructure. Our customers capture billions of events and query trillions of data points daily.

Although a typical developer using Keen would never need to know what’s happening behind the scenes when they send an event or run a query, here’s what the architecture looks like that processes their requests.

Keen IO Event Data Platform

On the top row (the ingestion side), load balancers handle billions of incoming post requests as events stream in from apps, web sites, connected devices, servers, billing systems, etc. Events are validated, queued, and optionally enriched with additional metadata like IP-to-geo lookups. This all happens within seconds.

Once safely stored in Apache Cassandra, event data is available for querying via a REST API. Our architecture (via technologies like Apache Storm, DynamoDB, Redis, and AWS Lambda) supports various querying needs, from real-time data exploration on the raw incoming data to cached queries which can be instantly loaded in applications and customer-facing reports.


Pinterest

Pinterest serves over 100M MAU generating more than 10B pageviews per month. As of 2015, they had scaled their data team to over 250 engineers. Their infrastructure relies heavily on Apache Kafka, Storm, Hadoop, HBase, and Redshift.

Pinterest Data Architecture Overview

Not only does the Pinterest team need to keep track of enormous amounts of data related to Pinterest’s customer base; like any social platform, they also need to provide detailed analytics to their ad buyers. Tongbo Huang wrote “Behind the Pins: Building Analytics at Pinterest” about their work revamping their analytics stack to meet that need. Here’s how they used Apache Kafka, AWS S3, and HBase to do it:

Data Architecture for Pinterest Analytics for Businesses
User View of Pinterest Analytics for Businesses

Twitter / Crashlytics

In Handling 5 Billions Sessions Per Day — in Real Time, Ed Solovey describes some of the architecture built by the Crashlytics Answers team to handle billions of daily mobile device events.

Event Reception
Batch Computation
Speed Computation
Combined View

Thank You

Thank you to the collaborative data engineering community who continue to not only invent new data technology, but to open source it and write about their learnings. Our work wouldn’t be possible without the foundational work of so many engineering teams who have come before us. Nor would it be possible without those who continue to collaborate with us day in and day out. Comments and feedback welcome on this post.

Special thanks to the authors and architects of the posts mentioned above: Steven Wu at Netflix, Martin Traverso at Facebook Presto, AirbnbEng, Pinterest Engineering, and Ed Solovey at Crashlytics Answers.

Thanks also to editors Terry Horner, Dan Kador, Manu Mahajan, and Ryan Spraetz.

Building an Empire on Event Data

Photo by Joshua K Jackson

Facebook, Google, Amazon, and Netflix have built their businesses on event data. They’ve invested hundreds of millions in data scientists and engineers, all to help them get to a deep understanding and analysis of the actions their users or customers take, to inform decisions all across their businesses.

Other companies hoping to compete in a space where event data is crucial to their success must find a way to mirror the capabilities of the market leaders with far fewer resources. They’re starting to do that with event data platforms like Keen IO.

What does “Event Data” mean?

Event data isn’t like its older counterpart, entity data, which describes objects and is stored in tables. Event data describes actions, and its structure allows many rich attributes to be recorded about the state of something at a particular point in time.
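A small Python sketch may make the contrast concrete. The records, field names, and the plan_at helper below are invented for illustration. The entity record holds only the latest state, while the event list preserves the state at each point in time.

```python
# Entity data: one record per object, overwritten as the object changes.
user_entity = {"id": "u_123", "plan": "pro", "email": "ada@example.com"}

# Event data: one record per action, capturing state at that moment.
user_events = [
    {
        "name": "create_account",
        "timestamp": "2017-01-05T10:00:00Z",
        "user": {"id": "u_123", "plan": "free"},
    },
    {
        "name": "upgrade_plan",
        "timestamp": "2017-02-01T09:30:00Z",
        "user": {"id": "u_123", "plan": "pro"},
    },
]


def plan_at(events, user_id, when):
    """Return the user's plan as of an ISO 8601 timestamp, or None."""
    plan = None
    # Timestamps share one format, so string order matches time order
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["user"]["id"] == user_id and event["timestamp"] <= when:
            plan = event["user"]["plan"]
    return plan


plan_in_january = plan_at(user_events, "u_123", "2017-01-15T00:00:00Z")
plan_in_march = plan_at(user_events, "u_123", "2017-03-01T00:00:00Z")
```

The entity record can only say the user is on the pro plan today. The events can also say what plan the user was on in January, which is the kind of question entity tables cannot answer once a row has been overwritten.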

Every time someone loads a webpage, clicks an ad, pauses a song, updates a profile, or even takes a step into a retail location, their actions can be tracked and analyzed. These events span so many channels and so many types of interactions that they paint an extremely detailed picture of what captivates customers.

Event data is sufficiently unique that it demands a specialized approach, specialized architecture, and specialized access patterns.

In the early days of data analysis, it took huge teams of data scientists and specialized data engineers to process event data for companies the size of Walmart. Now, however, even a single developer can capture billions of detailed interactions and begin running queries in seconds, accessing the data programmatically and in real time. This makes it possible to build intelligent apps and services that use insights from event data, to personalize the user experience, and display information dynamically.

One Major Challenge, but Many Rewards

A few industry giants have been able to build event data powerhouses because of the incredible access they have to talent. They hire expensive, specialized teams who build their own home-grown technology stacks. In many cases, companies like Facebook end up inventing their own distributed systems technologies to handle emergent data needs.

Most other companies lack this endless flow of resources. They can’t afford to build the infrastructure and acquire the headcount needed to maintain it. Even those that have the capital are struggling against a massive shortage of talent for roles in data infrastructure and data science. New candidates won’t materialize fast enough to build and support the world-class data capabilities every company wishes they had.

However, capturing event data is extremely important. It lets companies build a new class of products and experiences, and identify patterns that otherwise would be impossible to see. It also lets them build apps that perform far more advanced, programmatic analysis, and make real-time decisions on how to engage the user — suggesting the right product, showcasing the right content, and asking for the right actions.

Just as organizations migrated en masse from on-premise servers to cloud hosting and storage in the mid-2000s, many companies are starting to adopt data platforms like Keen so they can compete in areas they couldn’t build in-house.

Keen IO: The Event Data Platform

We built Keen to let customers use our platform as the foundation for all of the powerful event data capabilities they want to build. By leaving the analytics infrastructure to Keen, any developer or company can wield the power of event data extremely well, without a specialized background in data engineering.

We help over 3,500 customers crunch trillions of data points every day, gathering data with our APIs and storing it for them to analyze with programmatic queries to fuel any metrics or tools they need to build. Once they adopt Keen, customers report huge savings in engineering and analyst resources, far better accuracy in measuring crucial app and user analytics, and the ability to infuse real-time analytics into every part of their operations.

Event data is increasingly interwoven into software. Photo by Carlos Muza.

Event Data in Action

When companies build on an event data platform, they can accelerate their businesses in ways that weren’t possible before.

  • They anticipate what users will need and take the product in the right direction, by using event data to improve the user experience and test changes to the application or hardware.
  • They show users extremely relevant content and demand higher ad revenue from top advertisers because of the engagement metrics they derive from event data.
  • They provide deep reporting and quantify ROI for their customers — when SaaS products can provide reliable and accurate reporting, they deepen customer trust, engagement, and spend.

Can Event Data Bring a Richer Future?

The ability for companies to operate like they have Facebook’s data infrastructure is a game-changer. They can scale faster, make better decisions, and create smarter, helpful products people don’t even know they need yet. Event data will inevitably shape the way almost every company grows, and those who don’t embrace it will likely lose out to the ones who do.

Comments welcome, or start a conversation with us over at Keen IO.

Data Science Cultures: Archaeology vs. Astronomy

Water’s Early Journey in a Solar System (Artist Concept) Source: NASA/JPL-Caltech

I’ve been writing a lot about intentionality in data science, about how having a sense of history (present and future) can be incredibly powerful for any enterprise.

Think about how archaeologists use data to seek the truth, as compared with how astronomers do it.

Clay pot remnants. Credit: Wessex Archaeology

Archaeology starts with digging. It’s all about studying the data that’s buried in the system (i.e. the fossil record), which means studying things that probably weren’t put there intentionally (depending on your belief system). Without a time machine, it’s impossible to change the structure of the record, to apply intention to the signal, so we do the best we can with what we’ve got: we mine through the accidental signals, discarding the (literal) mountains of noise, in an effort to find the truth about history. Perhaps as should be expected, this effort is expensive and leads to mixed results.

On the other hand, Astronomy is a very different field. Astronomy starts way earlier than digging — it starts with planting. At instrumentation-time, astronomers can point the telescopes where they want, measuring and recording the signals they want. Unlike archaeologists, astronomers have the ability to design the record and its structure, to choose the signals with intention. Doing this intentionally sets up the data record to yield the discoveries they already know they want to make, but also to be somewhat future-proof (which means it can yield unpredicted, emergent discoveries, to be harvested down the road — often by different people than the planters).

The spacecraft Dawn’s spiral descent toward dwarf planet Ceres. Credit: NASA/JPL

Now let’s compare the results of these two fields.

Astronomy (with its cousin Astrophysics) has taught us amazing lessons, things about the motion of the galaxy, the origin of the universe, and the underlying physical principles of multi-dimensional reality.

Turning our gaze to the stars, we learn about the earth. That’s pretty impressive.

Meanwhile, as of last year, Archaeology is still struggling to figure out how many years ago Homo sapiens emerged. And they can’t seem to agree on it, even though all the data is right under our noses. This isn’t because they’re incompetent (some of the best pattern-seeking humans in all of science work in archaeology), but rather because the data sucks.

Clearly, one of these truth-seeking disciplines is a lot more powerful than the other, and at Keen IO, we contend this is because they can control the data model. Data modeling is powerful indeed.

Inspection of any kind — be it human introspection or scientific inquiry — is more powerful when you can apply a variety of observational frameworks, choosing the best of them.