It’s been a rough couple of months for Keen IO customers — if you’ve been tracking Keen’s Status Page recently, you’ve probably noticed that we’ve reported a large number of incidents that have negatively impacted query performance for all of our customers.
We’ve been posting follow-up information for individual incidents on the Status Page, but we wanted to take some time to:
- Acknowledge that there has been a pattern of frequent degraded query performance issues which is not acceptable to us
- Provide you with more insight into why query performance has been suffering lately
- Explain what we are doing to improve query performance
Why Has Query Performance Been So Variable?
Unanticipated Query Patterns
We deliberately designed Keen to be a very flexible analytics solution, and provided a set of APIs to Collect, Analyze, and Visualize data. Given the flexibility of the API and the growth of Keen IO, we’ve started to see customers using Keen and running Keen analysis queries that we didn’t anticipate and therefore we don’t handle gracefully.
We’ve had a streak of bad luck with our hardware and physical infrastructure and haven’t done a great job managing their impact on our service: a network outage with our hosting provider, a hardware failure, a Cassandra node failure, and a misconfigured setting on our front-end API servers.
The use of Keen IO by our customers has grown tremendously over the last several months, and we’ve run into some issues at scale that haven’t presented themselves before. A significant increase in events and queries has caused the platform to misbehave in different and unanticipated ways.
The combination of these issues has put a lot of strain on our platform, and query performance has suffered as a result.
That said, Keen should be able to handle increased customer volume, accommodate unanticipated query patterns, and keep running even when hardware/infrastructure fails. As a platform provider, it’s our job to anticipate and plan for these issues so that our customers don’t have to. We know that we need to do better. And we will!
What We’re Doing to Improve Query Performance
In the Short Term
We’ve started providing more prescriptive guidance on how to optimize queries by publishing a Query Performance Guide. We’re also working on a new version of our documentation to ensure that customers know how to optimally use Keen across the board.
We’re also currently rolling out some new internal tools to our infrastructure that will provide us much better visibility into customer query patterns, allowing us to implement better quality of service for our Analytics API and helping us harden our platform against unanticipated spikes in query volume.
Finally, we’re actively working on revising our query rate limits, and will be providing more detail on this very soon.
In addition to those short-term tactical fixes, we’re currently implementing low-level changes to our Cassandra infrastructure to better identify, isolate, and remove performance bottlenecks in our platform.
It’s important to note that rolling out fixes while maintaining an active platform for customers will not be an overnight process. The “timeframe to resolution” on this should be thought of in weeks, not days. We’re shoring up our infrastructure now to harden the platform — not just for today’s requirements, but to keep us up and running well into the future.
We are really, really sorry for the hassle and inconvenience you’ve had to put up with lately. “Scaling is hard” is an explanation, not an excuse. It’s our job to ensure that what we’ve built can stand up and perform well for all of our customers, regardless of what load we put on it, and we’re committed to keep working and improving so we can provide all of our customers a rock-solid analytics platform that can be depended on.
We’ve hit some bumps in the road. We know how frustrating it is for you and your customers when things aren’t working, and we’re working incredibly hard to make things right. We’re confident things will get better and we hope you all stick with us while we’re making these improvements.
If you have any questions, please reach out to us anytime at firstname.lastname@example.org.