Visualizing your Keen IO Data with Python and Bokeh

In a previous post, we built a basic example that analyzed earthquakes using the Keen Python client with Jupyter Notebook. In this post, we’re going to create visualizations in Python using a visualization library called Bokeh.

Getting Started

To install Bokeh, run pip install bokeh in your shell. After Bokeh has finished installing, open up Jupyter Notebook by running jupyter notebook. In the first cell, you’ll need to set up a Keen Client in Python:

import arrow
import keen
from keen.client import KeenClient
KEEN_PROJECT_ID = "572dfdae3831443195b2f30c"
KEEN_READ_KEY = "5de7f166da2e36f6c8617347a7a729cfda6d5413db8d88d7f696b61ddaa4fe1e5cdb7d019de9bb0ac846d91e83cdac01e973585d0fba43fadf92f06a695558b890665da824a0cf6a946ac09f5746c9102d228a1165323fdd0c52c92b80e78eca"
client = KeenClient(
    project_id=KEEN_PROJECT_ID,
    read_key=KEEN_READ_KEY
)

We’ll need to run a query similar to the one we used last time. In the next cell, make a count_unique query on the earthquakes collection with a daily interval. This will return a list of dictionaries, one per day, each containing the number of earthquakes for that day.

earthquakes_by_day = client.count_unique("earthquakes",
    timeframe={
        "start": arrow.get(2017, 6, 12).format(),
        "end": arrow.get(2017, 7, 12).format()
    },
    target_property="id",
    interval="daily"
)
Output of the “count_unique” query
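
To make the shape of that result concrete, here is a minimal sketch that walks over a hand-written sample. The counts and dates below are hypothetical placeholders, not real query output, and the timestamp format is assumed from what Keen typically returns for interval queries.

```python
# Hypothetical sample shaped like the result of an interval query:
# a list with one dictionary per day (the values are made up).
sample_result = [
    {"value": 310, "timeframe": {"start": "2017-06-12T00:00:00.000Z",
                                 "end": "2017-06-13T00:00:00.000Z"}},
    {"value": 295, "timeframe": {"start": "2017-06-13T00:00:00.000Z",
                                 "end": "2017-06-14T00:00:00.000Z"}},
]

# Map each day (the first 10 characters of the start timestamp) to its count.
counts_by_day = {
    row["timeframe"]["start"][:10]: row["value"]
    for row in sample_result
}

for day, count in counts_by_day.items():
    print(day, count)
```

Each row carries its own timeframe, so nothing about the ordering or spacing of the days has to be inferred from the position in the list.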

Let’s import Bokeh so we can visualize earthquakes_by_day. Run the code below in a new cell.

from bokeh.plotting import figure, show
from bokeh.io import output_notebook
output_notebook()

The first line imports figure and show, two functions that will let us plot our data. The next line imports a function called output_notebook. We need to call this function before we start plotting so that our plots are drawn below our notebook cells.

Plot Our Data

A line graph is a great choice for this data. To plot it, we need to pull out the number of earthquakes for each day and the corresponding date.

y = list(map(lambda row: row["value"], earthquakes_by_day))
x = list(map(lambda row: arrow.get(row["timeframe"]["start"]).datetime.replace(tzinfo=None), earthquakes_by_day))
# Now we can plot!
# `figure` initializes the chart object
pl = figure(title="Number of Earthquakes per Day", x_axis_type="datetime")
# `line` takes lists of the x and y values and plots.
pl.line(x, y)
show(pl)
Line graph generated by Bokeh

In the code sample above, y is a list containing the counts per day, x is a list of the corresponding datetime values, and figure initializes the chart object. The line method takes x and y and plots those values as a line. show(pl) is the call that actually draws the chart in our notebook.
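
The same extraction can also be written with list comprehensions and only the standard library. This is a minimal sketch, not the method used above: the helper name to_xy and the sample rows are hypothetical, and the timestamp format string is an assumption about how Keen formats interval boundaries.

```python
from datetime import datetime

# Assumed format of Keen's interval timestamps, e.g. "2017-06-12T00:00:00.000Z".
KEEN_TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def to_xy(rows):
    """Split interval rows into naive datetimes (x) and counts (y)."""
    x_values = [
        datetime.strptime(row["timeframe"]["start"], KEEN_TIMESTAMP_FORMAT)
        for row in rows
    ]
    y_values = [row["value"] for row in rows]
    return x_values, y_values


# Hypothetical rows, for illustration only.
sample_rows = [
    {"value": 310, "timeframe": {"start": "2017-06-12T00:00:00.000Z",
                                 "end": "2017-06-13T00:00:00.000Z"}},
    {"value": 295, "timeframe": {"start": "2017-06-13T00:00:00.000Z",
                                 "end": "2017-06-14T00:00:00.000Z"}},
]

sample_x, sample_y = to_xy(sample_rows)
print(sample_x, sample_y)
```

The resulting lists can be passed to pl.line in the same way as x and y.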

Customize Our Chart

Bokeh even lets us add tooltips to our charts! We can import HoverTool by running from bokeh.models import HoverTool in a new cell and then add an instance of HoverTool to our figure object.

from bokeh.models import HoverTool
pl = figure(title="Number of Earthquakes per Day",
    x_axis_type="datetime")
pl.line(x, y)
hover = HoverTool(
    tooltips=[
        ("Date", "@x{%F}"),
        ("Count", "@y{int}")
    ],
    # Recent Bokeh releases expect the key "@x" here instead of "x".
    formatters={"x": "datetime"},
    mode="vline"
)
pl.add_tools(hover)
show(pl)
Tooltips in our graph!

We have to do a little bit of configuration in HoverTool to make sure the tooltips display the correct date and don’t display any values in between the data points (try removing the tooltips option and see what’s displayed). You can check out the Bokeh docs on HoverTool if you want the tooltips to look different.

Notice that there are a lot of earthquakes on 6/17! This might be an interesting place to dive deeper.
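
A spike like this can also be flagged in code rather than by eye. The sketch below marks days whose count is more than twice the median; the helper name unusual_days, the factor of 2, and the sample counts are all hypothetical choices for illustration.

```python
from statistics import median


def unusual_days(rows, factor=2.0):
    """Return (date, count) pairs whose count exceeds factor * median."""
    typical = median(row["value"] for row in rows)
    return [
        (row["timeframe"]["start"][:10], row["value"])
        for row in rows
        if row["value"] > factor * typical
    ]


# Hypothetical counts, not real query output.
sample_rows = [
    {"value": 300, "timeframe": {"start": "2017-06-15T00:00:00.000Z"}},
    {"value": 320, "timeframe": {"start": "2017-06-16T00:00:00.000Z"}},
    {"value": 1500, "timeframe": {"start": "2017-06-17T00:00:00.000Z"}},
    {"value": 310, "timeframe": {"start": "2017-06-18T00:00:00.000Z"}},
]

spikes = unusual_days(sample_rows)
print(spikes)
```

The median is used instead of the mean so that a single very large day does not raise the threshold it is being compared against.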

We pulled data from a Keen project using Python, drew a line graph for a month’s worth of data, and added interactivity to the chart we drew. The code for this example is available on GitHub. Try playing around with it yourself! If you want to use this example to visualize your own event data, sign up for your own Keen account and read how to get started!

Next time, we’ll plot the earthquakes that happened in that time period using Basemap and see if we can find anything interesting.


Analyzing your Keen IO data with Jupyter Notebooks

At Keen IO, we provide the Explorer to help our customers quickly analyze and visualize their data. However, sometimes the Explorer can’t analyze or visualize data the way we want. When I run into this problem with Keen IO data, I turn to Python.

One of the tools I use regularly to analyze our own data is Jupyter Notebook, a browser application that allows data scientists to run Python code in an interactive environment and create inline visualizations. I use this tool often because I can quickly visualize and tweak multiple queries at once using Python, and I can also easily share those results with others. If my teammates need a slightly different analysis, they can easily copy my example and modify it to their own needs. Additionally, Jupyter Notebook is a great way to document how we’re analyzing our data.

Over the next few weeks, I’m going to share some of the ways I use Python and Jupyter Notebook to analyze and visualize my data. In this post, we’re going to focus on setting up Jupyter Notebook and making queries with Keen IO’s Python client.

Setup

Follow the Jupyter Notebook installation instructions. After you’re done installing Jupyter Notebook, use pip to install the Keen IO Python client.

pip install keen

Getting Data

We’re going to be looking at earthquake data that I’ve been storing in a Keen IO project. Run jupyter notebook from your command line and create a new notebook. Then set up a client object in Python, using the project ID and read key provided in the example below, so we can query the data.

import keen
from keen.client import KeenClient
KEEN_PROJECT_ID = "572dfdae3831443195b2f30c"
KEEN_READ_KEY = "5de7f166da2e36f6c8617347a7a729cfda6d5413db8d88d7f696b61ddaa4fe1e5cdb7d019de9bb0ac846d91e83cdac01e973585d0fba43fadf92f06a695558b890665da824a0cf6a946ac09f5746c9102d228a1165323fdd0c52c92b80e78eca"
client = KeenClient(
    project_id=KEEN_PROJECT_ID,
    read_key=KEEN_READ_KEY
)

Run a simple query to test that everything’s working. In the example below, we’re getting the number of earthquakes in the time range we’ll be working with (October to February).

total_earthquakes = client.count_unique("earthquakes",
    timeframe={
        "start": "2016-10-01 00:00:00+00:00",
        "end": "2017-02-28 00:00:00+00:00"
    },
    target_property="id"
)
# 43782
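
Hand-typed timestamps are easy to get slightly wrong, so one option is to build the timeframe from datetime objects instead. This is a minimal sketch using only the standard library; the helper name absolute_timeframe is hypothetical, and it assumes Keen accepts ISO 8601 strings with a "T" separator for absolute timeframes.

```python
from datetime import datetime, timezone


def absolute_timeframe(start, end):
    """Build a {"start": ..., "end": ...} dict of ISO 8601 UTC strings."""
    for moment in (start, end):
        if moment.tzinfo is None:
            raise ValueError("expected timezone-aware datetimes")
    return {
        "start": start.astimezone(timezone.utc).isoformat(),
        "end": end.astimezone(timezone.utc).isoformat(),
    }


timeframe = absolute_timeframe(
    datetime(2016, 10, 1, tzinfo=timezone.utc),
    datetime(2017, 2, 28, tzinfo=timezone.utc),
)
print(timeframe)
```

The resulting dictionary can be passed as the timeframe argument in place of the literal one used above.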

That’s it! We can run any type of Keen query from our Python environment. For instance, if we wanted the number of earthquakes by day, all we need to do is add a line to the query above:

earthquakes_by_day = client.count_unique("earthquakes",
    timeframe={
        "start": "2016-10-01 00:00:00+00:00",
        "end": "2017-02-28 00:00:00+00:00"
    },
    target_property="id",
    interval="daily"
)
#print(earthquakes_by_day)

Because we’re just dealing with Python objects, we can use plain Python code to quickly get answers about our data, like the minimum and maximum number of earthquakes per day.

print(max(earthquakes_by_day, key=lambda x: x["value"]))
print(min(earthquakes_by_day, key=lambda x: x["value"]))
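
Note that max and min with a key return the whole row, including its timeframe, not just the count. The sketch below collects the busiest day, the quietest day, and the daily mean in one place; the helper name summarize and the sample counts are hypothetical.

```python
from statistics import mean


def summarize(rows):
    """Return the busiest day, the quietest day, and the mean daily count."""
    busiest = max(rows, key=lambda row: row["value"])
    quietest = min(rows, key=lambda row: row["value"])
    return {
        "busiest": (busiest["timeframe"]["start"][:10], busiest["value"]),
        "quietest": (quietest["timeframe"]["start"][:10], quietest["value"]),
        "mean": mean(row["value"] for row in rows),
    }


# Hypothetical counts, not real query output.
sample_rows = [
    {"value": 100, "timeframe": {"start": "2016-10-01T00:00:00.000Z"}},
    {"value": 250, "timeframe": {"start": "2016-10-02T00:00:00.000Z"}},
    {"value": 175, "timeframe": {"start": "2016-10-03T00:00:00.000Z"}},
]

summary = summarize(sample_rows)
print(summary)
```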

We’re only scratching the surface of what we can do with Python and Keen IO. You can clone this project and analyze your own Keen IO datasets! Next time, we’ll talk about how to visualize the data we’re getting back from Keen using Matplotlib.