Time-Series Data

From Jonathan Gardner's Tech Wiki
Jump to: navigation, search

Intro

It's not uncommon that you want to collect time-series data. What is time-series data? It's simply data organized by time. It might be the number of clicks you get each day, or each hour, or every 5 minutes, or the number of sales made, or the current CTR or whatever.

OpenTSDB

OpenTSDB promises to be a good way of handling this data. It's built on hadoop, which means it is not as trivial as a set of tables in an RDBMS. That's not as bad as you think, since SQL is really not ideal for this kind of data.

Relational Model

People come up with a variety of schemes. Let me list them.

Direct Table

This is something like this:

item_metrics
------------
id (unique PK)
item_id (FK to items)
datetime
timezone
interval (NOT in seconds)
metric1
metric2
...

This is fairly useful, but it suffers from a number of problems.

  1. You need to add columns when you come up with new metrics.
  2. If you only want some of the metrics, the database usually has to work to retrieve all the data anyway, giving you no savings on the backend.
  3. When you go to display the data, it's more common for graphing software to want data transposed, IE, a list of metric name - values rather than a list of metric1, metric2, etc... pairs.

You run into another problems as well. What happens when you DB can no longer handle the load of recording and reporting metrics? You'll have to come up with some kind of sharding system. At first, you'll partition the table by datetime. This is reasonable. Then you'll want to break the table so that all the metrics are not stored together. In the extreme case, you're going to end up with a number of tables with the columns (unique id, item id, date time, value), one for each metric.

This leads to the second way of recording metrics, the meta-table table.

Meta-Table Table

Meta-tables are tables that store the information necessary to make a table. It's a fairly mindbending thought experiment for those who haven't encountered it before.

Basically, a table is just a series of rows. Each row has a series of columns. If you created a table that records the row number and the column name and column value, voila! You're storing a table in a table.

In metrics, it looks something like this.