Here at AlltimePower, we use Chartio as one of our primary business intelligence tools. One of the features of Chartio I've grown to really like is their "Data Stores". I'm sure there are a variety of uses for these, but we use them as a kind of buffer between our production data schema and the data schema we want to expose to business users. Another way to put it: we use them as a simple ETL pipeline, hosted and run entirely by Chartio. Of course, this approach won't work for every dataset. But if you don't yet have too much data, I think it's a really simple ETL solution, and you might be surprised by how much data it can handle. If you don't yet have millions of users, there's a good chance it will work just fine.
Let's walk through a simple example. Say your production database has a `users` table and a `posts` table. You can use data stores to add a computed column like `post_count` to your `users` table, or to pull user attributes, like email address, into your `posts` table.
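To make that concrete, here's a minimal sketch of the kind of SQL a data store table could run, using Python's built-in sqlite3 as a stand-in for the production database. The schema and column names (`id`, `user_id`, `email`, `body`) are assumptions for illustration, not anything Chartio-specific:

```python
import sqlite3

# Illustrative only: sqlite3 stands in for the production database,
# and all table/column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'ada@example.com'), (2, 'bo@example.com');
    INSERT INTO posts VALUES (1, 1, 'hi'), (2, 1, 'again'), (3, 2, 'hello');
""")

# A data-store-style table adding a computed post_count to users:
users_with_counts = conn.execute("""
    SELECT u.id, u.email, COUNT(p.id) AS post_count
    FROM users u
    LEFT JOIN posts p ON p.user_id = u.id
    GROUP BY u.id, u.email
""").fetchall()
print(sorted(users_with_counts))

# And one pulling the author's email into posts:
posts_with_email = conn.execute("""
    SELECT p.id, p.body, u.email
    FROM posts p
    JOIN users u ON u.id = p.user_id
""").fetchall()
print(sorted(posts_with_email))
```

The point is that the join happens once, when the data store refreshes, and the denormalized result is what business users see.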
Now when exploring your data, you can quickly bring these new fields into your charts using Interactive Mode without having to think through any joins.
As an added benefit, any charts or queries that use the data store as a source won't ever hit your production database. Instead, Chartio runs your data store queries at specified intervals and saves the results; subsequent querying hits this saved copy. Depending on how frequently you refresh your data store, this can help ensure you don't choke a production database with BI queries. For example, you could have your data store refresh once a day during a known low-usage period. In our case, our "production" database is already a read replica, so that's not a huge concern for us.
This brings up a related drawback, though: queries on data stores will never be fully up-to-date. You can schedule up to hourly refreshes and/or manually refresh tables as needed.
Another warning: you may run into some weird database translation errors, since data stores are a Chartio abstraction and don't strictly adhere to the standards of a given database.
For example, we use PostgreSQL, and Chartio seems to translate Postgres `date` fields into a `text` data type. However, this translation goes a bit haywire on dates that are too far in the future. For instance, it's possible to create a row in Postgres with a date of `Jan 1, 30000`. When I try to create a stored table that includes that row, I've seen two different outcomes:
- The stored table is created, but it's missing that row
- The stored table fails to refresh, with the error `Strings with NULL characters are not supported`
In our case, this was just some bad data (we aren't running 30,000 year projections or anything), and the fix was simply to update that data and add validations to prevent those values going forward. Still, it took some digging to get to the bottom of this, as it was not obvious from the error behavior or message.
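As one illustration of why far-future dates trip up tooling outside the database itself: Python's own `datetime` can't even represent a year-30000 date. Here's a minimal sketch of the kind of application-layer validation we mean, assuming a plain-Python setting (the `validate_publish_date` name and the year-2100 cutoff are hypothetical; pick whatever bound fits your domain):

```python
import datetime

# Python's datetime tops out at year 9999.
print(datetime.MAXYEAR)  # 9999

# So a Postgres row with the date Jan 1, 30000 isn't even representable here:
try:
    datetime.date(30000, 1, 1)
except ValueError as e:
    print(e)

def validate_publish_date(d: datetime.date) -> datetime.date:
    """Reject implausibly far-future dates before they reach the database."""
    # Hypothetical cutoff; choose whatever bound makes sense for your data.
    if d > datetime.date(2100, 1, 1):
        raise ValueError(f"implausible date: {d}")
    return d
```

A database-side `CHECK` constraint on the column would accomplish the same thing closer to the data.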
This isn't really meant to be a complaint about Chartio. Creating an abstraction that works across such a wide variety of databases is difficult stuff. I just want to shed a bit of light on what's going on behind the scenes to highlight that the behavior of data stores isn't going to be 100% consistent with the behavior of the source database. So be careful out there!