Many tools have been written on the collection side (statsd, fluentd, …), and some are actually pretty cool. My concern is the way these tools express their analytical and storage opinions, which happens on the storage side. The data collection side isn’t an issue; if anything, it’s a breath of fresh air relative to what else I’ve seen.
The problem is that I want to collect very detailed metrics, store them, and then run analytical tools over the stored data. I don’t want to pre-aggregate the data.
This puts a somewhat higher load on the storage side, but that isn’t an issue, as our FastPath manager nodes have ample processing and storage power of their own.
It’s a balance.
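A toy illustration of what pre-aggregation throws away. The sample values, the 10-second window, and the spike threshold are all made up for the example; this is a sketch of the trade-off, not a measurement from a real system.

```python
from statistics import mean

# Hypothetical per-second latency samples (ms) covering one 10-second window.
raw = [2, 2, 3, 2, 2, 95, 2, 3, 2, 2]

# What a pre-aggregating collector would keep: a single average per window.
pre_aggregated = mean(raw)

# What the raw samples still allow after the fact: the worst case,
# and exactly when it happened (index = second within the window).
worst = max(raw)
spikes = [(second, value) for second, value in enumerate(raw) if value > 50]

print(pre_aggregated, worst, spikes)
```

With only the average stored, the single 95 ms outlier and its position in the window can no longer be recovered; with the raw samples stored, both questions can still be asked later.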
That said, I am experimenting with pulling data from collectl and dumping it into InfluxDB. I can ingest its detailed plot files easily and get the data in quickly. The dashboards can then extract the data and generate nice plots. I’m working with Grafana and Tasseo for the moment, and I played a little with influga as well.
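A minimal sketch of the collectl-to-InfluxDB step, under several assumptions that are not from the setup described here: the input is space-separated collectl plot output whose column header line starts with `#Date Time`, timestamps have one-second resolution and are treated as UTC, and the target accepts the line protocol used by later InfluxDB releases. The function name `plot_to_lines`, the measurement name `collectl`, and the host tag `node01` are hypothetical.

```python
import re
from datetime import datetime, timezone


def plot_to_lines(text, measurement="collectl", host="node01"):
    """Convert collectl plot-format text into line-protocol strings.

    Assumes a '#Date Time ...' header and 'YYYYMMDD HH:MM:SS' timestamps.
    """
    columns = None
    lines = []
    for row in text.splitlines():
        row = row.strip()
        if not row:
            continue
        if row.startswith("#"):
            # The header line beginning with '#Date' names the columns.
            if row.startswith("#Date"):
                columns = row.lstrip("#").split()
            continue
        if columns is None:
            continue  # data before any header: skip, we cannot name it
        values = row.split()
        stamp = datetime.strptime(values[0] + " " + values[1], "%Y%m%d %H:%M:%S")
        # Assumption: treat the recorded time as UTC; convert to nanoseconds.
        ns = int(stamp.replace(tzinfo=timezone.utc).timestamp()) * 1_000_000_000
        fields = ",".join(
            # Reduce names like '[CPU]User%' to plain keys like 'CPU_User'.
            f"{re.sub(r'[^A-Za-z0-9]+', '_', name).strip('_')}={float(value)}"
            for name, value in zip(columns[2:], values[2:])
        )
        lines.append(f"{measurement},host={host} {fields} {ns}")
    return lines


sample = """#Date Time [CPU]User% [CPU]Sys%
20140801 12:00:00 12 3
"""
for line in plot_to_lines(sample):
    print(line)
```

Each plot-file row becomes one point with every column kept as its own field, so nothing is aggregated on the way in.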
In all cases, the time series database is the limiting factor. In Grafana everything looks beautiful, as long as you adhere precisely to its opinionated view of how data should be queried (which means I need to adapt how we store data to the presentation tool to make it work well).
I’d love to use kdb+ for the TSDB. I just need to figure out either how to plug it into InfluxDB as the backend storage DB, or how to create a graphite/carbon/ceres frontend for kdb+ that drives it. kdb+ is blisteringly fast and purpose-built for the types of analytics we want to run. Having Grafana and Tasseo talk directly to kdb+ is also an option.
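As a rough idea of what the carbon-style frontend would have to do first, here is a sketch that parses Carbon’s plaintext format (`metric.path value timestamp`, one metric per line) and regroups a batch into columns, since kdb+ is column-oriented. The names `Sample`, `parse_carbon_line`, and `to_columns` and the metric path are hypothetical, and the actual hand-off to kdb+ is deliberately left out because I have not worked out what it would look like.

```python
from typing import Dict, Iterable, List, NamedTuple


class Sample(NamedTuple):
    path: str
    value: float
    timestamp: int


def parse_carbon_line(line: str) -> Sample:
    """Parse one 'metric.path value timestamp' line."""
    path, value, timestamp = line.split()
    # Timestamps are epoch seconds; tolerate a fractional part.
    return Sample(path, float(value), int(float(timestamp)))


def to_columns(samples: Iterable[Sample]) -> Dict[str, List]:
    """Regroup row-wise samples into one list per column."""
    columns: Dict[str, List] = {"path": [], "value": [], "timestamp": []}
    for sample in samples:
        columns["path"].append(sample.path)
        columns["value"].append(sample.value)
        columns["timestamp"].append(sample.timestamp)
    return columns


batch = [
    parse_carbon_line("fastpath.node01.cpu.user 12.5 1406894400"),
    parse_carbon_line("fastpath.node01.cpu.sys 3.0 1406894400"),
]
print(to_columns(batch))
```

A real frontend would also need a listening socket and a writer on the kdb+ side; both are omitted here.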
Sadly I don’t have much time right now to do the research around what it would take to make this happen. Maybe in a few weeks.