Hello everyone, it has been a while since our last post. The Kickstarter campaign has proven to be quite demanding, but things are getting back to normal and new posts will keep appearing.
Now that we have an official product/platform, the plan is to start creating practical posts, complemented by video blogs showing real use cases step by step. Stay tuned!
Moving on, this post will present a very promising solution to store, present and analyse time series data, perfect for saving all the data produced by IoT devices.
Time Series Data
Before anything else, it’s important to understand what time series data is and how/where it can be used. Well, as the name says, all the data has time as its primary key, and measurements collected from sensors and actuators are the perfect use case.
Let’s have a look at a temperature measurement, the “Hello World” of all sensors! The data itself is very simple; it can be, for example, 21.7 (degrees Celsius). Although this single value would be enough to “drive” a fan or heater in a standalone project, the temperature value by itself is useless in a bigger solution where multiple sensors and actuators interact with each other.
Also, without additional context, the data can’t be used to produce statistics or provide insights. In this case it’s necessary to append extra information to each measurement, like a timestamp and tags. The timestamp is pretty standard: just the time when the temperature reading was collected. The tags are the extra bits which help to filter and enrich the measurement, for example the node name and location the information is coming from, the measurement unit used, the sensor type, etc.
In a standard SQL database, this information would be stored in multiple tables, with a few constraints for data integrity. But SQL databases, although very powerful, have been designed for full CRUD operations (Create, Read, Update and Delete), which results in poor performance for this kind of workload.
Here the main goal is “write once, read many”. After all, once a sensor is read and the information is collected, there’s no need to update it, making time series and NoSQL databases perfect candidates, providing incredible speed, horizontal scaling and great schema flexibility!
Note that a fully implemented solution might not be exclusively based on NoSQL storage. It’s very likely both models will be implemented together, as some parts of the system require full CRUD operations, like a table containing users and passwords or the node inventory.
[Relational example: temperature readings stored in a table where id_node is a foreign key to Node.id]
Measurements for Temperature (InfluxDB line protocol):
temperature,node=MyFridge,location=2ndShelf,unit=Celcius value=5.03 1463824801000000000
temperature,node=MyFridge,location=2ndShelf,unit=Celcius value=5.20 1463825101000000000
temperature,node=MyFridge,location=2ndShelf,unit=Celcius value=5.25 1463825401000000000
temperature,node=MyFridge,location=2ndShelf,unit=Celcius value=5.40 1463825701000000000
Most NoSQL DBs are schema-free. All the information is stored in a flat model, repeating the details, in this case the tags, for every entry. This might sound wasteful because more storage is required, but on the other hand performance and flexibility are greater. Some NoSQL databases call each record a “document”, made of multiple key=value pairs. In InfluxDB they’re simply called measurements, and the timestamp is always present, in nanoseconds.
This post is not meant to be a tutorial on how to install and configure InfluxDB. All the information can be found here: https://docs.influxdata.com/influxdb. Some parts of the documentation could be a bit more complete and better organized, but it is quite a new project and improvements are very frequent.
The easiest way to install InfluxDB is using a pre-packaged version. For example, on CentOS, just download the latest RPM and run:
wget https://dl.influxdata.com/influxdb/releases/influxdb-0.13.0.x86_64.rpm
sudo yum localinstall influxdb-0.13.0.x86_64.rpm
sudo service influxdb start
After the DB is installed and running, it’s time to create a database and insert some data:
curl -i -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE sensor"
curl -i -XPOST 'http://localhost:8086/write?db=sensor' --data-binary 'temperature,node=MyFridge,location=2ndShelf,unit=Celcius value=5.03 1463824801000000000'
curl -i -XPOST 'http://localhost:8086/write?db=sensor' --data-binary 'temperature,node=MyFridge,location=2ndShelf,unit=Celcius value=5.20 1463825101000000000'
curl -i -XPOST 'http://localhost:8086/write?db=sensor' --data-binary 'temperature,node=MyFridge,location=2ndShelf,unit=Celcius value=5.25 1463825401000000000'
curl -i -XPOST 'http://localhost:8086/write?db=sensor' --data-binary 'temperature,node=MyFridge,location=2ndShelf,unit=Celcius value=5.40 1463825701000000000'
In a nutshell, to insert data you just need to post one entry per line using the standard HTTP protocol. The first field is the measurement name, followed by the tags in key=value pairs, then the reading value and the timestamp in nanoseconds.
The measurement can carry a single key=value field or several of them, but it’s important to keep the main reading named “value=xxx”. For example, to also record the time taken to collect the reading, the POST line would look like:
curl -i -XPOST 'http://localhost:8086/write?db=sensor' --data-binary 'temperature,node=MyFridge,location=2ndShelf,unit=Celcius value=5.03,time_taken=199 1463824801000000000'
There are a few other options to insert data into InfluxDB, all described in its documentation.
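For example, the same HTTP endpoint also accepts several points batched in a single request, one line-protocol entry per line (reusing the same database created above):

curl -i -XPOST 'http://localhost:8086/write?db=sensor' --data-binary 'temperature,node=MyFridge,location=2ndShelf,unit=Celcius value=5.03 1463824801000000000
temperature,node=MyFridge,location=2ndShelf,unit=Celcius value=5.20 1463825101000000000'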
Now, with some data in place, let’s have a quick look at the DB:
influx --database sensor
This will bring up the CLI console:
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.13.0
InfluxDB shell version: 0.13.0
>
In the console, just try a few commands:

> show measurements
name: measurements
------------------
name
temperature

> show tag keys
name: temperature
-----------------
tagKey
location
node
unit

> select * from temperature
name: temperature
-----------------
time                  location   node      unit     value
1463824801000000000   2ndShelf   MyFridge  Celcius  5.03
1463825101000000000   2ndShelf   MyFridge  Celcius  5.2
1463825401000000000   2ndShelf   MyFridge  Celcius  5.25
1463825701000000000   2ndShelf   MyFridge  Celcius  5.4
The data should be there. Just spend a bit more time playing with the data and getting familiar with the SQL-like commands.
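A few queries worth trying against this sample data, for example filtering by a tag or asking for simple aggregations:

> select * from temperature where location = '2ndShelf'
> select min(value), max(value), mean(value) from temperature
> select mean(value) from temperature group by node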
Grafana is a plotting tool which produces very professional-looking graphs and charts based on multiple data sources, including InfluxDB. Again, this post will not get into all the features of this product; for the full documentation, access http://docs.grafana.org/.
Similar to InfluxDB, just use the packaged version to install it. Here is the example for CentOS:
sudo yum install https://grafanarel.s3.amazonaws.com/builds/grafana-3.0.2-1463383025.x86_64.rpm
sudo service grafana-server start
The web GUI will be available at http://localhost:3000. Use the default user and password: admin/admin.
The first step is to add a data source, in this case the InfluxDB database created earlier. Just follow the screenshots:
A way to test the endpoint URL is to try to access, for example: http://localhost:8086/query?db=sensor&epoch=ms&p=&q=SELECT+*+FROM+temperature. If that works, a JSON result containing the temperature readings should appear.
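The same check can also be done from the command line; a minimal equivalent using curl (with the database name created earlier) would be:

curl -G 'http://localhost:8086/query' --data-urlencode "db=sensor" --data-urlencode "q=SELECT * FROM temperature"

A working setup returns a JSON document (results, series, values) containing the readings inserted earlier.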
Update 1: In the HTTP settings, it’s possible to define “Access = Proxy”. This will force Grafana itself to hit InfluxDB’s URL instead of the client browser hitting it directly. This is a nice workaround in case there’s a firewall or reverse proxy preventing direct access.
To query any data via Grafana, a dashboard is required; just create one following the screenshot:
Grafana offers two query methods, interactive and raw. For 95% of the cases the interactive editor will be easier and offer all the options required. For the other 5%, the raw method allows writing the query itself. To debug the queries, keep an eye on the InfluxDB log:
tail -F /var/log/influxdb/influxd.log
The log will show every request made by Grafana to InfluxDB and help to fix any issues.
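As an illustration, a raw query for the temperature series might look something like the line below, where $timeFilter and $interval are Grafana template variables replaced by the dashboard’s time range and resolution:

SELECT mean("value") FROM "temperature" WHERE $timeFilter GROUP BY time($interval) fill(null)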
Now, just follow the screenshots below to display the data:
InfluxDB + Grafana vs. Splunk
Splunk is a very powerful tool and it’s free to index up to 500MB per day. In terms of functionality, Splunk has way more features than InfluxDB and Grafana, especially if the stats are coming from non-pre-formatted messages and log files.
From a practical point of view, Splunk can be overkill if the plan is only to ingest simple measurements. Also, the learning curve to start building some neat dashboards in Splunk can be a bit steeper.
Below is the same battery voltage measurement plotted in Splunk for the first 4 months, and the same data now plotted in Grafana up to recent days.
Now that the basics of InfluxDB and Grafana have been explained, the next step is to set up a process to regularly collect data from the nodes.
The simplest way is to choose a familiar scripting language and translate the data coming from a remote node into an HTTP POST to InfluxDB.
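As a minimal sketch in plain shell, assuming the reading can be obtained from a local command and reusing the “sensor” database created earlier (read_fridge_temp is a hypothetical helper that prints the current value), something along these lines would do the job:

#!/bin/bash
# Minimal polling loop: fetch a reading and push it to InfluxDB via the HTTP API.
# "read_fridge_temp" is a hypothetical command returning a plain number, e.g. "5.03".
DB_URL="http://localhost:8086/write?db=sensor"
while true; do
  VALUE=$(read_fridge_temp)
  TS=$(date +%s%N)    # current time in nanoseconds
  curl -s -XPOST "$DB_URL" --data-binary \
    "temperature,node=MyFridge,location=2ndShelf,unit=Celcius value=${VALUE} ${TS}"
  sleep 300           # one reading every 5 minutes
done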
More advanced techniques might be used, for example having a queue (like RabbitMQ) in the middle, as discussed in a previous post.
Soon a practical example will show how to implement it end-to-end using two Talk² Whisper Nodes, a Raspberry Pi and a Google Cloud VM… stay tuned!