Analysing Data with PostgreSQL and Grafana

A large number of threats can affect an organisation, and it is good practice to keep a record of each threat and its characteristics. This can result in a huge amount of information, and processing and analysing it all can be time consuming. One way to make analysing a high volume of data easier is to use a SQL database. A SQL database allows you to query large datasets and extract meaningful information far more easily than other methods, such as spreadsheets.

For this example I’ve used PostgreSQL, Postico and Grafana. PostgreSQL (often just Postgres) is our main database management system, while Postico sits on top as a slightly more user-friendly client. Grafana is a visual dashboard that will allow us to present the results of our queries in an attractive way, so that even non-technical people can make use of them.

I’ve used macOS so the instructions are focused around that, but after installation most of them are platform agnostic and will broadly apply to other operating systems. Both Postgres and Grafana can be installed using Homebrew, so fire up a Terminal window and type

$ brew install grafana

and

$ brew install postgresql

to install both packages on your machine. Postico is available directly from the developer’s website (linked above) and installs in the standard way: drag it to the Applications folder.

You will also need a dataset. You may have one already; if not, there is data all over the internet you could use for the exercise. One massive source of CSV data is the website of the Office for National Statistics. Whichever you choose, try to find data with a column that can be treated as a time.

An optional download for the exercise is Microsoft’s Visual Studio Code. Everything in this exercise could be done with a standard text editor, but VS Code has a few helpful shortcuts along with support for a massive number of languages, including SQL.

Getting Started

With everything downloaded and installed, we’ll need to set it all up. In your Applications folder you will have the Postgres app; open it and you’ll be shown a very minimal window with an “Initialize” button on the right. Click this button and it will create the SQL server on your Mac at the default locations, which you may alter in future if you use Postgres more frequently, but we won’t during this exercise. With the Postgres setup complete, open Postico to make sure it all works; it should open your database with little complaint.

Grafana setup is a little less obvious. First start the process using Terminal by typing

$ brew services start grafana

This may take a few minutes.

With the Grafana process started, open your web browser of choice and go to localhost:3000. This should load the Grafana login screen. The login details are stored within Grafana’s configuration files and can be changed to your choice of username and password; for this exercise we’ll keep the defaults of username: admin and password: admin. Once you’ve logged in we’ll need to add Postgres as a data source, so click on the big cog/gear on the left of the screen and navigate to the data sources tab.
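
If you do want to change those defaults, they live in Grafana’s main configuration file; with a Homebrew install this is usually /usr/local/etc/grafana/grafana.ini, though that path is an assumption and may differ on your machine. The relevant section looks like this:

[security]
# Default credentials; change these before exposing Grafana to anyone else
admin_user = admin
admin_password = admin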

Click “+ Add a data source” and change the “Type” to PostgreSQL, eliminating a number of options. The name isn’t important, so call it whatever is relevant to you. Host will need to be “localhost:” followed by whatever port your database communicates on; I never changed this, so for me it was 5432, but if that port number isn’t working open the Postgres app, click “Server Settings…” and you should see your Postgres port number. The database name will also have to match the database that will hold your data in Postgres; if you have followed the steps so far this will likely be your user shortname. Last of all, change “SSL Mode” to disable and hit “Save & Test”, which should result in a success message.

Screenshot of the PostgreSQL Data Source settings on my Mac

Creating the table

With your CSV file to hand, open it in VS Code or a text editor of your choice. At the top of the CSV file your headers should be listed, separated by commas (header, header two, header three, etc.). Copy these headers into a new VS Code file and change the language of the document to SQL to provide syntax highlighting (the button is located in the bottom right and within a new document will be set to “Plain Text”). The text copied into the new document will now need to be formatted in the following way:

The CREATE TABLE SQL query
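
The original post showed this query as an image; as a sketch, based on the column list used in the import command later on (and assuming TEXT for the plain-text columns), it looks like this:

CREATE TABLE IF NOT EXISTS newtable (
    header TEXT,
    header_two TEXT,
    header_number INT,
    header_ip INET,
    header_time TIMESTAMP,
    "description" TEXT,
    header_second_time TIMESTAMP,
    "status" TEXT,
    latitude FLOAT,
    id SERIAL PRIMARY KEY
);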

This first query doesn’t yet import our data but creates the table that our data will be copied into. I’ll run through each aspect of the query so it becomes clear what it does.

The first line, CREATE TABLE IF NOT EXISTS newtable, is a literal command: Postgres will check whether the table “newtable” exists; if it doesn’t, it will create it, and if it does, it will proceed no further. Ensure you finish the line with the opening parenthesis.

On a new line, with an indent, we have our first column with its data type. “header” is plain and simple text, so it can contain something like “dog food” or “We need to order 54 more boxes of dog food”.

Notice that our second column header has an underscore (“_”) between the words, as do the spaces between words in all of our other headers. Leaving a plain space instead of an underscore will cause the creation of the table to fail, so ensure you replace every space with one.

Our description and status headers have also been encased in double quotes. This is because these words clash with words SQL reserves for itself and so can be interpreted as commands rather than column names; quoting them avoids a failure in the creation of the table.

For our “header_number” we’ve assigned the type INT, short for integer, which according to the documentation on the Postgres website covers the range -2147483648 to +2147483647. Numbers can also be assigned other numeric types, such as FLOAT. Each of these types has its own characteristics, but one of the biggest things to keep in mind is that the more complex the type, the more memory it uses; if you have a large dataset this can have a huge impact on the speed of your database. The document I’ve linked to has a brief overview of each number type, but for this example the two I’ve used are INT and FLOAT. The key difference between the two in this case is that an INT cannot have a decimal point, but a FLOAT can.
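
A quick way to see that difference is to run a couple of throwaway queries (these can be pasted straight into Postico’s SQL Query view):

-- Dividing one integer by another truncates the decimal point
SELECT 7 / 2;    -- returns 3
-- Involving a non-integer value keeps it
SELECT 7 / 2.0;  -- returns 3.5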

Another data type used is INET. Postgres uses this type to natively understand IP addresses, allowing them to be used in computations as IP addresses rather than plain text. TIMESTAMP identifies the data as a time record, but be aware that the data will have to be in a format Postgres can understand.
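
For example, these two queries (a small sketch you can run as-is) show the kind of thing each type enables:

-- INET values understand network structure, e.g. subnet membership
SELECT '192.168.1.10'::inet << '192.168.1.0/24'::inet;  -- returns true
-- TIMESTAMP needs an unambiguous format, such as ISO 8601
SELECT '2019-03-14 09:30:00'::timestamp;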

Finally, due to the nature of relational databases each record will need a unique number assigned to it, so we include a line giving each record a primary key. This is created with our id SERIAL PRIMARY KEY. There’s no need for a comma after the primary key line, so end with a closing parenthesis and a semi-colon. Do ensure that, as above, you’ve indented all your header names.

Copy this text to your clipboard, navigate to Postico and connect to your preferred database if you have more than one (I’ve used my shortname database, titled matt). Once connected, choose SQL Query on the left, paste your copied text and then click execute statement. Your table should then be created. Once complete, refresh your database information in Postico with cmd + r and you should end up with a table entry on the left. Clicking on it should show a table with your headers created, but no data within it.
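
If you’d rather stay in Terminal, the same statement can be run with psql; this assumes you’ve saved the query to a file, with create_table.sql as a purely illustrative name:

$ psql -d matt -f create_table.sql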

Importing your data

When I was first shown this, the data was imported using a script; this exercise will instead focus on the terminal command that imports the data from a single file.

$ psql -d matt -c "COPY newtable(header, header_two, header_number, header_ip, header_time, \"description\", header_second_time, \"status\", latitude) FROM '/Users/matt/Documents/CSVfilefolder/csvfile.csv' delimiter ',' CSV HEADER;"

Again, this command needs your column headers in the same format as in the table creation, except the reserved-word headers need \" instead of quotes alone, because they sit inside the double quotes already wrapping the COPY statement.

This should then import your data, populating the table within Postgres/Postico (ensure you refresh Postico to see it).
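
A quick sanity check at this point is to count the imported rows, either in Postico or via psql:

SELECT COUNT(*) FROM newtable;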

[Optional] – Importing data from multiple CSV files using a script

An alternative way, and the way I was originally shown, is to import the data from a number of files (in my case around 340) using a script. This method relies on your data being formatted in a uniform fashion across all your CSV files. You can either create your script in a basic text editor, or use VS Code and choose “Shell Script” as the type of file.

The script I used is as follows:

#!/bin/bash
# Run the same COPY command once for every CSV file in this directory
for csvfile in *.csv
do
    psql -d matt -c "COPY global(header, header_two, header_number, header_ip, header_time, \"description\", header_second_time, \"status\", latitude) FROM '/Users/matt/Your/File/Directory/$csvfile' delimiter ',' CSV HEADER;"
done
You’ll see our terminal command mentioned previously (psql -d…) is largely the same, but the file name at the end of the file path is $csvfile, signifying whichever CSV file our script is currently working on.

The for statement encasing our command says: for EACH csvfile matching *.csv, do the following command. The * character is our wildcard character, so it is simply saying that part of the name could be anything (ourcsvfile.csv, anycsvfile.csv, alargecsvfile.csv, etc.). Once your script is complete, save the script file in the same directory as your CSV files. Navigate to that directory in Terminal and give your “script.sh” file executable permissions by using the terminal command

$ chmod +x script.sh

Then run the script using

$ ./script.sh

Depending on your data size this may take some time; when complete, your data should have been imported into your Postgres database.

Presenting your data in Grafana

With the data imported into your table, you’ll now be able to show it within Grafana for an easier way to analyse and present your data. Within Grafana, create a new dashboard by clicking the + button in the left side panel. For the first panel choose single stat, which will show a single number extracted from your data. With this panel added, click on the disclosure triangle near the panel title and choose edit, then navigate to the “Metrics” tab where we’ll enter the SQL query that will produce our stat.

SELECT COUNT(header_two) FROM newtable WHERE $__timeFilter(header_time);

Breaking this down, our SELECT statement is selecting the COUNT of our header_two column, which will return the number of rows we have in our newtable (as we have said FROM our newtable). We could leave the query at that, but the number would remain static, which may not be all that useful. Adding the WHERE statement allows the stat to take the attached time record into account. This means if we want to return the count between two points in time (Grafana has a time frame selector in the top right) it will only return the count for that period.
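
Note that $__timeFilter is a Grafana macro, so the query above can’t be run directly in Postico; to test it there, swap the macro for an explicit range (the dates below are purely examples):

SELECT COUNT(header_two)
FROM newtable
WHERE header_time BETWEEN '2019-01-01' AND '2019-02-01';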

Another panel to add is the graph panel; add one and again navigate to its “Metrics” tab.

SELECT date_trunc('hour', header_time) AS time, COUNT(date_trunc('hour', header_time)) AS "Total"
FROM newtable
WHERE $__timeFilter(header_time)
GROUP BY time;

This time the query within SELECT returns the header_time column truncated to the hour (the superfluous detail removed) as well as the number of entries within each hour. The AS statements simply change the names these are output under, to time and “Total”. FROM directs the query to pull its data from our newtable table, and WHERE allows us to add conditional statements (e.g. all data must equal 1). Our WHERE statement above includes a macro that is part of Grafana, $__timeFilter; running this statement as-is through Postgres/Postico will return an error. Finally, the GROUP BY statement asks for our data to be grouped by time, which was previously created within date_trunc('hour', header_time) AS time.
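
As before, to test this directly in Postico swap the macro for an explicit range. Adding an ORDER BY is also worthwhile, as Grafana graphs generally expect time-ordered results (again, the dates below are purely examples):

SELECT date_trunc('hour', header_time) AS time, COUNT(*) AS "Total"
FROM newtable
WHERE header_time BETWEEN '2019-01-01' AND '2019-02-01'
GROUP BY time
ORDER BY time;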

With our two panel queries complete, you should have a (very) basic dashboard in Grafana. I would advise also taking the time to look through the settings for each of the panels to see what works best with your data; one change that I found enhanced my dashboard was switching my graph from lines to bars. The instructions I’ve written here should work as a great starting point and allow you to expand your Grafana knowledge; have a browse online and you’ll see some very impressive dashboards.