Real-time log analytics with Matomo

I must admit, I like to know how many people visit my site. However, I hate trackers, and I won't consider using Google Analytics, which would collect much more private information about my users for its own purposes.

I really appreciate Matomo (formerly Piwik), which is open source and offers a quality interface and plenty of options.

By default, Matomo uses a JavaScript tracker, but it also ships with a script to import server logs. This makes it possible to get statistics without any tracker: they cannot be blocked by visitors, are therefore more reliable, and put very little load on the server.

Most of the time, this script is run once a day, but it can also run in near real time without needing an rsyslog or syslog-ng setup. Here we will see how to import Nginx logs into Matomo in near real time.

Nginx configuration

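First of all, we're going to use a more readable and parsable JSON log format in nginx. In /etc/nginx/nginx.conf, inside the http { } block, we declare a new log format named matomo:

# /etc/nginx/nginx.conf, inside the http { } block
# escape=json (nginx 1.11.8+) escapes quotes and control characters the
# way JSON expects; the default \xXX escaping would produce invalid JSON.

log_format matomo escape=json '{'
  '"ip": "$remote_addr",'
  '"host": "$host",'
  '"path": "$request_uri",'
  '"status": "$status",'
  '"referrer": "$http_referer",'
  '"user_agent": "$http_user_agent",'
  '"length": $bytes_sent,'
  '"generation_time_milli": $request_time,'
  '"date": "$time_iso8601"}';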

Then, in the server { } block that contains your site's configuration, specify where the logs will be stored and in which format:

access_log /var/log/nginx/access.log matomo;

Finally, after restarting (or reloading) nginx, entries in the new format should start appearing in the specified file.
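To check that nginx now writes well-formed lines, and to see what each one contains, here is a minimal Python sketch. The sample line and its values are made up, and parse_line and invalid_lines are hypothetical helpers, not part of Matomo. Since $request_time is expressed in seconds with millisecond resolution, parse_line converts it to milliseconds; it also splits the UTC offset off $time_iso8601.

```python
import json

# Hypothetical sample line in the "matomo" format above (values are made up).
SAMPLE = (
    '{"ip": "203.0.113.7","host": "example.org","path": "/blog/",'
    '"status": "200","referrer": "-","user_agent": "Mozilla/5.0",'
    '"length": 5120,"generation_time_milli": 0.042,'
    '"date": "2024-05-01T12:34:56+02:00"}'
)


def parse_line(line):
    """Parse one log line and normalise the time-related fields.

    $request_time is in seconds, so it is converted to milliseconds;
    the ISO 8601 date is split into local time and UTC offset.
    """
    hit = json.loads(line)  # raises json.JSONDecodeError on a malformed line
    seconds = float(hit["generation_time_milli"])
    hit["generation_time_milli"] = int(round(seconds * 1000))
    hit["timezone"] = hit["date"][19:].replace(":", "")  # "+02:00" -> "+0200"
    hit["date"] = hit["date"][:19]
    return hit


def invalid_lines(lines):
    """Return (line_number, error_message) for every line that is not valid JSON."""
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            bad.append((number, str(exc)))
    return bad
```

For example, passing an open /var/log/nginx/access.log to invalid_lines lists the lines that are not valid JSON; a user agent logged with the default \x22 escaping for a double quote is exactly the kind of line it reports.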

Importing logs into Matomo

Matomo comes with a small but very efficient Python script to read the logs and import them: misc/log-analytics/import_logs.py.

We will use logtail, which on each run outputs only the lines added to the log since its previous run (it remembers its position in an offset file). This avoids counting the same visits twice.

We then pipe its output to the Python script; the final - tells import_logs.py to read the log from standard input:

/usr/sbin/logtail /var/log/nginx/access.log \
     | /usr/bin/python3 \
          path/to/matomo/misc/log-analytics/import_logs.py \
          --url=https://<your-matomo-url> \
          --enable-http-errors \
          --enable-http-redirects \
          --log-format-name=nginx_json -
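To make the "only new lines" idea concrete, here is a minimal Python sketch of what a logtail-style reader does. It is an illustration, not logtail's actual code: read_new_lines and the offset-file layout (a single byte offset) are assumptions. It only consumes complete lines, and it restarts from the beginning when the log has shrunk, a naive way of detecting truncation or rotation.

```python
import os


def read_new_lines(log_path, offset_path):
    """Return the complete lines appended to log_path since the previous call.

    The byte offset reached is stored in offset_path (hypothetical layout:
    one integer). If the log is now smaller than that offset, it is assumed
    to have been truncated or rotated, and reading restarts at 0.
    """
    try:
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)
    except FileNotFoundError:
        offset = 0  # first run: read the whole file

    if os.path.getsize(log_path) < offset:
        offset = 0  # file shrank: start over

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()

    # Stop at the last newline so a half-written line waits for the next run.
    end = data.rfind(b"\n") + 1
    with open(offset_path, "w") as f:
        f.write(str(offset + end))
    return data[:end].decode("utf-8", errors="replace").splitlines()
```

Comparing sizes misses a rotation where the new file has already grown past the old offset; a more robust reader would also remember the file's inode and start over when it changes.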

For hits to be recognized, each host must be correctly entered among the URLs of a website in Matomo's settings. If you have several sites, hits are then dispatched to the right site automatically.
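Before the first import, it can help to know which host names actually appear in the log, so that each one can be added to the right website in Matomo. A minimal sketch, assuming the JSON format defined above; count_hosts is a hypothetical helper, and lines that are not valid JSON objects are simply skipped.

```python
import json
from collections import Counter


def count_hosts(lines):
    """Count hits per "host" field in JSON log lines, skipping unparsable ones."""
    hosts = Counter()
    for line in lines:
        try:
            hit = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(hit, dict):
            hosts[hit.get("host", "")] += 1
    return hosts
```

For instance, iterating over count_hosts(open("/var/log/nginx/access.log")).most_common() prints the hosts from busiest to quietest; any host missing from Matomo's website URLs would need to be added.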

The logtail | import_logs.py pipeline can be launched from a systemd timer or a cron job, for example every minute. This keeps the latest visits and the visitor map in Matomo up to date in near real time.
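When the import runs every minute, a slow run (after a traffic spike, say) could still be going when the next one starts; cron, for instance, does not check whether the previous run has finished. A minimal Python sketch of a wrapper that skips a run while another holds a lock, using fcntl.flock; run_exclusive and the lock path are hypothetical.

```python
import fcntl
import subprocess


def run_exclusive(lock_path, command):
    """Run a shell command unless another run already holds the lock.

    Returns the command's exit status, or None if this run was skipped.
    The lock is released when the file is closed, even if the process dies.
    """
    with open(lock_path, "w") as lock:
        try:
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None  # previous import still running: skip this run
        return subprocess.run(command, shell=True).returncode
```

A script launched by cron could call run_exclusive("/run/lock/matomo-import.lock", pipeline), where pipeline is the logtail | import_logs.py command as a string. The flock(1) utility from util-linux (flock -n lockfile command) gives the same behaviour directly in a crontab line.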

However, to update the graphs and store long-term data, the raw data must be archived, that is, processed into reports.

By default, archiving is triggered automatically when the reports are viewed in the browser.

It is also possible to disable this browser-triggered archiving and run it from a cron job instead. The command is:

php path/to/matomo/console core:archive \
    --url=<your-matomo-url> \
    --force-all-websites \
    --force-date-last-n=2