Tracking my website traffic

Optimising scarce attention into the shopping cart widget

January 30, 2021 — April 16, 2022

Figure 1

I would like to record traffic to my website using some minimalist tracker system which ideally does not give my users’ data to Google.

This is often conflated with “analytics”, which is, as far as I can tell, the process of putting your traffic data into a dashboard that looks colourful in the annual report.

For my purposes, I currently use Gauges, which at USD 6/month is probably too expensive for the minimal service it actually provides me. Also, the site is so derelict that I am pretty sure the business will fold soon. On the plus side, it is simple, easy to use, and gives me the data I want. Hopefully, they do not share that data with anyone; I have not been sufficiently diligent in checking. I just assumed that since they are not as all-pervasive as Google, they cannot be as toxic. There is a script to download all the data that I care about (just popularity) at the bottom of this page.

If I had time, I might build a lighter, cheaper option using some serverless functions. There are some examples of that below.

At some point, it probably becomes onerous to keep track of the various privacy requirements of web tracking in your jurisdiction. I am not an expert in that.

1 DIY

I can build my own analytics platform, which is generally insane not-invented-here (NIH) behaviour, except that my needs are so extremely simple that it just might work.
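
To give a feel for how small "extremely simple" can be, here is a minimal sketch of a hit counter that stores only a date, a path and a count. It assumes SQLite as the store; the `hits` table and the function names `open_db`, `record_hit` and `popularity` are hypothetical, and a real deployment would still need some HTTP endpoint (serverless or otherwise) to call `record_hit`.

```python
import sqlite3
from datetime import date


def open_db(path=":memory:"):
    """Open (or create) the hit-count database. The schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        " day TEXT NOT NULL, path TEXT NOT NULL, views INTEGER NOT NULL,"
        " PRIMARY KEY (day, path))"
    )
    return db


def record_hit(db, path, day=None):
    """Add one view for `path` on `day`; nothing about the visitor is stored."""
    day = day or date.today().isoformat()
    # Make sure the row exists, then increment it.
    db.execute(
        "INSERT OR IGNORE INTO hits (day, path, views) VALUES (?, ?, 0)",
        (day, path),
    )
    db.execute(
        "UPDATE hits SET views = views + 1 WHERE day = ? AND path = ?",
        (day, path),
    )
    db.commit()


def popularity(db):
    """Return (path, total_views) pairs, most popular first."""
    return db.execute(
        "SELECT path, SUM(views) AS total FROM hits "
        "GROUP BY path ORDER BY total DESC, path"
    ).fetchall()
```

Keeping one row per day and path means the table grows with the number of pages rather than the number of visits, which suits a popularity-only use case.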

2 Open source, self-hosted analytics

Alternatively, I can spin up a server and use an open-source analytics option, which is also insane because it is overkill. A whole server sitting there doing nothing but waiting for mouse clicks? That I am supposed to maintain? 🤮

If you are a more dedicated hobbyist than me, here are some open-source semi-DIY options, in descending order of modernity.

3 SaaS

Probably the most sensible option, except for the weird pricing structures, which always seem to target a web store with 100 times my traffic.

A comprehensive list of alternatives is onurakpolat/awesome-analytics, a curated list of analytics frameworks, software and other tools.

Figure 2

4 To Google Analytics or not

Question: Are my Google search results harmed by not using Google Analytics to track my traffic? Obviously, I am keen not to do market research for them for free, but if my website tracking facilitates the prominence of my website then I guess I am no longer doing it for free? Is this a real thing? What is my price for selling your info to the man? It turns out to be extremely hard to get actual information on this. Google seems to be secretive, and all the obvious searches on this theme are squatted by SEO firms trying to sell me their product.

5 Geocoding

If I want to roll my own, I will possibly want geocoding so I can locate my readers. In fact, I would rather store a geolocation estimate than an actual IP for privacy reasons. Here are some options:

# Database URL
https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-City-CSV&license_key=YOUR_LICENSE_KEY&suffix=zip
# SHA256 URL
https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-City-CSV&license_key=YOUR_LICENSE_KEY&suffix=zip.sha256
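
The idea of keeping a location estimate and discarding the IP could be sketched as follows. This assumes the blocks file in the GeoLite2 CSV archive has `network`, `latitude` and `longitude` columns, which is my understanding of the layout but should be checked against the actual download. The sample rows are made up, using documentation-only address ranges, and `load_blocks` and `coarse_location` are hypothetical helper names.

```python
import csv
import io
import ipaddress

# Made-up rows in the assumed blocks-file layout (not real GeoLite2 data).
SAMPLE_BLOCKS = """network,geoname_id,latitude,longitude,accuracy_radius
203.0.113.0/24,1111111,-33.86,151.21,50
198.51.100.0/24,2222222,51.51,-0.13,100
"""


def load_blocks(handle):
    """Parse blocks CSV rows into (network, latitude, longitude) tuples."""
    return [
        (
            ipaddress.ip_network(row["network"]),
            float(row["latitude"]),
            float(row["longitude"]),
        )
        for row in csv.DictReader(handle)
    ]


def coarse_location(ip, blocks):
    """Return (lat, lon) rounded to one decimal place, or None if unknown.

    Only the rounded coordinates are returned, so the caller can store
    them and throw the IP address away.
    """
    addr = ipaddress.ip_address(ip)
    for network, lat, lon in blocks:
        if addr in network:
            return (round(lat, 1), round(lon, 1))
    return None


blocks = load_blocks(io.StringIO(SAMPLE_BLOCKS))
```

The linear scan is fine for a demonstration but would be slow against the full database, where a sorted list of network start addresses searched with `bisect` would be a more plausible choice.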

6 Getting data out in a sane format

I do not want their weird dashboards. I want numbers I can crunch.

6.1 Download my gaug.es data

#!/usr/bin/env python3
"""
Export traffic using
https://get.gaug.es/documentation/reference-listing/content/
"""
import json
from time import sleep

import requests

# Placeholders: substitute your own gauge ID and API token.
SITE_ID = "SITE_ID"
TOKEN = "YOUR_API_TOKEN"
MAX_ATTEMPTS = 10


def next_chunk(url=f"https://secure.gaug.es/gauges/{SITE_ID}/content"):
    """Fetch one page of content stats, retrying on errors and timeouts."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            r = requests.get(
                url,
                headers={"X-Gauges-Token": TOKEN},
                timeout=5.0,
            )
            r.raise_for_status()
            return r.json()
        except requests.RequestException:
            # Give up after the last attempt; otherwise wait and retry.
            if attempt == MAX_ATTEMPTS - 1:
                raise
            sleep(5)


def get_nexturl(resp):
    """Prefer the "next_page" link, then fall back to the "older" link."""
    urls = resp["urls"]
    return urls.get("next_page") or urls.get("older")


def squish(resp):
    """Trim each record and stamp it with the date of the response."""
    return [_squish(record, resp["date"]) for record in resp["content"]]


def _squish(record, date):
    # Drop the bulky fields; the remaining ones identify the page.
    record.pop("title", None)
    record.pop("url", None)
    record["date"] = date
    return record


def main():
    tracks = []
    resp = next_chunk()
    while True:
        print(resp)
        tracks.extend(squish(resp))
        # Rewrite the file after every page so an interrupted run keeps its progress.
        with open("track.json", "w") as h:
            json.dump(tracks, h, indent=1)
        nexturl = get_nexturl(resp)
        if nexturl is None:
            break
        sleep(1)  # be polite to the API
        resp = next_chunk(nexturl)


if __name__ == "__main__":
    main()
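
Once `track.json` exists, crunching the numbers can be as simple as summing views per page. The following is a sketch that assumes each exported record carries `path` and `views` keys alongside the `date` added by the script; the sample records are invented, and `total_views` is a hypothetical helper.

```python
from collections import Counter


def total_views(records):
    """Sum views per path across all dates, most viewed first."""
    totals = Counter()
    for record in records:
        totals[record["path"]] += int(record["views"])
    return totals.most_common()


# Invented records in the assumed shape of the exported file.
sample = [
    {"path": "/a", "views": 3, "date": "2022-04-15"},
    {"path": "/b", "views": 1, "date": "2022-04-15"},
    {"path": "/a", "views": 2, "date": "2022-04-16"},
]
ranking = total_views(sample)
```

To run it on the real export, load the file with `json.load` and pass the resulting list to `total_views`.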