A beginner’s guide to web analytics and privacy

What can JavaScript tracking and analytics actually find out about us? And what can we do to protect ourselves?

Digital technology has become such a huge part of all our lives; we take our phones everywhere with us (yes, everywhere), and it’s only recently that people have started questioning the impact of carrying a camera and microphone with them at all times.

It’s common to hear conspiracy theories about how your phone is listening to you and using your private conversations to serve you adverts. The theory about FBI agents watching you through your webcam spread so widely that it became a meme.

One reason these theories are so widespread is that most of us don’t actually understand what’s going on. How exactly are our devices and websites tracking us? And should we care?

Your webcam probably isn’t being hacked

Thankfully, although the technology enabling these more invasive types of tracking (webcam hacking, microphone hacking) does exist, it’s not widely used. Yet we do leave traces and create tracks every time we use the internet.

A key part of working out whether we are being tracked on the internet, and what you can do about it, is understanding the different levels and scopes of “tracking” at play in the wild west of the internet, and the extent to which you really are being “followed around”.

Understanding the specifics can help everyone

A fuller understanding of some of the technical aspects of what can be tracked or inferred about you on the web, and by whom, can hopefully help you make better decisions, whether you’re a casual consumer of web services, a website owner who cares about your users’ privacy, or someone providing a service that offers some sort of tracking capability yourself.

Disclaimer – if you’re not already aware of us, GoSquared offers a web analytics product.

We obviously have our own point of view on these matters. We believe that analytics services have their place and are incredibly useful tools in the arsenal of a website owner for improving performance, fixing errors, and crafting a great user experience. We also believe that these powerful tools need to be wielded wisely.

We believe in protecting the privacy of visitors, and encourage everyone to be more transparent with the data they choose to collect so everyone can make more informed choices. We’ll touch on some of the things we do ourselves to that end later.

The what, how, and why of online privacy

In this post we’ll provide a brief run-down of the various capabilities and technologies you might see around the web, why you should care, and what you can do to take control of your digital privacy.

And for the sake of clarity – here we’re not discussing things like shady behaviour by hardware manufacturers, or ISPs, governments or other bad actors intercepting your web traffic and using it to spy on you – that’s a whole separate discussion.

Even so, if you set aside any concerns about “interception” or “listening in”, and you trust that when you visit a website you’re communicating safely and securely with that site, what might that site, or any of the third-party services it employs, be doing to “track” you, and to what degree? And what is it even possible for such sites and services to do?

As it turns out, quite a lot. So let’s dig in.

What do we actually mean by “tracking”?

When people talk about “tracking” on the internet, they usually mean one of roughly three things:

  • A web site or service which uses its own infrastructure to record actions or events you take on its site (“first-party tracking”)
  • A site or service which uses a dedicated analytics/tracking tool such as GoSquared to record the actions on their site (“third-party, on-site tracking”)
  • A site or service which uses some third-party analytics tool, advertising network, social-network-based login or sharing system (e.g. “log in with Facebook” or share/like buttons), which itself gathers data on browsing actions or events and collates it across the different sites where it is used. (“third-party, cross-site tracking”)

Mixed in with these three concepts are also questions of exactly what sort of data is being recorded, stored and analysed.

  • Is a service recording every single keyboard or mouse interaction you make or just those that have an effect on the web site or app?
  • Does that data include records that can be tied directly back to you as an individual person?
  • Is the data aggregated or anonymised or is it used to build profiles on individual people?
  • Is your data sold off to some less-than-reputable data-broker who uses it to run psychological experiments on you or try to manipulate an election?

With all these questions and variables it can be incredibly easy to throw in the towel and assume that all tracking on the web is evil: that everyone is spying on everything you do and selling your personal details to the highest bidder.

Thankfully it’s not actually like that. Especially in the realm of cross-site tracking and personal data, there are a growing number of laws and technical protections that give end users better control over their data.

Do they know it’s ‘me’?

A key aspect of discussions about online analytics and tracking is knowing which parts of your online activity these services can correlate.

That is, as you browse between various pages on a website, or between sites on the internet, do these services know that all that activity was performed by a single person (you), or do they have a set of un-correlated data points that could have been from anyone?

Or is it somewhere in between, where – say – your activity on a particular site can be brought together in a single browsing “profile”, but the moment you browse to a different site, it’s no longer tied to your previous activity?

Show me your ID

The ways services group together activity by a single person are all variations on the same theme: some kind of unique identifier (or ID) that refers specifically to you.

How this identifier is generated and stored, whether you as the end-user have any control over it, and whether it can be shared across different websites to track your activity around the web, depends on the exact technology used, but here are a few ways in which a website might be able to tell that it’s “you” on the other end of that browser window.

Cookies aren’t evil

Put simply, a cookie is just a short string of text.

It’s not something that’s ever necessarily shown to you as a user, but whenever your browser fetches or runs content from the web, cookies provide a mechanism for the web site or service providing that content to say to your browser “remember this value for me please, and send it back to me when you next request this content”.

A cookie has a few associated components:

  • A name – web servers can set multiple different cookies so this gives a way of instructing the browser which one to read or write.
  • A value which can be set to any string of text.
  • An expiration, which dictates how long the browser will keep the cookie.
  • A scope which dictates exactly where the cookie’s value is accessible.
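
To make those four components concrete, here is a minimal sketch using Python’s standard-library `http.cookies` module. The cookie name `tracking_id`, its value `abc123` and the domain `site1.com` are illustrative assumptions, not values any real site uses. The server sends the `Set-Cookie` value built by the first function. The browser later returns only the name and value pairs in its `Cookie` header, which the second function parses.

```python
from http.cookies import SimpleCookie

ONE_YEAR_SECONDS = 60 * 60 * 24 * 365

def build_set_cookie() -> str:
    """Build a Set-Cookie value showing a cookie's name, value, expiration and scope."""
    cookie = SimpleCookie()
    cookie["tracking_id"] = "abc123"                     # name and value (illustrative)
    cookie["tracking_id"]["max-age"] = ONE_YEAR_SECONDS  # expiration: keep for one year
    cookie["tracking_id"]["domain"] = "site1.com"        # scope: which hosts receive it
    cookie["tracking_id"]["path"] = "/"                  # scope: which paths receive it
    return cookie["tracking_id"].OutputString()

def read_cookie_header(header: str) -> dict:
    """Parse the Cookie header a browser sends back into a name -> value dict."""
    parsed = SimpleCookie(header)
    return {name: morsel.value for name, morsel in parsed.items()}

print(build_set_cookie())
print(read_cookie_header("tracking_id=abc123; login_id=xyz789"))
```

Note the asymmetry: the expiration and scope attributes are instructions to the browser only. They are never sent back to the server.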

Cookies have a wide variety of applications. For example, a website with a login system will probably set a cookie when you log in that identifies your login session. The browser will then send that cookie every time you request a new page from the site, so the site remembers who you are (rather than you being logged out the moment you visit another page).

Such a cookie might have an expiration so you don’t stay logged in forever, and will be scoped so it’s only accessible on the site that set it, so nasty third parties can’t steal your login details.

For the purposes of identification for tracking/analytics, the website would do something like this:

  • The first time you visit the site, check to see if the tracking id cookie is set.
  • If it’s not, generate a (usually random) unique value and store this in the tracking id cookie.
  • On every subsequent request to the site, the browser sends the tracking id cookie back, so all activity carrying that value can be grouped together.
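
The three steps above can be sketched on the server side roughly as follows. This is a minimal illustration, not any particular product’s implementation. The cookie name `tracking_id` and the function `identify_visitor` are hypothetical, and the ID is a random 128-bit hex string from Python’s `secrets` module.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional, Tuple

TRACKING_COOKIE = "tracking_id"          # hypothetical cookie name
ONE_YEAR_SECONDS = 60 * 60 * 24 * 365

def identify_visitor(cookie_header: str) -> Tuple[str, Optional[str]]:
    """Return (visitor_id, set_cookie_value).

    set_cookie_value is None when the browser already sent a tracking id."""
    incoming = SimpleCookie(cookie_header)
    if TRACKING_COOKIE in incoming:
        # Returning visitor: reuse the id the browser sent back.
        return incoming[TRACKING_COOKIE].value, None
    # First visit: generate a random, unguessable id and ask the browser to store it.
    visitor_id = secrets.token_hex(16)
    outgoing = SimpleCookie()
    outgoing[TRACKING_COOKIE] = visitor_id
    outgoing[TRACKING_COOKIE]["max-age"] = ONE_YEAR_SECONDS
    outgoing[TRACKING_COOKIE]["path"] = "/"
    return visitor_id, outgoing[TRACKING_COOKIE].OutputString()

# Simulate two page views from the same browser.
visitor_id, set_cookie = identify_visitor("")       # first visit: no cookies yet
browser_cookie = f"{TRACKING_COOKIE}={visitor_id}"  # what the browser sends back afterwards
same_id, no_new_cookie = identify_visitor(browser_cookie)
print(visitor_id == same_id, no_new_cookie)         # True None
```

Because the ID is random, it says nothing about who you are. It only lets the site recognise that two requests came from the same browser.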

Cookies in browsers are stored and retrieved on a per-domain basis (and can optionally be scoped to be restricted even further). That is, if site1.com sets a cookie named login_id, the value of that cookie can only be read by site1.com and not by, say, site2.com.
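
Browsers decide which cookies to send with a request by checking whether the request’s host “domain-matches” the cookie’s domain. The sketch below is a simplified version of the domain-match rule in RFC 6265. It deliberately ignores host-only cookies (set without a `Domain` attribute, which match only the exact host), the public suffix list that stops a site setting cookies for something like `.com`, and the path, `Secure` and `SameSite` checks.

```python
import ipaddress

def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain-match: may a cookie scoped to cookie_domain
    be sent with a request to request_host?"""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")  # a leading dot in Domain= is ignored
    if host == domain:
        return True
    try:
        ipaddress.ip_address(host)
        return False  # IP addresses only ever match exactly
    except ValueError:
        pass
    # Otherwise the host must be a subdomain of the cookie's domain.
    return host.endswith("." + domain)

for host in ["site1.com", "images.site1.com", "site2.com", "evilsite1.com"]:
    print(host, domain_matches(host, "site1.com"))
```

The `"." + domain` check is what stops `evilsite1.com` from receiving cookies meant for `site1.com`, even though one name ends with the other.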

But how then can cookies be used to follow you from site to site?

First- and third-party cookies

Cookies as a technology aren’t necessarily limited to just being accessible by the web site or service you’re directly looking at. It’s common for any web page to include external resources, such as images or fonts or scripts from another domain (for example a page on site1.com might include images from images.site1.com or analytics from some-analytics-service.com).

Whenever you view a page with such external resources and your browser performs the network requests to fetch that content, the third-party web server on the other end of that content request can do the same process of setting and retrieving cookies, but using the cookies associated with that server’s scope.

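Once a third-party script is loaded by the browser and included in the web page, it runs in the context of that page. It can therefore read the first-party cookies (other than those marked HttpOnly) and other identifiers associated with the page in question, and send them back to its own server alongside its own third-party cookie.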

It’s this bridging of first- and third-party cookie access that allows an analytics service to tie your identity together between different sites and “follow you around the internet”.
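
From the tracker’s side, that bridging might look something like the toy sketch below. Everything in it is hypothetical: the request log, the IDs and the field names are invented for illustration, and real trackers vary widely. The key point is that the tracker’s own third-party cookie (`third_party_id`) is the same on every site that embeds it. Grouping by that cookie therefore merges activity from different sites into one profile and links each site’s separate first-party IDs together.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical log as seen by a cross-site tracker's server. Each entry holds:
#   third_party_id - the tracker's own cookie, sent with every request to its domain
#   page           - the page the tracker's script or pixel was embedded on
#   first_party_id - an id the tracker's script read from that page's own cookies
request_log = [
    {"third_party_id": "tp-42", "page": "https://site1.com/shoes", "first_party_id": "s1-abc"},
    {"third_party_id": "tp-42", "page": "https://site2.com/news", "first_party_id": "s2-xyz"},
    {"third_party_id": "tp-99", "page": "https://site1.com/", "first_party_id": "s1-def"},
]

def build_profiles(log):
    """Group activity by the tracker's third-party id, across every site it is embedded on."""
    profiles = defaultdict(lambda: {"sites": set(), "pages": [], "linked_ids": set()})
    for entry in log:
        profile = profiles[entry["third_party_id"]]
        profile["sites"].add(urlsplit(entry["page"]).hostname)
        profile["pages"].append(entry["page"])
        profile["linked_ids"].add(entry["first_party_id"])
    return dict(profiles)

profiles = build_profiles(request_log)
print(sorted(profiles["tp-42"]["sites"]))
```

Neither site1.com nor site2.com can link `s1-abc` and `s2-xyz` on its own. Only the shared third party sees both.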

In recent years, many browsers have started restricting third-party cookies (and, increasingly, first-party cookies that are set by third-party scripts). These restrictions concern how and when the cookies can be set and retrieved, and how long their values can be held. More on that later.

Other cookie-like technologies

Alongside cookies, there are a few similar technologies on the web that sites can use for storing and retrieving data, such as localStorage and IndexedDB.

These behave similarly to cookies in that they allow storing and retrieving values, although they are generally more tightly scoped so the values are only accessible on the site that created them (and thus they can’t be used for tracking you between different sites).
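
The tighter scoping comes from how this storage is keyed. A cookie’s `Domain` attribute can share it across subdomains. localStorage and IndexedDB, by contrast, are keyed on the page’s origin: the scheme, host and port together. The sketch below computes that triple and is only a simplified illustration. It ignores the extra storage partitioning some browsers now apply to third-party frames.

```python
from typing import Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url: str) -> Tuple[str, str, int]:
    """Return the (scheme, host, port) triple that origin-scoped storage is keyed on."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)

def same_origin(url_a: str, url_b: str) -> bool:
    return origin(url_a) == origin(url_b)

print(same_origin("https://site1.com/a", "https://site1.com:443/b"))   # True
print(same_origin("https://site1.com/", "https://images.site1.com/"))  # False
print(same_origin("https://site1.com/", "https://site2.com/"))         # False
```

So even images.site1.com gets its own separate localStorage, distinct from site1.com’s.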

Aside: the EU cookie law

In 2011, EU countries began implementing amendments to the memorably-named Directive 2002/58/EC, commonly known as the Cookie Law. The directive governs cookies and cookie-like technologies for storing persistent data in users’ browsers as they use the web.

The intention of the cookie law’s introduction was to ensure that web sites didn’t abuse the power of cookies and other cookie-like technologies for purposes such as tracking and profile-building without the consent of the user.

The law includes some exemptions for functionally necessary cookies. For example, requiring extra explicit consent for cookies that core functionality such as a log-in system depends on makes little sense. We wrote about the law at the time, including how it sorts cookies into categories based on “necessity”.

Fingerprinting

The presence of the Cookie Law, alongside browser-level restrictions on exactly how and when persistent identifiers can be set and retrieved (especially by third parties), has led to the development of so-called “fingerprinting technology” – that is, the ability to generate a unique (or “unique-enough”) identifier that can be used to correlate your browsing activity without having to generate and store a particular value.

The idea behind fingerprinting technologies is to examine as many attributes as possible from your browsing session, such as browser / operating system versions, screen resolutions, what system fonts you have installed, various aspects and quirks of the other hardware and software you have running on your computer.

When enough of these attributes are inspected, they are likely to provide a combination that is unique to you, and is predictable enough that it can be used to correlate your activity (that is, two web page views with the exact same combination of browser, screen, fonts, other hardware etc. are very likely to be from the same person).
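
The final step of that idea, combining the observed attributes into one identifier, can be sketched as a hash over a canonical serialisation of the attributes. This is a minimal illustration under stated assumptions. The attribute names and values are made up, it does not show how a browser script actually reads these signals, and real fingerprinting libraries use many more signals and more forgiving matching.

```python
import hashlib
import json

def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into a stable identifier.

    Keys are sorted so the same attributes always serialise (and hash)
    identically, regardless of the order in which they were collected."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical attributes observed on two page views.
visit_1 = {"user_agent": "ExampleBrowser/1.0 (ExampleOS 14)", "screen": "2560x1440",
           "timezone": "Europe/London", "fonts": ["Arial", "Helvetica", "Menlo"]}
visit_2 = dict(reversed(list(visit_1.items())))  # same values, different key order
visit_3 = {**visit_1, "screen": "1920x1080"}     # one attribute differs

print(fingerprint(visit_1) == fingerprint(visit_2))  # True: likely the same browser
print(fingerprint(visit_1) == fingerprint(visit_3))  # False: looks like a different browser
```

Nothing is stored in the browser here. The identifier is recomputed on every visit, which is why clearing your cookies does not reset it. It is also why making everyone’s attributes look the same weakens it.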

There are a lot of different ways that fingerprinting techniques can work, but for more information, there are excellent resources such as Am I Unique? and some of the commonly-used fingerprinting libraries such as Fingerprint.js.

Putting it all together to pinpoint “you”

Cookies, both first-party and third-party, and technologies like fingerprinting are some of the ways websites and services can correlate your browsing activity together under a single profile that represents “you”.

Exactly which of these technologies a service might use, and whether that profile of “you” is specific to only one website or includes activity from across the web, varies widely from service to service, and is dictated by each service’s business model and ethical practices.

Why should I care about this? And what can I do about it?

With discussions left, right and centre about online tracking, profiling and privacy, it’s increasingly clear that everybody has reason to care about this. That goes for everyday end users (that is, all of us as we browse around the internet), the people who build and run websites, and the companies and people who provide any manner of data-tracking service.

Why internet users should care about online privacy

Since you’re reading this post, it’s a fairly safe assumption that you use the internet to browse sites, consume content and so on, so this kind of tracking inevitably affects you.

In today’s information age, internet services are scrambling to gather as much information on as many people as possible, then mining that data to influence behaviour.

At one end of the spectrum that could be as simple as a website optimising its signup form to make it less confusing. At the other end are the (perhaps overblown) stories of Facebook running psychological experiments and companies trying to influence elections.

As someone who uses these sorts of internet services, you may be happy with your information being used in this manner, or you may not.

Regardless, the most important thing you should care about is knowing who is collecting your data and for what purposes it’s being used.

As a user, you can draw the line wherever you like on what sort of data you’re happy for websites, or the third parties they use, to collect from you.

It’s easy to take the nuclear option and use a browser or browser extension that blocks any and all tracking. But such a blocker is unlikely to distinguish between tracking for useful purposes and the hoover-like trackers that collect any and all data they can. Those trackers do it in the hope that the data might provide some interesting insight, or a way to make more money by reselling it elsewhere.

For example, you might be perfectly happy with a web application tracking its own performance or errors or behaviour flows for the pure purpose of improving your own experience (after all, nobody likes apps or sites that are buggy or slow); or you may be happy for an online shop to follow your browsing patterns or purchase history if it’s able to use that data to make (useful) recommendations.

Most people find that the line between “useful” and “unnecessary invasion of privacy” falls at roughly the same point as a second line. On one side is a website tracking your use of its own site (perhaps using a third-party service); on the other is a third-party provider tracking and collating your browsing activity across the internet.

The good news is that web browsers themselves are adding more and more controls and features, specifically geared towards curtailing the spread of the more invasive cross-site tracking which serves little purpose towards actually improving the experience for the user. Some examples of this are:

  • Safari’s Intelligent Tracking Prevention, a series of controls that limit how third-party resources can set, persist and retrieve cookies for the purpose of identifying you across different sites
  • Many browsers are implementing controls to reduce the efficacy of fingerprinting by reducing the variability of the signals it commonly relies on (i.e. if everybody’s browser looks identical to the code running on a web page, it’s harder to use fingerprinting methods to identify anyone uniquely)
  • Browser vendors are also offering more targeted tools, such as Mozilla’s Facebook Container extension for Firefox. It effectively segregates your activity on facebook.com from any third-party tracking Facebook may be doing through its social logins, like buttons and ads across the web. (It doesn’t stop Facebook from correlating your browsing activity between sites, but it makes it harder for Facebook to tie that activity directly back to your own account on facebook.com.)
  • In addition to the protections being built directly into browsers, there are also various pieces of legislation, most notably the EU GDPR, which came into effect in May 2018. It is specifically designed to give people more control over who collects their personal data and for what purposes. It also places extra obligations on those doing the collecting, to ensure they get proper consent from users.

GDPR, in particular, is the reason there’s been a positive explosion in “data permission” popups and modals throughout the web. As a user you may find them incredibly tedious (hopefully their usability will improve in the coming years so they’re a bit less annoying). But if you care about how your data is collected and used, the best thing you can do is pay attention when you see one. Take heed of how the site says it collects and uses your data, and decide for yourself what you’re comfortable with. The whole idea of these popups is to give you more control over these things, after all.

Why website builders should care about online privacy

Every time you collect a piece of data from one of your visitors or users, or use a third-party service to collect it on your behalf, that’s yet another place where that data is being used and processed. As someone building a website, it’s effectively your responsibility to make sure that the amount of data collected, and who it’s sent to, is sensibly balanced against how useful that data is to you for, say, improving the overall quality of your site.

With services such as Google Tag Manager and Segment, it’s easier than ever to integrate any and every third-party service into your site. These services promise to help increase your conversion rate, or provide simple social-media sharing or logging in, and they can be installed incredibly easily, often with just a few clicks. But just because the friction has been removed from adding a third-party service doesn’t mean you should throw everything imaginable onto your site without considering the wider implications.

Before deciding to track a new piece of data on your users, or setting up tracking, ads or analytics with a new third-party service, consider carefully:

  • Do I need to be collecting this data?
  • For what purpose will I be using this data?
  • If I’m collecting data through a third party, how will they use it? Will they send it to other third parties, or use it for purposes other than the one I’m using them for?
  • Again, if I’m using a third party, what techniques are they using for gathering or sharing data, and how happy am I with those?

Having a process such as this is already mandated by the GDPR. But as a website owner you may want to go beyond what’s legally required, and choose third-party providers more carefully based on their own attitude to privacy and treatment of data.

Beyond the user-focused controls that browser vendors are adding to reduce the efficacy of third-party cookies and fingerprinting, there’s very little that you as a website builder can do to control the data-gathering scope and methods of any third-party service you use for analytics, ad-serving or social content. Once that service is embedded in your site’s code, it can do (pretty much) anything it likes within the limits of what the browser allows. That makes it your responsibility as the website builder to be confident about what all these services may be doing.

Why you should care about online privacy as someone building a service that tracks or handles this sort of data

In recent years, people have come to care a lot more about their own personal data and how it’s being used.

Alongside that, many providers of websites and apps are increasingly conscious of their own responsibility to protect their users’ privacy.

For the providers of third-party data services (such as GoSquared), it’s therefore only going to become more important to provide these services in a way that delivers value while respecting the privacy wishes of website owners and their users.

There will likely always be a market among websites that care far less about these issues and would rather just grow their business and make more money, regardless of the consequences. Even so, we believe the trend will lean towards providers that take a stand for services that bring value, but not at the cost of abusing privacy.

So what can a third-party analytics service like GoSquared do?

Well, we believe that transparency is key here. Again, it’s effectively required by laws like GDPR, but being completely clear about what data a service collects, how it’s treated, and what purposes it may be used for is the most important part of building trust.

As businesses become ever more conscious of their own obligations, they will want to be ever more careful about which third-party providers they use for handling data.

Here at GoSquared we aim to be as clear as we can about the methods we use for gathering data, and the care with which we handle that data on our customers’ behalf:

  • When a website uses GoSquared Analytics to collect data on its visitors, nobody else has access to that data, beyond the aggregated, anonymous, global metrics that we publish.
  • We never resell any of the data we collect, or give anyone else access to it.
  • Our business model is based purely on the analytics product we provide – we don’t make money from aggregating or selling data to third parties.
  • We don’t use third-party cookies or any other techniques for tracking users’ identities between different websites. That is, we don’t do anything to correlate your activity on GoSquared Customer A’s website with your activity on GoSquared Customer B’s site – to all intents and purposes those are two completely separate identities.
  • We don’t use any sort of browser fingerprinting, or any other way to work around the privacy protections put in place by browsers, and we engineer our analytics infrastructure to respect users’ choices. (For example, we wrote a little about how our analytics code detects tracker-blockers to stop our Live Chat features breaking, in a way that respects the user’s wish not to have data gathered.)

Make an informed choice about your own digital privacy

We hope this post has been informative and has helped you learn about the various practices that fall under the umbrellas of privacy and tracking.

Now when a headline screams “YOU ARE BEING FOLLOWED ON THE INTERNET”, you’ll be able to understand what that means for you and your digital life. What’s important is that we are all able to make an informed choice about our own digital privacy, and education is a big part of that.

This information, and any further reading you go on to do, will hopefully give you the confidence to go out into the Wild Wild Web on your own terms. You’ll have a little more knowledge about the ways you can protect yourself, and about the steps being taken by governing bodies and browser vendors to help protect you.
