flypig.co.uk

Personal Blog


13 Apr 2020 : How to build a privacy-respecting website
Even before mobile phones got in on the act, the Web had already ushered in the age of mass corporate surveillance. Since then we've seen a bunch of legislation passed, such as the EU ePrivacy Directive and more recently the GDPR, aiming to give Web users some of their privacy back.

That's great, but you might imagine a responsible Web developer would aim to provide privacy for their users independent of the legal obligations. In this world of embedded JavaScript, social widgets, mixed content and integrated third-party services, that can be easier said than done. So here are a few techniques a conscientious web developer can apply to increase the privacy of their users.

All of these techniques are things I've applied here on my site, with the result that I can be confident web users aren't being tracked when they browse it. If you want to see another example of a site that takes user privacy seriously, take a look at how Privacy International do it (and why).

1. "If you have a GDPR cookie banner, you're part of the problem, not part of the solution"

It's tempting to think that just because you have a click-through GDPR banner with the option of "functional cookies only", you're good. But users have grown to hate these banners and click through them instinctively without turning off the tracking. The banners often reduce users' trust in a site, and in the web as a whole. What's more, on a well designed site they're completely unnecessary (see 2). That's why you won't find a banner on this site.

2. Only set a cookie as a result of explicit user interaction

On this site I do use two cookies. One is set when you log in, the other if you successfully complete a CAPTCHA. If you don't do either of those things, you don't get any cookies.

The site has some user-specific configuration options, such as changing the site style. I could have used a cookie to store those settings too (there's nothing wrong with that, it's what cookies were designed for), but I chose to add the options into the URL instead. However, if I had chosen to store the options in a cookie, I'd be sure only to set the cookie in the event the user actually switches away from the default.
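Both approaches are easy to sketch. The code below is illustrative rather than what this site actually runs: the `style` query parameter, the `site_style` cookie name and the `"default"` value are all hypothetical. `styleFromUrl` reads the preference from the URL, so no cookie is needed at all; `saveStylePreference` shows the cookie alternative, writing a cookie only when the reader picks something other than the default. It takes the document as a parameter so it can be exercised outside a browser.

```javascript
// Hypothetical names throughout: the "style" query parameter, the
// "site_style" cookie and the "default" style are illustrative only.
const DEFAULT_STYLE = "default";

// Option A: keep the preference in the URL, so no cookie is needed.
function styleFromUrl(href) {
  const style = new URL(href).searchParams.get("style");
  return style || DEFAULT_STYLE;
}

// Option B: use a cookie, but only write one when the reader has actively
// chosen something other than the default.
function saveStylePreference(doc, style) {
  if (style === DEFAULT_STYLE) {
    // Back to the default: expire any cookie set earlier (Max-Age=0).
    doc.cookie = "site_style=; Max-Age=0; Path=/; SameSite=Lax";
    return false;
  }
  doc.cookie = "site_style=" + encodeURIComponent(style) +
    "; Max-Age=31536000; Path=/; SameSite=Lax";
  return true;
}
```

In a page you'd call `saveStylePreference(document, chosenStyle)` from the style picker's change handler, never on page load, so a visitor who sticks with the default never receives a cookie.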

In addition to these two cookies, I also use Disqus for comments, and this also sets cookies, as well as tracking the user. That's bad, but a necessary part of using the service. See section 5 below for how I've gone about addressing this.

3. Only serve material from a server you control

This is good for performance as well as privacy. It applies to images, scripts, fonts, or anything else that's downloaded automatically as part of the page.

For example, many sites use Google Fonts, because it's such an excellent resource. But why does Google offer such a massive directory of free fonts? Well, I don't know whether they use the server hits to track users, but they certainly could, and at the very least the requests allow them to collect usage data.

The good news is that all of the fonts have licences that allow you to copy them to your server and serve them from there. That's not encouraged by Google, but it's simple to do.
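Doing this by hand mostly means downloading the stylesheet Google serves, copying the font files it references, and pointing the stylesheet at your copies. As a minimal sketch of the last two steps, the hypothetical helper below rewrites every `fonts.gstatic.com` URL in the stylesheet text so it points at a local directory (the `/fonts/` path is an assumption) and returns the list of files you'd need to copy across. It assumes the font file names in the stylesheet are unique. Google may serve different font formats depending on the browser requesting the stylesheet, so it's worth fetching it from a current browser.

```javascript
// Hypothetical helper: rewrite fonts.gstatic.com URLs in a Google Fonts
// stylesheet to point at `localDir`, and list the files that need copying.
function localiseFontCss(cssText, localDir) {
  const files = [];
  const css = cssText.replace(
    // url(...), optionally quoted, pointing at fonts.gstatic.com
    /url\((['"]?)(https:\/\/fonts\.gstatic\.com\/[^'")]+)\1\)/g,
    (match, quote, remote) => {
      // Keep only the final path segment as the local file name.
      const name = remote.split("/").pop();
      const local = localDir + name;
      files.push({ remote, local });
      return "url(" + quote + local + quote + ")";
    });
  return { css, files };
}
```

You'd then download each `remote` URL to the matching `local` path on your server and serve the rewritten `css` yourself.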

The same applies to scripts, such as jQuery and others. You can link to the copy hosted on their servers, but if you want to offer improved privacy, serve it yourself.

Hosting all the content yourself will increase your bandwidth, but it'll also increase your users' privacy. On top of that, it'll provide a better and more consistent experience in terms of performance. Relying on a single server may sound counter-intuitive, but if your server isn't serving the page, it doesn't matter whether the resources around it are available, so your server is a single point of failure either way. And for your users, waiting for the very last font, image, or advert to download because it's on a random external server you don't control, even if it's fetched asynchronously, is no fun at all.

Your browser's developer tools are a great way to find out where all of the resources for your site are coming from. In Firefox or Chrome hit F12, select the Network tab, make sure the Disable cache option is selected, then press Ctrl-R to reload the page. You'll see something like this.
 
Using the developer tools to find external content

Check the Domain column and make sure it's all coming from your server. If not, make a copy of the resource on your server and update your site's code to serve it from there instead.
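If you'd rather not scan the Domain column by eye, the browser's Resource Timing API can produce a similar list. The sketch below is a hypothetical helper (`findExternalHosts` is not a built-in) that takes the resource URLs and the page's origin and returns every other host; the commented usage shows how you might feed it from `performance.getEntriesByType("resource")` in the developer tools console. It only sees resources the page has already loaded, and it will also flag your own subdomains, which may well be fine.

```javascript
// Hypothetical helper: return the sorted, de-duplicated hostnames in
// `resourceUrls` whose origin differs from `pageOrigin`.
function findExternalHosts(resourceUrls, pageOrigin) {
  const external = new Set();
  for (const href of resourceUrls) {
    let url;
    try {
      // Resolve relative URLs against the page's own origin.
      url = new URL(href, pageOrigin);
    } catch (e) {
      continue; // skip anything that isn't a parseable URL
    }
    // data: and blob: URLs don't involve a request to another server.
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    if (url.origin !== pageOrigin) external.add(url.host);
  }
  return [...external].sort();
}

// In the developer tools console, after the page has loaded:
// findExternalHosts(
//   performance.getEntriesByType("resource").map((entry) => entry.name),
//   location.origin);
```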

Spot the difference in the images below between a privacy-preserving site like DuckDuckGo and a site like the New York Times that doesn't care about its readers' privacy.
 
DuckDuckGo content source
New York Times content source

4. Don't use third party analytics services

The most commonly used, but also the most intrusive, is probably Google Analytics. So many sites use it that it's particularly nefarious: it opens the door for Google to effectively track web users across almost every page they visit, whether or not they're logged into a Google service.

You may still want analytics for your site, of course (I don't use any on my site, but I can understand the value it brings). Even just using analytics from a smaller company provides your users with increased privacy by avoiding all their data going to a single sink. Alternatively, use a self-hosted analytics platform like Matomo or Open Web Analytics (OWA). This keeps all of your users' data under your control while still providing plenty of useful information and pretty graphs.
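A platform like Matomo does far more, but to show the principle at its simplest: your own web server already logs every request, so basic page counts need no third party at all. The hypothetical function below (not part of Matomo or OWA) counts successful requests per path from access-log lines in the Common (or Combined) Log Format, deliberately never reading the client IP or user fields. It's only a sketch: a real version would also need to filter out images, scripts and bot traffic.

```javascript
// Hypothetical helper: count successful GET/HEAD requests per path from
// access-log lines. Only the request line and status code are read, so
// nothing identifying a visitor ends up in the result.
function countPageViews(logLines) {
  const counts = new Map();
  // Capture the request path (minus any query string) and the status code.
  const pattern = /"(?:GET|HEAD) ([^ ?"]+)[^"]*" (\d{3}) /;
  for (const line of logLines) {
    const m = pattern.exec(line);
    if (!m || m[2] !== "200") continue; // skip unparseable lines and errors
    counts.set(m[1], (counts.get(m[1]) || 0) + 1);
  }
  return counts;
}
```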

5. Don't embed third-party social widgets, buttons or badges

Services can be very eager to offer little snippets of code to embed in your website, providing things like sharing buttons or event feeds. Users often value these features, but the code and images used are frequently Trojan horses that allow tracking from your site. Often you can get exactly the same functionality without the tracking, and if you can't, then rule 2 should apply: make sure they can't track anyone unless the user explicitly makes use of them.

For non-dynamic sharing buttons often the only thing needed is to move any script and images on to your server (see 3). But this isn't always the case.

For example, on this site I use Disqus for comments. Disqus is a notorious tracker, but as a commenting system it offers some nice social features, so I'd rather not remove it. My solution has been to hide the Disqus comments behind an "Uncover Disqus comments" button. Until the user clicks on the button, there's no Disqus code running on the site and no way for Disqus to track them. This satisfies rule 2, and it's also not an unusual interaction for the user (for example, Ars Technica and Engadget are both commercial sites that do the same).

When you embed Disqus on your site the company provides some code for you to use. On my site it used to look like this:
 
<div id="disqus_thread"></div>
<script>
var disqus_shortname = "flypig";
var disqus_identifier = "page=list&amp;list=blog&amp;list_id=692";
var disqus_url = "https://www.flypig.co.uk:443/?to=list&&list_id=692&list=blog";

(function() { // DON'T EDIT BELOW THIS LINE
	var dsq = document.createElement("script"); dsq.type = "text/javascript"; dsq.async = true;
	dsq.src = "https://" + disqus_shortname + ".disqus.com/embed.js";
	(document.getElementsByTagName("head")[0] || document.getElementsByTagName("body")[0]).appendChild(dsq);
})();
</script>

On page load this would automatically pull in the flypig.disqus.com/embed.js script, exposing the user to tracking. I've now changed it to the following.
 
<div id="disqus_thread"></div>
<a id="show_comments" href="#disqus_thread" onClick="return show_comments()">Uncover Disqus comments</a>
<script type="text/javascript">
    var disqus_shortname = "flypig";
    var disqus_identifier = "page=list&amp;list=blog&amp;list_id=692";
    var disqus_url = "https://www.flypig.co.uk:443/?to=list&&list_id=692&list=blog";
    function show_comments() {
        document.getElementById("show_comments").style.display = "none";
        var dsq = document.createElement("script"); dsq.type = "text/javascript"; dsq.async = true;
        dsq.src = "https://" + disqus_shortname + ".disqus.com/embed.js";
        (document.getElementsByTagName("head")[0] || document.getElementsByTagName("body")[0]).appendChild(dsq);
        return false;
    };
</script>

The script is still loaded to show the comments, but now this will only happen after the user has clicked the Uncover Disqus comments button.
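The same pattern can be written as a reusable helper for any third-party script. This is a hypothetical sketch rather than Disqus's own code: `clickToLoad` attaches a click listener instead of using an inline `onClick`, and guards against the script being injected twice. For Disqus it assumes the page still declares the `disqus_*` variables globally, as the snippet above does.

```javascript
// Hypothetical helper: nothing from `src` is fetched until the element with
// id `triggerId` is clicked.
function clickToLoad(doc, triggerId, src) {
  const trigger = doc.getElementById(triggerId);
  let loaded = false;
  trigger.addEventListener("click", (event) => {
    event.preventDefault(); // stay on the page rather than following the link
    if (loaded) return;     // never inject the script twice
    loaded = true;
    trigger.style.display = "none";
    const script = doc.createElement("script");
    script.async = true;
    script.src = src;
    (doc.head || doc.body).appendChild(script);
  });
}
```

On this page that would be `clickToLoad(document, "show_comments", "https://flypig.disqus.com/embed.js");`, run once the link element exists.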

For a long time I had the same problem embedding a script for social sharing provided by AddToAny. Instead, I now just provide a link directly out to https://www.addtoany.com/share. This works just as well by reading the Referer header rather than using client-side JavaScript, and prevents any tracking until the user explicitly clicks on the link.
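One caveat, hedged because browser behaviour varies: current browsers default to the `strict-origin-when-cross-origin` referrer policy, which sends only the origin (for example `https://www.flypig.co.uk/`) to another site, not the full page URL. The sketch below builds the script-free link as an HTML string and adds the standard `referrerpolicy` attribute to ask for the full URL to be sent; `shareLinkHtml` is a hypothetical helper, and some browsers may still trim the Referer regardless.

```javascript
// Hypothetical helper: build a plain share link with no script attached.
// Nothing reaches AddToAny until the reader clicks; the destination then
// learns which page to share from the Referer header.
function shareLinkHtml(label) {
  // Escape the visible text so it can't inject markup.
  const text = String(label)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  // Ask the browser to send the full page URL (not just the origin) in the
  // Referer header when this link is followed over HTTPS.
  return '<a href="https://www.addtoany.com/share" ' +
    'referrerpolicy="no-referrer-when-downgrade">' + text + "</a>";
}
```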

There are many useful scripts, services and social capabilities that web users expect sites to support. For a web developer they can be so convenient, and so hard to avoid, that it's often much easier to give in, add a GDPR banner to the site, and move on.

6. Don't embed third-party adverts

Right now the web seems to run on advertising, so this is clearly going to be the hardest part for many sites. I don't serve any advertising at all on my site, which makes things much easier. But it also means no monetisation, which probably isn't an option for many other sites.

It's still possible to offer targeted advertising without tracking: you just have to target based on the content of the page, rather than the profile of the user. That's how it's worked in the real world for centuries, so it's not such a crazy idea.
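To make the idea concrete, here's a minimal sketch of contextual selection. The `pickContextualAd` function and the shape of the `ads` inventory are hypothetical; the point is that the only input is the page text, so there's nothing about the reader to collect. A real platform would use something more sophisticated than counting keyword matches.

```javascript
// Hypothetical helper: choose an advert purely from the words on the page.
// Each entry in `ads` lists the keywords it should be shown against.
function pickContextualAd(pageText, ads) {
  const words = new Set(pageText.toLowerCase().match(/[a-z0-9]+/g) || []);
  let best = null;
  let bestScore = 0;
  for (const ad of ads) {
    // Score = number of the ad's keywords that appear on the page.
    const score = ad.keywords.filter((k) => words.has(k.toLowerCase())).length;
    if (score > bestScore) {
      best = ad;
      bestScore = score;
    }
  }
  return best; // null if nothing on the page matches
}
```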

Actually finding an ad platform that will support this is entirely another matter though. The simple truth is that right now, if you want to include third party adverts on your site, you're almost certainly going to be invading your users' privacy.

There are apparent exceptions, such as Codefund, which claims not to track users. I've not used them myself, and they're restricted to sites aimed at the open source community, so they won't be a viable option for most sites.

Compared to many others, my site is rather simple. Certainly that makes handling my readers' privacy easier than for a more complex site. Nevertheless I hope it's clear from the approaches described here that there often are alternatives to just going with the flow and imposing trackers on your users. With a bit of thought and effort, there are other ways.
 
