What happens when you browse

Browsing the web is communicating. You might not send as much text in terms of number of words, but it is always the browser which initiates and maintains the communication by requesting the bits and pieces which are woven into what is eventually displayed on your screen.

Browsers like Mozilla Firefox, Google Chrome, Opera, Safari & Internet Explorer all work in a similar manner. When we type a URL (e.g. „http://happybunnies.com“) in the address bar, the browser requests the page (which is just a special kind of text) from a remote server and then transforms it into colored blocks, text and images to be displayed in the browser window. To see the text the way the browser sees it, one just has to click on the View –> Page source menu entry in the browser. What comes up is the same webpage but in HTML – a language mainly concerned with content, context and links to other resources (CSS and JavaScript) which govern the way these contents are displayed and behave.

When the browser tries to open a webpage – and assuming there are no proxies involved – the first thing it does is to check its own cache. If there are no past memories of such website, it tries to resolve the name into an address it can actually use. It is an internet program, so it needs an Internet Protocol address (IP address or just IP). To get this address it asks a DNS Server (a kind of phonebook for internet programs) which is installed in the router of your internet access by default. The IP address is a numerical label assigned to every device in the (global) network, like the address of a house in the postal system – and as the address of your home, you should be very careful to whom you hand out the IP address you are browsing from, though by default it is visible to everyone.

Once the IP address has been received, the browser opens a TCP (just a communication protocol) connection to the destination host and starts sending packages to a port at this address, typically no. 80, unless another path is specified (ports are like doors to the servers, there are many but usually only a few are open). These packages travel through a number of servers on the internet (up to a couple of dozen depending on where the target address is located). The server then looks for the requested page and, if found, delivers it using the HTTP protocol.

When the HTTP response arrives, the browser can close the TCP connection or reuse it for subsequent requests. The response can be one of many things, from some sort of redirection to a classic Internal Server Error (500). Provided the response proceeds as expected and we get to the happy bunnies, the browser will store the whole page in a cache for further use, decode it (uncompress it if compressed, rendered if video codec, etc) and display/play it according to instructions.

Now, the process can be illustrated in a little conversation between browser (B) and server (S):

B: „Hallo.“

S: „Hey!“

B: „May I get that page with the happy bunnies, please?“

S: „Well, here you are.“

B: „Oh, maybe you could also give me a big version of that picture of that bunny baby cuddling a teddy bear.“

S: „Sure, why not.“

[…]

B: „That's all for now. Thank you. Bye.“

But this is not the only thing that happens when you browse; there are lots of activities happening in the background that you might not be aware of. Depending on how you have configured its options, your browser will probably be adding the page to browser history, saving cookies, checking for plugins, checking for RSS updates and communicating with a variety of servers, all while you're looking at bunnies.

A topography of you: footprints

Most important: you will leave footprints. Some of them will be left on your own computer – a collection of cache data, browsing history and naughty little files with elephantine memory called cookies. They are all very convenient; speed up your browser's performance, reduce your data download or remember your passwords and preferences from your favorite Social Networks. They also snitch on your browsing habits and compile a record of everywhere you go and everything you do. This should bother you in any case, but especially if you are using a public computer station at a library, work at a cybercafe, or share your apartment with a nosey partner!

Even if you configure your browser to not keep a history record, reject cookies and delete cached files (or allocate zero MB of space for the cache), you would still leave a trail of breadcrumbs all over the Internet. Your IP address is recorded by default everywhere, by everyone, and the packets sent and received are monitored by an increasing number of entities - commercial, governmental or criminal, along with creeps and potential stalkers.

Democratic governments everywhere are redesigning regulations to make Internet providers keep a copy of everything so they can have later access to it. In the USA, section 215 of the American PATRIOT act 'prohibits an individual or organization from revealing that it has given records to the federal government, following an investigation'. That means that the company you pay every month as a customer to provide you with Internet access can be ordered to turn over your browsing and email records, and the law bans them from letting you know about it.

Most of the time, though, surveillance is not a 1984 affair. Google collects your searches along with your browser identification (user agent), your IP and a whole bunch of data that can eventually lead to your doorstep, but the ultimate aim is usually not political repression but market research. Advertisers don't fuss about advertising space any more, they just want to know everything about you. They want to know your dietary and medication habits, how many children you have and where you take them on holidays, how you make your money, how much you earn and how you like to spend it. Even more: they want to know how you feel about stuff. They want to know if your friends respect those feelings enough so that you can convince them to change their consumption habits. This is not a conspiracy, but rather the nature of Information Age capitalism. To paraphrase a famous observation of the current situation, the best minds of our generation are thinking about how to make people click ads.1)

Some people think ads can be ignored or that having advertisers cater for our specific needs is a win-win situation, because at least we are spammed with things we may actually want. Even if that was the case (it isn't): should we trust Google with such reports of our life? Even if we trust Google to 'do no evil', it can still be bought by someone evil; benevolent Larry Page and Sergey Brin could be overruled by their own board, their data base be sequestered by a fascistic government. One of their 30,000 employees worldwide could cut loose and run with our data. Their servers can be hacked. In any case, their priority will always be their real customers, the companies paying for advertising. We are just the product being sold.

Moreover, in the Social Networks our browsing habits are generating a Permanent Record, a collection of data so vast that the information that Facebook keeps about a given user alone can fill a thousand pages. Nobody will be surprised to learn that Facebook's purpose is not to make us happy – again: if you are not paying for it, you're not the customer, you're the product. But even if you don't care about their commercial goals, consider this: the platform has publicly admitted hackers break into hundreds of thousands of Facebook accounts every day.

For a taste of what lurks behind the curtains of the websites you visit, install a plugin/add-on called Ghostery to your browser. It's like an x-ray-machine that reveals all the surveillance technology which might be (and often is) embedded in a web page, normally invisible to the user. In the same line, Do Not Track Plus and Trackerblock will give you further control over online tracking, through cookie blocking, persistent opt-out cookies, etc. Our following chapter Tracking will equip you with expertise in such topics.

Even in between your computer and the router, your packages can easily be intercepted by anyone using the same wireless network in the casual environment of a cafe. It is a jungle out there, but still we choose passwords like „password“ and „123456“, perform economic transactions and buy tickets on public wireless networks and click on links from unsolicited emails. It is not only our right to preserve our privacy but also our responsibility to defend that right against the intrusions of governments, corporations and anyone who attempts to dispossess us. If we do not exercise those rights today, we deserve whatever happens tomorrow.

  1. If you are a Unix user, you can use the tcpdump command in the bash and view real time dns traffic. It's loads of fun! (and disturbing)
  2. If this exchange is happening under an HTTPS connection, the process is much more complicated and also much safer, but you will find out more about that in a most fascinating chapter called Encryption.
1) This Tech Bubble Is Different (http://www.businessweek.com/magazine/content/11_17/b4225060960537.htm), Ashlee Vance (Businessweek magazine)
encs/cph/what-happens-when-you-browse.txt · Poslední úprava: 2013/01/12 21:45 (upraveno mimo DokuWiki)
 
Kromě míst, kde je explicitně uvedeno jinak, je obsah této wiki licencován pod následující licencí: CC Attribution-Share Alike 3.0 Unported
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki