How much personal information do we share on-line?

Disclaimer: I am a software engineer and as such am involved in many projects, some of which may require the use of data mining. I am by no means a data mining expert.

We go on with our on-line identities without paying much attention to the details we share about our personal lives, interests, hobbies and friends. We have profiles on Facebook, accounts on Twitter, Flickr, YouTube, discussion boards and other public websites. These are all open and available to anyone. Could that affect us in any way?

Excuse me, did you just say data mining?

Data mining is the process of sorting through large amounts of data and picking out relevant information. […] Wikipedia page on Data mining

What good can it do? Imagine yourself sitting at home on a Saturday with no plans whatsoever, and things are getting a bit boring. You go to www.i-have-no-plans-please-help.co.uk and you enter your user name, the very same user name you share across most websites. Here is where the magic happens:

  • You have a Facebook profile registered under that user name. Since your profile is not made public by default, another user can only see your friends and, in some cases, your and their status updates. By looking at the most popular groups and events, a site can extract data about your interests and the causes you support.
  • You also happen to have a Twitter account where your timeline is visible to anyone. Your list of friends can be cross-referenced with the one downloaded from Facebook.
  • There is a page on Flickr where you share your favourite photos you’ve taken whenever there was a cool party or a concert nearby. Groups can be cross-referenced with those available on Facebook.
  • In some cases you might be sharing your current location via services such as Google Latitude.
  • YouTube introduced a new feature called Active Sharing not long ago. In a few words, it allows anyone to see the videos you have watched. Groups, channels and friends are available and can be cross-referenced.
  • Let’s not forget Google itself where you can run a query on your user name or your full name (available on any of the websites above) and get a somewhat accurate list of all other discussion boards and websites you actively participate in.
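To make the cross-referencing step a little more concrete, here is a minimal sketch in Python. It assumes the friend lists have already been collected somehow; the names, the `profiles` data and the `cross_reference` helper are all made up for illustration and do not correspond to any real service API.

```python
from __future__ import annotations


def normalise(name: str) -> str:
    """Lower-case a display name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def cross_reference(profiles: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every name seen on two or more services, with those services sorted."""
    seen: dict[str, set[str]] = {}
    for service, names in profiles.items():
        for name in names:
            seen.setdefault(normalise(name), set()).add(service)
    return {
        name: sorted(services)
        for name, services in seen.items()
        if len(services) > 1
    }


# Hypothetical, hand-written friend lists (assumption: nothing is fetched
# from a real website here).
profiles = {
    "facebook": ["Jennifer Smith", "Tom Baker", "Alice Wong"],
    "twitter": ["jennifer smith", "Alice  Wong", "Random Bot"],
    "flickr": ["Tom Baker", "jennifer smith"],
}

overlap = cross_reference(profiles)
for name, services in sorted(overlap.items()):
    print(name, "->", ", ".join(services))
```

Matching on a normalised display name is the crudest approach possible, and a real system would probably need something smarter, but even this is enough to show how quickly separate public lists start pointing at the same people.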

Now imagine all that information gathered in just a few seconds. Using data mining, the relevant bits can be filtered and further analysed. Here is one possible suggestion www.i-have-no-plans-please-help.co.uk might throw at you:

Hey, it turns out last Saturday you went to a fancy Chinese restaurant [via Twitter] and you enjoyed the food [via Flickr]. Your friends are currently not very far away [via Google Latitude] and are organising a party [via Facebook], but they don’t have any good ideas for a nice place [via discussion boards]. You should take them to the Chinese restaurant and after that you might catch a movie. It seems like everyone is talking about RocknRolla [via Twitter, Facebook, IMDB boards]. And while you are at it, stop by Jennifer’s place, her sister is in town [via Facebook] and you have a few hours to kill anyway.

* via xxx indicates the service(s) used to gather the information

It sounds a bit scary, doesn’t it? But we are talking software here. The website doesn’t really know you. It just analyses data and makes the best of it. What is scary is if someone decides to target you. We are bound to have someone take all this goodness and profit from it.

Don’t get me wrong. I am not saying stop using Facebook, Twitter, Flickr and (your favourite services here). Rather, refrain from sharing too much about yourself and your friends. I have to dash now, I’ve just received an SMS recommendation for a new coffee shop I must try… wonder how they know I like coffee so much?

It's just as difficult to live in a self-made hell of privacy as it is to live in a self-made hell of publicity.
— Michael Hutchence