OkCupid at centre of data storm » Charity Digital News

Dating site OkCupid is at the centre of a data security storm. A group of researchers has released a data set covering nearly 70,000 of the site's users, drawn from profile data that was publicly available.

The release puts OkCupid back in the headlines, after Mencap called on the dating website to remove a “deeply offensive question” and apologise to people with a learning disability for the offence caused.

Now, researchers from Aarhus University, Denmark, have released a paper called “The OKCupid dataset: A very large public dataset of dating site users”, along with a data set that contains information on users including usernames, sexual preferences, orientation and more.

The researchers say a slice of OkCupid user data is available to the public, and that entering a specific username into a search engine can bring up some profiles, along with part of the information the person provided to the dating service.


Poor judgment

Commenting on this, Rob Sobers, director at Varonis, said: “We have to live under the assumption that, if we make data public, it can and will be scraped and collected and archived permanently. The profile data is public, so technically this is not a hack or a breach. Anyone could easily get their hands on any individual user profile that is in the dump. However, what the researchers did was compile it all into one big structured data set, which makes it easy for both good guys and bad guys to analyse.

“They should have stripped the usernames from the data dump to anonymise it. It was poor judgment not to do that. They claimed they left the usernames in the dump so that they could back-fill the dataset with more information in the future. But they could have used an anonymous unique ID and kept the mapping of anonymous IDs to usernames private and it would solve that problem.

“It’s helpful to create data dumps for studies. OkCupid does this themselves. They often release really interesting findings about their users based on aggregate data. But today we have to be more security and privacy conscious—publishing data dumps with PII and sensitive information without adequate de-identification isn’t a good thing.

“I don’t think the researchers here were after bragging rights, it seems like they were just naïve vis-à-vis the privacy implications of compiling OkCupid’s data into an easy-to-exploit format without any prior notice to OkCupid or the people involved.”
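
The approach Sobers describes, replacing usernames with anonymous unique IDs and keeping the mapping private, can be illustrated with a short Python sketch. It is not the researchers' code or anything OkCupid uses; the function name `pseudonymise`, the `user_id` field and the sample profiles are all hypothetical, chosen only to show the idea.

```python
import secrets


def pseudonymise(records, key="username"):
    """Swap each username for a random ID.

    Returns (public_records, private_mapping). Only public_records would be
    published; private_mapping stays with the data holder so that new
    information can be matched to the same people later.
    """
    mapping = {}      # username -> anonymous ID (keep private)
    used_ids = set()  # guards against the unlikely case of a repeated ID
    public = []
    for record in records:
        username = record[key]
        if username not in mapping:
            new_id = secrets.token_hex(8)
            while new_id in used_ids:
                new_id = secrets.token_hex(8)
            used_ids.add(new_id)
            mapping[username] = new_id
        # Copy every field except the username, then attach the anonymous ID.
        cleaned = {k: v for k, v in record.items() if k != key}
        cleaned["user_id"] = mapping[username]
        public.append(cleaned)
    return public, mapping


# Hypothetical sample data, not taken from the released data set.
profiles = [
    {"username": "example_user_1", "age": 31},
    {"username": "example_user_2", "age": 27},
]
public_profiles, private_mapping = pseudonymise(profiles)
```

Because the IDs are random rather than derived from the usernames, they cannot be reversed without the private mapping. Removing usernames is only one step, though: other fields in a profile can still identify a person when combined, so a real release would likely need further de-identification.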
