As everybody is well aware by now, data on more than 30 million users of the extramarital dating site Ashley Madison were leaked this week. At Tecnilógica, we came up with the idea of building a world map of infidelity (which you can see here) from the anonymized data in these accounts and loading it into CartoDB. So we immediately got down to work.
Starting from the leaked data set of the dating service (30 million users), the first step was to remove sensitive data from the database (name, address, email, etc.). The reason for this is twofold: on the one hand, we reduce the amount of data we have to work with, and on the other, we protect the privacy of the users. The remaining data contains only the user's gender, the sign-up date, and the city where they are located. Thus, it is impossible to connect a point on the map with a particular user. We also deleted the data from users living in small ‘lifeless’ towns to avoid the “we caught you because you are the only user from Boreland-upon-Nowhere” effect.
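The anonymization step described above can be sketched roughly as follows. This is only an illustration, not the code we actually ran: the field names (`gender`, `signup_date`, `city`) and the minimum number of users per city are assumptions, since neither the schema of the dump nor the exact cut-off is given here.

```python
from collections import Counter

# Hypothetical field names; the real schema of the dump is not described in the post.
KEPT_FIELDS = ("gender", "signup_date", "city")

# Assumed cut-off for "too small to be anonymous"; the post gives no number.
MIN_USERS_PER_CITY = 3


def anonymize(records, min_users=MIN_USERS_PER_CITY):
    """Keep only non-identifying fields and drop cities with too few users."""
    # Copy just the kept fields, so name, address, email, etc. never leave this step.
    stripped = [{field: r[field] for field in KEPT_FIELDS} for r in records]
    # Count users per city so that sparsely populated places can be removed.
    city_counts = Counter(r["city"] for r in stripped)
    return [r for r in stripped if city_counts[r["city"]] >= min_users]
```

Whitelisting the fields to keep, rather than listing the ones to delete, means an unexpected sensitive column cannot slip through by accident.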
The next step was to find a visualization that would allow us to better understand the data. To work faster, we began to experiment with a subset of ‘just’ 3 million records, while in parallel we set up a database server to handle the full dataset. At this point, the first decisions about design and visualization were being made.
After looking at the data, we decided to make a map with two layers. The first one, an intensity (or heat, wink, wink, nudge, nudge) map, would give us an overview of the distribution of users across the world. For the second layer we decided to compute the male/female ratio per city, highlighting those cities with more than 15% female users. This layer allowed us to see patterns that had remained hidden until then: AM is a service predominantly used by men, but in countries like India, South Africa or Brazil, the proportion of women (15-20% of the total) is not as unbalanced as in Central and Southern Europe, where it does not exceed 10%.
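As a rough sketch of the calculation behind the second layer, the share of female users per city could be computed like this. The 15% threshold comes from the text above; the record layout and the `"F"` gender code are assumptions made for the example, and this is not the query we actually ran against the database.

```python
from collections import defaultdict

# From the post: cities with more than 15% female users are highlighted.
FEMALE_THRESHOLD = 0.15


def female_share_by_city(records):
    """Return {city: (total_users, female_share, highlighted)}."""
    totals = defaultdict(int)
    females = defaultdict(int)
    for r in records:
        totals[r["city"]] += 1
        if r["gender"] == "F":  # assumed encoding of the gender field
            females[r["city"]] += 1
    result = {}
    for city, total in totals.items():
        share = females[city] / total
        result[city] = (total, share, share > FEMALE_THRESHOLD)
    return result
```

Each city then becomes a single point on the map, carrying its user count for the intensity layer and the highlight flag for the ratio layer.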
Once we knew what kind of data we were going to use, how to access it, and how to represent it, we started working on the entire database. For this, we used a MySQL database installed on a server with 8 cores and 16 GB of memory. However, queries against the database took some time, which did not play in our favor, because we wanted to publish the map as soon as possible.
After eight hours of work, we finally published the map, which includes more than 50,000 points that group together the 30 million users. As mentioned above, we chose CartoDB as the visualization tool because of its flexibility, ease of use, and power. Finally, we named it MALFIDELECO (again, we leaned toward Esperanto as the baptismal language for the project, as we did with other ‘crazy projects’ like Animestato, Famostrato, or Kuligraso).
Here’s the link to the map again. We look forward to your comments; your feedback is always welcome! 🙂