Status update
Added 2018-04-13 17:35:38 +0000 UTC
Hi. As you all know, Steam has changed their security settings to prevent Steam Spy and similar sites from accessing user information. Here is my interview about it.
Going forward, Steam Spy still collects user information, but the number of queried profiles has dropped significantly, from about 2M per 3 days to 60K, give or take. All of those 60K users are hardcore players who own a lot of games, so this particular data is still useful for X-Analysis or geography, but not so much for stats on owners or players.
This is why I just blocked regular users from accessing sensitive data on Steam Spy. The data is no longer accurate, and I expect most of those users do not follow the latest developments. You, as Patreon backers, can still access it, of course.
I've been told by multiple people that even a less precise Steam Spy is going to be useful as long as the people using it understand the caveats. This is why I'd like to try and keep the site working in some capacity.
I, of course, wrote an email to Valve with a possible privacy-conscious solution and pinged my friends there. The Steam team is, of course, ignoring me. Well, they have their annual corporate holidays in Hawaii until Monday, so that might be the reason. Anyway, I don't expect Valve to cooperate; they never do.
The second option is to use the massive amount of data the site has already acquired to train a machine learning model on the parameters that are still accessible.
I only started doing this today (had some fires in the last two days at my day job), but the results are promising. When games are divided into categories with lower precision, the algorithm correctly identifies the number of owners in almost 90% of cases. Unfortunately, the accuracy decreases as a game's sales go up (the exact opposite of the previous version of Steam Spy), because I have less data on big games. But I've only run this algorithm on a small sample of 12,000 points (7MB) with no pre- or post-filtering. I can easily increase the sample to at least 1M data points to see if that helps with the accuracy at bigger numbers.
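To make "categories with lower precision" a bit more concrete, here is a minimal Python sketch of one way to score predictions by coarse owner category. The order-of-magnitude buckets, the function names and the sample numbers are all assumptions made up for illustration. This is not the actual Steam Spy algorithm or its data.

```python
from collections import defaultdict


def owner_bucket(owners):
    """Coarse category for an owner count: its order of magnitude.

    Assumption for this sketch: 1,000-9,999 owners is bucket 3,
    10,000-99,999 is bucket 4, and so on.
    """
    return len(str(max(int(owners), 1))) - 1


def accuracy_by_bucket(actual, predicted):
    """Share of predictions that land in the correct bucket, per actual bucket."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for real, guess in zip(actual, predicted):
        bucket = owner_bucket(real)
        totals[bucket] += 1
        if owner_bucket(guess) == bucket:
            hits[bucket] += 1
    return {bucket: hits[bucket] / totals[bucket] for bucket in sorted(totals)}


# Made-up numbers, only to show the shape of the output.
actual = [1500, 2400, 18000, 52000, 1200000, 3400000]
predicted = [1900, 800, 15000, 70000, 900000, 2000000]
print(accuracy_by_bucket(actual, predicted))
```

Breaking the score down per bucket like this is what would show whether accuracy really drops for the bigger games, instead of hiding it behind a single overall percentage.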
If I manage to get this algorithm to work with tolerable accuracy, I'm planning to keep Steam Spy running. In that case, some sensitive data will not be available to unregistered or even non-paying users, to prevent abuse by people with little understanding of statistics.
If I'm unhappy with the algorithm, I will most likely cancel this Patreon and keep Steam Spy as an archive.
I'll be glad to hear your opinions and suggestions.