Over the last two days, we've looked at the number of messages across Usenet. We seem to have found an interesting phenomenon: a surge in messages that doesn't connect to an obvious technical failure on our part.
One obvious next step is to figure out where those posts are hiding.
Let's see if we can trace this out a little further. Here are the curves for three different news hierarchies: microsoft.public, tw, and alt.binaries.
In each of them, the orphan counts are on the smaller scale (usually something like 10% of the message scale).
These were pretty arbitrary choices, but there is a method to this madness. These are three of the biggest hierarchies in our records.
We look pretty carefully at microsoft.public, and we have better archives for it, so if the phenomenon is there, it'll be easy to pin down. alt.binaries often gets scapegoated as a home for odd behavior, because it's so very big and so heavily trafficked with illegal material (CD rips, DVD copies, cracked computer games, and so on).
Now, I'm not sure what's going on with the bump in the orphan count in microsoft.public, but the post count isn't going nuts. Our feed from Taiwan isn't quite as reliable as I might like, but the tw post count is pretty stable too.
And, wow, alt.binaries has our bump. Peak to baseline, it looks to be somewhere around the 300,000 that we're interested in.
Now, a lesser research team might accept this and call it a day. "Yeah, that's those binary posters," they might say. "Not even Google tracks 'em -- they just want to swap pr0n and copies of Return of the King."
And while this is true, there is always more data to sift. Let's go on. Maybe this whole phenomenon is buried within a single newsgroup or two. (Then again, maybe it's not.)
Let's try to be systematic, now. I got a table of the posts-per-day for every alt.binaries group with more than ten thousand posts. For each group, I compared a baseline period (May 2003 - Feb 2004) to the bump period (May 2004 - June 2004), and checked whether the average posts per day during the later period was more than twice the average posts per day of the earlier period.
Put another way, I looked for groups where the average number of daily messages from May 2003 through Feb 2004 was under half the average for May 2004 through June 2004. I also looked for groups that had a maximum day over 50,000 messages. With less than that, a group just can't contribute much to a spike this large.
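For anyone who wants to try the same sift, here is a rough sketch of that filter in Python. It is not the script I actually ran. The function names, the layout of the input (a dict of group name to per-day counts), the exact window end dates, and the choice to require both the doubling test and the 50,000-message peak at once are all assumptions made for illustration.

```python
from datetime import date
from statistics import mean

# Window boundaries are assumptions: the text only names the months.
BASELINE = (date(2003, 5, 1), date(2004, 2, 29))
BUMP = (date(2004, 5, 1), date(2004, 6, 30))


def mean_in_window(daily_counts, window):
    """Average posts per day over the days we have data for in the window."""
    start, end = window
    values = [n for day, n in daily_counts.items() if start <= day <= end]
    return mean(values) if values else 0.0


def find_bump_groups(posts_per_day, min_total=10_000, ratio=2.0, min_peak=50_000):
    """Return groups whose bump-period average is more than `ratio` times
    the baseline average and whose busiest day exceeds `min_peak`.

    posts_per_day: {group_name: {date: post_count}} (hypothetical layout).
    """
    hits = []
    for group, daily in posts_per_day.items():
        # Skip small groups (this also skips groups with no data at all).
        if sum(daily.values()) <= min_total:
            continue
        base = mean_in_window(daily, BASELINE)
        bump = mean_in_window(daily, BUMP)
        peak = max(daily.values())
        # A group with no baseline traffic has base == 0, so any bump passes
        # the ratio test; that is intended for groups that appeared late.
        if bump > ratio * base and peak > min_peak:
            hits.append(group)
    return hits
```

Days with no recorded count are simply left out of the averages here, which may or may not be the right call for a flaky feed.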
It's not hard to find echoes of the bump in a number of groups ...
… but the most dramatic one that I'm finding seems to be in alt.binaries.dvd. That group had nothing before January 2003, but got up to 50,000 posts per day during our time period. Tomorrow, we'll look a little more into what happened.
January 20, 2005 02:15 PM | in Data and Documents