Exploratory Data Analysis Of The GiantBomb.com Userbase

paulwgraham
Edited By paulwgraham

Introduction

To me Giant Bomb feels like a small site. I mostly see the same people in livestream chat and posting on the forums. Yet intellectually I know that GiantBomb.com must get a fair amount of traffic and have a sizable number of users in order to pay the salaries of the crew, artists, and engineers that run the site.

So I've always wondered: How many users does Giant Bomb have? How many users are Premium? How many users actually post to the forums?

Fortunately, on GiantBomb.com user profiles are public by default and contain all the information I need to answer these questions.

Methodology

I wrote a web crawler that, over the last couple of months, has slowly scanned every Giant Bomb user profile it could find. I paid special attention to making sure the crawler respects the rules laid out in GiantBomb.com/robots.txt and is throttled to make only one request per second.
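For anyone curious what "respect robots.txt and throttle to one request per second" can look like in practice, here is a minimal sketch. It is not the crawler that was actually used. It relies on Python's standard urllib.robotparser module; the Throttle class, the load_robots and crawl functions, the "example-crawler" user agent, and the injected fetch callable are all hypothetical names chosen for illustration.

```python
import time
import urllib.robotparser


class Throttle:
    """Ensure consecutive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def load_robots(robots_txt, user_agent="example-crawler"):
    """Parse robots.txt text and return a function that says whether a URL may be fetched."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


def crawl(urls, allowed, throttle, fetch):
    """Yield (url, body) for every permitted URL, pausing between requests."""
    for url in urls:
        if not allowed(url):
            continue  # skip anything robots.txt disallows
        throttle.wait()
        yield url, fetch(url)
```

The fetch argument is left injectable so the sketch can be exercised without touching the network; a real crawler would pass in something that performs the HTTP request.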

Disclaimer

Given how long the data took to collect, it is by its very nature somewhat out of date. The following analysis doesn't account for the normal churn of users.

Also, it should be noted that neither the manner in which the data was collected nor the methods used in the analyses have been vetted for correctness.

The Users

I was able to find 1,052,304 GiantBomb.com user accounts of which 988,483 have publicly available profiles.

The following charts show user sign-ups by ISO 8601 week.

[Charts: user sign-ups by ISO week, 14 charts in total]

From the above charts it's clear that something strange started happening in the 29th week of 2020.

The average number of user sign-ups per ISO week was 642 for 2019, 1,902 for 2020, and 6,881 for 2021.
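As a rough illustration of how sign-ups can be bucketed by ISO week and averaged per year, here is a minimal sketch using only Python's standard library. The function names and the sample dates are hypothetical and are not the real sign-up data. Note that ISO weeks belong to an ISO year, which can differ from the calendar year around New Year.

```python
from collections import Counter
from datetime import date


def signups_per_iso_week(signup_dates):
    """Count sign-ups keyed by (ISO year, ISO week)."""
    counts = Counter()
    for d in signup_dates:
        iso_year, iso_week, _weekday = d.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return counts


def average_per_week(counts, iso_year):
    """Mean sign-ups per ISO week, over the weeks of iso_year that had any sign-ups."""
    weekly = [n for (year, _week), n in counts.items() if year == iso_year]
    return sum(weekly) / len(weekly) if weekly else 0.0


# Illustrative dates only, not real sign-up data.
sample = [
    date(2019, 12, 30),  # a Monday that already belongs to ISO week 2020-W01
    date(2020, 7, 13),   # first day of 2020-W29
    date(2020, 7, 14),
    date(2021, 6, 30),   # 2021-W26-3
]
counts = signups_per_iso_week(sample)
```

One design caveat: weeks with zero sign-ups never appear in the counter, so this average is only over weeks that had at least one sign-up.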

The following chart shows user sign-ups by year.

[Chart: user sign-ups by year]

413,673 users signed up before 2020. 574,810 users signed up in 2020 and after.

[Chart: user sign-ups before 2020 versus 2020 and after]

So what started this massive influx of new users? It's hard to say for sure, but there are a few possibilities. In 2020 the Covid-19 pandemic kicked into full swing, and ISO week 11 was when Giant Bomb began primarily streaming on Twitch with their "Lockdown" streams. This no doubt had an effect on user sign-ups.

However, after examining the data I believe the biggest contributor to this ongoing influx of user sign-ups is spam accounts.

The Spam

Did you know Giant Bomb has a significant spam problem? I didn't. I mean I do occasionally see spam posts on the forums but those posts are quickly deleted and the spam user accounts used to create the posts are quickly banned.

What I was surprised to discover is that not only are there a ton of obvious spam user accounts on GiantBomb.com, but the spammers are sticking their spam messages in the "About Me" sections of their user account profiles.

In addition to the weird sex stuff and obvious scams that are typical for spam, the spam on Giant Bomb contains some interesting offers.

Do you want to buy some vegan deodorant? Do you live in Florida and need to get your Tesla modified? Do you need to get a wedding dress altered in the UK? Are you desperate to buy a Feng Shui plant from Vietnam?

If you answered "yes" to any of the above questions, the spam accounts on GiantBomb.com have you covered.

What's interesting is that the typical user account with "About Me" profile spam will never post on the Giant Bomb forums or comment on any videos. This means that there is practically no way an average real Giant Bomb user will ever see the spammer's profile, let alone the spam it contains.

One may ask: "If no one will ever see it, why do the spammers bother?" Well, I think the answer to that has to do with the fundamental nature of spam. The vast majority of all spam will never be seen or acted upon. However, placing spam on sites is essentially free for the spammer. Therefore it makes a sociopathic kind of sense for a spammer to place their spam messages in as many places as possible regardless of effectiveness, because any click-through they receive is pure upside. Thus, the only criterion that matters to the spammer is that it is possible to post spam on GiantBomb.com. It does not matter if it is an ineffective place to do so.

It should be noted that there appear to be a number of legitimate businesses being advertised by the spam user accounts. While it is possible that the owner of a florist in Fort Lauderdale, Florida is moonlighting as an operator of a spam-spreading botnet, I feel it's more likely that said florist innocently signed up for an online marketing service without realizing it was a shady spammer.

It's also interesting that "About Me" spam user accounts don't typically interact with GiantBomb.com in any way other than filling in their "About Me" sections. Because of this, I can confidently identify these accounts by looking for user accounts with filled-in "About Me" sections that have never posted on the forums or commented on videos, have not contributed to the Giant Bomb wiki, and don't currently hold any kind of Premium user status.
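The rule described above can be sketched as a small predicate. This is only an illustration under stated assumptions: the Profile class and every field name in it are hypothetical stand-ins for whatever the crawled data actually looks like.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # All field names are hypothetical; they mirror the criteria in the text.
    username: str
    about_me: str = ""
    forum_posts: int = 0
    comments: int = 0
    wiki_points: int = 0
    is_premium: bool = False


def has_activity(profile):
    """True if the account has posted, commented, edited the wiki, or is Premium."""
    return (
        profile.forum_posts > 0
        or profile.comments > 0
        or profile.wiki_points > 0
        or profile.is_premium
    )


def is_about_me_spam(profile):
    """Filled-in "About Me" section plus no other sign of activity."""
    return bool(profile.about_me.strip()) and not has_activity(profile)
```

By construction, a rule like this would also flag a genuine user who filled in "About Me" and never did anything else on the site.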

There are 239,113 "About Me" spam user accounts. 17,934 of them signed up in 2020.

The following charts show the number of "About Me" spam user sign-ups in 2020, 2021, and 2022 versus all other sign-ups in those years.

[Charts: "About Me" spam user sign-ups versus all other sign-ups for 2020, 2021, and 2022]

The "About Me" spammers account for a sizable chunk of all user sign-ups in 2020, 2021, and 2022, but what about the rest of the sign-ups? Are they real users? For the majority of those remaining sign-ups I believe the answer is most likely no. For 2021-W26 there was an average of 4,042 sign-ups per day. Here are the usernames for some of the sign-ups that happened on 2021-W26-3:

"bernardc6w", "webstermvu", "lauryznq", "michale6mp", "bettieikh", "bruceq9n", "armandopnq", "mortonj0i", "garthqem", "macy1v0", "hayliemyc", "cristopherfil", "alvenat3a", "brodywxd", "g8udhyo060", "vivienxgn", "devanu5x", "wilmam5l", "angel27q", "grahamo0o", "baileyjw0", "eddieu_d", "hilario990", "brainhg4", "aidenapx", "anderaoyxe", "deon5wb", "kenyagmp", "aylinpwu", "marlenu_l", "cecilenzk", "jamisont4g", "cloyd8ec", "piercem70", "paxtonrnfv", "charitytbb", "everettempa", "triston6th", "cassidynah", "maiyarlq", "edwarde9j", "altalvj", "cecilianvg", "nigellie", "brandt5fv", "edwinaujr", "imaj6e", "edmund2gf"

These usernames seem like they were generated by spammers. They all seem sort of similar but also don't seem like the kind of usernames humans would choose.

Unlike for the "About Me" spam user accounts, there doesn't seem to be a simple test that can be used to identify this type of spam user account. These accounts don't seem to interact with GiantBomb.com in any measurable way. My guess is that these spam user accounts are meant to lie dormant until such time as they are used to post spam to the forums, after which they are quickly banned.

If spam user accounts can't be distinguished from real user accounts, then it is impossible to get a precise count of real user accounts. However, this doesn't mean that it is impossible to get a relative sense of how many real users signed up in a given year in comparison to some other year. To this end I find it helpful not to focus on identifying spam user accounts but instead to focus on finding the user accounts of real users.

User accounts of what are most likely real users can be found by looking for basically the opposite of what to look for when searching for spam user accounts. If a user has posted on the forums or commented on videos, has contributed to the Giant Bomb wiki, or holds any kind of Premium user status, then that user is most likely a real user.
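A minimal sketch of that "probable real user" test, together with a tally of such accounts by sign-up year, might look like the following. The dictionary keys (forum_posts, comments, wiki_edits, premium, signup_year) are hypothetical names, not the actual schema of the collected data.

```python
from collections import Counter


def is_probably_real(account):
    """Any public sign of activity, or Premium status, marks a probable real user."""
    return (
        account["forum_posts"] > 0
        or account["comments"] > 0
        or account["wiki_edits"] > 0
        or account["premium"]
    )


def real_signups_by_year(accounts):
    """Count probable real users per sign-up year."""
    return Counter(a["signup_year"] for a in accounts if is_probably_real(a))
```

Accounts that fail this test are not necessarily spam; they are simply accounts with no public evidence either way.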

The following charts show the sign-ups for accounts that belong to probable real users for the years of 2019 and 2020.

[Charts: sign-ups of probable real users for 2019 and 2020]

Obviously, these charts don't include real user accounts that don't interact with GiantBomb.com in any public way, but if it is assumed that the proportion of active real users to inactive real users stays the same, then the relative increase or decrease of real user sign-ups can be calculated.

For the year of 2019 there were 4,671 user account sign-ups for known real users. In 2020 there were 3,663. This represents a decrease in real user sign-ups of roughly 22%. This is a considerable difference from the change in overall user sign-ups, which went from 33,588 in 2019 to 99,665 in 2020, an increase of 197%.
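The two percentages follow from the usual relative-change formula, (new - old) / old. A small sketch using the figures quoted above (the function name is just illustrative):

```python
def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


known_real = percent_change(4671, 3663)    # known real user sign-ups, 2019 to 2020
overall = percent_change(33588, 99665)     # all user sign-ups, 2019 to 2020

print(f"known real users: {known_real:.1f}%")  # about -21.6%
print(f"all sign-ups:     {overall:.1f}%")     # about 196.7%
```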

Active Users

So why does Giant Bomb feel like a small site despite having so many users? I believe the answer is that most users don't post. Of the 1,052,304 Giant Bomb user accounts, only 135,407 have ever posted, commented, or contributed to the wiki. Of those 135,407 active user accounts, only 2,690 have posted and/or commented in 2022.

Premium

There are 29,403 Giant Bomb user accounts with premium status. 24,276 are on yearly plans. 5,127 are on monthly plans.

[Chart: Premium user accounts]

edit: added missing chart.

jmacineurope

#1  Edited By jmacineurope

Homie, your y axis is all out of whack.

mortface

@jmacineurope: I thought so too when scrolling down, but realised it's actually normalised to the max y value across all years. It does make it harder to see week-by-week comparisons within a year, but it makes it easier to see the huge spike in the covid years (which I think is kinda the point, going into the spam users section).

noblenerf

If you ever visit Giant Bomb during the middle of the night in North America you'll see the bots in action. Every night, all year long.

Always wondered why measures were never put in place to crack down on the spam problem.

Broshmosh

This comes across much more like an analysis on Giantbomb's spam accounts, with extraneous info relating to users.

AlexW00d

V interesting tbf. The spam is VERY prevalent in what I suppose is American sleep time, despite the internet being borderless.

TurtleFish

@paulwgraham: FWIW, nice piece of data analysis. I suppose the one thing we really want to do, but will never be able to, is track premium subscriptions over time.

paulwgraham

@turtlefish: So it technically could be done, but you would need to take frequent snapshots of the data. However, I rate-limited the requests that I made to the site to one per second. Since the user count is at 1,052,304 and growing, just scanning the profiles alone took 12.18 days. Every other endpoint I hit took an additional 12.18 days. That is why this project took me two months instead of two days.
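The 12.18-day figure is just the account count divided by the number of seconds in a day, at one request per second. A quick sketch of that arithmetic (the function name is illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def crawl_days(num_requests, seconds_per_request=1.0):
    """Days needed to make num_requests at a fixed pace."""
    return num_requests * seconds_per_request / SECONDS_PER_DAY


days_per_endpoint = crawl_days(1_052_304)
print(f"{days_per_endpoint:.2f} days per full pass")  # 12.18
```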

carpe_dmt

FANTASTIC content, very good work duder.

Dragon_Puncher

Great thread! The amount of active users here in 2022 is scarily low, not gonna lie.

wunder_

#10  Edited By wunder_

This is awesome stuff! 30k subs is really high - do you have historical data on that or is that not possible?

edit: oops, saw your reply to turtlefish.

paulwgraham

@wunder_: So no, as far as I know there is no way to get public historical data on how many Premium users there are. I checked the Wayback Machine, but they don't archive Giant Bomb user profiles.

However, I have been thinking today about the problem you and @turtlefish posed of tracking the number of premium users over time going forward. Even though it's not something I personally would be interested in doing, I can think of two things that might make this kind of tracking feasible while sticking to the restriction of only one request per second.

1. Find a way to identify more spam accounts. I believe that, given a reliable way to identify spam accounts, it might turn out that the large majority of the accounts on GB belong to spammers. There might then be so few accounts belonging to real users that it would be possible to scan all of their profiles in a couple of days or maybe a week.

2. Use statistics. Using the data I have already collected for this project, I took a random sample of 86,930 users and found 2,349 premium user accounts. This gives an estimate of 28,613 premium user accounts out of the 1,052,304 user accounts I have data for. This estimate isn't too far off the actual total for the population, which was 29,403 premium user accounts. I used a sample size of 86,930 because that is right around the total number of user accounts that could be retrieved and processed in a single day.
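The sampling idea in the second item can be sketched as a simple proportion estimate with a rough 95% interval from the normal approximation. This is an assumption about the method, not necessarily how the estimate above was produced: plain proportional scaling of the quoted sample figures gives about 28,435 rather than 28,613, so the original calculation may have differed slightly. The function name is illustrative.

```python
import math


def estimate_total(sample_size, sample_hits, population_size, z=1.96):
    """Scale a sample proportion up to the population, with a rough interval.

    Uses the normal approximation for a proportion and ignores the
    finite-population correction, so the interval is slightly wide.
    """
    p = sample_hits / sample_size
    std_err = math.sqrt(p * (1 - p) / sample_size)
    estimate = p * population_size
    margin = z * std_err * population_size
    return estimate, estimate - margin, estimate + margin


estimate, low, high = estimate_total(86_930, 2_349, 1_052_304)
print(f"estimate {estimate:,.0f}, rough 95% interval {low:,.0f} to {high:,.0f}")
```

Under these assumptions, the known total of 29,403 premium accounts falls inside the interval, which is consistent with a one-day sample being good enough for tracking purposes.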

paulwgraham

#13  Edited By paulwgraham

@jessicareed: Not to be rude, but I believe you are a chat bot. Your account was created today, and I think you are posting on an old thread in order to build social credibility so that your account can be used for other purposes in the future. I'll walk you through my reasoning:

"This is a great topic to explore!" <super generic, yet upbeat>

"I always found myself wondering about <insert extracted topic>."

"It's amazing to see how much data can be extracted from public user profiles on the site." <actually natural sounding>

"It's a good thing to see how you can use data integration tools to analyze the data and get answers to your questions." <awkward phrasing, affirming yet doesn't add any new ideas to the discussion>

"I'm sure many users on this forum would love to see the results of your exploratory data analysis." <huh?>

If I'm wrong then this is super awkward and I apologize. I'll tell you what @jessicareed if you are a human (or a facsimile advanced enough to fool me) complete the following phrase and PM it to me and I'll gift you a month of premium: "It's a <blank> baby! <blank> <blank> up <blank> <blank> <blank> <blank> <blank>!"

For any humans reading this, my Giant Bomb inspired projects are all done and dusted, but if you enjoyed this one there is another ancient one I did in a similar vein that you may like: Finding And Fixing Broken YouTube IDs On GiantBomb.com

BisonHero

If I'm wrong then this is super awkward and I apologize. I'll tell you what @jessicareed if you are a human (or a facsimile advanced enough to fool me) complete the following phrase and PM it to me and I'll gift you a month of premium: "It's a <blank> baby! <blank> <blank> up <blank> <blank> <blank> <blank> <blank>!"

Based on these criteria, I've just now learned I am probably a bot, because I cannot do the task you've outlined. All those years of correctly identifying photos of fire hydrants, for nothing.

Now I know how Harrison Ford must feel.

Also, it is absolutely a given that the account is either a bot, or is an account lightly manned by a human to bypass spam protection so they can later spam post a link somewhere. I assume the spam operation is primitive and it's not worth their time to PM you an answer to your question, but I also assume we're not that far off from connecting spam bots to something like a ChatGPT that could produce an answer to your blank-filled phrase without understanding the question at all. Your task for the month of premium is riding a fine line of giving too much/too little context for an outsider/machine to figure out.

paulwgraham

#15  Edited By paulwgraham

@bisonhero: I too once failed the Turing test. There was a relevant xkcd about this kind of thing that I think about more and more now that AI is getting crazy. These days it does feel like there is an overlap between the comments made by the most advanced AI and the dumbest of humans.

As for the phrase that pays:

I figured this would make a suitable shibboleth.