Where do you get the source for "all players who're in a clan, have competed in a ladder, or have posted on a forum"? Maybe I'm just being thick, but I'm not sure where that info would come from.
You'd just write a script to scrape the clan pages, the forums, the ladders, and anywhere else on the site where player activity is evident. You could also just crawl a bunch of pages, check for URLs to player pages, and store everything you find; it all depends on how slow or thorough you want your script to be.
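For what it's worth, here's a rough sketch of the "check for URLs to player pages" part, using only the Python standard library. I'm assuming profile links contain "/player/" in the path; that marker and the function names are made up for illustration, so check what the site's real profile URLs look like before using it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: profile links contain this path fragment (hypothetical).
PLAYER_PATH_MARKER = "/player/"


class PlayerLinkCollector(HTMLParser):
    """Collects absolute URLs of links that look like player profiles."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.player_urls = set()  # a set, so repeated links are stored once

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and PLAYER_PATH_MARKER in href:
            # Resolve relative links against the page they were found on.
            self.player_urls.add(urljoin(self.base_url, href))


def extract_player_urls(html, base_url):
    """Return the set of player-profile URLs linked from one page."""
    collector = PlayerLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.player_urls
```

You'd run this over every clan, ladder, and forum page you fetch and union the results, which gives you the list of profiles to visit.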
After that, is it just a simple parse, or do you need to extract information from individual player profiles?
You'd have to extract from individual profiles. This would be a rather slow process, and I'd recommend storing that information so you can conduct further analysis without having to recapture it.
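One way to do the storing, sketched with Python's built-in sqlite3 module: keep the raw HTML of each profile keyed by URL, so you can re-parse later without hitting the site again. The table layout and the function names here are just my assumptions, not anything the site provides.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small cache of fetched profile pages."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        " url TEXT PRIMARY KEY,"
        " html TEXT NOT NULL,"
        " fetched_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_profile(conn, url, html):
    """Store one page; a re-crawl of the same URL overwrites the old row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO profiles (url, html) VALUES (?, ?)",
            (url, html),
        )


def load_profile(conn, url):
    """Return the cached HTML for a URL, or None if it was never fetched."""
    row = conn.execute(
        "SELECT html FROM profiles WHERE url = ?", (url,)
    ).fetchone()
    return row[0] if row else None
```

Pass a file path instead of ":memory:" to keep the data between runs. Storing the raw page rather than just the extracted fields means a parsing mistake doesn't cost you another full crawl.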
It's on line 284 of the code and looks easy enough to extract. Are you saying it would need to be taken from every profile, or is there a player database somewhere?
Don't rely on the line number. I'd recommend parsing the HTML (Scrapy handles this for you). Watch out for weird edge cases.
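To show what I mean by parsing instead of counting lines, here's a minimal sketch with the standard library's html.parser (Scrapy's selectors would do the same job with less code). It looks a value up by element id, so it keeps working if the markup above it grows or shrinks. The id "rating" in the usage below is purely hypothetical; look at the page source to see what actually wraps the value you want.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextById(HTMLParser):
    """Collects the text inside the first element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # greater than 0 while inside the target element
        self.done = False   # True once the target element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(html, element_id):
    """Return the stripped text of the element, or None if absent or empty."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts).strip()
    return text or None
```

With this, text_by_id(page, "rating") returns the same value whether the element sits on line 284 or line 300, and returns None for profiles where the element is missing, which is one of the edge cases you'll want to handle explicitly.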