My statistical mind loves this ratings system. The thing I'm finding that I don't like is how hard it has been to overcome some bad early losses to players who aren't very highly rated now. It has only been a week, but I cringed when I went back and looked at how I played in my "bad losses". I know I can hang with the better players, but I haven't gotten much of a chance to prove it. My rating is currently 1546 and my overall record is 15-8.
Maybe the latest update to the game creation algorithm addresses this, but you would think with a rating around 1500 that I would have played about as many players above me as below me. In reality, I have completed games against 16 players rated below me and only 7 rated higher than me.
In addition, even if I were able to play and defeat the current top 3 players (Teddy, Impaller and Poop) in my next 3 matches, I would still only climb from 30th to 20th. While time will even everything out eventually, it just seems my early "bad losses" have been very difficult to overcome. After starting 1-3, I've won 14 of my 19 matches since.
It seems to me that the ratings system would be much more effective if everyone faced a more even spread of opponents until their Ladder position was better established by a larger sample of matches. I think the game creation algorithm should try to schedule games evenly throughout the Ladder for the first 20-30 matches a player plays. Then, once they settle into a ranking, it might make more sense to schedule more matches against similarly ranked opponents, while still allowing people a fair chance to climb the ladder.
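To make that concrete, here's a rough sketch of the kind of two-phase matchmaking I have in mind. To be clear, this is not how the actual game creation algorithm works; `pick_opponent`, the 25-game placement cutoff, and the 10-rank window are all made-up names and numbers for illustration, and it assumes the ladder has at least two players.

```python
import random

def pick_opponent(player_rank, games_played, ladder_size, rng=random,
                  placement_games=25, window=10):
    """Return the 1-based ladder rank of a suggested next opponent.

    Hypothetical rule: during the first `placement_games` games, draw
    uniformly from the whole ladder; afterwards, draw only from ranks
    within `window` places of the player.
    """
    if games_played < placement_games:
        # Placement phase: anyone on the ladder except the player.
        low, high = 1, ladder_size
    else:
        # Settled phase: only nearby ranks, clipped to the ladder bounds.
        low = max(1, player_rank - window)
        high = min(ladder_size, player_rank + window)
    candidates = [r for r in range(low, high + 1) if r != player_rank]
    return rng.choice(candidates)
```

With something like this, a new player ranked 30th on a 100-player ladder could draw anyone from 1st to 100th for their first 25 games, and only ranks 20 through 40 after that.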
MathWolf's suggestion in the "Skewed ratings results" thread ([link](http://warlight.net/Forum/Thread.aspx?ThreadID=1097&Offset=31)) to reweight the results as a continuously decreasing function of time would be a **huge** benefit. I think there will be a lot of players who will make dramatic improvements in their play. In my mind, weighting all results from the past 3 months equally will prove to be too long a tail to accurately reflect a player's current skill. The Whole-History Rating that Crafty found seems like the ideal solution!
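For anyone wondering what a "continuously decreasing function of time" could look like, here's a minimal sketch using exponential decay. It's only an illustration of the weighting idea, not MathWolf's exact proposal, not the ladder's real rating formula, and not Whole-History Rating. The 30-day half-life and the function names are my own assumptions.

```python
def decay_weight(age_days, half_life_days=30.0):
    """Weight of a result that is `age_days` old.

    Assumed form: the weight halves every `half_life_days`, so a game
    played today counts 1.0 and a game one half-life ago counts 0.5.
    """
    return 0.5 ** (age_days / half_life_days)

def weighted_win_rate(results, half_life_days=30.0):
    """Decay-weighted win fraction.

    `results` is a list of (age_days, won) pairs, where `won` is a bool.
    Returns 0.0 when there are no results.
    """
    total = sum(decay_weight(age, half_life_days) for age, _ in results)
    wins = sum(decay_weight(age, half_life_days)
               for age, won in results if won)
    return wins / total if total else 0.0
```

Under these assumptions, a win today and a loss 30 days ago would come out to a weighted win rate of about 0.67 instead of a flat 0.50, so recent form counts for more than old results.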