Ladder changes polls: 2014-08-26 06:24:39 |
Pushover
Level 59
Wow, thanks Fizzer. Really excited to see what comes out of this! :D
|
Ladder changes polls: 2014-08-26 10:20:55 |
[WM] Gnuffone
Level 60
Some of your options may need a little more explaining; some people may not know the difference between 'TrueSkill' and 'BayesElo', for example.
BayesElo is what is commonly called an ELO rating. It is the rating system we currently use in the 1v1, 2v2 and seasonal ladders. It is dynamic: each time a player you played before changes rating, that influences your rating as well. It is based on two things: your win rate and the average rating of the opponents you played.
TrueSkill (which we use in the real-time ladder) is static: it only counts the TrueSkill values at the moment you played your opponent. It is based on two things: the TrueSkill mean, which measures your skill, and the variation, which needs to be below 145 for you to get a rank. The formula is TrueSkill mean - 3 * variation.
I can safely say the second is much better.
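The displayed-rank rule described above fits in a few lines. This is a minimal sketch that assumes the shown rating is simply mean - 3 * variation and that a rank is only given once the variation drops below 145, as the post states; the names `Rating`, `conservative_rating`, `is_ranked` and the example numbers are hypothetical, not WarLight's actual code.

```python
from dataclasses import dataclass

# Threshold taken from the post above; the constant name is hypothetical.
RANK_SIGMA_THRESHOLD = 145.0


@dataclass
class Rating:
    mu: float     # TrueSkill mean: the estimate of the player's skill
    sigma: float  # the "variation": uncertainty (standard deviation) of that estimate


def conservative_rating(r: Rating) -> float:
    """Displayed rating: mean minus three times the variation."""
    return r.mu - 3.0 * r.sigma


def is_ranked(r: Rating) -> bool:
    """A player only receives a rank once the variation is below the threshold."""
    return r.sigma < RANK_SIGMA_THRESHOLD


if __name__ == "__main__":
    # Same mean, different uncertainty (hypothetical numbers).
    for name, r in [("newcomer", Rating(1800.0, 300.0)), ("veteran", Rating(1800.0, 100.0))]:
        print(name, conservative_rating(r), is_ranked(r))
```

Two players with the same mean but different uncertainty end up with very different displayed ratings, which is why a new player has to get some games in before the rank says much.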
|
Ladder changes polls: 2014-08-26 10:57:51 |
master of desaster
Level 66
For me, it heavily depends on whether you see the ladder as something that ranks people or as something you play to get good matchups. If you just want good matchups, ELO might be better: you don't need to concentrate hard on every game, because after a certain time losses expire and you can start a serious run on the ladders. If you want to play competitively, and only when you're fully concentrated, TrueSkill is better, because it's harder to game (on the 1v1 ladder maybe impossible).
For newbies on the ladder who want to see improvement faster, ELO is better. Of course old games are weighted less in TrueSkill, but they still drag you down.
|
Ladder changes polls: 2014-08-26 13:07:05 |
Math Wolf
Level 64
I (of course) agree that TS is better.
I do want to counter the recurring myth that old games still "take you down".
The "games don't expire on TrueSkill" argument is not really correct. Games actually expire immediately. The moment your new rating is calculated, the influence of that game has been taken into account, and the game is never reused afterwards. At that moment your rating will decrease, but once you win your next game, TS will increase your rating again.
If you play enough games that contradict the loss, TS will correct itself and you will get the same (actually, a very similar) rating as you would have when that loss never happened. Thus, if you play quick, your losses "expire" faster than they would with BayesElo. (Fun fact: losing against a bad player may actually penalise your variance more than your mean if you consistently win against good players. And reducing the variance again can be done easily by, you can guess it, playing more.)
If you want to interpret "expiring" as "having no influence at all on your current rating", playing enough games in TS will achieve the same thing anyway. How many games exactly depends on the parameters and on how close to 0 you want the influence to be. In TrueSkill, "exactly 0" means "<0.001", which could take 100+ games; "<1" (which is what makes a difference in practice), however, may be achieved pretty quickly. If I ever find some time, I plan to do some simulations on this to give an idea about the exact numbers. Of course, this depends on many factors, basically the ratings of all your opponents.
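The "an old loss fades out" argument can be tried with a small simulation. The sketch below uses the standard two-player TrueSkill update without draws (the `v`/`w` update rules from Microsoft's TrueSkill write-up), keeps the opponent's rating fixed for simplicity, and compares a history that starts with one extra loss against the same history without that loss. `BETA`, `TAU`, the starting rating, the opponent and the win pattern are all hypothetical values picked to resemble the 1800 to 2000 numbers in this thread; WarLight's real settings may differ, so whatever it prints is an illustration, not a measurement of the ladder.

```python
import math

# Hypothetical scale parameters (not WarLight's real settings).
BETA = 200.0  # performance noise per game
TAU = 10.0    # dynamics: uncertainty added before each game


def pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def v(t: float) -> float:
    # Mean-update factor; t is the winner's lead over the loser in units of c.
    return pdf(t) / cdf(t)


def w(t: float) -> float:
    # Variance-update factor; always between 0 and 1.
    return v(t) * (v(t) + t)


def play(mu, sigma, opp_mu, opp_sigma, won):
    """Update one player's (mu, sigma) after a 1v1 game without draws.
    The opponent's rating is treated as fixed (a simplification)."""
    var = sigma ** 2 + TAU ** 2          # apply dynamics before the game
    opp_var = opp_sigma ** 2 + TAU ** 2
    c = math.sqrt(2 * BETA ** 2 + var + opp_var)
    sign = 1.0 if won else -1.0
    t = sign * (mu - opp_mu) / c         # winner's lead over the loser
    mu += sign * var / c * v(t)
    var *= 1.0 - var / c ** 2 * w(t)
    return mu, math.sqrt(var)


def history(results, opponents, start_mu=1800.0, start_sigma=300.0):
    """Replay a sequence of games and return the mean after each game."""
    mu, sigma = start_mu, start_sigma
    means = []
    for (opp_mu, opp_sigma), won in zip(opponents, results):
        mu, sigma = play(mu, sigma, opp_mu, opp_sigma, won)
        means.append(mu)
    return means


def influence_of_first_loss(n_games=60, opp=(1900.0, 100.0)):
    """Gap in mean between 'no early loss' and 'early loss' after each later game."""
    later = [i % 3 != 0 for i in range(n_games)]  # hypothetical: lose 1, win 2, repeat
    with_loss = history([False] + later, [opp] * (n_games + 1))[1:]
    without_loss = history(later, [opp] * n_games)
    return [b - a for a, b in zip(with_loss, without_loss)]


if __name__ == "__main__":
    for game, gap in enumerate(influence_of_first_loss(), start=1):
        if game % 10 == 0:
            print(f"after {game} later games: gap = {gap:.1f}")
```

Changing `BETA`, `TAU`, the opponent or the win pattern changes how quickly the gap closes, which is the "depends on the parameters and your opponents" point above.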
Additionally, the gradual decline of an older game's influence on your current TS rating is much more logical than BayesElo's hard cut from 100% to 0% influence. Such hard cuts are known to give weird results, for example tennis players who play the final of a Grand Slam and still lose points (and ranks) because they won it the previous year. It is illogical (and to me a little frustrating) that the largest changes in the ladder occur not because of recent great wins, but because old bad losses expire.
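The hard cut versus gradual decline can be pictured as the weight an old game carries on your current rating. This is a deliberately simplified model, not how either system actually computes ratings: BayesElo-style expiry is modelled as a sliding window (full weight until the game drops out), and TrueSkill-style fading as a geometric decay where each later game keeps a fixed fraction `keep` of the old game's influence. Age is counted in later games, and both `WINDOW` and `KEEP` are hypothetical values.

```python
def window_weight(age: int, window: int) -> float:
    """Hard cut: a game counts fully until it is `window` games old, then not at all."""
    return 1.0 if age < window else 0.0


def decay_weight(age: int, keep: float) -> float:
    """Gradual decline: each later game keeps only a fraction `keep` of the old influence."""
    return keep ** age


if __name__ == "__main__":
    WINDOW = 40  # hypothetical number of games before a result expires
    KEEP = 0.95  # hypothetical per-game retention factor
    for age in (0, 10, 39, 40, 60):
        print(age, window_weight(age, WINDOW), round(decay_weight(age, KEEP), 3))
```

The window weight drops from 1 to 0 in a single step, which is where the sudden ladder jumps from expiring losses come from; the decayed weight never jumps like that.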
Edited 8/26/2014 13:10:50
|
Ladder changes polls: 2014-08-26 13:33:03 |
szeweningen
Level 60
I kinda want to hug Math Wolf for that post :)
|
Ladder changes polls: 2014-08-26 16:25:11 |
professor dead piggy
Level 59
"If I ever find some time, I plan to do some simulations on this to give an idea about the exact numbers." Anything short of this will not convince me. It was a very good post, but I disagree with the idea that "If you play enough games that contradict the loss, TS will correct itself and you will get the same (actually, a very similar) rating as you would have when that loss never happened", because sometimes "enough games" means hundreds and hundreds. Say you start playing the ladder as a nub, lose 50% of your games against opponents averaging 1800, and play 100 games. When you later get good enough to regularly beat 2000-rated players, you will need to play a vast number of games to overtake me (piggy) if I join, play two dozen games and win 19 of them with an average opponent rating of 1900. If I fluke 22 wins, which I have done before, then I am practically untouchable for anyone who has ever played the ladder as anything other than a pro. Guys like latnox, widz, gambler etc. would never have a chance.
Edited 8/26/2014 16:25:45
|
Ladder changes polls: 2014-08-26 20:53:56 |
RvW
Level 54
Here's a short discussion: http://en.wikipedia.org/wiki/TrueSkill
Here's a lot more detail (including all the "scary" mathematics; if you don't understand it, feel free to skip those details and only read the parts you do understand): http://research.microsoft.com/en-us/projects/trueskill/details.aspx
You might also like the following blog post (which is the sixth hit when googling "elo trueskill comparison"), which has some relevance to WL ;) http://blog.warlight.net/index.php/2012/01/trueskill/
|
Ladder changes polls: 2014-08-26 21:09:59 |
Ⓖ. Ⓐrun
Level 57
Fizzer, why not create your own ranking system, something that is a tweaked basket of ranking systems that best suits WL? Or maybe make it 50%-50%, or 70% B-ELO and 30% TrueSkill (or whatever you think is best)? An average (of some sort) of the two ranking systems would be best, I think: get the best of both and reduce the problems either might have.
-The Late Gui
Is this feasible? If so, I'd suggest something like that for the 2v2, 1v1 and potential 3v3 ladders, while keeping the existing TS for RT and bELO for the Seasonal.
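A blend like the one suggested could look roughly like this. It is a sketch under two assumptions: both ratings have already been put on the same scale (they are not automatically comparable), and the TrueSkill side uses the displayed value mean - 3 * variation. The 70/30 default comes from the post; the function name and example numbers are hypothetical.

```python
def blended_rating(bayes_elo: float, ts_mu: float, ts_sigma: float,
                   elo_weight: float = 0.7) -> float:
    """Weighted average of a BayesElo rating and a conservative TrueSkill rating.

    Assumes both are expressed on the same scale; elo_weight=0.7 is the
    70% B-ELO / 30% TrueSkill split proposed above.
    """
    if not 0.0 <= elo_weight <= 1.0:
        raise ValueError("elo_weight must be between 0 and 1")
    trueskill_rating = ts_mu - 3.0 * ts_sigma  # displayed TrueSkill value
    return elo_weight * bayes_elo + (1.0 - elo_weight) * trueskill_rating


if __name__ == "__main__":
    # Hypothetical player: BayesElo 2000, TrueSkill mean 2100 with variation 50.
    print(blended_rating(2000.0, 2100.0, 50.0))
    print(blended_rating(2000.0, 2100.0, 50.0, elo_weight=0.5))
```

The weight is the only knob in this sketch; calibrating the two ratings to a common scale first would likely be the harder part.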
|