Disclaimer: Not claiming 100% correctness, but I’m happy to be corrected by you guys.
The ladder has been in place for a month now, and I thought I'd do a quick recap of what we have learned so far.
First of all, I am coming from the perspective of someone who has been wanting a change from Bayeselo for a long time, and I was happy to hear that Fizzer was gonna change to TrueSkill.
All problems solved, I thought, but is it true? We do indeed have TrueSkill in place now, but in practice it has been tweaked to such an extent that the natural spirit of TrueSkill got lost, or at least changed.
There are good and bad changes here, and I wanna discuss them in this post.
Before I dive into it, let me say that I focus on 1v1, but of course this can be extended to the team ladders as well. Some effects are even a bit worse for the team ladders.
Not wanting to sound too negative at this point, in the end I’m going to propose quick and easy solutions, so stay tuned.
What is TrueSkill?
This is not the point of discussion here. For the background of how it works, I recommend these links, but I’m focusing on the practical aspects.
https://www.microsoft.com/en-us/research/wp-content/uploads/2007/01/NIPS2006_0688.pdf
https://trueskill.readthedocs.io/_/downloads/en/latest/pdf/
How is TrueSkill implemented in Warzone ladders?
Now for this, I basically had to work it out by trial and error, but anyway, here are the parameters (I’m still not claiming to be 100% correct, since I do not have actual evidence that these are the parameters):
Rating = 3(μ -3σ)
μ_start = 183,33
σ_start = 50
The important invisible parameters are:
β = 500
τ = 0,1
draw_probability = 0
I’m gonna leave this link for people wanting to play around with it:
https://trueskill-calculator.vercel.app/
Differences in the implementation to normal TrueSkill:
The standard TrueSkill implementation has these parameters:
Rating = μ -3σ
μ_start = 25
σ_start = 8,33
The important invisible parameters are:
β = 4,166 (0,5*σ)
τ = 0,0833 (0,01*σ)
As you can see, pretty much everything is changed. Now let’s discuss the effects of these changes.
Rating formula: 3(μ -3σ) vs. μ -3σ
I like this change! This basically just stretches the ratings but has no effect in how fast players overtake each other or anything like that. It really just makes the range of ratings a bit bigger, which I like.
μ_start: 183,33 vs. 25
σ_start: 50 vs. 8,333
Again a good change. This is just a design decision. Higher μ means the rating corridor is gonna be wider (similar to my previous point), and higher σ means more uncertainty about the rating. Higher uncertainty practically means that with a lower amount of finished games, you’re gonna be much lower rated.
You could say that a high σ is one of the best tools TrueSkill has at hand to eliminate the main problem of the previous Bayeselo system, which was ladder runs with a low number of games.
β = 500 (10*σ) vs. 4,166 (0,5*σ)
τ = 0,1 (0,002*σ) vs. 0,0833 (0,01*σ)
These two have to be looked at together, since they influence each other too much.
To put it simply, β describes the skill difference (in μ) that players A and B should have when player A beats player B with 76% probability.
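To make the 76% figure concrete, here is a minimal sketch of the standard TrueSkill win probability for a 1v1 game without draws. The function name and the default σ values of 0 (skills known exactly) are my own choices for illustration, not anything taken from Warzone.

```python
import math

def win_probability(mu_diff, beta, sigma_a=0.0, sigma_b=0.0):
    """P(A beats B) for a skill gap mu_diff = mu_A - mu_B, no draws.

    Standard TrueSkill form: Phi(mu_diff / c) with
    c^2 = 2*beta^2 + sigma_a^2 + sigma_b^2.
    """
    c = math.sqrt(2.0 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    # Phi(x) written with the error function.
    return 0.5 * (1.0 + math.erf(mu_diff / (c * math.sqrt(2.0))))

# A gap of exactly one beta gives roughly 0.76, whatever beta is.
print(win_probability(500, 500))        # Warzone beta
print(win_probability(4.166, 4.166))    # standard beta
```

With σ larger than 0 on either side, the same gap gives a probability closer to 50%, because the system is less sure about who is actually better.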
τ put simply describes how volatile μ (and therefore ratings) are.
Both factors influence μ and σ, but especially with Fizzer’s implementation, it can be said that β has a much higher influence on σ and τ has a much higher influence on μ.
The relevant point here is not so much the absolute numbers, but how they behave in relation to σ. As we can see, Fizzer set β at 10*σ instead of the originally intended 0,5*σ, a factor of 20. For τ, the difference is only a factor of 5 (0,002*σ instead of 0,01*σ), but that still slows down rating updates significantly.
Why did Fizzer change the values?
I think he had good reasons to do so, with the following intentions:
Fizzer wants people to start somewhere from where there is a huge range to climb up.
→ This is implemented quite nicely, with the above described parameters to stretch the rating range and with a high σ_start. These two in combination give the impression that everyone starts at a rating of 100, when in fact everyone starts at a rating of 550 (3*μ)!
That means the average player will slowly climb from 100 to 550 as σ declines, without actually playing better. Just playing average. Most people will not realize this and feel motivated. Genius!
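The climb from 100 to 550 follows directly from the rating formula. A small sketch, assuming the parameters listed above and a player whose μ never moves from the starting value (the function name is made up for this example):

```python
def displayed_rating(mu, sigma, scale=3.0):
    """Displayed ladder rating: scale * (mu - 3 * sigma)."""
    return scale * (mu - 3.0 * sigma)

MU_START = 183.33  # assumed Warzone starting mu, as listed above

# An exactly average player: mu stays put, only sigma shrinks.
for sigma in (50, 40, 30, 20, 10, 0):
    print(sigma, round(displayed_rating(MU_START, sigma)))
```

Every point of σ that disappears is worth 9 rating points (3 * 3), so the whole 450-point climb comes from uncertainty going away, not from skill.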
Fizzer wants people to play a lot.
→ This is implemented effectively by the extremely high β and too low τ. The σ is consistently coming down, but almost in a linear slow way for a long time. Playing a looooot is encouraged greatly because of that, because as I described above, simply by playing a lot you will climb consistently for many games (potentially years).
Fizzer really wants people to play a loooooot.
→ Making the same point again, but it should be emphasized how big the effects of this are. You have to play approximately 700 games to get σ down to where its effects are not so relevant anymore. In comparison, the normal TrueSkill implementation with its β and τ configuration takes about 12 games. Assigning a reasonable rating fast is one of the strengths of TrueSkill, but you can see that this strength was intentionally removed.
700 games vs 12!
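For anyone who wants to check the order of magnitude, here is a rough sketch of how σ shrinks under the standard 1v1 TrueSkill update without draws. It is heavily simplified and the simplifications are my assumptions: every game is between two evenly matched players (same μ), and the opponent always has the same σ as you. The function name and the "σ halved" threshold are also just picked for illustration.

```python
import math

def games_until_sigma(sigma_start, beta, tau, target_sigma, max_games=100_000):
    """Count games until sigma drops to target_sigma (None if never reached).

    Assumes evenly matched players (t = 0) and an opponent with the
    same sigma, so the update is the same for winner and loser.
    """
    v = math.sqrt(2.0 / math.pi)  # pdf(0) / cdf(0) of the standard normal
    w = v * v                     # w(t) = v * (v + t) at t = 0
    var = sigma_start ** 2
    for game in range(1, max_games + 1):
        var += tau ** 2                   # dynamics: add tau^2 before each game
        c2 = 2.0 * beta ** 2 + 2.0 * var  # c^2 = 2*beta^2 + sigma_a^2 + sigma_b^2
        var *= 1.0 - (var / c2) * w       # variance update
        if math.sqrt(var) <= target_sigma:
            return game
    return None

# Games needed to halve sigma under each configuration.
print("standard:", games_until_sigma(25 / 3, 25 / 6, 25 / 300, 25 / 6))
print("warzone: ", games_until_sigma(50, 500, 0.1, 25))
```

The exact counts depend on the threshold and on who you play, so don't expect this to reproduce 700 and 12 exactly, but the gap between the two configurations is of the same kind: a handful of games against hundreds.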
Fizzer wants climbing up to last a long time, even for the top players.
→ Effectively implemented by both low τ, but more importantly for this one, the high β. 500 for a 76% win probability is nuts, especially since it is multiplied by 3 in the final formula.
To put this into some perspective, in Elo rating, it is 200 rating difference for this same probability. (https://www.walkofmind.com/programming/chess/elo.htm)
To put it into perspective with an example, the top players on MTL reached a rating of 2300 with a mean starting rating of 1500, an 800-point difference. For this system, as an easy (not 100% accurate) reference you could say that the 2300 MTL player would reach a rating of about 6550 with the average player sitting at 550 rating.
But of course reaching that will take a long time (not because of skill issues though).
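The 6550 figure can be reproduced with some back-of-the-envelope arithmetic. This sketch assumes the comparison works in "76% steps": 200 points per step in Elo, and β * 3 = 1500 displayed points per step on the new ladder, ignoring σ entirely. The helper names are invented for the example.

```python
def elo_expected_score(rating_diff):
    """Standard Elo expected score for a player rated rating_diff higher."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

ELO_STEP = 200   # Elo points for a ~76% win probability
WZ_BETA = 500    # assumed Warzone beta
WZ_SCALE = 3     # factor in Rating = 3(mu - 3*sigma)
WZ_MEAN = 550    # average player's rating once sigma is gone (3 * mu_start)

def mtl_to_warzone(mtl_rating, mtl_mean=1500):
    """Rough translation of an MTL rating, counted in 76% steps."""
    steps = (mtl_rating - mtl_mean) / ELO_STEP
    return WZ_MEAN + steps * WZ_BETA * WZ_SCALE

print(elo_expected_score(200))   # roughly 0.76
print(mtl_to_warzone(2300))      # 6550.0
```

So the 800 MTL points are 4 steps, and 4 steps of 1500 on top of 550 land at 6550.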
The advantages and disadvantages of this system
I’m going to keep my points short and concise, without deeper explanations at this point. All my conclusions follow from my thoughts and explanations above.
The advantages of this system:
1. Once stabilized with lower σ, it is going to be extremely skill-based.
2. It motivates (especially average) players over a longer time period, because they climb slow and steady.
3. Ladder runs are no longer possible.
The practical problems and disadvantages of this:
1. Until σ is stabilized, it is going to be pretty activity-based.
2. The adjustment of σ is so slow, that people potentially won’t feel the effects and therefore motivation from climbing steadily.
3. Ratings below 0 are possible and also too likely with such a slowly adjusting σ.
4. People don’t want to play 700 games before getting a stable rating, and this is also not in the spirit of a rating system.
5. New joiners in a year or two will have to finish hundreds of games before even attempting to play for top ranks.
6. It will take years to establish meaningful top ratings.
7. The top ranks in the future are potentially too skill-based (not volatile enough - and yes, I’m actually saying that)
Final thoughts and possible improvements
I am still happy with the change to TrueSkill. As you can see in the advantage list, many of the previous problems have been successfully solved!
The concrete configuration caused some negative effects too. But that’s expected, nothing’s perfect in the first iteration, and it can easily be finetuned.
My proposal:
Make just the following four parameter changes, in order to create a balance between the desired effects (steady climbing, playing a lot, etc.) and an even more enjoyable ladder.
Rating = 2(μ -3σ)
μ_start = 283,33
β = 100
τ = 10
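To see what these four values would change, here is a small side-by-side sketch. It assumes σ_start stays at 50 (it is not among the four changes) and reuses my trial-and-error guess for the current parameters. The dictionary keys and the function name are made up for the example.

```python
CONFIGS = {
    # Current values are the trial-and-error estimates from above.
    "current":  {"scale": 3, "mu": 183.33, "sigma": 50, "beta": 500, "tau": 0.1},
    # Proposed values; sigma_start = 50 is assumed to stay unchanged.
    "proposed": {"scale": 2, "mu": 283.33, "sigma": 50, "beta": 100, "tau": 10},
}

def summarize(cfg):
    """Derive the headline numbers discussed in this post from one config."""
    return {
        "start_rating": cfg["scale"] * (cfg["mu"] - 3 * cfg["sigma"]),
        "average_rating": cfg["scale"] * cfg["mu"],
        # Displayed rating gap for a ~76% win probability (sigma ignored).
        "gap_for_76_percent": cfg["scale"] * cfg["beta"],
        "beta_over_sigma": cfg["beta"] / cfg["sigma"],
        "tau_over_sigma": cfg["tau"] / cfg["sigma"],
    }

for name, cfg in CONFIGS.items():
    print(name, summarize(cfg))
```

With the proposed values, a 76% step is 200 displayed points, the same as in Elo, and the average player still ends up in the mid 500s.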
That, imo, creates a nicer balance. What the exact values should be is a whole different discussion, and I’d be happy to participate in that too, but I hope I could make clear that they should be changed in this direction. Oh, and if someone pays me for it, I'm gonna create a pretty slide deck for this, since this is a shitty text. I hate texts.
Edited 6/8/2023 13:22:21