VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-22 16:21:51 |
bostonfred
Level 7
|
I think the rankings work great! This is exactly how things are supposed to work! That game I started with Doushi back on March 6th should make it to turn eight soon, which will be fun! And while I'm here I wanted to congratulate some people on their rankings in the 2v2 ladder! Congratulations to Blue Precision and Eitz (0-2), on their fourth place ranking in the ladder! And congrats should also go to impaller and waya (2-0) for their fifth place ranking! Because the ladder works great right now!
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-22 17:29:22 |
Duke
Level 5
|
methinks he might be a wee bit sarcastic in his praise.
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-22 17:36:48 |
Ruthless
Level 57
|
Technically there are no rankings yet on the 2v2 ladder because we all need to complete 10 games. The first to get 10 games gets 1st place!
So really, those rankings don't say anything yet, Fred.
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-22 17:44:00 |
Doushibag
Level 17
|
It does say something... I can hear it.. in my head... it tells me things..
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-23 06:43:11 |
Blue Precision
Level 32
|
Fred, I stated, if you cared to read what I wrote carefully, that the system works great because players with a good sample size of games - like me - have accurate ratings. You may have skimmed too quickly over my example where a Ruthless would mistakenly be rated a below-average player early on if he happened to lose 6 of his first 10.
What you're sarcastically saying is nothing other than the fact that the system is not accurate for teams that have played two games. No kidding. The teams that beat us were undefeated at the time, so technically Eitz and I could theoretically be the 3rd strongest team, as we had never lost to a team with a loss. Again, with this few games the system simply cannot draw accurate conclusions yet. Unless you want a system that is psychic, I'm still not sure what the problem is.
Answer me this: Should the combatant ranked 14th, who started with say 1700 pts before going on a 5 game win streak against players ranked 11th, 12th, 13th, 27th and 35th, pass the person ranked 9th with 1800 pts, who just beat 2nd and 5th but lost to 1st, 7th and 10th? I would argue no: the 9th ranked person is known to be capable of beating high seeds, whereas the 14th ranked person cannot be known to be any better than 11th.
My point in all this is that losses shouldn't be the be-all and end-all, and neither should wins. It's who - specifically - you can beat and who - specifically - you have lost to that counts. I.e., in a ladder, rankings are only meaningful if you are ranked relative to the others around you, and for this you need a decent sample size. I repeat, a decent sample size. All together now...
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-23 14:35:24 |
crafty35a
Level 3
|
I'd just like to strongly urge anyone who has voted "No - keep using Bayesian ELO" to consider changing their vote to "No - don't switch to traditional but I'd like some other rating system that isn't mentioned here."
I honestly cannot think of a single advantage to using Bayesian Elo over either Whole-History Rating or TrueSkill Through Time. We should not have to hack Bayesian Elo by doing things like counting only the last three months of results, or adding a decay function. The improvements provided by these modifications are already **built in** to the other rating systems, *since they do not count all games with equal weight*, regardless of when they occur. That is the fatal flaw of BayesElo, in my opinion. Why try to hack a solution ourselves, when people who specialize in this sort of thing, and who have created some of the most accurate rating systems in the world, have already done the hard work?
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-23 14:55:02 |
bostonfred
Level 7
|
Blue - I was kidding. Sorry to get your neck up.
Crafty - I agree.
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-23 16:58:44 |
Blue Precision
Level 32
|
lol, Boston, it wasn't aimed at you (personally). I agree with your points in the main, I just didn't think your argument countered mine at all. And I fear that forums are the same as newspapers: most people believe what you write, not the point you're trying to prove or your intentions.
My fear is we change the system and voilà, to all our shock it's still not perfect. Then we all cry and moan again about how this other system would solve our problems. Some players play slow, some fast, some prefer many games, some few. No system is going to balance all this out and reward everyone's preferences equally.
My final comment on this thread is that our time would be better spent tweaking what we have... à la my suggestion of the cumulative boot timer for all active games to cure Doushi's discovered (and used) system exploitation, rather than rotating through endless amounts of systems that could all have various ways to exploit them.
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-23 17:46:12 |
Duke
Level 5
|
Maybe Fizz purposefully uses inferior rating systems to encourage more participation in the forums.
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-23 20:29:17 |
crafty35a
Level 3
|
Although it's not my favorite idea, I do think a modified Bayeselo would be an improvement over the current system. In the interest of seeing how that would work out, I have run the ratings through the Bayeselo program using a simple weighting system.
Basically, all I did was add the game result multiple times for more recent games. The oldest games are included only once, the newest games are included five times. I know that these numbers are not ideal (probably far too aggressive since we have only a month's worth of data). But I just wanted to get a feel for how this kind of a system would work.
Here is the spreadsheet with the new ratings: https://spreadsheets.google.com/ccc?key=0AtX-tYU73JXCdEE4TVdLbjlMSkcwbW1RaWtnVHJ6eFE&hl=en&authkey=CKXLk_UP
A few things to note:
- Doushibag does separate even further from the pack, but I don't think this is really a fault of the rating system. The stalling tactic is something that needs to be addressed separately. Note that he does still have a very large error bar (+/- columns). However, I think the +/- should actually be much higher numbers -- the program is not able to account for the stalling tactic, so it does not know that he essentially has 4 losses that "should" already be on the books.
- The "True games played" column is the actual number of games each person has completed. The "Effective games played" column is the number of games actually entered into the rating program, including weighting.
- The Δ (delta) column shows how much each player's ranking has changed going from standard Bayeselo to weighted Bayeselo. +3 means they have moved up three spots in the rankings, etc.
Is this an improvement over the current system? I certainly think so, though it's hard to say immediately since we only have a month's worth of data. Not many players will have significantly improved or gotten worse in that time. Ultimately, I think one of the two rating systems I mentioned in my previous post would be much preferable to this weighted system. But this would be very, very simple to implement, and we always have the option to go to another system at a later date.
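The weighting described above (oldest games entered once, newest games entered five times) can be sketched in Python. This is only an illustration: the post does not say how games of intermediate age were mapped to 2, 3, or 4 copies, so the linear interpolation by date is an assumption, as are the function names and the sample games.

```python
from datetime import date

def replication_count(played, oldest, newest, max_copies=5):
    """Map a game's date linearly onto 1..max_copies copies (assumed scheme)."""
    span = (newest - oldest).days
    if span == 0:
        return max_copies
    fraction = (played - oldest).days / span
    return 1 + round(fraction * (max_copies - 1))

def expand_games(games, max_copies=5):
    """games: list of (winner, loser, date). Returns (winner, loser) pairs,
    repeated more often the more recent the game is."""
    dates = [played for _, _, played in games]
    oldest, newest = min(dates), max(dates)
    expanded = []
    for winner, loser, played in games:
        copies = replication_count(played, oldest, newest, max_copies)
        expanded.extend([(winner, loser)] * copies)
    return expanded

# Hypothetical sample data, not taken from the ladder.
sample_games = [
    ("PlayerA", "PlayerB", date(2011, 3, 1)),
    ("PlayerB", "PlayerC", date(2011, 3, 12)),
    ("PlayerC", "PlayerA", date(2011, 3, 23)),
]
weighted_games = expand_games(sample_games)
```

The expanded list is what would be fed to the rating program in place of the raw results, which is why the "Effective games played" count exceeds the true count.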
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-23 22:00:59 |
bostonfred
Level 7
|
I don't support anything that makes Doushi's tactic work better. He just made another move after 71 hours - it just sucks.
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-23 23:10:43 |
Math Wolf
Level 64
|
For anyone who is interested, here is the system as I would design it, which deals with most of the problems. This is a good system; sadly, it probably doesn't exist other than in theory. If someone would like to implement it for Warlight, I think it may be better than any of the systems proposed earlier.
The underlying system would still be BayesELO, since it is best at ranking players according to their true skill level as soon as possible and it doesn't have any bias other than data-inflicted bias. (Note that TrueSkill and ELO do have a bias.) However, some improvements need to be made.
First, the current system of only playing the 20% around you needs to remain, to avoid abuse of the system. A small change for the better for a ladder with a limited number of players would be 10% + 10 extra players. This would give the same number of opponents when there are 100 players in the ladder, but slightly more possible opponents than the current system when there are fewer players (which is, I think, needed: I'm playing the same people again and again).
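To make the comparison concrete, here is a small Python sketch of the two pool sizes. How the ladder actually rounds the 20% window is not stated in this thread, so the integer division and the cap at "everyone else on the ladder" are assumptions, and the function names are made up for illustration.

```python
def pool_current(n_players):
    """Opponent pool under the current rule: 20% of the ladder (assumed rounding down)."""
    return n_players * 20 // 100

def pool_proposed(n_players):
    """Opponent pool under the proposal: 10% of the ladder plus 10 players,
    capped at the number of other players (cap is an assumption)."""
    return min(n_players * 10 // 100 + 10, n_players - 1)

for n in (30, 50, 100, 200):
    print(n, pool_current(n), pool_proposed(n))
```

Both rules agree at 100 players; below that the proposal gives the larger pool, and above it the smaller one.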
Now, I have strong reasons to assume that a continuously decaying system is better. The main question is: what kind of decay?
The most logical would be either an exponentially decaying or a linearly decaying function of time. I'd propose the exponentially decaying function of time.
- Exponentially decaying would give a weight to all games you've ever played, even those from 5 years ago, but the weight would be very small after a long time. The interpretation is that the weight would decay multiplicatively. For example: after every month a result is only worth half as much as it was worth the previous month. A good decay factor would be between 0.7 and 0.8 per month, I think.
An example for 75% a month: (formula: 0.75^(time) with time in months)
2 hours (next update): weight 99.92%
1 day: weight 99.05%
3 days: weight 97.16%
1 week: weight 93.5%
1 month: weight 75%
2 months: weight 56%
3 months: weight 42%
6 months: weight 18%
1 year: weight about 3%
16 months: weight 1%
2 years: weight 0.1%
- Linearly decaying would be similar to what crafty has done, except it would be the continuous version. So the weight would be of the form 100% - time passed * penalty, where the penalty could be 20% per month, for example, with a limit after 5 months (from then on the weight would be 0). This is simpler and easier for players to understand, but a bit harsher I think, and I'm not a fan of this method.
As said, I'm a fan of exponential decay, as it is the most natural and smooth way for rankings to decay.
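Both decay options above are easy to compute. The following Python sketch uses the 0.75-per-month factor and the 20%-per-month penalty from the examples; the 30-day month and the function names are assumptions chosen to match the table.

```python
DAYS_PER_MONTH = 30  # assumption used to convert days into months

def exponential_weight(months, factor=0.75):
    """Weight of a game that is `months` old: factor ** months."""
    return factor ** months

def linear_weight(months, penalty=0.20):
    """Weight of a game that is `months` old: 1 - penalty * months, never below 0."""
    return max(0.0, 1.0 - penalty * months)

ages = [
    ("1 day", 1 / DAYS_PER_MONTH),
    ("1 week", 7 / DAYS_PER_MONTH),
    ("1 month", 1),
    ("3 months", 3),
    ("6 months", 6),
    ("1 year", 12),
]
for label, months in ages:
    print(f"{label:>9}: exponential {exponential_weight(months):7.2%}, "
          f"linear {linear_weight(months):7.2%}")
```

The exponential weight never reaches zero, while the linear weight drops a game completely once it is 5 months old.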
Next: to correct for people who are inactive for a while (let their rankings decrease) and to get a similar effect to TrueSkill, I'd add a fictional tie against a 1500-rated opponent (0.5 win, 0.5 loss?) every 2 weeks or so (or, even better in my opinion, against a 1000-rated opponent every month), and several of those at the beginning when a player has just joined the ladder. With this, people who leave the ladder can still be ranked accordingly and lose their points slowly, as if they were not playing and were losing their skill. If they join again at a later date, they haven't lost the results of their earlier games, as those are still in the system and still counting.
The result will be that people with a lower number of games will be slightly penalized, and even more so when one plays like Doushibag. In that case, the only games that are added are the 1500/1000 games, which will clearly pull the player down.
However, an active player on a winning streak against decent opponents can get a rather high ranking pretty fast, as recent games will count for more. To defend this ranking, he'll need to be able to keep winning on a regular basis.
Therefore, the reasons why I'd prefer 1000 over 1500 are:
- firstly, because this has a real penalising effect if one doesn't finish many games on a regular basis.
- secondly, because if someone's a bad player and stops playing, he'll be pulled up towards 1500 in the other case, while with the fictional 1000 opponent, players will always end up around 1000 if they stop playing for a long time, which is around the bottom of the ranking.
Notice that this ranking system has all the advantages previously mentioned and required, with as few disadvantages as possible.
It doesn't, however, take into account the fact that people are holding back their losses, as this is simply not possible for any ranking system (ELO and TrueSkill can't either!). It does correct for it as much as possible, though, without putting too much penalty on slow players in general.
A separate mechanism (cumulative boot time, ...) will be required either way if we want to correct for this.
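The pull of a fictional tie can be illustrated with a simplified Python sketch. This is not BayesELO: it holds the opponents' ratings fixed and finds the single rating that best explains one player's weighted results under the standard Elo logistic curve. The function names, the bisection bounds, and the sample winning streak are all assumptions made for illustration.

```python
def expected_score(rating, opponent):
    """Standard Elo logistic expectation of scoring against one opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def fit_rating(games, lo=-2000.0, hi=5000.0, iterations=100):
    """games: list of (opponent_rating, score, weight), score 1 / 0.5 / 0.
    Bisection for the rating where weighted actual score equals expected score."""
    def surplus(rating):
        return sum(w * (s - expected_score(rating, opp)) for opp, s, w in games)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if surplus(mid) > 0:  # scored better than expected, so rating is too low
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical undefeated streak: three wins, equal weight.
streak = [(1600, 1.0, 1.0), (1700, 1.0, 1.0), (1800, 1.0, 1.0)]

unanchored = fit_rating(streak)                          # runs off to the upper bound
with_1500 = fit_rating(streak + [(1500, 0.5, 1.0)])      # fictional tie vs 1500
with_1000 = fit_rating(streak + [(1000, 0.5, 1.0)])      # fictional tie vs 1000
print(round(unanchored), round(with_1500), round(with_1000))
```

With no losses the fit has no upper limit, so the fictional tie is what keeps an undefeated rating finite, and the 1000 anchor pulls harder than the 1500 one. Passing a decayed weight for each real game while adding fresh fictional ties would reproduce the slow slide of an inactive player.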
Of course, I'd love to hear any comments (positive and negative) on this model.
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-24 00:15:24 |
crafty35a
Level 3
|
My only real complaint about your proposed system, MathWolf, is one that may not really apply to Warlight. Basically, the idea behind Elo systems (and all rating systems, for chess, at least) is to be able to use the ratings of two players to predict the outcome of a game between those two players. You take the two ratings, plug them into a simple formula, and you know that player A has an X% chance to win, player B has a Y% chance to win, and there is a Z% chance of a draw.
Unless we believe that playing fewer games truly makes you a weaker player, your fictional games against a 1000 player are negatively affecting the accuracy of the rating system. But should we care about this? Just because your system wouldn't win any chess rating competitions does not mean it is not a good idea for Warlight. My feeling is that most people on here are more concerned with a fair system that rewards strong players who play often, and that not many of us care if the rating numbers can be plugged into a formula to calculate expected future game results.
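For reference, the "simple formula" in its standard Elo form looks like this in Python. Plain Elo gives an expected score (a win counts 1, a draw 0.5) rather than separate win and draw percentages; splitting out a draw chance needs an extra parameter that this sketch leaves out. The function name is chosen for illustration.

```python
def elo_expected(rating_a, rating_b):
    """Expected score of player A against player B under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Equal ratings give an even game; a 400-point edge gives roughly 10-to-1 odds.
even = elo_expected(1500, 1500)
favourite = elo_expected(1900, 1500)
underdog = elo_expected(1500, 1900)
print(f"{even:.3f} {favourite:.3f} {underdog:.3f}")
```

A rating that has been pulled down by fictional games would make this prediction too pessimistic for an inactive but strong player, which is the accuracy cost described above.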
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-24 17:21:47 |
Duke
Level 5
|
I'd only care about that issue if there were 100s or 1000s of players on the ladder, where close rating matchups really matter. As it is now, the only effect they have is limiting the pool of potential opponents so much that the ladders aren't much fun any more. I don't want to play the same 3-8 people over and over again.
That needs to be addressed as much as the accuracy of the ratings.
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-24 18:03:51 |
bostonfred
Level 7
|
Duke, there are only seventeen players you can play. Right now, Doushi has five games and he's not playing any of them, so that leaves sixteen. And he has active games against five of the other players you can play, so they all have a wasted spot on the list of new games they can get. Meanwhile, you have no hope of moving up, because the current system won't give you enough points to move up without beating someone higher than you in the rankings, and the only guy higher than you in the rankings has five likely losses he refuses to take like a man.
Doushi has taken all of the fun out of the ladder games for me. He's screwing over impaller, Blue Precision, fatguy, and shogun similarly, and it sounds like it's impacting you, too, because you're not getting new games against as many new opponents. I don't know what kind of enjoyment he gets out of his faux #1 ranking, but it's not something he earned, and it's not something to be respected. It's a jerk move that's hurting the site owner's chances of getting $30 from new people. I know I was recommending it to people before, and I've helped to get more of the FBG people here, but I'm no longer telling people, "you should pay to join, the ladders are awesome." I'm not recommending against it, but this ladder thing just isn't fun anymore. It needs to be fixed.
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-28 07:15:10 |
Doushibag
Level 17
|
Well I haven't really weighed in fully on this yet and I've been meaning to, so let me get to it. Stay a while and listen, this won't be short. First I'll start by saying I voted for "No - don't switch to traditional but I'd like some other rating system that isn't mentioned here."
I'll start by pointing out what I see as some flaws of the current system and why some of the proposed changes blur, but don't fix it.
|>Fizzer: If a player goes 10-0 (ten wins, zero losses), that does not guarantee that they’ll be ranked in the top 10. This only guarantees that they’ll be placed above the ten players they defeated.
The problem with the current system is that it tries to instantly place people and has behavior that just doesn't make sense. If you beat a few people, particularly early, it assumes you're better than them, but why should it? This is how people are skyrocketing to the top with early wins and few to no early losses and this will continue to be a problem. You can try to mask it and reduce the ability to do it, but the base problem is still there. People are being too easily overrated.
The point value given should represent a fair guesstimate of each player's skill level. But it should do it in a way that makes sense and doesn't shift people around too quickly. These ratings, when compared between two players, should give reasonable odds for how often one will beat the other. It's important to note that even with a decent rating difference it's still possible for the lower person to get lucky and beat the higher person. This should happen with a frequency that roughly approximates what the difference in rating suggests it should. It's important not to change the rating for arbitrary reasons.
Another weird thing this system does, and I'm not even sure why: when a high-rated person plays a low-rated person and beats them, they lose points anyway. This makes no sense. The system predicted they would win and they won, so why are they losing points and the lower person gaining points just for having had the game? I've tested this with the simulator and will show you what I mean. I've chosen to have A-Train play me and picked two other people for relative comparison: one person near me and one lower on the ladder, but well above A-Train. This has me, with 1st pick, beating A-Train in 1 game, 5 games, 10 games, and 30 games.
Player---------Start---1 Game---------5 Games--------10 Games-------30 Games
Doushibag------2169----2157 (-12)-----2129 (-40)-----2111 (-58)-----2091 (-78)
Impaller-------1998----1998 (+/- 0)---1997 (-1)------1996 (-2)------1996 (-2)
Unyoto---------1353----1353 (+/- 0)---1353 (+/- 0)---1353 (+/- 0)---1352 (-1)
A-Train--------994-----1005 (+11)-----1036 (+42)-----1060 (+66)-----1095 (+101)
Out of curiosity I also did it with Impaller beating him 30 times:
Doushibag 2176 (+7) ; Impaller 1986 (-12) ; Unyoto 1352 (-1) ; A-Train 1044 (+50)
As is shown, the more games I play against him the more points I lose, even if I win every game. And he goes up in points continuously just from having played me. Yes, I know this is no longer possible (and shouldn't be) with the new matching system (which was a needed improvement). But the point is that it shows a flaw in the system.
My guess is that the system is based on the flawed idea that everyone is fairly static in skill, there's very little luck, and that everyone will play everyone else an equal number of times. These are all wrong assumptions and create a flawed system.
Although the new system should probably have a decaying function, it doesn't fix the inherent problems with the current system (it's an improvement not a fix). Newer games matter more than older games and this should be reflected and not simply with a hard cutoff. Additionally you could make the half life different based on rating levels. Lower for low ranked people, higher for higher ranked people as they're presumed to be changing less quickly and to matter more.
|>Mathwolf: Next: to correct for people who are inactive for a while (let their rankings decrease) and to get a similar effect as TrueSkill, I'd add a fictional tie against a 1500 (0.5 win, 0.5 loss?) every 2 weeks or so...
I disagree with this idea. This is just going to throw off people's ratings. There have been complaints that people are forced to play too many games when the 1 game at a time option was removed. People are also suggesting you need to play a fair amount of games and steadily play games to achieve and hold a rank. I think people should be able to play as many games as they want and shouldn't be forced to play more. Nor do I think people who play less or more slowly should be particularly punished. Let people play how they want. People are afraid that people will get high and then sit on it. This isn't really an issue if a rating is properly earned and confidence in that rating is taken into account.
So how should it work? Well, early on the system needs to recognize that its confidence in your rating is low. Your rating shouldn't move too quickly, but at lower ratings you should be able to move more quickly than you can at higher ratings. These ratings will have to be set levels built into the system. Additionally, when you first enter the system it should move you a bit more quickly until it has a couple of wins and losses for you. A winning streak will have you moving up and playing tougher people until it gets you to lose. A losing streak will make you play easier opponents until you get your first win. This is just to try to ballpark you, then it will slow down. Until you have a loss and a win and a certain confidence percentage, a '?' should be placed next to your rating. This shows that the system is not sufficiently confident in your rating and that it could be notably off.
To move up to the next level you need to show you can win a high enough percentage of the time. A few games won't do it. At lower levels you could move up quickly with some winning streaks, but at the higher levels you really have to prove yourself to move up. Once you achieve a higher rating, though, you don't have to play a lot of games to hold it, nor should you have to. You've proved you belong there and whatever games you continue to play will just refine that a little. A player who plays a lot shouldn't be rewarded too much either, whether by continuous movement up or by some inherent advantage over someone who doesn't.
Now how to solve some other problems: Slowing losses
I have an idea here that I think would work decently, instead of messing with the boot time. On the one hand you want to allow people to play at a slower pace and on the other hand you don't want someone slowing all their games and taking up vital slots such that people can't play other games. Here is an idea I thought of: Game Speed Slots --
A) You have 2 game slots with a 3 day boot.
B) You have 3 game slots with a 2 day boot. With a possible weekly 1 day boot extension or the like to handle weekend play.
C) You have an unlimited number of slots with a 1 day boot. Possibly with a similar extension. Enough to keep the game moving at a fast pace, while still possibly allowing the occasional longer turn.
Perhaps having the first option being 2 days with limited extensions (based on time/turns, so a genuinely long game doesn't have issues).
This way you can only super slow 2 games. If you want to play more games you need to play them at a faster pace. You can have it set to deliver you games to fill the slots as soon as they are available, but aren't forced to have any games. This will not pull you off the rankings either. You will stay on the list regardless and will simply have a "?" next to your name and possibly be highlighted in red if your confidence is below a certain threshold. This means that you can always play more games so long as there is someone to play with. Also if your confidence falls below the threshold and you earn the '?' it still matches you and treats you like you are the rating it has currently assigned, but acknowledges that your true strength could be significantly different from your current rating.
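The slot idea above could be represented as a small lookup, sketched here in Python. It assumes the three tiers are counted independently (2 slots at a 3 day boot, 3 at a 2 day boot, unlimited at a 1 day boot) and ignores the optional extensions; the names are made up for illustration.

```python
from collections import Counter

# Boot time in days -> maximum simultaneous games at that speed (None = unlimited).
SLOT_LIMITS = {3: 2, 2: 3, 1: None}

def open_tiers(active_boot_days):
    """Given the boot times of a player's active games, return the boot times
    at which the player may still start another game."""
    counts = Counter(active_boot_days)
    return [
        days
        for days, limit in SLOT_LIMITS.items()
        if limit is None or counts[days] < limit
    ]

# A player already stalling two 3-day games can only add faster games.
print(open_tiers([3, 3]))
```

Under this reading a player can slow down at most five games in total, and every game beyond that has to be played at the 1 day pace.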
Continued in next post..
|
VOTE: Should the ladders switch to a traditional ELO model?: 2011-03-28 07:15:50 |
Doushibag
Level 17
|
Continued from previous post..
|>Impaller: Should the algorithm always give you a ladder game when you're at less than the maximum number of games? Or should there be times where it waits to pair you up with someone who would be the optimal pairing for you?
I think it should limit who you play to people in your ballpark instead of reaching too far to get you a game. Something that would be really interesting, if it could be pulled off, and which would improve your pool of players, is a handicap system. For a game like Warlight this is quite difficult, but it would allow stronger players to be handicapped so that matches with someone of a notably lower rating have a closer to even win/loss prediction. If pulled off, it could help increase the pool of potential players each person has to play against while still keeping games interesting and giving each player a reasonable chance of winning. (I.e., if someone normally has a 95% chance of winning, this could bring it down to 50-70%, and the result would be adjusted accordingly.) This would be obviously very experimental and likely to need many adjustments before it could work even marginally, but if done it could really improve the overall functioning of the ladder system. Perhaps something to think about for later, once a good ladder system in general is established. The handicap could be something like 0.5 extra armies per turn per X point difference (rounded, not fractional amounts beyond .5), with a limitation on how far apart two players could still be. I think if you're on a winning streak it should favor people higher and if you're on a losing streak it should favor people lower, to break the streak. Long streaks should be pretty unlikely (except perhaps a little during initial rating games).
In these proposed types of systems, instead of shifting around wildly like in the current system, you have to work your way up. Granted, it shouldn't take too long, and there shouldn't be any real harm in letting people guesstimate where they should start, up to a point. What you would do is place a games played restriction on different levels, and then when people first come into the ladder it would allow you to come in a bit higher instead of at the bottom. What this essentially does is start you off playing higher level opponents so you can more quickly move up if it's appropriate to do so. If someone overrates themself on this initial entry they'll just lose several games and it will shift them down quickly to where they should really be. This is just to avoid new ladder players who've played plenty of Warlight having to play the lowest rated players on the ladder.
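A rough Python sketch of the handicap rule suggested above: 0.5 extra armies per turn for every X points of rating difference, in half-army steps, with no match at all beyond some maximum gap. The post leaves X and the limit open, so the values 200 and 800 are placeholders, and giving the bonus to the lower-rated player is an assumption.

```python
def handicap_armies(rating_a, rating_b, points_per_step=200, max_gap=800):
    """Extra armies per turn for the lower-rated player, in 0.5 steps.
    Returns None when the gap is too large to pair the players at all.
    points_per_step (the 'X' in the proposal) and max_gap are placeholder values."""
    gap = abs(rating_a - rating_b)
    if gap > max_gap:
        return None
    steps = int(gap / points_per_step + 0.5)  # round to the nearest whole step
    return 0.5 * steps

print(handicap_armies(1800, 1500))  # gap of 300 points
print(handicap_armies(1500, 1500))  # evenly matched
print(handicap_armies(2200, 1000))  # too far apart to be matched
```

The rating update after a handicapped game would then need to use the adjusted win chance rather than the raw rating difference, which is the hard part the post points out.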
Warlight has the special problems of having slower games compounded by games with an above average luck factor in relation to other games that use rating systems. There are no real ways around this. In relation to games like Go and Chess, I think the overall deviation with Warlight may not be as pronounced between players of various levels. Warlight may have many more possible moves, but most of those are obviously bad. If you were looking at it from that perspective, things are hard to compare.
I don't know if this system works exactly as I'd desire for Warlight or exactly what type of system you'd define this as, but I got concepts and ideas from the GoKGS system and the system works well for Go. Detailed some here:
http://www.gokgs.com/help/rank.html
http://www.gokgs.com/help/rmath.html
http://senseis.xmp.net/?KGSRatingMath
It seems, according to the Whole History Rating documentation (http://remi.coulom.free.fr/WHR/WHR.pdf), that the GoKGS system may actually be a modification of that system.
The approach is incremental, but still factors in previous games for further adjustments. You can't pop into the top spot just by winning your first 10 or even 20 games. You don't have to play a particular person to prove you're better than them, so if you're avoiding another player the system can still get an idea based on who does better overall against other players of various strengths. The game slot idea will help with the stalling by limiting the ability to stall while still allowing slower games. There's no perfect system, but it would be nice to have something that doesn't require people to play a zillion games, or to constantly keep playing lots of games to keep their rating, while still not allowing people to shoot up too easily (and not giving them the top spot before they even have a loss). People should have to prove themselves, but shouldn't have to play excessive amounts of games, and the system shouldn't simply reward those who do.
|