
Posts 1 - 8 of 8   
Seasonal Ladder all-time rankings: 2014-11-15 08:04:12

TeddyFSB 
Level 60
I combined the results of all seasonal ladders except SL-X (the multiplayer one). I used BayesElo to rank, and applied a statistical penalty equal to the value in the '-' column to get a TrueSkill-style ranking. Alternatively, I could have cut off at a minimum number of games, e.g. 20.

I did not merge aliases, so e.g. fatality and zibik21 are both in the top 10.
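
For reference, here's a minimal sketch (not the actual script I used) of the penalty step in Python, assuming rows laid out as in the table below ("Rank Name Elo + - games ..."):

def penalized(rows, k=1.0):
    # Rank by Elo - k * '-' (k = 1.0 for the table below).
    scored = []
    for line in rows:
        parts = line.split()
        name, elo, minus = parts[1], float(parts[2]), float(parts[4])
        scored.append((elo - k * minus, name))
    return sorted(scored, reverse=True)

# The two sample rows reproduce the 1991 and 1945 shown below:
sample = [
    "1 TeddyFSB 2085 108 94 64 81.00% 1764",
    "4 RuthlessBastard 1993 51 48 244 77.00% 1721",
]
for adj, name in penalized(sample):
    print(name, adj)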

Top 50 all-time:
Rank	Name	        Elo	+	-	games	win %	avg. opp.		TrueSkill = Elo - '-'
1	TeddyFSB	2085	108	94	64	81.00%	1764		1991
2	Yeon	        2103	155	122	56	91.00%	1607		1981
3	AngryBirds	2108	182	142	26	85.00%	1762		1966
4	RuthlessBastard	1993	51	48	244	77.00%	1721		1945
5	NoMercy	        2085	181	150	24	83.00%	1723		1935
6	KilluaZaoldyeck	2085	188	150	20	80.00%	1813		1935
7	zibik21	        1974	65	61	165	79.00%	1666		1913
8	Tenshi	        1974	78	72	105	75.00%	1721		1902
9	gronaapex	2051	188	154	20	80.00%	1758		1897
10	fatality	1986	106	96	56	75.00%	1727		1890
11	ACLTears	1943	71	66	123	75.00%	1696		1877
12	dreuj	        1967	99	92	64	73.00%	1716		1875
13	Timinator	1925	59	56	165	72.00%	1700		1869
14	timon92	        1932	67	63	141	74.00%	1685		1869
15	ciekiueLite	2014	191	152	20	80.00%	1748		1862
16	Widziszapex	1936	83	77	92	75.00%	1674		1859
17	RPA	        1967	121	108	40	73.00%	1759		1859
18	13CHRIS37	1897	46	44	278	71.00%	1686		1853
19	MikeGG	        2014	236	167	23	91.00%	1571		1847
20	Summer	        1918	81	76	88	70.00%	1718		1842
21	Yves	        2172	518	330	4	100.00%	1734		1842
22	Garrett	        1953	123	112	40	73.00%	1729		1841
23	Mannerheim	1962	137	123	35	74.00%	1720		1839
24	briskapex	1914	83	78	85	72.00%	1690		1836
25	abr	        1904	75	71	98	67.00%	1720		1833
26	unknownsoldier	1962	156	131	30	80.00%	1670		1831
27	Rubik87	        1909	92	83	79	77.00%	1625		1826
28	Belzeb	        1980	180	156	20	75.00%	1722		1824
29	Vadas	        1909	102	95	60	72.00%	1674		1814
30	dwoogee	        1899	93	86	64	70.00%	1703		1813
31	JV	        1886	82	76	84	70.00%	1697		1810
32	Krzychu	        1876	70	67	114	67.00%	1712		1809
33	masterofdesaster	1870	68	65	118	66.00%	1707		1805
34	JUPITERReLite	1921	130	119	33	70.00%	1730		1802
35	OttomanEmperor	1938	167	137	31	84.00%	1576		1801
36	GreenTea	1931	146	133	25	68.00%	1765		1798
37	WMdeadpiggy	1888	105	98	54	70.00%	1676		1790
38	Ko	        1945	173	156	20	70.00%	1746		1789
39	Gaia	        1903	128	116	33	70.00%	1728		1787
40	EZPickens	1870	95	88	65	71.00%	1671		1782
41	Luxisapex	1891	125	111	40	75.00%	1650		1780
42	Troll	        1896	137	116	40	80.00%	1604		1780
43	Gambler        	1876	113	104	44	68.00%	1702		1772
44	JohnnyFive	1917	166	147	20	70.00%	1738		1770
45	CarlSoderberg	1908	149	140	20	60.00%	1827		1768
46	MasterAdrianokx	1908	164	141	23	74.00%	1687		1767
47	WMAnonymous	1865	107	101	44	64.00%	1740		1764
48	MannerheimeLite	1888	143	131	28	71.00%	1699		1757
49	MajorRisk	1861	118	106	40	70.00%	1681		1755
50	PaniX	        1852	111	100	53	75.00%	1596		1752
Seasonal Ladder all-time rankings: 2014-11-15 08:59:24

Pulsey
Level 56
List made by TeddyFSB; TeddyFSB is ranked first.

Classic. ;)
Seasonal Ladder all-time rankings: 2014-11-15 13:28:52


Math Wolf 
Level 64
Nice list!

I'm a little surprised I'm not in there, as I ranked in the top 40 in almost all seasons, but it makes sense: the method seems to favour players who played a limited number of seasons and finished high in those they played (the win percentages are all very high).
Yves (and some others) don't belong in there.

I don't understand why you get TS by subtracting the '-' from the BayesElo rating. It doesn't work like that, afaik. Would you mind explaining that part (the reasoning behind it) in more detail?

If you want to make a measure similar to TS, you should use Elo - 3*SD, where the '-' could be used as an estimate of the SD. This is, however, not TS; it is still an Elo corrected for the number of games.

Edited 11/15/2014 13:33:20
Seasonal Ladder all-time rankings: 2014-11-15 19:13:20

TeddyFSB 
Level 60
Thanks MathWolf!

As far as I remember, +/- is the 2-sigma interval, so Elo - '-' is roughly Elo - 2*SD. I've changed the rankings to Elo - 1.5*'-', which then corresponds to roughly Elo - 3*SD. Yes, it's not TS; it's statistically penalized Elo, meant to reduce the number of false positives at the top due to small sample sizes. I think it's a better approach than taking the mean Elo.

Here's the top 50 with the adjusted score = Elo - 1.5*'-'. That looks better, I think, with Yves dropping out of the top 50. You are in at #36 ;) Still not enough to push RuthlessBastard to the top spot; I thought he'd be a clear #1.
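
For example, using my own row: 2085 - 1.5 * 94 = 2085 - 141 = 1944, which is the value in the first row below.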

Rank	Name	        Elo	+	-	games	win %	avg. opp.		Elo - 1.5*'-'
1	TeddyFSB	2085	108	94	64	81.00%	1764		1944
2	RuthlessBastard	1993	51	48	244	77.00%	1721		1921
3	Yeon	        2103	155	122	56	91.00%	1607		1920
4	AngryBirds	2108	182	142	26	85.00%	1762		1895
5	zibik21	        1974	65	61	165	79.00%	1666		1882.5
6	Tenshi	        1974	78	72	105	75.00%	1721		1866
7	NoMercy	        2085	181	150	24	83.00%	1723		1860
8	KilluaZaoldyeck	2085	188	150	20	80.00%	1813		1860
9	ACLTears	1943	71	66	123	75.00%	1696		1844
10	fatality	1986	106	96	56	75.00%	1727		1842
11	Timinator	1925	59	56	165	72.00%	1700		1841
12	timon92	        1932	67	63	141	74.00%	1685		1837.5
13	13CHRIS37	1897	46	44	278	71.00%	1686		1831
14	dreuj	        1967	99	92	64	73.00%	1716		1829
15	Widziszapex	1936	83	77	92	75.00%	1674		1820.5
16	gronaapex	2051	188	154	20	80.00%	1758		1820
17	RPA	        1967	121	108	40	73.00%	1759		1805
18	Summer	        1918	81	76	88	70.00%	1718		1804
19	abr	        1904	75	71	98	67.00%	1720		1797.5
20	briskapex	1914	83	78	85	72.00%	1690		1797
21	ciekiueLite	2014	191	152	20	80.00%	1748		1786
22	Garrett	        1953	123	112	40	73.00%	1729		1785
23	Rubik87	        1909	92	83	79	77.00%	1625		1784.5
24	Mannerheim	1962	137	123	35	74.00%	1720		1777.5
25	Krzychu	        1876	70	67	114	67.00%	1712		1775.5
26	masterofdesaster	1870	68	65	118	66.00%	1707		1772.5
27	JV	        1886	82	76	84	70.00%	1697		1772
28	dwoogee        	1899	93	86	64	70.00%	1703		1770
29	Vadas	        1909	102	95	60	72.00%	1674		1766.5
30	unknownsoldier	1962	156	131	30	80.00%	1670		1765.5
31	MikeGG	        2014	236	167	23	91.00%	1571		1763.5
32	Belzeb        	1980	180	156	20	75.00%	1722		1746
33	TheWarlightMaste	1829	57	56	147	58.00%	1750		1745
34	JUPITERReLite	1921	130	119	33	70.00%	1730		1742.5
35	WMdeadpiggy	1888	105	98	54	70.00%	1676		1741
36	MathWolf	1802	43	42	293	67.00%	1630		1739
37	EZPickens	1870	95	88	65	71.00%	1671		1738
38	Vendetta	1837	72	68	103	67.00%	1675		1735
39	OttomanEmperor	1938	167	137	31	84.00%	1576		1732.5
40	GreenTea	1931	146	133	25	68.00%	1765		1731.5
41	dungaapex	1829	69	66	109	66.00%	1671		1730
42	Gaia	        1903	128	116	33	70.00%	1728		1729
43	Fizzer	        1801	49	48	222	64.00%	1658		1729
44	CONQUISTADORS	1802	51	49	203	67.00%	1637		1728.5
45	fwiw	        1811	59	57	153	65.00%	1661		1725.5
46	Luxisapex	1891	125	111	40	75.00%	1650		1724.5
47	WMGnuffone	1840	80	77	86	66.00%	1671		1724.5
48	Bajtannightfly	1819	66	63	126	67.00%	1651		1724.5
49	Troll	        1896	137	116	40	80.00%	1604		1722
50	Gambler	        1876	113	104	44	68.00%	1702		1720


Edited 11/15/2014 19:29:24
Seasonal Ladder all-time rankings: 2014-11-15 20:22:25


Timinator • apex 
Level 67
The change bumped me up 2 places, good thing!
Only 5 more formula changes and I'm first ;)

Edited 11/15/2014 20:22:33
Seasonal Ladder all-time rankings: 2014-11-15 21:50:00


Math Wolf 
Level 64
Alright, that makes sense, thanks for the explanation. I do agree the correction is better than just the mean.

I'm not completely convinced about the SD. Afaik, '+' and '-' together make up 2 SD, so the interval [mean - '-', mean + '+'] is 2 SD wide; the '-' is then an estimate of the downward uncertainty (downward SD), while the '+' estimates the upward uncertainty. In that case, you still have to multiply by 3 instead of 1.5.
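
Concretely, for the top row of your table: if '-' = 94 is one downward SD, the 3-SD penalty would be 2085 - 3*94 = 1803; if '-' already covers 2 SD, then 2085 - 1.5*94 = 1944 is already the 3-SD value.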

I'm not sure of that, though, but checking some SDs from the RT ladder (the current top 5 ranked players) gave similar results, so I think my claim is correct:
games   SD
  34  122.50
  44   92.34
  93   60.81
 201   57.84
 252   51.42

Then again, by using the 1.5 factor you'd be taking some sort of "average" of a BayesElo and a TS-like correction, which makes sense as well.
A full TS-like penalisation is not really appropriate here anyway: new players could not retrospectively join earlier seasons, so it would perhaps favour older players like me a little too much.

As a separate question: how did you gather the data?
Seasonal Ladder all-time rankings: 2014-11-15 22:15:27

TeddyFSB 
Level 60
I am pretty sure the +/- outputs of BayesElo are the limits at the 95% confidence level, so this is close to +/- 2 SD.
http://warlight.net/Forum/Thread?ThreadID=1216

TrueSkill priors have 2 parameters: the initial uncertainty in the rating, and a built-in intrinsic uncertainty factor (Fizzer apparently set it at 50 points) that allows continuous rating progression. This is so that the prior on the rating always has a bit of spread; otherwise, if you let it asymptotically approach zero SD, the rating bumps after each game become smaller and smaller, and the algorithm won't recognize that skill can change over time. The original paper refers to it as the "dynamics factor":
http://research.microsoft.com/pubs/67956/NIPS2006_0688.pdf

If you set that parameter to 0 instead of 50, SD after 252 games would be much lower than 51.
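
A toy illustration of that effect (a caricature of the variance bookkeeping only, not the real TrueSkill update; the 0.8 per-game "information gain" factor is made up):

import math

def final_sd(n_games, tau, sigma0=250.0, shrink=0.8):
    # Before each game the variance is inflated by tau^2 (the dynamics factor),
    # then shrunk by a fixed stand-in factor for what the game result tells us.
    var = sigma0 ** 2
    for _ in range(n_games):
        var = (var + tau ** 2) * shrink
    return math.sqrt(var)

print(final_sd(252, tau=0.0))   # SD collapses towards 0
print(final_sd(252, tau=50.0))  # SD settles near a floor (here ~100)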

So I am pretty sure I am indeed making a 3*sigma subtraction.

I got the data from the logs: http://data.warlight.net/Data/BayeseloLog4000.txt (4001, 4002, etc.), and then had to normalize them because player IDs were specific to each season.
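
If anyone wants to redo it, roughly like this (a sketch only; the normalization of the season-specific player IDs is not shown and depends on the actual log format):

import urllib.request

BASE = "http://data.warlight.net/Data/BayeseloLog{}.txt"

for log_id in (4000, 4001, 4002):   # one log per season, extend as needed
    with urllib.request.urlopen(BASE.format(log_id)) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    # Player IDs in these logs are season-specific, so the merged results have
    # to be keyed on player names before re-running BayesElo on the combined set.
    with open("season_{}.txt".format(log_id), "w", encoding="utf-8") as f:
        f.write(text)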
Seasonal Ladder all-time rankings: 2014-11-15 23:32:08


Math Wolf 
Level 64
It seems you are right, but it is very difficult to find proof of it (same problem as in the topic you refer to). I've never seen written-out documentation for BayesElo.

Based on this discussion (and more specifically that post): http://www.open-aurec.com/wbforum/viewtopic.php?t=949#p4023 and the fact that Rémi Coulom wanted to mimic and improve EloStat, I guess they are indeed the limits of the 95% credible interval (the Bayesian equivalent of a 95% confidence interval). The funny thing is that Rémi himself notes further down in that topic that he doesn't completely understand how EloStat calculates the errors. :-)

And you are completely right: I had forgotten about the dynamics factor of TrueSkill, which keeps the standard errors from shrinking much for a large number of games. There's no reason to assume the standard error is similar for both methods anyway, so my comparison didn't make much sense.

Thanks for all the explanation!