TL;DR: Fork
https://github.com/carykh/PrisonersDilemmaTournament, add new strategies in code/exampleStrats, run code/prisonersDilemma.py to see in code/results.txt how well you did. Submit your best strategy on
https://docs.google.com/forms/d/e/1FAIpQLSevMcli8KSpOBMUTOdRuJwNBzkuGDghAfAagEP1akNhPBgXew/viewform by 05/26 10 PM UTC (that's 6 PM US-East, 3 PM US-West, and midnight 05/27 in Central European Time). carykh will run these strategies in a giant FFA with the sample ones and the top 3 will get $1000/$300/$100.
(see:
https://htwins.net/prisoners-dilemma/)
The Prisoner's Dilemma is a thought experiment where two players can either cooperate ("C") or defect ("D") for some amount of points. E.g., if you and your mate are both getting interrogated by the police and either of you could sell the other out for a lighter sentence. The idea is that if both of you defect, you'll both end up worse off, but defecting is the rational move (highest expected value) for each of you. Here's the concrete points for this contest:
Both Cooperate (
C/C): +3/+3
You Cooperate, Opponent Defects (
C/D): 0/+5
You Defect, Opponent Cooperates (
D/C): +5/0
Both Defect (
D/D): +1/+1
Note that defecting gets you either +1 or +5 points (based on whether your opponent cooperates) while cooperating gets you either 0 or +3. So far it's uninteresting...
But what if you could play it over and over again as a non-zero-sum game? Then two-players who keep cooperating would rack up +3's while two rational sociopaths who keep defecting would rack up +1's. You can actually play this out and see how it works in Nicky Case's "Evolution of Trust" game:
https://ncase.me/trust/Long story short, the long-running and ridiculously dominant champion of Iterated Prisoner's Dilemma strategies is Tit for Tat (
https://heritage.umich.edu/stories/the-prisoners-dilemma/): Cooperate, but if your opponent defects, defect back.
carykh, a Stanford undergrad, is now running a contest where
you get to come up with a better Iterated Prisoner's Dilemma strategy than Tit for Tat! And your strategy gets to face off against some classic ones (from leading intellectuals of the past) as well as other players' strategies, and the top 3 win cash prizes.
You should enter!
Coming up with a better strategy than Tit for Tat might sound daunting, but, well it's easy to code and test new strategies (you just have to write a Python function that takes in a numpy "history" array of your and your opponent's past moves, as well as "memory" which you get to set and which can be anything you want it to be- you just pass it to yourself between function calls). And it's not that hard to come up with something that does better than Tit for Tat, although Tit for Tat will probably dominate the meta. You can find all of my progress so far in
https://github.com/l4vr0v/PrisonersDilemmaTournament (code/strats/nprtt.py is my best one, while code/strats/npnntt.py is what I submitted to the contest). Here's a couple of insights from me to get started:
1. Being nice is important. Some strategies like joss (tit for tat but randomly defect 10% of the time) and detective try to cleverly initiate defection to take advantage of "always cooperate." This is risky because the game is non-zero-sum! There's a strategy called the grim trigger (aka the grudge strategy) which will
always defect once you defect. You will turn the remainder of a game of freebie +3's into a cycle of +1's, and this alone will almost totally offset the gain you make by cheating "always cooperate." Throw in other strategies in the mix like tit for tat, which we'll undoubtedly see many variants of in the final submissions, and you can see that initiating defection is
probably a dead end that sets you back.
2. De-escalate conflict. Suppose you're tit for tat up against joss (tit for tat but with a 10% chance of a random defection). Once joss defects, your mutual tit-for-tat behavior will lock you in a loop of D/D's (+1 each), costing you your prior C/C's (+3 each). You might "win" against joss by scoring higher than joss, but that doesn't matter- you want to maximize your own points, not the point difference! So trying to break the D/D loop is worth it because even D/D C/D C/C (+4) is an improvement over D/D D/D D/D (+3) and if you're able to restore a healthy C/C streak then that improvement will be vast!
While in the ordinary Prisoner's Dilemma defection has the highest expected value, in the Iterated Prisoner's Dilemma a cycle of cooperation racks up points that very quickly offset an occasional cheating event. In fact, my best strategies all got cheated pretty badly by joss, detective, and others, but they still scored more points than tit for tat in those matches so they came out ahead. (As for joss, detective, and other abusers getting more points out of those matches than my strategies, I refer you back to #1 - I don't think strategies that initiate defection are the frontrunners so giving them a leg up doesn't matter that much.)
Here's my breakdown of the problem:
You need a
Response Rule to deal with defections. You can't initiate defections, but if your opponent does, you need some rule for how to respond. This could be based on their most recent turn (e.g., like tit for tat) and determine only your next turn, or you can do something more complicated that takes their whole history into account and/or plans ahead on your part instead of going turn-by-turn.
You need a
Forgiveness Strategy to break defection loops. You need to determine when and how to offer an olive branch and see if your opponent responds, because even that risk of a C/D is outweighed by the potential massive rewards of returning to C/C. Of course, your opponent's behavior might not be deterministic (they could be random, flipping back and forth, or just always defecting) so this has a lot of room for strategizing. Forgiveness strategies is actually the entirety of where my solutions built their advantage over tit for tat.
The meta is very tit-for-tat dominated, so I suspect in the big FFA, most 1v1 matches will just be cycles of C/C that rack up +3 or close to +3 per decision on average. This means that even small improvements in how you handle non-nice opponents, who initiate defections, and rational opponents, who might back down from a defection cycle, can make a huge difference in your performance. Also, there's some really smart people on this site who know math and game theory far better than I do, so perhaps they can offer you insights that far exceed my paltry perspective above.
I hope many of you enter, I hope we're able to come up with something that ends tit-for-tat's streak of dominance, and I hope some of you brilliant people on this site are able to translate the strategic intuition and reasoning you built playing Warzone Idle Battle Royales or Warzone Classic into a strong Iterated Prisoner's Dilemma strategy. Good luck, everyone!
Edited 5/20/2021 16:28:55