Understanding Strategic Modeling in Universal Paperclips
The game Universal Paperclips is an addictive Javascript game in which the goal is to turn a modest paperclip factory into a monstrous company that converts the entire universe to paperclips.
One of the interesting parts of the game is the “Strategic Modeling” module that can be unlocked after a few hours of paperclip making. Here it is, extracted from the Javascript source code and slightly modified to ease experiments:
This module allows for wining “yomis”, a resource that can be used for unlocking various projects in the game. The module is inspired by game theory examples such as the prisoner’s problem or the related game of chicken. Several strategies play against each other and win some points depending on a payoff matrix that is created randomly for each tournament. The goal is to predict which strategy is going to win the largest number of points.
In Universal Paperclips, there is no precise explanation about the Strategic Modeling module. In forums, the players discuss the performance of each strategy based on simulations with many random payoff matrices. In other words, they discuss the choice of a strategy without any a priori knowledge of the payoff matrix for a given tournament. This is not optimal at all and this article explains how to easily choose the best strategy for a given tournament depending on its payoff matrix, in order to maximize the rewards in the first stage of the game, when the tournaments have to be run manually. We will see why the best strategy for a given payoff matrix is very often A100 or B100, with some exceptions in which the TIT FOR TAT and BEAT LAST strategies perform best.
How stragegic modeling works
During a tournament, a set of strategies compete against each other. At the beginning, there is a single available strategy called RANDOM. Up to 7 strategies can be unlocked in the game and added to the list. Each tournament is divided in rounds in which one strategy plays against another strategy. The number of rounds in the tournament is the square of the number of strategies, because each strategy competes against all the strategies, including itself. For example, if there are 3 available strategies (RANDOM, A100 and B100), there will be 9 rounds: RANDOM will play against RANDOM, A100 and B100, A100 will play against RANDOM, A100 and B100, etc.
During a round, each strategy can choose between two decisions that we call A and B. These decisions are given fancy names in the game, for example macro and micro, opera and football, or stay and go. When the two strategies of a round have chosen between A and B, each strategy receives some points depending on the payoff matrix. For example, let’s consider the following matrix:
Move A | Move B | |
Move A | 1,1 | 2,3 |
Move B | 3,2 | 6,6 |
It means that if both strategies choose A, each of them wins 1. If they both choose B, they win 6. If one chose A and the other one choose B, the former wins 2 and the latter wins 3.
During each round, the strategies play the game 10 times. For instance, if the first strategy always chooses A and the second always chooses B, the first strategy will get 2×10=20 points in the round and the second will get 3×10=30 points.
When all the rounds have been played, the strategies are sorted by decreasing number of points. Our goal is to predict which strategy is going to win before the end of the tournament, since the Yomi rewards for the player of Universal Paperclips is the score of the strategy he picked times the number of strategies that are below this strategy in the sorted list.
The strategies
Let’s call AA, AB, BA and BB the payoff coefficients so that the matrix can be written
Move A | Move B | |
Move A | AA,AA | AB,BA |
Move B | BA,AB | BB,BB |
They are various strategies that can be unlocked progressively in the game:
RANDOM | Choose randomly A or B with the same probability. |
A100 | Always choose A. |
B100 | Always choose B. |
GREEDY | Find the maximum payoff coefficient and choose the corresponding strategy: choose A if the maximum is AA or AB, otherwise choose B. |
GENEROUS | Find the maximum payoff coefficient and make sure that the opponent can potentially win it: choose A if the maximum is AA or BA, and B otherwise. |
MINIMAX | This is the inverse of GENEROUS: don’t give the other the opportunity to get the maximum payoff. |
TIT FOR TAT | Choose the inverse of what the opponent chose at the previous game. The first position seems to be A. |
BEATLAST | Choose the decision that would have been the best for the previous game. |
First example
Let’s consider a tournament between the strategies RANDOM, A100 and B100 with the following payoff matrix:
Move A | Move B | |
Move A | 10,10 | 9,5 |
Move B | 5,9 | 1,1 |
Because we know how each strategy behaves, we can predict the results of the tournament by analyzing each round:
RANDOM vs RANDOM | For each of the 10 games of this round, the RANDOM strategy wins, in average, (10+9+5+1)/4 = 6.25. This means that the RANDOM strategy wins 2x10x6.25=125 in this round, in average. The factor 2 appears because the strategy is playing against itself, so it is rewarded twice in each of the 10 games. |
RANDOM vs A100 | RANDOM wins either 10 or 5 (7.5 in average). RANDOM can therefore expect 10×7.5=75 for this round. A100 wins either 10 or 9, that is, 10×9.5=95 for this round in average. |
RANDOM vs B100 | RANDOM wins either 9 or 1, that is, 10x(9+1)x0.5=50 in average for this round. B100 wins either 5 or 1, that is, 10x(5+1)x0.5=30 for this round. |
A100 vs A100 | 10x10x2=200 |
A100 vs B100 | A100 wins 10×9=90, B100 wins 10×5=50 |
B100 vs B100 | 2x10x1=20 |
We can now sum the expected gains of each strategy during the 9 rounds, taking into account the rounds that are counted two times (e.g. the round A100 vs B100 yields to the same expected gains as the round B100 vs A100).
We obtain:
- for RANDOM: 125+75×2+50×2=375
- for A100: 95×2+200+90×2=570
- for B100: 30×2+50×2+20=180
Of course, we should bet on A100. When we run the tournament, we obtain:
- A100: 570
- RANDOM: 368
- B100: 184
Our predictions were quite precise despite the randomness of the process due to the RANDOM strategy.
Why A100 and B100 are often the best strategies
In fact, the strategies GREEDY, MINIMAX and GENEROUS choose either always A or always B for a given payoff matrix. For example, for the tournament studied in the previous example:
Move A | Move B | |
Move A | 10,10 | 9,5 |
Move B | 5,9 | 1,1 |
GREEDY always chooses A (to have a chance to get 10, the highest value), GENEROUS always chooses A and MINIMAX always chooses B. This means that GREEDY, GENEROUS and A100 have exactly the same expected gains, and that B100 and MINIMAX also have the same expected gains. In such cases, the best strategy is to choose either A100 or B100, because when there is a tie between two strategies, the strategy that was first unlocked wins. For example, for a tournament with the payoff matrix above and strategies RANDOM, A100, B100, GREEDY, GENEROUS and MINIMAX, we obtained:
- A100: 1149
- GREEDY: 1149
- GENEROUS: 1149
- RANDOM: 761
- B100: 404
- MINIMAX: 292
So A100, GREEDY and GENEROUS had the same expected gains and got exactly the same gains (this happens), but A100 wins because it has been unlocked before GREEDY and GENEROUS. By choosing A100 we won 5×1149=5745 yomis. We would have gained 3×1149=3447 with GENEROUS. Note that A100 could also have lost because of the randomness of the tournament, but it had the highest chance of beating all the strategies.
How to choose between A100 and B100
When TIT FOR TAT and BEAT LAST are not available, the best strategy for a given tournament is always either A100 or B100. It is actually quite easy to determine which one is the best: we just have to compute their relative expected gains. For this, we need to know how many strategies will choose A and how many will choose B, and then the expected gain can be estimated with a weighted sum of AA and AB for the strategy A100, or BA and BB for the strategy B100.
We can actually find the best strategy between A100 and B100 without a precise calculus of the expected gains, using the fact that A100 and B100 choose opposite decisions, and MINIMAX and GENEROUS as well, while RANDOM is random (indeed). Therefore, we can consider that A100 will get AA or AB with roughly the same probability. Similarly, B100 will get roughly the same amount of BA and BB. With this approximation, we simply neglect GREEDY in our predictions. For instance, if we have this matrix:
Move A | Move B | |
Move A | 2,2 | 10,8 |
Move B | 8,10 | 7,7 |
A100 roughly wins something proportional to 2+10=12, and B wins something roughly proportional to 8+7=15. GREEDY is equivalent to A100, GENEROUS is equivalent to B100, and MINIMAX is equivalent to A100. We should then choose either B100 or GENEROUS and we choose B100 since it was first unlocked and it wins in case of a tie. After running the tournament, we obtain:
- B100: 913
- GENEROUS: 909
- RANDOM: 766
- A100: 664
- MINIMAX: 624
- GREEDY: 600
This very simple “averaging” strategy performs very well and it only takes a few second to decide between A100 and B100 for a given matrix.
Using TIT FOR TAT
TIT FOR TAT is the first strategy that does not choose consistently A or B in a given tournament. Instead, it follows what the opponent did in the previous game, and for this reason it mostly wins either AA or BB. Therefore, TIT FOR TAT can be a good choice when the payoff coefficients are large in the diagonal of the matrix and small otherwise. For instance, for this matrix:
Move A | Move B | |
Move A | 10,10 | 1,2 |
Move B | 2,1 | 9,9 |
we obtain:
- TIT FOR TAT: 1229
- GENEROUS: 950
- GREEDY: 914
- A100: 896
- B100: 770
- MINIMAX: 763
- RANDOM: 715
Note that TIT FOR TAT is random when it plays against RANDOM, which decreases its performance. Moreover, it does not necessarily end up on the diagonal during the first game of each round. The gap between the values of the diagonal elements and the other values must be large enough, otherwise A100 and B100 often perform better.
Using BEAT LAST
BEAT LAST is roughly equivalent to A100 and B100 in some cases. For instance, with the following matrix:
Move A | Move B | |
Move A | 10,10 | 9,2 |
Move B | 2,9 | 6,6 |
the best choice is always A, independently of what the opponent is playing. In this case, A100 should be preferred to BEAT LAST (A100 would win if there is a tie).
BEAT LAST is also equivalent (roughly) to TIT FOR TAT when the largest values are in the diagonal. TIT FOR TAT should be preferred.
The only case where BEAT LAST is better is when the best strategy is to play the opposite decision of the opponent, that is, when AB and BA are the largest coefficients. For example:
Move A | Move B | |
Move A | 2,2 | 9,8 |
Move B | 8,9 | 3,3 |
For this tournament, we obtained:
- BEAT LAST: 1055
- B100: 930
- GENEROUS: 920
- RANDOM: 881
- GREEDY: 838
- MINIMAX: 838
- A100: 782
- TIT FOR TAT: 529
Again, because of its randomness, for the same reasons as for TIT FOR TAT, BEAT LAST should only be used when the non-diagonal elements are significantly larger than the others.
Additional links
The evolution of trust by Nicky Case is a fun lecture for understanding game theory in a broader context.
The Universal Paperclips Wiki has interesting information about the game.
A discussion on reddit about the best strategy for the automatic tournaments.
Good explanation!
Idea: A “repeat last” option on the Strategic Modelling simulator would be a nice convenience feature. After trying a “New Random Tournament” one often wants to repeat it with the same values, to account for randomness.