AI Summit Supplementary Material: UCB1 on Strategies Demo

This content is associated with a 2014 GDC AI Summit lecture on algorithms for dynamic behavior, including Regret Matching, UCB1 and MCTS. For more information see the overview page.

The program on this page demos UCB1 over actions using the game Rock Paper Scissors. The AI can choose between rock, paper, and scissors.

Note that UCB1 is not a randomized algorithm. Thus, by looking at the internal AI debug information, you can always make the play that beats the AI. But if you do this, the AI will tend more and more towards playing randomly. Thus, there is a limited amount of exploitability before you can no longer win. (Alternate versions of UCB1 stop playing bad actions after some time to reduce the exploitability further.)
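To make the deterministic behavior concrete, here is a minimal Python sketch of UCB1 over the three actions. It is not the code behind this demo. The reward scaling (win 1, tie 0.5, loss 0), the standard exploration term sqrt(2 ln N / n), the lowest-index tie-breaking, and every name in it (`UCB1`, `run`, `always_rock`, `peek_and_counter`) are assumptions made for illustration.

```python
import math

ACTIONS = ["rock", "paper", "scissors"]
# COUNTER[x] is the action that beats x.
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def reward(ai_action, human_action):
    """AI reward scaled to [0, 1] (assumed scaling): win 1.0, tie 0.5, loss 0.0."""
    if ai_action == human_action:
        return 0.5
    return 1.0 if COUNTER[human_action] == ai_action else 0.0


class UCB1:
    """Textbook UCB1 over a fixed set of actions; no randomness anywhere."""

    def __init__(self, n_actions):
        self.counts = [0] * n_actions    # times each action was played
        self.totals = [0.0] * n_actions  # summed reward per action

    def select(self):
        # Play every action once before using the confidence bound.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        log_total = math.log(sum(self.counts))
        scores = [
            total / n + math.sqrt(2.0 * log_total / n)
            for total, n in zip(self.totals, self.counts)
        ]
        # Ties go to the lowest index, so the choice is fully deterministic.
        return scores.index(max(scores))

    def update(self, action_index, r):
        self.counts[action_index] += 1
        self.totals[action_index] += r


def run(rounds, human_policy):
    """Play `rounds` rounds and return how often the AI chose each action."""
    ai = UCB1(len(ACTIONS))
    for _ in range(rounds):
        i = ai.select()
        human = human_policy(ai)
        ai.update(i, reward(ACTIONS[i], human))
    return ai.counts


def always_rock(ai):
    """A hypothetical opponent that never changes its move."""
    return "rock"


def peek_and_counter(ai):
    """A hypothetical opponent that reads the AI's state and beats its next move.

    select() has no side effects, so calling it again predicts the AI's choice.
    """
    return COUNTER[ACTIONS[ai.select()]]


if __name__ == "__main__":
    print(dict(zip(ACTIONS, run(1000, always_rock))))
    print(dict(zip(ACTIONS, run(300, peek_and_counter))))
```

Against `always_rock`, this sketch settles on paper for most rounds. Against `peek_and_counter`, every action earns the same (zero) average reward, so the exploration term alone drives the choice and the action counts even out, which is the sketch's analogue of the AI drifting towards uniform play.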

Suggested tests: (1) Try to play to win; you shouldn't win much in the long term. (2) Play a fixed pattern (R-P-S); the AI should take advantage of your play.
