This content is associated with a 2014 GDC AI Summit lecture on algorithms for
dynamic behavior, including Regret Matching, UCB1 and MCTS. For more information
see the overview page.
The program on this page demos UCB1 over actions using the game
Rock Paper Scissors. The AI can choose between rock, paper, and scissors.
Note that UCB1 is not a randomized algorithm. Thus, by looking
at the internal AI debug information, you can always make the
play that beats the AI. But, if you do this, the AI will tend more and
more towards playing randomly. Thus, there is a limited amount of
exploitability before you can no longer win.
(Alternate versions of UCB1 stop playing bad actions after some time to
reduce the exploitability further.)
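To make the behavior described above concrete, here is a minimal sketch of UCB1 action selection. It is not the code behind the demo on this page. It assumes the textbook UCB1 rule (average reward plus sqrt(2 ln N / n)), and the class name, method names, and the tie-breaking rule (lowest index wins) are hypothetical choices made for illustration.

```python
import math

ACTIONS = ["rock", "paper", "scissors"]


class UCB1:
    """Illustrative UCB1 bandit; names and defaults are assumptions."""

    def __init__(self, n_actions, c=math.sqrt(2)):
        self.counts = [0] * n_actions    # times each action was played
        self.totals = [0.0] * n_actions  # summed reward per action
        self.c = c                       # exploration constant (assumed sqrt(2))

    def select(self):
        # Play every action once before the formula is defined.
        for action, n in enumerate(self.counts):
            if n == 0:
                return action
        total = sum(self.counts)

        def score(action):
            mean = self.totals[action] / self.counts[action]
            bonus = self.c * math.sqrt(math.log(total) / self.counts[action])
            return mean + bonus

        # max() returns the first maximum, so ties go to the lowest index.
        # No random numbers are used anywhere: the choice is fully determined
        # by the play history.
        return max(range(len(self.counts)), key=score)

    def update(self, action, reward):
        self.counts[action] += 1
        self.totals[action] += reward
```

Because select() uses no randomness, anyone who can see counts and totals can work out the next move in advance, which is the exploit described above. Actions that keep losing see their average fall while their exploration bonus grows, so play spreads across all three actions rather than locking onto one.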
Suggested tests: (1) Try to play to win; you shouldn't win much in the long term.
(2) Play a fixed pattern (R-P-S); the AI should take advantage of your play.
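The second test can also be run offline. The sketch below is a small harness that plays a repeating opponent pattern against a UCB1 chooser and reports how often the AI picked each action and its average reward. Everything in it is an assumption made for illustration: the function names, the reward encoding (win = 1, tie = 0.5, loss = 0), and the textbook UCB1 formula. It makes no claim about how well plain UCB1 does against any particular pattern; it only gives a way to measure it.

```python
import math
from itertools import cycle, islice

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value


def reward(ai, opp):
    # Assumed encoding: win = 1, tie = 0.5, loss = 0.
    if ai == opp:
        return 0.5
    return 1.0 if BEATS[ai] == opp else 0.0


def ucb1_choice(counts, totals):
    # Try each action once, then maximize mean + sqrt(2 ln N / n).
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    return max(
        range(len(counts)),
        key=lambda i: totals[i] / counts[i]
        + math.sqrt(2 * math.log(total) / counts[i]),
    )


def simulate(pattern, rounds):
    """Play `rounds` games against an opponent who repeats `pattern`."""
    counts = [0] * len(ACTIONS)
    totals = [0.0] * len(ACTIONS)
    for opp in islice(cycle(pattern), rounds):
        i = ucb1_choice(counts, totals)
        counts[i] += 1
        totals[i] += reward(ACTIONS[i], opp)
    return dict(zip(ACTIONS, counts)), sum(totals) / rounds


# Suggested test (2): the opponent cycles R-P-S.
counts, average = simulate(["rock", "paper", "scissors"], 300)
print(counts, round(average, 3))
```

An average reward above 0.5 means the AI is coming out ahead under this encoding. Passing a one-element pattern such as ["rock"] gives the simplest case to reason about, since one action then has the best payoff on every round.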