Scientists Calculate Poker • Dmitry Dagaev • Science News on “Elements” • Math, Programming


Fig. 1. A heads-up poker game

In early 2015, the journal Science published a paper announcing the completion of a computer program that solved one version of poker: heads-up limit Texas Hold’em. The program learned to make the correct decision in each of roughly 3.19×10¹⁴ possible game states. In the long run, no other strategy should be able to beat the strategy found this way. One of the results of the analysis is a proof that the dealer has an advantage over the second player. The authors invite leading professional poker players to test the strategy in practice and make sure that it is optimal.

Texas Hold’em is the most popular form of poker. The game uses a standard 52-card deck. At the start of each hand, every player receives two cards face down (the hole cards, or pocket cards). The players look at their cards and then hold the first round of betting. The player who opens the betting is called the dealer (or the button); after each hand, the role of dealer passes to the next player around the table. During a betting round, a player can raise, match the opponent’s bet (call), or fold, giving up any further part in the hand. As a result, by the end of a betting round every player still in the hand has put the same amount into the pot. Then three community cards (the flop), shared by all players, are dealt face up, after which a second round of betting takes place. Next, another community card (the turn) is revealed, followed by a third round of betting. Finally, the fifth community card (the river) is revealed, and the fourth and last round of betting takes place. If at any point only one player is left in the hand, that player takes the entire pot. If more than one player remains after the fourth round of betting, they reveal their hole cards and compare their best five-card combinations, each of which may combine hole cards and community cards. The player with the better combination wins the pot.
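To make the order of play concrete, here is a minimal Python sketch that deals one heads-up hand street by street. The card encoding ('As', 'Td', ...) and the function names are illustrative assumptions, not anything taken from the paper; burn cards, the physical dealing order and the betting itself are left out.

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"  # clubs, diamonds, hearts, spades


def new_deck():
    """Standard 52-card deck; cards are strings such as 'As' (ace of spades)."""
    return [rank + suit for rank in RANKS for suit in SUITS]


def deal_heads_up(rng=random):
    """Deal one heads-up hand in the order the streets are revealed.

    We simply draw nine distinct cards. Betting would happen before the
    flop and after the flop, the turn and the river (four rounds in total).
    """
    cards = rng.sample(new_deck(), 2 + 2 + 5)
    return {
        "dealer_hole": cards[0:2],
        "other_hole": cards[2:4],
        "flop": cards[4:7],
        "turn": cards[7:8],
        "river": cards[8:9],
    }


if __name__ == "__main__":
    hand = deal_heads_up(random.Random(2015))
    for street, cards in hand.items():
        print(street, cards)
```

Passing a seeded `random.Random` makes a deal reproducible, which is handy when replaying hands.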

Heads-up means that only two players are playing. Limit poker is a version of the game in which bets can be raised only by a fixed amount and only a predetermined number of times. Therefore, heads-up limit hold’em is a finite sequential game. In game theory, such games are customarily represented as trees. The vertices of the tree correspond to different game states. Each vertex is labeled with the player whose turn it is to move there, and the edges leaving the vertex correspond to the actions that player can take. One of the players in the game is “nature”: this is what game theory calls an artificial player that acts as a random number generator. “Nature” decides at random which cards are dealt to the players or revealed on the table.
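A minimal sketch, under our own simplifying assumptions, of how such a tree can be represented in Python: nature deals one card from a hypothetical three-card deck, followed by a toy limit betting round with a cap of three raises. The card labels, the cap and the rule that a call ends the round are illustrative, not the real hold’em rules. Because every raise uses up one of a fixed number of allowed raises, the tree is finite and its leaves can be counted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union


@dataclass
class Node:
    """One vertex of a sequential-game tree.

    player: 0 or 1 for the player to move, "nature" for a chance vertex,
            None for a terminal vertex (the hand is over).
    children: each edge (available action or chance outcome) -> next vertex.
    probs: for nature's vertices only, the probability of each outcome.
    """
    player: Optional[Union[int, str]]
    children: Dict[str, "Node"] = field(default_factory=dict)
    probs: Dict[str, float] = field(default_factory=dict)


def betting_round(to_act: int, raises_left: int) -> Node:
    """Toy limit betting round: the player to act faces a bet and may fold,
    call (which ends the round), or raise by the fixed amount while raises remain."""
    node = Node(player=to_act)
    node.children["fold"] = Node(player=None)
    node.children["call"] = Node(player=None)
    if raises_left > 0:
        node.children["raise"] = betting_round(1 - to_act, raises_left - 1)
    return node


def deal_card(cards: List[str], subtree: Callable[[], Node]) -> Node:
    """Nature's vertex: one card from `cards` is dealt uniformly at random."""
    node = Node(player="nature")
    for card in cards:
        node.children[card] = subtree()
        node.probs[card] = 1.0 / len(cards)
    return node


def count_terminals(node: Node) -> int:
    """Number of leaves, i.e. distinct ways the game can end."""
    if node.player is None:
        return 1
    return sum(count_terminals(child) for child in node.children.values())


root = deal_card(["J", "Q", "K"], lambda: betting_round(to_act=0, raises_left=3))
print(count_terminals(root))  # 3 deals x 8 betting outcomes = 24
```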

Sequential games can be divided into two types: games with perfect information and games with imperfect information. In a game with perfect information, each player always knows which vertex of the tree they are at and everything that happened before. In games with imperfect information, players may be unable to determine exactly which state the game is in; the states a player cannot tell apart form a single information set. Poker is an example of a game with imperfect information: players do not know which cards their opponents hold. Everyone can observe the board cards and the actions taken during betting, but the opponent’s hole cards remain hidden.
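One way to picture an information set is as a key built only from what a player can observe. The helper below is hypothetical (its name and card encoding are our assumptions): it combines the player’s own hole cards, the board and the betting history, and deliberately omits the opponent’s cards, so two states that differ only in the opponent’s hand collapse to the same key.

```python
from typing import Tuple

Hole = Tuple[str, str]


def info_set_key(player: int, holes: Tuple[Hole, Hole],
                 board: Tuple[str, ...], history: Tuple[str, ...]) -> tuple:
    """Everything `player` can observe: own hole cards, board cards and the
    betting history. The opponent's hole cards are deliberately left out,
    so all states that differ only in those cards share one key."""
    own_cards = tuple(sorted(holes[player]))  # card order in the hand is irrelevant
    return (player, own_cards, board, history)


board = ("Th", "9h", "3c")
history = ("raise", "call")
state_a = (("As", "Kd"), ("7c", "2h"))  # (dealer's hole cards, second player's)
state_b = (("As", "Kd"), ("Qs", "Qh"))

# The dealer cannot tell state_a from state_b ...
print(info_set_key(0, state_a, board, history) == info_set_key(0, state_b, board, history))  # True
# ... but the second player can, because their own cards differ.
print(info_set_key(1, state_a, board, history) == info_set_key(1, state_b, board, history))  # False
```

A solver typically stores one strategy per such key rather than per game state, which is exactly why imperfect information changes the solution method.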

Any finite sequential game with perfect information can be solved from the end using the backward induction algorithm. Consider a subgame at the very last level (that is, one in which any decision ends the game and the player receives their payoff): the best action for the player who moves in that subgame can be found directly. In the same way, optimal behavior can be found in all subgames at the last level. Then, knowing how rational players behave in the last-level subgames, one can analyze the game at the penultimate level, and so on. Eventually one reaches the subgame that coincides with the whole game, where the best move for the player who moves first can be found. This yields the optimal behavior of all players in every possible situation, as well as how the game ends when all players act correctly. This is how checkers was solved in 2007: it turned out that if both sides play perfectly, the game necessarily ends in a draw (J. Schaeffer et al., 2007. Checkers Is Solved).
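The procedure just described fits in a few lines. Below is a minimal Python sketch of backward induction on a two-move toy game with made-up payoffs (the game and its dictionary encoding are our assumptions, not taken from the article). It solves every subgame bottom-up and records the optimal action at every vertex, including vertices that are never reached under optimal play.

```python
# A vertex is either terminal, {"payoff": (payoff to player 0, payoff to player 1)},
# or a decision vertex, {"player": i, "moves": {action: child vertex}}.
GAME = {
    "player": 0,
    "moves": {
        "left": {"player": 1, "moves": {"a": {"payoff": (2, 1)}, "b": {"payoff": (0, 0)}}},
        "right": {"player": 1, "moves": {"c": {"payoff": (1, 2)}, "d": {"payoff": (3, 0)}}},
    },
}


def solve(node, path=(), plan=None):
    """Backward induction: solve every subgame bottom-up.

    Returns the payoffs reached under optimal play from `node` and a plan
    mapping each vertex (identified by the path of actions leading to it)
    to the optimal action there. Ties go to the first action listed.
    """
    if plan is None:
        plan = {}
    if "payoff" in node:
        return node["payoff"], plan
    mover = node["player"]
    best_action, best_payoff = None, None
    for action, child in node["moves"].items():
        payoff, _ = solve(child, path + (action,), plan)  # solve the subgame first
        if best_payoff is None or payoff[mover] > best_payoff[mover]:
            best_action, best_payoff = action, payoff
    plan[path] = best_action
    return best_payoff, plan


payoff, plan = solve(GAME)
print(payoff)  # (2, 1)
print(plan)
```

Player 0 would prefer the (3, 0) outcome, but player 1 would never choose it, so backward induction steers player 0 to "left".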

In terms of the number of possible game states, poker is smaller than checkers. Unlike checkers, however, poker is a game of imperfect information. This makes it impossible to apply backward induction directly: if at some point a player does not know which vertex they are at, they cannot find a unique optimal action. Such a game can, however, be rewritten as a matrix (the normal form of the game): all strategies of the first player are listed as rows and all strategies of the second player as columns, and a Nash equilibrium can then be found in the resulting matrix. In theory, at least. In practice we run into another problem: the resulting poker matrix would be enormous. The complexity of finding a Nash equilibrium with linear programming grows exponentially with the number of game states, so for a game as complex as poker this method is not applicable. The idea of reducing the tree directly to a matrix had to be abandoned. Instead, the authors used a special modification of the regret-minimization algorithm (compare the Savage minimax-regret criterion; see Regret (decision theory)), designed to solve games with imperfect information in time that grows linearly with the number of game states. The algorithm works through the information sets from the end of the game and assigns each of them a penalty (regret) depending on the strategy being played. It then adjusts the strategy to minimize the accumulated penalty.
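The paper’s algorithm, CFR+, belongs to the counterfactual regret minimization family, which applies a regret-matching-style update at every information set of the game tree. A full implementation is beyond a short example, so the sketch below shows only the underlying regret-matching rule, in self-play on rock-paper-scissors (our choice of toy game, not the paper’s). Each player plays actions in proportion to their positive accumulated regret; the players’ average strategies approach the game’s Nash equilibrium, which here is to play each action with probability 1/3.

```python
# Rock-paper-scissors: PAYOFF[i][j] is the row player's payoff when row plays i
# and column plays j (0 = rock, 1 = paper, 2 = scissors); the column player gets -PAYOFF[i][j].
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret
    (uniformly if no regret is positive)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def self_play(iterations, initial_regrets):
    """Both players update with regret matching; returns their average strategies."""
    n = len(PAYOFF)
    regrets = [list(initial_regrets[0]), list(initial_regrets[1])]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        row, col = regret_matching(regrets[0]), regret_matching(regrets[1])
        # Value of each pure action against the opponent's current mixed strategy.
        row_values = [sum(PAYOFF[i][j] * col[j] for j in range(n)) for i in range(n)]
        col_values = [sum(-PAYOFF[i][j] * row[i] for i in range(n)) for j in range(n)]
        for player, strategy, values in ((0, row, row_values), (1, col, col_values)):
            expected = sum(s * v for s, v in zip(strategy, values))
            for a in range(n):
                # Regret: how much better action a would have done than the mix played.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategy[a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


# Start both players off with a bias (row favours rock, column favours paper).
average = self_play(10_000, ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
print([[round(p, 3) for p in strategy] for strategy in average])  # each entry close to 1/3
```

Note that it is the average strategy, not the current one, that converges to equilibrium; the current strategies keep cycling.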

Another difficulty in solving poker is that a player’s expected payoff is not necessarily a whole number, whereas checkers has only three possible outcomes! Because the payoffs are computed numerically, the authors have to approximate non-terminating decimals to a given accuracy ε. That, however, rules out the standard definition of a Nash equilibrium: rounding errors would make it impossible to answer exactly whether any player gains by deviating from a given strategy profile. The authors therefore use the concept of an ε-Nash equilibrium: a strategy profile is called an ε-Nash equilibrium if no player can increase their payoff by more than ε by unilaterally deviating from it. In particular, any Nash equilibrium is an ε-Nash equilibrium.
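The definition translates directly into a computation. As an illustration (a sketch on a small matrix game chosen by us, not poker), the function below returns the smallest ε for which a strategy profile is an ε-Nash equilibrium of a two-player zero-sum game. It uses exact rational arithmetic via `fractions.Fraction`, sidestepping the rounding issue described above, which is affordable only for tiny games.

```python
from fractions import Fraction


def expected_payoff(A, x, y):
    """Row player's expected payoff when row plays mix x and column plays mix y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def epsilon_of_profile(A, x, y):
    """Smallest eps for which (x, y) is an eps-Nash equilibrium of the zero-sum
    matrix game A (row receives A[i][j], column receives -A[i][j]).

    A best deviation can always be taken to be a pure strategy, so it is enough
    to check each row and each column."""
    v = expected_payoff(A, x, y)
    row_best = max(sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x)))
    col_best = max(-sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(y)))
    return max(row_best - v, col_best + v)  # column's current payoff is -v


MATCHING_PENNIES = [[1, -1], [-1, 1]]
half = [Fraction(1, 2), Fraction(1, 2)]
print(epsilon_of_profile(MATCHING_PENNIES, half, half))  # 0: an exact Nash equilibrium
skewed = [Fraction(51, 100), Fraction(49, 100)]
print(epsilon_of_profile(MATCHING_PENNIES, skewed, half))  # 1/50: column gains 0.02 by deviating
```

The exact equilibrium gives ε = 0, so it is an ε-Nash equilibrium for every ε ≥ 0, matching the last sentence of the paragraph above.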

Finally, what did the authors of the Science paper actually obtain? They found an ε-Nash equilibrium for an ε so small that a human lifetime of play would not be enough to tell it apart from an exact Nash equilibrium. Figure 2 shows the players’ first moves in this strategy profile. The left panel shows, for every starting combination of two cards, the dealer’s first action (green cells: raise; red cells: fold). The right panel shows the second player’s response if the dealer raised on the first move (green: raise; blue: call; red: fold; mixed colors mean that the player mixes several actions with certain probabilities). In this situation the dealer often bluffs, raising with a weak hand, and the second player is often forced to fold without knowing whether the dealer is bluffing. As a result, the dealer beats the second player in the long run.

Fig. 2. The players’ optimal first moves
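Starting-hand charts like Figure 2 are commonly drawn as a 13×13 grid: pairs on the diagonal, suited hands on one side and offsuit hands on the other. We assume Figure 2 follows this convention. The Python sketch below (the helper name and card encoding are our assumptions) shows why such a grid covers every starting hand: the 1,326 possible two-card combinations collapse into 169 classes once the specific suits are ignored.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"


def hand_class(card1: str, card2: str) -> str:
    """Map two hole cards such as 'As', 'Kd' to one of the 169 classes:
    a pair ('77'), suited ('QJs') or offsuit ('AKo'), higher rank first."""
    (r1, s1), (r2, s2) = card1, card2
    if RANKS.index(r1) < RANKS.index(r2):
        r1, r2 = r2, r1  # put the higher rank first; suits only matter for equality
    if r1 == r2:
        return r1 + r2
    return r1 + r2 + ("s" if s1 == s2 else "o")


deck = [rank + suit for rank in RANKS for suit in SUITS]
combos = list(combinations(deck, 2))
classes = {hand_class(a, b) for a, b in combos}
print(len(combos), len(classes))  # 1326 169
```

Each pair class stands for 6 combinations, each suited class for 4 and each offsuit class for 12, which matters when a mixed-color cell is weighed against its neighbours.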

The game under consideration may have other ε-Nash equilibria as well. Recall, however, that in a zero-sum game such as poker, all Nash equilibria give the players the same payoff. Therefore, finding a Nash equilibrium means finding strategies that guarantee you the best payoff you can secure against any opponent.
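A small Python check of this claim on a zero-sum game constructed for illustration (not poker): the column player has two identical strategies, so there are infinitely many equilibria, yet every one of them gives the same payoff, and the row player’s equilibrium strategy guarantees that payoff whatever the opponent does.

```python
from fractions import Fraction as F

# Zero-sum game made up for illustration: the row player receives A[i][j].
# Columns 1 and 2 are identical, so the column player has many equilibrium mixes.
A = [
    [1, -1, -1],
    [-1, 1, 1],
]
ROWS, COLS = range(2), range(3)


def payoff(x, y):
    """Row player's expected payoff for mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * A[i][j] * y[j] for i in ROWS for j in COLS)


def is_equilibrium(x, y):
    """Neither player can do better with any pure deviation."""
    v = payoff(x, y)
    row_best = max(sum(A[i][j] * y[j] for j in COLS) for i in ROWS)
    col_best = min(sum(x[i] * A[i][j] for i in ROWS) for j in COLS)  # column minimizes
    return row_best == v and col_best == v


def guaranteed(x):
    """Payoff the row player is sure to get with x, whatever the column player does."""
    return min(sum(x[i] * A[i][j] for i in ROWS) for j in COLS)


x_star = [F(1, 2), F(1, 2)]
for y in ([F(1, 2), F(1, 2), F(0)], [F(1, 2), F(0), F(1, 2)], [F(1, 2), F(1, 4), F(1, 4)]):
    print(is_equilibrium(x_star, y), payoff(x_star, y))  # True 0 for each equilibrium
print(guaranteed(x_star))  # 0, the value of the game
```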

Can you make money by playing the strategies that were found? Yes, if you can reproduce what the strategy prescribes at every decision point. A human is unlikely to manage this: there is not enough memory. But it is now pointless to play heads-up limit poker against a computer. Most likely, this means that heads-up limit poker will soon disappear from poker sites, since it is hard to check whether a player is using special programs that help find the best response. Poker players, however, need not despair yet. Even if every variant of limit poker is solved someday, there will still be no-limit poker (where you can bet any amount), which is not a finite game. Because of this, it is nearly impossible to solve no-limit poker by modifying the backward induction algorithm…

Source: M. Bowling, N. Burch, M. Johanson, O. Tammelin. Heads-up limit hold’em poker is solved // Science. 2015. V. 347. P. 145–149.

See also:
A. V. Zakharov, “Game Theory in the Social Sciences”, a good game theory textbook.

Dmitry Dagayev


