GNU Backgammon

Evaluation settings

Introduction to evaluation settings

GNU Backgammon's evaluation functionality is driven by 3 separate neural networks. The neural nets evaluate each position statically and return the outcome probabilities of the game at the given position. However, there are several different methods and techniques that an evaluation can use, and these can be adjusted. It's possible to set different levels of lookahead, it's possible to add noise to the evaluation, and each evaluation can be done cubeful or cubeless. All these settings together form a total evaluation setting. GNU Backgammon keeps a separate evaluation setting for each operation it performs:

  • Evaluation setting for Hints and Evaluations.
  • Evaluation setting for analysis.
  • Evaluation setting for GNU Backgammon when it's playing.
  • Several evaluation settings for each move performed in a rollout.

The depth to search and plies

A ply is simply one turn by a player. Any position can be evaluated at 0-ply, which means that GNU Backgammon does not look ahead in the game to evaluate the position. When GNU Backgammon evaluates a checker play decision, it looks at the positions resulting from all the legal moves for the given dice roll, and evaluates these positions at the given ply. You can set the search depth by specifying the plies of lookahead in any evaluation settings dialog.

Note to Snowie users: GNU Backgammon differs from some other software, notably Snowie, in that GNU Backgammon 0-ply is the same as Snowie 1-ply. In the same way, what is called 2-ply in GNU Backgammon is similar to Snowie 3-ply.

For GNU Backgammon, a 0-ply evaluation of a move would be done by:

Build a list of all legal moves. For each move, take the resulting board position and use the neural net to estimate the expected percentage of wins/gammons/backgammons/losses/gammon losses/backgammon losses. Rank the moves based on this evaluation.
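The 0-ply procedure above can be sketched in a few lines of Python. This is only an illustration, not GNU Backgammon's code: the hard-coded probabilities stand in for the neural net output (they are borrowed from the 4-2 opening listing later on this page), the names `cubeless_equity` and `rank_moves` are made up for the example, and the standard cubeless money-game equity is used for ranking, whereas the listings on this page show cubeful equities. It also assumes that the gammon figures include backgammons.

```python
def cubeless_equity(win, win_g, win_bg, lose_g, lose_bg):
    """Cubeless money equity; gammon figures are assumed to include backgammons."""
    return 2.0 * win - 1.0 + (win_g - lose_g) + (win_bg - lose_bg)


def rank_moves(candidates):
    """Sort (move, probabilities) pairs from best to worst equity."""
    return sorted(candidates, key=lambda c: cubeless_equity(*c[1]), reverse=True)


# Stand-in for the neural net output, per move:
# (win, win gammon, win backgammon, lose gammon, lose backgammon)
candidates = [
    ("24/20 13/11", (0.513, 0.137, 0.007, 0.132, 0.004)),
    ("8/4 6/4", (0.548, 0.172, 0.009, 0.121, 0.005)),
    ("13/11 13/9", (0.509, 0.155, 0.009, 0.137, 0.007)),
]

ranked = rank_moves(candidates)
for move, probs in ranked:
    print(f"{move:12s} {cubeless_equity(*probs):+.3f}")
```

With these figures 8/4 6/4 comes out on top, as it does in the cubeful listing.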

For one ply, after doing the above step, GNU Backgammon chooses the best n moves (where n is set by the move filters). For each one, it takes the resulting position for that move and goes through all 21 possible dice rolls for the opponent. From these results, it works out the average expectation for the initial move and ranks them. This is the same as Snowie 2 ply. You can think of it as asking “what's my best move if I also consider every possible dice roll and move my opponent might make?”

For 2 ply (Snowie 3 ply), a similar process is done, but this time, not only are the opponent's possible moves considered, but, for each of these, the next move of the player on roll is considered as well.
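The lookahead described above is, in essence, an average over the dice rolls combined with a best-reply search. The sketch below shows that idea in generic form. It is not GNU Backgammon's implementation: `static_equity` and `successors` are hypothetical callbacks standing in for the neural net and the move generator, and move filters are left out entirely. Note that the 21 distinct rolls are not equally likely: a double has probability 1/36 and any other roll 2/36.

```python
from itertools import combinations_with_replacement


def weighted_rolls():
    """The 21 distinct rolls with their probabilities (doubles 1/36, others 2/36)."""
    return [((a, b), (1 if a == b else 2) / 36)
            for a, b in combinations_with_replacement(range(1, 7), 2)]


def lookahead_equity(position, plies, static_equity, successors):
    """Equity of `position` for the player on roll, looking `plies` turns ahead.

    static_equity(position): hypothetical 0-ply evaluator, from the viewpoint
        of the player on roll.
    successors(position, roll): hypothetical move generator returning the
        positions (opponent on roll) reachable with `roll`; it must return at
        least one position, even when the player cannot move.
    """
    if plies == 0:
        return static_equity(position)
    total = 0.0
    for roll, weight in weighted_rolls():
        # The mover picks the reply that is worst for the opponent,
        # who is on roll in every successor position.
        best = max(-lookahead_equity(nxt, plies - 1, static_equity, successors)
                   for nxt in successors(position, roll))
        total += weight * best
    return total


# Toy model: a "position" is just the equity of the player on roll, and
# every roll has a single move that hands the turn to the opponent.
toy_static = lambda pos: pos
toy_successors = lambda pos, roll: [-pos]
print(lookahead_equity(0.25, 2, toy_static, toy_successors))
```

To rate a candidate move at n-ply in this scheme, one would call `lookahead_equity` on the position after the move (where the opponent is on roll) with `plies=n` and negate the result.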

For a single move, on average there are about 20 legal moves to consider.

When doing a 1-ply analysis/evaluation, for each of the top n moves (from the move filter), GNU Backgammon needs to consider 21 rolls by the opponent and about 20 legal moves per roll, which gives 21 × 20 = 420 positions to evaluate.

Every additional ply will multiply the previous number of evaluations by roughly 400, which explains the huge difference in playing speed/analysis speed between 0-ply and 2-ply settings. I don't think many people would enjoy playing against GNU Backgammon at 3-ply settings, where moves could take minutes to be selected. It's also not clear that using much deeper lookahead actually gains a lot in terms of playing strength - if you really need better answers than 2-ply, rollouts are probably a lot better.
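The growth can be tabulated with a few lines of Python. This is back-of-the-envelope arithmetic using the two rough figures from the text (21 rolls, about 20 legal moves per roll); it ignores move filters, pruning and any caching, so real evaluation counts will differ. The function name is made up for the example.

```python
ROLLS = 21       # distinct dice rolls, as in the text
AVG_MOVES = 20   # rough average number of legal moves per roll, as in the text


def positions_per_candidate(plies):
    """Approximate number of positions to evaluate for one candidate move."""
    return (ROLLS * AVG_MOVES) ** plies


for plies in range(4):
    print(f"{plies}-ply: about {positions_per_candidate(plies):,} positions per candidate")
```

Under these assumptions a single candidate costs 420 evaluations at 1-ply, 176,400 at 2-ply and about 74 million at 3-ply.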

Defining evaluation settings

First of all: There are several places in GNU Backgammon where you can adjust either its skill at playing or the quality of its hints and analysis:

Playing skill:

  • Settings→Players→Player 0 - choose Supremo or World Class. GNU Backgammon will take longer choosing its moves, but they will be very strong. At this setting, it is much stronger than Jellyfish level 7.

Hints:

  • Settings→Evaluation - choose Supremo or World Class if you want hints to be very accurate, but, as with the playing skill setting, you may find that it can take as long as 15 seconds to get an answer on a 1GHz PC. Expert will be very fast, but for some positions where you need to consider what your opponent may do on his roll and how you will follow it up, the results will be less accurate. As a wild guess, World Class or stronger settings would give a different best move somewhere around 2 or 3 percent of the time, and maybe less than 1/2 percent of the time the Expert result would be seriously wrong.

Analysis:

  • Settings→Analysis - these settings are used by the Analyse Move/Game/Match or Session command. Note that they are completely separate from what is used by the Hint command, which uses the settings above. You probably want at least World Class here. My experience on a 700MHz PC is that a 7 point match takes about 15 to 20 minutes to analyse on the Supremo settings, but the results tend to be very accurate.

Rollouts:

  • Settings→Rollouts→General Settings - tick the boxes for 'Cube decisions use same settings as chequer play' and 'Use same settings for both players'.
  • Settings→Rollouts→First Play Both - select Expert here (this is my opinion).

When doing rollouts, most of the time Expert play will be more than strong enough if you do, say, 1296 trials with no truncation. The rollout function has an enormous number of options, most of which are only useful when trying to investigate special positions. The simple Expert setting for both players is probably more accurate than any of the Analysis functions. The downside is that rolling out 1296 trials of an early move in a game can take a couple of hours.
On World Class or Supremo rollout settings, it can take more than 24 hours of computing time.

The evaluation settings dialog

A typical evaluation settings dialog is shown in this figure. The dialog consists of two columns. The left column is for the checker play decision evaluation settings, and the right column is for the cube decision evaluation settings. For each column you can select one of the predefined settings, or you can define your own.

In the lookahead box, you can adjust the lookahead of each evaluation by specifying the plies to be evaluated. Each ply costs approximately a factor of 21 in computational time. Also note that 2-ply is equivalent to Snowie's 3-ply setting.

For evaluation deeper than 0 plies, it's possible to reduce the number of rolls evaluated in the lookahead. This can be set in the Reduced evaluation box. Instead of averaging over all 21 possible dice rolls, it is possible to average over a reduced set, for example 7 rolls for the 33% speed option. The 33% speed option will typically be three times faster than the full search without reduction.

In the box for Cubeful evaluations, you can specify whether you want GNU Backgammon to take cube ownership into account in its evaluations. Turning this option on generally improves the evaluation, especially close to cube decisions, so we recommend that this option be turned on.

In the Noise box, you can add noise, that is, errors, to the evaluation. This is useful for reducing the program's playing strength if you think that it is playing too strongly, and for introducing levels below 0-ply. The lower rated bots (e.g., GGotter) on the former 1) GamesGrid backgammon server used this technique. The introduced noise can be deterministic, i.e., always the same noise for the same position, or it can be random.
Predefined settings

At the top of each evaluation settings column, it's possible to select a predefined setting.

  • Beginner: This setting uses no lookahead and adds up to 0.060 noise to the evaluation. With this setting GNU Backgammon will evaluate like a beginner.
  • Casual play: This setting uses no lookahead and adds up to 0.050 noise to the evaluation. With this setting GNU Backgammon will evaluate a bit better than the beginner setting, but not much.
  • Intermediate: This setting uses no lookahead and adds up to 0.030 noise to each evaluation. It still plays an intermediate game.
  • Advanced: This setting uses no lookahead and adds up to 0.015 noise to each evaluation. This setting plays a good game.
  • Expert: This setting uses no lookahead, but does not add any noise to the evaluations. This setting plays a strong game.
  • World class: This setting uses 2-ply lookahead, it uses no noise, and it uses a normal move filter. This plays a really strong game, close to the best human players in the world.
  • Supremo: This is basically the same as the World Class setting, but it uses a larger move filter.
  • Grandmaster: This setting uses 3-ply lookahead, no noise, and a normal move filter. This setting is extremely strong, but it's also very slow.

Move filters

Introduction to move filters

GNU Backgammon uses a technique called move filters in order to prune the complete list of legal moves when analysing chequer play decisions. Move filters can be considered a generalization of the search space used in earlier versions of GNU Backgammon.

A move filter for a given ply, say, 2-ply, consists of four parameters for each subply:

  • whether to analyse at all at this subply,
  • the number of moves always accepted at the given level,
  • the number of extra moves to add,
  • the threshold for adding extra moves.
For example, for 2-ply chequer play decisions there are two move filters: one for pruning at 0-ply, and another for pruning at 1-ply. The predefined setting “Normal” has: accept 0 moves and add up to 8 moves within 0.160 at 0-ply, and no pruning at 1-ply.

Consider the opening position where 4-2 has been rolled. GNU Backgammon starts by finding all possible moves and evaluating them at 0-ply (the top eight are listed here):

  1. Cubeful 0-ply 8/4 6/4 Eq.: +0.207
       0.548 0.172 0.009 - 0.452 0.121 0.005
       0-ply cubeful [expert]
  2. Cubeful 0-ply 13/11 13/9 Eq.: +0.050 ( -0.156)
       0.509 0.155 0.009 - 0.491 0.137 0.007
       0-ply cubeful [expert]
  3. Cubeful 0-ply 24/20 13/11 Eq.: +0.049 ( -0.158)
       0.513 0.137 0.007 - 0.487 0.132 0.004
       0-ply cubeful [expert]
  4. Cubeful 0-ply 24/22 13/9 Eq.: +0.037 ( -0.170)
       0.508 0.142 0.007 - 0.492 0.134 0.004
       0-ply cubeful [expert]
  5. Cubeful 0-ply 24/22 24/20 Eq.: -0.008 ( -0.215)
       0.501 0.121 0.006 - 0.499 0.133 0.003
       0-ply cubeful [expert]
  6. Cubeful 0-ply 24/18 Eq.: -0.015 ( -0.222)
       0.502 0.121 0.006 - 0.498 0.140 0.004
       0-ply cubeful [expert]
  7. Cubeful 0-ply 24/20 6/4 Eq.: -0.023 ( -0.229)
       0.497 0.132 0.007 - 0.503 0.144 0.005
       0-ply cubeful [expert]
  8. Cubeful 0-ply 13/9 6/4 Eq.: -0.026 ( -0.233)
       0.494 0.146 0.008 - 0.506 0.151 0.009
       0-ply cubeful [expert]

According to the move filter the first 0 moves are accepted. The equity of the best move is +0.207, and according to the move filter we add up to 8 extra moves if they're within 0.160, that is, if they have equity higher than +0.047. Moves 4 through 18 all have equity lower than +0.047, so the move list after pruning at 0-ply consists of moves 1 through 3.
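The pruning step in this example can be sketched as follows. This is a minimal illustration rather than GNU Backgammon's actual code: the function name and signature are made up, and it assumes, in line with the worked example, that the best move itself counts as one of the "extra" moves when no moves are always accepted.

```python
def apply_move_filter(equities, accept, extra, threshold):
    """Return the indices of the moves that survive one move filter.

    equities:  equity of each candidate move at the current ply
    accept:    number of moves that are always kept
    extra:     maximum number of additional moves to keep
    threshold: an additional move is kept only if it is within this
               equity of the best move
    """
    # Best move first; sorted() is stable, so ties keep their original order.
    order = sorted(range(len(equities)), key=lambda i: equities[i], reverse=True)
    if not order:
        return []
    best = equities[order[0]]
    accepted = order[:accept]
    extras = [i for i in order[accept:] if best - equities[i] <= threshold][:extra]
    return accepted + extras


# 0-ply equities of the eight moves listed above (4-2 opening roll).
zero_ply = [0.207, 0.050, 0.049, 0.037, -0.008, -0.015, -0.023, -0.026]

# "Normal" filter at 0-ply: accept 0 moves, add up to 8 within 0.160.
survivors = apply_move_filter(zero_ply, accept=0, extra=8, threshold=0.160)
print([i + 1 for i in survivors])  # move numbers as used in the listing
```

With the figures from the listing, the surviving moves are numbers 1, 2 and 3, matching the text. Setting `accept=2` would keep the two best moves regardless of how far apart they are.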
According to the move filter we do not perform any pruning at 1-ply, so moves 1 through 3 are submitted for evaluation at 2-ply:

  1. Cubeful 2-ply 8/4 6/4 Eq.: +0.197
       0.546 0.172 0.008 - 0.454 0.123 0.005
       2-ply cubeful 100% speed [world class]
  2. Cubeful 2-ply 24/20 13/11 Eq.: +0.058 ( -0.138)
       0.515 0.141 0.007 - 0.485 0.130 0.005
       2-ply cubeful 100% speed [world class]
  3. Cubeful 2-ply 13/11 13/9 Eq.: +0.050 ( -0.147)
       0.508 0.156 0.007 - 0.492 0.136 0.006
       2-ply cubeful 100% speed [world class]
  4. Cubeful 0-ply 24/22 13/9 Eq.: +0.037 ( -0.159)
       0.508 0.142 0.007 - 0.492 0.134 0.004
       0-ply cubeful [expert]
  5. Cubeful 0-ply 24/22 24/20 Eq.: -0.008 ( -0.205)
       0.501 0.121 0.006 - 0.499 0.133 0.003
       0-ply cubeful [expert]
  6. Cubeful 0-ply 24/18 Eq.: -0.015 ( -0.212)
       0.502 0.121 0.006 - 0.498 0.140 0.004
       0-ply cubeful [expert]
  7. Cubeful 0-ply 24/20 6/4 Eq.: -0.023 ( -0.219)
       0.497 0.132 0.007 - 0.503 0.144 0.005
       0-ply cubeful [expert]
  8. Cubeful 0-ply 13/9 6/4 Eq.: -0.026 ( -0.222)
       0.494 0.146 0.008 - 0.506 0.151 0.009
       0-ply cubeful [expert]

If we instead request a 4-ply chequer play decision, GNU Backgammon will use the move filters defined for 4-ply:

  Ply   Accept moves   Extra moves   Threshold for extra moves
  0     0              8             0.160
  1     no pruning
  2     0              2             0.040
  3     no pruning

The 4-ply move filter is identical to the 2-ply move filter for pruning at 0-ply, so after 0-ply we have the same three moves as above. Since there is no pruning at 1-ply, these three moves are evaluated at 2-ply as above. At 2-ply the filter accepts no moves, but adds up to two moves if they're within 0.040 of the best move; there is no pruning at 3-ply. Since the second best move is 0.138 worse than the best move, no moves are passed on to be evaluated at 4-ply. Hence GNU Backgammon will actually not evaluate any moves at 4-ply.

The predefined move filters all accept 0 moves, in order to facilitate fast decisions and analysis, i.e., no need to waste much time over obvious moves.
For post-mortem analysis it may be worthwhile to ensure that GNU Backgammon analyses at least two moves at the specified ply. To do this, specify accept 2 moves in the move filters you use for analysis. However, do note that GNU Backgammon will force evaluation at the specified ply if the actual move made is doubtful. This ensures that all errors and blunders are evaluated at the same level.

Defining move filters

The move filter allows you to control exactly how many moves GNU Backgammon examines at each ply. A ply is basically one move played by one side; thus if both sides played a move, it would be one whole move, but two plies, one for each side. To change the specific settings, press the Modify… button.

Although the level presets, such as World Class, Supremo, etc., are tested and good, you may want to know or control how GNU Backgammon filters the moves it analyses, and how many. If you are playing at Expert level (this is what GGRaccoon is set at) or another 0-ply setting, the Move Filter settings will not change a thing, as Expert level automatically examines all moves. At Supremo level this changes, as it takes a selection of the best moves from 0-ply and examines them at 2-ply. This means that for those selected moves it will calculate all the possibilities 2 plies ahead and evaluate them, allowing it to find better moves.

Since Supremo is a 2-ply setting, we are only interested in the 2-ply settings of Large as in the figure above. The 3-ply or 4-ply settings will have no effect here because Supremo doesn't examine at that depth.

In the figure above, we can see it first will Always accept 0 moves. This first line means that it won't force any moves to be analysed at 2-ply; it will only analyse moves according to the second line. If it had said it would always accept 2 moves, this would mean that no matter how ridiculously bad the 2nd move was compared to the 1st, it would analyse both at 2-ply. The second line says it will Add extra 16 moves within 0.320.
This means that provided they aren't more than 0.320 equity worse than the top move, it will select a maximum of 16 moves to analyse at 2-ply. For example, in the figure below, the 2nd best move is no less than 0.453 equity worse than the top choice, so GNU Backgammon didn't bother analyzing the other moves at 2-ply, as doing so is unlikely to change its mind on what the best move is.

Take a look at the next figure. Here, the exact same settings were maintained, but the 1-ply filter was activated. This just means that those 16 moves selected at 0-ply are sent instead to be analyzed at 1-ply, and then up to 5 moves from 1-ply will be sent to be analyzed at 2-ply. So this would actually be faster than the previous setting (and weaker), since a maximum of only 5 moves would be analyzed at 2-ply depth.

Tip: Feel free to experiment with the settings, as you can always reset them by simply choosing one of the level presets. In order to see if they are better, or as good but faster, I'd suggest comparing the results with Supremo. One setting I have that works quite well is to take the basic Supremo setting and, in the Move filter, reduce the 16 to 12. It cuts down on the thinking time by 20-25% more or less, and I haven't seen more than one case in over 10,000 moves where it missed the best move.

Cubeful vs. Cubeless

In the evaluation settings dialog box you can specify whether or not checker play should be evaluated cubeful. It's recommended that you use cubeful evaluation. To get an understanding of what cubeful checker play evaluations are, you can take a look at this position:

In this position black has rolled 51 and he has a good position. If the position is evaluated cubeless the best move is 13/7. Black can hope white does not roll 34 or 35 from the bar, and then has a good chance to close white out in the next few rolls. However, if white rolls one of the four hitting numbers from the bar, white will quite soon have a really hot redouble.
This redouble increases white's equity so much that black actually should play this move safe. He should play 13/8 6/5. However, if the evaluation was set to cubeless, a setting which assumes white will never redouble, black should play 13/7.

Here's another example:

This is from a 5 point match where black has 1 point and white has 3 points. Black wins the opening roll and considers playing 13/11 6/5 or 24/23 13/11. If GNU Backgammon uses a cubeless evaluation it will play 24/23 13/11. But if you're using a cubeful evaluation it will play 13/11 6/5. Slotting with 6/5 is a better move at this score even though it loses more gammons. The gammons black loses won't matter, since black will turn the cube in the next few rolls anyway. The slotting play also wins more gammons, and with the cube turned to 2, black should play towards gammonish positions at this score.

You can read more about cubeful evaluations in the Appendix. It's recommended that you use cubeful chequer evaluations.

Reduced evaluations

This option is designed to increase the speed of play by taking a shortcut. Instead of averaging over all 21 possible dice rolls, it is possible to average over a reduced set, such as only 7 rolls, which would be 1/3 or 33%. The 33% speed option should be approximately 3 times faster than the full search with no reduction. One caveat: it has been noted that this can badly hurt GNU Backgammon's checker play, so it is not advised to use it here.

Pruning neural networks

A new feature in the evaluation is the use of a set of neural networks just to prune away move candidates within a deeper ply search. This increases the speed considerably and it doesn't lose much playing strength compared to evaluation without these pruning neural nets. Jim Segrave has just done an analysis of this and found that less than 1% of all moves come out different with the pruning nets activated.
In most of these positions the move would not have made any difference to the game at all.

Note: You can not

With Deterministic noise, the noise added to each evaluation will be based on a sum of the bytes in the hash of the board position, which (by the central limit theorem) should have a normal-ish distribution. In that way you will always have the same noise amount for a position, since the noise added to the evaluation depends only on the position itself. If you want GNU Backgammon to evaluate and play as strongly as possible, you should not add any noise.
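The idea behind deterministic noise can be sketched as below. This is an illustration of the principle only: the page does not say which hash GNU Backgammon uses or how the sum is scaled, so SHA-256, the string `position_id`, the function names and the interpretation of the noise amount as a standard deviation are all assumptions made for the example.

```python
import hashlib
import math


def deterministic_noise(position_id, sigma):
    """Noise that depends only on the position, roughly normal with std `sigma`."""
    # Assumption: any well-mixed hash will do; SHA-256 is used here for convenience.
    digest = hashlib.sha256(position_id.encode("utf-8")).digest()
    n = len(digest)
    mean = n * 127.5                          # expected sum of n uniform bytes
    std = math.sqrt(n * (256 ** 2 - 1) / 12)  # standard deviation of that sum
    # Centre and scale the byte sum; by the central limit theorem the
    # result is approximately standard normal before multiplying by sigma.
    return sigma * (sum(digest) - mean) / std


def noisy_equity(equity, position_id, sigma):
    """Equity with position-dependent noise added."""
    return equity + deterministic_noise(position_id, sigma)


# The same (hypothetical) position always gets the same noise.
print(noisy_equity(0.207, "example-position", 0.030))
print(noisy_equity(0.207, "example-position", 0.030))
```

Because the noise is a pure function of the position, repeated evaluations of the same position agree with each other, which is not the case when the noise is drawn from a random number generator on every call.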
1) The original GamesGrid server closed operation in May 2008. The GamesGrid brand was bought and is currently being used by GameAccount Global.
evaluation_settings.txt · Last modified: 2012/04/11 22:20 (external edit)
Except where otherwise noted, content on this wiki is licensed under the following license: GNU Free Documentation License 1.3