Optimizer

From Sokoban Wiki


Definition list:

Optimizer 
An optimizer is a program which can optimize (improve) a passed solution of a specific level.
board position
A specific placement of the boxes and the player in a level.
state
A state is an object in the program which contains a board position (= all box positions and the player position) and additional data (moves, pushes, ...).


The optimizer

An optimizer tries to find a better solution than the passed one. In contrast to a solver, an optimizer has the advantage of having a solution as a basis.

Current solvers can't solve complex levels. In order to optimize even huge levels, an optimizer has to use other methods and algorithms than a solver. The only advantage the optimizer has compared to a solver is the passed solution. Therefore it has to extract as much information as it can from that solution.


What information can the optimizer get from the passed solution?

  1. the maximum number of pushes and moves of a solution
  2. which box uses which area of the level
  3. how many moves have been made after x pushes
  4. in which order the boxes have been pushed
  5. what is the longest sequence of player moves until the next push is made
  6. ...


General approach for optimizing
A solver starts at the initial board position of a level and tries to solve the level. The optimizer can't behave like a solver: it must search "around" the passed solution. Therefore it seldom finds a completely new solution; mostly there are just small improvements of parts of the solution.

Because an optimizer should optimize not only pushes but also moves, the player position of every board position is important.

One way an optimizer may work is:

  • It uses the existing solution as a "spine".
  • For each position along the spine it generates all legal successor board positions.
  • If a successor board position happens to return to the spine, then compare the costs of the old path and the new path to this board position, and save the best of them.

This can be done using a breadth-first search. First all successors of the spine board positions are generated, then all successors of these generated successors, and so on.
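A minimal sketch of this spine search in Python, under the assumption that board positions are hashable values, that the cost of a position is a single number (e.g. its number of moves), and that a hypothetical successors(position) generator yields (successor position, additional cost) pairs:

 from collections import deque

 def search_around_spine(spine, successors):
     # spine: list of (board_position, cost) pairs taken from the passed solution.
     # successors(position): yields (successor_position, extra_cost) pairs.
     best = dict(spine)                            # best known cost per board position
     queue = deque(position for position, _ in spine)

     while queue:
         position = queue.popleft()
         for successor, extra_cost in successors(position):
             new_cost = best[position] + extra_cost
             if successor not in best or new_cost < best[successor]:
                 # Either a brand-new board position, or a cheaper path back to a
                 # known one (possibly a spine position): remember the better cost.
                 best[successor] = new_cost
                 queue.append(successor)
     return best

Whenever a cheaper path back to a spine position is found, that part of the original solution can be replaced by the new path.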

Storing board positions

Reached board positions have to be stored, because:

  1. Duplicate board positions reached during the search for a better solution have to be identified and pruned
  2. For every reached board position the game metrics have to be saved: moves and pushes (additionally: box changes, box lines, pushing sessions). This is needed to be able to compare the current path to a board position with the old path to that board position.

Both the solver and the optimizer must store the player position. The difference is that the push-only solver stores a normalized (top-left) player position, which is all that is needed to identify which access area the player is in.

For the optimizer, the situation is different. There the actual player position has to be stored, because it matters to the successor board positions how a given position has been reached: it influences how many steps it takes to get to the successor board positions.
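The difference can be illustrated by the key under which a board position would be stored in a hash table; the function names and the reachable_squares helper below are only illustrative:

 def solver_key(box_positions, player_position, reachable_squares):
     # A push-only solver only needs to know which access area the player is in,
     # so the player position is normalized to one canonical square of that
     # access area (here simply the smallest coordinate as a stand-in).
     normalized_player = min(reachable_squares(box_positions, player_position))
     return frozenset(box_positions), normalized_player

 def optimizer_key(box_positions, player_position):
     # The optimizer also counts moves, so the exact player position matters:
     # the same boxes with a different player square need a different number
     # of moves to reach the successor board positions.
     return frozenset(box_positions), player_position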

Hence these data have to be stored for every reached board position:

  • A link to the preceding board position (to be able to determine the path by which this board position has been reached)
  • the box positions
  • the player position
  • the number of moves needed to reach this position
  • the number of pushes needed to reach this position

In order to do this a "state" is created: an object containing all these data.

Unfortunately, this is not feasible for anything but the smallest levels because it would need too much memory. One solution is to store just the incremental updating steps in each state. In other words, the state must store the move that led to the board position stored in the state.

This means every state (except the state containing the initial board position of the level) only contains:

  • Link to the preceding board position
  • Player position
  • Pushed Box number (the boxes must have been numbered in the initial board position)
  • Move direction
  • Number of moves
  • Number of pushes
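A sketch of such an incremental state object in Python; the field names are illustrative and not taken from any particular program:

 class State:
     def __init__(self, predecessor, player_position, pushed_box_number,
                  move_direction, moves, pushes):
         self.predecessor       = predecessor        # preceding state, None for the initial state
         self.player_position   = player_position    # player position after the push
         self.pushed_box_number = pushed_box_number  # numbering from the initial board position
         self.move_direction    = move_direction     # direction of the push, e.g. "right"
         self.moves             = moves              # moves needed to reach this board position
         self.pushes            = pushes             # pushes needed to reach this board position

 # Only the initial state additionally stores the full box positions, e.g.:
 # initial = State(None, (4, 6), None, None, 0, 0)          # (4, 6) is just an example
 # initial.initial_box_positions = {1: (4, 5), 2: (8, 8)}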

Example:

Initial state                link    1. Successor state             link      2. Successor state
position box 1: x=4, y=5   <-------  pushed box number: 1         <-------    pushed box number: 2
position box 2: x=8, y=8             move direction: right                    move direction: left

In order to determine the box positions in the second successor state, information from all preceding states has to be read:
From the 2. successor state we know that box 2 has been pushed to the left => remember: box 2's x position has to be decreased by 1
From the 1. successor state we know that box 1 has been pushed to the right => remember: box 1's x position has to be increased by 1

Then the initial state is reached. There we get the information about the start box positions:
box1: x=4, y=5
box2: x=8, y=8

Now the information from the other states has to be used:
box 2's x position is decreased by 1
box 1's x position is increased by 1

Result (= box positions of the 2. successor state):
box1: x=5, y=5
box2: x=7, y=8
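The same reconstruction, expressed as a small walk back along the links; this sketch assumes the State object sketched above, with the initial state additionally carrying initial_box_positions as a dictionary from box number to coordinates (down and right increase y and x, matching the examples in this article):

 DIRECTION_OFFSETS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

 def box_positions_of(state):
     # Collect the pushes on the way back to the initial state ...
     pushes = []
     while state.predecessor is not None:
         pushes.append((state.pushed_box_number, state.move_direction))
         state = state.predecessor
     # ... start from the box positions stored in the initial state ...
     boxes = dict(state.initial_box_positions)       # box number -> (x, y)
     # ... and replay the pushes in forward order.
     for box_number, direction in reversed(pushes):
         dx, dy = DIRECTION_OFFSETS[direction]
         x, y = boxes[box_number]
         boxes[box_number] = (x + dx, y + dy)
     return boxes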

Problem when a better path to a state has been found

Current path (only important data from the states are shown):

state a (initial)  state b           state c           state d           state e            state f
-----------------  ----------------- ----------------- ----------------- -----------------  -----------------
link to: nothing   link to state a   link to state b   link to state c   link to state d    link to state e
pushed box: none   pushed box no.: 1 pushed box no.: 1 pushed box no.: 2 pushed box no.: 2  pushed box no.: 1
direction: none    direction: right  direction: down   direction: left   direction: left    direction: down
pushes: 0          pushes: 1         pushes: 2         pushes: 3         pushes: 4          pushes: 5
box1: 2,3
box2: 4,3

To avoid recalculating them every time, here are the box positions that have been reached in every state:

state a: box1: 2,3   box2: 4,3
state b: box1: 3,3   box2: 4,3
state c: box1: 3,4   box2: 4,3
state d: box1: 3,4   box2: 3,3
state e: box1: 3,4   box2: 2,3
state f: box1: 3,5   box2: 2,3


Now a better path to the board position of "state e" is found:

state a (initial)  state m                                               state n
-----------------  -----------------                                     ----------------- 
link to: nothing   link to state a                                       link to state m
pushed box: none   pushed box no.: 2                                     pushed box no.: 2 
direction: none    direction: left                                       direction: down     
pushes: 0          pushes: 1                                             pushes: 2         
box1: 2,3
box2: 4,3

state a: box1: 2,3   box2: 4,3
state m: box1: 2,3   box2: 3,3
state n: box1: 2,3   box2: 3,4

One can see that we have reached the same board position after just two pushes. In Sokoban it doesn't matter which box is located on which square; hence, regarding only the box positions, the board positions stored in "state e" and "state n" are identical.
Now, having found a better path to the board position stored in "state e", we automatically have found a better path to all successor states of "state e", of course (because only pushes and moves are regarded, not the secondary metrics). For instance, "state f" can now be reached better, too. The board position reached in "state n" has been reached before. Hence, there shouldn't be a new object created (-> "state n"); instead the new path to this board position should be set in the already existing object.
That has two advantages:

  1. we save the memory for one state ("state n" doesn't have to be created)
  2. all states referring to "state e" or one of its successor states automatically refer to the better path.

"State e" still links to "state d" and therefore to the longer path. To set the new better path for "state e" we have to reset the link to the new "state m".

What happens when the new link is set:

state a (initial)  state m           state e            state f
-----------------  ----------------- -----------------  -----------------
link to: nothing   link to state a   link to state m    link to state e
pushed box: none   pushed box no.: 2 pushed box no.: 2  pushed box no.: 1
direction: none    direction: left   direction: left    direction: down
pushes: 0          pushes: 1         pushes: 4          pushes: 5
box1: 2,3
box2: 4,3

state a: box1: 2,3   box2: 4,3
state m: box1: 2,3   box2: 3,3
state e: box1: 2,3   box2: 2,3
state f: box1: 2,4   box2: 2,3

One can see that the board position stored in "state e" has now changed to (2,3 ; 2,3) - both boxes on the same square! Also, all successor states of "state e" now have wrong board positions.
Furthermore, the number of pushes is wrong in these states.

Therefore it doesn't suffice to just set a new link in "state e". Both the pushed box no. and the direction may be wrong after a new link is set.

Revisiting

The link in "state e" isn't set to the new "state m" but stays as it is. A new object "state n" is created and stored in the hashtable. The old "state e" is deleted from the hashtable but still stays in memory, because it is linked to by its successor states. Doing this, all states have consistent and correct data. However, all successor states of "state e" have a too high number of pushes, because they actually can be reached with fewer pushes than the currently saved number of pushes in them indicates. This causes problems. Example:
A new state "state g" is generated as successor of "state f". The number of pushes stored there would be 6. Maybe the board position of "state g" had been reached before in just 5 pushes. Then the new path to this board position would be pruned because it is a worse path to that board position. If "state f" had been updated because of the newly found better path to "state e", the stored number of pushes in that state would have been only 3. Therefore "state g" would have only 4 pushes and would have been reached better than ever before. Hence, the search would not stop here but continue from "state g" and maybe find new best solutions.
This problem is solved by the search itself:
Having reached "state n", this state is stored in the hashtable and put on the OPEN-queue for further searching. Once "state n" is taken out of the OPEN-queue, the states "f", "g", ... are generated again - they are revisited. Thereby their stored push values are set to the new, better values.

Disadvantages: In huge search trees many states have to be revisited. This needs a lot of time, and all revisited states are created anew. => time consuming and memory intensive

These disadvantages can be reduced by sorting the states by moves so that better states like "n" are taken out of the OPEN-queue before worse states like "f". If a new best path has been found and the revisiting isn't done for all successor states of the state to which a better path has been found, there will be states stored in the hashtable having wrong values (pushes, moves, ...).
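One way to get such an ordering is to use a priority queue as the OPEN-queue instead of a plain FIFO queue; in this sketch, taking pushes as the primary and moves as the secondary sort key is just one possible choice:

 import heapq
 from itertools import count

 open_queue = []        # entries: (pushes, moves, tie_breaker, state)
 tie_breaker = count()  # keeps heapq from comparing State objects on equal metrics

 def put_state(state):
     heapq.heappush(open_queue, (state.pushes, state.moves, next(tie_breaker), state))

 def take_state():
     # States reached with fewer pushes (and, among those, fewer moves) come out
     # first, so a better state like "n" is expanded before a worse state like "f",
     # and the revisited successors see the improved values as early as possible.
     return heapq.heappop(open_queue)[-1]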


Revisiting solves the following problems:

  1. If a new path to a state "x" is found, all the successor states of "state x" are revisited and their stored values are set accordingly.
  2. If a state has been pruned during the search because of some pruning mechanism and a better path to one of its predecessor states is found, this state is generated again at some point and then may no longer be pruned.


Setting new links

If it were possible to set the better path by setting a new link in a state, then the search would not need to create a new state object in memory for every revisited state.

To be able to do this, these problems have to be solved:

  1. the values for pushes, moves, ... in the successor states have to be updated after a new path has been found
  2. the successor states must still "contain" the correct board positions after the new link has been set
  3. states that had to be pruned before must be put on the OPEN-queue, because with the better path to them they may not need to be pruned anymore / the successor board positions must be stored in the OPEN-queue with their correct "estimated interest"

In the example the board position of "state e" has been reached better and therefore the link to "state m" is set. It's possible to set the correct number of pushes in "state e", too. The problem is that all successor states of "state e" also have wrong values. Either they have to be revisited, so their values can be overwritten by the new ones, or they have to "correct their values by themselves". Automatic correction can only be done by recalculating the values every time they are needed:
If the number of pushes of "state e" is needed, it is calculated on the fly: the number of predecessor states of "state e" = the number of pushes needed for reaching the board position of "state e".
For the moves, every state must not store the absolute number of moves so far but just the moves needed to get from the predecessor board position to the current one. Then these moves can be added up along the link chain.
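A sketch of these on-the-fly calculations; incremental_moves is an assumed field that holds only the moves from the predecessor board position to this one:

 def pushes_of(state):
     # Number of pushes = number of predecessor states on the link chain.
     pushes = 0
     while state.predecessor is not None:
         pushes += 1
         state = state.predecessor
     return pushes

 def moves_of(state):
     # Only works if each state stores the moves from its predecessor
     # instead of the absolute number of moves made so far.
     moves = 0
     while state.predecessor is not None:
         moves += state.incremental_moves
         state = state.predecessor
     return moves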

However, revisiting solves problem 1 and problem 3. Hence there is no benefit in incremental storing of pushes and moves. Incremental storing of the metrics doesn't avoid the need to revisit all successors of a state to which a better path has been found. Here is an example (the moves represent the total number of moves, not the incremental number):

state a (initial)  state b           state c           state d           state e            state f
-----------------  ----------------- ----------------- ----------------- -----------------  -----------------
link to: nothing   link to state a   link to state b   link to state c   link to state d    link to state e
pushes: 0          pushes: 1         pushes: 2         pushes: 3         pushes: 4          pushes: 5
moves: 0           moves: 10         moves: 20         moves: 30         moves: 40          moves: 50

Now the search finds another path to the board position stored in state f:

state a (initial)  state m           state n           state o           state p            state f
-----------------  ----------------- ----------------- ----------------- -----------------  -----------------
link to: nothing   link to state a   link to state m   link to state n   link to state o    link to state e
pushes: 0          pushes: 1         pushes: 2         pushes: 3         pushes: 4          pushes: 5
moves: 0           moves: 20         moves: 30         moves: 35         moves: 45          moves: 55

State f has been reached worse than before (55 instead of 50 moves) and therefore the link to state e isn't replaced by a link to state p.

Now the search finds a new better path to state o:

state a (initial)  state r           state s           state o           
-----------------  ----------------- ----------------- ----------------- 
link to: nothing   link to state a   link to state r   link to state s
pushes: 0          pushes: 1         pushes: 2         pushes: 3         
moves: 0           moves: 10         moves: 15         moves: 20  

The better path is set by setting a new link to state s in state o.
But no matter whether absolute or incremental metrics are stored: the better path to state f through state o and state p isn't found now without revisiting. State p can now be reached with 20 + 10 = 30 moves and state f with 30 + 10 = 40 moves instead of 50.

Revisiting does solve two of the problems, but the successor states may "contain" incorrect board positions after a new link has been set. The problem is that the stored direction and the stored box number in a state cannot be trusted anymore after a new link has been set (-> Problem when a better path to a state has been found).

To retrieve the correct board position, the correct box position after every push must be determined:
The player position after the push is stored in every state in the optimizer. The optimizer must regard moves, too. Hence the player position can't be "normalized" like in those solvers that just regard pushes but not moves. Therefore the optimizer must store the actual player position - which is the square the pushed box started on!
This information can be trusted even if a new link is set in a state.
The box position after the push can be calculated: box target position = player position + push direction

The push direction cannot be trusted when a new link is set:
A "state x" carries the following board position and the direction "up" (= this board position has been reached by pushing a box one square up):

$
@$

This board position may now be reached better by another path with the direction "right". However, this only influences the current state. When the link to the new path is set in "state x", all successor states still have the correct push direction stored. Therefore this problem can be avoided by simply updating the direction of better reached states.
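Relinking a better-reached state then amounts to updating the link, the metrics and the push direction together; a sketch, using the state fields assumed earlier (the pushed box number is simply not relied upon anymore):

 def relink(state, new_predecessor, new_direction, new_moves, new_pushes):
     state.predecessor    = new_predecessor   # the better path found to this board position
     state.move_direction = new_direction     # direction of the last push on the new path
     state.moves          = new_moves         # metrics of the new, better path
     state.pushes         = new_pushes
     # state.pushed_box_number is left untouched: it may now be wrong, which is
     # why the board position is reconstructed from player positions and directions.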

To sum up:
If a new, better path to a board position has been found, this path can be set (= linked to) in the state carrying this board position. The metrics and the push direction of this state have to be updated. The board position of a state can then be retrieved using the player position and push direction stored in every state:

 state a (initial)  state b           state c           state d
-----------------  ----------------- -----------------  ----------------- 
link to: nothing   link to state a   link to state b    link to state c
pushed box: none   pushed box no.: 1 pushed box no.: 1  pushed box no.: 2 
direction: none    direction: right  direction: down    direction: down  
player pos.: 3,4   player pos.: 7,8  player pos.: 8,8   player pos.: 2,2
pushes: 0          pushes: 1         pushes: 2          pushes: 3
moves: 0           moves: 8          moves: 11          moves: 23

The board position of state d can be calculated because the target positions of all boxes can be calculated:
state b: box target position = 7,8 + right = 8,8
state c: box target position = 8,8 + down = 8,9
state d: box target position = 2,2 + down = 2,3

From this follow these box positions: 8,9 and 2,3.
The player position of state d can be read directly from state d, of course.

Note: There is still a little algorithm necessary to determine which "box target position" is the actual end position of a box and which one is just a position where the box is pushed away from later. This can't be determined using the "pushed box no" information because this information may be wrong!
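A sketch of that little algorithm: replaying the pushes forward, using only the player positions and directions (which are kept correct when links are reset), automatically sorts out which target position is final, because a box that is pushed away later simply leaves its square again. initial_box_squares is an assumed set of the squares occupied by boxes in the initial board position; DIRECTION_OFFSETS is the table from the earlier sketch:

 def board_position_of(state):
     # Collect (player position after the push, push direction) back to the start.
     chain = []
     while state.predecessor is not None:
         chain.append((state.player_position, state.move_direction))
         state = state.predecessor
     boxes = set(state.initial_box_squares)
     # Replay forward: the player position after a push is the square the box
     # started on; the target square is one step further in the push direction.
     for player_position, direction in reversed(chain):
         dx, dy = DIRECTION_OFFSETS[direction]
         x, y = player_position
         boxes.discard((x, y))              # the box left this square ...
         boxes.add((x + dx, y + dy))        # ... and stands here, at least for now
     return boxes

For the example above this yields exactly the squares 8,9 and 2,3: the target square 8,8 of state b is vacated again by the push of state c.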

Optimizer strategies

Reordering pushes

Often many optimizations can be found by just reordering the pushes. That means the boxes are pushed in the same directions as in the original solution, but the pushes are done in a different order.

Example:

     ####
    ##. ##
##### .  #
#   #  # #
# $ #  # #
# $  @   #
######  ##
     ####

Solution: lluullddRRRRuruurrdddldlUUUUdddllluulDldRRRRuruurrdddldlUUU

The best solution can be found by reordering the pushes. This can be done in the following way:

Delaying / promoting pushes

For each push in the solution the optimizer tries to omit the push. Then it examines the following pushes along the solution.
If a push now cannot be performed (because the missing push had to be done first), then the search continues to the next one.
In short: the optimizer omits a push of the solution and checks whether the rest of the solution still contains valid pushes. Of course the omitted pushes have to be done at some time. Hence they are stored in a queue. Every time it's possible to perform them, they are done.
The idea is to remove a push from one place of the solution and insert it at another place => reordering

Example:
Pushes of the solution: ABCDEFGHIJK
At first the optimizer tries to omit push A. Therefore A is stored in the queue and the first push which is done is push B.
Now after push B the optimizer tries to do the pushes stored in the queue. Let's say push A can't be done yet. Then the optimizer tries to do push C. It can be done. Again the optimizer tries to do push A. Now it's possible to do that push.
Hence the optimizer will perform it and the solution so far looks like this:
BCA
Now the optimizer continues with D, E, ...

What happens if a push of the solution can't be done anymore?
In this case the push is omitted, too, and stored in the queue.
If the optimizer has done BCA and the push D is invalid now, it is omitted and the optimizer tries to do push E. If that succeeds, the push sequence done so far is: BCAE
Now again the optimizer tries to do the pushes stored in the queue first.

What to do when there is more than one push in the queue?
Example:
The solution is ABCDEFGHIJ

Let's say that we omit the 4th push, D, and the following pushes E and F thereby become illegal. GH is still possible. After the pushes "ABC G" the queue contains [D,E,F], so the optimizer first tries to generate the push series "ABC GD", and if that succeeds, then "ABC GDE", and if that also succeeds, then "ABC GDEF", too.

All permutations of the queue are tried, so a lot more combinations are tested. For example, "ABC GFDE", "ABC GEDF" and "ABC GDFE".

Generating permutations this way can easily get out of hand because of exponential growth, and therefore there should be an x-pushes-ahead limitation. This means the search continues to the next pushes, up to an upper limit of x pushes ahead from the position where the first push was omitted. The value of x must be set to a value appropriate for the performance of the program.
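A simplified sketch of the delaying mechanism for one deliberately omitted push; board is an assumed object offering is_legal(push) and apply(push), and both the queue permutations and the x-pushes-ahead limit described above are left out for brevity:

 def try_delaying(pushes, omit_index, board):
     done, queue = [], []
     for index, push in enumerate(pushes):
         if index == omit_index or not board.is_legal(push):
             queue.append(push)                 # delay this push for now
             continue
         board.apply(push)
         done.append(push)
         # After every performed push, retry the delayed pushes in queue order.
         still_delayed = []
         for delayed in queue:
             if board.is_legal(delayed):
                 board.apply(delayed)
                 done.append(delayed)
             else:
                 still_delayed.append(delayed)
         queue = still_delayed
     # Only a complete reordering (all pushes placed somewhere) is a candidate.
     return done if not queue else None

A driver would call this once for every push of the solution (every omit_index), recompute the moves of each complete candidate sequence, and keep the best one.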


Of course, it's also possible to do the opposite:
For each board position that is reached in the solution, the program can look for later pushes that are legal already now.

For each of the pushes that are legal in the current board position, the program tries to "promote" them, that is, taking them out of the sequence and putting them back into the path right after the investigated board position.

Although this algorithm doesn't try every possible permutation, it's often very time consuming. Therefore other methods may be used to find reordering possibilities for improving a solution:

n-box permutations

YASO Optimizer

Sokoban optimizer "scribbles" by Brian Damgaard about the YASO optimizer
