Sokoban optimizer "scribbles" by Brian Damgaard about the YASO optimizer

From Sokoban Wiki

Revision as of 04:42, 5 July 2011 by Briandamgaard (Talk | contribs)
(diff) ← Older revision | Current revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Contents

Optimization methods in YASO (Yet Another Sokoban Optimizer)

Global optimization

This is a simple breadth-first-search starting from the positions on the solution path. First all 1-push pushes are generated from all the positions along the solution path, then all 2-push pushes, etc. The name "Global optimization" is nonsensical; the method just needed a name when the optimizer was extended with other methods later.

..put all positions on the best solution path on the open-queue and the closed-table
..while the open-queue isn't empty, and memory isn't full
....dequeue position P from the open-queue
....generate successors S of position P
....for each member C of S (each child)
........if C is a new position then
...........add it to the open-queue and the closed-table

A search like this can only penetrate a few number of pushes down the game tree in the neighborhood of the existing solution before the memory runs full. To improve a little on that, the method performs a preprocessing step:

..for each position P on the best solution path
....if some of the later pushes are legal already after P, then
.......try these later moves already now, and recurse

This gives the method a vague ability to re-order the existing pushes, but it doesn't help much though.

After the "Vicinity search" method has been implemented, this method has lost its importance. "Vicinity search" is better, and it finds practically all the things which "Global optimization" can find.

It has been considered to drop the method, but it sometimes finds small improvements of the secondary metrics after the other methods have finished, and for that reason, expert users have asked for keeping this method in the program. After all, it can take hours to perform a "Vicinity search", so it's not a big deal when it takes a few extra minutes to give "Global optimization" a chance.

Pros:

  • No preparations, i.e., it reports and updates the best solution instantly during the search.
  • Optimizes all metrics, as opposed to "Vicinity search" which only optimizes 2 metrics.
  • Good for very small levels.

Cons:

  • Can only look a very few pushes away from the solution path before the transposition table becomes full. This makes the method practically useless for medium and large levels.


Box permutations

This is best explained by an example.

Let's say the current iteration searches for 3-box permutations, and that we after 10 pushes have reached a position, which we'll call X. The next 3 pushed boxes are the boxes A, B, and C.

Together, these 3 boxes are pushed 8 squares: "aaabccbb", at which point we reach position Y after 18 pushes. Note that "B" first is pushed one square, and then contacted again after having pushed the box "C", but in total, we've only touched 3 boxes.

This method searches for a path from X to Y, only moving the 3 boxes A, B, and C. Put differently: Can we find a path from the position X after 10 pushes to the postion Y after 18 pushes, only pushing these boxes, and where the new path is cheaper?

..for each N-box iteration until an iteration exceeds the timelimit
....for each position P on the best solution path
......find the next N boxes which are pushed
......after these N boxes have been pushed, the game position is Q
......search for a cheaper path from P to Q, only moving the N boxes

Incidentally, this method is exactly the same as the original method in the "BoxSearch" program, i.e., its "Method B" as far as I recall. Therefore, there is no need to use that BoxSearch module for optimization anymore.

In contrast to BoxSearch, YASO automatically iterates the number of box permutations, and for smooth co-operation with the other methods, YASO measures the time for each iteration, and when it exceeds a limit, the method-scheduler switches to one of the other methods.

Pros:

  • No preparations, i.e., it reports and updates the best solution instantly during the search.
  • It's good at finding local changes.
  • It works well even for big levels because the memory footprint is so small. This is in contrast to "Vicinity search" which doesn't work for big levels at all.

Cons:

  • It only operates locally, focusing on short N-box slices of the best solution path.


Rearrangements

Rearrangements - Re-ordering existing pushes

This method tries to re-order the existing pushes. It doesn't change the existing pushes.

This means the method can only reduce the number of moves (and maybe also improve the secondary metrics), but it never changes the number of pushes. There is actually one exception to that rule. On rare occasions, rearrangements produce a cycle on the best path, in which case the program drops the unnecessary looping pushes.

..for each position P on the best solutions path
.....find a sequence of pushes later in the game which:
.......can be performed now, i.,e, at position P
.......and which improves the score if they are performed after P

When the method looks for sequences later in the game, it pays particular attention to same-box pushes. For instance, if the box A is pushed in push number 25 and 26, and then again in push number 52, then the search checks if all 3 A-box pushes can be performed now at position P. If that's possible, then the search continues and checks if the next pushes starting from #53 also are legal at position "P + the 3 A-box pushes".

A reasonably accurate (but complex) description of the method looks like this:

..for each push "A" along the best path
....for each box "B"
......while there are more pushes "Bn" of box "B" on the rest of the path
.......while "... A B1 B2 B3...Bn Ss Xs ..." is a legal transformation of the path
.........where "Xs" are all non-B pushes in the interval [A..Bn]
.........where "Ss" is zero or more of the immediate successors of "Bn"
.........if this transformation is better than the existing best path then
............make this transformation the new best path

To dampen the exponential growth when this method processes large levels, there is the additional constraint (affecting the first line in the pseudo-code) that when a solution is longer than 1500 pushes, then the method doesn't try to break existing box-sessions. For instance, if there is a push-sequence like "...aaaaabbb...", then the method only tries to insert pushes after the last "a" (and the last "b" if that's the last B-box push in that box-session).

Pros:

  • No preparations, i.e., it reports and updates the best solution instantly during the search.
  • It's surprising how often it can find significant improvements.
  • The method looks at the entire solution, i.e., is not limited to local transformations.
  • It runs quite fast for small and medium levels.
  • It can do some work also for large levels, but it's rather slow, even with the constraint described above.

Cons:

  • Since it only makes rearrangements of the existing pushes, it cannot reduce the number of pushes, only the number of moves, and maybe improve the secondary metrics as well.

Rearrangements - Symmetry optimization

The "Rearrangements" method includes yet another optimization method, which isn't so important that it deserved to be mentioned as a method of its own. It's the "Symmetry optimization" method, and it runs as part of the "Rearrangements" - also because the "Symmetry" method is a type of rearrangement.

When a level has a symmetrical (starting) position, it sometimes makes a difference which way the user chooses to go. Even though it typically just applies to the starting position, the program checks each position on the best solution for symmetry and makes the best choices, if needed.

Symmetry-optimization can typically only reduce the best solution with a few moves, but it's very impressing when it does so, and one of the expert-users meant it was worthwhile to add it to the program.

Pros:

  • It's good at finding the (few) optimizations which are related to symmetry.

Cons:

  • It adds a small runtime penalty to the processing of all levels, also non-symmetrical levels. However, the calculation is fast so that's unimportant.


Vicinity search

This method (for convenience, let's call it "VS" from now on) is by far the best and the most important optimization method. That is not the same as saying that the other methods are dispensable. Each method has its merits, and they all have a role to play.

For instance, VS can only optimize 2 metrics, and therefore it always performs a quick and slightly limited form of "Rearrangements" optimization after each iteration, so the user never sees the unpolished output from the VS method itself.

In a sense, the idea behind this method is very simple:

  • First, generate a lot of positions which are slightly modified variations of the positions on the best solution path.
  • Next, perform a (breadth-first) search from the start to the end of the game, (as if the program tried to solve the level by itself) with the important constraint, that the search only is allowed to visit positions belonging to the precalculated ones.

If a solver tried such a naive breadth-first search from start to finish, it would invariably fail because of exponential growth. The VS optimizer, on the other hand, stays in the vicinity of the existing solution. It already has a solution, but if there is a better solution via some of the slightly modified positions, then VS has a good chance of finding it.

The generated positions can be thought of as a "cloud of dots" around a line made up of the existing best path. Then the search walks its way "left-to-right" only visiting the dots around the line.

The strength of VS is that it's not limited to local changes, like the N-box permutations.

While the idea may sound simple, it's not entirely obvious that it also is so successful as it turns out to be. One might have a vague feeling that (some of the) improved paths may pass through the positions in this "cloud" of slightly modified positions, but it's hard to tell beforehand how good it works.

Therefore, Sebastien Gouezel deserves a lot of credit both for the idea, and for implementing it - something which is a time-consuming and complex task. I usually refers to his implementation as a prototype, and it's important to emphasize that I don't intend in any way to disparage his work when I say that. His program is a marvelous achievement.

It's just that since I had the privilege to see his implementation before I wrote my own, I could see - not only how the general idea works - but also how to improve both the code and its structure, and also to find new data-structures which boosted the efficiency and allowed for simpler and more elegant code.

This is just the nature of things. A version 2 should always be better, otherwise nothing has been learned from version 1. If I rewrote my own YASC program from scratch, the code and the data-structures would also be a lot different, and a lot better.

To sum up, my implementation has the following extensions and changes, compared to the prototype:

  • Improved data-structures reduce the memory footprint by roughly 50%, or put differently, the optimizer can handle levels twice as big.
  • Automatic splitting when solutions are too long to be handled as a whole.
  • Automatic iteration, i.e, try again if an improvement has been found.
  • Quick-searches, i.e., if the settings are 20/10, then first try 10/0, 20/0, and 999/0. This finds many optimizations faster than the 20/10 would have done.
  • Progress messages during the process, so the user has a chance to know how far the calculation has reached, and so the user can decide if the results are worth waiting for.
  • If the user terminates the task, any improvements in the analyzed part of the game are detected and saved.

I'm quite proud of my implementation which - in contrast to the prototype - is readable and organized in well-chosen functions, with good variable names, etc. The VS module is ~3900 lines of Delphi-Pascal code, not counting service functions shared among all the optimizer modules.

While the basic idea behind VS may sound simple, it's implementation is not, so these 3900 code lines belong to the class of programming, where progress is down to 50-100 lines per day. This puts some perspective on the amount of work involved in the project.

Vicinity search - Phase 1 of 2 - Generating box configurations

When the precalculation generates the slightly modified positions of the positions on the best found solution path, it is governed by the limits from the "Settings" screen, where the default values are 20/10. We'll use these values in the following because it makes things easier to see a concrete example.

..for each position P along the best solution path
.....for each pair of boxes (since this is a concrete example with exactly 2 boxes)
.......generate positions where:
.........the first box is placed at one of its 20 nearest squares
.........and
.........the second box is placed at one of its 10 nearest squares

The nearest squares are found by going in circles around the box square, skipping walls, if any. The original box square is included and numbered "1" in the following examples.

5 squares:
--2--
-315-
--4--
10 squares:
--6--
-728-
9315-
-A4--
13 squares:
--6--
-728-
9315D
-A4C-
--B--

The enumeration of the squares is here performed in a counter-clockwise order. It's interesting to notice that for certain numbers like 5 and 13, all squares in the outer circle are included, whereas a number like 10 doesn't complete the outer circle.

This means that with a 10-squares limit, it makes a difference in what order the squares are visited. This may seem insignificant, but experiments performed by Matthias Meger has shown that it's worthwhile to compensate for it, so YASO changes the order from one iteration to the next, thereby sampling the search domain a little better.

For instance, in the second iteration, the optimizer may instead visit these 10 squares, using a clockwise order:

10 squares (clockwise):
--58-
-4126
-A37-
--9--

There is even a formula for the numbers that complete the outer circle: "1+(2*n*(n+1))" where "n" is a natural number including 0, i.e., (0,1,2,3...).

Vicinity search - Phase 2 of 2 - Breadth-first search

The search begins when the precalculation has finished producing the slightly modified variations of the positions on the best found solution path. The search is a plain breadth-first search, starting from the beginning of the game (or the currently investigated slice of the game). The pseudo-code sketch looks like all other breadth-first searches, just with the added constraint that positions are only considered legal if they are members of the precalculated set:

..put the starting position on the open-queue and the closed-table
..while the open-queue isn't empty, and memory isn't full
....dequeue position P from the open-queue
....generate successors S of position P
....for each member C of S (for each child)
........if C is a member of the precalculated positions
...........and
...........C is a new position then
...........add it to the open-queue and the closed-table

This simplicity, however, is grossly deceiving because it doesn't account for the complexity of the data-structures underneath. They are all heavily optimized to reduce the memory footprint to a minimum, so the optimizer can handle levels and solutions as big as possible.

The first important data-structure is the box-configurations generated by the precalculation. Each box-configuration is a set of bits representing all the (significant) squares on the board, with a "1" bit set for the squares with the boxes.

When the precalculation has finished, it returns all the box configurations in a vector organized as a left-filled balanced binary tree. That makes it unnecessary to waste memory on pointers between the items in the tree. Instead, the lookup function can use this simple and efficient procedure:

..start from the root box-configuration item numbered 0
..if the searched key is less than the item numbered N then
.....compare with item numbered 2 * N + 1, etc.
..else 
.....if the searched key is bigger than item N
.........compare with item numbered 2 * N + 2, etc.
.....else (the searched item matched item number N)

For me, it was a very interesting algorithmic puzzle in its own right to produce a left-filled balanced binary tree. It's very easy to produce a binary tree, and it's not too complicated to make it a balanced tree either. But it was also necessary to make it left-filled, otherwise the sketched pointer-less lookup mechanism is impossible. I didn't know how to do that, and text-books don't mention it. Fortunately, in the end I found an article on the internet, and later a Wikipedia article.

During the search in the second stage of the method, each found position must be stored both on the open-queue for later expansion, and in the closed-table to avoid cycles.

The closed table is merely a huge bit-vector, with a bit for each possible position the search ever can run into. The reason this is possible is, of course, that we know from the precalculation exactly which - and how many - box-configurations the search is allowed to visit.

During the search, each box-configuration can appear as a game-state with the player on any of the legal player-reachable squares on the board. So the game-states which the search "sees" consist of:

  • The number of the box-configuration
  • The player position

The closed-table (implemented as the "Visited?" bit-vector) simply reserves space for all combinations of box-configurations and player-positions. That's many mebi-bytes of memory, and many of the combinations are impossible in practice, so there are a lot of useless 0-bits, but that's the price to pay for making this algorithm work at all.

Note how a game-state elegantly can be uniquely identified by the simple integer number (boxConfigurationNumber * MAX_PLAYER_POSITIONS + player position), as long as we ensure that the number of box-configurations is small enough to avoid integer overflows. With 32-bit integers, this means that the "raw" game-state only costs 4 byte of memory.

These game-states must be stored on the open-queue so they can be expanded in proper breadth-first order. Unfortunately, more information is needed both to ensure the correct expansion order, and to produce the actual path when the search has worked its way from start to finish.

To that end, YASO deviates significantly from Sebastien Gouezel's prototype implementation. The prototype did indeed save game-states as a single 4-byte integer, but the price it had to pay for this was, that it had to keep all the non-pushing player moves, as well as the pushes, during the entire search.

YASO, on the other hand, adds a parent-pointer to each game-state: A pointer back to the most recent box-push game-state. Adding a pointer doubles the size of game-state from 4 to 8 bytes, so at first, it doesn't sound like much of an improvement, but the big advantage is, that as soon as a non-pushing player move have been expanded by the search, then the move isn't necessary anymore and its memory slot can be recycled.

Later, when the search has finished, the push-parent-pointers are all that is needed to reconstruct the best found path.

Since there are practically always at least 3-4 times more moves than pushes in a game, it's fair to say that the new game-state nodes, with the push-parent pointers, save roughly 50% memory.

Furthermore, the path-reconstruction mechanism in the prototype implementation was devilishly complicated. It searched through pushes as well as moves to find out exactly where the best path was, and this path-reconstruction search was about as complicated as the main search itself.

With YASO recycling the memory for expanded moves, it was also natural to recycle memory for the last data-structure, which first appears on the scene now: Span information.

The search cannot get by with the game-states only, as they've been described up till now. More information is needed to control in which order the game-states are expanded. This is what the "span information" is there for.

A span consists of a sequence of game-states sharing the same number of moves and pushes, e.g., 10 moves and 3 pushes. The search produces a huge number of small, and even empty spans, so if they were separated physically by markers between the game-states, then it might waste a lot of memory.

Therefore, YASO stores the span information separately because - just like the non-pushing player-moves - the span information is not needed anymore as soon as the game-states in the span have been expanded.

With all these quite complex data-structures under the belt, we can finish this description with a more accurate description of what the moves/pushes and the pushes/moves optimizations really look like. What follows are the comments taken directly from my source-code. They cast more light on how complicated this really is.

---------------------------------------------
          moves/pushes optimizer algorithm
---------------------------------------------

         game-state queues and repositories:
           moves-queue:
             game-states where the final step is a move;
             after expansion, moves are not needed anymore and are recycled
             when all moves in a memory-block have been expanded;
           pushes-queue:
             game-states where the final step is a push;
             pushes are kept throughout the search because the final
             path-reconstruction needs them; all saved game-states (also moves)
             carry their most recent push-ancestor; with this information
             available, path-reconstruction is straightforward;

         span-queues:
           move-spans-queue:
             a move-span encompasses a sequence of moves on the moves-queue
             having the same number of moves and pushes;
             after expansion, move-spans are not needed anymore and are
             recycled when all move-spans in a memory-block have been expanded;
           push-spans-queue:
             a push-span encompasses a sequence of pushes on the pushes-queue
             having the same number of moves and pushes;
             after expansion, push-spans are not needed anymore and are
             recycled when all push-spans in a memory-block have been expanded;

         search algorithm:
           the search is a breadth-first search;
           expansion of nodes is governed by the spans;
           expansion of move-spans and push-spans is in lockstep, so:
             1. a move-span with moves/pushes M/P-1 is expanded first;
             2. a push-span with moves/pushes M/P   is expanded next;
           initially, the start position is enqueued as a push-span;
           how to maintain the P-1/P relationship between the move-spans and
           the push-spans is best explained by an example:

                     moves                               pushes
           move M  : P-1_________________________________P_____________________
           step 1: first the span with M/P-1 moves is expanded, producing:
           move M+1: P-1_________________________________P_____________________

           step 2: then the M/P pushes are expanded, resulting in:
           move M+1: P-1,_P______________________________P,_P+1________________
           this completes the generation of moves at depth M+1;

           step 3: at the next move-depth, the expansion begins with the
           M+1/P-1 moves, producing:
           move M+2: P-1_________________________________P_____________________

           step 4: and continues with the expansion of the M+1/P pushes:
           move M+2: P-1,_P______________________________P,_P+1________________

           step 5: then follows the expansion of the second move-span with
           M+1/P moves (see the span in step 2):
           move M+2: P-1,_P,_P___________________________P,_P+1,_P+1___________

           step 6: and finally the second push-span with M+1/P+1 pushes (see
           the span in step 2):
           move M+2: P-1,_P,_P,_P+1______________________P,_P+1,_P+1,_P+2______

           there is clearly a pattern involved here; it boils down to that
           separating the different spans can be accomplished this way:
           1. after expanding a move-span with P pushes, the push-span with P+1
              pushes is complete; see step 3 and 5 above;
           2. a move-span with P pushes begins when the push-span with P pushes
              is selected for expansion; see step 4 and 6 above;
           implementation-wise, this amounts to adding a span-separator for
           each of the span-queues when the search changes from expanding moves
           to expanding pushes;

           game-states are marked as 'visited' the first time they are seen
           by the search;
---------------------------------------------


---------------------------------------------
          pushes/moves optimizer algorithm
---------------------------------------------

         game-state queues and repositories:
           moves-queue:
             game-states where the final step is a move;
             after expansion, moves are not needed anymore and are recycled
             when all moves in a memory-block have been expanded;
           pushes-queue:
             game-states where the final step is a push;
             pushes are kept throughout the search because the final
             path-reconstruction needs them; all saved game-states (also moves)
             carry their most recent push-ancestor; with this information
             available, path-reconstruction is straightforward;

         span-queues:
           move-spans-queue:
             a move-span encompasses a sequence of moves on the moves-queue
             having the same number of moves and pushes;
             after expansion, move-spans are not needed anymore and are
             recycled when all move-spans in a memory-block have been expanded;
           push-spans-queue:
             a push-span encompasses a sequence of pushes on the pushes-queue
             having the same number of moves and pushes;
             after expansion, push-spans are not needed anymore and are
             recycled when all push-spans in a memory-block have been expanded;

         search algorithm:
           the search is a breadth-first search;
           expansion of nodes is governed by the spans;
           expansion of move-spans and push-spans is in lockstep, so:
             1. a push-span with moves/pushes M/P is expanded first;
             2. a move-span with moves/pushes M/P is expanded next;
           initially, the start position is enqueued as a push-span;

           the search proceeds as follows:

           ..while the maximum search depth (pushes) hasn't been reached do
           ....increase the push search depth from P-1 to P
           ....at this point there are only pushes on the queues; there are no
           ....non-pushing moves; the pushes are sorted into spans in ascending
           ....moves order: M/P, M+1/P, M+2/P, ... M+n/P.
           ....while there are more spans with P pushes do (*)
           ......expand next push-span, which has M/P moves and pushes
           ......expand next move-span, which has M/P moves and pushes
           ......increase the moves-depth from M to M+1

           the expansion of game-states with M/P moves and pushes generates two
           types of successors:
             1. moves with M+1/P moves/pushes; they are put on the moves-queue
                for expansion within the current push search depth; in other
                words, they are expanded inside the 'while' loop marked by
                '(*)' above;
             2. pushes with M+1/P+1 moves/pushes; they are put on the
                pushes-queue for expansion after the outer loop increases the
                push search depth from P to P+1

           a small illustration helps clarify the process;
           ......O....    O: root node
           ...../.\...
           ....A...2..    2,3: non-pushing moves
           ......./.\.
           ......3...B    A,B,C: pushes
           .......\...
           ........C..

           the inner loop "flood-fills" the moves behind the contour made by
           the pushes at the next push search depth; when all moves have been
           expanded, the only unexpanded successors are the pushes at the next
           higher push search depth; at that time, the outer loop advances to
           the next higher search depth, starting with the pushes only, like
           this:

           ....A......
           ...........
           ..........B    A,B,C: pushes at move-depth M, M+1, M+2 respectively
           ...........
           ........C..
 
           game-states resulting from a non-pushing move are marked as
           'visited' the first time they are seen by the search;
           game-states resulting from a push are first marked as 'visited' when
           they are expanded, hence, there may be duplicate game-states on the
           pushes-queue;
---------------------------------------------

Pros:

  • Vicinity search is by far the best known optimization method.
  • It's not limited to local improvements like "N-box permutations".
  • The precalculation step as well as the search step can be implemented in a parallel fashion, taking advantage of multi-core CPUs. The YASO optimizer does that and is roughly twice as fast as a single-core implementation.

Cons:

  • It's time-consuming, and the precalculation step must be performed before any improvements can be found, meaning the user just has to wait.

In the YASO implementation, the user is first informed about the search results after each iteration. This is in contrast to the optimizer in Matthias Meger's JSoko program, which also reports improvements during each iteration. That's convenient for the spectator, but since it cost a little extra computation time to detect improvements while the iteration still is in progress, it's mostly a matter of taste whether the optimizer should do it.

Personal tools