Sok format
From Sokoban Wiki
m (→Implementation of the Sok Format) |
(Updated the reference implementation description to match the new file format. The reference implemenation itself is unchanged and still supports the abandoned features.) |
||
(8 intermediate revisions not shown) | |||
Line 1: | Line 1: | ||
- | |||
__FORCETOC__ | __FORCETOC__ | ||
Line 7: | Line 6: | ||
:: Sokoban (c) by Falcon Co., Ltd., Japan :: | :: Sokoban (c) by Falcon Co., Ltd., Japan :: | ||
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
- | :: File Format 0. | + | :: File Format 0.20 :: |
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
:: :: | :: :: | ||
Line 28: | Line 27: | ||
:: File Notes :: | :: File Notes :: | ||
:: File notes consist of unstructured text and :: | :: File notes consist of unstructured text and :: | ||
- | :: key/value properties, such as "Author: Name". | + | :: key/value properties, such as "Author: Name". :: |
- | :: beginning with "::" are comments | + | :: :: |
- | :: | + | :: Lines beginning with "::" are comments intended only :: |
- | :: | + | :: for someone examining the file in a text editor and :: |
- | + | :: should not be displayed by the Sokoban program. :: | |
:: :: | :: :: | ||
:: The optional but recommended property :: | :: The optional but recommended property :: | ||
:: "Collection: Name" assigns a name to the puzzle :: | :: "Collection: Name" assigns a name to the puzzle :: | ||
- | :: collection. | + | :: collection. This helps to preserve the collection’s :: |
- | :: internet | + | :: name when copied from a source, such as the :: |
- | + | :: internet, and pasted into a Sokoban program. :: | |
- | + | ||
:: :: | :: :: | ||
:: Titles :: | :: Titles :: | ||
:: A title line is the last non-blank text line before :: | :: A title line is the last non-blank text line before :: | ||
- | :: a board, | + | :: a board, saved game, or solution, provided the line :: |
- | :: | + | :: is either preceded by a blank line or is the only :: |
- | :: text line | + | :: text line in that position in the file. :: |
:: :: | :: :: | ||
- | :: Title lines are optional unless a single or | + | :: Title lines are optional unless a single or last :: |
:: text line from a preceding puzzle, saved game, :: | :: text line from a preceding puzzle, saved game, :: | ||
- | :: solution, or file | + | :: solution, or file notes can be mistaken for a title :: |
:: line. :: | :: line. :: | ||
:: :: | :: :: | ||
:: Puzzle Notes :: | :: Puzzle Notes :: | ||
:: Two special key/value pairs are supported in puzzle :: | :: Two special key/value pairs are supported in puzzle :: | ||
- | :: notes: "Title" and "Author", hence, titles can | + | :: notes: "Title" and "Author", hence, titles can be :: |
- | :: either | + | :: specified either in a title line or as a key/value :: |
:: pair. :: | :: pair. :: | ||
:: :: | :: :: | ||
Line 68: | Line 66: | ||
:: Goal square............: . . :............Goal square :: | :: Goal square............: . . :............Goal square :: | ||
:: Floor..................: :..................Floor :: | :: Floor..................: :..................Floor :: | ||
- | :: Floor..................: - | + | :: Floor..................: - :..................Floor :: |
+ | :: Floor..................: _ :..................Floor :: | ||
:: :: | :: :: | ||
:: Remarks: :: | :: Remarks: :: | ||
:: :: | :: :: | ||
- | :: The first and | + | :: The first and last non-empty square in each row must :: |
- | :: | + | :: be a wall or a box on a goal. An empty interior row :: |
- | :: | + | :: must be written with at least one "-" or "_". :: |
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
:: :: | :: :: | ||
::::::::::::::::::::::::::: Moves :::::::::::::::::::::::::: | ::::::::::::::::::::::::::: Moves :::::::::::::::::::::::::: | ||
Line 99: | Line 88: | ||
:: Remarks: :: | :: Remarks: :: | ||
:: :: | :: :: | ||
- | :: | + | :: Jumps and pulls appear only in reverse mode saved :: |
- | + | :: games and solutions. Jumps are used to move the :: | |
- | + | :: pusher to the desired location before the reverse :: | |
- | :: | + | :: mode game begins. :: |
- | :: | + | |
- | + | ||
- | + | ||
- | :: | + | |
:: :: | :: :: | ||
- | :: Reverse mode saved games and solutions must | + | :: Reverse mode saved games and solutions must start :: |
- | :: with a jump, even if it is empty. | + | :: with a jump, even if it is empty, e.g., "[]Urrd". :: |
- | + | ||
:: :: | :: :: | ||
- | :: Pusher changes | + | :: Pusher changes apply only to puzzles with multiple :: |
- | :: pushers, | + | :: pushers, such as Multiban. Moves inside the braces :: |
- | :: | + | :: indicate the relative movement needed to switch from :: |
- | :: currently active pusher to the next active | + | :: the currently active pusher to the next active :: |
- | :: At | + | :: pusher. At the start of a game, a "{...}" sequence :: |
- | :: pusher relative to the top-left | + | :: activates the pusher relative to the top-left :: |
- | :: "{rddd}Urr{uul}uLU". If the top-left pusher is the | + | :: pusher. For example: "{rddd}Urr{uul}uLU". If the :: |
- | :: | + | :: top-left pusher is the first active pusher, the :: |
- | + | :: empty "{}" can be omitted. :: | |
:: :: | :: :: | ||
- | :: The current position is optional and defaults | + | :: The current position marker is optional and defaults :: |
- | :: position after the last move. | + | :: to the position after the last move. :: |
:: :: | :: :: | ||
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
Line 228: | Line 212: | ||
</pre> | </pre> | ||
- | = Implementation of the | + | = Implementation of the SOK Format = |
<div style="background-color:#FFFACD;padding:5px;border:1px solid #00008B;"> | <div style="background-color:#FFFACD;padding:5px;border:1px solid #00008B;"> | ||
The Sokoban puzzle file format is designed with user-friendliness in mind. Unlike stricter formats like HTML or XML, it allows puzzles to be written in a way that's easy for people to share and understand. This simplicity comes at a cost, though. While the files are plain text, the logic behind reading them can get tricky for the programmer if not handled correctly. | The Sokoban puzzle file format is designed with user-friendliness in mind. Unlike stricter formats like HTML or XML, it allows puzzles to be written in a way that's easy for people to share and understand. This simplicity comes at a cost, though. While the files are plain text, the logic behind reading them can get tricky for the programmer if not handled correctly. | ||
- | This guide will demonstrate how to create a program that effectively parses these files. To complement the guide, a fully functional and efficient reference implementation of both a file reader and writer, written in Common Lisp, is available [https://sourceforge.net/projects/sokoban-solver-statistics/ here]. The Common Lisp code can easily be converted to other programming languages, and this process can even be partially automated. | + | This guide will demonstrate how to create a program that effectively parses these files. To complement the guide, a fully functional and efficient reference implementation of both a puzzle file reader and writer, written in Common Lisp, is available [https://sourceforge.net/projects/sokoban-solver-statistics/ here]. The reference implementation supports additional features, like run-length encoding, which were once part of the file format definition but were later abandoned because they were rarely used and added implementation complexity that seemed unnecessary. The Common Lisp code can easily be converted to other programming languages, and this process can even be partially automated. |
===Main Procedure=== | ===Main Procedure=== | ||
- | '''Key Observation | + | '''Key Observation''' |
One crucial concept to remember is that the program will encounter notes until it stumbles upon a puzzle board, a snapshot (an unsolved state of the puzzle), or a solution (a snapshot representing a completed puzzle). This observation forms the core of our program's structure, as reflected in the following pseudocode: | One crucial concept to remember is that the program will encounter notes until it stumbles upon a puzzle board, a snapshot (an unsolved state of the puzzle), or a solution (a snapshot representing a completed puzzle). This observation forms the core of our program's structure, as reflected in the following pseudocode: | ||
Line 246: | Line 230: | ||
... (more to come) | ... (more to come) | ||
- | '''Prioritizing Notes | + | '''Prioritizing Notes''' |
At first glance, focusing on notes might seem strange since puzzle boards are the heart of the file. However, this approach has a significant advantage. Not only does the file begin with notes, but each puzzle and snapshot can have their own set of notes as well. This means that whenever we encounter a puzzle board or a snapshot, we need to grab any associated notes before moving on. | At first glance, focusing on notes might seem strange since puzzle boards are the heart of the file. However, this approach has a significant advantage. Not only does the file begin with notes, but each puzzle and snapshot can have their own set of notes as well. This means that whenever we encounter a puzzle board or a snapshot, we need to grab any associated notes before moving on. | ||
Line 265: | Line 249: | ||
snapshot.notes := readNotes() | snapshot.notes := readNotes() | ||
- | '''"Orphaned" Snapshots | + | '''"Orphaned" Snapshots''' |
- | The reason we handle "orphaned snapshots" is because it's sometimes useful to load snapshots and solutions separately for further processing. A common scenario | + | The reason we handle "orphaned snapshots" is because it's sometimes useful to load snapshots and solutions separately for further processing. A common scenario is loading a file containing solutions (perhaps from the clipboard) to pair them with existing puzzles. |
- | '''Recurring Pattern | + | '''Recurring Pattern''' |
It's worth noting the repeated use of the "readNotes()" function right before the "while" loop restarts. We'll see this pattern again as we complete the main function by adding the logic to load snapshots associated with a specific puzzle. | It's worth noting the repeated use of the "readNotes()" function right before the "while" loop restarts. We'll see this pattern again as we complete the main function by adding the logic to load snapshots associated with a specific puzzle. | ||
Line 293: | Line 277: | ||
post-processing, e.g., make titles unique and resistant to misinterpretation | post-processing, e.g., make titles unique and resistant to misinterpretation | ||
- | '''Wrapping Up | + | '''Wrapping Up''' |
With these additions, our main function is complete! It's both straightforward and robust. We've also introduced the concept of optional titles for puzzles and snapshots. These titles are essentially header lines from the text file, and the "readNotes()" function is responsible for differentiating them from regular notes belonging to the preceding element. | With these additions, our main function is complete! It's both straightforward and robust. We've also introduced the concept of optional titles for puzzles and snapshots. These titles are essentially header lines from the text file, and the "readNotes()" function is responsible for differentiating them from regular notes belonging to the preceding element. | ||
Line 314: | Line 298: | ||
As you can see, the function accumulates text lines until it encounters one of three conditions: a puzzle board, a snapshot, or the end of the file. | As you can see, the function accumulates text lines until it encounters one of three conditions: a puzzle board, a snapshot, or the end of the file. | ||
- | '''Title Line Detection | + | '''Title Line Detection''' |
The slightly tricky part for "readNotes()" is identifying a potential title line preceding the next puzzle board or snapshot. Here's the key: we discard trailing blank lines, but at first, we need to keep initial blank lines. This distinction plays a role in determining whether a line is interpreted as a title or simply part of the notes. | The slightly tricky part for "readNotes()" is identifying a potential title line preceding the next puzzle board or snapshot. Here's the key: we discard trailing blank lines, but at first, we need to keep initial blank lines. This distinction plays a role in determining whether a line is interpreted as a title or simply part of the notes. | ||
Line 331: | Line 315: | ||
In the second scenario, the blank line separates the notes from the title line. | In the second scenario, the blank line separates the notes from the title line. | ||
- | '''Complete Implementation | + | '''Complete Implementation''' |
The complete implementation of "readNotes()" is shown below. The description includes all the necessary details, allowing for a straightforward translation into your preferred programming language. | The complete implementation of "readNotes()" is shown below. The description includes all the necessary details, allowing for a straightforward translation into your preferred programming language. | ||
Line 345: | Line 329: | ||
trim notes text lines by removing: | trim notes text lines by removing: | ||
- | |||
- trailing spaces, tabs, and other control characters | - trailing spaces, tabs, and other control characters | ||
Line 363: | Line 346: | ||
===Board Text Lines=== | ===Board Text Lines=== | ||
- | + | To distinguish puzzle boards from notes, snapshots, and solutions, the program checks for specific characteristics: | |
- | + | ||
- | + | ||
- | + | ||
* '''Structure:''' The first non-empty column in each row must contain a wall or a box on a goal. This ensures a minimum level of structure, making puzzle boards easier to locate, especially in large files. | * '''Structure:''' The first non-empty column in each row must contain a wall or a box on a goal. This ensures a minimum level of structure, making puzzle boards easier to locate, especially in large files. | ||
Line 379: | Line 359: | ||
It's important to note that this validation focuses solely on the structure of the board. Whether the puzzle is well-formed (e.g., has a player, matching boxes and goals) is only checked when a user attempts to solve it. | It's important to note that this validation focuses solely on the structure of the board. Whether the puzzle is well-formed (e.g., has a player, matching boxes and goals) is only checked when a user attempts to solve it. | ||
- | '''Function Implementation | + | '''Function Implementation''' |
The "boardLines?()" function, despite its name, actually returns the identified puzzle rows rather than a simple true/false value. Here's the pseudocode: | The "boardLines?()" function, despite its name, actually returns the identified puzzle rows rather than a simple true/false value. Here's the pseudocode: | ||
function boardLines?() | function boardLines?() | ||
- | if the current text line contains board | + | if the current text line contains a board row and |
- | + | this board row contains at least one non-empty square then | |
- | collect text lines until | + | collect text lines until encountering: |
- | + | 1. a text line that doesn't contain a board row | |
- | while the last | + | 2. the end of the file |
- | discard | + | |
+ | while the last found board row is empty | ||
+ | discard the last board row | ||
put the last collected text line back on the queue (treat it as a notes text line) | put the last collected text line back on the queue (treat it as a notes text line) | ||
Line 399: | Line 381: | ||
===Snapshot Moves Text Lines=== | ===Snapshot Moves Text Lines=== | ||
- | + | '''Function Implementation''' | |
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | '''Function Implementation | + | |
The "snapshotLines?()" function, similar to "boardLines?()", returns the actual snapshot lines rather than a simple true/false value. Here's the pseudocode: | The "snapshotLines?()" function, similar to "boardLines?()", returns the actual snapshot lines rather than a simple true/false value. Here's the pseudocode: | ||
Line 413: | Line 390: | ||
2. the end of the file | 2. the end of the file | ||
- | We're nearing the completion of these functions! We've defined the "plural" functions | + | We're nearing the completion of these functions! We've defined the "plural" functions "boardLines?()" and "snapshotLines?()" that handle collections of lines. Now, let's explore their corresponding "singular" counterparts, which examine a single text line at a time. |
===Board Text Line=== | ===Board Text Line=== | ||
- | + | '''Function Implementation''' | |
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | '''Function Implementation | + | |
Here's the pseudocode for "boardLine?()": | Here's the pseudocode for "boardLine?()": | ||
function boardLine?( text ) | function boardLine?( text ) | ||
- | |||
if the text contains only legal board characters then | if the text contains only legal board characters then | ||
- | + | return the right-trimmed text as one board row | |
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | '''Explanation | + | '''Explanation''' |
- | + | Since the space character is a valid board symbol, there's no need to right-trim the text line before validation. However, if the text line represents a valid board row, the result is returned with trailing spaces removed. Left-trimming is avoided to preserve the row indentation. | |
===Snapshot Moves Text Line=== | ===Snapshot Moves Text Line=== | ||
- | + | '''Function Implementation''' | |
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | '''Function Implementation | + | |
Here's the pseudocode for "snapshotLine?()": | Here's the pseudocode for "snapshotLine?()": | ||
function snapshotLine?( text ) | function snapshotLine?( text ) | ||
- | + | if the text contains only legal snapshot text line characters and | |
- | if the text contains only legal snapshot characters and | + | |
the text contains at least one move (a direction character) then | the text contains at least one move (a direction character) then | ||
- | return | + | return the trimmed text as snapshot moves |
- | + | ||
- | + | ||
- | '''Explanation | + | '''Explanation''' |
- | The function verifies | + | The function verifies whether the line represents a valid sequence of snapshot moves. If valid, the trimmed text is returned as snapshot moves. Unlike a board text line, both left-trimming and right-trimming are permitted for a snapshot text line. |
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
===Utility Functions=== | ===Utility Functions=== | ||
Line 478: | Line 425: | ||
Here's a final noteworthy utility function: | Here's a final noteworthy utility function: | ||
- | '''make-interpretation-resistant-title(title)''': This function addresses a potential issue where a title line preceding a puzzle or snapshot could be mistakenly interpreted as part of a puzzle or snapshot itself. For example, titles like "Dull" or "Rud" might cause confusion. | + | '''make-interpretation-resistant-title( title )''': This function addresses a potential issue where a title line preceding a puzzle or snapshot could be mistakenly interpreted as part of a puzzle or snapshot itself. For example, titles like "Dull" or "Rud" might cause confusion. |
The function ensures clarity by enclosing such titles in quotation marks. Here's the pseudocode: | The function ensures clarity by enclosing such titles in quotation marks. Here's the pseudocode: | ||
Line 490: | Line 437: | ||
===Conclusion=== | ===Conclusion=== | ||
- | + | This concludes our guide! As mentioned in the introduction, an accompanying [https://sourceforge.net/projects/sokoban-solver-statistics/ reference implementation] with additional features is available, written in Common Lisp, which can easily be adapted to other programming languages. | |
+ | |||
</div> | </div> | ||
Line 507: | Line 455: | ||
format. To do this, SokoSave Mobile implements heuristics based directly | format. To do this, SokoSave Mobile implements heuristics based directly | ||
on the SokoSplit utility (with a few small bug fixes): | on the SokoSplit utility (with a few small bug fixes): | ||
- | + | https://sokosave.org/sokosavedesktop/sokosplit/ | |
Here is a brief description of the heuristic. For each puzzle, perform | Here is a brief description of the heuristic. For each puzzle, perform |
Current revision as of 09:47, 26 September 2024
Contents |
Header added to the level files by the program "Sokoban YASC"
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: Sokoban (c) by Falcon Co., Ltd., Japan :: :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: File Format 0.20 :: :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: :: :: File Notes Optional :: :: Puzzle 1 Required :: :: Title Optional* :: :: Board See legend :: :: Puzzle Notes Optional :: :: Saved Game or Solution 1 Optional :: :: Title Optional* :: :: Moves See legend :: :: Notes Optional :: :: Saved Game or Solution 2 Optional :: :: ... (more saved games and solutions) :: :: Puzzle 2 Optional :: :: ... (more puzzles) :: :: :: :: Remarks: :: :: :: :: File Notes :: :: File notes consist of unstructured text and :: :: key/value properties, such as "Author: Name". :: :: :: :: Lines beginning with "::" are comments intended only :: :: for someone examining the file in a text editor and :: :: should not be displayed by the Sokoban program. :: :: :: :: The optional but recommended property :: :: "Collection: Name" assigns a name to the puzzle :: :: collection. This helps to preserve the collection’s :: :: name when copied from a source, such as the :: :: internet, and pasted into a Sokoban program. :: :: :: :: Titles :: :: A title line is the last non-blank text line before :: :: a board, saved game, or solution, provided the line :: :: is either preceded by a blank line or is the only :: :: text line in that position in the file. :: :: :: :: Title lines are optional unless a single or last :: :: text line from a preceding puzzle, saved game, :: :: solution, or file notes can be mistaken for a title :: :: line. :: :: :: :: Puzzle Notes :: :: Two special key/value pairs are supported in puzzle :: :: notes: "Title" and "Author", hence, titles can be :: :: specified either in a title line or as a key/value :: :: pair. :: :: :: ::::::::::::::::::::::::::: Board :::::::::::::::::::::::::: :: Legend.................: :.................Legend :: :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: Wall...................: # # :...................Wall :: :: Pusher.................: p @ :.................Pusher :: :: Pusher on goal square..: P + :..Pusher on goal square :: :: Box....................: b $ :....................Box :: :: Box on goal square.....: B * :.....Box on goal square :: :: Goal square............: . . :............Goal square :: :: Floor..................: :..................Floor :: :: Floor..................: - :..................Floor :: :: Floor..................: _ :..................Floor :: :: :: :: Remarks: :: :: :: :: The first and last non-empty square in each row must :: :: be a wall or a box on a goal. An empty interior row :: :: must be written with at least one "-" or "_". :: :: :: ::::::::::::::::::::::::::: Moves :::::::::::::::::::::::::: :: Legend.................: :.................Legend :: :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: Move pusher up.........: u U :.......Push/pull box up :: :: Move pusher down.......: d D :.....Push/pull box down :: :: Move pusher left.......: l L :.....Push/pull box left :: :: Move pusher right......: r R :....Push/pull box right :: :: Begin jump.............: [ ] :...............End jump :: :: Begin pusher change....: { } :......End pusher change :: :: Current position.......: * * :.......Current position :: :: :: :: Remarks: :: :: :: :: Jumps and pulls appear only in reverse mode saved :: :: games and solutions. Jumps are used to move the :: :: pusher to the desired location before the reverse :: :: mode game begins. :: :: :: :: Reverse mode saved games and solutions must start :: :: with a jump, even if it is empty, e.g., "[]Urrd". :: :: :: :: Pusher changes apply only to puzzles with multiple :: :: pushers, such as Multiban. Moves inside the braces :: :: indicate the relative movement needed to switch from :: :: the currently active pusher to the next active :: :: pusher. At the start of a game, a "{...}" sequence :: :: activates the pusher relative to the top-left :: :: pusher. For example: "{rddd}Urr{uul}uLU". If the :: :: top-left pusher is the first active pusher, the :: :: empty "{}" can be omitted. :: :: :: :: The current position marker is optional and defaults :: :: to the position after the last move. :: :: :: :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: An example file: ---------------------------------------------------------------------- Collection: YASGen Author: YASGen & Brian Damgaard Copyright (c) 2003 by Brian Damgaard These levels may be freely distributed provided they are credited with the author's name. Chaos ##### ###p .# # b #.# # bb # #. # # # b.# ####### Solution/Moves dDuurrddLLrrddLLUlluuRDRddrruuLLrruullDlldddRRuULrrruullDldRddlUruuurr ddddLLuuRlddrruUUdlldlldRRuuulDrddlluRurrrddLLUluRRlddrruUlldlldRRRllu uulD ---------------------------------------------------------------------- More level examples: Demo Level 01 ######## # # #@ $ # # $ # # . . # ######## This demo level is trivial, but it suffices to demonstrate the file-format. A Sokoban program can add key/value pairs within the notes, such as: Author: nn Website: http://www.nn.net Solution/Moves rurrdDuLulDD The title of a snapshot does not bear any special meaning. This snapshot just happens to be the best solution found (so far). The demo-program automatically saves best solutions. Snapshot 7/0 urrrrdd This is an example of a snapshot saved by the user. Later he/she can continue work on this path. Snapshot 9/2 urrrrddLL*ruulDD Another snapshot saved by the user. The '*' indicates current position, i.e., the last moves were taken back, but are still available for the "redo" function. Demo Level 02 ######## # .# #@ $ # # $ # # . * # ######## Just another trivial demo level. The number of levels in a file is limited by available memory only. Solution/Moves rRRRdrUdlLLulD Reverse Mode Snapshot 13/6 [rrrd]UUrLLLdrD This is an example of a reverse mode snapshot. Reverse mode snapshots always have a leading jump-sequence, even if it is empty. An example: "[]Ulld...". Demo Level 03 ######## # # # $ # # $ # # + . # ######## Solution/Moves rrruulLrDuullDD
Implementation of the SOK Format
The Sokoban puzzle file format is designed with user-friendliness in mind. Unlike stricter formats like HTML or XML, it allows puzzles to be written in a way that's easy for people to share and understand. This simplicity comes at a cost, though. While the files are plain text, the logic behind reading them can get tricky for the programmer if not handled correctly.
This guide will demonstrate how to create a program that effectively parses these files. To complement the guide, a fully functional and efficient reference implementation of both a puzzle file reader and writer, written in Common Lisp, is available here. The reference implementation supports additional features, like run-length encoding, which were once part of the file format definition but were later abandoned because they were rarely used and added implementation complexity that seemed unnecessary. The Common Lisp code can easily be converted to other programming languages, and this process can even be partially automated.
Main Procedure
Key Observation
One crucial concept to remember is that the program will encounter notes until it stumbles upon a puzzle board, a snapshot (an unsolved state of the puzzle), or a solution (a snapshot representing a completed puzzle). This observation forms the core of our program's structure, as reflected in the following pseudocode:
function readPuzzleFile() file.notes := readNotes() while found-puzzle-board? or found-snapshot? ... (more to come)
Prioritizing Notes
At first glance, focusing on notes might seem strange since puzzle boards are the heart of the file. However, this approach has a significant advantage. Not only does the file begin with notes, but each puzzle and snapshot can have their own set of notes as well. This means that whenever we encounter a puzzle board or a snapshot, we need to grab any associated notes before moving on.
function readPuzzleFile() file.notes := readNotes() while found-puzzle-board? or found-snapshot? if found-puzzle-board? then puzzle := makePuzzle(puzzle-board) file.puzzles += puzzle puzzle.notes := readNotes() ... (more to come) else (found an orphaned snapshot) puzzle := makePuzzle(no board) file.puzzles += puzzle puzzle.snapshots += snapshot snapshot.notes := readNotes()
"Orphaned" Snapshots
The reason we handle "orphaned snapshots" is because it's sometimes useful to load snapshots and solutions separately for further processing. A common scenario is loading a file containing solutions (perhaps from the clipboard) to pair them with existing puzzles.
Recurring Pattern
It's worth noting the repeated use of the "readNotes()" function right before the "while" loop restarts. We'll see this pattern again as we complete the main function by adding the logic to load snapshots associated with a specific puzzle.
function readPuzzleFile() file.notes := readNotes() while found-puzzle-board? or found-snapshot? if found-puzzle-board? then puzzle := makePuzzle(puzzle-board, optional-title) file.puzzles += puzzle puzzle.notes := readNotes() while found-snapshot? puzzle.snapshots += snapshot snapshot.title := optional-title snapshot.notes := readNotes() else (found an orphaned snapshot) puzzle := makePuzzle(no board) file.puzzles += puzzle puzzle.snapshots += snapshot snapshot.title := optional-title snapshot.notes := readNotes() post-processing, e.g., make titles unique and resistant to misinterpretation
Wrapping Up
With these additions, our main function is complete! It's both straightforward and robust. We've also introduced the concept of optional titles for puzzles and snapshots. These titles are essentially header lines from the text file, and the "readNotes()" function is responsible for differentiating them from regular notes belonging to the preceding element.
As you can see, the "readNotes()" function carries a significant responsibility. However, this modular approach keeps the code well-organized and easy to manage.
Notes Text Lines
Now, let's take a closer look at the "readNotes()" function.
function readNotes() ... (initialize return values) collect notes text lines until encountering: 1. a puzzle board 2. a snapshot 3. the end of the file ... (more to come)
As you can see, the function accumulates text lines until it encounters one of three conditions: a puzzle board, a snapshot, or the end of the file.
Title Line Detection
The slightly tricky part for "readNotes()" is identifying a potential title line preceding the next puzzle board or snapshot. Here's the key: we discard trailing blank lines, but at first, we need to keep initial blank lines. This distinction plays a role in determining whether a line is interpreted as a title or simply part of the notes.
Imagine these two scenarios:
- Scenario 1 (No Title Line):
I am a notes text line I am a notes text line too
- Scenario 2 (With Title Line):
I am a notes text line [blank line] I am the title of the next puzzle or snapshot, if any
In the second scenario, the blank line separates the notes from the title line.
Complete Implementation
The complete implementation of "readNotes()" is shown below. The description includes all the necessary details, allowing for a straightforward translation into your preferred programming language.
function readNotes() initialize return values: notes, puzzle-board, snapshot, run-length-encoded-snapshot?, and optional-title collect notes text lines until encountering: 1. a puzzle board 2. a snapshot 3. the end of the file trim notes text lines by removing: - trailing spaces, tabs, and other control characters discard trailing blank lines (empty lines at the end) if a puzzle board or snapshot is found: if there are notes lines: and (only one line exists in the notes or the second-to-last line in the notes is blank) then optional-title := the last line in notes discard the last line in notes discard trailing blank lines discard leading blank lines (empty lines at the beginning)
Board Text Lines
To distinguish puzzle boards from notes, snapshots, and solutions, the program checks for specific characteristics:
- Structure: The first non-empty column in each row must contain a wall or a box on a goal. This ensures a minimum level of structure, making puzzle boards easier to locate, especially in large files.
- Minimum size: The board must have at least three rows and three columns.
- Empty rows: Empty rows are used for decoration and should only appear within the board, not at the beginning or end.
Here's the process the program follows:
- Tentative collection: When encountering a line that might contain puzzle rows, the program initially collects that line along with all subsequent lines that also seem like puzzle rows.
- Validation: The program then analyzes the collected lines to determine if they actually form a valid puzzle board. If not, or if the "tail" ends with an empty row (which is invalid), the program puts as many of the collected lines back on the queue as necessary, so they will be treated as notes.
It's important to note that this validation focuses solely on the structure of the board. Whether the puzzle is well-formed (e.g., has a player, matching boxes and goals) is only checked when a user attempts to solve it.
Function Implementation
The "boardLines?()" function, despite its name, actually returns the identified puzzle rows rather than a simple true/false value. Here's the pseudocode:
function boardLines?() if the current text line contains a board row and this board row contains at least one non-empty square then collect text lines until encountering: 1. a text line that doesn't contain a board row 2. the end of the file while the last found board row is empty discard the last board row put the last collected text line back on the queue (treat it as a notes text line) if the collected board rows live up to the minimum size criteria then return the collected board rows, left-justified else put all collected text lines back on the queue (treat them as notes)
Snapshot Moves Text Lines
Function Implementation
The "snapshotLines?()" function, similar to "boardLines?()", returns the actual snapshot lines rather than a simple true/false value. Here's the pseudocode:
function snapshotLines?() collect text lines with moves until encountering: 1. a line that doesn't contain moves 2. the end of the file
We're nearing the completion of these functions! We've defined the "plural" functions "boardLines?()" and "snapshotLines?()" that handle collections of lines. Now, let's explore their corresponding "singular" counterparts, which examine a single text line at a time.
Board Text Line
Function Implementation
Here's the pseudocode for "boardLine?()":
function boardLine?( text ) if the text contains only legal board characters then return the right-trimmed text as one board row
Explanation
Since the space character is a valid board symbol, there's no need to right-trim the text line before validation. However, if the text line represents a valid board row, the result is returned with trailing spaces removed. Left-trimming is avoided to preserve the row indentation.
Snapshot Moves Text Line
Function Implementation
Here's the pseudocode for "snapshotLine?()":
function snapshotLine?( text ) if the text contains only legal snapshot text line characters and the text contains at least one move (a direction character) then return the trimmed text as snapshot moves
Explanation
The function verifies whether the line represents a valid sequence of snapshot moves. If valid, the trimmed text is returned as snapshot moves. Unlike a board text line, both left-trimming and right-trimming are permitted for a snapshot text line.
Utility Functions
Here's a final noteworthy utility function:
make-interpretation-resistant-title( title ): This function addresses a potential issue where a title line preceding a puzzle or snapshot could be mistakenly interpreted as part of a puzzle or snapshot itself. For example, titles like "Dull" or "Rud" might cause confusion.
The function ensures clarity by enclosing such titles in quotation marks. Here's the pseudocode:
function make-interpretation-resistant-title( title ) if boardLine?( title ) or snapshotLine?( title ) then return the title enclosed by quotation marks else return the title
Conclusion
This concludes our guide! As mentioned in the introduction, an accompanying reference implementation with additional features is available, written in Common Lisp, which can easily be adapted to other programming languages.
Note: Some programs like Sokoban YASC resctrict the level titles to valid Windows file names. This means characters like these ": \ * ? [ ] ; < > | / " are ignored in the level title. This however it not part of the sok-file format.
Note from Eric Sunshine on the Yahoo Sokoban group:
If you would like to support other cases, such as when much or all of the meta-data appears before the puzzle, then you likely will need to implement a more complex heuristic for determining which non-puzzle data belongs to each puzzle. SokoSave Mobile takes this approach, trying very hard to intuit which information belongs with which puzzle, since many older collections are formatted in ways not compatible with the .sok format. To do this, SokoSave Mobile implements heuristics based directly on the SokoSplit utility (with a few small bug fixes): https://sokosave.org/sokosavedesktop/sokosplit/
Here is a brief description of the heuristic. For each puzzle, perform the following steps in order:
1. If there is a blank line immediately before the puzzle, assign it to the puzzle.
2. Assign all following unassigned non-blank lines to the puzzle.
3. Assign all preceding unassigned non-blank lines to the puzzle. These lines precede the blank line (if present) assigned to the puzzle in step 1.
4. Assign all following unassigned lines (blank or not) to the puzzle.
5. Optional: Clean up by trimming leading and trailing blank lines from the collected meta-data. (Internal blank lines are retained.)
This heuristic works correctly with all of the old puzzle collections I have sitting around which were downloaded years ago, as well as with modern collections available for download. The heuristic also is a superset of the YASC .sok parsing, so it works properly with those collections, as well.