I have Scheme code to generate random mazes using an algorithm based on Tarjan's disjoint-set data-structure with amortised union-find operations. It runs on top of Scheme 48. However, it is pretty simple code, and should be easy to port to other systems.

The program has two back-ends: one uses the Functional PostScript package to render to PostScript, the other prints out mazes with a text representation. The maze is generated on a hexagonal tiling of a square area, just for fun.

You can see a large sample maze, it's solution, a smaller maze, or a maze for the truly persistent, all in PostScript format.

By the way: one of the students here,
Robert Givan,
*has* solved the
maze for the truly persistent
by hand.
So it can be done.
If you are as crazy as Robert is.

The problem with this algorithm is that the initial paths will be long corridors snaking through the maze. As a result, the later branches off of these corridors will be constrained to be short digressions. So the true branch factor is not high; once you get onto one of the long "highways," it's hard to get lost.

A better algorithm uses the fast union/find disjoint-set data-structure.
Again, we start out by tiling the maze area,
then we randomly knock down walls in the maze until all the cells are
connected.
We never knock down a wall between two cells *c1* and *c2*
if there is already a path from *c1* to *c2*,
so that when we are done, there is only one path between any two cells.

The algorithm maintains sets of cells representing connected components
in the maze.
That is, cells *c1* and *c2* are in set S iff there is a
path from *c1* to *c2*.
When we knock down a wall between two cells, we union their sets together.

The algorithm starts by putting each cell in a singleton set, so there
are as many sets as there are cells.
Then it builds a vector containing all the walls in the maze;
this vector is then randomly permuted.
The inner loop of the maze builder scans the wall vector;
for each wall w, it considers the wall's neighboring cells,
*c1* and *c2*.
If *c1* and *c2* are in the same set, there is already a path
between the two cells, so the wall is left in place.
Otherwise, the wall is knocked down (which joins the two regions of the
maze), and the two sets are unioned together.
If the resulting set contains all the cells in the maze, we are done.

After digging the maze, the program finds the path length between every pair of top-row and bottom-row cells. The pair with the longest path between them are chosen as the entry and exit cells, and are opened up accordingly. So we choose the most difficult entry and exit cell we can, given the maze. This is done by using DFS to build a tree for the maze, then re-rooting the tree at each top-row cell, and searching backwards from each bottom-row cell, counting path length.

Finally, the maze is printed out using one of the back-ends.

The Pentium box was thrashing the disk the whole time it ran; wall time was much, much greater than 102 CPU minutes, and the entire computer was essentially unusable for any other task while the maze was being built. For starters, an algorithm that traverses a data-structure in a random order is, by design, going to exhibit zero locality. Secondly, Scheme 48's garbage collector is not generational, and hence copies the megabytes of maze data-structure back and forth on every GC. I could probably have helped the GC issue by dumping out a custom heap image containing only the maze code --- flushing the rest of the runtime (compiler, repl loop, and so forth and so forth) --- thus minimising the heap data.

It is intriguing to consider the color PostScript printer we have
over at the Media Lab.
It prints on rolls of paper three feet wide;
images can be as long as you like.
Then you could *really* get lost.

Olin Shivers / shivers at ccs dot neu dot edu