**- 2010 -**

D. Kunkle. **Roomy: A System for Space Limited
Computations** (invited tutorial), in *Proceedings of the International
Workshop on Parallel Symbolic Computation* (PASCO '10), Grenoble, France,
22-25, 2010.

[Publisher]
[PDF]
[Abstract]

There are numerous examples of problems in symbolic algebra in which the
required storage grows far beyond the limitations even of the distributed RAM
of a cluster. Often this limitation determines how large a problem one can
solve in practice. Roomy provides a minimally invasive system to modify the
code for such a computation, in order to use the local disks of a cluster or a
SAN as a transparent extension of RAM.

Roomy is implemented as a C/C++ library. It provides some simple data
structures (arrays, unordered lists, and hash tables). Some typical
programming constructs that one might employ in Roomy are: map, reduce,
duplicate elimination, chain reduction, pair reduction, and breadth-first
search. All aspects of parallelism and remote I/O are hidden within the Roomy
library.

D. Kunkle, V. Slavici and G. Cooperman. **Parallel
Disk-Based Computation for Large, Monolithic Binary Decision Diagrams**, in
*Proceedings of the International Workshop on Parallel Symbolic Computation*
(PASCO '10), Grenoble, France, 63-72, 2010.

[Publisher]
[PDF]
[Abstract]

Binary Decision Diagrams (BDDs) are widely used in formal verification.
They are also widely known for consuming large amounts of memory.
For larger problems, a BDD computation will often start thrashing due
to lack of memory within minutes. This work uses the parallel disks
of a cluster or a SAN (storage area network) as an extension of RAM,
in order to efficiently compute with BDDs that are orders of magnitude
larger than what is available on a typical computer. The use of parallel
disks overcomes the bandwidth problem of single disk methods, since the
bandwidth of 50 disks is similar to the bandwidth of a *single* RAM
subsystem. In order to overcome the latency issues of disk, the Roomy
library is used for the sake of its latency-tolerant data structures.
A breadth-first algorithm is implemented. A further advantage of the
algorithm is that RAM usage can be very modest, since its largest use is
as buffers for open files. The success of the method is demonstrated
by solving the 16-queens problem, and by solving a more unusual
problem — counting the number of tie games in a three-dimensional
4x4x4 tic-tac-toe board.

V. Slavici, X. Dong, D. Kunkle and G. Cooperman. **Fast
Multiplication of Large Permutations for Disk, Flash Memory and RAM**, in
*Proceedings of the International Symposium on Symbolic and Algebraic
Computation* (ISSAC '10), Munich, Germany, 355-362, 2010.

[Publisher]
[PDF]
[Abstract]

Permutation multiplication (or permutation composition) is perhaps the simplest
of all algorithms in computer science. Yet for large permutations, the
standard algorithm is not the fastest for disk or for flash, and surprisingly,
it is not even the fastest algorithm for RAM on recent multi-core CPUs. On a
recent commodity eight-core machine we demonstrate a novel algorithm that is
50% faster than the traditional algorithm, along with strong evidence that it
will be up to twice as fast on future many-core computers. For larger
permutations on flash or disk, the novel algorithm is orders of magnitude
faster. A disk-parallel algorithm is demonstrated that can multiply two
permutations with 12.8 billion points using 16 parallel local disks of a
cluster in under one hour. Such large permutations are important in
computational group theory, where they arise as the result of the well-known
Todd-Coxeter coset enumeration algorithm. The novel algorithm emphasizes
several passes of streaming access to the data instead of the traditional
single pass using random access to the data. Similar novel algorithms are
presented for permutation inverse and permutation multiplication by an inverse,
thus providing a complete library of the underlying permutation operations
needed for computations with permutation groups.

**- 2009 -**

D. Kunkle and G. Cooperman.
**Biased Tadpoles: a Fast Algorithm for Centralizers in Large Matrix Groups**,
in *Proceedings of the International Symposium on
Symbolic and Algebraic Computation* (ISSAC '09), ACM Press, 223--230, 2009.

[Publisher]
[PDF]
[Abstract]

Centralizers are an important tool in in computational group theory. Yet for
large matrix groups, they tend to be slow. We demonstrate a
*O*(√|*G*|(1/logε)) black box randomized algorithm that
produces a centralizer using space logarithmic in the order of the centralizer,
even for typical matrix groups of order 10^{20} . An optimized version
of this algorithm (larger space and no longer black box) typically runs in
seconds for groups of order 10^{15} and minutes for groups of order
10^{20}. Further, the algorithm trivially parallelizes, and so linear
speedup is achieved in an experiment on a computer with four CPU cores. The
novelty lies in the use of a biased tadpole, which delivers an order of
magnitude speedup as compared to the classical tadpole algorithm. The biased
tadpole also allows a test for membership in a conjugacy class in a fraction of
a second. Finally, the same methodology quickly finds the order of a matrix
group via a vector stabilizer. This allows one to eliminate the already small
possibility of error in the randomized centralizer algorithm.

D. Kunkle and G. Cooperman. **Harnessing Parallel Disks to
Solve Rubik's Cube**, *Journal of Symbolic Computation*, Volume 44, Issue 7,
872--890, July, 2009.

[Publisher]
[PDF]
[Abstract]

The number of moves required to solve any configuration of Rubik's cube has
held a fascination for over 25 years. A new upper bound of 26 is produced. More
important, a new methodology is described for finding upper bounds. The novelty
is two-fold. First, parallel disks are employed. This allows
1.4x10^{12} states representing symmetrized cosets to be enumerated in
seven terabytes. Second, a faster table-based multiplication is described for
symmetrized cosets that attempts to keep most tables in the CPU cache. This
enables the product of a symmetrized coset by a generator at a rate of 10
million moves per second.

**- 2008 -**

D. Kunkle and J. Schindler. **A Load Balancing Framework
for Clustered Storage Systems**, in *Proceedings of the IEEE International
Conference on High Performance Computing* (HiPC '08), Bangalore, 57--72, 2008.

[Publisher]
[PDF]
[Abstract]

The load balancing framework for high-performance clustered storage systems
presented in this paper provides a general method for reconfiguring a system
facing dynamic workload changes. It simultaneously balances load and minimizes
the cost of reconfiguration. It can be used for automatic reconfiguration or to
present an administrator with a range of (near) optimal reconfiguration
options, allowing a tradeoff between load distribution and reconfiguration
cost. The framework supports a wide range of measures for load imbalance and
reconfiguration cost, as well as several optimization techniques. The
effectiveness of this framework is demonstrated by balancing the workload on a
NetApp Data ONTAP GX system, a commercial scale-out clustered NFS server
implementation. The evaluation scenario considers consolidating two real world
systems, with hundreds of users each: a six-node clustered storage system
supporting engineering workloads and a legacy system supporting three email
severs.

D. Kunkle and G. Cooperman. **Solving Rubik's Cube: Disk is
the New RAM**, *Communications of the ACM* (CACM), New York, Volume 51,
Issue 4, 31--33, April 2008.

[Publisher]
[PDF]
[Abstract]

Substituting disk for RAM, disk-based computation is a way to increase working
memory and achieve results that are not otherwise economical.

D. Kunkle, D. Zhang, and G. Cooperman. **On Mining Frequent
Generalized Itemsets and Essential Generalized Association Rules without
Redundancy**, *Journal of Computer Science and Technology* (JCST),
P. R. China, 23(1): 77--102, Jan. 2008.

[Publisher]
[PDF]
[Abstract]

This paper presents some new algorithms to efficiently mine max frequent
generalized itemsets (g-itemsets) and essential generalized association rules
(g-rules). These are compact and general representations for all frequent
patterns and all strong association rules in the generalized environment. Our
results fill an important gap among algorithms for frequent patterns and
association rules by combining two concepts. First, generalized itemsets employ
a taxonomy of items, rather than a flat list of items. This produces more
natural frequent itemsets and associations such as (meat, milk) instead of
(beef, milk), (chicken, milk), etc. Second, compact representations of frequent
itemsets and strong rules, whose result size is exponentially smaller, can
solve a standard dilemma in mining patterns: with small threshold values for
support and confidence, the user is overwhelmed by the extraordinary number of
identified patterns and associations; but with large threshold values, some
interesting patterns and associations fail to be identified.

Our algorithms can also expand those max frequent g-itemsets and essential
g-rules into the much larger set of ordinary frequent g-itemsets and strong
g-rules. While that expansion is not recommended in most practical cases, we do
so in order to present a comparison with existing algorithms that only handle
ordinary frequent g-itemsets. In this case, the new algorithm is shown to be
thousands, and in some cases millions, of the time faster than previous
algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies,
with the depths of seven or more. Experimental results for previous algorithms
limited themselves to taxonomies with depth at most three or four.

In each of the two problems, a straightforward lattice-based approach is
briefly discussed and then a classification-based algorithm is developed. In
particular, the two classification-based algorithms are MFGI_class for mining
max frequent g-itemsets and EGR_class for mining essential g-rules. The
classification-based algorithms are featured with conceptual classification
trees and dynamic generation and pruning algorithms.

**- 2007 -**

D. Kunkle and G. Cooperman. **Twenty-Six Moves Suffice for
Rubik's Cube**, in *Proceedings of the International Symposium on Symbolic and
Algebraic Computation* (ISSAC '07), ACM Press, 235--242, 2007.

[Publisher] [PDF]
[Abstract]

The number of moves required to solve any state of Rubik's cube has been a
matter of long-standing conjecture for over 25 years -- since Rubik's cube
appeared. This number is sometimes called "God's number". An upper bound of 29
(in the face-turn metric) was produced in the early 1990's, followed by an
upper bound of 27 in 2006.

An improved upper bound of 26 is produced using 8000 CPU hours. One key to this
result is a new, fast multiplication in the mathematical group of Rubik's cube.
Another key is efficient out-of-core (disk-based) parallel computation using
terabytes of disk storage. One can use the precomputed data structures to
produce such solutions for a specific Rubik's cube position in a fraction of a
second. Work in progress will use the new "brute-forcing" technique to further
reduce the bound.

E. Robinson, D. Kunkle, and G. Cooperman. **A Comparative
Analysis of Parallel Disk-Based Methods for Enumerating Implicit Graphs**, in
*Proceedings of the International Workshop on Parallel Symbolic Computation*
(PASCO '07), London, Ontario, 78--87, 2007.

[Publisher]
[PDF]
[Abstract]

It is only in the last five years that researchers have begun to use disk-based
search techniques on a large scale. The primary examples of its use come from
symbolic algebra and from artificial intelligence. In the field of parallel
search, disk-based search has been forced on researchers because the historical
growth in the amount of RAM per CPU core has now stopped. Indeed, the current
trend toward multi-core CPUs now threatens to take us backwards.

This article makes an original contribution to the design of disk-based
parallel search algorithms. It presents a survey of disk-based techniques
side-by-side, for the first time. This allows researchers to choose from a menu
of techniques, and also to create new hybrid algorithms from the building
blocks presented here.

**- 2006 -**

D. Kunkle, D. Zhang, and G. Cooperman. **Efficient Mining
of Max Frequent Patterns in a Generalized Environment** (poster paper), in
*Proceedings of the 15th ACM Conference on Information and Knowledge
Management* (CIKM '06), Arlington, VA, 810--811, 2006.

[Publisher] [PDF]
[Abstract]

This poster paper summarizes our solution for mining max frequent generalized
itemsets (g-itemsets), a compact representation for frequent patterns in the
generalized environment.

**- 2005 -**

S. Boyle, S. Guerin, and D. Kunkle. **An Application of
Multi-agent Simulation to Policy Appraisal in the Criminal Justice System in
England** (book chapter), in *Computational Economics: A Perspective from
Computational Intelligence*, Idea Group Publishing, Pennsylvania, 2005.

[Publisher]
[Abstract]

This chapter reports on a multi-agent approach to the construction of a model
of the English criminal justice system. The approach is an integration of
model-building with ways of enabling people to engage in strategic policy
making and take into account the complex interactions of the criminal justice
system. From the workings of the police to court procedures to prisons,
decisions in one area of the criminal justice system can be crucial in
determining what happens in another area. The purpose was to allow assessment
of the impact across the whole justice system of a variety of policies.

**- 2004 -**

M. Agar, S. Guerin, R. Holmes, and D. Kunkle.
**Epidemiology or Marketing? The Paradigm-Busting Use of Complexity and
Ethnography**, in *Proceedings of Agent 2004: Social Dynamics: Interaction,
Reflexivity and Emergence*, Chicago, October 2004.

[Publisher]
[PDF]
[Abstract]

Based on ethnographic work with youth in Baltimore County, an agent-based model
was developed to test the finding that circulation of narratives explained the
rise and fall of heroin epidemic incidence curves. This paper features the use
of Design of Experiments to evaluate the parameters in that model. Results of
the analysis suggest the drug epidemics can be better understood as diffusion
of a commodity rather than as infection by a disease, the view of the medically
dominated substance abuse field. Policy implications of this change in views
are sketched in the conclusion.

M. Gambhir, S. Guerin, S. Kauffman, D. Kunkle. **Steps
Toward a Possible Theory of Organization**, in *Proceedings of the International
Conference on Complex Systems* (ICCS '04), Boston, May, 2004.

[Publisher]
[PDF]
[Abstract]

A distinguishing feature of self-organizing dynamical systems is their
evolution from unconstrained, high entropy states to constrained, low entropy
states. These systems therefore spontaneously become more constrained as they
advance through time, seemingly in contradiction to the second law of
thermodynamics. However, it is possible to make the argument -- and thereby bring
back to bear the framework of thermodynamics -- that self-organizing systems do
work, in the sense of physics, upon themselves to construct the constraints
leading to structure and low entropy. Once a system has formed a structure, and
it is constrained, it needs to repeatedly do work on itself to maintain its low
entropy, constrained state. The repetition of the performance of work is -- in
physics -- represented by a work cycle. We show that a simple self-organizing
ant-foraging model does indeed repeatedly do work during its structure
maintenance phase and that this repeated process can be described as a work
cycle.

S. Guerin and D. Kunkle. **Emergence of Constraint in
Self-organizing Systems**, *Journal of Nonlinear Dynamics, Psychology, and Life
Sciences*, Vol. 8, No. 2, April, 2004.

[Publisher] [PDF]
[Abstract]

Practitioners of agent-based modeling are often tasked to model or design
self-organizing systems while the theoretical foundation of self-organization
in science remains to be set. This paper explores self-organization in the
context of an agent-based model of ant colony food foraging. We gather specific
measures of order-creation and constraint construction particular to leading
theories of nonequilibrium thermodynamics that purport to govern
self-organizing dynamics. These measures are used to explore three claims: (a)
Constraints are constructed from entropy-producing processes in the
bootstrapping phase of self-organizing systems; (b) positive feedback loops are
critical in the structure formation phase; and (c) constraints tend to decay.
The continued presence of far-from-equilibrium boundary conditions are required
to reinforce constraints in the maintenance phase.

**- 2003 -**

P. Anderson, J. Arney, S. Inverso, D. Kunkle, T. Lebo, and
C. Merrigan. **A Genetic Algorithm Search for Improved Halftone Masks**,
in *Proceedings of Artificial Neural Networks in Engineering* (ANNIE '03), St.
Louis, November 2003.

[Publisher] [PDF]
[Abstract]

We present a genetic algorithm that automatically generates halftone masks
optimized for use in specific printing systems. The search is guided by a
single figure of merit based on a detailed model of the printing process and
the human visual system. Our experiments show that genetic algorithms are
effective in finding improved halftone masks and that two methods of reducing
the search space to particular subsets of possible halftone masks greatly
enhance the performance of the GA.

S. Boyle, S. Guerin, J. Pratt, and D. Kunkle. **Application
of Agent-Based Simulation to Policy Appraisal in the Criminal Justice System in
England and Wales**, in *Proceedings of Agent 2003: Challenges in Social
Simulation*, Chicago, October 2003.

[Publisher]
[PDF]
[Abstract]

This paper describes an agent-based approach for constructing a model of
criminal justice system operations in England and Wales. The primary purpose of
the model is to assess the impact of policy variants across the entire criminal
justice system. Because of the structure of this system, three separate
government departments interact and deliver services. Decisions in one area of
the criminal justice system can be crucial in determining what happens in
another area. Our purpose was twofold. First, we needed to contribute to the
Treasury's spending review by working with different groups in criminal justice
agencies to reach a consensus on how things actually occur (i.e., linking
behavior and actions of one group with another and with resources). Second, we
needed to produce a model of the entire criminal justice system that would
provide insights into questions related to capacity, case flow, and costs. We
also needed to model the ways in which individuals go through the system. The
result is a hybrid model that combines a simple system dynamics approach with
an agent-based model. The distinctive approach used in this work integrated
modeling with practical ways of enabling people to engage in strategic
policymaking, while taking into account the complexities of the criminal
justice system. The agent-based framework developed to meet these needs models
the criminal justice system, provides the ability to assess policy across the
system, and allows sharing of model output to improve cooperative efforts among
departments.