Dan Kunkle


I am starting at Google NYC in January, 2011. This page may not be updated regularly in the future.

I recently finished my PhD in computer science at the College of Computer and Information Science at Northeastern University in Boston. My adviser was Prof. Gene Cooperman.

A high-level description of our disk-based computing methods are described in a recent CACM Viewpoint article, Solving Rubik's Cube: Disk is the New RAM. Basically, we seek: methods that efficiently use high-latency storage as main memory; and general techniques for constructing latency tolerant algorithms and software. With these, we make available to the programmer several orders of magnitude more space for the same system price.

I am the primary author of Roomy: a C/C++ library for parallel disk-based computation.

My research interests include: high performance and disk-based computing; combinatorial optimization; and adaptive systems.

Links to: my adviser, college, and university.


A list of my technical publications follows. See also, my resume, and DBLP entry.

- 2010 -

D. Kunkle. Roomy: A System for Space Limited Computations (invited tutorial), in Proceedings of the International Workshop on Parallel Symbolic Computation (PASCO '10), Grenoble, France, 22-25, 2010.
[Publisher] [PDF] [Abstract]

There are numerous examples of problems in symbolic algebra in which the required storage grows far beyond the limitations even of the distributed RAM of a cluster. Often this limitation determines how large a problem one can solve in practice. Roomy provides a minimally invasive system to modify the code for such a computation, in order to use the local disks of a cluster or a SAN as a transparent extension of RAM.

Roomy is implemented as a C/C++ library. It provides some simple data structures (arrays, unordered lists, and hash tables). Some typical programming constructs that one might employ in Roomy are: map, reduce, duplicate elimination, chain reduction, pair reduction, and breadth-first search. All aspects of parallelism and remote I/O are hidden within the Roomy library.

D. Kunkle, V. Slavici and G. Cooperman. Parallel Disk-Based Computation for Large, Monolithic Binary Decision Diagrams, in Proceedings of the International Workshop on Parallel Symbolic Computation (PASCO '10), Grenoble, France, 63-72, 2010.
[Publisher] [PDF] [Abstract]

Binary Decision Diagrams (BDDs) are widely used in formal verification. They are also widely known for consuming large amounts of memory. For larger problems, a BDD computation will often start thrashing due to lack of memory within minutes. This work uses the parallel disks of a cluster or a SAN (storage area network) as an extension of RAM, in order to efficiently compute with BDDs that are orders of magnitude larger than what is available on a typical computer. The use of parallel disks overcomes the bandwidth problem of single disk methods, since the bandwidth of 50 disks is similar to the bandwidth of a single RAM subsystem. In order to overcome the latency issues of disk, the Roomy library is used for the sake of its latency-tolerant data structures. A breadth-first algorithm is implemented. A further advantage of the algorithm is that RAM usage can be very modest, since its largest use is as buffers for open files. The success of the method is demonstrated by solving the 16-queens problem, and by solving a more unusual problem — counting the number of tie games in a three-dimensional 4x4x4 tic-tac-toe board.

V. Slavici, X. Dong, D. Kunkle and G. Cooperman. Fast Multiplication of Large Permutations for Disk, Flash Memory and RAM, in Proceedings of the International Symposium on Symbolic and Algebraic Computation (ISSAC '10), Munich, Germany, 355-362, 2010.
[Publisher] [PDF] [Abstract]

Permutation multiplication (or permutation composition) is perhaps the simplest of all algorithms in computer science. Yet for large permutations, the standard algorithm is not the fastest for disk or for flash, and surprisingly, it is not even the fastest algorithm for RAM on recent multi-core CPUs. On a recent commodity eight-core machine we demonstrate a novel algorithm that is 50% faster than the traditional algorithm, along with strong evidence that it will be up to twice as fast on future many-core computers. For larger permutations on flash or disk, the novel algorithm is orders of magnitude faster. A disk-parallel algorithm is demonstrated that can multiply two permutations with 12.8 billion points using 16 parallel local disks of a cluster in under one hour. Such large permutations are important in computational group theory, where they arise as the result of the well-known Todd-Coxeter coset enumeration algorithm. The novel algorithm emphasizes several passes of streaming access to the data instead of the traditional single pass using random access to the data. Similar novel algorithms are presented for permutation inverse and permutation multiplication by an inverse, thus providing a complete library of the underlying permutation operations needed for computations with permutation groups.

- 2009 -

D. Kunkle and G. Cooperman. Biased Tadpoles: a Fast Algorithm for Centralizers in Large Matrix Groups, in Proceedings of the International Symposium on Symbolic and Algebraic Computation (ISSAC '09), ACM Press, 223--230, 2009.
[Publisher] [PDF] [Abstract]

Centralizers are an important tool in in computational group theory. Yet for large matrix groups, they tend to be slow. We demonstrate a O(√|G|(1/logε)) black box randomized algorithm that produces a centralizer using space logarithmic in the order of the centralizer, even for typical matrix groups of order 1020 . An optimized version of this algorithm (larger space and no longer black box) typically runs in seconds for groups of order 1015 and minutes for groups of order 1020. Further, the algorithm trivially parallelizes, and so linear speedup is achieved in an experiment on a computer with four CPU cores. The novelty lies in the use of a biased tadpole, which delivers an order of magnitude speedup as compared to the classical tadpole algorithm. The biased tadpole also allows a test for membership in a conjugacy class in a fraction of a second. Finally, the same methodology quickly finds the order of a matrix group via a vector stabilizer. This allows one to eliminate the already small possibility of error in the randomized centralizer algorithm.

D. Kunkle and G. Cooperman. Harnessing Parallel Disks to Solve Rubik's Cube, Journal of Symbolic Computation, Volume 44, Issue 7, 872--890, July, 2009.
[Publisher] [PDF] [Abstract]

The number of moves required to solve any configuration of Rubik's cube has held a fascination for over 25 years. A new upper bound of 26 is produced. More important, a new methodology is described for finding upper bounds. The novelty is two-fold. First, parallel disks are employed. This allows 1.4x1012 states representing symmetrized cosets to be enumerated in seven terabytes. Second, a faster table-based multiplication is described for symmetrized cosets that attempts to keep most tables in the CPU cache. This enables the product of a symmetrized coset by a generator at a rate of 10 million moves per second.

- 2008 -

D. Kunkle and J. Schindler. A Load Balancing Framework for Clustered Storage Systems, in Proceedings of the IEEE International Conference on High Performance Computing (HiPC '08), Bangalore, 57--72, 2008.
[Publisher] [PDF] [Abstract]

The load balancing framework for high-performance clustered storage systems presented in this paper provides a general method for reconfiguring a system facing dynamic workload changes. It simultaneously balances load and minimizes the cost of reconfiguration. It can be used for automatic reconfiguration or to present an administrator with a range of (near) optimal reconfiguration options, allowing a tradeoff between load distribution and reconfiguration cost. The framework supports a wide range of measures for load imbalance and reconfiguration cost, as well as several optimization techniques. The effectiveness of this framework is demonstrated by balancing the workload on a NetApp Data ONTAP GX system, a commercial scale-out clustered NFS server implementation. The evaluation scenario considers consolidating two real world systems, with hundreds of users each: a six-node clustered storage system supporting engineering workloads and a legacy system supporting three email severs.

D. Kunkle and G. Cooperman. Solving Rubik's Cube: Disk is the New RAM, Communications of the ACM (CACM), New York, Volume 51, Issue 4, 31--33, April 2008.
[Publisher] [PDF] [Abstract]

Substituting disk for RAM, disk-based computation is a way to increase working memory and achieve results that are not otherwise economical.

D. Kunkle, D. Zhang, and G. Cooperman. On Mining Frequent Generalized Itemsets and Essential Generalized Association Rules without Redundancy, Journal of Computer Science and Technology (JCST), P. R. China, 23(1): 77--102, Jan. 2008.
[Publisher] [PDF] [Abstract]

This paper presents some new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations for all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items, rather than a flat list of items. This produces more natural frequent itemsets and associations such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in mining patterns: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified.

Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of the time faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with the depths of seven or more. Experimental results for previous algorithms limited themselves to taxonomies with depth at most three or four.

In each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classification-based algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms are featured with conceptual classification trees and dynamic generation and pruning algorithms.

- 2007 -

D. Kunkle and G. Cooperman. Twenty-Six Moves Suffice for Rubik's Cube, in Proceedings of the International Symposium on Symbolic and Algebraic Computation (ISSAC '07), ACM Press, 235--242, 2007.
[Publisher] [PDF] [Abstract]

The number of moves required to solve any state of Rubik's cube has been a matter of long-standing conjecture for over 25 years -- since Rubik's cube appeared. This number is sometimes called "God's number". An upper bound of 29 (in the face-turn metric) was produced in the early 1990's, followed by an upper bound of 27 in 2006.

An improved upper bound of 26 is produced using 8000 CPU hours. One key to this result is a new, fast multiplication in the mathematical group of Rubik's cube. Another key is efficient out-of-core (disk-based) parallel computation using terabytes of disk storage. One can use the precomputed data structures to produce such solutions for a specific Rubik's cube position in a fraction of a second. Work in progress will use the new "brute-forcing" technique to further reduce the bound.

E. Robinson, D. Kunkle, and G. Cooperman. A Comparative Analysis of Parallel Disk-Based Methods for Enumerating Implicit Graphs, in Proceedings of the International Workshop on Parallel Symbolic Computation (PASCO '07), London, Ontario, 78--87, 2007.
[Publisher] [PDF] [Abstract]

It is only in the last five years that researchers have begun to use disk-based search techniques on a large scale. The primary examples of its use come from symbolic algebra and from artificial intelligence. In the field of parallel search, disk-based search has been forced on researchers because the historical growth in the amount of RAM per CPU core has now stopped. Indeed, the current trend toward multi-core CPUs now threatens to take us backwards.

This article makes an original contribution to the design of disk-based parallel search algorithms. It presents a survey of disk-based techniques side-by-side, for the first time. This allows researchers to choose from a menu of techniques, and also to create new hybrid algorithms from the building blocks presented here.

- 2006 -

D. Kunkle, D. Zhang, and G. Cooperman. Efficient Mining of Max Frequent Patterns in a Generalized Environment (poster paper), in Proceedings of the 15th ACM Conference on Information and Knowledge Management (CIKM '06), Arlington, VA, 810--811, 2006.
[Publisher] [PDF] [Abstract]

This poster paper summarizes our solution for mining max frequent generalized itemsets (g-itemsets), a compact representation for frequent patterns in the generalized environment.

- 2005 -

S. Boyle, S. Guerin, and D. Kunkle. An Application of Multi-agent Simulation to Policy Appraisal in the Criminal Justice System in England (book chapter), in Computational Economics: A Perspective from Computational Intelligence, Idea Group Publishing, Pennsylvania, 2005.
[Publisher] [Abstract]

This chapter reports on a multi-agent approach to the construction of a model of the English criminal justice system. The approach is an integration of model-building with ways of enabling people to engage in strategic policy making and take into account the complex interactions of the criminal justice system. From the workings of the police to court procedures to prisons, decisions in one area of the criminal justice system can be crucial in determining what happens in another area. The purpose was to allow assessment of the impact across the whole justice system of a variety of policies.

- 2004 -

M. Agar, S. Guerin, R. Holmes, and D. Kunkle. Epidemiology or Marketing? The Paradigm-Busting Use of Complexity and Ethnography, in Proceedings of Agent 2004: Social Dynamics: Interaction, Reflexivity and Emergence, Chicago, October 2004.
[Publisher] [PDF] [Abstract]

Based on ethnographic work with youth in Baltimore County, an agent-based model was developed to test the finding that circulation of narratives explained the rise and fall of heroin epidemic incidence curves. This paper features the use of Design of Experiments to evaluate the parameters in that model. Results of the analysis suggest the drug epidemics can be better understood as diffusion of a commodity rather than as infection by a disease, the view of the medically dominated substance abuse field. Policy implications of this change in views are sketched in the conclusion.

M. Gambhir, S. Guerin, S. Kauffman, D. Kunkle. Steps Toward a Possible Theory of Organization, in Proceedings of the International Conference on Complex Systems (ICCS '04), Boston, May, 2004.
[Publisher] [PDF] [Abstract]

A distinguishing feature of self-organizing dynamical systems is their evolution from unconstrained, high entropy states to constrained, low entropy states. These systems therefore spontaneously become more constrained as they advance through time, seemingly in contradiction to the second law of thermodynamics. However, it is possible to make the argument -- and thereby bring back to bear the framework of thermodynamics -- that self-organizing systems do work, in the sense of physics, upon themselves to construct the constraints leading to structure and low entropy. Once a system has formed a structure, and it is constrained, it needs to repeatedly do work on itself to maintain its low entropy, constrained state. The repetition of the performance of work is -- in physics -- represented by a work cycle. We show that a simple self-organizing ant-foraging model does indeed repeatedly do work during its structure maintenance phase and that this repeated process can be described as a work cycle.

S. Guerin and D. Kunkle. Emergence of Constraint in Self-organizing Systems, Journal of Nonlinear Dynamics, Psychology, and Life Sciences, Vol. 8, No. 2, April, 2004.
[Publisher] [PDF] [Abstract]

Practitioners of agent-based modeling are often tasked to model or design self-organizing systems while the theoretical foundation of self-organization in science remains to be set. This paper explores self-organization in the context of an agent-based model of ant colony food foraging. We gather specific measures of order-creation and constraint construction particular to leading theories of nonequilibrium thermodynamics that purport to govern self-organizing dynamics. These measures are used to explore three claims: (a) Constraints are constructed from entropy-producing processes in the bootstrapping phase of self-organizing systems; (b) positive feedback loops are critical in the structure formation phase; and (c) constraints tend to decay. The continued presence of far-from-equilibrium boundary conditions are required to reinforce constraints in the maintenance phase.

- 2003 -

P. Anderson, J. Arney, S. Inverso, D. Kunkle, T. Lebo, and C. Merrigan. A Genetic Algorithm Search for Improved Halftone Masks, in Proceedings of Artificial Neural Networks in Engineering (ANNIE '03), St. Louis, November 2003.
[Publisher] [PDF] [Abstract]

We present a genetic algorithm that automatically generates halftone masks optimized for use in specific printing systems. The search is guided by a single figure of merit based on a detailed model of the printing process and the human visual system. Our experiments show that genetic algorithms are effective in finding improved halftone masks and that two methods of reducing the search space to particular subsets of possible halftone masks greatly enhance the performance of the GA.

S. Boyle, S. Guerin, J. Pratt, and D. Kunkle. Application of Agent-Based Simulation to Policy Appraisal in the Criminal Justice System in England and Wales, in Proceedings of Agent 2003: Challenges in Social Simulation, Chicago, October 2003.
[Publisher] [PDF] [Abstract]

This paper describes an agent-based approach for constructing a model of criminal justice system operations in England and Wales. The primary purpose of the model is to assess the impact of policy variants across the entire criminal justice system. Because of the structure of this system, three separate government departments interact and deliver services. Decisions in one area of the criminal justice system can be crucial in determining what happens in another area. Our purpose was twofold. First, we needed to contribute to the Treasury's spending review by working with different groups in criminal justice agencies to reach a consensus on how things actually occur (i.e., linking behavior and actions of one group with another and with resources). Second, we needed to produce a model of the entire criminal justice system that would provide insights into questions related to capacity, case flow, and costs. We also needed to model the ways in which individuals go through the system. The result is a hybrid model that combines a simple system dynamics approach with an agent-based model. The distinctive approach used in this work integrated modeling with practical ways of enabling people to engage in strategic policymaking, while taking into account the complexities of the criminal justice system. The agent-based framework developed to meet these needs models the criminal justice system, provides the ability to assess policy across the system, and allows sharing of model output to improve cooperative efforts among departments.

Master's Thesis

D. Kunkle. Automatic Classification of One-Dimensional Cellular Automata. MS thesis. Rochester Institute of Technology, 2003.
[PDF] [Abstract]

Cellular automata, a class of discrete dynamical systems, show a wide range of dynamic behavior, some very complex despite the simplicity of the system's definition. This range of behavior can be organized into six classes, according to the Li-Packard system: null, fixed point, two-cycle, periodic, complex, and chaotic. An advanced method for automatically classifying cellular automata into these six classes is presented. Seven parameters were used for automatic classification, six from existing literature and one newly presented. These seven parameters were used in conjunction with neural networks to automatically classify an average of 98.3% of elementary cellular automata and 93.9% of totalistic k = 2 r = 3 cellular automata. In addition, the seven parameters were ranked based on their effectiveness in classifying cellular automata into the six Li-Packard classes.


My fleeting condition of celebrity.

My Erdös number is three. I have published with Erdös-2 authors Peter Anderson and Gene Cooperman.


Home page of Dan Kunkle, aka Daniel Kunkle, aka Daniel R. Kunkle - Year 2010.