Thomas Wahl
Khoury College of Computer Sciences
Northeastern University
360 Huntington Avenue
Boston, Massachusetts 02115

Office: 202 West Village H, room 344
Phone: +1-617-373-3100
Email: t.wahl–ät–

Welcome to my former homepage. After working at Northeastern for about a decade, I have moved on to join GrammaTech, Inc. The information below may therefore be somewhat out of date. Naturally, I will also not take on new students at this time.

I do research on various aspects of Formal Verification and Program Analysis, such as techniques for concurrent and numeric programs; on detecting and repairing security vulnerabilities in programs, such as information leaks introduced by compilers; and on structural and quantitative manipulations of neural networks that influence their decision making.

Brief bio:

  • 2011–2021: Full-time faculty at Northeastern University, Khoury College of Computer Sciences
  • 2009–2011: Research Officer, Oxford University, Department of Computer Science (formerly "Computing Laboratory"), UK
  • 2007–2009: Lecturer and Postdoc, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland
  • 2007: Ph.D., University of Texas, Austin

General research interests:

  • Software verification techniques, such as predicate abstraction, SAT and SMT solving, decision procedures
  • Infinite-state system verification, such as arbitrarily-threaded programs or distributed systems
  • Stability, robustness, reliability of numeric computations, especially using floating-point arithmetic
  • Software analysis techniques for security verification, such as detection of information leaks (e.g. through side channels)

Past and current projects:

  • Resource-aware program verification using Observation Sequences:

  • Short version: Verification techniques try to analyze software (or hardware) for bugs: unexpected, unwanted behaviors that can cause the software to end up in an unworkable state, crash, or simply compute wrong results. This project develops a verification technique that tries to exploit the fact that most software systems are designed over a variable number of resources. How do changes (especially incremental changes) to the resource parameter affect the behavior of the software?

    More details: Many systems are naturally designed (consciously or not) over a variable number of certain resources, which can mean physical or logical entities. Examples of such resources include:
    1. the number of physical components of a certain functional type,
    2. the number of clients that simultaneously request a service from a server,
    3. the capacity of the communication channel between components,
    4. the number of context switches permitted among concurrent threads,
    5. the recursion nesting depth (= stack size) permitted in a piece of software.
    We can expect that the behavior of the instances of the parametric design family (= the concrete designs for each parameter choice) depends on the parameter in a smooth, "continuous" way. This enables analyses of the function that maps the number of resource instances (= the parameter) to a concrete design (this function goes from the natural numbers to the design space).
    More concretely, suppose k is an integral parameter of the design. With the target property P to be verified in mind, we define a function O mapping the parameter to observations about the design of size k, such that knowledge of O(k) enables a decision on whether P holds for the instance of size k. For example, the "observation" could simply be the set of states reachable in the system of size k.
    In general, we require of function O that it be monotone: more resources means more system behaviors (observations). This is almost always the case provided the value of k is not accessible in the program defining a system instance's behavior. (If it is, the dependence on k can be arbitrary, making analysis against unbounded k in general impossible.)
    We now analyze the system design for increasing values of k. Due to the monotonicity of O, consecutive observation sets O(k) are related by some form of ≤ ordering. This raises the question of whether the function O converges at some parameter point k0, i.e., whether it stops increasing once k0 is reached. If so, the truth value of property P has stabilized: unless we have detected a violation of P up to this point, we have proved the property for all parameter instances.
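    The following is a minimal, illustrative sketch of this exploration loop (it is not the CUBA implementation); the functions compute_observations and violates_property are hypothetical stand-ins for a concrete analysis, and observation sets are modeled simply as Python sets:

        def verify_by_observation_sequence(compute_observations, violates_property, k_max):
            """Explore instances of a parametric design for k = 1, 2, ... and stop
            when the (monotone) observation sets converge or a violation is found.

            compute_observations(k) -- hypothetical analysis returning the set of
                                       observations (e.g. reachable states) of the
                                       instance of size k
            violates_property(obs)  -- hypothetical check of the target property P
                                       against an observation set
            k_max                   -- cutoff in case the sequence does not converge
            """
            previous = set()
            for k in range(1, k_max + 1):
                current = compute_observations(k)
                assert previous <= current, "monotonicity assumption violated"
                if violates_property(current):
                    return "property P violated in the instance of size %d" % k
                if current == previous:   # O(k) = O(k-1): the sequence has converged
                    return "property P proved for all instances (converged at k0 = %d)" % (k - 1)
                previous = current
            return "inconclusive: no convergence observed up to k_max"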

    We have applied the Observation Sequence paradigm to the undecidable problem of analyzing reachability for concurrent procedures communicating via global-scope variables [PLDI 2018]. Our technique is dubbed CUBA (Context-UnBounded Analysis). A webpage with downloads and many other resources can be found here.

    We have also applied this paradigm to unbounded-queue message passing systems written in the P language, such as distributed communication protocols [CAV 2019]. The resource page for this project is here.

    I am also fortunate to work with a super team of Northeastern undergraduates, Andrew Dai and Andrew Johnson, who are helping me expand the scope of this technique.

  • Platform dependencies of floating-point programs:

  • Short version: The precise semantics of floating-point arithmetic programs, i.e. the values they compute, depends not only on the input but also on the behavior of the compiler and on the target hardware. Such dependencies infringe on the highly desirable goal of software portability: the same program run on the same inputs on different platforms can produce different results. We are investigating the consequences of these dependencies for numeric programs in general, and for the trustworthiness of floating-point compilers.

    More details: We are working on a way to evaluate the "damage", in terms of precision, that compiler optimizations cause to floating-point programs. Since the semantics of floating-point arithmetic expressions is very sensitive to evaluation order (e.g., addition is not associative), expression rewriting generally changes the computed value, at least for some inputs. As a consequence, performance-oriented optimizers produce code that is not I/O-equivalent to the original, violating one of the fundamental principles of code optimization.
    In many cases, the effect on the computed value is marginal, but not in all. How can we detect such cases, how can we decide whether the optimization is "worth it" compared to the change in output value, and how can we perhaps even repair the optimizer, so that it produces accurate yet runtime-optimized code? [This is ongoing work.]
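    As a concrete illustration of this sensitivity to evaluation order (a textbook example, independent of our tools), the following Python snippet shows that reassociating a floating-point sum, as an optimizer might do, changes the computed result:

        # Floating-point addition is not associative: reassociating a sum,
        # as an optimizer might do to enable vectorization, can change the result.
        a, b, c = 1e16, -1e16, 1.0

        left_to_right = (a + b) + c   # evaluates to 1.0
        reassociated  = a + (b + c)   # evaluates to 0.0: b + c rounds back to -1e16

        print(left_to_right, reassociated, left_to_right == reassociated)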

    Earlier we developed a method that can partially repair numeric instability against platform changes, making numeric programs more robust. This is in principle easy: instruct the compiler to evaluate every expression following a "strict" evaluation model, such as "left-to-right", and to ignore any platform-specific precision- or efficiency-enhancing features, such as contracted operations or custom floating-point division routines, across the whole program. But numeric instability is often caused by only a few critical program fragments; we do not want to take away the compiler's freedom to optimize the code that does not contribute (much) to instability. Our method first identifies these critical fragments and then stabilizes them, within the smallest possible scope, leaving the compiler free to manipulate the remaining code. [VMCAI 2017]

    Yet earlier we looked at platform-dependent program differences that lead to changes in the control flow of a program. Such differences seriously affect the portability of numeric code. We developed a method that takes a numeric procedure and determines whether a given input may cause the procedure to take different branches, based merely on the floating-point platform on which it is compiled and executed. This method can be used in static and dynamic analyses of programs, to estimate their numeric stability [EURO-PAR 2015]. A project webpage can be found here.
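    To make the phenomenon concrete (this toy example is not the [EURO-PAR 2015] tool itself), the procedure below takes different branches depending solely on the floating-point precision used; single vs. double precision serves here as a stand-in for two different platforms, and numpy is assumed to be available:

        import numpy as np

        def branch_taken(x, y, dtype):
            # Toy numeric procedure: which side of the branch is taken depends
            # on the sign of a tiny floating-point difference.
            x, y = dtype(x), dtype(y)
            return (x * x - y * y) > dtype(0)

        x, y = 1.0 + 1e-8, 1.0
        print(branch_taken(x, y, np.float64))   # True:  the difference survives in double precision
        print(branch_taken(x, y, np.float32))   # False: 1 + 1e-8 rounds to 1 in single precision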

  • Compilation-dependent security properties of software (currently with Konstantinos Athanasiou):

  • Software analysis techniques are arguably underused in securing programs against vulnerabilities such as data leaks. The techniques that do exist typically operate at the source-code level, which has proven somewhat effective at detecting leaks. For software side-channel attacks, which try to gain access to passwords or other secret data by exploiting fine-grained information about the power or time consumption of program runs, source-level techniques are however not reliable for actually verifying security properties. The reason is that the program undergoes many transformations during compilation, such as code optimizations, register allocation, and even runtime instruction scheduling, all of which have security ramifications. These transformations are of course not visible, and often not even predictable, at the source-code level. Classical static analysis techniques thus give a false sense of security.

    This problem is well-known throughout verification but has recently received a huge amount of attention in the wake of processor-level side-channel leaks such as Meltdown and Spectre. In this project we investigate various instances of this problem. Our goals are to detect them, to fix them, and—if possible—to prove that certain leaks are not possible, independent of how the compiler turns the program into an executable. [This is ongoing work.]
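    To make the notion of a software side channel concrete (this is a textbook illustration, not our analysis), the snippet below contrasts an early-exit secret comparison, whose running time reveals how long a correct prefix of the guess is, with a data-independent comparison; hmac.compare_digest is from the Python standard library:

        import hmac

        def leaky_check(secret: bytes, guess: bytes) -> bool:
            # Early exit: the running time grows with the length of the correct
            # prefix of 'guess', which an attacker can measure and exploit.
            if len(secret) != len(guess):
                return False
            for s, g in zip(secret, guess):
                if s != g:
                    return False
            return True

        def constant_time_check(secret: bytes, guess: bytes) -> bool:
            # Comparison whose timing does not depend on where the inputs differ.
            return hmac.compare_digest(secret, guess)

    Note that the project's concern applies even to code written in the style of constant_time_check: whether the executable is actually free of the leak depends on what the compiler (or interpreter) does to it.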

    I am grateful for the support of Maggie Van Nortwick and Owen Loveluck, two strong undergraduates at Northeastern, in this project.

  • Protecting Confidentiality and Integrity of Deep Neural Networks against Side-Channel and Fault Attacks:

  • Short version: Deep Learning has become a foundational means for solving grand societal challenges. The security of deep neural network (DNN) inference engines, and of DNN training data, has become a major challenge in deploying artificial intelligence. This project investigates both confidentiality (secrecy) breaches and integrity breaches (malicious manipulations) of DNNs themselves, or of their inference processes.

    Sample Question 1 [confidentiality violations]: How can we reverse-engineer structural and quantitative parameters of neural networks whose internals are intended to be confidential? One option is to observe the behavior of the DNN on specialized inputs. These may be inputs that cause the DNN inference to take unusually long (a timing side channel), or that cause the computation to raise certain observable exceptions.
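    A minimal sketch of the timing-side-channel idea, assuming a hypothetical black-box predict function and a hypothetical set of probe inputs; the attacker only measures wall-clock inference time and looks for inputs with unusually long latency:

        import time

        def time_inference(predict, probe_inputs, repetitions=100):
            """Measure the average inference latency for each probe input.

            predict      -- hypothetical black-box DNN inference function
            probe_inputs -- hypothetical list of specially crafted inputs
            Unusually slow inputs (e.g. those triggering subnormal arithmetic
            or exceptional code paths) can reveal information about the
            network's internals.
            """
            timings = []
            for x in probe_inputs:
                start = time.perf_counter()
                for _ in range(repetitions):
                    predict(x)
                timings.append((time.perf_counter() - start) / repetitions)
            return timings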

    Sample Question 2 [integrity violations]: How can we manipulate a DNN running on a local device (e.g., by physically altering the device's circuitry [fault attacks]) in a way that adversely affects its execution? We plan to use rigorous tools such as mathematical formula solvers to determine how the logic of the DNN needs to be altered to achieve a certain behavioral change. Such computations help us assess the vulnerability of the DNN to fault attacks.
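    As an illustration of the formula-solver idea (a toy example, not the project's actual models), the sketch below uses the Z3 SMT solver (z3-solver Python package, assumed installed) to ask whether a bounded perturbation of the weights of a single neuron suffices to flip its decision on a given input; the perturbation bound models a limited fault-injection budget:

        from z3 import Real, Solver, If, sat

        def abs_z3(e):
            # Z3 reals have no built-in absolute value; encode it with If.
            return If(e >= 0, e, -e)

        # Toy neuron: score = w1*x1 + w2*x2 + b, classified "positive" if score > 0.
        w1, w2, b = 2.0, -1.0, 0.5      # original parameters
        x1, x2 = 1.0, 1.0               # fixed input; original score = 1.5 > 0

        d1, d2 = Real('d1'), Real('d2') # fault-induced weight perturbations
        s = Solver()
        s.add(abs_z3(d1) <= 1.0, abs_z3(d2) <= 1.0)       # limited fault budget
        s.add((w1 + d1) * x1 + (w2 + d2) * x2 + b <= 0)   # flipped decision

        if s.check() == sat:
            m = s.model()
            print("fault within budget found:", m[d1], m[d2])
        else:
            print("no fault within budget flips this input's classification")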

Research sponsors:

  • National Science Foundation
  • Microsoft Research
Design borrowed in part from Domagoj Babic