The story of QueryVis, not just another visual programming language

June 30, 2020

By Ysabelle Kempe

Big blocks of code can be intimidating — most computer science novices know that. But what if, instead of text, programs could be interpreted using symbols, a medium humans have understood since they lived in caves?

There is a decades-long history of attempts to create a visual programming language; however, they “have never quite lived up to their promise,” according to Wolfgang Gatterbauer, a professor at Khoury College of Computer Sciences.

“It is easy to understand a diagram, but not easy to create it correctly,” Gatterbauer said. “Diagrams need to be perfectly correct when a program later needs to execute the precise task.”

With that driving concept, Gatterbauer, along with fellow Khoury professors Mirek Riedewald and Cody Dunne, is creating a tool — QueryVis — which takes existing code, specifically SQL queries, and automatically translates it into a succinct diagram. (SQL, or Structured Query Language, is used to manage data in relational database management systems, the standard format for storing company data. Every time you make an online purchase, you update a company’s database.) The researchers’ goal is to preserve the use of textual SQL queries, but supplement it with visuals to help users better understand query intent.

Put simply, QueryVis takes in an SQL query and outputs the corresponding diagram (see figure below). To the untrained eye, these diagrams look like a network of partitioned boxes, similar to chunks carved out of an Excel spreadsheet connected with arrows. However, the diagrams are rooted in fundamental SQL logical patterns. QueryVis is not a visual programming language, yet it learns from both the pitfalls and merits of prior attempts to develop visual languages.

QueryVis diagram for sample SQL query. The SQL query on the left runs over a database of sailors and boats they reserved. It returns those sailors who have never reserved a red boat. The right diagram shows the same semantics and can be automatically generated from the SQL query on the left. The dashed line represents a logical negation ('SELECT the names of sailors so that there does NOT EXIST any reservation by that sailor of a boat that is read').


QueryVis diagram for sample SQL query.
The SQL query on the left runs over a database of sailors and boats they reserved. It returns those sailors who have never reserved a red boat. The right diagram shows the same semantics and can be automatically generated from the SQL query on the left. The dashed line represents a logical negation (“SELECT the names of sailors so that there does NOT EXIST any reservation by that sailor of a boat that is read”). A more detailed example with multiple queries is available on the project page.

“The motivation for visual programming languages was always that if it is easier to understand visuals, it should also be easier to create them,” said Gatterbauer. “However, we are still using and teaching SQL, a written programming language which has been around for 40 years. And there is a good reason for it.”

Textual coding languages continue to dominate the computer science field, the Khoury team believes, because visual aids do not dramatically speed up the process of creating programs. Programmers need the power to specify exactly what their code will do, which is both difficult to implement and cumbersome to use in a visual language, the team wrote in a recent paper at the International Conference of Management of Data (SIGMOD).

Visual interpretation of graphics, on the other hand, is the fastest human communication method, but only works so efficiently in the context of understanding rather than composing visuals, pointed out the team in the same paper. So, while visual programming languages weren’t dazzlingly successful, there are benefits to translating existing textual queries into diagrams to be displayed either instead of or in conjunction with traditional code. (To see QueryVis in action, readers may try this demo: http://demo.queryvis.com/)

User studies run by the Khoury team confirm this, showing test subjects are meaningfully faster at comprehending QueryVis than traditional SQL queries. The results also show that subjects may make fewer errors in understanding queries while using QueryVis. The novel tool has potential applications in industry, code reuse processes, and education.

“The only thing the visual query language is really helpful in is if you look at it and try to interpret it,” Gatterbauer said. He drew a parallel to maps and their making. “While it is hard to draw an unambiguous map, it is easy to read one.”

The quest to validate the usefulness of QueryVis

QueryVis, funded in part by a Khoury seed grant, was designed with human-computer interaction best practices in mind. The diagrams don’t have superfluous visual elements, and no two queries with different semantics lead to the same diagram. When Dunne, the visualization expert on the team, first saw QueryVis, he couldn’t immediately identify any way to improve it.

“That’s rare when I come across a visualization,” Dunne said. “I also didn’t have very much experience reading SQL when I joined this research, and, as I started playing with QueryVis more, I realized the visualizations were helping me to better understand SQL queries.”

Dunne, along with Khoury PhD student Aristotelis Leventidis, played an important role in designing the studies vital in validating that QueryVis actually works. So far, two groups have been exposed to QueryVis: a classroom of 13 university students and a study on Amazon Mechanical Turk that drew 42 participants. (Mechanical Turk is a crowdsourcing platform that connects businesses to workers; in this case it connected the researchers to test subjects with a working knowledge of SQL.)

In the Mechanical Turk study, each participant was given a two to three minute tutorial on how to read QueryVis. The subjects then took a multiple-choice exam that presented them with traditional SQL queries, QueryVis, or both, and asked them to determine the intent of the query. Each university student took a similar exam.

It is not a simple task to prove one method works better than another, according to Dunne. A significant portion of the research group’s time was spent crafting multiple-choice questions that unambiguously evaluate whether a test subject truly understands a query’s intention. For the study results to be credible, it was crucial that incorrect responses were not the result of misunderstood questions.

“The goal of much of my research, including this, is to make an impact,” Dunne said. “I am making an impact here in how to evaluate these things appropriately, especially in the database community where human subject evaluations are not common.”

Significant applications on the horizon

Imagine you are a college student seeing SQL for the first time. Perhaps you’re struggling, and, if so, you’re not alone, according to Riedewald. SQL is a declarative language, meaning you declare to the computer what information you want, but don’t lead it through the specific steps to get that data. There are logical patterns to the queries, but they are difficult to glean from simply looking at text.

“SQL is very precise, it is very short,” Gatterbauer said. “But SQL is not very intuitive.”

Gatterbauer and Riedewald have taught SQL to hundreds of students, at Khoury College and other universities. Now, they imagine a new way of teaching SQL, using QueryVis visuals that train students to see the logical patterns across query types. For example, the SQL query that retrieves ‘all students who do not take any art classes’ follows the same logical pattern as that which retrieves ‘all sailors who do not reserve any red boats.’ While the databases may be different, the visualizations preserve and highlight the structural similarities between the queries. (See a longer explanation at the project page.)

Aside from its potential value in the classroom, QueryVis could also be useful for scientists, such as astronomers, who are not familiar with SQL, but work with massive amounts of data. Even for SQL experts, the researchers said, QueryVis could speed up the laborious process of determining the intent of queries written in the past or by someone else. By streamlining query comprehension and reuse processes, QueryVis could find a hungry market in industry, according to Riedewald.

The researchers envision a library that visualizes queries within code editors, which are the platforms where programmers write their code. There are only a limited number of very important SQL patterns, according to Gatterbauer, and a visual query library would allow users to discover these key patterns for themselves. Over time, familiarity with the QueryVis tool could even alter the way a user mentally approaches reading or writing queries.

“When you write a program, it helps to have an image in your head,” Riedewald said. “Otherwise, it’s like having an empty piece of paper where you have to start from zero every time.”