PICARD algorithm achieves State-of-the-Art performance for Text-to-SQL
November 2 8 min

PICARD algorithm achieves State-of-the-Art performance for Text-to-SQL

Introducing PICARD

This article introduces the current State-of-the-Art research in text-to-SQL and is based the following paper:

Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau. PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).

If you would like to explore the supporting code, visit: https://github.com/ElementAI/picard

What is text-to-SQL and why does it matter?

Digital transformation has resulted in an explosion of data, with most of this data stored in relational databases. In enterprises, these databases constitute a goldmine of untapped knowledge, but retrieving the correct data typically requires knowledge of how to formulate queries in SQL (Structured Query Language). Writing valid SQL queries is beyond the know-how for most knowledge workers, and thus the value of the data remains untapped.

“Example

Natural language interfaces, such as conversational agents and search engines, allow people to access and interact with data directly using human language, expressing queries in natural language without needing to know how to write the query in SQL.

How might companies benefit from a high-performing text-to-SQL solution?

Text-to-SQL interfaces can help employees and customers search and analyze structured data stored in one or more traditional relational databases. One of the appeals of the text-to-SQL technology is that the system can also cater to questions that are not as common, such as those for which traditional automation and user interfaces do not always scale. Many use cases span the entire business in every role.

ServiceNow offers several user-friendly ways for users to interface with a myriad of enterprise systems. These include the Virtual Agent (an enterprise chatbot solution) and the search functionality. Once equipped with a robust text-to-SQL system, these interfaces could enable a company’s users to make powerful and complex information requests in plain language (English among others) and receive relevant information directly within those interfaces.

ServiceNow also enables users to continuously monitor situations by creating persistent dashboards and periodic reports based on data retrieval. This is currently done by pointing and clicking through the database structure, and/or inputting some code. A natural language interface could enable a user to simply say what data they want to see in the dashboard or report (and perhaps even where it should be placed, or even how it should be visualized), and it would be set up automatically.

Here are a few examples of user requests that could benefit from text-to-SQL support:

  • A sales manager for a Virtual Reality software company asks a chatbot for a report that lists the top ten percent most active users based in New York City that have signed up for a free trial to manage their personalized avatars and digitized voice profiles and that have started related discussions in the online support forums, grouped by positive and negative sentiment, and sorted by most negative to positive sentiment score in each group.
  • A customer browsing a virtual pet store in the Metaverse using a VR headset, asks the virtual agent “what products are new and trending to train large dogs over 50 kilograms to walk with a leash without pulling?”
  • A marketing analyst uses a chatbot to ask “How many customers engaged with a virtual agent before they signed up with a credit card in the last month and what percent of those converted to the premium service with annual renewal?”
  • An IT director wants to create a list of all their employees in London, England, who were issued both a laptop and a tablet issued by their employer in the last 3 years, with results grouped by whomever the employee reports to, and sorted alphabetically by last name then first name.
  • A customer service analyst wants to search for customers in Montréal, Québec that purchased snowshoes last winter, and left a positive online shop review of at least four stars and complemented the shopping experience.

Introducing PICARD

Researchers from ServiceNow have recently developed a state-of-the-art algorithm for constrained decoding called PICARD. When we combine this algorithm with language models, we get a new method to automatically turn text into properly constructed SQL queries.

The system can work on databases it has never seen before. It relies on the structure and the content of the database as well as the knowledge encapsulated in the T5 language model. It is significantly more accurate than the prior state of the art, as evidenced by ServiceNow’s top rankings in the Spider and CoSQL challenges.

What are Spider and CoSQL?

Researchers at Yale University developed the Spider and CoSQL challenges as a way to compare research results fairly, on a uniform set of text-to-SQL tasks, from different research teams.

Spider consists of 200 databases with multiple tables spanning 138 different domains, 10,181 questions in natural language about the content of those databases, and 5,693 corresponding complex SQL queries. CoSQL is the dialogue version of the Spider challenge. It consists of 30,000+ turns (dialogue steps) and 10,000+ annotated SQL queries.

How did PICARD fare in the challenges?

When combined with a large language model called T5-3B, PICARD produced a test accuracy of 75.1% on the Spider challenge using database values, well ahead of the next four contenders whose score was in the 68-71% range. This means that using PICARD, 3 out of 4 user questions are answered correctly on unfamiliar databases without any additional training examples.

On the CoSQL Dialogue State Tracking challenge, it obtained accuracies of 54.6% and 23.7% respectively on the question match and interaction match metrics, while the next contender scored 51.6% and 21.2% on the same metrics, thus establishing a new state of the art for those tasks.

Going Beyond with new dialogue interfaces

Translating phrases to database queries is a great first step, but also a stepping stone to more innovations. There are many directions to go from where we are. Here are other adjacent possibilities.

We have already mentioned how it would be possible to use natural language to specify the way data should be formatted or visualized. A system might be able to infer the formats that are most likely to be appropriate based on the type of data (geographic information, time series, etc.).

One common problem in a large enterprise is that people might not even know which database(s) contain the information they need. An enterprise AI system might be able to identify which database is relevant to a given request before writing the appropriate query.

CoSQL is a dialogue state tracking task and breakthroughs are already being made in creating dialogue interfaces in which a user can iteratively refine their query. This would enable a user to construct even more complex queries step by step, and inspect results along the way. In this conversation, intermediate queries could be given shorthand names for future use.

Another idea is to allow natural language queries to be used as-is, in lieu of code, in scripts. This would result in more readable code without sacrificing functionality.

While natural language is a natural fit for specifying some things, sometimes it is easier to just point at things. Users can get where they want to go most efficiently using interfaces that combine natural language with pointing and clicking, making the best of both modalities.

Low-code / No-code text-to-code possibilities

Text-to-SQL is a relatively specific example of a general class of tasks called program synthesis. Research is underway on how to produce other kinds of code, be it JavaScript, Python or Java from natural language describing what the code needs to do together with a few examples. This has the potential to result in powerful low-code / no-code platforms that enable non-programmers to produce working apps.

There is also potential to accelerate programmer work when writing code that is pattern-based or relatively routine e.g., fetch this from that database, do this transform, export the result over there. Code-generating auto-complete functionality based on the last several lines of code written is one of the ways to get there.

Nothing says AI systems cannot go in the other direction, from code to natural language. An obvious application of this is automatic documentation and generation of comments that make code more approachable.

Finally, a fundamental question arises in this emerging context of human-machine interaction that involves higher order thinking. How can human and machine verify that they have understood each other? One possibility is to run the system back and forth, from text to code and then back to text, which amounts to a rephrasing of the original request by the machine.

Learn more

This is an exciting area of research that will impact the future of work and is already making inroads. We encourage you to learn more and encourage your own research and engineering teams to dig deeper. Get in touch through our GitHub project if you would like to learn more.