Loading…
Loading grant details…
| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | Drexel University |
| Country | United States |
| Start Date | Oct 01, 2022 |
| End Date | Sep 30, 2026 |
| Duration | 1,460 days |
| Number of Grantees | 1 |
| Roles | Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2220991 |
Recent years have seen dramatic advances in the ability to exploit formal or semi-formal specifications of software behavior for finding software bugs or building confidence in software correctness. However, these specifications are typically given in the form of logical formulae (for formal verification) or additional code (for typical software testing), while the original description of expected behavior is typically written in natural language such as English prose.
Currently, these (semi)formal specifications are manually translated from natural language, leaving significant opportunities for misunderstandings or mistakes during translation, which can lead to validating that software satisfies useless or actively incorrect properties. The manual nature of the translation means it is difficult to audit after the fact.
This project pursues new approaches to connecting natural language specifications of software behavior, at the level of single sentences, to the (semi)formal specifications currently accepted by many varieties of software quality tools, including property-based testing frameworks, proof assistants, and various tools using temporal logic specifications. The project's novelties are connecting natural and formal specifications using techniques drawn from the linguistics literature, which are modular (making them possible to extend or locally repair) and evidence-producing (making it possible to audit the translation for understanding or debugging translation errors); innovating on the structure of word knowledge used by the system to allow high degrees of direct reuse across different semi-formal specification forms; and improving techniques for linguistic lexicon inference by using information about the datatypes involved.
These are all difficult to attain with mainstream machine learning techniques, against which the project will compare. The project will produce new techniques for translating English sentences into property-based tests for testing, proof assistant specifications for formal proofs of correctness, and multiple temporal logics for both correctness proofs and automated bug finding; and the project will implement them in an open source tool.
The project's impacts are expected to be tools for connecting English descriptions of behavior to the formal descriptions used by software quality experts (improving requirement tracing, communication between software experts and non-technical clients, and education), extensive validation of classic linguistic theories in an important domain application, and cross-pollination of techniques and applications between computational linguistics and software quality research. Additionally, the project is expected to improve confidence in individual pieces of software whose specifications are studied by this project.
The project takes categorial grammars as its key building block, an approach to compositional semantics of natural language that has been extensively vetted by linguists to cover a wide variety of subtle grammatical phenomena in many natural languages from different language families. Categorial grammars are therefore expected to impose no a priori restrictions on sentence structure or grammatical flexibility, unlike prior approaches to relating formal and natural specifications.
Categorial grammars have also been used to parse natural language into enough various logics addressing varied linguistic phenomena (such as time and/or place in addition to current facts) that they can also target all of the specification forms of interest to the project. Key novel insights for the project are the ability to share lexical entries of the grammar across different logics by transporting the semantics of a word in a given logic into semantics targeting other logics, and focusing on a family of specifications that can be deeply embedded in one highly-expressive logic (dependent type theory) to improve sharing, as well as benefiting from and experimentally validating linguistics research positing that such logics are more appropriate representations of natural language meaning.
The project will produce a publicly-available prototype system for relating English to property-based tests, proof assistant specifications in dependent type theory, and temporal logics. The project will also produce tools for symbolically learning the grammatical roles and semantics of words that are specific to a project or problem domain, from only a few examples. The framework will be adaptable by others to target other specification logics.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Drexel University
Complete our application form to express your interest and we'll guide you through the process.
Apply for This Grant