From Natural Language Specifications to Program Input Parsers


Bibliographic Details
Main Authors: Lei, Tao (Contributor), Long, Fan (Contributor), Barzilay, Regina (Contributor), Rinard, Martin C. (Contributor)
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: Association for Computational Linguistics (ACL), 2013-07-22T15:40:26Z.
Description
Summary: We present a method for automatically generating input parsers from English specifications of input file formats. We use a Bayesian generative model to capture relevant natural language phenomena and translate the English specification into a specification tree, which is then translated into a C++ input parser. We model the problem as a joint dependency parsing and semantic role labeling task. Our method is based on two sources of information: (1) the correlation between the text and the specification tree and (2) noisy supervision as determined by the success of the generated C++ parser in reading input examples. Our results show that our approach achieves an F-score of 80.0%, compared to an F-score of 66.7% produced by a state-of-the-art semantic parser, on a dataset of input format specifications from the ACM International Collegiate Programming Contest (which were written in English for humans with no intention of providing support for automated processing).
National Science Foundation (U.S.) (Grant IIS-0835652)
Battelle Memorial Institute (PO #300662)
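
Illustrative sketch (not from the article): the summary describes a specification tree that drives an input parser, with feedback coming from whether that parser can read example inputs. The Python below is a minimal sketch of that idea under stated assumptions. The node types `Line` and `Repeat`, the functions `read` and `accepts`, and the sample format ("the first line contains an integer n; each of the next n lines contains two integers a and b") are all hypothetical. The article's system emits a C++ parser and learns the tree with a Bayesian model; neither is reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Line:
    """One input line holding whitespace-separated integer fields (hypothetical node type)."""
    fields: List[str]


@dataclass
class Repeat:
    """A block repeated `count` times, where `count` names an earlier field (hypothetical)."""
    count: str
    body: List["Node"]


Node = Union[Line, Repeat]


class ParseFailure(Exception):
    """Raised when the input does not match the specification tree."""


def read(spec: List[Node], text: str) -> List[Dict[str, int]]:
    """Read `text` according to `spec`; return one dict per consumed line."""
    lines = text.strip().splitlines()
    pos = 0
    records: List[Dict[str, int]] = []

    def walk(nodes: List[Node], env: Dict[str, int]) -> None:
        nonlocal pos
        for node in nodes:
            if isinstance(node, Line):
                if pos >= len(lines):
                    raise ParseFailure("unexpected end of input")
                tokens = lines[pos].split()
                pos += 1
                if len(tokens) != len(node.fields):
                    raise ParseFailure("wrong number of fields on line %d" % pos)
                try:
                    values = [int(tok) for tok in tokens]
                except ValueError as exc:
                    raise ParseFailure(str(exc))
                row = dict(zip(node.fields, values))
                env.update(row)  # later Repeat nodes may refer to these names
                records.append(row)
            else:
                if node.count not in env:
                    raise ParseFailure("unknown count field: " + node.count)
                for _ in range(env[node.count]):
                    walk(node.body, dict(env))  # copy so inner names stay local

    walk(spec, {})
    if pos != len(lines):
        raise ParseFailure("trailing input after the specified format")
    return records


def accepts(spec: List[Node], text: str) -> bool:
    """Feedback signal: did the parser built from `spec` read the example?"""
    try:
        read(spec, text)
        return True
    except ParseFailure:
        return False


# Two candidate trees for the same English sentence, checked against one example.
example_input = "2\n1 2\n3 4\n"
candidate_a = [Line(["n"]), Repeat("n", [Line(["a", "b"])])]
candidate_b = [Line(["n", "a"]), Repeat("n", [Line(["b"])])]

print(accepts(candidate_a, example_input))
print(accepts(candidate_b, example_input))
```

In this sketch, `candidate_a` reads the example and `candidate_b` does not, so the example input separates the two readings of the sentence. The signal is noisy in the sense the summary describes: a tree that happens to read every available example is not guaranteed to be the intended one.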