From Natural Language Specifications to Program Input Parsers

We present a method for automatically generating input parsers from English specifications of input file formats. We use a Bayesian generative model to capture relevant natural language phenomena and translate the English specification into a specification tree, which is then translated into a C++ i...


Bibliographic Details
Main Authors: Lei, Tao (Contributor), Long, Fan (Contributor), Barzilay, Regina (Contributor), Rinard, Martin C. (Contributor)
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: Association for Computational Linguistics (ACL), 2013-07-22T15:40:26Z.
Online Access: Get fulltext (http://hdl.handle.net/1721.1/79643)
LEADER 02025 am a22002653u 4500
001 79643
042 |a dc 
100 1 0 |a Lei, Tao  |e author 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
100 1 0 |a Lei, Tao  |e contributor 
100 1 0 |a Long, Fan  |e contributor 
100 1 0 |a Barzilay, Regina  |e contributor 
100 1 0 |a Rinard, Martin C.  |e contributor 
700 1 0 |a Long, Fan  |e author 
700 1 0 |a Barzilay, Regina  |e author 
700 1 0 |a Rinard, Martin C.  |e author 
245 0 0 |a From Natural Language Specifications to Program Input Parsers 
260 |b Association for Computational Linguistics (ACL),   |c 2013-07-22T15:40:26Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/79643 
520 |a We present a method for automatically generating input parsers from English specifications of input file formats. We use a Bayesian generative model to capture relevant natural language phenomena and translate the English specification into a specification tree, which is then translated into a C++ input parser. We model the problem as a joint dependency parsing and semantic role labeling task. Our method is based on two sources of information: (1) the correlation between the text and the specification tree and (2) noisy supervision as determined by the success of the generated C++ parser in reading input examples. Our results show that our approach achieves 80.0% F-Score accuracy compared to an F-Score of 66.7% produced by a state-of-the-art semantic parser on a dataset of input format specifications from the ACM International Collegiate Programming Contest (which were written in English for humans with no intention of providing support for automated processing). 
520 |a National Science Foundation (U.S.) (Grant IIS-0835652) 
520 |a Battelle Memorial Institute (PO #300662) 
546 |a en_US 
655 7 |a Article 
773 |t Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)