
Grammar Engineering Complements Language Documentation

Emily M. Bender

University of Washington
[email protected]

1. What is Grammar Engineering?

• The development of grammars-in-software, used in combination with parsing and generation algorithms to:

– Assign structures (morphological, syntactic and semantic) to input strings

– Generate strings from input semantic representations

– Recognize ungrammatical strings

• Dates back to the 1950s

• Finally practical, thanks to advances in computing technology and computational linguistics
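To make the first and third uses above concrete, here is a minimal sketch of a grammar-in-software: a toy recognizer and parser over a tiny context-free grammar. The grammar, lexicon, and function names are hypothetical illustrations and are not taken from the poster or from any Grammar Matrix grammar. Generation from semantic representations is not covered.

```python
from functools import lru_cache

# Hypothetical toy grammar in Chomsky normal form (illustration only).
BINARY_RULES = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}
LEXICON = {
    "the": "Det",
    "dog": "N",
    "cat": "N",
    "sees": "V",
}


def parse(tokens):
    """Return every tree that analyzes the whole token sequence as an S."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def spans(i, j):
        # All (category, tree) pairs that cover tokens[i:j].
        if j - i == 1:
            cat = LEXICON.get(tokens[i])
            return ((cat, (cat, tokens[i])),) if cat else ()
        results = []
        for k in range(i + 1, j):
            for left_cat, left_tree in spans(i, k):
                for right_cat, right_tree in spans(k, j):
                    parent = BINARY_RULES.get((left_cat, right_cat))
                    if parent:
                        results.append((parent, (parent, left_tree, right_tree)))
        return tuple(results)

    if not tokens:
        return []
    return [tree for cat, tree in spans(0, len(tokens)) if cat == "S"]


def is_grammatical(tokens):
    """A string is recognized if the grammar assigns it at least one analysis."""
    return bool(parse(tokens))


if __name__ == "__main__":
    print(parse("the dog sees the cat".split()))
    print(is_grammatical("dog the sees".split()))
```

A precision grammar of the kind discussed here is far richer (morphology, semantics, feature structures), but the division of labor is the same: the grammar is data, and a general algorithm applies it to input strings.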

2. Why Do Grammar Engineering?

• Automatically apply analyses to large data sets

• Identify unanalyzed constructions

• Identify words or word forms missing from lexicon/morphology

• Identify underconstrained analyses

• Produce a searchable treebank
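As a rough sketch of how such triage might be automated, the following assumes we already have, for each utterance, the number of analyses the grammar assigned. The function names, the threshold, and the definitions of coverage (share of utterances with at least one analysis) and average ambiguity (mean number of analyses per analyzed utterance) are assumptions made for illustration. Deciding whether an analysis is actually correct still requires human judgment during treebanking.

```python
def triage(parse_counts, ambiguity_threshold=10):
    """Sort utterances into rough buckets by their number of analyses.

    parse_counts maps an utterance ID to the number of analyses found.
    The threshold is an arbitrary, hypothetical cut-off.
    """
    buckets = {"no_analysis": [], "analyzed": [], "highly_ambiguous": []}
    for utterance, count in parse_counts.items():
        if count == 0:
            # Candidate unanalyzed construction or unknown word form.
            buckets["no_analysis"].append(utterance)
        elif count > ambiguity_threshold:
            # Candidate underconstrained analysis.
            buckets["highly_ambiguous"].append(utterance)
        else:
            buckets["analyzed"].append(utterance)
    return buckets


def coverage_and_ambiguity(parse_counts):
    """Return (coverage in percent, average analyses per analyzed utterance)."""
    if not parse_counts:
        return 0.0, 0.0
    parsed = [count for count in parse_counts.values() if count > 0]
    coverage = 100.0 * len(parsed) / len(parse_counts)
    ambiguity = sum(parsed) / len(parsed) if parsed else 0.0
    return coverage, ambiguity


if __name__ == "__main__":
    counts = {"u1": 0, "u2": 1, "u3": 3, "u4": 20}  # made-up example data
    print(triage(counts))
    print(coverage_and_ambiguity(counts))
```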

3. What is the Grammar Matrix?

http://www.delph-in.net/matrix

• A repository of implemented analyses, including:

– A core grammar with analyses of general patterns such as semantic compositionality

– “Libraries” of analyses of cross-linguistically variable phenomena

• Accessible via a web-based questionnaire

• Customization system produces working HPSG grammars from typological descriptions

Bender et al. 2002, 2010

Figure: Grammar engineering process and outcomes. Diagram labels: Data; Implemented grammar; Parsing and treebanking; Utterances with no (correct) analysis; Correctly analyzed utterances; Utterances with spurious ambiguity; Identify unanalyzed constructions; Identify unknown words/word forms; Treebank (searchable); Identify underconstrained analyses; Refine implemented grammar.

Languages of grammars created with the customization system. Map image courtesy of Google Maps, location data courtesy of WALS.

Figure: Wambaya grammar development curve. x-axis: hours of development (40 to 200); y-axis: coverage (%) / ambiguity (0 to 100); series: coverage (%), ambiguity (avg).

4. Sample Treebank Queries

Show me all utterances with...:

• two overt arguments of the same verb

• an argument marked by preposition X

• a long-distance dependency crossing two or more clause boundaries

• an overt subject and an implicit object

• noun phrases with two or more modifiers on the same side of the head

• floated quantifiers
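Two of the queries above can be sketched over a deliberately simplified, hypothetical treebank representation. The `Argument`, `Verb`, and `Utterance` classes and the `query` helper are assumptions made for this illustration. They do not reflect the format of any actual treebank or treebank search tool, and "two overt arguments" is read here as "at least two".

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    role: str    # e.g. "subject" or "object"
    overt: bool  # False when the argument is implicit (not pronounced)


@dataclass
class Verb:
    lemma: str
    arguments: list = field(default_factory=list)


@dataclass
class Utterance:
    text: str
    verbs: list = field(default_factory=list)


def query(treebank, predicate):
    """Return the text of every utterance containing a verb that satisfies predicate."""
    return [u.text for u in treebank if any(predicate(v) for v in u.verbs)]


def two_overt_arguments(verb):
    # Interpreted as "at least two overt arguments of the same verb".
    return sum(arg.overt for arg in verb.arguments) >= 2


def overt_subject_implicit_object(verb):
    has_overt_subject = any(a.role == "subject" and a.overt for a in verb.arguments)
    has_implicit_object = any(a.role == "object" and not a.overt for a in verb.arguments)
    return has_overt_subject and has_implicit_object


if __name__ == "__main__":
    # Made-up English placeholder data, not drawn from any real treebank.
    treebank = [
        Utterance("the dog sees the cat",
                  [Verb("see", [Argument("subject", True), Argument("object", True)])]),
        Utterance("the dog eats",
                  [Verb("eat", [Argument("subject", True), Argument("object", False)])]),
    ]
    print(query(treebank, two_overt_arguments))
    print(query(treebank, overt_subject_implicit_object))
```

Queries like these are only possible because the treebank records the analysis, not just the string: an implicit object leaves no trace in the surface text.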

5. Why HPSG?

Grammar engineering requires precise, formalized analyses. HPSG (Head-driven Phrase Structure Grammar; Pollard and Sag 1994) is well-suited to grammar engineering in general and grammar engineering for language documentation in particular:

• Lexicalist framework: Most of the information is stored in the lexicon.

• Constraint-based: Linguistic knowledge can be applied in an order-independent fashion, in parsing and generation.

• Surface-oriented: HPSG analyses do not posit ancillary structures.

• Integrated: HPSG analyses map surface strings to semantic representations.

• Broad-coverage: HPSG grammars seamlessly incorporate both broad generalizations and idiosyncrasies; no core/periphery distinction.

• Data-driven: A more bottom-up approach to the exploration of linguistic universals.
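The order-independence mentioned under "Constraint-based" can be illustrated with a minimal sketch of unification over untyped feature structures, represented here as nested Python dictionaries. This is a simplification made for illustration: actual HPSG implementations use typed feature structures with a type hierarchy and structure sharing, none of which is modeled. The feature names and the `unify` function are hypothetical.

```python
from functools import reduce
from itertools import permutations


def unify(a, b):
    """Unify two feature structures; return None if they conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_value in b.items():
            if key in result:
                merged = unify(result[key], b_value)
                if merged is None:
                    return None  # conflicting values under the same feature
                result[key] = merged
            else:
                result[key] = b_value
        return result
    # Atomic values (or a failed earlier step) unify only if they are equal.
    return a if a == b else None


if __name__ == "__main__":
    # Three hypothetical constraints, e.g. from a lexical entry and two rules.
    constraints = [
        {"HEAD": {"POS": "verb"}},
        {"HEAD": {"AGR": {"PER": "3", "NUM": "sg"}}},
        {"HEAD": {"AGR": {"NUM": "sg"}}},
    ]
    # Applying the constraints in every possible order gives the same result.
    results = [reduce(unify, order) for order in permutations(constraints)]
    print(all(r == results[0] for r in results))
    print(results[0])
    # A conflicting constraint makes unification fail.
    print(unify(results[0], {"HEAD": {"AGR": {"NUM": "pl"}}}))
```

Because the combined result does not depend on the order in which constraints are applied, the same grammar can in principle serve both a parser and a generator, which traverse the constraints in different orders.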

6. Opportunities for Collaboration

Grammar engineering does not have to mean more work for the field linguist!

• We believe that the best model involves collaboration between field linguists and grammar engineers.

• The cost of grammar implementation is small compared to the overall cost of language documentation.

– In 210 hours of development (1/20th of the time Nordlinger spent on the original analysis), Bender (2008) was able to create a grammar that could assign appropriate analyses to 91% of the example sentences from Nordlinger 1998.

• In UW’s Linguistics 567 (taught annually in Winter quarter), each student develops an implemented grammar for a different language. Many of these students would be very interested in working with active field linguists.

7. Related work

Grammar engineering is not limited to HPSG! Here are some other multilingual grammar engineering projects:

• ParGram (LFG): http://pargram.b.uib.no

• CoreGram (HPSG): http://hpsg.fu-berlin.de/Projects/core.html

• OpenCCG (CCG): http://openccg.sourceforge.net

• PAWS (PC-PATR): http://carla.sil.org/paws.htm

8. Acknowledgments

This poster represents joint work with the Grammar Matrix development team, including: Dan Flickinger, Stephan Oepen, Scott Drellishak, Laurie Poulson, Antske Fokkens, Michael Wayne Goodman, Kelly O’Hara, Safiyyah Saleem, Joshua Hou and Daniel P. Mills.

This material is based upon work supported by the National Science Foundation under Grant No. 0644097. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. We gratefully acknowledge the support of the Utilika Foundation.

References

Emily M. Bender. 2008. Evaluating a crosslinguistic grammar resource: A case study of Wambaya. In Proceedings of ACL-08: HLT.

Emily M. Bender, Scott Drellishak, Antske Fokkens, Laurie Poulson, and Safiyyah Saleem. 2010. Grammar customization. Research on Language & Computation, pp. 1–50. 10.1007/s11168-010-9070-1.

Emily M. Bender, Dan Flickinger, and Stephan Oepen. 2002. The grammar matrix: An open-source starter-kit for the rapid development of cross-linguistically consistent broad-coverage precision grammars. In John Carroll, Nelleke Oostdijk, and Richard Sutcliffe, editors, Proceedings of the Workshop on Grammar Engineering and Evaluation at the 19th International Conference on Computational Linguistics, pp. 8–14. Taipei, Taiwan.

Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie, editors. 2008. The World Atlas of Language Structures Online. Max Planck Digital Library, Munich. http://wals.info.

Rachel Nordlinger. 1998. A Grammar of Wambaya, Northern Australia. Research School of Pacific and Asian Studies, The Australian National University, Canberra.

Carl Pollard and Ivan A. Sag. 1994. Head-Driven Phrase Structure Grammar. Studies in Contemporary Linguistics. The University of Chicago Press and CSLI Publications, Chicago, IL and Stanford, CA.

2nd International Conference on Language Documentation & Conservation, University of Hawai’i, February 11, 2011
