More Xkwic and Tgrep

LING 5200: Computational Corpus Linguistics
Martha Palmer
March 2, 2006

LING 5200, 2006. Based on Kevin Cohen’s LING 5200.

Resources

Laura is bugging me to make a CU Corpora page, like this one:

http://www.stanford.edu/dept/linguistics/corpora/cas-home.html

TGREP: http://www.stanford.edu/dept/linguistics/corpora/cas-tut-tgrep.html

Searching with pos tags and !

[word = "[tT]he" & !( pos = "DT" ) ]; wsj

[ !(word = "water" | pos = "NN")];
[ !(word = "water") & !( pos = "NN")];
[ word != "water" & pos != "NN" ];
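
The three "water" conditions above appear to be different spellings of the same test (De Morgan's law). A minimal Python sketch that checks this exhaustively, assuming a made-up two-word, two-tag vocabulary and plain string equality in place of CQP's pattern matching:

```python
from itertools import product

# Hypothetical mini-vocabulary; these values are assumptions for illustration.
WORDS = ["water", "rain"]
TAGS = ["NN", "VB"]

def q1(word, pos):
    # Mirrors [ !(word = "water" | pos = "NN") ]
    return not (word == "water" or pos == "NN")

def q2(word, pos):
    # Mirrors [ !(word = "water") & !(pos = "NN") ]
    return (not (word == "water")) and (not (pos == "NN"))

def q3(word, pos):
    # Mirrors [ word != "water" & pos != "NN" ]
    return word != "water" and pos != "NN"

# Compare the three conditions on every word/tag combination.
all_agree = all(
    q1(w, p) == q2(w, p) == q3(w, p)
    for w, p in product(WORDS, TAGS)
)
print(all_agree)
```

Only the token that is neither "water" nor tagged NN passes any of the three tests.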

Operator precedence

The precedence of the (logical) operators is given by the following list: if operator x is listed before operator y, x takes precedence over y. Operators are evaluated left to right.

=, !=, !, &, |

[ ! word = "water" & ! pos = "NN" ];

disambiguates as

[ !(word = "water") & !( pos = "NN")];
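
Python happens to rank its comparison, `not`, `and`, and `or` operators in the same relative order as the list above, so its parser can serve as a rough analogy. The sketch below parses the unparenthesized condition and inspects the grouping; it illustrates the idea in Python only and says nothing about how CQP itself is implemented.

```python
import ast

# Python spelling of [ ! word = "water" & ! pos = "NN" ]
expr = 'not word == "water" and not pos == "NN"'
tree = ast.parse(expr, mode="eval").body

# Loosest-binding operator ends up at the top of the tree: the "and".
top_is_and = isinstance(tree, ast.BoolOp) and isinstance(tree.op, ast.And)

# Each side of the "and" is a negation ...
operands_are_not = all(
    isinstance(v, ast.UnaryOp) and isinstance(v.op, ast.Not)
    for v in tree.values
)

# ... and each negation wraps a comparison, the tightest-binding operator.
innermost_are_comparisons = all(
    isinstance(v.operand, ast.Compare) for v in tree.values
)

print(top_is_and, operands_are_not, innermost_are_comparisons)
```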

Searching sequences with | and ?

"Bill" [pos = "NP"];

[pos = "NP"] [pos = "NP"] [pos = "NP"];

([pos = "NP"] [pos = "NP"]) | ([pos = "NP"] "of" [pos = "NP"]);

([pos = "NP"] "of"? [pos = "NP"]);

Note: First match applies
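
The last two queries above (alternation versus an optional "of") can be imitated with ordinary regular expressions. This is a rough sketch, assuming a hypothetical six-token sample flattened into `word/TAG` strings; Python's `re` module stands in for CQP here and may not share its matching strategy.

```python
import re

# Hypothetical tagged tokens; tag names follow the slides' use of "NP".
tokens = [("Bank", "NP"), ("of", "IN"), ("America", "NP"),
          ("hired", "VBD"), ("Bill", "NP"), ("Clinton", "NP")]

# Flatten to "word/TAG word/TAG ..." so plain regexes can be applied.
text = " ".join(f"{word}/{tag}" for word, tag in tokens)

NP = r"(?<!\S)\S+/NP(?!\S)"   # one whole token tagged NP
OF = r"of/\S+"                # the word "of" with any tag

# ([pos = "NP"] [pos = "NP"]) | ([pos = "NP"] "of" [pos = "NP"])
with_alternation = re.compile(rf"{NP} {NP}|{NP} {OF} {NP}")
# ([pos = "NP"] "of"? [pos = "NP"])
with_optional = re.compile(rf"{NP} (?:{OF} )?{NP}")

hits_alt = with_alternation.findall(text)
hits_opt = with_optional.findall(text)
print(hits_alt)
print(hits_opt)
```

On this sample both patterns find the same two name sequences, one with "of" and one without.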

Corpus Position: wild cards and contexts

"give" []* "up";
"give" []{0,5} "up";
"give" []* "up" within 7;
"Clinton" expand to 5;
"Clinton" expand left to 5;
"Clinton" expand right to 5;
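
To make the bounded wildcard in `"give" []{0,5} "up"` concrete, here is a small Python sketch over a list of tokens. The function name `find_with_gap` is hypothetical, and keeping only the nearest "up" for each "give" is a simplifying assumption, not a statement about CQP's matching strategy.

```python
def find_with_gap(tokens, first, last, min_gap=0, max_gap=5):
    """Return (start, end) index pairs where `first` is followed by `last`
    with between min_gap and max_gap arbitrary tokens in between."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        lo = i + 1 + min_gap
        hi = min(i + 1 + max_gap, len(tokens) - 1)
        for j in range(lo, hi + 1):
            if tokens[j] == last:
                hits.append((i, j))
                break  # assumption: keep only the nearest match per start
    return hits

sample = "they give it all up and give up".split()
print(find_with_gap(sample, "give", "up"))             # gap of 0 to 5 tokens
print(find_with_gap(sample, "give", "up", max_gap=1))  # gap of 0 to 1 tokens
```

Shrinking `max_gap` drops the match in which two words separate "give" from "up".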

Assignments and Intersect

Q1 = "rain"; Q2 = [pos="NN"]; intersect Q1 Q2;
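
One way to read the `intersect` above is as a set intersection over corpus positions: keep only the hits that both named queries found. A minimal sketch, assuming a hypothetical six-token tagged corpus and a helper `positions` that is not part of CQP:

```python
# Hypothetical tagged corpus; "rain" occurs once as a noun and once as a verb.
corpus = [("acid", "JJ"), ("rain", "NN"), ("may", "MD"),
          ("rain", "VB"), ("on", "IN"), ("land", "NN")]

def positions(corpus, condition):
    """Corpus positions of the tokens that satisfy `condition`."""
    return {i for i, token in enumerate(corpus) if condition(token)}

q1 = positions(corpus, lambda t: t[0] == "rain")   # Q1 = "rain"
q2 = positions(corpus, lambda t: t[1] == "NN")     # Q2 = [pos="NN"]
both = q1 & q2                                     # intersect Q1 Q2

# The same hits from one token test that combines both conditions.
combined = positions(corpus, lambda t: t[0] == "rain" and t[1] == "NN")
print(sorted(both), both == combined)
```

Only the noun use of "rain" survives the intersection.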

Q1 = [pos = "JJ"] [pos = "NN"];
Q2 = "acid" "rain";
intersect Q1 Q2;

[word = "acid" & pos = "JJ"] [word = "rain" & pos = "NN"]

Structural restrictions

"give" []* "up" within s;

("gain" []* "profit") | ("profit" []* "gain") within 3 s;

("gain" []* "profit") | ("profit" []* "gain") within article;

"Clinton" expand left to 2 s;

Defining structural restrictions

Nounphrase = [pos = "DT"] [pos = "JJ"] [pos = "NN"];

Nounphrase;

[pos = "JJ"]

Go back to select

For fun

<s> [pos = "V.*"][pos = "PN.*"] </s>

<s> []* [pos = "V.*"][pos = "PN.*"] </s>

( [pos = "V.*"] [pos = "PN.*"]) within s

Not a question, not beginning of sentence…

less is more

less <filename>
cat ??/* | less

Switches

SPACE - next screenful
b - previous screenful
/<reg exp pattern> - search for pattern (e.g. /RNR)
?<reg exp pattern> - search backwards for pattern
q - quit

Searching for a word

tgrep Halloween

What happens? Why don’t you have to specify a file?

babel> grep tgrep .cshrc

# tgrep stuff

#setenv TGREP_CORPUS /corpora/treebank2/tbl_075/tgrepabl/brwn_cmb.crp

setenv TGREP_CORPUS /corpora/treebank2/tgrepabl/wsj_mrg.crp

Count results:

tgrep research | wc -l
cat ??/* | grep Halloween | wc -l

Tgrep Switches

-a  Match on all patterns in a sentence
-w  Return the whole sentence
-n  Put the entire string on one line
-t  Print only the terminals

Viewing it in sentential context

tgrep -wn Halloween | more

tgrep -wn research | more (20,865 hits)

Can also use less

Viewing it in sentential context

tgrep -wn research | more

Searching by POS

tgrep NNS | more

Another way to do your sanity check

See more data?

tgrep NNS | grep . | more

Sentential context (again)

tgrep -wn NNS | more

Searching by syntactic constituent

tgrep NP | more

Single-line outputs

tgrep -n NP | more

Viewing tree-like output

tgrep -w NP | head -20

Searching for relations between nodes

tgrep 'NP < CC' | head -16

tgrep –g (whole language)

A < B - A immediately dominates B
A > B - A is immediately dominated by B
A << B - A dominates B
A >> B - A is dominated by B
A . B - A immediately precedes B
A .. B - A precedes B
A <<, B - B is the leftmost descendant of A
A <<' B - B is the rightmost descendant of A
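
The dominance relations can be tried out on a toy tree. The sketch below assumes a hand-built parse stored as `(label, children)` pairs; the helper names are hypothetical and only compare node labels, so this is not how tgrep itself is implemented.

```python
# A tiny hand-built parse tree as (label, children) pairs; words are leaves.
TREE = ("S", [
    ("NP", [("NNP", [("Clinton", [])])]),
    ("VP", [
        ("VBD", [("kicked", [])]),
        ("NP", [("DT", [("the", [])]), ("NN", [("ball", [])])]),
    ]),
])

def nodes(tree):
    """Yield every node in the tree, root first."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def immediately_dominates(a, b_label):
    """A < B: some child of node a is labeled b_label."""
    return any(child[0] == b_label for child in a[1])

def dominates(a, b_label):
    """A << B: some proper descendant of node a is labeled b_label."""
    return any(n[0] == b_label for child in a[1] for n in nodes(child))

def leftmost_descendant(a, b_label):
    """A <<, B: b_label lies on the leftmost path below node a."""
    node = a
    while node[1]:
        node = node[1][0]
        if node[0] == b_label:
            return True
    return False

def find(tree, a_label, relation, b_label):
    """Nodes labeled a_label that stand in `relation` to b_label."""
    return [n for n in nodes(tree) if n[0] == a_label and relation(n, b_label)]

print(len(find(TREE, "VP", immediately_dominates, "NP")))  # VP < NP
print(len(find(TREE, "S", immediately_dominates, "NN")))   # S < NN
print(len(find(TREE, "S", dominates, "NN")))               # S << NN
```

The S node dominates NN but does not immediately dominate it, which is the difference between `<<` and `<`.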

Alternation

Node names can be ORed, e.g.

tgrep 'Clinton|Gore' | head

Character classes

Regular expressions

tgrep '/[Cc]hild/' | egrep . | head

Working towards that weird example…

tgrep '/[Pp]resident/' | head

Combining alternation and a regular expression

tgrep 'Clinton|Gore|/[Pp]resident/' | head

Searching for a transitive verb

tgrep -w 'VP << like < NP << DT' | more

Verbs + Particles

tgrep -w 'VP << kick' > kick

tgrep 'VP << /kick.*/ <2 PRT' kick

tgrep 'VP <1 VB <2 PRT' kick

tgrep -nw 'VP <1 /VB.*/ <2 PRT' kick

tgrep 'VP <1 (VB < kick) <2 PRT' kick

tgrep 'VP <1 (/VB.*/ < kick) <2 PRT' kick
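
The last pattern, `VP <1 (/VB.*/ < kick) <2 PRT`, asks for a VP whose first child is a verb tag over "kick" and whose second child is PRT. A minimal Python sketch of that check on hand-built `(label, children)` trees; the sample trees and the name `is_verb_particle` are hypothetical, and `/VB.*/` is approximated as "label starts with VB".

```python
# Hypothetical VP subtrees as (label, children) pairs; words are leaves.
KICK_OFF = ("VP", [
    ("VB", [("kick", [])]),
    ("PRT", [("RP", [("off", [])])]),
    ("NP", [("DT", [("the", [])]), ("NN", [("season", [])])]),
])
KICK_BALL = ("VP", [
    ("VB", [("kick", [])]),
    ("NP", [("DT", [("the", [])]), ("NN", [("ball", [])])]),
])

def is_verb_particle(node, verb):
    """True if node is a VP whose 1st child is a VB* tag over `verb`
    and whose 2nd child is PRT."""
    label, kids = node
    if label != "VP" or len(kids) < 2:
        return False
    first, second = kids[0], kids[1]
    # <1 (/VB.*/ < verb): first child is a verb tag directly over the word.
    first_ok = first[0].startswith("VB") and any(
        child[0] == verb for child in first[1]
    )
    # <2 PRT: second child is a particle node.
    return first_ok and second[0] == "PRT"

print(is_verb_particle(KICK_OFF, "kick"))
print(is_verb_particle(KICK_BALL, "kick"))
```

The second tree fails because its second child is an NP rather than a PRT.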

