Dependency Parser for Swedish
Project for EDA171
by Jonas Pålsson and Marcus Stamborg
Dependency Grammar
Describes relations between the words in a sentence.
A relation holds between a head and its dependent(s).
Every word has a head except the root of the sentence.
Example: in "The big brown beaver", the head "beaver" has the dependents "The", "big" and "brown".
Dependency Parsing
Find the links that connect the words of a sentence using a computer. Different algorithms exist; Nivre's parser has reported the best results for Swedish.
Nivre's Parser
An extension of shift-reduce parsing that adds arcs between the input and the stack. It produces a dependency graph using the following actions:
Shift – pushes the next input word onto the stack.
Reduce – pops the stack.
Left arc – creates an arc from the input word to the stack top.
Right arc – creates an arc from the stack top to the input word.
For more about the actions, see Nivre, J. (2004).
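The four actions can be sketched as operations on a stack, a list of remaining input words and a set of arcs. This is an illustrative Python sketch, not the project's actual implementation; word indices stand in for the words themselves, and `arcs` maps each dependent to its head.

```python
# Minimal sketch of the four transitions in Nivre's parser.
# `stack` holds partially processed word indices, `inp` the remaining
# input indices; `arcs` maps each dependent index to its head index.

def shift(stack, inp, arcs):
    # Shift: push the next input word onto the stack.
    stack.append(inp.pop(0))

def reduce_(stack, inp, arcs):
    # Reduce: pop the stack (legal once the top already has a head).
    stack.pop()

def left_arc(stack, inp, arcs):
    # Left arc: the next input word becomes the head of the stack top,
    # which is then popped.
    arcs[stack.pop()] = inp[0]

def right_arc(stack, inp, arcs):
    # Right arc: the stack top becomes the head of the next input word,
    # which is then shifted onto the stack.
    arcs[inp[0]] = stack[-1]
    stack.append(inp.pop(0))
```

For the corpus example "Jag tycker det" (heads: 1→2, 3→2), the sequence Shift, Left arc, Shift, Right arc reproduces both arcs.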
Corpus
Talbanken05 – a modernized and computerized version of Talbanken76, modified for use in the CoNLL-X Shared Task.
The training set contains about 11,500 sentences; we used a test set of about 300 sentences.
Example from the corpus:
1  Jag     _  PO  PO  _  2  SS    _  _
2  tycker  _  VV  VV  _  0  ROOT  _  _
3  det     _  PO  PO  _  2  OO    _  _
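Each line of the corpus holds the ten tab-separated CoNLL-X fields (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). A minimal reader, shown here as an illustrative Python sketch rather than the project's code, only needs a few of them:

```python
# Sketch of reading one token line in CoNLL-X format.
# Fields: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL

def parse_token(line):
    fields = line.split()
    return {
        "id": int(fields[0]),    # token position in the sentence
        "form": fields[1],       # the word itself
        "pos": fields[4],        # fine-grained part-of-speech tag
        "head": int(fields[6]),  # position of the head (0 = root)
        "deprel": fields[7],     # dependency relation label
    }
```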
How we did it
Three steps: collect data, build a model, parse.
Pipeline: the ARFFBuilder turns the train corpus into data, the Trainer turns that data into a trained classifier, and the Parser uses the trained classifier to annotate the test corpus with relations.
Collect data – Gold Standard Parsing
Build a Weka-compatible data file (arff). The action sequence can be determined from an annotated corpus (gold standard parsing) using the following rules:
If the input word has the stack top as head -> Right Arc
else if the stack top has the input word as head -> Left Arc
else if an arc exists between the input word and any word in the stack -> Reduce
else -> Shift
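The rules above can be sketched as an oracle function. This is an illustrative Python sketch under the stated rules, not the project's actual code; `heads` maps each dependent to its gold head, taken from the annotated corpus.

```python
# Sketch of the gold-standard oracle: given the gold heads
# (dependent -> head), the current stack and the next input word,
# pick the parser action that reproduces the annotated tree.

def oracle(stack, next_word, heads):
    if stack and heads.get(next_word) == stack[-1]:
        return "right-arc"  # stack top is the head of the input word
    if stack and heads.get(stack[-1]) == next_word:
        return "left-arc"   # input word is the head of the stack top
    if any(heads.get(next_word) == w or heads.get(w) == next_word
           for w in stack):
        return "reduce"     # a gold arc links the input word to a word
                            # deeper in the stack, so pop the top
    return "shift"
```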
Train classifier
Weka 3 – data mining software.
C4.5 (J48) – an extension of the ID3 algorithm that generates decision trees.
Uses features derived from the current state of the parser.
Outputs a trained classifier used by the parser to decide the next action.
Parse using trained classifier
Uses the trained classifier to determine the head for each word in a sentence.
Uses Nivre's algorithm with each action decided by the classifier.
Calculates the score as (number of words assigned the correct head) / (total number of words).
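The score is the fraction of words that receive their correct head (unlabelled attachment). A one-function sketch of the formula, with illustrative names:

```python
# Sketch of the scoring formula: the fraction of words whose
# predicted head matches the gold head. Both arguments map each
# word position to a head position.

def attachment_score(gold_heads, predicted_heads):
    correct = sum(1 for word, head in gold_heads.items()
                  if predicted_heads.get(word) == head)
    return correct / len(gold_heads)
```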
Features
All features describe the current state of the parser:
1st set – input and stack.
2nd set – input, stack and children.
3rd set – input, stack and the previous input word.
4th set – input, stack, children and the previous input word.
We used only POS tags in the feature sets; using lexical values actually decreased performance.
For every set we used constraints to model which actions are valid in the current state of the parser.
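A POS-only feature vector over a window of stack and input words might be built as follows. This is a hypothetical sketch of the idea (names, padding value and layout are assumptions, not the project's code); the results tables below vary the number of stack and input words in exactly this way.

```python
# Sketch of building a POS-only feature vector from the parser state:
# the POS tags of the top n_stack stack words and the first n_input
# input words, padded with "nil" when the state is shallower.

def extract_features(stack_pos, input_pos, n_stack, n_input,
                     prev_input_pos=None):
    def pad(tags, n):
        return (tags + ["nil"] * n)[:n]

    # Stack features are taken top-first.
    features = pad(list(reversed(stack_pos)), n_stack)
    features += pad(input_pos, n_input)
    if prev_input_pos is not None:
        # The 3rd and 4th feature sets add the previous input word's POS.
        features.append(prev_input_pos)
    return features
```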
Results
Scores using features: Stack_n_POS, Input_n_POS, Children

Stack\Input       1       2       3       4       5       6
     1       0.7161  0.8007  0.7972  0.7967  0.8036  0.8064
     2       0.7268  0.8078  0.8055  0.8094  0.8136  0.8129
     3       0.7275  0.8066  0.8076  0.8098  0.8129  0.8131
     4       0.7300  0.8057  0.8076  0.8094  0.8096  0.8091
     5       0.7309  0.8073  0.8071  0.8096  0.8101  0.8097
     6       0.7307  0.8064  0.8071  0.8089  0.8092  0.8094
Scores using features: Stack_n_POS, Input_n_POS

Stack\Input       1       2       3       4       5       6
     1       0.6936  0.7765  0.7804  0.7801  0.7779  0.7806
     2       0.7297  0.7937  0.7970  0.7961  0.7958  0.7946
     3       0.7300  0.7933  0.7963  0.7958  0.7940  0.7944
     4       0.7309  0.7940  0.7967  0.7972  0.7960  0.7953
     5       0.7327  0.7944  0.7974  0.7984  0.7969  0.7960
     6       0.7313  0.7940  0.7972  0.7986  0.7965  0.7960
Results cont.
Scores using features: Stack_n_POS, Input_n_POS, Children, Previous_Input_POS

Stack\Input       1       2       3       4       5       6
     1       0.7242  0.8022  0.8055  0.8052  0.8046  0.8050
     2       0.7558  0.8156  0.8168  0.8179  0.8174  0.8182
     3       0.7580  0.8152  0.8186  0.8184  0.8174  0.8184
     4       0.7581  0.8158  0.8177  0.8184  0.8172  0.8175
     5       0.7594  0.8167  0.8182  0.8186  0.8174  0.8177
     6       0.7574  0.8161  0.8181  0.8177  0.8165  0.8172
Scores using features: Stack_n_POS, Input_n_POS, Previous_Input_POS

Stack\Input       1       2       3       4       5       6
     1       0.7210  0.7999  0.8004  0.8002  0.8062  0.8076
     2       0.7279  0.8064  0.8068  0.8108  0.8110  0.8142
     3       0.7283  0.8068  0.8068  0.8101  0.8136  0.8138
     4       0.7307  0.8068  0.8089  0.8106  0.8108  0.8105
     5       0.7316  0.8068  0.8075  0.8103  0.8114  0.8114
     6       0.7344  0.8064  0.8076  0.8101  0.8106  0.8108
Conclusions
Lexical values didn't help much; the score even became worse. They might work better with a different classifying algorithm or a different test corpus.
The previous input word was a very effective feature, probably the single best addition over stack and input alone.
It is difficult to find the optimal feature set.
Future improvements
Try other features: siblings, LEX on specific words, more words from the original input string.
Run simulations to find the optimum feature set.
Use an SVM instead of C4.5.
Thank you for listening
More to come in the report