Introduction: 2-Layer Factorial CRF Model

ACL 2013, August, Sofia, Bulgaria

Aobo Wang and Min-Yen Kan
Web IR / NLP Group, Interactive and Digital Media Institute
{wangaobo,kanmy}@comp.nus.edu.sg

2-Layer Factorial CRF Model

Graphical representations of the two types of CRFs used in this work. y_t denotes the 1st-layer label, z_t denotes the 2nd-layer label, and x_t denotes the observation sequence.
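The caption above can be made concrete with the factorization a 2-layer factorial CRF typically uses (a sketch under assumed notation; the poster itself does not state the formula): per position t, one transition factor within each layer plus one cross-layer pairwise factor.

```latex
p(\mathbf{y}, \mathbf{z} \mid \mathbf{x}) \;\propto\;
\prod_{t} \Phi(y_t, y_{t+1}, \mathbf{x})\,
          \Psi(z_t, z_{t+1}, \mathbf{x})\,
          \Lambda(y_t, z_t, \mathbf{x})
```

Here $\Lambda$ is the pairwise factor tying the segmentation layer $\mathbf{y}$ to the formality layer $\mathbf{z}$ at each position.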

- We propose to jointly model the two tasks of informal word recognition (IWR) and Chinese word segmentation (CWS)
- Informal words in Chinese are difficult to recognize (shown in Figure 1) because they:
  - are not indicated by word delimiters
  - consist of a mix of numbers, alphabetic letters, and Chinese characters

Introduction

“The song is koo, doesnt really showcase anyones talent though.”

    koo → cool    doesnt → doesn’t    anyones → anyone’s    (recoverable with a spelling checker)

“排n久连硬座都木有了” (“Queued for so long, and even the hard-seat tickets were gone.”)

    n久 → 很久 (“a long time”)    木有 → 没有 (“do not have”)

While tools like spell checking may work to link informal English words to their formal counterparts, they don’t work for Chinese microtext (“tweets” / Weibo posts).

Problem Formalization

- Incorrect segmentation (in blue rectangles) is caused by informal words (in orange rectangles)
- Segmentations of neighboring words help recognize informal words
- CWS and IWR are mutually dependent
- Formulate the joint problem as a 2-layer sequential labelling task

A Chinese microtext (in squares) with labels (in circles). F/IF indicates whether a character is part of a formal/informal word; BIES is the widely-used coding scheme for segmentation.
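As an illustration, the two label layers for the earlier Weibo example can be written out explicitly (a minimal sketch of our own; these character-level tags are a reconstruction, not copied from the poster's figure):

```python
# 2-layer character labelling for "排n久连硬座都木有了":
# layer 1 uses BIES segmentation tags, layer 2 uses F (formal) / IF (informal).
chars = list("排n久连硬座都木有了")
seg = ["S", "B", "E", "S", "B", "E", "S", "B", "E", "S"]      # BIES layer
frm = ["F", "IF", "IF", "F", "F", "F", "F", "IF", "IF", "F"]  # F/IF layer

def bies_to_words(chars, tags):
    """Recover the word segmentation from a BIES tag sequence."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on E (end) or S (single)
            words.append(current)
            current = ""
    return words

print(bies_to_words(chars, seg))  # ['排', 'n久', '连', '硬座', '都', '木有', '了']
```

Reading the two layers jointly, the informal words n久 and 木有 are exactly the spans whose characters carry IF tags.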

https://www.comp.nus.edu.sg/~wangaobo/ACL13_Poster.pdf

Experiment Results

FCRF versus baselines on CWS. ‘‡’ (‘*’) indicates statistical significance at p < 0.001 (0.05) compared with the previous row.

CWS                        Pre      Rec      F1       OOVR
HHMM (ICTCLAS, 2011)       0.640    0.767    0.698    0.551
LCRF (Sun and Xu, 2011)    0.661‡   0.691‡   0.675    0.572‡
LCRF_iwr → LCRF_cws        0.741‡   0.775‡   0.758*   0.607*
FCRF                       0.757‡   0.801‡   0.778*   0.633*

- Microtext is difficult to segment
- CWS benefits significantly from the results of IWR
- Joint inference works best

IWR                        Pre      Rec      F1
SVM (Xia and Wong, 2008)   0.382    0.621    0.473
DT                         0.402*   0.714*   0.514*
LCRF_cws → LCRF_iwr        0.858‡   0.591‡   0.699‡
FCRF                       0.877*   0.655*   0.750*

FCRF versus baselines on IWR. ‘‡’ (‘*’) indicates statistical significance at p < 0.001 (0.05) compared with the previous row.

- SVM and DT tend to over-predict informality
- The IWR task is improved significantly by the CWS task
- Joint inference again is most effective

- Still room for improving CWS with better IWR
- FCRF makes significant progress towards the upper bound (UB)

- Again, IWR can be further improved with better CWS
- CWS enables IWR to make more predictions

Upper bound systems versus their counterparts on IWR.
Upper bound systems versus their counterparts on CWS.

             CWS (F1)   IWR (F1)
FCRF-new     0.690      0.552
FCRF         0.778*     0.748*

Feature set evaluation. FCRF-new refers to the system without the novel features we introduced, which are marked with “*”.

- Lexical Features
- Dictionary-based Features*
- Statistical Features*

          CWS (F1)   IWR (F1)
SVM       ──         0.473
SVM-JC    0.711      0.624‡
FCRF      0.778*     0.748*

FCRF versus Adapted SVM for Joint Classification (SVM-JC). SVM-JC classifies input into the space of the cross-product of the 2-layer labels.

- Over-prediction is lessened
- FCRF is still more effective

- FCRF:
  - introduces a pairwise factor among different variables at each position
  - captures the joint distribution among layers
- Compared with LCRF:
  - FCRF has fewer parameters
  - FCRF needs less training data
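The parameter-count claim can be illustrated with a back-of-envelope sketch of our own (not from the poster), counting only transition/pairwise parameters for the assumed label sets |BIES| = 4 and |{F, IF}| = 2:

```python
# Transition-parameter counts: an FCRF keeps one chain per layer plus one
# cross-layer pairwise factor, while a cross-product chain (SVM-JC-style
# composite labels) transitions over |Y|*|Z| joint states.
Y, Z = 4, 2                   # |BIES| = 4, |{F, IF}| = 2
cross_product = (Y * Z) ** 2  # composite-label chain transitions
fcrf = Y * Y + Z * Z + Y * Z  # y-chain + z-chain + y-z pairwise factor
print(cross_product, fcrf)    # prints: 64 28
```

With fewer parameters to estimate, the same training set constrains the FCRF more tightly, which is one intuition behind "needs less training data".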

Error Analysis

- Partially-observed informal words: “狠” (“很”, “very”) is a known informal word; “狠久” (“很久”, “for a long time”) is informal

- Extremely short sentences: “肥家!太累了。。。” (“回家!太累了。。。”, “Go home! Exhausted.”)

  - The informal word itself forms a short sentence
  - The two sentences are pragmatically related
  - But the lexical dependency is weak

- Freestyle Chinese Named Entities

Freestyle Named Entity   Explanation
“榴莲雪媚娘”             “榴莲” (“durian”), “雪” (“snow”), “媚娘” (“charming lady”)
“棉宝”                   short for the cartoon name “海绵宝宝” (“SpongeBob SquarePants”)
“dj文祥”, “徐pp”         usernames mixing Chinese and alphabetic characters

Conclusion

- We evaluate our method on a manually constructed data set with crowdsourced annotation
- The FCRF model yields significantly better performance than individual or sequential solutions
- We introduced novel features that improve performance significantly
- Upper bound systems validate the necessity and effectiveness of modeling the two tasks jointly
