Meltwater Meetup Budapest - 7 Sep. 2016
Structured Aspect Extraction
Omer Gunes and Tim Furche (University of Oxford)
Giorgio Orsi (University of Birmingham)
Aspect Extraction (AE)
Identifying relevant features of an explicit or implicit entity of interest
The Sony Xperia XZ is the new headliner with top-of-the-line hardware, a bigger display, a new and improved camera, squared design, and, of course, water-proofing.
Sony Xperia XZ
Entity (explicit) Aspects
new headliner
top-of-the-line hardware
bigger display
new and improved camera
squared design
water-proofing
[Zhang and Liu, 2014]
Sentiment Analysis
Aspect (entity) based
The Sony Xperia XZ is the new headliner with top-of-the-line hardware, a bigger display, a new and improved camera, squared design, and, of course, water-proofing.
[Slide graphic: per-aspect sentiment scores for the Sony Xperia XZ, e.g., water-proofing 0.218, course 0.341, and scores ranging from 0.350 to 0.641 for the remaining aspects.]
⟨ headliner, yes ⟩
⟨ hardware, top-of-the-line ⟩
⟨ display, { yes, bigger } ⟩
⟨ camera, { yes, new, improved } ⟩
⟨ design, squared ⟩
⟨ water-proofing, yes ⟩
The Sony Xperia XZ is the new headliner with top-of-the-line hardware, a bigger display, a new and improved camera, squared design, and, of course, water-proofing.
Aspect extraction vs attribute extraction
Knowledge Base Construction
Basically, you want the attribute (i.e., aspect term) names and factual values
⟨ OEM, Sony ⟩
⟨ model, Xperia XZ ⟩
[Shin et al., 2015]
Structured Aspect Extraction (SAE)
Victorian two bedroom mid terrace property
Extends AE with fine-grained extraction and typing of complex (i.e., hierarchical) aspects
Aspect term extraction (ATE): Victorian two bedroom mid terrace property
Segmentation: ⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, property ⟩
Typing and Generalisation: ⟨ { JJ, ⟨ { CD }, bedroom ⟩, mid terrace }, property ⟩
modifiers = {qualifiers, quantifiers}
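The paper does not prescribe a concrete representation for these hierarchical aspects, but the ⟨ modifiers, aspect ⟩ notation above maps naturally onto a small tree-shaped structure. A minimal Python sketch (class and field names are illustrative assumptions, not the authors' API):

```python
# Hypothetical in-memory representation of a structured aspect:
# an aspect term with qualifier/quantifier modifiers and nested aspects.
from dataclasses import dataclass, field

@dataclass
class SAP:
    aspect: str                                       # head noun or POS tag, e.g. "property" or "NN"
    qualifiers: list = field(default_factory=list)    # e.g. ["Victorian", "mid terrace"]
    quantifiers: list = field(default_factory=list)   # e.g. ["two"]
    nested: list = field(default_factory=list)        # nested SAPs, e.g. the bedroom aspect

# ⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, property ⟩
bedroom = SAP("bedroom", quantifiers=["two"])
prop = SAP("property", qualifiers=["Victorian", "mid terrace"], nested=[bedroom])
```

Nesting the bedroom pattern inside the property pattern mirrors the hierarchical aspect in the example.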
SAE: Why it is hard
Victorian two bedroom mid terrace property located in Cambridge and comprising of living room with ORIGINAL!!! cupboards, and ORIGINAL!!! picture rail.Stairway off living room leads to two bedrooms.
Noisy unstructured text (NUT)
[Slide graphic: word cloud of candidate aspect terms extracted from the noisy text, mixing valid terms (property, bedrooms, cupboards, room, Victorian, Cambridge) with broken tokens such as picture rail.Stairway, Cambridgerail.Stairway, and ORIGINALproperty.]
SAE: Why it is hard
Noisy unstructured text (NUT)
By the time we get to the dependency parser we have lost the battle already
The problems start with the tokenizer
picture rail.Stairway
Victorian two bedroom mid terrace property located in Cambridge and comprising of
living room with ORIGINAL !!! cupboards, and ORIGINAL !!! picture rail.Stairway off living
room leads to two bedrooms.
and continue with the POS tagger
[Slide graphic: the example sentence overlaid with the (partly incorrect) POS tags NN, VBN, NNP, JJ, CD, VBG, VBZ, and CC produced by the tagger on the noisy input.]
Unsupervised SAE
Large corpus of homogeneous documents (50k ~ 250k)
same domain (use a classifier), preferably no bundles
Normalisation and tagging
tokenisation (NUT specific)
orthography normalisation (most common orthography)
POS tagging (Hepple's, trained on the Penn TreeBank)
NP chunking (Ramshaw and Mitchell)
NP Clustering
head noun lemmatization (approx. the last noun in the NP)
frequent head nouns → aspect terms
Segmentation
cPMI-optimal parsing of an NP → modifiers / multi-word expressions
Generalisation and typing
structured aspect patterns (SAP): entity, aspect term, qualifier, quantifier
NP Clustering
Two further double bedrooms
Three further double bedrooms
A further double bedroom
Two first floor bedrooms
…
Input: A large number of (normalized) NPs
Abstraction of numerical expressions + removal of non-content word prefixes
CD further double bedrooms CD further double bedrooms DT further double bedroom CD first floor bedrooms
{ CC, DT, EX, IN, PRP, PUNC }
Filter head nouns (threshold experimentally set to cover 70-75% of the corpus) and cluster them
Damerau-Levenshtein distance to compensate for misspellings
{ CD further double bedrooms further double bedroom CD first floor bedrooms }
[ bedroom ]
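The normalisation and clustering steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: NPs are assumed to arrive already POS-tagged, the head noun is approximated as the last token, and the lemmatizer is a crude plural-stripper.

```python
# Sketch: generalize NPs (numeric expressions -> POS tag, drop non-content
# prefixes) and cluster them around their lemmatized head nouns.
from collections import defaultdict

NON_CONTENT = {"CC", "DT", "EX", "IN", "PRP", "PUNC"}

def lemmatize_head(noun):
    # Crude plural normalization; a real system would use a lemmatizer.
    return noun[:-1] if noun.endswith("s") else noun

def generalize(np):
    """np: list of (token, pos). Replace numerical expressions (CD) with
    their POS tag and drop a leading run of non-content words."""
    toks = [pos if pos == "CD" else tok for tok, pos in np]
    poss = [pos for _, pos in np]
    i = 0
    while i < len(toks) - 1 and poss[i] in NON_CONTENT:
        i += 1
    return toks[i:]

def cluster(nps):
    """Cluster generalized NPs around their lemmatized head noun
    (approximated as the last token of the NP)."""
    clusters = defaultdict(set)
    for np in nps:
        g = generalize(np)
        clusters[lemmatize_head(g[-1])].add(" ".join(g))
    return dict(clusters)

nps = [
    [("Two", "CD"), ("further", "JJ"), ("double", "JJ"), ("bedrooms", "NNS")],
    [("Three", "CD"), ("further", "JJ"), ("double", "JJ"), ("bedrooms", "NNS")],
    [("A", "DT"), ("further", "JJ"), ("double", "JJ"), ("bedroom", "NN")],
    [("Two", "CD"), ("first", "JJ"), ("floor", "NN"), ("bedrooms", "NNS")],
]
print(cluster(nps))
```

On the four example NPs this yields a single cluster headed by bedroom, as in the slide.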
Segmentation
Victorian two bedroom mid terrace property
Basically, we have to assign each element of the NP's modifiers to either a multi-word expression or an aspect term, and find sub-patterns
⟨ Victorian ⟨ two bedroom mid ⟩ ⟨ terrace ⟩ property ⟩
⟨ Victorian ⟨ two bedroom ⟩ ⟨ mid terrace ⟩ property ⟩
⟨ Victorian ⟨ two bedroom ⟩ mid terrace property ⟩
Valid parenthesizations
balanced parenthesization (dynamic programming, as in algorithms and data structures)
for each level k of the parenthesization:
it has at least two elements
it either terminates with a head of cluster or it contains no head of cluster
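Ignoring the minimum-size constraint and deeper recursion, the enumeration of candidate groupings and the cluster-head constraint can be sketched as below. Function and variable names are illustrative; the real method builds full nested parenthesizations rather than the single level shown here.

```python
# Sketch: enumerate one level of groupings of an NP's modifiers and keep
# only those where every multi-token block either ends with a known
# cluster head or contains no cluster head at all.
def partitions(tokens):
    """All contiguous partitions of a token list."""
    if not tokens:
        yield []
        return
    for i in range(1, len(tokens) + 1):
        for rest in partitions(tokens[i:]):
            yield [tokens[:i]] + rest

def valid_groupings(np, heads):
    *mods, head = np  # the NP's own head noun is kept as the last element
    out = []
    for part in partitions(mods):
        ok = all(
            len(b) == 1                       # single tokens are unconstrained
            or b[-1] in heads                 # block terminates with a cluster head
            or not any(t in heads for t in b)  # or contains no cluster head
            for b in part
        )
        if ok:
            out.append(part + [[head]])
    return out

heads = {"bedroom", "property"}
np = ["Victorian", "two", "bedroom", "mid", "terrace", "property"]
for g in valid_groupings(np, heads):
    print(g)
```

For the running example, ⟨ two bedroom ⟩ survives (it terminates with the cluster head bedroom) while ⟨ two bedroom mid ⟩ is rejected (it contains a head but does not end with one).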
Segmentation
cPMI-optimal parenthesizations
Adaptation of corpus-wide Point-wise Mutual Information (cPMI)
missing the sentence boundary. The POS tagger cannot therefore produce the correct tags for Don't. The second sentence shows how POS tagging improves after our improved tokenization and orthographic normalization. We do not only produce more sensible POS tags, but we also recover the correct sentence boundaries.
Normalized texts are then split into sentences and POS tagged. We use a state-of-the-art rule-based sentence splitter and Hepple's POS tagger (Hepple, 2000) trained on the Penn TreeBank corpus. The tagged sentences are handed over to a rule-based NP chunker (Ramshaw and Mitchell, 1999) to produce a corpus of noun phrases.
Noun-phrase clustering NPs are clustered around their head nouns. Head nouns are stemmed and lemmatized, e.g., by normalizing plurals, to avoid over-segmentation of the clusters due to different surface forms of equivalent head nouns.
The modifiers of the NPs are also normalized to prevent non-content prefixes and numerical expressions from fragmenting the clusters.
NPs are generalized to abstract forms where non-content words and numerical expressions are replaced by POS tags. A non-content word is a token with a POS tag in {CC, DT, EX, IN, PRP, PUNC}, while a numerical expression has POS tag CD. Prefixes consisting only of non-content words are then removed from the NPs. The process is illustrated in Example 3.
Example 3: Clustering

Two further double bedrooms   → CD further double bedrooms
Three further double bedrooms → CD further double bedrooms
A further double bedroom      → DT further double bedroom → further double bedroom
Two first floor bedrooms      → CD first floor bedrooms

Cluster [BEDROOM] = { CD further double bedrooms, further double bedrooms, CD first floor bedrooms }
We start with four NPs with 2 different surface forms for the head noun (i.e., bedroom and bedrooms). For three of the NPs, the modifier starts with a numerical expression. This remains part of the NP but as a POS tag (CD). The non-content word prefix A (POS tag DT) is removed from the NP. The result of this process is a single cluster headed by BEDROOM and consisting of three partially-generalized NPs. The clusters are filtered based on their cardinalities, i.e., the number of (possibly duplicated) NPs belonging to the cluster. The top-k clusters whose elements cover at least a certain percentage (experimentally set at 70%) of all the NPs are retained. Clusters can also be fragmented by spelling errors. The head nouns of discarded clusters are therefore checked for similarity against the head nouns of the retained ones. Two clusters are merged if the Damerau-Levenshtein string edit distance is less than an experimentally set threshold. If multiple merging options are possible, the one with the highest similarity is chosen. If only equivalent options are available, we merge in all possible ways. Clearly, the normalization of the noun-phrase modifiers described above can affect the ranking of the noun-phrases, since clusters can be merged, thus increasing their ranking.
Segmentation and typing The segmentation phase identifies multi-word expressions and hierarchical structures in NP modifiers, thus producing a first approximation of an SAP. The key tool used in the segmentation is corpus-level significant point-wise mutual information (cPMI) (Damani and Ghonge, 2013). Our definition of cPMI uses the corpus of NPs instead of arbitrary descriptions. Let C be the set of all clusters produced as described above. We denote by f_C(t) the frequency of the string t in all clusters of C, i.e., obtained by summing up all of the occurrences of t in all clusters. Let 0 < δ < 1 be the normalization factor defined as in (Damani and Ghonge, 2013), and t‖w the concatenation of two strings t and w. We then define cPMI_C(t,w) as follows:
$$\mathrm{cPMI}_C(t,w) \;=\; \log \frac{f_C(t \,\|\, w)}{\dfrac{f_C(t)\, f_C(w)}{|C|} \;+\; \sqrt{f_C(t)}\,\sqrt{\dfrac{\ln \delta}{-2}}}$$
The cPMI value is used to determine whether a token should be associated with (i) the head noun, (ii) a nested token representing the head of a different cluster, thus possibly inducing a nested structure, or (iii) an adjacent token, thus forming a multi-word expression.
⟨ Victorian ⟨ two bedroom ⟩ ⟨ mid terrace ⟩ property ⟩
Parenthesization that maximises cPMInp becomes a (ground) structured aspect pattern (SAP)
⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, property ⟩
cPMInp = cPMIC (Victorian, property) + cPMIC (two bedroom, property) + cPMIC (mid terrace, property) + cPMIC (two, bedroom) + cPMIC (mid, terrace)
[Damani and Ghonge 2013]
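A minimal sketch of the cPMI computation, assuming the corpus-level significant PMI form of Damani and Ghonge (2013); the delta value and the toy frequencies are illustrative assumptions, not values from the paper:

```python
import math

def cpmi(f_tw, f_t, f_w, n, delta=0.9):
    """Corpus-level significant PMI: f_t and f_w are corpus frequencies of
    the two strings, f_tw the frequency of their concatenation, n the
    corpus size, and 0 < delta < 1 the normalization factor."""
    denom = (f_t * f_w) / n + math.sqrt(f_t) * math.sqrt(math.log(delta) / -2)
    return math.log(f_tw / denom)

# A frequently co-occurring pair scores higher than a rarely co-occurring
# one with the same marginal frequencies (toy numbers):
print(cpmi(f_tw=50, f_t=100, f_w=100, n=10_000))
print(cpmi(f_tw=5, f_t=100, f_w=100, n=10_000))
```

The correction term under the square root penalises pairs whose co-occurrence count is not statistically significant, which is what distinguishes cPMI from plain PMI.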
Typing and Generalisation
⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, property ⟩
Given a (ground) SAP…
Victorian → property-qualifier
two bedroom → property-qualifier
mid terrace → property-qualifier
property → property
two → bedroom-quantifier
bedroom → property (nested aspect)
Typing and Generalisation
Ground SAPs have good precision but pretty bad recall
POS-based pattern generalization
non-content words are always generalized
aspect terms are generalized only if a nested pattern with a ground head exists
qualifiers are generalized one at a time
⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, property ⟩
⟨ { JJ, ⟨ { CD }, bedroom ⟩, mid terrace }, property ⟩
⟨ { Victorian, ⟨ { CD }, bedroom ⟩, JJ terrace }, property ⟩
⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid JJ }, property ⟩
⟨ { Victorian, ⟨ { two }, bedroom ⟩, JJ }, property ⟩
⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, NN ⟩
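The one-at-a-time qualifier generalization can be sketched as below. This is a simplification: "mid terrace" is treated as a single qualifier element with an illustrative JJ tag, only quantifiers are shown as always-generalized, and the aspect-term generalization step (which requires a nested ground-headed pattern) is omitted.

```python
# Sketch: generate generalized SAP variants by replacing one qualifier at
# a time with its POS tag; quantifiers are always replaced.
def generalizations(elements):
    """elements: list of (surface, pos, kind) with kind in
    {"qualifier", "quantifier", "aspect"}."""
    base = [pos if kind == "quantifier" else surf
            for surf, pos, kind in elements]
    variants = []
    for i, (surf, pos, kind) in enumerate(elements):
        if kind == "qualifier":
            v = list(base)
            v[i] = pos  # generalize exactly one qualifier per variant
            variants.append(v)
    return variants

elems = [("Victorian", "JJ", "qualifier"), ("two", "CD", "quantifier"),
         ("bedroom", "NN", "aspect"), ("mid terrace", "JJ", "qualifier"),
         ("property", "NN", "aspect")]
for v in generalizations(elems):
    print(v)
```

The first variant corresponds to ⟨ { JJ, ⟨ { CD }, bedroom ⟩, mid terrace }, property ⟩ from the slide.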
no labelled dataset is available. We take the heads of the noun-phrase clusters as a surrogate of the set of valid aspects. The analysis is limited to aspect terms. Let T be the set of valid aspect terms as defined above, and E be the set of aspect terms produced by an SAP P. The score of P is computed as:

$$\nu(P) \;=\; \frac{\sum_{e \in E} \mathbb{1}\!\left( \max_{t \in T} \left( \frac{\mathrm{dist}(t,e)}{\mathrm{len}(t)} < 0.2 \right) \right)}{|T| \cdot \log |T|}\,, \qquad \nu(P) \in [0,1]$$

where dist(t, e) denotes the Damerau-Levenshtein edit distance between two strings t and e and len(·) denotes the length of the string. Patterns scoring less than an experimentally set threshold are eliminated.
3 Evaluation

Our method (SysName) is implemented in Java. All experiments are run on a Dell OptiPlex 9020 with two quad-core i7-4770 Intel CPUs at 3.40GHz and 32GB RAM, running Linux Mint 17 Qiana. All resources used in the evaluation are made available for replicability.2

Datasets and metrics We use three groups of datasets in our evaluation (Table 1): The first two consist of the SemEval14 and SemEval15 datasets used for the aspect term extraction (ATE) and opinion target expression (OTE) subtasks of the aspect-based sentiment analysis (ABSA) task. The datasets provide laptops, restaurants and hotel reviews with associated gold standard (GS) annotations. The hotel domain is meant to be used in a completely unsupervised setting and therefore no training data was made available. The size of SemEval14 in Table 1 is expressed in number of sentences instead of number of texts since this information is unavailable. We complement the SemEval15 datasets with some specifically designed for SAE (SAED). We provide texts from six domains. Four of them, i.e., chairs, real estate, shoes, and watches, describe products. The Amazon texts come from the Stanford Snap Lab's web data corpus (McAuley and Leskovec, 2013). The two remaining domains, i.e., hotels and restaurants, can be classified as services. These descriptions are still feature intensive but, differently from the products, the features are loosely connected to the main entity, i.e., locations, services/facilities offered. The dataset consists of both NUT and (semi-) formal English texts. We provide GS annotations for 150 texts equally distributed across the six domains. The GS provides an average of 355 aspect terms, 30 quantifiers, 430 qualifiers, and 45 nested aspects per domain. Annotations were produced by 6 independent annotators (κ = 87%). We use standard recall, precision, and F1 score metrics. However, due to the different granularity of the output produced by the systems and of the GS annotations, the definition of a correct extraction varies slightly with each evaluation task.

Table 1: Datasets

DATASET    DOMAIN       SIZE (#texts)    SOURCES          CATEGORY  FORMALITY    TYPE
SemEval14  restaurants  3k + 800 GS (*)  Citysearch       service   NUT          evaluative
SemEval14  laptops      3k + 800 GS (*)  N/A              product   NUT          evaluative
SemEval15  restaurants  254 + 96 GS      Citysearch       service   NUT          evaluative
SemEval15  hotels       N/A + 30 GS      Citysearch       service   NUT          evaluative
SAED       chairs       94k + 25 GS      Amazon, GumTree  product   NUT          descriptive
SAED       hotels       20k + 25 GS      TripAdvisor      service   formal       descriptive
SAED       real estate  87k + 25 GS      RightMove        product   semi-formal  descriptive
SAED       restaurants  115k + 25 GS     TripAdvisor      service   formal       descriptive
SAED       shoes        46k + 25 GS      Amazon, GumTree  product   NUT          descriptive
SAED       watches      10k + 25 GS      Amazon, GumTree  product   NUT          descriptive

Comparative evaluation – Simplified SAE The method by (Kim et al., 2012), hence ATL, is currently the closest to SAE we are aware of. We have obtained from the authors the dataset used in their evaluation

2 All resources are available at http://bit.ly/29YtM3K and include: the SAED dataset and GS, our reimplementations of IIITH and ATL, a compiled version of SysName, and all output files generated by all systems.
4 http://alt.qcri.org/semeval2015/task12/
where: T is the set of reference aspect terms (cluster heads), dist is the Damerau-Levenshtein distance, and len is the length of the string
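The per-extraction match test underlying the score can be sketched as follows. The distance function below is the optimal-string-alignment variant of Damerau-Levenshtein (adjacent transpositions only), which may differ marginally from the authors' implementation; the 0.2 tolerance is the threshold from the formula.

```python
# Sketch: does an extracted term match some reference aspect term within
# a normalized edit-distance tolerance?
def dl_distance(a, b):
    """Optimal-string-alignment Damerau-Levenshtein distance
    (edits: insert, delete, substitute, transpose adjacent chars)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def matches_reference(e, refs, tol=0.2):
    """True if extraction e is within the normalized edit-distance
    tolerance of some reference aspect term t."""
    return any(dl_distance(t, e) / len(t) < tol for t in refs)
```

For example, the misspelling "bedrom" still matches the reference term "bedroom" (distance 1 over length 7), while "garden" matches nothing.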
Typing and Generalisation
Pattern scoring [Gupta and Manning, 2014]
Score patterns on their ability to discriminate between correct and incorrect extractions
No labelled dataset available → use cluster heads as surrogate labels
Patterns scoring less than an experimentally set threshold are eliminated
Pattern Matching
Pattern references
nested patterns are not repeated; they reference each other
this enables parallel SAP generalisation and matching
SAPproperty:
⟨ { JJ, #SAPbedroom, mid terrace }, property ⟩
⟨ { Victorian, #SAPbedroom, JJ terrace }, property ⟩
⟨ { Victorian, #SAPbedroom, mid JJ }, property ⟩
⟨ { Victorian, #SAPbedroom, JJ }, property ⟩

SAPNN:
⟨ { Victorian, #SAPbedroom, mid terrace }, NN ⟩

SAPbedroom:
⟨ { two }, bedroom ⟩
⟨ { CD }, bedroom ⟩
How fast?
Induction: 10-14 msec / sentence
Matching: 2-3 msec / text
Bottlenecks: morphological analysis and cPMI-optimal segmentation
Evaluation
Datasets
SemEval OTE/ATE is only useful for aspect terms
We provide SAED (Structured Aspect Extraction Dataset - http://bit.ly/2caeXf3)
Systems
The SemEval 14/15 systems
IIITH [Raju et al., 2009]
ATL [Kim et al., 2012]
ATEX [Zhang and Liu, 2014]
Evaluation
ATE setting (SemEval Dataset)
[Figure 2: SysName vs. others in ATE.
(a) SemEval14 dataset (Restaurants, Laptops): HIS_RD, DLIREC(U), NRC-Can, UNITOR(U), XRCE, SAP_RI, IITP, SeemGo, ATEX(U), IIITH(U), ATL(U), SysName(U); supervised vs. unsupervised.
(b) SemEval15 dataset (Restaurants, Hotels): ISISLif, LT3(U), Elixa(U), Sentiue, UFRGS, Wnlp V3, IIITH(U), ATL(U), ATEX(U), SysName(U); supervised vs. unsupervised.
(c) SAED dataset: R, P, and F1 for IIITH, ATL, ATEX, and SysName across Chairs, Hotels, RealEstate, Restaurants, Shoes, and Watches.]
structures is indeed a much more challenging task than simply identifying them. Another interesting aspect is the impact of the generalization on the performance. Generalized SAPs produce 444 correct extractions against the 386 of the ground ones (+15%).
[Figure 3: SysName on full SAE: R, P, and F1 for ATE, Simplified SAE, and Full SAE across Chairs, Hotels, RealEstate, Restaurants, Shoes, and Watches.]
Corpus size One obvious question is how dependent our method is on the size of the training corpus. To measure this, we evaluated SysName by inducing SAPs from increasing subsets of the original corpora corresponding to fractions of 1%, 5%, 10%, 25%, and 50% of their original size. Figure 4 shows the effect of the corpus size on the performance of SysName in both the SAE and ATE settings. Clearly, larger corpora lead to better results in both settings. However, two interesting facts have been observed. Long-tail aspects are only induced from sufficiently large corpora (thus the behavior between 10% and 50%). Larger corpora also have the disadvantage that sufficiently frequent but incorrect tokens can end up being extracted as modifiers. In other words, the increase in recall is not matched by a comparable increase of precision (Figure 4b). In the case of SAE we even observe a slight drop in precision (Figure 4a).
SemEval 2014
SemEval 2015
Evaluation
Simplified SAE setting (SAE Dataset)
but not an implementation of the system. We have reimplemented the method and successfully reproduced the experimental results described in the original paper. Figure 1 shows a comparison between ATL and SysName on the SAED dataset. An extraction is correct if modifiers and aspect terms match the GS annotations exactly, and if modifiers are correctly typed as qualifiers or quantifiers. This is a simplified SAE setting where we do not require correct linking of modifiers to aspect terms. SysName performs 33%
[Bar chart: R/P/F1 per domain (Chairs, Hotels, RealEstate, Restaurants, Shoes, Watches) for ATL and SysName.]
Figure 1: SysName vs. ATL on simplified SAE (SAED dataset)
better than ATL on average, outperforming it in all domains. Besides being unable to extract hierarchical structures, a visible issue in ATL is its inability to establish and leverage the semantic connection between modifiers and aspect terms. This leads to a number of incorrect extractions for both aspect terms and modifiers that could be avoided by leveraging, e.g., statistical co-occurrence or cPMI.
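The simplified-SAE correctness criterion used above (exact match of aspect terms and modifiers, plus correct qualifier/quantifier typing, with no linking required) can be sketched as set-based precision/recall/F1. The tuple layout and names below are illustrative assumptions, not the actual evaluation code:

```python
# Simplified-SAE scoring sketch: an extraction is correct iff its aspect
# term, modifier surface form, and modifier type (qualifier/quantifier)
# all match a gold-standard annotation exactly; modifier-to-aspect
# linking is NOT required in this setting.

def prf1(predicted, gold):
    """predicted/gold: sets of (aspect_term, modifier, modifier_type)."""
    tp = len(predicted & gold)                      # exact matches only
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {("property", "Victorian", "qualifier"),
        ("property", "mid terrace", "qualifier"),
        ("bedroom", "two", "quantifier")}
pred = {("property", "Victorian", "qualifier"),
        ("property", "mid terrace", "qualifier"),
        ("bedroom", "two", "qualifier")}            # mis-typed: a miss

p, r, f1 = prf1(pred, gold)                         # 2/3 across the board
```

Note how a single mis-typed modifier (quantifier reported as qualifier) costs both a false positive and a false negative under exact matching.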
Comparative evaluation – ATE Restricting the evaluation to aspect terms makes it possible to compare SysName against other ATE systems. We denote by IIITH and ATEX the methods proposed by (Raju et al., 2009) and (Zhang and Liu, 2014) respectively. IIITH uses unsupervised clustering of noun phrases to derive aspect terms and is therefore similar to SysName. We could not obtain the original IIITH system from the authors, so the evaluation relies on our own implementation. ATEX, on the other hand, is chosen because of its ATE method based on topic modeling. Moreover, it is freely available for testing. An aspect term is correctly extracted if it matches a GS annotation exactly. For SysName, IIITH, and ATL we used the SAED corpora for training. Figures 2a and 2b show the results for the SemEval14 and SemEval15 datasets respectively. For all systems except SysName, IIITH, ATEX, and ATL, we report the numbers provided in the corresponding SemEval papers. The symbol (U) denotes systems that have used additional data besides the training set provided by SemEval. This setting is called unconstrained in the SemEval guidelines. SysName outperforms all unsupervised systems and even some supervised ones. Moreover, this is a lower bound for SysName due to a difference between the granularity of the SemEval15 GS and the output produced by SysName. As an example, Egyptian restaurant is considered a correct aspect term by SemEval15, but SysName would only produce restaurant as the aspect term and Egyptian as its modifier (and thus a miss for SemEval15). If we count an extraction that hits the entity as correct, our performance increases by 2% and 3% on restaurants and hotels respectively. A striking result is the performance achieved by SysName on the laptops domain in SemEval14, where all supervised systems are also outperformed.
As in many other product-like domains, aspect terms in the laptops domain frequently fall within the scope of noun phrases that are easily processed by our method. This is much less true for service-like domains such as, e.g., restaurants and hotels. Figure 2c shows the performance of SysName, IIITH, ATEX, and ATL on the SAED dataset. In this case, the GS differentiates between aspect terms and modifiers, explaining the lower performance of traditional ATE systems, e.g., IIITH and ATEX, and the higher accuracy of, e.g., ATL, which is able to appreciate this difference. ATEX also struggles on restaurants due to long sentences.
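The strict vs. entity-hit distinction discussed above can be illustrated as follows; the two matchers are a simplified sketch, and the gold spans and containment test are assumptions rather than SemEval's actual scorer:

```python
def strict_match(pred_term, gold_terms):
    # SemEval-style scoring: the extracted term must equal a gold span.
    return pred_term in gold_terms

def entity_hit(pred_term, gold_terms):
    # Relaxed scoring: also accept a term that "hits the entity", i.e.
    # occurs as a token inside a longer gold span.
    return any(pred_term == g or pred_term in g.split() for g in gold_terms)

gold = {"Egyptian restaurant"}
assert not strict_match("restaurant", gold)  # a miss under SemEval15
assert entity_hit("restaurant", gold)        # counted under relaxed scoring
```

Under this relaxed criterion, SysName's extraction of restaurant (with Egyptian demoted to a modifier) is no longer penalized.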
Full SAE We evaluate the performance of SysName in the full SAE setting using the SAED dataset. An extraction is correct if aspect terms and modifiers match the GS annotations, including the correct (and possibly hierarchical) associations between modifiers and entities. On average (Figure 3), SysName achieves an F1 of 58.6% in the full SAE setting, substantially below that obtained in the ATE (i.e., 79.6%) and simplified SAE (i.e., 67.8%) settings. Linking modifiers to aspect terms in the presence of hierarchical
Correct extraction: correct aspect term + correct modifier + correct typing for the modifier (i.e., qualifier / quantifier)
Evaluation
Full SAE setting (SAE Dataset)
Correct extraction: correct aspect term + correct modifier + correct typing for the modifier (i.e., quantifier / qualifier) + correct linking (modifier-entity, sub-patterns)
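A hedged sketch of what "correct linking" adds in the full setting: if structured aspects are encoded with order-free, hashable containers, full-SAE correctness reduces to structural equality. The encoding below is hypothetical, not the paper's:

```python
# A structured aspect is a pair (modifiers, aspect_term); each modifier is
# either a (surface, type) leaf or another nested structured aspect, as in
# <{Victorian, <{two}, bedroom>, mid terrace}, property>. Using frozensets
# makes modifier order irrelevant and equality fully structural.

def aspect(term, *modifiers):
    return (frozenset(modifiers), term)

gold = aspect("property",
              ("Victorian", "qualifier"),
              ("mid terrace", "qualifier"),
              aspect("bedroom", ("two", "quantifier")))

# Same links and types, modifiers listed in a different order: a match.
pred_ok = aspect("property",
                 aspect("bedroom", ("two", "quantifier")),
                 ("mid terrace", "qualifier"),
                 ("Victorian", "qualifier"))

# Correct terms, modifiers, and types, but "two" linked to property
# instead of bedroom: a miss in full SAE.
pred_bad = aspect("property",
                  ("Victorian", "qualifier"),
                  ("mid terrace", "qualifier"),
                  ("two", "quantifier"),
                  aspect("bedroom"))

assert pred_ok == gold       # counted as correct in full SAE
assert pred_bad != gold      # correct in the simplified setting only
```

The last example is exactly the kind of extraction that passes the simplified criterion yet fails the full one, which is why the full-SAE F1 sits well below the other two settings.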
structures is indeed a much more challenging task than simply identifying them. Another interesting aspect is the impact of generalization on performance: generalized SAPs produce 444 correct extractions against the 386 of the ground ones (+15%).
Corpus size One obvious question is how dependent our method is on the size of the training corpus. To measure this, we evaluated SysName by inducing SAPs from increasing subsets of the original corpora corresponding to fractions of 1%, 5%, 10%, 25%, and 50% of their original size. Figure 4 shows the effect of corpus size on the performance of SysName in both the SAE and ATE settings. Clearly, larger corpora lead to better results in both settings. However, we observed two interesting facts. Long-tail aspects are only induced from sufficiently large corpora (hence the behavior between 10% and 50%). Larger corpora also have the disadvantage that sufficiently frequent but incorrect tokens can end up being extracted as modifiers. In other words, the increase in recall is not matched by a comparable increase in precision (Figure 4b). In the case of SAE we even observe a slight drop in precision (Figure 4a).
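The subsampling protocol above can be sketched as follows; `induce` and `evaluate` stand in for SAP induction and scoring and are assumptions, not the actual pipeline:

```python
import random

def corpus_size_curve(corpus, gold, induce, evaluate,
                      fractions=(0.01, 0.05, 0.10, 0.25, 0.50, 1.00),
                      seed=42):
    """Induce SAPs from increasing random subsets of the corpus and
    report (fraction, precision, recall, f1) at each subset size."""
    rng = random.Random(seed)     # fixed seed: one shuffle per experiment
    shuffled = corpus[:]
    rng.shuffle(shuffled)
    curve = []
    for frac in fractions:
        # Prefixes of a single shuffle, so each subset contains the
        # smaller ones and the curve isolates the effect of adding data.
        subset = shuffled[:max(1, int(frac * len(shuffled)))]
        patterns = induce(subset)             # SAP induction on the subset
        p, r, f1 = evaluate(patterns, gold)   # same GS at every size
        curve.append((frac, p, r, f1))
    return curve
```

Using nested subsets (rather than an independent sample per fraction) keeps the points on the curve comparable: any change in performance is due to the added texts, not to resampling noise.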
SAE is substantially harder than ATE/OTE and simplified SAE
Evaluation
Effect of corpus size (SAE Dataset)
The larger the corpus… the better?
[Charts: average R, P, and F1 at 1%, 5%, 10%, 25%, 50%, and 100% of the corpus for (a) the SAE task and (b) the ATE task.]
Figure 4: Performance vs. corpus size (average – SAED dataset)
Figures 5 and 6 show the breakdown of this experiment by domain for the ATE and SAE tasks respectively. The breakdown allows us to draw further conclusions on the relationship between the size of the corpus and the performance of the SAPs. There is a relationship between the variety of features and the amount of data necessary to induce good-quality SAPs. For domains such as, e.g., chairs, realestate, shoes, and watches, starting from 25% of the size of the corpus we notice no substantial improvements in performance. This can be explained by the nature of the features in these domains, which are intrinsically limited, e.g., makes and models of the products, types of real estate properties, etc. In the restaurants and hotels domains, the texts are much more varied in their features, e.g., restaurant and hotel names, dishes, locations, etc. Despite the large amount of text available, it seems that our method would require even larger corpora before converging to a stable set of aspects.
Efficiency: Finally, we evaluate the efficiency of the SAP induction and matching phases. SysName's efficiency mostly depends on the length of the sentences, due, e.g., to the morphological analysis, our cPMI-based segmentation, and pattern matching. SysName induces SAPs at a rate of 14 ms/sent and 6 ms/sent for long (i.e., ≥10 tokens) and short (i.e., <10 tokens) sentences respectively. The matching time per sentence is almost negligible and ranges between 2 ms and 3 ms per text. We also notice a linear correlation between the size of a SAP and its matching time. This is achieved, despite the presence of hierarchical structures, by replacing nested patterns with references to the corresponding SAP clusters, enabling parallel matching of the nested SAPs. In terms of training time, SysName can be trained on 20k texts within 1 hr on average. IIITH and ATL require more than 15 hrs on the same dataset.
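A minimal sketch of the cluster-reference idea; the registry, pattern syntax, and matching logic below are illustrative assumptions. Nested patterns are matched once per sentence via their cluster id, rather than re-expanded inside every composite SAP, which is what keeps matching time linear in SAP size:

```python
# Composite SAPs refer to nested patterns by cluster id ("@NAME") instead
# of embedding them. Each cluster is matched once per sentence and its
# result cached; independent clusters could also be matched in parallel.

clusters = {
    "BEDROOM": ["two bedroom", "three bedroom"],
    "PROPERTY": ["@BEDROOM mid terrace property", "Victorian property"],
}

def matches(pattern, text, resolved):
    # "@NAME" tokens are references: reuse that cluster's cached result
    # instead of re-matching the nested pattern here.
    needed = [tok[1:] for tok in pattern.split() if tok.startswith("@")]
    if not all(resolved[name] for name in needed):
        return False
    literal = " ".join(t for t in pattern.split() if not t.startswith("@"))
    return literal in text

def match_sentence(text):
    resolved = {}
    # Match nested clusters before the composites that reference them.
    for name in ("BEDROOM", "PROPERTY"):
        resolved[name] = any(matches(p, text, resolved)
                             for p in clusters[name])
    return resolved

out = match_sentence("Victorian two bedroom mid terrace property")
```

Without the references, a composite pattern would repeat the work of every nested pattern it contains, making matching time grow with the expanded (rather than the declared) size of the SAP.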
Evaluation
Effect of corpus size (SAE Dataset)
Not necessarily… you often reach a point where more data is not going to help
[Per-domain charts: R, P, and F1 at 1%, 5%, 10%, 25%, 50%, and 100% of the corpus for (a) chairs, (b) hotels, (c) realestate, (d) restaurants, (e) shoes, (f) watches.]
Figure 5: Performance vs. corpus size (ATE task per domain – SAED dataset)
What’s next
Injecting supervision
Several places: clustering, pattern scoring, and typing are probably the most important ones
Dynamic cut-off thresholds
Use test sets to adjust corpus size and thresholds
Aspects not in NPs
Named entities, relations, other grammatical forms
e.g., living room with sash windows
Automatically determine the domain
Map the NP cluster heads to an existing KB (e.g., BabelNet) and use their graph for scoping
References
[Shin et al.2015] Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, and Christopher Ré. 2015. Incremental knowledge base construction using DeepDive. PVLDB, 8(11):1310–1321.
[Raju et al.2009] S. Raju, P. Pingali, and V. Varma. 2009. An unsupervised approach to product attribute extraction. In Proc. of ECIR, pages 796–800.
[Ramshaw and Marcus1999] L. A. Ramshaw and M. P. Marcus. 1999. Text chunking using transformation-based learning. In S. Armstrong et al., editors, Natural Language Processing Using Very Large Corpora, volume 11 of Text, Speech and Language Technology, pages 157–176.
[Kim et al.2012] D. S. Kim, K. Verma, and P. Z. Yeh. 2012. Building a lightweight semantic model for unsupervised information extraction on short listings. In Proc. of EMNLP, pages 1081–1092.
[Zhang and Liu2014] Lei Zhang and Bing Liu, 2014. Aspect and Entity Extraction for Opinion Mining, pages 1–40. Springer Berlin Heidelberg.