Towards Text Mining in Climate Science: Extraction of Quantitative Variables and their Relations. Erwin Marsi, Pinar Öztürk, Elias Aamot, Gleb Sizov, Murat V. Ardelan
Page 1: Towards Text Mining in Climate Science: Extraction of Quantitative Variables and their Relations (...nactem.ac.uk/biotxtm2014/presentations/Marsi_pres.pdf)

Towards Text Mining in Climate Science: Extraction of Quantitative Variables and their Relations

Erwin Marsi, Pinar Öztürk, Elias Aamot, Gleb Sizov, Murat V. Ardelan


Outline

1. Context: text mining in climate science
2. Annotation scheme
3. Rule extraction
4. Discussion
5. Future work

BioTxtM 2014: 31-5-14   Marsi et al - Towards Text Mining in Climate Science


Context

• OceanCertain project: What is the impact of climatic and non-climatic stressors on oceanic processes?
• Cross-disciplinary: marine science, climate science, environmental science, social science, …
• Strong focus on consilience


Problems with (climate) literature

• Vast amount of literature, and growing
• Increased specialisation
• Isolated research communities and literatures
• Different conventions and terminology

Our contribution: develop text mining tools to support marine scientists in knowledge discovery


“Climate change” domain

• Many different processes interact in complex ways
• Chains of causal relations and correlations
• Positive/negative feedback loops


Feedback example

From: National Research Council (2011), “Climate Change: Evidence, Impacts and Choices”.


Goal: Discovering Feedbacks in Scientific Literature

Plan

1. Design annotation scheme to capture events of change, cause, correlation and feedback
2. Manually annotate text corpus
3. Develop tools for automatic annotation
4. Automatically retrieve and annotate scientific publications
5. Extract rules
6. Reasoning combined with domain knowledge to discover feedback loops


Annotation Development

• Material: 12 abstracts (2369 words) from articles about climate and ocean change
• Iterative development in collaboration with our domain expert


Annotation Scheme

Intention: capture events of change, cause, correlation and feedback

Simple domain ontology:
1. Entities: Variable
2. Events: Change, Increase, Decrease, Cause, Correlate, Feedback
3. Structure: And, Or, Negation, Referring expressions
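The ontology above can be mirrored in a small data model. The following is only a sketch, assuming Python dataclasses; the class and field names (Variable, Change, Cause, Correlate, theme, agent, co_theme, trigger) are illustrative choices and are not taken from the project's own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Entity: a quantitative variable mentioned in the text."""
    text: str


@dataclass(frozen=True)
class Change:
    """Event: the value of a variable changes.

    direction is 'increase', 'decrease' or 'change' (direction unspecified).
    """
    theme: Variable
    direction: str = "change"
    trigger: str = ""


@dataclass(frozen=True)
class Cause:
    """Event: one change (agent) causes another change (theme)."""
    agent: Change
    theme: Change
    trigger: str = ""


@dataclass(frozen=True)
class Correlate:
    """Event: two changes go together (theme and co-theme)."""
    theme: Change
    co_theme: Change
    trigger: str = ""


# Example built from a sentence used later in the slides:
# "rise in atmospheric CO2 levels causes significant changes in surface ocean pH"
co2 = Variable("atmospheric CO2 levels")
ph = Variable("surface ocean pH")
event = Cause(
    agent=Change(co2, "increase", "rise in"),
    theme=Change(ph, "change", "significant changes in"),
    trigger="causes",
)
```

Feedback and the structural types (And, Or, Negation, referring expressions) are left out here to keep the sketch short; they could be added along the same lines.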


Variable

• Quantitative variable := entity that can be counted or measured
• Naturally expressed by a number (count, scalar, percentage, ratio, …)
• Potential variable in experiment/model
• Must be involved in a change event


Examples of Variable

✓ Significant changes in [surface ocean pH]
✓ Rise in [atmospheric CO2 levels]
✗ [carbon dioxide] and [light] are two major prerequisites of photosynthesis
✗ changes in [the network of global biogeochemical cycles]
✗ The concentrations of [DFe] and [TaLFe] were relatively high


Change/Increase/Decrease

• Change := event in which the value of a variable is changing
• Change: direction of change unspecified
• Increase: change in positive direction
• Decrease: change in negative direction
• Must have clear textual trigger
• Variable gets thematic role of Theme


Examples of Change

✓ [Change regional changes in] phytoplankton
✓ [Increase addition of] labile dissolved organic carbon
✓ [Decrease to slow down] calcification in corals

« marine primary production is sensitive to climate [Change variability and change]


Cause

• Cause := event where a change causes another change
• Agent and Theme roles


Examples of Cause

• [Agent rise in atmospheric CO2 levels] [Cause causes] [Theme significant changes in surface ocean pH]
• [Agent diminished calcification] [Cause led to] [Theme a reduction in the ratio of calcite precipitation to organic matter production]


Change & Cause Conflated

• If there is both a Change and a Cause event, only Change is annotated
• Causal relation is inferred from Agent role

[Agent addition of labile dissolved organic carbon] [Decrease reduced] [Theme phytoplankton biomass]
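The inference from the Agent role can be sketched as follows. This is a hypothetical illustration in Python, not the authors' implementation: the ChangeEvent class, the implied_cause function and the arrow notation are assumptions, and "addition of" is treated as an Increase trigger, as in the Examples of Change slide.

```python
from dataclasses import dataclass
from typing import Optional

# Arrow notation for the three change types (illustrative choice).
ARROWS = {"increase": "↑", "decrease": "↓", "change": "↕"}


@dataclass
class ChangeEvent:
    direction: str                          # 'increase', 'decrease' or 'change'
    theme: str                              # text of the Variable
    agent: Optional["ChangeEvent"] = None   # change that brings this one about


def render(event: ChangeEvent) -> str:
    return f"{ARROWS[event.direction]} {event.theme}"


def implied_cause(event: ChangeEvent) -> Optional[str]:
    """Return a causal rule if the change event carries an Agent, else None."""
    if event.agent is None:
        return None
    return f"{render(event.agent)} ⇒ {render(event)}"


addition = ChangeEvent("increase", "labile dissolved organic carbon")
reduced = ChangeEvent("decrease", "phytoplankton biomass", agent=addition)
print(implied_cause(reduced))
```

No separate Cause event is needed: the single Decrease event with an Agent is enough to read off the causal relation.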


Correlation

• Correlation := event where two changes go together
• Theme and Co-theme roles


Examples of Correlation

• [Theme reduced calcite production] [Correlate was accompanied by] [Co-theme an increased proportion of malformed coccoliths]
• [Theme carbon:nutrient ratio turns out to decrease] [Correlate with] [Co-theme increasing temperature]


Rule Extraction

From annotation, we can derive rules about relations between variables:

1. Causal rules
2. Correlation rules
3. Feedback rules


Causal Rules

Annotated abstract (brat, ocwp1-pilot-annotations/abstracts/Gao12):

Carbon dioxide and light are two major prerequisites of photosynthesis. Rising CO2 levels in oceanic surface waters in combination with ample light supply are therefore often considered stimulatory to marine primary production. Here we show that the combination of an increase in both CO2 and light exposure negatively impacts photosynthesis and growth of marine primary producers. When exposed to CO2 concentrations projected for the end of this century, natural phytoplankton assemblages of the South China Sea responded with decreased primary production and increased light stress at light intensities representative of the upper surface layer. The phytoplankton community shifted away from diatoms, the dominant phytoplankton group during our field campaigns. To examine the underlying mechanisms of the observed responses, we grew diatoms at different CO2 concentrations and under varying levels (5–100%) of solar radiation experienced by the phytoplankton at different depths of the euphotic zone. Above 22–36% of incident surface irradiance, growth rates in the high-CO2-grown cells were inversely related to light levels and exhibited reduced thresholds at which light becomes inhibitory. Future shoaling of upper-mixed-layer depths will expose phytoplankton to increased mean light intensities. In combination with rising CO2 levels, this may cause a widespread decline in marine primary production and a community shift away from diatoms, the main algal group that supports higher trophic levels and carbon export in the ocean.

[ ↑ mean light intensities ∧ ↑ CO2 levels ] ⇒ ↓ marine primary production

Figure 1: Example of a causal rule extracted from a pair of annotated sentences
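A rule like the one in Figure 1 can be held as a small data structure with a conjunction of antecedent changes and one consequent change. The sketch below is an assumption about one possible representation in Python; the names Change, CausalRule and show_rule are hypothetical and do not come from the paper.

```python
from typing import NamedTuple, Tuple

ARROWS = {"increase": "↑", "decrease": "↓", "change": "↕"}


class Change(NamedTuple):
    direction: str   # 'increase', 'decrease' or 'change'
    variable: str


class CausalRule(NamedTuple):
    antecedents: Tuple[Change, ...]   # joined by And in the annotation
    consequent: Change


def show_change(change: Change) -> str:
    return f"{ARROWS[change.direction]} {change.variable}"


def show_rule(rule: CausalRule) -> str:
    """Render a rule in the arrow notation used in Figure 1."""
    lhs = " ∧ ".join(show_change(c) for c in rule.antecedents)
    if len(rule.antecedents) > 1:
        lhs = f"[ {lhs} ]"
    return f"{lhs} ⇒ {show_change(rule.consequent)}"


# The two antecedents come from two sentences: "this" in the last sentence
# refers back to the increase in mean light intensities.
rule = CausalRule(
    antecedents=(
        Change("increase", "mean light intensities"),
        Change("increase", "CO2 levels"),
    ),
    consequent=Change("decrease", "marine primary production"),
)
print(show_rule(rule))
```

Keeping antecedents as a tuple of changes, rather than as one string, makes it possible to match or drop individual conjuncts later.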


Correlation Rule

Annotated abstract (brat, ocwp1-pilot-annotations/abstracts/Otma06):

Biological activity gives rise to a difference in carbon concentration between the ocean surface and the deep waters. This difference is determined by the carbon:nutrient ratio of the sinking organic material and it is crucial in determining the distribution of CO2 between the atmosphere and the ocean. For this reason, it is interesting to determine whether the physical environment affects the carbon:nitrogen ratio of phytoplankton. Using a model with a novel representation of the effect of temperature on phytoplankton stoichiometry, we have investigated the influence of mixed-layer depth and water temperature on the elemental composition of an algal community. In the light-limited regime, the carbon:nutrient ratio turns out to decrease with increasing mixed-layer depth and temperature. Hence our model suggests the existence of a positive feedback between temperature and atmospheric CO2 content through the stoichiometry of phytoplankton. This feedback may have contributed to the glacial/interglacial cycles in the atmospheric CO2 concentration.

[ ↑ mixed-layer depth ∧ ↑ temperature ] ↓ the carbon:nutrient ratio

Figure 2: Example of a correlation rule extracted from an annotated sentence

we are also considering a type of evaluation in which extracted rules and their corresponding source texts are shown to domain experts, who are then asked to judge if the rule is entailed by the text.

Manual annotation is costly. There are at least two strategies which may reduce annotation time and costs. The first one is to bootstrap from existing extraction systems. Recent advances in open information extraction, where there is no predefined set of entities and relations, have resulted in open source systems like ReVerb (Fader et al., 2011) and its successor OpenIE. Banko and Etzioni (2008) claim that when the number of target relations is small, and their names are known in advance, an open IE system is able to match the precision of a traditional supervised extraction system, though at substantially lower recall. This suggests that at least a part of the annotation can be accelerated with the help of such tools.

A second strategy to reduce annotation costs involves the use of active learning, which is a training method for supervised learners that tries to obtain maximal performance gain with minimal annotation effort (Olsson, 2009). It is an iterative procedure, starting with a small amount of labelled data and a large amount of unlabelled data. In each iteration, a classifier is trained on the labelled data and subsequently applied to the unlabelled data. Only the most informative instances – e.g., those for which classification confidence is lowest – are passed on to a human annotator for manual annotation. These manually labelled instances are added to the training data and the procedure is repeated. Good results have been reported with the use of active learning, e.g. by Gamback et al. (2011).
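The active learning loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a method from the paper: instances are single numbers, the classifier is a nearest-centroid model written for this example, confidence is the margin between the two closest centroids, and the oracle function stands in for the human annotator.

```python
from statistics import mean


def train_centroids(labelled):
    """Toy classifier: the mean feature value of each class."""
    by_class = {}
    for x, y in labelled:
        by_class.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_class.items()}


def confidence(centroids, x):
    """Margin between the two closest centroids; a small margin means uncertain.

    Assumes at least two classes are present in the labelled data.
    """
    distances = sorted(abs(x - c) for c in centroids.values())
    return distances[1] - distances[0]


def active_learning(labelled, unlabelled, oracle, rounds=3, batch=2):
    """Pool-based uncertainty sampling: query the least confident instances."""
    labelled, unlabelled = list(labelled), list(unlabelled)
    for _ in range(rounds):
        if not unlabelled:
            break
        centroids = train_centroids(labelled)               # train on labelled data
        unlabelled.sort(key=lambda x: confidence(centroids, x))
        queried, unlabelled = unlabelled[:batch], unlabelled[batch:]
        labelled.extend((x, oracle(x)) for x in queried)    # "manual" annotation
    return labelled, unlabelled


# Invented toy data: the true class boundary is at 0.5.
seed = [(0.0, 0), (1.0, 1)]
pool = [0.1, 0.45, 0.55, 0.9, 0.3, 0.7]
labelled, remaining = active_learning(seed, pool, lambda x: int(x > 0.5),
                                      rounds=1, batch=2)
```

In the first round the two instances closest to the class boundary (0.45 and 0.55) are the ones sent for annotation, which is the intended effect of selecting by lowest confidence.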

The extracted rules expressing relations of correlation, causality or feedback between quantitative variables are intended to be used in knowledge discovery support systems. One use case is to search for other variables directly related to a certain variable of interest. For example, find all processes that affect or are affected by a rise in atmospheric CO2 level. The variable in question may be expressed in many different ways though, for example, as CO2, atmospheric CO2, CO2 concentrations or CO2 partial pressures, but not as CO2 levels in oceanic surface waters or the distribution of CO2 between the atmosphere and the ocean. Simple string matching between the variables in queries and those in rules will give limited recall and precision.

Related to this is the issue of differences in terminology across research fields. For instance, export production and biological pump are different terms, used by chemists and biologists respectively, for the same process of carbon cycling in the oceans. One possible strategy to cope with this issue is to have a more fine-grained categorisation of entities, allowing different surface realisations to be mapped to the same underlying domain concept. This would allow more general rules to be extracted, and could also be beneficial in helping to bootstrap lexical resources.
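Mapping surface realisations to a shared concept could look roughly like this. It is a minimal sketch assuming a hand-made lookup table; the concept identifiers (atmospheric_co2, carbon_export) and the function names are hypothetical, and a real system would need a proper domain ontology instead.

```python
# Hypothetical lookup from surface forms to concept identifiers.
# The surface forms are the examples from the text; the identifiers are invented.
CONCEPTS = {
    "co2": "atmospheric_co2",
    "atmospheric co2": "atmospheric_co2",
    "co2 concentrations": "atmospheric_co2",
    "co2 partial pressures": "atmospheric_co2",
    "export production": "carbon_export",
    "biological pump": "carbon_export",
}


def normalise(mention: str) -> str:
    """Map a mention to its concept, or fall back to the lower-cased mention."""
    key = mention.strip().lower()
    return CONCEPTS.get(key, key)


def same_variable(a: str, b: str) -> bool:
    """Compare two mentions at the concept level instead of by raw string."""
    return normalise(a) == normalise(b)


print(same_variable("CO2 partial pressures", "atmospheric CO2"))        # concept match
print(same_variable("CO2 levels in oceanic surface waters", "CO2"))     # different variable
print(same_variable("export production", "biological pump"))            # cross-field synonyms
```

A query for one surface form then retrieves rules stated with any of the other forms, while mentions that merely share the string "CO2" stay separate.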


Feedback Rule

Annotated sentences (brat, ocwp1-pilot-annotations/abstracts/Otma06):

Hence our model suggests the existence of a positive feedback between temperature and atmospheric CO2 content through the stoichiometry of phytoplankton. This feedback may have contributed to the glacial/interglacial cycles in the atmospheric CO2 concentration.

↕ temperature ⇔+ ↕ marine primary production

Figure 3: Example of a feedback rule extracted from an annotated sentence

Ultimately all relevant entities may be normalised by linking them to a unique concept in a domain ontology (Bada et al., 2012). However, whereas the concepts of interest in biomedicine are relatively well understood – including such entities as cells, proteins and genes – and covered by widely used ontologies, such common ground currently seems to be lacking in climate, marine and environmental science.

A different but related problem is exemplified in correlation rule (15-b) extracted from the second part of sentence (15-a).

(15) a. Concentrations of DFe increased slightly with depth in the water column, while that of TaLFe did not show any consistent trend with depth.
     b. ¬ [ ↕ depth ↕ that of TaLFe ]

The problem is that depth in (15-b) is too general and should in fact be linked to depth in the water column for proper interpretation. Likewise, that of TaLFe should be interpreted as concentrations of TaLFe. This illustrates the need for coreference resolution and, more generally, linking of subsequent mentions of the same entity in the text, a notoriously hard task in NLP.

Apart from search, another use case for extracted rules is to generate potential hypotheses about indirect relations between variables or feedback loops among them. This can be accomplished by chaining together two or more rules, matching the change event on the right-hand side of one rule to a similar change event on the left-hand side of another rule. Matching gives rise to the same problems discussed above, i.e., different ways of referring to the same entity.

In addition, there is the issue of context-dependency. Most rules are not universally applicable, but only apply under certain conditions in a particular context. For example, a rule may be limited in scope to certain biological species or organisms, a particular geographical region or historical time period, subject to a given assumption (only if . . . ), etc. This is related to initiatives for annotating meta-knowledge such as confidence level (fact vs. conjecture), source (resulting from observation vs. analysis) or origin (present or cited work) as in (Thompson et al., 2011). Proper modelling of rule context would require a rather deep understanding of the whole text. Although we acknowledge the importance of conditions on events, we intend to leave their annotation to a later stage. For now, we plan to leave this to the user by offering facilities in the user interface to quickly inspect the source text for each rule.
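Chaining rules by matching right-hand sides to left-hand sides can be sketched as below. The sketch assumes exact matching of (direction, variable) pairs, which sidesteps the entity-matching and context problems discussed above; the rules use placeholder variables A, B and C and do not state any real finding. The function names are hypothetical.

```python
from typing import List, Tuple

Change = Tuple[str, str]        # (direction, variable)
Rule = Tuple[Change, Change]    # (left-hand side, right-hand side)


def chain(rules: List[Rule], max_len: int = 3) -> List[List[Rule]]:
    """Enumerate chains of 2..max_len rules in which each right-hand side
    equals the left-hand side of the next rule (exact match only)."""
    chains = [[r] for r in rules]
    results: List[List[Rule]] = []
    for _ in range(max_len - 1):
        extended = []
        for c in chains:
            for r in rules:
                if r not in c and c[-1][1] == r[0]:
                    extended.append(c + [r])
        results.extend(extended)
        chains = extended
    return results


def is_loop(c: List[Rule]) -> bool:
    """A chain is a candidate feedback loop if it ends where it started."""
    return c[-1][1] == c[0][0]


# Placeholder rules, invented for illustration only.
rules = [
    (("increase", "A"), ("increase", "B")),
    (("increase", "B"), ("decrease", "C")),
    (("decrease", "C"), ("increase", "A")),
]
chains = chain(rules)
loops = [c for c in chains if is_loop(c)]
print(len(chains), len(loops))
```

Each chain is a hypothesis about an indirect relation; chains that close on themselves are candidate feedback loops that a domain expert would still need to check against the source texts.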

Inference with rules may be further enhanced by exploiting domain knowledge. For example, given an ontology which contains the fact that diatoms are a kind of phytoplankton, rules containing either of the terms may be generalised by substituting the hypernym or specialised by substituting the hyponym. In a similar vein, rules can be generalised by removing specifiers, modifiers or parts of a conjunction. Whether or not this constitutes valid inference seems connected to recent developments in textual entailment, in particular work on natural logic (MacCartney and Manning, 2008).
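Hypernym and hyponym substitution can be illustrated with a one-entry is-a table. The sketch below assumes plain whole-word string substitution over variable descriptions; only the diatoms/phytoplankton relation comes from the text, while the function names and the example phrases are hypothetical. Whether a given substitution is a valid inference is exactly the open question raised above.

```python
import re
from typing import Dict, List

# Tiny is-a table: hyponym -> hypernym. Only this one fact is from the text.
IS_A: Dict[str, str] = {"diatoms": "phytoplankton"}


def substitute(variable: str, old: str, new: str) -> str:
    """Replace whole-word occurrences of old by new."""
    return re.sub(rf"\b{re.escape(old)}\b", new, variable)


def generalise(variable: str, is_a: Dict[str, str] = IS_A) -> List[str]:
    """Variants of a variable description with a hyponym replaced by its hypernym."""
    return [substitute(variable, hypo, hyper)
            for hypo, hyper in is_a.items()
            if re.search(rf"\b{re.escape(hypo)}\b", variable)]


def specialise(variable: str, is_a: Dict[str, str] = IS_A) -> List[str]:
    """Variants of a variable description with a hypernym replaced by a hyponym."""
    return [substitute(variable, hyper, hypo)
            for hypo, hyper in is_a.items()
            if re.search(rf"\b{re.escape(hyper)}\b", variable)]


print(generalise("growth rate of diatoms"))
print(specialise("biomass of phytoplankton"))
```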

5. Conclusion

An annotation scheme was proposed to capture events of change, cause, correlation and feedback, as well as the entities involved in them, in the cross-disciplinary fields of climate science, marine science and environmental science. It was shown that rules about the relation between changing processes can be automatically extracted from annotated text. Follow-up work will involve annotating more text, as well as measuring inter-annotator agreement and rule adequacy. Simultaneously, tools for automatic annotation will be developed. Future work will also address normalisation of entities, tracking of entity mentions, modelling of rule context and combination with domain knowledge.

6. Acknowledgements

Financial aid from the European Commission (OCEAN-CERTAIN, FP7-ENV-2013-6.1-1; no: 603773) is gratefully acknowledged. We thank the reviewers for their valuable comments.

7. References

Michael Bada, Miriam Eckert, Donald Evans, Kristin Garcia, Krista Shipley, Dmitry Sitnikov, William A. Baumgartner, K. Bretonnel Cohen, Karin Verspoor, Judith A. Blake, and Lawrence E. Hunter. 2012. Concept annotation in the CRAFT corpus. BMC Bioinformatics, 13(1):161+, July.

Michele Banko and Oren Etzioni. 2008. The tradeoffs between open and traditional relation extraction. In Proceedings of ACL-08: HLT, pages 28–36, Columbus, Ohio, June. Association for Computational Linguistics.

Oren Etzioni. 2011. Search needs a shake-up. Nature, 476(7358):25–26, August.


Discussion: Annotation problems

• Changing variables expressed implicitly by nouns?
  ocean acidification => ocean becomes more acid => decrease of pH of ocean water
• Change & dimension expressed by adjective?
  Coccospheres were larger => Coccospheres increased in size
• Distinction between Cause and Correlation sometimes hard


Discussion: Entities

• Different subclasses of entities are needed
  – Can we use existing domain taxonomies/ontologies?
• How to interpret under-specified variables in context?
  – “decrease in growth rate”
• How to reduce over-specified variables?
  – “decrease in net primary production in the coccolithophore species Emiliania huxleyi”
• Variables are only annotated in relation to Change
  – Impact on learning entity detection?


Ongoing / Future Work

• Annotate full-text articles
• Evaluate correctness of derived rules
• Evaluate inter-annotator agreement
• Develop tools for (semi-)automatic annotation
• Refinement of entity ontology
• Reasoning, domain knowledge, UI, …
