
Search Pubmed With R Part3

Date post: 28-Dec-2015
Upload: cpmarqui
Page 1: Search Pubmed With R Part3

Search PubMed with R

Part 3

Page 2: Search Pubmed With R Part3

Query PubMed titles for systemic lupus erythematosus with the R package RISmed (reference 1)

# Type the following in the R console:
library(RISmed)
lupus <- EUtilsSummary('lupus[Ti] erythematosus[ti] systemic[Ti]', retmax=200)
# retmax is the maximum number of records to retrieve; the default is 1000.

fetch.lupus <- EUtilsGet(lupus)
fetch.lupus
# Results: PubMed query: lupus[Ti] AND erythematosus[ti] AND systemic[Ti] Records: 200

lupus.tit <- ArticleTitle(fetch.lupus)
lupus.tit[1:10] # view the first 10 titles

# export the results to a text file
write(lupus.tit, file="lupusRISmedTi.txt")

References
1- RISmed package: Stephanie Kovalchik (2013). RISmed: Download content from NCBI databases. R package version 2.1.0. http://CRAN.R-project.org/package=RISmed

Page 3: Search Pubmed With R Part3

Query PubMed titles for systemic lupus erythematosus using RISmed

Page 4: Search Pubmed With R Part3

View the results in the exported text file

Export the results to a text file with the R command line:

write(lupus.tit, file="lupusRISmedTi.txt") # export the title results as a text file, then open the file in Excel or any text editor
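The exported file can also be checked without leaving R by reading it back with readLines. The sketch below is illustrative only: it uses a small made-up character vector in place of lupus.tit and a temporary file in place of lupusRISmedTi.txt. It writes with writeLines, which puts one title per line, the same layout write() produces for a character vector.

```r
# Stand-in for lupus.tit (hypothetical titles, not real query results)
titles <- c("First example title about lupus.",
            "Second example title about lupus.")

# Temporary file used here instead of "lupusRISmedTi.txt"
out.file <- tempfile(fileext = ".txt")

# Write one title per line
writeLines(titles, out.file)

# Read the file back and confirm that nothing was lost
titles.back <- readLines(out.file)
length(titles.back)
identical(titles, titles.back)
```

With the real data, replace titles with lupus.tit and out.file with "lupusRISmedTi.txt".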

Page 5: Search Pubmed With R Part3

Find the Title Verb Relation with ReVerb

ReVerb (reference 1) is an open information extraction program, distributed as an executable jar, developed by the University of Washington's Turing Center.

• ReVerb depends on Java, so it is not an R program.

• ReVerb is powerful and provides useful information about the relational structure of a text. It is relatively easy to use and runs very fast.

• In our case we will apply ReVerb to our text file of title results.

Reference:
1- @inproceedings{ReVerb2011,
  author = {Anthony Fader and Stephen Soderland and Oren Etzioni},
  title = {Identifying Relations for Open Information Extraction},
  booktitle = {Proceedings of the Conference of Empirical Methods in Natural Language Processing ({EMNLP} '11)},
  year = {2011},
  month = {July 27-31},
  address = {Edinburgh, Scotland, UK}
}

Page 6: Search Pubmed With R Part3

Install ReVerb

You can download the latest ReVerb jar from http://reverb.cs.washington.edu/reverb-latest.jar

This executable jar file is easy to run from the MS-DOS command line.

At https://github.com/knowitall/reverb/ you can find how to use ReVerb. It provides the following example, which illustrates what it does:

“ReVerb takes raw text as input, and outputs (argument1, relation phrase, argument2) triples. For example, given the sentence "Bananas are an excellent source of potassium," ReVerb will extract the triple (bananas, be source of, potassium).”

To run ReVerb you need to have Java installed on your computer. You can install Java from https://www.java.com/en/download/


Page 7: Search Pubmed With R Part3

Use of ReVerb

Place the reverb-latest.jar file and the result file “lupusRISmedTi.txt” in the same folder.

Figure: example of the 2 files in the same folder (which we named Reverb-Java).

Page 8: Search Pubmed With R Part3

Use of ReVerb

1- Open the MS-DOS cmd window and change to the folder (Reverb-Java in our example) that contains both files: reverb-latest.jar and lupusRISmedTi.txt

Page 9: Search Pubmed With R Part3

Use of ReVerb

2- Type the following command line to view the results on the console:

java -Xmx512m -jar reverb-latest.jar lupusRISmedTi.txt

The results are displayed in the MS-DOS window.

Page 10: Search Pubmed With R Part3

Use of ReVerb: export the results to a file

3- Type the following command line to export the results to a file:

java -Xmx512m -jar reverb-latest.jar lupusRISmedTi.txt > ReverbLupusRISmedTi.txt

(The file was named ReverbLupusRISmedTi.txt. You can use another name, or give it an .xls extension, for example ReverbLupusRISmedTi.xls, so that it opens in Excel.)
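The same command can also be launched from the R console with the base function system2, which avoids switching to the command window. This is only a sketch: run_reverb is a hypothetical helper name, and it assumes that java is on the system path and that the jar and the title file are in R's working directory. If either file is missing, the helper returns the arguments without running anything.

```r
# Hypothetical helper: runs ReVerb on input.file and saves its output to output.file
run_reverb <- function(input.file, output.file, jar = "reverb-latest.jar") {
  # Arguments mirror: java -Xmx512m -jar reverb-latest.jar input.file
  args <- c("-Xmx512m", "-jar", jar, input.file)
  if (!file.exists(jar) || !file.exists(input.file)) {
    message("jar or input file not found; command not run")
    return(invisible(args))
  }
  # stdout = output.file plays the role of the '>' redirection
  system2("java", args = args, stdout = output.file)
  invisible(args)
}

reverb.args <- run_reverb("lupusRISmedTi.txt", "ReverbLupusRISmedTi.txt")
```

File names containing spaces would need quoting (for example with shQuote) before being passed to system2.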

Page 11: Search Pubmed With R Part3

Open the ReVerb result file ReverbLupusRISmedTi.txt with MS Excel

Page 12: Search Pubmed With R Part3

ReVerb output

The ReVerb output has 18 columns (see the results in the Excel file). The most interesting are:
Col 3 (Col C): Argument1
Col 4 (Col D): Verb relation phrase
Col 5 (Col E): Argument2

(Col 12 is the confidence that the extraction is correct, and Col 2 is the number of the sentence the extraction came from.)
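Once the output is loaded in R, the columns of interest can be picked out by position and given readable names. The sketch below uses a made-up two-row data frame in place of the real ReVerb output; the column names (sentence, arg1, relation, arg2, confidence) and the 0.5 cutoff are our own choices for illustration, not something ReVerb produces.

```r
# Mock stand-in for the ReVerb output: 2 rows, 18 columns named V1..V18,
# as read.table(header=FALSE) would name them. All values are made up.
d <- as.data.frame(matrix("", nrow = 2, ncol = 18), stringsAsFactors = FALSE)
d$V2  <- c(1, 2)                             # sentence number
d$V3  <- c("bananas", "drug x")              # argument 1
d$V4  <- c("are source of", "was given to")  # relation phrase
d$V5  <- c("potassium", "patients")          # argument 2
d$V12 <- c(0.90, 0.30)                       # confidence

# Keep only the columns described above and give them readable names
triples <- d[, c(2, 3, 4, 5, 12)]
names(triples) <- c("sentence", "arg1", "relation", "arg2", "confidence")

# Keep extractions above a hypothetical confidence cutoff of 0.5
confident <- triples[triples$confidence > 0.5, ]
confident
```

Filtering on the confidence column is one way to drop doubtful extractions before building the word cloud.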

Page 13: Search Pubmed With R Part3

ReVerb Results

Results of the first 5 rows (Excel) from columns 3-5:

1- childhood-onset systemic lupus erythematosus | is associated with | ethnicity

2- renal involvement | are lower in | ACE inhibitor-treated patients

3- Prednisone | induced | two-way myocardial development

4- Acetylated histones | contribute to | the immunostimulatory potential of Neutrophil Extracellular Traps

5- clinical practice | monitor the impact of | systemic lupus erythematosus

Note: on the slide, blue marks argument 1, white the verb relation, and orange argument 2; here the three parts are separated by "|".

Page 14: Search Pubmed With R Part3

Prepare ReVerb Results data for R Wordcloud

# use the read.table script (from reference 1) as follows:
d <- read.table('ReverbLupusRISmedTi.txt', quote='', comment.char='', allowEscapes=FALSE, sep='\t', header=FALSE, as.is=TRUE, stringsAsFactors=FALSE)
# transform the data into a data frame (read.table already returns one, so this step is optional)
e <- as.data.frame(d)
# merge columns 3-5 into a single text sentence
f <- paste(e$V3, e$V4, e$V5)
f[1:3] # view the first 3 lines
[1] "childhood-onset systemic lupus erythematosus is associated with ethnicity"
[2] "renal involvement are lower in ACE inhibitor-treated patients"
[3] "Prednisone induced two-way myocardial development"

Reference:
1- John Mount. Please stop using Excel-like formats to exchange data. December 7th, 2012.

Page 15: Search Pubmed With R Part3

Represent ReVerb Results in R Wordcloud

library(tm)
my.corpus <- Corpus(VectorSource(f))
summary(my.corpus)
inspect(my.corpus[1:3])
my.corpus <- tm_map(my.corpus, removeWords, stopwords("english"))
# my.corpus <- tm_map(my.corpus, stemDocument)
myTdm <- TermDocumentMatrix(my.corpus, control = list(wordLengths=c(1,Inf)))
myTdm
# A term-document matrix (140 terms, 26 documents)
# Non-/sparse entries: 163/3477
# Sparsity : 96%
# Maximal term length: 22
# Weighting : term frequency (tf)

Page 16: Search Pubmed With R Part3

Represent ReVerb Results in R Wordcloud

findFreqTerms(myTdm, lowfreq=2)
# [1] "associated" "damage" "distinct" "erythematosus"
# [5] "increased" "independently" "lupus" "systemic"

termFrequency <- rowSums(as.matrix(myTdm))
termFrequency <- subset(termFrequency, termFrequency>=10)
m <- as.matrix(myTdm)
wordFreq <- sort(rowSums(m), decreasing=TRUE) # this yields the word frequency

library(wordcloud)
# library(RColorBrewer)
set.seed(375)
pal1 <- brewer.pal(6, "Dark2")
wordcloud(words=names(wordFreq), freq=wordFreq, scale=c(2,.9), min.freq=1, random.order=FALSE, colors=pal1)
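As a quick cross-check of the term counts, word frequencies can also be computed in base R without the tm package. This is a rough sketch only: it takes the three merged sentences printed by f[1:3] as input, lowercases them, and splits on whitespace. Unlike the tm pipeline it does not remove stop words, so its counts are not expected to match myTdm exactly.

```r
# The three merged sentences shown in the output of f[1:3]
f3 <- c("childhood-onset systemic lupus erythematosus is associated with ethnicity",
        "renal involvement are lower in ACE inhibitor-treated patients",
        "Prednisone induced two-way myocardial development")

# Lowercase, then split each sentence on runs of whitespace
words <- unlist(strsplit(tolower(f3), "[[:space:]]+"))

# Count each word and sort from most to least frequent
word.counts <- sort(table(words), decreasing = TRUE)
head(word.counts)
```

With the full data, replace f3 with f; names(word.counts) and as.vector(word.counts) can then be passed to wordcloud as the words and freq arguments.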

Page 17: Search Pubmed With R Part3

R Wordcloud of ReVerb Results

