SwCS: Section-Wise Content Similarity Approach to Exploit
Scientific Big Data
Kashif Irshad1, Muhammad Tanvir Afzal2, Sanam Shahla Rizvi3, Abdul
Shahid4, Rabia Riaz5 and Tae-Sun Chung6,*
1Department of Computer Science, Capital University of Science and
Technology, Islamabad, Pakistan 2Department of Computer Science,
NAMAL Institute, Mianwali, 42250, Pakistan
3Raptor Interactive (Pty) Ltd., Eco Boulevard, Witch Hazel Ave,
Centurion, 0157, South Africa 4Institute of Computing, Kohat
University of Science and Technology, Pakistan
5Department of CS&IT, University of Azad Jammu and Kashmir,
Muzaffarabad, 13100, Pakistan 6Department of Artificial
Intelligence, Ajou University, Korea
*Corresponding Author: Tae-Sun Chung. Email:
[email protected]
Received: 02 September 2020; Accepted: 28 October 2020
Abstract: The growing collection of scientific data in various web repositories is referred to as Scientific Big Data, as it fulfills the four "V's" of Big Data: volume, variety, velocity, and veracity. This phenomenon has created new opportunities for startups; for instance, the extraction of pertinent research papers from enormous knowledge repositories using innovative methods has become an important task for researchers and entrepreneurs. Traditionally, the contents of papers are compared to list the relevant papers from a repository. This conventional method results in a long list of papers that is often impossible to interpret productively. Therefore, a novel approach that intelligently utilizes the available data is urgently needed. The primary element of the scientific knowledge base is the research article, which consists of various logical sections such as the Abstract, Introduction, Related Work, Methodology, Results, and Conclusion. This study utilizes these logical sections of research articles, because they hold significant potential for finding relevant papers. Comprehensive experiments were performed to determine the role of a logical-sections-based term indexing method in improving the quality of results (i.e., retrieving relevant papers). To address this research objective, we proposed, implemented, and evaluated a logical-sections-based content comparison method alongside a standard term indexing method. The section-based approach outperformed the standard content-based approach in identifying relevant documents across all classified topics of computer science.
Overall, the pro- posed approach extracted 14% more relevant
results from the entire dataset. As the experimental results
suggested that employing a finer content similarity
This work is licensed under a Creative Commons Attribution 4.0
International License, which permits unrestricted use,
distribution, and reproduction in any medium, provided the original
work is properly cited.
878 CMC, 2021, vol.67, no.1
technique improved the quality of results, the proposed approach
has led the foundation of knowledge-based startups.
Keywords: Scientific big data; ACM classification; term indexing;
content similarity; cosine similarity
1 Introduction
The myriad of scientific research publications on the web has been increasing over the past several years [1]. This knowledge base is generated by numerous researchers worldwide, and the scientific documents are published in different journals, conferences, workshops, etc. This scientific data can be described in terms of the "four V's" of Big Data: volume (huge repositories are available), variety (different venues require their own formats), velocity (increasing at a rapid pace), and veracity (content extraction from PDF versions may be noisy). This expanding knowledge base can therefore be referred to as Scientific Big Data. According to a recent analysis of Scientific Big Data, more than 50 million journal papers and billions of conference papers have been published; in addition, 1.3 billion books have been digitized by Google [2]. These documents are indexed in various digital repositories such as Web of Science, SCOPUS, and PubMed. As of October 2017, PubMed contained 27.5 million records, representing approximately 7000 journals [3]. Further, SCOPUS indexes 75 million records [4] and Google Scholar indexes 389 million documents [5]. Thus, identifying pertinent research papers from such huge repositories is an immense challenge: a single user query typically returns thousands of papers from these systems.
This significant amount of data across different web repositories hinders the retrieval of relevant information in a concrete manner. Contemporary indexers return millions of generic hits and irrelevant documents, posing a challenging task for researchers. Consequently, the problem has attracted the attention of scholarly communities, and researchers are developing effective solutions; such solutions may, in turn, lead to the foundation of new and emerging startups. The community is therefore seeking solutions from various perspectives, such as citation-, metadata-, collaborative filtering-, and content-based approaches.
Citations are deemed a rich source of information for recommending relevant papers. Researchers have proposed various citation-based techniques, including bibliographic coupling [3] and co-citation [4]. Kessler used bibliographic coupling as a basic similarity metric [3] to locate groups or clusters in the technical and scientific literature. In particular, authors carefully screen the citations of their papers, which is advantageous because most of the articles cited in the reference list are highly likely to be relevant to the topic of the citing paper. However, citation-based approaches do not yield proper results for articles that are not cited, because authors cannot cite all the relevant papers.
In addition, metadata-based techniques, such as those suggested in [5,6], use various types of metadata, such as paper title, authors, and venue, to find relevant documents. In this method, the discovery and use of documents is characterized by metadata, which help in document discovery, network management, visibility, and organizational memory [7].
Another widely used approach to obtaining relevant documents is collaborative filtering, which determines relevant documents by utilizing collaborative knowledge. These recommendations are based on user profiles and the past preferences reflecting the user's taste [8]. Collaborative filtering systems provide better accuracy than content-based approaches but suffer from the item cold-start problem. To address this, a unified Boltzmann machine that naturally combines content and collaborative features has been suggested to produce better recommendations for both cold-start and non-cold-start items.
In content-based approaches, the contents of two individual papers are analyzed to determine their relevance to each other. For example, papers with contents similar to those of a focused paper "A" will be considered more relevant to paper "A" [9]. More precisely, the content-based approach extracts important terms from the contents of two papers and compares them to determine the relevance between the papers.
In this context, IMRAD (Introduction, Method, Results, and Discussion) is a common structure for organizing a research article, which was introduced by Louis Pasteur [10]. Its adoption began in the 1940s, and it became the dominant format for preparing papers in the 1980s [11]. Subsequently, more detailed structures were developed. For example, Shotton proposed an ontology (the Discourse Elements Ontology) that conceptually describes the logical sections and other related information pertaining to scientific documents [12,13]. In addition, the technique proposed in [14] maps diversified section names onto the logical sections of a research document. That approach does not depend on a regular flow of sections (Introduction, Related Work, Methodology, Results, Conclusion, and Summary); rather, it uses paper template information (the possible positioning of sections) and a dictionary of section terms. Moreover, every section in a paper has its own meaning. For example, the authors provide an overview of their research in the "Abstract" section, whereas the "Introduction" section contains a brief introduction to the focused research topic. An overview of related previous research, as well as the problems and deficiencies of existing techniques, is generally described in the "Related Work" section. Subsequently, the "Methodology" section contains the architecture of the proposed solution along with a detailed description of the proposed technique. The obtained results are then analyzed and discussed in the "Results" section of the document. Finally, the findings of the research are presented in the "Conclusion" section.
As discussed above, each logical section has its own importance and significance; however, standard content-based approaches treat the entire document at the same level of importance. For example, if a term in the "Abstract" section of paper "A" matches a term in the "Conclusion" section of paper "B," standard content-based approaches will declare paper "B" relevant to paper "A," although these documents might not actually be relevant. On the contrary, if the importance of logical sections is defined and a term from the "Abstract" section of paper "A" matches a term in the "Abstract" section of paper "B," then the probability of the two documents being relevant may increase.
Similarly, if two researchers have independently proposed two different algorithms to solve the same problem in papers "A" and "B," respectively, then a higher number of matching terms is likely between the methodology sections of the two papers. However, a survey paper "C" covering the same problem area might share a higher number of matched terms with paper "A" over the entire content of the paper. In this case, standard content-based approaches will consider survey paper "C" more relevant to paper "A" than paper "B," even though papers "A" and "B" are in reality more closely related. Considering these issues, we present a study that identifies whether section-wise content similarity increases the chances of recommending relevant papers compared with the conventional content-based approach.
Therefore, we performed a section-wise content comparison between research papers to address the objectives of the study. Vectors of each logical section were formed from each scientific paper. The corresponding vectors were then compared using cosine similarity, in the standard manner of the content-based approach. This ensured that the terms appearing in each logical section were compared only with the terms occurring in the corresponding logical section of the other papers. The proposed approach was comprehensively evaluated using data from each topic under the ACM classification hierarchy. The section-based approach outperformed the content-based approach in identifying relevant documents across all topics of computer science. The gain percentage varied from 36% for Topic-E ("Data") to 2% for Topic-D ("Software"). Overall, the gain percentage of the proposed approach was 14% for the entire dataset.
The rest of the paper is organized as follows. The literature review validating the framed hypothesis is presented in Section 2. The proposed methodology of the study is presented in Section 3. The evaluation of the experiment, including the experimental setup and the considerations under which sample papers from the classified ACM topics were selected, is elaborated in Section 4. The experimental results and their comparisons are presented in Section 5, and the results are discussed in Section 6. Finally, the contributions of this study are summarized and concluded in Section 7.
2 Related Work
In the above section, the extent of research publications and the estimated quantity of scientific documents were discussed along with commonly faced problems. Consequently, various approaches have been proposed in the literature to help the scientific community with this task. These contemporary approaches are divided into four major categories. The first identifies and recommends relevant papers by utilizing user collaborations; the second uses the metadata of the papers; the third uses citations to identify relatedness between documents; and the fourth exploits the content of papers to recommend relevant research papers. Certain hybrid systems utilize two or more of these techniques to enhance the recommendation of relevant papers based on different considerations. This section reviews the most important recent and classical methods related to these approaches.
2.1 Collaborative Filtering-Based

Collaborative filtering-based approaches list relevant documents by exploiting user profiles and the past preferences of users' choices. Recommender systems envisage a user's choices and preferences based on his/her accessed and rated items. Collaborative filtering can be categorized into two approaches: model-based and memory-based. In model-based approaches, predictions are made based on a model containing information about items and users' interactions with each other, whereas memory-based approaches exploit a user's existing rating data to predict his/her preference for other items. Collaborative filtering is considered an important approach because of its high performance and simple requirements, as highlighted by [15,16].

Currently, academic search engines such as Scienstein have become influential and prominent hybrid systems for recommending research.
2.2 Metadata-Based

Another approach to recommending relevant papers is based on metadata, where the metadata of a paper, such as paper title, author names, publication date, and venue, are used to extract relevant documents. Thus, the relevance and discovery of documents is characterized using metadata. Moreover, one of the core services provided to users is the creation and provision of metadata that support the functionality of various digital libraries. In particular, objects of interest and relevant information are accessed using metadata. As the documents and metadata are digital, alternative implementations of the data can be made accessible. However, conventional metadata are less likely to exist in digital libraries. Metadata in digital libraries help in document discovery, network management, visibility, and organizational memory [17].

Thus, metadata are an important source for recommending relevant research papers. Recommendation systems based on metadata work efficiently, mostly because only a few terms need to be analyzed to recommend relevant research papers. However, because the metadata form only a small set of terms (generated from authors' keywords, title terms, and categories), the quality of recommendation is not highly accurate, owing to the difficulty a recommender system has in making concrete decisions from a small number of terms. Therefore, several authors have developed hybrid approaches that use metadata together with collaborative filtering, content, and citations to make accurate recommendations.
2.3 Citation-Based

Citations form a highly important dataset exploited by various techniques for recommending relevant research papers. The two common techniques in this field are bibliographic coupling [6] and co-citation [7]. In the former approach, two papers, A and B, are considered similar if they share a citation of paper C in their references. In the co-citation technique, the number of common citations received by two given papers, A and B, is used as an indicator of similarity between A and B; that is, co-citation between papers A and B indicates that both have been cited by some paper C. Moreover, these citation techniques have been extended by various researchers in recent times [18,19].

Most citation-based techniques use citation network information. These techniques provide adequate and appropriate recommendations, because the citations are carefully hand-picked by the authors. However, these approaches work well only within certain citation networks, because authors cannot cite every relevant paper in their research. A relevant paper that has not been cited becomes a weak candidate for discovery, so these approaches have considerably high chances of missing relevant papers.
2.4 Content-Based

In content-based approaches, the contents of two papers are analyzed to determine their relevance. For example, papers with content more similar to that of a focused paper A will be considered more relevant to paper A. Thus, content-based recommendation systems analyze the internal content of documents to recommend relevant papers [20]. The majority of the surveyed literature compares the content of research papers to recommend relevant scientific documents. A given paper of any file format (e.g., PDF or DOC) is transformed into text format in a single typographical case (e.g., lower case) and cleaned by removing stop words. In addition, known standard abbreviations are expanded to their full text, and term-frequency vectors are generated using term frequency–inverse document frequency (TF–IDF). Thereafter, the TF–IDF vectors are used to determine the similarity between documents. As existing techniques do not use the internal logical sections of research articles, the current study conducted experiments to evaluate this novel aspect of detecting the similarity and relevance between documents.
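The preprocessing pipeline described above (lower-casing, abbreviation expansion, stop-word removal) can be sketched as follows. The stop-word list and abbreviation table here are illustrative stand-ins, not the resources used by any particular system, and the naive substring replacement is for demonstration only:

```python
import re

# Illustrative stop-word list; real systems use much larger curated lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for"}

# Hypothetical abbreviation table; a real system would use a curated list.
ABBREVIATIONS = {
    "tf-idf": "term frequency inverse document frequency",
    "ir": "information retrieval",
}

def preprocess(text):
    """Lower-case, expand known abbreviations, tokenize, and drop stop words."""
    text = text.lower()
    for abbr, full in ABBREVIATIONS.items():
        # Naive substring replacement, sufficient for this illustration.
        text = text.replace(abbr, full)
    tokens = re.findall(r"[a-z][a-z0-9-]*", text)
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The TF-IDF weights of an IR system"))
```

The resulting token list would then be handed to the term-weighting stage described in Section 3.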
3 Proposed Methodology
This section comprehensively delineates the proposed methodology of the current study; the architecture is presented in Fig. 1. The methodology comprises several logical steps: (a) selection of a dataset of research articles, (b) section-wise and complete extraction of text from PDFs, (c) section-wise and complete indexing of terms using Apache Lucene, (d) computation of the similarity between documents using cosine similarity based on the terms indexed by both approaches, and (e) comparison of both sets of experimental results. Each step is elaborated in the following sections.
3.1 Comprehensive Dataset Selection

The dataset for the proposed approach had to cover a vast number of topics and be comprehensive enough to support the conclusions of the research. In addition, the dataset had to allow access to the logical sections of the papers for section-wise term extraction and matching. Based on these requirements, we selected the dataset of the Journal of Universal Computer Science (J. UCS), as papers from all topics of computer science are published in J. UCS. Moreover, it is one of the most comprehensive journals in the computer science domain, publishing research articles by authors from various fields and backgrounds [21,22]. Therefore, the selected dataset plays a consequential role in comprehensively investigating the proposed research. The J. UCS dataset is presented in the top row of Fig. 1.
3.2 PDF-to-Text Conversion

The text of the PDF files available on the J. UCS server was required to utilize their content. Thus, the PDF files were converted into XML format using a tool named PDFx, following the approach adopted in [23]. Upon selecting the dataset and acquiring the XML files of every paper along with its logical sections, the content- and section-based approaches were applied to obtain highly ranked papers.
3.3 Content-Based Approach

The content-based approach was implemented because the proposed technique extends it. The standard implementation of the content-based approach was performed using the Apache Lucene API, which is widely applied to identify content/word similarities [24,25]; it extracts rare terms (based on TF–IDF) and computes the cosine similarities between research papers. As shown on the right-hand side of Fig. 1, the following steps were performed to implement the content-based approach.
3.3.1 Extracting Important Terms

The first step of the content-based approach involves the acquisition of important terms from a research paper. The text files are input to the Apache Lucene API, where the TF–IDF term extractor extracts terms from the entire document using Eq. (1); this process is repeated for all documents in the dataset. TF–IDF prioritizes the rare terms of a document. For example, if term "T1" frequently occurs in document "D1" but not in other documents "D2" to "Dn," then term "T1" is considered important for document "D1." Conversely, a term "T" is not considered important when it frequently occurs in all documents "D1" to "Dn."
Figure 1: Overall process architecture for comparing the content-
and section-based approaches
This measure has been extensively used by the scientific community
to select important terms from text documents [26,27]. The
extracted terms are indexed and stored in the database, as shown on
the right-hand side of Fig. 1.
tfidf(t, d, D) = tf(t, d) · idf(t, D) (1)
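As a minimal illustration of Eq. (1), assuming the common raw-count tf and logarithmic idf variants (Lucene's internal scoring formula differs in detail), the weighting can be sketched as:

```python
import math
from collections import Counter

def tfidf(term, doc, corpus):
    """TF-IDF weight of `term` in `doc` relative to `corpus`, per Eq. (1).

    tf(t, d):  raw count of the term in the document.
    idf(t, D): log of (total documents / documents containing the term).
    """
    tf = Counter(doc)[term]
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Toy corpus: "lucene" occurs in one document only, so it receives a
# positive weight; "the" occurs in every document, so its idf is zero.
docs = [["the", "lucene", "index"], ["the", "query"], ["the", "ranking"]]
print(tfidf("lucene", docs[0], docs))  # > 0 (rare, hence important term)
print(tfidf("the", docs[0], docs))     # → 0.0 (occurs in all documents)
```

This mirrors the behavior described for terms "T1" and "T" above: a term concentrated in one document is prioritized, while a term spread across all documents is discounted.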
3.3.2 Ranking Papers on Content Similarity

The cosine similarity measure is widely applied to compute the content similarity between research documents [28,29]. The terms of documents "D1" and "D2" are represented as vectors "A" and "B," respectively, in Eq. (2).

Document similarity = cos θ = (A · B) / (||A|| ||B||) (2)

This similarity measure is available in Apache Lucene. The cosine similarity of each document with all other documents in the dataset was computed, and a ranked list of similar research documents was retrieved for each document based on descending scores, as shown on the right-hand side of Fig. 2. The cosine similarity is a fundamental measure that computes the similarity between two datasets represented as vectors. In this study, the research documents were represented as input data vectors of key terms. The representative terms of the documents were extracted using TF–IDF, which is considered a benchmark technique for extracting key terms from documents. There are certain other techniques for key-term extraction, such as KEA, the Yahoo Key-Term Extractor, and the Alchemy API, that could be used for similar purposes. However, applying all or some of them would lead to a different research question, namely evaluating the effect of various key-term extractors on the quality of results. Overall, TF–IDF and cosine similarity are considered the default mechanisms for evaluating content-based similarity. Therefore, the focus of this study was to evaluate the standard content-based and section-wise content similarities to determine the role of sections in finding relevant papers.
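A minimal sketch of Eq. (2), computing the cosine of the angle between two term-weight vectors expressed over a shared vocabulary ordering (the vocabulary and weights below are illustrative):

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| ||B||), per Eq. (2)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document with no indexed terms matches nothing
    return dot / (norm_a * norm_b)

# Vectors over a shared term order, e.g., ("similarity", "index", "query")
d1 = [3, 1, 0]
d2 = [3, 1, 0]
d3 = [0, 0, 2]
print(cosine_similarity(d1, d2))  # same direction, score ≈ 1.0
print(cosine_similarity(d1, d3))  # no shared terms, score 0.0
```

Because cosine similarity depends only on vector direction, two documents with proportional term distributions score highly regardless of their lengths, which is why it suits comparing papers of different sizes.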
Figure 2: Top-10, top-15, and top-20 results for topic-A
3.4 Section-Wise Content-Based Approach to Find Similar Documents

The same dataset was converted into logical sections to apply a section-wise content-based approach to finding relevant documents. The steps followed in this approach included certain additional tasks, which are shown on the left-hand side of Fig. 2 and described as follows.
3.4.1 Extracting Sections of Research Papers

In the current study, all the text files were converted into six logical sections: "Abstract," "Introduction," "Related Work," "Methodology," "Results," and "Conclusion." The section headings appearing in the research papers were mapped onto these logical sections using the approach proposed in [14]. Although a paper consists of various logical sections, such as the "Abstract," "Introduction," "Literature Review," "Proposed Work," "Results," and "Conclusion," these sections are not always explicitly labeled as such in an article. Therefore, the adopted approach extracts the section instances from the papers and maps them onto the logical sections. This task is achieved using the template information of the paper and a terms dictionary. The template provides sequential information about the article, such as that the "Introduction" is the first section of the paper and that the "Results" section occurs before the "Conclusion" section. Similarly, the terms dictionary contains the various terms commonly used to label a given section, e.g., literature review, related work, and experimental setup. The left-hand side of Fig. 2 depicts the accuracy of this approach at 78%, which we improved to nearly 100% by manually checking the content of the logical sections from each source article. Although the accuracy of the stated approach was low, it still performed various prerequisite tasks, such as extracting section instances (i.e., actual section labels) from a research article and mapping them onto the predefined logical sections. As the objective of this study was to compare section-wise similarity results with a trivial approach, we adopted the technique proposed in [14] for convenience.
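The dictionary-based part of this mapping can be sketched as follows. The heading variants and their assignments below are illustrative stand-ins, not the actual dictionary of [14], and a real implementation would additionally use the template (positional) information:

```python
import re

# Hypothetical terms dictionary mapping common heading variants onto the
# six logical sections; the real dictionary in [14] is larger.
SECTION_DICTIONARY = {
    "abstract": "Abstract",
    "introduction": "Introduction",
    "related work": "Related Work",
    "literature review": "Related Work",
    "background": "Related Work",
    "proposed work": "Methodology",
    "methodology": "Methodology",
    "experimental setup": "Results",
    "results": "Results",
    "discussion": "Results",
    "conclusion": "Conclusion",
    "summary": "Conclusion",
}

def map_heading(heading):
    """Map an instanced section heading onto one of the six logical sections."""
    key = heading.strip().lower()
    # Drop leading numbering such as "2", "2.1", or "iv." before matching.
    key = re.sub(r"^(\d+(\.\d+)*|[ivxl]+)[.)]?\s+", "", key)
    for variant, logical in SECTION_DICTIONARY.items():
        if variant in key:
            return logical
    return None  # unmapped headings fall back to template position

print(map_heading("2 Literature Review"))  # → Related Work
print(map_heading("5. Summary"))           # → Conclusion
```

Headings that the dictionary cannot resolve (returning `None` here) are the cases where the positional template information of [14] would be consulted.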
3.4.2 Extracting Section-Wise Important Terms

The process of section-wise term extraction is similar to that discussed in Section 3.3.1. However, the content of each section ("Abstract," "Introduction," etc.) of a paper was separately marked for Apache Lucene, as the proposed approach required the important terms extracted by Apache Lucene from each section of each research paper separately. The extracted section-wise terms were then indexed in a database for later use, as illustrated on the left-hand side of Fig. 1.
3.4.3 Ranking Papers Based on Section-Wise Content Comparisons

Cosine similarity was applied to the indexed terms to compute the similarity score between the source and target papers, where each paper was represented by six term vectors. The corresponding term vectors of each pair of papers were compared; that is, six similarity scores were obtained for each paper from the matching of corresponding term vectors ("Abstract" with "Abstract," "Introduction" with "Introduction," "Related Work" with "Related Work," "Methodology" with "Methodology," "Results" with "Results," and "Conclusion" with "Conclusion") of every other paper. The final similarity score of each paper was obtained by averaging these six scores, computed with respect to every other paper within each ACM topic. In this way, a ranked list of relevant papers was acquired by arranging the similarity scores of the papers in descending order, as shown on the left-hand side of Fig. 1.
4 Evaluation Setup
Based on the similarity scores computed above, two separate ranked lists were formed: one for content-based similarity and another for section-wise content similarity. These lists must be evaluated against certain benchmarks; however, to the best of our knowledge, there is no standard benchmark that can be employed to evaluate the results of the proposed study. Therefore, we constructed a benchmark to compare and evaluate both approaches, as shown in the lower part of Fig. 1.
The development of a gold-standard dataset was a crucial task, because no benchmark was available for evaluating the proposed approach in the domain of relevant paper recommendation. Normally, authors logically define such standards or prefer user studies. In this study, documents belonging to the same ACM topic were defined as relevant documents. Authors manually select suitable ACM topics to represent their research when publishing in J. UCS; therefore, it can be presumed that the authors chose the topic(s) best representing their papers. Thus, we decided to consider the topic information of every research paper available in J. UCS as the gold standard (benchmark), and both approaches (content and section-wise content) were evaluated against it. The evaluation process examined the number of top recommendations extracted by both approaches from a list of 200 documents belonging to the topic(s) of the query paper. The complete list of topics developed by ACM, hierarchically classified as Topics A to K, is presented in Tab. 1.
Table 1: Complete list of topics in the 1998 ACM computing classification system

Topic-A: General literature
Topic-B: Hardware
Topic-C: Computer systems organization
Topic-D: Software
Topic-E: Data
Topic-F: Theory of computation
Topic-G: Mathematics of computing
Topic-H: Information systems
Topic-I: Computing methodologies
Topic-J: Computer applications
Topic-K: Computing milieux
Comparing the top recommendations of all 200 documents with the benchmark would require exhaustive manual effort. Therefore, five papers from each of the topics ("A" to "K") were selected for evaluation. Both approaches were employed to produce results for each of the 200 documents, to comprehensively judge their behavior. The topics of each selected paper were compared with the topics of the top recommendations (top 10, top 15, and top 20) produced by both techniques. In particular, the topics of a source paper A were compared with the topics of the recommended papers in the ranked list to find the total number of matched topics. The number of recommended papers having the same topics as the source paper was noted, and the detailed results are discussed below.
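The evaluation described above (counting top-k recommendations that share an ACM topic with the source paper) can be sketched as follows; the paper IDs and topic assignments are hypothetical toy data:

```python
def topic_match_count(source_topics, ranked_ids, paper_topics, k):
    """Number of top-k recommended papers sharing at least one ACM topic
    with the source paper.

    source_topics: set of ACM topics of the query paper, e.g. {"A"}
    ranked_ids:    recommendation list, best first
    paper_topics:  mapping of paper id -> set of its ACM topics
    """
    return sum(1 for pid in ranked_ids[:k]
               if source_topics & paper_topics[pid])

# Hypothetical toy data: a source paper published under Topic-A.
topics = {1: {"A"}, 2: {"A", "D"}, 3: {"E"}, 4: {"A"}}
ranked = [2, 1, 3, 4]
print(topic_match_count({"A"}, ranked, topics, 3))  # → 2 (papers 2 and 1)
```

Running this once per source paper and per cutoff (k = 10, 15, 20) for each approach yields exactly the matched frequencies tabulated in Fig. 2.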
5 Topic-Wise Results of Content and Section-Based Approaches
The topic-wise results of the content- and section-based approaches for Topic-A are presented in this section; the results for the remaining topics are not discussed owing to their similar nature.
5.1 Topic-A Results

Topic-A in the ACM topic classification represents papers published under the "General Literature" category. Among the 200 papers in the employed dataset, 14 belonged to Topic-A, whereas the remaining 186 papers were related to various other topics and were therefore considered noise.
In Fig. 2, the header row, i.e., "Topic," "SR#," and "PAPER ID," represents the topic, serial number, and IDs of the source papers, respectively. The sub-columns (top 10, top 15, and top 20) under the content-based column contain matched frequencies, i.e., the number of times the topic of a top recommendation was identical to that of the source paper. Similarly, the "Section-based" column contains the same sub-columns, representing the number of topic matches in the papers ranked by the section-based approach. For instance, in the top 10 recommendations for source paper ID = 1, 9/10 papers extracted by the content-based approach belonged to Topic-A, whereas 10/10 papers extracted by the section-based approach belonged to the same topic. For better readability, Fig. 2 uses distinctive colors: purple (Win), sky blue (Loss), and brown (Equal), which indicate the comparative Win, Loss, and Equal status of each approach based on the number of relevant papers recommended in the top 10, top 15, and top 20 results. In the top 10 recommendations, the comparative scores of the content-based approach were Win = 1, Loss = 3, and Equal = 1, whereas the section-based approach scored Win = 3, Loss = 1, and Equal = 1. Similarly, the section-based approach performed better in the top 15 recommendations with Win = 2, Loss = 1, and Equal = 2, compared with Win = 1, Loss = 2, and Equal = 2 for the content-based approach. The scores were even more indicative in the top 20 recommendations, where the section-based approach scored Win = 3, Loss = 0, and Equal = 2 against Win = 0, Loss = 3, and Equal = 2 for the content-based approach.
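The Win/Loss/Equal tallies above follow directly from the matched frequencies. A minimal Python sketch (only the 9/10 vs. 10/10 pair for paper ID = 1 is stated in the text; the remaining counts are illustrative values chosen to reproduce the quoted tallies):

```python
# Matched frequencies in the top-10 list for the five Topic-A source
# papers. Only paper ID = 1 (content 9/10, section 10/10) is taken from
# the text; the other values are illustrative.
content_based = [9, 7, 10, 2, 6]
section_based = [10, 8, 10, 1, 8]

def tally(section, content):
    """Win/Loss/Equal of the section-based approach vs. the content-based one."""
    win = sum(s > c for s, c in zip(section, content))
    loss = sum(s < c for s, c in zip(section, content))
    equal = sum(s == c for s, c in zip(section, content))
    return win, loss, equal

print(tally(section_based, content_based))  # (3, 1, 1): Win = 3, Loss = 1, Equal = 1
```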
The comparative results for the top-10 recommendations under Topic-A are portrayed in Fig. 3a, where the X-axis represents the source papers (1–5) for Topic-A and the Y-axis charts the number of relevant papers. Note that the top-10 results were drawn from the 200-paper list used for Topic-A, in which 14 papers belonged to the same topic and the remaining 186 belonged to other topics. That is, the similarity score of each paper represented on the X-axis was computed with respect to each of the 200 papers, and the top 10 recommendations comprise the papers with the highest similarity scores in the entire list.
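The ranking step described above amounts to a top-k selection over pairwise similarity scores. A minimal illustration using cosine similarity over raw term counts (the paper's actual scoring pipeline, built on Lucene and TF-IDF, may differ in detail):

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(source_terms: Counter, repository: dict, k: int = 10):
    """Score every repository paper against the source paper and keep
    the k papers with the highest similarity scores."""
    scored = [(pid, cosine(source_terms, terms)) for pid, terms in repository.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

For a 200-paper repository, `top_k(source, repo, k=10)` returns the 10 highest-scoring papers, mirroring how the top-10 list is drawn from all 200 similarity scores.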
Figure 3: Comparison of top-10 results for section- and content-based techniques with gain in the section-based results. (a) Top-10 comparison for Topic-A. (b) Top-10 gain percentages for Topic-A
Fig. 3b depicts the gain/loss percentage of the section-based approach over the content-based approach, where the former delivered better results for source paper IDs = 1, 2, and 5 and performed equally at position 3. However, the content-based approach outperformed the section-based approach at position 4.
The section-based approach achieved gain percentages of 11%, 14%, and 33% for paper IDs = 1, 2, and 5, respectively, whereas the loss percentage was 100% for paper ID = 4. Overall, the gain percentage of the proposed approach remained at 12% for the 5 papers in the top-10 recommendations for Topic-A.
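One plausible reading of the gain/loss percentages quoted above is the difference between the two counts expressed relative to the smaller one, which yields 11% for a 9 vs. 10 pair and a 100% loss when one count is exactly double the other. This convention is an assumption, as the formula is not stated explicitly:

```python
def gain_percentage(section_count: int, content_count: int) -> float:
    """Signed gain of the section-based approach over the content-based
    one, expressed relative to the smaller count (assumed convention)."""
    if section_count == content_count:
        return 0.0
    low = min(section_count, content_count)
    if low == 0:
        return float("inf") if section_count > content_count else float("-inf")
    return 100.0 * (section_count - content_count) / low
```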
Fig. 4a compares the two approaches for the top-15 recommendations of Topic-A. At positions 1 and 5, the section-based approach produced better results than the content-based approach, whereas both approaches performed equally at positions 3 and 4. The content-based approach performed better than the section-based approach at position 2.
Figure 4: Comparison of top-15 results for section- and
content-based techniques with gain in the section-based results.
(a) Top-15 comparison for Topic-A. (b) Top-15 gain percentages for
Topic-A
Fig. 4b portrays the gain/loss percentages of the section-based approach over the content-based approach. The section-based approach gained 7% and 10% for paper IDs = 1 and 5, respectively, whereas it exhibited an 8% loss for paper ID = 2. The gain/loss percentage remained at 0% for paper IDs = 3 and 4. Overall, the proposed approach delivered a gain percentage of 2% for all 5 papers in the top-15 recommendations for Topic-A.
The top-20 recommendations of Topic-A extracted by the two approaches are compared in Fig. 5a, which shows that the section-based approach performed better than the content-based approach at positions 1, 4, and 5 and produced an equal number of papers at positions 2 and 3.
Subsequently, the gain/loss percentages of the section-based approach are visualized in Fig. 5b. The section-based approach gained 11%, 67%, and 7% for paper IDs = 1, 4, and 5, respectively, whereas both approaches produced equal results (0% gain/loss) for paper IDs = 2 and 3. Overall, the gain percentage of the proposed approach was 9% for all 5 papers within the top-20 recommendations for Topic-A. In summary, the average gain of the proposed approach was 7% across the top-10, top-15, and top-20 recommendations.
Figure 5: Comparison of top-20 results for section- and content-based techniques with gain in the section-based results. (a) Top-20 comparison for Topic-A. (b) Top-20 gain percentages for Topic-A
Figure 6: Overall comparison of top-10 results for content- and
section-based approaches
5.2 Comparative Results
This section presents the comparative results for all 55 papers selected for the study. Figs. 6–8 show the results of the top-10, top-15, and top-20 recommendations for each paper,
respectively. For the top-10 recommendations, the content-based approach performed better than the section-based approach on 10 occasions, whereas the section-based approach outperformed the content-based approach in 31 instances. Similarly, for the top-15 recommendations, the content-based approach performed better on 9 occasions and the two produced equal results 13 times, whereas the section-based approach outperformed the content-based approach 33 out of 55 times. Furthermore, for the top-20 recommendations, the proposed section-based approach yielded more matching results in 34 instances and produced equal results 11 times, whereas the content-based approach outperformed it in only 10 out of 55 instances.
Figure 7: Overall comparison of top-15 results for content- and
section-based approaches
Figure 8: Overall comparison of top-20 results for content- and
section-based approaches
6 Discussion
In this study, we performed multiple experiments to evaluate the effectiveness of using terms from the various logical sections of a paper rather than extracting terms from the paper as a whole, and a detailed comparison between the section- and content-based approaches was presented. From the ACM classification hierarchy, five papers were considered from each root-level topic (Topic A–K). In addition, there was enough noise (irrelevant papers) under each topic to validate the working efficiency of the section- and content-based approaches. Moreover, topic-wise comparisons and statistics for every paper in the selected list were highlighted in detail. Furthermore, the gain/loss percentages of the section-based approach for the top-10, top-15, and top-20 recommendations were evaluated.
The overall topic-wise gain percentages are shown in Fig. 9. The findings of the comprehensive analysis are as follows:
(1) For all topics, except Topic-D in the top-20 recommendations, the section-based approach outperformed the content-based approach.
(2) The highest accuracy of the proposed approach was observed for Topic-E and Topic-I, whereas that for Topic-D remained low.
(3) The gain percentages of the section-based approach remained consistently higher than those of the content-based approach for the top-10 and top-15 results; however, the corresponding gain percentages in the top-20 list were not as high as those in the top-10 and top-15 lists, except for Topic-F. Nonetheless, the top-20 recommendations remained in close competition with the top-10 and top-15 results for Topic-A, Topic-E, Topic-F, and Topic-I.
(4) The overall gain percentages of the section-based approach were 15% for the top-10, 14% for the top-15, and 12% for the top-20 recommendations. The average gain percentage across all topics was 13.72%.
Figure 9: Overall gain percentages of section-based approach
Tab. 2 lists the results of the comparative analysis based on the gain percentage of the section-based approach over the content-based approach for the top-10, top-15, and top-20 recommendations of topics A–K under the ACM classification. As can be observed, the proposed approach yielded superior results in most cases, except for the top-20 results of Topic-D and the equal top-15 results of Topic-J. In summary, the overall gain percentage of the section-based approach was 13.72%. Thus, the section-wise comparison of terms identified a greater number of relevant papers than the existing methods.
Table 2: Gain percentages of the section-based approach for top-10, top-15, and top-20 results

Topic                                      Gain percentage of section-based approach (%)
                                           Top 10   Top 15   Top 20   Overall
Topic-A: General literature                   12        2        9        8
Topic-B: Hardware                             14       26       15       18
Topic-C: Computer systems organization         3       11        3        6
Topic-D: Software                              3        4       -1        2
Topic-E: Data                                 33       39       35       36
Topic-F: Theory of computation                11        9       19       13
Topic-G: Mathematics of computing             10       10        3        8
Topic-H: Information systems                  14       26       15       18
Topic-I: Computing methodologies              31       29       29       30
Topic-J: Computer applications                16        0        3        6
Topic-K: Computing milieux                    19       14        7       13
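The Overall column of Tab. 2 is consistent with the rounded mean of each topic's three list-level gains, which can be checked directly:

```python
# Gain percentages transcribed from Tab. 2: topic -> (top-10, top-15, top-20).
gains = {
    "A": (12, 2, 9),  "B": (14, 26, 15), "C": (3, 11, 3),
    "D": (3, 4, -1),  "E": (33, 39, 35), "F": (11, 9, 19),
    "G": (10, 10, 3), "H": (14, 26, 15), "I": (31, 29, 29),
    "J": (16, 0, 3),  "K": (19, 14, 7),
}

# The "Overall" column matches the rounded mean of each topic's three gains.
overall = {topic: round(sum(g) / 3) for topic, g in gains.items()}
```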
7 Conclusion
The body of scientific knowledge is rapidly expanding, with more than 2 million research papers being published annually. These papers are accessed through various search systems such as general search engines, citation indices, and digital libraries; however, identifying pertinent research papers from these huge repositories is a challenge. For a general query, thousands of papers are returned from these systems, making it difficult for end users to find relevant documents. This phenomenon has attracted the attention of researchers to devise state-of-the-art approaches that could assist the scientific community in identifying relevant papers rapidly. These techniques fall into four major categories: (1) content-based, (2) citation-based, (3) metadata-based, and (4) collaborative filtering approaches. Although content-based approaches have better recall than the other methods, their limitations result in long lists of recommendations against user queries. Therefore, researchers have attempted to improve the quality of results by devising more intelligent techniques [30]. In this study, we presented one such idea that may lay the foundation of innovative research-based entrepreneurship.
The section-wise content similarity approach intelligently processes the data components of research articles to produce enhanced results. The stated method compares the terms occurring in the corresponding logical sections of each research paper present in the repository; for example, the terms occurring in the "Results" section of one research article are compared with the terms occurring in the "Results" sections of other research articles. Such approaches are important for finding relevant articles in a short time. The findings of this research are as follows:
(1) The proposed approach outperformed the traditional content-based approach in identifying relevant documents in all topics of computer science classified under the ACM hierarchy.
(2) The gain percentage varied from 36% for Topic-E (Data) to 2% for Topic-D (Software). The overall gain percentage of the proposed approach was evaluated at 14% across all topics.
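A minimal sketch of the section-wise comparison described above, using a simple term-overlap score per section averaged over a fixed set of section names (both the score and the section list are simplifications of the actual pipeline):

```python
from collections import Counter

# Canonical logical sections; a paper is a mapping from section name to
# its list of (normalized) terms.
SECTIONS = ("abstract", "introduction", "related work",
            "methodology", "results", "conclusion")

def section_similarity(paper_a: dict, paper_b: dict) -> float:
    """Compare each logical section only with its counterpart in the
    other paper and average the per-section term-overlap scores."""
    scores = []
    for name in SECTIONS:
        a = Counter(paper_a.get(name, []))
        b = Counter(paper_b.get(name, []))
        shared = sum((a & b).values())   # term occurrences common to both
        total = sum((a | b).values())    # all term occurrences in either
        scores.append(shared / total if total else 0.0)
    return sum(scores) / len(scores)
```

The key design point is that terms from one section never match terms from a different section, in contrast to the content-based approach, which pools all terms of a paper into a single vector.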
The contribution of our study relates to the vision of applying Scientific Big Data techniques to facilitate the delivery of sophisticated next-generation library services. The retrieval of copious relevant knowledge through the adoption of sophisticated techniques is not just a necessity, but also a decisive action of the global scientific community toward the management of collective wisdom, prosperity, development, and sustainability. The proposed approach holds great potential for further exploration in the future.
Although certain previous studies [31,32] have highlighted the importance of article sections, noting that the "Results" and "Methodology" are more important than the "Introduction" and "Related Work," we did not assign weights to the research article sections based on their importance. Thus, a weighted model may be built in the future based on the importance of the various sections. Furthermore, a mechanism based on artificial intelligence may be employed to identify logical sections and extract relevant content.
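The weighted model suggested above could reuse per-section similarity scores with importance weights; a hypothetical sketch (the weights are invented for illustration and are not from this study):

```python
# Hypothetical importance weights favoring "results" and "methodology",
# following the intuition of prior work on section importance.
WEIGHTS = {"abstract": 1.0, "introduction": 0.5, "related work": 0.5,
           "methodology": 1.5, "results": 1.5, "conclusion": 1.0}

def weighted_score(per_section_scores: dict) -> float:
    """Weighted average of per-section similarity scores, where each key
    must be one of the section names in WEIGHTS."""
    total_weight = sum(WEIGHTS[name] for name in per_section_scores)
    weighted = sum(WEIGHTS[name] * score
                   for name, score in per_section_scores.items())
    return weighted / total_weight if total_weight else 0.0
```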
Funding Statement: This work was partially supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (2020-0-01592) and the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2019R1F1A1058548).
Conflicts of Interest: The authors declare that they have no
conflicts of interest to report regarding the present study.
References
[1] M. T. Afzal, N. Kulathuramaiyer, H. Maurer and W. T. Balke, "Creating links into the future," Journal of Universal Computer Science, vol. 13, no. 9, pp. 1234–1245, 2007.
[2] A. E. Jinha, "Article 50 million: An estimate of the number of scholarly articles in existence," Learned Publishing, vol. 23, no. 3, pp. 258–263, 2010.
[3] K. Funk, R. Stanger, J. Eannarino, L. Topper and K. Majewski, "PubMed journal selection and the changing landscape of scholarly communication," National Library of Medicine, 2017. [Online]. Available: https://www.nlm.nih.gov/bsd/disted/video/selection.html.
[4] Scopus Data, 2020. [Online]. Available: https://www.elsevier.com/__data/assets/pdf_file/0017/114533/Scopus_GlobalResearch_Factsheet2019_FINAL_WEB.pdf.
[5] M. Gusenbauer, “Google Scholar to overshadow them all?
Comparing the sizes of 12 academic search engines and bibliographic
databases,” Scientometrics, vol. 118, no. 1, pp. 177–214,
2019.
[6] M. M. Kessler, “Bibliographic coupling between scientific
papers,” American Documentation, vol. 14, no. 1, pp. 10–25,
1963.
[7] H. Small, “Co-Citation in the scientific literature: A new
measure of the relationship between two documents,” Journal of the
American Society for Information Science, vol. 24, no. 4, pp.
265–269, 1973.
[8] M. T. Afzal and M. Abulaish, “Ontological representation for
links into the future,” in Proc. ICCIT, Gyeongju, South Korea, pp.
1832–1837, 2007.
[9] M. T. Afzal, H. Maurer, W. T. Balke and N. Kulathuramaiyer,
“Rule based autonomous citation mining with tierl,” Journal of
Digital Information Management, vol. 8, no. 3, pp. 196–204,
2010.
[10] R. A. Day, “The origins of the scientific paper: The IMRAD
format,” Journal of American Medical Writers Association, vol. 4,
no. 2, pp. 16–18, 1989.
[11] J. Wu, “Improving the writing of research papers: IMRAD and
beyond,” Landscape Ecology, vol. 26, no. 1, pp. 1345–1349,
2011.
[12] D. Shotton, K. Portwin, G. Klyne and A. Miles, “Adventures in
semantic publishing: Exemplar semantic enhancements of a research
article,” PLoS Computing Biology, vol. 5, no. 4, e1000361,
2009.
[13] S. Peroni, D. Shotton and F. Vitali, “Faceted documents:
Describing document characteristics using semantic lenses,” in
Proc. DocEng, Paris, France, pp. 191–194, 2012.
[14] A. Shahid and M. T. Afzal, "Section-wise indexing and retrieval of research articles," Cluster Computing, vol. 21, no. 1, pp. 1–12, 2017.
[15] Y. Koren and R. Bell, "Advances in collaborative filtering," in Recommender Systems Handbook, 1st ed. vol. 1. Boston, MA, USA: Springer, pp. 145–186, 2011.
[16] Y. Cai, H. Leung, Q. Li, H. Min, J. Tang et al., "Typicality-based collaborative filtering recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 3, pp. 766–779, 2014.
[17] L. D. Murphy, “Digital document metadata in organizations:
Roles, analytical approaches, and future research directions,” in
Proc. ICSS, Kohala Coast, HI, USA, pp. 267–276, 1998.
[18] A. Shahid, M. T. Afzal and M. A. Qadir, “Lessons learned: The
complexity of accurate identification of in-text citations,”
International Arab Journal of Information Technology, vol. 12, no.
5, pp. 481–488, 2015.
[19] A. Riaz and M. T. Afzal, “CAD: An algorithm for
citation-anchors detection in research papers,” Scientometrics,
vol. 117, no. 3, pp. 1405–1423, 2018.
[20] L. Pasquale, G. D. Marco and S. Giovanni, "Content-based recommender systems: State of the art and trends," in Recommender Systems Handbook, 1st ed. vol. 1. Boston, MA, USA: Springer, pp. 73–105, 2011.
[21] A. Hanan, R. Iftikhar, S. Ahmad, M. Asif and M. T. Afzal,
“Important citation identification using sentiment analysis of
in-text citations,” Telematics and Informatics, 2020, 101492.
[22] A. Shahid, M. T. Afzal, M. Abdar, M. E. Basiri, X. Zhou et
al., “Insights into relevant knowledge extraction techniques: A
comprehensive review,” Journal of Supercomputing, vol. 76, no. 1,
pp. 1–39, 2020.
[23] A. Constantin, S. Pettifer and A. Voronkov, “PDFX: Fully
automated pdf-to-xml conversion of scientific literature,” in Proc.
DocEng, Florence, Italy, pp. 177–180, 2013.
[24] M. Borg, P. Runeson, J. Johansson and M. V. Mantyla, “A
replicated study on duplicate detection: Using apache lucene to
search among android defects,” in Proc. ESEM, Torino, Italy, pp.
1–4, 2014.
[25] Y. Zhou, X. Wu and R. Wang, “A semantic similarity retrieval
model based on Lucene,” in Proc. ICSESS, Beijing, China, pp.
854–858, 2014.
[26] S. Pascal and M. W. Guy, "Beyond TF-IDF weighting for text categorization in the vector space model," in Proc. IJCAI, Edinburgh, Scotland, pp. 1130–1135, 2005.
[27] L. H. Patil and M. Atique, “A novel approach for feature
selection method TF-IDF in document clustering,” in Proc. IACC,
Ghaziabad, India, pp. 858–862, 2013.
[28] X. Bai, M. Wang, I. Lee, Z. Yang, X. Kong et al., “Scientific
paper recommendation: A survey,” IEEE Access, vol. 7, no. 1, pp.
9324–9339, 2019.
[29] M. Qamar, M. A. Qadir and M. T. Afzal, “Application of cores
to compute research papers similarity,” IEEE Access, vol. 5, no. 1,
pp. 26124–26134, 2017.
[30] M. D. Lytras, V. Raghavan and E. Damiani, “Big data and data
analytics research: From metaphors to value space for collective
wisdom in human decision making and smart machines,” International
Journal on Semantic Web and Information Systems, vol. 13, no. 1,
pp. 1–10, 2017.
[31] S. Teufel, A. Siddharthan and D. Tidhar, “Automatic
classification of citation function,” in Proc. EMNLP, Sydney,
Australia, pp. 103–110, 2006.
[32] K. Sugiyama and M. Y. Kan, "Exploiting potential citation papers in scholarly paper recommendation," in Proc. JCDL, Indianapolis, USA, pp. 153–162, 2013.