7/27/2019 Hypermedia Structure : Document Composition and Migration Path for Rich Set of Presentation
http://slidepdf.com/reader/full/hypermedia-structure-document-composition-and-migration-path-for-rich-set 1/6
International Journal of Computer Trends and Technology (IJCTT) - volume4Issue4 –April 2013
ISSN: 2231-2803 http://www.ijcttjournal.org Page 630
Hypermedia Structure: Document Composition and Migration Path for Rich Set of Presentation
R. N. Jugele* and Dr. V. N. Chavan
*Department of Computer Science, Science College, Congress Nagar, Nagpur. Maharashtra,
Head, Department of Computer Science, S. K. Porwal College, Kamptee, Dist : Nagpur. Maharashtra.
Abstract - Original paper documents are employed for archiving. Different hypertext structures are encountered in such documents, and different methods for analyzing document structure are presented. The recovered structure is used to present the content of the document to the user. The hypermedia research community finds that it is necessary to establish a reference architecture for hypermedia systems in order to make progress on defining a protocol that enables third-party applications to access link services. There is a need to extend the scope of these requirements. This paper presents an overall architecture for the integration of existing hypermedia systems in a distributed, collaborative model and provides a clear evolution path towards achieving this goal.
Keywords – Hypermedia, link, document, logical,
geometric, protocol, object, virtual, runtime.
I. INTRODUCTION
A working group is establishing a protocol for hypermedia systems. The aim of this protocol is to enable applications to access hypermedia link service functionality in a consistent and standard manner. It has been observed that it is difficult to make progress on defining a Hypertext Protocol without first establishing a reference architecture for hypermedia systems. The Dexter Model[8] attempts to provide a standard hypermedia terminology coupled with a formal model of the common abstractions found within contemporary hypermedia systems. It presents a three-layer conceptual data model without any suggestion of an architecture for realizing the model. The Flag Taxonomy[16] shows the functionality and interaction of hypermedia systems in such a manner as to aid classification. To establish an inclusive reference architecture for hypermedia systems, the following areas need to be addressed:
- Agreement upon a specification for location specifiers (LocSpecs)[6]: Reich[11] and Rutledge[15] propose solutions for addressing this issue of open location specifications.
- A reference architecture for hypermedia systems: Grønbæk[7] proposes a synthesis architecture based around the conceptual layers of the Dexter Model and introduces three protocols for integrating with external entities.
- A vision of a globally distributed and collaborative model with a clear evolution path toward this goal: the present model illustrates how hypermedia systems can be integrated in a manner which provides a powerful, distributed and collaborative architecture.
II. PAPER AS STRUCTURED DOCUMENT
Fig. 3. Paper document to Structured Hyperdocument
The document model defined forms the basis for algorithms that convert paper documents into structured hyperdocuments. These algorithms require several processing phases, each addressing different aspects of the document structure and content[14]. The processing steps are distinguished by the different representation levels shown in Fig. 3. The method described can easily be tailored for use in other applications.
A. Paper to image objects
The scanned pages are segmented using the Isodata thresholding technique[12], and the resulting binary images are analysed. To speed up processing, the original image can be reduced to other resolutions. The results are all mapped to a common document reference coordinate system and are called image objects. For each image object a set of geometric features is defined, i.e. width, height and aspect ratio.
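The thresholding step can be illustrated with a minimal sketch. It assumes the common iterative form of Isodata thresholding, in which the threshold is repeatedly moved to the midpoint of the mean dark and mean light grey values; the paper does not spell out its exact variant, and the names `isodata_threshold`, `geometric_features` and the tolerance `tol` are illustrative only.

```python
def isodata_threshold(pixels, tol=0.5):
    """Iteratively place the threshold midway between the two class means."""
    t = sum(pixels) / len(pixels)  # start from the global mean grey value
    while True:
        dark = [p for p in pixels if p <= t]
        light = [p for p in pixels if p > t]
        if not dark or not light:  # uniform image: nothing to separate
            return t
        new_t = (sum(dark) / len(dark) + sum(light) / len(light)) / 2
        if abs(new_t - t) < tol:
            return new_t
        t = new_t


def geometric_features(x_min, y_min, x_max, y_max):
    """Width, height and aspect ratio of an image object's bounding box."""
    width = x_max - x_min
    height = y_max - y_min
    return {"width": width, "height": height, "aspect_ratio": width / height}


gray = [10, 12, 11, 200, 205, 198]           # toy grey values, 0 = black
t = isodata_threshold(gray)
binary = [1 if p <= t else 0 for p in gray]  # 1 marks dark (ink) pixels
```

On a real page the list of grey values would come from the scanned image, and `geometric_features` would be applied to the bounding box of each image object found in the binary image.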
B. Image objects to basic geometric objects
The image objects are classified into a set of geometric object classes, and a group of image objects of one class forms a segment. A decision tree method is used[18]. The class labels are {text, figure, horizontal line, vertical line, noise}. The features of a segment are the minimum, maximum, average or modal values of the features of the image objects in the group. The values $x^i_{min}$, $x^i_{max}$, $y^i_{min}$, $y^i_{max}$ define the bounding box for segment i.
There are three kinds of characteristics of segments:
- Features of the individual segments
- Relations between pairs of segments
- Characteristics based on the whole set of segments
The individual characteristics used are the width, height and position of a segment on the page. This method of selection and action forms the basis for further document analysis to derive the basic geometric objects. The image objects are usually the smallest basic items in the image which can be given an interpretation in document terms, like characters and parts of figures. They do not correspond to the basic components required in the geometric structure, which are single paragraphs and complete figures.
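As a rough illustration of how a segment's bounding box follows from its image objects and how a small decision tree could assign one of the five class labels, consider the sketch below. The paper gives neither the tree nor its thresholds, so `min_area`, `line_ratio` and `max_text_height` are hypothetical values chosen only to make the example concrete.

```python
def bounding_box(objects):
    """Bounding box (x_min, y_min, x_max, y_max) of a group of image objects."""
    return (
        min(o[0] for o in objects),
        min(o[1] for o in objects),
        max(o[2] for o in objects),
        max(o[3] for o in objects),
    )


def classify(box, min_area=9, line_ratio=20, max_text_height=40):
    """Toy decision tree; every threshold here is a hypothetical value."""
    width = box[2] - box[0]
    height = box[3] - box[1]
    if width * height < min_area:
        return "noise"
    if width >= line_ratio * height:
        return "horizontal line"
    if height >= line_ratio * width:
        return "vertical line"
    if height <= max_text_height:
        return "text"
    return "figure"


segment = bounding_box([(0, 0, 10, 12), (12, 1, 20, 11)])
label = classify(segment)
```

A trained decision tree would learn such splits from labelled pages rather than use hand-picked constants.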
C. Geometric objects to geometric structure
For multi-column documents, the geometric structure is mostly concerned with column structure. For two-column documents, segments are classified into {centered, left column, right column}.
The column of a segment s is computed by considering whether it is intersected by the middle line. If not, the column is obvious; otherwise the following is used:

$$
\mathrm{column}(s) =
\begin{cases}
\text{left\_column} & c(s) < -\varepsilon \\
\text{centered} & -\varepsilon \le c(s) \le \varepsilon \\
\text{right\_column} & c(s) > \varepsilon
\end{cases}
$$

where $\varepsilon$ is the parameter for deciding when an element is considered to be centered and c is centrality. This method is not suited for centered segments in the document, as it depends on the alignment of the bounding boxes in the vertical direction.
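The column rule can be sketched in a few lines. The paper does not define the centrality c(s), so this example assumes it is the horizontal offset of the segment's centre from the middle line, normalised by half the page width; the centering parameter is called `eps` here and its default value is arbitrary.

```python
def centrality(x_min, x_max, page_width):
    """Assumed centrality: signed offset of the segment centre from the middle."""
    middle = page_width / 2
    return ((x_min + x_max) / 2 - middle) / middle


def column(x_min, x_max, page_width, eps=0.1):
    """Classify a segment as left_column, right_column or centered."""
    middle = page_width / 2
    if x_max <= middle:   # not intersected by the middle line
        return "left_column"
    if x_min >= middle:
        return "right_column"
    c = centrality(x_min, x_max, page_width)
    if c < -eps:
        return "left_column"
    if c > eps:
        return "right_column"
    return "centered"
```

For a page 100 units wide, `column(20, 80, 100)` is centered, while `column(10, 60, 100)` crosses the middle line but is still assigned to the left column.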
D. Geometric to logical objects
The basic objects have a geometric label. There are one or two headers at the top of the page, page numbers are at the bottom of the page, and title pages have both a title and a footer above and below the text body. The classification strategy is shown in the following table.

Predicate                                        New type
top_most(text)                                   header
vertical_overlap(text, header)                   header
bottom_most(text)                                page number
in_margin(figure, text)                          caption
segment_centered(text) ^ above_middle(text)      title
segment_centered(text) ^ below_middle(text)      footer

segment_centered(text) is decided in the same way as whether a column is centered. The algorithm is suited for the title page, the pure textual pages and the combined text/figure pages present.
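The rule table can be read as an ordered list of predicates applied to each text segment. The sketch below is one possible reading, not the authors' implementation: it assumes that the first matching rule wins, that y grows downwards, and that "centered" uses the same assumed centrality as a column test with a hypothetical `eps`. The caption rule is left out because the source does not say how a margin is measured.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kind: str      # geometric label, e.g. "text" or "figure"
    x_min: float
    y_min: float   # y grows downwards from the top of the page
    x_max: float
    y_max: float


def is_centered(s, page_width, eps=0.1):
    """Crossed by the middle line and close enough to it (assumed definition)."""
    middle = page_width / 2
    c = ((s.x_min + s.x_max) / 2 - middle) / middle
    return s.x_min < middle < s.x_max and abs(c) <= eps


def logical_labels(segments, page_width, page_height):
    """Return one label per segment; the first matching rule wins."""
    texts = [s for s in segments if s.kind == "text"]
    if not texts:
        return [s.kind for s in segments]
    top = min(texts, key=lambda s: s.y_min)
    bottom = max(texts, key=lambda s: s.y_max)
    labels = []
    for s in segments:
        y_mid = (s.y_min + s.y_max) / 2
        if s.kind != "text":
            labels.append(s.kind)
        elif s is top or (s.y_min < top.y_max and s.y_max > top.y_min):
            labels.append("header")
        elif s is bottom:
            labels.append("page number")
        elif is_centered(s, page_width) and y_mid < page_height / 2:
            labels.append("title")
        elif is_centered(s, page_width):
            labels.append("footer")
        else:
            labels.append(s.kind)
    return labels
```

Segments that match no rule keep their geometric label, which is how ordinary body text in a column passes through unchanged.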
E. Basic objects to content
To extract the content of figures in a hypertext context, the focus is on labels in the figure:
- plain alphanumeric labels: facsimile of their corresponding ASCII string
- alphanumeric template labels: text strings derived from a template where the variable part is a plain alphanumeric label and the fixed part is some visual shape
- icon labels: non-alphanumeric labels distinguished by their shape alone
- legend labels: icon labels with an associated textual definition
The content of figures is analyzed at full resolution to avoid losing important details. The resulting segments are sent to the figure analysis package. The raw text is then tokenized for use in further analysis.
F. Layout and content to logical structure
Logical segments have their own meaning and no direct relation to other objects. They can be identified by starting with the layout information[18] and then applying rules that capture knowledge of layout conventions. The following regular expressions are used, where * means zero or more occurrences and + means at least one occurrence.
- chapter: <start-of-line><numeral><,><word>+<end-of-line>
- section: <start-of-line><numeral><,><numeral><word>+<end-of-line>
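The two heading patterns above translate fairly directly into Python regular expressions. This is a sketch under stated assumptions: the literal "," separator is kept exactly as printed in the patterns, `<numeral>` is taken to be a run of digits, and `<word>` is taken to be a run of letters; `heading_type` is an illustrative name.

```python
import re

# <start-of-line><numeral><,><word>+<end-of-line>
CHAPTER = re.compile(r"^\d+,\s*[A-Za-z]+(?:\s+[A-Za-z]+)*$")
# <start-of-line><numeral><,><numeral><word>+<end-of-line>
SECTION = re.compile(r"^\d+,\d+\s*[A-Za-z]+(?:\s+[A-Za-z]+)*$")


def heading_type(line):
    """Return "section", "chapter" or None for a single line of text."""
    line = line.strip()
    if SECTION.match(line):
        return "section"
    if CHAPTER.match(line):
        return "chapter"
    return None
```

With these assumptions, `heading_type("1, Introduction")` gives "chapter" and `heading_type("1,2 Related work")` gives "section", while ordinary body lines give `None`.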
Text labels in a figure have the geometric classification text, but they can also have a logical classification indicating their meaning. As in the logical labeling of basic objects, this is domain specific and requires knowledge about the content of the figures.
Three logical classes can be distinguished:
- Figures can have a label of class title
- Labels of class note provide contextual information about the figure
- Labels of class name each name a part of an object in the figure
G. Logical structure to hypertext
The logical structure provides the hierarchical structure of the hyperdocument and the linear structures required for the reading order and for accessing the figures in the document[13]. Computing a standard index structure based on the labels in the figure is also trivial. An index of important keywords in the text can be found automatically based on the statistics of occurrence in the text[3].
The cross-group structure between the set of figures and the text can be found when there is some explicit way of referring to figures. These references can be found by searching for the patterns:
- "Note"<:> "Reference Figure" <numeral>
- "Note"<:> "Reference Figures" <<numeral>,>+ "and" <numeral>
Other common ways of referring to figures are "see figure <numeral>", "as shown in figure <numeral>", "(fig. <numeral> illustrates", "(fig. <numeral>)", etc. The values of the numerals are used to derive the links of the cross-group structure that relates the text to the set of figures.
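A single regular expression can cover the reference styles listed above and turn each numeral into a link from a paragraph to a figure. The sketch below is an approximation rather than the authors' matcher: it treats any occurrence of "figure", "figures" or "fig." followed by one or more numerals as a reference, and the function name `figure_links` is illustrative.

```python
import re

# "figure 4", "Figures 2, 3, and 5", "fig. 7", "fig.7" (case-insensitive)
FIGURE_REF = re.compile(
    r"\b(?:figures?|fig\.)\s*((?:\d+\s*,\s*)*\d+(?:\s*,?\s*and\s+\d+)?)",
    re.IGNORECASE,
)


def figure_links(paragraphs):
    """Return (paragraph index, figure number) pairs for every reference."""
    links = []
    for index, text in enumerate(paragraphs):
        for match in FIGURE_REF.finditer(text):
            for numeral in re.findall(r"\d+", match.group(1)):
                links.append((index, int(numeral)))
    return links
```

Each returned pair is one link of the cross-group structure: the first element identifies the text unit, the second the figure it points to.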
To find the cross-group structure for a specific figure and its scope in the text, the tokenized text of each label in the figure is searched for in the corresponding text.
Characteristics of the labels:
- The labels in the figure consist of multiple words
- The text in both the label and the associated text
- The text of the labels does not necessarily appear in the same order and with the exact words in the text
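Because label words may appear in a different order and in slightly different forms, an order-independent, approximate comparison is a natural fit. The sketch below scores a label by the fraction of its words that have a close match in the text, using `difflib.get_close_matches` from the Python standard library; the scoring scheme and the `cutoff` value are assumptions made for illustration, not the measure used in the paper.

```python
import difflib
import re


def tokenize(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def label_score(label, text, cutoff=0.8):
    """Fraction of label words with a close match anywhere in the text."""
    words = tokenize(label)
    vocabulary = sorted(set(tokenize(text)))
    if not words:
        return 0.0
    hits = sum(
        1
        for word in words
        if difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    )
    return hits / len(words)
```

A paragraph whose score exceeds some chosen threshold could then be linked to the figure part that the label names.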
Finally, identifying whether a superscript is part of a textual part of a document, a formula or a footnote also has to be incorporated in the classification of logical basic objects. As no semantic linking is considered, there is no remaining cross-reference structure.
III. COMPOSITIONS WITH VARIOUS ENTRY POINTS
The models MOAP, I-HTSPN and Madeus allow a composition as an end point of a relationship, but not a component inside a composition. Different entry points in a composition are desirable because they allow different presentations of nodes that are recursively contained in the composition. NCM is an example of a model that allows such a facility, since a link can go into nested compositions as specified by the node list of the end point of the link.
In Fig. 2 the presentation of composition C2 can be started through links l1 or l3, coming from other parts of the document. When C2 starts through link l1, nodes V1 (video), A1 (background audio) and A2 (voice node) must start at the same time. If C2 starts through link l3, nodes V1 and A2 must start at the same time, without the background audio. Therefore the presentation depends on the external context, that is, on the navigation that led to the presentation of the composite node.
Fig. 2. Hypermedia document
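The C2 example can be written down as a small data structure in which each incoming link selects the set of nodes to start. This is only a sketch of the idea; the class name `Composition` and its fields are illustrative and do not correspond to the API of NCM or any other model named above.

```python
from dataclasses import dataclass, field


@dataclass
class Composition:
    name: str
    # Maps an incoming link to the nodes that start when it is followed.
    entry_points: dict = field(default_factory=dict)

    def start(self, link):
        """Nodes that must start together when entered through `link`."""
        return sorted(self.entry_points[link])


c2 = Composition(
    "C2",
    {
        "l1": {"V1", "A1", "A2"},  # video, background audio and voice
        "l3": {"V1", "A2"},        # video and voice only
    },
)
```

Calling `c2.start("l1")` yields all three nodes, while `c2.start("l3")` omits the background audio A1.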
IV. A PROTOCOL ALONE IS INSUFFICIENT
Most system designers have developed their own proprietary protocols for communicating with a link server, and adopting a new standard protocol would involve a major re-implementation of the system. Davis[2] suggested that the differences between system protocols could be resolved if each system produced a protocol shim which would reside between the application and the link server, as shown in figure 3. Anderson[1] offers a critique of the Hypertext Protocol and makes pragmatic recommendations for improving its syntax and semantics.
Fig. 3: Hypermedia Protocol architecture
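The shim idea is essentially an adapter between an application's proprietary requests and the link server's common protocol. The sketch below illustrates it with invented message formats: the proprietary request string `FOLLOW <document> <offset>` and the common message dictionary are both hypothetical and are not taken from any published protocol draft.

```python
class LinkServer:
    """Stand-in link server that only understands the common message format."""

    def __init__(self, links):
        self.links = links  # (document, offset) -> list of destinations

    def handle(self, message):
        if message["op"] != "follow_link":
            raise ValueError("unsupported operation")
        return self.links.get((message["document"], message["offset"]), [])


class ProtocolShim:
    """Translates one application's proprietary requests for the link server."""

    def __init__(self, server):
        self.server = server

    def send(self, proprietary_request):
        verb, document, offset = proprietary_request.split()
        if verb != "FOLLOW":
            raise ValueError("unknown proprietary verb")
        message = {"op": "follow_link", "document": document, "offset": int(offset)}
        return self.server.handle(message)


server = LinkServer({("report.txt", 42): ["glossary.txt"]})
shim = ProtocolShim(server)
```

The application keeps speaking its own protocol to the shim, so only the shim has to change when the common protocol is revised.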
The aim of the Hypermedia Protocol initiative is to enrich the user's environment by integrating third-party applications with existing link services. It should not reduce the effectiveness of link services by rendering the functionality of these associated tools inaccessible to the end user.
To overcome this problem there is general agreement that some form of runtime on the user's machine is necessary; the resulting model is shown in figure 4. It uses a Java virtual machine[4] and develops a framework that allows additional tools and functionality to be dynamically downloaded to the user's machine. The protocol shim functionality will be incorporated within the runtime component.
Fig. 4: Introduced runtime component
A further requirement identified is multimedia document/object management that is open, so that developers can utilize third-party products, as shown in figure 5. This is useful in providing direction for the Hypermedia Protocol initiative and can enhance both current and future developments in the field of hypermedia.
Fig. 5 Reference architecture
V. RUNTIME INVENTION
The runtime acts as a mediator between the viewers and the link server. The following are various approaches to providing a runtime component which offers a rich set of presentation, authoring, navigation and hypermedia link service tools.
- Implementing a new runtime: Implementing the runtime and the client-side hypermedia tools from scratch signifies a complete re-invention and involves an unreasonable amount of effort. The runtime is platform dependent, but only one implementation is needed per platform, and it provides a consistent user interface across platforms.
- Virtual machine: This allows a minimal runtime component written in a byte-code interpreted language, which is extremely versatile. The user can incorporate any custom-written tools with the runtime to supplement those provided by the link server. It offers great flexibility and a zero-administration client, but each link server must assume that the runtime component has no local hypermedia tools of its own and should therefore offer to provide them. It demands a complete re-invention for each different link server, and as each link server supplies its own client-side hypermedia tools, the problem of interface inconsistency may occur. An additional penalty is incurred each time a new tool is dynamically downloaded prior to use.
- Reusing existing hypermedia systems as runtimes: This strategy promotes the wholesale re-use of existing and familiar client-side hypermedia tools which are sufficiently open to integrate, and it combines the previous approaches. It allows the developer and the user complete freedom over their choice of runtime, which would be their favorite hypermedia system. This approach is designed to accommodate the differences between systems and allow them to co-exist, and to allow a hypermedia system with its own set of proprietary viewers to utilize a third-party remote link service. A full definition of the essential components and protocols is required to achieve this.
Allowing a hypermedia system to act as a runtime component within the model means the hypermedia system can augment locally provided link services with those of a remote link service. If a runtime is represented by a hypermedia system with a link service, then there is no reason why the runtime cannot also act as a link service. If a link service is represented by a hypermedia system, then there is no reason why the link service cannot also act as a runtime. This blurs the distinction between the two entities, as a client runtime can masquerade as a link server and a link server can masquerade as a client runtime. Due to this dual role, greater scope for configuration is possible.
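The dual role can be made concrete with one class that both answers link queries and forwards them to a remote service. This is a minimal sketch with illustrative names (`HypermediaSystem`, `find_links`); it ignores networking and shows only how local and remote link services combine.

```python
class HypermediaSystem:
    """A system that can serve links and also consume a remote link service."""

    def __init__(self, local_links, remote=None):
        self.local_links = local_links  # anchor -> list of destinations
        self.remote = remote            # any object with find_links(), or None

    def find_links(self, anchor):
        """Local links first, augmented by those of the remote service."""
        links = list(self.local_links.get(anchor, []))
        if self.remote is not None:
            links += self.remote.find_links(anchor)
        return links


server = HypermediaSystem({"dexter": ["model.html"]})
runtime = HypermediaSystem({"dexter": ["notes.txt"]}, remote=server)
```

Here `runtime` acts as a client of `server`, yet because both are the same kind of object, `runtime` could equally be passed as the `remote` of a third system and so act as a link server itself.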
VI. HYPERMEDIA REFERENCE ARCHITECTURE
This section identifies the protocols required to connect the components and then presents a reference architecture for hypermedia, so that the role of individual components within the architecture can be discussed. The protocols allow the developers of each component to choose which aspects of the reference architecture they wish to adopt, and each protocol defines a pattern of interaction. The related protocols are as follows:
- Viewer Protocol: It has a purpose identical to that of the Hypermedia Protocol, in that it enables third-party applications to communicate with the runtime component. The following issues need to be addressed:
  o Ratification of the way in which viewers can determine the hypermedia and collaboration services available
  o The adoption of a sufficiently open and versatile specification of location specifiers
- Hypermedia Protocol: It provides an interface for communicating with a link server. The following issues need to be addressed:
  o Ratification of the way in which link servers advertise the services they offer
  o The adoption of a sufficiently versatile specification of location specifiers
  o Provision of locking for hypermedia objects
- Collaboration Service Protocol: The systems DHM[5], HyperDisco[19], SP3[9] and Sepia[17] provide support for collaboration among users. By incorporating an additional component, many of the common services necessary to support collaborative working practices can be provided. The following issues need to be addressed:
  o Support for tight and loose modes of collaboration
  o Interaction with the Document Management System to provide object locking
  o Event notification subscription/unsubscription and delivery
  o Interaction with the Link Service and Document Management System to support versioning
- Document Management Service Protocol: The Open Document Management API (ODMA)[10] defines a common interface to commercial document management systems and promotes interoperability. This standard also addresses issues like heterogeneity and unique and portable document identifiers. The ODMA standard makes no mention of support for the streaming of multimedia objects, whereas DHM[5], HyperDisco[19] and SP3[9] provide proprietary solutions for document management and version control. The following issues need to be addressed:
  o Globally unique and portable document naming scheme
  o Add, remove and modify documents
  o Document retrieval
  o Support for versioning
  o Document locking
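To make the document management issues listed last more tangible, the sketch below shows an in-memory store that covers naming, add/remove/modify, retrieval, versioning and locking. It is not ODMA and does not follow any of the cited systems: the class `DocumentStore` and its methods are hypothetical, and random UUIDs merely stand in for a globally unique and portable naming scheme.

```python
import uuid


class DocumentStore:
    """Hypothetical in-memory document management service."""

    def __init__(self):
        self.versions = {}  # document id -> list of contents, oldest first
        self.locks = {}     # document id -> user holding the lock

    def add(self, content):
        doc_id = uuid.uuid4().hex  # stand-in for a globally unique name
        self.versions[doc_id] = [content]
        return doc_id

    def lock(self, doc_id, user):
        """Grant the lock unless another user already holds it."""
        holder = self.locks.setdefault(doc_id, user)
        return holder == user

    def unlock(self, doc_id, user):
        if self.locks.get(doc_id) == user:
            del self.locks[doc_id]

    def modify(self, doc_id, user, content):
        if self.locks.get(doc_id) != user:
            raise PermissionError("document must be locked by the caller")
        self.versions[doc_id].append(content)  # keep every earlier version

    def retrieve(self, doc_id, version=-1):
        """Latest version by default, or an earlier one by index."""
        return self.versions[doc_id][version]

    def remove(self, doc_id):
        del self.versions[doc_id]
        self.locks.pop(doc_id, None)
```

A collaboration service could build on the same `lock` and `unlock` calls to provide the object locking mentioned under the Collaboration Service Protocol.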
VII. CONCLUSION
For the layout and logical analysis, one page of each class is used for optimizing the parameters that were not fixed beforehand, like sh, ov, sw and oh used in the grouping of segments and the centering parameter used in defining columns. For optimizing the parameters used in the detection of text labels in the figure, the selected figure page is used. The figure contains both parentheses and dashes in the textual labels. The first structure is the hierarchical structure. For the cross-group structure between the set of figures and the text, all references to figures are found correctly. In the logical classification of the content of figures, identifying the titles and notes, no errors are made by the system.
The model provides a reference architecture for the integration of differing hypermedia systems in a powerful, distributed and collaborative framework. Different alternative strategies for achieving this end are described. This allows users to continue to enjoy the rich functionality of the existing and familiar client-side hypermedia tools available within their chosen hypermedia system.
Without prior agreement upon the clear roles of the architectural components, a unilateral attempt at defining any of the four protocols identified by the authors would be non-productive, and as such these remain undefined. If a reference architecture can help guide the way towards the global integration of hypermedia systems, then the research community can look forward to exploring emerging technologies and their potential for easing the non-trivial task of distributed information management.
REFERENCES
[1] Anderson, K. M., A Critique of the Open Hypermedia Protocol. In Proceedings of the 3rd Workshop on Open Hypermedia Systems, Technical Report CIT-SR-97-01, pp1-4, April 1997. http://www.daimi.aau.dk/~kock/OHS-HT97/Papers/anderson.html.
[2] Davis, H. C., Lewis, A. J. and Rizk, A., OHP: A Draft Proposal for an Open Hypermedia Protocol. In Proceedings of the 2nd Workshop on Open Hypermedia Systems, Technical Report UCI-ICS 96-10. http://www.daimi.aau.dk/~kock/OHS-HT96/Documents/ohp.html.
[3] Salton, G., Another look at automatic text-retrieval systems. Communications of the ACM, 29(7):648-656, 1986.
[4] Gosling, J. and McGilton, H., The Java Language Environment: A White Paper, 1995. http://java.sun.com/whitePaper/java-whitepaper-1.html.
[5] Grønbæk, K. and Trigg, R. H., Design Issues for a Dexter-Based Hypermedia System. In Proceedings of the ACM Hypertext '92 Conference, Milano, Italy, pp191-200, November 1992.
[6] Grønbæk, K. and Trigg, R. H., Toward a Dexter-based Model for Open Hypermedia: Unifying Embedded References and Link Objects. In Proceedings of the ACM Hypertext '96 Conference, Washington D.C., pp149-160, March 1996.
[7] Grønbæk, K. and Wiil, U. K., Towards a Reference Architecture for Open Hypermedia. In Proceedings of the 3rd Workshop on Open Hypermedia Systems, Technical Report CIT-SR-97-01, pp31-38, April 1997. http://www.daimi.aau.dk/~kock/OHS-HT97/Papers/gronbak.html
[8] Halasz, F. G. and Schwartz, M., The Dexter Hypertext Reference Model. In Communications of the ACM, 37(2), pp30-39, February 1994.
[9] Leggett, J. J. and Schnase, J. L., Dexter With Open Eyes. In Communications of the ACM, 37(2), pp77-86, February 1994.
[10] ODMA, Association of Information and Image Management (AIIM). http://www.aiim.org/odma.
[11] Reich, S., How OHP's LocSpecs Could Benefit From ISO/IEC 10744. In Proceedings of the 3rd Workshop on Open Hypermedia Systems, Technical Report CIT-SR-97-01, pp54-59, April 1997. http://www.daimi.aau.dk/~kock/OHS-HT97/Papers/reich.ps.
[12] Duda, R. O. and Hart, P. E., Pattern classification and scene analysis. Wiley, 1973.
[13] Jugele, R. N. and Chavan, V. N., "ODA: Processing Model Design for Linking Document", International Journal Of Engineering And Computer Science, ISSN: 2319-7242, Vol. 2, issue 3, March 2013, pp. 806-810.
[14] Jugele, R. N. and Chavan, V. N., "ODA: A Study of Document Design", International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), ISSN: 2278-6856, Vol. 2, issue 1, Jan-Feb 2013, pp. 194-198.
[15] Rutledge, L. and Hardman, L., Applying the HyTime Model to the Open Hypermedia Protocol. In Proceedings of the 3rd Workshop on Open Hypermedia Systems, Technical Report CIT-SR-97-01, pp63-65, April 1997. http://www.daimi.aau.dk/~kock/OHS-HT97/Papers/rutledge.html
[16] Østerbye, K. and Wiil, U. K., The Flag Taxonomy of Open Hypermedia Systems. In Proceedings of the ACM Hypertext '96 Conference, Washington D.C., pp129-139, March 1996.
[17] Streitz, N., Haake, J., Hannemann, J., Lemke, A., Schuler, W., Schütt, H. and Thüring, M., SEPIA: A Cooperative Hypermedia Authoring Environment. In Hypertext: Concepts, Systems and Applications, Proceedings of the Hypertext '90 Conference, INRIA, France, pp11-22, November 1990.
[18] Tsujimoto, S. and Asada, H., Major components of a complete text reading system. Proceedings of the IEEE, 80(7):1133-1149, 1992.
[19] Wiil, U. K. and Leggett, J. J., The HyperDisco Approach to Open Hypermedia Systems. In Proceedings of the ACM Hypertext '96 Conference, Washington D.C., pp140-148, March 1996.
Books :
01. Principles of Multimedia By. Ranjan Parekh
Tata McGraw Hill Companies.
02. Hypertext and Hypermedia By. J. Nielsen
Academic Press.