Page 1: Chapter 10. Integration & Interoperability: …people.ischool.berkeley.edu/~glushko/IFIOIR/Chapter10-20101017.pdfChapter 10: Combining ... ‐ 1 ‐ Chapter 10. Integration & Interoperability:

Chapter 10: Combining Descriptions    Last revised: October 17, 2010 


Chapter 10. Integration & Interoperability: Combining Descriptions
Chapter author: Karen Joy Nomorosa, J. J. M. Ekaterin
Table of Contents

10.1 Introduction
  10.1.1 Combining Information – A Framework
  10.1.2 Evolving Information Systems to Combine Descriptions
  10.1.3 Planning for Interoperability
10.2 The Context of Combining Descriptions
  10.2.1 Business-to-Business Interoperability
  10.2.2 Health Information Systems
  10.2.3 Government
  10.2.4 Education
  10.2.5 Digital Museums
10.3 Combining Descriptions during Design-Time
  10.3.1 Technical and Syntactic Compatibility
  10.3.2 Structural Compatibility
  10.3.3 Content Compatibility
  10.3.4 Semantic Compatibility
10.4 Combining Descriptions during Run-Time
  10.4.1 Data Mapping
  10.4.2 Metadata Mapping/Crosswalks
10.5 Non-Technical Considerations for Interoperability
  10.5.1 Socio-political (Inter-organizational)
  10.5.2 Organizational

 

10.1 Introduction

You need to buy a coat and a friend recommends an online store, Shopstyle.com, that specializes in women’s clothing. You follow the “clothing” link and see a set of categories that include “accessories,” “sportswear,” “swimwear,” “outerwear,” and others. You correctly guess that coats are a subcategory of “outerwear,” and then watch as hundreds of different coat choices appear. You see a particular one you like and you hover over it with your mouse to get more details.

When you click on your selected coat, however, Shopstyle redirects you to another online shopping site (Nordstrom.com, Neiman Marcus, Lord & Taylor, and so on) where the coat is actually offered for sale. Only then do you realize that every item shown on Shopstyle.com actually comes from other online retailers, with Shopstyle.com serving as an aggregator of catalogs from over 250 online stores. More importantly, Shopstyle.com gives you a seamless shopping experience: the site makes you feel as if you are shopping in a single web store instead of many different ones. Combining information from many different sources as Shopstyle.com does poses numerous strategy, design, and implementation challenges, because the producers of that information often use different identifiers, descriptors, or classifications for their items. The coat that you just looked at, for instance, may be included in the “outerwear” class or category by one service provider but in “jackets” by another. Furthermore, different service providers have different information policies and processes and use different kinds of technologies. Shopstyle.com has to work through all of these differences in order to obtain and combine the information behind the services it offers.

So how does an aggregator like Shopstyle reconcile these differences? Can Shopstyle impose information specifications for items, orders, or customer information on its partner stores and expect them to comply? Or does Shopstyle make allowances for the different specifications that each partner store is sure to have? What could Shopstyle do to encourage other stores to partner with it and provide the information necessary to be included in Shopstyle searches? How would Shopstyle or a partner store convert its information to make it useful and meaningful to all? Finally, how would the information management strategy of each entity change to accommodate the others?

The challenges Shopstyle faces in combining information are not faced only by Internet retailers. Companies and governmental agencies large and small, all over the world, face them when they create “composite” or “extended” applications by combining or integrating information sources and services of their own or from independent parties. This chapter discusses the concepts, strategies, and technologies needed to meet these challenges.

10.1.1 Combining Information – A Framework

Creating an aggregated catalog by bringing related things together isn’t just a technical challenge. It involves much more than reconciling the application interfaces and protocols that enable the exchange of information.
Developing a strategy for combining information means dealing with design issues that relate to the information itself, the organization or party that handles it, and the intended end consumer. Each of these factors introduces different design challenges.

Figure 10.1 – Balancing Parties in Business Interoperability

Product information, for instance, may be unstructured or structured; textual or composed of multiple media; physical, digital, or both. In an online store such as Shopstyle, an item minimally consists of a photo, some narrative description, and specific information about price and item code. Some items are augmented with a list of features or options like size, style, and color. The nature and number of these characteristics determine how easily the information can be aggregated and exchanged, as well as which technology is best suited to these processes. They also have implications for processing speed and, of course, for the tradeoff between information organization and information retrieval.
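The “outerwear” versus “jackets” problem from the Shopstyle example can be made concrete with a small sketch. It assumes two hypothetical retailers, `store_a` and `store_b`, whose feeds use different field names and category labels. `normalize()` renames their fields into a minimal aggregator item (photo, description, price, item code) and maps categories through a crosswalk. Any field it does not recognize, such as size, is kept as an optional feature. The retailer names, field names, and crosswalk entries are invented for illustration. This is not a description of how Shopstyle actually works.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """Hypothetical aggregator schema: the minimal item description plus optional features."""
    item_code: str
    description: str
    price: float
    photo_url: str
    category: str
    features: dict = field(default_factory=dict)  # size, style, color, ...

# Hypothetical crosswalk from each retailer's category labels to the aggregator's categories.
CATEGORY_CROSSWALK = {
    "store_a": {"outerwear": "outerwear", "swim": "swimwear"},
    "store_b": {"jackets": "outerwear", "coats": "outerwear"},
}

# Hypothetical per-retailer field names for the same information.
FIELD_MAP = {
    "store_a": {"sku": "item_code", "desc": "description", "price_usd": "price",
                "img": "photo_url", "cat": "category"},
    "store_b": {"id": "item_code", "text": "description", "cost": "price",
                "image": "photo_url", "section": "category"},
}

def normalize(source: str, record: dict) -> Item:
    """Rename a retailer's fields and map its category into the aggregator's scheme."""
    mapping = FIELD_MAP[source]
    core = {target: record[src] for src, target in mapping.items()}
    core["item_code"] = f"{source}:{core['item_code']}"  # keep codes unique across stores
    core["price"] = float(core["price"])
    core["category"] = CATEGORY_CROSSWALK[source].get(core["category"], "uncategorized")
    # Fields outside the mapping (e.g. size) survive as optional features.
    extras = {k: v for k, v in record.items() if k not in mapping}
    return Item(features=extras, **core)

coat_a = normalize("store_a", {"sku": "123", "desc": "Wool coat", "price_usd": "199.00",
                               "img": "a.jpg", "cat": "outerwear", "size": "M"})
coat_b = normalize("store_b", {"id": "X9", "text": "Rain jacket", "cost": 89.5,
                               "image": "b.jpg", "section": "jackets"})
print(coat_a.category, coat_b.category)  # outerwear outerwear
```

The crosswalk decides where each retailer's item lands in the aggregator's classification. Categories it does not cover fall into “uncategorized” so that they can be reviewed rather than silently misfiled.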

The nature of the party managing the information, on the other hand, can affect the less technical aspects of combining information. Such parties range from the individual, as in personal information management, to the intra-organizational, as in enterprise information management, to the inter-organizational, as when different companies need to communicate with each other. At a basic level, this may affect how information components are described. As the scope of the organizing system becomes more complex, and information must be managed across more systems and sources, it becomes imperative that the parties reach agreements on information structure and description. Transaction volume also affects the strategy for combining information. Information that is needed or processed only once a month, for instance, could be handled manually or by a less scalable system than information that is used thousands of times a month. There are cost and efficiency tradeoffs to consider in both scenarios.

Finally, how the information is consumed and who consumes it also affect the strategy for combining descriptions. People within an enterprise consume information in the form of emails and company documents. External entities like partner companies and customers also consume information from that company, but often for different business processes than those that produced it. This affects how much information is exposed, the level of data integrity required, and the precision and accuracy required of information retrieval. Information can be used and consumed within one delivery channel or across channels. For instance, you can go online to check movie schedules and buy tickets, or go to the movie theater to check the schedule and buy tickets.
Any action or change in information in any channel has repercussions for other information components, such as the number of seats still available for a particular showing. This raises the question of how much information to integrate across channels in order to ensure a predictable user experience. Information can also be used across a variety of devices or technology platforms. When planning that trip to the movies, a user can check the movie schedule on a computer or smart phone, send a text message inquiry to an SMS-based information service, or open a newspaper and see what information the theater has provided to the print publication. This has a variety of implications for technology, and for how easily and efficiently information can be used and reused across systems and platforms with varying speeds, processing capabilities, and network connections. In particular, the degree to which the information sources are available in a processable digital form with descriptive metadata determines how their combination or integration takes place. Ideally, the integration or combination of the information from these different sources takes place in a web portal, a composite application, or a “mash-up” as an automated mechanical process rather than only “in the mind of the user” as a perceptual and cognitive one.

In this chapter we use the three factors in Figure 10.1 to take a close look at the goal of bringing related things together, analyzing it in terms of combining the descriptions of information items, classes, and collections. This more abstract approach unifies what might otherwise be seen as a hodgepodge of techniques. It also illustrates the tradeoffs implicit in choices about the kind and extent of descriptions when related things need to be brought together from one or more organizing systems.
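The seat-count point can be illustrated with a minimal sketch. It assumes a design in which every sales channel books against one shared record instead of keeping its own copy. `ShowingInventory` and its methods are hypothetical names. A real theater system would keep this state in a shared database rather than inside a single process.

```python
import threading

class ShowingInventory:
    """A single shared seat count for one movie showing (hypothetical sketch).

    Every channel books through the same object, so a sale in one channel
    is immediately visible to all the others.
    """

    def __init__(self, capacity: int) -> None:
        self._available = capacity
        self._lock = threading.Lock()
        self.bookings = []  # audit trail of (channel, seats) tuples

    def available(self) -> int:
        with self._lock:
            return self._available

    def book(self, channel: str, seats: int) -> bool:
        # The lock keeps two channels from selling the same last seats at once.
        with self._lock:
            if seats <= 0 or seats > self._available:
                return False
            self._available -= seats
            self.bookings.append((channel, seats))
            return True

showing = ShowingInventory(capacity=100)
web_ok = showing.book("web", 4)         # bought online
box_ok = showing.book("box_office", 2)  # bought at the theater
print(showing.available())              # 94
```

If each channel kept its own seat count instead, those copies would have to be synchronized. That synchronization is exactly the cross-channel integration question raised above.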

10.1.2 Evolving Information Systems to Combine Descriptions

Advances in technology have enabled the use of software applications and networked systems to manage, process, and exchange information. While such advances have enabled more automated and efficient information transactions, they have also allowed businesses and the processes they carry out to grow more complex, which in turn makes the software applications even more complex. Thus it is no surprise that combining multiple IT systems to enable intra- and inter-enterprise information processes involves substantial effort.

The traditional approach to making heterogeneous IT systems work together was full integration, which allows the “unrestricted sharing of data and business processes among any connected applications and data sources” in the organization [Linthicum]. This can be a strategic approach to improving the management of content and of IT resources, especially when the enterprise has disparate applications and redundant information spread across different groups and departments. However, it can also be costly, as integration points may be numerous, with vastly different technologies needed to get one application to integrate with another. Maintenance also becomes an issue: changes in one system may entail changes in every system integrated with it.

Allowing unrestricted access to data and business processes also becomes a problem when working across enterprises, especially with respect to information security. Full integration between two companies, for instance, may expose business intelligence and information that should be kept private. This degree of exposure is too much for most businesses, whether the relationship with the other business is collaborative or competitive. Security issues become especially pressing when collaborating companies need to access private networks and secure servers.

This heterogeneity of supporting IT systems, along with the need to adapt quickly to rapid changes in a firm’s competitive and collaborative environment, has driven an evolution from vertical, isolated structures toward a more loosely coupled ecosystem paradigm [see e.g., Lamoreaux, Raff, & Temin, 2003; Powell, 1990]. This in turn has led to more componentized, modular systems that need only exchange information, without concern for one another’s underlying processes.

The emerging paradigm, then, is to enable independent systems to interoperate, that is, to have “the ability of two or more systems or components to exchange information and to use the information that has been exchanged” [IEEE]. Because the focus is on the exchange of information, independent systems need not know other systems’ underlying logic, or even how they store and organize information. What matters is knowing what kind of information is expected and in what format, and what kind of information is returned and in what format.

This strategic approach to information sharing allows systems to remain highly independent of each other. Changes in one system need not affect how other systems work, as long as the information sent and received through an interface stays the same. This allows greater adaptability, because changes to system logic or business processes can be made in self-contained modules without affecting the others.
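The idea that systems need only agree on what is sent and what is returned can be sketched with a typed interface. This minimal Python sketch assumes a hypothetical price-lookup exchange. `PriceQuery` and `PriceAnswer` define the contract, and two independent services store prices in completely different ways: cents in a dict versus rows in a list. The calling code works with either service because it depends only on the contract. All class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class PriceQuery:
    """What a caller must send."""
    item_code: str

@dataclass(frozen=True)
class PriceAnswer:
    """What a caller gets back."""
    item_code: str
    price: float
    currency: str

class PriceService(Protocol):
    """The interface: the only thing the systems agree on."""
    def get_price(self, query: PriceQuery) -> PriceAnswer: ...

class DictBackedService:
    """One system: stores prices in a dict, in cents."""
    def __init__(self) -> None:
        self._cents = {"A1": 1999}

    def get_price(self, query: PriceQuery) -> PriceAnswer:
        return PriceAnswer(query.item_code, self._cents[query.item_code] / 100, "USD")

class ListBackedService:
    """Another system: stores (code, price, currency) rows in a list."""
    def __init__(self) -> None:
        self._rows = [("A1", 19.99, "USD")]

    def get_price(self, query: PriceQuery) -> PriceAnswer:
        for code, price, currency in self._rows:
            if code == query.item_code:
                return PriceAnswer(code, price, currency)
        raise KeyError(query.item_code)

def show_price(service: PriceService, code: str) -> str:
    """Client code: knows the contract, not how either service works inside."""
    answer = service.get_price(PriceQuery(code))
    return f"{answer.item_code}: {answer.price:.2f} {answer.currency}"

print(show_price(DictBackedService(), "A1"))  # A1: 19.99 USD
print(show_price(ListBackedService(), "A1"))  # A1: 19.99 USD
```

Either service could change its internal storage without affecting `show_price`, as long as `get_price` keeps accepting a `PriceQuery` and returning a `PriceAnswer`.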

There are several approaches to planning for interoperability between systems. In a tightly coupled approach, collaborating organizations agree on semantics, technology, and structure while designing a system that lets them exchange similarly formatted information directly with one another. This approach means, however, that changes in the sending system could necessitate changes in the receiving system. A looser approach to interoperability may include a gateway interface between systems that transforms information, as it receives it, into something the receiving system expects and can accept automatically. This enables the systems to remain independent of one another, with the gateway doing the necessary information transformation.

10.1.3 Planning for Interoperability

Planning for interoperability involves more than creating programmatic interfaces and setting up networks to enable the exchange of information. In fact, most of the work involves preparing the content so that it can be shared more easily across systems. From an information organization perspective, information should be stored in an intelligent way, that is, with the appropriate structure and metadata, in order to make its retrieval easier. If all of a firm’s information were stored as unstructured text, it would take significant processing power to parse and interpret the text each time it was needed. It would be more sensible to create the text in a more structured manner, along with descriptive metadata that would make retrieval more precise and efficient. The digitization of different information types also emphasizes the importance of storing an organization’s information in an intelligent way. Previous discussions of the semantic gap and multimedia information retrieval show that retrieving audio files, video, and images is not as straightforward as searching for text (which itself poses many challenges for precision and accuracy).
Attaching detailed descriptions to these types of information, telling users both low-level and conceptual details about the resource, helps increase the precision and accuracy of retrieval. Metadata, then, plays an integral role in interoperability involving multimedia information.

Planning for interoperability goes beyond giving content creators and users the ability to add, edit, and use descriptions. It means ensuring that everyone has a common understanding of what descriptions mean and how they are to be used. It is also important to stress the necessity of accurate information creation and capture, and business processes should be put in place to ensure that this happens. As different parties begin to collaborate, necessitating interoperability between their systems, the larger industry or business ecosystem to which they belong needs to be considered. Decisions have to be made about adopting standards to increase interoperability among different entities and to enable new participants in the same ecosystem to share information more easily with existing ones. Power imbalances between departments and firms will also play a part in determining which standards and technologies are adopted to sustain these collaborative relationships.
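A small sketch shows why structure and metadata make retrieval cheaper and more reliable than parsing unstructured text. It assumes a hypothetical catalog in which the same coat is described twice:

- once as free text, from which a price must be extracted with a fragile regular expression;
- once as a structured record whose metadata includes low-level properties (pixel dimensions, MIME type) and conceptual ones (subject terms).

The field names and data are invented for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional

# Unstructured: the price is buried in prose and must be re-parsed every time it is needed.
NOTE = "Catalog photo of a red wool coat, size M, priced at $199.00, shot in studio 3."

def price_from_text(text: str) -> Optional[float]:
    """Fragile extraction: only finds prices written as $NNN or $NNN.NN."""
    match = re.search(r"\$(\d+(?:\.\d{2})?)", text)
    return float(match.group(1)) if match else None

@dataclass
class ImageRecord:
    """Structured description of an image (hypothetical metadata fields)."""
    file: str
    width_px: int      # low-level description
    height_px: int
    mime_type: str
    subjects: set[str] = field(default_factory=set)  # conceptual description
    price: Optional[float] = None

RECORDS = [
    ImageRecord("coat.jpg", 1200, 1600, "image/jpeg", {"coat", "wool", "red"}, 199.00),
    ImageRecord("scarf.jpg", 800, 800, "image/jpeg", {"scarf", "wool"}, 35.00),
]

def find_by_subject(records: list[ImageRecord], subject: str) -> list[str]:
    """Retrieval by metadata: a set lookup per record, no parsing of pixels or prose."""
    return [r.file for r in records if subject in r.subjects]

print(price_from_text(NOTE))             # 199.0
print(find_by_subject(RECORDS, "wool"))  # ['coat.jpg', 'scarf.jpg']
```

The regular expression breaks as soon as someone writes “199 dollars”. The structured `price` field and the `subjects` metadata do not depend on how the prose happens to be phrased.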

This chapter on combining descriptions explores the various issues involved in planning for interoperability. Section 10.2 delves deeper into the context of interoperability: how are different sectors and industries applying the concepts of combining descriptions, and what are the costs and benefits of doing so? The discussion then moves to the intricacies of combining descriptions in sections 10.3 and 10.4. These sections cover implementing interoperability at design-time (at the technical and syntactic, structural, content, and semantic levels) and at run-time (through data and metadata mapping). The technical discussions focus on making interoperability work in a more automated setting, but the high-level conceptual considerations that are vital in any setting requiring combined information remain central. Finally, section 10.5 goes into more detail about the environmental considerations to think about when planning for interoperability, as many concerns extend beyond issues relating to the information itself and enter the realm of organizational and social politics.

10.2 The Context of Combining Descriptions

Because combining information from different sources and systems varies depending on the type of information, the party managing and exchanging that information, and how it is consumed, it is informative to look at implementations with varying informational, organizational, and consumption considerations. This section looks at how enterprises in different sectors have implemented systems to help combine descriptions. We examine not only implementation successes but also outstanding issues and challenges that demonstrate the need for interoperability and for better ways of managing it.

10.2.1 Business-to-Business Interoperability

Walmart, a retail store founded in 1962 in Rogers, Arkansas, has since grown to be the largest retailer in the United States. With over 8,400 stores in 15 countries, it has annual sales of over $405 billion. With a corporate mission of driving down the cost of living for everyone, Walmart works to streamline its operations and maximize business intelligence so that its supply chain brings the products customers want to the store when they are needed. This lowers distribution and inventory management costs while meeting customer expectations. Since at least the early 2000s, Walmart has been considered a thought leader in using information to streamline its operations and to direct its corporate strategy to compete more successfully with other retailers [Sullivan; Wailgum]. It has pushed for the use of technologies such as the bar code [Wilbert] and RFID [Sullivan] and required suppliers to comply with company-set standards. At Walmart’s end, a standard bar code and a standard RFID implementation allow each store to keep track of the products it sells, regardless of the particular manufacturer or supplier, and this information is easily aggregated at the headquarters level.
Walmart requires that all information exchanges use an Electronic Data Interchange (EDI) format based on a data model that it specifies. Not surprisingly, this retail giant, the entity with greater economic power in the business relationship, dictates how information is to be described, how that information is encoded into an acceptable format, and what protocols and technology will be used to send it. Small suppliers who need Walmart’s business adjust their IT systems and business processes to conform to the specifications set out by the bigger company. If small companies want their products on Walmart’s shelves, they have to play by Walmart’s rules. In contrast, Procter & Gamble (P&G), a major manufacturer of consumer goods almost as large as Walmart, has a more collaborative relationship with Walmart, and the two firms exchange much more supply chain information than Walmart does with small suppliers. P&G shares information about consumers and their habits with Walmart, while Walmart provides supply chain, inventory, and sales information to P&G. The two firms have developed a joint scorecard that measures success, as well as a common understanding of what the consumer wants, leading to increased sales of P&G products in Walmart stores [Grean]. The key word here is “common.” P&G and Walmart were not only able to exchange information technically; they were also able to transform their individual data and business knowledge into something they could both understand and appreciate. With that common understanding, they were able to make joint decisions that improved sales and profits for both companies.

10.2.2 Health Information Systems

A typical person sees many different doctors over the course of his or her lifetime. Perhaps this person went to a pediatrician as a child, a family doctor or general practitioner as an adult, specialists for specific illnesses, and an orthopedic surgeon for knee trouble. All of these doctors keep records about this person, such as tests ordered or treatments received. So when this person sees a new doctor, or wants to access his own medical records, there can be a truly lifelong trail of information to trace.

The health care industry is an especially interesting space in which to explore combining information because of the number and complexity of its stakeholders and information types. Patients, hospitals, clinics, doctors, laboratories, government agencies, research facilities, insurance companies, and health IT vendors all hold information that may be useful to the others and that could help provide a more complete health service experience to the patient.

The example of our patient and his different doctors raises four key issues. First, patient information is scattered among different health care providers (and sometimes the patient himself), making it difficult to access health information when it is needed. This makes personal health management, in which people use their own health information to manage their health, very difficult. Second, when medical records are in paper form (as all medical records were for decades, and many still are), it is much harder to recognize patterns in a patient’s health, such as chronic illnesses or adverse reactions to drugs. This is a barrier to better health care delivery: information can be lost within one person’s medical history, and it is harder to match a patient’s records with the latest medical research. Third, while there has been significant progress in health research on new and improved treatments, more progress could be made if medical information, with appropriate privacy protections, could be aggregated and used in the research process. Treatments can also be improved with better access to research findings and easier ways to match existing cases with potential cures and treatments. Finally, public health could be improved with an easier way to aggregate anonymized health information for entire communities and a quicker way to share health information across different entities. [Detmer]

Medical records can be exchanged by mail, email, and fax. However, this does not solve the problem that health records remain unstructured and in a form that cannot easily be aggregated or retrieved. The US government has pushed to move paper health records into electronic form [ref], but there are still myriad health IT system providers with different
models and processes to store and manage information, which makes information exchange difficult. One way to address this problem is to build interfaces that translate between formats. Imagine a mapping that records how the sending system describes its information and how the receiving system describes its own. The interface then transforms the information being sent by mapping its contents into the format that the receiver expects. This approach becomes unwieldy as more systems join the exchange, because the number of interfaces grows roughly with the square of the number of systems: if every pair of n systems needs its own two-way interface, n systems need n(n - 1)/2 of them, so 10 systems already need 45. [Walker]

Another way is for software creators to agree on a specific format that can be readily exchanged across systems. To advance the government’s initiative to create interoperable electronic health records, the Health Information Technology Standards Panel (HITSP) was created. Consisting of partners from both the public and private sectors, including vendors, health care providers, and standards-making bodies, the panel aims to create a widely accepted set of standards for different health software applications. These standards outline the data models, technical implementations, user roles, security and privacy standards, and process guidelines that software vendors should use in order to interoperate successfully with other systems [HITSP Harmonization Framework].

The health industry provides very fertile ground for discussions of combining information. In terms of our interoperability framework, it belongs to the most complex category. Medical information remains partly in unstructured, physical form, although there has been great effort to move it to a more structured, electronic format.
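The translation interfaces described above, and how their number grows, can be made concrete with a small sketch. The Python below is illustrative only: the field names (`pt_name`, `PatientFullName`, and so on) are hypothetical stand-ins for a clinic system and a laboratory system. It shows a point-to-point translation and compares the interfaces needed when every pair of systems has its own mapping with the mappings needed when each system maps once to a single agreed-upon format.

```python
# A minimal sketch of point-to-point translation between two systems.
# All field names here are hypothetical, chosen only for illustration.
CLINIC_TO_LAB = {
    "pt_name": "PatientFullName",
    "dob": "DateOfBirth",
}


def translate(record: dict, mapping: dict) -> dict:
    """Rename a record's fields using `mapping`; fields with no mapping are dropped."""
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def point_to_point_interfaces(n: int) -> int:
    """Two-way interfaces needed if every pair of n systems has its own mapping."""
    return n * (n - 1) // 2


def shared_format_interfaces(n: int) -> int:
    """Mappings needed if each of n systems maps once to one agreed-upon format."""
    return n


if __name__ == "__main__":
    clinic_record = {"pt_name": "Jane Doe", "dob": "1970-01-01", "ward": "3B"}
    print(translate(clinic_record, CLINIC_TO_LAB))
    # {'PatientFullName': 'Jane Doe', 'DateOfBirth': '1970-01-01'}
    for n in (2, 5, 10, 50):
        print(n, point_to_point_interfaces(n), shared_format_interfaces(n))
```

With 50 systems the pairwise approach needs 1,225 interfaces, while the shared-format approach needs 50. Note also that `translate` silently drops the unmapped `ward` field, which is one way information can be lost in translation.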
Health information can be found with the individual as well as with doctors, hospitals, laboratories, and clinics: a set of heterogeneous organizations with different processes, goals, and information needs, all of which must work together to deliver quality health care. All of these entities not only enter health information but also consume it to inform decisions and analyses, and their processes and information needs as consumers also vary a great deal. Many technical and non-technical challenges remain in building a comprehensive view of a person’s health information. On the technical side, different systems have different implementations, so common data models, security and privacy technologies, network issues, and the like need to be worked out. Use cases for health information also need to be discussed in order to arrive at a model that works for all parties involved. On the organizational side, organizations with competing interests need to agree on something that may well decrease their competitive advantage, and they must also bear the cost of migrating to interoperable systems. These considerations and politics need to be managed if the initiative to make systems interoperable is to succeed.

10.2.3 Government

To leverage the ubiquity of the Internet and other technology, as well as the availability of vast amounts of information, governments around the world began planning as early as the 1990s to move their administrations into the digital age by implementing electronic government, or e-government [Guijarro, 2006]. E-government refers to the delivery of government services through electronic means. These services include government-to-citizen, government-to-business, government-to-employee, and government-to-government interactions, as well as the reverse directions [Scholl, 2007].

Applications of e-government range from a government unit providing a portal where citizens can apply for a driver’s license or file their taxes, to more complex implementations that allow different government agencies to share pertinent information with one another; for example, information on driver’s license holders might be shared with other agencies such as the police.

Because the government interacts with heterogeneous entities and their various systems, e-government planners must consider how to integrate and interoperate with different systems and data models. Countries belonging to the Organization for Economic Cooperation and Development (OECD) have continuously refined their strategies for e-government. The United States and members of the European Union, including the United Kingdom, France, Germany, and Denmark, have all been developing frameworks that describe technical and organizational structures for interoperability. At the technical level, these frameworks specify choices such as the communications technology to be used, network access and connectivity for information exchange, and the data formats and standards to use. They also address semantic interoperability, covering topics such as data models, ontologies, and other forms of knowledge representation. At the organizational level, they consider how, and to what extent, groups and entities inside and outside the organization will interact. [Guijarro, 2006; Klischewski, 2004]

A highly successful example of business-to-government interoperability is the Danish government’s use of the Universal Business Language (UBL). UBL is a “royalty-free library of standard electronic XML business documents such as purchase orders and invoices” [OASIS UBL TC]. The government of Denmark adapted these standards to its local needs and mandated that organizations wanting to do business with the government use these formats for invoicing. By automating the matching of electronic orders with electronic invoices, the government expects total potential savings of about 160 million euros per year [Brun, Brown and Lohde 2005], which highlights the value of a standard format in which businesses can send orders and invoices electronically.
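The automated matching mentioned above can be sketched in a few lines. This is not the Danish system or the UBL schema; it is a hypothetical Python illustration that assumes each electronic invoice carries the identifier of the order it bills (the class and field names are invented), and it flags invoices whose supplier or total disagrees with the order instead of paying them automatically.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Order:
    order_id: str
    supplier_id: str
    total: Decimal


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    order_id: str  # hypothetical reference back to the order being billed
    supplier_id: str
    total: Decimal


def match_invoices(orders, invoices):
    """Pair each invoice with its order; return (matched, exceptions).

    An invoice becomes an exception if it references an unknown order, comes
    from a different supplier than the order, or bills a different total.
    """
    orders_by_id = {order.order_id: order for order in orders}
    matched, exceptions = [], []
    for invoice in invoices:
        order = orders_by_id.get(invoice.order_id)
        if order is None:
            exceptions.append((invoice, "unknown order"))
        elif order.supplier_id != invoice.supplier_id:
            exceptions.append((invoice, "supplier mismatch"))
        elif order.total != invoice.total:
            exceptions.append((invoice, "amount mismatch"))
        else:
            matched.append((order, invoice))
    return matched, exceptions
```

Only the exceptions need human attention; the savings come from every invoice that matches cleanly without anyone retyping or comparing it by hand.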

Recognizing that, as a government, it must give all types of suppliers, big or small, an equal opportunity to sell products and services, the government of Denmark not only set data format standards but also offered several ways to exchange information. For example, to keep costs down for smaller suppliers who would have a harder time switching to the new system, paper invoices could be sent to scanning agencies, which would scan them and create electronic versions to submit to the government.

The complex nature of government, whether citizen- or customer-facing or within its many different departments and organizations, together with its mandate to accommodate all of these different entities, presents interesting challenges in terms of our interoperability framework. Having to cater to users of different types and means implies that services need to be accessible through multiple channels, whether manual, automatic, online, or mobile. The information that is collected is used and exchanged within and across government organizations, with very specific policies governing its management. As such, collecting and sharing user information becomes not only a technical challenge but one fraught with political and social issues.

10.2.4 Education

Advances in technology have changed people’s attitudes toward learning and how schools enable learning [Citation Needed]. Technology has opened access to a vast collection of digital educational resources that professors and students can use to facilitate and augment learning. Libraries subscribe to digital versions of journal articles and make electronic copies of books available. Publishers offer supplementary materials for textbooks, such as electronic learning assessments and related resources. Professors increasingly use learning management software to consolidate the resources in their courses and present them to students in a consistent and cohesive manner. Social media has changed the way students and professors interact, extending classroom exchanges into online discussions, forums, and blogs. Because education in this context involves more than consuming content and extends to how that content is integrated into an environment that maximizes learning [Littlejohn], content is usually redesigned and reformatted to fit different learning environments. McGraw-Hill, for instance, admits to maintaining as many as 25 versions of the same content to suit different learning management systems [Citation Needed]. In addition, different content providers create different supplemental learning tools, such as blogs, discussion forums, and resource repositories, resulting in a learning environment that is neither cohesive nor user-friendly. To make content easier to use and reuse, and to make it possible to integrate different learning tools into a single learning management system, the IMS Global Learning Consortium, an organization of 140 members from leading educational institutions and education-related companies, has released specifications called Common Cartridge and Learning Tools Interoperability.
These specifications provide a common format and guidelines for constructing tools and creating content that can easily be imported into learning management systems. The Common Cartridge specifications give detailed descriptions of the directory structure, metadata, and information models associated with a particular learning object. For example, a learning package from McGraw-Hill may contain content from a book, some interactive quizzes, and some multimedia to support the text. Common Cartridge specifies how files are organized within a directory, how links are represented, how the package communicates with a back-end server, how each component is described, and other details. This enables a professor or student using any learning management system (LMS) that supports Common Cartridge to import a “cartridge” of learning material and have it appear consistently with all other learning materials in the LMS. Content providers therefore need not maintain multiple versions of the same content just to conform to the formats of different systems, and can focus their resources on creating new content rather than maintaining existing versions. Learning Tools Interoperability (LTI), on the other hand, allows an LMS to access external services and communicate with publisher sites to offer more functionality and content. For instance, a professor may want his LMS to include a personal blog for each of his students. His current LMS may not support such functionality, but through LTI, he can bring that functionality into his LMS [Chen]. Again, the specifications indicate how to model and design a particular tool so that it can interoperate with a variety of learning management systems.
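To make the idea of a package that describes its own contents more concrete, the following Python sketch lists the resources declared in a simplified manifest. As we understand it, Common Cartridge builds on IMS Content Packaging, whose manifests describe resources with `resource` and `file` elements; the sample below is a hypothetical, heavily simplified manifest (its namespace URI, identifiers, and paths are invented), not a valid cartridge. The code matches element names while ignoring namespaces, so it does not depend on a particular specification version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified manifest; the namespace URI and all paths are invented.
SAMPLE_MANIFEST = """\
<manifest identifier="sample_cartridge" xmlns="http://example.org/hypothetical-manifest">
  <resources>
    <resource identifier="ch1_text" type="webcontent" href="chapter1/intro.html">
      <file href="chapter1/intro.html"/>
      <file href="chapter1/figure1.png"/>
    </resource>
    <resource identifier="ch1_video" type="webcontent" href="chapter1/lecture.html">
      <file href="chapter1/lecture.html"/>
    </resource>
  </resources>
</manifest>
"""


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def list_resources(manifest_xml: str) -> list:
    """Return one dict per <resource>, with its identifier, type, and file paths."""
    root = ET.fromstring(manifest_xml)
    resources = []
    for element in root.iter():
        if local_name(element.tag) == "resource":
            files = [child.get("href") for child in element if local_name(child.tag) == "file"]
            resources.append({
                "identifier": element.get("identifier"),
                "type": element.get("type"),
                "files": files,
            })
    return resources


if __name__ == "__main__":
    for resource in list_resources(SAMPLE_MANIFEST):
        print(resource["identifier"], resource["type"], resource["files"])
```

Because every importing system reads the same manifest structure, one package can serve many learning management systems instead of one version per system.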

Both Common Cartridge and LTI have garnered the support of leading content providers, such as McGraw-Hill, as well as leading LMS developers, such as Sakai and Blackboard. Looking at this in the context of the interoperability framework, we see that while information providers are working to build structure for digital materials, the main problem is that users consume the content through competing systems, each with its own data formats for accepting content. Large publishers, wanting to increase distribution of their products, offered their content in all of these different formats. While the IMS specifications address the technical considerations in creating content and tools, reaching that point involved a great deal of organizational and political discussion. Internally, content and LMS providers need to set aside the necessary resources to refactor their products to conform to the standards. Externally, competing providers have to collaborate with one another to create the specifications.

In this example, we see how setting and conforming to standards is integral to ensuring interoperability, especially for processes or products that follow a particular design pattern.

10.2.5 Digital Museums

Museums are starting to leverage technology and the popularity of Web 2.0 features, such as tagging and social networking, to attract new audiences [Srinivasan, Boast, Furner, Becvar, 2009]. Increasingly, heterogeneous objects can be searched, accessed, and described across diverse collections. For instance, virtual collections let patrons who are unable to visit remote museums enjoy their resources, while patrons who intend to visit can search for specific objects that may not currently be on view. Metadata interoperability is also vital to lending between collections for specific, jointly curated programs. Heterogeneity in museum data results from different cataloging formats and practices, incomplete or partly erroneous data entries, ad hoc semantic taxonomies, and the different natural languages used in object descriptions [Hyvönen et al. 2004]. For inter-museum virtual exhibitions that rely on multiple digitized collections, interoperability is especially vital. Recognizing this, institutions such as the Getty Information Institute and the International Committee for Documentation of the International Council of Museums have come together to develop standards that ensure consistency in how museums manage information about their collections [Bower and Roberts, 2001].

A compelling example of how planning for interoperability can greatly improve the user experience and attract more audiences is MuseumFinland. Implemented in 2004, the MuseumFinland project aims to provide a portal for publishing heterogeneous museum collections on the Semantic Web. Museum visitors are offered intelligent, content-based search and browsing services that give a consolidated view across Finnish museums, from the National Museum to the Lahti City Museum [Hyvönen et al. (2) 2004]. This virtual inter-museum repository of cultural artifacts, such as textiles, furniture, and tools, is intended to provide a semantically rich search and browsing experience for its users. To achieve these goals, MuseumFinland used shared ontologies for cataloging and mapped the various terms used by different museums onto the shared ontologies. For example, the museums use the same ontology of Locations to avoid ambiguity when referring to places such as villages, cities, and countries. In this way, users can discover objects that were used or produced at the same locations or that relate to the same historical event. This yields a substantially richer contextual experience than conventional virtual museum browsing can offer.
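The mapping of each museum’s own terms onto a shared ontology can be illustrated with a toy Python sketch. It is not MuseumFinland’s implementation, which used Semantic Web ontologies; the concept IDs, museum vocabularies, and catalog records below are hypothetical. Each museum’s place names (including “Helsingfors,” the Swedish name for Helsinki) are mapped to a shared concept, after which objects from different collections can be grouped by location.

```python
from collections import defaultdict

# Hypothetical shared "Locations" ontology: concept ID -> preferred label.
LOCATIONS = {"loc:helsinki": "Helsinki", "loc:lahti": "Lahti"}

# Hypothetical local vocabularies: each museum's own term -> shared concept ID.
TERM_MAPPINGS = {
    "National Museum": {"Helsingfors": "loc:helsinki", "Helsinki": "loc:helsinki"},
    "Lahti City Museum": {"Lahti town": "loc:lahti", "Helsinki city": "loc:helsinki"},
}

# Catalog records as each museum describes them (illustrative data only).
RECORDS = [
    ("National Museum", "woven tapestry", "Helsingfors"),
    ("Lahti City Museum", "oak chair", "Helsinki city"),
    ("Lahti City Museum", "wood plane", "Lahti town"),
]


def group_by_shared_location(records, mappings):
    """Group (museum, object) pairs under shared location concepts.

    Records whose local term has no mapping are skipped rather than guessed.
    """
    groups = defaultdict(list)
    for museum, obj, local_term in records:
        concept = mappings[museum].get(local_term)
        if concept is not None:
            groups[concept].append((museum, obj))
    return dict(groups)


if __name__ == "__main__":
    for concept, objects in group_by_shared_location(RECORDS, TERM_MAPPINGS).items():
        print(LOCATIONS[concept], objects)
```

Although the two museums never use the same string for Helsinki, the tapestry and the chair end up grouped together because both local terms point to the same shared concept.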

10.3 Combining Descriptions During Design-Time

Figure 10.2 – Dimensions of Interoperability

The task of combining information is much more complex than ensuring that metadata between two systems is named consistently, or that a specific data format is used. It has several dimensions, ranging from the technical aspects of the information exchange, through the different information components to be combined, to the conceptual model of the information. Complete agreement in all of these layers would enable the most accurate and efficient exchange of information. But even if agreement on all of these layers is impossible, descriptions can still be combined. Analyzing the specific barriers to combining descriptions may suggest an approach that all parties can agree on. One decision that must be made is whether to cope with differences in interfaces and information models at design-time or at run-time. Planning for interoperability at design-time means that the entities combining their information must agree on the different layers of interoperability before the system is designed and built. This entails all entities choosing a particular technology to exchange and process information, agreeing on how the data is to be structured and formatted, and sharing an understanding of the meaning of each information component to be combined. The rest of this section describes each of these layers in more detail, illustrating how interoperability can be achieved in different situations.

10.3.1 Technical and Syntactic Compatibility

At the most basic level, systems that wish to exchange information must have the infrastructure to communicate with one another and to speak the same language. This means that participating systems must have agreed-upon communication protocols, a common data format, and an agreed-upon way of representing information digitally.

Data Encoding

Different types of information can be transformed from physical to digital form in different ways, which determine how the data is represented, the quality of the resulting data stream, and the amount of space it consumes. It is important to agree on how a particular piece of information will be encoded, because the receiving system needs to interpret and process it. If the expected and received encodings differ, the receiver may misinterpret the data or end up with corrupted data.

Text

Text is encoded by assigning a binary representation to each character in the file. A simple encoding scheme is the American Standard Code for Information Interchange (ASCII), which represents the English alphabet, digits, and the most common punctuation marks using 7 bits, for 128 possible values (0–127); each character is usually stored in one 8-bit byte. Because this character set is so limited, encodings that can also represent characters from other languages, such as Japanese or Chinese, as well as special characters such as the copyright and trademark symbols, have become more widely used than ASCII. UTF-8, for instance, is an encoding of the Unicode standard that can represent every Unicode character, including Latin letters with diacritics, Greek, Cyrillic, Hebrew, Chinese, and Japanese. It uses 1 to 4 bytes (8 to 32 bits) per character and is backward-compatible with ASCII: the 128 ASCII characters are encoded as the same single bytes. Other common text encodings include the JIS encodings for Japanese characters, the Guobiao encodings for Chinese characters, and various ISO standards for European, Middle Eastern, and Asian languages.
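These differences are easy to observe in Python, whose `str.encode` and `bytes.decode` methods take an encoding name. The sketch below shows that UTF-8 uses one byte for an ASCII character and up to four for others, and what happens when UTF-8 bytes are decoded with the wrong encoding (Latin-1 is used here only as an example of a mismatch).

```python
def utf8_byte_count(text: str) -> int:
    """Number of bytes needed to encode `text` in UTF-8."""
    return len(text.encode("utf-8"))


# One character each: ASCII, a Latin letter with a diacritic, a CJK character, an emoji.
samples = ["A", "\u00e9", "\u65e5", "\U0001F600"]

if __name__ == "__main__":
    for ch in samples:
        print(ascii(ch), ch.encode("utf-8"), utf8_byte_count(ch))

    # ASCII text produces identical bytes in ASCII and in UTF-8.
    print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True

    # Decoding UTF-8 bytes as Latin-1 does not fail; it silently garbles the text.
    print("caf\u00e9".encode("utf-8").decode("latin-1"))  # cafÃ©
```

The last line is the dangerous case: no error is raised, so a receiver that guesses the wrong encoding simply stores corrupted text.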

Apart from languages, operating systems also differ in how they represent control characters, such as the undisplayed character or characters that mark the end of a line of text: Unix systems (including Mac OS X) use a line feed (LF), Windows uses a carriage return followed by a line feed (CR LF), and the classic Mac OS used a carriage return (CR) alone. These differences should be considered in cross-platform exchanges; otherwise, information may be interpreted and processed incorrectly, with unexpected actions or results.
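A receiving system can guard against mixed line endings by normalizing them before processing. This Python sketch converts CR LF and lone CR to LF. The order of the two replacements matters: replacing lone CRs first would turn each CR LF into two line breaks.

```python
def normalize_newlines(text: str) -> str:
    """Convert Windows (CR LF) and classic Mac OS (CR) line endings to Unix (LF)."""
    # Replace CR LF first; handling lone CR first would turn CR LF into LF LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    windows_text = "line one\r\nline two\r\n"
    # Splitting on LF alone leaves a stray CR at the end of each line.
    print(windows_text.split("\n"))  # ['line one\r', 'line two\r', '']
    print(normalize_newlines(windows_text).split("\n"))  # ['line one', 'line two', '']
```

The stray CR in the first output is exactly the kind of invisible difference that makes two apparently identical values fail to match.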

Multimedia

Different encoding standards exist for multimedia such as images, audio, and video. These standards differ in the quality of the result when the information is displayed or processed, in the compression algorithms used to make the information easier to transmit and exchange, and in other respects. Again, knowing how a given file is encoded is essential to ensuring that the applications chosen to exchange and process the information can interpret it correctly.
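One practical way to find out how a multimedia file is encoded is to inspect its first bytes, since many formats begin with a fixed “magic number” signature. The Python sketch below checks a few well-known signatures (PNG, JPEG, GIF, and WAV). It is only a heuristic; in practice a system would also rely on declared metadata such as a MIME type.

```python
# Fixed byte signatures at the start of some common multimedia formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
]


def sniff_format(data: bytes) -> str:
    """Guess a file's format from its leading bytes; return 'unknown' if no match."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    # WAV files are RIFF containers: 'RIFF', a 4-byte size, then 'WAVE'.
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "WAV audio"
    return "unknown"


if __name__ == "__main__":
    print(sniff_format(b"GIF89a" + bytes(10)))  # GIF image
    print(sniff_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))  # WAV audio
    print(sniff_format(b"plain text"))  # unknown
```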

Data Exchange Formats

Aside from the way a piece of information is represented, it is also important to consider how the data is structured in order to convey the intended meaning. Several data formats remain popular and widely used in the exchange of information, including EDI, XML, and JSON.

EDI (Electronic Data Interchange) is used to exchange formatted messages between computers or systems. Organizations use this format to conduct business transactions electronically without human intervention, such as sending and receiving purchase orders or exchanging invoice information. Four main EDI standards have been developed: the UN/EDIFACT standard recommended by the United Nations, the ANSI ASC X12 standard widely used in the US, the TRADACOMS standard widely used in the UK, and the ODETTE standard used in the European automotive industry. These standards include formats for a wide range of business activities, such as shipping notices and fund transfers.

EDI messages are strictly formatted, with the meaning of the information being highly dependent on its position in the document. For instance, a line in an EDI document with

BEG*00*NE*MOG009364501**950910*CSW11096^

corresponds to a line in the X12 standard for Purchase Orders (transaction set 850). “BEG” marks the beginning of a Purchase Order Transaction Set. The asterisk (*) separates the elements of the line, and each value corresponds to a particular field or information component described in the standard. “NE”, for example, is the Purchase Order Type Code, which in this instance means “New Order”. The document itself does not describe what each value means. Instead, the parties exchanging the information must agree on these formats beforehand and ensure that each value is in the right position within the document so that the receiving party can interpret it correctly.
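Because EDI meaning depends on position, a parser mostly needs to split the line and count. The Python sketch below assumes, as in the example, that `*` separates elements and `^` ends the segment (in real X12 interchanges the delimiters are declared in the envelope and can differ). It labels elements by position (BEG01, BEG02, and so on) and translates only the Purchase Order Type Code explained in the text; the function and table names are hypothetical.

```python
PO_TYPE_CODES = {"NE": "New Order"}  # only the code explained in the text


def parse_segment(segment: str, element_sep: str = "*", terminator: str = "^"):
    """Split one EDI segment into its ID and position-labelled elements."""
    body = segment.strip()
    if body.endswith(terminator):
        body = body[: -len(terminator)]
    segment_id, *elements = body.split(element_sep)
    # Label elements by position (BEG01, BEG02, ...). Empty strings are kept
    # so that every later element keeps its correct position.
    labelled = {f"{segment_id}{i:02d}": value for i, value in enumerate(elements, start=1)}
    return segment_id, labelled


if __name__ == "__main__":
    seg_id, fields = parse_segment("BEG*00*NE*MOG009364501**950910*CSW11096^")
    print(seg_id, fields)
    print(PO_TYPE_CODES.get(fields["BEG02"], "unknown code"))  # New Order
```

Note the empty fourth element produced by the two consecutive asterisks. Dropping it (for example, by filtering out blank strings) would shift 950910 and CSW11096 one position to the left and silently change their meaning.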

Extensible Markup Language, or XML, is a data format more verbose than EDI. As seen in Figure 10.3, information exchanged using XML is marked-up with metadata that describe its meaning. Anyone can create an XML schema — the field names and structures used to describe a set of information — but some organizations have strived to create standard ways of describing information using XML. The Organization for the Advancement of Structured Information Standards (OASIS), for instance, produces schemas that are used in e-commerce, government, law, supply chain activities, and the web, among others. The example discussed in 10.2.3 shows how one such standard, the Universal Business Language (UBL), was implemented by the government of Denmark for business-to-government transactions. Using these standards to exchange information ensures that both the sender and receiver use the same tags to describe the information and follow the same structure.

<PurchaseOrder>
  <POHeader>
    <TransactionPurpose>00</TransactionPurpose>
    <POType>NE</POType>
    <PONumber>MOG009364501</PONumber>
    <PODate>03-05-2010</PODate>
    <CustomerNumber>950910</CustomerNumber>
    <ContractNumber>CSW11096</ContractNumber>
  </POHeader>
  …
</PurchaseOrder>

Figure 10.3. Sample XML Document

A lightweight alternative to XML that is growing in popularity for text-based information exchange is JavaScript Object Notation (JSON), commonly used to exchange data between a server and a web application. Like XML, JSON labels each piece of information with a name. Unlike XML, it represents information as simple name/value pairs, using notation common to many programming languages, and parsers are available for a wide range of languages. As Figure 10.4 shows, the resulting JSON document is noticeably smaller than its XML counterpart.

{
  "POHeader": {
    "TransactionPurpose": "00",
    "POType": "NE",
    "PONumber": "MOG009364501",
    "PODate": "03-05-2010",
    "CustomerNumber": "950910",
    "ContractNumber": "CSW11096"
  },
  "POBody": …
}

Figure 10.4. Sample JSON Document
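The size difference can be checked with Python’s standard `json` and `xml.etree.ElementTree` modules. The sketch below serializes the header fields shown in Figure 10.4 both ways, without indentation, and confirms that each form can be parsed back to the same values. Exact byte counts depend on formatting choices, so only the comparison matters.

```python
import json
import xml.etree.ElementTree as ET

# Header fields as they appear in Figure 10.4.
HEADER = {
    "TransactionPurpose": "00",
    "POType": "NE",
    "PONumber": "MOG009364501",
    "PODate": "03-05-2010",
    "CustomerNumber": "950910",
    "ContractNumber": "CSW11096",
}


def to_xml(fields: dict) -> str:
    """Serialize as <PurchaseOrder><POHeader>...</POHeader></PurchaseOrder>."""
    root = ET.Element("PurchaseOrder")
    header = ET.SubElement(root, "POHeader")
    for name, value in fields.items():
        ET.SubElement(header, name).text = value
    return ET.tostring(root, encoding="unicode")


def to_json(fields: dict) -> str:
    """Serialize as compact JSON (no extra whitespace)."""
    return json.dumps({"POHeader": fields}, separators=(",", ":"))


if __name__ == "__main__":
    xml_text, json_text = to_xml(HEADER), to_json(HEADER)
    print(len(xml_text), len(json_text))
    # Both forms round-trip to the same values.
    print(json.loads(json_text)["POHeader"] == HEADER)
    parsed = ET.fromstring(xml_text).find("POHeader")
    print({child.tag: child.text for child in parsed} == HEADER)
```

Most of the XML overhead comes from repeating every field name in a closing tag; JSON states each name once.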

These examples show how the same set of elements can be exchanged using EDI, XML, JSON, or other data exchange formats, and how different the resulting documents are. Sending and receiving entities should therefore agree on which data exchange format to use so that the data being exchanged is handled accurately.

Communication Protocols

While data encoding describes how information is represented, and data exchange formats describe how information is structured, communication protocols describe how information is exchanged between systems. These protocols dictate how documents are enclosed within messages and how those messages are transmitted across a network. Communication protocols must specify such things as the format of the message, how errors will be detected and reported, information security, and encryption.

Before the advent of the Internet, a common way to exchange information across distances was the fax machine. Faxing, however, requires human intervention, because the information is received as an image of the original. Today, a number of communication protocols are used over networks, including the File Transfer Protocol (FTP), the Hypertext Transfer Protocol (HTTP) used on the Web, the Post Office Protocol (POP) commonly used to retrieve e-mail, and other protocols in the Transmission Control Protocol/Internet Protocol (TCP/IP) suite. Product manufacturers such as Apple and Cisco also employ proprietary protocols of their own, and different types of networks, such as mobile wireless networks, have corresponding protocols.

Information traveling over networks is typically sent and received in several chunks. Communication protocols break documents into these chunks and give each one a header that ensures the information follows the proper route from sender to receiver and can be reassembled on the other end. Entities on both sides of an information transfer must therefore agree on how messages are transported across the network; otherwise, the messages cannot be properly routed and received.
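The idea of splitting a message into numbered chunks and reassembling it can be shown with a toy Python model. This is a simplification for illustration, not how TCP/IP actually works: real protocols number bytes rather than chunks and add checksums, acknowledgments, and retransmission. Here each hypothetical packet header carries only a sequence number and the total count, which is enough to restore order and detect a missing chunk.

```python
import random


def split_into_packets(message: bytes, size: int) -> list:
    """Cut `message` into chunks of at most `size` bytes, each with a small header."""
    total = (len(message) + size - 1) // size
    return [
        {"seq": i, "total": total, "payload": message[i * size:(i + 1) * size]}
        for i in range(total)
    ]


def reassemble(packets: list) -> bytes:
    """Reorder packets by sequence number and join them; fail if any are missing."""
    ordered = sorted(packets, key=lambda packet: packet["seq"])
    expected = ordered[0]["total"] if ordered else 0
    if [packet["seq"] for packet in ordered] != list(range(expected)):
        raise ValueError("missing or duplicate packets")
    return b"".join(packet["payload"] for packet in ordered)


if __name__ == "__main__":
    message = b"BEG*00*NE*MOG009364501**950910*CSW11096^"
    packets = split_into_packets(message, size=8)
    random.Random(42).shuffle(packets)  # simulate out-of-order arrival
    print(reassemble(packets) == message)  # True
```

Both ends must agree on this header layout; a receiver expecting a different header could neither reorder the chunks nor notice that one was lost.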

10.3.2 Structural Compatibility

Agreeing on technical considerations is not sufficient to enable the exchange of information. Entities that wish to settle interoperability issues at design-time must also agree on some of the more conceptual aspects of information. One such consideration is how the different information components are structured to create a document. Organizations must agree on the granularity of the information, how the components relate to one another hierarchically, and how the document is composed.

Granularity

Granularity refers to the level of detail or precision of a specific information component. For instance, the address of a particular location has several data items, including the street number, street name, city, state, country, and zip code. A high-granularity model might represent an address as all of these individual components (Figure 10.5), while a low-granularity model may combine that information into a single address field (Figure 10.6).

<ShippingAddress>
  <Street>350 5th Ave</Street>
  <City>New York</City>
  <State>NY</State>
  <Country>USA</Country>
  <ZipCode>10118</ZipCode>
</ShippingAddress>

Figure 10.5. High Granularity

<ShippingAddress>350 5th Ave, New York, NY, USA 10118</ShippingAddress>

Figure 10.6. Low Granularity

While it is easy to derive the complete address by aggregating the different information components from the high granularity model, it is not as easy to break down the low granularity model into its individual information components. This does not mean, however, that a high granularity model is always the best choice, especially if the context of use does not require it, as there are corresponding tradeoffs in terms of efficiency and speed in assembling and processing the information.
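The asymmetry between the two models shows up in a short Python sketch using the addresses from Figures 10.5 and 10.6. Composing the low-granularity form from the high-granularity one is a simple template, while splitting it back depends on a hypothetical punctuation rule (commas separate street, city, state, and a final “country zip” part) that breaks as soon as an address contains an extra comma.

```python
import xml.etree.ElementTree as ET

HIGH = ("<ShippingAddress><Street>350 5th Ave</Street><City>New York</City>"
        "<State>NY</State><Country>USA</Country><ZipCode>10118</ZipCode></ShippingAddress>")
LOW = "<ShippingAddress>350 5th Ave, New York, NY, USA 10118</ShippingAddress>"


def compose(high_xml: str) -> str:
    """Build the single-field address from the individual components."""
    parts = {child.tag: child.text for child in ET.fromstring(high_xml)}
    return f"{parts['Street']}, {parts['City']}, {parts['State']}, {parts['Country']} {parts['ZipCode']}"


def naive_decompose(low_xml: str) -> dict:
    """Split a single-field address using a fragile, hypothetical comma convention."""
    text = ET.fromstring(low_xml).text
    street, city, state, country_zip = [part.strip() for part in text.split(",")]
    country, zip_code = country_zip.rsplit(" ", 1)
    return {"Street": street, "City": city, "State": state, "Country": country, "ZipCode": zip_code}


if __name__ == "__main__":
    print(compose(HIGH) == ET.fromstring(LOW).text)  # True
    print(naive_decompose(LOW)["City"])  # New York
    try:
        naive_decompose("<ShippingAddress>Suite 100, 350 5th Ave, New York, NY, USA 10118</ShippingAddress>")
    except ValueError as error:
        print("cannot decompose:", error)
```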

Composition

Two different systems may model the concepts in the same way; however, this does not guarantee that the documents they produce are assembled and composed compatibly. Consider a shipment order coming from a business client. One system may well create two separate documents, the first containing client information, such as the BillTo preferences, and the second containing the shipping information, such as the ShippingAddress. The receiving system, on the other hand, may expect a single Shipping Order document containing all of this information. Although both systems are consistent in the syntax, content, and semantics of the information being sent, they will still be unable to interoperate, because the received documents do not have the expected structure.
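If the parties cannot change their document structures, one design-time fix is a small adapter that recomposes the sender’s documents into the structure the receiver expects. In this Python sketch the wrapper element names `ClientInfo` and `ShippingInfo`, the `BillTo` value, and the function name are hypothetical; only `BillTo`, `ShippingAddress`, and the idea of a single Shipping Order come from the text.

```python
import xml.etree.ElementTree as ET

# The sender's two separate documents (hypothetical wrapper names and values).
CLIENT_DOC = "<ClientInfo><BillTo>Example Corp, Accounts Payable</BillTo></ClientInfo>"
SHIPPING_DOC = ("<ShippingInfo><ShippingAddress>350 5th Ave, New York, NY, USA 10118"
                "</ShippingAddress></ShippingInfo>")


def compose_shipping_order(*documents: str) -> ET.Element:
    """Place the children of each sender document under one <ShippingOrder>."""
    order = ET.Element("ShippingOrder")
    for document in documents:
        for child in ET.fromstring(document):
            order.append(child)
    return order


if __name__ == "__main__":
    combined = compose_shipping_order(CLIENT_DOC, SHIPPING_DOC)
    print(ET.tostring(combined, encoding="unicode"))
```

The adapter has to be written and maintained for this particular pair of systems, which is the kind of cost that agreeing on composition at design-time avoids.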

10.3.3 Content Compatibility

While organizations exchanging information may agree on the structure and composition of documents, the actual content of the documents being exchanged also needs to be considered. The way information is described and formatted needs to be consistent to avoid errors and incorrect interpretation of information.

Descriptors

Unlike EDI, which relies solely on the position of information to determine its meaning, data formats like XML and JSON include tags that describe the content being sent. To interpret data accurately when combining information, these descriptors should be consistent, or agreed upon by the entities involved in the exchange.

For instance, consider a shipment document being sent to a courier service. The customer sends the following as the shipping address:

<ShippingAddress>
  <Street>350 5th Ave</Street>
  <City>New York</City>
  <State>NY</State>
  <Country>USA</Country>
  <ZipCode>10118</ZipCode>
</ShippingAddress>

However, the courier service is expecting the following:

<ShipTo>
  <Street>350 5th Ave</Street>
  <City>New York</City>
  <State>NY</State>
  <Country>USA</Country>
  <ZipCode>10118</ZipCode>
</ShipTo>

When the courier service receives the shipment order, it may not be able to identify the shipping address, because it looks for the descriptor “ShipTo” but is instead given “ShippingAddress.” Although both descriptors mean essentially the same thing, they are different strings, and a problem arises whenever the machine processing this information has not been explicitly instructed to treat them as equivalent.
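Such an explicit instruction can be as simple as a synonym table applied before the receiver looks for the element it expects. This Python sketch uses xml.etree.ElementTree. The table DESCRIPTOR_MAP and the helpers normalize_descriptors and find_ship_to are hypothetical; in practice the equivalences would be agreed between the two parties.

```python
import xml.etree.ElementTree as ET

# The customer's document, using its own descriptor.
sent = ET.fromstring(
    "<ShippingAddress><Street>350 5th Ave</Street><City>New York</City>"
    "<State>NY</State><Country>USA</Country><ZipCode>10118</ZipCode></ShippingAddress>"
)

# Hypothetical agreed equivalences: sender descriptor -> receiver descriptor.
DESCRIPTOR_MAP = {"ShippingAddress": "ShipTo"}

def normalize_descriptors(elem: ET.Element, mapping: dict) -> ET.Element:
    """Rename, in place, every element whose tag has an agreed equivalent."""
    for node in elem.iter():
        node.tag = mapping.get(node.tag, node.tag)
    return elem

def find_ship_to(doc: ET.Element):
    """The courier's processing logic recognizes only the ShipTo descriptor."""
    return doc if doc.tag == "ShipTo" else doc.find(".//ShipTo")

print(find_ship_to(sent))  # None: the descriptor is not recognized
fixed = normalize_descriptors(sent, DESCRIPTOR_MAP)
print(find_ship_to(fixed).findtext("City"))  # New York
```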

Data Types and Formats

Going beyond descriptors to the information itself, conventions need to be in place for what type of information is being combined and how that information is formatted.

Consider the courier service sending a response to the shipment order. The seemingly simple data item ‘Expected Arrival Date’ can have many different data formats. For instance, it could be:


03-05-2010
2010-03-05
03/05/2010
March 5, 2010

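and a host of other possibilities. These variations cause errors when a parser does not recognize the format in use. Worse, a string such as 03/05/2010 can be parsed successfully but wrongly: it means March 5 under the U.S. month-first convention, but May 3 under the day-first convention common in Europe.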
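One common remedy is to agree on a small set of accepted formats and normalize every date to a single representation such as ISO 8601 (YYYY-MM-DD). The sketch below assumes a hypothetical agreement that hyphen and slash dates are month-first, and it assumes an English locale for the month name. Its last line shows why the agreement must be shared: the same string read day-first yields a different, equally valid date.

```python
from datetime import date, datetime

# Hypothetical agreed formats, tried in order. Treating "03/05/2010" as
# month-first is an assumption that both parties must share.
ACCEPTED_FORMATS = ["%Y-%m-%d", "%m-%d-%Y", "%m/%d/%Y", "%B %d, %Y"]  # %B assumes an English locale

def parse_arrival_date(text: str) -> date:
    """Return the first successful parse, or reject formats outside the agreement."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

for raw in ["03-05-2010", "2010-03-05", "03/05/2010", "March 5, 2010"]:
    print(raw, "->", parse_arrival_date(raw).isoformat())  # each -> 2010-03-05

# Read day-first, the same string silently becomes a different date.
print(datetime.strptime("03/05/2010", "%d/%m/%Y").date().isoformat())  # 2010-05-03
```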

Custom Codes and Values

Organizations sometimes use standard codes to represent certain types of information, with the aim of reducing errors and ambiguity. For instance, in the purchase order sample used in 10.3.1, the POType “NE” refers to “New Order.” Other codes are likewise available, such as “BK” for Blanket Order and “RO” for Rush Order. Organizations combining information should agree on these codes so that no issues arise during data processing.
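A receiving system can enforce an agreed code list directly. This sketch uses the three POType codes named above. The dictionary stands in for whatever code list the partners actually agree on, and decode_po_type is a hypothetical helper.

```python
# Purchase order type codes cited in the text; a real agreed list would be
# longer and versioned.
PO_TYPE_CODES = {
    "NE": "New Order",
    "BK": "Blanket Order",
    "RO": "Rush Order",
}

def decode_po_type(code: str) -> str:
    """Translate an agreed code, rejecting codes outside the shared list
    rather than guessing at their meaning."""
    try:
        return PO_TYPE_CODES[code.strip().upper()]
    except KeyError:
        raise ValueError(f"POType {code!r} is not in the agreed code list") from None

print(decode_po_type("NE"))  # New Order
```

Failing loudly on an unknown code surfaces a disagreement between partners at the moment it occurs, instead of letting a misread order flow downstream.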

10.3.4 Semantic Compatibility

The most important layer to agree on is the semantic layer: the meaning of the information being combined. Although there are run-time strategies (see Section 10.4) for combining information from systems with incompatible technical, structural, or content features, the output will still be inaccurate if the entities do not agree on the conceptual model of the information being exchanged.

Consider the relationship between P&G and Walmart described in 10.2.1. P&G believed that it had been achieving 95% billing accuracy, but Walmart considered P&G accurate only 15% of the time. [Citation Needed] Eventually both parties realized that their concepts of billing accuracy differed. While both loosely defined billing accuracy as the purchase-order-to-invoice match rate, a closer look revealed that P&G considered only whether the number of cases delivered matched the number billed, while Walmart checked both the number of cases and the dollar value per case. Uncovering this incompatibility in the conceptual model of billing accuracy took three weeks of assessment, with one person tracking each purchase order and its corresponding invoice.
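The effect of applying two conceptual models to the same documents can be sketched in a few lines of Python. The purchase order and invoice lines below are invented for illustration and do not reproduce the P&G or Walmart figures. Prices are held in integer cents to avoid floating-point comparisons.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Line:
    """Hypothetical PO or invoice line: number of cases and price per case in cents."""
    cases: int
    cents_per_case: int

# Invented (purchase order line, invoice line) pairs, for illustration only.
pairs = [
    (Line(10, 1250), Line(10, 1250)),  # cases and price both match
    (Line(20, 800), Line(20, 825)),    # cases match, price differs
    (Line(5, 3000), Line(4, 3000)),    # cases differ
    (Line(8, 1500), Line(8, 1400)),    # cases match, price differs
]

def cases_match(po: Line, inv: Line) -> bool:
    """A cases-only notion of a correct invoice."""
    return po.cases == inv.cases

def cases_and_price_match(po: Line, inv: Line) -> bool:
    """A stricter notion: cases and dollar value per case must both match."""
    return po.cases == inv.cases and po.cents_per_case == inv.cents_per_case

def billing_accuracy(rule) -> float:
    """Share of PO/invoice pairs that the given rule counts as a match."""
    return sum(rule(po, inv) for po, inv in pairs) / len(pairs)

print(f"cases only:      {billing_accuracy(cases_match):.0%}")            # 75%
print(f"cases and price: {billing_accuracy(cases_and_price_match):.0%}")  # 25%
```

The data, the descriptors, and the formats are identical in both calculations. Only the definition of a match differs.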

This example illustrates how documents can share the same descriptors and have compatible data formats and content types, yet still be incompatible because of inconsistent conceptual models.

10.4 Combining Descriptions during Run-Time

The success of design-time interoperability depends on tightly coupled systems. In such systems, specific design requirements can be negotiated before implementation, and all interoperating systems can be maintained and upgraded on an ongoing basis. Where high flexibility or ad hoc or real-time combinations are desired, run-time interoperability may provide appropriate alternatives. Some low-level incompatibilities between systems, such as syntactic, data-encoding, and certain structural and content differences, can also be rectified with run-time interoperability techniques, which yield more loosely coupled interoperating systems.

We perform run-time combinations routinely as we make everyday decisions. When planning a vacation, for example, we use a variety of systems to negotiate a wide set of ad hoc requirements, such as our resources and time, our fellow travelers and their availability, hotel and transportation bookings, and desirable destinations and what they offer. We somehow reconcile the different descriptors used in each resource we consult and match them against each other, so that the relevant information can be combined and compared. The primary goal of run-time interoperability, then, is to provide integrated solutions for highly complex, disparate, heterogeneous, unbounded, loosely interconnected, highly portable, and constantly evolving systems [Carney et al. 2005]. The following sections discuss some of the most popular run-time interoperability techniques, with examples of how they have been implemented in the real world.

10.4.1 Data Mapping

A straightforward way to enable information exchange between existing systems at run-time is data mapping. In data mapping, data elements from the distinct data models of different systems are compared and matched to their counterparts. In the simplest implementation, data mapping involves only two systems, a source and a target. The relationship between data elements may be unidirectional or bidirectional, and elements that are conceptually the same may be named differently in each system. To illustrate the difference between unidirectional and bidirectional maps, consider two systems: the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED-CT) and the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM). SNOMED-CT is a medical language system for clinical terminology. It is maintained by the International Health Terminology Standards Development Organisation (IHTSDO) and is a designated electronic exchange standard for clinical health information in U.S. Federal Government systems [International Health Terminology Standards Development Organisation]. ICD-10-CM, on the other hand, is the U.S. clinical modification of the International Classification of Diseases, the international diagnostic classification system for general epidemiological, health management, and clinical use maintained by the World Health Organization (WHO) [World Health Organization 2010]. ICD-10-CM is used for coding and classifying morbidity data from inpatient and outpatient records, physicians' offices, and most National Center for Health Statistics (NCHS) surveys. Because many different SNOMED-CT concepts can be mapped to a single ICD code [McBride et al. 2006], a map in this direction cannot be used in reverse without introducing confusion and ambiguity.
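Why a many-to-one map cannot simply be run backwards can be sketched in Python. The identifiers below are placeholders, not real SNOMED-CT concepts or ICD codes, and the helpers invert and is_safely_reversible are hypothetical. Reversing the map makes one target code point back to several candidate source concepts.

```python
from collections import defaultdict

# Hypothetical many-to-one map from fine-grained source concepts to a coarser
# target classification. Identifiers are placeholders, not real codes.
forward_map = {
    "src:concept-A": "tgt:code-1",
    "src:concept-B": "tgt:code-1",
    "src:concept-C": "tgt:code-2",
}

def invert(mapping: dict) -> dict:
    """Reverse a map; each target collects every source that maps to it."""
    reverse = defaultdict(list)
    for source, target in mapping.items():
        reverse[target].append(source)
    return dict(reverse)

def is_safely_reversible(mapping: dict) -> bool:
    """A map can be used in both directions only if it is one-to-one."""
    return len(set(mapping.values())) == len(mapping)

reverse_map = invert(forward_map)
ambiguous = {t: s for t, s in reverse_map.items() if len(s) > 1}
print(ambiguous)  # {'tgt:code-1': ['src:concept-A', 'src:concept-B']}
print(is_safely_reversible(forward_map))  # False
```

A bidirectional map therefore needs either a one-to-one relationship or an explicit rule for choosing among candidates in the reverse direction.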

The purpose of a data map may range from simple data exchange to enabling access to longitudinal data to facilitating standardized reporting [McBride et al. 2006]. Preserving the version histories of data elements and relations in both systems is vital for verifying the validity of a data map. To verify its accuracy, a data map should be reproducible solely from its use case [McBride et al. 2006] by individuals who were not involved in its initial production but are familiar with the meanings of the data elements.
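One way to keep a map verifiable is to store, with every entry, the terminology versions it was built against. Entries built against superseded versions can then be flagged for re-verification. This is a sketch under our own assumptions: the MapEntry fields, the version labels, and the helper stale_entries are hypothetical and are not taken from McBride et al.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MapEntry:
    """One row of a data map, with the terminology versions it was built against."""
    source_code: str
    target_code: str
    source_version: str
    target_version: str

def stale_entries(entries, current_source: str, current_target: str):
    """Return entries built against versions other than the current ones,
    so they can be re-verified before the map is reused."""
    return [e for e in entries
            if e.source_version != current_source or e.target_version != current_target]

# Hypothetical entries and version labels, for illustration only.
entries = [
    MapEntry("src:concept-A", "tgt:code-1", "v2009", "v2010"),
    MapEntry("src:concept-C", "tgt:code-2", "v2010", "v2010"),
]
print([e.source_code for e in stale_entries(entries, "v2010", "v2010")])  # ['src:concept-A']
```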

Data Mapping and Transformation Tools

The conceptual relationships between different descriptions can be mapped manually when designing maps. Manual mapping becomes more difficult, however, as maps grow more complex, whether because of the number of terms being mapped or because of structural or granularity issues. Tools for creating these connections then become vital for ensuring the accuracy and robustness of the maps being designed.

In their most basic form, graphical mapping tools provide a graphical user interface in which users connect data elements from source to target by drawing a line from one to the other [Altova 2010]. More commonly, graphical data mapping tools are included in an extract, transform, and load (ETL) suite that provides additional, more powerful data transformation capabilities. Data mapping is the first step: it captures the relationships between different systems. Data transformation then uses the resulting maps to generate an executable program that converts source data into the target format. ETL tools extract the needed information from outside sources, transform it into information the target system can use by applying the necessary data mappings, and then load it into the end system.
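The three ETL steps can be sketched with the Python standard library alone: csv for extraction, a dictionary as the data map, and sqlite3 as the target. This is a hand-coded illustration, not how any particular ETL product works. The column names, FIELD_MAP, and the month-first date convention of the source are assumptions.

```python
import csv
import io
import sqlite3

# Extract: a hypothetical source export whose column names differ from the target's.
SOURCE_CSV = """ShippingAddress_City,ShippingAddress_Zip,OrderDate
New York,10118,03/05/2010
Boston,02108,03/06/2010
"""

# The data map: source field -> target column (hypothetical names).
FIELD_MAP = {
    "ShippingAddress_City": "ship_to_city",
    "ShippingAddress_Zip": "ship_to_zip",
    "OrderDate": "order_date",
}

def extract(text: str) -> list:
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list) -> list:
    """Rename fields via the map and convert month-first dates to ISO 8601."""
    out = []
    for row in rows:
        rec = {FIELD_MAP[k]: v for k, v in row.items()}
        month, day, year = rec["order_date"].split("/")
        rec["order_date"] = f"{year}-{month}-{day}"
        out.append(rec)
    return out

def load(rows: list, conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS shipments "
                 "(ship_to_city TEXT, ship_to_zip TEXT, order_date TEXT)")
    conn.executemany(
        "INSERT INTO shipments VALUES (:ship_to_city, :ship_to_zip, :order_date)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT * FROM shipments").fetchall())
# [('New York', '10118', '2010-03-05'), ('Boston', '02108', '2010-03-06')]
```

Keeping the map in a separate data structure, rather than hard-coding the renames inside transform, is what lets a graphical tool edit the map and regenerate the transformation.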

Languages such as XSLT (XSL Transformations) and TXL facilitate data transformation. Commercial data warehousing tools offer a range of functionality, from single- or multiple-source acquisition and data cleansing to statistical and analytical capabilities. XSLT, itself expressed in XML (Extensible Markup Language), is a declarative language designed for transforming XML documents into other XML documents [W3C 1999]. For example, XSLT can be used to convert XML data into HTML documents for web display. In XSLT processing, a template processing engine takes an input document in XML format and one or more XSLT stylesheets and produces a new result document.

10.4.2 Metadata Mapping/Crosswalks

In systems such as bibliographic catalogs, where metadata can be as meaningful and important as the actual data, metadata standards are often implemented (see Chapter 4, Section 4.5.1). As a result, interoperability between these systems depends heavily on successfully transforming metadata from one schema to another. Available solutions include crosswalks, translation algorithms, and specialized data dictionaries [OCLC 2008].

Similar to data mapping, one of the most straightforward approaches is the use of crosswalks, which are equivalence tables of metadata elements, semantics, and syntax from one schema to another [NISO 2004]. Crosswalks not only enable systems with different metadata standards to exchange information in real time; third-party systems also use them. Harvesters use crosswalks to generate union catalogs, which combine descriptions of different collections, and search engines use them to query multiple systems as if they were one consolidated system [Chan and Zeng 2006; Godby et al. 2008].
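At its simplest, a crosswalk is an equivalence table applied element by element. In this Python sketch the source element names and the sample record are invented, and the target names follow Dublin Core element labels. The helper apply_crosswalk is hypothetical. It returns unmapped elements separately rather than silently discarding them, so the gap between the schemas stays visible.

```python
# Hypothetical one-directional crosswalk from an invented source schema to
# Dublin Core-style element names.
CROSSWALK = {
    "title": "Title",
    "author": "Creator",
    "pub_year": "Date",
    "publisher": "Publisher",
}

def apply_crosswalk(record: dict, crosswalk: dict):
    """Translate a record element by element; elements with no equivalent
    are returned separately instead of being dropped."""
    translated, unmapped = {}, {}
    for element, value in record.items():
        if element in crosswalk:
            translated[crosswalk[element]] = value
        else:
            unmapped[element] = value
    return translated, unmapped

record = {"title": "Example Title", "author": "A. Author", "pub_year": "2009", "binding": "paperback"}
translated, unmapped = apply_crosswalk(record, CROSSWALK)
print(translated)  # {'Title': 'Example Title', 'Creator': 'A. Author', 'Date': '2009'}
print(unmapped)    # {'binding': 'paperback'}
```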

Crosswalks are used, for example, in the digital library space. There, a system called WorldCat allows users to search many library databases to locate items in their community libraries and, depending on patron privileges, to request items through their local libraries from libraries all over the world [OCLC 2010]. Two metadata standards are involved in making this tool locate holdings in each library accurately. At the book publisher, wholesaler, and retailer end, the international standard Online Information Exchange (ONIX) [EDItEUR 2009] is used to standardize book and serial metadata throughout the supply chain. ONIX is implemented in book suppliers' internal and customer-facing information systems to track products and to facilitate the generation of advance information sheets and supplier catalogs. At the library end, the Machine-Readable Cataloging (MARC) formats [The Library of Congress 2010] are used to manage and communicate bibliographic and related information. When a member library acquires a title, information in ONIX format is sent from the supplier to the Online Computer Library Center (OCLC). There, an ONIX-to-MARC crosswalk is used to match it with a corresponding MARC record in the WorldCat database [Godby et al. 2008]. This enables WorldCat to provide accurate, real-time holdings information for its member libraries as soon as a book is acquired.

Because schemas vary in richness, crosswalks face challenges from several kinds of schema difference [Chan and Zeng 2006]:
- levels of equivalency;
- semantic definitions;
- rules about whether metadata elements are mandatory or optional and how many times they may occur;
- hierarchical or value constraints;
- controlled vocabularies.

Because of these complexities, an absolute crosswalk, which maps only exact equivalents, leaves a non-mappable space when the source schema is substantially richer than the target schema. In practice, relative crosswalks are often implemented instead [Chan and Zeng 2006]; in these, every element in the source schema is mapped to at least one target element regardless of semantic equivalence. Forgoing strict equivalence, however, creates overlaps in meaning and scope and thus lowers the quality of the data converted by crosswalks [Chan and Zeng 2006].
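The trade-off can be sketched with a hypothetical source schema that distinguishes three kinds of date, while the target has only a generic Date element. All element names and values here are invented for illustration. The absolute crosswalk leaves the dates in a non-mappable space; the relative crosswalk maps everything, but the three dates become indistinguishable.

```python
# Hypothetical richer source record with three kinds of date.
SOURCE_RECORD = {
    "title": "Example",
    "date_created": "1890",
    "date_published": "1901",
    "date_digitized": "2009",
}

# Absolute crosswalk: only exact equivalents are mapped.
ABSOLUTE = {"title": "Title"}

# Relative crosswalk: every source element reaches at least one target,
# here by collapsing all date types into the broader "Date" element.
RELATIVE = {"title": "Title", "date_created": "Date",
            "date_published": "Date", "date_digitized": "Date"}

def convert(record: dict, crosswalk: dict):
    """Return the target record (values grouped per target element) and the
    list of source elements that could not be mapped."""
    target, non_mappable = {}, []
    for element, value in record.items():
        if element in crosswalk:
            target.setdefault(crosswalk[element], []).append(value)
        else:
            non_mappable.append(element)
    return target, non_mappable

print(convert(SOURCE_RECORD, ABSOLUTE))
# ({'Title': ['Example']}, ['date_created', 'date_published', 'date_digitized'])
print(convert(SOURCE_RECORD, RELATIVE))
# ({'Title': ['Example'], 'Date': ['1890', '1901', '2009']}, [])
```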

As the number of systems increases, crosswalks become impractical if each pair of schemas requires its own. With n schemas, one crosswalk per pair means n(n-1)/2 crosswalks, whereas mapping every schema to one shared schema requires only n. A more efficient approach is therefore to use one schema as a switching mechanism toward which all other schemas are mapped [Chan and Zeng 2006]. Consider how the Getty's metadata standards crosswalk uses the Categories for the Description of Works of Art (CDWA) to switch between eleven metadata standards, including Machine-Readable Cataloging/Anglo-American Cataloging Rules (MARC/AACR) and Dublin Core (DC) [J. Paul Getty Trust 2010]. In this crosswalk, the “Creation Date” element in CDWA is mapped to “260c Imprint – Date of Publication, Distribution, etc.” in MARC/AACR and to “Date.Created” in DC. Although this creates a two-step look-up at run-time, a direct mapping of this element from MARC/AACR to DC is no longer necessary for the systems to interoperate.
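A sketch of the switching approach, using the three element names from the Getty example (260c, Creation Date, Date.Created). The table HUB_CROSSWALKS is hypothetical; for each schema, it records how hub (CDWA) elements map to that schema's own elements. The helper crosswalks_needed illustrates the counting argument, assuming one crosswalk per pair of schemas.

```python
# Hypothetical switching-schema table: for each schema, hub (CDWA) element ->
# that schema's element. Only the pair from the Getty example is filled in.
HUB_CROSSWALKS = {
    "MARC/AACR": {"Creation Date": "260c"},
    "DC": {"Creation Date": "Date.Created"},
}

def translate(element: str, source: str, target: str) -> str:
    """Two-step lookup: source element -> hub element -> target element.
    Inverting the source table assumes its mappings are one-to-one."""
    to_hub = {schema_elem: hub_elem for hub_elem, schema_elem in HUB_CROSSWALKS[source].items()}
    hub_element = to_hub[element]
    return HUB_CROSSWALKS[target][hub_element]

def crosswalks_needed(n_schemas: int, with_hub: bool) -> int:
    """Pairwise crosswalks grow quadratically; a switching schema needs one per schema."""
    return n_schemas if with_hub else n_schemas * (n_schemas - 1) // 2

print(translate("260c", "MARC/AACR", "DC"))  # Date.Created
print(crosswalks_needed(11, with_hub=False), crosswalks_needed(11, with_hub=True))  # 55 11
```

Under these assumptions, the eleven standards in the Getty example would need 55 pairwise crosswalks but only 11 mappings to the switching schema. Adding a twelfth standard would add one mapping instead of eleven crosswalks.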

10.5 Non-Technical Considerations for Interoperability

The preceding two sections considered interoperability from the perspectives of technical design and implementation. Although these perspectives are integral to realizing the benefits of interoperability, social and organizational concerns cannot be ignored. Organizational policies, biases, structures, and standards, along with the prevailing culture inside and outside organizations, can create both opportunities and challenges for interoperability.

10.5.1 Socio-political (Inter-organizational)

Information and Economic Power Asymmetry

Organizations differ in their objectives, structures, and cultures, and these differences are reflected in their varying sizes, compositions, and influence. As a result, power is distributed very unevenly among organizations.

The example in 10.2.1 illustrates this clearly. The substantial economic bargaining power of a large retailer such as Walmart over small suppliers allows it to mandate the standards, processes, and rules by which business is conducted. Another example is the informational and economic prowess of Google and Apple, which allows them to control the extent of interoperability attainable in products, services, or applications that use their platforms. They exercise this control through mandated APIs and through the process by which third-party applications are approved. If Apple changes how specific features must be implemented in iPhone apps, for instance, developers must comply to keep their apps on the iPhone platform. The asymmetry between these dominant market players and the many smaller entities providing peripheral support, services, or components can produce de facto standards that impose a significant burden on small businesses and reduce overall competition.

Standards

To enable interoperability between systems, applications, and devices, establishing and publishing industry and user community standards is essential. A standard interface describes the data formats and protocols to which systems should conform, so that they can interoperate by adhering to generic requirements. Standards setting is undeniably socio-political, because it involves stakeholders with competing needs and motivations, including developers, implementers, users, and standards bodies. Besides organizational inertia, challenges to standardization include:
- closed policies, processes, or development groups;
- intellectual property;
- credentialing;
- abuse of standards as trade barriers;
- lack of specifications;
- competing standards;
- high implementation costs;
- lack of conformance metrics;
- lack of clarity or awareness.

We see standards setting in practice in 10.2.4, where individuals and organizations from educational institutions, learning tools vendors, content creators, and the like banded together to form the IMS Global Learning Consortium. The consortium creates standard formats and guidelines, such as the Common Cartridge and Learning Tools Interoperability specifications, so that products and services made by different entities can communicate and exchange information with one another. The creation of standards by such a body, however, does not necessarily translate into successful adoption. Widespread use depends on network effects: acceptance and use by a majority of vendors or users, or by major players in the industry. In the education example, large learning tools vendors such as Blackboard and popular content creators such as McGraw-Hill must lead the way to ensure successful adoption.

Public Policy

Beyond businesses and standards-setting organizations, the government sector also wields substantial influence over the implementation and success of interoperability in organizations. Governments and governmental entities serve large constituencies that cannot simply opt out. Because of their size and substantial impact on society at large, they have influence similar to that of large businesses. The different forms of government around the world, ranging from centrally planned autocracies to loosely organized nation states, can have far-reaching consequences for how policies are designed and how underlying cultural values are expressed. For example, the United States, a federal constitutional representative republic, has three levels of government: local, state, and federal. Policy-making and enforcement are fragmented because these independent government units have the power to define laws and policies in their own jurisdictions. Supporting interoperability between systems at different levels and functions of government can be challenging. It is also vital to smooth operations in a nation where decisions, jurisdictions, and powers are highly decentralized and often contradictory.

The adoption of UBL in Denmark, discussed in 10.2.3, illustrates how government can drive rapid adoption of a standard, both by enforcing laws and policies and by involving government institutions directly. Denmark's Ministry of Finance published a statute requiring public sector institutions to be able to send and receive electronic invoices, and the Ministry of Science and Technology specified the use of a Danish localized version of UBL. Because all public sector institutions were pushed to use UBL, companies transacting with them were forced to conform to the standard and develop ways to produce and receive documents compatible with it.

10.5.2 Organizational

Opportunities and challenges for interoperability are not limited to interactions between organizations. Different areas within an organization have different structural and cultural characteristics, so the resulting internal biases are often as strongly polarized as outside influences. For example, siloed business functions may resist interoperability in order to gain competitive advantage or command resources over other business functions. Organizational biases may show up as contradictory information access policies in different groups within the same organization. They may even take the form of separate, disjoint systems that cannot be integrated without substantial additional investment. Where enterprise mandates are issued from the executive level, different functions or units may dispute the implementation of systems or policies, or customize it to the point of defeating the original standardization intent. A proliferation of exceptions and extensive workarounds may indicate this situation. On the other hand, an overarching standardization policy may not be appropriate for every unit within an organization. For example, a division created primarily to innovate and experiment with new technologies may benefit from being exempt from enterprise mandates. The exemption lets it test and validate new ideas that may become useful to the greater organization.

The elements of structure within an organization include:
- formal reporting relationships;
- the depth and span of its hierarchies;
- how business units or entities are defined;
- the linkages available between business units, such as communication, coordination, and integration tools.

Because this structure strongly affects the distribution of power within the organization, it is also an important consideration when examining the extent of interoperability that can be attained or is desired. Business units are often characterized by different kinds of value contribution and by different policies, processes, and practices. To achieve a high level of interoperability, they must clearly define and prioritize business goals, align and coordinate business processes, and build collaboration capabilities. Beyond information exchange, organizational interoperability also aims to provide services that are widely available, easily identifiable, and accessible across the enterprise.


REFERENCES
Altova. "MapForce - Graphical Data Mapping, Conversion, and Integration Tool". Altova. 2010. http://www.altova.com/mapforce.html
Brailer, D. "Interoperability: The Key to the Future Health Care System".
Brun, M., Brown, J., Lohde. "Adoption of UBL in Denmark: Business Cases and Experiences". XTech. 2005.
Carney, D. J., Fisher, D., Morris, E. J., Place, P. R. "Some Current Approaches to Interoperability". Integration of Software-Intensive Systems Initiative. CMU/SEI-2005-TN-033. Technical Note. 2005.
Chan, L. M., and Zeng, M. L. "Metadata Interoperability and Standardization - A Study of Methodology Part I". D-Lib Magazine. Vol. 12(6). 2006.
Chen, J. et al. "Combine Personal Blog functionalities with LMS using Tools Interoperability Architecture".
Chaudhry, B. et al. "Systematic Review: Impact of Health Information Technology on Quality, Efficiency, and Costs of Medical Care".
Daft, R. L. Organization Theory and Design. pp. 201-220.
Detmer, D. "Building the national health information infrastructure for personal health, health care services, public health, and research".
EDItEUR. "ONIX". EDItEUR. 2009. http://www.editeur.org/83/Overview/
Fishman, C. "The Wal-Mart You Don't Know".
The J. Paul Getty Trust. "Metadata Standards Crosswalk". The J. Paul Getty Trust. 2010. http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html
Glushko, R. J. "Seven Contexts for Service System Design". Handbook of Service Science. 2010.
Godby, C. J., Smith, D., and Childress, E. "Toward Element-Level Interoperability In Bibliographic Metadata". Code4Lib Journal. Issue 2, 2008-03-24. 2008.
Green, M. et al. "Supply-Chain Integration through Information Sharing: Channel Partnership between Wal-Mart and Procter & Gamble".


Guijarro, L. "Interoperability Frameworks and enterprise architectures in e-government initiatives in Europe and the United States".
Healthcare Information Technology Standards Panel (HITSP). "Harmonization Framework". http://www.hitsp.org/harmonization.aspx#Specifications
Hyvönen et al. "Cultural Semantic Interoperability on the Web: Case Finnish Museums Online". University of Helsinki and Helsinki Institute for Information Technology (HIIT). 2004.
Hyvönen et al. "Finnish Museums on the Semantic Web: The User's Perspective on MuseumFinland". University of Helsinki and Helsinki Institute for Information Technology (HIIT). 2004.
IBM. "Patterns: Service-Oriented Architecture and Web Services". IBM Redbooks. 2004. http://www.redbooks.ibm.com/abstracts/sg246303.html
IMS Global Learning Consortium. "Common Cartridge". http://www.imsglobal.org/commoncartridge.html
IMS Global Learning Consortium. "Learning Tools Interoperability". http://www.imsglobal.org/toolsinteroperability2.cfm
International Health Terminology Standards Development Organisation. "SNOMED CT". http://www.ihtsdo.org/snomed-ct/
JSON. "Introducing JSON". http://json.org
Klischewski, R. "Information Integration or Process Integration? How to Achieve Interoperability in Administration".
Lamoreaux, N., Raff, D., and Temin, P. "Beyond markets and hierarchies: toward a new synthesis of American business history". The American Historical Review, 108 (2003): pp. 404-433.
Layne, K. et al. "Developing fully functional e-government: A four stage model".
The Library of Congress. "MARC Standards". The Library of Congress. 2010. http://www.loc.gov/marc/
Linthicum, D. "Enterprise Application Integration".
Littlejohn, A. "Reusing online resources: a sustainable approach to e-learning".
*EDI samples come from http://miscouncil.org


McBride, S., Gilder, R., Davis, R., and Fenton, S. "Data Mapping". American Health Information Management Association. 2006.
Neslin, S. et al. "Challenges and Opportunities in Multichannel Customer Management".
NISO (National Information Standards Organization). "Understanding Metadata". Bethesda, MD: NISO Press. 2004. http://www.niso.org/standards/resources/UnderstandingMetadata.pdf
OASIS. "OASIS Universal Business Language (UBL) TC". http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl
OCLC. "Metadata Schema Transformation Services". OCLC. 2008. http://www.oclc.org/research/activities/schematrans/default.htm
OCLC. "WorldCat". OCLC. 2010. http://www.worldcat.org/
Papazoglou, M. et al. "Advances in object-oriented data modeling". Chapter 9: "Database Integration: The Key to Data Interoperability".
Powell, W. "Neither market nor hierarchy: Network forms of organization". Research in Organizational Behavior, 12 (1990): pp. 295-336.
Rockley, A. "Enterprise Content Management: A Unified Strategy".
Scholl, H. et al. "E-Government Integration and Interoperability: Framing the Research Agenda".
Shopstyle.com.
http://www.silicon.com/management/ceo-essentials/2010/01/05/cheat-sheet-electronic-data-interchange-39737354/
Srinivasan, R., Boast, R., Becvar, K., and Furner, J. "Digital Museums and Diverse Cultural Knowledges: Moving Past the Traditional Catalog". The Information Society, 25(4).
Sullivan, L. "Wal-Mart's Way". http://www.informationweek.com/news/mobility/RFID/showArticle.jhtml?articleID=47902662
Vernadat, F. B. "Technical, Semantic and Organizational Issues of Enterprise Interoperability and Networking". Annual Reviews in Control, 34 (2010): pp. 139-144.
Von Riegen, C. "OASIS Symposium: How Standards Address Interoperability Needs". SAP AG. 2006. http://www.oasis-open.org/events/symposium_2006/proceedings.php
W3C. "XSL Transformations (XSLT) Version 1.0". W3C Recommendation. 1999. http://www.w3.org/TR/xslt#section-Introduction


Wailgum, T. "How Wal-Mart Lost Its Technology Edge". http://www.cio.com/article/143451/How_Wal_Mart_Lost_Its_Technology_Edge?page=9&taxonomyId=3171
Walker, J. et al. "The Value of Health Care Information Exchange and Interoperability".
Wal-Mart supplier website. http://walmartstores.com/Suppliers/248.aspx
Wilbert, C. "How Wal-Mart Works". http://money.howstuffworks.com/wal-mart.htm
World Health Organization. "International Classification of Diseases (ICD)". WHO. 2010. http://www.who.int/classifications/icd/en/

