
Decision Support Systems 59 (2014) 351–360


Assessing the quality of large-scale data standards: A case of XBRL GAAP Taxonomy

Hongwei Zhu a,⁎, Harris Wu b

a Department of Operations and Information Systems, Manning School of Business, University of Massachusetts Lowell, Lowell, MA 01854, United States
b Department of Information Technology and Decision Sciences, College of Business and Public Administration, Old Dominion University, Norfolk, VA 23529, United States

⁎ Corresponding author. Tel.: +1 978 934 2585. E-mail addresses: [email protected] (H. Zhu), [email protected] (H. Wu).

0167-9236/$ – see front matter © 2014 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.dss.2014.01.006

Article info

Article history: Received 12 May 2013; Received in revised form 8 January 2014; Accepted 17 January 2014; Available online 24 January 2014

Keywords: Information quality; Data quality; Data standards; Quality assessment; XBRL; GAAP Taxonomy

Abstract

Data standards are often used by multiple organizations to produce and exchange data. Given the high cost of developing data standards and their significant impact on the interoperability of data produced using the standards, the quality of data standards must be systematically measured. We develop a framework for systematically assessing the quality of large-scale data standards using automated tools. It consists of metrics for intrinsic and contextual quality dimensions, as well as effectual metrics that assess the extent to which a standard enables data interoperability. We evaluate the quality assessment framework using two versions of a large financial reporting standard, the US GAAP Taxonomy, and public companies' financial statements created using the Taxonomy. Evaluation results confirm the effectiveness of the framework. Findings from the evaluation also offer valuable insights to decision makers who develop and improve data standards, select and adopt data standards, or consume standards-based data.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Data standards specify data elements to be used by multiple organizations to create data that can be exchanged and processed unambiguously. Large-scale data standards, such as those within the US Department of Defense [52] and across the real estate mortgage industry [32], include many data elements and are intended for use by a large number of organizations. Such data standards are costly to develop and can have a significant impact on the organizations that use them. Systematic methods for measuring the quality of large-scale data standards are needed to aid the development, implementation, and evolution of data standards.

Despite extensive work in the areas of data and information quality [18,30,54,56,66], little has been done to create automated methods for assessing the quality of data standards. We attempt to fill this research gap by developing a framework with metrics and automatic methods to systematically assess the quality of large-scale data standards. The framework offers methods to answer fundamental questions about a data standard, such as: Is the standard complex? Does the standard have everything I need? Do I need everything in the standard? Does the standard accomplish its primary objective? We evaluate the framework using real-world data standards and the corresponding data instances in the financial domain. The standards are the United States Generally Accepted Accounting Principles (GAAP) Taxonomy released in 2009 and then revised in 2011. The two versions of the Taxonomy are specified using the eXtensible Business Reporting Language (XBRL) [70]. The Securities and Exchange Commission (SEC) has adopted both the 2009 and 2011 versions of the GAAP Taxonomy and mandated that public companies use either version to create their financial statements. The data instances are official financial statements encoded in XBRL, submitted to the SEC by publicly traded companies.

Our work makes four contributions to both research and practice. (1) The framework consists of a small number of quality metrics for four primary aspects of data standard quality. Thus it is not only compact and easy to implement, but also informative and relatively comprehensive. (2) The contextual and effectual metrics of the framework are novel. The contextual metrics measure how well a data standard fits users' needs. The effectual metrics objectively measure how well a standard has accomplished its primary objective of achieving semantic data interoperability. We are not aware of comparable standard quality metrics elsewhere in the extant literature. (3) For each metric we implement an automated method to obtain the measurement. Evaluation shows that the metrics are effective for large-scale automated measurements. We evaluate the framework using XBRL taxonomies and data instances, but the methods are applicable to any data standard specified using a formal language such as XML. (4) The framework and the evaluation also directly answer the call for increased professional relevance in decision support systems research [14]. The SEC has tasked the Financial Accounting Standards Board (FASB) to continuously "improve" the GAAP Taxonomy. The methods and findings of this research are directly useful to decision makers such as those at FASB and other standards development organizations.


The rest of the paper is organized as follows. Section 2 reviews related research to justify the need for effective measurement of data standard quality. Section 3 describes the metrics of the framework. Section 4 presents the evaluation method and briefly describes the data standards and the corresponding data used for the evaluation. Section 5 presents the evaluation results. Section 6 discusses the characteristics of this research. Section 7 concludes the paper and points out directions for future research.

2. Related work

Our work is closely related to three bodies of literature: (1) data quality, (2) quality of schemas, ontologies, and other forms of metadata, and (3) quality of XBRL data and XBRL taxonomies.

Data standards are a form of metadata; thus certain theories and concepts of data quality can be adapted to examine the quality of data standards. Most data quality research has focused on data, not the standards used to create and organize the data. The extant data quality research has identified useful quality dimensions and developed assessment methods that use objective metrics or survey instruments [9,13,26,40,59,67]. The dimensions, whether discovered using the survey method or based on theories, can be grouped into categories such as intrinsic, contextual, and representational [59,67]. Although the methods and results of data quality research are useful to this study, they must be adapted to consider the unique characteristics and purposes of data standards. Furthermore, many data quality metrics in extant research rely on subjective human assessment that can be biased, labor intensive, and costly to implement in practice.

The quality of database schemas and data integration schemas is discussed in [2,19,49]. Among the quality aspects introduced, completeness and minimality (i.e., no redundancy) are most relevant to data standards. Because of the differences in use contexts, we need to develop metrics and measurement methods suitable for data standards.

Metadata in digital libraries and library information systems plays an important role in organizing and searching information items. Metadata quality has been studied in the library and information science community [9,40,44,58,60]. Most of this work examines metadata values. For example, completeness is often determined by whether values for metadata such as author and title are supplied for each information item [68]. This line of research concerns issues in standards-based data, which is relevant, but it does not address the key questions regarding the quality of data standards.

Ontologies are often used for sharing knowledge, developing intelligent systems, and enabling semantic integration of data [18,31,39,63]. With the growth of the Semantic Web [3,53], ontology engineering has emerged as a research stream, with a number of studies focusing on ontology quality [10,21,23,42,57,62]. This line of research has been influenced by earlier work on the quality of conceptual modeling [27], which is extended in subsequent studies [25,35–37]. Existing approaches often rely on intuition or theories such as semiotic theory to identify the dimensions of ontology quality, and most evaluations have used relatively small "toy ontologies" [10,55]. Studies on the complexity of ontologies have used graph-theoretic or entropy-based metrics adapted from software complexity [24,33,69,72]. Other quality metrics, such as cohesion and coupledness, are specific to ontology implementation and application [28,41,71]. Thus not all ontology metrics are applicable to data standards. Moreover, ontology quality metrics do not examine how an ontology is being used, and yet how users use a data standard is a crucial component of measuring the standard's quality.

Numerous papers have been written on the topic of XBRL [50]; however, only a few studies focus on the quality of XBRL data and XBRL taxonomies. In an earlier study [7], a manual inspection of line items in the financial statements of 69 companies was used to examine whether a preliminary taxonomy met the reporting needs of most companies. Manual inspection continued to be used in studies examining whether companies properly used XBRL taxonomies [1,5,6]. While the manual approach can reveal deep issues, such as violations of reporting conventions and inconsistencies between the XBRL and non-XBRL versions of financial statements, it is labor intensive and not applicable at a large scale. Commercial software tools were used in [12] on approximately 100 XBRL financial statements, and manual effort was then used to examine the error and warning messages generated by the tools. It was still a manual approach, and its results depend on the particular implementation of the software tools. Automated tools have been suggested for performing large-scale analysis [7,8], which holds great potential to advance both research and practice of standard quality management, as demonstrated in recent studies that examined only one or two aspects of quality [74,75].

A recent study identifies dozens of aspects of data standard quality in terms of the end product, the development process, and the implementation and use of data standards [22]. The measurement of these aspects relies on the use of an instrument that requires human input. Thus it is labor intensive to obtain a measurement, and the results are subjective.

Clearly, there is a need for systematic methods to efficiently and objectively measure the quality of large-scale data standards. This work aims to address that need.

3. Framework for quality of data standards

We develop the framework with three underlying design principles. First, quality is based on the notion of "fitness for use". Thus the framework must examine not only a data standard itself, but also whether the standard meets users' needs and how well the standard leads to interoperable data. Second, the framework must consist of metrics that can be algorithmically measured so that they are applicable to large-scale data standards. There are two aspects of "large-scale": the standard itself is large, and the number of its users is large. Third, the framework must be compact and informative. The framework does not need to cover every aspect of data standard quality. Rather, it should be fit for use by practitioners, who prefer a small number of dimensions for quality assessment [46].

3.1. Quality dimensions

We define the quality of a data standard as the standard's fitness for multiple users to produce interoperable data. Following this definition and the design principles discussed earlier, we identify four "use-centric" dimensions that indicate the quality of data standards from the intrinsic, contextual, and effectual aspects, as summarized in Table 1.

Complexity of a data standard affects the correct understanding and appropriate use of the standard due to limitations of users' cognitive capacity [34,61]. Risks of unintended consequences also increase with the increasing complexity of information systems [11,29,38,45], of which data standards are often an important part. A number of factors impact standards complexity, such as the complexity of the underlying domain, the amount of information needing to be captured, and the diverse needs and preferences of various stakeholders. A standard should be kept as simple as possible, but certain stakeholders may prefer high complexity (e.g., technology suppliers may prefer high complexity as it will increase demand for their products). Thus it is important to measure and manage standards complexity to minimize misunderstanding and misuse of data standards. Aside from complexity, misuse can be exacerbated when users of a standard are not provided with sufficient incentives for producing standards-based data [51]. Although the perceived complexity of a data standard may differ among users with differing levels of expertise, complexity is generally intrinsic to a given standard, and we will develop metrics to measure this intrinsic dimension.

Completeness of a data standard indicates whether the standard contains the specifications of all the data elements and relationships needed by the user. Relevancy of a data standard indicates whether the standard contains the specifications of only the data elements and relationships needed by the user. Apparently, these two dimensions depend on the specific usage contexts of various users.


Table 1. Dimensions of quality of data standards.

Dimension         Aspect       Explanation
Complexity        Intrinsic    Number of data elements and number of various relationships among the data elements specified in the data standard
Completeness      Contextual   Extent to which a data standard specifies all the data elements and relationships needed by users of the standard
Relevancy         Contextual   Extent to which a data standard specifies only the data elements and relationships needed by users of the standard
Interoperability  Effectual    Extent to which a data standard achieves its primary objective of ensuring interoperability of data created using the standard


In practice, a data standard sometimes lacks the necessary completeness and relevancy because differing stakeholder interests and insufficient incentives often result in a standard that either contains "infrequently reused" data elements [52] or is a "least common denominator" [4], limiting the standard's fitness for use (i.e., quality). Thus completeness and relevancy are important quality dimensions of data standards.

Interoperability of data from multiple users of a data standard assesses the effect of the standard on its primary objective of achieving interoperability of standards-based data.

Although additional dimensions may be added, these four dimensions provide a concise indication of quality from the intrinsic, contextual, and effectual aspects of a data standard's fitness for use. Furthermore, as we will see next, metrics for these dimensions can be measured with automated methods, making them suitable for evaluating large-scale data standards.

3.2. Metrics for quality dimensions

For each dimension, we identify and define a set of metrics that canbe measured using automatic computational methods.

3.2.1. Metrics for complexity

A data standard usually specifies data types (for physical implementation), data elements (corresponding to concepts in application domains), and relationships among data types (e.g., composite and derived data types) and data elements (e.g., is_a and part_of relationships). Since there are often more data elements than data types, we focus on metrics for data elements in the rest of the discussion.

Data elements and their relationships can be represented using a graph where nodes correspond to data elements defined in a standard and directed edges correspond to the relationships between data elements. We can then use various characteristics of a directed graph to define complexity metrics. Let S be the set of nodes and E be the set of labeled directed edges. The following metrics indicate certain aspects of a data standard's complexity (a computational sketch follows the list):

• Number of data elements (i.e., nodes), |S|, reflects the size of the standard.
• Number of edges, |E|, also reflects the size of the standard.
• Edge–node ratio, |E|/|S|, indicates the complexity of relationships among data elements.
• Entropy, e = −Σi p(i) log2 p(i), where p(i) is the probability of any given node having degree i (i.e., having i edges). Entropy indicates the uncertainty, and therefore the complexity, of relationships among data elements. The minimum entropy is 0, attained when all nodes have the same number of edges. The maximum is log2 k, where k is the number of all possible degrees a node can have, attained when a node has an equal probability of having any of the k degrees.
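To make these metrics concrete, here is a minimal Python sketch that computes all four on a toy element graph. The element names and edges are illustrative only; they are not taken from the Taxonomy tooling described later in the paper.

import math
from collections import Counter

# Toy directed graph: nodes are data elements; edges are directed relationships.
S = {"Assets", "AssetsCurrent", "AssetsNoncurrent", "AssetsAbstract"}
E = [("Assets", "AssetsCurrent"),        # calculation: summation-item
     ("Assets", "AssetsNoncurrent"),     # calculation: summation-item
     ("AssetsAbstract", "Assets")]       # presentation: parent-child

num_nodes = len(S)                       # |S|
num_edges = len(E)                       # |E|
edge_node_ratio = num_edges / num_nodes  # |E|/|S|

# Degree of a node = number of edges incident to it (in-degree + out-degree).
degree = Counter()
for u, v in E:
    degree[u] += 1
    degree[v] += 1
for node in S:
    degree.setdefault(node, 0)

# p(i) = fraction of nodes with degree i; entropy per the definition above.
n = len(S)
counts = Counter(degree.values())
entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

print(num_nodes, num_edges, edge_node_ratio, round(entropy, 3))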

Intuitively, the number of elements (concepts) measures complexity in terms of size. The number of edges and the edge–node ratio measure complexity in terms of relationships among concepts. The entropy measures complexity in terms of the variance, or the uncertainty, of the relationships in which a concept is involved. These metrics are directly related to user tasks: understanding a given data standard, choosing the appropriate data elements to represent the user's data, and making appropriate extensions to the standard when extensions are allowed.

Note that as an intrinsic dimension, complexity is measured by evaluating the standard itself, not against the data instances based on the standard. In contrast, data instances are needed for the metrics of the contextual and effectual dimensions, which are described next.

3.2.2. Metrics for completeness and relevancy

Completeness and relevancy of the same data standard can be different for different users, and they evolve with a user's needs even for the same user. Further, they differ between an individual user and the user community. When a data element is needed, chances are that all the relationships associated with it are also needed. Thus it is likely that relationship-based measurements for completeness and relevancy are proportional to element-based measurements. For simplicity, in this paper we limit the metrics to data elements and leave the relationships among data elements for future research.

Let Ui be the set of data elements required by user i. From user i's perspective, the metrics for completeness and relevancy can be defined as

completenessi = |Ui ∩ S| / |Ui|,  and  relevancyi = |Ui ∩ S| / |S|.

From the user community's perspective, the metrics can be defined as

completenessc = |∪i Ui ∩ S| / |∪i Ui|,  and  relevancyc = |∪i Ui ∩ S| / |S|.
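For concreteness, the four metrics have a direct set-based implementation; the following Python sketch uses hypothetical element names (the paper's own measurements are computed with SQL procedures over a relational database, as described in Section 4.2).

# S: elements defined in the standard; Ui: elements a user actually needs/uses.
S = {"Assets", "Liabilities", "Equity", "Revenues"}
users = [{"Assets", "Liabilities", "CustomItemA"},   # U1
         {"Assets", "Revenues"}]                     # U2

def completeness_i(Ui, S):
    return len(Ui & S) / len(Ui)   # share of the user's elements covered by the standard

def relevancy_i(Ui, S):
    return len(Ui & S) / len(S)    # share of the standard the user actually needs

# Community-level metrics pool all users' elements first.
U_all = set().union(*users)
completeness_c = len(U_all & S) / len(U_all)
relevancy_c = len(U_all & S) / len(S)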

A standard can be complete by specifying every possible data element, but it then suffers from low relevancy because many of the specified data elements may not be needed by most users. Conversely, a standard can be highly relevant by specifying only crucial data elements that are absolutely needed by all users, but it is then incomplete because it does not specify the data elements needed by a variety of users. Analogously, with more than 230,000 entries, the Oxford English Dictionary is perceived to have high quality by adult users. But for an elementary school student, most of the words in the dictionary are not relevant. To the student, a children's or junior dictionary (typically with several thousand entries) has higher quality, even though occasionally the student cannot find a certain word in it.

3.2.3. Metrics for interoperability

A standard often adopts a uniform syntax for data representation; thus it is relatively trivial to achieve syntactic interoperability when data is produced in conformance with the standard. A standard also defines a set of data elements with semantics agreed upon by all users, aiming to attain semantic interoperability. However, semantic heterogeneity problems will arise when the standard allows for multiple representations of the same data [51] or when users are allowed to choose among different elements in the standard or to extend the standard with custom elements. Certain semantic heterogeneity problems can be resolved when semantic mappings are available [18,31]. However, there are no automatic methods to reliably induce such mappings.


Manual creation of such mappings is not scalable for standards with a large number of users.

In this paper, we focus on the comparability aspect of semantic interoperability. A set of data instances is comparable if the instances use the same set of data elements defined in a data standard. Interoperability measures the extent to which the data instances have overlapping data elements defined in a standard. This definition allows us to measure interoperability directly without relying on unreliable semantic matching techniques [47,48]. It is possible that pervasive misuse of an otherwise high-quality data standard will result in data with low interoperability. The metrics can be adapted by including only the data of well-intentioned users if such users can be identified. Here we do not intend to distinguish abusers from well-intentioned users. Thus the metrics reflect the actual effect of all users.

The interoperability between a pair of data instances is based on the common data elements used. The interoperability between users i and j, Ii,j, can be defined as

Ii,j = |Ui ∩ Uj| / √(|Ui| · |Uj|).   (1)

Clearly, Ii,j = Ij,i. The pair-wise interoperability for all users, I2, is defined as the arithmetic mean of the pair-wise interoperability among all pairs. This definition can be extended to the interoperability of any k-tuple (with k ≥ 2) as

Ii1,…,ik = |Ui1 ∩ ⋯ ∩ Uik| / (|Ui1| · ⋯ · |Uik|)^(1/k).   (2)

The k-interoperability of all users, Ik, can be defined as the arithmetic mean of the k-interoperability among all k-tuples. We will limit our discussion to I2 and I3 because interoperability calculation is computationally expensive: for Ik, there are O(n^k) k-tuples to compute, where n is the number of users of a given data standard.

When a user is allowed to extend the standard, Ui can be partitioned into two sets: Ui^s (elements from the standard) and Ui^c (elements custom-made by the user). One may argue that custom elements tend to be specific to the user and that people are largely interested in comparing the data defined in the standard. Thus it is reasonable to measure interoperability by considering only standard data elements in the formulas. For this purpose, we define Ii,j′ and Ii,j,k′ by replacing all occurrences of U with U^s in Eqs. (1) and (2). Similarly, I2′ and I3′ are the arithmetic means over the pairs and triples.
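Eqs. (1) and (2), and their primed variants, can be implemented directly; the sketch below is a naive Python version (the function names are ours, and the exhaustive enumeration makes the O(n^k) cost discussed above explicit). It assumes every user's element set, restricted to the standard if S is given, is non-empty.

from itertools import combinations
from math import prod
from statistics import mean

def tuple_interop(element_sets):
    # Eq. (2) for one k-tuple: |U1 ∩ … ∩ Uk| / (|U1| … |Uk|)^(1/k); Eq. (1) is k = 2.
    k = len(element_sets)
    common = set.intersection(*element_sets)
    return len(common) / prod(len(u) for u in element_sets) ** (1 / k)

def Ik(users, k=2, S=None):
    # Mean k-interoperability over all k-tuples; O(n^k) tuples for n users.
    # If S (the standard's element set) is given, all U are replaced by U ∩ S,
    # which yields the primed variants I2', I3'.
    scores = []
    for tup in combinations(users, k):
        sets = tup if S is None else tuple(u & S for u in tup)
        scores.append(tuple_interop(sets))
    return mean(scores)

# Usage: I2 = Ik(users, 2); I2_prime = Ik(users, 2, S=standard_elements)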

4. XBRL GAAP Taxonomy and data collection

We use the XBRL GAAP Taxonomy and data instances created using the Taxonomy to empirically evaluate the framework for data standard quality. In this section, we provide background information about XBRL and the GAAP Taxonomy, followed by a description of the data collection and analysis methods.

4.1. XBRL and the GAAP Taxonomy

XBRL is a technology based on XML Schema and XML Linking. It defines a business reporting language by specifying a set of data types, XML elements, and attributes for each element. For example, XBRL defines data types such as monetaryItemType and sharesItemType that are often used in business reporting. Using XBRL, any jurisdiction can develop its own reporting taxonomy as a data standard for companies to exchange business data.

An XBRL-based business reporting taxonomy consists of taxonomy schemas that define data elements and linkbases that specify various relationships between data elements or between a data element and other resources. For example, below is the GAAP Taxonomy specification of the Assets data element:

<xs:element id='us-gaap_Assets' name='Assets' nillable='true' substitutionGroup='xbrli:item' type='xbrli:monetaryItemType' xbrli:balance='debit' xbrli:periodType='instant'/>

The name attribute specifies what is generally known as the "tag" for users to tag their Assets data. The type attribute specifies the data type of the element, which is the monetaryItemType data type defined in XBRL. The element also has several attributes that specify XBRL-specific properties of the element. Below is an example of how the Assets element is used by a company to report its total assets ($176 billion):

<us-gaap:Assets contextRef="eol_PE2035——1210-K0010_STD_0_20120929_0" decimals="-6" id="id_401409_472EB522-A942-4262-B21B-5925B6A7DA2D_1_16" unitRef="iso4217_USD">176064000000</us-gaap:Assets>

There are two types of elements: concrete (by default) and abstract (specified using the abstract attribute). A concrete element such as Assets can be used in data instances (financial statements) with actual values. An abstract element is used by the Taxonomy only to conceptually group other elements, which usually have a part-of or is-a relationship with the abstract element.

An XBRL taxonomy has five types of linkbases, each specifying a kind of relationship: definition, label, reference, calculation, and presentation. A definition linkbase specifies the conceptual relationships between elements, such as generalization–specialization or parent–child relationships. A label linkbase provides human-readable descriptions for the elements defined in the taxonomy schema. A reference linkbase provides further explanations of the elements by linking them to authoritative references (e.g., SEC regulations or certain accounting standards) that define the meaning of the elements.

A calculation linkbase specifies the numeric relationships between concrete elements. For example, the following fragments in the GAAP Taxonomy specify that Assets is the sum of Current Assets and Noncurrent Assets:

<calculationArc order='10' use='optional' weight='1.0' xlink:arcrole='http://www.xbrl.org/2003/arcrole/summation-item' xlink:from='loc_Assets' xlink:to='loc_AssetsCurrent' xlink:type='arc'/>

<calculationArc order='20' use='optional' weight='1.0' xlink:arcrole='http://www.xbrl.org/2003/arcrole/summation-item' xlink:from='loc_Assets' xlink:to='loc_AssetsNoncurrent' xlink:type='arc'/>

When this relationship is represented using a graph, each data element, identified by an ID in either the xlink:from or the xlink:to attribute, corresponds to a node, and each link, specified by the xlink:arcrole and xlink:type attributes, corresponds to a directed edge. Thus the above calculation linkbase can be represented with three nodes and two directed edges.
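As an illustration of this mapping, the sketch below extracts the directed edges from a calculation linkbase using Python's standard-library XML parser; the file name is hypothetical, and the paper's actual ETL programs are not shown.

import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"   # namespace of xlink:from / xlink:to

def calculation_edges(linkbase_path):
    # Return a directed edge (from-label, to-label) for every calculationArc.
    tree = ET.parse(linkbase_path)
    edges = []
    for arc in tree.iter():
        if arc.tag.endswith("calculationArc"):
            edges.append((arc.get(XLINK + "from"), arc.get(XLINK + "to")))
    return edges

# e.g., calculation_edges("us-gaap-calc.xml") on the fragment above would yield
# [("loc_Assets", "loc_AssetsCurrent"), ("loc_Assets", "loc_AssetsNoncurrent")]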

A presentation linkbase specifies the hierarchical grouping (mainly the parent–child relationship) and the order in which the elements are presented in a report. For example, the GAAP Taxonomy's presentation linkbase uses an AssetsAbstract element as the parent of a variety of assets (including the Assets element) organized into different levels for rendering a human-readable financial report.


Table 2. Datasets used for evaluation.

                 1st year filer   2nd year filer   Total
2009 Taxonomy    60               61               121
2011 Taxonomy    1189             223              1412
Total            1249             284              1533


The SEC adopted the GAAP Taxonomy and mandated that all public companies use the Taxonomy to produce financial statements in XBRL format, beginning on June 15, 2009 with a phase-in schedule based on company size. By October 31, 2014, all public companies must use the Taxonomy to submit their financial statements to the SEC. Each company needs to submit financial statements with additional details (so-called "detailed tagging") starting in its second year of XBRL filing. The 2009 GAAP Taxonomy had 13,452 elements and 41,651 presentation and calculation links among these elements. In March 2011, the SEC adopted a revised Taxonomy, the 2011 GAAP Taxonomy, with 15,967 elements and 54,823 presentation and calculation links among these elements. Since then, a new version has been scheduled for release on an annual basis. Companies are recommended to use the latest version of the Taxonomy and are allowed to extend the Taxonomy by introducing additional data elements and relationships.

4.2. Data acquisition and analysis

The methods to support data acquisition and analysis are depicted in Fig. 1.

A data acquisition agent monitors the RSS feed from the SEC (feed://www.sec.gov/Archives/edgar/usgaap.rss.xml) to obtain company filings submitted to the SEC. The acquisition agent downloads the financial statements and the accompanying taxonomy extensions into a local repository. An ETL (Extract/Transform/Load) program parses the downloaded files and loads the extracted data into a relational database. The GAAP Taxonomy is also parsed and loaded into the relational database. Stored SQL procedures and other programs are used to analyze the data in the relational database.
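A minimal sketch of such an agent's core step, assuming the feed is an ordinary RSS document with one item per filing; the function is illustrative, not the authors' code, and fetching over https rather than the feed:// scheme is our assumption.

import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://www.sec.gov/Archives/edgar/usgaap.rss.xml"

def list_filing_links(feed_url=FEED_URL):
    # Fetch the feed and return the link of each announced filing.
    with urllib.request.urlopen(feed_url) as resp:
        root = ET.fromstring(resp.read())
    return [item.findtext("link") for item in root.iter("item")]

# A full agent would poll this periodically, download each filing's XBRL files
# into the local repository, and hand them to the ETL step.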

We collected all second-quarter (with an official filing date of June 30, 2011) financial statements submitted to the SEC in XBRL format as of August 15, 2011. Each statement is from a different public company, or so-called filer. The data is partitioned according to the version of the GAAP Taxonomy used and whether the company is in its first or second year of using the GAAP Taxonomy (see Table 2). A majority of the statements are from companies in their first year of using the GAAP Taxonomy, and most of them used the 2011 version.

5. Results of empirical evaluation

In this section, we present the results of evaluating the frameworkusing two versions of the US GAAP Taxonomy and financial statementscreated using the taxonomies.

5.1. Complexity of GAAP Taxonomy

The complexity metrics for the 2009 and 2011 GAAP taxonomies are presented in Table 3. The last column shows the percentage change from the 2009 to the 2011 version of the Taxonomy. For illustration purposes, we only consider calculation and presentation links.

Fig. 1. Methods for data acquisition and analysis.

5.1.1. Number of elements and number of edges

Both taxonomies define a large number of data elements that are either concrete or abstract. We use Sc to denote the set of all concrete elements. The 2011 Taxonomy has 18.70% more elements; most of the new additions are abstract elements that organize the Taxonomy. The 2011 Taxonomy also has substantially more edges in the graphs corresponding to calculation and presentation links. The increase in presentation links (41.74%) is greater than the increase in calculation links (14.67%), which indicates that the effort of creating the 2011 Taxonomy focused on better organizing data in human-readable financial reports rendered using presentation links.

5.1.2. Edge–node ratio

Not all data elements have edges defined in calculation or presentation linkbases. Only concrete, numeric elements can have calculation links. In the 2009 Taxonomy, 4713 elements have calculation links; in the 2011 Taxonomy, 5250 elements do. We use Sc+ to denote elements that have a calculation link. For the graphs corresponding to calculation linkbases, we present two edge–node ratios, one considering all concrete elements in Sc, and the other considering only the concrete elements in Sc+. Both ratios are higher for the 2011 Taxonomy.

All elements of the 2009 Taxonomy have presentation links. Out of the 15,967 elements of the 2011 Taxonomy, 244 (1.5%) do not have a presentation link. Given the small percentage of elements without a presentation link, we use the number of all elements when calculating the edge–node ratio for the graphs corresponding to presentation linkbases. The 2011 Taxonomy has a 19.59% increase in this edge–node ratio, which is greater than the corresponding increases for the calculation graph. Again, this indicates that the focus of the 2011 Taxonomy was on improving the organization of data in human-readable reports.

The fact that the edge–node ratios of both versions of the GAAP Taxonomy are greater than 1 indicates that the data elements are well connected. We have considered only two of the five types of linkbases; a taxonomy user, i.e., a filing company, must consider all linkbases. Thus the GAAP Taxonomy is quite complex in terms of relationships among data elements.

5.1.3. Entropy

Both versions of the GAAP Taxonomy have high entropy, and the 2011 version has higher entropy. This means that for a randomly chosen data element, there are many probable numbers of calculation or presentation links. In fact, both the number of child nodes and the number of edges (i.e., degrees) per node have a long-tail distribution. That is, most nodes have only a small number of direct links to other nodes, but a few nodes have a large number of direct links to other nodes.

Table 3. Complexity metrics of the 2009 and 2011 GAAP Taxonomies.

                                     2009      2011      % Change
|S|, number of elements              13,452    15,967    18.70%
|Sc|, number of concrete elements    10,799    11,159    3.33%
|Ec|, number of calculation links    15,566    17,849    14.67%
|Ep|, number of presentation links   26,085    36,974    41.74%
|Ec|/|Sc|                            1.44      1.60      11.11%
|Ec|/|Sc+|                           3.30      3.40      3.03%
|Ep|/|S|                             1.94      2.32      19.59%
ec, entropy of calculation graph     2.48      2.64      6.45%
ep, entropy of presentation graph    2.69      2.98      10.78%


Fig. 2. Distribution of the number of links.

For example, the median and maximum degrees in the presentation graph corresponding to the 2011 Taxonomy are 1 and 391, respectively. Fig. 2 shows the number of links versus the occurrence frequency on a log–log scale.
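A long-tail degree distribution of this kind can be visualized directly; below is a matplotlib sketch, with a hypothetical degree list standing in for the Taxonomy's per-node degrees.

from collections import Counter
import matplotlib.pyplot as plt

degrees = [1, 1, 1, 1, 2, 2, 3, 7, 391]   # hypothetical per-node degrees
freq = Counter(degrees)                    # degree -> number of nodes with that degree

plt.scatter(list(freq.keys()), list(freq.values()))
plt.xscale("log")
plt.yscale("log")
plt.xlabel("number of links (degree)")
plt.ylabel("occurrence frequency")
plt.show()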

In summary, both versions of the GAAP Taxonomy are complex. The 2011 Taxonomy is more complex, but its elements are better organized than those of the 2009 Taxonomy.

5.2. Completeness and relevancy of GAAP taxonomies

Completeness (C) and relevancy (R) of the two versions of the Taxonomy from a user's perspective are measured using the datasets. Their average values and standard deviations (in parentheses) from an individual user's perspective are provided in Table 4.

On average, both versions of the GAAP Taxonomy have high completeness and low relevancy. This is because most of the data elements used by a user are from the GAAP Taxonomy, hence the high completeness. However, the number of GAAP elements used by a user is very small compared to the total number of elements defined in the GAAP Taxonomy, hence the low relevancy. As shown in Table 5, when a user (i.e., a filing company) uses the 2011 GAAP Taxonomy, on average a financial statement contains fewer than 150 data elements, and nearly 110 of the elements are from the GAAP Taxonomy. The proportions for the 2009 Taxonomy users are similar. The large variances indicate significant heterogeneity of Taxonomy users in terms of the number of standard or custom elements used.

Another observation from Tables 4 and 5 is that the average completeness, relevancy, and number of standard elements used differ among different groups of users. Table 6 presents the p-values to test the statistical significance of the differences in completeness and relevancy.

Table 4. Completeness and relevancy of the GAAP Taxonomies from an individual user's perspective.

                 1st year filer         2nd year filer         Both user types
                 C          R           C          R           C          R
2009 Taxonomy    0.8660     0.0098      0.6980     0.0175      0.7813     0.0137
                 (0.1101)   (0.0041)    (0.1273)   (0.0050)    (0.1458)   (0.0060)
2011 Taxonomy    0.9190     0.0082      0.7806     0.0182      0.8972     0.0098
                 (0.0864)   (0.0034)    (0.1093)   (0.0050)    (0.1035)   (0.0052)
Both versions    0.9165     0.0083      0.7628     0.0181      0.8880     0.0101
                 (0.0883)   (0.0035)    (0.1182)   (0.0050)    (0.1118)   (0.0054)

A p-value less than 0.05 indicates that the difference is significant.
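Such tests are routine to automate; here is a sketch using SciPy with hypothetical per-filer completeness scores (the paper does not name its statistics tooling).

from scipy import stats

first_year = [0.92, 0.88, 0.95, 0.90, 0.93]    # hypothetical per-filer completeness
second_year = [0.78, 0.74, 0.81, 0.79, 0.76]

t_stat, p_value = stats.ttest_ind(first_year, second_year)  # two-tailed by default
print(p_value)   # p < 0.05 would indicate a significant difference in means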

Completeness and relevancy for first year and second year taxonomy users are statistically different. In fact, to second year users, both taxonomies are less complete and more relevant. The main reason is that second year filers are required by the SEC to annotate their financial statements with more details (so-called "detailed tagging"); therefore they typically expand their usage of both standard and custom elements. An increased use of GAAP elements results in a higher relevancy of the data standard. When the increase in custom elements outnumbers the increased use of GAAP elements, completeness decreases.

The measured values are also different for the two versions of the GAAP Taxonomy. Except for second year filers' relevancy, all other differences between the two taxonomy versions are statistically significant. The 2011 Taxonomy has higher completeness and lower relevancy than the 2009 Taxonomy. We observe that in the dataset a majority of financial statements are from first year filers who used the 2011 Taxonomy. Without detailed tagging, they use a smaller number of data elements, among which the proportion of GAAP elements is larger than that of the 2009 Taxonomy users. Thus the 2011 Taxonomy has higher completeness from these users' perspective. Similarly, the 2011 Taxonomy users in the dataset used a smaller number of GAAP elements (in absolute value, see Table 5), and at the same time, the 2011 Taxonomy has 3.33% more concrete elements. Thus the measured relevancy is lower.

From a user community's perspective, both versions of the GAAP Taxonomy have substantially lower completeness and higher relevancy in comparison to the measurements for individual users (see Table 7).

Table 5. The numbers of elements used in financial statements.

           2009 Taxonomy (N = 121)       2011 Taxonomy (N = 1412)
           Standard   Custom   Both      Standard   Custom   Both
Min        46         0        86        6          0        30
Max        322        348      657       377        302      638
Mean       148.09     59.33    229.63    108.91     18.6     147.88
Std dev    64.18      64.84    123.52    58.23      32.47    85.86

           1st year filers (N = 1249)    2nd year filers (N = 284)
           Standard   Custom   Both      Standard   Custom   Both
Min        6          0        30        23         0        83
Max        355        348      638       377        348      657
Mean       91.92      59.33    122.40    200.32     71.52    294.78
Std dev    38.61      64.84    52.75     55.54      55.42    96.59


Table 6. p-Values of two-tailed t-tests with equal means as the null hypothesis.

                 1st year vs. 2nd year                2009 vs. 2011
                 C          R                         C          R
2009 Taxonomy    0.0000     0.0000      1st year      0.0005     0.0030
2011 Taxonomy    0.0000     0.0000      2nd year      0.0000     0.1687
Both versions    0.0000     0.0000      Both filers   0.0000     0.0000

Table 8. Interoperability among 121 financial statements based on the 2009 GAAP Taxonomy.

                      Ii,j      Ii,j′     Ii,j,k    Ii,j,k′
Min                   0.0494    0.0791    0.0200    0.0300
Max                   0.7168    0.9217    0.4416    0.7991
Mean                  0.2031    0.2952    0.1192    0.1735
Standard deviation    0.0748    0.0878    0.0463    0.0576


Even though an individual user may use only a couple hundred elements from the GAAP Taxonomy, these elements do not always overlap between users. Thus, collectively, more Taxonomy elements are used by the user community, hence a higher relevancy. Meanwhile, the users collectively introduce a large number of custom elements. When the increase in custom elements outpaces the increased use of Taxonomy elements, completeness decreases. For example, the 121 companies that used the 2009 Taxonomy collectively used 2736 elements from the Taxonomy and introduced 7179 custom elements. The 1412 companies that used the 2011 Taxonomy together used 5244 GAAP elements and introduced 26,269 custom elements.

We shall point out that the measurements for a community change with the size and characteristics of the community. Relevancy semi-monotonically increases with the size of the community, because any additional element of a standard used by the community increases the standard's relevancy to the community. Completeness depends not only on the standard elements used, but also on the custom elements introduced. Within the dataset, we can only make a fair comparison between first year filers and second year filers who used the 2009 Taxonomy (60 filers vs. 61 filers). To second year filers, the Taxonomy has lower completeness and higher relevancy. As explained earlier for individual users, this is largely due to the SEC requirements on detailed tagging.

5.3. Interoperability of data created using GAAP Taxonomy

Out of the 121 financial statements based on the 2009 GAAP Taxonomy, we have computed the interoperability of 7260 pairs. The summary statistics of the interoperability scores Ii,j, Ii,j′, Ii,j,k, and Ii,j,k′ are reported in Table 8.

As shown in the first column of Table 8, the average pair-wise interoperability score is only 0.2031. That is, investors can conveniently compare only about 20.31% of the financial information from two companies' statements.

While allowing flexibility, the usage of non-standard elements certainly affects interoperability. Many custom elements extend GAAP elements to allow for more detailed, company-specific reporting. If investors do not consider company-specific elements when comparing companies' financial statements, the interoperability can be computed based on GAAP elements only. The results for this scenario are reported in the second column of Table 8. The interoperability score for the dataset is 29.52%.

The interoperability of the financial statements from three companies is expected to be lower than that of two companies (see the third and fourth columns in Table 8). On average, 11.92% of the financial information from three companies' XBRL statements is comparable. If only GAAP elements are considered, 17.35% of the financial information is comparable.

Table 7. Completeness (C) and relevancy (R) of the GAAP Taxonomy from a user community's perspective.

                 1st year filers          2nd year filers          All filers
                 #      C        R        #      C        R        #      C        R
2009 Taxonomy    60     0.5317   0.1384   61     0.2851   0.2165   121    0.2759   0.2533
2011 Taxonomy    1189   0.2733   0.3748   223    0.2197   0.3646   1412   0.1664   0.4699

The 2011 GAAP Taxonomy leads to higher interoperability among the financial statements, as shown in Table 9. The interoperability among second year filings is lower than that among first year filings. Second year filers use more custom elements due to detailed tagging, resulting in lower interoperability. The differences among all datasets are statistically significant (p < 0.01) using a two-tailed test, except for the difference between the 2009 and 2011 taxonomies measured by I2′ among second year filers.

Table 9. Interoperability of financial statements based on Taxonomy version and filing status.

                 1st year filers     2nd year filers     All filers
                 I2       I2′        I2       I2′        I2       I2′
2009 Taxonomy    0.2363   0.3154     0.1985   0.3053     0.2031   0.2952
2011 Taxonomy    0.2450   0.3695     0.2208   0.3074     0.2306   0.3084

Since comparisons are usually performed for companies in the same industry, we classified the companies into industries according to the first two digits of the NAICS (North American Industry Classification System) code and calculated the interoperability of firms within the same industry. The interoperability between financial statements varies by industry (Table 10). For example, the manufacturing industry has higher interoperability among its financial statements, likely due to its established financial accounting structure. The finance and insurance industry, on the other hand, has the lowest interoperability, likely due to its intrinsic accounting complexity and the diversity of financial structures within the industry. Except for the finance and insurance industry, financial statements within a given industry have higher interoperability than financial statements from all industries. For most industries, the interoperability among financial statements based on the 2011 GAAP Taxonomy is higher than that of financial statements based on the 2009 GAAP Taxonomy.

Table 10. Mean interoperability by industry.

                            2009 Taxonomy             2011 Taxonomy
Industry                    N     I2       I2′        N      I2       I2′
Finance and Insurance       30    0.2023   0.2921     316    0.2285   0.3028
Information                 7     0.2739   0.3838     45     0.2639   0.3614
Manufacturing               27    0.2709   0.3725     385    0.3055   0.3998
Mining                      10    0.2119   0.3366     75     0.2416   0.3289
Professional                3     0.3002   0.4117     47     0.2959   0.3861
Transport and warehousing   7     0.2051   0.3218     26     0.2772   0.3645
Utilities                   9     0.2306   0.3734     32     0.2481   0.3370
Wholesale trade             3     0.3570   0.4611     20     0.3286   0.4158
Other                       25    0.2136   0.3155     466    0.2307   0.3159
All                         121   0.2031   0.2952     1412   0.2306   0.3084
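The industry grouping step can be sketched as follows, assuming each filing carries its NAICS code; the data here is hypothetical, and the pairwise score follows Eq. (1).

from collections import defaultdict
from itertools import combinations
from statistics import mean

def pairwise_I2(users):
    # Eq. (1) averaged over all pairs of element sets.
    return mean(len(a & b) / (len(a) * len(b)) ** 0.5
                for a, b in combinations(users, 2))

# Each filing: (NAICS code, set of elements used) -- hypothetical data.
filings = [("522110", {"Assets", "Deposits"}),
           ("522120", {"Assets", "Deposits", "Revenues"}),
           ("311511", {"Assets", "Revenues", "InventoryNet"}),
           ("311520", {"Assets", "Revenues"})]

by_industry = defaultdict(list)
for naics, elements in filings:
    by_industry[naics[:2]].append(elements)   # first two NAICS digits = industry

industry_I2 = {sector: pairwise_I2(users)
               for sector, users in by_industry.items() if len(users) >= 2}
print(industry_I2)   # mean pairwise interoperability per industry sector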

It is interesting to observe that the 2011 GAAP Taxonomy is more complex, yet financial statements based on it generally have a higher pair-wise interoperability score. Why? Recall that to most individual users, the 2011 Taxonomy is more complete but less relevant, which means that these users use a smaller number of elements from the Taxonomy (low relevancy), but these elements represent a larger fraction of all the data elements needed by the users (high completeness). As a user community, the Taxonomy elements being used tend to overlap more; relative to the size of the community, the number of distinct Taxonomy elements used is smaller than for the 2009 Taxonomy users. As a result, the community-wise relevancy is higher for the 2011 users (see Table 7). When companies use more common data elements from the Taxonomy, the interoperability among their financial statements increases.

As companies start to report more details in their second year filings, the interoperability of second year filers within the same industry is also lower than that of the financial statements from first year filers. Table 11 shows the interoperability among first year and second year filers using the 2009 GAAP Taxonomy, for several major industries.

Table 11. Mean interoperability by industry for 1st year and 2nd year filings based on the 2009 Taxonomy.

                        1st year filings        2nd year filings
Industry                N    I2       I2′       N    I2       I2′
Finance and Insurance   21   0.2389   0.3203    9    0.1613   0.2858
Information             6    0.3830   0.4911    3    0.2251   0.3669
Manufacturing           12   0.3755   0.4711    15   0.2516   0.3711
All industries          60   0.2363   0.3154    61   0.1985   0.3153

6. Discussion

We have presented a use-centric framework with metrics that can be measured using automated methods to assess multiple dimensions of data standard quality. The evaluation employs a large real-world dataset and produces measurements that are corroborated by evidence in practice, in terms of differing user requirements and evolving data standards.






The evaluation indicates that the metrics and automated methods are effective in measuring the multiple aspects of data standard quality. Complexity metrics used in our framework have been used elsewhere to measure the complexity of artifacts such as ontologies and software [23,33,72]. The other metrics are novel contributions of our work. As mentioned in Section 2, completeness in most metadata quality studies examines whether the values for certain metadata elements are supplied in the instances. This definition does not examine whether the data standard (i.e., the metadata schema and vocabulary) has everything that a user requires. The notion of completeness in an XBRL usability study is similar to ours [7], but their method is completely manual. Relevancy is clearly task specific. Our metric is use-centric and allows for a direct measurement. We are not aware of a similar metric in the literature. For example, to measure an ontology's relevancy, the method in [10] selects the information search task and uses the count of subclass–superclass relationships as a proxy. This is at best an indirect measurement. We have not seen any use of effectual metrics, such as interoperability among data instances, in the literature.

The framework is useful to decision makers in both standards development organizations and user communities of data standards (including data producers as well as data consumers). Standards developers can use the framework and the measurement methods to continuously monitor and improve their data standards. For standards users who need to select from multiple competing data standards, the framework provides useful tools for making comparisons. The framework is also useful to consumers of standards-based data, as they too need to be aware of the quality of both the standard and the data.

It is tempting to provide guidelines for deciding whether a data standard has high or low quality based on the measurements from the framework. Unfortunately, this cannot be done without considering application context. For example, without knowing context specifics, one cannot (and should not) answer questions such as "a standard is 80% complete, is that good or bad?" and "a standard is 75% relevant to all users collectively, is that good or bad?". The measurements, however, can help standards makers and standards users in their decision making. For example, the team at FASB who maintains the GAAP Taxonomy has been identifying frequent custom data elements and selectively adding the most common ones to the next iteration of the Taxonomy. The team is aware of the trade-offs between completeness and relevancy and does not attempt to make the Taxonomy 100% complete. In fact, FASB would like to allow data producers to design custom elements outside the standard to encourage voluntary reporting of detailed financial information. By examining the metrics from our proposed quality measurement framework, the FASB taxonomy team can optimize the Taxonomy to achieve a balance between uniformity and flexibility.



It is also tempting to combine the metrics to obtain a single measurement. However, we do not think there is a universal formula for combining the metrics, because the relative importance of the quality dimensions varies by application context. A standard can be good in certain aspects but not so good in others. Additionally, different metrics are more valuable at different stages of a data standard's life cycle. For example, intrinsic metrics can be used at the development stage, while contextual and effectual metrics are suitable in the pilot and production stages, when users have begun to use the standard. However, if a single measure is absolutely necessary, one can use methods such as a weighted harmonic mean to combine multiple measurements (a sketch follows below). The usefulness of such weighted measures may be an interesting topic for future research.
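For reference, the weighted harmonic mean of values x1,…,xm with weights w1,…,wm is (Σ wj) / (Σ wj/xj); a one-function sketch, with illustrative numbers:

def weighted_harmonic_mean(values, weights):
    # H = (sum of weights) / (sum of weight/value); all values must be positive.
    return sum(weights) / sum(w / x for w, x in zip(weights, values))

# e.g., combining completeness 0.9 and relevancy 0.3 with weights 1 and 2:
# weighted_harmonic_mean([0.9, 0.3], [1, 2]) ≈ 0.386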

The framework has several limitations. Additional dimensions can be added. We have purposely left out syntax-based metrics because, with the increasing use of software that comes with robust syntactic validation functions, syntactic errors can be largely avoided. However, additional dimensions based on semantics are desirable. For example, a large data standard is likely to contain semantically equivalent data elements. Redundancy, or minimality, is a good metric that can potentially be measured semi-automatically by adapting the techniques developed for schema and ontology matching and mapping [15–17]. It would also be interesting to study the impact of redundancy on completeness and relevancy. There is a trade-off between adding redundancy to the standard to promote its perceived completeness and fitness for use, and the resulting decrease in relevancy because of the enlarged standard. The methods for measuring the interoperability of standards-based data can be further enhanced. As shown in [18], ontology-based semantic mappings (when available) can be utilized to improve interoperability, especially interoperability of data instances created under different standards. Accuracy, as defined ontologically in [64], is also an important dimension. However, there is no known automatic method to measure accuracy. This is partially because there is no definitive relationship between signs (e.g., element names) and meanings [65].
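As a purely illustrative sketch of such semi-automatic redundancy screening, the fragment below flags element pairs whose labels are lexically similar. The function name, labels, and threshold are hypothetical, and this lexical ratio is only a crude stand-in for the richer evidence (structure, data types, instance data) used by the matching techniques of [15–17].

    from difflib import SequenceMatcher
    from itertools import combinations

    def candidate_redundancies(labels, threshold=0.85):
        """Flag element pairs whose labels are lexically similar."""
        hits = []
        for a, b in combinations(labels, 2):
            ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= threshold:
                hits.append((a, b, round(ratio, 2)))
        return hits

    # Hypothetical element labels, for illustration only
    print(candidate_redundancies([
        "AccountsPayableCurrent",
        "AccountsPayableTradeCurrent",
        "DeferredRevenue",
    ]))  # flags the first two labels as a candidate redundant pair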

Another limitation of the framework is that its contextual and effectual quality metrics rely on the availability of observations on how users use a data standard. When such observations are unavailable, surveys may be used as an alternative. For example, surveys have been used to assess the user-perceived quality of metadata for digital content [43,73]. Perception gaps between different stakeholders can be useful for solving problems in quality management [26]. Although the survey method involves humans in the loop and can identify issues not discoverable using automated methods, it can also introduce bias.

7. Conclusion and future research

We have developed a framework for assessing the quality of large-scale data standards and empirically evaluated this framework using a real-world data standard. The results show that the quality dimensions and metrics can be used to effectively assess several important aspects of data standard quality. Furthermore, our analysis of the XBRL GAAP Taxonomy and XBRL data provides timely insights for financial reporting practice. For example, through our dissemination effort and active engagement with the XBRL community, Taxonomy designers have learned more about the trade-off between completeness and relevancy. They realize that they should not expect to totally eliminate custom elements by exhaustively expanding the Taxonomy. Instead, they selectively add commonly created custom elements to the next version of the Taxonomy. This approach can increase data interoperability with a minimum adverse impact on the Taxonomy's complexity and relevancy. They have also begun to designate certain standard elements as industry-specific and to add industry-specific standard elements, as these data elements are commonly used by a particular industry. Such industry specifications will increase both the completeness and the relevancy of the standard.

For future research, we will enhance the framework for data standard quality and expand the evaluation. All limitations discussed earlier will be addressed. For example, we will develop methods for identifying redundancy in data standards and assessing its impacts on the interoperability of data instances. In addition, we will analyze the revision notes of the new XBRL taxonomy to infer the design objectives of the Taxonomy revisions, and empirically validate whether the design objectives have been achieved and whether the revisions have contributed to higher standards quality. We will develop mathematical models of how the complexity of a standard and other factors, such as the decision-making process and user requirements, affect the interoperability of data produced by different data producers. We will also explore the feasibility of developing utility functions [20] and use them to provide guidelines for decision makers of different standards stakeholders. Due to limited computing resources, we did not perform any k-interoperability analysis for k > 3 in this study. We will use cloud computing resources to analyze k-interoperability among data created by different standard users. The results will allow us to understand the impact of standard quality on the interoperability of data as k increases. We will also evaluate our quality assessment framework using data standards other than XBRL, such as HL7 in the healthcare domain.
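To see why k-interoperability analysis becomes demanding as k grows, note that the number of k-subsets of n filings is C(n, k). The sketch below extends the pairwise-overlap idea to groups of k filings; this group-level definition is an illustrative guess (our formal definition appears earlier in the paper), and the normalization by the smallest member is an assumption made here only for the example.

    from itertools import combinations
    from math import comb
    from statistics import mean

    def k_interoperability(filings, k):
        """Mean share of elements common to every member of each k-subset,
        normalized by the smallest member (illustrative definition only)."""
        scores = []
        for group in combinations(filings, k):
            common = set.intersection(*group)
            smallest = min(len(f) for f in group)
            scores.append(len(common) / smallest if smallest else 0.0)
        return mean(scores)

    # With the 1412 filings of Table 10, k = 4 already yields
    # about 1.65e11 subsets, hence the need for cloud resources.
    print(comb(1412, 4))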

In summary, we believe that we have made an important step towards developing systematic methods for assessing data standard quality. With the exponential growth of data, standards play an increasingly important role in improving data usability and the effectiveness of data-intensive systems. Further efforts are needed so that we can effectively assess and improve the quality of large-scale data standards.

Acknowledgments

This research is supported in part by the National Science Foundation under grant #1355683. The authors gratefully acknowledge helpful comments from Arnon Rosenthal and the review team on earlier versions of the paper.

References

[1] J.W. Bartley, Y.A. Chen, E.Z. Taylor, A Comparison of XBRL Filings to Corporate 10-Ks — Evidence From the Voluntary Filing Program, SSRN, 2010 (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1397658).
[2] M.d.C.A.O.M. Batista, A.C. Salgado, Information quality measurement in data integration schemas, VLDB'07 Workshop on Quality in Databases, VLDB Endowment and ACM, Vienna, Austria, 2007, pp. 61–72.
[3] T. Berners-Lee, J. Hendler, O. Lassila, The Semantic Web, Scientific American, 2001, pp. 34–43.
[4] P.A. Bernstein, L.M. Haas, Information integration in the enterprise, Communications of the ACM 51 (9) (2008) 72–79.
[5] E.J. Boritz, W.G. No, Auditing an XBRL Instance Document: The Case of United Technologies Corporation, University of Waterloo, 2008.
[6] E.J. Boritz, W.G. No, SEC's XBRL Voluntary Program on Edgar: The Case for Quality Assurance, SSRN, 2008 (http://ssrn.com/abstract=1163254).
[7] M. Bovee, M.L. Ettredge, R.P. Srivastava, M.A. Vasarhelyi, Does the year 2000 XBRL Taxonomy accommodate current business financial-reporting practice? Journal of Information Systems 16 (2) (2002) 165–182.
[8] M. Bovee, A. Kogan, K. Nelson, R.P. Srivastava, M.A. Vasarhelyi, Financial Reporting and Auditing Agent with Net Knowledge (FRAANK) and eXtensible Business Reporting Language (XBRL), Journal of Information Systems 19 (1) (2005) 19–41.
[9] T.R. Bruce, D. Hillmann, The continuum of metadata quality: defining, expressing, exploiting, in: D. Hillmann, E.L. Westbrooks (Eds.), Metadata in Practice, American Library Association, Chicago, 2004, pp. 238–256.
[10] A. Burton-Jones, V.C. Storey, V. Sugumaran, P. Ahluwalia, A semiotic metrics suite for assessing the quality of ontologies, Data & Knowledge Engineering 55 (1) (2005) 84–102.
[11] L. Cao, H. Zhu, Normal accidents: data quality problems in ERP-enabled manufacturing, ACM Journal of Data and Information Quality 4 (3) (2013) 11:11–11:26.
[12] K.H. Chou, How valid are they? An examination of XBRL voluntary filing documents with the SEC EDGAR system, 14th International XBRL Conference, Philadelphia, USA, 2006.
[13] H. Crawford, Encyclopedias, in: R. Bopp, L.C. Smith (Eds.), Reference and Information Services: An Introduction, Libraries Unlimited, Englewood, 2001, pp. 433–459.
[14] A. David, P. Graham, Eight key issues for the decision support systems discipline, Decision Support Systems 44 (3) (2008) 657–672.
[15] H.-H. Do, E. Rahm, Matching large schemas: approaches and evaluation, Information Systems 32 (6) (2006) 857–885.
[16] A. Doan, A.Y. Halevy, Semantic integration research in the database community: a brief survey, AI Magazine 26 (1) (2005) 83–94.
[17] A. Doan, J. Madhavan, P. Domingos, A.Y. Halevy, Learning to map between ontologies on the semantic web, The 11th International World Wide Web Conference (WWW), 2002.
[18] J. Du, L. Zhou, Improving financial data quality using ontologies, Decision Support Systems 54 (2012) 76–86.
[19] F. Duchateau, Z. Bellahsene, Measuring the quality of an integrated schema, Conceptual Modeling — ER 2010, Springer, Vancouver, BC, Canada, 2010, pp. 261–273.
[20] A. Even, G. Shankaranarayanan, P.D. Berger, Evaluating a model for cost-effective data quality management in a real-world CRM setting, Decision Support Systems 50 (1) (2010) 152–163.
[21] M. Fernández, C. Overbeeke, M. Sabou, E. Motta, What makes a good ontology? A case-study in fine-grained knowledge reuse, Proceedings of the 4th Asian Conference on the Semantic Web, Springer-Verlag, Shanghai, China, 2009, pp. 61–75.
[22] E. Folmer, Quality of Semantic Standards, Ph.D. Thesis, University of Twente, 2012.
[23] A. Gangemi, C. Catenacci, M. Giaramita, J. Lehmann, R. Gil, F. Bolici, O. Strignana, Ontology Evaluation and Validation, Laboratory for Applied Ontology, ISTC-CNR, Trento, Italy, 2005.
[24] W. Harrison, An entropy-based measure of software complexity, IEEE Transactions on Software Engineering 18 (11) (1992) 1025–1029.
[25] J. Krogstie, G. Sindre, H. Jørgensen, Process models representing knowledge for action: a revised quality framework, European Journal of Information Systems 15 (1) (2006) 91–102.
[26] Y.W. Lee, D. Strong, B. Kahn, R.Y. Wang, AIMQ: a methodology for information quality assessment, Information and Management 40 (2) (2002) 133–146.
[27] O.I. Lindland, G. Sindre, A. Sølvberg, Understanding quality in conceptual modeling, IEEE Software 11 (2) (1994) 42–49.
[28] Y. Ma, B. Jin, Y. Feng, Semantic oriented ontology cohesion metrics for ontology-based systems, Journal of Systems and Software 83 (1) (2010) 143–152.
[29] D. MacKenzie, Computer-related accidental death: an empirical exploration, Science and Public Policy 21 (4) (1994) 233–248.
[30] S.E. Madnick, R.Y. Wang, Y.W. Lee, H. Zhu, Overview and framework for data and information quality research, ACM Journal of Data and Information Quality 1 (1) (2009) (Article #2).
[31] S.E. Madnick, H. Zhu, Improving data quality with effective use of data semantics, Data and Knowledge Engineering 59 (2) (2006) 460–475.
[32] M.L. Markus, C.W. Steinfield, R.T. Wigand, G. Minton, Industry-wide information systems standardization as collective action: the case of the U.S. residential mortgage industry, MIS Quarterly 30 (Special Issue) (2006) 439–465.
[33] T.J. McCabe, A complexity measure, Proceedings of the 2nd International Conference on Software Engineering, IEEE Computer Society Press, San Francisco, California, United States, 1976, p. 407.
[34] G.A. Miller, The magical number seven, plus or minus two: some limits on our capacity for processing information, Psychological Review 63 (2) (1956) 81–97.
[35] D.L. Moody, Theoretical and practical issues in evaluating the quality of conceptual models: current state and future directions, Data & Knowledge Engineering 55 (3) (2005) 243–276.
[36] D.L. Moody, G. Sindre, T. Brasethvik, A. Sølvberg, Evaluating the quality of information models: empirical testing of a conceptual model quality framework, Proceedings of the 25th International Conference on Software Engineering, IEEE Computer Society, Portland, Oregon, 2003, pp. 295–305.
[37] H.J. Nelson, G. Poels, M. Genero, M. Piattini, A conceptual modeling quality framework, Software Quality Journal 20 (1) (2012) 201–228.
[38] P.G. Neumann, Computer Related Risks, ACM Press/Addison-Wesley Publishing Co., New York, 1995.
[39] N.F. Noy, A. Doan, A.Y. Halevy, Semantic Integration, AI Magazine 26 (1) (2005) 7–9.
[40] X. Ochoa, E. Duval, Automatic evaluation of metadata quality in digital repositories, International Journal on Digital Libraries 10 (2/3) (2009) 67–91.
[41] A.M. Orme, H. Yao, L.H. Etzkorn, Coupling metrics for ontology-based systems, IEEE Software 23 (2) (2006) 102–108.
[42] J. Pak, L. Zhou, A framework for ontology evaluation, in: R. Sharman, H.R. Rao, T.S. Raghu (Eds.), Exploring the Grand Challenges for Next Generation E-business, Springer, Berlin Heidelberg, 2011, pp. 10–18.
[43] N. Palavitsinis, N. Manouselis, S.S. Alonso, Evaluation of a metadata application profile for learning resources on organic agriculture, in: F. Sartori, M.A. Sicilia, N. Manouselis (Eds.), MSTR 2009, CCIS 46, Springer-Verlag, Berlin, 2009, pp. 270–281.

[44] J.-R. Park, Metadata quality in digital repositories: a survey of the current state of the art, Cataloging and Classification Quarterly 47 (3/4) (2009) 213–228.
[45] C. Perrow, Normal Accidents: Living With High-risk Technologies, Princeton University Press, Princeton, NJ, 1999.
[46] Personal Communication, Discussion with participants of the MIT Information Quality Industrial Symposium in the past five years, 2011.
[47] E. Rahm, P.A. Bernstein, A survey of approaches to automatic schema matching, VLDB Journal 10 (4) (2001) 334–350.
[48] E. Rahm, H.-H. Do, S. Maßmann, Matching large XML schemas, ACM SIGMOD Record 33 (4) (2004) 26–31.
[49] T.C. Redman, Data Quality for the Information Age, Artech House, Boston, MA, 1996.
[50] S. Roohani, X. Zhao, E.A. Capozzoli, B. Lamberton, Analysis of XBRL literature: a decade of progress and puzzle, The International Journal of Digital Accounting Research 10 (2010) 131–147.
[51] A. Rosenthal, L. Seligman, M.D. Allen, A. Chapman, Fit for purpose: toward an engineering basis for data exchange standards, International IFIP Working Conference on Enterprise Interoperability — Information, Services and Processes for the Interoperable Economy and Society, Springer, Enschede, The Netherlands, 2013, pp. 91–103.
[52] A. Rosenthal, L. Seligman, S. Renner, From semantic integration to semantics management: case studies and a way forward, ACM SIGMOD Record 33 (4) (2004) 44–50.
[53] N. Shadbolt, T. Berners-Lee, W. Hall, The semantic web revisited, IEEE Intelligent Systems 21 (3) (2006) 96–101.
[54] G. Shankaranarayanan, Y. Cai, Supporting data quality management in decision-making, Decision Support Systems 42 (1) (2005) 302–317.
[55] E.P.B. Simperl, C. Tempich, Ontology engineering: a reality check, in: M. Robert, T. Zahir (Eds.), OTM Conferences, Springer, 2006, pp. 836–854.
[56] V.C. Storey, R.M. Dewan, M. Freimer, Data quality: setting organizational policies, Decision Support Systems 54 (1) (2012) 434–442.
[57] B. Stvilia, A model for ontology quality evaluation, First Monday 12 (12) (2007).
[58] B. Stvilia, L. Gasser, Value based metadata quality assessment, Library and Information Science Research 30 (1) (2008) 67–74.
[59] B. Stvilia, L. Gasser, M.B. Twidale, L.C. Smith, A framework for information quality assessment, Journal of the American Society for Information Science and Technology 58 (12) (2007) 1720–1733.
[60] S.A. Sutton, Metadata quality, utility and the semantic web: the case of learning resources and achievement standards, Cataloging and Classification Quarterly 46 (1) (2010) 81–107.
[61] J. Sweller, Cognitive load during problem solving: effects on learning, Cognitive Science 12 (2) (1988) 257–285.
[62] S. Tartir, I.B. Arpinar, M. Moore, A.P. Sheth, OntoQA: metric-based ontology quality analysis, IEEE International Conference on Semantic Computing, 2005.
[63] H. Wache, T. Vögele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann, S. Hübner, Ontology-based integration of information — a survey of existing approaches, IJCAI-01 Workshop: Ontologies and Information Sharing, Seattle, WA, 2001, pp. 108–117.
[64] Y. Wand, R.Y. Wang, Anchoring data quality dimensions in ontological foundations, Communications of the ACM 39 (11) (1996) 86–95.
[65] Y. Wand, R. Weber, On the deep structure of information systems, Information Systems Journal 5 (3) (1995) 203–223.
[66] R.Y. Wang, M.P. Reddy, H.B. Kon, Toward quality data: an attribute-based approach, Decision Support Systems 13 (3–4) (1995) 349–372.
[67] R.Y. Wang, D.M. Strong, Beyond accuracy: what data quality means to data consumers, Journal of Management Information Systems 12 (4) (1996) 5–33.
[68] J. Weagley, E. Gelches, J.-R. Park, Interoperability and metadata quality in digital video repositories: a study of Dublin Core, Journal of Library Metadata 10 (1) (2010) 37–57.
[69] E.J. Weyuker, Evaluating software complexity measures, IEEE Transactions on Software Engineering 14 (9) (1988) 1357–1365.
[70] XBRL International, Extensible Business Reporting Language (XBRL) 2.1, XBRL International, 2006.
[71] H. Yao, A.M. Orme, L. Etzkorn, Cohesion metrics for ontology design and application, Journal of Computer Science 1 (1) (2005) 107–113.
[72] H. Zhang, Y.-F. Li, H.B.K. Tan, Measuring design complexity of semantic web ontologies, Journal of Systems and Software 83 (5) (2010) 803–814.
[73] Y. Zhang, Y. Li, A user-centered functional metadata evaluation of moving image collections, Journal of the American Society for Information Science and Technology 59 (8) (2008) 1331–1346.
[74] H. Zhu, L. Fu, Towards quality of data standards: empirical findings from XBRL, The 30th International Conference on Information Systems (ICIS'09), Phoenix, AZ, USA, 2009.
[75] H. Zhu, H. Wu, Interoperability of XBRL financial statements in the U.S., International Journal of E-Business Research 7 (2) (2011) 18–33.

Hongwei Zhu holds a Ph.D. from MIT and is an associate professor of Information Systems at the University of Massachusetts Lowell. His research aims to improve the quality of data standards and standards-based data. He is a member of AIS and the Best Practices Committee of XBRL US.

Harris Wu received his Ph.D. in Business Administration from the University of Michigan at Ann Arbor. He is an associate professor of Information Technology at Old Dominion University. His research interests include social computing, data quality, complex systems, and inter-organizational collaboration. He is a member of AIS and XBRL US.

