
TEXTUAL BUSINESS INTELLIGENCE

Date post: 08-Apr-2018
  • 8/7/2019 TEXTUAL BUSINESS INTELLIGENCE

    1/13

    Bill Inmon
    Forest Rim Technology
    PO Box 210
    200 Wilcox Street
    Castle Rock, CO

    [email protected]

    TEXTUAL BUSINESS INTELLIGENCE

    By W H Inmon


    In the beginning there were application systems. And for a variety of reasons, not the least of which was the lack of corporate data, organizations began to build data warehouses. In a short amount of time, data warehouses began to spread around the world.

    [Diagram: ETL]

    And with data warehouses came the corporation's opportunity to look at information across the corporation. But building a data base of integrated, historical, granular data was not enough. As powerful as data can be inside a data warehouse, unless the end user can unleash the potential of the data warehouse, there wasn't much value in building a data warehouse.

    CLASSICAL BUSINESS INTELLIGENCE

    Soon appeared Business Intelligence (BI). BI is the software that is needed to go into a

    data warehouse and examine the data, the relationships, and the information that is found

    there. Once BI appeared, organizations could make sense of the data found in the data

    warehouse.

    [Diagram: ETL, BI]

    With BI, organizations could create reports, transactions, and very sophisticated analyses of the data found in the data warehouse. Among other things, graphical displays of information were popular. The granular data found in the data warehouse provided a very firm foundation for the analysis and discovery of corporate information.


    BI was designed to operate on the data found in the data warehouse. And exactly what

    was the essential nature of the data found in the data warehouse? The data found in the

    classical data warehouse was

    - numeric, where numbers can be added and subtracted
    - repetitive, where the same type of values occur over and over
    - accompanied by other supporting data, points of interest that surround and are attached to the numeric data.

    [Diagram: ETL; warehouse contents: numeric data, repetitive data, points of interest data]

    CONTENTS OF THE CLASSICAL DATA WAREHOUSE

    Let's take a look at what the typical contents of a data warehouse look like. Suppose an organization has an integrated list of the checks written by individuals, the date of the check, and the location where the check was written.


    Date          Name            Location          Amount
    Aug 3, 2010   Sarah Inmon     Lima, Peru       2,347.70
    Aug 3, 2010   Brook Sadler    La Pax, Mx       3,346.87
    Aug 3, 2010   Beasley Smith   Bogota, Chile   10,245.36
    Aug 3, 2010   Tony Velez      Caracas, VZ      4,556.12
    Aug 4, 2010   Nancy Jones     Juarez, Mx       7,114.09
    Aug 4, 2010   Rbta Ross       Rio, Brasil      9,109.23
    Aug 4, 2010   Joan Jett       Sao Paolo, Bz    3,339.87
    Aug 4, 2010   Glen Frey       Brasilia, Bz     2,109.25
    ...

    Such an arrangement of data might be typical for the contents of a classical data warehouse. Consider how the data might be used for analysis. Some data will be used for selecting data and discriminating it from other data. Other data, the numeric data, will be used in calculations and comparisons. Given the typical data base that has been described, the analyst could examine such things as

    - how many checks were written on a given day
    - how much money was spent in Peru
    - how much money changed hands in South America
    - what was the largest check written
    - and so forth.
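    The questions listed above can be sketched in a few lines of Python. This is only an illustration, not how any particular BI tool works: the rows are copied from the sample check data, and the function names are hypothetical. The nonnumeric columns select and group the rows, while the numeric column is added and compared.

```python
from collections import defaultdict
from decimal import Decimal

# Rows copied from the sample check data: (date, name, location, amount).
CHECKS = [
    ("Aug 3, 2010", "Sarah Inmon", "Lima, Peru", Decimal("2347.70")),
    ("Aug 3, 2010", "Brook Sadler", "La Pax, Mx", Decimal("3346.87")),
    ("Aug 3, 2010", "Beasley Smith", "Bogota, Chile", Decimal("10245.36")),
    ("Aug 3, 2010", "Tony Velez", "Caracas, VZ", Decimal("4556.12")),
    ("Aug 4, 2010", "Nancy Jones", "Juarez, Mx", Decimal("7114.09")),
    ("Aug 4, 2010", "Rbta Ross", "Rio, Brasil", Decimal("9109.23")),
    ("Aug 4, 2010", "Joan Jett", "Sao Paolo, Bz", Decimal("3339.87")),
    ("Aug 4, 2010", "Glen Frey", "Brasilia, Bz", Decimal("2109.25")),
]


def checks_per_day(rows):
    """Grouping: count the checks written on each date."""
    counts = defaultdict(int)
    for date, _name, _location, _amount in rows:
        counts[date] += 1
    return dict(counts)


def total_where(rows, location_suffix):
    """Selection plus addition: sum amounts whose location ends with the suffix."""
    return sum(
        (amount for _d, _n, location, amount in rows
         if location.endswith(location_suffix)),
        Decimal("0"),
    )


def largest_check(rows):
    """Comparison: return the row with the largest amount."""
    return max(rows, key=lambda row: row[3])


per_day = checks_per_day(CHECKS)
spent_in_peru = total_where(CHECKS, "Peru")
biggest = largest_check(CHECKS)
```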

    [Figure: the check data annotated by use. Date, name, and location: selection, grouping, discrete reporting. Amount: addition, subtraction, multiplication, comparison.]

    In order to do the analysis and the calculations, a BI tool could be used on top of the data.


    [Figure: a BI tool operating on top of the check data]

    For many years data warehouses and BI worked more or less as described. And, even though their users may not be aware of it, BI tools focused on operating on repetitive numeric data for calculation and comparison, where nonnumeric data served the purpose of allowing data to be selected and grouped together. And BI tools simply expected to find repetitive data in the data warehouse, where the same type of data was repeated over and over.

    ENTER THE UNSTRUCTURED DATA WAREHOUSE

    But a profound change in data warehousing has occurred. Today it is possible to build an

    entirely different kind of data warehouse. Today it is possible to build a data warehouse

    that is based on text. And text has an entirely different set of properties than data that

    classically has been placed in a data warehouse.

    Now it is possible to build a data warehouse using textual ETL, such as that provided by

    Forest Rim Technology. Now unstructured text can be read into Textual ETL and a new

    type of data warehouse can be built.

    [Diagram: Text, Textual ETL]

    The contents of an unstructured data warehouse are very different from those of a classical data warehouse. The contents of an unstructured data warehouse are, not surprisingly, text. However, the text that arrives in the unstructured data warehouse is formatted into a standard relational data base. For years organizations have been able to place text in a relational data base in the form of blobs. But once text is placed into a relational data base in the form of a blob, there is not a lot that can be done with it.

    Instead, textual ETL passes the text through a myriad of algorithms before the text is placed in the relational data base. (NOTE: most of the important algorithms are patent pending. See Forest Rim Technology for licensing opportunities.) The net result is a relational data base that can be used for analytical purposes.
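    To make the contrast with a blob concrete, here is a minimal sketch that stores text as one relational row per word instead of one opaque value per document. It is not Forest Rim's Textual ETL, whose algorithms are not described here; a plain regular-expression tokenizer stands in for them, and the table name `word`, its columns, and the helper names are all assumptions made for illustration.

```python
import re
import sqlite3


def load_text(conn, doc_id, text):
    """Store one row per word: (document id, character position, lowercased word)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word (doc_id TEXT, pos INTEGER, word TEXT)"
    )
    rows = [
        (doc_id, match.start(), match.group().lower())
        for match in re.finditer(r"[A-Za-z']+", text)
    ]
    conn.executemany("INSERT INTO word VALUES (?, ?, ?)", rows)


def docs_with_word(conn, word):
    """Ordinary SQL can now answer which documents mention a word."""
    cursor = conn.execute(
        "SELECT DISTINCT doc_id FROM word WHERE word = ? ORDER BY doc_id",
        (word.lower(),),
    )
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")
load_text(conn, "c1", "I want to return my purchase")
load_text(conn, "c2", "I want my money back")

money_docs = docs_with_word(conn, "money")
want_docs = docs_with_word(conn, "want")
```

    Once the words are rows, selection and grouping work the same way they do for any other relational data, which is not the case when the whole document sits in a single blob column.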

    [Diagram: Text, Textual ETL. Samples of the kinds of text read in:]

    ...As we see it, the event was a success...
    ...Thanks for the sale. I enjoyed it greatly...
    ...I think you ought to know about what went...
    ...I went down the street and saw the same thing...
    ...I want to return my purchase...
    ...I found a stain on the bottom of the...
    ...Let me tell you how pleased I was when I...
    ...Your salesperson is outrageous. Do you know...
    ...Your products are great but your service is lousy...
    ...I want my money back...

    We went for a walk down the park lane. It went by the river and there was a bridge to cross the creek at one point. She was talking so intently that she never even noticed the bridge. She was wrapped up in her thoughts. First there had been the loss at graduate school. Then there was the Derek. And Tally. She just couldn't take it any more. There had to be another answer. Maybe she needed a change. Maybe a change of climates was what was called for. The ducks in the pond went by and were followed by a brood of six small paddlers, each mimicking the mother....

    The offering includes stock option rights exercisable at .10 per share by Nov 15. In addition, there are warrants as well. The entire equity was held by three people. Now that there was to be a distribution, they would all profit.

    The dealership offered the latest model. Of course you can order off of the Internet and pick up the car at the dealership. When you do it this way you get to choose all the options you want. But the price is not negotiable and you are responsible for selling your car....

    But creating new forms of a data warehouse leads to its own challenges (as well as opportunities). The first thing the organization discovers is that classical BI does not work very well with an unstructured data warehouse. What is needed is an entirely different kind of BI. What is needed is Textual BI.

    [Diagram: Text, Textual ETL, Textual BI]

    TEXTUAL BUSINESS INTELLIGENCE

    Why is there a need for a new and different kind of BI? The answer is simple: the data in an unstructured data warehouse is fundamentally different from the data found in classical BI. Let's start with numeric data. One of the essences of textual data is that it is decidedly not numeric. Textual data consists of words, and you cannot add or subtract words. About the best you can do is to count words. So one of the major differences between a structured data warehouse and an unstructured data warehouse is the ability to do calculations and comparisons against the basic data found in the data warehouse.
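    As a small illustration of that point, counting is about the only arithmetic that applies directly to words. The sketch below uses Python's standard `collections.Counter`; the helper name `word_counts` and the simple tokenizer are assumptions, and the sentence is one of the sample customer comments.

```python
import re
from collections import Counter


def word_counts(text):
    """Count how often each lowercased word occurs in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


counts = word_counts("Your products are great but your service is lousy")
```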


    But there is another important difference as well. Universally, classical data warehouses

    contain repetitive data. In a classical data warehouse, the same type of data appears over

    and over. But in an unstructured data warehouse there may or may not be any repetition.

    In this regard an unstructured data warehouse is fundamentally different from a classical

    data warehouse. And the lack or existence of repetition makes a big difference in the type of BI that can be done against the data warehouse, as shall be seen.

    [Diagram: Text (nonnumeric)]

    Under normal circumstances data is not repetitive in an unstructured data warehouse. But there are some circumstances, for some types of data, where there is a certain amount of repetition in a data warehouse.

    UNSTRUCTURED DATA: REPETITIVE/NON REPETITIVE

    As some examples of repetition occurring in an unstructured data warehouse, consider

    contracts. Suppose there is a collection of oil and gas leases, a form of a contract. The

    first contract is for landowner ABC, the second contract is for landowner BCD, and so

    forth. One contract is different from any other contract. But taken as a whole, there is a

    great similarity between the different contracts found in the collection. The structure of

    the contracts, much of the fine print, and so forth are common among all the contracts. So

    the contracts in the collection are collectively structurally repetitive, even if all the text is

    not exactly the same.
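    One way to see structural repetition is to compare texts word by word. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in measure; it is an assumption chosen for illustration, not a method from this paper, and the two lease texts and the email are hypothetical samples invented for the example.

```python
from difflib import SequenceMatcher


def structural_similarity(a, b):
    """Score in [0, 1]: how much of their word sequences two texts share, in order."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


# Hypothetical sample texts, invented for illustration.
LEASE_ABC = (
    "This lease is made between landowner ABC and the lessee. "
    "The lessee shall pay a royalty of one eighth on all oil and gas produced."
)
LEASE_BCD = (
    "This lease is made between landowner BCD and the lessee. "
    "The lessee shall pay a royalty of one sixth on all oil and gas produced."
)
EMAIL = "Hi, running late today, can we move lunch to Friday?"

contract_score = structural_similarity(LEASE_ABC, LEASE_BCD)
email_score = structural_similarity(LEASE_ABC, EMAIL)
```

    The two leases differ only in the landowner and the royalty, so they score close to 1, while the lease and the email share almost nothing.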

    Now consider the transcripts from a call center. Certainly a person participating in the

    call center conversation can say whatever he/she wants to say. But most operators

    working a call center have been carefully trained to structure the conversation. As a result

    there is a certain similarity to the structure of each call.

    And there are plenty of other examples of structural repetition in the world of text.

    But there are plenty of examples where there is no structural repetition in text. Consider

    emails. In emails, a person can say whatever he/she wants to say. The email can be short

    or long. The email can be formal or informal. The email can be in any language, and so

    forth. There simply is no structural conformity of text when it comes to email.


    [Diagram: Text, divided into repetitive and non repetitive]

    Repetitive            Non repetitive
    contracts             email
    call center calls     law
    insurance claims      medical records
    warranty claims       depositions
    log records           doctors' notes
    real estate filings

    REPETITIOUS CONTRACTS

    Pictured below are three different contracts. There is great similarity between the

    contracts but each contract is certainly different from any other contract.

    NON REPETITIOUS LAW

    As an example of no structural uniformity, consider the law that is shown. Below are two sections of the 848-page Dodd-Frank law, passed in 2010. There is no structural uniformity to the different sections of the law whatsoever.

    BI AND REPETITIVE DATA

    There is a relationship between the existence or nonexistence of repetition in a document and the type of BI that can be used. When it comes to text, if the text is repetitive, then both classical BI and textual BI can be used against the text. But if there is no repetition, then only textual BI can be used, as seen below.

    [Diagram: repetitive text: classical BI and textual BI; non repetitive text: textual BI only]

    Another way of looking at this concept is that an unstructured data warehouse can have

    two types of BI used, as in the case of contracts, while completely non repetitive text can

    have only textual BI used against it, as in the case of the Dodd Frank law. The diagram

    below makes this point.

    [Diagram: contracts (repetitive): classical BI and textual BI; law (non repetitive): textual BI only]

    TEXTUAL BI: AN EXAMPLE

    So what does Textual BI look like? Consider the following example of Textual BI, from

    Forest Rim Technology.

    In the diagram below, the basic screen is shown. It is seen that there is basic query management, there is parametric control of the query, there is execution of the query, and there is the display of the results of the query. In many ways this screen is analogous to a SQL statement. The difference is that this elaborate query management tool is built specifically for the management of textual data, not general purpose access and analysis of a relational data base.


    The most interesting part of the textual Business Intelligence query is in the execution. The screen below shows a simple query where there is a search for all contracts where there is a mention of naphtha and helium.

    The query is executed and there are six occurrences of contracts in which naphtha and helium are mentioned.

    Now that the query has been executed, the results are displayed. First the basic parameters of the query are shown. Note that the results can be displayed in four ways:

    - showing the contracts where the text is found,
    - showing the byte locations in the contracts where the references are found,
    - showing snippets of text where the references are found,
    - showing the entire contract where the references are found.
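    The four displays listed above can be sketched as follows. This is a hedged illustration only, not the Forest Rim product: the function names and the two sample contracts are hypothetical, and the offsets are Python character positions, which equal byte locations only for plain ASCII text.

```python
import re


def find_mentions(text, term):
    """Return the character offsets at which `term` occurs as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [match.start() for match in pattern.finditer(text)]


def search(contracts, terms):
    """Map contract id -> {term: offsets} for contracts that mention every term."""
    hits = {}
    for contract_id, text in contracts.items():
        found = {term: find_mentions(text, term) for term in terms}
        if all(found.values()):  # an empty offset list means the term is absent
            hits[contract_id] = found
    return hits


def snippets(text, offsets, width=20):
    """Return a short window of text around each offset."""
    return [text[max(0, o - width): o + width].strip() for o in offsets]


def highlight(text, terms):
    """Return the entire text with each mention wrapped in [[ ]]."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: "[[" + m.group(0) + "]]", text)


# Hypothetical contracts, invented for illustration.
CONTRACTS = {
    "lease-1": "Lessee may extract naphtha and helium from the tract.",
    "lease-2": "Lessee may extract helium only.",
}
TERMS = ["naphtha", "helium"]

hits = search(CONTRACTS, TERMS)                        # 1. which contracts
locations = hits["lease-1"]                            # 2. locations per term
context = snippets(CONTRACTS["lease-1"], locations["helium"])  # 3. snippets
marked = highlight(CONTRACTS["lease-1"], TERMS)        # 4. entire contract
```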

    Suppose the analyst merely wants to find the contracts where the references are found. The results would look like

    Or suppose the analyst wants to find the exact byte location where the references are

    found. The results would look like -


    Or suppose the analyst wanted to see snippets of text where the references are found. The

    results would look like -

    There are more snippets to be shown. They look like

    Or suppose the analyst wanted to see the entire document and at a glance see where the

    references are found in the document. The results would look like -

    There are then many different ways to look at and analyze text using Textual Business

    Intelligence.


    The example that has been shown was selected for its simplicity. Textual Business

    Intelligence can look at text in many different ways, other than the simple example that

    has been shown. The diagram below depicts just some of the many sophisticated ways

    that analysis can be done with textual Business Intelligence.

    [Diagram: Text, Textual ETL, Textual BI]

    Text search can be done in many ways:

    - by a word
    - by words (AND/OR)
    - by categories of words
    - by indexes of words
    - by words in proximity
    - many other ways
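    A few of the search styles listed above can be sketched in Python. The function names, the `CATEGORIES` table, and the tokenizer are assumptions made for illustration; a real Textual BI tool may define each search quite differently.

```python
import re


def words(text):
    """Split text into lowercased words."""
    return re.findall(r"[a-z']+", text.lower())


def has_word(text, word):
    """Search by a single word."""
    return word.lower() in words(text)


def has_all(text, terms):
    """Search by words joined with AND."""
    present = set(words(text))
    return all(term.lower() in present for term in terms)


def has_any(text, terms):
    """Search by words joined with OR."""
    present = set(words(text))
    return any(term.lower() in present for term in terms)


# Hypothetical category table: a category name stands for a set of words.
CATEGORIES = {"complaint": {"lousy", "outrageous", "stain"}}


def has_category(text, category):
    """Search by a category of words."""
    return bool(CATEGORIES[category] & set(words(text)))


def within(text, a, b, distance):
    """Search by proximity: True if a and b occur at most `distance` words apart."""
    tokens = words(text)
    pos_a = [i for i, w in enumerate(tokens) if w == a.lower()]
    pos_b = [i for i, w in enumerate(tokens) if w == b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


COMMENT = "Your products are great but your service is lousy"
```

    For example, `within(COMMENT, "service", "lousy", 2)` is true because the two words are two positions apart, while `within(COMMENT, "products", "lousy", 3)` is false.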

    COMPLEMENTARY BUSINESS INTELLIGENCE

    As a final note, is textual BI a replacement for classical BI? The answer is not at all.

    Textual BI and classical BI are complementary. A sophisticated organization is going to

    need BOTH forms of Business Intelligence.

    [Diagram: Textual BI, Classical BI]


