8/7/2019 TEXTUAL BUSINESS INTELLIGENCE
1/13
Bill Inmon
Forest Rim Technology
PO Box 210200 Wilcox Street
Castle Rock, CO
TEXTUAL BUSINESS INTELLIGENCE
By W H Inmon
In the beginning there were application systems. And for a variety of reasons, not the least of which was the lack of corporate data, organizations began to build data warehouses. In a short amount of time data warehouses began to spread around the world.
[Figure: ETL]
And with data warehouses came the corporation's opportunity to look at information across the corporation. But building a data base of integrated, historical, granular data was not enough. As powerful as data can be inside a data warehouse, unless the end user can unleash the potential of the data warehouse, there is not much value in building a data warehouse.
CLASSICAL BUSINESS INTELLIGENCE
Soon appeared Business Intelligence (BI). BI is the software that is needed to go into a
data warehouse and examine the data, the relationships, and the information that is found
there. Once BI appeared, organizations could make sense of the data found in the data
warehouse.
[Figure: ETL, BI]
With BI, organizations could create reports, transactions, and very sophisticated analysis of the data found in the data warehouse. Among other things, graphical displays of information were popular. The granular data found in the data warehouse provided a very firm foundation for the analysis and discovery of corporate information.
BI was designed to operate on the data found in the data warehouse. And exactly what
was the essential nature of the data found in the data warehouse? The data found in the
classical data warehouse was
- numeric, where numbers can be added and subtracted
- repetitive, where the same type of values occur over and over
- where other supporting data points of interest surround and are attached to the numeric data.
[Figure: ETL; data warehouse contents: numeric data, repetitive data, points of interest data]
CONTENTS OF THE CLASSICAL DATA WAREHOUSE
Let's take a look at what the typical contents of a data warehouse look like. Suppose an organization has an integrated list of the checks written by individuals, the date of the check, and the location where the check was written.
[Figure: sample contents of a classical data warehouse: numeric data, repetitive data, points of interest data]

Date          Name            Location          Amount
Aug 3, 2010   Sarah Inmon     Lima, Peru         2,347.70
Aug 3, 2010   Brook Sadler    La Pax, Mx         3,346.87
Aug 3, 2010   Beasley Smith   Bogota, Chile     10,245.36
Aug 3, 2010   Tony Velez      Caracas, VZ        4,556.12
Aug 4, 2010   Nancy Jones     Juarez, Mx         7,114.09
Aug 4, 2010   Rbta Ross       Rio, Brasil        9,109.23
Aug 4, 2010   Joan Jett       Sao Paolo, Bz      3,339.87
Aug 4, 2010   Glen Frey       Brasilia, Bz       2,109.25
...
Such an arrangement of data might be typical for the contents of a classical data warehouse. Consider how the data might be used for analysis. Some data will be used for selecting data and discriminating data from other data. Other data, the numeric data, will be used in calculations and comparisons. Given the typical data base that has been described, the analyst could examine such things as
- how many checks were written on a given day
- how much money was spent in Peru
- how much money changed hands in South America
- what was the largest check written
- and so forth.
[Figure: the same check data, annotated. Nonnumeric columns: selection, grouping, discrete reporting. Numeric column: addition, subtraction, multiplication, comparison.]
In order to do the analysis and the calculations, a BI tool could be used on top of the data.
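As a rough illustration of the kind of selection and calculation described above, the following Python sketch runs a few of those questions against the sample check data. The rows are transcribed as they appear in the sample; the function names and the substring match on location are assumptions made for this example, not features of any particular BI tool.

```python
from collections import defaultdict
from datetime import date

# Sample rows transcribed from the check listing: (date, name, location, amount).
CHECKS = [
    (date(2010, 8, 3), "Sarah Inmon", "Lima, Peru", 2347.70),
    (date(2010, 8, 3), "Brook Sadler", "La Pax, Mx", 3346.87),
    (date(2010, 8, 3), "Beasley Smith", "Bogota, Chile", 10245.36),
    (date(2010, 8, 3), "Tony Velez", "Caracas, VZ", 4556.12),
    (date(2010, 8, 4), "Nancy Jones", "Juarez, Mx", 7114.09),
    (date(2010, 8, 4), "Rbta Ross", "Rio, Brasil", 9109.23),
    (date(2010, 8, 4), "Joan Jett", "Sao Paolo, Bz", 3339.87),
    (date(2010, 8, 4), "Glen Frey", "Brasilia, Bz", 2109.25),
]


def checks_on(day):
    """Selection: count the checks written on a given day."""
    return sum(1 for d, _, _, _ in CHECKS if d == day)


def total_where(location_keyword):
    """Selection plus addition: total amount for locations containing a keyword."""
    total = sum(amt for _, _, loc, amt in CHECKS if location_keyword in loc)
    return round(total, 2)


def largest_check():
    """Comparison: the row with the largest amount."""
    return max(CHECKS, key=lambda row: row[3])


def totals_by_day():
    """Grouping: total amount per day."""
    totals = defaultdict(float)
    for d, _, _, amt in CHECKS:
        totals[d] += amt
    return {d: round(t, 2) for d, t in totals.items()}


if __name__ == "__main__":
    print(checks_on(date(2010, 8, 3)))
    print(total_where("Peru"))
    print(largest_check())
    print(totals_by_day())
```

The nonnumeric columns (date, name, location) only ever select or group rows, while the numeric column is the only one that is added or compared. That is the division of labor classical BI relies on.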
For many years data warehouses and BI worked more or less as described. And even though organizations were not aware of it, BI tools focused on operating on repetitive numeric data for calculation and comparison, while nonnumeric data served the purpose of allowing data to be selected and grouped together. And BI tools simply expected to find repetitive data in the data warehouse, where the same type of data was repeated over and over.
ENTER THE UNSTRUCTURED DATA WAREHOUSE
But a profound change in data warehousing has occurred. Today it is possible to build an
entirely different kind of data warehouse. Today it is possible to build a data warehouse
that is based on text. And text has an entirely different set of properties than data that
classically has been placed in a data warehouse.
Now it is possible to build a data warehouse using textual ETL, such as that provided by
Forest Rim Technology. Now unstructured text can be read into Textual ETL and a new
type of data warehouse can be built.
[Figure: Text, Textual ETL]
The contents of an unstructured data warehouse are very different from those of a classical data warehouse. The contents of an unstructured data warehouse are, not surprisingly, text. However, the text that arrives in the unstructured data warehouse is formatted into a standard relational data base. For years organizations have been able to place text in a
relational data base in the form of blobs. But once text is placed into a relational data base
in the form of a blob, there is not a lot that can be done with it.
Instead, textual ETL passes the text through a myriad of algorithms before the text is placed in the relational data base. (NOTE: most of the important algorithms are patent pending. See Forest Rim Technology for licensing opportunities.) The net result is a relational data base that can be used for analytical purposes.
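To make the contrast with a blob concrete, here is a minimal Python sketch of text being broken into relational rows that can be queried. It is not the Forest Rim algorithms, which are not described here. It simply splits each document into words and stores one row per word occurrence in an in-memory SQLite table. The table name, column names, and tokenizing rule are all assumptions made for this illustration.

```python
import re
import sqlite3

WORD = re.compile(r"[A-Za-z']+")


def build_warehouse(documents):
    """Load {doc_id: text} into a one-row-per-word relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE word_occurrence ("
        "doc_id TEXT, char_offset INTEGER, word TEXT)"
    )
    for doc_id, text in documents.items():
        rows = [
            (doc_id, match.start(), match.group().lower())
            for match in WORD.finditer(text)
        ]
        conn.executemany("INSERT INTO word_occurrence VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    warehouse = build_warehouse({
        "c1": "I want my money back",
        "c2": "Thanks for the sale. I enjoyed it greatly",
    })
    query = "SELECT doc_id, char_offset FROM word_occurrence WHERE word = ?"
    print(warehouse.execute(query, ("money",)).fetchall())
```

Once the words sit in rows rather than inside a blob, ordinary SQL can select, group, and count them, which is the point of formatting text into a standard relational data base.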
[Figure: raw text read by Textual ETL. Sample input text:]

...As we see it, the event was a success...
...Thanks for the sale. I enjoyed it greatly...
...I think you ought to know about what went...
...I went down the street and saw the same thing...
...I want to return my purchase...
...I found a stain on the bottom of the...
...Let me tell you how pleased I was when I...
...Your salesperson is outrageous. Do you know...
...Your products are great but your service is lousy...
...I want my money back...

We went for a walk down the park lane. It went by the river and there was a bridge to cross the creek at one point. She was talking so intently that she never even noticed the bridge. She was wrapped up in her thoughts. First there had been the oss at graduate school. Then there was the Derek. And Tally. She just couldn't take it any more. There had to be another answer. Maybe she needed a change. Maybe a change of climates was what was called for. The ducks in the pond went by and were followed by a brood of six small paddlers, each mimicking the mother....

The offering includes stock option rights exercisable at .10 per share by Nov 15. In addition, there are warrants as well. The entire equity was held by three people. Now that there was to be a distribution, they would all profit.

The dealership offered the latest model. Of course you can order off of the Internet and pick up the car at the dealership. When you do it this way you get to choose all the options you want. But the price is not negotiable and you are responsible for selling your car....
But creating new forms of a data warehouse leads to its own challenges (as well as opportunities). The first thing the organization discovers is that classical BI does not work very well with an unstructured data warehouse. What is needed is an entirely different kind of BI. What is needed is Textual BI.
[Figure: Text, Textual ETL, Textual BI]
TEXTUAL BUSINESS INTELLIGENCE
Why is there a need for a new and different kind of BI? The answer is simple: the data in an unstructured data warehouse is fundamentally different from the data on which classical BI operates. Let's start with numeric data. One of the essences of textual data is that it is decidedly not numeric. Textual data consists of words, and you cannot add or subtract words. About the best you can do is to count words. So one of the major differences between a structured data warehouse and an unstructured data warehouse is the ability to do calculations and comparisons against the basic data found in the data warehouse.
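The remark that counting is about the best arithmetic available for words can be shown in a few lines of Python. This is only a sketch: the tokenizing rule (lowercase runs of letters and apostrophes) is an assumption chosen for the example, and the sample sentence is taken from the customer comments shown earlier.

```python
import re
from collections import Counter


def word_counts(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


if __name__ == "__main__":
    counts = word_counts("Your products are great but your service is lousy")
    print(counts.most_common(3))
```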
But there is another important difference as well. Universally, classical data warehouses
contain repetitive data. In a classical data warehouse, the same type of data appears over
and over. But in an unstructured data warehouse there may or may not be any repetition.
In this regard an unstructured data warehouse is fundamentally different from a classical
data warehouse. And the lack or existence of repetition makes a big difference in the type of BI that can be done against the data warehouse, as shall be seen.
[Figure: Text, nonnumeric]
Under normal circumstances data is not repetitive in an unstructured data warehouse. But there are some circumstances, for some types of data, where there is a certain amount of repetition in the data warehouse.
UNSTRUCTURED DATA: REPETITIVE/NON REPETITIVE
As some examples of repetition occurring in an unstructured data warehouse, consider
contracts. Suppose there is a collection of oil and gas leases, a form of a contract. The
first contract is for landowner ABC, the second contract is for landowner BCD, and so
forth. One contract is different from any other contract. But taken as a whole, there is a
great similarity between the different contracts found in the collection. The structure of
the contracts, much of the fine print, and so forth are common among all the contracts. So
the contracts in the collection are collectively structurally repetitive, even if all the text is
not exactly the same.
Now consider the transcripts from a call center. Certainly a person participating in the
call center conversation can say whatever he/she wants to say. But most operators
working a call center have been carefully trained to structure the conversation. As a result
there is a certain similarity to the structure of each call.
And there are plenty of other examples of structural repetition in the world of text.
But there are plenty of examples where there is no structural repetition in text. Consider
emails. In emails, a person can say whatever he/she wants to say. The email can be short
or long. The email can be formal or informal. The email can be in any language, and so
forth. There simply is no structural conformity of text when it comes to email.
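No measure of structural repetition is defined here, but one rough way to see the difference between a collection of contracts and a collection of emails is to compare documents pairwise. The Python sketch below uses the standard library's `difflib.SequenceMatcher` on word sequences and averages the similarity ratio across all pairs. The metric, the function names, and the tiny sample documents are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def structural_similarity(doc_a, doc_b):
    """Similarity ratio (0.0 to 1.0) between the word sequences of two documents."""
    words_a = doc_a.lower().split()
    words_b = doc_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def average_similarity(documents):
    """Mean pairwise similarity across a collection; 0.0 for fewer than two documents."""
    pairs = list(combinations(documents, 2))
    if not pairs:
        return 0.0
    return sum(structural_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical samples: leases that share fine print, and unrelated emails.
LEASE = ("This lease is made between {owner} and the operator for the "
         "extraction of oil and gas on the described land")
CONTRACTS = [LEASE.format(owner=name) for name in ("ABC", "BCD", "CDE")]
EMAILS = [
    "Hi, are we still on for lunch on Friday?",
    "Please find attached the quarterly numbers you asked about",
    "Reminder: parking garage closes early tonight",
]

if __name__ == "__main__":
    print(average_similarity(CONTRACTS))
    print(average_similarity(EMAILS))
```

A high average suggests a collection that is structurally repetitive even though no two documents are identical. A low average suggests text like email, where no structural conformity can be assumed.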
[Figure: repetitive and non repetitive text]

Repetitive             Non repetitive
contracts              email
call center calls      law
insurance claims       medical records
warranty claims        depositions
log records            doctor's notes
real estate filings
REPETITIOUS CONTRACTS
Pictured below are three different contracts. There is great similarity between the
contracts but each contract is certainly different from any other contract.
NON REPETITIOUS LAW
As an example of no structural uniformity, consider the law that is shown. Below are two sections of the 848-page Dodd-Frank law, passed in 2010. There is no structural uniformity to the different sections of the law whatsoever.
BI AND REPETITIVE DATA
There is a relationship between the existence or nonexistence of repetition in a document and the type of BI that can be used. When it comes to text, if the text is repetitive, then
both classical BI and textual BI can be used against the text. But if there is no repetition,
then only textual BI can be used, as seen below.
[Figure: repetitive text can be analyzed with both classical BI and textual BI; non repetitive text can be analyzed only with textual BI]
Another way of looking at this concept is that an unstructured data warehouse can have
two types of BI used, as in the case of contracts, while completely non repetitive text can
have only textual BI used against it, as in the case of the Dodd Frank law. The diagram
below makes this point.
[Figure: contracts (repetitive) are served by both classical BI and textual BI; law (non repetitive) is served by textual BI only]
TEXTUAL BI: AN EXAMPLE
So what does Textual BI look like? Consider the following example of Textual BI, from
Forest Rim Technology.
In the diagram below, the basic screen is shown. It is seen that there is basic query
management, there is parametric control of the query, there is execution of the query, and
there is the display of the results of the query. In many ways this screen is analogous to
a SQL statement. The difference is that this elaborate query management tool is built
specifically for the management of textual data, not general purpose access and analysis
of a relational data base.
The most interesting part of the textual Business Intelligence query is in the execution.
The screen below shows a simple query where there is a search for all contracts where
there is a mention of naphtha and helium.
The query is executed and there are six occurrences of contracts in which naphtha and helium are mentioned.
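The logic of such a query can be sketched in a few lines of Python. This is not the Forest Rim tool, only an illustration of an AND search over a collection of documents. The contract identifiers and texts are hypothetical, and the whole-word, case-insensitive matching rule is an assumption.

```python
import re


def mentions_all(text, terms):
    """True when every search term appears as a whole word in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return all(term.lower() in words for term in terms)


def find_documents(documents, terms):
    """Return the ids of the documents that mention all of the terms."""
    return sorted(
        doc_id for doc_id, text in documents.items() if mentions_all(text, terms)
    )


# Hypothetical contract texts, invented for this example.
CONTRACTS = {
    "lease-001": "Lessee may extract naphtha and helium from the tract.",
    "lease-002": "Royalties on helium are payable quarterly.",
    "lease-003": "Helium rights are reserved; naphtha is assigned to lessee.",
}

if __name__ == "__main__":
    print(find_documents(CONTRACTS, ["naphtha", "helium"]))
```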
Now that the query has been executed, the results are displayed. First the basic
parameters of the query are shown. Note that the results can be displayed in four ways:
- showing the contracts where the text is found,
- showing the byte locations in the contracts where the references are found,
- showing snippets of text where the references are found,
- showing the entire contract where the references are found.
Suppose the analyst merely wants to find the contracts where the references are found. The results would look like -
Or suppose the analyst wants to find the exact byte location where the references are
found. The results would look like -
Or suppose the analyst wanted to see snippets of text where the references are found. The
results would look like -
There are more snippets to be shown. They look like
Or suppose the analyst wanted to see the entire document and at a glance see where the
references are found in the document. The results would look like -
There are then many different ways to look at and analyze text using Textual Business
Intelligence.
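The display modes described above (locations, snippets, and the whole document with references marked) can be approximated with the following Python sketch. It is an illustration under stated assumptions, not the product's implementation: offsets are character offsets, which match byte offsets only for plain ASCII text, and the `[[...]]` marker, the snippet width, and the function names are invented for the example.

```python
import re


def _pattern(terms):
    """Compile a case-insensitive whole-word pattern for the search terms."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def locations(text, terms):
    """Offset and matched term for every reference in the text."""
    return [(m.start(), m.group().lower()) for m in _pattern(terms).finditer(text)]


def snippets(text, terms, width=20):
    """A window of surrounding text for every reference."""
    return [
        text[max(0, m.start() - width): m.end() + width]
        for m in _pattern(terms).finditer(text)
    ]


def highlighted(text, terms):
    """The entire document with each reference wrapped in [[...]]."""
    return _pattern(terms).sub(lambda m: "[[" + m.group() + "]]", text)


if __name__ == "__main__":
    contract = "The lessee may extract helium and naphtha from the tract."
    terms = ["naphtha", "helium"]
    print(locations(contract, terms))
    print(snippets(contract, terms, width=10))
    print(highlighted(contract, terms))
```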
The example that has been shown was selected for its simplicity. Textual Business
Intelligence can look at text in many different ways, other than the simple example that
has been shown. The diagram below depicts just some of the many sophisticated ways
that analysis can be done with textual Business Intelligence.
[Figure: Text, Textual ETL, Textual BI]

Search can be done in many ways:
- by a word
- by words (AND/OR)
- by categories of words
- by indexes of words
- by words in proximity
- many other ways
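Two of the search styles listed above, words in proximity and categories of words, can be sketched briefly in Python. How these searches are actually implemented in the product is not described here, so the sketch is an assumption throughout: proximity is measured as distance in words, and a category is modeled as a plain dictionary from a category name to its member words.

```python
import re


def tokenize(text):
    """Lowercase words in document order."""
    return re.findall(r"[a-z']+", text.lower())


def within_proximity(text, first, second, max_distance):
    """True when the two words occur within max_distance words of each other."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


def category_hits(text, category, taxonomy):
    """Words from the named category that occur in the text."""
    members = {word.lower() for word in taxonomy[category]}
    return sorted(set(tokenize(text)) & members)


if __name__ == "__main__":
    comment = "Your products are great but your service is lousy"
    taxonomy = {"complaint": ["lousy", "refund", "stain"]}  # hypothetical category
    print(within_proximity(comment, "service", "lousy", 2))
    print(category_hits(comment, "complaint", taxonomy))
```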
COMPLEMENTARY BUSINESS INTELLIGENCE
As a final note, is textual BI a replacement for classical BI? The answer is not at all.
Textual BI and classical BI are complementary. A sophisticated organization is going to
need BOTH forms of Business Intelligence.
[Figure: Textual BI alongside Classical BI]
References
- BUILDING THE UNSTRUCTURED DATA WAREHOUSE, W H Inmon, Krish Krishnan, Technics Publications, 2011
- TAPPING INTO UNSTRUCTURED INFORMATION, W H Inmon, Tony Nesavich, Pearson Publications, 2008
- DW 2.0: ARCHITECTURE FOR THE NEXT GENERATION OF DATA WAREHOUSING, W H Inmon, Morgan Kaufmann, 2009