
Developing Portal Taxonomy


8/8/2019 Developing Portal Taxonomy

http://slidepdf.com/reader/full/developing-portal-taxonomy 1/30

This chapter introduces the fundamental principles of taxonomy and offers practical hints about how to create a taxonomy for your portal. As discussed in Chapter 9 and further addressed in Chapter 12, the taxonomy is one of the keys to organizing your content, including site navigation, folder construction, and categories. It may be used for all types of portals, including outward-facing and enterprise portals.

You can think of taxonomy as an organizational map that is used to categorize information related to one or more areas of knowledge. A well-designed taxonomy helps a user locate information that would be difficult to find through a simple search process. The taxonomy provides a context, or a knowledge map, for documents.

In this chapter, I start by discussing the principal concepts of taxonomy. I then apply these principles and illustrate several technical approaches toward developing taxonomy. Then I show how developing a corporate taxonomy can solve a real-world business problem. Next I discuss the business value of developing taxonomy, and finally I discuss the various methods for instantiating taxonomy.

What Is Taxonomy?

Though the concept of taxonomy may be new to you as it relates to managing information within a portal, the basic concept has been used widely for quite some time in a number of disciplines. Taxonomy provides a structure that serves to bring order to a particular area of knowledge.

Chapter 10

Developing Portal Taxonomy

James J. T


In biology, arguably the king of taxonomies was devised by Carolus Linnaeus in 1758. He created a simplified system of classification of all living things (binomial nomenclature) that is still in use today. The classification begins with the most general groupings (kingdoms: Animalia, Plantae) and proceeds through intermediate ranks such as orders (Rodentia, Carnivora) to the most specific (species) (Figure 10.1).

This classification system includes all living things. For example, human beings fit within the following categories in the taxonomy, as shown in Figure 10.2.

Biologists use the attributes of a living thing to place it properly within this structure. This system has helped biologists in a number of areas by providing a structure within which theories, and eventually biological laws, can be deduced.

332 Chapter 10 Developing Portal Taxonomy

Figure 10.1 Binomial Nomenclature Used to Classify Plants and Animals

chpt_10.qxd 2/2/04 9:00 AM Page 332


People employ taxonomies in their everyday lives to help find things. If I were to ask you to get those crazy red socks that I’ve seen you wear to work, you would simply go into your bedroom to your chest of drawers and open your sock drawer. Without knowing it, you have used your own organizational taxonomy to help you locate those red socks. You have many places to store things in your home, so how did you know to go to the bedroom and then the sock drawer?

Figure 10.2 Humans in Binomial Nomenclature

Suppose that you have a living room and a bedroom in your home, and the living room contains two storage cabinets, a bookcase and a TV cabinet. Moving on to your bedroom, you have a chest of drawers and a wardrobe in which you store most of your clothes. In your chest of drawers, you store socks and underwear in the top drawer and t-shirts and sweaters in the bottom drawers. You also store items of clothing in the wardrobe, though you reserve the wardrobe for clothes that should be hung up, such as pants and dress shirts. As a result, you think of this piece of furniture as the hanging clothes wardrobe. To be more precise, you might decide to create a map to help you visualize this organizational scheme (see Figure 10.3). This figure is analogous to the knowledge map you create for a portal.

The vocabulary we have created is highly specific: The terms “clothes drawer” and “hanging clothes wardrobe” refer to items within your home. In other words, by developing this taxonomy, we are making the inherent assumption that you know what is meant by these terms. Taxonomy is a specialized view of content. As you searched for the socks in the sample scenario, you said to yourself that socks are an article of clothing and all articles of clothing are stored in your bedroom in either your wardrobe or your chest of drawers, and small objects belong in a drawer rather than hanging in the wardrobe. In essence what we have done here is to create our own vocabulary to describe the organization of elements within your home.


Figure 10.3 Sample Storage Taxonomy for a Home


Taxonomy Concepts

Let’s explore the key concepts that relate to taxonomy and discuss its practicality in the context of managing large amounts of information within your business. A good place to start is to gain some insight into the nature of the data we are organizing. These data fall into two general categories: structured data and unstructured data.

Structured Data

Structured data is easier to understand than unstructured data, because it is more consistent and follows stricter rules. By understanding structured data, we will gain insights into unstructured data, which is a fundamental concept to taxonomy. Structured data resides within a database and contains well-defined tables, columns, and fields. Generally, a table represents some kind of entity, each row represents an instance of the entity, and each column represents a piece of data describing this entity. For example, we could represent the data surrounding a company’s orders with Tables 10.1 and 10.2.


Table 10.1 Customer Data

Customers

Customer number | Name | Telephone no. | Credit limit
1 | Avis | 0171 123 4567 | $10,000
2 | Boeing | 0181 345 6789 | $2,500
3 | CA | 0123 45678 | $50,000
4 | Dell | 0134 56789 | $21,000

Table 10.2 Order Data

Orders

Order no. | Date | Customer number | Item | Quantity
11234 | 2-Mar-99 | 1 | A | 150
11235 | 15-Mar-99 | 2 | B | 25
11236 | 21-Apr-99 | 3 | C | 1,000
11237 | 7-May-99 | 4 | D | 6,789


The first table represents data pertaining to customers, and the second table contains all information relating to orders. Furthermore, the association of which orders apply to which customers is tracked through the unique customer number: each row in the Orders table carries a customer number that maps to the associated row in the Customers table.

The point is that a user can query the database on very specific questions and receive answers, provided that the questions can be answered by the data model. For example, for the relational data model described in Tables 10.1 and 10.2, we could ask questions such as:

• Who are your customers?
• How many customers do you have?
• Which customers have orders?
• What is the average number of orders per customer?
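To make the point concrete, here is a minimal sketch that loads the data from Tables 10.1 and 10.2 into an in-memory SQLite database with Python's standard `sqlite3` module and answers the four questions above. The table and column names (`customers`, `orders`, `customer_number`, and so on) are assumptions chosen for illustration, and the order dates are rewritten in ISO format on the assumption that "99" means 1999.

```python
import sqlite3

# Hypothetical schema mirroring Tables 10.1 and 10.2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_number INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        telephone       TEXT,
        credit_limit    INTEGER             -- whole dollars
    );
    CREATE TABLE orders (
        order_no        INTEGER PRIMARY KEY,
        order_date      TEXT NOT NULL,      -- ISO 8601; assumes 99 means 1999
        customer_number INTEGER NOT NULL REFERENCES customers(customer_number),
        item            TEXT NOT NULL,
        quantity        INTEGER NOT NULL
    );
""")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
    (1, "Avis", "0171 123 4567", 10000),
    (2, "Boeing", "0181 345 6789", 2500),
    (3, "CA", "0123 45678", 50000),
    (4, "Dell", "0134 56789", 21000),
])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
    (11234, "1999-03-02", 1, "A", 150),
    (11235, "1999-03-15", 2, "B", 25),
    (11236, "1999-04-21", 3, "C", 1000),
    (11237, "1999-05-07", 4, "D", 6789),
])

# Who are your customers?
customers = [name for (name,) in conn.execute(
    "SELECT name FROM customers ORDER BY customer_number")]

# How many customers do you have?
(customer_count,) = conn.execute("SELECT COUNT(*) FROM customers").fetchone()

# Which customers have orders? The join follows customer_number.
with_orders = [name for (name,) in conn.execute("""
    SELECT DISTINCT c.name
    FROM customers AS c JOIN orders AS o ON o.customer_number = c.customer_number
    ORDER BY c.name""")]

# Average orders per customer, counting customers with zero orders too.
(avg_orders,) = conn.execute("""
    SELECT AVG(n) FROM (
        SELECT COUNT(o.order_no) AS n
        FROM customers AS c LEFT JOIN orders AS o
          ON o.customer_number = c.customer_number
        GROUP BY c.customer_number)""").fetchone()

print(customers, customer_count, with_orders, avg_orders)
```

Every answer is unambiguous because each question maps directly onto columns and a join that the data model already defines.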

Unfortunately, unstructured information presents a situation that is not as straightforward because there is no well-defined model. Imagine trying to catalog the contents of your My Documents folder, which probably contains fax cover sheets, letters, work and personal documents, presentations, budget spreadsheets, family photos, downloaded software ready to install, and many other kinds of files.

Unstructured Data

Unstructured data is any electronic data that does not reside in a structured database (it is typically stored in documents). In contrast, structured data provides its own context: The data model itself describes what each field means. For instance, a field called Shipping Address in a table called Customers probably contains a physical address. A date field in the Invoices table probably stores the date that the invoice was created. Search techniques and rules for dealing with structured data are quite mature and generally understood. The mapping between data and metadata (data about data) is direct and straightforward. It is simple to generate a data dictionary in an automated fashion, even for a large database. Examples of unstructured information include Word documents, streaming audio and video, email, and PowerPoint presentations.
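Because the schema describes itself, a data dictionary can be generated by a program rather than written by hand. The following sketch uses SQLite's `sqlite_master` catalog and `PRAGMA table_info` through Python's `sqlite3` module; the two tables are hypothetical stand-ins for the Customers and Invoices tables mentioned above, and other database engines expose similar catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables echoing the Customers/Invoices example in the text.
conn.executescript("""
    CREATE TABLE customers (
        customer_number  INTEGER PRIMARY KEY,
        name             TEXT NOT NULL,
        shipping_address TEXT
    );
    CREATE TABLE invoices (
        invoice_no      INTEGER PRIMARY KEY,
        invoice_date    TEXT NOT NULL,
        customer_number INTEGER REFERENCES customers(customer_number)
    );
""")

def data_dictionary(conn):
    """Map each table name to a list of (column, declared type, required?)."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    dictionary = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
        dictionary[table] = [(name, col_type, bool(notnull))
                             for (_cid, name, col_type, notnull, _default, _pk) in rows]
    return dictionary

for table, columns in data_dictionary(conn).items():
    print(table)
    for column, col_type, required in columns:
        print(f"  {column:<17} {col_type:<8} {'required' if required else 'optional'}")
```

No equivalent catalog exists for a folder of Word files, which is exactly the gap the rest of this section describes.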

Unstructured information presents many challenges compared to structured data. It is relatively simple to produce a query in a relational database that shows all invoices created in a specified date range, because the structure of the data lends itself to the query. Adherence to the data model ensures that your answer will be unambiguous and complete, and the results will be consistent and replicable over time. You can be assured that an important field such as invoice date would be a required field and thus be included in all records. It would be validated to ensure that only valid values were present. Neither completeness nor consistency is guaranteed for unstructured data, on the other hand. To search for all new articles on a topic such as knowledge management on a website, you are counting on the date of each article being entered in a metatag, and on consistent terms being used to identify the key topics of articles.

With unstructured data, you can search either data or metadata (or a combination of the two). The problem is that you cannot take much for granted in the quality of either. To find new articles, you could search for a date in the text of an article in a Word file or HTML, or in the metadata, such as in the HTML metatags or Word document properties. But how would you determine when the article was written? Does the file creation date mean the same thing as the date written? What if two conflicting dates were found? Which date formats should be considered? What about vague or incorrect date values?
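The sketch below illustrates why these questions are hard. It scans free text for date-like strings in three common formats and returns every plausible reading of each; the sample text, the function name `candidate_dates`, and the choice of formats are illustrative assumptions. A string such as 03/04/2002 legitimately yields two dates (March 4 in U.S. order, April 3 in European order), and nothing in the text says which one is the article date.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

def _valid(y, m, d):
    """Return a date, or None for impossible values such as month 13."""
    try:
        return date(y, m, d)
    except ValueError:
        return None

def candidate_dates(text):
    """Return (matched string, [possible dates]) for each date-like string found."""
    found = []
    # ISO style, e.g. 2001-12-05: a single reading if the values are valid.
    for y, m, d in re.findall(r"\b(\d{4})-(\d{2})-(\d{2})\b", text):
        r = _valid(int(y), int(m), int(d))
        found.append((f"{y}-{m}-{d}", [r] if r else []))
    # Slash style, e.g. 03/04/2002: month/day or day/month.
    for a, b, y in re.findall(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b", text):
        readings = {r for r in (_valid(int(y), int(a), int(b)),
                                _valid(int(y), int(b), int(a))) if r}
        found.append((f"{a}/{b}/{y}", sorted(readings)))
    # Written style, e.g. 11 April 2002.
    for d, month_name, y in re.findall(r"\b(\d{1,2}) ([A-Za-z]+) (\d{4})\b", text):
        if month_name.lower() in MONTHS:
            r = _valid(int(y), MONTHS[month_name.lower()], int(d))
            found.append((f"{d} {month_name} {y}", [r] if r else []))
    return found

sample = "Draft saved 03/04/2002. Published 11 April 2002; replaces the 2001-12-05 memo."
for text, readings in candidate_dates(sample):
    print(text, "->", [r.isoformat() for r in readings])
```

Even this toy finds three dates and four possible readings in one sentence, and it still cannot tell which date means "written".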

Whereas a structured database encourages a fixed vocabulary of terms by means of dropdown lists, reference tables, and other means, unstructured data sources are by definition free of such constraints. Users cannot be sure that the same terms will be used in two different documents, even if both documents cover the same topic. On the other hand, the same word may be used in two unrelated documents. The richness of human language becomes the enemy of search accuracy.

Semantics

Semantics is the science of modeling the context and relationships of all the objects in a system for the purpose of attaching meaning to the information generated by the system. All information must be placed within a specific context in order to be useful. Although beyond the scope of this book, this is the crux of the work surrounding artificial intelligence and the study of human intelligence. For example, the information “two” has little meaning unless you know two of what. By attaching two to the object apples, you have an answer to the “what” of two; but you lack any knowledge of how “two apples” relates to other objects within the system. By adding the entity “Jim” with the relationship “have”, you have a clear understanding that “Jim has two apples”. So in fact the basic constructs used to model the semantics of a system are very similar to the basic constructs of any language: namely, subject-verb-object or entity-relationship-object, which is known as an associative model in the field of knowledge management.

Knowledge Representation/Ontology

A knowledge representation or ontology is a semantic representation of the objects within a specific domain of knowledge. The domain of knowledge could be sports, finance, insurance, and so on. The goal of the ontology would be to represent all the objects and relationships among objects within this domain. For example, suppose we chose to semantically model the same body of knowledge I described relationally earlier, namely, customer orders for a company. Within this knowledge space, only two objects exist: customers and orders. Furthermore, only one basic relationship exists: A customer “has” an order. All objects inherently possess the IS relationship indicating the entity (or class) to which the object belongs. We could begin to describe the relationship of the data in words in the following way:

• Knowledge space: All customers’ orders
• Two entities: Customers, Orders
• Allowed relationships: Customers HAVE (0 to N) orders; IS is inherent

So we can start to talk about information that fits into this semantic model in the following way:

• Avis IS a customer
• Avis HAS a telephone number of 0171 123 4567
• Avis HAS a credit limit of $10,000
• #11234 IS an order number
• Avis HAS order #11234
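One way to hold statements like these in a program is as a set of subject-verb-object triples. The sketch below is a minimal, hypothetical illustration rather than any particular product's associative database: it enforces the allowed relationships (IS and HAS) for this knowledge space and answers a question by combining the two, since finding Avis's orders means finding every object that Avis HAS and that IS an order.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    verb: str
    obj: str

# The only relationships this knowledge space allows.
ALLOWED_VERBS = {"IS", "HAS"}

class KnowledgeSpace:
    def __init__(self):
        self.facts = set()

    def add(self, subject, verb, obj):
        """Record a fact, rejecting relationships the model does not allow."""
        if verb not in ALLOWED_VERBS:
            raise ValueError(f"relationship {verb!r} is not allowed here")
        self.facts.add(Triple(subject, verb, obj))

    def objects(self, subject, verb):
        """Everything that `subject` is related to through `verb`."""
        return {f.obj for f in self.facts if f.subject == subject and f.verb == verb}

    def orders_of(self, customer):
        """Objects the customer HAS that are also known to BE orders."""
        return sorted(o for o in self.objects(customer, "HAS")
                      if Triple(o, "IS", "order") in self.facts)

ks = KnowledgeSpace()
ks.add("Avis", "IS", "customer")
ks.add("#11234", "IS", "order")
ks.add("Avis", "HAS", "#11234")
ks.add("Avis", "HAS", "credit limit")   # a HAS fact that is not an order
print(ks.orders_of("Avis"))
```

Unlike the relational tables, nothing here fixes which attributes a customer may have; the meaning comes entirely from the relationships between objects.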

Obviously, an area of knowledge such as sports or even orders contains many more objects, and the relationships between objects within this body of knowledge are far more complex, but the idea is the same. For example, for sports, we would define objects such as baseball, player, team, and so on. Next we would begin to define the relationship rules, such as a player is part of a specific team, a team belongs to one type of sport, and baseball is a type of sport. Then we would begin to instantiate the model with real-life information, such as the fact that the Blue Jays are a baseball team. Whenever we read through a document, we naturally bring to bear our own ontology, which is modeled by individual neural pathways within our brains. For example, when we study a subject, we are in effect strengthening our ontological map through the creation of neural pathways, thereby strengthening the associations between various concepts within a particular subject.

What does this have to do with taxonomy? Well, unstructured documents are written using human languages. When someone reads a document, he is sifting through a large number of words that represent objects that are semantically related. By comparing our own internal ontological map to the information we read, we extract meaning. The goal of developing taxonomy is to make it easier for users to extract the meaning of content by providing a context.

Once the meaning is determined, the document can be appropriately classified within the categories defined by the taxonomy. For example, suppose you wanted to classify this book within your book collection. You would probably identify the area of knowledge covered as being related to the computer industry even though the word “computer” is not used within the book. As discussed later in the chapter, there are a number of tools that automate the process of classification; however, all require some human intervention because of the inherent complexity of human judgment.

Because the relationships that describe a whole body of knowledge are far more complex than the relationships represented through a relational model, we must use an associative model. The associative data model diagrams the semantics of the complex relationships of language using subject-verb-object terminology. While a database uses a relational model for describing the relationships between objects, a knowledge representation requires the associative model to model the concepts of a body of knowledge. The associative model represents the major technology used by the natural language query model.

Vocabulary

Within the context of the taxonomy, vocabulary represents a structured group of words that are used to define the main concepts used within the taxonomy. To keep your taxonomy systematic, you must choose words carefully and consistently. Metadata tags content with words from your taxonomy.


Thesauri

A thesaurus keeps track of synonyms or words with the same or similar meanings. When we are dealing with unstructured information, the thesaurus creates a link between the words used to describe the same or similar concepts.

Taxonomy built on the thesaurus model (designating a preferred or authorized term with entry terms or variants) helps to link these different terms together. At search time, the term that the knowledge worker uses is associated with the preferred (or key) term for more precise searching, or the knowledge worker’s term is expanded to include the variant forms of the term as well as the authorized term for a broader search. Taxonomies built on the thesaurus model do not force all work groups to use a common set of terminology.
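The two search-time behaviors described above (mapping a user's term to the preferred term for precision, or expanding it to all variants for recall) can be sketched with a pair of dictionaries. The thesaurus entries and function names below are invented examples, not a standard vocabulary or a product API.

```python
# Hypothetical thesaurus: preferred (authorized) term -> entry terms (variants).
THESAURUS = {
    "automobile": {"car", "auto", "motor vehicle"},
    "invoice": {"bill", "statement of charges"},
}

# Reverse index: every known term, preferred or variant, -> its preferred term.
PREFERRED = {pref: pref for pref in THESAURUS}
PREFERRED.update({variant: pref
                  for pref, variants in THESAURUS.items()
                  for variant in variants})

def preferred_term(term):
    """Precise search: replace the user's term with the authorized term."""
    key = term.strip().lower()
    return PREFERRED.get(key, key)

def expand(term):
    """Broad search: the authorized term plus all of its variants."""
    pref = preferred_term(term)
    return {pref} | THESAURUS.get(pref, set())

print(preferred_term("Car"))
print(sorted(expand("bill")))
```

Because each work group's preferred word is simply another entry term, no group is forced to change the terminology it already uses.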

Categories

A category represents a structured vocabulary organized around a specific concept within a body of knowledge. It is an individual node among a group of nodes that are related to one another. When defining categories, the terms used should be intuitive so people can deduce the information contained within. When categories are created with meaningful words, a context is automatically created for all documents and subcategories residing within the category. For example, in the biological plants and animals taxonomy example, a very specific category called Homo sapiens is used to categorize the last leaf in the classification of human beings. For this example, the category names used are unique; but with a portal taxonomy, the same category name can exist within different locations of the hierarchy. To understand why, we first need to realize that all categories have properties associated with them.

Attributes/Properties

Integral to the naming of the categories are the properties or characteristics associated with each category. The properties serve to describe where within the logical hierarchy of categories an individual category resides. All subcategories inherit the attributes describing the parent knowledge space, which means that the information contained in a subcategory represents a more specific area of the same knowledge space. For example, suppose we define a category called golf and a category called sports. Golf represents a specific area of knowledge within the parent category of sports. Like the data, attributes can be searched. Attributes can also be stored as metadata. For example, you may have used the properties associated with a Microsoft Word document to store data such as author, title, subject, and even categories (Figure 10.4).
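A minimal sketch of categories with inherited attributes follows. Each `Category` (a hypothetical class, not a portal product's API) knows its parent, so it can report its full path and layer its own attributes over those of its ancestors. Because a category is identified by its path rather than its bare name, two categories called News can coexist under Golf and under Tennis.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Category:
    name: str
    parent: Optional["Category"] = None
    attributes: dict = field(default_factory=dict)

    def path(self):
        """Full location in the hierarchy, e.g. 'Sports/Golf/News'."""
        parts, node = [], self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

    def effective_attributes(self):
        """Own attributes layered over everything inherited from ancestors."""
        inherited = self.parent.effective_attributes() if self.parent else {}
        return {**inherited, **self.attributes}

sports = Category("Sports", attributes={"domain": "sports"})
golf = Category("Golf", sports, {"sport": "golf"})
tennis = Category("Tennis", sports, {"sport": "tennis"})
golf_news = Category("News", golf)
tennis_news = Category("News", tennis)

print(golf_news.path(), golf_news.effective_attributes())
print(tennis_news.path(), tennis_news.effective_attributes())
```

The two News categories share a name but inherit different attributes, which is what lets a portal reuse simple labels at different places in the hierarchy.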

Categorization Rules

You can devise rules that determine the categories in which an item belongs. For instance, the word “fly” could appear in items relating to civil aviation, bird watching, and trout fishing. You might have a rule that an item containing “trout,” “rod,” “reel,” and “fly” pertains to fly fishing, while one containing “bird,” “plumage,” “habitat,” and “binoculars” should be filed under “bird watching.” SharePoint Portal Server offers categorization rules for its Audience feature.

Figure 10.4 Word Document Properties
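The fly-fishing and bird-watching rules can be expressed as sets of required words: an item is filed under a category only if it contains every word in that category's rule. This is a deliberately simple, hypothetical sketch; it is not how SharePoint Portal Server's Audience rules are configured, and production rule engines typically add weights, stemming, and phrase matching.

```python
import re

# Each category lists the words an item must contain to be filed there.
RULES = {
    "fly fishing": {"trout", "rod", "reel", "fly"},
    "bird watching": {"bird", "plumage", "habitat", "binoculars"},
}

def categorize(text):
    """Return every category whose required words all appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(category for category, required in RULES.items()
                  if required <= words)

print(categorize("Cast a dry fly with a light rod and reel to tempt the trout."))
print(categorize("Through binoculars we studied the bird's plumage and habitat."))
print(categorize("We will fly to Denver on Monday."))
```

The last call shows the point of the rule: the word "fly" alone is not enough evidence to file the item anywhere.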

Document Metadata

Metadata is information that describes another piece of information, object, or thing. Basically metadata makes finding a particular piece of information easier because moving through a stack of metadata takes much less time than moving through the data itself. For example, in the old days when you would thumb through a card catalog system in the library, you were using metadata to locate a book. The card catalog contained the necessary information for you to locate the physical book on the library shelf. Without the metadata located in the card catalog, you would have been forced to wade through all the books in the library to locate the book.

Document Card

A document card represents the metadata associated with a file, just as a paper card in a library card catalog contains information about the book it lists, such as author, title, publisher, and date. The document cards are indexed and made available for searching. The card contains a link to the original document.
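To illustrate, the sketch below models a document card as a small record of metadata plus a link to the original file, and searches the cards rather than the documents themselves. The field names, the sample cards, and the `file://` and `http://` locations are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DocumentCard:
    title: str
    author: str
    subject: str
    date: str        # ISO 8601
    location: str    # link back to the original document

CARDS = [
    DocumentCard("Portal Rollout Plan", "J. Smith", "portals", "2003-06-01",
                 "file://fileserver01/projects/portal/rollout.doc"),
    DocumentCard("Q3 Budget", "A. Jones", "finance", "2003-07-15",
                 "http://intranet.example.com/finance/q3-budget.xls"),
]

def search_cards(cards, **criteria):
    """Return the location of each card whose metadata matches every criterion."""
    return [card.location for card in cards
            if all(getattr(card, name) == value for name, value in criteria.items())]

print(search_cards(CARDS, subject="finance"))
```

Only the small cards are scanned; the original documents are opened through their `location` once a match is found, just as a catalog card leads you to the shelf.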

Context Specificity of Taxonomy

Within an organization, a particular piece of information may have a number of different uses, and different knowledge workers may have a different perspective on the same piece of information. For example, let’s suppose that a contract has been developed with a client. Where within the existing corporate taxonomy should this information reside? The answer depends on who is looking for the information. The legal department would say the contract should reside in the contracts category under that particular client. Because the contract represents revenue, the accounting department would suggest the accounts receivable category. Finally the sales department would see the document as a part of a client relationship.

Furthermore, many portals today support the ability of a user to create a personalized taxonomy. Remember that, although multiple views to the same document can be created, only one physical document exists. Generally, the portal stores within your infrastructure only the document cards, which hold the metadata. Each document card stores the location of the physical document so the portal can locate that document when the user requests it. The actual document could reside anywhere within the infrastructure of the company. In other words, the document could reside on a file server, on the company’s website, or within a Lotus Notes database. As long as the taxonomy keeps track of the document cards in a local document repository, the user should have no problem locating the document.
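The idea that many views can share one physical document can be sketched as a mapping from category paths to card identifiers. In this hypothetical example, the legal, accounting, and sales views from the contract scenario all file the same card, so every path resolves to the single stored location. The category paths, the client name Acme, the card identifier, and the server name are illustrative assumptions.

```python
from collections import defaultdict

# One card per physical document: card id -> location of the original file.
CARD_LOCATIONS = {
    "card-0042": "file://fileserver01/contracts/acme-2003.doc",
}

# Many category paths (views) may list the same card id.
views = defaultdict(set)
views["Legal/Contracts/Acme"].add("card-0042")
views["Accounting/Accounts Receivable"].add("card-0042")
views["Sales/Client Relationships/Acme"].add("card-0042")

def open_from(category_path):
    """Resolve a category's cards to the locations of the physical documents."""
    return sorted(CARD_LOCATIONS[card_id] for card_id in views[category_path])

locations = {path: open_from(path) for path in list(views)}
print(locations)
```

Adding a fourth view, such as a user's personalized taxonomy, means adding one more path entry; the document itself is never copied.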

Documents are partitioned into logical groupings that are easier to navigate. These allow users to locate information even if they start with a single-word search term. The categories within a taxonomy move from general to more specific. Taxonomies help avoid problems with common English language peculiarities of similar-sounding words or words with multiple meanings. Taxonomies facilitate iterative, drill-down searches that both advanced and beginning users can quickly traverse.

A taxonomy category can be used to limit the scope of a search, thus reducing the number of irrelevant documents returned. The category facilitates browsing of content, allowing a user to traverse a large number of related documents.

Information is filtered based on the attributes of each category. Think of the unstructured information within your business as water that is gushing out of a fire hydrant. The portal taxonomy is designed to catch hold of this information and place it into the appropriate places within the corporate taxonomy.

Taxonomies provide flexibility in retrieving content. One of the central problems with finding information, as Humpty Dumpty said to Alice, is that words can mean so many different things. The inherent ambiguity of language makes searching more challenging because items are missed that are tagged with different, but related, terms or extraneous results are brought in because too broad a meaning has been assigned to a search term. For instance, a large Canadian systems integrator unfortunately shares its name, CGI, with the acronym for Common Gateway Interface, a widely used standard for running scripts on web servers. Therefore, searching for “cgi” on a search engine returns thousands of results quite useful for CGI programmers and tens of thousands of pages that contain “CGI” in their URLs, and perhaps, buried deep within the search results, a link or two to the company called CGI.
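Using a category to narrow a search can be sketched as a filter on each document's category path, applied before keyword matching. The documents and category paths below are invented to mirror the CGI example; a real portal would query an index rather than a Python list.

```python
# Hypothetical corpus: (category path, text) pairs.
DOCS = [
    ("Technology/Web Development", "Writing CGI scripts in Perl"),
    ("Technology/Web Development", "Debugging cgi-bin permissions"),
    ("Business/IT Services/Companies", "CGI, a Canadian systems integrator, reports results"),
]

def search(term, scope=""):
    """Keyword search restricted to documents filed at or below `scope`."""
    term = term.lower()
    return [text for path, text in DOCS
            if (not scope or path == scope or path.startswith(scope + "/"))
            and term in text.lower()]

print(len(search("cgi")))
print(search("cgi", scope="Business"))
```

The unscoped query matches every document; scoping it to the Business branch returns only the page about the company. Matching on `scope + "/"` keeps a scope such as "Busi" from accidentally matching "Business".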


Taxonomy Best Practices

A few simple guidelines can steer your taxonomy development. While some may seem obvious, they bear repeating, much as soccer players must drill on passing, blocking, and shooting whether they are beginners or professionals.

Use Industry-Accepted Vocabulary

The taxonomy is all about the terms that you use. If you are not an expert on the subject matter of your portal, find someone who is to assist. You will want to use the correct terms and understand synonyms for those terms as well. Nearly every discipline has its share of jargon, as well as more precise meanings assigned to common words.

Be Consistent

Try to use a single classification approach. If it makes sense to combine approaches, keep the classification consistent on the sibling level. Use consistent vocabulary and thesauri. Maintain a consistent degree of generality in sibling categories.

Control Depth

A flat taxonomy ensures that users will be able to locate information quickly, with fewer clicks. On the other hand, the information should be sufficiently segmented to make the taxonomy worthwhile. A good rule of thumb is to go no more than 3–6 levels deep.

Control Breadth

A focused taxonomy ensures that users can easily digest the scope of information. Just as there is a limit to the patience of users in traversing a taxonomy from top to bottom, there is a limit to the width of the taxonomy. For most purposes, you should consider starting with 10–15 top-level categories. Assuming you restrict yourself to 15 categories at lower levels and are using no more than 3–6 levels, your taxonomy could then hold a maximum of 15^6, or 11,390,625, entries. A typist entering these terms at 100 words per minute would need about 80 days working 24 hours a day to complete the task. If you were more restrained and confined yourself to a 3-level taxonomy and a width of 15 entries, you would need to shoehorn your content into a mere 3,375 (15^3) categories.
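The arithmetic behind these figures is easy to reproduce. The sketch assumes, as the estimate above implicitly does, that every category has the same number of children and that each entry is a single word typed without pause; the function names are illustrative.

```python
def max_categories(breadth, depth):
    """Leaf categories in a tree with `breadth` children per node and `depth` levels."""
    return breadth ** depth

def typing_days(entries, words_per_minute=100, words_per_entry=1):
    """Round-the-clock days needed to type every entry."""
    minutes = entries * words_per_entry / words_per_minute
    return minutes / (60 * 24)

deep = max_categories(15, 6)
print(f"{deep:,} entries, {typing_days(deep):.1f} days of typing")
print(f"{max_categories(15, 3):,} entries in a 3-level taxonomy")
```

Because capacity grows exponentially with depth, trimming three levels cuts the ceiling by a factor of 15^3, from millions of entries to a few thousand.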

Divide and Conquer

Taxonomy development is much like spring cleaning at your house. It is a frightening prospect to tackle the entire project at one time, so you should start in one corner of your taxonomy house and complete a room at a time. You will have ample opportunity later to see your errors and refine your approach.

Keep Users in Mind

Be sure to consider the needs of your target users, and understand what they are trying to do on your portal and what mental baggage (or lack thereof) they bring with them. One of the frequent taxonomy mistakes is to assume outsiders have inside knowledge of your organization. Many taxonomies are built to reflect the bureaucratic structure of an organization rather than the functions of the departments. For instance, if a county government were to assign taxonomy development to each agency or office and then merge the results, you would have a significant amount of overlap and duplication. Moreover, the taxonomy might contain “blind spots” for functions that didn’t cleanly map to a particular office.

Implementing a Taxonomy

Now that we have shared more than you ever planned to learn about taxonomies, semantics, and artificial intelligence, it’s time to tackle the implementation of your portal taxonomy. There are three general approaches to creating taxonomy:

• Automatic taxonomy creation and document categorization
• Human taxonomy creation
• Assisted taxonomy creation and document categorization

You may find that two or even all three approaches have value for your project. Bear in mind that taxonomies are never really complete as long as new content is being added. They grow and evolve over time in response to the changing demands of users.

The taxonomy industry has continued to grow and now offers a wide range of technology and products at various price points. Small to mid-sized companies are now also able to offer their employees and customers the same benefits formerly available only to corporate behemoths. In addition to public Internet and corporate portals, taxonomies are also finding their way into vertical portals, customer and partner extranet sites, and even very specialized knowledge worker document repositories.

Automatic Taxonomy Creation and Document Categorization

Microsoft does not currently offer a product that automatically categorizes documents or creates taxonomy. Several other vendors have taken this approach, however, and you can use these third-party products to search and categorize your web pages, documents, and other content sources.

A number of algorithms have been developed to enable categorization of data repositories. In its taxonomy and content categorization study, the Delphi Group identified several basic algorithms, including:

• Linguistic analysis, which identifies the subject, verbs, and objects of a sentence and then analyzes them to extract meaning.

• Statistical text analysis and clustering, which measures word frequency, placement, and grouping and the distance between words in a document.

• Rule-based taxonomies, which classify documents based on specific rules created and maintained by experts using if-then statements that measure how well a document fits into a category.1
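To make the rule-based approach concrete, here is a minimal Python sketch. The categories, terms, weights, and threshold are hypothetical values an expert might maintain. Each rule reads "if this term appears, add its weight for every occurrence," and the weighted word count measures how well a document fits each category. It borrows word frequency from the statistical approach, but it is an illustration, not any vendor's algorithm.

```python
import re
from collections import Counter

# Hypothetical expert-maintained rules: category -> {term: weight}.
# Each entry reads "if the term appears, add its weight for every occurrence."
RULES = {
    "Golf": {"golf": 3, "club": 1, "tee": 2, "putter": 2},
    "Tennis": {"tennis": 3, "racket": 2, "serve": 1},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def score(text, rules=RULES):
    """Fit score per category: sum of (term frequency x rule weight)."""
    counts = Counter(tokenize(text))
    return {category: sum(counts[term] * weight for term, weight in terms.items())
            for category, terms in rules.items()}

def categorize(text, threshold=3, rules=RULES):
    """Assign every category whose fit score reaches the threshold."""
    return sorted(c for c, s in score(text, rules).items() if s >= threshold)

print(categorize("New putter and golf club reviews"))   # ['Golf']
```

The sample document scores 6 for Golf (golf 3, club 1, putter 2) and 0 for Tennis, so only Golf is assigned. Because categorize can return several categories, a document may land in more than one branch of the taxonomy.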

Even vendors of automated taxonomy tools (Table 10.3) concede that human judgment is essential to a finished taxonomy. Their tools can save time and money, however, and find patterns in data that would not be obvious to the analyst.


1 A Delphi Group White Paper, “Taxonomy and Content Classification: Market Milestone Report,” April 11, 2002, p. 16.


Human Taxonomy Creation

The second approach discussed here is to unleash a specialist in taxonomy development to master the domain of your portal and formulate a taxonomy. This approach might be expensive and time consuming, but the taxonomy would benefit from the expertise and experience of the analyst. Many portal projects have taken this approach. One risk is that taxonomy development might expand to soak up too many project resources and make it more challenging for the project to remain on schedule.


Table 10.3 Taxonomy and Categorization Tools

Product | Vendor | Notes | URL
BrainEKP (Enterprise Knowledge Platform) | The Brain Technologies | Despite the name, this is software; innovative visualization of taxonomy | www.thebrain.com
IDOL Server | Autonomy Corp |  | www.autonomy.com
Inxight Categorizer | Inxight | Uses linguistic and statistical analysis; includes visualization | www.inxight.com/
LexisNexis Content Organizer | Verity Inc. | Prebuilt taxonomies from those used by LexisNexis; may be combined with custom taxonomy | www.verity.com
SemioTagger | Entrieva | Uses linguistic and statistical clustering techniques | www.entrieva.com/
SemioTaxonomy | Entrieva | Collection of 27 prebuilt taxonomies | www.entrieva.com/
Stratify Classification Server | Stratify Inc | Linguistic and statistical analysis, statistical clustering techniques | http://www.stratify.com/


Assisted Taxonomy Creation and Document Categorization

This third approach is a hybrid of the first two. It involves human analysts working in conjunction with automated taxonomy and search tools. There are many sources of data that can help with taxonomy development. Search query logs, analysis of library reference requests, focus group results, findings from in-person interviews of individual knowledge workers, and survey results are all indicators of what content each segment of employees needs, and on what schedule. These sources also tell you about the knowledge workers’ information-seeking behavior, which in turn lets you know which access methods (such as searching and browsing) and access points (such as metadata elements) you need to use in schemas and in descriptive and navigational taxonomies.
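As a minimal sketch of mining one of these sources, the following Python counts the words in a search query log so that frequently sought terms surface as candidate categories for an analyst to review. The log format (one query string per entry) and the stopword list are assumptions for illustration; real logs would need more cleanup, and the analyst still decides which terms deserve a place in the taxonomy.

```python
from collections import Counter

# Hypothetical stopword list; extend it for your own query logs.
STOPWORDS = {"a", "an", "and", "for", "how", "in", "of", "the", "to"}

def candidate_terms(queries, top_n=3):
    """Return the top_n most frequent non-stopword terms as (term, count) pairs."""
    counts = Counter(
        word
        for query in queries
        for word in query.lower().split()
        if word not in STOPWORDS
    )
    return counts.most_common(top_n)

# Hypothetical log entries.
log = ["expense report form", "travel expense policy",
       "expense report deadline", "vacation policy"]
print(candidate_terms(log))
```

Here "expense" tops the list with three queries, followed by "report" and "policy" with two each, which hints that expense reporting and policies may deserve their own categories.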

Instantiating a Taxonomy

Now that you have defined the taxonomy, you need to implement it. This is the phase when content and documents are assigned to locations within the taxonomy. Once the taxonomy tree has been created, all the documents in the system are tagged as belonging to one or more taxonomy categories. This process is typically referred to as categorization, tagging, or profiling, depending on the vendor. Users can then browse and search within specific categories.
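The tagging model described above, in which each document belongs to one or more categories and users browse or search within a category, can be sketched in a few lines of Python. The TaggedRepository class and its methods are hypothetical illustrations, not any vendor's API.

```python
from collections import defaultdict

class TaggedRepository:
    """Hypothetical document store in which a document can carry several categories."""

    def __init__(self):
        self.docs = {}                       # document id -> text
        self.by_category = defaultdict(set)  # category -> document ids

    def add(self, doc_id, text, categories):
        """Store a document and tag it with one or more categories."""
        self.docs[doc_id] = text
        for category in categories:
            self.by_category[category].add(doc_id)

    def browse(self, category):
        """Document ids tagged with a category, in sorted order."""
        return sorted(self.by_category.get(category, set()))

    def search(self, term, category):
        """Case-insensitive text search limited to one category."""
        return [d for d in self.browse(category)
                if term.lower() in self.docs[d].lower()]

repo = TaggedRepository()
repo.add("d1", "Golf club fitting guide", ["Golf", "Equipment"])
repo.add("d2", "Tennis racket buying guide", ["Tennis", "Equipment"])
print(repo.browse("Equipment"))        # ['d1', 'd2']
print(repo.search("guide", "Golf"))    # ['d1']
```

Keeping an index from category to documents is what makes "browse within a category" cheap: the search only scans documents already tagged with that category.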

Creating Categories in Content Management Server

Content Management Server uses channels to store taxonomy information. The channels help organize the pages of a site along consistent patterns and facilitate navigation. Chapter 9 discusses how channels are created in Content Management Server. Channels are represented as a hierarchy much like the taxonomy.


Creating Categories in Commerce Server

Like Content Management Server, Commerce Server has no explicit functionality that is called taxonomy. It nonetheless uses taxonomy in its catalog hierarchies and categories. Therefore, the catalog is the place where the taxonomy is instantiated in a Commerce Server site. Figure 10.5 shows the catalog list. For instance, you could have a sporting goods store with catalogs for each sport.

Catalogs are related to product categories. The categories can be related as parents and children or as siblings. For instance, the golf category might have child categories of clubs, balls, apparel, and accessories. Golf apparel in turn might have shirts, trousers/pants, and footwear. Categories can be nested up to five levels deep in Commerce Server. Commerce Server has been designed for up to 10,000 catalogs, and each catalog can contain up to 5 million items and 1,000 property definitions.

Figure 10.5 Commerce Server Catalog Definition Designer

When you create a category, you determine which properties are associated with items in that category, as shown in Figure 10.6. You can assign catalog items to one or more categories, as shown in Figure 10.7.

Your taxonomy, and hence your categories, will evolve during the life of your portal. If you monitor how users find content on your portal, you can shape the taxonomy to help them reach their destinations sooner.
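The parent, child, and sibling relationships and the five-level nesting limit can be modeled with a small Python class, which can be handy for checking a taxonomy's shape before loading it into a catalog. This is a hypothetical sketch, not the Commerce Server object model; only the depth limit comes from the text above.

```python
class Category:
    """Hypothetical catalog category; not the Commerce Server object model."""

    MAX_DEPTH = 5  # nesting limit stated in the text

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if self.depth() > self.MAX_DEPTH:
            raise ValueError(f"{name!r} would exceed {self.MAX_DEPTH} levels")
        if parent is not None:
            parent.children.append(self)

    def depth(self):
        """Top-level categories are at depth 1."""
        return 1 if self.parent is None else self.parent.depth() + 1

    def path(self):
        """Names from the top-level category down to this one."""
        return (self.parent.path() if self.parent else []) + [self.name]

    def siblings(self):
        """Other categories that share this category's parent."""
        if self.parent is None:
            return []
        return [c for c in self.parent.children if c is not self]


golf = Category("Golf")
clubs = Category("Clubs", golf)
apparel = Category("Apparel", golf)
shirts = Category("Shirts", apparel)
print(shirts.path())                          # ['Golf', 'Apparel', 'Shirts']
print([c.name for c in clubs.siblings()])     # ['Apparel']
```

The depth check runs before the category is attached, so a rejected category never appears among its would-be parent's children.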

Creating Areas in SharePoint

Areas are the means of instantiating your site taxonomy in SharePoint Portal Server. They are a hierarchy of terms used for three related purposes.

First, areas provide a vocabulary of terms from the taxonomy that are used to categorize documents. Using areas simplifies and helps promote consistency in the categorization process. Users may be more willing to check a handful of areas than to spend the time to think up the keywords identified with a document.

Figure 10.6 New Category Definition

Second, areas are used to organize the portal, because they are associated not only with documents but also with sites and people. You may want to use areas to show the expertise of your employees and help establish networks among peers to share knowledge.

Finally, areas are used to streamline searching. They provide metadata that helps people find content even when the area itself is not included in the text of a document or other item. You can expose the areas as search terms in the search itself, or use them as links on the results to allow a user to browse area results.
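A minimal Python sketch of this third purpose: a search that matches a term against each item's area tags as well as its text, so an item can be found even when the area name never appears in its body, plus an optional area filter that mimics browsing a single area's results. The item dictionaries and the search function are hypothetical, not SharePoint's search API.

```python
def search(items, term, area=None):
    """Return titles whose text or area tags match the term.

    When area is given, only items tagged with that area are returned,
    which mimics browsing the results of a single area.
    """
    term = term.lower()
    hits = []
    for item in items:
        areas = {a.lower() for a in item["areas"]}
        matches = term in item["text"].lower() or term in areas
        if matches and (area is None or area.lower() in areas):
            hits.append(item["title"])
    return hits


# Hypothetical items: the kickoff deck never mentions "MSF" in its text,
# but it is tagged with the MSF area, so an MSF search still finds it.
items = [
    {"title": "Kickoff deck", "text": "Slides for the project start meeting",
     "areas": ["Methodology", "MSF"]},
    {"title": "Style guide", "text": "Corporate fonts and logos",
     "areas": ["Marketing"]},
    {"title": "MSF overview", "text": "Introduction to MSF phases",
     "areas": ["Training"]},
]
print(search(items, "MSF"))                       # ['Kickoff deck', 'MSF overview']
print(search(items, "MSF", area="Methodology"))   # ['Kickoff deck']
```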


Figure 10.7 Catalog Editor


SharePoint Portal Server takes areas so seriously that there is a special administrator role for maintaining them. The area manager maintains the areas, maps areas to users, and approves or rejects content requests. Areas are maintained in the site settings. They are also referenced in many other parts of the portal, such as content pages.

To create an area, follow these steps starting from the portal home page:

1. Click the Site Settings link in the top navigation to open the Site Settings page (Figure 10.8).

2. In the Portal Site Content section, click the Manage portal site structure link to open the Portal Site Map page (Figure 10.9).

3. Click Create Area in the Actions menu on the Portal Site Map page to open the Create Area page (Figure 10.10).


Figure 10.8 SharePoint Portal Server Site Settings Page


Figure 10.9 Portal Site Map Page

Figure 10.10 Create Area Page


4. Decide where the area fits in your overall hierarchy. For instance, the area MSF (Microsoft Solutions Framework) could be placed in the Methodology area within Topics.

5. Click Change location to select a location for the new area (Figure 10.11).

6. Click OK in the Change Location dialog box and then again on the Create Area page to save the new area.

TIP If you place the area in the wrong location, you can always move it later from the Portal Site Map page.

To delete an area, click the area name on the Portal Site Map page and select Delete from the dropdown menu (Figure 10.12).

In SharePoint, areas are the key to instantiating your taxonomy. If you used SharePoint Portal Server version 1, you can think of areas as the successors to categories.
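The create, move, and delete operations described above all act on one hierarchy. The following hypothetical Python model (not the SharePoint object model) shows why moving a misplaced area is painless: its sub-areas travel with it because they hang off the same node. The Area class, its methods, and the sample names are illustrative assumptions.

```python
class Area:
    """Hypothetical in-memory model of a portal area; not a SharePoint API."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = None
        self.children = {}          # child name -> Area
        if parent is not None:
            parent._attach(self)

    def _attach(self, child):
        """Make child a sub-area of this area, rejecting duplicate names."""
        if child.name in self.children:
            raise ValueError(f"{child.name!r} already exists under {self.name!r}")
        child.parent = self
        self.children[child.name] = child

    def path(self):
        """Names from the top of the hierarchy down to this area."""
        return (self.parent.path() if self.parent else []) + [self.name]

    def move_to(self, new_parent):
        """Re-parent this area; its sub-areas move with it."""
        node = new_parent
        while node is not None:          # refuse to move an area beneath itself
            if node is self:
                raise ValueError("cannot move an area beneath itself")
            node = node.parent
        if self.name in new_parent.children:
            raise ValueError(f"{self.name!r} already exists under {new_parent.name!r}")
        if self.parent is not None:
            del self.parent.children[self.name]
        new_parent._attach(self)

    def delete(self):
        """Detach this area (and everything beneath it) from the hierarchy."""
        if self.parent is None:
            raise ValueError("cannot delete the top of the hierarchy")
        del self.parent.children[self.name]
        self.parent = None


portal = Area("Portal")
topics = Area("Topics", portal)
methodology = Area("Methodology", topics)
msf = Area("MSF", topics)          # created in the wrong location
msf.move_to(methodology)           # moved later, as the TIP suggests
print(msf.path())                  # ['Portal', 'Topics', 'Methodology', 'MSF']
```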


Figure 10.11 Change Location Dialog Box


Using Topics

SharePoint Portal Server has a special area called Topics that is included in the default installation of the product. Topics is like other areas but is designed to highlight frequently used content and to be visible to the general population of portal users. Topics can contain web pages and objects such as news, calendar items, document libraries, people, and lists.

SharePoint includes a tool called the Topic Assistant to help you set up the topic structure. To use the Topic Assistant:

1. On the portal home page, click Site Settings in the top navigation.

2. On the Site Settings page, click Use Topic Assistant in the Portal Site Content section to open the Use Topic Assistant page (Figure 10.13).

3. The Topic Assistant examines the areas that have been assigned to existing content and suggests areas for new content. Click the Enable Topic Assistant checkbox.


Figure 10.12 Area Actions Menu


4. Click the Train Now link in the Training Status section of the page to launch the training process.

The Topic Assistant is trained by running searches and analyzing the contents of the documents that are found in the topic. The trained search engine is subsequently directed to organize more search results according to the structure of Topics.
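The chapter does not describe how the Topic Assistant learns internally, so the following is only an illustrative stand-in, not SharePoint's algorithm. This Python sketch builds a word profile for each area from documents already assigned to it and suggests the area whose profile shares the most words with a new document. The function names and sample data are hypothetical.

```python
import re
from collections import Counter

def words(text):
    """Bag of lowercase words in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def train(tagged_docs):
    """Build one word profile per area from (text, area) pairs already assigned."""
    profiles = {}
    for text, area in tagged_docs:
        profiles.setdefault(area, Counter()).update(words(text))
    return profiles

def suggest(text, profiles, top_n=1):
    """Rank areas by word overlap (Counter intersection) with the new text."""
    doc = words(text)
    scores = {area: sum((doc & profile).values()) for area, profile in profiles.items()}
    return sorted(scores, key=lambda area: (-scores[area], area))[:top_n]

training = [
    ("MSF envisioning and planning phases", "Methodology"),
    ("Quarterly revenue and sales forecast", "Finance"),
    ("Project phases and milestones", "Methodology"),
]
profiles = train(training)
print(suggest("Planning the deployment phases", profiles))   # ['Methodology']
```

Retraining simply means calling train again on the current assignments, which loosely mirrors why the Topic Assistant has to be retrained as content accumulates.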

Adding a Person to an Area

By associating people with areas, you can catalog the expertise of your organization and build communities of interest. To add a person to an area:

1. Navigate to the area by browsing or searching.

2. Click Add Person in the Select Action portion of the left pane of the page.

3. On the Add Person page (Figure 10.14), click the Select person link to find the person in the Active Directory.


Figure 10.13 Use Topic Assistant Page


4. On the Select a person page (Figure 10.15), fill in the person’s name and click Find to search through the directory and find the account.

5. Choose the account from the Results list on the left and click Add to select it. You can also enter the name directly in the account list if you prefer. Click OK.

As you add a person to an area, you can map that entry to one or more audiences so the listing for the person will be targeted to those audiences. This is a helpful way of encouraging people to find one another, for example by developing a directory of specialists or experts in a field.
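Audience targeting for people listings can be sketched as a filter. In this hypothetical Python model, each entry records an account, an area, and the audiences it is targeted to. Treating an entry with no audiences as visible to everyone is an assumption made for the example, not documented SharePoint behavior.

```python
from dataclasses import dataclass, field

@dataclass
class PersonEntry:
    """Hypothetical listing of a person in an area."""
    account: str
    area: str
    audiences: set = field(default_factory=set)  # assumption: empty = everyone

def experts(entries, area, user_audiences):
    """Accounts listed in an area that are targeted to one of the user's audiences."""
    return sorted(
        e.account
        for e in entries
        if e.area == area and (not e.audiences or e.audiences & user_audiences)
    )

entries = [
    PersonEntry("ana", "MSF", {"Consultants"}),
    PersonEntry("ben", "MSF"),
    PersonEntry("cy", "Finance", {"Managers"}),
]
print(experts(entries, "MSF", {"Consultants"}))   # ['ana', 'ben']
print(experts(entries, "MSF", {"Sales"}))         # ['ben']
```

A consultant looking for MSF experts sees both listings, while a salesperson sees only the untargeted one.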


Figure 10.14 Add Person Page


Conclusion: Business Value of the Taxonomy

The most important part of this chapter is the discussion of why taxonomy is so vital in today’s organizations. Knowledge is power, and the organization with the right knowledge gains competitive advantage. I am familiar with the sales process within the IT industry, and I have learned firsthand the power of this principle. I am far more likely to be successful in selling my product when I am able to show superior knowledge of my product to my clients. Equally important is knowledge of my client’s organization. This information is difficult to obtain; but before going into a sales call, I would want access to all previous presentations to this client, any recent news or press releases regarding the company, and specific information concerning the personalities involved. Armed with this kind of knowledge, I can more easily navigate past any obstacle to the sale and reach the best possible position against my competition.

Figure 10.15 Select a Person Page

The problem facing many corporations today isn’t that the information doesn’t exist; rather it is that they cannot locate the information quickly. Imagine that a salesperson leaves a company without handing over his notes on clients and prospects. Many organizations have learned from such experiences to keep a repository of employees’ documents; but without an effective knowledge map, the right documents would be difficult to locate. Analysts estimate that over 80% of an organization’s information exists in unstructured formats, such as meeting notes. This information is accumulating at an accelerating rate as more and more organizations move to electronic records management and document management. Yet data and information add little value on their own, because only knowledge gives an organization power and competitive advantage. Information requires context in order to become knowledge, and this knowledge must get into the hands of someone who can use it.

Knowledge management experts define data in its proper context as information. At the next higher rung of the knowledge ladder, information in its proper context is called knowledge. One could go one step higher and call knowledge in its proper context wisdom, but for now most companies are content with extracting knowledge from their organization. So the big message is that successful organizations need to move up the pyramid by providing context to what already exists within their organization. Over the last 50 years, businesses have invested trillions of dollars in IT in order to gather and record information. Now businesses need to begin to shift their focus toward getting this information to the right people. Furthermore, because time is money, the faster an organization can impart knowledge to its employees and clients by supplying the correct information at the right time, the more money it will make. It really pays for an organization to invest in developing a well-designed taxonomy as a foundation for maximizing the potential of its people and, in turn, the organization. The downside, of course, is the considerable upfront cost involved, though the potential benefits are well worth the investment.
