Hardware to support runtime intelligence


Ian N. Robinson, Hewlett-Packard Laboratories, 1501 Page Mill Rd., Bldg. 3L, Palo Alto, CA 94304

An application’s “intelligence” is often judged by how it interacts with its environment. This could involve, for instance, the control of an automated factory floor or a dialogue with a user. Such interaction requires the ability to readily store, access, and modify information acquired during execution. The data structures that encode this information are typically expressed declaratively (as, for example, facts, constraints, rules, or frame-based objects) and constitute the application’s knowledge base. They are essentially interpreted at runtime, their activation being based on pattern matching against some other structure encoding a current event, query, or goal in the application. This access mechanism is central to the performance of the system.

Software mechanisms to support this access (commonly based on hashing) are complicated by the fact that these data structures often have an arbitrarily complex structure, as opposed to the sets of uniformly formatted data types found in databases. Also, when dealing with knowledge bases, very general access is typically required; this, in turn, requires that indexes be maintained on all fields of the stored data. Such indexing is complicated by the use of wild cards, or variables, which allow generalizations or partial information to be stored.

Various compile-time techniques have been developed to handle such complexities (for example, discrimination nets [1], Rete networks for OPS5 [2], and WAM (Warren abstract machine) code for Prolog [3]), but they do not adapt well to runtime. The overhead of maintaining these complex and highly interdependent indexing structures places a considerable burden on the application's performance. For the application to be interactive, there is often a real-time constraint on the applicability of the data being accessed. Information on how best to avoid colliding with another moving object is of little use to the system after impact.

Sidebar: Expressions, symbols, and pattern matching

[Sidebar text and figure are only partly legible. Recoverable fragments: an expression describing a robot at location (3, 9), written @robot ( at 3 9 ); the figure labels "Expression 1", "Expression 3", "EQ", and "Substructure"; and mentions of variables and list variables in the matching rules.]

Associative hardware [4], on the other hand, excels in applications where rapid access is required to data that is dynamic, unordered, and unpredictable; witness the widespread use of associative hardware in cache memories and translation look-aside buffers (TLBs). This article concerns an associative memory system that is specialized to support the syntax and associated pattern-matching rules common to declarative expressions (see the sidebar entitled “Expressions, symbols, and pattern matching”). Hence, the name pattern-addressable memory, or PAM. The article describes a coprocessor board based on an array of custom VLSI chips that combine both logic and memory, and shows how this hardware supports a number of popular intelligent-system architectures.

Figure 1. Pattern-addressable-memory architecture.

Figure 2. Pages and match engines (page 0 selected). Multiplexing is based on a hypothetical PAM chip containing five match engines and 20 words of storage (enough to hold the two expressions). Tokens appear after a match on @robot. Pages 1 and 3 can be dropped because they contain no tokens. Pages are reactivated by the wraparound logic that allows tokens to move from one page to the next. [Figure labels: match token; wraparound; match engine 0 through match engine 4; direction of stack growth.]

64 COMPUTER

Pattern-addressable memory. A number of associative memories have been designed to support matching on symbolic data [4-6]. Such hardware has typically been based on traditional content-addressable memory (CAM), with any additional logic replicated for every word of storage. Pattern matching using the full syntax of declarative expressions is not well suited to this organization because so much of the process cannot be supported by comparators alone. Adding to the complexity of the additional logic, given how often that logic is replicated, quickly gives rise to unworkably poor memory densities.

The PAM chip uses an organization in which the comparators and the match logic are multiplexed over small blocks of conventional RAM. This match engine, with its attached memory, forms the basic building block for the PAM chip (see Figure 1). The coprocessor board, in turn, contains an array of these chips and an array controller. Except for individual reads and writes, operations are performed in parallel over all the match engines in all the chips in the array. The board is designed so that several boards can also be used together, attached to the same host.

Expressions are stored, matched, and output by the PAM as strings of symbols in as-written order. The memory within each PAM chip is managed collectively as a single stack. Expressions written to the stack have their symbols stored in consecutive words, or slots.

Figure 3. Prototype PAM board in host system.

During pattern matching, the array controller broadcasts the query sequence to the match engines. As each query symbol is entered, the match engines scan their attached memories for matches. Matching on the stored expressions entails a sequence of matches on their individual symbols. Match tokens mark the state of each of these match sequences. Match tokens are generated by the match engines (based on the comparator output and the symbol types), and are stored alongside each matching slot. Tokens move through the expressions as they match the incoming query, disappearing on symbol mismatches. At the end of query input, responders (expressions that pattern match) are marked by surviving tokens.
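The token mechanism can be illustrated in software. The following is a minimal sketch, not the PAM's actual match logic: the `Stack` class, the `?` single-symbol wildcard, and the sample expressions are all assumptions made for illustration, as is the rule that a responder must be matched over its whole length. The hardware advances all tokens in parallel, whereas this sketch loops over them.

```python
WILDCARD = "?"  # hypothetical marker for a variable that matches any one symbol


def symbols_match(stored, query):
    """Comparator: equal symbols match, and a wildcard on either side matches."""
    return stored == query or WILDCARD in (stored, query)


class Stack:
    """Expressions stored as symbols in consecutive slots, in as-written order."""

    def __init__(self):
        self.slots = []   # one symbol per slot
        self.ends = {}    # expression start slot -> one past its last slot

    def write(self, expression):
        symbols = expression.split()
        start = len(self.slots)
        self.slots.extend(symbols)
        self.ends[start] = start + len(symbols)

    def match(self, query):
        """Feed the query one symbol at a time and return the responders."""
        # A candidate token sits just before each expression's first slot.
        tokens = {start: start - 1 for start in self.ends}
        for symbol in query.split():
            survivors = {}
            for start, position in tokens.items():
                nxt = position + 1
                # A token advances on a symbol match and disappears otherwise.
                if nxt < self.ends[start] and symbols_match(self.slots[nxt], symbol):
                    survivors[start] = nxt
            tokens = survivors
        # Assumption: a responder is an expression whose token reached its last slot.
        return [" ".join(self.slots[s:self.ends[s]])
                for s, position in tokens.items()
                if position == self.ends[s] - 1]


stack = Stack()
for expression in ("@robot ( at 3 9 )", "@robot ( at 2 10 )", "@door ( open )"):
    stack.write(expression)

print(stack.match("@robot ( at ? ? )"))  # both robot expressions respond
print(stack.match("@robot ( at 3 9 )"))  # only the first expression responds
```

Keeping one token per expression mirrors the description above: a token either moves to the next slot or vanishes, so the set of surviving tokens after the last query symbol identifies the responders.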

Figure 2 shows how sequential scanning of the memory blocks effectively divides the stack into pages. The page select is common to all chips in the array and is driven by the array controller. Match times can be reduced by not rescanning pages on which no match tokens exist. In the best case, the page sequence will rapidly be cut down to the one page containing a responder. Matching would then proceed as if there were no multiplexing. Match times can be further reduced if a priori knowledge of where groups of expressions are located can be used to restrict the number of pages matched on from the outset. These page control schemes help to offset the performance penalties of higher multiplexing ratios, making higher memory densities feasible.
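The page-skipping idea can be sketched with the hypothetical chip of Figure 2 (five match engines, 20 words). The mapping of stack slot to page (`slot // engines`), the starting slots of the two expressions, and the function names are assumptions for illustration; the article does not spell out the addressing.

```python
ENGINES = 5   # match engines on the hypothetical chip of Figure 2
SLOTS = 20    # words of storage, giving SLOTS // ENGINES = 4 pages


def active_pages(token_slots, engines=ENGINES):
    """Pages holding at least one match token; only these need rescanning."""
    return sorted({slot // engines for slot in token_slots})


def advance(token_slots, slots=SLOTS):
    """Move each token to the next slot, as after a successful symbol match.

    Stepping past the last slot of a page lands on the next page, which is
    how this sketch models the wraparound logic reactivating a page.
    """
    return {slot + 1 for slot in token_slots if slot + 1 < slots}


# Assume the two expressions start at slots 0 and 10, so a match on the first
# query symbol leaves tokens there.
tokens = {0, 10}
print(active_pages(tokens))      # [0, 2]: pages 1 and 3 are skipped

for _ in range(5):               # five further symbol matches
    tokens = advance(tokens)
print(active_pages(tokens))      # [1, 3]: tokens have wrapped onto the next pages
```

Under these assumptions the number of pages scanned per query symbol falls from four to two as soon as the first symbol has been matched.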

A conservative multiplexing ratio of 16 was chosen for the prototype chip, partially because this yielded a roughly 50/50 split between logic and RAM cell area. (It is interesting to compare this to the 40/60 split common to conventional RAM chips.) Using a 1.2-micron CMOS process, an 8-by-8 array of these blocks resulted in a prototype chip containing 1,024 32-bit-symbol-plus-2-bit-status slots and 64 match engines in an active area of about 20 square millimeters. This represents a storage density more than 20 times that of the nearest comparable content-addressable memory-based design [6]. The chip has a cycle time of 200 nanoseconds.
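The prototype figures follow from the block organization. The short calculation below reproduces them, assuming one match engine per block; the 16 chips per board is the figure given for the prototype board, and the variable names are illustrative.

```python
BLOCKS_PER_SIDE = 8      # 8-by-8 array of blocks
MULTIPLEX_RATIO = 16     # slots served by each match engine
SYMBOL_BITS = 32
STATUS_BITS = 2
CHIPS_PER_BOARD = 16     # prototype board

match_engines = BLOCKS_PER_SIDE ** 2                    # assumes one engine per block
slots_per_chip = match_engines * MULTIPLEX_RATIO
bits_per_chip = slots_per_chip * (SYMBOL_BITS + STATUS_BITS)
slots_per_board = slots_per_chip * CHIPS_PER_BOARD

print(match_engines)     # 64
print(slots_per_chip)    # 1024
print(bits_per_chip)     # 34816
print(slots_per_board)   # 16384
```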

Figure 3 shows the completed prototype board residing in its host chassis (a Hewlett-Packard 9000 series 300 workstation). The board contains an array of 16 PAM chips. Apart from input and pattern matching, the hardware also supports the output of responders, their modification, their deletion, and the garbage collection of freed-up slots.

May 1992 65

Applications. The most obvious application is that of querying a dynamic database (see the sidebar entitled “Find that robot!”). The expressions, and the slot values within them, can be subject to continual insertion, deletion, and update, with little overhead besides reading or writing to the PAM’s storage. (Because it is run in parallel across all chips, occasional garbage collections take only 400 microseconds.) No indexing schemes need be supported. The declarative language Prolog uses a similar form of pattern matching between subgoals and a knowledge base of facts and rules as part of its fundamental execution mechanism [3].
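The dynamic-database usage can be sketched as an unindexed store in which every operation is a pattern match over all entries. This is an illustrative software analogue only: the `ExpressionStore` class, the `?` wildcard, and the sample expressions are assumptions, and the loop over entries stands in for what the hardware does in parallel.

```python
WILDCARD = "?"  # hypothetical marker for a variable that matches any one symbol


def matches(stored, query):
    """True when both expressions have the same length and agree symbol by symbol."""
    s, q = stored.split(), query.split()
    return len(s) == len(q) and all(
        a == b or WILDCARD in (a, b) for a, b in zip(s, q))


class ExpressionStore:
    """Unindexed store: every query is a match over all entries."""

    def __init__(self):
        self.entries = []  # None marks an entry freed by a deletion

    def insert(self, expression):
        self.entries.append(expression)

    def query(self, pattern):
        return [e for e in self.entries if e is not None and matches(e, pattern)]

    def delete(self, pattern):
        self.entries = [None if e is not None and matches(e, pattern) else e
                        for e in self.entries]

    def update(self, pattern, replacement):
        self.entries = [replacement if e is not None and matches(e, pattern) else e
                        for e in self.entries]

    def collect_garbage(self):
        """Compact the store by dropping freed entries."""
        self.entries = [e for e in self.entries if e is not None]


store = ExpressionStore()
store.insert("@robot ( at 3 9 )")
store.insert("@crate ( at 2 10 )")
print(store.query("? ( at 3 9 )"))            # what is at (3, 9)?
store.update("@robot ( at ? ? )", "@robot ( at 4 9 )")
store.delete("@crate ( at ? ? )")
store.collect_garbage()
print(store.entries)
```

Because nothing is indexed, an insertion, update, or deletion touches only the entry itself; there is no auxiliary structure to keep consistent.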

The stored data structures can also take on a more active role as triggers activated by particular queries. If these queries represent events observed by the system, then actions can be triggered by the responders to these events. Just as before, the PAM can hold thousands of triggers and allow them to be continually created, modified, and deleted. This style of operation allows for the kind of interrupt-driven behavior popular in real-time control systems. Triggers can be extended to activate expressions based on particular combinations of events. This allows the PAM to support the firing of situation-action rules, as found in production-system-style languages.
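A software analogue of triggers and of rules over combinations of events might look as follows. This is a sketch under stated assumptions: `TriggerStore`, `Rule`, the `?` wildcard, and the event expressions are hypothetical, and the article does not describe how the PAM itself tracks combinations of events.

```python
WILDCARD = "?"  # hypothetical marker for a variable that matches any one symbol


def matches(pattern, event):
    """True when the event agrees with the pattern, symbol by symbol."""
    p, e = pattern.split(), event.split()
    return len(p) == len(e) and all(a == b or a == WILDCARD for a, b in zip(p, e))


class TriggerStore:
    """Stored patterns with attached actions; events play the role of queries."""

    def __init__(self):
        self.triggers = []  # (pattern, action) pairs

    def add(self, pattern, action):
        self.triggers.append((pattern, action))

    def post(self, event):
        """Run the action of every trigger that responds to the event."""
        return [action(event) for pattern, action in self.triggers
                if matches(pattern, event)]


class Rule:
    """Situation-action rule that fires once all of its conditions have been seen."""

    def __init__(self, conditions, action):
        self.conditions = list(conditions)
        self.seen = set()
        self.action = action

    def observe(self, event):
        for index, pattern in enumerate(self.conditions):
            if matches(pattern, event):
                self.seen.add(index)
        if len(self.seen) == len(self.conditions):
            self.seen.clear()
            return self.action()
        return None


triggers = TriggerStore()
triggers.add("@obstacle ( at ? ? )", lambda event: "replan around " + event)
print(triggers.post("@obstacle ( at 4 9 )"))

rule = Rule(["@robot ( at 3 9 )", "@door ( open )"], lambda: "enter")
print(rule.observe("@robot ( at 3 9 )"))   # None: one condition still unmet
print(rule.observe("@door ( open )"))      # enter
```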

Finally, blackboard systems [7] are a popular software architecture for applications involving monitoring and control in complex environments. The blackboard is a central knowledge base shared by a number of knowledge sources. It serves to schedule and control them, as well as provide a context for their actions. Since all of these actions involve associative access, commonly to transitory data, the PAM is well suited for use as a blackboard accelerator.

Conclusions. This article argues for the applicability of associative hardware to the problems of handling runtime data structures in interactive applications, having pointed out some shortcomings of previous hardware designs and software approaches. The PAM design has the advantages of directly supporting the complexities of expressions and their pattern matching, high memory density, and ease of implementation through the use of conventional memory technology.

The architecture enjoys all the bandwidth and scalability advantages of a logic-in-memory SIMD organization. The computational bandwidth on even the small prototype chip is in excess of 10 Gbits per second. The system can be scaled more or less arbitrarily: more blocks to a chip (through larger die and tighter design rules), more chips to a board (through better packaging), and more boards to a system. Using commercial design rules, a system with four boards, each containing 64 1-Mbit chips, would have a capacity of 32 Mbytes and an aggregate computational bandwidth of 2 x 10^[illegible] bits per second.
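The prototype bandwidth and the scaled capacity can be checked with a short calculation. The bandwidth line assumes that every match engine compares one 32-bit symbol per 200-nanosecond cycle, which the article does not state explicitly; the capacity line takes 1 Mbit as 2**20 bits.

```python
# Prototype chip figures quoted earlier in the article.
MATCH_ENGINES = 64
SYMBOL_BITS = 32
CYCLE_NS = 200

# Assumption: every match engine compares one 32-bit symbol per cycle.
chip_bits_per_second = MATCH_ENGINES * SYMBOL_BITS * 10**9 // CYCLE_NS
print(chip_bits_per_second)   # 10240000000, i.e. just over 10 Gbits per second

# Scaled system: four boards of 64 chips, taking 1 Mbit as 2**20 bits.
BOARDS, CHIPS_PER_BOARD, BITS_PER_CHIP = 4, 64, 2**20
capacity_bytes = BOARDS * CHIPS_PER_BOARD * BITS_PER_CHIP // 8
print(capacity_bytes // 2**20)   # 32 (Mbytes)
```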

References

1. E. Charniak, C.K. Riesbeck, and D.V. McDermott, Artificial Intelligence Programming, Lawrence Erlbaum Assoc., Hillsdale, N.J., 1980, pp. 121-176.

2. C.L. Forgy, "RETE: A Fast Algorithm for the Many-Pattern/Many-Object Pattern-Match Problem," Artificial Intelligence, Vol. 19, No. 1, Aug. 1982, pp. 17-37.

3. D.H.D. Warren, "Implementing Prolog," Tech. Report 39, Edinburgh Univ., 1977.

4. S.S. Yau and H.S. Fung, "Associative Processor Architecture - A Survey," Computing Surveys, Vol. 9, No. 1, Mar. 1977, pp. 3-28.

5. P. Kogge et al., "VLSI and Rule-Based Systems," VLSI for Artificial Intelligence, J.G. Delgado-Frias and W.R. Moore, eds., Kluwer Academic, Boston, 1989, pp. 95-108.

6. M. Hirata et al., "A Versatile Data String-Search VLSI," J. Solid-State Circuits, Vol. SC-23, No. 2, Apr. 1988, pp. 329-335.

7. B. Hayes-Roth, "A Blackboard Architecture for Control," J. Artificial Intelligence, Vol. 26, No. 3, July 1985, pp. 251-321.

Ian N. Robinson is a member of the technical staff at the Hewlett-Packard Laboratories, Palo Alto, California. His research interests include VLSI design, parallel computer architecture, and artificial intelligence.

Robinson received a BSc degree in physics with an emphasis on electronics from the University of Sussex, England, in 1979. He is a member of the IEEE Computer Society.


