
Tetrahedron Computer Methodology Vol. 1, No. 3, pp. 207 to 217, 1988 0898-5529/89 $3.00 + .00 Printed in Great Britain. © 1989 Pergamon Press plc

Transputer Implementations of Chemical Substructure Searching Algorithms

Geoffrey M. Downs,* Michael F. Lynch,a Peter Willett,a,* Gordon A. Manson,b George A. Wilsonb

aDepartment of Information Studies, bDepartment of Computer Science, Sheffield University, Sheffield S10 2TN, UK

Received 28 November 1988, Accepted 16 January 1989

Key words: Substructure search; Parallel processing; Relaxation algorithm; Set reduction algorithm

Abstract: Two chemical substructure searching algorithms, the relaxation algorithm and the set reduction algorithm, are introduced and described. Transputer based serial implementations of both are compared for performance; the relaxation algorithm is shown to be both more effective and more efficient. Strategies are discussed for multi-transputer implementations of the relaxation algorithm. Experimental results show that near-linear speedups are obtained with networks containing up to 21 transputers.

1. CHEMICAL SUBSTRUCTURE SEARCHING

Given a chemical substructure query (a query) and a database of chemical structures (structures), chemical substructure searching attempts to find the particular pattern of atoms and bonds present in the query in each of the structures in the database. Chemical structures can be represented as topological graphs, where the nodes of the graph represent atoms and the edges represent bonds; the problem of substructure search then becomes that of subgraph isomorphism. Substructure searching ultimately involves a computationally intensive, exhaustive and exact atom-by-atom backtracking search,1,2 where individual query atoms are mapped onto the atoms of the database structure; possible matches are then extended by further assignments until either all of the query atoms have been matched, in which case the query substructure is present and the molecule is retrieved, or a mismatch is determined, in which case the search backtracks to enable alternative assignments to be made.
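To make the atom-by-atom stage concrete, the fragment below is a minimal Python sketch of a recursive backtracking subgraph search over adjacency-list graphs. It is written purely for illustration (bond types are ignored and all names are invented); it is not the search used in any of the systems discussed here.

# Minimal sketch of an atom-by-atom backtracking subgraph search.
# Graphs are dicts: atom id -> (element symbol, set of neighbouring atom ids).
# Illustrative only; bond types are ignored.

def subgraph_present(query, structure):
    """Return True if every query atom can be mapped onto a distinct
    structure atom so that atom types and adjacencies are preserved."""
    q_atoms = list(query)

    def extend(mapping):
        if len(mapping) == len(q_atoms):          # every query atom matched
            return True
        q = q_atoms[len(mapping)]                 # next query atom to assign
        q_elem, q_nbrs = query[q]
        for s, (s_elem, s_nbrs) in structure.items():
            if s in mapping.values() or s_elem != q_elem:
                continue
            # every already-mapped query neighbour must map to a structure neighbour
            if all(mapping[n] in s_nbrs for n in q_nbrs if n in mapping):
                mapping[q] = s
                if extend(mapping):
                    return True
                del mapping[q]                    # mismatch: backtrack
        return False

    return extend({})

# Example: an O-C-O query fragment found inside an acetic-acid-like graph.
query = {0: ('O', {1}), 1: ('C', {0, 2}), 2: ('O', {1})}
acetic_acid = {0: ('C', {1}), 1: ('C', {0, 2, 3}), 2: ('O', {1}), 3: ('O', {1})}
print(subgraph_present(query, acetic_acid))      # True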

The computational demands of substructure searching have led to considerable interest in software and hardware techniques that allow interactive substructure searching to be carried out on large files.

Software approaches. Software approaches involve the development of computationally simple screening searches, where the requirement is that the great majority of structures that cannot possibly match the query are efficiently eliminated, thus reducing the number of candidates that must be processed by the final intensive search. A trivial (but important) example of a screen is that the number of query atoms must not exceed the number of structure atoms. Examples of this approach are the set reduction3,4 and relaxation5-7 screening algorithms, and such techniques are widely used in operational substructure searching systems.1,2 In general, these types of screen search will terminate on one of three conditions: (1) the structure fails to pass the screen and is not processed further; (2) a specific match is identified and reported; (3) no further refinement is possible with the search, and further detailed processing by the backtracking search is required. In this case, the tentative assignments of query atoms to structure atoms already identified may be used as initial data for the backtracking algorithm, thus avoiding much of the search tree.


Other approaches involve the identification of properties, e.g., atom or bond types, or fragment types, that minimize the number of structure atom to query atom correspondences that must be considered during backtracking searches.
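A minimal sketch of the screening idea described above is given here, assuming structures have been reduced to lists of element symbols; the two screens and all names below are illustrative and are not those of any operational system.

from collections import Counter

# Illustrative screens only: a structure that fails any screen cannot contain
# the query, so it is discarded before the expensive backtracking search.

def passes_screens(query_elements, structure_elements):
    """query_elements / structure_elements are lists of element symbols."""
    # Screen 1: the query cannot have more atoms than the structure.
    if len(query_elements) > len(structure_elements):
        return False
    # Screen 2: for every element, the structure must contain at least as
    # many atoms of that element as the query does.
    q_counts, s_counts = Counter(query_elements), Counter(structure_elements)
    return all(s_counts[el] >= n for el, n in q_counts.items())

# Only structures passing the screens go on to the graph-matching stage.
candidates = [s for s in [['C', 'O', 'O'], ['C', 'C']] if passes_screens(['C', 'O'], s)]
print(candidates)   # [['C', 'O', 'O']]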

Hardware approaches. Hardware approaches involve the use of computer systems exhibiting some degree of parallelism. An operational example of large-scale parallelism is provided by the CAS ONLINE system.8 The actual matching operations, i.e., screen search and atom-by-atom search, are carried out in a conventional manner using conventional processors; however, the search file is partitioned into discrete subsets, each simultaneously being searched by a separate processor, so that the parallelism here is at the database level. A finer degree of parallelism is evident in work by Wipke and Rogers,9 which considers the simulation of a star network of microprocessors being used for atom-by-atom searching. A basis for parallelism is provided by allocating each possible assignment of a query atom to a separate processor, and such assignments spawn further processes as the mapping is extended. Their work suggests that a multiprocessor system could lead to substantial increases in the speed of substructure searching, and similar conclusions were arrived at by Gillet et al.7 in a simulation of a distributed implementation of the relaxation algorithm. Although these results are promising, neither of the investigations implemented an operational, microprocessor-based, multiprocessing system for chemical substructure searching. In this paper we discuss such a system; some preliminary results have been reported by Brint et al.10 and Lynch et al.11

The relaxation algorithm is described in Section 2, and the set reduction algorithm in Section 3. Both have been implemented as screening algorithms; the implementation of the set reduction algorithm can also be extended to include an exhaustive backtracking search, but this is not discussed in this paper. Section 4 presents comparative timings of serial implementations of both algorithms, using fifteen different data sets. Section 5 discusses a distributed implementation of the relaxation algorithm using a common multi-processor construct, the processor farm, and the paper closes with a brief summary of our main findings.

2. RELAXATION ALGORITHM

Relaxation algorithms were originally developed for numerical applications, but have been applied more recently to chemical substructure searching.5-7 In general, such algorithms consist of applying initial estimate values to some parameters of interest, and then iteratively refining those values until one of three termination points is encountered: (1) a solution to the problem has been found; (2) some property of the values is detected that proves no solution can be found; (3) further iterations produce no improvement in the values, and hence the solution remains imprecise.

In the chemical structure context, the parameters are the atoms of the structure; the initial values of those parameters are sets of possible corresponding query atoms (the labelset of each structure atom), estimated according to some properties of the query and structure atoms; and refining the values consists of eliminating some of the members of each labelset, according to some properties of the first order neighbors of the atoms. The properties used to generate the initial labelsets are: atom type, atom degree (the number of connected neighbor atoms), and whether the atom is connected in a ring of atoms or not.

The conditions satisfying the termination points described above are: (1) each labelset contains only one member, each of which is different; (2) at least one query atom is not contained in the labelmap (the set of all labelsets), i.e., it cannot correspond to any of the structure atoms; (3) some labelsets still contain more than one member. This may be due to two conditions: (a) the algorithm is simply not powerful enough to identify the substructure, or (b) a number of isomorphic solutions are possible, wherein the structure atom may be assigned to more than one of the members of the labelset.

The iterative refinement stage examines and removes atoms from the labelsets by examining the connectivities and properties of the neighbors of the tentative assignments identified in the labelsets. If any tentative assignment fails any of the tests, then it cannot be a valid assignment and is eliminated from the labelset. A detailed description of the algorithm is presented by von Scholley.6

A simple pseudo code description of the algorithm can be found in Fig. 1. The procedures of this description perform the following functions:


create.label.sets
REPEAT
  REPEAT
    check.neighbors (any.eliminations)
    check.degrees (any.eliminations)
  UNTIL NOT any.eliminations
  prune.structure.atoms
  check.all.query.atoms.active (mismatched)
  IF NOT mismatched
    check.assignments (any.eliminations)
    IF NOT any.eliminations
      matched := TRUE
UNTIL matched OR mismatched

Fig. 1. Relaxation Algorithm in Pseudo-code.

check.neighbors - for each tentative assignment: check whether further tentative assignments exist for all the first order neighbors; if not, eliminate the assignment.

check.degrees - for each tentative assignment: check whether the degree of the query atom is greater than that of the structure atom; if so, eliminate the assignment.

prune.structure.atoms - remove any structure atoms identified as not possibly mapping to any of the query atoms.

check.all.query.atoms.active - fail if any query atom is not present in the labelmap.

check.assignments - identify any examples of unique tentative assignments.
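To make the flow of Fig. 1 concrete, the following is a minimal Python sketch of the same label-set refinement idea. It is an illustrative simplification (only atom type, degree and the neighbor test are used; ring membership and the pruning step are omitted), not the occam implementation described in this paper, and all function and variable names are invented.

# Simplified relaxation screen.  Graphs are dicts: atom id -> (element, set of neighbours).
# Illustrative sketch of the label-set refinement idea only.

def relaxation_screen(query, structure):
    """Return 'mismatch', 'match' or 'unresolved' for a query/structure pair."""
    deg = lambda g, a: len(g[a][1])

    # Initial labelsets: structure atom -> set of query atoms it might match
    # (same element, and structure degree at least the query degree).
    labelmap = {s: {q for q in query
                    if query[q][0] == structure[s][0]
                    and deg(structure, s) >= deg(query, q)}
                for s in structure}

    changed = True
    while changed:                        # iterate until no more eliminations
        changed = False
        for s, labels in labelmap.items():
            for q in list(labels):
                # check.neighbors: every neighbour of q must still be a
                # tentative assignment of some neighbour of s.
                ok = all(any(qn in labelmap[sn] for sn in structure[s][1])
                         for qn in query[q][1])
                if not ok:
                    labels.discard(q)
                    changed = True

    assigned = set().union(*labelmap.values()) if labelmap else set()
    if any(q not in assigned for q in query):
        return 'mismatch'                 # some query atom has no candidate left
    if all(len(labels) <= 1 for labels in labelmap.values()):
        return 'match'                    # unique assignments found
    return 'unresolved'                   # would be passed to the backtracking search

# Toy example: a C-O query screened against a small O-C-O structure.
query = {0: ('O', {1}), 1: ('C', {0})}
structure = {0: ('C', {1, 2}), 1: ('O', {0}), 2: ('O', {0})}
print(relaxation_screen(query, structure))   # 'match'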

3. SET REDUCTION ALGORITHM

Two main phases can be identified in the set reduction algorithm: a generation phase and a reduction phase.

The generation phase generates a list of pairs of sets {Qi, Si}; each Qi contains all of the atoms of the query that exhibit some property Pi, and each Si contains all of the structure atoms that exhibit some property Ri, where the properties Pi and Ri are chosen such that each member of Qi corresponds to some member(s) of Si. For complete structure matching, the properties Pi, Ri are the same, whereas for substructure searching this is not always true; for example, if Dq, Ds are the degrees of the query and structure atom, then the properties P, R are (Dq = n), (Ds >= n).

The reduction phase forms a new, generally smaller, list of sets from some intersection of the original sets. The new sets exhibit properties that are combinations of the old properties; i.e., the new properties are more discriminatory than the old properties and hence the sets {Q, S} would be expected to contain fewer members each.

The algorithm begins by generating an initial list of sets using the same properties as for the relaxation algorithm, i.e., atom type, degree, etc. There is then repetition of a reduction phase followed by a generation phase, again using the same connectivity property as in the relaxation algorithm; from this we see that the set reduction algorithm is another instance of a relaxation algorithm, and the same three termination points are available. The conditions to satisfy these terminations are: (1) every query atom can be found as the only member of some set Qi, where Si also contains only one member (the members of all sets S are implicitly all different, from the nature of the reduction phase); (2) if a pair Qi, Si containing n, m members is found, and n > m, then there are (n - m) members of Qi that cannot correspond to any structure atoms: a cardinality violation has occurred; (3) no further different sets are generated by the connectivity property, and the solution remains unknown.


generate.initial.sets
REPEAT
  partition (matched, cardinality)
  IF NOT (matched OR cardinality)
    add.connectivity (matched, cardinality, none.generated)
    IF none.generated
      matched := TRUE
UNTIL matched OR cardinality

Fig. 2. Set Reduction Algorithm in Pseudo-code.

As mentioned in Section 1, a backtracking phase may be entered here to identify the specific matches, but the implementation discussed here simply terminates, so as to allow a direct comparison of the efficiency of screening with that resulting from the relaxation algorithm. Detailed discussions of the algorithm are presented by Sussenguth3 and by Figueras.4

A simple pseudo-code description of the algorithm can be found in Fig. 2. The procedures of this description perform the following functions:

partition - for each query atom qk, generate the sets {Q', S'} comprising the intersection of all sets {Qi, Si} (where qk is a member of Qi) with all sets {NOT Qj, NOT Sj} (where qk is not a member of Qj, and Qj, Sj have the same number of members). Check each new set for cardinality violation, and for specific assignment. Set flag matched if specific assignments have been identified for all query atoms. Finally, the new list of generated sets replaces the old list of sets.

add.connectivity - for each pair {Qi, Si}, add new sets {Q', S'} comprising the first order neighbors of the members of {Qi, Si}. Check each new set as for partition procedure.
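The generation and cardinality-checking ideas can be sketched in the same style as before. The fragment below implements only the initial generation (by element and degree, using the (Dq = n), (Ds >= n) properties mentioned above) and a cardinality check; it is an illustration of the idea, not the Sussenguth/Figueras algorithm as implemented here, and the names are invented.

# Illustrative sketch of set reduction: generate paired sets {Q_i, S_i} from
# shared properties, then check cardinalities.  Graphs are dicts:
# atom id -> (element, set of neighbours).

def generate_initial_sets(query, structure):
    """Return a list of (Q_i, S_i) pairs keyed by (element, minimum degree)."""
    pairs = {}
    for q, (elem, nbrs) in query.items():
        key = (elem, len(nbrs))                       # property P_i: element, degree = n
        pairs.setdefault(key, (set(), set()))[0].add(q)
    for (elem, n), (Q, S) in pairs.items():
        for s, (s_elem, s_nbrs) in structure.items():
            if s_elem == elem and len(s_nbrs) >= n:   # property R_i: element, degree >= n
                S.add(s)
    return list(pairs.values())

def cardinality_violation(pairs):
    """True if some Q_i has more members than its S_i, so no match is possible."""
    return any(len(Q) > len(S) for Q, S in pairs)

# Example: a query carbon of degree 3 cannot be matched in a two-atom C-C structure.
query = {0: ('C', {1, 2, 3}), 1: ('H', {0}), 2: ('H', {0}), 3: ('H', {0})}
structure = {0: ('C', {1}), 1: ('C', {0})}
print(cardinality_violation(generate_initial_sets(query, structure)))   # True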

4. TIMING OF SERIAL IMPLEMENTATIONS

The two algorithms were implemented as serial algorithms on a single transputer family processor,12,13 using a Research Machines Nimbus PC with local hard disc as a host terminal and file support system. The only transputer processor available at the time equipment was purchased for this project was the T4 revision A; the transputer family has now been extended to include new and much higher performance devices. Generic family features, and specific device features, are discussed further in the Appendix to this paper. Specific features of the T4 revision A device used are: four 10 Mbit/s communication links enabling communication rates up to approximately 800 Kbytes/s, a 12.5 MHz processor clock, an internal hardware scheduler which supports simulated concurrency, and 2 Kbytes of fast internal memory. The implementations described in this section make no use of the transputer's special features for concurrent operation (i.e., the links and simulated concurrency); Section 5 discusses distributed implementations that do make use of these features.

Each algorithm was written as a procedure in occam14 (the primary language available for programming networks of transputers) and supported by the same program harness. The harness read the data files into memory buffers (to avoid timing variations due to disk accesses), prepared query/structure pairs for submission to the searching procedures, and performed timings on the searching procedures. The two programs were then run against test files characteristic of those that would be encountered in a conventional substructure searching system. Specifically, 14 typical queries were extracted from the chemical structure handling literature and then screened against the Fine Chemicals Directory (FCD), a file of ca. 50,000 commercially available fine chemicals. Algorithmically defined atom-centered and bond-centered fragment screens, analogous to those used in systems such as CAS ONLINE or MACCS,1,2 were used in the searches.


Table 1. Comparison of Serial Relaxation to Serial Set Reduction Algorithm.


                          Relaxation Algorithm            Set Reduction Algorithm
File    No. of       No. of    Elapsed    Average     No. of    Elapsed    Average
        searches     matches   (secs)     (ms)        matches   (secs)     (ms)

1       194          183        47.41     244         185        53.33     275
2       112          112        28.99     259         112        27.61     247
3        14           14         1.89     135          14         3.44     246
4        15           15         2.08     139          15         4.88     325
5        43           26        15.36     357          28        11.82     275
6        99           99        10.72     108          99        22.37     226
7        11           11         1.09      99          11         1.31     119
8       134          134         3.65      27         134         2.34      17
9        92           92         2.45      27          92         1.63      18
10       37           37         1.00      27          37         0.66      18
11      207          207         7.09      34         207         4.10      20
12       33           33         1.00      30          33         3.25      98
13       63           63         1.12      18          63         2.77      44
14       74           74         1.37      19          74         5.52      75
99i      99           99        42.43     429          99        48.72     492
99f    9801          105       182.65      19         200       671.03      68

For each query, the molecules passing the screen search were written to a data file for subsequent processing by the two graph matching algorithms; these files contained between 11 and 207 structures each. An additional file of 99 structures was also used in this work; these structures had been used previously by Gillet et al.,7 and the file was used in two different ways: (1) in the identity search each query was searched against itself only, i.e., 99 searches were performed; (2) in the full search each query was searched against the whole file, i.e., 99*99 (9801) searches were performed.

Table 1 reports timings generated from these data; in the table the identity search is identified 99i, and the full search 99f. The average search time reported is simply the elapsed time divided by the number of searches performed.

Inspection of these results shows that the elapsed time of the relaxation search is generally smaller than that of the set reduction search, and that the relaxation algorithm is more discriminating (in that there are cases where it eliminates slightly more structures than the set reduction algorithm). Finally, we note that the relaxation algorithm requires a known maximum data space according to the maximum number of atoms supported for any single structure, whereas the set reduction algorithm requires a variable data space according to the complexity of any individual search. We hence conclude that the relaxation algorithm is more cost effective as a screening search. This finding forms the basis for the work reported in the remainder of this paper, which considers the parallel implementation of the relaxation algorithm on a network of transputers.

5. IMPLEMENTATION OF THE RELAXATION ALGORITHM ON TRANSPUTER-BASED PROCESSOR FARMS

Many mechanisms are available to introduce parallelism into any general application, including general distributed implementation, datastream pipelining and datastream partitioning (processor farms). Of these, the processor farm has been identified as a convenient, simple and extensible (in the number of processors used) mechanism to support the distributed execution of a wide range of problems in which the following characteristics can be identified: (a) the same operation can be carried out concurrently on a number of distinct data items; (b) for any given set of data items introduced into the farm, the order of the set of returning results is not important; (c) the number of data items is greater (preferably much greater) than the size (i.e., number of nodes) of the farm.

PROC root
  read (from.hitfile, query)
  IN PARALLEL
    FOR each structure in hitfile
      read (from.hitfile, structure)
      write (to.farm, query, structure)
    FOR number of structures in hitfile
      read (from.farm, query.id, structure.id, status)
      IF status = matched
        write (to.user, query.id, "matched with ", structure.id)

PROC farm
  WHILE TRUE    -- (i.e., loop forever)
    read (from.root, query, structure)
    relaxation.search (query, structure, status)
    write (to.root, query.id, structure.id, status)

Fig. 3. Database Parallelism Using Processor Farm.

Notice that condition (b) does not prohibit iterative algorithms such as those discussed here, where a complete set of results from one iteration is required for a subsequent iteration. Successive iterations are simply regarded as distinct calls to the processor farm, where each call does not proceed until the previous has finished.

The processor farm comprises two sections, a single root processor, and a number of node worker processors. The root runs the application process, sends data packets into, and collects result packets out of, the farm section, which contains the nodes of the farm. Each node contains routing processes to distribute data and result packets around the farm, and a work process to perform the function required by the farm. Notice that a process is some logical unit of an occam program, whereas a processor is a physical device that executes one or more processes of the program.

Two levels of parallelism are available when implementing the relaxation algorithm using the processor farm; we refer to these as database parallelism and algorithmic parallelism.

Database Parallelism. Database parallelism attempts to increase the overall database search speed by distributing the database across the nodes of the farm. The root application forms data packets comprising a query and a structure, and the nodes of the farm each contain the complete searching algorithm. Fig. 3 illustrates the operation.
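As a rough analogue of Fig. 3, the sketch below expresses database parallelism with a Python process pool standing in for the transputer farm; the search_one placeholder and the toy data are assumptions made for illustration and do not correspond to the occam harness used in this work.

# Sketch of database-level parallelism: one (query, structure) packet per task,
# with results collected in any order, as in Fig. 3.
from multiprocessing import Pool

def search_one(packet):
    query_id, structure_id, query, structure = packet
    matched = query <= structure       # placeholder test; a real system would run the relaxation screen
    return query_id, structure_id, matched

def database_search(query_id, query, hitfile):
    packets = [(query_id, sid, query, structure) for sid, structure in hitfile]
    with Pool() as farm:               # the pool plays the role of the farm nodes
        for qid, sid, matched in farm.imap_unordered(search_one, packets):
            if matched:
                print(qid, "matched with", sid)

if __name__ == "__main__":
    hitfile = [("s1", {"C", "O", "N"}), ("s2", {"C", "H"})]   # toy 'structures' as atom sets
    database_search("q1", {"C", "O"}, hitfile)                # prints: q1 matched with s1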

Algorithmic Parallelism. Algorithmic parallelism attempts to increase the speed of individual searches. The root application is the search algorithm, which iteratively farms off computationally intensive components for processing by the nodes of the farm. This type of parallelism has been investigated previously in the simulation studies of Gillet et al.7 Both the check.neighbors and check.degrees procedures are suitable for implementation on the processor farm, although check.degrees is computationally simple and so is retained as a serial procedure. Fig. 4 illustrates the node processes, and the revised check.neighbors procedure.


PROC check.neighbors
  IN PARALLEL
    FOR each structure.atom[j]
      write (to.farm, j, structure.atom[j], labelmap)
    FOR number of structure.atoms
      read (from.farm, labelmap[i])

PROC farm
  WHILE TRUE
    read (from.root, j, structure.atom, labelmap)
    FOR each tentative correspondence i,j in labelmap[j]
      correspondence.test (query.atom[i], structure.atom, labelmap)
    write (to.root, labelmap[j])

Fig. 4. Algorithmic Parallelism in Relaxation Algorithm.
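For comparison with the database-level sketch above, the following fragment expresses the algorithmic parallelism of Fig. 4 with a Python thread pool: each worker refines the labelset of one structure atom and the root gathers the results before the next iteration. The refine_labelset routine is a stand-in for correspondence.test, the data are invented, and this is an illustration rather than the authors' code.

# Sketch of algorithmic parallelism: one worker per structure atom's labelset.
from concurrent.futures import ThreadPoolExecutor

def refine_labelset(args):
    s_atom, labels, labelmap, structure, query = args
    keep = set()
    for q in labels:
        # keep q only if every query neighbour is still assignable to some
        # structure neighbour of s_atom (the check.neighbors test)
        if all(any(qn in labelmap[sn] for sn in structure[s_atom])
               for qn in query[q]):
            keep.add(q)
    return s_atom, keep

def check_neighbors_parallel(labelmap, structure, query):
    tasks = [(s, labelmap[s], labelmap, structure, query) for s in structure]
    with ThreadPoolExecutor() as farm:          # the pool plays the role of the farm
        return dict(farm.map(refine_labelset, tasks))

# Toy data: structure and query given as atom -> set of neighbour ids.
structure = {0: {1}, 1: {0}}
query = {0: {1}, 1: {0}}
labelmap = {0: {0, 1}, 1: {0, 1}}
print(check_neighbors_parallel(labelmap, structure, query))   # labelsets unchanged here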

Having discussed the two main types of parallelism that can be exploited by means of a farm, we now consider practical, transputer-based implementations of a farm. Clearly the limited number of links and their point-to-point connection constrains the number of topologies that may be implemented in practice. Possible options include ring, linear chain, binary and ternary tree, square array, and/or combinations of all the above.

Earlier experiments in this project investigated versions of the linear chain, ternary tree, and square array.10,11 Experience gained in these experiments has shown some general design principles that are useful to follow: (1) simple topologies are simple to program and debug, and require low data routing overheads; (2) concurrent operation of the links imposes very little overhead on the CPU (a maximum of only 10% with all links operating at full capacity), suggesting the use of as many links as possible to gain maximum communication bandwidth; (3) transmission of data in large packets is very much more efficient than in small packets. In general, techniques of blocking and un-blocking of data improve communications performance, but impose greater processing overheads.

In the light of these factors, the topology considered most suitable for further development and study was a simple (single) linear chain and its natural variants, the double and triple chain. Fig. 5 illustrates the logical and physical implementation of the triple chain. Here, the farm contains three identical chains of nodes (of any length), each node in any chain being identical to any other node in the farm. Each node contains two processes running concurrently; the Router process routes data and result messages, and is connected to the router processes of other nodes in the same chain; the Task process is connected to the router process on the same node, and communicates data/result messages with it. Note that many processes in the system comprise a number of smaller sub-processes; for example the Router process contains two sub-processes enabling concurrent manipulation of data and result packets. The Root processor is connected to the PC host, and hence to the terminal and local disk, via one link, and to the farm by the remaining three links. The root processor contains the Application process, and three Buffer processes to enable the three links to transmit data concurrently.

An important parameter in multi-processor systems is the speedup that can be obtained when more than one processor is used. If some computational task takes time T(p) to execute on a network of p processors, then the speedup S(p) is defined as S(p) = T(1) / T(p). Ideally, S(p) = p so that, for example, a network containing 10 transputers would process data at 10 times the rate of a single transputer; however, factors such as inter-processor communication or synchronization delays can result in substantially sub-linear behavior.
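As a simple check of this definition against the File 1 timings reported in Table 2 below:

# Speedup for File 1 on a 3-node farm, using the timings in Table 2:
# T(1) = 47.406 s (serial), T(3) = 15.919 s.
t1, t3 = 47.406, 15.919
print(round(t1 / t3, 2))   # 2.98, i.e. near-linear speedup on 3 nodes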


[Figure 5 (schematic): the host (RM Nimbus PC), with user terminal and link interface, is connected to the Root transputer, which in turn connects to the Farm of worker transputers; the key distinguishes processors, processes, and links/channels.]

Fig. 5. Logical and Physical Implementation of the Triple Chain Processor Farm.

Earlier experiments with database parallelism using a single chain implementation of the relaxation algorithm showed near-linear speedups for up to 6/7 node farms, and no further decrease in execution time for larger farms.11 For the larger farms, the granularity (the typical ratio of the size of a data packet to the processing required by that packet) of the application means that the communications capacity of the single root link is insufficient to provide the extra nodes with any computational work. The double/triple chains alleviate the problem by providing further links to the root, and hence (approximately) double/triple the number of processors that can be exploited. Table 2 reports results of an experiment using the triple chain, in which the speedup obtained in a database-level parallel implementation of the relaxation algorithm using 3 to 21 nodes is calculated for a selection of large data files. In all cases, speedup is comparable and near-linear, but becoming more sub-linear with larger farm sizes.

Both database and algorithmic level parallelism in the relaxation algorithm were implemented on the triple chain for maximum performance. Table 3 compares the execution times of both, and calculates the speedup obtained over the serial case.

The farm contained fifteen processors; other data recorded showed that in all cases the algorithmic level of parallelism made good use of only six of the processors, minor use of three others, and no use of the rest. In contrast, except for the cases where there were fewer than fifteen structures to be searched, all cases of the database level of parallelism were seen to make approximately even use of all available nodes.

These results reinforce the following guidelines. Algorithmic parallelism will always demand less overall use of the nodes in the processor farm for an iterative algorithm such as the relaxation algorithm, since the nodes of the farm are idle between iterations, but may produce speedups for individual searches. Database parallelism will always make better use of the farm and give better speedups for large database searches, but cannot provide any speedup for individual searches, and is thus less suited to very small databases (where the overall database search time is limited by the longest individual search). Finally, we conclude that the granularity obtained with algorithmic level parallelism in the relaxation algorithm, using the present set of data, is too fine to enable farm sizes greater than around nine nodes to be exploited.


Table 2. Speedups Obtainable with the Triple Chain Processor Farm.


             File 1               File 2               File 8               File 11
         Elapsed   Speedup   Elapsed   Speedup   Elapsed   Speedup   Elapsed   Speedup
         (secs)              (secs)              (secs)              (secs)

Serial   47.406              28.995               3.653               7.092

Nodes
   3     15.919    2.98       9.732    2.98       1.244    2.94       2.397    2.96
   6      8.146    5.82       4.984    5.82       0.642    5.69       1.222    5.80
   9      5.671    8.36       3.545    8.18       0.456    8.01       0.846    8.38
  12      4.297   11.03       2.681   10.81       0.355   10.29       0.665   10.66
  15      3.796   12.49       2.403   12.06       0.301   12.14       0.557   12.73
  18      3.085   15.37       2.094   13.84       0.256   14.27       0.485   14.62
  21      2.815   16.87       2.078   13.95       0.242   15.10       0.425   16.69

Table 3. Comparison of Serial to Distributed Relaxation Algorithm.

                    Serial       Database Parallel        Algorithm Parallel
File    Size        Elapsed      Elapsed    Speedup       Elapsed    Speedup
                    (secs)       (secs)                   (secs)

1       194          47.41         3.80      12.5          27.18      1.7
2       112          28.99         2.40      12.1          12.02      2.4
3        14           1.89         0.32       5.9           1.76      1.1
4        15           2.08         0.30       6.9           1.42      1.5
5        43          15.36         1.71       9.0           6.98      2.2
6        99          10.72         0.94      11.4           8.16      1.3
7        11           1.09         0.24       4.5           0.84      1.3
8       134           3.65         0.30      12.2           3.07      1.2
9        92           2.45         0.22      11.1           2.05      1.2
10       37           1.00         0.11       9.1           0.84      1.2
11      207           7.09         0.56      12.7           6.96      1.0
12       33           1.00         0.12       8.3           0.78      1.3
13       63           1.12         0.12       9.3           0.81      1.4
14       74           1.37         0.14       9.8           1.01      1.4
99i      99          42.43         4.56       9.3          34.19      1.2
99f      99         182.65        15.10      12.1         160.47      1.1

Coarser granularity, and hence better performance including the exploitation of larger farms, may be obtained with more computationally intensive searches such as may be found when processing generic or 3-D structures.15,16


CONCLUSIONS

In serial implementations, the relaxation algorithm seems to be a more cost-effective screening algorithm than the set reduction algorithm.

Distributed implementation of the relaxation algorithm has been shown to be effective, although the parallelism present in the algorithmic level distribution is of too fine a granularity in this context to enable significant speedups to be obtained. Thus, for substructure searches of the sort considered here, a database parallel approach seems the most effective way of utilizing the power of microprocessor-based multiprocessing systems.

ACKNOWLEDGEMENTS

We thank the British Library Research and Development Department for financial support of this work under grant number SI/G/757, and Fraser Williams (Scientific Systems) Ltd. for providing us with the Fine Chemicals Directory and with connection table software.

APPENDIX. THE TRANSPUTER AND FURTHER PROCESSOR FARM DESIGNS

The transputer is a new high-performance generic component family developed especially for multi-processor systems. Specific devices contain some selection of the following architectural components: a high-performance (typically 10 Mips) 16 or 32 bit RISC central processor, including hardware support for simulated concurrency; fast on-chip RAM (typically 4 Kbytes); some number (typically two or four) of concurrent serial links of 5/10/20 Mbit/s speed for inter-device communication within the transputer family; and further on-chip, concurrent, special function coprocessors (e.g., floating point processor, disc controller).

Most important of all these features, the links enable transputer devices to be used as building blocks in the construction of low cost, high performance multi-processing systems. The classic von Neumann bottleneck found in bus-based multiprocessor systems is not a problem in transputer-based systems, since communication, as well as processing, is distributed. Furthermore, even simple single processor applications can contain limited concurrency; while the CPU is processing some data, a link may be concurrently transferring the next data item from disc to memory buffer.

The current T4 processor contains the following improvements over the T4 revision A device used in this project: an extra 2 Kbytes of internal memory, faster and improved serial links (up to 1.7 Mbytes/s unidirectionally and 2.3 Mbytes/s bidirectionally at 20 Mbit/s link speed), and a higher maximum processor clock speed (typically 20 MHz). In particular, the improved ratio of link communications to processor speed means that the device is better able to process applications of higher granularity. In the context of processor farms this means: a more linear speedup should be observed at higher numbers of processors; and larger farms may be exploited before the communications bandwidth limit is once again reached.

Two other family devices of particular interest to this application are the T2 and M212 processors: The T2 processor is a 16 bit device available for approximately one quarter of the cost of the 32 bit T4.

The fast internal memory of 4 Kbytes is sufficient for simple farms with simple algorithms (e.g., relaxation algorithm on the linear chains), and the use of large numbers of processors without extra external memory enables production of very high performance, inexpensive farms on small numbers of circuit boards.

The M212 is a T2 processor with the addition of a concurrent disc controller, at the expense of two links. A system similar to, but much cheaper than, the CAS ONLINE system can be envisaged, in which the query is distributed to the farm from the root processor, and each node searches a partition of the database held on a local hard disc.


REFERENCES

1. Ash, J. E.; Chubb, P. A.; Ward, S. E.; Welford, S. M.; Willett, P. Communication, Storage and Retrieval of Chemical Information; Ellis Horwood: Chichester; 1985.

2. Willett, P. "A Review of Chemical Structure Retrieval Systems". J. Chemometrics 1987, 1, 139-155.

3. Sussenguth, E. H. "A Graph-Theoretic Algorithm for Matching Chemical Structures". J. Chem. Docum. 1965, 5, 36-43.

4. Figueras, J. "Substructure Search by Set Reduction". J. Chem. Docum. 1972, 12, 237-244.

5. Kitchen, L.; Krishnamurthy, E. V. "Fast, Parallel Relaxation Screening for Chemical Database Search". J. Chem. Inform. Comput. Sci. 1982, 22, 44-48.

6. von Scholley, A. "A Relaxation Algorithm for Generic Chemical Structure Screening". J. Chem. Inform. Comput. Sci. 1984, 24, 235-241.

7. Gillet, V. J.; Welford, S. M.; Lynch, M. F.; Willett, P.; Barnard, J. M.; Downs, G. M.; Manson, G. A.; Thompson, J. "Computer Storage and Retrieval of Generic Chemical Structures in Patents. Part 7. Parallel Simulation of a Relaxation Algorithm for Chemical Substructure Search". J. Chem. Inform. Comput. Sci. 1986, 26, 118-126.

8. Dittmar, P. G.; Farmer, N. A.; Fisanick, W.; Haines, R. C.; Mockus, J. "The CAS ONLINE Search System. I. General System Design and Selection, Generation, and Use of Search Screens". J. Chem. Inform. Comput. Sci. 1983, 23, 93-102.

9. Wipke, W. T.; Rogers, D. "Rapid Subgraph Search Using Parallelism". J. Chem. Inform. Comput. Sci. 1984, 24, 255-262.

10. Brint, A. T.; Gillet, V. J.; Lynch, M. F.; Willett, P.; Manson, G. A.; Wilson, G. A. "Chemical Graph Matching Using Transputer Networks". Parallel Computing (in press).

11. Lynch, M. F.; Manson, G. A.; Willett, P.; Wilson, G. A. The Application of Reconfigurable Microprocessors to Information Retrieval Problems; BLRDD: London; report no. 5941.

12. INMOS Ltd. Transputer Reference Manual; Prentice Hall International; 1988.

13. INMOS Ltd. Transputer Technical Notes; Prentice Hall International; 1988.

14. INMOS Ltd. Communicating Process Architecture; Prentice Hall International; 1988.

15. Barnard, J. M. Computer Handling of Generic Chemical Structures; Gower: Aldershot; 1984.

16. Brint, A. T.; Willett, P. "Pharmacophoric Pattern Matching in Files of 3-D Chemical Structures: Comparison of Geometric Searching Algorithms". J. Molec. Graphics 1987, 5, 49-56.

