
www.exegy.com

Exploiting Reconfigurability for Text Search

Roger D. Chamberlain, Mark A. Franklin, and Ron S. Indeck
Exegy Inc.


Exegy TextMiner

Highly Optimized Data Pipeline from Input thru Output

Specialized Processing in Close Proximity to Data

1-7 TB fast RAID; RAM / FPGA contiguous


Specialized Processing on Custom Board

FPGA accelerated custom board

• Permits massively parallel operations

• Offloads work from CPU

• Integrates with other system components enabling high-speed data ingress and egress

• Designed with common APIs to give user control of functionality

• Draws from a library of pre-defined modules used to perform certain operations

• New functional modules readily incorporated

Analogous to graphic accelerator cards


Exegy A2000 Appliance

[Figure: block diagram of the appliance. Labels: processor, disk controller, disk, data to processor, configuration subsystem, reconfigurable logic, network.]


TextMiner Application

• Searching through an unindexed text corpus for items of interest

• Example query:
  (Cardinals NEAR[200] Baseball) AND (Manchester NEAR[200] Soccer)
  “Cardinals” within 200 characters of “Baseball” and “Manchester” within 200 characters of “Soccer”

• Supported combining operators include:
  Boolean: AND, OR, NOT
  Proximity: NEAR, ANDTHEN

Benefits of reconfiguration

[Figure, built up over four slides. Labels in order of appearance: formulate initial query; analyze query results; formulate revised query; analyze query results.]


Query Options

Exact Search
• Literal keywords must match exactly
• Tens of thousands of keywords searched in one pass across the data

Approximate Search
• Wildcard characters
• Case insensitivity
• Character substitution up to specified bound

Regular Expression Search
• Full expressive power of finite-state machine recognizer


Exact Match Engine

Startup
• Hash keywords to a bit vector in FPGA
• Rabin-Karp hash functions

Run
• Stream text corpus from disk or network to FPGA
• Hash text to bit vector position
• Check position for keyword hit

Check
• False positives from hash collisions checked in software
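The startup, run, and check steps above can be sketched in software. This is a minimal Python illustration, not the FPGA design from the talk: it assumes all keywords have the same length, uses a single Rabin-Karp rolling hash, and every name and constant (`BASE`, `MODULUS`, `VECTOR_BITS`, `rk_hash`, `build_bit_vector`, `scan`) is hypothetical.

```python
BASE = 257                # polynomial base for the rolling hash (assumed value)
MODULUS = (1 << 31) - 1   # prime modulus (assumed value)
VECTOR_BITS = 1 << 16     # size of the bit vector (assumed value)

def rk_hash(data):
    """Rabin-Karp polynomial hash of a bytes object."""
    h = 0
    for byte in data:
        h = (h * BASE + byte) % MODULUS
    return h

def build_bit_vector(keywords):
    """Startup: set one bit per keyword hash."""
    bits = bytearray(VECTOR_BITS // 8)
    for kw in keywords:
        pos = rk_hash(kw) % VECTOR_BITS
        bits[pos >> 3] |= 1 << (pos & 7)
    return bits

def scan(text, keywords):
    """Run and check: stream text, test the bit vector, verify candidates."""
    if not keywords:
        return []
    length = len(next(iter(keywords)))
    if any(len(kw) != length for kw in keywords):
        raise ValueError("this sketch assumes equal-length keywords")
    hits = []
    if len(text) < length:
        return hits
    bits = build_bit_vector(keywords)
    top = pow(BASE, length - 1, MODULUS)   # weight of the byte leaving the window
    h = rk_hash(text[:length])
    for i in range(len(text) - length + 1):
        pos = h % VECTOR_BITS
        if bits[pos >> 3] & (1 << (pos & 7)):   # candidate, possibly a false positive
            window = text[i:i + length]
            if window in keywords:              # exact check done in software
                hits.append((i, window))
        if i + length < len(text):              # roll the hash forward by one byte
            h = ((h - text[i] * top) * BASE + text[i + length]) % MODULUS
    return hits
```

For example, `scan(b"the cardinals play baseball", {b"cardinal", b"baseball"})` reports a hit for each keyword together with its byte offset. A set bit only signals a possible keyword, which is why the final membership test is still needed.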


Approximate Match Engine

• Data shift register receives inbound text

• Compared with keyword at character level

• Count of matching characters is checked with threshold

• If character matches exceed threshold, keyword is a match

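[Figure: approximate match datapath. Labels: input data, data shift register, compare register, fine-grain comparison, count (4), word-level comparison (> threshold?), match signal. Example: “horse” and “house” agree in four of five character positions.]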
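The comparison described on this slide can be modeled in a few lines of Python. This is a software sketch under stated assumptions, not the hardware design: the function name `approximate_hits` is hypothetical, a `deque` stands in for the data shift register, and only character substitutions are handled (no insertions or deletions).

```python
from collections import deque

def approximate_hits(text, keyword, threshold):
    """Yield (offset, count) wherever more than `threshold` characters match."""
    window = deque(maxlen=len(keyword))   # models the data shift register
    for end, ch in enumerate(text):
        window.append(ch)                 # shift one inbound character in
        if len(window) < len(keyword):
            continue                      # register not yet full
        # Fine-grain comparison: one equality test per character position.
        count = sum(a == b for a, b in zip(window, keyword))
        # Word-level comparison: the count is checked against the threshold.
        if count > threshold:
            yield end - len(keyword) + 1, count
```

With the slide's example, `list(approximate_hits("a horse", "house", 3))` yields one hit at offset 2 with a count of 4; raising the threshold to 4 yields no hits.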

Regular Expression Engine

• Multi-character strings are combined into single symbol for finite state machine recognizer

• State dependent transitions are deferred to end of pipeline

[Figure, built up over three slides. Labels: regular expression compiler, input data, symbol encoding, indirection table, addr. logic, transition table, state selection logic, current state, match signal.]
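One way to picture these two ideas is a table-driven recognizer that consumes two characters per step. The Python sketch below is an illustration under assumptions, not the engine from the talk: the DFA is hand-built for the expression `ab*c`, the stride of two characters is arbitrary, and the way the indirection and transition tables are organized here is a guess at what the diagram labels mean. Note that the symbol lookup never uses the current state; only the final table access does.

```python
from itertools import product

# Hand-built DFA for "text contains a match of ab*c" (assumed example).
CLASSES = {"a": 0, "b": 1, "c": 2}   # every other character is class 3
OTHER = 3
STEP = [
    [1, 0, 0, 0],   # state 0: nothing matched yet
    [1, 1, 2, 0],   # state 1: saw 'a' followed by zero or more 'b'
    [2, 2, 2, 2],   # state 2: match found (sticky accept state)
]
ACCEPT = 2

def char_class(ch):
    return CLASSES.get(ch, OTHER)

def build_pair_tables():
    """Combine character pairs into symbols and deduplicate their columns."""
    columns = {}       # transition column -> symbol id
    indirection = {}   # (class1, class2) -> symbol id
    transition = []    # transition[symbol][state] -> next state
    for c1, c2 in product(range(4), repeat=2):
        col = tuple(STEP[STEP[s][c1]][c2] for s in range(len(STEP)))
        if col not in columns:
            columns[col] = len(transition)
            transition.append(col)
        indirection[(c1, c2)] = columns[col]
    return indirection, transition

def search_single(text):
    """Reference recognizer: one character per step."""
    state = 0
    for ch in text:
        state = STEP[state][char_class(ch)]
    return state == ACCEPT

def search_pairs(text, indirection, transition):
    """Two characters per step; state is consulted only in the last lookup."""
    if len(text) % 2:
        text += "\0"   # pad with a character of class OTHER
    state = 0
    for i in range(0, len(text), 2):
        symbol = indirection[(char_class(text[i]), char_class(text[i + 1]))]
        state = transition[symbol][state]
    return state == ACCEPT
```

Both recognizers should agree on every input. Because many character pairs share the same transition column, the deduplicated transition table has fewer entries than the 16 possible class pairs.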


Combining Operations

• Combining operations implemented in software

• Based on keyword hits from FPGA

[Figure: expression tree for (Cardinals NEAR Baseball) AND (Manchester NEAR Soccer). An AND node sits above two NEAR nodes, with leaves Cardinals, Baseball, Manchester, and Soccer.]
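The software side of this slide can be sketched as a small tree evaluator over keyword hit offsets. This is a hypothetical Python illustration: the tuple-based tree encoding, the function names, and the choice to measure NEAR between hit start offsets are all assumptions. ANDTHEN is omitted because the slides do not define its semantics.

```python
def near(hits_a, hits_b, distance):
    """True if some hit of A lies within `distance` characters of a hit of B."""
    return any(abs(a - b) <= distance for a in hits_a for b in hits_b)

def evaluate(node, hits):
    """Evaluate a query tree against {keyword: [offsets]} reported by the engine."""
    op = node[0]
    if op == "TERM":
        return bool(hits.get(node[1]))
    if op == "NEAR":
        _, left, right, distance = node
        return near(hits.get(left, []), hits.get(right, []), distance)
    if op == "AND":
        return evaluate(node[1], hits) and evaluate(node[2], hits)
    if op == "OR":
        return evaluate(node[1], hits) or evaluate(node[2], hits)
    if op == "NOT":
        return not evaluate(node[1], hits)
    raise ValueError(f"unknown operator: {op}")

# The example query from the talk, with the 200-character bound.
QUERY = ("AND",
         ("NEAR", "Cardinals", "Baseball", 200),
         ("NEAR", "Manchester", "Soccer", 200))
```

Given hits such as `{"Cardinals": [120], "Baseball": [250], "Manchester": [900], "Soccer": [1000]}`, `evaluate(QUERY, hits)` is true; moving the only Soccer hit to offset 1500 makes it false.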


Summary of 3 Hardware Search Engines

• Searching for individual terms, combining operations performed in software

• Three distinct engines supported:
  Exact match: thousands of terms, 800 MB/s search rate
  Approximate match: can trade off # of terms vs. characters per term, 800 MB/s search rate
  Regular expression search: capable of ~50 expressions, 400 MB/s search rate

• Data source(s) can be local or remote

Managing FPGA Configuration

[Figure: configuration management diagram. Labels: Configuration Files, User Data, Application, Supervisor CPLD, Configuration Store, Directory, FPGA(s), Results.]

Supervisor Functions

Manage Configuration Store
• On-board non-volatile storage for configurations
• Supports multiple configuration files

Manage Directory
• Meta-data for configurations in configuration store

Load FPGA as instructed
• Reconfigure FPGA from specified config file currently in configuration store

Protection
• Block data path during reconfiguration
• Check configuration is appropriate for that FPGA

Software Options

Insert
• Place a configuration in the on-board store

Read Directory
• Query current directory contents

Read Configuration
• Primarily for verification and debugging purposes

Load
• Reconfigure FPGA

[Figure, repeated on each of these slides: Configuration Files, User Data, Application, Supervisor CPLD, Configuration Store, Directory, FPGA(s), Results.]
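The four software options and the supervisor's protection checks can be modeled as a small class. This is a purely hypothetical Python sketch of the behavior the slides describe: the class, method names, and metadata fields are invented for illustration and do not reflect the actual Exegy interface.

```python
class Supervisor:
    """Toy model of the supervisor; all names and fields are hypothetical."""

    def __init__(self, fpga_type):
        self.fpga_type = fpga_type
        self.store = {}              # configuration store: name -> bitstream
        self.directory = {}          # directory: name -> metadata
        self.loaded = None           # name of the active configuration
        self.data_path_open = True

    def insert(self, name, bitstream, target_fpga):
        """Insert: place a configuration in the on-board store."""
        self.store[name] = bytes(bitstream)
        self.directory[name] = {"target_fpga": target_fpga,
                                "size": len(bitstream)}

    def read_directory(self):
        """Read Directory: query current directory contents."""
        return {name: dict(meta) for name, meta in self.directory.items()}

    def read_configuration(self, name):
        """Read Configuration: mainly for verification and debugging."""
        return self.store[name]

    def load(self, name):
        """Load: reconfigure the FPGA from a stored configuration."""
        meta = self.directory[name]
        if meta["target_fpga"] != self.fpga_type:
            raise ValueError(f"{name} is not built for {self.fpga_type}")
        self.data_path_open = False      # block data path during reconfiguration
        try:
            self.loaded = name           # stand-in for programming the device
        finally:
            self.data_path_open = True
```

A caller would `insert` one configuration per search engine ahead of time, then `load` the one a query needs; a configuration built for a different device is rejected before the data path is touched.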


Back to Text Search Application

Sequence of events:
1. User provides query and initiates search
2. Examining search terms, software selects appropriate engine
3. Load configuration in FPGA, concurrently queue up data from source
4. Load search terms into engine
5. Stream data through engine
6. Process hits that return, performing combining operations
7. Return results to user
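Steps 2 through 5 can be sketched as host-side control flow. This Python sketch is an illustration under assumptions: the term syntax used to pick an engine (`/.../` for a regular expression, `?` as a wildcard) is invented, and `load_configuration`, `read_chunks`, and `search_chunk` are hypothetical callables standing in for the hardware and I/O layers.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_term(term):
    """Guess the engine a term needs (the syntax rules are assumptions)."""
    if len(term) >= 2 and term[0] == "/" and term[-1] == "/":
        return "regex"
    if "?" in term:
        return "approximate"
    return "exact"

def select_engine(terms):
    """Step 2: pick one engine for the whole query."""
    kinds = {classify_term(t) for t in terms}
    if len(kinds) != 1:
        raise ValueError("heterogeneous query: needs multiple engines or passes")
    return kinds.pop()

def run_search(terms, load_configuration, read_chunks, search_chunk):
    engine = select_engine(terms)
    # Step 3: configure the FPGA while the first data is being queued up.
    with ThreadPoolExecutor(max_workers=2) as pool:
        config_done = pool.submit(load_configuration, engine)
        data_ready = pool.submit(lambda: list(read_chunks()))
        config_done.result()
        chunks = data_ready.result()
    # Steps 4 and 5: hand the terms to the engine and stream the data through.
    hits = []
    for chunk in chunks:
        hits.extend(search_chunk(engine, terms, chunk))
    return engine, hits
```

Overlapping the two submissions is the point of the sketch: the configuration load and the initial read proceed at the same time, so the slower of the two sets the startup delay rather than their sum.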


Comments

Benefits
• Application software chooses appropriate FPGA engine
• Engine is tailored to problem at hand

Concerns
• Heterogeneous query
  Requires multiple engines or multiple data passes
• Configuration overhead
  20 ms is longer than we would like
  However, it’s not out of line with startup times required for disk access
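To put the 20 ms figure in perspective, the short calculation below combines it with the 800 MB/s rate quoted for the exact and approximate engines. It is a back-of-the-envelope Python sketch; the function names are hypothetical, and the 1 TB corpus used in the example is an assumption taken from the low end of the 1-7 TB storage range mentioned earlier.

```python
CONFIG_TIME_S = 0.020    # configuration overhead quoted on the slide
RATE_MB_PER_S = 800      # search rate quoted for exact and approximate match

def data_equivalent_mb(config_time_s=CONFIG_TIME_S, rate=RATE_MB_PER_S):
    """Megabytes that could have been searched during one reconfiguration."""
    return config_time_s * rate

def overhead_fraction(corpus_mb, config_time_s=CONFIG_TIME_S, rate=RATE_MB_PER_S):
    """Share of total time spent reconfiguring if it is not overlapped with reads."""
    search_time_s = corpus_mb / rate
    return config_time_s / (config_time_s + search_time_s)
```

One reconfiguration costs about as much time as streaming 16 MB. For a full pass over a 1 TB corpus (about 1250 s at this rate), `overhead_fraction(1_000_000)` is a tiny fraction of the run, so the overhead matters mainly for short searches or frequent engine switches.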


Summary

• Exegy A2000 appliance supports dynamic reconfiguration of application FPGAs

• Exegy TextMiner application exploits dynamic reconfiguration for text search

• 3 distinct search engines: exact, approximate, and regular expression

• FPGA configuration is concurrent with initial data reads to mask latency

• Result is a true exploitation of the physical ability to reconfigure FPGAs on the fly

