Date post: 28-Mar-2015
Page 1: Searching very large bodies of data using a transparent peer-to-peer proxy Mike Taylor and Marc Cromme, Index Data mike@indexdata.com marc@indexdata.dk.

Searching very large bodies of data using a transparent peer-to-peer proxy

Mike Taylor and Marc Cromme, Index Data

mike@indexdata.com, marc@indexdata.dk

Albertosaurus sarcophagus skull modified from Carr 1999. (Not relevant to the talk, but pretty.)

Overview

Where we're headed in the next half-hour:

The problem
Standardised semantically rich search-and-retrieve protocols
    ANSI/NISO Z39.50
    SRU
Transparent protocol proxies
Fan-out proxies, singly and in combination
Peer-to-peer proxies
Operation of the peer-to-peer proxy network
Using the peer-to-peer proxy in Alvis
Conclusions

Transparent peer-to-peer proxy Mike Taylor, Index Data <mike@indexdata.com>

The Problem

The key advantage of the Internet is distribution
– That's why there is so much information out there.
The key problem of the Internet is aggregation
– That's why it's so darned hard to find anything!

How can we get at all that tasty data?

Monolithic systems can only get us so far.
Even Google – with its huge index – is limited by its inability to probe into the “deep web”. It is limited to dumb screen-scraping.

We propose a solution made up of many autonomous nodes.
We will approach this in several steps.

Step 1: standardised search-and-retrieve protocols

[Diagram, built up over four slides: a Z39.50 client talks Z39.50 to the Library of Congress Z39.50 server, then also to the British Library and local catalogue Z39.50 servers; finally a metasearching Z39.50 client searches all three servers.]

This is possible because of the semantic alignment of the servers.

So can Z39.50 save the world?

No.

Then the serpent saith unto Adam, “Lo, why doth thine information service not use XML?” And Adam saith, “Verily, Z39.50 worketh just fine.” But the serpent, who was subtle of tongue, saith unto him, “But XML is more fashionable.” And, behold, Adam was deceived, and did fall.

– The Book of Standards, ch. 3, v. 4-6.

Welcome to the 21st Century

[Diagram: the same metasearching Z39.50 client and its three Z39.50 servers, overlaid first with “Everything must be XML” and then with “Resistance is useless!”]

XML-based search-and-retrieve protocols

The binary Z39.50 protocol is superseded by SRU (Search/Retrieve by URL). This is a NISO-registered standard for expressing queries using rich URLs, to obtain XML responses that contain records matching the query.

http://sru.miketaylor.org.uk/sru.pl?version=1.1&operation=searchRetrieve&query=dinosaur&startRecord=1&maximumRecords=1&recordSchema=dc
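
A URL like the one above can be assembled programmatically. The following is a minimal Python sketch using only the standard library; the helper name `sru_search_url` and its default values are assumptions made for illustration, while the parameter names are the ones visible in the URL on the slide.

```python
from urllib.parse import urlencode


def sru_search_url(base, query, start=1, maximum=1, schema="dc", version="1.1"):
    """Build an SRU searchRetrieve URL (hypothetical helper).

    `base` is the server's SRU endpoint; `query` is a CQL string.
    """
    params = [
        ("version", version),
        ("operation", "searchRetrieve"),
        ("query", query),
        ("startRecord", start),
        ("maximumRecords", maximum),
        ("recordSchema", schema),
    ]
    # urlencode percent-escapes the CQL, so spaces, '=' and quotes survive.
    return base + "?" + urlencode(params)


url = sru_search_url("http://sru.miketaylor.org.uk/sru.pl", "dinosaur")
print(url)
```

With the arguments shown, the printed URL should be the same as the one on the slide. More elaborate CQL queries are escaped by `urlencode`, so the caller can pass them as plain strings.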

An SRU response (single DC record)

<?xml version="1.0"?>
<zs:searchRetrieveResponse xmlns:zs='http://www.loc.gov/zing/srw/'>
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>29</zs:numberOfRecords>
  <zs:records>
    <zs:record>
      <zs:recordSchema>info:srw/schema/1/dc-v1.1</zs:recordSchema>
      <zs:recordPacking>xml</zs:recordPacking>
      <zs:recordPosition>1</zs:recordPosition>
      <zs:recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema" xmlns="http://purl.org/dc/elements/1.1/">
          <title>Fossils</title>
          <creator>Lappi, Megan.</creator>
          <type>text</type>
          <publisher>New York, NY: Weigl Publishers</publisher>
          <date>2005</date>
          <language>en</language>
          <description>Studying fossils -- Fossil facts -- Gone forever -- A fossil is born -- From bone to stone -- Insects in amber -- Dinosaur footprints</description>
          <identifier>http://www.loc.gov/catdir/toc/ecip0415/2004004136.html</identifier>
          <identifier>URN:ISBN:1590362136</identifier>
        </srw_dc:dc>
      </zs:recordData>
    </zs:record>
  </zs:records>
</zs:searchRetrieveResponse>
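
A client has to pull the hit count and the record fields out of a response like this. Here is a minimal Python sketch using the standard library's ElementTree; the function name `parse_sru`, the choice of three fields, and the abbreviated `SAMPLE` document are assumptions for illustration, while the namespace URIs are copied from the response on the slide.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the response on the slide.
NS = {
    "zs": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abbreviated copy of the slide's response (some DC fields left out).
SAMPLE = """<?xml version="1.0"?>
<zs:searchRetrieveResponse xmlns:zs='http://www.loc.gov/zing/srw/'>
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>29</zs:numberOfRecords>
  <zs:records>
    <zs:record>
      <zs:recordSchema>info:srw/schema/1/dc-v1.1</zs:recordSchema>
      <zs:recordPacking>xml</zs:recordPacking>
      <zs:recordPosition>1</zs:recordPosition>
      <zs:recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema"
                   xmlns="http://purl.org/dc/elements/1.1/">
          <title>Fossils</title>
          <creator>Lappi, Megan.</creator>
          <date>2005</date>
        </srw_dc:dc>
      </zs:recordData>
    </zs:record>
  </zs:records>
</zs:searchRetrieveResponse>"""


def parse_sru(xml_text):
    """Return (total hit count, list of {title, creator, date} dicts)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("zs:numberOfRecords", namespaces=NS))
    records = []
    for data in root.iterfind("zs:records/zs:record/zs:recordData", NS):
        records.append({
            field: data.findtext(".//dc:" + field, namespaces=NS)
            for field in ("title", "creator", "date")
        })
    return total, records


total, records = parse_sru(SAMPLE)
print(total, records[0]["title"])
```

Note that `numberOfRecords` is the size of the whole result set (29 here), not the number of records in this particular response, which is governed by `maximumRecords` in the request.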

So we can go back to doing what we did before

[Diagram: the Z39.50 metasearching picture again, then with every component relabelled: a metasearching SRU client talks SRU to the Library of Congress, British Library and local catalogue SRU servers.]

SRU gives us the same semantic alignment as Z39.50.

SRU's query language: CQL

CQL (Common Query Language) is used by SRU.
It may also be used in other contexts (including Z39.50).

Its syntax is easy to learn, but very expressive.

dinosaur
title=dinosaur
title=(dinosaur or pterosaur) and author=martill
dc.title=*saur and dc.author=martill
title exact "the complete dinosaur" and date < 2000
name=/phonetic "smith"
fish prox/distance<3/unit=sentence frog

2. Transparent protocol proxies

Just as Squid acts as a proxy for the dumb HTTP protocol, so we can have proxies for semantically rich search-and-retrieve protocols.

YAZ Proxy is one such – http://indexdata.com/yazproxy

Because the protocol is rich, the proxy can do more than Squid:
Performance improvements:
    Cache and re-use initialised sessions
    Cache and re-use search results
    Cache and re-use fetched records
Server protection:
    Query sanitisation (for broken servers ... you know who you are)
    Client throttling, based on request frequency or bandwidth
Protocol-level and application-level logging.
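
To make the caching and logging ideas concrete, here is a toy Python sketch. It is not how YAZ Proxy is implemented: the class `CachingSearchProxy`, the `backend` callable that stands in for a real Z39.50 or SRU connection, and the whitespace-only query normalisation are all assumptions made for illustration.

```python
class CachingSearchProxy:
    """Sits between a client and a backend search function.

    `backend` is any callable taking a query string and returning a list
    of records; it stands in for a real Z39.50 or SRU connection.
    """

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}   # normalised query -> result list
        self.log = []     # application-level log of (query, "hit"/"miss")

    def search(self, query):
        # Trivial sanitisation: collapse runs of whitespace.
        key = " ".join(query.split())
        if key in self.cache:
            self.log.append((key, "hit"))
        else:
            self.log.append((key, "miss"))
            self.cache[key] = self.backend(key)
        return self.cache[key]


calls = []


def slow_backend(query):
    """Hypothetical backend that records how often it is really asked."""
    calls.append(query)
    return ["record about " + query]


proxy = CachingSearchProxy(slow_backend)
proxy.search("dinosaur")
proxy.search("  dinosaur ")   # same query after normalisation: cache hit
print(len(calls), proxy.log)
```

The client calls `search` exactly as it would call the backend, which is the sense in which the proxy is transparent. A real proxy would also need cache expiry and per-client throttling, which this sketch leaves out.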

[Diagram: an SRU client talks SRU to an SRU proxy, which in turn talks SRU to the Library of Congress SRU server.]

3. Fan-out proxies – free metasearching for simple clients

[Diagram: an SRU client talks SRU to a metasearching SRU proxy, which talks SRU to the Library of Congress, British Library and local catalogue SRU servers.]

Callout at the client: The client knows nothing about what is happening to its innocent requests.

Callout at the proxy: All the metasearching intelligence goes here.
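
The fan-out step itself can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the talk's implementation: `fan_out` is a hypothetical name, each backend is modelled as a plain callable rather than a live SRU connection, and the merge is a simple concatenation with no de-duplication or ranking.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(query, backends):
    """Send `query` to every backend in parallel and merge the results.

    `backends` maps a target name to a callable(query) -> list of records.
    A failing backend contributes no records rather than failing the search.
    """
    def ask(item):
        name, search = item
        try:
            return [(name, record) for record in search(query)]
        except Exception:
            return []

    merged = []
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        # pool.map yields results in the order the backends were listed.
        for hits in pool.map(ask, backends.items()):
            merged.extend(hits)
    return merged


def broken(query):
    """Hypothetical backend that is currently down."""
    raise RuntimeError("server down")


backends = {
    "loc": lambda q: [q + " (LoC)"],
    "bl": lambda q: [q + " (BL)"],
    "local": broken,
}
print(fan_out("dinosaur", backends))
```

The client sees a single result list and never learns how many servers were consulted, or that one of them failed.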

3. Cascading fan-out proxies

[Diagram: a Client, three cascaded fan-out proxies (Proxy 1, Proxy 2, Proxy 3) and six servers (Server 1 to Server 6).]

Hey, go nuts

[Diagram: a much larger tree of proxies and servers beneath a single client, etc., etc., etc. ...]

Why this doesn't actually work

Scaling problems!

Every proxy must be administered:
    Information about searched resources kept up to date
    Proxies must be kept running – a single failure knocks out a whole subtree

Load on servers:
    Every server is visited by every query

What happens when a proxy calls another proxy higher up the tree?
    Loop detection is difficult in protocols such as Z39.50 and SRU.

4. The peer-to-peer proxy

[Diagram: an SRU client talks SRU to a big cloud of peers, acting as a proxy; the cloud talks SRU to “some appropriate SRU server”, “another appropriate SRU server” and a “really cool SRU server”.]

What's going on here?

Life is simple at the edges of the cloud:
    SRU clients connect to peers that act as SRU servers
    SRU servers respond to requests from peers that act as SRU clients

This means that off-the-shelf SRU clients and servers can be used.
    Web-based SRU clients can be redeployed
    Servers such as the Library of Congress catalogue are available
    You can use our free Z39.50/SRU-enabled XML database, Zebra: http://indexdata.com/zebra

Although the cloud has its own structure, it is opaque to the clients and servers at the edge.

Life is a little more complex within the cloud:
    Peers must communicate between themselves using a P2P protocol
    Peers associated with a server must also make SRU requests
    Peers associated with a client must also handle SRU requests

Each peer may act for a client, a server, both, or neither:
    “Client” peers are the entry-points into the P2P cloud
    “Server” peers actually get the job of searching done
    “Servent” peers behave as both clients and servers
    Some peers may participate in the network for routing purposes only.

Clearly the “servent” peer is the general case that all the others specialise.
This only needs to be written once.

Why this rocks

The separation between the edges and the cloud is very important.

SRU clients and servers are easy to write
There will be lots of them out there:
    Clients providing many different user interfaces
    Servers providing access to many different collections

“Servent” peers are difficult to write
    But that's OK, because only one such peer need ever be built
    Many instances make up a single peer-to-peer proxy cloud
    The cloud can be used by many clients, and can use many servers

Client and server writers don't have to think about the hard stuff.

5. Operation of the peer-to-peer proxy network

What goes on inside the mysterious cloud?

There is a dedicated peer-to-peer protocol used to:
    Introduce a new peer to the network
    Welcome a new peer with an initial list of neighbours
    Pass queries in a chain between peers
    Return search results back along the chain

We won't cover the protocol in detail here, but:
    Many P2P protocols have “Ping” and “Pong” messages for introductions
    In ours, we have different messages:
        New peers cry “Cathy!” when they nuzzle up to the network
        Existing peers respond with a cry of “Heathcliff!”

Everybody needs good neighbours

Key principle: the network contains NO global information.
(So there is no single point of failure.)

Each peer knows only about a few “nearby” peers – its “neighbours”
Old neighbours are dropped from the pool if they don't prove useful
New neighbours are discovered in search responses:
    Peer A forwards a query to its neighbour, Peer B
    Peer B can't answer it, so it forwards it to its neighbour, Peer X
    Peer X responds with useful information
    Peer A accepts this response, and remembers Peer X for next time.

The “usefulness” of peers may be judged relative to specific subject areas rather than with a single absolute score.

[Diagram, built up over six slides: an SRU client sends an SRU request to Peer A. Peer A asks Peer B, “Can you help me with this?” Peer B replies, “No, but I'll ask my friend,” and passes the query to Peer X, which talks SRU to the Dinosaur Data SRU server. Peer X answers, “Here you go, pal,” and Peer A notes, “(I'll remember that.)”]

Why this rocks, part II

The peer-to-peer network has the following desirable properties:
    joinable – easy for new peers to join the network
    adaptive – the system evolves to improve through time
    autonomous – each peer can have its own strategies and policies
    robust – can cope seamlessly with holes appearing in the network
    efficient – generates minimal network traffic
    tunable – has parameters that we can play with
    “ecologically diverse” – different peers may be tuned differently

Most of these properties are related to the key issue:
NO GLOBAL KNOWLEDGE!

Life history of a query

Queries can't be allowed to wander around the network forever.
    Each query begins with a certain lifespan.
    Each peer that accepts a query decrements its lifespan by one.
    The remaining lifespan travels with the query to other peers.

If a query is passed to multiple peers, the lifespan is divided between them:
    It might be divided equally between two relevant neighbours
    It might be split between many neighbours
    It might be allocated to a single promising neighbour

When a query's lifespan has expired, the peer may not propagate it further.
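
The lifespan rules can be written down as a small Python function. This is one possible reading of the slide, not the protocol's definition: the name `split_lifespan`, the use of positive integer weights, and the rule for handing out units lost to rounding are all assumptions.

```python
def split_lifespan(lifespan, weights):
    """Divide a query's remaining lifespan between neighbours.

    `lifespan` is the value that arrived with the query; accepting the
    query costs one unit. `weights` maps neighbour name -> positive
    integer share. Returns {neighbour: lifespan to send}, omitting zeros.
    """
    remaining = lifespan - 1          # this peer's own decrement
    if remaining <= 0 or not weights:
        return {}                     # expired: do not propagate
    total = sum(weights.values())
    shares = {n: remaining * w // total for n, w in weights.items()}
    # Hand units lost to integer rounding to the heaviest neighbours.
    leftover = remaining - sum(shares.values())
    for n in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        shares[n] += 1
    return {n: s for n, s in shares.items() if s > 0}


print(split_lifespan(9, {"B": 1, "C": 1}))
print(split_lifespan(9, {"B": 1, "C": 1, "D": 1, "E": 1, "F": 1}))
print(split_lifespan(9, {"B": 1}))
print(split_lifespan(1, {"B": 1}))
```

Because the shares always add up to the remaining lifespan, the total number of peers a query can visit is bounded by its initial lifespan, however the individual peers choose to divide it.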

Ecological diversity

Different peers will have different strategies for propagating queries.
    Some will tend to fan out in a broad but shallow pattern
    Some will tend to pass almost all lifespan to a single neighbour.
    Some will behave differently depending on the query

We hope that diversity of peer strategies will help make the network robust.

A query may carry with it a hint about how it likes its lifetime to be spent.
    “Tracer bullet” queries have a long, thin trajectory, then fan out.
    This is a useful way to periodically probe remote parts of the network, in order to discover new and relevant neighbours.
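
The different propagation strategies can be expressed as interchangeable functions. The Python sketch below is illustrative only: the function names, the `scores` dictionary of per-neighbour usefulness, and the `fan_out_below` threshold are assumptions, not parameters of the real protocol.

```python
def broad(remaining, scores):
    """Fan out: share the remaining lifespan as evenly as possible."""
    if not scores:
        return {}
    names = sorted(scores, key=scores.get, reverse=True)
    base, extra = divmod(remaining, len(names))
    # The most useful neighbours receive the units left over by division.
    plan = {n: base + (1 if i < extra else 0) for i, n in enumerate(names)}
    return {n: s for n, s in plan.items() if s > 0}


def deep(remaining, scores):
    """Pass everything to the single most promising neighbour."""
    if not scores or remaining <= 0:
        return {}
    return {max(scores, key=scores.get): remaining}


def tracer_bullet(remaining, scores, fan_out_below=4):
    """Travel long and thin, then fan out once little lifespan is left."""
    if remaining >= fan_out_below:
        return deep(remaining, scores)
    return broad(remaining, scores)


scores = {"B": 0.9, "C": 0.5, "D": 0.1}   # hypothetical usefulness scores
print(broad(7, scores))
print(deep(7, scores))
print(tracer_bullet(7, scores))
print(tracer_bullet(3, scores))
```

Since every strategy has the same signature, each peer can pick its own, or switch according to a hint carried by the query, without its neighbours needing to know.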

6. Using the peer-to-peer proxy in Alvis

Alvis is an ongoing European collaborative project to build what the proposal document calls a “Superpeer semantic search engine”.

Named after the dwarf Alvis (“all-wise”) from Norse mythology, who answered Thor's questions all night ... (And then turned to stone when the sun rose.)

The Alvis “superpeers” are what we just call “peers” in this presentation.

This is because Alvis also has another whole layer of peers. These implement a distributed hash table (DHT) of individual keys to find a suitable entry-point to the superpeer network.

Testing will show us how much this optimisation buys us.

7. Conclusions

Standardised search-and-retrieve protocols facilitate interoperability.
A well-defined protocol can be proxied.
Proxies may transparently perform many different services.
They may perform metasearching (“fan-out proxy”).
Metasearching proxies may be cascaded.
In practice, such cascades are hard to maintain, and scale poorly.
Instead, an entire cloud of peers may function as a proxy.
Queries are routed through the cloud to reach the most appropriate servers.
Neither clients nor servers need know anything at all about the proxy.
Tunable parameters allow us to tweak performance.
The European project Alvis is built on such a peer-to-peer proxy.
We want to see this kind of network running “in the wild” with many nodes.

Thanks for listening!

Mike Taylor and Marc Cromme, Index Data
mike@indexdata.com, marc@indexdata.dk

Albertosaurus sarcophagus skull modified from Carr 1999. (We should all take the time to look at more dinosaurs.)