- 1. Thesis presentation Yakham NDIAYE November, 13 the2001
Interoperability of a Scalable Distributed Data Manager with an
Object-relational DBMS
2.
- Develop techniques for the interoperability of a DBMS with an
external SDDS file.
- Examine various architectural issues, making such a coupling
the most efficient.
- Validate our technical choices by the prototyping and the
experimental performances analysis.
- Our approach is at the crossing the main memory DBMS, the
object-relational-DBMS with the foreign functions, and the
distributed/parallel DBMS.
Objective 3.
- Coupling SDDS and AMOS-II
Plan 4. Multicomputers
- A collection of loosely coupled computers
-
- Computers inter-connected by high-speed local area
networks.
-
- offers potentially storage and processing capabilities rivaling
a supercomputer at a fraction of the cost.
- New architectural concepts
-
- offer to applications the cumulated CPU and storage
capabilities of a large number of inter-connected computers.
5.
- New data structures specifically for Multicomputers
-
-
- parallel scans & function shipping
- Overflowing servers split into new servers
-
- - appended to the file without informing the clients
- Queries come from multiple autonomous clients
-
- -Not using any centralized directory for access
computations
- See for more : http://ceria.dauphine.fr
SDDS 6.
- AMOS-II: A ctiveM ediatingO bjectS ystem
- Amain memory database system .
- Declarative query language :AMOSQL .
- External data sources capability.
- External program interfaces AMOS-II using :
-
- - Call-level interface (call-in)
-
- - Foreign functions (call-out)
- See the AMOS-II page for more:
- http://www.dis.uu.se/~udbl/
AMOS-II DBMS 7.
- IBM object-relational DBMS
- Typical representative of a commercial relational-object DBMS
.
- Capabilities to handle external data through the user-defined
functions (UDF) .
DB2 Universal Database 8. Coupling Strategies
- - for a scalable RAM file supporting database queries
- - Use aDBMS for manipulations best handled through by the query
language;
- - Direct fast data access for manipulations not supported well,
or at all, by a DBMS;
- - Distributed queries processing with functions shipping .
9. AMOS-SDDS System AMOS-SDDS scalable parallel query processing
10. Coupling Strategies
- - UsesAMOS-IIas the memory manager at each SDDS storage
site;
- - Scalable generalization of a parallel DBMS ;
- - D ata partitioning becomes dynamic .
11. SD-AMOS System SD- AMOS scalable parallel query processing
12. Couplage SDDS & DB2
- - C oupling of a DBMS with an external data repository with
direct fast data access .
- - Use of a SDDS file by a DBMS like an external data
repository.
- - Offer to the user an interface more elaborate than that of
SDDS manager, in particular by his query language.
13. Coupling SDDS & DB2 DB2-SDDSOverall Architecture
Register a user-defined external table function: CREATE FUNCTION
scan(Varchar(20)) RETURNS TABLE (ssn integer, name Varchar(20),
city Varchar(20)) EXTERNAL NAME interface !fullscan' 14. Coupling
SDDS & DB2Foreign functions to access SDDS records from DB2 :
range (cleMin, cleMax) -> liste enregistrements dont cleMin <
cl < cleMax scan( nom_fichier ) -> liste de tous les
enregistrements du fichierSample queries : -Parallel scan All SDDS
records. select * from table( scan(fichier) )as table_sdds(SSN,
NAME,CITY) -Range query SDDS records where key between 1 and 100.
select * from table( range(1, 100) )as table_sdds(SSN, NAME,CITY)
order by Name 15.
- Six Pentium III 700 MHz with 256 MB of RAM running Windows
2000
- On a 100Mbit/s Ethernet network.
- One site is used as Client and the five other as Servers
- We run many servers at the same machine(up to 3 per machine)
.
- File scaled from 1 to 15 servers .
The Hardware 16.
-
- Table Person (SS#, Name, City) .
-
- Size 20,000 to 300,000 tuples of 25 bytes .
- Benchmark query : couples of persons in the same city
-
- Query 1,the file resides at a single AMOS-II.
-
- Query 2,the file resides atAMOS-SDDS.
-
- Join evaluation : Two strategies.
- Processing time of aggregate functions
Benchmark queries 17. Server Query Processing
-
- Data stay external to AMOS
-
- Custom foreign functions perform the query
-
- Data are dynamically imported into AMOS-II
-
-
- Possibly with the local index creation
-
-
- Deleted after the processing
18. Speed-up Elapsed time of Query 2 according to the
strategyfor a file of 20,000 records, distributed over1 to 5
servers. I-Strategy for Query 2: elapsed timeE-Strategy for Query
2: elapsed time Elapsed time per tuple of Query 2 according to the
strategy Server nodes 1 2 3 4 5 Elapsed time(s) 1,344 681 468 358
288 Time per tuple (ms) 67.2 34 23.4 17.9 14.4 Serveur nodes 1 2 3
4 5 Nested-loop(s) 128 78 64 55 48 Index lookup(s) 60 39 37 36 32
19.
- The results showed an important advantage
ofI-StrategyonE-Strategyfor the evaluation of the join query.
- For 5 servers, the rate is 6 times for the nested loop, and 9
times if an index is creates.
- The favorable result makes us study the scale-up
characteristics of AMOS-SDDS on a file that scales up to 300,000
tuples.
Discussion 20. Scaling the number of servers Elapsed time of
join queries to AMOS-SDDS Q1=AMOS-SDDSjoin;Q2=AMOS-SDDSjoin with
count. Time per tuple(extrapolated for AMOS-SDDS) File size 20,000
60,000 100,000 160,000 200,000 240,000 300,000 # SDDS servers 1 3 5
8 10 12 15 Q1(ms) 3.05 5.02 6.84 11.36 12.77 16.25 18.55 Q2(ms)
2.55 3.08 3.35 6.16 6.39 8.43 8.75 Q1w. extrap. (ms) 3.05 5.02 6.84
8.28 9.6 10.64 12.72 Q2w. extrap.(ms) 2.55 3.08 3.35 3.11 3.2 2.84
2.94 AMOS-II(ms) 2.30 7.17 12.01 19.41 24.12 2 9.08 36.44 21.
Scaling the number of servers
- Results are extrapolated to 1 server per machine.
- -Basically, the CPU component of the elapsed time is divided by
3
- The extrapolation of the processing time of the join query
withcountshows a linear scalability of the system.
- Processing time per tuple remainsconstant (2.94ms) when the
file size and the number of servers increase by the same
factor.
Expected time per tuple of join queries to AMOS-SDDS 22.
Aggregate Functioncount Elapsed time of aggregate function
CountElapsed times for AMOS-II = 280ms Elapsed time of aggregate
functionsCountunder AMOS-SDDS Elapsed time over 100,000-tuple file
on AMOS-SDDS # servers 1 2 3 4 5 E-Stratgie (ms) 10 10 10 10 10
I-Stratgie (ms) 1,462 761 511 440 341 23. Aggregate Functionmax
Elapsed time of aggregate function MaxElapsed times for AMOS-II =
471ms Elapsed time over 100,000-tuple file on AMOS-SDDS Elapsed
time of aggregate functionsMaxunder AMOS-SDDS #servers 1 2 3 4 5
I-Stratgie (ms) 420 210 140 110 90 I-Stratgie (ms) 1,663 831 561
491 390 24.
- Contrary to the join query, the external strategy is gaining
for the evaluation of aggregate functions.
- Forcountfunction,improvement is about 34 times .
- Formaxfunction,improvement is about 4 times .
- Due to the importation cost and to a SDDS property : the
current number of records is a parameter of a bucket.
- LinearSpeed-up: processing time decreases with the number of
servers.
- The use of the external functions can thus be very advantageous
for certain kind of operations.
Discussion 25. SD-AMOS performance measurements Creation time of
3,000,000 records file. The bucket size is 750,000 records of 100
bytes Global and moving average insertion time of a record 26.
SD-AMOS performance measurements Elapsed time of range queryAverage
time per tuple 27.
- The average insertion time of a record with the splits is of
0.15ms .
- The average access time to a record on a distributed file is of
0.12ms .
- -Itis 100 times faster than that with a traditional file on
disc .
- Linear scalability: The insertion time and the access timeper
tuple remains constant when the file size and the number of servers
increase .
Discussion 28. DB2-SDDS performance measurements Elapsed time of
range query Time per tuple (i) access time to the data in a DB2
table, (ii) access time to SDDS file from the DB2 external
functions (DB2-SDDS) and (iii) direct access time to SDDS file from
a SDDS client. 29.
- Access time to SDDS file is much faster than the access time to
a DB2 table: 0.02ms versus 0.07ms.
- Access time to external data from DB2 (0.08ms), is less fast
than the access to the internal data (0.07ms) .
-
- -fastdirect access to the data
-
- -through the DBMS, access by the query language
Discussion 30.
- We have coupled a SDDS manager with a main-memory DBMS AMOS-II
and DB2to improve the current technologies for high-performance
databases and for the coupling with external data
repositories.
- The experiments we have reported in the Thesis prove the
efficiency of the system.
- AMOS-SDDS et DB2-SDDS :use of a SDDS file by a DBMS and the
parallel query processing on the server sites .
- SD-AMOS : appears as a scalable generalisation of a
parallelmain-memoryDBMS where the data partitioning becomes
automatic.
Conclusion 31.
- Other types of DBMS queries.
- Client's scalable distributed query decomposer.
- challenging appears the design of a scalable distributed
queryoptimizerhandling the dynamic data partitioning .
Future Work 32. End Thank You for Your Attention CERIA Universit
Paris IX Dauphine[email_address]