© 2008 ITA Software, Inc.
Lisp at ITA Software Inc.
Using Common Lisp in a high performance search environment.
Martin Cracauer – ITA Software, Cambridge, MA
ITA – the ultimate cherry-pickers
We want all the features of dynamic programming languages ...
... but all of it has to be gone by runtime.
3 Confidential
Why we have to be fast – part 1
► Flights don't matter (much): Boston - Hamburg
Pricing insanity 1:
- Boston to Hamburg:
  - no direct flights
  - but 1,574 direct fares that you can fill with flights
  - going through Amsterdam and using two fares (BOS->AMS, AMS->HAM) gets you 1700 * 290 fares = 0.5 million fare combinations to consider, all of which you can fill with flights "freely"
  - going through London and Paris, using three fares: 2226 * 261 * 199 = 115 million fare combinations, all of which you can fill with flights "freely"
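The multiplication on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic: the fare counts are copied from the slide, while the function and variable names are made up for illustration.

```python
from math import prod

# Fare counts per market, copied from the slide.
VIA_AMSTERDAM = [1_700, 290]            # BOS->AMS, AMS->HAM
VIA_LONDON_PARIS = [2_226, 261, 199]    # three fares via London and Paris


def fare_combinations(fares_per_market):
    """Every fare in one market can be paired with every fare in the next,
    so the number of combinations is the product of the per-market counts."""
    return prod(fares_per_market)


two_fare_routes = fare_combinations(VIA_AMSTERDAM)
three_fare_routes = fare_combinations(VIA_LONDON_PARIS)

print(f"two fares:   {two_fare_routes:,}")    # roughly 0.5 million
print(f"three fares: {three_fare_routes:,}")  # roughly 115 million
```

Adding one more connection point multiplies the count by another market's fare list, which is why the numbers grow so quickly.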
Why we have to be fast – part 2
► Flights don't matter (much) – Easter Island to some tiny town at the tip of Denmark. You think they publish prices for that?
Easter Island:
- Island way out west of Chile
- Population: 3,791 humans, 887 big stone figures (they don't fly)
- Yesterday's flights: 1 in, 1 out [May 2006 data]
- Day before yesterday: 0 (zero) flights
- Prices directly published in database files: ==> 1432
- Prices after expanding various forms of autogenerated fares: ==> 160,000
- Prices published for travel from Easter Island to Aalborg, Denmark: ==> 57
- Of these 57 prices to Aalborg, actually used when flying from Easter Island to Aalborg: ==> 0 (zero)
That is because the cheapest routes always use price combinations through other cities. But we still have to look at those prices.
The slides with the raw number dump... one
Going from Boston to L.A.:
- reasonable ways (flight combinations) to get there: 10,000 (10^4)
- reasonable ways to get back: 10,000 (10^4)
- valid pricing solutions (as opposed to flights) for each of them (if you don't have complications): 10,000 - 40,000 (10^4)
==> 10^12 solutions
This is not like getting on the bus and paying at the door.
The slides with the raw number dump... two
What travel companies were doing before ITA (on mainframes):
- 10 ways out (1 per mille of the useful pool)
- 10 ways back (1 per mille of the useful pool)
- 10,000 pricing solutions
==> 10^6 solutions
What ITA is doing (on Linux PCs), simple case:
- 400 ways out
- 400 ways back
- variable number of pricing solutions
==> 10^9 solutions for “simple” search
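The orders of magnitude on the two raw-number slides follow from one product: ways out times ways back times pricing solutions per pair. The sketch below reproduces them. The function name is hypothetical, and the last figure is not from the talk: the slide only says "variable number", so the value computed here is simply what 10^9 would imply on average.

```python
def search_space(ways_out, ways_back, pricing_per_pair):
    """Size of the round-trip search space when every outbound option can be
    combined with every return option and every pricing solution."""
    return ways_out * ways_back * pricing_per_pair


# Boston to L.A., the full "reasonable" pool (low end of the pricing range).
full_pool = search_space(10_000, 10_000, 10_000)

# Pre-ITA mainframe systems: 10 of 10,000 ways each direction (1 per mille).
mainframe = search_space(10, 10, 10_000)

# ITA "simple" case: 400 x 400 flight pairs.
ita_pairs = 400 * 400
# Assumption: average pricing solutions per pair implied by the quoted 10^9.
implied_pricing_per_pair = 10**9 // ita_pairs

print(f"full pool:  10^{len(str(full_pool)) - 1}")
print(f"mainframe:  10^{len(str(mainframe)) - 1}")
print(f"ITA pairs:  {ita_pairs:,} (about {implied_pricing_per_pair:,} pricing solutions each)")
```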
(and this isn't looking at the more complex international pricing)
The slides with the raw number dump... three - when things are not that “simple”
Sometimes you just don't get away with slacking off with 10^9 solutions:
- Picking the worst of 1,000 random actual customer queries from ITA's website shows one complicated itinerary with 3 adults, 2 youth and 1 child: ==> 10^28 solutions. (We saved that family a whole lot of money!)
- Manually constructing the worst we could (in one minute on a Pentium-III Xeon with 1 GHz and 2 GB RAM at the time): ==> 10^31 solutions.
“Solutions” here means verified flyable: every price in there checked to be allowed for this travel, all seats checked to be available.
Keep in mind that even in 2009 we only have 10^10 bytes of RAM to hold all this.
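To put the RAM remark in numbers: even at one bit per solution, 10^28 solutions cannot be listed explicitly in 10^10 bytes. The sketch below does that arithmetic and then contrasts it with a factored layout, where storage grows with the sum of the options rather than their product. The slides do not say how ITA represents solutions; the seven choice points with 10,000 options each are invented purely so that the product comes out at 10^28.

```python
from math import prod

RAM_BYTES = 10**10          # from the slide
WORST_REAL_QUERY = 10**28   # from the slide

# Explicit enumeration, at an (unrealistically small) one bit per solution.
bytes_needed = WORST_REAL_QUERY // 8
times_too_big = bytes_needed // RAM_BYTES

# Hypothetical factored layout: store each independent choice point once.
choice_points = [10_000] * 7            # assumption, not ITA's data
items_stored = sum(choice_points)       # what has to sit in memory
solutions_represented = prod(choice_points)

print(f"explicit list: {times_too_big:.2e} times larger than RAM")
print(f"factored: {items_stored:,} items stand for 10^{len(str(solutions_represented)) - 1} solutions")
```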
ITA Software
Programming language cherry-pickers, Inc
All the features of dynamic languages – and by the time it actually runs there should be no trace left.
Classic dynamic languages drawbacks (to kill one by one):
Mixed causes:
- non-native code compilation [assumed solved – yeah, right]
- run-time type checks
- GC
- producing lots of heap garbage where static languages don't (e.g. from bignum arithmetic)
- unnecessary initialization of new data structures
- too many function calls, no inlining, required proxy functions
- mandatory array bounds checking
Memory:
- inefficient way to represent user-defined structs in memory
- inefficient way to represent arrays in memory
- [combination of the last two]
- inability to have efficient sub-byte data types (bitfields): either the capability is missing entirely, or the bitfields are not fast
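As an illustration of the sub-byte point, the sketch below packs three small fields into a single byte and stores the records in a flat, unboxed array. It is written in Python only to show the layout idea; the field names and bit widths are hypothetical and do not come from the talk.

```python
from array import array

# Hypothetical 8-bit record layout:
#   bits 0-1: cabin class        (0..3)
#   bit  2:   refundable flag    (0..1)
#   bits 3-7: advance-purchase bucket (0..31)


def pack(cabin, refundable, bucket):
    """Combine three small fields into one integer in the range 0..255."""
    if not (0 <= cabin < 4 and refundable in (0, 1) and 0 <= bucket < 32):
        raise ValueError("field out of range for its bit width")
    return cabin | (refundable << 2) | (bucket << 3)


def unpack(byte):
    """Recover the three fields with shifts and masks."""
    return byte & 0b11, (byte >> 2) & 0b1, (byte >> 3) & 0b11111


records = [(i % 4, i % 2, i % 32) for i in range(1000)]

# array("B") stores raw unsigned bytes: one byte per record, no per-item object.
packed = array("B", (pack(*r) for r in records))
payload_bytes = len(packed) * packed.itemsize

print(f"{len(records)} records in {payload_bytes} bytes of payload")
```

A list of three-field tuples holding the same data would need a separate heap object per record, which is the kind of overhead the slide is listing.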
Memory, the outside world and C functions:
- inability to access C data without proxy functions for conversion
- inability to call C functions without proxy functions to convert arguments and return value
==> leading to inability to use mmap'ed data built externally (mmap == big deal for users of mostly read-only data)
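The mmap workflow mentioned here can be sketched as follows: one program writes fixed-size binary records to a file, and the reader maps that file read-only and indexes into it by offset instead of parsing it into heap objects up front. The record layout and all names are hypothetical. Note that Python's `struct.unpack_from` still creates Python objects for each field it reads, so this shows the file-layout idea rather than the conversion-free access the slide is asking for.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record: fare id (uint32), origin code (uint16),
# destination code (uint16), little-endian, 8 bytes in total.
RECORD = struct.Struct("<IHH")


def build_file(path, rows):
    """Stands in for the external builder that produces the data file."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(RECORD.pack(*row))


def read_record(view, index):
    """Read one record straight out of the mapping by computing its offset."""
    return RECORD.unpack_from(view, index * RECORD.size)


rows = [(1, 10, 20), (2, 10, 30), (3, 40, 20)]
fd, path = tempfile.mkstemp()
os.close(fd)
build_file(path, rows)

with open(path, "rb") as f:
    # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
    view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    record_count = len(view) // RECORD.size
    second = read_record(view, 1)
    view.close()

os.remove(path)
print(record_count, second)
```

Because the mapping is read-only and backed by the file, the operating system can page it in on demand and share it between processes, which is what makes this attractive for large, mostly read-only data sets.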
High-performance Lisp at ITA Software – getting there
<== [whiteboard]