Date post: | 21-Apr-2017 |
Upload: | kazuho-oku |
Copyright (C) 2016 DeNA Co., Ltd. All Rights Reserved.
Developing the fastest HTTP/2 server
DeNA Co., Ltd.
Kazuho Oku
Who am I?
■ Kazuho Oku
■ Major works:
  ⁃ Palmscape / Xiino (web browser for Palm OS)
    • awarded M.I.T. TR 100/2002
  ⁃ Mitoh project 2004 super creator
  ⁃ Q4M (message queue plugin for MySQL)
    • MySQL Conference Community Awards 2011
  ⁃ H2O (HTTP/2 server)
    • Japan OSS Contribution Award 2015
Background
Responsiveness is important
source: http://radar.oreilly.com/2009/06/bing-and-google-agree-slow-pag.html
■ 500ms increase in response time → -1.2% revenue
Increasing size and # of requests
source: http://httparchive.org/trends.php?s=All&minlabel=Aug+1+2011&maxlabel=Aug+1+2015#bytesTotal&reqTotal
Bandwidth is also increasing
■ end users' bandwidth increases 50% every year (Nielsen's Law)
source: http://www.nngroup.com/articles/law-of-bandwidth/
More bandwidth doesnʼt matter
source: More Bandwidth Doesn't Matter (2011), Mike Belshe (Google)
* effective B/W reaches ceiling at around 1.6Mbps
Latency is the new bottleneck
source: More Bandwidth Doesn't Matter (2011), Mike Belshe (Google)
Latency cannot be optimized
■ latency is bounded by the speed of light
  ⁃ round trip between Japan and US: 80ms
■ mobile carriers have huge latency
  ⁃ LTE ~ 50ms
■ the Web is becoming more and more complex
Web is becoming slower ... unless we do something.
Solution: new protocol
HTTP/2!
The reasons HTTP/1.1 is slow
■ concurrency is too small
  ⁃ multiple round trips required when issuing many requests
■ no prioritization between requests
  ⁃ cannot suspend HTML / image streams in favor of CSS / JS
■ big request / response headers
  ⁃ typically hundreds of octets
  ⁃ becomes an overhead when issuing many requests
HTTP/2
■ RFC 7540 (2015/5)
  ⁃ based on SPDY by Google
■ key features:
  ⁃ binary protocol
  ⁃ header compression
  ⁃ multiplexing
  ⁃ prioritization
Benchmark
■ red bar: time spent until first paint
■ big difference between server implementations
■ reason: quality of prioritization logic
■ H2O shows the true potential of HTTP/2
Have we reached the limit?
Letʼs consider what would be the ideal HTTP flow.
TCP slow start
■ Initial Congestion Window (IW)=10
  ⁃ only 10 packets can be sent in first RTT
  ⁃ used to be IW=3
■ window increase: 1.5x/RTT
[Figure: TCP slow start (IW 10, MSS 1460); x-axis: RTT (1 to 8), y-axis: bytes transmitted (0 to 800,000)]
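The shape of the slow-start curve can be reproduced with simple arithmetic. The following is a minimal sketch that assumes the values stated on the slide (IW=10, MSS=1460, window growth of 1.5x per RTT) and an idealized, loss-free connection; the function name is illustrative only.

```python
def slow_start_bytes(rtts, iw=10, mss=1460, growth=1.5):
    """Cumulative bytes that can be delivered after each RTT.

    Idealized model: no packet loss, one full congestion window is sent
    per RTT, and the window is multiplied by `growth` after every RTT.
    """
    window = float(iw)           # congestion window, in packets
    total = 0.0
    totals = []
    for _ in range(rtts):
        total += window * mss    # bytes sent during this RTT
        totals.append(int(total))
        window *= growth
    return totals
```

Under these assumptions only 14,600 bytes fit in the first RTT and about 69 KB in the first three, which is why the order in which the server sends data during the first few round trips matters so much.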
Flow of the ideal HTTP
■ fastest within the limits of TCP/IP
■ receive a request 0-RTT, and:
  ⁃ first send CSS/JS*
  ⁃ then send the HTML
  ⁃ then send the images*
*: but only the ones not cached by the browser
[Diagram: client/server sequence; the request and the response complete within 1 RTT]
The reality in HTTP/2
■ TCP establishment: +1 RTT
■ TLS handshake: +2 RTT*
■ HTML fetch: +1 RTT
■ JS, CSS fetch: +2 RTT**
■ Total: 6 RTT
*: 1 RTT on reconnection
**: servers often cannot switch to sending JS, CSS instantly, due to the output buffered in the TCP send buffer
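To put the round-trip count in wall-clock terms, here is a small sketch that sums the phases listed above and multiplies by the RTT. The phase table mirrors the slide; the function name and the choice to ignore server processing time are assumptions made for illustration.

```python
# Round trips per phase on a first connection, as listed on the slide.
PHASES = {
    "TCP establishment": 1,
    "TLS handshake": 2,
    "HTML fetch": 1,
    "JS, CSS fetch": 2,
}


def time_to_assets_ms(rtt_ms, phases=PHASES):
    """Network wait in milliseconds until JS/CSS arrive (processing time ignored)."""
    return sum(phases.values()) * rtt_ms
```

With the 80ms Japan-US round trip mentioned earlier, `time_to_assets_ms(80)` gives 480ms; on reconnection, where the TLS handshake costs 1 RTT, the same calculation gives 400ms.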
[Diagram: client/server sequence, one step per RTT: TCP SYN / TCP SYN-ACK, TLS handshake (two round trips), GET / → HTML, GET css,js → css,js]
Ongoing optimizations
■ TCP Fast Open
  ⁃ connection establishment in 0 RTT
■ TLS 1.3
  ⁃ initial handshake completes in 1 RTT
  ⁃ resumption in 0 RTT
■ what can be done in the HTTP/2 layer?
Further optimizations in HTTP/2 layer
■ optimize TCP for responsiveness
■ cache-aware server push
Optimizing TCP for responsiveness
Typical sequence of HTTP/2
[Diagram: client/server sequence, 1 RTT scale]
client → server: GET /
server → client: HTTP/2 200 OK
                 <!DOCTYPE HTML>…<SCRIPT SRC="jquery.js">…   (*)
client → server: GET /jquery.js
The server needs to switch from sending HTML to sending JS at this very moment (which means that the amount of data sent in * must be smaller than IW).
Buffering in TCP and TLS layer
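[Diagram: HTTP/2 frames → TLS records → TCP send buffer (unacked | CWND | poll threshold), with overflow held in the BIO buffer; data inside CWND is sent immediately, data beyond it is not immediately sent]

    // ordinary code (non-blocking): keep writing until OpenSSL reports it would block
    while (SSL_get_error(ssl, SSL_write(ssl, …)) != SSL_ERROR_WANT_WRITE)
        ;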
Why do we have buffers?
■ TCP send buffer:
  ⁃ reduce ping-pong between kernel and application
■ BIO buffer:
  ⁃ for data that couldn't be stored in the TCP send buffer
[Diagram: HTTP/2 frames → TLS records → TCP send buffer (unacked | CWND | poll threshold), with overflow held in the BIO buffer; data inside CWND is sent immediately, data beyond it is not immediately sent]
Improvement: poll-then-write
[Diagram: HTTP/2 frames → TLS records → TCP send buffer (unacked | CWND | poll threshold); data inside CWND is sent immediately, data beyond it is not immediately sent]

    // only call SSL_write when poll notifies the app.
    while (poll_for_write(fd) == SOCKET_IS_READY)
        SSL_write(…);
Adjust poll threshold
■ set poll threshold to the end of CWND?
  ⁃ setsockopt(TCP_NOTSENT_LOWAT)
  ⁃ in Linux, the minimum is CWND + 1 octet
    • becomes unstable when set to CWND + 0
[Diagram: HTTP/2 frames → TLS records → TCP send buffer (unacked | CWND | poll threshold); data inside CWND is sent immediately, data beyond it is not immediately sent]
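As a rough illustration of the setsockopt(TCP_NOTSENT_LOWAT) call, the sketch below sets the option from Python. It assumes a platform whose `socket` module exposes `TCP_NOTSENT_LOWAT` (Linux does; elsewhere the helper simply reports failure). The helper name and the value of 1 octet are assumptions chosen to match the "CWND + 1 octet" remark, not a statement of what any particular server does.

```python
import socket


def set_notsent_lowat(sock, nbytes=1):
    """Ask the kernel to report the socket as writable only when fewer than
    `nbytes` of not-yet-sent data remain queued.

    Returns True if the option was applied, False if the platform or the
    socket does not support it.
    """
    option = getattr(socket, "TCP_NOTSENT_LOWAT", None)
    if option is None:
        return False                      # constant not available here
    try:
        sock.setsockopt(socket.IPPROTO_TCP, option, nbytes)
    except OSError:
        return False                      # kernel rejected the option
    return True
```

After the option is applied, a poll-then-write loop is only woken when the unsent backlog has drained, so the application decides what to send next as late as possible.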
Adjust poll threshold
[Diagram: HTTP/2 frames → TLS records → TCP send buffer (unacked | CWND | poll threshold); data inside CWND is sent immediately, data beyond it is not immediately sent]

    // only call SSL_write when poll notifies the app.
    while (poll_for_write(fd) == SOCKET_IS_READY)
        SSL_write(…);
Further improvement: read TCP states
[Diagram: HTTP/2 frames → TLS records → TCP send buffer (unacked | CWND | poll threshold); data inside CWND is sent immediately, data beyond it is not immediately sent]

    // calc size of data to send by calling getsockopt(TCP_INFO)
    if (poll_for_write(fd) == SOCKET_IS_READY) {
        capacity = CWND - unacked + ONE_MSS - TLS_overhead;
        SSL_write(prepare_http2_frames(capacity));
    }
Issues in the proposed approach
■ increased delay between ACK receipt → data send
  ⁃ leads to slower peak speed
  ⁃ reason:
    • traditional approach: completes within kernel
    • this approach: application needs to be notified to generate new data
■ solution:
  ⁃ use the approach only when necessary
    • i.e. when RTT is big and CWND is small
    • increased delay can be ignored if: delay << RTT
Code for calculating size of data to send

    size_t get_suggested_write_size() {
        struct tcp_info tcp_info;
        socklen_t len = sizeof(tcp_info);
        size_t tls_overhead, packets_sendable;

        if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &tcp_info, &len) != 0)
            return UNKNOWN;
        // use the approach only when RTT is big and CWND is small
        if (tcp_info.tcpi_rtt < min_rtt || tcp_info.tcpi_snd_cwnd > max_cwnd)
            return UNKNOWN;

        // per-record overhead depends on the negotiated cipher suite
        switch (SSL_get_current_cipher(ssl)->id) {
        case TLS1_CK_RSA_WITH_AES_128_GCM_SHA256:
        case …:
            tls_overhead = 5 + 8 + 16; // record header + explicit nonce + GCM tag
            break;
        default:
            return UNKNOWN;
        }

        packets_sendable = tcp_info.tcpi_snd_cwnd > tcp_info.tcpi_unacked ?
            tcp_info.tcpi_snd_cwnd - tcp_info.tcpi_unacked : 0;
        return (packets_sendable + 1) * (tcp_info.tcpi_snd_mss - tls_overhead);
    }
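To make the arithmetic easy to check by hand, here is the same calculation as a pure function. This is only a sketch: the parameter names mirror the `tcp_info` fields used above, and the default overhead is the 5 + 8 + 16 octets the slide uses for AES-GCM suites.

```python
TLS_OVERHEAD_AES_GCM = 5 + 8 + 16  # record header + explicit nonce + GCM tag


def suggested_write_size(snd_cwnd, unacked, snd_mss,
                         tls_overhead=TLS_OVERHEAD_AES_GCM):
    """Payload bytes to hand to the TLS layer so that the resulting records
    fill the free part of the congestion window plus one extra packet."""
    packets_sendable = max(snd_cwnd - unacked, 0)
    # one TLS record per packet; each record loses `tls_overhead` octets
    return (packets_sendable + 1) * (snd_mss - tls_overhead)
```

For example, with CWND=10, nothing unacked and MSS=1460, the function suggests 11 × 1431 = 15,741 bytes; once the window is fully used it still suggests one packet's worth (1,431 bytes), which is the 1-packet overhead imposed by the Linux minimum.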
Benchmark
■ conditions:
  ⁃ server in Ireland, client in Japan (RTT 250ms)
  ⁃ load tiny JS at the top of a large HTML
■ result: delay decreased from 511ms to 250ms
  ⁃ i.e. JS fetch latency was 2 RTT, became 1 RTT
    • similar results in other environments
Conclusion
■ near-optimal result can be achieved
  ⁃ by adjusting poll threshold and reading TCP states
  ⁃ 1-packet overhead due to restriction in Linux kernel
■ 1-RTT improvement in H2O
  ⁃ estimated 1-RTT improvement per the depth of the load graph
Same problem exists with load balancers
■ L4 L/B or TLS terminator also act as buffers
  ⁃ impact bigger than that of the TCP send buffer of httpd
■ solution:
  ⁃ best: don't use L/B
  ⁃ next best: implement mitigations in L/B
  ⁃ long-term: TCP migration + L3 NAT or DSR
    • i.e. accept in L/B, then transfer the connection to the HTTP/2 server
Cache-aware Server Push
What is server-push?
■ start the delivery of CSS / JS when receiving a request for HTML
■ effect:
  ⁃ 1 RTT reduction, or more
Use-case: conceal request process time
■ ex. RTT=50ms, process time=200ms
[Diagram: request timelines without push and with push. Without push, the assets are requested only after the HTML arrives: 450ms (5 RTT + processing time). With push, the assets are pushed while the request is being processed: 250ms (1 RTT + processing time).]
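The two totals in the diagram follow directly from the example numbers. A minimal sketch, with an illustrative function name and the round-trip counts taken from the diagram labels:

```python
def page_time_ms(rtt_ms, processing_ms, round_trips):
    """Total time as `round_trips` network round trips plus server processing."""
    return round_trips * rtt_ms + processing_ms


without_push = page_time_ms(50, 200, round_trips=5)   # 5 RTT + processing time
with_push = page_time_ms(50, 200, round_trips=1)      # 1 RTT + processing time
```

Here `without_push` is 450 and `with_push` is 250: the 200ms saved is the four round trips that push hides behind the request processing time.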
Use-case: conceal network distance
■ CDNs' use-case
  ⁃ utilize the connection while waiting for the application's response
  ⁃ side effect: reduces the number of application data centers needed
[Diagram: client → edge server (CDN) → app. server. The edge server pushes assets to the client while its own request for the HTML is in flight to the app. server.]
Issues of server-push
■ how to determine if a resource is already cached
  ⁃ shouldn't push a resource already in cache
    • waste of bandwidth (and time)
  ⁃ can't issue a request to identify the cache state
    • since it would waste the 1 RTT we are trying to save!
Cache-aware server push
■ experimental feature since H2O 1.5
■ create a digest of URLs found in browser cache
  ⁃ uses Golomb coded sets
    • space-efficient variant of Bloom filter
■ server uses the digest to determine whether or not to push
Memo: fresh vs. stale
■ two states of a cached resource
■ fresh:
  ⁃ resource that can be used without revalidation
  ⁃ example: Expires: Jan 1 2030
■ stale:
  ⁃ needs revalidation before use
    • i.e. issue GET with If-Modified-Since
Generating a digest
1. calc hashcode of URLs of every fresh cache
   ⁃ range: 0 .. #-of-URL / false-positive-rate
2. sort the hashcodes, remove duplicates
3. emit the first element using the following encoding:
   1. "value * FPR" (rounded down) using unary coding
   2. "value mod (1/false-positive-rate)" using binary coding
4. for every other element, emit the delta from the preceding element, minus one, using the same encoding
5. pad with 1s up to the byte boundary
Generating a digest
■ scenario:
  ⁃ FPR: 1/256
  ⁃ URLs of fresh resources in cache:
    • https://example.com/ecma.js
    • https://example.com/style.css
■ calc hash modulo 512: 0x3d, 0x16b
■ sort, remove dupes, and emit the delta:
  ⁃ 0x3d → 0 00111101
  ⁃ 0x16b - 0x3d - 1 → 0x12d → 10 00101101
  ⁃ padding → 11111 (19 bits emitted so far, padded to 24)
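The encoding steps can be sketched in a few lines. The function below takes already-computed hash values (the slides do not specify the hash function, so none is assumed here) and the number of remainder bits, i.e. log2(1/FPR); the function name is illustrative.

```python
def encode_digest(hashes, p_bits):
    """Golomb-Rice encode a set of hash values.

    p_bits is log2(1 / false-positive-rate), e.g. 8 for FPR = 1/256.
    """
    bits = []
    prev = -1                                # makes the first delta equal the value itself
    for value in sorted(set(hashes)):        # sort, remove duplicates
        delta = value - prev - 1             # delta from preceding element, minus one
        prev = value
        quotient = delta >> p_bits           # value * FPR, rounded down
        remainder = delta & ((1 << p_bits) - 1)
        bits.append("1" * quotient + "0")    # unary coding
        bits.append(format(remainder, "0{}b".format(p_bits)))  # binary coding
    bitstring = "".join(bits)
    bitstring += "1" * (-len(bitstring) % 8)  # pad with 1s to the byte boundary
    return bytes(int(bitstring[i:i + 8], 2) for i in range(0, len(bitstring), 8))
```

Calling `encode_digest([0x3d, 0x16b], 8)` produces the 19 data bits of the two-hash scenario, padded to three bytes: `1e c5 bf` in hex.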
Overhead of sending the digest
■ size: #-of-URLs * (log2(1/FPR) + 1.x) bits
■ 1,400 URLs can be stored in 1 packet
  ⁃ when false-positive-rate set to 1/128
■ can raise FPR to cram more URLs
  ⁃ a false positive means the resource is not pushed; the browser can just pull it
  ⁃ pushing some of the required resources is better than none
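The size estimate can be checked empirically. The sketch below draws distinct pseudo-random hash values in the range 0 .. #-of-URLs / FPR and counts the bits the encoding would emit (before padding). The uniform, collision-free hashes, the fixed seed and the function name are all assumptions made for illustration.

```python
import random


def digest_bits(num_urls, p_bits, seed=0):
    """Bits needed to encode `num_urls` random hash values, before padding.

    p_bits is log2(1 / false-positive-rate), e.g. 7 for FPR = 1/128.
    """
    rng = random.Random(seed)
    hashes = sorted(rng.sample(range(num_urls << p_bits), num_urls))
    bits, prev = 0, -1
    for value in hashes:
        delta = value - prev - 1
        bits += (delta >> p_bits) + 1 + p_bits   # unary part + stop bit + remainder
        prev = value
    return bits
```

Because the deltas add up to less than the hash range, the unary parts contribute less than one bit per URL on top of the stop bit, so `digest_bits(1400, 7)` always lands between 8 and 9 bits per URL, i.e. between 1,400 and 1,575 bytes, which is about one packet.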
Where to store the digest?
■ cookie
  ⁃ pros: runs on any browser, anytime
  ⁃ cons: digest becomes inaccurate
    • only the browser knows what's in the browser cache
■ ServiceWorker (+ServiceWorker Cache)
  ⁃ pros: runs on Chrome, Firefox
  ⁃ cons: doesn't start until leaving the landing page
■ HTTP/2 frame
  ⁃ pros: minimal octets transferred
    • thanks to the knowledge of HTTP/2 connection
  ⁃ cons: needs to be implemented by browser developers
Discussion at IETF
■ IETF 95 (April)
  ⁃ initial submission of the internet draft
    • co-author: Mark Nottingham (HTTP WG Chair)
  ⁃ defines the HTTP/2 frame
    • since it's the best way in the long term
    • store the frame in headers / cookies for the short term
■ IETF 96, HTTP Workshop (July)
  ⁃ to define digest calculation of stale resources
Handling stale resources
■ hash key changed to URL + ETag
  ⁃ does anyone need support for Last-Modified?
■ server uses URL + ETag of the resource to check the digest
  ⁃ push the resource in case a match is not found
  ⁃ push 304 Not Modified in case a match is found
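On the server side, the check amounts to decoding the digest and testing membership. The sketch below decodes the unary-plus-binary encoding described earlier and maps a lookup to one of the two actions. How the key hash is derived from URL + ETag is not specified on the slides, so it is passed in as a precomputed value; both function names are illustrative.

```python
def decode_digest(data, p_bits):
    """Recover the sorted hash values from an encoded digest.

    p_bits is log2(1 / false-positive-rate); trailing 1-bits are padding.
    """
    bits = "".join(format(byte, "08b") for byte in data)
    values, pos, prev = [], 0, -1
    while True:
        zero = bits.find("0", pos)                 # end of the unary part
        if zero < 0 or zero + 1 + p_bits > len(bits):
            break                                  # only padding is left
        quotient = zero - pos
        remainder = int(bits[zero + 1:zero + 1 + p_bits], 2)
        prev = prev + 1 + ((quotient << p_bits) | remainder)
        values.append(prev)
        pos = zero + 1 + p_bits
    return values


def push_decision(key_hash, digest_values):
    """Choose what to push for a resource whose URL + ETag hashes to key_hash."""
    if key_hash in digest_values:
        return "push 304 Not Modified"   # the client (probably) has this version
    return "push resource"               # no match: send the full response
```

A false positive here makes the server push a 304 for something the client does not actually hold; the browser can then fetch the resource itself, so the cost is a lost optimization rather than a broken page.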
Difficulties in pushing 304
■ ETag cannot always be obtained immediately
  ⁃ cannot build the If-None-Match request header without the ETag
  ⁃ the "request*" of a pushed resource SHOULD be sent before the main response
■ proposed solution:
  ⁃ allow 304 against a non-conditional GET
*: in case of server-push, the server generates both the request and the response, and sends them to the client.
Using server-push from Ruby
■ Link: rel=preload header
  ⁃ web server pushes the specified URL

    HTTP/1.1 200 OK
    Content-Type: text/html
    Link: </style.css>; rel=preload # this header!!!

  ⁃ supported by:
    • H2O, nghttpx (nghttp2), mod_h2 (Apache)
  ⁃ patch for nginx exists
The issue with Link: rel=preload
■ cannot initiate push while processing the request
[Diagram: client → HTTP/2 server → web app. The client's GET / is forwarded to the web app; the HTTP/2 server can't push while the app processes the request, because the Link: header only arrives with the app's 200 OK.]
1xx Early Metadata
■ send Link: rel=preload as interim response
  ⁃ application sends 1xx then processes the request
■ supported in H2O 2.1
■ might propose for standardization in IETF

    GET / HTTP/1.1
    Host: example.com

    HTTP/1.1 1xx Early Metadata
    Link: </style.css>; rel=preload

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=utf-8

    <!DOCTYPE HTML>
    ...
Sending 1xx from Rack
■ in case of Unicorn:

    Proc.new do |env|
      # write the interim response directly to the client socket
      env["unicorn.socket"].write(
        "HTTP/1.1 1xx Early Metadata\r\n" +
        "Link: </style.js>; rel=preload\r\n" +
        "\r\n")
      # time-consuming operation ...
      [ 200, [ ... ], [ ... ] ]
    end
...we need to define the formal API
Conclusion
■ the Web has become faster with HTTP/2
■ HTTP/2 becomes as fast as the limits of TCP/IP allow, with:
  ⁃ optimizing TCP for responsiveness
  ⁃ Cache Digest
  ⁃ 1xx Early Metadata
Q&A
■ Q. Can it be made faster than the limits of TCP/IP?
■ A. Yes!
  ⁃ shorten the RTT!
    • CDNs' approach
  ⁃ make DNS query part of TLS handshake
    • was part of TLS 1.3 draft (removed as too premature)
  ⁃ fairness isn't an issue for a private network!
    • TCP optimizer for mobile carriers