Date post: 03-Jul-2020
Dr. Christian Geuer-Pollmann @chgeuer http://blog.geuer-pollmann.de/ Lessons learned: Hosting large-scale backends like the “Eurovision Song Contest” on Microsoft Azure
Page 1: Lessons learned: Hosting large-scale backends like the ...download.microsoft.com/download/F/7/E/F7E9A431-DD1... · Lessons learned: Hosting large-scale backends like the “Eurovision

Dr. Christian Geuer-Pollmann

@chgeuer

http://blog.geuer-pollmann.de/

Lessons learned:

Hosting large-scale backends like the “Eurovision Song Contest” on Microsoft Azure

Page 2: Agenda

Architecture Overview

Operations Security

Load Testing Performance Connectivity

Page 3:

• Kampf der Orchester (SRF)
• Eurovision Song Contest 2015 (EBU / ORF)
• Quizduell im Ersten (Das Erste / NDR)
• Spiel für Dein Land (Das Erste / SRF / ORF)

Page 4: Projects

Page 6:

im Ersten

Page 7: Technology Partner

Page 10:

Next broadcast date

Sat, 12 Dec 2015 | 20:15

Page 12: Solution Overview

• Support 2+ million concurrent connections
• Sub-second in-app notifications
• Voting and fast aggregation
• WebSockets for bi-directional communication
• Built on Azure "Cloud Services" ("PaaS v1") with ASP.NET and SignalR

Page 13: General Architecture

Page 14: Patterns

Page 15: Paranoia – Trust no one

• KISS!!!
• Cloud Services: rely on affinity, network, CPU, and memory only.
• Reduce moving parts. If you can eliminate third-party services, do so.
• Call potentially blocking or failing components asynchronously.
• Retries towards the data store shouldn't block the critical path.

https://github.com/chgeuer/RedisCloudService
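The last two bullets can be illustrated with a small write-behind queue. This is a minimal sketch, not the project's actual code: `WriteBehind`, `store_write`, and all parameter values are hypothetical names chosen for illustration. The request path only enqueues; a background thread talks to the data store and retries with backoff.

```python
import queue
import threading
import time


class WriteBehind:
    """Accept writes instantly and persist them from a background thread."""

    def __init__(self, store_write, max_pending=10_000, retries=3, backoff_s=0.01):
        self._store_write = store_write        # hypothetical callable that may raise
        self._pending = queue.Queue(maxsize=max_pending)
        self._retries = retries
        self._backoff_s = backoff_s
        self.dropped = 0                       # writes that were given up on
        threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, item):
        """Called on the request path: never waits for the data store."""
        try:
            self._pending.put_nowait(item)
            return True
        except queue.Full:
            self.dropped += 1                  # shed load instead of blocking
            return False

    def _drain(self):
        while True:
            item = self._pending.get()
            for attempt in range(self._retries):
                try:
                    self._store_write(item)
                    break
                except Exception:
                    # Back off exponentially; only this worker thread sleeps.
                    time.sleep(self._backoff_s * (2 ** attempt))
            else:
                self.dropped += 1              # retries exhausted
            self._pending.task_done()

    def flush(self):
        """Block until everything queued so far has been handled (for tests)."""
        self._pending.join()
```

The bounded queue is a deliberate choice in this sketch: when the store stays down, the role drops writes and counts them rather than letting memory grow or requests stall.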

Page 16: No external dependencies

Page 17: Paranoia – Don't trust your own solution

Multi-paradigm fallbacks

• Realtime updates via WebSockets,
• Fallback to CDN.
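A minimal sketch of the fallback idea on the client side, assuming two hypothetical callables: `realtime_fetch` for the WebSocket path and `cdn_fetch` for a static snapshot published to the CDN. Neither name comes from the talk.

```python
def fetch_status(realtime_fetch, cdn_fetch):
    """Return (source, payload), preferring the realtime channel."""
    try:
        return "realtime", realtime_fetch()
    except Exception:
        # Realtime path is down or overloaded: serve the last published snapshot.
        return "cdn", cdn_fetch()
```

Returning the source alongside the payload lets the caller decide how stale the data may be, for example by showing a "reconnecting" hint while it is served from the CDN.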

Page 19: Quorum and Fault Domains

• Quorum in PaaS v1 Cloud Services is "difficult": on paper, Compute v1 has only 2 fault domains (FDs).
• The new Compute v2 (ARM, Service Fabric) provides 3 FDs. Unfortunately, v2 was not available at the end of CY14.
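Why two fault domains make quorum "difficult" can be checked with a little arithmetic. The sketch below assumes a majority quorum and nodes spread as evenly as possible across fault domains; the function names are illustrative only.

```python
def worst_case_survivors(nodes, fault_domains):
    """Nodes left after losing the fullest fault domain, with nodes spread evenly."""
    largest_domain = -(-nodes // fault_domains)   # ceiling division
    return nodes - largest_domain


def keeps_quorum(nodes, fault_domains):
    """True if a strict majority survives the loss of any single fault domain."""
    majority = nodes // 2 + 1
    return worst_case_survivors(nodes, fault_domains) >= majority
```

With 2 fault domains the fullest domain always holds at least half of the nodes, so `keeps_quorum(n, 2)` is false for every `n`. With 3 fault domains, `keeps_quorum(3, 3)` and `keeps_quorum(5, 3)` are both true.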

Page 20: Reduce Load on Backend

• Don't let all web roles hammer the backend. Reduce traffic to the central DB.
• Aggregate in role
• Constant load on backend
• Shared-Access Signatures for profile pictures

http://blog.smarx.com/posts/architecting-scalable-counters-with-windows-azure
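"Aggregate in role" and "constant load on backend" can be sketched as an in-memory counter that is flushed on a timer. This is an assumed design for illustration: `VoteAggregator` and `backend_write` are hypothetical names, and the timer that calls `flush()` is left out.

```python
import threading
from collections import Counter


class VoteAggregator:
    """Count votes in memory; hand the totals to the backend in batches."""

    def __init__(self, backend_write):
        self._backend_write = backend_write   # hypothetical callable: dict -> None
        self._counts = Counter()
        self._lock = threading.Lock()

    def record(self, candidate):
        """Called once per incoming vote; touches memory only."""
        with self._lock:
            self._counts[candidate] += 1

    def flush(self):
        """Called on a fixed timer: at most one backend write per interval per role."""
        with self._lock:
            batch, self._counts = self._counts, Counter()
        if batch:
            self._backend_write(dict(batch))
```

With this shape the backend sees a write rate proportional to the number of role instances and the flush interval, not to the number of votes.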

Page 22: HTTP Polling versus WebSockets

• Establishing TCP connections is expensive: it strains the TCP/IP stack
• Closed TCP connections are expensive (TIME_WAIT)
• UX: Minimize realtime delay and latency
• WebSockets have no poll interval
• Polling means authenticating each request
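A rough calculation shows the cost of polling at this scale. The client count is taken from the 2+ million connections goal stated earlier in the deck; the poll intervals are assumptions for illustration.

```python
def polling_requests_per_second(clients, poll_interval_s):
    """Steady-state HTTP request rate when every client polls on a fixed interval."""
    return clients / poll_interval_s


def average_delay_s(poll_interval_s):
    """Mean wait before a polling client notices a new event."""
    return poll_interval_s / 2
```

For 2,000,000 clients polling every 5 seconds this gives 400,000 requests per second with an average notification delay of 2.5 seconds. Pushing the worst-case delay under one second needs an interval below 1 second, which means more than 2,000,000 requests per second, each one authenticated.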

Page 23: Don't use plain HTTP polling

Votes via POST

Status via GET

Page 24: Automate everything!

Page 25: Automate everything!

Page 26: Network Tweaking

• HTTP
  • http.sys max connections
  • Concurrent requests per CPU
  • Request queue limit
• TCP
  • TIME_WAIT
  • max. TCP retransmissions
• Windows OS

https://github.com/chgeuer/Quizzer/blob/master/Quizzer.Web/SetupScripts/install2.ps1

Page 29: Traffic Volume Optimization

• Egress data volume per client is high. Questions and answers can have image attachments.
• Individually encrypted questions, zipped JSON in CDN
• Change distribution time, path and costs
• Goal: Separate bulk data and realtime traffic
  • 500k people * 100kB == 50GB.
  • 500k people * (question ID + key only) == 1MB.

https://github.com/chgeuer/SelectiveFieldConfidentiality
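The two volume figures can be worked through explicitly. The client count and the 100 kB bundle size come from the slide; the 20-byte key message (a 4-byte question ID plus a 16-byte AES key) is an assumption made here for illustration.

```python
def total_bytes(clients, bytes_per_client):
    """Total egress when every client receives the same payload size."""
    return clients * bytes_per_client


CLIENTS = 500_000
BULK_BYTES = 100_000    # 100 kB question bundle, from the slide
KEY_MSG_BYTES = 20      # assumption: 4-byte question ID + 16-byte AES key

bulk_total = total_bytes(CLIENTS, BULK_BYTES)     # 50_000_000_000 bytes = 50 GB
key_total = total_bytes(CLIENTS, KEY_MSG_BYTES)   # 10_000_000 bytes = 10 MB
```

The slide's 1 MB figure corresponds to about 2 bytes per client, so it is best read as an order-of-magnitude illustration. Under the 20-byte assumption the realtime path still carries 5,000 times less data than the bulk download, which is the point of moving the bulk data to the CDN ahead of time.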

Page 30: SignalR (and other) Performance Guidance

• There's no sizing info, as patterns vary heavily
• Load testing is the (only) answer!

https://github.com/SignalR/SignalR/wiki/Performance

Page 32: Load Testing Challenges

• There is never enough time for testing.
• High number of concurrent users
• Long-lived connections
• Each public IP can establish a theoretical maximum of 64k connections to http://target:80/
• Custom protocol on top of SignalR

Developed our own load test framework ("bot net")

https://github.com/chgeuer/AzureDistributedRunner
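The 64k limit follows from the 16-bit source port: one source IP can hold at most 65,535 distinct connections to a single destination IP and port. A minimal sketch of the resulting lower bound on load-generator IPs, using the 2 million connection target from earlier in the deck:

```python
import math

# 16-bit source port; the usable ephemeral range is smaller in practice.
EPHEMERAL_PORT_LIMIT = 65_535


def min_source_ips(target_connections, per_ip_limit=EPHEMERAL_PORT_LIMIT):
    """Lower bound on distinct source IPs needed to open the target connection count."""
    return math.ceil(target_connections / per_ip_limit)
```

`min_source_ips(2_000_000)` returns 31. Real generators need noticeably more nodes than this bound, since the ephemeral port range, memory, and CPU per node all cap the connection count well below the theoretical maximum.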

Page 33: Load Test Setup

https://github.com/chgeuer/AzureDistributedRunner

Page 34:

Spin up 60 individual nodes (unique source IPs)

Page 36: Security Rule #1 – Know your threat model

• Quizduell prize draw
• Quizduell (QD) "Hall of Fame"
• QD double-voting
• ESC votes via SMS

Page 37: Performance vs. Security

Caution: Don't generalize this specific decision!

• Used TLS for registration and login only
• TLS is a burden on the CPU, so we used custom authentication (HMAC only).
• Different APIs might have different security requirements and protocols (possible because the system is closed)
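The talk does not show its HMAC scheme, so the following is a generic sketch of HMAC request authentication using Python's standard library. The message layout (method, path, body joined by newlines) and the per-user secret are assumptions, not the project's wire format.

```python
import hashlib
import hmac


def sign(secret: bytes, method: str, path: str, body: bytes) -> str:
    """Hex HMAC-SHA256 over the parts of the request that must not be altered."""
    message = b"\n".join([method.encode(), path.encode(), body])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, method: str, path: str, body: bytes, signature: str) -> bool:
    """Recompute the signature server-side and compare in constant time."""
    expected = sign(secret, method, path, body)
    return hmac.compare_digest(expected, signature)
```

An HMAC protects integrity and authenticity only; the payload stays readable on the wire, and a production scheme would also need a timestamp or nonce in the signed message to prevent replay. That trade-off is why the slide warns against generalizing the decision.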

Page 38: Security reviews you didn't ask for…

Your client implementation is never private

Page 39: 3 days before "go live"

http://quizduellforum.de/index.php?topic=478.0

Page 40: A manifest

Live API provides status, and links to bulk data in CDN

// http://qd-prod.appsfactory.de/api/info
{
  "AgbChange": "2015-10-23T18:55:00",
  "DsChange": "2015-10-23T18:55:00",
  "TbChange": "2015-10-23T18:55:00",
  "Live": false,
  "IdxVersion": 2,
  "RankingBlobUrl": "https://az692393.vo.msecnd.net/rankings/top50.zip",
  "RankingTimestamp": 1446231308743,
  "Capped": false,
  "PlayAlong": false,
  "Apps": [
    { "OS": "iOS", "Version": "1.6", "Force": false },
    { "OS": "Android", "Version": "1.2.7", "Force": false },
    { "OS": "Windows", "Version": "1.8.0.0", "Force": false },
    { "OS": "WindowsTablet", "Version": "1.8.0.0", "Force": false }
  ],
  "Duells": [
    "https://az692393.vo.msecnd.net/duells/1179.zip",
    "https://az692393.vo.msecnd.net/duells/1253.zip",
    "https://az692393.vo.msecnd.net/duells/2274.zip",
    "https://az692393.vo.msecnd.net/duells/2275.zip"
  ]
}
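How a client might consume such a manifest can be sketched as follows. The field names `Duells`, `Apps`, `OS`, and `Force` come from the manifest above; the two helper functions and the idea of a local cache set are assumptions for illustration. The leading `//` comment line is not valid JSON and would have to be absent from the real response body.

```python
import json


def bundles_to_download(manifest_json, already_cached):
    """Return the duel bundle URLs from the manifest that are not cached yet."""
    manifest = json.loads(manifest_json)
    return [url for url in manifest.get("Duells", []) if url not in already_cached]


def must_update(manifest_json, os_name):
    """True if the manifest forces an app update for the given OS."""
    manifest = json.loads(manifest_json)
    return any(app["OS"] == os_name and app["Force"] for app in manifest.get("Apps", []))
```

The small live response only tells the client what exists; the bundles themselves are fetched from the CDN URLs, which keeps bulk traffic away from the API servers.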

Page 42: Thanks for the voluntary analysis

"For all further access to the web API, however, we have to supply a so-called user token so that the server answers us at all. We only receive this user token after authenticating with our Google account via Google's OAuth 2 service."

http://quizduellforum.de/index.php?topic=478.0

Page 43: Thanks for the voluntary analysis (2)

"After downloading [...] we discover [...] a catalog of all questions, but the questions are encrypted. [...] The key is delivered to the players at the start of each question round [...]. Until then the questions stay under wraps, because the symmetric AES encryption used is unbreakable. [...] So cheating is not possible, unless you count knowing the second and third questions at the start of a round, while others only see the first question, as cheating."

http://quizduellforum.de/index.php?topic=478.0

Page 44: Thanks for the voluntary analysis (3)

"Thanks to the clever advance downloads of the questions and team photos, only cryptographic keys and some metadata need to be exchanged during a live duel. This is certainly a significant reduction in the volume of data transferred. In addition, so-called WebSockets are used, which offer many performance advantages over the old app for the live synchronization of the game."

http://quizduellforum.de/index.php?topic=478.0

Page 46: Pre-heating your app

Heating up

Production

Page 47: Autoscaling

Page 48: How we handled auto-scaling

We did not auto-scale!

Page 49: Logging and Instrumentation

• Instrument your infrastructure. Know what the load on your nodes is.
• Using Microsoft standard logging (Performance Counters) helps.
• Monitor everything (VMs, CDN)
• Realtime logging for startup tasks: chgeuer/UnorthodoxAzureLogging

Page 50: Schedule

Page 51:

Dr. Christian Geuer-Pollmann

@chgeuer

http://blog.geuer-pollmann.de/

