12-Jun-2000
NSI Registry Engineering & Operations Update
Ari Balogh, VP of Engineering
High-Level Architecture
[Diagram: Network Solutions Registry. Labeled components: Registration System; RRP Proxy; App. Server; Regr. Tools; CSR Tools; Regr. Reports; Domains, NSs, Registrars; Whois Server; Whois; DNS Zones; Root, gTLD Node. Other labels: Registrars; CSRs; Firewall; Internet Users. Link labels: HTTP, RRP/SSL.]
Registrar Growth
• Welcome letter sent to Registrar candidates: 31
• Registrars in pre-production: 48
• Registrars in production: 44
• Total number of Registrars in Registry: 123
• ICANN accredited Registrars: 123
Total Number of Names in the Registry Database
Average Daily Transactions
In millions, compared to Original Plan and New Projections (peak of 27.5M)
                    4Q99   1Q00   2Q00*  3Q00   4Q00
Plan of Record       1.2    1.4    1.6    1.9    2.2
1/1/00 Projection    2.8    4.5    5.2    6.0    7.0
Actual               2.8    5.6   19.4
* Qtr to 5/31
Total Transactions Summary
In millions
          Oct    Nov    Dec    Jan    Feb  March  April    May
Write     1.6    4.5    5.0    4.0    4.5    4.5    8.1    8.4
Query     0.4    3.3    4.4    2.6    3.8    4.2   15.4   18.0
Check    29.2   38.6   77.7  113.7  151.6  212.2  518.6  616.9
Growth           49%    88%    38%    33%    38%   145%    19%
Growth: change in the combined total versus the previous month.
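The percentage labels on this slide (49%, 88%, 38%, 33%, 38%, 145%, 19%) appear to be month-over-month growth in the combined Write + Query + Check total. The slide does not say so explicitly, so treat that reading as an assumption; the sketch below shows that it reproduces the labels from the monthly figures. The function name is illustrative.

```python
# Monthly transaction counts in millions, copied from the slide.
months = ["Oct", "Nov", "Dec", "Jan", "Feb", "March", "April", "May"]
write = [1.6, 4.5, 5.0, 4.0, 4.5, 4.5, 8.1, 8.4]
query = [0.4, 3.3, 4.4, 2.6, 3.8, 4.2, 15.4, 18.0]
check = [29.2, 38.6, 77.7, 113.7, 151.6, 212.2, 518.6, 616.9]

# Combined total per month (assumption: the growth labels refer to this sum).
totals = [w + q + c for w, q, c in zip(write, query, check)]


def month_over_month_growth(values):
    """Percentage change from each value to the next, rounded to whole percent."""
    return [round((cur / prev - 1) * 100) for prev, cur in zip(values, values[1:])]


growth = month_over_month_growth(totals)

for month, total, pct in zip(months[1:], totals[1:], growth):
    print(f"{month}: {total:.1f}M total, {pct:+d}% vs. previous month")
```

Check transactions dominate every month, so the growth in the total is driven almost entirely by check-domain traffic.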
Availability & Performance
• Service Level Agreement (SLA) allowances:
  – 8 hours total outage per month, 4 hours unplanned
  – 3 seconds average for check domain (excluding worst 5%)
  – 5 seconds average for add domain (excluding worst 5%)
• January observed performance:
  – 3.5 hours planned outage to implement governance issues, no unplanned
  – 600 ms per check domain, 2.5 seconds per add
• February observed performance:
  – No planned or unplanned outages
  – 700 ms per check domain, 2.6 seconds per add
Availability & Performance
• March observed performance:
  – Two 2 hour planned outages, 1.25 hour unplanned outage
  – 60 ms per check domain, 300 ms per add
• April observed performance:
  – 2.5 hours planned outage, no unplanned
  – 78.5 ms per check domain, 319.5 ms per add
• May observed performance:
  – 2 hours planned outage, no unplanned
  – 34.7 ms per check domain, 257.2 ms per add
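The SLA allowances quoted on these slides can be written as a small check. This is a sketch under stated assumptions, not the registry's actual measurement method: the slides do not say how the "worst 5%" of samples is counted, so the code rounds the number of discarded samples down, and all function and constant names are illustrative.

```python
import math

# SLA allowances as stated on the slide.
SLA_CHECK_MS = 3000           # average check domain, excluding worst 5%
SLA_ADD_MS = 5000             # average add domain, excluding worst 5%
SLA_TOTAL_OUTAGE_H = 8.0      # total outage hours per month
SLA_UNPLANNED_OUTAGE_H = 4.0  # unplanned outage hours per month


def average_excluding_worst(latencies_ms, worst_fraction=0.05):
    """Mean latency after discarding the slowest `worst_fraction` of samples.

    Assumption: the count of discarded samples is rounded down.
    """
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    drop = math.floor(len(ordered) * worst_fraction)
    kept = ordered[: len(ordered) - drop]
    return sum(kept) / len(kept)


def outages_within_sla(planned_h, unplanned_h):
    """True if a month's outage hours fit both SLA outage allowances."""
    return (planned_h + unplanned_h <= SLA_TOTAL_OUTAGE_H
            and unplanned_h <= SLA_UNPLANNED_OUTAGE_H)


# March, per the slide: two 2 hour planned outages and 1.25 hours unplanned.
march_ok = outages_within_sla(planned_h=4.0, unplanned_h=1.25)

# Hypothetical samples: 95 fast checks and 5 slow outliers.
samples = [60] * 95 + [5000] * 5
check_ok = average_excluding_worst(samples) <= SLA_CHECK_MS
```

With the hypothetical samples above, the five outliers are discarded before averaging, which is why a handful of slow requests does not by itself breach the allowance.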
A Root Performance - UDP Packets/Second
[Chart series: 5 Minute Average, 30 Minute Average]
A Root Performance - Drops & Overflows
[Chart series: Drops - 5 Minute Average, Overflows - 5 Minute Average]
J gTLD Performance - UDP Packets/Second
[Chart series: 5 Minute Average, 30 Minute Average]
M gTLD Performance - UDP Packets/Second
[Chart series: 5 Minute Average, 30 Minute Average]
The Infrastructure Problem
• SLA that incurs $500K/day outage and performance penalties
• Single shared database experiencing 30% - 90% per month OLTP growth
  – Heavyweight stored procedures
  – Sustained 50%-70% utilization with peaks to 100% … and no more easy software fixes
  – Increasing extract duration for zones, Whois, registrar extracts, 5 - 14 hours
• Immature or end-of-life HA options for E4500
• Sun, Veritas, EMC version and support issues
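To put the 30% - 90% monthly OLTP growth figures in perspective, the following sketch compounds them. It assumes a constant monthly rate, which the slide does not claim; the function names are illustrative.

```python
import math


def doubling_time_months(monthly_growth):
    """Months for load to double at a constant compound monthly growth rate."""
    return math.log(2) / math.log(1 + monthly_growth)


def growth_factor(monthly_growth, months):
    """Multiplier on load after `months` of constant compound growth."""
    return (1 + monthly_growth) ** months


for rate in (0.30, 0.90):
    print(f"{rate:.0%}/month: doubles in {doubling_time_months(rate):.1f} months, "
          f"x{growth_factor(rate, 12):.0f} after 12 months")
```

Even at the low end of the range, load doubles in roughly 2.6 months; at the high end it doubles in about 1.1 months.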
DB Server Evaluation
• Evaluated top Unix machines
  – Sun E10000, HP V2500, IBM S7A/S80
• Narrowed to E10000 and S7A/S80
• Conducted three-month live test of S7A/S80
  – Ported gateway and application servers to IBM Java environment
  – Created RRP path configuration
  – Demonstrated performance and availability (HA/CMP)
• Investigated impacts of E10K
  – Different administrative model
  – EMC integration issues
Definitive Results
• Excellent Java and C code portability
• S80 clear performance leader, benchmarks and real-world
  – Approximately 3 times the throughput per CPU vs. E10K
  – Noticeably improved Java performance (!)
• Robust HA implementation
• Complete 64-bit environment
• Native file system and volume management; excellent EMC integration
• Impressive and thorough support
  – Demonstrated appreciation for multi-vendor, mission critical computing
Scaling DNS
• Domain name resolutions on A Root
  – 4Q99 - 220M per day
  – 1Q00 - 430M per day
  – 2Q00 - 650M per day
  – 4Q00 - 1.5B per day, more?
• Need 64-bit machines to scale past 4GB/23M domain name wall
• Developing BIND extensions for high performance gTLD
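Some back-of-the-envelope arithmetic on the figures in this slide, as a sketch. Two assumptions that the slide does not state: "4GB" is read as a 32-bit address space of 2^32 bytes, and the daily resolution counts are spread evenly over the day (real traffic peaks higher). Names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# A Root resolutions per day, from the slide (4Q00 is the slide's projection).
resolutions_per_day = {
    "4Q99": 220e6,
    "1Q00": 430e6,
    "2Q00": 650e6,
    "4Q00": 1.5e9,
}


def average_per_second(per_day):
    """Average resolutions per second, assuming an even spread over the day."""
    return per_day / SECONDS_PER_DAY


for quarter, per_day in resolutions_per_day.items():
    print(f"{quarter}: about {average_per_second(per_day):,.0f} resolutions/second on average")

# Memory budget implied by the "4GB/23M domain name wall".
ADDRESS_SPACE_BYTES = 2 ** 32   # assumption: 4GB means a 32-bit address space
DOMAIN_WALL = 23_000_000
bytes_per_name = ADDRESS_SPACE_BYTES / DOMAIN_WALL
print(f"Implied in-memory budget: about {bytes_per_name:.0f} bytes per domain name")
```

Read this way, the wall corresponds to a budget of a little under 190 bytes of server memory per domain name, which is why a 64-bit address space is needed to hold a larger zone in memory.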
64-bit DNS Evaluation
• Engaged Unix vendors to aid with in-house evaluation of 64-bit mid-range Unix servers
  – HP N4000, IBM H70, Sun E3500
• E3500 eliminated early -- scale and 64-bit issues
• H70 within 15% of N4000, upcoming upgrade substantially faster
• Chose M80 as new root/gTLD platform
• Using E4500s as alternate platform and placeholder for UltraSPARC III generation
The Dot Problem
[Chart: resolutions per day; y-axis from 0 to 250,000,000]
Resolutions per day. A Root meltdown?
Dot Diagnosis and Fix
• Too much load for existing E450
• Qualified and put into production the [evaluation] H70
  – Greater than 60% increased throughput
  – Jump from 220M resolutions per day to over 400M
• Qualified and put into production an S80 as placeholder for upcoming M80 deployment
  – Greater than factor of three improvement over previous E450
• Tweaked TCP keepalive defaults and BIND select loop
• Filtered dynamic updates
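The slide does not say where or how dynamic updates were filtered (router, firewall, or name server configuration). As an illustration only of what identifies such traffic: a DNS dynamic update is a message whose header opcode is 5 (UPDATE, RFC 2136), stored in bits 11-14 of the 16-bit flags word that follows the message ID. The function names below are hypothetical.

```python
import struct

DNS_HEADER_LEN = 12   # fixed DNS header size in bytes (RFC 1035)
OPCODE_QUERY = 0      # standard query
OPCODE_UPDATE = 5     # dynamic update (RFC 2136)


def dns_opcode(packet: bytes) -> int:
    """Extract the 4-bit opcode from a raw DNS message."""
    if len(packet) < DNS_HEADER_LEN:
        raise ValueError("packet shorter than a DNS header")
    _message_id, flags = struct.unpack("!HH", packet[:4])
    return (flags >> 11) & 0xF


def is_dynamic_update(packet: bytes) -> bool:
    """True if the packet is a well-formed-enough DNS UPDATE message."""
    if len(packet) < DNS_HEADER_LEN:
        return False
    return dns_opcode(packet) == OPCODE_UPDATE


# Hand-built headers for illustration: ID, flags, then four section counts.
update_msg = struct.pack("!HHHHHH", 0x1234, OPCODE_UPDATE << 11, 1, 0, 0, 0)
query_msg = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)  # query, RD set
```

A filter built on this test would drop messages for which `is_dynamic_update` returns True and pass ordinary queries through to the name server.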
The New Dot
[Chart: A Root resolutions per day with H70; y-axis from 0 to 500,000,000]
Packet Drops
[Chart: Percent packets dropped, day of H70 deployment. Annotations: Deployed 11 a.m.; "Current" time (9 a.m. day after)]
Upcoming access - www.dnsentral.net