SARA Reken- en Netwerkdiensten
Dashboard update
Sander Boele
Reminder
Past work
Performance tweaks made the dashboard much faster.
Baseline OWDs still change every now and then.
MySQL has been tuned: it is now allowed to use more memory, and we've changed storage engines and some other parameters. This greatly improved site performance, which was getting sluggish as the DB grew. I have also tried changing the storage engine to archive_db, but the default MySQL version in CentOS has no support for this. I've also made a smaller table for the front page. The Ruby code has been optimized.
OWDs in the LHCOPN tend to move around a little, so baseline OWDs need to be monitored. Please check your baselines every now and then and drop me a line if you'd like to have them changed! (The next slide shows an example.)
Baselines that are off
This is an example of an off baseline. Maybe the link has moved to a different, slower circuit, or maybe NTP is out of sync?
A 15% deviation from the baseline shows orange, and if one of the UDP packets is lost in transit, I show a red status. Each minute the boxes all send 9 UDP packets to one another, so 1 of 9 lost = red.
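The colour rule described above can be sketched roughly as follows. This is only an illustration, not the dashboard's actual Ruby code: the function name, the "green" label for a healthy link, the use of an absolute (two-sided) deviation, and treating exactly 15% as orange are all assumptions.

```python
def link_status(owd_ms, baseline_ms, packets_received, packets_sent=9, threshold=0.15):
    """Return a status colour for one measurement interval (hypothetical helper)."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    # Any lost packet in the interval (e.g. 8 of 9 received) means red.
    if packets_received < packets_sent:
        return "red"
    # Relative deviation of the measured OWD from the baseline OWD.
    deviation = abs(owd_ms - baseline_ms) / baseline_ms
    if deviation >= threshold:
        return "orange"
    # "green" is an assumed name for the normal state.
    return "green"


if __name__ == "__main__":
    print(link_status(12.0, 10.0, 9))  # 20% above baseline, no loss
    print(link_status(10.0, 10.0, 8))  # one of nine packets lost
```

Checking loss before deviation means a lossy interval is always red, regardless of how close the OWD is to its baseline.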
The OWD is sometimes off because:
1) NTP sync is broken
2) a link is on a backup path
3) a link has moved to a slower/faster path
I cannot distinguish between these 3.
Another example
I'm not sure what happened here. Can anybody explain this?
To differentiate between NTP failures and real OWD drift or changes in network infrastructure, we really need to know the NTP sync status.
Current work (traceroute)
Match this:
From US-FNAL-CMS-HADES to NL-T1-HADES at commonTime: 2011-06-10 09:08:15 +0200 (1307689695)
22.214.171.124-126.96.36.199-188.8.131.52-184.108.40.206
The next logical step (well, it makes sense to me) would be to match traceroute data.
This is a little bit difficult because:
1) lots of unknowns
2) lots of labour to attribute a site to each IP address I see in the traces and cross-reference this with the routing matrix.
So if the OWD is off and I've verified with the traceroutes that I'm on a backup path, I can immediately give the cause of the off OWD in the dashboard.
Possible qualifications
Primary route
On backup path
On external path (via internet)
I was thinking about adding a little mark, 1 (for primary), 2 (for backup) or 3 (for external), inside each field of the table on the first page.
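A minimal sketch of how such a mark could be derived from a trace, assuming that for every site pair a set of primary-path hops and a set of backup-path hops is already known (that attribution is exactly the manual labour mentioned earlier). The function name, the two hop sets, and the example addresses (taken from the documentation ranges) are hypothetical.

```python
def qualify_route(hops, primary_hops, backup_hops):
    """Return 1 (primary), 2 (backup), 3 (external), or None if undecidable."""
    # Hops that did not answer carry no information, so ignore them.
    known = [hop for hop in hops if hop != "UNKNOWN"]
    if not known:
        return None
    # A hop that belongs to neither path is taken to mean an external route.
    if any(hop not in primary_hops and hop not in backup_hops for hop in known):
        return 3
    # A hop that only occurs on the backup path means we are on the backup.
    if any(hop in backup_hops and hop not in primary_hops for hop in known):
        return 2
    return 1


if __name__ == "__main__":
    primary = {"192.0.2.1", "192.0.2.2"}      # assumed primary-path hops
    backup = {"192.0.2.1", "198.51.100.9"}    # assumed backup-path hops
    print(qualify_route(["192.0.2.1", "UNKNOWN", "198.51.100.9"], primary, backup))
```

With many UNKNOWN hops the result rests on very little evidence, so a real version would probably want to report how many hops were actually matched.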
Traceroute problems
Lots of unknowns: 2 examples:
From CA-TRIUMF-HADES to CH-CERN-HADES at commonTime: 2011-06-10 09:08:15 +0200 (1307689695)
220.127.116.11-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-
From DE-KIT-HADES to US-FNAL-CMS-HADES at commonTime: 2011-06-10 09:27:02 +0200 (1307690822)
18.104.22.168-22.214.171.124-126.96.36.199-188.8.131.52-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-UNKNOWN-
Lots of unknowns appear when harvesting traceroute data. I harvest from the same server in Germany, that is, the measurement archive at DFN.
I guess this is a failure to implement good firewall policies at the sites; the sites should have received them with the HADES boxes. Have the sites received them?
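To put a number on how bad a given trace is, the share of UNKNOWN hops can be computed from the dash-separated hop string shown in the examples. This is a small sketch; the function name is made up, and it assumes the hop string has already been separated from the "From ... to ..." header line.

```python
def unknown_fraction(trace):
    """Fraction of hops in a dash-separated trace that are UNKNOWN."""
    # The harvested strings end with a trailing "-", so drop empty pieces.
    hops = [hop for hop in trace.strip().split("-") if hop]
    if not hops:
        return 0.0
    unknown = sum(1 for hop in hops if hop == "UNKNOWN")
    return unknown / len(hops)


if __name__ == "__main__":
    # Example addresses are from the documentation ranges, not real hops.
    print(unknown_fraction("192.0.2.1-UNKNOWN-UNKNOWN-198.51.100.7-"))
```

Sorting site pairs by this fraction would give a quick list of which sites to ask about their firewall policy first.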
Traceroute solution
For traceroute to work, the routers need to allow outgoing and incoming UDP traffic to/from the involved HADES interfaces on ports 60900-60930, as well as incoming ICMP TTL exceeded and port unreachable messages.
John, please convince them of this.