Date posted: 06-May-2015
Category: Technology
Uploaded by: bcantrill
DTrace in the Non-global Zone
Bryan Cantrill
SVP Engineering, Joyent
DTrace and zones: Fraternal twins
• DTrace and zones were developed in parallel during development of Solaris 10
• DTrace integrated (September 2003) before zones (early 2004)
• When zones integrated, the priority was making DTrace in the global zone be able to meaningfully instrument non-global zones
• DTrace in the non-global zone was hard — and a lower priority than other work on both technologies
DTrace and zones: Basic functionality
• In 2006, Dan Price (with help from Adam Leventhal and Jonathan Adams) added initial support for DTrace in the non-global zone
• Allowed use of syscall provider, pid provider and (in a deranged, broken way) the profile provider
• This was significant work: required modifications to both the zones privilege model and the DTrace privilege model
• For example, required an implicit predicate on syscall and profile probes
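One way to picture the implicit predicate: before a syscall or profile probe runs any actions for a non-global consumer, the framework checks that the thread that hit the probe is running in the consumer's own zone. The following is a minimal C sketch of that idea only; `consumer_t`, `probe_fires_for` and `GLOBAL_ZONE_ID` are hypothetical names for illustration, not the kernel's actual identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not the illumos source. */
#define GLOBAL_ZONE_ID 0

typedef struct consumer {
    int zone_id;    /* zone in which the DTrace consumer is running */
} consumer_t;

/*
 * Implicit predicate: a consumer in the global zone sees every firing;
 * a consumer in a non-global zone sees only firings that originate from
 * threads in its own zone.
 */
static bool
probe_fires_for(const consumer_t *c, int current_thread_zone_id)
{
    assert(c != NULL);
    if (c->zone_id == GLOBAL_ZONE_ID)
        return (true);
    return (c->zone_id == current_thread_zone_id);
}
```

The user never writes this check; it behaves as if a zone-matching predicate had been added to every enabling made from the non-global zone.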
DTrace and zones in SmartOS
• As the worldʼs heaviest user of zones, we at Joyent ran into (and fixed) a number of annoying bugs:
• USDT probes from the non-global were not properly being enabled in the global zone (illumos#908)
• Tick and profile probes did not properly fire when used in the non-global zone (illumos#1456)
• Fixing the latter required an extension of the DTrace privilege model: introduced a notion of restricted operation in which args could not be referenced
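Restricted operation can be sketched as a per-firing flag that lets the probe fire but turns any reference to its arguments into an error. The C model below is an assumption-laden illustration: `probe_ctx_t`, `probe_arg` and the choice of returning 0 on a disallowed access are hypothetical and are not taken from the DTrace source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "restricted operation"; names are illustrative. */
typedef struct probe_ctx {
    bool     restricted;    /* enabling lacks privilege to see arguments */
    bool     fault;         /* set when an action does something disallowed */
    uint64_t args[5];       /* probe arguments arg0..arg4 */
} probe_ctx_t;

/*
 * Read probe argument ndx. Under restricted operation the probe still
 * fires, but a reference to its arguments is flagged as a fault and
 * yields 0 rather than the real value.
 */
static uint64_t
probe_arg(probe_ctx_t *ctx, unsigned ndx)
{
    assert(ctx != NULL && ndx < 5);
    if (ctx->restricted) {
        ctx->fault = true;
        return (0);
    }
    return (ctx->args[ndx]);
}
```

The point of the design is that the firing itself (and therefore counting, timestamps, and per-thread state) stays useful even when the arguments are off limits.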
DTrace and zones in SmartOS
• Other (very) annoying issues still lurked:
• Inability to read “cpu” in the non-global zone
• Inability to read any fields from “curlwpsinfo” and “curpsinfo”, especially “pr_dmodel”
• Inability to read the “fds[]” array
• Failure mode highly obnoxious:
    [my-non-global-zone ~]# dtrace -n BEGIN'{trace(curpsinfo->pr_psargs)}'
    dtrace: description 'BEGIN' matched 1 probe
    dtrace: error on enabled probe ID 1 (ID 1: dtrace:::BEGIN): invalid kernel access in action #1 at DIF offset 44
Divide and conquer
• curlwpsinfo and curpsinfo are translators over the current thread (“kthread_t”) and the current process (“proc_t”), respectively
• Importantly, the state contained in oneʼs own kthread_t and proc_t:
• Is safe to read while executing (threads cannot disappear out from under themselves)
• Does not represent potential privilege escalation
• This can be fixed by simply allowing the loads where one has privileges to the current process!
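The "allow the loads" fix amounts to a range check: a load is permitted when the consumer has privileges to the current process and the address range falls entirely inside the current thread's or current process's own structure. A minimal C sketch follows, using hypothetical stand-in types (`fake_thread_t`, `fake_proc_t`) and a hypothetical `can_load` helper; the real kernel check may exclude particular fields and differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's proc_t and kthread_t. */
typedef struct fake_proc {
    char opaque[256];
} fake_proc_t;

typedef struct fake_thread {
    char         opaque[128];
    fake_proc_t *procp;     /* owning process */
} fake_thread_t;

/* True if [addr, addr + sz) lies entirely inside [base, base + len). */
static bool
in_range(uintptr_t addr, size_t sz, uintptr_t base, size_t len)
{
    return (sz <= len && addr >= base && addr - base <= len - sz);
}

/*
 * Permit a load only if the caller has privileges to the current process
 * and the bytes being read belong to the current thread or its process.
 */
static bool
can_load(const fake_thread_t *cur, bool has_proc_priv, const void *p,
    size_t sz)
{
    uintptr_t addr = (uintptr_t)p;

    assert(cur != NULL && cur->procp != NULL);
    if (!has_proc_priv)
        return (false);
    return (in_range(addr, sz, (uintptr_t)cur, sizeof (*cur)) ||
        in_range(addr, sz, (uintptr_t)cur->procp, sizeof (*cur->procp)));
}
```

Because the check is on the address range rather than on the expression that produced it, it applies equally to translators such as curpsinfo and to hand-written dereferences of curthread.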
fds[]: A magic bullet?
• Somehow, I convinced myself that the problem with fds[] was the translator that translates the member accesses into kernel accesses:
    inline fileinfo_t fds[int fd] = xlate <fileinfo_t> (
        fd >= 0 && fd < curthread->t_procp->p_user.u_finfo.fi_nfiles ?
        curthread->t_procp->p_user.u_finfo.fi_list[fd].uf_file : NULL);
• If the problem was the static translators, the solution must be dynamic translators — a(n in)famously unimplemented feature of DTrace!
• After dtrace.conf(12), I realized that the expression was orthogonal to the fact that the in-kernel implementation must not allow privilege escalation
fds[]: No magic bullets
• Focusing on the implementation allows one to consider the specifics of the fds[] case
• Helped by the fact that the fi_list implementation uses memory retiring for scalability of file descriptor lookups: the array is only freed upon process exit
• Assures that oneʼs own fi_list is always pointing to memory that is (or was) an array of uf_entry_t
• Leaves the file_t itself, which can be freed during probe context (specifically, by another thread in the same process)
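Memory retiring can be illustrated with a table that, when it grows, keeps the superseded array on a "retired" list instead of freeing it, so a stale pointer to an old array still refers to valid entry memory until the process exits. The C sketch below is a simplified userland model; `fdtable_t`, `fdtable_grow` and `fdtable_exit` are hypothetical names and the real fi_list code also handles locking, which is omitted here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of a descriptor table with retiring. */
typedef struct entry {
    void *file;
} entry_t;

typedef struct retired {
    entry_t        *list;
    struct retired *next;
} retired_t;

typedef struct fdtable {
    entry_t   *list;        /* current array */
    int        nfiles;      /* number of slots in list */
    retired_t *retired;     /* superseded arrays, kept until exit */
} fdtable_t;

/* Grow to n slots; the old array is retired, not freed. 0 on success. */
static int
fdtable_grow(fdtable_t *t, int n)
{
    entry_t *nl;
    retired_t *r;

    assert(t != NULL && n > t->nfiles);
    nl = calloc((size_t)n, sizeof (entry_t));
    r = malloc(sizeof (retired_t));
    if (nl == NULL || r == NULL) {
        free(nl);
        free(r);
        return (-1);
    }
    if (t->nfiles > 0)
        memcpy(nl, t->list, (size_t)t->nfiles * sizeof (entry_t));
    r->list = t->list;      /* readers of the old array stay safe */
    r->next = t->retired;
    t->retired = r;
    t->list = nl;
    t->nfiles = n;
    return (0);
}

/* Process exit: the only point at which any array is actually freed. */
static void
fdtable_exit(fdtable_t *t)
{
    assert(t != NULL);
    while (t->retired != NULL) {
        retired_t *r = t->retired;
        t->retired = r->next;
        free(r->list);
        free(r);
    }
    free(t->list);
    t->list = NULL;
    t->nfiles = 0;
}
```

The trade-off is a bounded amount of wasted memory per process in exchange for lookups (and, here, probe-context reads) that never touch freed array memory.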
Dealing with file_t
• We can deal with this by forcing everyone out of probe context after a file_t has been removed from the uf_entry_t, but before it is freed
• This is done by issuing a dtrace_sync() — a synchronous (empty) cross-call to all CPUs
• This is expensive, and required answering an important question: just how hot is the closef() path, anyway?
• By instrumenting our production cloud, we could answer this concisely: closef() is pretty damned hot (> 5,000/second on some machines!)
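The ordering that makes this safe is: unpublish the pointer, wait for every CPU to leave probe context, and only then free. A single-threaded C sketch of that ordering follows. `model_sync` is a hypothetical stand-in that merely counts calls (the real dtrace_sync() is a cross-call to all CPUs), and `file_model_t`, `slot_t` and `model_close` are illustrative names rather than kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for file_t and uf_entry_t. */
typedef struct file_model {
    int flags;
} file_model_t;

typedef struct slot {
    file_model_t *file;
} slot_t;

static int sync_calls;      /* how many times the barrier was issued */

/*
 * Stand-in for dtrace_sync(). In the kernel this returns only once every
 * CPU has left probe context; here it only records that it was called.
 */
static void
model_sync(void)
{
    sync_calls++;
}

/* Close path: unpublish, then barrier, then free. */
static void
model_close(slot_t *s)
{
    file_model_t *fp;

    assert(s != NULL);
    fp = s->file;
    if (fp == NULL)
        return;
    s->file = NULL;         /* 1. new probe firings can no longer find fp */
    model_sync();           /* 2. firings that already loaded fp finish */
    free(fp);               /* 3. no one in probe context can hold fp now */
}
```

Swapping steps 2 and 3, or skipping step 1, would reintroduce the window in which a probe could dereference freed memory.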
Adding getf()
• To track when fds[] was active in the non-global zone, we added a getf() subroutine (ht: ken)
• Allows us to issue the sync only when we have a closef() from a non-global zone using fds[]
• Had to take the final step of cleaning up the path output to strip off the zone path from the file name (as a cleanliness issue, not a security issue)
• De-mo, de-mo, de-mo!
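The gating described above can be sketched as a per-zone count of active getf() users: the close path pays for the sync only when the closing process is in a non-global zone that currently has such an enabling. This C model is an assumption for illustration; `zone_model_t`, `getf_enable`, `getf_disable` and `close_needs_sync` are hypothetical names, and the kernel's actual bookkeeping may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names do not come from the illumos source. */
#define GLOBAL_ZONE_ID 0

typedef struct zone_model {
    int      id;
    unsigned getf_users;    /* active enablings in this zone using getf() */
} zone_model_t;

/* Called when an enabling that uses getf() is created in zone z. */
static void
getf_enable(zone_model_t *z)
{
    assert(z != NULL);
    z->getf_users++;
}

/* Called when such an enabling is torn down. */
static void
getf_disable(zone_model_t *z)
{
    assert(z != NULL && z->getf_users > 0);
    z->getf_users--;
}

/* Decide whether a close from zone z must issue the expensive sync. */
static bool
close_needs_sync(const zone_model_t *z)
{
    assert(z != NULL);
    return (z->id != GLOBAL_ZONE_ID && z->getf_users > 0);
}
```

With this gate, the hot closef() path costs a single check in the common case, and only zones actively tracing with fds[] pay for the cross-call.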
sched and proc providers
• With fds[] done, focus turned to the only meaningful impediment to DTrace in the non-global zone: enabling the sched and proc providers
• Recall the restricted operation introduced for the profile provider in the non-global zone...
• Used this to have limited (non-global) DTrace privileges imply restricted operation for some SDT providers
• Thanks to the curlwpsinfo/curpsinfo work, these providers can be meaningfully used without access to arguments
Thank you.
For more information visit www.joyent.com or www.smartos.org