Date post: | 26-Jun-2015 |
Category: | Technology |
Upload: | david-strauss |
PHP at Density and Scale...with security and consistent performance
About Me
● Four Kitchens
● Drupal.org
● Pressflow
● Pantheon
● systemd
Broadly Defining Security
Your data...
1. Is accessible to the right people (access)
2. Isn’t accessible to anyone else (access)
3. Is usable (quality of service)
Topics
● Performance
  ○ Socket activation
  ○ Automount/autofs
  ○ cgroups
  ○ “Customer Experience Monitor”
  ○ Migration
● Security
  ○ Users
  ○ Namespaces
  ○ Defense-in-depth
  ○ Non-disruptive fixes
Challenge: PHP-FPM Overhead
● Using a full PHP-FPM instance per stack
  ○ Isolated opcode cache space
  ○ Defense-in-depth against PHP issues
  ○ Low-impact reconfiguration
● Idle PHP-FPMs take ~0.5% of a core each
  ○ At 10k dense, that’s over six cores
● Initial solution used error capture in nginx
  ○ Masked real failures to connect to PHP-FPM
  ○ Slower than necessary
  ○ Production use of HTTP 418 (arguably a bonus)
Traditional server sockets: overview
[Diagram: a client connects to TCP port 80, where one nginx daemon listens; a second nginx daemon listens on TCP port 81.]
If you want a service available, the daemon has to be running.
Socket activation: overview
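[Diagram: systemd holds the listeners on TCP ports 80 and 81; a client connects to TCP port 80; nginx receives its socket as fd=3.]
Only the socket held by systemd has to exist for the service to be available.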
Socket activation: details
● systemd squats on all listeners
  ○ Looks for incoming traffic with EPOLL
  ○ Starts the services/containers on-demand
  ○ Passes socket to daemon as fd=3+
● Not a proxy (same performance)
● No client awareness
● No CPU or memory overhead when idle
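
The daemon side of the hand-off above can be sketched in a few lines. This is a minimal illustration in Python, not the nginx or PHP-FPM implementation: it assumes the systemd convention that inherited listeners start at descriptor 3 and are announced through the LISTEN_PID and LISTEN_FDS environment variables. The function names listen_fds and serve_once are hypothetical.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor under the systemd convention


def listen_fds(environ=None, pid=None):
    """Return the descriptor numbers passed by the service manager, or []."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # the variables were meant for a different process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed: not socket-activated
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def serve_once(fd):
    """Adopt an already-listening socket and answer a single connection."""
    # The socket was bound and put into listening state by the service
    # manager, so the daemon never calls bind() or listen() itself.
    with socket.socket(fileno=fd) as listener:
        conn, _addr = listener.accept()
        with conn:
            conn.sendall(b"hello from an activated service\n")


if __name__ == "__main__":
    fds = listen_fds()
    if fds:
        serve_once(fds[0])
    else:
        print("not socket-activated; no inherited listeners found")
```

Because the listener outlives the daemon, a process like this can exit when idle and the next connection simply queues on the socket until the service is started again.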
Socket activation: Pantheon’s use
● nginx and PHP-FPM
● MariaDB soon
  ○ Using an alternative now
● Allows 90%+ containers to be idle
● Makes bootup sensible
● Reconfiguration pattern is stop, not restart
Socket Activation Demo
Demoed this at NYC Camp a few weeks ago
Automount/autofs
● Like socket activation for file system mounts
  ○ Kernel squats on mount path and looks for traffic
  ○ Brings up file mount lazily
● Used for FuseDAV (Valhalla client)
Automount Demo
Challenge: Resource Availability
● Per-site load isn’t predictable
● Different sites compete for resources
  ○ Between customers
  ○ Among customers’ own sites
● Traditional prioritization isn’t adequate
  ○ VMs are too heavyweight
  ○ Tools like “nice” can cause starvation
  ○ Generally want burstability
cgroups
● Many options
  ○ Pantheon uses CPUShares and BlockIOWeight
● Keeps things fair under contention
  ○ Kind of like adding purple ropes when people are queueing
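
The "fair under contention" behavior of share weights can be illustrated with a small model. This is a simplified sketch of weighted fair sharing, not the kernel scheduler: it assumes each group has a positive weight (as with CPUShares) and a CPU demand in cores, and the function name allocate_cpu and the site names are hypothetical. Groups that want less than their proportional share keep only what they use, and the leftover is split among the busy groups by weight, which is what lets a site burst when its neighbors are idle.

```python
def allocate_cpu(groups, cores):
    """Split `cores` among groups by weight, capped at each group's demand.

    `groups` maps a name to a (weight, demand_in_cores) pair; weights must
    be positive. This models weighted max-min fairness only.
    """
    allocation = {}
    active = dict(groups)
    remaining = float(cores)
    while active:
        total_weight = sum(weight for weight, _ in active.values())
        # Groups whose demand fits inside their proportional share.
        satisfied = [
            name for name, (weight, demand) in active.items()
            if demand <= remaining * weight / total_weight
        ]
        if not satisfied:
            # Everyone left is contending: divide strictly by weight.
            for name, (weight, _) in active.items():
                allocation[name] = remaining * weight / total_weight
            break
        for name in satisfied:
            _, demand = active.pop(name)
            allocation[name] = demand
            remaining -= demand
    return allocation


if __name__ == "__main__":
    busy = {"site-a": (1024, 4.0), "site-b": (1024, 4.0), "site-c": (2048, 4.0)}
    print(allocate_cpu(busy, cores=4))
    mostly_idle = dict(busy, **{"site-a": (1024, 0.1)})
    print(allocate_cpu(mostly_idle, cores=4))
```

With all three sites saturated on four cores, the split is 1, 1, and 2 cores. When site-a drops to 0.1 cores of demand, the other two absorb the slack in a 1:2 ratio (roughly 1.3 and 2.6 cores).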
Contention with cgroups Demo
Customer Experience Monitor
● Runs a representative Drupal site on every container host
● Reports scores to the API and monitoring
● Influences migration and container placement
Migration
● At density, rebalancing is important
● Keep state lightweight
  ○ No OS
  ○ No runtime
● Mutiny: migration as replication + promotion
Challenge: Security Isolation
● Many users
● One kernel
● VMs too heavyweight
● Users run their own code
● Can’t betray expectations
  ○ Many users develop locally and push code
  ○ Some customers import existing, working sites
Isolation for security
● Users
● Namespaces
● Seccomp filters
Defense in depth
● Application
  ○ Drupal
● Runtime
  ○ nginx, PHP-FPM, FuseDAV
● Container: “binding” certificate
  ○ Linux user, namespaces, etc.
● Container host: “endpoint” certificate
  ○ Only trusted for the containers assigned
● Platform: root certificate
Challenge: Security Responses
● Traditional approach too big a hammer
  ○ Rebooting hundreds of hosts with 10k+ containers each would be a fail-over storm
  ○ Basic customers don’t have fail-over
  ○ Not going to pack it less dense
● Customers can run own code
  ○ May load executables and libraries themselves
Non-disruptive fixes
● Kernel upgrades via migration
● Rolling daemon and library upgrades
  ○ Heartbleed
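
One way to decide which daemons still need a restart after a library upgrade is to look for processes that keep the old, replaced file mapped. The sketch below is an illustration under assumptions, not necessarily the approach shown in the demo: it relies on Linux marking unlinked mapped files with a " (deleted)" suffix in /proc/<pid>/maps, and the function names and the default "libssl" match string are hypothetical.

```python
import os

DELETED_SUFFIX = " (deleted)"


def stale_libraries(maps_text, needle="libssl"):
    """Return mapped paths containing `needle` whose file was replaced on disk."""
    stale = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, device, inode, then optional path.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a path
        path = fields[5].strip()
        if path.endswith(DELETED_SUFFIX) and needle in path:
            stale.add(path[: -len(DELETED_SUFFIX)])
    return sorted(stale)


def processes_needing_restart(needle="libssl", proc_root="/proc"):
    """Scan every visible process and list PIDs still using a replaced library."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "maps")) as handle:
                if stale_libraries(handle.read(), needle):
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or its maps are not readable
    return sorted(pids)
```

Combined with socket activation, the affected daemons can then be stopped rather than restarted: the listener stays open, and the next request starts a fresh process that loads the patched library.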
Heartbleed Fix Demo