What we'll talk about
1) Process - releases and such (how we got to where we are)
2) Features and futures (Cool stuff and what it's good for)
Cadence
The 2.6.34 cycle began February 25, 2010
Since then:
  Four releases have been made (another almost ready)
  49,079 changesets have been merged
  ...from 2,826 developers, 326 employers
  1.3 million lines of code have been added
The process is working smoothly
Who supports this work
Volunteers       17.6%
Red Hat          11.7%
unknown           7.7%
Intel             6.7%
Novell            4.8%
IBM               3.7%
Nokia             2.3%
Consultants       2.2%
Texas Inst.       2.2%
Oracle            1.7%
AMD               1.6%
Samsung           1.5%
academics         1.4%
Fujitsu           1.4%
Renesas Tech.     1.4%
Pengutronix       1.3%
Google            1.2%
Broadcom          1.1%
Atheros           1.1%
Analog Devices    1.1%
Wolfson Micro     1.0%
New Dream Net     1.0%
The 2.6.21 version
Unknown          27%
Red Hat          14%
IBM               8%
Novell            7%
Linux Found.      5%
Hobbyists         5%
Intel             4%
Oracle            2%
Google            2%
SGI               2%
MIPS Tech.        1%
HP                1%
Consultants       1%
Nokia             1%
Astaro            1%
MontaVista        1%
Linux Networx     1%
Qlogic            1%
2.6.34
May 15, 2010 (9,443 changes, 1,151 developers)
Asynchronous suspend/resume
perf lock, perf Python scripting support
LogFS
Ceph distributed filesystem
2.6.35
Aug. 1, 2010 (9,801 changes, 1,188 developers)
perf kvm
Receive packet/flow steering
Memory compaction
Idle pattern detection
RAMoops
Btrfs direct I/O support
2.6.36
Oct. 20, 2010 (9,501 changes, 1,176 developers)
AppArmor security module
Wakeup counts
LIRC infrared drivers
New OOM killer
fanotify
Concurrency-managed workqueues
2.6.37
Jan 4, 2011 (11,446 changes, 1,276 developers)
VFS scalability work (inode_lock removal)
GFS2 is no longer “experimental”
Block I/O bandwidth controller
PPTP support
Basic pNFS support
Hugepage migration
Wakeup sources
Block layer barrier work
2.6.38
Mar. ??, 2011 (8,888 changes, 1,111 developers) (so far)
Per-session group scheduling
Dcache scalability work
Transmit packet steering
Batch discard
Transparent hugepages
Multitouch panel support
SCSI target subsystem
Btrfs: read-only snapshots and LZO compression
You are here
Stable updates
Mainline release is not the end of the story
Stable/longterm updates for:
  Serious bug fixes
  Simple hardware support (PCI IDs)
  Occasional backports
Currently-maintained stable kernels
2.6.27
  Long term, deep freeze mode
2.6.32
  Base of a number of “enterprise” distributions
2.6.35
  Embedded “flag version”
2.6.36, 2.6.37
  Recent mainline releases
What's coming?
2.6.39
(Merge window is still open)
O_PATH opens
Open by file handle
CLOCK_BOOTTIME
...
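To give a feel for one item on that list: CLOCK_BOOTTIME behaves like the monotonic clock, except that it keeps counting while the system is suspended. A minimal sketch, assuming Python 3.7+ on Linux; the helper name `suspended_seconds` is made up for this example and is not part of any kernel or library API.

```python
import time

def suspended_seconds():
    """Estimate time spent suspended since boot, or None if unsupported.

    CLOCK_MONOTONIC stops while the system is suspended; CLOCK_BOOTTIME
    keeps counting. Their difference approximates total suspend time.
    """
    boottime_id = getattr(time, "CLOCK_BOOTTIME", None)
    monotonic_id = getattr(time, "CLOCK_MONOTONIC", None)
    if boottime_id is None or monotonic_id is None:
        return None  # not Linux, or Python older than 3.7
    try:
        boottime = time.clock_gettime(boottime_id)
        monotonic = time.clock_gettime(monotonic_id)
    except OSError:
        return None  # kernel without CLOCK_BOOTTIME support
    # The two clocks are read one after the other, so clamp tiny
    # negative differences to zero.
    return max(0.0, boottime - monotonic)

if __name__ == "__main__":
    print("suspended seconds:", suspended_seconds())
```

On a machine that has never been suspended the result should be close to zero.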
A new version numbering scheme?
No.
(At least, not until we hit 2.6.42)
Hardware support and vendor participation
Good news: Broadcom releases an open driver
Qualcomm joins the Linux Foundation
Ralink starts submitting patches
Embedded flag kernel
On the other hand:
Embedded graphics remains a problem
GPL compliance is spotty
Power management
CPU power management works very well
Now working on memory, peripherals
Android code still not merged.
We do have an alternative:
  Wakeup counts (2.6.36)
  Wakeup sources (2.6.37)
  Currently unused
Power domains
Server or desktop PM is relatively simple
Newer systems less so
Dealing with complexity
Power domains
  Map power relationships on each system
  Used to make power management decisions
  2.6.39 (maybe)
Media controller subsystem
  Handle connections between media processors
  2.6.39 (probably)
Other things to watch for
ARM PAE support
  >4GB in your pocket
  2.6.39 maybe
Device tree support
  ...over time
Vast numbers of new drivers
  ...as always
Solid-state devices
SSD challenges
Optimizing I/O patterns
  Transfer sizes and alignment
Block I/O subsystem scalability
  100 I/O operations/second -> 100,000+ IOPS
Communication with the device
  TRIM/DISCARD operations
What will we do with that much fast memory?
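The "transfer sizes and alignment" point above can be sketched as simple arithmetic: a request that starts or ends in the middle of a device block forces the device to touch more than was asked for. The 4096-byte block size and both helper names are assumptions for illustration; real devices report their own sizes.

```python
def is_aligned(offset, length, block_size=4096):
    """True if an I/O request starts and ends on block boundaries."""
    return offset % block_size == 0 and length % block_size == 0

def align_request(offset, length, block_size=4096):
    """Widen a request so that it covers whole blocks.

    Returns the (offset, length) of the smallest aligned range that
    contains the original request.
    """
    start = (offset // block_size) * block_size
    end = -(-(offset + length) // block_size) * block_size  # round up
    return start, end - start
```

For example, a 4096-byte write at offset 512 is not aligned, and `align_request(512, 4096)` widens it to two full blocks starting at offset 0.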
SCSI targets
Linux as a SCSI device
Useful for storage arrays and such
Replaces STGT
Two choices:
  LIO (now in-tree)
  SCST (out-of-tree)
Dentry scalability
A directory entry (dentry) represents a name in the filesystem.
The dentry scalability patches
Remove dcache_lock
Use RCU for walking the dentry tree
Result: lockless file name lookup
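For readers unfamiliar with the dentry cache, here is a toy model of what a name lookup walks through: each dentry is one name under one parent, and a path is resolved one component at a time. This is only a sketch of the data structure. It does not model dcache_lock, RCU, or anything else about the kernel's locking, and the class and method names are invented for the example.

```python
class Dentry:
    """A toy directory entry: one name under one parent."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

class Dcache:
    """Toy cache mapping (parent, name) pairs to dentries."""
    def __init__(self):
        self.root = Dentry("/")
        self._table = {}

    def add(self, parent, name):
        """Cache a new name under the given parent dentry."""
        dentry = Dentry(name, parent)
        self._table[(parent, name)] = dentry
        return dentry

    def lookup(self, path):
        """Walk a path one component at a time; None on a cache miss."""
        current = self.root
        for name in path.strip("/").split("/"):
            if not name:
                continue  # empty component, e.g. when path is "/"
            current = self._table.get((current, name))
            if current is None:
                return None
        return current
```

Every step of that loop is a point where the real kernel once had to take a global lock; the patches above are about making that walk possible without one.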
Filesystems
Ext4
  Ready for production use
  Ongoing scalability work
  Occasional bug fixes
  Ext2/ext3 code removal?
Btrfs
  Almost there
  Needs a filesystem checker
  Remaining features
    Deduplication
    RAID 4/5 support
  Beginning deployment
    Default MeeGo filesystem
    Default for Fedora 16?
Others
  yaffs2
    Fast embedded filesystem
    2.6.39?
  xfs
    Continued evolution
  GFS2
    No longer experimental
    Merge with OCFS2?
Transparent huge pages
Linux uses 4096-byte pages (on most architectures)
The processor can deal with larger sizes
2MB is common
Virtual address translations
Address translation is complicated
The translation lookaside buffer
Caches address mappings
  -> Avoids that whole lookup process
The TLB tends to be small
  On this laptop: 128 instruction, 256 data
One 2MB huge page saves 511 TLB entries!
  (If all internal pages are used)
Thus: huge pages make the system go faster
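The arithmetic behind the 511 figure, plus a rough idea of how much memory the TLB can cover at each page size. The "reach" calculation is an idealization: it assumes every data TLB entry can hold a huge page mapping, which real processors often do not allow. Function names are made up for this sketch.

```python
BASE_PAGE = 4096               # bytes, the usual Linux page size
HUGE_PAGE = 2 * 1024 * 1024    # 2MB, the common huge page size
DATA_TLB_ENTRIES = 256         # the laptop figure quoted above

def tlb_entries_saved(huge_page=HUGE_PAGE, base_page=BASE_PAGE):
    """Entries saved when one huge page replaces its base pages."""
    return huge_page // base_page - 1

def tlb_reach(entries, page_size):
    """Bytes of address space the TLB can map at once."""
    return entries * page_size

print(tlb_entries_saved())                      # 511
print(tlb_reach(DATA_TLB_ENTRIES, BASE_PAGE))   # 1048576 (1MB)
print(tlb_reach(DATA_TLB_ENTRIES, HUGE_PAGE))   # 536870912 (512MB)
```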
Transparent huge pages
Linux has had hugetlbfs for years
  Fiddly, administration-heavy mechanism
THP makes huge pages “just happen”
  Not as fast as hugetlbfs
  But it works for everybody
  Merged for 2.6.38
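One way to see whether THP is active on a given machine is the sysfs file `/sys/kernel/mm/transparent_hugepage/enabled`, which lists the available modes with the current one in brackets. A small sketch, assuming a kernel built with THP support; the two function names are invented for the example.

```python
THP_ENABLED_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"

def parse_thp_mode(text):
    """Return the bracketed choice from e.g. 'always [madvise] never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None

def current_thp_mode(path=THP_ENABLED_PATH):
    """Read the active THP mode, or None if the file is not readable."""
    try:
        with open(path) as f:
            return parse_thp_mode(f.read())
    except OSError:
        return None  # no THP support, or not a Linux system

if __name__ == "__main__":
    print("THP mode:", current_thp_mode())
```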
Other memory management issues
Writeback
  One of our biggest performance problems
Hybrid memory techniques
KSM
Transcendent memory
...
Control groups
A means for grouping related processes
Are hierarchical
  Groups can contain other groups
Are inherited
  Children stay in their parent's group
Are associated with controllers
  Apply some policy to contained processes
Are old news
  ...merged in 2.6.24
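The three properties above (hierarchy, inheritance, controllers) fit in a few lines of toy code. This is a model of the concept only, not of the kernel interface: the class names are invented, and the `controllers` dictionary merely stands in for whatever policy a real controller applies.

```python
class Group:
    """A toy control group: a named node in a hierarchy."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.controllers = {}   # controller name -> policy value
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Slash-separated location of this group in the hierarchy."""
        if self.parent is None:
            return "/"
        prefix = self.parent.path().rstrip("/")
        return prefix + "/" + self.name

class Process:
    """A toy process that belongs to exactly one group."""
    def __init__(self, pid, group):
        self.pid = pid
        self.group = group

    def fork(self, child_pid):
        """Children start out in their parent's group."""
        return Process(child_pid, self.group)

root = Group("root")
browsers = Group("browsers", root)
tabs = Group("tabs", browsers)          # groups can contain other groups
tabs.controllers["cpu"] = "low share"   # illustrative policy value

parent = Process(100, tabs)
child = parent.fork(101)
print(child.group.path())               # /browsers/tabs
```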
Group CPU scheduling
Per-session group scheduling
Group scheduling is old
  ...but nobody was using it
Per-session group scheduling
  Makes it all “just work”
  Interactivity improvements result
  2.6.38
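Why grouping by session helps interactivity comes down to arithmetic. The sketch below compares the CPU share each task gets when all tasks compete as equals against the share it gets when each session is first given an equal slice. It assumes a single CPU, equal weights, tasks that are always runnable, and at least one task per session; the function names and the example sessions are invented.

```python
from fractions import Fraction

def per_task_share(sessions):
    """CPU share of each task when all tasks compete equally.

    `sessions` maps a session name to its number of runnable tasks.
    """
    total = sum(sessions.values())
    return {name: Fraction(1, total) for name in sessions}

def per_session_share(sessions):
    """CPU share of each task when sessions are scheduled as groups.

    Each session gets an equal slice, divided among its own tasks.
    """
    count = len(sessions)
    return {name: Fraction(1, count * tasks)
            for name, tasks in sessions.items()}

# One desktop task competing with a 20-job build in another session.
sessions = {"desktop": 1, "build": 20}
print(per_task_share(sessions)["desktop"])     # 1/21
print(per_session_share(sessions)["desktop"])  # 1/2
```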
Other control group stuff
Expanded group scheduling
  Systemd/gnome-session integration
  Nice separation of tasks
  Bandwidth control
Memory controller
  More focused reclaim
Block I/O controller
  Hierarchical group I/O scheduling
  Asynchronous I/O control
...
Deadline scheduling
Of interest to real time, media communities
Chilly reception at the 2010 kernel summit
  Needs better use cases
Work continues
  We'll have it someday
Realtime Preemption patch set
We'll have that someday too!
Networking
Expand initial congestion window
  2.6.39
DFS compliance
  2.6.40+
Ongoing issues:
  IPv6
  Bufferbloat
  Scalability
Photo: Arenamontanus
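On the initial congestion window item above: a larger starting window matters mostly for short transfers, which can then finish in fewer round trips. The sketch below counts round trips in an idealized slow start with no loss, a window that doubles every round trip, and an unlimited receiver. The window sizes 3 and 10 are illustrative values chosen for the example, not figures from these slides.

```python
def round_trips(segments, initial_window):
    """Round trips needed to send `segments` in idealized slow start.

    Assumes no loss, a window that doubles every round trip, and a
    receiver that never limits the sender.
    """
    if initial_window < 1:
        raise ValueError("initial_window must be at least 1")
    if segments <= 0:
        return 0
    window, sent, trips = initial_window, 0, 0
    while sent < segments:
        sent += window      # send a full window this round trip
        window *= 2         # slow start doubles the window
        trips += 1
    return trips

print(round_trips(30, 3))   # 4
print(round_trips(30, 10))  # 2
```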
Security
Stackable modules
  2.6.40+
Hardening
  An area of increased focus
User namespaces
  Unprivileged container creation
Photo: CarbonNYC
Tracing and visibility
Continued ftrace and perf work
  Emphasis on usability and unification
  Addition of tracepoints
Improved user-space tracing/debugging
  Someday
Still outside:
  SystemTap
  LTTng
Things not discussed
Thousands of bug fixes
Virtualization
Additional architectures
Documentation
RAS improvements
CPU isolation
Interrupt layer rework
Clock enhancements
MM preemptability
Regression tracking
BKL removal
...
Questions?