Post on 01-Sep-2018

The kernel report

Jonathan Corbet
LWN.net

corbet@lwn.net

What we'll talk about

1) Process - releases and such (how we got to where we are)

2) Features and futures (Cool stuff and what it's good for)

Cadence

The 2.6.34 cycle began February 25, 2010

Since then:
  Four releases have been made (another almost ready)
  49,079 changesets have been merged
  ...from 2,826 developers, 326 employers
  1.3 million lines of code have been added

The process is working smoothly

Who supports this work

Volunteers 17.6%
Red Hat 11.7%
unknown 7.7%
Intel 6.7%
Novell 4.8%
IBM 3.7%
Nokia 2.3%
Consultants 2.2%
Texas Inst. 2.2%
Oracle 1.7%
AMD 1.6%
Samsung 1.5%
academics 1.4%
Fujitsu 1.4%
Renesas Tech. 1.4%
Pengutronix 1.3%
Google 1.2%
Broadcom 1.1%
Atheros 1.1%
Analog Devices 1.1%
Wolfson Micro 1.0%
New Dream Net 1.0%

The 2.6.21 version

Unknown 27%
Red Hat 14%
IBM 8%
Novell 7%
Linux Found. 5%
Hobbyists 5%
Intel 4%
Oracle 2%
Google 2%
SGI 2%
MIPS Tech. 1%
HP 1%
Consultants 1%
Nokia 1%
Astaro 1%
MontaVista 1%
Linux Networx 1%
Qlogic 1%

2.6.34

May 15, 2010 (9,443 changes, 1,151 developers)

Asynchronous suspend/resume
perf lock, perf Python scripting support
LogFS
Ceph distributed filesystem

2.6.35

Aug. 1, 2010 (9,801 changes, 1,188 developers)

perf kvm
Receive packet/flow steering
Memory compaction
Idle pattern detection
RAMoops
Btrfs direct I/O support

2.6.36

Oct. 20, 2010 (9,501 changes, 1,176 developers)

AppArmor security module
Wakeup counts
LIRC infrared drivers
New OOM killer
fanotify
Concurrency-managed workqueues

2.6.37

Jan 4, 2011 (11,446 changes, 1,276 developers)

VFS scalability work (inode_lock removal)
GFS2 is no longer “experimental”
Block I/O bandwidth controller
PPTP support
Basic pNFS support
Hugepage migration
Wakeup sources
Block layer barrier work

2.6.38

Mar. ??, 2011 (8,888 changes, 1,111 developers) (so far)

Per-session group scheduling
Dcache scalability work
Transmit packet steering
Batch discard
Transparent hugepages
Multitouch panel support
SCSI target subsystem
Btrfs: read-only snapshots and LZO compression

You are here
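As a quick cross-check, the per-release changeset counts on the slides above can be added up and compared with the 49,079 total quoted for the period since the 2.6.34 cycle began. This is only a sketch: the numbers are copied from the slides, and the variable names are made up for illustration.

```python
# Per-release figures copied from the slides: (changesets, developers).
releases = {
    "2.6.34": (9443, 1151),
    "2.6.35": (9801, 1188),
    "2.6.36": (9501, 1176),
    "2.6.37": (11446, 1276),
    "2.6.38": (8888, 1111),  # "so far" at the time of the talk
}

# Changesets are distinct per release, so they can simply be summed.
total_changesets = sum(changes for changes, _ in releases.values())
print(total_changesets)

# Developer counts overlap between releases, so summing them would
# overcount; the largest single-release figure is all we can read off.
largest_release_team = max(devs for _, devs in releases.values())
print(largest_release_team)
```

The changeset sum matches the 49,079 figure from the "Cadence" slide. The 2,826 developer total there counts each person once across all releases, which is why it is far smaller than the sum of the per-release developer counts.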

Stable updates

Mainline release is not the end of the story

Stable/longterm updates for:
  Serious bug fixes
  Simple hardware support (PCI IDs)
  Occasional backports

Currently-maintained stable kernels

2.6.27
  Long term, deep freeze mode

2.6.32
  Base of a number of “enterprise” distributions

2.6.35
  Embedded “flag version”

2.6.36, 2.6.37
  Recent mainline releases

What's coming?

2.6.39

(Merge window is still open)

O_PATH opens
Open by file handle
CLOCK_BOOTTIME
...

A new version numbering scheme?

No.

(At least, not until we hit 2.6.42)

Hardware support and vendor participation

Good news: Broadcom releases an open driver

Qualcomm joins the Linux Foundation

Ralink starts submitting patches

Embedded flag kernel

On the other hand:

Embedded graphics remains a problem

GPL compliance is spotty

Power management

CPU power management works very well

Now working on memory, peripherals

Power management

Android code still not merged.

We do have an alternative:
  Wakeup counts (2.6.36)
  Wakeup sources (2.6.37)
  Currently unused

Power domains

Server or desktop PM is relatively simple

Newer systems less so

Dealing with complexity

Power domains
  Map power relationships on each system
  Used to make power management decisions
  2.6.39 (maybe)

Media controller subsystem
  Handle connections between media processors
  2.6.39 (probably)

Other things to watch for

ARM PAE support
  >4GB in your pocket
  2.6.39 maybe

Device tree support
  ...over time

Vast numbers of new drivers
  ...as always

Solid-state devices

SSD challenges

Optimizing I/O patterns
  Transfer sizes and alignment

Block I/O subsystem scalability
  100 I/O operations/second -> 100,000+ IOPS

Communication with the device
  TRIM/DISCARD operations
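To put the scalability jump in perspective, the sketch below converts the two I/O rates from the slide into an average time budget per request, and adds a simple alignment check of the kind the first item alludes to. The function names are hypothetical and the 4096-byte block size is an assumed example value, not something stated in the talk.

```python
def per_request_budget_us(iops):
    """Average time per request, in microseconds, at a given IOPS rate."""
    return 1_000_000 / iops


def is_aligned(offset, length, block_size=4096):
    """True if a transfer starts and ends on a block boundary.

    block_size=4096 is an assumed value; real devices vary.
    """
    return offset % block_size == 0 and length % block_size == 0


disk_budget = per_request_budget_us(100)      # rotating-disk rate from the slide
ssd_budget = per_request_budget_us(100_000)   # SSD rate from the slide
print(disk_budget, ssd_budget)

print(is_aligned(8192, 4096))  # starts and ends on a 4096-byte boundary
print(is_aligned(512, 4096))   # starts mid-block
```

At 100 operations per second the block layer has about 10 milliseconds per request; at 100,000 IOPS that shrinks to roughly 10 microseconds, so per-request overhead that was once negligible starts to matter.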

What will we do with that much fast memory?

SCSI targets

Linux as a SCSI device

Useful for storage arrays and such

Replaces STGT

Two choices:
  LIO (now in-tree)
  SCST (out-of-tree)

Dentry scalability

A directory entry (dentry) represents a name in the filesystem.

The dentry scalability patches

Remove dcache_lock
Use RCU for walking the dentry tree

Result: lockless file name lookup

Filesystems

Ext4

Ready for production use

Ongoing scalability work

Occasional bug fixes

Ext2/ext3 code removal?

Filesystems

Btrfs

Almost there

Needs a filesystem checker

Remaining features
  Deduplication
  RAID 4/5 support

Beginning deployment
  Default MeeGo filesystem
  Default for Fedora 16?

Filesystems

Others

yaffs2
  Fast embedded filesystem
  2.6.39?

xfs
  Continued evolution

GFS2 no longer experimental
  Merge with OCFS2?

Transparent huge pages

Linux uses 4096-byte pages (on most architectures)

Transparent huge pages

The processor can deal with larger sizes

2MB is common

Virtual address translations

Address translation is complicated

The translation lookaside buffer

Caches address mappings
  -> Avoids that whole lookup process

The TLB tends to be small
  On this laptop: 128 instruction, 256 data

One 2MB huge page saves 511 TLB entries!
  (If all internal pages are used)

Thus: huge pages make the system go faster
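The 511 figure follows directly from the page sizes on the earlier slides, and the same arithmetic gives a rough idea of how much memory the TLB can cover. The sketch below works through both. It assumes, for simplicity, that every one of the 256 data TLB entries could hold a huge-page mapping, which real processors often do not allow; the names are made up for illustration.

```python
SMALL_PAGE = 4096             # bytes, from the slides
HUGE_PAGE = 2 * 1024 * 1024   # bytes, the common 2MB size from the slides
DATA_TLB_ENTRIES = 256        # "on this laptop", per the slide

# One huge page covers this many small pages...
pages_per_huge = HUGE_PAGE // SMALL_PAGE
# ...and needs one TLB entry instead of that many.
entries_saved = pages_per_huge - 1


def tlb_reach(entries, page_size):
    """Bytes of memory addressable without a TLB miss (idealized)."""
    return entries * page_size


reach_small = tlb_reach(DATA_TLB_ENTRIES, SMALL_PAGE)
reach_huge = tlb_reach(DATA_TLB_ENTRIES, HUGE_PAGE)

print(pages_per_huge, entries_saved)
print(reach_small // 1024, "KiB with 4096-byte pages")
print(reach_huge // (1024 * 1024), "MiB with 2MB pages")
```

Under these assumptions the data TLB covers 1 MiB with small pages and 512 MiB with huge pages, which is the intuition behind "huge pages make the system go faster".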

Transparent huge pages

Linux has had hugetlbfs for years
  Fiddly, administration-heavy mechanism

THP makes huge pages “just happen”
  Not as fast as hugetlbfs
  But it works for everybody
  Merged for 2.6.38

Other memory management issues

Writeback
  One of our biggest performance problems

Hybrid memory techniques
  KSM
  Transcendent memory
  ...

Control groups

A means for grouping related processes

Control groups

Are hierarchical
  Groups can contain other groups

Are inherited
  Children stay in their parent's group

Are associated with controllers
  Apply some policy to contained processes

Are old news...merged in 2.6.24
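The first two properties above can be illustrated with a toy model. This is not the kernel's data structure or any real cgroup API, just a minimal sketch with hypothetical class and group names: groups form a tree, and a forked task starts out in its parent's group. Controllers are left out to keep it short.

```python
class Group:
    """Toy control group: a node in a tree of groups."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)  # hierarchical: groups contain groups

    def path(self):
        """Path from the root group down to this one."""
        parts = []
        group = self
        while group is not None:
            parts.append(group.name)
            group = group.parent
        return "/".join(reversed(parts))


class Task:
    """Toy process that belongs to exactly one group."""

    def __init__(self, pid, group):
        self.pid = pid
        self.group = group

    def fork(self, child_pid):
        # Inherited: the child stays in its parent's group.
        return Task(child_pid, self.group)


root = Group("root")
session = Group("session1", parent=root)
build = Group("build", parent=session)

compiler = Task(100, build)
child = compiler.fork(101)
print(build.path())
print(child.group is compiler.group)
```

In the real kernel a controller attached to a group would then apply its policy (CPU shares, memory limits, I/O bandwidth) to every task the group contains.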

Group CPU scheduling

Per-session group scheduling

Group scheduling is old...but nobody was using it

Per-session group scheduling
  Makes it all “just work”
  Interactivity improvements result
  2.6.38
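A simplified model shows where the interactivity improvement comes from. The sketch below assumes a single CPU, equal weights, and every task always runnable, none of which is stated in the talk; the session and task names are invented. It compares the CPU share of a lone interactive task when all tasks compete as equals against the share it gets when time is first divided between sessions.

```python
from fractions import Fraction


def flat_shares(sessions):
    """Every task competes equally, regardless of session."""
    total = sum(len(tasks) for tasks in sessions.values())
    return {task: Fraction(1, total)
            for tasks in sessions.values() for task in tasks}


def grouped_shares(sessions):
    """CPU is split evenly between sessions, then between each session's tasks."""
    active = [tasks for tasks in sessions.values() if tasks]
    shares = {}
    for tasks in active:
        for task in tasks:
            shares[task] = Fraction(1, len(active)) / len(tasks)
    return shares


# Hypothetical workload: a 20-way parallel build in one session,
# a single media player in another.
sessions = {
    "tty1": [f"cc-{i}" for i in range(20)],
    "tty2": ["player"],
}

flat = flat_shares(sessions)
grouped = grouped_shares(sessions)
print(flat["player"], grouped["player"])
print(flat["cc-0"], grouped["cc-0"])
```

In this model the player goes from 1/21 of the CPU to 1/2, while each compiler process drops from 1/21 to 1/40. The real scheduler is considerably more involved, but the per-session grouping is what keeps a big background job from starving interactive work.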

Other control group stuff

Expanded group scheduling
  Systemd/gnome-session integration
  Nice separation of tasks
  Bandwidth control

Memory controller
  More focused reclaim

Block I/O controller
  Hierarchical group I/O scheduling
  Asynchronous I/O control

...

Deadline scheduling

Of interest to real time, media communities

Chilly reception at the 2010 kernel summit
  Needs better use cases

Work continues
  We'll have it someday

Realtime Preemption patch set

We'll have that someday too!

Networking

Expand initial congestion window

2.6.39

DFS compliance
  2.6.40+

Ongoing issues:
  IPv6
  Bufferbloat
  Scalability

Photo: Arenamontanus
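Why a larger initial congestion window helps can be seen with an idealized slow-start model, where the sender transmits a full window per round trip and doubles the window each time. The sketch below counts the round trips needed to deliver a short transfer. The window sizes (3 and 10 segments) and the 30-segment transfer are example values chosen for illustration, not figures from the slide, and the model ignores loss, delayed ACKs, and receiver limits.

```python
def round_trips(segments, initial_window):
    """Round trips to send `segments` under idealized slow start.

    Each round trip sends one full window; the window then doubles.
    """
    sent = 0
    cwnd = initial_window
    rtts = 0
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts


# Example values only: a 30-segment response with a small and a
# larger initial window.
small_iw = round_trips(30, 3)
large_iw = round_trips(30, 10)
print(small_iw, large_iw)
```

In this model the larger window halves the number of round trips for the example transfer (4 versus 2), which matters most for short connections dominated by latency.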

Security

Stackable modules
  2.6.40+

Hardening
  An area of increased focus

User namespaces
  Unprivileged container creation

Photo: CarbonNYC

Tracing and visibility

Continued ftrace and perf work
  Emphasis on usability and unification
  Addition of tracepoints

Improved user-space tracing/debugging
  Someday

Still outside:
  SystemTap
  LTTng

Things not discussed

Thousands of bug fixes
Virtualization
Additional architectures
Documentation
RAS improvements
CPU isolation
Interrupt layer rework
Clock enhancements
MM preemptability
Regression tracking
BKL removal
...

Questions?