Virtual Machine Disk Images Introspection
Vasily Tarasov (SBU)
Dean Hildebrand (IBM Almaden)
Renu Tewari (IBM Almaden)
Erez Zadok (SBU)
File system and Storage Lab (FSL)
and a bit more...
Outline
• How it all started
• The idea of introspection
• A couple of results from a first prototype
• Future work
• Benchmarking, Filebench
Virtual Machines (VMs)
- Computational resources consolidation
- Flexible, efficient and scalable
- Hardware support
- Multiple solutions: VMware, KVM, Xen, ...
- Cloud way of delivering services
Network Attached Storage (NAS)
- Storage consolidation
- Scalable, manageable and efficient
- NFS/CIFS available on the majority of operating systems
- NAS sales jumped from $540M in 1998 to $5.1B in 2003
- IBM SONAS
Two important technologies
[Figure: VM and NAS]
Two technologies… and they grow
[Figure: VM and NAS, growing]
How do VM & NAS work together?
Can we make them work better?
Typical Setup
[Figure: three virtual machine hosts (VMware, KVM, Xen, ...), each running two VMs and an NFS client; NFS servers 1 and 2; GPFS nodes 1-4, each with two storage devices.]
Datapath Decomposed
Guest (VM):
- Applications
- Virtual File System
- On-Disk File System
- Block Layer
- Controller Driver
Host:
- Controller Emulator
- NFS Client
NETWORK
NAS:
- NFS Server
- Virtual File System
- On-Disk File System
- Block Layer
- Controller Driver
Legend for the per-layer annotations in the figure:
- CA: CAching
- RA: Read-Ahead
- RM: Request Mangling and Scheduling
Collecting traces: setup
VMware ESX4 host connected to the NFS server over a 1Gbps link
Trace points:
- Within-VM trace
- VSCSI layer trace
- Network trace
- Block layer trace
Workloads:
- Rand/Seq Read
- Rand/Seq Write
- Various I/O sizes
- Multi-file workloads
- Multi-process workloads
- Meta-data intensive
Collecting traces: setup
[Figure: the datapath stack (guest, host, network, NAS) annotated with the trace points: user-space workload, VSCSI layer trace, network trace, block layer trace.]
Some interesting results
I/O sizes change
[Figure: the datapath stack (guest, host, network, NAS) annotated with the I/O sizes seen at different layers: 4MB, 4KB, 128KB, 256KB, 1MB, 32KB.]
WIOV’11 - Revisiting the Storage Stack in Virtualized NAS Environments
Meta-data Ops → Data Ops
Non-VM case:
  # stat /foo/bar
  sys_stat(/foo/bar)
  NFS_GETATTR(foobar_fh)
VM case:
  # stat /foo/bar
  sys_stat(/foo/bar)
  NFS_READ(dskimg_fh)
Meta-data operations that become NFS_READ(dskimg_fh) or NFS_WRITE(dskimg_fh) in the VM case:
- Update attributes
- List directories
- Creation/deletion
- Lookup
- Access permissions
- Link/Symlink operations
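To make the VM case concrete, here is a minimal sketch of why a guest stat ends up as a plain read of the disk image. It assumes the guest uses an ext2-style file system and applies the standard ext2 rule for locating an inode inside its block group's inode table. The `Ext2Layout` class, its field names, and the numbers in the usage note are illustrative assumptions, not taken from the prototype. A real guest would read at least the enclosing block rather than the bare inode.

```python
from dataclasses import dataclass, field


@dataclass
class Ext2Layout:
    """Hypothetical, simplified description of an ext2-style file system in a disk image."""
    block_size: int                 # bytes per file system block
    inode_size: int                 # bytes per on-disk inode
    inodes_per_group: int           # inodes stored in each block group
    inode_table_block: dict = field(default_factory=dict)  # group -> first block of inode table
    partition_offset: int = 0       # byte offset of the file system inside the image


def inode_read_request(layout, ino):
    """Return (offset, length) within the disk image that holds inode number `ino`."""
    group = (ino - 1) // layout.inodes_per_group   # inode numbers start at 1
    index = (ino - 1) % layout.inodes_per_group
    table_start = layout.inode_table_block[group] * layout.block_size
    offset = layout.partition_offset + table_start + index * layout.inode_size
    return offset, layout.inode_size


if __name__ == "__main__":
    layout = Ext2Layout(block_size=4096, inode_size=256, inodes_per_group=8192,
                        inode_table_block={0: 35, 1: 32803})
    offset, length = inode_read_request(layout, 12)
    # What the NFS server sees instead of NFS_GETATTR(foobar_fh):
    print(f"NFS_READ(dskimg_fh, offset={offset}, len={length})")
```

From the NFS server's point of view this request is indistinguishable from a data read unless the server knows the layout of the file system inside the image.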
Come up with an idea
The NFS server receives READ(dskfh, offset, len) on a disk image file.
What is located in this region (offset, size)?
The image holds a file system: Ext, NTFS, UFS, ...
READ from:
- Inode
- Directory entry
- Data of specific file
- ...
Do intelligent things!
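The question "what is located in this region?" can be sketched as an interval lookup. The example below assumes the server has already parsed the image's file system into a sorted, non-overlapping region map. The `REGIONS` table, its offsets, and the `classify_read` helper are hypothetical and only illustrate the idea. They are not the prototype's data structures.

```python
import bisect

# Hypothetical region map for one disk image: (start, end, description), sorted by start,
# end exclusive. A real map would be built by parsing the guest file system metadata.
REGIONS = [
    (0, 1024, "boot block"),
    (1024, 2048, "superblock"),
    (4096, 8192, "block group descriptors"),
    (143360, 2240512, "inode table (group 0)"),
    (2240512, 2244608, "directory entries of /foo"),
    (2244608, 2260992, "data of /foo/bar"),
]
STARTS = [start for start, _, _ in REGIONS]


def classify_read(offset, length):
    """Return descriptions of every region overlapped by READ(dskfh, offset, length)."""
    end = offset + length
    # Last region starting at or before `offset`; it is the first that may overlap.
    i = max(bisect.bisect_right(STARTS, offset) - 1, 0)
    hits = []
    while i < len(REGIONS) and REGIONS[i][0] < end:
        _, stop, what = REGIONS[i]
        if stop > offset:
            hits.append(what)
        i += 1
    return hits or ["unknown"]


if __name__ == "__main__":
    print(classify_read(146176, 256))    # a read that falls inside the inode table
    print(classify_read(2244000, 4096))  # a read that spans a directory and file data
```

Once a read is classified, the server can apply a policy per region type, for example caching inode and directory regions more aggressively than file data.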
Prototype Results: Find
[Figure: bar chart of find run time (sec). Non-optimized: 35; Optimized: 7.]
80% improvement
Prototype Results: Startup
[Figure: bar chart of startup time. Non-optimized: 130 sec; Optimized: 50 sec.]
2.6x faster
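The two headline numbers use different metrics: the find result is a relative reduction in run time, and the startup result is a speedup ratio. A small sketch of the arithmetic, using only the run times shown on the two result slides (the function names are illustrative):

```python
def improvement_percent(before, after):
    """Relative reduction in run time, in percent."""
    return 100.0 * (before - after) / before


def speedup(before, after):
    """How many times faster the optimized run is."""
    return before / after


if __name__ == "__main__":
    # find: 35 sec non-optimized, 7 sec optimized
    print(f"find:    {improvement_percent(35, 7):.0f}% less run time, "
          f"{speedup(35, 7):.1f}x faster")
    # startup: 130 sec non-optimized, 50 sec optimized
    print(f"startup: {improvement_percent(130, 50):.0f}% less run time, "
          f"{speedup(130, 50):.1f}x faster")
```

Putting both results on the same metric makes them directly comparable.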
Future work
• Solid implementation
• More efficient cache policies
• Optimizations on the write path
• Analysis of more complex workloads
Virtual Machine Disk Images Introspection
a bit more...
A Recent Study Concluded that…
1. Much of what researchers conclude in their studies is misleading, exaggerated, or flat-out wrong
2. A new claim about a research finding is more likely to be false than true
3. Researchers tend to publish positive results more often than negative findings
4. Chances of acceptance to a conference are higher if the results are “more exciting”
A. Medicine  B. Sociology  C. Computer Science  D. Biology  E. Physics
2005-2008 study by J. Ioannidis
5/4/2011
HotOS’11: Benchmarking FS Benchmarking: It is Rocket Science
Filebench
• Originally created by Sun Microsystems (RIP)
• Maintained by FSL
• Used in many papers
• Flexible: Workload Model Language (WML)
• Portable: Linux, FreeBSD, Solaris, MacOS, Windows*
Filebench WML
define fileset name=myfileset,size=16kb,entries=1000
define process name=reader,instances=1 {
  thread name=readerthread,memsize=10m,instances=10 {
    flowop read name=myread,filesetname=myfileset,iosize=2kb
  }
}
Filebench for Cloud Services
flowops:
• Reads
• Writes
• Creates
• Deletes
• +20 more sophisticated
[Figure: flowops issued through POSIX, NFS RPC, AFS RPC, or Cloud interfaces]
Filebench for Virtualized Environments
define process name=reader,instances=1 {
  thread name=readerthread,memsize=10m,instances=10 {
    flowop read name=myread1,filesetname=myfileset,…
  }
}
define hypervisor name=hpv,type=esx3.1,instances=1 {
  define vm name=hpv,type=windows,instances=5 {
  }
}
Thank you!