Date post: 21-May-2020
Why use Xen for large scale Enterprise Deployments?

Konrad Rzeszutek Wilk, Software Developer Manager

Copyright © 2014, Oracle and/or its affiliates. All rights reserved.

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

A bit of history

Where does the code come from?

Distributions and kernels

Features

The end result

Unbreakable Enterprise Kernel and Oracle Linux purpose:

• Red Hat and Oracle split:

– Oracle supports a distribution based on RHEL but with our own kernel, the Unbreakable Enterprise Kernel (UEK). We want better performance for customers. The kernel is updated more often and carries features that benefit Oracle products.

– As such, the Oracle Linux distribution is offered along with UEK kernels. The UEK kernel is also used in other products, such as OVM.

Oracle’s virtualization product (OVM):

We use Xen as the hypervisor. For the kernel we use UEK; in the past (OVM 2) we had a SLES-based kernel.

• OVM 2 (Xen 3.4)

– Linux 2.6.32 based on SLES Xen Patches (classic)

While the newer ones are based on paravirt (pvops):

• OVM 3 (Xen 4.1)

– UEK2 kernel (2.6.39)

• OVM 3.3 (Xen 4.3)

– UEK3 kernel (3.8.13)

Kernels (UEK: 2.6.39, 3.8).

• Oracle’s approach is

– Available for anybody (https://oss.oracle.com/git/).

– Make features available for everybody.

• Best way is to have it upstream so every distribution can have it.

• The end goal is for applications to run as best as they can.

• A large set of patches (a big divergence from upstream) inhibits this, as there is a lot of complexity in them. The classic Xen patches are an example of this.

Developers' approach to patches:

• We forget what we did after 6 months (more or less).

• Want the code in one place (one repository).

• Want to develop new features against code to make it better and faster. Don’t want to retouch the old code over and over.

• Want to fix new bugs in new shiny code.

• Big patches are scary.

Quality Assurance approach to patches:

• Want to find the bug and have it fixed.

– Don't want bugs to re-appear later in a new version of kernel (aka regressions).

• Want to catch new bugs, expose new scenarios, not find old bugs.

• Ideal situation:

– new hardware = new bugs

– not new hardware = old bugs.

Unbreakable Enterprise Kernel origin

Linux kernel: 2.6.32 (…) 3.0 (…) 3.8 (…) 3.11 (…) 3.15

Linux stable tree: 2.6.32 LT, 3.0 LT, 3.8 LT

Unbreakable Linux: UEK1, UEK2, UEK3

Backporting patches from upstream (Linus's tree) for new functionality.

Long-term kernels are where the community puts the fixes and features deemed necessary by maintainers. The version number gives only an idea of origin: for example, the kernel versioned 2.6.39 is really 3.0, but some of its code is from 3.11.

The process to make this work:

• Patches MUST go upstream (Linus’s tree).

• New functionality developed against upstream kernel.

• Bug-fixes also developed against upstream kernel (where applicable as some code had been re-worked).

• In some instances, where it does not make sense for patches to go upstream, we keep them in our tree.

• The problem we had with OVM 2 was that it had a huge patchset of Xen code that was not in any way easy to review.

Upstreaming Xen in Linus’s tree

We started by slowly integrating pieces, one on top of each other.

Linux 3.0 had the initial domain support (but no backend drivers).

Later versions gained different backend drivers (block, network, etc).

For Xen (the hypervisor) we did not have a huge patch set, so it was much easier.

What we ended up doing was:

Linus's tree → UEK tree → OVM and Oracle Linux

Xen upstream → OVM

The “problem” with Linus’s tree and Xen tree:

• High quality of code.

– Code has to go through numerous reviews before it is accepted. It takes time.

• The end result is:

– High quality and beautiful code.

– Performance driven (no maintainer wants code that slows things down).

– Improve the existing code.

• A fantastic side effect is that other distributions and users (such as Fedora Core, Debian, Red Hat, etc.) gain these features right out of the box.

Linux features that we are developing:

Data safety

DIF and DIX (Data Integrity), hardening ext4 and XFS against fuzzing attacks and corrupt filesystems.

DIRECT_IO - bypass caches so that data goes directly to the disk. Expose this via the AIO system call for applications.

Better use of CPU and memory for:

Making fsck work faster.

Deduplication of various filesystems (btrfs).

Faster snapshotting.

Quota calculations on XFS.

dtrace

Linux features we have been developing:

• NFS/RDMA (InfiniBand), NFS v4.0, support for NFS client using ZFS storage and Solaris NFS.

• Security fixes before Linux gets released (and after, too).

• Xen:

– The initial domain support and hardware features to match classic Xen support.

– Features in block and frontend to improve I/O.

– Lower latency for PCI passthrough devices.

– Near bare metal performance of guests.

– Continuous upstream presence to catch and fix regressions during Linus's merge window.

– 'perf' support for Xen and more.

In Xen ecosystem (hypervisor and toolstack):

• Xen Advisory Board where we collaborate with other companies using Xen

– To do more testing across all vendors' workloads.

– Get more developers.

– Companies work together on features (Xen block subsystem).

• OASIS VirtIO workgroup to define the VirtIO specification.

• Faster boot, faster deallocation/allocation for huge guests.

• Faster performance on NUMA machines.

• Faster guests – replacing PV with PVH.

In the Xen ecosystem (hypervisor and toolstack)

• 'perf' support

– For a full-stack (hypervisor, guests, etc.) performance view of what is running and where the performance bottlenecks are.

• Xen hypervisor debugger – to troubleshoot in the field.

• Lower interrupt latencies for PCI passthrough.

• Transcendent memory (cooperative memory ballooning with benefits)

– An answer to memory overcommit: Linux balloons out pages that it does not think it will use often but which can take a lot of memory space. The hypervisor can deduplicate and compress those across different guests. The end result is that we can fit more guests on a machine and still have good performance (sometimes even a 4% benefit!).
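
The deduplicate-and-compress idea can be sketched in a few lines of Python. This is a toy model only, not Xen's transcendent memory interface: the `PagePool` class, its method names, the 4 KiB page size, and the use of SHA-256 and zlib are all assumptions made for illustration. It shows why identical pages handed over by different guests cost the pool only one compressed copy.

```python
import hashlib
import zlib
from typing import Dict, Tuple

PAGE_SIZE = 4096  # assumed page size for this sketch


class PagePool:
    """Toy hypervisor-side pool: one compressed copy per distinct page content."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}            # content digest -> compressed page
        self._index: Dict[Tuple[str, int], str] = {}  # (guest, page number) -> digest

    def put(self, guest: str, page_no: int, page: bytes) -> None:
        """Store a page a guest gave up; identical content is kept only once."""
        digest = hashlib.sha256(page).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = zlib.compress(page)
        self._index[(guest, page_no)] = digest

    def get(self, guest: str, page_no: int) -> bytes:
        """Return the original page content for a guest's page."""
        return zlib.decompress(self._blobs[self._index[(guest, page_no)]])

    def distinct_pages(self) -> int:
        """Number of distinct page contents currently held."""
        return len(self._blobs)

    def stored_bytes(self) -> int:
        """Bytes actually held after deduplication and compression."""
        return sum(len(blob) for blob in self._blobs.values())


if __name__ == "__main__":
    pool = PagePool()
    zero_page = bytes(PAGE_SIZE)
    pool.put("guest-a", 0, zero_page)
    pool.put("guest-b", 7, zero_page)  # same content, different guest
    print(pool.distinct_pages(), pool.stored_bytes())
```

Keying the store by content digest is what makes sharing across guests possible; a real implementation would also need eviction and isolation between guests, which this sketch leaves out.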

Exadata Database Machine (have X4-2, X4-4, X4-8).

X4-8:

From Sun Server X4-8 Service Manual

Under the hood we have:

• NUMA

– 2, 4 or 8 sockets (CPU)

– Each socket has its own local memory.

– PCIe slots off sockets (I/O NUMA) with InfiniBand or flash in them.

– All sockets connected via QuickPath Interconnect (QPI).

• For best performance we don’t want to use QPI excessively. A solution is:

– Partitioning per socket.

– We have various size guests that reside within their NUMA node.

• Combined with intelligent software (GRID, Oracle RAC), this gives top-notch performance.
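
The per-socket partitioning above can be illustrated with a small placement sketch. It is not how OVM or Xen actually place guests; it is a first-fit-decreasing heuristic under assumed inputs (the function name `place_guests`, the guest names, and the memory sizes are hypothetical). It shows only the constraint itself: every guest must fit entirely inside one NUMA node so that its memory accesses do not cross QPI.

```python
from typing import Dict, List, Optional


def place_guests(guest_mem_gb: Dict[str, int],
                 node_mem_gb: List[int]) -> Dict[str, Optional[int]]:
    """Assign each guest to exactly one NUMA node (first-fit decreasing).

    Hypothetical helper: a guest that fits in no single node maps to None
    instead of being split across sockets.
    """
    free = list(node_mem_gb)  # remaining memory per node, in GB
    placement: Dict[str, Optional[int]] = {}
    # Place the largest guests first; ties are broken by name for determinism.
    for name, need in sorted(guest_mem_gb.items(), key=lambda kv: (-kv[1], kv[0])):
        placement[name] = None
        for node, avail in enumerate(free):
            if need <= avail:
                free[node] -= need
                placement[name] = node
                break
    return placement


if __name__ == "__main__":
    guests = {"db1": 192, "db2": 128, "app1": 64, "app2": 64}
    print(place_guests(guests, node_mem_gb=[256, 256]))
```

With two 256 GB nodes, the four example guests end up two per node, and a guest larger than any single node is reported as unplaced rather than spread over the interconnect.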

Networking – 40G and more:

• Multiple ways of having better performance:

– PCIe passthrough (InfiniBand or Network Interface Cards) – SR-IOV – what we concentrate on for best performance for Engineered Systems. But no migration!

– Intel Data Plane Development Kit (DPDK). Low latency, but no migration!

– Improving Xen netback and netfront (Citrix driven, they are the maintainers of Linux Xen netback driver).

• Want the guest to run without invoking the hypervisor for privileged operations (aka fewer VMEXITs):

– Interrupts go directly to the guest (posted interrupts). Improvement in Linux to use vAPIC instead of event channels for PCIe interrupts.

– Lower the latency of interrupt delivery if we have to go through the hypervisor.

Storage: More IOPS!

• Classic OVM deployment is OCFS2 shared across different hosts.

• We have SSDs, now PCIe flash, and in the future NVMe.

• For better performance we do:

– Improve Xen block frontend and backend. Joint projects with Citrix on increasing throughput and lowering latency.

– SR-IOV for even higher throughput and low latency (but no migration) for Engineered Systems.

Guest improvements:

• ParaVirtualized (PV) guests problem:

– Page updates and syscalls require a context switch to the hypervisor.

– ParaVirtualized Hardware (PVH) uses the hardware to do page updates and syscalls instead of requiring the guest to make hypercalls. The end result is the removal of these bottlenecks in PV.

Xen hypervisor bottlenecks:

• Identify them using ‘perf’ to visualize the full system stack (hypervisor and guests).

Xen transcendent memory.

• Memory is becoming a bottleneck in virtualized systems – we want more! However, we have memory-inefficient workloads.

End goal

• Performance, high quality, stability and security for all different workloads.

• Push patches upstream to benefit everybody.

Oracle is hiring! [email protected]
