Strategies in Cluster-Design

Gerolf Ziegenhain, TU Kaiserslautern, Germany
  • Outline of This Talk

    ● Look at the technologies once again
    ● Provide more detail for making decisions
    ● What to consider?
    ● What should be avoided at all costs?
    ● Provide keywords / directions for further reading
    ● A less structured talk
    ● Contains personal experience

  • Making Decisions

    ● Strategic decisions:
      – Made once; changes are difficult and expensive
    ● The setup itself is relatively easy
    ● Therefore, know some numbers (per person, group, university):
      – Number of jobs
      – Runtime of jobs
      – CPUs per job
      – Memory per job
      – Coupling of the system: latency / bandwidth
      – HDD storage (also consider final storage)

  • Buy or Build?

    ● Buying
      – Less work
      – Higher costs
      – You will get more than you want
      – The vendor may help with consulting
    ● Building yourself
      – More work
      – High learning effect
      – Lower costs
      – You will have exactly what you buy

  • Technological Overview

  • Components of a Cluster

    [Diagram: cluster components – overhead services (DHCP, NIS, firewall, queue, syslog, mirror, mail, boot), admin and login servers (Login1, Login2), storage (NAS1, NAS2, NAS3), the users (User1, User2, User3), and the compute nodes]

  • Networking

  • A Word on Entropy

    ● Managing 10 workstations differs a lot from managing a cluster
    ● Entropy of cables:
      – Sort them immediately
      – Use colors
      – Use hook-and-loop tape
      – Use printed labels

  • Choice of Hardware

    ● Nodes
    ● Networking
    ● Overhead servers

  • Choosing Nodes

  • Example: Google

  • Example: Google

    ● Stock hardware
    ● Custom-built low-tech cases
    ● Modular approach
    ● Components
      – Mainboard, CPU, memory
      – 2x HDD (striped)
      – UPS battery
    ● Advantages:
      – Cheap
      – High learning effect

  • Example: BlueGene/P

    ● PowerPC
    ● Custom-built
      – Boards
      – Chips
      – Networking
    ● Advantage:
      – Scales very well

  • Buy a Rack

    ● The common Beowulf cluster
    ● Buy ready-built 19” pizza boxes
    ● Mount them in a 19” rack
      – Usually 42 U
    ● Advantages
      – Less work
      – High packing density

  • Use Ready-Built Desktops

  • Processors and Architectures?

    ● Know your problem
    ● What to know about your algorithms?
      – How much memory?
      – Can the problem easily be decomposed?
      – What precision is needed?
    ● Libraries
      – Do they exist for your problem (e.g. QM calculations)?
      – Do they run on all architectures?
    ● Choices:
      – Architecture (usually AMD / Intel is a good choice)
      – Number of CPUs

  • Storage Management

    ● Know your problem
    ● Parameters to know
      – How much HDD space?
      – What is the typical bandwidth?
        ● Evaluating 100 GB files in real time?
        ● Writing out 1 TB files?
    ● Choices:
      – NAS (multiple?)
      – SAN
      – Distributed filesystem
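
    If a simple NAS is chosen, exporting home directories over NFS is the classic setup. A minimal sketch, assuming a NAS host called nas1 and a node subnet of 10.0.0.0/24 (both placeholders):

      # /etc/exports on nas1 (subnet is a placeholder)
      /export/home  10.0.0.0/24(rw,sync,no_subtree_check)

      # activate the export on nas1, then mount it on a node
      exportfs -ra
      mount -t nfs nas1:/export/home /home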

  • Backup

    ● RAID ≠ backup
      – You can still destroy your data with rm -rf /my_stuff
    ● Incremental backups of
      – Critical user configuration
      – Configuration files
      – The complete overhead server installation
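
    One common way to implement such incremental backups is rsync with hard-linked snapshots. A minimal sketch; the backup paths and the admin host name are assumptions:

      # hard-link unchanged files against yesterday's snapshot,
      # copy only what actually changed
      rsync -a --delete \
            --link-dest=/backup/admin/yesterday \
            root@admin:/etc/ /backup/admin/today/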

  • Networking

  • Types

    ● Know your problem
    ● Choices
      – Bandwidth
        ● Gbit < InfiniBand
        ● Gbit: channel bonding is possible (see the sketch after this list)
      – Latency
        ● Gbit > SCI
      – Scalability
        ● Stacked network switches
        ● Fat-tree architecture
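
    Channel bonding of two Gbit ports can be configured in the distribution's normal network setup. A minimal sketch for Debian with the ifenslave package; interface names and addresses are placeholders:

      # /etc/network/interfaces (node with two bonded Gbit NICs)
      auto bond0
      iface bond0 inet static
          address 10.0.0.10
          netmask 255.255.255.0
          bond-slaves eth0 eth1
          bond-mode balance-rr
          bond-miimon 100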

  • Switches

    ● Important parameters
      – Backbone speed
        ● What is the throughput when all ports are under load?
      – Can it be configured?
        ● Auto-sensing
        ● IP
        ● ARP
        ● ...
      – Stackable?
      – (Uplink ports?)

  • Which #Cores/Node is Optimal?

    ● Currently the cheapest cost per core: 8 cores per node
    ● Small systems (48 nodes)
      – Doesn't matter, because one switch is enough
    ● Average systems
      – Do you need all-to-all connections?
      – Use separate rings or change the network topology
      – If you want to stick to single-switched networks: the current optimum is 16 CPUs per node
    ● Big systems
      – Go for a fat-tree network :)
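
    The single-switch limit is easy to estimate. A back-of-the-envelope sketch, assuming a 48-port Gbit switch (the port count is an assumption):

      # how many cores fit behind one switch without uplinks
      ports=48
      cores_per_node=8
      echo $(( ports * cores_per_node ))    # 384 cores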

  • Infrastructure Requirements

    ● Cooling
      – Every watt burned in a CPU ⇒ heat
    ● Stable power supply
      – Blackouts?
      – Fluctuations in the voltage level
        ● Cheap power supplies will break on fluctuations

  • Notes about Power Consumption

    ● Less power consumption ⇒ less heat ⇒ fewer defects(?)
    ● Running costs per year can easily reach the initial investment!
      – Do the math ⇒ a blade center could also pay off! (example below)
    ● Do not switch all nodes on / off at once
      – Voltage peaks!
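
    A rough estimate of the running costs; all numbers (node count, power draw, electricity price) are assumptions:

      nodes=32; watts_per_node=300; cent_per_kwh=20
      kwh=$(( nodes * watts_per_node * 24 * 365 / 1000 ))   # 84096 kWh per year
      echo "$(( kwh * cent_per_kwh / 100 )) EUR per year"   # about 16800 EUR per year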

  • Decomposition of the Servers

  • Why Separate Login Nodes?

    ● User interaction
    ● May hang due to jobs
    ● Security
      – SSH ports are open
      – May be hacked
    ● Configuration of user packages
      – The system is more on the bleeding edge

  • Splitting Servers

    ● Easily >10 overhead tasks
    ● Why not put them all on one big server?
      – Security (one hole ⇒ all broken)
      – Stability
      – Maintenance
        ● Updates (what was done 3 years ago?)
        ● Dependencies (how do software packages interfere?)
        ● No plugin structure (no testing of different variants)
    ● Solution
      – Split the tasks ⇒ >10 overhead servers
      – Problem:
        ● Cost
        ● Hardware failures?

  • Combining Servers

    ● Use Xen (a minimal guest config is sketched below)
    ● Host servers: 1...3 physical machines
      – Tolerant against hardware failures
    ● Further advantages
      – Greatly reduced costs
      – Complete rollback possible
      – Try different configurations
        ● Experiments are possible on a limited budget
      – Clear separation of tasks
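
    Each overhead task can then live in its own small paravirtualised guest. A minimal sketch of a guest configuration; the name, sizes and device paths are assumptions:

      # /etc/xen/nis.cfg -- small Xen guest for the NIS service
      name       = "nis"
      memory     = 256
      vcpus      = 1
      disk       = ['phy:/dev/vg0/nis-disk,xvda,w']
      vif        = ['bridge=xenbr0']
      bootloader = "pygrub"

    The guest would then be started with xm create /etc/xen/nis.cfg (or xl create on newer Xen tool stacks).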

  • Administration

  • Administration Policies

    ● Interaction with human beings
      – Difficult social aspects
      – A good administrator is never noticed (the system just works)
    ● Who has the root password?
    ● Who will document what has been done?
    ● Split the work, but communicate:
      – Design decisions
      – Buying, writing grant proposals
      – Installation, bug fixing
      – Educating end users

  • Administration Policies

    ● User interaction
      – Keep the users informed (mailing list)
      – Monitor the system to catch problems before they occur

  • Managing Different Groups

    ● Impossible!
    ● Each group has to provide at least one person for
      – Managing user education
      – Monitoring performance
      – Knowing the needs (⇒ cluster design decisions)
    ⇒ Sharing an administrator is not possible!
    ● Sharing resources: possible & meaningful

  • What is the Critical Data?

    ● What data has to be stored safely?
      – User programs
      – Final data
      – May be put on a RAID mirror (RAID 1)
    ● What data can be exposed to potential loss?
      – Temporary files
      – May be put on a RAID stripe (RAID 0)
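
    With Linux software RAID this split could look as follows; the device names are placeholders:

      # mirror (RAID 1) for home directories and final data
      mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
      # stripe (RAID 0) for scratch / temporary files
      mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdc1 /dev/sdd1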

  • Compilation

    ● Custom user programs / libraries
    ● Where to install?
      – /usr/local/ (system-wide)
      – $HOME (per user)
    ● Autotools make it possible to install a whole distribution in the home directory (see the sketch after this list)
    ⇒ Depends on how often the code changes
    ● Choosing a compiler
      – GNU compilers are good & free
      – For special CPU instructions: buy a compiler
        ● Intel compiler
        ● Portland compiler
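
    The per-user install with autotools is only a few commands per package; a minimal sketch (the prefix $HOME/local is an arbitrary choice):

      # build and install an autotools-based package into the home directory
      ./configure --prefix=$HOME/local
      make && make install

      # make it visible to the shell, e.g. in ~/.bashrc
      export PATH=$HOME/local/bin:$PATH
      export LD_LIBRARY_PATH=$HOME/local/lib:$LD_LIBRARY_PATH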

  • Security

    ● University networks are
      – Insecure
      – Attractive targets
    ● Risks
      – SSH password logins
      – Open ports
      – Updating
        ● Keep up to date with serious bugs!
      – Users
    ● Therefore (attacks will happen on a daily basis!)
      – Use a firewall (a minimal sketch follows this list)
      – Monitor the system for odd behavior
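
    A minimal iptables sketch for a login node: drop everything, allow loopback and established connections, and allow SSH only from the campus network (the address range is a placeholder):

      CAMPUS=192.0.2.0/24        # placeholder, use your campus range
      iptables -P INPUT DROP
      iptables -A INPUT -i lo -j ACCEPT
      iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
      iptables -A INPUT -p tcp --dport 22 -s $CAMPUS -j ACCEPT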

  • Operating Systems

  • Which Operating System?

    ● Different OSes / distributions exist
      – But the configuration is widely compatible
      – The way of doing things differs slightly in detail
        ● E.g. directories / files
      – Watch out for licenses: BSD, GPL, ...
    ● The OS should provide basic, stable & secure functionality
      – Linux
        ● Debian
        ● RedHat
        ● SuSE (slow, costly, small community)
      – FreeBSD (more secure, but somewhat older package versions)
      – OpenBSD (most secure)

  • Updating or Not?

    ● Motivations
      – Stability
      – Security
      – Features
    ● Possible solution:
      – Keep the login servers and the firewall up to date (example below)
      – Keep the compute nodes stable (out of date)
      – Works only if the nodes are on an inner network
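
    A simple way to follow this policy is to run the upgrades only on the outward-facing machines; the host names are placeholders:

      for host in login1 login2 firewall; do
          ssh root@$host 'apt-get update && apt-get -y upgrade'
      done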

  • Rolling Your Own Distribution

    ● A possible solution for installation issues
    ● Possibilities
      – A from-scratch distribution
      – Modify an existing distribution
      – Compile only custom packages (/usr/local/bin)
      – Keep system HDD images and clone them
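
    Cloning a prepared node image can be as simple as streaming the disk over the network. A crude sketch; the "golden" host name and device names are assumptions, and the target must be booted from a rescue or PXE system, not from the disk being overwritten:

      # run on the target node, booted from a rescue / PXE system
      ssh root@golden-node 'dd if=/dev/sda bs=4M' | dd of=/dev/sda bs=4M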

  • Lessons Learned

    ● Reproducible?
      – Making a distribution is exhausting
    ● Documentation (wiki)
      – Someday you will have to hand the system over
      – Or reinstall it
    ● Keep a complete mirror
      – Packages may vanish

  • The Gentoo Approach

    ● Use source packages
    ● Autotools ⇒ binary files
    ● Create special configuration files for the dependencies
      – In Gentoo: portage (→ corvix: egatrop)
      – In BSD: ports
    ● Alternatives
      – Linux From Scratch
        ● Missing the configuration files
        ● Relies on autotools
      – Arch Linux
    ● Websites are good sources for step-by-step howtos

  • The Debian Approach

    ● Compile once, distribute binary packages
    ● Create custom packages with only one command
    ● Advantages
      – Extremely fast
      – Easier to maintain for a large number of servers
      – Embedded devices use a similar package architecture
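
    Building such a custom binary package can indeed come down to a single command once the directory layout exists. A minimal sketch; the package name, files and maintainer are made up:

      # directory layout:  mylib-1.0/DEBIAN/control  plus the payload files
      # mylib-1.0/DEBIAN/control contains e.g.:
      #   Package: mylib
      #   Version: 1.0
      #   Architecture: amd64
      #   Maintainer: admin <admin@example.org>
      #   Description: custom library for the cluster

      dpkg-deb --build mylib-1.0      # produces mylib-1.0.deb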

  • Our Solution

    ● Stable base system:
      – Debian with overlays
        ● An additional package source with our custom packages (see the sketch below)
      – Xen images of the installed Debian system ⇒ even faster reinstallations
    ● Custom software
      – E.g. user-requested libraries
      – Compilation in ~
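
    The overlay is just another APT source that every server knows about; the mirror host and the package name below are placeholders:

      # appended to /etc/apt/sources.list on every server
      deb http://mirror.cluster.local/debian-custom stable main

      # afterwards custom packages install like any other Debian package
      apt-get update && apt-get install mylib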

  • Other Cluster Distributions

    ● Debian-based / RedHat-based distributions exist
      – E.g. RocksCluster, CentOS, PelicanHPC, Corvix, ...
    ● A good source for howtos
    ● Good as a cheat sheet
    ● But
      – HPC is inherently customized
      – Flexibility is highest with a customized installation
      – None of these distros solved a problem that we had

  • Thank you!

    ● Acknowledgements
      – Prof. Dr. rer. nat. Herbert M. Urbassek, TU Kaiserslautern, Germany

