HighAvailability

   1 <marcelo> The old model of high availability is "fault tolerance", usually hardware-based.
   2 <marcelo> Expensive, proprietary.
   3 <marcelo> The goal of this old model is to keep the hardware system running.
   4 <andres> plas
   5 <riel> so basically, a single computer is an unreliable piece of shit  (relatively speaking) ...
   6 <riel> ... and High Availability is the collection of methods to make the job the computer does more reliable
   7 <riel> you can do that by better hardware structures
   8 <riel> or by better software structures
   9 <riel> usually a combination of both
  10 <marcelo> the Linux model of high availability is software based.
  11 <marcelo> Now let me explain some basic concepts of HA
  12 <marcelo> First, it's very important that we don't rely on unique hardware components in a High Availability system
  13 <marcelo> for example, you can have two network cards connected to a network
  14 <marcelo> In case one of the cards fails, the system tries to use the other card.
  15 <marcelo> A hardware component that cannot fail because the whole system depends on it is called a "Single Point of Failure"
  16 <marcelo> SPOF, to make it short. :)
  17 <marcelo> Another important concept which must be known before we continue is "failover" 
  18 <marcelo> Failover is the process by which one machine takes over the job of another node
  19 <riel> "machine" in this context can be anything, btw ...
  20 <riel> if a disk fails, another disk will take over
  21 <riel> if a machine from a cluster fails, the other machines take over the task
  22 <riel> but to have failover, you need to have good software support
  23 <riel> because most of the time you will be using standard computer components
  24 <marcelo> well, this is all the "theory" needed to explain the next parts. 
  25 <riel> so let me make a quick condensation of this introduction
  26 <riel> 1. normal computers are not reliable enough for some people (like: internet shop), so we need a trick .. umm method ... to make the system more reliable
  27 <riel> 2. high availability is the collection of these methods
  28 <riel> 3. you can do high availability by using special hardware (very expensive) or by using a combination of normal hardware and software
  29 <riel> 4. if one point in the system breaks and it makes the whole system break, that point is a single point of failure .. SPOF
  30 <riel> 5. for high availability, you should have no SPOFs ... if one part of the system breaks, another part of the system should take over
  31 <riel> (this is called "failover")
  32 <riel> now I think we should explain a bit about how high availability works .. the technical side
  33 <riel> umm wait ... sorry marcelo ;)
  34 <marcelo> ok
  35 <marcelo> Lets talk about the basic components of HA 
  36 <marcelo> Or at least some of them,
  37 <marcelo> A simple disk running a filesystem is clearly an SPOF
  38 <marcelo> If the disk fails, every part of the system which depends on the data contained on it will stop.
  39 <marcelo> To prevent a disk from being a SPOF of the system, RAID can be used.
  40 <marcelo> RAID-1, which is a feature of the Linux kernel...
  41 <marcelo> Allows "mirroring" of all data on the RAID device to a given number of disks...
  42 <marcelo> So, when data is written to the RAID device, it's replicated to all disks which are part of the RAID-1 array.
  43 <marcelo> This way, if one disk fails, the other disk (or disks) in the RAID-1 array will be able to continue working
  44 <riel> because the system has a copy of the data on each disk
  45 <riel> and can just use the other copies of the data
  46 <riel> this is another example of "failover" ... when one component fails, another component is used to fulfill this function
  47 <riel> and the system administrator can replace (or reformat/reboot/...) the failed component
  48 <riel> this looks really simple when you don't look at it too much
  49 <riel> much
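
To make the mirroring idea above concrete, here is a toy Python sketch (not the Linux md/RAID driver) in which every write is copied to all member "disks" and a read falls back to any surviving copy; the file names are made up for illustration:

{{{
# Toy illustration of the RAID-1 idea: every write is mirrored to all
# member "disks" (plain files here), and a read can be served from any
# surviving copy.  This is NOT the Linux md driver, only the concept.
import os

class ToyMirror:
    def __init__(self, disk_paths):
        self.disks = list(disk_paths)

    def write(self, offset, data):
        # Replicate the same data to every member of the array.
        for path in self.disks:
            mode = "r+b" if os.path.exists(path) else "w+b"
            with open(path, mode) as f:
                f.seek(offset)
                f.write(data)

    def read(self, offset, length):
        # Any member that still answers is good enough.
        for path in self.disks:
            try:
                with open(path, "rb") as f:
                    f.seek(offset)
                    return f.read(length)
            except OSError:
                continue            # this "disk" failed, try the next copy
        raise OSError("all mirrors failed")

mirror = ToyMirror(["disk0.img", "disk1.img"])
mirror.write(0, b"important data")
print(mirror.read(0, 14))           # still works if one image file is lost
}}}
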
  50 <riel> but there is one big problem ... when do you need to do failover?
  51 <riel> in some situations, you would have _2_ machines working at the same time and corrupting all data ... when you are not careful
  52 <riel> think for example of 2 machines which are fileservers for the same data
  53 <riel> at any time, one of the machines is working and the other is on standby
  54 <riel> when the main machine fails, the standby machine takes over
  55 <riel> ... BUT ...
  56 <riel> what if the standby machine only _thinks_ the main machine is dead and both machines do something with the data?
  57 <riel> which copy of the data is right, which copy of the data is wrong?
  58 <riel> or worse ... what if _both_ copies of the data are wrong?
  59 <riel> for this, there is a special kind of program, called a "heartbeating" program, which checks which parts of the system are alive
  60 <riel> for Linux, one of these programs is called "heartbeat" ... marcelo and lclaudio have helped write this program
  61 <riel> marcelo: could you tell us some of the things "heartbeat" does?
  62 <marcelo> sure
  63 <marcelo> "heartbeat" is a piece of software which monitors the availability of nodes
  64 <marcelo> it "pings" the node which it wants to monitor, and, in case this node doesnt answer the "pings", it considers it to be dead.
  65 <marcelo> when a node is considered to be dead when can failover the services which it was running
  66 <marcelo> the services which we takeover are previously configured in both systems.
  67 <marcelo> Currently heartbeat works only with 2 nodes.
  68 <marcelo> Its been used in production environments in a lot of situations...
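
The loop below is a minimal Python sketch of what a heartbeat-style monitor boils down to: check the peer regularly, and only after several missed checks in a row declare it dead and take over. It is not the real heartbeat program, which has its own protocol and configuration; the peer address, port, timings and takeover action here are assumptions for illustration:

{{{
# Minimal sketch of what a heartbeat-style monitor does: keep checking
# whether the peer answers, and after enough missed checks in a row,
# take over its services.  Not the real heartbeat program; the address,
# timings and takeover action are invented for illustration.
import socket
import time

PEER = ("10.0.0.2", 694)   # assumed address/port of the other node
INTERVAL = 2               # seconds between checks
DEADTIME = 3               # missed checks before the peer is declared dead

def peer_alive():
    try:
        with socket.create_connection(PEER, timeout=1):
            return True
    except OSError:
        return False

def take_over():
    # Here the standby node would add the virtual IP and run the init
    # scripts of the services it has been configured to take over.
    print("peer declared dead, starting services locally")

missed = 0
while True:
    missed = 0 if peer_alive() else missed + 1
    if missed >= DEADTIME:
        take_over()
        break
    time.sleep(INTERVAL)
}}}
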
  69 <riel> there is one small problem, however
  70 <riel> what if the cleaning lady takes away the network cable between the cluster nodes by accident?
  71 <riel> and both nodes *think* they are the only one alive?
  72 <riel> ... and both nodes start messing with the data...
  73 <riel> unfortunately there is no way you can prevent this 100%
  74 <riel> but you can increase the reliability by simply having multiple means of communication
  75 <riel> say, 2 network cables and a serial cable
  76 <riel> and this is reliable enough that the failure of 1 component still allows good communication between the nodes
  77 <riel> so they can reliably tell if the other node is alive or not
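
A tiny sketch of that rule, assuming we already have per-link status information: the peer only counts as dead when every configured path to it has failed.

{{{
# Sketch of the "multiple communication paths" rule: the peer only
# counts as dead when it is unreachable over *every* configured link,
# so a single unplugged cable cannot trigger a false failover.

def peer_is_dead(link_status):
    """link_status maps a link name to True if the peer answered on it."""
    return not any(link_status.values())

# One broken cable: the peer is still considered alive.
print(peer_is_dead({"eth0": False, "eth1": True, "serial": True}))    # False

# Every path down: only now do we start failover.
print(peer_is_dead({"eth0": False, "eth1": False, "serial": False}))  # True
}}}
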
  78 <riel> this was the introduction to HA
  79 <riel> now we will give some examples of HA software on Linux
  80 <riel> and show you how they are used ...
  81 <riel> ... <we will wait shortly until the people doing the translation to Español have caught up> ... ;)
  82 <marcelo> Ok
  83 <marcelo> Now lets talk about the available software for Linux
  84 <riel> .. ok, the translators have caught up .. we can continue again ;)
  85 <marcelo> Note that I'll be talking about the opensource software for Linux
  86 <marcelo> As I said above, the "heartbeat" program provides monitoring and basic failover of services 
  87 <marcelo> for two nodes only
  88 <marcelo> As a practical example...
  89 <marcelo> The web server at Conectiva (www.conectiva.com.br) has a standby node running heartbeat
  90 <marcelo> In case our primary web server fails, the standby node will detect it and start the apache daemon
  91 <marcelo> making the service available again 
  92 <marcelo> any service can be used, in theory, with heartbeat.
  93 <riel> so if one machine breaks, everybody can still go to our website ;)
  94 <marcelo> It only depends on the init scripts to start the service
  95 <marcelo> So any service which has an init script can be used with heartbeat
  96 <marcelo> arjan asked if it takes over the IP address
  97 <marcelo> There is a virtual IP address used by the service
  98 <marcelo> which is the "virtual serverIP" 
  99 <marcelo> which is the "virtual server" IP address. 
 100 <marcelo> So, in our webserver case...
 101 <marcelo> the real IP address of the first node is not used by the apache daemon
 102 <marcelo> but the virtual IP address which will be used by the standby node in case failover happens
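
As a rough sketch of what that takeover amounts to, the snippet below brings up a virtual IP and runs an init script via subprocess; the interface name, address and script path are examples rather than a real configuration, and the heartbeat package has its own mechanism for this step:

{{{
# Rough sketch of the takeover step: the standby node brings up the
# shared "virtual server" IP address and starts the service through its
# normal init script.  Interface, address and script path are examples.
import subprocess

VIRTUAL_IP = "192.168.1.100/24"    # the address clients actually use
INTERFACE = "eth0"
INIT_SCRIPT = "/etc/init.d/apache"

def take_over_service():
    # Add the virtual IP as an extra address on our own interface ...
    subprocess.run(["ip", "addr", "add", VIRTUAL_IP, "dev", INTERFACE],
                   check=True)
    # ... then start the service exactly as the boot scripts would.
    subprocess.run([INIT_SCRIPT, "start"], check=True)

if __name__ == "__main__":
    take_over_service()
}}}
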
 103 <marcelo> Heartbeat, however, is limited to two nodes.
 104 <marcelo> This is a big problem for a lot of big systems.
 105 <marcelo> SGI has ported its FailSafe HA system to Linux recently (http://oss.sgi.com/projects/failsafe)
 106 <marcelo> FailSafe is a complete cluster manager which supports up to 16 nodes.
 107 <marcelo> Right now it's not ready for production environments
 108 <marcelo> But that's being worked on by the Linux HA project people :)
 109 <marcelo> SGI's FailSafe is GPL.
 110 <riel> another type of clustering is LVS ... the Linux Virtual Server project
 111 <riel> LVS uses a very different approach to clustering
 112 <riel> you have 1 (maybe 2) machines that receive http (www) requests
 113 <riel> but those machines don't do anything, except send the requests to a whole bunch of machines that do the real work
 114 <riel> so called "working nodes"
 115 <riel> if one (or even more) of the working nodes fail, the others will do the work
 116 <riel> and all the routers (the machines sitting at the front) do is:
 117 <riel> 1. keep track of which working nodes are available
 118 <riel> 2. give the http requests to the working nodes
 119 <riel> the kernel needs a special TCP/IP patch and a set of usermode utilities for this to work
 120 <riel> RedHat's "piranha" tool is a configuration tool for LVS, which people can use to set up LVS clusters in an easier way
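
Conceptually, the director's job can be sketched in a few lines of Python: keep a list of usable working nodes and hand each request to the next one, dropping nodes that fail. Real LVS does this inside the kernel on a per-connection basis, so this toy class only shows the idea; the node names are invented:

{{{
# Toy model of the LVS director's job: track the usable working nodes
# and spread incoming requests over them round-robin.  Real LVS does
# this in the kernel, per connection; this class only shows the idea.
import itertools

class ToyDirector:
    def __init__(self, nodes):
        self.nodes = list(nodes)              # the "working nodes"
        self.pool = itertools.cycle(self.nodes)

    def mark_failed(self, node):
        # A failed node drops out of the rotation; the others keep serving.
        self.nodes.remove(node)
        self.pool = itertools.cycle(self.nodes)

    def route(self, request):
        return "sending %r to %s" % (request, next(self.pool))

director = ToyDirector(["worker1", "worker2", "worker3"])
print(director.route("GET /index.html"))      # -> worker1
print(director.route("GET /index.html"))      # -> worker2
director.mark_failed("worker3")
print(director.route("GET /index.html"))      # worker3 never gets requests
}}}
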
 121 <riel> in Conectiva, we are also working on a very nice HA project
 122 <riel> the project marcelo and Olive are working on is called "drbd"
 123 <riel> the Distributed Replicated Block Device
 124 <riel> this is almost the same as RAID1, only over the network
 125 <riel> to go back to RAID1 (mirroring) ... RAID1 is using 2 (or more) disks to store your data
 126 <riel> with one copy of the data on every disk
 127 <riel> drbd extends this idea to use disks on different machines on the network
 128 <riel> so if one disk (on one machine) fails, the other machines still have the data
 129 <riel> and if one complete machine fails, the data is on another machine ... and the system as a whole continues to run
 130 <riel> if you use this together with ext3 or reiserfs, the machine that is still running can very quickly take over the filesystem that it has copied to its own disk
 131 <riel> and your programs can continue to run
 132 <riel> (with ext2, you would have to do an fsck first, which can take a long time)
 133 <riel> this can be used for fileservers, databases, webservers, ...
 134 <riel> everything where you need the very latest data to work
 135 <riel> ...
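
A very small sketch of the drbd idea, with an invented peer address and file name: each write goes to the local "disk" and the same bytes are shipped to the peer over the network, so its copy stays identical. Real drbd is a kernel block driver with its own replication protocol; this only demonstrates the concept:

{{{
# Toy illustration of the drbd idea: each write goes to the local
# "disk" and the same bytes are shipped to a peer over TCP, so the peer
# keeps an identical copy.  Real drbd is a kernel block driver with its
# own replication protocol; the address and file name here are invented.
import os
import socket

PEER = ("10.0.0.2", 7788)          # assumed address of the other node
LOCAL_DISK = "local.img"

def replicated_write(offset, data):
    # 1. write to the local "disk", like an ordinary block write
    mode = "r+b" if os.path.exists(LOCAL_DISK) else "w+b"
    with open(LOCAL_DISK, mode) as f:
        f.seek(offset)
        f.write(data)
    # 2. send the same bytes to the peer so its copy stays up to date
    with socket.create_connection(PEER, timeout=2) as conn:
        conn.sendall(("%d:%d\n" % (offset, len(data))).encode() + data)

# e.g. replicated_write(0, b"hello, mirrored world")  (needs a peer listening)
}}}
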
 136 <riel> this is the end of our part of the lecture, if you have any questions, you can ask them and we will try to give you a good answer ;)
 137 
 138 See also http://www.linux-ha.org/
 139 ----
 140 CategoryDocs
 141 