<marcelo> The old model of high availability is "fault tolerance", usually hardware-based.
<marcelo> Expensive, proprietary.
<marcelo> The goal of this old model is to keep the hardware system running
<andres> plas
<riel> so basically, a single computer is an unreliable piece of shit (relatively speaking) ...
<riel> ... and High Availability is the collection of methods to make the job the computer does more reliable
<riel> you can do that by better hardware structures
<riel> or by better software structures
<riel> usually a combination of both
<marcelo> the Linux model of high availability is software based.
<marcelo> Now let me explain some basic concepts of HA
<marcelo> First, it's very important that we don't rely on unique hardware components in a High Availability system
<marcelo> for example, you can have two network cards connected to a network
<marcelo> In case one of the cards fails, the system tries to use the other card.
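On Linux, this two-card setup can be built with the kernel's bonding driver in active/backup mode. A minimal sketch, assuming interfaces eth0/eth1 and an illustrative address (bonding is just one way to do this):

{{{
# load the bonding driver; mode 1 = active-backup, check link every 100 ms
modprobe bonding mode=1 miimon=100

# configure the bonded interface, then enslave both real NICs;
# if eth0 loses its link, traffic fails over to eth1 automatically
ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1
}}}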
<marcelo> A hardware component that cannot fail because the whole system depends on it is called a "Single Point of Failure"
<marcelo> SPOF, to make it short. :)
<marcelo> Another important concept which must be known before we continue is "failover"
<marcelo> Failover is the process by which one machine takes over the job of another node
<riel> "machine" in this context can be anything, btw ...
<riel> if a disk fails, another disk will take over
<riel> if a machine from a cluster fails, the other machines take over the task
<riel> but to have failover, you need to have good software support
<riel> because most of the time you will be using standard computer components
<marcelo> well, this is all the "theory" needed to explain the next parts.
<riel> so let me make a quick condensation of this introduction
<riel> 1. normal computers are not reliable enough for some people (like: an internet shop), so we need a trick .. umm method ... to make the system more reliable
<riel> 2. high availability is the collection of these methods
<riel> 3. you can do high availability by using special hardware (very expensive) or by using a combination of normal hardware and software
<riel> 4. if one point in the system breaks and it makes the whole system break, that point is a single point of failure .. SPOF
<riel> 5. for high availability, you should have no SPOFs ... if one part of the system breaks, another part of the system should take over
<riel> (this is called "failover")
<riel> now I think we should explain a bit about how high availability works .. the technical side
<riel> umm wait ... sorry marcelo ;)
<marcelo> ok
<marcelo> Let's talk about the basic components of HA
<marcelo> Or at least some of them.
<marcelo> A simple disk running a filesystem is clearly an SPOF
<marcelo> If the disk fails, every part of the system which depends on the data contained on it will stop.
<marcelo> To keep a disk from being an SPOF of a system, RAID can be used.
<marcelo> RAID-1, which is a feature of the Linux kernel...
<marcelo> Allows "mirroring" of all data on the RAID device to a given number of disks...
<marcelo> So, when data is written to the RAID device, it's replicated between all disks which are part of the RAID-1 array.
<marcelo> This way, if one disk fails, the other disk (or disks) in the RAID-1 array will be able to continue working
<riel> because the system has a copy of the data on each disk
<riel> and can just use the other copies of the data
<riel> this is another example of "failover" ... when one component fails, another component is used to fulfill this function
<riel> and the system administrator can replace (or reformat/reboot/...) the failed component
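For a concrete picture, here is how a two-disk RAID-1 mirror is created with mdadm (a later tool than the raidtools of this lecture's era, but the idea is identical; device names are illustrative):

{{{
# create a RAID-1 array mirroring two partitions
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# use the mirror like any other block device
mke2fs /dev/md0
mount /dev/md0 /data

# after a failure, mark the bad member, pull it, and re-add a replacement
mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
mdadm /dev/md0 --add /dev/sda1    # the replaced/repaired partition
}}}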
<riel> this looks really simple when you don't look at it too much
<riel> but there is one big problem ... when do you need to do failover?
<riel> in some situations, you would have _2_ machines working at the same time and corrupting all data ... when you are not careful
<riel> think for example of 2 machines which are fileservers for the same data
<riel> at any time, one of the machines is working and the other is on standby
<riel> when the main machine fails, the standby machine takes over
<riel> ... BUT ...
<riel> what if the standby machine only _thinks_ the main machine is dead and both machines do something with the data?
<riel> which copy of the data is right, which copy of the data is wrong?
<riel> or worse ... what if _both_ copies of the data are wrong?
<riel> for this, there is a special kind of program, called a "heartbeating" program, which checks which parts of the system are alive
<riel> for Linux, one of these programs is called "heartbeat" ... marcelo and lclaudio have helped write this program
<riel> marcelo: could you tell us some of the things "heartbeat" does?
<marcelo> sure
<marcelo> "heartbeat" is a piece of software which monitors the availability of nodes
<marcelo> it "pings" the node which it wants to monitor, and, in case this node doesn't answer the "pings", it considers it to be dead.
<marcelo> when a node is considered to be dead we can fail over the services which it was running
<marcelo> the services which we take over are previously configured in both systems.
<marcelo> Currently heartbeat works only with 2 nodes.
<marcelo> It's been used in production environments in a lot of situations...
<riel> there is one small problem, however
<riel> what if the cleaning lady takes away the network cable between the cluster nodes by accident?
<riel> and both nodes *think* they are the only one alive?
<riel> ... and both nodes start messing with the data...
<riel> unfortunately there is no way you can prevent this 100%
<riel> but you can increase the reliability by simply having multiple means of communication
<riel> say, 2 network cables and a serial cable
<riel> and this is reliable enough that the failure of 1 component still allows good communication between the nodes
<riel> so they can reliably tell if the other node is alive or not
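In heartbeat this is configured in /etc/ha.d/ha.cf; a minimal sketch for two nodes with a serial link plus an ethernet link (node names, devices and timings are illustrative):

{{{
# /etc/ha.d/ha.cf -- illustrative two-node setup
keepalive 2          # send a heartbeat every 2 seconds
deadtime 10          # declare the peer dead after 10 silent seconds
serial /dev/ttyS0    # heartbeat over a null-modem serial cable ...
bcast eth1           # ... and over a dedicated ethernet segment
node web1            # the two cluster members
node web2
}}}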
<riel> this was the introduction to HA
<riel> now we will give some examples of HA software on Linux
<riel> and show you how they are used ...
<riel> ... <we will wait shortly until the people doing the translation to Español have caught up> ... ;)
<marcelo> Ok
<marcelo> Now let's talk about the available software for Linux
<riel> .. ok, the translators have caught up .. we can continue again ;)
<marcelo> Note that I'll be talking about the open source software for Linux
<marcelo> As I said above, the "heartbeat" program provides monitoring and basic failover of services
<marcelo> for two nodes only
<marcelo> As a practical example...
<marcelo> The web server at Conectiva (www.conectiva.com.br) has a standby node running heartbeat
<marcelo> In case our primary web server fails, the standby node will detect that and start the apache daemon
<marcelo> making the service available again
<marcelo> any service can be used, in theory, with heartbeat.
<riel> so if one machine breaks, everybody can still go to our website ;)
<marcelo> It only depends on the init scripts to start the service
<marcelo> So any service which has an init script can be used with heartbeat
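That pairing of a preferred node, a service address and an init script lives in /etc/ha.d/haresources; a sketch (node name, address and script name are illustrative):

{{{
# /etc/ha.d/haresources -- "web1 normally owns 192.168.1.50 and httpd";
# on failover, the standby brings up the same address and runs the same
# init script, so any service with an init script can be listed here
web1 192.168.1.50 httpd
}}}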
<marcelo> arjan asked if it takes over the IP address
<marcelo> There is a virtual IP address used by the service,
<marcelo> which is the "virtual server" IP address.
<marcelo> So, in our webserver case...
<marcelo> the real IP address of the first node is not used by the apache daemon
<marcelo> but the virtual IP address, which will be used by the standby node in case failover happens
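Taking over such a virtual address boils down to an IP alias plus an ARP announcement; roughly what the takeover does (address illustrative):

{{{
# the standby node brings the virtual IP up as an alias on its interface
ifconfig eth0:0 192.168.1.50 netmask 255.255.255.0 up
# heartbeat then sends a gratuitous ARP (it ships a send_arp helper)
# so the LAN updates which MAC address owns this IP
}}}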
<marcelo> Heartbeat, however, is limited to two nodes.
<marcelo> This is a big problem for a lot of big systems.
<marcelo> SGI has recently ported its FailSafe HA system to Linux (http://oss.sgi.com/projects/failsafe)
<marcelo> FailSafe is a complete cluster manager which supports up to 16 nodes.
<marcelo> Right now it's not ready for production environments
<marcelo> But that's being worked on by the Linux HA project people :)
<marcelo> SGI's FailSafe is GPL.
<riel> another type of clustering is LVS ... the Linux Virtual Server project
<riel> LVS uses a very different approach to clustering
<riel> you have 1 (maybe 2) machines that receive http (www) requests
<riel> but those machines don't do anything, except send the requests on to a whole bunch of machines that do the real work
<riel> so-called "working nodes"
<riel> if one (or even more) of the working nodes fail, the others will do the work
<riel> and all the routers (the machines sitting at the front) do is:
<riel> 1. keep track of which working nodes are available
<riel> 2. give the http requests to the working nodes
<riel> the kernel needs a special TCP/IP patch and a set of usermode utilities for this to work
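The main usermode utility is ipvsadm; a sketch of a round-robin web cluster with two working nodes behind one virtual address (all addresses illustrative):

{{{
# on the router: define a virtual service, TCP port 80, round-robin
ipvsadm -A -t 192.168.1.100:80 -s rr

# add two real ("working") nodes behind it, using NAT/masquerading
ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.2:80 -m
ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.3:80 -m

# inspect the table; a failed node is removed again with -d
ipvsadm -L -n
}}}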
<riel> Red Hat's "piranha" tool is a configuration tool for LVS that people can use to set up LVS clusters in an easier way
<riel> at Conectiva, we are also working on a very nice HA project
<riel> the project marcelo and Olive are working on is called "drbd"
<riel> the distributed redundant block device
<riel> this is almost the same as RAID-1, only over the network
<riel> to go back to RAID-1 (mirroring) ... RAID-1 uses 2 (or more) disks to store your data
<riel> with one copy of the data on every disk
<riel> drbd extends this idea to use disks on different machines on the network
<riel> so if one disk (on one machine) fails, the other machines still have the data
<riel> and if one complete machine fails, the data is on another machine ... and the system as a whole continues to run
<riel> if you use this together with ext3 or reiserfs, the machine that is still running can very quickly take over the filesystem that it has copied to its own disk
<riel> and your programs can continue to run
<riel> (with ext2, you would have to do an fsck first, which can take a long time)
<riel> this can be used for fileservers, databases, webservers, ...
<riel> everything where you need the very latest data to work
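Such a network mirror is described in /etc/drbd.conf; a sketch in the syntax of later drbd releases (resource, host and device names are illustrative):

{{{
resource r0 {
  protocol C;              # synchronous: a write completes only once
                           # it is safely on both nodes' disks
  on web1 {
    device  /dev/drbd0;    # the mirrored device the filesystem sits on
    disk    /dev/sda7;     # local backing partition
    address 10.0.0.1:7788; # link to the peer node
  }
  on web2 {
    device  /dev/drbd0;
    disk    /dev/sda7;
    address 10.0.0.2:7788;
  }
}
}}}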
<riel> ...
<riel> this is the end of our part of the lecture, if you have any questions, you can ask them and we will try to give you a good answer ;)

See also [http://www.linux-ha.org/]
----
CategoryDocs