The old model of high availability is "fault tolerance", usually hardware-based.[[BR]] Expensive, proprietary.[[BR]] The goal of this old model is to keep the hardware system running.[[BR]] so basically, a single computer is an unreliable piece of shit (relatively speaking) ...[[BR]] ... and High Availability is the collection of methods to make the job the computer does more reliable[[BR]] you can do that with better hardware structures[[BR]] or with better software structures[[BR]] usually a combination of both[[BR]] the Linux model of high availability is software-based.[[BR]] Now let me explain some basic concepts of HA[[BR]] First, it's very important that we don't rely on unique hardware components in a High Availability system[[BR]] for example, you can have two network cards connected to a network[[BR]] in case one of the cards fails, the system tries to use the other card.[[BR]] A hardware component that cannot fail because the whole system depends on it is called a "Single Point of Failure"[[BR]] SPOF for short. :)[[BR]] Another important concept which must be known before we continue is "failover"[[BR]] Failover is the process by which one machine takes over the job of another node[[BR]] "machine" in this context can be anything, btw ...[[BR]] if a disk fails, another disk will take over[[BR]] if a machine from a cluster fails, the other machines take over its task[[BR]] but to have failover, you need good software support[[BR]] because most of the time you will be using standard computer components[[BR]] well, this is all the "theory" needed to explain the next parts.[[BR]] so let me make a quick summary of this introduction[[BR]] 1. normal computers are not reliable enough for some people (like: an internet shop), so we need a trick .. umm, method ... to make the system more reliable[[BR]] 2. high availability is the collection of these methods[[BR]] 3. 
you can do high availability by using special hardware (very expensive) or by using a combination of normal hardware and software[[BR]] 4. if one point in the system breaks and it makes the whole system break, that point is a single point of failure ... SPOF[[BR]] 5. for high availability, you should have no SPOFs ... if one part of the system breaks, another part of the system should take over[[BR]] (this is called "failover")[[BR]] now I think we should explain a bit about how high availability works .. the technical side[[BR]] umm wait ... sorry marcelo ;)[[BR]] ok[[BR]] Let's talk about the basic components of HA[[BR]] Or at least some of them.[[BR]] A simple disk running a filesystem is clearly an SPOF[[BR]] If the disk fails, every part of the system which depends on the data contained on it will stop.[[BR]] To keep a disk from being an SPOF of a system, RAID can be used.[[BR]] RAID-1, which is a feature of the Linux kernel...[[BR]] allows "mirroring" of all data on the RAID device to a given number of disks...[[BR]] So, when data is written to the RAID device, it's replicated to all disks which are part of the RAID1 array.[[BR]] This way, if one disk fails, the other disk(s) in the RAID1 array will be able to continue working[[BR]] because the system has a copy of the data on each disk[[BR]] and can just use the other copies of the data[[BR]] this is another example of "failover" ... when one component fails, another component is used to fulfill its function[[BR]] and the system administrator can replace (or reformat/reboot/...) the failed component[[BR]] this looks really simple when you don't look at it too much[[BR]] but there is one big problem ... when do you need to do failover?[[BR]] in some situations, you would have _2_ machines working at the same time and corrupting all data ... 
when you are not careful[[BR]] think for example of 2 machines which are fileservers for the same data[[BR]] at any time, one of the machines is working and the other is on standby[[BR]] when the main machine fails, the standby machine takes over[[BR]] ... BUT ...[[BR]] what if the standby machine only _thinks_ the main machine is dead, and both machines do something with the data?[[BR]] which copy of the data is right, which copy of the data is wrong?[[BR]] or worse ... what if _both_ copies of the data are wrong?[[BR]] for this, there is a special kind of program, called a "heartbeating" program, which checks which parts of the system are alive[[BR]] for Linux, one of these programs is called "heartbeat" ... marcelo and lclaudio have helped write this program[[BR]] marcelo: could you tell us some of the things "heartbeat" does?[[BR]] sure[[BR]] "heartbeat" is a piece of software which monitors the availability of nodes[[BR]] it "pings" the node which it wants to monitor, and, in case this node doesn't answer the "pings", it considers it to be dead.[[BR]] when a node is considered to be dead, we can fail over the services which it was running[[BR]] the services which we take over are previously configured on both systems.[[BR]] Currently heartbeat works only with 2 nodes.[[BR]] It's been used in production environments in a lot of situations...[[BR]] there is one small problem, however[[BR]] what if the cleaning lady takes away the network cable between the cluster nodes by accident?[[BR]] and both nodes *think* they are the only one alive?[[BR]] ... 
and both nodes start messing with the data...[[BR]] unfortunately there is no way you can prevent this 100%[[BR]] but you can increase the reliability by simply having multiple means of communication[[BR]] say, 2 network cables and a serial cable[[BR]] and this is reliable enough that the failure of 1 component still allows good communication between the nodes[[BR]] so they can reliably tell if the other node is alive or not[[BR]] this was the introduction to HA[[BR]] now we will give some examples of HA software on Linux[[BR]] and show you how they are used ...[[BR]] ... ... ;)[[BR]] Ok[[BR]] Now let's talk about the available software for Linux[[BR]] .. ok, the translators have caught up .. we can continue again ;)[[BR]] Note that I'll be talking about the open source software for Linux[[BR]] As I said above, the "heartbeat" program provides monitoring and basic failover of services[[BR]] for two nodes only[[BR]] As a practical example...[[BR]] the web server at Conectiva (www.conectiva.com.br) has a standby node running heartbeat[[BR]] In case our primary web server fails, the standby node will detect the failure and start the apache daemon[[BR]] making the service available again[[BR]] any service can be used, in theory, with heartbeat.[[BR]] so if one machine breaks, everybody can still go to our website ;)[[BR]] It only depends on the init scripts to start the service[[BR]] So any service which has an init script can be used with heartbeat[[BR]] arjan asked if it takes over the IP address[[BR]] There is a virtual IP address used by the service[[BR]] which is the "virtual server" IP address. 
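To make the failover logic described above concrete, here is a minimal sketch, in Python rather than heartbeat's actual code: the standby node counts missed heartbeat pings, and after a configured number of misses it declares the primary dead, takes over the virtual IP, and starts the service. The class name, the threshold value, and the IP address are all made up for illustration.

```python
# Hypothetical sketch of heartbeat-style failover on the standby node.
# Not the real "heartbeat" implementation; names and values are invented.

DEADTIME = 3  # missed heartbeats before the peer is declared dead (assumed value)

class StandbyNode:
    def __init__(self, virtual_ip):
        self.virtual_ip = virtual_ip      # the service's "virtual server" IP
        self.missed = 0                   # consecutive unanswered pings
        self.owns_ip = False
        self.service_running = False

    def on_heartbeat(self, answered):
        """Called once per heartbeat interval; `answered` is True
        if the primary responded to this ping."""
        if answered:
            self.missed = 0               # peer is alive, reset the counter
            return
        self.missed += 1
        if self.missed >= DEADTIME and not self.owns_ip:
            self.failover()

    def failover(self):
        # In real life this would bring up the virtual IP on our own
        # interface and run the service's init script.
        self.owns_ip = True
        self.service_running = True

standby = StandbyNode("10.0.0.100")       # made-up virtual IP
for answered in [True, True, False, False, False]:
    standby.on_heartbeat(answered)
print(standby.owns_ip, standby.service_running)   # True True: 3 missed pings triggered failover
```

In the real heartbeat program the equivalent information lives in its configuration files on both nodes (the node list, the deadtime, and the resources to take over, such as the virtual IP address and the service's init script), which is why the transcript stresses that the services must be configured on both systems in advance.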
[[BR]] So, in our webserver case...[[BR]] the real IP address of the first node is not used by the apache daemon[[BR]] but the virtual IP address, which will be taken over by the standby node in case failover happens[[BR]] Heartbeat, however, is limited to two nodes.[[BR]] This is a big problem for a lot of big systems.[[BR]] SGI has ported its Fail``Safe HA system to Linux recently ([http://oss.sgi.com/projects/failsafe])[[BR]] Fail``Safe is a complete cluster manager which supports up to 16 nodes.[[BR]] Right now it's not ready for production environments[[BR]] but that's being worked on by the Linux HA project people :)[[BR]] SGI's Fail``Safe is GPL.[[BR]] another type of clustering is LVS ... the Linux Virtual Server project[[BR]] LVS uses a very different approach to clustering[[BR]] you have 1 (maybe 2) machines that receive http (www) requests[[BR]] but those machines don't do anything, except send the requests to a whole bunch of machines that do the real work[[BR]] the so-called "working nodes"[[BR]] if one (or even more) of the working nodes fails, the others will do the work[[BR]] and all the routers (the machines sitting at the front) do is:[[BR]] 1. keep track of which working nodes are available[[BR]] 2. give the http requests to the working nodes[[BR]] the kernel needs a special TCP/IP patch and a set of usermode utilities for this to work[[BR]] Red Hat's "piranha" tool is a configuration tool for LVS that people can use to set up LVS clusters in an easier way[[BR]] at Conectiva, we are also working on a very nice HA project[[BR]] the project marcelo and Olive are working on is called "drbd"[[BR]] the Distributed Replicated Block Device[[BR]] this is almost the same as RAID1, only over the network[[BR]] to go back to RAID1 (mirroring) ... 
RAID1 uses 2 (or more) disks to store your data[[BR]] with one copy of the data on every disk[[BR]] drbd extends this idea to use disks on different machines on the network[[BR]] so if one disk (on one machine) fails, the other machines still have the data[[BR]] and if one complete machine fails, the data is on another machine ... and the system as a whole continues to run[[BR]] if you use this together with ext3 or reiserfs, the machine that is still running can very quickly take over the filesystem that it has copied to its own disk[[BR]] and your programs can continue to run[[BR]] (with ext2, you would have to do an fsck first, which can take a long time)[[BR]] this can be used for fileservers, databases, webservers, ...[[BR]] everything where you need the very latest data to work[[BR]] ...[[BR]] this is the end of our part of the lecture. If you have any questions, you can ask them and we will try to give you a good answer ;)[[BR]] See also [http://www.linux-ha.org/] ---- CategoryDocs