KernelNewbies: Documents / HighAvailability

Revision 2 as of 2006-08-14 22:44:20

<marcelo> The old model of high availability is "fault tolerance", usually hardware-based.
<marcelo> Expensive, proprietary.
<marcelo> The goal of this old model is to keep the hardware system running.
<andres> *clap*
<riel> so basically, a single computer is an unreliable piece of shit (relatively speaking) ...
<riel> ... and High Availability is the collection of methods to make the job the computer does more reliable
<riel> you can do that by better hardware structures
<riel> or by better software structures
<riel> usually a combination of both
<marcelo> the Linux model of high availability is software-based.
<marcelo> Now let me explain some basic concepts of HA
<marcelo> First, it's very important that we don't rely on unique hardware components in a High Availability system
<marcelo> for example, you can have two network cards connected to a network
<marcelo> In case one of the cards fails, the system tries to use the other card.
<marcelo> A hardware component that cannot fail because the whole system depends on it is called a "Single Point of Failure"
<marcelo> SPOF, to make it short. :)
<marcelo> Another important concept which must be known before we continue is "failover"
<marcelo> Failover is the process by which one machine takes over the job of another node
<riel> "machine" in this context can be anything, btw ...
<riel> if a disk fails, another disk will take over
<riel> if a machine from a cluster fails, the other machines take over the task
<riel> but to have failover, you need to have good software support
<riel> because most of the time you will be using standard computer components
<marcelo> well, this is all the "theory" needed to explain the next parts.
<riel> so let me make a quick condensation of this introduction
<riel> 1. normal computers are not reliable enough for some people (like: an internet shop), so we need a trick .. umm, method ... to make the system more reliable
<riel> 2. high availability is the collection of these methods
<riel> 3. you can do high availability by using special hardware (very expensive) or by using a combination of normal hardware and software
<riel> 4. if one point in the system breaks and it makes the whole system break, that point is a single point of failure .. SPOF
<riel> 5. for high availability, you should have no SPOFs ... if one part of the system breaks, another part of the system should take over
<riel> (this is called "failover")
<riel> now I think we should explain a bit about how high availability works .. the technical side
<riel> umm wait ... sorry marcelo ;)
<marcelo> ok
<marcelo> Let's talk about the basic components of HA
<marcelo> Or at least some of them,
<marcelo> A single disk running a filesystem is clearly an SPOF
<marcelo> If the disk fails, every part of the system which depends on the data contained on it will stop.
<marcelo> To avoid a disk being a SPOF of a system, RAID can be used.
<marcelo> RAID-1, which is a feature of the Linux kernel...
<marcelo> allows "mirroring" of all data on the RAID device to a given number of disks...
<marcelo> So, when data is written to the RAID device, it's replicated between all disks which are part of the RAID-1 array.
<marcelo> This way, if one disk fails, the other disk (or disks) in the RAID-1 array will be able to continue working
<riel> because the system has a copy of the data on each disk
<riel> and can just use the other copies of the data
<riel> this is another example of "failover" ... when one component fails, another component is used to fulfill its function
<riel> and the system administrator can replace (or reformat/reboot/...) the broken component
<riel> this looks really simple when you don't look at it too much
<riel> but there is one big problem ... when do you need to do failover?
<riel> in some situations, you would have _2_ machines working at the same time and corrupting all data ... when you are not careful
<riel> think for example of 2 machines which are fileservers for the same data
<riel> at any time, one of the machines is working and the other is on standby
<riel> when the main machine fails, the standby machine takes over
<riel> ... BUT ...
<riel> what if the standby machine only _thinks_ the main machine is dead, and both machines do something with the data?
<riel> which copy of the data is right, which copy of the data is wrong?
<riel> or worse ... what if _both_ copies of the data are wrong?
<riel> for this, there is a special kind of program, called a "heartbeating" program, which checks which parts of the system are alive
<riel> for Linux, one of these programs is called "heartbeat" ... marcelo and lclaudio have helped write this program
<riel> marcelo: could you tell us some of the things "heartbeat" does?
<marcelo> sure
<marcelo> "heartbeat" is a piece of software which monitors the availability of nodes
<marcelo> it "pings" the node which it wants to monitor, and, in case this node doesn't answer the "pings", it considers it to be dead.
<marcelo> when a node is considered to be dead, we can fail over the services which it was running
<marcelo> the services which we take over are previously configured in both systems.
<marcelo> Currently heartbeat works only with 2 nodes.
<marcelo> It's been used in production environments in a lot of situations...
<riel> there is one small problem, however
<riel> what if the cleaning lady takes away the network cable between the cluster nodes by accident?
<riel> and both nodes *think* they are the only one alive?
<riel> ... and both nodes start messing with the data...
<riel> unfortunately there is no way you can prevent this 100%
<riel> but you can increase the reliability by simply having multiple means of communication
<riel> say, 2 network cables and a serial cable
<riel> and this is reliable enough that the failure of 1 component still allows good communication between the nodes
<riel> so they can reliably tell if the other node is alive or not
<riel> this was the introduction to HA
<riel> now we will give some examples of HA software on Linux
<riel> and show you how they are used ...
<riel> ... <we will wait shortly until the people doing the translation to Español have caught up> ... ;)
<marcelo> Ok
<marcelo> Now let's talk about the available software for Linux
<riel> .. ok, the translators have caught up .. we can continue again ;)
<marcelo> Note that I'll be talking about the open source software for Linux
<marcelo> As I said above, the "heartbeat" program provides monitoring and basic failover of services
<marcelo> for two nodes only
<marcelo> As a practical example...
<marcelo> The web server at Conectiva (www.conectiva.com.br) has a standby node running heartbeat
<marcelo> In case our primary web server fails, the standby node will detect that and start the apache daemon
<marcelo> making the service available again
<marcelo> any service can be used, in theory, with heartbeat.
<riel> so if one machine breaks, everybody can still go to our website ;)
<marcelo> It only depends on the init scripts to start the service
<marcelo> So any service which has an init script can be used with heartbeat
<marcelo> arjan asked if it takes over the IP address
<marcelo> There is a virtual IP address used by the service
<marcelo> which is the "virtual server" IP address.
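A two-node setup like the one described might be sketched with heartbeat's classic configuration files. This is only an illustration: the hostnames, interface names, virtual IP, and init-script name below are made up, and the exact directives depend on the heartbeat version installed.

```
# /etc/ha.d/ha.cf -- same file on both nodes (hypothetical example)
serial /dev/ttyS0      # serial heartbeat link between the nodes
baud 19200
bcast eth1             # second heartbeat path over a dedicated ethernet link
keepalive 2            # seconds between heartbeats
deadtime 10            # declare the peer dead after 10 seconds of silence
node web1 web2         # cluster node names, as reported by uname -n

# /etc/ha.d/haresources -- identical on both nodes
# "web1 normally owns the virtual IP 192.168.1.100 and the httpd service"
web1 192.168.1.100 httpd
```

Note the two independent heartbeat media (serial plus ethernet): a single failed cable no longer makes the standby node wrongly conclude that its peer is dead.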
<marcelo> So, in our webserver case...
<marcelo> the real IP address of the first node is not used by the apache daemon
<marcelo> but the virtual IP address, which will be used by the standby node in case failover happens
<marcelo> Heartbeat, however, is limited to two nodes.
<marcelo> This is a big problem for a lot of big systems.
<marcelo> SGI has ported its FailSafe HA system to Linux recently ([http://oss.sgi.com/projects/failsafe])
<marcelo> FailSafe is a complete cluster manager which supports up to 16 nodes.
<marcelo> Right now it's not ready for production environments
<marcelo> But that's being worked on by the Linux HA project people :)
<marcelo> SGI's FailSafe is GPL.
<riel> another type of clustering is LVS ... the Linux Virtual Server project
<riel> LVS uses a very different approach to clustering
<riel> you have 1 (maybe 2) machines that receive http (www) requests
<riel> but those machines don't do anything, except send the requests to a whole bunch of machines that do the real work
<riel> so-called "working nodes"
<riel> if one (or even more) of the working nodes fails, the others will do the work
<riel> and all the routers (the machines sitting at the front) do is:
<riel> 1. keep track of which working nodes are available
<riel> 2. give the http requests to the working nodes
<riel> the kernel needs a special TCP/IP patch and a set of usermode utilities for this to work
<riel> Red Hat's "piranha" tool is a configuration tool for LVS, that people can use to set up LVS clusters in an easier way
<riel> at Conectiva, we are also working on a very nice HA project
<riel> the project marcelo and Olive are working on is called "drbd"
<riel> the distributed replicated block device
<riel> this is almost the same as RAID-1, only over the network
<riel> to go back to RAID-1 (mirroring) ... RAID-1 is using 2 (or more) disks to store your data
<riel> with one copy of the data on every disk
<riel> drbd extends this idea to use disks on different machines on the network
<riel> so if one disk (on one machine) fails, the other machines still have the data
<riel> and if one complete machine fails, the data is on another machine ... and the system as a whole continues to run
<riel> if you use this together with ext3 or reiserfs, the machine that is still running can very quickly take over the filesystem that it has copied to its own disk
<riel> and your programs can continue to run
<riel> (with ext2, you would have to do an fsck first, which can take a long time)
<riel> this can be used for fileservers, databases, webservers, ...
<riel> everything where you need the very latest data to work
<riel> ...
<riel> this is the end of our part of the lecture. if you have any questions, you can ask them and we will try to give you a good answer ;)
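The network mirroring drbd does can be sketched as a resource definition. This uses later drbd.conf syntax rather than the version discussed in the lecture, and the hostnames, disks, and addresses are invented for illustration.

```
# /etc/drbd.conf -- one mirrored block device across two machines (sketch)
resource r0 {
  protocol C;                  # a write completes only once both nodes have it on disk
  on node1 {
    device    /dev/drbd0;      # the block device the filesystem is created on
    disk      /dev/sdb1;       # local backing disk
    address   10.0.0.1:7788;   # replication link
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

The primary node mounts /dev/drbd0 and every write is replicated to the peer's disk; on failover, the surviving node promotes its copy to primary and mounts it, which is the RAID-1-over-the-network behaviour described above.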

See also [http://www.linux-ha.org/]
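The LVS director described in the lecture is driven by the ipvsadm utility. As a rough sketch (the addresses are made up, the commands need root and a kernel with IPVS support, and NAT forwarding is just one of the available methods):

```
# On the director: create a virtual HTTP service, round-robin scheduling
ipvsadm -A -t 192.168.1.100:80 -s rr

# Add two working nodes (real servers) behind it, via NAT (-m)
ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.11:80 -m
ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.12:80 -m

# Show the current table of services and real servers
ipvsadm -L -n
```

If one working node dies, removing it from the table (by hand or with a monitoring tool such as piranha) lets the remaining nodes absorb its share of the requests.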


CategoryDocs
