2001-01-02 -+- Jim Dennis -+- jimd@starshine.org -+- It's a pity this article is riddled with so many typos and grammatical
errors. It actually made it difficult to read in places.

It would be nice if there were good information about Linux support for multi-initiator ("shared") SCSI. I haven't found any. There are
references (a Google search will reveal that much). However, none of
them provides enough information for a typical or senior sysadmin to
make a decision: Will it work? Should I recommend it? How would I do it?

It seems that FC (Fibre Channel) and the SANs that people are building
around it are going to accomplish the same things that people were trying
to do with M-I SCSI. Apparently there have traditionally been device
driver and even hardware microcode problems that make multi-initiator
SCSI difficult and that constrain the range of hardware that can be
used in such a setup. (The author of this article mentions the fact
that some SCSI host adapters don't offer an option to change their own ID;
they can only be SCSI ID 7. That's just one problem: they must also
allow the system to skip the bus reset. Otherwise any initiator/system
on the SCSI chain that was rebooted would reset the whole bus. Clearly
the (Linux) device driver must also refrain from gratuitous resets. It's
not clear that all of the Linux SCSI drivers could be used in an M-I
cluster.)

2001-11-04 -+- Sandeep Banerjee -+- sandeep_b@email.com -+- Hi Atif,

Your page proved to be a good starting point for my search on redundancy. I am looking for a non-commercial solution for system-level redundancy, the objective being host switchover on any process or OS failure.

I need a simple solution that does not require RAID or any kind of checkpointing or backup. All it should do is monitor the health of the system and switch over to the backup host at the slightest hint of failure. Note that there are only 2 hosts.

I look forward to some guidance from you in this regard.

Regards,
Sandeep

Bangalore, India.

2001-11-27 -+- Witzbold -+- -+- This is a test and can be deleted.

2002-02-03 -+- Allen Stewart -+- -+- This article is a good starting point for HA, but it seems the technologies listed are still not mature, and it will be years before the number of IP failover clusters and stateful failover clusters is even a drop in the bucket compared with Windows 2000 based HA systems.

Allen

2002-02-27 -+- Ghazanfer -+- falkenn@hotmail.com -+- Hi Atif!
Your article gave good insights into HA, but I also wanted to know: do we really need 2 NICs and a serial connection? Can't it be done using one NIC on each machine?

2002-03-07 -+- zhanghai -+- jordan_zhang@dell.com -+- Hi Atif

I appreciate you sharing your experience with clusters and SANs. It’s just what I had in mind. I hope you can offer a detailed configuration for GFS. I would also like to know whether you know of similar non-commercial software. I look forward to your feedback.

Jordan

2002-04-11 -+- Bernd Tannenbaum -+- tannenbaum@itenos.de -+- Hi all :)

----Problems with 2 Servers and 1 Router (Arp-Table)----

First, I have to say I'm new to Linux, but I now have the job of clustering 2 Apache web servers.
I found Heartbeat but am not happy with it. 90% of the possible network errors result in the server being disconnected from the internet. (Causes could be a single switch port going down or a network cable accidentally removed by the wrong person.) In this case Heartbeat does nothing, because the slave still reaches its master over the second network card and will not take over the resource.
In a second try I configured the heartbeat not over a second network card but over the standard eth0. Now the slave noticed when the master was unreachable, took the resource, and sent a gratuitous ARP (GARP) to tell the router that it now holds the web server IP. But when the master comes back (perhaps because the switch port that was down has reinitialized), it still has the resource, too. In fact, it never gave it up. The slave now notices that the master is back and gives up the resource, so the master is the only owner again. But because the master thinks it never lost the resource, it doesn't send a GARP. So the router doesn't learn the new owner of the resource IP, and the web server is unreachable.
I implemented some cron jobs on both servers to do what Heartbeat doesn't do:
testing the gateway, then testing the server IP, and sending GARPs.
They work fine, but I feel that a professional program written by people who know a lot more about Linux clusters than I do should be able to do the same.
Maybe someone can help me out?
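
A minimal sketch of what such a cron-driven check might look like, in Python. The comment above does not give the actual scripts, so everything here is an assumption made for illustration: the addresses (192.0.2.1 for the gateway, 192.0.2.10 for the service IP), the interface name eth0, the helper names, and the decision rule itself (announce the service IP only when this node holds it and can reach the gateway). It also assumes the `ping`, `ip` and iputils `arping` commands are installed.

```python
import subprocess

# Hypothetical values, chosen only for this illustration.
GATEWAY = "192.0.2.1"
SERVICE_IP = "192.0.2.10"
DEVICE = "eth0"


def reachable(host, timeout_s=2):
    """Return True if a single ICMP echo request to host is answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def holds_ip(ip, dev):
    """Return True if ip is currently configured on interface dev."""
    out = subprocess.run(
        ["ip", "-o", "-4", "addr", "show", "dev", dev],
        capture_output=True,
        text=True,
    ).stdout
    return any(f" {ip}/" in line for line in out.splitlines())


def plan_action(gateway_ok, owns_ip):
    """Decide what one cron run should do (assumed rule, not Heartbeat's).

    "wait":      the gateway is unreachable, so an announcement would not
                 reach the router anyway.
    "send_garp": this node holds the service IP and can reach the gateway,
                 so it refreshes the router's ARP entry.
    "idle":      this node does not hold the service IP.
    """
    if not gateway_ok:
        return "wait"
    return "send_garp" if owns_ip else "idle"


def garp_command(ip, dev, count=3):
    """Build an iputils arping call that sends unsolicited (gratuitous) ARP."""
    return ["arping", "-U", "-I", dev, "-c", str(count), ip]


def run_once():
    """One pass of the check; intended to be called from a cron job."""
    action = plan_action(reachable(GATEWAY), holds_ip(SERVICE_IP, DEVICE))
    if action == "send_garp":
        subprocess.run(garp_command(SERVICE_IP, DEVICE), check=False)
    return action
```

Under this rule the returning master would announce the service IP on its next run, even though it believes it never lost the resource. Sending the ARP announcement normally needs root privileges.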

With best regards,
Bernd Tannenbaum

2002-07-09 -+- Frederick L. Belfon -+- fredericksec@yahoo.com -+- I would like to know how to configure a 2-node high availability cluster for an Apache web server that makes use of a MySQL database.

I am new to Linux and need some real help.

Thank you -+- 136.148.1.94 = Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)

2003-02-27 -+- richard clason -+- richard@firstdirectsolutions.co.uk -+- Can anyone suggest how to set up the following? I want to run a server using Red Hat, Ensim, PHP, MySQL, e-mail and SSL. Should this server fail, a second server should cut in and take over all traffic. Both servers would have up-to-date data.

Thanks -+- 80.195.20.133 = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)

2003-07-30 -+- amol -+- amolbhore@yahoo.com -+- I want information regarding the failure of the backup node
in cluster computing. I don't know whether MPICH takes care
of it or whether we have to handle it explicitly. -+- 203.197.93.66 = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

2004-07-23 -+- Mohsin Khan -+- aaghaz@aaghaz.net -+- Hi,

This is a nice but brief article. LVS is almost the same technology, but it is more powerful than Heartbeat. It uses pulse instead, which is a kind of heartbeat, along with lots of other tools, to make your Linux server a cool load balancer and redundant server.

Regards
Mohsin
http://portal.aaghaz.net -+- 212.165.158.100 = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)

2004-08-30 -+- varun -+- vsorc@yahoo.co.in -+- Hi Atif,
I have read your article on HA, and it has inspired me a lot. I am a computer science and engineering student from India. As part of our curriculum, we have selected HA clustering as our final-year project. Can you suggest any links or books on HA so that we can build a clear idea of the subject? -+- 202.88.243.122 = Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)

#2005-01-28 -+- Tamra Burgess -+- -+- Regarding SteelEye Technology, which is attempting to sell consumer software under the guise of being "credible" in the high availability world. Given their reluctance to involve QA and other departments in product planning ahead of time, I know of, and have personally experienced, their lack of credibility.

http://tam-ra.us/donate.php
-+- 69.22.218.69 = Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

2005-03-02 -+- Ravikumar -+- naidu_trk@rediffmail.com -+- I found the article quite interesting, but the projects discussed in it seem to focus on HA in the case of hardware failures rather than software failures (a more common scenario). I would be interested in running a redundant (hot standby) process that takes over from exactly the point where the primary died. -+- 221.134.8.146 = Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

#2005-08-17 -+- Tamra Burgess -+- mstamra@yahoo.com -+- Here's a company which is asking customers and partners to trust them with their critical data while committing one fraud and deception after another:

http://tam-ra.us/livelearn.php -- regarding SteelEye Technology -+- 70.112.123.166 = Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

2006-07-27 -+- Clemon Allison -+- clemon.allison@academy.com -+- I'm having issues with HA over long distances without the benefit of a serial cable for redundancy. My team lead seems to think that this can be done. I have tested the failover following the documentation on this webpage and it works great! I have now been tasked with testing this over a dedicated VLAN (the second network card uses this VLAN for the heartbeat), with the primary server in the main computer room and the secondary server in the disaster recovery room across campus. I can ping the heartbeat IP addresses over this VLAN, so I know it works. When I start the primary server, it comes up fine. However, when I start up the secondary server, it takes over the HA resources, and I never see the primary give up the heartbeat, not even in the debug logs. I've tweaked the deadtime and initdead times, and the secondary server still takes over the resources. Any advice as to what I should do? -+- 70.244.58.142 = Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
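
The role of the deadtime and initdead timers mentioned above can be illustrated with a small Python model. This is a simplified sketch, not Heartbeat's actual code: the `Timing` class, the `declared_dead` function and the example values are all hypothetical, and the only assumption taken from Heartbeat is the general meaning of the two settings (initdead applies while waiting for a first heartbeat after startup, deadtime applies to silence after that).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timing:
    """Timer settings in seconds (example values below are made up)."""
    deadtime: float   # silence after which a known peer is declared dead
    initdead: float   # grace period after startup before the first verdict


def declared_dead(now: float, started_at: float,
                  last_heard: Optional[float], t: Timing) -> bool:
    """Return True if this node would consider its peer dead at time now.

    last_heard is the time of the most recent heartbeat received from the
    peer, or None if no heartbeat has arrived since this node started.
    """
    if last_heard is None:
        # Nothing received yet: only the startup grace period protects
        # the peer from being declared dead.
        return now - started_at >= t.initdead
    return now - last_heard >= t.deadtime


timing = Timing(deadtime=30, initdead=120)
# No heartbeat ever arrives: takeover is delayed by initdead, not prevented.
print(declared_dead(now=119, started_at=0, last_heard=None, t=timing))
print(declared_dead(now=120, started_at=0, last_heard=None, t=timing))
```

In this model, larger timer values only postpone the verdict when no heartbeat packets arrive at all. A successful ping shows that ICMP crosses the VLAN, which is not the same as showing that the heartbeat packets themselves arrive, so that may be worth checking separately.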