We have a Lighttpd/Perl/MySQL web service that we run on an Ubuntu VPS, and we want to add redundancy so that if our datacenter has issues, we stay up.

Interested in thoughts and comments on our proposed solution:

  • We're looking at using GlusterFS to mirror the web roots and config files for our apps, and MySQL Replication in multimaster mode to mirror the database. Both would run over the WAN/Public Internet between the two datacenters with IPSec Transport mode encryption.

  • We'd use dual A records (one IP at each datacenter) to host the sites. This would provide round-robin distribution while everything is working, and would fail over to the other server within 4 seconds should connectivity be lost (worst case; most browsers release a DNS pin in 1000 ms).

  • GlusterFS and MySQL replication would both "self heal" and update the other server automatically once connectivity was restored, so there is no issue of needing to update an out-of-sync server after failover, and both servers can run in "live mode" with both A records live all the time - so there is no DNS propagation to take place to make a failover happen.

  • In the event of software failure or a need to take one server offline for maintenance (rather than connectivity failure) we could simply pull one server's IP offline using the VPS control panel, or firewall it temporarily with iptables on the server itself.

  • As well as the automatic failover we'd experience with a datacenter outage, we could also automatically initiate a failover in the event of software failure on one server using automatic monitoring - if one server isn't returning the content we expect to see, we would get an alert, and the monitoring software would automatically pull the offending server offline using the VPS host's API - causing requests to fail over to the other.
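One detail of the multimaster part can be sketched here: when both MySQL servers accept writes, auto-increment keys can collide unless each server hands out a disjoint sequence. MySQL provides the `auto_increment_increment` and `auto_increment_offset` system variables for this. The Python below only simulates the resulting ID sequences; the two-server layout and the offsets are assumptions for illustration, not a tested configuration.

```python
from itertools import islice


def auto_increment_ids(offset, increment):
    """Yield the IDs a server would assign with the given
    auto_increment_offset and auto_increment_increment (simulation only)."""
    next_id = offset
    while True:
        yield next_id
        next_id += increment


# Assumed two-master layout: server A uses offset 1, server B uses offset 2,
# and both use an increment of 2.
server_a = list(islice(auto_increment_ids(1, 2), 5))
server_b = list(islice(auto_increment_ids(2, 2), 5))

# The sequences are disjoint, so inserts on either side cannot
# produce the same primary key.
assert not set(server_a) & set(server_b)
print(server_a, server_b)
```

This only avoids key collisions on INSERT. Conflicting updates to the same row on both servers while the link is down would still have to be handled by the application.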

Interested to know if anyone has tried doing anything similar, or for any thoughts, comments or suggestions on the above strategy.

asked 02 Jun '10, 20:57

Jeffery

Looks well thought out; I'm just not sure about the 4-second failover, given that (as I understand it) the servers are in two different physical locations. Also, have you considered the (odd) case where only the DB fails but the web server keeps serving pages? Is this handled properly?

(03 Jun '10, 15:21) pmarini

@pmarini in the instance that the DB fails this could get caught by the "monitoring" process, or, I suppose by the CGI scripts which could determine an invalid sql response and take their own IP address offline
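A minimal sketch of that monitoring process, assuming each server exposes a status page that runs a trivial SQL query and prints a marker string on success, so a dead database shows up as a missing marker. The URLs, the marker text, and the `take_offline` callback (standing in for whatever the VPS host's API offers) are all hypothetical.

```python
import urllib.request


def fetch(url, timeout=5.0):
    """Return the response body as text, or None on a network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError
        return None


def check_server(url, expected_marker, fetcher=fetch):
    """True if the page loads and contains the expected marker text."""
    body = fetcher(url)
    return body is not None and expected_marker in body


def monitor_once(servers, expected_marker, take_offline, fetcher=fetch):
    """Check every server once and pull the failing ones offline.

    servers maps a name to its status URL. take_offline(name) is a
    hypothetical hook for the VPS host's API. Returns the names pulled.
    """
    failed = [name for name, url in servers.items()
              if not check_server(url, expected_marker, fetcher)]
    if len(failed) == len(servers):
        # Everything looks down: more likely a monitoring-side problem,
        # so alert a human instead of pulling the last server.
        return []
    for name in failed:
        take_offline(name)
    return failed
```

Running the monitor from a third location, rather than from either datacenter, avoids mistaking a broken inter-datacenter link for a dead server.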

The 4 seconds is based on Page 2, Table 1 of http://crypto.stanford.edu/dns/dns-rebinding.pdf. Although that paper is about DNS rebinding attacks, the same DNS "pinning" behavior in the browser is what we'd be relying on here with dual A records: when one A record's IP goes offline, the browser is forced to re-pin the DNS name to a different IP... in theory :)
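The fallback idea can be sketched from the client side: resolve every A record for the name, then try each address with a short timeout until one answers. This is a rough model of the behavior described, not how any particular browser implements pinning; the 1-second timeout is an assumption chosen for illustration.

```python
import socket


def resolve_all(host, port):
    """Return every distinct IPv4 address the name resolves to (its A records)."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def connect_with_failover(addresses, port, timeout=1.0,
                          connect=socket.create_connection):
    """Try each address in order; return (address, connection) for the first
    one that accepts. Raises ConnectionError if none do."""
    last_error = None
    for addr in addresses:
        try:
            return addr, connect((addr, port), timeout)
        except OSError as exc:  # refused, unreachable, or timed out
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")
```

With two A records, the worst-case added delay in this model is one timeout: the time spent waiting on the dead address before moving to the live one.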

(03 Jun '10, 16:49) Jeffery

This question will probably receive more answers on Serverfault.com. I think it's a very hard question...

(04 Jun '10, 09:02) guerda

Please accept an answer so the question/answer can be finished. Or provide more details so we can help.

(20 Apr '11, 14:10) rfelsburg ♦



Possibly not the answer you're looking for, but have you done a cost-benefit analysis to ascertain the true impact of an outage and then compared that with the costs of the kind of redundancy you're talking about? Have you taken into account the added complexity the above setup adds and ensured it won't cause more downtime than the issue you're trying to avoid? True real-time, live, geographically redundant infrastructure can be extremely complex and in many cases is overkill. A good backup schedule and a properly planned disaster recovery plan can often reduce a datacenter outage (by failing over to your backup datacenter) to just a couple of minutes, while avoiding a lot of the complexity. FWIW, the fact that you're currently running on a VPS was the first indication that what you're looking to do is almost certainly overkill, but I could be wrong.
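As a rough sketch of that comparison, the arithmetic can be written down in a few lines of Python. Every number in the usage example is a placeholder, not a figure from this thread, and `fraction_avoided` is an assumed parameter for the share of downtime the redundant setup would actually prevent.

```python
def annual_outage_cost(outages_per_year, hours_per_outage, loss_per_hour):
    """Estimated yearly loss from downtime."""
    return outages_per_year * hours_per_outage * loss_per_hour


def redundancy_pays_off(outage_cost, redundancy_cost, fraction_avoided=1.0):
    """True if the downtime losses avoided exceed the yearly redundancy cost.

    fraction_avoided below 1.0 accounts for downtime that the redundant
    setup fails to prevent or introduces itself.
    """
    return outage_cost * fraction_avoided > redundancy_cost


# Placeholder inputs only: 12 outages a year, 2 hours each, 500 lost per hour.
yearly_loss = annual_outage_cost(12, 2, 500)
worth_it = redundancy_pays_off(yearly_loss, redundancy_cost=6000, fraction_avoided=0.8)
```

The cost side should include the extra administration time, not just the second server.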

--jeremy

answered 08 Jun '10, 18:45

jeremy ♦♦

As a followup to some of these "why bother" / "paranoia" comments:

We have experienced four outages in the last month within our current datacenter (naming no names, but they are a huge organization in Dallas) which is why we're looking to add redundancy.

Each time our services go down we lose a significant amount of money as our advertising partners (Google, etc) are bringing visitors for which we pay, but who can't see our site to convert to sales. While some advertising partners such as Adwords can be "paused" in a short space of time during an outage, not all of them can. Working with small margins at high volume as we are, any extended outage can be very expensive.

Although the datacenter claims 100% uptime, things can still happen, as we saw last month.

The amount we lost due to the outages so far last month was far LESS than the cost of a redundant setup in another facility would have been.

-J

answered 11 Jun '10, 13:58

Jeffery 1

FWIW, my response was definitely not meant to be of the "why bother" OR "paranoia" variety. It's just that you need a good understanding of exactly how much downtime actually costs you versus how much a properly built redundant setup will cost, one that won't cause more downtime than it avoids (and I see a ton of those in the real world). You should also keep in mind how it will impact scaling, administrative tasks, upgrades, etc. Doing this wrong can have quite a few long-term negative consequences.

(11 Jun '10, 14:22) jeremy ♦♦

I would recommend looking into Linux-HA, Heartbeat, and Pacemaker.

I use them for high availability failovers between servers.

answered 23 Jul '10, 15:30

rfelsburg ♦

Paranoia => greater cost than benefit obtained. Sounds like a Fed job.

answered 09 Jun '10, 18:34

nanodiamond
