Today we were thinking about redundancy options for CCIE.PL. We have a few restrictions there, which come either from policy or from our personal views on paid services and sharing admin access. Put simply, we are thinking about how to automate failover in case our primary server or database has problems. The easiest solution would be to use the Cloudflare free tier, but let’s say we don’t want to do this now. So we were looking at other options, and there was an idea that maybe we could use a cloud load balancer for on-premises services. First thought: it’s brilliant. On second thought: that idea was definitely wrong. Let me show you why.
Why is there no such service as a cloud load balancer for on-premises services?
Our idea was simple: a load balancer service in any public cloud that probes our internet-facing servers in on-premises data centers and provides failover in case our primary server is down. Sounds easy. But it’s not possible to do that. Here are the reasons:
- Reason 0: This is the only reason that carries real weight here; the others are in pretty much random order. Neither Amazon Web Services, Microsoft Azure nor e24cloud provides a service where the pool can contain external servers. So you can create a load balancer service, but it can only distribute traffic between your servers inside the public cloud. Why? For the same reasons our idea was not that brilliant.
- Reason 1: Health probes might not give the correct result. Let’s say our servers are in Poland but the closest load balancer can be set up in the UK. So we face typical “traditional network” problems here:
  - RTT (round-trip time) may be quite long, so probes may time out and raise false positive alerts about a server that is actually healthy
  - Jitter (variation of RTT) may cause problems for health checks because the path may change quite often on the Internet
  - Servers in Poland may be temporarily inaccessible from the UK even if they are still accessible from Poland. Well, that’s how the Internet works after all
- Reason 2: We increase latency and slow things down for our users. Most people visiting ccie.pl right now are from Poland (check it out, there is an English section as well!). So if we set up a load balancer in the UK, their requests will go from Poland to the UK and then back to Poland. That means higher latency and a bad user experience, because content takes longer than expected to load.
- Reason 3: Routing asymmetry. AWS and Azure load balancers don’t apply SNAT to incoming traffic (I’m not sure about e24cloud, but I assume it’s the same). That means we will send a request to the load balancer IP 126.96.36.199, while the response will come from the public IP address of the real server, 188.8.131.52. Asymmetric routing generally causes problems at the network, system and application levels.
- Reason 4: Cost. Common practice in public clouds is that you pay for the service and for traffic leaving the cloud. With asymmetry, the provider can earn only on the service. If there were SNAT, they could charge us for traffic, but well… imagine a DDoS attack (I know they have protection in place) or a large download of data from our server. Yeah, that might cost us a lot of money.
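To make Reason 1 a bit more concrete, here is a minimal sketch of how a typical "N consecutive failures" health check could react to long RTT and jitter. Everything in it is an assumption for illustration: the `ProbeConfig` values, the `is_marked_down` helper and the RTT samples are made up, and real cloud load balancers have their own probe logic and settings.

```python
from dataclasses import dataclass


@dataclass
class ProbeConfig:
    # Hypothetical values, not taken from any real provider.
    timeout_ms: float = 200.0      # a probe slower than this counts as failed
    unhealthy_threshold: int = 3   # consecutive failures before marking down


def is_marked_down(rtt_samples_ms, config):
    """Return True if this probe sequence would mark the server as down.

    rtt_samples_ms holds one RTT per probe in milliseconds; None means the
    probe got no answer at all (for example a path outage between the UK
    and Poland while the server itself is fine).
    """
    consecutive_failures = 0
    for rtt in rtt_samples_ms:
        failed = rtt is None or rtt > config.timeout_ms
        if failed:
            consecutive_failures += 1
            if consecutive_failures >= config.unhealthy_threshold:
                return True
        else:
            consecutive_failures = 0
    return False


config = ProbeConfig()

# Made-up samples: the same healthy server seen from two vantage points.
local_probe = [5, 7, 6, 8, 5, 6]                 # probe close to the server
remote_jittery = [60, 250, 310, 220, 70, 65]     # far away, burst of slow replies
remote_path_loss = [60, None, None, None, 65]    # far away, short path outage

print(is_marked_down(local_probe, config))       # False
print(is_marked_down(remote_jittery, config))    # True
print(is_marked_down(remote_path_loss, config))  # True
```

In the last two cases the server never went down, yet the remote probe would trigger a failover. Raising the timeout or the threshold reduces such false alarms, but it also makes a real outage take longer to detect.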
There are a lot more, but in my opinion those are the major ones that make this service not a good idea.
What can we do if a cloud load balancer for on-premises services is a bad idea? There are other options, like Cloudflare or a load balancer in our own data center, even a virtual one. And we are considering other options as well 🙂
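Coming back to Reason 3 for a moment, the asymmetry problem can be sketched in a few lines. A TCP client matches incoming packets against its open connections by the full address and port tuple, so a reply coming from a different source IP than the one the request went to does not belong to any connection. The `Flow` type, the `accepts_reply` helper and all addresses below are hypothetical (the IPs are from the documentation ranges, not our real ones).

```python
from typing import NamedTuple


class Flow(NamedTuple):
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


def accepts_reply(open_flows, src_ip, src_port, dst_ip, dst_port):
    """Return True if an incoming packet matches a connection the client opened."""
    # From the client's point of view the packet's destination is "local"
    # and the packet's source is "remote".
    return Flow(dst_ip, dst_port, src_ip, src_port) in open_flows


# Hypothetical addresses for illustration only.
CLIENT_IP = "203.0.113.5"
LB_IP = "192.0.2.10"            # cloud load balancer
REAL_SERVER_IP = "198.51.100.20"  # on-premises server

# The client opened a connection to the load balancer, not to the real server.
open_flows = {Flow(CLIENT_IP, 50000, LB_IP, 443)}

# Reply from the load balancer address: matches the open connection.
print(accepts_reply(open_flows, LB_IP, 443, CLIENT_IP, 50000))           # True

# Reply sent directly by the real server: matches nothing on the client.
print(accepts_reply(open_flows, REAL_SERVER_IP, 443, CLIENT_IP, 50000))  # False
```

Stateful firewalls and NAT devices on the path do a similar lookup, which is one reason asymmetric return traffic tends to get dropped somewhere along the way.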