High Availability vs Fault Tolerance

Reliability/Resiliency: The ability of an application or system to avoid failures, or recover from them, with minimal manual intervention.

Availability: A measure of reliability, i.e. the proportion of time a system is performing as expected. Performance degradation therefore lowers availability, even if the system never goes fully down.
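Because availability is a proportion of time, it reduces to simple arithmetic. The minimal Python sketch below computes an availability percentage and the downtime budget implied by a target. The function names are illustrative, and the 365-day year is a simplifying assumption.

```python
# Availability as the fraction of time a system performs as expected.
HOURS_PER_YEAR = 365 * 24  # assumption: ignores leap years


def availability(uptime_hours: float, total_hours: float) -> float:
    """Return availability as a percentage of the measured period."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return 100.0 * uptime_hours / total_hours


def downtime_budget_hours(target_percent: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours per period a system may be down (or too degraded) for a target."""
    return period_hours * (1 - target_percent / 100.0)


if __name__ == "__main__":
    # A system that was down, or too degraded to count as "up", for 8.76 hours in a year.
    print(f"{availability(HOURS_PER_YEAR - 8.76, HOURS_PER_YEAR):.3f}%")
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% allows {downtime_budget_hours(target):.2f} hours of downtime per year")
```

Each extra "nine" cuts the yearly downtime budget by a factor of ten.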

Redundancy (running multiple machines) combined with loose coupling (typically achieved with a load balancer, so clients do not depend on any single machine) improves both performance and availability.
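A minimal Python sketch of the load-balancer idea follows. It assumes a simple round-robin policy and a per-node health flag. The class `RoundRobinBalancer` and the node names are hypothetical, and real load balancers have their own health checks and routing algorithms. The point is that clients ask the balancer for a node rather than hard-coding one, so a failed node can be skipped.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical round-robin balancer that skips nodes marked unhealthy."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("need at least one node")
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)  # endless round-robin order

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def next_node(self):
        # Try each node at most once per call; fail only if every node is down.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["node-a", "node-b", "node-c"])
    lb.mark_down("node-b")
    print([lb.next_node() for _ in range(4)])
```

With `node-b` marked down, requests alternate between `node-a` and `node-c`, and callers never notice the failure.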

Performance and availability go hand in hand: good availability tends to come with good performance, and vice versa.

High Availability (HA) and Fault Tolerance (FT, also called continuous availability) are often used interchangeably when talking about keeping systems running with little or no degradation in their availability.

Although HA and FT share the same goal, they differ in how they achieve it.

HA typically aims to keep systems available with some degradation of SLAs and can tolerate some downtime. HA is achieved either by provisioning a fixed number of servers (typically at least two) or by auto scaling.
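Auto scaling can be sketched as computing a target node count from the current load and clamping it between a floor and a ceiling. The minimal Python sketch below assumes a fixed, known per-node capacity. `desired_node_count` and its defaults are hypothetical and are not any cloud provider's API. The floor of two mirrors the "at least two servers" rule of thumb.

```python
import math


def desired_node_count(load_tps: float, per_node_tps: float,
                       min_nodes: int = 2, max_nodes: int = 10) -> int:
    """Nodes needed to serve load_tps, clamped to [min_nodes, max_nodes].

    min_nodes keeps redundancy even at low load; max_nodes is the
    hypothetical ceiling an auto-scaling policy would enforce.
    """
    if per_node_tps <= 0:
        raise ValueError("per_node_tps must be positive")
    needed = math.ceil(load_tps / per_node_tps)
    return max(min_nodes, min(needed, max_nodes))


if __name__ == "__main__":
    for load in (50, 500, 2000):
        print(load, desired_node_count(load, per_node_tps=100))
```

A fixed-size deployment is the special case where `min_nodes == max_nodes`.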

Businesses typically define SLAs for a system both for normal operation and for periods of impact (when part of the system has failed).

Example:

  • During normal operation, the service needs to handle 500 transactions per second.
  • During an impact, when one node is down, 100 transactions per second is acceptable.

HA tries to meet at least the minimum acceptable SLA during an impact, either by scaling out up to a maximum number of nodes or by manually failing over to a secondary region within a defined downtime window.
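The example SLAs can be checked with a few lines of Python. This sketch uses the 500 and 100 transactions-per-second figures from the example and assumes each node handles 100 transactions per second. That per-node figure and the function names are hypothetical.

```python
NORMAL_SLA_TPS = 500  # from the example: normal operation
IMPACT_SLA_TPS = 100  # from the example: acceptable while a node is down
PER_NODE_TPS = 100    # assumption: capacity of a single node


def required_tps(during_impact: bool) -> int:
    """The SLA that applies right now."""
    return IMPACT_SLA_TPS if during_impact else NORMAL_SLA_TPS


def meets_sla(healthy_nodes: int, during_impact: bool,
              per_node_tps: int = PER_NODE_TPS) -> bool:
    """True if the surviving nodes can handle the applicable SLA."""
    return healthy_nodes * per_node_tps >= required_tps(during_impact)


if __name__ == "__main__":
    print(meets_sla(5, during_impact=False))  # all five nodes up
    print(meets_sla(4, during_impact=True))   # one node lost: HA only needs 100 tps
    print(meets_sla(4, during_impact=False))  # but 400 tps misses the normal SLA
```

Under HA, four surviving nodes (400 tps) are acceptable because only the reduced impact SLA has to hold.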

FT also uses multiple nodes or auto scaling to provide tolerance, but it tries to maintain the SLA defined for normal operation even during an impact. In the example, the system adjusts to keep handling 500 transactions per second, either by spinning up nodes automatically or by failing over to a secondary region with zero downtime. This makes FT more complex to implement and more expensive than HA.
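One common way to keep the normal SLA with zero downtime is to provision spare capacity up front ("N+k" sizing) so the loss of k nodes never drops throughput below the normal SLA. The post does not prescribe this technique, so treat it as an illustrative assumption. The Python sketch below compares that sizing with the minimum sizing, again assuming 100 transactions per second per node. The function names are hypothetical.

```python
import math

NORMAL_SLA_TPS = 500  # from the example
PER_NODE_TPS = 100    # assumption: capacity of a single node


def base_nodes(normal_tps: int, per_node_tps: int) -> int:
    """Minimum nodes that meet the normal SLA when nothing has failed."""
    return math.ceil(normal_tps / per_node_tps)


def ft_nodes(normal_tps: int, per_node_tps: int, failures_tolerated: int = 1) -> int:
    """N+k sizing: the normal SLA still holds after k node failures."""
    return base_nodes(normal_tps, per_node_tps) + failures_tolerated


def capacity_after_failures(nodes: int, failures: int, per_node_tps: int) -> int:
    """Throughput left once `failures` nodes are gone."""
    return max(nodes - failures, 0) * per_node_tps


if __name__ == "__main__":
    n = base_nodes(NORMAL_SLA_TPS, PER_NODE_TPS)
    f = ft_nodes(NORMAL_SLA_TPS, PER_NODE_TPS)
    print("minimum sizing:", n, "nodes,", capacity_after_failures(n, 1, PER_NODE_TPS), "tps after one failure")
    print("N+1 sizing:", f, "nodes,", capacity_after_failures(f, 1, PER_NODE_TPS), "tps after one failure")
```

With these numbers, the minimum sizing of 5 nodes drops to 400 tps after one failure. That is still above the 100 tps impact SLA, which is enough for HA. The N+1 sizing of 6 nodes still delivers 500 tps. The extra, mostly idle node is part of why FT costs more.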

To summarize: HA tolerates some downtime and compromises on the SLA during an impact, while FT does neither, which makes FT more complex and expensive. Fault-tolerant systems can be said to be continuously available.

