2.4 Explain the key concepts of high availability for servers.
📘CompTIA Server+ (SK0-005)
Fault tolerance is the ability of a server or system to keep running when something fails. It underpins high availability because services and applications stay online despite the failure. In other words, even if part of the system stops working, the server continues to operate normally.
Fault tolerance can be achieved in two main ways:
- Server-level redundancy
- Component-level redundancy
1. Server-Level Redundancy
Definition:
Server-level redundancy means having more than one server performing the same task so that if one server fails, another takes over. This is usually implemented with clustering or load balancing.
Key Points:
- The servers are usually configured in a cluster (active-active or active-passive).
- If one server crashes, another server in the cluster continues serving clients.
- It’s like having a backup server ready to run at any time.
Example in IT environment:
- A company has a web server cluster hosting a website.
- Server A and Server B are in an active-passive setup.
- If Server A fails, Server B automatically takes over with little or no downtime.
- Users typically don’t notice that one server failed.
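The active-passive failover in this example can be sketched in a few lines of Python. This is a minimal illustration, not how any particular clustering product works: the `Server` class, the `healthy` flag, and the `pick_active` function are hypothetical names invented for this sketch, and a real cluster would detect failure through heartbeats or health checks rather than a manually set flag.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical cluster node; 'healthy' stands in for a real health check."""
    name: str
    healthy: bool = True


def pick_active(servers):
    """Return the first healthy server in priority order (primary first)."""
    for server in servers:
        if server.healthy:
            return server
    raise RuntimeError("no healthy server available")


server_a = Server("Server A")  # primary (active)
server_b = Server("Server B")  # standby (passive)
cluster = [server_a, server_b]

before = pick_active(cluster).name  # Server A serves clients
server_a.healthy = False            # simulate a crash of Server A
after = pick_active(cluster).name   # Server B takes over

print(before, "->", after)  # prints "Server A -> Server B"
```

The standby only receives traffic once the primary is marked unhealthy, which is the defining trait of an active-passive setup.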
Advantages:
- Minimizes downtime.
- Can handle more load by distributing traffic (active-active setup).
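To illustrate how an active-active setup spreads load, here is a small round-robin sketch. It assumes a simple rotation policy and a hypothetical `healthy` dictionary that records which servers are up; real load balancers offer other policies (least connections, weighted, and so on) and perform their own health checks.

```python
from itertools import cycle


def distribute(requests, servers, healthy):
    """Assign each request to the next healthy server in round-robin order."""
    if not any(healthy[s] for s in servers):
        raise RuntimeError("no healthy server available")
    rotation = cycle(servers)
    assignments = {}
    for request in requests:
        server = next(rotation)
        while not healthy[server]:  # skip servers that are down
            server = next(rotation)
        assignments[request] = server
    return assignments


servers = ["Server A", "Server B"]
requests = ["r1", "r2", "r3", "r4"]

# Both servers up: traffic alternates between them.
both_up = distribute(requests, servers, {"Server A": True, "Server B": True})

# Server A down: Server B absorbs all traffic.
one_down = distribute(requests, servers, {"Server A": False, "Server B": True})
```

With both servers healthy, each handles half of the requests; when one fails, the survivor carries the full load, so it must be sized for that case.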
Considerations:
- Requires extra servers, which means higher cost.
- Needs proper configuration of clustering software and network.
2. Component-Level Redundancy
Definition:
Component-level redundancy means duplicating parts inside a single server, rather than having multiple servers. If one component fails, the server can continue running using the backup component.
Key Components that can be redundant:
- Power supplies: Dual power supplies so if one fails, the other keeps the server powered.
- Network cards (NICs): Multiple NICs so if one fails, the other can handle traffic.
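- Hard drives: Using RAID (Redundant Array of Independent Disks) to mirror data (RAID 1) or stripe it with parity (RAID 5 or 6) across disks. Striping alone (RAID 0) provides no redundancy.
- Memory (RAM): ECC memory detects and corrects single-bit errors, and some servers also support memory mirroring, which keeps a duplicate copy of data on a second set of modules so one set can fail without crashing the server.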
Example in IT environment:
- A database server has 2 power supplies.
- One power supply fails. The server continues running because the second supply is active.
- Or a server uses RAID 1 (mirrored disks). If one hard drive fails, the data is still available from the second drive.
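The RAID 1 behavior described above can be modeled with a toy Python class. This is only a conceptual sketch: the `Mirror` class and its methods are hypothetical, disks are represented as dictionaries, and a real RAID controller also handles rebuilds, hot spares, and error detection.

```python
class Mirror:
    """Toy RAID 1: every write goes to all working disks; reads use any survivor."""

    def __init__(self, disk_count=2):
        self.disks = [dict() for _ in range(disk_count)]
        self.failed = set()  # indexes of disks that have failed

    def write(self, block, data):
        # Duplicate the data onto every disk that is still working.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def fail(self, index):
        # Simulate a drive failure.
        self.failed.add(index)

    def read(self, block):
        # Serve the read from the first disk that has not failed.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise IOError("all disks in the mirror have failed")


mirror = Mirror()
mirror.write(0, b"orders")
mirror.fail(0)           # first drive dies
data = mirror.read(0)    # data is still served from the second drive
```

Because each block exists on both drives, losing one drive does not lose data; losing both does, which is why RAID is not a substitute for backups.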
Advantages:
- Provides redundancy without needing extra servers.
- Lower cost compared to server-level redundancy.
Considerations:
- Only protects against failures of the duplicated hardware components, not software faults or the loss of the entire server.
- Needs careful planning to ensure critical components are redundant.
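When planning which components to duplicate, a rough availability estimate can help. The sketch below uses the standard formula for redundant parts in parallel, 1 - (1 - a)^n, which assumes that failures are independent and that any single working part is enough. The 99% figure is an illustrative assumption, not a vendor specification, and `parallel_availability` is a hypothetical helper name.

```python
def parallel_availability(component_availability, count):
    """Availability of 'count' redundant parts where any one is sufficient.

    Assumes each part fails independently of the others.
    """
    return 1 - (1 - component_availability) ** count


single = parallel_availability(0.99, 1)  # one power supply, assumed 99% available
dual = parallel_availability(0.99, 2)    # two redundant power supplies

print(f"single: {single:.2%}, dual: {dual:.4%}")
```

Under these assumptions a second power supply raises availability from 99% to about 99.99%. Real-world gains are smaller when failures share a cause, such as both supplies being fed from the same circuit.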
Comparison: Server-Level vs Component-Level Redundancy
| Feature | Server-Level Redundancy | Component-Level Redundancy |
|---|---|---|
| What fails? | Entire server | Individual components (disk, power, NIC, etc.) |
| Cost | Higher (requires extra servers) | Lower (just duplicate components) |
| Protection against | Server failure, heavy load | Hardware failures only |
| Example | Web server cluster | RAID disks, dual power supplies |
| Complexity | High (clustering and failover configuration) | Medium (hardware setup inside server) |
Exam Tips:
- Remember: Server-level redundancy = multiple servers, Component-level redundancy = backup parts inside a server.
- Think about failover scenarios:
  - Server-level redundancy handles a full server crash.
  - Component-level redundancy handles a hardware failure such as a power supply or hard drive.
- Questions may ask about cost vs protection:
  - Server-level is more expensive but protects against server crashes.
  - Component-level is cheaper but limited to hardware issues.
In short:
- Fault tolerance keeps your system running even if something fails.
- Server-level redundancy = multiple servers, protects against server failure.
- Component-level redundancy = multiple parts inside one server, protects against hardware failure.
