4.2 Given a scenario, troubleshoot common hardware failures.
📘CompTIA Server+ (SK0-005)
1. Predictive Failures
Predictive failures are early warnings that hardware may fail soon.
Key points:
- Detected through monitoring tools and SMART (Self-Monitoring, Analysis, and Reporting Technology) for drives.
- Indicators include:
  - Increasing error counts
  - Degraded performance
  - Warning alerts in system logs
In IT environments:
- Monitoring software alerts administrators that a disk is likely to fail soon.
- This allows the disk to be replaced before it actually fails.
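The "increasing error counts" indicator can be sketched in a few lines of Python. This is only an illustration: the `DiskSample` structure, the `reallocated_sectors` field, and the `max_growth` threshold are assumptions made for the example, not values from any vendor or monitoring product.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """One monitoring reading for a drive (hypothetical structure)."""
    day: int
    reallocated_sectors: int


def is_predictive_failure(samples, max_growth=5):
    """Return True when the error counter grew by more than max_growth
    between the earliest and latest sample.

    max_growth is an illustrative threshold, not a vendor value.
    """
    if len(samples) < 2:
        return False  # not enough history to see a trend
    ordered = sorted(samples, key=lambda s: s.day)
    growth = ordered[-1].reallocated_sectors - ordered[0].reallocated_sectors
    return growth > max_growth


# Example readings (made-up numbers for illustration only)
healthy = [DiskSample(1, 0), DiskSample(2, 0), DiskSample(3, 1)]
degrading = [DiskSample(1, 2), DiskSample(2, 9), DiskSample(3, 31)]
```

With these sample readings, `is_predictive_failure(degrading)` flags the drive while `is_predictive_failure(healthy)` does not. Real monitoring tools use their own attributes and thresholds.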
2. Memory Errors and Failures
a) System Crash
A system crash occurs when the system stops functioning due to a critical error.
Causes:
- Faulty RAM
- Incompatible memory modules
- Overheating
b) Blue Screen / Purple Screen
- Blue Screen of Death (BSOD) occurs in Windows systems.
- Purple Screen of Death (PSOD) occurs on VMware ESXi hosts.
Meaning:
- Indicates a critical system error, often related to:
  - Memory issues
  - Driver problems
  - Hardware faults
c) Memory Dump
A memory dump is a file created when a crash occurs.
Purpose:
- Helps administrators analyze the cause of the crash.
- Contains system state and memory contents at the time of failure.
d) Memory Utilization
Memory utilization refers to how much RAM is in use.
Issues:
- High utilization → slow performance or system freeze
- Memory leaks → applications consuming increasing memory over time
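A memory leak shows up as usage that keeps climbing and never falls back. The sketch below is a deliberately crude heuristic, assuming memory readings (in MB) collected at regular intervals. The function name, the `min_samples` value, and the sample numbers are illustrative assumptions.

```python
def looks_like_leak(samples_mb, min_samples=5):
    """Return True if memory use rose at every step across the samples.

    Strictly increasing usage is only a rough hint of a leak; normal
    workloads can also grow for a while, so treat this as a prompt to
    investigate, not a diagnosis.
    """
    if len(samples_mb) < min_samples:
        return False  # too few readings to call it a trend
    pairs = zip(samples_mb, samples_mb[1:])
    return all(later > earlier for earlier, later in pairs)


# Made-up readings for illustration
steady = [2100, 2150, 2090, 2120, 2110, 2105]
leaking = [2100, 2300, 2550, 2800, 3100, 3450]
```

Here `looks_like_leak(leaking)` is true because every reading is higher than the one before it, while `steady` fluctuates and is not flagged.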
e) Power-On Self-Test (POST) Errors
POST is a diagnostic test performed when the system starts.
Memory-related POST errors:
- Beep codes or error messages
- Indicate faulty or missing RAM
f) Random Lockups
The system becomes unresponsive without a clear error.
Causes:
- Faulty RAM
- Driver conflicts
- Resource exhaustion
g) Kernel Panic
A kernel panic is a critical error in Linux/Unix systems.
Meaning:
- The operating system cannot recover safely.
- Often caused by:
  - Hardware failure (RAM, CPU)
  - Corrupted drivers
  - Memory errors
3. CMOS Battery Failure
The CMOS battery keeps the real-time clock running and preserves the stored BIOS/UEFI settings while the server is powered off.
Symptoms of failure:
- Incorrect system time and date
- BIOS settings reset to default
- Boot errors
Impact:
- Loss of hardware configuration settings
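The "incorrect system time and date" symptom can be checked with a simple comparison. This is a minimal sketch, assuming you know a date the server could not legitimately predate (for example, its deployment date). The `reference` value and function name are hypothetical.

```python
from datetime import datetime


def clock_looks_reset(now, earliest_plausible):
    """Return True if the reported time predates a known-good reference.

    A clock that has jumped back before the reference date is a hint
    (not proof) that the real-time clock lost power.
    """
    return now < earliest_plausible


# Assumed reference: a date the server cannot plausibly predate
reference = datetime(2024, 1, 1)
```

For example, `clock_looks_reset(datetime(2000, 1, 1), reference)` returns true, which would be consistent with a clock that fell back to a firmware default. The default date itself varies by vendor.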
4. System Lockups
A system lockup occurs when the server stops responding.
Indicators:
- No keyboard/mouse response
- No network activity
- Requires reboot
Causes:
- Memory failure
- CPU overload
- Hardware conflicts
5. Random Crashes
Random crashes occur without a consistent pattern.
Causes:
- Faulty RAM
- Overheating
- Power supply issues
- Software conflicts
6. Fault and Device Indicators
These are signs that help identify failing hardware.
a) Visual Indicators
i) Light-Emitting Diode (LED)
LEDs are lights on hardware components.
Uses:
- Indicate power status
- Show disk activity
- Signal hardware errors (amber/red LEDs)
ii) Liquid Crystal Display (LCD) Panel Readouts
Some servers have built-in LCD screens.
Uses:
- Display system status
- Show error messages or codes
- Provide hardware diagnostics
b) Auditory and Olfactory Cues
Auditory (Sound-based):
- Beep codes during POST
- Fan noise changes (can indicate overheating or a failing fan)
Olfactory (Smell-based):
- Burning smell → possible hardware damage
- Indicates overheating or electrical failure
c) POST Codes
POST codes are diagnostic signals during startup.
Types:
- Beep codes (audio signals)
- LED codes
- Numeric display codes
Purpose:
- Help identify which component is failing (RAM, CPU, etc.)
7. Misallocated Virtual Resources
This refers to host resources (CPU, memory, storage) being allocated incorrectly to virtual machines.
Examples:
- Too much or too little RAM assigned to virtual machines
- CPU overcommitment
- Improper storage allocation
Effects:
- Poor performance
- Application crashes
- System instability
In IT environments:
- Virtual machines may fail to start
- Server resources may become overutilized
- Overutilization can lead to memory errors or system crashes
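Overcommitment is easiest to see as a ratio of what has been assigned to virtual machines versus what the host physically has. The sketch below uses made-up VM names and host sizes. It does not claim any particular ratio is safe, since acceptable values depend on the hypervisor and the workload.

```python
def overcommit_ratio(allocated, physical):
    """Return allocated / physical; values above 1.0 mean overcommitment."""
    if physical <= 0:
        raise ValueError("physical capacity must be positive")
    return allocated / physical


# Hypothetical host and VM inventory, for illustration only
host_cores = 16
host_ram_gb = 64
vms = [
    {"name": "web01", "vcpus": 4, "ram_gb": 16},
    {"name": "db01", "vcpus": 8, "ram_gb": 32},
    {"name": "app01", "vcpus": 8, "ram_gb": 32},
]

total_vcpus = sum(vm["vcpus"] for vm in vms)    # 20 vCPUs assigned
total_ram_gb = sum(vm["ram_gb"] for vm in vms)  # 80 GB assigned

cpu_ratio = overcommit_ratio(total_vcpus, host_cores)    # 20 / 16
ram_ratio = overcommit_ratio(total_ram_gb, host_ram_gb)  # 80 / 64
```

In this example both ratios come out to 1.25, meaning the VMs have been promised 25% more CPU and RAM than the host physically provides.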
Key Exam Takeaways
You should be able to:
- Identify symptoms of memory failures:
  - System crashes
  - BSOD / Purple screen
  - Kernel panic
  - Random lockups
- Understand hardware indicators:
  - LED lights
  - LCD displays
  - POST codes (beep, LED, numeric)
  - System logs and alerts
- Recognize early warning signs:
  - Predictive failures
  - Increasing memory errors
  - SMART alerts
- Understand supporting hardware issues:
  - CMOS battery failure (BIOS resets, incorrect time)
  - Resource misallocation in virtual environments
