Issues related to high CPU/memory usage
- 1 Monitoring CPU Usage and Performance
- 1.1 Run CPU Stress Tests
- 1.1.1 What to look for
- 1.2 Run a memory stress test
- 1.2.1 What to look for
- 2 Testing Disk I/O Performance
- 2.1 Run Disk I/O Benchmarks
- 2.1.1 What to look for
- 2.2 Advanced Test with fio
- 2.2.1 What to look for
- 3 Network Throughput and Latency Testing
- 4 Check for Resource Contention Using System Logs
- 5 Summary
When running a cluster on a host where resources are shared instead of dedicated, it’s important to check whether resources are overprovisioned or facing contention. This can happen when multiple virtual machines (VMs) share the same physical host, leading to performance degradation. In environments with no swap space, oversold or overprovisioned memory makes it even more critical to watch for out-of-memory (OOM) issues and CPU throttling. Here’s how to check for potential issues related to high CPU and memory usage and to verify the performance of your disk and network.
Monitoring CPU Usage and Performance
Run CPU Stress Tests
A stress test can help determine if your CPU is operating within the expected limits or if contention is causing performance issues. Tools like stress-ng or sysbench can simulate high CPU load and highlight resource issues.
sudo apt install stress-ng
stress-ng --cpu 16 --timeout 60s
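This command starts 16 CPU stressor workers for 60 seconds. Match the worker count to the number of vCPUs assigned to the VM; passing --cpu 0 lets stress-ng size the worker count to the machine's CPUs.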
What to look for
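Monitor CPU usage with tools like top or htop. If CPU usage is inconsistent during the test or plateaus below 100% per core, it may indicate CPU sharing or throttling. A steal value (st in top) that stays above zero means the hypervisor is giving CPU time to other VMs.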
Since there’s no swap, monitor how the system behaves under load. If CPU-intensive tasks trigger memory pressure (which can lead to other services being affected), it’s a sign of possible contention.
Tip: Run dmesg | tail after the test to look for messages indicating CPU throttling or soft lockups, which can point to an overloaded host.
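Steal time can also be measured directly from /proc/stat while the stress test is running. The following is a minimal sketch, assuming a Linux guest; the helper names steal_pct and sample_steal are illustrative and not part of any tool mentioned here.

```shell
# Print the percentage of CPU time stolen by the hypervisor between two
# aggregate "cpu" lines from /proc/stat.
# Line layout: cpu user nice system idle iowait irq softirq steal guest guest_nice
# Fields 2..9 are summed for the total; field 9 is steal.
steal_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total[NR] = 0
          for (i = 2; i <= 9; i++) total[NR] += $i
          steal[NR] = $9 }
        END {
          dt = total[2] - total[1]
          if (dt <= 0) { print "0.0"; exit }
          printf "%.1f\n", 100 * (steal[2] - steal[1]) / dt
        }'
}

# Hypothetical wrapper: take two samples N seconds apart (default 5).
sample_steal() {
    first="$(grep '^cpu ' /proc/stat)"
    sleep "${1:-5}"
    second="$(grep '^cpu ' /proc/stat)"
    steal_pct "$first" "$second"
}
```

Calling sample_steal 10 in a second terminal during the stress test prints the steal percentage over a ten-second window. A value that climbs under load suggests the host's CPUs are oversubscribed.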
Run a memory stress test
To test if the memory allocation is sufficient for your workload, use tools like stress-ng with memory allocation tests. Without swap space, the test becomes more critical in identifying issues like oversubscription or contention.
stress-ng --vm 1 --vm-bytes 90% --timeout 160s
This test allocates about 90% of available memory for 160 seconds. If the VM does have swap configured and you see swap usage while RAM is still available, it may indicate memory oversubscription, where the host uses techniques like ballooning or memory swapping to share RAM among VMs.
What to look for
While the test is running, monitor the kernel log (dmesg -w) to check for any OOM killer events. If your memory allocation triggers OOM kills unexpectedly, it may indicate that the host’s memory is oversold or shared among too many VMs.
If your stress test causes an OOM kill event even when you haven’t reached the allocated RAM, it’s a clear sign of resource contention on the host.
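To avoid reading the whole kernel log by eye, the OOM-related lines can be filtered out. This is a small sketch; oom_events is a hypothetical helper name, and the patterns are assumptions based on common kernel messages (such as "invoked oom-killer" and "Out of memory: Killed process"), so the exact wording may differ between kernel versions.

```shell
# Print only the kernel log lines that record OOM killer activity.
# Reads log text on stdin, so it works with dmesg output or a saved log file.
# "|| true" keeps the exit status at 0 when nothing matches.
oom_events() {
    grep -iE 'out of memory|oom-killer|oom_reaper' || true
}
```

For example, dmesg -T | oom_events lists past events with readable timestamps, and dmesg -w | oom_events follows new ones while the memory test runs. On systems that restrict kernel log access, dmesg may need sudo.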
Testing Disk I/O Performance
Run Disk I/O Benchmarks
Use tools like fio or dd to benchmark disk I/O. Perform these tests at different times of the day to see if performance fluctuates, which could imply overselling.
dd if=/dev/zero of=testfile bs=1G count=1 oflag=direct status=progress
What to look for
Significant variation in disk performance at different times (e.g., high performance during off-peak hours and slower during business hours) could indicate that the disk is shared between multiple VMs and resources are being contested.
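Comparing runs across the day is easier when each result is recorded with a timestamp. The sketch below assumes GNU dd, whose summary line ends with the throughput (for example "..., 2.5 s, 429 MB/s"); the helper names dd_rate and log_dd_run are illustrative.

```shell
# Extract the throughput (last comma-separated field) from dd's summary line.
dd_rate() {
    awk -F', ' '/copied/ { print $NF }'
}

# Run the same 1 GiB direct-write test as above, print a timestamped
# throughput line, then remove the test file.
# LC_ALL=C keeps the dd summary in the format dd_rate expects.
log_dd_run() {
    target="${1:-testfile}"
    rate="$(LC_ALL=C dd if=/dev/zero of="$target" bs=1G count=1 oflag=direct 2>&1 | dd_rate)"
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M')" "$rate"
    rm -f "$target"
}
```

Running log_dd_run >> dd-results.log from a scheduled job a few times a day builds a simple history in which dips during busy hours stand out.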
Advanced Test with fio
fio runs a more detailed benchmark than dd, reporting IOPS, bandwidth, and latency. Run it at different times (e.g., peak business hours and late night) to spot fluctuations.
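The exact fio invocation is not given here, so the following is only one possible setup, not the original command. It writes a hypothetical job file (contention-test.fio) for a 60-second, 4 KiB random-write test with direct I/O. The block size, queue depth, file size, and file names are all assumptions to adjust for your workload.

```shell
# Write an example fio job file; every value below is an illustrative assumption.
cat > contention-test.fio <<'EOF'
; 4 KiB random writes, bypassing the page cache, for a fixed 60 seconds.
[global]
ioengine=libaio
direct=1
bs=4k
size=1G
runtime=60
time_based
group_reporting

[randwrite]
rw=randwrite
iodepth=32
filename=fio-testfile
EOF
echo "Job file written; run it with: fio contention-test.fio"
```

Keeping the parameters in a job file means every run, at any time of day, uses identical settings, so differences in the results reflect the host rather than the test. Remove fio-testfile when you are done.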
What to look for
Pay attention to fluctuations in write speed and latency across different tests. Variations in performance could indicate disk contention, particularly if latency is high or if write speeds dip during peak usage times.
Monitor I/O Wait Time: Use iostat or vmstat to monitor iowait over time. High iowait percentages, especially with low activity on your VM, could mean the disk is shared among many users.
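A single iowait reading can mislead, so it helps to average several samples. The sketch below parses vmstat output and averages the wa column; avg_iowait is a hypothetical helper name. It finds the column from the header rather than by position, because the column layout can vary between vmstat versions.

```shell
# Average the "wa" (iowait) column of vmstat output read from stdin.
# The first data row is skipped because vmstat reports averages since boot there.
avg_iowait() {
    awk '
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
        col && $1 ~ /^[0-9]+$/ { if (seen++) { sum += $col; n++ } }
        END { if (n) printf "%.1f\n", sum / n; else print "n/a" }'
}
```

For example, vmstat 5 13 | avg_iowait prints the average iowait over roughly one minute. A high average while your own workload is idle is the pattern described above.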
Network Throughput and Latency Testing
Run Network Bandwidth Tests: Use iperf3 to test network speed. If your network performance is much lower than advertised or varies significantly throughout the day, this may indicate a shared and oversold network.
What to look for
If network speeds fluctuate heavily over the day, or if there is high latency within the same data center, it may indicate network oversubscription.
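As a rough illustration of both checks, the sketch below shows a basic iperf3 pairing in comments and a small helper that extracts the average round-trip time from ping's summary line. The host name peer.example.internal is a placeholder and avg_rtt_ms is a hypothetical helper name.

```shell
# Throughput (run manually; peer.example.internal is a placeholder host):
#   on the peer:    iperf3 -s
#   on this node:   iperf3 -c peer.example.internal -t 30

# Latency: print the average round-trip time in ms from ping's summary line,
# e.g. "rtt min/avg/max/mdev = 0.045/0.060/0.080/0.010 ms".
avg_rtt_ms() {
    awk -F'/' '/^(rtt|round-trip)/ { print $5 }'
}
```

For example, ping -c 10 peer.example.internal | avg_rtt_ms prints a single number that can be logged alongside the iperf3 results and compared between peak and off-peak hours.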
Check for Resource Contention Using System Logs
Look for Throttling or Contention Logs: Check system logs (/var/log/syslog, /var/log/kern.log) for messages that might indicate resource contention or throttling imposed by the hypervisor.
Messages about CPU throttling, OOM kills, or "CPU soft lockups" may point to contention or an overloaded host. These logs are vital for identifying performance problems caused by shared resources.
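A quick count of such messages per category shows which kind of contention dominates. The following is a minimal sketch; contention_summary is a hypothetical helper name, and the search patterns are assumptions about typical kernel wording rather than an exhaustive list.

```shell
# Count case-insensitive matches for each contention-related pattern
# across the log files given as arguments.
contention_summary() {
    for pattern in 'soft lockup' 'out of memory' 'oom-killer' 'throttl'; do
        count="$(cat "$@" 2>/dev/null | grep -ic -- "$pattern")"
        printf '%s: %s\n' "$pattern" "${count:-0}"
    done
}
```

For example, contention_summary /var/log/syslog /var/log/kern.log prints one line per pattern. Reading these files may require sudo, and some distributions keep kernel messages in the journal instead.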
Summary
By carefully monitoring CPU and memory usage, conducting stress tests, and examining system logs, you can pinpoint when overselling or resource contention is affecting performance. These tests help ensure that your system is performing optimally and is not overburdened by shared resources.
Classified as Getvisibility - Partner/Customer Confidential