How to Monitor Disk I/O and Performance on Fedora (iostat, iotop)

Install sysstat and iotop on Fedora to monitor system-wide disk I/O and identify specific processes causing high disk usage.

The drive is spinning but nothing loads

You boot Fedora, open a terminal, and run a routine dnf upgrade --refresh. The progress bar stalls at twelve percent. You check the system monitor and the CPU is idle. The network is quiet. Yet the machine feels frozen. You wait ten minutes. The upgrade finally resumes. This is not a broken package manager. This is a disk I/O bottleneck. The storage subsystem is saturated, and the kernel is throttling every process that tries to write to the disk. You need to see the queue before you guess.

What the bottleneck actually looks like

Storage latency is invisible until it stops you. Think of disk I/O like a highway toll booth. The CPU is the car engine. The RAM is the passenger seat. The disk is the toll booth. When traffic is light, cars pass through instantly. When the booth closes or the attendant slows down, every car backs up. The engine idles. The passengers wait. The system appears frozen even though the hardware is technically running.

Modern SSDs hide this well, but mechanical drives, cheap NVMe controllers, heavy journaling, or aggressive backup daemons will still create queues. The kernel tracks these queues in the block layer. Every read or write request enters a queue, waits for the device to be ready, gets serviced, and leaves. Your job is to read the queue length and identify which process is holding the toll booth hostage. Run journalctl first. Read the actual error before guessing.
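Before profiling anything, skim the journal for storage errors from the current boot; the priority filter here is a starting point, not a rule:

journalctl -b -p err
# WHY: -b limits output to the current boot; -p err hides everything below error priority, so ata, nvme, and filesystem complaints stand out.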

Mapping system-wide I/O with iostat

The sysstat package provides iostat. It reads directly from /proc/diskstats and translates kernel counters into human-readable throughput and latency numbers. Install it before you start guessing.

sudo dnf install sysstat
# WHY: sysstat ships iostat, sar, and mpstat. Fedora does not include it by default to save space on minimal installs.
# WHY: The package also installs the sysstat-collect.timer systemd unit, which archives historical I/O data once you enable it.
# WHY: Those snapshots land in /var/log/sa/, one file per day, ready for later analysis with sar.

Once installed, run the extended report with a one-second interval.

iostat -xz 1
# WHY: -x requests extended statistics, including per-request wait times (r_await, w_await) and the average queue length (aqu-sz).
# WHY: -z suppresses devices with zero activity, keeping the output readable on systems with many virtual disks.
# WHY: The trailing 1 sets a one-second polling interval. Omit it to get a single snapshot since boot.

Focus on three readings. r_await and w_await show the average time a read or write request spends in the queue plus the time the device spends servicing it. Values under five milliseconds are healthy for NVMe. Values over fifty milliseconds indicate saturation. aqu-sz shows the average number of requests in the queue; older sysstat releases printed a separate svctm service-time column, but it was unreliable and has been removed. If await is high and aqu-sz is large, requests are stacking up faster than the device drains them. If await is high while the queue stays short, the physical medium itself is struggling. %util shows the percentage of time the device was busy. A sustained one hundred percent means the drive is fully saturated and the kernel cannot dispatch more requests until the current batch completes.
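To watch one device instead of every disk, name it on the command line; nvme0n1 here is a placeholder for your actual device:

iostat -xz nvme0n1 1 5
# WHY: Naming a device filters the report to it; the trailing 1 5 takes five one-second samples, then exits.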

Convention aside: iostat reads cumulative counters. The first line after boot reflects the entire uptime. Always run it with an interval to see current activity. Once the sysstat-collect.timer is enabled, sa1 snapshots the counters every ten minutes by default. You can query historical data with sar -d -f /var/log/sa/sa$(date +%d). This saves you from leaving a terminal open overnight. Snapshot the system before the upgrade. Future-you will thank you.
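Enabling collection and pulling a day's worth of data takes two commands; this sketch assumes the systemd units that Fedora's sysstat package ships:

sudo systemctl enable --now sysstat-collect.timer
# WHY: The timer fires sa1 on a schedule; until it runs, /var/log/sa/ stays empty.

sar -d -f /var/log/sa/sa$(date +%d)
# WHY: -d selects block device statistics; -f reads today's archive instead of the live counters.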

Pinpointing the culprit with iotop

System-wide metrics tell you the highway is jammed. They do not tell you which car is blocking the lane. iotop maps I/O activity back to processes. It reads the kernel's per-task I/O accounting, the same counters exposed in /proc/<pid>/io, delivered over the taskstats netlink interface, to attribute read and write bytes to specific PIDs.
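You can peek at those counters directly, no tools required; self here just means the shell's own process:

cat /proc/self/io
# WHY: read_bytes and write_bytes count traffic that actually hit the block layer; rchar and wchar also count reads served from the page cache.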

sudo dnf install iotop
# WHY: iotop needs root because the taskstats netlink interface that exposes per-process I/O accounting is privileged.
# WHY: The byte counters themselves come from CONFIG_TASK_IO_ACCOUNTING, which Fedora kernels ship enabled.
# WHY: The SWAPIN and IO percentage columns additionally need kernel delay accounting, covered below.

Launch it with the flags that filter noise and sort by actual activity.

sudo iotop -oPa
# WHY: -o shows only processes that are currently performing I/O. Idle processes disappear from the list.
# WHY: -P aggregates threads into their parent processes, so each program appears as a single row instead of one row per thread.
# WHY: -a accumulates total I/O since the program started, giving you a clear ranking of heavy hitters.

The output shows READ, WRITE, and SWAPIN columns. SWAPIN is the percentage of time the process spent waiting for swapped pages to be read back into RAM. High swap-in combined with low disk throughput usually means the system is thrashing. Look for database daemons, rsync jobs, or systemd-journald flushing logs. If you see kworker or jbd2 dominating the list, the filesystem journal is catching up after an unclean shutdown or a heavy metadata operation.
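One caveat before trusting SWAPIN: kernels since 5.14 ship with per-task delay accounting switched off, so the SWAPIN and IO percentage columns may sit at zero or trigger a warning. A sketch of enabling it for the current session, assuming your kernel exposes the sysctl:

sudo sysctl kernel.task_delayacct=1
# WHY: Delay accounting feeds the SWAPIN and IO percentage columns; the byte counters work without it.

For an offender that only strikes intermittently, batch mode logs the rankings unattended; the output path is arbitrary:

sudo iotop -obPa -n 60 -d 5 > /tmp/iotop.log
# WHY: -b produces non-interactive output suitable for redirection.
# WHY: -n 60 -d 5 takes sixty samples five seconds apart, then exits on its own.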

Convention aside: iotop does not replace journalctl. If a service is stuck, check journalctl -xeu <unit> first. The journal often logs the exact reason a process is blocking on I/O. Read the actual error before assuming the disk is failing.

Verify the fix

You identified the process. You stopped the rogue backup job or tuned the database write cache. Now prove the bottleneck cleared. Run iostat -xz 1 again. Watch r_await and w_await drop below ten milliseconds. Watch %util fall below eighty percent. Run sudo iotop -oPa and confirm the suspicious process is no longer at the top. The system should respond normally.

Common pitfalls and error patterns

Users often misread iostat output. A high %util on a modern NVMe drive does not always mean saturation. NVMe controllers have multiple hardware queues and can report one hundred percent utilization while still processing thousands of IOPS. Trust r_await, w_await, and aqu-sz over the raw utilization percentage. Another mistake is running iotop without -o. The default view lists every process, including idle ones showing zero I/O, and the list becomes useless.

You may encounter permission errors when launching iotop without sudo. The tool reads per-process accounting over the taskstats netlink interface, which is restricted to privileged processes, so an unprivileged run fails with a netlink permission error instead of useful output. If it fails even under sudo, suspect SELinux: check journalctl -t setroubleshoot for denials. Do not disable SELinux. Adjust the policy or run the command with proper privileges. Fedora's switch to cgroup v2 is not the culprit here; iotop does not read the cgroup hierarchy at all.
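To confirm or rule out SELinux quickly, query the audit log; auditd runs by default on a stock Fedora install:

sudo ausearch -m avc -ts recent
# WHY: -m avc filters for access-vector-cache denials; -ts recent restricts the search to the last ten minutes.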

Some users try to fix I/O latency by disabling the drive's write cache. Think twice before doing this: modern journaling filesystems already issue cache flushes at every journal commit, so the cache does not normally endanger metadata on a healthy drive, and turning it off can crater write performance. Check the current state with hdparm -W /dev/sdX before changing anything; hdparm -W 0 disables the cache and hdparm -W 1 re-enables it.
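A minimal check, assuming a SATA drive at /dev/sda; NVMe drives expose the equivalent volatile-write-cache setting through nvme-cli instead:

sudo hdparm -W /dev/sda
# WHY: With no value after -W, hdparm reports the current write-cache state instead of changing it.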

Choosing the right tool

Use iostat when you need a live, system-wide view of throughput and latency across all block devices. Use iotop when you need to identify which specific process is generating the load. Use sar -d when you need to correlate I/O spikes with CPU or memory usage over a full day. Use fio when you need to benchmark raw drive performance before blaming the OS. Stay with iostat and iotop if you only need to troubleshoot a live bottleneck.
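If you do reach for fio, here is a hedged starting point for a random-read latency test; the file name, size, and runtime are placeholders you should adjust:

fio --name=randread --filename=/tmp/fio.test --size=1G --rw=randread --bs=4k --ioengine=libaio --direct=1 --runtime=30 --time_based --group_reporting
# WHY: --direct=1 bypasses the page cache so you measure the device rather than RAM.
# WHY: Compare the reported completion latencies against the await values iostat showed you.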

Where to go next