How to Check Disk Health with SMART (smartctl) on Fedora

You hear a faint clicking rhythm from your laptop chassis, or `dmesg` is spamming I/O errors every time you open a large file. Maybe the system just feels sluggish and you suspect the storage is degrading. You need to know if the drive is dying before it takes your data with it. SMART data tells you the drive's internal health status. `smartctl` is the tool that reads that data and translates it into something you can act on.

What's actually happening

SMART stands for Self-Monitoring, Analysis, and Reporting Technology. The drive keeps a log of its own internal metrics: temperature, reallocated sectors, power-on hours, and error counts. It is like a car's dashboard computer tracking oil pressure and engine temperature. The drive compares these values against manufacturer thresholds. If a value crosses a threshold, the drive marks itself as failing.

smartctl queries the drive firmware and prints these metrics. It does not fix anything; it reports what the drive knows about itself. The tool is part of the smartmontools package. Not every Fedora install includes it (minimal and custom images often leave it out), so install it explicitly if the command is missing.

Install and identify the device

Install the package and list your block devices to find the correct path. Guessing the device name can lead to checking the wrong drive and trusting a result that belongs to a different disk.

sudo dnf install smartmontools -y
# WHY: smartmontools provides smartctl and smartd. If your image already ships it, dnf reports that there is nothing to do.
lsblk -o NAME,MODEL,SERIAL,SIZE,TYPE
# WHY: List block devices with model and serial to identify the correct disk path.
# WHY: Avoid guessing /dev/sda if you have multiple drives. Match the model to your hardware.

Look for the device under NAME. The model column helps you distinguish between your NVMe drive and a secondary HDD. Note the path, such as /dev/sda or /dev/nvme0n1.
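
If the machine has several block devices, filtering the listing down to whole disks makes the candidates easier to see. The following is a minimal sketch, assuming the two-column output of lsblk -dn -o NAME,TYPE on stdin. The list_disks helper name is hypothetical, and skipping zram devices is an assumption about what you want to exclude.

```shell
# list_disks (hypothetical helper): read `lsblk -dn -o NAME,TYPE` output on
# stdin and print the /dev path of every whole disk. Optical drives, loop
# devices, and zram swap devices are skipped because they carry no SMART data.
list_disks() {
    awk '$2 == "disk" && $1 !~ /^zram/ { print "/dev/" $1 }'
}

# Usage on a real system:
#   lsblk -dn -o NAME,TYPE | list_disks
```

Each path it prints is a candidate argument for smartctl. Match it against the MODEL and SERIAL columns before running anything on it.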

Run smartctl -H for a quick health check. This is the fastest way to see if the drive reports a problem.

sudo smartctl -H /dev/sda
# WHY: -H requests the overall health self-assessment. This is the quickest check.
# WHY: The drive firmware calculates this based on internal thresholds.
# WHY: A PASSED result means the drive has not crossed any critical failure limits.

The output ends with a single verdict line. If you see SMART overall-health self-assessment test result: PASSED, the drive believes it is healthy. If you see FAILED, back up your data immediately and replace the drive.
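
For scripting, smartctl's exit status is easier to act on than the text. According to the smartctl man page, the status is a bit mask: bit 3 (value 8) is set when the drive reports itself as failing, and bit 4 (value 16) is set when a pre-fail attribute is at or below its threshold. The sketch below decodes that mask. The explain_smartctl_status function name is hypothetical, and the messages paraphrase the man page rather than quote it.

```shell
# explain_smartctl_status (hypothetical helper): decode the exit status of a
# smartctl run. Each of the eight low bits flags one condition.
explain_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART command failed or returned a checksum error" \
        "SMART status check returned DISK FAILING" \
        "a pre-fail attribute is at or below its threshold" \
        "an attribute was at or below its threshold in the past" \
        "the device error log contains errors" \
        "the self-test log contains errors"
    do
        # Test one bit of the mask per message, lowest bit first.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}

# Usage on a real system:
#   sudo smartctl -H /dev/sda; explain_smartctl_status $?
```

A non-zero status does not always mean the drive is failing. Bits 6 and 7, for example, only say that old errors are recorded in the logs.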

Run smartctl -a to dump all attributes. This shows the detailed metrics behind the health status.

sudo smartctl -a /dev/sda
# WHY: -a dumps all SMART attributes, error logs, and self-test results.
# WHY: Use this to inspect specific metrics like Reallocated_Sector_Ct or temperature.
# WHY: The output is verbose. Pipe to less if you need to scroll: sudo smartctl -a /dev/sda | less

Interpreting the attribute table

The attribute list contains raw values and normalized values. A normalized value is a vendor-scaled score where higher is better: it typically starts at 100 (some vendors start at 200 or 253) and falls as the attribute degrades. The threshold is the critical limit. If the normalized value drops to or below the threshold, the drive marks the attribute as failed.

Focus on Reallocated_Sector_Ct. This counts sectors that the drive has remapped to a spare area. A non-zero value means the drive found bad spots and replaced them. A rising count means the media is degrading. Current_Pending_Sector counts unstable sectors that are waiting to be reallocated; a value that stays above zero means the drive is struggling to read them. Offline_Uncorrectable counts sectors that could not be corrected during offline scanning.

In smartctl's output the table columns are ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, and RAW_VALUE. VALUE is the current normalized score, WORST is the lowest score ever recorded, and THRESH is the failure point. TYPE is Pre-fail when crossing the threshold predicts imminent failure and Old_age when it only indicates wear. WHEN_FAILED stays at - unless the attribute is, or once was, at or below its threshold. RAW_VALUE holds manufacturer-specific data: a count, a temperature in Celsius, or a packed binary value. smartctl tries to decode it, but sometimes you need the vendor documentation to interpret it. For example, Temperature_Celsius usually shows the current temperature in RAW_VALUE, and Reallocated_Sector_Ct shows the sector count. Focus on trends. A single bad sector might be a fluke. A rising count over weeks is a death sentence.
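
The threshold comparison described above can be automated. The following is a rough sketch, assuming the ten-column ATA attribute table that smartctl -A prints (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The check_attributes name is hypothetical, and NVMe output does not have this table.

```shell
# check_attributes (hypothetical helper): read `smartctl -A` output on stdin.
# Prints FAIL for any attribute whose normalized value is at or below a
# non-zero threshold, and WARN when one of the three sector counters named
# in the text has a non-zero raw value.
check_attributes() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            name = $2; value = $4 + 0; thresh = $6 + 0; raw = $10
            if (thresh > 0 && value <= thresh) {
                printf "FAIL %s value=%d thresh=%d\n", name, value, thresh
            } else if ((name == "Reallocated_Sector_Ct" ||
                        name == "Current_Pending_Sector" ||
                        name == "Offline_Uncorrectable") && raw + 0 > 0) {
                printf "WARN %s raw=%s\n", name, raw
            }
        }
    '
}

# Usage on a real system:
#   sudo smartctl -A /dev/sda | check_attributes
```

No output means nothing crossed a threshold and the sector counters are zero. It says nothing about trends, which still need to be compared over time.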

Running self-tests

The drive can run internal tests to scan for errors. A short test checks critical areas and takes a few minutes. A long test scans the entire surface and can take hours. The tests run in the background. The drive remains usable during the test.

sudo smartctl -t short /dev/sda
# WHY: -t short starts a short self-test. This usually takes 2 to 5 minutes.
# WHY: The test runs in the background. The drive remains usable during the test.
# WHY: Do not power off the drive while the test is running.
sudo smartctl -l selftest /dev/sda
# WHY: -l selftest shows the self-test log, newest entry first. Look at the 'Status' column.
# WHY: A status of 'Completed without error' means the test passed.
# WHY: 'Aborted by host' or 'Interrupted' means the test did not finish. 'Completed: read failure' means errors were found.

Check the self-test log after the test finishes. The LBA_of_first_error column points to the location of the first failure if the test failed. You can use this offset with dd or badblocks to inspect the specific sector, though replacing the drive is usually the safer path.
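
Checking the newest log entry can be scripted. This is a minimal sketch, assuming the ATA self-test log layout in which the newest entry starts with "# 1" and the last field is LBA_of_first_error. The latest_selftest name is hypothetical, the "in progress" match is an assumption about the status text of a running test, and NVMe self-test logs use a different layout.

```shell
# latest_selftest (hypothetical helper): read `smartctl -l selftest` output on
# stdin and summarize the newest entry, which smartctl numbers "# 1".
latest_selftest() {
    awk '
        /^# *1 / {
            found = 1
            if ($0 ~ /Completed without error/) {
                print "passed"
            } else if ($0 ~ /in progress/) {
                print "running"
            } else {
                # The last field is LBA_of_first_error ("-" when not reported).
                print "not clean, LBA_of_first_error: " $NF
            }
        }
        END { if (!found) print "no self-test logged" }
    '
}

# Usage on a real system:
#   sudo smartctl -l selftest /dev/sda | latest_selftest
```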

NVMe drives and special cases

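NVMe drives use a different protocol, and smartctl's output for them is structured differently. The device is /dev/nvme0 (the controller) or /dev/nvme0n1 (a namespace), not /dev/sdX. The -H and -a flags work for both kinds of drive, but instead of an ATA attribute table, -a prints the NVMe SMART/Health Information log, with fields such as Critical Warning, Available Spare, Percentage Used, and Media and Data Integrity Errors. Percentage Used is the drive's own estimate of how much of its rated endurance has been consumed, so higher is worse, and it can exceed 100.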

sudo smartctl -a /dev/nvme0n1
# WHY: -a works for NVMe devices too. The output includes NVMe-specific metrics.
# WHY: Check 'Percentage Used' for flash endurance. It estimates how much of the rated life is consumed.
# WHY: This field is NVMe-specific. SATA SSDs expose wear through vendor-specific attributes instead.
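
The NVMe health fields are "Name: value" lines, so they are easy to extract. The sketch below assumes the field labels Critical Warning, Percentage Used, and Media and Data Integrity Errors as smartctl prints them for NVMe devices. The nvme_health_summary name is hypothetical.

```shell
# nvme_health_summary (hypothetical helper): read `smartctl -a` output for an
# NVMe device on stdin and print three health fields on one line.
nvme_health_summary() {
    awk -F: '
        /^Critical Warning:/                { gsub(/ /, "", $2);    warn = $2 }
        /^Percentage Used:/                 { gsub(/[ %]/, "", $2); used = $2 }
        /^Media and Data Integrity Errors:/ { gsub(/ /, "", $2);    media = $2 }
        END {
            printf "percentage_used=%s critical_warning=%s media_errors=%s\n",
                   used, warn, media
        }
    '
}

# Usage on a real system:
#   sudo smartctl -a /dev/nvme0n1 | nvme_health_summary
```

A Critical Warning other than 0x00 or a non-zero media error count deserves attention regardless of how low Percentage Used is.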

USB-to-SATA bridges sit between smartctl and the drive. Some pass SMART commands through, some do not. If smartctl reports Device does not support SMART on a USB drive, or asks you to specify a device type, the enclosure is the likely blocker. Try forcing SAT passthrough first with smartctl -d sat -a /dev/sdX. If that fails, try a different enclosure or connect the drive internally. Some enclosures need a firmware update to enable passthrough.

RAID arrays present a virtual device. /dev/md0 is a Linux software RAID (md) device with no firmware and no SMART data of its own, so smartctl on it returns an error or nothing useful. Run smartctl on the component devices instead, such as /dev/sda and /dev/sdb. Drives behind a hardware RAID controller are hidden from the kernel. The controller firmware usually provides its own monitoring tools, and smartctl can reach drives behind some controllers through the -d option (for example -d megaraid,N), but support varies by controller.

Check the underlying devices, not the RAID array. smartctl on /dev/md0 has no physical drive to talk to.
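
For Linux software RAID, the member devices are listed in /proc/mdstat. The following sketch extracts them, assuming the usual mdstat line format "md0 : active raid1 sdb1[1] sda1[0]". The md_members name is hypothetical. Members are often partitions, so you may still need to map each one to its parent disk (for example with lsblk -no PKNAME).

```shell
# md_members (hypothetical helper): read /proc/mdstat on stdin and print the
# member devices of the array named in $1, one /dev path per line.
md_members() {
    awk -v md="$1" '
        $1 == md && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                # Members look like "sda1[0]" or "sdb1[1](F)".
                if ($i ~ /\[[0-9]+\]/) {
                    dev = $i
                    sub(/\[.*/, "", dev)
                    print "/dev/" dev
                }
            }
        }
    '
}

# Usage on a real system:
#   md_members md0 < /proc/mdstat
```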

Automating with smartd

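smartd runs as a daemon. It polls the drives at a fixed interval, 30 minutes by default, and applies the directives in its configuration file. This guide refers to that file as /etc/smartd.conf; Fedora packages typically place it at /etc/smartmontools/smartd.conf. You can enable smartd to monitor drives automatically.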

sudo systemctl enable --now smartd
# WHY: enable --now starts the service and ensures it runs on boot.
# WHY: smartd monitors drives in the background and logs warnings to the journal.
# WHY: Check logs with journalctl -u smartd if you suspect missed alerts.

The default smartd.conf on Fedora uses a DEVICESCAN line to monitor every drive smartd can detect. To customize a single drive, add a line for it above the DEVICESCAN line, since smartd ignores anything that follows DEVICESCAN. For example: /dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root. Here -a monitors all attributes, -o on enables automatic offline data collection, -S on enables attribute autosave, and -s schedules self-tests: a short test daily at 2 AM and a long test every Saturday at 3 AM. The -m root option mails warnings to root; change it to your email address. Restart smartd after editing the config: sudo systemctl restart smartd.
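
Written out as a config fragment, the directive above looks like this. The comments describe the -s schedule pattern as documented in smartd.conf(5); the device path and the mail recipient are placeholders to adapt.

```
# -s takes a regular expression matched against T/MM/DD/d/HH:
#   T  = test type (S = short, L = long)
#   MM = month, DD = day of month
#   d  = day of week (1 = Monday ... 7 = Sunday)
#   HH = hour (00-23)
# S/../.././02 = short test every day at 02:00
# L/../../6/03 = long test every Saturday at 03:00
/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root
```

A dot in the pattern matches any digit, which is how "every month" and "every day" are expressed.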

Keep your customizations in the configuration file under /etc/. Files that ship under /usr/ belong to the package and are overwritten on update, so changes made there do not survive. Edit the copy in /etc/.

Verify it worked

Run the health check again to confirm the status. If you ran a test, verify the log shows completion.

sudo smartctl -H /dev/sda
# WHY: Confirm the health status is PASSED after any changes or tests.
# WHY: If the status changed to FAILED, the drive is reporting imminent failure.

If you see PASSED, the drive is reporting normal operation. If you see FAILED, the drive has crossed a critical threshold. Back up your data and plan a replacement.

Run smartctl -a and grep for Reallocated. If that number is rising, back up now.
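
Spotting a rising count requires remembering the previous reading. The sketch below keeps a small history file and compares each new reading with the last one. Both function names and the history file path in the usage comment are hypothetical, and the parser assumes the ten-column ATA attribute table in which the raw value is the tenth field.

```shell
# reallocated_raw (hypothetical helper): read `smartctl -A` output on stdin
# and print the raw Reallocated_Sector_Ct value.
reallocated_raw() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# count_is_rising (hypothetical helper): append "<epoch> <count>" to the
# history file in $1 and succeed only if $2 exceeds the previous reading.
count_is_rising() {
    history_file=$1
    current=$2
    previous=$(tail -n 1 "$history_file" 2>/dev/null | awk '{ print $2 }')
    printf '%s %s\n' "$(date +%s)" "$current" >> "$history_file"
    [ -n "$previous" ] && [ "$current" -gt "$previous" ]
}

# Usage on a real system (the log path is only an example):
#   count=$(sudo smartctl -A /dev/sda | reallocated_raw)
#   count_is_rising /var/tmp/sda-reallocated.log "$count" && echo "back up now"
```

The first run only records a baseline, so it never reports a rise.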

Common pitfalls

If you see Permission denied, you forgot sudo. The kernel restricts raw disk access. If you see Device does not support SMART, the drive is too old or the controller is hiding the SMART interface. Some USB enclosures block SMART passthrough. Check the enclosure firmware.

If smartctl hangs, the drive might be unresponsive or the connection is flaky. Check dmesg for I/O errors. A drive that hangs during SMART queries is often failing mechanically.

If you are on a laptop, make sure the system does not suspend during a long test. Suspend can abort the test. Use systemd-inhibit to block sleep while the test runs if needed.

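Example smartctl output for a healthy drive (illustrative; the version banner and identifiers will differ on your system):

smartctl 7.4 2023-01-29 r5375 [x86_64-linux-6.5.11-300.fc40.x86_64] (local build)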
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Blue
Device Model:     WDC WD10EZEX-08WN4A0
Serial Number:    WD-WMC3T0...
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5567
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 23 14:32:15 2023 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

When to use this vs alternatives

Use smartctl -H when you need a quick pass/fail status for a routine check. Use smartctl -a when you need to inspect specific attributes like reallocated sectors or temperature trends. Use smartctl -t short when you suspect intermittent errors and want a fast verification. Use smartctl -t long when you are planning to retire a drive and want a thorough surface scan. Use smartd when you want automated background monitoring and email alerts on failure. Use nvme-cli when you need NVMe-specific features like firmware updates or namespace management.

Back up before you trust the drive. SMART predicts failure; it does not prevent data loss.

Summary

SMART is a built-in monitoring system on your hard drive that tracks its health and predicts failures. Running this check is like getting a medical report for your storage device before it breaks. You use it to ensure your data is safe and to replace failing drives before you lose files.