How to Use grep, sed, and awk for Text Processing on Fedora

grep, sed, and awk are the three core Unix text-processing tools available on every Fedora system — grep searches, sed transforms, and awk extracts and reports structured data.


You just ran a dnf upgrade and the transaction failed. The terminal is scrolling past thousands of lines of output. You need to find the specific conflict error, extract the package names involved, and maybe patch a configuration file that has the wrong value. Scrolling is slow. Copy-pasting is error-prone. You need to filter, transform, and extract data directly from the stream.

Fedora ships three tools for this job: grep, sed, and awk. They are small, fast, and chain together via pipes to handle nearly any text-processing task. You will use these tools every day once you stop treating the terminal as a command runner and start treating it as a data pipeline.

What is actually happening

These tools operate on streams. Input comes from a file, standard input, or a pipe. Output goes to the terminal, a file, or the next command in the chain. This stream-oriented design is the backbone of the Unix philosophy. You build complex logic by combining simple filters.

Think of an assembly line. grep picks the parts that match a pattern. sed stamps, modifies, or deletes parts as they pass by. awk measures, sorts, and reports on the parts based on their fields. Each tool does one thing well. The combination handles the complexity.
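The assembly line in miniature, on a synthetic stream (the package lines below are made up for the demo):

```shell
# grep picks matching lines, sed stamps them, awk measures and reports.
printf 'pkg foo 10\npkg bar 20\nlib baz 30\n' \
  | grep '^pkg' \
  | sed 's/pkg/package/' \
  | awk '{sum += $3} END {print "count:", NR, "sum:", sum}'
# -> count: 2 sum: 30
```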

All three tools support regular expressions. Regular expressions are patterns that describe text structures. A dot matches any character. The asterisk repeats the previous element. The caret anchors to the start of the line. Understanding these primitives prevents false positives and makes your commands precise.
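These primitives are easy to verify interactively; a quick sketch using printf to generate throwaway input:

```shell
# '.' matches any single character: "c.t" matches "cat" and "cut" but not "coat".
printf 'cat\ncut\ncoat\n' | grep 'c.t'

# '*' repeats the previous element: "co*t" matches "ct", "cot", and "coot".
printf 'ct\ncot\ncoot\ncat\n' | grep 'co*t'

# '^' anchors the match to the start of the line.
printf 'error\nno error here\n' | grep '^err'
```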

grep — Search for patterns

grep reads input line by line and prints lines that match a pattern. It is the primary tool for finding information in logs, configuration files, and output streams. Fedora ships GNU grep, which uses basic regular expressions by default and adds extended regular expressions with -E and Perl-compatible regular expressions with -P.

Here is how to search a log file for errors while ignoring noise and getting context.

# Search for "error" in the dnf log, ignoring case.
# -i makes the search case-insensitive so "Error" and "ERROR" both match.
grep -i 'error' /var/log/dnf.log

# Show line numbers and 2 lines of context around each match.
# -n prints line numbers. -C 2 adds context lines above and below.
# Context reveals the cause, not just the symptom.
grep -n -C 2 'failed' /var/log/messages

# Count matching lines instead of printing them.
# -c returns a count. Useful for quick metrics or scripts.
grep -c 'warning' /var/log/dnf.log

# Search recursively through a directory tree.
# -r descends into subdirectories.
# Use this to find configuration directives across /etc.
grep -r 'ServerName' /etc/httpd/

# Invert the match to show lines that do NOT contain the pattern.
# -v prints non-matching lines.
# This filters out comments starting with #.
grep -v '^#' /etc/ssh/sshd_config

Convention aside: grep -r is powerful but can be slow on large directory trees. If you are debugging a systemd service, run journalctl -xeu <unit> first. The journal is indexed and faster than grepping raw log files. Use grep when you need to search configuration files, codebases, or when the journal is not available.

Grep for the exact error string first. Context beats guessing.

sed — Stream editor for text transformation

sed applies editing commands to each line of input. It is a stream editor, meaning it processes text as it flows through the pipe. sed excels at substitutions, deletions, and simple text manipulations. Fedora ships sed as part of the base system.

Here is how to perform substitutions and edit files in place.

# Replace the first occurrence of 'foo' with 'bar' on each line.
# The s command performs substitution.
# Without the g flag, only the first match per line changes.
sed 's/foo/bar/' file.txt

# Replace all occurrences on each line.
# The g flag at the end makes the substitution global.
sed 's/foo/bar/g' file.txt

# Edit a file in place and keep a backup.
# -i.bak modifies the file directly and saves the original with a .bak suffix.
# Never run -i without a backup on production configs.
sed -i.bak 's/SELINUX=disabled/SELINUX=enforcing/' /etc/selinux/config

# Delete blank lines from input.
# The pattern /^$/ matches empty lines.
# The d command deletes matching lines.
sed '/^$/d' file.txt

# Print only a range of lines.
# -n suppresses default output.
# 5,10p prints lines 5 through 10.
sed -n '5,10p' file.txt

Convention aside: Configuration files in /etc/ belong to the administrator; files in /usr/lib/ ship with packages. Always edit the copy in /etc/. Running sed -i on files under /usr/lib/ fights package management, and the change will be overwritten by the next update anyway. Use sed on /etc/ files, and always verify the change with cat or diff afterward.
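A minimal, safe pattern for in-place edits, shown here on a throwaway file (/tmp/selinux-demo.conf is invented for the demo) rather than a real config:

```shell
# Create a sample file standing in for a config.
printf 'SELINUX=disabled\n' > /tmp/selinux-demo.conf

# Edit in place, keeping a .bak backup of the original.
sed -i.bak 's/SELINUX=disabled/SELINUX=enforcing/' /tmp/selinux-demo.conf

# Verify exactly what changed before trusting the edit.
diff /tmp/selinux-demo.conf.bak /tmp/selinux-demo.conf
cat /tmp/selinux-demo.conf
```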

Backup the file before in-place edits. A typo in sed can corrupt a config instantly.

awk — Pattern scanning and reporting

awk processes files field by field. It splits each line into fields based on a separator and lets you act on specific columns. awk is a full programming language, but you will mostly use it for extraction, filtering, and simple arithmetic. Fedora provides gawk as the default awk implementation, which supports arrays, functions, and extended regular expressions.

Here is how to extract columns, filter by value, and compute aggregates.

# Print the first and third columns of a file.
# $1 and $3 refer to field numbers.
# Default separator is whitespace.
awk '{print $1, $3}' data.txt

# Use a custom field separator.
# -F: sets the separator to colon.
# This parses structured files like /etc/passwd correctly.
awk -F: '{print $1, $3}' /etc/passwd

# Print lines where the third field is greater than 1000.
# The condition $3 > 1000 filters lines.
# This extracts user accounts with UID above 1000.
awk -F: '$3 > 1000 {print $1}' /etc/passwd

# Sum the values in the second column.
# The loop accumulates values in the sum variable.
# The END block runs after all input is processed.
awk '{sum += $2} END {print "Total:", sum}' data.txt

# Combine with grep to extract timestamps from error lines.
# grep filters for ERROR. awk extracts the date and time fields.
grep 'ERROR' /var/log/messages | awk '{print $1, $2, $3}'

Convention aside: awk field splitting treats multiple spaces and tabs as a single separator by default. If your data uses fixed-width columns or specific delimiters, always set -F explicitly. Relying on the default whitespace split can produce wrong results when fields contain internal spaces.
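A quick demonstration of the difference, using printf to build a line with two consecutive tabs (sample data, not a real file):

```shell
# Default splitting collapses runs of spaces and tabs into one separator,
# so the second field is 42.
printf 'alice\t\t42\n' | awk '{print $2}'

# With an explicit tab separator, the empty field between the two tabs
# counts: $2 is empty and $3 holds the value.
printf 'alice\t\t42\n' | awk -F'\t' '{print $3}'
```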

Check the field separator before you parse. Whitespace defaults hide tabs and multiple spaces.

Chaining tools together

The pipe operator | connects these tools. Output from the left command becomes input for the right command. This allows you to build complex data pipelines from simple steps. The order matters. Filter first, then extract, then transform.

Here is a realistic chain to analyze failed login attempts from the auth log.

# Find failed logins, extract the username, count occurrences, and sort.
# grep filters for the specific failure message.
# awk extracts the 9th field, which holds the username on standard
# "Failed password for <user>" lines (on "invalid user" lines the position shifts).
# sort and uniq -c group and count identical lines.
# sort -rn sorts by count in descending order.
# head -10 shows the top 10 offenders.
grep 'Failed password' /var/log/secure \
  | awk '{print $9}' \
  | sort | uniq -c | sort -rn | head -10

When a chain fails, isolate the break. Insert cat or head between pipes to inspect the data at that stage. This reveals whether the upstream command produced output or if the downstream command choked on the format. Debug left to right.
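A sketch of stage-by-stage debugging on a throwaway file (/tmp/pipe-demo.txt is invented for the demo):

```shell
# Build a small stand-in for the real stream.
printf 'a 1\nb 2\na 3\n' > /tmp/pipe-demo.txt

# Stage 1: run the filter alone and eyeball its output.
grep 'a' /tmp/pipe-demo.txt

# Stage 2: attach the next tool only once stage 1 looks right.
grep 'a' /tmp/pipe-demo.txt | awk '{print $2}'
```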

Test each stage of the pipe separately. Debug the chain from left to right.

Common pitfalls and errors

Text processing commands fail silently or produce confusing output when patterns are wrong. Watch for these issues.

If you see sed: -e expression #1, char 0: no previous regular expression, you used an empty pattern, such as s//new/, with no earlier regex for sed to reuse. Give the s command an explicit pattern: run sed 's/old/new/', not sed 's//new/'.

If grep returns no output, check your case sensitivity and anchors. A pattern like error matches terror. Use grep -w 'error' to match whole words only. If you are searching for a literal string with special characters, use grep -F to disable regex interpretation.
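Both flags in action on sample input:

```shell
# 'error' as a bare pattern also matches inside 'terror'.
printf 'terror\nerror\n' | grep 'error'

# -w requires whole-word matches, so only the 'error' line survives.
printf 'terror\nerror\n' | grep -w 'error'

# -F treats the pattern as a literal string, so metacharacters like '.'
# lose their special meaning and '1x2x3' no longer matches.
printf '1.2.3\n1x2x3\n' | grep -F '1.2.3'
```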


If awk prints empty fields, verify the separator. The default whitespace split collapses multiple spaces. If your data has tabs, use awk -F'\t'. If your data has commas, use awk -F,.
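For example, on comma-separated sample data:

```shell
# The default whitespace split sees 'alice,1000' as one field,
# so $2 prints as an empty line.
printf 'alice,1000\n' | awk '{print $2}'

# -F, splits on commas, so the columns separate correctly.
printf 'alice,1000\n' | awk -F, '{print $2}'
```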

Unquoted variables in scripts expand and break patterns. Always quote your patterns in shell scripts. Use grep "$pattern" not grep $pattern. Unquoted variables split on spaces and pass multiple arguments to the command.
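A minimal sketch of the failure mode (the pattern and /tmp file name are illustrative):

```shell
# Sample input containing the phrase we want to find.
printf 'connection timed out\nall good\n' > /tmp/quote-demo.txt

pattern='timed out'

# Quoted: grep receives one pattern argument and matches the phrase.
grep "$pattern" /tmp/quote-demo.txt

# Unquoted: the shell splits on the space, so grep sees the pattern
# 'timed' and treats 'out' as a file name, which fails:
# grep $pattern /tmp/quote-demo.txt   # grep: out: No such file or directory
```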

Quote your patterns. Unquoted variables expand and break the regex.

When to use which tool

Use grep when you need to find lines matching a pattern or verify a string exists in a file. Use sed when you need to perform simple substitutions, delete lines, or edit configuration files in place. Use awk when you need to extract specific columns, perform arithmetic on fields, or handle structured data with custom separators. Use journalctl when you are debugging systemd services and need timestamped logs with unit context. Use ripgrep when you are searching large codebases or directories and need speed over POSIX compatibility.

Where to go next