The cheatsheet
Awk variables work like so:
- $0 – represents the full line
- $1...$n – represents each field (1-index-based)
- NR – current record number
- $NF – last column
Print out a specific column
# basic print command structure
awk '{ print $[column number]}' source_file
# print 2nd column
awk '{ print $2 }' source_file.txt
# print 1st and 3rd column from piped result
ps aux | awk '{ print $1, $3 }'
Filter out lines in a file
# basic filtering structure
awk 'filter_logic { print $1 }' source_file
# filter by column value
awk '$3 > 100 { print $0 }' source_file.txt
# filter by regex
awk '$2 ~ /pattern/ { print $0 }' source_file.txt
Format output
# basic formatting structure
awk '{ print [comma-separated values]}' source_file
# printing this -> that structure
awk 'filter_logic { print $1, "->", $2}' source_file.txt
# with concatenation you have to print spaces yourself
awk '{ print "Total " $1 }' source_file.txt
Basic sum
# command structure
awk '{[per_row_calculation]} END {print [result]}' source_file
# average of the 2nd column
awk '{sum += $2; count++} END {print sum/count}' source_file.txt
# count lines matching a pattern
awk '/ERROR/ {count++} END {print count}' logfile.log
The explanation
Oh, you don’t know what awk does even though your agentic AI uses it all the time? AWKward! (this joke wasn’t brought to you by AI).
Awk is one of those tools like sed and, honestly, the rest of my demystifying series. It’s an incredibly powerful command that can do a lot and it’s used everywhere, but trying to learn it feels like divining arcane knowledge.
So what does awk do? It treats text like a database. It’s best for modifying and getting data from CSV-like structures. awk processes a text corpus line by line and splits each line into separate fields, treating each one like a “column”. Each “column” is represented by a dollar-sign variable with a 1-based index. For example, $0 represents the whole line while $1 is the first field. Thus $1...$n covers every field on a line.
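A quick way to see those field variables in action is to feed awk a single line (the echo text here is just a made-up sample):

```shell
# one sample line, three whitespace-separated fields
echo "User1 admin user_1" | awk '{ print $0 }'   # the whole line: User1 admin user_1
echo "User1 admin user_1" | awk '{ print $1 }'   # first field: User1
echo "User1 admin user_1" | awk '{ print $NF }'  # last field: user_1
```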
What is it good for?
- pulling out and processing CLI output
- filtering logs
- reformatting text
- doing tiny one-liners that feel like magic.
A good comparison: grep is great for finding lines of text, while awk is great for understanding and manipulating them.
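To make that comparison concrete, here’s a rough sketch of the same search done both ways (the sample line and file name are made up):

```shell
# grep finds the matching line; awk can also pick it apart
printf 'User1 admin user_1\n' > sample.txt
grep 'admin' sample.txt                      # prints the whole matching line
awk '$2 == "admin" { print $3 }' sample.txt  # prints just the third field: user_1
```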
Let’s use the following text as an example. We’ll call it our source_file.txt where the columns are:
- user’s name (eg. Antonin)
- user’s permission level (eg. admin)
- user’s username (eg. antonin_j)
User1 admin user_1
User2 user user_2
User3 guest user_3
The basic shape of an awk command
Most awk commands look like this:
awk 'pattern { action }' source_file.txt
The pattern is the filter that decides which lines we want to process. It’s OPTIONAL. The action is what we want to do with the resulting data. If you provide a pattern without an action, awk will print the whole matching line by default.
awk '{ print $1 }' source_file.txt
In this example, we’re skipping the pattern entirely and just telling awk to print the first field of each line.
Input
You can pass input to awk either via specifying a file or piping the result straight in.
Direct file reference:
awk '{ print $1 }' source_file.txt
Piping the data in:
cat source_file.txt | awk '{ print $1 }'
Note: this is super useful for logs. Just run ps aux and you can select data and filter it via awk by piping it through, eg. ps aux | awk '/node/ {print $2}' to print the PIDs of currently running node processes.
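If you want to try that filter without a live process list, you can fake a couple of lines of ps-style output with printf (the rows below are made up):

```shell
# simulate two ps-like rows, then grab the 2nd column of rows mentioning node
printf 'root 101 bash\nantonin 202 node\n' | awk '/node/ { print $2 }'
# prints: 202
```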
Filtering
One of the big parts of using awk is filtering. Filtering works akin to if expressions in any language (or a where clause in SQL):
awk '$2 == "admin" { print $1 }' source_file.txt
In this instance, awk filters on the 2nd column, keeping rows where the user permission is admin, and prints out the user’s name. In our case it’d be this:
User1
What are all the available filters?
- regex match, eg. awk '$1 ~ /regex pattern/ { print $0 }' source_file.txt
- less than/greater than, eg. awk '$3 > 100 { print $1 }' source_file.txt
- equals, eg. awk '$2 == "admin" { print $1 }' source_file.txt
You can also filter on the whole row with regex: awk '/regex pattern/ { print $2 }' source_file.txt.
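Here’s each filter style run against our example file (recreated inline with printf so the snippet stands alone):

```shell
# recreate the example source_file.txt
printf 'User1 admin user_1\nUser2 user user_2\nUser3 guest user_3\n' > source_file.txt

awk '$2 == "admin" { print $1 }' source_file.txt    # equals filter: User1
awk '$1 ~ /User[12]/ { print $3 }' source_file.txt  # regex on one field: user_1, user_2
awk '/guest/ { print $1 }' source_file.txt          # regex on the whole row: User3
```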
Actions
Things can get really complicated with awk. Awk is able to run loops, do sums, etc.
However, I’m going to focus on the basics which is just print. So how can we use print? Well, we can print out multiple columns like so:
awk '{ print $1, $2 }' source_file.txt
This would print columns 1 and 2.
We can also print out our own hardcoded text by wrapping a string in quotes:
# this version prints spaces because awk inserts the output field separator
awk '{ print $1, "::", $2}' source_file.txt
# this version concatenates values, so there are no spaces
awk '{ print $1 "::" $2 }' source_file.txt
This would print out something like this:
User1::admin
User2::user
User3::guest
Advanced Usage
Ok, so Awk can go deep. You can even write .awk files, create loops, use a custom separator, etc. It’s a lot. So if you’re not up for it, don’t keep going but if you are, let me tell you about that kind of functionality.
Custom Separator
Awk by default separates columns by runs of whitespace, so multiple spaces or tabs between values are treated as a single separator. You can change this behavior to separate by anything using the -F flag.
awk -F ':' '{ print $1 }' source_file.txt
This will split each line on : to create the fields. In our example source_file, we’d just end up with 1 field per line because none of the lines contain a :.
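To see -F actually split something, here’s a sketch with colon-separated data (the two records below are made up, in the spirit of /etc/passwd):

```shell
# colon-separated records, one per line
printf 'antonin_j:x:1000\nguest_user:x:2000\n' | awk -F ':' '{ print $1 }'
# prints: antonin_j and guest_user, one per line
```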
Awk files
You can create a .awk file to run awk commands. To use it, pass the file path using the -f flag. eg.
awk -f process_data.awk source_file.txt
The structure of awk files can be a bit confusing but essentially, each file can have:
- a beginning to set data
- filtering logic / processing logic
- an end to print out result.
Note that you do not need to use every block/section. The BEGIN block runs once, before any processing. The filtering/processing logic runs per row, and it’s optional too. The END block runs after all processing is done.
Example:
# begin block
BEGIN {
FS = ":" # this is how you set up a custom separator in an awk file
ADMIN_SUM = 0 # you can also declare custom variables
USER_SUM = 0
}
$2 == "admin" {
# Filtering lets you run logic per match
ADMIN_SUM++ # adjust/increment variables when this block matches
print $1 # you can print from here, too
}
# you can have multiple runtime blocks. This one runs on every row, unfiltered
{
USER_SUM++
}
# END lets you run actions at the end of processing
END {
print "Total admins: ", ADMIN_SUM, " Total users: ", USER_SUM
}
Inline-awk loops
Ok so, guess what? You can run all this logic from the .awk file inline! It’s basically the same thing.
awk '{ sum += 1 } END { print sum }' source_file.txt
We can also just specify:
awk 'END { print NR }' source_file.txt
NR is the current record number, so by the time the END block runs it equals the number of rows in the file.
We can also specify begin blocks and everything in between.
awk 'BEGIN { ADMIN_SUM = 0 } $2 == "admin" { ADMIN_SUM++ } END { print "Total admins: ", ADMIN_SUM}' source_file.txt
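If you run that last one-liner against our example file (recreated inline here), you should see exactly one admin counted. Note the double space in the output: the string already ends in a space and the comma adds the output field separator on top.

```shell
# recreate the example file and count admins
printf 'User1 admin user_1\nUser2 user user_2\nUser3 guest user_3\n' > source_file.txt
awk 'BEGIN { ADMIN_SUM = 0 } $2 == "admin" { ADMIN_SUM++ } END { print "Total admins: ", ADMIN_SUM }' source_file.txt
# prints: Total admins:  1
```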