The cheatsheet
Awk variables work like so:
- $0 – represents the full line
- $1...$n – represents each field (1-index-based)
- NR – current record number
- $NF – last column
Print out a specific column
# basic print command structure
awk '{ print $[column number]}' source_file
# print 2nd column
awk '{ print $2 }' source_file.txt
# print 1st and 3rd column from piped result
ps aux | awk '{ print $1, $3 }'
Filter out lines in a file
# basic filtering structure
awk 'filter_logic { print $1 }' source_file
# filter by column value
awk '$3 > 100 { print $0 }' source_file.txt
# filter by regex
awk '$2 ~ /pattern/ { print $0 }' source_file.txt
Format output
# basic formatting structure
awk '{ print [comma-separated values]}' source_file
# printing this -> that structure
awk 'filter_logic { print $1, "->", $2}' source_file.txt
# with concatenation you have to print spaces yourself
awk '{ print "Total " $1 }' source_file.txt
Basic sum
# command structure
awk '{[per_row_calculation]} END {print [result]}' source_file
# average of the 2nd column
awk '{sum += $2; count++} END {print sum/count}' source_file.txt
# count lines matching a pattern
awk '/ERROR/ {count++} END {print count}' logfile.log
The explanation
Oh, you don’t know what awk does even though your agentic AI uses it all the time? AWKward! (this joke wasn’t brought to you by AI).
Awk is one of those tools like sed and, honestly, the rest of my demystifying series. It’s an incredibly powerful command that can do a lot and it’s used everywhere, but trying to learn it feels like divining arcane knowledge.
So what does awk do? It treats text like a database. It’s best for modifying and getting data from CSV-like structures. awk processes a text corpus line by line and splits each line into separate fields, treating each one like a “column”. Each “column” is represented by a dollar-sign variable with a 1-based index. For example, $0 represents the whole line while $1 is the first field. Thus $1...$n covers every field on a line.
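A quick way to see those field variables in action is to feed awk a single line (the echo text here is just a made-up sample):

```shell
# one sample line, three whitespace-separated fields
echo "User1 admin user_1" | awk '{ print $0 }'   # the whole line: User1 admin user_1
echo "User1 admin user_1" | awk '{ print $1 }'   # first field: User1
echo "User1 admin user_1" | awk '{ print $NF }'  # last field: user_1
```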
What is it good for?
- pulling out and processing CLI output
- filtering logs
- reformatting text
- doing tiny one-liners that feel like magic.
A good comparison: grep is great for finding lines of text, while awk is great for understanding and manipulating them.
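To make that comparison concrete, here’s a rough sketch of the same search done both ways (the sample line and file name are made up):

```shell
# grep finds the matching line; awk can also pick it apart
printf 'User1 admin user_1\n' > sample.txt
grep 'admin' sample.txt                      # prints the whole matching line
awk '$2 == "admin" { print $3 }' sample.txt  # prints just the third field: user_1
```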
Let’s use the following text as an example. We’ll call it our source_file.txt where the columns are:
- user’s name (eg. Antonin)
- user’s permission level (eg. admin)
- user’s username (eg. antonin_j)
User1 admin user_1
User2 user user_2
User3 guest user_3
The basic shape of an awk command
Most awk commands look like this:
awk 'pattern { action }' source_file.txt
The pattern is the filter that decides which lines we want to process. It’s OPTIONAL. The action is what we want to do with the resulting data. If you provide a pattern without an action, awk will print the whole matching line by default.
awk '{ print $1 }' source_file.txt
In this example, we’re skipping the pattern entirely and just telling awk to print the first field of each line.
Input
You can pass input to awk either via specifying a file or piping the result straight in.
Direct file reference:
awk '{ print $1 }' source_file.txt
Piping the data in:
cat source_file.txt | awk '{ print $1 }'
Note: this is super useful for logs. Just run ps aux and you can select data and filter it via awk by piping it through, eg. ps aux | awk '/node/ {print $2}' to print the PIDs of currently running node processes.
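If you want to try that filter without a live process list, you can fake a couple of lines of ps-style output with printf (the rows below are made up):

```shell
# simulate two ps-like rows, then grab the 2nd column of rows mentioning node
printf 'root 101 bash\nantonin 202 node\n' | awk '/node/ { print $2 }'
# prints: 202
```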
Filtering
One of the big parts of using awk is filtering. Filtering works akin to if expressions in any language (or a where clause in SQL):
awk '$2 == "admin" { print $1 }' source_file.txt
In this instance, awk filters on the 2nd column, keeping rows where the user permission is admin, and prints out the user’s name. In our case it’d be this:
User1
What are all the available filters?
- regex match, eg. awk '$1 ~ /regex pattern/ { print $0 }' source_file.txt
- less than/greater than, eg. awk '$3 > 100 { print $1 }' source_file.txt
- equals, eg. awk '$2 == "admin" { print $1 }' source_file.txt
You can also filter on the whole row with regex: awk '/regex pattern/ { print $2 }' source_file.txt.
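Here’s each filter style run against our example file (recreated inline with printf so the snippet stands alone):

```shell
# recreate the example source_file.txt
printf 'User1 admin user_1\nUser2 user user_2\nUser3 guest user_3\n' > source_file.txt

awk '$2 == "admin" { print $1 }' source_file.txt    # equals filter: User1
awk '$1 ~ /User[12]/ { print $3 }' source_file.txt  # regex on one field: user_1, user_2
awk '/guest/ { print $1 }' source_file.txt          # regex on the whole row: User3
```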
Actions
Things can get really complicated with awk. Awk is able to run loops, do sums, etc.
However, I’m going to focus on the basics which is just print. So how can we use print? Well, we can print out multiple columns like so:
awk '{ print $1, $2 }' source_file.txt
This would print columns 1 and 2.
We can also print out our own hardcoded text by wrapping a string in quotes:
# this version prints spaces because awk inserts the output field separator
awk '{ print $1, "::", $2}' source_file.txt
# this version concatenates values, so there are no spaces
awk '{ print $1 "::" $2 }' source_file.txt
This would print out something like this:
User1::admin
User2::user
User3::guest
Advanced Usage
Ok, so Awk can go deep. You can even write .awk files, create loops, use a custom separator, etc. It’s a lot. So if you’re not up for it, don’t keep going but if you are, let me tell you about that kind of functionality.
Custom Separator
Awk by default separates columns by runs of whitespace, so multiple spaces or tabs between values are treated as a single separator. You can change this behavior to separate by anything using the -F flag.
awk -F ':' '{ print $1 }' source_file.txt
This will split each line on : to create the fields. In our example source_file, we’d just end up with 1 field per line because none of the lines contain a :.
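To see -F actually split something, here’s a sketch with colon-separated data (the two records below are made up, in the spirit of /etc/passwd):

```shell
# colon-separated records, one per line
printf 'antonin_j:x:1000\nguest_user:x:2000\n' | awk -F ':' '{ print $1 }'
# prints: antonin_j and guest_user, one per line
```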
Awk files
You can create a .awk file to run awk commands. To use it, pass the file path using the -f flag. eg.
awk -f process_data.awk source_file.txt
The structure of awk files can be a bit confusing but essentially, each file can have:
- a beginning to set data
- filtering logic / processing logic
- an end to print out result.
Note that you do not need to use every block/section. The BEGIN block runs once, before any processing. The filtering/processing logic runs per row, and it’s optional too. The END block runs after all processing is done.
Example:
# begin block
BEGIN {
FS = ":" # this is how you set up a custom separator in an awk file
ADMIN_SUM = 0 # you can also declare custom variables
USER_SUM = 0
}
$2 == "admin" {
# Filtering lets you run logic per match
ADMIN_SUM++ # adjust/increment variables when this block matches
print $1 # you can print from here, too
}
# you can have multiple runtime blocks. This one runs on every row, unfiltered
{
USER_SUM++
}
# END lets you run actions at the end of processing
END {
print "Total admins: ", ADMIN_SUM, " Total users: ", USER_SUM
}
Inline-awk loops
Ok so, guess what? You can run all this logic from the .awk file inline! It’s basically the same thing.
awk '{ sum += 1 } END { print sum }' source_file.txt
We can also just specify:
awk 'END { print NR }' source_file.txt
NR is the current record number, so by the time the END block runs it equals the number of rows in the file.
We can also specify begin blocks and everything in between.
awk 'BEGIN { ADMIN_SUM = 0 } $2 == "admin" { ADMIN_SUM++ } END { print "Total admins: ", ADMIN_SUM}' source_file.txt
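If you run that last one-liner against our example file (recreated inline here), you should see exactly one admin counted. Note the double space in the output: the string already ends in a space and the comma adds the output field separator on top.

```shell
# recreate the example file and count admins
printf 'User1 admin user_1\nUser2 user user_2\nUser3 guest user_3\n' > source_file.txt
awk 'BEGIN { ADMIN_SUM = 0 } $2 == "admin" { ADMIN_SUM++ } END { print "Total admins: ", ADMIN_SUM }' source_file.txt
# prints: Total admins:  1
```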