Getting started with jq

Examples and Patterns

diyinfosec
7 min read · Nov 18, 2021

What is jq?

jq is a JSON processing tool written in C. It is a lightweight binary (~30kB) that runs standalone, i.e. you don’t need to install any additional dependencies. jq is available on Linux, macOS, and Windows and is a popular choice for command-line JSON processing.
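For a quick taste, here is a minimal sketch (the inline JSON is just an illustrative snippet) that extracts a single value:

#- Extract one value from an inline JSON snippet
echo '{"name": "jq", "tags": ["json", "cli"]}' | jq '.tags[0]'
#- Prints: "json"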

Why another article on jq?

I agree there are hundreds of articles on jq, and writing another seems wasteful. However, in this article I have taken a problem-solution approach: first I outline a problem or situation where you have to deal with JSON output, and then I provide some patterns for how jq can be used to solve it. This way you will not just be copy/pasting esoteric filters, but will develop an appreciation for both where and how to use jq. The examples covered in this article are:

  1. Using jq to process Linux command output
  2. Using jq to process HTTP Archive (HAR) files
  3. Using jq to process System calls (strace output)

Example 1 — Using jq to process Linux command output

1.1 — The Problem

What do ls -lrt, ps -ef, and netstat -tlnp have in common? They are all Linux commands. And what do they not have in common? Their output format! I usually resort to using cut, sed, grep, and awk to filter the output of these commands.

Recently, I found a tool called jc that approaches this problem differently. jc converts the output of common Linux commands to JSON, so you can use any JSON processing tool to slice and dice the output. I thought this would be a good problem to start exploring jq with.
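As a quick illustration of the jc + jq combination, here is a minimal sketch. I am assuming your jc build ships the ls parser and that it emits a "filename" field; field names may vary between jc versions.

#- List only the file names from ls -l, via JSON (assumes jc's ls parser and its "filename" field)
ls -l | jc --ls | jq -r '.[].filename'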

1.2 — The Patterns

In the examples below we will enumerate the JSONized version of the ps command output using jq, covering the following patterns:

  1a. Selecting specific processes from a list
  1b. Selecting specific fields from a process structure
  1c. Using conditions to filter processes of interest
  1d. Summarizing data from the process output

1.3 — Examples — Enumerating processes

This is the sample output that we will be working with:

#- Pretty-print all the processes
ps -ef | jc --ps | jq '.'
Sample Output:
[
  {
    "uid": "502",
    "pid": 47827,
    "ppid": 16698,
    "c": 0,
    "stime": "11:47am",
    "tty": "ttys000",
    "time": "0:00.00",
    "cmd": "jq ."
  }
  <---output snipped--->
]

Pattern 1a — Selecting specific Processes from a List

#- What's the first running process?
ps -ef | jc --ps | jq '.[0]'
#- What are the first and fifth running processes?
ps -ef | jc --ps | jq '.[0,4]'
#- What are the first 10 running processes?
ps -ef | jc --ps | jq '.[0:10]'
#- What are the last 10 running processes?
ps -ef | jc --ps | jq '.[-10:]'
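Slices are themselves arrays, so you can keep piping into further filters. A small sketch combining a slice with the field selection covered in the next pattern:

#- Commands of the first five processes
ps -ef | jc --ps | jq '.[0:5] | .[].cmd'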

Back to TOC

Pattern 1b — Selecting specific fields from a process structure

#- Get only the commandline from the process list
ps -ef | jc --ps | jq '.[].cmd'
#- Selecting two fields from the process list - Line output
ps -ef | jc --ps | jq '.[] | .cmd, .pid'
#- Selecting two fields from the process list - List output
#- The -c flag produces compact output: one list item per line
ps -ef | jc --ps | jq -c '.[] | [.cmd, .pid]'
#- List output + combine into a single line
#- The join function below combines the fields into a pipe-delimited string.
ps -ef | jc --ps | jq -c '.[] | [.cmd, .pid] | join("|")'
#- List output + csv/tsv
#- We are using -r (raw output) so jq does not wrap the CSV line in extraneous double quotes.
ps -ef | jc --ps | jq -c -r '.[] | [.cmd, .pid] | @csv'
#- Selecting two fields from the process list - String output
ps -ef | jc --ps | jq '.[] | "\(.cmd), \(.pid)"'
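A related sketch: @tsv behaves like @csv but emits tab-separated values, which drop straight into awk, sort, or column:

#- Tab-separated output (pairs nicely with column -t for alignment)
ps -ef | jc --ps | jq -r '.[] | [.pid, .uid, .cmd] | @tsv'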

Back to TOC

Pattern 1c — Using Conditions to filter Processes of interest

#- Getting the list of processes where the pid>10000
ps -ef | jc --ps | jq '.[] | select(.pid > 10000)'
#- Getting the list of processes where pid>10000 and cmd is -zsh
ps -ef | jc --ps | jq '.[] | select(.pid > 10000 and .cmd=="-zsh")'
#- Getting processes where the commandline contains "python"
ps -ef | jc --ps | jq '.[] | select(.cmd | contains("python"))'
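If a plain substring match is not enough, jq also supports regular expressions via test(). A small sketch (the pattern here is just an example):

#- Processes whose commandline matches a regex; test() also accepts an "i" flag for case-insensitive matching
ps -ef | jc --ps | jq '.[] | select(.cmd | test("python[0-9]*"))'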

Back to TOC

Pattern 1d — Summarizing data from the Process output

#- Counting the TOTAL number of processes
ps -ef | jc --ps | jq '. | length'
#- Getting the userid from all processes.
ps -ef | jc --ps | jq '.[].uid'
#- Getting DISTINCT userids.
#- To use the unique function the value has to be inside an array
ps -ef | jc --ps | jq '[.[].uid]|unique'
#- GROUP processes by userid. This will break the main list into sub-lists for each userid.
ps -ef | jc --ps | jq 'group_by(.uid)'
#- Get the userid and corresponding count of processes
ps -ef | jc --ps | jq -c 'group_by(.uid) | .[] | [.[0].uid, length]'
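If you prefer the summary as objects sorted by count, a small sketch along the same lines:

#- Per-user process counts as objects, most processes first
ps -ef | jc --ps | jq 'group_by(.uid) | map({uid: .[0].uid, count: length}) | sort_by(.count) | reverse'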

Back to TOC

Example 2 — Using jq to process HTTP Archive (HAR) files

2.1 — The Problem

A HAR, or HTTP Archive, is a JSON file describing your web browser’s activity when you visit a site. The JSON contains a lot of useful information such as sites contacted, cookies used, response content, page load times, etc. This file is a good candidate to work with jq and get some useful information about your browser’s activity!
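For a quick taste before the patterns below (using the same example.har file we will work with there), this one-liner lists every URL the browser requested:

#- One requested URL per line
jq -r '.log.entries[].request.url' example.har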

Additionally, if you are interested in the HAR spec, you can find it here. There are also a couple of interesting videos on YouTube that I would recommend watching if you want to learn more: Capturing and analyzing HAR and HAR for Malware analysis — Derbycon 2015.

2.2 — The Patterns

Some common usage patterns with HAR files:

  2a. Enumerating the HAR file — fields and entries
  2b. Getting cookie information
  2c. Searching and selecting response content
  2d. Passing values downstream

2.3 — Examples — Processing a HAR file

#- Pretty-print contents of HAR file
cat example.har | jq '.'
Sample Output:
{
  "log": {
    "version": "1.2",
    "creator": { ... },
    "pages": [ { ... } ],
    "entries": [
      {
        ...
        "request": {
          "method": "GET",
          "url": "https://example.com/",
          "httpVersion": "http/2.0",
          "headers": [ ],
          "queryString": [],
          "cookies": [ ],
          ...
        },
        "response": {
          "status": 200,
          "statusText": "",
          "httpVersion": "http/2.0",
          "headers": [ ],
          "content": { },
          "redirectURL": "",
          ...
        },
        "serverIPAddress": "...",
        "startedDateTime": "...",
        "time": 1890.5660000018543,
        "timings": { ... }
      },
      ...
    ]
  }
}

Back to TOC

Pattern 2a — Enumerating the HAR file — fields and entries

#- Getting all possible paths in your JSON
cat example.har | jq '[path(..)|map(if type=="number" then "[]" else tostring end)|join(".")|split(".[]")|join("[]")]|unique|map("."+.)|.[]'
#- You can also use a tool like gron to get all possible paths in a JSON.
gron example.har
#- How many entries?
cat example.har | jq '[.log.entries[]]|length'
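Building on the same entries array, a sketch that lists the distinct hosts contacted (it simply splits each URL on "/", so it assumes ordinary http(s):// URLs):

#- Distinct hosts contacted
cat example.har | jq -r '[.log.entries[].request.url | split("/") | .[2]] | unique | .[]'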

Back to TOC

Pattern 2b — Getting Cookie Information

#- Requests that have cookies
cat example.har | jq '.log.entries[].request | select(.cookies != []) | .cookies'
#- Which domains are setting cookies?
cat example.har | jq '[.log.entries[].request | select(.cookies != []) | .cookies[].domain] | unique'
#- What cookies is facebook.com setting?
cat example.har | jq '.log.entries[].request | select(.cookies != [] and .cookies[].domain==".facebook.com") | .cookies'
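To actually rank domains by cookie count, a sketch along the same lines (note that depending on the browser, request cookies may not carry a domain field; if so, group on .name instead):

#- Cookie counts per domain, highest first
cat example.har | jq '[.log.entries[].request.cookies[].domain] | group_by(.) | map({domain: .[0], count: length}) | sort_by(.count) | reverse'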

Back to TOC

Pattern 2c — Searching and Selecting Response Content

#- Get content from the response that is in JSON and contains the string MMAATTCCHH.
#- Note the use of "fromjson" - it parses the embedded JSON string, so you don't see the extraneous backslashes in the final output.
cat example.har | jq '.log.entries[].response.content | select(.mimeType=="application/json" and .size>0 and .text) | select(.text|contains("MMAATTCCHH")).text | fromjson'
#- For a case-insensitive match, use a regex instead: select(.text|test("MMAATTCCHH";"i")).text
#- gron can also make searching in JSON easier!
gron example.har | grep "MMAATTCCHH" | cut -f1 -d"="
The above command gives output like the following, indicating which JSON paths contain the string MMAATTCCHH:
json.log.entries[30].response.content.text
json.log.entries[73].request.postData.text
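Some response bodies are stored base64-encoded in the HAR (the content object then carries "encoding": "base64"). A sketch for decoding them, assuming jq 1.6 or newer for @base64d:

#- Decode base64-encoded response bodies
cat example.har | jq -r '.log.entries[].response.content | select(.encoding=="base64") | .text | @base64d'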

Back to TOC

Pattern 2d — Passing Values Downstream

#- Here we are taking the page title early on in the JSON and storing it in a local variable $page_title.
#- Later we use this variable when printing the last URL to load.
cat example.har | jq '.log | .pages[0].title as $page_title | "The last URL to load in the page \($page_title) is \(.entries[-1].request.url)"'
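A second sketch along the same lines, stashing the capture tool’s name from the creator object and reusing it downstream:

#- Combine a stored variable with a computed value in one string
cat example.har | jq -r '.log | .creator.name as $tool | "\($tool) captured \(.entries | length) requests"'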

Back to TOC

Example 3 — Using jq to process strace output

3.1 — The Problem

strace is a Linux tool that monitors the system calls made by a process. If we can get the output of strace as JSON, then we can easily slice/dice/analyze the system call behavior of our program. strace does not provide JSON output natively, but there is a program called b3 that can convert saved strace output to JSON. You can install b3 with the following commands:

#- Download b3 from Github
wget https://github.com/dannykopping/b3/releases/download/0.3.0/b3
#- Verify the download
file b3
#- Give the file executable permissions
chmod +x b3
#- Move it to /usr/bin
sudo mv b3 /usr/bin

3.2 — The Patterns

In the examples below we will enumerate the JSONized version of strace output (the system calls made by tcpdump) using jq, covering the following patterns:

  3a. Counting syscalls
  3b. Filtering and viewing arguments

3.3 — Examples — Enumerating syscalls made by tcpdump

This is how we generate the JSON data:

#- Cleanup
/bin/rm -f /tmp/out /tmp/out.json
#- Run tcpdump with strace and write output to /dev/null
strace -o /tmp/out tcpdump -w /dev/null
#- Convert the file to JSON.
#- We are using jq -s to "slurp" the individual lines into a single array.
cat /tmp/out | b3 | jq -s > /tmp/out.json
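Before filtering, it is worth peeking at one record to see the field names b3 emits (the patterns below rely on .syscall, .args, and .result):

#- Inspect the first syscall record
jq '.[0]' /tmp/out.json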

Pattern 3a — Counting syscalls

#- How many system calls?
#- The file is already a single array (we slurped it above), so length is enough
jq 'length' /tmp/out.json
#- How many distinct calls, group by count?
jq -rc '
[ group_by(.syscall)
| .[]
| {"name":.[0].syscall,"count":length}]
| sort_by(.count) | reverse
| .[] | [.name // "-" ,.count] | join(" -----> ")' \
/tmp/out.json
#- Notes:
* In the last line of the filter, // is the "alternative" operator: if a value is null or missing, it is replaced by "-".
* reverse is used after sort_by(.count) so that the most frequent syscalls appear first.
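To see the alternative operator in isolation (a throwaway sketch):

#- null falls through to the alternative value
echo '{"a": null}' | jq '.a // "-"'
#- Prints: "-"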

Back to TOC

Pattern 3b — Filtering and viewing arguments

#- All system calls with their arguments
jq -rc '
.[] | select(.syscall != "nd") |
[.syscall, (.args | tostring), .result] |
join(" -----> ")
' \
/tmp/out.json
#- Filter only "bind" calls
jq -rc '
.[] | select(.syscall == "bind") |
[.syscall, (.args | tostring), .result] |
join(" -----> ")
' \
/tmp/out.json
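The same filter shape works for any syscall of interest. For example, to see which files tcpdump opened (open/openat):

#- Filter only file-open calls
jq -rc '
.[] | select(.syscall == "open" or .syscall == "openat") |
[.syscall, (.args | tostring), .result] |
join(" -----> ")
' \
/tmp/out.json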

Back to TOC

That’s it for now!
