comm, a rundown
What’s comm?
The comm command in Unix and Linux compares two sorted text files line by line. It identifies the lines unique to each file and the lines shared by both, and prints this information in three columns:
- The first column has lines unique to the first file.
- The second column has lines unique to the second file.
- The third column has lines common to both.
This makes comm useful for identifying differences and overlaps in:
- datasets
- config files
- lists (like this!)
- and more!
Usage
Basic usage:
To compare two files, run the command:
comm file1 file2
Options:
You have the following options available:
-1
Suppress printing of column 1, lines only in file1.
-2
Suppress printing of column 2, lines only in file2.
-3
Suppress printing of column 3, lines common to both.
-i
Case-insensitive comparison of lines. This flag comes from the BSD/macOS version of comm; GNU coreutils comm does not support it.
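Not every comm has -i (GNU coreutils’ version, for instance, doesn’t), so a portable way to get a case-insensitive comparison is to fold both inputs to one case and sort them before calling comm. A minimal sketch, assuming two small hypothetical files a.txt and b.txt; the helper name lower_sorted is just for illustration. Note that the printed lines come out lowercased, because the folding happens before comm sees them.

```shell
# Hypothetical input files with mixed case (illustration only).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' Apple banana > "$dir/a.txt"
printf '%s\n' apple Cherry > "$dir/b.txt"

# Fold to lowercase, then sort, so comm sees consistent case and order.
lower_sorted() {
  tr '[:upper:]' '[:lower:]' < "$1" | sort
}
lower_sorted "$dir/a.txt" > "$dir/a.norm"
lower_sorted "$dir/b.txt" > "$dir/b.norm"

# -12 keeps only column 3: lines common to both (ignoring case).
common=$(comm -12 "$dir/a.norm" "$dir/b.norm")
printf '%s\n' "$common"   # prints: apple
```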
Exit status:
The comm utility exits 0 on success and with a status greater than 0 if an error occurs.
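In a script, you can branch on that exit status. A minimal sketch, assuming a hypothetical file1 compared against a file that doesn’t exist, which comm reports as an error:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' a b > "$dir/file1"

# "$dir/missing" doesn't exist, so comm reports an error and exits with a status > 0.
if comm "$dir/file1" "$dir/missing" > /dev/null 2>&1; then
  status=0
else
  status=$?   # exit status of the comm command in the if condition
fi
echo "comm exited with status $status"
```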
Caveat!:
The comm utility requires both input files to be sorted in the same collating order (that of the current locale). You can sort your files with the sort command before running comm.
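If you’d rather not modify the originals, bash’s process substitution lets you sort on the fly. (To sort a file in place instead, sort -o file file is safe, because sort allows the output file to be one of its inputs.) A sketch with hypothetical, deliberately unsorted files:

```shell
# Requires bash (or zsh/ksh) for the <( ... ) process substitution syntax.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' cherry apple banana > "$dir/file1"   # deliberately unsorted
printf '%s\n' date banana cherry  > "$dir/file2"   # deliberately unsorted

# Each <(sort ...) hands comm a sorted stream; the files on disk stay as they are.
common=$(comm -12 <(sort "$dir/file1") <(sort "$dir/file2"))
printf '%s\n' "$common"   # banana, cherry (one per line)
```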
Examples:
Presenting our example files! Super complex, I know.
cat file1
cat file2
Base usage:
Running the basic command comm file1 file2 shows three columns: lines unique to file1, lines unique to file2, and lines common to both.
comm file1 file2
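To see the layout concretely, assume hypothetical file contents: file1 holds apple, banana, cherry and file2 holds banana, cherry, date. comm separates its columns with tab characters, so a line in column 2 is prefixed by one tab and a line in column 3 by two.

```shell
# Hypothetical stand-ins for file1 and file2 (both already sorted).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' apple banana cherry > "$dir/file1"
printf '%s\n' banana cherry date  > "$dir/file2"

out=$(comm "$dir/file1" "$dir/file2")
printf '%s\n' "$out"
# Expected lines: "apple" (column 1, no tab), "\t\tbanana" and "\t\tcherry"
# (column 3, two tabs), "\tdate" (column 2, one tab).
```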
Suppress lines unique to file1:
The command comm -1 file1 file2 hides lines unique to file1, leaving only lines unique to file2 and those common to both files.
comm -1 file1 file2
Suppress lines unique to file2:
Using comm -2 file1 file2 hides lines unique to file2, leaving only file1-unique lines and common lines.
comm -2 file1 file2
Only lines unique to file1:
The command comm -23 file1 file2 suppresses columns 2 and 3, so it shows only the lines found exclusively in file1.
comm -23 file1 file2
Only lines unique to file2:
Using comm -13 file1 file2 suppresses columns 1 and 3, giving us only the lines found exclusively in file2.
comm -13 file1 file2
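Assuming hypothetical contents file1 = apple, banana, cherry and file2 = banana, cherry, date, the two one-sided views come out as plain lists with no leading tabs, because only one column survives:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' apple banana cherry > "$dir/file1"
printf '%s\n' banana cherry date  > "$dir/file2"

only1=$(comm -23 "$dir/file1" "$dir/file2")   # suppress columns 2 and 3
only2=$(comm -13 "$dir/file1" "$dir/file2")   # suppress columns 1 and 3
printf 'only in file1: %s\n' "$only1"         # only in file1: apple
printf 'only in file2: %s\n' "$only2"         # only in file2: date
```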
Suppress common lines (only differences):
With comm -3 file1 file2, only the lines unique to each file are displayed; the shared lines are omitted.
comm -3 file1 file2
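One detail worth knowing: with -3, lines unique to file2 are still indented by one tab, because column 2 keeps its position after column 1. If you want a flat list of every difference, awk can strip that single leading tab. A sketch, assuming hypothetical contents file1 = apple, banana, cherry and file2 = banana, cherry, date:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' apple banana cherry > "$dir/file1"
printf '%s\n' banana cherry date  > "$dir/file2"

raw=$(comm -3 "$dir/file1" "$dir/file2")   # "apple", then "<TAB>date"

# sub() removes at most one leading tab per line; the trailing 1 prints every line.
flat=$(comm -3 "$dir/file1" "$dir/file2" | awk '{ sub(/^\t/, "") } 1')
printf '%s\n' "$flat"                      # apple, date (one per line)
```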
Only lines common to both files:
The command comm -12 file1 file2 suppresses columns 1 and 2, so it shows only the lines shared by both file1 and file2.
comm -12 file1 file2
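-12 pairs naturally with wc -l when you only need to know how much two lists overlap. A sketch, assuming hypothetical contents file1 = apple, banana, cherry and file2 = banana, cherry, date:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' apple banana cherry > "$dir/file1"
printf '%s\n' banana cherry date  > "$dir/file2"

common=$(comm -12 "$dir/file1" "$dir/file2")
printf '%s\n' "$common"                  # banana, cherry (one per line)

# Some wc implementations pad the count with spaces, so strip them.
count=$(comm -12 "$dir/file1" "$dir/file2" | wc -l | tr -d ' ')
echo "lines in common: $count"           # lines in common: 2
```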
Improvements with awk and sort:
What if I want better readability? awk time!
You can get better readability by labeling the columns with awk. For example:
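comm file1 file2 | awk 'BEGIN{print "file1\tfile2\tcommon"}1'
This prints a nice header above comm’s output to show the contents of each column. The trailing 1 is an always-true pattern, so awk prints every line of comm’s output unchanged. Because comm separates its columns with tabs, separating the labels with tabs too makes them line up with the columns in a terminal.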
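Another option is to have awk tag every line with its column instead of printing a single header. Splitting on tabs (-F'\t') makes the number of fields reveal the column: no leading tab gives 1 field, one tab gives 2, two tabs give 3. This sketch assumes the input lines contain no tabs of their own and uses hypothetical file contents:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' apple banana cherry > "$dir/file1"
printf '%s\n' banana cherry date  > "$dir/file2"

# NF (number of tab-separated fields) tells us which column a line is in.
# An empty line unique to file1 has NF == 0, matches no pattern, and is dropped.
labeled=$(comm "$dir/file1" "$dir/file2" | awk -F'\t' '
  NF == 1 { print "file1 only: " $1 }
  NF == 2 { print "file2 only: " $2 }
  NF == 3 { print "common:     " $3 }')
printf '%s\n' "$labeled"
```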
Some Diffs Between comm and diff
comm and diff both compare two files, but they differ. Ha, get it, differ?
comm is for structured, line-by-line comparisons of sorted files. It gives three columns to view (see above). comm is good for sorted lists, logs, and structured data where what matters is which lines appear in one file, the other, or both.
diff is for deeper comparison, getting the exact differences between files. It gives a more detailed view and doesn’t care about sorting. It reports which lines were added, deleted, or changed, often in a format suited to version control or patching files (for example, the unified format produced by diff -u, which the patch utility can apply). Everyone’s done a git diff in their time, same diff. Protip: use delta to enhance your git diffs. It’s cool, check it out!
Basically, use comm when you need structured, side-by-side comparisons of sorted files, and use diff when you need a precise analysis of the changes between two files, regardless of sorting.
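To see the contrast in practice, here’s a sketch that runs both tools on the same pair of hypothetical files (file1 = apple, banana, cherry; file2 = banana, cherry, date). comm answers "which lines are where," while diff describes how to edit file1 into file2, marking removed lines with < and added lines with >.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' apple banana cherry > "$dir/file1"
printf '%s\n' banana cherry date  > "$dir/file2"

echo "--- comm ---"
comm "$dir/file1" "$dir/file2"

echo "--- diff ---"
# diff exits 1 when the files differ, so don't treat that as a failure.
diff_out=$(diff "$dir/file1" "$dir/file2") || true
printf '%s\n' "$diff_out"
```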
When to use, simplified:
Use comm to compare sorted text files or to do simple comparisons in scripts. It’s great for quickly finding differences and overlaps without getting complicated about it. Its simplicity makes it helpful in scripts and automation, and I just think it’s neat. Marge Simpson potato image here.
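As a small automation example, here’s a sketch of a check a script might run: report anything in a required list that’s missing from an installed list, and return a nonzero status if so. The file names (required.txt, installed.txt) and the check_missing function are hypothetical, and the <( ... ) syntax needs bash.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
# Hypothetical inputs: what a project needs vs. what's installed (unsorted).
printf '%s\n' jq curl git  > "$dir/required.txt"
printf '%s\n' git vim curl > "$dir/installed.txt"

# Print lines of $1 that are absent from $2 (on stderr); return 1 if there are any.
check_missing() {
  local missing
  missing=$(comm -23 <(sort "$1") <(sort "$2"))
  if [ -n "$missing" ]; then
    echo "missing:" >&2
    printf '%s\n' "$missing" >&2
    return 1
  fi
}

check_missing "$dir/required.txt" "$dir/installed.txt" || echo "check failed (status $?)"
```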
In Conclusion
comm is a simple, effective command-line tool for comparing two text files. Needing sorted input is a small limitation, but the tool itself is straightforward and flexible. Use it in scripts and automation, and try it out!
References:
- The man page for comm
- The man page for diff
- The Linux Command Line, 2nd Edition, William Shotts