comm, a rundown

What’s comm?

The comm command in Unix and Linux compares two sorted text files line by line. It identifies the lines unique to each file and the lines shared by both, and prints them in three columns separated by tab characters:

  • The first column has lines unique to the first file.
  • The second column has lines unique to the second file.
  • The third column has lines common to both.

This makes comm useful for identifying differences and overlaps in:

  • datasets
  • config files
  • lists (like this!)
  • and more!

Usage

Basic usage:

To compare two files, run the command:

comm file1 file2

Options:

You have the following options available:

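  • -1 Suppress printing of column 1 (lines only in file1).
  • -2 Suppress printing of column 2 (lines only in file2).
  • -3 Suppress printing of column 3 (lines common to both).
  • -i Case-insensitive comparison of lines. This flag exists in BSD and macOS comm; GNU coreutils comm (the usual Linux version) does not have it.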
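Since -i is only available in BSD/macOS comm, here's one portable workaround: lowercase both files with tr before sorting. This is a minimal sketch with hypothetical files upper.txt and lower.txt; note that the output shows the lowercased lines, not the original spellings.

```shell
# Hypothetical inputs that differ only in letter case.
printf '%s\n' Apple banana > upper.txt
printf '%s\n' apple Banana > lower.txt

# Lowercase every line, then sort, so comm compares case-folded text.
tr '[:upper:]' '[:lower:]' < upper.txt | sort > upper.folded
tr '[:upper:]' '[:lower:]' < lower.txt | sort > lower.folded

# Lines common to both after folding.
comm -12 upper.folded lower.folded
# Expected output:
# apple
# banana
```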

Exit statuses:

The comm utility exits 0 on success and returns a status greater than 0 if an error occurs.
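Worth knowing: the exit status only reports errors, not whether the files differ, so comm returns 0 even when no lines match (unlike diff, which exits 1 when files differ). A minimal sketch of a "do these differ?" check that tests whether comm -3 printed anything, using hypothetical files a.txt and b.txt:

```shell
# Hypothetical, already-sorted inputs that share only "banana".
printf '%s\n' apple banana > a.txt
printf '%s\n' banana cherry > b.txt

# Exit status is 0 here: no error occurred, even though the files differ.
comm a.txt b.txt > /dev/null
echo "comm exit status: $?"

# comm -3 prints only lines unique to either file; empty output means same lines.
if [ -n "$(comm -3 a.txt b.txt)" ]; then
  echo "files differ"
else
  echo "same set of lines"
fi
```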

Caveat!

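The comm utility requires both input files to be sorted; on unsorted input it can put lines in the wrong column. Sort your files with the sort command before running comm, and use the same locale (collation order) for both commands.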
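A minimal sketch of the sort-then-compare step, assuming hypothetical unsorted files list1 and list2. Setting LC_ALL=C on both sort and comm keeps their collation order identical:

```shell
# Hypothetical unsorted inputs.
printf '%s\n' cherry apple banana > list1
printf '%s\n' date banana cherry > list2

# Sort into new files, using the same locale (C) for sort and comm.
LC_ALL=C sort list1 > list1.sorted
LC_ALL=C sort list2 > list2.sorted
LC_ALL=C comm list1.sorted list2.sorted
# Expected output, with <TAB> marking a tab character:
# apple
# <TAB><TAB>banana
# <TAB><TAB>cherry
# <TAB>date
```

In bash or zsh, process substitution skips the temporary files: comm <(sort list1) <(sort list2).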

Examples:

Presenting our example files! Super complex, I know.

cat file1

cat file2
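To follow along, here is a hypothetical pair of small, already-sorted files you can create (they are stand-ins, not necessarily the exact files used in this post):

```shell
# Hypothetical example files, one word per line, already sorted.
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2
cat file1
cat file2
```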

Base usage: Running the basic command comm file1 file2 shows three columns: lines unique to file1, lines unique to file2, and lines common to both.

comm file1 file2
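To see the layout concretely, here's a sketch with hypothetical stand-in data (file1: apple, banana, cherry; file2: banana, cherry, date). comm indents column 2 with one tab and column 3 with two:

```shell
# Hypothetical, already-sorted inputs.
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2
comm file1 file2
# Expected output, with <TAB> marking a tab character:
# apple
# <TAB><TAB>banana
# <TAB><TAB>cherry
# <TAB>date
```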

Suppress file1 output: The command comm -1 file1 file2 hides lines unique to file1, leaving only lines unique to file2 and those common to both files.

comm -1 file1 file2
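With hypothetical inputs file1 (apple, banana, cherry) and file2 (banana, cherry, date), dropping column 1 also shifts the remaining columns left by one tab:

```shell
# Hypothetical, already-sorted inputs.
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2
comm -1 file1 file2
# Expected output, with <TAB> marking a tab character:
# <TAB>banana
# <TAB>cherry
# date
```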

Suppress file2 output: Using comm -2 file1 file2 removes lines unique to file2, displaying only file1-unique lines and common lines.

comm -2 file1 file2
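Likewise for -2, sketched with hypothetical files file1 (apple, banana, cherry) and file2 (banana, cherry, date). The common lines now sit one tab in instead of two:

```shell
# Hypothetical, already-sorted inputs.
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2
comm -2 file1 file2
# Expected output, with <TAB> marking a tab character:
# apple
# <TAB>banana
# <TAB>cherry
```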

Only lines unique to file1: The command comm -23 file1 file2 suppresses columns 2 and 3, leaving just the lines found only in file1.

comm -23 file1 file2
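On hypothetical files file1 (apple, banana, cherry) and file2 (banana, cherry, date), -23 behaves like a set difference, file1 minus file2. Since only one column remains, there is no tab indentation:

```shell
# Hypothetical, already-sorted inputs.
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2
comm -23 file1 file2
# Expected output:
# apple
```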

Only lines unique to file2: Using comm -13 file1 file2 gives us lines found only in file2.

comm -13 file1 file2
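Swapping which columns you suppress gives the reverse difference, file2 minus file1. Sketched on hypothetical files file1 (apple, banana, cherry) and file2 (banana, cherry, date):

```shell
# Hypothetical, already-sorted inputs.
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2
comm -13 file1 file2
# Expected output:
# date
```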

Suppress common lines (only differences): With comm -3 file1 file2, only the lines unique to each file are displayed; the shared lines are omitted.

comm -3 file1 file2
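On hypothetical files file1 (apple, banana, cherry) and file2 (banana, cherry, date), -3 leaves column 1 flush left and column 2 one tab in, which is how you tell the two files' lines apart:

```shell
# Hypothetical, already-sorted inputs.
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2
comm -3 file1 file2
# Expected output, with <TAB> marking a tab character:
# apple
# <TAB>date
```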

Only lines common to both files: The command comm -12 file1 file2 shows only lines shared by both file1 and file2.

comm -12 file1 file2
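This is effectively a set intersection. A sketch on hypothetical files file1 (apple, banana, cherry) and file2 (banana, cherry, date), also counting the shared lines by piping into wc -l:

```shell
# Hypothetical, already-sorted inputs.
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2
comm -12 file1 file2
# Expected output:
# banana
# cherry

# Count of shared lines: 2 (some wc versions pad the number with spaces).
comm -12 file1 file2 | wc -l
```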

Improvements with awk and sort:

What if I want better readability? awk time!

You can get better readability by labeling columns using awk. For example:

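comm file1 file2 | awk 'BEGIN{print "file1\tfile2\tcommon"}1'

This prints a tab-separated header that lines up with comm's tab-separated columns. The trailing 1 is an always-true awk pattern, so every line of comm's output is then printed unchanged.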
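If the tab-indented layout is still hard to scan, another option is to label every line instead of adding a header. A minimal sketch, assuming hypothetical sorted files and that no input line is empty or contains a tab (either would confuse the tab split); label_comm is a made-up helper name:

```shell
# Hypothetical, already-sorted inputs.
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2

# label_comm (hypothetical helper): reads comm output on stdin.
# Column 1 lines have no leading tab, column 2 lines have one, column 3
# lines have two, so splitting on tabs shows which field holds the text.
label_comm() {
  awk -F'\t' '
    $1 != "" { print "file1:  " $1; next }
    $2 != "" { print "file2:  " $2; next }
             { print "common: " $3 }'
}

comm file1 file2 | label_comm
# Expected output:
# file1:  apple
# common: banana
# common: cherry
# file2:  date
```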

Some Diffs Between comm and diff

comm and diff both compare two files, but they differ. Ha, get it, differ?

comm is for structured, line-by-line comparisons of sorted files. It prints the three columns described above. comm is good for sorted lists, logs, and structured data where what matters is which lines appear in one file, the other, or both.

diff is for deeper comparison: it works out the exact edits that turn one file into the other. It gives a more detailed view and doesn’t care about sorting. It shows changed lines, often in a format suited to version control or patching files. Everyone’s run git diff at some point; it produces the same kind of output (the unified format you also get from diff -u). Protip: use delta to enhance your git diffs. It’s cool, check it out!

Basically, use comm when you need a column-based comparison of sorted files, and use diff when you need a precise analysis of the changes between two files, regardless of sorting.
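A side-by-side sketch of the difference, on hypothetical files file1 (apple, banana, cherry) and file2 (banana, cherry, date): comm reports which lines belong to which file, while diff reports the edits (delete line 1, add a line after line 3) needed to turn file1 into file2.

```shell
# Hypothetical, already-sorted inputs.
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2

# comm: lines unique to each file, common lines suppressed.
comm -3 file1 file2

# diff: edit script in diff's default ("normal") format.
# diff exits 1 when files differ; "|| true" keeps that from reading as a failure.
diff file1 file2 || true
```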

When to use, simplified:

Use comm to compare sorted text files or to run simple comparisons in scripts. It’s great for quickly finding differences and overlaps without getting complicated. Its simplicity makes it handy in scripts and automation, and I just think it’s neat. Marge Simpson potato image here.
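As a taste of the scripting use case, here's a minimal sketch that reports which required entries are missing from another list. The files required.txt and installed.txt are hypothetical; in a real script you would probably end with exit "$status".

```shell
# Hypothetical lists: tools a script needs vs. tools found on this machine.
printf '%s\n' jq curl git > required.txt
printf '%s\n' git curl > installed.txt

# comm needs sorted input; use one locale for sort and comm.
LC_ALL=C sort required.txt > required.sorted
LC_ALL=C sort installed.txt > installed.sorted

# Lines only in the first file = required but not installed.
missing=$(LC_ALL=C comm -23 required.sorted installed.sorted)
if [ -n "$missing" ]; then
  echo "missing: $missing"
  status=1
else
  echo "all present"
  status=0
fi
```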

In Conclusion

comm is a simple, effective command-line tool for comparing two text files. Needing sorted input is a small limitation, but otherwise it is straightforward and flexible. Use it in scripts and automation, and try it out!

References: