Nine ways to compare files on Unix

Sometimes you want to know if some files are different. Sometimes you want to know how they're different. Sometimes you might want to compare files that are compressed and sometimes you might want to compare executables. And, regardless of what you want to compare, you probably want to select the most convenient way to see those differences. The good news is that you have a lot more options than you probably imagine when you need to focus on file differences.

First: diff

The command most likely to come to mind for this task is diff. The diff command will show you the differences between two text files or tell you if two binaries are different, but it also has quite a few very useful options.

For text files, the diff command by default marks lines from the first file with < and lines from the second file with >. It uses designations like 1c1 (line 1 of the first file changes to line 1 of the second) or 8d7 (line 8 of the first file is deleted) to report these differences in a format that the patch command can use.

$ diff file1 file2
1c1
< 0 top of file one
---
> 0 top of file 2
3c3
< 2
---
> 2 two tomatoes
6c6
< 5 five bananas
---
> 5
8d7
< 7 the end
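To follow along, you can recreate the two sample files. The contents below are an assumption, reconstructed from the output shown in this article (any trailing spaces can't be recovered), and the scratch directory is used only for illustration.

```shell
# Recreate the sample files in a scratch directory. The contents are
# inferred from the article's output, so treat them as an approximation.
workdir=$(mktemp -d)

printf '%s\n' '0 top of file one' '1 one' '2' '3 three' '4' \
    '5 five bananas' '6' '7 the end' > "$workdir/file1"
printf '%s\n' '0 top of file 2' '1 one' '2 two tomatoes' '3 three' '4' \
    '5' '6' > "$workdir/file2"

# Show the default diff output, then count the change commands
# (the lines that start with a number, such as 1c1 or 8d7).
diff "$workdir/file1" "$workdir/file2"
hunks=$(diff "$workdir/file1" "$workdir/file2" | grep -c '^[0-9]')
echo "change commands: $hunks"
```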

If you happen to have the patch command on your system, you can compare the two files, save the diff output to a file, and then use that file to make the second file the same as the first. You'd most likely do something like this when updating files on a number of systems (rather than simply replacing them), since the differences might be very small. Because diff output describes how to turn the first file named into the second, name the file you want to change first. You could do it like this:

Create the differences file:

$ diff file2 file1 > diffs

Use the differences file to make the second file just like the first:

$ patch -i diffs file2

At this point, you'd have two identical files. Your file2 would be just like your file1. You could then use the diffs file on any number of systems to update the targeted file.
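As a rough sketch of that workflow, the script below updates one file to match another and then checks the result. The file names and contents are hypothetical. Note that the file to be changed is named first on the diff command line, since diff output describes how to turn the first file into the second.

```shell
# Sketch: make "target" match "reference" using diff and patch.
# File names and contents are hypothetical.
workdir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$workdir/reference"
printf '%s\n' 'alpha' 'BETA' > "$workdir/target"

if command -v patch >/dev/null 2>&1; then
    # The file to be changed goes first, the desired result second.
    diff "$workdir/target" "$workdir/reference" > "$workdir/diffs"
    patch -i "$workdir/diffs" "$workdir/target" >/dev/null
    # cmp -s prints nothing; its exit status says whether the files match.
    if cmp -s "$workdir/target" "$workdir/reference"; then
        result=identical
    else
        result=different
    fi
else
    result=skipped   # patch is not installed on this system
fi
echo "$result"
```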

For most of us, this use of diff is probably not something we'd do very often.

If you only want to know if the files are different, you can try a simpler approach.

$ diff -q file1 file2
Files file1 and file2 differ
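In a script, you often don't need the message at all, because diff's exit status carries the answer: 0 when the files are the same, 1 when they differ, and 2 when something went wrong. Here is a minimal sketch; the compare_files helper and the file names are made up for illustration.

```shell
# Hypothetical helper: report whether two files match, using diff's
# exit status (0 = same, 1 = different, 2 = trouble such as a missing file).
compare_files() {
    diff -q "$1" "$2" >/dev/null 2>&1
    case $? in
        0) echo same ;;
        1) echo different ;;
        *) echo error ;;
    esac
}

workdir=$(mktemp -d)
echo 'hello' > "$workdir/a"
echo 'hello' > "$workdir/b"
echo 'goodbye' > "$workdir/c"

compare_files "$workdir/a" "$workdir/b"          # prints "same"
compare_files "$workdir/a" "$workdir/c"          # prints "different"
compare_files "$workdir/a" "$workdir/missing"    # prints "error"
```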

Second: side-by-side diff

If you want to see the differences between two files, but not the instructions that patch could use, you might like diff's side-by-side view. Note that lines with differences include a |, while a line found in only one of the files is marked with < or >.

$ diff -y file1 file2
0 top of file one                         | 0 top of file 2
1 one                                       1 one
2                                         | 2 two tomatoes
3 three                                     3 three
4                                           4
5 five bananas                            | 5
6                                           6
7 the end                                 <

Some of the most useful diff options are these:

-b	ignore changes in the amount of white space
-B	ignore changes that only add or remove blank lines
-w	ignore all white space
-i	ignore case differences
-y	side-by-side
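These options can be combined. The sketch below uses two hypothetical files that differ only in letter case and in the number of spaces between words, and records diff's exit status (0 for no differences, 1 for differences) with and without the options.

```shell
workdir=$(mktemp -d)
printf 'Red apple\ngreen  pear\n' > "$workdir/left"
printf 'red apple\ngreen pear\n' > "$workdir/right"

# Plain diff sees both lines as changed.
diff "$workdir/left" "$workdir/right" >/dev/null; plain=$?
# -i alone is not enough because the spacing still differs.
diff -i "$workdir/left" "$workdir/right" >/dev/null; case_only=$?
# -i and -b together ignore both the case and the spacing change.
diff -i -b "$workdir/left" "$workdir/right" >/dev/null; both=$?

echo "plain=$plain case_only=$case_only both=$both"
# prints "plain=1 case_only=1 both=0"
```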

Third: top and bottom diff

The output from the diff command with the -c option displays the files sequentially, with changed lines marked by exclamation points and lines found in only one of the files marked with - or +.

$ diff -c file1 file2
*** file1       2017-04-17 11:16:31.687059543 -0400
--- file2       2017-04-17 12:02:44.194623979 -0400
***************
*** 1,8 ****
! 0 top of file one
  1 one
! 2
  3 three
  4
! 5 five bananas
  6
- 7 the end
--- 1,7 ----
! 0 top of file 2
  1 one
! 2 two tomatoes
  3 three
  4
! 5
  6

Fourth: comparing binary files with diff

You can also use the diff command to compare binary files, but it will only tell you whether they differ. By default, it prints nothing at all when the files are the same; add the -s option if you want it to report identical files as well.

$ diff /usr/bin/diff /usr/bin/cmp
Binary files /usr/bin/diff and /usr/bin/cmp differ
$ diff /usr/bin/diff mydiff
$ diff -s /usr/bin/diff mydiff
Files /usr/bin/diff and mydiff are identical
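You can try this without touching any system binaries. The sketch below builds a few tiny made-up files that contain NUL bytes, which is enough for GNU diff to treat them as binary, and then compares them with and without -s.

```shell
workdir=$(mktemp -d)
# Two small files containing NUL bytes, which diff treats as binary.
printf 'AB\0\001CD' > "$workdir/one.bin"
printf 'AB\0\002CD' > "$workdir/two.bin"
cp "$workdir/one.bin" "$workdir/copy.bin"

# Different files: diff reports that they differ and exits with status 1.
diff "$workdir/one.bin" "$workdir/two.bin"; differ_status=$?

# Identical files: -s makes diff say so instead of staying silent.
identical_msg=$(diff -s "$workdir/one.bin" "$workdir/copy.bin")
echo "$identical_msg"
```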

Fifth: cmp

The cmp command tells you if two files are different and where the first difference appears.

Here's an example comparing text files:

$ cmp file1 file2
file1 file2 differ: byte 15, line 1

Here's an example comparing binary files:

$ cmp /usr/bin/diff /usr/bin/cmp
/usr/bin/diff /usr/bin/cmp differ: byte 25, line 1

To see why the second command above reports byte 25, you can use the od command to view the top of each of these files. What you see below is the start of the ELF header found at the top of these executables. The first 24 bytes are identical in the two files. The 25th byte, where a 64-bit ELF header stores the program's entry point address, is 024 in one file and 060 in the other.

$ od -bc /usr/bin/diff | head -4
0000000 177 105 114 106 002 001 001 000 000 000 000 000 000 000 000 000
        177   E   L   F 002 001 001  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000020 002 000 076 000 001 000 000 000  024 076 100 000 000 000 000 000
        002  \0   >  \0 001  \0  \0  \0 024   >   @  \0  \0  \0  \0  \0
$ od -bc /usr/bin/cmp | head -4
0000000 177 105 114 106 002 001 001 000 000 000 000 000 000 000 000 000
        177   E   L   F 002 001 001  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000020 002 000 076 000 001 000 000 000 060 047 100 000 000 000 000 000
        002  \0   >  \0 001  \0  \0  \0   0   '   @  \0  \0  \0  \0  \0
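If you'd rather have cmp point out every differing byte instead of only the first, its -l option lists them all, and -s keeps it silent so a script can rely on the exit status alone. A small sketch with made-up files:

```shell
workdir=$(mktemp -d)
printf 'abcdef' > "$workdir/x"
printf 'abXdeY' > "$workdir/y"

# -s: silent, so only the exit status is reported (1 means "different").
cmp -s "$workdir/x" "$workdir/y"; status=$?

# -l: one line per differing byte, giving the byte number (decimal)
# and the two byte values (octal).
cmp -l "$workdir/x" "$workdir/y"
diff_count=$(cmp -l "$workdir/x" "$workdir/y" | wc -l | tr -d ' ')
echo "status=$status differing bytes=$diff_count"
```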

Sixth: comm

The comm command displays the differences between text files in a different format. In the example below, you can probably see that we're looking at three separate columns of output. The first column shows lines found only in the first file and the second shows lines found only in the second file. The third shows the lines that appear in both files. Note that comm expects both files to be sorted.

$ comm file1 file2
        0 top of file 2
0 top of file one
                1 one
2
        2 two tomatoes
                3 three
                4
        5
5 five bananas
                6
7 the end

As you can see in the second example below, comparing a file to itself shows all of the output in column 3.

$ comm file1 file1
                0 top of file one
                1 one
                2
                3 three
                4
                5 five bananas
                6
                7 the end
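You can also suppress columns you don't care about with the -1, -2 and -3 options. The sketch below uses two short hypothetical lists and sorts them first, since comm expects sorted input.

```shell
workdir=$(mktemp -d)
# comm expects sorted input, so sort each hypothetical list first.
printf '%s\n' pear apple cherry | sort > "$workdir/list1"
printf '%s\n' cherry banana apple | sort > "$workdir/list2"

# Each digit after the dash hides that column of comm's output.
only_first=$(comm -23 "$workdir/list1" "$workdir/list2")   # column 1 only
only_second=$(comm -13 "$workdir/list1" "$workdir/list2")  # column 2 only
in_both=$(comm -12 "$workdir/list1" "$workdir/list2")      # column 3 only

echo "only in list1: $only_first"
echo "only in list2: $only_second"
echo "in both:"
echo "$in_both"
```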

Seventh: cksum

Checksums can also tell you if files are different. While this might not be advantageous when the files are on the same system, it can help a lot when they're on different systems. By running the cksum command on each of the two systems, you can determine whether the two copies are the same without having to move either of the files to the other system or share a file system or directory. Note that the cksum command is often used to verify the integrity of system files.

$ cksum /usr/bin/diff /usr/bin/cmp
3928148852 193512 /usr/bin/diff
4012608687 44096 /usr/bin/cmp

You can use an ssh command to get the checksum for the file on a remote system to see if they are the same or different.

$ cksum /usr/bin/diff
3928148852 193512 /usr/bin/diff
$ ssh remhost -l jdoe "/usr/bin/cksum /usr/bin/diff"
3928148852 193512 /usr/bin/diff
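To let a script do the comparison rather than eyeballing the numbers, you can capture each checksum in a variable. This is only a sketch: the file names are made up, and a local copy stands in for the remote file so that the example runs on a single system.

```shell
workdir=$(mktemp -d)
echo 'some content' > "$workdir/original"
cp "$workdir/original" "$workdir/copy"

# Reading from standard input keeps the file name out of cksum's output,
# so results for differently named files can be compared directly.
local_sum=$(cksum < "$workdir/original")
other_sum=$(cksum < "$workdir/copy")
# For a remote file, other_sum could instead come from something like:
#   other_sum=$(ssh remhost -l jdoe 'cksum < /path/to/file')

if [ "$local_sum" = "$other_sum" ]; then
    verdict=match
else
    verdict=mismatch
fi
echo "$verdict"
```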

Eighth: comparing text files across systems

You can also compare files on two systems without having to copy one of the files between systems or compare checksums by using a command like this one:

$ ssh remhost -l jdoe cat /home/jdoe/file2 | diff - file2
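The key to the command above is the lone hyphen, which tells diff to read one of its two inputs from standard input. The sketch below shows the same idea on a single system, with cat standing in for the ssh command and with made-up file names and contents.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'line one' 'line two' > "$workdir/file2"
printf '%s\n' 'line one' 'line 2' > "$workdir/remote_copy"

# "cat" stands in here for the ssh command that would stream the remote
# file; diff reads that stream because its first operand is "-".
cat "$workdir/remote_copy" | diff - "$workdir/file2" > "$workdir/report"
status=$?

cat "$workdir/report"
echo "diff exit status: $status"
```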

Ninth: diff3

The diff3 command works a lot like diff, but allows you to compare three files instead of only two. However, this command doesn't have all the options that diff has and, no, there's no diff4, diff5, etc. Comparing files two at a time with your favorite comparison tool is probably a better strategy most of the time.

$ diff3 file1 file2 file3
====
1:1c
  0 top of file one
2:1c
  0 top of file 2
3:0a
====
1:3c
  2
2:3c
  2 two tomatoes
3:2c
  2 two
====
1:5,8c
  4
  5 five bananas
  6
  7 the end
2:5,7c
  4
  5
  6
3:4,7c
  4 four
  5 five
  6 six
  7 seven
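If you do go the two-at-a-time route, a short loop can walk through every pair for you. This is only a sketch with hypothetical files; it uses diff -q, but any of the comparison commands covered here would do.

```shell
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/f1"
printf 'a\nb\n' > "$workdir/f2"
printf 'a\nc\n' > "$workdir/f3"

differing_pairs=0
# Walk through every pair once: (f1,f2), (f1,f3), (f2,f3).
set -- "$workdir/f1" "$workdir/f2" "$workdir/f3"
while [ "$#" -gt 1 ]; do
    first=$1
    shift
    for other in "$@"; do
        if ! diff -q "$first" "$other" >/dev/null; then
            echo "$first and $other differ"
            differing_pairs=$((differing_pairs + 1))
        fi
    done
done
echo "differing pairs: $differing_pairs"
```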

Wrap-Up

There are a lot of choices at your disposal when you want to compare files on Unix systems. Hopefully you'll find several that you really like using.

This article is published as part of the IDG Contributor Network.
