Intersection, Difference, and Union of File Lines

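Statistics work often calls for set operations on the lines of text files. For example, suppose a.txt lists every account that logged in today and b.txt lists the new accounts that logged in today; the old accounts that logged in today are then the difference (a - b).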

Intersection

Using uniq
```
cat a.txt b.txt | sort | uniq -d
```
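-d: only print duplicate lines, one for each group. This gives the intersection only when neither file contains repeated lines of its own; a line that appears twice in a.txt alone would also be reported as a duplicate.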
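
If the input files may contain repeated lines, one possible workaround is to deduplicate each file before combining them. The sketch below assumes a POSIX shell with `mktemp` available; the function name `intersect_uniq` and the sample data are made up for illustration.

```shell
# Intersection of two files that may each contain repeated lines.
# Each file is deduplicated first, so a line can only show up twice
# in the combined stream if it occurs in both files.
intersect_uniq() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

tmp=$(mktemp -d)
printf 'alice\ncarol\ncarol\nbob\n' > "$tmp/a.txt"   # carol is repeated in a.txt only
printf 'bob\ndave\n' > "$tmp/b.txt"

intersect_uniq "$tmp/a.txt" "$tmp/b.txt"   # prints: bob
rm -rf "$tmp"
```

With the plain `cat a.txt b.txt | sort | uniq -d` pipeline, the same sample data would print both bob and carol.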

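Using comm
```
comm -12 a.txt b.txt
```
The options have the following meanings (note that comm expects both files to be sorted already):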

comm - compare two sorted files line by line
-1     suppress column 1 (lines unique to FILE1)
-2     suppress column 2 (lines unique to FILE2)
-3     suppress column 3 (lines that appear in both files)
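
Since comm only works on sorted input, unsorted files have to be sorted first. The following is a minimal sketch, assuming a POSIX shell with `mktemp`; the function name `intersect_comm` and the temporary file names are hypothetical. It forces the C locale so that sort and comm agree on the collation order.

```shell
# Intersection with comm for files that are not sorted yet.
# Sorted, deduplicated copies are written to a scratch directory,
# then comm -12 prints only the lines common to both copies.
intersect_comm() {
    work=$(mktemp -d)
    LC_ALL=C sort -u "$1" > "$work/a.sorted"
    LC_ALL=C sort -u "$2" > "$work/b.sorted"
    LC_ALL=C comm -12 "$work/a.sorted" "$work/b.sorted"
    rm -rf "$work"
}
```

Calling `intersect_comm a.txt b.txt` leaves the original files untouched, because only the copies are sorted.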

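Using grep
```
grep -F -x -f a.txt b.txt | sort | uniq
```
-x (--line-regexp) restricts matches to whole lines; without it, -F matches substrings, so the pattern tom would also select the line tommy. The other options have the following meanings: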

-F, --fixed-strings, --fixed-regexp
              Interpret  PATTERN  as a list of fixed strings, separated by newlines, any of which
              is to be matched.  (-F is specified by POSIX, --fixed-regexp is an obsoleted alias,
              please do not use it in new scripts.)
-f FILE, --file=FILE
              Obtain patterns from FILE, one per line.  The empty file  contains  zero  patterns,
              and therefore matches nothing.  (-f is specified by POSIX.)
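
One thing to watch for with grep is that -F on its own matches substrings rather than whole lines. The sketch below contrasts the two behaviours on made-up sample data; the function name `intersect_grep` is hypothetical, and `mktemp` is assumed to be available.

```shell
# Intersection with grep: lines of the second file that are identical
# to some line of the first file (-x demands a whole-line match).
intersect_grep() {
    grep -F -x -f "$1" "$2" | sort -u
}

tmp=$(mktemp -d)
printf 'tom\n' > "$tmp/a.txt"
printf 'tommy\ntom\n' > "$tmp/b.txt"

grep -F -f "$tmp/a.txt" "$tmp/b.txt"       # substring match: prints tommy and tom
intersect_grep "$tmp/a.txt" "$tmp/b.txt"   # whole-line match: prints tom
rm -rf "$tmp"
```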

Difference

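Using uniq to print only the lines that occur exactly once (a.txt - b.txt). Because b.txt is fed in twice, every line of b.txt is guaranteed to be repeated, and so is every line in the intersection of the two files; the lines that remain unique are those found only in a.txt. This assumes a.txt itself contains no repeated lines, since such a line would be dropped as well.
```
cat a.txt b.txt b.txt | sort | uniq -u
```
The option has the following meaning: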
```
-u, --unique
              only print unique lines
```
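
When a.txt may contain repeated lines, the same trick can still be used if each file is deduplicated first. This is a sketch under that assumption; the function name `difference_uniq` and the sample data are made up for illustration, and `mktemp` is assumed to be available.

```shell
# a - b with uniq -u, tolerant of repeated lines in either file.
# After deduplication a line occurs once if it is only in the first file,
# twice if it is only in the second, and three times if it is in both.
difference_uniq() {
    { sort -u "$1"; sort -u "$2"; sort -u "$2"; } | sort | uniq -u
}

tmp=$(mktemp -d)
printf 'alice\ncarol\ncarol\nbob\n' > "$tmp/a.txt"   # carol is repeated in a.txt
printf 'bob\n' > "$tmp/b.txt"

difference_uniq "$tmp/a.txt" "$tmp/b.txt"   # prints alice and carol
rm -rf "$tmp"
```
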
Using comm
```
comm -23 a.txt b.txt
```
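This suppresses the lines that appear only in the second file as well as the lines common to both files, so only the lines unique to the first file, a.txt, are printed. Both files must already be sorted.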

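Using grep
```
grep -F -x -v -f b.txt a.txt | sort | uniq
```
The patterns are read from b.txt and matched against whole lines (-x) of a.txt, which yields a.txt - b.txt. The inverting option has the following meaning: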

-v, --invert-match
              Invert the sense of matching, to select non-matching lines.  (-v  is  specified  by
              POSIX.)
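
With grep, the argument order decides the direction of the difference: the file passed to -f supplies the lines to exclude, and the last argument is the file being filtered. A minimal sketch of a - b along those lines follows; the function name `difference_grep` is hypothetical.

```shell
# a - b with grep: keep the lines of the first file ($1) that are not
# identical to any line of the second file ($2).
difference_grep() {
    grep -F -x -v -f "$2" "$1" | sort -u
}
```

Swapping the two arguments, as in `difference_grep b.txt a.txt`, gives b - a instead.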

Union

Print the lines of both files and remove the duplicates.
```
cat a.txt b.txt | sort | uniq
```
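
The same result can probably be had more compactly, since sort accepts several input files and its -u option drops repeated lines. A small sketch, with the hypothetical function name `union_lines`:

```shell
# Union of two files: sort reads both inputs and -u keeps one copy
# of each distinct line.
union_lines() {
    sort -u "$1" "$2"
}
```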
