Finding intersection of two lists in shell
I got thrown up with some more tasks to perform on shell.
First was to extract numbers from long list of text. For ex, you have text as :
[
NumberLong(90),
NumberLong(123),
NumberLong(218),
NumberLong(221),
NumberLong(294),
NumberLong(317),
NumberLong(319),
NumberLong(322),
NumberLong(328),
NumberLong(344),
NumberLong(348)
]
To simple extract numbers from this file, we’ll use pcregrep command to capture the regex group and output it.
#!/bin/bash
cat input.txt | pcregrep -o1 -i '([0-9]+)' > output.txt
Second was to compare such two lists and result the numbers which are present in first file but not in second and vice-versa
This is where comm kicks in. It keeps as input two lexically sorted inputs and output the result such as :
The comm utility reads file1 and file2, which should be sorted lexically, and produces three text columns as output: lines only in file1; lines only in file2; and lines in both files.
#!/bin/bash
comm -23 <(sort first_file.txt ) <(sort second_file.txt) | sort -n
Note that comm command expects input to be lexically sorted and not numerically sorted, hence its necesary to sort the file even if its numerically sorted.