Finding repeated lines with `uniq -d`

My data is in /tmp/1.
I run the following and get nothing:
cat /tmp/1 | uniq -d
This is strange, since according to the man page uniq -d should do this:
-d Only output lines that are repeated in the input.
How do you use uniq -d?

You have to sort your data before you use uniq. It only removes/detects duplicates on adjacent lines.

To double-check, try this; it prints every line that is duplicated:
cat /tmp/1 | awk 'seen[$0]++ == 1'
Oh, this is your problem:
cat /tmp/1 | sort | uniq -d
Sort it before running uniq!

awk '{_[$0]++}END{for(i in _)if(_[i]>1) print i}' /tmp/1
or just
awk '_[$0]++ == 1' file
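
A minimal sketch of why sorting matters. The contents of /tmp/1 are not shown in the question, so the four sample lines below are an assumption made up for illustration:

```shell
# Hypothetical sample data: "a" appears twice, but not on adjacent lines.
tmp=$(mktemp)
printf '%s\n' a b a c > "$tmp"

# uniq only compares neighbouring lines, so it reports nothing here.
unsorted=$(uniq -d "$tmp")

# Sorting first puts the two "a" lines next to each other.
sorted=$(sort "$tmp" | uniq -d)

# awk keeps a counter per distinct line, so it needs no sort.
awk_dups=$(awk 'seen[$0]++ == 1' "$tmp")

printf 'uniq -d alone:  [%s]\n' "$unsorted"
printf 'sort | uniq -d: [%s]\n' "$sorted"
printf 'awk:            [%s]\n' "$awk_dups"
rm -f "$tmp"
```

The awk variant also preserves the original line order, which may matter if the sorted order is not wanted.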


Grep with regex expression

I need the content between the fourth and fifth "|" on all lines starting with FHEAD. The goal is to apply the regular expression in grep to read files.
I have this expression that returns all content between "|"
The goal in the example below would be to return
FHEAD|3|8401|230008|8401-|8401-Dcto|8401-Dcto 10FHEAD|1|235211|20190206000001|20190402235959|2||1||8||
Can someone help me?
Thanks in advance
To print the content of the fifth field (non-empty) on lines starting with FHEAD:
awk -F'|' '$1=="FHEAD" && $5!=""{print $5}' file
awk -F '|' '$5=="1047" || $5=="8401-"{ print $0 }' inputfile.txt
The above finds lines with "1047" or "8401-" in the fifth column of the input file "inputfile.txt".
grep -E "\|1047\||\|8401-\|" inputfile.txt
The above does the same with grep (but it is not restricted to column 5).
I must have missed the 'starting with FHEAD'....
awk -F\| '/^FHEAD/{ print $5 }' inputfile.txt
or with grep
grep -e '^FHEAD|\(.[^|]*|\)\{3\}\(.[^|]*\)' -o inputfile.txt | grep '.[^|]*|*' -o | grep -v '|$'
a combination of grep and cut:
grep -e '^FHEAD' inputfile.txt | cut -d'|' -f 5
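
A small sketch comparing the awk and grep/cut approaches. The input lines are hypothetical, loosely modelled on the sample in the question; one FHEAD line deliberately has an empty fifth field:

```shell
# Hypothetical input: three FHEAD lines (one with an empty fifth field) and one other line.
tmp=$(mktemp)
printf '%s\n' \
  'FHEAD|3|8401|230008|8401-|8401-Dcto' \
  'FHEAD|1|235211|20190206000001||2' \
  'FTAIL|9|9|9|ignored|9' \
  'FHEAD|2|1047|230009|1047|x' > "$tmp"

# awk: match the first field exactly and skip empty fifth fields.
non_empty=$(awk -F'|' '$1 == "FHEAD" && $5 != "" { print $5 }' "$tmp")

# grep + cut: same line selection, but an empty fifth field comes through as a blank line.
with_blanks=$(grep '^FHEAD|' "$tmp" | cut -d'|' -f5)

printf '%s\n' "$non_empty"
rm -f "$tmp"
```

Which behaviour is preferable depends on whether empty fields should be reported; the question does not say.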

grep command to find out how many times any character is followed by '.'

I have to find out, with the help of grep, how often each character is followed by a period (.). After counting how many times each character is followed by a period, I have to sort the result in ascending order.
For example in this string: "Find my input. Output should be obtained. You need to find output."
The output should be something like this:
d 1
t 2
What I have done so far :
cat filename | grep -o "*." | sort -u
But it is not working as intended.
Any ideas how to solve this? I have to perform this operation on huge library of books in .txt files.
An iterative approach with GNU grep:
grep -o '.\.' filename | sort | uniq -c
1 d.
2 t.
grep -Po '.(?=\.)' filename | sort | uniq -c
1 d
2 t
grep -Po '.(?=\.)' filename | sort | uniq -c | awk '{print $2,$1}'
d 1
t 2
With single GNU awk process:
awk -v FPAT='.[.]' 'BEGIN{ PROCINFO["sorted_in"]="@ind_str_asc" }
{ for(i=1;i<=NF;i++) a[substr($i,1,1)]++ }
END{ for(i in a) print i,a[i] }' filename
The output:
d 1
t 2
This one is ok too
echo "Find my input. Output should be obtained. You need to find output."| grep -o ".\." | sort | uniq -c | rev | tr -d .
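
A sketch that avoids the GNU-only -P option and orders the result by count, assuming "ascending order" in the question refers to the counts. It relies on grep -o, which GNU and BSD grep both provide but POSIX does not require:

```shell
text='Find my input. Output should be obtained. You need to find output.'

counts=$(printf '%s\n' "$text" |
  grep -o '.\.' |       # every character together with the period after it
  sed 's/\.$//' |       # drop the period, keep the character
  sort | uniq -c |      # count occurrences of each character
  sort -n |             # ascending by count
  awk '{ print $2, $1 }')

printf '%s\n' "$counts"
```

The final awk step splits on whitespace, so a space or tab that precedes a period would not be printed correctly; real book text may need extra handling for that case.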

Extract filenames that matches the pattern and remove duplicates and store in an array

I would like to know the easiest way to list a part of filenames without any duplication present in a directory.
A directory has files like this:
Now I want the result to be:
That is, extract the string before the first "_" in each name and remove any duplicates of that string.
ls -1 | awk '{split($0,a,"_"); print a[1]}' | sort -b | uniq
Only files, with find:
find . -maxdepth 1 -type f -printf "%f\n" | awk '{split($0,a,"_"); print a[1]}' | sort -b | uniq
Using sed
ls -1 | sed -r 's/([a-zA-Z0-9])_.*/\1/' | uniq
You can also try this:
ls -1 | cut -d "_" -f1 | uniq
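
A sketch that avoids parsing ls output by using a glob and parameter expansion instead. The file names are hypothetical, since the question's directory listing is not shown:

```shell
# Hypothetical directory contents; the question's real file names are not shown.
dir=$(mktemp -d)
touch "$dir/abc_2019_01.txt" "$dir/abc_2019_02.txt" "$dir/xyz_2019_01.txt"

prefixes=$(
  for path in "$dir"/*; do
    [ -f "$path" ] || continue      # regular files only
    name=${path##*/}                # strip the directory part
    printf '%s\n' "${name%%_*}"     # keep the text before the first "_"
  done | sort -u                    # sort and drop duplicates in one step
)

printf '%s\n' "$prefixes"
rm -rf "$dir"
```

This collects the names in a newline-separated variable rather than an array, so it runs in any POSIX shell. In bash 4 or later, the lines could then be loaded into an array with mapfile -t.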

Basic grep/sed/awk script to find duplicates

I'm starting out with regular expressions and grep and I want to find out how to do this. I have this list:
1. 12493 6530
2. 12475 5462
3. 12441 5450
4. 12413 5258
5. 12478 4454
6. 12416 3859
7. 12480 3761
8. 12390 3746
9. 12487 3741
10. 12476 3557
And I want to get the contents of the middle column only (so NF==2 in awk?). The delimiter here is a space.
I then want to find which numbers are there more than once (duplicates). How would I go about doing that? Thank you, I'm a beginner.
Using awk :
awk '{count[$2]++}END{for (a in count) {if (count[a] > 1 ) {print a}}}' file
But you don't have duplicate numbers in the 2nd column.
The second column in awk is $2.
count[$2]++ increments an array element keyed by that number.
The END block is executed at the end; there we test each array value to find those greater than 1.
More concisely (credit to jthill):
awk '++count[$2]==2{print $2}' file
Using perl:
perl -anE '$h{$F[1]}++; END{ say for grep $h{$_} > 1, keys %h }'
Iterate the lines and build a hash (%h/$h{...}) with the count (++) of the second column values ($F[1]), and after that (END{ ... }) say all hash keys with count ($h{$_}) which is > 1.
With the data stored in test,
Using a combination of awk, sort, uniq and sed commands:
cat test | awk -v x=2 '{print $x}' | sort | uniq -c | sed '/^ *1 /d' | awk -v x=2 '{print $x}'
awk -v x=2 '{print $x}'
selects the 2nd column
sort
puts equal numbers on adjacent lines, as uniq requires
uniq -c
counts the occurrences of each number
sed '/^ *1 /d'
deletes all the entries with only one occurrence (uniq -c may pad the count with leading spaces, hence the " *")
awk -v x=2 '{print $x}'
removes the count with awk again, leaving only the number
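
Since the list in the question contains no duplicates, here is a sketch on hypothetical data in which 12475 is repeated in the middle column:

```shell
# Hypothetical list in which 12475 occurs twice in the middle column.
tmp=$(mktemp)
printf '%s\n' \
  '1. 12493 6530' \
  '2. 12475 5462' \
  '3. 12441 5450' \
  '4. 12475 5258' > "$tmp"

# awk alone: print a value the second time it is seen.
awk_dups=$(awk '++count[$2] == 2 { print $2 }' "$tmp")

# Pipeline: extract the column, sort it, and let uniq -d report repeats.
pipe_dups=$(awk '{ print $2 }' "$tmp" | sort | uniq -d)

printf '%s\n' "$awk_dups" "$pipe_dups"
rm -f "$tmp"
```

Both variants print each duplicated value once, however many times it occurs.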

using sed to get only line number of "grep -in"

Which regexp should I use to only get line number from grep -in output?
The usual output is something like this:
I need to get only "241113" from sed's output.
I suggest cut
grep -in keyword ... | cut -d: -f1
If you insist on sed:
grep -in keyword ... | sed 's/:.*$//'
You don't need to use sed. Cut is enough. Just pipe grep's output to
cut -d ':' -f 1
As an example:
grep -n blabla file.txt | cut -d ':' -f 1
Personally, I like awk
grep -in 'search' file | awk --field-separator : '{print $1}'
As said in other answers, cut is the right tool; but if you really want to use a swiss-army knife, you can also use awk:
grep -in keyword ... | awk -F: '{print $1}'
or using grep again:
grep -in keyword ... | grep -oE '^[0-9]+'
Just in case someone is wondering if all this could be done without grep, i.e. with sed alone ...
echo '
' |
sed -n '/[Kk][Ee][Yy][Ww][Oo][Rr][Dd]/{=;}'
#sed -n '/[Kk][Ee][Yy][Ww][Oo][Rr][Dd]/{=;q;}' # only line number of first match
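
A sketch on a hypothetical four-line file, comparing grep piped to cut with an awk-only variant that needs no grep. The file contents and the word "keyword" are placeholders:

```shell
# Hypothetical file; "keyword" appears on lines 2 and 4 in different cases.
tmp=$(mktemp)
printf '%s\n' 'first line' 'a Keyword here' 'nothing' 'KEYWORD again' > "$tmp"

# grep prints "N:line"; cut keeps everything before the first colon.
with_cut=$(grep -in keyword "$tmp" | cut -d: -f1)

# awk alone: lower-case the line for a case-insensitive match; NR is the line number.
with_awk=$(awk 'tolower($0) ~ /keyword/ { print NR }' "$tmp")

printf '%s\n' "$with_cut"
rm -f "$tmp"
```

With several input files grep prefixes each line with the file name, so the line number would then be the second colon-separated field, not the first.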