match the first two bits (only) in a digital stream line (byte) using grep - regex

How should I match the first two bits (first occurrence only) in a digital stream line (1-byte line) using grep, in one direction only, i.e. match 01 but not 10 in 01051015?
I have tried:
grep -E '^[0-9]\{2\}$' | grep -Po --color '01' <<< 01051015
> 010-10-- (current output)
$ cat -n test.txt
1 0001021113
2 0202031011
3 0103031113
4 ..........
$ grep -oE '^[0-1][0-9]\{2,2\}' | grep -E '(10)' ./test.txt > matchedList.txt
$ cat -n matchedList.txt
1 0001021113
2 0202031011
3 0103031113
But I need to parse and match the first pair occurrence (in this case '10') in that specific order and in one direction only (as in line 2), so the correct output should be: 2 0202031011
Thanks in advance
L.

grep -m 1 -e '^01' YourFile
where:
01 is the first two bits to find, anchored to the start of the line by ^
-m 1 stops after the first matching line
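If the goal is instead to find a pair such as 10 only when it sits on a two-digit boundary (as in line 2 of the question's test.txt), one possible approach is to consume whole pairs from the start of the line before the match. This is a minimal sketch under that reading of the question; the temporary file and the variable names are illustrative, not part of the original.

```shell
# Sample data from the question, written to a temporary file (illustrative path).
tmpfile=$(mktemp)
printf '%s\n' 0001021113 0202031011 0103031113 > "$tmpfile"

# ^([0-9]{2})* consumes whole two-digit pairs from the start of the line,
# so the following 10 can only match at an even offset.
aligned=$(grep -nE '^([0-9]{2})*10' "$tmpfile")
echo "$aligned"    # 2:0202031011

rm -f "$tmpfile"
```

Lines 1 and 3 do contain the characters 10, but only straddling two pairs, so they are not reported.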

Related

Shell - Refactoring a string regex that join numbers

I am trying to refactor my script to make it readable while keeping it usable on a single line.
My script does the following:
applies a regex to a string (GXXRXXCXX) to collect all matched numbers into an array
converts every string in the array to a number (0X -> X)
joins all the numbers with a '.' delimiter
finally, adds a 'v' at the start of the string
The part I am struggling with the most is turning the number array (3 2 1) into a joined string (3.2.1) without using any temporary variable.
code :
GOROCO=G03R02C01
version=v$(tmp=( $(grep -Eo '[[:digit:]]+' <<< $GOROCO | bc) ); echo "${tmp[@]}" | sed 's/ /./g')
process :
G03R02C01
03 02 01
3 2 1
3.2.1
v3.2.1
Using a single sed you can do this:
GOROCO='G03R02C01'
version=$(sed -E 's/[^0-9]+0*/./g; s/^\./v/' <<< "$GOROCO")
# version=v3.2.1
Details:
-E: Enables extended regex mode in sed
s/[^0-9]+0*/./g: Replace each run of non-digits, together with any zeros that follow it, with a single dot
s/^\./v/: Replace the leading dot with the letter v
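To see what each substitution contributes, the two expressions can be run separately. This is only a sketch; the variable names step1 and zero_case and the input G10R00C01 are made up for illustration. It also shows one caveat: because 0* removes every zero after the letter, a component made only of zeros disappears.

```shell
GOROCO='G03R02C01'

# Step 1: each run of non-digits plus any zeros right after it becomes a dot.
step1=$(printf '%s\n' "$GOROCO" | sed -E 's/[^0-9]+0*/./g')
echo "$step1"      # .3.2.1

# Step 2: the leading dot becomes the v prefix.
version=$(printf '%s\n' "$step1" | sed -E 's/^\./v/')
echo "$version"    # v3.2.1

# Caveat: a component consisting only of zeros is swallowed entirely.
zero_case=$(printf '%s\n' 'G10R00C01' | sed -E 's/[^0-9]+0*/./g; s/^\./v/')
echo "$zero_case"  # v10..1
```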
As an academic exercise, here is a pure bash equivalent:
shopt -s extglob
version="${GOROCO//+([!0-9])*(0)/.}"
version="v${version#.}"
You're looking for paste
$ grep -Eo '[[:digit:]]+' <<< $GOROCO | bc | paste -s -d"."
3.2.1

grep sequence of chars and numbers (grep only)

I have a problem that I need to solve with the grep command.
I have 52 cards with 13 ranks and 4 colors.
The ranks are A,2,3,4,5,6,7,8,9,T (for ten),J,Q and K (king). The four colors are c,d,h,s.
I have a file (cards.txt) with all possible combinations of 13 cards.
Example: 8cKc6s4dKd8sQc4c2s6h9dTc4h
The task is to output all combinations that contain 4 cards of the same rank.
β = { h ∈ H | h contains 4 cards of the same ranks}.
examples:
Kd9dJs5sKs7c5c6cKcJhKhTh7h ∈ β
AdTdTc2d2cTsKh6c3c6s6dKc4h ∉ β
(The problem is that I only know how to use grep for a sequence of characters when they are next to each other. Please help.)
I guess you want back-references:
grep '\([A23456789TJQK]\).*\1.*\1.*\1' cards.txt
If grep matches a character from A23456789TJQK, then, because of the parentheses \(...\), grep will refer to it as \1 (this is the back-reference).
The pattern can also be written as follows:
grep '\([A23456789TJQK]\)\(.*\1\)\{3\}' cards.txt
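As a quick check, the pattern can be run against the two example hands from the question. This is a small sketch; the variables hands and four_of_a_kind are illustrative names.

```shell
hands='Kd9dJs5sKs7c5c6cKcJhKhTh7h
AdTdTc2d2cTsKh6c3c6s6dKc4h'

# \1 must repeat the captured rank three more times, with anything in between.
four_of_a_kind=$(printf '%s\n' "$hands" | grep '\([A23456789TJQK]\)\(.*\1\)\{3\}')
echo "$four_of_a_kind"    # Kd9dJs5sKs7c5c6cKcJhKhTh7h
```

Only the first hand is printed: it holds four kings, while the second has at most three cards of any rank.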
while read -r line
do
# Keep only the rank of each card, one per line, then count each rank
if echo "$line" | sed 's/\(.\)./\1\n/g' | sort | uniq -c | grep -q '^ *4 '
then
echo "$line"
fi
done < cards.txt

Regex: Matching digits that start with 4-9?

Below is my current command with its output. The problem is the line starting with a 2. How can I change it to match only numbers starting with 4-9?
grep -o -P '(?:(?<!\d)\d{8}(?!\d))' * | sort -u
20100101
71160868
71161452
The grep source is an email, so it's too messy to post here.
You can use:
grep -oP '(?:(?<!\d)[4-9]\d{7}(?!\d))' * | sort -u
[4-9] matches only if the first digit is between 4 and 9; \d{7} then matches the 7 digits that follow.
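A small run on made-up input illustrates how the lookarounds and [4-9] interact. The sample string below is invented for this sketch, and -P assumes a GNU grep built with PCRE support.

```shell
# Invented sample: one number starting with 2, two valid ids, one 9-digit number.
sample='date 20100101 id 71160868 long 471161452 id 71161452'

# The lookarounds reject digits on either side, so only standalone
# 8-digit numbers whose first digit is 4-9 are printed.
ids=$(printf '%s\n' "$sample" | grep -oP '(?<!\d)[4-9]\d{7}(?!\d)' | sort -u)
echo "$ids"
```

Here 20100101 is skipped because of its first digit, and no part of the 9-digit number qualifies because every 8-digit window inside it touches another digit.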

How to use grep to extract multiple groups

Say I have this file data.txt:
a=0,b=3,c=5
a=2,b=0,c=4
a=3,b=6,c=7
I want to use grep to extract 2 columns corresponding to the values of a and c:
0 5
2 4
3 7
I know how to extract each column separately:
grep -oP 'a=\K([0-9]+)' data.txt
0
2
3
And:
grep -oP 'c=\K([0-9]+)' data.txt
5
4
7
But I can't figure how to extract the two groups. I tried the following, which didn't work:
grep -oP 'a=\K([0-9]+),.+c=\K([0-9]+)' data.txt
5
4
7
I am also curious about grep being able to do so. \K "removes" the previous content that is stored, so you cannot use it twice in the same expression: it will just show the last group. Hence, it should be done differently.
In the meantime, I would use sed:
sed -r 's/^a=([0-9]+).*c=([0-9]+)$/\1 \2/' file
It captures the digits after a= and c=, on lines that start with a= and have nothing else after the c= digits.
For your input, it returns:
0 5
2 4
3 7
You could try the grep command below. Note that grep prints each match on a separate line, so you won't get the format you mentioned in the question.
$ grep -oP 'a=\K([0-9]+)|c=\K([0-9]+)' file
0
5
2
4
3
7
To get the format you mentioned, you need to pipe the output of grep to paste or a similar command.
$ grep -oP 'a=\K([0-9]+)|c=\K([0-9]+)' file | paste -d' ' - -
0 5
2 4
3 7
Use this:
awk -F'[=,]' '{print $2" "$6}' data.txt
I am using = and , as the field separators, then splitting on them.
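To see why $2 and $6 are the right fields, it can help to print every field with its index for one sample line. This is just an illustrative sketch; the variable fields is a made-up name.

```shell
# Print each field as index:value, separated by single spaces.
fields=$(printf 'a=0,b=3,c=5\n' |
  awk -F'[=,]' '{ for (i = 1; i <= NF; i++) printf "%s%d:%s", (i > 1 ? " " : ""), i, $i; print "" }')
echo "$fields"    # 1:a 2:0 3:b 4:3 5:c 6:5
```

The values of a and c land in fields 2 and 6. This relies on the keys always appearing in the same order.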

Counting regex pattern matches in one line using sed or grep?

I want to count the number of matches there are on one single line (or on all lines, as there will always be only one line).
I want to count not just one match per line as in
echo "123 123 123" | grep -c -E "123" # Result: 1
Better example:
echo "1 1 2 2 2 5" | grep -c -E '([^ ])( \1){1}' # Result: 1, expected: 2 or 3
You could use grep -o then pipe through wc -l:
$ echo "123 123 123" | grep -o 123 | wc -l
3
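The same idea can be tried on the back-reference example from the question. This is a sketch that assumes GNU grep, since back-references in -E patterns are an extension; the variable count is illustrative.

```shell
# -o prints each non-overlapping match on its own line; wc -l counts them.
count=$(echo "1 1 2 2 2 5" | grep -oE '([^ ])( \1){1}' | wc -l)
echo "$count"    # 2 (the matches are "1 1" and "2 2")
```

Matches do not overlap, so the third 2 is not paired again and the result is 2 rather than 3.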
Maybe below:
echo "123 123 123" | sed "s/123 /123\n/g" | wc -l
( maybe ugly, but my bash fu is not that great )
Maybe you should convert spaces to newlines first:
$ echo "1 1 2 2 2 5" | tr ' ' $'\n' | grep -c 2
3
Why not use awk?
You could use awk '{print gsub(your_regex,"&")}'
to print the number of matches on each line, or
awk '{c+=gsub(your_regex,"&")}END{print c}'
to print the total number of matches. Note that relative speed may vary depending on which awk implementation is used, and which input is given.
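For instance, with the literal pattern 123 substituted for your_regex and a made-up two-line input, the two variants behave as follows. This is a minimal sketch; per_line and total are illustrative names.

```shell
# gsub returns the number of replacements; replacing with "&" leaves the text unchanged.
per_line=$(printf '123 123 123\n123 x\n' | awk '{ print gsub(/123/, "&") }')
total=$(printf '123 123 123\n123 x\n' | awk '{ c += gsub(/123/, "&") } END { print c }')
echo "$per_line"  # 3, then 1 on the next line
echo "$total"     # 4
```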
This might work for you:
sed -n -e ':a' -e 's/123//p' -e 'ta' file | sed -n '$='
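With GNU sed, this can be written on one line (a named label is used here because newer GNU sed releases reject an empty one):
sed -n ':a;s/123//p;ta' file | sed -n '$='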