regex, repeat, count group - regex

i need some help with a regex that follows up this format:
First part of the string is a email address, followed by eight columns divided by ";".
a.test#test.com;Alex;Test;Alex A.Test;Alex;12;34;56;78
the first part i have is (.*#.*com)
these are also possible source strings:
a.test#test.com;Alex;;Alex A.Test;;12;34;56;78
a.test#test.com;Alex;;Alex A.Test;Alex;;34;;78
a.test#test.com;Alex;Test;;Alex;12;34;56; and so on

You can try this regex:
^(.*#.*com)(([^";\n]*|"[^"\n]*");){8}(([^";\n]*|"[^"\n]*"))$
If you have a different number of columns after the adress change the number between { and }
For your data here the catches:
1. `a.test#test.com`
2. `56;`
3. `56`
4. `78`
Here the test
If you are sure there will be no " in your strings you can use this:
^(.*#.*com)(([^;\n]*);){8}([^;\n]*)$
Here the test
Edit:
OP suggested this usage:
For use the first regex with sed you need -i -n -E flags and escape the " char.
The result will look like this:
sed -i -n -E "/(.*#.*com)(([^\";\n]*|\"[^\"\n]*\");){8}(([^\";\n]*|\"[^\"\n]*\"))/p"

you can have something like
".*#.*\.com;[A-Z,a-z]*;[A-Z,a-z]*;[A-Z,a-z, ,.,]*;[A-Z,a-z]*;[0-9][0-9];[0-9][0-9];[0-9][0-9];[0-9][0-9]"
Assuming the numbers are only two digit

Using awk you can do this easily:
awk -F ';' '$1 ~ /\.com$/{print NF}' file
9
9
9
cat file
a.test#test.com;Alex;;Alex A.Test;;12;34;56;78
a.test#test.com;Alex;;Alex A.Test;Alex;;34;;78
a.test#test.com;Alex;Test;;Alex;12;34;56; and so on

Related

awk Regular Expression (REGEX) get phone number from file

The following is what I have written that would allow me to display only the phone numbers
in the file. I have posted the sample data below as well.
As I understand (read from left to right):
Using awk command delimited by "," if the first char is an Int and then an int preceded by [-,:] and then an int preceded by [-,:]. Show the 3rd column.
I used "www.regexpal.com" to validate my expression. I want to learn more and an explanation would be great not just the answer.
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)
awk -F "," '/^(\d)+([-,:*]\d+)+([-,:*]\d+)*$/ {print $3}' bashuser.csv
bashuser.csv
Jordon,New York,630-150,7234
Jaremy,New York,630-250-7768
Jordon,New York,630*150*7745
Jaremy,New York,630-150-7432
Jordon,New York,630-230,7790
Expected Output:
6301507234
6302507768
....
You could just remove all non int
awk '{gsub(/[^[:digit:]]/, "")}1' file.csv
gsub remove all match
[^[:digit:]] the ^ everything but what is next to it, which is an int [[:digit:]], if you remove the ^ the reverse will happen.
"" means remove or delete in awk inside the gsub statement.
1 means print all, a shortcut for print
In sed
sed 's/[^[:digit:]]*//g' file.csv
Since your desired output always appears to start on field #3, you can simplify your regrex considerably using the following:
awk -F '[*,-]' '{print $3$4$5}'
Proof of concept
$ awk -F '[*,-]' '{print $3$4$5}' < ./bashuser.csv
6301507234
6302507768
6301507745
6301507432
6302307790
Explanation
-F '[*,-]': Use a character class to set the field separators to * OR , OR -.
print $3$4$5: Concatenate the 3rd through 5th fields.
awk is not very suitable because the comma occurs not only as a separator of records, better results will give sed:
sed 's/[^,]\+,[^,]\+,//;s/[^0-9]//g;' bashuser.csv
first part s/[^,]\+,[^,]\+,// removes first two records
second part //;s/[^0-9]//g removes all remaining non-numeric characters

Select a single character in an alphanumeric string in bash

I have an issue with string manipulation in bash. I have a list of names, each name being composed of two parts, chars and numbers: for example
abcdef01234
I want to cut the last character before the numeric part starts, in this case
f
I think there is a regular expression to help me with this but just can't figure it out. AWK/sed solutions are accepted too. Hope someone can help.
Thank you.
In bash it can be done with parameter expansion with substring removal and string indexes, e.g.,
a=abcdef01234 # your string
tmp=${a%%[0-9]*} # remove all numbers from right
echo ${tmp:(-1)} # output last of remaining chars
Output: f
You can use a regexp like [a-zA-Z]+([a-zA-Z])[0-9]+. If you know how to use sed is pretty easy.
Check https://regex101.com/r/XCkKM5/1
The match will be the letter you want.
^\w+([a-zA-Z])\d+$
As a sed command (on OSX) this will be :
echo "abcdef12345" | sed -E "s#^[a-zA-Z]+([a-zA-Z])[0-9]+\$#\1#"
try following too once.
echo "abcdef01234" | awk '{match($0,/[a-zA-Z]+/);print substr($0,RLENGTH,1)}'
I have a list of names I assume is a file, file. Using grep's PCRE and (positive) lookahead:
$ grep -oP "[a-z](?=[^a-z])" file
f
It prints out the first (lowercase) letter followed by a non-(lowercase)-letter.

get number value between two strings using regex

I have a string with multiple value outputs that looks like this:
SD performance read=1450kB/s write=872kB/s no error (0 0), ManufactorerID 27 Date 2014/2 CardType 2 Blocksize 512 Erase 0 MaxtransferRate 25000000 RWfactor 2 ReadSpeed 22222222Hz WriteSpeed 22222222Hz MaxReadCurrentVDDmin 3 MaxReadCurrentVDDmax 5 MaxWriteCurrentVDDmin 3 MaxWriteCurrentVDDmax 1
I would like to output only the read value (1450kB/s) using bash and sed.
I tried
sed 's/read=\(.*\)kB/\1/'
but that outputs read=1450kB but I only want the number.
Thanks for any help.
Sample input shortened for demo:
$ echo 'SD performance read=1450kB/s write=872kB/s no error' | sed 's/read=\(.*\)kB/\1/'
SD performance 1450kB/s write=872/s no error
$ echo 'SD performance read=1450kB/s write=872kB/s no error' | sed 's/.*read=\(.*\)kB.*/\1/'
1450kB/s write=872
$ echo 'SD performance read=1450kB/s write=872kB/s no error' | sed 's/.*read=\([0-9]*\)kB.*/\1/'
1450
Since entire line has to be replaced, add .* before and after search pattern
* is greedy, will try to match as much as possible, so in 2nd example it can be seen that it matched even the values of write
Since only numbers after read= is needed, use [0-9] instead of .
Running
sed 's/read=\(.*\)kB/\1/'
will replace read=[digits]kB with [digit]. If you want to replace the whole string, use
sed 's/.*read=\([0-9]*\)kB.*/\1/'
instead.
As Sundeep noticed, sed doesn't support non-greedy pattern, updated for [0-9]* instead

How can I use sed to regex string and number in bash script

I want to separate string and number in a file to get a specific number in bash script, such as:
Branches executed:75.38% of 1190
I want to only get number
75.38
. I have try like the code below
$new_value=value | sed -r 's/.*_([0-9]*)\..*/\1/g'
but it was incorrect and it was failed.
How should it works? Thank you before for your help.
You can use the following regex to extract the first number in a line:
^[^0-9]*\([0-9.]*\).*$
Usage:
% echo 'Branches executed:75.38% of 1190' | sed 's/^[^0-9]*\([0-9.]*\).*$/\1/'
75.38
Give this a try:
value=$(sed "s/^Branches executed:\([0-9][.0-9]*[0-9]*\)%.*$/\1/" afile)
It is assumed that the line appears only once in afile.
The value is stored in the value variable.
There are several things here that we could improve. One is that you need to escape the parentheses in sed: \(...\)
Another one is that it would be good to have a full specification of the input strings as well as a good script that can help us to play with this.
Anyway, this is my first attempt:
Update: I added a little more bash around this regex so it'll be more easy to play with it:
value='Branches executed:75.38% of 1190'
new_value=`echo $value | sed -e 's/[^0-9]*\([0-9]*\.[0-9]*\).*/\1/g'`
echo $new_value
Update 2: as john pointed out, it will match only numbers that contain a decimal dot. We can fix it with an optional group: \(\.[0-9]\+\)?.
An explanation for the optional group:
\(...\) is a group.
\(...\)? Is a group that appears zero or one times (mind the question mark).
\.[0-9]\+ is the pattern for a dot and one or more digits.
Putting all together:
value='Branches executed:75.38% of 1190'
new_value=`echo $value | sed -e 's/[^0-9]*\([0-9]\+\(\.[0-9]\+\)\?\).*/\1/g'`
echo $new_value

bash - Add . certain number of characters from end of file

Given
variable=1234567890
or maybe
variable2=12345678901234567890
I need to modify the value so there is always a dot the 9th character from the end. The value could be any number of digits long but the dot must always be right before the 8 last characters.
I am trying to parse a response from https://blockchain.info/rawaddr/ for Bitcoin address balance that always gives a value like "final_balance":12345678901, instead of 123.45678901 (Bitcoin is divisible to 8 decimal places).
I'm guessing this might be possible with sed, but I don't really know. Any help would be appreciated.
For this you can use this sed:
sed -E "s/[0-9]{8}$/.&/"
It catches the last 8 digits and prints them back with a leading dot.
Samples
$ sed -E "s/[0-9]{8}$/.&/" <<< "$v"
123456789012.34567890
$ v="1234567890"
$ sed -E "s/[0-9]{8}$/.&/" <<< "$v"
12.34567890
How about just using parameter substring expansion?
Bash 4
$ echo "${variable2:0:-9}.${variable2: -8}"
12345678901.34567890
Bash 3
$ echo "${variable2:0:(${#variable2}-9)}.${variable2: -8}"
12345678901.34567890
$ echo "${variable:0:(${#variable}-9)}.${variable: -8}"
1.34567890
Parameter expansion saves you spawning another process, so it's very fast.
(Side note: The space between the colon and the first index is necessary to distinguish the substring expression with a negative start-index from another type of parameter expansion:)
${parameter:-word}
Use Default Values. If parameter is unset or null, the expansion of word is substituted. Otherwise, the value of parameter is substituted.
Here is the sed command without -E
echo 12345678901234567890 |sed 's/[0-9]\{8\}$/.&/'
123456789012.34567890