Extract a digit out after a quote in bash

Extract a digit out after a quote in bash - regex

I have output that looks like this:
[1] "0 found |2018-07-15 22:21:09 - no new item is submitted "
How can I extract the 0 with grep, using a regular expression, in bash?
Thank you!

If awk is acceptable, try one of the following:
awk 'match($0,/\"[0-9]+/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
OR
your_command | awk 'match($0,/\"[0-9]+/){print substr($0,RSTART+1,RLENGTH-1)}'

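Try this grep option (GNU grep is needed for -P):
grep -Po '"\K[0-9]+(?= found)'
The pattern "\K[0-9]+(?= found) matches a double quote, drops it from the reported match with \K, and then matches a number that is immediately followed by the text " found". No ^ anchor is used, because the line starts with [1] rather than with the number.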

Here is a sed solution:
$ sed 's/^.*"\(.*\) found .*/\1/' <<< '[1] "0 found |2018-07-15 22:21:09 - no new item is submitted "'
0
If perl counts, here is one more easy way:
$ perl -ne 'print $1 if /(\d+) found/' <<< '[1] "0 found |2018-07-15 22:21:09 - no new item is submitted "'
0

I also came up with this solution, where $LOG_FILE is the file that all my output is written to:
grep -oE '\[1\] " *[0-9]+' "$LOG_FILE" | cut -d'"' -f2 | head -1
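Where neither GNU grep nor perl is available, the same extraction can be sketched with plain POSIX sed. This is only an illustration: `extract_count` is a made-up helper name, and it assumes the number directly follows the first double quote on the line.

```shell
# Hypothetical helper: print the digits that directly follow the first
# double quote on each input line (lines without such digits print nothing).
extract_count() {
    sed -n 's/^[^"]*"\([0-9][0-9]*\).*/\1/p'
}

# Sample line taken from the question.
line='[1] "0 found |2018-07-15 22:21:09 - no new item is submitted "'
count=$(printf '%s\n' "$line" | extract_count)
echo "$count"
```

The `[0-9][0-9]*` spelling stands in for `[0-9]+`, which basic regular expressions do not offer.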

Related

Validating specific column in grep

Ok this is driving me crazy. I have a text file with the following content:
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"
etc.
I want to filter the lines on which the second date is in December. I tried things like:
grep '.*(\d{4}-\d{2}-\d{2}).*(2020-12-).*' > output.txt
grep '.*\d{4}-\d{2}-\d{2}.*2020-12-.*' > output.txt
grep -P '.*\d{4}-\d{2}-\d{2}.*2020-12-.*' > output.txt
But nothing seems to work. Is there any way to accomplish this with either grep, egrep, sed or awk?

You need grep's -P option to enable Perl-compatible regular expressions (PCRE), since \d is not available otherwise. The pattern below steps over the first date so that the December test applies to the second one:
grep -P '^("\d+",){4}"[a-zA-Z]+","\d{4}-\d{2}-\d{2}","2020-12-\d{2}"' Input_file
Explanation (for illustration only, not meant to be run as written):
grep ##Starting grep command from here.
-P ##Enabling PCRE regex with grep.
'^("\d+",){4} ##From line start, matching " digits ", four times.
"[a-zA-Z]+", ##Then matching " letters ", for the text column.
"\d{4}-\d{2}-\d{2}", ##Then stepping over the first date, whatever its value.
"2020-12-\d{2}" ##Then matching a second date in December 2020, which OP needs.
' Input_file ##Mentioning Input_file name here.

I suggest awk as an alternative, since the input is structured in rows and columns with a common delimiter:
awk -F, '$7 ~ /-12-/' file
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"

Use either grep -P or egrep for short:
$ cat test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"
$
$ grep -P '^"([^"]*","){6}2020-12-' test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
$
$ egrep '^"([^"]*","){6}2020-12-' test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
Explanation:
^" - expect a " to start
([^"]*","){6} - scan over all chars other than ", followed by ","; repeat that 6 times
2020-12- - expect 2020-12-
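To see why the anchored field count matters, here is a small sketch comparing an unanchored search with the pattern explained above. It uses the third sample row, whose first date is in December but whose second is not; the variable names are made up for the illustration.

```shell
# Third sample row from the question: first date in December, second in April.
row='"1","2","3","4","text","2020-12-12","2020-04-11","21"'

# Unanchored search: counts the row, because the FIRST date contains 2020-12-.
loose=$(printf '%s\n' "$row" | grep -c '2020-12-' || true)

# Anchored search: six quoted fields must precede 2020-12-, so only the
# seventh field (the second date) is tested and the row is not counted.
strict=$(printf '%s\n' "$row" | grep -Ec '^"([^"]*","){6}2020-12-' || true)

echo "loose=$loose strict=$strict"
```

The `|| true` only keeps the script going when grep finds nothing and returns a non-zero status.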

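The problem is in:
egrep '.*\d{4}-\d{2}-\d{2}.2020-12-.' > output.txt
Look at the single . between the first date and 2020-12-.
A . matches just one character, but you need to skip the three characters "," between the two dates, so use .+ there. Also note that egrep does not treat \d as a digit class, so spell it out as [0-9] and pass the input file:
egrep '[0-9]{4}-[0-9]{2}-[0-9]{2}.+2020-12-' file > output.txt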
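If the data really is one field per comma, a column test avoids regex counting altogether. A minimal sketch, assuming no field contains an embedded comma; `december_rows` and the `sample` variable are names made up for this illustration:

```shell
# Hypothetical helper: keep rows whose 7th comma-separated field is a
# quoted date in December (any year). Intervals such as {4} are avoided
# because some awk implementations do not support them.
december_rows() {
    awk -F, '$7 ~ /^"[0-9][0-9][0-9][0-9]-12-[0-9][0-9]"$/'
}

# The four sample rows from the question.
sample='"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"'

result=$(printf '%s\n' "$sample" | december_rows)
printf '%s\n' "$result"
```

Only the first two rows are printed, since the match is tied to field 7 rather than to any date on the line.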

Grep value between strings with regex

$ acpi
Battery 0: Charging, 18%, 01:37:09 until charged
How to grep the battery level value without percentage character (18)?
This should do it but I'm getting an empty result:
acpi | grep -e '(?<=, )(.*)(?=%)'

Your regex is correct, but lookarounds only work with the experimental -P (Perl-compatible regex) option of GNU grep. You will also need -o to show only the matching text.
The correct command would be:
grep -oP '(?<=, )\d+(?=%)'
However, if you don't have gnu grep then you can also use sed like this:
sed -nE 's/.*, ([0-9]+)%.*/\1/p' file
18

Try the following, written and tested at https://ideone.com/nzSGKs:
your_command | awk 'match($0,/Charging, [0-9]+%/){print substr($0,RSTART+10,RLENGTH-11)}'
Explanation (for illustration only):
your_command | ##Running OP's command and passing its output to awk as standard input.
awk ' ##Starting awk program from here.
match($0,/Charging, [0-9]+%/){ ##Using match function to find regex Charging, [0-9]+% in the line.
print substr($0,RSTART+10,RLENGTH-11) ##Printing the matched text without its first 10 characters (Charging, then a comma and a space) and without the trailing %.
}'
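The offsets 10 and 11 above come from the length of the text "Charging, " and the trailing %. As a sketch, the same arithmetic can be written with length() so the numbers explain themselves; `battery_pct` and the `prefix` variable are made-up names for this illustration.

```shell
# Hypothetical helper: print the charge percentage from acpi-style input.
battery_pct() {
    awk -v prefix='Charging, ' '
        match($0, /Charging, [0-9]+%/) {
            # Start after the prefix; drop the prefix and the trailing %
            # from the matched length.
            print substr($0, RSTART + length(prefix), RLENGTH - length(prefix) - 1)
        }'
}

# Sample line taken from the question.
echo 'Battery 0: Charging, 18%, 01:37:09 until charged' | battery_pct
```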

Using awk:
awk -F"," '{print $2+0}'
Using GNU sed:
sed -rn 's/.*\, *([0-9]+)\%\,.*/\1/p'

You can use sed:
$ acpi | sed -nE 's/.*Charging, ([[:digit:]]*)%.*/\1/p'
18
Or, if Charging is not always in the string, you can look for the ,:
$ acpi | sed -nE 's/[^,]*, ([[:digit:]]*)%.*/\1/p'

Using bash:
s='Battery 0: Charging, 18%, 01:37:09 until charged'
res="${s#*, }"
res="${res%%%*}"
echo "$res"
Result: 18.
res="${s#*, }" removes the text from the beginning through the first comma+space, and "${res%%%*}" removes everything from the first occurrence of % to the end (%% strips the longest suffix that matches the pattern %*).
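The difference between % (shortest suffix) and %% (longest suffix) is easiest to see on a string with two percent signs. A small sketch with a made-up input string; the % inside the pattern is quoted so it cannot be read as part of the operator:

```shell
# Hypothetical input with two percent signs (not real acpi output).
s='Battery 0: Charging, 18%, full at 100% soon'

rest=${s#*, }         # shortest prefix ending in ", " removed
first=${rest%%"%"*}   # longest suffix removed: cut at the FIRST %
last=${rest%"%"*}     # shortest suffix removed: cut at the LAST %

echo "$first"         # 18
echo "$last"          # 18%, full at 100
```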

Get Capture Group and Line before Match using Grep

Suppose I have a file called 'test.txt':
>reference1
fooHappybar
>reference2
fooBirthdaybar
I need a grep command that will capture the string between foo and bar, and the line directly above the match. The command should result in the following output:
>reference1
Happy
>reference2
Birthday
Here is what I have so far:
grep -oP 'foo\K\w+(?=bar)' test.txt
which gives:
Happy
Birthday
I know that grep -B 1 outputs the match and line before the match. I tried:
grep -oP -B 1 'foo\K\w+(?=bar)' test.txt
But that doesn't work.
Any guidance is appreciated.
EDIT:
How would the awk command change if I had this file instead?
>reference1
AGTCTGCAFOOHAPPYBARGTACAC
>reference2
GTACAFOOBIRTHDAYBARGACCAT
expected output:
>reference1
HAPPY
>reference2
BIRTHDAY

Grep solution (GNU grep, for -z and -P):
grep -zPo '(foo)\K(\w+(?=bar))|.*(?=\n(?1)(?2))' test.txt | tr '\0' '\n'
Perl solution:
perl -nE '/^foo(.*)bar$/&&say$p.$1;$p=$_' test.txt

You may use this awk:
awk '/FOO.+BAR/{gsub(/.*FOO|BAR.*/, ""); print p ORS $0} {p=$0}' file
>reference1
HAPPY
>reference2
BIRTHDAY

I am afraid this is impossible using grep's context options alone.
The reason is that -o disables -B. From the grep manual, on -B NUM:
Print NUM lines of leading context before matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
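Since -o and -B do not combine, one workaround is to let awk remember the previous line itself. A minimal sketch for the lowercase foo/bar file from the question; `between_with_header` is a made-up name, and the offsets 3 and 6 assume the markers are exactly "foo" and "bar".

```shell
# Hypothetical helper: for each line containing foo...bar, print the
# previous line and then the text between the two markers.
between_with_header() {
    awk '
        match($0, /foo.+bar/) {
            print prev
            # Skip the 3 characters of "foo"; drop "foo" and "bar" (6 total).
            print substr($0, RSTART + 3, RLENGTH - 6)
        }
        { prev = $0 }'
}

printf '%s\n' '>reference1' 'fooHappybar' '>reference2' 'fooBirthdaybar' |
    between_with_header
```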

Remove everything after 2nd occurrence in a string in unix

I would like to remove everything after the 2nd occurrence of a particular pattern in a string. What is the best way to do it in Unix? What is the most elegant and simple method to achieve this: sed, awk, or just Unix commands like cut?
My input would be
After-u-math-how-however
Output should be
After-u
Everything after the 2nd - should be stripped out. The regex should also cope with fewer occurrences of the pattern: with zero or one occurrence the input should be left unchanged, and from the 2nd occurrence on everything should be removed.
So if the input is as follows
After
Output should be
After

Something like this would do it.
echo "After-u-math-how-however" | cut -f1,2 -d'-'
This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
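The question also asks about inputs with zero or one dash. A quick sketch of how cut treats them; `first_two` is just a made-up wrapper name. Lines that contain no delimiter at all are passed through unchanged by cut unless -s is given.

```shell
# Hypothetical wrapper around the cut command shown above.
first_two() {
    cut -d- -f1,2
}

echo 'After-u-math-how-however' | first_two   # After-u
echo 'After-u' | first_two                    # After-u
echo 'After' | first_two                      # After
```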

This might work for you (GNU sed):
sed 's/-[^-]*//2g' file

You could use the following regex to select what you want:
^[^-]*-\?[^-]*
For example:
echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"
Results:
After-u

@EvanPurkisher's cut -f1,2 -d'-' solution is IMHO the best one, but since you asked about sed and awk:
With GNU sed for -r
$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
With GNU awk for gensub():
$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1","")}1'
After-u
Can be done with non-GNU sed using \( and *, and with non-GNU awk using match() and substr() if necessary.

awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
Split the line into fields based on field separator - (option spec. -F -) - accessible as special variable FS inside the awk program.
Always print the 1st field (print $1), followed by:
If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
Otherwise: append "", i.e.: effectively only print the 1st field (which in itself may be empty, if the input is empty).

This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:
$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u
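Note that the IFS=- assignment above stays in effect for the rest of the shell session. As an alternative sketch that leaves IFS alone and needs no arrays, plain parameter expansion can do the same job; `first_two_fields` is a made-up function name.

```shell
# Hypothetical helper: keep only the text up to (not including) the second dash.
first_two_fields() {
    s=$1
    case $s in
        *-*)
            first=${s%%-*}    # text before the first dash
            rest=${s#*-}      # text after the first dash
            printf '%s-%s\n' "$first" "${rest%%-*}"
            ;;
        *)
            # No dash at all: print the input unchanged.
            printf '%s\n' "$s"
            ;;
    esac
}

first_two_fields 'After-u-math-how-however'   # After-u
first_two_fields 'After'                      # After
```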

awk '$0 = $2 ? $1 FS $2 : $1' FS=-
Result
After-u
After

This will do it in awk, without printing a stray - when there is only one field:
echo "After" | awk -F "-" '{printf "%s",$1; for (i=2; i<=2 && i<=NF; i++) printf "-%s",$i; print ""}'

Print all matches of a regular expression from the command line?

What's the simplest way to print all matches (either one line per match or one line per line of input) to a regular expression on a unix command line? Note that there may be 0 or more than 1 match per line of input.
I assume there must be some way to do this with sed, awk, grep, and/or perl, and I'm hoping for a simple command line solution so it will show up in my bash history when needed in the future.
EDIT: To clarify, I do not want to print all matching lines, only the matches to the regular expression. For example, a line might have 1000 characters, but there are only two 10-character matches to the regular expression. I'm only interested in those two 10-character matches.

Assuming you only use non-capturing parentheses,
perl -wnE'say /yourregex/g'
or
perl -wnE'say for /yourregex/g'
Sample use:
$ echo -ne 'fod,food,fad\nbar\nfooooood\n' | perl -wnE'say for /fo*d/g'
fod
food
fooooood
$ echo -ne 'fod,food,fad\nbar\nfooooood\n' | perl -wnE'say /fo*d/g'
fodfood
fooooood

Unless I misunderstand your question, the following will do the trick
grep -o 'fo.*d' input.txt
For more details see:
GNU grep (most platforms)
Solaris grep
AIX grep
HP-UX grep
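One caveat with grep -o: the regex decides what counts as a single match, so a greedy .* can swallow several intended matches at once. A small sketch using the sample input from the other answers; the variable names are made up for the illustration.

```shell
# Sample input used elsewhere in this thread.
input='fod,food,fad
bar
fooooood'

# fo*d: each match is reported on its own line.
matches=$(printf '%s\n' "$input" | grep -o 'fo*d')

# fo.*d: .* is greedy, so the first line comes back as one long match.
greedy=$(printf '%s\n' "$input" | grep -o 'fo.*d')

printf '%s\n' "$matches"
printf '%s\n' "$greedy"
```

With 'fo*d' the output is fod, food and fooooood on separate lines; with 'fo.*d' the first line is reported whole as fod,food,fad.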

Going off the comment, and assuming the input arrives from a pipe or otherwise on STDIN:
perl -e 'my $re=shift;$re=qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}'
Usage:
cat SOME_TEXT_FILE | perl -e 'my $re=shift;$re=qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}' 'YOUR_REGEX'
or I would just stuff that whole mess into a bash function...
bggrep ()
{
    # Require a regex argument; quote it so the shell does not split or glob it.
    if [ "x$1" != "x" ]; then
        perl -e 'my $re=shift;$re=qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}' "$1";
    else
        echo "Usage: bggrep <regex>";
    fi
}
Usage is the same, just cleaner-looking:
cat SOME_TEXT_FILE | bggrep 'YOUR_REGEX'
(or just type the command itself and enter the text to match line by line, but that didn't seem a likely use case :).
Example (from your comment):
bash$ cat garbage
fod,food,fad
bar
fooooooood
bash$ cat garbage | perl -e 'my $re=shift;$re=qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}' 'fo*d'
fod
food
fooooooood
or...
bash$ cat garbage | bggrep 'fo*d'
fod
food
fooooooood

perl -MSmart::Comments -ne '@a=m/(...)/g;print;' -e '### @a'

We Keep Coding
