Output the first regex match against STDIN - regex

In bash, I have the following:
#!/bin/bash
curl $1 | tac | tac | perl -e '/(\d\d(?=:\d\d))/g; print $1' > $2
All I want is to the first match from the output of curl and print it to the output file. I run the script with ./scriptname url outputfile.txt but nothing is printed. My regex is valid on http://regexr.com, so I'm sure it's something I don't know about Perl. What am I doing wrong? Thanks.

You can use the following:
#!/bin/bash
curl "$1" | perl -nle'print for /\d\d(?=:\d\d)/g' > "$2"
If you change the match to /script/g, you can see it working with something like
./scriptname http://www.ucsd.edu outputfile.txt

I suppose this means perl -ne is reading the input line by line. Is there a simple way to have perl return only the first result?
Consider using sed:
... | sed '/^.*\([[:digit:]]\{2\}\):[[:digit:]]\{2\}.*/{s//\1/;q};d'
In Perl, that would be:
... | perl -nle 'if (s/^.*(\d\d):\d\d.*/$1/) { print; exit }'
And with GNU Grep compiled with --perl-regexp:
... | grep -m1 -Po '\d\d(?=:\d\d)'

There are a few problems:
You never read from STDIN.
You don't stop trying to match after the first match.
You print unconditionally.
If you want all matches (as per your original question):
perl -nle'print for /\d\d(?=:\d\d)/g'
If you want the first match (as per your comment):
perl -nle'if (/\d\d(?=:\d\d)/) { print $&; exit }'
perl -nle'if (/(\d\d):\d\d/) { print $1; exit }'
grep -Pom1 '\d\d(?=:\d\d)'
Notes:
-n wraps the code with a loop that reads from STDIN.

Related

Regex to Find empty functions in js File using grep

I would like to Find in several files the empty function like
fLocalEvent(){
}
I would like to find using grep
I tried:
grep -Rlz "function fLocalEventoBotoes.*[\n\s]*?}" sob/SOB910C.js
grep -Rlz "fLocalEventoBotoes\((?:(?!\)\s*\{).)*\)\s*\{\s*\}" sob/SOB910C.js - return -bash: !\: event not found
You are better off using something like Perl for this:
echo '
fLegit1(x){ something }
empty1(){}
legit2(){ something }
fLocalEvent(){
}' | perl -0777 -lnE 'say $1 while (/(\w+\(\s*\)\s*{\s*})/g)'
Prints:
empty1(){}
fLocalEvent(){
}
With GNU grep, you can use the -Pzo switches to use that same regex, ignore line endings, and print only the matched section:
echo '
fLegit1(x){ something }
empty1(){}
legit2(){ something }
fLocalEvent(){
}' | grep -Pzo '\w+\(\s*\)\s*{\s*}\s*'
# same output...
This grep should work :
grep -Poz '\w+\(\){\s*}\n?' data

Find regular expression in a file matching a given value

I have some basic knowledge on using regular expressions with grep (bash).
But I want to use regular expressions the other way around.
For example I have a file containing the following entries:
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
Now I want to use bash to figure out to which line a particular number matches.
For example:
grep 8 file
should return:
line_three=[7-9]
Note: I am aware that the example of "grep 8 file" doesn't make sense, but I hope it helps to understand what I am trying to achieve.
Thanks for you help,
Marcel
As others haven pointed out, awk is the right tool for this:
awk -F'=' '8~$2{print $0;}' file
... and if you want this tool to feel more like grep, a quick bash wrapper:
#!/bin/bash
awk -F'=' -v seek_value="$1" 'seek_value~$2{print $0;}' "$2"
Which would run like:
./not_exactly_grep.sh 8 file
line_three=[7-9]
My first impression is that this is not a task for grep, maybe for awk.
Trying to do things with grep I only see this:
for line in $(cat file); do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done
Using while for file reading (following comments):
while IFS= read -r line; do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done < file
This can be done in native bash using the syntax [[ $value =~ $regex ]] to test:
find_regex_matching() {
local value=$1
while IFS= read -r line; do # read from input line-by-line
[[ $line = *=* ]] || continue # skip lines not containing an =
regex=${line#*=} # prune everything before the = for the regex
if [[ $value =~ $regex ]]; then # test whether we match...
printf '%s\n' "$line" # ...and print if we do.
fi
done
}
...used as:
find_regex_matching 8 <file
...or, to test it with your sample input inline:
find_regex_matching 8 <<'EOF'
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
EOF
...which properly emits:
line_three=[7-9]
You could replace printf '%s\n' "$line" with printf '%s\n' "${line%%=*}" to print only the key (contents before the =), if so inclined. See the bash-hackers page on parameter expansion for a rundown on the syntax involved.
This is not built-in functionality of grep, but it's easy to do with awk, with a change in syntax:
/[0-3]/ { print "line one" }
/[4-6]/ { print "line two" }
/[7-9]/ { print "line three" }
If you really need to, you could programmatically change your input file to this syntax, if it doesn't contain any characters that need escaping (mainly / in the regex or " in the string):
sed -e 's#\(.*\)=\(.*\)#/\2/ { print "\1" }#'
As I understand it, you are looking for a range that includes some value.
You can do this in gawk:
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
$ awk -v n=8 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]<n && a[2]>n) print $0 }' /tmp/file
line_three=[7-9]
Since the digits are being treated as numbers (vs a regex) it supports larger ranges:
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[75-95]
line_four=[55-105]
$ awk -v n=92 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]<n && a[2]>n) print $0 }' /tmp/file
line_three=[75-95]
line_four=[55-105]
If you are just looking to interpret the right hand side of the = as a regex, you can do:
$ awk -F= -v tgt=8 'tgt~$2' /tmp/file
You would like to do something like
grep -Ef <(cut -d= -f2 file) <(echo 8)
This wil grep what you want but will not display where.
With grep you can show some message:
echo "8" | sed -n '/[7-9]/ s/.*/Found it in line_three/p'
Now you would like to transfer your regexp file into such commands:
sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file
Store these commands in a virtual command file and you will have
echo "8" | sed -nf <(sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file)

How to cut a string from a string

My script gets this string for example:
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
let's say I don't know how long the string until the /importance.
I want a new variable that will keep only the /importance/lib1/lib2/lib3/file from the full string.
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
Here is the command in my code:
find <main_path> -name file | sed 's/.*importance//
I am not familiar with the regex, so I need your help please :)
Sorry my friends I have just wrong about my question,
I don't need the output /importance/lib1/lib2/lib3/file but /importance/lib1/lib2/lib3 with no /file in the output.
Can you help me?
I would use awk:
$ echo "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file" | awk -F"/importance/" '{print FS$2}'
importance/lib1/lib2/lib3/file
Which is the same as:
$ awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
importance/lib1/lib2/lib3/file
That is, we set the field separator to /importance/, so that the first field is what comes before it and the 2nd one is what comes after. To print /importance/ itself, we use FS!
All together, and to save it into a variable, use:
var=$(find <main_path> -name file | awk -F"/importance/" '{print FS$2}')
Update
I don't need the output /importance/lib1/lib2/lib3/file but
/importance/lib1/lib2/lib3 with no /file in the output.
Then you can use something like dirname to get the path without the name itself:
$ dirname $(awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file")
/importance/lib1/lib2/lib3
Instead of substituting all until importance with nothing, replace with /importance:
~$ echo $var
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
~$ sed 's:.*importance:/importance:' <<< $var
/importance/lib1/lib2/lib3/file
As noted by #lurker, if importance can be in some dir, you could add /s to be safe:
~$ sed 's:.*/importance/:/importance/:' <<< "/dir1/dirimportance/importancedir/..../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
With GNU sed:
echo '/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file' | sed -E 's#.*(/importance.*)#\1#'
Output:
/importance/lib1/lib2/lib3/file
pure bash
kent$ a="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
kent$ echo ${a/*\/importance/\/importance}
/importance/lib1/lib2/lib3/file
external tool: grep
kent$ grep -o '/importance/.*' <<<$a
/importance/lib1/lib2/lib3/file
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
You were very close. All you had to do was substitute back in importance:
sed 's/.*importance/importance/'
However, I would use Bash's built in pattern expansion. It's much more efficient and faster.
The pattern expansion ${foo##pattern} says to take the shell variable ${foo} and remove the largest matching glob pattern from the left side of the shell variable:
file_name="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
file_name=${file_name##*importance}
Removeing the /file at the end as you ask:
echo '<path>' | sed -r 's#.*(/importance.*)/[^/]*#\1#'
Input /dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Returns: /importance/lib1/lib2/lib3
See this "Match groups" tutorial.

Print all matches of a regular expression from the command line?

What's the simplest way to print all matches (either one line per match or one line per line of input) to a regular expression on a unix command line? Note that there may be 0 or more than 1 match per line of input.
I assume there must be some way to do this with sed, awk, grep, and/or perl, and I'm hoping for a simple command line solution so it will show up in my bash history when needed in the future.
EDIT: To clarify, I do not want to print all matching lines, only the matches to the regular expression. For example, a line might have 1000 characters, but there are only two 10-character matches to the regular expression. I'm only interested in those two 10-character matches.
Assuming you only use non-capturing parentheses,
perl -wnE'say /yourregex/g'
or
perl -wnE'say for /yourregex/g'
Sample use:
$ echo -ne 'fod,food,fad\nbar\nfooooood\n' | perl -wnE'say for /fo*d/g'
fod
food
fooooood
$ echo -ne 'fod,food,fad\nbar\nfooooood\n' | perl -wnE'say /fo*d/g'
fodfood
fooooood
Unless I misunderstand your question, the following will do the trick
grep -o 'fo.*d' input.txt
For more details see:
GNU grep (most platforms)
Solaris grep
AIX grep
HP-UX grep
Going off the comment, and assuming you're passed the input from a pipe or otherwise on STDIN:
perl -e 'my $re=shift;$re=~qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}'
Usage:
cat SOME_TEXT_FILE | perl -e 'my $re=shift;$re=~qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}' 'YOUR_REGEX'
or I would just stuff that whole mess into a bash function...
bggrep ()
{
if [ "x$1" != "x" ]; then
perl -e 'my $re=shift;$re=~qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}' $1;
else
echo "Usage: bggrep <regex>";
fi
}
Usage is the same, just cleaner-looking:
cat SOME_TEXT_FILE | bggrep 'YOUR_REGEX'
(or just type the command itself and enter the text to match line-by-line, but that didn't seem a likely use case :).
Example (from your comment):
bash$ cat garbage
fod,food,fad
bar
fooooooood
bash$ cat garbage | perl -e 'my $re=shift;$re=~qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}' 'fo*d'
fod
food
fooooooood
or...
bash$ cat garbage | bggrep 'fo*d'
fod
food
fooooooood
perl -MSmart::Comments -ne '#a=m/(...)/g;print;' -e '### #a'

Return a regex match in a Bash script, instead of replacing it

I just want to match some text in a Bash script. I've tried using sed but I can't seem to make it just output the match instead of replacing it with something.
echo -E "TestT100String" | sed 's/[0-9]+/dontReplace/g'
Which will output TestTdontReplaceString.
Which isn't what I want, I want it to output 100.
Ideally, it would put all the matches in an array.
edit:
Text input is coming in as a string:
newName()
{
#Get input from function
newNameTXT="$1"
if [[ $newNameTXT ]]; then
#Use code that im working on now, using the $newNameTXT string.
fi
}
You could do this purely in bash using the double square bracket [[ ]] test operator, which stores results in an array called BASH_REMATCH:
[[ "TestT100String" =~ ([0-9]+) ]] && echo "${BASH_REMATCH[1]}"
echo "TestT100String" | sed 's/[^0-9]*\([0-9]\+\).*/\1/'
echo "TestT100String" | grep -o '[0-9]\+'
The method you use to put the results in an array depends somewhat on how the actual data is being retrieved. There's not enough information in your question to be able to guide you well. However, here is one method:
index=0
while read -r line
do
array[index++]=$(echo "$line" | grep -o '[0-9]\+')
done < filename
Here's another way:
array=($(grep -o '[0-9]\+' filename))
Pure Bash. Use parameter substitution (no external processes and pipes):
string="TestT100String"
echo ${string//[^[:digit:]]/}
Removes all non-digits.
I Know this is an old topic but I came her along same searches and found another great possibility apply a regex on a String/Variable using grep:
# Simple
$(echo "TestT100String" | grep -Po "[0-9]{3}")
# More complex using lookaround
$(echo "TestT100String" | grep -Po "(?i)TestT\K[0-9]{3}(?=String)")
With using lookaround capabilities search expressions can be extended for better matching. Where (?i) indicates the Pattern before the searched Pattern (lookahead),
\K indicates the actual search pattern and (?=) contains the pattern after the search (lookbehind).
https://www.regular-expressions.info/lookaround.html
The given example matches the same as the PCRE regex TestT([0-9]{3})String
Use grep. Sed is an editor. If you only want to match a regexp, grep is more than sufficient.
using awk
linux$ echo -E "TestT100String" | awk '{gsub(/[^0-9]/,"")}1'
100
I don't know why nobody ever uses expr: it's portable and easy.
newName()
{
#Get input from function
newNameTXT="$1"
if num=`expr "$newNameTXT" : '[^0-9]*\([0-9]\+\)'`; then
echo "contains $num"
fi
}
Well , the Sed with the s/"pattern1"/"pattern2"/g just replaces globally all the pattern1s to pattern 2.
Besides that, sed while by default print the entire line by default .
I suggest piping the instruction to a cut command and trying to extract the numbers u want :
If u are lookin only to use sed then use TRE:
sed -n 's/.*\(0-9\)\(0-9\)\(0-9\).*/\1,\2,\3/g'.
I dint try and execute the above command so just make sure the syntax is right.
Hope this helped.
using just the bash shell
declare -a array
i=0
while read -r line
do
case "$line" in
*TestT*String* )
while true
do
line=${line#*TestT}
array[$i]=${line%%String*}
line=${line#*String*}
i=$((i+1))
case "$line" in
*TestT*String* ) continue;;
*) break;;
esac
done
esac
done <"file"
echo ${array[#]}