Regular expression in perl does not work as expected - regex

I have a simple bash script that uses a line of perl code + regex to extract the necessary piece of string. It looks like
ANSWER=$(host $IPW 2>/dev/null | perl -p -e 's#^.+\s\b([a-zA-Z]{4,8}\d{1,3})(?=-\d\.).+$#\1#;'
It works for the most part, but produces unexpected matches from time to time. Example:
$ echo "Host 31.201.188.199.in-addr.arpa. not found: 3(NXDOMAIN)" | perl -p -e 's#^.+?\s\b([a-zA-Z]{4,8}\d{1,3})(?=-\d\.).+?(?=\.$)#\1#;'
Host 31.201.188.199.in-addr.arpa. not found: 3(NXDOMAIN)
The string is supposed to match parts of string like "server100" (letters + digits) and return the corresponding part. Is there something I am missing or don't understand yet. (sorry for bothering)

Your regex doesn't match, so no substitution is made. The line is therefore printed as is.
If you don't want to print when there is no match, you can use -n instead of -p, plus and print to print the line on successful substitution:
echo "Host 31.201.188.199.in-addr.arpa. not found: 3(NXDOMAIN)" |
perl -n -e 's#^.+?\s\b([a-zA-Z]{4,8}\d{1,3})(?=-\d\.).+?(?=\.$)#\1# and print'

I assume the sample text that you show shouldn't be printed at all?
I suggest that you use a simple match instead of a substitution. I've also removed the superfluous parts of your regex pattern
perl -lne 'print $1 if /.*\s([a-z]{4,8}\d{1,3})(?=-\d\.)/i'

Related

Find multi-line text & replace it, using regex, in shell script

I am trying to find a pattern of two consecutive lines, where the first line is a fixed string and the second has a part substring I like to replace.
This is to be done in sh or bash on macOS.
If I had a regex tool at hand that would operate on the entire text, this would be easy for me. However, all I find is bash's simple text replacement - which doesn't work with regex, and sed, which is line oriented.
I suspect that I can use sed in a way where it first finds a matching first line, and only then looks to replace the following line if its pattern also matches, but I cannot figure this out.
Or are there other tools present on macOS that would let me do a regex-based search-and-replace over an entire file or a string? Maybe with Python (v2.7 and v3 is installed)?
Here's a sample text and how I like it modified:
keyA
value:474
keyB
value:474 <-- only this shall be replaced (follows "keyB")
keyC
value:474
keyB
value:474
Now, I want to find all occurances where the first line is "keyB" and the following one is "value:474", and then replace that second line with another value, e.g. "value:888".
As a regex that ignores line separators, I'd write this:
Search: (\bkeyB\n\s*value):474
Replace: $1:888
So, basically, I find the pattern before the 474, and then replace it with the same pattern plus the new number 888, thereby preserving the original indentation (which is variable).
You can use
sed -e '/keyB$/{n' -e 's/\(.*\):[0-9]*/\1:888/' -e '}' file
# Or, to replace the contents of the file inline in FreeBSD sed:
sed -i '' -e '/keyB$/{n' -e 's/\(.*\):[0-9]*/\1:888/' -e '}' file
Details:
/keyB$/ - finds all lines that end with keyB
n - empties the current pattern space and reads the next line into it
s/\(.*\):[0-9]*/\1:888/ - find any text up to the last : + zero or more digits capturing that text into Group 1, and replaces with the contents of the group and :888.
The {...} create a block that is executed only once the /keyB$/ condition is met.
See an online sed demo.
Use a perl one-liner with -0777 to scan over multiple lines:
$ # inline edit:
$ perl -0777 -i -pe 's/\bkeyB\s*value):\d*/$1:888/' file.txt
$ # to stdout:
$ cat file.txt | perl -0777 -pe 's/\bkeyB\s*value):\d*/$1:888/'
In plain bash:
#!/bin/bash
keypattern='^[[:blank:]]*keyB$'
valpattern='(.*):'
replacement=888
while read -r; do
printf '%s\n' "$REPLY"
if [[ $REPLY =~ $keypattern ]]; then
read -r
if [[ $REPLY =~ $valpattern ]]; then
printf '%s%s\n' "${BASH_REMATCH[0]}" "$replacement"
else
printf '%s\n' "$REPLY"
fi
fi
done < file

bash - print regex captured groups

I have a file.xml so composed:
...some xml text here...
<Version>1.0.13-alpha</Version>
...some xml text here...
I need to extract the following information:
mayor_and_minor_release_number --> 1.0
patch_number --> 13
suffix --> -alpha
I've thought the cleanest way to achieve that is by mean of a regex with grep command:
<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>
I've checked with regex101 the correctness of this regex and actually it seems to properly capture the 3 fields I'm looking for. But here comes the problem, since I have no idea how to print those fields.
cat file.xml | grep "<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>" -oP
This command prints the entire line so it's quite useless.
Several posts on this site have been written about this topic, so I've also tried to use the bash native
regex support, with poor results:
regex="<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>"
txt=$(cat file.xml)
[[ "$txt" =~ $regex ]] --> it fails!
echo "${BASH_REMATCH[*]}"
I'm sorry but I cannot figure out how to overtake this issue. The desired output should be:
1.0
13
-alpha
You may use this read + sed solution with similar regex as your's:
read -r major minor suffix < <(
sed -nE 's~.*<Version>([0-9]+\.[0-9]+)\.([0-9]+)(-[^<]*)</Version>.*~\1 \2 \3~p' file.xml
)
Check variable contents:
declare -p major minor suffix
declare -- major="1.0"
declare -- minor="13"
declare -- suffix="-alpha"
Few points:
You cannot use \d without using -P (perl) mode in grep
grep command doesn't return capture groups
Use this Perl one-liner:
perl -lne 'print for m{<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>};' file.xml
Example:
echo '<Version>1.0.13-alpha</Version>' | perl -lne 'print for m{<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>};'
Output:
1.0
13
-alpha
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

Match multiple patterns in same line using sed [duplicate]

Given a file, for example:
potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789
I'd like to grep for all lines that start with potato: but only pipe the numbers that follow potato:. So in the above example, the output would be:
1234
5432
How can I do that?
grep 'potato:' file.txt | sed 's/^.*: //'
grep looks for any line that contains the string potato:, then, for each of these lines, sed replaces (s/// - substitute) any character (.*) from the beginning of the line (^) until the last occurrence of the sequence : (colon followed by space) with the empty string (s/...// - substitute the first part with the second part, which is empty).
or
grep 'potato:' file.txt | cut -d\ -f2
For each line that contains potato:, cut will split the line into multiple fields delimited by space (-d\ - d = delimiter, \ = escaped space character, something like -d" " would have also worked) and print the second field of each such line (-f2).
or
grep 'potato:' file.txt | awk '{print $2}'
For each line that contains potato:, awk will print the second field (print $2) which is delimited by default by spaces.
or
grep 'potato:' file.txt | perl -e 'for(<>){s/^.*: //;print}'
All lines that contain potato: are sent to an inline (-e) Perl script that takes all lines from stdin, then, for each of these lines, does the same substitution as in the first example above, then prints it.
or
awk '{if(/potato:/) print $2}' < file.txt
The file is sent via stdin (< file.txt sends the contents of the file via stdin to the command on the left) to an awk script that, for each line that contains potato: (if(/potato:/) returns true if the regular expression /potato:/ matches the current line), prints the second field, as described above.
or
perl -e 'for(<>){/potato:/ && s/^.*: // && print}' < file.txt
The file is sent via stdin (< file.txt, see above) to a Perl script that works similarly to the one above, but this time it also makes sure each line contains the string potato: (/potato:/ is a regular expression that matches if the current line contains potato:, and, if it does (&&), then proceeds to apply the regular expression described above and prints the result).
Or use regex assertions: grep -oP '(?<=potato: ).*' file.txt
grep -Po 'potato:\s\K.*' file
-P to use Perl regular expression
-o to output only the match
\s to match the space after potato:
\K to omit the match
.* to match rest of the string(s)
sed -n 's/^potato:[[:space:]]*//p' file.txt
One can think of Grep as a restricted Sed, or of Sed as a generalized Grep. In this case, Sed is one good, lightweight tool that does what you want -- though, of course, there exist several other reasonable ways to do it, too.
This will print everything after each match, on that same line only:
perl -lne 'print $1 if /^potato:\s*(.*)/' file.txt
This will do the same, except it will also print all subsequent lines:
perl -lne 'if ($found){print} elsif (/^potato:\s*(.*)/){print $1; $found++}' file.txt
These command-line options are used:
-n loop around each line of the input file
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
You can use grep, as the other answers state. But you don't need grep, awk, sed, perl, cut, or any external tool. You can do it with pure bash.
Try this (semicolons are there to allow you to put it all on one line):
$ while read line;
do
if [[ "${line%%:\ *}" == "potato" ]];
then
echo ${line##*:\ };
fi;
done< file.txt
## tells bash to delete the longest match of ": " in $line from the front.
$ while read line; do echo ${line##*:\ }; done< file.txt
1234
5678
5432
4567
5432
56789
or if you wanted the key rather than the value, %% tells bash to delete the longest match of ": " in $line from the end.
$ while read line; do echo ${line%%:\ *}; done< file.txt
potato
apple
potato
grape
banana
sushi
The substring to split on is ":\ " because the space character must be escaped with the backslash.
You can find more like these at the linux documentation project.
Modern BASH has support for regular expressions:
while read -r line; do
if [[ $line =~ ^potato:\ ([0-9]+) ]]; then
echo "${BASH_REMATCH[1]}"
fi
done
grep potato file | grep -o "[0-9].*"

Match unknown substring with RegEx

How can I get an unknown substring with an regular expression? I know what's before and after the wanted string but I don't want the known part with in the result.
Example text:
jhgjgjgvocher_SOMETHINGHERE.dbhjjkghjkg
vocher_SOMETHINGELSE.db
I'm looking for 'SOMETHINGHERE' and 'SOMETHINGELSE' only.
vocher_ and .db are always before and after the relevant part but should not be in the result.
A working solution is:
cat test | egrep -o "vocher_.*\.db" | cut -d "_" -f2 | cut -d "." -f1
… but you know it's ugly.
Is it possible to search exactly for an unknown part with regex (in this case only the .* part), or do I need to use something like sed? Is there a better solution?
A simple solution using perl is the following:
perl -ne 'if (/vocher_(.*)\.db/){ print "$1\n";}' test_file.txt
This iterates line-by-line over the file and only prints the desired portion.
Use the following grep approach:
grep -Po '(?<=vocher_).+(?=\.db)' test
-P - allows Perl regular expressions
-o - prints only matched substrings
The output will be like below:
SOMETHINGHERE
SOMETHINGELSE

Print all matches of a regular expression from the command line?

What's the simplest way to print all matches (either one line per match or one line per line of input) to a regular expression on a unix command line? Note that there may be 0 or more than 1 match per line of input.
I assume there must be some way to do this with sed, awk, grep, and/or perl, and I'm hoping for a simple command line solution so it will show up in my bash history when needed in the future.
EDIT: To clarify, I do not want to print all matching lines, only the matches to the regular expression. For example, a line might have 1000 characters, but there are only two 10-character matches to the regular expression. I'm only interested in those two 10-character matches.
Assuming you only use non-capturing parentheses,
perl -wnE'say /yourregex/g'
or
perl -wnE'say for /yourregex/g'
Sample use:
$ echo -ne 'fod,food,fad\nbar\nfooooood\n' | perl -wnE'say for /fo*d/g'
fod
food
fooooood
$ echo -ne 'fod,food,fad\nbar\nfooooood\n' | perl -wnE'say /fo*d/g'
fodfood
fooooood
Unless I misunderstand your question, the following will do the trick
grep -o 'fo.*d' input.txt
For more details see:
GNU grep (most platforms)
Solaris grep
AIX grep
HP-UX grep
Going off the comment, and assuming you're passed the input from a pipe or otherwise on STDIN:
perl -e 'my $re=shift;$re=~qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}'
Usage:
cat SOME_TEXT_FILE | perl -e 'my $re=shift;$re=~qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}' 'YOUR_REGEX'
or I would just stuff that whole mess into a bash function...
bggrep ()
{
if [ "x$1" != "x" ]; then
perl -e 'my $re=shift;$re=~qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}' $1;
else
echo "Usage: bggrep <regex>";
fi
}
Usage is the same, just cleaner-looking:
cat SOME_TEXT_FILE | bggrep 'YOUR_REGEX'
(or just type the command itself and enter the text to match line-by-line, but that didn't seem a likely use case :).
Example (from your comment):
bash$ cat garbage
fod,food,fad
bar
fooooooood
bash$ cat garbage | perl -e 'my $re=shift;$re=~qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}' 'fo*d'
fod
food
fooooooood
or...
bash$ cat garbage | bggrep 'fo*d'
fod
food
fooooooood
perl -MSmart::Comments -ne '#a=m/(...)/g;print;' -e '### #a'