Regex to find string without curly braces but "\{", "\}" is allowed - regex

I have a regex to find a string without curly braces: "([^\{\}]+)". It can extract "cde" from the following string:
"ab{cde}f"
Now I need to escape "{" with "\{" and "}" with "\}".
So if my original string is "ab{cd\{e\}}f" then I need to extract "cd{e}" or "cd\{e\}" (I can remove "\" later).
Thanks in advance.

This should work:
([^{}\\]|\\{|\\})+

To allow escapes inside your braces you can use:
{((?:[^\\{}]+|\\.)*)}
Perl example:
my $str = "ab{cd\\{e\\}} also foo{ad\\}ok\\{a\\{d}";
print "$str\n";
print join ', ', $str =~ /{((?:[^\\{}]+|\\.)*)}/g;
Output:
ab{cd\{e\}} also foo{ad\}ok\{a\{d}
cd\{e\}, ad\}ok\{a\{d
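For readers who want to try the same idea outside Perl, here is a minimal Python sketch. The function name extract_braced and the unescape flag are made up for illustration; the pattern is the one above with the inner + dropped, which should accept the same strings.

```python
import re

# A brace pair whose body consists of ordinary characters or
# backslash-escaped characters (same idea as the Perl pattern above).
BRACED = re.compile(r'\{((?:[^\\{}]|\\.)*)\}')

def extract_braced(text, unescape=False):
    """Return the body of each {...} group, allowing escaped braces inside."""
    found = BRACED.findall(text)
    if unescape:
        # Drop the backslash in front of every escaped character.
        found = [re.sub(r'\\(.)', r'\1', item) for item in found]
    return found

print(extract_braced(r'ab{cd\{e\}}f'))
print(extract_braced(r'ab{cd\{e\}}f', unescape=True))
```

The unescape step corresponds to the asker's remark that the backslashes can be removed later.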

Note that most regex special characters lose their special meaning when you put them inside a bracket expression (i.e. square brackets). So:
[.] matches a literal period.
[[] matches a left square bracket.
[a] matches the letter a.
[{] matches a left curly brace.
So:
$ echo "ab{cde}f" | sed -r 's/[^{]*[{](.+)}.*/\1/'
cde
$ echo "ab{c\{d\}e}f" | sed -r 's/[^{]*[{](.+)}.*/\1/'
c\{d\}e
Or:
$ echo "ab{cde}f" | sed 's/[^{]*{//;s/}[^}]*$//'
cde
$ echo "ab{c\{d\}e}f" | sed 's/[^{]*{//;s/}[^}]*$//'
c\{d\}e
Or even:
$ php -r '$s="ab{cde}f"; print preg_replace("/[^{]*[{](.+)}.*/", "$1", $s) . "\n";'
cde
$ php -r '$s="ab{c\{d\}e}f"; print preg_replace("/[^{]*[{](.+)}.*/", "$1", $s) . "\n";'
c\{d\}e
Obviously, this does not handle escaped backslashes. :-)

\{(.+)\} would extract everything between the first and the last curly bracket.

Related

Capture word after pattern with slash

I want to extract word1 from:
something /CLIENT_LOGIN:word1 something else
I would like to extract the first word after matching pattern /CLIENT_LOGIN:.
Without the slash, something like this works:
A='something /CLIENT_LOGIN:word1 something else'
B=$(echo "$A" | awk '$1 == "CLIENT_LOGIN" { print $2 }' FS=":")
With the slash though, I can't get it working (I tried putting / and \/ in front of CLIENT_LOGIN). I don't mind whether it is done with awk, grep, sed, ...
Using sed:
s='something /CLIENT_LOGIN:word1 something else'
sed -E 's~.* /CLIENT_LOGIN:([^[:blank:]]+).*~\1~' <<< "$s"
word1
Details:
We use ~ as regex delimiter in sed
/CLIENT_LOGIN:([^[:blank:]]+) matches /CLIENT_LOGIN: followed by 1+ non-blank characters (anything but space or tab), which are captured in group #1
.* on both sides matches text before and after our match
\1 is used in substitution to put 1st group's captured value back in output
1st solution: With your shown samples, please try the following GNU grep solution.
grep -oP '^.*? /CLIENT_LOGIN:\K(\S+)' Input_file
Explanation: This uses GNU grep's -o option (print only the matched part) and -P option (enable PCRE regex). In the regex ^.*? /CLIENT_LOGIN:\K(\S+), the lazy ^.*? matches from the start of the line up to the very first occurrence of /CLIENT_LOGIN:. \K then forgets everything matched so far, so that only what follows is printed: \S+, i.e. one or more non-space characters up to the next whitespace.
2nd solution: Using awk's match function along with its split function to print the required value.
awk '
match($0,/\/CLIENT_LOGIN:[^[:space:]]+/){
split(substr($0,RSTART,RLENGTH),arr,":")
print arr[2]
}
' Input_file
3rd solution: Using GNU awk's FPAT variable. FPAT is set to /CLIENT_LOGIN: followed by one or more non-space characters, so each such match becomes a field. In the main block, sub replaces everything up to the : in the first field with an empty string, and the first field is then printed.
awk -v FPAT='/CLIENT_LOGIN:[^[:space:]]+' '{sub(/.*:/,"",$1);print $1}' Input_file
Performing a regex match and capturing the resulting string in BASH_REMATCH[]:
$ regex='.*/CLIENT_LOGIN:([^[:space:]]*).*'
$ A='something /CLIENT_LOGIN:word1 something else'
$ unset B
$ [[ "${A}" =~ $regex ]] && B="${BASH_REMATCH[1]}"
$ echo "${B}"
word1
Verifying B remains undefined if we don't find our match:
$ A='something without the desired string'
$ unset B
$ [[ "${A}" =~ $regex ]] && B="${BASH_REMATCH[1]}"
$ echo "${B}"
<<<=== nothing output
Fixing your awk command, you can use
A="/CLIENT_IPADDR:23.4.28.2 /CLIENT_LOGIN:xdfmb1d /MXJ_C"
B=$(echo "$A" | awk 'match($0,/\/CLIENT_LOGIN:[^[:space:]]+/){print substr($0,RSTART+14,RLENGTH-14)}')
See the online demo yielding xdfmb1d. Details:
\/CLIENT_LOGIN: - a /CLIENT_LOGIN: string
[^[:space:]]+ - one or more non-whitespace chars
The pattern above is what awk searches for, and once matched, the part of this match value after /CLIENT_LOGIN: is "extracted" using substr($0,RSTART+14,RLENGTH-14) (where 14 is the length of the /CLIENT_LOGIN: string).
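The same match-then-extract step can be sketched in Python with a capture group instead of the RSTART/RLENGTH arithmetic. The function name client_login is hypothetical, and \S+ is assumed to be an acceptable definition of "word" here, as in the answers above.

```python
import re

def client_login(line):
    """Return the token that follows '/CLIENT_LOGIN:', or None if it is absent."""
    match = re.search(r'/CLIENT_LOGIN:(\S+)', line)
    return match.group(1) if match else None

print(client_login('something /CLIENT_LOGIN:word1 something else'))
print(client_login('something without the desired string'))
```

Returning None for a missing match mirrors the BASH_REMATCH example, where B stays unset when the pattern is not found.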

How to process a regular expression after being evaluated (sed)

I need to replace each character of a regular expression, once evaluated, with each character plus the # symbol.
For example:
If the regular expression is: POS[AB]
and the input text is: POSA_____POSB
I want to get this result: P#O#S#A#_____P#O#S#B#
Please, using sed or awk.
I have tried this:
$ echo "POSA_____POSB" | sed "s/POS[AB]/&#/g"
POSA#_____POSB#
$ echo "POSA_____POSB" | sed "s/./&#/g"
P#O#S#A#_#_#_#_#_#P#O#S#B#
But what I need is:
P#O#S#A#_____P#O#S#B#
Thank you in advance.
Best regards,
Octavio
Perl to the rescue!
perl -pe 's/(POS[AB])/$1 =~ s:(.):$1#:gr/ge'
The /e interprets the replacement as code, and it contains another substitution which replaces each character with itself plus #.
In ancient Perls before 5.14 (i.e. without the /r modifier), you need the slightly more complex
perl -pe 's/(POS[AB])/$x = $1; $x =~ s:(.):$1#:g; $x/ge'
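The nested-substitution idea is not specific to Perl. As a rough Python sketch (the helper name mark_matches and its suffix parameter are invented for this example), the replacement can be a function that rewrites each match character by character:

```python
import re

def mark_matches(pattern, text, suffix='#'):
    """Append suffix to every character inside each match of pattern."""
    def expand(match):
        # Only the matched substring is rewritten; the rest is left alone.
        return ''.join(ch + suffix for ch in match.group(0))
    return re.sub(pattern, expand, text)

print(mark_matches(r'POS[AB]', 'POSA_____POSB'))
```

Because the pattern is applied first and the per-character step only sees the matched text, this should keep working for more complicated expressions than POS[AB].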
echo "POSA_____POSB" | sed "s/[^_]/&#/g"
or
echo "POSA_____POSB" | sed "s/[POSAB]/&#/g"
Try this regex:
echo "POSA_____POSB" | sed "s/[A-Z]/&#/g"
Output:
P#O#S#A#_____P#O#S#B#
You can replace text matching a regex in awk with sub (first matching substring, like sed "s///") or gsub (all matching substrings, like sed "s///g"). For simple patterns like these, the regexes themselves do not differ between sed and awk. In your case you want:
Solution 1
EDIT: edited to match the comments
The following awk will limit substitution to a given substring (e.g. 'POSA_____POSB'):
echo "OOPS POSA_____POSB" | awk '{str="POSA_____POSB"}; {gsub(/[POSAB]/,"&#",str)}; {gsub(/POSA_____POSB/, str); print $0}'
If your input consists only of the matched string, try this:
echo "POSA_____POSB" | awk '{gsub(/[POSAB]/,"&#");}1'
Explanation:
Separate '{}' blocks for each action and the explicit print are for clarity's sake.
gsub accepts 3 arguments, gsub(pattern, substitution [, target]), where target must be a variable (gsub changes it in place and stores the result there).
We use a variable named 'str' and initialize it with the value (your string) before doing any substitutions.
The second gsub has no target, so it operates on $0 (the whole record/line) and puts the modified str there.
The expressions are greedy by default: they match the longest string possible.
[] introduces a set of characters to be matched: every occurrence of any of these characters is matched. The expression above tells awk to match each occurrence of any of "POSAB".
Your first regexp does not work as expected because you told sed to match POS followed by one of [AB] (the whole string at once).
In the other expression you told it to match any single character (including "_") when you used '.' (dot).
If you want to generalize this solution you may use [[:alnum:]_] (or \w outside brackets in GNU sed and gawk), which matches any of [a-zA-Z0-9_], or [a-z], [A-Z], [0-9] to match lowercase letters, uppercase letters and digits respectively.
Solution 2
Note that you might negate character sets with [^] so: [^_] would also work in this particular case.
Explanation:
Negation means: match anything but the character between '[]'. The '^' character must come as first char, right after opening '['.
Sidenotes:
A bracket expression such as [POSAB] already matches exactly one character at a time; [POSAB]{1} says the same thing explicitly, whereas [POSAB]? would also match the empty string.
Also note that some implementations of sed need the -r or -E switch to use extended (more complicated) regexps.
With the given example you can use
echo "POSA_____POSB" | sed -r 's/POS([AB])/P#O#S#\1#/g'
This will fail for more complicated expressions.
When your input contains no \v or \r characters, you can use
echo "POSA_____POSB" |
sed -r 's/POS([AB])/\v&\r/g; :loop;s/\v([^\r])/\1#\v/;t loop; s/[\v\r]//g'

regex for strings optionally surrounded with quotes

I'm trying to build a regex that matches strings which are either surrounded with quotes or have no quotes at either side. Moreover, a string the regex has to match may have quotes in the middle. Here's a result of my efforts at the moment:
^("?+)(.*[^"])(\1)$
It works well with strings that have quotes at both start and end, have no quotes at either side, or have a quote at the start only (the last case correctly produces no match):
$ echo '"blah "blah" blah"' | perl -ne 'if(/^("?+)(.*[^"])(\1)$/){print "$1\n$2\n$3"}'
"
blah "blah" blah
"
$ echo 'blah "blah" blah' | perl -ne 'if(/^("?+)(.*[^"])(\1)$/){print "$1\n$2\n$3"}'
blah "blah" blah
$ echo '"blah "blah" blah' | perl -ne 'if(/^("?+)(.*[^"])(\1)$/){print "$1\n$2\n$3"}'
But it also matches strings that have a quote only at the end:
$ echo 'blah "blah" blah"' | perl -ne 'if(/^("?+)(.*[^"])(\1)$/){print "$1\n$2\n$3"}'
blah "blah" blah"
Any ideas what's the problem with the regex and how to fix it?
In your last case, ("?+) matches the empty string. (\1) effectively becomes a no-op: It also matches an empty string.
That leaves us with ^(.*[^"])$. This matches because your input string has a non-" character at the end: a newline ("\n").
You can fix this by removing the newline before running the regex (perl -ne 'chomp; ...').
As a side note, you might want to make the middle part of your regex optional. Otherwise it won't match the empty string or a string consisting of two quotes ("").
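The effect of the trailing newline can be reproduced in Python, whose [^"] and $ behave the same way for this case. This is only a sketch of the reduced pattern ^(.*[^"])$ from the explanation above, not of the full original regex; the names REDUCED and line are arbitrary.

```python
import re

# With an empty first group, the asker's pattern reduces to this.
REDUCED = re.compile(r'^(.*[^"])$')

# What perl -n sees: the line still carries its trailing newline.
line = 'blah "blah" blah"\n'

# The newline is a non-quote character, so [^"] can consume it.
print(bool(REDUCED.search(line)))

# After the equivalent of chomp, the last character is a quote again.
print(bool(REDUCED.search(line.rstrip('\n'))))
```

The first search succeeds and the second fails, which is the difference that chomp makes in the Perl one-liner.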

Bash regex: replace string with any number of characters

I'm trying to remove colouring codes from a string; e.g. from: \033[36;1mDISK\033[0m to: DISK
my regex looks like this: \033.*?m so match '\033' followed by any number of chars, terminated by 'm'
when I search for the pattern, it finds a match; [[ "$var" =~ $regex ]] evaluates to true
however when I try to replace matches, nothing happens and the same string is returned.
Here's my complete script:
regex="\033.*?m"
var="\033[36;1mDISK\033[0m"
if [[ "$var" =~ $regex ]]
then
echo "matches"
echo ${var//$regex}
else
echo "doesn't match!"
fi
The problem appears to be with the match any number of any character part of the regex. I can successfully replace DISK but if I change that to D.*K or D.*?K it fails.
Note in all above cases the pattern claims to match the string but fails when replacing. Not too sure where to go with this now, any help appreciated.
Thanks
The following should do it:
$ var="\033[36;1mDISK\033[0m"
$ newvar=$(printf "${var}" | sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]//g")
$ echo "${newvar}"
returns:
DISK
Now verify!
$ echo $var | od
0000000 030134 031463 031533 035466 066461 044504 045523 030134
0000020 031463 030133 005155
0000026
$ echo $newvar | od
0000000 044504 045523 000012
0000005
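Parameter expansion (${var//pattern}) matches shell glob patterns, not regular expressions, so .*? has no regex meaning there. To use the parameter expansion substitution operator, you need to use an extended glob.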
shopt -s extglob
newvar=${var//\\033\[*([0-9;])m}
To break it down:
\\033\[ - match the encoded escape character and [.
*([0-9;]) - match zero or more digits or semicolons. You could use +([0-9;]) to (more correctly?) match one or more digits or semicolons
m - the trailing m.
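For comparison, the same "ESC, [, digits and semicolons, m" shape can be written as a real regex. The Python sketch below assumes the string holds actual escape bytes (\x1b) rather than the literal four characters \033 stored in the asker's variable; the names SGR and strip_colours are illustrative only.

```python
import re

# ESC, '[', any run of digits and semicolons, then the final 'm'.
SGR = re.compile(r'\x1b\[[0-9;]*m')

def strip_colours(text):
    """Remove colour sequences of the form ESC [ ... m from text."""
    return SGR.sub('', text)

coloured = '\x1b[36;1mDISK\x1b[0m'
print(strip_colours(coloured))
```

Using * rather than + for the digit run matches the extended glob above, so a bare ESC[m is removed as well.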

how to replace a string inside perl regex

Is there a way to replace characters from inside the regex?
like so:
find x | xargs perl -pi -e 's/(as dasd asd)/replace(" ","",$1)/'
From OP's comment
find x | xargs perl -pi -e 's/work_search=1\/ttype=2\/tag=(.*?)">(.*?)<\/a>/work\/\L$1\E\" rel=\"follow\">$2<\/a>/g'
In this case I want the spaces in $1 to be replaced with _.
You can use a nested substitution:
$ echo 'foo bar baz' | perl -wpE's/(\w+ \w+ \w+)/ $1 =~ s# ##gr /e'
foobarbaz
Note that the /r modifier requires perl v5.14. For earlier versions, use:
$ echo 'foo bar baz' | perl -wpE's/(\w+ \w+ \w+)/my $x=$1; $x=~s# ##g; $x/e'
foobarbaz
Note also that you need to use a different delimiter for the inner substitution. I used #, as you can see.
As far as I understand, you want to remove the spaces. Is it correct?
You can do:
s/(as) (dasd) (asd)/$1$2$3/