Regex contain match that should not match - regex

Given this ;-delimited string:
hap;; z
z ;d;hh
z;d;hh ;gfg;fdf ;ppp
ap;jj
lo mo;z
d;23
;;io;
b yio;b;12
b
a;b;bb;;;34
I am looking to get columns $1, $2 and $3 from any line that contains ap, b or o m in column 1.
Using this regex:
^(?:(.*?(?:ap|b|o m).*?)(?:;([^\r\n;]*))?(?:;([^\r\n;]*))?(?:;.*)?|.*)$
As shown in this demo, line 11 should not match, but it does.
As far as I understand, I cannot use a negated character class to match the parts of column 1 before and after the keyword.
Any help making line 11 not match?

You may consider this perl one-liner that works like awk (-a autosplits each line into @F, and -l strips the trailing newline so it does not end up inside the last field):
perl -F';' -MEnglish -lane 'BEGIN {$OFS=";"} print $F[0],$F[1],$F[2] if $F[0] =~ /ap|b|o m/' file
An awk solution is even simpler:
awk 'BEGIN {FS=OFS=";"} $1 ~ /ap|b|o m/{print $1,$2,$3}' file
hap;; z
ap;jj;
lo mo;z;
b yio;b;12
b ;;
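A quick way to check the field-based approach is to feed it a few of the question's rows, including the last one (a;b;bb;;;34), whose match was unwanted. This is only a minimal sketch; the variable names rows and out are illustrative and not part of the answer.

```shell
# Rows copied from the question; the last one has "b" only in column 2.
rows='hap;; z
z ;d;hh
ap;jj
a;b;bb;;;34'

# Test the pattern against column 1 only, then print the first three columns.
out=$(printf '%s\n' "$rows" |
  awk 'BEGIN { FS = OFS = ";" } $1 ~ /ap|b|o m/ { print $1, $2, $3 }')

printf '%s\n' "$out"
```

Only the rows starting with hap and ap are printed, because the test never looks past the first semicolon.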

Here is a regex that matches your data:
^([^;\n]*(?:ap|b|o m)[^;]*);((?(1)[^;]*));?((?(1)[^;]*))$
You can see it in action.
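The key change in that pattern is the [^;]* on either side of the keyword, which keeps the match from crossing into column 2. A reduced sketch of just that idea, assuming you only need to select lines (grep -E) rather than capture columns; the sample rows are from the question and matched is an illustrative name:

```shell
rows='lo mo;z
d;23
b yio;b;12
a;b;bb;;;34'

# [^;]* cannot cross a semicolon, so the keyword must sit inside column 1.
matched=$(printf '%s\n' "$rows" | grep -E '^[^;]*(ap|b|o m)[^;]*(;|$)')

printf '%s\n' "$matched"
```

The row a;b;bb;;;34 is rejected because its only b lies after the first semicolon.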

Related

Only get alphanumeric characters in capture group using sed

Input:
x.y={aaa b .c}
Note that the content within {} is only an example; in reality it could be any value.
Problem: I would like to keep only the alphanumeric characters within the {}.
So it would become:
x.y={aaabc}
Trial 0
$ echo 'x.y={aaa b .c}' | sed 's/[^[:alnum:]]\+//g'
xyaaabc
This is great, but I'd like to only modify the part within {}. So I thought this may need capture groups, hence I went ahead and tried these:
Trial 1
$ echo 'x.y={aaa b .c}' | sed -E 's/x.y=\{(.*)\}/x.y={\1}/'
x.y={aaa b .c}
Here I have captured the content I want to modify (aaa b .c) correctly, but I need a way to somehow do s/[^[:alnum:]]\+//g only on \1.
Instead, I tried capturing all alphanumeric characters only (to \1) like this:
Trial 2
$ echo 'x.y={aaa b .c}' | sed -E 's/x.y=\{([[:alnum:]]+)\}/x.y={\1}/'
x.y={aaa b .c}
Of course, it doesn't work because I'm only expecting alnum's and then immediately a } literal. I didn't tell it to ignore the non-alnum's. I.e., this part:
s/x.y=\{([[:alnum:]]+)\}/x.y={\1}/
      ^^^^^^^^^^^^^^^^^^
It literally matches: an open brace, some alnum's, and a closing brace -- which is not what I want. I'd like it to match everything, but only capture the alnum's.
Example of input/output:
x.y={aaa b .c} blah
blah
x.y={1 2 3 def} blah
blah
to
x.y={aaabc} blah
blah
x.y={123def} blah
blah
I searched the web before finally giving up and posting the question, but I didn't find anything helpful, as I didn't see anyone with a problem similar to mine. I would appreciate some help with this, as I'd love to have a better understanding of variables in regex/sed. Thanks!
With your shown samples, please try the following awk. Written and tested in GNU awk.
awk '
match($0,/\{[^}]*}/){
val=substr($0,RSTART,RLENGTH)
gsub(/[^{}a-zA-Z0-9]/,"",val)
$0=substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH)
}
1
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
match($0,/\{[^}]*}/){ ##using match function of awk to match from { to first occurrence of }
val=substr($0,RSTART,RLENGTH) ##Creating val which has sub string of matched regex in it.
gsub(/[^{}a-zA-Z0-9]/,"",val) ##Globally removing everything apart from { } and alphanumeric characters in val.
$0=substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH) ##saving everything before match val and everything after match here.
}
1 ##Printing the current line, whether or not it was edited above.
' Input_file ##Mentioning Input_file name here.
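To make the roles of RSTART and RLENGTH concrete, here is a small sketch that only reports what match() found on one sample line from the question. The bracket form [{] and [}] is used here purely to sidestep brace-escaping differences between awk implementations; line and pos are illustrative names.

```shell
line='x.y={aaa b .c} blah'

# match() sets RSTART (1-based start) and RLENGTH (length of the match).
pos=$(printf '%s\n' "$line" |
  awk 'match($0, /[{][^}]*[}]/) { print RSTART, RLENGTH, substr($0, RSTART, RLENGTH) }')

printf '%s\n' "$pos"
```

For this line it prints 5 10 {aaa b .c}: the brace group starts at character 5 and is 10 characters long, which is exactly the slice the answer rewrites.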
Generic solution: In case you have multiple occurrences of { and }, try the following awk code.
awk '
{
line=""
while(match($0,/\{[^}]*}/)){
val=substr($0,RSTART,RLENGTH)
gsub(/[^{}a-zA-Z0-9]/,"",val)
line=(line?line:"") (substr($0,1,RSTART-1) val)
$0=substr($0,RSTART+RLENGTH)
}
$0=line $0
}
1
' Input_file
With sed (tested on GNU sed, syntax may vary for other implementations):
$ sed -E ':a s/(\{[[:alnum:]]*)[^[:alnum:]]+([^}]*})/\1\2/; ta' ip.txt
x.y={aaabc} blah
blah
x.y={123def} blah
blah
:a marks that location as label a (used to jump using ta as long as the substitution succeeds)
(\{[[:alnum:]]*) matches { followed by zero or more alnum characters
[^[:alnum:]]+ matches one or more non-alnum characters
([^}]*}) matches till the next } character
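The loop described above can be tried on a line with two brace groups. The sketch below is a variation, not the answer's exact command: it excludes } from the deleted run ([^}[:alnum:]]+) on the assumption that a match should never run past a closing brace, and it splits the script into -e parts so the label is not tied to GNU syntax. The names line and out are illustrative.

```shell
line='x.y={a b} z={c .d}'

# Loop: delete one run of characters that are neither alphanumeric nor "}"
# inside a brace group, and repeat until no substitution succeeds.
out=$(printf '%s\n' "$line" |
  sed -E -e ':a' -e 's/(\{[[:alnum:]]*)[^}[:alnum:]]+([^}]*\})/\1\2/' -e 'ta')

printf '%s\n' "$out"
```

Each pass removes a single run, which is why the ta branch is needed: without it only the first space in {a b} would be deleted.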
If perl is okay:
$ perl -pe 's/\{\K[^}]+(?=\})/$&=~s|[^a-z\d]+||gir/e' ip.txt
x.y={aaabc} blah
blah
x.y={123def} blah
blah
\{\K[^}]+(?=\}) match sequence of { to } (assuming } cannot occur in between)
\{\K and (?=\}) are used to avoid the braces from being part of the matched portion
e flag allows you to use Perl code in replacement portion, in this case another substitute command
$&=~s|[^a-z\d]+||gir here, $& refers to entire matched portion, gi flags are used for global/case-insensitive and r flag is used to return the value of this substitution instead of modifying $&
[^a-z\d]+ matches non-alphanumeric characters (assuming ASCII, you can also use [^[:alnum:]]+)
use \W+ if you want to preserve underscores as well
For both solutions, you can add x\.y= prefix if needed to narrow the scope of matching.
Here is another gnu-awk solution using FPAT:
s='x.y={aaa b .c}'
awk -v OFS= -v FPAT='{[^}]+}|[^{}]+' '
{
for (i=1; i<=NF; ++i)
if ($i ~ /^{/) $i = "{" gensub(/[^[:alnum:]]+/, "", "g", $i) "}"
} 1' <<< "$s"
x.y={aaabc}

Regular expression with conditional replacement

I am trying to write a RegEx for replacing a character in a string, given that a condition is met. In particular, if the string ends in y, I would like to replace all instances of a to o and delete the final y. To illustrate what I am trying to do with examples:
Katy --> Kot
cat --> cat
Kakaty --> Kokot
avidly --> ovidl
I was using the RegEx s/\(\w*\)a\(\w*\)y$/\1o\2/g but it does not work. I was wondering how would one be able to capture the "conditional" nature of this task with a RegEx.
Your help is always most appreciated.
With GNU sed:
If a line ends with y (/y$/), replace every a with o and replace trailing y with nothing (s/y$//).
sed '/y$/{y/a/o/;s/y$//}' file
Output:
Kot
cat
Kokot
ovidl
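The "conditional" part of the task is simply a test that guards the two edits. As an illustration of the same logic outside sed, here is a sketch in plain POSIX shell using case and tr (a swapped-in technique, not what the answer uses); convert is a hypothetical helper name.

```shell
# Hypothetical helper: applies the rule to a single word.
convert() {
  case $1 in
    *y) printf '%s\n' "${1%y}" | tr a o ;;   # ends in y: drop it, then a -> o
    *)  printf '%s\n' "$1" ;;                # otherwise leave unchanged
  esac
}

for w in Katy cat KaKaty avidly; do
  convert "$w"
done
```

The case pattern *y plays the role of the /y$/ address: words that do not end in y fall through untouched.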
You may use awk:
Input:
cat file
Katy
cat
KaKaty
avidly
Command:
awk '/y$/{gsub(/a/, "o"); sub(/.$/, "")} 1' file
Kot
cat
KoKot
ovidl
You could use some sed spaghetti code, but please don't:
sed '
s/y$// ; # try to replace trailing y
ta ; # if successful, goto a
bb ; # otherwise, goto b
:a
y/a/o/ ; # replace a with o
:b
'

How can I replace a character on specific strings on a file?

I have to replace the character '.' for an '_' but only on specific regions of the file (the function names), I have a file like this:
\name{function.name.something}
\usage{function.name.something(parameter.something, parameter2.something)}
I was thinking of using notepad++ or sed, and only replace on the captured groups, for example the first line would be:
\\name\{(.+)\}
and replace it with \\name\{\1\}
but with the group 1 (\1) having the dots replaced by underscores
I appreciate any help and thank you
Using gnu-awk:
awk -v FPAT='\\\\name{[^}]+}|\\S+' '{gsub(/\./, "_", $1)} 1' file
\name{function_name_something}
\usage{function_name_something(parameter_something, parameter2.something)}
FPAT='\\\\name{[^}]+}|\\S+' will parse each field using given regex here which is \name{...} OR some non-space string (default awk field).
More testing:
cat file
\name{function.name.something} abc.foo.bar
\usage{function.name.something(parameter.something, parameter2.something)}
awk -v FPAT='\\\\name{[^}]+}|\\S+' '{gsub(/\./, "_", $1)} 1' file
\name{function_name_something} abc.foo.bar
\usage{function_name_something(parameter_something, parameter2.something)}
Perl solution:
< file.txt perl -pe '
($n, $f) = /(\\name|\\usage)\{(.*?[}(])/
and s/\Q$n\E\{\Q$f\E/"$n\{" . ($f=~s=\.=_=gr)/e'
Needs Perl 5.14+, otherwise you have to write
($n, $f) = /(\\name|\\usage)\{(.*?[}(])/
and s/\Q$n\E\{\Q$f\E/"$n\{" . do { ($ff = $f) =~ s=\.=_=g; $ff }/e
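Since the question also mentions sed, here is a sed-only sketch of the same idea. It assumes the function name is whatever sits between the opening brace and the first } or (, and it replaces one dot per pass in a loop. The variable names input and out are illustrative.

```shell
input='\name{function.name.something}
\usage{function.name.something(parameter.something, parameter2.something)}'

# Replace one dot at a time, only between the opening brace and the first
# "}" or "(", looping until no dot is left in that stretch.
out=$(printf '%s\n' "$input" |
  sed -E -e ':a' -e 's/^(\\(name|usage)\{[^}(.]*)\./\1_/' -e 'ta')

printf '%s\n' "$out"
```

Because [^}(.]* stops at the first parenthesis, the dots in the parameter list of \usage are left alone.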

Change value on 11th column based on 9th column using sed

I have a text file that has whitespace-separated values. The 9th column has a field that needs to be matched (ice), and the 11th column needs to be substituted when it matches. Example:
a b c d e f g h ice j k l m
Intended output :
a b c d e f g h ice j keep l m
I'm trying to use this:
sed -i -r 's/ice [^ ]*/ice keep/' test.log
But it gives this:
a b c d e f g h ice keep k l m
Please help me. I'm not familiar with sed and regex.
This is more suitable for awk or any other tool that understands columns:
awk '{if ($9=="ice") {$11="keep"} print}' inputfile
Fields in awk are delimited by space by default. $9 would denote the 9th field. If the 9th field is ice, change the 11th to keep.
For your input, it'd produce:
a b c d e f g h ice j keep l m
You could do it using sed too, but it's not quite straightforward:
sed -r 's/^(([^ ]+ ){8}ice \S+ )(\S+)/\1keep/' inputfile
Here ([^ ]+ ){8} would match the first 8 fields. (([^ ]+ ){8}ice \S+ ) matches 10 fields with the 9th being ice and captures it into a group that is substituted later making use of a backreference, \1.
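The counted group is what ties the match to column positions. A small sketch, assuming single spaces between fields and using [^ ]+ in place of \S+ so it does not depend on GNU extensions, shows that a line with ice in a different column is left untouched (rows and out are illustrative names):

```shell
rows='a b c d e f g h ice j k l m
a b ice d e f g h i j k l m'

# Eight fields, then "ice" as the 9th, one more field, then the 11th is replaced.
out=$(printf '%s\n' "$rows" |
  sed -E 's/^(([^ ]+ ){8}ice [^ ]+ )[^ ]+/\1keep/')

printf '%s\n' "$out"
```

Only the first row changes; in the second row ice is the 3rd field, so the anchored pattern fails and the line passes through as-is.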
This might work for you (GNU sed):
sed -r '/^((\S+\s+){8}ice\s+\S+\s)\S+/s//\1keep/' file
This matches the 9th non-spaced value to ice and then changes the 11th non-spaced value to keep.
This works on your sample, but it takes no account of the column number. Do you really need the column reference, or just the content?
sed '/ice/ s/k/keep/' YourFile

sed remove and replace certain match

I have the following in a file.
A 01/13/13 \\0101 \\0102 \\0103
C 04/19/13 \\0301 \\0302 \\0303 \\0304 \\0305
F 04/05/13 \\0602 \\0603 \\0604
I want to replace the first \\ with the letter at the beginning of the line followed by an underscore. It's always one letter. Everything after that field should be removed. There is only one space between each section of the lines, if that helps.
The desired outcome should be
A 01/13/13 A_0101
C 04/19/13 C_0301
F 04/05/13 F_0602
I tried using grep. How can I do this using sed?
One way to do this is to enable extended regular expressions by passing the -r flag.
sed -re 's/^(.) (\S+) \\\\(\S+).*$/\1 \2 \1_\3/' file
Output
A 01/13/13 A_0101
C 04/19/13 C_0301
F 04/05/13 F_0602
awk is probably better suited to handling this
awk '{sub(/../, $1"_", $3); print($1, $2, $3)}' file.txt
A 01/13/13 A_0101
C 04/19/13 C_0301
F 04/05/13 F_0602
You can also use awk like this
awk '{sub(/\\\\/,x);print $1,$2,$1"_"$3}' file
A 01/13/13 A_0101
C 04/19/13 C_0301
F 04/05/13 F_0602
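Another variation, sketched here under the assumption that the third field always starts with exactly two backslashes, avoids regex escaping altogether by slicing the field with substr (rows and out are illustrative names):

```shell
rows='A 01/13/13 \\0101 \\0102 \\0103
C 04/19/13 \\0301 \\0302 \\0303 \\0304 \\0305'

# substr($3, 3) skips the two leading backslashes of the third field.
out=$(printf '%s\n' "$rows" | awk '{ print $1, $2, $1 "_" substr($3, 3) }')

printf '%s\n' "$out"
```

Printing only $1, $2 and the rebuilt third field is what drops the remaining columns.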