How can I replace a character on specific strings on a file? - regex

I have to replace the character '.' for an '_' but only on specific regions of the file (the function names), I have a file like this:
\name{function.name.something}
\usage{function.name.something(parameter.something, parameter2.something)}
I was thinking of using notepad++ or sed, and only replace on the captured groups, for example the first line would be:
\\name\{(.+)\}
and replace it with \\name\{\1\}
but with the group 1 (\1) having the dots replaced by underscores
I appreciate any help and thank you

Using gnu-awk:
awk -v FPAT='\\\\name{[^}]+}|\\S+' '{gsub(/\./, "_", $1)} 1' file
\name{function_name_something}
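\usage{function_name_something(parameter_something, parameter2.something)}
FPAT='\\\\name{[^}]+}|\\S+' makes gawk define each field by the given regex: either \name{...} or a run of non-space characters (like awk's default fields). gsub then replaces dots in the first field only. On a \usage line that field is the function name plus its first parameter, so both get underscores.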
More testing:
cat file
\name{function.name.something} abc.foo.bar
\usage{function.name.something(parameter.something, parameter2.something)}
awk -v FPAT='\\\\name{[^}]+}|\\S+' '{gsub(/\./, "_", $1)} 1' file
\name{function_name_something} abc.foo.bar
\usage{function_name_something(parameter_something, parameter2.something)}

Perl solution:
< file.txt perl -pe '
($n, $f) = /(\\name|\\usage)\{(.*?[}(])/
and s/\Q$n\E\{\Q$f\E/"$n\{" . ($f=~s=\.=_=gr)/e'
Needs Perl 5.14+ (for the /r substitution modifier); otherwise you have to write
($n, $f) = /(\\name|\\usage)\{(.*?[}(])/
and s/\Q$n\E\{\Q$f\E/"$n\{" . do { ($ff = $f) =~ s=\.=_=g; $ff }/e
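A portable alternative, offered as a sketch: plain POSIX awk can do this without gawk's FPAT by matching the tag plus the function name and rewriting only that slice with match() and substr(). It assumes at most one \name{ or \usage{ tag per line and that the function name ends at the first ( or }; fix_names is a hypothetical helper name.

```shell
# Hypothetical helper: replace dots with underscores only in the function
# name that follows \name{ or \usage{ (up to the first "(" or "}").
# Assumes at most one such tag per line; plain POSIX awk, no gawk FPAT.
fix_names() {
  awk '
  match($0, /\\(name|usage)[{][^}(]*/) {
    head = substr($0, 1, RSTART - 1)        # text before the tag
    name = substr($0, RSTART, RLENGTH)      # the tag and the function name
    tail = substr($0, RSTART + RLENGTH)     # "}" or "(args...)}" and the rest
    gsub(/\./, "_", name)                   # touch only the name slice
    $0 = head name tail
  }
  { print }
  ' "$@"
}
```

Run it as fix_names file. Unlike the FPAT one-liner, parameter names inside \usage{...} keep their dots, which matches the question's "only the function names" requirement.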

Related

How to check last 3 chars of a string are alphabets or not using awk?

I want to check whether the last 3 characters in column 1 are letters and print those rows. What am I doing wrong?
My code :-
awk -F '|' ' {print str=substr( $1 , length($1) - 2) } END{if ($str ~ /^[A-Za-z]/ ) print}' file
cat file
12300USD|0392
abc56eur|97834
238aed|23911
aabccde|38731
73716yen|19287
.*/|982376
0NRT0|928731
expected output :
12300USD|0392
abc56eur|97834
238aed|23911
aabccde|38731
73716yen|19287
$ awk -F'|' '$1 ~ /[[:alpha:]]{3}$/' file
12300USD|0392
abc56eur|97834
238aed|23911
aabccde|38731
73716yen|19287
Regarding what's wrong with your script:
You're doing the test for alphabetic characters in the END section for the final line read instead of once per input line.
You're trying to use shell variable syntax $str instead of awk str.
You're testing for literal character ranges in the bracket expression instead of using a character class so YMMV on which characters that includes depending on your locale.
You're testing for a string that starts with a letter instead of a string that ends with 3 letters.
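Applying those four fixes to the asker's own substr() approach gives something like the sketch below. last3_alpha is a hypothetical wrapper name, and [[:alpha:]] assumes an awk that supports POSIX character classes (gawk and recent mawk versions do).

```shell
# Hypothetical wrapper around the asker's substr() idea, with the fixes:
# the test runs per input line (not in END), uses the awk variable str
# (not $str), uses [[:alpha:]], and is anchored to exactly 3 letters.
last3_alpha() {
  awk -F'|' '{
    str = substr($1, length($1) - 2)   # last 3 characters of field 1
    if (str ~ /^[[:alpha:]][[:alpha:]][[:alpha:]]$/)
      print
  }' "$@"
}
```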
Use grep:
grep -P '^[^|]*[A-Za-z]{3}[|]' in_file > out_file
Here, GNU grep uses the following option:
-P : Use Perl regexes.
The regex means this:
^ : Start of the string.
[^|]* : Any non-pipe character, repeated 0 or more times.
[A-Za-z]{3} : 3 letters.
[|] : Literal pipe.
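-P is not actually required for this pattern: {3} is ordinary POSIX ERE syntax, so grep -E accepts the same regex. A minimal sketch (last3_grep is a hypothetical wrapper name; [[:alpha:]] replaces [A-Za-z] for the locale reason given in the awk answer):

```shell
# Hypothetical wrapper: the same pattern as a POSIX ERE, no -P needed.
last3_grep() {
  grep -E '^[^|]*[[:alpha:]]{3}[|]' "$@"
}
```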
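Or with sed, grep, or awk (here {m,g}awk means "run with either mawk or gawk", not a literal command name):
sed -n '/^[^|]*[A-Za-z][A-Za-z][A-Za-z]|/p' file
grep '^[^|]*[A-Za-z][A-Za-z][A-Za-z]|' file
{m,g}awk '!+FS<NF' FS='^[^|]*[A-Za-z][A-Za-z][A-Za-z][|]' file
{m,g}awk '$!_!~"[|]"' FS='[A-Za-z][A-Za-z][A-Za-z][|]' file
{m,g}awk '($!_~"[|]")<NF' FS='[A-Za-z][A-Za-z][A-Za-z][|]' file # to play it safe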
12300USD|0392
abc56eur|97834
238aed|23911
aabccde|38731
73716yen|19287

How to match and cut the string with different conditions using sed?

I want to grep the string that comes after WORK= and skip it if parentheses follow that string.
The text looks like this :
//INALL TYPE=GH,WORK=HU.ET.ET(IO)
//INA2 WORK=HU.TY.TY(OP),TYPE=KK
//OOPE2 TYPE=KO,WORK=TEXT.LO1.LO2,TEXT
//OOP2 TYPE=KO,WORK=TEST1.TEST2
//H1 WORK=OP.TEE.GHU,TYPE=IU
So the desired output is only:
TEXT.LO1.LO2
TEST1.TEST2
OP.TEE.GHU
So far, I could only match and cut everything before WORK=, but could not remove WORK= itself:
sed -E 's/(.*)(WORK=.*)/\2/'
I am not sure how to continue. Can anyone help, please?
You can use
sed -n '/WORK=.*([^()]*)/!s/.*WORK=\([^,]*\).*/\1/p' file > newfile
Details:
-n - suppresses the default line output
/WORK=.*([^()]*)/! - if a line contains WORK= followed by any text and then a (...) substring, skip it
s/.*WORK=\([^,]*\).*/\1/p - otherwise, remove everything up to and including WORK=, capture into Group 1 any zero or more characters other than a comma, and remove the rest of the line; p prints the result.
See the sed demo:
s='//INALL TYPE=GH,WORK=HU.ET.ET(IO)
//INA2 WORK=HU.TY.TY(OP),TYPE=KK
//OOPE2 TYPE=KO,WORK=TEXT.LO1.LO2,TEXT
//OOP2 TYPE=KO,WORK=TEST1.TEST2
//H1 WORK=OP.TEE.GHU,TYPE=IU'
sed -n '/WORK=.*([^()]*)/!s/.*WORK=\([^,]*\).*/\1/p' <<< "$s"
Output:
TEXT.LO1.LO2
TEST1.TEST2
OP.TEE.GHU
Could you please try the following awk, written and tested with the shown samples in GNU awk.
awk '
match($0,/WORK=[^,]*/){
val=substr($0,RSTART+5,RLENGTH-5)
if(val!~/\([a-zA-Z]+\)/){ print val }
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
match($0,/WORK=[^,]*/){ ##Using match function to match WORK= till comma comes.
val=substr($0,RSTART+5,RLENGTH-5) ##Creating val with sub string of match regex here.
if(val!~/\([a-zA-Z]+\)/){ print val } ##Checking if val does not contain ( letters ), then print val here.
}
' Input_file ##Mentioning Input_file name here.
This might work for you (GNU sed):
sed -n '/.*WORK=\([^,]\+\).*/{s//\1/;/(.*)/!p}' file
Extract the string following WORK= and if that string does not contain (...) print it.
This will work if there is at most one occurrence of WORK= per line and the exclusion depends only on (...) occurring within that string, not in other following fields.
For a global solution with the same stipulations for parens:
sed -n '/WORK=\([^,]\+\)/{s//\n\1\n/;s/[^\n]*\n//;/(.*).*\n/!P;D}' file
N.B. This prints each such string on a separate line and excludes empty strings.
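The \+ in these commands is a GNU sed extension. If a non-GNU sed has to be supported, a sketch of the first command in POSIX BRE spells "one or more non-commas" as [^,][^,]* and puts the commands inside { } on separate lines (work_values is a hypothetical wrapper name):

```shell
# Hypothetical wrapper: portable form of the first GNU sed command above.
# The empty regex in s//\1/ reuses the address regex, as POSIX allows.
work_values() {
  sed -n '/.*WORK=\([^,][^,]*\).*/{
s//\1/
/(.*)/!p
}' "$@"
}
```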

Replacing spaces with underscores within quotes

I need to replace within a large text file all occurrences such as 'yw234DV w-23-sDf wef23s-d-f' with the same strings but with underscores instead of spaces for all spaces within quotes, without replacing any spaces outside quotes with underscores.
I'm trying to find a solution for substitution within vim, but a sed solution would also be much appreciated. The number of tokens in each quote-delimited string may vary.
I've been playing with some regexes in vim, but they're pretty elementary and seem to be missing what I need.
My current attempt:
%s/'{[:alnum:] }*/'\0\_/g
And I'm experimenting with variations on that.
This is most similar to my question, though it is Java:
Replacing spaces within quotes
Sample Input:
'wiUEF7-gvouw ow wo24-RTeih we', 'yt23IT iug-76'
Sample Output:
'wiUEF7-gvouw_ow_wo24-RTeih_we', 'yt23IT_iug-76'
You may try this in Vim (tested in MacVim):
%s/\%('[^']*'\)*\('[^']*'\)/\=substitute(submatch(1), ' ', '_', 'g')/g
Much simpler solution, thanks to @SergioAraujo:
%s/\v%(('[^']*'))/\=substitute(submatch(1),' ', '_', 'g')/g
Not sure, however, whether the output below is what you expected:
Output:
'wiUEF7-gvouw_ow_wo24-RTeih_we', 'yt23IT_iug-76'
In Perl:
perl -i -pe's{(\x27.*?\x27)}{ (my $subst = $1) =~ tr/ /_/; $subst }ge' yourfile
or with Perl 5.14 or above:
perl -i -pe's{(\x27.*?\x27)}{ $1 =~ tr/ /_/r }ge' yourfile
With this the input file:
$ cat file
'wiUEF7-gvouw ow wo24-RTeih we', 'yt23IT iug-76'
We can convert all spaces inside of single-quotes into underscores with:
$ sed -E ":a; s/^(([^']*'[^']*')*[^']*'[^']*)[[:space:]]/\1_/; ta" file
'wiUEF7-gvouw_ow_wo24-RTeih_we', 'yt23IT_iug-76'
How it works
:a
This creates a label a.
s/^(([^']*'[^']*')*[^']*'[^']*)[[:space:]]/\1_/
This inserts the underscores where we want them.
^(([^']*'[^']*')*[^']*'[^']*)[[:space:]]
This looks for any odd number of single quotes followed by any number of non-quote characters followed by a space. Everything before that space is saved in group 1.
\1_
This replaces the matched text with group 1 followed by an underscore.
ta
If the previous command put any new underscores in the string, then jump back to label a and try again.
Using FPAT variable in gnu awk you can do this:
awk -v OFS=', ' -v FPAT="'[^']*'" '{for (h=1; h<=NF; h++)
{gsub(/[[:blank:]]/, "_", $h); printf "%s%s", $h, (h < NF ? OFS : ORS)}}' file
'wiUEF7-gvouw_ow_wo24-RTeih_we', 'yt23IT_iug-76'
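Note that the FPAT command prints only the quoted fields, rejoined with ", ", so any other text on the line is lost. A sketch that keeps the whole line instead walks it character by character and toggles an "inside quotes" flag. It assumes quotes are balanced on each line and never escaped, and it converts only spaces (not tabs); quote_underscores is a hypothetical name.

```shell
# Hypothetical helper: spaces inside single-quoted spans become underscores,
# everything else on the line is kept as-is. Plain POSIX awk.
quote_underscores() {
  awk '{
    out = ""; inq = 0
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\047") inq = !inq          # \047 is a single quote
      else if (c == " " && inq) c = "_"   # space inside quotes
      out = out c
    }
    print out
  }' "$@"
}
```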

Linux Replace With Variable Containing Double Quotes

I have read the following:
How Do I Use Variables In A Sed Command
How can I use variables when doing a sed?
Sed replace variable in double quotes
I have learned that I can use sed "s/STRING/$var1/g" to replace a string with the contents of a variable. However, I'm having a hard time finding out how to replace with a variable that contains double quotes, brackets and exclamation marks.
Then, hoping to escape the quotes, I tried piping my result though sed 's/\"/\\\"/g' which gave me another error sed: -e expression #1, char 7: unknown command: E'. I was hoping to escape the problematic characters and then do the variable replacement: sed "s/STRING/$var1/g". But I couldn't get that far either.
I figured you guys might know a better way to replace a string with a variable that contains quotes.
File1.txt:
Example test here
<tag>"Hello! [world]" This line sucks!</tag>
End example file
Variable:
var1=$(cat file1.txt)
Example:
echo "STRING" | sed "s/STRING/$var1/g"
Desired output:
Example test here
<tag>"Hello! [world]" This line sucks!</tag>
End example file
using awk
$ echo "STRING" | awk -v var="$var1" '{ gsub(/STRING/,var,$0); print $0}'
Example test here
<tag>"Hello! [world]" This line sucks!</tag>
End example file
-v var="$var1": To use shell variable in awk
gsub(/STRING/,var,$0): globally substitute all occurrences of "STRING" in the whole record $0 with var
Special case: if your var contains & (say, at the beginning of the line), it will cause problems with gsub, because & in the replacement has a special meaning: it refers to the matched text.
To deal with this situation, escape & as follows:
$ echo "STRING" | awk -v var="$var1" '{ gsub(/&/,"\\\\&",var); gsub(/STRING/,var,$0);print $0}'
&Example test here
<tag>"Hello! [world]" This line sucks!</tag>
End example file
The problem isn't the quotes. You're missing the "s" command, leading sed to treat /STRING/ as a line address, and the value of $var1 as a command to execute on matching lines. Also, $var1 has unescaped newlines and a / character that'll cause trouble in the substitution. So add the "s", and escape the relevant characters in $var1:
var1escaped="$(echo "$var1" | sed 's#[\/&]#\\&#g; $ !s/$/\\/')"
echo "STRING" | sed "s/STRING/$var1escaped/"
...but realistically, @batMan's answer (using awk) is probably a better solution.
Here is one awk command that reads the replacement text from a file, which may contain all kinds of special characters such as & or \:
awk -v pat="STRING" 'ARGV[1] == FILENAME {
# read replacement text from first file in arguments
a = (a == "" ? "" : a RS) $0
next
}
{
# now run a loop using index function and use substr to get the replacements
s = ""
while( p = index($0, pat) ) {
s = s substr($0, 1, p-1) a
$0 = substr($0, p+length(pat))
}
$0 = s $0
} 1' File1.txt <(echo "STRING")
To handle all kinds of special characters properly, this command avoids regex-based functions and uses plain-text functions such as index and substr instead.
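The same index()/substr() loop can also take the replacement from a shell variable directly. A minimal sketch, assuming the values are passed through the environment and read with ENVIRON (POSIX awk), because awk -v would process backslash escapes such as \1 or \n in the value; literal_replace is a hypothetical helper name.

```shell
# Hypothetical helper: literal_replace PATTERN REPLACEMENT < input
# Replaces every occurrence of PATTERN with REPLACEMENT as plain text.
# Both strings go through ENVIRON, so no escape or & processing applies.
literal_replace() {
  PAT=$1 REP=$2 awk '{
    pat = ENVIRON["PAT"]; rep = ENVIRON["REP"]; s = ""
    if (pat != "")                       # guard: an empty pattern would loop
      while ((p = index($0, pat)) > 0) {
        s = s substr($0, 1, p - 1) rep
        $0 = substr($0, p + length(pat))
      }
    print s $0
  }'
}
```

For the question's example this would be: echo "STRING" | literal_replace STRING "$var1"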

How to exclude patterns in regex conditionally in bash?

This is the content of input.txt:
hello=123
1234
stack=(23(4))
12341234
overflow=345
=
friends=(987)
Then I'm trying to match all the lines containing an equals sign, removing the outer parentheses (if the line has them).
To be clear, this is the result I'm looking for:
hello=123
stack=23(4)
overflow=345
friends=987
I thought of something like this:
cat input.txt | grep -Poh '.+=(?=\()?.+(?=\))?'
But it returns nothing. What am I doing wrong? Do you have any idea how to do this?
Using awk:
awk 'BEGIN{FS=OFS="="} NF==2 && $1!=""{gsub(/^\(|\)$/, "", $2); print}' file
hello=123
stack=23(4)
overflow=345
friends=987
Here is an alternate way with sed:
sed -nr ' # Use n to disable default printing and r for extended regex
/.+=.+/ { # Look for lines with key value pairs separated by =
/[(]/!ba; # If the line does not contain a paren branch out to label a
s/\(([^)]+)\)/\1/; # If the line contains a paren, remove the first ( and the first ) after it, keeping the text in between
:a # Our label
p # print the line
}' file
$ sed -nr '/.+=.+/{/[(]/!ba;s/\(([^)]+)\)/\1/;:a;p}' file
hello=123
stack=23(4)
overflow=345
friends=987
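Both sed commands above rely on GNU sed's -r. A POSIX BRE sketch that, like the awk answer, only strips parentheses wrapping the whole value (it requires both a leading ( and a trailing )) could look like this; strip_outer_parens is a hypothetical name.

```shell
# Hypothetical helper: keep key=value lines (non-empty key and value) and
# drop one pair of parentheses that wraps the entire value. POSIX BRE only.
strip_outer_parens() {
  sed -n '/^[^=][^=]*=./{
s/=(\(.*\))$/=\1/
p
}' "$@"
}
```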