Sed replace hyphen with underscore - regex

I'm new to regex and have a problem. I want to replace hyphens with underscores in certain places in a file. To simplify things, let's say I want to replace only the first hyphen. Here's an example "file":
dont-touch-these-hyphens
leaf replace-these-hyphens
I want to replace hyphens in all lines found by
grep -P "leaf \w+-" file
I tried
sed -i 's/leaf \(\w+\)-/leaf \1_/g' file
but nothing happens (wrong replacement would have been better than nothing). I've tried a few tweaks but still nothing. Again, I'm new to this so I figure the above "should basically work". What's wrong with it, and how do I get what I want? Thanks.

You can simplify things by using two distinct regexes: one for matching the lines that need processing, and one for matching what must be modified.
You can try something like this:
$ sed '/^leaf/ s/-/_/' file
dont-touch-these-hyphens
leaf replace_these-hyphens
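As for why the original command did nothing: sed uses POSIX basic regular expressions by default, where an unescaped + is a literal plus sign, so \w+ looks for a word character followed by an actual "+". grep -P uses Perl syntax, which is why the same pattern matched there. A minimal sketch of two possible fixes, assuming a sed that accepts -E (GNU and BSD sed do); the bracket expression [[:alnum:]_] stands in for \w, which is a GNU extension:

```shell
input='dont-touch-these-hyphens
leaf replace-these-hyphens'

# Extended regex (-E): + is a quantifier and groups need no backslashes.
fixed_ere=$(printf '%s\n' "$input" | sed -E 's/leaf ([[:alnum:]_]+)-/leaf \1_/')

# Basic regex: write "one or more" as xx* so the GNU-only \+ is not needed.
fixed_bre=$(printf '%s\n' "$input" | sed 's/leaf \([[:alnum:]_][[:alnum:]_]*\)-/leaf \1_/')

printf '%s\n' "$fixed_ere"
```

Both variants only touch the first hyphen after "leaf "; add -i once the output looks right.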

Just use awk:
$ awk '$1=="leaf"{ sub(/-/,"_",$2) } 1' file
dont-touch-these-hyphens
leaf replace_these-hyphens
It gives you much more precise control over what you're matching (e.g. the above is doing a string instead of regexp comparison on "leaf" and so would work even if that string contained regexp metacharacters like . or *) and what you're replacing (e.g. the above only does the replacement in the text AFTER leaf and so would continue to work even if leaf itself contained -s):
$ cat file
dont-touch-these-hyphens
leaf-foo.*bar replace-these-hyphens
leaf-foobar dont-replace-these-hyphens
Correct output:
$ awk '$1=="leaf-foo.*bar"{ sub(/-/,"_",$2) } 1' file
dont-touch-these-hyphens
leaf-foo.*bar replace_these-hyphens
leaf-foobar dont-replace-these-hyphens
Wrong output:
$ sed '/^leaf-foo.*bar/ s/-/_/' file
dont-touch-these-hyphens
leaf_foo.*bar replace-these-hyphens
leaf_foobar dont-replace-these-hyphens
(note the "-" in leaf-foo being replaced by "_" in each of the last 2 lines, including the one that does not start with the string "leaf-foo.*bar").
That awk script will work as-is using any awk on any UNIX box.

Related

regex in sed removing only the first occurrence from every line

I have the following file I would like to clean up
cat file.txt
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
My desired output is:
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
I would like to remove everything between ":" and the first occurrence of "or".
I tried sed 's/MNS:d*?or /MNS:/g' though it removes the second "or" as well.
I tried every option in https://www.geeksforgeeks.org/sed-command-in-linux-unix-with-examples/
to no avail. Should I create alias sed='perl -pe'? It seems that sed does not properly support regex.
Perl is more suitable here because the task needs lazy (non-greedy) matching:
perl -pe 's|(:.*?or +)(.*)|:$2|' Input_file
With .*?or the match stops at the first or after the colon instead of running on to the last one.
This might work for you (GNU sed):
sed '/:.*\<or\>/{s/\<or\>/\n/;s/:.*\n//}' file
If a line contains : followed by the word or, then substitute the first occurrence of the word or with a unique delimiter (e.g. \n) and then remove everything between : and the unique delimiter.
Wrt I would like to remove everything between ":" and the first occurrence of "or" - no you wouldn't. The first occurrence of or in the 2nd line of sample input is at the start of orweqqwe. The text immediately after : looks like it could be any set of characters, so couldn't it contain a standalone or, e.g. MNS:2 or eqqwe or M+ GYPA*02 or GYPA*N?
Given that and the fact it's apparently a fixed number of characters to be removed on every line, it seems like this is what you should really be using:
$ sed 's/:.\{14\}/:/' file
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
If or is guaranteed to occur exactly twice per line, as in the provided example, try:
sed 's/\(MNS:\).\+ or \(.\+ or .*\)/\1\2/' file.txt
Result:
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
Otherwise, perl is the better solution because it supports shortest (lazy) matching, as in RavinderSingh13's answer.
ex (as provided by Vim) supports lazy matching with \{-}:
ex -s '+%s/:\zs.\{-}or //g|wq' input_file
The pattern :\zs.\{-}or matches any character after the first : up to the first or.

sed regex match and replace any last digit

I have lots of files containing the following IP address settings, and I want to replace the last digit of the IP address, but I am struggling to come up with the correct regex.
file1
IPADDR=10.30.2.26
NETMASK=255.255.0.0
GATEWAY=10.30.0.1
I want to change 10.30.2.26 to 10.30.2.27 using sed, but I am missing something.
I have many files to change, and the last digit could be anything.
I have tried sed 's/[^IPADDR].$/7/g' file1
How do I match anything between ^IPADDR{anything}$ ?
In your regex, [^IPADDR] is a bracket expression that matches any single character except those listed between the brackets. I'm not sure that's what you want.
You can use an address instead to select lines starting with IPADDR (/^IPADDR/) and apply the substitution command only to them:
sed '/^IPADDR/s/[0-9]$/7/' file
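A short sketch of what the two commands do to the sample file may help; the variable names are just for this illustration:

```shell
file_content='IPADDR=10.30.2.26
NETMASK=255.255.0.0
GATEWAY=10.30.0.1'

# Original attempt: [^IPADDR] matches ONE character that is not I, P, A, D or R,
# so the last two characters of every line are replaced by 7.
attempt=$(printf '%s\n' "$file_content" | sed 's/[^IPADDR].$/7/g')

# Address form: only lines starting with IPADDR are edited,
# and only the final digit is replaced.
fixed=$(printf '%s\n' "$file_content" | sed '/^IPADDR/s/[0-9]$/7/')

printf '%s\n\n%s\n' "$attempt" "$fixed"
```

The first command mangles all three lines (for example NETMASK=255.255.07), while the second changes only the IPADDR line.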
You may use the following command:
sed -r 's/(^IPADDR=[0-9.]+)([0-9]$)/\17/g' file
Prints:
IPADDR=10.30.2.27
NETMASK=255.255.0.0
GATEWAY=10.30.0.1

bash grep regexp - excluding subpattern

I have a script written in bash, with one particular grep command I need to modify.
Generally I have two patterns: A & B. There is a textfile that can contain lines with all possible combinations of those patterns, that is:
"xxxAxxx", "xxxBxxx", "xxxAxxxBxxx", "xxxxxx", where "x" are any characters.
I need to match ALL lines APART FROM the ones containing ONLY "A".
At the moment, it is done with "grep -v (A)", but this is a false track, as this would exclude also lines with "xxxAxxxBxxx" - which are OK for me. This is why it needs modification. :)
The tricky part is that this one grep lies in the middle of a 'multiply-piped' command with many other greps, seds and awks inside. Thus forming a smarter pattern would be the best solution. Others would cause much additional work on changing other commands there, and even would impact another parts of the code.
Therefore, the question is: is there a possibility to match pattern and exclude a subpattern in one grep, but allow them to appear both in one line?
Example:
A file contains those lines:
fooTHISfoo
fooTHISfooTHATfoo
fooTHATfoo
foofoo
and I need to match
fooTHISfooTHATfoo
fooTHATfoo
foofoo
a line with only "THIS" is not allowed.
You can use this awk command:
awk '!(/THIS/ && !/THAT/)' file
fooTHISfooTHATfoo
fooTHATfoo
foofoo
Or by reversing the boolean expression:
awk '!/THIS/ || /THAT/' file
fooTHISfooTHATfoo
fooTHATfoo
foofoo
You want to match lines that contain B, or don't contain A. Equivalently, you want to delete lines that contain A but not B. You could do this in sed:
sed -e '/A/{/B/!d;}'
Or in this particular case:
sed '/THIS/{/THAT/!d;}' file
Tricky for grep alone. However, replace that with an awk call: Filter out lines with "A" unless there is a "B"
echo "xxxAxxx
xxxBxxx
xxxAxxxBxxx
xxxBxxxAxxx
xxxxxx" | awk '!/A/ || /B/'
xxxBxxx
xxxAxxxBxxx
xxxBxxxAxxx
xxxxxx
grep solution. It uses Perl-compatible regular expressions (-P) for negative lookaheads, which check at each position that THAT does not start there.
grep -Pv '^((?!THAT).)*THIS((?!THAT).)*$' file
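If the filter has to stay a single grep, one possible shorter pattern, assuming GNU grep built with PCRE support (-P), is to keep the lines that either contain no THIS at all or contain THAT. This is a sketch of the same logic as the awk '!/THIS/ || /THAT/' answer, not a drop-in tested against the real pipeline:

```shell
lines='fooTHISfoo
fooTHISfooTHATfoo
fooTHATfoo
foofoo'

# ^(?!.*THIS) succeeds at the start of a line that has no THIS anywhere;
# the second alternative accepts any line that contains THAT.
kept=$(printf '%s\n' "$lines" | grep -P '^(?!.*THIS)|THAT')

printf '%s\n' "$kept"
```

Because there is no -v, this form selects the wanted lines directly instead of inverting a match on the unwanted ones.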

Grep invert on string matched, not line matched

I'll keep this explanation of why I need help to a minimum. One of my file directories got hacked through XSS, and the attack placed a long string at the beginning of all php files. I've tried to use sed to replace the string with nothing but it won't work because the pattern to match includes many many characters that would need to be escaped.
I found out that I can use fgrep to match a fixed string saved in a pattern file, but I'd like to replace the matched string (NOT THE LINE) in each file, but grep's -v inverts the result on the line, rather than the end of the matched string.
This is the command I'm using on an example file that contains the hacked string:
fgrep -v -f ~/hacked-string.txt example.php
I need the output to contain the <?php that's at the end of the line (sometimes it's a <style> tag), but the -v option inverts at the end of that line, so the output doesn't contain the <?php at the beginning.
NOTE
I've tried to use the -o or --only-matching which outputs nothing instead:
fgrep -f ~/hacked-string.txt example.php --only-matching -v
Is there another option in grep that I can use to invert on the end of the matched pattern, rather than the line where the pattern was matched? Or alternatively, is there an easier option to replace the hacked string in all .php files?
Here is a small snippet of what's in hacked-string.txt (line breaks added for readability):
]55Ld]55#*<%x5c%x7825bG9}:}.}-}!#*<%x55c%x7825)
dfyfR%x5c%x7827tfs%x5c%x7c%x785c%x5c%x7825j:^<!
%x5c%x7825w%x5c%x7860%x5c%x785c^>Ew:25tww**WYsb
oepn)%x5c%x7825bss-%x5c%x7825r%x5c%x7878B%x5c%x
7825h>#]y3860msvd},;uqpuft%x5c%x7860msvd}+;!>!}
%x5c%x7827;!%x5c%x7825V%x5c%x7827{ftmfV%x5e56+9
9386c6f+9f5d816:+946:ce44#)zbssb!>!ssbnpe_GMFT%
x5c5c%x782f#00#W~!%x5c%x7825t2w)##Qtjw)#]82#-#!
#-%x5c%x7825tmw)%x5c%x78w6*%x5c%x787f_*#fubfsdX
k5%x5c%xf2!>!bssbz)%x5c%x7824]25%x5c%x7824-8257
-K)fujs%x5c%x7878X6<#o]o]Y%x5c%x78257;utpI#7>-1
-bubE{h%x5c%x7825)sutcvt)!gj!|!*bubEpqsut>j%x5c
%x7825!*72!%x5c%x7827!hmg%x5c%x78225>2q%x5c%x7
Thanks in advance!
I think what you are asking is this:
"Is it possible to use the grep utility to remove all instances of a fixed string (which might contain lots of regex metacharacters) from a file?"
In that case, the answer is "No".
What I think you wanted to ask was:
"What is the easiest way to remove all instances of a fixed string (which might contain lots of regex metacharacters) from a file?"
Here's one reasonably simple solution:
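delete_string() {
# Remove every literal occurrence of the string passed as $1 from stdin.
# Note: awk -v expands backslash escapes in the value.
awk -v s="$1" '{while(i=index($0,s))$0=substr($0,1,i-1)substr($0,i+length(s))}1'
}
delete_string 'some_hideous_string_with*!"_inside' < original_file > new_file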
The shell syntax is slightly fragile; it will break if the string contains an apostrophe ('). However, you can read a raw string from stdin into a variable with:
$ IFS= read -r the_string
absolutely anything here
which will work with any string which doesn't contain a newline or a NUL character. Once you have the string in a variable, you can use the above function:
delete_string "$the_string" < original_file > new_file
Here's another possible one liner, using python:
delete_string() {
python -c 'import sys;[sys.stdout.write(l.replace(r"""'"$1"'""","")) for l in sys.stdin]'
}
This won't handle strings which have three consecutive quotes (""").
Is the hacked string the same in every file?
If the hacked string is 1234 bytes long, then you can use
tail -c +1235 file.php > fixed-file.php
for each infected file.
Note that tail -c +1235 starts output at the 1235th byte of the input file.
With perl:
perl -i.hacked -pe "s/\Q$(<hacked-string.txt)\E//g" example.php
Notes:
The $(<file) bit is a bash shortcut to read the contents of a file.
The \Q and \E bits are from perl, they treat the stuff in between as plain characters, ignoring regex metachars.
The -i.hacked option will edit the file in-place, creating a backup "example.php.hacked"

using sed to copy lines and delete characters from the duplicates

I have a file that looks like this:
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
I want it to look like this
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
I thought I could use sed to do this but I can't figure out how to store something in a buffer and then modify it.
Am I even using the right tool?
Thanks
You don't have to get tricky with regular expressions and replacement strings: use sed's p command to print the line intact, then modify the line and let it print implicitly
sed 'p; s/\.png//'
Glenn Jackman's response is OK, but it also doubles the rows which do not match the expression.
This one, instead, doubles only the rows which matched the expression:
sed -n 'p; s/\.png//p'
Here, -n stands for "print nothing unless explicitly printed", and the p in s/\.png//p forces the print if a substitution was done, but does not force it otherwise.
That is pretty easy to do with sed, and you do not even need to use the hold space (the sed auxiliary buffer). Given the input file below:
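The difference only shows up on lines without ".png", so here is a small sketch with one such line added (the README line is made up for the illustration):

```shell
# Sample with one line that has no ".png" (the second line is invented).
input='#"Afghanistan.png",
#"README",'

# p prints every line, then the automatic print repeats it, changed or not.
all_doubled=$(printf '%s\n' "$input" | sed 'p; s/\.png//')

# -n turns the automatic print off; the trailing p fires only after a
# successful substitution.
only_matches=$(printf '%s\n' "$input" | sed -n 'p; s/\.png//p')

printf '%s\n---\n%s\n' "$all_doubled" "$only_matches"
```

In the first output the README line appears twice; in the second it appears once.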
$ cat input
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
you should use this command:
sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
The result:
$ sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
This command is just a substitution command (s///). It matches anything starting with #" followed by non-period chars ([^.]*) and then by .png",. It also captures the non-period chars before .png", using the group brackets \( and \), so we can refer to what was matched by this group. So, this is the to-be-replaced regular expression:
#"\([^.]*\)\.png",
Next comes the replacement part of the command. The & just inserts everything that was matched by #"\([^.]*\)\.png", in the changed content. If it was the only element of the replacement part, nothing would be changed in the output. However, following the & there is a newline character - represented by the backslash \ followed by an actual newline - and in the new line we add the #" string followed by the content of the first group (\1) and then the string ",.
This is just a brief explanation of the command. Hope this helps. Also, note that you can use the \n string to represent newlines in some versions of sed (such as GNU sed). It would render a more concise and readable command:
sed 's/#"\([^.]*\)\.png",/&\n#"\1",/' input
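For comparison, since the question asked about storing something in a buffer, this is roughly how the hold space could be used for the same job. It is a sketch rather than a recommendation, and like the plain p version it also duplicates lines that contain no ".png":

```shell
input='#"Afghanistan.png",
#"Albania.png",'

# h: copy the current line into the hold space
# s: strip ".png" from the pattern space
# x: swap, so the pattern space holds the original and the hold space the copy
# G: append the hold space (the stripped copy) after a newline
result=$(printf '%s\n' "$input" | sed 'h; s/\.png//; x; G')

printf '%s\n' "$result"
```

Each input line comes out as the original followed by its stripped copy.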
I prefer this over Carles Sala's and Glenn Jackman's:
sed '/\.png/p;s/\.png//'
Could just say it's personal preference.
Or one can combine both versions and apply the duplication only on lines matching the required pattern:
sed -e '/^#".*\.png",/{p;s/\.png//;}' input