Match string plus any non-whitespace character and insert whitespace

I'm trying to match and replace a string in a lot of files.
String to search for:
</ANON>[any non-whitespace char], e.g. "</ANON>." or "</ANON>)"
I want to stick a whitespace in between the tag and the non-whitespace char.
I have tried to do it with sed using something like:
sed -i -e 's/<\/ANON>/S/<\/ANON> /S/g'
but alas, that doesn't work.
Any help much appreciated.

Try the following:
sed -i -e 's|\(</ANON>\)\([^[:space:]]\)|\1 \2|g' file
sed is not Perl: \S for non-whitespace characters is a GNU extension rather than part of POSIX sed, so [^[:space:]] is the portable choice. You also need to capture groups and refer to them in the replacement part. Finally, /S cannot work: 1) it is not an escape sequence at all, and 2) sed uses the slash to separate the pattern, the replacement and the flags.
P.S. Or you can use Perl if you like:
perl -p -i -e 's|(</ANON>)(\S)|$1 $2|g' file
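To check the substitution before running it in place on many files, a minimal sketch like the following may help. It wraps the same sed expression in a shell function and feeds it sample text on standard input; the function name `space_after_anon` and the sample text are assumptions made for illustration.

```shell
# Hypothetical helper: reads stdin, writes the adjusted text to stdout.
# \1 is the closing tag, \2 is the non-whitespace character right after it.
space_after_anon() {
  sed 's|\(</ANON>\)\([^[:space:]]\)|\1 \2|g'
}

# Sample run: two tags are followed by punctuation, one by a space.
printf '%s\n' 'Call <ANON>Bob</ANON>. (<ANON>Al</ANON>) and <ANON>Jo</ANON> too' |
  space_after_anon
```

Tags that are already followed by whitespace are left untouched, so the command should be safe to run more than once.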

Related

Replace only single instance of a character using sed

I need to replace only single instance of backslash.
Input: \\apple\\\orange\banana\\\\grape\\\\\
Output: \\apple\\\orangebanana\\\\grape\\\\\
Tried using sed 's/\\//g' which is replacing all backslashes
Note: The previous character to single backslash can be anything including alphanumeric or special characters. And it's a multiline text file.
Appreciate your help on this.

If you are willing to consider perl, then lookbehind and lookahead are exactly what you need here:
perl -pe 's~(?<!\\)\\(?!\\)~~g' file
\\apple\\\orangebanana\\\\grape\\\\\
Details
(?<!\\): Negative lookbehind to make sure that previous char is not \
\\: Match a \
(?!\\): Negative lookahead to make sure that next char is not \
If you want to use sed only then I suggest:
sed -E -e ':a' -e 's~(^|[^\\])\\([^\\]|$)~\1\2~g; ta' file
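The loop (`:a` ... `ta`) in the sed variant matters because each match consumes the character after the backslash, so two lone backslashes separated by one character are not both caught in a single pass. A small sketch, assuming a sed that accepts -E (GNU and BSD both do); the function name `strip_lone_backslashes` is hypothetical:

```shell
# Hypothetical helper: delete every backslash that has no backslash
# directly before or after it. The "ta" jump repeats the substitution
# until a pass makes no further change.
strip_lone_backslashes() {
  sed -E -e ':a' -e 's~(^|[^\\])\\([^\\]|$)~\1\2~g; ta'
}

# The sample line from the question: only the backslash before "banana" goes.
printf '%s\n' '\\apple\\\orange\banana\\\\grape\\\\\' | strip_lone_backslashes

# Needs two passes: the first match uses up "b", so "\c" is handled on the retry.
printf '%s\n' 'a\b\c' | strip_lone_backslashes
```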

When you want to replace at most one single backslash, you can use
sed -r 's/(.*[^\]|^)\\([^\].*|$)/\1\2/g'
The command is ugly due to the possibility of a line starting or ending with a backslash (need to include the possibility ^ and $).
When you want to get rid of all single backslashes, as in '\al\l \\sin\gle\slas\hes \\\on \\\\a \\\\\l\i\n\e\', you can remove one backslash from every sequence of backslashes and afterwards put one back wherever at least one backslash is left:
sed -r 's/\\([\]*)/\1/g;s/([\]+)/\\\1/g'
or, as suggested by @potong,
sed -E 's/\\(\\*)/\1/g;s/(\\+)/\\&/g'
I like the solution, as it mimics someone who removes one backslash from every sequence of backslashes and then tries to undo his last operation. The "bug" in his attempt is that the resulting output is missing the single backslashes.
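The two-step idea can be traced on the question's sample line. The sketch below runs the first substitution alone and then both together; the function names are hypothetical and -E is assumed to be available.

```shell
# Step 1 only: drop one backslash from every run of backslashes.
# Lone backslashes disappear entirely at this point.
shrink_runs() {
  sed -E 's/\\(\\*)/\1/g'
}

# Steps 1 and 2: shrink every run, then add one backslash back to each
# run that still exists. Runs that vanished in step 1 stay gone.
shrink_then_restore() {
  sed -E 's/\\(\\*)/\1/g;s/(\\+)/\\&/g'
}

sample='\\apple\\\orange\banana\\\\grape\\\\\'
printf '%s\n' "$sample" | shrink_runs
printf '%s\n' "$sample" | shrink_then_restore
```

After the first step the line reads `\apple\\orangebanana\\\grape\\\\`; the second step restores every surviving run to its original length.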

With your shown samples, please try the following sed command. Written and tested with GNU sed.
sed -E 's/^(\\\\[^\]*\\\\\\)([^\]*)\\(.*)/\1\2\3/' Input_file
Explanation: The -E option enables ERE (extended regular expressions). The command relies on sed's back references, which save the matched parts so they can be reused in the replacement. The 1st capturing group holds \\apple\\\, the 2nd holds orange, and the 3rd holds the rest of the line. The single \ between orange and banana sits outside every group, so it is left out of the replacement, which is what the OP's required output calls for.

This might work for you (GNU sed):
sed 's/\>\\\<//g' file
Delete a single \ between word boundaries.

sed: What's wrong with this replace command?

I'm trying to replace the package line in a java file using this sed command:
sed -i "s/package org\.objectweb\.asm[.\w]*;/package $package;/" FILE
but it doesn't work as expected. What's wrong with this replace command?

I assume you want to match what \w would match, plus a dot. \w doesn't work like that inside a [] character set. Try [.a-zA-Z0-9_] instead:
sed -i "s/package org\.objectweb\.asm[.a-zA-Z0-9_]*;/package $package;/" FILE

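Strictly speaking, \w, the Perl-like shorthand character class that matches digits, letters or underscores, is not part of POSIX sed. GNU sed accepts it as an extension, but not inside a bracket expression, where the backslash is taken literally.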
In the POSIX pattern, instead of \w, you would use [_[:alnum:]] where [:alnum:] is a POSIX character class matching any alphanumeric chars. And [._[:alnum:]] seems to be what you meant to use here:
sed -i "s/package org\.objectweb\.asm[._[:alnum:]]*;/package $package;/" FILE
                                     ^^^^^^^^^^^^^^
However, judging by your string structure, you may just match any chars but ; with a negated bracket expression [^;]:
sed -i "s/package org\.objectweb\.asm[^;]*;/package $package;/" FILE
                                     ^^^^^
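A quick way to try the bracket-expression version without touching any file is to pipe a sample line through it. In this sketch the function name `rewrite_package` and the replacement package name are hypothetical; the new name is passed as the first argument and is assumed not to contain `/`, `&` or `\`, which are special in a sed replacement.

```shell
# Hypothetical helper: $1 is the new package name.
# The double quotes let the shell expand $1 inside the sed script.
rewrite_package() {
  sed "s/package org\.objectweb\.asm[._[:alnum:]]*;/package $1;/"
}

printf '%s\n' 'package org.objectweb.asm.tree.analysis;' |
  rewrite_package 'com.example.shaded.asm'
```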

Using sed to match regex

I don't know much about sed, nor regex. I want to replace every line that contains only tabs by the string '0'. There are also lines in my file that contain only '\n'.
Basically I want to use the regular expression ^\h+$ and replace the matches with 0.
I tried:
sed -i 's/^\h+$/0/' file.txt
But it doesn't work

You can use:
sed -i.bak -E 's/^[[:blank:]]+$/0/' file
The POSIX character class [[:blank:]] matches a space or a tab, much like \h in PCRE.
-i.bak keeps the original file as file.bak, in case you want to restore it.
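The effect on the different kinds of lines can be seen with a small stdin test. This is only a sketch; the function name `zero_blank_lines` is hypothetical. Note that [[:blank:]] also matches spaces, so a line made of spaces is replaced as well, while a completely empty line is not, because + requires at least one character.

```shell
# Hypothetical helper: lines made only of tabs/spaces become "0".
zero_blank_lines() {
  sed -E 's/^[[:blank:]]+$/0/'
}

# Four input lines: tabs only, empty, text, space plus tab.
printf '\t\t\n\nabc\n \t\n' | zero_blank_lines
```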

In GNU sed, a tab can be written as \t. In a basic regular expression, one-or-more is written with a backslash, \+ (also a GNU extension):
sed -i -e 's/^\t\+$/0/' file.txt

How to add a line break before and after a regex in a text file?

This is an excerpt from the file I want to edit:
>chr1|-|9|S|somatic ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG >chr1|+|9|Y|somatic ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG
I would like a new text file in which a line break is added before ">" and after "somatic" or "germline". How can I do this in R or Unix?
Expected output:
>chr1|-|9|S|somatic
ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG
>chr1|+|9|Y|somatic
ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG

By the looks of your input, you could simply replace spaces with newlines:
tr -s ' ' '\n' <infile >outfile
(Some tr dialects don't like \n. Try '\012' or a literal newline: opening quote, newline, closing quote.)
If that won't work, you can easily do this in sed. If somatic is static, just hard-code it:
sed -e 's/somatic */&\n/g' -e 's/ >/\n>/g' file >newfile
The usual caveats about different sed dialects apply. Some versions don't like \n for newline, some want a newline or a semicolon instead of multiple -e arguments.
On Linux, you can modify the file in-place:
sed -i 's/somatic */&\
/g
s/ >/\
>/g' file
(For variation, I'm showing how to do this if your sed doesn't recognize \n but allows literal newlines, and how to put the script in a single multi-line string.)
On *BSD (including macOS) you always need to pass an argument to -i: sed -i '' ...
If somatic is variable, but you always want to replace the first space after a wedge, try something like
sed 's/\(>[^ ]*\) /\1\n/g'
>[^ ]* matches a wedge followed by zero or more non-space characters. The parentheses capture the matched string into \1. Again, some sed variants don't want backslashes in front of the parentheses, or are otherwise just ... different.
If you have very long lines, you might bump into a sed which has problems with that. Maybe try Perl instead. (Luckily, no dialects to worry about!)
perl -i -pe 's/(>[^ ]*) /$1\n/g;s/ >/\n>/g' file
(Skip the -i option if you don't want to modify the input file. Then output will be to standard output.)
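For a sed that does not understand \n in the replacement, the capture-group variant can be combined with literal newlines. The following is a sketch under that assumption, reading from standard input; the function name `split_records` and the shortened sample sequences are made up for illustration.

```shell
# Hypothetical helper: put each header (">..." up to the first space)
# and each sequence on its own line. A backslash followed by a real
# newline inside the replacement inserts a line break.
split_records() {
  sed -e 's/\(>[^ ]*\) /\1\
/g' -e 's/ >/\
>/g'
}

printf '%s\n' '>chr1|-|9|S|somatic ACCACAG >chr1|+|9|Y|germline TTGACGT' |
  split_records
```

Because the header is matched as "wedge plus non-space characters", this handles both "somatic" and "germline" without naming either.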

(\bsomatic\b|\bgermline\b)|(?=>)
Try this and replace with $1\n. See the demo:
http://regex101.com/r/tF5fT5/53
If there's no support for lookahead, then try
(\bsomatic\b|\bgermline\b)
and replace with $1\n. See the demo:
http://regex101.com/r/tF5fT5/50
and
(>)
Replace with \n$1. See the demo:
http://regex101.com/r/tF5fT5/51

Thank you everyone!
I used:
tr -s ' ' '\n' <infile >outfile
as suggested by tripleee and it worked perfectly!

Insert space after period using sed

I've got a bunch of files that have sentences ending like this: \#.Next sentence. I'd like to insert a space after the period.
Not every occurrence of \#. lacks a space, however, so my regex checks whether the character after the period is a capital letter.
Because I'm matching one character past the period, I can't simply replace \#. with \#. plus a space, and because I don't know which character follows the period, I'm stuck.
My command currently:
sed -i .bak -E 's/\\#\.[A-Z]/<SOMETHING IN HERE>/g' *.tex
How can I grab the last letter of the matching string to use in the replacement regex?
EDIT: For the record, I'm using a BSD version of sed (I'm using OS X) - from my previous question regarding sed, apparently BSD sed (or at least, the Apple version) doesn't always play nice with GNU sed regular expressions.

The right command should be this:
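sed -i.bak -E "s/\\\#\.(\S)/\\\#. \1/g" *.tex
With it, you match any \#. followed by a non-whitespace character (\S) and insert a space by replacing the whole match with '\#. ' plus the non-whitespace character just captured. Note that \S is not part of POSIX and may not be available in BSD sed; [^[:space:]] is the portable spelling.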

Use this sed command:
sed -i.bak -E 's/(\\#\.)([A-Z])/\1 \2/g' *.tex
OR better:
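sed -i.bak -E 's/(\\#\.)([^[:space:]])/\1 \2/g' *.tex
which will insert a space whenever \#. is followed by any non-whitespace character (not just a capital letter). [^[:space:]] is used rather than [^ \t] because POSIX treats a backslash inside a bracket expression literally.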
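Before editing the .tex files in place, the capture-group approach can be tried on a sample line. This sketch uses only POSIX character classes, so it should behave the same on BSD and GNU sed; the function name `space_after_hash_dot` is hypothetical.

```shell
# Hypothetical helper: \1 is the literal \#. and \2 is the
# non-whitespace character that follows it.
space_after_hash_dot() {
  sed -E 's/(\\#\.)([^[:space:]])/\1 \2/g'
}

# One occurrence without a space, one with a space, one at end of line.
printf '%s\n' 'One\#.Two\#. Three\#.' | space_after_hash_dot
```

Occurrences already followed by a space, or sitting at the end of a line, have no second group to match and are left alone.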

This might work for you:
sed -i .bak -E 's/\\#\. ?/\\#. /g' *.tex
Explanation:
If there's a space there replace it with a space, otherwise insert a space.

I think the following would be correct:
s/\\#\.\([^[:space:]]\)/\\#. \1/g
It only replaces the expression if it is followed by a non-space character, which is captured and put back after the inserted space.
