Backreferences in sed returning wrong value


I am trying to replace an expression using sed. The regex works in vim but not in sed. I'm replacing the last dash before the number with a slash so
/www/file-name-1
should return
/www/file-name/1
I am using the following command but it keeps outputting /www/file-name/0 instead
sed 's/-[0-9]/\/\0/g' input.txt
What am I doing wrong?

You must wrap the part you want to reference later in (escaped) parentheses, and sed numbers capture groups starting at 1. To reuse the entire match without any parentheses, use the & symbol.
sed 's/-\([0-9]\)/\/\1/g' input.txt
That yields:
/www/file-name/1
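To make the difference between \1 and & concrete, here is a small sketch. It assumes a POSIX shell and reuses the sample path from the question; the variable names are only for illustration.

```shell
# Sample input from the question (variable names are illustrative).
path='/www/file-name-1'

# \1 inserts only what the escaped parentheses captured: the digit.
with_group=$(printf '%s\n' "$path" | sed 's/-\([0-9]\)/\/\1/')

# & inserts the whole match, dash included, so it cannot drop the dash.
with_amp=$(printf '%s\n' "$path" | sed 's/-[0-9]/[&]/')

printf '%s\n' "$with_group"   # /www/file-name/1
printf '%s\n' "$with_amp"     # /www/file-name[-1]
```

So & is handy when the match is kept as a whole, while a numbered group is needed as soon as part of the match has to be thrown away.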

You need to capture using parentheses before you can back-reference (backreferences start at \1). Try sed -r 's|(.*)-|\1/|':
$ sed -r 's|(.*)-|\1/|' <<< "/www/file-name-1"
/www/file-name/1
You can use any delimiter with sed, so / isn't the best choice when the substitution contains /. The -r option enables extended regexps, so the parentheses don't need to be escaped.
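A short sketch of both points (alternative delimiter, greedy .*), assuming a sed that accepts -E, which current GNU and BSD versions do. The file names with extra dashes are made up for the example, and the second command is only one possible reading of "the dash before the number".

```shell
# Greedy (.*) runs to the LAST dash, so only that dash becomes a slash.
last_dash=$(printf '%s\n' '/www/my-file-name-1' | sed -E 's|(.*)-|\1/|')

# Stricter variant (an assumption about the intent): only touch a dash
# that is followed by digits at the end of the line.
before_num=$(printf '%s\n' '/www/my-file-name-12' | sed -E 's|-([0-9]+)$|/\1|')

printf '%s\n' "$last_dash"    # /www/my-file-name/1
printf '%s\n' "$before_num"   # /www/my-file-name/12
```

With | as the delimiter, the slash in the replacement no longer needs a backslash.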

It seems sed under OS X starts counting backreferences at 1. Try \1 instead of \0

Related

Replace only single instance of a character using sed

I need to replace only single instance of backslash.
Input: \\apple\\\orange\banana\\\\grape\\\\\
Output: \\apple\\\orangebanana\\\\grape\\\\\
Tried using sed 's/\\//g' which is replacing all backslashes
Note: The previous character to single backslash can be anything including alphanumeric or special characters. And it's a multiline text file.
Appreciate your help on this.

If you are willing to consider perl, then lookbehind and lookahead are exactly what you need here:
perl -pe 's~(?<!\\)\\(?!\\)~~g' file
\\apple\\\orangebanana\\\\grape\\\\\
Details
(?<!\\): Negative lookbehind to make sure that previous char is not \
\\: Match a \
(?!\\): Negative lookahead to make sure that next char is not \
If you want to use sed only then I suggest:
sed -E -e ':a' -e 's~(^|[^\\])\\([^\\]|$)~\1\2~g; ta' file
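Why the loop (:a ... ta) is there can be seen with a quick test. This is only a sketch: strip_single is a hypothetical helper name, and the second input is invented to show single backslashes that sit one character apart.

```shell
# ERE loop from above: delete a backslash that has no backslash neighbour,
# and repeat (ta) as long as a substitution was still made.
strip_single() {
  sed -E -e ':a' -e 's~(^|[^\\])\\([^\\]|$)~\1\2~g; ta'
}

sample=$(printf '%s\n' '\\apple\\\orange\banana\\\\grape\\\\\' | strip_single)
chained=$(printf '%s\n' 'a\b\c' | strip_single)

printf '%s\n' "$sample"    # \\apple\\\orangebanana\\\\grape\\\\\
printf '%s\n' "$chained"   # abc
```

In a\b\c the first pass consumes the b as the right-hand neighbour, so the backslash before c is only caught on the second pass.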

When you want to replace at most one single backslash, you can use
sed -r 's/(.*[^\]|^)\\([^\].*|$)/\1\2/g'
The command is ugly because a line may start or end with a backslash, so the alternatives ^ and $ have to be included.
When you want to get rid of all single backslashes in a line such as '\al\l \\sin\gle\slas\hes \\\on \\\\a \\\\\l\i\n\e\', you can remove one backslash from every sequence of backslashes and afterwards put one back wherever at least one backslash is left:
sed -r 's/\\([\]*)/\1/g;s/([\]+)/\\\1/g'
or, as suggested by @potong,
sed -E 's/\\(\\*)/\1/g;s/(\\+)/\\&/g'
I like the solution, as it mimics someone who removes one backslash from every sequence of backslashes and then tries to undo that operation. The "bug" in the undo is that the single backslashes never come back.
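The two steps can be looked at separately. A minimal sketch, assuming sed -E is available and using the input line from the question; the variable names are illustrative.

```shell
input='\\apple\\\orange\banana\\\\grape\\\\\'

# Step 1 alone: drop one backslash from every run of backslashes.
step1=$(printf '%s\n' "$input" | sed -E 's/\\(\\*)/\1/g')

# Steps 1 and 2: put one backslash back in front of every surviving run.
both=$(printf '%s\n' "$input" | sed -E 's/\\(\\*)/\1/g;s/(\\+)/\\&/g')

printf '%s\n' "$step1"   # \apple\\orangebanana\\\grape\\\\
printf '%s\n' "$both"    # \\apple\\\orangebanana\\\\grape\\\\\
```

The lone backslash between orange and banana disappears in step 1, so step 2 has nothing to restore there.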

With your shown samples, please try the following sed code. Written and tested with GNU sed.
sed -E 's/^(\\\\[^\]*\\\\\\)([^\]*)\\(.*)/\1\2\3/' Input_file
Explanation: The -E option enables ERE (extended regular expressions). The regex uses three capturing groups whose contents are reused in the replacement: the first holds \\apple\\\, the second holds orange, and the third holds the rest of the line. The single \ between orange and banana is matched outside of any group, so it is left out of the replacement, which is what the OP's required output needs.

This might work for you (GNU sed):
sed 's/\>\\\<//g' file
Delete a single \ between word boundaries.

How to use grep/sed/awk, to remove a pattern from beginning of a text file

I have a text file with the following pattern written to it:
TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"
I would like to discard the first part of each line containing
TIME[32.468ms] -(3)-.............
To test the regular expression I've tried the following:
cat myfile.txt | egrep "^TIME\[.*\]\s\s\-\(3\)\-\.+"
This identifies correctly the lines I want. Now, to delete the pattern I've tried:
cat myfile.txt | sed s/"^TIME\[.*\]\s\s\-\(3\)\-\.+"//
but it just seems to be doing the cat, since it shows the content of the complete file and no substitution happens.
What am I doing wrong?
OS: CentOS 7

With your shown samples, please try the following grep command. Written and tested with GNU grep.
grep -oP '^TIME\[\d+\.\d+ms\]\s+-\(\d+\)-\.+\K.*' Input_file
Explanation: Adding detailed explanation for above code.
^TIME\[ ##Matching string TIME from starting of value here.
\d+\.\d+ms\] ##Matching digits(1 or more occurrences) followed by dot digits(1 or more occurrences) followed by ms ] here.
\s+-\(\d+\)-\.+ ##Matching spaces (1 or more occurrences) followed by - digits (1 or more occurrences) - and 1 or more dots.
\K ##Using the \K escape (PCRE, enabled by grep -P) to require the preceding part to match but drop it from the reported match, so only what follows is printed.
.* ##to match till end of the value.
2nd solution: Adding awk program here.
awk 'match($0,/^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+/){print substr($0,RSTART+RLENGTH)}' Input_file
Explanation: using match function of awk, to match regex ^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+ which will catch text which we actually want to remove from lines. Then printing rest of the text apart from matched one which is actually required by OP.

With awk, using its sub() function:
awk 'sub(/^TIME[[][^]]*].*\.+/,"")' file
"TEXT I WANT TO KEEP"
sub() returns the number of substitutions made, so a line where the replacement happened evaluates as true and is printed.

$ cut -d'"' -f2 file
TEXT I WANT TO KEEP

You may use:
s='TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"'
sed -E 's/^TIME\[[^]]*].*\.+//' <<< "$s"
"TEXT I WANT TO KEEP"

The \s regex extension may not be supported by your sed.
In BRE syntax (which is what sed speaks out of the box) you do not backslash round parentheses - doing that turns them into regex metacharacters which do not match themselves, somewhat unintuitively. Also, + is just a regular character in BRE, not a repetition operator (though you can turn it into one by similarly backslashing it: \+).
You can try adding an -E option to switch from BRE syntax to the perhaps more familiar ERE syntax, but that still won't enable Perl regex extensions, which are not part of ERE syntax, either.
sed 's/^TIME\[[^][]*\][[:space:]][[:space:]]-(3)-\.*//' myfile.txt
should work on any reasonably POSIX sed. (Notice also how the minus character does not need to be backslash-escaped, though doing so is harmless per se. Furthermore, I tightened up the regex for the square brackets, to prevent the "match anything" regex you had .* from "escaping" past the closing square bracket. In some more detail, [^][] is a negated character class which matches any character which isn't (a newline or) ] or [; they have to be specified exactly in this order to avoid ambiguity in the character class definition. Finally, notice also how the entire sed script should normally be quoted in single quotes, unless you have specific reasons to use different quoting.)
If you have sed -E or sed -r you can use + instead of * but then this complicates the overall regex, so I won't suggest that here.
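For comparison, here is the same substitution written once in BRE and once in ERE. It is a sketch under one assumption: the sample line has two spaces after the ], as the \s\s in the question suggests. Variable names are made up.

```shell
# Assumption: two spaces follow the ], as the \s\s in the question suggests.
line='TIME[32.468ms]  -(3)-............."TEXT I WANT TO KEEP"'

# BRE (default): ( and ) are literal, and * is the only repetition used.
bre=$(printf '%s\n' "$line" | sed 's/^TIME\[[^][]*\][[:space:]]*-(3)-\.*//')

# ERE (-E): ( and ) would group, so the literal ones need backslashes; + repeats.
ere=$(printf '%s\n' "$line" | sed -E 's/^TIME\[[^][]*\][[:space:]]+-\(3\)-\.+//')

printf '%s\n' "$bre"   # "TEXT I WANT TO KEEP"
printf '%s\n' "$ere"   # "TEXT I WANT TO KEEP"
```

The escaping of the round parentheses flips between the two syntaxes, which is exactly what made the original command silently match nothing.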

A simpler one for sed:
sed 's/^[^"]*//' myfile.txt

If the text you want to keep is always surrounded by double quotes like this, and no other double quotes appear on the lines starting with "TIME...", then:
sed -n '/^TIME/p' file | awk -F'"' '{print $2}'
should get the lines starting with "TIME..." and print the text within the quotes.

Thanks all, for your help.
In the end, I've found a way to make it work:
echo 'TIME[32.468ms] -(3)-.............TEXT I WANT TO KEEP' | grep TIME | sed -r 's/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//'
More generally,
grep TIME myfile.txt | sed -r 's/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//'
Cheers,
Pedro

Bash script to enclose words in single quotes

I'm trying to write a bash script to enclose words contained in a file with single quotes.
Word - Hello089
Result - 'Hello089',
I tried the following regex but it doesn't work. This works in Notepad++ with find and replace. I'm not sure how to tweak it to make it work in bash scripting.
sed "s/(.+)/'$1',/g" file.txt > result.txt

Replacement backreferences (also called placeholders) are defined with \n syntax, not $n (this is perl-like backreference syntax).
Note that you do not need groups here, though, since you want to wrap whole lines in single quotation marks. This is also why you do not need the g flag: it is only necessary when you expect multiple matches on the same line.
You can use the following POSIX BRE and ERE (the one with -E) solutions:
sed "s/..*/'&',/" file.txt > result.txt
sed -E "s/.+/'&',/" file.txt > result.txt
In the POSIX BRE (first) solution, ..* matches any char and then any 0 or more chars (thus emulating .+ common PCRE pattern). The POSIX ERE (second) solution uses .+ pattern to do the same. The & in the right-hand side is used to insert the whole match (aka \0). Certainly, you may enclose the whole match with capturing parentheses and then use \1, but that is redundant:
sed "s/\(..*\)/'\1',/" file.txt > result.txt
sed -E "s/(.+)/'\1',/" file.txt > result.txt
Note the escaping: capturing parentheses in POSIX BRE must be escaped.
See the online sed demo.
s="Hello089";
sed "s/..*/'&',/" <<< "$s"
# => 'Hello089',
sed -E "s/.+/'&',/" <<< "$s"
# => 'Hello089',

$1 is expanded by the shell before sed sees it, and it's the wrong backreference syntax anyway: you need \1. You also need to escape the parentheses that define the capture group. Because the sed script is enclosed in double quotes, where the shell itself interprets some backslash sequences, it is safest to double the backslashes.
$ echo "Hello089" | sed "s/\\(.*\\)/'\\1',/g"
'Hello089',
(I don't recall if there is a way to specify single quotes using an ASCII code instead of a literal ', which would allow you to use single quotes around the script.)
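On the question of keeping single quotes around the script: two common shell-level workarounds are sketched below. Neither relies on an ASCII escape inside sed; q is a hypothetical variable name.

```shell
word='Hello089'

# Close the single-quoted script, add an escaped quote (\'), and reopen it.
a=$(printf '%s\n' "$word" | sed 's/..*/'\''&'\'',/')

# Or keep the quote character in a shell variable and use double quotes.
q="'"
b=$(printf '%s\n' "$word" | sed "s/..*/$q&$q,/")

printf '%s\n' "$a"   # 'Hello089',
printf '%s\n' "$b"   # 'Hello089',
```

Both end up handing sed the same script, s/..*/'&',/.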

sed to add quotes around timestamps

I have a large file that contains timestamps in the following format:
2018-08-22T13:06:04.442774Z
I would like to add double quotes around all the occurrences that match this specific expression. I am trying to use sed, but I don't seem to be able to find the right command. I am trying something along these lines:
sed -e s/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}.[0-9]\{6\}Z/"$0"/g my_file.json
and I am pretty sure that the problem is around my "replace" expression.
How should I correct the command?
Thank you in advance.

You should wrap the sed replacement command with single quotes and use & instead of $0 in the RHS to replace with the whole match:
sed 's/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\.[0-9]\{6\}Z/"&"/g' file > outfile
Also, do not forget to escape the . char if you want to match a dot, and not any character.
You may also remove excessive escapes if you use ERE syntax:
sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{6}Z/"&"/g'
If you want to change the file in place, use the -i option:
sed -i 's/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\.[0-9]\{6\}Z/"&"/g' file
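Before touching the real file with -i, the ERE version can be tried on a sample line. This is a sketch: the JSON-like line and the variable names are invented, since the contents of my_file.json are not shown in the question.

```shell
# Hypothetical one-line sample; the real my_file.json is not shown in the question.
sample='{"created": 2018-08-22T13:06:04.442774Z, "updated": 2018-08-23T01:02:03.000001Z}'

# The timestamp pattern in ERE syntax, kept in a variable for readability.
ts='[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{6}Z'

# & is the whole match; g handles several timestamps on one line.
quoted=$(printf '%s\n' "$sample" | sed -E "s/$ts/\"&\"/g")

printf '%s\n' "$quoted"
# {"created": "2018-08-22T13:06:04.442774Z", "updated": "2018-08-23T01:02:03.000001Z"}
```

Note that running the command a second time on its own output would wrap the already quoted timestamps in another pair of quotes.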

The following works:
sed 's/\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\.[0-9]\{6\}Z\)/"\1"/g' my_file.json
Several modifications were made:
wrap the command in single quotes
use \( and \) to create a group (referenced by \1 in the replacement section)
escape the . character, and the { and } characters of the interval expressions

How to use sed and regex?

I need to use sed to look for all lines in a file with pattern "[whatever]|[whatever]" so I'm using the following regex:
sed '/\"[a-zA-Z0-9]+\|[a-zA-Z0-9]+\"/p' test2.txt
But it's not working, because it returns the following line from this file when it shouldn't:
RTV0031605951US|3160595|20/03/2013|0|"Laurie Graham"|"401"
Does anybody know which regex I should use? Thanks in advance

I see three problems with your regular expression:
+ is not a metacharacter in sed's default (basic) syntax, so you need to escape it to get its special meaning.
The pipe is the opposite case: it is not a metacharacter either, so leave it unescaped to match it literally.
sed prints every line by default, so add -n to suppress that when you already use the p command. Otherwise the matching lines appear twice in the output.
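The three points can be checked with a small test. This is a sketch with two made-up input lines (the second is the one from the question); it uses the portable BRE interval \{1,\} for "one or more", where GNU sed would also accept \+.

```shell
# Two test lines: only the first has the "word|word" shape.
lines=$(printf '%s\n' '"abc|def"' 'RTV0031605951US|3160595|20/03/2013|0|"Laurie Graham"|"401"')

# BRE: an unescaped | is a literal pipe; -n plus p prints each match once.
matches=$(printf '%s\n' "$lines" | sed -n '/"[a-zA-Z0-9]\{1,\}|[a-zA-Z0-9]\{1,\}"/p')

# Without -n every line is printed, and matching lines a second time.
count_without_n=$(printf '%s\n' "$lines" | sed '/"[a-zA-Z0-9]\{1,\}|[a-zA-Z0-9]\{1,\}"/p' | wc -l | tr -d ' ')

printf '%s\n' "$matches"           # "abc|def"
printf '%s\n' "$count_without_n"   # 3
```

The RTV line is not matched because no quoted field in it contains a pipe between two alphanumeric runs.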

sed will output anything that is a partial match.
To match only whole lines that match your regex, add ^ and $ to the start/end:
sed -n '/^"[a-zA-Z0-9]\+|[a-zA-Z0-9]\+"$/p' test2.txt

sed '/\B\"[ [:alnum:]]\+\"|\"[ [:alnum:]]\+\"\B/!d' file
If you use this in a sed script, do not escape double quotes.
