SED - Non greedy regex cant seem to work in sed [duplicate]


This question already has answers here:
Non greedy (reluctant) regex matching in sed?
(27 answers)
Closed 6 years ago.
When I run the regex below in an online regex testing tool against the following text, it works fine. However, it does not work when I use it in sed on Unix.
Text:
<Field1><Field2><Field3>001</Field3></Field2><Field4><FieldDesc>Transaction Successful</FieldDesc></Field4><DtTm><LocalDtTm>2016-07-01-12:05:40.383</LocalDtTm></DtTm><Field5><Field6>N</Field6><Field7></Field7><DtTm><LocalDtTm>2016-07-01-12:05:44.171</LocalDtTm></DtTm></Field5></Field1>
RegEx:
<DtTm>(.*?)<\/DtTm>
Usage in Sed: Looking to remove anything between <DtTm> and </DtTm>
sed 's/<DtTm>(.*?)<\/DtTm>//g'
Expected Output:
<Field1><Field2><Field3>001</Field3></Field2><Field4><FieldDesc>Transaction Successful</FieldDesc></Field4><Field5><Field6>N</Field6><Field7></Field7></Field5></Field1>

GNU sed has two modes, basic and extended. Neither of these, nor the single basic mode of less advanced sed implementations, permits non-greedy quantifiers. As the info sed output puts it:
Note that the regular expression matcher is greedy, i.e., matches are attempted from left to right and, if two or more matches are possible starting at the same character, it selects the longest.
So, if you need non-greedy, you will have to choose another tool, such as Perl (or something else supporting PCRE), which is probably what the online testing tool you mentioned is using.
The good thing is, the Perl substitute command is so stunningly similar to the sed one that you can often just change the program name (and possibly use a different delimiter character in complex REs so you don't end up with sawtooths like \/\/\/\/\/):
perl -pe 's|<DtTm>.*?</DtTm>||g'

Related

bash script sed to remove www or www3 or any other www prefix from string [duplicate]

This question already has answers here:
How to extract text from a string using sed?
(5 answers)
Closed 5 years ago.
I am trying to use \d in regex in sed but it doesn't work:
sed -re 's/\d+//g'
But this is working:
sed -re 's/[0-9]+//g'

In sed, \d is not a shorthand for a digit, so it cannot stand in for [0-9]. If you want a predefined class instead of writing [0-9], use the POSIX bracket class [[:digit:]] (keep -r or -E so that + acts as a quantifier):
s/[[:digit:]]+//g
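As a quick check, the following sketch runs both spellings on a made-up input string (the string is an assumption for illustration only). It uses -E, which GNU sed treats the same as the -r from the question.

```shell
# Hypothetical input mixing letters and digits
s='abc123def45'

# POSIX character class inside a bracket expression
posix=$(printf '%s\n' "$s" | sed -E 's/[[:digit:]]+//g')

# Explicit range, equivalent for ASCII digits
range=$(printf '%s\n' "$s" | sed -E 's/[0-9]+//g')

printf '%s\n%s\n' "$posix" "$range"   # both lines print: abcdef
```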

There is no such special character group in sed. You will have to use [0-9].
In GNU sed, \d introduces a decimal character code of one to three digits in the range 0-255.

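Use extended regular expressions in sed by adding -E (GNU sed also accepts -r), so that + works as a quantifier without a backslash.
Note that \d is still not a digit class in either basic or extended mode, so write [0-9] or [[:digit:]] instead. From the man page: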
-E Interpret regular expressions as extended (modern) regular expressions rather than basic regular expressions (BRE's). The re_format(7) manual page fully describes both formats.

How can I translate a regex within vim to work with sed?

I have a string that exists within a text file that I am trying to modify with regex.
"configuration_file_for_wks_33-40"
and I want to modify it so that it looks like this
"configuration_file_for_wks_33-40_6ks"
Within vim I can accomplish this with the following regex command
%s/33-\(\d\d\)/33-\1_6ks/
But if I try to pass that regex command to sed such as
sed 's/33-\(\d\d\)/33-\1_6ks/' input_file.json
The string is not changed, even if I include the -e parameter.
I have also tried to do this using ex as
echo '%s/33-\(\d\d\)/33-\1_6ks/' | ex input_file.json
If I use
sed 's/wks_33-\(\d\d\)*/wks_33-\1_6ks/' input_file.json
then I get
configuration_file_for_wks_33-_6ks40
For that, I've tried various different escaping patterns without any luck.
Can someone help me understand why these changes are not working?

vim has a different syntax for regular expressions (which is even configurable). Unfortunately, sed doesn't understand \d (see https://unix.stackexchange.com/a/414230/304256). With -E, you can match digits with [0-9] or [[:digit:]]:
$ sed -E 's/33-[0-9][0-9]/&_6ks/' input_file.json
configuration_file_for_wks_33-40_6ks
Note that you can use & in the replacement to insert the entire matched string.
So why is this:
$ sed 's/wks_33-\(\d\d\)*/wks_33-\1_6ks/' input_file.json
configuration_file_for_wks_33-_6ks40
Here, (\d\d)* is simply matched 0 times, so you replace wks_33- by wks_33-_6ks (\1 is a zero-length string) and 40 remains where it was before.
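The zero-length match can be reproduced without relying on how a particular sed treats \d. The sketch below swaps in a hypothetical group \(xx\) that clearly cannot match the digits, then shows the corrected pattern next to it.

```shell
s='wks_33-40'

# \(xx\)* is allowed to match zero times, so the match is just "wks_33-"
# and \1 expands to the empty string; "40" is left where it was
zero=$(printf '%s\n' "$s" | sed 's/wks_33-\(xx\)*/wks_33-\1_6ks/')

# Matching the two digits explicitly and reusing the whole match with &
fixed=$(printf '%s\n' "$s" | sed 's/33-[0-9][0-9]/&_6ks/')

printf '%s\n' "$zero"    # wks_33-_6ks40
printf '%s\n' "$fixed"   # wks_33-40_6ks
```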

Translation from one language to another is best done with some reference material on hand:
sed BRE syntax
sed ERE syntax
sed classes
sed RE extensions
Even a quick read of these shows that sed doesn't support \d.
Possible alternatives to \d\d:
[[:digit:]]\{2\}
[0-9]\{2\}
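Plugged into the original command, the second alternative gives a direct BRE translation of the vim substitution. This is a sketch using the sample string from the question in place of input_file.json.

```shell
# Sample line from the question (quotes are part of the data)
line='"configuration_file_for_wks_33-40"'

# BRE syntax: \( \) captures, \{2\} repeats the digit range exactly twice
result=$(printf '%s\n' "$line" | sed 's/33-\([0-9]\{2\}\)/33-\1_6ks/')

printf '%s\n' "$result"   # "configuration_file_for_wks_33-40_6ks"
```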

How can I translate a regex within vim to work with sed?
Since you write "a regex", I think you refer to any regex.
Translating a Vim regex to a Sed regex is not always possible, because a Vim regex can have lookarounds, whereas a Sed regex has no such things.

GREP Regex not working properly, but my regex is correct [duplicate]

This question already has answers here:
Extract value from a list of key-value pairs using grep
(3 answers)
Closed 3 years ago.
Hopefully this is a simple mistake I am making, I am fairly new to regex in general. Basically I am trying to extract the name of a website from a text file.
myfile.txt example:
Hello please enjoy your stay at%sbananas.com%sfor the rest of the day. Bye now!
I am trying to extract only the word bananas from this. My regex is as follows:
/(?<=m%s)(.*?)(?=\.com)/
Using regexr online it works just fine but in GREP code I just can't figure out how to get this to work properly. It doesn't return any results. I have tried several variants of the following:
grep "/(?<=m%s)(.*?)(?=\.com)/" myfile.txt
grep -E "/(?<=m%s)(.*?)(?=\.com)/" myfile.txt
grep '/(?<=m%s)(.*?)(?=\.com)/' myfile.txt
grep "(?<=m%s)(.*?)(?=\.com)" myfile.txt
grep '(?<=m%s)(.*?)(?=\.com)' myfile.txt
Nothing seems to work. I would love if someone could point me in the right direction.

The problem with regular expressions in grep and other Unix tools is that they usually support one, two or three different kinds of regular expressions. These are:
Basic regular expressions (BRE)
Extended regular expressions (ERE or EREG)
Perl compatible regular expressions (PCRE or PREG)
Your pattern is in PCRE syntax, therefore you need to identify your pattern as one (using -P). Note that I also removed the m between = and % (I don't know what that was supposed to do).
grep -Po "(?<=%s)(.*?)(?=\.com)" myfile.txt
With -o, grep prints only the matching part of each line. My grep man page describes PCRE support as experimental, so there may be cases where you get a segmentation fault or where matching takes unusually long.
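Where grep was built without -P, a capture group in sed can stand in for the lookarounds. This is an alternative technique, not the approach from the answer above, and it assumes the wanted text sits between a %s and the only .com on the line.

```shell
# Sample line from the question
line='Hello please enjoy your stay at%sbananas.com%sfor the rest of the day. Bye now!'

# -n plus the p flag prints only lines where the substitution succeeded;
# the whole line is replaced by the captured text between %s and .com
name=$(printf '%s\n' "$line" | sed -n 's/.*%s\(.*\)\.com.*/\1/p')

printf '%s\n' "$name"   # bananas
```

Because sed is greedy, a line containing several .com occurrences would capture up to the last one, so this sketch is only a fit for input shaped like the sample.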

Why is this white space character following a colon in the grep statement not working in Bash? [duplicate]

This question already has an answer here:
grep regex whitespace behavior
(1 answer)
Closed 4 years ago.
Why does the first grep statement below fail to return results, but the modified grep statement below that works? I have tried egrep as well with same results.
cat test
ALL: 192.168.0.0/255.255.0.0, 10.0.0.0/255.0.0.0
grep '^[\s]*ALL[\s]*:[\s]*192.168.0.0/255.255.0.0[\s]*' test
No results
grep '^[\s]*ALL[\s]*: 192.168.0.0/255.255.0.0[\s]*' test
ALL: 192.168.0.0/255.255.0.0, 10.0.0.0/255.0.0.0
Also , when I put a $ at the end, both fail.
grep '^[\s]*ALL[\s]*:[\s]*192.168.0.0/255.255.0.0[\s]*$' test
No results
grep '^[\s]*ALL[\s]*: 192.168.0.0/255.255.0.0[\s]*$' test
No results

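grep is only guaranteed to implement BRE (POSIX basic regular expressions), and \s has no meaning in BRE. Some implementations, such as GNU grep, accept \s as an extension, but not inside a bracket expression: there a backslash is literal, so [\s] matches a backslash or the letter s. That is why the first command fails (nothing matches the space after the colon) while the second, with its literal space, succeeds.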
Use [[:space:]] instead to have something that works everywhere.
Adding $ to the end of your expression makes it fail because it matches the end of the line. Your line has an extra , 10.0.0.0/255.0.0.0 after the matching portion, so of course that doesn't match $. You could say .*$, but that would be redundant unless you had the -o/--only-matching flag enabled.
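A small sketch comparing the two spellings on the line from the question. The dots are escaped here so that they match literally, which the original pattern did not do; grep -c is used only to make the result easy to read.

```shell
# Line from the question's test file
line='ALL: 192.168.0.0/255.255.0.0, 10.0.0.0/255.0.0.0'

# [\s] is a bracket expression containing a backslash and the letter s,
# so the space after the colon is never matched and the count is 0
bad=$(printf '%s\n' "$line" |
  grep -c '^[\s]*ALL[\s]*:[\s]*192\.168\.0\.0/255\.255\.0\.0') || true

# [[:space:]] is the portable way to match whitespace
good=$(printf '%s\n' "$line" |
  grep -c '^[[:space:]]*ALL[[:space:]]*:[[:space:]]*192\.168\.0\.0/255\.255\.0\.0')

printf 'with [\\s]: %s\nwith [[:space:]]: %s\n' "$bad" "$good"
```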

Replacing digits in PowerShell doesn't work [duplicate]

This question already has answers here:
What's the difference between .replace and -replace in powershell?
(2 answers)
Closed 4 years ago.
Edit: though the question above is related, this isn't the same question as asking the difference between .replace and -replace, nor does it have the same answer.
Per the Powershell docs
\d matches any digit character.
I have a command (gg, an alias for git grep) that gives the output:
packages/somemodule/index.js:69: log(`woo`)
I'm familiar with regexes and would like to change the output to:
packages/somemodule/index.js:69 log(`woo`)
I.e. adding a space after the first digits and the colon (if you're interested, this is to make the file openable by an editor). However a digit, one or more times, followed by a colon \d+: doesn't work:
gg 'No previous' | % {$_.replace("\d+:",'xxxx')}
Trying different versions, the \d doesn't work. What am I doing wrong?

Command output is treated as string data. In your code you are calling the [String].Replace() method which does not support regular expressions. For this to work as expected, you need to use PowerShell's -replace operator.
gg 'No previous' | % { $_ -replace '\d+:','xxxx' }
This approach will allow PowerShell to utilize regular expressions for string replacement!
