Regex not working in Bash

Regex not working in Bash - regex

I have this regex for now
It should catch something like this
org.package;version="[1.0.41, 1.0.51)" and "," optionally if it is not last element.
Also if after package i added .* because the package could be "org.package.util.something" until ";version"
I tried it online in Regex tool and it is working like this
org.package(.*.*)?;version="[[0-9].[0-9].[0-9][0-9],\s[0-9].[0-9].[0-9][0-9])",?
but i dont know what should i change so it can work in bash
package="org.package"
sed -i "s/"$$package.*;version="\[[0-9].[0-9].[0-9][0-9],[[:space:]][0-9].[0-9].[0-9][0-9]\)",?"//g" "$file"

Change the double quotes arround sed command by single quotes, because variable expansion of $package single quotes are closed and double quotes are use arround variable
package="org.package"
sed -i 's/'"$package"'.*;version="\[[0-9].[0-9].[0-9][0-9],[[:space:]][0-9].[0-9].[0-9][0-9]\)",?//g' "$file"
before using command with -i option check the output is correct

There is more than one problem
$$ will be replaced by bash with its PID, that's probably not what you want
online regex evaluators usually use extended regex or perl regex syntax
sed -r will enable extended regex mode. (for grep there's -E and -P)
You use . when you want to match literal dots. However you should be using \., because . actually means "any character" in regular expressions.

Related

How can I translate a regex within vim to work with sed?

I have a string that exists within a text file that I am trying to modify with regex.
"configuration_file_for_wks_33-40"
and I want to modify it so that it looks like this
"configuration_file_for_wks_33-40_6ks"
Within vim I can accomplish this with the following regex command
%s/33-\(\d\d\)/33-\1_6ks/
But if I try to pass that regex command to sed such as
sed 's/33-\(\d\d\)/33-\1_6ks/' input_file.json
The string is not changed, even if I include the -e parameter.
I have also tried to do this using ex as
echo '%s/33-\(\d\d\)/33-\1_6ks/' | ex input_file.json
If I use
sed 's/wks_33-\(\d\d\)*/wks_33-\1_6ks/' input_file.json
then I get
configuration_file_for_wks_33-_6ks40
For that, I've tried various different escaping patterns without any luck.
Can someone help me understand why this changes are not working?

vim has a different syntax for regular expressions (which is even configurable). Unfortunately, sed doesn't understand \d (see https://unix.stackexchange.com/a/414230/304256). With -E, you can match digits with [0-9] or [[:digit:]]:
$ sed -E 's/33-[0-9][0-9]/&_6ks/'
configuration_file_for_wks_33-40_6ks
Note that you can use & in the replacement for adding the entire matched string.
So why is this:
$ sed 's/wks_33-\(\d\d\)*/wks_33-\1_6ks/' input_file.json
configuration_file_for_wks_33-_6ks40
Here, (\d\d)* is simply matched 0 times, so you replace wks_33- by wks_33-_6ks (\1 is a zero-length string) and 40 remains where it was before.

Translation from one language to another is best done with some reference material on hand:
sed BRE syntax
sed ERE syntax
sed classes
sed RE extensions
The superficial reading of which shows that sed doesn't support \d.
Possible alternatives to \d\d:
[[:digit:]]\{2\}
[0-9]\{2\}

How can I translate a regex within vim to work with sed?
Since you write "a regex", I think you refer to any regex.
Translating a Vim regex to a Sed regex is not always possible, because a Vim regex can have lookarounds, whereas a Sed regex has no such things.

Trying to do a simple sed substitution but I'm confused about what needs to be escaped

I have this string: '$'nwnwnwnnn
And want to change it to: { bitset<9>(0bnwnwnwnnn), '$'},
I've looked at many similar questions for different shells using their methods but nothing has worked. I'm generally in zsh but I can use bash or another shell.
The general form I've been trying is this:
sed -E -i new s/(\'.\')([nw]+)/{ bitset<9>(0b\2), \1},/g thing.txt
It should work for any character other than $ and any sequence of n or w.
I'm generally confused as to what I need to escape here. Some answers on this site said to escape the parenthesis in the first part of the substitution.
Am I using -i incorrectly?

You need to escape the parentheses to create a capture group if you're using basic regexp, you don't escape them if you're using extended regexp. The -E option to GNU sed, and the -r option to standard sed, enable extended regexp, so you don't need to escape them.
If you only want to match $ rather than allow any character in the quotes, you need an escaped $.
You need to put the entire s/// command inside quotes, as it must be a single argument to the sed command.
When using -i, it's conventional to put a . before the suffix. Also, the suffix is put on the saved copy of the original file, not the new file that you're creating with the changes, so new is a poor suffix.
sed -E -i .bak "s/('\$')([nw]+)/{ bitset<9>(0b\2), \1},/g" thing.txt

sed regexp, number reformatting: how to escape for bash

I have a working (in macOS app Patterns) RegExp that reformats GeoJSON MultiPolygon coordinates, but don't know how to escape it for sed.
The file I'm working on is over 90 Mb in size, so bash terminal looks like the ideal place and sed the perfect tool for the job.
Search Text Example:
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
Desired outcome:
[[[37.9017735,69.400367955],[37.90098431,69.400425761],[37.90004869,69.400489545],[37.89915455,69.400578128],[37.89840665,69.400660744],[37.89747072,69.400762152],[37.89628639,69.400905283],[37.89545822,69.401014028],[37.89479369,69.401113128],[37.89414564,69.401195094],[37.89362565,69.401281229],[37.89276089,69.401414764],[37.89196611,69.401540312],[37.891721,69.401587053],[37.89137614,69.401634443],[37.89136515,69.401635893],[37.89114453,69.401663531],
My current RegExp:
((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))
and reformatting:
$1\.$2$4,$6.$7$9
The command should be something along these lines:
sed -i -e 's/ The RegExp escaped /$1\.$2$4,$6.$7$9/g' large_file.geojson
But what should be escaped in the RegExp to make it work?
My attempts always complain of being unbalanced.
I'm sorry if this has already been answered elsewhere, but I couldn't find even after extensive searching.
Edit: 2017-01-07: I didn't make it clear that the file contains properties other than just the GPS-points. One of the other example values picked from GeoJSON Feature properties is "35.642.1.001_001", which should be left unchanged. The braces check in my original regex is there for this reason.

That regex is not legal in sed; since it uses Perl syntax, my recommendation would be to use perl instead. The regular expression works exactly as-is, and even the command line is almost the same; you just need to add the -p option to get perl to operate in filter mode (which sed does by default). I would also recommend adding an argument suffix to the -i option (whether using sed or perl), so that you have a backup of the original file in case something goes horribly wrong. As for quoting, all you need to do is put the substitution command in single quotation marks:
perl -p -i.bak -e \
's/((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))/$1\.$2$4,$6.$7$9/g' \
large_file.geojson

If your data is just like you showed, you needn't worry about the brackets. You may use a POSIX ERE enabled with -E (or -r in some other distributions) like this:
sed -i -E 's/([0-9]{2})([0-9]*)\.([0-9]+)/\1.\2\3/g' large_file.geojson
Or a POSIX BRE:
sed -i 's/\([0-9]\{2\}\)\([0-9]*\)\.\([0-9]\+\)/\1.\2\3/g' large_file.geojson
See an online demo.
You may see how this regex works here (just a demo, not proof).
Note that in POSIX BRE you need to escape { and } in limiting / range quantifiers and ( and ) in grouping constructs, and the + quantifier, else they denote literal symbols. In POSIX ERE, you do not need to escape the special chars to make them special, this POSIX flavor is closer to the modern regexes.
Also, you need to use \n notation inside the replacement pattern, not $n.

A simple sed will do it:
$ echo "$var"
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
$ echo "$var" | sed 's/\([0-9]\{3\}\)\./.\1/g'
[[[379.017735,6940.0367955],[379.0098431,6940.0425761],[379.0004869,6940.0489545],[378.9915455,6940.0578128],[378.9840665,6940.0660744],[378.9747072,6940.0762152],[378.9628639,6940.0905283],[378.9545822,6940.1014028],[378.9479369,6940.1113128],[378.9414564,6940.1195094],[378.9362565,6940.1281229],[378.9276089,6940.1414764],[378.9196611,6940.1540312],[378.91721,6940.1587053],[378.9137614,6940.1634443],[378.9136515,6940.1635893],[378.9114453,6940.1663531],

Why does sed provide an "invalid content" error on linux but not on mac

I have the following sed extended regular expressions replacement inside a bash script:
sed -i.bak -E 's~^[[:blank:]]*\\iftoggle{[[:alnum:]_]+}{\\input{([[:alnum:]_\/]+)}}{}~\\input{\1}~' file.txt
which should replace strings like
\iftoggle{xx_yy}{\input{xx_yy/zz}}{}
with
\input{xx_yy/zz}
inside file.txt.
This works just fine locally, on OS X, but the script needs to be POSIX. Specifically, it fails on my remote Travis CI build (which uses Linux). While sed -E is not documented for GNU sed, it behaves just like sed -r and seems to work fine, allowing for a POSIX version of sed with extended regular expressions.
The error given is:
sed: -e expression #1, char 81: Invalid content of \{\}
I'm also not sure where the error starts counting characters from, whether it's the beginning of the line, or only that part which is encased in quotes (the expression)?

You don't need ERE here. Using BRE:
sed i.bak 's~^[[:blank:]]*\\iftoggle{[[:alnum:]_][[:alnum:]_]*}{\\input{\([[:alnum:]_\/][[:alnum:]_\/]*\)}}{}~\\input{\1}~' file.txt
{ don't need to be escaped here but ( do.
As + is not part of the BRE, you can replace [[:alnum:]_]+ with [[:alnum:]_][[:alnum:]_]* or with [[:alnum:]_]\{1,\}.
And as a side note, \+ can be used with GNU sed in BRE but keep in mind that it's not portable, it's a GNU extension.

This does not directly answer the question with sed, but provides an alternate simpler way to do this in perl command-line regex search and replacement.
perl -p -e 's|\iftoggle\{(\w+)\}\{\\input\{(\w+)/(\w+)\}\}\{\}|\input\{\2/\3\}|g' file
\input{xx_yy/zz}
Using the word-separator as | and \w+ to match the [[:alnum:]] characters.
For in-place replacement, use the -i flag similar to sed
perl -p -i.bak -e 's|\iftoggle\{(\w+)\}\{\\input\{(\w+)/(\w+)\}\}\{\}|\input\{\2/\3\}|g' file
Regarding Word-characters(\w) in perl POSIX character classes page,
Word characters
A \w matches a single alphanumeric character (an alphabetic character, or a decimal digit); or a connecting punctuation character, such as an underscore ("_"); or a "mark" character (like some sort of accent) that attaches to one of those. It does not match a whole word. To match a whole word, use \w+ . This isn't the same thing as matching an English word, but in the ASCII range it is the same as a string of Perl-identifier characters.
For an input-with multiple folders inside input, e.g.
cat file
\iftoggle{xx_yy}{\input{xx_yy/zz_yy_zz_kk/dude_hjgk}}{}
perl -p -e 's|\iftoggle\{(\w+)\}\{\\input\{(\w+)/(\w+)/(\w+)\}\}\{\}|\input\{\2/\3/\4\}|g' file
\input{xx_yy/zz_yy_zz_kk/dude_hjgk}
Just plug and play as many as capturing groups you want.

Use grep to find strings at the beginning of a line or after a delimiter in Git Bash for Windows

I have such file:
blue|1|red|2
green|3|blue|4
darkblue|0|yellow|3
I want to use grep to find anything containg blue| at the beginning of line or |blue| anywhere, but not any darkblue| or |darkblue| or |blueberry|
I tried to use grep [^|\|]blue\| but Git Bash gives me error:
$ grep [^|\|]blue\| *.*
grep: Unmatched [ or [^
sh.exe": |]blue|: command not found
What did I do wrong? What's the proper way to do it?

Here's a quick & dirty one:
grep -E '(^|\|)blue\|' *
Matches start of line or |, followed by blue|. The important note is that you need extended regular expressions (via egrep or the -E flag) to use the | (or) construct.

Also, note the single quotes around the regular expression.
So, in answer to the OP's "What did I do wrong?",
You forgot to put the regexp in single quotes;
You chose the wrong type of brackets to enclose the alternate expressions; and finally
You forgot to use egrep or the -E flag
It's always easier to see other people's errors; I wish I was a quick to spot my own :-|

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Regex not working in Bash - regex

Related

How can I translate a regex within vim to work with sed?

Trying to do a simple sed substitution but I'm confused about what needs to be escaped

sed regexp, number reformatting: how to escape for bash

Why does sed provide an "invalid content" error on linux but not on mac

Use grep to find strings at the beginning of a line or after a delimiter in Git Bash for Windows

Categories

Resources