Pass variable as expression for regex lookaround [duplicate] - regex

This question already has answers here:
Difference between single and double quotes in Bash
(7 answers)
Closed 2 years ago.
I'm trying to write a shell script that extracts a string that occurs between two other strings using a regex lookaround (though please let me know if there's a better way).
The string I'm searching through is the path /gdrive/My Drive/Github/gbks/NC_004113.1.gbk (in reality I have several of these strings) and the part that I want to extract is the NC_004113.1 (or whatever is in its place in another similar string). In other words, the part that I want to extract will always be flanked by /gdrive/My Drive/Github/gbks/ and .gbk.
I'm playing around with how to do this, and I thought that a regex lookaround might work. To complicate things slightly, the string itself is stored in a variable. I started to try the following, just to see if it would run, which it did:
input_directory="/gdrive/My Drive/Github/gbks/"
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP "$input_directory"/.*
However, when I tried to do the same thing with a lookaround, the command failed:
input_directory="/gdrive/My Drive/Github/gbks/"
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP '(?<="$input_directory")'
As a sanity check, I tried to pass the string directly as the expression, but it only worked when I omitted the quotation marks like so:
input_directory="/gdrive/My Drive/Github/gbks/"
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP '(?=/gdrive/My Drive/Github/gbks/)'
This line actually gave me the output that I wanted (though I need to modify it so I'm passing the string in as a variable):
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP '(?<=/gdrive/My Drive/Github/gbks/).*(?=.gbk)'
Ultimately, I think the code should look something like:
input_directory="/gdrive/My Drive/Github/gbks/"
echo "/gdrive/My Drive/Github/gbks/NC_004113.1.gbk" | grep -oP '(?<="$input_directory").*(?=.gbk)'
Thanks in advance!
-Rob

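In grep -oP '(?<="$input_directory")', the variable input_directory won't be expanded because of the outer single quotes: inside single quotes the shell passes "$input_directory" to grep literally. Close the single quotes, expand the variable inside double quotes, then reopen the single quotes:
grep -oP '(?<='"$input_directory"')'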
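A minimal sketch of the full extraction using that quoting trick, assuming GNU grep built with PCRE support (-P). The \Q...\E wrapper is my addition, not part of the answer above: it makes PCRE treat the expanded directory literally, and it assumes the variable itself never contains \E. The variable names path, pattern, id and name are only illustrative. The last two lines show a pure-shell alternative that needs no regex at all.

```shell
input_directory="/gdrive/My Drive/Github/gbks/"
path="/gdrive/My Drive/Github/gbks/NC_004113.1.gbk"

# Single-quoted regex pieces around a double-quoted expansion of the variable.
# \Q...\E quotes the directory so characters like '.' are not treated as regex.
pattern='(?<=\Q'"$input_directory"'\E).*(?=\.gbk$)'

# Requires GNU grep with -P support.
id=$(printf '%s\n' "$path" | grep -oP "$pattern")
echo "$id"

# Pure-shell alternative: strip the known prefix, then the known suffix.
name=${path#"$input_directory"}
name=${name%.gbk}
echo "$name"
```

Both approaches should print NC_004113.1 for the sample path; the parameter-expansion version avoids the quoting question entirely.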

Related

How to escape forward slashes inside of a bash variable in perl regex? [duplicate]

This question already has an answer here:
Perl regex with pipes
(1 answer)
Closed 2 years ago.
Here's a snippet of my code:
code=$(</dev/stdin)
prefix="$2"
code=$(perl -p -e "s/\"/\"$prefix/" <<< "$code");
It takes something like #include "myLib.h" and turns it into #include "(some_prefix)myLib.h", however, if I input something with a forward slash, like ./script.h -d lib/ < code.txt, I get an error:
syntax error at -e line 1, at EOF
Execution of -e aborted due to compilation errors.
Is there a way of escaping the forward slash in the variable $prefix?
You could use pipes as the delimiter in your regex, as in stackoverflow.com/questions/31830613/perl-regex-with-pipes
code=$(</dev/stdin)
prefix="$2"
code=$(perl -p -e "s|\"|\"$prefix|" <<< "$code");
Since you build a regular expression out of user input, you need to escape it for use in a regular expression. Merely changing the delimiter may not be enough, since some characters that are special in a regular expression can legitimately appear in a file path.
Therefore:
code=$(</dev/stdin)
prefix=$(printf %s "$2" | sed 's/[][()\.^$?*+/]/\\&/g')
code=$(perl -p -e "s/\"/\"$prefix/" <<< "$code");
The second line prepends \ to each of the characters ][()\.^$?*+/.
Update, per @Wiktor Stribiżew: you are building the replacement string from user input, not the regex (as I had assumed), so only the delimiter needs escaping there, and you could just as well change the delimiter.

Regex group match using shell [duplicate]

This question already has answers here:
How do I use grep to extract a specific field value from lines
(2 answers)
Closed 3 years ago.
I am trying to match a pattern and set that as a variable.
I have a file with many "key=value" lines. I want to find the value for the key "fizz".
In the file I have this string
fizz="something_cool"
I tried to parse it with:
cat file | grep fizz="(.*)"
I was thinking it would give me the group output, and then I would be able to use $1 to select it.
I also played with escaping characters, and with sed and awk, but I could not get it working.
You need to enable extended regex to use unescaped ( and ), and quote the pattern properly:
grep -E 'fizz="(.*)"' file
This still prints the whole matching line, because grep cannot output a capture group on its own. awk might be a better choice here, since it does both the search and the filtering in one command.
You may just use:
awk -F= '$1 == "fizz" {gsub(/"/, "", $2); print $2}' file
something_cool
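Since the goal was to set the value as a variable, here is a small sketch that captures the group with sed and stores it. It is an alternative to the awk command above, not a replacement for it; the temporary file and the names tmp and value are assumptions made so the example is self-contained.

```shell
# Build a throwaway sample file (stands in for the real "file").
tmp=$(mktemp)
printf '%s\n' 'foo="bar"' 'fizz="something_cool"' > "$tmp"

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, leaving just the captured group \1.
value=$(sed -n 's/^fizz="\(.*\)"$/\1/p' "$tmp")
echo "$value"

rm -f "$tmp"
```

The same command-substitution pattern works with the awk one-liner if you prefer it.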

Use grep to get next word after match [duplicate]

This question already has answers here:
Using grep to get the next WORD after a match in each line
(6 answers)
Closed 9 months ago.
I want to use grep to get a number from a JSON file.
For example, I want to get the 1.0872 from this:
{"base":"EUR","date":"2016-03-01","rates":{"USD":1.0872}}
Using
grep "USD" rates
prints the whole line:
{"base":"EUR","date":"2016-03-01","rates":{"USD":1.0872}}
I just want to display 1.0872.
I tried using a regex but it doesn't work (probably an error on my part since I've never done this before):
grep -oP '(?<="USD"\:)\w+' file
Your pattern stops at the decimal point because \w does not match a dot, so it returns only 1. For "normal" integers and float values, you may use
grep -oP '(?<="USD":)\d+(?:\.\d+)?' file
If your numbers can have no integer part and can start with a ., use
grep -oP '(?<="USD":)\d*\.?\d+' file
To also allow an optional leading -:
grep -oP '(?<="USD":)-?\d*\.?\d+' file
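If grep -P is not available, a rough equivalent with plain sed might look like the sketch below. The variable names json and rate are illustrative, and the pattern assumes the number directly follows "USD": with no whitespace, as in the sample line. For anything beyond a one-off, a real JSON parser is the safer tool.

```shell
json='{"base":"EUR","date":"2016-03-01","rates":{"USD":1.0872}}'

# Capture an optional minus sign followed by digits and dots right after "USD":
# and print only that capture.
rate=$(printf '%s\n' "$json" | sed -n 's/.*"USD":\(-\{0,1\}[0-9.]*\).*/\1/p')
echo "$rate"
```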

Regular expression: replace one character set with another

I have a string (e.g. 3122323123123) and want to replace every 1 with ax, every 2 with by and every 3 with cz.
How do I do that in bash?
I started with the character set [123] and tried sed, but didn't know how to write the replacement expression.
Regex is not the tool for you here. There's nothing in your question that requires any regex.
You didn't specify your language, but if you're working in PHP, you could use the function strtr() which does exactly what you are looking for.
And good old str_replace() can probably also do what you want too, as it can accept arrays for the search/replacement arguments.
Most other languages should have similar capabilities that mean you shouldn't need regex for this.
Look at the standard tr utility, which maps single characters to single characters:
% echo "3122323123123" | tr "123" "abc"
cabbcbcabcabc
If you want to replace a character with multiple characters, you can use sed for every replacement:
% echo "3122323123123" | sed -e "s/1/ax/g" -e "s/2/by/g" -e "s/3/cz/g"
czaxbybyczbyczaxbyczaxbycz
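Since the question asks about bash specifically, the same replacements can also be sketched with bash's own pattern substitution, without starting sed. This assumes bash (the ${var//pattern/replacement} form is not POSIX sh), and it only works in sequence here because none of the replacement strings contain the digits being replaced.

```shell
s="3122323123123"

# ${var//pattern/replacement} replaces every occurrence, not just the first.
s=${s//1/ax}
s=${s//2/by}
s=${s//3/cz}
echo "$s"
```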
In C#:
string input = "3122323123123";
string output = input.Replace('1','a').Replace('2','b').Replace('3','c');
Using Perl tr/// for example:
$ echo "3122323123123" | perl -pe "tr/123/abc/"
cabbcbcabcabc

Remove everything between pairs of braces with sed

I've got a string that looks like this:
[%{%B%F{blue}%}master %{%F{red}%}*%{%f%k%b%}%{%f%k%b%K{black}%B%F{green}%}]
I want to remove the substrings matching %{...}, which may or may not contain further substrings of the same order.
I should get: [master *] as the final output. My progress so far:
gsed -E 's/%\{[^\}]*\}//g'
which gives:
echo '[%{%B%F{blue}%}master %{%F{red}%}*%{%f%k%b%}%{%f%k%b%K{black}%B%F{green}%}]' | gsed -E 's/%\{[^\}]*\}//g'
[%}master %}*%B%F{green}%}]
So, this works fine for %{...} sections which do not contain %{...}. It fails for strings like %{%B%F{blue}%} (it returns %}).
What I want to do is parse the string until I find the matching }, then remove everything up to that point, rather than removing everything between %{ and the first } I encounter. I'm not sure how to do this.
I'm fully aware that there are probably multiple ways to do this; I'd prefer an answer regarding the way specified in the question if it is possible, but any ideas are more than welcome.
This might work for you:
echo '[%{%B%F{blue}%}master %{%F{red}%}*%{%f%k%b%}%{%f%k%b%K{black}%B%F{green}%}]' |
sed 's/%{/{/g;:a;s/{[^{}]*}//g;ta'
[master *]
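Work from the inside out: repeatedly remove the innermost brace pairs until none are left. The idea is a substitution like
s/%{.*?%}//g
but sed has no non-greedy quantifier, so in practice the pattern has to match groups that contain no braces of their own. Then wrap it in
while (there's at least one more brace)
(sed's exit status does not tell you whether a substitution matched, so compare the string before and after each pass, or use sed's t command to loop.)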
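The inside-out loop can be sketched in plain shell roughly as follows. This is my reading of the idea, not the answerer's code: each pass deletes every innermost {...} group (taking a leading % with it when present), and the loop stops once a pass changes nothing. Like the sed one-liner in the first answer, it would also delete a bare {...} group that has no leading %.

```shell
s='[%{%B%F{blue}%}master %{%F{red}%}*%{%f%k%b%}%{%f%k%b%K{black}%B%F{green}%}]'

prev=
# Repeat until a pass leaves the string unchanged.
while [ "$s" != "$prev" ]; do
  prev=$s
  # Remove brace groups that contain no other braces; %\{0,1\} is an optional %.
  s=$(printf '%s\n' "$s" | sed 's/%\{0,1\}{[^{}]*}//g')
done
echo "$s"
```

For the sample string this takes two effective passes: the first strips {blue}, {red}, {black}, {green} and the flat %{%f%k%b%} group, the second strips the now-flat outer groups.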
Try this:
sed -E 's/%{([^{}]*({[^}]*})*[^{}]*)*}//g'