Regular Expressions Match Specific Location In File

The file I am working with (oraInst.loc) looks like this:
inventory_loc=/u01/app/ORAENV/oracle/oraInventory
inst_group=dba
I need to use a regular expression to grab the value between app/ and /oracle. In this case it is ORAENV, but it could be any alphanumeric string of any case and length, with no spaces.
From what I have read so far, grouping seems to be the way to do this, but I just can't get my head around it.
I am using egrep on Solaris 10 as the regex engine.

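Try this (egrep uses POSIX extended regular expressions, which do not support \d or \w, so a bracket expression covers the alphanumeric part):
/app/([0-9A-Za-z]+)/oracle/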

Try this, assuming your egrep has -o:
$ echo '/u01/app/ORAENV/oracle/oraInventory' | egrep -o '/app/[0-9A-Za-z]+/oracle/' | cut -d'/' -f3
Output:
ORAENV
Update, solution using sed:
$ echo '/u01/app/ORAENV/oracle/oraInventory' | sed 's:.*/app/\(.*\)/oracle/.*:\1:'

Something like:
app/(.*)/oracle
would do the trick.

$ echo "inventory_loc=/u01/app/ORAENV/oracle/oraInventory"| nawk -F"/" '{for(i=1;i<=NF;i++)if($i=="app") {print $(i+1);exit} } '
ORAENV
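
For the file in the question, the sed approach can be limited to the inventory_loc line so that nothing else is printed. The following is a minimal sketch, assuming a POSIX sed; the function name extract_env is made up for illustration, and [^/]* is used instead of .* so the capture stops at the next slash:

```shell
# extract_env is a hypothetical helper name, not part of any tool.
# It reads oraInst.loc-style text on stdin and prints the path component
# between /app/ and /oracle/ found on the inventory_loc line only.
extract_env() {
    sed -n 's:^inventory_loc=.*/app/\([^/]*\)/oracle/.*:\1:p'
}

# Sample input copied from the question.
printf '%s\n' 'inventory_loc=/u01/app/ORAENV/oracle/oraInventory' 'inst_group=dba' | extract_env
```

With the sample input this should print ORAENV. Against the real file it would be called as extract_env < oraInst.loc.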

Related

Transform a dynamic alphanumeric string

I have a build called 700-I20190808-0201 and need to convert it to 7.0.0-I20190808-0201. I can do that with a regular expression:
sed 's/\([0-9]\)\([0-9]\)\([0-9]\)\(.\)/\1.\2.\3\4/' abc.txt
But the solution does not work when the build ID is 7001-I20190809-0201. Can we make the regular expression dynamic so that it works for both (700 and 7001)?

Could you please try the following:
awk 'BEGIN{FS=OFS="-"}{gsub(/[0-9]/,"&.",$1);sub(/\.$/,"",$1)} 1' Input_file

If you have Perl available, lookahead regular expressions make this straightforward:
$ cat foo.txt
700-I20190808-0201
7001-I20190809-0201
$ perl -ple 's/(\d)(?=\d+-I)/$1./g' foo.txt
7.0.0-I20190808-0201
7.0.0.1-I20190809-0201

You can implement a simple loop using labels and branching using sed:
$ echo '7001-I20190809-0201' | sed ':1; s/^\([0-9]\{1,\}\)\([0-9][-.]\)/\1.\2/; t1'
7.0.0.1-I20190809-0201
$ echo '700-I20190809-0201' | sed ':1; s/^\([0-9]\{1,\}\)\([0-9][-.]\)/\1.\2/; t1'
7.0.0-I20190809-0201
If your sed supports the -E flag:
sed -E ':1; s/^([0-9]+)([0-9][-.])/\1.\2/; t1'

sed -e 's/\([0-9]\)\([0-9]\)\([0-9]\)\(.\)/\1.\2.\3.\4/' -e 's/\.\-/\-/' abc.txt
This worked for me and is very simple. I just needed to use it in my Ant script with a replaceregex pattern.
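
The answers above share one idea: only the digits before the first hyphen should be separated by dots. A minimal sketch of that idea as a reusable function, assuming a POSIX shell and sed; the name dot_build_id is hypothetical:

```shell
# dot_build_id is a hypothetical helper name.
# It puts a dot between the digits that precede the first hyphen
# and leaves the rest of the build ID untouched.
dot_build_id() {
    id=$1
    prefix=${id%%-*}        # everything before the first hyphen, e.g. 7001
    rest=${id#"$prefix"}    # remainder, starting at the hyphen
    # Append a dot to every digit, then drop the trailing dot.
    dotted=$(printf '%s\n' "$prefix" | sed 's/[0-9]/&./g; s/\.$//')
    printf '%s%s\n' "$dotted" "$rest"
}

dot_build_id 700-I20190808-0201
dot_build_id 7001-I20190809-0201
```

Because the prefix is split off first, the number of digits does not matter, so both the 700 and the 7001 form are handled.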

Extract number of variable length from string

I want to extract a number of variable length from a string.
The string looks like this:
used_memory:1775220696
I would like to have the 1775220696 part in a variable. There are a lot of questions about this, but I could not find a solution that suits my needs.

You can use cut:
my_val=$(echo "used_memory:1775220696" | cut -d':' -f2)
Or also awk:
my_val=$(echo "used_memory:1775220696" | awk -F':' '{print $2}')

Use parameter expansion:
string=used_memory:1775220696
num=${string#*:} # Delete everything up to the first colon.

You can also use egrep -o (quote the pattern so the shell cannot expand it as a glob):
echo used_memory:1775220696 | egrep -o '[0-9]+'
Output:
1775220696

Use this substitution:
s/^[^:]*://
It works with either sed or perl and leaves just the part you need:
$ echo "used_memory:1775220696" | perl -pe 's/^[^:]*://'
1775220696

bash supports regular-expression matching, but for a simple case like this it is overkill; use parameter expansion (see choroba's answer).
For the sake of completeness, here's an example using regular expression matching:
[[ $string =~ (.*):([[:digit:]]+) ]] && num=${BASH_REMATCH[2]}

This can be done using awk, like this:
var=$(echo "used_memory:1775220696" | awk -F':' '{print $2}')
echo "$var"
Output:
1775220696

If your number could be anywhere in the string, but you know that the digits are contiguous, you can use shell parameter expansion to remove everything that is not a digit:
$ str=used_memory:1775220696
$ num=${str//[!0-9]}
$ echo "$num"
1775220696
This works also for used_memory:1775220696andmoretext and 123numberfirst. However, something like abc123def456 would become 123456.
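
If only the first run of digits is wanted (so that abc123def456 yields 123 rather than 123456), two prefix/suffix expansions can be chained. A minimal sketch, assuming any POSIX shell; the function name first_number is hypothetical:

```shell
# first_number is a hypothetical helper name.
# It prints the first contiguous run of digits in its argument.
first_number() {
    str=$1
    lead=${str%%[0-9]*}     # text before the first digit (whole string if no digit)
    rest=${str#"$lead"}     # string starting at the first digit
    printf '%s\n' "${rest%%[!0-9]*}"   # cut everything from the first non-digit on
}

first_number used_memory:1775220696
first_number abc123def456
```

The first call should print 1775220696 and the second 123. If the string contains no digit at all, an empty line is printed.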

Using sed and regex to capture last part of url

I'm trying to make sed match the last part of a url and output just that. For example:
echo "http://randomurl/suburl/file.mp3" | sed (expression)
should give the output:
file.mp3
So far I've tried sed 's|\([^/]+mp3\)$|\1|g' but it just outputs the whole url. Maybe there's something I'm not seeing here but anyways, help would be much appreciated!

This works:
echo "http://randomurl/suburl/file.mp3" | sed 's#.*/##'

basename is your good friend.
> basename "http://randomurl/suburl/file.mp3"
=> file.mp3

This should do the job:
$ echo "http://randomurl/suburl/file.mp3" | sed -r 's|.*/(.*)$|\1|'
file.mp3
where:
| is used instead of / to separate the arguments of the s command.
Everything is matched and replaced with whatever is found after the last /.
Edit: You could also use bash parameter expansion:
$ url="http://randomurl/suburl/file.mp3"
$ echo "${url##*/}"
file.mp3
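
The same family of expansions can split the URL further. A short sketch of the four prefix/suffix operators, assuming any POSIX shell; the variable names are chosen only for illustration:

```shell
url="http://randomurl/suburl/file.mp3"

file=${url##*/}   # remove longest prefix ending in "/"      -> file.mp3
name=${file%.*}   # remove shortest suffix starting with "." -> file
ext=${file##*.}   # remove longest prefix ending in "."      -> mp3
dir=${url%/*}     # remove shortest suffix starting with "/" -> http://randomurl/suburl

printf '%s\n' "$file" "$name" "$ext" "$dir"
```

A single # or % removes the shortest match and a doubled ## or %% removes the longest, which is why ##*/ reaches the last slash.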

echo 'http://randomurl/suburl/file.mp3' | grep -oP '[^/\n]+$'
Here's another solution using grep.

grep: group capturing

I have the following string:
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
and I need to get the value of "scheme_version", which is 1234 in this example.
I have tried
grep -Eo "\"scheme_version\":(\w*)"
however, it returns
"scheme_version":1234
How can I get just the value? I know I can add a sed call, but I would prefer to do it with a single grep.

You'll need to use a lookbehind assertion so that the prefix isn't included in the match:
grep -Po '(?<=scheme_version":)[0-9]+'

This might work for you:
echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
sed -n 's/.*"scheme_version":\([^}]*\)}/\1/p'
1234
Sorry it's not grep, so disregard this solution if you like.
Or stick with grep and add:
grep -Eo "\"scheme_version\":(\w*)"| cut -d: -f2

I would recommend that you use jq for the job. jq is a command-line JSON processor.
$ cat tmp
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
$ cat tmp | jq .scheme_version
1234

As an alternative to the positive lookbehind method suggested by SiegeX, you can reset the match starting point to directly after scheme_version": with the \K escape sequence. E.g.,
$ grep -Po 'scheme_version":\K[0-9]+'
This restarts the matching process after having matched scheme_version":, and tends to perform far better than the positive lookbehind. Comparing the two on regex101 shows that the reset-match-start method takes 37 steps and 1ms, while the positive lookbehind method takes 194 steps and 21ms.
You can compare the performance yourself on regex101, and you can read more about resetting the match starting point in the PCRE documentation.
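
To see the two forms side by side, here is a small sketch that assumes a grep built with PCRE support (-P, as in GNU grep); the function names are hypothetical. It also shows one practical difference: a PCRE lookbehind must have a fixed length, while the prefix before \K may vary, for example to allow optional whitespace around the colon:

```shell
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# Hypothetical helper names; each reads JSON text on stdin.
with_lookbehind() { grep -Po '(?<=scheme_version":)[0-9]+'; }      # prefix asserted, not matched
with_reset()      { grep -Po 'scheme_version":\K[0-9]+'; }         # prefix matched, then dropped
with_reset_ws()   { grep -Po '"scheme_version"\s*:\s*\K[0-9]+'; }  # variable-length prefix

printf '%s\n' "$json" | with_lookbehind
printf '%s\n' "$json" | with_reset
printf '%s\n' '{"scheme_version" : 42}' | with_reset_ws
```

The first two commands should both print 1234, and the last should print 42.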

To avoid relying on grep's PCRE support (-P), which is available in GNU grep but not in the BSD version, another option is ripgrep, e.g.:
$ rg -o 'scheme_version.?:(\d+)' -r '$1' <file.json
1234
Here -r (--replace) replaces each match with the given text, which may refer to capture groups by index (e.g., $5) or by name (e.g., $foo).
Another example uses Python's json.tool module, which validates and pretty-prints the JSON first:
$ python -mjson.tool file.json | rg -o 'scheme_version[^\d]+(\d+)' -r '$1'
1234
Related: Can grep output only specified groupings that match?

You can do this:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | awk -F ':' '{print $4}' | tr -d '}'

Improving on @potong's answer, which only extracts "scheme_version", you can use this expression to extract the value of any key:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_id":["]*\([^(",})]*\)[",}].*/\1/p'
scheme_version
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_rev":["]*\([^(",})]*\)[",}].*/\1/p'
4-cad1842a7646b4497066e09c3788e724
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"scheme_version":["]*\([^(",})]*\)[",}].*/\1/p'
1234

Remove substring till first Token using regexp

I have the path:
GarbageContainingSlashesAndDots/TOKEN/xyz/TOKEN/abc
How could I remove GarbageContainingSlashesAndDots?
I know it comes before TOKEN, but unfortunately the string contains two TOKEN substrings.
Using sed s/.*TOKEN// reduces my string to /abc,
but I need /TOKEN/xyz/TOKEN/abc
Thank you!

Divide and conquer:
$ echo 'Garbage.Containing/Slashes/And.Dots/TOKEN/xyz/TOKEN/abc' |
sed -n 's|/TOKEN/|\n&|;s/.*\n//;p'
/TOKEN/xyz/TOKEN/abc

Is perl instead of sed allowed?
perl -pe 's!.*?(?=/TOKEN)!!'
echo 'GarbageContainingSlashesAndDots/TOKEN/xyz/TOKEN/abc' | perl -pe 's!.*?(?=/TOKEN)!!'
# returns:
/TOKEN/xyz/TOKEN/abc
Sed does not support non-greedy matching. Perl does.
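
Where neither Perl nor a non-greedy regex is available, the shell's shortest-match prefix removal gives the same effect. A minimal sketch, assuming any POSIX shell; the function name strip_before_token is hypothetical:

```shell
# strip_before_token is a hypothetical helper name.
# It drops everything before the first /TOKEN and keeps the rest.
strip_before_token() {
    s=$1
    case $s in
        */TOKEN*)
            # ${s#*/TOKEN} removes the shortest prefix ending in /TOKEN,
            # so /TOKEN is put back in front of what remains.
            printf '%s\n' "/TOKEN${s#*/TOKEN}" ;;
        *)
            # No /TOKEN present: leave the string unchanged.
            printf '%s\n' "$s" ;;
    esac
}

strip_before_token 'Garbage.Containing/Slashes/And.Dots/TOKEN/xyz/TOKEN/abc'
```

This should print /TOKEN/xyz/TOKEN/abc, regardless of how many slashes or dots the leading part contains.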

I think you have bash, so it can be as simple as this (assuming the garbage part itself contains no slash; note that the leading / is dropped):
$ s="GarbageContainingSlashesAndDots/TOKEN/xyz/TOKEN/abc"
$ echo ${s#*/}
TOKEN/xyz/TOKEN/abc
Or, if you have Ruby (1.9+):
echo $s | ruby -e 'print gets.split("/",2)[-1]'

Thank you for all suggestions, I've learnt something new.
Finally I was able to reach my goal using grep -o
echo "GarbageContainingSlashesAndDots/TOKEN/xyz/TOKEN/abc" | grep -o "/TOKEN/.*/TOKEN/.*"

Using grep:
word='GarbageContainingSlashesAndDots/TOKEN/xyz/TOKEN/abc'
echo $word | grep -o '/.*'

echo "./a//...b/TOKEN/abc/TOKEN/xyz"|sed 's#.*\(/TOKEN/.*/TOKEN/.*\)#\1#'

UPDATE 2: have you tried this?
s!.*\(/TOKEN.*TOKEN.*\)!\1!
UPDATE: sorry, non-greedy matches are not supported by sed.
Try this:
s/.*?TOKEN//
.*? matches only up to the first occurrence of TOKEN.

We Keep Coding
