More elegant way to extract substring in shell [closed] - regex

I wrote a regex to get the chartname (auth-token-service), but it seems very crude. Can someone suggest a more precise way?
chartname=`echo my-auth-token-service=xxx.azurecr.io/auth-token-service:latest | cut -d= -f1 | sed -e "s/^.*-//"`

Gets text between '=' and '/'
sed "s/.*=\(.*\)\/.*/\1/" = xxx.azurecr.io
Gets text between '/' and ':'
sed "s/.*\/\(.*\):.*/\1/" = auth-token-service
Gets text after ':'
sed "s/.*:\(.*\)/\1/" = latest
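The three sed calls above can also be folded into a single pass. This is a minimal sketch, assuming the input always has the shape name=registry/chart:tag. The variable names registry, chart and tag are illustrative and not part of the original post.

```shell
#!/usr/bin/env bash
input='my-auth-token-service=xxx.azurecr.io/auth-token-service:latest'

# One sed call rewrites the string to "registry chart tag" (using | as the
# s-command delimiter so / needs no escaping); read then splits that on
# whitespace into three variables.
read -r registry chart tag < <(
  sed 's|^[^=]*=\([^/]*\)/\([^:]*\):\(.*\)$|\1 \2 \3|' <<< "$input"
)

printf '%s\n' "$registry" "$chart" "$tag"
```

This relies on none of the three parts containing whitespace, which holds for the sample string.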

Not familiar with the format of token, but if I understood correctly you just want the part after the slash and before the colon.
echo my-auth-token-service=xxx.azurecr.io/auth-token-service:latest | sed -e 's/^.\+\/\([^\/]\+\):[^:]\+$/\1/'

Since you asked for a regex solution:
string=my-auth-token-service=xxx.azurecr.io/auth-token-service:latest
[[ $string =~ /([^:]*) ]] && chartname=${BASH_REMATCH[1]}
This assumes that the chartname is always between the / and the :. Note that chartname is left unassigned if the regex does not match.
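To make the no-match case explicit, the same idea can be wrapped in a small function. This is only a sketch: chart_from_image is a hypothetical helper name, and the fallback value 'unknown' is an arbitrary choice for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the text between the last "/" and the next ":".
chart_from_image() {
  local re='.*/([^:]+)'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1   # no match: print nothing and signal failure
  fi
}

string='my-auth-token-service=xxx.azurecr.io/auth-token-service:latest'
# The assignment takes the exit status of the command substitution,
# so the fallback only runs when the regex did not match.
chartname=$(chart_from_image "$string") || chartname='unknown'
echo "$chartname"
```

Keeping the regex in a variable avoids quoting surprises inside [[ ... ]].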

The Unix shell has parameter expansion built in. You can't nest these, so it takes multiple steps, but you avoid the overhead of starting multiple external processes.
var='my-auth-token-service=xxx.azurecr.io/auth-token-service:latest'
chartname=${var%%=*}
chartname=${chartname#*-}
The suffix operator ${var%pattern} returns the value of $var with any suffix matching pattern removed; the ${var#pattern} operator does the same for a prefix match. Doubling the operator changes it to trim the longest possible pattern match instead of the shortest. (These are shell glob patterns, not regular expressions, though.)
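As an illustration of the four operators, here is a sketch that pulls every part out of the same string. The variable names image, tag, repo, chart and registry are assumptions made for this example only.

```shell
#!/usr/bin/env bash
var='my-auth-token-service=xxx.azurecr.io/auth-token-service:latest'

image=${var#*=}       # drop shortest prefix through "=" -> xxx.azurecr.io/auth-token-service:latest
tag=${image##*:}      # drop longest prefix through ":"  -> latest
repo=${image%:*}      # drop shortest suffix from ":"    -> xxx.azurecr.io/auth-token-service
chart=${repo##*/}     # drop longest prefix through "/"  -> auth-token-service
registry=${repo%%/*}  # drop longest suffix from "/"     -> xxx.azurecr.io

printf '%s\n' "$registry" "$chart" "$tag"
```

These expansions are part of POSIX sh, so the sketch should not depend on Bash specifically.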
If you require a one-liner, you can refactor the cut into the sed script.
chartname=$(sed 's/[^-]*-\([^=]*\)=.*/\1/' <<< 'my-auth-token-service=xxx.azurecr.io/auth-token-service:latest')
Notice the modernized syntax $(cmd ...) over the obsolescent `cmd ...` and the Bash "here string" with <<< (not POSIX-compatible though).

With awk (only tested on the GNU variant):
var=my-auth-token-service=xxx.azurecr.io/auth-token-service:latest
echo "$var" | awk -F'[=:/]' -vOFS='\n' '{print $1, $2, $3, $NF}'
Output
my-auth-token-service
xxx.azurecr.io
auth-token-service
latest

Related

Regex group match using shell [duplicate]

I am trying to match a pattern and set that as a variable.
I have a file with many "key=value" pairs. I want to find the value for the key "fizz".
In the file I have this string
fizz="something_cool"
I try to parse it as:
cat file | grep fizz="(.*)"
I was thinking it would give me the group output, and then I would be able to use $1 to select it.
I also played with escaping characters, and with sed and awk, but I could not get it working.
You need to enable extended regex to use unescaped ( and ), and quote the pattern properly:
grep -E 'fizz="(.*)"' file
However, awk might be a better choice here since it does both the search and the filtering in the same command.
You may just use:
awk -F= '$1 == "fizz" {gsub(/"/, "", $2); print $2}' file
something_cool
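Note that grep -E prints the whole matching line rather than the captured group. If the goal is to store only the group in a variable, one option is sed with -n and the p flag. This is a sketch that builds its own temporary sample file; the extra foo and buzz lines are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Demo input; in practice this would be the existing file.
file=$(mktemp)
printf '%s\n' 'foo="bar"' 'fizz="something_cool"' 'buzz="other"' > "$file"

# -n suppresses the default output; the trailing p prints only lines where
# the substitution succeeded, so just the captured group is emitted.
value=$(sed -n 's/^fizz="\(.*\)"$/\1/p' "$file")
echo "$value"

rm -f "$file"
```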

Regular Expression to extract multiple values from a delimited string [closed]

I want to extract both the i-name and the IP address from the string below (where ; is the delimiter)
INPUT:
i-03ghijklmn345;abc;xyz;pqr;null;abc;null;null;null;disabled;/dev/sda1;abc;abc: User initiated shutdown;abc;abc;vpc-abc;**192.186.40.255**;abc /dev/sda1 vol-abc 2017-15-14T12:04:17.000z
I was able to retrieve the ipaddress only from this using ([0-9]{1,3}[\.]){3}[0-9]{1,3} but I need both strings in one line
OUTPUT:
i-03ghijklmn345;192.186.40.255
No need for AWK. Use grep:
# Partial Bash script
I_NAME=$(grep -Po 'i-\w+' your_file)
IP_ADDR=$(grep -Po '\d{1,3}(?:\.\d{1,3}){3}' your_file)
The RegEx is between the single quotes in the commands above.
If you want an awk solution, for a bit of diversity, you can use the following commands:
iName=$(awk 'BEGIN{RS=";"}/^i-\w+/{print $1; exit}' inputFile)
ipAddress=$(awk 'BEGIN{RS=";"}/([0-9]{1,3}[\.]){3}[0-9]{1,3}/{print $1; exit}' inputFile)
echo "$iName"
echo "$ipAddress"
output:
i-03ghijklmn345
192.186.40.255
explanations:
BEGIN{RS=";"} defines ; as the record separator.
/^i-\w+/{print $1; exit} prints the i-name as soon as it is reached, then exits without reading the rest of the input.
/([0-9]{1,3}[\.]){3}[0-9]{1,3}/{print $1; exit} works the same way to extract the IP address.
Finally, the results are assigned to the two variables, which you can display or use however you like.
Replace inputFile with whatever fits your needs.
If you want both values on one line, use the following awk command:
$ awk 'BEGIN{RS=";"}/^i-\w+/{printf "%s", $1}/([0-9]{1,3}[\.]){3}[0-9]{1,3}/{print ";"$1;exit}' inputFile
i-03ghijklmn345;192.186.40.255
TESTED:
Judging by your pattern, the first field is some sort of ID, so it should never contain an asterisk (*), while the IP address is always enclosed between asterisks. In that case the awk command below would also work.
$ cat 48437686
i03ghijklmn345;abc;xyz;pqr;null;abc;null;null;null;disabled;/dev/sda1;abc;abc: User initiated shutdown;abc;abc;vpc-abc;**192.186.40.255**;abc /dev/sda1 vol-abc 2017-15-14T12:04:17.000z
$ awk -v RS=";" 'BEGIN{oldORS=ORS}NR==1 || /^\*\*.*\*\*$/{gsub(/\*/,"");ORS=NR==1?";":oldORS;print}' 48437686
i03ghijklmn345;192.186.40.255
With awk. Set input and output field separator to ; and print columns 1 and 17:
awk 'BEGIN{FS=OFS=";"} {print $1,$17}' file
Output:
i-03ghijklmn345;192.186.40.255
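For completeness, the same result can be obtained in Bash alone with two regex matches. This is a sketch under the assumption that the asterisks around the IP address in the question are emphasis markup rather than part of the data; the variable names are illustrative.

```shell
#!/usr/bin/env bash
line='i-03ghijklmn345;abc;xyz;pqr;null;abc;null;null;null;disabled;/dev/sda1;abc;abc: User initiated shutdown;abc;abc;vpc-abc;192.186.40.255;abc /dev/sda1 vol-abc 2017-15-14T12:04:17.000z'

# Regexes are kept in variables so that ";" and "{" need no quoting in [[ ]].
name_re='^[^;]+'                       # everything before the first ";"
ip_re='([0-9]{1,3}\.){3}[0-9]{1,3}'    # first dotted quad in the line

[[ $line =~ $name_re ]] && name=${BASH_REMATCH[0]}
[[ $line =~ $ip_re ]] && ip=${BASH_REMATCH[0]}

echo "$name;$ip"
```

Like the original pattern, ip_re does not check that each octet is at most 255.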

BASH URL path extraction [duplicate]

I am trying to extract a path from the url with the following expression:
url=()
url+=("http://www.google.co.uk/setprefdomain?prefdom=US&sig=__REM5I87ZmVOTkq-ipnJx6oisXz0%3D")
url_path=`echo "${url[0]}"| cut -d# -f2`
echo "$url_path"
I would like to get: /setprefdomain?prefdom=US&sig=__REM5I87ZmVOTkq-ipnJx6oisXz0%3D
Any ideas please?
An additional challenge comes when the URLs vary in format, for example:
url=()
url+=("http://www.google.co.uk/setprefdomain?prefdom=US&sig=__REM5I87ZmVOTkq-ipnJx6oisXz0%3D")
url+=("www.google.co.uk/shopping?hl=en&tab=wf")
url+=("https://photos.google.com/?tab=wq")
url+=("accounts.google.com/ServiceLogin?hl=en&passive=true&continue=http://www.google.co.uk")
Then result should be:
/setprefdomain?prefdom=US&sig=__REM5I87ZmVOTkq-ipnJx6oisXz0%3D
/shopping?hl=en&tab=wf
/?tab=wq
/ServiceLogin?hl=en&passive=true&continue=http://www.google.co.uk
echo "$url" | awk -F / '{print "/"$NF}'
/setprefdomain?prefdom=US&sig=__REM5I87ZmVOTkq-ipnJx6oisXz0%3D
In straight bash, if there are no slashes after google.co.uk/, you can use
url_path=${url[0]/#*\//\/}
The ${<var>/#<pat>/<repl>} construct replaces <pat> at the beginning (#) of the expansion of <var> with <repl>. Here, that is
var => url[0]
pat => *\/ , i.e., anything followed by a slash
repl => \/ , i.e., a single slash
The issue with your code specifically is that you are supplying the wrong delimiter to cut as well as the wrong field. Instead of -d# you should be using -d'/', and in the example you provided you want the 4th field, not the second. So what you should have used was this:
url_path=`echo "${url[0]}"| cut -d'/' -f4`
echo $url_path
setprefdomain?prefdom=US&sig=__REM5I87ZmVOTkq-ipnJx6oisXz0%3D
However, that omits the beginning slash. If you need that, you can manually prepend it like this:
url_path='/'`echo "${url[0]}"| cut -d'/' -f4`
echo $url_path
/setprefdomain?prefdom=US&sig=__REM5I87ZmVOTkq-ipnJx6oisXz0%3D
If there is a chance the urls will have differing numbers of slashes though, you may need a more robust solution. If you're certain you always want the 4th field, this cut example will work fine. If you always want the last field, you will be better off with awk or with bash's parameter expansion.
Here's what you'd need for either of those:
With awk, the delimiter is set with -F, and $NF accesses the final field:
url_path=`echo ${url[0]} | awk -F / '{print "/"$NF}'`
With bash parameter expansion, ${var/pattern/} removes pattern from var. The pattern http:\/\/*\/ matches everything from "http://" to the final slash. Once again, the final slash is not in the output and is manually prepended:
url_path=`echo "/${url/http:\/\/*\//}"`
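For the mixed-format list in the question (with or without a scheme, and with slashes inside the query string), neither a fixed field number nor the last field is reliable. Here is a sketch of one possible approach in plain Bash: strip a leading scheme if present, then drop everything up to the first remaining slash. The function name url_path and the scheme pattern are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path (with query string) of a URL,
# whether or not it starts with a scheme such as http://.
url_path() {
  local u=$1
  local scheme_re='^[A-Za-z][A-Za-z0-9+.-]*://'
  # Strip the scheme only when it is at the very start, so a "://"
  # inside the query string is left alone.
  if [[ $u =~ $scheme_re ]]; then
    u=${u:${#BASH_REMATCH[0]}}
  fi
  # What remains is host/path; drop the host up to the first slash.
  if [[ $u == */* ]]; then
    printf '/%s\n' "${u#*/}"
  else
    printf '/\n'   # no path at all
  fi
}

urls=(
  "http://www.google.co.uk/setprefdomain?prefdom=US&sig=__REM5I87ZmVOTkq-ipnJx6oisXz0%3D"
  "www.google.co.uk/shopping?hl=en&tab=wf"
  "https://photos.google.com/?tab=wq"
  "accounts.google.com/ServiceLogin?hl=en&passive=true&continue=http://www.google.co.uk"
)
for u in "${urls[@]}"; do
  url_path "$u"
done
```

The sketch does not handle fragments, userinfo or ports specially; they simply stay in the host or path part they belong to.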

cURL and Bash, capture variable and value from string [closed]

Using bash, grep, split, awk, or sed, I would like to capture
ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ
from
Set-Cookie: ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ; secure; path=/
'ASPSESSIONID' remains always the same + 8 random characters (SUSTQBQS).
Also, this variable may not always be located in the second column or right after 'Set-Cookie: '.
Can anyone please help?
With awk:
awk -F"[ ;]" '{print $2}' FileName
Set the field separator to space and ;. Then print the 2nd field.
The basic regex structure is the same in various programs.
It may be explained in words as: the text between the colon and space (: ) and the semicolon (;), which in regex parlance is:
: ([^;]*);
And could be assigned to a var:
RE=': ([^;]*);'
Then, we could use it in
bash
while read l; do
[[ $l =~ $RE ]] && echo "${BASH_REMATCH[1]}";
done <file
gawk
gawk -v RE="$RE" '$0 ~ RE { print gensub(".*"RE".*","\\1",1); }' file
sed
sed -rn 's/^.*'"$RE"'.*$/\1/p' file # using -r avoids the several `\`
Try this sed command
sed 's/[^:]\+..\([^;]\+\).*/\1/' FileName
Explanation:
[^:]\+ -- Match the characters up to the :
.. -- Match two characters (the : and the space)
\([^;]\+\) -- Capture everything up to the next ;
.* -- Match all characters after the captured group
\1 -- Replace the whole match with the captured group
Output:
ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ
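Since the question says the cookie may not always come right after 'Set-Cookie: ', it may be more robust to match on the cookie name itself. This is a sketch, assuming the 8 random characters are always uppercase letters (as in the sample) and that grep supports -o (GNU and BSD grep do). The foo=bar cookie in the sample header is made up to show the position does not matter.

```shell
#!/usr/bin/env bash
header='Set-Cookie: foo=bar; ASPSESSIONIDSUSTQBQS=AAHNFMBAGABAILMKCDGIIMFJ; secure; path=/'

# -o prints only the matching part: the fixed name, 8 uppercase letters,
# "=", and then everything up to (not including) the next ";".
cookie=$(printf '%s\n' "$header" | grep -o 'ASPSESSIONID[A-Z]\{8\}=[^;]*')
echo "$cookie"
```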

Extract string from string using RegEx in the Terminal [duplicate]

I have a string like first url, second url, third url and would like to extract only the url after the word second in the OS X Terminal (only the first occurrence). How can I do it?
In my favorite editor I used the regex /second (url)/ and used $1 to extract it, I just don't know how to do it in the Terminal.
Keep in mind that url is an actual url, I'll be using one of these expressions to match it: Regex to match URL
echo 'first url, second url, third url' | sed 's/.*second//'
Edit: I misunderstood. Better:
echo 'first url, second url, third url' | sed 's/.*second \([^ ]*\).*/\1/'
or:
echo 'first url, second url, third url' | perl -nle 'm/second ([^ ]*)/; print $1'
Piping to another process (like 'sed' and 'perl' suggested above) might be very expensive, especially when you need to run this operation multiple times. Bash does support regexp:
[[ "string" =~ regex ]]
Similarly to the way you extract matches in your favourite editor by using $1, $2, etc., Bash fills in the $BASH_REMATCH array with all the matches.
In your particular example:
str="first url1, second url2, third url3"
if [[ $str =~ (second )([^,]*) ]]; then
echo "match: '${BASH_REMATCH[2]}'"
else
echo "no match found"
fi
Output:
match: 'url2'
Specifically, =~ supports extended regular expressions as defined by POSIX, but with platform-specific extensions (which vary in extent and can be incompatible).
On Linux platforms (GNU userland), see man grep; on macOS/BSD platforms, see man re_format.
The other answer still leaves everything after the desired URL in the output, so I propose the following solution.
echo 'first url, second url, third url' | sed 's/.*second \(url\)*.*/\1/'
In sed, you group an expression by escaping the parentheses around it (POSIX standard).
While trying this, what you probably forgot was the -E argument for sed.
From sed --help:
-E, -r, --regexp-extended
use extended regular expressions in the script
(for portability use POSIX -E).
You don't have to change your regex significantly, but you do need to add .* to match greedily around it to remove the other part of string.
This works fine for me:
echo "first url, second url, third url" | sed -E 's/.*second (url).*/\1/'
Output:
url
In which the output "url" is actually the second instance in the string. But if you already know that it is formatted in between comma and space, and you don't allow these characters in URLs, then the regex [^,]* should be fine.
Optionally:
echo "first http://test.url/1, second ://test.url/with spaces/2, third ftp://test.url/3" \
| sed -E 's/.*second ([a-zA-Z]*:\/\/[^,]*).*/\1/'
Which correctly outputs:
://test.url/with spaces/2