Get first set of 8 numbers only with Sed - regex

I'm using sed on Windows to get the first set of eight digits in a file name, but it keeps giving me only the second set, and I cannot figure out what I'm doing wrong.
My Code:
echo JiggySauce_20161208_21325005_Meat.txt | sed -r "s/.*_([0-9]*)_.*/\1/g"
Additional example (so a regex based on the underscore delimiters won't always work):
echo JiggySauce_Mustard_Mayo_20161208_21325005_Meat.txt | sed -r "s/.*_([0-9]*)_.*/\1/g"
I keep getting this result, which is not what I need:
21325005
My expected result:
20161208
I could even live with this (preferably not, but I suppose I could work with it):
20161208_21325005
Please help me with this if you have an answer, as I'm at a standstill and stumped over here.

With GNU sed:
echo JiggySauce_20161208_21325005_Meat.txt | sed -r 's/^[^_]*_([^_]*).*/\1/'
Output:
20161208
Update after the initial answer (the pattern above returns Mustard for the second example), following a suggestion from Cyrus:
sed -r 's/[^0-9]*([0-9]{8}).*/\1/'
Output:
20161208
See: The Stack Overflow Regular Expressions FAQ
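Why the question's pattern returns the second number: the leading .* is greedy, so it takes as much of the name as it can while still leaving a _digits_ group for the rest of the pattern, and that is the last such group. A minimal sketch comparing it with the [^0-9]* variant on both sample names; the helper names greedy and leftmost are made up for illustration, and sed -E (accepted by GNU and BSD sed) is assumed:

```shell
#!/usr/bin/env bash
# greedy and leftmost are hypothetical helper names used only for this comparison.

# The question's pattern: the leading .* grabs as much as possible, so the
# capture ends up on the LAST "_digits_" group in the name.
greedy() { printf '%s\n' "$1" | sed -E 's/.*_([0-9]*)_.*/\1/'; }

# [^0-9]* can never consume a digit, so the capture starts at the FIRST digit.
leftmost() { printf '%s\n' "$1" | sed -E 's/^[^0-9]*([0-9]{8}).*/\1/'; }

for name in JiggySauce_20161208_21325005_Meat.txt \
            JiggySauce_Mustard_Mayo_20161208_21325005_Meat.txt; do
  echo "$name: greedy=$(greedy "$name") leftmost=$(leftmost "$name")"
done
```

sed has no lazy quantifier, so excluding digits from the leading part is the usual way to stop at the first number.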

Using grep:
echo JiggySauce_20161208_21325005_Meat.txt | grep -Eo '[0-9]+' | head -1
or
echo JiggySauce_20161208_21325005_Meat.txt | tr '_' '\n' | grep -m1 -Eo '[0-9]+'
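If the shell is bash (for example Git Bash or WSL on Windows; that environment is an assumption, the question only mentions sed on Windows), the same leftmost match can be taken without sed or grep. A sketch with a hypothetical first_eight_digits helper:

```shell
#!/usr/bin/env bash
# first_eight_digits is a hypothetical helper name.
first_eight_digits() {
  # [[ =~ ]] uses an extended regex and finds the leftmost match;
  # BASH_REMATCH[1] holds the text of the first capture group.
  if [[ $1 =~ ([0-9]{8}) ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1   # no run of 8 digits in the name
  fi
}

first_eight_digits JiggySauce_20161208_21325005_Meat.txt
first_eight_digits JiggySauce_Mustard_Mayo_20161208_21325005_Meat.txt
```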

Related

how to sed for pattern before and after match

I am currently trying to get specific parameters from a URL.
My url looks like: https://private.io/report-artifact/dsop-pipeline-artifacts/container-scan-reports/redhat/ubi/ubi7/7.8/2020-02-14T222203.548_2868/ubi7-7.8.tar
I want just redhat/ubi/ubi7/7.8
I can get redhat/ubi/ubi7/7.8/2020-02-14T222203.548_2868/ubi7-7.8.tar by doing:
echo https://private.io/report-artifact/dsop-pipeline-artifacts/container-scan-reports/redhat/ubi/ubi7/7.8/2020-02-14T222203.548_2868/ubi7-7.8.tar | sed 's|.*/container-scan-reports/||'
So I want to remove /2020-02-14T222203.548_2868/ubi7-7.8.tar
I would also like to change each / to a - so that I have redhat-ubi-ubi7-7.8
With GNU sed:
Get the 4 following path elements after .*/container-scan-reports/ and replace all / with -:
url='https://private.io/report-artifact/dsop-pipeline-artifacts/container-scan-reports/redhat/ubi/ubi7/7.8/2020-02-14T222203.548_2868/ubi7-7.8.tar'
echo "$url" | sed -E 's|.*/container-scan-reports/(([^/]*/){3}[^/]*).*|\1|;s|/|-|g'
Or you could get everything after .*/container-scan-reports/, but not the last two path elements:
echo "$url" | sed -E 's|.*/container-scan-reports/(.*)/[^/]*/[^/]*|\1|;s|/|-|g'
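The same "everything after the marker except the last two path elements" idea can also be written with bash parameter expansion alone, without starting sed. A sketch assuming bash (the // substitution is not POSIX sh):

```shell
#!/usr/bin/env bash
url='https://private.io/report-artifact/dsop-pipeline-artifacts/container-scan-reports/redhat/ubi/ubi7/7.8/2020-02-14T222203.548_2868/ubi7-7.8.tar'

rest=${url#*/container-scan-reports/}   # drop the shortest prefix ending in the marker
rest=${rest%/*/*}                       # drop the last two path elements
slug=${rest//\//-}                      # bash only: replace every / with -
echo "$slug"
```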
When you know the position in the string, you can use cut (assuming the variable string holds the URL):
echo "${string}" | cut -d/ -f 7-10 | tr '/' '-'
Another way with sed is
echo "${string}" | sed -E 's#([^/]*/){6}([^/]*)/([^/]*)/([^/]*)/([^/]*).*#\2-\3-\4-\5#'
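The positional idea also works by splitting the URL into a bash array. A sketch with a hypothetical url_to_slug helper, assuming (like the cut answer) that the wanted elements are always path fields 7 to 10, which are array indices 6 to 9:

```shell
#!/usr/bin/env bash
# url_to_slug is a hypothetical helper name.
url_to_slug() {
  local -a parts
  IFS=/ read -r -a parts <<< "$1"   # split on every /; "//" yields an empty field
  local IFS=-                       # "${arr[*]}" joins with the first char of IFS
  printf '%s\n' "${parts[*]:6:4}"   # four elements starting at index 6
}

url_to_slug 'https://private.io/report-artifact/dsop-pipeline-artifacts/container-scan-reports/redhat/ubi/ubi7/7.8/2020-02-14T222203.548_2868/ubi7-7.8.tar'
```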

Replace string with another string based on backreference with sed

I'm trying to replace a predefined placeholder %c#, where # is some number, with another string. The catch is that the other string must be truncated to # characters.
Ideally, this set of commands would work:
FORMAT="%c10"
LAST_COMMIT="5189e42b14797b1e36ffb7fc5657c7eea08f1c0f"
echo $FORMAT | sed "s/%c\([0-9]\+\)/${LAST_COMMIT:0:\1}/g"
but clearly there is a syntax error on the \1. You can replace it with a number to see what I'm trying to get as output.
I'm open to using a program other than sed to achieve this, but ideally it should be something available on most Linux installations.
Thanks!
This is my idea.
echo ${LAST_COMMIT} | head -c $(echo ${FORMAT} | sed -e 's/%c//')
Extract the number with sed, then take that many leading characters with head.
EDIT1
This might be better.
echo ${LAST_COMMIT} | head -c $(echo ${FORMAT} | sed -e 's/%c\([0-9]\+\)/\1/')
EDIT2
I wrote it as a script because the one-liner is hard to understand. Please try this.
$ cat sample.sh
#!/bin/bash
FORMAT="%b-%t-%c10-%c5"
LAST_COMMIT="5189e42b14797b1e36ffb7fc5657c7eea08f1c0f"
## List numbers
lengths=$(echo ${FORMAT} | sed -e "s/%[^c]//g" -e "s/-//g" -e "s/%c/ /g")
## Substitute %cXX to first XX characters of LAST_COMMIT
for n in ${lengths}
do
to_str=${LAST_COMMIT:0:${n}}
FORMAT=$(echo "${FORMAT}" | sed "s/%c${n}/${to_str}/")
done
## Print result
echo ${FORMAT}
This is the result.
$ ./sample.sh
%b-%t-5189e42b14-5189e
The same thing as a one-line command (same logic, but long and hard to read):
for n in $(echo ${FORMAT} | sed -e "s/%[^c]//g" -e "s/-//g" -e "s/%c/ /g"); do to_str=${LAST_COMMIT:0:${n}}; FORMAT=$(echo "${FORMAT}" | sed "s/%c${n}/${to_str}/"); done; echo ${FORMAT}
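Matching one token at a time also avoids a subtle issue with the sed loop: s/%c1/.../ would also match the start of %c10 if that came first. A bash-only sketch with a hypothetical expand_commit_format helper:

```shell
#!/usr/bin/env bash
# expand_commit_format is a hypothetical helper: it replaces every %cN in a
# format string with the first N characters of a commit hash, without sed.
expand_commit_format() {
  local fmt=$1 commit=$2 out="" tok n
  # [[ =~ ]] finds the leftmost %c<digits>; BASH_REMATCH[0] is the whole token.
  while [[ $fmt =~ %c([0-9]+) ]]; do
    tok=${BASH_REMATCH[0]}
    n=$((10#${BASH_REMATCH[1]}))         # force base 10 so "08" is not read as octal
    out+="${fmt%%"$tok"*}${commit:0:n}"  # text before the token + truncated hash
    fmt=${fmt#*"$tok"}                   # continue after the token
  done
  printf '%s\n' "$out$fmt"
}

LAST_COMMIT="5189e42b14797b1e36ffb7fc5657c7eea08f1c0f"
expand_commit_format "%b-%t-%c10-%c5" "$LAST_COMMIT"
```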
The value of $LAST_COMMIT gets interpolated before sed runs, so there is no backreference to refer back to yet. There is an /e extension in GNU sed which would support something like this, but I would simply use a slightly more capable tool.
perl -e '$fmt = shift; $fmt=~ s/%c(\d+)/%.$1s/g; printf("$fmt\n", @ARGV)' '%c10' "$LAST_COMMIT"
Of course, if you can let go of your own ad-hoc format string specifier, and switch to a printf-compatible format string altogether, just use the printf shell command straight off.
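The %.Ns precision that the Perl answer relies on (print at most N characters of a string) is also supported by the shell's own printf, so the conversion can be done with sed and printf alone. A sketch with a hypothetical format_commit helper, assuming the format contains exactly one %cN placeholder and no other % sequences:

```shell
#!/usr/bin/env bash
# format_commit is a hypothetical helper. The hash is passed to printf once,
# so this assumes a single %cN placeholder and no other % conversions.
format_commit() {
  local fmt
  # Rewrite each %cN as the printf conversion %.Ns (print at most N chars).
  fmt=$(printf '%s' "$1" | sed -E 's/%c([0-9]+)/%.\1s/g')
  # The format string is intentionally built at run time here.
  printf "$fmt\n" "$2"
}

LAST_COMMIT="5189e42b14797b1e36ffb7fc5657c7eea08f1c0f"
format_commit '%c10' "$LAST_COMMIT"   # same result as: printf '%.10s\n' "$LAST_COMMIT"
```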
length=$(echo $FORMAT | sed "s/%c\([0-9]\+\)/\1/g")
echo "${LAST_COMMIT:0:$length}"

(GNU)Sed: how to replace any character from nth character to nth+10?

I need to replace characters 10 through 20 in a string that looks like this:
123456789012345678901234567890
So far I've tried:
a)
Works for the 10th character ONLY:
echo "123456789012345678901234567890" | sed 's/./X/10'
b)
Doesn't work on the range:
echo "123456789012345678901234567890" | sed 's/./X/10,20'
echo "123456789012345678901234567890" | sed 's/./X/10\,20'
echo "123456789012345678901234567890" | sed 's/./X/\{10,20\}'
echo "123456789012345678901234567890" | sed 's/./X/\{10\,20\}'
None of these work; I get the error:
unknown option to `s'
So the question is: how do I make this work?
echo "123456789012345678901234567890" | sed 's/./X/10,20'
Try:
$ sed -r "s/^(.{9})(.{11})/\1XXXXXXXXXXX/" <<< 123456789012345678901234567890
123456789XXXXXXXXXXX1234567890
It is a tricky sed problem; this is the only solution I could find (note that it replaces characters 11 to 20):
$ sed 's/^\(.\{10\}\)\(.\{10\}\)/\1XXXXXXXXXX/' <<< 123456789012345678901234567890
1234567890XXXXXXXXXX1234567890
With awk it looks nicer:
$ awk 'BEGIN{FS=OFS=""} {for (i=10;i<=20;i++) $i="X"} {print}' <<< 123456789012345678901234567890
123456789XXXXXXXXXXX1234567890
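Splitting the line on an empty FS is supported by GNU awk, but POSIX leaves its behavior unspecified. A sketch that relies only on substr, with a hypothetical mask_awk helper taking 1-based, inclusive positions:

```shell
#!/usr/bin/env bash
# mask_awk is a hypothetical helper: mask_awk STRING FROM TO (1-based, inclusive).
# It uses only POSIX awk features (substr), not an empty field separator.
mask_awk() {
  printf '%s\n' "$1" | awk -v from="$2" -v to="$3" '{
    xs = ""
    for (i = from; i <= to; i++) xs = xs "X"          # one X per masked character
    print substr($0, 1, from - 1) xs substr($0, to + 1)
  }'
}

mask_awk 123456789012345678901234567890 10 20
```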
You can do it with bash parameter substitution like this:
#!/bin/bash
s="123456789012345678901234567890"
l=${s:0:9} # Extract left part (characters 1-9)
m=${s:9:11} # Extract middle part (characters 10-20)
r=${s:20} # Extract right part (characters 21 onward)
# Diddle with middle part to your heart's content and re-assemble "$l$m$r" when done
m=$(sed 's/./X/g' <<<"$m")
echo "$l$m$r"
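The middle part can also be masked without starting a sed process, using bash's ${var//pattern/replacement}. A sketch with a hypothetical mask_range helper and 1-based, inclusive positions:

```shell
#!/usr/bin/env bash
# mask_range is a hypothetical helper: mask_range STRING FROM TO (1-based, inclusive).
mask_range() {
  local s=$1 from=$2 to=$3
  local left=${s:0:from-1}                 # characters 1 .. FROM-1
  local mid=${s:from-1:to-from+1}          # characters FROM .. TO
  local right=${s:to}                      # characters TO+1 .. end
  printf '%s\n' "$left${mid//?/X}$right"   # ? matches any single character
}

mask_range 123456789012345678901234567890 10 20
```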
Or, you can do this:
transform the row of letters into a column so each is on its own line
apply your edits to LINES 10 through 20 (as opposed to characters 10 through 20)
transform column of letters back into a row (by deleting linefeeds)
as shown in the one-liner below:
$ echo "123456789012345678901234567890" | sed "s/\(.\)/\1\n/g" | sed "10,20s/./X/" | tr -d "\n"
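The same column trick can be done without GNU sed's \n in the replacement by letting fold put one character per line. A sketch with a hypothetical mask_columns helper, assuming single-byte (ASCII) characters:

```shell
#!/usr/bin/env bash
# mask_columns is a hypothetical helper: mask_columns STRING FROM TO.
# fold -w1 puts one character per line, so the sed line range FROM,TO
# addresses characters FROM..TO; tr then joins the lines back together.
mask_columns() {
  printf '%s\n' "$1" | fold -w1 | sed "${2},${3}s/./X/" | tr -d '\n'
  echo   # tr also deleted the final newline, so add it back
}

mask_columns 123456789012345678901234567890 10 20
```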
I know it looks ugly, but:
echo "123456789012345678901234567890" | \
sed 's/^\(.\{10\}\).\{10\}\(.*\)/\1XXXXXXXXXX\2/'
Without placing multiple X's in the sed command:
sed -r 's/^(.{9})(.{11})(.*)$/\1\n\2\n\3/' | sed '2s/./X/g' | sed 'N;N;s/\n//g'
To replace the 10th to 20th characters, inclusive, try:
echo 123456789012345678901234567890 | sed 's/\(.\{9\}\).\{11\}/\1XXXXXXXXXXX/'
123456789XXXXXXXXXXX1234567890
With GNU sed, you can use the -r switch to remove most of the backslashes:
echo 123456789012345678901234567890 | sed -r 's/(.{9}).{11}/\1XXXXXXXXXXX/'
Or the naive approach also works here:
echo 123456789012345678901234567890 | sed 's/\(.........\).........../\1XXXXXXXXXXX/'
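Counting the X's by hand is error-prone (one X too few silently shortens the string), so the replacement can instead be generated from the positions. A sketch with a hypothetical mask_sed helper that builds the same "keep FROM-1 characters, replace the next TO-FROM+1" expression as the answers above, assuming bash and sed -E:

```shell
#!/usr/bin/env bash
# mask_sed is a hypothetical helper: mask_sed STRING FROM TO (1-based, inclusive).
mask_sed() {
  local keep=$(( $2 - 1 )) width=$(( $3 - $2 + 1 )) xs
  xs=$(printf '%*s' "$width" '' | tr ' ' X)   # a run of WIDTH X characters
  printf '%s\n' "$1" | sed -E "s/^(.{$keep}).{$width}/\1$xs/"
}

mask_sed 123456789012345678901234567890 10 20
```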
This might work for you (GNU sed):
sed ':a;/.\{9\}X\{11\}/!s/\(.\{9\}X*\)./\1X/;ta' file
or with a bit of syntactic sugar:
sed -r ':a;/.{9}X{11}/!s/(.{9}X*)./\1X/;ta' file

Using sed and regex to capture last part of url

I'm trying to make sed match the last part of a url and output just that. For example:
echo "http://randomurl/suburl/file.mp3" | sed (expression)
should give the output:
file.mp3
So far I've tried sed 's|\([^/]+mp3\)$|\1|g', but it just outputs the whole URL. Maybe there's something I'm not seeing here, but anyway, help would be much appreciated!
this works:
echo "http://randomurl/suburl/file.mp3" | sed 's#.*/##'
basename is your good friend.
> basename "http://randomurl/suburl/file.mp3"
=> file.mp3
This should do the job:
$ echo "http://randomurl/suburl/file.mp3" | sed -r 's|.*/(.*)$|\1|'
file.mp3
where:
| has been used instead of / to separate the arguments of the s command.
Everything is matched and replaced with whatever is found after the last /.
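The question's attempt printed the whole URL for two reasons: in a basic regex (no -r/-E) + is a literal plus sign, so the pattern never matched, and even with + working it would only have replaced the tail with itself, removing nothing. A minimal corrected version that keeps the question's shape (the helper name mp3_name is hypothetical):

```shell
#!/usr/bin/env bash
# mp3_name is a hypothetical helper name.
mp3_name() {
  # .*/ is greedy, so it consumes everything up to the LAST slash.
  # [^/][^/]* is the portable BRE spelling of "one or more non-slash characters".
  # -n with the p flag prints a line only when the substitution succeeded.
  printf '%s\n' "$1" | sed -n 's|.*/\([^/][^/]*mp3\)$|\1|p'
}

mp3_name "http://randomurl/suburl/file.mp3"
```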
Edit: You could also use bash parameter substitution capabilities:
$ url="http://randomurl/suburl/file.mp3"
$ echo ${url##*/}
file.mp3
echo 'http://randomurl/suburl/file.mp3' | grep -oP '[^/\n]+$'
Here's another solution using grep.

grep: group capturing

I have the following string:
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
and I need to get the value of "scheme_version", which is 1234 in this example.
I have tried
grep -Eo "\"scheme_version\":(\w*)"
however it returns
"scheme_version":1234
How can I do it? I know I can add a sed call, but I would prefer to do it with a single grep.
You'll need to use a lookbehind assertion so that the key isn't included in the match (this needs grep -P, which GNU grep provides):
grep -Po '(?<=scheme_version":)[0-9]+'
This might work for you:
echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
sed -n 's/.*"scheme_version":\([^}]*\)}/\1/p'
1234
Sorry, it's not grep, so disregard this solution if you like.
Or stick with grep and add:
grep -Eo "\"scheme_version\":(\w*)"| cut -d: -f2
I would recommend that you use jq for the job. jq is a command-line JSON processor.
$ cat tmp
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
$ cat tmp | jq .scheme_version
1234
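If jq is not installed, Python's standard json module can do the same lookup by key. A sketch assuming python3 is available; the helper name json_value is hypothetical:

```shell
#!/usr/bin/env bash
# json_value is a hypothetical helper: json_value KEY reads a JSON object on stdin.
# json.load parses the whole document, so the key's position does not matter.
json_value() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)[sys.argv[1]])' "$1"
}

echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
  json_value scheme_version
```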
As an alternative to the positive lookbehind method suggested by SiegeX, you can reset the match starting point to directly after scheme_version": with the \K escape sequence. E.g.,
$ grep -Po 'scheme_version":\K[0-9]+'
This restarts the matching process after scheme_version": has been matched, and tends to perform far better than the positive lookbehind. Comparing the two on regex101 shows that the match-reset method takes 37 steps and 1ms, while the positive lookbehind method takes 194 steps and 21ms.
You can compare the performance yourself on regex101 and you can read more about resetting the match starting point in the PCRE documentation.
To avoid relying on grep's PCRE support (-P), which is available in GNU grep but not in the BSD version, another option is ripgrep, e.g.
$ rg -o 'scheme_version.?:(\d+)' -r '$1' <file.json
1234
Here -r (--replace) prints the given replacement text instead of the match; it supports capture group indices (e.g., $5) and names (e.g., $foo).
Another example using Python's json.tool module, which can validate and pretty-print JSON:
$ python -mjson.tool file.json | rg -o 'scheme_version[^\d]+(\d+)' -r '$1'
1234
Related: Can grep output only specified groupings that match?
You can do this:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | awk -F ':' '{print $4}' | tr -d '}'
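Splitting on : depends on the fields staying in the same order; awk's match() can instead locate the key wherever it appears. A sketch with a hypothetical json_number helper, assuming the key has no regex metacharacters and its value is an unquoted integer written as "KEY":digits with no spaces:

```shell
#!/usr/bin/env bash
# json_number is a hypothetical helper: json_number KEY reads JSON text on stdin.
json_number() {
  awk -v key="$1" '{
    if (match($0, "\"" key "\":[0-9]+")) {  # RSTART/RLENGTH locate the match
      s = substr($0, RSTART, RLENGTH)
      sub(/.*:/, "", s)                     # keep only the digits after the colon
      print s
    }
  }'
}

echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
  json_number scheme_version
```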
Improving on @potong's answer, which only works for "scheme_version", you can use this expression and change the key name to extract any field:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_id":["]*\([^(",})]*\)[",}].*/\1/p'
scheme_version
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_rev":["]*\([^(",})]*\)[",}].*/\1/p'
4-cad1842a7646b4497066e09c3788e724
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"scheme_version":["]*\([^(",})]*\)[",}].*/\1/p'
1234
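The expression above can be wrapped so the key is a parameter. A sketch with a hypothetical json_field helper; it assumes one-line JSON and a key without regex metacharacters or the / delimiter, since the key is inserted into the sed script as-is:

```shell
#!/usr/bin/env bash
# json_field is a hypothetical helper: json_field KEY reads one-line JSON on stdin.
json_field() {
  # Expands to: s/.*"KEY":["]*\([^",}]*\)[",}].*/\1/p
  sed -n "s/.*\"$1\":[\"]*\([^\",}]*\)[\",}].*/\1/p"
}

doc='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'
for key in _id _rev scheme_version; do
  printf '%s = %s\n' "$key" "$(printf '%s\n' "$doc" | json_field "$key")"
done
```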