Sed invalid range end - regex

I have strings like these:
volume 5
vol. 5
V. 5
v. 5
I'm trying to turn them into this format:
\textbf{5}
with this sed command
s/\(v[a-Z]*[.]*\) \([0-9]*\)/\1 \\textbf{\2}/
but I keep getting invalid range end. Am I doing something wrong with the 0-9 range?

If you check the ASCII table, you will see that the value of a (97) is higher than the value of Z (90), so a-Z is an invalid range; the 0-9 range is fine. Moreover, you need a case-insensitive pattern, so add the I modifier (GNU sed only):
echo 'volume 5' | sed 's/\(v[a-z]*[.]*\) \([0-9]*\)/\1 \\textbf{\2}/gI'
echo 'vol. 5' | sed 's/\(v[a-z]*[.]*\) \([0-9]*\)/\1 \\textbf{\2}/gI'
echo 'V. 5' | sed 's/\(v[a-z]*[.]*\) \([0-9]*\)/\1 \\textbf{\2}/gI'
echo 'v. 5' | sed 's/\(v[a-z]*[.]*\) \([0-9]*\)/\1 \\textbf{\2}/gI'
produces
volume \textbf{5}
vol. \textbf{5}
V. \textbf{5}
v. \textbf{5}
Since the BSD implementation of sed may not support case-insensitive matching, on macOS you can install GNU sed with Homebrew:
brew install gnu-sed
and then use
gsed -e 's/\(v[a-z]*[.]*\) \([0-9]*\)/\1 \\textbf{\2}/gI'
etc.
Or, add the uppercase letters to the bracket expression (and accept an uppercase V at the start, too):
sed 's/\([vV][a-zA-Z]*[.]*\) \([0-9]*\)/\1 \\textbf{\2}/g'
And if you want to make sure only ASCII letters are matched, run it in the C locale:
LC_ALL=C sed 's/\([vV][a-zA-Z]*[.]*\) \([0-9]*\)/\1 \\textbf{\2}/g'
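A minimal sketch, assuming a POSIX shell: the hypothetical helper bold_volume below avoids both the invalid a-Z range and the GNU-only I flag by starting with [vV] and using the POSIX class [[:alpha:]] instead of a range, so it should behave the same with GNU and BSD sed. Note that [[:alpha:]] follows the current locale, so it may also match non-ASCII letters.

```shell
#!/bin/sh
# Hypothetical helper: wrap the volume number in \textbf{...}.
# [vV] handles the case of the first letter, and [[:alpha:]] replaces the
# invalid a-Z range, so neither the GNU-only I flag nor a range is needed.
bold_volume() {
  printf '%s\n' "$1" | sed 's/\([vV][[:alpha:]]*[.]*\) \([0-9]*\)/\1 \\textbf{\2}/'
}

for s in 'volume 5' 'vol. 5' 'V. 5' 'v. 5'; do
  bold_volume "$s"
done
```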

This worked for me:
sed -r "s/([vV][a-zA-Z]*[.]*) ([0-9]*)/\1 \\\textbf{\2}/"
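The triple backslash is a shell quoting detail: inside double quotes the shell turns \\ into \ but leaves \t alone, so sed receives \\textbf, which it outputs as a literal \textbf. A quick sketch to check this, using -E (accepted by current GNU sed and by BSD sed) in place of -r:

```shell
#!/bin/sh
# What the shell passes on: "\\\textbf" becomes \\textbf.
printf '%s\n' "\\\textbf"

# The same double-quoted sed script, with -E instead of -r.
printf '%s\n' 'V. 5' | sed -E "s/([vV][a-zA-Z]*[.]*) ([0-9]*)/\1 \\\textbf{\2}/"
```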

Related

Why isn't Mac sed matching what I expect?

echo 'iPhone 12 Pro Max (5EF5105C-7EED-4017-979C-A6185E927B84) (Booted)' | sed -En 's,(\w+-\w+-\w+-\w+-\w+),\1,p'
I'm using extended regex with -E (-r in GNU sed) and -n to print only the matched/replaced lines. Assuming my regex on regex101 is correct,
I expect 5EF5105C-7EED-4017-979C-A6185E927B84 in the output, but I get nothing.
If you're just trying to get the serial number out from inside the parentheses, and you're not actually modifying anything, then use grep:
$ echo 'iPhone 12 Pro Max (5EF5105C-7EED-4017-979C-A6185E927B84) (Booted)' \
| grep -E '\w+-\w+-\w+-\w+-\w+' -o
5EF5105C-7EED-4017-979C-A6185E927B84
-o tells grep "Just output what matched, not the entire line".
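If you do want to stay with sed, two things need to change. This is sketched below under the assumption that the value is always a standard 8-4-4-4-12 hex UUID. First, \w is not part of POSIX extended regular expressions and may not be understood by BSD sed, so spelling out the hex digits as bracket expressions is the portable choice. Second, s,(…),\1, only replaces the matched text with itself, so the pattern has to consume the rest of the line with .* to leave only the captured group. The function name extract_uuid is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the UUID found inside parentheses.
# -E enables extended regexes; -n plus the p flag prints only lines where
# the substitution happened. The leading .*\( and trailing \).* swallow the
# rest of the line, so the replacement \1 leaves just the UUID.
extract_uuid() {
  printf '%s\n' "$1" |
    sed -En 's/.*\(([0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})\).*/\1/p'
}

extract_uuid 'iPhone 12 Pro Max (5EF5105C-7EED-4017-979C-A6185E927B84) (Booted)'
```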

sed - exchange words with delimiter

I'm trying to swap words around with sed, not replace them; replacing is all I keep finding on Google.
I don't know if it's the regex that I'm getting wrong. I searched for how to match everything before a character and everything after a character, and that's how I got the regex.
echo xxx,aaa | sed -r 's/[^,]*/[^,]*$/'
or
echo xxx/aaa | sed -r 's/[^\/]*/[^\/]*$/'
I am getting this output:
[^,]*$,aaa
or this:
[^/]*$/aaa
What am I doing wrong?
For the first sample, you should use:
echo xxx,aaa | sed 's/\([^,]*\),\([^,]*\)/\2,\1/'
For the second sample, simply use a character other than slash as the delimiter:
echo xxx/aaa | sed 's%\([^/]*\)/\([^/]*\)%\2/\1%'
You can also use \{1,\} to formally require one or more:
echo xxx,aaa | sed 's/\([^,]\{1,\}\),\([^,]\{1,\}\)/\2,\1/'
echo xxx/aaa | sed 's%\([^/]\{1,\}\)/\([^/]\{1,\}\)%\2/\1%'
This uses the most portable sed notation; it should work anywhere. With versions that support extended regular expressions (-r with GNU sed, -E with macOS or BSD sed; recent GNU sed accepts -E as well), you can drop some of the backslashes and use + in place of *, which is more precisely what you're after (and expresses \{1,\} much more succinctly):
echo xxx,aaa | sed -E 's/([^,]+),([^,]+)/\2,\1/'
echo xxx/aaa | sed -E 's%([^/]+)/([^/]+)%\2/\1%'
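A small sketch to check that the portable BRE form and the extended -E form behave the same; the helper names swap_bre and swap_ere are hypothetical. With more than two fields, only the first two are swapped, because s without the g flag stops after the first match.

```shell
#!/bin/sh
# Portable BRE: \{1,\} means "one or more".
swap_bre() { printf '%s\n' "$1" | sed 's/\([^,]\{1,\}\),\([^,]\{1,\}\)/\2,\1/'; }
# ERE (-E): + means the same thing with fewer backslashes.
swap_ere() { printf '%s\n' "$1" | sed -E 's/([^,]+),([^,]+)/\2,\1/'; }

for s in 'xxx,aaa' 'a,b,c'; do
  printf '%s -> %s | %s\n' "$s" "$(swap_bre "$s")" "$(swap_ere "$s")"
done
```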
With GNU sed (\+ in a basic regex is a GNU extension) it would be:
sed 's#\([[:alpha:]]\+\)/\([[:alpha:]]\+\)#\2/\1#' <<< 'xxx/aaa'
which is simpler to read if you use extended posix regexes with -r:
sed -r 's#([[:alpha:]]+)/([[:alpha:]]+)#\2/\1#' <<< 'xxx/aaa'
I'm using two subpatterns ([[:alpha:]]+), each matching one or more letters, separated by a /. In the replacement part I reassemble them in reverse order, \2/\1. Also note that I'm using # instead of / as the delimiter for the s command, since / is already the field delimiter in the input data. This saves us from having to escape the / in the regex.
Btw, you can also use awk for that, which is pretty easy to read:
awk -F'/' '{print $2,$1}' OFS='/' <<< 'xxx/aaa'
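One difference between these approaches is worth noting: [[:alpha:]]+ only matches letters, so the swap needs a letter directly on each side of the slash, while [^/]+ and the awk version swap any two fields. A minimal sketch with hypothetical helper names and a made-up input, abc1/def2:

```shell
#!/bin/sh
# Letters only: abc1/def2 is left unchanged because the character before
# the slash is a digit.
swap_alpha() { printf '%s\n' "$1" | sed -E 's#([[:alpha:]]+)/([[:alpha:]]+)#\2/\1#'; }
# Anything except the delimiter.
swap_any() { printf '%s\n' "$1" | sed -E 's#([^/]+)/([^/]+)#\2/\1#'; }
# awk: split on /, print the fields in reverse order joined by /.
swap_awk() { printf '%s\n' "$1" | awk -F'/' '{print $2,$1}' OFS='/'; }

for f in swap_alpha swap_any swap_awk; do
  printf '%s: %s\n' "$f" "$("$f" 'abc1/def2')"
done
```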

Linux sed - Delete words that do not start with a specific character

How to remove words that do not start with a specific character by sed?
Sample:
echo "--foo imhere -abc anotherone" | sed ...
Result must be:
"--foo -abc"
echo "--foo imhere -abc anotherone" |\
sed -e 's/^/ /g' -e 's/ [^-][^ ]*//g' -e 's/^ *//g'
The first and last -e commands are only needed if the first word might also be one that has to be removed.
GNU sed with -r:
kent$ echo "--foo imhere -abc anotherone" | sed -r 's/(^|\s)[^-]\S*//g'
--foo -abc
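However, I prefer awk for this; it's more straightforward (the $0=$0; $1=$1 step re-splits the line to drop the emptied fields and rebuilds it with single spaces):
awk '{for(i=1;i<=NF;i++)$i=($i~/^-/?$i:""); $0=$0; $1=$1}7'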
output:
--foo -abc
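For a sed that lacks GNU's \s and \S, here is a sketch using POSIX classes instead, assuming words are separated by whitespace; the helper name keep_dashed is hypothetical. The first expression also removes a leading word without a dash, and the second strips the whitespace that would otherwise be left at the start.

```shell
#!/bin/sh
# Hypothetical helper: keep only words that start with "-".
# (^|[[:space:]]+) matches the start of the line or the whitespace before a
# word; [^-[:space:]] is a first character that is neither "-" nor space.
keep_dashed() {
  printf '%s\n' "$1" |
    sed -E -e 's/(^|[[:space:]]+)[^-[:space:]][^[:space:]]*//g' \
           -e 's/^[[:space:]]+//'
}

keep_dashed '--foo imhere -abc anotherone'
keep_dashed 'imhere --foo'
```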
You can use ssed, whose -R option enables Perl-style regexes, and then use this one:
(?<!-)\b\w+
echo "--foo imhere -abc anotherone" | ssed -R 's/(?<!-)\b\w+//g'

(GNU)Sed: how to replace any character from nth character to nth+10?

I need to replace the 10th to 20th characters in a string which looks like this:
123456789012345678901234567890
So far I've tried:
a)
Works for the 10th character ONLY:
echo "123456789012345678901234567890" | sed 's/./X/10'
b)
Doesn't work on the range:
echo "123456789012345678901234567890" | sed 's/./X/10,20'
echo "123456789012345678901234567890" | sed 's/./X/10\,20'
echo "123456789012345678901234567890" | sed 's/./X/\{10,20\}'
echo "123456789012345678901234567890" | sed 's/./X/\{10\,20\}'
None of these work; I get the error
unknown option to `s'
So the question is: how do I make this work:
echo "123456789012345678901234567890" | sed 's/./X/10,20'
Try:
$ sed -r "s/^(.{9})(.{11})/\1XXXXXXXXXXX/" <<< 123456789012345678901234567890
123456789XXXXXXXXXXX1234567890
It's an awkward problem for sed; this is the only solution I could find:
$ sed 's/^\(.\{9\}\)\(.\{11\}\)/\1XXXXXXXXXXX/' <<< 123456789012345678901234567890
123456789XXXXXXXXXXX1234567890
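With awk it looks nicer (an empty FS that splits the record into single characters works in GNU awk and mawk, but POSIX leaves it unspecified):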
$ awk 'BEGIN{FS=OFS=""} {for (i=10;i<=20;i++) $i="X"} {print}' <<< 123456789012345678901234567890
123456789XXXXXXXXXXX1234567890
You can do it with bash parameter substitution like this:
#!/bin/bash
s="123456789012345678901234567890"
l=${s:0:9}    # Extract left part (characters 1-9)
m=${s:9:11}   # Extract middle part (characters 10-20)
r=${s:20}     # Extract right part (characters 21 onwards)
# Diddle with middle part to your heart's content and re-assemble "$l$m$r" when done
m=$(sed 's/./X/g' <<<"$m")
echo "$l$m$r"
Or, you can do this:
transform the row of letters into a column so each is on its own line
apply your edits to LINES 10 through 20 (as opposed to characters 10 through 20)
transform column of letters back into a row (by deleting linefeeds)
as shown in the one-liner below:
$ echo "123456789012345678901234567890" | sed "s/\(.\)/\1\n/g" | sed "10,20s/./X/" | tr -d "\n"
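The \n in the first replacement is a GNU sed feature that other seds may not support. Here is a sketch of the same column trick using the POSIX fold utility to put one character per line instead; the helper name mask_via_lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace characters 10-20 via the line-per-character trick.
mask_via_lines() {
  # fold -w 1 puts each character on its own line (no GNU \n needed),
  # sed edits lines 10-20, and tr joins everything back into one line.
  printf '%s\n' "$1" | fold -w 1 | sed '10,20s/./X/' | tr -d '\n'
  echo   # restore the final newline removed by tr
}

mask_via_lines 123456789012345678901234567890
```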
I know it looks ugly, but:
echo "123456789012345678901234567890" | \
sed 's/^\(.\{9\}\).\{11\}\(.*\)/\1XXXXXXXXXXX\2/'
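Without typing out multiple X's in the sed command (GNU sed, which understands -r and \n in the replacement):
echo 123456789012345678901234567890 | sed -r 's/^(.{9})(.{11})(.*)$/\1\n\2\n\3/' | sed '2s/./X/g' | sed 'N;N;s/\n//g'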
To replace the 10th to 20th characters, inclusive, try:
echo 123456789012345678901234567890 | sed 's/\(.\{9\}\).\{11\}/\1XXXXXXXXXXX/'
123456789XXXXXXXXXXX1234567890
With GNU sed, you can use the -r switch to remove most of the backslashes:
echo 123456789012345678901234567890 | sed -r 's/(.{9}).{11}/\1XXXXXXXXXXX/'
Or the naive approach also works here:
echo 123456789012345678901234567890 | sed 's/\(.........\).........../\1XXXXXXXXXXX/'
This might work for you (GNU sed):
sed ':a;/.\{9\}X\{11\}/!s/\(.\{9\}X*\)./\1X/;ta' file
or with a bit of syntactic sugar:
sed -r ':a;/.{9}X{11}/!s/(.{9}X*)./\1X/;ta' file
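If the positions change from run to run, the sed expression can be built from the two numbers instead of typing the X's by hand. Below is a minimal POSIX sh sketch, assuming 1-based inclusive positions and a line at least TO characters long (shorter lines are left unchanged); mask_range is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: replace characters FROM..TO (1-based, inclusive)
# of each input line with X.
mask_range() {
  from=$1
  to=$2
  keep=$((from - 1))                      # characters kept at the start
  n=$((to - from + 1))                    # characters to replace
  xs=$(printf "%${n}s" '' | tr ' ' X)     # a run of n X's
  sed "s/^\(.\{$keep\}\).\{$n\}/\1$xs/"
}

echo 123456789012345678901234567890 | mask_range 10 20
```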

grep: group capturing

I have following string:
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
and I need to get the value of "scheme_version", which is 1234 in this example.
I have tried
grep -Eo "\"scheme_version\":(\w*)"
however it returns
"scheme_version":1234
How can I do that? I know I can add a sed call, but I would prefer to do it with a single grep.
You'll need to use a lookbehind assertion so that the key isn't included in the match:
grep -Po '(?<=scheme_version":)[0-9]+'
This might work for you:
echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
sed -n 's/.*"scheme_version":\([^}]*\)}/\1/p'
1234
Sorry, it's not grep, so disregard this solution if you like.
Or stick with grep and add:
grep -Eo "\"scheme_version\":(\w*)"| cut -d: -f2
I would recommend that you use jq for the job. jq is a command-line JSON processor.
$ cat tmp
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
$ cat tmp | jq .scheme_version
1234
As an alternative to the positive lookbehind method suggested by SiegeX, you can reset the match starting point to directly after scheme_version": with the \K escape sequence. E.g.,
$ grep -Po 'scheme_version":\K[0-9]+'
This restarts the matching process after having matched scheme_version":, and tends to have far better performance than the positive lookbehind. Comparing the two on regex101 demonstrates that the reset match start method takes 37 steps and 1ms, while the positive lookbehind method takes 194 steps and 21ms.
You can compare the performance yourself on regex101 and you can read more about resetting the match starting point in the PCRE documentation.
To avoid relying on grep's PCRE support (-P), which is available in GNU grep but not in the BSD version, another method is to use ripgrep, e.g.
$ rg -o 'scheme_version.?:(\d+)' -r '$1' <file.json
1234
-r (--replace) replaces every match with the given text; capture group indices (e.g., $5) and names (e.g., $foo) are supported.
Another example uses Python's json.tool module, which can validate and pretty-print JSON:
$ python -mjson.tool file.json | rg -o 'scheme_version[^\d]+(\d+)' -r '$1'
1234
Related: Can grep output only specified groupings that match?
You can do this:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | awk -F ':' '{print $4}' | tr -d '}'
Improving on @potong's answer, which only works for "scheme_version", you can use this expression to extract any key:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_id":["]*\([^(",})]*\)[",}].*/\1/p'
scheme_version
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_rev":["]*\([^(",})]*\)[",}].*/\1/p'
4-cad1842a7646b4497066e09c3788e724
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"scheme_version":["]*\([^(",})]*\)[",}].*/\1/p'
1234
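The three commands above differ only in the key, so they can be wrapped in a small function. This is a sketch under the same assumptions as the expressions themselves: flat JSON on a single line, scalar values without embedded quotes or commas, and a key containing only letters, digits and underscores, since it is pasted into the regex unescaped. The name json_get is hypothetical, and for anything more complex jq is the safer tool.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from flat, single-line JSON.
# The key is inserted into the regex as-is, so it must not contain regex
# metacharacters.
json_get() {
  key=$1
  sed -n "s/.*\"$key\":\"*\([^\",}]*\)[\",}].*/\1/p"
}

doc='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'
for k in _id _rev scheme_version; do
  printf '%s\n' "$doc" | json_get "$k"
done
```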