sed replaces word boundary with three periods/fullstops - regex

Why would sed replace the word boundary with three periods/fullstops instead of one?
echo " 0/1:53,0,56:5:3,2 0/0:0,18,155:6:6,0 0/0:0,35,255:23:22,1" | sed 's/:[0-9,]\+\b/\./g'
returns 0/1... 0/0... 0/0...
This happens even when I use \> instead of \b for word boundary.
I'm running on
Operating System: Ubuntu 16.04.7 LTS
Kernel: Linux 4.15.0-128-generic

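You get multiple dots because each field contains several :-separated groups. [0-9,]\+ stops at the next :, so every :number group is matched and replaced separately. Add : to the bracket expression: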
$ echo " 0/1:53,0,56:5:3,2 0/0:0,18,155:6:6,0 0/0:0,35,255:23:22,1" | sed 's/:[0-9,:]\+/./g'
0/1. 0/0. 0/0.
In other words scan over [0-9,:]\+ instead of [0-9,]\+. Also there is no need to escape the dot in the replacement part.
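A quick way to see this is to count the matches each pattern produces. The sketch below assumes a grep that supports -o and -E, and it leaves out the \b, which does not change where the matches end for this input. The variable names are only for illustration.

```shell
input=" 0/1:53,0,56:5:3,2 0/0:0,18,155:6:6,0 0/0:0,35,255:23:22,1"

# [0-9,] cannot cross a ":", so every ":number" group is its own match.
orig_count=$(printf '%s\n' "$input" | grep -oE ':[0-9,]+' | wc -l | tr -d ' ')

# With ":" inside the bracket expression, each field yields a single match.
fixed_count=$(printf '%s\n' "$input" | grep -oE ':[0-9,:]+' | wc -l | tr -d ' ')

echo "original: $orig_count matches, fixed: $fixed_count matches"
```

Nine matches spread over three fields means three replacements, and so three dots, per field.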

You want to remove all non-space chars after each :, so use [^ ]* POSIX BRE pattern instead of [0-9,]\+:
echo " 0/1:53,0,56:5:3,2 0/0:0,18,155:6:6,0 0/0:0,35,255:23:22,1" | \
sed 's/:[^ ]*/./g'
# => 0/1. 0/0. 0/0.
If there can be any whitespace, use sed 's/:[^[:space:]]*/./g'.
Note you do not need to escape the dot in the replacement pattern, it is a literal . char there.
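To illustrate the difference between the two patterns, here is a small sketch using a hypothetical tab-separated version of the input (the question's data is space-separated, so the tab is an assumption made only for this example):

```shell
# Two fields separated by a TAB instead of a space (illustrative input).
line=$(printf '0/1:53,0,56:5:3,2\t0/0:0,18,155:6:6,0')

# [^ ]* only stops at a space, so it runs across the tab to the end of the line.
space_only=$(printf '%s\n' "$line" | sed 's/:[^ ]*/./g')

# [^[:space:]]* stops at any whitespace character, including the tab.
any_space=$(printf '%s\n' "$line" | sed 's/:[^[:space:]]*/./g')

printf '%s\n' "$space_only" "$any_space"
```

The first command collapses everything after the first colon into one dot, while the second keeps the two fields apart.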

Related

Regex to match exact version phrase

I have versions like:
v1.0.3-preview2
v1.0.3-sometext
v1.0.3
v1.0.2
v1.0.1
I am trying to get the latest version that is not preview (doesn't have text after version number) , so result should be:
v1.0.3
I used this grep: grep -m1 "[v\d+\.\d+.\d+$]"
but it still outputs: v1.0.3-preview2
What could I be missing here?
To return first match for pattern v<num>.<num>.<num>, use:
grep -m1 -E '^v[0-9]+(\.[0-9]+){2}$' file
v1.0.3
If your input file is unsorted then use grep | sort -V | head as:
grep -E '^v[0-9]+(\.[0-9]+){2}$' file | sort -rV | head -1
When you use ^ or $ inside [...], they are treated as literal characters, not as anchors (a ^ placed first inside the brackets negates the set instead).
RegEx Details:
^: Start
v: Match v
[0-9]+: Match 1+ digits
(\.[0-9]+){2}: Match a dot followed by 1+ digits. Repeat this group 2 times
$: End
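As a sketch of the unsorted case, the snippet below runs the same pipeline on a made-up list (these version numbers are assumptions for illustration, not the asker's data). It relies on sort -V, a version-sort option available in GNU coreutils.

```shell
# Hypothetical unsorted input with a two-digit patch number.
unsorted='v1.0.1
v1.0.10
v1.0.3-preview2
v1.0.9'

# Keep only plain v<num>.<num>.<num> lines, sort them as versions in
# descending order, and take the first one.
latest=$(printf '%s\n' "$unsorted" | grep -E '^v[0-9]+(\.[0-9]+){2}$' | sort -rV | head -n 1)

echo "$latest"
```

A plain lexical sort would rank v1.0.9 above v1.0.10, which is why the version sort matters here.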
To match the digits with grep, you can use
grep -m1 "v[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+$" file
Note that you don't need the [ and ] in your pattern, and that you need to escape the dot to match it literally.
With awk you could try following awk code.
awk 'match($0,/^v[0-9]+(\.[0-9]+){2}$/){print;exit}' Input_file
Explanation of the awk code: the match function tests each line against the version regex; on the first line that matches, awk prints that line and exits.
Regular expressions match substrings, not whole strings. You need to explicitly match the start (^) and end ($) of the pattern.
Keep in mind that $ has special meaning in double-quoted strings in shell scripts, so either escape it or put the pattern in single quotes.
The anchors need to be outside of any bracket expression ([]).
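These three points can be checked by counting matching lines with grep -c. This is only a sketch on a shortened version list, and the variable names are illustrative.

```shell
versions='v1.0.3-preview2
v1.0.3
v1.0.2'

# Unanchored: the regex only has to match somewhere in the line,
# so v1.0.3-preview2 is counted as well.
unanchored=$(printf '%s\n' "$versions" | grep -cE 'v[0-9]+(\.[0-9]+){2}')

# Anchored: the whole line must be v<num>.<num>.<num>.
anchored=$(printf '%s\n' "$versions" | grep -cE '^v[0-9]+(\.[0-9]+){2}$')

# Inside a bracket expression "$" is just one more character in the set,
# so any line containing "v", a digit, "." or "$" matches.
bracketed=$(printf '%s\n' "$versions" | grep -c '[v0-9.$]')

echo "$unanchored $anchored $bracketed"
```

Single quotes are used throughout so the shell never sees the $ as the start of a variable.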

Regexp or Grep in Bash

Can you please tell me how to get the token value correctly? At the moment I am getting: "1jdq_dnkjKJNdo829n4-xnkwe",258],["FbtResult
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | sed -n 's/.*"token":\([^}]*\)\}/\1/p'
You need to match the full string, and to get rid of double quotes, you need to match a " before the token and use a negated bracket expression [^"] instead of [^}]:
sed -n 's/.*"token":"\([^"]*\).*/\1/p'
Details:
.* - any zero or more chars
"token":" - a literal "token":" string
\([^"]*\) - Group 1 (\1 refers to this value): any zero or more chars other than "
.* - any zero or more chars.
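Wrapped in a small helper, the command above could be used like this. The function name extract_token is hypothetical, and the sample is a shortened form of the string from the question.

```shell
# Print the value of the first "token":"..." pair found on each input line.
extract_token() {
  sed -n 's/.*"token":"\([^"]*\).*/\1/p'
}

sample='["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult'
token=$(printf '%s\n' "$sample" | extract_token)

echo "$token"
```

Because of -n and the p flag, lines without a token produce no output at all, which makes an empty result easy to test for.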
This replacement works:
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | \
sed -n 's/.*"token":"\([a-z]*\)"\}.*/\1/p'
The token between the quotes is captured via \([a-z]*\), followed by the closing quote and brace "\} and then .* for the rest of the line. The trailing .* is the part you were missing, which is why the text after the token ended up in your output. Note that [a-z]* only covers lowercase letters, so a token containing digits, uppercase letters, _ or - needs a wider set such as [^"]*.
Output:
aaaaaaa
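For comparison, here is a sketch that feeds the token from the question's first line through both character sets. The closing brace is written unescaped, since a bare } is a literal character in POSIX BRE; the variable names are illustrative.

```shell
real='{"token":"1jdq_dnkjKJNdo829n4-xnkwe"},258],["FbtResult'

# [a-z]* cannot get past the leading digit, so the substitution never
# matches and sed -n prints nothing.
narrow=$(printf '%s\n' "$real" | sed -n 's/.*"token":"\([a-z]*\)"}.*/\1/p')

# [^"]* accepts everything up to the closing quote.
broad=$(printf '%s\n' "$real" | sed -n 's/.*"token":"\([^"]*\)"}.*/\1/p')

echo "narrow='$narrow' broad='$broad'"
```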
A grep solution:
echo '{"facebookdotcom":true,"messengerdotcom":false,"workplacedotcom":false},827],["DTSGInitialData",[],{"token":"aaaaaaa"},258],["FbtResult' | grep -Po '(?<="token":")[^"]+'
yields
aaaaaaa
The -P option to grep enables the Perl-compatible regex (PCRE).
The -o option tells grep to print only the matched substring, not the entire line.
The regex (?<="token":") is a PCRE-specific feature called a zero-width positive lookbehind assertion. The expression (?<=pattern) matches a pattern without including it in the matched result.

Regex matches but sed fails replace

I am having a tricky regex issue
I have the string like below
some_Name _ _Bday Date Comm.txt
And here is my regex to match the spaces and underscore
\_?\s\_?
Now when I try to replace the string using sed and the above regex
echo "some_Name _ _Bday Date Comm.txt" | sed 's/\_?\s\_?/\_/g'
The output i want is
some_Name_Bday_Date_Comm.txt
Any ideas on how I should go about this?
You are using a POSIX BRE regex engine, where the \_?\s\_? pattern matches a _? substring, a whitespace character (if your sed supports the \s shorthand) and another _? substring, i.e. the ? characters are treated as literal question marks.
You may use
sed -E 's/[[:space:]_]+/_/g'
sed 's/[[:space:]_]\{1,\}/_/g'
The [[:space:]_]+ POSIX ERE pattern (enabled with -E option) will match one or more whitespace or underscore characters.
The POSIX ERE + quantifier can be written as \{1,\} in POSIX BRE. Also, if you use GNU sed, you may use \+ in the second sed command.
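The literal ? can be seen in isolation on a tiny test string. The sketch below uses a made-up input a?b for that, then runs both suggested commands on the file name from the question; the variable names are illustrative.

```shell
# BRE: "a?" is the two literal characters "a" and "?".
bre_q=$(printf 'a?b\n' | sed 's/a?/X/')

# ERE: "a?" is an optional "a", so only the "a" is replaced.
ere_q=$(printf 'a?b\n' | sed -E 's/a?/X/')

name='some_Name _ _Bday Date Comm.txt'
bre_fix=$(printf '%s\n' "$name" | sed 's/[[:space:]_]\{1,\}/_/g')
ere_fix=$(printf '%s\n' "$name" | sed -E 's/[[:space:]_]+/_/g')

printf '%s\n' "$bre_q" "$ere_q" "$bre_fix" "$ere_fix"
```

Both fixes give the same result, so the choice between them comes down to whether -E is available on your sed.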
This might work for you (GNU sed):
sed -E 's/\s(\s*_)*/_/g' file
This will replace a space followed by zero or more of the following: zero or more spaces followed by an underscore.

Replace all non-alphanumeric characters in a string with an underscore

I want to replace special characters (regex \W) with _ (underscore)
But I don't want to replace whitespace with underscore
Also replace multiple consecutive special characters with single underscore
Example
String: The/Sun is red#
Output: The_Sun is red_
String: .//hack Moon
Output: _hack Moon
I have tried echo 'string' | sed 's/\W/_/g'
But it's not accurate
Use tr for that:
echo -n "The/Sun is red#" | tr -c -s '[:alnum:][:blank:]' '_'
[:alnum:][:blank:] represents alphanumeric characters plus space and tab
-c (or --complement) means "use the opposite of that"
Use -s (or --squeeze-repeats) to squeeze duplicate underscores into one
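Put together as a helper (the name sanitize is hypothetical), the tr call might look like the sketch below. It uses printf '%s' instead of echo -n so that no trailing newline reaches tr; a newline is neither alphanumeric nor blank, so it would be turned into an underscore too.

```shell
# Replace every run of characters that are not alphanumeric, space or tab
# with a single underscore.
sanitize() {
  printf '%s' "$1" | tr -c -s '[:alnum:][:blank:]' '_'
}

first=$(sanitize 'The/Sun is red#')
second=$(sanitize './/hack Moon')

# Same input with a trailing newline: the newline becomes an extra "_".
with_newline=$(printf '%s\n' './/hack Moon' | tr -c -s '[:alnum:][:blank:]' '_')

printf '%s\n' "$first" "$second" "$with_newline"
```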
sed approach:
s="The/Sun is red# .//hack Moon"
sed -E 's/[^[:alnum:][:space:]]+/_/g' <<<"$s"
The_Sun is red_ _hack Moon
[^[:alnum:][:space:]]+ - match one or more consecutive characters that are neither alphanumeric nor whitespace
Just with bash parameter expansion, similar pattern to other answers:
shopt -s extglob
for str in "The/Sun is red#" ".//hack Moon"; do
echo "${str//+([^[:alnum:][:blank:]])/_}"
# .........^^........................^ replace all
# ...........^^.....................^ one or more
# .............^^^^^^^^^^^^^^^^^^^^^ non-alnum, non-space character
done
The_Sun is red_
_hack Moon

How to extract with sed a pattern starting with '('

The output of xdpyinfo | grep dimensions is
dimensions: 2560x1600 pixels (676x423 millimeters)
Piping it through sed -r 's/^[^0-9]*([0-9]+x[0-9]+).*$/\1/' does extract the dimensions in pixels (2560x1600), but won't work with the opening parenthesis.
How to get the dimensions in millimeters (i.e. 676x423) with sed?
You may use
sed -r 's/.*\(([0-9]+x[0-9]+).*/\1/'
Details
.* - any 0+ chars as many as possible
\( - a literal ( (in POSIX ERE flavor you are using with -r)
([0-9]+x[0-9]+) - Group 1 (later referred to with \1 backreference): 1+ digits, x, 1+digits
.* - any 0+ chars as many as possible
Note you actually can omit both ^ and $ here since there is a single whole line match with sed.
And here is an equivalent solution using a POSIX BRE regex:
sed 's/.*(\([0-9][0-9]*x[0-9][0-9]*\).*/\1/'
Note that a ( denotes a literal ( char in POSIX BRE, and \(...\) defines a capturing group here. Since + quantifier is not supported by POSIX BRE, you may just use [0-9][0-9]* instead (1 digit and 0+ digits).
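The two flavors can be compared side by side on the sample line from the question. This sketch uses -E, which GNU sed accepts as an equivalent of -r; the variable names are illustrative.

```shell
line='  dimensions:    2560x1600 pixels (676x423 millimeters)'

# ERE: \( is a literal parenthesis, ( ... ) is the capturing group.
mm_ere=$(printf '%s\n' "$line" | sed -E 's/.*\(([0-9]+x[0-9]+).*/\1/')

# BRE: ( is a literal parenthesis, \( ... \) is the capturing group.
mm_bre=$(printf '%s\n' "$line" | sed 's/.*(\([0-9][0-9]*x[0-9][0-9]*\).*/\1/')

printf '%s\n' "$mm_ere" "$mm_bre"
```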
If you want to extract it in a single command, you can replace your existing grep command with this GNU grep command, which uses the match-reset operator \K:
xdpyinfo | grep -oP 'dimensions:.*\(\K\d+x\d+'
676x423
The above requires GNU grep. If that is not available to you, then you can use this grep piped into another grep:
xdpyinfo | grep -oE 'dimensions:.*\([0-9]+x[0-9]+' | grep -oE '[0-9]+x[0-9]+$'
If you have to use sed, use a single sed command like this and drop the grep:
xdpyinfo | sed -nE '/dimensions/{s/.*\(([0-9]+x[0-9]+).*/\1/p;q;}'
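To show what the q adds, here is a sketch that runs the same sed command on canned text instead of live xdpyinfo output. The two-screen sample is hypothetical; only the first dimensions line comes from the question.

```shell
# Stand-in for xdpyinfo output; the second screen is made up for illustration.
sample='screen #0:
  dimensions:    2560x1600 pixels (676x423 millimeters)
screen #1:
  dimensions:    1920x1080 pixels (508x285 millimeters)'

# On the first line containing "dimensions": print the captured group and quit,
# so later screens are never processed.
first_mm=$(printf '%s\n' "$sample" | sed -nE '/dimensions/{s/.*\(([0-9]+x[0-9]+).*/\1/p;q;}')

echo "$first_mm"
```

Dropping the q would print one line per screen instead of stopping at the first.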