sed & protobuf: need to delete dots

I need to delete dots using sed, but not all dots.
- repeated .CBroadcast_GetBroadcastChatUserNames_Response.PersonaName persona_names = 1
+ repeated CBroadcast_GetBroadcastChatUserNames_Response.PersonaName persona_names = 1
Here the dot after repeated should be deleted (repeated can also be optional, required, or extend).
- rpc NotifyBroadcastViewerState (.CBroadcast_BroadcastViewerState_Notification) returns (.NoResponse)
+ rpc NotifyBroadcastViewerState (CBroadcast_BroadcastViewerState_Notification) returns (NoResponse)
And here the dot after ( should be deleted.
It should work on multiple files with different content.
Full code can be found here

A perhaps simpler solution (works with both GNU sed and BSD/macOS sed):
sed -E 's/([[:space:][:punct:]])\./\1/g' file
In case a . can also appear as the first character on a line, use the following variation:
sed -E 's/(^|[[:space:][:punct:]])\./\1/g' file
The assumption is that any . preceded by:
a whitespace character (character class [:space:])
as in:  .
or a punctuation character (character class [:punct:])
as in: (.
should be removed, by replacing the matched sequence with just the character preceding the ., captured via subexpression (...) in the regular expression, and referenced in the replacement string with \1 (the first capture group).
If you invert the logic, you can try the simpler:
sed -E 's/([^[:alnum:]])\./\1/g' file
In case a . can also appear as the first character on a line:
sed -E 's/(^|[^[:alnum:]])\./\1/g' file
This removes every period that is not preceded by an alphanumeric character (a letter or digit); the ^ inside the bracket expression negates the class.
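As a quick check, the last command can be run against the two sample lines from the question. This is only a sketch: the variable names input and output are made up for the illustration, and it assumes a sed that accepts -E (GNU and BSD/macOS sed both do).

```shell
# Sample lines taken from the question.
input='repeated .CBroadcast_GetBroadcastChatUserNames_Response.PersonaName persona_names = 1
rpc NotifyBroadcastViewerState (.CBroadcast_BroadcastViewerState_Notification) returns (.NoResponse)'

# Drop every dot that sits at the start of a line or right after a
# non-alphanumeric character; dots between name parts are kept.
output=$(printf '%s\n' "$input" | sed -E 's/(^|[^[:alnum:]])\./\1/g')

printf '%s\n' "$output"
# repeated CBroadcast_GetBroadcastChatUserNames_Response.PersonaName persona_names = 1
# rpc NotifyBroadcastViewerState (CBroadcast_BroadcastViewerState_Notification) returns (NoResponse)
```

One thing to watch for: the pattern does not know about protobuf syntax, so any other dot that follows a space or punctuation character (for example a numeric literal written as .5, if your files contain one) would lose its dot as well.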

Assuming only the leading . needs removal, here's some GNU sed code:
echo '.a_b.c c.d (.e_f.g) ' |
sed 's/^/& /;s/\([[:space:]{([]\+\)\.\([[:alpha:]][[:alpha:]_.]*\)/\1\2/g;s/^ //'
Output:
a_b.c c.d (e_f.g)
Besides the ., it checks for two fields, which are left intact:
Leading whitespace, or any opening (, [, or {.
Trailing alphabetical chars or also _ or ..
Unfortunately, while the \+ regexp matches one or more spaces et al., it fails if the . is at the beginning of the line. (Replacing the \+ with a * would match at the beginning, but would incorrectly change c.d to cd.) So there's a kludge: s/^/& / inserts a dummy space at the beginning of the line so that the \+ works as desired, and a final s/^ // removes the dummy space again.

Related

Using sed to replace space delimited strings

echo 'bar=start "bar=second CONFIG="$CONFIG bar=s buz=zar bar=g bar=ggg bar=f bar=foo bar=zoo really?=yes bar=z bar=yes bar=y bar=one bar=o que=idn"' | sed -e 's/^\|\([ "]\)bar=[^ ]*[ ]*/\1/g'
Actual output:
CONFIG="$CONFIG buz=zar bar=ggg bar=foo really?=yes bar=yes bar=one que=idn"
Expected output:
CONFIG="$CONFIG buz=zar really?=yes que=idn"
What am I missing in my regex?
Edit:
This works as expected (with GNU sed):
's/\(^\|\(['\''" ]\)\)bar=[^ ]*/\2/g; s/[ ][ ]\+/ /g; s/[ ]*\(['\''"]\+\)[ ]*/\1/g'

POSIX sed regular expressions are pretty limited. They don't include \w as a synonym for [a-zA-Z0-9_], for example. They also don't include \b, which means the zero-length string at the beginning or end of a word (which you really want in this situation...). GNU sed supports both as extensions, but other implementations may not.
s/ bar=[^ ]* *//
is close, but the problem is that the trailing " *" (space, star) consumes the space that might precede the next bar=. So, in ... bar=aaa bar=bbb ... the first match is " bar=aaa " including its trailing space, leaving bar=bbb ... for the second attempt, which fails because the space before bar= has already been consumed.
s/ bar=[^ ]*//
is better -- don't consume the trailing spaces, leave them for the next match attempt. If you want to match bar=something even if it's at the beginning of the string, insert a space at the beginning first:
sed 's/^bar=/ bar=/; s/ bar=[^ ]*//g'
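The "insert a space first" idea can be tried on a shortened line. This is a sketch with assumptions: the sample line is a simplified, made-up version of the question's input (no quotes), the g flag is used so that every occurrence is removed, and the final s/^ // is an extra step added here to drop the leading space that is left over.

```shell
# Simplified sample line (hypothetical, modelled on the question's input).
line='bar=start buz=zar bar=g bar=ggg foobar=keep que=idn'

# 1. Give a leading bar= a space to match against.
# 2. Remove every " bar=value" without consuming the space after it.
# 3. Drop the leading space that remains.
result=$(printf '%s\n' "$line" | sed 's/^bar=/ bar=/; s/ bar=[^ ]*//g; s/^ //')

printf '%s\n' "$result"
# buz=zar foobar=keep que=idn
```

Note that foobar=keep survives because the pattern requires a space directly before bar=. The last step would also strip a leading space that was present in the original line, which may or may not matter for your data.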

If you want to remove all instances of bar=something then you can simplify your regex as such:
\sbar=\w+
This matches bar= followed by one or more word characters (\w+). The bar= must be preceded by a whitespace character.
Demonstration:
https://regex101.com/r/xbBhJZ/3
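As a sed substitution (GNU sed, where \s, \w and \+ are available):
s/\sbar=\w\+//g
Because a whitespace character is required before bar=, this leaves foobar=bar untouched.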
Like Waxrat's answer, you have to insert a space at the beginning for it to properly match as it's now matching against a preceding whitespace character before the bar=. This can be easily done since you're quoting your string explicitly.

ask for explanation of sed regex

I struggle to understand the following two sed regex in a makefile:
sw_version:=software/module/1.11.0
sw:= $(shell echo $(sw_version) | sed -e 's:/[^/]*$$::;s:_[^/]*$$::g')
// sw is "software/module"
version:= $(shell echo $(sw_version) | sed -e 's:.*/::g;s/-.*$$//')
// version is "1.11.0"
I really appreciate a detailed explanation. Thanks!

In a makefile, $$ is replaced by a single literal $ before the shell sees the command, so the sed expression looks like this:
sed -e 's:/[^/]*$::;s:_[^/]*$::g'
s/// is the substitution command. The delimiter doesn't need to be /; in your case it's a colon (:):
s:/[^/]*$::;
s:_[^/]*$::g
It works with matching pattern and replacing with replacement:
s/pattern/replacement/
; separates multiple commands in the same call to sed.
So basically these are two substitutions: one replaces /[^/]*$ with nothing, the other replaces _[^/]*$ with nothing.
[...] is a character class, which matches exactly one of the characters you put in it. E.g. [abc] matches either a or b or c. If ^ is at the beginning of the class, it matches any single character except those in the class, e.g. [^abc] matches any character except a, b, and c.
* repeats the preceding pattern zero or more times.
$ is end of line
Let's apply what we know to the examples above (read bottom up):
s:/[^/]*$::;
#^^^^^ ^^ ^^
#||||| || |Delimiter
#||||| || Replace with nothing
#||||| |End of line
#||||| Zero or more times
#||||Literal slash
#|||Match everything but ...
#||Character class
#|Literal slash
#Delimiter used for the substitution command
/[^/]*$ will match literal slash (/) followed by everything but a slash zero or more times at end of line.
_[^/]*$ will match literal underscore (_) followed by everything but a slash zero or more times at end of line.
That was the first, the second is left as an exercise.
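To see both expressions at work outside of make, they can be run directly in a shell with a single $ instead of $$. This is a small sketch; the shell variable names simply mirror the makefile variables from the question.

```shell
sw_version='software/module/1.11.0'

# First expression: strip the last "/component", then strip a trailing
# "_suffix" if there is one (there is none in this sample).
sw=$(printf '%s\n' "$sw_version" | sed -e 's:/[^/]*$::;s:_[^/]*$::g')

# Second expression: strip everything up to the last "/", then strip
# anything from the first "-" onward (again, nothing to strip here).
version=$(printf '%s\n' "$sw_version" | sed -e 's:.*/::g;s/-.*$//')

printf '%s\n' "$sw"        # software/module
printf '%s\n' "$version"   # 1.11.0
```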

regex for capturing path from a string with optional character ~ (perl|awk|sed|..)

I want to match everything between first and last slash / including optional ~ before first slash.
I used this for the first part:
echo ~~a~/dir1/di r2/b.c \
| perl -pe 's/[^\/]*(\/.*\/).*/\1/'
which produces /dir1/di r2/.
This match includes the tilde:
perl -pe 's/.*(~\/.*\/).*/\1/'
but adding ? for optional character doesn't seem to work like in these cases:
perl -pe 's/.*(~?\/.*\/).*/\1/' -> /di r2/
perl -pe 's/.*((?:~)\/.*\/).*/\1/' -> ~~a/dir1/di r2/b.c
What am I doing wrong?

If I understood the desired output right, this works for me with or without tilde
echo "path /d1/d2/43a/" | perl -nE 'm{ ( ~? (?: /.*/ | /) ) }x; say "$1"'
Prints
/d1/d2/43a/
Same Perl code, with a tilde before the first slash in the input
echo "path ~/d1/d2/43a/" | perl -nE 'm{ ( ~? (?: /.*/ | /) ) }x; say "$1"'
prints
~/d1/d2/43a/
Notes: Use of \1 in the replacement part of a substitution is discouraged (Perl warns about it when warnings are enabled); use $1 instead. With {} for the delimiters we don't have to escape /, which makes the pattern more readable (though with delimiters other than // we can't leave out the m in front). Otherwise the same works when using / as the delimiter and escaping it inside the pattern.
Update
To also catch a lone ~/ (or /), the simplest change was to add that explicitly: /.*/ | /. To capture the (optional) ~ in both cases there is a (non-capturing) grouping around this alternation. The -w flag was removed so that no warnings are issued when the input string has no slashes at all; in that case only an empty line is printed.

Original requirements
File data
~~a~/dir1/di r2/b.c
/dir1/di r2/z.y
~/dir1/di r3/p.q
gobbledegook~/name/more/still/more/notwanted.c
xxx~//yyy
Script
perl -ple 's%(?:^.*?)((?:^|~)/.*/).*%$1%' data
Example output
~/dir1/di r2/
/dir1/di r2/
~/dir1/di r3/
~/name/more/still/more/
~//
Is that what you needed?
Dissecting the regex
s%(?:^.*?)((?:^|~)/.*/).*%$1%
The first part, (?:^.*?) is a non-capturing non-greedy match for an arbitrary sequence of characters at the start of the line.
The second part, ((?:^|~)/.*/), is a capturing expression that contains a non-capturing term that matches at the start of a line, or a tilde, followed by a slash and a greedy anything up to the last slash on the line.
The trailing .* matches everything after the second part.
The replacement is simply what was captured; the rest is Perl being Perl.
Revised requirements
The original problem statement was incomplete, it seems. Apparently:
for single slash it should output just / (with accompanying tilde if present). For no slashes preferably empty string as there is no match. … And for this case ~a b/c/d.f it returns full string; instead it should return /c/.
So, here is a revised script to deal with the special extra cases (what happened to 'learning how to fish'?). The ~a b/c/d.f case was caused by a missing ? quantifier on the 'start of string or tilde' grouping.
Revised data file
~~a~/dir1/di r2/b.c
/dir1/di r2/z.y
~/dir1/di r3/p.q
gobbledegook~/name/more/still/more/notwanted.c
xxx~//yyy
not-a-slash-in-sight
just-the-one/with-extra-info
just-the~/with-more-info
~/one-slash-at-start-with-tilde
/one-slash-at-start-without-tilde
~a b/c/d.f
Revised script
perl -ple 's%^[^/]*$%%; s%(?:^[^/]*?)((?:^|~)?/)[^/]*$%$1%; s%(?:^[^/]*?)((?:^|~)?/.*/).*%$1%' data
A mildly modified version of the original expression comes last.
The first s/// looks for lines without any / and replaces them with nothing.
The second s/// looks for lines with exactly one slash, possibly preceded by a tilde, followed by non-slashes to the end of the line, and replaces the line with the optional tilde and the slash.
When either of the first two matches, its output does not match the third s///.
Revised output
~/dir1/di r2/
/dir1/di r2/
~/dir1/di r3/
~/name/more/still/more/
~//
/
~/
~/
/
/c/

What does the following SED pattern exactly do?

I am working on a CGI script and the developer who worked on this before me has used a SED Pattern.
COMMAND=`echo "$QUERY_STRING" | sed -n 's/^.*com_tex=\([^&]*\).*$/\1/p' | sed "s/%20/ /g"`
Here com_tex is the name of the text box in HTML.
What this line does is take a value from the HTML text box and assign it to a shell variable. The sed pattern is apparently (I'm not sure) necessary to extract the value without the other accompanying data.
I will also mention the issue that prompted this question. The same pattern is used for a text area where I enter a command, and I need it retrieved exactly as entered. However, it's getting jumbled up. E.g. if I enter the following command in the text box:
/usr/bin/free -m >> /home/admin/memlog.txt
The value that gets stored in the variable is:
%2Fusr%2Fbin%2Ffree+-m+%3E%3E+%2Fhome%2Fadmin%2Fmemlog.txt
All of us can get that / is being substituted by %2F, a space by + and the > sign by %3E.
But I just can not figure how this is specified in the above pattern! Will someone please tell me how that pattern works or what pattern should I substitute there so that I would get my entered command instead of the output I am getting?

sed -n
The -n switch means "don't print": sed does not print each line automatically.
's/
s is for substitutions, / is a delimiter so the command looks like
s/thing to match/substitution/optional flags
^.*com_tex=
^ means the start of the line
.* means match 0 or more of any character
So it will match the longest string from the start of the line up to com_tex=
\(\)
This is a capture group, whatever is matched inside these brackets is saved and can be used later
[^&]*
[^...] When the caret is the first character inside square brackets, it means match any character not listed in the brackets
* The same as before means 0 or more matches
The capture group combined with this means capture any character except &.
.*$
The same as the first bit except $ means the end of the line, so this matches everything until the end
/\1/p'
After the second / is the substitution. \1 is the capture group from before, so this will substitute everything we matched in the first part(the whole line) with the capture group.
p means print, this must be explicitly stated as the -n switch was used and will prevent other lines from being printed.
|
PIPE
s/%20/ /g
Sub %20 for a space, g means global so do it for every match on the line
HTH :)

This is not performed by any of the patterns. It is URL encoding (percent-encoding): the browser encodes the form data when it submits the form, so the value in QUERY_STRING is already encoded by the time the script reads it.
I will try to explain the patterns a little at a time
sed -n
-n specifies that sed should not automatically print each input line (here, the query string) after applying the commands.
The command following is of the form 's/regexp/replacement/flags'
^.*com_tex=\([^&]*\).*$
^ matches the beginning of the line
.* matches zero to many of any character
com_tex= matches the characters literally
\([^&]*\) '\(' specifies the beginning of a group that can later be backreferenced via its index. '[^&]*' matches zero to many characters which are not '&'. '\)' specifies the end of the group.
.* See above
$ matches the end of the line
\1
The above replacement is a backreference to the first (and only) group in the regexp i.e. '[^&]*'. So the replacement replaces the entire line with all characters immediately following 'com_tex=' till the first '&'.
The p flag specifies that if a substitution took place, the current line post substitution should be printed.
sed "s/%20/ /g"
The above is much simpler: it replaces all (not just the first) occurrences of '%20' with a space ' '.
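Following the same approach as the existing s/%20/ /g step, the second sed could be extended to undo the other escapes seen in the question. This is a deliberately limited sketch, not a general URL decoder: it only handles +, %20, %2F and %3E (uppercase hex), and the trailing &other=1 field in the sample query string is made up to show that the extraction stops at the first &.

```shell
# Encoded value copied from the question; "&other=1" is a hypothetical extra field.
QUERY_STRING='com_tex=%2Fusr%2Fbin%2Ffree+-m+%3E%3E+%2Fhome%2Fadmin%2Fmemlog.txt&other=1'

COMMAND=$(printf '%s\n' "$QUERY_STRING" |
  # Keep only the value of com_tex (everything up to the next &).
  sed -n 's/^.*com_tex=\([^&]*\).*$/\1/p' |
  # Undo the handful of escapes that occur in this sample.
  sed 's/+/ /g; s/%20/ /g; s/%2F/\//g; s/%3E/>/g')

printf '%s\n' "$COMMAND"
# /usr/bin/free -m >> /home/admin/memlog.txt
```

Any other character the browser encodes (such as &, = or a literal +) would need its own rule, so for arbitrary input a complete percent-decoder is the more robust choice.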

How to regexp match surrounding whitespace or beginning/end of line

I am trying to find lines in a file that contain a / (slash) character which is not part of a word, like this:
grep "\</\>" file
But no luck, even if the file contains the "/" alone, grep does not find it.
I want to be able to match lines such as
some text / pictures
/ text
text /
but not e.g.
/home

Why your approach does not work
\<, \> only match against the beginning (or end, respectively) of a word. That means that they can never match if put adjacent to / (which is not treated as a word-character) – because e.g. \</ basically says "match the beginning of a word directly followed by something other than a word (a 'slash', in this case)", which is impossible.
What will work
This will match / surrounded by whitespace (\s) or beginning/end of line:
egrep '(^|\s)/($|\s)' file
(egrep is equivalent to grep -E, which turns on processing of extended regular expressions. Recent GNU grep releases flag egrep as obsolescent, so grep -E is the safer spelling.)
What might also work
The following slightly simpler expression will work if a / is never adjacent to non-word characters (such as *, #, -, and characters outside the ASCII range); it might be of limited usefulness in OP's case:
grep '\B/\B' file

for str in 'some text / pictures' ' /home ' '/ text' ' text /'; do
echo "$str" | egrep '(^|\s)/($|\s)'
done
This will match /:
if the entire input string is /
if the input string starts with / and is followed by at least 1 whitespace
if the input string ends with / and is preceded by at least 1 whitespace
if / is inside the input string surrounded by at least 1 whitespace on either side.
As for why grep "\</\>" file did not work:
\< and \> match the left/right boundaries between words and non-words. However, / does not qualify as a word, because words are defined as a sequence of one or more instances of characters from the set [[:alnum:]_], i.e.: sequences of at least length 1 composed entirely of letters, digits, or _.
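The four cases listed above can be checked in one go. This sketch uses grep -E and the POSIX class [[:space:]] in place of \s (an assumption made here for portability; the behaviour on these samples is the same), and the sample lines are the ones from the question plus a line consisting of just /.

```shell
# Feed the candidate lines to grep and keep only those with a standalone "/".
matches=$(printf '%s\n' 'some text / pictures' '/ text' 'text /' '/home' '/' |
  grep -E '(^|[[:space:]])/($|[[:space:]])')

printf '%s\n' "$matches"
# some text / pictures
# / text
# text /
# /
```

Only /home is filtered out, because its slash is followed directly by a word character rather than whitespace or the end of the line.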

This seems to work for me.
grep -rni " / \| /\|/ " .
