Replace all occurrences of '.' with '_' before '=' using sed

I have a properties file as below:
build.number=153013
db.create.tablespace=0
db.create.user=0
db.create.schema=0
upgrade.install=1
new.install=0
configure.jboss=0
configure.jbosseap=false
configure.weblogic=1
configure.websphere=0
I need to import these variables into a shell script. As you know, '.' is not a valid character in a shell variable name. How can I use sed to replace every '.' before the '=' with '_'? I have managed to replace all occurrences of '.', but some properties have values that contain a '.', and I do not want to modify those.
Any help is appreciated!
Thanks!

You can use
sed -e ':b; s/^\([^=]*\)\./\1_/; tb' file
It replaces stringWithoutEquals. with stringWithoutEquals_ for as long as the match succeeds. Because [^=]* cannot cross the =, this replaces all the .'s before the = with _ and leaves the value untouched.
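A minimal sketch of how the converted output could then be loaded into the current shell, which is what the question is ultimately after. The function name props_to_vars and the sample property lines are made up for illustration, and eval is only reasonable here on the assumption that the properties file is trusted and its values need no quoting.

```shell
#!/bin/sh
# props_to_vars is a hypothetical helper name, not part of sed.
# It loops (label b, "t b") until no dot is left before the first '='.
props_to_vars() {
    sed -e ':b' -e 's/^\([^=]*\)\./\1_/' -e 't b'
}

# Sample input; the second line has a dot in its value that must survive.
converted=$(printf '%s\n' 'build.number=153013' 'app.version=1.2' | props_to_vars)
printf '%s\n' "$converted"

# Assumption: the input is trusted and the values need no shell quoting.
eval "$converted"
printf 'build_number is %s\n' "$build_number"
```

Splitting the script into separate -e parts avoids relying on how a particular sed parses a label followed by a semicolon.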

You can try this:
sed 's/\.\|\(=.*\)/_\1/g;s/_=/=/' file
The idea is to match everything from the equals sign onward in a capture group, so that any dots in the value are consumed and put back unchanged. Every dot before the equals sign is replaced with an underscore (the capture group is empty there), but this also puts an underscore in front of the equals sign. The second substitution removes that extra underscore. Note that \| is a GNU extension to basic regular expressions.

This might work for you (GNU sed):
sed 's/=/&\n/;h;y/./_/;G;s/\n.*\n//' file
Separate the line with a marker at the =. Copy the line. Replace all .'s with _'s. Append the original line and subtract the text between the two markers.

Here is an awk solution:
awk -F= '{gsub(/\./,"_",$1)}1' OFS== file
build_number=153013
db_create_tablespace=0
db_create_user=0
db_create_schema=0
upgrade_install=1
new_install=0
configure_jboss=0
configure_jbosseap=false
configure_weblogic=1
configure_websphere=0
It splits each line on =, then replaces every . in the first field with _.
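As a quick check that only the key is touched, here is a sketch that feeds the same awk program a made-up property whose value contains both dots and an equals sign. The function name keys_to_underscores and the sample line are assumptions for illustration; OFS is passed with -v here, which should behave the same as the trailing OFS== above.

```shell
#!/bin/sh
# keys_to_underscores is a hypothetical wrapper around the awk one-liner.
# FS and OFS are both '=', so a rebuilt record keeps its '=' separators.
keys_to_underscores() {
    awk -F= -v OFS='=' '{ gsub(/\./, "_", $1) } 1'
}

# The value contains dots and a second '='; only the key should change.
printf '%s\n' 'db.url=jdbc:h2:./data.db;MODE=Oracle' | keys_to_underscores
```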

Related

Substring using Regex in Shell or bash

I have a huge text file with rows like the following:
"https://www.wayfair.ca/appliances/pdp/agua-canada-30-500-cfm-ducted-wall-mount-range-hood-agcd1041.html?piid=47570655"
"https://www.wayfair.ca/appliances/pdp/agua-canada-30-500-cfm-ducted-wall-mount-range-hood-agcd1041.html?piid=47570656"
"https://www.wayfair.ca/appliances/pdp/agua-canada-30-500-cfm-ducted-wall-mount-range-hood-agcd1042.html"
"https://www.wayfair.ca/appliances/pdp/agua-canada-30-500-cfm-ducted-wall-mount-range-hood-agcd1043.html?piid=47570657"
"https://www.wayfair.ca/appliances/pdp/agua-canada-30-500-cfm-ducted-wall-mount-range-hood-agcd1043.html?piid=47570658"
I want to extract the alphanumeric string after the last '-' and before '.html' (for example 'agcd1043') and save those values to another file.
Kindly help me do this using a regex (.*-(.+).html.* is the regex I used in Notepad++ for smaller files) or any other method. TIA

You could extract the string with sed:
sed 's/.*-\([^-]*\)\.html.*/\1/' <<< "https://www.wayfair.ca/appliances/pdp/agua-canada-30-500-cfm-ducted-wall-mount-range-hood-agcd1041.html?piid=47570655"
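If you have all your strings in a file, you can iterate over it (bash, because of <<<):
while IFS= read -r line
do
    # Quote "$line" so that ?, * and spaces in the URL are passed through literally
    variable=$(sed 's/.*-\([^-]*\)\.html.*/\1/' <<< "$line")
    # ... use the value from $variable
done < /path/to/file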
The sed script is a substitution, where:
.*-\([^-]*\)\.html.* is the pattern
\1 is the replacement
The pattern is written so that it captures any sequence of non-hyphen characters, i.e. [^-]*, trapped between a hyphen character - and the .html string. The dot is escaped so that it matches a literal dot, hence the \.html pattern. The leading and trailing .* make sure that everything before the hyphen and after .html is matched too; otherwise it would appear in the output.
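Since sed already reads its input line by line, the loop is optional when the goal is only to save the IDs to another file. A minimal sketch, assuming a made-up function name extract_ids and shortened stand-in URLs; -n together with the p flag prints only the lines where the substitution succeeded.

```shell
#!/bin/sh
# extract_ids is a hypothetical helper name.
# It prints the text between the last '-' and '.html' for matching lines only.
extract_ids() {
    sed -n 's/.*-\([^-]*\)\.html.*/\1/p'
}

# Shortened stand-ins for the URLs in the question.
extract_ids <<'EOF'
"https://www.example.com/pdp/range-hood-agcd1041.html?piid=47570655"
"https://www.example.com/pdp/range-hood-agcd1042.html"
"https://www.example.com/no-match-here"
EOF
```

With a real file this would be something like extract_ids < /path/to/file > ids.txt, where ids.txt is a placeholder name.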

sed & protobuf: need to delete dots

I need to delete dots using sed, but not all dots.
- repeated .CBroadcast_GetBroadcastChatUserNames_Response.PersonaName persona_names = 1
+ repeated CBroadcast_GetBroadcastChatUserNames_Response.PersonaName persona_names = 1
Here the dot after repeated should be deleted (repeated can also be optional | required | extend).
- rpc NotifyBroadcastViewerState (.CBroadcast_BroadcastViewerState_Notification) returns (.NoResponse)
+ rpc NotifyBroadcastViewerState (CBroadcast_BroadcastViewerState_Notification) returns (NoResponse)
And here the dot after ( should be deleted.
It should work on multiple files with different content.
Full code can be found here

A perhaps simpler solution (works with both GNU sed and BSD/macOS sed):
sed -E 's/([[:space:][:punct:]])\./\1/g' file
In case a . can also appear as the first character on a line, use the following variation:
sed -E 's/(^|[[:space:][:punct:]])\./\1/g' file
The assumption is that any . preceded by:
a whitespace character (character class [:space:])
as in: " ."
or a punctuation character (character class [:punct:])
as in: "(."
should be removed. This is done by replacing the matched sequence with just the character preceding the ., which is captured via the subexpression (...) in the regular expression and referenced in the replacement string with \1 (the first capture group).
If you invert the logic, you can try the simpler:
sed -E 's/([^[:alnum:]])\./\1/g' file
In case a . can also appear as the first character on a line:
sed -E 's/(^|[^[:alnum:]])\./\1/g' file
This removes every period that is not preceded by an alphanumeric character (a letter or digit); inside the brackets, the leading ^ negates the class.
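To see the inverted-logic variant on the two lines from the question, here is a small sketch. The function name strip_leading_dots and the third input line are invented for illustration.

```shell
#!/bin/sh
# strip_leading_dots is a hypothetical name for the sed call above.
# A dot is removed when it follows the line start or a non-alphanumeric char.
strip_leading_dots() {
    sed -E 's/(^|[^[:alnum:]])\./\1/g'
}

strip_leading_dots <<'EOF'
repeated .CBroadcast_GetBroadcastChatUserNames_Response.PersonaName persona_names = 1
rpc NotifyBroadcastViewerState (.CBroadcast_BroadcastViewerState_Notification) returns (.NoResponse)
.Foo.Bar at_line_start = 2
EOF
```

The dots inside names such as Response.PersonaName stay, because they follow a letter. One caveat: _ is not alphanumeric, so a dot directly after an underscore would also be removed.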

Assuming only the leading . needs removal, here's some GNU sed code:
echo '.a_b.c c.d (.e_f.g) ' |
sed 's/^/& /;s/\([[:space:]{([]\+\)\.\([[:alpha:]][[:alpha:]_.]*\)/\1\2/g;s/^ //'
Output:
a_b.c c.d (e_f.g)
Besides the ., it matches two fields, which are left intact:
A leading run of whitespace or opening (, [, or { characters.
A trailing name that starts with an alphabetic character and continues with alphabetic characters, _ or ..
Unfortunately, while \+ matches one or more spaces et al., it fails if the . is at the beginning of the line. (Replacing the \+ with * would match at the beginning, but would incorrectly change c.d to cd.) So there's a kludge: s/^/& / inserts a dummy space at the beginning of the line so that the \+ works as desired, then s/^ // removes the dummy space.

How to terminate a regular expression and start another

I have a file which have the data something like this
34sdf, 434ssdf, 43fef,
34sdf, 434ssdf, 43fef, sdfsfs,
I have to identify the sdfsfs, and replace it and/or print the line.
The exact conditions: the tokens are comma-separated, and the target expression starts with a non-numeric character and runs until a comma is met.
I start with [^0-9] to match a leading non-numeric character, but the next character is unknown to me: it can be a digit, a special character, a letter, or even a space. So I wanted a (anything)*, but the preceding [] gets in the way. I tried [^0-9]*, [^0-9].*, [^0-9]\+.*, [^0-9]{1}*, [^0-9][^,]* and [^0-9]{1}[^\,]*; nothing has worked so far. So my question is how to write a regex for this (a non-numeric first character, then any number of characters other than a comma, up to the comma). I am using grep and sed (GNU). Another question: does it make any difference whether the regex is POSIX or non-POSIX?

Something like that maybe?
(?:(?:^(\D.*?))|(?:,\s(\D.*?))),
This captures the string that starts with a non-numeric character.
Note that sed supports neither \D nor the (?:...) and .*? syntax, so this pattern needs a PCRE engine (for example grep -P); in sed you can use [^0-9] in place of \D, which you already know.
EDIT: Can be trimmed to:
(?:\s|^)(\D.*?),

With sed, and slight modifications to your last regex:
sed -n 's/.*,[ ]*\([^ 0-9][^\,]*\),/\1/p' input
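A quick way to check this command is to run it on the two sample lines from the question. The wrapper name last_nondigit_token is made up for this sketch.

```shell
#!/bin/sh
# last_nondigit_token is a hypothetical name for the command above.
# -n plus the p flag prints a line only if the substitution matched.
last_nondigit_token() {
    sed -n 's/.*,[ ]*\([^ 0-9][^\,]*\),/\1/p'
}

last_nondigit_token <<'EOF'
34sdf, 434ssdf, 43fef,
34sdf, 434ssdf, 43fef, sdfsfs,
EOF
```

Only the second line produces output. Because the pattern requires a comma before the token, a qualifying token in the first position of a line would not be found, and the greedy .* means that the last qualifying token on the line is the one reported.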

I think pattern (\s|^)(\D[^,]+), will catch it.
It matches white-space or start of string and group of a non-digit followed by anything but comma, which is followed by comma.
You can use [^0-9] if \D is not supported.

This might work for you (GNU sed):
sed '/\b[^0-9,][^,]*/!d' file # only print lines that match
or:
sed -n 's/\b[^0-9,][^,]*/XXX/gp' file # substitute `XXX` for match

Replace only the first occurrence matching a regex with sed

I have a string
test:growTest:ret
With sed I would like to delete only test: to get:
growTest:ret
I tried with
sed '0,/RE/s/^.*://'
But it only gives me
ret
Any ideas ?
Thanks

Modify your regexp ^.*: to ^[^:]*:
All you need is for the .* construct not to consume your delimiter, the colon. To do this, replace the match-any-character . with a negated bracket expression such as [^abc], which matches any character except those listed.
Also, don't confuse the two circumflexes ^, as they have different meanings: the first one anchors the match to the beginning of the line, and the second one negates the bracket expression.
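The difference between the two patterns is easy to see side by side. A minimal sketch using the string from the question; the variable names are arbitrary.

```shell
#!/bin/sh
input='test:growTest:ret'

# Greedy: .* runs to the last colon, so both leading fields are removed.
greedy=$(printf '%s\n' "$input" | sed 's/^.*://')

# Negated class: [^:]* cannot cross a colon, so it stops at the first one.
first_only=$(printf '%s\n' "$input" | sed 's/^[^:]*://')

printf 'greedy:     %s\n' "$greedy"
printf 'first only: %s\n' "$first_only"
```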

If I understand your question, you want strings like test:growTest:ret to become growTest:ret.
You can use:
sed -E -i 's/test:(.*)$/\1/' file
-E enables extended regular expressions, so that ( and ) form a group without backslashes.
-i means edit in place.
s/one/two/ replaces occurrences of one with two.
So this replaces "test:(.*)$" with "\1", where \1 is the contents of the first group, i.e. what the regex matched inside the parentheses.
"test:(.*)$" matches the first occurrence of "test:" and then captures everything else up to the end of the line in the parentheses. Only the contents of the parentheses remain after the sed command.

sed uses greedy matching, so ^.*: will match test:growTest: rather than test:.
By default, sed replaces only the first match on each line, so you do not need to do anything special for that.

Substitution till the end of the line in bash

I have a huge text file with lots of lines like:
asdasdasdaasdasd_DATA_3424223423423423
gsgsdgsgs_DATA_6846343636
.....
For each line, I would like to replace everything after DATA_ up to the end of the line with nothing, so that I would get:
asdasdasdaasdasd_DATA_
gsgsdgsgs_DATA_
.....
I know that you can do something similar with:
sed -e "s/^DATA_*$/DATA_/g" filename.txt
but it does not work.
Do you know how?
Thanks

You have two problems: you're unnecessarily matching the beginning and end of the line with ^ and $, and you're looking for _* (zero or more underscores) instead of .* (zero or more of any character). Here's what you want:
sed -e 's/_DATA_.*/_DATA_/'
The g on the end (global) won't do anything, because you're already removing everything from the first instance of "_DATA_" onward, so there can't be another match.
P.S. The -e isn't strictly necessary if you only have one expression, but if you think you might tack more on, it's a convenient habit.
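For reference, here is the command applied to the sample lines from the question, plus one invented line without the marker to show that such lines pass through unchanged. The function name truncate_after_data is hypothetical.

```shell
#!/bin/sh
# truncate_after_data is a hypothetical helper name.
# Everything after the first _DATA_ on a line is dropped.
truncate_after_data() {
    sed 's/_DATA_.*/_DATA_/'
}

truncate_after_data <<'EOF'
asdasdasdaasdasd_DATA_3424223423423423
gsgsdgsgs_DATA_6846343636
no marker on this line
EOF
```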

With regular expressions, * means the previous character, any number of times. To match any character, use .
So what you really want is .* which means any character, any number of times, like this:
sed 's/DATA_.*/DATA_/' filename.txt
Also, I removed the ^ which means start of line, since you want to match "DATA_" even if it's not in the beginning of a line.

Using awk: set the field delimiter to "_DATA_", then print field 1 ($1) followed by the delimiter. No need for a regular expression.
$ awk -F"_DATA_" '{print $1"_DATA_"}' file
asdasdasdaasdasd_DATA_
gsgsdgsgs_DATA_

We Keep Coding
