Replace unknown sub-string in a URL - regex

I have a URL in a format like https://foo.bar.whoo.dum.io, in which I would like to replace the foo string with something else. Of course, the foo part is unknown and can be anything.
I tried a simple regex like (.+?)\.(.+), but it seems that regex in Bash is always greedy (is that right?).
My best attempt is to split the string by . and then join it back with the first part left out, but I was wondering whether there is a different, more intuitive solution.
Thank you

There are a lot of ways of getting the desired output.
If you're sure the url will always start with https://, we can use parameter expansion to remove everything before the first . and then add the replacement you need:
input="https://foo.bar.whoo.dum.io"
echo "https://new.${input#*.}"
Will output
https://new.bar.whoo.dum.io

You can use sed:
url='https://foo.bar.whoo.dum.io'
url=$(sed 's,\(.*://\)[^/.]*,\1new_value,' <<< "$url")
Here, the sed command means:
\(.*://\) - Capturing group 1: any text and then ://
[^/.]* - zero or more chars other than / and .
\1new_value - replaces the match with the Group 1 and new_value is appended to this group value.
See the online demo:
url='https://foo.bar.whoo.dum.io'
sed 's,\(.*://\)[^/.]*,\1new_value,' <<< "$url"
# => https://new_value.bar.whoo.dum.io
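The same substitution can be wrapped in a small function so the replacement is passed as an argument. This is only a sketch: the function name replace_first_label is hypothetical, and it assumes the new value contains no comma, & or backslash, since those are special in this sed command.

```shell
# Hypothetical helper: replace the first host label of a URL with sed.
# $1 = URL, $2 = new label (assumed free of ",", "&" and "\").
replace_first_label() {
  # Group 1 keeps everything up to and including "://";
  # [^/.]* is the first host label, which gets replaced.
  printf '%s\n' "$1" | sed 's,\(.*://\)[^/.]*,\1'"$2"','
}

replace_first_label 'https://foo.bar.whoo.dum.io' 'new_value'
replace_first_label 'ftp://a.example.org/path' 'b'
```

Because .*:// is greedy, a URL that contains a second :// later on (for example inside a query string) would be matched up to that last occurrence.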

1st solution: Using bash parameter expansion, where newValue is a variable holding the new value you want in your URL.
url='https://foo.bar.whoo.dum.io'
newValue="newValue"
echo "${url%//*}//$newValue.${url#*.}"
2nd solution: With your shown sample, try the following sed command, where the variable url holds your sample URL.
echo "$url" | sed 's/:\/\/[^.]*/:\/\/new_value/'
Explanation: echo prints the value of the shell variable url and sends it to sed as standard input. sed's substitution then replaces :// and everything up to the first . with ://new_value.
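As a rough sketch, the parameter expansion approach can also be written as a function with the individual steps split out. The name replace_first_label_pe is hypothetical, and the code assumes the host part contains at least one dot.

```shell
# Hypothetical helper using only parameter expansion (no external process).
# $1 = URL, $2 = new first label.
replace_first_label_pe() {
  local url new scheme rest
  url=$1
  new=$2
  scheme=${url%%://*}   # text before the first "://", e.g. "https"
  rest=${url#*://}      # text after the first "://"
  # ${rest#*.} drops the old first label together with its dot.
  printf '%s://%s.%s\n' "$scheme" "$new" "${rest#*.}"
}

replace_first_label_pe 'https://foo.bar.whoo.dum.io' 'newValue'
```

Unlike the sed variants, the new value is never interpreted as a pattern here, so characters such as & or / in it need no escaping.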


Use "sed" to Remove Capture Group 1 From All Lines In a File

I currently have a file with lines like the below:
ABCD123RTY,steve_tyler#gmail.com,10.20.30.142,2021-08-20T14:49:51.035Z
ABCD123QWE,thisguy#hotmail.com,10.20.30.245,2021-08-20T14:10:22.254Z
ABCD123DFG,calvin_hobbes2#netnet,10.20.30.l6,2021-08-20T15:30:34.480Z
My goal is to remove everything from the "#" to the next comma, such that it instead looks like the below:
ABCD123RTY,steve_tyler,10.20.30.142,2021-08-20T14:49:51.035Z
ABCD123QWE,thisguy,10.20.30.245,2021-08-20T14:10:22.254Z
ABCD123DFG,calvin_hobbes2,10.20.30.l6,2021-08-20T15:30:34.480Z
I'm not that experienced with utilizing sed and RegEx expressions. In playing around on a testing website, I came up with the below RegEx string, in which capture group 1 is perfectly matching to what I want to remove:
regex101.com Test
How would I go about putting this in a "sed" command against a given input file, and writing the results to a new output file. I had tried the below most recently:
sed 's/(#.+?),//' input.csv > input_Corrected.csv
Just as another note, I'm doing this in a bash script in which I have an API call generating the "input.csv" file, and then want to run this sed command to clean up the data format to match my needs.
You can use
sed 's/#[^,]*,/,/' input.csv > input_Corrected.csv
sed 's/#[^,]*//' input.csv > input_Corrected.csv
The #[^,]*, POSIX BRE pattern matches a #, then zero or more characters other than a comma, then a comma. Use the first command if a comma MUST follow the match; it replaces the match with a comma. The second command leaves the trailing comma out of the pattern and uses an empty replacement.
See the online demo:
s='ABCD123RTY,steve_tyler#gmail.com,10.20.30.142,2021-08-20T14:49:51.035Z
ABCD123QWE,thisguy#hotmail.com,10.20.30.245,2021-08-20T14:10:22.254Z
ABCD123DFG,calvin_hobbes2#netnet,10.20.30.l6,2021-08-20T15:30:34.480Z'
sed 's/#[^,]*,/,/' <<< "$s"
Output:
ABCD123RTY,steve_tyler,10.20.30.142,2021-08-20T14:49:51.035Z
ABCD123QWE,thisguy,10.20.30.245,2021-08-20T14:10:22.254Z
ABCD123DFG,calvin_hobbes2,10.20.30.l6,2021-08-20T15:30:34.480Z
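The difference between the two commands only shows up when the # part is the last field on a line. A small sketch with made-up sample lines (the function names and the id1/id2 data are hypothetical):

```shell
# Variant 1: requires a comma after the "#..." part and puts it back.
strip_hash_comma() { sed 's/#[^,]*,/,/'; }
# Variant 2: does not require a following comma.
strip_hash() { sed 's/#[^,]*//'; }

# The second sample line ends with the "#..." part, so there is no comma after it.
printf '%s\n' 'id1,steve_tyler#gmail.com,10.20.30.142' 'id2,last_field#netnet' | strip_hash_comma
printf '%s\n' 'id1,steve_tyler#gmail.com,10.20.30.142' 'id2,last_field#netnet' | strip_hash
```

Variant 1 leaves the second line untouched, while variant 2 strips #netnet from it.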
You can use the regular expression below to remove only what looks like a valid email domain. Note the -E flag, which is needed because the pattern uses extended syntax:
sed -E "s/#([a-zA-Z0-9_.-]+)\.([a-zA-Z]{2,5})//g" input.csv > input_Corrected.csv
For your requirement, however, use the command below. It strips the part after # on every line, including entries such as "calvin_hobbes2#netnet" in your file, which are not valid email addresses.
sed "s/#[^,]*//g" input.csv > input_Corrected.csv

bash regexp to extract part of URL

From the following URL:
https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/]
I need to extract the following part:
test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/
I'm pretty bad at regex. I came up with the following but it doesn't work:
sed -n "s/^.*browser\(test-lab.*/.*/\).*$/\1/p"
Can anyone help with what I'm doing wrong?
Could you please try the following awk solution and let me know if it helps.
echo "https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/" | awk '{sub(/.*browser\//,"");sub(/\/$/,"");print}'
Explanation: The first sub removes everything up to and including browser/; the second removes the trailing /.
EDIT1: Adding a sed solution here too.
sed 's/\(.[^//]*\)\/\/\(.[^/]*\)\(.[^/]*\)\(.[^/]*\)\/\(.*\)/\5/' Input_file
Output will be as follows.
test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/
Explanation of the sed command: it divides the whole line into parts and uses sed's ability to store matched groups in memory. These are the parts I used:
\(.[^//]*\): holds the value up to https:. To print it you could use \1, because this is sed's first buffer.
\/\/: the literal // that follows in the URL.
\(.[^/]*\): the 2nd buffer, which holds console.developers.google.com, because the regex stops matching at the first occurrence of /.
\(.[^/]*\), \(.[^/]*\) and \/\(.*\): the next three parts work the same way. The 3rd and 4th buffers each run from a / up to the next /, and the 5th buffer holds everything after the following /.
/\5/: substitutes the whole line with \5, the 5th buffer, which contains the value the OP asked for.
Use a different sed delimiter and don't forget to escape the parentheses.
avinash:~/Desktop$ echo 'https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/]' | sed 's~.*/browser/\([^/]*/[^/]*/\).*~\1~'
test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/
OR
Use grep with the -oP options.
avinash:~/Desktop$ echo 'https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/]' | grep -oP '/browser/\K[^/]*/[^/]*/'
test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/
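For use in a script, the sed variant might be wrapped as below. This is a sketch rather than a definitive version: the function name is hypothetical, and -n with the p flag is added so that a URL without /browser/ prints nothing instead of being echoed back unchanged.

```shell
# Hypothetical helper: print the two path segments that follow "/browser/".
# $1 = URL. Prints nothing when the URL does not match.
extract_after_browser() {
  # "~" is used as the delimiter so the slashes need no escaping.
  printf '%s\n' "$1" | sed -n 's~.*/browser/\([^/]*/[^/]*/\).*~\1~p'
}

extract_after_browser 'https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/]'
```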

Property File with Sed regex - Ignore first character for match

I have a test property file with this in it:
-config.test=false
config.test=false
I'm trying to, using sed, update the values of these properties whether they have the - in front of them or not. Originally I was using this, which worked:
sed -i -e "s/#*\(config.test\)\s*=\s*\(.*\)/\1=$(echo "true" | sed -e 's/[\/&]/\\&/g')/" $FILE_NAME
However, since I was basically ignoring all characters before the match, I found that when I had properties with keys that ended in the same value, it'd give me problems. Such as:
# The regex matches both of these
config.test=true
not.config.test=true
Is there a way to either ignore the first character for a match or ignore the initial - specifically?
EDIT:
Adding a little clarification in terms of what I'd want the regex to match:
config.test=false # Should match
-config.test=false # Should match
not.config.test=false # Should NOT match
sed -E 's/^(-?config\.test=).*/\1true/' file
? means zero or one repetition of the preceding item, so the - may or may not be present when the regexp matches.
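A quick way to check this against the three cases from the question is to pipe them through the command. In this sketch the function name set_config_test is hypothetical, and the new value is assumed to contain no /, & or backslash.

```shell
# Hypothetical helper: set config.test (with or without a leading "-") to $1.
set_config_test() {
  # ^ anchors at the start of the line, so "not.config.test" cannot match.
  sed -E 's/^(-?config\.test=).*/\1'"$1"'/'
}

printf '%s\n' 'config.test=false' '-config.test=false' 'not.config.test=false' | set_config_test true
```

The first two lines come out with =true and the third is left as it was.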
Instead of ignoring the first character, here is a solution with sed and awk that matches a key of a specific length. Sometimes approaching the problem from the opposite side is easier.
If you can only use sed, I have two workarounds, depending on your file.
If your file looks like this
$ cat file
config.test=false
-config.test=false
not.config.test=false
you can use this one-liner
sed 's/^\(.\{11,12\}=\)\(.*$\)/\1true/' file
sed anchors at the beginning ^ of each line and captures, in a group \( ... \) for later back-referencing, any character . repeated 11 or 12 times \{11,12\}, followed by a =.
This first group is put back by the back reference \1.
The second group, which matches every character after the = to the end of the line \(.*$\), is dropped. In its place sed inserts your desired string true.
This also means that anything that followed the old value on the line is lost.
If you want to avoid this and your file looks like
$ cat file
config.test=false # Should match
-config.test=false # Should match
not.config.test=false # Should NOT match
you can use this one-liner
sed 's/^\(.\{11,12\}=\)\(false\)\(.*$\)/\1true\3/' file
This is like the example before but works with three groups for back referencing.
The content of the former group 2 is now in group 3. So no content after a change from false to true will be chopped.
The new second group \(false\) will be dropped and replaced by the string true.
If your file looks like the one in the previous example and you are allowed to use awk, you can try this
awk -F'=' 'length($1)<=12 {sub(/false/,"true")};{print}' file
To me this looks much more self-explanatory, but the choice is yours.
Both sed examples invoke sed only once, which is always good.
The first sed command takes 39 characters to type and the second 50.
The awk command takes 52 characters to type.
Please tell me if this works for you or if you need another solution.
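The length test above accepts any key of up to 12 characters, not only config.test. As an alternative sketch (a different technique, not the one used above), awk can compare the key exactly. The function name set_exact_key is hypothetical, and this version rewrites the whole line, so a trailing comment after the value would be dropped.

```shell
# Hypothetical helper: set key $1 (optionally prefixed with "-") to value $2.
set_exact_key() {
  awk -F'=' -v key="$1" -v val="$2" '
    # Exact comparison of the text before the first "=".
    $1 == key || $1 == ("-" key) { print $1 "=" val; next }
    { print }
  '
}

printf '%s\n' 'config.test=false' '-config.test=false' 'not.config.test=false' | set_exact_key config.test true
```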

How can I use sed to regex string and number in bash script

I want to separate the text and the number in a line of a file to extract a specific number in a bash script. For example, from:
Branches executed:75.38% of 1190
I only want to get the number
75.38
I have tried the code below:
$new_value=value | sed -r 's/.*_([0-9]*)\..*/\1/g'
but it is incorrect and it fails.
How should this be done? Thank you in advance for your help.
You can use the following regex to extract the first number in a line:
^[^0-9]*\([0-9.]*\).*$
Usage:
% echo 'Branches executed:75.38% of 1190' | sed 's/^[^0-9]*\([0-9.]*\).*$/\1/'
75.38
Give this a try:
value=$(sed -n "s/^Branches executed:\([0-9][.0-9]*[0-9]*\)%.*$/\1/p" afile)
It is assumed that the line appears only once in afile.
The value is stored in the value variable.
There are several things here that we could improve. One is that you need to escape the parentheses in sed's basic regex syntax: \(...\)
Another is that it would help to have a full specification of the input strings, as well as a runnable script to experiment with.
Anyway, this is my first attempt:
Update: I added a little more bash around this regex so it is easier to experiment with:
value='Branches executed:75.38% of 1190'
new_value=$(echo "$value" | sed -e 's/[^0-9]*\([0-9]*\.[0-9]*\).*/\1/g')
echo "$new_value"
Update 2: as john pointed out, it will match only numbers that contain a decimal point. We can fix it with an optional group: \(\.[0-9]\+\)\? (note that \+ and \? are GNU sed extensions to basic regex syntax).
An explanation for the optional group:
\(...\) is a group.
\(...\)\? is a group that appears zero or one times (mind the escaped question mark).
\.[0-9]\+ is the pattern for a dot and one or more digits.
Putting all together:
value='Branches executed:75.38% of 1190'
new_value=$(echo "$value" | sed -e 's/[^0-9]*\([0-9]\+\(\.[0-9]\+\)\?\).*/\1/g')
echo "$new_value"
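Since \+ and \? are GNU extensions, a more portable sketch of the same optional-group idea uses sed -E, where +, ? and ( ) need no backslashes. The function name first_number is hypothetical, and the second sample line is made up to show the case without a decimal part.

```shell
# Hypothetical helper: print the first integer or decimal number in a string.
# $1 = input text. If the text contains no digit, it is printed unchanged.
first_number() {
  # [^0-9]* skips the leading text; (\.[0-9]+)? is the optional fraction.
  printf '%s\n' "$1" | sed -E 's/^[^0-9]*([0-9]+(\.[0-9]+)?).*/\1/'
}

first_number 'Branches executed:75.38% of 1190'
first_number 'Lines executed:80% of 200'
```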

Using sed to remove all console.log from javascript file

I'm trying to remove all my console.log, console.dir etc. from my JS file before minifying it with YUI (on osx).
The regex I got for the console statements looks like this:
console.(log|debug|info|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)\((.*)\);?
and it works if I test it with the RegExr.
But it won't work with sed.
What do I have to change to get this working?
sed 's/___???___//g' <$RESULT >$RESULT_STRIPPED
update
After getting the first answer I tried
sed 's/console.log(.*)\;//g' <test.js >result.js
and this works, but when I add an OR
sed 's/console.\(log\|dir\)(.*)\;//g' <test.js >result.js
it doesn't replace the "logs":
Your original expression looks fine. You just need to pass the -E flag to sed, for extended regular expressions:
sed -E 's/console.(log|debug|info|...|count)\((.*)\);?//g'
The difference between these types of regular expressions is explained in man re_format.
To be honest I have never read that page, but instead simply tack on an -E when things don't work as expected. =)
You must escape ( (for grouping) and | (for oring) in sed's regex syntax. E.g.:
sed 's/console.\(log\|debug\|info\|warn\|error\|assert\|dir\|dirxml\|trace\|group\|groupEnd\|time\|timeEnd\|profile\|profileEnd\|count\)(.*);\?//g'
UPDATE example:
$ sed 's/console.\(log\|debug\|info\|warn\|error\|assert\|dir\|dirxml\|trace\|group\|groupEnd\|time\|timeEnd\|profile\|profileEnd\|count\)(.*);\?//g'
console.log # <- input line, not matches, no replacement printed on next line
console.log
console.log() # <- input line, matches, no printing
console.log(blabla); # <- input line, matches, no printing
console.log(blabla) # <- input line, matches, no printing
console.debug(); # <- input line, matches, no printing
console.debug(BAZINGA) # <- input line, matches, no printing
DATA console.info(ditto); DATA2 # <- input line, matches, printing of expected data
DATA DATA2
HTH
I was also looking for a way to remove all the console.log calls,
and I am trying to use python to do this,
but I found that the regex does not work for me.
This is what I wrote:
var re=/^console.log(.*);?$/;
but it also matches the following string:
'console.log(23);alert(234dsf);'
Does it work correctly with the following?
"s/console.(log|debug|info|...|count)((.*));?//g"
I tried this:
sed -E 's/console.(log|debug|info)( ?| +)\([^;]*\);//g'
Here's my implementation
for i in $(find ./dir -name "*.js")
do
sed -E 's/console\.(log|warn|error|assert..timeEnd)\((.*)\);?//g' "$i" > "${i}.copy" && mv "${i}.copy" "$i"
done
took the sed thing from github
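A loop over $(find ...) breaks on file names that contain spaces. One possible sketch hands the files to sed through find -exec instead. The function name is hypothetical, only three method names are listed for brevity, and [^)]* is used for the arguments, so calls with nested parentheses are not handled.

```shell
# Hypothetical helper: strip simple console calls from every .js file under $1.
# Each file is edited in place and a backup with the suffix .bak is kept.
strip_console_in_tree() {
  find "$1" -type f -name '*.js' \
    -exec sed -i.bak -E 's/console\.(log|warn|error)\([^)]*\);?//g' {} +
}
```

A typical call would be strip_console_in_tree ./dir, after which the .bak files can be removed once the result has been checked.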
I was feeling lazy and hoping to find a script to copy & paste. Alas there wasn't one, so for the lazy like me, here is mine. It goes in a file named something like 'minify.sh' in the same directory as the files to minify. It will overwrite the original file and it needs to be executable.
#!/bin/bash
for f in *.js
do
sed -Ei 's/console.(log|debug|info)\((.*)\);?//g' "$f"
yui-compressor "$f" -o "$f"
done
I'd just like to add here that I was running into issues with namespaced console.logs such as window.console.log. Also Tweenmax.js has some interesting uses of console.log in some parts such as
window.console&&console.log(t)
So I used this
sed -i.bak s/[^\&a-zA-Z0-9\.]console.log\(/\\/\\//g js/combined.js
The regex effectively says replace all console.logs that don't start with &, alphanumerics, and . with a '//' comment, which uglify later takes out.
Rodrigocorsi's works with nested parentheses. I added a ? after the ; because yuicompressor was omitting some semicolons.
The likely reason this is not working is that you are not limiting
the regex so that it excludes a closing parenthesis ) from the method arguments.
Try this regular expression:
console\.(log|trace|error)\(([^)]+)\);
Remember to include the rest of your method names in the capture group.
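To see the effect of excluding the closing parenthesis, the greedy and the limited patterns can be compared on one line that holds two statements. This is a sketch with hypothetical function names; [^)]* is used instead of [^)]+ so that empty calls such as console.log() are removed as well, and the semicolon is made optional.

```shell
# Greedy: (.*) runs up to the LAST ")" on the line.
strip_console_greedy() {
  sed -E 's/console\.(log|trace|error)\((.*)\);?//g'
}
# Limited: [^)]* stops at the FIRST ")".
strip_console_limited() {
  sed -E 's/console\.(log|trace|error)\([^)]*\);?//g'
}

printf '%s\n' 'console.log(23);alert(234);' | strip_console_greedy
printf '%s\n' 'console.log(23);alert(234);' | strip_console_limited
```

The greedy version removes the alert call as well and leaves an empty line, while the limited version keeps alert(234);. The limited version in turn stops too early on arguments that contain parentheses, such as console.log(f(x));.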