How to remove only the matching string part - regex

I need to remove "[PR:" and "]" from [PR:Parker] so that only "ParkerS" is printed.
Note: in [PR:xxxxxxx], the "xxxxxxx" part changes from time to time.
So far I have created the following sed command:
sed 's/[PR:]//g' | sed 's/[][]//g'
But it prints "arkerS", which is also missing the "P" of the name.

1st solution: With awk and your shown samples, try the following code. It uses the gsub function to globally replace a leading [PR: and a trailing ] with an empty string, then prints the rest of the line.
awk '{gsub(/^\[PR:|\]$/,"")} 1' Input_file
2nd solution: Using : and ] as field separators in awk and printing the second-to-last field, as per the shown samples.
awk -F':|\\]' '{print $(NF-1)}' Input_file
3rd solution: Using awk's match function. The regex /:[^]]*/ matches from the first : up to (but not including) the next ], and substr prints the matched part without the leading :.
awk 'match($0,/:[^]]*/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
4th solution: Using bash parameter expansion. If the value is already in a shell variable, this is the best solution to go for.
## If your sample is in a shell variable, use parameter expansion.
var="[PR:Parker]"
## Create an interim variable var1 by removing everything from the start up to the last ":".
var1="${var##*:}"
echo "$var1"
Parker]
## Then remove the trailing "]" from var1 to get the needed value.
echo "${var1%*]}"
Parker
5th solution: Using a perl one-liner, performing a global substitution to remove the leading [PR: and the trailing ].
perl -pe 's/^\[PR:|\]$//g' Input_file
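The parameter-expansion approach above can also be written with literal prefix and suffix patterns. This is only a sketch: strip_pr is a hypothetical helper name, and it assumes the value always has the exact shape [PR:name].

```shell
# Hypothetical helper: strip a literal "[PR:" prefix and "]" suffix.
strip_pr() {
    value=${1#\[PR:}     # drop the leading "[PR:" if present
    value=${value%\]}    # drop the trailing "]" if present
    printf '%s\n' "$value"
}

strip_pr '[PR:Parker]'   # prints: Parker
```

Because both patterns are anchored to the ends of the string, a ":" or "]" inside the name itself would be left alone.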

You can use
sed 's/\[PR:\([^][]*\)]/\1/' <<< "[PR:Parker]"
Here, the \[PR:\([^][]*\)] matches [PR:, then any zero or more chars other than [ and ] are captured into Group 1 and a ] is matched, and the match is replaced with the Group 1 value (with \1 placeholder).
Or,
sed -E 's/\[PR:|]//g' <<< "[PR:Parker]"
Here, \[PR:|] matches either [PR: or ] and the s command removes them.
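To see why the original attempt lost the "P", it may help to run both forms side by side. In this sketch, strip_broken and strip_fixed are hypothetical function names; the sed expressions are the ones from the question and from the answer above.

```shell
# Unescaped, [PR:] is a bracket expression: it matches any single
# "P", "R" or ":" character anywhere in the line.
strip_broken() {
    printf '%s\n' "$1" | sed 's/[PR:]//g' | sed 's/[][]//g'
}

# Escaping the opening bracket makes sed match the literal text "[PR:".
strip_fixed() {
    printf '%s\n' "$1" | sed 's/\[PR:\([^][]*\)]/\1/'
}

strip_broken '[PR:Parker]'   # prints: arker
strip_fixed  '[PR:Parker]'   # prints: Parker
```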

Related

Convert regex positive look ahead to sed operation

I would like to sed to find and replace every occurrence of - with _ but only before the first occurrence of = on every line.
Here is a dataset to work with:
ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"
In the end the dataset should look like this:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
I found this regex will match properly.
\-(?=.*=)
However, the regex uses a positive lookahead, and it appears that sed (even with -E, -e or -r) does not know how to work with positive lookaheads.
I tried the following but keep getting Invalid preceding regular expression:
cat dataset.txt | sed -r "s/-(?=.*=)/_/g"
Is it possible to convert this in a usable way with sed?
Note, I do not want to use perl. However I am open to awk.
You can use
sed ':a;s/^\([^=]*\)-/\1_/;ta' file
See the online demo:
#!/bin/bash
s='ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"'
sed ':a; s/^\([^=]*\)-/\1_/;ta' <<< "$s"
Output:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
Details:
:a - setting a label named a
s/^\([^=]*\)-/\1_/ - matches zero or more chars other than = from the start of the string (capturing them into Group 1), then a - char, and replaces the match with the Group 1 value (\1) followed by _ (which replaces the matched - char)
ta - jump to label a upon a successful replacement. Otherwise, stop.
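The loop described above can be unrolled by hand to see what each pass does. This is a sketch with a hypothetical sample line; each step applies the same substitution once, which is what one trip through the :a ... ta loop does.

```shell
# Hypothetical sample line with hyphens on both sides of the "=".
line='ke-y_0-1="a-b"'

# Each pass replaces the LAST hyphen before the first "=", because
# [^=]* is greedy and cannot cross the "=".
step1=$(printf '%s\n' "$line"  | sed 's/^\([^=]*\)-/\1_/')
step2=$(printf '%s\n' "$step1" | sed 's/^\([^=]*\)-/\1_/')
# Third pass: no hyphen is left before "=", so nothing changes and
# the "ta" branch in the real script would not be taken.
step3=$(printf '%s\n' "$step2" | sed 's/^\([^=]*\)-/\1_/')

printf '%s\n' "$step1" "$step2" "$step3"
```

The hyphen inside the quoted value is never touched, since the pattern is anchored at the start of the line and stops at the first "=".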
You might also use awk setting the field separator to = and replace all - with _ for the first field.
To print only the replaced lines:
awk 'BEGIN{FS=OFS="="}gsub("-", "_", $1)' file
Output
ke_y_0_1="foo"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
If you want to print all lines:
awk 'BEGIN{FS=OFS="="}{gsub("-", "_", $1);print}' file
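The difference between the two awk forms comes from gsub returning the number of replacements it made. A small sketch on two lines of the sample data (the variable names are only for illustration):

```shell
data='ke-y_0-1="foo"
key_two="bar"'

# gsub used as a pattern: the line is printed only when at least one
# replacement happened in the first field.
changed_only=$(printf '%s\n' "$data" | awk 'BEGIN{FS=OFS="="} gsub("-", "_", $1)')

# gsub inside an action followed by print: every line is printed.
all_lines=$(printf '%s\n' "$data" | awk 'BEGIN{FS=OFS="="} {gsub("-", "_", $1); print}')

printf 'changed only:\n%s\n' "$changed_only"
printf 'all lines:\n%s\n' "$all_lines"
```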

How to find and replace a pattern string using sed/perl/awk?

I have a file foo.properties with contents like
foo=bar
# another property
test=true
allNames=alpha:.02,beta:0.25,ph:0.03,delta:1.0,gamma:.5
In my script, I need to replace whatever value is set for ph (the current value is unknown to the bash script) and change it to 0.5. So the file should look like
foo=bar
# another property
test=true
allNames=alpha:.02,beta:0.25,ph:0.5,delta:1.0,gamma:.5
I know it can be easily done if the current value is known by using
sed "s/\,ph\:0.03\,/\,ph\:0.5\,/" foo.properties
But in my case, I have to actually read the contents of allNames, search for the value and then replace it within a for loop. Everything else is taken care of, but I can't figure out the sed/perl command for this.
I tried using sed "s/\,ph\:.*\,/\,ph\:0.5\,/" foo.properties and some variations but it didn't work.
A simpler sed solution:
sed -E 's/([=,]ph:)[0-9.]+/\10.5/g' file
foo=bar
# another property
test=true
allNames=alpha:.02,beta:0.25,ph:0.5,delta:1.0,gamma:.5
Here we match ([=,]ph:) (i.e. , or = followed by ph:) and capture it in group #1. This must be followed by 1+ of the [0-9.] characters to match any number. In the replacement we put \1 back, followed by 0.5.
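The leading [=,] in that pattern matters when another tag merely ends in "ph". The sketch below uses a hypothetical graph tag that is not in the question's file, and compares the anchored pattern with an unanchored one.

```shell
# Hypothetical input: "graph" is an invented tag that ends in "ph".
line='allNames=graph:.9,ph:0.03,delta:1.0'

# Anchored on "=" or ",": only the real ph tag is rewritten.
anchored=$(printf '%s\n' "$line" | sed -E 's/([=,]ph:)[0-9.]+/\10.5/g')

# Unanchored: the "ph:" inside "graph:" is rewritten as well.
loose=$(printf '%s\n' "$line" | sed -E 's/ph:[0-9.]+/ph:0.5/g')

printf '%s\n' "$anchored"   # allNames=graph:.9,ph:0.5,delta:1.0
printf '%s\n' "$loose"      # allNames=graph:0.5,ph:0.5,delta:1.0
```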
With your shown samples, please try following awk code.
awk -v new_val="0.5" '
match($0,/,ph:[0-9]+(\.[0-9]+)?/){
  val=substr($0,RSTART+1,RLENGTH-1)
  sub(/:.*/,":",val)
  print substr($0,1,RSTART) val new_val substr($0,RSTART+RLENGTH)
  next
}
1
' Input_file
Detailed Explanation: The awk variable new_val holds the new value to put in. In the main program, the match function looks for the regex ,ph:[0-9]+(\.[0-9]+)? in each line; if it matches, the matched text (without the leading comma) is stored in the variable val. Everything from : to the end of val is then replaced with a bare :, so val becomes ph:. The print statement outputs the part of the line up to and including the comma, then val, then the new value, then the rest of the line. next skips further processing for that line, and the trailing 1 prints all other lines, which have no match, unchanged.
2nd solution: Using sub function of awk.
awk -v newVal="0.5" '/^allNames=/{sub(/,ph:[^,]*/,",ph:"newVal)} 1' Input_file
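Since the question mentions doing the replacement inside a loop, the same idea could be wrapped in a small function that takes the tag name as a parameter. This is a sketch only: set_tag and its argument order are hypothetical, and it assumes the tag name contains no regex metacharacters.

```shell
# Hypothetical helper: set_tag TAG NEW_VALUE FILE
set_tag() {
    awk -v tag="$1" -v val="$2" '
        /^allNames=/ {
            # Dynamic regex: a "," or "=" delimiter, the tag, ":" and the old value.
            re = "[,=]" tag ":[^,]*"
            if (match($0, re)) {
                # Keep everything up to and including the delimiter, then
                # write the tag with its new value, then the rest of the line.
                $0 = substr($0, 1, RSTART) tag ":" val substr($0, RSTART + RLENGTH)
            }
        }
        1
    ' "$3"
}

# Usage on a temporary copy of the sample data:
props=$(mktemp)
printf 'foo=bar\nallNames=alpha:.02,ph:0.03,delta:1.0\n' > "$props"
set_tag ph 0.5 "$props"
```

Like the awk answers above, this prints the result to standard output and leaves the file itself unchanged.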
Would you please try a perl solution:
perl -pe '
s/(?<=\bph:)[\d.]+(?=,|$)/0.5/;
' foo.properties
The -pe option makes perl read the input line by line, perform
the operation, then print it, as sed does.
The regex (?<=\bph:) is a zero-length lookbehind which matches
the string ph: preceded by a word boundary.
The regex [\d.]+ will match a decimal number.
The regex (?=,|$) is a zero-length lookahead which matches
a comma or the end of the string.
As the lookbehind and the lookahead have zero length, they are not
substituted by the s/../../ operator.
[Edit]
As Dave Cross comments, the lookahead (?=,|$) is unnecessary as long as the input file is correctly formatted.
Works with decimal place or not, or no value, anywhere in the line.
sed -E 's/(^|[^-_[:alnum:]])ph:[0-9]*(\.[0-9]+)?/\1ph:0.5/g'
Or possibly:
sed -E 's/(^|[=,[:space:]])ph:[0-9]+(\.[0-9]+)?/\1ph:0.5/g'
The top one uses "not other naming characters" to describe the character immediately before a name, the bottom one uses delimiter characters (you could add more characters to either). The purpose is to avoid clashing with other_ph or autograph.
Here you go
#!/usr/bin/perl
use strict;
use warnings;
print "\nPerl Starting ... \n\n";
while (my $recordLine = <DATA>)
{
    chomp($recordLine);
    if (index($recordLine, "ph:") != -1)
    {
        # Replace whatever follows "ph:" up to the next comma.
        $recordLine =~ s/ph:.*?,/ph:0.5,/g;
        print "recordLine: $recordLine ...\n";
    }
}
print "\nPerl End ... \n\n";
__DATA__
foo=bar
# another property
test=true
allNames=alpha:.02,beta:0.25,ph:0.03,delta:1.0,gamma:.5
output:
Perl Starting ...
recordLine: allNames=alpha:.02,beta:0.25,ph:0.5,delta:1.0,gamma:.5 ...
Perl End ...
Using any sed in any shell on every Unix box (the other sed solutions posted that use sed -E require GNU or BSD seds):
a) if ph: is never the first tag in the allNames list (as shown in your sample input):
$ sed 's/\(,ph:\)[^,]*/\10.5/' foo.properties
foo=bar
# another property
test=true
allNames=alpha:.02,beta:0.25,ph:0.5,delta:1.0,gamma:.5
b) or if it can be first:
$ sed 's/\([,=]ph:\)[^,]*/\10.5/' foo.properties
foo=bar
# another property
test=true
allNames=alpha:.02,beta:0.25,ph:0.5,delta:1.0,gamma:.5

Replace unknown sub-string in an URL

I have a URL in a format like https://foo.bar.whoo.dum.io, in which I would like to replace the foo string with something else. Of course, the foo part is unknown and can be anything.
I tried a simple regex like (.+?)\.(.+), but it seems that regex in Bash is always greedy (or is it?).
My best attempt is to split the string by . and then join it back with the first part left out, but I was wondering whether there is a different, more intuitive solution.
Thank you
There are a lot of ways of getting the desired output.
If you're sure the url will always start with https://, we can use parameter expansion to remove everything before the first . and then add the replacement you need:
input="https://foo.bar.whoo.dum.io"
echo "https://new.${input#*.}"
Will output
https://new.bar.whoo.dum.io
You can use sed:
url='https://foo.bar.whoo.dum.io'
url=$(sed 's,\(.*://\)[^/.]*,\1new_value,' <<< "$url")
Here, the sed command means:
\(.*://\) - Capturing group 1: any text and then ://
[^/.]* - zero or more chars other than / and .
\1new_value - replaces the match with the Group 1 and new_value is appended to this group value.
See the online demo:
url='https://foo.bar.whoo.dum.io'
sed 's,\(.*://\)[^/.]*,\1new_value,' <<< "$url"
# => https://new_value.bar.whoo.dum.io
1st solution: Using bash parameter expansion, where newValue is a variable holding the new value that you want to have in your URL.
url='https://foo.bar.whoo.dum.io'
newValue="newValue"
echo "${url%//*}//$newValue.${url#*.}"
2nd solution: With your shown samples, try the following sed code, where the variable url holds your sample URL.
echo "$url" | sed 's/:\/\/[^.]*/:\/\/new_value/'
Explanation: The echo command prints the value of the shell variable url and sends it as standard input to sed. The sed substitution then replaces :// together with everything up to (but not including) the first occurrence of . with ://new_value, as per the requirement.
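The parameter-expansion solution above may be easier to follow when its two expansions are stored in intermediate variables. In this sketch, scheme, rest and new_url are hypothetical names, and the URL is assumed to contain exactly one "//".

```shell
url='https://foo.bar.whoo.dum.io'

scheme=${url%//*}   # remove the shortest suffix matching "//*"  -> https:
rest=${url#*.}      # remove the shortest prefix matching "*."   -> bar.whoo.dum.io

new_url="$scheme//new_value.$rest"
printf '%s\n' "$new_url"   # https://new_value.bar.whoo.dum.io
```

The single # and % operators remove the shortest match, which is why only the first label of the host name is dropped.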

bash regexp to extract part of URL

From the following URL:
https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/]
I need to extract the following part:
test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/
I'm pretty bad at regex. I came up with the following but it doesn't work:
sed -n "s/^.*browser\(test-lab.*/.*/\).*$/\1/p"
Can anyone help with what I'm doing wrong?
Could you please try the following awk solution and let me know if this helps you.
echo "https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/" | awk '{sub(/.*browser\//,"");sub(/\/$/,"");print}'
Explanation: Simply removing everything up to and including browser/, then removing the trailing /.
EDIT1: Adding a sed solution here too.
sed 's/\(.[^//]*\)\/\/\(.[^/]*\)\(.[^/]*\)\(.[^/]*\)\/\(.*\)/\5/' Input_file
Output will be as follows.
test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/
Explanation of sed command: The whole line is divided into parts, using sed's ability to keep matched groups in memory. Here are the parts I used:
\(.[^//]*\): Holds the value up to https:; if anyone wants to print it, \1 can be used, because this is the very first buffer for sed.
\/\/: The literal // that comes next in the URL.
\(.[^/]*\): The 2nd buffer, which holds console.developers.google.com, because the regex stops matching at the very first occurrence of /.
\(.[^/]*\), \(.[^/]*\) and \/\(.*\): The 3rd and 4th buffers work the same way, each holding one path segment (/storage and /browser); after the next /, the 5th buffer holds the rest of the line.
/\5/: Substituting the whole line with \5, the 5th buffer, which contains the value the OP asked for.
Use a different sed delimiter and don't forget to escape the parentheses.
avinash:~/Desktop$ echo 'https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/]' | sed 's~.*/browser/\([^/]*/[^/]*/\).*~\1~'
test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/
OR
Use grep with the -oP options.
avinash:~/Desktop$ echo 'https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/]' | grep -oP '/browser/\K[^/]*/[^/]*/'
test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/
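If the URL is already in a shell variable, plain parameter expansion is another option. This is a different technique from the sed and grep answers above, sketched under the assumption that nothing follows the two wanted path segments except the stray ] from the question; path is a hypothetical variable name.

```shell
url='https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/]'

path=${url#*/browser/}   # drop everything up to and including "/browser/"
path=${path%\]}          # drop the trailing "]" if present

printf '%s\n' "$path"
```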

Unix regex get only the first match

I have the following text:
NodeMetaData MapNodeId="105141" PageFormat="OsXml" UniqueIdentifier="fd0f9ade-88e1-4b04-b338-0a8884f66423" RelativePath="Test_03/AddressMap_MyAddressMap.os.xml" LastPulledRevision="-9223372036854775808" LastPulledMd5="" LastSyncedMd5="7D0C294B9A7C09F17FD5AC0414179DD414649455297B8F73125D7FB5E39D647D" HasMergeConflicts="false"
NodeMetaData MapNodeId="105142" PageFormat="OsXml" UniqueIdentifier="85f55c40-f95c-47f2-9c97-d35881e8f762" RelativePath="Test_03/Struct_MyStruct.os.xml" LastPulledRevision="-9223372036854775808" LastPulledMd5="" LastSyncedMd5="32364BCCBCD8AA9C47D8E09A3EB06667DD9476EB155F9411FA359EFA5C1A4F4F" HasMergeConflicts="false"
There are two MapNodeId values and I need to get only the first one and insert it into a file.
I used the following:
set WorkingCopyRI=`( sed -n 's/.*MapNodeId=\"// ; s/\" .*//p' Result.log)`
but the variable contains the IDs of both MapNodeId attributes. What do I need to add in order to get only the first one?
You can append ;T;q to your script to make it quit after the second s instruction prints for the first time.
Here's a cleaner and more robust way to do the whole thing:
sed -n '/MapNodeId=/ { s/^.*\sMapNodeId="\([^"]*\)"\s.*$/\1/p; q }'
I'm assuming your IDs won't contain double quotes -- if they can, you will have to modify the expression in group #1.
(Also, your formatting gives no clue as to whether your text occurs in multiple lines or not, but I'm assuming that the MapNodeId="..." parts appear on separate lines, otherwise you wouldn't have this problem.)
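The quit-after-first-match idea can be tried on a shortened, hypothetical version of the log. The two lines below keep only the first attributes from the question's data, and the sed script is a simplified variant of the one above without \s, which is a GNU extension; first_id is an illustrative name.

```shell
# Shortened, hypothetical copy of the two log lines.
log='NodeMetaData MapNodeId="105141" PageFormat="OsXml"
NodeMetaData MapNodeId="105142" PageFormat="OsXml"'

# On the first line that contains MapNodeId=", print the captured ID
# and quit, so later lines are never processed.
first_id=$(printf '%s\n' "$log" | sed -n '/MapNodeId="/{
s/.*MapNodeId="\([^"]*\)".*/\1/p
q
}')

printf '%s\n' "$first_id"   # 105141
```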
perl approach:
perl -ne 'if (/MapNodeId="([^"]+)"/) { print "$1\n"; exit }' Result.log
The output:
105141
print "$1\n" - print the first captured group value
Or if you have grep PCRE support:
grep -Po '.*MapNodeId="\K([^"]+)' Result.log | head -n 1