How to use regex to find and replace a specific text?

I have the following code to replace the value between "-sav" and "test"
test="-val https://www.randomurl.com -sav 1aFAd381cCCb86FD300e7a3A399a6014.test -speed 2 -save -delay 14"
test=$(sed 's/-sav *.*\./-sav 12345./g' <<< $test)
echo $test
# -val https://www.randomurl.com -sav 12345.test -speed 2 -save -delay 14
How do I also replace the part that comes before -speed 2 (the extension after the dot)? The expected new value of the variable would be:
test="-val https://www.randomurl.com -sav 12345.itWorks -speed 2 -save -delay 14"

With your shown samples, please try the following sed code. The -E flag enables ERE (extended regular expressions) in sed, and test is the shell variable from the question.
echo "$test" | sed -E 's/(^-.*sav )[^.]*\.[^ ]*(.*)/\112345.itWorks\2/g'
Explanation: The output of echo is passed as standard input to sed. The regex (^-.*sav )[^.]*\.[^ ]*(.*) creates two capturing groups: the first holds everything from the start of the line through "-sav ", and the second holds everything after the old value. The substitution writes back group 1, then the new value 12345.itWorks, then group 2.
NOTE: I am using the g flag here to perform the substitution globally. If you have only one match in your variable, you can remove it.
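To check the substitution end to end, here is a minimal sketch that runs it against the sample value from the question. The variable name result is only illustrative, and the replacement is written as \1 followed directly by the literal 12345, which works because sed back-references are a single digit.

```shell
# Sample value from the question.
test="-val https://www.randomurl.com -sav 1aFAd381cCCb86FD300e7a3A399a6014.test -speed 2 -save -delay 14"

# Group 1 keeps everything up to and including "-sav ", group 2 keeps the rest
# of the line after the old "value.extension" token.
result=$(printf '%s\n' "$test" | sed -E 's/(^-.*sav )[^.]*\.[^ ]*(.*)/\112345.itWorks\2/')

printf '%s\n' "$result"
```

The later -save option is left alone because the pattern requires a space right after "sav".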

Related

How to remove only the matching string part

I need to remove [PR: and ] from [PR:Parker] so that only "ParkerS" is printed.
Note: the "xxxxxxx" part of [PR:xxxxxxx] changes from time to time.
So far I have created the following sed command:
sed 's/[PR:]//g' | sed 's/[][]//g'
But it prints "arkerS", which is missing the "P" of the name too.

1st solution: With awk and your shown samples, please try the following code. It uses the gsub function to globally replace the leading [PR: and the trailing ] with an empty string, then prints the rest of the line.
awk '{gsub(/^\[PR:|\]$/,"")} 1' Input_file
2nd solution: Using different field separators in awk to grab the second-to-last field, as per the shown samples.
awk -F':|\\]' '{print $(NF-1)}' Input_file
3rd solution: Using awk's match function. The regex /:[^]]*/ matches from the first : up to (but not including) the ], and only the matched part after the colon is printed.
awk 'match($0,/:[^]]*/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
4th solution: Using bash parameter expansion. If you have this value in a shell variable, this is the best solution to go for, since no external program is started.
##If your shown sample is in a shell variable, use parameter expansion.
var="[PR:Parker]"
##Create the interim variable var1 by removing everything from the start up to the : here.
var1="${var##*:}"
echo "$var1"
Parker]
##Then remove the trailing ] from var1 to get the needed value.
echo "${var1%*]}"
Parker
5th solution: Using a perl one-liner that performs a global substitution to remove the leading [PR: and the trailing ].
perl -pe 's/^\[PR:|\]$//g' Input_file

You can use
sed 's/\[PR:\([^][]*\)]/\1/' <<< "[PR:Parker]"
Here, the \[PR:\([^][]*\)] matches [PR:, then any zero or more chars other than [ and ] are captured into Group 1 and a ] is matched, and the match is replaced with the Group 1 value (with \1 placeholder).
Or,
sed -E 's/\[PR:|]//g' <<< "[PR:Parker]"
Here, \[PR:|] matches either [PR: or ] and the s command removes them.
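The attempt in the question dropped the P because an unescaped [PR:] is a bracket expression that matches any single one of the characters P, R and :, not the literal text [PR:. A small sketch, using the sample value from the question and illustrative variable names, that puts the failing pattern next to the capture-group fix:

```shell
input="[PR:Parker]"

# Unescaped [PR:] is a character class: every P, R and : is deleted one by one.
broken=$(printf '%s\n' "$input" | sed 's/[PR:]//g' | sed 's/[][]//g')

# Escaping the opening bracket makes \[PR: literal; group 1 keeps the name.
fixed=$(printf '%s\n' "$input" | sed 's/\[PR:\([^][]*\)]/\1/')

printf 'broken: %s\nfixed:  %s\n' "$broken" "$fixed"
```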

Unix regex get only the first match

I have the following text:
NodeMetaData MapNodeId="105141" PageFormat="OsXml" UniqueIdentifier="fd0f9ade-88e1-4b04-b338-0a8884f66423" RelativePath="Test_03/AddressMap_MyAddressMap.os.xml" LastPulledRevision="-9223372036854775808" LastPulledMd5="" LastSyncedMd5="7D0C294B9A7C09F17FD5AC0414179DD414649455297B8F73125D7FB5E39D647D" HasMergeConflicts="false"
NodeMetaData MapNodeId="105142" PageFormat="OsXml" UniqueIdentifier="85f55c40-f95c-47f2-9c97-d35881e8f762" RelativePath="Test_03/Struct_MyStruct.os.xml" LastPulledRevision="-9223372036854775808" LastPulledMd5="" LastSyncedMd5="32364BCCBCD8AA9C47D8E09A3EB06667DD9476EB155F9411FA359EFA5C1A4F4F" HasMergeConflicts="false"
There are two MapNodeId values (105141 and 105142), and I need to get only the first one and insert it into a file.
I used the following:
set WorkingCopyRI=`( sed -n 's/.*MapNodeId=\"// ; s/\" .*//p' Result.log)`
but the variable contains the IDs of both MapNodeId attributes. What do I need to add in order to get only the first one?

You can append ;T;q to your script to make it quit after the second s instruction prints for the first time.
Here's a cleaner and more robust way to do the whole thing:
sed -n '/MapNodeId=/ { s/^.*\sMapNodeId="\([^"]*\)"\s.*$/\1/p; q }' Result.log
I'm assuming your IDs won't contain double quotes. If they can, you will have to modify the expression in group #1.
(Also, your formatting gives no clue as to whether your text occurs in multiple lines or not, but I'm assuming that the MapNodeId="..." parts appear on separate lines, otherwise you wouldn't have this problem.)
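As a rough illustration of the "print the first match, then quit" idea, the sketch below writes a shortened, made-up two-line log to a temporary file and extracts only the first ID. The file contents and the variable names log and first_id are assumptions for the example, and the pattern avoids \s so that it does not depend on GNU sed.

```shell
# Hypothetical two-line sample, shortened from the log in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
NodeMetaData MapNodeId="105141" PageFormat="OsXml" HasMergeConflicts="false"
NodeMetaData MapNodeId="105142" PageFormat="OsXml" HasMergeConflicts="false"
EOF

# On the first line containing MapNodeId=", print the quoted value and quit,
# so later lines are never read.
first_id=$(sed -n '/MapNodeId="/{
s/.*MapNodeId="\([^"]*\)".*/\1/p
q
}' "$log")

rm -f "$log"
printf '%s\n' "$first_id"
```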

perl approach:
perl -ne 'if (/MapNodeId="([^"]+)"/) { print "$1\n"; exit }' Result.log
The output:
105141
print "$1\n" prints the value of the first captured group, and exit stops after the first matching line so the second ID is never printed.
Or if you have grep PCRE support:
grep -Po '.*MapNodeId="\K([^"]+)' Result.log | head -n 1

Using sed, can I make changes only to the regex match portion of a line?

Can I limit string substitution to just part of a line that matches a regex?
For example:
A this matches Z this does not
And, I want to replace this with that but only within the substring matched by:
A[^Z]*Z
That is, the only portion of the line that may be operated on is the leading "A this matches Z" in:
A this matches Z this does not
So, that I am looking for the result:
A that matches Z this does not
However, what I've attempted operates on the entire line:
% sed '/A[^Z]*Z/ {
s/this/that/g
}' <<<"A this matches Z this does not"
A that matches Z that does not
%
The above example is for illustration purposes.
Recap: Is there any general solution using sed to make changes only to a regex match portion of a line? If the answer is "no," then is there a solution that uses only software that is installed in a CentOS 7 minimal configuration (such as awk)? Also, I don't want to rely on third-party packages.
My environment:
CentOS 7.3 [kernel-3.10.0-514.6.1.el7.x86_64]
sed (GNU sed) 4.2.2 [sed-4.2.2-5.el7.x86_64]
Bash 4.2.46(1) [bash-4.2.46-21.el7_3.x86_64]

If perl is available:
$ echo 'A this matches Z this does not' | perl -pe 's/A[^Z]*Z/$&=~s|this|that|gr/ge'
A that matches Z this does not
g modifier to replace all occurrences of matched text
e evaluation modifier allows to use Perl code in replacement section of substitute
$&=~s|this|that|gr expression to perform substitution only in matched text, r modifier gives back result without changing value of $&
Further reading:
http://perldoc.perl.org/perlrequick.html
http://perldoc.perl.org/perlrun.html#Command-Switches

You can use a regular expression with capturing groups to match the part of the line you want, and replace only part of it.
sed 's/\(A[^Z]*\)this\([^Z]*Z\)/\1that\2/'
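Because the pattern names a single this, one pass of this command changes only one occurrence inside the A...Z span. A possible extension, sketched here with a made-up input that has two occurrences inside the span, is to repeat the substitution with a label and the t command until nothing is left to replace. It assumes the replacement text does not itself contain the search text, otherwise the loop would never end.

```shell
# Hypothetical input with two "this" inside the A...Z span and one outside.
line="A this and this Z this does not"

# :a defines a label; "ta" jumps back to it whenever the s command replaced
# something, so the substitution repeats until the span holds no more "this".
result=$(printf '%s\n' "$line" |
  sed -e ':a' -e 's/\(A[^Z]*\)this\([^Z]*Z\)/\1that\2/' -e 'ta')

printf '%s\n' "$result"
```

The "this" after Z is never touched, since [^Z]* cannot cross the Z that ends the span.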

Use the following approach:
echo "A this matches Z this does not" | sed -r 's/(A[^Z]*)\bthis\b([^Z]*Z)/\1that\2/g'
The output:
A that matches Z this does not

If all you can use is sed, it may be done in bash like this:
#!/bin/bash
str="This does not A this matches Z this also does not"
regex='^\(.*\)\(A[^Z]*Z\)\(.*\)$'
a=$(sed -e 's/'"$regex"'/\1/' <<<"$str")
b=$(sed -e 's/'"$regex"'/\2/' -e 's/this/that/g' <<<"$str")
c=$(sed -e 's/'"$regex"'/\3/' <<<"$str")
echo "$a$b$c"
Or, you may use awk (faster; this relies on GNU awk for the four-argument split and for gensub):
#!/bin/bash
str="This does not A this matches Z this also does not"
awk -vreg='A[^Z]*Z' '{
split($0,a,reg,s);
printf("%s%s%s\n",a[1],gensub(/this/,"that","g", s[1]),a[2])
}' <<<"$str"

How can I use sed to regex string and number in bash script

I want to separate string and number in a file to get a specific number in bash script, such as:
Branches executed:75.38% of 1190
I want to get only the number
75.38
I have tried the code below
$new_value=value | sed -r 's/.*_([0-9]*)\..*/\1/g'
but it is incorrect and it fails.
How should it work? Thank you in advance for your help.

You can use the following regex to extract the first number in a line:
^[^0-9]*\([0-9.]*\).*$
Usage:
% echo 'Branches executed:75.38% of 1190' | sed 's/^[^0-9]*\([0-9.]*\).*$/\1/'
75.38

Give this a try:
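value=$(sed -n "s/^Branches executed:\([0-9][.0-9]*[0-9]*\)%.*$/\1/p" afile)
With -n and the p flag, only the line where the substitution succeeded is printed, so other lines of afile do not end up in the result.
It is assumed that the line appears only once in afile.
The result is stored in the value variable.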

There are several things here that we could improve. One is that in sed's basic regex mode (without -r) you need to escape the parentheses: \(...\)
Another is that it would help to have a full specification of the input strings, as well as a small script to experiment with.
Anyway, this is my first attempt:
Update: I added a little more bash around this regex so it is easier to play with:
value='Branches executed:75.38% of 1190'
new_value=`echo $value | sed -e 's/[^0-9]*\([0-9]*\.[0-9]*\).*/\1/g'`
echo $new_value
Update 2: as john pointed out, it will match only numbers that contain a decimal dot. We can fix it with an optional group: \(\.[0-9]\+\)\?.
An explanation for the optional group:
\(...\) is a group.
\(...\)\? is a group that appears zero or one times (mind the escaped question mark; \? and \+ are GNU sed extensions to basic regex).
\.[0-9]\+ is the pattern for a dot and one or more digits.
Putting it all together:
value='Branches executed:75.38% of 1190'
new_value=`echo $value | sed -e 's/[^0-9]*\([0-9]\+\(\.[0-9]\+\)\?\).*/\1/g'`
echo $new_value
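The same optional-group idea can also be written with extended regular expressions, which drops most of the backslashes. The following is a sketch only: the helper name extract_number and the second input line are made up for illustration, and -E is assumed to be supported by the installed sed.

```shell
# Extract the first number (integer or decimal) from a line.
# -E switches sed to extended regex, so groups and ? need no backslashes.
extract_number() {
  printf '%s\n' "$1" | sed -E 's/[^0-9]*([0-9]+(\.[0-9]+)?).*/\1/'
}

decimal=$(extract_number 'Branches executed:75.38% of 1190')
integer=$(extract_number 'Lines executed:80% of 10')

printf '%s %s\n' "$decimal" "$integer"
```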

In GNU Grep or another standard bash command, is it possible to get a resultset from regex?

Consider the following:
var="text more text and yet more text"
echo $var | egrep "yet more (text)"
It should be possible to get the result of the regex as the string: text
However, I don't see any way to do this in bash with grep or its siblings at the moment.
In perl, php or similar regex engines:
$output = preg_match('/yet more (text)/', 'text more text yet more text');
$output[1] == "text";
Edit: To elaborate why I can't just multiple-regex, in the end I will have a regex with multiple of these (Pictured below) so I need to be able to get all of them. This also eliminates the option of using lookahead/lookbehind (As they are all variable length)
egrep -i "([0-9]+) +$USER +([0-9]+).+?(/tmp/Flash[0-9a-z]+) "
Example input as requested, straight from lsof (Replace $USER with "j" for this input data):
npviewer. 17875 j 11u REG 8,8 59737848 524264 /tmp/FlashXXu8pvMg (deleted)
npviewer. 17875 j 17u REG 8,8 16037387 524273 /tmp/FlashXXIBH29F (deleted)
The end goal is to cp /proc/$var1/fd/$var2 ~/$var3 for every line, which ends up "Downloading" flash files (Flash used to store in /tmp but they drm'd it up)
So far I've got:
#!/bin/bash
regex="([0-9]+) +j +([0-9]+).+?/tmp/(Flash[0-9a-zA-Z]+)"
echo "npviewer. 17875 j 11u REG 8,8 59737848 524264 /tmp/FlashXXYOvS8S (deleted)" |
sed -r -n -e " s%^.*?$regex.*?\$%\1 \2 \3%p " |
while read -a array
do
echo /proc/${array[0]}/fd/${array[1]} ~/${array[2]}
done
It cuts off the first digits of the first value to return, and I'm not familiar enough with sed to see what's wrong.
End result for downloading flash 10.2+ videos (Including, perhaps, encrypted ones):
#!/bin/bash
lsof | grep "/tmp/Flash" | sed -r -n -e " s%^.+? ([0-9]+) +$USER +([0-9]+).+?/tmp/(Flash[0-9a-zA-Z]+).*?\$%\1 \2 \3%p " |
while read -a array
do
cp /proc/${array[0]}/fd/${array[1]} ~/${array[2]}
done

Edit: look at my other answer for a simpler bash-only solution.
So, here is the solution using sed to fetch the right groups and split them up. You still have to use bash afterwards to read them. (This only works if the groups themselves do not contain any spaces; otherwise we would have to use another divider character and make read split on it by setting $IFS to that value.)
#!/bin/bash
USER=j
regex=" ([0-9]+) +$USER +([0-9]+).+(/tmp/Flash[0-9a-zA-Z]+) "
sed -r -n -e " s%^.*$regex.*\$%\1 \2 \3%p " |
while read -a array
do
cp /proc/${array[0]}/fd/${array[1]} ~/${array[2]}
done
Note that I had to adapt your last regex group to allow uppercase letters, and added a space at the beginning to be sure to capture the whole block of numbers. Alternatively, a \b (word boundary) would have worked here, too.
Ah, I forgot to mention that you should pipe the text to this script, like this:
./grep-result.sh < grep-result-test.txt
(provided your files are named like this). Alternatively, you can add < grep-result-test.txt after the sed call (before the |), or prepend the line with cat grep-result-test.txt |.
How does it work?
sed -r -n calls sed in extended-regexp mode, without printing anything automatically.
-e " s%^.*$regex.*\$%\1 \2 \3%p " gives the sed program, which consists of a single s command.
I'm using % instead of the usual / as the delimiter, since / appears inside the regex and I don't want to escape it.
The regex to search for is prefixed by ^.* and suffixed by .*$ to grab the whole line (and avoid printing the rest of the line).
Note that this .* is greedy, so we have to insert a space into our regexp to keep it from grabbing the start of the first digit group too.
The replacement text consists of the three parenthesized groups, separated by spaces.
The p flag at the end of the command says to print the pattern space after the replacement. Since we grabbed the whole line, the pattern space consists of only the replacement text.
So, the output of sed for your example input is this:
17875 11 /tmp/FlashXXu8pvMg
17875 17 /tmp/FlashXXIBH29F
This is much more friendly for reuse, obviously.
Now we pipe this output as input to the while loop.
read -a array reads a line from standard input (which is the output from sed, due to our pipe), splits it into words (at spaces, tabs and newlines), and puts the words into an array variable.
We could also have written read var1 var2 var3 instead (preferably using better variable names); then the first two words would be put into $var1 and $var2, with $var3 getting the rest.
If read succeeded in reading a line (i.e. not end-of-file), the body of the loop is executed:
${array[0]} is expanded to the first element of the array, and similarly for the other elements.
When the input ends, the loop ends, too.

This isn't possible using grep or another tool called from a shell prompt or script, because a child process can't modify the environment of its parent process. If you're using bash 3.0 or later, you can use in-process regular expressions. The syntax is perl-ish (=~), and the match groups are available via ${BASH_REMATCH[x]}, where x is the number of the match group.
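For the simple example from the question, a minimal bash sketch of this could look as follows. The variable names regex, whole and captured are illustrative, and the snippet needs bash rather than a plain POSIX sh because of [[ ... ]] and BASH_REMATCH.

```shell
var="text more text and yet more text"

# Keeping the pattern in a variable avoids quoting problems on the right of =~.
regex='yet more (text)'

if [[ $var =~ $regex ]]; then
  whole=${BASH_REMATCH[0]}     # entire match
  captured=${BASH_REMATCH[1]}  # first parenthesized group
  printf '%s | %s\n' "$whole" "$captured"
fi
```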

After creating my sed-solution, I also wanted to try the pure-bash approach suggested by Mark. It works quite fine, for me.
#!/bin/bash
USER=j
regex=" ([0-9]+) +$USER +([0-9]+).+(/tmp/Flash[0-9a-zA-Z]+) "
while read
do
if [[ $REPLY =~ $regex ]]
then
echo cp /proc/${BASH_REMATCH[1]}/fd/${BASH_REMATCH[2]} ~/${BASH_REMATCH[3]}
fi
done
(If you upvote this, you should think about also upvoting Mark's answer, since it is essentially his idea.)
The same as before: pipe the text to be filtered to this script.
How does it work?
As said by Mark, the [[ ... ]] conditional construct supports the binary operator =~, which interprets its right operand (after parameter expansion) as an extended regular expression (just as we want) and matches the left operand against it. (We have again added a space at the front to avoid matching only the last digit.)
When the regex matches, the [[ ... ]] returns 0 (= true), and also puts the parts matched by the individual groups (and the whole expression) into the array variable BASH_REMATCH.
Thus, when the regex matches, we enter the then block, and execute the commands there.
Here again ${BASH_REMATCH[1]} is an array-access to an element of the array, which corresponds to the first matched group. ([0] would be the whole string.)
Another note: Both my scripts accept multi-line input and work on every line which matches. Non-matching lines are simply ignored. If you are inputting only one line, you don't need the loop, a simple if read ; then ... or even read && [[ $REPLY =~ $regex ]] && ... would be enough.

echo "$var" | pcregrep -o "(?<=yet more )text"

Well, for your simple example, you can do this:
var="text more text and yet more text"
echo $var | grep -e "yet more text" | grep -o "text"
