Using sed to replace string matching regex with wildcards - regex

I have a string I'm trying to manipulate with sed:
js/plex.js?hash=f1c2b98&version=2.4.23"
Desired output is
js/plex.js"
This is what I'm currently trying
sed -i s'/js\/plex.js[\?.\+\"]/js\/plex.js"/'
But it is only matching the first ? and returns this output
js/plex.js"hash=f1c2b98&version=2.4.23"
I can't see why this isn't working after a few hours

This works:
echo 'js/plex.js?hash=f1c2b98&version=2.4.23"' | sed 's:\.js?[^"]*:.js:'
As for the original regex:
Firstly, I would suggest using a different delimiter (such as :) in sed when the regex itself contains /. Secondly, [] is a bracket expression: it matches exactly one character out of the set inside the brackets, and . and + have no special meaning there. So your pattern only consumes the single ? after plex.js and never extends to the end of the line. You could put the repetition after the closing ] instead, for example [^"]*.
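To see the difference concretely, here is a small sketch using the string from the question. The variable names are my own, and the bracket expression is written without the backslashes, which does not change how it behaves on this input.

```shell
# Sample input taken from the question.
input='js/plex.js?hash=f1c2b98&version=2.4.23"'

# A bracket expression matches exactly ONE character from its set,
# so only the "?" is consumed and the rest of the query string survives.
broken=$(printf '%s\n' "$input" | sed 's:js/plex\.js[?.+"]:js/plex.js":')

# A literal "?" followed by every non-quote character consumes the whole
# query string and leaves the closing quote in place.
fixed=$(printf '%s\n' "$input" | sed 's:js/plex\.js?[^"]*:js/plex.js:')

printf '%s\n%s\n' "$broken" "$fixed"
```

The first line printed reproduces the unwanted output from the question; the second is the desired result.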

Perhaps:
sed 's#\(js/plex\.js\)?[^"]*#\1#'
Here # is used as the delimiter, so the / in the path needs no escaping.
\(js/plex\.js\)?[^"]* # find this pattern and replace the whole match with the marked part \1
The marked pattern
In sed you can mark part of a pattern, or the whole pattern, by enclosing it in \( \).
When part of a pattern is enclosed in parentheses escaped by backslashes, the text it matches is stored.
In my example, this is the pattern without marking:
js/plex\.js?[^"]*
I only want sed to remember js/plex.js and replace everything that matched (the ? and the query string up to, but not including, the closing ") with just that piece. In sed the first marked pattern is known as \1, the second as \2, and so forth.
\(js/plex\.js\) ---> is marked as \1
Hence I replace the whole match with \1, which leaves js/plex.js" in the output.
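As a rough illustration of \1 and \2 together, the sketch below marks two parts of a query string and emits them in swapped order. The input string and variable names are made up for the example.

```shell
# Hypothetical input reusing the query string from the question.
query='hash=f1c2b98&version=2.4.23'

# \1 holds the text matched by the first \( \), \2 the text matched by the second.
swapped=$(printf '%s\n' "$query" | sed 's/hash=\([^&]*\)&version=\(.*\)/\2 \1/')

printf '%s\n' "$swapped"
```

This should print the version followed by the hash, since the replacement refers to the groups in reverse order.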

Related

How can I use SED to replace a specific character in a substring

I have a CSV file with multiple lines like:
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4"","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
"ABC-DEF-d1494751","98765432","98765432","1073552394","284","ABC-DEF-77997","","ACE WRAP 3"","","","2015-10-29T18:45:00-07:00","Sent","XXX","XXX","2018-04-05T19:38:41-05:00","XXX-XXX-76954940"
I would like to replace the "", with ", only in column 8 (that is, where it comes after GAUZE PACKING STRIPS 1/4 or ACE WRAP 3), without touching any other "", in the line.
I have tried sed 's/[[:alnum:]]""//g' file.csv, but it removes the <num>"" as well.
Any ideas? Much appreciated!
You can use capture groups to match and replace anything that is between double quotes and followed immediately by double quotes.
The regex to match would look something like this: ("[^",]*")". Note two things. First, the " characters are matched literally. Second, the expression in the middle, [^",]*, matches any run of characters other than " or ,, which prevents the matched string from containing a quote or spilling into the next field.
Lastly, the parentheses form a capture group, and we can reference whatever matched the sub-regex between the () with a backslash and a number. For example, \1 is replaced by the match of the first capture group, \3 by the third, and so on.
The sed script for what you need may look something like this:
sed -re 's/("[^",]*")"/\1/g'
See how the last double quote is outside the capture group, and it will not be replaced with the \1.
Unescaped parentheses act as capture groups only in Extended Regular Expressions (ERE), so the -r flag (or -E) is needed here; without it sed uses Basic Regular Expressions (BRE), where the same group is written \( \).
Notice also the /g at the end. This is needed for sed to be able to match and replace more than one occurrence in the same line.
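The same substitution can be written in either syntax. The sketch below runs both forms on a shortened, made-up CSV line; the variable names are assumptions for illustration.

```shell
# Shortened, hypothetical CSV line with one stray quote after ACE WRAP 3.
line='"a","ACE WRAP 3"","","b"'

# ERE: plain ( ) forms the group; -E is the portable spelling of GNU's -r.
ere=$(printf '%s\n' "$line" | sed -E 's/("[^",]*")"/\1/g')

# BRE (sed's default): the same group is written \( \).
bre=$(printf '%s\n' "$line" | sed 's/\("[^",]*"\)"/\1/g')

printf '%s\n%s\n' "$ere" "$bre"
```

Both commands should print the same line, with the stray quote removed and the empty "" field left untouched.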
Example:
$ cat test
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4"","","","2019-02-04T19:09:00-05:00",""","XXX","XXX","2019-02-12T23:57:48-06:00"","XXX-XXX-176568981"
$ cat test | sed -re 's/("[^",]*")"/\1/g'
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
Using awk:
$ awk '
BEGIN { FS=OFS="," }           # set delimiters
{
    if($8!="\"\"")             # if $8 is not empty, i.e. ""
        sub(/\"\"$/,"\"",$8)   # replace trailing double quotes with a single double quote
}1' file                       # output
Output:
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
"ABC-DEF-d1494751","98765432","98765432","1073552394","284","ABC-DEF-77997","","ACE WRAP 3","","","2015-10-29T18:45:00-07:00","Sent","XXX","XXX","2018-04-05T19:38:41-05:00","XXX-XXX-76954940"

capturing each word containing pattern regex

I'm trying to write a sed script that finds every word that contains a certain pattern and then prepends a prefix to each of those words. For example:
foobarbaz barfoobaz barbazfoo barbaz
might turn into:
quxfoobarbaz quxbarfoobaz quxbarbazfoo barbaz
I understand the basics of capture groups and backreferences, but I'm still having trouble. Specifically, I can't get it to capture each whole word separately.
s/\(.*\)men\(.*\)/ not just the \1men\2, but the \1women\2 and \1children\2 too /
I tried using \s, for whitespace as many sites recommend, but sed treats \s as the separate characters \ and s
You could use \S, which matches any non-whitespace character (a GNU sed extension), as follows:
sed 's/\S*foo\S*/qux&/g' <<< "foobarbaz barfoobaz barbazfoo barbaz"
This will match words containing foo. The replacement string qux& prepends qux to every match, because & stands for the matched text. Output:
quxfoobarbaz quxbarfoobaz quxbarbazfoo barbaz
This works as long as the words are separated by spaces:
echo "foobarbaz barfoobaz barbazfoo barbaz" | sed 's/\([^ ]*foo[^ ]*\)/qux\1/g'
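If \S is not available, a POSIX character class can stand in for it. This is a minimal sketch assuming the same sample words; the variable names are my own.

```shell
words='foobarbaz barfoobaz barbazfoo barbaz'

# [^[:space:]] is the POSIX spelling of "any non-whitespace character";
# & in the replacement stands for the whole match.
result=$(printf '%s\n' "$words" | sed 's/[^[:space:]]*foo[^[:space:]]*/qux&/g')

printf '%s\n' "$result"
```

Unlike [^ ], this form also treats tabs as word separators.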

Why doesn't sed interpret this regex properly?

echo "This is a test string" | sed 's/This/\0/'
First I match substring This using the regex This. Then I replace the entire string with the first match using \0. So the result should be just the matched string.
But it prints out the entire line. Why is this so?
You don't replace the whole string with \0, just the pattern match, which is This. In other words, you replace This with This.
To replace the whole line with This, you can do:
echo "This is a test string" | sed '/This/s/.*/This/'
It looks for a line matching This, and replaces the whole line with This. In this case (since there is only one line) you can also do:
echo "This is a test string" | sed 's/.*/This/'
If you want to reuse the match, then you can do
echo "This is a test string" | sed 's/.*\(This\).*/\1/'
\( and \) are used to remember the match inside them. It can be referenced as \1 (if you have more than one pair of \( and \), then you can also use \2, \3, ...).
In the example above this is not very helpful, since we know that inside \( and \) is the word This, but if we have a regex inside the parentheses that can match different words, this can be very helpful.
sed 's/.*\(PatThis\).*/PatThat/'
or
sed '/PatThis/ s/.*/PatThat/'
In your question, "PatThis" and "PatThat" have the same content ("This"). In the comment ("I need to select a number using \d\d\d\d and then use it as replacement") you have two different values for PatThis and PatThat.
The \1 is not really needed when you know the exact content; it becomes useful when PatThis is a regex with special characters (like \ & ? .) whose match can vary.
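For the four-digit case mentioned in the comment, a sketch along these lines may help. Note that sed does not understand \d, so [0-9] is used instead; the input text and variable names are invented for the example.

```shell
# Hypothetical input line containing exactly one four-digit number.
text='Order 1234 has shipped'

# The group captures four consecutive digits; \1 reuses whatever it matched,
# and the surrounding .* make the substitution replace the whole line.
number=$(printf '%s\n' "$text" | sed 's/.*\([0-9][0-9][0-9][0-9]\).*/\1/')

printf '%s\n' "$number"
```

Because the leading .* is greedy, a line with a longer digit run or several numbers would yield the last four digits that can match, so the pattern may need tightening for real data.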

How can I find and replace between ( ) characters using regex?

I want to change a string just like below. But I couldn't find out the exact regex pattern.
Strings like:
Stack Overflow (1234)
Stack exchange (12)
What I want to end up with is:
Stack Overflow
Stack Exchange
I'm using Notepad++, UltraEdit, etc. It would also be very useful to have a sed command.
Thanks everybody
Try using this find:
\s+\([^)]+\)
And replace by nothing.
\s+ matches one or more whitespace characters.
\( matches an opening parenthesis.
[^)]+ matches one or more characters other than a closing parenthesis.
\) matches a closing parenthesis.
[(*)] will match any one of (, * or ) because they are in a character class.
You can otherwise use \s+\(.*?\) as well, but it's not as safe as the regex above. In regex, the dot is the wildcard and parentheses are used for capture; that's why I had to escape the parentheses with backslashes. You don't need to escape them in a character class; for instance, you can use \s+[(].*?[)], though it's a bit longer!
Don't know about Notepad++ but using sed it is a simple command:
sed -i.bak 's/ *(.*$//' file
-i.bak edits the file in place and keeps the original as file.bak.
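The command above deletes everything from the first " (" to the end of the line. If only a trailing parenthesised group should go, a slightly tighter pattern can be sketched like this, using the sample lines from the question (variable names are assumptions):

```shell
# Sample lines from the question.
input='Stack Overflow (1234)
Stack exchange (12)'

# In BRE a bare ( or ) is a literal parenthesis, so no escaping is needed.
# [^)]* stops at the first ), and $ anchors the match to the end of the line.
cleaned=$(printf '%s\n' "$input" | sed 's/ *([^)]*)$//')

printf '%s\n' "$cleaned"
```

With -E the parentheses would have to be escaped instead, since in ERE a bare ( opens a group.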
Replace (Ctrl+H):
^(.+?)\s*\(\d+\)$
By:
$1
This is how it works:
Everything in Group 1 is kept, the rest is dropped.
Using awk you can do this:
awk '{sub(/ \(.+\)/,x)}1' file

Regular expression to match beginning and end of a line?

Could anyone tell me a regex that matches the beginning or end of a line? e.g. if I used sed 's/[regex]/"/g' filehere the output would be each line in quotes? I tried [\^$] and [\^\n] but neither of them seemed to work. I'm probably missing something obvious, I'm new to these
Try:
sed -e 's/^/"/' -e 's/$/"/' file
To add quotes to the start and end of every line is simply:
sed 's/.*/"&"/g'
The RE you were trying to come up with to match the start or end of each line, though, is:
sed -r 's/^|$/"/g'
It's an ERE (enabled by -r), so it will work with GNU sed but not with older seds.
matthias's response is perfectly adequate, but you could also use a backreference to do this. if you're learning regular expressions, they are a handy thing to know.
here's how that would be done using a backreference:
sed 's/\(^.*$\)/"\1"/g' file
at the heart of that regex is ^.*$, which means match anything (.*) surrounded by the start of the line (^) and the end of the line ($), which effectively means that it will match the whole line every time.
putting that term inside parentheses creates a backreference that we can refer to later on (in the replace pattern). but for sed to realize that you mean to create a backreference instead of matching literal parentheses, you have to escape them with backslashes. thus, we end up with \(^.*$\) as our search pattern.
the replace pattern is simply a double quote followed by \1, which is our backreference (refers back to the first pattern match enclosed in parentheses, hence the 1). then add your last double quote to end up with "\1".
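To compare the approaches side by side, and to show why the bracket attempts from the question had no effect, here is a short sketch on made-up input. The variable names are my own.

```shell
# Hypothetical two-line input.
input='first line
second'

# Two substitutions: one at the start anchor, one at the end anchor.
two_step=$(printf '%s\n' "$input" | sed -e 's/^/"/' -e 's/$/"/')

# One substitution: & is the whole match, here the entire line.
one_step=$(printf '%s\n' "$input" | sed 's/.*/"&"/')

# Inside [ ] the characters ^ and $ lose their anchor meaning, so this only
# replaces literal ^ or $ characters and leaves these lines unchanged.
literal=$(printf '%s\n' "$input" | sed 's/[$^]/"/g')

printf '%s\n' "$two_step" "$one_step" "$literal"
```

The first two results should be identical, each line wrapped in double quotes, while the third is the untouched input.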