Regex for this dashed pattern - regex

Would anyone have a suggestion for a regex that takes a line that ends in:
,04-721-0G-00033-AU
and transform that string into:
,04,721,0G,00033,AU
(that is, it replaces all dashes after the last comma in the string with commas)
Keep in mind that earlier parts of the line may also contain dashes and commas. What I know for sure is that the part I want changed starts at the last comma in the line, runs to the end of the line, and has the structure ,XX-XXX-XX-XXXXX-XX
Any suggestions?
Thanks.

Match: ,(?=[^,]*$)(\w{2})-(\w{3})-(\w{2})-(\w{5})-(\w{2})$
Replace by: ,$1,$2,$3,$4,$5
How it works:
,(?=[^,]*$) selects the last , of the line (literally: the , that is followed only by characters other than , until the end of the line).
after that, we try to match your XX-XXX-XX-XXXXX-XX with
(\w{2})-(\w{3})-(\w{2})-(\w{5})-(\w{2})
finally, we make sure that the end of the line has been reached by matching $
Then you just rewrite:
the ,
each captured group, now separated by a , instead of a -.
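As a rough illustration, the same match and replacement could be tried in Python. This is only a sketch: the sample line and the function name dashes_to_commas are made up for the example, and Python writes the backreferences as \1 to \5 instead of $1 to $5.

```python
import re

# Last comma on the line, then the XX-XXX-XX-XXXXX-XX code, then end of line.
PATTERN = re.compile(r",(?=[^,]*$)(\w{2})-(\w{3})-(\w{2})-(\w{5})-(\w{2})$")


def dashes_to_commas(line: str) -> str:
    # Rewrite the comma, then each captured group separated by commas.
    return PATTERN.sub(r",\1,\2,\3,\4,\5", line)


# Hypothetical line: the earlier dashes and commas must stay untouched.
sample = "a-b,c-d,04-721-0G-00033-AU"
print(dashes_to_commas(sample))  # a-b,c-d,04,721,0G,00033,AU
```

Lines that do not end in the expected structure are returned unchanged.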

Would this pattern do what you want?
-(?=[^,]{1,15}$)
Replace with ,
At each hyphen, a lookahead checks that 1 to 15 characters, none of them commas, remain before the end; if so, the hyphen is replaced with a comma.
As no language is specified: for a multiline replace you might want to add the m modifier, and in JS additionally the g modifier for a global replace.
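A possible Python version of this lookahead approach is sketched below. The two-line sample text and the function name are assumptions for the example. Two details differ from the pattern above: re.MULTILINE plays the role of the m modifier, and \n is added to the negated class so the lookahead cannot run on into the next line. Python's re.sub already replaces every match, so no g modifier is needed.

```python
import re

# A hyphen followed by 1-15 non-comma characters up to the end of a line.
TAIL_DASH = re.compile(r"-(?=[^,\n]{1,15}$)", re.MULTILINE)


def replace_tail_dashes(text: str) -> str:
    # re.sub is global by default: every qualifying hyphen becomes a comma.
    return TAIL_DASH.sub(",", text)


# Hypothetical two-line input.
text = "x-y,04-721-0G-00033-AU\np-q,12-345-6A-78901-BC"
print(replace_tail_dashes(text))
```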

Related

Using regex for repeating text in Notepad++

I have links like this:
https://d2ynliea65eb6o.cloudfront.net/6100052500-STXMLOPEN/sub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052499-STXMLOPEN/sub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052498-STXMLOPEN/sub_1.m3u8
How can I use a regex in Notepad++ to make them like this:
https://d2ynliea65eb6o.cloudfront.net/6100052500-STXMLOPEN/6100052500-STXMLOPENsub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052499-STXMLOPEN/6100052499-STXMLOPENsub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052498-STXMLOPEN/6100052498-STXMLOPENsub_1.m3u8
I want to repeat what is between net/ and /sub for each link.
I am assuming you want to repeat the characters before the last /.
You may try this regex:
Regex
([^/\n]+)/(?=[^/\n]+$)
Substitution
$1/$1
([^/\n]+) // any consecutive non-slash and non-linebreak characters, and capture them in group 1
/ // a slash
(?=[^/\n]+$) // lookahead, there must be non-slash and non-linebreak characters followed by the end of a line ahead
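To try the pattern outside Notepad++, a small Python sketch could look like this. The function name is made up for the example, and Python spells the substitution \1/\1 rather than $1/$1.

```python
import re

# Capture the last directory segment; the lookahead requires that only a
# file name (no further slash) follows before the end of the line.
LAST_SEGMENT = re.compile(r"([^/\n]+)/(?=[^/\n]+$)", re.MULTILINE)


def repeat_last_segment(text: str) -> str:
    return LAST_SEGMENT.sub(r"\1/\1", text)


url = "https://d2ynliea65eb6o.cloudfront.net/6100052500-STXMLOPEN/sub_1.m3u8"
print(repeat_last_segment(url))
```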
If you actually want to find what is between "net/" and "/sub" and repeat it, you can use:
(net/(.*?))/sub
replace with:
$1/$2sub
The second (), i.e. (.*?), creates group $2, which contains the variable text between net/ and /sub.
The first (), which does NOT include /sub, contains the text from net/ up to, but not including, "/sub" and puts it into $1. If you wanted to include "/sub", you would move the ")" to the right of "/sub".
Then $1/$2sub is the concatenation of $1, a "/", $2 and "sub"; the rest of the line is left untouched.
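To see what the two groups hold, the pattern can be inspected in Python. This is a sketch only: the variable and function names are invented for the example, and Python uses \1 and \2 where Notepad++ uses $1 and $2.

```python
import re

# Outer group: "net/" plus the variable text. Inner group: the variable text only.
BETWEEN = re.compile(r"(net/(.*?))/sub")

url = "https://d2ynliea65eb6o.cloudfront.net/6100052499-STXMLOPEN/sub_1.m3u8"
match = BETWEEN.search(url)
group_1, group_2 = match.group(1), match.group(2)
print(group_1)  # net/6100052499-STXMLOPEN
print(group_2)  # 6100052499-STXMLOPEN


def repeat_between(text: str) -> str:
    # Group 1, a slash, group 2, then the literal "sub".
    return BETWEEN.sub(r"\1/\2sub", text)


print(repeat_between(url))
```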

\1 not defined in the RE

In my script, I'm passing in a markdown file and, using sed, trying to find lines that do not start with one or more # and are not empty, and then surround those lines with <p></p> tags.
My reasoning:
^[^#]+ At beginning of line, find lines that do not begin with 1 or more #
.\+ Then find lines that contain one or more character (aka not empty lines)
Then replace the matched line with <p>\1</p>, where \1 represents the matched line.
However, I'm getting "\1 not defined in the RE". Is my reasoning above correct and how do I fix this error?
BODY=$(sed -E 's/^[^#]+.\+/<p>\1</p>/g' "$1")
Backslash followed by a number is replaced with the match for the Nth capture group in the regexp, but your regexp has no capture groups.
If you want to replace the entire match, use &:
BODY=$(sed -E 's%^[^#].*%<p>&</p>%' "$1")
You don't need to use .+ to find non-empty lines: the fact that the line starts with a character that isn't # means it's not empty. And you don't need + after [^#], since all you care about is that the first character isn't #. You also don't need the g modifier when the regexp matches the entire line; it is only needed to replace multiple matches per line.
And since your replacement string contains /, you need to either escape it or change the delimiter to some other character.
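For comparison, the same whole-match idea can be sketched in Python, where \g<0> stands in for sed's &. The sample document and the function name are assumptions for the example. One difference from the sed version: this sketch processes the whole text at once, so \n is excluded from the first character class to keep empty lines from matching.

```python
import re

# A line whose first character is neither '#' nor a newline, i.e. a
# non-empty line that is not a heading.
PLAIN_LINE = re.compile(r"^[^#\n].*", re.MULTILINE)


def wrap_paragraphs(markdown: str) -> str:
    # \g<0> inserts the entire match, like & in sed.
    return PLAIN_LINE.sub(r"<p>\g<0></p>", markdown)


# Hypothetical markdown input.
doc = "# Title\n\nSome text\n## Sub\nMore text"
print(wrap_paragraphs(doc))
```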

Regex replace one value between comma separated values

I have a bunch of CSV files.
I would like to replace exactly one value, the one between the third and fourth comma. I would love to do this with Notepad++'s 'Find in Files' and Replace functionality, which can use regex.
Each line in the files look like this:
03/11/2016,07:44:09,327575757,1,5434543,...
The value I would like to replace in each line is always the number 1, which should become another number.
It can't be a simple regex such as ,1, because that could occur somewhere else in the line, so it must be the one after the third and before the fourth comma...
Could anyone help me with the RegEx?
Thanks in advance!
Two more rows as example:
01/25/2016,15:22:55,276575950,1,103116561,10.111.0.111,ngd.itemversions,0.401,0.058,W10,0.052,143783065,,...
01/25/2016,15:23:07,276581704,1,126731239,10.111.0.111,ll.browse,7.133,1.589,W272,3.191,113273232,,...
You can use
^(?:[^,\n]*,){2}[^,\n]*\K,1,
Replace with the value you need, keeping the surrounding commas (for example, ,2,), since the match itself is ,1,.
The pattern explanation:
^ - start of a line
(?:[^,\n]*,){2} - 2 sequences of:
  [^,\n]* - zero or more characters other than , and \n (matched with the negated character class [^,\n]), followed by
  , - a literal comma
[^,\n]* - zero or more characters other than , and \n
\K - an operator that forces the regex engine to discard the whole text matched so far with the regex pattern
,1, - what we get in the match.
Note that \n inside the negated character classes will prevent overflowing to the next lines in the document.
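Python's built-in re module does not support \K, so a sketch of the same idea there has to swap in a different technique: capture the first three fields in a group and write them back. The function name and the new value "7" are made up for the example.

```python
import re

# Group 1 holds the first three fields with their trailing commas;
# the literal "1," after it is the fourth field that gets replaced.
FOURTH_FIELD = re.compile(r"^((?:[^,\n]*,){3})1,", re.MULTILINE)


def replace_fourth_field(text: str, new_value: str) -> str:
    # A function replacement avoids any escaping issues in new_value.
    return FOURTH_FIELD.sub(lambda m: m.group(1) + new_value + ",", text)


line = "03/11/2016,07:44:09,327575757,1,5434543,..."
print(replace_fourth_field(line, "7"))
```

A line whose fourth field is some other value, such as 12, is left unchanged because the pattern requires exactly 1 followed by a comma.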
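You can replace the value between the third and fourth comma using the following regex.
Regex: ^([^,]+,[^,]+,[^,]+),([^,]+)
Replacement to do: Replace with \1,value, where value is the new fourth field. I used XX for the demo. The leading ^ anchors the match to the start of the line; without it, Replace All would also rewrite every later group of four non-empty fields on the same line.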

remove all commas between quotes with a vim regex

I've got a CSV file with lines like:
57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200^M
I need them to look like
57,13,Bob-Bill-and-Susan,Student,Club,Funded,64,3200
I'm using vim regexes. I've broken it down into 4 steps:
Remove ^M and insert newlines:
:%s:<ctrl-V><ctrl-M>:\r:g
Replace all spaces with -:
:%s: :\-:g
Remove commas between quotes: Need help here.
Remove quotes:
:%s:\"\([^"]*\)\":\1:g
How do I remove commas between quotes, without removing all commas in the file?
Something like this?
:%s:\("\w\+\),\(\w\+"\):\1 \2:g
My preferred solution to this problem (removing commas inside quoted regions) is to use replacements with an expression instead of trying to get this done in one regex.
To do this you need to prepend your replacement with \= so that the replacement is treated as a vim expression. From there you can extract just the parts between quotes and manipulate the matched part separately. This requires two short regexes instead of one complicated one.
:%s/".\{-}"/\=substitute(submatch(0), ',', '' , 'g')/g
So ".\{-}" matches anything in quotes (non greedy) and substitute(submatch(0), ',', '' , 'g') takes what was matched and removes all of the commas and its return value is used as the actual replacement.
The relevant help page is :help sub-replace-special.
As for the other parts of your question: Step 1 is essentially trying to remove all carriage returns, since the file is actually in the DOS file format. You can remove them with the dos2unix program.
In Step 2 escaping the - in the replacement is unnecessary. So the command is just
:%s/ /-/g
In Step 4, you have an overly complicated regex if all you want to do is remove quotes. Since all you need to do is match quotes and remove them
:%s/"//g
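The same "match the quoted region, then clean it with a second, simpler replacement" idea can be sketched in Python, where re.sub accepts a function as the replacement. The function name clean_line is invented for the example, and the trailing \r stands for the ^M in the question.

```python
import re

# Anything between a pair of double quotes, non-greedy, like ".\{-}" in vim.
QUOTED = re.compile(r'".*?"')


def clean_line(line: str) -> str:
    line = line.rstrip("\r")  # step 1: drop the carriage return
    # Step 3: remove commas only inside each quoted region.
    line = QUOTED.sub(lambda m: m.group(0).replace(",", ""), line)
    line = line.replace(" ", "-")  # step 2: spaces become dashes
    return line.replace('"', "")  # step 4: drop the quotes


sample = '57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200\r'
print(clean_line(sample))  # 57,13,Bob-Bill-and-Susan,Student,Club,Funded,64,3200
```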
:%s:\("\w*\)\(,\)\(.*"\):\1\3:g
example: "Bob, Bill and Susan"
\("\w*\) matches the opening " and the word characters that follow it (group \1 for back reference)
\(,\) captures the comma (group \2 for back reference)
\(.*"\) matches every other character up to the last quote on the line (group \3 for back reference)
:\1\3: keeps only the groups without the comma and discards group \2
Note that this removes only a comma that directly follows the first word after the opening quote, and only one comma per pass.

Regex to change all past a certain pattern to Uppercase

I have an xml file that has a value like
JOBNAME="JBDSR14353_Some_other_Descriptor"
I am looking for an expression that will go through the file and change all of the characters in the quotes to uppercase letters. Is there a regex that will search for JOBNAME="Anything within the quotes" and change the quoted text to uppercase? Or a command that will find JOBNAME= and change everything on that line to uppercase? I know that I can just search for JOBNAME=, use the VU command in vim to uppercase the line, store that in a macro and run it, but I was wondering if there was a way to get this done with a regex.
Here's an alternative with :substitute, as you had originally intended. This works better than @Zach's solution with gU_ when there's other text in the line:
:%s/JOBNAME="[^"]\+"/\U&/g
"[^"]\+" matches the quoted text (it stops at the first closing quote because only non-quote characters are matched inside, which handles multiple quoted values in a line)
\U turns the remainder of the replacement uppercase
for simplicity, the entire match (&) is uppercased here, but one could have also used capture groups (\(...\)), or match limiting with \zs
You can use the :g command which executes a command on lines that match a pattern:
:g/JOBNAME/norm! gU_
This will execute the gU_, which capitalizes all letters on a line, on all the lines that match JOBNAME
If there are other things on the same line that you don't want to capitalize, here is a solution for only the words in quotes:
:g/JOBNAME/norm! f"gU;
f" goes to the next quote. gU capitalizes with a motion. The motion used is ; which searches for the next " (repeats the last f command).
To do this with substitution you can use the \U atom which makes everything after it uppercase.
:%s/JOBNAME="\zs.*\ze"/\U&
\zs and \ze mark the start and end of the match and & is the whole match. This means that only the part between quotes is replaced.
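Outside vim, a comparable substitution can be sketched in Python. Python's re module has no \U in replacement strings, so this sketch swaps in a function replacement that uppercases the captured value. The sample XML fragment and the function name are assumptions for the example.

```python
import re

# Group 1: the attribute name and opening quote; group 2: the value;
# group 3: the closing quote.
JOBNAME_VALUE = re.compile(r'(JOBNAME=")([^"]*)(")')


def upper_jobname(text: str) -> str:
    # Only the quoted value is uppercased; everything else is written back as-is.
    return JOBNAME_VALUE.sub(
        lambda m: m.group(1) + m.group(2).upper() + m.group(3), text
    )


# Hypothetical XML fragment with a second attribute that must stay lowercase.
sample = '<JOB JOBNAME="JBDSR14353_Some_other_Descriptor" owner="me"/>'
print(upper_jobname(sample))
```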