Regex match string part on specific lines - regex

I need to run advanced find and replace using regex. I have a CSV similar to the following:
"Item 1a,,,,
,,Item 1b,,,,
,,Item 1c"
"Item 2a,,,,
,,Item 2b,,,,"
I need to remove the trailing commas for lines that start with a " quote.
I can match the correct lines like so:
(".*?),,,,$
The problem is, that selects the entire row, rather than just the trailing commas.
Does anybody know how to match this correctly, so that only the commas are matched, on lines that start with a " quote?

You are already capturing the content before the commas, just put it all back using a back reference in the replace:
Search: ^(".*?),+$
Replace: $1
Note: You need to anchor your regex to the start of the line with ^ so that the quote is matched only there (otherwise it will match a quote anywhere in the line).
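For anyone scripting this instead of using an editor, here is a minimal Python sketch of the same search and replace. The variable names are made up for illustration. Note that Python writes the back reference as \1 rather than $1, and needs re.MULTILINE for ^ and $ to work per line.

```python
import re

# Sample data taken from the question
csv_text = '''"Item 1a,,,,
,,Item 1b,,,,
,,Item 1c"
"Item 2a,,,,
,,Item 2b,,,,"'''

# re.MULTILINE makes ^ and $ match at every line boundary,
# so only lines that start with a quote are touched.
cleaned = re.sub(r'^(".*?),+$', r'\1', csv_text, flags=re.MULTILINE)
print(cleaned)
```

Lines that do not start with a quote keep their trailing commas.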

Related

Using regex for repeating text in Notepad++

I have links like this:
https://d2ynliea65eb6o.cloudfront.net/6100052500-STXMLOPEN/sub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052499-STXMLOPEN/sub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052498-STXMLOPEN/sub_1.m3u8
How can I use a regex in Notepad++ to make them like this:
https://d2ynliea65eb6o.cloudfront.net/6100052500-STXMLOPEN/6100052500-STXMLOPENsub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052499-STXMLOPEN/6100052499-STXMLOPENsub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052498-STXMLOPEN/6100052498-STXMLOPENsub_1.m3u8
I want to repeat what is between net/ and /sub for each link.
I am assuming you want to repeat the characters before the last /.
You may try this regex:
Regex
([^/\n]+)/(?=[^/\n]+$)
Substitution
$1/$1
([^/\n]+) // any consecutive non-slash and non-linebreak characters, and capture them in group 1
/ // a slash
(?=[^/\n]+$) // lookahead, there must be non-slash and non-linebreak characters followed by the end of a line ahead
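A quick way to check this pattern outside Notepad++ is a small Python sketch like the one below (variable names are illustrative; Python uses \1 where Notepad++ uses $1):

```python
import re

urls = """https://d2ynliea65eb6o.cloudfront.net/6100052500-STXMLOPEN/sub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052499-STXMLOPEN/sub_1.m3u8"""

# The lookahead only succeeds for the last slash on each line,
# so group 1 is the path segment right before the file name.
pattern = re.compile(r'([^/\n]+)/(?=[^/\n]+$)', re.MULTILINE)
repeated = pattern.sub(r'\1/\1', urls)
print(repeated)
```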
If you actually want to search for what's between "net/" and "/sub" and repeat that, you can use:
(net/(.*?))/sub
replace with:
$1/$2sub
The second (), i.e. (.*?), creates group $2, which contains the variable text that occurs between net/ and /sub.
The first (), which does NOT contain the /sub, captures the text from net/ up to, but not including, the "/sub" text and puts it into $1. If you want to include the "/sub", put the ")" on the right side of "/sub".
Then $1/$2sub is the concatenation of $1, a "/", $2 and "sub". The remainder of the line is not part of the match, so it is left as it is.
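The group numbering can be checked with a short Python sketch (illustrative names only). Python's \g<1> and \g<2> play the role of $1 and $2 here:

```python
import re

url = "https://d2ynliea65eb6o.cloudfront.net/6100052500-STXMLOPEN/sub_1.m3u8"

match = re.search(r'(net/(.*?))/sub', url)
outer = match.group(1)   # text from "net/" up to, but not including, "/sub"
inner = match.group(2)   # only the variable part between "net/" and "/sub"

# \g<1> and \g<2> are unambiguous group references in a Python replacement
result = re.sub(r'(net/(.*?))/sub', r'\g<1>/\g<2>sub', url)
print(outer, inner, result, sep="\n")
```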

replace single-quote with double-quote, if and only if quote is after specific string

I'm working in notepad++, and using its find-replace dialog box.
NP++ documentation states: Notepad++ regular expressions use the Boost regular expression library v1.70, which is based on PCRE (Perl Compatible Regular Expression) syntax. ref: https://npp-user-manual.org/docs/searching
What I'm trying to do should be simple, but I'm a regex novice, and after 2-3 hrs of web searches and playing with online regex testers, I give up.
I want to replace all single quotes ' with double quote " , but if and only if the ' is to the RIGHT of one or more #, ie inside a python comment.
For example,
list1 = ['apple','banana','pear'] # All 'single quotes' to LEFT of # remained unchanged.
list2 = ['tomato','carrot'] # All 'single quotes' to RIGHT of one or more # are replaced
# # with "double quotes", like this.
The np++ file is over 800 lines, manual replacement would be tedious & error prone. Advice appreciated.
This regex should do what you want:
(^[^#]*#|(?<!^)\G)[^'\n]*\K'
It looks for a ' which is preceded by either
^[^#]*# : start of line and some number of non-# characters followed by a #; or
(?<!^)\G : the start of line or the end of the previous match (\G), with a negative lookbehind for start of line (?<!^), meaning that it only matches at the end of the previous match
and then some number of non ' or newline (to prevent the match wrapping around the end of the previous line) characters [^'\n]*.
We then use \K to reset the match, so that everything before that is discarded from the match, and the regex only matches the '.
That can then be replaced with ".
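If the file can be processed with a script instead, note that Python's built-in re module supports neither \G nor \K, so the regex above cannot be used there as is. A different technique gives the same result: match the whole comment and rewrite it in a callback. This is only a sketch, and like the regex above it assumes the first # on a line starts the comment.

```python
import re

source = """list1 = ['apple','banana','pear'] # All 'single quotes' to LEFT of # remained unchanged.
list2 = ['tomato','carrot'] # All 'single quotes' to RIGHT of one or more # are replaced"""

def requote_comment(match):
    # match.group(0) is everything from the first # to the end of the line
    return match.group(0).replace("'", '"')

# "." does not match a newline, so each match stays on its own line
converted = re.sub(r"#.*", requote_comment, source)
print(converted)
```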
Update
You can avoid matching apostrophes within words by only matching ones that are either preceded or followed by a non-word character:
(^[^#]*#|(?<!^)\G)[^'\n]*\K('(?=\W)|(?<=\W)')
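The same idea can be sketched in Python, again with a callback because \G and \K are not available in the re module. One deliberate difference from the regex above: the sketch uses (?!\w) and (?<!\w) instead of (?=\W) and (?<=\W), because the matched comment ends at the end of the line and a closing quote there has no following character to test.

```python
import re

# A quote counts only if it is not glued to word characters on both sides
QUOTE = re.compile(r"'(?!\w)|(?<!\w)'")

def requote_comment(match):
    # Rewrite quotes inside the matched comment, leaving apostrophes alone
    return QUOTE.sub('"', match.group(0))

line = "names = ['it','is'] # don't touch the apostrophe, only the 'quoted words'"
result = re.sub(r"#.*", requote_comment, line)
print(result)
```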
Update 2
You can also deal with the case where there are # characters in strings by qualifying the first part of the regex with the requirement for there to be matched pairs of quotes beforehand:
(?:^[^'#]*(?:'[^']*'[^#']*)*[^'#]*#|(?<!^)\G)[^'\n]*\K(?:'(?=\W)|(?<=\W)')

Regex - replace blank spaces in line (Notepad++)

I have a document containing various kinds of information. What I want is to build a Notepad++ regex replace that finds the following lines in the document and replaces the blank spaces between the "" with an underscore (_).
Example:
The line is:
&LOG Part: "NAME TEST.zip"
The result should be:
&LOG Part: "NAME_TEST.zip"
The perfect solution would be a regex that finds the &LOG Part: "NAME TEST.zip" lines and replaces each blank space inside the quotes with an underscore.
What I have tried for now is this expression to find the text between the " ":
\"[^"]*\"
It should do it, but I don't know which expression to use to replace the blank spaces with an underline.
Anyone could help with a solution?
Thanks!
The \"[^"]*\" will only match whole substrings from " up to another closest " without matching individual spaces you want to replace.
Since Notepad++ does not support infinite width lookbehind, the only possible solution is using the \G - based regex to set the boundaries and use multiple matching (this one will replace consecutive spaces with 1 _):
(?:"|(?!^)\G)\K([^ "]*) +(?=[^"]*")
Or (if each space should be replaced with an underscore):
(?:"|(?!^)\G)\K([^ "]*) (?=[^"]*")
And replace with $1_. If you need to restrict to replacing inside &LOG Part only, just add it to the beginning:
(?:&LOG Part:\s*"|(?!^)\G)\K([^ "]*) (?=[^"]*")
A human-readable explanation of the regex:
(?:"|(?!^)\G)\K - Find a ", or, with each subsequent successful match, the end of the previous successful match position, and omit all the text in the buffer (thanks to \K)
([^ "]*) - (Group 1, accessed with $1 from the replacement pattern) 0+ characters other than a space and "
" +" (a space followed by +) - one or more literal spaces (replace the space with \h to match all horizontal whitespace, or \s to match any whitespace)
(?=[^"]*") - check if there is a double quote ahead of the current position
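Outside Notepad++ the \G and \K tricks are often unnecessary. As a rough Python sketch (the re module has neither \G nor \K), the quoted part can be matched with the "[^"]*" pattern from the question and the spaces replaced in a callback. The sample lines other than the first one are made up for illustration.

```python
import re

log = '''&LOG Part: "NAME TEST.zip"
&LOG Other: "KEEP THIS.zip"
&LOG Part: "A B  C.zip"'''

def underscore_spaces(match):
    # group(1) is the "&LOG Part: " prefix, group(2) the quoted value
    return match.group(1) + match.group(2).replace(' ', '_')

# Only quoted values that directly follow "&LOG Part:" are rewritten
fixed = re.sub(r'(&LOG Part:\s*)("[^"\n]*")', underscore_spaces, log)
print(fixed)
```

Every space becomes one underscore here, which corresponds to the second regex variant above.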

remove all commas between quotes with a vim regex

I've got a CSV file with lines like:
57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200^M
I need them to look like
57,13,Bob-Bill-and-Susan,Student,Club,Funded,64,3200
I'm using vim regexes. I've broken it down into 4 steps:
Remove ^M and insert newlines:
:%s:<ctrl-V><ctrl-M>:\r:g
Replace all spaces with -:
:%s: :\-:g
Remove commas between quotes: Need help here.
Remove quotes:
:%s:\"\([^"]*\)\":\1:g
How do I remove commas between quotes, without removing all commas in the file?
Something like this?
:%s:\("\w\+\),\(\w\+"\):\1 \2:g
My preferred solution to this problem (removing commas inside quoted regions) is to use replacements with an expression instead of trying to get this done in one regex.
To do this you need to prepend your replacement with \= to get the replacement treated as a vim expression. From here you can extract just the parts between quotes and then manipulate the matched part separately. This requires having two short regexes instead of one complicated one.
:%s/".\{-}"/\=substitute(submatch(0), ',', '' , 'g')/g
So ".\{-}" matches anything in quotes (non greedy) and substitute(submatch(0), ',', '' , 'g') takes what was matched and removes all of the commas and its return value is used as the actual replacement.
The relevant help page is :help sub-replace-special.
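For comparison, the same "match the quoted region, then fix it up in an expression" approach looks like this in Python, where a function passed to re.sub plays the role of vim's \= expression. This is an illustrative sketch, not part of the vim solution.

```python
import re

line = '57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200'

def drop_commas(match):
    # match.group(0) corresponds to submatch(0) in the vim command
    return match.group(0).replace(',', '')

# ".*?" is the non-greedy equivalent of vim's ".\{-}"
no_inner_commas = re.sub(r'".*?"', drop_commas, line)
print(no_inner_commas)
```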
As for the other parts of your question: Step 1 is essentially trying to remove all carriage returns, since the file is actually in the DOS file format. You can remove them with the dos2unix program.
In Step 2 escaping the - in the replacement is unnecessary. So the command is just
:%s/ /-/g
In Step 4, you have an overly complicated regex if all you want to do is remove quotes. All you need to do is match quotes and remove them:
:%s/"//g
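Put together, the four steps could be sketched in Python roughly as follows. The function name clean_line is hypothetical, and the sketch assumes each raw line ends with a carriage return (the ^M) possibly followed by a newline.

```python
import re

raw = '57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200\r\n'

def clean_line(line):
    line = line.rstrip('\r\n')        # step 1: drop the trailing ^M / newline
    line = line.replace(' ', '-')     # step 2: spaces become dashes
    # step 3: remove commas, but only inside quoted regions
    line = re.sub(r'"[^"]*"', lambda m: m.group(0).replace(',', ''), line)
    return line.replace('"', '')      # step 4: remove the quotes themselves

cleaned = clean_line(raw)
print(cleaned)
```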
:%s:\("\w*\)\(,\)\(.*"\):\1\3:g
Example: "this is , an, example"
\("\w*\) matches the opening " and the word characters directly following the quote; this is group \1 for the back reference
\(,\) captures the comma as group \2 for the back reference
\(.*"\) matches every other character up to the last quote on the line; this is group \3 for the back reference
:\1\3: only includes the groups without the comma and discards group \2 from the returned string
:%s:\("\w*\)\(,\)\(.*"\):\1\3:g removes the comma
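To see what this pattern does and does not catch, here is a Python translation of it (vim's \( \) become plain parentheses). It is a sketch for experimenting, not the vim command itself. It shows that one pass removes only a comma that directly follows the first word after the opening quote.

```python
import re

# Python spelling of the vim pattern \("\w*\)\(,\)\(.*"\)
pattern = re.compile(r'("\w*)(,)(.*")')

one = pattern.sub(r'\1\3', '57,13,"Bob, Bill and Susan",Student')
two = pattern.sub(r'\1\3', 'x,"a, b, c",y')
three = pattern.sub(r'\1\3', '"this is , an, example"')

print(one)    # the single comma inside the quotes is removed
print(two)    # only the first of the two inner commas is removed
print(three)  # unchanged: a space, not a comma, follows the first word
```

Running the substitution repeatedly handles more commas in inputs like the second one. The third input never matches, because \w* cannot cross the space before the first comma.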

Notepad++ RegEx Search/Replace: How to append and prepend a character at start and end of each file line?

How to append and prepend a character at start and end of each file line?
I have this file structure:
140","Bosnia
160","Croatia
170","Serbia
180","Montenegro
200","Slovenia
What I need is to add a double quote " at the start and at the end of each file line, using regular expressions in Notepad++ editor.
Thanks!
Just search for
(.*)
and replace with
"\1"
with the regular expression option activated. Regular expressions work on a per-line basis here (as long as ". matches newline" is unchecked), so (.*) matches the complete line, and because of the parentheses around it you can access the match using \1.
Try searching ^(.*)$ and replacing by "$1".
bye ;)
You can match the whole, even an empty line, with
^.*$
You can match a non-empty line with
^.+$
You may match a non-blank line with
^\h*\S.*$
Now, to wrap these lines with any text of your choice, use a backreference to the whole match (see Replace with whole match value using Notepad++ regex search and replace):
"$0"
"$&"
"$MATCH"
"${^MATCH}"
If you need to wrap the whole line with parentheses, you will need to escape them, since ( and ) are "special" in the Notepad++ replacement pattern: \($&\).
Whenever you need to insert a backslash, make sure you double it, \\$&\\.
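The same wrapping can be tried in Python as a rough sketch. The escaping rules differ from Notepad++: Python spells the whole-match reference \g<0>, and parentheses in the replacement need no escaping. The variable names are illustrative.

```python
import re

rows = '140","Bosnia\n160","Croatia\n170","Serbia'

# ^.+$ with re.MULTILINE matches each non-empty line; \g<0> is the whole match
quoted = re.sub(r'^.+$', r'"\g<0>"', rows, flags=re.MULTILINE)

# Parentheses are literal in a Python replacement template
wrapped = re.sub(r'^.+$', r'(\g<0>)', rows, flags=re.MULTILINE)

print(quoted)
print(wrapped)
```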