Notepad++ Regular Expression Replace Quotes Between Quoted Columns in CSV

I have a comma separated CSV file with 3 quoted columns like this:
"this","is good","data"
Some rows have extra quotes in the second column:
"this","is "bad","data"
"this","is "really" bad","data"
This site (http://editplus.info/wiki/Search_and_Replace_Tricks#Delete_everything_inside_a_tag_pair_.28keeping_tags.29) has a RegEx string to select the text between tags ("," in this case), but I only want to replace any quote characters between the tags, not the whole string. Ideally I would only select those lines which have the offending quotes.
RegEx that selects whole second column:
(",").+(",")
RegEx that only selects the bad quotes or rows with them:
???
Any help is much appreciated. Thanks!

Ideally I would only select those lines which have the offending quotes.
^.*(?:,|^)"[^,"\n]*"[^,\n"]*".*$
To replace the mismatched quotes with empty string, use the below regex.
([^,\n])"([^,\n])
Then replace the matched characters with \1\2
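As a rough check outside Notepad++, both patterns above can be tried with Python's `re` module. This is only a sketch: the sample rows are the ones from the question, the names `BAD_ROW` and `STRAY_QUOTE` are made up for illustration, and Python's regex flavor is not identical to Notepad++'s, so treat the result as indicative.

```python
import re

rows = [
    '"this","is good","data"',
    '"this","is "bad","data"',
    '"this","is "really" bad","data"',
]

# Flags a whole row that contains a stray quote inside a field (hypothetical name).
BAD_ROW = re.compile(r'^.*(?:,|^)"[^,"\n]*"[^,\n"]*".*$')
# Matches a quote that has a non-comma character on both sides (hypothetical name).
STRAY_QUOTE = re.compile(r'([^,\n])"([^,\n])')

bad_rows = [row for row in rows if BAD_ROW.search(row)]
fixed_rows = [STRAY_QUOTE.sub(r'\1\2', row) for row in rows]

for row in fixed_rows:
    print(row)
```

One caveat: because matches cannot overlap, a one-character quoted word such as "x" keeps its second quote after a single pass, so a second Replace All may be needed for such rows.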

Try this; it was tested in Notepad++ against all of your cases:
Search for ([^,\n\r\t])"+([^,\n\r\t]) and replace with $1$2 (idea from @Avinash Raj).
Update for the additional requirement raised in the comments:
Search for (^"|","|"$)|" and replace with $1
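The second pattern keeps the delimiter quotes by capturing them in group 1 and drops every other quote. A minimal Python sketch of the same idea, assuming each row is processed on its own (the function name `strip_inner_quotes` is hypothetical):

```python
import re

# Group 1 captures only delimiter quotes: row start, field separator, row end.
KEEP_DELIMITERS = re.compile(r'(^"|","|"$)|"')

def strip_inner_quotes(row):
    # When group 1 did not take part in the match, the quote is a stray one
    # and is replaced with nothing.
    return KEEP_DELIMITERS.sub(lambda m: m.group(1) or '', row)

rows = [
    '"this","is good","data"',
    '"this","is "bad","data"',
    '"this","is "really" bad","data"',
    '"this","is "x" bad","data"',
]
cleaned = [strip_inner_quotes(row) for row in rows]
```

Unlike the two-sided pattern, this one does not need a second pass for a one-character quoted word, since every quote is examined individually.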

Running a match using: /([\w ])"(?![,\n])/g
And substituting with: $1'
Replaces all offending double quotes with single quotes, producing:
"this","is 'bad","data"
"this","is 'really' bad","data"
Demo here: https://regex101.com/r/dL7jZ6/12 (Credit to Avinash Raj for finding the demo website)

Assuming the format is always exactly as posted, I'd do something like:
[ ]".*?"

Related

Regexp finding string enclosed by quotes

I have a little problem with VS2010.
I want to find certain strings between quotes with a regular expression, but only if the line doesn't contain a standalone "tr" or "QObject::tr". For example:
I want to display all of these lines:
Hallotr("asa");
("hhajkshjkas");
( _"hhajkshjkas" );
But I don't want to display these lines:
tr("hhajkshjkas");
QObject::tr("hhajkshjkas");
My Regexp looks like this:
[^t-r]"[a-zA-Z0-9<>=\\"" ]* and ^[^tr]*"[a-zA-Z0-9<>=\\"" ]*"
but it either shows all lines, even those with a tr at the beginning of the string, or it shows only lines that don't contain tr at all.
Thanks for the help, guys.
I found the solution; this is my regexp:
^~((.*QObject.+tr)|(:b*tr:b*\()).*:q
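The expression above uses the VS2010 Find syntax, where (as far as I can tell) `~(...)` prevents a match, `:b` is a space or tab, and `:q` is a quoted string. For readers on other tools, here is a hedged equivalent in conventional regex syntax, tried in Python. The pattern and the name `UNTRANSLATED` are my own assumptions, not the poster's expression.

```python
import re

# Hypothetical equivalent: reject lines that call tr( or QObject::tr(
# (tr as a whole word), then require a double-quoted string on the line.
UNTRANSLATED = re.compile(r'^(?!.*\btr\s*\().*"[^"]*"')

lines = [
    'Hallotr("asa");',
    '("hhajkshjkas");',
    '( _"hhajkshjkas" );',
    'tr("hhajkshjkas");',
    'QObject::tr("hhajkshjkas");',
]

shown = [line for line in lines if UNTRANSLATED.search(line)]
for line in shown:
    print(line)
```

The word boundary `\b` is what lets `Hallotr(` through while still rejecting `tr(` and `QObject::tr(`.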

Replace commas between at's on notepad++

I have a CSV with data to import, the separator character is the comma here; but when the row or line has two e-mails, a comma separates them so the import fails at that point.
So I thought of removing the commas between two at signs when they are on the same line, but I don't know how.
If you have an alternative solution, it is welcome too!
Thanks.
Example:
ENTERPRISE1 S.L.,,ENTERPRISE1,999461678,,,,,,ent1#mail.com, ent1alternate#mail2.com,Spain,,,
ENTERPRISE2 S.A.,,ENTERPRISE2.,999859177,,,,,,ent2#mail.com,Italy,,,
Given that your data doesn't use any escaping and the # character only appears in the mail column, you could use ((?:#|\G(?!^))[^,]+),([^,#]+#) as the search pattern and $1$2 as the replacement. This also handles more than two mails in the column correctly. Of course, you can place a separator of your choice between $1 and $2, like $1;$2.
You can do it with Notepad++:
search field:
([^#]+#[^,]+)\s*,\s*([^#]+#[^,]+)
Replace field:
\1|\2
Check the Regular expression checkbox.
So
ent1#mail.com, ent1alternate#mail2.com
will be:
ent1#mail.com|ent1alternate#mail2.com
This lets you keep your column layout and process the data without losing anything.
Another option is to use correct CSV formatting: double quotes around any field that contains the delimiter.
([^,]+#[^,]+,[^,]+#[^,]+)
Replace:
"\1"
(Regex adapted from destrif's answer).
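To see whether the quoting approach gives a CSV parser what it expects, the pattern can be applied in Python and the result fed to the standard `csv` module. This is a sketch under the assumption that the sample rows are representative; it keeps the `#` that appears in the sample data, and the name `TWO_MAILS` is hypothetical.

```python
import csv
import re

rows = [
    'ENTERPRISE1 S.L.,,ENTERPRISE1,999461678,,,,,,ent1#mail.com, ent1alternate#mail2.com,Spain,,,',
    'ENTERPRISE2 S.A.,,ENTERPRISE2.,999859177,,,,,,ent2#mail.com,Italy,,,',
]

# Two adjacent fields that each contain '#', captured together as one group.
TWO_MAILS = re.compile(r'([^,]+#[^,]+,[^,]+#[^,]+)')

quoted = [TWO_MAILS.sub(r'"\1"', row) for row in rows]
parsed = list(csv.reader(quoted))

for fields in parsed:
    print(len(fields), fields)
```

After quoting, both rows should parse to the same number of columns, with the two addresses sharing one field.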
It looks like your example always has a space after the comma separating multiple email addresses. If that is a general rule, you could replace the ", " (comma + space) string with another separator such as a semicolon, using Ctrl+H to open the Replace dialog.

NotePad++ Currency RegEx with Optional Replace

My Search pattern: \"(\$)(\d{0,3}?)\,?(\d{1,3}?)\,?(\d{0,3})\s?\"
Matches all of these:
"$1"
"$10"
"$100"
"$1,000"
"$10,000 "
"$100,000"
"$1,000,000 "
"$10,000,000"
"$100,000,000"
I know I don't really need to search for under the thousands place, but am including those for possible future application.
My problem: I need to replace all of the commas with the HTML escape code &#44;, but only if there is a comma present in the search result.
This replace pattern $1$2&#44;$3&#44;$4 gives the incorrect result, and I'm just not seeing the right pattern to use for my replacement.
$&#44;1&#44;
$&#44;1&#44;0
$&#44;1&#44;00
$&#44;1&#44;000
$&#44;10&#44;000
$&#44;100&#44;000
$1&#44;000&#44;000
$10&#44;000&#44;000
$100&#44;000&#44;000
This is the result I am attempting to get:
$1
$10
$100
$1&#44;000
$10&#44;000
$100&#44;000
$1&#44;000&#44;000
$10&#44;000&#44;000
$100&#44;000&#44;000
No Quotes and no extra space after the last digit.
I'm not married to having to find the 1's through 100's, but it is preferable.
Any ideas on how to do optional replace in NotePad++?
Use a regex Search and Replace: replace (\d),(\d) with \1&#44;\2. Check Regular expression, then click Replace or Replace All.
For some unknown reason, the RE of Sebastian from the comments above did not work with notepad++ 6.8.6 (find worked fine, but not replace). So instead of using look around, we capture the surrounding digits into \1 and \2 for reuse in the replacement.
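The capture-group form and the lookaround form should behave the same way on this data. A small Python sketch comparing them, assuming the intended replacement text is the HTML entity &#44;. The `unquote` step that drops the surrounding quotes and the optional trailing space is my own addition to reach the output the question asks for; it is not part of either answer.

```python
import re

samples = ['"$1"', '"$100"', '"$1,000"', '"$10,000 "', '"$1,000,000 "']

def with_groups(text):
    # Capture the digits around the comma and put them back.
    return re.sub(r'(\d),(\d)', r'\1&#44;\2', text)

def with_lookarounds(text):
    # Match only the comma; the digits are asserted, not consumed.
    return re.sub(r'(?<=\d),(?=\d)', '&#44;', text)

def unquote(text):
    # Extra step (assumption): strip the quotes and an optional trailing space.
    return re.sub(r'"(\$[\d,]+)\s?"', r'\1', text)

results = [with_groups(unquote(s)) for s in samples]
for value in results:
    print(value)
```

Amounts without a comma pass through unchanged, which is the "optional" part of the replace.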
Try following regex:
(?<=\d),(?=\d)
After running it on your dataset (replacing each match with &#44;), I got this result:
"$1"
"$10"
"$100"
"$1&#44;000"
"$10&#44;000 "
"$100&#44;000"
"$1&#44;000&#44;000 "
"$10&#44;000&#44;000"
"$100&#44;000&#44;000"

Excel regex: Delete all content after last group

I have a CSV file with breadcrumbs (PrestaShop products).
I want to delete the content after the last separator (the product name). My structure is:
col1|col2|col3|product
I can delete it with a simple regex, but the number of separators is not always the same, so for example ([^|]+/[^|]+/[^|]+/[^|]+|).* won't work.
Is there any way to do it with one regex?
I want:
col1|col2|col3|product
col1|col2|col3|col4|product
col1|col2|col3|col5|product
to become
col1|col2|col3
col1|col2|col3|col4
col1|col2|col3|col5
I think the simple way would be to read from right to left and not left to right...
Use the following regex to match the last part including the | symbol. Just replacing the matched characters with an empty string will give you the desired output.
Regex:
\|[^|]*$
Replacement string:
Empty string
^(.*)\|.*
Try this. Replace with:
$1
See the demo:
http://regex101.com/r/jT3pG3/12
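Both patterns can be checked against the sample breadcrumbs in Python. The sketch below also includes a plain string split from the right, which is one way to act on the "read from right to left" idea in the question; that third variant is my own addition, not something either answer proposed.

```python
import re

breadcrumbs = [
    'col1|col2|col3|product',
    'col1|col2|col3|col4|product',
    'col1|col2|col3|col5|product',
]

# Anchor at the end: remove the last '|' and everything after it.
trimmed_tail = [re.sub(r'\|[^|]*$', '', b) for b in breadcrumbs]
# Greedy capture: group 1 runs up to the last '|' on the line.
trimmed_greedy = [re.sub(r'^(.*)\|.*', r'\1', b) for b in breadcrumbs]
# No regex at all: split once, starting from the right.
trimmed_rsplit = [b.rsplit('|', 1)[0] for b in breadcrumbs]

for line in trimmed_tail:
    print(line)
```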

Regex find comma not inside quotes

I'm checking line by line in C#
Example data:
bob jones,123,55.6,,,"Hello , World",,0
jim neighbor,432,66.5,,,Andy "Blank,,1
john smith,555,77.4,,,Some value,,2
"Regex to pick commas outside of quotes" doesn't handle the second line, but it's the closest I have found.
Try the following regex:
(?!\B"[^"]*),(?![^"]*"\B)
It does not match the second line because the " you inserted does not have a closing quotation mark.
It will not match values like so: ,r"a string",10 because the letter on the edge of the " will create a word boundary, rather than a non-word boundary.
Alternative version
(".*?,.*?"|.*?(?:,|$))
This will match the content and the commas, and it is compatible with values that are full of punctuation marks.
The regex below parses each field in a line, not an entire line.
Apply the methodical and desperate regex technique: divide and conquer.
Case: field does not contain a quote
abc,
abc(end of line)
[^,"]*(,|$)
Case: field contains exactly two quotes
abc"abc,"abc,
abc"abc,"abc(end of line)
[^,"]*"[^"]*"[^,"]*(,|$)
Case: field contains exactly one quote
abc"abc(end of line)
abc"abc, (and that there's no quote before the end of this line)
[^,"]*"[^,"]*$
[^,"]*"[^,"]*,(?!.*")
Now that we have all the cases, we then '|' everything together and enjoy the resultant monstrosity.
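A minimal Python sketch of that combination, under a few assumptions of mine: the one-quote patterns allow any number of characters after the quote (with `*`), the terminator groups are made non-capturing, and fields are consumed one at a time with `match` at a moving position. The names `FIELD` and `split_fields` are hypothetical.

```python
import re

# The cases from the answer, tried in this order.
TWO_QUOTES = r'[^,"]*"[^"]*"[^,"]*(?:,|$)'
ONE_QUOTE_AT_END = r'[^,"]*"[^,"]*$'
ONE_QUOTE_THEN_COMMA = r'[^,"]*"[^,"]*,(?!.*")'
NO_QUOTE = r'[^,"]*(?:,|$)'

FIELD = re.compile('|'.join(
    [TWO_QUOTES, ONE_QUOTE_AT_END, ONE_QUOTE_THEN_COMMA, NO_QUOTE]))

def split_fields(line):
    """Split one line into fields by matching one field at a time."""
    fields, pos = [], 0
    while True:
        match = FIELD.match(line, pos)
        if match is None:
            # Fields with three or more quotes are not covered by the cases.
            raise ValueError(f'unsupported field at column {pos}')
        text = match.group()
        if not text.endswith(','):
            fields.append(text)       # last field on the line
            return fields
        fields.append(text[:-1])      # drop the terminating comma
        pos = match.end()

print(split_fields('bob jones,123,55.6,,,"Hello , World",,0'))
print(split_fields('jim neighbor,432,66.5,,,Andy "Blank,,1'))
```

Quotes are kept as part of the field text here; stripping them would be a separate step.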
The best answer written by Vasili Syrakis does not work with negative numbers inside quotation marks such as:
bob jones,123,"-55.6",,,"Hello , World",,0
jim neighbor,432,66.5
Following regex works for this purpose:
,(?!(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$))
But I was not successful with this part of input:
,Andy "Blank,
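To illustrate both the working case and the limitation, the pattern can be used as a split delimiter in Python. This is only a sketch (the question is about C#, and the name `OUTSIDE_COMMA` is hypothetical), but the lookahead logic is the same: a comma counts as "outside" when it is not followed by an odd number of quotes.

```python
import re

OUTSIDE_COMMA = re.compile(r',(?!(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$))')

balanced = 'bob jones,123,"-55.6",,,"Hello , World",,0'
unbalanced = 'jim neighbor,432,66.5,,,Andy "Blank,,1'

balanced_fields = OUTSIDE_COMMA.split(balanced)
unbalanced_fields = OUTSIDE_COMMA.split(unbalanced)

print(balanced_fields)
print(unbalanced_fields)
```

With the unbalanced quote, every comma to the left of it is followed by exactly one quote, so the pattern treats all of them as being inside a quoted field and the leading columns stay glued together.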
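Try this pattern: ".*?"(*SKIP)(*FAIL)|, (it needs a PCRE-style engine that supports backtracking control verbs).
import re
line = 'bob jones,123,55.6,,,"Hello , World",,0'
# Remove every comma followed by an odd number of quotes, i.e. a comma inside a quoted field.
print(re.sub(r',(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)', "", line))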