Replace a comma in text values in CSV using regex in Notepad++

I searched a lot but couldn't find an exact solution.
I have a CSV file in which some values contain a comma.
Here is a sample row:
"BEIAAGJIPAMBPJIF",2757,08042010,"13:53.59",09042010,"01:55.39","SIHAM","BEIAIGHEIPLGPJIF",20,"A",20,"S",0.00,0.00,0.00,"OLY
SPECIAL ORDER","IN STOCK , DESIGNER",0.00000,0,"","N","N",
If you look at the value "IN STOCK , DESIGNER", it contains a comma. Because of this, when the CSV is read in my .NET application and in the MS Dynamics CRM import file wizard, the value is split into two separate values instead of one.
I need a regex that can match such strings and replace the comma with a hyphen "-", which I can use in Notepad++.
Kindly help.
Thanks.

This solution worked for me, although it is a bit indirect:
1. By searching, find a character that is not used anywhere in the file, e.g. #.
2. Use the following regex replace to replace all delimiters: find: (".*?"|.*?), replace: \1# (using the character from step 1).
3. Now the only commas left are those inside quotes. Replace all of them with -.
4. Finally, replace all #'s back with commas.
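A minimal Python sketch of the same four steps, assuming each CSV record sits on one line and that # never appears in the data (the function name replace_quoted_commas is hypothetical). It uses the same find pattern as step 2:

```python
import re

# Step 2's pattern: a quoted field (".*?") or an unquoted one (.*?), then
# the delimiter comma. Quoted fields are consumed whole, so the commas
# inside them are skipped by this substitution.
DELIMITED_FIELD = re.compile(r'(".*?"|.*?),')

def replace_quoted_commas(line: str, placeholder: str = "#") -> str:
    # Step 1: the placeholder must not already occur in the data.
    if placeholder in line:
        raise ValueError(f"{placeholder!r} already occurs in the line")
    # Step 2: turn every delimiter comma into the placeholder.
    line = DELIMITED_FIELD.sub(lambda m: m.group(1) + placeholder, line)
    # Step 3: any comma still present is inside quotes.
    line = line.replace(",", "-")
    # Step 4: restore the delimiters.
    return line.replace(placeholder, ",")

row = '"BEIAIGHEIPLGPJIF",20,"A","IN STOCK , DESIGNER",0.00000,0,"","N","N",'
print(replace_quoted_commas(row))
```

The callback (instead of a "\1#" replacement string) simply avoids any ambiguity if a different placeholder character is chosen.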

Related

Regex Find Spaces between single quotes and replace with underscore

I have a database table that I have exported. I need to replace the spaces in the image file names with underscores, and would like to use Notepad++ and regex to do so. I have:
'data/green tea powder.jpg'
'data/prod_img/lumina herbal shampoo.JPG'
'data/ALL GREEN HERBS.jpeg'
'data/prod_img/PSORIASIS KIT (640x530) (2).jpg'
and need to make them look like this:
'data/green_tea_powder.jpg'
'data/prod_img/lumina_herbal_shampoo.JPG'
'data/ALL_GREEN_HERBS.jpeg'
'data/prod_img/PSORIASIS_KIT_(640x530)_(2).jpg'
I just want to change the spaces between the quotes (I don't want to change the capitalization). To be more specific I would like to replace any and all spaces between 'data/ and ' because there are other spaces between quotes in the DB, for example:
'data/ REPLACE ANY SPACE HERE '
I found this:
\s(?!(?:[^']*'[^']*')*[^']*$)
but there are other places where there are spaces between quotes, so I'd like to search for data/ at the beginning and not just a single quote, but I can't figure out how. I tried \s(?!(?:[^'data\/]*'[^']*')*[^']*$) but it didn't work, and I am not familiar enough with regex to make it do so.
An example of a full line from the database is:
(712, 'GRTE-P', '', 'data/green tea powder.jpg', '2014-03-12 22:52:03'),
I don't want to replace the spaces in the date and time stamp at the end of the line, just those in the image file names.
Thanks in advance for your help!
You have to use a \G based pattern to ensure that matches are contiguous.
search: (?:\G(?!^)|'data/)[^' ]*\K[ ]
replace: _
The first match uses the second branch of the alternation, then the next matches are contiguous and use the first branch.
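Python's built-in re module supports neither \G nor \K, so the sketch below swaps in a different technique to get the same result: it matches each whole 'data/...' literal and replaces the spaces inside it with a callback. The name underscore_data_paths is hypothetical, and file names are assumed not to contain single quotes.

```python
import re

# Python's re has no \G or \K, so instead of chaining contiguous matches
# this finds each complete 'data/...' literal (up to its closing quote)
# and replaces the spaces inside it in a callback.
DATA_LITERAL = re.compile(r"'data/[^']*'")

def underscore_data_paths(line: str) -> str:
    return DATA_LITERAL.sub(lambda m: m.group(0).replace(" ", "_"), line)

row = "(712, 'GRTE-P', '', 'data/green tea powder.jpg', '2014-03-12 22:52:03'),"
print(underscore_data_paths(row))
```

The time stamp is left alone because it does not start with 'data/.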

Regex to find specific character between two other characters

I've been trying to find a way to find a single comma between inverted commas without much luck. Example: "text , text " - how do I isolate the "," between the inverted commas line by line in a flat file?
My attempt .["].[,].["].
Thanks in advance
This regex will work:
(?<=truck).*(?=car)
It finds, e.g., "plane" in the string
truckplanecar
So for test,test the regex would be
(?<=test).*(?=test)
P.S. Can you please provide a more detailed example of what you would like to do?
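A small Python sketch of the lookbehind/lookahead idea (text_between is a hypothetical helper name). Python only allows fixed-width lookbehind, which holds here because the delimiters are escaped literal strings:

```python
import re
from typing import Optional

def text_between(text: str, left: str, right: str) -> Optional[str]:
    # (?<=left) and (?=right) are zero-width: they must match around the
    # result but are not part of it. re.escape treats left/right literally.
    m = re.search(f"(?<={re.escape(left)}).*(?={re.escape(right)})", text)
    return m.group() if m else None

print(text_between("truckplanecar", "truck", "car"))  # plane
print(text_between("test,test", "test", "test"))      # ,
```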
Try using two groups, one at the start and one at the end of the string. The following regex should work:
(".*),(.*")
it does match the example you've shared:
"text , text "
Furthermore, using groups, you can isolate the string before and after the comma, in case you need it.
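A quick Python check of the two groups (an illustration, not part of the original answer). Because .* is greedy, the pattern isolates only the last comma in the string, which matters if a quoted value contains more than one:

```python
import re

# Group 1 is everything up to the comma, group 2 everything after it.
# The greedy ".*" backtracks only as far as the LAST comma on the line.
PATTERN = re.compile(r'(".*),(.*")')

print(PATTERN.search('"text , text "').groups())  # ('"text ', ' text "')
print(PATTERN.search('"a,b,c"').groups())         # ('"a,b', 'c"')
```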

Regex to remove commas between quotes with comma right before end quote Notepad++

In Notepad++, I am using Regex to replace commas between quotes in CSV file.
Using a similar example from here, this is what I am trying to read:
1070,17,2,GN3-670,"COLLAR B, M STAY,","2,606.45"
Except that in my text there is an extra comma right before the closing quote.
The regex ("[^",]+),([^"]+") does not seem to pick up the last comma, and the result is
1070,17,2,GN3-670,"COLLAR B M STAY,","2606.45"
I would like
1070,17,2,GN3-670,"COLLAR B M STAY","2606.45"
Is there a simple regex, or will I have to use a CSV reader in C#?
Edit: Some of the regexes are giving false matches, so I would like to add another scenario. If I have
1070,17,2,GN3-670,"COLLAR B, M STAY,",55, FREE,"2,606.45"
I would like
1070,17,2,GN3-670,"COLLAR B M STAY",55, FREE,"2606.45"
I think this is what you're looking for:
,(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)
This matches any comma that's followed by an odd number of quotes. It consumes only the comma, so you replace it with nothing.
The thing about your original solution is that it would only match one comma per quoted field. It never even tried to match the second comma in "COLLAR B, M STAY,", so its position didn't really matter. This solution removes any number of commas, regardless of their position within the field.
UPDATE: This regex assumes you're processing one line at a time. If you're using it on a whole document containing many lines, the regex is probably timing out. You can work around that by excluding line terminators (carriage returns and linefeeds), like this:
,(?=[^"\r\n]*"(?:[^"\r\n]*"[^"\r\n]*")*[^"\r\n]*$)
Note that the CSV spec (such as it is) says you can have line terminators in quoted fields, so this regex is technically incorrect. If you do need to support multiline fields, you might as well switch to the CSV library. Regexes are not quite capable of handling CSV fully, but in most cases they're good enough.
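A sketch of the line-safe variant in Python, assuming the whole document is processed at once; re.MULTILINE makes $ match at each line end, roughly how Notepad++ treats $. The function name drop_quoted_commas is hypothetical:

```python
import re

# A comma followed by an odd number of quotes before the end of its line
# must sit inside a quoted field. [^"\r\n] keeps each lookahead on one line.
COMMA_IN_QUOTES = re.compile(
    r',(?=[^"\r\n]*"(?:[^"\r\n]*"[^"\r\n]*")*[^"\r\n]*$)', re.MULTILINE
)

def drop_quoted_commas(text: str) -> str:
    # The match consumes only the comma, so replacing with "" removes it.
    return COMMA_IN_QUOTES.sub("", text)

doc = ('1070,17,2,GN3-670,"COLLAR B, M STAY,","2,606.45"\n'
       '1070,17,2,GN3-670,"COLLAR B, M STAY,",55, FREE,"2,606.45"')
print(drop_quoted_commas(doc))
```

Both lines come out as in the question's desired results; the unquoted commas in 55, FREE stay because an even number of quotes follows them.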
You can use the following to match:
((["])(?:(?=(\\?))\3.)*?),\2
And replace with the following:
\1"
This should work
Find What ("[^"]*),"
Replace With \1"

PCRE regex replace a text pattern within double quotes

In Notepad++ 6.5.1 I need to replace certain patterns within quote pairs. I want to save the replace as part of a macro, so all replacements need to happen in one step.
For example, in the following string, replace all 'a' characters within quote pairs with a dash, while leaving characters outside the quote pairs untouched:
Input: aa"bbabaavv"kdjhas"bbabaavv"x
Desired result: aa"bb-b--vv"kdjhas"bb-b--vv"x
Note that the quotes are matched up pairwise, such that the 'a' in kdjhas is untouched.
So far I have tried searching for (?:"[^"a]*|\G)\Ka([^"a]*) and replacing with -$1, but that simply replaces all the a's, with the result --"bb-b--vv"kdjh-s"bb-b--vv"x. I'm attempting PCRE regex that will let me recursively replace the quote-delimited text.
Edit: Quote marks within a quoted string are escaped with an extra quote, e.g. "". However, assume I will have already replaced these in a previous pass with a special character. Therefore a regex solution to this problem will not have to deal with escaped quotes.
It is hard to tell if this is possible as you've only provided one line of input text.
But assuming that input follows this pattern:
BOL|any text|string with two groups of a's|any text|string with two groups of a's|any text|EOL
aa "bbabaavv" kdjhas "bbabaavv" x
I was able to create this regexp search string:
^(.+?\".+?)([a]+)(.+?)([a]+)(.*?\")(.+?\".+?)([a]+)(.+?)([a]+)(.*?\".*)$
With this replace string:
\1-\3-\5\6-\8-\A
and it turns your input string from this:
aa"bbabaavv"kdjhas"bbabaavv"x
into this:
aa"bb-b-vv"kdjhas"bb-b-vv"x
Naturally, the search and replace will fail if the input varies from the pattern described above, since the search looks for those four groups of a's inside the two quoted strings.
Also, I tested that regexp using Zeus, which can create a regexp with more than 9 groups.
As you can see, the regexp requires 10 groups.
I'm not familiar with Notepad++, so I don't know if it supports that many groups.
If your data has a variable number of quoted strings, then it is not possible to perform the replacements with regex alone, at least in the form offered by Notepad++.
To replace using regex, you would need to perform a regex find inside an existing regex match. As far as I know, such functionality is not available in Notepad++ regexes.
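For comparison, outside Notepad++ the "regex find inside an existing regex match" is easy to express with a replacement callback. A Python sketch, assuming escaped quotes were already replaced in an earlier pass as described in the question (dash_a_inside_quotes is a hypothetical name):

```python
import re

OUTER = re.compile(r'"[^"]*"')  # one quote pair at a time
INNER = re.compile(r"a")        # what to replace inside each pair

def dash_a_inside_quotes(text: str) -> str:
    # The outer regex finds each quoted string; the callback runs the
    # inner regex only on that match, leaving text outside quotes alone.
    return OUTER.sub(lambda m: INNER.sub("-", m.group(0)), text)

print(dash_a_inside_quotes('aa"bbabaavv"kdjhas"bbabaavv"x'))
```

This handles any number of quoted strings per line, which the fixed ten-group pattern cannot.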
Self-answer
I may have been reaching for the stars in trying to get Notepad++ to do this regex replace, but I think I found a workaround.
The actual task I was attempting involved creating a SQL Server VALUES list from an Excel spreadsheet, where I was copying and pasting selected cells into Notepad++. The delimiters are \t and \r\n. But cells can contain linefeeds too, and such cells are enclosed in ". So I was going to replace these linefeeds with <br> (or something like it), so that
"line1
line2"
would become "line1<br>line2", before processing the actual end-of-row line feeds.
Having such parsing work reliably, especially when more than two lines were in a single cell, may have been too much to ask of Notepad++'s regex capability.
So I came up with a workaround that seems to be working :) Basically, it starts with selecting a blank "dummy" column to the right of my column selection (which I can insert if I'm partially selecting from the middle). This leaves a trailing \t at the end of each row, which effectively sets these EOLs apart from ones that might exist within a text cell, freeing me from having to parse line feeds inside a "..." field.
So I compiled a macro from the following steps, which seems to be working well:
replace ' with ''
replace \t\r\n with '\)\r\n, \('
replace \t with ', '
replace "" with ''
replace " with <blank>
replace ^ with \(' (cleanup - first row only)
replace ^, \('$ with <blank> (cleanup - last row only)
Example transformation:
from
line1 line 2
"line3
line3b
line3c" line 4
to
('line1', 'line 2')
, ('line3
line3b
line3c', 'line 4')
which can now be easily modified into a SELECT statement:
SELECT *
FROM (VALUES('line1', 'line 2')
, ('line3
line3b
line3c', 'line 4')
) t(a,b)
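The macro steps can be sketched as plain string replacements in Python to check the transformation outside Notepad++. This assumes the pasted text uses \r\n line endings, ends with a final \r\n, and has the trailing \t from the dummy column on every row; the function name excel_paste_to_values is hypothetical. Without re.MULTILINE, ^ and $ only match at the start and end of the whole text, which stands in for the "first row only" and "last row only" cleanups:

```python
import re

def excel_paste_to_values(text: str) -> str:
    text = text.replace("'", "''")               # replace ' with ''
    text = text.replace("\t\r\n", "')\r\n, ('")  # row ends (trailing tab from the dummy column)
    text = text.replace("\t", "', '")            # remaining tabs separate cells
    text = text.replace('""', "''")              # replace "" with ''
    text = text.replace('"', "")                 # drop the quotes around multiline cells
    text = re.sub(r"^", "('", text, count=1)     # first row only: open the tuple
    text = re.sub(r", \('$", "", text)           # last row only: drop the dangling ", ('"
    return text

pasted = 'line1\tline 2\t\r\n"line3\r\nline3b\r\nline3c"\tline 4\t\r\n'
print(excel_paste_to_values(pasted))
```

Running it on the sample gives the same two tuples as the example transformation above.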

Notepad++ regex replace - replace all commas with \, within quotations

I am trying to import a csv file into mysql, and I need to convert it into a proper format before importing.
If there's a comma in a column, the CSV encloses it in double quotes. Here's an example of a row without a comma and a row with a comma:
1,Superman
2,"Batman,Flash"
What I need to do is convert every column that contains commas so that the commas are escaped and the quotes removed, e.g. "Batman,Flash" to Batman\,Flash
Here's what I have so far
Find: "(.*),(.*)"
Replace: \1\\,\2
However, there are two cases in which this does not work:
It will only replace one comma if there's more than one comma within a quoted column. So something like "Batman,Flash,Robin" will be converted to Batman,Flash\,Robin
This doesn't work if the first column has a comma as well. For example, on a row such as "1,2,3","Batman,Robin"
How can I change the regexes to accommodate the two cases that don't yet work?
I'm sorry, but regex is not the tool for this. You must parse it.
Why?
Do you want to convert this?
"test\, w00t!"
Or what about this?
"test\\\\\, w00t!"
Heck, even this?
"tes\\","\"ing\,\\,"
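To illustrate the "parse it" approach, a minimal Python sketch using the standard csv module: read the quoted fields properly, then write them back unquoted with commas escaped as \, for the MySQL import. It assumes the fields contain no backslashes or double quotes of their own (exactly the ambiguous cases above), and escape_commas_for_mysql is a hypothetical name:

```python
import csv
import io

def escape_commas_for_mysql(csv_text: str) -> str:
    # A real CSV parser handles any number of commas in any column.
    rows = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    # QUOTE_NONE never quotes a field; a delimiter inside a field is
    # written with the escapechar in front of it instead.
    writer = csv.writer(out, quoting=csv.QUOTE_NONE, escapechar="\\",
                        lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()

sample = '1,Superman\n2,"Batman,Flash,Robin"\n"1,2,3","Batman,Robin"\n'
print(escape_commas_for_mysql(sample), end="")
```

Both cases the regex missed (several commas in one column, a comma in the first column) come out escaped.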