Replace commas between at's on notepad++ - regex

I have a CSV file to import. The separator is the comma, but when a row has two e-mail addresses, a comma also separates them, so the import fails at that point.
I thought of removing the commas between two at signs when they are on the same line, but I don't know how.
If you have an alternative solution, it would be welcome too!
Thanks.
Example:
ENTERPRISE1 S.L.,,ENTERPRISE1,999461678,,,,,,ent1#mail.com, ent1alternate#mail2.com,Spain,,,
ENTERPRISE2 S.A.,,ENTERPRISE2.,999859177,,,,,,ent2#mail.com,Italy,,,

Given your data doesn't use any escaping and the # character is only present in the mail column, you could use ((?:#|\G(?!^))[^,]+),([^,#]+#) as the search pattern and $1$2 as the replacement. This also handles more than two mails in the column correctly. Of course, you can place a separator of your choice between $1 and $2, such as $1;$2.
You can see it in action here.
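Python's built-in re module has no \G anchor, so the pattern above cannot be pasted into a script unchanged. A minimal sketch of the same idea outside Notepad++, assuming (as in the answer) that # only occurs in the mail column; the name join_mail_fields and the default separator are illustrative choices, not part of the answer:

```python
import re

# A run of two or more adjacent comma-separated fields that each contain '#'.
# Hypothetical helper pattern; assumes '#' appears only in mail fields.
MAIL_RUN = re.compile(r"[^,#]+#[^,]+(?:,[^,#]+#[^,]+)+")

def join_mail_fields(line, sep=";"):
    """Replace the commas inside a run of adjacent mail fields with sep."""
    return MAIL_RUN.sub(lambda m: m.group(0).replace(",", sep), line)

row = "ENTERPRISE1 S.L.,,ENTERPRISE1,999461678,,,,,,ent1#mail.com, ent1alternate#mail2.com,Spain,,,"
print(join_mail_fields(row))
```

Rows with a single mail field are left untouched, because the pattern requires at least two adjacent fields containing #.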

You can do it with Notepad++:
search field:
([^#]+#[^,]+)\s*,\s*([^#]+#[^,]+)
Replace field:
\1|\2
Check the regular expression checkbox.
So
ent1#mail.com, ent1alternate#mail2.com
will be:
ent1#mail.com|ent1alternate#mail2.com
This lets you keep your column organization and process the data without losing any of it.

Another option is to use the correct CSV formatting: double quotes around any field that contains the delimiter.
Search:
([^,]+#[^,]+,[^,]+#[^,]+)
Replace:
"\1"
(Regex adapted from destrif's answer).
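To check that the quoted form really parses as a single field, here is a small sketch using Python's csv module on the two sample rows. The helper name quote_mail_pair is made up for illustration, and the pattern only covers a pair of addresses, as in the answer:

```python
import csv
import re

# Same pattern as above: two adjacent fields that both contain '#'.
PAIR = re.compile(r"([^,]+#[^,]+,[^,]+#[^,]+)")

def quote_mail_pair(line):
    """Wrap a pair of mail fields in double quotes so it reads as one field."""
    return PAIR.sub(r'"\1"', line)

lines = [
    "ENTERPRISE1 S.L.,,ENTERPRISE1,999461678,,,,,,ent1#mail.com, ent1alternate#mail2.com,Spain,,,",
    "ENTERPRISE2 S.A.,,ENTERPRISE2.,999859177,,,,,,ent2#mail.com,Italy,,,",
]

# csv.reader accepts any iterable of strings, one per row.
rows = list(csv.reader(quote_mail_pair(line) for line in lines))
print(len(rows[0]), len(rows[1]))  # both rows now have the same column count
print(rows[0][9])                  # the mail column of the first row
```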

It looks like in your example there is always a space after the comma that separates multiple email addresses. If that's a general rule, you could replace the ", " string (comma + space) with another separator such as a semicolon, using Ctrl+H to open the Replace dialog.

Related

Regex Find Spaces between single quotes and replace with underscore

I have a database table that I have exported. I need to replace the spaces in the image file names and would like to use Notepad++ and regex to do so. I have:
'data/green tea powder.jpg'
'data/prod_img/lumina herbal shampoo.JPG'
'data/ALL GREEN HERBS.jpeg'
'data/prod_img/PSORIASIS KIT (640x530) (2).jpg'
and need to make them look like this:
'data/green_tea_powder.jpg'
'data/prod_img/lumina_herbal_shampoo.JPG'
'data/ALL_GREEN_HERBS.jpeg'
'data/prod_img/PSORIASIS_KIT_(640x530)_(2).jpg'
I just want to change the spaces between the quotes (I don't want to change the capitalization). To be more specific I would like to replace any and all spaces between 'data/ and ' because there are other spaces between quotes in the DB, for example:
'data/ REPLACE ANY SPACE HERE '
I found this:
\s(?!(?:[^']*'[^']*')*[^']*$)
but there are other places where there are spaces between quotes, so I'd like to search for data/ at the beginning and not just a single quote, but I can't figure out how. I tried \s(?!(?:[^'data\/]*'[^']*')*[^']*$) but it didn't work, and I am not familiar enough with regex to make it work.
An example of a full line from the database is:
(712, 'GRTE-P', '', 'data/green tea powder.jpg', '2014-03-12 22:52:03'),
I don't want to replace the spaces in the date and time stamp at the end of the line, just the ones in the image file names.
Thanks in advance for your help!
You have to use a \G-based pattern to ensure that matches are contiguous.
search: (?:\G(?!^)|'data/)[^' ]*\K[ ]
replace: _
The first match uses the second branch of the alternation, then the next matches are contiguous and use the first branch.
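For comparison, Python's built-in re module supports neither \G nor \K, so a script would usually match the whole 'data/...' value and fix the spaces in a callback. A minimal sketch, assuming the quoted values contain no escaped quotes; underscore_data_paths is a hypothetical name:

```python
import re

# A single-quoted value that starts with data/ (no escaped quotes assumed).
DATA_PATH = re.compile(r"'data/[^']*'")

def underscore_data_paths(line):
    """Replace spaces with underscores, but only inside 'data/...' values."""
    return DATA_PATH.sub(lambda m: m.group(0).replace(" ", "_"), line)

line = "(712, 'GRTE-P', '', 'data/green tea powder.jpg', '2014-03-12 22:52:03'),"
print(underscore_data_paths(line))
```

The timestamp keeps its space because its quoted value does not start with data/.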

Notepad++: Find Using Regular Expression and Replacing with Extra Comma

I have a comma-delimited file in which I want to add another comma after the ID: number, but before the street address, such as:
Adam,ID:1,200,N,Sway,Rd.,Hometown,IN,46111,Website:,
Allison,ID:2,201,N,Sway,Rd.,Hometown,IN,46111,Website:,
Bob,ID:31,202,N,Sway,Rd.,Hometown,IN,46111,Website:,
Carl,ID:49,203,N,Sway,Rd.,Hometown,IN,46111,Website:,
I am using the below, to find the comma delimiter before the address, in the Replace window "Find what:" field.
,ID:[0-9]{1,2},
I am failing to understand what regular expression to use in the Replace window "Replace with:" field, so that I can achieve the below output for the comma delimited file.
Adam,ID:1,,200,N,Sway,Rd.,,Hometown,IN,46111,Website:,
Allison,ID:2,,201,N,Sway,Rd.,,Hometown,IN,46111,Website:,
Bob,ID:31,,202,N,Sway,Rd.,,Hometown,IN,46111,Website:,
Carl,ID:49,,203,N,Sway,Rd.,,Hometown,IN,46111,Website:,
The end output is to eventually remove all of the delimiters from the street address by using the double comma delimiters as the search context begin and end markers.
There is no need to add anything to your regex.
To access the whole match in the replacement string, you may use one of the following values:
$&
$MATCH
${^MATCH}
$0
${0}
Add a , after one of these, and use the result in the Replace with field.
See Notepad++: Substitutions.
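The same whole-match idea exists in other regex flavors under different spellings. In Python's re module, for instance, the whole match is written \g<0> in the replacement string. A short sketch using the question's pattern (the function name is illustrative only):

```python
import re

def add_comma_after_id(line):
    """Re-insert the whole match (\\g<0>) and append one extra comma."""
    return re.sub(r",ID:[0-9]{1,2},", r"\g<0>,", line)

print(add_comma_after_id("Bob,ID:31,202,N,Sway,Rd.,Hometown,IN,46111,Website:,"))
```

This only adds the comma after the ID field; the second double comma shown after Rd. in the desired output would need a separate replacement.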

PCRE regex replace a text pattern within double quotes

In Notepad++ 6.5.1 I need to replace certain patterns within quote pairs. I want to save the replace as part of a macro, so all replacements need to happen in one step.
For example, in the following string, replace all 'a' characters within quote pairs with a dash, while leaving characters outside the quote pairs untouched:
Input: aa"bbabaavv"kdjhas"bbabaavv"x
Desired result: aa"bb-b--vv"kdjhas"bb-b--vv"x
Note that the quotes are matched up pairwise, such that the 'a' in kdjhas is untouched.
So far I have tried searching for (?:"[^"a]*|\G)\Ka([^"a]*) and replacing with -$1, but that simply replaces all the a's, with the result --"bb-b--vv"kdjh-s"bb-b--vv"x. I'm attempting PCRE regex that will let me recursively replace the quote-delimited text.
Edit: Quote marks within a quoted string are escaped with an extra quote, e.g. "". However, assume I will have already replaced these in a previous pass with a special character. Therefore a regex solution to this problem will not have to deal with escaped quotes.
It is hard to tell if this is possible as you've only provided one line of input text.
But assuming that input follows this pattern:
BOL|any text|string with two groups of a's|any text|string with two groups of a's|any text|EOL
aa "bbabaavv" kdjhas "bbabaavv" x
I was able to create this regexp search string:
^(.+?\".+?)([a]+)(.+?)([a]+)(.*?\")(.+?\".+?)([a]+)(.+?)([a]+)(.*?\".*)$
With this replace string:
\1-\3-\5\6-\8-\A
and it turns your input string from this:
aa"bbabaavv"kdjhas"bbabaavv"x
into this:
aa"bb-b-vv"kdjhas"bb-b-vv"x
Naturally, the search and replace will fail if the input varies from the pattern described, as the search is looking for those four groups of a's inside the two quoted strings.
Also, I tested that regexp using Zeus, which can create a regexp with more than 9 groups.
As you can see, the regexp requires 10 groups.
I'm not familiar with Notepad++, so I don't know if it supports that many groups.
If your data has a variable number of quoted strings, then it is not possible to perform the replacements with regex alone, at least in the form offered by Notepad++.
To do this with regex, you would need to run a regex find within an existing regex match. As far as I know, such functionality is not available in Notepad++ regexes.
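Outside Notepad++, a "find within a match" is easy to express with a replacement callback. The following is only a sketch in Python, assuming (as the question's edit states) that escaped quotes have already been removed so quotes always pair up; dash_inside_quotes is a hypothetical name:

```python
import re

# One complete quote pair, including both quote characters.
QUOTED = re.compile(r'"[^"]*"')

def dash_inside_quotes(text, target="a", repl="-"):
    """Replace target with repl only inside double-quoted runs."""
    return QUOTED.sub(lambda m: m.group(0).replace(target, repl), text)

print(dash_inside_quotes('aa"bbabaavv"kdjhas"bbabaavv"x'))
```

Because each match consumes a full quote pair, the text between the first closing quote and the next opening quote (kdjhas) is never treated as quoted.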
Self-answer
I may have been reaching for the stars in trying to get Notepad++ to do this regex replace, but I think I found a workaround.
The actual task I was attempting involved creating a SQL Server VALUES list from an Excel spreadsheet, where I was copying and pasting selected cells into Notepad++. The delimiters are \t and \r\n. But, cells can have linefeeds too, which are delimited by ". So, I was going to replace these linefeeds with <br> (or something like it), so that
"line1
line2"
would become "line1<br>line2", before processing the actual end-of-row line feeds.
Having such parsing work reliably, especially when more than two lines were in a single cell, may have been too much to ask of Notepad++'s regex capability.
So I came up with a workaround that seems to be working :) Basically, it starts with selecting a blank "dummy" column to the right of my column selection (which I can insert if I'm partially selecting from the middle). This leaves a trailing \t at the end of each row, which effectively sets these EOLs apart from ones that might exist within a text cell, freeing me from having to parse line feeds from a "..." field.
So I compiled a macro from the following steps, which seems to be working well:
replace ' with ''
replace \t\r\n with '\)\r\n, \('
replace \t with ', '
replace "" with ''
replace " with <blank>
replace ^ with \(' (cleanup - first row only)
replace ^, \('$ with <blank> (cleanup - last row only)
Example transformation:
from
line1 line 2
"line3
line3b
line3c" line 4
to
('line1', 'line 2')
, ('line3
line3b
line3c', 'line 4')
which can now be easily modified into a SELECT statement:
SELECT *
FROM (VALUES('line1', 'line 2')
, ('line3
line3b
line3c', 'line 4')
) t(a,b)
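If a script is acceptable instead of a macro, the same transformation can be sketched with Python's csv module, which already understands tab-delimited text with "..." cells containing line feeds and "" escapes. This assumes the pasted text follows that Excel-style quoting, and it does not need the dummy column; values_list is an illustrative name:

```python
import csv
import io

def values_list(clipboard_text):
    """Turn tab-separated text (Excel-style quoting) into SQL VALUES rows."""
    # newline="" lets the csv module see the original line endings,
    # so line feeds inside quoted cells are kept as part of the cell.
    reader = csv.reader(io.StringIO(clipboard_text, newline=""), delimiter="\t")
    rows = []
    for cells in reader:
        if not cells:  # skip completely blank lines
            continue
        # Double single quotes, as SQL string literals require.
        quoted = ("'" + cell.replace("'", "''") + "'" for cell in cells)
        rows.append("(" + ", ".join(quoted) + ")")
    return "\n, ".join(rows)

sample = 'line1\tline 2\r\n"line3\nline3b\nline3c"\tline 4\r\n'
print(values_list(sample))
```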

Replace a comma in text values in CSV using regex in Notepad++

I searched a lot but couldn't find an exact solution.
I have a CSV in which some values contain a comma.
Following is a sample row
"BEIAAGJIPAMBPJIF",2757,08042010,"13:53.59",09042010,"01:55.39","SIHAM","BEIAIGHEIPLGPJIF",20,"A",20,"S",0.00,0.00,0.00,"OLY
SPECIAL ORDER","IN STOCK , DESIGNER",0.00000,0,"","N","N",
Now if you look at the value "IN STOCK , DESIGNER", it contains a comma in the middle. Because of this, when the CSV is read in my .NET application and in the MS Dynamics CRM import file wizard, the value is split into two separate values instead of one.
I need a regex that can match such strings and replace the comma with a hyphen "-", which I can use in Notepad++.
Kindly help.
Thanks.
This solution worked for me, although it is a bit indirect:
1. By searching, find a character that is unused in the file, e.g. #
2. Use the following regex replace to replace all delimiters: find: (".*?"|.*?), replace: \1# (note the character from step 1)
3. Now the only commas left are those inside the quotes. Mass-replace them with -
4. Replace all #'s back with commas
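The steps above can be sketched in Python to see what each pass does. The function name and the shortened sample row are made up for illustration, and the sketch assumes quoted values contain no escaped quotes:

```python
import re

def hyphenate_inner_commas(row, placeholder="#"):
    """Replace commas inside quoted values with '-', via a placeholder."""
    # Step 1: the placeholder must not already occur in the data.
    if placeholder in row:
        raise ValueError("placeholder already occurs in the row")
    # Step 2: turn every delimiter comma into the placeholder.
    marked = re.sub(r'(".*?"|.*?),', lambda m: m.group(1) + placeholder, row)
    # Step 3: any comma still present sits inside quotes.
    marked = marked.replace(",", "-")
    # Step 4: restore the delimiters.
    return marked.replace(placeholder, ",")

row = '"SIHAM",20,"A","IN STOCK , DESIGNER",0.00000,0,"","N",'
print(hyphenate_inner_commas(row))
```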

Notepad++ regex replace - replace all commas with \, within quotations

I am trying to import a csv file into mysql, and I need to convert it into a proper format before importing.
If there's a comma in a column, the csv encloses it within double quotations, here's an example of a row without a comma, and a row with a comma:
1,Superman
2,"Batman,Flash"
What I need to do is to convert all columns which have commas to escape the comma and remove the quotations... such as "Batman,Flash" to Batman\,Flash
Here's what I have so far
Find: "(.*),(.*)"
Replace: \1\\,\2
However, there are two cases in which this does not work:
It will only replace one comma if there's more than one comma within a quoted column. So something like "Batman,Flash,Robin" will be converted to Batman,Flash\,Robin
This doesn't work if the first column has a comma as well. For example, on a row such as "1,2,3","Batman,Robin"
How can I change the regexes to accommodate the two cases that don't yet work?
I'm sorry, but regex is not the tool for this. You must parse it.
Why?
Do you want to convert this?
"test\, w00t!"
Or what about this?
"test\\\\\, w00t!"
Heck, even this?
"tes\\","\"ing\,\\,"
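If a script is an option, one way to "parse it" is to let a CSV library read the quoted input and write it back with escaped delimiters. A minimal sketch with Python's csv module follows; escape_commas is a hypothetical name, and the sketch assumes the fields contain no backslashes or embedded quotes, since inputs like the ones above need a decision about what the intended output is:

```python
import csv
import io

def escape_commas(csv_text):
    """Re-emit CSV without quotes, escaping in-field commas as \\, instead."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO()
    writer = csv.writer(
        out,
        quoting=csv.QUOTE_NONE,  # never wrap fields in quotes
        escapechar="\\",         # prefix in-field delimiters with a backslash
        lineterminator="\n",
    )
    writer.writerows(reader)
    return out.getvalue()

print(escape_commas('1,Superman\n2,"Batman,Flash"\n'))
```

Both cases from the question (several commas in one column, and a quoted first column) go through the same code path, because the parser splits fields before any comma is escaped.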