Find commas in pattern - regex

I have a file with rows like this:
"B4P(6-3,5)-VH(LF)(SN)",JST,2018+,34000,SMD
893D226X0016C8W,VISHAY,2018+,"30,000",SMD
BL-BUF1V4V-AT-L,FOXLINK,2018+,1890,CONN
"TLP721F(D4-GR,M,F)",NSC,2001+,114,AUCDIP-16
How can I find all commas inside quotes? For example, I need to find these:
"B4P(6-3 >>,<< 5)-VH(LF)(SN)",JST,2018+,34000,SMD
893D226X0016C8W,VISHAY,2018+,"30 >>,<< 000",SMD
BL-BUF1V4V-AT-L,FOXLINK,2018+,1890,CONN
"TLP721F(D4-GR >>,<< M >>,<< F)",NSC,2001+,114,AUCDIP-16
Right now I can only match the quoted text. How do I select only the commas inside it, using a single regular expression?
("(?:\[??[^\[]*?"))
Regex101 - online regex editor and debugger

Here is a simplistic solution that works with your example:
It matches only quoted strings that have one or more , inside.
grep '"[^,]*,[^"]*"'
Hope it works for you.
Explanation
"[^,]* matches " and the following non-, characters
, matches the first , character
[^"]*" matches the following non-" characters up to the next "
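The grep above reports the whole quoted field rather than the commas themselves. As a rough sketch in Python (the function names are made up for illustration, and it assumes the quotes on each line are balanced), the commas can be located either in two steps or with one lookahead-based regex:

```python
import re

# A quoted field: an opening quote, any non-quote characters, a closing quote.
QUOTED_FIELD = re.compile(r'"[^"]*"')

# Single-regex variant: a comma that is followed by an odd number of quotes
# before the end of the line must sit inside a quoted field.
COMMA_IN_QUOTES = re.compile(r',(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)')


def commas_two_step(line):
    """Return the index of every comma that lies inside a quoted field."""
    positions = []
    for field in QUOTED_FIELD.finditer(line):
        for offset, char in enumerate(field.group()):
            if char == ",":
                positions.append(field.start() + offset)
    return positions


def commas_single_regex(line):
    """Same result, using one regular expression applied to a single line."""
    return [match.start() for match in COMMA_IN_QUOTES.finditer(line)]


if __name__ == "__main__":
    rows = [
        '"B4P(6-3,5)-VH(LF)(SN)",JST,2018+,34000,SMD',
        '893D226X0016C8W,VISHAY,2018+,"30,000",SMD',
        'BL-BUF1V4V-AT-L,FOXLINK,2018+,1890,CONN',
        '"TLP721F(D4-GR,M,F)",NSC,2001+,114,AUCDIP-16',
    ]
    for row in rows:
        print(commas_two_step(row), commas_single_regex(row))
```

The single-regex form has to be applied line by line, because the `$` anchor counts quotes only up to the end of the string it is given.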

Related

Remove all text except a certain string in Notepad++

I'm extracting case numbers from a wall of text. How do I filter out all the useless text using the replace function in Notepad++ with the help of RegEx? The parts I want to keep are made up of letters, digits, and a hyphen (SPP-1803-2045227).
I would like to turn this...
(SPP-1803-2045227)Useless text goes here. 2019-05-18 *
(SPP-1915-1802667)More useless text. 2019-01-14 *
(SPP-1904-1012523)And some more. 2019-02-03 *
...into this:
SPP-1803-2045227
SPP-1915-1802667
SPP-1904-1012523
I've been playing around with RegEx and also found something in another thread on here before, which wasn't the solution but came very close. Unfortunately I can't find it anymore. It looked something like this:
^(?!S\w+).*\r?\n?
Any help is appreciated.
You could try something like this.
find: .*\((\w{3}-\d{4}-\d{7})\).*
replace with: \1
The above regular expression matches the whole line and captures the case number (the letters and digits between the literal parentheses) in group 1.
When you replace with \1, you keep only that captured group.
Remember to select Regular Expression Search mode.
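For checking the pattern outside Notepad++, here is a minimal sketch of the same find/replace with Python's `re` module. The function name is invented for the example; the pattern and the sample lines come from the question and answer above.

```python
import re

# Whole line, with the case number inside literal parentheses captured as group 1.
CASE_LINE = re.compile(r'.*\((\w{3}-\d{4}-\d{7})\).*')


def keep_case_numbers(text):
    """Replace each matching line with just its captured case number."""
    return CASE_LINE.sub(r'\1', text)


if __name__ == "__main__":
    sample = (
        "(SPP-1803-2045227)Useless text goes here. 2019-05-18 *\n"
        "(SPP-1915-1802667)More useless text. 2019-01-14 *\n"
        "(SPP-1904-1012523)And some more. 2019-02-03 *"
    )
    print(keep_case_numbers(sample))
```

Lines that contain no case number are left unchanged, since the pattern does not match them.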

Regex to find specific character between two other characters

I've been trying to find a way to find a single comma between inverted commas without much luck. Example: "text , text " - how do I isolate the "," between the inverted commas line by line in a flat file?
My attempt .["].[,].["].
Thanks in advance
This regex will work:
(?<=truck).*(?=car)
finds e.g. "plane" in the string
truckplanecar
so for test,test the regex would be
(?<=test).*(?=test)
P.S. Can you please provide a more detailed example of what you would like to do?
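A quick way to see what the lookbehind/lookahead pair selects is to run it in Python. This is only an illustrative sketch (the helper name is hypothetical); the two patterns are the ones given in this answer.

```python
import re


def between(pattern, text):
    """Return the text matched by a lookaround pattern, or None if no match."""
    match = re.search(pattern, text)
    return match.group() if match else None


if __name__ == "__main__":
    # The surrounding words are asserted by the lookarounds but not consumed.
    print(between(r'(?<=truck).*(?=car)', 'truckplanecar'))
    print(between(r'(?<=test).*(?=test)', 'test,test'))
```

Only the part between the two anchors ends up in the match, which is why the comma alone is returned for `test,test`.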
Try using two groups, one at the start and one at the end of the string. The following regex should work:
(".*),(.*")
It does match the example you've shared:
"text , text "
Furthermore, using groups, you can isolate the text before and after the comma, in case you need it.
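To illustrate how the two groups split the string, here is a small Python sketch (the function name is an assumption made for the example). Because `.*` is greedy, the split happens at the last comma on the line when there are several.

```python
import re

# Group 1: opening quote and text before the comma; group 2: text after it and closing quote.
SPLIT_AT_COMMA = re.compile(r'(".*),(.*")')


def split_quoted(line):
    """Return (before, comma_index, after) for the matched comma, or None."""
    match = SPLIT_AT_COMMA.search(line)
    if match is None:
        return None
    # The comma sits immediately after the end of group 1.
    return match.group(1), match.end(1), match.group(2)


if __name__ == "__main__":
    print(split_quoted('"text , text "'))
```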

Regex find & replace data cleaning

I have a csv I would like to clean up in Notepad++ with regex and the find and replace tool.
I want to do something like: find ^"(\d+).* and replace with $1 so that
"25110716
"
and
"27155790
AirBnB-16261519-PBH2ED"
end up 25110716 and 27155790. These are the first entry in every row.
Right now using find ^"(\d+).* and replace with $1 in NPP finds the first entry in every row and returns it the same but missing the first quotation mark. I would like everything that isn't the first numbers removed, i.e. all quotation marks, and the linebreak & everything on the following line.
You could accomplish this in the following way ...
Find what : (?m)^"(\d+)\n.*?(?=,|$)
Replace with : $1
see regex demo / explanation
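The same find/replace can be tried in Python as a rough check. In this sketch the function name and the trailing columns (`a,b`, `c,d`) are invented, since the question does not show the rest of each row, and the text is assumed to use `\n` line endings.

```python
import re

# A leading quote, the digits (group 1), a newline, then lazily everything up
# to the next comma or end of line.
FIRST_ENTRY = re.compile(r'^"(\d+)\n.*?(?=,|$)', re.MULTILINE)


def clean_first_entries(text):
    """Collapse each broken first entry down to its leading number."""
    return FIRST_ENTRY.sub(r'\1', text)


if __name__ == "__main__":
    sample = (
        '"25110716\n'
        '",a,b\n'
        '"27155790\n'
        'AirBnB-16261519-PBH2ED",c,d\n'
    )
    print(clean_first_entries(sample))
```

For a file saved with Windows line endings, the `\n` in the pattern would likely need to become `\r?\n`.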

NotePad++ Currency RegEx with Optional Replace

My Search pattern: \"(\$)(\d{0,3}?)\,?(\d{1,3}?)\,?(\d{0,3})\s?\"
Matches all of these:
"$1"
"$10"
"$100"
"$1,000"
"$10,000 "
"$100,000"
"$1,000,000 "
"$10,000,000"
"$100,000,000"
I know I don't really need to search for under the thousands place, but am including those for possible future application.
My problem: I need to replace all of the commas with the HTML escape character for a comma (for example &#44;), but only if there is a comma present in the search result.
This replace pattern $1$2,$3,$4 gives the incorrect result, and I'm just not seeing the right pattern to use for my replacement.
$,1,
$,1,0
$,1,00
$,1,000
$,10,000
$,100,000
$1,000,000
$10,000,000
$100,000,000
This is the result I am attempting to get:
$1
$10
$100
$1,000
$10,000
$100,000
$1,000,000
$10,000,000
$100,000,000
No Quotes and no extra space after the last digit.
I'm not married to having to find the 1's through 100's, but it is preferable.
Any ideas on how to do optional replace in NotePad++?
Use a regex Search and Replace: replace (\d),(\d) with \1&#44;\2 (taking &#44; as the escaped comma). Check Regular expression, then click Replace or Replace All.
For some unknown reason, the RE of Sebastian from the comments above did not work with notepad++ 6.8.6 (find worked fine, but not replace). So instead of using look around, we capture the surrounding digits into \1 and \2 for reuse in the replacement.
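The capture-group approach and the look-around approach can be compared side by side in Python. This is a sketch under one stated assumption: the HTML escape for a comma is taken to be `&#44;`, and the function names are made up for the example.

```python
import re

ESCAPED_COMMA = "&#44;"  # assumption: numeric HTML entity for a comma


def escape_with_groups(text):
    """Capture the digits around each comma and put them back around the entity."""
    return re.sub(r'(\d),(\d)', r'\1' + ESCAPED_COMMA + r'\2', text)


def escape_with_lookaround(text):
    """Match only the comma itself; the neighbouring digits are asserted, not consumed."""
    return re.sub(r'(?<=\d),(?=\d)', ESCAPED_COMMA, text)


if __name__ == "__main__":
    for value in ['"$100"', '"$1,000"', '"$1,000,000 "']:
        print(escape_with_groups(value), escape_with_lookaround(value))
```

Values without a comma pass through untouched, which gives the "only if a comma is present" behaviour. The two versions agree on grouped thousands; they would differ on something like `1,2,3`, where the capture-group version consumes the digit it shares with the next comma.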
Try the following regex:
(?<=\d),(?=\d)
After running a test on your dataset, I got this result:
"$1"
"$10"
"$100"
"$1,000"
"$10,000 "
"$100,000"
"$1,000,000 "
"$10,000,000"
"$100,000,000"

Matching all occurrences of a html element attribute in notepad++ regex

I have a file which has hundreds of links like this:
<h3>aspnet</h3>
Ex 1
Ex 2
Ex 3
So I want to remove every attribute of this form
icon="data:image/png;base64,ivborw0kggoaaaansuheugaaabaaaaaqcayaaaaf8..."
from all the lines. I went through the official Notepad++ regex wiki and have come up with this after several trials:
icon=\"[^\.]+\"
The problem with this is, it is selecting past the second double quote and stopping at the next occurring double quote. To illustrate, this will select the following content:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt...">EX 1</a> <a href="
If I modify the above regex to,
icon=\"[^\.]+\">
Then it is almost perfect, but it is also selecting the >:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt...">
The regex I am looking for would select like this:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt..."
I also tried the following, but it doesn't match anything at all
icon=\"[^\.]+\"$
Just match anything but a quote, followed by a quote:
icon="[^"]+"
Just tested with notepad++ 6.2.2 and confirmed that this matches correctly as written.
Broken down:
icon="
This is fairly obvious, match the literal text icon=".
[^"]+
This means to match any character that is not a ". Adding the + after it means "one or more times."
Finally we match another literal ".
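The difference between the two character classes is easy to reproduce in Python. In this sketch the HTML line and its base64 value are invented stand-ins for the real file, and the function names are hypothetical.

```python
import re

# Invented sample; the real file has much longer base64 payloads.
HTML = ('<a href="a.html" icon="data:image/png;base64,iVBOR">Ex 1</a> '
        '<a href="b.html">Ex 2</a>')


def first_match(pattern, text):
    """Return the first match of pattern in text, or None."""
    match = re.search(pattern, text)
    return match.group() if match else None


def strip_icons(text):
    """Remove every icon="..." attribute using the negated-quote pattern."""
    return re.sub(r'icon="[^"]+"', '', text)


if __name__ == "__main__":
    # [^.]+ may run across quotes, so it overshoots into the next tag.
    print(first_match(r'icon="[^.]+"', HTML))
    # [^"]+ cannot cross a quote, so it stops at the attribute's closing quote.
    print(first_match(r'icon="[^"]+"', HTML))
    print(strip_icons(HTML))
```

Removing the attribute this way leaves the space that preceded it; a pattern such as `\s*icon="[^"]+"` would take that space out as well.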
I am not a Notepad++ user, so I don't know how Notepad++ handles regex, but can you try replacing
icon=\"[^>]* with an empty string?
Try this solution. I just checked that it works the way you wanted.
To achieve your goal:
Find what: (icon.*")|.*?
Replace with: $1