NotePad++ Currency RegEx with Optional Replace - regex

My Search pattern: \"(\$)(\d{0,3}?)\,?(\d{1,3}?)\,?(\d{0,3})\s?\"
Matches all of these:
"$1"
"$10"
"$100"
"$1,000"
"$10,000 "
"$100,000"
"$1,000,000 "
"$10,000,000"
"$100,000,000"
I know I don't really need to search for under the thousands place, but am including those for possible future application.
My problem: I need to replace all of the commas with the HTML escape for a comma, &#44;, but only if there is a comma present in the search result.
This replace pattern $1$2&#44;$3&#44;$4 gives the incorrect result, and I'm just not seeing the right pattern to use for my replacement.
$&#44;1&#44;
$&#44;1&#44;0
$&#44;1&#44;00
$&#44;1&#44;000
$&#44;10&#44;000
$&#44;100&#44;000
$1&#44;000&#44;000
$10&#44;000&#44;000
$100&#44;000&#44;000
This is the result I am attempting to get:
$1
$10
$100
$1&#44;000
$10&#44;000
$100&#44;000
$1&#44;000&#44;000
$10&#44;000&#44;000
$100&#44;000&#44;000
No Quotes and no extra space after the last digit.
I'm not married to having to find the 1's through 100's, but it is preferable.
Any ideas on how to do optional replace in NotePad++?

Use a regex Search and Replace: replace (\d),(\d) with \1&#44;\2. Check "Regular expression", then click Replace or Replace All.
For some unknown reason, Sebastian's regex from the comments above did not work with Notepad++ 6.8.6 (Find worked fine, but Replace did not). So instead of using lookaround, we capture the surrounding digits into \1 and \2 for reuse in the replacement.
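Outside Notepad++, the same idea can be sketched in Python. This is only an illustration: it assumes &#44; is the escape wanted, and the function names and the one-pass variant that also drops the quotes and the trailing space are made up for this example, not taken from the answers.

```python
import re

# Assumption: "&#44;" is the HTML escape the asker wants for a comma.
COMMA_ENTITY = "&#44;"


def escape_with_groups(text: str) -> str:
    """Capture the digit on each side of the comma and put both back."""
    return re.sub(r"(\d),(\d)", r"\1" + COMMA_ENTITY + r"\2", text)


def escape_with_lookaround(text: str) -> str:
    """Match only the comma itself; the digits are asserted, not consumed."""
    return re.sub(r"(?<=\d),(?=\d)", COMMA_ENTITY, text)


# Hypothetical one-pass variant: match a quoted amount, drop the quotes and
# the optional trailing space, and escape whatever commas happen to be there.
QUOTED_AMOUNT = re.compile(r'"(\$\d{1,3}(?:,\d{3})*)\s?"')


def clean_quoted_amounts(text: str) -> str:
    return QUOTED_AMOUNT.sub(
        lambda m: m.group(1).replace(",", COMMA_ENTITY), text
    )
```

The two escaping functions agree on the sample amounts. They differ only when a single digit sits between two commas (as in 1,2,3), because the capture-group form consumes the digit that the next match would need.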

Try following regex:
(?<=\d),(?=\d)
After running it against your data set, I got this result:
"$1"
"$10"
"$100"
"$1&#44;000"
"$10&#44;000 "
"$100&#44;000"
"$1&#44;000&#44;000 "
"$10&#44;000&#44;000"
"$100&#44;000&#44;000"

Related

Find commas in pattern

I have file with rows like this:
"B4P(6-3,5)-VH(LF)(SN)",JST,2018+,34000,SMD
893D226X0016C8W,VISHAY,2018+,"30,000",SMD
BL-BUF1V4V-AT-L,FOXLINK,2018+,1890,CONN
"TLP721F(D4-GR,M,F)",NSC,2001+,114,AUCDIP-16
How can I find all commas inside quotes? For example, I need to find this:
"B4P(6-3 >>,<< 5)-VH(LF)(SN)",JST,2018+,34000,SMD
893D226X0016C8W,VISHAY,2018+,"30 >>,<< 000",SMD
BL-BUF1V4V-AT-L,FOXLINK,2018+,1890,CONN
"TLP721F(D4-GR >>,<< M >>,<< F)",NSC,2001+,114,AUCDIP-16
So far I can only match the text in quotes. How can I select only the commas inside it, using a single regular expression?
("(?:\[??[^\[]*?"))
Here is a simplistic solution that works with your example:
It matches only quoted strings that have one or more , inside.
grep '"[^,]*,[^"]*"'
Hope it works for you.
Explanation
"[^,]* matches " and the following non-, characters
, matches the first , character
[^"]*" matches the following non-" characters up to the next "
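If a script is an option, a two-step approach may be easier than a single expression: match each quoted field first, then handle the commas inside that match. The sketch below is in Python and assumes quoted fields never contain an escaped quote; the function name and the >> << markers (copied from the question) are only for illustration.

```python
import re

# A quoted field: an opening quote, any non-quote characters, a closing quote.
# Assumption: fields do not contain escaped quotes such as "" or \".
QUOTED_FIELD = re.compile(r'"[^"]*"')


def mark_quoted_commas(line: str) -> str:
    """Wrap every comma that sits inside a quoted field in >> << markers."""
    def mark(match: re.Match) -> str:
        # Only the text of the quoted field is rewritten here.
        return match.group(0).replace(",", " >>,<< ")

    return QUOTED_FIELD.sub(mark, line)
```

Commas that separate the CSV columns are left alone because they are never part of a quoted match.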

Remove columns from CSV

I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a Regex to remove the 5th and 6th column? The numbers in the 5th and 6th column are variable in length.
Another problem is that the User column can also contain a |, which makes it even worse.
I could use a macro to fix this, but the file is a few million lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open to suggestions on how to do this with another program or command line utility, on either Linux or Windows.
Match \|[^|]+\|[^|]+(\|[^|]+$)
Replace $1
Basically, anchor to the end of the line and remove the two columns before the last one (I assume columns can't be empty; replace + with * if they can).
If you need finer control than that, I'd recommend writing a Java or Python script to manually parse and rewrite the file for you.
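A script along those lines could look roughly like the following Python sketch. It splits each line from the right, so extra | characters in the user name do not matter. The function names are made up for this example, and it assumes the last three columns never contain the separator.

```python
from typing import Iterable, Iterator


def drop_columns_before_last(line: str, sep: str = "|") -> str:
    """Remove the two columns that precede the last column."""
    # Split from the right at most 3 times: everything else, col 5, col 6, hash.
    parts = line.rsplit(sep, 3)
    if len(parts) < 4:
        return line  # too few columns; leave the line untouched
    head, _fifth, _sixth, last = parts
    return head + sep + last


def rewrite(lines: Iterable[str]) -> Iterator[str]:
    """Lazily rewrite lines, so a multi-million-line file is never held in memory."""
    for line in lines:
        yield drop_columns_before_last(line.rstrip("\n")) + "\n"
```

To process a file, open an input and an output file and pass the input handle to rewrite, writing the result with writelines; the file names are up to you.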
I've captured three groups and given them names. If you use a replace utility like sed or vimregex, you can replace remove with nothing. Or you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
You may have to remove the group namings and use \1 etc. instead, depending on what environment you use.
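For reference, here is how the named-group version might look in Python, which spells named groups (?P<name>...). This is a sketch with a made-up function name. Note that the pattern counts columns from the left, so it gives the wanted result only for rows whose user name contains no |; the third sample row comes out wrong.

```python
import re

# Python uses (?P<name>...) for named groups instead of (?<name>...).
FIXED_COLUMNS = re.compile(
    r"^(?P<keep_before>(?:[^|]+\|){3})"  # first three columns
    r"(?P<remove>(?:[^|]+\|){2})"        # the next two columns, to be dropped
    r"(?P<keep_after>.*)$"               # everything that follows
)


def drop_by_position(line: str) -> str:
    """Concatenate keep_before and keep_after, discarding the 'remove' group."""
    m = FIXED_COLUMNS.match(line)
    if m is None:
        return line  # fewer than five columns; leave the line as it is
    return m.group("keep_before") + m.group("keep_after")
```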
From Notepad++ hit ctrl + h then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular Expression
Click replace and done.
Regex explanation:
\|\d+ : matches the first column to drop, a | followed by digits
\|\d+ : matches the second column to drop, a | followed by digits
(\|[0-9a-z]+) : matches and captures the column after the second number
$ : anchors the match to the end of the line
Replacement:
$1 : replaces the whole match with the captured group, i.e. whatever (\|[0-9a-z]+) matched

Select last character of a substring in regexp

I'm trying to clean a huge geoJson datafile. I need to change the format of "text" field from
"text": "(2:Placename,Placename)"
to
"text": "Placename".
In Sublime text I managed to write a regular expression which enabled me to select and remove the first part leaving something like this:
"text": "Placename)"
With following regexp I can select the text above, but I need to narrow it down to the last character:
text\": \".*?\)
No matter what I can't figure out how to select the ")" character in the end of Placename string in the whole file and remove it. Note that the "Placename" here can be any place name, like New York, London etc.
I tried to build an expression where first part finds the text field, then ignores n-amount of characters until it finds the ")" character.
After experimenting and Googling I couldn't find a solution here.
You can capture the value of the second placemark field with the following regexp:
/"text": "+\(\d+:[^,]+,(.*?)\)/
Which will capture "Placename" in $1
More info on capturing parenthesis: http://www.regular-expressions.info/brackets.html
The trick is to use the inverted character classes and to escape any parentheses you want to match.
HTH
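As a rough illustration of the capture-and-replace idea, here is a Python sketch. The function name is made up, and the pattern assumes the field always has the shape shown in the question, with the closing quote right after the closing parenthesis.

```python
import re

# "text": "(<digits>:<first name>,<second name>)"
# [^,]+ is the inverted character class: everything up to the first comma.
# The literal parentheses are escaped; the unescaped pair captures the name.
TEXT_FIELD = re.compile(r'"text": "\(\d+:[^,]+,(.*?)\)"')


def simplify_text_field(source: str) -> str:
    """Rewrite each matching field so only the second place name remains."""
    return TEXT_FIELD.sub(r'"text": "\1"', source)
```

Because the whole field is matched and rebuilt in one step, there is no leftover ) to clean up afterwards.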
I do not know if you are using a Unix system, but probably sed can do much of the work for you. It can interpret regular expressions, capture groups, and substitute by other groups of characters. I have tried an example with sed and the following sed command worked for me:
echo "\"text\": \"(2:Placename,Placename)\"" | sed -r 's/(\"text\": )\"\([[:digit:]]:[^0-9]+,([^0-9]+)\)\"/\1\"\2\"/g'
-r makes sed use extended regular expressions, so groups can be written with plain parentheses. I am using parentheses to capture groups that I will use later in the substitution (e.g., a group for "text", and a group for the second place name). In the replacement part of the sed command, \n refers to a group, where n is the number of the group you want to use. This expression should help you achieve your desired result.

Matching all occurrences of a html element attribute in notepad++ regex

I have a file which has hundreds of links like this:
<h3>aspnet</h3>
Ex 1
Ex 2
Ex 3
So I want to remove all the attributes like
icon="data:image/png;base64,ivborw0kggoaaaansuheugaaabaaaaaqcayaaaaf8..."
from all the lines. I went through the official Notepad++ regex wiki and have come up with this after several trials:
icon=\"[^\.]+\"
The problem with this is, it is selecting past the second double quote and stopping at the next occurring double quote. To illustrate, this will select the following content:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt...">EX 1</a> <a href="
If I modify the above regex to,
icon=\"[^\.]+\">
Then it is almost perfect, but it is also selecting the >:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt...">
The regex I am looking for would select like this:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt..."
I also tried the following, but it doesn't match anything at all
icon=\"[^\.]+\"$
Just match anything but a quote, followed by a quote:
icon="[^"]+"
Just tested with notepad++ 6.2.2 and confirmed that this matches correctly as written.
Broken down:
icon="
This is fairly obvious, match the literal text icon=".
[^"]+
This means to match any character that is not a ". Adding the + after it means "one or more times."
Finally we match another literal ".
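The difference between the two character classes can be checked with a small Python sketch. The sample markup below is hypothetical (the links from the question are not reproduced here), and the helper name is made up.

```python
import re

# Hypothetical sample line: two links, each with a (shortened) icon attribute.
SAMPLE = (
    '<a href="http://example.com/1" icon="data:image/png;base64,AAAA">Ex 1</a> '
    '<a href="http://example.com/2" icon="data:image/png;base64,BBBB">Ex 2</a>'
)

# [^.]+ means "anything except a dot", so it runs past the closing quote and
# backtracks to the last quote it can find before the next dot.
overshoot = re.search(r'icon="[^.]+"', SAMPLE).group(0)

# [^"]+ cannot cross a quote, so the match ends at the attribute's own quote.
ICON_ATTR = re.compile(r'\s*icon="[^"]+"')


def strip_icon_attributes(html: str) -> str:
    """Remove every icon="..." attribute together with the space before it."""
    return ICON_ATTR.sub("", html)
```

With this sample, overshoot ends at the quote after the next href=, which mirrors the behaviour described in the question.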
I am not a Notepad++ user, so I don't know how Notepad++ handles regex, but can you try replacing
icon=\"[^>]* with an empty string?
Try this solution:
I just checked this and it worked the way you wanted.
To achieve your goal:
Find what: (icon.*")|.*?
Replace with: $1

How do I access matched objects for replacement when using regular expression mode in PL/SQL Developer Find & Replace?

I have a query where I want to replace
avg(j2)
with
avg(case when j2 <> 0 then j2 else 0 end)
The above is a specific example but the pattern is the same with all the replacements. It's always a word followed by a number that needs to be replaced with the case statement that checks if the number is not 0.
I tried the following for find:
avg(\(\w\d\))
and the find works. Now, I want to do a replace so I try:
avg(case when \1 <> 0 then \1 else 0 end)
but it puts literal \1 and not the captured text from the match. I tried \\1 & $1 as well and it takes all of them literally. Can anyone tell me what the right syntax is for using the captured text for replacement? Is this supported?
Thanks,
Ashish
I am not sure whether the PL/SQL Developer IDE supports group capture. Recent versions do seem to support regex-based find and replace, though. I can't find a source to confirm whether group capture works.
Why don't you try pasting the code into something like Notepad++ and running the same regex there? It should work. You could paste the result back into your IDE and continue from there...
You can refer to captured groups using $ and a number,
such as $0 or $1. See the example below:
find: TABLE (.*\..*) IS
replace: COLUMN $1 IS
http://regexr.com/3gm6c
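If the IDE turns out not to substitute captured groups, one workaround is to run the replacement in a script and paste the result back. A minimal Python sketch follows. It says nothing about how PL/SQL Developer itself behaves; the function name is made up, and the pattern assumes the argument is letters followed by digits, as in j2.

```python
import re

# The outer \( \) match the literal parentheses of the call; the inner,
# unescaped pair captures only the column name (letters followed by digits).
AVG_CALL = re.compile(r"avg\(([A-Za-z]+\d+)\)")


def wrap_avg_arguments(sql: str) -> str:
    """Replace avg(x) with avg(case when x <> 0 then x else 0 end)."""
    # Python's re module refers to the first captured group as \1.
    return AVG_CALL.sub(r"avg(case when \1 <> 0 then \1 else 0 end)", sql)
```

Capturing the name without its parentheses keeps the replacement free of doubled brackets.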