How to replace only part of found text? - regex

I have a file with some comma-separated names and some comma-separated account numbers.
Names will always be something like Dow, John and numbers like 012394,19862.
Using Notepad++'s "Regex Find" feature, I'd like to replace commas between numbers with pipes |.
Basically, turn:
Dow,John
12345,09876
13568,08642
into:
Dow,John
12345|09876
13568|08642
I've been using [0-9], to find the commas, but I can't get it to properly leave the number's last digit and replace just the comma.
Any ideas?

Search for ([0-9]), and replace it with \1|. Does that work?

use this regex
(\d),(\d)
and replace it with
$1|$2
OR
\1|\2

(?<=\d), should work. Oddly enough, this only works if I use replace all, but not if I use replace single. As an alternative, you can use (\d), and replace with $1|

General thoughts about replacing only part of a match
In order to replace a part of a match, you need to either 1) use capturing groups in the regex pattern and backreferences to the kept group values in the replacement pattern, or 2) lookarounds, or 3) a \K operator to discard left-hand context.
So, if you have a string like a = 10, and you want to replace the number after a = with, say, 500, you can
find (a =)\d+ and replace with \1500 / ${1}500 (if you use $n backreference syntax and it is followed with a digit, you should wrap it with braces)
find (?<=a =)\d+ and replace with 500 (since (?<=...) is a non-consuming positive lookbehind pattern and the text it matches is not added to the match value, and hence is not replaced)
find a =\K\d+ and replace with 500 (where \K makes the regex engine "forget" the text it has matched up to the \K position, making it similar to the lookbehind solution, but allowing any quantifiers, e.g. a\h*=\K\d+ will match a = even if there are zero or more horizontal whitespace characters between a and =).
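As a rough illustration, the first two approaches can be sketched with Python's re module. This is only a sketch under a few assumptions: the sample string a = 10 comes from the text above, the patterns include the space after = so that they match that sample, and \K is left out because the standard re module does not support it (the third-party regex package does).

```python
import re

text = "a = 10"

# 1) Capturing group: \g<1> is Python's unambiguous form of \1, needed here
#    because the replacement continues with the digits 500.
by_group = re.sub(r"(a = )\d+", r"\g<1>500", text)

# 2) Lookbehind: "a = " must precede the digits but is not part of the match,
#    so only the number itself is replaced.
by_lookbehind = re.sub(r"(?<=a = )\d+", "500", text)

print(by_group, "|", by_lookbehind)
```

Note that lookbehinds in Python's re must have a fixed width, which is one reason the \K form is attractive in engines that offer it.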
Current problem solution
In order to replace any comma in between two digits, you should use lookarounds:
Find What: (?<=\d),(?=\d)
Replace With: |
Details:
(?<=\d) - a positive lookbehind that requires a digit immediately to the left of the current location
, - a comma
(?=\d) - a positive lookahead that requires a digit immediately to the right of the current location.
Variations:
Find What: (\d),(?=\d)
Replace With: \1|
Find What: \d\K,(?=\d)
Replace With: |
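Note: if there are comma-separated single digits, e.g. 1,2,3,4, you can't use (\d),(\d): the second group consumes the digit after the comma, so that digit is no longer available as the left-hand context of the next comma, and only every other comma gets replaced.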
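The difference between the lookaround pattern and the two-group pattern can be checked with a short Python sketch. The sample data is taken from the question; the variable names are only for illustration.

```python
import re

LOOKAROUND = re.compile(r"(?<=\d),(?=\d)")  # zero-width checks on both sides
TWO_GROUPS = re.compile(r"(\d),(\d)")       # consumes a digit on both sides

names_and_numbers = "Dow,John\n12345,09876\n13568,08642"
piped = LOOKAROUND.sub("|", names_and_numbers)

single_digits = "1,2,3,4"
all_commas = LOOKAROUND.sub("|", single_digits)         # every comma replaced
every_other = TWO_GROUPS.sub(r"\1|\2", single_digits)   # the middle comma survives

print(piped)
print(all_commas, every_other)
```

The name line is left alone because its comma has no digit on either side.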

Related

RegEx for adding a zero between a dash and number [duplicate]

I want to find a way to add a leading zero "0" in front of numbers, but BBEdit thinks my replacement refers to substitute #10. Example:
Original string: Video 2-1: Title Goes Here
Desired result: Video 2-01: Title Goes Here
My find regex is: (-)(\d:)
My replace regex is: \10\2. The first substitute is NOT 10. I simply intend to replace the first position, then add a "0", then replace the second position.
Kindly tell me how to tell BBEdit that I want to add a zero and that I don't mean 10th position.
If you simply need a number preceded by a dash, then I recommend using the regex lookbehind for this one.
Try this out:
(?<=-)(\d+:)
It tells the regex that the match should be preceded by a dash -, and the - itself won't be matched!
You really don't need to capture the hyphen in group 1: it is a fixed string, so there is no benefit in capturing it and re-inserting it with \1. Instead, match the hyphen and capture only the digit part using -(\d+:), and use -0\1 as the replacement.
Also, there are other better ways to make the replacement where you don't need to deal with back references at all.
Another alternate solution is to use this look around based regex,
(?<=-)(?=\d+:)
and replace it with just 0 which will just insert a zero before the digit.
Another alternate solution when lookbehind is not supported (like in Javascript prior to EcmaScript2018), you can use a positive look ahead based solution. Basically match a hyphen - which is followed by digits and colon using this regex,
-(?=\d+:)
and replace it with -0
Try \1\x30\2 as the replacement. \x30 is the hex escape for the 0 character, so the replacement is \1, then 0, then \2, and cannot be interpreted as \10 then 2. I don't know if BBEdit supports hex escapes in the replacement string though.
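For comparison, here is how the same replacement might look in Python, whose re module has the same ambiguity (as far as I can tell, a replacement of \10 is read there as a reference to group 10). This is a sketch of the ideas in the answers above, not a statement about BBEdit's own syntax; the \g<1> form is specific to Python.

```python
import re

title = "Video 2-1: Title Goes Here"

# \g<1> makes the group number explicit, so the literal 0 cannot be read as part of it.
with_groups = re.sub(r"(-)(\d:)", r"\g<1>0\2", title)

# Zero-width match between the dash and the digit: the replacement is a pure insertion.
with_lookarounds = re.sub(r"(?<=-)(?=\d+:)", "0", title)

# Lookahead only: the dash is matched and written back together with the zero.
with_lookahead = re.sub(r"-(?=\d+:)", "-0", title)

print(with_groups)
print(with_lookarounds)
print(with_lookahead)
```

The two lookaround variants avoid backreferences altogether, so the question of how a digit after \1 is parsed never comes up.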
This expression might help you to do so, if Video 2- is a fixed input:
(Video 2-)(.+)
If you have other instances, you can add left boundary to this expression, maybe something similar to this:
([A-Za-z]+\s[0-9]+-)(.+)
Then, you can simply replace it with a leading zero after capturing group $1:
If you wish, you can add additional boundaries to the expression.
Replacement
For replacing, you can simply use \U0030 or \x30 instead of zero, whichever your program might support, in between $1 and $2.

Notepad++ Replace regex match for same text plus appending character

I have a file with text and numbers with a length of five (i.e. 12000, 11153, etc.). I want to append all of these numbers with a 0. So 11153 becomes 111530. Is this possible in Notepad++?
I know I can find all numbers with the following regex: [0-9]{5}, but how can I replace these with the same number, plus an appending 0?
In the replacement box I tried the following things:
[0-9]{5}0 - Which it took literally, so 11153 was replaced with [0-9]{5}0
\10 - I read somewhere that \1 would take the match, but it doesn't seem to work. This will replace 11153 with 0
EDIT: \00 - Based on this SO answer I see I need to use \0 instead of \1. It still doesn't work though. This will replace 11153 with
So, I've got the feeling I'm close with the \1 or \0, but not close enough.
You are very near to the answer! What you missed is a capturing group.
Use this regex in "Find what" section:
([0-9]{5})
In "Replace with", use this:
\10
The ( and ) represent a capturing group. This essentially means that you capture your number, and then replace it with the same followed by a zero.
You are very close. You need to add a capturing group to your regex by surrounding it with brackets. ([0-9]{5})
Then use \10 as the replacement. This is replacing the match with the text from group 1 followed by a zero.
You can use \K to reset.
\b\d{5}\b\K
And replace with 0
\b matches a word boundary
\d is shorthand for a digit, [0-9]
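The effect of the word boundaries is easy to see in a small Python sketch. Python's re has no \K, so this version uses a capturing group for both variants; the sample text is made up for illustration.

```python
import re

text = "ids: 12000, 11153 and 123456"

# Group 1 is the five digits; \g<1>0 writes them back followed by a literal 0.
plain = re.sub(r"([0-9]{5})", r"\g<1>0", text)

# With word boundaries, only standalone five-digit numbers are touched.
bounded = re.sub(r"\b([0-9]{5})\b", r"\g<1>0", text)

print(plain)
print(bounded)
```

Without \b, the first five digits of the six-digit number also match, which is probably not what is wanted when the file mixes numbers of different lengths.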

Removing comma between numbers in CSV using regex in Sublime

I'm very new to regex. Pardon me for silly questions.
I was wondering if it was possible to use regex pattern matcher to replace commas in between numbers such as, $3,542 with $3542 in Sublime Editor.
I tried to use [0-9],[0-9][0-9][0-9] to detect all such occurrences but don't know why I can't retain just numbers :/
Puzzled me!
You may use capturing groups to retain digits:
(\$\d+),(\d+)
and replace with $1$2. You may remove \$ if you do not care if it is a currency or not.
The (\$\d+),(\d+) regex matches:
(\$\d+) - Group 1 matching $ as a literal symbol followed with 1 or more digits
, - a literal comma
(\d+) - Group 2 matching 1 or more digits
The $1 and $2 are backreferences that retrieve the texts stored in the memory buffers for both groups.
Note that there are other ways to do the same, you can use lookarounds or a regex with \K, or using both, but capturing seems to me the most efficient solution for this case.
Ctrl + H, select "regular expression" (Alt + R) and replace:
\$\d+\K,(?=\d)
with nothing.
Explanation:
\$\d+\K will match a dollar sign followed by one or more digits; \K then resets the start of the reported match, so the text matched so far is not replaced (the effect is similar to a positive lookbehind, although \K is not itself a lookbehind). The next token "," matches a comma, and finally a positive lookahead requires a digit after the comma.
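Both patterns above handle one comma per amount. If amounts such as $1,234,567 can occur, one possible approach (an assumption on my part, not something from the answers) is to match the whole amount and strip its commas in a callback. A minimal Python sketch, with a hypothetical helper name:

```python
import re

# One currency amount with comma-grouped thousands, e.g. $3,542 or $1,234,567.
AMOUNT = re.compile(r"\$\d{1,3}(?:,\d{3})+")

def strip_thousands_commas(text: str) -> str:
    """Remove the commas inside each matched dollar amount only."""
    return AMOUNT.sub(lambda m: m.group().replace(",", ""), text)

# The capture-group pattern from the answer, for comparison.
single = re.sub(r"(\$\d+),(\d+)", r"\1\2", "$3,542")
partial = re.sub(r"(\$\d+),(\d+)", r"\1\2", "$1,234,567")

print(single, partial)
print(strip_thousands_commas("a,$3,542,b,$1,234,567"))
```

Commas that separate fields are left alone because they are not followed by exactly three digits belonging to a dollar amount.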

Regex: Find multiple matching strings in all lines

I'm trying to match multiple strings in a single line using regex in Sublime Text 3.
I want to match all values and replace them with null.
Part of the string that I'm matching against:
"userName":"MyName","hiScore":50,"stuntPoints":192,"coins":200,"specialUser":false
List of strings that it should match:
"MyName"
50
192
200
false
Result after replacing:
"userName":null,"hiScore":null,"stuntPoints":null,"coins":null,"specialUser":null
Is there a way to do this without using sed or any other substitution method, but just by matching the wanted pattern in regex?
You can use this find pattern:
:(.*?)(,|$)
And this replace pattern:
:null\2
The first group will match any symbol (dot) zero or more times (asterisk) with this last quantifier lazy (question mark), this last part means that it will match as little as possible. The second group will match either a comma or the end of the string. In the replace pattern, I substitute the first group with null (as desired) and I leave the symbol matched by the second group unchanged.
Here is an alternative to amaurs' answer where it doesn't put the comma in after the last substitution:
:\K(.*?)(?=,|$)
And this replacement pattern:
null
This works like amaurs' pattern but starts matching after the colon is found (using \K to reset the match starting point) and matches until a comma or the end of the line (using a positive lookahead).
I have tested and this works in Sublime Text 2 (so should work in Sublime Text 3)
Another slightly better alternative to this is:
(?<=:).+?(?=,|$)
which uses a positive lookbehind instead of resetting the regex starting point
Another good alternative (so far the most efficient here):
:\K[^,]*
This may help.
Find: (?<=:)[^,]*
Replace: null
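Two of the patterns above can be tried outside the editor with Python's re module (the \K variants are omitted because re does not support \K). This is a sketch on the sample line from the question; it assumes, as the patterns themselves do, that no value contains a comma or a colon.

```python
import re

line = ('"userName":"MyName","hiScore":50,"stuntPoints":192,'
        '"coins":200,"specialUser":false')

# Group-based version: the value is matched lazily and the delimiter
# (a comma, or nothing at the end of the line) is written back via \2.
with_groups = re.sub(r":(.*?)(,|$)", r":null\2", line)

# Lookbehind version: only the value itself is matched, so the
# replacement is just the word null.
with_lookbehind = re.sub(r"(?<=:)[^,]*", "null", line)

print(with_groups)
print(with_lookbehind)
```

For real JSON, where strings may contain commas or colons, a JSON parser would be the more robust tool.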

Non-brute force regex to remove commas numbers in CSV list

The main thing I am trying to do here is learn regex so that I have a better understanding of it. What I am trying to do is a find and replace using regex to remove only the commas that are within the numbers.
I can do this using multiple find/replace patterns, and I can also do this using a brute force method of matching a large number and ignoring commas, however I am wondering if there is some way to place the numbers and comma into a capture group but ignore the commas from output.
Here is an example of a list of numbers:
"7,033.00","0.00","7,033.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00",1,1,1,!!$,,"123,123,123.00","123,444,38.01"
So my 'brute-force' method is the following:
\"([0-9]+)[,]?([0-9]*)[,]?([0-9]*)[,]?([0-9]*[.]+[0-9]+)\"
This would account for any number up to 999,999,999,999.00. It contains the four capture groups $1$2$3$4 and will output any number I would expect in the format that I want.
Example of wanted output using a replace of $1$2$3$4:
7033.00,0.00,7033.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,1,1,1,!!$,,123123123.00,12344438.01
What I would like to do is something like this (pseudo code):
[\"]([0-9]+)([(?:,)[0-9]*][.]+[0-9]+)[\"]
The idea behind this is:
Match the first quotation mark but ignore it
Match a group of numbers and place in capture group $1
Match either a number or comma followed by a period and one or more numbers and store in a capture group, but leave the commas out of the capture group.
Match the last quotation mark but ignore it
I've been reading and reading but can't seem to find a way to ignore part of a capture group the way I want to do it. Any suggestions or can it not be done?
A two step method would be to match the commas first then remove the quotes, which might work too:
(,)(?=([0-9]{2,3}[.,]))
Well, regexr uses ECMAScript regex, so you might use something like
"|([0-9]),(?=[0-9])(?=(?:[^"]*"[^"]*")*[^"]*"[^"]*$)
And replace with $1.
Otherwise, with PCRE, you might use something like:
"|(?<=[0-9]),(?=[0-9])(?=(?:[^"]*"[^"]*")*[^"]*"[^"]*$)
And replace with nothing; this version uses lookarounds to make sure that the comma in question is surrounded by [0-9] (ECMAScript did not support lookbehinds before ES2018).
" matches a literal quote character.
| means OR, so the regex matches a " or a ([0-9]),(?=[0-9]) (or (?<=[0-9]),(?=[0-9]))
([0-9]) is a capture group to get one digit.
, matches a literal comma.
(?=[0-9]) is a positive lookahead and ensures that the comma is followed by a digit, without matching the digit itself.
(?<=[0-9]) is a positive lookbehind and ensures that the comma is preceded by a digit, again without matching the digit itself.
(?=(?:[^"]*"[^"]*")*[^"]*"[^"]*$) ensures that there are an odd number of quotes ahead, and this in turn means that this will match a comma only within quotes, assuming that there are no unbalanced or escaped quotes.
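The capture-group variant can be sketched in Python to see the pieces working together. The input is a shortened version of the row from the question, and clean_row is a hypothetical helper name. A callback is used instead of a $1 replacement so that the bare-quote branch, where group 1 did not take part in the match, simply yields an empty string.

```python
import re

# Either a quote, or a digit followed by a comma that has a digit after it
# and an odd number of quotes ahead (so the comma sits inside a quoted field).
PATTERN = re.compile(r'"|([0-9]),(?=[0-9])(?=(?:[^"]*"[^"]*")*[^"]*"[^"]*$)')

def clean_row(row: str) -> str:
    # Keep the captured digit when there is one; a bare quote is dropped.
    return PATTERN.sub(lambda m: m.group(1) or "", row)

row = '"7,033.00","0.00",1,1,1,!!$,,"123,123,123.00"'
print(clean_row(row))
```

The unquoted 1,1,1 keeps its commas because an even number of quotes follows each of them.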
In two steps:
First remove all commas within quotes, i.e. commas that are followed by an odd number of quotes. This even works with escaped quotes, since in CSV files quotes are escaped by doubling:
>>> import re
>>> s = '"7,033.00","0.00","7,033.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00",1,1,1,!!$,,"123,123,123.00","123,444,38.01"'
>>> s = re.sub(r',(?!(?:[^"]*"[^"]*")*[^"]*$)', '', s)
>>> s
'"7033.00","0.00","7033.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00",1,1,1,!!$,,"123123123.00","12344438.01"'
Then remove all the quotes:
>>> s.replace('"', '')
'7033.00,0.00,7033.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,1,1,1,!!$,,123123123.00,12344438.01'