Regular expression challenge to match same numbers separately

I am struggling with a nice challenge: using a regex to match a number that is written twice in a row, and to capture just one copy of it.
Here is the list I am trying to match:
1,680,000,0001,680,000,000
3,350,0003,350,000
110,000110,000
11,100,00011,100,000
550,000550,000
1,0001,000
250250
49,50049,500
165,000165,000
49,50049,500
3,350,0003,350,000
165,000165,000
550,000550,000
550,000550,000
33,10033,100
18,10018,100
450,000450,000
Take for example 550,000550,000, which is 550,000 twice, or 250250, which is 250 twice. I want to match 550,000 and 250 respectively.
I have tested many regular expressions in RegexBuddy, but none of them does what I want. Maybe you have a suggestion?

If I understand your requirements correctly, then
^(.+)\1$
should work. You can restrict the possible matches to only allow digits and commas like this:
^([\d,]+)\1$
This matches a "double number" and keeps the first repetition in capturing group number 1. If you want your match only to contain the first repetition, then use
^([\d,]+)(?=\1$)
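As a minimal sketch of how these two patterns behave, here they are in Python's re module, which supports backreferences and lookahead. The names DOUBLE, FIRST_HALF and first_repetition are made up for this illustration.

```python
import re

# A run of digits/commas that is immediately repeated, spanning the whole string.
DOUBLE = re.compile(r"^([\d,]+)\1$")
# Lookahead variant: the overall match covers only the first repetition.
FIRST_HALF = re.compile(r"^([\d,]+)(?=\1$)")

def first_repetition(text):
    """Return the repeated number, or None if text is not a doubled number."""
    m = DOUBLE.match(text)
    return m.group(1) if m else None

for sample in ["550,000550,000", "250250", "1,0001,000", "123456"]:
    print(sample, "->", first_repetition(sample))
```

Because the group and its backreference together must cover the entire string, the group can only ever be exactly the first half of the input.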

Related

Regex capture into group everything from string except part of string

I'm trying to create a regex which will capture everything from a string except for specific parts of the string. The best place to start seems to be using groups.
For example, I want to capture everything except for "production" and "public" from a string.
Sample input:
california-public-local-card-production
production-nevada-public
Would give output
california-local-card
nevada
On https://regex101.com/ I can extract the strings I don't want with
(production|public)\g
But how to capture the things I want instead?
The following will kind of get me the word from between production and public, but not anything before or after https://regex101.com/r/f5xLLr/2 :
(production|public)-?(\w*)\g
Flipping it and going for \s\S actually gives me what I need in two separate subgroups (group2 in both matches) https://regex101.com/r/ItlXk5/1 :
(([\s\S]*?)(production|public))\g
But how to combine the results? Ideally I would like to extract them as a separate named group; this is where I've gotten to https://regex101.com/r/scWxh5/1 :
(([\s\S]*?)(production|public))(?P<app>\2)\g
But this breaks the group2 matchings and gets me empty strings. What else should I try?
Edit: This question boils down to this: How to merge regex group matches?
Which seems to be impossible to solve in regex.
A regexp match is always a contiguous range of the sample string. Thus, the answer is "No, you cannot write a regexp which matches a series of concatenated substrings as described in the question".
But this common kind of task is solved very easily by replacing the unwanted words with empty strings, like this:
s/-production|production-|-public|public-//g
(Or an equivalent in a language you're using)
Note. Provided that \b is supported, it would be more correct to spell it as
s/-production\b|\bproduction-|-public\b|\bpublic-//g
(to avoid matching words like 'subproduction' or 'publication')
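In Python, an equivalent of the substitution above might look like the following sketch. It uses re.sub with the same alternation; the names UNWANTED and strip_words are invented for the example.

```python
import re

# Same alternation as the s///g example, with \b guarding the word edges.
UNWANTED = re.compile(r"-production\b|\bproduction-|-public\b|\bpublic-")

def strip_words(text):
    """Delete 'production' and 'public' along with one adjoining hyphen."""
    return UNWANTED.sub("", text)

print(strip_words("california-public-local-card-production"))
print(strip_words("production-nevada-public"))
```

The word boundaries keep longer words such as 'subproduction' or 'publication' intact.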
Your regex is nearly there:
([\s\S]*?)(?>production|public)
But this results in multiple matches
Match 1
Full match 0-17 `california-public`
Group 1. 0-11 `california-`
Match 2
Full match 17-39 `-local-card-production`
Group 1. 17-29 `-local-card-`
So you have to match multiple times to retrieve the result.

Regular Expression for check of correct amount of fields

I have a certain file where the fields are separated by a comma.
bla,20171206123901,bla,
I want to check with a regular expression whether the line contains the correct number of fields, where the last comma is optional. In this example it should be exactly 3.
What is the correct regular expression for that?
I thought that maybe this one could work, but it does not:
(.*,){3}(,)[0,1]
because it also matches lines which have too many fields.
Any help would be really appreciated
Thank you
Here is a pattern which you can try applying to each line:
^[^,]*,[^,]*,[^,]*,?$
This assumes that if the optional third comma does appear, that it is the last thing on the line. Also, the pattern allows for empty columns. If this is not the case, then replace [^,]* with [^,]+ everywhere in the pattern.
Another way to write the above takes advantage of repeated columns:
^(?:[^,]*,){2}[^,]*,?$
Here, you may replace 2 with the number of desired columns minus one. So for 3 columns, you would use {2} in the pattern.
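A small Python sketch of the repeated-column pattern, parameterized by the number of columns. The helper name field_count_pattern is hypothetical, and lines are assumed to have their newline already stripped.

```python
import re

def field_count_pattern(n):
    """Build a pattern for exactly n comma-separated fields, trailing comma optional."""
    # n - 1 "field," groups, then the last field and an optional final comma.
    return re.compile(r"^(?:[^,]*,){%d}[^,]*,?$" % (n - 1))

three_fields = field_count_pattern(3)

for line in ["bla,20171206123901,bla,", "bla,20171206123901,bla", "a,b,c,d", "a,b"]:
    print(repr(line), bool(three_fields.match(line)))
```

Since empty columns are allowed, a line such as "a,b," also matches: it is read as three fields with an empty third one. Swapping [^,]* for [^,]+ rules that out.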
This one should work:
^(?:[^,]+,[^,]*){2},?$
You can change the number (the number of columns minus one) if you need to add more columns.

Regular expression to place number pair in square brackets

I have a large data file with sequences of numbers bearing the form
6.06038475036627,50.0646896362306\r\n
6.0563435554505,50.0635681152345\r\n
6.05446767807018,50.0632934570313\r\n
which I am trying to modify in Notepad++ so it reads
[6.06038475036627,50.0646896362306]\r\n
[6.0563435554505,50.0635681152345]\r\n
[6.05446767807018,50.0632934570313]\r\n
I can count the number of instances of these occurrences with a relatively simple regex \d{1,2}\.\d+\,\d{1,2}\.\d+. However, there my own regex skills hit the buffers. I am dimly aware that it is possible to go a step further and perform the actual modifications but I have no idea how that should be done.
You would simply need to do as follows:
Find what: (\d+\.\d+,\d+\.\d+)
Replace with: [\1]
Make sure that Regular Expression is checked.
Given this, it will transform this:
6.06038475036627,50.0646896362306\r\n
6.0563435554505,50.0635681152345\r\n
6.05446767807018,50.0632934570313\r\n
Into this:
[6.06038475036627,50.0646896362306]\r\n
[6.0563435554505,50.0635681152345]\r\n
[6.05446767807018,50.0632934570313]\r\n
The expression above will match the comma-separated numbers and put them in a group. The replacement injects a [, followed by the matched group (denoted by \1), followed by a ].
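For comparison, the same find-and-replace can be sketched outside Notepad++ with Python's re.sub, where \1 in the replacement string also refers to the first capture group. The names PAIR and bracket_pairs are made up for this example.

```python
import re

# One capture group around "number,number", as in the Find what field.
PAIR = re.compile(r"(\d+\.\d+,\d+\.\d+)")

def bracket_pairs(text):
    """Wrap every matched number pair in square brackets."""
    return PAIR.sub(r"[\1]", text)

data = "6.06038475036627,50.0646896362306\r\n6.0563435554505,50.0635681152345\r\n"
print(bracket_pairs(data))
```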
Try the following regexp (with substitution):
\b(\d{1,2}\.\d+,\d{1,2}\.\d+)\b
https://regex101.com/r/VkHppp/1

How to handle float numbers using regular expression in VB Script

I am trying to get a number via submatches from the string below, and I am not sure how to handle the case where the string contains either an integer (without a decimal part) or a float.
Please correct me where I am making a mistake in the code below.
str="Added Quantity:12.23 Pass"
Set oReg=New RegExp
oReg.pattern="(.*Quantity.*)+((\d{1,})|(\d{1,}\.\d{1,}))(.*)"
set r=oReg.execute(str)
for i=0 to r.count-1
print r.item(1).submatches(i)
next
Your expression will match numbers all right, but it will match in the wrong place. To see why, let’s just consider what (Quantity.*)(\d{1,}) matches in the following string:
Quantity:12.23
Here’s the result of that match:
Whole match: Quantity:12.23
Group 1: Quantity:12.2
Group 2: 3
The problem is that .* is greedy and matches as much as possible, including digits. It then backtracks only as far as needed for the second group (\d{1,}) to match at least one digit, so that group ends up with just the last digit. But you want to get all digits in there.
Several ways exist to solve this, but the easiest is to make your expression more specific: instead of everything (.), just match non-digits:
(.*Quantity\D*)+(\d{1,})
Furthermore, you don’t need the + quantifier here, and \d{1,} can be shortened to \d+. And in the rest of the expression you can join matching integers and decimals together, and just make the decimal part optional:
.*Quantity\D*(\d+(?:\.\d+)?).*
((?:…) just means that this group will not be captured; the parentheses are merely to enforce operator precedence.)
Finally, note this will match 1 and 0.23, but not 1., nor .23. While this is completely fine, it’s somewhat common (especially in American spelling) to omit a leading zero in front of the decimal point.
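The final pattern can be tried out with the short sketch below. It is written in Python rather than VBScript, on the assumption that the pattern itself carries over unchanged; the helper name quantity is invented here.

```python
import re

# \D* skips the non-digits after "Quantity"; the group takes an integer
# with an optional decimal part.
NUMBER_AFTER_QUANTITY = re.compile(r".*Quantity\D*(\d+(?:\.\d+)?).*")

def quantity(text):
    """Return the number following 'Quantity' as a string, or None."""
    m = NUMBER_AFTER_QUANTITY.match(text)
    return m.group(1) if m else None

print(quantity("Added Quantity:12.23 Pass"))
print(quantity("Added Quantity:12 Pass"))
```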

Can a regex match cn.cn. or ti.ti. but not vv.pp. or aa.bb.?

Is it possible with a regex to match a particular sequence repeating itself, rather than just a number of letters? I would like to be able to match cn.cn. or ti.ti. or xft.xft. but not vv.pp. or aa.bb., and I do not seem to be able to do that with (\w\w.)+ as opposed to \w+.\w+. In the first case I in fact want to keep only one occurrence, like cn. or ti.; in the second I want to keep v.p. or a.b.
Thanks for any help.
Depending on your flavor of regex, you can use backreferences in your regex to match an earlier group. Your question title and question body disagree, however, on what exactly is supposed to be matched. I'll answer in Python as that's the flavor I'm most familiar with.
import re
some_text = "cn.cn."  # sample input, chosen here for illustration
# matches vv.pp., does not match cn.cn.
re.match(r"(\w)\1\.(\w)\2\.", some_text)
# matches cn.cn., does not match vv.pp.
re.match(r"(\w{2})\.\1\.", some_text)
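If the goal is to keep only one occurrence of a repeated sequence, the backreference idea can be combined with a substitution. This is a sketch under the assumption that sequences may be of any length (so xft.xft. is covered too); the names REPEATED and collapse are hypothetical.

```python
import re

# \b(\w+)\. followed by the very same word and another dot.
REPEATED = re.compile(r"\b(\w+)\.\1\.")

def collapse(text):
    """Replace each 'xx.xx.' with a single 'xx.'; leave other text alone."""
    return REPEATED.sub(r"\1.", text)

for sample in ["cn.cn.", "ti.ti.", "xft.xft.", "vv.pp.", "aa.bb."]:
    print(sample, "->", collapse(sample))
```

Using \w+ instead of \w{2} is a design choice for this sketch: it lets the group backtrack to whatever length repeats, at the cost of being less strict about the sequence size.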