Regular expression to place number pair in square brackets - regex

I have a large data file with sequences of numbers bearing the form
6.06038475036627,50.0646896362306\r\n
6.0563435554505,50.0635681152345\r\n
6.05446767807018,50.0632934570313\r\n
which I am trying to modify in Notepad++ so it reads
[6.06038475036627,50.0646896362306]\r\n
[6.0563435554505,50.0635681152345]\r\n
[6.05446767807018,50.0632934570313]\r\n
I can count the number of instances of these occurrences with a relatively simple regex \d{1,2}\.\d+\,\d{1,2}\.\d+. However, there my own regex skills hit the buffers. I am dimly aware that it is possible to go a step further and perform the actual modifications but I have no idea how that should be done.

You would simply need to do as follows:
Find what: (\d+\.\d+,\d+\.\d+)
Replace with: [\1]
Make sure that Regular Expression is checked.
Given this, it will transform this:
6.06038475036627,50.0646896362306\r\n
6.0563435554505,50.0635681152345\r\n
6.05446767807018,50.0632934570313\r\n
Into this:
[6.06038475036627,50.0646896362306]\r\n
[6.0563435554505,50.0635681152345]\r\n
[6.05446767807018,50.0632934570313]\r\n
The expression above will match the comma-separated numbers and put them in a capture group. The replacement inserts a [, followed by the matched group (denoted by \1), and then a closing ].
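For anyone doing the same replacement outside Notepad++, here is a minimal sketch in Python. It assumes the data has already been read into a string; the variable names are made up for illustration, and the pattern and replacement are the same as in the Find/Replace fields above.

```python
import re

# Sample data copied from the question; real input would come from the file.
text = (
    "6.06038475036627,50.0646896362306\r\n"
    "6.0563435554505,50.0635681152345\r\n"
    "6.05446767807018,50.0632934570313\r\n"
)

# Same pattern as "Find what"; \1 in the replacement is the captured pair.
pair = re.compile(r"(\d+\.\d+,\d+\.\d+)")
bracketed = pair.sub(r"[\1]", text)
print(bracketed)
```

The line endings are untouched because the pattern only matches the digits, dots and comma.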

Try the following regexp (with substitution):
\b(\d{1,2}\.\d+,\d{1,2}\.\d+)\b
https://regex101.com/r/VkHppp/1

Related

Regular Expression Match (get multiple stuff in a group)

I have trouble working on this regular expression.
Here is the string in one line, and I want to be able to extract the thing in the swatchColorList, specifically I want the word Natural Burlap, Navy, Red
What I have tried is '[(.*?)]' to get everything inside bracket, but what I really want is to do it in one line? is it possible, or do I need to do this in two steps?
Thanks
{"id":"1349306","categoryName":"Kids","imageSource":"7/optimized/8769127_fpx.tif","swatchColorList":[{"Natural Burlap":"8/optimized/8769128_fpx.tif"},{"Navy":"5/optimized/8748315_fpx.tif"},{"Red":"8/optimized/8748318_fpx.tif"}],"suppressColorSwatches":false,"primaryColor":"Natural Burlap","clickableSwatch":true,"selectedColorNameID":"Natural Burlap","moreColors":false,"suppressProductAttribute":false,"colorFamily":{"Natural Burlap":"Ivory/Cream"},"maxQuantity":6}
You can try this regex
(?<=[[,]\{\")[^"]+
If lookbehind is not supported, you can use
[[,]\{"([^"]+)
This will save needed word in group 1.
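As a rough illustration of the group 1 variant, here is a sketch in Python. The sample string is a trimmed copy of the one in the question, and the leading bracket inside the character class is escaped because Python's re module warns about an unescaped [ there.

```python
import re

# Trimmed copy of the string from the question (some fields left out).
data = (
    '{"id":"1349306","swatchColorList":['
    '{"Natural Burlap":"8/optimized/8769128_fpx.tif"},'
    '{"Navy":"5/optimized/8748315_fpx.tif"},'
    '{"Red":"8/optimized/8748318_fpx.tif"}],'
    '"colorFamily":{"Natural Burlap":"Ivory/Cream"},"maxQuantity":6}'
)

# A key counts only when its opening brace follows "[" or ",",
# which is the case for the entries of swatchColorList.
names = re.findall(r'[\[,]\{"([^"]+)', data)
print(names)
```

Note that colorFamily is skipped only because its brace follows a colon; a different layout of the data could break that assumption.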
import json

# Named raw_json rather than str so the built-in str type is not shadowed.
raw_json = '{"id":"1349306","categoryName":"Kids","imageSource":"7/optimized/8769127_fpx.tif","swatchColorList":[{"Natural Burlap":"8/optimized/8769128_fpx.tif"},{"Navy":"5/optimized/8748315_fpx.tif"},{"Red":"8/optimized/8748318_fpx.tif"}],"suppressColorSwatches":false,"primaryColor":"Natural Burlap","clickableSwatch":true,"selectedColorNameID":"Natural Burlap","moreColors":false,"suppressProductAttribute":false,"colorFamily":{"Natural Burlap":"Ivory/Cream"},"maxQuantity":6}'
obj = json.loads(raw_json)
words = []
for thing in obj["swatchColorList"]:
    # Each swatch is a one-key object; the key is the color name.
    for word in thing:
        words.append(word)
        print(word)
Output will be
Natural Burlap
Navy
Red
And words will be stored to words list. I realize this is not a regex but I want to discourage the use of regex on serialized object notations as regular expressions are not intended for the purpose of parsing strings with nested expressions.

Interesting easy looking Regex

I am re-phrasing my question to clear confusions!
I want to match if a string has certain letters for this I use the character class:
[ACD]
and it works perfectly!
but I want to match if the string has those letter(s) 2 or more times either repeated or 2 separate letters
For example:
[AKL] should match:
ABCVL
AAGHF
KKUI
AKL
But the above should not match the following:
ABCD
KHID
LOVE
because those are there but only once!
that's why I was trying to use:
[ACD]{2,}
But it's not working, so it's probably not the right regex. Can a regex guru help me solve this puzzle?
Thanks
PS: I will use it in MySQL. A different approach is also welcome, but I would like to use a regex for a smarter and shorter query!
To ensure that a string contains at least two occurrences of letters from a set (let's say A, K, L as in your example), you can write something like this:
[AKL].*[AKL]
Since the MySQL regex engine is a DFA, there is no need to use a negated character class like [^AKL] in place of the dot to avoid backtracking, or a lazy quantifier that is not supported at all.
example:
SELECT 'KKUI' REGEXP '[AKL].*[AKL]';
will return 1
You can follow this link that speaks on the particular subject of the LIKE and the REGEXP features in MySQL.
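To check the pattern against the examples from the question, here is a small sketch in Python. Python's re module is a different engine from MySQL's, so this only illustrates the logic of the pattern, not MySQL's behavior; the list names are made up.

```python
import re

# Pattern from the answer: one set letter, anything, then another set letter.
two_from_set = re.compile(r"[AKL].*[AKL]")

should_match = ["ABCVL", "AAGHF", "KKUI", "AKL"]
should_not_match = ["ABCD", "KHID", "LOVE"]

results = {s: bool(two_from_set.search(s)) for s in should_match + should_not_match}
for s, matched in results.items():
    print(s, matched)
```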
If I understood you correctly, this is quite simple:
[A-Z].*?[A-Z]
This looks for something in your set, [A-Z], and then lazily matches characters until it (potentially) comes across the set, [A-Z], again.
As @Enigmadan pointed out, a lazy match is not necessary here: [A-Z].*[A-Z]
The expression you are using matches a run of two or more consecutive characters from the set ACDFGHIJKMNOPQRSTUVWXZ.
However, your expression excludes Y (UVWXZ]), so Z cannot be matched because it is not adjacent to another character from the set. The same applies to A, because B is also excluded ([ACD). For example, Z and A would both match in a string like ZABCDEFGHIJKLMNOPQRSTUVWXYZA
If those letters were not excluded on purpose, it is probably better to use a range like [A-Z]
If you want two or more matches of [AKL], you can use just [AKL] and check that the number of matches is >= 2.
I am not good at SQL regex, but maybe something like this?
check (dbo.RegexMatch( ['ABCVL'], '[AKL]' ) >= 2)
To put it in plain English: use [AKL] as your regex and check that the number of matches in the string is at least 2. Here's how I would do it in Java:
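import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Counts characters from the set and reports whether there are at least two.
private boolean search2orMore(String string) {
    Matcher matcher = Pattern.compile("[ACD]").matcher(string);
    int counter = 0;
    while (matcher.find()) {
        counter++;
    }
    return counter >= 2;
}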
You can't use [ACD]{2,} because it requires two or more characters from the set to appear consecutively, so it fails when the matching characters are separated by other characters.
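The same counting idea can be sketched in Python. The function name and the default letter set are only illustrative; the point is that the single-character class is matched repeatedly and the hits are counted.

```python
import re

def has_two_or_more(text, letters="AKL"):
    # Build a character class from the letters and count every occurrence.
    char_class = "[" + re.escape(letters) + "]"
    return len(re.findall(char_class, text)) >= 2

print(has_two_or_more("ABCVL"))  # A and L are both in the set
print(has_two_or_more("LOVE"))   # only L is in the set
```

Because each match is a single character, it does not matter whether the letters are adjacent or far apart.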
your question is not very clear, but here is my trial pattern
\b(\S*[AKL]\S*[AKL]\S*)\b
Demo
pretty sure this should work in any case
(?<l>[^AKL\n]*[AKL]+[^AKL\n]*[AKL]+[^AKL\n]*)[\n\r]
Replace AKL with the letters you need; this can easily be done dynamically. Tell me if you need it.
Is this what you are looking for?
".*(.*[AKL].*){2,}.*" (without quotes)
It matches if there are at least two occurrences of your characters surrounded by anything.
It is a .NET regex, but it should be the same for anything else.
Edit
Overall, MySQL regular expression support is pretty weak.
If you only need to match your capture group a minimum of two times, then you can simply use:
select * from ... where ... regexp('([ACD].*){2,}') #could be `2,` or just `2`
If you need to match your capture group more than two times, then just change the number:
select * from ... where ... regexp('([ACD].*){3}')
#This number should match the number of matches you need
If you needed a minimum of 7 matches and you were using your previous capture group [ACDF-KM-XZ]
e.g.
select * from ... where ... regexp('([ACDF-KM-XZ].*){7,}')
Response before edit:
Your regex is trying to find at least two characters from the set[ACDFGHIJKMNOPQRSTUVWXZ].
([ACDFGHIJKMNOPQRSTUVWXZ]){2,}
The reason A and Z are not being matched in your example string (ABCDEFGHIJKLMNOPQRSTUVWXYZ) is because you are looking for two or more characters that are together that match your set. A is a single character followed by a character that does not match your set. Thus, A is not matched.
Similarly, Z is a single character preceded by a character that does not match your set. Thus, Z is not matched.
The characters B, E, L and Y in the string below do not match your set:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
If you were to do a global search in the string, only the following runs would be matched:
CD, FGHIJK, MNOPQRSTUVWX
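The behavior of the {2,} quantifier described above can be reproduced with a short sketch in Python (used here only for illustration; the set and the test string are the ones from the question).

```python
import re

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# {2,} requires at least two adjacent characters from the set,
# so the isolated A and Z are never part of a match.
runs = re.findall(r"[ACDFGHIJKMNOPQRSTUVWXZ]{2,}", alphabet)
print(runs)
```

Every run stops at one of the excluded letters (B, E, L, Y), which is why A and Z are left out.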

What regular expression can I use to find the Nᵗʰ entry in a comma-separated list?

I need a regular expression that can be used to find the Nth entry in a comma-separated list.
For example, say this list looks like this:
abc,def,4322,mail@mailinator.com,3321,alpha-beta,43
...and I wanted to find the value of the 6th entry (alpha-beta).
My first thought would not be to use a regular expression, but to use something that splits the string into an array on the comma, but since you asked for a regex.
Most regex flavors let you specify a minimum or maximum repetition count, so something like this would probably work:
/(?:[^\,]*,){5}([^,]*)/
This is intended to match any number of characters that are not a comma, followed by a comma, exactly five times: (?:[^,]*,){5} (the ?: says not to capture). It then matches and captures any number of characters that are not a comma: ([^,]*). The first capture group therefore holds the sixth entry, which is alpha-beta in the example list.
Let me know if you need more info.
EDIT: I edited the above to not capture the first part of the string. This regex works in C# and Ruby.
You could use something like:
([^,]*,){$m}([^,]*),
As a starting point. (Replace $m with the value of (n-1).) The content would be in capture group 2. This doesn't handle things like lists of size n, but that's just a matter of making the appropriate modifications for your situation.
@list = split /,/ => $string;
$it = $list[6];
or just
$it = (split /,/ => $string)[6];
Beats writing a pattern with a {6} in it every time.
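To compare the two approaches side by side, here is a minimal sketch in Python. Both helper names are hypothetical, and n is 1-based, so the regex skips n-1 fields and the split version indexes at n-1.

```python
import re

def nth_entry_regex(line, n):
    # Skip n-1 "field plus comma" groups, then capture the next field.
    match = re.match(r"(?:[^,]*,){%d}([^,]*)" % (n - 1), line)
    return match.group(1) if match else None

def nth_entry_split(line, n):
    # Split on commas and pick the field at zero-based index n-1.
    fields = line.split(",")
    return fields[n - 1] if 1 <= n <= len(fields) else None

line = "abc,def,4322,mail@mailinator.com,3321,alpha-beta,43"
print(nth_entry_regex(line, 6))
print(nth_entry_split(line, 6))
```

Neither version handles quoted fields that contain commas; for real CSV data a CSV parser would be the safer choice.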

Regular expression challenge to match same numbers separately

I am struggling with a nice challenge to match two same numbers separately, with a regex.
See here the list I am trying to match separately.
1,680,000,0001,680,000,000
3,350,0003,350,000
110,000110,000
11,100,00011,100,000
550,000550,000
1,0001,000
250250
49,50049,500
165,000165,000
49,50049,500
3,350,0003,350,000
165,000165,000
550,000550,000
550,000550,000
33,10033,100
18,10018,100
450,000450,000
Take for example 550,000550,000, that's twice 550,000 or 250250 that's twice 250. I want to match for example 550,000 and 250.
I have tested many regular expressions in RegexBuddy, but no one does what I want. Maybe you have a suggestion?
If I understand your requirements correctly, then
^(.+)\1$
should work. You can restrict the possible matches to only allow digits and commas like this:
^([\d,]+)\1$
This matches a "double number" and keeps the first repetition in capturing group number 1. If you want your match only to contain the first repetition, then use
^([\d,]+)(?=\1$)
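A quick way to try the backreference version is a sketch like the following in Python, using a few of the sample values from the question. The variable names are made up for illustration.

```python
import re

# Group 1 must be followed immediately by an identical copy of itself.
doubled = re.compile(r"^([\d,]+)\1$")

samples = ["1,680,000,0001,680,000,000", "550,000550,000", "250250", "1,0001,000"]
halves = []
for s in samples:
    m = doubled.match(s)
    halves.append(m.group(1) if m else None)
    print(s, "->", halves[-1])
```

Because the pattern is anchored at both ends, group 1 is always exactly the first half of the string.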

Can I shorten this regular expression?

I have the need to check whether strings adhere to a particular ID format.
The format of the ID is as follows:
aBcDe-fghIj-KLmno-pQRsT-uVWxy
A sequence of five blocks of five letters upper case or lower case, separated by one dash.
I have the following regular expression that works:
string idFormat = "[a-zA-Z]{5}[-]{1}[a-zA-Z]{5}[-]{1}[a-zA-Z]{5}[-]{1}[a-zA-Z]{5}[-]{1}[a-zA-Z]{5}";
Note that there is no trailing dash, but all of the blocks within the ID follow the same format. Therefore, I would like to be able to represent this sequence of four blocks with a trailing dash inside the regular expression and avoid the duplication.
I tried the following, but it doesn't work:
string idFormat = "[[a-zA-Z]{5}[-]{1}]{4}[a-zA-Z]{5}";
How do I shorten this regular expression and get rid of the duplicated parts?
What is the best way to ensure that each block also does not contain any numbers?
Edit:
Thanks for the replies, I now understand the grouping in regular expressions.
I'm running a few tests against the regular expression, the following are relevant:
Test 1: aBcDe-fghIj-KLmno-pQRsT-uVWxy
Test 2: abcde-fghij-klmno-pqrst-uvwxy
With the following regular expression, both tests pass:
^([a-zA-Z]{5}-){4}[a-zA-Z]{5}$
With the next regular expression, test 1 fails:
^([a-z]{5}-){4}[a-z]{5}$
Several answers have said that it is OK to omit the A-Z when using a-z, but in this case it doesn't seem to be working.
You can try:
([a-z]{5}-){4}[a-z]{5}
and make it case insensitive.
If you can set regex options to be case insensitive, you could replace all [a-zA-Z] with just plain [a-z]. Furthermore, [-]{1} can be written as -.
Your grouping should be done with (, ), not with [, ] (although you're correctly using the latter in specifying character sets).
Depending on context, you probably want to throw in ^...$ which matches start and end of string, respectively, to verify that the entire string is a match (i.e. that there are no extra characters).
In JavaScript, something like this:
/^([a-z]{5}-){4}[a-z]{5}$/i
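The point about case insensitivity explains why test 1 fails with the plain [a-z] version: a-z covers upper case only when the case-insensitive option is actually switched on. A sketch in Python, assuming re.fullmatch stands in for the ^...$ anchors:

```python
import re

ID_LOWER = re.compile(r"([a-z]{5}-){4}[a-z]{5}")
ID_ANY_CASE = re.compile(r"([a-z]{5}-){4}[a-z]{5}", re.IGNORECASE)

test1 = "aBcDe-fghIj-KLmno-pQRsT-uVWxy"
test2 = "abcde-fghij-klmno-pqrst-uvwxy"

# fullmatch requires the whole string to match, like ^...$.
for name, pattern in [("lower only", ID_LOWER), ("ignore case", ID_ANY_CASE)]:
    print(name, bool(pattern.fullmatch(test1)), bool(pattern.fullmatch(test2)))
```

Since the character class contains only letters, a block with a digit in it is rejected by either pattern.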
This works for me, though you might want to check it:
[a-zA-Z]{5}(-[a-zA-Z]{5}){4}
(One group of five letters, followed by [dash+group of five letters] four times)
([a-zA-Z]{5}[-]{1}){4}[a-zA-Z]{5}
Try
string idFormat = "([a-zA-Z]{5}[-]{1}){4}[a-zA-Z]{5}";
I.e. you basically replace your brackets by parentheses. Brackets are not meant for grouping but for defining a class of accepted characters.
However, be aware that with shortened versions, you can use the expression for validating the string, but not for analyzing it. If you want to process the 5 groups of characters, you will want to put them in 5 groups:
string idFormat =
"([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})";
so you can address each group and process it.
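The difference between validating and analyzing can be seen in a short sketch, written here in Python rather than C# purely for illustration. With the repeated group, only the last repetition is kept; with five explicit groups, every block is available.

```python
import re

sample = "aBcDe-fghIj-KLmno-pQRsT-uVWxy"

# Shortened form: fine for validation, but group 1 holds only the last repeat.
short_form = re.fullmatch(r"([a-zA-Z]{5}-){4}[a-zA-Z]{5}", sample)
last_repeat = short_form.group(1)

# Explicit form: one capture group per block.
long_form = re.fullmatch(
    r"([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})-([a-zA-Z]{5})",
    sample,
)
blocks = long_form.groups()

print(last_repeat)
print(blocks)
```

If only the blocks are needed after validation, splitting the validated string on the dash would be an alternative to the five-group pattern.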