Not able to find all groups using a pattern regex in matlab - regex

I am trying to apply a regular expression in MATLAB to extract all the numbers between '[]' for every group. Here are the details:
pat = '(\[\d,\d,\d,\d\])';
s1 = 'frame_1:[1,2,3,5],[11,22,33,44],[23,12,12,33],'
[matched_string] = regexp(s1,pat,'match');
>> matched_string{:}
ans =
'[1,2,3,5]'
I want to get all the boxes, i.e. [1,2,3,5], [11,22,33,44] and [23,12,12,33].
Can someone help me figure out what I am doing wrong?

Your pattern only matches single digits inside square brackets. To match one or more, add + after each:
'(\[\d+,\d+,\d+,\d+\])'
If you do not care about the format inside the square brackets and just need to extract bracketed runs of digits and commas, you may use a simpler pattern:
'\[[\d,]+]'
Note that the ] at the end of the regular expression is not a special character here: there is no corresponding [ opening a character class, so it does not need to be escaped.
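The difference between \d and \d+ can be checked quickly outside MATLAB. The following is only a sketch in Python's re module (an assumption; the question itself uses MATLAB's regexp), reusing the string s1 from the question. The variable names are illustrative.

```python
import re

s1 = 'frame_1:[1,2,3,5],[11,22,33,44],[23,12,12,33],'

# \d matches exactly one digit, so only the first box qualifies.
single_digit = re.findall(r'\[\d,\d,\d,\d\]', s1)

# \d+ matches one or more digits, so every box is found.
multi_digit = re.findall(r'\[\d+,\d+,\d+,\d+\]', s1)

# Looser variant: any run of digits and commas inside square brackets.
loose = re.findall(r'\[[\d,]+]', s1)

print(single_digit)
print(multi_digit)
print(loose)
```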

Related

extracting a word between brackets using regex pattern

I am using this tutorial to create a regular expression for one of my tasks, with this input string:
[Begin] { (GetLatestCode)
I am trying to extract the string between the brackets, i.e. GetLatestCode, for which I wrote the following:
(?<=\[Begin\]\s{\s\()\w+(?=\)) //returns GetLatestCode
But this solution does not seem to work when I have multiple spaces around the curly brace.
[Begin] { (GetLatestCode) //does not work
If you need to account for 0 or more spaces, add a * after each space:
(?<=\[Begin\]\s*{\s*\()\w+(?=\))
If you need to account for 1 or more, use a +:
(?<=\[Begin\]\s+{\s+\()\w+(?=\))
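Some engines reject a lookbehind that contains * or + (Python's re module, for example, requires fixed-width lookbehind). A capturing group is a different technique that avoids the issue; the sketch below assumes Python and a hypothetical helper named extract_name, neither of which comes from the original answer.

```python
import re

# Capturing-group alternative to the variable-width lookbehind:
# \s* tolerates zero or more spaces around the curly brace.
pattern = re.compile(r'\[Begin\]\s*\{\s*\((\w+)\)')

def extract_name(line):
    """Return the word inside the parentheses, or None if the line does not match."""
    m = pattern.search(line)
    return m.group(1) if m else None

print(extract_name('[Begin] { (GetLatestCode)'))
print(extract_name('[Begin]    {    (GetLatestCode)'))
```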

Regular expression to place number pair in square brackets

I have a large data file with sequences of numbers bearing the form
6.06038475036627,50.0646896362306\r\n
6.0563435554505,50.0635681152345\r\n
6.05446767807018,50.0632934570313\r\n
which I am trying to modify in Notepad++ so it reads
[6.06038475036627,50.0646896362306]\r\n
[6.0563435554505,50.0635681152345]\r\n
[6.05446767807018,50.0632934570313]\r\n
I can count the number of instances of these occurrences with a relatively simple regex \d{1,2}\.\d+\,\d{1,2}\.\d+. However, there my own regex skills hit the buffers. I am dimly aware that it is possible to go a step further and perform the actual modifications but I have no idea how that should be done.
You would simply need to do as follows:
Find what: (\d+\.\d+,\d+\.\d+)
Replace with: [\1]
Make sure that Regular Expression is checked.
Given this, it will transform this:
6.06038475036627,50.0646896362306\r\n
6.0563435554505,50.0635681152345\r\n
6.05446767807018,50.0632934570313\r\n
Into this:
[6.06038475036627,50.0646896362306]\r\n
[6.0563435554505,50.0635681152345]\r\n
[6.05446767807018,50.0632934570313]\r\n
The expression above matches the comma-separated numbers and captures them in a group. The replacement inserts a [, followed by the captured group (denoted by \1), and then a ].
Try the following regexp (with substitution):
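For a file too large to edit comfortably by hand, the same find-and-replace can be scripted. This is a minimal sketch assuming Python's re.sub rather than Notepad++; the sample text is the three lines from the question, and the variable names are illustrative.

```python
import re

text = (
    "6.06038475036627,50.0646896362306\r\n"
    "6.0563435554505,50.0635681152345\r\n"
    "6.05446767807018,50.0632934570313\r\n"
)

# Group 1 captures the number pair; \1 in the replacement puts it back
# between a literal [ and ].
bracketed = re.sub(r'(\d+\.\d+,\d+\.\d+)', r'[\1]', text)
print(bracketed)
```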
\b(\d{1,2}\.\d+,\d{1,2}\.\d+)\b
https://regex101.com/r/VkHppp/1

Regular Expression Match (get multiple stuff in a group)

I am having trouble with this regular expression.
Here is the string, all on one line. I want to extract the entries in swatchColorList; specifically, I want the words Natural Burlap, Navy and Red.
What I have tried is '[(.*?)]' to get everything inside the brackets, but what I really want is to do it in one line. Is that possible, or do I need to do this in two steps?
Thanks
{"id":"1349306","categoryName":"Kids","imageSource":"7/optimized/8769127_fpx.tif","swatchColorList":[{"Natural Burlap":"8/optimized/8769128_fpx.tif"},{"Navy":"5/optimized/8748315_fpx.tif"},{"Red":"8/optimized/8748318_fpx.tif"}],"suppressColorSwatches":false,"primaryColor":"Natural Burlap","clickableSwatch":true,"selectedColorNameID":"Natural Burlap","moreColors":false,"suppressProductAttribute":false,"colorFamily":{"Natural Burlap":"Ivory/Cream"},"maxQuantity":6}
You can try this regex
(?<=[[,]\{\")[^"]+
If lookbehind is not supported, you can use
[[,]\{"([^"]+)
This will save the needed word in group 1.
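As a quick check of the group-1 variant, here is a sketch in Python (an assumption; the answer does not name an engine). The input is a shortened copy of the string from the question, trimmed for readability, and the variable names are illustrative.

```python
import re

# Shortened copy of the JSON text from the question.
data = ('{"id":"1349306","swatchColorList":[{"Natural Burlap":"8/optimized/8769128_fpx.tif"},'
        '{"Navy":"5/optimized/8748315_fpx.tif"},{"Red":"8/optimized/8748318_fpx.tif"}],'
        '"colorFamily":{"Natural Burlap":"Ivory/Cream"}}')

# A key counts only when its opening brace follows [ or , which skips
# the colorFamily object (its brace follows a colon).
colors = re.findall(r'[\[,]\{"([^"]+)', data)
print(colors)
```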
import json

# Raw JSON text from the question (named raw to avoid shadowing the built-in str)
raw = '{"id":"1349306","categoryName":"Kids","imageSource":"7/optimized/8769127_fpx.tif","swatchColorList":[{"Natural Burlap":"8/optimized/8769128_fpx.tif"},{"Navy":"5/optimized/8748315_fpx.tif"},{"Red":"8/optimized/8748318_fpx.tif"}],"suppressColorSwatches":false,"primaryColor":"Natural Burlap","clickableSwatch":true,"selectedColorNameID":"Natural Burlap","moreColors":false,"suppressProductAttribute":false,"colorFamily":{"Natural Burlap":"Ivory/Cream"},"maxQuantity":6}'
obj = json.loads(raw)
words = []
# Each swatch is a one-key dict whose key is the colour name
for thing in obj["swatchColorList"]:
    for word in thing:
        words.append(word)
        print(word)
Output will be
Natural Burlap
Navy
Red
The words are also stored in the words list. I realize this is not a regex solution, but I want to discourage the use of regex on serialized object notations: regular expressions are not intended for parsing strings with nested structures.

Interesting easy looking Regex

I am rephrasing my question to clear up the confusion!
I want to match if a string has certain letters. For this I use the character class:
[ACD]
and it works perfectly!
But I want to match only if the string has those letters 2 or more times, either the same letter repeated or 2 different letters from the set.
For example:
[AKL] should match:
ABCVL
AAGHF
KKUI
AKL
But the above should not match the following:
ABCD
KHID
LOVE
because a letter from the set appears only once!
That's why I was trying to use:
[ACD]{2,}
But it's not working; it is probably not the right regex. Can a regex guru help me solve this puzzle?
Thanks
PS: I will use it in MySQL. A different approach is also welcome, but I would like to use a regex for a smarter and shorter query!
To ensure that a string contains at least two occurrences of letters from a set (let's say A, K, L as in your example), you can write something like this:
[AKL].*[AKL]
Since the MySQL regex engine is a DFA, there is no need to use a negated character class like [^AKL] in place of the dot to avoid backtracking, or a lazy quantifier that is not supported at all.
example:
SELECT 'KKUI' REGEXP '[AKL].*[AKL]';
will return 1
You can follow this link that speaks on the particular subject of the LIKE and the REGEXP features in MySQL.
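The pattern can be tried against the example strings from the question. The sketch below assumes Python's re module rather than MySQL's REGEXP, so it only illustrates the matching logic; the list names are illustrative.

```python
import re

pattern = re.compile(r'[AKL].*[AKL]')

should_match = ['ABCVL', 'AAGHF', 'KKUI', 'AKL']
should_not_match = ['ABCD', 'KHID', 'LOVE']

# search() succeeds when two characters from the set occur anywhere
# in the string, adjacent or not.
for s in should_match + should_not_match:
    print(s, bool(pattern.search(s)))
```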
If I understood you correctly, this is quite simple:
[A-Z].*?[A-Z]
This looks for something in your set, [A-Z], and then lazily matches characters until it (potentially) comes across the set, [A-Z], again.
As @Enigmadan pointed out, a lazy match is not necessary here: [A-Z].*[A-Z]
The expression you are using searches for two or more consecutive characters from the set ACDFGHIJKMNOPQRSTUVWXZ.
However, your expression excludes Y (UVWXZ]), so Z cannot be matched because it is not adjacent to another character from the set. The same applies to A, because B ([ACD) is also excluded from your expression. For example, Z and A would match in a string like ZABCDEFGHIJKLMNOPQRSTUVWXYZA
If those letters were not excluded on purpose, it is probably better to use a range like [A-Z]
If you want 2 or more matches of [AKL], then you can use just [AKL] and check that the number of matches is >= 2.
I am not good at SQL regex, but maybe something like this?
check (dbo.RegexMatch( ['ABCVL'], '[AKL]' ) >= 2)
To put it in plain English, use [AKL] as your regex and check that the number of matches in the string is at least 2. Here's how I would do it in Java:
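import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Returns true when the string contains at least two characters from [AKL].
private boolean search2orMore(String string) {
    Matcher matcher = Pattern.compile("[AKL]").matcher(string);
    int counter = 0;
    // Each successful find() is one occurrence of a character from the set.
    while (matcher.find()) {
        counter++;
    }
    return counter >= 2;
}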
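The same counting idea fits in a few lines elsewhere. This is a sketch in Python, with a hypothetical helper name (has_two_or_more) and a letters parameter that are not part of the original answer.

```python
import re

def has_two_or_more(text, letters="AKL"):
    """Return True when text contains at least two characters from letters."""
    # Build a character class from the letters; re.escape guards special characters.
    pattern = "[" + re.escape(letters) + "]"
    return len(re.findall(pattern, text)) >= 2

print(has_two_or_more("ABCVL"))
print(has_two_or_more("LOVE"))
```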
You can't use [ACD]{2,} because it only matches 2 or more consecutive characters from the set, so it fails when the matching characters are separated by other characters.
Your question is not very clear, but here is my trial pattern:
\b(\S*[AKL]\S*[AKL]\S*)\b
Demo
I am pretty sure this should work in any case:
(?<l>[^AKL\n]*[AKL]+[^AKL\n]*[AKL]+[^AKL\n]*)[\n\r]
Replacing AKL with the letters you need can easily be done dynamically; tell me if you need it.
Is this what you are looking for?
".*(.*[AKL].*){2,}.*" (without quotes)
It matches if there are at least two occurrences of your characters surrounded by anything.
It is a .NET regex, but it should be the same for anything else.
Edit
Overall, MySQL regular expression support is pretty weak.
If you only need to match your capture group a minimum of two times, then you can simply use:
select * from ... where ... regexp('([ACD].*){2,}') #could be `2,` or just `2`
If you need to match your capture group more than two times, then just change the number:
select * from ... where ... regexp('([ACD].*){3}')
#This number should match the number of matches you need
If you needed a minimum of 7 matches and you were using your previous capture group [ACDF-KM-XZ]
e.g.
select * from ... where ... regexp('([ACDF-KM-XZ].*){7,}')
Response before edit:
Your regex is trying to find at least two consecutive characters from the set [ACDFGHIJKMNOPQRSTUVWXZ].
([ACDFGHIJKMNOPQRSTUVWXZ]){2,}
The reason A and Z are not being matched in your example string (ABCDEFGHIJKLMNOPQRSTUVWXYZ) is because you are looking for two or more characters that are together that match your set. A is a single character followed by a character that does not match your set. Thus, A is not matched.
Similarly, Z is a single character preceded by a character that does not match your set. Thus, Z is not matched.
The characters in brackets below do not match your set:
A[B]CD[E]FGHIJK[L]MNOPQRSTUVWX[Y]Z
If you were to do a global search in the string, only the runs in brackets would be matched:
AB[CD]E[FGHIJK]L[MNOPQRSTUVWX]YZ
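The effect of {2,} on consecutive runs can be reproduced with a global search. The sketch below assumes Python's re.findall rather than MySQL; the set and the test string are the ones quoted above.

```python
import re

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# {2,} needs at least two adjacent characters from the set, so the
# isolated A and Z are skipped and only longer runs are returned.
runs = re.findall(r'[ACDFGHIJKMNOPQRSTUVWXZ]{2,}', alphabet)
print(runs)
```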

Regular expression to check comma separated number values in Flex

Can anyone please help me find a suitable regular expression to validate a string of comma-separated numbers, e.g. '1,2,3' or '111,234234,-09'? Anything else should be considered invalid, e.g. '121as23' or '123-123'.
I suppose this must be possible in Flex using a regular expression, but I cannot find the correct one.
@Justin, I tried your suggestion /(?=^)(?:[,^]([-+]?(?:\d*\.)?\d+))*$/ but I am facing two issues:
It rejects '123,12', which should be valid.
It accepts '123,123,aasd', which is invalid.
I tried another regex - [0-9]+(,[0-9]+)* - which works quite well except for one issue: it validates '12,12asd'. I need something that will only allow numbers separated by commas.
Your example data consists of three decimal integers, each having an optional leading plus or minus sign, separated by commas with no whitespace. Assuming this describes your requirements, the Javascript/ActionScript/Flex regex is simple:
var re_valid = /^[-+]?\d+(?:,[-+]?\d+){2}$/;
if (re_valid.test(data_string)) {
    // data_string is valid
} else {
    // data_string is NOT valid
}
However, if your data can contain any number of integers and may have whitespace the regex becomes a bit longer:
var re_valid = /^[ \t]*[-+]?\d+[ \t]*(,[ \t]*[-+]?\d+[ \t]*)*$/;
If your data can be even more complex (i.e. the numbers may be floating point, the values may be enclosed in quotes, etc.), then you may be better off parsing the string as a CSV record and checking each value individually.
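The any-length pattern can be exercised against the strings mentioned in the question. This is a sketch in Python (an assumption; the answer targets JavaScript/ActionScript), using fullmatch in place of the ^ and $ anchors and a hypothetical helper name, is_valid_number_list.

```python
import re

# Same structure as the second pattern above: optional sign, digits,
# optional spaces or tabs, repeated after each comma.
LIST_RE = re.compile(r'[ \t]*[-+]?\d+[ \t]*(?:,[ \t]*[-+]?\d+[ \t]*)*')

def is_valid_number_list(s):
    """Return True when s consists only of comma-separated signed integers."""
    return LIST_RE.fullmatch(s) is not None

for sample in ['1,2,3', '111,234234,-09', '121as23', '123-123', '12,12asd']:
    print(sample, is_valid_number_list(sample))
```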
Looks like what you want is this:
/(?!,)(?:(?:,|^)([-+]?(?:\d*\.)?\d+))*$/
I don't know Flex, so replace the / at the beginning and end with whatever's appropriate in Flex regex syntax. Your numbers will be in match set 1. Get rid of the (?:\d*\.)? if you only want to allow integers.
Explanation:
(?!,) #Don't allow a comma at the beginning of the string.
(?:,|^) #Your groups are going to be preceded by ',' unless they're the very first group in the string. The '(?:blah)' means we don't want to include the ',' in our match groups.
[-+]? #Allow an optional plus or minus sign.
(?:\d*\.)?\d+ #The meat of the pattern, this matches '123', '123.456', or '.456'.
* #Means we're matching zero or more groups. Change this to '+' if you don't want to match empty strings.
$ #Don't stop matching until you reach the end of the string.