Regular Expression Repetition groups - regex

I have a regex:
\*777\*[0-9]{10,}\*\d+\*(5|10|20|25|50|100)\*\d+#
That is what I have so far.
It can handle input such as *777*9283928839*89*5*9090#.
The format goes like this: *777*phone*Qty*Item Code*pin#
The problem is that sometimes the input looks like this:
*777*phone*Qty*Item Code*Qty*Item Code*Qty*Item Code*pin#
The Qty*Item Code part repeats, but the Item Code must be one of 5, 10, 20, 25, 50, 100.
I am confused about how to make the regex check the repeated Qty*Item Code part.
Can someone give a hint?
Thanks.

You can use the following:
\*777\*[0-9]{10,}\*(\d+\*(5|10|20|25|50|100)\*)+\d+#
Explanation
The part that's repeating seems to be this:
\d+\*(5|10|20|25|50|100)\*
If you enclose that in parentheses and add + after it, it will tell regex to match what's inside the parentheses one or more times:
(\d+\*(5|10|20|25|50|100)\*)+
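As a rough illustration of the repeated group, here is a minimal Python sketch. The names PATTERN, PAIR and parse are made up for this example, and the extra capture groups around the phone, the repeated block and the pin are an addition that is not part of the answer's regex. A repeated capture group only keeps its last repetition in Python, so the pairs are pulled out with a second, smaller pattern.

```python
import re

# Same structure as the answer's regex, with added groups for phone, the repeated block, and pin.
PATTERN = re.compile(r"\*777\*([0-9]{10,})\*((?:\d+\*(?:5|10|20|25|50|100)\*)+)(\d+)#")
# One Qty*ItemCode* pair (hypothetical helper pattern).
PAIR = re.compile(r"(\d+)\*(5|10|20|25|50|100)\*")

def parse(code):
    """Return (phone, [(qty, item_code), ...], pin), or None if the input is invalid."""
    m = PATTERN.fullmatch(code)
    if m is None:
        return None
    phone, items, pin = m.groups()
    pairs = [(int(qty), int(item)) for qty, item in PAIR.findall(items)]
    return phone, pairs, pin

print(parse("*777*9283928839*89*5*9090#"))
print(parse("*777*9283928839*89*5*3*10*7*100*9090#"))
print(parse("*777*9283928839*89*7*9090#"))  # 7 is not an allowed item code
```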

Related

Regex to match everything between multiple set of brackets

I am trying to match everything between multiple sets of brackets.
Example of data
[[42.30722,-83.181125],[42.30722,-83.18112667],[42.30722167,-83.18112667,[42.30721667,-83.181125],[+42.30721667,-83.181125]]
I need to match everything within the inner brackets as below
42.30722,-83.181125,
42.30722,-83.18112667,
42.30722167,-83.18112667,
42.30721667,-83.181125,
+42.30721667,-83.181125
How do I do that? I tried \[([^\[\]]|)*\] but it gives me values with brackets. Can anybody please help me with this? Thanks in advance.
Seems like one of them is missing a bracket maybe, or if not, maybe some expression similar to:
\[([+-]?\d+\.\d+)\s*,\s*([+-]?\d+\.\d+)\s*\]?
might be OK to start with.
Test
import re
expression = r"\[([+-]?\d+\.\d+)\s*,\s*([+-]?\d+\.\d+)\s*\]?"
string = """
[[42.30722,-83.181125],[42.30722,-83.18112667],[42.30722167,-83.18112667,[42.30721667,-83.181125],[+42.30721667,-83.181125]]
"""
print([list(i) for i in re.findall(expression, string)])
print(re.findall(expression, string))
Output
[['42.30722', '-83.181125'], ['42.30722', '-83.18112667'], ['42.30722167', '-83.18112667'], ['42.30721667', '-83.181125'], ['+42.30721667', '-83.181125']]
[('42.30722', '-83.181125'), ('42.30722', '-83.18112667'), ('42.30722167', '-83.18112667'), ('42.30721667', '-83.181125'), ('+42.30721667', '-83.181125')]
If you wish to simplify, modify, or explore the expression, it is explained in the top right panel of regex101.com, where you can also see how it matches against some sample inputs.
A little late, but figured I would include it anyhow.
Your 3rd set is missing a ']'.
If that is in there, then in Alteryx, you can just use Text to Columns splitting to Rows and ignore delimiter in brackets

Regex that matches a pattern only if string does not begin with 'N'

I need to put together a regex that matches a pattern only if the string does not begin with 'N'.
Here is my pattern so far [A-E]+[-+]?.
Now I want to make sure that it does not match something like:
N\A
NA
NB+
NB-
NCAB
This is for REGEXP_SUBSTR command in Oracle SQL DB
UPDATE
It looks like I should have been more specific, sorry
I want to extract from a string [A-E]+[-+]? but if the string also matches ^(N|n) then I want my regex to return nothing.
See examples below:
String    Returns
N/A       (nothing)
F1/AAA    AAA
NABC      (nothing)
FABC      ABC
To match a character between A and E not preceded by N, you can use:
([^N]|^)[A-E]+
If you want to avoid fields that contain N[A-E], add a negated condition to your query using the pattern N[A-E]. In other words, use two predicates: this one to exclude values such as NA, and the first one to find A.
To be more clear:
WHERE NOT REGEXP_LIKE(coln, 'N[A-E]') AND REGEXP_LIKE(coln, '[A-E]')
OK, I figured it out. I broadened the scope of the problem a little and realized that I can also use the other parameters of REGEXP_SUBSTR, so that only the second subexpression is returned.
REGEXP_SUBSTR(field1, '^([^NA-D][^A-D]*)?([A-D]+[-+]?)',1,1,'i',2)
I still have to give you guys the credit; a lot of good ideas led me here.
Just throw a [^N]? in front. That should do it.
OOPS...
That actually needs to include an " OR ^ "...
It should look like this:
([^N]|^)[A-E]+[-+]?
Sorry about that...It looks like the right answer already got posted anyway.
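For readers without an Oracle instance at hand, the behaviour asked for in the question's update can be sketched in Python. This is only a stand-in: Python's re module is not REGEXP_SUBSTR, the function names are hypothetical, and m.group(2) is used here as the rough equivalent of passing 2 as the subexpression argument. The second function reuses the asker's final pattern, which uses A-D rather than A-E.

```python
import re

GRADE = re.compile(r"[A-E]+[-+]?")
# The asker's single-pattern variant (note: A-D, not A-E), case-insensitive.
COMBINED = re.compile(r"^([^NA-D][^A-D]*)?([A-D]+[-+]?)", re.IGNORECASE)

def extract_two_step(value):
    """Reject strings starting with N/n first, then extract [A-E]+[-+]?."""
    if value[:1] in ("N", "n"):
        return None
    m = GRADE.search(value)
    return m.group(0) if m else None

def extract_combined(value):
    """Do both checks in one pattern and return only the second group."""
    m = COMBINED.search(value)
    return m.group(2) if m else None

for sample in ["N/A", "F1/AAA", "NABC", "FABC"]:
    print(sample, extract_two_step(sample), extract_combined(sample))
```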

Regular Expression to find all SuppressMessage in solution

I am looking for a regular expression to match all references to SuppressMessage in a solution that I took over.
example:
[SuppressMessage("Microsoft.Globalization", "CA1305:SpecifyIFormatProvider", MessageId = "System.Int32.ToString")]
I tried this to find the SuppressMessage with the beginning and ending square brackets, but it does not handle line feeds, and when there are multiple matches within the same file, it returns the bulk of the file.
\[(SuppressMessage)\((.*)\)\]
\[(SuppressMessage)\((.*?)\)\]
Try it.
Thanks vks, that got me closer, but it finds two groups:
SuppressMessage
"Microsoft.Design", "CA1062:Validate arguments of public methods", MessageId = "0"
What I found that works (without multiple SuppressMessages in the same square brace) is:
\[(SuppressMessage.*?)\]
\[(SuppressMessage\((?:.*?)\))\]
Make your expression non-greedy. In fact, try
\[(SuppressMessage\((?:[^)]*)\))\]
or
\[(SuppressMessage[^)]*\))\]
to make it fail proof.
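A small Python sketch of the negated-class variant, run against a made-up source snippet (the sample code and the names ATTRIBUTE, SOURCE and matches are illustrative only). Because [^)] also matches newlines, an attribute that spans several lines is still found, and each match stops at its own closing parenthesis instead of swallowing the rest of the file. It would still break on an argument string that itself contains ")".

```python
import re

# Negated class instead of .*? so the match cannot run past the first ')'.
ATTRIBUTE = re.compile(r"\[(SuppressMessage\([^)]*\))\]")

# Hypothetical C# source with one single-line and one multi-line attribute.
SOURCE = '''
[SuppressMessage("Microsoft.Globalization", "CA1305:SpecifyIFormatProvider", MessageId = "System.Int32.ToString")]
public string A(int n) { return n.ToString(); }

[SuppressMessage("Microsoft.Design",
    "CA1062:Validate arguments of public methods",
    MessageId = "0")]
public void B(object o) { }
'''

matches = ATTRIBUTE.findall(SOURCE)
for text in matches:
    print(text)
    print("---")
```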

Regular expression quantifier questions

I'm trying to find a regular expression that matches this kind of URL:
http://sub.domain.com/selector/F/13/K/100546/sampletext/654654/K/sampletext_sampletext.html
and doesn't match this:
http://sub.domain.com/selector/F/13/K/10546/sampletext/5987/K/sample/K/101/sample_text.html
It should match only if the number of /K/ segments is at least 1 and at most 2 (something with a quantifier like {1,2}).
So far I have the following regexp:
http://sub\.domain\.com/selector/F/[0-9]{1,2}/[a-z0-9_-]+/
Now I need a hand adding a condition like:
Match this only if /K/ appears in the text 1 to 2 times.
Thanks in advance.
Best Regards.
Josema
Do you need to do this all in one line?
The approach I would take is to do a regex for /K/ and then count the number of matches I got.
I think Boost is a C++ library right? In C# I would do it like this:
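// requires: using System.Text.RegularExpressions;
string url = "http://sub.domain.com/selector/F/13/K/100546/sampletext/654654/K/sampletext_sampletext.html";
int count = Regex.Matches(url, "/K/").Count;
// the question asks for at least 1 and at most 2 occurrences of /K/
if (count >= 1 && count <= 2)
{
    // good url found
}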
UPDATE
This regex would match everything up to the first two K's and then only allow the url filename.html after that:
^http://sub.domain.com/selector/F/[\d]+/[a-zA-Z]+/[\d]+/[a-zA-Z]+/[\d]+/K/[a-zA-Z_]+\.html$
This RE will match anything after the /F/[0-9]{1,2} that has 1 or 2 /K/ segments. It would also match http://sub.domain.com/selector/F/13/K/100546/stuff/21515/stuff/sampletext/654654/K/stuff/sampletext_sampletext.html:
^http://sub\.domain\.com/selector/F/[0-9]{1,2}(?:/K(?=/)(?:(?!/K/)/[a-z0-9_.-]+)*){1,2}$
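Both ideas from this thread, counting the /K/ occurrences and the single anchored regex, can be compared side by side in Python. This is a sketch only; the constant and function names are hypothetical, and the two URLs are the ones from the question.

```python
import re

K_SEGMENT = re.compile(r"/K/")
# The single-regex answer, unchanged.
URL_PATTERN = re.compile(
    r"^http://sub\.domain\.com/selector/F/[0-9]{1,2}"
    r"(?:/K(?=/)(?:(?!/K/)/[a-z0-9_.-]+)*){1,2}$"
)

def has_one_or_two_k(url):
    """Counting approach: find every /K/ and check the total."""
    return 1 <= len(K_SEGMENT.findall(url)) <= 2

def matches_pattern(url):
    """Single-regex approach: at most two /K.../ groups are allowed before the end."""
    return URL_PATTERN.search(url) is not None

GOOD = "http://sub.domain.com/selector/F/13/K/100546/sampletext/654654/K/sampletext_sampletext.html"
BAD = "http://sub.domain.com/selector/F/13/K/10546/sampletext/5987/K/sample/K/101/sample_text.html"

for url in (GOOD, BAD):
    print(has_one_or_two_k(url), matches_pattern(url))
```

The counting version only looks at /K/ and ignores the rest of the URL, while the regex version also constrains the host, the /F/ part, and the allowed characters in each segment.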

Regexp for finding tags without nested tags

I'm trying to write a regexp which will help to find non-translated texts in html code.
Translated texts are those that go through a special tag, <fmt:message>, or through the construction ${...}
Ex. non-translated:
<h1>Hello</h1>
Translated texts are:
<h1><fmt:message key="hello" /></h1>
<button>${expression}</button>
I've written the following expression:
\<(\w+[^>])(?:.*)\>([^\s]+?)\</\1\>
It finds correct strings like:
<p>text</p>
Correctly skips
<a><fmt:message key="common.delete" /></a>
But also catches:
<li><p><fmt:message key="common.delete" /></p></li>
And I can't figure out how to add exception for ${...} strings in this expression
Can anybody help me?
If I understand you properly, you want to ensure the data inside the "tag" doesn't contain fmt:message or ${...}
You might be able to use a negative lookahead in conjunction with a . to assert that the characters captured by the . are not one of those cases:
/<(\w+)[^>]*>(?:(?!<fmt:message|\$\{|<\/\1>).)*<\/\1>/i
If you want to avoid capturing any "tags" inside the tag, you can ignore the <fmt:message portion, and just use [^<] instead of a . - to match only non <
/<(\w+)[^>]*>(?:(?!\$\{)[^<])*<\/\1>/i
Added from comment: If you also want to exclude "empty" tags, add another negative lookahead, this time (?!\s*<), to ensure that the content of the tag is not empty and does not consist only of whitespace:
/<(\w+)[^>]*>(?!\s*<)(?:(?!\$\{)[^<])*<\/\1>/i
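To check the last expression against the samples from the question, here is a minimal Python sketch. The names UNTRANSLATED and is_untranslated are invented for the example, the /.../i delimiters are replaced by re.IGNORECASE, and the escaped \/ is written as a plain /.

```python
import re

# Last expression from the answer: a tag whose body is plain text,
# not empty, with no nested tag and no ${...} construct.
UNTRANSLATED = re.compile(r"<(\w+)[^>]*>(?!\s*<)(?:(?!\$\{)[^<])*</\1>", re.IGNORECASE)

def is_untranslated(html):
    """Return True if the snippet contains a tag with raw, untranslated text."""
    return UNTRANSLATED.search(html) is not None

samples = [
    '<h1>Hello</h1>',
    '<p>text</p>',
    '<h1><fmt:message key="hello" /></h1>',
    '<button>${expression}</button>',
    '<li><p><fmt:message key="common.delete" /></p></li>',
]
for html in samples:
    print(is_untranslated(html), html)
```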
If the format is simple as in your examples you can try this:
<(\w+)>(?:(?!<fmt:message).)+</\1>
Rewritten into a more formal question:
Can you match
aba
but not
aca
without catching
abcba ?
Yes.
FSM:
Start->A->B->A->Terminate
Insert abcba and run it
Start is ready for input.
a -> MATCH, transition to A
b -> MATCH, transition to B
c -> FAIL, return fail.
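The state machine traced above can be written out as a few lines of Python. This is a sketch of that trace only, with a hypothetical function name; it accepts exactly the string "aba" and fails on the first character that does not fit the current state.

```python
def accepts(text, expected="aba"):
    """Walk Start -> A -> B -> A -> Terminate, failing on the first unexpected character."""
    state = 0  # number of expected characters consumed so far
    for ch in text:
        if state < len(expected) and ch == expected[state]:
            state += 1  # MATCH, transition to the next state
        else:
            return False  # FAIL, return fail
    return state == len(expected)

for word in ("aba", "aca", "abcba"):
    print(word, accepts(word))
```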
I've used a simple one like this with success,
<([^>]+)[^>]*>([^<]*)</\1>
of course if there is any CDATA with '<' in those it's not going to work so well. But should do fine for simple XML.
also see
https://blog.codinghorror.com/parsing-html-the-cthulhu-way/
for a discussion of using regex to parse html
executive summary: don't