Is it possible to negate a group in a regular expression?

Is it possible to negate a group in a regular expression? - regex

Let's say that we have this text:
2020-09-29
2020-09-30
2020-10-01
2020-10-02
2020-10-12
2020-10-16
2020-11-12
2020-11-23
2020-11-15
2020-12-01
2020-12-11
2020-12-30
I want to do something like this:
\d\d\d\d-(NOT10)-(30)
So i want to get all dates of any year, but not of the 10th month and it is important, that the day is 30.
I tried a lot to do this using negative lookahead asserations but i did not come up with any working regexes.

You can use negative lookaheads:
\d\d\d\d-(?!10)\d\d-30
The Part (?!10) ensures that no 10 follows at the point where it is inserted into the regex. Notice that you still need to match the following digits afterwards, thus the \d\d part.
Generally speaking you can not (to my knowledge) negate a part that then also matches parts of the string. But with negative lookaheads you can simulate this as I did above. The generalized idea looks something like:
(?!<special-exclusion-pattern>)<general-inclusion-pattern>
Where the special-exclusion-pattern matches a subset of the general-inclusion-pattern. In the above case the general inclusion pattern is \d\d and the special exclusion pattern ins 10.

Try :
/20\d{2}-(?:0[1-9]|1[12])-30/
Explanation :
20\d{2} it will match 20XX
(?:0[1-9]|1[12]) it will match 0X or 11, 12
30 it will match 30
Demo :https://regex101.com/r/O2F1eV/1

It's easiest to simply convert the substring (if present) that matches /^\d{4}-10-30$/ to an empty string, then split the resulting string on one or more newlines.
If your string were
2020-10-16
2020-10-30
2020-11-12
2020-11-23
and was held by the variable str, then in Ruby, for example,
str.sub(/^\d{4}-10-30$/,'')
#=> "2020-10-16\n\n2020-11-12\n2020-11-23\n"
so
str.sub(/^\d{4}-10-30$/,'').split
#=> ["2020-10-16", "2020-11-12", "2020-11-23"]
Whatever language you are using undoubtedly has similar methods.

Related

Regular Expression Extracting Text from a group

I have a filename like this:
0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
I needed to break down the name into groups which are separated by a underscore. Which I did like this:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
So far so go.
Now I need to extract characters from one of the group for example in group 2 I need the first 3 and 8 decimal ( keep mind they could be characters too ).
So I had try something like this :
(.*?)_([38]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It didn’t work but if I do this:
(.*?)_([PH]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It will pull the PH into a group but not the 38 ? So I’m lost at this point.
Any help would be great

Try the below Regex to match any first 3 char/decimal and one decimal
(.?)_([A-Z0-9]{3}[0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
Try the below Regex to match any first 3 char/decimal and one decimal/char
(.?)_([A-Z0-9]{3}[A-Z0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
It will match any 3 letters/digits followed by 1 letter/digit.
If your first two letter is a constant like "PH" then try the below
(.?)_([PH]+[0-9A-Z]{2})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)

I am assuming that you are trying to match group2 starting with numbers. If that is the case then you have change the source string such as
0296005_383843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
It works, check it out at https://regex101.com/r/zem3vt/1

Using [^_]* performs much better in your case than .*? since it doesn't backtrack. So changing your original regex from:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
to:
([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
reduces the number of steps from 114 to 42 for your given string.
The best method might be to actually split your string on _ and then test the second element to see if it contains 38. Since you haven't specified a language, I can't help to show how in your language, but most languages employ a contains or indexOf method that can be used to determine whether or not a substring exists in a string.
Using regex alone, however, this can be accomplished using the following regular expression.
See regex in use here
Ensuring 38 exists in the second part:
([^_]*)_([^_]*38[^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
Capturing the 38 in the second part:
([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)

How can I use REGEX to test for currency formats

How can you create a regular expression that checks if a user input matches characters formally found in a currency syntax? (number, period/decimal place, comma, or dollar sign?).
The following can find all characters listed above except for the dollar sign, any idea how to properly structure this?
/([0-9.,])/g

The regex I use for currency validation is as follows:
^(\$)?([1-9]{1}[0-9]{0,2})(\,\d{3})*(\.\d{2})?$|^(\$)?([1-9]{1}[0-9]{0,2})(\d{3})*(\.\d{2})?$|^(0)?(\.\d{2})?$|^(\$0)?(\.\d{2})?$|^$
RegExr is a great website for testing and reviewing these strings (perhaps you could make a regex string that's less of a beast!)

Are you just trying to test the characters? In that case
[0-9,.$]+
will suffice. Or are you testing for the format $1,123,123.12 with the correct placements of commas and everything?
In that case you would need something more like
(\$?\d{1,3}(?:,\d{3})*(?:.\d{2})?)
should do.

You need to define what you want your regex to match, more formally than "matches characters formally found in a currency syntax". We don't know which currencies you're interested in. We don't know how strict you need it to be.
Maybe you'll come up with something like:
These elements must come in this order:
A currency symbol ('£', '€' or '$') (your requirement might specify more currencies)
1 or more numeric digits
A period or a comma
Exactly two numeric digits
Once you have a specification like that, it's easy to translate into a regular expression:
[£€$] // one of these chars.
\d+ // '+' means 'one or more'
[.,] // '[]' means 'any one of these'.
\d\d // Two digits. Could also be written as '\d{2}'
Or concatenated together:
[£€$]\d+[.,]\d\d
If you've learned about escaping special characters like $ and ., you may be surprised not to see it done here. Within [], they lose their special meaning.
(There are dialects of regex -- check the documentation for whatever implementation you're using)
Your requirements may be different though. The example I've given doesn't match:
$ 12.00
$12
USD12
¥200.00
25¢
$0.00005
20 μBTC
44 dollars
£1/19/11¾d ("one pound, nineteen shillings and elevenpence three farthings")
Work out your requirement, then write your code to meet it.

you should set \ before special chars, also you should set star(0+) or plus(1+) for match full currency chars, for example:
/([0-9\.,]*)/g
or for real price how 200,00 where all time exist 2 symbols after comma:
/(([0-9]+)(\.|,)([0-9]){2})/g

Number groups with 0 as delimiter

There's a long natural number that can be grouped to smaller numbers by the 0 (zero) delimiter.
Example: 4201100370880
This would divide to Group1: 42, Group2: 110, Group3: 370880
There are 3 groups, groups never start with 0 and are at least 1 char long. Also the last groups is "as is", meaning it's not terminated by a tailing 0.
This is what I came up with, but it only works for certain inputs (like 420110037880):
(\d+)0([1-9][0-9]{1,2})0([1-9]\d+)
This shows I'm attempting to declare the 2nd group's length to min2 max3, but I'm thinking the correct solution should not care about it. If the delimiter was non-numeric I could probably tackle it, but I'm stumped.

All right, factoring in comment information, try splitting on a regex (this may vary based on what language you're using - .split(/.../) in JavaScript, preg_split in PHP, etc.)
The regex you want to split on is: 0(?!0). This translates to "a zero that is not followed by a zero". I believe this will solve your splitting problem.
If your language allows a limit parameter (PHP does), set it to 3. If not, you will need to do something like this (JavaScript):
result = input.split(/0(?!0)/);
result = result.slice(0,2).concat(result.slice(2).join("0"));

The following one should suit your needs:
^(.*?)0(?!0)(.*?)0(?!0)(.*)$
Visualization by Debuggex

The following regex works:
(\d+?)0(?!0) with the g modifier
Demo: http://regex101.com/r/rS4dE5
For only three matches, you can do:
(\d+?)0(?!0)(\d+?)0(?!0)(.*)

How to match a one of a set of numbers?

I am trying to match a group of numbers in regex that consist of one of the following:
1,2,3,4,5,6,7,8,9,10,11
But I am having trouble figuring out the regex.
For single digits this pattern worked fine "0|1|2|3|4|5|6|7|8|9" but it fails on double digit numbers. For example 12 passes as ok due to the regex finding the 1 in 12.

You can use begin and end anchors to force the whole string to be matched:
^(0|1|2|3|4|5|6|7|8|9|10|11)$
Which can be shortened to:
^(\d|10|11)$

This will work if you want to check if just one number is between 0 and 11.
^[0-9]$|^1?[0-1]$
If you want to match a string like:
1,2,3,12,32,5,1,6,8, 11
and match 0-11 then you can use the following:
(?<=,|^)([0-9]|1?[0-1])(?=,|$)

use this regex ^(0|1|2|3|4|5|6|7|8|9|(10)|(11))$

What is wrong with this Regular Expression?

I am beginner and have some problems with regexp.
Input text is : something idUser=123654; nick="Tom" something
I need extract value of idUser -> 123456
I try this:
//idUser is already 8 digits number
MatchCollection matchsID = Regex.Matches(pk.html, #"\bidUser=(\w{8})\b");
Text = matchsID[1].Value;
but on output i get idUser=123654, I need only number
The second problem is with nick="Tom", how can I get only text Tom from this expresion.

you don't show your output code, where you get the group from your match collection.
Hint: you will need group 1 and not group 0 if you want to have only what is in the parentheses.

.*?idUser=([0-9]+).*?
That regex should work for you :o)

Here's a pattern that should work:
\bidUser=(\d{3,8})\b|\bnick="(\w+)"
Given the input string:
something idUser=123654; nick="Tom" something
This yields 2 matches (as seen on rubular.com):
First match is User=123654, group 1 captures 123654
Second match is nick="Tom", group 2 captures Tom
Some variations:
In .NET regex, you can also use named groups for better readability.
If nick always appears after idUser, you can match the two at once instead of using alternation as above.
I've used {3,8} repetition to show how to match at least 3 and at most 8 digits.
API links
Match.Groups property
This is how you get what individual groups captured in a match

Use look-around
(?<=idUser=)\d{1,8}(?=(;|$))
To fix length of digits to 6, use (?<=idUser=)\d{6}(?=($|;))

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Is it possible to negate a group in a regular expression? - regex

Try : /20\d{2}-(?:0[1-9]|1[12])-30/ Explanation : 20\d{2} it will match 20XX (?:0[1-9]|1[12]) it will match 0X or 11, 12 30 it will match 30 Demo :https://regex101.com/r/O2F1eV/1

Related

Regular Expression Extracting Text from a group

How can I use REGEX to test for currency formats

Number groups with 0 as delimiter

How to match a one of a set of numbers?

What is wrong with this Regular Expression?

Categories

Resources