Regex - maximum 30 chars with limitation - regex

I still haven't managed to find a solution to the problem I have with my regex.
Case:
Input-
Deutsche Gesellschaftsgeschichte
Expected output:
Group 1 - Deutsche
Group 2 - Gesellschaftsgeschichte
This is because the two words together exceed 30 characters, so they are split into two groups.
The input can be up to 90 characters long.
More examples of inputs and expected outputs:
Input -
Fachlich geeignet, politisch unzuverlässig...
Output -
Fachlich geeignet, politisch
unzuverlässig...
Input -
Textbuch zur Privatrechtsgeschichte der Neuzeit
Output -
Textbuch zur
Privatrechtsgeschichte der
Neuzeit

To get what you want in up to 3 groups, you can use this regex:
(\b.{1,30}(?=\b))(\b.{1,30}(?=\b))?(\b.{1,30}(?=\b))?
The regex starts with a word boundary, matches 1 to 30 characters of any kind, and then looks ahead for a word boundary, so it only splits between whole words. Because .{1,30} is greedy, each group takes the longest run of at most 30 characters that ends on a word boundary.
This pattern is repeated 3 times (the last 2 repetitions are optional).
You now have your chunks in up to 3 groups, which you can access by index.
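A minimal sketch of the answer's regex in Python, assuming Python's `re` module (where `\w`, and therefore `\b`, is Unicode-aware, so letters like "ä" count as word characters). The names `CHUNKS` and `split_into_chunks` are illustrative, not from the original post. Because a group can begin or end with the space that separated two words, the sketch strips each group and drops the unused optional groups.

```python
import re

# Regex from the answer: up to three chunks, each 1-30 characters long and
# bounded by word boundaries (the 2nd and 3rd chunks are optional).
CHUNKS = re.compile(r"(\b.{1,30}(?=\b))(\b.{1,30}(?=\b))?(\b.{1,30}(?=\b))?")


def split_into_chunks(text):
    """Return the non-empty groups with surrounding spaces removed."""
    m = CHUNKS.match(text)
    if m is None:
        return []
    # A group can carry the leading or trailing space between two words.
    return [g.strip() for g in m.groups() if g]


if __name__ == "__main__":
    for title in ["Deutsche Gesellschaftsgeschichte",
                  "Textbuch zur Privatrechtsgeschichte der Neuzeit"]:
        print(split_into_chunks(title))
```

One limitation worth knowing: trailing punctuation that is not followed by a word boundary is not captured. For "Fachlich geeignet, politisch unzuverlässig..." the final "..." is dropped, because there is no word boundary between the last "." and the end of the string.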

Related

Regex needed to match a proper currency format of whole numbers, and set maximum dollar value

I'm trying to create a regex that enforces:
whole numbers only, no decimals/fractions
thousands separated by commas
sets a maximum value allowed. Acceptable range of 1-25,000,000,000 (25 billion)
I created the following regex that already accomplishes the first 2 requirements, only allowing acceptable values like:
1
1,000
25,000
250,000,000 etc.
but it's the 3rd requirement of setting a maximum value of 25 billion that I'm struggling with.
Does anyone know a way to enhance this pattern so that it only allows values in the range 1 to 25,000,000,000?
^[1-9]\d?\d?$|^(?!0,)(?!0\d,)(?!0\d\d,)(\d\d?\d?,)+\d{3}$
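A quick way to see the gap described here is to run the pattern above against a few values. This sketch assumes Python's `re` module; the helper name `accepts` is illustrative. It shows that the pattern accepts the well-formed examples but also values above 25 billion, and, because each `(\d\d?\d?,)` repetition allows one to three digits, irregular grouping such as 1,23,456.

```python
import re

# The pattern from the question: whole numbers with comma separators,
# but no upper bound on the value.
ORIGINAL = re.compile(r"^[1-9]\d?\d?$|^(?!0,)(?!0\d,)(?!0\d\d,)(\d\d?\d?,)+\d{3}$")


def accepts(value):
    """True if the original pattern matches the whole value."""
    return ORIGINAL.match(value) is not None


if __name__ == "__main__":
    for value in ["1", "1,000", "25,000", "250,000,000",
                  "99,999,999,999", "1,23,456", "1234", "0,000"]:
        print(value, accepts(value))
```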
I did a lot of searching and found a regex that imposes a maximum value, but I can't quite figure out how to adapt it to meet all 3 requirements. This is the one I found:
^((25000000000)|(2[0-4][0-9]{9})|(1[0-9]{10})|([1-9][0-9]{9})|([1-9][0-9]{8})|([1-9][0-9]{7})|([1-9][0-9]{6})|([1-9][0-9]{5})|([1-9][0-9]{4})|([1-9][0-9]{3})|([1-9][0-9]{2})|([1-9][0-9]{1})|([1-9]))$
I think this should do the trick:
^([1-9]\d{0,2}(,\d{3}){0,2})$|^(([1-9]|1\d|2[0-4])(,\d{3}){3})$|^25(,000){3}$
This regex consists of 3 alternatives:
[1-9]\d{0,2}(,\d{3}){0,2}: a digit 1-9 followed by up to 2 more digits, then up to 2 optional blocks of a comma and 3 digits (supports up to 999,999,999).
([1-9]|1\d|2[0-4])(,\d{3}){3}: the billions part, which is 1-9, or a 1 followed by any digit (10-19), or a 2 followed by a digit 0-4 (20-24). It is followed by 3 blocks of a comma and 3 digits (supports up to 24,999,999,999).
25(,000){3}: finally, the special case 25,000,000,000.
It matches:
1
12
123
1,000
25,000
250,000
2,500,000
24,999,999
25,000,000
250,000,000
1,500,000,000
2,500,000,000
15,000,000,000
24,999,999,999
25,000,000,000
And does not match:
0
1234
0,000
0,000,999
0,999,999,999
25,000,000,001
99,999,999,999
250,000,000,000
25,000,000,000,000
99,99,999
9,9,9,9,999
24999999999
25000000000
25000000001
26000000000
35000000000
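The two lists above can be turned into a quick check. This is a minimal sketch using Python's `re` module; the names `MAX_25_BILLION`, `is_valid`, `SHOULD_MATCH`, and `SHOULD_NOT_MATCH` are illustrative. It uses `2[0-4]` for the leading billions digits so that 20 through 24 are all covered (with `2[1-4]`, a value such as 20,000,000,000 would be rejected), and it adds 20,000,000,000 to the values that should match.

```python
import re

# Three alternatives: up to 999,999,999; 1 to 24 billion (leading part
# 1-9, 10-19 or 20-24); and exactly 25,000,000,000.
MAX_25_BILLION = re.compile(
    r"^([1-9]\d{0,2}(,\d{3}){0,2})$"
    r"|^(([1-9]|1\d|2[0-4])(,\d{3}){3})$"
    r"|^25(,000){3}$"
)

SHOULD_MATCH = [
    "1", "12", "123", "1,000", "25,000", "250,000", "2,500,000",
    "24,999,999", "25,000,000", "250,000,000", "1,500,000,000",
    "2,500,000,000", "15,000,000,000", "20,000,000,000",
    "24,999,999,999", "25,000,000,000",
]
SHOULD_NOT_MATCH = [
    "0", "1234", "0,000", "0,000,999", "0,999,999,999", "25,000,000,001",
    "99,999,999,999", "250,000,000,000", "25,000,000,000,000", "99,99,999",
    "9,9,9,9,999", "24999999999", "25000000000", "25000000001",
    "26000000000", "35000000000",
]


def is_valid(value):
    """True if value is a comma-grouped whole number from 1 to 25 billion."""
    return MAX_25_BILLION.match(value) is not None


if __name__ == "__main__":
    print(all(is_valid(v) for v in SHOULD_MATCH))      # True
    print(any(is_valid(v) for v in SHOULD_NOT_MATCH))  # False
```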

Regex for date and 3-letter code

I've been trying to create a regex that will extract a date and a 3-letter code from a longer message. Here are examples of the messages and what I want to get from each:
AAA BBB 1A BY PEK14JUN18/1654 OR QQQ MF 812 XXXXX -> PEK, 14JUN18/1654
XXX/WWWW BY 05JUL 0900 BKK LT ELSE BKG WILL BE QQQQ -> BKK, 05JUL 0900
TO AZ BY 02AUG 1910 TYO OR AZ WWWW WILL BE XXX -> TYO, 02AUG 1910
BY TYO20JUL18/0355 OR CXL CA ALL QQQ -> TYO, 20JUL18/0355
BY AMS04JUL18/1954 OR CXL MF 812 L07JUL -> AMS, 04JUL18/1954
I want to be able to match the 3-letter code and the date in every message. The code is always next to the date, but it can come before or after it. Also, the date part can be with or without a year. Is it possible to have one regex for all of the above examples?
I started with this:
(\s[A-Z]{3}\d\d|\d\d[A-Z]{3}\s)
(https://regex101.com/r/LPLjgf/1) but it's not working as it should and I'm not very experienced with regex to be honest.
EDIT:
Actually, I only need the 3-letter codes, but they must be attached to a date. For example, in:
AAA BBB 1A BY PEK14JUN18/1654 OR QQQ MF 812 XXXXX
AAA, BBB, and QQQ shouldn't match, because unlike PEK they aren't directly before or after the date.
The same goes for BY TYO20JUL18/0355 OR CXL CA ALL QQQ: only TYO should match, because it comes right before a date; CXL shouldn't.
You may use the following pattern:
([A-Z]{3})(\d{2}[A-Z]{3}\d{2}\/\d{4})|(\d{2}[A-Z]{3} \d{4}) ([A-Z]{3})
([A-Z]{3}) Capturing group for three capital letters
(\d{2}[A-Z]{3}\d{2}\/\d{4}) Capturing group for two digits, three upper case letters, two digits, /, four digits.
| Alternation (logical OR) between the two patterns.
(\d{2}[A-Z]{3} \d{4}) Capturing group. Captures two digits, three upper case letters, whitespace and four digits.
([A-Z]{3}) Capturing group for three upper case letters.
Captured groups:
Group 1. 14-17 `PEK`
Group 2. 17-29 `14JUN18/1654`
Group 3. 83-93 `05JUL 0900`
Group 4. 94-97 `BKK`
Group 3. 151-161 `02AUG 1910`
Group 4. 162-165 `TYO`
Group 1. 211-214 `TYO`
Group 2. 214-226 `20JUL18/0355`
Group 1. 269-272 `AMS`
Group 2. 272-284 `04JUL18/1954`
Group 1. 342-345 `PEK`
Group 2. 345-357 `14JUN18/1654`
Group 1. 378-381 `TYO`
Group 2. 381-393 `20JUL18/0355`
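A minimal sketch of applying this pattern to the five example messages from the question, assuming Python's `re` module; `CODE_AND_DATE`, `extract_code_and_date`, and `MESSAGES` are illustrative names. Since the code lands in group 1 or group 4 depending on which alternative matched, the helper normalizes each match into a (code, date) pair. The `/` needs no escaping in a Python pattern.

```python
import re

# Alternative 1 (groups 1-2): code directly before a date with a year.
# Alternative 2 (groups 3-4): date without a year, a space, then the code.
CODE_AND_DATE = re.compile(
    r"([A-Z]{3})(\d{2}[A-Z]{3}\d{2}/\d{4})|(\d{2}[A-Z]{3} \d{4}) ([A-Z]{3})"
)


def extract_code_and_date(message):
    """Return (code, date) pairs, whichever alternative matched."""
    pairs = []
    for m in CODE_AND_DATE.finditer(message):
        if m.group(1) is not None:
            pairs.append((m.group(1), m.group(2)))
        else:
            pairs.append((m.group(4), m.group(3)))
    return pairs


MESSAGES = [
    "AAA BBB 1A BY PEK14JUN18/1654 OR QQQ MF 812 XXXXX",
    "XXX/WWWW BY 05JUL 0900 BKK LT ELSE BKG WILL BE QQQQ",
    "TO AZ BY 02AUG 1910 TYO OR AZ WWWW WILL BE XXX",
    "BY TYO20JUL18/0355 OR CXL CA ALL QQQ",
    "BY AMS04JUL18/1954 OR CXL MF 812 L07JUL",
]

if __name__ == "__main__":
    for msg in MESSAGES:
        print(extract_code_and_date(msg))
```

Each example message yields exactly one pair, and codes such as AAA, QQQ, and CXL that are not adjacent to a date are not picked up.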
Firstly
(\s[A-Z]{3}\d\d|\d\d[A-Z]{3}\s)
The alternation (|) means this matches either \s[A-Z]{3}\d\d or \d\d[A-Z]{3}\s, which is certainly not what you want. To narrow the scope of an alternation, use grouping.
I would think you want to match this fairly directly:
([A-Z]{3})\d{2}[A-Z]{3}\d{2}
That captures only the three letters in a group. Note that it only covers a code placed directly before a date that includes a year.
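To see what this narrower pattern covers, here is a minimal sketch that runs it over the five example messages with Python's `re.findall` (the names `CODE_BEFORE_DATE`, `MESSAGES`, and `codes_before_dates` are illustrative). Since the pattern has a single group, `findall` returns just the captured codes. It finds PEK, TYO, and AMS, but not BKK or the TYO in the third message, where the code follows a date without a year.

```python
import re

# Three capital letters captured, immediately followed by a date with a year.
CODE_BEFORE_DATE = re.compile(r"([A-Z]{3})\d{2}[A-Z]{3}\d{2}")

MESSAGES = [
    "AAA BBB 1A BY PEK14JUN18/1654 OR QQQ MF 812 XXXXX",
    "XXX/WWWW BY 05JUL 0900 BKK LT ELSE BKG WILL BE QQQQ",
    "TO AZ BY 02AUG 1910 TYO OR AZ WWWW WILL BE XXX",
    "BY TYO20JUL18/0355 OR CXL CA ALL QQQ",
    "BY AMS04JUL18/1954 OR CXL MF 812 L07JUL",
]


def codes_before_dates(messages):
    """Collect every code that directly precedes a dated timestamp."""
    return [code for msg in messages for code in CODE_BEFORE_DATE.findall(msg)]


if __name__ == "__main__":
    print(codes_before_dates(MESSAGES))
```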
Try the following RegEx:
[A-Z]{3}(\d{2}[A-Z]{3}[\S]*)|(\d{2}[A-Z]{3}\s\d{4}\s[A-Z]{3})
It will match 3 letters followed by 2 digits and 3 letters (plus any non-space characters after them), OR 2 digits followed by 3 letters, a space, 4 digits, a space, and 3 letters.
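A minimal sketch of what each group of this pattern holds, assuming Python's `re.search`; the helper `describe` is illustrative. In the first alternative the three-letter code sits outside the capturing group, so group 1 contains only the date part, while in the second alternative group 2 contains the date and the code together.

```python
import re

PATTERN = re.compile(r"[A-Z]{3}(\d{2}[A-Z]{3}[\S]*)|(\d{2}[A-Z]{3}\s\d{4}\s[A-Z]{3})")


def describe(message):
    """Return (whole match, group 1, group 2) for the first match, or None."""
    m = PATTERN.search(message)
    if m is None:
        return None
    return m.group(0), m.group(1), m.group(2)


if __name__ == "__main__":
    print(describe("BY TYO20JUL18/0355 OR CXL CA ALL QQQ"))
    print(describe("XXX/WWWW BY 05JUL 0900 BKK LT ELSE BKG WILL BE QQQQ"))
```

If you need the code on its own from the first alternative, you would have to add a capturing group around the leading [A-Z]{3}.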

Regular Expression for parsing a sports score

I'm trying to validate that a form field contains a valid score for a volleyball match. Here's what I have, and I think it works, but I'm not an expert on regular expressions, by any means:
r'^ *([0-9]{1,2} *- *[0-9]{1,2})((( *[,;] *)|([,;] *)|( *[,;])|[,;]| +)[0-9]{1,2} *- *[0-9]{1,2})* *$'
I'm using python/django, not that it really matters for the regex match. I'm also trying to learn regular expressions, so a more optimal regex would be useful/helpful.
Here are rules for the score:
1. There can be one or more valid set (set=game) results included
2. Each result must be of the form dd-dd, where 0 <= dd <= 99
3. Each additional result must be separated by any of [ ,;]
4. Allow any number of sets >=1 to be included
5. Spaces should be allowed anywhere except in the middle of a number
So, the following are all valid:
25-10 or 25 -0 or 25- 9 or 23 - 25 (could be one or more spaces)
25-10,25-15 or 25-10 ; 25-15 or 25-10 25-15 (again, spaces allowed)
25-1 2 -25, 25- 3 ;4 - 25 15-10
Also, I need each result as a separate unit for parsing. So in the last example above, I need to be able to separately work on:
25-1
2 -25
25- 3
4 - 25
15-10
It'd be great if I could strip the spaces from within each result. I can't just strip all spaces, because a space is a valid separator between result sets.
I think this solves your problem (with score being the input string):
re.sub(r"(\d{1,2})\s*-\s*(\d{1,2})", r"\1-\2", score)
How it works:
(\d{1,2}) captures 1 or 2 digits.
\s* matches 0 or more whitespace characters.
- matches a literal hyphen.
\1 in the replacement inserts the content of capture group 1.
\2 in the replacement inserts the content of capture group 2.
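The question also asks for each result as a separate unit. A minimal sketch, assuming Python's `re` module, that validates the whole field with the asker's pattern and then uses `findall` with the answer's capture groups to return each set without internal spaces. The names `VALID_SCORE`, `SET_RESULT`, and `parse_sets` are illustrative, and raising `ValueError` on invalid input is an assumed design choice (a Django form would typically turn that into a validation error).

```python
import re

# The asker's validation pattern, split across two lines for readability.
VALID_SCORE = re.compile(
    r"^ *([0-9]{1,2} *- *[0-9]{1,2})"
    r"((( *[,;] *)|([,;] *)|( *[,;])|[,;]| +)[0-9]{1,2} *- *[0-9]{1,2})* *$"
)
# One set result: two 1-2 digit numbers around a hyphen, spaces allowed.
SET_RESULT = re.compile(r"(\d{1,2})\s*-\s*(\d{1,2})")


def parse_sets(score):
    """Validate the whole field, then return each set as 'dd-dd' without spaces."""
    if VALID_SCORE.match(score) is None:
        raise ValueError(f"invalid score: {score!r}")
    return [f"{a}-{b}" for a, b in SET_RESULT.findall(score)]


if __name__ == "__main__":
    print(parse_sets("25-1 2 -25, 25- 3 ;4 - 25 15-10"))
```

Using `findall` instead of a replacement avoids having to split the normalized string afterwards, since spaces are also valid separators between sets.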

Conditional Regex Parsing

I have a bunch of product codes that I'm trying to parse (for example, 99 ITEM SEC SALE). In rare cases, the product code looks like 99 ITEM SEC instead.
If the cell is "99 ITEM SEC SALE", then "ITEM SEC" should be parsed out (dropping 99 and SALE).
If the cell is "99 ITEM SEC" (no SALE, SOLD, or PURCHASED), I want "ITEM SEC" to be parsed out as well. In other words, SALE, SOLD, and PURCHASED are excluded words.
1- It always starts with a number (any number of digits)
2- Alphabetic characters (any)
3- Alphabetic characters (any), optional
4- If the ending value (string) is not "SALE", "SOLD", or "PURCHASED", take the digits out and parse the rest
I found something similar but could not figure out how it should work for my case.
Thanks for the help
Okay, so what you're looking for is something like this:
(?P<number>\d+)\s+(?P<Item_Name>\w+)\s+(?P<code>[a-zA-Z]{0,3})(?:\s+(?P<status>SOLD|SALE|PURCHASED))?
(?P<number>\d+) -- Named capture group 1 (number): one or more digits
\s+ -- one or more whitespace characters
(?P<Item_Name>\w+) -- Named capture group 2 (Item_Name): one or more word characters, up to the next space
\s+ -- one or more whitespace characters
(?P<code>[a-zA-Z]{0,3}) -- Named capture group 3 (code): 0 to 3 letters a-z or A-Z
(?:\s+(?P<status>SOLD|SALE|PURCHASED))? -- a non-capturing group holding whitespace plus named capture group 4 (status): SOLD, SALE, or PURCHASED. The trailing ? makes the whole group optional, so 99 ITEM SEC matches even without a trailing space.
Live example: https://regex101.com/r/oR3sK8/1
I don't recall whether named capture groups work like this in Objective-C. If they don't, you can remove the ?P<...> parts and the regex should still work, with the capture groups numbered in the same order.
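A minimal sketch of the extraction in Python, whose `re` module supports the `(?P<name>...)` syntax used here. The names `PRODUCT` and `parse_product` are illustrative. This sketch wraps the whitespace and the status word in an optional non-capturing group, `(?:\s+(?P<status>...))?`; with a bare `\s+` in front of an optional status group, a cell like 99 ITEM SEC would need a trailing space to match.

```python
import re

# Number, item name, short code, then an optional status word.
PRODUCT = re.compile(
    r"(?P<number>\d+)\s+(?P<Item_Name>\w+)\s+(?P<code>[a-zA-Z]{0,3})"
    r"(?:\s+(?P<status>SOLD|SALE|PURCHASED))?"
)


def parse_product(cell):
    """Return 'ITEM CODE' with the leading number and any status word removed."""
    m = PRODUCT.match(cell)
    if m is None:
        return None
    # Keep only the item name and the code (the code may be empty).
    return " ".join(part for part in (m["Item_Name"], m["code"]) if part)


if __name__ == "__main__":
    for cell in ["99 ITEM SEC SALE", "99 ITEM SEC"]:
        print(parse_product(cell))
```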

Visual Basic - RegEx - Overall Length Check regardless the number of matches

I have the following problem.
This is my regex pattern:
\d*[a-z A-Z][a-zA-Z0-9 _?!()\/\\]*
It allows anything except standalone numbers such as 1, 11, 111, and so on.
My question: how can I set the overall length of the input, regardless of the matches?
I tried several options, such as adding {1,30} to each part, and putting the regex in a group ( ) followed by {1,30}, but it still doesn't work.
If anyone could help me, I would appreciate it :).
Allowed strings:
Group1
Group 1
1Group
Group!?()\/
Group !()\?!
a1 a1 a1 a1
Not allowed:
1
11
and so on. Putting {1,30} after a part only limits how many times that part can repeat. What I want to know is: how can I set the maximum length of my regex above, so that the input is capped at 30 characters regardless of how the individual parts match?
To disallow input that consists only of digits, you can use a negative lookahead (?!\d+$), and to limit the length of the input, use a limiting quantifier {1,30}:
(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}
Note that if you plan to match whole strings, you need anchors: ^ at the beginning anchors the regex to the beginning of the string, and $ anchors it to the end.
^(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}$
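A minimal sketch checking the anchored pattern against the examples from the question. It is written in Python rather than Visual Basic, on the assumption that the behavior carries over, since .NET's Regex class supports the same lookahead, quantifier, and anchor syntax. The names `LIMITED` and `is_allowed` are illustrative. In Python, `$` also matches just before a trailing newline, so the sketch uses `fullmatch` to be strict.

```python
import re

# Reject all-digit input; allow 1-30 characters from the permitted set.
LIMITED = re.compile(r"^(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}$")


def is_allowed(text):
    """True if the whole string satisfies the length and content rules."""
    return LIMITED.fullmatch(text) is not None


if __name__ == "__main__":
    for s in ["Group1", "Group 1", "1Group", "Group!?()\\/",
              "Group !()\\?!", "a1 a1 a1 a1", "1", "11", "a" * 31]:
        print(repr(s), is_allowed(s))
```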