Capture values >= 20 $ (without cents, currency symbol and spaces) - regex

I would like to capture $- (or £-)prices >= 20, without cents (pence), where the $ (£) may be in front of or after the value and the currency-symbol may be separated from the value by space(s) or not, e.g.:
$20
$3000
£ 60.67 (but only the '60'-part)
33$
500.99$ (but only the '500'-part)
90 £
Something like:
(?:[^\d][$£] ?)([\d]{3,}|[2-9][\d]{1})|([\d]{3,}|[2-9][\d]{1})(?: *.?[0-9]* ?[$£])
...which works, but simpler (or at least without the non-capturing (?: )-syntax because it doesn't work with my regex browser highlight extension.
I would like to use this to highlight prices e.g. on Amazon via a regex browser extension. If you happen to know a good one (which possibly even supports (?: )-syntax) I'd be happy to hear your suggestions, too :-)
Many thanks in advance

You can use the following regex:
(?<=[$£])\s*([2-9]\d|\d{3,})(?=[\.\s])|(?<!\.)([2-9]\d|\d{3,})(?=(?:\.\d+)?\s*[$£])
which splits the matching in two cases:
case 1: your money symbol is found before the numeric value
case 2: your money symbol is found after the numeric value
"Case 1": (?<=[$£])\s*([2-9]\d|\d{3,})(?=[\.\s])
(?<=[$£]): the money symbol is followed by...
\s*: spaces
([2-9]\d|\d{3,}): 20 or bigger numeric values
(?=[\.\s]): followed by either a dot (if decimal) or a space
"Case 2": (?<!\.)([2-9]\d|\d{3,})(?=(?:\.\d+)?\s*[$£])
(?<!\.): match is not preceeded by a dot
([2-9]\d|\d{3,}): 20 or bigger numeric values
(?=(?:\.\d+)?\s*[$£]): followed by optional dot and numbers (for decimal), mandatory spaces and the money symbol
Check the demo here.

Related

Capture the latest in backreference

I have this regex
(\b(\S+\s+){1,10})\1.*MY
and I want to group 1 to capture "The name" from
The name is is The name MY
I get "is" for now.
The name can be any random words of any length.
It need not be at the beginning.
It need on be only 2 or 3 words. It can be less than 10 words.
Only thing sure is that it will be the last set of repeating words.
Examples:
The name is Anthony is is The name is Anthony - "The name is Anthony".
India is my country All Indians are India is my country - "India is my country "
Times of India Alphabet Google is the company Alphabet Google canteen - "Alphabet Google"
You could try:
(\b\w+[\w\s]+\b)(?:.*?\b\1)
As demonstrated here
Explanation -
(\b\w+[\w\s]+\b) is the capture group 1 - which is the text that is repeated - separated by word boundaries.
(?:.*?\b\1) is a non-capturing group which tells the regex system to match the text in group 1, only if it is followed by zero-or-more characters, a word-boundary, and the repeated text.
Regex generally captures thelongest le|tmost match. There are no examples in your question where this would not actualny be the string you want, but that could just mean you have not found good examples to show us.
With that out of the way,
((\S+\s)+)(\S+\s){0,9}\1
would appear to match your requirements as currently stated. The "longest leftmost" behavior could still get in the way if there are e.g. straddling repetitions, like
this that more words this that more words
where in the general case regex alone cannot easily be made to always prefer the last possible match and tolerate arbitrary amounts of text after it.

regex for a string ,\"16 questions\",

I am not good in regex and I spent so much time figure out how to search for the below pattern:
,\"16 questions\",
This is what I constructed .\"[0-9,]+ questions\".
I think I am close but not sure how much. Can someone please correct it. The numeric value can have comma in it when the number crosses 1k. e.g 2,500 questions.
,"\d{1,3}(,\d{3,3})*\squestions?",
Explanation:
\d{1,3}= 1~3 decimal digits
(,\d{3,3})* = comma and 3 decimal digits, the whole group repeating 0~N times
\s = whitespace
s? = letter s can be missing
These two parts give you accurate recognition of possible numbers.
▶ Test and visualization.
If the backslashes in your text are true backslashes, then the regex including them would be
,\\"\d{1,3}(,\d{3,3})*\squestions?\\",
This works. You didn't indicate if the numeric value could have more than 1 comma (e.g. 1,000,000)
,\\"((\d{1,3})(,\d{3})*)\squestions?\\",

Regex for validation of a street number

I'm using an online tool to create contests. In order to send prizes, there's a form in there asking for user information (first name, last name, address,... etc).
There's an option to use regular expressions to validate the data entered in this form.
I'm struggling with the regular expression to put for the street number (I'm located in Belgium).
A street number can be the following:
1234
1234a
1234a12
begins with a number (max 4 digits)
can have letters as well (max 2 char)
Can have numbers after the letter(s) (max3)
I came up with the following expression:
^([0-9]{1,4})([A-Za-z]{1,2})?([0-9]{1,3})?$
But the problem is that as letters and second part of numbers are optional, it allows to enter numbers with up to 8 digits, which is not optimal.
1234 (first group)(no letters in the second group) 5678 (third group)
If one of you can tip me on how to achieve the expected result, it would be greatly appreciated !
You might use this regex:
^\d{1,4}([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|)$
where:
\d{1,4} - 1-4 digits
([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|) - optional group, which can be
[a-zA-Z]{1,2}\d{1,3} - 1-2 letters + 1-3 digits
or
[a-zA-Z]{1,2} - 1-2 letters
or
empty
\d{0,4}[a-zA-Z]{0,2}\d{0,3}
\d{0,4} The first groupe matches a number with 4 digits max
[a-zA-Z]{0,2} The second groupe matches a char with 2 digit in max
\d{0,3} The first groupe matches a number with 3 digits max
You have to keep the last two groups together, not allowing the last one to be present, if the second isn't, e.g.
^\d{1,4}(?:[a-zA-z]{1,2}\d{0,3})?$
or a little less optimized (but showing the approach a bit better)
^\d{1,4}(?:[a-zA-z]{1,2}(?:\d{1,3})?)?$
As you are using this for a validation I assumed that you don't need the capturing groups and replaced them with non-capturing ones.
You might want to change the first number check to [1-9]\d{0,3} to disallow leading zeros.
Thank you so much for your answers ! I tried Sebastian's solution :
^\d{1,4}(?:[a-zA-z]{1,2}\d{0,3})?$
And it works like a charm ! I still don't really understand what the ":" stand for, but I'll try to figure it out next time i have to fiddle with Regex !
Have a nice day,
Stan
The first digit cannot be 0.
There shouldn't be other symbols before and after the number.
So:
^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$
The ?: combination means that the () construction does not create a matching substring.
Here is the regex with tests for it.

How can I use REGEX to test for currency formats

How can you create a regular expression that checks if a user input matches characters formally found in a currency syntax? (number, period/decimal place, comma, or dollar sign?).
The following can find all characters listed above except for the dollar sign, any idea how to properly structure this?
/([0-9.,])/g
The regex I use for currency validation is as follows:
^(\$)?([1-9]{1}[0-9]{0,2})(\,\d{3})*(\.\d{2})?$|^(\$)?([1-9]{1}[0-9]{0,2})(\d{3})*(\.\d{2})?$|^(0)?(\.\d{2})?$|^(\$0)?(\.\d{2})?$|^$
RegExr is a great website for testing and reviewing these strings (perhaps you could make a regex string that's less of a beast!)
Are you just trying to test the characters? In that case
[0-9,.$]+
will suffice. Or are you testing for the format $1,123,123.12 with the correct placements of commas and everything?
In that case you would need something more like
(\$?\d{1,3}(?:,\d{3})*(?:.\d{2})?)
should do.
You need to define what you want your regex to match, more formally than "matches characters formally found in a currency syntax". We don't know which currencies you're interested in. We don't know how strict you need it to be.
Maybe you'll come up with something like:
These elements must come in this order:
A currency symbol ('£', '€' or '$') (your requirement might specify more currencies)
1 or more numeric digits
A period or a comma
Exactly two numeric digits
Once you have a specification like that, it's easy to translate into a regular expression:
[£€$] // one of these chars.
\d+ // '+' means 'one or more'
[.,] // '[]' means 'any one of these'.
\d\d // Two digits. Could also be written as '\d{2}'
Or concatenated together:
[£€$]\d+[.,]\d\d
If you've learned about escaping special characters like $ and ., you may be surprised not to see it done here. Within [], they lose their special meaning.
(There are dialects of regex -- check the documentation for whatever implementation you're using)
Your requirements may be different though. The example I've given doesn't match:
$ 12.00
$12
USD12
¥200.00
25¢
$0.00005
20 μBTC
44 dollars
£1/19/11¾d ("one pound, nineteen shillings and elevenpence three farthings")
Work out your requirement, then write your code to meet it.
you should set \ before special chars, also you should set star(0+) or plus(1+) for match full currency chars, for example:
/([0-9\.,]*)/g
or for real price how 200,00 where all time exist 2 symbols after comma:
/(([0-9]+)(\.|,)([0-9]){2})/g

best approach for my pattern match

So, I've built a regex which follows this:
4!a2!a2!c[3!c]
which is translated to
4 alpha character followed by
2 alpha characters followed by
2 characters followed by
3 optional character
this is a standard format for SWIFT BIC code HSBCGB2LXXX
my regex to pull this out of string is:
(?<=:32[^:]:)(([a-zA-Z]{4}[a-zA-Z]{2})[0-9][a-zA-Z]{1}[X]{3})
Now this is targeting a specific tag (32) and works, however, I'm not sure if it's the cleanest, plus if there are any characters before H then it fails.
the string being matched against is:
:32B:HsBfGB4LXXXHELLO
the following returns HSBCGB4LXXX, but this:
:32B:2HsBfGB4LXXXHELLO
returns nothing.
EDIT
For clarity. I have a string which contains multiple lines all starting with :2xnumber:optional letter (eg, :58A:) i want to specify a line to start matching in and return a BIC from anywhere in the line.
EDIT
Some more example data to help:
:20:ABCDERF Z
:23B:CRED
:32A:140310AUD2120,
:33B:AUD2120,
:50K:/111222333
Mr Bank of Dad
Dads house
England
:52D:/DBEL02010987654321
address 1
address 2
:53B:/HSBCGB2LXXX
:57A://AU124040
AREFERENCE
:59:/44556677
A line which HSBCGB2LXXX contains a BIC
:70:Another line of data
:71A:Even more
Ok, so I need to pass in as a variable the tag 53 or 59 and return the BIC HSBCGB2LXXX only!
Your regex can be simplified, and corrected to allow a character before the H, to:
:32[^:]:.?([a-zA-Z]{6}\d[a-zA-Z]XXX)
The changes made were:
Lost the look behind - just make it part of the match
Inserting .? meaning "optional character"
([a-zA-Z]{4}[a-zA-Z]{2}) ==> [a-zA-Z]{6} (4+2=6)
[0-9] ==> \d (\d means "any digit")
[X]{3} ==> XXX (just easier to read and less characters)
Group 1 of the match contains your target
I'm not quite sure if I understand your question completely, as your regular expression does not completely match what you have described above it. For example, you mentioned 3 optional characters, but in the regexp you use 3 mandatory X-es.
However, the actual regular expression can be further cleaned:
instead of [a-zA-Z]{4}[a-zA-Z]{2}, you can simply use [a-zA-Z]{6}, and the grouping parentheses around this might be unnecessary;
the {1} can be left out without any change in the result;
the X does not need surrounding brackets.
All in all
(?<=:32[^:]:)([a-zA-Z]{6}[0-9][a-zA-Z]X{3})
is shorter and matches in the very same cases.
If you give a better description of the domain, probably further improvements are also possible.