RegEx capture group that excludes a specific pattern - regex

I am trying to come up with a RegEx pattern that takes strings that look like this:
KEEP_THIS_L_1234
KEEP_THIS_R_12
KEEP_THIS
and returns a capture group with this result:
KEEP_THIS
KEEP_THIS
KEEP_THIS
So far I have tried /^(\w+)(?=_(L|R))(?=_\d{0,4})/, but this pattern only returns the capture groups for the first two instances:
KEEP_THIS
KEEP_THIS
Can someone help me understand what I am missing?
Thanks!

You need to make the last two groups optional, like this:
/^(\w+?)((_(L|R))(_\d{0,4}))?$/
Your desired result will always be in $1.
This has the advantage that the other captured data (if any) stays available: the whole suffix is in $2, and its parts are in $3 through $5.
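As a quick check of the optional-group approach, here is a minimal Python sketch. The question does not name a language, so Python's re module is an assumption, and the helper name base_name is hypothetical.

```python
import re

# Lazy prefix, then an optional "_L_1234"-style suffix anchored at the end.
PATTERN = re.compile(r"^(\w+?)((_(L|R))(_\d{0,4}))?$")

def base_name(text):
    """Return capture group 1, or None if the string does not match."""
    match = PATTERN.match(text)
    return match.group(1) if match else None

samples = ["KEEP_THIS_L_1234", "KEEP_THIS_R_12", "KEEP_THIS"]
results = [base_name(s) for s in samples]
print(results)
```

The lazy \w+? matters here: because the suffix is made of word characters too, a greedy \w+ would swallow it before the optional group gets a chance to match.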

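Change your regex as shown below. The quantifier has to be lazy (\w+?): the suffix consists of word characters too, so a greedy \w+ would consume the whole string and the $ alternative would then succeed.
^(\w+?)(?=_[LR]_\d{0,4}$|$)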

Related

how to make a regular expression for this?

I want to write a regular expression for the string "{{c1::tiger}} is a kind of {{c2::animal::something movable}}" that gets the words "tiger" and "animal". I came up with the expression \{\{c\d+::((?P<value>.*?)(:{0,2})(.*?))\}\}, and I want to use group('value') to read the result. The result "tiger" is exactly what I need, but for the second word I always get "animal::something movable" when I want "animal". Could anyone help me solve this problem? Thanks.
The pattern that you tried contains 4 capturing groups, and for the example data groups 2 (value) and 3 are empty.
To get tiger and animal, you could use a single capturing group with a lazy quantifier, followed by a non-capturing group that stops at either :: or }}:
\{\{c\d+::(?P<value>.*?)(?:::|}})
If the closing }} has to be present, you could use:
\{\{c\d+::(?P<value>.*?)(?:::.*)?}}
This would work for your example String:
c1::(?P<valueTiger>[a-z]*).*?c2::(?P<valueAnimal>[a-z]*)
\{\{c\d+::(?P<value>[^:}]+)(?::{0,2}([^}]+))?\}\}
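Since the question uses group('value'), which looks like Python's re API, here is a minimal sketch that reads the named group with the pattern \{\{c\d+::(?P<value>.*?)(?:::|}}) from the answers above. The variable names are illustrative only.

```python
import re

text = "{{c1::tiger}} is a kind of {{c2::animal::something movable}}"

# Lazy value that stops at the first "::" or "}}" after the "{{cN::" prefix.
pattern = re.compile(r"\{\{c\d+::(?P<value>.*?)(?:::|}})")

# finditer yields one match object per "{{cN::...": read the named group.
values = [m.group("value") for m in pattern.finditer(text)]
print(values)
```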

Regular Expression to return number without comma

I have to extract a number formatted xx,xxx.xx in a different format - xxxxx.xx by applying a regular expression. In other words, I have to remove the comma from the number in the final capture group.
I am not quite sure if it's possible to achieve only with the regular expression and without writing specific code to split and join at these values.
This is the part of input string:
AMT : EGP 3,000.00
My current regex is AMT\s*:\s*EGP\s*(\d*,\d*.\d*), which basically retrieves 3,000.00.
I'm expecting to have 3000.00 in final capture group.
You can capture everything other than the , in two groups, and then replace:
Capture with:
(AMT\s*:\s*EGP\s*\d*),(\d*\.\d*)
Replace with: \1\2
EDIT:
Since the OP doesn't want to capture and replace, the following can be done:
AMT\s*:\s*EGP\s*(\d*),(\d*\.\d*)
The expected data is now split across the two capturing groups, and can be recovered by concatenating them: \1\2.
Try this (\K needs an engine that supports it, such as PCRE):
AMT\s*:\s*EGP\s*\K\d+(,\d{3})*(\.\d+)?
After finding the match, strip the commas with something like: Mystring.Replace(",", "")
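A single capture group cannot skip characters in the middle of what it matches, so the comma has to be dropped either by joining two groups or in code afterwards. A minimal Python sketch of both routes follows; Python is an assumption here (the question does not name a language), and its built-in re module does not support \K, so a plain capture group is used instead.

```python
import re

line = "AMT : EGP 3,000.00"

# Option 1: capture the digits on either side of the comma and join them.
two_groups = re.search(r"AMT\s*:\s*EGP\s*(\d*),(\d*\.\d*)", line)
joined = two_groups.group(1) + two_groups.group(2)

# Option 2: capture the whole number, then strip every comma in code.
# This also copes with several separators, such as 1,234,567.89.
whole = re.search(r"AMT\s*:\s*EGP\s*(\d+(?:,\d{3})*(?:\.\d+)?)", line)
stripped = whole.group(1).replace(",", "")

print(joined, stripped)
```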

Regex: using alternatives

Let's say I would like to get all the 'href' values from HTML.
I could run a regex like this on the content:
a[\s]+href[\s]*=("|')(.)+("|')
which would match
a href="something"
OR
a href = 'something' // quotes, spaces ...
which is OK; but with ("|') I get too many groups captured which is something I do not want.
How does one use alternative in regex without capturing groups as well?
The question could also be stated like: how do I delimit alternatives to match? (start and stop). I used parentheses since this is all that worked...
(I know that the given regex is not perfect or very good, I'm just trying to figure this alternating with two values thing since it is not perfectly clear to me)
Thanks for any tips
Use non-capturing groups, like this: (?:"|'). The key part is the ?: at the beginning. They act as a group but do not produce a separate capture.
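To illustrate, here is a minimal Python sketch (Python is an assumption; the question does not name a language). The pattern is the question's own with two adjustments that are not in the original: \s* is also allowed after the = sign so that the second sample matches, and the value is captured lazily with (.+?). The sample HTML is made up.

```python
import re

html = '<a href="one.html">first</a> <a href = \'two.html\'>second</a>'

# (?:"|') groups the quote alternatives without capturing them,
# so the only capture group left is the href value itself.
pattern = re.compile(r"""a\s+href\s*=\s*(?:"|')(.+?)(?:"|')""")

# With exactly one capture group, findall returns a flat list of strings.
hrefs = pattern.findall(html)
print(hrefs)
```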

Find Regex for states in US having this pattern

I want to retrieve the state from a string. My strings come in two forms:
US-VA-Arlington
VA-Arlington
From both of these I want to get the state (VA) every time.
Please send suggestions.
Thanks,
Girish
Try the following regex:
([^-]*)-[^-]*$
The required state will be captured in \1.
Try the following regex:
([A-Z]+)-\w+$
Use this regexp:
^(?:[A-Z]{2}-)?([A-Z]{2})-
The first optional group will match the country code if it exists; but it's a non-capturing group. The second group matches the state code. The state will be in capture group 1.
(US\-)?(\w\w)\-(\w+)
The first group collects 0 or 1 instances of US-
The second group collects the state abbreviation
The third group collects the city name - you may have to modify this regex to accept spaces (as others pointed out)
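The optional-prefix idea from the answers above can be checked with a short Python sketch. Python is an assumption, and the helper name state_of is hypothetical.

```python
import re

# Optional non-capturing country prefix, then the captured two-letter state.
STATE = re.compile(r"^(?:[A-Z]{2}-)?([A-Z]{2})-")

def state_of(location):
    """Return the state code, or None when the string does not match."""
    match = STATE.match(location)
    return match.group(1) if match else None

print(state_of("US-VA-Arlington"))
print(state_of("VA-Arlington"))
```

Because the country prefix sits in a non-capturing group, the state is in group 1 for both input forms.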

Using RegEx with something of the format "xxx:abc" to match just "abc"?

I've not done much RegEx, so I'm having issues coming up with a good RegEx for this script.
Here are some example inputs:
document:ASoi4jgt0w9efcZXNDOFzsdpfoasdf-zGRnae4iwn2, file:90jfa9_189204hsfiansdIASDNF, pdf:a09srjbZXMgf9oe90rfmasasgjm4-ab, spreadsheet:ASr0gk0jsdfPAsdfn
And here's what I'd want to match on each of those examples:
ASoi4jgt0w9efcZXNDOFzsdpfoasdf-zGRnae4iwn2, 90jfa9_189204hsfiansdIASDNF, a09srjbZXMgf9oe90rfmasasgjm4-ab, ASr0gk0jsdfPAsdfn
What would be the best and perhaps simplest RegEx to use for this? Thanks!
.*:(.*) should get you everything after the last colon in the string as the value of the first group (or the second group, if you count the whole match as a group).
An alternative would be [^:]*$, which matches the run of non-colon characters at the end of the string, that is, everything after the last colon. Both assume that each input is a single xxx:abc item.
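A minimal Python sketch of these two patterns, applied to one of the sample items on its own (Python is an assumption at this point in the thread; the variable names are illustrative):

```python
import re

item = "pdf:a09srjbZXMgf9oe90rfmasasgjm4-ab"

# Greedy .* runs to the last colon; group 1 is whatever follows it.
after_colon = re.match(r".*:(.*)", item).group(1)

# Trailing run of non-colon characters; the whole match is the value.
tail = re.search(r"[^:]*$", item).group(0)

print(after_colon, tail)
```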
Use something like below:
([^:]*)(,|$)
and get the first group. You can use a non-capturing group (?:ABC) if needed for the last. Also this makes the assumption that the value itself can have , as one of the characters.
I don't think answers like (.*)\:(.*) would work. It will match entire string.
(.*)\:(.*)
And take the second capture group...
Simplest seems to be [^:]*:([^,]*)(?:,|$).
That is, match anything (possibly nothing) up to a colon, then the colon, then anything that is not a comma (this is the captured part), up to a comma or the end of the line.
Note the use of a non-capturing group at the end to encapsulate the alternation. The only capturing group appearing is the one which you wish to use.
So in Python:
import re
exampStr = "document:ASoi4jgt0w9efcZXNDOFzsdpfoasdf-zGRnae4iwn2, file:90jfa9_189204hsfiansdIASDNF, pdf:a09srjbZXMgf9oe90rfmasasgjm4-ab, spreadsheet:ASr0gk0jsdfPAsdfn"
# Capture what follows each colon, up to the next comma or the end of the string
regex = re.compile(r"[^:]*:([^,]*)(?:,|$)")
# With a single capture group, findall returns the captured strings
result = regex.findall(exampStr)
print(result)
#
# Result:
#
# ['ASoi4jgt0w9efcZXNDOFzsdpfoasdf-zGRnae4iwn2', '90jfa9_189204hsfiansdIASDNF', 'a09srjbZXMgf9oe90rfmasasgjm4-ab', 'ASr0gk0jsdfPAsdfn']
#
A good introduction is at: http://www.regular-expressions.info/tutorial.html .