Java regex: getting a substring from a string that can vary

I have a String like "Bangalore,India=Karnataka". From this String I would like to extract only the substring "Bangalore". In this case the regex could be (.+),.*=.*. The problem is that the String sometimes comes as just "Bangalore", and then the regex above won't work. What regex will get the substring "Bangalore" whatever form the String takes?

Try this one:
^(.+?)(?:,.*?)?=.*$
Explanation:
^ # Beginning of the string
( # beginning of capture group 1
.+? # one or more of any char, non-greedy
) # end of group 1
(?: # beginning of NON capture group
, # a comma
.*? # 0 or more any char non-greedy
)? # end of non capture group, optional
= # equal sign
.* # 0 or more any char
$ # end of string
Update:
I thought the OP had to match Bangalore,India=Karnataka or Bangalore=Karnataka. As far as I understand it, though, the input is either Bangalore,India=Karnataka or Bangalore, so the regex is much simpler:
^([^,]+)
This matches one or more non-comma characters at the beginning of the string and captures them in group 1.
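Here is a minimal sketch that runs the updated pattern against both input shapes from the question. It uses Python's re module, like the other snippets on this page. The pattern behaves the same way in java.util.regex, where you would read group 1 after a successful find().

```python
import re

# Group 1: one or more non-comma characters at the start of the string.
PATTERN = re.compile(r"^([^,]+)")

for s in ["Bangalore,India=Karnataka", "Bangalore"]:
    m = PATTERN.search(s)
    print(m.group(1))  # Bangalore (for both inputs)
```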

matcher.matches()
tries to match the entire input string (see the Javadoc for java.util.regex.Matcher). To find the pattern anywhere in the input, use:
matcher.find()
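Python's re module makes the same distinction, so the difference is easy to see in a few lines. re.fullmatch behaves like Matcher.matches(): the whole input must match. re.search behaves like Matcher.find(): any substring may match. The non-comma pattern below is just an assumed example.

```python
import re

s = "Bangalore,India=Karnataka"
pattern = r"[^,]+"  # assumed example pattern: a run of non-comma characters

# Like Matcher.matches(): the pattern must cover the entire input,
# so the comma makes it fail.
print(re.fullmatch(pattern, s))  # None

# Like Matcher.find(): the pattern may match any part of the input.
print(re.search(pattern, s).group())  # Bangalore
```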

Are you somehow forced to solve this using one regexp and nothing else? (Stupid interview question? Extremely inflexible external API?) In general, don't try to make regexes do what plain old programming constructs do better. Just use the obvious regex, and if it doesn't match, return the entire string instead.
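Here is a sketch of that advice in Python. It assumes the "obvious" regex is "everything before the first comma". first_part is a hypothetical helper name.

```python
import re

# The "obvious" regex: everything up to the first comma (group 1).
BEFORE_COMMA = re.compile(r"([^,]*),")

def first_part(s: str) -> str:
    """Hypothetical helper: text before the first comma, else the whole string."""
    m = BEFORE_COMMA.match(s)
    # No comma at all, so fall back to the entire input.
    return m.group(1) if m else s

print(first_part("Bangalore,India=Karnataka"))  # Bangalore
print(first_part("Bangalore"))                  # Bangalore
```

For this particular input, s.split(",", 1)[0] gives the same result without any regex.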

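Try this regex. It grabs the characters at the start of the string that are followed by a comma, but not the comma itself:
^.*(?=,)
Because .* is greedy, it stops at the last comma when there is more than one, and it does not match at all when the string contains no comma.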

If you are only interested in checking whether "Bangalore" is contained in the string, then you don't need a regexp for this.
Python:
In [1]: s = 'Bangalorejkdjiefjiojhdu'
In [2]: 'Bangalore' in s
Out[2]: True

Related

How to negate a string pattern using re2 regex?

I'm using Google RE2 regex to query Prometheus on a Grafana dashboard. I'm trying to get the value of key from the three possible types of input string below:
1. object{one="ab-vwxc",two="value1",key="abcd-eest-ed-xyz-bnn",four="obsoleteValues"}
2. object{one="ab-vwxc",two="value1",key="abcd-eest-xyz-bnn",four="obsoleteValues"}
3. object{one="ab-vwxc",two="value1",key="abcd-eest-xyz-bnn-ed",four="obsoleteValues"}
...with the following validation rules:
should contain abcd-
shouldn't contain -ed
This regex
\bkey="(abcd(?:-\w+)*[^-][^e][^d]\w)"
...satisfies the first condition (abcd-) but not the second one (negating -ed).
The expected output would be abcd-eest-xyz-bnn from the 2nd input option. Any help would be really appreciated. Thanks a lot.
If I understand your requirements correctly, the following pattern should work:
\bkey="(abcd(?:-e|-(?:[^e\W]|e[^d\W])\w*)*)"
Breakdown for the important part:
(?: # Start a non-capturing group.
-e # Match '-e' literally.
| # Or the following...
- # Match '-' literally.
(?: # Start a second non-capturing group.
[^e\W] # Match any word character except 'e'.
| # Or...
e[^d\W] # Match 'e' followed by any word character except 'd'.
) # Close non-capturing group.
\w* # Match zero or more additional word characters.
) # Close non-capturing group.
Or in simple terms:
Match a hyphen followed by:
only the letter 'e', or
a word* not starting with 'e', or
a word starting with 'e' not followed by 'd'.
*A "word" here means a string of word characters as defined in regex.
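The following quick check runs the pattern against the three sample inputs. This sketch uses Python's re module rather than RE2 itself. The pattern has no lookarounds or backreferences, so it should behave the same in both engines, but verify it in Grafana before relying on it.

```python
import re

KEY = re.compile(r'\bkey="(abcd(?:-e|-(?:[^e\W]|e[^d\W])\w*)*)"')

inputs = [
    'object{one="ab-vwxc",two="value1",key="abcd-eest-ed-xyz-bnn",four="obsoleteValues"}',
    'object{one="ab-vwxc",two="value1",key="abcd-eest-xyz-bnn",four="obsoleteValues"}',
    'object{one="ab-vwxc",two="value1",key="abcd-eest-xyz-bnn-ed",four="obsoleteValues"}',
]

for line in inputs:
    m = KEY.search(line)
    # Only the second input has no "-ed" segment, so only it should match.
    print(m.group(1) if m else None)
```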
Maybe have a go with:
\bkey="((?:ktm-(?:(?:e-|[^e]\w*-|e[^d]\w*-)*)abcd(?:(?:-e|-[^e]\w*|-e[^d]\w*)*)|abcd(?:(?:-e|-[^e]\w*|-e[^d]\w*)*)))"
This would ensure that:
The string starts with either ktm- or abcd.
If it starts with ktm-, there must be at least one element called abcd.
If it starts with abcd, there doesn't have to be another element.
Both options check that there is no element starting with -ed.
The struggle without lookarounds (RE2 does not support them)...

How do I replace all instances of a string within a repeating bounds?

Suppose I have a string like this:
<other...Stuff> BoundsTag <relevant...Stuff> EndsBoundsTag <other...Stuff> BoundsTag <relevant...Stuff> EndsBoundsTag <other...Stuff>
I want to do a search and replace on my string, but only change it if it's within BoundsTag/EndsBoundsTag. The string I'm trying to match occurs many times in both <relevant...Stuff> and <other...Stuff>. Also, there is an arbitrary number of BoundsTag/EndsBoundsTag pairs.
Is this possible with Perl regexes?
Here is an example of a specific string where I'm trying to replace MyMatch:
BoundsTag asdfasdfa MyMatch asdfasdfasdf MyMatch sdfasd EndsBoundsTag asdfasdfasdfsad **MyMatch** asd **MyMatch** asf2ef23fasdfasdf BoundsTag fghjfghj MyMatch fghjfghjgh MyMatch fghjfghj EndsBoundsTag
Here I want to replace all instances of MyMatch except the ones between **. I don't mean the characters ** specifically; they are only there to point those instances out. The spacing is also just there for legibility.
Assuming that those tags always occur in pairs and are unnested, it's simple:
/Stuff(?=(?:(?!BoundsTag).)*EndsBoundsTag)/s
will match Stuff only if EndsBoundsTag can be matched after it, with no BoundsTag in-between.
Test it live on regex101.com.
Explanation:
Stuff # Match Stuff
(?= # only if the following matches afterwards:
(?: # 1. A group that matches...
(?!BoundsTag) # ...unless it's the start of "BoundsTag"...
. # any character,
)* # repeated as needed.
EndsBoundsTag # 2. EndsBoundsTag must also be present
) # End of lookahead - if that succeeds, we're between tags.
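Here is a sketch that applies this lookahead to the question's sample text. It uses MyMatch in place of Stuff and REPLACED as an arbitrary replacement string, and it omits the ** markers. It uses Python's re.sub; in Perl the equivalent would be s/MyMatch(?=(?:(?!BoundsTag).)*EndsBoundsTag)/REPLACED/gs. EndsBoundsTag itself contains BoundsTag, but that is harmless: the repetition can stop just before the E and let EndsBoundsTag match there.

```python
import re

text = (
    "BoundsTag asdfasdfa MyMatch asdfasdfasdf MyMatch sdfasd EndsBoundsTag "
    "asdfasdfasdfsad MyMatch asd MyMatch asf2ef23fasdfasdf "
    "BoundsTag fghjfghj MyMatch fghjfghjgh MyMatch fghjfghj EndsBoundsTag"
)

# MyMatch counts only if EndsBoundsTag follows with no opening BoundsTag in between.
INSIDE = re.compile(r"MyMatch(?=(?:(?!BoundsTag).)*EndsBoundsTag)", re.S)

result = INSIDE.sub("REPLACED", text)
print(result)  # the two MyMatch between the tag pairs stay untouched
```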
It is possible. In Perl, you can use this regex:
\*{2}(?![^*]?\*{2})([^*]+?)\*{2}

How to write a RegEx pattern that accepts a string with at most one of each letter, but unordered?

I have tried this:
[a]?[b]?[c]?[d]?[e]?[f]?[g]?[h]?[i]?[j]?[k]?[l]?[m]?[n]?[o]?[p]?[q]?[r]?[s]?[t]?[u]?[v]?[w]?[x]?[y]?[z]?
But this RegEx rejects strings where the order is not alphabetical, like these:
"zabc"
"azb"
I want patterns like these two to be accepted too. How could I do that?
EDIT 1
I don't want letter repetitions, i.e., I want the following strings to be rejected:
aazb
ozob
Thanks.
You can use a negative lookahead assertion to make sure no two characters are the same:
^(?!.*(.).*\1)[a-z]*$
Explanation:
^ # Start of string
(?! # Assert that it's impossible to match the following:
.* # any number of characters
(.) # followed by one character (capture that in group 1)
.* # followed by any number of characters
\1 # followed by the same character as the one captured before
) # End of lookahead
[a-z]* # Match any number of ASCII lowercase letters
$ # End of string
Test it live on regex101.com.
Note: This regex needs to brute-force check all possible character pairs, so performance may be a problem with larger strings. If you can use anything besides regex, you're going to be happier. For example, in Python:
import re
if re.fullmatch("[a-z]*", mystring) and len(mystring) == len(set(mystring)):
    pass  # valid string
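Both checks can be compared on the examples from the question. This is a minimal sketch in Python, whose re module supports the same lookahead and backreference syntax. unique_letters is a hypothetical name for the non-regex check.

```python
import re

# Negative lookahead: fail if any character (group 1) appears again later (\1).
UNIQUE_LETTERS = re.compile(r"^(?!.*(.).*\1)[a-z]*$")

def unique_letters(s: str) -> bool:
    # Non-regex alternative: only a-z, and no character is repeated.
    return re.fullmatch(r"[a-z]*", s) is not None and len(s) == len(set(s))

for s in ["zabc", "azb", "aazb", "ozob"]:
    print(s, bool(UNIQUE_LETTERS.match(s)), unique_letters(s))
```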

Extract USD amount at beginning of string using regex

I need help extracting the USD amount at the beginning of a string.
Example
String : $50,000.00 NAMED INSURED FOR $100K/$300K LIMITS
Output : 50,000
This is what I tried \$\d.*?000\.00 but I am sure there is a better way.
^\$?([\d,\.]+)\b
Or something of the sort. Broken down:
^ # anchor to start of line
\$? # look for dollar sign (but optional)
( # begin capture group
[\d,\.]+ # match 1 or more of the following: `0-9`, `,` or `.`
) # end capture group
\b # word boundary
Example of the above can be found here: http://regexr.com?372t4
Note: if you want to convert this into a double/float value, you'll need a parser that ignores commas, or some simple string manipulation to remove them first.
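The sketch below does both steps in Python: it extracts the amount with the pattern above, then strips the commas before converting, as the note suggests. The variable names are illustrative.

```python
import re

line = "$50,000.00 NAMED INSURED FOR $100K/$300K LIMITS"

# Optional "$", then a run of digits, commas and dots at the start of the line.
m = re.match(r"^\$?([\d,\.]+)\b", line)
amount_text = m.group(1)                      # "50,000.00"
amount = float(amount_text.replace(",", ""))  # remove thousands separators first
print(amount_text, amount)
```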
Try this regular expression:
\$(\S+)
which will give you 50,000.00 -- the exact amount.
Find the $ sign and then match all non-whitespace characters.
Try:
^\$(\d{1,3}(?:,\d{3})*\.\d{2})

regular expression for matching

It's for a normal registration name, which can be 1 to n characters from a-zA-Z and -, like
larry-cai, larrycai, larry-c-cai, l,
but - can't be the first or last character, like
-larry, larry-
My attempt is:
^[a-zA-Z]+[a-zA-Z-]*[a-zA-Z]+$
but with my regex the length must be at least 2.
It should be simple, but I don't know how to do it.
It would be nice if you could write one that passes http://tools.netshiftmedia.com/regexlibrary/
You didn't specify which regex engine you're using. One way would be (if your engine supports lookaround):
^(?!-)[A-Za-z-]+(?<!-)$
Explanation:
^ # Start of string
(?!-) # Assert that the first character isn't a dash
[A-Za-z-]+ # Match one or more "allowed" characters
(?<!-) # Assert that the previous character isn't a dash...
$ # ...at the end of the string.
If lookbehind is not available (for example in JavaScript engines that predate ES2018):
^(?!-)[A-Za-z-]*[A-Za-z]$
Explanation:
^ # Start of string
(?!-) # Assert that the first character isn't a dash
[A-Za-z-]* # Match zero or more "allowed" characters
[A-Za-z] # Match exactly one "allowed" character except dash
$ # End of string
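This small sketch tests both patterns against the examples from the question. It uses Python's re module, which supports both lookahead and lookbehind.

```python
import re

# Lookaround version: no dash at the start (lookahead) or at the end (lookbehind).
WITH_LOOKBEHIND = re.compile(r"^(?!-)[A-Za-z-]+(?<!-)$")
# Lookahead-only version: the last character must be a letter.
WITHOUT_LOOKBEHIND = re.compile(r"^(?!-)[A-Za-z-]*[A-Za-z]$")

samples = ["larry-cai", "larrycai", "larry-c-cai", "l", "-larry", "larry-"]

for s in samples:
    print(s, bool(WITH_LOOKBEHIND.match(s)), bool(WITHOUT_LOOKBEHIND.match(s)))
```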
This should do it:
^[a-zA-Z]+(-[a-zA-Z]+)*$
With this, there needs to be one or more alphabetic characters at the beginning (^[a-zA-Z]+). If a - follows, it must be followed by at least one alphabetic character (-[a-zA-Z]+). That pattern can repeat any number of times until the end of the string is reached.
A simple answer would be:
^(([a-zA-Z])|([a-zA-Z][a-zA-Z-]*[a-zA-Z]))$
This matches either a string with length 1 and characters a-zA-Z or it matches an improved version of your original expression which is fine for strings with length greater than 1.
Credit for the improvement goes to Tim and ridgerunner (see comments).
Try this:
^[a-zA-Z]+([-]*[a-zA-Z])*$
Not sure which lazy group takes precedence:
^[a-zA-Z][a-zA-Z-]*?[a-zA-Z]?$
Maybe this?
^[^-]\S*[^-]$|^[^-]{1}$