Can I make my regex shorter? - regex

Is there a better (shorter) regex than the one below that matches the following conditions?
/((.*,)|\s*)String((,.*)|\s*)/
Conditions:
--> It should match only when there is an exact match for the string (String may be an item in a comma-separated list or the only item)
A few accepted inputs:
String, some other, something other
some other, String
String
Example inputs that should fail:
String test,String new,Stringtest
The problem is that the URL gets longer after encoding because of this big regex, so I am wondering whether there is a way to shorten my regex while still matching the conditions.

You may use
(^|,\s*)String($|\s*,)
See the regex demo.
Details
(^|,\s*) - either the start of string (^) or (|) a comma followed by 0+ whitespace chars
String - a literal String
($|\s*,) - either the end of string ($) or (|) 0+ whitespace chars followed by a comma.
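As a quick check, the pattern can be run against the sample inputs from the question. This is only a minimal Python sketch: the helper name `matches` and the two lists are made up for illustration, and `re.search` is used on the assumption that the pattern is applied to one whole input value at a time.

```python
import re

# Pattern from the answer above: String must be bounded by the start/end
# of the input or by a comma (with optional whitespace around it).
PATTERN = re.compile(r"(^|,\s*)String($|\s*,)")

def matches(text: str) -> bool:
    """Return True if String appears as a whole comma-separated item."""
    return PATTERN.search(text) is not None

# Sample inputs taken from the question.
accepted = ["String, some other, something other", "some other, String", "String"]
rejected = ["String test", "String new", "Stringtest"]

for text in accepted + rejected:
    print(f"{text!r}: {matches(text)}")
```

The three accepted inputs report True and the three failure inputs report False.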

Related

Regex for replacing anything other than characters, more than one spaces and number only in end with empty char

I want to replace everything with an empty string except letters, spaces, and numbers at the end of the string. In other words, any number or extra space that comes at the start or in the middle of the string is replaced with an empty string.
Example
**Input**    **Output**
Ndd12        Ndd12
12Ndd12      Ndd12
Ndd 12       Ndd 12
Nav G45up    Nav Gup
Attempted Code
regexp_replace(df1[col_name]), "(^[A-Za-z]+[0-9 ])", ""))
You may use:
\d+(?!\d*$)|[^\w\n]+(?!([A-Z]|$))
RegEx Demo
Explanation:
\d+(?!\d*$): Match 1+ digits that are not followed by 0+ digits and end of line
|: OR
[^\w\n]+(?!([A-Z]|$)): Match 1+ non-word characters (other than a newline) that are not followed by an uppercase letter or the end of line
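To see the two alternatives at work, here is a small Python sketch that applies the pattern with `re.sub`. The function name `clean` is hypothetical, and only sample rows whose spacing is unambiguous in the question are used.

```python
import re

# Pattern from the answer above:
#   \d+(?!\d*$)            digits that are not the trailing run of digits
#   [^\w\n]+(?!([A-Z]|$))  non-word chars not followed by an uppercase letter or the end
PATTERN = re.compile(r"\d+(?!\d*$)|[^\w\n]+(?!([A-Z]|$))")

def clean(text: str) -> str:
    """Remove every match of PATTERN from text."""
    return PATTERN.sub("", text)

for sample in ["Ndd12", "12Ndd12", "Nav G45up"]:
    print(sample, "->", clean(sample))
```

For these inputs the trailing digits stay, the leading and middle digits are removed, and the space before the uppercase G is kept.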
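If you use Python, you can do this with regular expressions through the re module.
import re
s = "Nav G45up"  # sample input
# Remove every character that is not an ASCII letter or digit
new_string = re.sub(r"[^a-zA-Z0-9]", "", s)
Inside a character class, a leading ^ negates the set.
Regular expressions exist in other languages too, so it is worth looking for the equivalent there.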
I came up with this regex to capture all characters that you want to remove from the string.
^\d+|(?<=\w)\d+(?![\d\s])|(?<=\s)\s+
Do
regexp_replace(df1[col_name], "^\d+|(?<=\w)\d+(?![\d\s])|(?<=\s)\s+", "")
Regex Demo
Explanation:
^\d+ - captures all digits in a sequence from the start.
(?<=\w)\d+(?![\d\s]) - a positive lookbehind for a word character and a negative lookahead for a digit or whitespace, capturing a sequence of digits in the middle. (Captures the digits in G45up)
(?<=\s)\s+ - a positive lookbehind for a whitespace character followed by one or more whitespace characters, capturing all additional spaces.
Note : This regex could be inefficient when matching large strings as it uses expensive look-arounds.
^\d+|(?<=\w)\d+(?![\d\s])|(?<=\s)\s+|(?<=\w)\W|\W(?=\w)|(?<!\w)\W|\W(?!\w)

Match all characters after the last instance of a string in regex

I am looking to capture all characters after the last instance of a string in regex.
The delimiter whose last instance we are searching after is, sans quotes, " - ", or \b\s\-\s\b: a word boundary, a whitespace character, a -, a whitespace character, and another word boundary.
Test string as follows:
One Thing - Two Things - Three Things - Four Things
Desired match:
Four Things
This regex only matches everything after the first instance of the string:
(?<=\b\s\-\s\b)(.*)$
(Returns, sans quotes: "Two Things - Three Things - Four Things")
Whereas this matches everything after the last single character -:
[^\-]+$
(Returns, sans quotes: " Four Things")
Thoughts?
Try a positive lookbehind, then match everything that is not the - delimiter up to the end of the string:
(?<=- )[^-]+$
https://regex101.com/r/sMX9FC/1
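A minimal Python sketch of this lookbehind approach, using the test string from the question (the variable names are illustrative only):

```python
import re

text = "One Thing - Two Things - Three Things - Four Things"

# (?<=- ) requires "- " immediately before the match;
# [^-]+$ then consumes everything up to the end that is not a hyphen.
match = re.search(r"(?<=- )[^-]+$", text)
last_part = match.group(0) if match else None
print(last_part)
```

Note that this relies on the final segment itself containing no hyphen.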
I think you could get your match without using lookarounds.
You could match any char except a newline from the start of the string followed by matching your pattern. That will match the last instance.
Then capture in a group matching 0+ times any char except a newline until the end of the string.
^.*\b\s\-\s\b(.*)$
^ Start of string
.* Match any char except a newline 0+ times
\b\s\-\s\b Match your pattern
(.*) Capture in group 1 matching 0+ times any char except a newline
$ End of string
Regex demo
There is no tool or programming language listed, but if \K is supported to forget what was matched, you might also use:
^.*\b\s\-\s\b\K.*$
Regex demo
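Since no language was given, here is a hedged Python sketch of the capture-group variant. Python's built-in `re` module does not support `\K`, so only the group 1 version is shown; the variable names are made up for illustration.

```python
import re

text = "One Thing - Two Things - Three Things - Four Things"

# The greedy ^.* runs to the end and backtracks to the LAST " - ",
# so group 1 holds only what follows the final delimiter.
pattern = re.compile(r"^.*\b\s\-\s\b(.*)$")
match = pattern.match(text)
tail = match.group(1) if match else None
print(tail)
```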
This matches, at the end of the string, everything that is not a - following the last -.
-\s*([^-]+)$
It's the simplest regex I could think of.
.*(?<=\b\s\-\s\b)(.*)$, or putting a .* before your current regex should achieve what you're after, since that's a greedy match by default.

Regexp word boundary for substrings matches even if it should not

I have this string asdas d HC-HMC BACK SIDE saas and these regexps:
\bHC\b
\bHC-HMC BACK SIDE\b
I expect the first regular expression to fail and the second to match. When I try them, both match, and I don't understand why.
This is the first regexp: https://regex101.com/r/hBZK2h/1
And this is the second one: https://regex101.com/r/TEZmP6/1
I have to create a regular expression that matches exactly the string with boundary HC-HMC BACK SIDE but not the string HC. Any hint?
The \bHC\b pattern matches HC if it is at the start of the string or is preceded by a non-word char (not a letter, a digit, or _, and maybe other chars depending on the regex Unicode support) and is at the end of the string or immediately followed by a non-word char. See word boundary definition.
You may add a (?!-) lookahead to the first regex to make it fail if HC is followed with -:
\bHC\b(?!-)
See the regex demo
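The difference between the three patterns can be checked with a short Python sketch on the string from the question (variable names are illustrative):

```python
import re

text = "asdas d HC-HMC BACK SIDE saas"

# "-" is a non-word char, so a word boundary exists between "HC" and "-".
plain = re.search(r"\bHC\b", text)

# The extra (?!-) lookahead rejects HC when a hyphen follows it.
guarded = re.search(r"\bHC\b(?!-)", text)

# The full phrase still matches as before.
full = re.search(r"\bHC-HMC BACK SIDE\b", text)

print(plain is not None, guarded is not None, full is not None)
```

Here the plain pattern and the full phrase match, while the guarded pattern does not.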

Match a String with optional number of hyphens - Java Regex

I am trying to match Strings with an optional number of hyphens.
For example,
string1-string2,
string1-string2-string3,
string1-string2-string3 and so on.
Right now, I have something which matches one hyphen. How can I make the regex match an optional number of hyphens?
My current regex is: arn:aws:iam::\d{12}:[a-zA-Z]/?[a-zA-Z]-?[a-zA-Z]*
What do I need to add?
Use this regex:
^\\w+(-\\w+)*$
Explanation:
\\w+ - match one or more word characters, [a-zA-Z_0-9]
(-\\w+)* - match a hyphen followed by a string zero or more times
Regex101
Note that this won't match an empty string, or a string containing weird characters. You could handle these cases manually or you could update the regex.
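The behaviour described in the note can be sketched as follows. This is written in Python rather than Java, so the pattern uses single backslashes in a raw string instead of the doubled ones a Java string literal needs; the name `is_hyphenated` is hypothetical.

```python
import re

# Same pattern as above: one word, then zero or more "-word" groups.
HYPHENATED = re.compile(r"^\w+(-\w+)*$")

def is_hyphenated(text: str) -> bool:
    """Return True for word characters joined by single hyphens."""
    return HYPHENATED.match(text) is not None

for sample in ["string1-string2", "string1-string2-string3", "", "a--b", "a-"]:
    print(repr(sample), is_hyphenated(sample))
```

The empty string, a doubled hyphen, and a trailing hyphen are all rejected, which matches the note above.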

How to use regex for field validation on whole string?

I've been working for many hours trying to do a "simple thing": use a regex to validate a text field.
I need to make sure of:
1- Only use (a-z), (A-Z) and (0-9) values
2- Add a SINGLE wildcard only at the end.
Ex.
Match
MICHE*
Match
JAMES
No match
MICHE**
No match
MIC_HEAL*
I have this regex till now:
[a-zA-Z0-9\s-]+.\z*?
The problem is that it still matches when I introduce an invalid character, as long as there is a matching substring. See my REGEX
What can I do to force a match on the whole string? What am I missing?
Thx!
Use ^ (start of line) and $ (end of line) to only match the whole string:
^[a-zA-Z0-9\s-]+.\z*?$
(If you have a multiline input you can also use \A and \z - start and end of string)
On a second look, I don't understand the end of your regex: . (anything) \z * ? (end of string, zero or more times, zero or one time). This regex will match something like:
Ikdflfdf&
Is that correct? If you only want the character *, you should use:
^[a-zA-Z0-9\s-]+\*?$
Also, as Robbie pointed out, you're including spaces and the - in your list of accepted characters. If you only want letters and digits, a shortcut would be using \w (word characters), though note that \w also matches the underscore:
^\w+\*?$
However, depending on whether the matcher is Unicode-aware or not, \w will also match non-ASCII letters and digits, which may or may not be what you want.
Try this one :
^[a-zA-Z0-9]+\*?$
^ string start
$ string end
* is a metacharacter, so it must be escaped as \* to match it literally
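As a rough check of this pattern against the four examples in the question, here is a Python sketch. The question does not name a language, so Python is an assumption, and `is_valid` is a made-up helper name.

```python
import re

# Letters and digits only, with at most one literal * at the very end.
FIELD = re.compile(r"^[a-zA-Z0-9]+\*?$")

def is_valid(value: str) -> bool:
    """Return True if the whole value passes the field validation."""
    return FIELD.match(value) is not None

for value in ["MICHE*", "JAMES", "MICHE**", "MIC_HEAL*"]:
    print(value, is_valid(value))
```

The first two values are accepted and the last two are rejected, as the question requires.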
I think you just need ^ at the beginning and $ at the end
^[a-zA-Z0-9\s-]+.\*?$
Also, you don't need the \z
Also, you haven't mentioned that you want to allow spaces and dashes - but you have included them in your allowed character set.