Regex pattern to match string that's not followed by a colon - regex

Using regex, I'm trying to match any string of characters that meets the following conditions (in the order displayed):
Contains a dollar sign $; then
at least one letter [a-zA-Z]; then
zero or more letters, numbers, underscores, periods (dots), opening brackets, and/or closing brackets [a-zA-Z0-9_.\[\]]*; then
one pipe character |; then
one at sign #; then
at least one letter [a-zA-Z]; then
zero or more letters, numbers, and/or underscores [a-zA-Z0-9_]*; then
zero colons :
In other words, if a colon is found at the end of the string, then it should not count as a match.
Here are some examples of valid matches:
$tmp1|#hello
$x2.h|#hi_th3re
Valid match$here|#in_the middle of other characters
And here are some examples of invalid matches:
$tmp2|#not_a_match:"because there is a colon"
$c.4a|#also_no_match:
Here are some of the patterns I've tried:
(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]*)(\|#)([a-zA-Z][a-zA-Z0-9_]*(?!.[:]))
(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]+)?(\|#)([a-zA-Z][a-zA-Z0-9_]*(?![:]))
(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]+)?(\|#)([a-zA-Z][a-zA-Z0-9_]*)([^:])

This pattern will do what you need
\$[A-Za-z]+[\w.\[\]]*[|]#[A-Za-z]+[\w]*+(?!:)
Regex Demo
I am using possessive quantifiers to cut down the backtracking using [\w]*+. You can also use atomic groups instead of possessive quantifiers like
\$[A-Za-z]+[\w.\[\]]*[|]#[A-Za-z]+(?>[\w]*)(?!:)
NOTE
\w => [A-Za-z0-9_]

I tested your third pattern in Regex 101 and it appears to be working correctly:
^.*(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]+)?(\|#)([a-zA-Z][a-zA-Z0-9_]*)([^:]).*$
The only change I needed to make to the regex to make it work was to add anchors ^ and $ to the start and end of the regex. I also allowed for your pattern to occur as a substring in the middle of a larger string.
By the way, you had the following example as a string which should not match:
$tmp2|#not_a_match:"because there is a colon"
However, even if we remove the colon from this string it will still not match because it contains quotes which are not allowed.
Regex101

Related

Regex for replacing anything other than characters, more than one spaces and number only in end with empty char

I want to replace anything other than character, spaces and number only in end with empty string or in other words: we replace any number or spaces comes in-starting or in-middle of the string replace with empty string.
Example
**Input** **Output**
Ndd12 Ndd12
12Ndd12 Ndd12
Ndd 12 Ndd 12
Nav G45up Nav Gup
Attempted Code
regexp_replace(df1[col_name]), "(^[A-Za-z]+[0-9 ])", ""))
You may use:
\d+(?!\d*$)|[^\w\n]+(?!([A-Z]|$))
RegEx Demo
Explanation:
\d+(?!\d*$): Match 1+ digits that are not followed by 0+ digits and end of line
|: OR
[^\w\n]+(?!([A-Z]|$)): Match 1+ non-word characters that are not followed by an uppercase letter or and end of line
if you use python, you can use regular expressions.
You can use the re module.
import re
new_string = re.sub(r"[^a-zA-Z0-9]","",s)
Where ^ means exclusion.
Regular expressions exist in other languages. So it would be helpful to find a regular expression.
I came up with this regex to capture all characters that you want to remove from the string.
^\d+|(?<=\w)\d+(?![\d\s])|(?<=\s)\s+
Do
regexp_replace(df1[col_name]), "^\d+|(?<=\w)\d+(?![\d\s])|(?<=\s)\s+", ""))
Regex Demo
Explanation:
^\d+ - captures all digits in a sequence from the start.
(?<=\w)\d+(?![\d\s]) - Positive look behind for a word character with a negative look ahead for a number followed by space and capturing a sequence of digits in the middle. (Captures digits in G45up)
(?<=\s)\s+ - positive look behind for a space followed by one or more spaces, capturing all additional spaces.
Note : This regex could be inefficient when matching large strings as it uses expensive look-arounds.
^\d+|(?<=\w)\d+(?![\d\s])|(?<=\s)\s+|(?<=\w)\W|\W(?=\w)|(?<!\w)\W|\W(?!\w)

Regex. Replace particular characters in string

In the following string i need to replace (with Regex only) all _ with the dots, except ones that are surrounded by digits. So this:
_this_is_a_2_2_replacement_
Should become
.this.is.a.2_2.replacement.
Tried lots of things. That's where i got so far:
([a-z]*(_)[a-z]*(_))*(?=\d_\d)...(_)\w*(_)
But it obviously doesn't work.
Try finding the following regex pattern:
(?<=\D)_|_(?=\D)
And then just replace that with dot. The logic here is that a replacement happens whenever an underscore which has at least one non digit on either side gets replaced with a dot. The regex pattern I used here asserts precisely this:
(?<=\D)_ an underscore preceded by a non digit
| OR
_(?=\D) an underscore followed by a non digit
Demo
If you are using PCRE, you could assert a digit on the left of the underscore and match a digit after. Then use make use of SKIP FAIL.
In the replacement use a dot:
(?<=\d)_\d(*SKIP)(*FAIL)|_
(?<=\d) Positive lookbehind, assert what is on the left is a digit
\d(*SKIP)(*FAIL) Consume the digit which should not be part of the match result
| Or
_ Match a single underscore
Regex demo

Regex, match anything unless just numbers

I have got the following regex expression so far:
used-cars\/((?:\d+[a-z]|[a-z]+\d)[a-z\d]*)
This is sort of working, I need it to match basically ANYTHING apart from JUST numbers after used-cars/
Match:
used-cars/page-1
used-cars/1eeee
used-cars/page-1?*&_-
Not Match:
used-cars/2
used-cars/400
Can someone give me a hand? Been trying get this working for a while now!
There are few shortcomings of your regex used-cars\/((?:\d+[a-z]|[a-z]+\d)[a-z\d]*).
It's checking for used-cars/ followed by multiple digits then one character within a-z OR multiple characters within a-z then one digit.
[a-z\d]* is searching for either characters or digits which is also optional.
It's inaccurate for your pattern.
Try with following regex.
Regex: ^used-cars\/(?!\d+$)\S*$
Explanation:
used-cars\/ searches for literal used-cars/
(?!\d+$) is negative lookahead for only digits till end. If only digits are present then it won't be a match.
\S* matches zero or more characters other than whitespace.
Regex101 Demo

Regular expression let periods in (.)

My regular expression lets in periods for some reason, how can I keep that from happening.
Rules:
4-15 characters
Any alphanumeric characters
Underscore as long as it's not first or last
[A-Za-z][A-Za-z0-9_]{3,14}
I don't want "bad.example" for work.
Edit: changed to 4-15 characters
Your regex matches example as a substring of bad.example. Use anchors to prevent that:
^[A-Za-z][A-Za-z0-9_]{1,12}[A-Za-z]$
Note that (like your regex) this regex also prevents digits from matching in the first and last position - if they should be allowed (as per your specs), just add 0-9 at the end of the character classes.
^[A-Za-z][A-Za-z0-9_]{3,14}$
try this
This will match any alphanumeric at the beginning and end. In the middle it will accept from one up to twelve alphanumerics including an underscore:
^[a-zA-Z\d]\w{1,12}[a-zA-Z\d]$
It does not match bad.example but matches only example as your regex allows a character from 4 to 15.See here.
http://regex101.com/r/xV4eL5/5
To prevent it you need to match the whole input and not make partial matches.Put a ^ start anchor and $ end anchor.
Use
\A[A-Za-z0-9][\w]{1,12}[A-Za-z0-9]\Z

What is the regular expression to allow uppercase/lowercase (alphabetical characters), periods, spaces and dashes only?

I am having problems creating a regex validator that checks to make sure the input has uppercase or lowercase alphabetical characters, spaces, periods, underscores, and dashes only. Couldn't find this example online via searches. For example:
These are ok:
Dr. Marshall
sam smith
.george con-stanza .great
peter.
josh_stinson
smith _.gorne
Anything containing other characters is not okay. That is numbers, or any other symbols.
The regex you're looking for is ^[A-Za-z.\s_-]+$
^ asserts that the regular expression must match at the beginning of the subject
[] is a character class - any character that matches inside this expression is allowed
A-Z allows a range of uppercase characters
a-z allows a range of lowercase characters
. matches a period
rather than a range of characters
\s matches whitespace (spaces and tabs)
_ matches an underscore
- matches a dash (hyphen); we have it as the last character in the character class so it doesn't get interpreted as being part of a character range. We could also escape it (\-) instead and put it anywhere in the character class, but that's less clear
+ asserts that the preceding expression (in our case, the character class) must match one or more times
$ Finally, this asserts that we're now at the end of the subject
When you're testing regular expressions, you'll likely find a tool like regexpal helpful. This allows you to see your regular expression match (or fail to match) your sample data in real time as you write it.
Check out the basics of regular expressions in a tutorial. All it requires is two anchors and a repeated character class:
^[a-zA-Z ._-]*$
If you use the case-insensitive modifier, you can shorten this to
^[a-z ._-]*$
Note that the space is significant (it is just a character like any other).