RegEx expression not allowing only spaces? - regex

I have this regex, which allows only spaces, letters and dashes. I'd like to modify it so that it rejects input consisting ONLY of spaces. Can someone help me?
/^([A-zăâîșțĂÂÎȘȚ-\s])+$/

You can use a negative lookahead to restrict this generic pattern:
/^(?!\s+$)[A-Za-zăâîșțĂÂÎȘȚ\s-]+$/
  ^^^^^^^^
The (?!\s+$) lookahead is evaluated once, at the very beginning of the string, and fails the match if the string consists of nothing but 1 or more whitespace characters.
Also, your regex contained the classic [A-z] issue: that range matches more than just ASCII letters. Replace it with [A-Za-z] (or just [a-z] with the /i case-insensitive modifier).
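To see what [A-z] lets through, here is a small check. It is only a sketch and assumes Python's re module (the question itself looks like JavaScript), but the range covers the same ASCII code points in either engine:

```python
import re

# Every ASCII character accepted by [A-z] that is not actually a letter.
extras = [chr(c) for c in range(128)
          if re.fullmatch(r"[A-z]", chr(c)) and not chr(c).isalpha()]

print(extras)  # ['[', '\\', ']', '^', '_', '`']
```

These six characters sit between Z and a in the ASCII table, which is why the range picks them up.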
Also, the - inside a character class is usually placed at the end so that it does not need escaping and is parsed as a literal hyphen (however, you might want to escape it anyway if another developer will have to update this pattern by adding more symbols to the character class).
And just in case this is a regex engine that does not support lookarounds:
^[A-Za-zăâîșțĂÂÎȘȚ\s-]*[A-Za-zăâîșțĂÂÎȘȚ-][A-Za-zăâîșțĂÂÎȘȚ\s-]*$
It requires at least 1 non-space character from the allowed set: the middle character class, which omits \s, is the one obligatory symbol.
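As a rough sanity check, both variants can be compared side by side. This sketch assumes Python's re module; the LETTERS constant and the sample strings are made up for illustration:

```python
import re

LETTERS = "A-Za-zăâîșțĂÂÎȘȚ"  # letters allowed by the patterns above

# Variant 1: the negative lookahead rejects whitespace-only input up front.
with_lookahead = re.compile(rf"^(?!\s+$)[{LETTERS}\s-]+$")

# Variant 2: no lookarounds; one non-space character is required somewhere.
without_lookahead = re.compile(
    rf"^[{LETTERS}\s-]*[{LETTERS}-][{LETTERS}\s-]*$"
)

samples = ["Ana Maria", "Ana-Maria", " a ", "   ", "", "abc1"]
results = {
    s: (bool(with_lookahead.search(s)), bool(without_lookahead.search(s)))
    for s in samples
}
for s, (a, b) in results.items():
    print(repr(s), a, b)
```

On these samples the two variants agree: the first three strings are accepted, while the whitespace-only string, the empty string and the string containing a digit are rejected.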

Related

Trim value after 2 text patterns

I have a string which I need to extract the "migration" value from (dynamic content).
The problem is that there are several patterns on the marked section.
Instead of defining 2 regexes, I would like to have a single one.
(?i)Host: api-(.*?).A9net.io
(?i)Host: stt-(.*?).A9net.io
One pattern: Host: api-**migration**.A9net.io
Second pattern: Host: stt-**migration**.A9net.io
I need the migration value extracted
You might use an alternation to match either api or stt. Remember to escape the dot to match it literally.
(?i)Host: (?:api|stt)-(.*?)\.A9net\.io
The (.*?) matches 0+ characters, so it would also match when the migration value is empty. To require at least 1 character, you could use (.+?) instead.
If the migration value can not contain a dot, you might also use a negated character class to match 1+ times not a dot ([^.]+)
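A minimal sketch of the combined pattern, assuming Python's re module; the function name extract_value and the third sample line are hypothetical:

```python
import re

# Alternation for the prefix, escaped dots, and a negated class for the value.
HOST_RE = re.compile(r"Host: (?:api|stt)-([^.]+)\.A9net\.io", re.IGNORECASE)

def extract_value(line):
    """Return the captured host segment, or None if the line does not match."""
    m = HOST_RE.search(line)
    return m.group(1) if m else None

print(extract_value("Host: api-migration.A9net.io"))  # migration
print(extract_value("Host: stt-migration.A9net.io"))  # migration
print(extract_value("Host: web-migration.A9net.io"))  # None
```

The re.IGNORECASE flag plays the role of the inline (?i) modifier used in the patterns above.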
You could use this pattern (with the dots escaped so that they match literally): (?i)^Host: (?:stt|api)-([^.]+)\.A9net\.io$
As already mentioned, alternation is key to your problem.
Additionally, it's recommended to use a negated character class instead of a lazy quantifier (such as +?) when possible. In this case it's [^.]+, which matches one or more characters other than a dot, so it will match up to the first occurrence of a dot. That is exactly what you want from a lazy quantifier followed by a dot.

Regexp. How to match word isn't followed and preceded by another characters

I want to replace mm units with cm units in my code. Because there are many such replacements, I am using a regexp.
I made such expression:
(?!a-zA-Z)mm(?!a-zA-Z)
But it still matches words like summa, gamma and dummy.
How do I write the regexp correctly?
Use character classes and change the first (?!...) lookahead into a lookbehind:
(?<![a-zA-Z])mm(?![a-zA-Z])
^^^^^^^^^^^^^  ^^^^^^^^^^^^
The pattern matches:
(?<![a-zA-Z]) - a negative lookbehind that fails the match if there is an ASCII letter immediately to the left of the current location
mm - a literal substring
(?![a-zA-Z]) - a negative lookahead that fails the match if there is an ASCII letter immediately to the right of the current location
NOTE: If you need to make your pattern Unicode-aware, replace [a-zA-Z] with [^\W\d_] (and use re.U flag if you are using Python 2.x).
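For illustration, a short sketch of the replacement, assuming Python's re module and a made-up sample string. Note that it only renames the unit; it does not convert the numeric value:

```python
import re

# "mm" with no ASCII letter directly before or after it.
UNIT_RE = re.compile(r"(?<![a-zA-Z])mm(?![a-zA-Z])")

text = "width = 10mm; gamma = 2 mm; summa dummy mm"
replaced = UNIT_RE.sub("cm", text)

print(replaced)
# width = 10cm; gamma = 2 cm; summa dummy cm
```

The words gamma, summa and dummy are left alone because each "mm" in them has a letter on at least one side.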
There's no need to use lookaheads and lookbehinds, so if you wish to simplify your pattern you can try something like this:
\d+\s?(mm)\b
This does assume that your millimetre symbol will always follow a number, with an optional space in-between, which I think that in this case is a reasonable assumption.
The \b checks for a word boundary to make sure the mm is not part of a word such as dummy etc.
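One thing to watch with this approach: the number is part of the match, so replacing the whole match with "cm" would drop it. The sketch below (Python's re module assumed, sample string made up) moves the capture group onto the number so it can be put back:

```python
import re

# Same idea as \d+\s?(mm)\b, but the number (and optional space) is captured
# so the replacement can keep it.
NUM_UNIT_RE = re.compile(r"(\d+\s?)mm\b")

text = "margin: 12mm, gap: 3 mm, dummy, mm"
replaced = NUM_UNIT_RE.sub(r"\1cm", text)

print(replaced)
# margin: 12cm, gap: 3 cm, dummy, mm
```

The trailing bare "mm" stays unchanged, which shows the assumption this pattern relies on: the unit must directly follow a number.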

Do not repeat placeholders in the same string regex

I made a regex to validate arrays that contain variable placeholders surrounded by { and }:
^(\/?(([a-zA-Z0-9\-\_]+)|(\{[a-zA-Z][a-zA-Z0-9]*\}))\/?)*$
It validates strings like test/{a}/{b} and /some-text/{a}/{a}/ and it's working fine. Here is the test: https://regex101.com/r/nP1tB2/2
Is it possible to block duplicated placeholders?
For example, in the 2nd string, {a} appears twice, but I would like to "block" (regex that doesn't match) it.
You may use a negative lookahead to restrict the matching process:
^(?!.*{([\w-]+)}.*{\1})(\/?(([\w-]+)|(\{[a-zA-Z][a-zA-Z0-9]*\}))\/?)*$
 ^^^^^^^^^^^^^^^^^^^^^^
It means that right after the beginning of the string is detected, (?!.*{([\w-]+)}.*{\1}) checks whether there are 0+ characters other than a newline, followed by a {...} substring (containing only letters, digits, underscores or hyphens), followed later by the same {...} substring again. If such a repeat is found, the whole match fails.
Note that unless you use a Unicode-aware pattern (in .NET, \w is Unicode-aware unless RegexOptions.ECMAScript is set), \w is equal to [A-Za-z0-9_]. That is why I replaced [a-zA-Z0-9\-\_] with [\w-] in your pattern. Otherwise, restore the original subpattern in both the lookahead and the main pattern.
Also, [a-zA-Z] can be expressed as [^\W\d_] or \p{L} (or even [[:alpha:]] in POSIX-style flavors) and [a-zA-Z0-9] as [^\W_] (or [[:alnum:]], [\p{L}\p{N}]). These subpatterns are handy if you need to make the pattern Unicode aware. A lot depends on the regex flavor.
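A possible way to try the duplicate check, sketched with Python's re module. Two things here are assumptions rather than part of the answer: the braces are escaped and the slashes are left unescaped (Python treats both forms the same way), and re.ASCII is passed so that \w stays equal to [A-Za-z0-9_]. The helper name is_valid is hypothetical:

```python
import re

PATH_RE = re.compile(
    r"^(?!.*\{([\w-]+)\}.*\{\1\})"                     # reject a repeated {name}
    r"(/?(([\w-]+)|(\{[a-zA-Z][a-zA-Z0-9]*\}))/?)*$",  # path grammar from the question
    re.ASCII,  # keep \w equal to [A-Za-z0-9_]
)

def is_valid(path):
    """True when the path matches and no placeholder occurs twice."""
    return PATH_RE.match(path) is not None

print(is_valid("test/{a}/{b}"))         # True
print(is_valid("/some-text/{a}/{a}/"))  # False
print(is_valid("/x/{id}/{name}/"))      # True
```

Because the lookahead compares the captured name through \1, {a} followed by {ab} is not treated as a duplicate.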

Regular Expression to Match Unescaped Characters Only

Okay, so I'm trying to use a regular expression to match instances of a character only if it hasn't been escaped (with a backslash) and decided to use a negative look-behind like so:
(?<!\\)[*]
This succeeds and fails as expected with strings such as foo* and foo\* respectively.
However, it doesn't work for strings such as foo\\*, i.e., where the special character is preceded by a back-slash escaping another back-slash (an escape sequence that is itself escaped).
Is it possible to use a negative look-behind (or some other technique) to skip special characters only if they are preceded by an odd number of back-slashes?
I've found the following solution which works for NSRegularExpression but also works in every regexp implementation I've tried that supports negative look-behinds:
(?<!\\)(?:(\\\\)*)[*]
In this case the group after the look-behind consumes any pairs of back-slashes, effectively eliminating them, so the negative look-behind only has to rule out a remaining single (odd) back-slash.
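A quick check of this pattern on four inputs, assuming Python's re module (the answer mentions NSRegularExpression; Python is used here only because it also supports negative look-behinds):

```python
import re

STAR_RE = re.compile(r"(?<!\\)(?:(\\\\)*)[*]")

def first_match(s):
    """Return the text of the first match, or None when nothing matches."""
    m = STAR_RE.search(s)
    return m.group(0) if m else None

print(first_match(r"foo*"))     # *
print(first_match(r"foo\*"))    # None
print(first_match(r"foo\\*"))   # \\*
print(first_match(r"foo\\\*"))  # None
```

Note that the matched text includes the preceding pairs of back-slashes, not just the star, so a replacement would need to account for them.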
A lookbehind cannot solve this problem. The only way is to match escaped characters first, so that they are skipped, and then find the unescaped characters.
You can isolate the unescaped character from the result with a capture group:
(?:\\.)+|(\*)
or with the \K (pcre/perl/ruby) feature, which discards everything matched so far from the result:
(?:\\.)*\K\*
or using backtracking control verbs (pcre/perl) to skip escaped characters:
(?:\\.)+(*SKIP)(*FAIL)|\*
The only case where a lookbehind works is the .NET framework, which allows unlimited-length lookbehinds:
(?<!(?:[^\\]|\A)(?:\\\\)*\\)\*
or in a more limited way with Java:
(?<!(?:[^\\]|\A)(?:\\\\){0,1000}\\)\*
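The capture-group variant, (?:\\.)+|(\*), is the one that carries over to engines without \K or backtracking verbs. A minimal sketch, assuming Python's re module; the function name unescaped_star_positions is hypothetical:

```python
import re

# Consume runs of escaped characters first; a '*' that survives is unescaped.
TOKEN_RE = re.compile(r"(?:\\.)+|(\*)")

def unescaped_star_positions(s):
    """Return the index of every '*' that is not escaped by a backslash."""
    return [m.start(1) for m in TOKEN_RE.finditer(s) if m.group(1) is not None]

print(unescaped_star_positions(r"foo*"))     # [3]
print(unescaped_star_positions(r"foo\*"))    # []
print(unescaped_star_positions(r"foo\\*"))   # [5]
print(unescaped_star_positions(r"foo\\\*"))  # []
```

Matches where group 1 is empty are escape sequences and are simply ignored; only matches that fill group 1 are unescaped stars.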

Regex help NOT a-z or 0-9

I need a regex to find all chars that are NOT a-z or 0-9
I don't know the syntax for the NOT operator in regex.
I want the regex to be NOT [a-z, A-Z, 0-9].
Thanks in advance!
It's ^. Your regex should use [^a-zA-Z0-9]. Beware: this character class may have unexpected behavior with non-ascii locales. For instance, this would match é.
If the regexes are perl-compatible (PCRE), you can use \s to match all whitespace. This expands to include spaces and other whitespace characters. If they're posix-compatible, use [:space:] character class (like so: [^a-zA-Z0-9[:space:]]). I would recommend using [:alnum:] instead of a-zA-Z0-9.
If you want to match at the end of a line, add a $ at the end of the pattern. Multiline mode is only needed when ^ and $ should match at every line boundary rather than only at the start and end of the input; depending on how the input is read, it may also cost performance on larger files, since more text may have to be held in memory.
Why don't you include a copy of sample input, the text you want to match, and the program you are using to do so?
It's pretty simple; you just add ^ at the beginning of a character set to negate that character set.
For example, the following pattern will match everything that's not in that character set -- i.e., not a lowercase ASCII letter or a digit:
[^a-z0-9]
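A small illustration of both negated classes, assuming Python's re module and a made-up sample string (the é is written as an escape so the example does not depend on file encoding):

```python
import re

text = "H\u00e9llo, World_42!"  # "Héllo, World_42!"

not_lower_or_digit = re.findall(r"[^a-z0-9]", text)
not_alnum = re.findall(r"[^a-zA-Z0-9]", text)

print(not_lower_or_digit)  # ['H', 'é', ',', ' ', 'W', '_', '!']
print(not_alnum)           # ['é', ',', ' ', '_', '!']
```

Both classes match é, since it lies outside the ASCII ranges, and both match the underscore.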
As a side note, some of the more helpful Regular Expression resources I've found have been this site and this cheat sheet (C# specific).
Put a ^ at the beginning of your character class expression: [^a-z0-9]
At the start: [^a-zA-Z0-9]
Use it as the condition in:
preg_match();
preg_replace();
eregi();
Try this.
You can also use \W, a shorthand for a non-word character (equal to [^a-zA-Z0-9_] in ASCII mode). Note that, unlike [^a-zA-Z0-9], it does not match the underscore.
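The difference from [^a-zA-Z0-9] shows up with underscores. A short sketch, assuming Python's re module, where re.ASCII keeps \W limited to the ASCII definition:

```python
import re

text = "snake_case-name 42"

non_word = re.findall(r"\W", text, re.ASCII)
non_alnum = re.findall(r"[^a-zA-Z0-9]", text)

print(non_word)   # ['-', ' ']
print(non_alnum)  # ['_', '-', ' ']
```

So \W is a convenient shorthand only when underscores should count as allowed characters.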