I need a regex to find all chars that are NOT a-z or 0-9
I don't know the syntax for the NOT operator in regex.
I want the regex to be NOT [a-z, A-Z, 0-9].
Thanks in advance!
It's ^. Your regex should use [^a-zA-Z0-9]. Beware: this character class may have unexpected behavior with non-ascii locales. For instance, this would match é.
If the regexes are perl-compatible (PCRE), you can use \s to match all whitespace. This expands to include spaces and other whitespace characters. If they're posix-compatible, use [:space:] character class (like so: [^a-zA-Z0-9[:space:]]). I would recommend using [:alnum:] instead of a-zA-Z0-9.
If you want to match the end of a line, include a $ at the end. Multiline mode is only needed when your match should extend across multiple lines, and it can reduce performance on larger files, since more must be read into memory.
Why don't you include a copy of sample input, the text you want to match, and the program you are using to do so?
It's pretty simple; you just add ^ at the beginning of a character set to negate that character set.
For example, the following pattern will match everything that's not in that character set -- i.e., not a lowercase ASCII character or a digit:
[^a-z0-9]
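A minimal sketch of the negated class in Python, in case it helps. The sample string is made up for illustration, and Python's re module is only assumed here as a convenient engine to try the pattern in.

```python
import re

# Hypothetical sample text, chosen only for illustration (\u00e9 is é).
sample = "abc-123 XYZ! caf\u00e9"

# [^a-z0-9] matches any single character that is NOT a lowercase ASCII letter or a digit.
not_lower_or_digit = re.findall(r"[^a-z0-9]", sample)

# [^a-zA-Z0-9] also excludes uppercase ASCII letters, but it still matches é,
# because é lies outside the ASCII ranges listed in the class.
not_ascii_alnum = re.findall(r"[^a-zA-Z0-9]", sample)

print(not_lower_or_digit)
print(not_ascii_alnum)
```

The second list still contains é, which is the non-ASCII caveat mentioned in the first answer.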
As a side note, some of the more helpful Regular Expression resources I've found have been this site and this cheat sheet (C# specific).
Put a ^ at the beginning of your character class expression: [^a-z0-9]
Put ^ at the start of the class for negation: [^a-zA-Z0-9]
Try this with:
preg_match();
preg_replace();
eregi();
You can also use \W, which is shorthand for a non-word character (equivalent to [^a-zA-Z0-9_]). Note that it does not match the underscore.
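A short sketch of how \W behaves in Python, under the assumption that Python's re module is the engine in use; other engines may treat non-ASCII letters differently. The sample string is hypothetical.

```python
import re

# Hypothetical input; \u00e9 is é.
sample = "a_b-c \u00e9"

# For str patterns in Python 3, \w is Unicode-aware by default,
# so é counts as a word character and \W skips it.
unicode_non_word = re.findall(r"\W", sample)

# With re.ASCII, \W behaves like [^a-zA-Z0-9_], so é is reported as a non-word character.
ascii_non_word = re.findall(r"\W", sample, flags=re.ASCII)

print(unicode_non_word)
print(ascii_non_word)
```

In both cases the underscore is treated as a word character, so \W is not an exact substitute for [^a-zA-Z0-9].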
I am trying to extract words that have at least one character from a special character set. It picks up some words and not others. Here is a link to regex101 to test it. This is the regex \b(\w*[āīūẓḍḥṣṭĀĪŪẒḌḤṢṬʿʾ]+\w*)\b, and this is the sample sentence I am using:
His full name is Abu ʿĪsa Muḥammad ibn ʿĪsa ibn Sawrah ibn Mūsa ibn
Al-Daḥāk Al-Sulamī Al-Tirmidhī.
It should match the following words:
ʿĪsa Muḥammad ʿĪsa Mūsa Al-Daḥāk Al-Sulamī Al-Tirmidhī
I am not too experienced with regex, so I have no idea what I am doing wrong. If someone knows any tool to find out why a specific word doesn't match a regex pattern, please let me know as well.
You can use
[\w-]*[āīūẓḍḥṣṭĀĪŪẒḌḤṢṬʿʾ][\wāīūẓḍḥṣṭĀĪŪẒḌḤṢṬʿʾ-]*
After matching the one required special character, use another character set to match more occurrences of those characters or normal word characters.
https://regex101.com/r/ovJoLt/2
You can make this work by enabling the Unicode flag /u (so that the word boundary \b assertions support Unicode characters) and adding hyphens to the surrounding character groups:
/\b[\w-]*[āīūẓḍḥṣṭĀĪŪẒḌḤṢṬʿʾ]+[\w-]*\b/gu
Plus, you don't need the capturing group, since the only characters being matched form the desired output anyway (\b is a zero-width assertion).
You are not doing anything wrong, except that to match Unicode word boundaries you have to enable the u modifier, or use (?<!\S)\w*[āīūẓḍḥṣṭĀĪŪẒḌḤṢṬʿʾ]+\w*(?!\S)
If you want to match a hyphen, add it to your character class: (?<!\S)\w*[āīūẓḍḥṣṭĀĪŪẒḌḤṢṬʿʾ-]+\w*(?!\S)
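For anyone trying this in Python, here is a rough sketch using the pattern from the first answer. It assumes Python 3, where \w is Unicode-aware for str patterns without any extra flag. The NFC normalisation step is an addition of this sketch, not part of the answers: it guards against accented letters stored as a base letter plus a combining mark.

```python
import re
import unicodedata

# Normalise to NFC so each accented letter is a single code point;
# a character class treats a base letter plus a combining mark as two separate members.
special = unicodedata.normalize("NFC", "āīūẓḍḥṣṭĀĪŪẒḌḤṢṬʿʾ")
text = unicodedata.normalize(
    "NFC",
    "His full name is Abu ʿĪsa Muḥammad ibn ʿĪsa ibn Sawrah ibn Mūsa ibn "
    "Al-Daḥāk Al-Sulamī Al-Tirmidhī.",
)

# Optional word characters or hyphens, one required special character,
# then any further word characters, special characters, or hyphens.
pattern = re.compile(rf"[\w-]*[{special}][\w{special}-]*")

words = pattern.findall(text)
print(words)
```

Under these assumptions the result should be the seven words listed in the question, with the trailing full stop left out.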
I have this regular expression, which allows only spaces, letters and dashes. I'd like to modify it so that it rejects input consisting ONLY of spaces. Can someone help me?
/^([A-zăâîșțĂÂÎȘȚ-\s])+$/
You can use a negative lookahead to restrict this generic pattern:
/^(?!\s+$)[A-Za-zăâîșțĂÂÎȘȚ\s-]+$/
  ^^^^^^^^
The (?!\s+$) lookahead is executed once at the very beginning and fails the match if the string consists of 1 or more whitespace characters up to the end of the string.
Also, your regex contained a classical issue of [A-z] that matches more than just ASCII letters, you need to replace this with [A-Za-z] (or just [a-z] and use the /i case insensitive modifier).
Also, the - inside a character class is usually placed at the end so as not to escape it, and it will be parsed as a literal hyphen (however, you might want to escape it if another developer will have to update this pattern by adding more symbols to the character class).
And just in case this is a regex engine that does not support lookarounds:
^[A-Za-zăâîșțĂÂÎȘȚ\s-]*[A-Za-zăâîșțĂÂÎȘȚ-][A-Za-zăâîșțĂÂÎȘȚ\s-]*$
It requires at least 1 non-space character from the allowed set (also matching 1 obligatory symbol).
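Both variants can be compared side by side with a small Python sketch. The helper name is_valid and the sample inputs are hypothetical, and Python's re module is assumed only as a stand-in for whichever engine you actually use.

```python
import re

# Pattern from the answer: the lookahead rejects whitespace-only input up front.
with_lookahead = re.compile(r"^(?!\s+$)[A-Za-zăâîșțĂÂÎȘȚ\s-]+$")

# Lookaround-free alternative: at least one allowed non-space character must appear.
allowed = r"A-Za-zăâîșțĂÂÎȘȚ"
without_lookahead = re.compile(rf"^[{allowed}\s-]*[{allowed}-][{allowed}\s-]*$")

def is_valid(pattern, value):
    """Return True when the anchored pattern matches the value."""
    return pattern.match(value) is not None

# Hypothetical inputs, chosen only for illustration.
samples = ["Ana Maria", "Jean-Luc", "   ", "abc1", ""]
results_lookahead = [is_valid(with_lookahead, s) for s in samples]
results_plain = [is_valid(without_lookahead, s) for s in samples]

# [A-z] spans the ASCII range from "A" to "z", which also covers [ \ ] ^ _ and `.
underscore_matches = re.fullmatch(r"[A-z]", "_") is not None

print(results_lookahead, results_plain, underscore_matches)
```

The two patterns agree on these inputs: names with spaces or a hyphen pass, while whitespace-only, digit-containing and empty strings are rejected.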
I have this line
pattern = "\S*\w+(\s?$|\s{1,}\w+)+"
It works fine: it blocks leading white space and allows white space between the words. However, I cannot include special characters (for example: + & %) without losing this property. Can someone help me out? Thank you.
If all you want is a space split you should replace \w with \S.
Also, \S*\w+ is somewhat redundant; you could simplify it to \S*\w.
But if you want finer control why not write out the whole range and replace \w with [a-zA-Z0-9_+&%]?
Check out regular expressions for javascript
Only \S matches special characters; \w only matches [a-zA-Z0-9_].
So you could simply replace \w with \S to get
pattern="\S*\S+(\s?$|\s{1,}\S+)+"
But that leaves a lot of redundancy. Simplify it to
pattern="\S+(\s+\S+)*\s?"
or if really the only thing you care about is starting with \S then just do
pattern="\S[\s\S]*" <!-- or -->
pattern="\S.*" <!-- not allowing linebreaks -->
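A quick way to check the simplified pattern outside the browser is sketched below in Python. This is only an approximation: the HTML pattern attribute is evaluated by the browser's JavaScript engine and must match the whole value, which re.fullmatch imitates here. The sample inputs are hypothetical.

```python
import re

# Simplified pattern from the answer above; fullmatch imitates the
# whole-value matching that the HTML pattern attribute performs.
pattern = re.compile(r"\S+(\s+\S+)*\s?")

# Hypothetical inputs, chosen only for illustration.
samples = ["hello world", "a+b &% c", "hello ", " hello", "hello  "]
results = {s: pattern.fullmatch(s) is not None for s in samples}

print(results)
```

Leading whitespace is rejected, special characters are accepted, and at most one trailing whitespace character is allowed because of the final \s?.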
From my understanding, if you don't want it to match leading whitespace or special characters, simply remove the \S*. It matches anything other than whitespace, which includes special characters.
\w+(\s?$|\s{1,}\w+)+
This blocks whitespace and special characters at the beginning of the input, but special characters in between words would still not be matched. For that I would replace \s with \W (non-word characters). This allows spaces and special characters in between the words.
\w+(\W?$|\W{1,}\w+)+
A great site for testing regexes, and where I was able to confirm this, is regex101.com. You can test a regex as you type it, with detailed information about what it will do, and you can include sample text to see what it will find. Given " ! Test", the regex above captured only Test and ignored both the ! and the spaces before it.
I have a scenario where I want to match a specific word and then match everything until I reach another pattern. For example:
ABC=145865865
Then anything comes in ways
and then
Date=11/11/2001
I have tried (.*?), but it only matches within a single line; in my scenario I have multiple lines of data in between.
How can i do this?
Closest guess to what I think you're looking for:
ABC=(\d+)[\s\S]*?Date=(\d\d/\d\d/\d{4})
This uses [\s\S] which means "either a whitespace character or not a whitespace character", which is equivalent to "any character". The . can also be set to match any character, but I tend to prefer [\s\S] because it does just that without having to set flags. You haven't specified the language you are using so I can't tell you how to set such a flag anyway (it's re.DOTALL in Python).
Multiple lines? If you mean you have newline characters (\n) in between then you need to set the DOTALL flag, as follows:
Pattern p = Pattern.compile(<your-regex-here>, Pattern.DOTALL)
The above will match new line characters between the two strings.
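Since the first answer mentions re.DOTALL, here is a small Python sketch comparing the two approaches. The input string is a hypothetical reconstruction of the question's sample, with the lines joined by newlines.

```python
import re

# Hypothetical input mirroring the question's sample, with extra lines in between.
text = "ABC=145865865\nThen anything comes in ways\nand then\nDate=11/11/2001\n"

# [\s\S] matches any character including newlines, so no flag is needed.
m_class = re.search(r"ABC=(\d+)[\s\S]*?Date=(\d\d/\d\d/\d{4})", text)

# Alternative form: let . match newlines via re.DOTALL.
m_dotall = re.search(r"ABC=(\d+).*?Date=(\d\d/\d\d/\d{4})", text, flags=re.DOTALL)

# Without DOTALL, . stops at the first newline and the search fails.
m_plain = re.search(r"ABC=(\d+).*?Date=(\d\d/\d\d/\d{4})", text)

print(m_class.groups(), m_dotall.groups(), m_plain)
```

The first two searches capture the same pair of values, while the last one returns None because the two markers sit on different lines.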
How can I match all characters, including new lines, with a regex?
I am trying to match all characters between brackets "()". I don't want to activate Dot matches all.
I tried
\([.\n\r]*\)
But it doesn't work.
\(.*\) doesn't work if there is a new line between the brackets.
I have been using http://regexpal.com/ to test my regular expressions. Tell me if you know something better.
I'd usually use something like \([\S\s]*\) in this situation.
The [\S\s] will match any whitespace or non-whitespace character.
The first example doesn't work because inside a character class the dot is treated literally (Matches the . character instead of all characters).
\((.|[\n\r])*\)
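The difference between these attempts can be seen in a short Python sketch. The input is hypothetical, and the lazy variant at the end is an addition of this sketch rather than something the answers propose; it may be useful when the text contains more than one bracketed group.

```python
import re

# Hypothetical input with a newline inside the first pair of brackets.
text = "f(a\nb) and g(c)"

# Inside a character class the dot is literal, so [.\n\r] only matches ".", "\n" or "\r".
literal_dot = re.search(r"\([.\n\r]*\)", text)

# [\S\s] matches any character; the greedy * runs to the LAST closing bracket.
greedy = re.search(r"\([\S\s]*\)", text).group()

# A lazy *? stops at the first closing bracket instead.
lazy = re.search(r"\([\S\s]*?\)", text).group()

print(literal_dot, repr(greedy), repr(lazy))
```

The first search finds nothing, the greedy one spans both bracketed groups, and the lazy one stops after the first group.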