Regex for ASCII printable characters excluding some special characters - regex

Can someone please help me with a regex for English letters and numbers that excludes a few special characters?
The regex should accept only ASCII codes >= 32 and < 127, and must not include special characters like
` ~ ! $ % ^ & * ( ) + = [ ] { } < > ? ; : \ |.
I created a simple regex for letters only (^\p{L}+$), but how do I include all the other characters and numbers while avoiding the special ones listed above and anything outside that ASCII range?
Thank You

Instead of allowing "everything except x", you should go for a whitelist since you have a defined set of characters that you want to allow.
^[0-9a-zA-Z'"#,\-/_ .@]+$
Please review the regex, I might have missed some special characters. It should give you the right idea!
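A minimal Java sketch of the whitelist idea, in case it helps to see it run. The class name AsciiWhitelist and the sample inputs are made up for illustration, and including @ in the class is an assumption (it is printable ASCII and not on the excluded list).

```java
import java.util.regex.Pattern;

final class AsciiWhitelist {
    // Allowed: digits, ASCII letters, and ' " # , - / _ space . @
    // Everything else (including non-ASCII characters) is rejected.
    private static final Pattern ALLOWED =
            Pattern.compile("^[0-9a-zA-Z'\"#,\\-/_ .@]+$");

    static boolean isAllowed(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Hello, World #1")); // only whitelisted characters
        System.out.println(isAllowed("price=$5"));        // contains = and $
        System.out.println(isAllowed("caf\u00e9"));       // contains a non-ASCII letter
    }
}
```

Because the quantifier is +, an empty string is rejected as well; switch to * if empty input should pass.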

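Consider the following regex, which matches every printable ASCII character (codes 32 to 126). Note that \x7F is DEL, which is not printable, and that this pattern does not exclude the special characters listed in the question:
^[\x20-\x7E]+$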
Good Luck!

Related

Remove special characters using Pentaho - Replace in String

I want to remove special characters like ! @ # $ % ^ * _ = + | \ } { [ ] : ; < > ? / from a string field.
I used the "Replace in String" step and enabled "use RegEx". However, I do not know the right syntax to put in "Search" to remove all of these characters from the string. If I put only one character in "Search", it is removed from the string. How can I remove all of them?
As per documentation, the regex flavor is Java. You may use
\p{Punct}
See the Java regex syntax reference:
\p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
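Since \p{Punct} removes every ASCII punctuation character (commas, periods and quotes included), here is a small Java sketch comparing it with a character class that covers only the characters from the question. The class and method names are hypothetical, and the character list is my reading of the question. In the Pentaho "Search" field the pattern would be entered without the extra Java string escaping, i.e. [!@#$%^*_=+|\\}{\[\]:;<>?/].

```java
final class PunctuationStripper {
    // Only the characters listed in the question:
    // ! @ # $ % ^ * _ = + | \ } { [ ] : ; < > ? /
    private static final String LISTED_ONLY = "[!@#$%^*_=+|\\\\}{\\[\\]:;<>?/]";

    // Every ASCII punctuation character, as suggested in the answer.
    private static final String ALL_PUNCT = "\\p{Punct}";

    static String removeListed(String input) {
        return input.replaceAll(LISTED_ONLY, "");
    }

    static String removeAllPunctuation(String input) {
        return input.replaceAll(ALL_PUNCT, "");
    }

    public static void main(String[] args) {
        String sample = "Hello, w@rld!";
        System.out.println(removeListed(sample));         // keeps the comma
        System.out.println(removeAllPunctuation(sample)); // drops the comma too
    }
}
```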

split text into words and exclude hyphens

I want to split a text into its individual words using regular expressions. The obvious solution would be the regex \\b, but unfortunately it also splits words at hyphens.
So I am looking for an expression that does exactly what \\b does but does not split on hyphens.
Thanks for your help.
Example:
String s = "This is my text! It uses some odd words like user-generated and need therefore a special regex.";
String[] b = s.split("\\b+");
for (int i = 0; i < b.length; i++) {
    System.out.println(b[i]);
}
Output:
This
is
my
text
!
It
uses
some
odd
words
like
user
-
generated
and
need
therefore
a
special
regex
.
Expected output:
...
like
user-generated
and
....
@Matmarbon's solution is already quite close, but not a 100% fit; it gives me
...
like
user-
generated
and
....
This should do the trick, even if lookaheads are not available:
[^\w\-]+
For anyone who needs this for another purpose (e.g. inserting something), the following is closer to an equivalent of the \b solutions:
([^\w\-]|$|^)+
because:
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
--- http://www.regular-expressions.info/wordboundaries.html
You can use this:
(?<!-)\\b(?!-)
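To compare the two splitting approaches from the answers, here is a short Java sketch. The class and method names are invented for the example. The first method splits on runs of characters that are neither word characters nor hyphens; the second splits at word boundaries that are not next to a hyphen, so whitespace and punctuation survive as separate tokens.

```java
import java.util.Arrays;
import java.util.List;

final class HyphenAwareSplit {
    // Split on runs of characters that are neither word characters nor hyphens.
    static List<String> splitOnNonWord(String text) {
        return Arrays.asList(text.split("[^\\w\\-]+"));
    }

    // Split at word boundaries, except where a hyphen sits on either side.
    static List<String> splitOnBoundary(String text) {
        return Arrays.asList(text.split("(?<!-)\\b(?!-)"));
    }

    public static void main(String[] args) {
        String s = "This is my text! It uses some odd words like user-generated "
                + "and need therefore a special regex.";
        splitOnNonWord(s).forEach(System.out::println);
        System.out.println("---");
        splitOnBoundary(s).forEach(System.out::println);
    }
}
```

If the text can start with punctuation, the first variant yields a leading empty string, which may need filtering.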

RegEx for including alphanumeric and special characters

I have a requirement to allow alphanumeric and certain other characters in a field. I am using this regular expression:
"^[a-zA-Z0-9!@#$&()-`.+,/\"]*$".
The allowed special characters are ! @ # $ & ( ) - ‘ . / + , “
But when I test the pattern with a string "test_for_extended_alphanumeric" , the string passes the test. I don't have "_" allowed in the pattern. What am I doing wrong?
You need to escape the hyphen:
"^[a-zA-Z0-9!@#$&()\\-`.+,/\"]*$"
If you don't escape it then it means a range of characters, like a-z.
In your character class, )-` is interpreted as a range in the same way as e.g. a-z; it therefore refers to any character with a decimal ASCII code from 41 ()) to 96 (`).
Since _ has code 95, it is within the range and therefore allowed, as are <, =, > etc.
To avoid this you can either escape the -, i.e. \-, or put the - at either the start or end of the character class:
/^[a-zA-Z0-9!@#$&()`.+,/"-]*$/
There is no need to escape the ", and note that because you are using the * quantifier, an empty string will also pass the test.
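The range effect is easy to check in Java. The sketch below (class and field names are hypothetical) compiles the pattern with an unescaped hyphen, with an escaped hyphen, and with the hyphen moved to the end, then tests the string from the question against each.

```java
import java.util.regex.Pattern;

final class HyphenRangeDemo {
    // Unescaped: )-` is a range covering ASCII 41 to 96, which includes _ (95).
    static final Pattern UNESCAPED =
            Pattern.compile("^[a-zA-Z0-9!@#$&()-`.+,/\"]*$");

    // Escaped hyphen: a literal - only.
    static final Pattern ESCAPED =
            Pattern.compile("^[a-zA-Z0-9!@#$&()\\-`.+,/\"]*$");

    // Hyphen at the end of the class: also a literal -.
    static final Pattern HYPHEN_LAST =
            Pattern.compile("^[a-zA-Z0-9!@#$&()`.+,/\"-]*$");

    public static void main(String[] args) {
        String input = "test_for_extended_alphanumeric";
        System.out.println(UNESCAPED.matcher(input).matches());   // underscore slips through
        System.out.println(ESCAPED.matcher(input).matches());     // underscore rejected
        System.out.println(HYPHEN_LAST.matcher(input).matches()); // underscore rejected
    }
}
```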
This regex allows alphanumeric and special characters. Here \w matches letters, digits and the underscore, and \s matches whitespace:
[><?@+'`~^%&\*\[\]\{\}.!#|\\\"$';,:;=/\(\),\-\w\s+]*
The allowed special characters are ! @ # $ & ( ) - ‘ . / + , “ = { } [ ] ? / \ |
Hyphens in character classes denote a range unless they are escaped or at the start or end of the character class. If you want to include hyphens, it's typically a good idea to put them at the front so you don't even have to worry about escaping:
^[-a-zA-Z0-9!@#$&()`.+,/\"]*$
By the way, _ does indeed fall between ) and the backtick in ASCII:
http://en.wikipedia.org/wiki/ASCII#ASCII_printable_characters
How about this, which allows special characters as well as alphanumerics? The range from space to ~ covers every printable ASCII character:
"^[ -~]*$"
Because I don't know how many special characters exist, it is difficult to use a whitelist to check whether a string contains a special character. It may be more efficient to check that the string contains only letters or digits.
A Kotlin example:
import java.util.regex.Pattern

fun String.hasOnlyAlphabetOrNumber(): Boolean {
    val p = Pattern.compile("[^a-zA-Z0-9]")
    // find() looks for any disallowed character; matches() would require the whole string to be one.
    return !p.matcher(this).find()
}
For Swift 4:
import Foundation

extension String {
    func hasOnlyAlphabetOrNumber() -> Bool {
        if self.isEmpty { return false }
        do {
            let pattern = "[^a-zA-Z0-9]"
            let regex = try NSRegularExpression(pattern: pattern, options: [])
            // NSRange counts UTF-16 units, so build it from the string's own indices.
            let range = NSRange(self.startIndex..., in: self)
            return regex.firstMatch(in: self, options: [], range: range) == nil
        } catch {
            return false
        }
    }
}
Regex sucks. Here is mine
/^[a-zA-Z\d-!@#$%^&._"'()+,/;<>=|?[\]\\`~{}]$/
Mine is a little different from the others, but it is more self-explanatory. You put \ in front of any special symbol like ] or \. I had issues with -, \, and ], so I had to write \] and \\ and move the - to the left. I also had issues with |, but I moved it left and that fixed it.

Why doesn't this regex pattern work?

I'm trying to select commas that are not preceded by a 4-digit number or the word "id". I tried this:
( ? < ! [ \ d { 5 } | id ] ) ,
The problem:
For example, if the input string is "1999," that comma is not selected, and I don't understand why.
Try this pattern:
(?<!\d{5}|id),
Your pattern, (?<![\d{5}|id]), looks for a comma that is not preceded by a single digit, {, }, |, i, or d. These should not be in a character class ([]). (?<![\d]{5}|id), also works, but the brackets are redundant.
First of all, unless you're using the /x flag, each space will attempt to match a space. So take those out.
Second, you're using [...] presumably to group an alternation (|), but square brackets actually indicate a character class, i.e. [\d{5}|id] is equivalent to [\d{}|id] and matches any one of those characters, but not more. What you mean is this:
(?<!\d{5}|id),
The final problem might be that many implementations of regex (you haven't specified which you're using) don't support variable-width lookbehind assertions. So, you may need to do something like:
(?<!\d{5}|...id),
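The difference between the character class and the alternation can be checked with a few lines of Java, whose regex engine accepts bounded alternatives of different lengths inside a lookbehind. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

final class LookbehindDemo {
    // Character class: rejects a comma preceded by ANY single digit, {, }, |, i, or d.
    static final Pattern CHAR_CLASS = Pattern.compile("(?<![\\d{5}|id]),");

    // Alternation: rejects a comma preceded by five digits or by the text "id".
    static final Pattern ALTERNATION = Pattern.compile("(?<!\\d{5}|id),");

    static boolean selectsComma(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String input : new String[] {"1999,", "12345,", "id,", "abc,"}) {
            System.out.println(input
                    + " class=" + selectsComma(CHAR_CLASS, input)
                    + " alternation=" + selectsComma(ALTERNATION, input));
        }
    }
}
```

With the character class, "1999," fails because the 9 directly before the comma is a digit; with the alternation it is selected because only four digits precede the comma.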