What is the best way to remove non-alphanumeric characters from a text file using Notepad++?
I only want to keep numbers and letters. Is there a built-in feature to help, or should I go the regex route?
I am using [a-zA-Z0-9 ] to match letters, digits, and spaces. It works, but I need the opposite: to match everything else!
In the Replace dialog (Ctrl+H), with the Regular expression search mode selected, use a negated character class in the Find what field and leave Replace with empty:
[^a-zA-Z0-9\s]+
Here, [^ starts a negated character class, which matches any character other than those belonging to the sets and ranges defined inside it. So, the whole pattern matches one or more characters other than ASCII letters, digits, and whitespace.
Or, to make the expression Unicode-aware,
[^[:alnum:][:space:]]+
Here, [:alnum:] matches all alphanumeric chars and [:space:] matches all whitespace.
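For readers who want to try the same negated class outside Notepad++, here is a minimal Python sketch. Python's re module is only a stand-in for Notepad++'s engine, and the helper name strip_non_alnum is made up for illustration. Python's re does not support POSIX bracket expressions such as [:alnum:], so only the ASCII pattern is shown.

```python
import re

# Same pattern as in the Find what field: one or more characters that are
# NOT ASCII letters, digits, or whitespace.
NON_ALNUM = re.compile(r"[^a-zA-Z0-9\s]+")


def strip_non_alnum(text: str) -> str:
    """Delete every run matched by the negated class (hypothetical helper)."""
    return NON_ALNUM.sub("", text)


print(strip_non_alnum("Hello, World! (v2.0)"))  # Hello World v20
```

Replacing with an empty string mirrors leaving the Replace with field blank.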
Related
I want to remove non-ASCII characters from my text and replace them with ''.
I have some invalid characters in my table that I'm trying to remove. But I ran into a problem with one of them.
Example:
123Abh¿½ï¿½ï¿½ï¿½ï¿½v streeÁÉÍÓt
Expected output:
123Abh street
As of now I'm using
regex_replace('123Abh¿½ï¿½ï¿½ï¿½ï¿½v streeÁÉÍÓt','[^[:print:]]','')
but this is not working, any suggestions?
You can use
regex_replace('123Abh¿½ï¿½ï¿½ï¿½ï¿½v streeÁÉÍÓt', '[^\\x{0000}-\\x7E]+', '')
Here,
[^ - start of a negated character class that matches any chars but
\x{0000}-\x7E - chars from NULL to ~ char in the ASCII table
]+ - end of the class, match one or more times.
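The same negated range can be tried in Python as a rough check. This is a sketch only: the SQL function above is replaced by Python's re.sub, the helper name strip_non_ascii is hypothetical, and the sample string is invented rather than taken from the question.

```python
import re

# Negated class: anything outside the range NUL (0x00) through "~" (0x7E).
NON_ASCII = re.compile(r"[^\x00-\x7E]+")


def strip_non_ascii(text: str) -> str:
    """Remove every run of characters above "~" (hypothetical helper)."""
    return NON_ASCII.sub("", text)


sample = "caf\u00e9 \u00bfqu\u00e9?"  # "café ¿qué?" written with escapes
print(strip_non_ascii(sample))  # caf qu?
```

Note that only the characters outside the range disappear; ASCII letters next to them are kept.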
What if I need to remove all special characters apart from spaces and hyphens? - In this case, you need to use
regex_replace('123Abh¿½ï¿½ï¿½ï¿½ï¿½v streeÁÉÍÓt', '[^\\w\\s-]|_', '')
Here, [^\w\s-]|_ matches any single character other than a letter, digit, _, whitespace, or -, as well as the underscore itself (note that \w matches underscores, so the underscore has to be added back through |, the alternation operator).
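A small Python sketch of this alternation follows. The re.ASCII flag is an assumption added here so that \w means only [a-zA-Z0-9_]; without it, Python treats accented letters as word characters and keeps them. The helper name strip_specials is hypothetical.

```python
import re

# Either a character that is not a word character, whitespace, or "-",
# or an underscore. re.ASCII limits \w to [a-zA-Z0-9_] (assumption, see above).
SPECIALS = re.compile(r"[^\w\s-]|_", flags=re.ASCII)


def strip_specials(text: str) -> str:
    """Remove special characters but keep spaces and hyphens (hypothetical helper)."""
    return SPECIALS.sub("", text)


print(strip_specials("foo_bar-baz, qux!"))  # foobar-baz qux
```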
I'm not very good with regex, so this is my problem.
I have a string that contains c#m#fc#fm# and I want to get all groups of characters with their # at the end.
Like this :
c#
m#
fc#
fm#
I have tried some regexes, but I never get what I want.
Thanks a lot for your help.
You can use [^#]+# and find all matches. Each match starts with one or more characters matched by the negated character class [^#]+ (any character except #) and ends with a single #.
Also, if your string contains spaces that you don't want in the matched text, you can add \s to the negated character class and use this regex:
[^#\s]+#
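Both patterns can be checked with Python's re.findall. This is a minimal sketch; the spaced sample string is invented to show the difference between the two classes.

```python
import re

text = "c#m#fc#fm#"

# One or more non-# characters followed by a single #.
tokens = re.findall(r"[^#]+#", text)
print(tokens)  # ['c#', 'm#', 'fc#', 'fm#']

# With whitespace between tokens, the first pattern keeps the leading space...
spaced = "c# m# fc#"
with_space = re.findall(r"[^#]+#", spaced)
# ...while adding \s to the negated class drops it.
without_space = re.findall(r"[^#\s]+#", spaced)
print(with_space)     # ['c#', ' m#', ' fc#']
print(without_space)  # ['c#', 'm#', 'fc#']
```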
I am trying to understand the purpose of - in this regex capture clause
(?P<slug>[\w-]+)
This is what I came up with when I searched for a dash:
A dash (-) can be used to specify a range. So the dash is a
metacharacter, but only within a character class. If you want to use a
literal dash within a character class, you should escape it with a
backslash, except when the dash is the first or last character of the
character class. So, the regexp [a\-z] is equal to [az-] and [-az],
they will match any of those three characters.
My question is: what is the - after \w?
You are looking at what my former CS professor would refer to as a rabbit (out of a hat):
(?P<slug>[\w-]+)
It is a rabbit because your research is normally correct: a dash is used to specify a range of characters. But in this case, the dash is a literal dash, since it appears at the end of the character class.
So here [\w-]+ means to match one or more word characters or literal dashes.
If you want to include a literal dash in a character class, a safer way is to escape it:
[\w\-]+
Then, the dash may be placed anywhere in the class.
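The (?P<slug>...) named-group syntax is also what Python's re module uses, so the behavior can be checked there. A minimal sketch, where the path string is a made-up example:

```python
import re

# Dash last in the class: a literal dash, not a range.
slug_last = re.compile(r"(?P<slug>[\w-]+)")
# Escaped dash: also literal, regardless of its position in the class.
slug_escaped = re.compile(r"(?P<slug>[\w\-]+)")

path = "my-first-post/comments"  # hypothetical URL fragment
print(slug_last.match(path).group("slug"))     # my-first-post
print(slug_escaped.match(path).group("slug"))  # my-first-post

# [a\-z] lists three literal characters, while [a-z] is a range.
literal_hits = re.findall(r"[a\-z]", "a-m-z")
range_hits = re.findall(r"[a-z]", "a-m-z")
print(literal_hits)  # ['a', '-', '-', 'z']
print(range_hits)    # ['a', 'm', 'z']
```

Both slug patterns stop at the slash, because / is neither a word character nor a dash.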
VBA regular expressions do not support Unicode property classes (e.g. \p{L}). Also, \w matches only Latin alphanumerics and the underscore. So my problem was how to replace the non-alphanumeric characters in my Unicode string without typing the whole character list into the pattern.
For example, trying to replace every non-word character with an underscore in "abc (for αβψ̌) and de (for δε)", the pattern \W results in "abc__for_______and_de__for____" instead of "abc__for_αβψ___and_de__for_δε_".
Finally I think there is at least one quick solution...
One approach is to find the first and last Unicode characters of the script you need and use them as a character range. With the pattern [^\w\u0370-\u03FF\u1F00-\u1FFF] I can replace any character that is not a Latin or Greek alphanumeric.
We can also use this pattern in the Excel function RegExReplace.
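The effect of adding the Greek ranges can be reproduced in Python as a rough cross-check. This is a sketch under two assumptions: the re.ASCII flag is used to imitate the ASCII-only \w described above (Python's \w is Unicode-aware by default), and the class lists only \w plus the two Greek ranges. The sample text is written with escapes so that the combining caron after the psi (U+030C) is explicit.

```python
import re

# re.ASCII restricts \w to [a-zA-Z0-9_]; the two ranges add the
# Greek (U+0370..U+03FF) and Greek Extended (U+1F00..U+1FFF) blocks.
NOT_LATIN_OR_GREEK = re.compile(r"[^\w\u0370-\u03FF\u1F00-\u1FFF]", flags=re.ASCII)

# "abc (for αβψ̌) and de (for δε)"; the psi is followed by U+030C.
text = "abc (for \u03b1\u03b2\u03c8\u030c) and de (for \u03b4\u03b5)"

ascii_only = re.sub(r"\W", "_", text, flags=re.ASCII)  # Greek letters are lost
with_greek = NOT_LATIN_OR_GREEK.sub("_", text)         # Greek letters survive

print(ascii_only)
print(with_greek)
```

The combining caron is still replaced in the second result, because it lies outside both Greek ranges.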
I want a regex which can match all numbers, letters, and all punctuation symbols as well (full stop, comma, question mark, exclamation mark, colon, etc.).
The string must be at least one character long, but can be any length above that.
Is it possible?
Try \\p{Graph}+ or \\p{Print}+
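// Assumes JUnit 4; the class name is illustrative.
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class GraphPatternTest {

    // \p{Graph} is a POSIX class: ASCII letters, digits, and punctuation.
    @Test
    public void shouldMatch() {
        assertTrue("asdf123ASFD!@#$%^&*()".matches("\\p{Graph}+"));
    }

    // Adding \s to the class allows whitespace as well.
    @Test
    public void shouldMatchWithWhitespaces() {
        assertTrue("asdf 123 ASFD !@#$%^&*()".matches("[\\p{Graph}\\s]+"));
    }
}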
You can find more information here (section: POSIX character classes (US-ASCII only)):
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
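Engines without \p{Graph} can approximate it with an explicit range. The Python sketch below assumes the US-ASCII meaning of the class, where the visible characters are exactly "!" (0x21) through "~" (0x7E). The helper names is_graph and is_graph_or_space are hypothetical.

```python
import re

# ASCII "graph" characters: letters, digits, and punctuation,
# but no space and no control characters.
GRAPH = re.compile(r"[!-~]+")
# The same class with whitespace allowed as well.
GRAPH_OR_SPACE = re.compile(r"[!-~\s]+")


def is_graph(text: str) -> bool:
    """True if text is one or more visible ASCII characters (hypothetical helper)."""
    return GRAPH.fullmatch(text) is not None


def is_graph_or_space(text: str) -> bool:
    """True if text is one or more visible ASCII or whitespace characters."""
    return GRAPH_OR_SPACE.fullmatch(text) is not None


print(is_graph("asdf123ASFD!@#$%^&*()"))              # True
print(is_graph("asdf 123"))                           # False
print(is_graph_or_space("asdf 123 ASFD !@#$%^&*()"))  # True
```

fullmatch plays the role of Java's String.matches, which also requires the whole string to match.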
Start by looking at character classes
http://www.regular-expressions.info/charclass.html
An example:
[A-Za-z_0-9]*
This matches any run of ASCII letters, digits, and underscores. Because of the *, it also matches the empty string; use + instead to require at least one character.
You can add your desired punctuation to the set.
You can use \w to match any word character; depending on which regex implementation you use, it may match Unicode characters too.
Another approach is to decide what you DON'T want to match. If you want to match a string of characters that are not whitespace you could use
\S*
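Both approaches can be sketched in Python. The punctuation added to the class below is an arbitrary choice made for illustration; extend it to whatever set you need.

```python
import re

# Letters, digits, underscore, plus a hand-picked punctuation set
# (the choice of punctuation is an assumption, not a complete list).
ALLOWED = re.compile(r"[A-Za-z0-9_.,?!:]+")

print(bool(ALLOWED.fullmatch("Hello,world!")))  # True
print(bool(ALLOWED.fullmatch("Hello world")))   # False: space is not in the class
print(bool(ALLOWED.fullmatch("")))              # False: + needs at least one character

# The "what you don't want" approach: runs of non-whitespace characters.
chunks = re.findall(r"\S+", "one, two;  three!")
print(chunks)  # ['one,', 'two;', 'three!']
```

Using + rather than * here enforces the requirement that the string be at least one character long.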
If I understood correctly, this should be easy. Please try:
([^\s]+)
This regex matches one or more consecutive non-whitespace characters.
This is the easiest way to match (and reuse) any string. You may already know what parentheses mean in regular expressions: they form a capturing group, so the matched text can be reused later through a backreference.
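To make the "reuse" part concrete, here is a short Python sketch of a capturing group and its backreference. The sample strings are invented for illustration.

```python
import re

line = "first second third"

# group(1) is the text captured by the parentheses.
first_word = re.search(r"([^\s]+)", line).group(1)
print(first_word)  # first

# \1 in the replacement reuses whatever the group captured.
wrapped = re.sub(r"([^\s]+)", r"<\1>", line)
print(wrapped)  # <first> <second> <third>

# \1 inside the pattern matches the same text again (here, a repeated word).
doubled = re.search(r"([^\s]+)\s+\1", "say hello hello world")
print(doubled.group(0))  # hello hello
```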