I am using regexes in my function, and I need to wrap every hashtag in a string in tags. I can't figure out how to match all characters that are letters: [a-zA-Z] doesn't do what I need, because people may write in languages other than English, and then this regex won't work as expected.
This is what I'm doing now, but it doesn't work as it should:
Regex hashtagRegex = new Regex("(#[a-zA-Z0-9_]+)");
How can I do what I need?
Use \p{L} to match any kind of letter from any language. Add \p{N} if hashtags may also contain digits, as your original pattern allowed.
Regex hashtagRegex = new Regex("#([\\p{L}\\p{N}_]+)");
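For comparison, here is a minimal sketch of the same idea in JavaScript rather than C#. The helper name wrapHashtags, the span tag, and the hashtag class are all assumptions made for illustration; the question does not say which tag is wanted. The sketch also allows digits through \p{N}.

```javascript
// Hypothetical helper: the <span> tag and the "hashtag" class are assumptions.
function wrapHashtags(text) {
  // \p{L} matches any letter and \p{N} any number, in any script.
  // The u flag enables \p{...}; the g flag replaces every hashtag, not only the first.
  return text.replace(/#([\p{L}\p{N}_]+)/gu, '<span class="hashtag">#$1</span>');
}

console.log(wrapHashtags("Привет #мир and #hello_1!"));
```

Punctuation such as the trailing "!" stays outside the tag because it is neither a letter, a number, nor an underscore.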
Here's what I want:
A regular expression that can match the word "hello" in any font and case. So it would match Hello, heLLo, HelLo, 🄷🄴🄻🄻🄾, 𝐇𝐞𝐥𝐥𝐨, etc.
This is for a Discord bot that is written in JavaScript, but I enter the regex via commands on Discord.
I feel like I'm close with Unicode, but can't quite find an answer. I'm very new to this, but here are the resources I've already explored:
Regular expression to match non-ASCII characters
Regex any ASCII character
https://www.regular-expressions.info/unicode.html
I've used [\u00BF-\u1FFF\u2C00-\uD7FF\w]{h} but it doesn't match the strange fonts.
This matches all of your samples.
/([^\x00-\x7F]?\w?)/ug
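A different approach, not taken from the answer above, is to normalize the text before matching instead of widening the regex. The sketch below assumes the bot's code can preprocess the message, which may not hold if the regex can only be entered through a command. The helper name containsHello is hypothetical. NFKC normalization folds many styled letters (mathematical bold, for example) back to plain ones, but not every decorative alphabet has such a mapping, so treat this as a partial fix.

```javascript
// Hypothetical helper; assumes the message can be preprocessed before matching.
function containsHello(message) {
  // NFKC replaces compatibility characters, such as mathematical bold letters,
  // with their plain equivalents where Unicode defines a mapping.
  const normalized = message.normalize("NFKC");
  // The i flag handles mixed case such as "heLLo".
  return /hello/i.test(normalized);
}

// "Hello" written with mathematical bold code points.
const boldHello = "\u{1D407}\u{1D41E}\u{1D425}\u{1D425}\u{1D428}";

console.log(containsHello("heLLo"));
console.log(containsHello(boldHello));
console.log(containsHello("goodbye"));
```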
I'm working with a large document in Sublime Text 3, whose Find and Replace feature takes regex. Each string in the document is separated by a line break. I need a regex that will match strings made up of lowercase alphanumeric characters mixed in any order, such as the following:
aa0555aaaaf
593dm03ks03
19204f02040
After looking into regex, the best I've been able to come up with so far is the below:
^[a-z][0-9]{11,}$\n
...although this only seems to match strings that start with letters and end in numbers, and for some reason doesn't seem to be case-sensitive either:
aa09304030
AA00450354
Try this one.
^[a-z0-9]{11,}$\n
Updated:
Remember to enable "case sensitive"
Updated:
Thanks to @Wiktor Stribiżew for the inline modifier that turns off case-insensitive mode:
(?-i)^[a-z0-9]{11,}$\R?
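To check the character class outside Sublime Text, here is a small JavaScript sketch. It tests each line separately, so the trailing line-break part of the pattern is not needed, and it simply leaves the i flag off instead of using an inline modifier. The last two sample lines are made up to show rejections.

```javascript
// The first three lines come from the question; the last two are invented rejects.
const lines = ["aa0555aaaaf", "593dm03ks03", "19204f02040", "AA004503540", "abc123"];

// No i flag, so a-z stays case-sensitive; {11,} requires at least 11 characters.
const lowerAlnum = /^[a-z0-9]{11,}$/;

const matches = lines.filter((line) => lowerAlnum.test(line));
console.log(matches);
```

The uppercase line and the six-character line are filtered out; the three samples from the question remain.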
Let's say I have the text a123456. I want a string of b123456 to match. So essentially, 'match if all characters are the same except for the first character'. Am I asking for the impossible with regex?
Use the dot (.) to match any character. So, a possible Regex would be:
/^.123456$/
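If the reference text is not known in advance, the same pattern can be built from it. This is a minimal JavaScript sketch; the helper name sameExceptFirst is hypothetical, and it assumes the first character is a single UTF-16 code unit.

```javascript
// Hypothetical helper: builds a regex that ignores only the first character of `template`.
function sameExceptFirst(template) {
  // Escape regex metacharacters so the rest of the template is matched literally.
  const rest = template.slice(1).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // A leading dot accepts any single character (other than a line break).
  return new RegExp("^." + rest + "$");
}

const pattern = sameExceptFirst("a123456");
console.log(pattern.test("b123456")); // true
console.log(pattern.test("b123457")); // false
```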
If you want to use a zero-length assertion, you can take a lookbehind approach like this:
(?<=\w)your_value$ // your_value is the text you want to check
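A quick JavaScript sketch of the lookbehind version, with 123456 standing in for your_value. Note one difference from the dot pattern: the lookbehind only requires some word character before the value, so longer strings that end in the value also match unless you anchor the start yourself.

```javascript
// 123456 stands in for your_value; (?<=\w) requires a word character just before it.
const tail = /(?<=\w)123456$/;

console.log(tail.test("b123456"));  // true: one word character precedes the value
console.log(tail.test("123456"));   // false: nothing precedes the value
console.log(tail.test("ab123456")); // true: the start of the string is not anchored
```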
I think you can figure it out on your own. This ain't tough, it just needs some understanding between you and regex. Why don't you go through the following links and try to write a regex on your own?
https://www.talentcookie.com/2015/07/regular-expressions/
https://www.talentcookie.com/2015/07/lets-practice-regular-expression/
https://www.talentcookie.com/2016/01/some-useful-regular-expression-terminologies/
I need to make a replacement like this
from arouzière to AROUZIÈRE.
I use notepad++ 6.6.7 for this in the following manner:
search: (\p{L}*?)
replace: \U\1\E
Problem:
The result is AROUZIèRE.
As you can see the accented letter is not made UPPERCASE.
Do you know a workaround or even if this is possible via RegEx with notepad++?
Thanks a lot for any help.
Try the pattern
(\p{Ll})
This matches lowercase letters only, accented ones included.
Note that some Unicode characters do not have an uppercase variant.
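To see what \p{Ll} selects, here is a minimal JavaScript sketch. It does not reproduce Notepad++'s \U behaviour; it only shows the character class picking up the accented letter. The helper name upperCaseLetters is hypothetical.

```javascript
// Hypothetical helper: uppercases every lowercase letter that \p{Ll} matches.
function upperCaseLetters(text) {
  // The u flag enables \p{Ll}; the g flag applies the callback to every match.
  return text.replace(/\p{Ll}/gu, (ch) => ch.toUpperCase());
}

console.log(upperCaseLetters("arouzière")); // AROUZIÈRE
```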
I had this problem today:
This regex matches only English: [a-zA-Z0-9].
If I need support for any language in this world, what regex should I write?
If you use character class shorthands and a Unicode aware regex engine you can do that. The \w class matches "word characters" (letters, digits, and underscores).
Beware of some regex flavors that don't do this so well: JavaScript uses ASCII for \d (digits) and \w, but Unicode for \s (whitespace). XML does it the other way around.
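The JavaScript caveat is easy to check. In this small sketch, \w rejects an accented word, while an explicit Unicode class with the u flag accepts it. The variable names are just for illustration.

```javascript
// \w in JavaScript covers only ASCII letters, digits, and underscore.
const asciiWord = /^\w+$/;
// An explicit Unicode-aware replacement: any letter, any number, or underscore.
const unicodeWord = /^[\p{L}\p{N}_]+$/u;

console.log(asciiWord.test("hello_1"));  // true
console.log(asciiWord.test("héllo"));    // false
console.log(unicodeWord.test("héllo"));  // true
console.log(unicodeWord.test("日本語")); // true
```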
Alphabet/Letter: \p{L}
Number: \p{N}
So to match alphanumeric characters in all languages, you can use: [\p{L}\p{N}]+
I was looking for a way to replace all non-alphanum chars for all languages with a space in JS and ended up using the following way to do it:
const regexForNonAlphaNum = /[^\p{L}\p{N}]+/gu;
// replace() returns a new string, so keep the result.
const cleanedText = someText.replace(regexForNonAlphaNum, " ");
Since this is JS, we need the u flag at the end to make the regex Unicode-aware; g stands for global, because I wanted to match all instances and not just the first one.
References:
https://www.linkedin.com/pulse/regex-one-pattern-rule-them-all-find-bring-darkness-bind-carranza/?trackingId=U6tRte%2BzTAG6O4AA3CrFmA%3D%3D
https://www.regular-expressions.info/unicode.html
A regex that covers many Latin-script languages (it does not match other scripts such as Cyrillic or CJK):
^[A-Za-zÀ-Ÿ\d-]*$
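Two things are worth checking with hand-written ranges like this one, sketched here in JavaScript. First, a range written as A-z is wider than A-Za-z, because the ASCII characters between Z and a fall inside it. Second, a Latin range says nothing about other scripts. The variable names are just for illustration.

```javascript
// The ASCII range from A to z also contains [ \ ] ^ _ and the backtick.
const looseRange = /^[A-z]+$/;
// Explicit A-Za-z plus accented Latin letters, digits, and hyphen.
const latinOnly = /^[A-Za-zÀ-Ÿ\d-]*$/;

console.log(looseRange.test("a_b^c"));      // true, although _ and ^ are not letters
console.log(latinOnly.test("arouzière-1")); // true
console.log(latinOnly.test("Привет"));      // false, Cyrillic is outside the range
```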
The regex below is the only one that worked for me:
"\\p{LD}+" ==> LD means any letter or digit.
If you want to strip all non-alphanumeric characters from your text, you can use the following:
text.replaceAll("\\P{LD}+", ""); // Note the capital P, which negates the class.