Regular expressions for making uppercase accented letters

I need to make a replacement like this:
from arouzière to AROUZIÈRE.
I use Notepad++ 6.6.7 for this in the following manner:
search: (\p{L}*?)
replace: \U\1\E
Problem:
The result is AROUZIèRE.
As you can see, the accented letter is not converted to uppercase.
Do you know a workaround, or whether this is even possible via regex in Notepad++?
Thanks a lot for any help.

Try the pattern
(\p{Ll})
which matches any lowercase letter, accented letters included.
Note that some Unicode lowercase letters do not have an uppercase variant.
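
Outside Notepad++, the same conversion can be sketched in Python. This is only an illustration, not a fix for Notepad++ itself. Python's built-in re module does not support \p{Ll}, so the sketch assumes [^\W\d_] as a stand-in for "any letter" and relies on str.upper() for the Unicode-aware case mapping. The name upper_letters is made up for the example.

```python
import re

# [^\W\d_] = word characters minus digits and underscore, i.e. letters.
# Assumption: this stands in for \p{L}, which Python's re does not support.
LETTERS = re.compile(r"[^\W\d_]+")

def upper_letters(text):
    """Uppercase every run of letters, accented ones included."""
    return LETTERS.sub(lambda m: m.group().upper(), text)

print(upper_letters("arouzière"))
```

Digits, spaces, and punctuation are left untouched because they never match the letter class.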

Regex to match mixed lower alphanumeric strings

I'm working with a large document in Sublime Text 3, whose Find and Replace feature takes regex. Each string in the document is separated by a line break. I need a regex that will match strings made up of lowercase alphanumeric characters mixed in any order, such as the following:
aa0555aaaaf
593dm03ks03
19204f02040
After looking into regex, the best I've been able to come up with so far is the below:
^[a-z][0-9]{11,}$\n
...although this only seems to match strings that start with letters and end in numbers, and for some reason doesn't seem to be case-sensitive either:
aa09304030
AA00450354

Try this one.
^[a-z0-9]{11,}$\n
Update:
Remember to enable "case sensitive".
Update:
Thanks to @Wiktor Stribiżew for pointing out the inline modifier that switches off case-insensitive mode:
(?-i)^[a-z0-9]{11,}$\R?
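
The difference between the two patterns can be checked with a small Python sketch. It assumes the document is available as a single string with one entry per line. The names LOWER_ALNUM, matching_lines, and sample are made up for the example. Python's re is case-sensitive by default, so no (?-i) is needed here.

```python
import re

# Character class from the answer: lowercase ASCII letters and digits mixed
# in any order, at least 11 of them. fullmatch() plays the role of ^...$.
LOWER_ALNUM = re.compile(r"[a-z0-9]{11,}")

def matching_lines(document):
    """Return the lines that consist only of 11+ lowercase alphanumerics."""
    return [line for line in document.splitlines() if LOWER_ALNUM.fullmatch(line)]

sample = "aa0555aaaaf\n593dm03ks03\nAA004503540\nabc\n19204f02040"
print(matching_lines(sample))

# The pattern from the question, [a-z][0-9]{11,}, means "one letter followed
# by 11 or more digits", which is why it rejects mixed strings.
print(re.fullmatch(r"[a-z][0-9]{11,}", "aa0555aaaaf"))
```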

How to select all letters in regex in Vala

I am using regexes in my function, and I need to wrap all the hashtags in a string in tags. But I can't figure out how to match all characters that are letters: [a-zA-Z] doesn't do exactly what I need, because people can write in languages other than English, and then this regex won't work as expected.
This is what I'm currently doing, but it doesn't work as it should:
Regex hashtagRegex = new Regex("(#[a-zA-Z0-9_]+)");
How can I do what I need?

Use \p{L} to match any kind of letter from any language.
Regex hashtagRegex = new Regex("#([\\p{L}_]+)");
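
For comparison, here is a rough Python sketch of the same idea (not Vala). Python's built-in re has no \p{L}, so the sketch assumes \w as an approximation: in Python 3 it covers Unicode letters, digits, and the underscore, so unlike [\p{L}_] it also accepts digits. The <a> tag and the name wrap_hashtags are hypothetical.

```python
import re

# Assumption: \w approximates the answer's [\p{L}_] class (plus digits).
HASHTAG = re.compile(r"#(\w+)")

def wrap_hashtags(text):
    """Wrap each hashtag in a hypothetical <a> tag."""
    return HASHTAG.sub(r"<a>#\1</a>", text)

print(wrap_hashtags("Привет #мир and #hello_2"))
```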

Vim regex match multiple underscores

I'm trying to add some syntax colouring in vim for constants written in the standard uppercase form:
HELLO_WORLD
_GOOD_BYE_WORLD
When I go to http://regex101.com/ I am able to match these with the following:
/(_*[A-Z]+_*)+
but in Vim it doesn't match anything.
/_ will match a single underscore but /_* will not match multiple underscores, it matches every character. After reading some of the vim regex documentation (http://vimdoc.sourceforge.net/htmldoc/pattern.html) it seems as though the underscore is used for extending matches across lines. However, all of the patterns listed in the documentation use \_ (an escaped underscore) as opposed to just the character.
How can I match words of this form?
And why does _* match every character?

I think \<[_A-Z]\+\> will do what you want.
The accepted answer also matches underscores and capital letters contained in lowercase words.

Vim uses a slightly different regex syntax: some metacharacters, such as + and (), need escaping. Here is the same regex formatted for Vim:
\(_*[A-Z]\+_*\)\+
For more info you can visit http://vimregex.com/

You can also use Vim's "very magic" mode, \v:
/\v(_*[A-Z]+_*)+
http://vim.wikia.com/wiki/Simplifying_regular_expressions_using_magic_and_no-magic
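
The behaviour of these patterns can be illustrated outside Vim with a short Python sketch. It is only an analogy: Python's \b is assumed here as a stand-in for Vim's \< and \>, and the variable names are made up. The last line also suggests why a bare _* seems to match everywhere: "zero or more underscores" succeeds with an empty match at any position.

```python
import re

text = "HELLO_WORLD _GOOD_BYE_WORLD helloWorld"

# Unanchored pattern from the question: also hits the "W" inside helloWorld.
unanchored = re.compile(r"(?:_*[A-Z]+_*)+")
# Word-bounded pattern, analogous to Vim's \<[_A-Z]\+\>.
bounded = re.compile(r"\b[_A-Z]+\b")

print(unanchored.findall(text))
print(bounded.findall(text))

# _* matches zero underscores too, so it succeeds with an empty match
# even where there is no underscore at all.
print(repr(re.match(r"_*", "abc").group()))
```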

Regex to convert words in TitleCase

I use this regex to convert words in TitleCase and confirm each substitution:
:s/\%V\<\([A-Za-z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]\)\([A-Za-z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]*\)\>/\u\1\L\2/gc
However, this also matches words that are already in TitleCase.
Does anyone know how to change the above regex so that it skips words that are already in TitleCase?

:s/\%V\<\([a-z0-9àäâæèéëêìòöôœùüûç]\)\([A-Za-z0-9àäâæèéëêìòöôœùüûçÀÄÂÆßÈÉËÊÌÖÔŒÙÜÛ]*\)\>/\u\1\L\2/gc
seems to do the trick here.
Because you have explicitly included uppercase characters in the character class of the first capture group, your pattern matches both foo and Foo. Removing the uppercase characters from that class resolves your immediate problem.

To match only non-titlecase words, you want to match those that start either (a) with a lowercase letter or (b) with two uppercase letters. The following will do it (add accented letters and digits to taste):
\b([A-Z])([A-Z][A-Za-z]*)|\b([a-z])([a-zA-Z]+)
But some words match groups \1 and \2, others groups \3 and \4. I don't use Vim, so I can't say whether it will let you substitute with this kind of pattern (e.g., \u\1\3\L\2\4; only two of the four groups will ever be non-empty).
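
In a language with callback replacements, the four-group problem is easy to work around. The following Python sketch (not Vim) uses the ASCII-only pattern from this answer. The function names are made up, and accented letters would have to be added to the classes as noted above.

```python
import re

# Pattern from the answer: two leading capitals, or a leading lowercase letter.
NOT_TITLE = re.compile(r"\b([A-Z])([A-Z][A-Za-z]*)|\b([a-z])([a-zA-Z]+)")

def to_title(match):
    # Only one alternative matched, so the other pair of groups is None.
    first = match.group(1) or match.group(3)
    rest = match.group(2) or match.group(4)
    return first.upper() + rest.lower()

def titlecase(text):
    """Title-case only the words that are not already in title case."""
    return NOT_TITLE.sub(to_title, text)

print(titlecase("foo Bar BAZ qUX"))
```

Words such as Bar never match either alternative, so they are skipped rather than rewritten.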

How to match Cyrillic characters with a regular expression

How do I match French and Russian Cyrillic alphabet characters with a regular expression? I only want to do the alpha characters, no numbers or special characters. Right now I have
[A-Za-z]

If your regex flavor supports Unicode blocks or scripts, you can match Cyrillic characters with:
[\p{IsCyrillic}] or [\p{Cyrillic}]
Otherwise try using:
[\u0400-\u04FF]
For PHP use:
[\x{0400}-\x{04FF}]
Explanation:
[\p{IsCyrillic}]
Matches a character from the Unicode block "Cyrillic" (U+0400–U+04FF).

It depends on your regex flavor. If it supports Unicode character classes (like .NET, for instance), \p{L} matches a letter character (in any character set).

To match only Russian Cyrillic characters use:
[\u0401\u0451\u0410-\u044f]
which is the equivalent of:
[ЁёА-я]
where А is Cyrillic, not Latin (despite looking the same, they have different code points).
\p{IsCyrillic}, \p{Cyrillic}, and [\u0400-\u04FF], which others suggested, match all variants of Cyrillic, not only Russian.
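
A short Python sketch can make the difference between the two ranges concrete. The names RUSSIAN, CYRILLIC_BLOCK, is_russian, and is_cyrillic are made up for the example. Python's re accepts \uXXXX escapes in patterns, so the ranges can be written exactly as above.

```python
import re

RUSSIAN = re.compile(r"[\u0401\u0451\u0410-\u044f]+")  # Ё, ё, and А through я
CYRILLIC_BLOCK = re.compile(r"[\u0400-\u04FF]+")       # whole Cyrillic block

def is_russian(word):
    return RUSSIAN.fullmatch(word) is not None

def is_cyrillic(word):
    return CYRILLIC_BLOCK.fullmatch(word) is not None

# "їжак" is Ukrainian: ї (U+0457) is in the block but not in the Russian range.
for word in ["Привет", "ёж", "їжак", "Abc"]:
    print(word, is_russian(word), is_cyrillic(word))
```

Note that ё (U+0451) and Ё (U+0401) lie outside the range from А to я, which is why they are listed separately.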

If you use a modern PHP version, just use:
preg_match('/^[\p{L}]+$/u', $string);
Don't forget the u flag for Unicode support!

A regex that matches Cyrillic letters together with Latin (English) letters:
^[A-Za-z.!#?#"$%&:;() *\+,\/;\-=[\\\]\^_{|}<>\u0400-\u04FF]*$
It matches special characters, Cyrillic letters, and English letters.

Various regex dialects use [:alpha:] for any alphabetic character in the current locale. (You may need to put that in a character class, e.g. [[:alpha:]].)

This worked for me:
[a-z\u0400-\u04FF]

If you use Elixir:
String.match?(string, ~r/^\p{Cyrillic}*$/u)
You need to add the u flag for unicode support.

You can use the first and the last letter. For example in Bulgarian:
[А-я]+

For modern PHP:
$string = 'тест тест Тест Обязателльно Stackoverflow >!<';
var_dump(preg_replace('/[\x{0410}-\x{042F}]+.*[\x{0410}-\x{042F}]+/iu', '', $string));

In Java, to match Cyrillic letters and whitespace, use the following pattern:
^[\p{InCyrillic}\s]+$
