I want a regular expression that removes both Arabic and English digits.
My variable is:
$variable="12121212ABDHSتشؤآئ۳۳۴۳۴729384234owiswoisw";
I want to remove all digits, like this:
ABDHSتشؤآئowiswoisw
I found the following expression, but it does not work:
$newvariable = preg_replace('/^[\u0621-\u064A]+$', '', $variable);
Thanks for your help.
You may use
$newvariable = preg_replace('/\d+/u', '', $variable);
The \d matches ASCII digits by default, but when you add the u modifier, it enables the PCRE_UCP option (together with PCRE_UTF8) that enables \d to match all Unicode digits.
See PCRE documentation:
This option changes the way PCRE processes \B, \b, \D, \d, \S, \s, \W,
\w, and some of the POSIX character classes. By default, only ASCII
characters are recognized, but if PCRE_UCP is set, Unicode properties
are used instead to classify characters.
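If you only want to match ASCII digits plus specific ranges of your choice, fix your regex as follows. PCRE does not support \uXXXX, so use \x{XXXX}; also note that U+0621-U+064A are Arabic letters, while the digits are U+0660-U+0669 (Arabic-Indic) and U+06F0-U+06F9 (Extended Arabic-Indic):
preg_replace('/[0-9\x{0660}-\x{0669}\x{06F0}-\x{06F9}]+/u', '', $variable)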
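As a cross-check of the code-point ranges involved, here is a minimal sketch in Python (not PHP, so the engine is not PCRE). It assumes the digits to strip are ASCII 0-9, Arabic-Indic U+0660-U+0669 and Extended Arabic-Indic U+06F0-U+06F9; the names DIGITS and strip_digits are made up for illustration, and the sample is modeled on the question's string, written with escapes.

```python
import re

# Explicit ranges: ASCII digits, Arabic-Indic digits, Extended Arabic-Indic digits.
DIGITS = re.compile("[0-9\u0660-\u0669\u06F0-\u06F9]+")

def strip_digits(text: str) -> str:
    """Remove every run of the digits listed in DIGITS."""
    return DIGITS.sub("", text)

# ASCII digits, Latin letters, five Arabic letters, Extended Arabic-Indic digits, more ASCII.
sample = ("12121212ABDHS\u062A\u0634\u0624\u0622\u0626"
          "\u06F3\u06F3\u06F4\u06F3\u06F4729384234owiswoisw")

print(strip_digits(sample))
# For str patterns, Python's \d is Unicode-aware by default, so this gives the same result here:
print(re.sub(r"\d+", "", sample))
```

The Arabic letters survive because they fall in U+0621-U+064A, outside all three digit ranges.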
I want to remove non-printable characters but keep currency symbols. The following matches non-printable characters; how do I add an exception for currency symbols?
$str = preg_replace('/[[:^print:]]/', '', $str);
The \p{Sc} pattern matches currency symbols; you just need to place it into the negated character class (or bracket expression in POSIX terminology).
Use
$re = '/(*UTF)[^[:print:]\p{Sc}]+/';
echo preg_replace($re, '', '£aA€');
Details:
(*UTF) - a PCRE verb that makes the PCRE engine treat the string as a Unicode string, not a byte string. Note that we cannot use the /u modifier here, since it enables both the (*UTF) and (*UCP) verbs; the latter makes all subpatterns Unicode-aware, so [:print:] would then cover many more characters and [^[:print:]] would match far fewer.
[^[:print:]\p{Sc}]+ - matches any 1 or more symbols (due to the + quantifier) other than:
[:print:] - printable chars
\p{Sc} - currency symbols
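The same filtering logic can be sketched in Python. Python's built-in re module has neither \p{Sc} nor POSIX classes, so this version tests each character directly instead of using a regex. It assumes "printable" means the ASCII range 0x20-0x7E, as [:print:] does without Unicode properties; the function name is made up for illustration.

```python
import unicodedata

def strip_nonprintable_keep_currency(text: str) -> str:
    """Keep printable ASCII (0x20-0x7E) and any character in Unicode category Sc."""
    kept = []
    for ch in text:
        is_ascii_printable = " " <= ch <= "~"
        is_currency = unicodedata.category(ch) == "Sc"
        if is_ascii_printable or is_currency:
            kept.append(ch)
    return "".join(kept)

# Pound sign, two ASCII letters, euro sign, an accented letter, and a tab.
print(strip_nonprintable_keep_currency("\u00a3aA\u20ac\u00e9\t"))
```

The accented letter and the tab are dropped, while the pound and euro signs are kept because their category is Sc.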
The RegEx:
^([0-9\.]+)\Q|\E([^\Q|\E])\Q|\E
does not match the string:
1203730263.912|12.66.18.0|
Why?
From PHP docs,
\Q and \E can be used to ignore regexp metacharacters in the pattern.
For example:
\w+\Q.$.\E$ will match one or more word characters, followed by literals .$. and anchored at the end of the string.
Your regex should be:
^([0-9\.]+)\Q|\E([^\Q|\E]*)\Q|\E
OR
^([0-9\.]+)\Q|\E([^\Q|\E]+)\Q|\E
You forgot to add a quantifier after [^\Q|\E]. Without + (or *), it matches only a single character.
Explanation:
^ Anchors the match at the start of the string.
([0-9\.]+) Captures one or more digits or dots.
\Q|\E In PCRE, \Q starts a literal (quoted) sequence and \E ends it; every character between them is treated literally. So the | in that block tells the regex engine to match a literal |.
([^\Q|\E]+) Captures one or more characters other than |.
\Q|\E Matches a literal pipe symbol.
The accepted answer seems somewhat incorrect so I wanted to address this for future readers.
If you did not already know, using \Q and \E ensures that any character between \Q ... \E will be matched literally, not interpreted as a metacharacter by the regular expression engine.
First and most important, \Q and \E are not needed within a bracketed character class [], because | is already a literal there.
[^\Q|\E] # Needlessly verbose
[^|] # Simpler
Secondly, you did not follow that class with a quantifier. With the quantifier added, the correct syntax would be:
^([0-9.]+)\Q|\E([^|]+)\Q|\E
Although, it is much simpler to write this out as:
^([0-9.]+)\|([^|]+)\|
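To see the effect of the missing quantifier, here is a small sketch in Python. Python's re module does not support \Q...\E, so the pipe is escaped as \| instead; the names broken and fixed are made up for illustration.

```python
import re

line = "1203730263.912|12.66.18.0|"

# Original shape: the second class has no quantifier, so it allows exactly one character.
broken = re.compile(r"^([0-9.]+)\|([^|])\|")
# Corrected shape: '+' lets the second field span one or more non-pipe characters.
fixed = re.compile(r"^([0-9.]+)\|([^|]+)\|")

print(broken.match(line))          # None
print(fixed.match(line).groups())  # ('1203730263.912', '12.66.18.0')

# re.escape is Python's closest counterpart to wrapping a literal in \Q...\E.
print(re.escape("|"))              # \|
```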
${str?replace("\d+", "", "r")};
I wanted to use \d to remove digits, but it didn't work.
However, ${str?replace("[0-9]", "", "r")}; works.
So how do I use regex escapes like \d, \b, \w, etc.?
You need to double the backslashes:
${str?replace("\\d+", "", "r")};
This is because string escaping rules are applied before regex escaping rules. So the string "\\d" is translated to the regex \d which then matches a digit.
If your string is "\d", the string processor translates it to a literal d (because \d is not a recognized string escape sequence, so it's ignored).
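The two layers of escaping can be illustrated in Python, as an analogy only. Python's string rules are not the template language's: in particular, Python keeps an unrecognized escape instead of dropping the backslash. The sketch shows that a doubled backslash in an ordinary string literal yields the three-character pattern \d+, the same as a raw string.

```python
import re

doubled = "\\d+"  # string escaping turns \\ into a single backslash
raw = r"\d+"      # raw string: no string-level escaping is applied

print(doubled == raw)                 # True
print(len(doubled))                   # 3: backslash, 'd', '+'
print(re.sub(doubled, "", "a1b22c"))  # abc
```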
I have the following text:
üyü
The following regex search matches the characters ü:
/\W
Is there a unicode flag in Vim regex?
Unfortunately, there is no such flag (yet).
Some built-in character classes (can) include multi-byte characters,
others don't. The common \w \a \l \u classes only contain ASCII
letters, so even umlaut characters aren't included in them, leading to
unexpected behavior! See also https://unix.stackexchange.com/a/60600/18876.
In the 'isprint' option (and 'iskeyword', which determines what motions like w move over), multi-byte characters 256 and
above are always included, only extended ASCII characters up to 255 are specified with
this option.
I always use:
ASCII    UTF-8
-----    -----
\w       [a-zA-Z\u0100-\uFFFF]
\W       [^a-zA-Z\u0100-\uFFFF]
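One way to check which characters the ranges in that table cover is to transcribe the \w replacement into another engine. The sketch below uses Python's re module rather than Vim, so it only tests code-point ranges; WORD is a name made up for illustration. Note that ü is U+00FC, which lies below U+0100.

```python
import re

# The table's replacement for \w, transcribed as a Python character class.
WORD = re.compile("[a-zA-Z\u0100-\uFFFF]")

# 'a', u-umlaut (U+00FC), o-double-acute (U+0151), euro sign (U+20AC), digit '1'
for ch in ["a", "\u00fc", "\u0151", "\u20ac", "1"]:
    print(f"U+{ord(ch):04X}", bool(WORD.fullmatch(ch)))
```

With these ranges U+00FC is not matched, while U+0151 and U+20AC are, so characters in U+0080-U+00FF would need to be added to the class separately if they should count as word characters.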
You can use \%uXXXX to match a multibyte character. In that case…
/\%u00fc
But I'm not aware of a flag that would make the whole matching multibyte-friendly.
Note that with the default value of iskeyword on UNIX systems, ü is matched by \k.
Very often I find that \S\+ takes me where I want to go, e.g.:
s/\(\S\+\)\s\+\(\S\+\).*/\1 | \2/ applied to the line "wörd1 w€rd2 but not word3" captures the first two words and replaces the line with "wörd1 | w€rd2"
I have the following regular expression for eliminating spaces, tabs, and new lines: [^ \n\t]
However, I want to expand this for certain additional characters, such as > and <.
I tried [^ \n\t<>], which works well for now, but I want the expression to not match if the < or > is preceded by a \.
I tried [^ \n\t[^\\]<[^\\]>], but this did not work.
Can any one of the sequences below occur in your input?
\\>
\\\>
\\\\>
\blank
\tab
\newline
...
If so, how do you propose to treat them?
If not, then zero-width look-behind assertions will do the trick, provided that your regular expression engine supports them. This is the case in any engine that supports Perl-style regular expressions (Perl itself, PHP's PCRE, etc.):
(?<!\\)[ \n\t<>]
The above will match any unescaped space, newline, tab or angle bracket. More generically (using \s to denote any whitespace character, including \r):
(?<!\\)\s
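Python's re module supports the same look-behind syntax, so the first pattern can be tried out directly. This is a minimal sketch; UNESCAPED and the sample text are made up for illustration.

```python
import re

# Space, newline, tab, < or > that is NOT immediately preceded by a backslash.
UNESCAPED = re.compile(r"(?<!\\)[ \n\t<>]")

text = r"a\<b <c> d\>e"

print(UNESCAPED.findall(text))  # [' ', '<', '>', ' ']
print(UNESCAPED.sub("", text))  # a\<bcd\>e
```

The escaped \< and \> are left alone, while the bare space, < and > characters are matched.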
Alternatively, using complementary notation without the need for a zero-width look-behind assertion (but arguably less efficiently); the escaped alternative comes first so that the backslash is not consumed on its own:
(?:\\[<>]|[^ \n\t<>])
You may also use a variation of the latter to handle the \\>, \\\>, \\\\> etc. cases as well, by consuming each backslash together with the character it escapes, so that < or > counts as escaped only after an odd number of backslashes:
(?:\\[\\<>]|[^ \n\t<>])
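The multiple-backslash cases can be sketched in Python under one assumption: \\ is an escaped backslash, so < or > is escaped only when an odd number of backslashes precedes it. The pattern below consumes each backslash together with the character it escapes; TOKEN is a name made up for illustration, and the approach relies on scanning the input left to right (as findall does).

```python
import re

# First alternative: a backslash plus the character it escapes (\\, \< or \>).
# Second alternative: any other character that is not a space, newline, tab, < or >.
TOKEN = re.compile(r"(?:\\[\\<>]|[^ \n\t<>])")

for text in [r"\>", r"\\>", r"\\\>"]:
    print(text, TOKEN.findall(text))

# Joining the tokens drops unescaped whitespace and angle brackets.
print("".join(TOKEN.findall(r"a \<b> c")))  # a\<bc
```

With two backslashes the pair is consumed as one token and the following > stays unmatched; with one or three, the final backslash pairs up with the >.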
According to the grep man page:
A bracket expression is a list of
characters enclosed by [ and ]. It
matches any single character in that
list; if the first character of the
list is the caret ^ then it matches
any character not in the list.
This means that a bracket expression cannot match a sequence of characters such as \< or \>, only single characters.
If you have a version of grep built with Perl regex support (the -P option), you can use lookarounds as one of the other posters mentioned. Not all versions of grep have this support, though.
Maybe you can use egrep and put your pattern string inside quotes. This should obviate the need for escaping.