Escaped Periods In R Regular Expressions - regex

Unless I am missing something, this regex seems pretty straightforward:
grepl("Processor\.[0-9]+\..*Processor\.Time", names(web02))
However, R doesn't like the escaped periods (\.), which I intend to be literal periods:
Error: '\.' is an unrecognized escape in character string starting "Processor\."
What am I misunderstanding about this regex syntax?

My R-Fu is weak to the point of being non-existent but I think I know what's up.
The string handling part of the R processor has to peek inside the strings to convert \n and related escape sequences into their character equivalents. R doesn't know what \. means so it complains. You want to get the escaped dot down into the regex engine so you need to get a single \ past the string mangler. The usual way of doing that sort of thing is to escape the escape:
grepl("Processor\\.[0-9]+\\..*Processor\\.Time", names(web02))
Embedding one language (regular expressions) inside another language (R) is usually a bit messy and more so when both languages use the same escaping syntax.
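The two layers of escaping can be seen in a small sketch. It uses Python rather than R (an assumption made only so the example is easy to run), since a normal Python string literal processes backslashes in a similar way. The sample names are hypothetical stand-ins for names(web02).

```python
import re

# Hypothetical column names standing in for names(web02).
names = [
    "Processor.0.Processor.Time",
    "Processor.12.Idle.Time",
    "ProcessorX0X.ProcessorXTime",
]

# Layer 1: the string literal. Between the quotes there are three source
# characters, but the resulting string holds only two: a backslash and a dot.
escaped_dot = "\\."

# Layer 2: the regex engine receives backslash + dot and treats it as a
# literal period.
pattern = "Processor\\.[0-9]+\\..*Processor\\.Time"
matches = [bool(re.search(pattern, name)) for name in names]

# Without the escapes every dot is a wildcard, so the third name matches too.
loose = "Processor.[0-9]+..*Processor.Time"
loose_matches = [bool(re.search(loose, name)) for name in names]

print(matches)
print(loose_matches)
```

Only the first name matches the escaped pattern, while the unescaped pattern also accepts the third name, where X stands in for the periods.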

Instead of
\.
Try
\\.
You need to escape the backslash first.

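An R-centric way of doing this is to put the period inside a bracket expression, where it loses its special meaning, for example:
grepl("[.]", ".")
# [1] TRUE
grepl("[.]", "a")
# [1] FALSE
(Writing "[:.:]" gives the same results here, but only because it is a bracket expression containing : and ., so it would match a colon as well.)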
From the docs (?regex):
The metacharacters in extended regular expressions are . \ | ( ) [ { ^ $ * + ?, but note that whether these have a special meaning depends on the context.
[:punct:]
Punctuation characters:
! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.

Related

MariaDB regex with Concat only catches some results

I have a table with a VARCHAR column and I need to check it for a value.
Typical data values are:
Brize Norton (501, 622, 2624, 4624, 4626)
Wyton (7006, 7010, 7630)
Waddington (2503, 7006)
Honington (2623)
Marham (2620, 7010)
Leeming (607 & 609)
The only part I need to check is that it contains the full number only. I can not check just the number because LIKE '%607%' will also incorrectly match 6070 or 2607, so I check the number and a variation of wrappers as so:
I have this query:
SELECT id, name FROM aux WHERE aux.name REGEXP CONCAT('[(,\h]',:num, '[),\h]')
This is intended to match a (, a comma, or <whitespace>, followed by a variable number value, followed by a ), a comma, or <whitespace> in a VARCHAR column.
This works on some numbers but not on others;
An example :
:num = 2620
SELECT id, name FROM aux WHERE aux.name REGEXP CONCAT('[(,\h]',:num, '[),\h]')
Result:
"Marham (2620, 7010)"
but fails on other numbers:
:num = 7010
SELECT id, name FROM aux WHERE aux.name REGEXP CONCAT('[(,\h]',:num, '[),\h]')
Result:
(Nothing)
How can I tell the REGEXP to catch data shaped as above: a (, a comma, or <whitespace>, then a variable number value, then a ), a comma, or <whitespace>?
I have tried EXPLAIN on my query, but that doesn't help me see the REGEXP mechanism.
I have replaced \h with \s but this doesn't make a difference.
Essentially, it's a matter of versions, as until version 10.0.5 MariaDB used the POSIX 1003.2 compliant regular expression library. This library didn't support \h, \d etc. character classes, using their POSIX variants - [:alpha:], [:digit:] and so on.
In your case, however, it seems you might just replace \s or \h with a single whitespace in that character class:
REGEXP CONCAT('[(, ]', :num, '[), ]')
Your MariaDB does not support PCRE regex syntax, so only POSIX-compliant regex can be used. Neither \h nor \s is POSIX compliant; in the POSIX world, \h is "equivalent" to [:blank:] and \s to [:space:].
More POSIX/PCRE character class equivalent patterns:
POSIX Character Class | PCRE                       | Description
[:alnum:]             | [:alnum:] / [\p{L}\p{N}]   | Alphanumeric
[:alpha:]             | \p{L}                      | Alphabetic
[:blank:]             | \h                         | Whitespace
[:cntrl:]             | [:cntrl:] / \p{Cc} / \p{C} | Control characters
[:digit:]             | \d                         | Digits
[:graph:]             | [:graph:]                  | Graphic characters
[:lower:]             | \p{Ll}                     | Lowercase alphabetic
[:print:]             | [:print:]                  | Graphic or space characters
[:punct:]             | [\p{P}\p{S}]               | Punctuation
[:space:]             | \s                         | Space, tab, newline, and carriage return
[:upper:]             | \p{Lu}                     | Uppercase alphabetic
[:xdigit:]            | [:xdigit:] / [A-Fa-f0-9]   | Hexadecimal digit
You can use
REGEXP CONCAT('[(,[:blank:]]', :num, '[),[:blank:]]')
REGEXP CONCAT('[(,[:space:]]', :num, '[),[:space:]]')
If you simply want to enforce numeric boundaries use
CONCAT('([^0-9]|^)',:num, '([^0-9]|$)')
CONCAT('([^[:digit:]]|^)',:num, '([^[:digit:]]|$)')
The regex details:
[(,[:space:]] - a (, , or any whitespace char
[),[:space:]] - a ), , or any whitespace char
[(,[:blank:]] - a (, , or a horizontal whitespace char
[),[:blank:]] - a ), , or a horizontal whitespace char
([^0-9]|^) / ([^[:digit:]]|^) - any non-digit char or start of string
([^0-9]|$) / ([^[:digit:]]|$) - any non-digit char or end of string.
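As a rough check of the numeric-boundary idea, here is a minimal sketch in Python (not MariaDB, so engine details may differ). It builds the same ([^0-9]|^) ... ([^0-9]|$) shape around the number and runs it over the sample rows from the question. The function name rows_containing is hypothetical.

```python
import re

# Sample rows copied from the question.
rows = [
    "Brize Norton (501, 622, 2624, 4624, 4626)",
    "Wyton (7006, 7010, 7630)",
    "Waddington (2503, 7006)",
    "Honington (2623)",
    "Marham (2620, 7010)",
    "Leeming (607 & 609)",
]


def rows_containing(num):
    """Return rows where num appears as a whole number, not inside a longer one."""
    # Same shape as CONCAT('([^0-9]|^)', :num, '([^0-9]|$)').
    pattern = re.compile("([^0-9]|^)" + re.escape(str(num)) + "([^0-9]|$)")
    return [row for row in rows if pattern.search(row)]


print(rows_containing(7010))
print(rows_containing(62))
```

With this shape, 7010 is found in both the Wyton and Marham rows, while 62 finds nothing even though it is a substring of 622 and 2624.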

using regular expression in list.files of R function

I want to use list.files of R to list files containing this pattern "un[a digit]" such as filename_un1.txt, filename_un2.txt etc... Here is the general code:
list_files <- list.files(path="my_file_path", recursive = TRUE, pattern = "here I need help", full.names = TRUE)
I have tried putting un\d in the pattern input but does not work.
Bear in mind that R string literals process escape sequences, so a backslash meant for the regex engine must itself be escaped. The regex engine needs a literal \ to pass shorthand character classes (like \d for digits) or to escape special chars (like \. for a literal dot), and in an R string literal that backslash is written as \\.
So, you need
pattern = "_un\\d+\\.txt$"
where
_un - matches a literal substring _un
\\d+ - matches 1 or more digits (as + is a one or more quantifier)
\\. - matches a literal dot
txt - matches a literal sequence of characters txt
$ - end of string.
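The breakdown above can be tried out on a few file names. This is a sketch in Python rather than R (an assumption for runnability). The raw string r"..." plays the role that doubling the backslashes plays in R, and the file names are made up for illustration.

```python
import re

# Same pattern as "_un\\d+\\.txt$" in R; the raw string keeps the backslashes.
pattern = re.compile(r"_un\d+\.txt$")

# Hypothetical file names for illustration.
files = [
    "filename_un1.txt",
    "filename_un2.txt",
    "filename_un.txt",       # no digit after "un"
    "filename_un3.txt.bak",  # does not end in ".txt"
    "run10.txt",             # has "un10" but no "_un"
]

matched = [name for name in files if pattern.search(name)]
print(matched)
```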
list_files <- list.files(path="my_file_path", recursive = TRUE, pattern = "un[0-9]", full.names = TRUE)

trying rereplace and replace to modify some data

I am suffering from regex illness. I am taking medicine but nothing is happening, and now I am stuck again with this issue:
<cfset Change = replacenocase(mytext,'switch(cSelected) {',' var x = 0;while(x < cSelected.length){switch(cSelected[x]) {','one')>
This did not change anything.
I tried REReplace too:
<cfset Change = rereplacenocase(mytext,'[switch(cSelected) {]+',' var x = 0;while(x < cSelected.length){switch(cSelected[x]) {','one')>
This created weird results.
Parentheses, square brackets, and curly brackets are special characters in any implementation of RegEx. Wrapping something in [square brackets] means any of the characters within so [fifty] would match any of f,i,t,y. The plus sign after it just means to match any of these characters as many times as possible. So yes [switch(cSelected) {]+ would replace switch(cSelected) {, but it would also replace any occurrence of switch, or s, or w, or the words this or twitch() because each character in these is represented in your character class.
As a regex, you would instead want (switch\(cSelected\) \{). The + isn't useful here, and we have to escape the parentheses that we want literally represented. It is also a good idea to escape curly braces because they have special meaning in parts of regex, and I believe that when you're new to regex, there's no such thing as over-escaping.
(switch        # Opens Capture Group
               # Literal switch
\(cSelected    # Literal (
               # Literal cSelected
\)             # Literal )
               # single space
\{             # Literal {
)              # Closes Capture Group
You can also try something like (switch\(cSelected\)\s*\{), using the token \s* to represent any number of whitespace characters.
(switch        # Opens CG1
               # Literal switch
\(cSelected    # Literal (
               # Literal cSelected
\)             # Literal )
\s*            # Token: \s for white space
               # * repeats zero or more times
\{             # Literal {
)              # Closes CG1
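The difference between the character class and the escaped literal can be seen in a short sketch. It is written in Python rather than ColdFusion (an assumption for runnability, so the regex flavour may differ in details), and the sample text is made up.

```python
import re

# Hypothetical input; only the switch statement should be replaced.
text = "this twitch(); switch(cSelected) { case 1: }"

# Character class: one or more of ANY of the listed characters.
char_class = re.compile(r"[switch(cSelected) {]+", re.IGNORECASE)

# Escaped literal with flexible whitespace before the brace.
literal = re.compile(r"switch\(cSelected\)\s*\{", re.IGNORECASE)

class_hits = char_class.findall(text)
literal_hits = literal.findall(text)

replacement = " var x = 0;while(x < cSelected.length){switch(cSelected[x]) {"
changed = literal.sub(replacement, text, count=1)  # first occurrence only

print(class_hits[0])
print(literal_hits)
print(changed)
```

The character class already swallows "this twitch()" as its first hit, while the escaped pattern finds only the intended switch statement.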
What's needed, and the reason people can't be of much assistance is an excerpt from what you're trying to modify and more lines of code.
Potential reasons that the non-regex ReplaceNoCase() isn't working are that it can't make the match it needs, which could be a whitespace issue, or that you have two statements setting Change based on the mytext variable.

NetBIOS Name Regular Expression

I have a question
according to this link http://support.microsoft.com/kb/188997
(
A computer name can be up to 15 alphanumeric characters with no blank spaces. The name must be unique on the network and can contain the following special characters:
! @ # $ % ^ & ( ) - _ ' { } . ~
The Following characters are not allowed:
\ * + = | : ; " ? < > ,
)
and I am developing in C++
So I used the following code, but when I input a character which isn't allowed, it still matches. Why?
regex rgx("[a-zA-Z0-9]*(!|@|#|$|%|^|&|\(|\)|-|_|'|.|~|\\{|\\})*[a-zA-Z0-9]*");
string name;
cin>>name;
if (regex_match(name, rgx))
{
cout << " Matched :) " << endl;
}
else
cout << "Not Matched :(" << endl;
your help will be greatly appreciated :)
Your regular expression will match any string, because all your quantifiers are "zero or more characters" (*) and, since you're not anchoring to the start and end of the string, you'll match even empty strings. The unescaped . in the group also matches any character, which is why disallowed characters get through. Also, you're using an unescaped ^ inside the group (...|^|...), which will never match unless that position is the beginning of the string (which may happen due to the * quantifier, as explained above).
It's a lot easier to achieve what you're trying to do, though:
regex rgx("^[\\w!@#$%^()\\-'{}\\.~]{1,15}$");
If you're using C++11, you might as well use a raw string for better readability:
regex rgx(R"(^[\w!@#$%^()\-'{}\.~]{1,15}$)");
This should match all valid names containing at least one and up to 15 of the selected characters.
\w matches any "word" character, that is A-Z, a-z, digits, and underscores (and based on your locale and regex engine possibly also umlauts and accented characters). Due to this it might be better to actually replace it with A-Za-z\d_ in the above expression:
regex rgx("^[A-Za-z\\d_!@#$%^()\\-'{}\\.~]{1,15}$");
Or:
regex rgx(R"(^[A-Za-z\d_!@#$%^()\-'{}\.~]{1,15}$)");
{a,b} is a quantifier matching the previous expression between a and b times (inclusive).
^ and $ will force the regular expression to fill the whole string (since they'll match beginning and end).
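For a quick way to experiment with the character class and the {1,15} quantifier, here is a minimal sketch in Python rather than C++ (an assumption for brevity). re.fullmatch plays the role of ^ and $, and the class also includes &, which is in the allowed list quoted in the question. The name is_valid_name is hypothetical.

```python
import re

# Letters, digits and the special characters listed as allowed in the question.
# Inside a character class only "-" needs care, so it is escaped here.
NAME_RE = re.compile(r"[A-Za-z0-9!@#$%^&()\-_'{}.~]{1,15}")


def is_valid_name(name):
    """True when the whole string is 1 to 15 allowed characters."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ["MY-PC_01", "a-b-c", "bad*name", "a,b", "", "A" * 16]:
    print(repr(candidate), is_valid_name(candidate))
```

Because the whole name is matched against a single class, names such as a-b-c with several groups of special characters are accepted, while any disallowed character or a length outside 1 to 15 is rejected.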
Look here: http://www.cplusplus.com/reference/regex/ECMAScript/ . There you have something about special characters (with a special meaning for a regex).
For example, ^ has a special meaning in a regex, so you must escape it: \^. Other special characters are: $ \ . * + ? ( ) [ ] { } |.
Also, I think your regex will not allow names like a-b-c (multiple groups of special characters, or more than two groups of alphanumeric characters).

Matching two single quotes or double quote

I have the following strings. It is LatLongs in degrees, minutes and seconds format,
and can be entered as follows:
Option1: 25º 23" 40.6' or
Option2: 25º 23'' 40.6' or
Option3: 25 23 40.6
With one regex I would like to match all of these strings. The problem for me is matching both the " (double quote) and the '' (two single quotes).
I have the following so far.
^[+|-]?[0-9]{1,2}[\º| ][ ]?[0-9]{1,2}[\"|'{2}| ]
I am building and testing the regex in the terminal on Linux (Ubuntu). From the output I get in the terminal, it matches the " (double quote) but only ONE of the '' (two single quotes).
How can i change the regx to match the "(double quote) and ' '(two single quotes), in one expression?
Thanks in advance.
Check out this pattern:
([+-]?\d{1,2}(?:\.\d{1,2})?.)\s*(\d{1,2}(?:\.\d{1,2})?[\S]*)\s*(\d{1,2}(?:\.\d{1,2})?'?)
It does not depend on any particular separator character, supports numbers of up to 2 digits (with an optional fractional part), and resolves your issue.
Your regex has problems. For example, [\"|'{2}| ] matches a single ", |, ', {, 2, } or a space. Try the following:
^([+-]?\d+)º? ?\b(\d+)\b(?:''|\")? ?([\d.]+)'?$
Explanation:
^ # Start of string
([+-]?\d+) # Match an integer
º?[ ]? # Match a degree and/or a space (both optional)
\b(\d+)\b # Match a positive integer (entire number)
(?:''|\")?[ ]? # Match quotes and/or space (all optional)
([\d.]+) # Match a floating point number
'? # Match an optional single quote
$ # End of string
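The pattern explained above can be checked against the three input options from the question. This is a sketch in Python (an assumption, since the question does not say which tool is used in the terminal), and the helper name parse_dms is hypothetical.

```python
import re

# Same pattern as above; a triple-quoted raw string lets it contain both
# kinds of quote without extra escaping.
DMS_RE = re.compile(r"""^([+-]?\d+)º? ?\b(\d+)\b(?:''|")? ?([\d.]+)'?$""")


def parse_dms(text):
    """Return (degrees, minutes, seconds) as strings, or None if no match."""
    match = DMS_RE.match(text)
    return match.groups() if match else None


samples = [
    "25º 23\" 40.6'",   # Option1: double quote after the minutes
    "25º 23'' 40.6'",   # Option2: two single quotes after the minutes
    "25 23 40.6",       # Option3: no symbols at all
    "25º 23' 40.6'",    # only ONE single quote after the minutes
]
for sample in samples:
    print(sample, parse_dms(sample))
```

The first three samples all yield the same three captured numbers, while the last one is rejected because a lone single quote after the minutes is not one of the alternatives in (?:''|\").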
I think what you really want to have with the Regex above is
^[+|-]?[0-9]{1,2}º? ?[0-9]{1,2}(\"|'{2})? ?[0-9]{1,2}\.[0-9]'?
Although this also matches weird things like
25 23'' 40.6
Your Regex uses custom character classes (the sections in [ and ]) which only can match one single character. You can group together multiple characters by ( and ) and make these groups optional with a ?.