Which characters must be escaped in a Perl regex pattern

I'm trying to find files that look like this:
access_log-20160101
access_log-20160304
...
With a Perl regex I came up with something like this:
/^access_log-\d{8}$/
But I'm not sure about the "_" and the "-". Are these metacharacters?
What is the expression for this?
I read that "_" in a regex is matched by \w, but how do I use that in my expression?
/^access\wlog-\d{8}$/ ?

Underscore (_) is not a metacharacter and does not need to be quoted (though it won't change anything if you quote it).
Hyphen (-) IS a metacharacter that defines the range between two symbols inside a bracketed character class. However, in this particular position, it will be interpreted verbatim and doesn't need quoting since it is not inside [] with a symbol on both sides.
You can use your regexp as is; hyphens (-) might need quoting if your format changes in future.

Your regex pattern is exactly right
Neither underscore _ nor hyphen - need to be escaped. Outside a square-bracketed character class, the twelve Perl regex metacharacters are
Brackets ( ) [ {
Quantifiers * + ?
Anchors ^ $
Alternator |
Wild character .
The escape itself \
and only these must be escaped
If the pattern of your file names doesn't vary from what you have shown then the pattern that you are using
^access_log-\d{8}$
is correct, unless you need to validate the date string
Within a character class like [A-F] you must escape the hyphen if you want it to be interpreted literally. As it stands, that class is the equivalent to [ABCDEF]. If you mean just the three characters A, - or F then [A\-F] will do what you want, but it is usual to put the hyphen at the start or end of the class list to make it unambiguous. [-AF] and [AF-] are the same as [A\-F] and rather more readable
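
As a quick check of the points above, here is a minimal sketch in Python rather than Perl (an assumption made for illustration; Python's re module treats _, - and bracketed ranges the same way in these cases). The first two file names come from the question, the other two are made up to show non-matches.

```python
import re

# Same pattern as in the question; neither _ nor - is escaped.
LOG_NAME = re.compile(r"^access_log-\d{8}$")

names = [
    "access_log-20160101",
    "access_log-20160304",
    "access-log-20160101",  # hypothetical: hyphen instead of underscore
    "access_log-2016",      # hypothetical: too few digits
]
matches = [n for n in names if LOG_NAME.match(n)]

# Inside a character class the position of the hyphen matters.
range_class = re.compile(r"[A-F]")     # range: A, B, C, D, E, F
escaped_hyphen = re.compile(r"[A\-F]")  # exactly A, hyphen, F
leading_hyphen = re.compile(r"[-AF]")   # same three characters

print(matches)
```

Only the first two names should be printed, and "C" is accepted by the range class but not by the two literal-hyphen classes.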

Related

Regular expression in Snowflake - starts with string and ends with digits

I am struggling with writing regex expression in Snowflake.
SELECT
'DEM7BZB01-123' AS SKU,
RLIKE('DEM7BZB01-123','^DEM.*\d\d$') AS regex
I would like to find all strings that start with "DEM" and end with two digits. Unfortunately the expression that I am using returns FALSE.
I was checking this expression in two regex generators and it worked.

In Snowflake, the backslash character \ is an escape character in single-quoted string constants.
Reference: Escape Characters and Caveats
So you need to use 2 backslashes in a regex to express 1.
SELECT
'DEM7BZB01-123' AS SKU,
RLIKE('DEM7BZB01-123', '^DEM.*\\d\\d$') AS regex
Or you could write the regex pattern in such a way that the backslash isn't used.
For example, the pattern ^DEM.*[0-9]{2}$ matches the same as the pattern ^DEM.*\d\d$.
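
The two layers of escaping can be illustrated outside Snowflake. The sketch below uses Python purely as an analogy (Snowflake's string parser is not Python's): a non-raw string literal is unescaped once before the regex engine sees it, so each backslash has to be doubled, just as in the single-quoted SQL string above.

```python
import re

# What the regex engine must receive: a backslash followed by d, twice.
wanted = r"^DEM.*\d\d$"

# In a non-raw literal each backslash is doubled, like in the SQL string.
doubled = "^DEM.*\\d\\d$"

# Backslash-free alternative from the answer; no escaping layer involved.
no_backslash = "^DEM.*[0-9]{2}$"

sku = "DEM7BZB01-123"
results = [bool(re.search(p, sku)) for p in (wanted, doubled, no_backslash)]

print(doubled == wanted, results)
```

Both spellings produce the same pattern text, and all three patterns match the SKU.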

You need to escape your backslashes in your SQL before it can be parsed as a regex string. (sometimes it gets a bit silly with the number of backslashes needed)
Your example should look like this
RLIKE('DEM7BZB01-123','^DEM.*\\d\\d$') AS regex

RLIKE (which is an alias in Snowflake for the SQL Standard REGEXP_LIKE function) implicitly adds ^ and $ to your search pattern...
The function implicitly anchors a pattern at both ends (i.e. '' automatically becomes '^$', and 'ABC' automatically becomes '^ABC$').
so you can remove them, and that then allows you to use $$ quoting
In single-quoted string constants, you must escape the backslash character in the backslash-sequence. For example, to specify \d, use \\d. For details, see Specifying Regular Expressions in Single-Quoted String Constants (in this topic).
You do not need to escape backslashes if you are delimiting the string with pairs of dollar signs ($$) (rather than single quotes).
so you can simply use the regex DEM.*\d\d to find all strings that start with DEM and end with two digits, without extra escaping, as follows
SELECT
'DEM7BZB01-123' AS SKU
, RLIKE('DEM7BZB01-123', $$DEM.*\d\d$$) AS regex
which gives
SKU |REGEX|
-------------+-----+
DEM7BZB01-123|true |
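
The implicit anchoring described above behaves much like a whole-string match. As a rough analogy only (Python's re.fullmatch is assumed here as a stand-in, it is not what Snowflake runs), the difference between an anchored and an unanchored match looks like this:

```python
import re

sku = "DEM7BZB01-123"
pattern = r"DEM.*\d\d"  # no ^ or $ written out

# fullmatch requires the whole string to match, like an implicitly anchored pattern.
anchored = re.fullmatch(pattern, sku) is not None

# With trailing characters, an unanchored search still succeeds
# but a whole-string match does not.
longer = sku + "-X"
unanchored_longer = re.search(pattern, longer) is not None
anchored_longer = re.fullmatch(pattern, longer) is not None

print(anchored, unanchored_longer, anchored_longer)
```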

How to escape double quote in parsley data-parsley-pattern?

I am using parsley for javascript validation. My current regex pattern is
data-parsley-pattern="/^[0-9a-zA-Z\!\#\#\$\%\^\&\*\(\)\-\_\+\?\'\.\,\/\\r\n ]+$/"
How to add double quote in my pattern. I have added \" to pattern
data-parsley-pattern="/^[0-9a-zA-Z\!\#\#\$\%\^\&\*\(\)\-\_\+\?\'\"\.\,\/\\r\n ]+$/"
But it is not working.

Note that you overescaped the pattern, almost all the chars you escaped are not special in a character class.
Next, you may shorten the code if you use a string pattern. See the Parsley docs:
data-parsley-pattern="\d+"
Note that patterns are anchored, i.e. must match the whole string.
Parsley deviates from the standard for patterns looking like /pattern/{flag}; these are interpreted as literal regexp and are not anchored.
That means you do not need ^ and $ if you define the pattern without regex delimiters, /.
As for the quotation marks, you may use a common \xXX notation.
You may use
data-parsley-pattern="[0-9a-zA-Z!##$%^&*()_+?\x27\x22.,/\r\n` -]+"
or
data-parsley-pattern="/^[0-9a-zA-Z!##$%^&*()_+?\x27\x22.,/\r\n` -]+$/"
where \x27 is ' and \x22 is ".
Note that - at the end of the character class is a safe placement for a literal hyphen where you do not have to escape it.
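
To see the \xXX notation at work without a browser, here is a small sketch in Python (an assumption for illustration; Parsley itself uses JavaScript regular expressions, and the character class below is a trimmed-down version of the one above). Python's re module also understands \x27 and \x22 inside a pattern.

```python
import re

# \x27 is ' and \x22 is "; the trailing - in the class is a literal hyphen.
ALLOWED = re.compile(r"[0-9a-zA-Z!#$%^&*()_+?\x27\x22.,/\r\n -]+")

with_quotes = 'He said "hi", it\'s ok.'
with_angle = "a<b"  # < is not in the class

ok_quotes = ALLOWED.fullmatch(with_quotes) is not None
ok_angle = ALLOWED.fullmatch(with_angle) is not None

print(ok_quotes, ok_angle)
```

Writing the quotes as \x27 and \x22 keeps both quote characters out of the pattern text, which is what makes the pattern safe to place inside an HTML attribute.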

RegEx with Pipes and IPs not working

The RegEx:
^([0-9\.]+)\Q|\E([^\Q|\E])\Q|\E
does not match the string:
1203730263.912|12.66.18.0|
Why?

From PHP docs,
\Q and \E can be used to ignore regexp metacharacters in the pattern.
For example:
\w+\Q.$.\E$ will match one or more word characters, followed by literals .$. and anchored at the end of the string.
And your regex should be,
^([0-9\.]+)\Q|\E([^\Q|\E]*)\Q|\E
OR
^([0-9\.]+)\Q|\E([^\Q|\E]+)\Q|\E
You forgot to add + after [^\Q|\E]. Without a quantifier, the class matches exactly one character.
Explanation:
^ Starting point.
([0-9\.]+) Captures digits or dot one or more times.
\Q|\E In PCRE, \Q begins a quoted sequence and \E ends it; every character between them is treated literally. So the | inside that block tells the regex engine to match a literal |.
([^\Q|\E]+) Captures one or more characters other than |.
\Q|\E Matches a literal pipe symbol.

The accepted answer seems somewhat incorrect so I wanted to address this for future readers.
If you did not already know, using \Q and \E ensures that any character between \Q ... \E will be matched literally, not interpreted as a metacharacter by the regular expression engine.
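First and most important, \Q and \E are not dependable within a bracketed character class [] (support varies between regex engines), and they are unnecessary there because | is already literal inside a class.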
[^\Q|\E] # Incorrect
[^|] # Correct
Secondly, your pattern does not follow that class with a quantifier. With one added, the correct syntax would be:
^([0-9.]+)\Q|\E([^|]+)\Q|\E
Although, it is much simpler to write this out as:
^([0-9.]+)\|([^|]+)\|
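
The simplified pattern can be checked against the string from the question. This is a sketch in Python, chosen for illustration; Python's re module has no \Q...\E, so the example uses a plain backslash escape and, as a loose counterpart to \Q...\E, re.escape to build the escaped separator.

```python
import re

line = "1203730263.912|12.66.18.0|"

# | is escaped outside the character class and left bare inside it.
pattern = re.compile(r"^([0-9.]+)\|([^|]+)\|")
m = pattern.match(line)
fields = m.groups() if m else None

# re.escape produces the escaped form of a literal string.
sep = re.escape("|")
m2 = re.match(rf"^([0-9.]+){sep}([^|]+){sep}", line)
fields_escaped = m2.groups() if m2 else None

print(fields)
```

Both variants should capture the timestamp and the IP address as the two groups.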

Regex Check Whether a string contains characters other than specified

How to check whether a string contains character other than:
Alphabets (lower-case/upper-case)
digits
Space
Comma(,)
Period (.)
Bracket ( )
&
'
$
+(plus) minus(-) (*) (=) arithmetic operator
/
using regular expression in ColdFusion?
I want to make sure a string doesn't contain even single character other than the specified.

You can find if there are any invalid characters like this:
<cfif refind( "[^a-zA-Z0-9 ,.&'$()\-+*=/]" , Input ) >
<!--- invalid character found --->
</cfif>
Where the [...] is a character class (match any single char from within), and the ^ at the start means "NOT" - i.e. if it finds anything that is not an accepted char, it returns true.
I don't understand "Small Bracket(opening closing)", but guess you mean < and > there? If you want () or {} just swap them over. For [] you need to escape them as \[\]
Character Class Escaping
Inside a character class, only a handful of characters need escaping with a backslash, these are:
\ - if you want a literal backslash, escape it.
^ - a caret must be escaped if it's the first character, otherwise it negates the class.
- - a dash creates a range. It must be escaped unless first/last (but recommended always to be)
[ and ] - both brackets should be escaped.
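
The negated character class from this answer can be tried outside ColdFusion. The sketch below uses Python as a stand-in (an assumption; the helper name has_invalid_char is made up for this example), with the same class as in the refind call above.

```python
import re

# Negated class: matches any single character that is NOT in the allowed list.
# Only the hyphen is escaped; the other punctuation is literal inside [...].
INVALID = re.compile(r"[^a-zA-Z0-9 ,.&'$()\-+*=/]")

def has_invalid_char(text: str) -> bool:
    """Return True if text contains at least one character outside the allowed set."""
    return INVALID.search(text) is not None

clean = "A & B (x+y=z), 'ok' $5.00 a/b*c-d"
dirty = "Price: $5"  # the colon is not in the allowed set

print(has_invalid_char(clean), has_invalid_char(dirty))
```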

ColdFusion uses Java's engine to parse regular expressions. To make sure a string doesn't contain any character other than the ones you mentioned, try:
^(?![a-zA-Z0-9 ,.&$']*[^a-zA-Z0-9 ,.&$']).*$
The above expression would only work if you are parsing the file line by line. If you want to apply this to text which contains multiple lines then you need to use the global modifier and the multi-line modifier and change the expression a bit like this:
^(?![a-zA-Z0-9 ,.&$']*[^a-zA-Z0-9 ,.&$'\r\n]).*$

The regular expression:
[^][a-zA-Z0-9 ,.&'$]
will match if the string contains any characters other than the ones in your list.

Visual Studio Find and Replace with regex and single/double quotes

How to use VS Find/Replace to replace:
this: $('a[name="lnkFind"]').on('click', function
with this: $(document).on("click", "a[name='lnkFind']", function
I'm not sure which characters need to be escaped - single or double quotes or both? None of the patterns I've tried seem to find a match.

You'll need to escape many of these characters.
Find/Replace will complain about the un-escaped ( and ), even the bare ( at the end because it's missing a matching ). Also the square brackets, which are used for character sets, and finally the $.
So this should work as the pattern:
\$\('a\[name="lnkFind"\]'\).on\('click', function

You should look at a list of special characters in Regular Expressions.
$, ., [, ] should all be escaped.
http://www.fon.hum.uva.nl/praat/manual/Regular_expressions_1__Special_characters.html

Except in special cases (such as vim regex), in general you can escape any and all special characters in regex to get their literal form, i.e. escaping a special character that doesn't need to be escaped, won't do any harm.
That said, here's the minimum that needs to be escaped:
\$\('a\[name="lnkFind"]'\)\.on\('click', function
I don't think you'll need to escape anything in the replacement, because only a $ or \ followed by a number will be interpreted.
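
The escaping can be tested outside the editor. This is a sketch in Python, assumed here only as a convenient test bed (Visual Studio uses the .NET regex engine, which may differ in details); the two strings are the ones from the question.

```python
import re

source = """$('a[name="lnkFind"]').on('click', function"""
target = """$(document).on("click", "a[name='lnkFind']", function"""

# Minimal escaping: $ ( ) [ and . are escaped; quotes and ] are left alone.
pattern = r"""\$\('a\[name="lnkFind"]'\)\.on\('click', function"""
result = re.sub(pattern, target, source)

# Escaping everything special also works; re.escape does it mechanically.
result_auto = re.sub(re.escape(source), target, source)

print(result)
```

Neither the single nor the double quotes need a backslash in the search pattern, and the replacement text is used as written because it contains no backslashes.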
