Regex Check Whether a string contains characters other than specified - regex

How can I check whether a string contains any character other than:
Alphabets (lower-case/upper-case)
digits
Space
Comma(,)
Period (.)
Bracket ( )
&
'
$
+(plus) minus(-) (*) (=) arithmetic operator
/
using regular expression in ColdFusion?
I want to make sure a string doesn't contain even a single character other than those specified.

You can find if there are any invalid characters like this:
<cfif refind( "[^a-zA-Z0-9 ,.&'$()\-+*=/]" , Input ) >
<!--- invalid character found --->
</cfif>
Where the [...] is a character class (match any single char from within), and the ^ at the start means "NOT" - i.e. if it finds anything that is not an accepted char, it returns true.
I don't understand "Small Bracket(opening closing)", but guess you mean < and > there? If you want () or {} just swap them over. For [] you need to escape them as \[\]
Character Class Escaping
Inside a character class, only a handful of characters need escaping with a backslash. These are:
\ - if you want a literal backslash, escape it.
^ - a caret must be escaped if it's the first character, otherwise it negates the class.
- - a dash creates a range. It must be escaped unless it is the first or last character (though escaping it always is recommended).
[ and ] - both brackets should be escaped.
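As a rough illustration of the negated character class check described above, here is a minimal sketch in Python rather than ColdFusion. Python's re module stands in for the ColdFusion engine, and the helper name has_invalid_char is made up for this example; the pattern itself is copied from the refind call above.

```python
import re

# Allow-list pattern copied from the refind() call in the answer above.
# The leading ^ inside the class negates it, so this matches any single
# character that is NOT in the accepted set.
INVALID_CHAR = re.compile(r"[^a-zA-Z0-9 ,.&'$()\-+*=/]")


def has_invalid_char(text: str) -> bool:
    """Return True if text contains at least one character outside the allow-list.

    has_invalid_char is a hypothetical helper name used only for this sketch.
    """
    return INVALID_CHAR.search(text) is not None


if __name__ == "__main__":
    print(has_invalid_char("Tom & Jerry's (2+2=4)"))  # only accepted characters
    print(has_invalid_char("hello#world"))            # '#' is not accepted
```

Searching for one rejected character is usually simpler than trying to describe every valid string, which is why the class is negated.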

ColdFusion uses Java's engine to parse regular expressions. Anyway, to make sure a string doesn't contain any character other than the ones you mentioned, try:
^(?![a-zA-Z0-9 ,.&$']*[^a-zA-Z0-9 ,.&$']).*$
The above expression would only work if you are parsing the file line by line. If you want to apply this to text which contains multiple lines then you need to use the global modifier and the multi-line modifier and change the expression a bit like this:
^(?![a-zA-Z0-9 ,.&$']*[^a-zA-Z0-9 ,.&$'\r\n]).*$
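The two lookahead expressions above can be tried out with a small Python sketch. This assumes Python's re module behaves like the engine in question for these constructs (it does support negative lookahead and a multi-line flag); the names ALLOWED, SINGLE_LINE and MULTI_LINE are invented for the example.

```python
import re

# Body of the character class used in both expressions above.
ALLOWED = r"a-zA-Z0-9 ,.&$'"

# Single-string version: the negative lookahead rejects the string as soon as
# a run of allowed characters is followed by one character outside the set.
SINGLE_LINE = re.compile(rf"^(?![{ALLOWED}]*[^{ALLOWED}]).*$")

# Multi-line version: line breaks are excluded from the "invalid" class and
# re.MULTILINE lets ^ and $ anchor at every line.
MULTI_LINE = re.compile(rf"^(?![{ALLOWED}]*[^{ALLOWED}\r\n]).*$", re.MULTILINE)

if __name__ == "__main__":
    print(bool(SINGLE_LINE.match("Tom & Jerry")))   # all characters accepted
    print(bool(SINGLE_LINE.match("a#b")))           # '#' is rejected
    text = "good line\nbad#line\nok"
    print(MULTI_LINE.findall(text))                 # lines with no rejected character
```

In Python the "global modifier" corresponds to calling findall (or finditer) instead of match.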

The regular expression:
[^][a-zA-Z0-9 ,.&'$]
will match if the string contains any characters other than the ones in your list.

Related

What is the meaning of this line in perl?

$line =~ s/^<(\w+)=\"(.*?)\">//;
What is the meaning of this line in perl?
The s/.../.../ is the substitution operator. It matches its first operand, which is a regular expression and replaces it with its second operand.
By default, the substitution operator works on a string stored in $_. But your code uses the binding operator (=~) to make it work on $line instead.
The two operands to the substitution operator are the bits delimited by the / characters (there are more advanced versions of these delimiters, but we'll ignore them for now). So the first operand is ^<(\w+)=\"(.*?)\"> and the second operand is an empty string (because there is nothing between the second and third / characters).
So your code says:
Examine the variable $line
Look for a section of the string which matches ^<(\w+)=\"(.*?)\">
Replace that part of the string with an empty string
All that is left now is for us to untangle the regular expression and see what it matches.
^ - matches the start of the string
< - matches a literal < character
(...) - means capture this bit of the match and store it in $1
\w+ - matches one or more "word characters" (where a word character is a letter, a digit or an underscore)
= - matches a literal = character
\" - matches a literal " character (the \ is unnecessary here)
(...) - means capture this bit of the match and store it in $2
.*? - matches zero or more instances of any character except a newline, as few as possible (the ? makes the * non-greedy)
\" - matches a literal " character (once again, the \ is unnecessary here)
> - matches a literal >
So, all in all, this looks like a slightly broken attempt to match XML or HTML. It matches tags of the form <foo="bar"> (which isn't valid XML or HTML) and replaces them with an empty string.
It's searching for an XML tag at the start of a string, and substituting it with nothing (i.e. removing it).
For example, in the input:
<hello="world">example
The regex will match <hello="world">, and substitute it with nothing - so the final result is just:
example
In general, this is something that you shouldn't do with regex. There are a dozen different ways you could create false negatives here, that don't get stripped from the string.
But if this is a "quick and dirty" script, where you don't need to worry about all possible edge cases, then it may be OK to use.
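For readers who want to experiment with the substitution, here is a minimal Python sketch of the same idea. It is not the Perl code itself: re.sub with count=1 plays the role of s/.../.../ without the g flag, and the names TAG_AT_START and strip_leading_tag are invented for the example. Note that Python's \w matches Unicode word characters by default, which is slightly broader than described above.

```python
import re

# Same pattern as in the Perl line, minus the unnecessary backslashes
# before the double quotes.
TAG_AT_START = re.compile(r'^<(\w+)="(.*?)">')


def strip_leading_tag(line: str) -> str:
    """Remove one <name="value"> tag from the start of line, if present."""
    return TAG_AT_START.sub("", line, count=1)


if __name__ == "__main__":
    print(strip_leading_tag('<hello="world">example'))

    # The two capture groups correspond to $1 and $2 in Perl.
    m = TAG_AT_START.match('<hello="world">example')
    if m:
        print(m.group(1), m.group(2))
```

Because of the ^ anchor, a tag that appears later in the string is left alone.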

Which characters must be escaped in a Perl regex pattern

I'm trying to find files that look like this:
access_log-20160101
access_log-20160304
...
With a Perl regex I came up with something like this:
/^access_log-\d{8}$/
But I'm not sure about the "_" and the "-". Are these metacharacters?
What is the expression for this?
I read that "_" in regex is something like \w, but how do I use that in my expression?
/^access\wlog-\d{8}$/ ?
Underscore (_) is not a metacharacter and does not need to be quoted (though it won't change anything if you quote it).
Hyphen (-) IS a metacharacter that defines the range between two symbols inside a bracketed character class. However, in this particular position, it will be interpreted verbatim and doesn't need quoting since it is not inside [] with a symbol on both sides.
You can use your regexp as is; hyphens (-) might need quoting if your format changes in future.
Your regex pattern is exactly right
Neither underscore _ nor hyphen - need to be escaped. Outside a square-bracketed character class, the twelve Perl regex metacharacters are
Brackets ( ) [ {
Quantifiers * + ?
Anchors ^ $
Alternation |
Wildcard .
The escape itself \
and only these must be escaped
If the pattern of your file names doesn't vary from what you have shown then the pattern that you are using
^access_log-\d{8}$
is correct, unless you need to validate the date string
Within a character class like [A-F] you must escape the hyphen if you want it to be interpreted literally. As it stands, that class is the equivalent to [ABCDEF]. If you mean just the three characters A, - or F then [A\-F] will do what you want, but it is usual to put the hyphen at the start or end of the class list to make it unambiguous. [-AF] and [AF-] are the same as [A\-F] and rather more readable
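The points about the underscore and the hyphen can be checked with a short sketch. Python is used here instead of Perl, on the assumption that its re module treats these characters the same way in these positions; the name LOG_NAME is made up for the example.

```python
import re

# Neither "_" nor "-" needs escaping outside a character class.
LOG_NAME = re.compile(r"^access_log-\d{8}$")

if __name__ == "__main__":
    print(bool(LOG_NAME.match("access_log-20160101")))  # expected file name
    print(bool(LOG_NAME.match("accessXlog-20160101")))  # literal "_" required

    # Using \w instead of "_" is looser: it also accepts letters and digits.
    print(bool(re.match(r"^access\wlog-\d{8}$", "accessXlog-20160101")))

    # Inside a class the hyphen matters: a range versus three literal characters.
    print(re.findall(r"[A-F]", "A-CF"))    # range A to F, the hyphen is not matched
    print(re.findall(r"[A\-F]", "A-CF"))   # escaped hyphen is literal
    print(re.findall(r"[-AF]", "A-CF"))    # leading hyphen is literal as well
```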

R Regular expression for string containing full stops

I have a bunch of strings, some of which end with "..t." (two full stops, a t, and another full stop). I am trying to find a regular expression to match these strings but dealing with the full stops is giving me a headache!
I have tried
grep('^.+(..t.)$', myStrings)
but this also matches strings such as w...gate. I think I am dealing with the full stops incorrectly. Any help at all appreciated.
Note: I am using grep within R.
Since you are only checking if the end of the string ends with ..t., you can eliminate ^.+ in your pattern.
The dot . in regular expression syntax is a character of special meaning which matches any character except a newline sequence. To match a literal dot, or any other character of special meaning, you need to escape it with a backslash, which is written \\ inside an R string.
> x <- c('foo..t.', 'w...gate', 'bar..t.foo', 'bar..t.')
> grep('\\.{2}t\\.$', x)
# [1] 1 4
Or place that character inside of a character class.
> x <- c('foo..t.', 'w...gate', 'bar..t.foo', 'bar..t.')
> grep('[.]{2}t[.]$', x)
# [1] 1 4
Note: I used the quantifier {2} in \\.{2} to match two dots instead of writing the escaped dot twice as \\.\\.
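To see the difference between the unescaped and escaped dots outside of R, here is a small Python sketch using the same four test strings. It assumes Python's re module as a stand-in for R's grep; in a Python raw string the escape is a single backslash, and the results are the matching strings rather than 1-based indices.

```python
import re

strings = ["foo..t.", "w...gate", "bar..t.foo", "bar..t."]

# Unescaped dots match any character, so "gate" fits the "..t." shape too.
unescaped = [s for s in strings if re.search(r"^.+(..t.)$", s)]

# Escaped dots only match literal full stops.
escaped = [s for s in strings if re.search(r"\.{2}t\.$", s)]

# A character class is an alternative way to make the dot literal.
bracketed = [s for s in strings if re.search(r"[.]{2}t[.]$", s)]

if __name__ == "__main__":
    print(unescaped)
    print(escaped)
    print(bracketed)
```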
OK, a little bit of better googling provided the answer:
grep("^.+(\\.\\.t\\.)$", myStrings)
this works because we need to escape the point as \\. in R.
The dot (.) matches any single character. To remove the special meaning of the dot, you should put a double backslash (\\) before it.
Try this instead:
grep('^.+(\\.\\.t\\.)$', myStrings)
Satheesh Appu

regular expression what's the meaning of this regular expression s#^.*/##s

What is the meaning of s#^.*/##s?
I know that in a pattern '.' can represent any character except \n,
so '.*' should represent any number of arbitrary characters.
But the book says that this would delete all of a Unix-style path.
My question is: does it mean I could substitute any number of arbitrary characters with a space?
s -> substitution
# -> pattern delimiter
^.* -> all chars 0 or more times from the beginning
/ -> literal /
## -> replace by nothing (2 delimiters)
s -> single line mode ( the dot can match newline)
Substitutions conventionally use the / character as a delimiter (s/this/that/), but you can use other punctuation characters if it's more convenient. In this case, # is used because the regexp itself contains a / character; if / were used as the delimiter, any / in the pattern would have to be escaped as \/. (# is not the character I would have chosen, but it's perfectly valid.)
^ matches the beginning of the string (or line; see below)
.*/ matches any sequence of characters up to and including a / character. Since * is greedy, it will match all characters up to and including the last / character; any preceding / characters are "eaten" by the .*. (The final / is not, because if .* matched all / characters the final / would fail to match.)
The trailing s modifier treats the string as a single line, i.e., causes . to match any character including a newline. See the m and s modifiers in perldoc perlre for more information.
So this:
s#^.*/##s
replaces everything from the beginning of the string ($_ in this case, since that's the default) up to the last / character by nothing.
If there are no / characters in $_, the match fails and the substitution does nothing.
This might be used to replace all directory components of an absolute or relative path name, for example changing /home/username/dir/file.txt to file.txt.
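The behaviour described above can be reproduced with a short Python sketch. This is an approximation of the Perl operator, not a translation the original authors gave: re.sub with count=1 stands in for s#...##, re.DOTALL stands in for the trailing s modifier, and strip_directories is a hypothetical helper name.

```python
import re


def strip_directories(path: str) -> str:
    """Remove everything up to and including the last "/" in path."""
    # re.DOTALL makes "." match newlines too, like Perl's /s modifier.
    return re.sub(r"^.*/", "", path, count=1, flags=re.DOTALL)


if __name__ == "__main__":
    print(strip_directories("/home/username/dir/file.txt"))
    print(strip_directories("file.txt"))  # no "/" at all, so nothing changes

    # Without DOTALL the greedy .* cannot cross the newline, so with the
    # anchor at the start of the string nothing is removed.
    print(repr(re.sub(r"^.*/", "", "a\nb/c", count=1)))
    print(repr(strip_directories("a\nb/c")))
```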
It will delete all characters, including line breaks because of the s modifier, in a string until the last slash included.
Please excuse a little pedantry. But I keep seeing this and I think it's important to get it right.
s#^.*/##s is not a regular expression.
^.*/ is a regular expression.
s/// is the substitution operator.
The substitution operator takes two arguments. The first is a regular expression. The second is a replacement string.
The substitution operator (like many other quote-like operators in Perl) allows you to change the delimiter character that you use.
So s### is also a substitution operator (just using # instead of /).
s#^.*/## means "find the text that matches the regular expression ^.*/ and replace it with an empty string". And the s on the end is an option which changes the regex so that the . matches "\n" as well as all other characters.

gvim search match multiple characters using regex

I am trying to search in gvim for the following pattern:
arrayA[*].entryx
hoping it would match the following:
arrayA[size].entryx
arrayA[i].entryx
arrayA[index].entryx
but it prints message saying Pattern not found even though the above lines are present in the file.
arrayA[.].entryx
only matches arrayA[i].entryx
i.e. with only one character between [] braces.
What should I do to match multiple characters between [] braces?
Here is the expression in detail:
/arrayA\[[^]]*]\.entryx/
         ^^^^^           # 0 or more characters before a ']'
       ^^      ^^        # Escaped '[' & '.'
              ^          # Closing ']' -- does not need to be escaped
 ^^^^^^          ^^^^^^  # Literal parts
If you want to look for arrayA[X].entryx where there is at least one character in the [],
you need to replace \[[^]]* with \[[^]]\+
ps: Note my edit -- I've changed the \* to just * -- you don't escape that either.
But, you need to escape the + :-)
Update on your comment:
While my comment answers your question on escaping ] broadly,
for more detail look at Perl Character Class details.
Specifically, the Special Characters Inside a Bracketed Character Class section.
Rules of what needs to be escaped change after a [ character starts a Character Class (CCL).
The * repeats the previous character; and [ starts a character class. So, you need something more like:
/arrayA\[[^]]*]\.entryx/
That looks for a literal [, a series of zero or more characters other than ], a literal ], a literal . and the entryx.
Always remember that in Vim you need to escape some special characters, such as [, ] and ., to match them literally. As said before, the * repeats the previous character, so you can simply use /arrayA\[.*\]\.entryx. However, * is greedy and may match some strange things; add the following line to your file and you'll see: arrayA[size].entryx = arrayB[].entryx
A "safer" Regular Expression would be:
/arrayA\[.\{-\}\]\.entryx
The .\{-\} matches any character in a non-greedy way, which is safer in some cases.
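The greedy versus non-greedy difference is easy to see on the sample line suggested above. The sketch below uses Python instead of Vim, so the syntax differs: Python writes the non-greedy quantifier as .*? where Vim writes .\{-\}, and the one-or-more quantifier needs no backslash. Treat it as an illustration of the matching behaviour only.

```python
import re

line = "arrayA[size].entryx = arrayB[].entryx"

# Greedy: .* runs as far as it can, so the match ends at the LAST "].entryx".
greedy = re.search(r"arrayA\[.*\]\.entryx", line).group()

# Non-greedy: .*? stops at the first "].entryx" (Vim: .\{-\}).
lazy = re.search(r"arrayA\[.*?\]\.entryx", line).group()

# Negated class: [^]]* can never run past a closing bracket.
negated = re.search(r"arrayA\[[^]]*]\.entryx", line).group()

if __name__ == "__main__":
    print(greedy)
    print(lazy)
    print(negated)
```

The negated class is often the more predictable choice, because it states exactly which characters may appear between the brackets.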