Regular Expressions for some alphanumeric sentences

Can anyone provide a regular expression for these statements:
annotations/16/16366.eng
annotations/29/21345.eng
annotations/10/20132.eng
And statements of this type. I have tried 'a(\w+).eng', but it did not work.

To match alphanumeric separated by slashes, ending with .eng you can do:
(\w+\/\w+\/\w+\.eng)
Remember that [ and ] are used for sets. You can specify a word verbatim as a match, without any flags. If you want to match only strings in the same format that start with annotations, you can do:
annotations\/\w+\/\w+\.eng
Where \/ escapes a / and \. escapes a period.
And to simplify it:
[\w/]*\.eng
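Meaning: "Match any number of repetitions of the set containing alphanumeric characters (\w) and /, followed by .eng". Note that this is looser than the two patterns above, because it accepts any number of slash-separated segments, including none.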
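To compare the three patterns side by side, here is a small sketch in Python. The choice of Python's re module is my assumption (the question does not name a language), and the variable names are only illustrative. It runs each pattern, plus the attempt from the question, against the sample paths.

```python
import re

# Sample paths copied from the question.
paths = [
    "annotations/16/16366.eng",
    "annotations/29/21345.eng",
    "annotations/10/20132.eng",
]

# In Python "/" is not a delimiter, so it needs no backslash.
generic = re.compile(r"\w+/\w+/\w+\.eng")           # exactly three segments
prefixed = re.compile(r"annotations/\w+/\w+\.eng")  # must start with "annotations"
loose = re.compile(r"[\w/]*\.eng")                  # any mix of word characters and "/"
attempt = re.compile(r"a(\w+).eng")                 # the pattern tried in the question

for path in paths:
    print(path,
          bool(generic.fullmatch(path)),
          bool(prefixed.fullmatch(path)),
          bool(loose.fullmatch(path)),
          bool(attempt.search(path)))

# The loose pattern accepts a four-segment path that the generic one rejects.
print(bool(loose.fullmatch("a/b/c/d.eng")), bool(generic.fullmatch("a/b/c/d.eng")))
```

The attempt from the question fails because \w does not match /, so a(\w+) can never get past the first slash.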

Related

How to capture a word when it's not followed by hyphens, underscores, and alphanumeric

How do I capture a word "entity", when it's not followed by hyphens, underscores, and alphanumeric, and ignores anything else that follows it?
For example, I want to capture the word "entity" in the following situations:
entity
entity,
[entity]
But I do NOT want it to capture the word in the following situations:
entity-foo
entity_bar
entityfoobar
entity0foo
The furthest I got to is:
(entity)[^-\$a-zA-Z_0-9]
However, the above regex identifies:
entity, without ignoring the ,
entity] without ignoring ]
I'm trying to capture this token in a Sublime Syntax definition.
Sounds like a job for lookaheads!
Something like this should work:
(entity)(?=[\s,\]])
Explanation:
(?<=\[)?: The (?<=regex) construct is a lookbehind, here looking for a [ character immediately before entity. Making it optional with a trailing ? means it never restricts the match, so it can be left out, as in the pattern above.
(entity): Matching the phrase entity and capturing it
(?=[\s,\]]): A lookahead ((?=regex)), looking for any of \s, , and ]. \s in RegEx matches a whitespace character, which includes spaces, tabs, newlines, etc.
One caveat of my pattern is that the phrase entity] will be matched, without the leading [, which isn't specified in your examples. This can potentially be expanded further, but it will begin to get messy, and may not be necessary, anyway.
For the examples posted by the OP and by the rules, "when it's not followed by hyphens, underscores, and alphanumeric", one could also use negative lookaheads:
entity(?![0-9a-zA-Z_-])
which says, in essence: match entity as long as it is not followed by a digit, a letter, _ (underscore) or - (hyphen).
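A quick way to check both approaches against the examples from the question is the sketch below. It uses Python's re module, which is an assumption on my part: Sublime Syntax definitions use a different engine, although these particular lookahead constructs are widely supported. The list names are illustrative.

```python
import re

# Examples taken from the question.
should_match = ["entity", "entity,", "[entity]"]
should_not_match = ["entity-foo", "entity_bar", "entityfoobar", "entity0foo"]

# Negative lookahead: succeeds unless the next character is in the set.
negative = re.compile(r"entity(?![0-9a-zA-Z_-])")
# Positive lookahead: succeeds only if a listed character follows.
positive = re.compile(r"(entity)(?=[\s,\]])")

for text in should_match + should_not_match:
    print(repr(text),
          bool(negative.search(text)),
          bool(positive.search(text)))
```

One difference worth knowing: the positive lookahead needs some character after entity, so it does not match a bare entity at the very end of the input, while the negative lookahead does.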

Regexp question mark (in emacs)

I'd like to ask what the following emacs regular expression means (if anyone wonders, this is the regexp that erlang-mode uses for matching a single-quoted atom):
'\\(?:[^\\']\\|\\(?:\\\\.\\)\\)*'
specifically I'm having trouble finding explanations for three things.
First, the question mark, which supposedly should either make the preceding item optional or make the preceding quantifier lazy. But there is no item or quantifier here, only the start of a new group, so what effect does it have here?
Second, the escaped apostrophe. Why would you need to escape the apostrophe?
Third, the quadruple escape \\\\.: wouldn't this leave you with an escaped backslash and a \., which would make it an invalid regexp?
Thanks
"[^\\']"
Second, the escaped apostrophe. Why would you need to escape the apostrophe?
Firstly, note that in Emacs regexp syntax, \` matches the start of the string, and \' matches the end of the string. In multi-line strings this is different to the more familiar ^ and $, which match the beginning of a line and the end of a line.
However that is not relevant within a character alternative (square brackets), so this sequence is actually matching any character other than a backslash or an apostrophe.
Edit:
So from the comments, this is still causing confusion, so let's break it down:
"'\\(?:[^\\']\\|\\(?:\\\\.\\)\\)*'"
That code evaluates to this string/regexp:
'\(?:[^\']\|\(?:\\.\)\)*'
' matches an apostrophe
\(?:foo\)* matches zero or more foo
foo\|bar matches either of foo or bar
[^\'] matches any character other than a backslash or an apostrophe
\(?:\\.\) could (in this case, being a non-capturing group which occurs exactly once) be rewritten as simply \\., and matches a backslash followed by any character other than a newline.
' matches an apostrophe
So the whole thing matches a single-quoted string in which:
any other single-quotes must each be preceded by a backslash
any backslash must be paired with another non-newline character (which could also be a backslash)
Which of course sounds like a typical string syntax in which backslashes can be used to escape special characters, including backslashes themselves and any instances of the delimiting quote character.
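For readers more used to PCRE-style syntax, the same idea can be sketched in Python. This is my own translation and an assumption, not something erlang-mode uses: Emacs writes groups as \(?: ... \) and alternation as \|, whereas Python writes (?: ... ) and |. The name atom is illustrative.

```python
import re

# Translation of the Emacs regexp '\(?:[^\']\|\(?:\\.\)\)*' into Python syntax:
# a quote, then any run of (non-backslash, non-quote) characters or
# backslash-plus-any-character pairs, then a closing quote.
atom = re.compile(r"'(?:[^\\']|\\.)*'")

candidates = [
    "'hello world'",   # plain content
    r"'it\'s'",        # inner quote escaped with a backslash
    r"'back\\slash'",  # backslash escaping a backslash
    "'it's'",          # unescaped inner quote
    "'dangling\\'",    # the backslash swallows the would-be closing quote
]

for text in candidates:
    print(text, bool(atom.fullmatch(text)))
```

The last two candidates are rejected as whole strings: the first because the unescaped quote ends the atom early, the second because the final quote is consumed as part of a backslash pair and no closing quote remains.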
First: (?: groups multiple tokens together without creating a capturing group. This allows you to apply quantifiers to the full group.
Second and third, I think those are escaped backslashes. Each pair means \, and the quadruple means \\. So it's not escaping the apostrophe at all.

Regular Expressions pattern with Special characters

I'm working on a regular expressions pattern, but it contains a number of special characters. I'm not really sure how to incorporate them in a normal regex pattern string. Specifically, I need to test to see if a string contains '+/-'...
I've tried using quotes etc but have no luck (I'm extremely new to regex). I am coding this in C# 4.0.
One string example is "3Z1Z +/- 5.5"
Any help is much appreciated - Thanks a lot!
Create a simple regex:
foundMatch = Regex.IsMatch(SubjectString, @"\+/-");
Will return true if this sequence of characters is found anywhere in your string. The explanation is left as an exercise to you.
These are part of the special character list. Basically, add them to the pattern by prefixing them with a backslash (\), e.g. + becomes \+
^\+|\-$ # + at the start of the string, or - at the end
The same would go for anything else with special meaning, such as ., {, }, (, ), ^, $, |, [, ], etc.
There are some exceptions though. For instance, when creating a class such as: [a-z] the hyphen (-) would have special meaning (all letters from a through z). So if you wanted a literal hyphen you'd have to escape it (unless it falls as the last character of the class). e.g.
[a-z-A-Z] # hyphen should be escaped if you wanted a literal hyphen
[a-z\-A-Z] # the "correct" counter-part
[a-zA-Z-] # actually legal because it's inserted as the last character
# and therefore treated as a literal hyphen despite not being
# escaped.
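The escaping rules above can be tried out with the sketch below. It is written in Python rather than the C# of the question, purely as an illustration and by my own choice; re.escape is Python's helper for escaping literal text, and the variable names are made up for the example.

```python
import re

sample = "3Z1Z +/- 5.5"  # example string from the question

# "+" is a quantifier, so it needs a backslash; "/" and "-" are literal here.
plus_minus = re.compile(r"\+/-")
print(bool(plus_minus.search(sample)))

# re.escape() builds an escaped pattern from arbitrary literal text.
escaped = re.compile(re.escape("+/-"))
print(bool(escaped.search(sample)))

# A hyphen placed last in a character class, or escaped, is a literal hyphen.
hyphen_last = re.compile(r"[a-zA-Z-]+")
hyphen_escaped = re.compile(r"[a-z\-A-Z]+")
print(bool(hyphen_last.fullmatch("well-known")),
      bool(hyphen_escaped.fullmatch("well-known")))
```

Letting a helper such as re.escape do the escaping is usually safer than escaping by hand when the literal text comes from user input.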

What does \'.- mean in a Regular Expression

I'm new to regular expressions and I'm having trouble finding out what "\'.-" means.
'/^[A-Z \'.-]{2,20}$/i'
So far from my research, I have found that the regular expression starts (^) and requires two to twenty ({2,20}) alphabetical (A-Z) characters. The expression is also case insensitive (/i).
Any hints about what "\'.-" means?
The character class is the entire expression [A-Z \'.-], meaning any of A-Z, space, single quote, period, or hyphen. The \ is needed to protect the single quote, since it's also being used as the string quote. This charclass must be repeated 2 to 20 times, and because of the leading ^ and trailing $ anchors that must be the entire content of the matching string.
It means to escape the single quote (') that delimits the regex (so as not to prematurely end the string), and then a . which means a literal . and a - which means a literal -.
Inside a character class, the . is treated literally, and if the - isn't part of a valid range, e.g. a-z, then it is treated literally as well.
Your regex says: match the characters a-zA-Z '.- between 2 and 20 times as the entire string, with an optional trailing \n.
This regex is in a string. The backslash is there to escape the single quote so the string doesn't end early, in the middle of the regex. The dot and dash are just what they are, a period and a dash.
So, you were nearly right, except it's 2-20 characters that are letters, space, single quote, period, or dash.
It's quoting the quote.
The regular expression is ^[A-Z '.-]{2,20}$.
In the programming language you are using, you write it as a quoted string:
'SOMETHING'
To get a single quote in there, it's been backslashed.
Everything inside the square brackets is part of the character class, and will match a single character listed. In your example, the characters listed are the letters A through Z, a space, a single quote, a period, or a hyphen. (Note the hyphen is listed last so that it does not indicate a range, like A-Z.) Your full regular expression will match between 2 and 20 of the listed characters. The backslash before the single quote is needed so the compiler knows you are not ending the string that defines the regular expression.
Some examples of things this will match:
....................
abaca af - .
AAfa- - ..
.z
And so on.
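If you want to experiment with it, here is a sketch of the same expression in Python. The question appears to use a different language, so treat this as an assumption-laden translation: the /i flag becomes re.IGNORECASE, and because the pattern sits in a double-quoted string the single quote needs no backslash. The name name_like is illustrative.

```python
import re

# Same character class and length limits as in the question.
name_like = re.compile(r"^[A-Z '.-]{2,20}$", re.IGNORECASE)

candidates = [
    "O'Brien-Smith Jr.",  # letters, quote, hyphen, space, period
    "." * 20,             # twenty periods
    ".z",                 # two characters, the minimum
    "A",                  # too short
    "a" * 21,             # too long
    "R2D2",               # digits are not in the class
    "Anna\n",             # $ also matches just before a final newline
]

for text in candidates:
    print(repr(text), bool(name_like.match(text)))

# fullmatch() is stricter: the trailing newline is not accepted.
print(bool(name_like.fullmatch("Anna\n")))
```

The trailing-newline case illustrates the "optional trailing \n" mentioned in one of the answers above.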

Is "#" a special character in regular expressions?

I am working on an email filter and I have come across a list of regular expressions that are used to block all emails coming from senders that match a record in that list. While browsing through the list, I have discovered that all occurrences of the # character are escaped with a \.
Does the # mean anything special in regular expressions and needs to be escaped like so \#?
It's normally not a special character, but it doesn't hurt to escape it which is probably why many people do it, they just want to be safe (or they think it's a special character).
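One situation where escaping # does matter is free-spacing ("verbose" or "extended") mode, where an unescaped # starts a comment. Whether your email filter uses such a mode is something I can only guess at; the sketch below uses Python's re.VERBOSE flag purely as an illustration, with made-up variable names.

```python
import re

# In normal mode "#" is an ordinary character, escaped or not.
plain = re.compile(r"a#b")
plain_escaped = re.compile(r"a\#b")
print(bool(plain.fullmatch("a#b")), bool(plain_escaped.fullmatch("a#b")))

# In verbose mode an unescaped "#" starts a comment, so this pattern is just "a".
verbose_unescaped = re.compile(r"a#b", re.VERBOSE)
print(bool(verbose_unescaped.fullmatch("a")),
      bool(verbose_unescaped.fullmatch("a#b")))

# Escaping the "#" keeps it literal even in verbose mode.
verbose_escaped = re.compile(r"a\#b", re.VERBOSE)
print(bool(verbose_escaped.fullmatch("a#b")))
```

So a list that always writes \# behaves the same whether or not such a mode is switched on, which may be why its authors escaped it everywhere.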
No, the # is not a special character in regex.
The \ can also be used in the following construct:
Pattern:
\Q...\E
Definition:
Matches the characters between \Q and \E literally, suppressing the meaning of special characters.
Example:
\Q+-/\E matches +-/
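Not every engine supports \Q...\E. Python's re module, for example, does not, and the nearest equivalent there is re.escape. The sketch below is only an illustration of that equivalent, assuming Python; it is not a statement about what your filter supports.

```python
import re

literal = "+-/"

# Used directly as a pattern, the text is invalid: "+" has nothing to repeat.
try:
    re.compile(literal)
    compiled_raw = True
except re.error:
    compiled_raw = False
print(compiled_raw)

# re.escape() plays the role of \Q...\E by escaping the special characters.
quoted = re.compile(re.escape(literal))
match = quoted.search("a+-/b")
print(match.group(0) if match else None)
```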