Does anyone know how to write this Regular Expression? - regex

I want to create a regex pattern that matches a string which may include a backtick (`) but not an apostrophe ('). For example: "This is Joe`s book", as opposed to "This is Joe's book". I know how to match a string containing (') but not one containing (`). Does anyone know how to write this regular expression?
Thanks!

This should do it...
^[^']+$
The caret inside a bracket expression [^ ] is the negation operator.

This captures strings from start ^ to end $ containing the character range in the square brackets. Note the back-tick at the end of the range.
^([a-zA-Z0-9 \.,;:\?\!`]+)$

[^']*[`][^']*
Accept any number of characters (including zero) that are not a single quote until you encounter a backtick, then accept any number of characters (including zero) that are not a single quote after it.

If you only want to test that the string has a back-tick:
/`/
Should work...
If you want to test for strings with backticks that don't contain apostrophes:
/^(?!.*').*`/
Should work...
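For anyone who wants to try the last pattern outside of a /.../ literal, here is a minimal Python sketch. The function name has_backtick_only is made up for illustration, and it assumes the input is a single line (the dot does not match newlines by default).

```python
import re

# Negative lookahead rejects any string containing an apostrophe;
# the rest of the pattern requires at least one backtick.
BACKTICK_NO_APOSTROPHE = re.compile(r"^(?!.*').*`")


def has_backtick_only(text):
    """Return True if text contains a backtick and no apostrophe."""
    return BACKTICK_NO_APOSTROPHE.search(text) is not None


print(has_backtick_only("This is Joe`s book"))  # True
print(has_backtick_only("This is Joe's book"))  # False
```

Note that a string containing both characters is rejected, because the lookahead is checked from the start of the string.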

Related

Can metacharacters be used in a regular expression as a normal character?

I was wondering if metacharacters, such as ? or *, can be used in a regular expression as a normal character instead of metacharacters.
For example, I have the following text:
"Hi. How are you? What time is it? Beep?"
And I wanted to use regular expressions to substring each group of words ending with a question mark (?).
Therefore resulting in:
Hi.
How are you?
What time is it?
Beep?
Thanks
You can look for ? followed by an optional space and replace it with a ? and newline like this.
Regex: \?\s? Here ? is escaped.
[?]\s? does the same.
Replacement: Replace with ?\n
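A quick Python sketch of the same replacement, assuming the sample sentence from the question. Both the escaped form \? and the bracketed form [?] treat the question mark as a literal character.

```python
import re

text = "Hi. How are you? What time is it? Beep?"

# \? is a literal question mark; \s? consumes one optional whitespace after it.
escaped = re.sub(r"\?\s?", "?\n", text)

# Inside a character class the question mark needs no escaping.
bracketed = re.sub(r"[?]\s?", "?\n", text)

print(escaped)
```

Since only the question mark is matched, "Hi." stays on the same line as "How are you?". Splitting after the period as well would need a different pattern.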

Groovy : RegEx for matching Alphanumeric and underscore and dashes

I am working on Grails 1.3.6 application. I need to use Regular Expressions to find matching strings.
It needs to find whether a string has anything other than Alphanumeric characters or "-" or "_" or "*"
An example string looks like:
SDD884MMKG_JJGH1222
What I have come up with so far is:
String regEx = "^[a-zA-Z0-9*-_]+\$"
The problem with the above is that it doesn't detect special characters at the beginning or end of the string.
I had to add a "\" before the "$", or else it gives a compilation error:
- Groovy:illegal string body character after dollar sign;
Can anyone suggest a better RegEx to use in Groovy/Grails?
The problem is the unescaped hyphen in the middle of the character class. Fix it by moving the hyphen to the end:
String regEx = "^[a-zA-Z0-9*_-]+\$";
Or even shorter:
String regEx = "^[\\w*-]+\$";
Placing an unescaped - in the middle of the character class makes it define a range from * (ASCII 42) to _ (ASCII 95), so the class matches every character in that range.
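The effect of the accidental range is easy to see in a small Python sketch (character classes behave the same way here as in Java/Groovy). The names loose and strict and the "bad" sample string are made up for illustration.

```python
import re

# Hyphen between * and _ forms a range covering code points 42..95.
loose = re.compile(r"^[a-zA-Z0-9*-_]+$")
# Hyphen placed last in the class is a literal hyphen.
strict = re.compile(r"^[a-zA-Z0-9*_-]+$")

sample = "SDD884MMKG_JJGH1222"
bad = "SDD884@MMKG"  # '@' is code point 64, inside the accidental range

print(bool(loose.match(sample)), bool(strict.match(sample)))  # True True
print(bool(loose.match(bad)), bool(strict.match(bad)))        # True False
```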
In Groovy the $ character in a string is used for interpolation (e.g. Hello ${name}). Because these so-called GStrings are only processed when the string is enclosed in double quotes ("), you need the extra escaping there.
Groovy also lets you write strings without that feature by enclosing them in single quotes ('). The easiest way to write a regexp, though, is the slashy syntax with /.
assert "SDD884MMKG_JJGH1222" ==~ /^[a-zA-Z0-9*_-]+$/
See Regular Expressions for further "shortcuts".
The other points from @anubhava remain valid!
It's easier to reverse it:
// Matches any single character that is _not_ alnum, *, - or _; use it with find (=~) rather than a full match
String regEx = "[^a-zA-Z0-9*_-]"

Regex detect if a matched comma(,) does not lie in a regex

I am trying to figure out a way to determine whether my matched comma (,) lies inside a regex. Basically, I do not want to match the comma if it lies inside a regex.
The regex I have come up with is ,(?<!.+\/)(?!.+\/) but it's not quite working.
Any ideas?
I want to skip /some,regex/ but match any other commas.
Edit:
Live example: http://rubular.com/r/WjrwSnmzyP
Here is the regex that will work for you:
,(?!\s)(?=(?:(?:[^/]*\/){2})*[^/]*$)
Live Demo: http://rubular.com/r/37buDdg1tW
Explanation: it matches a comma that is followed by an EVEN number of forward slashes (/) up to the end of the input. A comma between two slashes is therefore NOT matched, since it is followed by an odd number of slashes, while commas outside are matched.
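Here is a rough Python sketch of the even-number-of-slashes lookahead. It leaves out the (?!\s) part of the pattern above to keep the focus on the slash counting, and the helper name outside_comma_positions is made up for illustration. It also assumes slashes only ever appear as regex delimiters.

```python
import re

# A comma followed by an even number of slashes up to the end of the string.
OUTSIDE_COMMA = re.compile(r",(?=(?:(?:[^/]*/){2})*[^/]*$)")


def outside_comma_positions(text):
    """Return the indexes of commas that are not between two slashes."""
    return [m.start() for m in OUTSIDE_COMMA.finditer(text)]


# Commas at indexes 1 and 11 are outside /c,d/; the one at index 6 is inside.
print(outside_comma_positions("a,b /c,d/ e,f"))  # [1, 11]
```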
A curious thing about regular expressions is that if you want to use them to ignore "something" that is within "something else", you need to match that "something else", prefer matches of it, and then either silently discard or reproduce those matches.
For example, in order to remove all commas from a string unless they are in a regular expression literal:
In Perl:
my $s = "/foo,bar/,baz";
$s =~ s{(/(?:[^/\\]|\\.)+/)|,}{\1}g;
In ECMAScript:
var s = "/foo,bar/,baz";
s = s.replace(/(\/([^\/\\]|\\.)+\/)|,/g, "$1");
or
s = s.replace(new RegExp("(/([^/\\\\]|\\\\.)+/)|,", "g"), "$1");
Note that I am capturing the match for the regular expression literal in the string value, and reproducing it (\1 or $1) if it matched. (If the other part of the alternation, the standalone comma, matched instead, the group did not participate and expands to the empty string, so this simple approach suffices here.)
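The same match-and-reproduce idea can be sketched in Python. This is only an illustration under the same assumptions as the Perl and ECMAScript versions; the function name strip_outside_commas is made up. A replacement function is used so that the unmatched group is handled explicitly.

```python
import re

# Branch 1 captures a whole /regex literal/ (allowing backslash escapes);
# branch 2 matches a bare comma outside such a literal.
PATTERN = re.compile(r"(/(?:[^/\\]|\\.)+/)|,")


def strip_outside_commas(text):
    # group(1) is None when the bare-comma branch matched, so the comma is dropped.
    return PATTERN.sub(lambda m: m.group(1) or "", text)


print(strip_outside_commas("/foo,bar/,baz"))  # /foo,bar/baz
```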
For further reading I recommend “Mastering Regular Expressions” by Jeffrey E. F. Friedl. Two rather enlightening example chapters, each from a different edition, are available for free online.

Regex to remove decimal values

Given a timestamp in ISO 8601 format below:
2012-04-21T01:56:00.581550
what regular expression would remove the decimal point and the fractional seconds? In other words, a regex that applies to the above and returns the following:
2012-04-21T01:56:00
This is probably very simple, but not being particularly familiar with regex I am unsure how to approach the solution. Thanks in advance for any assistance.
If you must use regex, you can use "[.][0-9]+$" and replace it with an empty string "".
It is easier to locate the last '.' and chop off the string at its index. In C#, that would be
myStr = myStr.Substring(0, myStr.LastIndexOf('.'));
Why do you want to use a regex? Plain string operations are enough.
In Python:
>>> "2012-04-21T01:56:00.581550".split(".")
['2012-04-21T01:56:00', '581550']
>>> "2012-04-21T01:56:00.581550".split(".")[0]
'2012-04-21T01:56:00'
This regex ^[\w\-:]+ will match everything up to, but not including, the period. You can use this to find the part of the time-stamp you want.
^ is the beginning of the string.
\w is any "word" character (letter, digit or underscore).
\- includes the hyphen.
: includes the colon.
Placing these in [] means matching only these characters.
The + means matching one or more instances of those characters.
Since the period (.) is not included, the regex will stop matching when it gets to that.
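As a small Python sketch of this extraction approach, using the timestamp from the question (the variable names are just for illustration):

```python
import re

ts = "2012-04-21T01:56:00.581550"

# Match word characters, hyphens and colons from the start of the string;
# matching stops at the first character outside the class (the period).
m = re.match(r"^[\w\-:]+", ts)
truncated = m.group(0) if m else ts

print(truncated)  # 2012-04-21T01:56:00
```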
s/\..*$//
It looks like you can assume there will only be one dot. The above sed expression finds the first dot, then replaces it and everything after it up to the end of the line with nothing.
Without sed: replace \..*$ with the empty string ""
\. is the literal period (have to escape it because . means any character)
.* means any and all characters
$ means end of line
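The removal approach can be tried in Python as well. This is a minimal sketch; the helper name strip_fraction is made up, and it assumes the timestamp contains at most one dot.

```python
import re


def strip_fraction(timestamp):
    """Drop the first period and everything after it, if there is one."""
    return re.sub(r"\..*$", "", timestamp)


print(strip_fraction("2012-04-21T01:56:00.581550"))  # 2012-04-21T01:56:00
print(strip_fraction("2012-04-21T01:56:00"))         # unchanged
```

The stricter pattern [.][0-9]+$ gives the same result on this input but only removes a trailing run of digits.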
Code:
$_ = '2012-04-21T01:56:00.581550';
s/\.\d*//;
print $_, "\n";
Test:
http://ideone.com/52hij
Output:
2012-04-21T01:56:00

Regex: finding a string with an undetermined amount of words

I have a tag that is like
tag="text textwithdot. text text"
followed by a further tag that would resemble
tag="text text text"
I wanted to use the following regular expression
tag="\w+"
but that only finds one word. How do I find the whole string within the quotes? What wildcard does that?
This should work for you:
tag="([^"]*)"
That basically means tag=" followed by zero or more characters that are not a double quote, followed by a double quote.
BTW: I'm assuming that there is no such thing as a tag that contains the double quote character. If there is such a thing, it would need some escaping rule applied to it and the regular expression would be more complicated.
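A short Python sketch of the negated-class pattern, assuming (as stated above) that tag values never contain a double quote. The sample text is taken from the question.

```python
import re

text = 'tag="text textwithdot. text text" tag="text text text"'

# [^"]* consumes everything up to, but not including, the closing quote.
values = re.findall(r'tag="([^"]*)"', text)

print(values)  # ['text textwithdot. text text', 'text text text']
```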
Also,
tag=['"]([^'"]*)['"]
if the tags could be quoted with either ' or "
You could use a non-greedy match-everything pattern:
tag="[\s\S]*?"
Or use . with the dot-matches-newline flag (assuming the value may contain \n).
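To see why the non-greedy quantifier matters, here is a small Python sketch. re.DOTALL is Python's dot-matches-newline flag, and the sample text with an embedded newline is made up for illustration.

```python
import re

text = 'tag="first\nline" tag="second"'

# Lazy: stop at the first closing quote after each tag=".
lazy = re.findall(r'tag="(.*?)"', text, flags=re.DOTALL)

# Greedy: run to the last quote in the input, swallowing the second tag.
greedy = re.findall(r'tag="(.*)"', text, flags=re.DOTALL)

print(lazy)    # ['first\nline', 'second']
print(greedy)  # ['first\nline" tag="second']
```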