Help with Regex patterns

I need some help with regex.
I have a pattern AB.*; this pattern should match strings like AB.CD, AB.CDX (AB.whatever) and so on. But it should NOT match strings like AB, AB.CD.CD, AB.CD., AB.CD.CD, that is, if it encounters a second dot in the string. What's the regex for this?
I have a pattern AB.**; this pattern should match strings like AB, AB.CD.CD, AB.CD., AB.CD.CD but NOT strings like AB.CD, AB.CDX, AB.whatever. What's the regex for this?
Thanks a lot.

Looks like you've got globs, not regular expressions. In a regex, a dot matches any character, and * makes the previous element match zero or more times.
1) AB\.[^.]*
Escape the first dot so it matches a literal dot, then match any character other than a dot, any number of times (anchor the pattern with ^ and $ if your tool looks for a match anywhere in the string).
2) "^(AB|AB\.[^.]*\.[^.]*)$"
This matches AB alone, or AB followed by .<stuff>.<stuff> (exactly two dots).
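A quick way to sanity-check both patterns is to run them against the sample strings from the question. The sketch below assumes Python's `re` module (the question names no language), writes the second pattern with balanced parentheses and both alternatives anchored, and uses `fullmatch` so the first pattern must cover the whole string rather than just a prefix of it.

```python
import re

# Sample strings from the question.
one_dot = ["AB.CD", "AB.CDX", "AB.whatever"]
other = ["AB", "AB.CD.CD", "AB.CD."]

# 1) AB, one literal dot, then any run of non-dot characters.
PATTERN_1 = re.compile(r"AB\.[^.]*")
# 2) AB alone, or AB followed by .<stuff>.<stuff> (exactly two dots).
PATTERN_2 = re.compile(r"^(AB|AB\.[^.]*\.[^.]*)$")


def matches_1(s: str) -> bool:
    # fullmatch requires the pattern to cover the whole string, like ^...$
    return PATTERN_1.fullmatch(s) is not None


def matches_2(s: str) -> bool:
    return PATTERN_2.search(s) is not None


for s in one_dot + other:
    print(f"{s:12} pattern1={matches_1(s)} pattern2={matches_2(s)}")
```

Note that the second pattern accepts exactly two dots; a string such as AB.CD.CD.CD would be rejected.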

http://www.regular-expressions.info/ contains lots of useful information for learning about regular expressions.

If your regex engine supports negative lookahead, you might try something like:
^AB\.[^.]+$
^AB(?!\.[^.]+$)
or, if you want to allow AB.:
^AB\.[^.]*$
^AB(?!\.[^.]*$)
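The lookahead patterns can be checked the same way. This is a Python sketch (the `re` module supports `(?!...)`); the variable names are made up for illustration. The negative-lookahead patterns only inspect what follows AB, so they would also accept input such as ABX.

```python
import re

ONE_DOT = re.compile(r"^AB\.[^.]+$")          # AB.<no further dots>, at least one char
NOT_ONE_DOT = re.compile(r"^AB(?!\.[^.]+$)")  # AB, unless the rest is .<no further dots>

# Variants that also treat "AB." as the one-dot case.
ONE_DOT_ALLOW_EMPTY = re.compile(r"^AB\.[^.]*$")
NOT_ONE_DOT_ALLOW_EMPTY = re.compile(r"^AB(?!\.[^.]*$)")

samples = ["AB.CD", "AB.CDX", "AB.whatever", "AB", "AB.CD.CD", "AB.CD.", "AB."]
for s in samples:
    print(f"{s:12} one_dot={bool(ONE_DOT.search(s))} "
          f"not_one_dot={bool(NOT_ONE_DOT.search(s))}")
```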

I don't find your question entirely clear; please comment here (or edit your question if you can't add comments) if I'm getting this wrong, but what I think you're looking for is:
1) matching strings "AB.AnyTextHereWithoutDots" but not "AB" or "AB.foo." etc
If so, a matching regex would be:
"^AB\.[^.]*$"
2) matching "AB" or "AB.something.something", i.e. with either no dots or two or more dots
If so, a matching regex would be something like:
"^AB(\..*\..*)?$" or "^AB\(\..*\..*\)\?$" (depending on the flavor of your regex engine)
As Douglas suggests, matching with globs would likely be easier.
And as spdenne suggests, find a good regex reference.

I tried this in vim. Here is the sample data:
AB.CD
AB.CDX
AB.whatever
AB
AB.CD.CD
AB.CD.
AB.CD.CD
Here are my regexes.
This captures all lines starting with AB followed by a literal dot, and then filters out all lines that have a second dot:
^AB\.[^.]*$
This captures all lines that are just AB (the part before the pipe), or lines that start with AB followed by at least two literal dots (escaped with a backslash):
^AB$\|^AB\..*\..*$
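For readers outside vim, here is a rough Python equivalent run over the same sample data. It assumes the intended second pattern is ^AB$\|^AB\..*\..*$, i.e. any text between and after the two literal dots, and it uses Python's bare | where vim needs \|.

```python
import re

# Sample data from the vim session (the repeated AB.CD.CD line is kept).
lines = """AB.CD
AB.CDX
AB.whatever
AB
AB.CD.CD
AB.CD.
AB.CD.CD""".splitlines()

# vim: ^AB\.[^.]*$
first = re.compile(r"^AB\.[^.]*$")
# vim (assumed form): ^AB$\|^AB\..*\..*$
second = re.compile(r"^AB$|^AB\..*\..*$")

first_hits = [line for line in lines if first.search(line)]
second_hits = [line for line in lines if second.search(line)]
print(first_hits)
print(second_hits)
```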

Related

trying to find the correct regular expression

I have the following cases that should match a regular expression. I've tried several combinations and have read a lot of answers, but I still have no clue how to solve it.
The rule is: find any combination of . inside a quoted string. At the moment I have the following regexp:
\"\w*((..)|(.))\w*\"
that covers most of the cases:
mmmas"A.F"asdaAA
196.34.45.."asd."#
".add"
sss"a.aa"sss
".."
"a.."
"a..a"
"..A"
but I'm still having problems with this one:
"WERA.HJJ..J"
I've been testing the regexp on the http://regexr.com/ site.
I would really appreciate any help with this.
Change your regex to
\"\w*(\.+\w*)+\"
Update: escape . to match the dot and not any character
From the question, it seems that you need to find every occurrence of one or more dot (along with optional word characters) inside a pair of quotes. The following regex would do this:
\"\w*(\.+\w*)+\"
In "WERA.HJJ..J", you have some word characters followed by a dot, which is followed by a sequence of word characters, again followed by dots and word characters. Your regex would only match one or two dots with an optional block of word characters on either side.
The dots in the regex are escaped to avoid them being matched against any character, since . is a metacharacter.
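Here is a Python check of that pattern against every sample in the question (assuming Python's `re`). In a raw string the \" escapes are unnecessary, and the group is made non-capturing, which does not change what matches.

```python
import re

# A quote, optional word chars, then one or more runs of (dots + optional word chars), a quote.
QUOTED_DOTS = re.compile(r'"\w*(?:\.+\w*)+"')

samples = ['mmmas"A.F"asdaAA', '196.34.45.."asd."#', '".add"', 'sss"a.aa"sss',
           '".."', '"a.."', '"a..a"', '"..A"', '"WERA.HJJ..J"']

for s in samples:
    m = QUOTED_DOTS.search(s)
    print(s, "->", m.group(0) if m else None)
```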

Get text using Regular Expression

I have the sentence as below:
First learning of regular expression.
And I want to extract only First learning and expression by means of regular expressions.
Where would I start?
Regular expressions are for pattern matching, which means we'd need to know a pattern that is to be matched.
If you literally just want those strings, you'd just use First learning and expression as your patterns.
As @orique says, this is kind of pointless; you don't need regex for that. If you want something more complicated, you'll need to explain what you're trying to match.
Regex is not usually used to match literal text like what you're doing, but instead is used to match patterns of text. If you insist on using regex, you'll have to match the trivial expression
(First learning|expression)
As already pointed out, it is unusual to match a literal string like you are asking, but more common to match patterns such as several word characters followed by a space character etc...
Here is a pattern to match several word characters (which are a-z, A-Z, 0-9 and _) followed by a space, followed by several more word characters, etc. It ends up capturing three groups. The first group will match the first two words, the second group the next two words, and the last group the fifth word and the preceding space.
$words = "First learning of regular expression.";
preg_match('/(\w+\s\w+)\s(\w+\s\w+)(\s\w+)/', $words, $matches);
// concatenate group 1 and group 3 with PHP's . operator
$result = $matches[1] . $matches[3];
I hope this matches your requirement.
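The same idea in Python, as a sketch for comparison (not the answerer's code): `re.search` plays the role of `preg_match`, and string concatenation uses + where PHP uses the . operator.

```python
import re

words = "First learning of regular expression."
# Group 1: first two words; group 2: next two words; group 3: a space plus the fifth word.
m = re.search(r"(\w+\s\w+)\s(\w+\s\w+)(\s\w+)", words)
result = m.group(1) + m.group(3) if m else None
print(result)
```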

regular expression no characters

I have this regular expression
([A-Z], )*
which should match something like
test, (with a space after the comma)
How do I change the regex so that it doesn't match if there are any characters after the space?
For example, if I had:
test, test
I'm looking to do something similar to
([A-Z], ~[A-Z])*
Cheers
Use the following regular expression:
^[A-Za-z]*, $
Explanation:
^ matches the start of the string.
[A-Za-z]* matches 0 or more letters, uppercase or lowercase; replace * with + to require 1 or more letters.
The comma and the space after it match a literal comma followed by a space.
$ matches the end of the string, so if there's anything after the comma and space then the match will fail.
As has been mentioned, you should specify which language you're using when you ask a Regex question, since there are many different varieties that have their own idiosyncrasies.
^([A-Z]+, )?$
The difference between mine and Donut's is that his will match ", " and fail for the empty string, while mine will match the empty string and fail for ", ". (Also, his accepts lowercase letters and mine doesn't. With mine you'll have to enable case-insensitivity in your regex function's options, but it follows your example.)
I am not sure which regex engine/language you are using, but there is often something like a negated character class, e.g. [^a-z], meaning "any character other than a lowercase letter".
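A small Python comparison of the two answers' patterns on a few inputs, including the empty string and a bare ", ". `re.IGNORECASE` is Python's flag for the case-insensitive option mentioned above; the variable names are illustrative.

```python
import re

LETTERS_COMMA = re.compile(r"^[A-Za-z]*, $")          # letters of either case, then ", "
OPTIONAL_GROUP = re.compile(r"^([A-Z]+, )?$")         # optional: uppercase letters, then ", "
OPTIONAL_GROUP_CI = re.compile(r"^([A-Z]+, )?$", re.IGNORECASE)

for s in ["test, ", "test, test", ", ", ""]:
    print(repr(s),
          bool(LETTERS_COMMA.search(s)),
          bool(OPTIONAL_GROUP.search(s)),
          bool(OPTIONAL_GROUP_CI.search(s)))
```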

How can I have two wildcards in this regex expression?

I'm trying to convert the following input text into a regex:
xx.*.aaa.bbb*
where the * characters are wildcards, i.e. they represent wildcards to me, not regex syntax.
Any suggestions, please?
Update - example inputs.
xx.zzzzzzzzz.aaa.bbb = match
xx.eee.aaa.bbbzzzz = match
xx.eee.aaa.bbb.zzzz = match
xx.aaa.bbb = not a match
You misunderstood the concept of * in Regular Expressions.
I think what you are looking for is:
xx\..*\.aaa\.bbb.*
The thing is:
In regex, . is not a literal dot; it means any character, so if you want to match a . you must escape it: \.
* means that the element preceding it will be matched zero or more times. So how do you emulate the wildcard you are looking for? Use .*, which matches any character zero or more times.
If you want to match the entire string exactly, and not just any substring that matches the pattern, you have to include ^ at the beginning and $ at the end, so your regex will be:
^xx\..*\.aaa\.bbb.*$
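To confirm the anchored pattern against the example inputs from the update, here is a short Python sketch (Python's `re` is an assumption; the question names no language).

```python
import re

WILDCARD_RE = re.compile(r"^xx\..*\.aaa\.bbb.*$")

# Example inputs from the question, with the expected outcome.
examples = {
    "xx.zzzzzzzzz.aaa.bbb": True,
    "xx.eee.aaa.bbbzzzz": True,
    "xx.eee.aaa.bbb.zzzz": True,
    "xx.aaa.bbb": False,
}

for text, expected in examples.items():
    got = WILDCARD_RE.search(text) is not None
    print(f"{text:22} match={got} expected={expected}")
```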
Try this expression:
^xx\.[^\.]+\.aaa\.bbb.*
Assuming that you're saying that * is a wildcard in the 'normal sense', and that your string isn't an attempt at regex, I'd say that xx\..+\.aaa\.bbb.+ is what you're after.
What you refer to as "wildcard, not regex syntax" is from globbing. It's a pattern matching technique that was popularized in the first Unix version in the late 1960s. Originally it was a separate program, called glob, that produced a result that could be piped to other programs. Now bash, MS-DOS and almost any shell have this feature built in. In globbing, * normally means match any character, any number of times.
The regex syntax is different. The .* idiom in regex is similar to * in globbing, but not exactly the same: normally, .* doesn't match line breaks. You usually have to enable single-line mode (called multiline mode in Ruby) if you want .* to match any character, any number of times.
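Python happens to ship a glob-to-regex translator in its standard library, `fnmatch.translate`, which makes the glob/regex relationship easy to see. This sketch applies the question's pattern both as a glob and as the translated regex. In recent Python versions the generated regex is wrapped in `(?s:...)`, i.e. DOTALL mode, so its .* also crosses line breaks, which is the difference described above.

```python
import fnmatch
import re

pattern = "xx.*.aaa.bbb*"
# translate() turns a shell-style glob into a regex string; the exact text
# differs between Python versions, so it is only printed, not relied upon.
regex = re.compile(fnmatch.translate(pattern))
print(regex.pattern)

for name in ["xx.zzzzzzzzz.aaa.bbb", "xx.eee.aaa.bbbzzzz",
             "xx.eee.aaa.bbb.zzzz", "xx.aaa.bbb"]:
    # fnmatchcase applies the glob directly, without case normalisation
    print(name, fnmatch.fnmatchcase(name, pattern), regex.match(name) is not None)
```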
* is not a wildcard; it means the preceding character is repeated zero or more times.
And the dot can be any character.
UPDATE:
You can try this
^xx\.[a-z]+\.aaa\.bbb\.?[a-z]*
and you can test it online, for example on Rubular.
[a-z] is a character class: inside the brackets you define which characters are allowed (or not allowed, using [^a-z]). So if you are only looking for lowercase letters, you can use [a-z].
The + means it has to be there at least once.
The \.? near the end means there can be a dot or not
The ^ at the beginning means to match at the start of the string
A nice tutorial (for Perl, but at least the basics are the same nearly everywhere) is the PerlReTut

Regular Expressions and negating a whole character group

I'm attempting something which I feel should be fairly obvious to me, but it's not. I'm trying to match a string which does NOT contain a specific sequence of characters. I've tried using [^ab], [^(ab)], etc. to match strings containing no 'a's or 'b's, or only 'a's, or only 'b's, or 'ba', but not match on 'ab'. The examples I gave won't match 'ab', it's true, but they also won't match 'a' alone, and I need them to. Is there some simple way to do this?
Using a character class such as [^ab] will match a single character that is not within the set of characters. (With the ^ being the negating part).
To match a string which does not contain the multi-character sequence ab, you want to use a negative lookahead:
^(?:(?!ab).)+$
And here is the above expression dissected in regex comment mode:
(?x) # enable regex comment mode
^ # match start of line/string
(?: # begin non-capturing group
(?! # begin negative lookahead
ab # literal text sequence ab
) # end negative lookahead
. # any single character
) # end non-capturing group
+ # repeat previous match one or more times
$ # match end of line/string
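The commented breakdown can be used almost verbatim. Below is a Python sketch using `re.VERBOSE`, the flag equivalent of the inline (?x), with a few test strings chosen for illustration.

```python
import re

# Tempered greedy token: consume one character at a time, but only where 'ab' does not start.
NO_AB = re.compile(r"""
    ^           # start of string
    (?:         # begin non-capturing group
      (?!ab)    # negative lookahead: 'ab' must not start here
      .         # consume any single character
    )+          # repeat the group one or more times
    $           # end of string
""", re.VERBOSE)

for s in ["a", "b", "ba", "bbaa", "ab", "cabd", ""]:
    print(repr(s), NO_AB.match(s) is not None)
```

Because of the +, the empty string is rejected; use * instead if it should be accepted.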
Use negative lookahead:
^(?!.*ab).*$
UPDATE: In the comments below, I stated that this approach is slower than the one given in Peter's answer. I've run some tests since then, and found that it's really slightly faster. However, the reason to prefer this technique over the other is not speed, but simplicity.
The other technique, described here as a tempered greedy token, is suitable for more complex problems, like matching delimited text where the delimiters consist of multiple characters (like HTML, as Luke commented below). For the problem described in the question, it's overkill.
For anyone who's interested, I tested with a large chunk of Lorem Ipsum text, counting the number of lines that don't contain the word "quo". These are the regexes I used:
(?m)^(?!.*\bquo\b).+$
(?m)^(?:(?!\bquo\b).)+$
Whether I search for matches in the whole text, or break it up into lines and match them individually, the anchored lookahead consistently outperforms the floating one.
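The two patterns should agree on which lines they return; only their speed differs. This Python sketch uses a tiny made-up text (the answer's Lorem Ipsum sample isn't included here) and shows both producing the same lines; because of \b, "quorum" does not count as the word "quo".

```python
import re

# Hypothetical stand-in for the test text.
text = "lorem ipsum\nquo vadis\nquorum sensus\ndolor quo sit\n"

anchored = re.compile(r"(?m)^(?!.*\bquo\b).+$")     # one lookahead per line
floating = re.compile(r"(?m)^(?:(?!\bquo\b).)+$")  # lookahead before every character

print(anchored.findall(text))
print(floating.findall(text))
```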
Yes, it's called negative lookahead. It goes like this: (?!regex here). So abc(?!def) will match abc not followed by def, which means it'll match abce, abc, abck, etc.
Similarly there is positive lookahead - (?=regex here). So abc(?=def) will match abc followed by def.
There are also negative and positive lookbehind - (?<!regex here) and (?<=regex here) respectively
One point to note is that the negative lookahead is zero-width. That is, it does not count as having taken any space.
So it may look like a(?=b)c will match "abc" but it won't. It will match 'a', then the positive lookahead with 'b' but it won't move forward into the string. Then it will try to match the 'c' with 'b' which won't work. Similarly ^a(?=b)b$ will match 'ab' and not 'abb' because the lookarounds are zero-width (in most regex implementations).
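The lookaround behavior described above can be checked directly. This Python sketch exercises each example from the answer; Python's `re` supports all four lookaround forms, though its lookbehind must be fixed-width.

```python
import re

# abc(?!def): 'abc' not followed by 'def'
print(re.search(r"abc(?!def)", "abck"))     # a match object for 'abc'
print(re.search(r"abc(?!def)", "abcdef"))   # None
# abc(?=def): 'abc' followed by 'def'; the lookahead text is not part of the match
print(re.search(r"abc(?=def)", "abcdef").group(0))  # abc
# a(?=b)c never matches: after the lookahead succeeds, 'c' is tried where 'b' is
print(re.search(r"a(?=b)c", "abc"))         # None
print(re.search(r"^a(?=b)b$", "ab"))        # a match object for 'ab'
print(re.search(r"^a(?=b)b$", "abb"))       # None
# Lookbehind: (?<=x)y needs an 'x' right before 'y'; (?<!x)y forbids it
print(re.search(r"(?<=x)y", "xy"), re.search(r"(?<!x)y", "xy"))
```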
abc(?!def) will match abc not followed by def. So it'll match abce, abc, abck, etc. What if I want neither def nor xyz? Will it be abc(?!(def)(xyz))?
I had the same question and found a solution:
abc(?:(?!def))(?:(?!xyz))
These non-capturing groups are combined with AND, so this should do the trick. Hope it helps.
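A Python sketch comparing the stacked lookaheads with the guess from the comment. Because lookaheads are zero-width, both negative lookaheads test the same position right after abc; the guess (?!(def)(xyz)) instead only forbids the combined sequence defxyz.

```python
import re

# Both lookaheads are checked at the position right after 'abc'.
NEITHER = re.compile(r"abc(?:(?!def))(?:(?!xyz))")
# The guess from the comment: rejects only 'abc' followed by 'defxyz'.
GUESS = re.compile(r"abc(?!(def)(xyz))")

for s in ["abck", "abcdef", "abcxyz", "abcdefxyz"]:
    print(s, bool(NEITHER.match(s)), bool(GUESS.match(s)))
```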
Using a regex as you described is the simple way (as far as I am aware). If you want a range you could use [^a-f].
Simplest way is to pull the negation out of the regular expression entirely:
if (!userName.matches("^([Ss]ys)?admin$")) { ... }
Just search for "ab" in the string then negate the result:
!/ab/.test("bamboo"); // true
!/ab/.test("baobab"); // false
It seems easier and should be faster too.
In this case I might just simply avoid regular expressions altogether and go with something like:
if (StringToTest.IndexOf("ab") < 0)
{
    // do stuff
}
This is likely also going to be much faster (a quick test vs regexes above showed this method to take about 25% of the time of the regex method). In general, if I know the exact string I'm looking for, I've found regexes are overkill. Since you know you don't want "ab", it's a simple matter to test if the string contains that string, without using regex.
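Both regex-free ideas translate directly. This Python sketch (function names are hypothetical) shows the negated search and a plain substring test agreeing on the examples from the JavaScript answer; no timings are claimed here.

```python
import re


def has_no_ab_regex(s: str) -> bool:
    # Search for the forbidden sequence, then negate the result.
    return re.search("ab", s) is None


def has_no_ab_plain(s: str) -> bool:
    # No regex at all: a substring test, analogous to IndexOf("ab") < 0.
    return "ab" not in s


for s in ["bamboo", "baobab", "a", "ba"]:
    print(s, has_no_ab_regex(s), has_no_ab_plain(s))
```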
The regex [^ab] will match, for example, 'ab ab ab ab' but not 'ab', because it matches a single character that is neither 'a' nor 'b', here the space ' '.
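What [^ab] actually finds is easy to check; this Python sketch prints the first character it matches in each input.

```python
import re

# [^ab] is a character class: it matches ONE character that is neither 'a' nor 'b'.
m = re.search(r"[^ab]", "ab ab ab ab")
print(repr(m.group(0)))            # ' ' (the first space)
print(re.search(r"[^ab]", "ab"))   # None: every character is 'a' or 'b'
```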
What language/scenario do you have? Can you subtract results from the original set, and just match ab?
If you are using GNU grep and are parsing input, use the '-v' flag to invert your results, returning all non-matching lines. Other regex tools have a similar 'return non-matches' option.
If I understand correctly, you want everything except for those items which contain 'ab' anywhere.