Trying to convert a wildcard pattern into a regex
I'm trying to get the following input text converted into a regex:
xx.*.aaa.bbb*
where each * is a wildcard in the everyday sense (any run of characters), not regex syntax.
Any suggestions, please?
Update - example inputs.
xx.zzzzzzzzz.aaa.bbb = match
xx.eee.aaa.bbbzzzz = match
xx.eee.aaa.bbb.zzzz = match
xx.aaa.bbb = not a match
You misunderstood the concept of * in Regular Expressions.
I think what you are looking for is:
xx\..*\.aaa\.bbb.*
The thing is:
A . is not a literal dot. It means any character, so if you want to match a literal . you must escape it: \.
* means that the character that precedes it will be matched zero or more times, so how do you emulate the wildcard you are looking for? Use .*, which matches any character zero or more times.
If you want to match the entire string exactly, and not just any substring that matches the pattern, you have to include ^ at the beginning and $ at the end, so your regex will be:
^xx\..*\.aaa\.bbb.*$
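If you have many such patterns, the same conversion can be done mechanically: escape everything, then turn each escaped * back into .*. Here is a minimal Python sketch (the helper name wildcard_to_regex is made up for illustration, and it assumes * is the only wildcard character you care about):

```python
import re

def wildcard_to_regex(pattern: str) -> str:
    # Hypothetical helper: escape every regex metacharacter (so . becomes \.),
    # then turn each escaped \* back into .* ("any characters, any number of times").
    return "^" + re.escape(pattern).replace(r"\*", ".*") + "$"

regex = wildcard_to_regex("xx.*.aaa.bbb*")
print(regex)  # ^xx\..*\.aaa\.bbb.*$

# The example inputs from the question
for s in ["xx.zzzzzzzzz.aaa.bbb", "xx.eee.aaa.bbbzzzz",
          "xx.eee.aaa.bbb.zzzz", "xx.aaa.bbb"]:
    print(s, bool(re.match(regex, s)))
```

The first three inputs print True and xx.aaa.bbb prints False, matching the expectations in the question.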
Try this expression:
^xx\.[^\.]+\.aaa\.bbb.*
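The difference from the .* version is that [^\.]+ (inside a character class the dot is already literal, so [^.]+ means the same) refuses to let the wildcard part contain a dot. A small Python sketch to illustrate; the input xx.a.b.aaa.bbb is a made-up case, not one from the question:

```python
import re

dotless = re.compile(r"^xx\.[^\.]+\.aaa\.bbb.*")  # middle part: one or more non-dot characters
anything = re.compile(r"^xx\..*\.aaa\.bbb.*")     # middle part: any text, dots included

# Both accept the matching examples from the question...
for s in ["xx.zzzzzzzzz.aaa.bbb", "xx.eee.aaa.bbbzzzz", "xx.eee.aaa.bbb.zzzz"]:
    assert dotless.match(s) and anything.match(s)

# ...but they disagree when the wildcard part itself contains a dot.
s = "xx.a.b.aaa.bbb"  # hypothetical input
print(bool(dotless.match(s)), bool(anything.match(s)))  # False True
```

Which one you want depends on whether your wildcard is allowed to span dots.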
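Assuming that you're saying that * is a wildcard in the 'normal sense', and that your string isn't an attempt at regex, I'd say that xx\..+\.aaa\.bbb.* is what you're after. (The .+ in the middle requires at least one character there; the end uses .* because the first example has nothing after bbb.)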
What you refer to as "wildcard, not regex syntax" is from globbing. It's a pattern matching technique that was popularized in the first versions of Unix, around 1970. Originally it was a separate program, called glob, that expanded the patterns and passed the result on to other programs. Now bash, MS-DOS and almost every other shell have this feature built in. In globbing, * normally means match any character, any number of times.
The regex syntax is different. The .* idiom in regex is similar to * in globbing, but not exactly the same. Normally, .* doesn't match line breaks. You usually have to enable single-line mode (called multiline mode in Ruby) if you want .* to match any character, any number of times.
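Python's standard library happens to ship both flavors, which makes the difference easy to see. A short sketch (Python is used only for illustration; the two-line string is a made-up example):

```python
import fnmatch
import re

# Shell-style globbing: fnmatchcase() compares a name against a glob pattern
# case-sensitively, with * meaning "any characters, any number of times".
for s in ["xx.zzzzzzzzz.aaa.bbb", "xx.eee.aaa.bbbzzzz",
          "xx.eee.aaa.bbb.zzzz", "xx.aaa.bbb"]:
    print(s, fnmatch.fnmatchcase(s, "xx.*.aaa.bbb*"))

# Regex: .* stops at line breaks unless DOTALL (Python's name for
# "single-line" mode) is switched on.
text = "start\nend"
print(re.match(r"start.*end", text))                         # None
print(re.match(r"start.*end", text, re.DOTALL) is not None)  # True
```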
* is not a wildcard; it means the preceding character is repeated zero or more times.
And the dot can be any character.
UPDATE:
You can try this
^xx\.[a-z]+\.aaa\.bbb\.?[a-z]*
and you can test it online, for example on Rubular.
The [a-z] parts are character classes: inside the brackets you define which characters are allowed (or, using [^a-z], which are not allowed). So if you are only looking for lowercase letters, you can use [a-z].
The + means the preceding item has to be there at least once.
The \.? near the end means there can be a dot or not
The ^ at the beginning means to match at the start of the string
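To try the pattern from the update without an online tester, here is a Python sketch (Python's re behaves like Ruby's for this simple pattern; the uppercase input is a made-up extra case):

```python
import re

pattern = re.compile(r"^xx\.[a-z]+\.aaa\.bbb\.?[a-z]*")

for s in ["xx.zzzzzzzzz.aaa.bbb", "xx.eee.aaa.bbbzzzz",
          "xx.eee.aaa.bbb.zzzz", "xx.aaa.bbb",
          "xx.EEE.aaa.bbb"]:  # uppercase middle part: [a-z] rejects it
    print(s, bool(pattern.match(s)))
```

Since the pattern has no $ at the end, re.match only needs a prefix of the string to fit it.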
A nice tutorial (written for Perl, but the basics are the same nearly everywhere) is perlretut.
Related
I'm wondering: is there a symbol for any number (including zero) of any characters?
.*
. is any char, * means repeated zero or more times.
You can use this regular expression, which matches any whitespace or non-whitespace character (that is, any character, including line breaks) as many times as possible, down to and including zero times:
[\s\S]*
This lazy variant matches as few characters as possible, but as many as necessary for the rest of the expression to match:
[\s\S]*?
For example, the regex [\s\S]*?B matches aB in aBaaaaB, whereas the regex [\s\S]*B matches all of aBaaaaB.
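The same example run through Python's re module (chosen only for illustration; the two-line string is a made-up extra case showing that [\s\S] also crosses line breaks):

```python
import re

text = "aBaaaaB"
lazy = re.search(r"[\s\S]*?B", text).group()   # stops at the first B
greedy = re.search(r"[\s\S]*B", text).group()  # runs on to the last B
print(lazy, greedy)  # aB aBaaaaB

# Unlike a plain . (without DOTALL), the class [\s\S] also covers line breaks.
print(repr(re.search(r".*", "a\nb").group()))       # 'a'
print(repr(re.search(r"[\s\S]*", "a\nb").group()))  # 'a\nb'
```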
Do you mean
.*
. any character except a newline; in DOTALL mode it matches newlines too
* any amount of the preceding expression, including 0 times
I would use .*. The . matches any character, and * signifies 0 or more occurrences. You might need to turn on DOTALL mode for . to match newlines as well.
Yes, there is one, it's the asterisk: *
a* // looks for 0 or more instances of "a"
This should be covered in any Java regex tutorial or documentation that you look up.
I am new to regular expressions. I am trying to construct a regular expression where
the first three characters must be letters and the rest of the string can be any characters. If the part of the string after the first three characters contains &, then this part should start and end with ".
I was able to construct ^[a-z]{3}, but I'm stuck at the conditional part.
For example abcENT and abc"E&T" are valid strings but not abcE&T.
Can this be done in a single expression?
In most regex flavors, you can use simple lookaheads to check whether some text is or isn't present somewhere to the right of the current location, and with the alternation operator | you can check for alternatives.
So, we basically have 2 alternatives: either there is a & somewhere in the string after the first 3 letters, or there isn't. Thus, we can use
^[A-Za-z]{3}(?:(?=.*&)".*"|(?!.*&).*)$
See the regex demo
Details:
^ - start of string
[A-Za-z]{3} - 3 letters
(?:(?=.*&)".*"|(?!.*&).*) - Either of the two alternatives:
(?=.*&)".*" - if there is a & somewhere in the string ((?=.*&)) match ", then any 0+ characters, and then "
| - or
(?!.*&).* - if there is no & ((?!.*&)) in the string, just match any 0+ chars up to the...
$ - end of string.
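Python's re supports these lookaheads too, so the lookahead version can be checked against the examples from the question with a quick sketch:

```python
import re

# The lookahead-based pattern from the answer above
pattern = re.compile(r'^[A-Za-z]{3}(?:(?=.*&)".*"|(?!.*&).*)$')

for s in ['abcENT', 'abc"E&T"', 'abcE&T']:
    print(s, bool(pattern.match(s)))
# abcENT True
# abc"E&T" True
# abcE&T False
```

The conditional (?(?=...)...) form below is not available in Python's re, which only supports conditionals on group numbers or names.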
In PCRE, or .NET, or some other regex flavors, you have access to the conditional construct. Here is a PCRE demo:
^[A-Za-z]{3}(?(?=.*&)".*"|.*)$
The (?(?=.*&)".*"|.*) means:
(?(?=.*&) - if there is a & after any 0+ characters...
".*" - match "anything here"-like strings
| - or, if there is no &
.* - match any 0+ chars from the current position (i.e. after the first 3 alphabets).
A conditional statement could be built with | and groups, but it will probably be complicated.
^[a-z]{3}([^&]*$|".*"$)
You might also consider using plain old string manipulation for this task; it will probably be simpler.
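For instance, a plain string version might look like this Python sketch (the function name is_valid is hypothetical, and it assumes "letters" means ASCII letters in either case, as in [A-Za-z]):

```python
def is_valid(s: str) -> bool:
    """Hypothetical helper: 3 ASCII letters, then a tail that must be
    wrapped in double quotes if it contains an ampersand."""
    head, tail = s[:3], s[3:]
    # The first three characters must all be ASCII letters.
    if len(head) != 3 or not (head.isascii() and head.isalpha()):
        return False
    # A tail containing & must start and end with a double quote.
    if "&" in tail:
        return tail.startswith('"') and tail.endswith('"')
    # Any other tail (including an empty one) is fine.
    return True

for s in ['abcENT', 'abc"E&T"', 'abcE&T']:
    print(s, is_valid(s))
```

Each rule is a separate, readable check, which is easier to extend later than a single regex with lookaheads.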
Yes, this is possible. It is not really an if, but in your case you can make an "or" with a regex capturing group. Your regex would look something like this:
^[A-Za-z]{3}(\".*\"|[^&]*)$
P.S. Here is a good site to test and learn these things:
https://regex101.com/
The expression itself will depend on the regexp parser you use. If you're using Python, shell, vim, boost, etc., the same symbol could have different meanings.
I would try the following:
$ echo 'abc"&def"' | grep -E '^[a-zA-Z]{3}(".*&.*"|[^&]*)$'
abc"&def"
Regular expressions don't necessarily support conditionals as in 'if'; to achieve this in the general case you have to state your conditions as alternatives. (But see Wiktor's comment: depending on your regex engine, there might be conditionals available.)
For a relatively basic solution you might try something like this:
^[a-z]{3}([^&]*|".*")$
Which says: after three letters, there should be either a string of any length with no ampersand (&), OR a string starting and ending with a double quote (").
I have this regular expression
([A-Z], )*
which should match something like
test, (with a space after the comma)
How do I change the regex so that it doesn't match if there are any characters after the space?
For example, it shouldn't match if I had:
test, test
I'm looking to do something similar to
([A-Z], ~[A-Z])*
Cheers
Use the following regular expression:
^[A-Za-z]*, $
Explanation:
^ matches the start of the string.
[A-Za-z]* matches 0 or more letters (upper- or lowercase); replace * with + to require 1 or more letters.
The ", " part matches a comma followed by a space.
$ matches the end of the string, so if there's anything after the comma and space then the match will fail.
As has been mentioned, you should specify which language you're using when you ask a Regex question, since there are many different varieties that have their own idiosyncrasies.
^([A-Z]+, )?$
The difference between mine and Donut's is that his will match ", " (a lone comma and space) and fail for the empty string, whereas mine will match the empty string and fail for ", ". Also, his accepts lowercase letters and mine doesn't: with mine you'll have to add case-insensitivity to the options of your regex function, but it is in line with your example.
I am not sure which regex engine/language you are using, but there is often something like a negated character class, e.g. [^a-z], meaning "any character other than a lowercase letter".
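A quick Python sketch comparing the two answers side by side (the variable names are made up; re.IGNORECASE stands in for "add case-insensitivity to the options"):

```python
import re

letters_star = re.compile(r"^[A-Za-z]*, $")                     # Donut's answer
optional_group = re.compile(r"^([A-Z]+, )?$", re.IGNORECASE)    # the alternative answer

for s in ["test, ", "test, test", ", ", ""]:
    print(repr(s), bool(letters_star.match(s)), bool(optional_group.match(s)))
# 'test, '      True  True
# 'test, test'  False False
# ', '          True  False
# ''            False True
```

Both reject "test, test"; they only differ on the two edge cases described above.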
What does this regex do? I know it replaces digits in the filename (but not in the extension) with #.
Regex r = new Regex("(?<!\\.[0-9a-z]*)[0-9]");
return r.Replace(sz, "#");
How do I make it replace only 5 digits, so that it converts
"1111111111.000" to "11111#####.000"?
I have tried this, and it works: how about changing the general pattern to use a positive lookahead instead? That way, it should work:
[0-9a-z](?=[0-9a-z]{0,4}\.)
Basically, this finds any (alphanumeric) character followed by up to four other alphanumeric characters and a period, so it matches the last five characters before the period, one at a time. It's hellishly inefficient, though, and it needs an engine that supports lookahead; variable-width lookahead is widely supported (it is lookbehind that many engines restrict).
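Because this version only uses a lookahead, it also runs in Python's re, which rejects the variable-width lookbehind of the original pattern. A quick sketch (the shorter file name 123.txt is a made-up extra case):

```python
import re

# Replace a character only if at most four more [0-9a-z] characters and then
# a dot follow it, i.e. the last five characters before the dot.
pattern = r"[0-9a-z](?=[0-9a-z]{0,4}\.)"

print(re.sub(pattern, "#", "1111111111.000"))  # 11111#####.000
print(re.sub(pattern, "#", "123.txt"))         # ###.txt
```

The extension is left alone because no dot follows it.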
Try changing the asterisk after the closing bracket to "{5}" (no quotes). This depends a bit on your regex environment's feature set, but that is a common syntax.
The regex matches a digit (0 to 9) that is not preceded by a dot followed by any number of characters from 0 to 9 or a to z.
The Replace call is what performs the multiple replacements, so you want the Replace overload that takes an extra count parameter.
So the code would look like this:
Regex r = new Regex("(?<!\\.[0-9a-z]*)[0-9]");
return r.Replace(sz, "#", 5);
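One thing worth checking: a count limit usually applies to the first matches counted from the left. Python's re.sub has an analogous count parameter, and this sketch shows the effect. Python can't compile the variable-width lookbehind (?<!\.[0-9a-z]*), so the pattern here is a hypothetical stand-in that matches a digit still followed by a dot later on (i.e. not in the extension):

```python
import re

# Hypothetical stand-in for the C# pattern: a digit that still has
# [0-9a-z]* and then a dot after it, so digits in the extension are skipped.
pattern = r"[0-9](?=[0-9a-z]*\.)"

print(re.sub(pattern, "#", "1111111111.000"))           # ##########.000
print(re.sub(pattern, "#", "1111111111.000", count=5))  # #####11111.000
```

I'd expect the .NET count overload to behave the same way (counting matches from the start of the string), which would give "#####11111.000" rather than the requested "11111#####.000", but that is worth verifying in .NET itself.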
I need some help with regex.
I have a pattern AB.*; this pattern should match strings like AB.CD, AB.CDX (AB.whatever) and so on. But it should NOT match strings like AB, AB.CD.CD, AB.CD., AB.CD.CD, that is, if it encounters a second dot in the string. What's the regex for this?
I have a pattern AB.**; this pattern should match strings like AB, AB.CD.CD, AB.CD., AB.CD.CD but NOT strings like AB.CD, AB.CDX, AB.whatever. What's the regex for this?
Thanks a lot.
Looks like you've got globs, not regular expressions. A dot matches any character, and * makes the previous element match 0 or more times.
1) AB\.[^.]*
Escape the first dot so it matches a literal dot, and then match any character other than a dot, any number of times.
2) "^(AB|AB\.[^.]*\.[^.]*)$"
This matches AB or AB followed by .<stuff>.<stuff>
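A quick Python sketch checking both patterns (with ^ and $ anchors added to the first one) against the sample strings from the question; the variable names are made up:

```python
import re

# Part 1: AB, a literal dot, then anything without a further dot.
one_dot = re.compile(r"^AB\.[^.]*$")
# Part 2: plain AB, or AB followed by .<stuff>.<stuff> (exactly two dots).
ab_or_two_dots = re.compile(r"^(AB|AB\.[^.]*\.[^.]*)$")

samples = ["AB.CD", "AB.CDX", "AB.whatever", "AB", "AB.CD.CD", "AB.CD."]
for s in samples:
    print(f"{s:12} part1={bool(one_dot.match(s))} part2={bool(ab_or_two_dots.match(s))}")
```

On these samples the two patterns are complementary. Note that the second one allows exactly two dots, so something like AB.CD.CD.CD matches neither.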
http://www.regular-expressions.info/ contains lots of useful information for learning about regular expressions.
If your regex engine supports negative lookahead you might try something like:
^AB\.[^.]+$
^AB(?!\.[^.]+$)
Or, if you want to allow AB., use:
^AB\.[^.]*$
^AB(?!\.[^.]*$)
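Python's re supports negative lookahead, so the first pair can be tried like this (a sketch; the variable names are made up). Note that the second pattern has no end anchor, so it also accepts any other string starting with AB, such as ABX:

```python
import re

only_one_dot = re.compile(r"^AB\.[^.]+$")
not_one_dot = re.compile(r"^AB(?!\.[^.]+$)")  # no trailing $: re.match checks a prefix

samples = ["AB.CD", "AB.CDX", "AB.whatever", "AB", "AB.CD.CD", "AB.CD."]
for s in samples:
    print(s, bool(only_one_dot.match(s)), bool(not_one_dot.match(s)))
```

The first three samples print True False, and the last three print False True.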
I don't find your question entirely clear; please comment here (or edit your question if you can't add comments) if I'm getting this wrong, but what I think you're looking for is:
1) matching strings "AB.AnyTextHereWithoutDots" but not "AB" or "AB.foo." etc
If so a matching regex would be:
"^AB\.[^.]*$"
2) matching "AB" or "AB.something.something" with either none or two or more dots
If so a matching regex would be something like:
"^AB(\..*\..*)?$" or "^AB\(\..*\..*\)\?$" (depending on the flavor of your regex engine)
As Douglas suggests, matching with globs would likely be easier.
And as spdenne suggests, find a good regex reference.
I tried this in vim. Here is the sample data:
AB.CD
AB.CDX
AB.whatever
AB
AB.CD.CD
AB.CD.
AB.CD.CD
Here are my regexes.
This captures all lines starting with AB followed by a literal dot, and then filters out all lines that have a second dot:
^AB\.[^.]*$
This captures all lines that are just AB (the part before the pipe) or that start with AB followed by two literal dots (escaped with a backslash):
^AB$\|^AB\..*\..*$