With /\escape/ I can escape special regex right? But why isn't working?
I'm trying to search specific numbers from the beginning which start with |something in the middle have numbers only [0-9] and ends with | again.
Also have other string etc from left and from the right like so left|something[0-9]|right
This is what I've done, but is not working
/\|/234123[0-9]/\|/
\ only escapes the next character, so the second forward slash is ending the regular expression. Instead, you want this:
/\|something[0-9]\|/
You have to make sure that something is escaped correctly.
Note that if you need to match any number not just a digit, you need [0-9]+.
What would probably help you the most would be the right tool for the job:
https://regex101.com/r/ZdjhCE/2
You'll still have to set your language, as regex are similar between languages, but unluckily not 100% identical.
Related
I am trying to regex the following string:
https://www.amazon.com/Tapps-Top-Apps-and-Games/dp/B00VU2BZRO/ref=sr_1_3?ie=UTF8&qid=1527813329&sr=8-3&keywords=poop
I want only B00VU2BZRO.
This substring is always going to be a 10 characters, alphanumeric, preceded by dp/.
So far I have the following regex:
[d][p][\/][0-9B][0-9A-Z]{9}
This matches dp/B00VU2BZRO
I want to match only B00VU2BZRO with no dp/
How do I regex this?
Here is one regex option which would produce an exact match of what you want:
(?<=dp\/)(.*)(?=\/)
Demo
Note that this solution makes no assumptions about the length of the path fragment occurring after dp/. If you want to match a certain number of characters, replace (.*) with (.{10}), for example.
Depending on your language/method of application, you have a couple of options.
Positive look behind. This will make your regex more complicated, but will make it match what you want exactly:
(<=dp/)[0-9A-Z]{10}
The construct (<=...) is called a positive look behind. It will not consume any of the string, but will only allow the match to happen if the pattern between the parens is matched.
Capture group. This will make the regex itself slightly simpler, but will add a step to the extraction process:
dp/([0-9A-Z]{10})
Anything between plain parens is a capture group. The entire pattern will be matched, including dp/, but most languages will give you a way of extracting the portion you are interested in.
Depending on your language, you may need to escape the forward slash (/).
As an aside, you never need to create a character class for single characters: [d][p][\/] can equally well be written as just dp\/.
I'm trying to mass update a web app, I need to create a regex that matches:
lang::id(ALLCHARACTERS]
Can someone assist me with this? I'm not good with regex. I'm pretty sure it can start like:
lang\:\:\(WHAT GOES HERE\]
Something like this would work:
lang::id\([^]]*]
This will match a literal lang::id\(, followed by zero or more of any character other than ], followed by a literal ].
Note that the only character that really needs to be escaped is the open parenthesis.
lang::id\(.*]
The . means any single character, and then * repeats it zero->N times. Make sure to escape the ( since it is used inside regex and is a special char for them, so escaping it with \ is needed, or the regex will probably complain about unbalanced parenthesis.
If you wanted it to not include all characters, you can add a smaller regex in place of the .*. This way you can break the regex down into smaller chunks which help make it easier to understand and develop for some complex rules.
im trying to write a regex finds all the characters that have
exactly 3 capital letters on both their sides
The following regex finds all the characters that have exactly 3 capital letters on the left side of the char, and 3 (or more) on the right:
'(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3})'
When trying to limit the right side to no more then 3 capitals using the regex:
'(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3})(?![A-Z])'
i get no results, there seems to be a fail when adding the (?![A-Z]) to the first regex.
can someone explain me the problem and suggest a way to solve it?
Thanks.
You need to put the negative lookahead inside the positive one:
(?<![A-Z])[A-Z]{3}.(?=[A-Z]{3}(?![A-Z]))
You can do that with the lookbehind, too:
(?<=(?<![A-Z])[A-Z]{3}).(?=[A-Z]{3}(?![A-Z]))
It doesn't violate the "fixed-length lookbehind" rule because lookarounds themselves don't consume any characters.
EDIT (about fixed-length lookbehind): Of all the flavors that support lookbehind, Python is the most inflexible. In most flavors (e.g. Perl, PHP, Ruby 1.9+) you could use:
(?<=^[A-Z]{3}|[^A-Z][A-Z]{3}).
...to match a character preceded by exactly three uppercase ASCII letters. The first alternative - ^[A-Z]{3} - starts looking three positions back, while the second - [^A-Z][A-Z]{3} - goes back exactly four positions. In Java, you can reduce that to:
(?<=(^|[^A-Z])[A-Z]{3}).
...because it does a little extra work at compile time to figure out that the maximum lookbehind length will be four positions. And in .NET and JGSoft, anything goes; if it's legal anywhere, it's legal in a lookbehind.
But in Python, a lookbehind subexpression has to match a single, fixed number of characters. If you've butted your head against that limitation a few times, you might not expect something like this to work:
(?<=(?<![A-Z])[A-Z]{3}).
At least I didn't. It's even more concise than the Java version; how can it work in Python? But it does work, in Python and in every other flavor that supports lookbehind.
And no, there are no similar restrictions on lookaheads, in any flavor.
Taking out the positive lookahead worked for me.
(?<![A-Z])[A-Z]{3}(.)([A-Z]{3})(?![A-Z])
'ABCdDEF' 'ABCfDEF' 'HHHhhhHHHH' 'jjJJjjJJJ' JJJjJJJ
matches
ABCdDEF
ABCfDEF
JJJjJJJ
I'm not sure how the regexp engines should work with multiple lookahead assertions, but the one you're using may have its own opinion on that.
You could as well use a single assertion as follows:
'(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3}[^A-Z])'
The same with lookbehind:
'(?<=[^A-Z][A-Z]{3})(.)(?=[A-Z]{3}[^A-Z])'
This will have a problem matching the pattern in the beginning and in the end of the line.
I can't think of a proper solution, but there can be a dirty trick: for instance, add a space (or something else) in the beginning and the end of the whole line, then perform the matching.
$ echo 'ABCdDEF ABCfDEF HHHhhhHHHH AAAaAAAbAAA jjJJJJjJJJ JJJjJJJ' | sed 's/.*/ & /' | grep -oP '(?<=[^A-Z][A-Z]{3})(\S)(?=[A-Z]{3}[^A-Z])'
d
f
a
b
j
Note that I changed (.) to (\S) in the middle, change it back if you want the space to match.
P.S. Are you solving The Python Challenge? :)
Since the look ahead pattern is the same as the look behind pattern, you could also use the continue anchor \G:
/(?:[A-Z]{3}|\G[A-Z]*)(.)[A-Z]{3}/
A match is returned if three capitals precede a single character or where the last match left off (optionally followed by other capitals).
I'm sure this is a simple question for someone at ease with regular expressions:
I need to match everything up until the character #
I don't want the string following the # character, just the stuff before it, and the character itself should not be matched. This is the most important part, and what I'm mainly asking. As a second question, I would also like to know how to match the rest, after the # character. But not in the same expression, because I will need that in another context.
Here's an example string:
topics/install.xml#id_install
I want only topics/install.xml. And for the second question (separate expression) I want id_install
First expression:
^([^#]*)
Second expression:
#(.*)$
[a-zA-Z0-9]*[\#]
If your string contains any other special characters you need to add them into the first square bracket escaped.
I don't use C#, but i will assume that it uses pcre... if so,
"([^#]*)#.*"
with a call to 'match'. A call to 'search' does not need the trailing ".*"
The parens define the 'keep group'; the [^#] means any character that is not a '#'
You probably tried something like
"(.*)#.*"
and found that it fails when multiple '#' signs are present (keeping the leading '#'s)?
That is because ".*" is greedy, and will match as much as it can.
Your matcher should have a method that looks something like 'group(...)'. Most matchers
return the entire matched sequence as group(0), the first paren-matched group as group(1),
and so forth.
PCRE is so important i strongly encourage you to search for it on google, learn it, and always have it in your programming toolkit.
Use look ahead and look behind:
To get all characters up to, but not including the pound (#): .*?(?=\#)
To get all characters following, but not including the pound (#): (?<=\#).*
If you don't mind using groups, you can do it all in one shot:
(.*?)\#(.*) Your answers will be in group(1) and group(2). Notice the non-greedy construct, *?, which will attempt to match as little as possible instead of as much as possible.
If you want to allow for missing # section, use ([^\#]*)(?:\#(.*))?. It uses a non-collecting group to test the second half, and if it finds it, returns everything after the pound.
Honestly though, for you situation, it is probably easier to use the Split method provided in String.
More on lookahead and lookbehind
first:
/[^\#]*(?=\#)/ edit: is faster than /.*?(?=\#)/
second:
/(?<=\#).*/
For something like this in C# I would usually skip the regular expressions stuff altogether and do something like:
string[] split = exampleString.Split('#');
string firstString = split[0];
string secondString = split[1];
I am trying to get one regular expression that does the following:
makes sure there are no white-space characters
minimum length of 8
makes sure there is at least:
one non-alpha character
one upper case character
one lower case character
I found this regular expression:
((?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])(?!\s).{8,})
which takes care of points 2 and 3 above, but how do I add the first requirement to the above regex expression?
I know I can do two expressions the one above and then
\s
but I'd like to have it all in one, I tried doing something like ?!\s but I couldn't get it to work. Any ideas?
^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])\S{8,}$
should do. Be aware, though, that you're only validating ASCII letters. Is Ä not a letter for your requirements?
\S means "any character except whitespace", so by using this instead of the dot, and by anchoring the regex at the start and end of the string, we make sure that the string doesn't contain any whitespace.
I also removed the unnecessary parentheses around the entire expression.
Tim's answer works well, and is a good reminder that there are many ways to solve the same problem with regexes, but you were on the right track to finding a solution yourself. If you had changed (?!\s) to (?!.*\s) and added the ^ and $ anchors to the end, it would work.
^((?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])(?!.*\s).{8,})$