Regex for AND operator - regex

I need a regex that needs to match
start from origin to id= and ;to cases.
I applied "OR" condition but it satifies only one condition. Any suggestions?
origin=eBook;id=**N27F-00000-00**;type=cases
Regex:
(^(.*id=)|(;type=cases.*))

You are mistaking some fundamentals of regular expressions, which I'll explain in a minute. But for now, try this:
id=(.*?);type=cases
Regular expressions try to match as much as a string as possible. This means it can match part of a string, and you don't need to use .* on either side of the string (unless you want to capture that information).
Since we aren't matching the .* in the beginning, you won't need to start from the beginning of the string (^).
There is no such thing as an AND operator, since an entire regular expression must match by default.
Link
Update
This will still match the whole chunk of regex. Since I used parenthesis around the important part (N27F-00000-00), it will be placed in a "match group". If you don't want to deal with match groups, you can use "lookarounds":
(?<=id=).*?(?=;type=cases)
Link

Related

Regex everything after, but not including

I am trying to regex the following string:
https://www.amazon.com/Tapps-Top-Apps-and-Games/dp/B00VU2BZRO/ref=sr_1_3?ie=UTF8&qid=1527813329&sr=8-3&keywords=poop
I want only B00VU2BZRO.
This substring is always going to be a 10 characters, alphanumeric, preceded by dp/.
So far I have the following regex:
[d][p][\/][0-9B][0-9A-Z]{9}
This matches dp/B00VU2BZRO
I want to match only B00VU2BZRO with no dp/
How do I regex this?
Here is one regex option which would produce an exact match of what you want:
(?<=dp\/)(.*)(?=\/)
Demo
Note that this solution makes no assumptions about the length of the path fragment occurring after dp/. If you want to match a certain number of characters, replace (.*) with (.{10}), for example.
Depending on your language/method of application, you have a couple of options.
Positive look behind. This will make your regex more complicated, but will make it match what you want exactly:
(<=dp/)[0-9A-Z]{10}
The construct (<=...) is called a positive look behind. It will not consume any of the string, but will only allow the match to happen if the pattern between the parens is matched.
Capture group. This will make the regex itself slightly simpler, but will add a step to the extraction process:
dp/([0-9A-Z]{10})
Anything between plain parens is a capture group. The entire pattern will be matched, including dp/, but most languages will give you a way of extracting the portion you are interested in.
Depending on your language, you may need to escape the forward slash (/).
As an aside, you never need to create a character class for single characters: [d][p][\/] can equally well be written as just dp\/.

Regular expression for duplicate string

Hello I am trying to formulate the regular expression to find substring and replace portion of that string. I have input in the format
Some_text_beginning_AASHISH_XX_YY_COPY_COPY_COPY_COPY
Please see that every string will have word AASHISH and in the end there could be indeterminate number of COPY. I want to delete all the COPY
I wrote the regular expression as
(.*)_AASHISH_(.*)_COPY+
I could find all the valid expression with this. But when I try to replace it with
$1_AASHISH_$2
It replaces just the last _COPY All the _COPY which came before last one are taken to be in group 2.
Further see that I am not using any programming language. I am using some third party tool. All it allows me is to search for string and replace it. It allows me to write regular expression.
Just to clarify why this question is not the same as posted before, tool I am using does not allow me use all regular expression somehow. I dont know how that tool is created. I just have UI.
Thanks in advance
Here's a regex that will capture the whole portion you want to maintain, resulting in a replacement that's just $1.
(.*_AASHISH_.*?)(?:_COPY)+
A few notes:
.*? - The ? on the end makes the repetition operator * non-greedy. It will match the minimum characters given its context.
(?:_COPY) - The ?: prefix makes this a non-capturing grouping.
+ - The repetition operator will make the entire last group (_COPY) repeat 1 or more times, not just the Y.

Complicated regex to match anything NOT within quotes

I have this regex which scans a text for the word very: (?i)(?:^|\W)(very)[\W$] which works. My goal is to upgrade it and avoid doing a match if very is within quotes, standalone or as part of a longer block.
Now, I have this other regex which is matching anything NOT inside curly quotes: (?<![\S"])([^"]+)(?![\S"]) which also works.
My problem is that I cannot seem to combine them. For example the string:
Fred Smith very loudly said yesterday at a press conference that fresh peas will "very, very defintely not" be served at the upcoming county fair. In this bit we have 3 instances of very but I'm only interested in matching the first one and ignore the whole Smith quotation.
What you describe is kind of tricky to handle with a regular expression. It's difficult to determine whether you are inside a quote. Your second regex is not effective as it only ignores the first very that is directly to the right of the quote and still matches the second one.
Drawing inspiration from this answer, that in turn references another answer that describes how to regex match a pattern unless ... I can capture the matches you want.
The basic idea is to use alternation | and match all the things you don't want and then finally match (and capture) what you do want in the final clause. Something like this:
"[^"]*"|(very)
We match quoted strings in the first clause but we don't capture them in a group and then we match (and capture) the word very in the second clause. You can find this match in the captured group. How you reference a captured group depends on your regex environment.
See this regex101 fiddle for a test case.
This regex
(?i)(?<!(((?<DELIMITER>[ \t\r\n\v\f]+)(")(?<FILLER>((?!").)*))))\bvery\b(?!(((?<FILLER2>((?!").)*)(")(?<DELIMITER2>[ \t\r\n\v\f]+))))
could work under two conditions:
your regex engine allows unlimited lookbehind
quotes are delimited by spaces
Try it on http://regexstorm.net/tester

How to continue a match in Regex

price:(?:(?:\d+)?(?:\.)?\d+|min)-?(?:(?:\d+)?(?:\.)?\d+|max)?
This Regex matches the following examples correctly.
price:1.00-342
price:.1-23
price:4
price:min-900.00
price:.10-.50
price:45-100
price:453.23-231231
price:min-max
Now I want to improve it to match these cases.
price:4.45-8.00;10.45-14.50
price:1.00-max;3-12;23.34-12.19
price:1.00-2.50;min-12;23.34-max
Currently the match stops at the semi colon. How can I get the regex to repeat across the semi-colon dividers?
Final Solution:
price:(((\d*\.)?\d+|min)-?((\d*\.)?\d+|max)?;?)+
Add an optional ; at the end, and make the whole pattern to match one or more:
price:((?:(?:\d+)?(?:\.)?\d+|min)-?(?:(?:\d+)?(?:\.)?\d+|max)?;?)+
(?:\d+)? is the same thing as \d*, and (?:\.)? can just be \.?. Simplified, your original regex is:
price:(?:\d*\.?\d+|min)(?:-(?:\d*\.?\d+|max))?
You have two choices. You can either do price([:;]range)* where range is the regex you have for matching number ranges, or be more precise about the punctuation but have to write out range twice and do price:range(;range)*.
price([:;]range)* -- shorter but allows first ':' to be ';'
price:range(;range)* -- longer but gets colon vs semi-colon correct
Pick one of these two regexes:
price[:;](?:\d*\.?\d+|min)(?:-(?:\d*\.?\d+|max))?
price:(?:\d*\.?\d+|min)(?:-(?:\d*\.?\d+|max))?(?:(?:\d*\.?\d+|min)(?:-(?:\d*\.?\d+|max))?)*
First there are some issues with your regular expression: to match xx.yyy instead of the expression (?:\d+)?(?:\.)?\d+ you can use this (?:\d*\.)?\d+. This can only match in one way so it avoids unnecessary backtracking.
Also currently your regular expression matches things like price:minmax and price:1.2.3 which I assume you do not want to match.
The simple way to repeat your match is to add a semi-colon and then repeat your regular expression verbatim.
You can do it like this though to avoid writing out the entire regular twice:
price:(?:(?:(?:\d*\.)?\d+|min)(?:-(?:(?:\d*\.)?\d+|max))?(?:;|$))*
See it in action on Rubular.
price:((?:(?:\d+)?(?:\.)?\d+|min)-?(?:(?:\d+)?(?:\.)?\d+|max)?;?)+
I'm not sure what's up with all of the ?'s (I know the syntax, I just don't know why you're using it so much), but that should do it for you.

How to get the inverse of a regular expression?

Let's say I have a regular expression that works correctly to find all of the URLs in a text file:
(http://)([a-zA-Z0-9\/\.])*
If what I want is not the URLs but the inverse - all other text except the URLs - is there an easy modification to make to get this?
You could simply search and replace everything that matches the regular expression with an empty string, e.g. in Perl s/(http:\/\/)([a-zA-Z0-9\/\.])*//g
This would give you everything in the original text, except those substrings that match the regular expression.
If for some reason you need a regex-only solution, try this:
((?<=http://[a-zA-Z0-9\/\.#?/%]+(?=[^a-zA-Z0-9\/\.#?/%]))|\A(?!http://[a-zA-Z0-9\/\.#?/%])).+?((?=http://[a-zA-Z0-9\/\.#?/%])|\Z)
I expanded the set of of URL characters a little ([a-zA-Z0-9\/\.#?/%]) to include a few important ones, but this is by no means meant to be exact or exhaustive.
The regex is a bit of a monster, so I'll try to break it down:
(?<=http://[a-zA-Z0-9\/\.#?/%]+(?=[^a-zA-Z0-9\/\.#?/%])
The first potion matches the end of a URL. http://[a-zA-Z0-9\/\.#?/%]+ matches the URL itself, while (?=[^a-zA-Z0-9\/\.#?/%]) asserts that the URL must be followed by a non-URL character so that we are sure we are at the end. A lookahead is used so that the non-URL character is sought but not captured. The whole thing is wrapped in a lookbehind (?<=...) to look for it as the boundary of the match, again without capturing that portion.
We also want to match a non-URL at the beginning of the file. \A(?!http://[a-zA-Z0-9\/\.#?/%]) matches the beginning of the file (\A), followed by a negative lookahead to make sure there's not a URL lurking at the start of the file. (This URL check is simpler than the first one because we only need the beginning of the URL, not the whole thing.)
Both of those checks are put in parenthesis and OR'd together with the | character. After that, .+? matches the string we are trying to capture.
Then we come to ((?=http://[a-zA-Z0-9\/\.#?/%])|\Z). Here, we check for the beginning of a URL, once again with (?=http://[a-zA-Z0-9\/\.#?/%]). The end of the file is also a pretty good sign that we've reached the end of our match, so we should look for that, too, using \Z. Similarly to a first big group, we wrap it in parenthesis and OR the two possibilities together.
The | symbol requires the parenthesis because its precedence is very low, so you have to explicitly state the boundaries of the OR.
This regex relies heavily on zero-width assertions (the \A and \Z anchors, and the lookaround groups). You should always understand a regex before you use it for anything serious or permanent (otherwise you might catch a case of perl), so you might want to check out Start of String and End of String Anchors and Lookahead and Lookbehind Zero-Width Assertions.
Corrections welcome, of course!
If I understand the question correctly, you can use search/replace...just wildcard around your expression and then substitute the first and last parts.
s/^(.*)(your regex here)(.*)$/$1$3/
im not sure if this will work exactly as you intend but it might help:
Whatever you place in the brackets [] will be matched against. If you put ^ within the bracket, i.e [^a-zA-Z0-9/.] it will match everything except what is in the brackets.
http://www.regular-expressions.info/