Reg expression to get a string starting from particular string - regex

I'm trying to write a regular expression which returns a string after a particular string.
For example:
The string is
"<https://meraki/api/v1/sm/devices?fields%5B%5D=imei%2Ciccid%2ClastConnected%2CownerEmail%2C+ownerUsername%2CphoneNumber&perPage=1000&startingAfter=0>; rel=first"
The result I'm expecting is: first.
Here is the expression I'm using:
(?<=rel=\s").*(?=\)

Okay so this should work:
(?<=rel[=])[^"]*
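I would advise looking over regex syntax again, because your pattern cannot match the sample: it requires whitespace and a double quote right after rel=, and the trailing (?=\) escapes the closing parenthesis, so the lookahead is never closed. A lookbehind (?<=pattern) asserts what comes immediately before the text you want to capture. Likewise, a lookahead (?=pattern) asserts what comes immediately after it.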
You can test your regex online here (or many other sites). They will show you the matching groups and errors, but will also explain what certain parts of the pattern do.
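As a quick check of the lookbehind above, here is a minimal Python sketch run against the sample string from the question. The function name extract_rel is made up for illustration, and the sample is assumed to be the header value without the surrounding quotes.

```python
import re

# Sample value from the question (a Link-style header).
link = (
    "<https://meraki/api/v1/sm/devices?fields%5B%5D=imei%2Ciccid%2ClastConnected"
    "%2CownerEmail%2C+ownerUsername%2CphoneNumber&perPage=1000&startingAfter=0>"
    "; rel=first"
)

# (?<=rel[=]) asserts that "rel=" sits immediately before the match;
# [^"]* then consumes everything up to a double quote or the end of the string.
REL_PATTERN = re.compile(r'(?<=rel[=])[^"]*')


def extract_rel(value):
    """Return the text following 'rel=', or None if 'rel=' is absent."""
    match = REL_PATTERN.search(value)
    return match.group(0) if match else None


print(extract_rel(link))
```

If the value does carry a trailing double quote, the [^"]* part stops just before it, so the quote is not included in the result.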

Related

Trying to extract repeating pattern from string in php/javascript

The following is in PHP but the regex will also be used in javascript.
I'm trying to extract repeating patterns from a string.
The string can be any of the following:
"something arbitrary"
"D123"
"D111|something"
"D197|what.org|when.net"
"D297|who.197d234.whatever|when.net|some other arbitrary string"
I'm currently using the following regex: /^D([0-9]{3})(?:\|([^\|]+))*/
This correctly does not match the first string, and matches the second and third correctly. The problem is that for the fourth and fifth it only captures the Dxxx and the last string. I need each of the strings between the '|' to be captured.
I'm hoping to use a regex as it makes it a single step. I realize I could just detect the leading Dxxx then use explode or split as appropriate to break the strings out. I've just gotten stuck on wanting a single regular expression match step.
This same regex may be used in Python as well, so I just want a generic regex solution.
There is no way to have a dynamic number of capture groups in a regular expression, but if you know some upper limit to how many parts you would have in one string, you can just repeat the pattern that many times:
/^D([0-9]{3})(?:$|\|)(.*?)(?:$|\|)(.*?)(?:$|\|)(.*?)(?:$|\|)(.*?)(?:$|\|)/
So after the initial ^D([0-9]{3})(?:$|\|) you just repeat (.*?)(?:$|\|) as many times as you need it.
When the string has fewer elements, those remaining capture groups will match the empty string.
See regex tester.
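Since the question mentions Python, here is a small sketch of the repeated-group pattern there. MAX_PARTS and parse are names invented for this example, and the limit of four parts is only an assumption.

```python
import re

# The D-number followed by a fixed number of optional parts.
# MAX_PARTS is an assumed upper limit chosen for this sketch.
MAX_PARTS = 4
PATTERN = re.compile(r"^D([0-9]{3})(?:$|\|)" + r"(.*?)(?:$|\|)" * MAX_PARTS)


def parse(value):
    """Return (number, parts) for a D-string, or None if it does not match."""
    match = PATTERN.match(value)
    if match is None:
        return None
    number, *parts = match.groups()
    # Unused groups match the empty string; filtering them out also drops
    # any part that was genuinely empty in the input.
    return number, [p for p in parts if p]


for sample in ["something arbitrary", "D123", "D197|what.org|when.net"]:
    print(sample, "->", parse(sample))
```

A string with more than MAX_PARTS parts would need a larger limit, which is the main drawback of this approach.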
Is something like preg_match_all() (the PHP variant of a global match) also acceptable for you?
Then you could use:
^(?|D([0-9]{3})|^.+$|(?!^)\|([^|\n]*)(?=\||$))
This will match everything in a string in different matches, e.g. take your string:
D197|what.org|when.net
It will then give you three matches:
D197
what.org
when.net
Running live: https://regex101.com/r/jL2oX6/4 (Everything in green are your group matches. Ignore what's in blue.)
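Python's built-in re module has no branch reset group (?|...), so the pattern above cannot be used there as written. A rough equivalent, sketched below, is the two-step approach the question already mentions: check the Dxxx prefix with a regex, then split the rest on '|'. This is a different technique from the answer's single pattern, and split_parts is a name made up for the example.

```python
import re

# The leading D-number must be followed by '|' or the end of the string.
PREFIX = re.compile(r"D([0-9]{3})(?:\||$)")


def split_parts(value):
    """Return the D-number and every '|'-separated part, or [] if no match."""
    match = PREFIX.match(value)
    if match is None:
        return []
    rest = value[match.end():]
    parts = rest.split("|") if rest else []
    return ["D" + match.group(1)] + parts


print(split_parts("D197|what.org|when.net"))
```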

Regular expression, match anything but these strings

Within Splunk I have a number of field extractions for extracting values from URI stems. A few of them match a specific pattern, and I now want another regex which matches anything but these.
^/SiteName/[^/]*/(?<a_request_type>((?!Process)|(?!process)|(?!Assets)|(?!assets))[^/]+)
The regex above is what I have so far. I am expecting the negative lookaheads to prevent it from matching Process, process, Assets or assets. However, it seems that the [^/]+ after these lookaheads can then go ahead and match these strings anyway, so this regex sometimes overrides the other regexes I wrote to accept these strings.
What is the correct syntax for me to make the regex match any string, other than those specified in the negative lookaheads?
Thanks!
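Negative lookaheads do not consume any of the string being searched. When you want multiple negative lookaheads, do not separate them with | (OR): an alternation succeeds as soon as any one branch succeeds, and a segment beginning with Process still satisfies (?!process), so nothing is ever excluded. Place the lookaheads one after another so that all of them must hold. Try this: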
^/SiteName/[^/]*/(?<a_request_type>((?![Pp]rocess)(?![Aa]ssets))[^/]+)
Note that I have combined your lookaheads ([Pp]rocess and [Aa]ssets) to make the regular expression more concise.
Live test.
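To see the difference between the two forms, here is a small Python sketch. Python's re module spells named groups (?P<name>...), so the patterns are adapted accordingly. The sample URIs and the helper request_type are invented for illustration.

```python
import re

# Alternated lookaheads: at least one branch always succeeds.
BROKEN = re.compile(
    r"^/SiteName/[^/]*/(?P<a_request_type>"
    r"((?!Process)|(?!process)|(?!Assets)|(?!assets))[^/]+)"
)
# Sequential lookaheads: every one of them must succeed.
FIXED = re.compile(
    r"^/SiteName/[^/]*/(?P<a_request_type>(?![Pp]rocess)(?![Aa]ssets)[^/]+)"
)


def request_type(pattern, uri):
    """Return the captured request type, or None when the URI is rejected."""
    match = pattern.match(uri)
    return match.group("a_request_type") if match else None


print(request_type(BROKEN, "/SiteName/x/Process/1"))
print(request_type(FIXED, "/SiteName/x/Process/1"))
print(request_type(FIXED, "/SiteName/x/Reports/1"))
```

Note that the lookaheads only test how the segment begins, so a segment such as Processing would be rejected as well.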

regular expressions: first match vs greedy match

Consider the regular expression \d*
If I try to match this against the string JJJ123, Vertica's regex functions say it matches against the string of width zero at the beginning.
If I try it instead in MATLAB, it reports a match starting at the character 1 (the first digit).
The Vertica docs say that its regex engine is PCRE. I can't find much on MATLAB's, though I found hints that it's similar to Perl's.
Which of the behaviors is more standard for a Perl-like regex engine?
MATLAB's regexp has an emptymatch option that controls whether it will allow an entire regular expression to match an empty string. It is off ("noemptymatch") by default. See help regexp.
Vertica's matching the 0-length empty string at the beginning is normal behavior for most regex dialects that I know, including anything Perl-like.
To get the same behavior as Vertica, where it can match 0-length strings, pass the 'emptymatch' option in your regexp call. Also pass 'once' to prevent it from matching the empty strings between each and every character in your string.
[a,b,c,d] = regexp('JJJ123', '\d*', 'emptymatch', 'once')
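For comparison, Python's re module (another Perl-style engine) behaves like Vertica here. The short sketch below contrasts \d* with \d+ on the same string; the variable names are arbitrary.

```python
import re

text = "JJJ123"

# \d* may match zero digits, so the leftmost match is the empty string at index 0.
star = re.search(r"\d*", text)
print(star.span(), repr(star.group()))

# \d+ requires at least one digit, so the leftmost match starts at the "1".
plus = re.search(r"\d+", text)
print(plus.span(), repr(plus.group()))
```

The engine always prefers the leftmost position where the pattern can succeed, and greediness only decides how long the match is once that position is fixed.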

Can you put optional tokens within a positive look behind of a regular expression?

I have the following content with what I think are the possible cases of someone defining a link:
hello link what <a href=something.jpg>link</a>
I also have the following regular expression with a positive look behind:
(?<=href=["\'])something
The expression matches the word "something" in the first two links. In an attempt to capture the third instance of "something" in the link without any quotes, I thought making the ["\'] token optional (using ?) would capture it. The expression now looks like this:
(?<=href=["\']?)something
Unfortunately it now does not match any of the instances of "something". What could I be doing incorrectly? I'm using http://gskinner.com/RegExr/ to test this out.
Many regex flavors only support fixed-length lookbehind assertions. If you have an optional token in your lookbehind, its length isn't fixed, rendering it invalid.
So the real question is: What regex flavor are you actually targeting with your regex?
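Python's re module is one flavor with this restriction, so it makes a convenient illustration. The sketch below first shows the optional-quote lookbehind being rejected, then a common alternative: match the href= prefix normally and capture only the part of interest. The sample HTML is an assumption standing in for the three link styles described in the question.

```python
import re

# Hypothetical sample: double-quoted, single-quoted and unquoted href values.
html = (
    '<a href="something.jpg">hello</a> '
    "<a href='something.jpg'>link</a> "
    "<a href=something.jpg>link</a>"
)

# The optional quote makes the lookbehind variable-width, which re rejects.
try:
    re.compile(r"""(?<=href=["']?)something""")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Alternative: consume the prefix and capture only the wanted text.
matches = re.findall(r"""href=["']?(something)""", html)
print(lookbehind_ok, matches)
```

Flavors that accept variable-length lookbehind would take the original pattern as written, which is why the target flavor matters.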

Regex: Does not have/include pattern

I have a regex pattern to match an HTML script tag. How can I change this script tag pattern so that the pattern means "input string DOES NOT MATCH" the script tag pattern?
In other words, given a pattern, what is the alteration needed to change the meaning of the pattern to "does not match this pattern"?
For example, if I have a pattern: \d{3}-\d{3}-\d{4}, what is the equivalent pattern for this that means "does not match \d{3}-\d{3}-\d{4}"?
You can negate a regex pattern by using a negative lookahead. This is slightly different than simply negating the regex though. Negative lookahead would look like the following in Java (and many other languages):
(?!\d{3}-\d{3}-\d{4})
It should be noted that this doesn't exactly answer the question. Finding the complement of a regular language is not, I think, an easy task using a regular expression. A much easier way to solve the problem is to invert the program logic:
Instead of:
if (string.matches(yourRegex))
Do:
if (!string.matches(yourRegex))
That is not easily achievable for arbitrary patterns. In practice, it's almost always easier to do what you want in the surrounding code than in the pattern itself. For instance, instead of
grep -P '\d{3}-\d{3}-\d{4}' file
you could use (-P selects GNU grep's Perl-compatible syntax, which \d needs)
grep -vP '\d{3}-\d{3}-\d{4}' file
Or in a program you could change something like
if (pattern.matches()) {
    foo();
}
into something like
if (!pattern.matches()) {
    foo();
}
In a more tedious approach, you would have to enumerate all possible values that should match instead of what should not match. So, say you want to match everything but the string <html>, you could write a regex like so:
([^<]|<([^h]|h([^t]|t([^m]|m([^l]|l[^>])))))
Reading that regex is like saying: "Okay, you can match any character but '<', or you could match '<' but then you can't match an 'h' after that... or you do match an 'h' after that but then you can't match a 't' after that..." and so on.
It's butt ugly, but then again, for simple string matches, you can easily write a recursive function that transforms any given term into a pattern like the above.
Easier to just negate the test, surely? E.g.:
if (!regex.test(str)) ...
(javascript example)
Negating a character class is easy with ^ but a whole regex will get much more convoluted.
What language are you using? The easiest solution to the specific problem you stated is to simply prepend a negation operator (usually "!") to the match.
I definitely agree with the other answers saying you should negate testing for a match, but this should do what you want using just a regex:
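^(?!.*\d{3}-\d{3}-\d{4})
This is a negative lookahead. Because no characters are placed outside the lookahead, the regex basically means "fail on any string that starts with any number of characters (.*) followed by the regex \d{3}-\d{3}-\d{4}". The leading ^ matters when the engine searches rather than matching from the start: without it, the search can simply succeed at a later position in the string.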
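Both approaches from this thread can be compared in a few lines of Python. This is only a sketch: the function names are invented, and the lookahead version is anchored with ^ because re.search would otherwise retry the lookahead at later positions and succeed there.

```python
import re

PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")
# Anchored negative lookahead: succeeds only if no phone-like run follows the start.
NO_PHONE = re.compile(r"^(?!.*\d{3}-\d{3}-\d{4})")


def lacks_phone_by_negation(text):
    """Negate the ordinary test in the surrounding code."""
    return PHONE.search(text) is None


def lacks_phone_by_lookahead(text):
    """Express the negation inside the pattern itself."""
    return NO_PHONE.search(text) is not None


for sample in ["call 555-123-4567 now", "no number here"]:
    print(sample, lacks_phone_by_negation(sample), lacks_phone_by_lookahead(sample))
```

The negated test is usually easier to read and to maintain. The lookahead form is mainly useful when only a pattern can be supplied, for example in a configuration file.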