Matching a string between two sets of characters without using lookarounds - regex

I've been working on some regex to try and match an entire string between two characters. I am trying to capture everything from "System", all the way down to "prod_rx." (I am looking to include both of these strings in my match). Below is the full text that I am working with:
\"alert_id\":\"123456\",\"severity\":\"medium\",\"summary\":\"System generated a Medium severity alert\\\\prod_rx.\",\"title\":\"123456-test_alert\",
The regex that I am using right now is:
(?<=summary\\":\\").*?(?=\\")
This works perfectly when I am able to use lookarounds, such as in Regex101: https://regex101.com/r/jXltNZ/1. However, the regex parser in the software that my company uses does not support lookarounds (crazy, right?).
Anyway - my question is basically how I can match the text described above without using lookaheads/lookbehinds. Any help is VERY MUCH appreciated!!

One option is a plain capture group instead of lookarounds, such as this simple expression:
.+summary\\":\\"(.+?)\\",
The data we want is in this capturing group (lazy, so it stops at the first closing boundary instead of running on to the last one):
(.+?)
Our right boundary is:
\\",
and our left boundary is:
.+summary\\":\\"
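For anyone who wants to try the capture-group approach outside a regex tester, here is a minimal Python sketch. It assumes Python's re module rather than the asker's unnamed parser, and it uses a lazy group (.+?) so the capture stops at the first closing \",. The variable names are illustrative only.

```python
import re

# Sample text from the question, kept verbatim in a raw string.
text = r'\"alert_id\":\"123456\",\"severity\":\"medium\",\"summary\":\"System generated a Medium severity alert\\\\prod_rx.\",\"title\":\"123456-test_alert\",'

# Left boundary, lazy capture group, right boundary; no lookarounds needed.
pattern = re.compile(r'summary\\":\\"(.+?)\\",')

match = pattern.search(text)
# The wanted text is group 1, not the whole match.
summary = match.group(1) if match else None
print(summary)
```

The trade-off compared with lookarounds is that the full match includes the boundaries, so the calling code has to read group 1.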

Related

Regex: extract characters from two patterns

I have the following string:
https://www.google.com/today/sunday/abcde2.hopeho.3345GETD?weatherType=RAOM&...
https://www.google.com/today/monday/jbkwe3.ho4eho.8495GETD?weatherType=WHTDSG&...
I'd like to extract jbkwe3.ho4eho.8495GETD or abcde2.hopeho.3345GETD. Anything between the {weekday}/ and the ?weatherType=.
I've tried (?<=sunday\/)$.*?(?=\?weatherType=) but it only works for the first line and I want to make it applicable to all strings regardless the value of {weekday}.
I tried (?<=\/.*\/)$.*?(?=\?weatherType=) but it didn't work. Could anyone familiar with regex lend some help? Thank you!
[Update]
I'm new to regex, but I was experimenting with it in the Sublime Text editor via the "find" functionality, which I think uses PCRE (according to this post).
Try this regex:
(?:sun|mon|tues|wednes|thurs|fri|satur)day\/\K[^?]+(?=\?weatherType)
Explanation:
(?:sun|mon|tues|wednes|thurs|fri|satur)day - matches a day of the week, i.e., sunday, monday, tuesday, wednesday, thursday, friday, saturday
\/ - matches /
\K - discards whatever has been matched so far and treats the current position as the start of the match. This is supported in PCRE.
[^?]+ - matches 1 or more occurrences of any character that is not a ?
(?=\?weatherType) - the subpattern [^?]+ above will match all characters that are not ? until it reaches a position that is immediately followed by a ? followed by weatherType
To make the match case-insensitive, you can prepend (?i) to the regex.
In the examples given, you actually only need to grab the characters between the last forward slash ("/") and the first question mark ("?").
You didn't mention which regex flavor (e.g., PCRE, grep, Oracle, etc.) you're using, and the actual syntax will vary depending on this, but in general, something like the following (Perl) replacement regex would handle the examples given:
s/.*\/([^?]*)\?.*/$1/gm
There are other (and more efficient) ways, but this will do the job.
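Both answers can be sketched in Python as a rough comparison. This is an assumption-laden illustration, not what either answerer wrote: Python's re module does not support \K, so a capture group stands in for it, and the names weekday_re and last_segment_re are made up for this example.

```python
import re

urls = [
    "https://www.google.com/today/sunday/abcde2.hopeho.3345GETD?weatherType=RAOM&...",
    "https://www.google.com/today/monday/jbkwe3.ho4eho.8495GETD?weatherType=WHTDSG&...",
]

# Weekday-anchored version: group 1 plays the role of the text after \K.
weekday_re = re.compile(
    r"(?:sun|mon|tues|wednes|thurs|fri|satur)day/([^?]+)\?weatherType",
    re.IGNORECASE,
)

# Simpler version: everything between the last "/" and the first "?".
last_segment_re = re.compile(r".*/([^?]*)\?")

by_weekday = [m.group(1) for m in map(weekday_re.search, urls) if m]
by_segment = [m.group(1) for m in map(last_segment_re.search, urls) if m]
print(by_weekday)
print(by_segment)
```

The second pattern is shorter but relies on the wanted segment being the last path component before the query string.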

Regex - Find the Shortest Match Possible

The Problem
Given the following:
\plain\f2 This is the first part of the note. This is the second part of the note. This is the \plain\f2\fs24\cf6{\txfielddef{\*\txfieldstart\txfieldtype1\txfieldflags144\txfielddataval44334\txfielddata 35003800380039000000}{\*\txfielddatadef\txfielddatatype1\txfielddata 340034003300330034000000}{\*\txfieldtext 20{\*\txfieldend}}{\field{\*\fldinst{ HYPERLINK "44334" }}{\fldrslt{20}}}}\plain\f2\fs24 part of the note.
I'd like to produce this:
\plain\f2 This is the first part of the note. This is the second part of the note. This is the third part of the note.
What I've Tried
The example input/output is a very simplified version of the data I need to parse and it would be nice to have a way to parse the data programmatically. I have a PHP application and I've been trying to use regex to match the segments that are important and then filter out the parts of the string that aren't required. Here's what I've come up with so far:
/\\plain.*?\\field{\\\*\\fldinst{ HYPERLINK "(.*?)" }}{\\fldrslt{(.*?)}}}}\\plain.*? /gm
regex101: https://regex101.com/r/ILLZU6/2
It almost matches what I want, but it grabs the longest possible match instead of the shortest. I want it to match only one \\plain before the \\field{.... Maybe after the \\plain, I could match anything except for a space? How would I go about doing that?
I'm no regex expert, but my use-case really calls for it. (Otherwise, I'd just write code to handle everything.) Any help would be much appreciated!
(?:(?!\\plain).)* will match any string unless it contains a match for \\plain. Here's the regex implementing this:
/\\plain(?:(?!\\plain).)*\\field{\\\*\\fldinst{ HYPERLINK "(.*?)" }}{\\fldrslt{(.*?)}}}}\\plain.*? /gm
regex101: https://regex101.com/r/ILLZU6/5
Also, you can replace the space at the end with (?: |$) if you want to allow the end of the text to trigger it as well as a space:
/\\plain(?:(?!\\plain).)*\\field{\\\*\\fldinst{ HYPERLINK "(.*?)" }}{\\fldrslt{(.*?)}}}}\\plain.*?(?: |$)/gm
regex101: https://regex101.com/r/ILLZU6/4
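To see the difference between the lazy .*? and the (?:(?!\\plain).)* construct, here is a small Python sketch run against the sample from the question. It uses Python's re module rather than PHP's preg functions, which is an assumption on my part; the braces are escaped for safety, and the names tail, lazy_re and tempered_re are illustrative.

```python
import re

# Sample input from the question, as a raw string.
sample = r'\plain\f2 This is the first part of the note. This is the second part of the note. This is the \plain\f2\fs24\cf6{\txfielddef{\*\txfieldstart\txfieldtype1\txfieldflags144\txfielddataval44334\txfielddata 35003800380039000000}{\*\txfielddatadef\txfielddatatype1\txfielddata 340034003300330034000000}{\*\txfieldtext 20{\*\txfieldend}}{\field{\*\fldinst{ HYPERLINK "44334" }}{\fldrslt{20}}}}\plain\f2\fs24 part of the note.'

# Shared right-hand part of both patterns.
tail = r'\\field\{\\\*\\fldinst\{ HYPERLINK "(.*?)" \}\}\{\\fldrslt\{(.*?)\}\}\}\}\\plain.*?(?: |$)'

# Lazy version: still starts at the FIRST \plain, because the engine
# tries the leftmost starting position first.
lazy_re = re.compile(r'\\plain.*?' + tail)

# Tempered version: the run after \plain may not contain another \plain,
# so the match has to start at the last \plain before \field.
tempered_re = re.compile(r'\\plain(?:(?!\\plain).)*' + tail)

lazy = lazy_re.search(sample)
tempered = tempered_re.search(sample)

print(lazy.start(), tempered.start())
print(tempered.groups())
```

Laziness only shortens the right end of a match; it never moves the start to the right, which is why the negative lookahead is needed here.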

Regex in search & replace: avoid fixed length of lookaround

In a long corpus of text, I want to make some corrections in certain
environments. However, I am encountering problems when using regex with text
editors. I switched to gedit to have an editor which supports regex in
search & replace.
Crucially, I only want to make changes if the line starts with a certain
pattern (\nm or \mb). The problem is that the element that I want to
replace (o' -> o'o) is not at a fixed length from the beginning of the line
and I can't include the regex in the lookbehind (the lookbehind fails).
Is there any way to include what I am looking for in a simple text editor
regex? Or is this already a step where I have to learn how to script in, for
example, Python?
This is what the regex looks like so far.
(?<=\\(nm|mb)).*o'(?=(q|w|r|t|z|p|s|d|f|g|h|j|k|l|x|c|v|b|n|m|a|i|u|e))
Of course, I can't apply .* in the replace without losing its content.
Put a capture group around .* and a back-reference in the replacement.
Find: (?<=\\(nm|mb))(.*)o'(?=(q|w|r|t|z|p|s|d|f|g|h|j|k|l|x|c|v|b|n|m|a|i|u|e))
Replace: \1o'o
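If scripting does turn out to be necessary, the same idea is only a few lines of Python. This is a sketch under assumptions: the corpus lines are invented for illustration, and the lookbehind is replaced by a line-anchored capture group, which avoids the fixed-length restriction altogether.

```python
import re

# Hypothetical corpus lines; only the \nm and \mb lines should change.
corpus = "\\nm to'ka\n\\tx to'ka\n\\mb ko'ri"

# Group 1 keeps everything from the line marker up to the o'.
# The lookahead requires one of the letters listed in the question.
pattern = re.compile(
    r"^(\\(?:nm|mb).*)o'(?=[qwrtzpsdfghjklxcvbnmaiue])",
    re.MULTILINE,
)

# \g<1> puts the kept prefix back before the corrected o'o.
fixed = pattern.sub(r"\g<1>o'o", corpus)
print(fixed)
```

Because .* is greedy, a single pass changes only the last qualifying o' on each line; lines with several occurrences would need the substitution repeated until nothing changes.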

Negative lookahead Regex Issue

I started looking into lookaheads and tried to create a simple example, but for some reason it's not working properly when I try using a negative lookahead.
I have the following simple regex:
href="(.+?)"(?!\s)
and this string:
test
test
Testing environment: https://regex101.com/r/JztPUe/1
I'm trying to take the URL between the href quotes only if it's not followed by a space, but it doesn't seem to understand me, since it's getting the first and the second URL.
When I change it to a positive lookahead it works as it should and takes only the second URL, but the negative one is not working as expected.
Can someone point out where my mistake is?
You should consider using an HTML parser instead of trying to do this with a regex. That being said, you could just phrase your regex by insisting that what follows the href clause is not a space:
href="([^"]*)"[^ ]
Your current regex:
href="(.+?)"(?!\s)
works as expected in Regex 101 when slightly rewritten as this:
href="([^"]*)"(?!\s)
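The likely cause is backtracking rather than the lookahead itself. The dot in (.+?) can also match a double quote, so when the lookahead rejects the first closing quote, the lazy group simply grows past it until it finds a later quote that is not followed by whitespace. [^"]* cannot cross a quote, so that escape route is closed.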
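The two patterns can be compared side by side in Python. The markup below is hypothetical (the original test string is not shown here), chosen so that the first link's closing quote is followed by a space and another attribute.

```python
import re

# Hypothetical markup: the first href's closing quote is followed by a space.
html = (
    '<a href="http://a.example/x" target="_blank">one</a> '
    '<a href="http://b.example/y">two</a>'
)

# Lazy dot: may grow past the first closing quote when the lookahead fails.
lazy_dot = re.findall(r'href="(.+?)"(?!\s)', html)

# Negated class: cannot cross a quote, so the first link is skipped cleanly.
no_quotes = re.findall(r'href="([^"]*)"(?!\s)', html)

print(lazy_dot)
print(no_quotes)
```

With this input the lazy-dot version returns a mangled first capture that runs on into the next attribute, while the negated-class version returns only the second URL.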
With a space: href="\K(\S+)"\s\K
Without a space: href="\K(\S+)">\K
\K resets the start of the reported match, so everything matched before it is dropped from the result.

Regex word count - matching words with apostrophe

I'm trying to count words using Regex, with the following pattern:
#"\\w+"
This works, but it splits it's into two matches:
it
s
Is there a better way to match words that contain punctuation?
Also, words surrounded by punctuation, for example 'word', should also be matched (without the ').
One way to handle such cases is:
#"\\w+(?:'\\w+)?"
So it will match both its and it's, but only its in its'.
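A quick way to check that behaviour is a short Python sketch. The pattern is the same one with single backslashes, since Python raw strings do not need the doubled escaping; the sample sentence is made up for illustration.

```python
import re

# Made-up sample covering the cases discussed above.
text = "it's its its' 'word'"

# A word, optionally followed by an apostrophe plus more word characters.
word_re = re.compile(r"\w+(?:'\w+)?")

words = word_re.findall(text)
print(words, len(words))
```

A leading or trailing apostrophe is never part of a match, because the optional group only applies when word characters follow the apostrophe.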
I find this style readable. This version allows hyphenated words:
'?([a-zA-Z'-]+)'?
This version does not allow hyphens:
'?([a-zA-Z']+)'?
If you want quick and dirty regex testing with visual feedback, you can use one of the many online regex testing tools. I like rubular.com (even for non-Ruby regex testing).