Finding a pattern with optional end using regular expression - regex

I am looking for a single regular expression that extracts a block of text which may be followed by an optional end marker. The challenge is to do it with just one regular expression.
The input is as follows:
Anchor: This is the text I want to extract A/C : 2015-5-20
Anchor: This is the text I want to extract
I am currently using the following regular expression:
Anchor:(?<extact>.*)(A\/C)
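This only matches the first line, because the A/C block is required.
If I make the A/C block optional with a ?, as in Anchor:(?<extact>.*)(A\/C)?, the match gets too long: the greedy .* runs to the end of the line and swallows A/C : 2015-5-20 as well.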
Any ideas on how to solve this elegantly with a single regex? An additional constraint is that I want to keep a named group in the regex (here, extact).
You can find the sample code on regex101: https://regex101.com/r/wH5iQ4/1

Anchor:(?<extact>.*?)\s*(?=A\/C|$)
You can make use of a lookahead here. See the demo:
https://regex101.com/r/wH5iQ4/3
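To try the lookahead approach outside regex101, here is a minimal Python sketch. It assumes Python's re module, which spells named groups (?P<name>...) rather than (?<name>...), and it assumes the MULTILINE flag so that $ can match at the end of each input line. The variable names are illustrative only.

```python
import re

# Lazy named group, then optional whitespace, then a lookahead that stops
# either at the optional "A/C" marker or at the end of the line.
PATTERN = re.compile(r"Anchor:(?P<extact>.*?)\s*(?=A/C|$)", re.MULTILINE)

text = (
    "Anchor: This is the text I want to extract A/C : 2015-5-20\n"
    "Anchor: This is the text I want to extract"
)

# strip() drops the space that follows "Anchor:" in the sample input.
extracted = [m.group("extact").strip() for m in PATTERN.finditer(text)]
print(extracted)
```

Because the lookahead does not consume anything, the A/C part never ends up inside the named group, whether it is present or not.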

Related

Regular Expression - Starting and ending with, and contains specific string in the middle

I would like to generate a regex with the following condition:
The string "EVENT" is contained within a xml tag called "SHEM-HAKOVETZ".
For example, the following string should be a match:
<SHEM-HAKOVETZ>104000514813450EVENTS0001dfd0.DAT</SHEM-HAKOVETZ>
I think you want something like this ^<SHEM-HAKOVETZ>.*EVENT.*<\/SHEM-HAKOVETZ>$
Regular expression
^<SHEM-HAKOVETZ>.*EVENT.*<\/SHEM-HAKOVETZ>$
Parts of the regular expression
^ From the beginning of the line
<SHEM-HAKOVETZ> Starting tag
.* Any character, zero or more times (appears on both sides of EVENT)
EVENT Middle part
<\/SHEM-HAKOVETZ>$ Ending part of the match
Here is the working regex.
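As a rough check of the anchored pattern, the following Python sketch runs it against the sample line from the question. The second string is a made-up counterexample without EVENT, and the variable names are assumptions for illustration.

```python
import re

# Anchored pattern: the whole line must be the tag pair with EVENT somewhere inside.
TAG_WITH_EVENT = re.compile(r"^<SHEM-HAKOVETZ>.*EVENT.*</SHEM-HAKOVETZ>$")

sample = "<SHEM-HAKOVETZ>104000514813450EVENTS0001dfd0.DAT</SHEM-HAKOVETZ>"
no_event = "<SHEM-HAKOVETZ>104000514813450.DAT</SHEM-HAKOVETZ>"  # hypothetical counterexample

matches_sample = TAG_WITH_EVENT.search(sample) is not None
matches_no_event = TAG_WITH_EVENT.search(no_event) is not None
print(matches_sample, matches_no_event)
```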
If you want to match this line, you could use this regex:
<SHEM-HAKOVETZ>.*EVENTS.*(?=<\/SHEM-HAKOVETZ>)
However, I would not recommend using regex on XML-based data, because there may be problems with whitespace handling in XML (see this article for more information). I would suggest using an actual XML parser and then applying the regex to the extracted text to be sure about your results.
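One way to follow the parser suggestion is sketched below in Python, assuming the standard library's xml.etree.ElementTree and assuming the element arrives as a standalone XML fragment like the sample line. The variable names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

xml_text = "<SHEM-HAKOVETZ>104000514813450EVENTS0001dfd0.DAT</SHEM-HAKOVETZ>"

# Let the parser deal with tags and whitespace, then test only the text content.
element = ET.fromstring(xml_text)
content = element.text or ""

contains_event = (
    element.tag == "SHEM-HAKOVETZ"
    and re.search(r"EVENT", content) is not None
)
print(content, contains_event)
```

In a larger document, the same check could be applied to each SHEM-HAKOVETZ element the parser yields instead of to a single fragment.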
Here is a solution to only match the "value" part ignoring the XML tags:
(?<=<SHEM-HAKOVETZ>)(?:.*EVENTS.*)(?=<\/SHEM-HAKOVETZ>)
You can check it out in action at: https://regex101.com/r/4XiRch/1
It uses a lookbehind and a lookahead to make sure it only matches when the tags are correct, while the match itself contains only the content, which is convenient for further processing.
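The lookbehind and lookahead version can be tried in Python roughly as follows. This is a sketch that assumes Python's re module, which accepts this lookbehind because it has a fixed width; the variable names are made up.

```python
import re

# The tags are only asserted by the lookarounds, so the match is the content alone.
VALUE_ONLY = re.compile(r"(?<=<SHEM-HAKOVETZ>)(?:.*EVENTS.*)(?=</SHEM-HAKOVETZ>)")

line = "<SHEM-HAKOVETZ>104000514813450EVENTS0001dfd0.DAT</SHEM-HAKOVETZ>"
match = VALUE_ONLY.search(line)
value = match.group(0) if match else None
print(value)
```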

inclusive exclusion in regular expression

I am trying to create an inclusive exclusion in a regular expression using the following syntax. I'm not having much success, so I figured I'd try my luck with Stack Overflow.
An example of a URL I'm trying to run the exclusion on is:
'https://somesite.domain.com:port/folder1/subfolder1/subfolder2/18'
The regex I have for it is:
\d{2,3}\/folder1\/subfolder1\/subfolder2(?!18)\d
The above exclusion also covers everything from 181 to 189. I only want it to cover 18.
You need to use the end-of-string anchor:
\d{2,3}\/folder1\/subfolder1\/subfolder2\/(?!18$)\d
Or, if the number is not at the end of the string, use a slash or whatever character follows it:
\d{2,3}\/folder1\/subfolder1\/subfolder2\/(?!18\/)\d
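A quick way to check the end-of-string variant is a small Python sketch. The port number 443 and the sample URLs below are made up for illustration, since the original URL only says port.

```python
import re

# (?!18$) rejects the path segment only when it is exactly "18" at the end of the string.
NOT_EXACTLY_18 = re.compile(r"\d{2,3}/folder1/subfolder1/subfolder2/(?!18$)\d")

base = "https://somesite.domain.com:443/folder1/subfolder1/subfolder2/"  # hypothetical port
results = {
    suffix: NOT_EXACTLY_18.search(base + suffix) is not None
    for suffix in ["18", "181", "189", "19"]
}
print(results)
```

Only the URL ending in exactly 18 is rejected; 181 through 189 still match because the lookahead requires the end of the string right after 18.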

Regular Expression to unmatch a particular string

I am trying to use a regular expression in JMeter where I need to unmatch a particular string. Here is my input test string: <activationCode>insvn</activationCode>
I need to extract the code insvn from it. I tried using the expression
[^/<activationCode>]\w+, but it does not yield the required code. I am a newbie to regular expressions and I need help with this.
Can you use a look-behind assertion in JMeter? If so, you can use this regex, which will give you the word that follows <activationCode>:
(?<=\<activationCode\>)\w+
If your input string is encoded (e.g. for HTML), use:
(?<=&lt;activationCode&gt;)\w+
When designing a regular expression in any language for something like this, you can match your input string as three groups (the opening tag, the content, and the closing tag) and then select the content from the second group.
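The three-group idea can be sketched in Python as follows. This only illustrates the pattern itself, not JMeter's extractor configuration, and the variable names are assumptions.

```python
import re

# Group 1: opening tag, group 2: content, group 3: closing tag.
THREE_GROUPS = re.compile(r"(<activationCode>)(\w+)(</activationCode>)")

response = "<activationCode>insvn</activationCode>"
match = THREE_GROUPS.search(response)

# The wanted value is the second group.
code = match.group(2) if match else None
print(code)
```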

how to modify this regular expression to exclude the 3rd type of result

Here are three patterns which may occur in the search string:
.+?
<font color=green>.+?</font>
<b><font color=green>.+?</font></b>
The expression I wrote matches all of the above:
(<font color=.+?>)?(.+?)(</font>)?
How can I write a regular expression that matches only the first and the second string, so that the third one is excluded from the result?
Generally, you should avoid parsing (X)HTML with regex.
In your case, you may be able to avoid matching tags in the contained text using an expression like
(<font color=.+?>)?([^<]*?)(</font>)?
Note that this will ignore all tags in the <a> content.
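To see how the [^<] restriction behaves, here is a small Python sketch. It assumes each candidate string is tested as a whole with re.fullmatch, which is an added assumption because every part of the pattern is optional and an unanchored search would also accept empty matches. The sample strings are made up.

```python
import re

# Optional opening font tag, tag-free content, optional closing font tag.
FONT_OR_PLAIN = re.compile(r"(<font color=.+?>)?([^<]*?)(</font>)?")

candidates = [
    "hello",
    "<font color=green>hello</font>",
    "<b><font color=green>hello</font></b>",
]

# fullmatch forces the whole string to fit, so a leading <b> cannot be skipped.
accepted = [s for s in candidates if FONT_OR_PLAIN.fullmatch(s)]
contents = [FONT_OR_PLAIN.fullmatch(s).group(2) for s in accepted]
print(accepted, contents)
```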

Multiple max lengths in a regular expression

I have the following regular expression:
[0-9]{7}-[0-9]{1}$
I should be able to match the following patterns:
1234567-8
3142539-1
But not the following:
12345678-1
1234567-12
Currently my regex matches 12345678-1 but not 1234567-12 (in JavaScript). Both should fail. What am I doing wrong?
Your pattern matches any string that ends ($) with [0-9]{7}-[0-9]{1}, so it matches 12345678-1: the last nine characters, 2345678-1, fit the pattern.
Use ^ (start of the string) to specify that the whole string must match exactly:
^[0-9]{7}-[0-9]{1}$
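The difference between the two patterns can be checked with a short Python sketch. The question is about JavaScript, but ^ and $ behave the same way for these single-line inputs; the variable names are illustrative.

```python
import re

UNANCHORED = re.compile(r"[0-9]{7}-[0-9]{1}$")   # original pattern
ANCHORED = re.compile(r"^[0-9]{7}-[0-9]{1}$")    # with the start anchor added

inputs = ["1234567-8", "3142539-1", "12345678-1", "1234567-12"]

unanchored_results = {s: UNANCHORED.search(s) is not None for s in inputs}
anchored_results = {s: ANCHORED.search(s) is not None for s in inputs}
print(unanchored_results)
print(anchored_results)
```

Without ^, the search is free to start at the second character of 12345678-1, which is why that input slips through.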