Regex to select something between a #OnlinePopup and _

Can someone show me a regex to select #OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70? It's okay to assume #OnlinePopup.
~DCTM~dctm://aicpcudev/37004e1f8000219e?DMS_OBJECT_SPEC=RELATION_ID#OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70_11472026_1214836152225_6455280574472127786

NB: The following is .NET Regex syntax, modify for your flavour.
The following:
#[^_]+_[^_]+
will match:
Hash
One or more characters until an underscore
Underscore
One or more characters until an underscore
If the first bit is constant, and you want to be more specific you could use:
#OnlinePopup_[A-F0-9]+
This will match:
#OnlinePopup_ (exactly)
One or more uppercase hex characters until a non-hex character
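As a quick check, both patterns can be run against the sample string from the question. This is only a sketch in Python rather than .NET; the variable names are made up for illustration, and these two patterns happen to need no changes for Python's re module.

```python
import re

# Sample string from the question, split over three literals for readability.
url = ("~DCTM~dctm://aicpcudev/37004e1f8000219e?DMS_OBJECT_SPEC=RELATION_ID"
       "#OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70"
       "_11472026_1214836152225_6455280574472127786")

# Generic form: hash, non-underscores, underscore, non-underscores.
generic = re.search(r"#[^_]+_[^_]+", url)

# Specific form: literal prefix followed by uppercase hex digits.
specific = re.search(r"#OnlinePopup_[A-F0-9]+", url)

print(generic.group(0))
print(specific.group(0))
```

Both calls should print #OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70 for this input.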

Simply matching anything between the first '#' and the first or last '_' will not work for your example since the string that you want returned has an underscore in it. If all the text that you want to match has only one underscore in it, you could use this regex:
/(#[^_]+_[^_]+)/
This matches an octothorpe (#), followed by two strings that do not contain an underscore, separated by a single underscore.

Something a little simpler:
(\#OnlinePopup_.*?)_
Assuming your text starts with # and ends with _
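To see why the lazy quantifier matters here, the sketch below (Python, with an illustrative variable name) compares .*? with a greedy .* on the sample string. The wanted text is in capture group 1, since the trailing underscore is part of the overall match.

```python
import re

url = ("~DCTM~dctm://aicpcudev/37004e1f8000219e?DMS_OBJECT_SPEC=RELATION_ID"
       "#OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70"
       "_11472026_1214836152225_6455280574472127786")

# Lazy: stops at the first underscore after the prefix.
lazy = re.search(r"(\#OnlinePopup_.*?)_", url).group(1)

# Greedy: runs on to the last underscore in the string.
greedy = re.search(r"(\#OnlinePopup_.*)_", url).group(1)

print(lazy)
print(greedy)
```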

Regex match till end of text

I'm using Regex to match whole sentences in a text containing a certain string. This is working fine as long as the sentence ends with any kind of punctuation. It does not work however when the sentence is at the end of the text without any punctuation.
This is my current expression:
[^.?!]*(?<=[.?\s!])string(?=[\s.?!])[^.?!]*[.?!]
Works for:
This is a sentence with string. More text.
Does not work for:
More text. This is a sentence with string
Is there any way to make this work as intended? I can't find any character class for "end of text".
End of text is matched by the anchor $, not a character class.
You have two separate issues you need to address: (1) the sentence ending directly after string, and (2) the sentence ending sometime after string but with no end-of-sentence punctuation.
To do this, you need to make the match after string optional, but anchor that match to the end of the string. This also means that, after you recognize an (optional) end-of-sentence punctuation mark, you need to match everything that follows, so the end-of-string anchor will match.
My changes: Take everything after string in your original regex and surround it in (?:...)? - the (?:...) being a "non-remembered" (non-capturing) group, and the ? making the entire group optional. Follow that with $ to anchor the end of the string.
Within that optional group, you also need to make the end-of-sentence itself optional, by replacing the simple [.?!] with (?:[.?!].*)? - again, the (?:...) makes a non-capturing group, the ? makes the group optional, and the .* allows this to match as much as you want after the end-of-sentence has been found.
[^.?!]*(?<=[.?\s!])string(?:(?=[\s.?!])[^.?!]*(?:[.?!].*)?)?$
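Here is a small sketch of that pattern in Python (the question does not name a language, so Python and the helper name find are assumptions). Note that because of the trailing .* and $, a match that starts in an earlier sentence extends to the end of the text.

```python
import re

# Pattern from this answer, unchanged.
SENTENCE = re.compile(
    r"[^.?!]*(?<=[.?\s!])string(?:(?=[\s.?!])[^.?!]*(?:[.?!].*)?)?$"
)

def find(text):
    """Return the matched text, or None when there is no match."""
    m = SENTENCE.search(text)
    return m.group(0) if m else None

print(repr(find("More text. This is a sentence with string")))
print(repr(find("This is a sentence with string. More text.")))
print(repr(find("No keyword here.")))
```

In the first case the match keeps the leading space after the previous sentence, so you may want to strip() the result.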
The symbol for end-of-text is $ (and, the symbol for beginning-of-text, if you ever need it, is ^).
You probably won't get what you're looking for by just adding the $ to your punctuation list, though (e.g., [.?!$]), because inside a character class $ is a literal dollar sign; you'll find it works better as an alternative choice: ([.?!]|$).
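One way to apply the alternation idea to the original expression is sketched below in Python. The exact placement of |$ (in both the lookahead and the final punctuation) is my own adaptation, not something spelled out above, and the function name is hypothetical.

```python
import re

# Original pattern with "or end of text" added in two places (an assumption).
SENTENCE = re.compile(
    r"[^.?!]*(?<=[.?\s!])string(?=[\s.?!]|$)[^.?!]*(?:[.?!]|$)"
)

def sentences_with_keyword(text):
    """Return each matching sentence with surrounding whitespace removed."""
    return [m.group(0).strip() for m in SENTENCE.finditer(text)]

print(sentences_with_keyword("This is a sentence with string. More text."))
print(sentences_with_keyword("More text. This is a sentence with string"))

# Inside a character class, $ only matches a literal dollar sign.
print(re.search(r"[.?!$]", "cost $5").group(0))
```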
Your regex is way too complex for what you want to achieve.
To match only a word just use
"\bstring\b"
The \b word boundaries match at the start and end of the text and next to any non-alphanumeric delimiter.
It works with the following:
string is at the start
this is the end string
this is a string.
stringing won't match (you don't want a match here)
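A minimal Python sketch of the four cases listed above (Python is an assumption here, since the question does not say which language is used):

```python
import re

WORD = re.compile(r"\bstring\b")

samples = [
    "string is at the start",
    "this is the end string",
    "this is a string.",
    "stringing won't match",
]

# True when the whole word "string" occurs somewhere in the sample.
results = [bool(WORD.search(s)) for s in samples]
for sample, found in zip(samples, results):
    print(found, "-", sample)
```

This only tells you whether the word occurs; it does not return the surrounding sentence.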
You should mention the language in the question, because how you use a regex differs from one language to another.
Here is my example using javascript:
// Allow word characters, whitespace and dots on either side of "string".
var reg = /^([\w\s\.]*)string([\w\s\.]*)$/;
console.log(reg.test('This is a sentence with string. More text.')); // true
console.log(reg.test('More text. This is a sentence with string')); // true
console.log(reg.test('string')); // true
Note:
* : Match zero or more times.
? : Match zero or one time.
+ : Match one or more times.
You can replace * with ? or + if you want a stricter match.

Limiting RegEx to match only a string of 1-254 characters length

This is my RegEx:
"^[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
I need to match only strings less than 255 characters.
I've tried adding the word boundaries at the start of the RegEx but it fails:
"^(?=.{1,254})[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
You need the $ in the lookahead to make sure it's only up to 254. Otherwise, the lookahead will match even when there are more than 254.
(?=.{1,254}$)
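The difference is easy to see in a stripped-down sketch. Here \w+ is just a stand-in for the real body of the expression, and Python is my choice of language, not the asker's.

```python
import re

# Without $ the lookahead is satisfied by the first 254 characters alone.
unanchored = re.compile(r"^(?=.{1,254})\w+$")

# With $ the lookahead must reach the end within 254 characters.
anchored = re.compile(r"^(?=.{1,254}$)\w+$")

too_long = "a" * 300

print(bool(unanchored.match(too_long)))   # lookahead does not limit length
print(bool(anchored.match(too_long)))     # rejected: longer than 254
print(bool(anchored.match("a" * 254)))    # accepted: exactly 254
```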
Also, keep in mind that you can greatly simplify your regex because many characters that would usually need to be escaped do not need to when in a character class (square brackets).
"[\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]"
is the same as this:
"[-\w!#$%&'*+/=`{|}~?^]"
Note that the dash must be first (or last) in the character class to be a literal dash without escaping, and the caret must not be first.
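A short Python sketch can confirm that the unescaped class still accepts every special character it lists. The names LOCAL_CHAR and specials are illustrative only.

```python
import re

# The simplified class from above, with no escapes except \w.
LOCAL_CHAR = re.compile(r"[-\w!#$%&'*+/=`{|}~?^]")

specials = "-!#$%&'*+/=`{|}~?^"

# Every listed special character should match as a single character.
all_match = all(LOCAL_CHAR.fullmatch(ch) for ch in specials)

# A period is not in the class, so it should not match.
dot_match = LOCAL_CHAR.fullmatch(".")

print(all_match, dot_match)
```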
With some other simplifications, here is the complete string:
"^(?=.{1,254}$)[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"
Notes:
I removed the stipulation that the first char shouldn't be a period ([^.]) because the next character class doesn't match a period anyway, so it's redundant.
I removed many extraneous parens
I replaced [0-9] with \d
I replaced {0,1} with the shorthand "?"
After the # sign, it seemed that you were trying to match an IP address or text domain name, so I separated them more so it couldn't be a combination
I'm not sure what the optional square bracket at the end was for, so I removed it: "(]?)"
I tried it in Regex Hero, and it works. See if it works for you.
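If you would rather check it in code, here is a rough Python sketch. The pattern is copied as written above, including # as the separator; the sample inputs and the helper name is_valid are invented for the test.

```python
import re

PATTERN = re.compile(
    r"^(?=.{1,254}$)[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*"
    r"#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"
)

def is_valid(text):
    """True when the whole text matches the pattern."""
    return PATTERN.match(text) is not None

print(is_valid("john.doe#example.com"))         # text domain
print(is_valid("user#192.168.0.1"))             # dotted numeric form
print(is_valid(".user#example.com"))            # leading period
print(is_valid("a" * 250 + "#example.com"))     # longer than 254 characters
```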
This depends on what language you are working in. In Python, for example, you can use a regex to split a text into separate strings, and then use len() to discard any string that exceeds the length limit you want.
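A minimal sketch of that idea, assuming the text is split on whitespace and that 254 is the maximum allowed length (both are assumptions, as is the function name):

```python
import re

def short_tokens(text, max_len=254):
    """Split on whitespace and keep only non-empty tokens within the limit."""
    tokens = re.split(r"\s+", text.strip())
    return [t for t in tokens if t and len(t) <= max_len]

sample = "ok " + "x" * 300 + " fine"
print(short_tokens(sample))
```

This checks length separately from the format, so the original expression would still be needed to validate each remaining token.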
I think this post will help. It shows how to limit certain patterns but I am not sure how you would add it to the entire regex.

Regex for this particular pattern

I have three different things
xxx
xxx>xxx
xxx>xxx>xxx
Where xxx can be any combination of letters and numbers
I need a regex that can match the first two but NOT the third.
To match ASCII letters and digits try the following:
^[a-zA-Z0-9]{3}(>[a-zA-Z0-9]{3})?$
If letters and digits outside of the ASCII character set are required then the following should suffice:
^[^\W_]{3}(>[^\W_]{3})?$
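The two variants can be compared with a short Python sketch. Both assume each part is exactly three characters long, as written above; the sample inputs are made up.

```python
import re

ascii_only = re.compile(r"^[a-zA-Z0-9]{3}(>[a-zA-Z0-9]{3})?$")

# In Python 3, \W is Unicode-aware for str patterns, so [^\W_] accepts
# non-ASCII letters and digits while still excluding the underscore.
unicode_aware = re.compile(r"^[^\W_]{3}(>[^\W_]{3})?$")

for text in ("abc", "ab1>c2d", "abc>def>ghi", "a_c", "äöü>abc"):
    print(text, bool(ascii_only.match(text)), bool(unicode_aware.match(text)))
```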
^\w+(?:>\w+)?$
matches an entire string.
\w+(?:>\w+)?\b(?!>)
matches strings like this in a larger substring.
If you want to exclude the underscore from matching, you can use [\p{L}\p{N}] instead (if your regex engine knows Unicode), or [^\W_] if it doesn't, as a substitute for \w.
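A quick Python sketch of both forms follows (the sample inputs are invented). One thing to watch with the second form: on a chain such as a>b>c it skips the leading part but can still pick up the tail, because nothing stops a match from starting right after a >.

```python
import re

whole = re.compile(r"^\w+(?:>\w+)?$")
inside = re.compile(r"\w+(?:>\w+)?\b(?!>)")

# Whole-string matching: one or two parts only.
print(bool(whole.match("abc")), bool(whole.match("abc>def")),
      bool(whole.match("abc>def>ghi")))

# Searching inside a larger text.
found = inside.findall("foo bar>baz")
print(found)

# Caveat: the tail of a three-part chain is still found.
tail = inside.findall("a>b>c")
print(tail)
```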

regex to grep all numbers after the second-last underscore

I want to get all the characters after the second-last underscore in a string. Any ideas how this could be accomplished?
Input: PART1_PART2_PART3_G2010
Output: PART3_G2010
Any idea what the regex should look like?
.*_([^_]*_[^_]*)$
This isn't bound to a specific number of parts between the underscores, unlike Andrea Spadaccini's regex.
edit
The first two symbols, .*, match every character, because . matches one arbitrary character and * is a quantifier meaning "as many as possible". Then an underscore should appear.
The expression in parentheses captures the two parts on either side of the last underscore. First, we capture all (again the *) non-underscore characters.
This is done using square brackets and saying we want any character except (^) the underscore: [^_]. The very last symbol, $, marks the end of the input string. I think it is possible to leave out either this or the .* at the beginning...
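Applied to the example from the question, a Python sketch might look like this (the language and the helper name tail_after_second_last are assumptions, since the question names neither):

```python
import re

PATTERN = re.compile(r".*_([^_]*_[^_]*)$")

def tail_after_second_last(text):
    """Return everything after the second-last underscore, or None."""
    m = PATTERN.match(text)
    return m.group(1) if m else None

print(tail_after_second_last("PART1_PART2_PART3_G2010"))
print(tail_after_second_last("A_B_C_D_E"))
print(tail_after_second_last("A_B"))
```

The last call returns None because the pattern needs at least two underscores in the input.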
Andrea Spadaccini's answer works if you know that the input has three underscores. If the question was meant more generally, referring to everything after the second-last underscore regardless of how many underscores come before it, the regex needs to search from the end ($) like this:
_([^_]*_[^_]*)$
First N non-underscores, then an underscore. Repeat. Group the last characters.
[^_]*_[^_]*_(.*)
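The two approaches agree on the example input but differ once there are more parts. The Python sketch below (illustrative names, assumed language) shows the end-anchored pattern next to the one that counts underscores from the start.

```python
import re

from_end = re.compile(r"_([^_]*_[^_]*)$")      # second-last underscore
from_start = re.compile(r"[^_]*_[^_]*_(.*)")   # second underscore from the left

example = "PART1_PART2_PART3_G2010"
longer = "A_B_C_D_E"

print(from_end.search(example).group(1), from_start.search(example).group(1))
print(from_end.search(longer).group(1), from_start.search(longer).group(1))
```

With exactly three underscores both return PART3_G2010; with four, the end-anchored pattern returns D_E while the start-based one returns C_D_E.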

Matching parts of string that contain no consecutive dashes

I need a regex that will match strings of letters that do not contain two consecutive dashes.
I came close with this regex that uses lookaround (I see no alternative):
([-a-z](?<!--))+
Which given the following as input:
qsdsdqf--sqdfqsdfazer--azerzaer-azerzear
Produces three matches:
qsdsdqf-
sqdfqsdfazer-
azerzaer-azerzear
What I want however is:
qsdsdqf-
-sqdfqsdfazer-
-azerzaer-azerzear
So my regex loses the first dash, which I don't want.
Who can give me a hint or a regex that can do this?
This should work:
-?([^-]-?)*
It makes sure that there is at least one non-dash character between every two dashes.
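Trying it on the input from the question, in Python (an assumed language; the question does not give one). Because every part of the pattern is optional, it can also produce an empty match at the end of the text, so the sketch filters those out.

```python
import re

PATTERN = re.compile(r"-?([^-]-?)*")
text = "qsdsdqf--sqdfqsdfazer--azerzaer-azerzear"

# group(0) is the whole match; skip any empty matches.
parts = [m.group(0) for m in PATTERN.finditer(text) if m.group(0)]
print(parts)
```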
Looks to me like you do want to match strings that contain double hyphens, but you want to break them into substrings that don't. Have you considered splitting it between pairs of hyphens? In other words, split on:
(?<=-)(?=-)
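In Python this split can be sketched as follows. It relies on re.split accepting a pattern that matches the empty string, which needs Python 3.7 or later; other languages may handle zero-width splits differently.

```python
import re

text = "qsdsdqf--sqdfqsdfazer--azerzaer-azerzear"

# Split at every position that sits between two hyphens.
parts = re.split(r"(?<=-)(?=-)", text)
print(parts)
```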
As for your regex, I think this is what you were getting at:
(?:[^-]+|-(?<!--)|\G-)+
The -(?<!--) will match one hyphen, but if the next character is also a hyphen the match ends. Next time around, \G- picks up the second hyphen because it's the next character; the only way that can happen (except at the beginning of the string) is if a previous match broke off at that point.
Be aware that this regex is more flavor dependent than most; I tested it in Java, but not all flavors support \G and lookbehinds.