Need to refine this regex expression

The input I can get might be
/DemoSystems/DemoFramework/MyRepo/MyModule/tags/2015_02_22
or
/DemoSystems/DemoFramework/MyRepo/MyModule/tags/2015_02_22/Demo.Tests/AverageTests.cs
I need to extract in both cases.
/DemoSystems/DemoFramework/MyRepo/MyModule/tags/2015_02_22
Regex:
^(.*?)tags
is matching till
/DemoSystems/DemoFramework/MyRepo/MyModule/tags
An added complication is that 2015_02_22 can be anything: a mix of digits, letters and whitespace, depending on the developer. In other words, I have to match up to 'tags' plus the next folder after it.
Any pointers?

You can use:
.*?tags\/[^\/]+
It matches everything from the start of the line up to and including the word tags, then the / after it, then all following characters up to (but not including) the next / or the end of the string.
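As a quick check, here is a minimal Python sketch that applies the pattern to both sample inputs. The function name tag_root is made up for illustration, and the slashes are left unescaped because Python does not need them escaped.

```python
import re

# Lazily match up to "tags", then the slash and the next path segment.
TAG_ROOT = re.compile(r".*?tags/[^/]+")

def tag_root(path):
    """Return the path up to and including the folder after 'tags', or None."""
    m = TAG_ROOT.match(path)
    return m.group(0) if m else None

paths = [
    "/DemoSystems/DemoFramework/MyRepo/MyModule/tags/2015_02_22",
    "/DemoSystems/DemoFramework/MyRepo/MyModule/tags/2015_02_22/Demo.Tests/AverageTests.cs",
]
for p in paths:
    print(tag_root(p))
```

Both inputs should come back as the same tag folder, and a path without a tags segment returns None.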

Related

Allowing words picked up in regex in certain cases only

I have a regex expression to look for people just sticking "N/A" or similar into a form field.
^(?!(\b(N/A|NA|n/a|na|Yes|yes|YES|No|no|NO)\b))
Probably not the most elegant I am sure. However I cannot for the life of me get it to allow the above words if followed by something.
So if someone just types "yes" then I want it to fail the regex check. But if someone types "yes, I have blah blah etc etc" I want it to pass.
The expression I have allows the word to be used as long as it isn't the first word in the sentence. I just want to disallow the listed words as the ONLY words in the field.
Any ideas?
Thanks
You may remove the first \b (it is redundant between the start of string and a word char) and replace the second one with $ (end of string):
^(?!(?:N/A|NA|n/a|na|Yes|yes|YES|No|no|NO)$)
With a case insensitive option, you may reduce the pattern to
^(?!(?:n/?a|yes|no)$)
Details
^ - start of string, then...
(?!(?:n/?a|yes|no)$) - a negative lookahead that fails the match if the current position is immediately followed by n/?a (na, n/a), yes or no and then the end of the string.
In plain words, the pattern matches (an empty string at the start) only if the whole string is not equal to one of the alternatives inside the group.
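A small Python sketch of the case-insensitive lookahead pattern, in case it helps to see it run. The helper name passes and the sample values are assumptions for illustration only.

```python
import re

# Case-insensitive version of the negative lookahead pattern.
NOT_ONLY_FILLER = re.compile(r"^(?!(?:n/?a|yes|no)$)", re.IGNORECASE)

def passes(value):
    """True unless the whole value is exactly one of the filler words."""
    return NOT_ONLY_FILLER.search(value) is not None

for value in ["yes", "N/A", "yes, I have blah blah", "nothing to report"]:
    print(repr(value), passes(value))
```

Note that an empty string also passes this check, so a "field is required" rule would have to be enforced separately.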
The easiest way would be to match all the forbidden strings exactly and invert the result.
Try ^(n/?a|yes|no)$ with a case-insensitive option and invert the result.
^ matches the beginning of the string. $ matches the end of the string.
When you don't have a case-insensitive option, use ^([nN]/?[aA]|[yY][eE][sS]|[nN][oO])$.
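The match-and-invert idea could look roughly like this in Python. The function name is_acceptable is hypothetical; the second pattern is the flag-free variant, included only to show that both behave the same on these inputs.

```python
import re

# Match the forbidden answers exactly, then invert the result.
FILLER_ONLY = re.compile(r"^(n/?a|yes|no)$", re.IGNORECASE)
# Same idea without a case-insensitive flag.
FILLER_ONLY_NO_FLAG = re.compile(r"^([nN]/?[aA]|[yY][eE][sS]|[nN][oO])$")

def is_acceptable(value, pattern=FILLER_ONLY):
    """Reject only when the entire value is one of the filler words."""
    return pattern.search(value) is None

for value in ["Yes", "n/a", "No", "yes, I have blah blah"]:
    print(repr(value), is_acceptable(value), is_acceptable(value, FILLER_ONLY_NO_FLAG))
```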

Regex match till end of text

I'm using Regex to match whole sentences in a text containing a certain string. This is working fine as long as the sentence ends with any kind of punctuation. It does not work however when the sentence is at the end of the text without any punctuation.
This is my current expression:
[^.?!]*(?<=[.?\s!])string(?=[\s.?!])[^.?!]*[.?!]
Works for:
This is a sentence with string. More text.
Does not work for:
More text. This is a sentence with string
Is there any way to make this work as intended? I can't find any character class for "end of text".
End of text is matched by the anchor $, not a character class.
You have two separate issues you need to address: (1) the sentence ending directly after string, and (2) the sentence ending sometime after string but with no end-of-sentence punctuation.
To do this, you need to make the match after string optional, but anchor that match to the end of the string. This also means that, after you recognize an (optional) end-of-sentence punctuation mark, you need to match everything that follows, so the end-of-string anchor will match.
My changes: Take everything after string in your original regex and surround it in (?:...)? - the (?:...) being a "non-remembered" group, and the ? making the entire group optional. Follow that with $ to anchor the end of the string.
Within that optional group, you also need to make the end-of-sentence itself optional, by replacing the simple [.?!] with (?:[.?!].*)? - again, the (?:...) is to make a "non-remembered" group, the ? makes the group optional - and the .* allows this to match as much as you want after the end-of-sentence has been found.
[^.?!]*(?<=[.?\s!])string(?:(?=[\s.?!])[^.?!]*(?:[.?!].*)?)?$
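For what it's worth, here is a short Python sketch that runs this expression against the two samples from the question. Python's re module is an assumption here, since the question does not name a language.

```python
import re

# The modified expression: the part after "string" is optional, anchored with $.
SENTENCE = re.compile(
    r"[^.?!]*(?<=[.?\s!])string(?:(?=[\s.?!])[^.?!]*(?:[.?!].*)?)?$"
)

samples = [
    "This is a sentence with string. More text.",
    "More text. This is a sentence with string",
]
for text in samples:
    m = SENTENCE.search(text)
    print(repr(m.group(0)) if m else None)
```

Both samples now match. Because of the trailing .* and the $ anchor, the match for the first sample runs on to the end of the text rather than stopping at the end of the sentence.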
The symbol for end-of-text is $ (and, the symbol for beginning-of-text, if you ever need it, is ^).
You probably won't get what you're looking for by just adding the $ to your punctuation list, though (e.g., [.?!$]), because inside a character class $ is just a literal dollar sign; you'll find it works better as an alternative choice: ([.?!]|$).
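A rough Python sketch of the alternation idea follows. One assumption beyond the answer: the lookahead after string also gets a |$ branch, otherwise string could not be the very last word. The function name sentences_with_string is made up.

```python
import re

# Sentence ends at punctuation OR at the end of the text.
# Assumption: "|$" is also added to the lookahead so "string" may end the text.
SENTENCE = re.compile(
    r"[^.?!]*(?<=[.?\s!])string(?=[\s.?!]|$)[^.?!]*(?:[.?!]|$)"
)

def sentences_with_string(text):
    """Return every sentence containing the word, trimmed of outer whitespace."""
    return [m.group(0).strip() for m in SENTENCE.finditer(text)]

print(sentences_with_string("This is a sentence with string. More text."))
print(sentences_with_string("More text. This is a sentence with string"))

# Inside a character class, $ is a literal dollar sign, not an anchor.
print(re.search(r"[.?!$]", "costs $5").group(0))
```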
Your regex is way too complex for what you want to achieve.
To match only a word just use
"\bstring\b"
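It matches the word at the start or end of the text, or wherever it is surrounded by non-word characters (anything other than letters, digits and underscore).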
It works with the following:
string is at the start
this is the end string
this is a string.
stringing won't match (you don't want a match here)
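The four cases above can be checked with a few lines of Python (chosen here only as an example language):

```python
import re

# \b is a word boundary: the edge between a word character and a non-word one.
WORD = re.compile(r"\bstring\b")

samples = [
    "string is at the start",
    "this is the end string",
    "this is a string.",
    "stringing won't match",
]
for text in samples:
    print(bool(WORD.search(text)), text)
```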
You should add the language to the question, since the details of using a regex depend on it.
Here is my example using javascript:
var reg = /^([\w\s\.]*)string([\w\s\.]*)$/;
console.log(reg.test('This is a sentence with string. More text.'));
console.log(reg.test('More text. This is a sentence with string'));
console.log(reg.test('string'))
Note:
* : Match zero or more times.
? : Match zero or one time.
+ : Match one or more times.
You can replace * with ? or + if you want a more specific match.

Skip Second String Between Characters with Regex

I've been working on a regex issue. I have a lot of lines formatted like this:
3240985|#Apple.-+240538|34346|346356356|36433565|6agf8s89auf
The end goal should look like this:
#Apple.-+240538|6agf8s89auf
#Apple.-+240538 is random characters, and 6agf8s89auf is random alphanumeric characters.
I've been using (.*?)[\|] and replacing the parts I need with blank characters in Notepad++ but it's impossible to complete it this way with the number of lines I have.
The regex for this kind of string is (?:(?<=^)|(?<=\|))(\d+(?:$|\|))
Demo: https://regex101.com/r/sO0fZ2/2
However, Find and Replace in Notepad++ may have some issues, because Notepad++ finds and replaces strings only once. Some other text editors, like Sublime Text, find and replace the contents recursively. However, you can simply overcome this by clicking the Replace All button multiple times.
[Screenshots not preserved: the input, and the result after clicking "Replace All in All Opened Documents" twice.]
In Sublime Text, you can achieve this in a single click:
[Screenshots not preserved: the input and the result.]
P.S.: I'm not aware if there's any feature in Notepad++ that finds and replaces the content recursively. You can google for that. If there's any feature like that, then you can use it. However, I think that this shouldn't be a problem because it will only require a couple of more clicks.
There is a simple approach with an alternation:
^\d+\||\|\d+(?=\||$)
Details:
^\d+\| - Branch 1 matching a chunk of 1+ digits (\d+) at the beginning of the string (^) and a | after them
| - alternation operator meaning OR
\|\d+(?=\||$) - a literal pipe (\|, must be escaped) with 1+ digits after it (\d+) that are followed with a literal pipe or end of string ((?=...) is a positive lookahead that does not advance the regex index, thus, you can still match adjacent matches with the same pattern.)
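Outside an editor, the same alternation can be applied in one pass. Here is a minimal Python sketch; the function name strip_numeric_fields is hypothetical, and the input is the sample line from the question.

```python
import re

# Branch 1: leading digits plus their pipe. Branch 2: a pipe plus digits,
# but only when a pipe or the end of the line follows (lookahead, not consumed).
NUMERIC_FIELDS = re.compile(r"^\d+\||\|\d+(?=\||$)")

def strip_numeric_fields(line):
    """Remove every purely numeric field together with one adjacent pipe."""
    return NUMERIC_FIELDS.sub("", line)

line = "3240985|#Apple.-+240538|34346|346356356|36433565|6agf8s89auf"
print(strip_numeric_fields(line))
```

Because the lookahead does not consume the following pipe, adjacent numeric fields are all removed in a single substitution.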

Match pattern anywhere in string?

I want to match the following pattern:
Exxxx49 (where x is a digit 0-9)
For example, E123449abcdefgh, abcdefE123449987654321 are both valid. I.e., I need to match the pattern anywhere in a string.
I am using:
^*E[0-9]{4}49*$
But it only matches E123449.
How can I allow any amount of characters in front or after the pattern?
Remove the ^ and $ to search anywhere in the string.
In your case the * are probably not what you intended; E[0-9]{4}49 should suffice. This will find an E, followed by four digits, followed by a 4 and a 9, anywhere in the string.
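As an illustration, assuming Python since no language was given, an unanchored search finds the pattern anywhere in the string:

```python
import re

# No ^ or $: search() looks for the pattern at any position.
PATTERN = re.compile(r"E[0-9]{4}49")

for text in ["E123449abcdefgh", "abcdefE123449987654321", "E12349"]:
    print(text, bool(PATTERN.search(text)))
```

The last sample is there as a negative case: it has only three digits before the 49.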
I would go for
^.*E[0-9]{4}49.*$
EDIT:
since it fulfills all requirements stated by the OP:
"[match] Exxxx49 (where x is digit 0-9)"
"allow for any amount of characters in front or after pattern"
It will match
^.* everything from, including the beginning of the line
E[0-9]{4}49 the requested pattern
.*$ everything after the pattern, including the end of the line
Your original regex had a regex pattern syntax error at the first *. Fix it and change it to this:
.*E\d{4}49.*
This pattern is for matching with APIs that implicitly anchor the match to the whole input, like Java's matches(), since you did not specify a language.
.* matches any sequence of characters. As it surrounds the pattern, this will match the entire string as long as the pattern occurs somewhere in it.
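To show the difference between a whole-input match and a search, here is a small sketch using Python's re.fullmatch as a stand-in for an anchored matching API. That substitution is an assumption made purely for illustration.

```python
import re

bare = re.compile(r"E\d{4}49")
padded = re.compile(r".*E\d{4}49.*")

text = "abcdefE123449987654321"

# fullmatch() requires the pattern to cover the whole input.
print(bare.fullmatch(text) is not None)
print(padded.fullmatch(text) is not None)

# search() does not, so the bare pattern is enough there.
print(bare.search(text).group(0))
```

With a whole-input API the surrounding .* is what lets the pattern succeed; with a search API it is unnecessary.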
Just simply use this:
E[0-9]{4}49
How do I allow for any amount of characters in front or after pattern? but it only matches E123449
Use the global flag, /E\d{4}49/g, if the language supports it
OR
Try a capturing group, (E\d{4}49)+, formed by enclosing the pattern in parentheses (...)

Write a wildcard that matches specific delimiter in Word

I'm writing a wildcard string in Word that should match:
{0>yadayada<}100{>yadayada<0}
Where yadayada can be anything EXCEPT the start of a new delimiter denoted by: {0>
This is what I have so far:
(\{0\>)*(\<\}100\{\>)*(\<0\})
This works except that the first '*' keeps matching text until it finds <}100{>yadayada<0}
I need to change it so that the * selects everything EXCEPT strings that contain '{0>'
I tried this by changing the first * with
[!(\{0>)]*
Or everything together:
(\{0\>)[!(\{0>)]*(\<\}100\{\>)*(\<0\})
But this evidently doesn't work.
Please help!
Try this:
\{0>.+?(?=\{0>)
You only need to escape the { (written as \{).
What this regular expression says is:
Match all strings containing {0>, then any text one or more times (.+); the ? at the end tells the regex engine to do a lazy search, since .+ will consume all characters if you let it. A lazy search finds the fewest characters needed before the next part of the regex can take over.
Then the (?=\{0>) says to match the next delimiter but not include it in the selection.
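To see how the lazy quantifier and the lookahead interact, here is a small sketch using Python's re module. This is an assumption for illustration only: Word's own wildcard search uses a different syntax and is not exercised here, and the sample text is made up.

```python
import re

text = "{0>foo<}100{>bar<0}{0>baz<}100{>qux<0}"

# Lazy .+? stops as soon as the lookahead sees the next "{0>".
up_to_next = re.compile(r"\{0>.+?(?=\{0>)")
print(up_to_next.findall(text))

# Hypothetical variant: also accept the end of the text as a stopping point,
# so the last segment (which has no following "{0>") is found as well.
up_to_next_or_end = re.compile(r"\{0>.+?(?=\{0>|$)")
print(up_to_next_or_end.findall(text))
```

With the pattern as given, the final segment is skipped because nothing follows it; the variant with |$ in the lookahead picks it up.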
Hope this helps!