Regex problems in TextMate - regex

Regular Expressions are new to me (yet they are wonderful and useful :D). However, after trying to use them in TextMate, I'm getting unexpected results. I'm not sure if that's a bug or that's how regular expressions work.
I have this code
begin text in the middle end and more text and a
second begin second text in the middle end
Searching with begin.+end I would expect two results:
begin text in the middle end
and
begin second text in the middle end
But I get the whole text selected; I would expect begin.+end to search for .+ until the first end is found, but it searches until the last one.
Is that how they work? Where could I learn how to use regular expressions?
The truth is I'm interested in just selecting the inside .+ without begin and end but that's another question.

Use the regex below to get the strings between begin and end:
(?<=begin).+?(?=end)
Explanation:
(?<=begin) A positive lookbehind asserts that the match is preceded by a specific pattern. Here, the regex engine places the start of the match just after begin.
.+? Matches one or more characters. The ? makes the quantifier non-greedy, so it produces the shortest possible match.
(?=end) A positive lookahead asserts that end follows. The engine stops matching as soon as it finds the string end, giving you the characters between begin and end.
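To see the difference between the greedy and non-greedy forms side by side, here is a minimal Python sketch. It uses Python's re module rather than TextMate's engine, and it assumes the sample text sits on a single line so that . never has to cross a newline; the variable names are made up for illustration.

```python
import re

# The question's sample text, joined onto one line (an assumption, so that
# "." never needs to match a newline).
text = ("begin text in the middle end and more text and a "
        "second begin second text in the middle end")

# Greedy: .+ runs to the end of the text and backs up to the LAST "end".
greedy = re.findall(r"begin.+end", text)

# Non-greedy: .+? stops at the FIRST "end" that follows each "begin".
lazy = re.findall(r"begin.+?end", text)

# Lookarounds: same idea, but "begin" and "end" are left out of the match.
inner = re.findall(r"(?<=begin).+?(?=end)", text)

print(len(greedy), len(lazy), len(inner))
```

The greedy pattern yields a single match covering the whole text, which is the behaviour described in the question; the other two yield one match per begin/end pair.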

Related

Notepad++ find under a condition?

This is my sentence
「システム、スキャンモード。特定しました。
This sentence has a CRLF at the end. I wish to match the 。CRLF at the end, but ONLY if the string starts with 「.
I thought it would not be too hard to do this but I couldn't do it.
I tried multiple variations of
^(?=「).*。\R
This satisfies the condition, but it matches the whole line instead of just 。CRLF.
I am a regex newbie, so I think this is probably not hard at all. I am just not very knowledgeable about it.
You can match 「 at the beginning of a line first and then use \K to discard the current matched text from the final match:
^「.*\K。\R
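For readers outside Notepad++: Python's re module supports neither \K nor \R, so the sketch below emulates the same idea with a capturing group and a literal \r\n. This is an assumed equivalent, not the answer's regex itself, and find_terminators is a hypothetical helper name.

```python
import re

# ^「      the line must start with 「 (MULTILINE makes ^ match at each line)
# .*      the rest of the line, greedy
# (。\r\n) the part we actually care about, captured as group 1
PATTERN = re.compile(r"^「.*(。\r\n)", re.MULTILINE)

def find_terminators(text):
    """Return (start, end) spans of each 。CRLF on lines that begin with 「."""
    return [m.span(1) for m in PATTERN.finditer(text)]

# First line starts with 「 and should be reported; the second should not.
sample = "「システム、スキャンモード。特定しました。\r\nシステム。\r\n"
spans = find_terminators(sample)
```

Reading group 1 plays the role of \K here: the prefix still has to match, but only the captured tail is reported.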

Regular expression to delete all words between two specific words

I'm normally ok with regex but I'm struggling with this.
I have a simple file with two words that start and end a set of data. The data between the words changes, but start and status are always in the same place.
Example :
start
Everything in between
status
I'm trying to work out how to delete (replace) everything between and including start and status
I'm sure I had it working with this at one time
(?i)^start.+?status
set(#replaceAll,$replace regular expression(#textTest,"(?i)^start.+?status"," "),"Global")
but it's just not working anymore.
You could use the regular expression
\bstart\b.+?\bstatus\b
which does not require "status" to be on the same line as "start". Two flags should be set:
case-insensitive matching (/i)
single-line mode, which allows . to match a newline (/s)
The regex reads, "match 'start' with a word break fore and aft (to avoid matching 'starting' or 'jumpstart', for example), then match one or more characters lazily, then match 'status' with wordbreaks". The middle match must be lazy so that the regex engine will stop at the next (rather than last) instance of 'status'.
If the regex engine being used does not support single-line mode, or something comparable, one can replace .+? with [\s\S]+?.
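As an illustration of the two flags and of the [\s\S] fallback, here is a small Python sketch. It uses Python's re module, not the scripting tool from the question, and the function names and sample text are invented for the example.

```python
import re

def strip_block(text):
    """Delete everything from 'start' through 'status', across newlines."""
    # IGNORECASE corresponds to /i, DOTALL to single-line mode (/s).
    return re.sub(r"\bstart\b.+?\bstatus\b", "", text,
                  flags=re.IGNORECASE | re.DOTALL)

def strip_block_portable(text):
    """Same result without DOTALL: [\\s\\S] matches any character, newlines too."""
    return re.sub(r"\bstart\b[\s\S]+?\bstatus\b", "", text,
                  flags=re.IGNORECASE)

sample = "header\nStart\nEverything in between\nstatus\nfooter"
cleaned = strip_block(sample)
```

The word boundaries keep a line such as "jumpstart now status" untouched, because no standalone "start" precedes "status".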
So my original expression works and so does Cary's.
The files have changed since I last used the expression. They contain some white-space in the form of newlines that needed to be removed first.
set(#cleanup,$replace(#text2,$new line," "),"Global")
set(#text2,$replace regular expression(#cleanup,"\\bstart\\b.*?\\bstatus\\b",""),"Global")
set(#cleanup,$replace regular expression(#cleanup,"(?i)^start.+?status:",""),"Global")
Sorry about that but thanks to all who looked and helped :)

Regex last word starting at end of string

I have the following regex \b(\w+)$ that works to find the last word in a string. However the longer the string, the more steps it takes.
How can I make it start the search from the end of the line?
Answer
Brief
Using the regex you specified \b(\w+)$ you will get an increasing number of steps depending on the string's length (it will match each \b, then each \b\w, then each \b\w\w until it finds a proper match of \b\w$), but it still has to do that check on each item in the string until it's satisfied.
What you can do to get the last item of a string using regex explicitly is to flip the string and then use the ^ anchor. This will cause regex to immediately be satisfied upon the first match and stop processing.
You can search how to flip a string in multiple languages. Some examples for languages include the following:
Java
C#
PHP
Code
You can see the regex in use here
Your programming language
// flip string code goes here
Regex
^(\w+)
Your programming language
// flip regex capture code goes here
Input
This is my string
Output
Converted to the following by flipping the string in your language
gnirts ym si sihT
Regex returns the following result
gnirts
Flip the string back in your language
string
Explanation
Since the anchor ^ is used, it will check from the beginning of the string (as per usual regex behaviour). If this is satisfied it will return the match, otherwise, it will return no matches. Testing in regex101 (provided through the link in the Code section) shows that it takes exactly 6 steps to ensure that a match is made. It also takes exactly 3 steps to ensure no match is made. These values do not change with string length.
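The flip-and-anchor approach described above can be sketched in Python as follows. The function name is hypothetical, and slicing with [::-1] is just one way to reverse a string.

```python
import re

def last_word_by_flipping(text):
    """Reverse the string, match a word at the (new) start, reverse it back."""
    flipped = text[::-1]
    # re.match only tries position 0, which is the same effect as the ^ anchor.
    match = re.match(r"(\w+)", flipped)
    if match is None:
        return None  # the original string did not end in a word character
    return match.group(1)[::-1]

result = last_word_by_flipping("This is my string")
```

Like the original \b(\w+)$, this finds nothing when the string ends in punctuation or whitespace.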
Matching from the end of the string only works in .NET:
Regex rx = new Regex(Pattern, RegexOptions.RightToLeft);
Match match = rx.Match(Source);
In most regex engines, you can't.
Regex engines work by consuming input from the start of the input.
You can programmatically do it with a simple decrementing loop over the characters starting from the last character. If you need more performance, using code over regex is the only way.
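A decrementing loop of that kind might look like the Python sketch below. It treats letters, digits and underscore as word characters, which only approximates \w, and the function name is made up for the example.

```python
def last_word_by_scanning(text):
    """Walk backwards from the last character while it is a word character."""
    end = len(text)
    start = end
    # isalnum() plus "_" is an approximation of the regex class \w.
    while start > 0 and (text[start - 1].isalnum() or text[start - 1] == "_"):
        start -= 1
    # An empty slice means the string did not end in a word character.
    return text[start:end] if start < end else None

word = last_word_by_scanning("This is my string")
```

The work done depends only on the length of the last word, not on the length of the whole string.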
This can be faster.
^.*\b(\w+)
• add ^.* before and capture \w+
• drop the $ if possible
Good luck!
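For reference, the greedy-prefix variant can be tried in Python like this. It is only a sketch: it assumes single-line input (since . does not match a newline by default), it makes no claim about speed, and the function name is hypothetical.

```python
import re

def last_word_greedy(text):
    """Let .* swallow the line, then backtrack to the last word boundary."""
    # re.match anchors at the start, so the leading ^ is implied.
    match = re.match(r".*\b(\w+)", text)
    return match.group(1) if match else None

word = last_word_greedy("This is my string")
```

Because the $ is dropped, this version also returns the last word when the string ends in punctuation, which differs from the original \b(\w+)$.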

Using Flags of Regex within Google Forms

I'm trying to use flags within Google Forms, and I've been googling for the last couple of hours hoping to find an answer, but didn't find any. Google Forms says that the regular expression is not valid, even when I use a simple regex such as (?i)t. I'm trying to use the regex inside a paragraph question.
How can I make it work?
Edit:
What I really need is to match [a-zA-Z" ]+( *),( *)[1-9]([0-9]??)\n repeatedly, so each line will look something like: Sam "The Man" McAdams , 9\n. Of course, the number of lines is unknown. Using the repetition modifiers * or + at the end of the regex does not satisfy my needs: if the first line is accepted as valid, the remaining lines can contain anything at all and the input is still considered valid, even though it is not.
You can use the following expression to validate an entire string that only consists of lines meeting your pattern:
^([a-zA-Z" ]+ *, *[1-9][0-9]?(\n|$))+$
See the regex demo.
The main point is to add an alternation group to match either a newline or the end of string ((\n|$)) and wrap the whole pattern into a +-quantified group ((...)+) anchored at both start (^) and end ($).
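The anchored, repeated group can be checked outside Google Forms with a short Python sketch. Python's re module is used here as a stand-in and may not behave identically to the engine behind Google Forms; is_valid and the sample answers are invented for the example.

```python
import re

# One or more lines of: name, optional spaces, comma, optional spaces,
# a number from 1 to 99, then either a newline or the end of the string.
LINES = re.compile(r'^([a-zA-Z" ]+ *, *[1-9][0-9]?(\n|$))+$')

def is_valid(answer):
    """True only if EVERY line of the answer fits the pattern."""
    return LINES.search(answer) is not None

good = 'Sam "The Man" McAdams , 9\nJane Doe, 42'
bad = 'Sam "The Man" McAdams , 9\nanything goes here'
```

Because the group is wrapped between ^ and $, a valid first line is no longer enough: the second sample is rejected as a whole.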

Regular expression to match specified substring to end of string?

What regular expression will match everything including a specified substring to the end of the string?
For example, in
"now is the time for (all) good men"
I want to match the substring:
"for (all) good men"
I know the specific sub-substring "for" that begins what I want to match; I don't know what's after it.
I think you're asking to match everything from the given string to the end of the line.
You need to match anything following the search string, using . to match anything, and * to tell the regex to expect any number of anythings, including 0.
So, your regex should read something like
/for \(all\) good men.*/
The backslash before each parenthesis is necessary in this case because parentheses are reserved in regex -- that is, they mean something -- and the backslash escapes them so that they are treated as normal characters.
The slashes on either end of the pattern are standard practice. You wouldn't need them in Java, but you would in JavaScript, sed, vi, and other implementations.
If you're asking how to match the given string only if it is at the end of the line then you'd use this:
/for \(all\) good men$/
Where the $ means end-of-line.
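Both cases can be sketched in Python, where no surrounding slashes are needed. The helper names are hypothetical, and re.escape is used so that characters such as ( and ) in the known substring are treated literally instead of being escaped by hand.

```python
import re

def from_substring_to_end(text, start):
    """Return everything from the first occurrence of `start` to end of line."""
    # .* matches any number of characters (including zero) up to a newline.
    match = re.search(re.escape(start) + r".*", text)
    return match.group(0) if match else None

def ends_with(text, tail):
    """True if `tail` sits at the very end of the string ($ anchor)."""
    return re.search(re.escape(tail) + r"$", text) is not None

sentence = "now is the time for (all) good men"
matched = from_substring_to_end(sentence, "for")
```

Only the known prefix "for" is needed for the first helper, which fits the question's constraint that the rest of the text is unknown.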