Notepad++ find under a condition?

This is my sentence
「システム、スキャンモード。特定しました。
This sentence has a CRLF at the end. I wish to match the 。CRLF at the end, but ONLY if the string starts with 「.
I thought it would not be too hard to do this but I couldn't do it.
I tried multiple variations of
^(?=「).*。\R
This will go through the condition, but matches the whole line instead of just 。CRLF
I am a regex newbie, so I think this is probably not hard at all. I am just not very knowledgeable about it.

You can match 「 at the beginning of a line first and then use \K to discard the current matched text from the final match:
^「.*\K。\R
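Outside Notepad++, not every engine has \K or \R. As a minimal sketch, assuming Python's built-in re module (which supports neither), the same effect can be had by capturing the part to keep and writing it back in the replacement. The sample text and variable names are made up for illustration, and the line ending is assumed to be CRLF:

```python
import re

# Hypothetical sample: the first line starts with 「, the second does not.
text = "「システム、スキャンモード。特定しました。\r\nplain line。\r\n"

# Capture everything before the final 。 instead of discarding it with \K,
# then put the captured part back in the replacement.
pattern = re.compile(r"^(「.*)。\r\n", re.MULTILINE)
result = pattern.sub(r"\1", text)

print(repr(result))
```

Only the line that starts with 「 loses its trailing 。 and CRLF; the other line is left alone.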

Related

Notepad++ Regex Remove Character from Markdown Formatted Footnote

This is a follow-up question to what was solved yesterday:
Notepad++ Regex Replace Makeshift Footnotes format With Proper Markdown format
I managed to find a Regex to remove the offending semicolons in the main text area but by only cutting out the text and pasting back the result, which can only be done one by one.
I'm not sure how this can be done, but the expert can tell me.
So I have footnote references in markdown format. Two instances of the same thing:
[^1]:
[^2]:
.
.
.
[^99]:
I might not have 99 in a document but I wanted to show I need to match two digits here again.
As I said, there are two instances of these numbered references in the text. One in the main text pointing to the footnote and the footnote at the end of the document.
What I need is deleting the semi-colons from the main text and leave the
[^3]:
[^15]:
etc.
references at the end intact.
Because the main text references come after a word or at the end of a sentence (usually before the sentence-ending period), there is never a case where a reference would start a sentence (even if they seem to appear there once or twice because of word wrap).
I provided the exact opposite of my needs here:
Click here for Regex101 website link
I put in the exact opposite of what I want because I already knew of the
^
sign to match anything that is at the front of the line.
Now I would like to negate this, if possible, so that I would delete the semi-colons in the main text, not down at the bottom.
Of course, it is likely that my approach is not good and you'll come up with a completely different approach. Especially because there doesn't seem to be a NOT operator in Regex, if I read correctly.
I repeat: the Regex101 example with the match and substitution is exactly the opposite of what I want.
I am not sure if you can play around in the substitution line to get the desired negative effect.
I could probably have asked for removing the first occurrence of semi-colons, but I thought the important part of tackling the problem is that the items not to be matched are always at the start of the line, and the others are not.
Thanks for any suggestions
In Notepad++ you might use a negative lookbehind asserting that the current position is not the start of a line, and use \K to clear the match buffer so that only the colon is matched; it can then be replaced with an empty string.
(?<!^)\[\^\d{1,2}]\K:
Explanation
(?<!^) Negative lookbehind, assert that the current position is not the start of a line
\[\^ Match [^
\d{1,2} Match 1 or 2 digits
] Match ] literally
\K Forget what is matched so far
: Match a colon
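To try the idea outside Notepad++, here is a minimal sketch assuming Python's built-in re module. It has no \K, so the reference is captured and written back instead. The sample text is invented for illustration:

```python
import re

# Hypothetical document: two in-text references, then the footnotes themselves.
text = (
    "Some claim[^1]: in the body.\n"
    "Another claim[^15]:.\n"
    "\n"
    "[^1]: First footnote\n"
    "[^15]: Second footnote\n"
)

# (?<!^) rejects references at the start of a line (the footnote definitions).
# The capture group stands in for \K: the reference is kept, the colon dropped.
pattern = re.compile(r"(?<!^)(\[\^\d{1,2}\]):", re.MULTILINE)
cleaned = pattern.sub(r"\1", text)

print(cleaned)
```

The footnote definitions at the start of a line keep their colon; only the in-text references lose it.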

How to multiline regex but stop after first match?

I need to match any string that has certain characteristics, but I think enabling the /m flag is breaking the functionality.
What I know:
The string will start and end with quotation marks.
The string will have the following words. "the", "fox", and "lazy".
The string may have a line break in the middle.
The string will never have a hash sign (#), which is why it is used in the regex statement.
My problem is, if I have the string twice in a single block of text, it returns once, matching everything between the first quote mark and last quote mark with the required words in-between.
Here is my regex:
/^"the[^#]*fox[^#]*lazy[^#]*"$/gim
And a Regex101 example.
Here is my understanding of the statement. Match where the string starts with "the and there is the word fox and lazy (in that order) somewhere before the string ends with ". Also ignore newlines and case-sensitivity.
The most common answer to limiting is (.*?) But it doesn't work with new lines. And putting [^#?]* doesn't work because it adds the ? to the list of things to ignore.
So how can I keep the "match everything until ___" from skipping until the last instance while still being able to ignore newlines?
This is not a duplicate of anything else I can find because this deals with multi-line matching, and those don't.
In your case, all your quantifiers need to be non-greedy so you can just use the flag ungreedy: U.
/^"the[^#]*fox[^#]*lazy[^#]*"$/gimU
Example on Regex101.
The answer, which was figured out while typing up this question, may seem ridiculously obvious.
Put the ? after the *, not inside the brackets. Parentheses and brackets are not analogous, and the ? should apply to the *.
Corrected regex:
/^"the[^#]*?fox[^#]*?lazy[^#]*?"$/gim
Example from Regex101.
The long and the short of this is:
Non-greedy, multi-line matching can be achieved with [^#]*?
(substituting # for something you don't want to match)
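The difference between the greedy and non-greedy forms can be checked with a short sketch, here assuming Python's built-in re module (which has no U flag, so the ? is written out on each quantifier). The sample text is invented for illustration:

```python
import re

# Hypothetical block of text containing two quoted strings, each with a line break.
text = (
    '"The quick brown fox\n'
    'jumps over the lazy dog"\n'
    "filler line\n"
    '"The sly fox\n'
    'is not lazy today"\n'
)

flags = re.IGNORECASE | re.MULTILINE

# Greedy: [^#]* runs to the end of the text and backtracks to the LAST quote.
greedy = re.findall(r'^"the[^#]*fox[^#]*lazy[^#]*"$', text, flags)

# Non-greedy: [^#]*? stops at the FIRST position where the rest can match.
lazy = re.findall(r'^"the[^#]*?fox[^#]*?lazy[^#]*?"$', text, flags)

print(len(greedy), len(lazy))
```

The greedy pattern returns a single match spanning both quoted strings, while the non-greedy pattern returns each quoted string separately.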

Regex problems in TextMate

Regular Expressions are new to me (yet they are wonderful and useful :D). However, after trying to use them in TextMate, I'm getting unexpected results. I'm not sure if that's a bug or that's how regular expressions work.
I have this code
begin text in the middle end and more text and a
second begin second text in the middle end
Searching with begin.+end I would expect two results
begin text in the middle end and
begin second text in the middle end
But I get the whole text selected; I would expect begin.+end to search for .+ until the first end is found, but it searches until the last one.
Is that how they work? Where could I learn how to use regular expressions?
The truth is I'm interested in just selecting the inside .+ without begin and end but that's another question.
Use the below regex to get the strings between begin and end,
(?<=begin).+?(?=end)
Explanation:
(?<=begin) Positive lookbehind: asserts that the match is immediately preceded by begin, so the regex engine starts the match just after begin.
.+? Matches one or more characters. The ? makes the quantifier non-greedy, so it results in the shortest possible match.
(?=end) Positive lookahead: as soon as end follows, the regex engine stops matching, giving you the characters between begin and end.
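For comparison, here is a minimal sketch assuming Python's built-in re module rather than TextMate's engine. It runs the greedy pattern from the question and the lookaround pattern on the sample text, joined onto one line for simplicity:

```python
import re

# The question's sample text, joined onto a single line.
text = (
    "begin text in the middle end and more text and a "
    "second begin second text in the middle end"
)

# Greedy .+ runs to the last "end", so the whole text is one match.
greedy = re.findall(r"begin.+end", text)

# Lazy .+? with lookarounds returns only the text between each begin/end pair.
between = re.findall(r"(?<=begin).+?(?=end)", text)

print(greedy)
print(between)
```

The lookarounds only assert their context, so begin and end themselves are not part of the returned matches.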

How to extract file location using Regular Expressions(VB.NET)

I am facing a problem whereby I am given a string that contains a path to a file and the file's name and I only want to extract the path (without the file's name)
For example, I will receive something like
C:\Users\OopsD\Projects\test.acdbd
and from that string I want to extract only
C:\Users\OopsD\Projects
I was trying to create a RegEx to match a backslash followed by a word, followed by a dot followed by another word - this is to match the
\test.acdbd
part and replace it with empty string so that the final result is
C:\Users\OopsD\Projects
Can anyone, familiar with RegEx, help me on this one? Also, I will be using regular expressions quite a lot in the future. Is there a (free) program I can download to create regular expressions?
Are you really sure you need to be using Regex for such a simple task? How about this:
Dim file As New IO.FileInfo("C:\Users\OopsD\Projects\test.acdbd")
MsgBox(file.Directory.FullName)
Regarding the free program on Regex, I would definitely recommend http://www.gskinner.com/RegExr/ - using it all the time. But you always have to consider alternatives, before going the Regex way.
The regex that you are looking for is as below:
\\[^\\]+$
It matches the last backslash and the file name after it; replace the match with an empty string to keep only the path.
where,
\\ (escaped backslash): Matches a literal backslash, the separator in front of the file name.
[^\\] (negated character class): Inside square brackets, a leading caret negates the class, so this matches any single character that is not a backslash. Outside brackets, the caret is an anchor that matches at the start of the string.
+ (plus): Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only once.
$ (dollar): Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the dollar match before line breaks (i.e. at the end of a line in a file) as well. Also matches before the very last line break if the string ends with a line break.
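As a rough cross-check outside VB.NET, here is a minimal sketch assuming Python's standard library. It strips the last backslash-separated component with a regex, and also does the same job with a path helper, in the spirit of the first answer. The variable names are illustrative only:

```python
import re
from pathlib import PureWindowsPath

path = r"C:\Users\OopsD\Projects\test.acdbd"

# Regex route: remove the final backslash and everything after it.
via_regex = re.sub(r"\\[^\\]+$", "", path)

# Library route: let a Windows-aware path class find the parent directory.
via_pathlib = str(PureWindowsPath(path).parent)

print(via_regex)
print(via_pathlib)
```

Both routes give the same directory for this input. The library route is usually the safer choice, because it already handles edge cases such as trailing separators.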
More reference can be found out at this link.
Many regex tools are available. Some of them are:
www.gskinner.com/RegExr/
www.txt2re.com
Rubular- It is not just for Ruby.

Notepad++ Regex: Find all 1 and 2 letter words

I’m working with a text file with 200.000+ lines in Notepad++. Each line has only one word. I need to strip out and remove all words which contain only one letter (e.g.: I) and words which contain only two letters (e.g.: as).
I thought I could just paste in a regular regex like this [a-zA-Z]{1,2} but it does not recognize anything (I’m trying to Mark them).
I’ve done a manual search and I know that words of that length do exist, so it can only be my regex that’s wrong. Does anyone know how to do this in Notepad++?
Cheers,
- Mestika
If you want to remove only the words but leave the lines empty, this works:
^[a-zA-Z]{1,2}$
Replace this with an empty string. ^ and $ are anchors for the beginning and the end of a line (because Notepad++'s regexes work in multi-line mode).
If you want to remove the lines completely, search for this:
^[a-zA-Z]{1,2}\r\n
And replace with an empty string. However, this won't work before Notepad++ 6, so make sure yours is up-to-date.
Note that you will have to replace \r\n with the specific line-endings of your file!
As Tim Pietzker suggested, a platform independent solution that also removes empty lines would be:
^[a-zA-Z]{1,2}[\r\n]+
A platform-independent solution that does not remove empty lines but only those with one or two letters would be:
^[a-zA-Z]{1,2}(\r\n?|\n)
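The effect of the anchors can be checked with a minimal sketch, here assuming Python's built-in re module rather than Notepad++. The word list is invented and mixes CRLF and LF endings on purpose:

```python
import re

# Hypothetical word list, one word per line, with mixed line endings.
text = "I\r\nas\ncat\r\nto\r\nhouse\n"

# Without anchors, the pattern also matches pieces of longer words.
unanchored = re.findall(r"[a-zA-Z]{1,2}", "cat")

# Anchored at the line start and followed by a line ending: only whole
# one- or two-letter lines match, and each is removed with its line break.
pattern = re.compile(r"^[a-zA-Z]{1,2}(?:\r\n?|\n)", re.MULTILINE)
cleaned = pattern.sub("", text)

print(unanchored)
print(repr(cleaned))
```

The unanchored pattern splits cat into ca and t, which is why the anchors are needed. The anchored version deletes only the lines that consist of one or two letters.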
I don't use Notepad++, but my guess is that you have too many matches, because your expression will match every run of one or two letters, even inside longer words. Try including word boundaries:
\b[a-zA-Z]{1,2}\b
The regex you specified should find 1-or-2 characters (even in Notepad++'s Find dialog), but not in the way you'd think. You want the regex to start at the beginning of the line and end at the end of it, using ^ and $, respectively:
^[a-zA-Z]{1,2}$
Notepad++ version 6.0 introduced the PCRE engine, so if this doesn't work in your current version try updating to the most recent.
You seem to use the version of Notepad++ that doesn't support explicit quantifiers: that's why there's no match at all (as { and } are treated as literals, not special symbols).
The solution is to use their somewhat more lengthy replacement:
\w\w?
... but that's only part of the story, as this regex will match one or two word characters anywhere, including inside longer words, and not just short words. To match only lines that consist of a short word, you need something like this:
^\w\w?$