Regular expression to match the first appearance of a character - regex

I'm far from a regex master, and I'm trying to match the first appearance of a semicolon for a Notepad++ search-and-replace and failing miserably at it. The best I've come up with is the following:
[^.*];
I figured this would capture the beginning of the line, all characters (if any), and then get to the semicolon. But this still ended up replacing all of the semicolons in the line. It also consumed the character before the semicolon, and I have no clue at all why that happens, so if anyone could explain that phenomenon, that would be an added bonus (but of course not essential to the actual answer).
I've got nothin'.

You need to capture the text before the first semicolon in a group with parentheses, then match the semicolon, then capture the remainder of the line. The following worked for me in Notepad++:
Find: ^([^;]*);(.*)$
Replace with: \1{whatever}\2
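A minimal sketch of both patterns using Python's re module. This is an assumption: Notepad++ uses a different engine, but the constructs involved here (^, $, [^;]*, groups, \1) are standard. The sample line is hypothetical. It also illustrates the "added bonus" from the question: [^.*] is a negated character class that matches exactly one character other than . or *, so every match consumes the character before a semicolon, and with no ^ anchor every semicolon matches.

```python
import re

def original_attempt(line: str) -> str:
    # [^.*] matches ONE character that is neither '.' nor '*', so every match
    # also eats the character before a ';'. With no ^ anchor, every ';' matches.
    return re.sub(r"[^.*];", "X", line)

def replace_first_semicolon(line: str) -> str:
    # ^([^;]*) cannot run past the first ';', so only that one is replaced;
    # (.*) keeps the rest of the line, later semicolons included.
    return re.sub(r"^([^;]*);(.*)$", r"\1{whatever}\2", line)

sample = "alpha;beta;gamma"  # hypothetical one-line input
print(original_attempt(sample))         # alphXbetXgamma
print(replace_first_semicolon(sample))  # alpha{whatever}beta;gamma
```

Lines without a semicolon do not match and are left unchanged.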

Related

Notepad++ Regex Remove Character from Markdown Formatted Footnote

This is a follow-up question to what was solved yesterday:
Notepad++ Regex Replace Makeshift Footnotes format With Proper Markdown format
I managed to find a regex that removes the offending colons in the main text area, but only by cutting out the text and pasting back the result, which can only be done one at a time.
I'm not sure how this can be done, but an expert can tell me.
So I have footnote references in Markdown format, with two instances of each:
[^1]:
[^2]:
...
[^99]:
I might not have 99 footnotes in a document, but I wanted to show that I need to match two digits here as well.
As I said, there are two instances of these numbered references in the text: one in the main text pointing to the footnote, and the footnote itself at the end of the document.
What I need is to delete the colons from the main text while leaving the references at the end (such as [^3]: and [^15]:) intact.
Because the main-text references come after a word or at the end of a sentence (usually before the sentence-ending period), a main-text reference never starts a line (even if one seems to once or twice because of word wrap).
I provided the exact opposite of what I need in a Regex101 example.
I put in the exact opposite of what I want because I already knew that the ^ sign matches at the start of the line.
Now I would like to negate this, if possible, so that I delete the colons in the main text, not the ones at the bottom.
Of course, my approach is likely not a good one, and you may come up with a completely different approach, especially because, if I read correctly, there doesn't seem to be a NOT operator in regex.
I repeat: the Regex101 example with the match and substitution is exactly the opposite of what I want.
I am not sure if you can play around in the substitution line to get the desired negative effect.
I could probably have asked for removing the first occurrence of colons, but I thought the key to the problem is that the items that must not be matched are always at the start of a line, while the others never are.
Thanks for any suggestions.
In Notepad++ you might use a negative lookbehind asserting that the position is not the start of the line, and use \K to clear the match buffer, so that only the colon is matched and can be replaced with an empty string.
(?<!^)\[\^\d{1,2}]\K:
Explanation
(?<!^) Negative lookbehind, assert that the position is not the start of the line
\[\^ Match [^
\d{1,2} Match 1 or 2 digits
] Match literally
\K Forget what is matched so far
: Match a colon
Regex demo
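A sketch of the same idea in Python's re module, under one stated assumption: Python's re has no \K, so this version captures the [^n] part in a group and writes it back, dropping only the colon. With re.MULTILINE, the ^ inside (?<!^) matches after every newline, so footnote definitions at the start of a line are skipped. The function name and sample text are hypothetical.

```python
import re

# (?<!^)          : the reference must not sit at the start of a line
# (\[\^\d{1,2}\]) : the reference itself, captured so it can be written back
# :               : the colon to drop
MAIN_TEXT_COLON = re.compile(r"(?<!^)(\[\^\d{1,2}\]):", re.MULTILINE)

def strip_main_text_colons(text: str) -> str:
    return MAIN_TEXT_COLON.sub(r"\1", text)

SAMPLE = (
    "A claim[^1]: and another[^12]:.\n"
    "\n"
    "[^1]: First note.\n"
    "[^12]: Second note."
)
print(strip_main_text_colons(SAMPLE))
```

Only the two main-text colons are removed; the definitions at the bottom keep theirs.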

How to multiline regex but stop after first match?

I need to match any string that has certain characteristics, but I think enabling the /m flag is breaking the functionality.
What I know:
The string will start and end with quotation marks.
The string will have the following words. "the", "fox", and "lazy".
The string may have a line break in the middle.
The string will never contain an at sign (which the regex below uses as the excluded character).
My problem is that if the string appears twice in a single block of text, the regex returns only one match, spanning everything between the first quote mark and the last quote mark, with the required words in between.
Here is my regex:
/^"the[^@]*fox[^@]*lazy[^@]*"$/gim
And a Regex101 example.
Here is my understanding of the statement. Match where the string starts with "the and there is the word fox and lazy (in that order) somewhere before the string ends with ". Also ignore newlines and case-sensitivity.
The most common answer for limiting a match is (.*?), but it doesn't work across newlines. And putting [^@?]* doesn't work either, because it adds ? to the set of excluded characters.
So how can I keep the "match everything until ___" from skipping until the last instance while still being able to ignore newlines?
This is not a duplicate of anything else I can find because this deals with multi-line matching, and those don't.
In your case, all of your quantifiers need to be non-greedy, so you can simply use the ungreedy flag U (a PCRE flag, available in Regex101's PCRE flavor; JavaScript has no such flag).
/^"the[^@]*fox[^@]*lazy[^@]*"$/gimU
Example on Regex101.
The answer, which I figured out while typing up this question, may seem ridiculously obvious.
Put the ? after the *, not inside the brackets. Parentheses and brackets are not analogous: inside brackets, ? is just a literal character, whereas right after * it makes that quantifier lazy.
Corrected regex:
/^"the[^@]*?fox[^@]*?lazy[^@]*?"$/gim
Example from Regex101.
The long and the short of it is:
Non-greedy, multi-line matching can be achieved with [^@]*?
(replacing @ with a character you never want to match)
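A sketch comparing the greedy and lazy versions in Python's re module, on a hypothetical two-string sample where the first quoted string spans a line break. The excluded character is the at sign the question mentions; any character absent from the text works. Python's re has no U (ungreedy) flag, so this uses explicit *? quantifiers as in the corrected regex. A negated class like [^@] also matches a newline, which is why no DOTALL flag is needed.

```python
import re

# Hypothetical sample: two quoted strings, the first spanning a line break.
TEXT = '"The quick brown\nfox jumps over the lazy dog"\nsome text\n"the fox is lazy"'

# re.MULTILINE lets ^/$ match at every line boundary; re.IGNORECASE ignores case.
GREEDY = re.compile(r'^"the[^@]*fox[^@]*lazy[^@]*"$', re.IGNORECASE | re.MULTILINE)
LAZY = re.compile(r'^"the[^@]*?fox[^@]*?lazy[^@]*?"$', re.IGNORECASE | re.MULTILINE)

greedy_hits = GREEDY.findall(TEXT)
lazy_hits = LAZY.findall(TEXT)

print(len(greedy_hits))  # 1: runs from the first opening quote to the last closing quote
print(len(lazy_hits))    # 2: one match per quoted string
```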

regular expression to remove the first word of each line

I am trying to make a regular expression that grabs the first word (including possible leading white space) of each line. Here it is:
/^([\s]+[\S]*).*$/\1//
This code does not seem to work (see http://regexr.com?34o6m). The code is supposed to:
Begin at the start of the line
Create a capturing group where it places the first word (with possible leading white space)
Grab the rest of the line
Substitute the entire line with just the inside of the first capturing group
I tried another version also:
/\S(?<=\s).*^//
It looks like this one fails too (http://regexr.com?34o6s). The goal here was to:
Find the first non-whitespace character.
Look behind to make sure it has a whitespace character behind it (i.e. not the first letter of the line).
Grab the rest of the line.
Erase everything the expression just grabbed.
Any insight to what is going wrong would be greatly appreciated. Thanks!
Try this regular expression
^(\s*.*?\s).*
Demo: gskinner
You mixed up your + and *.
/^([\s]*[\S]+).*$/\1/
This means zero or more spaces followed by one or more non-spaces.
You might also want to use $1 instead of \1:
/^([\s]*[\S]+).*$/$1/
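A sketch in Python's re module that runs the question's pattern and this answer's pattern side by side on a hypothetical three-line sample. Python writes the backreference as \1 in the replacement (JavaScript's replace() uses $1). The helper name keep_first_word is illustrative only.

```python
import re

TEXT = "  hello world\nfoo bar baz\n   single"  # hypothetical sample, no blank lines

def keep_first_word(text: str, pattern: str) -> str:
    # Replace each whole line with capture group 1; re.MULTILINE anchors ^/$ per line.
    return re.sub(pattern, r"\1", text, flags=re.MULTILINE)

QUESTION = r"^([\s]+[\S]*).*$"  # one-or-more spaces, then zero-or-more non-spaces
ANSWER = r"^([\s]*[\S]+).*$"    # zero-or-more spaces, then one-or-more non-spaces

print(keep_first_word(TEXT, QUESTION))  # line 2 is untouched: it has no leading space
print(keep_first_word(TEXT, ANSWER))    # every line keeps only its first word
```

One caveat: [\s] also matches a newline, so on input with blank lines a match can start on an empty line and continue into the next one; [ \t]* avoids that.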
Okay, well this seems to work using replace() in JavaScript:
/^([\s]*[\S]+).*$/
I tested it on www.altastic.com/regexinator, which as far as I know is accurate [I made it though, so it may not be ;-) ]
To remove the first two words:
@"^.*? .*? "
This works for me.
If you want to remove only the first word, the regex simply starts with a dot, an asterisk, a question mark and a space (.*? ); replace the match with "".
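A sketch of the lazy-prefix idea in Python's re module, assuming words are separated by single spaces and keeping the ^ anchor from the two-word version (C#'s @"..." verbatim string corresponds roughly to Python's r"..."). Each .*? followed by a space lazily consumes up to and including the next space. The helper drop_leading_words is hypothetical.

```python
import re

def drop_leading_words(line: str, n_words: int) -> str:
    # "^" + ".*? " repeated n_words times removes that many leading words;
    # without re.MULTILINE, ^ matches only at the start of the string.
    return re.sub("^" + ".*? " * n_words, "", line)

print(drop_leading_words("one two three four", 2))  # three four
print(drop_leading_words("one two three four", 1))  # two three four
```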

Regex negation in vim

In Vim I would like to use a regex to highlight each line that ends with a letter that is preceded by neither // nor :. I tried the following:
syn match systemverilogNoSemi "\(.*\(//\|:\).*\)\@!\&.*[a-zA-Z0-9_]$" oneline
This worked very well on comments, but did not work on lines containing a colon.
Any idea why?
Because with this regex, Vim can start the match at any point in the line. It evidently picks a starting point where the first concat matches (i.e. where the rest of the line contains neither // nor :). Such things are normally done using either
\v^%(%(\/\/|\:)@!.)*\w$
(I removed the first concat and the branch itself, changed .* to %(%(\/\/|\:)@!.)*, replaced the collection with the equivalent \w, and added an anchor for the start of the line) if you need to match the whole line, or a negative look-behind if you need to match only the last character. You can also just add an anchor to the first concat of your variant (you should remove the trailing .* from the first concat, as it is useless, and the branch symbol for the same reason).
Note: I have no idea why your regex worked for comments. In every case I checked, it does not handle comments the way you need.
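Vim syntax cannot be run directly here, so this is a rough analog in Python's re module, under the assumption that Vim's \@! maps to Python's (?!...) negative lookahead and %(...) to a non-capturing group (?:...). ANCHORED mirrors the answer's whole-line pattern; UNANCHORED approximates the question's pattern, with the \& concat treated as a lookahead at the same position. The sample lines are hypothetical.

```python
import re

# Analog (not Vim syntax) of \v^%(%(\/\/|\:)@!.)*\w$: every character must not
# start "//" or ":", and the line must end in a word character.
ANCHORED = re.compile(r"^(?:(?!//|:).)*\w$")

# Rough analog of the question's unanchored pattern: the search may start
# anywhere in the line, for example just after the colon.
UNANCHORED = re.compile(r"(?!.*(?://|:).*).*\w$")

for line in ["assign a = b", "default: x", "x = y; // note"]:
    print(repr(line), bool(ANCHORED.search(line)), bool(UNANCHORED.search(line)))
```

For "default: x", the unanchored version still finds " x" after the colon, which is the "any starting point" problem described above.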
Does this work for you?
^\(\(//\|:\)\@<!.\)*[a-zA-Z0-9_]$

RegExp adaption with new line

I have the following RegExp to find the URIs listed below:
"^w{3}\.[\S\-\n|\S]+[^\s.!?,():]+$"
URLs to find:
1. www.example.org
2. www.example-example.org
3. www.example-example.org/product
4. You'll find it at www.example-
example.org/product.
5. www.example.org
You'll find it there.
Numbers 1, 2 and 3 are found, but 4 returns "www.example-" as the URI.
Without the period at the end of 4, it is returned correctly.
EDIT: After removing ^ and $, only number 5 does not work.
Can anyone help here?
Your pattern
^w{3}\.[\S\-\n|\S]+[^\s.!?,():]+$
can be simplified to
^w{3}\.[\S\n]+[^\s.!?,():]$
[\S\-\n|\S]: this is a character class, so | is not an OR operator here (it is just a literal character, already covered by \S), repeating \S is unnecessary, and - is already included in \S. So [\S\n] does the same.
[^\s.!?,():]+: because the expression before this one already matches every non-whitespace character, the + is not needed here. I assume you just want your pattern not to end with one of the characters from this class.
See your pattern on Regexr (I added \r to your first class, because the line breaks there need it).
This is a very useful tool to test regexes
I think your problem is that you want to allow line breaks in the link. How do you want to handle this? When a line ends with a link, how do you want to tell whether the word on the next line is just a word or part of the link? I think this is not possible!
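A sketch in Python's re module of the simplified pattern from this answer, without ^ and $ (as in the asker's EDIT), applied to items 4 and 5 from the question. It joins the hyphen-split URL in item 4, but in item 5 it also drags "You'll" from the next line into the match, which is the ambiguity raised here.

```python
import re

# Simplified pattern from the answer, with ^ and $ removed.
URL = re.compile(r"w{3}\.[\S\n]+[^\s.!?,():]")

item4 = "You'll find it at www.example-\nexample.org/product."
item5 = "www.example.org\nYou'll find it there."

print(repr(URL.search(item4).group()))  # 'www.example-\nexample.org/product'
print(repr(URL.search(item5).group()))  # "www.example.org\nYou'll"
```

In item 4, the final class rejects the trailing period, so the engine backtracks one character and ends on "t".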
The problem is the '^\s' in the second square-bracketed part. Depending on your programming language, '\s' might match the newline. So you are telling it to match anything that is not whitespace, and it finds whitespace (the newline).
However, this is probably only one of your issues. Your regex uses the '^' and '$' characters, which mean start and end of line (or of the whole string, depending on flags), respectively. Try this URL example:
hello from www.example.org
Did it match? I think it will not.
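A quick check of this point in Python's re module (an assumption about the engine; without re.MULTILINE, ^ and $ refer to the start and end of the string). The anchored pattern finds nothing in the sample sentence, while the same pattern without anchors finds the URL.

```python
import re

ANCHORED = re.compile(r"^w{3}\.[\S\n]+[^\s.!?,():]$")
UNANCHORED = re.compile(r"w{3}\.[\S\n]+[^\s.!?,():]")

LINE = "hello from www.example.org"
print(ANCHORED.search(LINE))              # None: ^ pins the match to the start
print(UNANCHORED.search(LINE).group())    # www.example.org
```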