Regex - how do I match this?

I've been trying hard to get this Regex to work, but am simply not good enough at this stuff apparently :(
Regex - Trying to extract sources
I thought this would work... I'm trying to get all of the content where:
It starts with ds://
Ends with either carriage return or line feed
That's it! Essentially, I'm then going to use a negative lookahead so that I can remove all content that does NOT conform to the above in Notepad++, which allows regex search/replace.

Search for lines that contain the pattern, and bookmark them:
Search menu > Mark
Find what: ds://.*\R
Check Regular expression
Check Bookmark line
Click Mark All
Remove the unmarked lines:
Search menu > Bookmark
Remove Unmarked Lines
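Outside Notepad++, the same mark-then-remove idea can be sketched in Python by keeping only the lines that match ds:// and dropping the rest. This is a minimal sketch under assumptions: the input is already a Python string, and keep_ds_lines is a hypothetical helper name. Python's re module has no \R, so the sketch splits on line breaks with str.splitlines instead of matching them.

```python
import re

# Pattern from the answer above, minus the trailing \R (Python's re has no \R).
DS_LINE = re.compile(r"ds://.*")

def keep_ds_lines(text: str) -> str:
    """Hypothetical helper: keep only lines containing ds://, mimicking
    Mark (Bookmark line) followed by Remove Unmarked Lines."""
    kept = [line for line in text.splitlines() if DS_LINE.search(line)]
    return "\n".join(kept)

sample = "header\nds://server/a\nnoise\nds://server/b\n"
print(keep_ds_lines(sample))  # prints ds://server/a and ds://server/b on separate lines
```

str.splitlines also handles \r\n, so Windows line endings are covered without an explicit line-break pattern.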

You don't need the \w specifier to look for a word after the ds:// in the lookahead. Removing it and changing the final part from "zero or one carriage return, then zero or one newline" to "either a carriage return or a newline" in a group should do it for you:
(?=ds:\/\/).*(?:\r|\n)
Update: The carriage return/line feed group does not need to be captured.
Update 2: The following regex will actually work for your proposed use case in the comments, matching everything but the pattern you described in the question.
^(?:(?!ds:\/\/.*(?:\r|\n)).)*$

Your regex (?=ds:\w+).*\r?\n? does not match because the content contains ds:// and \w does not match a forward slash. To make your regex work you could change it to:
(?=ds://\w+).*\r?\n? (demo), which can be shortened to ds://.*\R? (demo)
Note that you don't have to escape the forward slash.
If you want to do a find and replace to keep the lines that contain ds:// you could use a negative lookahead:
Find what
^(?!.*ds://).*\R?
Replace with
Leave empty
Explanation
^ Start of the line
(?!.*ds://) Negative lookahead to assert the line does not contain ds://
.* Match any character 0+ times
\R? An optional Unicode newline sequence; it is optional so that the last line is also matched if it is not followed by a newline
See the Regex demo
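The find-and-replace above can be tried in Python as a sketch. Two assumptions: re.MULTILINE stands in for Notepad++'s line-based ^, and because Python's re does not support \R, the alternation (?:\r\n|\r|\n)? is used in its place. The function name drop_non_ds_lines is illustrative only.

```python
import re

# ^(?!.*ds://).*\R? with \R spelled out, because Python's re lacks \R.
NOT_DS_LINE = re.compile(r"^(?!.*ds://).*(?:\r\n|\r|\n)?", re.MULTILINE)

def drop_non_ds_lines(text: str) -> str:
    # Replace every line that does not contain ds:// (plus its line break) with nothing.
    return NOT_DS_LINE.sub("", text)

sample = "a\nds://x\nb\nds://y"
print(drop_non_ds_lines(sample))
```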

Here you go, Andrew:
Regex: ds:\/\/.*
Link: https://regex101.com/r/ulO9GO/2
Let me know if you have any questions.
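If the goal is to pull the ds:// values out rather than delete lines, a pattern like ds:\/\/.* can be used with re.findall. A sketch, assuming the text is already in a Python string (the sample text is made up); in Python the forward slashes need no escaping, so the pattern is written ds://.* here.

```python
import re

text = "intro line\nsource: ds://alpha/1\nother\nds://beta/2\n"
# . does not match \n, so each match stops at the end of its line.
sources = re.findall(r"ds://.*", text)
print(sources)
```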

Related

How to end a string with $ directly after .* with a RegEx?

I'm trying to report on a set of URLs, catching all potential URL parameters, and I'm having an issue defining the RegEx properly.
We have this RegEx to capture a few variations of our URLs to feed into our reporting, but I need to be able to end the string with a $, and when I do, it doesn't show any results.
The RegEx:
/join/$|/join/\?product.*|/join/\.*
For another account, we only use one variation which is outlined below (which works):
^/join/$
I believe the issue is in that after \?product.*, I'm not ending the string (or even starting it).
So far I have tried: ^/join/$|(^[/join/\?product.*]$)|(^[/join/\.*]$) with no luck.
If you want to match the dollar sign literally, you have to escape it as \$; otherwise it is an anchor that asserts the end of the string/line.
This pattern ^/join/$ would therefore only match /join/
In your pattern you use an alternation where the last part /join/\.* would match /join/ but also /join/....., because the escaped dot matches a literal dot and the * quantifier repeats it 0+ times.
Perhaps you are looking for:
^/join/(?:\?product.*\$)?$
This will match /join/ followed by an optional part (?:\?product.*\$)? that matches ?product, followed by any character 0+ times, ending with a literal $.
Regex demo
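A quick way to check which paths the suggested ^/join/(?:\?product.*\$)?$ accepts, sketched in Python. The sample paths are made up for illustration, and Python's re is assumed to be close enough to the reporting tool's flavor for these simple constructs.

```python
import re

# Pattern suggested above: /join/, optionally followed by ?product... ending in a literal $.
JOIN = re.compile(r"^/join/(?:\?product.*\$)?$")

for path in ["/join/", "/join/?product=abc$", "/join/?product=abc", "/join/..."]:
    print(path, bool(JOIN.search(path)))
```

The third path fails because the optional group, once it starts at ?product, has to end on a literal $.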
Make the pattern lazy, and since $ is a special character in regex, you need to escape it. (Regarding escaping, Google Analytics may follow different rules.) [] defines a character class that matches a single character from a set, so be careful with it as well; I think you are trying to capture a group.
\?product.*?\$
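The difference between the lazy .*? and a greedy .* only shows up when more than one $ follows ?product. A small Python sketch with a made-up input string:

```python
import re

s = "/join/?product=a$b$"
lazy = re.search(r"\?product.*?\$", s).group()    # stops at the first literal $
greedy = re.search(r"\?product.*\$", s).group()   # runs to the last literal $
print(lazy, greedy)
```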

How to create proper regular expression to find last character which I want to?

I need to create a regex to find the last underscore in strings like 012344_2.0224.71_3 or 012354_5.00123.AR_3.335_8
I wanted to find the last part with the expression [^.]+$ and then find the underscore in that match, but I can't get it to work.
I hope you can help me :)
Just use a negated character class [^_], which matches everything except an underscore (this ensures no other underscores are found afterwards), followed by the end of string $.
The pattern would look like this:
(_)[^_]*$
The final underscore _ is in a capturing group, so you want to return the submatch. You would replace group 1 (your underscore).
See it live: Regex101
Notice the green highlighted portion on Regex101, this is your submatch and is what would be replaced.
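Replacing only group 1 of (_)[^_]*$ means splicing the string at that group's position. A Python sketch; the helper name replace_last_underscore and the replacement character - are arbitrary choices for illustration.

```python
import re

def replace_last_underscore(s: str, repl: str) -> str:
    m = re.search(r"(_)[^_]*$", s)
    if m is None:  # no underscore at all
        return s
    # m.start(1) and m.end(1) locate only the captured underscore.
    return s[:m.start(1)] + repl + s[m.end(1):]

print(replace_last_underscore("012344_2.0224.71_3", "-"))
```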
The simplest solution I can imagine is using .*\K_; however, not all regex flavours support \K.
If not, another idea would be to use _(?=[^_]*$)
There is a demo of the first and second options.
Explanation:
.*\K_: .* matches as many characters as it can; because the * quantifier is greedy, it only backtracks as far as the last underscore. Then \K discards everything matched so far from the reported match, so the match consists of just the underscore.
_(?=[^_]*$): Match an underscore that is followed only by non-underscore characters until the end of the line
If you want nothing but the "net" (i.e., nothing matched except the last underscore), use a positive lookahead to check that no more underscores follow in the string:
/_(?=[^_]*$)/gm
Demo
The pattern [^.]+$ matches any character except a dot 1+ times and then asserts the end of the string. This will give you the matches 71_3 and 335_8.
What you want to match is an underscore when there are no more underscores following.
One way to do that is to use a negative lookahead (?!.*_), if your regex flavor supports it, which asserts that what follows does not match .*_, i.e. there is no underscore anywhere to the right on the same line:
_(?!.*_)
Pattern demo
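Both lookahead variants can be checked against Python's str.rfind, which returns the index of the last underscore directly. Python's built-in re module does not support \K, so only the two lookahead patterns are shown; the sample strings are the ones from the question.

```python
import re

samples = ["012344_2.0224.71_3", "012354_5.00123.AR_3.335_8"]

for s in samples:
    positive = re.search(r"_(?=[^_]*$)", s)  # underscore followed only by non-underscores
    negative = re.search(r"_(?!.*_)", s)     # underscore with no underscore anywhere after it
    # All three numbers should be the same index.
    print(s, positive.start(), negative.start(), s.rfind("_"))
```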

Notepad++ Regex Find all endline without periods

I'm trying to find all lines that don't end with a period (dot), without matching blank (empty) lines. After that I want to add a period to the end of each of those sentences.
Example:
The good is whatever stops such things from happening.
Meaning as the Higher Good
It was from this that I drew my fundamental moral conclusions.
I have tried a few regexes, but they find empty lines as well.
Is there a regex for Notepad++ that can achieve that?
Enable Regular Expression match, then search for:
\S(?<!\.)\K\s*$
and replace with:
.$0
Breakdown:
\S Match a non-whitespace character
(?<!\.) It shouldn't be a period
\K Reset the start of the match (the character matched by \S is kept out of the replacement)
\s* Match optional whitespace characters
$ End of line
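The same idea can be sketched in Python, with two assumptions. First, Python's built-in re has no \K, so the "non-whitespace, non-period character" check moves into a fixed-width lookbehind, and the replacement puts the captured trailing whitespace back with \1 instead of $0. Second, the text uses \n line endings (for example, text read in Python's default text mode, which translates \r\n); with a raw \r before \n, the lookbehind would see whitespace and skip the line. The name add_periods is hypothetical.

```python
import re

# [^\s.] plays the role of \S(?<!\.); [ \t]* keeps the match on one line.
NO_PERIOD_END = re.compile(r"(?<=[^\s.])([ \t]*)$", re.MULTILINE)

def add_periods(text: str) -> str:
    # Insert "." before any trailing spaces/tabs, like the .$0 replacement.
    return NO_PERIOD_END.sub(r".\1", text)

sample = ("The good is whatever stops such things from happening.\n"
          "Meaning as the Higher Good\n"
          "\n"
          "It was from this that I drew my fundamental moral conclusions.")
print(add_periods(sample))
```

Blank lines are skipped because the character before their end-of-line position is itself a newline, which the lookbehind rejects.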
You could use something like this to find the lines you are interested in, then add a capture group and append the characters you need:
(?<!\.)\r\n
This works by using a negative lookbehind (?<!\.) to check that there is no . before \r
There is a group of regex operators that can be used to accomplish this type of task:
Look ahead positive (?=)
Look ahead negative (?!)
Look behind positive (?<=)
Look behind negative (?<!)
Try this short and effective solution too.
Search: \w$
Replace: $0.
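For comparison, the \w$ approach in Python. Python's replacement syntax uses \g<0> where Notepad++ uses $0; the sample text is adapted from the question.

```python
import re

text = "Meaning as the Higher Good\n\nIt was from this that I drew my conclusions."
# \w$ only matches lines whose last character is a word character,
# so blank lines and lines already ending in "." are left alone.
result = re.sub(r"\w$", r"\g<0>.", text, flags=re.MULTILINE)
print(result)
```

Lines that end in other punctuation, such as a closing quote or parenthesis, are also skipped by this pattern.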

regex npp - search string must be followed by specific chars, but not include those chars

In the lines below, I need to join these two lines into a single line by replacing the newline and the leading spaces with nothing.
Provisioned Links : 2/14, 2/24, 7/10, 7/12,
7/25, 7/31, 7/32
Therefore I have this regex (in Notepad++):
(\r\n|\n)\s+[0-9]\/[0-9]*
Problem: the match includes 7/25; I need it to look for the #/## but not include it.
If I use this lookaround pattern:
(\r\n|\n)\s+(q=[0-9]\/[0-9])*
all lines beginning with newline + spaces are matched, whether or not they end with #/##.
What am I doing wrong?
regex101 fiddle to play with
Be careful:
You should correct the way you constructed the lookahead: (?=....)
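Don't quantify a lookaround (or, as in your pattern, the group meant to be one): the trailing * makes it optional, so it matches even when no #/## follows.
So what you really need is [\r\n]\s+(?=[0-9]\/[0-9]*)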
Live demo
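A Python sketch of the corrected lookahead pattern, assuming the second line of the question's data is indented with four spaces (the original indentation is not visible). The line break and indentation are consumed, while the 7/25 is only checked by the lookahead and stays in place.

```python
import re

text = "Provisioned Links : 2/14, 2/24, 7/10, 7/12,\n    7/25, 7/31, 7/32"
# Consume the line break and indentation, but only look at the #/## that follows.
joined = re.sub(r"[\r\n]\s+(?=[0-9]/[0-9]*)", "", text)
print(joined)
```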
To normalize whitespace, why not simply replace "comma with additional space after it" with "comma plus one tab character"?
You don't need that complicated pattern at all, because \s matches spaces, newlines, and tabs all at the same time:
Pattern: ,\s*
Replacement string: ,\t
https://regex101.com/r/T0QJnq/1
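The same comma normalization in Python, on the question's data with an assumed four-space indent on the second line:

```python
import re

text = "Provisioned Links : 2/14, 2/24, 7/10, 7/12,\n    7/25, 7/31, 7/32"
# \s* also swallows the newline and indentation after "7/12,".
normalized = re.sub(r",\s*", ",\t", text)
print(normalized)
```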

Notepad++ - Add link html to beginning/end of every line using regular expressions

I'm not as comfortable with RegEx as I'd like to be. What I'm trying to do is prepend every line (of a list of URLs) with
For the prepend, I've been using Replace with regular expressions: ^ with <a href="
This works all right; however, some blank lines get <a href=" added to them. Is it possible to replace the beginning of each line only if there's at least one character in the line?
As for the end of the line, I have no idea. Any help would be much appreciated; I have a very large number of URLs in different text files to go through and edit.
Search and replace with ^(?=.) and (?<=.)$ instead. The period means "any character, excluding a line break". Combined with ^ and $, it matches the start of a line that is followed by a character (or, in the case of $, the end of a line that is preceded by one). This example combines them with a positive lookahead and lookbehind to ensure that you don't replace any of the original line but append/prepend instead.
You can use a negative lookahead (at least if you upgrade to Notepad++ 6).
Find what: ^(?!$)
And for line endings:
Find what: (?!^)$
Taking the first one as an example, it matches at the start of a line (^) but only if $ does not match at that position - i.e. if it is not a line ending at the same time.
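Both zero-width patterns can be tried in Python with re.MULTILINE, which makes ^ and $ work per line as they do in Notepad++. The prefix <a href=" is the one from the question; the suffix "> is a hypothetical placeholder, since the closing text in the question is not shown. The URLs are made up.

```python
import re

PREFIX = '<a href="'
SUFFIX = '">'  # hypothetical placeholder: the question's closing text is not shown

text = "http://a.example/\n\nhttp://b.example/"
# ^(?!$): start of a line that is not also its end, so empty lines are skipped.
prefixed = re.sub(r"^(?!$)", PREFIX, text, flags=re.MULTILINE)
# (?!^)$: end of a line that is not also its start.
wrapped = re.sub(r"(?!^)$", SUFFIX, prefixed, flags=re.MULTILINE)
print(wrapped)
```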
An alternative approach does both replacements in one replacement (and the assertion as well):
Find what: ^.+$
Replacement:
In fact, you can even omit the anchors: due to the greediness of +, the pattern will always consume whole lines (but only if there is at least one character):
Find what: .+
Replacement:
Note that any of these will wrap your anchor around lines that contain only spaces and tabs. The best way to avoid that is to modify the third pattern:
Find what: ^[ \t]*\S[^\r\n]*
Replacement:
Starting at the beginning of a line we consume all spaces and tabs (no line breaks). Then we require one non-space character (\S). And then we consume as many non-line-break characters as possible. Due to greediness, there is again no need for the $ anchor.
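A Python sketch of the whitespace-aware pattern. The replacement string here is a hypothetical example that uses the whole line as both the link target and the link text (\g<0> is Python's whole-match reference); the answer's own replacement string is not shown. The whitespace-only middle line is left untouched.

```python
import re

# ^[ \t]*\S[^\r\n]*: a line with at least one non-space character.
NONBLANK_LINE = re.compile(r"^[ \t]*\S[^\r\n]*", re.MULTILINE)

text = "http://a.example/\n   \nhttp://b.example/"
# Hypothetical replacement: wrap each non-blank line in an anchor tag.
linked = NONBLANK_LINE.sub(r'<a href="\g<0>">\g<0></a>', text)
print(linked)
```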