Somehow I am not able to find anything online about how to set a pattern ending to a double \n. My particular case is the following. I have this string:
"1 Matt\n00:00:00,100 --> 00:00:01,500\nThis is said \nby Matt.\n\n2 Lucas\n00:00:01,700 --> 00:00:02,300\nWhile this is said by Lucas"
And I would like to extract only the texts between digit\n and \n\n. So, in my case, I'd like to have
This is said \nby Matt.
While this is said by Lucas
Although I am not very skilled with regex, I tried many combinations, such as (?<=\d\n).*?(?=\n\n), (?<=\d\n).\n\n and (?<=\d\n).*?(?=\r\n\r\n), but without any luck.
I have tried those, as well as others, with R's stringr library and also with Python's re.
The issue first came up in this answer: https://stackoverflow.com/a/72547966/19284124
You can make the . match across lines with the (?s) inline modifier and extend the double newline pattern to alternatively match the end of string:
(?s)(?<=\d\n).*?(?=\n\n|\Z)
See the regex demo.
Details:
(?s) - a flag allowing . to match line break chars
(?<=\d\n) - a positive lookbehind that matches a location that is immediately preceded with a digit and a newline
.*? - any zero or more chars, as few as possible
(?=\n\n|\Z) - a positive lookahead that matches a location that is immediately followed with two newline chars or end of string.
This regex is more efficient and is a variant that works in many regex flavors, such as JavaScript, PHP, Python, Java, .NET etc., because it avoids (?s) and \Z or \z:
(?<=\d\n)(?:.*\n)*?.*(?=\n\n|$)
Make sure to use it without MULTILINE mode.
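As a quick check, here is a minimal Python sketch that runs both patterns from the answers above against the string from the question. The variable names are only illustrative, and it assumes the input uses plain \n line breaks (not \r\n).

```python
import re

text = (
    "1 Matt\n00:00:00,100 --> 00:00:01,500\nThis is said \nby Matt.\n\n"
    "2 Lucas\n00:00:01,700 --> 00:00:02,300\nWhile this is said by Lucas"
)

# Variant 1: (?s) lets . cross line breaks; \Z is the very end of the string.
dotall_pattern = re.compile(r"(?s)(?<=\d\n).*?(?=\n\n|\Z)")

# Variant 2: no (?s) and no \Z; compiled without re.MULTILINE on purpose.
portable_pattern = re.compile(r"(?<=\d\n)(?:.*\n)*?.*(?=\n\n|$)")

dotall_matches = dotall_pattern.findall(text)
portable_matches = portable_pattern.findall(text)

print(dotall_matches)
print(portable_matches)
```

Both calls should return the two subtitle texts, with the embedded line break of the first one preserved.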
I have the following string: thisIs/My-7777-Any-other-text. The following is also possible: thisIs/My-7777
I am looking to extract My-7777 in both scenarios using regex. Essentially, I want to extract everything between the first forward slash and the second hyphen (the second hyphen may not exist). I tried the following regex, which wasn't quite right:
(?<=\/)(.*)(?=-)
You could use a capture group
^[^\/]*\/([^-]*-[^-]*)
^ Start of string
[^\/]*\/ Match optional chars other than /, then match /
( Capture group
[^-]*-[^-]* Match a - between optional chars that are not -
) Close capture group
regex demo
Without an anchor, and not allowing / before and after -
[^\/]*\/([^-\/]*-[^-\/]*)
Regex demo
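A small Python sketch of the capture-group approach, using the two sample strings from the question. The names samples and extracted are just for illustration; the value you want is in group 1, not in the whole match.

```python
import re

samples = ["thisIs/My-7777-Any-other-text", "thisIs/My-7777"]

# Anchored variant: skip everything up to the first /, then capture
# "non-hyphens, one hyphen, non-hyphens".
pattern = re.compile(r"^[^/]*/([^-]*-[^-]*)")

extracted = []
for sample in samples:
    match = pattern.search(sample)
    if match:
        extracted.append(match.group(1))

print(extracted)
```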
If we take into account the structure of your current input strings, you can use
(?<=\/)[^-]+-[^-]+
See the regex demo.
If your strings are more complex and look like thisIs/My-7777/more-text-here, and you actually want to match from the first /, then you may use
^[^\/]+\/\K[^\/-]+-[^\/-]+ ## PHP, PCRE, Boost (Notepad++), Onigmo (Ruby)
(?<=^[^\/]+\/)[^\/-]+-[^\/-]+ ## JS (except IE & Safari), .NET, Python (PyPI regex)
See this regex demo or this regex demo. Note that \n is added to the negated character classes in the demo because the input there is a single multiline string. In real-life input, if a newline char is expected, add it to each negated character class to keep the match on one line.
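Python's built-in re module supports neither \K nor variable-width lookbehind, so neither of the two patterns above can be used there as written. One possible workaround, sketched below, is to fall back to a capture group. The helper name first_segment is hypothetical.

```python
import re

# Same idea as the \K pattern, but the wanted part is captured in group 1.
pattern = re.compile(r"^[^/]+/([^/-]+-[^/-]+)")

def first_segment(path):
    """Return the 'word-word' part right after the first /, or None."""
    match = pattern.search(path)
    return match.group(1) if match else None

print(first_segment("thisIs/My-7777/more-text-here"))
print(first_segment("thisIs/My-7777-Any-other-text"))
print(first_segment("no-slash-here"))
```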
This one works for me. Try it with case-insensitive matching ticked:
Find what: .*?/|-any.*
Replace with: blank
Output should be ↠↠ My-7777
I'm trying to replace a link in an HTML file with regex and Node.js. I want to replace links without a .min.js extension.
For example, it should match "common.js" but not "common.min.js"
Here's what I've tried:
let htmlOutput = html.replace(/common\.(?!min)*js/g, common.name);
I think this negative lookahead should work but it doesn't match anything. Any help would be appreciated.
The (?!min)*js part is invalid: you should not quantify zero-width assertions such as lookaheads (they do not consume text, so a quantifier after them is either treated as a user error or ignored). Since js does not start with min, this lookahead is redundant even without the quantifier.
If you want to match a string that contains the whole word common, followed by any chars, and that ends with .js but not with .min.js, you need
/\bcommon\b(?!.*\.min\.js$).*\.js$/
See the regex demo.
Details:
\b - word boundary
common - a substring
\b - word boundary
(?!.*\.min\.js$) - immediately to the right, there should not be any 0 or more chars followed with .min.js at the end of the string
.* - any 0 or more chars
\.js - a .js substring
$ - end of string.
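The breakdown above can be tried out with a short sketch. It is written in Python rather than JavaScript, but the pattern itself is unchanged, and the file names are the ones mentioned in this thread.

```python
import re

pattern = re.compile(r"\bcommon\b(?!.*\.min\.js$).*\.js$")

names = [
    "common.js",
    "common.min.js",
    "common.9cf5748e0e7fc2928a07.js",
    "~/dist/common.js",
]

# Keep only the names the pattern accepts.
matched = [name for name in names if pattern.search(name)]
print(matched)
```

Only common.min.js should be filtered out, because the negative lookahead fails as soon as the rest of the string ends with .min.js.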
Here, we can likely use a simple expression that allows any chars other than . after the word common, followed by .js:
common([^\.]+)?\.js
The final regex I'm using is /\bcommon[^min]+js\b/g
This finds the word common followed by one or more characters other than m, i or n and then js. Note that [^min] is a negated character class, so it excludes those individual characters rather than the word min. It allows me to replace scripts on my HTML page like:
script src="~/dist/common.js"
OR
script src="~/dist/common.9cf5748e0e7fc2928a07.js"
Thanks to Wiktor Stribiżew for helping me.
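To see how the character-class pattern behaves compared with the lookahead pattern from the earlier answer, here is a rough Python sketch. The file name common.main.js is a made-up example, added only to show a case where the two patterns disagree.

```python
import re

char_class = re.compile(r"\bcommon[^min]+js\b")
lookahead = re.compile(r"\bcommon\b(?!.*\.min\.js$).*\.js$")

names = [
    "common.js",
    "common.9cf5748e0e7fc2928a07.js",
    "common.min.js",
    "common.main.js",  # hypothetical name containing m, i and n but not ".min."
]

# Map each name to (character-class result, lookahead result).
results = {
    name: (bool(char_class.search(name)), bool(lookahead.search(name)))
    for name in names
}

for name, outcome in results.items():
    print(name, outcome)
```

The character class rejects common.main.js because it contains the letters m, i and n, while the lookahead version only rejects names that really end in .min.js.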
I'm trying to report on a set of URLs that catches all potential URL parameters and I'm having an issue defining the RegEx properly.
We have this RegEx to capture a few variations of our URLs to feed into our reporting. I need to be able to end the string with a $, but when I do, it doesn't show any results.
The RegEx:
/join/$|/join/\?product.*|/join/\.*
For another account, we only use one variation which is outlined below (which works):
^/join/$
I believe the issue is that after \?product.*, I'm not ending the string (or even starting it).
So far I have tried: ^/join/$|(^[/join/\?product.*]$)|(^[/join/\.*]$) with no luck.
If you want to match the dollar sign literally you have to escape it \$ or else it would mean an anchor to assert the end of the string / line.
This pattern ^/join/$ would therefore only match /join/
In your pattern you use an alternation where the last part /join/\.* would match /join/ but also /join/..... because when you escape the dot you will match it literally and the * quantifier repeats 0+ times.
Perhaps you are looking for:
^/join/(?:\?product.*\$)?$
This will match /join/ followed by an optional part (?:\?product.*\$)? that will match ?product, followed by any char 0+ times and will end on $.
Regex demo
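A short Python sketch of what the suggested pattern accepts. The second pattern, anchored_only, is not from the answer: it is an assumed variant for the case where the $ was only meant as an end-of-string anchor and not as a literal dollar sign. The sample paths are made up.

```python
import re

# Pattern from the answer: the optional part must end with a literal $.
literal_dollar = re.compile(r"^/join/(?:\?product.*\$)?$")

# Assumed variant: no literal $, only the end-of-string anchor.
anchored_only = re.compile(r"^/join/(?:\?product.*)?$")

paths = ["/join/", "/join/?product=abc$", "/join/?product=abc", "/join/other"]

literal_hits = [p for p in paths if literal_dollar.search(p)]
anchored_hits = [p for p in paths if anchored_only.search(p)]

print(literal_hits)
print(anchored_hits)
```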
Please make the pattern lazy. Also, $ is a special character in regex, so you need to escape it. (Regarding the escaping part, Google Analytics may follow different rules.) [] defines a character class that matches a single character from a set, so be careful with that as well, as I think you are trying to capture a group.
\?product.*?\$
I'm just having trouble figuring out how to regex properly. What I need is to match an asterisk followed by a space followed by any amount of characters that aren't \n. (Similar to reddit list formatting)
Example:
* Test
* Test2
* Test3
The closest I got was this, but it wasn't working.
/^[*][ ](.*?)/s
Can anyone familiar with PCRE help me?
You should not use a lazy dot pattern at the end of the regex because it will never match a single char: the regex engine skips it at first, and since there is nothing left to match after it, .*? matches the empty string.
Use the greedy dot pattern:
^\* (.*)
See the regex demo
Other notes: you may use \h to match any horizontal whitespace instead of the regular space in the pattern. To match start of lines with ^ use m modifier. Only use s modifier if you need . to match any chars including a newline (and carriage return depending on PCRE verbs that are active).
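The difference between the lazy and the greedy dot is easy to see in a small sketch. This one uses Python's re module instead of PCRE, with re.MULTILINE playing the role of the m modifier. The sample text is the list from the question.

```python
import re

text = "* Test\n* Test2\n* Test3"

# Greedy dot: captures the rest of each line.
greedy = re.findall(r"^\* (.*)", text, flags=re.MULTILINE)

# Lazy dot at the end of the pattern: always settles for the empty string.
lazy = re.findall(r"^\* (.*?)", text, flags=re.MULTILINE)

print(greedy)
print(lazy)
```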
I have a regex, for example (ma|(t){1}). It matches ma and t and doesn't match bla.
I want to negate the regex, thus it must match bla and not ma and t, by adding something to this regex. I know I can write bla, the actual regex is however more complex.
Use negative lookaround: (?!pattern)
Positive lookarounds can be used to assert that a pattern matches. A negative lookaround is the opposite: it is used to assert that a pattern DOES NOT match. Not every flavor supports these assertions, and some put limitations on lookbehind, etc.
Links to regular-expressions.info
Lookahead and Lookbehind Zero-Width Assertions
Flavor comparison
See also
How do I convert CamelCase into human-readable names in Java?
Regex for all strings not containing a string?
A regex to match a substring that isn’t followed by a certain other substring.
More examples
These are attempts to come up with regex solutions to toy problems as exercises; they should be educational if you're trying to learn the various ways you can use lookarounds (nesting them, using them to capture, etc):
codingBat plusOut using regex
codingBat repeatEnd using regex
codingbat wordEnds using regex
Assuming you only want to disallow strings that match the regex completely (i.e., mmbla is okay, but mm isn't), this is what you want:
^(?!(?:m{2}|t)$).*$
(?!(?:m{2}|t)$) is a negative lookahead; it says "starting from the current position, the next few characters are not mm or t, followed by the end of the string." The start anchor (^) at the beginning ensures that the lookahead is applied at the beginning of the string. If that succeeds, the .* goes ahead and consumes the string.
FYI, if you're using Java's matches() method, you don't really need the ^ and the final $, but they don't do any harm. The $ inside the lookahead is required, though.
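Here is a minimal sketch of that whole-string negation in Python, using the mm and t alternatives from this answer. The candidate strings are only examples.

```python
import re

# Reject strings that are exactly "mm" or "t"; accept everything else.
negated = re.compile(r"^(?!(?:m{2}|t)$).*$")

candidates = ["mm", "t", "mmbla", "bla"]
accepted = [s for s in candidates if negated.search(s)]

print(accepted)
```

mmbla passes because the $ inside the lookahead makes it fail only when mm or t is the entire string.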
\b(?=\w)(?!(ma|(t){1}))\b(\w*)
This is for the given regex.
The \b is to find a word boundary.
The positive lookahead (?=\w) is here to avoid spaces.
The negative lookahead over the original regex is to prevent matches of it.
And finally, the (\w*) is to catch all the words that are left.
The group that will hold the words is group 3.
The simple (?!pattern) will not work, as any substring will match.
The simple ^(?!(?:m{2}|t)$).*$ will not work, as its granularity is full lines.
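A rough Python sketch of the word-level pattern, reading the words from group 3 as described. The sample sentence is made up. Note that, as far as I can tell, words that merely start with ma or t (such as table) are skipped as well, since the lookahead only checks the beginning of each word.

```python
import re

pattern = re.compile(r"\b(?=\w)(?!(ma|(t){1}))\b(\w*)")

text = "ma bla t foo table"

# Group 3 holds each word that does not start with "ma" or "t".
words = [match.group(3) for match in pattern.finditer(text)]

print(words)
```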
This regexp matches your condition:
^.*(?<!ma|t)$
Look at how it works:
https://regex101.com/r/Ryg2FX/1
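If you want to try this in Python, be aware that the built-in re module requires fixed-width lookbehinds, so (?<!ma|t) is rejected there. A possible adaptation, sketched below, splits the alternation into two separate lookbehinds, which should be equivalent. The candidate strings are only examples.

```python
import re

# Adapted for Python's re: one fixed-width lookbehind per alternative.
pattern = re.compile(r"^.*(?<!ma)(?<!t)$")

candidates = ["bla", "ma", "t", "comma", "mat"]
accepted = [s for s in candidates if pattern.search(s)]

print(accepted)
```

Because the lookbehind only inspects the end of the string, this rejects anything that ends with ma or t, not only the exact strings ma and t.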
Apply this if you use Laravel.
Laravel has a not_regex rule, where the field under validation must not match the given regular expression; it uses the PHP preg_match function internally.
'email' => 'not_regex:/^.+$/i'