I know that the following regex will match "red", "green", or "blue".
red|green|blue
Is there a straightforward way of making it match everything except several specified strings?
If you want to make sure that the string is neither red, green nor blue, caskey's answer is it. What is often wanted, however, is to make sure that the line does not contain red, green or blue anywhere in it. For that, anchor the regular expression with ^ and include .* in the negative lookahead:
^(?!.*(red|green|blue))
Also, suppose that you want lines containing the word "engine" but without any of those colors:
^(?!.*(red|green|blue)).*engine
You might think you can factor the .* to the head of the regular expression:
^.*(?!red|green|blue)engine # Does not work
but you cannot. You have to have both instances of .* for it to work.
Depends on the language, but there are generally negative-assertions you can put in like so:
(?!red|green|blue)
(Thanks for the syntax fix, the above is valid Java and Perl, YMMV)
Matching Anything but Given Strings
If you want to match the entire string where you want to match everything but certain strings you can do it like this:
^(?!(red|green|blue)$).*$
This says, start the match from the beginning of the string where it cannot start and end with red, green, or blue and match anything else to the end of the string.
You can try it here: https://regex101.com/r/rMbYHz/2
Note that this only works with regex engines that support a negative lookahead.
You don't need negative lookahead. There is working example:
/([\s\S]*?)(red|green|blue|)/g
Description:
[\s\S] - match any character
* - match from 0 to unlimited from previous group
? - match as less as possible
(red|green|blue|) - match one of this words or nothing
g - repeat pattern
Example:
whiteredwhiteredgreenbluewhiteredgreenbluewhiteredgreenbluewhiteredgreenbluewhiteredgreenbluewhiteredgreenbluewhiteredgreenbluewhiteredwhiteredwhiteredwhiteredwhiteredwhiteredgreenbluewhiteredwhiteredwhiteredwhiteredwhiteredredgreenredgreenredgreenredgreenredgreenbluewhiteredbluewhiteredbluewhiteredbluewhiteredbluewhiteredwhite
Will be:
whitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhite
Test it: regex101.com
I had the same question, the solutions proposed were almost working but they had some issue. In the end the regex I used is:
^(?!red|green|blue).*
I tested it in Javascript and .NET.
.* should't be placed inside the negative lookahead like this: ^(?!.*red|green|blue) or it would make the first element behave different from the rest (i.e. "anotherred" wouldn't be matched while "anothergreen" would)
Matching any text but those matching a pattern is usually achieved with splitting the string with the regex pattern.
Examples:
c# - Regex.Split(text, #"red|green|blue") or, to get rid of empty values, Regex.Split(text, #"red|green|blue").Where(x => !string.IsNullOrEmpty(x)) (see demo)
vb.net - Regex.Split(text, "red|green|blue") or, to remove empty items, Regex.Split(text, "red|green|blue").Where(Function(s) Not String.IsNullOrWhitespace(s)) (see demo, or this demo where LINQ is supported)
javascript - text.split(/red|green|blue/) (no need to use g modifier here!) (to get rid of empty values, use text.split(/red|green|blue/).filter(Boolean)), see demo
java - text.split("red|green|blue"), or - to keep all trailing empty items - use text.split("red|green|blue", -1), or to remove all empty items use more code to remove them (see demo)
groovy - Similar to Java, text.split(/red|green|blue/), to get all trailing items use text.split(/red|green|blue/, -1) and to remove all empty items use text.split(/red|green|blue/).findAll {it != ""}) (see demo)
kotlin - text.split(Regex("red|green|blue")) or, to remove blank items, use text.split(Regex("red|green|blue")).filter{ !it.isBlank() }, see demo
scala - text.split("red|green|blue"), or to keep all trailing empty items, use text.split("red|green|blue", -1) and to remove all empty items, use text.split("red|green|blue").filter(_.nonEmpty) (see demo)
ruby - text.split(/red|green|blue/), to get rid of empty values use .split(/red|green|blue/).reject(&:empty?) (and to get both leading and trailing empty items, use -1 as the second argument, .split(/red|green|blue/, -1)) (see demo)
perl - my #result1 = split /red|green|blue/, $text;, or with all trailing empty items, my #result2 = split /red|green|blue/, $text, -1;, or without any empty items, my #result3 = grep { /\S/ } split /red|green|blue/, $text; (see demo)
php - preg_split('~red|green|blue~', $text) or preg_split('~red|green|blue~', $text, -1, PREG_SPLIT_NO_EMPTY) to output no empty items (see demo)
python - re.split(r'red|green|blue', text) or, to remove empty items, list(filter(None, re.split(r'red|green|blue', text))) (see demo)
go - Use regexp.MustCompile("red|green|blue").Split(text, -1), and if you need to remove empty items, use this code. See Go demo.
NOTE: If you patterns contain capturing groups, regex split functions/methods may behave differently, also depending on additional options. Please refer to the appropriate split method documentation then.
All except word "red"
var href = '(text-1) (red) (text-3) (text-4) (text-5)';
var test = href.replace(/\((\b(?!red\b)[\s\S]*?)\)/g, testF);
function testF(match, p1, p2, offset, str_full) {
p1 = "-"+p1+"-";
return p1;
}
console.log(test);
All except word "red"
var href = '(text-1) (frede) (text-3) (text-4) (text-5)';
var test = href.replace(/\(([\s\S]*?)\)/g, testF);
function testF(match, p1, p2, offset, str_full) {
p1 = p1.replace(/red/g, '');
p1 = "-"+p1+"-";
return p1;
}
console.log(test);
Related
I have found very similar posts, but I can't quite get my regular expression right here.
I am trying to write a regular expression which returns a string which is between two other strings. For example: I want to get the string which resides between the strings "cow" and "milk".
My cow always gives milk
would return
"always gives"
Here is the expression I have pieced together so far:
(?=cow).*(?=milk)
However, this returns the string "cow always gives".
A lookahead (that (?= part) does not consume any input. It is a zero-width assertion (as are boundary checks and lookbehinds).
You want a regular match here, to consume the cow portion. To capture the portion in between, you use a capturing group (just put the portion of pattern you want to capture inside parenthesis):
cow(.*)milk
No lookaheads are needed at all.
Regular expression to get a string between two strings in JavaScript
The most complete solution that will work in the vast majority of cases is using a capturing group with a lazy dot matching pattern. However, a dot . in JavaScript regex does not match line break characters, so, what will work in 100% cases is a [^] or [\s\S]/[\d\D]/[\w\W] constructs.
ECMAScript 2018 and newer compatible solution
In JavaScript environments supporting ECMAScript 2018, s modifier allows . to match any char including line break chars, and the regex engine supports lookbehinds of variable length. So, you may use a regex like
var result = s.match(/(?<=cow\s+).*?(?=\s+milk)/gs); // Returns multiple matches if any
// Or
var result = s.match(/(?<=cow\s*).*?(?=\s*milk)/gs); // Same but whitespaces are optional
In both cases, the current position is checked for cow with any 1/0 or more whitespaces after cow, then any 0+ chars as few as possible are matched and consumed (=added to the match value), and then milk is checked for (with any 1/0 or more whitespaces before this substring).
Scenario 1: Single-line input
This and all other scenarios below are supported by all JavaScript environments. See usage examples at the bottom of the answer.
cow (.*?) milk
cow is found first, then a space, then any 0+ chars other than line break chars, as few as possible as *? is a lazy quantifier, are captured into Group 1 and then a space with milk must follow (and those are matched and consumed, too).
Scenario 2: Multiline input
cow ([\s\S]*?) milk
Here, cow and a space are matched first, then any 0+ chars as few as possible are matched and captured into Group 1, and then a space with milk are matched.
Scenario 3: Overlapping matches
If you have a string like >>>15 text>>>67 text2>>> and you need to get 2 matches in-between >>>+number+whitespace and >>>, you can't use />>>\d+\s(.*?)>>>/g as this will only find 1 match due to the fact the >>> before 67 is already consumed upon finding the first match. You may use a positive lookahead to check for the text presence without actually "gobbling" it (i.e. appending to the match):
/>>>\d+\s(.*?)(?=>>>)/g
See the online regex demo yielding text1 and text2 as Group 1 contents found.
Also see How to get all possible overlapping matches for a string.
Performance considerations
Lazy dot matching pattern (.*?) inside regex patterns may slow down script execution if very long input is given. In many cases, unroll-the-loop technique helps to a greater extent. Trying to grab all between cow and milk from "Their\ncow\ngives\nmore\nmilk", we see that we just need to match all lines that do not start with milk, thus, instead of cow\n([\s\S]*?)\nmilk we can use:
/cow\n(.*(?:\n(?!milk$).*)*)\nmilk/gm
See the regex demo (if there can be \r\n, use /cow\r?\n(.*(?:\r?\n(?!milk$).*)*)\r?\nmilk/gm). With this small test string, the performance gain is negligible, but with very large text, you will feel the difference (especially if the lines are long and line breaks are not very numerous).
Sample regex usage in JavaScript:
//Single/First match expected: use no global modifier and access match[1]
console.log("My cow always gives milk".match(/cow (.*?) milk/)[1]);
// Multiple matches: get multiple matches with a global modifier and
// trim the results if length of leading/trailing delimiters is known
var s = "My cow always gives milk, thier cow also gives milk";
console.log(s.match(/cow (.*?) milk/g).map(function(x) {return x.substr(4,x.length-9);}));
//or use RegExp#exec inside a loop to collect all the Group 1 contents
var result = [], m, rx = /cow (.*?) milk/g;
while ((m=rx.exec(s)) !== null) {
result.push(m[1]);
}
console.log(result);
Using the modern String#matchAll method
const s = "My cow always gives milk, thier cow also gives milk";
const matches = s.matchAll(/cow (.*?) milk/g);
console.log(Array.from(matches, x => x[1]));
Here's a regex which will grab what's between cow and milk (without leading/trailing space):
srctext = "My cow always gives milk.";
var re = /(.*cow\s+)(.*)(\s+milk.*)/;
var newtext = srctext.replace(re, "$2");
An example: http://jsfiddle.net/entropo/tkP74/
You need capture the .*
You can (but don't have to) make the .* nongreedy
There's really no need for the lookahead.
> /cow(.*?)milk/i.exec('My cow always gives milk');
["cow always gives milk", " always gives "]
The chosen answer didn't work for me...hmm...
Just add space after cow and/or before milk to trim spaces from " always gives "
/(?<=cow ).*(?= milk)/
I find regex to be tedious and time consuming given the syntax. Since you are already using javascript it is easier to do the following without regex:
const text = 'My cow always gives milk'
const start = `cow`;
const end = `milk`;
const middleText = text.split(start)[1].split(end)[0]
console.log(middleText) // prints "always gives"
You can use the method match() to extract a substring between two strings. Try the following code:
var str = "My cow always gives milk";
var subStr = str.match("cow(.*)milk");
console.log(subStr[1]);
Output:
always gives
See a complete example here : How to find sub-string between two strings.
I was able to get what I needed using Martinho Fernandes' solution below. The code is:
var test = "My cow always gives milk";
var testRE = test.match("cow(.*)milk");
alert(testRE[1]);
You'll notice that I am alerting the testRE variable as an array. This is because testRE is returning as an array, for some reason. The output from:
My cow always gives milk
Changes into:
always gives
Just use the following regular expression:
(?<=My cow\s).*?(?=\smilk)
If the data is on multiple lines then you may have to use the following,
/My cow ([\s\S]*)milk/gm
My cow always gives
milk
Regex 101 example
You can use destructuring to only focus on the part of your interest.
So you can do:
let str = "My cow always gives milk";
let [, result] = str.match(/\bcow\s+(.*?)\s+milk\b/) || [];
console.log(result);
In this way you ignore the first part (the complete match) and only get the capture group's match. The addition of || [] may be interesting if you are not sure there will be a match at all. In that case match would return null which cannot be destructured, and so we return [] instead in that case, and then result will be null.
The additional \b ensures the surrounding words "cow" and "milk" are really separate words (e.g. not "milky"). Also \s+ is needed to avoid that the match includes some outer spacing.
The method match() searches a string for a match and returns an Array object.
// Original string
var str = "My cow always gives milk";
// Using index [0] would return<br/>
// "**cow always gives milk**"
str.match(/cow(.*)milk/)**[0]**
// Using index **[1]** would return
// "**always gives**"
str.match(/cow(.*)milk/)[1]
Task
Extract substring between two string (excluding this two strings)
Solution
let allText = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum";
let textBefore = "five centuries,";
let textAfter = "electronic typesetting";
var regExp = new RegExp(`(?<=${textBefore}\\s)(.+?)(?=\\s+${textAfter})`, "g");
var results = regExp.exec(allText);
if (results && results.length > 1) {
console.log(results[0]);
}
I have found very similar posts, but I can't quite get my regular expression right here.
I am trying to write a regular expression which returns a string which is between two other strings. For example: I want to get the string which resides between the strings "cow" and "milk".
My cow always gives milk
would return
"always gives"
Here is the expression I have pieced together so far:
(?=cow).*(?=milk)
However, this returns the string "cow always gives".
A lookahead (that (?= part) does not consume any input. It is a zero-width assertion (as are boundary checks and lookbehinds).
You want a regular match here, to consume the cow portion. To capture the portion in between, you use a capturing group (just put the portion of pattern you want to capture inside parenthesis):
cow(.*)milk
No lookaheads are needed at all.
Regular expression to get a string between two strings in JavaScript
The most complete solution that will work in the vast majority of cases is using a capturing group with a lazy dot matching pattern. However, a dot . in JavaScript regex does not match line break characters, so, what will work in 100% cases is a [^] or [\s\S]/[\d\D]/[\w\W] constructs.
ECMAScript 2018 and newer compatible solution
In JavaScript environments supporting ECMAScript 2018, s modifier allows . to match any char including line break chars, and the regex engine supports lookbehinds of variable length. So, you may use a regex like
var result = s.match(/(?<=cow\s+).*?(?=\s+milk)/gs); // Returns multiple matches if any
// Or
var result = s.match(/(?<=cow\s*).*?(?=\s*milk)/gs); // Same but whitespaces are optional
In both cases, the current position is checked for cow with any 1/0 or more whitespaces after cow, then any 0+ chars as few as possible are matched and consumed (=added to the match value), and then milk is checked for (with any 1/0 or more whitespaces before this substring).
Scenario 1: Single-line input
This and all other scenarios below are supported by all JavaScript environments. See usage examples at the bottom of the answer.
cow (.*?) milk
cow is found first, then a space, then any 0+ chars other than line break chars, as few as possible as *? is a lazy quantifier, are captured into Group 1 and then a space with milk must follow (and those are matched and consumed, too).
Scenario 2: Multiline input
cow ([\s\S]*?) milk
Here, cow and a space are matched first, then any 0+ chars as few as possible are matched and captured into Group 1, and then a space with milk are matched.
Scenario 3: Overlapping matches
If you have a string like >>>15 text>>>67 text2>>> and you need to get 2 matches in-between >>>+number+whitespace and >>>, you can't use />>>\d+\s(.*?)>>>/g as this will only find 1 match due to the fact the >>> before 67 is already consumed upon finding the first match. You may use a positive lookahead to check for the text presence without actually "gobbling" it (i.e. appending to the match):
/>>>\d+\s(.*?)(?=>>>)/g
See the online regex demo yielding text1 and text2 as Group 1 contents found.
Also see How to get all possible overlapping matches for a string.
Performance considerations
Lazy dot matching pattern (.*?) inside regex patterns may slow down script execution if very long input is given. In many cases, unroll-the-loop technique helps to a greater extent. Trying to grab all between cow and milk from "Their\ncow\ngives\nmore\nmilk", we see that we just need to match all lines that do not start with milk, thus, instead of cow\n([\s\S]*?)\nmilk we can use:
/cow\n(.*(?:\n(?!milk$).*)*)\nmilk/gm
See the regex demo (if there can be \r\n, use /cow\r?\n(.*(?:\r?\n(?!milk$).*)*)\r?\nmilk/gm). With this small test string, the performance gain is negligible, but with very large text, you will feel the difference (especially if the lines are long and line breaks are not very numerous).
Sample regex usage in JavaScript:
//Single/First match expected: use no global modifier and access match[1]
console.log("My cow always gives milk".match(/cow (.*?) milk/)[1]);
// Multiple matches: get multiple matches with a global modifier and
// trim the results if length of leading/trailing delimiters is known
var s = "My cow always gives milk, thier cow also gives milk";
console.log(s.match(/cow (.*?) milk/g).map(function(x) {return x.substr(4,x.length-9);}));
//or use RegExp#exec inside a loop to collect all the Group 1 contents
var result = [], m, rx = /cow (.*?) milk/g;
while ((m=rx.exec(s)) !== null) {
result.push(m[1]);
}
console.log(result);
Using the modern String#matchAll method
const s = "My cow always gives milk, thier cow also gives milk";
const matches = s.matchAll(/cow (.*?) milk/g);
console.log(Array.from(matches, x => x[1]));
Here's a regex which will grab what's between cow and milk (without leading/trailing space):
srctext = "My cow always gives milk.";
var re = /(.*cow\s+)(.*)(\s+milk.*)/;
var newtext = srctext.replace(re, "$2");
An example: http://jsfiddle.net/entropo/tkP74/
You need capture the .*
You can (but don't have to) make the .* nongreedy
There's really no need for the lookahead.
> /cow(.*?)milk/i.exec('My cow always gives milk');
["cow always gives milk", " always gives "]
The chosen answer didn't work for me...hmm...
Just add space after cow and/or before milk to trim spaces from " always gives "
/(?<=cow ).*(?= milk)/
I find regex to be tedious and time consuming given the syntax. Since you are already using javascript it is easier to do the following without regex:
const text = 'My cow always gives milk'
const start = `cow`;
const end = `milk`;
const middleText = text.split(start)[1].split(end)[0]
console.log(middleText) // prints "always gives"
You can use the method match() to extract a substring between two strings. Try the following code:
var str = "My cow always gives milk";
var subStr = str.match("cow(.*)milk");
console.log(subStr[1]);
Output:
always gives
See a complete example here : How to find sub-string between two strings.
I was able to get what I needed using Martinho Fernandes' solution below. The code is:
var test = "My cow always gives milk";
var testRE = test.match("cow(.*)milk");
alert(testRE[1]);
You'll notice that I am alerting the testRE variable as an array. This is because testRE is returning as an array, for some reason. The output from:
My cow always gives milk
Changes into:
always gives
Just use the following regular expression:
(?<=My cow\s).*?(?=\smilk)
If the data is on multiple lines then you may have to use the following,
/My cow ([\s\S]*)milk/gm
My cow always gives
milk
Regex 101 example
You can use destructuring to only focus on the part of your interest.
So you can do:
let str = "My cow always gives milk";
let [, result] = str.match(/\bcow\s+(.*?)\s+milk\b/) || [];
console.log(result);
In this way you ignore the first part (the complete match) and only get the capture group's match. The addition of || [] may be interesting if you are not sure there will be a match at all. In that case match would return null which cannot be destructured, and so we return [] instead in that case, and then result will be null.
The additional \b ensures the surrounding words "cow" and "milk" are really separate words (e.g. not "milky"). Also \s+ is needed to avoid that the match includes some outer spacing.
The method match() searches a string for a match and returns an Array object.
// Original string
var str = "My cow always gives milk";
// Using index [0] would return<br/>
// "**cow always gives milk**"
str.match(/cow(.*)milk/)**[0]**
// Using index **[1]** would return
// "**always gives**"
str.match(/cow(.*)milk/)[1]
Task
Extract substring between two string (excluding this two strings)
Solution
let allText = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum";
let textBefore = "five centuries,";
let textAfter = "electronic typesetting";
var regExp = new RegExp(`(?<=${textBefore}\\s)(.+?)(?=\\s+${textAfter})`, "g");
var results = regExp.exec(allText);
if (results && results.length > 1) {
console.log(results[0]);
}
I'm trying to match strings that are repeated the same number of times, like
abc123
abcabc123123
abcabcabc123123123
etc.
That is, I want the second group (123) to be matched the same number of times as the first group (abc). Something like
(abc)+(123){COUNT THE PREVIOUS GROUP MATCHED}
This is using the Rust regex crate https://docs.rs/regex/1.4.2/regex/
Edit As I feared, and pointed out by answers and comments, this is not possible to represent in regex, at least not without some sort of recursion which the Rust regex crate doesn't for the time being support. In this case, as I know the input length is limited, I just generated a rule like
(abc123)|(abcabc123123)|(abcabcabc123123123)
Horribly ugly, but got the job done, as this wasn't "serious" code, just a fun exercise.
As others have commented, I don't think it's possible to accomplish this in a single regex. If you can't guarantee the strings are well-formed then you'd have to validate them with the regex, capture each group, and then compare the group lengths to verify they are of equal repetitions. However, if it's guaranteed all strings will be well-formed then you don't even need to use regex to implement this check:
fn matching_reps(string: &str, group1: &str, group2: &str) -> bool {
let group2_start = string.find(group2).unwrap();
let group1_reps = (string.len() - group2_start) / group1.len();
let group2_reps = group2_start / group2.len();
group1_reps == group2_reps
}
fn main() {
assert_eq!(matching_reps("abc123", "abc", "123"), true);
assert_eq!(matching_reps("abcabc123", "abc", "123"), false);
assert_eq!(matching_reps("abcabc123123", "abc", "123"), true);
assert_eq!(matching_reps("abcabc123123123", "abc", "123"), false);
}
playground
Pure regular expressions are not able to represent that.
There may be some way to define back references, but I am not familiar with regexp syntax in Rust, and this would technically be a way to represent something more than a pure regular expression.
There is however a simple way to compute it :
use a regexp to make sure your string is a ^((abc)*)((123)*)$
if your string matches, take the two captured substrings, and compare their lengths
Building a pattern dynamically is also an option. Matching one, two or three nested abc and 123 is possible with
abc(?:abc(?:abc(?:)?123)?123)?123
See proof. (?:)? is redundant, it matches no text, (?:...)? matches an optional pattern.
Rust snippet:
let a = "abc"; // Prefix
let b = "123"; // Suffix
let level = 3; // Recursion (repetition) level
let mut result = "".to_string();
for _n in 0..level {
result = format!("{}(?:{})?{}", a, result, b);
}
println!("{}", result);
// abc(?:abc(?:abc(?:)?123)?123)?123
There's an extension to the regexp libraries, that is implemented from the old times unix and that allows to match (literally) an already scanned group literally after the group has been matched.
For example... let's say you have a number, and that number must be equal to another (e.g. the score of a soccer game, and you are interested only in draws between the two teams) You can use the following regexp:
([0-9][0-9]*) - \1
and suppose we feed it with "123-123" (it will match) but if we use "123-12" that will not match, as the \1 is not the same string as what was matched in the first group. When the first group is matched, the actual regular expression converts the \1 into the literal sequence of characters that was matched in the first group.
But there's a problem with your sample... is that there's no way to end the first group if you try:
([0-9][0-9]*)\1
to match 123123, because the automaton cannot close the first group (you need at least a nondigit character to make the first group to finalize)
But for example, this means that you can use:
\+(\([0-9][0-9]*\))\1(-\1)*
and this will match phone numbers in the form
+(358)358-358-358
or
+(1)1-1-1-1-1-1-1
(the number in between the parenthesys is catched as a sample, and then you use the group to build a sequence of that number separated by dashes. You can se the expression working in this demo.)
I know that the following regex will match "red", "green", or "blue".
red|green|blue
Is there a straightforward way of making it match everything except several specified strings?
If you want to make sure that the string is neither red, green nor blue, caskey's answer is it. What is often wanted, however, is to make sure that the line does not contain red, green or blue anywhere in it. For that, anchor the regular expression with ^ and include .* in the negative lookahead:
^(?!.*(red|green|blue))
Also, suppose that you want lines containing the word "engine" but without any of those colors:
^(?!.*(red|green|blue)).*engine
You might think you can factor the .* to the head of the regular expression:
^.*(?!red|green|blue)engine # Does not work
but you cannot. You have to have both instances of .* for it to work.
Depends on the language, but there are generally negative-assertions you can put in like so:
(?!red|green|blue)
(Thanks for the syntax fix, the above is valid Java and Perl, YMMV)
Matching Anything but Given Strings
If you want to match the entire string where you want to match everything but certain strings you can do it like this:
^(?!(red|green|blue)$).*$
This says, start the match from the beginning of the string where it cannot start and end with red, green, or blue and match anything else to the end of the string.
You can try it here: https://regex101.com/r/rMbYHz/2
Note that this only works with regex engines that support a negative lookahead.
You don't need negative lookahead. There is working example:
/([\s\S]*?)(red|green|blue|)/g
Description:
[\s\S] - match any character
* - match from 0 to unlimited from previous group
? - match as less as possible
(red|green|blue|) - match one of this words or nothing
g - repeat pattern
Example:
whiteredwhiteredgreenbluewhiteredgreenbluewhiteredgreenbluewhiteredgreenbluewhiteredgreenbluewhiteredgreenbluewhiteredgreenbluewhiteredwhiteredwhiteredwhiteredwhiteredwhiteredgreenbluewhiteredwhiteredwhiteredwhiteredwhiteredredgreenredgreenredgreenredgreenredgreenbluewhiteredbluewhiteredbluewhiteredbluewhiteredbluewhiteredwhite
Will be:
whitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhitewhite
Test it: regex101.com
I had the same question, the solutions proposed were almost working but they had some issue. In the end the regex I used is:
^(?!red|green|blue).*
I tested it in Javascript and .NET.
.* should't be placed inside the negative lookahead like this: ^(?!.*red|green|blue) or it would make the first element behave different from the rest (i.e. "anotherred" wouldn't be matched while "anothergreen" would)
Matching any text but those matching a pattern is usually achieved with splitting the string with the regex pattern.
Examples:
c# - Regex.Split(text, #"red|green|blue") or, to get rid of empty values, Regex.Split(text, #"red|green|blue").Where(x => !string.IsNullOrEmpty(x)) (see demo)
vb.net - Regex.Split(text, "red|green|blue") or, to remove empty items, Regex.Split(text, "red|green|blue").Where(Function(s) Not String.IsNullOrWhitespace(s)) (see demo, or this demo where LINQ is supported)
javascript - text.split(/red|green|blue/) (no need to use g modifier here!) (to get rid of empty values, use text.split(/red|green|blue/).filter(Boolean)), see demo
java - text.split("red|green|blue"), or - to keep all trailing empty items - use text.split("red|green|blue", -1), or to remove all empty items use more code to remove them (see demo)
groovy - Similar to Java, text.split(/red|green|blue/), to get all trailing items use text.split(/red|green|blue/, -1) and to remove all empty items use text.split(/red|green|blue/).findAll {it != ""}) (see demo)
kotlin - text.split(Regex("red|green|blue")) or, to remove blank items, use text.split(Regex("red|green|blue")).filter{ !it.isBlank() }, see demo
scala - text.split("red|green|blue"), or to keep all trailing empty items, use text.split("red|green|blue", -1) and to remove all empty items, use text.split("red|green|blue").filter(_.nonEmpty) (see demo)
ruby - text.split(/red|green|blue/), to get rid of empty values use .split(/red|green|blue/).reject(&:empty?) (and to get both leading and trailing empty items, use -1 as the second argument, .split(/red|green|blue/, -1)) (see demo)
perl - my #result1 = split /red|green|blue/, $text;, or with all trailing empty items, my #result2 = split /red|green|blue/, $text, -1;, or without any empty items, my #result3 = grep { /\S/ } split /red|green|blue/, $text; (see demo)
php - preg_split('~red|green|blue~', $text) or preg_split('~red|green|blue~', $text, -1, PREG_SPLIT_NO_EMPTY) to output no empty items (see demo)
python - re.split(r'red|green|blue', text) or, to remove empty items, list(filter(None, re.split(r'red|green|blue', text))) (see demo)
go - Use regexp.MustCompile("red|green|blue").Split(text, -1), and if you need to remove empty items, use this code. See Go demo.
NOTE: If you patterns contain capturing groups, regex split functions/methods may behave differently, also depending on additional options. Please refer to the appropriate split method documentation then.
All except word "red"
var href = '(text-1) (red) (text-3) (text-4) (text-5)';
var test = href.replace(/\((\b(?!red\b)[\s\S]*?)\)/g, testF);
function testF(match, p1, p2, offset, str_full) {
p1 = "-"+p1+"-";
return p1;
}
console.log(test);
All except word "red"
var href = '(text-1) (frede) (text-3) (text-4) (text-5)';
var test = href.replace(/\(([\s\S]*?)\)/g, testF);
function testF(match, p1, p2, offset, str_full) {
p1 = p1.replace(/red/g, '');
p1 = "-"+p1+"-";
return p1;
}
console.log(test);
I have found very similar posts, but I can't quite get my regular expression right here.
I am trying to write a regular expression which returns a string which is between two other strings. For example: I want to get the string which resides between the strings "cow" and "milk".
My cow always gives milk
would return
"always gives"
Here is the expression I have pieced together so far:
(?=cow).*(?=milk)
However, this returns the string "cow always gives".
A lookahead (that (?= part) does not consume any input. It is a zero-width assertion (as are boundary checks and lookbehinds).
You want a regular match here, to consume the cow portion. To capture the portion in between, you use a capturing group (just put the portion of pattern you want to capture inside parenthesis):
cow(.*)milk
No lookaheads are needed at all.
Regular expression to get a string between two strings in JavaScript
The most complete solution that will work in the vast majority of cases is using a capturing group with a lazy dot matching pattern. However, a dot . in JavaScript regex does not match line break characters, so, what will work in 100% cases is a [^] or [\s\S]/[\d\D]/[\w\W] constructs.
ECMAScript 2018 and newer compatible solution
In JavaScript environments supporting ECMAScript 2018, s modifier allows . to match any char including line break chars, and the regex engine supports lookbehinds of variable length. So, you may use a regex like
var result = s.match(/(?<=cow\s+).*?(?=\s+milk)/gs); // Returns multiple matches if any
// Or
var result = s.match(/(?<=cow\s*).*?(?=\s*milk)/gs); // Same but whitespaces are optional
In both cases, the current position is checked for cow with any 1/0 or more whitespaces after cow, then any 0+ chars as few as possible are matched and consumed (=added to the match value), and then milk is checked for (with any 1/0 or more whitespaces before this substring).
Scenario 1: Single-line input
This and all other scenarios below are supported by all JavaScript environments. See usage examples at the bottom of the answer.
cow (.*?) milk
cow is found first, then a space, then any 0+ chars other than line break chars, as few as possible as *? is a lazy quantifier, are captured into Group 1 and then a space with milk must follow (and those are matched and consumed, too).
Scenario 2: Multiline input
cow ([\s\S]*?) milk
Here, cow and a space are matched first, then any 0+ chars as few as possible are matched and captured into Group 1, and then a space with milk are matched.
Scenario 3: Overlapping matches
If you have a string like >>>15 text>>>67 text2>>> and you need to get 2 matches in-between >>>+number+whitespace and >>>, you can't use />>>\d+\s(.*?)>>>/g as this will only find 1 match due to the fact the >>> before 67 is already consumed upon finding the first match. You may use a positive lookahead to check for the text presence without actually "gobbling" it (i.e. appending to the match):
/>>>\d+\s(.*?)(?=>>>)/g
See the online regex demo yielding text1 and text2 as Group 1 contents found.
Also see How to get all possible overlapping matches for a string.
Performance considerations
Lazy dot matching pattern (.*?) inside regex patterns may slow down script execution if very long input is given. In many cases, unroll-the-loop technique helps to a greater extent. Trying to grab all between cow and milk from "Their\ncow\ngives\nmore\nmilk", we see that we just need to match all lines that do not start with milk, thus, instead of cow\n([\s\S]*?)\nmilk we can use:
/cow\n(.*(?:\n(?!milk$).*)*)\nmilk/gm
See the regex demo (if there can be \r\n, use /cow\r?\n(.*(?:\r?\n(?!milk$).*)*)\r?\nmilk/gm). With this small test string, the performance gain is negligible, but with very large text, you will feel the difference (especially if the lines are long and line breaks are not very numerous).
Sample regex usage in JavaScript:
//Single/First match expected: use no global modifier and access match[1]
console.log("My cow always gives milk".match(/cow (.*?) milk/)[1]);
// Multiple matches: get multiple matches with a global modifier and
// trim the results if length of leading/trailing delimiters is known
var s = "My cow always gives milk, thier cow also gives milk";
console.log(s.match(/cow (.*?) milk/g).map(function(x) {return x.substr(4,x.length-9);}));
//or use RegExp#exec inside a loop to collect all the Group 1 contents
var result = [], m, rx = /cow (.*?) milk/g;
while ((m=rx.exec(s)) !== null) {
result.push(m[1]);
}
console.log(result);
Using the modern String#matchAll method
const s = "My cow always gives milk, thier cow also gives milk";
const matches = s.matchAll(/cow (.*?) milk/g);
console.log(Array.from(matches, x => x[1]));
Here's a regex which will grab what's between cow and milk (without leading/trailing space):
srctext = "My cow always gives milk.";
var re = /(.*cow\s+)(.*)(\s+milk.*)/;
var newtext = srctext.replace(re, "$2");
An example: http://jsfiddle.net/entropo/tkP74/
You need capture the .*
You can (but don't have to) make the .* nongreedy
There's really no need for the lookahead.
> /cow(.*?)milk/i.exec('My cow always gives milk');
["cow always gives milk", " always gives "]
The chosen answer didn't work for me...hmm...
Just add space after cow and/or before milk to trim spaces from " always gives "
/(?<=cow ).*(?= milk)/
I find regex to be tedious and time consuming given the syntax. Since you are already using javascript it is easier to do the following without regex:
const text = 'My cow always gives milk'
const start = `cow`;
const end = `milk`;
const middleText = text.split(start)[1].split(end)[0]
console.log(middleText) // prints "always gives"
You can use the method match() to extract a substring between two strings. Try the following code:
var str = "My cow always gives milk";
var subStr = str.match("cow(.*)milk");
console.log(subStr[1]);
Output:
always gives
See a complete example here : How to find sub-string between two strings.
I was able to get what I needed using Martinho Fernandes' solution below. The code is:
var test = "My cow always gives milk";
var testRE = test.match("cow(.*)milk");
alert(testRE[1]);
You'll notice that I am alerting the testRE variable as an array. This is because testRE is returning as an array, for some reason. The output from:
My cow always gives milk
Changes into:
always gives
Just use the following regular expression:
(?<=My cow\s).*?(?=\smilk)
If the data is on multiple lines then you may have to use the following,
/My cow ([\s\S]*)milk/gm
My cow always gives
milk
Regex 101 example
You can use destructuring to only focus on the part of your interest.
So you can do:
let str = "My cow always gives milk";
let [, result] = str.match(/\bcow\s+(.*?)\s+milk\b/) || [];
console.log(result);
In this way you ignore the first part (the complete match) and only get the capture group's match. The addition of || [] may be interesting if you are not sure there will be a match at all. In that case match would return null which cannot be destructured, and so we return [] instead in that case, and then result will be null.
The additional \b ensures the surrounding words "cow" and "milk" are really separate words (e.g. not "milky"). Also \s+ is needed to avoid that the match includes some outer spacing.
The method match() searches a string for a match and returns an Array object.
// Original string
var str = "My cow always gives milk";
// Using index [0] would return<br/>
// "**cow always gives milk**"
str.match(/cow(.*)milk/)**[0]**
// Using index **[1]** would return
// "**always gives**"
str.match(/cow(.*)milk/)[1]
Task
Extract substring between two string (excluding this two strings)
Solution
let allText = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum";
let textBefore = "five centuries,";
let textAfter = "electronic typesetting";
var regExp = new RegExp(`(?<=${textBefore}\\s)(.+?)(?=\\s+${textAfter})`, "g");
var results = regExp.exec(allText);
if (results && results.length > 1) {
console.log(results[0]);
}