How to extract a text in HTML tag? [duplicate]

How to extract a text in HTML tag? [duplicate] - regex

I have found very similar posts, but I can't quite get my regular expression right here.
I am trying to write a regular expression which returns a string which is between two other strings. For example: I want to get the string which resides between the strings "cow" and "milk".
My cow always gives milk
would return
"always gives"
Here is the expression I have pieced together so far:
(?=cow).*(?=milk)
However, this returns the string "cow always gives".

A lookahead (that (?= part) does not consume any input. It is a zero-width assertion (as are boundary checks and lookbehinds).
You want a regular match here, to consume the cow portion. To capture the portion in between, you use a capturing group (just put the portion of pattern you want to capture inside parenthesis):
cow(.*)milk
No lookaheads are needed at all.

Regular expression to get a string between two strings in JavaScript
The most complete solution that will work in the vast majority of cases is using a capturing group with a lazy dot matching pattern. However, a dot . in JavaScript regex does not match line break characters, so, what will work in 100% cases is a [^] or [\s\S]/[\d\D]/[\w\W] constructs.
ECMAScript 2018 and newer compatible solution
In JavaScript environments supporting ECMAScript 2018, s modifier allows . to match any char including line break chars, and the regex engine supports lookbehinds of variable length. So, you may use a regex like
var result = s.match(/(?<=cow\s+).*?(?=\s+milk)/gs); // Returns multiple matches if any
// Or
var result = s.match(/(?<=cow\s*).*?(?=\s*milk)/gs); // Same but whitespaces are optional
In both cases, the current position is checked for cow with any 1/0 or more whitespaces after cow, then any 0+ chars as few as possible are matched and consumed (=added to the match value), and then milk is checked for (with any 1/0 or more whitespaces before this substring).
Scenario 1: Single-line input
This and all other scenarios below are supported by all JavaScript environments. See usage examples at the bottom of the answer.
cow (.*?) milk
cow is found first, then a space, then any 0+ chars other than line break chars, as few as possible as *? is a lazy quantifier, are captured into Group 1 and then a space with milk must follow (and those are matched and consumed, too).
Scenario 2: Multiline input
cow ([\s\S]*?) milk
Here, cow and a space are matched first, then any 0+ chars as few as possible are matched and captured into Group 1, and then a space with milk are matched.
Scenario 3: Overlapping matches
If you have a string like >>>15 text>>>67 text2>>> and you need to get 2 matches in-between >>>+number+whitespace and >>>, you can't use />>>\d+\s(.*?)>>>/g as this will only find 1 match due to the fact the >>> before 67 is already consumed upon finding the first match. You may use a positive lookahead to check for the text presence without actually "gobbling" it (i.e. appending to the match):
/>>>\d+\s(.*?)(?=>>>)/g
See the online regex demo yielding text1 and text2 as Group 1 contents found.
Also see How to get all possible overlapping matches for a string.
Performance considerations
Lazy dot matching pattern (.*?) inside regex patterns may slow down script execution if very long input is given. In many cases, unroll-the-loop technique helps to a greater extent. Trying to grab all between cow and milk from "Their\ncow\ngives\nmore\nmilk", we see that we just need to match all lines that do not start with milk, thus, instead of cow\n([\s\S]*?)\nmilk we can use:
/cow\n(.*(?:\n(?!milk$).*)*)\nmilk/gm
See the regex demo (if there can be \r\n, use /cow\r?\n(.*(?:\r?\n(?!milk$).*)*)\r?\nmilk/gm). With this small test string, the performance gain is negligible, but with very large text, you will feel the difference (especially if the lines are long and line breaks are not very numerous).
Sample regex usage in JavaScript:
//Single/First match expected: use no global modifier and access match[1]
console.log("My cow always gives milk".match(/cow (.*?) milk/)[1]);
// Multiple matches: get multiple matches with a global modifier and
// trim the results if length of leading/trailing delimiters is known
var s = "My cow always gives milk, thier cow also gives milk";
console.log(s.match(/cow (.*?) milk/g).map(function(x) {return x.substr(4,x.length-9);}));
//or use RegExp#exec inside a loop to collect all the Group 1 contents
var result = [], m, rx = /cow (.*?) milk/g;
while ((m=rx.exec(s)) !== null) {
result.push(m[1]);
}
console.log(result);
Using the modern String#matchAll method
const s = "My cow always gives milk, thier cow also gives milk";
const matches = s.matchAll(/cow (.*?) milk/g);
console.log(Array.from(matches, x => x[1]));

Here's a regex which will grab what's between cow and milk (without leading/trailing space):
srctext = "My cow always gives milk.";
var re = /(.*cow\s+)(.*)(\s+milk.*)/;
var newtext = srctext.replace(re, "$2");
An example: http://jsfiddle.net/entropo/tkP74/

You need capture the .*
You can (but don't have to) make the .* nongreedy
There's really no need for the lookahead.
> /cow(.*?)milk/i.exec('My cow always gives milk');
["cow always gives milk", " always gives "]

The chosen answer didn't work for me...hmm...
Just add space after cow and/or before milk to trim spaces from " always gives "
/(?<=cow ).*(?= milk)/

I find regex to be tedious and time consuming given the syntax. Since you are already using javascript it is easier to do the following without regex:
const text = 'My cow always gives milk'
const start = `cow`;
const end = `milk`;
const middleText = text.split(start)[1].split(end)[0]
console.log(middleText) // prints "always gives"

You can use the method match() to extract a substring between two strings. Try the following code:
var str = "My cow always gives milk";
var subStr = str.match("cow(.*)milk");
console.log(subStr[1]);
Output:
always gives
See a complete example here : How to find sub-string between two strings.

I was able to get what I needed using Martinho Fernandes' solution below. The code is:
var test = "My cow always gives milk";
var testRE = test.match("cow(.*)milk");
alert(testRE[1]);
You'll notice that I am alerting the testRE variable as an array. This is because testRE is returning as an array, for some reason. The output from:
My cow always gives milk
Changes into:
always gives

Just use the following regular expression:
(?<=My cow\s).*?(?=\smilk)

If the data is on multiple lines then you may have to use the following,
/My cow ([\s\S]*)milk/gm
My cow always gives
milk
Regex 101 example

You can use destructuring to only focus on the part of your interest.
So you can do:
let str = "My cow always gives milk";
let [, result] = str.match(/\bcow\s+(.*?)\s+milk\b/) || [];
console.log(result);
In this way you ignore the first part (the complete match) and only get the capture group's match. The addition of || [] may be interesting if you are not sure there will be a match at all. In that case match would return null which cannot be destructured, and so we return [] instead in that case, and then result will be null.
The additional \b ensures the surrounding words "cow" and "milk" are really separate words (e.g. not "milky"). Also \s+ is needed to avoid that the match includes some outer spacing.

The method match() searches a string for a match and returns an Array object.
// Original string
var str = "My cow always gives milk";
// Using index [0] would return<br/>
// "**cow always gives milk**"
str.match(/cow(.*)milk/)**[0]**
// Using index **[1]** would return
// "**always gives**"
str.match(/cow(.*)milk/)[1]

Task
Extract substring between two string (excluding this two strings)
Solution
let allText = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum";
let textBefore = "five centuries,";
let textAfter = "electronic typesetting";
var regExp = new RegExp(`(?<=${textBefore}\\s)(.+?)(?=\\s+${textAfter})`, "g");
var results = regExp.exec(allText);
if (results && results.length > 1) {
console.log(results[0]);
}

Related

Regex to match all the words looking for [duplicate]

I have found very similar posts, but I can't quite get my regular expression right here.
I am trying to write a regular expression which returns a string which is between two other strings. For example: I want to get the string which resides between the strings "cow" and "milk".
My cow always gives milk
would return
"always gives"
Here is the expression I have pieced together so far:
(?=cow).*(?=milk)
However, this returns the string "cow always gives".

A lookahead (that (?= part) does not consume any input. It is a zero-width assertion (as are boundary checks and lookbehinds).
You want a regular match here, to consume the cow portion. To capture the portion in between, you use a capturing group (just put the portion of pattern you want to capture inside parenthesis):
cow(.*)milk
No lookaheads are needed at all.

Regular expression to get a string between two strings in JavaScript
The most complete solution that will work in the vast majority of cases is using a capturing group with a lazy dot matching pattern. However, a dot . in JavaScript regex does not match line break characters, so, what will work in 100% cases is a [^] or [\s\S]/[\d\D]/[\w\W] constructs.
ECMAScript 2018 and newer compatible solution
In JavaScript environments supporting ECMAScript 2018, s modifier allows . to match any char including line break chars, and the regex engine supports lookbehinds of variable length. So, you may use a regex like
var result = s.match(/(?<=cow\s+).*?(?=\s+milk)/gs); // Returns multiple matches if any
// Or
var result = s.match(/(?<=cow\s*).*?(?=\s*milk)/gs); // Same but whitespaces are optional
In both cases, the current position is checked for cow with any 1/0 or more whitespaces after cow, then any 0+ chars as few as possible are matched and consumed (=added to the match value), and then milk is checked for (with any 1/0 or more whitespaces before this substring).
Scenario 1: Single-line input
This and all other scenarios below are supported by all JavaScript environments. See usage examples at the bottom of the answer.
cow (.*?) milk
cow is found first, then a space, then any 0+ chars other than line break chars, as few as possible as *? is a lazy quantifier, are captured into Group 1 and then a space with milk must follow (and those are matched and consumed, too).
Scenario 2: Multiline input
cow ([\s\S]*?) milk
Here, cow and a space are matched first, then any 0+ chars as few as possible are matched and captured into Group 1, and then a space with milk are matched.
Scenario 3: Overlapping matches
If you have a string like >>>15 text>>>67 text2>>> and you need to get 2 matches in-between >>>+number+whitespace and >>>, you can't use />>>\d+\s(.*?)>>>/g as this will only find 1 match due to the fact the >>> before 67 is already consumed upon finding the first match. You may use a positive lookahead to check for the text presence without actually "gobbling" it (i.e. appending to the match):
/>>>\d+\s(.*?)(?=>>>)/g
See the online regex demo yielding text1 and text2 as Group 1 contents found.
Also see How to get all possible overlapping matches for a string.
Performance considerations
Lazy dot matching pattern (.*?) inside regex patterns may slow down script execution if very long input is given. In many cases, unroll-the-loop technique helps to a greater extent. Trying to grab all between cow and milk from "Their\ncow\ngives\nmore\nmilk", we see that we just need to match all lines that do not start with milk, thus, instead of cow\n([\s\S]*?)\nmilk we can use:
/cow\n(.*(?:\n(?!milk$).*)*)\nmilk/gm
See the regex demo (if there can be \r\n, use /cow\r?\n(.*(?:\r?\n(?!milk$).*)*)\r?\nmilk/gm). With this small test string, the performance gain is negligible, but with very large text, you will feel the difference (especially if the lines are long and line breaks are not very numerous).
Sample regex usage in JavaScript:
//Single/First match expected: use no global modifier and access match[1]
console.log("My cow always gives milk".match(/cow (.*?) milk/)[1]);
// Multiple matches: get multiple matches with a global modifier and
// trim the results if length of leading/trailing delimiters is known
var s = "My cow always gives milk, thier cow also gives milk";
console.log(s.match(/cow (.*?) milk/g).map(function(x) {return x.substr(4,x.length-9);}));
//or use RegExp#exec inside a loop to collect all the Group 1 contents
var result = [], m, rx = /cow (.*?) milk/g;
while ((m=rx.exec(s)) !== null) {
result.push(m[1]);
}
console.log(result);
Using the modern String#matchAll method
const s = "My cow always gives milk, thier cow also gives milk";
const matches = s.matchAll(/cow (.*?) milk/g);
console.log(Array.from(matches, x => x[1]));

Here's a regex which will grab what's between cow and milk (without leading/trailing space):
srctext = "My cow always gives milk.";
var re = /(.*cow\s+)(.*)(\s+milk.*)/;
var newtext = srctext.replace(re, "$2");
An example: http://jsfiddle.net/entropo/tkP74/

You need capture the .*
You can (but don't have to) make the .* nongreedy
There's really no need for the lookahead.
> /cow(.*?)milk/i.exec('My cow always gives milk');
["cow always gives milk", " always gives "]

The chosen answer didn't work for me...hmm...
Just add space after cow and/or before milk to trim spaces from " always gives "
/(?<=cow ).*(?= milk)/

I find regex to be tedious and time consuming given the syntax. Since you are already using javascript it is easier to do the following without regex:
const text = 'My cow always gives milk'
const start = `cow`;
const end = `milk`;
const middleText = text.split(start)[1].split(end)[0]
console.log(middleText) // prints "always gives"

You can use the method match() to extract a substring between two strings. Try the following code:
var str = "My cow always gives milk";
var subStr = str.match("cow(.*)milk");
console.log(subStr[1]);
Output:
always gives
See a complete example here : How to find sub-string between two strings.

I was able to get what I needed using Martinho Fernandes' solution below. The code is:
var test = "My cow always gives milk";
var testRE = test.match("cow(.*)milk");
alert(testRE[1]);
You'll notice that I am alerting the testRE variable as an array. This is because testRE is returning as an array, for some reason. The output from:
My cow always gives milk
Changes into:
always gives

Just use the following regular expression:
(?<=My cow\s).*?(?=\smilk)

If the data is on multiple lines then you may have to use the following,
/My cow ([\s\S]*)milk/gm
My cow always gives
milk
Regex 101 example

You can use destructuring to only focus on the part of your interest.
So you can do:
let str = "My cow always gives milk";
let [, result] = str.match(/\bcow\s+(.*?)\s+milk\b/) || [];
console.log(result);
In this way you ignore the first part (the complete match) and only get the capture group's match. The addition of || [] may be interesting if you are not sure there will be a match at all. In that case match would return null which cannot be destructured, and so we return [] instead in that case, and then result will be null.
The additional \b ensures the surrounding words "cow" and "milk" are really separate words (e.g. not "milky"). Also \s+ is needed to avoid that the match includes some outer spacing.

The method match() searches a string for a match and returns an Array object.
// Original string
var str = "My cow always gives milk";
// Using index [0] would return<br/>
// "**cow always gives milk**"
str.match(/cow(.*)milk/)**[0]**
// Using index **[1]** would return
// "**always gives**"
str.match(/cow(.*)milk/)[1]

Task
Extract substring between two string (excluding this two strings)
Solution
let allText = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum";
let textBefore = "five centuries,";
let textAfter = "electronic typesetting";
var regExp = new RegExp(`(?<=${textBefore}\\s)(.+?)(?=\\s+${textAfter})`, "g");
var results = regExp.exec(allText);
if (results && results.length > 1) {
console.log(results[0]);
}

How to capture text between a specific word and the semicolon immediately preceding it with regex?

I have many rows of people and titles in Excel, and am looking to filter out certain people by title. For example, cells may contain the following:
John Smith, Co-Founder;Jane Doe, CEO;James Jackson, Co-Founder
These cells are varying lengths and have varying numbers of people and titles. My plan is to add semicolons at the beginning and end to standardize it. This would give me:
;John Smith, Co-Founder;Jane Doe, CEO;James Jackson, Co-Founder;
Currently, I have a code that can iterate through and uses the following regex Founder.*?; which will return each instance of founder based on my code (i.e. Founder;Founder;) but the trouble is that I can't seem to figure out how to also capture the names of the people. I would think I would need to designate the semicolon immediately preceding "Founder" but so far I have not been able to get this. My ultimate goal would be to return something like the following, which I have the code for with the exception of the correct regular expression.
;John Smith, Co-Founder;James Jackson, Co-Founder;

Depending on your version of Excel, you could also do this with a formula:
=FILTERXML("<t><s>" & SUBSTITUTE(A1,";","</s><s>")&"</s></t>","//s[contains(.,'Co-Founder')]")
However, for a regex, you could use
(?:^|;)([^;]*?Co-Founder)
which will return the Co-Founders in capturing group 1.
There is no need for leading/trailing semicolons.
Even though VBA regex does not support look-behind, you can work with that limitation.
the Co-Founders Regex
(?:^|;)([^;]*?Co-Founder)
Options: Case sensitive (or not, as you prefer); ^$ match at line breaks
Match the regular expression below (?:^|;)
Match this alternative ^
Assert position at the beginning of the string ^
Or match this alternative ;
Match the character “;” literally ;
Match the regex below and capture its match into backreference number 1 ([^;]*?Co-Founder)
Match any character that is NOT a “;” [^;]*?
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) *?
Match the character string “Co-Founder” literally Co-Founder
Created with RegexBuddy

Split the whole string combined with a positive filtering and the getCoFounders() function will return an array of findings:
Sub ExampleCall()
Dim s As String
s = ";John Smith, Co-Founder;Jane Doe, CEO;James Jackson, Co-Founder;"
Debug.Print Join(getCoFounders(s), "|")
End Sub
Function getCoFounders(s As String)
getCoFounders = Filter(Split(s, ";"), "Co-Founder", True, vbTextCompare)
End Function
Results in VB Editor's immediate window
John Smith, Co-Founder|James Jackson, Co-Founder

Building a Regex String - Any assistance provided

Im very new to REGEX, I understand its purpose, but Im struggling to yet fully comprehend how to use it. Im trying to build a REGEX string to pull the A8OP2B out from the following (or whatever gets dumped in that 5th group).
{"RfReceived":{"Sync":9480,"Low":310,"High":950,"Data":"A8OP2B","RfKey":"None"}}
The other items in above line, will change in character length, so I cannot say the 51st to the 56th character. It will always be the 5th group in quotation marks though that I want to pull out.
Ive tried building various regex strings up, but its still mostly a foreign language to me and I still have much reading to do on it.
Could anyone provide me a working example with the above, so I can reverse engineer and understand better?
Thanks

Demo 1: Reference the JSON to a var, then use either dot or bracket notation.
Demo 2: Using RegEx is not recommended, but here's one in JavaScript:
/\b(\w{6})(?=","RfKey":)/g
First Match
non-consuming match: :"A
meta border: \b: A non-word=:, any char=", and a word=A
consuming match: A8OP2B
begin capture: (, Any word =\w, 6 times={6}
end capture: )
non-consuming match: ","RfKey":
Look ahead: (?= for: ","RfKey": )
Demo 1
var obj = {"RfReceived":{"Sync":9480,"Low":310,"High":950,"Data":"A8OP2B","RfKey":"None"}};
var dataDot = obj.RfReceived.Data;
var dataBracket = obj['RfReceived']['Data'];
console.log(dataDot);
console.log(dataBracket)
Demo 2
Note: This is consuming a string of 3 consecutive patterns. 3 matches are expected.
var rgx = /\b(\w{6})(?=","RfKey":)/g;
var str = `{"RfReceived":{"Sync":9480,"Low":310,"High":950,"Data":"A8OP2B","RfKey":"None"}},{"RfReceived":{"Sync":8080,"Low":102,"High":1200,"Data":"PFN07U","RfKey":"None"}},{"RfReceived":{"Sync":7580,"Low":471,"High":360,"Data":"XU89OM","RfKey":"None"}}`;
var res = str.match(rgx);
console.log(res);

Regex for picking a Value After “#word_” [duplicate]

I have found very similar posts, but I can't quite get my regular expression right here.
I am trying to write a regular expression which returns a string which is between two other strings. For example: I want to get the string which resides between the strings "cow" and "milk".
My cow always gives milk
would return
"always gives"
Here is the expression I have pieced together so far:
(?=cow).*(?=milk)
However, this returns the string "cow always gives".

A lookahead (that (?= part) does not consume any input. It is a zero-width assertion (as are boundary checks and lookbehinds).
You want a regular match here, to consume the cow portion. To capture the portion in between, you use a capturing group (just put the portion of pattern you want to capture inside parenthesis):
cow(.*)milk
No lookaheads are needed at all.

Regular expression to get a string between two strings in JavaScript
The most complete solution that will work in the vast majority of cases is using a capturing group with a lazy dot matching pattern. However, a dot . in JavaScript regex does not match line break characters, so, what will work in 100% cases is a [^] or [\s\S]/[\d\D]/[\w\W] constructs.
ECMAScript 2018 and newer compatible solution
In JavaScript environments supporting ECMAScript 2018, s modifier allows . to match any char including line break chars, and the regex engine supports lookbehinds of variable length. So, you may use a regex like
var result = s.match(/(?<=cow\s+).*?(?=\s+milk)/gs); // Returns multiple matches if any
// Or
var result = s.match(/(?<=cow\s*).*?(?=\s*milk)/gs); // Same but whitespaces are optional
In both cases, the current position is checked for cow with any 1/0 or more whitespaces after cow, then any 0+ chars as few as possible are matched and consumed (=added to the match value), and then milk is checked for (with any 1/0 or more whitespaces before this substring).
Scenario 1: Single-line input
This and all other scenarios below are supported by all JavaScript environments. See usage examples at the bottom of the answer.
cow (.*?) milk
cow is found first, then a space, then any 0+ chars other than line break chars, as few as possible as *? is a lazy quantifier, are captured into Group 1 and then a space with milk must follow (and those are matched and consumed, too).
Scenario 2: Multiline input
cow ([\s\S]*?) milk
Here, cow and a space are matched first, then any 0+ chars as few as possible are matched and captured into Group 1, and then a space with milk are matched.
Scenario 3: Overlapping matches
If you have a string like >>>15 text>>>67 text2>>> and you need to get 2 matches in-between >>>+number+whitespace and >>>, you can't use />>>\d+\s(.*?)>>>/g as this will only find 1 match due to the fact the >>> before 67 is already consumed upon finding the first match. You may use a positive lookahead to check for the text presence without actually "gobbling" it (i.e. appending to the match):
/>>>\d+\s(.*?)(?=>>>)/g
See the online regex demo yielding text1 and text2 as Group 1 contents found.
Also see How to get all possible overlapping matches for a string.
Performance considerations
Lazy dot matching pattern (.*?) inside regex patterns may slow down script execution if very long input is given. In many cases, unroll-the-loop technique helps to a greater extent. Trying to grab all between cow and milk from "Their\ncow\ngives\nmore\nmilk", we see that we just need to match all lines that do not start with milk, thus, instead of cow\n([\s\S]*?)\nmilk we can use:
/cow\n(.*(?:\n(?!milk$).*)*)\nmilk/gm
See the regex demo (if there can be \r\n, use /cow\r?\n(.*(?:\r?\n(?!milk$).*)*)\r?\nmilk/gm). With this small test string, the performance gain is negligible, but with very large text, you will feel the difference (especially if the lines are long and line breaks are not very numerous).
Sample regex usage in JavaScript:
//Single/First match expected: use no global modifier and access match[1]
console.log("My cow always gives milk".match(/cow (.*?) milk/)[1]);
// Multiple matches: get multiple matches with a global modifier and
// trim the results if length of leading/trailing delimiters is known
var s = "My cow always gives milk, thier cow also gives milk";
console.log(s.match(/cow (.*?) milk/g).map(function(x) {return x.substr(4,x.length-9);}));
//or use RegExp#exec inside a loop to collect all the Group 1 contents
var result = [], m, rx = /cow (.*?) milk/g;
while ((m=rx.exec(s)) !== null) {
result.push(m[1]);
}
console.log(result);
Using the modern String#matchAll method
const s = "My cow always gives milk, thier cow also gives milk";
const matches = s.matchAll(/cow (.*?) milk/g);
console.log(Array.from(matches, x => x[1]));

Here's a regex which will grab what's between cow and milk (without leading/trailing space):
srctext = "My cow always gives milk.";
var re = /(.*cow\s+)(.*)(\s+milk.*)/;
var newtext = srctext.replace(re, "$2");
An example: http://jsfiddle.net/entropo/tkP74/

You need capture the .*
You can (but don't have to) make the .* nongreedy
There's really no need for the lookahead.
> /cow(.*?)milk/i.exec('My cow always gives milk');
["cow always gives milk", " always gives "]

The chosen answer didn't work for me...hmm...
Just add space after cow and/or before milk to trim spaces from " always gives "
/(?<=cow ).*(?= milk)/

I find regex to be tedious and time consuming given the syntax. Since you are already using javascript it is easier to do the following without regex:
const text = 'My cow always gives milk'
const start = `cow`;
const end = `milk`;
const middleText = text.split(start)[1].split(end)[0]
console.log(middleText) // prints "always gives"

You can use the method match() to extract a substring between two strings. Try the following code:
var str = "My cow always gives milk";
var subStr = str.match("cow(.*)milk");
console.log(subStr[1]);
Output:
always gives
See a complete example here : How to find sub-string between two strings.

I was able to get what I needed using Martinho Fernandes' solution below. The code is:
var test = "My cow always gives milk";
var testRE = test.match("cow(.*)milk");
alert(testRE[1]);
You'll notice that I am alerting the testRE variable as an array. This is because testRE is returning as an array, for some reason. The output from:
My cow always gives milk
Changes into:
always gives

Just use the following regular expression:
(?<=My cow\s).*?(?=\smilk)

If the data is on multiple lines then you may have to use the following,
/My cow ([\s\S]*)milk/gm
My cow always gives
milk
Regex 101 example

You can use destructuring to only focus on the part of your interest.
So you can do:
let str = "My cow always gives milk";
let [, result] = str.match(/\bcow\s+(.*?)\s+milk\b/) || [];
console.log(result);
In this way you ignore the first part (the complete match) and only get the capture group's match. The addition of || [] may be interesting if you are not sure there will be a match at all. In that case match would return null which cannot be destructured, and so we return [] instead in that case, and then result will be null.
The additional \b ensures the surrounding words "cow" and "milk" are really separate words (e.g. not "milky"). Also \s+ is needed to avoid that the match includes some outer spacing.

The method match() searches a string for a match and returns an Array object.
// Original string
var str = "My cow always gives milk";
// Using index [0] would return<br/>
// "**cow always gives milk**"
str.match(/cow(.*)milk/)**[0]**
// Using index **[1]** would return
// "**always gives**"
str.match(/cow(.*)milk/)[1]

Task
Extract substring between two string (excluding this two strings)
Solution
let allText = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum";
let textBefore = "five centuries,";
let textAfter = "electronic typesetting";
var regExp = new RegExp(`(?<=${textBefore}\\s)(.+?)(?=\\s+${textAfter})`, "g");
var results = regExp.exec(allText);
if (results && results.length > 1) {
console.log(results[0]);
}

Using Regex is there a way to match outside characters in a string and exclude the inside characters?

I know I can exclude outside characters in a string using look-ahead and look-behind, but I'm not sure about characters in the center.
What I want is to get a match of ABCDEF from the string ABC 123 DEF.
Is this possible with a Regex string? If not, can it be accomplished another way?
EDIT
For more clarification, in the example above I can use the regex string /ABC.*?DEF/ to sort of get what I want, but this includes everything matched by .*?. What I want is to match with something like ABC(match whatever, but then throw it out)DEF resulting in one single match of ABCDEF.
As another example, I can do the following (in sudo-code and regex):
string myStr = "ABC 123 DEF";
string tempMatch = RegexMatch(myStr, "(?<=ABC).*?(?=DEF)"); //Returns " 123 "
string FinalString = myStr.Replace(tempMatch, ""); //Returns "ABCDEF". This is what I want
Again, is there a way to do this with a single regex string?

Since the regex replace feature in most languages does not change the string it operates on (but produces a new one), you can do it as a one-liner in most languages. Firstly, you match everything, capturing the desired parts:
^.*(ABC).*(DEF).*$
(Make sure to use the single-line/"dotall" option if your input contains line breaks!)
And then you replace this with:
$1$2
That will give you ABCDEF in one assignment.
Still, as outlined in the comments and in Mark's answer, the engine does match the stuff in between ABC and DEF. It's only the replacement convenience function that throws it out. But that is supported in pretty much every language, I would say.
Important: this approach will of course only work if your input string contains the desired pattern only once (assuming ABC and DEF are actually variable).
Example implementation in PHP:
$output = preg_replace('/^.*(ABC).*(DEF).*$/s', '$1$2', $input);
Or JavaScript (which does not have single-line mode):
var output = input.replace(/^[\s\S]*(ABC)[\s\S]*(DEF)[\s\S]*$/, '$1$2');
Or C#:
string output = Regex.Replace(input, #"^.*(ABC).*(DEF).*$", "$1$2", RegexOptions.Singleline);

A regular expression can contain multiple capturing groups. Each group must consist of consecutive characters so it's not possible to have a single group that captures what you want, but the groups themselves do not have to be contiguous so you can combine multiple groups to get your desired result.
Regular expression
(ABC).*(DEF)
Captures
ABC
DEF
See it online: rubular
Example C# code
string myStr = "ABC 123 DEF";
Match m = Regex.Match(myStr, "(ABC).*(DEF)");
if (m.Success)
{
string result = m.Groups[1].Value + m.Groups[2].Value; // Gives "ABCDEF"
// ...
}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How to extract a text in HTML tag? [duplicate] - regex

Here's a regex which will grab what's between cow and milk (without leading/trailing space): srctext = "My cow always gives milk."; var re = /(.cow\s+)(.)(\s+milk.*)/; var newtext = srctext.replace(re, "$2"); An example: http://jsfiddle.net/entropo/tkP74/

You need capture the .* You can (but don't have to) make the .* nongreedy There's really no need for the lookahead. > /cow(.*?)milk/i.exec('My cow always gives milk'); ["cow always gives milk", " always gives "]

The chosen answer didn't work for me...hmm... Just add space after cow and/or before milk to trim spaces from " always gives " /(?<=cow ).*(?= milk)/

You can use the method match() to extract a substring between two strings. Try the following code: var str = "My cow always gives milk"; var subStr = str.match("cow(.*)milk"); console.log(subStr[1]); Output: always gives See a complete example here : How to find sub-string between two strings.

Just use the following regular expression: (?<=My cow\s).*?(?=\smilk)

If the data is on multiple lines then you may have to use the following, /My cow ([\s\S]*)milk/gm My cow always gives milk Regex 101 example

Related

Regex to match all the words looking for [duplicate]

How to capture text between a specific word and the semicolon immediately preceding it with regex?

Building a Regex String - Any assistance provided

Regex for picking a Value After “#word_” [duplicate]

Using Regex is there a way to match outside characters in a string and exclude the inside characters?

Categories

Resources

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How to extract a text in HTML tag? [duplicate] - regex

Here's a regex which will grab what's between cow and milk (without leading/trailing space): srctext = "My cow always gives milk."; var re = /(.*cow\s+)(.*)(\s+milk.*)/; var newtext = srctext.replace(re, "$2"); An example: http://jsfiddle.net/entropo/tkP74/

You need capture the .* You can (but don't have to) make the .* nongreedy There's really no need for the lookahead. > /cow(.*?)milk/i.exec('My cow always gives milk'); ["cow always gives milk", " always gives "]

The chosen answer didn't work for me...hmm... Just add space after cow and/or before milk to trim spaces from " always gives " /(?<=cow ).*(?= milk)/

You can use the method match() to extract a substring between two strings. Try the following code: var str = "My cow always gives milk"; var subStr = str.match("cow(.*)milk"); console.log(subStr[1]); Output: always gives See a complete example here : How to find sub-string between two strings.

Just use the following regular expression: (?<=My cow\s).*?(?=\smilk)

If the data is on multiple lines then you may have to use the following, /My cow ([\s\S]*)milk/gm My cow always gives milk Regex 101 example

Related

Regex to match all the words looking for [duplicate]

How to capture text between a specific word and the semicolon immediately preceding it with regex?

Building a Regex String - Any assistance provided

Regex for picking a Value After “#word_” [duplicate]

Using Regex is there a way to match outside characters in a string and exclude the inside characters?

Categories

Resources

Here's a regex which will grab what's between cow and milk (without leading/trailing space): srctext = "My cow always gives milk."; var re = /(.cow\s+)(.)(\s+milk.*)/; var newtext = srctext.replace(re, "$2"); An example: http://jsfiddle.net/entropo/tkP74/