return text between two characters - regex

My regex matches a string between a tab or comma and a colon.
(?<=\t|,)([a-z].*?)(:)
This returns a string: app_open_icon:
I would like the regex to drop the trailing : and return app_open_icon only.
How should I change this regex to exclude the :?

Try (?<=\t|,)[a-z].*?(?=:)
const regex = /(?<=\t|,)[a-z].*?(?=:)/;
const text='Lorem ipsum dolor sit amet,app_open_icon:consectetur adipiscing...';
const result = regex.exec(text);
console.log(result);

You don't have to use lookarounds, you can use a capture group.
[\t,]([a-z][^:]*):
[\t,] Match either a tab or comma
( Capture group 1
[a-z][^:]* Match a char in the range a-z and 0+ times any char except :
) Close group 1
: Match literally
const regex = /[\t,]([a-z][^:]*):/;
const str = `,app_open_icon:`;
const m = str.match(regex);
if (m) {
  console.log(m[1]);
}
To get a match only using lookarounds, you can turn the matches into lookarounds, and omit the capture group 1:
(?<=[\t,])[a-z][^:]*(?=:)
const regex = /(?<=[\t,])[a-z][^:]*(?=:)/;
const str = `,app_open_icon:`;
const m = str.match(regex);
if (m) {
  console.log(m[0]);
}
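If a line holds several key:value pairs, the lookaround version can also be applied globally. This is only a minimal sketch: the input line and the variable names are made up for illustration and are not from the question.

```javascript
// Same lookaround pattern as above, with the g flag so matchAll can iterate.
const keyPattern = /(?<=[\t,])[a-z][^:]*(?=:)/g;

// Hypothetical input: two key:value pairs, one after a tab and one after a comma.
const line = "id\tapp_open_icon:1,app_close_icon:2";

// Each match is the key only; the delimiters are asserted, not consumed.
const keys = Array.from(line.matchAll(keyPattern), (m) => m[0]);
console.log(keys);
```

Keep in mind that [^:]* also matches commas and tabs, so a field without a colon would be swallowed into the following key.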

You have already achieved that with your regex.
Positive lookbehind: (?<=\t|,) checks for a comma or tab before the start of the match
First capturing group: ([a-z].*?) captures everything after , or \t and before :
Second capturing group: (:) captures :
Just extract the first capturing group of your match.
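As a small sketch of that, here is the unchanged pattern from the question run against the sample text used in the first answer (the variable names are only illustrative):

```javascript
// The original pattern from the question, unchanged.
const original = /(?<=\t|,)([a-z].*?)(:)/;
const sample = "Lorem ipsum dolor sit amet,app_open_icon:consectetur adipiscing...";

const found = original.exec(sample);
if (found) {
  console.log(found[0]); // whole match, colon included
  console.log(found[1]); // group 1, the text before the colon
}
```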

Related

How to match in a single/common Regex Group matching or based on a condition

I would like to extract two different test strings /i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive
and
/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8
with a single RegEx and in Group-1.
By using this RegEx ^.[i,na,fm,d]+\/(.+([,\/])?(\/|.+=.+,\/).+\/[,](live.([^,]).).+_)?.+(640).*$ I can get the second string to match the desired result int/2021/11/25/,live_20211125_215206_
but the first string does not match in Group-1. The expected extraction for test string 1, which is currently missing, is int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45
Any pointers on this are appreciated.
Thanks!
If you want both values in group 1, you can use:
^/(?:[id]|na|fm)/([^/\s]*/\d{4}/\d{2}/\d{2}/\S*?)(?:/,|[^_]+_)640(?:\D|$)
The pattern matches:
^ Start of string
/ Match literally
(?:[id]|na|fm) Match one of i d na fm
/ Match literally
( Capture group 1
[^/\s]*/ Match any char except a / or a whitespace char, then match /
\d{4}/\d{2}/\d{2}/ Match a date like pattern
\S*? Match optional non whitespace chars, as few as possible
) Close group 1
(?:/,|[^_]+_) Match either /, or 1+ chars other than _ and then match _
640 Match literally
(?:\D|$) Match either a non-digit or assert the end of the string
See a regex demo and a go demo.
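For a quick check outside Go, the same pattern can be sketched in JavaScript. This assumes the only change needed is escaping the forward slashes for a regex literal; the variable names are illustrative.

```javascript
// Pattern from the answer above; forward slashes are escaped for a JavaScript regex literal.
const segmentPattern = /^\/(?:[id]|na|fm)\/([^\/\s]*\/\d{4}\/\d{2}\/\d{2}\/\S*?)(?:\/,|[^_]+_)640(?:\D|$)/;

// The two test strings from the question.
const testStrings = [
  "/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive",
  "/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8",
];

// Collect capture group 1 for each string, or null when there is no match.
const extracted = testStrings.map((s) => {
  const m = segmentPattern.exec(s);
  return m ? m[1] : null;
});
console.log(extracted);
```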
We can't know all the rules of how the strings you are matching are constructed, but for just these two example strings provided:
package main
import (
	"fmt"
	"regexp"
)
func main() {
	var re = regexp.MustCompile(`(?m)(\/i/int/\d{4}/\d{2}/\d{2}/.*)(?:\/,|_[\w_]+)640`)
	var str = `
/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive
/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8`
	match := re.FindAllStringSubmatch(str, -1)
	for _, val := range match {
		fmt.Println(val[1])
	}
}

Pattern match for (length)%code with before length

I have a pattern like x%c, where x is a single digit integer and c is an alphanumeric code of length x. % is just a token separator of length and code
For instance 2%74 is valid since 74 is of 2 digits. Similarly, 1%8 and 4%3232 are also valid.
I have tried regex of form ^([0-9])(%)([A-Z0-9]){\1}, where I am trying to put a limit on length by the value of group 1. It does not work apparently since the group is treated as a string, not a number.
If I change the above regex to ^([0-9])(%)([A-Z0-9]){2}, it works for 2%74, but that is of no use since the length must be controlled by the first group, not by a fixed digit.
If it is not possible with regex, is there a better approach in Java?
One way could be using 2 capture groups, and convert the first group to an int and count the characters for the second group.
\b(\d+)%(\d+)\b
\b Word boundary
(\d+) Capture group 1, match 1+ digits
% Match literally
(\d+) Capture group 2, match 1+ digits
\b Word boundary
For example
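import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Wrapper class added only so the snippet compiles on its own
public class Main {
    public static void main(String[] args) {
        String regex = "\\b(\\d+)%(\\d+)\\b";
        Pattern pattern = Pattern.compile(regex);
        String[] strings = { "2%74", "1%8", "4%3232", "5%123456", "6%0" };
        for (String s : strings) {
            Matcher matcher = pattern.matcher(s);
            if (matcher.find()) {
                // Valid only when the declared length equals the actual code length
                if (Integer.parseInt(matcher.group(1)) == matcher.group(2).length()) {
                    System.out.println("Match for " + s);
                } else {
                    System.out.println("No match for " + s);
                }
            }
        }
    }
}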
Output
Match for 2%74
Match for 1%8
Match for 4%3232
No match for 5%123456
No match for 6%0
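The same two-step idea (match first, then compare the declared length with the actual length) can be sketched in JavaScript rather than Java. This sketch assumes whole-string validation, a single-digit length, and the [A-Z0-9] alphabet from the question; isValidLengthCode is a hypothetical helper name.

```javascript
// Hypothetical helper: validates "<length>%<code>" where <length> is one digit
// and <code> consists of uppercase letters or digits (alphabet assumed from the question).
function isValidLengthCode(input) {
  const m = /^(\d)%([A-Z0-9]+)$/.exec(input);
  if (!m) {
    return false;
  }
  // The regex cannot count for us, so compare the lengths in code.
  return Number(m[1]) === m[2].length;
}

const samples = ["2%74", "1%8", "4%3232", "5%123456", "6%0"];
for (const s of samples) {
  console.log(`${isValidLengthCode(s) ? "Match" : "No match"} for ${s}`);
}
```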

Regex negative lookahead is not working properly

I have the following javascript code
var d = '<some string 2>';
var a = ...;
var b = '<some string 1>';
.
.
.
var c = b.map.....
To find the closest string declaration to c, I was using something like this:
var \w+ = '(.*?)';(?:(?!var \w+ = '(.*?)';.*?var c = \w+\.map)).*?var c = \w+\.map
I am using negative lookahead to not match a declaration if there is another declaration ahead that matches. However, the result I am getting is still <some string 2>. Could someone please explain me what causes that to happen?
There are more ways to declare a string in JavaScript, and even more ways to break the pattern, as this is very brittle.
For the example data in the question, you can use a negative lookahead to assert that the next line does not start with the .map part or a string declaration:
^var\s+\w\s*=\s*(["'`])(.*?)\1;\n(?:(?!var\s+\w+\s*=\s*(?:\w+\.map\b|(["'`]).*?\3);\n).*\n)*var\s+\w+\s*=\s*\w\.map\b
^ Start of string
var\s+\w\s*=\s* Match a possible var format and the equals sign
(["'`]) Capture group 1, match one of " ' or `
(.*?) Capture group 2 capture the data that you want to find
\1;\n A back reference to match up the quote in group 1
(?: Non capture group, to repeat as a whole
(?! Negative lookahead, assert what is directly to the right is not
var\s+\w+\s*=\s* Match a var part with =
(?: Non capture group for the alternation |
\w+\.map\b Match the .map part
| Or
(["'`]).*?\3 Capture group 3 to match one of the quotes with a back reference to match it up
) Close non capture group
;\n Match ; and a newline
) Close lookahead
.*\n Match the whole line and a newline
)* Close non capture group and optionally repeat to match all lines
var\s+\w+\s*=\s*\w\.map\b Match the line with .map
const regex = /^var\s+\w\s*=\s*(["'`])(.*?)\1;\n(?:(?!var\s+\w+\s*=\s*(?:\w+\.map\b|(["'`]).*?\3);\n).*\n)*var\s+\w+\s*=\s*\w\.map\b/m;
const str = `var d = '<some string 2>';
var a = ...;
var b = '<some string 1>';
.
.
.
var c = b.map.....`;
const m = str.match(regex);
if (m) {
  console.log(m[2]);
}
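Since a single pattern is brittle here, a different technique (not the one from the answer above) is to walk the source line by line and remember the last string declaration seen before the .map line. A minimal sketch, assuming one declaration per line in the same var format; closestStringBeforeMap is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the string literal declared closest above
// the first line that assigns the result of a .map call, or null if none.
function closestStringBeforeMap(source) {
  const declaration = /^var\s+\w+\s*=\s*(["'`])(.*?)\1;/;
  const mapLine = /^var\s+\w+\s*=\s*\w+\.map\b/;
  let last = null;
  for (const line of source.split("\n")) {
    if (mapLine.test(line)) {
      return last;
    }
    const m = declaration.exec(line);
    if (m) {
      last = m[2]; // remember the most recent string declaration
    }
  }
  return null; // no .map line found
}

const sampleSource = [
  "var d = '<some string 2>';",
  "var a = ...;",
  "var b = '<some string 1>';",
  ".",
  "var c = b.map.....",
].join("\n");
console.log(closestStringBeforeMap(sampleSource));
```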

Scala regex : capture between group

In the regex below I need "test" as output, but it gives the complete string that matches the regex. How can I capture the string between two groups?
val pattern = """\{outer.*\}""".r
println(pattern.findAllIn(s"try {outer.test}").matchData.map(step => step.group(0)).toList.mkString)
Input : "try {outer.test}"
expected Output : test
current output : {outer.test}
You may capture that part using:
val pattern = """\{outer\.([^{}]*)\}""".r.unanchored
val s = "try {outer.test}"
val result = s match {
  case pattern(i) => i
  case _ => ""
}
println(result)
The pattern matches
\{outer\. - a literal {outer. substring
([^{}]*) - Capturing group 1: zero or more (*) chars other than { and } (see [^{}] negated character class)
\} - a } char.
NOTE: if your regex must match the whole string, remove the .unanchored I added to also allow partial matches inside a string.
See the Scala demo online.
Or, you may change the pattern so that the first part is no longer a consuming pattern (it matches a string of fixed length, so it is possible):
val pattern = """(?<=\{outer\.)[^{}]*""".r
val s = "try {outer.test}"
println(pattern.findFirstIn(s).getOrElse(""))
// => test
See this Scala demo.
Here, (?<=\{outer\.), a positive lookbehind, matches {outer. but does not put it into the match value.

c# regex split or replace. here's my code i did

I am trying to replace a certain group to "" by using regex.
I was searching and doing my best, but it's over my head.
What I want to do is,
string text = "(12je)apple(/)(jj92)banana(/)cat";
string resultIwant = {apple, banana, cat};
Inside the first pair of parentheses there must be 4 characters, including numbers,
and '(/)' comes at the end to close the item.
Here's my code. (I was using matches function)
string text = @"(12dj)apple(/)(88j1)banana(/)cat";
string pattern = @"\(.{4}\)(?<value>.+?)\(/\)";
Regex rex = new Regex(pattern);
MatchCollection mc = rex.Matches(text);
if (mc.Count > 0)
{
    foreach (Match str in mc)
    {
        print(str.Groups["value"].Value.ToString());
    }
}
However, the result was
apple
banana
So I think I should use replace or something else instead of Matches.
The regex below captures the word characters that come right after a ),
(?<=\))(\w+)
Your c# code would be,
{
    string str = "(12je)apple(/)(jj92)banana(/)cat";
    Regex rgx = new Regex(@"(?<=\))(\w+)");
    foreach (Match m in rgx.Matches(str))
        Console.WriteLine(m.Groups[1].Value);
}
Explanation:
(?<=\)) A positive lookbehind. It asserts that the match starts right after a ) symbol.
() Capturing group.
\w+ Matches all the following word characters. It stops before the next ( symbol because that isn't a word character.
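The lookbehind idea is not specific to C#. Here is a small sketch in JavaScript, next to a split-based alternative, since the question also asks about splitting. The split pattern is an assumption built from the question's description of the markers, not part of the answer above.

```javascript
const text = "(12je)apple(/)(jj92)banana(/)cat";

// Lookbehind approach from the answer: word characters directly after a ")".
const viaLookbehind = Array.from(text.matchAll(/(?<=\))\w+/g), (m) => m[0]);

// Alternative (an assumption, not from the answer): split on the 4-character
// opening marker or on the "(/)" closer, then drop the empty pieces.
const viaSplit = text.split(/\(.{4}\)|\(\/\)/).filter((part) => part !== "");

console.log(viaLookbehind);
console.log(viaSplit);
```

The lookbehind version only returns word characters, while the split version keeps whatever text sits between the markers, which matters if the items can contain spaces or punctuation.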