js regex replace match in certain paragraph

---
title: test
date: 2018/10/17
description: some thing
---
I want to replace the value after date, but only if it is between the --- lines (in this case 2018/10/17). How can I do that with a regex in JS?
So far I've tried:
/(?<=---\n)[\s\S]*date.+(?=\n)/
but it only works when date is the first line after ---

It is possible, though in my opinion not advisable:
(^---)((?:(?!^---)[\s\S])+?^date:\s*)(.+)((?:(?!^---)[\s\S])+?)(^---)
The match needs to be replaced by $1$2substitution$4$5 (with the m flag, so that ^ matches at the start of each line); see a demo on regex101.com.
Broken down, this reads:
(^---)                      # capture --- -> group 1
(
    (?:(?!^---)[\s\S])+?    # capture anything not --- up to date:
    ^date:\s*
)
(.+)                        # capture anything after date
(
    (?:(?!^---)[\s\S])+?    # same pattern as above
)
(^---)                      # capture the "closing block"
Please consider using the aforementioned two-step approach, as this regex is not going to be readable in a couple of weeks (and the JS engine does not support a verbose mode).
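A two-step approach could look roughly like the sketch below, assuming it means first isolating the block between the --- lines and then replacing the date inside it. The helper name replaceFrontMatterDate and the sample input are made up for illustration, and LF line endings are assumed.

```javascript
// Hypothetical helper (not from the answer): replace the date value only
// inside the first block delimited by lines consisting of ---.
// Assumes LF line endings.
function replaceFrontMatterDate(text, newDate) {
  // Step 1: locate the block between the opening and closing --- lines.
  return text.replace(/^---\n[\s\S]*?\n---$/m, (block) =>
    // Step 2: rewrite the date line within that block only.
    // A replacer function avoids "$1" followed by digits being misread.
    block.replace(/^(date:\s*).+$/m, (_, prefix) => prefix + newDate)
  );
}

const input = [
  "---",
  "title: test",
  "date: 2018/10/17",
  "description: some thing",
  "---",
  "date: 1999/01/01 stays untouched outside the block",
].join("\n");

const output = replaceFrontMatterDate(input, "2020/01/01");
console.log(output);
```

Each of the two regexes stays short enough to read at a glance, which is the main argument for splitting the work.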

Without using a positive lookbehind, you could use 2 capturing groups and use those in the replacement like $1replacement$2
(^---[\s\S]+?date: )\d{4}\/\d{2}\/\d{2}([\s\S]+?^---)
Explanation
( Capturing group 1
^---[\s\S]+?date: Match --- at the start of a line, then 1+ times any character (non-greedy), then date: and a space
) Close capturing group 1
\d{4}\/\d{2}\/\d{2} Match a date-like pattern (note that this does not validate the date itself)
( Capturing group 2
[\s\S]+?^--- Match 1+ times any character (non-greedy), then assert the start of a line and match ---
) Close capturing group 2
const regex = /(^---[\s\S]+?date: )\d{4}\/\d{2}\/\d{2}([\s\S]+?^---)/gm;
const str = `---
title: test
date: 2018/10/17
description: some thing
---`;
const subst = `$1replacement$2`;
const result = str.replace(regex, subst);
console.log(result);

JavaScript supports lookbehind only in engines that implement ES2018 or later. If your environment supports it, you can try this regex:
/(?<=---[\s\S]+)(?<=date: )[\d/]+(?=[\s\S]+---)/
It looks behind for '---' followed by anything, then it looks behind for 'date: ' before it matches digits or slash one or more times, followed by a look ahead for anything followed by '---'.
Now you can easily replace the match with a new date.
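As a minimal sketch of that replacement, assuming an engine with lookbehind support (ES2018 or later); the variable names and the new date value are illustrative only:

```javascript
// Requires lookbehind support (ES2018 or later).
const frontMatter = `---
title: test
date: 2018/10/17
description: some thing
---`;

// Only the date itself is matched; the lookarounds consume nothing.
const dateInBlock = /(?<=---[\s\S]+)(?<=date: )[\d/]+(?=[\s\S]+---)/;
const updated = frontMatter.replace(dateInBlock, "2020/01/01");
console.log(updated);
```

Because the lookarounds do not become part of the match, no capture groups are needed in the replacement string.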

Related

replaceAll regex to remove last - from the output

I was able to achieve some of the output but not the right one. I am using replace all regex and below is the sample code.
final String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
System.out.println(label.replaceAll(
"([^-]+)-([^-]+)-(.+)-([^-]+)-([^-]+)", "$3"));
I want this output:
abc-nyd-request-xyxpt
but I am getting:
abc-nyd-request-xyxpt-
Here is the code: https://ideone.com/UKnepg
You may use this .replaceFirst solution:
String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
label.replaceFirst("(?:[^-]*-){2}(.+?)(?:--1)?-[^-]+$", "$1");
//=> "abc-nyd-request-xyxpt"
RegEx Details:
(?:[^-]*-){2}: Match 2 repetitions of 0 or more non-hyphen characters followed by a hyphen
(.+?): Match 1+ of any characters and capture in group #1
(?:--1)?: Match optional --1
-: Match a -
[^-]+: Match a non-hyphenated string
$: End
The following works for your example case
([^-]+)-([^-]+)-(.+[^-])-+([^-]+)-([^-]+)
https://regex101.com/r/VNtryN/1
The third group must not end in a -, so no trailing dash is captured, and -+ allows more than one separating dash, which is what makes it match the double --.
With your shown samples and attempts, please try the following regex. It creates one capturing group that can be used in the replacement; replace with $1 in your function.
^(?:.*?-){2}([^-]*(?:-[^-]*){3})--.*
Here is the Online demo for above regex.
Explanation:
^(?:.*?-){2} ## From the start of the value, a non-capturing group lazily matches up to the nearest - and is repeated twice.
([^-]*(?:-[^-]*){3}) ## The only capturing group: a run of non-hyphen characters followed by three repetitions of - plus non-hyphen characters, which yields the required output.
--.* ## Match -- and everything after it to the end of the value.
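To compare the three suggested patterns side by side, here is a small sketch. It is a JavaScript port for quick experimentation, not the Java code from the question; the array name candidates is illustrative.

```javascript
const label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";

// The three patterns from the answers above, written as JavaScript literals,
// each paired with the replacement string its answer suggests.
const candidates = [
  [/(?:[^-]*-){2}(.+?)(?:--1)?-[^-]+$/, "$1"],
  [/([^-]+)-([^-]+)-(.+[^-])-+([^-]+)-([^-]+)/, "$3"],
  [/^(?:.*?-){2}([^-]*(?:-[^-]*){3})--.*/, "$1"],
];

// With a non-global regex, String.prototype.replace replaces only the first
// match, which is comparable to replaceFirst in Java.
const results = candidates.map(([pattern, replacement]) =>
  label.replace(pattern, replacement)
);
console.log(results);
```

For the sample label, all three patterns give abc-nyd-request-xyxpt; they differ in how strictly they describe the parts that are dropped.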

Regex Capture Parts of Line

I have been struggling to capture part of an SNMP response.
Text
IF-MIB::ifDescr.1 = 1/1/g1, Office to DMZ
Regex
(?P<ifDescr>(?<=ifDescr.\d = ).*)
Current Capture
1/1/g1, Office to DMZ
How can I capture only the following two parts?
1/1/g1
Office to DMZ
EDIT
1/1/g1
This should match the digit and forward slashes for the port notation in the snmp response.
(?P<ifDescr>(?<=ifDescr.\d = )\d\/\d\/g\d)
Office to DMZ
This should start the match past the port notation and capture remaining description.
(?P<ifDescr>(?<=ifDescr.\d = \d\/\d\/g\d, ).*)
You could just use the answer I gave you yesterday and split the first return group, 1/1/g10, by '/' and get the third part.
1/1/g10
split by '/' gives
1
1
g10 <- third part
Why use a more complicated regex when you can use simple code to accomplish the task?
With your shown samples, please try the following regex, which requires a PCRE flavor.
(?<=IF-MIB::ifDescr)\.\d+\s=\s\K(?:\d+\/){2}g(?:\d+)
Here is Online demo of above regex
Or, with a small variation, use the following:
(?<=IF-MIB::ifDescr)\.\d+\s=\s\K(?:(?:\d+\/){2}g\d+)
Explanation:
(?<=IF-MIB::ifDescr) ## Lookbehind ensuring that everything matched afterwards is preceded by IF-MIB::ifDescr.
\.\d+\s=\s ## Match a literal dot, 1 or more digits, a single whitespace character, =, and another single whitespace character.
\K ## \K (supported by PCRE and Perl) drops everything matched so far from the reported match, so only the part that follows is returned.
(?:\d+\/){2}g(?:\d+) ## Match 1 or more digits followed by /, twice, then g and 1 or more digits.
Without a PCRE flavor: to get the value in the 1st capture group, try the following (the OP confirmed in the comments that it works).
(?<=IF-MIB::ifDescr)\.\d+\s=\s((\d+\/){2}g\d+)
Here are my attempts.
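using System.Text.RegularExpressions;
const string s = "IF-MIB::ifDescr.1 = 1/1/g1, Office to DMZ";
// Greedy groups split at the last ", ": Groups[1] is the port, Groups[2] the description.
const string pattern = ".* = (.*), (.*)";
var r = Regex.Match(s, pattern);
// Stricter variant: the port may contain only letters, digits and slashes.
const string pattern2 = ".* = ([0-9a-zA-Z\\/]*), (.*)";
var r2 = Regex.Match(s, pattern2);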
To capture the value 1/1/g1 in the named capture group ifDescr, you can use a plain match instead of lookarounds.
(Note that the dot is escaped as \. to match it literally.)
ifDescr\.\d+ = (?P<ifDescr>\d+\/\d+\/g\d+),
The pattern matches:
ifDescr\.\d+ = Match ifDescr. and 1+ digits followed by =
(?P<ifDescr> Named group ifDescr
\d+\/\d+\/g\d+ Match 1+ digits / 1+ digits /g and 1+ digits
), Close group and match the trailing comma
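A sketch of the same idea in JavaScript, which spells named groups (?<name>...) rather than (?P<name>...). The second group, description, is an addition for illustration and not part of the answer's pattern.

```javascript
const line = "IF-MIB::ifDescr.1 = 1/1/g1, Office to DMZ";

// JavaScript uses (?<name>...) for named groups, not (?P<name>...).
// The description group is an illustrative addition.
const ifDescrPattern =
  /ifDescr\.\d+ = (?<ifDescr>\d+\/\d+\/g\d+), (?<description>.*)/;

const match = line.match(ifDescrPattern);
if (match) {
  console.log(match.groups.ifDescr);
  console.log(match.groups.description);
}
```

Both parts of the line are then available by name from a single match.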
Do the following:
ifDescr\.\d+\s=\s((?:\d\/){2}g\d+)
The capture group contains the intended result. Note that \d+ accepts one or more digits, so you don't need the OR operator you used.
Alternatively, it looks like the number after g is always the same as the number after ifDescr. If that is the case, do this:
ifDescr\.(\d+)\s=\s((?:\d\/){2}g\1)
This basically captures the number in a group, then reuses it to match using backreference (note the usage of \1). The intended result in this case is available in the second capturing group.
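A quick way to see the backreference at work is to try one line where the two numbers agree and one where they do not. This is a JavaScript sketch; the second input line is made up for the comparison.

```javascript
const backrefPattern = /ifDescr\.(\d+)\s=\s((?:\d\/){2}g\1)/;

// The interface index (1) and the number after g (1) agree, so this matches.
const same = "IF-MIB::ifDescr.1 = 1/1/g1, Office to DMZ".match(backrefPattern);

// Made-up line: index 2 but port g1, so \1 cannot match and the result is null.
const different = "IF-MIB::ifDescr.2 = 1/1/g1, Office to DMZ".match(backrefPattern);

console.log(same && same[2]);
console.log(different);
```

The port is in the second capture group because the first group is used for the interface index.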
I think this is what you are looking for:
= (.+), (.+)
It looks for "= ", then captures everything up to the last comma followed by a space, and then everything after it. It returns
1/1/g1
Office to DMZ
as requested.
See it working on regex101.com.

Regular expression with multiline matching (subtitles strings)

I need some help with a regexp matching pattern.
The text looks like this (it is subtitles for a video):
...
223
00:20:47,920 --> 00:20:57,520
- Hello! This is good subtitle text.
- Yes! How are you, stackoverflow?
224
00:20:57,520 --> 00:21:11,120
Wow, seems amazing.
- We're good, thanks.
Like, you know, everyone is happy around here with their laptops.
225
00:21:11,120 --> 00:21:14,440
- Understood. Some dumb text
...
I need a set of groups:
startTime, endTime, text
So far my results are not very good. I can get startTime, endTime and some text, but not all of the text, only the last sentence. I've attached a screenshot.
As you can see, group 3 captures text, but only the last sentence.
Please explain what I'm doing wrong.
Thank you.
Accounting for the possibility that there is no newline character after the final text of your string, would the following work for you?
(\d\d:\d\d:\d\d,\d\d\d)[ >-]*?((?1))\n(.*?(?=\n\n|\Z))
See the online demo
(\d\d:\d\d:\d\d,\d\d\d) - The same pattern as you used to capture starting time in 1st capture group.
[ >-]*? - 0+ (but lazy) character from the character class up to:
((?1)) - A 2nd capture group which matches the same pattern as 1st group.
\n - A newline-character.
(.*?(?=\n\n|\Z)) - A 3rd capture group that captures anything (including newline with the s-flag) up to a positive lookahead for either two newline characters or the end of the whole string.
Note that some (not all) engines support subroutine calls such as (?1), which re-use a previous subpattern (this is not the same as a backreference). I guess the app you are using does not. Therefore you can replace (?1) with the pattern itself to capture the 2nd group.
Another option is to use a pattern that would capture all lines in group 3 that do not start with 3 digits.
(\d\d:\d\d:\d\d,\d\d\d) --> (\d\d:\d\d:\d\d,\d\d\d)((?:\r?\n(?!\d\d\d\b).*)*)
Explanation
(\d\d:\d\d:\d\d,\d\d\d) Capture group 1 Match a time like pattern
--> Match literally
(\d\d:\d\d:\d\d,\d\d\d) Capture group 2 Same pattern as group 1
( Capture group 3
(?: Non capture group
\r?\n(?!\d\d\d\b).* Match a newline and assert using a negative lookahead that the line does not start with 3 digits followed by word boundary. If that is the case, match the whole line
)* Optionally repeat all lines
) Close group 3
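Applied in JavaScript, the pattern above can be turned into the requested startTime, endTime and text groups roughly as follows. This is a sketch: the sample is shortened from the question, blank lines between cues are assumed (as in typical subtitle files), and the names srt, cuePattern and cues are illustrative.

```javascript
const srt = [
  "223",
  "00:20:47,920 --> 00:20:57,520",
  "- Hello! This is good subtitle text.",
  "- Yes! How are you, stackoverflow?",
  "",
  "224",
  "00:20:57,520 --> 00:21:11,120",
  "Wow, seems amazing.",
  "- We're good, thanks.",
  "",
  "225",
  "00:21:11,120 --> 00:21:14,440",
  "- Understood. Some dumb text",
].join("\n");

const cuePattern =
  /(\d\d:\d\d:\d\d,\d\d\d) --> (\d\d:\d\d:\d\d,\d\d\d)((?:\r?\n(?!\d\d\d\b).*)*)/g;

// Group 3 starts with a newline and may end with a blank line, hence trim().
const cues = Array.from(
  srt.matchAll(cuePattern),
  ([, startTime, endTime, text]) => ({ startTime, endTime, text: text.trim() })
);
console.log(cues);
```

Note that a text line that itself begins with exactly three digits would end the capture early, which is the case the more specific pattern tries to narrow down.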
A slightly more specific pattern could match all lines that do not start with a line consisting only of digits or with a start/end time-like pattern.
^(\d\d:\d\d:\d\d,\d\d\d)[^\S\r\n]+-->[^\S\r\n]+(\d\d:\d\d:\d\d,\d\d\d)((?:\r?\n(?!\d+$|\d\d:\d\d:\d\d,\d\d\d\b).*)*)

JScript Regex - extract dates preceded by substrings

I've got a one-line string that includes several dates. In JScript regex I need to extract the dates that are preceded by the case-insensitive substrings "dat" and "wy", in that order. The substrings can be preceded and followed by any character (except a newline).
reg = new RegExp('dat.{0,}wy.{0,}\\d{1,4}([\-/ \.])\\d{1,2}([\-/ \.])\\d{1,4}','ig');
str = ('abc18.Dat wy.03/12/2019FFF*Dato dost2009/03/03**data wy2020-09-30')
result = str.match(reg).toString()
Received result: 'Dat wy.03/12/2019FFF*Dato dost2009/03/03**data wy2020-09-30'
Expected result: 'Dat wy.03/12/2019,data wy2020-09-30' or preferably: '03/12/2019,2020-09-30'
Thanks.
Several issues.
You want to match as little as possible between the substrings and the date, but your current regex uses the greedy .{0,} (the same as .*). See this Question and use .*? instead.
dat.*?wy.*?FOO can still skip over any other dat. To avoid that, use what some call a Tempered Greedy Token: the .*? becomes (?:(?!dat).)*? so that it cannot skip over another dat.
Not really an issue, but you can capture the date separator and reuse it with a backreference.
If you want to extract only the date part, also use a capturing group. I put a demo at regex101.
dat(?:(?!dat).)*?wy.*?(\d{1,4}([/ .-])\d{1,2}\2\d{1,4})
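To pull out just the dates with this pattern, something like the following sketch should work in a modern JavaScript engine. It assumes String.prototype.matchAll is available; in classic JScript an exec loop would be needed instead. The names datePattern and dates are illustrative.

```javascript
const str =
  "abc18.Dat wy.03/12/2019FFF*Dato dost2009/03/03**data wy2020-09-30";

// i: case-insensitive, g: find every occurrence.
// Group 1 is the whole date, group 2 the separator reused via \2.
const datePattern =
  /dat(?:(?!dat).)*?wy.*?(\d{1,4}([/ .-])\d{1,2}\2\d{1,4})/gi;

const dates = Array.from(str.matchAll(datePattern), (m) => m[1]);
console.log(dates.join(","));
```

The date after "Dato dost" is skipped because the tempered token cannot cross the following "dat" to reach a "wy".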
There are many ways to achieve your desired outcome. Another idea: if you know that no digits will ever appear between dat and the date, use \D (non-digit) instead of the dot.
dat\D*?wy\D*(\d{1,4}([/ .-])\d{1,2}\2\d{1,4})
You might use a capturing group with a backreference to make sure the separators like - and / are the same in the matched date.
\bdat\w*\s*wy\.?(\d{4}([-/ .])\d{2}\2\d{2}|\d{2}([-/ .])\d{2}\3\d{4})
\bdat\w*\s*wy\.? A word boundary, match dat followed by 0+ word chars and 0+ whitespace chars. Then match wy and an optional .
( Capture group 1
\d{4}([-/ .])\d{2}\2\d{2} Match a date like format starting with the year where \2 is a backreference to what is captured in group 2
| Or
\d{2}([-/ .])\d{2}\3\d{4} Match a date like format ending with the year where \3 is a backreference to what is captured in group 3
) Close group
The value is in capture group 1
Note that you could make the date more specific by specifying ranges for the year, month and day.

Regex - optional capture group after wildcard

Say I have the following list:
No 1 And Your Bird Can Sing (4)
No 2 Baby, You're a Rich Man (5)
No 3 Blue Jay Way S
No 4 Everybody's Got Something to Hide Except Me and My Monkey (1)
And I want to extract the number, the title and the number of weeks in the parenthesis if it exists.
Works, but the last group is not optional (regstorm):
No (?<no>\d{1,3}) (?<title>.*?) \((?<weeks>\d)\)
Last group optional, only matches number (regstorm):
No (?<no>\d{1,3}) (?<title>.*?)( \((?<weeks>\d)\))?
Combining a pattern with the weeks capture and a pattern without it works, but there has to be a better way:
(No (?<no>\d{1,3}) (?<title>.*) \((?<weeks>\d)\))|(No (?<no>\d{1,3}) (?<title>.*))
I use C# and javascript but I guess this is a general regex question.
Your regex is almost there!
First and most importantly, you should add a $ at the end. This makes (?<title>.*?) match all the way to the end of the string. Currently, (?<title>.*?) matches an empty string and then stops, because it has reached a point where the rest of the regex matches. Why does the rest of the regex match? Because the optional group can match an empty string. By adding the $, you make the rest of the regex "harder" to match.
Secondly, you forgot to match an open parenthesis \(.
This is how your regex should look:
No (?<no>\d{1,3}) (?<title>.*?)( \((?<weeks>\d)\))?$
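A small JavaScript sketch of how the anchored pattern behaves on a line with and without the weeks part. Each line is tested on its own, so no m flag is needed; the names songPattern, lines and parsed are illustrative.

```javascript
const songPattern = /No (?<no>\d{1,3}) (?<title>.*?)( \((?<weeks>\d)\))?$/;

const lines = [
  "No 1 And Your Bird Can Sing (4)",
  "No 3 Blue Jay Way S",
];

const parsed = lines.map((line) => {
  const { no, title, weeks } = line.match(songPattern).groups;
  // weeks is undefined when the optional group did not take part in the match.
  return { no, title, weeks: weeks ?? null };
});
console.log(parsed);
```

Because of the $, the lazy title has to keep growing until either the optional weeks group or the end of the line follows it.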
You may use this regex with an optional last part:
^No (?<no>\d{1,3}) (?<title>.*?\S)(?: \((?<weeks>\d)\))?$
Another option could be for the title to match either not ( or when it does encounter a ( it should not be followed by a digit and a closing parenthesis.
^No (?<no>\d{1,3}) (?<title>(?:[^(\r\n]+|\((?!\d\)))+)(?:\((?<weeks>\d)\))?
In parts
^No
(?<no>\d{1,3}) Group no, followed by a space
(?<title>
(?: Non capturing group
[^(\r\n]+ Match any char except ( or newline
| Or
\((?!\d\)) Match ( if not directly followed by a digit and )
)+ Close group and repeat 1+ times
) Close group title
(?: Non capturing group
\((?<weeks>\d)\) Group weeks between parenthesis
)? Close group and make it optional
If you don't want to trim the last space of the title you could exclude it from matching before the weeks.