I have this regex to extract the name of a chatter in my IRC channel, along with date and message capture groups:
^\[(?:\d+)\-(?:\d+)(?:\-\d+) # (\d+):\d+(?::\d+).\d+ (?:GMT|BST)\] (([^:]+)|\[[^\]]): ((?!\!).*)
It works on this chat line, giving me 'bearwolf3' as the 2nd capture group, which is what I want:
[04-04-2017 # 12:45:39.204 BST] bearwolf3: Break Fast
But if a line like this shows up, I want to be able to extract the name 'bladey2k14' from the IRC message relayed by my bot, when it contains [ and ]:
[04-04-2017 # 12:45:22.338 BST] loonycrewbot: [bladey2k14]: tyt romani :)
So the 2nd capture would be 'bladey2k14'.
I've seen if/then/else examples, but I can't get them to work and they're making my brain hurt!
Can anyone modify my regex at the top to do this?
You can see it here. I want match 2 to have group 2 as bladey2k14 and group 3 as the message 'tyt romani'.
You may try using the following expression:
^\[\d+-\d+-\d+ # (\d+):\d+:\d+\.\d+ (?:GMT|BST)\] (?|([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]): ([\w\s]*)
See the regex demo
The branch reset group (?|...|...) in a PCRE regex lets capturing groups in different alternatives share the same group number. So, in (?|([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]), both ([^:]+) and ([^\]]*) capture their values into Group 2.
I also removed unnecessary non-capturing groups (like (?:\d+)): those groups are neither quantified nor do they contain any alternation operators.
The parts I changed are (?|([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]) and [\w\s]*:
(?|([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]) matches 1 of 2 alternatives:
([^:]+)(?!:\s*\[[^\]]*]) - 1 or more chars other than : captured into Group 2 (with ([^:]+)) that are not followed by :, 0+ whitespaces, [, 0+ chars other than ] and then ] (checked with the negative lookahead (?!:\s*\[[^\]]*]))
| - or
[^:]+:\s*\[([^\]]*)] - 1+ chars other than :, then :, 0+ whitespaces, [, 0+ chars other than ] captured into (again) Group 2, and then ].
The [\w\s]* matches 0+ chars that are letters/digits/_/whitespace.
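If you need the same thing in Python, note that its built-in re module does not support the branch reset group (?|...). A minimal sketch, assuming Python 3 and the stdlib re, swaps in a plain alternation with two separate groups and picks whichever one took part in the match; CHAT_LINE and parse_chat_line are hypothetical names:

```python
import re

# Same pattern as above, but without the branch reset group, which Python's
# re does not support: each alternative gets its own capturing group.
CHAT_LINE = re.compile(
    r"^\[\d+-\d+-\d+ # (\d+):\d+:\d+\.\d+ (?:GMT|BST)\] "
    r"(?:([^:]+)(?!:\s*\[[^\]]*])|[^:]+:\s*\[([^\]]*)]): ([\w\s]*)"
)


def parse_chat_line(line):
    """Return (hour, nick, message), or None if the line does not match."""
    m = CHAT_LINE.match(line)
    if m is None:
        return None
    hour, direct_nick, relayed_nick, message = m.groups()
    # Only one of the two nick groups participates in a successful match.
    nick = direct_nick if direct_nick is not None else relayed_nick
    return hour, nick, message


print(parse_chat_line("[04-04-2017 # 12:45:39.204 BST] bearwolf3: Break Fast"))
print(parse_chat_line(
    "[04-04-2017 # 12:45:22.338 BST] loonycrewbot: [bladey2k14]: tyt romani :)"
))
```

For the relayed line this gives ('12', 'bladey2k14', 'tyt romani '): the message keeps a trailing space because [\w\s]* stops right before the :).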
I'm struggling to write the correct regex to match the data below. I want to capture "Focus+Terminal" and its optional parameter "NYET". How can I rewrite my incorrect regex?
user:\/\/(.*)(?:=(.*+))?
I also tried and failed:
user:\/\/(.*)=?(?:(.*+))?
Sample Data
* user://Focus+Terminal=NYET
* user://Focus+Terminal
You can use
user:\/\/(.*?)(?:=(.*))?$
See the regex demo.
Details:
user:\/\/ - a user:// string
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
(?:=(.*))? - an optional non-capturing group that matches a = and then captures into Group 2 any zero or more chars other than line break chars as many as possible
$ - end of string.
As an alternative, you might use a negated character class for the first capture group that excludes newlines and equals signs.
user:\/\/([^=\n]*)(?:=(.*))?
Explanation
user:\/\/ Match user://
([^=\n]*) Capture group 1, match zero or more chars other than = or a newline
(?:=(.*))? Optionally match = and capture the rest of the line in group 2
Regex demo
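A quick way to compare both answers is a small Python sketch (assuming Python 3's re module; SAMPLES, LAZY and NEGATED are just illustrative names) that runs each pattern over the two sample lines:

```python
import re

SAMPLES = ["user://Focus+Terminal=NYET", "user://Focus+Terminal"]

# First answer: lazy Group 1; the $ anchor forces the optional "=..." part to be tried.
LAZY = re.compile(r"user:\/\/(.*?)(?:=(.*))?$")
# Second answer: a negated character class stops Group 1 before any "=".
NEGATED = re.compile(r"user:\/\/([^=\n]*)(?:=(.*))?")

for sample in SAMPLES:
    print(sample, LAZY.search(sample).groups(), NEGATED.search(sample).groups())
```

Both patterns return ('Focus+Terminal', 'NYET') for the first sample and ('Focus+Terminal', None) for the second, where None means the optional group did not participate.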
Edit: I've realized I made a mistake when explaining myself. Apologies for that.
Most of the artifacts come from this path:
D:\Folder1\Folder2\Folder3\Folder4\Folder5\
and then branches into Artifact folders and their sub-folders like this:
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.0\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.1\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.2\data.xxx
I would appreciate help with the following:
I have this list (around 5k rows) of paths to different artifacts, each in several versions. To give you an example:
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.0\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.1\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.2\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder2\Artifact\Artifact-1.1\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder2\Artifact\Artifact-1.2\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder2\Artifact\Artifact-1.3\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder3\Artifact\Artifact-1.2\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder3\Artifact\Artifact-1.3\data.xxx
And my goal is to get this:
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.0\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder2\Artifact\Artifact-1.1\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder3\Artifact\Artifact-1.2\data.xxx
Basically, to scope each artifact down to just 1 version.
I've tried using ^(.*)(\n\1)+$ with $1 as the replacement, but that obviously didn't work. So I was wondering if you have an idea how to approach this. I'd greatly appreciate any help, thanks!
You can use
Find what: ^(.*\.)(\d+)\\[^\\\n]+(\n\1\d+\\[^\\\n]+)+$
Replace: $1$2\\
See the regex demo. Details:
^ - start of a line (it is the default ^ behavior in Visual Studio Code)
(.*\.) - Group 1: any zero or more chars other than line break chars as many as possible and then a .
(\d+) - Group 2: one or more digits
\\ - a \ char
[^\\\n]+ - one or more chars other than \ and a line break
(\n\1\d+\\[^\\\n]+)+ - Group 3 capturing one or more sequences of a line break and then the value captured into Group 1, one or more digits, a \ char and then one or more chars other than \ and a line break
$ - end of a line.
Here is another attempt; see the regex101 demo.
The basic idea is to isolate someText-\d+. in capture group 2.
Then look for $2 in following lines. What precedes $2 or follows $2 in those following lines can vary.
Find: ^(.*\\(?=.*\\))(.*-\d+\.)(.*\\?.*)(\n.*\2.*)*
Replace: $1$2$3
So here is the most interesting part: ^(.*\\(?=.*\\))(.*-\d+\.)
This will get your Artifact-1. or Artifact-17. or someText-2. into capture group 2. Because of the positive lookahead (?=.*\\), group 1 has to end at a backslash that is followed by at least one more backslash, so group 2 (.*-\d+\.) falls in the last directory only. Then (.*\\?.*) gathers the rest of that line into group 3.
Finally, (\n.*\2.*)* checks whether the text captured in group 2 (the backreference \2) appears in any of the following lines. [Technically, that backreference could match anywhere in a line, even at the beginning; that can be fixed if necessary, so let me know if you need that for your data. See the safer regex101 demo if 'someText-\d.' could appear anywhere and should be ignored when it is not the last directory, and use that find pattern.]
You cannot use a single capture group for the whole line with ^(.*): you want to repeat only the part before the last dot using a backreference, and that will not work if you capture the whole line.
Therefore you have to capture the digits in the first match in a separate capture group to keep it in the replacement.
If you want to match all following lines with the same text before the last dot, you can use a repeating group:
^\s*(.*\.)(\d+\\[^\\\r\n]*)(?:\r?\n\s*\1\d*\\[^\\\r\n]*)+
The pattern matches:
^ Start of a line (in multiline mode)
\s* Match optional whitespace chars
(.*\.) Capture group 1, match till the last dot
(\d+\\[^\\\r\n]*) Capture group 2, match 1+ digits, \ and optional chars other than \ or a newline
(?: Non capture group
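\r?\n\s*\1 Match a newline, optional whitespace chars and a backreference to group 1
\d*\\[^\\\r\n]* Match optional digits, \ and optional chars other than \ or a newline (almost the same pattern as in the first part)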
)+ Close the non capture group and repeat 1+ times
See a regex demo.
In the replacement, use the 2 capture groups: $1$2
The result will look like:
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.0\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder2\Artifact\Artifact-1.1\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder3\Artifact\Artifact-1.2\data.xxx
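To check the repeating-group pattern outside the editor, here is a minimal Python sketch. It assumes Python 3's re, where re.M plays the role of the multiline flag and the $1$2 replacement is written \1\2. The artifact_path helper is hypothetical and only rebuilds the sample paths from the question:

```python
import re


def artifact_path(folder, version):
    """Rebuild one sample path from the question (backslash-separated)."""
    parts = ["D:", "Folder1", "Folder2", "Folder3", "Folder4", "Folder5",
             folder, "Artifact", f"Artifact-{version}", "data.xxx"]
    return "\\".join(parts)


LINES = [
    artifact_path("ArtifactFolder", "1.0"),
    artifact_path("ArtifactFolder", "1.1"),
    artifact_path("ArtifactFolder", "1.2"),
    artifact_path("ArtifactFolder2", "1.1"),
    artifact_path("ArtifactFolder2", "1.2"),
    artifact_path("ArtifactFolder2", "1.3"),
    artifact_path("ArtifactFolder3", "1.2"),
    artifact_path("ArtifactFolder3", "1.3"),
]

# The repeating-group pattern from above; re.M lets ^ match at every line start.
DUPLICATE_RUN = re.compile(
    r"^\s*(.*\.)(\d+\\[^\\\r\n]*)(?:\r?\n\s*\1\d*\\[^\\\r\n]*)+", re.M
)

text = "\n".join(LINES)
# Keep only the first line of each run that shares the text before the last dot.
result = DUPLICATE_RUN.sub(r"\1\2", text)
print(result)
```

The printed result keeps the Artifact-1.0, ArtifactFolder2 Artifact-1.1 and ArtifactFolder3 Artifact-1.2 lines, matching the desired output.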
I have the following strings sample:
MAREMMA TOSCANA BIANCO DOC 2020 CALASOLE MONTEMASSI0,750
CHIANTI CLASSICO DOCG 2012 RISERVA ALBOLA LT.0,750
I need to split them into 5 parts (where I put the | in the following samples):
MAREMMA TOSCANA BIANCO DOC |2020| CALASOLE MONTEMASSI|0,750
CHIANTI CLASSICO DOCG |2012| RISERVA ALBOLA |LT.|0,750
As you can see, the fourth part is optional.
I tried some variations of this regexp on https://regex101.com/r/NX3DE3/1, but the LT. part gets absorbed into the preceding one:
([A-Za-z ]+)((20\d\d)|(19\d\d))([A-Za-z ]*)((LT))\.?[0-9,]*
The ((LT)) group is optional, but if I add a ?, it works for the first example but not for the second, and vice versa.
I would also like to trim the different parts, but really don't know how!
You can use
^(.*?)\s*((?:20|19)\d\d)\s*(.*?)(?:\s+(LT)[. ])?(\d[\d,]*)
See the regex demo. Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
\s* - zero or more whitespaces
((?:20|19)\d\d) - Group 2: 20 or 19 and then two digits
\s* - zero or more whitespaces
(.*?) - Group 3: any zero or more chars other than line break chars as few as possible
(?:\s+(LT)[. ])? - an optional non-capturing group matching one or more whitespaces, then LT captured into Group 4, and then a . or a space
(\d[\d,]*) - Group 5: a digit and then zero or more digits or commas.
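As a sketch of the trimming part, assuming Python 3's re: since Group 1 and Group 3 are lazy and the surrounding whitespace is matched outside the groups, the captured parts already come out trimmed, and Group 4 is None when there is no LT. WINE is just an illustrative name:

```python
import re

WINE = re.compile(r"^(.*?)\s*((?:20|19)\d\d)\s*(.*?)(?:\s+(LT)[. ])?(\d[\d,]*)")

for line in [
    "MAREMMA TOSCANA BIANCO DOC 2020 CALASOLE MONTEMASSI0,750",
    "CHIANTI CLASSICO DOCG 2012 RISERVA ALBOLA LT.0,750",
]:
    # groups() gives (description, year, producer part, "LT" or None, volume).
    print(WINE.match(line).groups())
```

The second line yields ('CHIANTI CLASSICO DOCG', '2012', 'RISERVA ALBOLA', 'LT', '0,750'), with no spaces left to strip.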
I need some help with a regexp matching pattern.
The text looks like this (it's subtitles for a video):
...
223
00:20:47,920 --> 00:20:57,520
- Hello! This is good subtitle text.
- Yes! How are you, stackoverflow?
224
00:20:57,520 --> 00:21:11,120
Wow, seems amazing.
- We're good, thanks.
Like, you know, everyone is happy around here with their laptops.
225
00:21:11,120 --> 00:21:14,440
- Understood. Some dumb text
...
I need a set of groups:
startTime, endTime, text
So far my results are not very good. I can get startTime, endTime and some text, but not all the text, only the last sentence. I've attached a screenshot.
As you can see, group 3 captures text, but only the last sentence.
Please explain what I'm doing wrong.
Thank you.
Accounting for the possibility that there is no newline character after the final text of your string, would the following work for you?
(\d\d:\d\d:\d\d,\d\d\d)[ >-]*?((?1))\n(.*?(?=\n\n|\Z))
See the online demo
(\d\d:\d\d:\d\d,\d\d\d) - The same pattern as you used to capture starting time in 1st capture group.
[ >-]*? - 0+ (lazy) characters from the character class, up to:
((?1)) - A 2nd capture group which matches the same pattern as 1st group.
\n - A newline-character.
(.*?(?=\n\n|\Z)) - A 3rd capture group that captures anything (including newline with the s-flag) up to a positive lookahead for either two newline characters or the end of the whole string.
Note that only some engines (PCRE, for example) support reusing a previous subpattern with (?1). I guess the app you are using does not, so you can replace (?1) with your own time pattern to capture the 2nd group.
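A minimal Python sketch of this approach rests on two assumptions: Python's re has no (?1), so the time pattern is simply repeated as the note suggests, and the entries are separated by blank lines as in a typical .srt file, because the (?=\n\n|\Z) lookahead depends on that. SRT, TIME and ENTRY are illustrative names:

```python
import re

# Hypothetical SRT text: the entries from the question, separated by blank lines.
SRT = (
    "223\n"
    "00:20:47,920 --> 00:20:57,520\n"
    "- Hello! This is good subtitle text.\n"
    "- Yes! How are you, stackoverflow?\n"
    "\n"
    "224\n"
    "00:20:57,520 --> 00:21:11,120\n"
    "Wow, seems amazing.\n"
    "- We're good, thanks.\n"
    "Like, you know, everyone is happy around here with their laptops.\n"
    "\n"
    "225\n"
    "00:21:11,120 --> 00:21:14,440\n"
    "- Understood. Some dumb text"
)

TIME = r"\d\d:\d\d:\d\d,\d\d\d"
# The time pattern is repeated instead of (?1); re.S is the s-flag, so the
# . in group 3 can run across line breaks up to a blank line or the end.
ENTRY = re.compile(rf"({TIME})[ >-]*?({TIME})\n(.*?(?=\n\n|\Z))", re.S)

for start, end, text in ENTRY.findall(SRT):
    print(start, end, repr(text))
```

Each tuple holds startTime, endTime and the full multi-line text of one entry, including the last one, which has no trailing newline.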
Another option is to use a pattern that would capture all lines in group 3 that do not start with 3 digits.
(\d\d:\d\d:\d\d,\d\d\d) --> (\d\d:\d\d:\d\d,\d\d\d)((?:\r?\n(?!\d\d\d\b).*)*)
Explanation
(\d\d:\d\d:\d\d,\d\d\d) Capture group 1, match a time-like pattern
--> Match literally
(\d\d:\d\d:\d\d,\d\d\d) Capture group 2, same pattern as group 1
( Capture group 3
(?: Non capture group
\r?\n(?!\d\d\d\b).* Match a newline and assert, using a negative lookahead, that the line does not start with 3 digits followed by a word boundary. If the assertion holds, match the whole line
)* Optionally repeat all lines
) Close group 3
Regex demo
A bit more specific pattern could match all lines that neither consist of digits only nor start with a start/end time-like pattern.
^(\d\d:\d\d:\d\d,\d\d\d)[^\S\r\n]+-->[^\S\r\n]+(\d\d:\d\d:\d\d,\d\d\d)((?:\r?\n(?!\d+$|\d\d:\d\d:\d\d,\d\d\d\b).*)*)
Regex demo
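The more specific pattern does not need blank lines between entries, so it also works on the sample exactly as posted. Here is a Python sketch, assuming Python 3's re with re.M so that ^ and $ work per line. Group 3 starts with the line break before the first text line, hence the strip(). SUBTITLES and SRT_ENTRY are illustrative names:

```python
import re

# The sample from the question, without blank lines between entries.
SUBTITLES = "\n".join([
    "223",
    "00:20:47,920 --> 00:20:57,520",
    "- Hello! This is good subtitle text.",
    "- Yes! How are you, stackoverflow?",
    "224",
    "00:20:57,520 --> 00:21:11,120",
    "Wow, seems amazing.",
    "- We're good, thanks.",
    "Like, you know, everyone is happy around here with their laptops.",
    "225",
    "00:21:11,120 --> 00:21:14,440",
    "- Understood. Some dumb text",
])

SRT_ENTRY = re.compile(
    r"^(\d\d:\d\d:\d\d,\d\d\d)[^\S\r\n]+-->[^\S\r\n]+(\d\d:\d\d:\d\d,\d\d\d)"
    r"((?:\r?\n(?!\d+$|\d\d:\d\d:\d\d,\d\d\d\b).*)*)",
    re.M,
)

for m in SRT_ENTRY.finditer(SUBTITLES):
    start, end, text = m.groups()
    # Drop the leading line break that group 3 starts with.
    print(start, end, repr(text.strip()))
```

The index lines (223, 224, 225) stop each group 3 because of the \d+$ branch of the lookahead.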
I am trying to use replace in Sublime using regular expressions but I'm stuck. I tried various combinations but don't seem to be getting there.
This is the input and my desired output:
Input: N_BBP_c_46137_n
Output : BBP
I tried combinations of:
[^BBP]+\b
\*BBP*+\g
But none of the above (and many others) seem to work.
To turn N_BBP_c_46137_n into BBP, and since according to the comment you just want an entire long name such as N_BBP_ to be replaced by only BBP, you might also use a capture group to keep BBP.
\bN_(BBP)_\S*
\bN_ Match N_ preceded by a word boundary
(BBP) Capture group 1, match BBP (or use [A-Z]+ to match 1+ uppercase chars)
_\S* Match _ followed by 0+ non-whitespace chars
In the replacement use the first capturing group $1
Regex demo
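The same replacement can be tried in Python, assuming Python 3's re, where $1 is written \1 in the replacement string. N_XYZ_c_1_n is a made-up second name, only there to show the [A-Z]+ variant:

```python
import re

# \bN_ anchors the prefix, group 1 keeps the code, _\S* consumes the rest of the name.
EXACT = re.compile(r"\bN_(BBP)_\S*")
# Variant with [A-Z]+ instead of the literal BBP.
ANY_CODE = re.compile(r"\bN_([A-Z]+)_\S*")

print(EXACT.sub(r"\1", "N_BBP_c_46137_n"))
print(ANY_CODE.sub(r"\1", "N_BBP_c_46137_n N_XYZ_c_1_n"))
```

This prints BBP, then BBP XYZ.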
You may use
(N_)[^_]*(_c_\d+_n)
Replace with ${1}some new value$2.
Details
(N_) - Group 1 ($1 or ${1} if the next char is a digit): N_
[^_]* - any 0 or more chars other than _
(_c_\d+_n) - Group 2 ($2): _c_, 1 or more digits and then _n.
See the regex demo.
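In Python (an assumption, since the question is about Sublime), the counterpart of ${1} is \g<1>. It serves the same purpose: the group number cannot run into digits that follow it in the replacement text. The replacement value 2024 below is made up:

```python
import re

PATTERN = re.compile(r"(N_)[^_]*(_c_\d+_n)")

# \g<1> keeps group 1 separate from whatever text follows it.
print(PATTERN.sub(r"\g<1>some new value\2", "N_BBP_c_46137_n"))
print(PATTERN.sub(r"\g<1>2024\2", "N_BBP_c_46137_n"))
```

This prints N_some new value_c_46137_n and N_2024_c_46137_n.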