Regex to find a value at a particular location

Presently the regex is:
[A-Z]+(?=-\d+$)
This pulls out the correct value for most of the strings which follow the below format:
ANG-RGN-SOR-BCP-0004 i.e. BCP
However it pulls out SS for the following document instead of PMR:
ANG-B31-OPS-PMR-MACE-SS-0229
So basically I want to pull out the fourth term (between the hyphens), so it should pick BCP and PMR.

The following regex will get the 4th item in group 1:
(?:[A-Z0-9]+-){3}([A-Z0-9]+)
The first bit in (?:...) is a "non-capturing group" which acts like a group but won't appear in the backreference list.
The next bit means "3 of these non-capturing groups".
And finally, a capturing group to collect what you want.
I have assumed here that every term contains only uppercase letters and digits; modify the parts in [square brackets] to match whatever the terms can actually contain.
A more easily understandable method in Python:
a = "ANG-B31-OPS-PMR-MACE-SS-0229"
part = a.split('-')[3]
print(part)
This gives "PMR".
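For comparison, here is a minimal Python 3 sketch of the regex approach from this answer. The helper name fourth_term is made up for illustration, and it assumes, as the answer does, that every term consists only of uppercase letters and digits.

```python
import re

# Pattern from the answer: skip three hyphen-terminated terms, capture the fourth.
FOURTH_TERM = re.compile(r"(?:[A-Z0-9]+-){3}([A-Z0-9]+)")

def fourth_term(doc_id):
    """Return the fourth hyphen-separated term, or None if the pattern does not match."""
    m = FOURTH_TERM.match(doc_id)
    return m.group(1) if m else None

print(fourth_term("ANG-RGN-SOR-BCP-0004"))
print(fourth_term("ANG-B31-OPS-PMR-MACE-SS-0229"))
```

Both sample strings from the question go through the same pattern. A string with fewer than four terms makes the helper return None instead of raising, which split('-')[3] would not do.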

This should suit your needs:
(?:.+?-){3}([^-]+)
You'll be able to access the fourth term in the first capturing group.

Related

RegEx Replace - Remove Non-Matched Values

Firstly, apologies; I'm fairly new to the world of RegEx.
Secondly (more of an FYI), I'm using an application that only has RegEx Replace functionality, therefore I'm potentially going to be limited on what can/can't be achieved.
The Challenge
I have a free text field (labelled Description) that primarily contains "useless" text. However, some records will contain either one or multiple IDs that are useful and I would like to extract said IDs.
Every ID will have the same three-letter prefix (APP) followed by a five digit numeric value (e.g. 12911).
For example, I have the following string in my Description Field;
APP00001Was APP00002TEST APP00003Blah blah APP00004 Apple APP11112OrANGE APP
THE JOURNEY
I've managed to very crudely put together an expression that is close to what I need (although, I actually need the reverse);
/!?APP\d{1,5}/g
Result;
THE STRUGGLE
However, on the Replace, I'm only able to retain the non-matched values;
Was TEST Blah blah Apple OrANGE APP
THE ENDGAME
I would like the output to be;
APP00001 APP00002 APP00003 APP00004 APP11112
Apologies once again if this is somewhat of a 'noddy' question; but any help would be much appreciated and all ideas welcome.
Many thanks in advance.
You could use an alternation | to capture either the pattern starting with a word boundary in group 1 or match 1+ word chars followed by optional whitespace chars.
What you capture in group 1 can be used as the replacement. Anything matched by the second alternative leaves group 1 empty, so it is dropped from the result.
Using !? matches an optional exclamation mark. You could prepend that to the pattern, but it is not part of the example data.
\b(APP\d{1,5})\w*|\w+\s*
See a regex demo
In the replacement, use capture group 1, which most tools write as $1 or \1.
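The asker's application only offers Regex Replace, but the behaviour of this pattern can be checked with a short Python 3 sketch. The helper name keep_ids and the final strip() are additions for illustration; the strip() is there because the space before the trailing "APP" is not matched by either alternative and so survives the replace.

```python
import re

# Pattern from the answer: group 1 captures an ID; the second alternative
# matches any other word plus trailing whitespace so that it can be discarded.
PATTERN = re.compile(r"\b(APP\d{1,5})\w*|\w+\s*")

def keep_ids(description):
    # When the second alternative matches, group 1 did not participate,
    # and Python 3.5+ substitutes an empty string for \1.
    return PATTERN.sub(r"\1", description).strip()

sample = "APP00001Was APP00002TEST APP00003Blah blah APP00004 Apple APP11112OrANGE APP"
print(keep_ids(sample))
```

Engines differ in how they treat a reference to a group that did not take part in the match, so it is worth testing this in the target application as well.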

Extracting String using regex

I am using an HTA application I wrote for our help desk to take notes.
I've been using regex (as best I can) to CTRL+A our ticket pop-up and click Parse in my app to fill out the information.
I need to find "TICKET - T00000000.0000 - Account Security (Company Name...)" and grab only the "Account Security" part or, more generally, whatever is between the 2nd - and the (
Any suggestions would be grand
here is an example what I've tried and what I am using
try {
$(".problem_description", context).val(clipdata.match(/TICKET -.+[)]/)[0]);
}
catch (e) {
}
Update
I have tried a few of the suggestions here but the results still seem to give me the entire string or error out in my script.
Here's the regex using positive lookbehind:
(?<=TICKET\ -\ T\d{8}\.\d{4}\ -\ ).*\)
Here's regex101 explanation: https://regex101.com/r/6BN16e/1
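The pattern effectively says: match anything after "TICKET - T(8 digits).(4 digits) - " up to the last closing parenthesis. Note that the match therefore still includes the parenthesised company name. You can of course tweak it to your specification.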
Here's a tutorial on lookahead and lookbehind that may be helpful: https://www.regular-expressions.info/lookaround.html
Use a capture group. In a regex you can use parentheses to mark a capture group. So if you define a pattern where a portion of it marks the text you want to extract, you can wrap that portion in parentheses. The object returned by the match function in most languages is an object that lets you access the values of individual capture groups.
Try this regex I quickly made up: /[^-]*-[^-]*- ([^(]*)/
Full example: var matches = "TICKET - T00000000.0000 - Account Security (Company Name...)".match(/[^-]*-[^-]*- ([^(]*)/)
Your value will be in matches[1].
It says: start from the beginning, look for anything not a dash, then a dash, then anything not a dash, then another dash, then a space, then capture anything not a left-parenthesis into a capture group.
This one will leave an extra space at the end of the captured group value. Also, it will truncate your value if your value contains a left parenthesis.
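The capture-group approach can be tried out with the following Python 3 sketch (the question's code is JavaScript, where the same pattern applies). The helper name ticket_category is hypothetical, and the strip() call is an addition that removes the trailing space mentioned above.

```python
import re

# Pattern from the answer: skip two dash-terminated chunks and a space,
# then capture everything up to the first "(".
CATEGORY = re.compile(r"[^-]*-[^-]*- ([^(]*)")

def ticket_category(clip_text):
    m = CATEGORY.search(clip_text)
    # strip() drops the trailing space left in the captured group.
    return m.group(1).strip() if m else None

print(ticket_category("TICKET - T00000000.0000 - Account Security (Company Name...)"))
```

If the copied text contains other dashes before the TICKET line, this pattern may latch onto those first, so anchoring it on the literal word TICKET is probably safer in practice.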

Concentric matches with one expression

What is the regex syntax for combining 2 expressions like a Venn diagram?
I have HTML with 2 table cells. Each of the 2 cells contains several table rows:
https://regex101.com/r/cTXwrT/3
This expression captures the 2nd table cell only:
(?<=your mother)(?s).*(?=Monochrome)
This expression matches table rows from all table cells:
[A-Za-z].*Yoghurt
How do I combine both expressions into one, so that I get the table rows from only the 2nd table cell?
I'm writing in AutoHotkey which uses PCRE for the regex engine.
I apologise for poor terminology— I've read up on recursion, back referencing, capture groups, atomic groups, etc but they didn't seem to apply.
I think you can do what you want with a nested capturing group. Here I capture everything between the td tags in an inner capturing group:
(?<=your mother)(?s).*((?<=\<td bgcolor="#F0F0F0"\>).*(?=\<\/td\>)).*(?=Monochrome)
You might need to tweak it a bit, it's a pretty scrappy regex, but it works for your current use case.
Reading the documentation for AutoHotkey#RegExMatch:
FoundPos := RegExMatch(Haystack, NeedleRegEx [, UnquotedOutputVar = "", StartingPosition = 1])
If any capturing subpatterns are present inside NeedleRegEx, their matches are stored in a pseudo-array whose base name is OutputVar. For example, if the variable's name is Match, the substring that matches the first subpattern would be stored in Match1, the second would be stored in Match2, and so on. The exception to this is named subpatterns: they are stored by name instead of number. For example, the substring that matches the named subpattern "(?P<Year>\d{4})" would be stored in MatchYear. If a particular subpattern does not match anything (or if the function returns zero), the corresponding variable is made blank.
So you'd have to call it with UnQuotedOutputVar, say Match, and then look in Match2 for what was captured by the second capturing group.
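As a rough illustration of how numbered and named subpatterns are looked up, here is a small sketch in Python rather than AutoHotkey. The sample sentence is invented, and group(1), group(2) and group("Year") only play the role that Match1, Match2 and MatchYear play in the documentation quoted above.

```python
import re

# Hypothetical sample text. Groups are numbered by their opening parenthesis,
# so the outer group is number 1 and the inner (named) group is number 2.
m = re.search(r"(released in (?P<Year>\d{4}))", "First released in 2003 for Windows")

print(m.group(1))       # the outer group
print(m.group(2))       # the inner group, looked up by number
print(m.group("Year"))  # the same inner group, looked up by name
```

In Python a named group stays reachable by its number as well as by its name; the AutoHotkey documentation above describes named subpatterns as stored by name only.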

regex: substitute character in captured group

EDIT
In a regex, can a matching capturing group be replaced with the same match altered substituting a character with another?
ORIGINAL QUESTION
I'm converting a list of products into a CSV text file. Every line in the list has: number name[ description] price in this format:
1 PRODUCT description:120
2 PRODUCT NAME TWO second description, maybe:80
3 THIRD PROD:18
The resulting format must also include a slug (with - instead of spaces) as the second field:
1 PRODUCT:product-1:description:120
2 PRODUCT NAME TWO:product-name-two-2:second description, maybe:80
3 THIRD PROD:third-prod-3::18
The regex I'm using is this:
(\d+) ([A-Z ]+?)[ ]?([a-z ,]*):([\d]+)
and substitution string is:
\1 \2:\L$2-\1:\3:\4
This way my result is:
1 PRODUCT:product-1:description:120
2 PRODUCT NAME TWO:product name two-2:second description, maybe:80
3 THIRD PROD:third prod-3::18
What I'm missing is the hyphen separator I need in the second field, that is, group \2 with '-' instead of ' '.
Is it possible with a single regex, or should I go for a second pass?
(for now i'm using Sublime text editor)
Thanx.
I don't think doing this in a single pass is reasonable, and it may not even be possible. To replace the spaces with hyphens you need either multiple passes or continuous matching, and both lose the context of the capturing groups you need to rearrange your structure. So after your first replace, I would search for (?m)(?:^[^:\n]*:|\G(?!^))[^: \n]*\K[ ] and replace with -. (The pattern has to end by matching a single space after \K; it is written as [ ] here so that it stays visible.) I'm not sure whether Sublime enables the multiline modifier by default; if it does, you can drop the (?m).
The answer might be different if you were using a programming language that supports a callback function for regex replace operations; you could then do the space-to-hyphen replacement inside that function.
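As a sketch of that callback idea, assuming Python 3 is an option instead of the editor: re.sub accepts a function as the replacement, so the slug can be built inside it in a single pass. The names LINE, to_csv and convert are made up for this example; the pattern is the one from the question, with [ ]? written as " ?" and [\d]+ as \d+.

```python
import re

# Pattern from the question: number, uppercase name, optional description, price.
LINE = re.compile(r"(\d+) ([A-Z ]+?) ?([a-z ,]*):(\d+)")

def to_csv(match):
    number, name, description, price = match.groups()
    # Build the slug inside the callback: lowercase, spaces to hyphens, number appended.
    slug = name.lower().replace(" ", "-") + "-" + number
    return f"{number} {name}:{slug}:{description}:{price}"

def convert(text):
    return LINE.sub(to_csv, text)

products = """1 PRODUCT description:120
2 PRODUCT NAME TWO second description, maybe:80
3 THIRD PROD:18"""
print(convert(products))
```

Because the callback receives the whole match object, the capturing groups stay available while the slug is built, which is exactly the context a second search-and-replace pass loses.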

Name Capture, need a list of numbers

I need to produce a named capture of numbers in a list
Example Source Data
This is a comment on line 1
Here is another Comment Line 2
Log ID 1234,5555,2342
using: (?<id>(\d+)*) I will pick up the results of
1
2
1234
5555
2342
But this picks up 1 and 2 in error. I need it to pick up the items after Log ID Only.
I am looking for a regular expression that will return
1234
5555
2342
In a named group called id
If your language supports variable length lookbehinds, you should be able to use the following:
(?<=Log ID.*)(?<id>\d+)
I also made some modifications to your original regex, because I don't really see the point of the additional capture group inside of the named capture group, or the nested repetition ((\d+)* is equivalent to (\d*), but I think you actually want \d+ so that it requires you to match at least one digit).
If you can't use variable length lookbehinds (most languages), then you will probably need to do this in two steps. First match any lines with 'Log ID' then look for numbers in those lines.
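The two-step approach could look like this in Python 3, whose re module does not allow variable-length lookbehinds. The function name log_ids is hypothetical, and the sketch assumes the numbers always follow "Log ID" on the same line.

```python
import re

def log_ids(text):
    ids = []
    for line in text.splitlines():
        # Step 1: keep only lines that contain "Log ID" and take what follows it.
        m = re.search(r"Log\s+ID\s+(.*)", line)
        if m:
            # Step 2: collect every run of digits from the rest of that line.
            ids.extend(re.findall(r"\d+", m.group(1)))
    return ids

text = """This is a comment on line 1
Here is another Comment Line 2
Log ID 1234,5555,2342"""
print(log_ids(text))
```

The digits in the two comment lines are never examined, because those lines fail the first step.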
Would a negative look behind assertion do the trick?
(?<![Ll]ine )(?<id>\d+)
You can do this also without look(ahead|behind):
"Log\s+ID\s+((?<id>\d+),?)+"
This will give you each of the numbers in a separate group named id
Log\s+ID\s+: match the literal "Log ID" prefix, but don't capture it
(?<id>\d+),?: capture the number and allow an optional comma after it (but don't capture)
+: repeat at least once
However, this introduces a problem: the id group is captured several times under the same name, and how that is handled depends on the language.
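To illustrate that language dependence with one concrete case: Python's re module keeps only the last value captured by a repeated group. The sketch below uses the (?P<id>...) spelling that Python requires for named groups; the variable names are arbitrary.

```python
import re

text = """This is a comment on line 1
Here is another Comment Line 2
Log ID 1234,5555,2342"""

# Same pattern as above, with Python's (?P<name>...) syntax for the named group.
m = re.search(r"Log\s+ID\s+((?P<id>\d+),?)+", text)

# The group is repeated three times, but Python retains only the final capture.
last_id = m.group("id")
print(last_id)
```

Engines that record every capture of a repeated group would expose all three numbers here; in Python the earlier ones are overwritten.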
Alternatively you can use this regex to capture the whole string after Log ID into one group:
"Log\s*ID\s+(?<id>(?:\d+,?)+)"