Regex Group not starting with - regex

I'm having trouble combining two regexes into one (they are used to parse .ini files).
I've got this one (I suggest trying these examples in Rubular to follow along):
^(?<key>[^=;\r\n]+)=((?<value>\"*.*;*.*\"[^;\r\n]*);?(?<comment>.*)[^\r\n]*)
to match :
This="isnot;acomment"
This="isa";comment
This="isa;special";case
And I've got this one :
^(?<key>[^=;\r\n]+)=(?<value>[^;\r\n]*);?(?<comment>[^\r\n]*)
to match
This=isasimplecase
This=isasimple;comment
I'm trying to merge the two regexes, but I can't manage to say "if my value group starts with \", use the first one; otherwise use the second one".
Right now I've got this:
^(?<key>[^=;\r\n]+)=(((?<value>\"*.*;*.*\"[^;\r\n]*);?(?<comment>.*)[^\r\n]*)|(?<value>[^;\r\n]*);?(?<comment>[^\r\n]*))
But it creates two extra unnamed groups in the simple, unquoted case. I was thinking of adding a condition such as "the first character of the value group in the simple case must not be \"", but I didn't manage to do it.
PS: I suggest using Rubular to better understand my problem. Sorry if I wasn't clear enough.

How about this?
^(?<key>[^=;\r\n]+)=(?<value>"[^"]*"|[^;\n\r]*);?(?<comment>.*)
(?<key>[^=;\r\n]+) Matches the part before the = symbol.
"[^"]*" Matches a string within double quotes, e.g. "foobar". If there is no ", the regex engine moves on to the next alternative, [^;\n\r]*, which matches up to the first ;, newline, or \r character. The matched characters are stored in a named group called value.
;? Optional semicolon.
(?<comment>.*) Remaining characters are stored into the comment group.
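As a quick check, here is a minimal Python sketch of the pattern above. Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the pattern is adapted accordingly; the helper name parse_ini_line is only illustrative and not part of the original answer.

```python
import re

# Same pattern as above, rewritten with Python's (?P<name>...) named-group syntax.
INI_LINE = re.compile(
    r'^(?P<key>[^=;\r\n]+)=(?P<value>"[^"]*"|[^;\n\r]*);?(?P<comment>.*)'
)

def parse_ini_line(line):
    """Return (key, value, comment) for a key=value line, or None if it does not match."""
    m = INI_LINE.match(line)
    if m is None:
        return None
    return m.group("key"), m.group("value"), m.group("comment")

samples = [
    'This="isnot;acomment"',
    'This="isa";comment',
    'This="isa;special";case',
    'This=isasimplecase',
    'This=isasimple;comment',
]
for sample in samples:
    print(parse_ini_line(sample))
```

The quoted alternative is listed first, so a semicolon inside quotes stays part of the value, while an unquoted value stops at the first semicolon.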

Related

Dart regex for capturing groups but ignoring certain similar patterns

I'm trying to capture a group from a string with ~, ~~ and ~~~ symbols. I was successful with extracting single symbols but it doesn't ignore the other occurrences in the string.
This is my code I tried experimenting with:
String f = '~the calculator is on and working~I entered 50 into the calculator' +
    '~~I press add button~~holding equal button ~~~The result should be 50';
List<String> givens = f.split(RegExp(r'~+'));
List<String> whens = f.split(RegExp(r'~~+'));
List<String> thens = f.split(RegExp(r'~~~+'));
for (String ss in givens) {
  print(ss);
}
print('xxxxxxxxxxxx');
for (String ss in whens) {
  print(ss);
}
print('xxxxxxxxxxxx');
for (String ss in thens) {
  print(ss);
}
This gives the following results:
The givens capture group also captured the parts marked with ~~ and ~~~, which is not intended.
The whens capture group also captured the parts marked with a single ~, which made it very confusing.
Lastly, the thens capture group also captured the others, which is also not intended.
I only need to capture the strings starting with the specific pattern but will stop when they see a different one.
Example: givens should only capture 'the calculator is on and working' and 'I entered 50 into the calculator' only.
Any hints or help is greatly appreciated!
I think the problem is that you started off by splitting the string into pieces. It might be easier to search for the elements with a pattern that looks for text preceded by exactly one, two or three ~ chars.
This can be done with regex positive lookbehind patterns.
Typically, if you want to find a string preceded by a single tilde, you have to make sure it does not match when there are other tildes before it.
Find givens
(?<=(?:[^~]|^)~)[^~]+ would be the pattern to find only givens.
Test it here: https://regex101.com/r/9WLbM3/2
Explanation
[^~] means search for any character which is not a ~. This is because [abc] means any char which is in the list, so a, b or c. If you add the ^ char at the beginning of the list then it means "not these chars".
[^~]+ means search for one or more characters that are not ~. This will capture the phrases between the tildes.
A positive lookbehind is written (?<=something present). We want to search for a tilde, so we would put (?<=~) as the positive lookbehind. The problem is that it will also match the phrases with several tildes in front. To avoid that, we can say that the tilde should be preceded either by ^ (meaning the beginning of the string) or by [^~] (meaning not a tilde). To say "either this or that", we use the syntax (this|that|or even that). But parentheses capture their content, and we don't need that. To disable capturing we can add ?: at the beginning of the group, which finally leads to (?:[^~]|^), meaning either a non-tilde char or the beginning of the string, without capturing it.
Find whens and thens
The regular expression is almost the same. It's just that we replace ~ by ~{2} or ~{3}.
Pattern for whens: (?<=(?:[^~]|^)~{2})[^~]+
Pattern for thens: (?<=(?:[^~]|^)~{3})[^~]+
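For readers who want to try the idea outside Dart, here is a hedged Python sketch. Python's re module only accepts fixed-width lookbehinds, so the (?<=(?:[^~]|^)~) form above is not usable there as written. The sketch swaps in a different but equivalent formulation: a negative lookbehind and a negative lookahead around the tildes, plus a capturing group for the text. The helper name find_after_tildes is hypothetical.

```python
import re

f = ('~the calculator is on and working~I entered 50 into the calculator'
     '~~I press add button~~holding equal button ~~~The result should be 50')

def find_after_tildes(text, count):
    """Return the texts that are preceded by exactly `count` tildes."""
    # (?<!~) : no tilde right before the run
    # ~{n}   : exactly n tildes
    # (?!~)  : no further tilde right after the run
    pattern = r'(?<!~)~{%d}(?!~)([^~]+)' % count
    return re.findall(pattern, text)

givens = find_after_tildes(f, 1)
whens = find_after_tildes(f, 2)
thens = find_after_tildes(f, 3)

print(givens)
print(whens)
print(thens)
```

Note that the captured text keeps any surrounding spaces (for example the trailing space after "holding equal button"), so a strip() may be wanted afterwards.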

Regex to find after particular word inside a string

I am using regex to find the values of a few keywords that are followed by a colon (:). The best I have reached so far is shown below.
Sample test case:
test {
test1 {
sadffd(test: "aff", aaa: "aa1") {}
}
}
Now I have to find a keyword inside the () brackets. It works for 'aaa', but when I use 'test' it fails: it matches too much of the string.
My regex so far:
\btest(.*\w") (failed case) expected "aff" returned "aff", aaa: "aa1"
\baaa(.*\w") (pass case) returned "aa1"
please let me know if more information is needed
You may try
:\s*"(.*?)"
And the data you need is in the first capturing group.
Explanation
:\s*"(.*?)"
: colon
\s* followed by optionally any number of spaces
" followed by quote
( ) capturing group, containing...
.*? any number of character, matching as few as possible
" followed by quote
Demo:
https://regex101.com/r/WnvzdG/1
Update:
If you want to match ONLY after specific keywords, followed by colon, you can do something like:
(KEYWORD1|KEYWORD2|KEYWORD3)\s*:\s*"(.*?)"
First capture group will be the keyword matched, second capture group will be the value.
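A minimal Python sketch of the keyword variant, assuming the keywords of interest are test and aaa as in the question. The alternation is built from a list so the keywords are easy to change; the variable names are illustrative.

```python
import re

sample = '''test {
  test1 {
    sadffd(test: "aff", aaa: "aa1") {}
  }
}'''

# Assumed keyword list; re.escape guards against regex metacharacters in keywords.
keywords = ["test", "aaa"]
alternation = "|".join(re.escape(k) for k in keywords)
pattern = re.compile(r'(%s)\s*:\s*"(.*?)"' % alternation)

# findall returns (keyword, value) pairs, one per match.
values = dict(pattern.findall(sample))
print(values)
```

The block names "test {" and "test1 {" are not picked up because no colon follows them, and the lazy .*? stops at the first closing quote instead of running on to the last one.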
One more approach (executed in Python)
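import re

items = ['test{test1 {sadffd(test: "aff", aaa: "aa1") {}}}']
for item in items:
    # Every quoted word in the string.
    print(re.findall(r'"(\w+)"', item))
    # Only quoted words that directly follow a colon and a space.
    print(re.findall(r'(?<=: )"(\w+)"', item))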
Output
['aff', 'aa1']
['aff', 'aa1']
I believe a simple regex would work to get everything inside the double quotes in your case:
("\w+")
Note that your question above says you want to capture "aff" and not just aff so I've included the surrounding quotes within the capturing group.
It's pretty crude but this should be OK for the input you've presented. (It wouldn't handle things like an escaped double quote in the string, for example).

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches has what I can describe as positioners. These are shown as a question mark, followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work from is the linked image. The creators can't even confirm which flavour of regex it is. They believe it's Python, but I'm unsure.
Regex Image
Update
Unfortunately, none of the suggestions below have worked so far. I tested each one in the live system (rather than only in a regex tester, in case it behaved differently), but they still didn't work.
There is no replace feature and, as mentioned before, I'm not sure whether it is Python. I appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
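The two steps can be sketched in Python as follows, assuming a replace operation is available in the target environment (the question suggests it may not be). The function name find_codes is hypothetical.

```python
import re

POSITIONER = re.compile(r'\?[0-9]{2}')  # step 1: a question mark followed by two digits
TARGET = re.compile(r'T[0-9]{7}')       # step 2: T followed by seven digits

def find_codes(text):
    """Strip the positioners, then return every T + 7 digit code."""
    cleaned = POSITIONER.sub('', text)
    return TARGET.findall(cleaned)

print(find_codes("T123?214567\nT?211234567\nT0000001"))
```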
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, match two capture groups before and after the '?21' sequence. You'll need to concatenate these two matches.
First, match the ?21 and replace it with a distinctive character such as #:
\?21
Then you may try this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))\s
The target string is captured in group 1 (\1).
Finally, replace # with ?21 or whatever you like.
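A Python script might look like this:
import re

ss = """T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
rexpre = re.compile(r'\?21')
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
# Replace each ?21 with # and print the matches.
# The last line has no trailing whitespace, so the trailing \s keeps it from matching.
for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
# Same matches, with # turned back into ?21.
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))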
Output is
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If using a replace functionality is an option for you then this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then in the replacement you could use group1group2 \1\2.
Using word boundaries \b (Or use assertions for the start and the end of the line ^ $) this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
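A short Python sketch of this capture-group replacement, again assuming a replace function is available. The name strip_positioner is illustrative only.

```python
import re

# Group 1: T plus any digits before the positioner; group 2: the digits after it.
CODE = re.compile(r'\b(T\d*)(?:\?\d{2})?(\d+)\b')

def strip_positioner(text):
    """Rebuild each code from group 1 and group 2, dropping the optional ?NN part."""
    return CODE.sub(r'\1\2', text)

print(strip_positioner("T123?214567"))
print(strip_positioner("T?211234567"))
print(strip_positioner("T0000001"))
```

Because group 2 requires at least one digit after the positioner, a positioner at the very end of a code (such as T1234434?21) is left in place by this pattern.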
Is the below what you want?
Use RegExReplace with multiline tag (m) and enable replace all occurrences!
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2

regex needed for parsing string

I am working with government measures and am required to parse a string that contains variable information based on delimiters that come from issuing bodies associated with the FDA.
I am trying to retrieve each delimiter and the value that follows it. I have searched for hours for a regex solution that retrieves both, and although there seem to be posts that handle this, the code in those posts hasn't worked for me.
One of the major issues in this task is that the delimiters often have repeated characters. For instance: delimiters are used such as "=", "=,", "/=". In this case I would need to tell the difference between "=" and "=,".
Is there a regex that would handle all of this?
Here is an example of the string :
=/A9999XYZ=>100T0479&,1Blah
Notice the delimiters are:
"=/"
"=>"
"&,1"
Any help would be appreciated.
You can use a regex like this
(=/|=>|&,1)|(\w+)
The idea is that the first group contains the delimiters and the 2nd group the content. I assume the content can be word characters (a to z and digits with underscore). You have then to grab the content of every capturing group.
You need to capture both the delimiter and the value as group 1 and 2 respectively.
If your values are all alphanumeric, use this:
(&,1|\W+)(\w+)
If your values can contain non-alphanumeric characters, it gets complicated:
(=/|=>|=,|=|&,1)((?:.(?!=/|=>|=,|=|&,1))+.)
List the delimiters longest first, e.g. "=," before "=". Otherwise the alternation, which tries its alternatives left to right, will match "=" and the comma will become part of the value.
This uses a negative look ahead to stop matching past the next delimiter.
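A hedged Python sketch of the second pattern. Instead of hand-ordering the alternation, it sorts an assumed delimiter list by length so that longer delimiters are tried first; the list itself is taken from the examples in the question and answer and may not be complete.

```python
import re

# Assumed delimiter set; extend as needed.
delimiters = ["=", "=,", "=/", "=>", "&,1"]

# Longest first, so "=," is tried before "=".
ordered = sorted(delimiters, key=len, reverse=True)
alternation = "|".join(re.escape(d) for d in ordered)

# Group 1: the delimiter. Group 2: characters not followed by a delimiter,
# plus the one final character that sits right before the next delimiter or the end.
pattern = re.compile(rf'({alternation})((?:.(?!{alternation}))+.)')

pairs = pattern.findall('=/A9999XYZ=>100T0479&,1Blah')
print(pairs)
```

As written, the pattern needs a value of at least two characters after each delimiter, which holds for the sample string.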

How to exclude a certain word in regex?

I'm using this expression and it's perfect for what I need:
.*(cq|conquest).*
It returns any word/phrase/sentence/etc. with the letters 'cq' or the word 'conquest' in it. However, from those matches I want to exclude all that contain the term 'conquest power'.
Examples:
some conquest here (should match)
another cq with some conquest here (should match)
too much cq or conquest power is bad (should not match)
How can I do that to the regex above? It has to be only one regex otherwise the program that I'm using (Advanced Combat Tracker) will create two different tabs.
If you want to match any string which contains "conquest" or "cq", but not if the string contains "conquest power", then the regex is
^(?!.*conquest power).*?(?:cq|conquest).*
The above will attempt to match from the start of the string to the end of the line. If you want to match from the start of each line, switch on multiline mode if available; adding (?m) to the start of the regex may do that.
If you want to match across newlines change . to [\s\S], or switch on singleline mode if available.
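To see the lookahead in action, here is a small Python sketch run against the three example lines from the question. Whether Advanced Combat Tracker's regex engine behaves identically is an assumption that should be checked there.

```python
import re

# Negative lookahead rejects any line containing "conquest power";
# the rest requires "cq" or "conquest" somewhere in the line.
PATTERN = re.compile(r'^(?!.*conquest power).*?(?:cq|conquest).*')

lines = [
    "some conquest here",
    "another cq with some conquest here",
    "too much cq or conquest power is bad",
]

matches = [line for line in lines if PATTERN.match(line)]
print(matches)
```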
You have confused people by stating "I want to match 'cq' or 'conquest'" but also "I want the regex to extract that line".
I assume you don't really want to match just "cq" or "conquest", you want to match strings/lines (?) containing "cq" or "conquest".
From your original question I got that you want to match all strings which contain "cq" or "conquest" but do not contain "power". For this case the following regexp works:
^([^p]|p(?!ower))*(cq|conquest)([^p]|p(?!ower))*$