How can I match multiple hits between 2 delimiters? - regex

Hi, my fellow RegEx'ers ;)
I'm trying to match multiple Texts between every two quotes
Here's my text:
...random code
someArray[] = ["Come and",
"get me,",
"or fail",
"trying!",
"Yours truly"]
random code...
So far, I managed to get the correct matches with two patterns, executed after each other:
(?s)someArray\[\].*?=.*?\[(.*?)\]
this extracts the text between the two brackets and on the result, I use this one:
"(.*?)"
This is working just fine, but I'd love to get the Texts in one regex.
Any help is highly appreciated!

Consider using \G. With its help, you may match "(.*?)" preceded by either someArray[] = [ or previous match of "(.*?)" (well, strictly speaking previous match of entire regex). Then just grab first capture groups from all matches:
(?:(?s).*someArray\[\].*?=.*?\[|\G[^"\]]+)"(.*?)"
Demo: https://regex101.com/r/eBQWdU/3
How you grab the first capture groups from depends on the language you're using regex in. For example in PHP you may do something like this:
preg_match_all('/(?:(?s).*someArray\[\].*?=.*?\[|\G[^"\]]+)"(.*?)"/', $input, $matches);
$array_items = $matches[1];
Demo: https://ideone.com/mZgU1x

Related

Split complex string into mutliple parts using regex

I've tried a lot to split this string into something i can work with, however my experience isn't enough to reach the goal. Tried first 3 pages on google, which helped but still didn't give me an idea how to properly do this:
I have a string which looks like this:
My Dogs,213,220#Gallery,635,210#Screenshot,219,530#Good Morning,412,408#
The result should be:
MyDogs
213,229
Gallery
635,210
Screenshot
219,530
Good Morning
412,408
Anyone have an idea how to use regex to split the string like shown above?
Given the shared patterns, it seems you're looking for a regex like the following:
[A-Za-z ]+|\d+,\d+
It matches two patterns:
[A-Za-z ]+: any combination of letters and spaces
\d+,\d+: any combination of digits + a comma + any combination of digits
Check the demo here.
If you want a more strict regex, you can include the previous pattern between a lookbehind and a lookahead, so that you're sure that every match is preceeded by either a comma, a # or a start/end of string character.
(?<=^|,|#)([A-Za-z ]+|\d+,\d+)(?=,|#|$)
Check the demo here.

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches has what I can describe as positioners. These are shown as a question mark, followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work off is the linked image. The creators cant even confirm the flavour of Regex it is - they believe its Python but I'm unsure.
Regex Image
Update
Unfortunately none of the codes below have worked so far. I thought to test each code in live (Rather than via regex thinking may work different but unfortunately still didn't work)
There is no replace feature, and as mentioned before I'm not sure if it is Python. Appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, match two capture groups before and after the '?21' sequence. You'll need to concatenate these two matches.
At first, match the ?21 and repace it with a distinctive character, #, etc
\?21
Demo
and you may try this regex to find what you want
(T(?:\d{7}|[\#\d]{8}))\s
Demo,,, in which target string is captured to group 1 (or \1).
Finally, replace # with ?21 or something you like.
Python script may be like this
ss="""T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
rexpre= re.compile(r'\?21')
regx= re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
for m in regx.findall(rexpre.sub('#',ss)):
print(m)
print()
for m in regx.findall(rexpre.sub('#',ss)):
print(re.sub('#',r'?21', m))
Output is
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If using a replace functionality is an option for you then this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then in the replacement you could use group1group2 \1\2.
Using word boundaries \b (Or use assertions for the start and the end of the line ^ $) this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
Example Python
Is the below what you want?
Use RegExReplace with multiline tag (m) and enable replace all occurrences!
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2
Usage Example:

regex to select only the zipcode

,Ray Balwierczak,4/11/2017,,895 Forest Hill Rd,Apalachin,NY,13732,y,,
i want to select only 13732 from the line. I came up with this regex
(\d)(\s*\d+)*(\,y,,)
But its also selecting the ,y,, .if i remove it that part from regex, the regex also gets valid for the date. please help me on this.
Generally, if you want to match something without capturing it, use zero-length lookaround (lookahead or lookbehind). In your case, you can use lookahead:
(\d)(\s*\d+)*(?=\,y,,)
The syntax (?=<stuff>) means "followed by <stuff>, without matching it".
More information on lookarounds can be found in this tutorial.
Regex: \D*(\d{5})\D*
Explanation: match 5 digits surrounded by zero or more non-digits on both sides. Then you can extract group containing the match.
Here's code in python:
import re
string = ",Ray Balwierczak,4/11/2017,,895 Forest Hill Rd,Apalachin,NY,13732,y,,"
search = re.search("\D*(\d{5})\D*", string)
print search.group(1)
Output:
13732

Write regex without negations

In a previous post I've asked for some help on rewriting a regex without negation
Starting regex:
https?:\/\/(?:.(?!https?:\/\/))+$
Ended up with:
https?:[^:]*$
This works fine but i've noticed that in case I will have : in my URL besides the : from http\s it will not select.
Here is a string which is not working:
sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2
You can notice the :query2
How can I modify the second regex listed here so it will select urls which contain :.
Expected output:
http://websites.com/path/subpath/cc:query2
Also I would like to select everything till the first occurance of ?=param
Input:
sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param
Output:
http://websites.com/path/subpath/cc:query2/text/
It is a pity that Go regex does not support lookarounds.
However, you can obtain the last link with a sort of a trick: match all possible links and other characters greedily and capture the last link with a capturing group:
^(?:https?://|.)*(https?://\S+?)(?:\?=|$)
Together with \S*? lazy whitespace matching, this also lets capture the link up to the ?=.
See regex demo and Go demo
var r = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)
fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2", -1)[0][1])
fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param", -1)[0][1])
Results:
"http://websites.com/path/subpath/:query2"
"http://websites.com/path/subpath/cc:query2/text/"
In case there can be spaces in the last link, use just .+?:
^(?:https?://|.)*(https?://.+?)(?:\?=|$)

How do I craft a regular expression to exclude strings with parentheses

I have the following SDDL:
O:BAG:BAD:(A;;CCDCLCSWRP;;;BA)(A;;CCDCSW;;;WD)(A;;CCDCLCSWRP;;;S-1-5-32-562)(A;;CCDCLCSWRP;;;LU)(A;;CCLCRP;;;S-1-5-21-4217728705-3687557540-3107027809-1003)
Unfortunately I keep getting this:
(A;;CCDCLCSWRP;;;BA)(A;;CCDCSW;;;WD)
And what I want is just (A;;CCDCSW;;;WD).
My regex is: (\(A;.+;WD\)) : find "(A;" some characters ending in ";WD)"
I've tried making the match lazy and I've tried excluding the ")(" pair of characters based on a search of the stackoverflow regex tag looking for examples where others have answered similar questions.
I'm really confused why the exclusion of the parens isn't working:
(\(A;.+[^\(\)]*.+;WD\)) : find "(A;" followed by some characters where none of them are ")('' followed by other characters ending in ";WD)"
And this was my guess at using negative look around:
(\(A;.+^((?!\)\().).+;WD\))
which didn't match anything.
I'm also doing this in PowerShell v3.0 with the following code:
$RegExPattern = [regex]"(\($ACE_Type;.*;$ACE_SID\))+?"
if ($SDDL -match $RegExPattern) {
$MatchingACE = $Matches[0]
Where in this instance $ACE_Type = "A" and $ACE_SID = "WD".
You almost had the solution with your second regex pattern. The problem was that you included too many . wildcards. This should be all you need:
A;[^()]+;WD
And of course if you just want to capture the string in between A; and ;WD:
A;([^()]+);WD
Then just replace with \1.
I simplified this a lot and then added lookarounds so that you only matched the intended string (in between A;...;WD). This looks behind for A;, then matches 1+ non-parenthesis characters, while looking ahead for ;WD.
(?<=A;)[^()]+(?=;WD)
Regex101