What does this regex mean? - regex

Can anyone help me interpret this regex?
var regex = /<a\s+([^>]+\s+)?href\s*=\s*('([^']*)'|"([^"]*)|([^\s>]+))[^>]*>/g;

Regex Explanation
NODE EXPLANATION
--------------------------------------------------------------------------------
<a '<a'
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \1 (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
[^>]+ any character except: '>' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)? end of \1 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
--------------------------------------------------------------------------------
href 'href'
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
= '='
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
( group and capture to \5:
--------------------------------------------------------------------------------
[^\s>]+ any character except: whitespace (\n,
\r, \t, \f, and " "), '>' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \5
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
[^>]* any character except: '>' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
> '>'

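To see the breakdown above in action, here is a minimal Python sketch. The original pattern is JavaScript, but the same syntax is accepted by Python's re module. The helper name extract_hrefs and the sample HTML are made up for illustration. Exactly one of groups 3, 4, or 5 takes part in a match, depending on how the href value is quoted.

```python
import re

# Same pattern as in the question; finditer plays the role of the /g flag.
ANCHOR_RE = re.compile(
    r"""<a\s+([^>]+\s+)?href\s*=\s*('([^']*)'|"([^"]*)|([^\s>]+))[^>]*>"""
)


def extract_hrefs(html):
    """Return the href value of every <a ...> tag the pattern matches."""
    hrefs = []
    for match in ANCHOR_RE.finditer(html):
        # Group 3: single-quoted, group 4: double-quoted, group 5: unquoted.
        value = next(g for g in match.group(3, 4, 5) if g is not None)
        hrefs.append(value)
    return hrefs


sample = "<a class=\"x\" href='/one'>1</a> <a href=\"/two\">2</a> <a href=/three>3</a>"
print(extract_hrefs(sample))  # ['/one', '/two', '/three']
```

Note that the double-quoted branch has no closing quote in the pattern as posted; the closing " is simply consumed by the trailing [^>]*.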
Related

C# Regex trying to understand the logic

I'm struggling to understand the logic of regex. Let's say I have this string:
1 SM-TEST S/M-BLEU, 25.00 EA 96.00
private void Window_Loaded(object sender, RoutedEventArgs e)
{
    var test = ReadPdfFile("C:\\Users\\mducharme\\Desktop\\PO # 70882.pdf");
    var result = Regex.Split(test, "\r\n|\r|\n");
    foreach (var lines in result)
    {
        if (Regex.IsMatch(lines, @"^\d\s"))
        {
            string line = lines.ToString();
            string pattern = @"^(\S+\s+\S+).*?,(?=\s*\d+\.\d+\b)";
            string replacement = "$1";
            string result2 = Regex.Replace(line, pattern, replacement);
            System.Diagnostics.Debug.WriteLine(result2);
        }
    }
}
Each line has a different value in the same format as the first one, and so on:
2 SM-BLABLA S-M-YELLOW, 50.00 EA 96.00...
In the end, I want my MessageBox to show this for the first value:
1 SM-TEST 25.00 EA 96.00
but the regex doesn't seem to do its job compared to the same code on the regex101 website.
Thank you,
Use
^(\d+\s+(?:(?!\d+\.\d+\s+[xX]\s+\d+\.\d+)[A-Z0-9-])+).*?(?=\s\d+\.\d+\s)
See regex proof. Replace with $1.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (1 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ")
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
[xX] any character of: 'x', 'X'
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ")
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[A-Z0-9-] any character of: 'A' to 'Z', '0' to
'9', '-'
--------------------------------------------------------------------------------
)+ end of grouping
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
) end of look-ahead
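As a quick check of the pattern above, here is a minimal Python sketch (the question uses C#, but this pattern behaves the same way in Python's re module for this input). The function name strip_description is hypothetical; the sample lines come from the question, with the trailing "..." left off the second one.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(
    r"^(\d+\s+(?:(?!\d+\.\d+\s+[xX]\s+\d+\.\d+)[A-Z0-9-])+).*?(?=\s\d+\.\d+\s)"
)


def strip_description(line):
    """Keep the line number and item code, drop the text before the first price."""
    # r"\1" is the Python spelling of the $1 replacement used in .NET.
    return PATTERN.sub(r"\1", line)


print(strip_description("1 SM-TEST S/M-BLEU, 25.00 EA 96.00"))
# 1 SM-TEST 25.00 EA 96.00
print(strip_description("2 SM-BLABLA S-M-YELLOW, 50.00 EA 96.00"))
# 2 SM-BLABLA 50.00 EA 96.00
```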

Regex that ends at a double closing bracket

I want to extract only the data that belongs to passengerData. I tried the following, but it's not working properly:
$getContent='passenger list:
"passengerData":{"id-1":{...},"id-2":{...},...."id-nth":{...}};
flight List:
"flightData":{"id-1":{...},"id-2":{...},...."id-nth":{...}};';
preg_match_all('/"passengerData([^*]+\}})/', $getContent, $matches);
Use
/"passengerData"\s*:\s*({.*?}});$/m
See regex proof. $matches[1] must have the data you want.
EXPLANATION
--------------------------------------------------------------------------------
"passengerData" '"passengerData"'
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
{ '{'
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
}} '}}'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
; ';'
--------------------------------------------------------------------------------
$ the end of a line (because of the m flag) or
the end of the string
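The same idea can be tried in Python as a rough sketch (the question uses PHP's preg_match_all). The input is copied from the question; the braces are escaped here for clarity, and re.M stands in for the /m flag. The variable names are illustrative only.

```python
import re

# Input copied from the question's $getContent.
content = '''passenger list:
"passengerData":{"id-1":{...},"id-2":{...},...."id-nth":{...}};
flight List:
"flightData":{"id-1":{...},"id-2":{...},...."id-nth":{...}};'''

# re.M makes $ match at the end of each line, like the /m flag in PHP.
PATTERN = re.compile(r'"passengerData"\s*:\s*(\{.*?\}\});$', re.M)

match = PATTERN.search(content)
passenger_data = match.group(1) if match else None
print(passenger_data)
# {"id-1":{...},"id-2":{...},...."id-nth":{...}}
```

Because . does not match a newline by default, the lazy .*? cannot run past the passengerData line into flightData.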

Conditional regex: capture only if the previous group matches, else set the capture to null?

I'm trying to get a conditional regex to capture the domain and user-agent values from the 2 events in https://regex101.com/r/51mp2i/1
but I only get 1 match. How can I update the regex to get 2 matches using a conditional regex? Thanks.
Match 1:
domain: example.org
useragent: "" or not capture
Match 2:
domain: example.org
useragent: Mozilla/5.0 (compatible;example-checks/1.0;+https://www.example.com/; check-id: 9EXc112795a4766a)
Use
"headers":\s+\[{"name":\s+"Host",\s+"value":\s+"(?<domain>[^"]+)(?:.*?"(?i)User-?(?i)Agent",\s+"value":\s+"(?<useragent>[^"]*))?
See proof.
EXPLANATION
--------------------------------------------------------------------------------
"headers": '"headers":'
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\[ '['
--------------------------------------------------------------------------------
{"name": '{"name":'
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
"Host", '"Host",'
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
"value": '"value":'
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^"]+ any character except: '"' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
"User '"User'
--------------------------------------------------------------------------------
-? '-' (optional (matching the most amount
possible))
--------------------------------------------------------------------------------
Agent", 'Agent",'
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
"value": '"value":'
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
)? end of grouping
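Here is a minimal Python sketch of the optional-group approach. It is a port, not the original: Python needs (?P<name>...) for named groups and does not accept an inline (?i) in the middle of a pattern, so a scoped (?i:...) group is used instead, which only approximates the original flag placement. The two events are invented for illustration, since the real data is only available behind the regex101 link.

```python
import re

# Python port of the pattern above: (?P<name>...) instead of (?<name>...),
# and a scoped (?i:...) group instead of the inline (?i) switches.
PATTERN = re.compile(
    r'"headers":\s+\[\{"name":\s+"Host",\s+"value":\s+"(?P<domain>[^"]+)'
    r'(?:.*?"(?i:User-?Agent)",\s+"value":\s+"(?P<useragent>[^"]*))?'
)

# Invented events, one per line (assumption: the real events are single-line).
events = "\n".join([
    '{"headers": [{"name": "Host", "value": "example.org"}, '
    '{"name": "Accept", "value": "*/*"}]}',
    '{"headers": [{"name": "Host", "value": "example.org"}, '
    '{"name": "User-Agent", "value": "Mozilla/5.0 (compatible)"}]}',
])

results = [(m.group("domain"), m.group("useragent")) for m in PATTERN.finditer(events)]
print(results)
# [('example.org', None), ('example.org', 'Mozilla/5.0 (compatible)')]
```

When the optional group does not take part in a match, useragent is reported as None, which corresponds to the "not captured" case in the question.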

Regex to ignore sentences/words that are in double quote

I was trying to build a regex that ignores strings in double quotes, but I was not able to ignore them when there are spaces inside the double quotes. Here is the regex I was able to build so far:
(?<![\S"])([^"\s]+)(?![\S"])
https://regex101.com/r/eTgyWe/1
Use
"[^"\\]*(?:\\.[^"\\]*)*"(*SKIP)(*F)|(?<!\S)([^"\s]+)(?!\S)
See proof.
Explanation
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
[^"\\]* any character except: '"', '\\' (0 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^"\\]* any character except: '"', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
(*SKIP)(*F) omit the match and skip it, proceed to search for next
match from the failed location
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
\S non-whitespace (all but \n, \r, \t, \f,
and " ")
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^"\s]+ any character except: '"', whitespace
(\n, \r, \t, \f, and " ") (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\S non-whitespace (all but \n, \r, \t, \f,
and " ")
--------------------------------------------------------------------------------
) end of look-ahead
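(*SKIP)(*F) is a PCRE feature and is not available in every engine. As a rough sketch of an equivalent in Python's re module, the quoted string can be matched by the first alternative and then discarded in code, keeping only matches where group 1 took part. The function name unquoted_words and the sample text are assumptions for illustration.

```python
import re

# First alternative consumes a whole double-quoted string (with \" escapes);
# second alternative captures a standalone unquoted word in group 1.
PATTERN = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"|(?<!\S)([^"\s]+)(?!\S)')


def unquoted_words(text):
    """Return words that are not inside double quotes."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


print(unquoted_words('foo "bar baz" qux'))  # ['foo', 'qux']
```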

How to select specific content between second and third comma using Regex in sublime text?

Hi everyone, I'm trying to select the content between the second and third comma using regex.
This is my content:
INSERT INTO table (column1, column2, column3) VALUES ('Alejandro', 'dislike', '', 20, 'otro nombre')
INSERT INTO table (column1, column2, column3) VALUES ('Jando', 'like', '', 30, 'wtf')
As you can see, between the second and third comma there are just single quotes '', and I want to select them using regex because I need to modify about 5000 lines in Sublime Text 3. I hope you can help me. I tried ,(.*){2} with no success; I know I'm wrong, I have no experience with regex.
Note: It will not always be single quotes ''.
Use
\bVALUES\s(?:[^\n,]*,){2}\h*\K[^,\n]+
See proof
Explanation
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
VALUES 'VALUES'
--------------------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
(?: group, but do not capture (2 times):
--------------------------------------------------------------------------------
[^\n,]* any character except: '\n' (newline),
',' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
, ','
--------------------------------------------------------------------------------
){2} end of grouping
--------------------------------------------------------------------------------
\h* horizontal whitespace (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
\K match reset operator (discarding text matched so far)
--------------------------------------------------------------------------------
[^,\n]+ any character except: ',', '\n' (newline)
(1 or more times (matching the most amount
possible))
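Outside Sublime Text, \K and \h may not be supported. As a minimal sketch of the same idea in Python's re module, the prefix can be captured in a group and put back during replacement, with [ \t]* standing in for \h*. The function name replace_third_value and the replacement value NULL are hypothetical.

```python
import re

# \K and \h are not available in Python's re, so the prefix is captured
# in group 1 and re-inserted, and \h* becomes [ \t]*.
PATTERN = re.compile(r"(\bVALUES\s(?:[^\n,]*,){2}[ \t]*)[^,\n]+")


def replace_third_value(sql, new_value):
    """Replace the text between the second and third comma after VALUES."""
    return PATTERN.sub(lambda m: m.group(1) + new_value, sql)


line = "INSERT INTO table (column1, column2, column3) VALUES ('Jando', 'like', '', 30, 'wtf')"
print(replace_third_value(line, "NULL"))
# INSERT INTO table (column1, column2, column3) VALUES ('Jando', 'like', NULL, 30, 'wtf')
```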
Another attempt:
\bVALUES\s*\((?:\s*'(?:''|[^'])*'\s*,){2}\s*\K'(?:''|[^'])*'
See another proof
Explanation
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
VALUES 'VALUES'
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\( '('
--------------------------------------------------------------------------------
(?: group, but do not capture (2 times):
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
'' '\'\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
[^'] any character except: '''
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
, ','
--------------------------------------------------------------------------------
){2} end of grouping
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\K match reset operator (discarding text matched so far)
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
'' '\'\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
[^'] any character except: '''
--------------------------------------------------------------------------------
)* end of grouping
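The quote-aware variant matters when a value itself contains a comma or a doubled quote. Here is a minimal Python sketch, again using a capturing group in place of \K. The function name third_quoted_value and the sample row are hypothetical.

```python
import re

# Quote-aware variant: the third value is captured in group 1 instead of using \K.
PATTERN = re.compile(
    r"\bVALUES\s*\((?:\s*'(?:''|[^'])*'\s*,){2}\s*('(?:''|[^'])*')"
)


def third_quoted_value(sql):
    """Return the third quoted value after VALUES (, quotes included, or None."""
    match = PATTERN.search(sql)
    return match.group(1) if match else None


# Hypothetical row: the first value contains a comma, the second a doubled quote.
line = "INSERT INTO table (column1, column2, column3) VALUES ('Doe, John', 'it''s ok', 'third', 30)"
print(third_quoted_value(line))  # 'third'
```

A comma-counting pattern would stop inside 'Doe, John' here, while this one treats each quoted string as a unit.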