Regex to match list of numbers inside matching group

I am trying to build a regular expression to match a list of numbers inside an HTML attribute.
The attribute is data-p="and the content is here".
Inside the content there is a list of numbers formatted like this:
["16834091899728893939","8871244709062187521","3716487480481705970","1266937738203421917"]
I would like the regex to return the list of numbers: [16834091899728893939, 8871244709062187521, 3716487480481705970, 1266937738203421917]
Is it possible to match a list inside an already matched group?
It is easy to match the content of every data-p attribute on the whole page with "data-p="(.*?)"", but I cannot get the list of numbers out of it.
Is it possible to do it all in one regex?
Thanks!
Full HTML below:
data-p="%.#.null,null,null,null,null,null,[8,null,["oyo rooms gurgaon",[null,null,null,"INR",[[2023,1,27],[2023,1,28],1,null,0],null,[],[],null,null,null,null,null,null,null,null,null,null,null,null,[[[353,null,true],[]]],null,null,null,[],null,null,null,[],null,null,null,null,[],null,null,null,null,null,null,null,null,null,null,[]],[null,["16834091899728893939","8871244709062187521","3716487480481705970","1266937738203421917"],null,null,null,null,1,1,3,null,null,null,null,null,[],null,null,null,null,"Gurugram, Haryana",null,null,null,null,null,null,null,null,null,null,null,null,"oyo rooms gurgaon",null,[false]],0,null,null,0,null,false,null,null,false,null,null,null,null,null,[[[1],[3,[null,true]],[5,[null,true]],[4,[null,true]],[6],[7],[8]],false]],null,null,null,null,null,2]]"

It depends on which regex engine you are using. For example, using the PCRE engine you can construct the following regex:
(?:data-p="[^"]*?\[|\G,)"(\d+)"
Here is the demo.
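This expression matches a "(\d+)" string under two conditions: it must be preceded either by the data-p="[^"]*?\[ pattern or by the \G, pattern. The first pattern is straightforward: it finds the opening [ of the list inside a data-p attribute. The second one uses \G, which matches only at the position where the previous match ended (or at the start of the subject), so each further number is matched only if it directly follows the previous one after a comma. This prevents "(\d+)" from matching after every comma on the page. In the demo above it prevents matching the string 456 in the other-tag.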
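JavaScript has no \G, but its sticky (y) flag plays the same role: a sticky regex may only match exactly at its lastIndex. The following is a minimal JavaScript sketch of the same two-branch idea, not the answer's PCRE demo; the function name extractIds and the sample markup are hypothetical. Like the PCRE pattern, it assumes no other double quote appears in the attribute before the opening [ of the list (in the full HTML as pasted above, "oyo rooms gurgaon" comes first, so neither version would find that list without adjusting the prefix).

```javascript
// Sketch: emulate PCRE's (?:data-p="[^"]*?\[|\G,)"(\d+)" in JavaScript.
// The sticky (y) flag anchors a match at lastIndex, which is what \G does.
function extractIds(html) {
  // First branch: the opening [ of a data-p list and its first number.
  const first = /data-p="[^"]*?\["(\d+)"/g;
  // Second branch: ,"<digits>" only directly after the previous match.
  const next = /,"(\d+)"/y;
  const lists = [];
  let m;
  while ((m = first.exec(html)) !== null) {
    const ids = [m[1]];
    next.lastIndex = first.lastIndex; // continue exactly where we stopped
    let n;
    while ((n = next.exec(html)) !== null) {
      ids.push(n[1]); // next.lastIndex advances past each match
    }
    lists.push(ids);
  }
  return lists;
}

// Hypothetical sample: the ,"456" inside other-tag is not picked up,
// because it does not continue a data-p list.
const sample =
  '<div data-p="["111","222","333"]"></div><div other-tag=","456""></div>';
console.log(extractIds(sample)); // [ [ '111', '222', '333' ] ]
```

A plain global /,"(\d+)"/g would also return 456 here; anchoring each continuation at the end of the previous match is what keeps it out.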

Related

Regex: ignore characters that follow

I'd like to know how I can ignore characters that follow a particular pattern in a regex.
I tried positive lookaheads, but they don't work: they leave those characters available to other matches, while I want them to be just... discarded.
For example, a part of my regex is: (?<DoubleQ>\"\".*?\"\")|(?<SingleQ>\".*?\")
in order to match some "key-parts" of this string:
This is a ""sample text"" just for "testing purposes": not to be used anywhere else.
I want to capture the entire ""sample text"", but then "extract" only sample text, and the same with testing purposes. That is, I want the group to match ""sample text"", but the full match to be sample text. I partially achieved that with \K:
(?<DoubleQ>\"\"\K.*?\"\")|(?<SingleQ>\"\K.*?\")
This drops the leading "" (or ") from the full match but still takes it into account when matching the group. How can I also drop the trailing "" (or ")?
Note: positive lookahead does not work: it does not ignore characters from the following matches, it just does not include them in the current match.
Thanks a lot.
I hope I got your question right. So you want to match the whole string including the quotes, but extract (or replace it with) only the text without the quotes, right?
You typically can use the regex replace functionality to extract just a part of the match.
This is the regex expression:
""?(.*?)""?
And this is the replacement expression:
$1
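JavaScript has no \K either, so the same idea there is to read group 1 instead of the whole match. A minimal sketch using the answer's pattern on the sample sentence from the question, showing both extraction and the $1 replacement:

```javascript
// The answer's pattern: one or two quotes, a lazy capture, one or two quotes.
const quoted = /""?(.*?)""?/g;
const text =
  'This is a ""sample text"" just for "testing purposes": not to be used anywhere else.';

// Extraction: the whole match includes the quotes, group 1 does not.
const extracted = Array.from(text.matchAll(quoted), (m) => m[1]);
console.log(extracted); // [ 'sample text', 'testing purposes' ]

// Replacement with $1 strips the quotes in place.
console.log(text.replace(quoted, "$1"));
// This is a sample text just for testing purposes: not to be used anywhere else.
```

The quotes are still consumed by the match, so they are not available to later matches, which is the behavior the question asks for; only the reported text is narrowed to the group.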

All phrases except in a tag - regexpr

I want to create a regular expression that matches all occurrences of a phrase except those inside an A tag.
I want to use it to replace each occurrence with a link.
Can I do it with one regular expression?
Here is my failed trial: https://regex101.com/r/3I2qvL/1
To exclude matches surrounded by the tag, match the tagged part first and then throw it away with \K. Because that part must be optional, alternate it with an empty string so the pattern can also match phrases that are not preceded by a tag:
(?:<a[^>]+>.*?<\/a>\K|)(^|\s|,|;|:|\.)(Test)($|\s|,|;|\.|\b)
Demo: https://regex101.com/r/pUPBQQ/1
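Since JavaScript has no \K, a common substitute (swapped in here, not the answer's technique) is to match the whole <a>...</a> element as one alternative, capture the phrase in the other, and hand the element back unchanged from a replace callback. In this sketch the helper names escapeRegExp and linkPhrase and the href argument are hypothetical, and \b word boundaries stand in for the answer's explicit (^|\s|,|;|:|\.) delimiter list.

```javascript
// Escape regex metacharacters so the phrase is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap every standalone occurrence of `phrase` in a link,
// except occurrences already inside an <a>...</a> element.
function linkPhrase(html, phrase, href) {
  // Alternative 1 consumes a whole <a> element (no capture group).
  // Alternative 2 captures the phrase as a whole word.
  const re = new RegExp(
    `<a\\b[^>]*>[\\s\\S]*?</a>|\\b(${escapeRegExp(phrase)})\\b`,
    "g"
  );
  return html.replace(re, (whole, hit) =>
    // hit is undefined when alternative 1 matched: return the element unchanged.
    hit === undefined ? whole : `<a href="${href}">${hit}</a>`
  );
}

console.log(linkPhrase('Test one, <a href="/x">Test two</a> and Test.', "Test", "/t"));
// <a href="/t">Test</a> one, <a href="/x">Test two</a> and <a href="/t">Test</a>.
```

Because the element is consumed as a whole, any occurrence inside it is skipped, which is the same effect the answer gets by matching the tag and then discarding it with \K.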

Regex to match consecutive tags ignoring text between them

I have custom part-of-speech tags. I want to check whether they are consecutive.
My string is:
<pronouns></pronouns><pronouns></pronouns><verbs></verbs><determiners></determiners><noun></noun>
E.g., if I use the regex (<pronouns><\/pronouns>)\1{1}, it matches two consecutive pronoun tags:
**<pronouns></pronouns><pronouns></pronouns>**<verbs></verbs><determiners></determiners><noun></noun>
and if I use the regex (<pronouns><\/pronouns><verb><\/verb>)\1{0}
it gives me one occurrence of the pronoun and verb tags, and if I modify it to (<pronouns><\/pronouns><verb><\/verb>)\1{1} it gives me two consecutive occurrences of the pronoun and verb tags.
The problem is that if there is any text inside the tags, the match fails even though the tags are consecutive:
<pronouns>Hello</pronouns><pronouns>Hi</pronouns><pronouns>Hi</pronouns><verbs>Ok</verbs><determiners>the</determiners><noun>people</noun>
The previous regex fails to match the string above.
How can I match consecutive tags under the same conditions when they contain text, and also capture the text inside the consecutive tags?
As previously stated, this isn't crystal clear... But if I understand it correctly, you want to match two consecutive pronoun tags, no matter what their text content is.
If that's correct, you could try
(?:<(pronouns)>.*?<\/\1>){2}
It matches the first pronoun tag, capturing the name. Then it matches any text up to the closing tag, matches that, and then matches the same pattern a second time.
Check it out here at regex101.
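To also get the text out, a JavaScript sketch needs two passes, because a repeated group keeps only the capture from its last repetition. The helper name consecutiveRuns is hypothetical, and two choices differ from the answer's pattern: [^<]* replaces .*? so an element's text cannot contain another tag (with .*?, the lazy part can stretch across a <verbs> element when that is the only way to find a second pronoun tag), and {2,} replaces {2} so a run of three is returned whole.

```javascript
// Hypothetical helper: find runs of two or more adjacent <tag>...</tag>
// elements and return the text inside each element of every run.
// Assumes `tag` is a plain word (no regex metacharacters).
function consecutiveRuns(markup, tag) {
  // [^<]* keeps each element's text free of other tags, so the repeated
  // elements must be directly adjacent.
  const run = new RegExp(`(?:<${tag}>[^<]*</${tag}>){2,}`, "g");
  // A repeated group keeps only its last capture, so a second pass over
  // each run collects the text of every element.
  const inner = new RegExp(`<${tag}>([^<]*)</${tag}>`, "g");
  return Array.from(markup.matchAll(run), (m) =>
    Array.from(m[0].matchAll(inner), (n) => n[1])
  );
}

const s =
  "<pronouns>Hello</pronouns><pronouns>Hi</pronouns><pronouns>Hi</pronouns>" +
  "<verbs>Ok</verbs><determiners>the</determiners><noun>people</noun>";
console.log(consecutiveRuns(s, "pronouns")); // [ [ 'Hello', 'Hi', 'Hi' ] ]
```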

Modify regex to filter input containing certain strings

I have this regex: /href=('|")(\w+|\/dashboard)/ that matches every HTML anchor that has an href that starts with /dashboard, or something/without/a/slash/at/the/beginning.
So this regex matches:
<a href='dashboard/security-settings'></a>
<a href='something/security-settings'></a>
But not:
The issue here is that it also matches:
How can I exclude hrefs starting with http or www from the regex? I tried playing with the ^ operator, with no luck:
href=('|")(([^http][^www]|\w+)|\/dashboard)
^ within a character class works on individual letters, not strings. So [^http] actually means "Match one character that's neither an h nor a t nor a p".
You need a negative lookahead assertion instead:
href=(['"])(?!http|www)(\w+|/dashboard)
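A small JavaScript check makes the difference between a negated character class and a negative lookahead concrete. It runs the question's attempt and the answer's pattern side by side; the sample anchors are hypothetical, modeled on the ones in the question.

```javascript
// The question's attempt: [^http] is ONE character that is not h, t or p,
// and [^www] is one character that is not w. The \w+ branch then still
// accepts "http" or "www" on its own.
const attempt = /href=('|")(([^http][^www]|\w+)|\/dashboard)/;
// The answer's fix: a negative lookahead rejects the position if the
// value starts with the string "http" or "www".
const fixed = /href=(['"])(?!http|www)(\w+|\/dashboard)/;

const samples = [
  "<a href='dashboard/security-settings'></a>",
  "<a href='something/security-settings'></a>",
  "<a href='/dashboard/security-settings'></a>",
  "<a href='http://example.com'></a>",
  "<a href='www.example.com'></a>",
];
for (const s of samples) {
  console.log(attempt.test(s), fixed.test(s), s);
}
```

The attempt prints true for all five lines; the lookahead version is false only for the http and www links.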
The simplest solution is:
/^href=['"](\w+|\/dashboard)/
The ^ anchor at the start of the pattern makes the regex match only at the beginning of the input (or of each line, with the m flag), so it only matches strings that begin with href.
As others have mentioned, you can use a negative lookahead to explicitly filter out strings that begin with http or www. However, if the href started with ftp:// (or any prefix other than "http" or "www"), it would still be matched. It seems better to use a whitelist in this case rather than a blacklist containing everything that you don't want to match.
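A quick JavaScript check of that caveat, using hypothetical ftp: and mailto: links: the lookahead only knows about the two prefixes it lists, so any other scheme falls through to \w+.

```javascript
// Negative lookahead from the earlier answer: a blacklist of two prefixes.
const blacklist = /href=(['"])(?!http|www)(\w+|\/dashboard)/;

// Any other scheme is not on the list, so \w+ happily matches it.
console.log(blacklist.test("<a href='ftp://example.com/file.txt'></a>")); // true
console.log(blacklist.test("<a href='mailto:someone@example.com'></a>")); // true
console.log(blacklist.test("<a href='http://example.com'></a>")); // false
```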

Non greedy regex match, JavaScript and ASP

I need to do a non-greedy match and hope someone can help me. I have the following, and I am using JavaScript and ASP:
match(/\href=".*?\/pdf\/.*?\.pdf/)
The match above starts at the first href in the string. I need it to match only the last href, the one that points into the /pdf/ folder.
Any ideas?
You need to use capturing parentheses for subexpression matches:
match(/\href=".*?(\/pdf\/.*?\.pdf)/)[1];
Without the g flag, match() returns an array with the entire match at index 0, followed by the captured subexpressions in the order of their opening parentheses. In this case, index 1 contains the part matching \/pdf\/.*?\.pdf.
Try to make your regex more specific than just .*? if it's matching too broadly. For instance:
match(/\href="([^"]+?\/pdf\/[^\.]+?\.pdf)"/)[1];
[^"]+? lazily matches a run of characters that contains no double quote. This keeps the match inside a single pair of quotes, so it won't be too broad in a string such as the following:
TestSome PDF
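Running the three patterns on a hypothetical two-link string shows why the lazy quantifier alone is not enough: the engine reports the leftmost match, so the question's pattern starts at the first href no matter how lazy .*? is. The patterns are written with a plain href=, since in JavaScript \h without the u flag is just an escaped h (and with the u flag it is a syntax error).

```javascript
// Hypothetical sample markup: only the second link points into /pdf/.
const html =
  '<a href="/docs/test.html">Test</a> <a href="/files/pdf/report.pdf">Some PDF</a>';

// The question's pattern: the leftmost match starts at the FIRST href,
// and .*? stretches across the first link to reach /pdf/.
const broad = html.match(/href=".*?\/pdf\/.*?\.pdf/)[0];

// First answer: capture just the /pdf/ part of that same broad match.
const pdfTail = html.match(/href=".*?(\/pdf\/.*?\.pdf)/)[1];

// Second answer: [^"]+? cannot cross a quote, so the match must stay
// inside a single href="..." value.
const fullHref = html.match(/href="([^"]+?\/pdf\/[^\.]+?\.pdf)"/)[1];

console.log(broad);    // href="/docs/test.html">Test</a> <a href="/files/pdf/report.pdf
console.log(pdfTail);  // /pdf/report.pdf
console.log(fullHref); // /files/pdf/report.pdf
```

The capture group only trims what is reported; the character class is what actually moves the match start to the right href.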