Regex to find arguments in text

There's undoubtedly a better way to do this, but my requirements force me to do it this way.
I'm creating a search form for my web application and want to use a tag-based search, so I'm using regex to make it work.
So I have a search string: 'c:john customer:15478'
The regex needs to find the tag (c:) and the argument (john), drop the tag, and give me the argument -- and it needs to do so for all of the instances of a tag and their arguments. The regex I have comes close, but it doesn't work correctly. It doesn't grab every argument, or drop the tags in a consistent way. So the question: what's wrong with my regex that needs to be fixed in order to achieve the correct results?
Currently it finds the first tag and grabs its argument plus everything after it. I need it to stop the match once it has found an argument. For example, in the case above it matches john customer:15478.
Maybe a better question is: how do I make VB's regex return everything between the first colon and the beginning of the next tag (which is followed by another colon), or otherwise stop matching at the beginning of the next tag?
Regex:
(?<=({0}({1})??:)+?)(\S+\s*\S*)(?=\s+?\b\w+:.+?)??
The {0} and {1} are placeholders filled in by a String.Format call with the tag name, say Customer (but it could be anything): {0} is the first character and {1} is the rest of the characters. This regex matches everything after the tag, including another tag and its argument if one exists. So for the string
"c:5401 4664 c:john smith p:joam d:domain.com p:1548 c:215-548-5487 d:""192.168.0.1"""
The matches for the c, d and p tags, respectively, would be
'5401 4664, john smith, 215-548-5487 d:"192.168.0.1"'
'domain.com p:1548, "192.168.0.1"'
'joam d:domain.com, 1548 c:215-548-5487'
given the tags I have defined. The regex fails to stop its matching at the start of the next tag.

If I understood you correctly, this should solve the problem in general:
/\w+:([^:]+)(?:\s|$)/g
https://regex101.com/r/vN6fH1/1
and with defined tag it would look like this:
/{0}({1})?:([^:]+)(?:\s|$)/g
but this still relies on the colon, not on the tag name
(so it won't match at all if the tag name you pass does not appear in the string)
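A minimal Java sketch of the answer's idea, assuming the input is one search string like the one in the question. The names TagSearch, parseAll and argumentsFor are hypothetical. Java has no /g flag, so looping over Matcher.find() does that job (in VB.NET, Regex.Matches plays the same role). The leading \b in argumentsFor is an addition of mine, not part of the answer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagSearch {
    // General form from the answer, with the tag captured as well:
    // group 1 = tag, group 2 = argument. [^:]+ cannot cross a colon, and
    // (?:\s|$) makes the engine back off until the argument ends right
    // before whitespace or the end of the input.
    private static final Pattern ANY_TAG = Pattern.compile("(\\w+):([^:]+)(?:\\s|$)");

    // Collects every argument under its tag, keeping first-seen tag order.
    static Map<String, List<String>> parseAll(String query) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher m = ANY_TAG.matcher(query);
        while (m.find()) {
            result.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
        }
        return result;
    }

    // Arguments for one tag written in full ("customer:") or as its first
    // letter ("c:"), mirroring {0}({1})?: from the answer but with a
    // non-capturing group. Assumption: \b is added so "abc:" is not read as
    // "c:". The tag must be at least one character long.
    static List<String> argumentsFor(String tag, String query) {
        String head = Pattern.quote(tag.substring(0, 1));
        String rest = Pattern.quote(tag.substring(1));
        Pattern p = Pattern.compile("\\b" + head + "(?:" + rest + ")?:([^:]+)(?:\\s|$)");
        List<String> args = new ArrayList<>();
        Matcher m = p.matcher(query);
        while (m.find()) {
            args.add(m.group(1));
        }
        return args;
    }

    public static void main(String[] args) {
        String query = "c:5401 4664 c:john smith p:joam d:domain.com p:1548 c:215-548-5487 d:\"192.168.0.1\"";
        System.out.println(parseAll(query));
        System.out.println(argumentsFor("customer", "c:john customer:15478"));
    }
}
```

Running main should print {c=[5401 4664, john smith, 215-548-5487], p=[joam, 1548], d=[domain.com, "192.168.0.1"]} and then [john, 15478]. As the answer says, an argument that itself contains a colon will still cut the match short.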

Related

AEM - How to restrict a template from showing in a certain path

I wonder if someone has achieved what I'll post here. In order to allow a template to be created under a certain path, there is a flag allowedPaths that receives a regex.
So, if I want my template "test" to appear only under /content/www/xx/xx/test-templates and child elements, I can do this:
/content/www/.*/.*/test-templates(/.*)?
But what if I want to make the opposite? I want the template "test" to appear in every /content/www/xx/xx/ node and beyond, EXCEPT /content/www/xx/xx/test-templates and children?
I have tried several ways but no luck so far. Do you have some hint regarding this?
Thanks!
You can always restrict a more generic pattern with a lookahead. Here is an expression that should work for you:
^(?!/content/www/[^/]*/[^/]*/test-templates(?:/|$))/content/www/[^/]*/[^/]*(/.*)?
^ - matches the start of the string
(?!/content/www/[^/]*/[^/]*/test-templates(?:/|$)) - makes sure the next substring is not /content/www/<some_node>/<some_node>/test-templates followed by either / or the end of the string ($)
/content/www/[^/]*/[^/]*(/.*)? - matches /content/www/<some_node>/<some_node>, optionally followed by / and zero or more characters other than a newline
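A small Java check of the lookahead idea, assuming the whole path has to match the expression (hence matches() rather than find()); how AEM itself applies allowedPaths is not something this sketch shows. The class name and sample paths are made up for illustration.

```java
import java.util.regex.Pattern;

final class AllowedPathsCheck {
    // The answer's expression with the trailing (/.*) group optional, as its
    // explanation describes, so /content/www/xx/xx itself is accepted too.
    static final Pattern ALLOWED = Pattern.compile(
            "^(?!/content/www/[^/]*/[^/]*/test-templates(?:/|$))/content/www/[^/]*/[^/]*(/.*)?");

    // Assumption: full-string match semantics.
    static boolean isAllowed(String path) {
        return ALLOWED.matcher(path).matches();
    }

    public static void main(String[] args) {
        String[] paths = {
            "/content/www/us/en",
            "/content/www/us/en/products",
            "/content/www/us/en/test-templates",
            "/content/www/us/en/test-templates/child",
            "/content/www/us/en/test-templates-old",
        };
        for (String p : paths) {
            System.out.println(p + " -> " + isAllowed(p));
        }
    }
}
```

The two test-templates paths are rejected and the other three are accepted. The (?:/|$) part is what lets a sibling such as test-templates-old through.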

Google Analytics - Content grouping - Regex fix

This is our URL structure:
http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2
http://www.disabledgo.com/access-guide/kingston-university/coombehurst-court-2
http://www.disabledgo.com/access-guide/kings-college-london/franklin-wilkins-building-2
http://www.disabledgo.com/access-guide/redbridge-college/brook-centre-learning-resource-centre
I am trying to create a list of groups based on the client names
/access-guide/[this bit]/...
So I can have a performance list of all our clients.
This is my regex:
/access-guide/(.*universit(y|ies)|.*colleg(e|es))/
I want it to group anything that has university/ies or college/es in it, at any point within that client name section of the URL.
At the moment, my current regex will only return groups that are X-University:
Durham-University
Plymouth-University
Cardiff-University
etc.
What does the regex need to be to have the list I'm looking for?
Do I need to have something at the end to stop it matching things after the client name? E.g. ([^/]+$)?
Thanks for your help in advance!
Depending upon your needs, you may want to do:
/access-guide/([^/]*(?:university|universities|college|colleges)[^/]*)/
This will match names even if "university" or "college" is not at the end of the client name, for example "college-of-the-ozarks". Note the non-capturing inner parentheses: they should probably be used whichever solution you go with, because you don't want the capture group to hold just the word "university" or "college".
Additionally, I don't know what your URLs may contain, but if they may include compound words you want to exclude, using \b may be advisable. For instance, if you don't want to match "miskatonic-postcollege", you may want to do something like this:
/access-guide/([^/]*\b(?:university|universities|college|colleges)\b[^/]*)/
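A quick Java sketch of both patterns from this answer, returning group 1 since that is what Analytics uses for the content group. Java's regex flavor is not identical to the one Analytics uses, but these constructs behave the same in common engines (an assumption worth checking in the Analytics UI). The miskatonic-postcollege URL and the class name are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ClientGroup {
    // [^/]* on both sides keeps the match inside the client segment; the
    // inner group is non-capturing, so group 1 is the whole client name.
    static final Pattern ANY_POSITION = Pattern.compile(
            "/access-guide/([^/]*(?:university|universities|college|colleges)[^/]*)/");
    // Same, but the keyword must not be glued to another word.
    static final Pattern WHOLE_WORD = Pattern.compile(
            "/access-guide/([^/]*\\b(?:university|universities|college|colleges)\\b[^/]*)/");

    // Returns group 1 of the first match, or null when nothing matches.
    static String clientName(Pattern p, String url) {
        Matcher m = p.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2",
            "http://www.disabledgo.com/access-guide/kings-college-london/franklin-wilkins-building-2",
            "http://www.disabledgo.com/access-guide/miskatonic-postcollege/some-venue",
        };
        for (String u : urls) {
            System.out.println(clientName(ANY_POSITION, u) + " | " + clientName(WHOLE_WORD, u));
        }
    }
}
```

Both patterns return the-university-of-manchester and kings-college-london in full; only the \b version drops miskatonic-postcollege (it returns null there).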
If the client name section of the URL is after /access-guide/ and before the next /:
http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2
(here the client name section is the-university-of-manchester)
you need to use a negated character class so that university is only matched before the regex reaches that rightmost / boundary.
As per the Reference:
You can extract pages by Page URL, Page Title, or Screen Name. Identify each one with a regex capture group (Analytics uses the first capture group for each expression)
Thus, you can use
/access-guide/([^/]*(universit(y|ies)|colleges?))
The regex matches
/access-guide/ - leftmost boundary, matches /access-guide/ literally
[^/]* - zero or more characters other than / (so we stay within the client name section)
(universit(y|ies)|colleges?) - university, universities, college or colleges, literally. Add more alternatives if needed.
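A quick Java check of this pattern against the four URLs from the question (class name hypothetical). One thing it makes visible: nothing follows the keyword inside group 1, so the captured name stops at the keyword, and the-university-of-manchester comes out as the-university. If you need the whole segment, adding [^/]* after the keyword inside the group (as the other answer does) should cover it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeywordEndCheck {
    // This answer's expression; group 1 is what Analytics would use.
    static final Pattern KEYWORD_END = Pattern.compile(
            "/access-guide/([^/]*(universit(y|ies)|colleges?))");

    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String url) {
        Matcher m = KEYWORD_END.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.disabledgo.com/access-guide/the-university-of-manchester/176-waterloo-place-2",
            "http://www.disabledgo.com/access-guide/kingston-university/coombehurst-court-2",
            "http://www.disabledgo.com/access-guide/kings-college-london/franklin-wilkins-building-2",
            "http://www.disabledgo.com/access-guide/redbridge-college/brook-centre-learning-resource-centre",
        };
        for (String u : urls) {
            System.out.println(firstGroup(u));
        }
    }
}
```

This prints the-university, kingston-university, kings-college and redbridge-college.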

Regex for BBCode with optional parameters

I'm currently stuck on a regex. I'm trying to fetch the contents of a BBCode tag that has optional params and possibly different notations:
[tag]https://example.com/1[/tag]
[tag='https://example.com/2'][/tag]
[tag="http://another-example.com/whatever"][/tag]
[tag=ftp://an-ftp-host][/tag]
[tag='https://example.com/3',left][/tag]
[tag="https://example.com/4",right][/tag]
[tag=https://example.com/5][/tag]
[tag=https://example.com/i-need-this-one,right]http://example.com/i-dont-need-this-one[/tag]
The 2nd param can only be left or right, and if it is given, I need the URL from the first param. Otherwise, I need the one between the tags.
A URL param can be wrapped in ' or ", or not wrapped at all.
My current regular expression is this:
~\[tag(?|=[\'"]?+([^]"\']++)[\'"]?+]([^[]++)|](([^[]++)))\[/tag]~i
However, this one also includes the 2nd param in the match list, along with a lot of other things that I don't want to match.
Any suggestions?
I've made some changes to do what you want. I've included your version here for easy comparison:
Yours: http://regex101.com/r/dE4aE4/1
\[tag(?:=[\'"]?(.*)[\'"]?)?]([^]]*)?\[/tag]
Mine: http://regex101.com/r/dE4aE4/3
\[tag(?:=[\'"]?([^,]*?)(?:,[^]'"]+)?[\'"]?)?]([^\[]+)?\[/tag]
Note that I've changed one part so the URL is captured without the comma (,): from (.*) to ([^,]*?)(?:,[^]'"]+)?
I've also fixed the content part: from ([^]]*)? to ([^\[]+)?
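A Java sketch of the same approach, with one adjustment that is mine rather than the answer's: the optional closing quote is matched before the optional ,left/,right part, and the URL class also excludes quotes and ]. With the order used above, a quoted URL followed by a second param (for example 'https://example.com/3',left) keeps its closing quote in the captured URL. The selection rule follows the question; falling back to the param when the tag body is empty is an assumption, since the question does not say what to do there. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BbTagUrls {
    // group 1: URL from the first param, without surrounding quotes
    // group 2: optional "left"/"right" second param
    // group 3: text between [tag...] and [/tag] (may be empty)
    private static final Pattern TAG = Pattern.compile(
            "\\[tag(?:=['\"]?([^,'\"\\]]*)['\"]?(?:,(left|right))?)?\\]([^\\[]*)\\[/tag\\]",
            Pattern.CASE_INSENSITIVE);

    // With a left/right param take the param URL, otherwise take the body.
    // Assumption: an empty body falls back to the param URL.
    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            String param = m.group(1);
            String position = m.group(2);
            String body = m.group(3);
            boolean useParam = position != null || (body.isEmpty() && param != null);
            urls.add(useParam ? param : body);
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "[tag]https://example.com/1[/tag]",
                "[tag='https://example.com/2'][/tag]",
                "[tag=\"http://another-example.com/whatever\"][/tag]",
                "[tag=ftp://an-ftp-host][/tag]",
                "[tag='https://example.com/3',left][/tag]",
                "[tag=\"https://example.com/4\",right][/tag]",
                "[tag=https://example.com/5][/tag]",
                "[tag=https://example.com/i-need-this-one,right]http://example.com/i-dont-need-this-one[/tag]");
        extractUrls(text).forEach(System.out::println);
    }
}
```

For the last sample this yields https://example.com/i-need-this-one, and for the quoted samples the URL comes back without quotes.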

RegExp replace after

I have some link templates, and I need to replace substrings inside those links.
Link templates:
"/all_news"
"/all_news/"
"/all_news/page1"
"/all_news/page1/"
All of these templates mean the same thing - first page of news page without filtering.
So I need to:
1st template - insert "/pageX"
2nd template - insert "pageX"
3rd and 4th templates - replace page number
Is it possible with only one regexp?
If yes, then please help me.
If not, then I have a 2nd question:
is it possible to replace everything after "/all_news" with "/pageX"?
I mean the following logic:
the string starts
OK, I see the substring "/all_news"
I replace everything after "/all_news", even if nothing follows it (if the string ends with "/all_news")
I return "/all_news/pageX".
This'll do it.
'/all_news/page1'.replace(/(.*\/all_news).*/,'$1' + '/pageX');
One regex for all four templates.
Java supports lookbehind, which removes the need for $1. The solution looks like:
String url = "/all_news/page1";
String pattern = "(?<=/all_news).*"; // everything after "/all_news", possibly empty
System.out.println(url.replaceAll(pattern, "/pageX")); // prints /all_news/pageX
Cheers.
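Both answers can be checked side by side in Java against all four templates. This is a sketch: the page number parameter and the names NewsLinks, withCapture and withLookbehind are hypothetical, and replaceFirst is used instead of replaceAll, which gives the same result for these inputs.

```java
final class NewsLinks {
    // Capture-group approach (a port of the JavaScript answer): $1 keeps
    // everything up to and including "/all_news", the rest is replaced.
    static String withCapture(String link, int page) {
        return link.replaceFirst("(.*/all_news).*", "$1/page" + page);
    }

    // Lookbehind approach (the Java answer): only the text after "/all_news",
    // possibly empty, is matched, so no back-reference is needed.
    static String withLookbehind(String link, int page) {
        return link.replaceFirst("(?<=/all_news).*", "/page" + page);
    }

    public static void main(String[] args) {
        String[] templates = {"/all_news", "/all_news/", "/all_news/page1", "/all_news/page1/"};
        for (String t : templates) {
            System.out.println(t + " -> " + withCapture(t, 3) + " / " + withLookbehind(t, 3));
        }
    }
}
```

Every template becomes /all_news/page3 with either method, which is the "one regexp for all four" behaviour the question asks for.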

RegEx check if string contains certain value

I need some help writing a regex validation that checks for a specific value.
Here is what I have, but it doesn't work:
Regex exists = new Regex(@"MyWebPage.aspx");
Match m = exists.Match(pageUrl);
if(m)
{
//perform some action
}
So basically I want to know when the variable pageUrl contains the value MyWebPage.aspx.
Also, if possible, I'd like to combine this check to cover several cases, for instance MyWebPage.aspx, MyWebPage2.aspx and MyWebPage3.aspx.
Thanks!
Try this:
@"MyWebPage\d*\.aspx$"
This will allow any page called MyWebPage#.aspx, where # is zero or more digits, provided the URL ends there.
if (Regex.IsMatch(url, "MyWebPage[^/]*?\\.aspx")) ....
This will match any form of MyWebPageXXX.aspx (where XXX is zero or more characters other than /). It will not match MyWebPage/test.aspx, however.
That RegEx should work in the case that MyWebPage.aspx is in your pageUrl, albeit by accident. You really need to replace the dot (.) with \. to escape it.
Regex exists = new Regex(@"MyWebPage\.aspx");
If you want to optionally match a single digit after the MyWebPage bit, then look for the (optional) presence of \d:
Regex exists = new Regex(@"MyWebPage\d?\.aspx");
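The effect of escaping the dot can be seen with a short Java sketch (Java rather than C#; find() searches anywhere in the string, like .NET's Regex.IsMatch). The class name and sample URLs are hypothetical.

```java
import java.util.regex.Pattern;

final class PageUrlCheck {
    // Unescaped dot: "." matches any character, so "MyWebPageXaspx" passes too.
    static final Pattern UNESCAPED = Pattern.compile("MyWebPage.aspx");
    // Escaped dot plus an optional single digit, as in the answer above.
    static final Pattern ESCAPED = Pattern.compile("MyWebPage\\d?\\.aspx");

    // True if the pattern occurs anywhere in the URL.
    static boolean contains(Pattern p, String url) {
        return p.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {"/MyWebPage.aspx", "/MyWebPage2.aspx?id=1", "/MyWebPageXaspx", "/MyWebPage12.aspx"};
        for (String u : urls) {
            System.out.println(u + " unescaped=" + contains(UNESCAPED, u) + " escaped=" + contains(ESCAPED, u));
        }
    }
}
```

The escaped pattern accepts MyWebPage.aspx and MyWebPage2.aspx (query string included) and rejects both MyWebPageXaspx and the two-digit MyWebPage12.aspx.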
I won't post a regex, as others have good ones going, but one thing that may be an issue is character case. Regexes are case-sensitive by default. The Regex class does have a static overload of the Match function (as well as of Matches and IsMatch) that takes a RegexOptions parameter, allowing you to specify that you want to ignore case.
For example, I don't know how you are getting your pageUrl variable but depending on how the user typed the URL in their browser, you may get different casings, which could cause your Regex to not find a match.
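In .NET that would be something like Regex.IsMatch(pageUrl, pattern, RegexOptions.IgnoreCase). A Java sketch of the same idea uses Pattern.CASE_INSENSITIVE (the inline (?i) flag works too); the class name and URL are hypothetical.

```java
import java.util.regex.Pattern;

final class CaseInsensitiveCheck {
    static final Pattern CASE_SENSITIVE = Pattern.compile("MyWebPage\\d?\\.aspx");
    // Java counterpart of RegexOptions.IgnoreCase.
    static final Pattern IGNORE_CASE = Pattern.compile("MyWebPage\\d?\\.aspx", Pattern.CASE_INSENSITIVE);

    // True if the pattern occurs anywhere in the URL.
    static boolean contains(Pattern p, String url) {
        return p.matcher(url).find();
    }

    public static void main(String[] args) {
        String url = "http://example.com/mywebpage3.ASPX";
        System.out.println(contains(CASE_SENSITIVE, url) + " " + contains(IGNORE_CASE, url)); // false true
    }
}
```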