Pcrepp - Perl Regular Expression syntax to match host name [duplicate] - c++

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
The Hostname Regex
I'm trying to use pcrepp (PCRE) to extract the hostname from a URL.
PCRE regular expression syntax is the same as Perl 5 regular expression syntax.
For example:
url = "http://www.pandora.com/#/volume/73";
// the match will be "http://www.pandora.com/".
I can't find the correct regex syntax for this example.
It needs to work for any URL: amazon.com/sds/ should return amazon.com,
and abebooks.co.uk/isbn="62345627457245"/blabla/ should return abebooks.co.uk.
I don't need to check whether the URL is valid, just to get the hostname.

Something like this:
^(?:[a-z]+://)?[^/]+/?
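A minimal Python sketch of how a pattern like this could be applied (Python's re module accepts this syntax unchanged). The capture group around [^/]+ and the helper name extract_host are additions for illustration, so that only the hostname is returned instead of the scheme and trailing slash as well.

```python
import re

# Same shape as the pattern above, with a capture group added around the host part.
HOST_RE = re.compile(r'^(?:[a-z]+://)?([^/]+)/?')

def extract_host(url):
    """Return the text between the optional scheme and the first slash, or None."""
    m = HOST_RE.match(url)
    return m.group(1) if m else None

if __name__ == "__main__":
    for url in ("http://www.pandora.com/#/volume/73",
                "amazon.com/sds/",
                'abebooks.co.uk/isbn="62345627457245"/blabla/'):
        print(extract_host(url))
```

No validation is attempted, which matches the question: anything up to the first slash is treated as the hostname.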

See Regexp::Common::URI::http which uses sub-patterns defined in Regexp::Common::URI::RFC2396. Examining the source code of those modules should give you a good idea how to put together a decent pattern.

Here is one possibility:
^[a-zA-Z0-9\-\.]+\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)$
And another:
^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$
These and other URL related regular expressions can be found here: Regular Expression Library

string regex1, regex2, finalRegex;
regex1 = "^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??";
regex2 = "([^#]+)?#?(\\w*)";
//concatenation
finalRegex= regex1+regex2;
The hostname ends up in the sixth capture group.
This was answered in another question I asked: Details.
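As a quick check of the "sixth place" claim, the concatenated pattern can be tried in Python, whose re module accepts it once the C++ string escaping is removed. This is only a sketch: the pcrepp call itself is not shown, the user-info separator is assumed to be @, and the name host_from_group6 is made up for the example.

```python
import re

# The two halves from the answer, written as raw strings instead of C++ literals.
# Assumption: the user:password part ends with '@'.
REGEX1 = (r'^((\w+):\/\/\/?)?((\w+):?(\w+)?@)?([^\/\?:]+):?(\d+)?'
          r'(\/?[^\?#;\|]+)?([;\|])?([^\?#]+)?\??')
REGEX2 = r'([^#]+)?#?(\w*)'
FINAL_RE = re.compile(REGEX1 + REGEX2)

def host_from_group6(url):
    """Return capture group 6 of the combined pattern, or None if nothing matches."""
    m = FINAL_RE.match(url)
    return m.group(6) if m else None

if __name__ == "__main__":
    print(host_from_group6("http://www.pandora.com/#/volume/73"))
    print(host_from_group6("amazon.com/sds/"))
```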

Related

I want to extract multiple HTTP cookie values for URL rewriting with a single regular expression

For input string1:
application_session=30110020;User_Context=Ghkkaskj228992nkn999
Possible regex for string1:
application_session=(.*);User_Context=(.*)
Where
{C:1} = 30110020
{C:2} = Ghkkaskj228992nkn999
For input string2:
User_Context=Ghkkaskj228992nkn999;application_session=30110020
Possible regex for string2:
User_Context=(.*);application_session=(.*)
Where
{C:1} = Ghkkaskj228992nkn999
{C:2} = 30110020
And the solution fitting for both string1 and string2,
Possible regex:
User_Context=(.*);application_session=(.*)|application_session=(.*);User_Context=(.*)
Also, {C:1} and {C:2} are condition back-references used while rewriting the URL.
For reference:
https://learn.microsoft.com/en-us/iis/extensions/url-rewrite-module/testing-rewrite-rule-patterns
The possible regex above consists of two expressions joined by |.
But we need a single expression.
How can we do this?
Hey Friend I'm not a regex pro but you could try
User_Context=(\w+);application_session=(\w+)|application_session=(\w+);User_Context=(\w+)
Lemme know if it works :)
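One way to get a single expression whose group numbers do not depend on cookie order is to put each capture inside its own lookahead. The sketch below tries that idea in Python; it is not taken from the thread, and whether the URL Rewrite module's regex engine accepts captures inside lookaheads is an assumption to verify with the pattern tester linked in the question. [^;]* is used instead of .* so that a value stops at the next semicolon.

```python
import re

# Group 1 is always application_session, group 2 is always User_Context,
# whichever cookie comes first in the header.
COOKIE_RE = re.compile(
    r'^(?=.*\bapplication_session=([^;]*))'
    r'(?=.*\bUser_Context=([^;]*))'
)

def extract_cookies(header):
    """Return (application_session, User_Context), or None if either is missing."""
    m = COOKIE_RE.match(header)
    return (m.group(1), m.group(2)) if m else None

if __name__ == "__main__":
    print(extract_cookies("application_session=30110020;User_Context=Ghkkaskj228992nkn999"))
    print(extract_cookies("User_Context=Ghkkaskj228992nkn999;application_session=30110020"))
```

With the alternation-based patterns, the values land in groups 1 and 2 for one order and in groups 3 and 4 for the other, which is why a fixed numbering may be easier to reference from a rewrite rule.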

How can I write an @... email pattern using regex [duplicate]

This question already has answers here:
How can I validate an email address using a regular expression?
(79 answers)
Closed 2 years ago.
I want to validate an email field using a regex in such a way that the email has to have @moore in it,
like a@moore.af, b@moore.sg, and so on. How can I write its pattern? I am using TypeScript and an Angular reactive form.
Your help is much appreciated.
You can try ([\w-\.]+@moore\.[\w+]{1,5}) to match an email address; I left room for 1-5 characters after the dot that follows moore.
In JavaScript flavour: const regex = /([\w-\.]+@moore\.[\w+]{1,5})/gm; then you can use regex.test(str) to validate the email field.
Edit:
As @Toto pointed out, this regex matches .....@moore.++++. A better regex would be:
([a-zA-Z0-9\.-]+@moore\.[a-zA-Z0-9\.]{1,5})
so that only letters, digits, and dots are accepted after @moore.
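A small Python sketch of the revised pattern, for checking it against a few inputs. Using fullmatch (so the whole string must match, not just a substring) is an addition here, and the helper name is_moore_email is hypothetical.

```python
import re

# The revised pattern from the edit above, written without the redundant escapes.
MOORE_RE = re.compile(r'[a-zA-Z0-9.-]+@moore\.[a-zA-Z0-9.]{1,5}')

def is_moore_email(value):
    """True if the whole string looks like <local>@moore.<1-5 chars>."""
    return MOORE_RE.fullmatch(value) is not None

if __name__ == "__main__":
    for candidate in ("a@moore.af", "b@moore.sg", ".....@moore.++++", "a@other.af"):
        print(candidate, is_moore_email(candidate))
```

In a form validator the same anchoring can be had by wrapping the pattern in ^ and $.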

RegEx remove part of string and and replace another part

I have a challenge getting the desired result with RegEx (using C#) and I hope that the community can help.
I have a URL in the following format:
https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1
I want to make two modifications, specifically:
1) Remove everything after 'value' e.g. '&ida=0&idb=1'
2) Replace 'category' with e.g. 'newcategory'
So the result is:
https://somedomain.com/subfolder/newcategory/?abc=text:value
I can do the removal in 1), e.g. with ^[^&]+, but I have been unable to figure out how to replace the 'category' substring.
Any help or guidance would be much appreciated.
Thank you in advance.
Use the following:
Find: /(category/.+?value)&.+
Replace: /new$1 or /new\1 depending on your regex flavor
Update according to comment.
If the new name is completely_different_name, use the following:
Find: /category(/.+?value)&.+
Replace: /completely_different_name$1
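The two Find/Replace pairs above can be tried out with Python's re.sub, which uses \1 for the back-reference. This is only a sketch in Python (the question itself is about C#), and the function names are made up for the example.

```python
import re

URL = "https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"

def prefix_category(url):
    """First pair: keep 'category...value', put 'new' in front, drop the rest."""
    return re.sub(r'/(category/.+?value)&.+', r'/new\1', url)

def rename_category(url, new_name):
    """Second pair: replace 'category' with an arbitrary new name."""
    # A function is used as the replacement so new_name is inserted literally.
    return re.sub(r'/category(/.+?value)&.+',
                  lambda m: '/' + new_name + m.group(1), url)

if __name__ == "__main__":
    print(prefix_category(URL))
    print(rename_category(URL, "completely_different_name"))
```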
You haven't specified a language here; I mainly work in Python, so the solution is in Python.
import re  # value holds the original URL string
url = re.sub('category','newcategory',re.search('^https.*value', value).group(0))
Explanation
re.sub(a, b, c) replaces every match of pattern a with b in string c.
re.search looks for a pattern in a string and returns a match object whose group(0) is the matched text. In the code above, group 0 holds everything from "https" up to and including "value".
Using Python and only built-in string methods (there is no need for regular expressions here):
url = r"https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"
new_url = (url.split('value')[0] + "value").replace("category", 'newcategory')
print(new_url)
Outputs:
https://somedomain.com/subfolder/newcategory/?abc=text:value

Matching both greedy, nongreedy and all others in between [duplicate]

This question already has answers here:
Parsing valid parent directories with regex
(3 answers)
Closed 8 years ago.
Given a string like "/foo/bar/baz/quux" (think of it like a path to a file on a unixy system), how could I (if at all possible) formulate a regular expression that gives me all possible paths that can be said to contain file quux?
In other words, upon running a regexp against the given string ("/foo/bar/baz/quux"), I would like to get as results:
"/foo/"
"/foo/bar/"
"/foo/bar/baz/"
I've tried the following:
'/\/.+\//g' - this is greedy by default, matches "/foo/bar/baz/"
'/\/.+?\//g' - lazy version, matches "/foo/" and also "/baz/"
P.S.: I'm using Perl-compatible regexps in PHP, in the function preg_match(), for that matter.
Felipe is not looking for /foo/bar/baz, /bar/baz, /baz but for /foo, /foo/bar, /foo/bar/baz.
One solution, building on the regex idea in the comments but giving the right strings:
Reverse the string to be matched: xuuq/zab/rab/oof/. For instance, in PHP use strrev($string).
Match with (?=((?<=/)(?:\w+/)+))
This gives you
zab/rab/oof/
rab/oof/
oof/
Then reverse each match with strrev($string).
This gives you
/foo/bar/baz
/foo/bar
/foo
If you had .NET rather than PCRE, you could match right to left and probably come up with the same result.
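The reverse, match, reverse steps above can be sketched in Python, whose re module also allows the fixed-width lookbehind used here. The function name parent_dirs and the final sort by length are additions for the example; note that, as in the answer, the results come back without the trailing slash asked for in the question.

```python
import re

def parent_dirs(path):
    """Return the parent directories of the last path component, shortest first."""
    reversed_path = path[::-1]                      # "/foo/bar/baz/quux" -> "xuuq/zab/rab/oof/"
    # Zero-width lookahead, so overlapping candidates are all captured.
    hits = re.findall(r'(?=((?<=/)(?:\w+/)+))', reversed_path)
    # Reverse each hit back and order from shortest to longest.
    return sorted((hit[::-1] for hit in hits), key=len)

if __name__ == "__main__":
    print(parent_dirs("/foo/bar/baz/quux"))
```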
This solution will not give the exact output you are expecting, but it still gives a pretty useful result that you can post-process to get what you need:
$s = '/foo/bar/baz/quux';
// The whole pattern is a lookahead, so the captured paths are in group 1 ($m[1]).
if ( preg_match_all('~(?=((?:/[^/]+)+(?=/[^/]+$)))~', $s, $m) )
    print_r($m[1]);
Working Demo
OUTPUT:
Array
(
[0] => /foo/bar/baz
[1] => /bar/baz
[2] => /baz
)
A completely different answer, without reversing the string:
(?<=((?:\w+(?:/|$))+(?=\w)))
This matches
foo/
foo/bar/
foo/bar/baz/
but you have to use C#, which supports variable-length lookbehind, unlike PCRE.

Need a simple reg ex for url checking [duplicate]

This question already has answers here:
What is the best regular expression to check if a string is a valid URL?
(62 answers)
Closed 9 years ago.
I have been looking for this for about 2 hours but cannot find what I need.
What I need is very simple:
allow: google.com, http://google.com, https://google.com
disallow spaces "goo gle.com"
with a valid domain: I mean it should have a dot "." + any domain (.com, .net etc.)
and allow anything after that: "googl.com/dsfsdf/sdfs/blablahblah/" without spaces
thanks
Edit:
Thanks all, I had to write it myself.
if (!/^((ftp|http|https):\/\/)?([a-z0-9_\.-]+)\.{1}([a-z0-9_\/\?\=\-\%-]+)$/.test(uri)
|| /([\._\/\?\=\-\%-])\1/.test(uri)) {
}
ps: I am noob in regexs.
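For comparison, the requirements listed in the question (optional http/https scheme, no spaces, at least one dot followed by a domain, anything without spaces afterwards) can be sketched in Python as follows. This pattern is not the asker's regex and is deliberately loose; the name looks_like_url is made up for the example.

```python
import re

# optional scheme, dot-separated labels (at least two), optional path without whitespace
URL_RE = re.compile(r'(?:https?://)?[^\s/.]+(?:\.[^\s/.]+)+(?:/\S*)?')

def looks_like_url(text):
    """True if the whole string fits the loose shape described in the question."""
    return URL_RE.fullmatch(text) is not None

if __name__ == "__main__":
    for candidate in ("google.com", "http://google.com", "https://google.com",
                      "goo gle.com", "googl.com/dsfsdf/sdfs/blablahblah/", "google"):
        print(candidate, looks_like_url(candidate))
```

It makes no attempt to check that the domain actually exists or that the characters are legal in a hostname.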
www.google.com
http://www.google.com
mailto:somebody@google.com
somebody@google.com
www.url-with-querystring.com/?url=has-querystring
The regex below matches all of the above cases:
((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)
REGEX Explanation can be found here
Working Example
Something that's working for me on a production product (haven't received any complaints yet):
((www\.|(http|https|ftp|news|file)+\:\/\/)?[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])