Regex until repeat character (uri path) - regex

I have examples like these:
/path/to/service/People("Peter")
/i/dont/care/about/how/much/pathes/we/have/here/Customer("John")
/itcouldbejustone/Client("Rick")
I need a regex that leaves just People("Peter"), Customer("John"), Client("Rick") respectively.
I was trying to use:
\/.+?(?=\/)
but there are a lot of "/" slashes, and it stops at the first one. How can I avoid that? Thanks.

Make it greedy:
\/.+(?=\/)
To match the last / as well:
\/.+\/
Replacing that match with an empty string leaves only the final segment, e.g. People("Peter").

You can do this without regex if you are using PHP:
$url = '/path/to/service/People("Peter")';
$parts = explode("/", $url);
$name = end($parts); // end() takes its argument by reference, so pass a variable
$name will then contain People("Peter").

That depends a bit on which regex tool you are using.
Anyway, there are many ways of getting that:
If you will never have a slash at the final part, you can ask for whatever comes after the last slash:
.*\/([^\/]+)
Your desired result will be in group 1.
If the last part will always have the format name("string"), then you can match this format as well, which strikes me as a more explicit pattern:
\w+\(".*"\)
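As a rough illustration of the "whatever comes after the last slash" idea, here is a minimal Python sketch. The function names are made up for this example, and it assumes the final segment never contains a slash, as stated above.

```python
import re

# Same pattern as above: greedily skip to the last slash, capture the rest.
LAST_SEGMENT = re.compile(r".*/([^/]+)")

def last_segment_regex(path):
    """Return the text after the final slash, or None if nothing matches."""
    match = LAST_SEGMENT.match(path)
    return match.group(1) if match else None

def last_segment_split(path):
    """Non-regex alternative: split once from the right on '/'."""
    return path.rsplit("/", 1)[-1]

for example in (
    '/path/to/service/People("Peter")',
    '/i/dont/care/about/how/much/pathes/we/have/here/Customer("John")',
    '/itcouldbejustone/Client("Rick")',
):
    print(last_segment_regex(example), last_segment_split(example))
```

Both helpers should agree on the three inputs from the question; the split version avoids regex entirely, which may be easier to read.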

Related

match first character in a regex?

I have the following regex:
http://([^:]*):?([0-9]*)(/.*)
When I match that against http://brandonhsiao.com/essays/showers.html, the parentheses grab: http://brandonhsiao.com/essays and /showers.html. How can I get it to grab http://brandonhsiao.com and /essays/showers.html?
Put a question mark after the first * to make it non-greedy. Right now your code for matching the hostname is grabbing everything all the way up to the last /.
http://([^:]*?):?([0-9]*)(/.*)
But that's not even what I would recommend. Try this instead:
(http://[^\s/]+)([^\s?#]*)
$1 should have http://brandonhsiao.com and $2 should have /essays/showers.html and any hash or query string is ignored.
Note that this is not designed to validate a URL, just to divide a URL up into the portion before the path, and the path itself. For example, it would happily accept invalid characters as part of the hostname. However, it does work fine for URLs with or without paths.
P.S. I don't know exactly what you are doing with this in Lisp, so I have taken the liberty of only testing it in other PCRE-compatible environments. Usually I test my answers in the exact context where they will be used.
$_ = "http://brandonhsiao.com/essays/showers.html";
m|(http://[^\s/]+)([^\s?#]*)|;
print "1 = '$1' and 2 = '$2'\n";
# [j#5 ~]$ perl test2.pl
# 1 = 'http://brandonhsiao.com' and 2 = '/essays/showers.html'
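For readers not using Perl, the same split can be sketched in Python. This is only an illustration under the same assumptions as the answer (it divides the URL, it does not validate it); `split_url` is a hypothetical helper name, and `urllib.parse.urlsplit` from the standard library is shown as a non-regex alternative.

```python
import re
from urllib.parse import urlsplit

# Group 1: scheme plus host (anything up to the first slash after http://).
# Group 2: the path, stopping before any query string or fragment.
URL_PARTS = re.compile(r"(http://[^\s/]+)([^\s?#]*)")

def split_url(url):
    """Return (origin, path) for an http:// URL, or None if it does not match."""
    match = URL_PARTS.match(url)
    return (match.group(1), match.group(2)) if match else None

url = "http://brandonhsiao.com/essays/showers.html"
print(split_url(url))

# Standard-library alternative: no hand-written regex needed.
parts = urlsplit(url)
print(parts.netloc, parts.path)
```

The regex keeps `http://` in the first group, while `urlsplit` reports the scheme and host separately, so pick whichever shape fits the surrounding code.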
http://([^/:]*):?([0-9]*)(/.*)
The first group matches everything except :, and I have now added / to the excluded characters. That works because [^...] is a negated character class: it matches any character other than the ones listed inside the brackets. Everything else is just the same.
Hope it helped!
http:\/\/([^:]*?)(\/.*)
The *? is a non-greedy match to the first slash (the one just after .com)
See http://rubular.com/r/VmU2ghAX0k for match groups

Regex to find text between second and third slashes

I would like to capture the text that occurs after the second slash and before the third slash in a string. Example:
/ipaddress/databasename/
I need to capture only the database name. The database name might have letters, numbers, and underscores. Thanks.
How you access it depends on your language, but you'll basically just want a capture group for whatever falls between your second and third "/". Assuming your string is always in the same form as your example, this will be:
/.*/(.*)/
If multiple slashes can exist, but a slash can never exist in the database name, you'd want:
/.*/(.*?)/
/.*?/(.*?)/
In the event that your lines always have / at the end of the line:
([^/]*)/$
Alternate split method:
split("/")[2]
The regex would be:
/[^/]*/([^/]*)/
so in Perl, the regex capture statement would be something like:
($database) = $text =~ m!/[^/]*/([^/]*)/!;
Normally the / character is used to delimit regexes but since they're used as part of the match, another character can be used. Alternatively, the / character can be escaped:
($database) = $text =~ /\/[^\/]*\/([^\/]*)\//;
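The same capture can be sketched in Python, where the slash needs no escaping because regexes are ordinary strings. The function name `second_segment` is invented for this example, and the second test string is an illustrative value rather than one from the question.

```python
import re

# Slash, first segment, slash, captured second segment, slash.
SECOND_SEGMENT = re.compile(r"/[^/]*/([^/]*)/")

def second_segment(text):
    """Return the text between the second and third slash, or None."""
    match = SECOND_SEGMENT.match(text)
    return match.group(1) if match else None

print(second_segment("/ipaddress/databasename/"))
print(second_segment("/10.0.0.1/my_db_2/extra/"))
```

Because `[^/]*` can never cross a slash, the pattern does not depend on greedy or lazy matching to stop in the right place.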
You can shorten the pattern even further:
[^/]+/(\w+)
Here \w includes characters like A-Z, a-z, 0-9 and _
I would suggest giving a split function priority, since in my experience it performs better than regex functions wherever it can be used.
You can use the explode function in PHP, or split in other languages, to do such an operation.
Anyway, here is a regex pattern:
/[\/]*[^\/]+[\/]([^\/]+)/
I know you specifically asked for regex, but you don't really need regex for this. You simply need to split the string by a delimiter (in this case a forward slash), then choose the part you need (in this case, the 3rd field, since the first field is empty).
cut example:
cut -d '/' -f 3 <<< "$string"
awk example:
awk -F '/' '{print $3}' <<< "$string"
perl expression, using split function:
(split '/', $string)[2]
etc.
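The split approach looks much the same in Python. A small sketch, assuming the string always starts with a slash so that the first field is empty; `field` is a hypothetical helper that returns None instead of raising when the requested field is missing.

```python
def field(path, index, delimiter="/"):
    """Split on the delimiter and return the zero-based field, or None if absent."""
    parts = path.split(delimiter)
    return parts[index] if 0 <= index < len(parts) else None

string = "/ipaddress/databasename/"
print(string.split("/"))  # leading and trailing slashes produce empty fields
print(field(string, 2))   # zero-based index 2 is the third field
```

Note the off-by-one between tools: cut and awk count fields from 1, so their field 3 corresponds to index 2 here and in the Perl split.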

Pattern matching in Perl

I am doing pattern match for some names below:
ABCD123_HH1
ABCD123_HH1_K
My code to match the names above is:
($name, $kind) = $dirname =~ /ABCD(\d+)\w*_([\w\d]+)/;
The problem I am facing is that $dirname takes both values, ABCD123_HH1 and ABCD123_HH1_K, but $kind is not captured correctly for ABCD123_HH1_K. It does work for ABCD123_HH1.
I appreciate your time. Could you please tell me what can be done to capture the name ending in _K?
You need to add the _K part to the end of your regex and make it optional with ?:
/ABCD(\d+)_([\w\d]+(_K)?)/
I also erased the \w*, which is useless and keeps you from correctly getting the HH1_K.
You should check for zero or more occurrences of _K.
* in Perl's regexp means zero or more times.
+ means one or more times.
Hence, append (_K)* to your regexp. You also need to drop the greedy \w* after the digits, because it swallows _HH1 and leaves only K for the second group.
Finally, your regexp should be this:
/ABCD(\d+)_([\w\d]+(_K)*)/
\w includes letters, numbers as well as underscores.
So you can use something as simple as this:
/ABCD\w+/
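To see why the original pattern loses part of the name, here is a small Python comparison; the regex syntax used is the same in Perl. `FIXED` is a simplified variant written for this sketch (it relies on `\w` already matching underscores), not a pattern taken verbatim from the answers.

```python
import re

# Pattern from the question: the greedy \w* swallows "_HH1" before the underscore.
ORIGINAL = re.compile(r"ABCD(\d+)\w*_([\w\d]+)")
# Simplified fix for illustration: \w+ already covers letters, digits and "_".
FIXED = re.compile(r"ABCD(\d+)_(\w+)")

def parse(pattern, dirname):
    """Return (number, kind) captured by the pattern, or None if there is no match."""
    match = pattern.search(dirname)
    return match.groups() if match else None

for name in ("ABCD123_HH1", "ABCD123_HH1_K"):
    print(name, parse(ORIGINAL, name), parse(FIXED, name))
```

With the original pattern the second name yields only K as the kind, while the simplified pattern keeps HH1_K intact.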

Regex for matching last two parts of a URL

I am trying to figure out the best regex to simply match only the last two strings in a url.
For instance with www.stackoverflow.com I just want to match stackoverflow.com
The issue I have is that some strings can have a large number of periods, for instance
a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com
should also return only yimg.com
The set of URLS I am working with does not have any of the path information so one can assume the last part of the string is always .org or .com or something of that nature.
What regular expression will return stackoverflow.com when run against www.stackoverflow.com and will return yimg.com when run against a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com
under the conditions above?
You don't have to use regex, instead you can use a simple explode function.
So you're looking to split your URL at the periods, so something like
$url = "a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com";
$url_split = explode(".",$url);
And then you need to get the last two elements, so you can echo them out from the array created.
//this will return the second to last element, yimg
echo $url_split[count($url_split)-2];
//this will echo the period
echo ".";
//this will return the last element, com
echo $url_split[count($url_split)-1];
So in the end you'll get yimg.com as the final output.
Hope this helps.
I don't know what you have tried so far, but I can offer the following solution:
/.*?([\w]+\.[\w]+)$/
There are a couple of tricks here:
Use $ to anchor the match at the end of the string. This way you can be sure your regex engine won't take the match from the very beginning.
Use grouping inside (...). It means the following: match a word of at least one character, then a dot (backslashed, because a dot has a special meaning in regex and we want it 'as is'), and then again a word of at least one character.
Use a reluctant (non-greedy) search at the beginning of the pattern, because otherwise it will match everything in a greedy manner. For example, if your text is:
abc.def.gh
the greedy match will give f.gh in your group, and it's not what you want.
I assumed that you can have only word characters in your host (\w matches letters, digits, and underscores, but not hyphens, so in your example you may need something more complicated).
Here is a working Groovy example. You didn't specify the language you use, but the engine should be similar.
def s = "abc.def.gh"
def m = s =~/.*?([\w]+\.[\w]+)$/
println m[0][1] // outputs the first (and the only you have) group in groovy
Hope this helps
If you need a solution using Perl-compatible regular expressions, which work in a number of languages, you can use something like this (the example is in PHP):
$url = "a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com";
preg_match('|[a-zA-Z-0-9]+\.[a-zA-Z]{2,3}$|', $url, $m);
print($m[0]);
This regex fetches the domain name plus the last part of the URL, provided that last part is two or three letters long. For example, with a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com this produces
yimg.com
as an output, and with www.stackoverflow.com (with or without preceding triple w) it gives you
stackoverflow.com
as a result
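A shorter version (note that the match includes the leading dot, e.g. .yimg.com, so it needs at least one label before the last two):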
/(\.[^\.]+){2}$/
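Both approaches discussed in this thread, splitting on periods and anchoring a regex at the end, can be sketched in Python as follows. The helper names are made up for this example, and like the answers above it assumes the input is a bare hostname with no path.

```python
import re

def last_two_labels_split(host):
    """Join the last two dot-separated labels (fewer if the host has fewer)."""
    return ".".join(host.split(".")[-2:])

# Two runs of non-dot characters separated by a dot, anchored at the end.
LAST_TWO = re.compile(r"[^.]+\.[^.]+$")

def last_two_labels_regex(host):
    """Return the last two labels via regex, or None if there is no dot."""
    match = LAST_TWO.search(host)
    return match.group(0) if match else None

for host in ("www.stackoverflow.com",
             "a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com"):
    print(last_two_labels_split(host), last_two_labels_regex(host))
```

Using `[^.]+` instead of `\w+` keeps hyphenated labels intact, and anchoring with `$` means no leading lazy `.*?` is needed.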

Quick Regex question

Can anybody guide me in the right direction...
I have some folder structures that I want to check for a trailing / slash. Some of the folders don't have a trailing /, and if a folder does not have one I need to add it.
If I use this regex it works but it replace the last character of the folder name
Folder/name/test
folder/name/test1/
folder/name/test2/
replace(/.$/ig,"/");
This regex turns Folder/name/test into Folder/name/tes/: it takes out the final t and replaces it with /.
Hope this makes sense...
Try something like this:
replace(/[^/]$/ig, "$0/")
replace(/(.)$/ig,"\1/");
or better
replace(/([^\/])$/ig,"\1/");
If \1 isn't a backreference in your language, then you'll have to figure that out, or tell us the language.
The regex you made basically means this: take any character that is the last one in the string and replace it with /. What you need to do is to group the last character and then insert it again in the replacement, like this:
replace(/([^\/])$/ig,"$1/");
For more information see
http://www.regular-expressions.info/brackets.html
Without knowing the language it's difficult to post a correct answer and you can't use the code provided in a cut-and-paste fashion. Anyway I might go for this regex:
replace(/(.)\/*$/,"\1/");
This will append the trailing / only if it's not there yet.
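Since the language in the question is unknown, here is a hedged Python sketch of the capture-and-reinsert idea, next to the pattern from the question for comparison. The function names are invented for this example, and it assumes each path is a single line with no trailing newline.

```python
import re

def add_trailing_slash(path):
    """Append '/' only when the path does not already end with one."""
    # Capture the final non-slash character and put it back before the slash.
    return re.sub(r"([^/])$", r"\1/", path)

def add_trailing_slash_plain(path):
    """Same result without regex (an empty path is left unchanged)."""
    return path if path == "" or path.endswith("/") else path + "/"

def buggy(path):
    """Pattern from the question: replaces the last character instead of keeping it."""
    return re.sub(r".$", "/", path)

for folder in ("Folder/name/test", "folder/name/test1/", "folder/name/test2/"):
    print(add_trailing_slash(folder), add_trailing_slash_plain(folder), buggy(folder))
```

The non-regex version is arguably the clearest of the three when the host language offers an endswith-style check.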
I'm not sure which language this is for, but this is how you would do it in Perl:
#! /local/bin/perl
use strict;
use warnings;
while (<DATA>)
{
    # Capture the last character if it is not a slash, then put it back before the added slash
    s#([^/])\n#$1/\n#;
    print;
}
__DATA__
/foo/bar/
/baz/jazz
/baz/jazz
This then prints out the following:
/foo/bar/
/baz/jazz/
/baz/jazz/
The key to the regex is ([^/])\n. It matches any character other than / immediately before the newline and captures it, so that the replacement can put it back ahead of the new slash. With your nomenclature, I would assume the syntax would be the following:
replace(/([^\/])\n/ig,"$1/\n");
Or if there is no newline use this:
replace(/([^\/])$/ig,"$1/");
I hope that helps.
I would avoid regular expressions in this case and do something easier like:
$path = rtrim($path, '/').'/';
EDIT:
Woops, assumed it was php...