regex limiting wildcards for url folders - regex

I'd like to set up a regular expression that matches certain patterns for a URL:
http://www.domain.com/folder1/folder2/anything/anything/index.html
This matches, and gets the job done:
/^http:\/\/www\.domain\.com\/folder1\/folder2\/.*\/.*\/index\.html([\?#].*)?$/.test(location.href)
I'm unsure how to limit the wildcards to one folder each. So how can I prevent the following from matching:
http://www.domain.com/folder1/folder2/folder3/folder4/folder5/index.html
(note: folder 5+ is what I want to prevent)
Thanks!

Try this regular expression:
/^http:\/\/www\.domain\.com\/(?:\w+\/){1,3}index\.html([\?#].*)?$/
Change the number 3 to the maximum depth of folders possible.
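A rough JavaScript sketch of this quantifier approach, with the maximum depth passed in as a parameter. The helper name buildDepthPattern and the sample URLs are made up for illustration. Keep in mind that \w+ does not match hyphens or dots in folder names.

```javascript
// Hypothetical helper: builds the pattern from the answer above
// for a given maximum folder depth.
function buildDepthPattern(maxDepth) {
  return new RegExp(
    "^http:\\/\\/www\\.domain\\.com\\/(?:\\w+\\/){1," + maxDepth + "}index\\.html([?#].*)?$"
  );
}

const depthPattern = buildDepthPattern(4);

// Four folders before index.html: allowed.
console.log(depthPattern.test("http://www.domain.com/folder1/folder2/a/b/index.html")); // true

// Five folders: rejected, because the group may repeat at most four times.
console.log(depthPattern.test("http://www.domain.com/folder1/folder2/folder3/folder4/folder5/index.html")); // false
```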

. matches any character.
[^/] matches any character except /.
Since the / character marks the beginning and end of a regex literal, you may have to escape it like this: [^\/].
So, replacing .* by [^\/]* will do what you want:
/^http:\/\/www\.domain\.com\/folder1\/folder2\/[^\/]*\/[^\/]*\/index\.html([\?#].*)?$/.test(location.href)

/^http:\/\/www\.domain\.com\/folder1\/folder2\/[^/]*\/[^/]*\/index\.html([\?#].*)?$/
I don't remember whether we should escape the slashes within the []. I don't think so.
EDIT: Acknowledging tom's comment, using + instead of *:
/^http:\/\/www\.domain\.com\/folder1\/folder2\/[^/]+\/[^/]+\/index\.html([\?#].*)?$/
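To check that the negated character class really limits each wildcard to one folder, here is a small test, assuming the URLs from the question. The function name matchesTwoFolders is only illustrative.

```javascript
// Pattern from the answer above: each [^\/]+ can span exactly one folder name.
const twoFolderPattern =
  /^http:\/\/www\.domain\.com\/folder1\/folder2\/[^\/]+\/[^\/]+\/index\.html([?#].*)?$/;

// Hypothetical wrapper so the check can be reused.
function matchesTwoFolders(url) {
  return twoFolderPattern.test(url);
}

// Exactly two folders after folder2: matches.
console.log(matchesTwoFolders("http://www.domain.com/folder1/folder2/a/b/index.html")); // true

// A third extra folder (folder5): no longer matches.
console.log(matchesTwoFolders("http://www.domain.com/folder1/folder2/folder3/folder4/folder5/index.html")); // false

// Query strings and fragments are still allowed by the optional trailing group.
console.log(matchesTwoFolders("http://www.domain.com/folder1/folder2/a/b/index.html?x=1")); // true
```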

/^http:\/\/www\.domain\.com\/\([^/]*\/\)\{2\}/
And you can change 2 to whatever number of directories you want to match.

You may use:
/^http:\/\/www\.domain\.com\/folder1\/folder2\/(\w*\/){2}index\.html([\?#].*)?$/.test(location.href)

Related

Laravel Routes Regular Expression start with # character

I need to create a route that responds to any string starting with the '#' character. Routes like the following examples:
www.mywebsite.com/#john
www.mywebsite.com/#jack
www.mywebsite.com/#something
So I wrote:
Route::get('{something}','SomeController#someMethod')->where('something','/#^/');
But when I test it, I get a 404 Not Found page.
What is the correct regular expression for this?
Route::get('/{tag}', 'SomeController#someMethod')->where('tag', '^#.*');
This will also work:
Route::get('#{something}', 'SomeController#someMethod');
You can write this
Route::pattern('tag', '#[a-zA-Z]');
Route::get('{tag}', 'SomeController#someMethod');
This way you separate the logic of the regex from the route, and it will work as you want.
Note that the #^ pattern requires # to be followed by the start of the string, which is impossible, so the pattern never matches any string. The '^#' pattern asserts the position at the start of the string, and only there does it try to match #.
Also, the usual / regex delimiters should be removed from this pattern as they are treated as part of the pattern here.
So, in your case you may just swap the anchor and the # char:
Route::get('{something}','SomeController#someMethod')->where('something','^#');
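The anchor-order point is not specific to Laravel. As a quick illustration in JavaScript (this is not Laravel code, and the sample strings are made up), the two orderings behave like this:

```javascript
// "#^" asks for a "#" followed by the start of the string, which cannot happen.
const misplacedAnchor = /#^/;
// "^#" asks for a "#" at the very start of the string.
const leadingHash = /^#/;

console.log(misplacedAnchor.test("#john")); // false
console.log(leadingHash.test("#john"));     // true
console.log(leadingHash.test("john#"));     // false
```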

How to capture text between two markers?

For clarity, I have created this:
http://rubular.com/r/ejYgKSufD4
My strings:
http://blablalba.com/foo/bar_soap/foo/dir2
http://blablalba.com/foo/bar_soap/dir
http://blablalba.com/foo/bar_soap
My Regular expression:
\/foo\/(.*)
This returns:
/foo/bar_soap/dir/dir2
/foo/bar_soap/dir
/foo/bar_soap
But I only want
/foo/bar_soap
Any ideas how I can achieve this? As illustrated above, I want everything after foo up until the first forward slash.
Thanks in advance.
Edit: I only want the text after foo until the next forward slash. Some directories may also be named foo, and this would produce incorrect results. Thanks
. will match anything, so you should change it to [^/] (not slash) instead:
\/foo\/([^\/]*)
Some of the other answers use + instead of *. That might be correct depending on what you want to do. Using + forces the regex to match at least one non-slash character, so this URL would not match since there isn't a trailing character after the slash:
http://blablalba.com/foo/
Using * instead would allow that to match since it matches "zero or more" non-slash characters. So, whether you should use + or * depends on what matches you want to allow.
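A small JavaScript comparison of the two quantifiers on the URL above may make the difference concrete. The variable names are only illustrative.

```javascript
const zeroOrMore = /\/foo\/([^\/]*)/; // group may be empty
const oneOrMore = /\/foo\/([^\/]+)/;  // group needs at least one character

const bareUrl = "http://blablalba.com/foo/";

// With *, the URL matches and the captured group is the empty string.
console.log(zeroOrMore.exec(bareUrl)[1] === ""); // true

// With +, there is nothing after the slash to capture, so there is no match.
console.log(oneOrMore.exec(bareUrl)); // null

// Both behave the same once a segment follows /foo/.
console.log(oneOrMore.exec("http://blablalba.com/foo/bar_soap/dir")[1]); // bar_soap
```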
Update
If you want to filter out query strings too, you could also filter against ?, which must come at the front of all query strings. (I think the examples you posted below are actually missing the leading ?):
\/foo\/([^?\/]*)
However, rather than rolling your own solution, it might be better to just use split from the URI module. You could use URI::split to get the path part of the URL, then use String#split to split it up by /, and grab the first one. This would handle all the weird cases for URLs. One that you probably haven't thought of yet is a URL with a specified fragment, e.g.:
http://blablalba.com/foo#bar
You would need to add # to your filtered-character class to handle those as well.
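The same parse-then-split idea can be sketched in JavaScript with the built-in URL class instead of Ruby's URI module. This is a swapped-in equivalent, not the Ruby code described above, and the helper name segmentAfterFoo is hypothetical.

```javascript
// Hypothetical helper: returns the path segment right after the first
// "foo" segment, or null if there is no such segment.
function segmentAfterFoo(url) {
  // pathname excludes the query string and the fragment.
  const segments = new URL(url).pathname.split("/");
  const index = segments.indexOf("foo");
  if (index === -1 || index + 1 >= segments.length) {
    return null;
  }
  return segments[index + 1];
}

console.log(segmentAfterFoo("http://blablalba.com/foo/bar_soap/foo/dir2"));  // bar_soap
console.log(segmentAfterFoo("http://blablalba.com/foo/bar_soap?x=1#frag")); // bar_soap
console.log(segmentAfterFoo("http://blablalba.com/foo#bar"));               // null
```

Because the parser strips the query string and fragment first, no extra characters need to be added to a character class.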
You can try this regular expression
/\/foo\/([^\/]+)/
\/foo\/([^\/]+)
[^\/]+ gives you a series of characters that are not a forward slash.
The parentheses cause the regex engine to store the matched contents in a group ([^\/]+), so you can get bar_soap out of the entire match of /foo/bar_soap
For example, in JavaScript you would get the matched group as follows:
const regexp = /\/foo\/([^\/]+)/;
const match = regexp.exec("/foo/bar_soap/dir");
console.log(match[1]); // prints bar_soap

Regular expression email issue

I want my email regex to match the following:
abc#abc.co.in
Here is what I have written:
^[\w]+#[a-z]+([.a-z]+)
The problem is with the .co.in part.
I want the ".co" part to repeat at least once and no more than twice.
I tried the pattern below, but it does not work:
^[\w]+#[a-z]+([.a-z]{1,2})
You need to take the . out of your character class and escape it:
^\w+#[a-z]+(\.[a-z]+){1,2}$
Also changed your [\w] to \w and added a $ to the end so that the whole string must match, not just the beginning.
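As a quick sanity check, the anchored pattern can be run in JavaScript against a few sample strings. The inputs other than abc#abc.co.in are made up for illustration.

```javascript
// Pattern from the answer above: one or two ".xyz" parts, whole string anchored.
const addressPattern = /^\w+#[a-z]+(\.[a-z]+){1,2}$/;

console.log(addressPattern.test("abc#abc.co.in"));    // true  (two dotted parts)
console.log(addressPattern.test("abc#abc.com"));      // true  (one dotted part)
console.log(addressPattern.test("abc#abc.co.in.uk")); // false (three dotted parts)
console.log(addressPattern.test("abc#abc"));          // false (no dotted part)
```

Without the trailing $, the third string would also pass, since the pattern would only need to match a prefix.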
I think you want this:
^\w+#[a-z]+(\.[a-z]+){1,2}
Note that your [\w] element is unnecessary, \w is sufficient.
Inside a character class your dot is a literal dot, so [.a-z]{1,2} matches only one or two characters from that set. Move the dot outside the class and escape it. Try this:
^[a-z]+[a-z0-9_]*#[a-z]+[a-z0-9\-]*\.[a-z]{1,2}\.[a-z]{1,2}$
You can use it like this. \w+#\w+(\.\w{2}){2}

Little vim regex

I have a bunch of strings that look like this: '../DisplayPhotod6f6.jpg?t=before&tn=1&id=130', and I'd like to take out everything after the question mark, to look like '../DisplayPhotod6f6.jpg'.
s/\(.\.\.\/DisplayPhoto.\{4,}\.jpg\)*'/\1'/g
This regex captures some but not all occurrences. Can you see why?
\.\{4,} is trying to match 4 or more . characters. What it looks like you wanted is "match 4 or more of any character" (.\{4,}) but "match 4 or more non-. characters" ([^.]\{4,}) might be more accurate. You'll also need to change the lone * at the end of the pattern to .* since the * is currently applying to the entire \(\) group.
I think the easiest way to do this is:
s/?.*$/'/g
This says: delete everything after the question mark and replace it with a single quote.
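For comparison, the same substitution can be sketched in JavaScript rather than Vim. The function name stripQuery is hypothetical, and this assumes one quoted string per line, since .*$ consumes everything up to the end of the input.

```javascript
// Hypothetical helper: drops everything from the first "?" to the end
// of the string and puts the closing quote back.
function stripQuery(quoted) {
  return quoted.replace(/\?.*$/, "'");
}

console.log(stripQuery("'../DisplayPhotod6f6.jpg?t=before&tn=1&id=130'"));
// prints '../DisplayPhotod6f6.jpg'
```

Note that in JavaScript the question mark must be escaped, while in the Vim pattern above it is already literal.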
I would use macros, which are sometimes simpler than a regexp (and interactive):
qa
/DisplayPhoto<Enter>
f?dt'
n
q
And then some @a, or 20000@a to go through all lines.
The following regexp: /(\.\./DisplayPhoto.*\.jpg)/gi
tested against following examples:
../DisplayPhotocef3.jpg?t=before&tn=1&id=54
../DisplayPhotod6f6.jpg?t=before&tn=1&id=130
will result:
../DisplayPhotocef3.jpg
../DisplayPhotod6f6.jpg
%s/\('\.\.\/DisplayPhoto\w\{4,}\.jpg\).*'/\1'/g
Some notes:
% will cause the swap to work on all lines.
\w instead of '.', in case there are some malformed file names.
Replace '.' at the start of your matching regex with ' which is exactly what it should be matching.

%:s/\([0-9]*\)_\(*\)/\2 will not rename files

Can someone please edit %:s/\([0-9]*\)_\(*\)/\2 so that I can rename files? For example, if the file name is 5555_word_word.jpg, then I want the file name to be word_word.jpg. I feel like I am so close!
You may want to simplify and have it just delete leading numbers and the underscore:
s/^[0-9]\+_//
Try this:
:%s/\([0-9]*\)_\(.*\)/\2
The . will match any character (part of the second grouping) and the * will greedily match any amount of them. Your original regex was missing that directive. This will also rename files of the form _word_word.txt to word_word.txt. If you want to require digits to match (probably a good idea), use:
:%s/\([0-9]\+\)_\(.*\)/\2
The \+ directive means to match 1 or more instances.
Your version is fine but you forgot a period and you should probably anchor it to the beginning of a line or to a word boundary using either ^ or \<.
:%s/^\([0-9]*\)_\(.*\)/\2/
You can use \v to clean up some of those slashes.
:%s/\v^([0-9]*)_(.*)/\2/
You can use \ze to avoid capture groups.
:%s/^[0-9]*_\ze.*//
But the trailing .* is superfluous, because it matches anything. So use Seth's version, it's the simplest.
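The simplest form of the substitution (strip leading digits and the underscore) can be tried outside Vim as well. Here is a JavaScript sketch; the function name stripNumericPrefix and all sample names except 5555_word_word.jpg are made up.

```javascript
// Hypothetical helper: removes a leading run of digits plus the underscore.
function stripNumericPrefix(name) {
  return name.replace(/^[0-9]+_/, "");
}

console.log(stripNumericPrefix("5555_word_word.jpg")); // word_word.jpg
// Requiring at least one digit leaves names without a numeric prefix alone.
console.log(stripNumericPrefix("_word_word.txt"));     // _word_word.txt
// The ^ anchor keeps digits later in the name untouched.
console.log(stripNumericPrefix("word_42_word.jpg"));   // word_42_word.jpg
```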