Regex help - separating controller and action - regex

I need to separate the following url:
/myapp/public/controller/action
as $1 will be controller and $2 will be the action.
Here is the regex I'm using:
^([a-zA-Z0-9\/\-_]+)\.?([a-zA-Z\-_]+)?$
For some reason it is not separating the two parts; the whole path ends up in $1:
$1 = /myapp/public/controller/action
$2 = '' (empty)
PS: action is optional, as I may have /myapp/public/controller. In that case $2 shall be empty.
[EDIT]
The URL string may have the following formats:
/myapp/public/controller
/myapp/public/controller/action
/myapp/public/controller/action/param1
/myapp/public/controller/action/param1/param2/paramN
$1 shall contain always the controller with full path
$2 will receive the remaining (action, action/param1, action/param1/param2/paramN)
The controller will be always myapp/public/controller, where myapp/public is static and controller is the controller name that needs to go to $1 (the 3rd string).
At the extreme we can call /myapp/public and will be sending empty '' controller that will default to index on the application.
PS: Sometimes things that seem simple turn out to be exactly the opposite. Thanks for the questions.

^(.+)\/([^\/]+)$
See it in action
With the new requirements:
^((?:\/[^\/]+){2,3})((?:\/[^\/]+)*)$
See it in action
Explanation:
(?:\/[^\/]+) - matches a forward slash followed by characters, which are not forward slashes (like /myapp, /public, /controller, /action and so on)
{2,3} - the controller consists of the first two or three such sequences. Two in the case when you are using the default index of the application.
* - the remaining such sequences are part of the action
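To see the difference between the two patterns, here is a minimal Python sketch (Python's re module is only a stand-in for whatever engine the application really uses, and the helper name split_route is made up for illustration). In the question's pattern the / sits inside the first character class, so group 1 greedily takes the whole path and nothing is left for group 2.

```python
import re

# Pattern from the question: '/' is allowed inside group 1, so group 1
# greedily consumes the entire path and group 2 never participates.
ORIGINAL = re.compile(r'^([a-zA-Z0-9\/\-_]+)\.?([a-zA-Z\-_]+)?$')

# Pattern from the answer: two or three leading segments, then the rest.
SPLIT = re.compile(r'^((?:/[^/]+){2,3})((?:/[^/]+)*)$')


def split_route(path):
    """Return (controller, rest) for a path, or None if it does not match."""
    m = SPLIT.match(path)
    if m is None:
        return None
    return m.group(1), m.group(2)


if __name__ == "__main__":
    for p in ["/myapp/public",
              "/myapp/public/controller",
              "/myapp/public/controller/action",
              "/myapp/public/controller/action/param1/param2/paramN"]:
        print(p, "->", split_route(p))
```

Note that the second group keeps its leading slash (for example /action/param1), so the application may want to strip it before use.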

Related

Splitting URL in three parts in htaccess - regex

I'm trying to split any URL that would show up on my website into three parts:
Language (optional)
Hierarchical structure of the page (parents)
Current page
Right now I operate with 1 and 3 but I need to develop a way to allow for the pages to have the same names if they have different parents and therefore full URL is unique.
Here are the types of URL I may have:
(nothing)
en
en/test
en/parent/test
test
parent/test
ggparent/gparent/parent/test
I thought about extending my current directive:
RewriteRule ^(?:([a-z]{2})(?=\/))?.*(?:\/([\w\-\,\+]+))$ /index.php?lang=$1&page=$2 [L,NC]
to the following:
(?:([a-z]{2})(?=\/))?(.*)\/([^\/]*)?$
Which then I could translate to index.php?lang=$1&tree=$2&page=$3 but the difficulty I have is that the second capturing group captures the slash from the beginning.
I believe I can't (based on my search so far) dynamically have all the strings between slashes to be returned and make the last one to always be first, without repeating the same regex. I thought I would capture anything between language and current page and process the tree in PHP.
However my current regex has some problems and I can't figure them out:
If language is on its own, it doesn't get captured
The second group captures the slash between language and the tree
Link to Regex101: https://regex101.com/r/ecHBQT/1
This likely does it: Split the URL by slash into lang, tree, and page at the proper place, with all three parts possibly empty:
RewriteRule ^([a-z]{2}\b)?\/?(?:\/?(.+)\/)?(.*)$ /index.php?lang=$1&tree=$2&page=$3 [L,NC]
Testcase in JavaScript using this regex:
const regex = /^([a-z]{2}\b)?\/?(?:\/?(.+)\/)?(.*)$/;
[
'',
'en',
'en/test',
'en/parent/test',
'test',
'parent/test',
'ggparent/gparent/parent/test'
].forEach(str => {
let rewritten = str.replace(regex, '/index.php?lang=$1&tree=$2&page=$3');
console.log('"' + str + '" ==>', rewritten);
})
Output:
"" ==> /index.php?lang=&tree=&page=
"en" ==> /index.php?lang=en&tree=&page=
"en/test" ==> /index.php?lang=en&tree=&page=test
"en/parent/test" ==> /index.php?lang=en&tree=parent&page=test
"test" ==> /index.php?lang=&tree=&page=test
"parent/test" ==> /index.php?lang=&tree=parent&page=test
"ggparent/gparent/parent/test" ==> /index.php?lang=&tree=ggparent/gparent/parent&page=test
Notes:
This assumes that no page or parent name is exactly two characters long (alternatively, you could replace [a-z]{2} with an explicit alternation of all the languages you have)
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
I hope I've understood your question right. You can try this regex:
^([a-z]{2}(?=\/|$))?(?:\/?(.+)\/)?(.*)
Regex demo.
This will match 3 groups: first the language (two characters), then the parents and the last group is last part of the URL (after /).
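A quick way to check which text lands in which group is to run the pattern against the URL types from the question. The sketch below uses Python's re module as a stand-in for Apache's regex engine, and split_url is a hypothetical helper name. One thing it seems to show: for en/test the optional tree group does not match, so the page group keeps a leading slash (/test), which may need trimming on the PHP side.

```python
import re

# Pattern from the answer above, written without the escaped slashes.
PATTERN = re.compile(r'^([a-z]{2}(?=/|$))?(?:/?(.+)/)?(.*)')


def split_url(url):
    """Return (lang, tree, page); optional groups that did not match become ''."""
    m = PATTERN.match(url)
    # Every part of the pattern is optional, so a match is always found.
    return tuple(g or '' for g in m.groups())


if __name__ == "__main__":
    for u in ['', 'en', 'en/test', 'en/parent/test', 'test',
              'parent/test', 'ggparent/gparent/parent/test']:
        print(repr(u), '->', split_url(u))
```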

JMeter extract link using regular expression pass into next request with blank values

This is how I have Test Plan set up:
HTTP Request -> Regular Expression Extractor to extract multiple links - This is extracting correctly -- But some of the links are Blank
RegularExpressionExtractor --- <a href="(.*)" class="product-link">
BeanShell Sampler - to filter blank or null values -- This works fine
BeanShell Sampler
log.info("Enter Beanshell Sampler");
matches = vars.get("url_matchNr");
log.info(matches);
for (int i = 1; i <= Integer.parseInt(matches); i++) // url_1 .. url_<matchNr>, inclusive
{
String url = vars.get("url_"+i);
//log.info(url1);
if (url != null && url.length() > 0)
{
log.info(i+"->" + url);
//return url;
//vars.put("url2", url);
vars.put("url2", url);
//props.put("url2", url);
log.info("URL2:" + vars.get("url2"));
}
}
ForEach Controller
Test Plan
The problem I am facing is that the ForEach Controller runs through all the values, including blank or null ones. How can I run the loop only for the non-null, non-blank values?
You should change your regular expression so that it excludes empty values.
Instead of matching any value, including an empty one, with the * sign:
<a href="(.*)" class="product-link">
Find only not empty strings using + sign:
<a href="(.+)" class="product-link">
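The effect of swapping * for + can be checked outside JMeter. Below is a minimal sketch in Python; the HTML body is made up for illustration, and Python's re module is only a stand-in for the regex engine behind the Regular Expression Extractor.

```python
import re

# Hypothetical response body: one link per line, the second with an empty href.
HTML = '''<a href="/product/1" class="product-link">
<a href="" class="product-link">
<a href="/product/2" class="product-link">'''

STAR = r'<a href="(.*)" class="product-link">'  # also captures empty hrefs
PLUS = r'<a href="(.+)" class="product-link">'  # needs at least one character


def extract(pattern, body):
    """Return every captured href value for the given pattern."""
    return re.findall(pattern, body)


if __name__ == "__main__":
    print(extract(STAR, HTML))  # ['/product/1', '', '/product/2']
    print(extract(PLUS, HTML))  # ['/product/1', '/product/2']
```

If several links can share a line, a negated class such as [^"]+ in place of .+ would probably be safer, since it cannot run past the closing quote.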
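As mentioned earlier, you should change your regex.
You can replace it directly with
<a href="(.+)" class="product-link">
or with something more constraining like this:
<a href="((https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?)" class="product-link">
which only matches URLs. The ^ and $ anchors of the standalone URL pattern described below are left out here, because the URL sits in the middle of the tag and an anchor at that position would prevent any match.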
https://code.tutsplus.com/tutorials/8-regular-expressions-you-should-know--net-6149
The first capturing group is all option. It allows the URL to begin
with "http://", "https://", or neither of them. I have a question mark
after the s to allow URL's that have http or https. In order to make
this entire group optional, I just added a question mark to the end of
it.
Next is the domain name: one or more numbers, letters, dots, or hypens
followed by another dot then two to six letters or dots. The following
section is the optional files and directories. Inside the group, we
want to match any number of forward slashes, letters, numbers,
underscores, spaces, dots, or hyphens. Then we say that this group can
be matched as many times as we want. Pretty much this allows multiple
directories to be matched along with a file at the end. I have used
the star instead of the question mark because the star says zero or
more, not zero or one. If a question mark was to be used there, only
one file/directory would be able to be matched.
Then a trailing slash is matched, but it can be optional. Finally we
end with the end of the line.
String that matches:
http://net.tutsplus.com/about
String that doesn't match:
http://google.com/some/file!.html (contains an exclamation point)
Good luck!!!
ForEach controller doesn't work with JMeter Properties, you need to change the "Input Variable Prefix" to url_2 and your test should start working as expected.
Also be aware that since JMeter 3.1 it is recommended to use Groovy language for any form of scripting so consider migrating to JSR223 Sampler and Groovy language on next available opportunity.
Groovy has much better performance while Beanshell might become a bottleneck when it comes to immense loads.

Multi Taxonomy URL rewrite not working

I am trying to rewrite WP URL and here is the URL:
http://example.com/?job_listing_region=california&job_listing_category=wordpress
I want to change it as http://example.com/california/wordpress
I tried this:
add_rewrite_rule('([^/]*)/([^/]*)/?','job_listing_region=$matches[1]&job_listing_category=$matches[2]','top');
But it's not working. Sorry, I am not good at regex; it might be a small thing, but I am not able to find a solution. Thanks in advance.
Code
See regex in use here
Regex
\??\w+=([^&]+)&?
Replacement
$1/
Results
Input
http://example.com/?job_listing_region=california&job_listing_category=alcohol-abuse-programs
Output
http://example.com/california/alcohol-abuse-programs/
Explanation
Regex
\?? Match between zero and one of the ? character literally
\w+= Match any word character one or more times, followed by the = character literally (\w can be replaced with [a-zA-Z0-9_] if preferred/doesn't work in your regex flavour)
([^&]+) Capture into capture group 1 any character except the & character literally one or more times
&? Match between zero and one & character literally
Replacement
$1/ Matches the same text as most recently matched by the 1st capturing group, followed by a / literally
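For anyone who wants to try the substitution locally, here is a minimal sketch using Python's re.sub. The function name prettify is hypothetical, and Python writes the replacement as \1/ rather than $1/.

```python
import re


def prettify(url):
    """Replace each key=value pair of the query string with 'value/'."""
    return re.sub(r'\??\w+=([^&]+)&?', r'\1/', url)


if __name__ == "__main__":
    print(prettify('http://example.com/?job_listing_region=california'
                   '&job_listing_category=alcohol-abuse-programs'))
    # http://example.com/california/alcohol-abuse-programs/
```

This only rewrites the string; it does not by itself make WordPress route the pretty URL.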
Using http://example.com/{job_listing_region}/{job_listing_category}/ is too broad - it would affect every single URL on your website, such as /wp-admin.
I'd recommend using http://example.com/jobs/{job_listing_region}/{job_listing_category}/ as your URL structure, in which case the rewrite rule would be set as follows:
add_rewrite_rule('^jobs/([^/]*)/([^/]*)/?','index.php?page_id=1234&job_listing_region=$matches[1]&job_listing_category=$matches[2]','top');
page_id should be set to the page ID of the page/post you'd like to route this to.
It's important to note that the rewrite might not be available until you view/save the Settings -> Permalinks page in the back end.
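To illustrate how the two captures map onto the query variables, here is a rough Python simulation of that rule. It is not WordPress code: rewrite is a hypothetical helper, the request path is assumed to arrive without its leading slash, and the second capture is assumed to feed job_listing_category.

```python
import re

# Same pattern as the rewrite rule, applied to a path with no leading slash.
RULE = re.compile(r'^jobs/([^/]*)/([^/]*)/?')


def rewrite(request_path):
    """Return the internal query string for a matching path, else None."""
    m = RULE.match(request_path)
    if m is None:
        return None
    region, category = m.groups()
    return ('index.php?page_id=1234'
            f'&job_listing_region={region}'
            f'&job_listing_category={category}')


if __name__ == "__main__":
    print(rewrite('jobs/california/wordpress/'))
    print(rewrite('wp-admin/'))  # None: the jobs/ prefix keeps other URLs untouched
```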
Thanks for the answers above; they helped me finally get to a solution.
When passing URL strings to WordPress, we need to register the variables in functions.php, and then, instead of using PHP's $_GET, we need to use WordPress query vars to read the values.
As suggested by @athms above, I changed the URL structure.
Now "wordpress" is a wordpress page in which the query variables are captured.
So my URL is http://example.com/wordpress/?job_listing_region=california
In functions.php I registered these variables:
function custom_query_vars_filter($vars) {
$vars[] = 'job_listing_region';
return $vars;
}
add_filter( 'query_vars', 'custom_query_vars_filter' );
function custom_rewrite_tag1() {
add_rewrite_tag('%job_listing_region%', '([^&]+)');
}
add_action('init', 'custom_rewrite_tag1', 10, 0);
Rewrite Rule in functions.php:
function custom_rewrite_rule3() {
add_rewrite_rule('^wordpress/([^/]*)/?','index.php?page_id=35349&job_listing_region=$matches[1]','top');
}
add_action('init', 'custom_rewrite_rule3', 10, 0);
Here page id is the id of page I created i.e "wordpress"
And in the page template for "wordpress" I captured the region using:
$region = get_query_var('job_listing_region');
Now you can pass this variable to your query.
So now you can start using this pretty URL:
http://example.com/wordpress/california
The last segment of the URL (california) is passed on as the query variable and can be used in our template.

Capture variable with zero or one slashes in Django URL

I need to match a Django URL pattern that can contain zero or one slashes. So far I've got a pattern that matches any number of characters, including slashes.
url(r'^(?P<myvar>[\w/-]+)/', index.home)
I tried adding a question mark after the slash in the variable pattern, but that doesn't seem to work.
url(r'^(?P<myvar>[\w/?-]+)/', index.home)
To be absolutely clear, I want to capture strings such as foo and foo/bar but not foo/bar/bing.
EDIT:
So my root URL is example.com. I need three different routes: one for the main page, one for variable foo that can either be just an alphanumerical string or an alphanumerical string containing one slash, and one for two variables foo and bar where bar must be a simple alphanumerical string.
example.com # should lead to main page
example.com/hey # should lead to second route where foo="hey"
example.com/hey/ho # should lead to second route where foo="hey/ho"
example.com/hey/ho/hi # should lead to third route where foo="hey/ho" and bar="hi"
So I need to capture foo and bar as variables, where foo can either contain a slash or not contain a slash. But as far as this question is concerned, I'm only interested in capturing foo.
You could try the below regex to capture specific parts in an URL.
^[^/]*\/(?P<foo>(?:[^/\n]*(?:/[^/\n]*)?))(?:/(?P<bar>.*))?
DEMO
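A small Python check of that pattern against the four example URLs, as a sketch only. Note that the pattern expects the host to be part of the string, whereas Django normally matches against the path without the host, so FOO_ONLY below is a hypothetical variant (my assumption, not part of the answer) for capturing just foo with zero or one slashes.

```python
import re

# Pattern from the answer; it expects the host to be part of the input.
ANSWER = re.compile(
    r'^[^/]*/(?P<foo>(?:[^/\n]*(?:/[^/\n]*)?))(?:/(?P<bar>.*))?')

# Hypothetical host-less variant: one segment, optionally followed by one more.
FOO_ONLY = re.compile(r'^(?P<foo>[\w-]+(?:/[\w-]+)?)/?$')


def capture(url):
    """Return (foo, bar) for a URL, or None when nothing follows the host."""
    m = ANSWER.match(url)
    return None if m is None else (m.group('foo'), m.group('bar'))


if __name__ == "__main__":
    for u in ['example.com', 'example.com/hey',
              'example.com/hey/ho', 'example.com/hey/ho/hi']:
        print(u, '->', capture(u))
```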

Regex to match anything after /

I basically don't have a clue about regex, but I need a regex that will recognise anything after the / in a URL.
Basically, I'm developing a site for someone, and a page's URL (a local URL, of course) is, say, (http://)localhost/sweettemptations/available-sweets. This page is filled with custom post types (it's a WordPress site), which have URLs of the form (http://)localhost/sweettemptations/sweets/sweet-name.
What I want to do is redirect the URL (http://)localhost/sweettemptations/sweets back to (http://)localhost/sweettemptations/available-sweets which is easy to do, but I also need to redirect any type of sweet back to (http://)localhost/sweettemptations/available-sweets. So say I need to redirect (http://)localhost/sweettemptations/sweets/* back to (http://)localhost/sweettemptations/available-sweets.
If anyone could help by telling me how to write a proper regex statement to match everything after sweets/ in the URL, it would be hugely appreciated.
To do what you ask you need to use groups. In regular expression groups allow you to isolate parts of the whole match.
for example:
input string of: aaaaaaaabbbbcccc
regex: a*(b*)
The parenthesis mark a group in this case it will be group 1 since it is the first in the pattern.
Note: group 0 is implicit and is the complete match.
So the matches in my above case will be:
group 0: aaaaaaaabbbb
group 1: bbbb
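The same example can be reproduced in a couple of lines of Python (used here only for illustration; any regex flavour with capture groups behaves the same way for this pattern):

```python
import re

# Match the example pattern at the start of the example input string.
m = re.match(r'a*(b*)', 'aaaaaaaabbbbcccc')

print(m.group(0))  # aaaaaaaabbbb  (group 0: the complete match)
print(m.group(1))  # bbbb          (group 1: the parenthesised part)
```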
In order to achieve what you want with the sweets pattern above, you just need to put a group around the end.
possible solution: /sweets/(.*)
the more precise you are with the pattern before the group the less likely you will have a possible false positive.
If what you really want is to match anything after the last / you can take another approach:
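possible other solution: /([^/]*)$
The pattern above will find a / followed by a string of characters that are NOT another /, up to the end of the input, and keep that string in group 1. The $ anchor is what ties the match to the last /; without it, a search stops at the first / it finds. The issue here is that you could match things that do not have sweets in the URL.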
Note if you do not mind the / at the beginning then just remove the ( and ) and you do not have to worry about groups.
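Here is a minimal Python sketch of both approaches. The helper names are made up, and the last-segment pattern is assumed to carry an end anchor $ so that it lands on the final slash rather than the first one.

```python
import re


def after_sweets(url):
    """Return whatever follows '/sweets/', or None if that path is absent."""
    m = re.search(r'/sweets/(.*)', url)
    return m.group(1) if m else None


def last_segment(url):
    """Return whatever follows the last '/', or None if there is no slash."""
    m = re.search(r'/([^/]*)$', url)
    return m.group(1) if m else None


if __name__ == "__main__":
    sweet = 'http://localhost/sweettemptations/sweets/sweet-name'
    other = 'http://localhost/sweettemptations/available-sweets'
    print(after_sweets(sweet), after_sweets(other))  # sweet-name None
    print(last_segment(sweet), last_segment(other))  # sweet-name available-sweets
```

The second line of output shows the false positive mentioned above: the last-segment pattern also matches URLs that have nothing to do with sweets/.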
I like to use http://regexpal.com/ to test my regex. It marks the different matches in different colors.
Hope this helps.
I may have misunderstood your requirement in my original post.
If you just want to change any string that matches
(http://)localhost/sweettemptations/sweets/*
into the other one you provided (without appending the part matched by your * at the end), I would use a regular expression to match the pattern in the URL but then just blindly replace the whole string with the desired one:
(http://)localhost/sweettemptations/available-sweets
So if you want the URL:
http://localhost/sweettemptations/sweets/somethingmore.html
to turn into:
http://localhost/sweettemptations/available-sweets
and not into:
localhost/sweettemptations/available-sweets/somethingmore.html
Then the solution is simpler, no groups required :).
When doing this, I would make sure you do not hard-code the "localhost" part in the pattern. Also, I am assuming the (http://) really means an optional http:// in front, as (http://) is not a valid protocol prefix.
so if that is what you want then this should match the pattern:
(http://)?[^/]+/sweettemptations/sweets/.*
This regular expression will match the http:// part optionally with a host (be it localhost, an IP or the host name). You could omit the .* at the end if you want.
If that pattern matches just replace the whole URL with the one you want to redirect to.
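As a sketch of that match-then-replace idea in Python (redirect_target is a hypothetical helper, and the target URL is the one from the question): with the pattern as written, a URL ending in /sweets without a trailing slash is not matched, so that case would still need its own plain redirect.

```python
import re

# Pattern from above: optional http://, any host, then the sweets path.
SWEETS = re.compile(r'(http://)?[^/]+/sweettemptations/sweets/.*')

# Redirect target taken from the question.
TARGET = 'http://localhost/sweettemptations/available-sweets'


def redirect_target(url):
    """Return TARGET for any URL under /sweets/, otherwise the URL unchanged."""
    return TARGET if SWEETS.fullmatch(url) else url


if __name__ == "__main__":
    print(redirect_target(
        'http://localhost/sweettemptations/sweets/somethingmore.html'))
    print(redirect_target(TARGET))  # unchanged: not under /sweets/
```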
Use this regular expression: (?<=://).+