Creating regex for some URLs for my robots.txt


I have five URL patterns that I want to write regexes for, so that I can put them in my robots.txt to keep the pages out of the index.
These two kinds of pages need two different regexes.
The URL patterns look like this:
https://example.com/[varying-data]-addiction-treatmnet
https://example.com/[varying-data]-addiction-treatmnet/thank-you

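Since every URL is the same apart from one varying part, you can use a simple regex like
https:\/\/example\.com\/.*-addiction-treatmnet$
and
https:\/\/example\.com\/.*-addiction-treatmnet\/thank-you$
or as a single alternation:
(https:\/\/example\.com\/.*-addiction-treatmnet$)|(https:\/\/example\.com\/.*-addiction-treatmnet\/thank-you$)
The dot in the host name is escaped so that it matches only a literal dot, and the trailing $ stops the first pattern from also matching the thank-you URLs.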
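To check the two patterns offline before using them anywhere, a minimal Python sketch could look like this. The names LANDING, THANK_YOU and classify are hypothetical, and fullmatch is used in place of explicit anchors. Note that robots.txt parsers generally understand only the * and $ wildcards rather than full regular expressions, so this is only a way to test the matching logic, not something to paste into robots.txt.

```python
import re

# Hypothetical names; the patterns follow the answer above, with the dot in
# the host escaped. fullmatch() requires the whole URL to match the pattern.
LANDING = re.compile(r"https://example\.com/.*-addiction-treatmnet")
THANK_YOU = re.compile(r"https://example\.com/.*-addiction-treatmnet/thank-you")


def classify(url: str) -> str:
    """Return which of the two page kinds a URL belongs to, or 'other'."""
    if THANK_YOU.fullmatch(url):
        return "thank-you"
    if LANDING.fullmatch(url):
        return "landing"
    return "other"


if __name__ == "__main__":
    for url in (
        "https://example.com/alcohol-addiction-treatmnet",
        "https://example.com/alcohol-addiction-treatmnet/thank-you",
        "https://example.com/about",
    ):
        print(url, "->", classify(url))
```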

Related

combine two URLs REGEX

I have data from two URLs that I need to combine using a regex:
/online-teaching
/online-teaching?fbclid
So far I have /(online-teaching)|(online teaching)
I can't figure out how to match both the URL with the ? and the one without.
Thanks!

How about something as simple as:
online-teaching(?:.+)?
This matches online-teaching and anything that follows, if it exists. You might need to constrain the tail to specific characters instead of matching everything with . to get a valid URL, but I'll leave that up to you.
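As a rough illustration of the "constrain the tail" remark, the sketch below compares the loose pattern from the answer with a stricter variant that only allows an optional query string. The names LOOSE, STRICT and is_online_teaching, and the choice to allow only a ?-prefixed tail, are assumptions made for the example.

```python
import re

# Pattern from the answer: anything may follow "online-teaching".
LOOSE = re.compile(r"online-teaching(?:.+)?")

# Stricter variant (an assumption): only an optional query string may follow.
STRICT = re.compile(r"/online-teaching(?:\?.*)?")


def is_online_teaching(path: str) -> bool:
    """True for /online-teaching with or without a query string."""
    return STRICT.fullmatch(path) is not None


if __name__ == "__main__":
    for path in ("/online-teaching", "/online-teaching?fbclid", "/online-teaching-jobs"):
        print(path, bool(LOOSE.search(path)), is_online_teaching(path))
```

The loose pattern also accepts /online-teaching-jobs, which may or may not be wanted; the strict one rejects it.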

Conditional Regex to match url

I am trying to write an if/then condition to match a URL, but I can't seem to get it to work. I want to match URLs and capture the non-optional group. So if a URL comes in like this:
/en/testing.aspx
I want to capture /testing.aspx
If the URL comes in like this:
/testing.aspx
I also want to capture /testing.aspx
Is there an easy way to do this using regex?
EDIT:
The URL can have multiple parts, like /en/sub1/sub2/testing.aspx. I essentially want everything after "/en/".

Use the regex \/en(\/.+)$
https://regex101.com/r/lwowhi/6
Edit: if "/en/" may be missing from the URL and you still want to capture /testing.aspx, use (?:\/en)*(\/.+)$
https://regex101.com/r/lwowhi/8
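A small Python sketch of the second pattern, run against the three sample paths from the question. The names AFTER_EN and strip_en are hypothetical.

```python
import re

# Optional "/en" prefix, then capture everything from the next slash to the end.
AFTER_EN = re.compile(r"(?:/en)*(/.+)$")


def strip_en(path: str):
    """Return the path without a leading /en, or None if nothing matches."""
    match = AFTER_EN.search(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    for path in ("/en/testing.aspx", "/testing.aspx", "/en/sub1/sub2/testing.aspx"):
        print(path, "->", strip_en(path))
```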

You can use a regex with a lazy prefix and an optional /en group: the prefix consumes as little as possible, /en is skipped when present, and everything from the next forward slash onward is captured.
^.*?(?:\/en)?(\/.*)$

Assuming all pages are .aspx, use a capturing group.
regex: .*(/.*\.aspx)
This will capture "/testing.aspx" in all of the samples below:
/testing.aspx
/en/testing.aspx
www.abc.com/en-us/testing.aspx
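The sketch below assumes the intended pattern is a greedy prefix followed by a captured last path segment ending in .aspx, that is .*(/.*\.aspx). The names LAST_ASPX and last_aspx_segment are hypothetical. Because the prefix is greedy, only the final segment is captured, so a multi-part path such as /en/sub1/sub2/testing.aspx also yields just /testing.aspx.

```python
import re

# Assumed reading of the pattern: greedy prefix, then the last "/...aspx" segment.
LAST_ASPX = re.compile(r".*(/.*\.aspx)")


def last_aspx_segment(url: str):
    """Return the final /name.aspx segment, or None if there is none."""
    match = LAST_ASPX.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    for url in ("/testing.aspx", "/en/testing.aspx", "www.abc.com/en-us/testing.aspx"):
        print(url, "->", last_aspx_segment(url))
```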

WordPress URL Rewrite unable to get second matches

My URL is http://example.com/locate/ny/2
In functions, I use the code below:
$wp_rewrite->add_rule('locate/([^/]+)','index.php?page_id=294&cs=$matches[1]','top');
URLs like http://example.com/locate/ny already work, but I want to add pagination after ny, so that ny?cpaged=3 is rewritten to ny/3.
What regex do I need so that http://example.com/locate/ny/2 maps to index.php?page_id=294&cs=$matches[1]&cpaged=$matches[2]?

You need to add another capturing group to the regex that picks out the digits from the URL. Assuming your URL structure isn't going to change, this regex should work.
$wp_rewrite->add_rule('locate\/([^\/]+)\/(\d*)','index.php?page_id=294&cs=$matches[1]&cpaged=$matches[2]','top');
See here for a demo and to play around with it further: https://regex101.com/r/BNkZBo/1/
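To see what the two capturing groups pick up, here is a small Python sketch that mimics the substitution outside WordPress. The names RULE and rewrite are hypothetical, and matching from the start of a path without a leading slash is an assumption about how the request is passed in.

```python
import re

# Same two-group pattern as the rule above (slashes need no escaping in Python).
RULE = re.compile(r"locate/([^/]+)/(\d*)")


def rewrite(path: str):
    """Build the target query string from a request path, or None if no match."""
    match = RULE.match(path)
    if match is None:
        return None
    return f"index.php?page_id=294&cs={match.group(1)}&cpaged={match.group(2)}"


if __name__ == "__main__":
    print(rewrite("locate/ny/2"))
    print(rewrite("locate/ny"))
```

The pattern requires the second slash, so locate/ny on its own does not match it; the original one-group rule would still be needed for the unpaginated URL.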

Regex for .htaccess excluding URL from redirect because of occurrence day in URL

What I'm trying to do is create a regex that matches URLs with this structure:
http://example.com/2016/01/sample-post-title/
and not matching structure of:
http://example.com/2016/01/31/
http://example.com/2016/01/page/sample-post-title/
http://example.com/2016/01/31/page/sample-post-title/
So far I have this regex:
^/([0-9]{4})/([0-9]{2})/(?!page/)(.+)$
but it also matches URLs like the first excluded example (http://example.com/2016/01/31/). What should I add to the regex to fix this?

Try this:
\/\d{4}\/\d{2}\/sample-post-title.*$
or
\/\d{4}\/\d{2}\/(?!page|\d+).*$
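A quick Python check of the second pattern against the four example paths from the question. The names POST_URL and is_post are hypothetical. One side effect worth knowing: the \d+ branch of the lookahead also rejects post slugs that begin with a digit.

```python
import re

# Year/month prefix, then anything that does not start with "page" or digits.
POST_URL = re.compile(r"/\d{4}/\d{2}/(?!page|\d+).*$")


def is_post(path: str) -> bool:
    """True if the path looks like /YYYY/MM/post-title/."""
    return POST_URL.search(path) is not None


if __name__ == "__main__":
    for path in (
        "/2016/01/sample-post-title/",
        "/2016/01/31/",
        "/2016/01/page/sample-post-title/",
        "/2016/01/31/page/sample-post-title/",
    ):
        print(path, is_post(path))
```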

Regex to match any domain except two domains

In my .htaccess I'm trying to set the document root for all parked domains to a specific path, except for two main domains. So basically I need a regex that matches any domain except two domains.
I found something like this
^(?!foo$|bar$).*
and this
(?>[\w-]+)(?<!tea|nuka-cola)
but I can't get either to work in my situation, because the domain name includes a dot and a TLD, and I want to match those with the regex too.
Here is my current regex:
^(.*?)\.(com|net)$
I want to put the exception in place of (.*?).

Use a negative lookbehind:
^(.*?)(?<!(foo)|(bar))\.(com|net)$
I'm not sure exactly what you want, but this regex will not match domains ending in foo.com or bar.net etc.
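The sketch below tries the lookbehind idea in Python (written without the inner capturing groups) and compares it with a lookahead variant. The lookahead form is a swapped-in alternative, not part of the answer above, and the names LOOKBEHIND, LOOKAHEAD and is_parked are hypothetical. The difference shows up for names that merely end in foo or bar, such as myfoo.com.

```python
import re

# Lookbehind form: rejects any name whose last three characters are foo or bar.
LOOKBEHIND = re.compile(r"^(.*?)(?<!foo|bar)\.(com|net)$")

# Lookahead form (an alternative, not from the answer): rejects only the exact
# names foo and bar, so myfoo.com is still accepted.
LOOKAHEAD = re.compile(r"^(?!(?:foo|bar)\.(?:com|net)$)(.*?)\.(com|net)$")


def is_parked(domain: str) -> bool:
    """True for any .com/.net domain other than the two excluded names."""
    return LOOKAHEAD.match(domain) is not None


if __name__ == "__main__":
    for domain in ("example.com", "foo.com", "bar.net", "myfoo.com"):
        print(domain, bool(LOOKBEHIND.match(domain)), is_parked(domain))
```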
