Need a regex that will give me the page path without the domain

I've been trying to use REGEXEXTRACT in Google Docs (or Excel) to get the page path of a URL, i.e. everything after the TLD.
Example: http://google.com/this-folder/this-page-is-here
I just want it to extract /this-folder/this-page-is-here, but so far I can only get this-page-is-here or /this-folder separately.
Sorry, I'm not too good with regex. Can anyone help me out?
This is what I've tried:
=regexextract(A1; "\//*\/*.*\/(.*)")
which returns this-page-is-here
But I've been at it so long that I don't even understand life anymore. Can someone show me how you're supposed to do this?

=REGEXEXTRACT(A1,"//.+?(/.*)")
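Tested and working. You need to add the ? to make the .+ non-greedy (stop matching as soon as possible).
Taking your version and adding the ? fixes it as well (I also removed an extra / at the beginning). Note that here the capture group starts after the slash, so the result has no leading /: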
=regexextract(A1; "//.*?/(.*)")
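To see why the ? matters, here is a minimal sketch in Python's re module rather than in the spreadsheet itself (an assumption: lazy quantifiers behave the same way in both engines for this simple case). It compares the greedy and non-greedy versions on the example URL from the question.

```python
import re

url = "http://google.com/this-folder/this-page-is-here"

# Greedy: .+ runs as far as it can, so only the last segment is left for the group.
greedy = re.search(r"//.+(/.*)", url).group(1)

# Lazy: .+? stops at the first "/" after the host, so the group gets the whole path.
lazy = re.search(r"//.+?(/.*)", url).group(1)

# Variant with the slash outside the group: same idea, but no leading "/".
no_slash = re.search(r"//.*?/(.*)", url).group(1)

print(greedy)    # /this-page-is-here
print(lazy)      # /this-folder/this-page-is-here
print(no_slash)  # this-folder/this-page-is-here
```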

Google Form Validate a specific URL Regex

I am creating a Google Form and trying to put a regex on one of the fields, because I need people to enter a profile link from a specific website. I'm a beginner with regex and this is what I have come up with:
/^(http:\/\/)?(steamcommunity\.com\/id\/)*\/?$/
But when I enter a test link such as http://steamcommunity.com/id/bagzli, validation fails. I don't understand what is wrong with it.
You missed a dot (meaning any character) after the (steamcommunity\.com\/id\/) group. Try this:
/^(http:\/\/)?(steamcommunity\.com\/id\/).*\/?$/
                                         ^-- added
The ultimate goal was to ensure that certain text was entered in the box. I thought I had to use a regex for that, but Google Forms also has a "Text Contains" feature, which I used to solve my problem. The regex by Zoff Dino did not work; I am not sure why, as it seems completely correct.
I will mark this as resolved as I managed to get my answer, even if it was not via regex.
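The effect of the missing dot can be checked with a small Python sketch. This uses Python's re module, not Google Forms, and the last check is only one possible explanation for why the suggested pattern was still rejected: it assumes the form field treats the surrounding /.../ delimiters as literal characters, which is worth verifying against the Forms documentation.

```python
import re

link = "http://steamcommunity.com/id/bagzli"

# Original: the * repeats the whole group, so nothing may follow "/id/".
original = re.compile(r"^(http://)?(steamcommunity\.com/id/)*/?$")

# Fixed: ".*" after the group lets the profile name through.
fixed = re.compile(r"^(http://)?(steamcommunity\.com/id/).*/?$")

# Same fix, but with the /.../ delimiters kept as literal characters
# (assumption: this is how a delimiter-free regex field would read them).
with_delimiters = re.compile(r"/^(http:\/\/)?(steamcommunity\.com\/id\/).*\/?$/")

original_ok = original.match(link) is not None
fixed_ok = fixed.match(link) is not None
delimited_ok = with_delimiters.search(link) is not None

print(original_ok, fixed_ok, delimited_ok)  # False True False
```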

Regex to find a web address

I'm trying to isolate links from HTML using a regex, and the one I found that is supposed to do it doesn't seem to work.
/^(http?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
Am I missing something? I'm using Brackets as my text editor
^(?:http|https):\/\/(?:[a-z0-9\-\.]+)(?::[0-9]+)?(?:\/|\/(?:[\w#!:\.\?\+=&%#!\-\/\(\)]+)|\?(?:[\w#!:\.\?\+=&%#!\-\/\(\)]+))?$
Messy, but works.
Also, you might want to look at a similar question: Regex expression for valid website link
Hope this helps :)
It is hard to make it 100% accurate.
A URL could also be an IP address, for example.
http://ip/
It can contain query strings.
http://www.google.com/?a=1&b=2
It can contain spaces (strictly these should be percent-encoded as %20, but unencoded spaces do turn up in practice).
http://www.google.com/this is my url/
It depends on what need you have for accuracy.
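As a rough illustration of the accuracy trade-off, here is a deliberately loose Python sketch for pulling links out of HTML. The sample HTML and the LINK pattern are made up for this example. Two things it highlights about the pattern in the question: http? means "htt" plus an optional "p", so it never matches https, and the ^ and $ anchors only allow a match when the whole input is a URL.

```python
import re

# Hypothetical sample input, not from the original question.
html = (
    '<p>See <a href="https://example.com/docs?a=1&b=2">docs</a> '
    'and <a href="http://192.168.0.1/">router</a>.</p>'
)

# Loose on purpose: scheme, then everything up to whitespace, a quote, or an angle bracket.
LINK = re.compile(r'''https?://[^\s"'<>]+''')

links = LINK.findall(html)
print(links)

# "http?" does not mean "http or https": after "http" it expects "://", not "s".
question_scheme_matches_https = re.match(r"http?://", "https://example.com") is not None
print(question_scheme_matches_https)  # False
```

A pattern this loose accepts IP addresses and query strings but will also accept malformed URLs, so whether it is good enough depends on how much accuracy you need.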

Regex, optional match in url

I spent a couple of hours on this with no good result (maybe my mood is not helping).
I am trying to build a regex that matches both of these URLs:
/reservables/imagenes/4/editar/6
/reservables/imagenes/4/subir
As you can see above, the last segment of the first URL (6) is not present at the end of the second URL, because this segment is optional. So I need to match both URLs with one regex. For that, I have tried this:
reservables/(editar|imagenes)/([0-9]+)/(imagen|editar|actualizar|subir)/([0-9]+)
That works only for the first URL. A few notes I read about regex suggest that I need the ? symbol, right? So I tried this one, but it did not work:
reservables/(editar|imagenes)/([0-9]+)/(imagen|editar|actualizar|subir)/([0-9]+)?
Well, I do not know what I am doing wrong.
You want the ? to cover the / as well, by wrapping the slash and the number in a non-capturing group, like so:
reservables/(editar|imagenes)/([0-9]+)/(imagen|editar|actualizar|subir)(?:/([0-9]+))?
You can see that it matches correctly on debuggex.
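This one will also work, but only if the URL keeps a trailing slash when the number is omitted (for example /reservables/imagenes/4/subir/):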
reservables/(editar|imagenes)/([0-9]+)/(imagen|editar|actualizar|subir)/([0-9]*)
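The difference between the attempts can be checked with a short Python sketch. The trailing $ in the first pattern is an addition for this example, not part of the answers above; it just makes sure nothing is silently left unmatched at the end of the path.

```python
import re

urls = ["/reservables/imagenes/4/editar/6", "/reservables/imagenes/4/subir"]

# Slash and number are optional together, via a non-capturing group.
optional_segment = re.compile(
    r"reservables/(editar|imagenes)/([0-9]+)/(imagen|editar|actualizar|subir)(?:/([0-9]+))?$"
)
groups = [optional_segment.search(u).groups() for u in urls]
print(groups[0])  # ('imagenes', '4', 'editar', '6')
print(groups[1])  # ('imagenes', '4', 'subir', None)

# Only the number is optional here; the final "/" is still required.
slash_required = re.compile(
    r"reservables/(editar|imagenes)/([0-9]+)/(imagen|editar|actualizar|subir)/([0-9]*)"
)
matches_without_slash = slash_required.search(urls[1]) is not None
print(matches_without_slash)  # False
```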

RegEx all URLs that do NOT contain a string

I seem to be having a bit of a brain fart atm. I've got Google counting my transitions correctly but I'm getting false positives.
This is the current goal RegEx which works great.
^/click/[0-9]+\.html\?.*
But I also want the RegEx to NOT count anything that has &confirm=1. I'm quite stuck as to how to do that in the RegEx. I thought I might be able to use [^(?:&confirm=1)], but I don't think that's valid.
Use the "exclude" filter option rather than "include".
Try this:
^/click/[0-9]+\.html\?(?!.*\bconfirm=1).*
I changed it slightly so that it still excludes the URL when confirm=1 is the first parameter (preceded by ? rather than &).
I'm afraid you can't... I've tried doing this before, what I found was that you used to be able to do this with negative lookahead (see Rubens), but Google Analytics stopped supporting this at some point (source: http://productforums.google.com/forum/#!topic/analytics/3YnwXM0WYxE).
Maybe I'm a little late.
What about just writing this?
[^(&confirm=1)]
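The suggestions in this thread can be compared in a small Python sketch. The sample paths and the INCLUDE/EXCLUDE names are made up for this example; the two-step version is only an approximation of an include rule combined with a separate exclude filter, for engines that lack lookahead.

```python
import re

# Hypothetical sample paths.
paths = [
    "/click/12.html?src=mail",
    "/click/12.html?src=mail&confirm=1",
    "/click/12.html?confirm=1",
]

# Negative lookahead: reject the path if confirm=1 appears anywhere after the "?".
GOAL = re.compile(r"^/click/[0-9]+\.html\?(?!.*\bconfirm=1).*")
with_lookahead = [p for p in paths if GOAL.match(p)]

# Without lookahead: one include rule plus a separate exclude rule.
INCLUDE = re.compile(r"^/click/[0-9]+\.html\?.*")
EXCLUDE = re.compile(r"[?&]confirm=1(&|$)")
two_step = [p for p in paths if INCLUDE.match(p) and not EXCLUDE.search(p)]

# A character class is not a negated string: it matches any ONE character
# outside the set, so it hits the very first "/" of the path.
char_class_hit = re.search(r"[^(&confirm=1)]", paths[1]).group(0)

print(with_lookahead)  # ['/click/12.html?src=mail']
print(two_step)        # ['/click/12.html?src=mail']
print(char_class_hit)  # /
```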

Perl/lighttpd regex

I'm using regex in lighttpd to rewrite URLs, but I can't write an expression that does what I want (which I thought was pretty basic, apparently not, I'm probably missing something).
Say I have this URL: /page/variable_to_pass/ OR /page/variable_to_pass
I want to rewrite the URL to this: /page.php?var=variable_to_pass
I've already got rules like ^/login/(.*?)$ to handle specific pages, but I wanted to make one that can match any page without needing one expression per page.
I tried this: ^/([^.?]*) but it matches the whole /page/variable_to_pass/ instead of just page.
Any help is appreciated, thanks!
This regexp should do what you need:
/([^\/]+)/(.+)
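The first capture group is the page name and the second is the variable value (including a trailing slash, if the URL has one).
Try:
/([^/.?]+)/([^/.?]+)/
That should give you two capture groups. The + has to be inside the parentheses, otherwise each group only keeps the last character it matched, and / has to be excluded from the character class so that a group stops at the next slash.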
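The capture-group logic can be tried out in Python before putting it into the server config. This is a sketch with Python's re module, not lighttpd itself; the RULE name, the rewrite helper, and the optional trailing slash with ^ and $ anchors are additions for this example.

```python
import re

# Two path segments without "/", "." or "?", with an optional trailing slash.
RULE = re.compile(r"^/([^/.?]+)/([^/.?]+)/?$")

def rewrite(path):
    """Return /<page>.php?var=<value> for matching paths, else the path unchanged."""
    return RULE.sub(r"/\1.php?var=\2", path)

with_slash = rewrite("/page/variable_to_pass/")
without_slash = rewrite("/page/variable_to_pass")
untouched = rewrite("/style.css")

print(with_slash)     # /page.php?var=variable_to_pass
print(without_slash)  # /page.php?var=variable_to_pass
print(untouched)      # /style.css
```

In a lighttpd rewrite rule the backreferences would presumably be written as $1 and $2 rather than \1 and \2; check the mod_rewrite documentation for your version before relying on that.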