Git URL Structure - regex

I am trying to build a regular expression to match any Git read+write URL structure (not just GitHub), and I wanted to check whether I got the regex right. This is what I have so far:
([A-Za-z0-9]+@|http(|s)\:\/\/)([A-Za-z0-9.]+)(:|/)([A-Za-z0-9\/]+)(\.git)?
That regex matches all of the following URLs:
git@github.com:user/project.git
https://github.com/user/project.git
http://github.com/user/project.git
git@192.168.101.127:user/project.git
https://192.168.101.127/user/project.git
http://192.168.101.127/user/project.git
http://192.168.101.127/user/project
And others, such as non-top-level domains and single-name domains (http://server/). Are there other URL structures that I should be conscious of? Also, is there a shorter way of writing the regex that I have?

If you are using Rails/Ruby to write your program, check this out. You might be able to get some ideas from it:
http://www.simonecarletti.com/blog/2009/04/validating-the-format-of-an-url-with-rails/
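On the "shorter way" part of the question, here is a minimal sketch in Python rather than Ruby. It assumes the separator before the host is the @ of scp-style addresses (git@github.com:user/project.git), shortens http(|s) to https?, drops the escapes that are not needed, and uses character classes instead of single-character alternations. The names GIT_URL, SAMPLES and is_git_url are made up for illustration.

```python
import re

# Shortened variant of the pattern from the question (an illustration, not a
# complete validator: ssh://, git://, ports, and "-" or "_" in paths are not
# covered, just as in the original pattern).
GIT_URL = re.compile(
    r"(?:[A-Za-z0-9]+@|https?://)"  # "user@" or "http://" / "https://"
    r"[A-Za-z0-9.]+"                # host name or IPv4 address
    r"[:/]"                         # ":" (scp-style) or "/" (HTTP)
    r"[A-Za-z0-9/]+"                # path such as user/project
    r"(?:\.git)?"                   # optional ".git" suffix
)

SAMPLES = [
    "git@github.com:user/project.git",
    "https://github.com/user/project.git",
    "http://github.com/user/project.git",
    "git@192.168.101.127:user/project.git",
    "https://192.168.101.127/user/project.git",
    "http://192.168.101.127/user/project.git",
    "http://192.168.101.127/user/project",
]


def is_git_url(url: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return GIT_URL.fullmatch(url) is not None


for sample in SAMPLES:
    print(sample, is_git_url(sample))
```

Using fullmatch instead of an unanchored search means trailing junk after the URL is rejected, which is usually what a validator wants.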

Related

import.io and portia regex url patterns

I am using data scrapers: Import.io & Portia.
They both allow you to define a regular expression for the crawler to abide by.
For example, take the URL https://weedmaps.com/dispensaries/pdi-medical
How would I account for the ending "pdi-medical"?
I've looked all over and understand how to use regex in a JS environment, but I'm a little confused as to what exactly I'd put in the input field on Portia/Import.io.
Something like this?
https://weedmaps.com/dispensaries//^[a-zA-Z0-9-_]+$/
For Portia, if you want your crawler to follow any URLs starting with https://weedmaps.com/dispensaries/, you can just add a crawling rule with the following regex:
^https?://weedmaps.com/dispensaries/
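The rule can be tried outside the crawler first. The sketch below uses Python's re module as a stand-in. How Portia applies the pattern internally is an assumption here (a search from the start of the URL). The dots are escaped so that they match only a literal ".", and DETAIL is a hypothetical stricter variant that accepts exactly one path segment such as "pdi-medical".

```python
import re

# Prefix rule from the answer, with the dots escaped.
FOLLOW = re.compile(r"^https?://weedmaps\.com/dispensaries/")

# Hypothetical stricter rule: exactly one slug after /dispensaries/.
DETAIL = re.compile(r"^https?://weedmaps\.com/dispensaries/[A-Za-z0-9_-]+/?$")


def should_follow(url: str) -> bool:
    """True for any URL below /dispensaries/."""
    return FOLLOW.search(url) is not None


def is_detail_page(url: str) -> bool:
    """True only for a single-segment dispensary URL."""
    return DETAIL.search(url) is not None


print(should_follow("https://weedmaps.com/dispensaries/pdi-medical"))
print(is_detail_page("https://weedmaps.com/dispensaries/pdi-medical/reviews"))
```

Note that the pattern is entered without the surrounding /.../ delimiters that JavaScript regex literals use.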

KimonoLabs crawler Generated URL List with regex

So, I'm trying to crawl a website that has like 7,000 product pages and the link structure is like this:
https://example.com/category/sub-category/numericid-name-of-the-product/
What I'm trying to achieve is to generate a URL list. The Kimono app has that option, and it actually splits the URL into sections, but I'm only offered default value, range, and custom list.
I tried to put in stuff like "/.+/" to match all the characters, but that does not work, and I couldn't find any help on that in the official KB.
I know that import.io had that "{alpahnumeric}" placeholder, for example, to match different parts of a URL. Is there a way to accomplish that in the KimonoLabs app?
Try this regex: https://example.com/([^/]+)/([^/]+)/([0-9]+)-([^/]+)
Note: you may need to escape some characters (namely / would be escaped as \/).
Also, I'm not familiar with KimonoLabs, so I don't know if this is what you're looking for exactly. Feel free to clarify.
Explanation
https://example.com/ literally
([^/]+)/ a bunch of not /s, followed by a /
([0-9]+)-([^/]+) Numbers followed by another bunch of not /s
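To see what each capture group picks up, the pattern can be run in Python. This is only a sketch: the product id 12345 and the function name parse_product_url are invented for the example, and it says nothing about whether KimonoLabs accepts a regex in that field.

```python
import re

# Pattern from the answer, with the dot in the host escaped.
PRODUCT_URL = re.compile(
    r"https://example\.com/([^/]+)/([^/]+)/([0-9]+)-([^/]+)"
)


def parse_product_url(url: str):
    """Return (category, sub_category, numeric_id, name) or None."""
    match = PRODUCT_URL.match(url)
    return match.groups() if match else None


# Hypothetical product URL following the structure in the question.
print(parse_product_url(
    "https://example.com/category/sub-category/12345-name-of-the-product/"
))
# → ('category', 'sub-category', '12345', 'name-of-the-product')
```

In Python the / needs no escaping. The \/ form is only required where / is also the regex delimiter, as in JavaScript literals.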

regex rewrite url cluster

I've been trying to learn regex and it's terribly complicated. I'm not even positive that it's possible to rewrite these URLs without doing them individually. I can do them individually (search & replace), but there are a few different clusters and there are thousands of URLs (migration).
This is a Joomla site running acesef software. Here is an example URL from one particular cluster. The end of the URL is identical for the old and new URL; only the leading directories have changed. So is there a way to match the end of the URL for all URLs in those particular directories and rewrite old to new with a single expression?
Old URL = www.domain.com/property-details/condominiums/3448-page-title
New URL = www.domain.com/bangkok/condos/rent/3448-page-title
I won't even bother posting what I've tried to write so far, because it's so far off. I'm trying to get my feet wet with regex, but this is a pretty complicated rewrite for a beginner.
Well uh, at face value you could just use this:
[^/]+$
This will give you anything after the last /, so in your example you'd get 3448-page-title. You can then append that to the new directory path.
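To turn that into a single-expression rewrite, the trailing segment can be captured and reused in the replacement. The sketch below does this in Python purely as an illustration. On the actual site the same pattern would go into whatever redirect mechanism is in use, which is not specified in the question. The function names are made up.

```python
import re
from typing import Optional

# Old directory cluster from the question. The last path segment is captured
# so that it can be reused unchanged in the new URL.
OLD_TO_NEW = re.compile(
    r"^(www\.domain\.com)/property-details/condominiums/([^/]+)$"
)


def rewrite(url: str) -> str:
    """Move a URL from the old cluster to the new one.

    URLs that are not in the old cluster are returned unchanged.
    """
    return OLD_TO_NEW.sub(r"\1/bangkok/condos/rent/\2", url)


def last_segment(url: str) -> Optional[str]:
    """Everything after the last "/", as in the [^/]+$ suggestion."""
    match = re.search(r"[^/]+$", url)
    return match.group(0) if match else None


old = "www.domain.com/property-details/condominiums/3448-page-title"
print(last_segment(old))  # → 3448-page-title
print(rewrite(old))       # → www.domain.com/bangkok/condos/rent/3448-page-title
```

Each additional cluster would need its own pattern and replacement pair, but every pair covers all URLs in that cluster.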

search & replace wordpress video shortcode with plain URL using regular expressions

I am transferring a friend's wordpress.com blog to a self-hosted install on my server. The problem is, he has many videos embedded in his blog using a shortcode plugin that is not necessary on WordPress 3 (you only need to paste the plain URL to embed videos from YouTube, Vimeo, etc.).
I've found a Search Regex plugin that will search & replace using regular expressions, but I am unfamiliar with regex myself. How might I catch the URL in a shortcode such as [youtube="URL"] and replace it with just the URL?
Thanks for any help you can provide!!
-Jenny
Are you trying to go from "[youtube=http://www.youtube.com/watch?v=JaNH56Vpg-A]" to http://www.youtube.com/watch?v=JaNH56Vpg-A?
This works as long as there is whitespace between different URLs.
find: \[youtube=(\S*)\]
replace with: $1
It's difficult to handle every service at once, since their shortcodes seem to differ. For Vimeo, the following would work. It allows any amount of whitespace (at least one character) between "vimeo" and the URL, and it again needs whitespace after the closing "]".
find: \[vimeo\s+(\S*)\]
replace with: $1
Maybe there's a more robust way to write the expression (one which validates the correct syntax). This one's pretty straightforward, though.
The actual regex syntax depends on the language used. Hope this helps.
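As one example of the language dependence, here is a sketch of the same two replacements in Python, where the backreference in the replacement is written \1 instead of $1. The Vimeo URL is made up for the example, and strip_shortcodes is a hypothetical helper name.

```python
import re

# Same patterns as above; only the replacement syntax differs.
YOUTUBE = re.compile(r"\[youtube=(\S*)\]")
VIMEO = re.compile(r"\[vimeo\s+(\S*)\]")


def strip_shortcodes(text: str) -> str:
    """Replace [youtube=URL] and [vimeo URL] shortcodes with the bare URL."""
    text = YOUTUBE.sub(r"\1", text)
    return VIMEO.sub(r"\1", text)


post = (
    "Intro [youtube=http://www.youtube.com/watch?v=JaNH56Vpg-A] "
    "and [vimeo http://vimeo.com/12345] outro"
)
print(strip_shortcodes(post))
```

If two shortcodes can sit next to each other without whitespace, replacing \S* with [^\]\s]* stops the match at the first closing bracket.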

simple regular expression - match specific url

I'm a noob when it comes to Regular Expressions. I'm using Joomla and the Advanced Module Manager to publish a module to a specific url.
I want to publish a module only to the URL /tv-show and not to /tv-show/anything-else/blahblah
I thought the way to do it was /tv-show*, but obviously not, since it still publishes to other URLs with /tv-show at the beginning.
I tried many variations, please tell me where am I going wrong?
Try the following:
/tv-show$
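The dollar sign matches the end of the string, so nothing may follow /tv-show. The * in /tv-show* only means "zero or more of the preceding w", which is why it still matched the longer URLs.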
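The difference between the two patterns can be checked quickly in Python. This sketch assumes the module manager searches for the pattern anywhere in the URL path, which is a guess and not something confirmed for Advanced Module Manager. The names STAR, EXACT and matches are illustrative.

```python
import re

STAR = re.compile(r"/tv-show*")   # "*" repeats only the preceding "w"
EXACT = re.compile(r"/tv-show$")  # "$" requires the string to end here


def matches(pattern: re.Pattern, url: str) -> bool:
    """True if the pattern is found anywhere in the URL path."""
    return pattern.search(url) is not None


for url in ["/tv-show", "/tv-show/anything-else/blahblah"]:
    print(url, matches(STAR, url), matches(EXACT, url))
```

If the path can end with a trailing slash, /tv-show/?$ would accept both forms.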