How to regex a branch name? - regex

I have a variable containing "origin/blahbranch" that I want to reduce to "blahbranch". How can I take that substring? I tried
dev newbranch = (branch1 =~ /.*)[0]
but that left me with two problems:
1. The "/" sign is still included, which I don't want.
2. The actual git instruction returns an error message when I embed the parameter ${newbranch}:
"unexpected char: '''"

Assuming branch1 is a String, you can use the split function:
List<String> list = new ArrayList<String>(Arrays.asList(branch1.split("/")));
list.remove(0);
def newbranch = String.join("/", list.toArray(new String[0]))
println newbranch
A very simple solution, assuming the remote is always origin, is the following:
def newbranch = "origin/blahbrachwithslash/blahbranch".replace("origin/","")
println newbranch
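For comparison, here is a minimal sketch of the same idea in Python (an assumption, since the question itself uses Groovy). It splits on the first "/" only, so slashes inside the branch name survive, and it does not depend on the remote being called origin. The function name strip_remote is hypothetical.

```python
def strip_remote(ref: str) -> str:
    """Drop the leading remote name (e.g. "origin/") from a remote-tracking ref."""
    # partition() splits on the first "/" only, so "feature/x" keeps its slash.
    remote, sep, branch = ref.partition("/")
    # When there is no "/" at all, return the input unchanged.
    return branch if sep else ref


print(strip_remote("origin/blahbranch"))
print(strip_remote("origin/feature/blahbranch"))
```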

Related

remove "?show=false" using regex [duplicate]

I am looking for a regular expression to use in my JavaScript code that gives me the last part of a URL without its parameters, if any exist. Here is an example, with and without parameters:
https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/14238253_132683573850463_7287992614234853254_n.jpg?oh=fdbf6800f33876a86ed17835cfce8e3b&oe=599548AC
https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/14238253_132683573850463_7287992614234853254_n.jpg
In both cases as result I want to get:
14238253_132683573850463_7287992614234853254_n.jpg
Here is this regexp
.*\/([^?]+)
and JS code:
let lastUrlPart = /.*\/([^?]+)/.exec(url)[1];
let lastUrlPart = url => /.*\/([^?]+)/.exec(url)[1];
// TEST
let t1 = "https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/14238253_132683573850463_7287992614234853254_n.jpg?oh=fdbf6800f33876a86ed17835cfce8e3b&oe=599548AC"
let t2 = "https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/14238253_132683573850463_7287992614234853254_n.jpg"
console.log(lastUrlPart(t1));
console.log(lastUrlPart(t2));
Maybe there are better alternatives?
You could always try doing it without regex. Split the URL by "/" and then parse out the last part of the URL.
var urlPart = url.split("/");
var img = urlPart[urlPart.length-1].split("?")[0];
That should get everything after the last "/" and before the first "?".
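The same "no regex" approach can be sketched in Python (an assumption; the question is about JavaScript) with the standard library's URL parser, which separates the query string before the path is inspected. The function name last_url_part is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def last_url_part(url: str) -> str:
    """Return the last path segment of a URL, ignoring any query string."""
    # urlsplit() puts everything after "?" into .query, so .path is clean.
    path = urlsplit(url).path
    # basename() returns what follows the final "/" in the path.
    return posixpath.basename(path)


print(last_url_part("https://example.com/v/t1.0-9/picture_n.jpg?oh=abc&oe=123"))
```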

Replace variable names with actual class Properties - Regex? (C#)

I need to send a custom email message to every User in a list (List<User>) that I have. (I'm using C# .NET.)
What I need to do is replace all the expressions that start with "[?&=", have a "variableName" in the middle, and end with "]" with the actual User property value.
So for example if I have a text like this:
"Hello, [?&=Name]. A gift will be sent to [?&=Address], [?&=Zipcode], [?&=Country].
If [?&=Email] is not your email address, please contact us."
I would like to get this for the user:
"Hello, Mary. A gift will be sent to Boulevard Spain 918, 11300, Uruguay.
If marytech@gmail.com is not your email address, please contact us."
Is there a practical and clean way to do this with Regex?
This is a good place to apply regex.
The regular expression you want looks like this: /\[\?&=(\w*)\]/
You will need to do a replace on the input string using a method that allows you to use a custom function for replacement values. Then inside that function use the first capture value as the Key so to say and pull the correct corresponding value.
Since you did not specify what language you are using I will be nice and give you an example in C# and JS that I made for my own projects just recently.
Pseudo-Code
Loop through matches
Key is in first capture group
Check if replacements dict/obj/db/... has value for the Key
if Yes, return Value
else return ""
C#
email = Regex.Replace(email, @"\[\?&=(\w*)\]",
match => //match contains a Key & Replacements dict has value for that key
match?.Groups[1].Value != null
&& replacements.ContainsKey(match.Groups[1].Value)
? replacements[match.Groups[1].Value]
: "");
JS
var content = text.replace(/\[\?&=(\w*)\]/g,
function (match, p1) {
return replacements[p1] || "";
});
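As a third illustration of the pseudo-code above, here is a minimal Python sketch (an assumption; the answer itself only gives C# and JS). It uses the same pattern with a replacement callback. The names PLACEHOLDER and fill_template are hypothetical, and unknown keys are replaced with an empty string, as in the pseudo-code.

```python
import re

# Same pattern as in the answer: "[?&=" + key + "]", key captured in group 1.
PLACEHOLDER = re.compile(r"\[\?&=(\w*)\]")


def fill_template(text: str, replacements: dict) -> str:
    """Replace every [?&=Key] with replacements[Key], or "" if Key is unknown."""
    return PLACEHOLDER.sub(
        lambda match: str(replacements.get(match.group(1), "")),
        text,
    )


user = {"Name": "Mary", "Country": "Uruguay"}
print(fill_template("Hello, [?&=Name] from [?&=Country].", user))
```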

Javascript regex replace

I have a language dropdown and a JavaScript function that changes the page to the selected language. I need help with my regex replace.
For example, I would like the first URL below to turn into the second:
http://localhost:7007/en/Product/Detail/1038
http://localhost:7007/fr/Product/Detail/1038
function languageChange(sender) {
var lang = $(sender).val();
var target = window.location.href;
target = target.replace(/(http:\/\/.*?)([a-zA-Z]{2})(.*$)/gim, '$1' + lang + '$3');
window.location = target;
}
Is your URL always the same structure? If so, you may not need a regex at all. Split the URL at each "/", replace index 3, then join the array back together with "/".
Here is a code sample:
function changeLanguage(url, newLang) {
var url = url.split('/');
url[3] = newLang;
return url.join('/');
}
changeLanguage('http://localhost:7007/en/Product/Detail/1038','Fr');
Note: I originally wrote "splice" instead of "join" in my response. Join is the correct method.
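A similar split-and-join approach can be sketched in Python (an assumption; the question is about JavaScript). Parsing the URL first means the scheme and host are never touched, and only the first path segment is swapped. The function name change_language is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def change_language(url: str, new_lang: str) -> str:
    """Replace the first path segment of url with new_lang."""
    parts = urlsplit(url)
    # "/en/Product" splits into ["", "en", "Product"]; index 1 is the language.
    segments = parts.path.split("/")
    if len(segments) > 1 and segments[1]:
        segments[1] = new_lang
    return urlunsplit(parts._replace(path="/".join(segments)))


print(change_language("http://localhost:7007/en/Product/Detail/1038", "fr"))
```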
Here is a function that processes any number of URLs within a string, and replaces the language part (the first part of path), only if exists and is from 2 to 4 chars long:
function changeLanguage(text, lang) {
return text.replace(
/\b(\w+:\/\/[^\/]+\/)[A-Z]{2,4}(?=[\/\s]|$)/gim,
'$1' + lang);
}
Edit: Converted to function format.
Use this regex:
target =
target.replace(/(https?:\/\/[^/]+)\/?([^/]*)(.*)/gi, '$1/' + lang + '$3');
For example, if lang = 'fr', target then holds http://localhost:7007/fr/Product/Detail/1038.

How can I make a regex to find instances of the word Project not in square brackets?

For example:
$lang['Select Project'] = 'Select Project OK';
$lang['Project'] = 'Project';
I want to find only the instances of the word 'Project' not contained within the square brackets.
I'm using ColdFusion studio's extended replace utility to do a global replace.
Any suggestions?
Code Sample Follows:
<?php
$lang['Project Message Board'] = 'Project Message Board';
$lang['Project'] = 'Project';
$lang['Post Message'] = 'Post Message';
$lang['To'] = 'To';
$lang['Everyone'] = 'Everyone';
$lang['From'] = 'From';
$lang['Private Messsage'] = 'Private Messsage';
$lang['Note: Only private message to programmer'] = '[ Note: Please enter programmers id for private message with comma separate operator ]';
$lang['Select Project'] = 'Select Project';
$lang['message_validation'] = 'Message';
$lang['You must be logged in as a programmer to post messages on the Project Message Board'] = 'You must be logged in as a programmer to post messages on the Project Message Board';
$lang['Your Message Has Been Posted Successfully'] = 'Your message has been posted successfully';
$lang['You must be logged to post messages on the Project Message Board'] = 'You must be logged to post messages on the Project Message Board';
$lang['You must be post project to invite programmers'] = 'You must be post project to invite programmers';
$lang['You must be logged to invite programmers'] = 'You must be logged to invite programmers';
$lang['There is no open project to Post Mail'] = 'There is no open project to Post Mail';
$lang['You are currently logged in as']='You are currently logged in as';
$lang['Tip']='Tip: You can post programming code by placing it within [code] and [/code] tags.';
$lang['Submit']='Submit';
$lang['Preview']='Preview';
$lang['Hide']='Hide';
$lang['Show']='Show';
$lang['You are currently logged in as']='You are currently logged in as';
A regexp for 'Project' to the right of an equals sign would be:
/=.*Project/
a regexp that also does what you ask for, 'Project' that has no equals sign to its right would be:
/Project[^=]*$/
or a match of your example lines comes to:
/^\$lang\['[^']+'\]\s+=\s+'Project';$/
By placing 'Project' in parentheses () you can use that match in a replacement; adding the /g flag finds all occurrences in the line.
Edit: Below didn't work because look-behind assertions have to be fixed-length. I am guessing that you want to do this because you want to do a global replace of "Project" with something else. In that case, borrowing rsp's idea of matching a 'Project' that is not followed by an equals sign, this should work:
/Project(?![^=]*\=)/
Here is some example code:
<?php
$str1 = "\$lang['Select Project'] = 'Select Project OK';";
$str2 = "\$lang['Project'] = 'Project';";
$str3 = "\$lang['No Project'] = 'Not Found';";
$str4 = "\$lang['Many Project'] = 'Select Project owner or Project name';";
$regex = '/Project(?![^=]*\=)/';
echo "<pre>\n";
//prints: $lang['Select Project'] = 'Select Assignment OK';
echo preg_replace($regex, 'Assignment', $str1) . "\n";
//prints: $lang['Project'] = 'Assignment';
echo preg_replace($regex, 'Assignment', $str2) . "\n";
//prints: $lang['No Project'] = 'Not Found';
echo preg_replace($regex, 'Assignment', $str3) . "\n";
//prints: $lang['Many Project'] = 'Select Assignment owner or Assignment name';
echo preg_replace($regex, 'Assignment', $str4) . "\n";
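The same negative lookahead can be sketched in Python (an assumption; the answer uses PHP). "Project" is replaced only when no equals sign follows it on the line, which is why the keys inside the square brackets are left alone. The names RHS_PROJECT and replace_rhs are hypothetical, and the sketch shares the original's limitation: a value that itself contains "=" would confuse it.

```python
import re

# "Project" not followed, anywhere later in the string, by an equals sign.
RHS_PROJECT = re.compile(r"Project(?![^=]*=)")


def replace_rhs(line: str, replacement: str = "Assignment") -> str:
    """Replace 'Project' only where it appears to the right of the '='."""
    return RHS_PROJECT.sub(replacement, line)


print(replace_rhs("$lang['Select Project'] = 'Select Project OK';"))
print(replace_rhs("$lang['No Project'] = 'Not Found';"))
```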
This should work:
/(?<=\=.*)Project/
That will match only the word "Project" if it appears after an equals sign. This means you could use it in a substitution too, if you want to replace "Project" on the right-hand-side with something else.
Thx for help. Not sure what is unclear? I just want to find all instances of the word 'Project' but only instances to the right of the equals sign (i.e. not included in square brackets). Hope that helps.
This actually looks like a tricky problem. Consider
[blah blah [yakkity] Project blah] Project [blah blah] [ Project
This is a parsing problem, and I don't know of any way to do it with one regex (but would be glad to learn one!). I'd probably do it procedurally, eliminating the pairs of brackets that did not contain other pairs until there were none left, then matching "Project".
While it's not clear what instances you want to find exactly, this will do:
^.+? = (.+?);
But you might consider using simple string manipulation of your language of choice.
edit
^.+?=.+?(Project).+?;$
will only match lines that have string Project after the equal sign.
[^\[]'[^'\[\]]+'[^\]] seems to accomplish what you want!
This one: [^\[]'[^'\[\]]*Project[^'\[\]]*' will find all quoted strings in the file that are not inside brackets and that contain the word Project.
Another edit: [^\[]'(?<ProjectString>[^'\[\]]*Project[^'\[\]]*)'[^\]]
This one matches the string, and returns it as the group "ProjectString". Any regex library should be able to pull that out sufficiently.

How to get domain name from URL

How can I fetch a domain name from a URL String?
Examples:
+----------------------+------------+
| input                | output     |
+----------------------+------------+
| www.google.com       | google     |
| www.mail.yahoo.com   | mail.yahoo |
| www.mail.yahoo.co.in | mail.yahoo |
| www.abc.au.uk        | abc        |
+----------------------+------------+
Related:
Matching a web address through regex
I once had to write such a regex for a company I worked for. The solution was this:
Get a list of every ccTLD and gTLD available. Your first stop should be IANA. The list from Mozilla looks great at first sight, but it lacks ac.uk, for example, so it is not really usable for this.
Join the list as in the example below. A warning: ordering is important! If org.uk appeared after uk, then example.org.uk would match org instead of example.
Example regex:
.*([^\.]+)(com|net|org|info|coop|int|co\.uk|org\.uk|ac\.uk|uk|__and so on__)$
This worked really well and also matched weird, unofficial top-levels like de.com and friends.
The upside:
Very fast if regex is optimally ordered
The downside of this solution is of course:
Handwritten regex which has to be updated manually if ccTLDs change or get added. Tedious job!
Very large regex so not very readable.
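To make the approach concrete, here is a minimal Python sketch that builds such a regex from a list instead of writing it by hand. The SUFFIXES list is a tiny hypothetical sample, not a real TLD list, and the names DOMAIN_RE and registered_label are made up for illustration. The list is sorted longest first to follow the ordering advice above; this particular pattern is also anchored on label boundaries, which is an addition of the sketch rather than part of the original regex.

```python
import re

# Hypothetical, deliberately tiny sample; a real list would come from IANA.
SUFFIXES = ["com", "net", "org", "uk", "co.uk", "org.uk", "ac.uk", "co.in"]

# Longest suffixes first, so "co.uk" is listed before "uk".
alternation = "|".join(
    re.escape(suffix) for suffix in sorted(SUFFIXES, key=len, reverse=True)
)
# One label ([^.]+) directly in front of a known suffix at the end of the host.
DOMAIN_RE = re.compile(r"(?:^|\.)([^.]+)\.(?:" + alternation + r")$")


def registered_label(host: str):
    """Return the label in front of the suffix, or None if no suffix matches."""
    match = DOMAIN_RE.search(host.lower())
    return match.group(1) if match else None


print(registered_label("www.example.org.uk"))
```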
A little late to the party, but:
const urls = [
'www.abc.au.uk',
'https://github.com',
'http://github.ca',
'https://www.google.ru',
'http://www.google.co.uk',
'www.yandex.com',
'yandex.ru',
'yandex'
]
urls.forEach(url => console.log(url.replace(/.+\/\/|www.|\..+/g, '')))
Extracting the Domain name accurately can be quite tricky mainly because the domain extension can contain 2 parts (like .com.au or .co.uk) and the subdomain (the prefix) may or may not be there. Listing all domain extensions is not an option because there are hundreds of these. EuroDNS.com for example lists over 800 domain name extensions.
I therefore wrote a short php function that uses 'parse_url()' and some observations about domain extensions to accurately extract the url components AND the domain name. The function is as follows:
function parse_url_all($url){
$url = substr($url,0,4)=='http'? $url: 'http://'.$url;
$d = parse_url($url);
$tmp = explode('.',$d['host']);
$n = count($tmp);
if ($n>=2){
if ($n==4 || ($n==3 && strlen($tmp[($n-2)])<=3)){
$d['domain'] = $tmp[($n-3)].".".$tmp[($n-2)].".".$tmp[($n-1)];
$d['domainX'] = $tmp[($n-3)];
} else {
$d['domain'] = $tmp[($n-2)].".".$tmp[($n-1)];
$d['domainX'] = $tmp[($n-2)];
}
}
return $d;
}
This simple function will work in almost every case. There are a few exceptions, but these are very rare.
To demonstrate / test this function you can use the following:
$urls = array('www.test.com', 'test.com', 'cp.test.com' .....);
echo "<div style='overflow-x:auto;'>";
echo "<table>";
echo "<tr><th>URL</th><th>Host</th><th>Domain</th><th>Domain X</th></tr>";
foreach ($urls as $url) {
$info = parse_url_all($url);
echo "<tr><td>".$url."</td><td>".$info['host'].
"</td><td>".$info['domain']."</td><td>".$info['domainX']."</td></tr>";
}
echo "</table></div>";
The output will be as follows for the URL's listed:
As you can see, the domain name and the domain name without the extension are consistently extracted whatever the URL that is presented to the function.
I hope that this helps.
/^(?:www\.)?(.*?)\.(?:com|au\.uk|co\.in)$/
There are two ways
Using split
Then just parse that string
var domain;
//find & remove protocol (http, ftp, etc.) and get domain
if (url.indexOf('://') > -1) {
domain = url.split('/')[2];
} else if (url.indexOf('//') === 0) {
domain = url.split('/')[2];
} else {
domain = url.split('/')[0];
}
//find & remove port number
domain = domain.split(':')[0];
Using Regex
var r = /:\/\/(.[^/]+)/;
"http://stackoverflow.com/questions/5343288/get-url".match(r)[1]
=> stackoverflow.com
Hope this helps
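For reference, the host extraction shown above can also be sketched in Python (an assumption; the answer uses JavaScript) with the standard library parser, which handles the scheme, the port, and any user information in one step. The function name host_of is hypothetical.

```python
from urllib.parse import urlsplit


def host_of(url: str):
    """Return the lowercased host of url, without port or credentials."""
    # urlsplit() only recognises a host when the string contains "//",
    # so add it for bare inputs such as "example.com/path".
    if "//" not in url:
        url = "//" + url
    return urlsplit(url).hostname


print(host_of("http://stackoverflow.com/questions/5343288/get-url"))
```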
I don't know of any libraries, but the string manipulation of domain names is easy enough.
The hard part is knowing whether the name is at the second or third level. For this you will need a data file that you maintain (e.g. for .uk it is not always the third level; some organisations, such as bl.uk and jet.uk, exist at the second level).
The source of Firefox from Mozilla has such a data file, check the Mozilla licensing to see if you could reuse that.
from urllib.parse import urlparse

GENERIC_TLDS = [
    'aero', 'asia', 'biz', 'com', 'coop', 'edu', 'gov', 'info', 'int', 'jobs',
    'mil', 'mobi', 'museum', 'name', 'net', 'org', 'pro', 'tel', 'travel', 'cat'
]

def get_domain(url):
    hostname = urlparse(url.lower()).netloc
    if hostname == '':
        # Force the recognition as a full URL
        hostname = urlparse('http://' + url.lower()).netloc
    # Remove the 'user:passw' and ':port' parts
    hostname = hostname.split('@')[-1].split(':')[0]
    # Remove a leading 'www.' (lstrip('www.') would strip characters, not the prefix)
    if hostname.startswith('www.'):
        hostname = hostname[4:]
    hostname = hostname.split('.')
    num_parts = len(hostname)
    if (num_parts < 3) or (len(hostname[-1]) > 2):
        return '.'.join(hostname[:-1])
    if len(hostname[-2]) > 2 and hostname[-2] not in GENERIC_TLDS:
        return '.'.join(hostname[:-1])
    if num_parts >= 3:
        return '.'.join(hostname[:-2])
This code isn't guaranteed to work with all URLs and doesn't filter those that are grammatically correct but invalid like 'example.uk'.
However it'll do the job in most cases.
It is not possible without a TLD list to compare against, as there exist many cases like http://www.db.de/ or http://bbc.co.uk/ that will be interpreted by a regex as the domains db.de (correct) and co.uk (wrong).
But even with that you won't have success if your list does not contain SLDs, too. URLs like http://big.uk.com/ and http://www.uk.com/ would be both interpreted as uk.com (the first domain is big.uk.com).
Because of that all browsers use Mozilla's Public Suffix List:
https://en.wikipedia.org/wiki/Public_Suffix_List
You can use it in your code by importing it through this URL:
http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
Feel free to extend my function to extract the domain name, only. It won't use regex and it is fast:
http://www.programmierer-forum.de/domainnamen-ermitteln-t244185.htm#3471878
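To illustrate how a suffix list is used, here is a minimal Python sketch of the longest-matching-suffix lookup. PUBLIC_SUFFIXES is a small hypothetical excerpt held in memory; the real Public Suffix List has thousands of entries plus wildcard and exception rules that this sketch ignores. The function name split_host is made up for illustration.

```python
# Hypothetical excerpt only; load the real list from its data file in practice.
PUBLIC_SUFFIXES = {"com", "de", "in", "uk", "co.in", "co.uk", "uk.com"}


def split_host(host: str):
    """Return (registrable_domain, suffix), or None if nothing matches."""
    labels = host.lower().strip(".").split(".")
    # Drop one leading label at a time, so the longest suffix is tried first.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the host is itself a public suffix
            # The registrable domain is the suffix plus one label to its left.
            return ".".join(labels[i - 1:]), suffix
    return None


print(split_host("www.db.de"))
print(split_host("bbc.co.uk"))
print(split_host("big.uk.com"))
```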
Basically, what you want is:
google.com -> google.com -> google
www.google.com -> google.com -> google
google.co.uk -> google.co.uk -> google
www.google.co.uk -> google.co.uk -> google
www.google.org -> google.org -> google
www.google.org.uk -> google.org.uk -> google
Optional:
www.google.com -> google.com -> www.google
images.google.com -> google.com -> images.google
mail.yahoo.co.uk -> yahoo.co.uk -> mail.yahoo
mail.yahoo.com -> yahoo.com -> mail.yahoo
www.mail.yahoo.com -> yahoo.com -> mail.yahoo
You don't need to construct an ever-changing regex as 99% of domains will be matched properly if you simply look at the 2nd last part of the name:
(co|com|gov|net|org)
If it is one of these, then you need to match 3 dots, else 2. Simple. Now, my regex wizardry is no match for that of some other SO'ers, so the best way I've found to achieve this is with some code, assuming you've already stripped off the path:
my @d=split /\./,$domain; # split the domain part into an array
$c=@d; # count how many parts
$dest=$d[$c-2].'.'.$d[$c-1]; # use the last 2 parts
if ($d[$c-2]=~m/(co|com|gov|net|org)/) { # is the second-last part one of these?
$dest=$d[$c-3].'.'.$dest; # if so, add a third part
};
print $dest; # show it
To just get the name, as per your question:
my @d=split /\./,$domain; # split the domain part into an array
$c=@d; # count how many parts
if ($d[$c-2]=~m/(co|com|gov|net|org)/) { # is the second-last part one of these?
$dest=$d[$c-3]; # if so, give the third last
$dest=$d[$c-4].'.'.$dest if ($c>3); # optional bit
} else {
$dest=$d[$c-2]; # else the second last
$dest=$d[$c-3].'.'.$dest if ($c>2); # optional bit
};
print $dest; # show it
I like this approach because it's maintenance-free. Unless you want to validate that it's actually a legitimate domain, but that's kind of pointless because you're most likely only using this to process log files and an invalid domain wouldn't find its way in there in the first place.
If you'd like to match "unofficial" subdomains such as bozo.za.net, or bozo.au.uk, bozo.msf.ru just add (za|au|msf) to the regex.
I'd love to see someone do all of this using just a regex, I'm sure it's possible.
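The second-to-last-part heuristic from the Perl snippets above can be sketched in Python as follows (an assumption; the answer uses Perl). One deliberate difference: the sketch compares the label exactly against the set, whereas the unanchored Perl match would also accept labels that merely contain "co" or "com". The names SECOND_LEVEL and short_domain are hypothetical.

```python
# Labels that usually indicate a two-part suffix such as "co.uk" or "org.uk".
SECOND_LEVEL = {"co", "com", "gov", "net", "org"}


def short_domain(host: str) -> str:
    """Keep the last three labels if the second-to-last is in SECOND_LEVEL, else two."""
    labels = host.lower().split(".")
    keep = 3 if len(labels) >= 3 and labels[-2] in SECOND_LEVEL else 2
    return ".".join(labels[-keep:])


print(short_domain("www.google.co.uk"))
print(short_domain("www.google.com"))
```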
/[^w{3}\.]([a-zA-Z0-9]([a-zA-Z0-9\-]{0,65}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}/gim
This JavaScript regex ignores www and the following dot while keeping the domain intact. It also properly matches hosts without www and ccTLDs.
Could you just look for the word before .com (or another suffix) and take the first match? The suffixes in the list would be ordered by the opposite of their frequency.
For example:
window.location.host.match(/(\w|-)+(?=(\.(com|net|org|info|coop|int|co|ac|ie|co|ai|eu|ca|icu|top|xyz|tk|cn|ga|cf|nl|us|eu|de|hk|am|tv|bingo|blackfriday|gov|edu|mil|arpa|au|ru)(\.|\/|$)))/g)[0]
You can test it by copying this line into the developer console on any tab.
This example works in the following cases:
So if you just have a string and not a window.location you could use...
String.prototype.toUrl = function(){
if(this.length === 0)
{
return undefined;
}
var original = this.toString();
var s = original;
if(!original.toLowerCase().startsWith('http'))
{
s = 'http://' + original;
}
s = s.split('/');
var protocol = s[0];
var host = s[2];
var relativePath = '';
if(s.length > 3){
for(var i=3;i< s.length;i++)
{
relativePath += '/' + s[i];
}
}
s = host.split('.');
var domain = s[s.length-2] + '.' + s[s.length-1];
return {
original: original,
protocol: protocol,
domain: domain,
host: host,
relativePath: relativePath,
getParameter: function(param)
{
return this.getParameters()[param];
},
getParameters: function(){
var vars = [], hash;
var hashes = this.original.slice(this.original.indexOf('?') + 1).split('&');
for (var i = 0; i < hashes.length; i++) {
hash = hashes[i].split('=');
vars.push(hash[0]);
vars[hash[0]] = hash[1];
}
return vars;
}
};};
How to use.
var str = "http://en.wikipedia.org/wiki/Knopf?q=1&t=2";
var url = str.toUrl();
var host = url.host;
var domain = url.domain;
var original = url.original;
var relativePath = url.relativePath;
var paramQ = url.getParameter('q');
var paramT = url.getParameter('t');
For a certain purpose I did this quick Python function yesterday. It returns domain from URL. It's quick and doesn't need any input file listing stuff. However, I don't pretend it works in all cases, but it really does the job I needed for a simple text mining script.
Output looks like this :
http://www.google.co.uk => google.co.uk
http://24.media.tumblr.com/tumblr_m04s34rqh567ij78k_250.gif => tumblr.com
import re

def getDomain(url):
    # parts[2] is the host when the URL has the form scheme://host/path
    parts = re.split(r"/", url)
    match = re.match(r"([\w\-]+\.)*([\w\-]+\.\w{2,6}$)", parts[2])
    if match is not None:
        if re.search(r"\.uk", parts[2]):
            match = re.match(r"([\w\-]+\.)*([\w\-]+\.[\w\-]+\.\w{2,6}$)", parts[2])
        return match.group(2)
    else:
        return ''
Seems to work pretty well.
However, it has to be modified to remove domain extensions on output as you wished.
how is this
=((?:(?:(?:http)s?:)?\/\/)?(?:(?:[a-zA-Z0-9]+)\.?)*(?:(?:[a-zA-Z0-9]+))\.[a-zA-Z0-9]{2,3})
(you may want to add "\/" to the end of the pattern).
If your goal is to strip URLs passed in as a parameter, you can add the equals sign as the first character, like:
=((?:(?:(?:http)s?:)?//)?(?:(?:[a-zA-Z0-9]+).?)*(?:(?:[a-zA-Z0-9]+)).[a-zA-Z0-9]{2,3}/)
and replace with "/".
The goal of this example is to get rid of any domain name regardless of the form it appears in
(i.e. to ensure URL parameters don't include domain names, to avoid XSS attacks).
All the answers here are very nice, but all of them will fail sometimes.
I know it is not common to link to something already answered elsewhere, but it will keep you from wasting time on an impossible task.
For domains like mydomain.co.uk there is no way to know, from the string alone, whether an extracted domain is correct.
If you are extracting from URLs that always have http or https in front (if they may have nothing in front, remove the
filter_var($url, FILTER_VALIDATE_URL)
check below, because FILTER_VALIDATE_URL does not recognize a string that does not begin with a scheme as a URL), you can get away with something as simple as this:
$url = strtolower('hTTps://www.example.com/w3/forum/index.php');
if( filter_var($url, FILTER_VALIDATE_URL) && substr($url, 0, 4) == 'http' )
{
// array order is !important
$domain = str_replace(array("http://www.","https://www.","http://","https://"), array("","","",""), $url);
$spos = strpos($domain,'/');
if($spos !== false)
{
$domain = substr($domain, 0, $spos);
} } else { $domain = "can't extract a domain"; }
echo $domain;
Check the default behavior of FILTER_VALIDATE_URL in the PHP manual.
But if you want to check a domain for validity and ALWAYS be sure that the extracted value is correct, you have to check against an array of valid top-level domains, as explained here:
https://stackoverflow.com/a/70566657/6399448
Otherwise you will NEVER be sure that the extracted string is the correct domain. Unfortunately, all the answers here will fail sometimes.
P.S. The only answer here that makes sense to me is this one (sorry, I had not read it before; it proposes the same solution, although without an example like the one above):
https://stackoverflow.com/a/569219/6399448
I know you actually asked for Regex and were not specific to a language. But In Javascript you can do this like this. Maybe other languages can parse URL in a similar way.
Easy Javascript solution
const domain = (new URL(str)).hostname.replace("www.", "");
Leave this solution in js for completeness.
In Javascript, the best way to do this is using the tld-extract npm package. Check out an example at the following link.
Below is the code for the same:
var tldExtract = require("tld-extract")
const urls = [
'http://www.mail.yahoo.co.in/',
'https://mail.yahoo.com/',
'https://www.abc.au.uk',
'https://github.com',
'http://github.ca',
'https://www.google.ru',
'https://google.co.uk',
'https://www.yandex.com',
'https://yandex.ru',
]
const tldList = [];
urls.forEach(url => tldList.push(tldExtract(url)))
console.log({tldList})
which results in the following output:
0: Object {tld: "co.in", domain: "yahoo.co.in", sub: "www.mail"}
1: Object {tld: "com", domain: "yahoo.com", sub: "mail"}
2: Object {tld: "uk", domain: "au.uk", sub: "www.abc"}
3: Object {tld: "com", domain: "github.com", sub: ""}
4: Object {tld: "ca", domain: "github.ca", sub: ""}
5: Object {tld: "ru", domain: "google.ru", sub: "www"}
6: Object {tld: "co.uk", domain: "google.co.uk", sub: ""}
7: Object {tld: "com", domain: "yandex.com", sub: "www"}
8: Object {tld: "ru", domain: "yandex.ru", sub: ""}
Found a custom function which works in most of the cases:
function getDomainWithoutSubdomain(url) {
const urlParts = new URL(url).hostname.split('.')
return urlParts
.slice(0)
.slice(-(urlParts.length === 4 ? 3 : 2))
.join('.')
}
You need a list of what domain prefixes and suffixes can be removed. For example:
Prefixes:
www.
Suffixes:
.com
.co.in
.au.uk
#!/usr/bin/perl -w
use strict;
my $url = $ARGV[0];
if($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+)\.[^\/]+/g) {
print $3;
}
/^(?:https?:\/\/)?(?:www\.)?([^\/]+)/i
Just for knowledge:
'http://api.livreto.co/books'.replace(/^(https?:\/\/)([a-z]{3}[0-9]?\.)?(\w+)(\.[a-zA-Z]{2,3})(\.[a-zA-Z]{2,3})?.*$/, '$3$4$5');
// returns livreto.co
I know the question is seeking a regex solution but in every attempt it won't work to cover everything
I decided to write this method in Python which only works with urls that have a subdomain (i.e. www.mydomain.co.uk) and not multiple level subdomains like www.mail.yahoo.com
def urlextract(url):
    url_split = url.split(".")
    if len(url_split) <= 2:
        raise Exception("Full url required with subdomain:", url)
    return {'subdomain': url_split[0], 'domain': url_split[1], 'suffix': ".".join(url_split[2:])}
Let's say we have this: http://google.com
and you only want the domain name
let url = "http://google.com";
let domainName = url.split("://")[1];
console.log(domainName);
Use this
(.)(.*?)(.)
then just extract the leading and end points.
Easy, right?