How to remove +91 from Indian phone numbers in Ameyo - phone-call

In an inbound contact center for customer service, the phone numbers in the callerid are prefixed with 0 or sometimes with 91. A few inhouse CRMs require very precise phone number without these prefixes. For the CRM popup to work well, it is required that Ameyo removes such prefix, if present. How can it be done

Enable Call Context Inbound Srcphone Script from configuration and paste the following script.
​
if(null !=srcphone && srcphone.length>10 && srcphone.charAt(0) =='9' && srcphone.charAt(1) =='1'){
var srcphoneWithout91 = srcphone.substring(2,srcphone.length);
srcphone=srcphoneWithout91;
}
else if(null !=srcphone && srcphone.length>10 && srcphone.charAt(0) =='0'){
var srcphoneWithout91 = srcphone.substring(1,srcphone.length);
srcphone=srcphoneWithout91;
}
else {
srcphone;
}

Related

Pulling Drive activity report through GCP, is there a way to see folder path?

I am supposed to generate drive activity report so we can track what type of file users are using and where is the file being created (My Drive/shared drive).
I used the GAM command to pull drive activity report which has various fields except for the root path.
Does anyone know a way i can manipulate that so i can get a field that shows folder path as well.
Thanks!
You can try these particular GAM commands so you can edit them later to gather information of the folders and root folders:
gam user <User Email Address> print filetree depth 0 showmimetype gfolder excludetrashed todrive
You can edit the depth, for example orphaned folders when using -1. I am not familiar with which command you use, but you might need to mix or add some fields so it shows the root folder or path.
gam user <User Email Address> print filelist todrive select 1Yvxxxxxxxxxxxxxxxxxxxxxxjif9 showmimetype gfolder fields id
You might need to add over your command something like "print filetree" or "show filepath"
Reference:
https://github.com/taers232c/GAMADV-XTD3/wiki/Users-Drive-Files-Display
I have created a custom menu that iterates through a table of data, the data must have a column with the file IDs of interest and 2 additional columns for owner and path, since the file can be owned by either a user or a shared drive. The user running the function must have Super Admin rights to access files owned by other users and the user in question must be a member of a shared drive for the file to be located. My previous implementation as a custom function failed to address a limitation of this feature where advanced services are inaccessible.
The custom menu is created as explained in this documentation article https://developers.google.com/apps-script/guides/menus. There must be a trigger that executes when the sheet opens the menu is created.
In addition to that the code requires the use of Advanced Services, Google Drive must be added following the steps of this other article https://developers.google.com/apps-script/guides/services/advanced#enable_advanced_services. The advanced service will ask for authorization but the first time the code is executed. You may expedite the process by creating an empty function and running it.
function onOpen() {
var ui = SpreadsheetApp.getUi();
ui.createMenu('File ownership').addItem('Read data', 'readData').addToUi();
}
function readData() {
var sheetData = SpreadsheetApp.getActiveSheet().getDataRange().getValues();
var i = 0;
for (; i < sheetData.length; i++){
if (sheetData[0][i] == '')
break;
}
SpreadsheetApp.getUi().alert('There are ' + i + ' cells with data.');
for (i = 1; i < sheetData.length; i++){
var fileID = sheetData[i][0];
var owner = getFileOwner(fileID);
var path = getFilePath(fileID);
SpreadsheetApp.getActiveSheet().getRange(i + 1,2).setValue(owner);
SpreadsheetApp.getActiveSheet().getRange(i + 1,3).setValue(path );
}
SpreadsheetApp.getUi().alert('The owner and file path have been populated');
}
function getFilePath(fileID, filePath = ""){
try {
var file = Drive.Files.get(fileID,{
supportsAllDrives: true
});
if (!file.parents[0])
return "/" + filePath;
var parent = file.parents[0];
var parentFile = Drive.Files.get(parent.id,{ supportsAllDrives: true });
var parentPath = parentFile.title;
if (parent.isRoot || parentFile.parents.length == 0)
return "/" + filePath;
else {
return getFilePath(
parentFile.id,
parentPath + "/" + filePath);
}
}
catch (GoogleJsonResponseException){
return "File inaccesible"
}
}
function getFileOwner(fileID){
try {
var file = Drive.Files.get(
fileID,
{
supportsAllDrives: true
});
var driveId = file.driveId;
if (driveId){
var driveName = Drive.Drives.get(driveId).name;
return driveName + "(" + driveId + ")";
}
var ownerEmailAddress = file.owners[0].emailAddress;
return ownerEmailAddress;
}
catch (GoogleJsonResponseException){
return "File inaccesible"
}
}
After executing the function, it will take significantly longer the more files IDs it has, the cells will be updated with their respective owner and path.
Note: with a Super Admin account you can programmatically create a view permission for shared drives you don't have access to using APIs or Apps Script, you may submit a separate question for more details or read the documentation in the developer page at https://developers.google.com/drive/api/v2/reference/permissions.

How To Change The Text of The Email Verification Sent Via User.verify(...) in Loopback.js

I appreciate all the need-to-haves that come right out of the box with Loopback.js but one area that could use some flexibility is the email verification that is sent upon user creation. This GitHub project helps illustrate this feature--but no where on SO or the Google Groups or in the documentation (yes, I checked there first) does it show how the actual text of the email verification can be changed.
I implemented the exact same code (as found at the previously referred-to GitHub project) in "verify.ejs"...namely:
<%= text %>
Right now the text that is inserted says:
Please verify your email by opening this link in a web browser:
I would like to refer to this user-interaction as "account activation"--not "email verification". The project has its own requirements that compels me to implement the change in semantics. Thank you in advance.
(You know open source rocks for so many reasons...the top reason for me right now is self-documented code.)
I looked at the source of User.verify(...) and discovered that the options which can be passed in are much more expansive than documented.
In the following code snippets (from Loopback's User model) you'll see what I mean:
options.host = options.host || (app && app.get('host')) || 'localhost';
options.port = options.port || (app && app.get('port')) || 3000;
// ### (later) ### //
options.text = options.text || 'Please verify your email by opening this link in a web browser:\n\t{href}';
options.text = options.text.replace('{href}', options.verifyHref);
So, in short, set these parameters in the options object passed in to User.verify():
var options = {
host: 'http://some.domain.com',
port: 5000,
text: 'Please activate your account by clicking on this link or copying and pasting it in a new browser window:\n\t{href}'
}
Source code for User.verify(..) found at: https://github.com/strongloop/loopback/blob/master/common/models/user.js
tx, that helped me.
to expand, the verification link options are
options.verifyHref = options.verifyHref ||
options.protocol +
'://' +
options.host +
displayPort +
options.restApiRoot +
userModel.http.path +
userModel.sharedClass.find('confirm', true).http.path +
'?uid=' +
options.user.id +
'&redirect=' +
options.redirect;
so, all the options there can be customized (particularly, options.restApiRoot was missing for me above).
alternatively, just set options.verifyHref yourself and only that one will be send as verification link
if you want just the href:
var options = {
text: '{href}'
}

Dashcode scroll area refresh with multiple writes

I'm creating a Dashcode project that allows me to mount remote drives. I have a Scroll Area named "statusArea" defined to contain status text of the mounting progress, kind of a command output area.
The first time through testing I'll force the "ERROR: Must enter " text to be written. Then I'll go to the back to provide a username and password and try again, but the box is never cleared for new text.
// get configuration fields from back panel
var usernameField = document.getElementById("username");
var username = usernameField.value;
var passwordField = document.getElementById("password");
var password = passwordField.value;
// clear status
var statusArea = document.getElementById("statusArea");
var status = document.getElementById("content");
status.innerText = "";
statusArea.object.refresh();
if ( !username || !password) {
status.innerText = "ERROR: Must enter username and password on back panel!";
alert("Must enter username and password on back panel!");
}
if (username && password) {
status.innerText = "Connecting as "+username;
Am I going about this the wrong way?
Thanks,
Bill

Finding the phone company of a cell phone number?

I have an application where people can give a phone number and it will send SMS texts to the phone number through EMail-SMS gateways. For this to work however, I need the phone company of the given number so that I send the email to the proper SMS gateway. I've seen some services that allow you to look up this information, but none of them in the form of a web service or database.
For instance, http://tnid.us provides such a service. Example output from my phone number:
Where do they get the "Current Telephone Company" information for each number. Is this freely available information? Is there a database or some sort of web service I can use to get that information for a given cell phone number?
What you need is called a HLR (Home Location Register) number lookup.
In their basic forms such APIs will expect a phone number in international format (example, +15121234567) and will return back their IMSI, which includes their MCC (gives you the country) and MNC (gives you the phone's carrier). The may even include the phone's current carrier (eg to tell if the phone is roaming). It may not work if the phone is currently out of range or turned off. In those cases, depending on the API provider, they may give you a cached result.
The site you mentioned seems to provide such functionality. A web search for "HLR lookup API" will give you plenty more results. I have personal experience with CLX's service and would recommend it.
This would be pretty code intensive, but something you could do right now, on your own, without APIs as long as the tnid.us site is around:
Why not have IE open in a hidden browser window with the URL of the phone number? It looks like the URL would take the format of http://tnid.us/search.php?q=########## where # represents a number. So you need a textbox, a label, and a button. I call the textbox "txtPhoneNumber", the label "lblCarrier", and the button would call the function I have below "OnClick".
The button function creates the IE instance using MSHtml.dll and SHDocVW.dll and does a page scrape of the HTML that is in your browser "object". You then parse it down. You have to first install the Interoperability Assemblies that came with Visual Studio 2005 (C:\Program Files\Common Files\Merge Modules\vs_piaredist.exe). Then:
1> Create a new web project in Visual Studio.NET.
2> Add a reference to SHDocVw.dll and Microsoft.mshtml.
3> In default.aspx.cs, add these lines at the top:
using mshtml;
using SHDocVw;
using System.Threading;
4> Add the following function :
protected void executeMSIE(Object sender, EventArgs e)
{
SHDocVw.InternetExplorer ie = new SHDocVw.InternetExplorerClass();
object o = System.Reflection.Missing.Value;
TextBox txtPhoneNumber = (TextBox)this.Page.FindControl("txtPhoneNumber");
object url = "http://tnid.us/search.php?q=" + txtPhoneNumber.Text);
StringBuilder sb = new StringBuilder();
if (ie != null) {
ie.Navigate2(ref url,ref o,ref o,ref o,ref o);
ie.Visible = false;
while(ie.Busy){Thread.Sleep(2);}
IHTMLDocument2 d = (IHTMLDocument2) ie.Document;
if (d != null) {
IHTMLElementCollection all = d.all;
string ourText = String.Empty;
foreach (object el in all)
{
//find the text by checking each (string)el.Text
if ((string)el.ToString().Contains("Current Phone Company"))
ourText = (string)el.ToString();
}
// or maybe do something like this instead of the loop above...
// HTMLInputElement searchText = (HTMLInputElement)d.all.item("p", 0);
int idx = 0;
// and do a foreach on searchText to find the right "<p>"...
foreach (string s in searchText) {
if (s.Contains("Current Phone Company") || s.Contains("Original Phone Company")) {
idx = s.IndexOf("<strong>") + 8;
ourText = s.Substring(idx);
idx = ourText.IndexOf('<');
ourText = ourText.Substring(0, idx);
}
}
// ... then decode "ourText"
string[] ourArray = ourText.Split(';');
foreach (string s in ourArray) {
char c = (char)s.Split('#')[1];
sb.Append(c.ToString());
}
// sb.ToString() is now your phone company carrier....
}
}
if (sb != null)
lblCarrier.Text = sb.ToString();
else
lblCarrier.Text = "No MSIE?";
}
For some reason I don't get the "Current Phone Company" when I just use the tnid.us site directly, though, only the Original. So you might want to have the code test what it's getting back, i.e.
bool currentCompanyFound = false;
if (s.Contains("Current Telephone Company")) { currentCompanyFound = true }
I have it checking for either one, above, so you get something back. What the code should do is to find the area of HTML between
<p class="lt">Current Telephone Company:<br /><strong>
and
</strong></p>
I have it looking for the index of
<strong>
and adding on the characters of that word to get to the starting position. I can't remember if you can use strings or only characters for .indexOf. But you get the point and you or someone else can probably find a way to get it working from there.
That text you get back is encoded with char codes, so you'd have to convert those. I gave you some code above that should assist in that... it's untested and completely from my head, but it should work or get you where you're going.
Did you look just slightly farther down on the tnid.us result page?
Need API access? Contact sales#tnID.us.
[Disclosure: I work for Twilio]
You can retrieve phone number information with Twilio Lookup.
If you are currently evaluating services and functionality for phone number lookup, I'd suggest giving Lookup a try via the quickstart.

How to get domain name from URL

How can I fetch a domain name from a URL String?
Examples:
+----------------------+------------+
| input | output |
+----------------------+------------+
| www.google.com | google |
| www.mail.yahoo.com | mail.yahoo |
| www.mail.yahoo.co.in | mail.yahoo |
| www.abc.au.uk | abc |
+----------------------+------------+
Related:
Matching a web address through regex
I once had to write such a regex for a company I worked for. The solution was this:
Get a list of every ccTLD and gTLD available. Your first stop should be IANA. The list from Mozilla looks great at first sight, but lacks ac.uk for example so for this it is not really usable.
Join the list like the example below. A warning: Ordering is important! If org.uk would appear after uk then example.org.uk would match org instead of example.
Example regex:
.*([^\.]+)(com|net|org|info|coop|int|co\.uk|org\.uk|ac\.uk|uk|__and so on__)$
This worked really well and also matched weird, unofficial top-levels like de.com and friends.
The upside:
Very fast if regex is optimally ordered
The downside of this solution is of course:
Handwritten regex which has to be updated manually if ccTLDs change or get added. Tedious job!
Very large regex so not very readable.
A little late to the party, but:
const urls = [
'www.abc.au.uk',
'https://github.com',
'http://github.ca',
'https://www.google.ru',
'http://www.google.co.uk',
'www.yandex.com',
'yandex.ru',
'yandex'
]
urls.forEach(url => console.log(url.replace(/.+\/\/|www.|\..+/g, '')))
Extracting the Domain name accurately can be quite tricky mainly because the domain extension can contain 2 parts (like .com.au or .co.uk) and the subdomain (the prefix) may or may not be there. Listing all domain extensions is not an option because there are hundreds of these. EuroDNS.com for example lists over 800 domain name extensions.
I therefore wrote a short php function that uses 'parse_url()' and some observations about domain extensions to accurately extract the url components AND the domain name. The function is as follows:
function parse_url_all($url){
$url = substr($url,0,4)=='http'? $url: 'http://'.$url;
$d = parse_url($url);
$tmp = explode('.',$d['host']);
$n = count($tmp);
if ($n>=2){
if ($n==4 || ($n==3 && strlen($tmp[($n-2)])<=3)){
$d['domain'] = $tmp[($n-3)].".".$tmp[($n-2)].".".$tmp[($n-1)];
$d['domainX'] = $tmp[($n-3)];
} else {
$d['domain'] = $tmp[($n-2)].".".$tmp[($n-1)];
$d['domainX'] = $tmp[($n-2)];
}
}
return $d;
}
This simple function will work in almost every case. There are a few exceptions, but these are very rare.
To demonstrate / test this function you can use the following:
$urls = array('www.test.com', 'test.com', 'cp.test.com' .....);
echo "<div style='overflow-x:auto;'>";
echo "<table>";
echo "<tr><th>URL</th><th>Host</th><th>Domain</th><th>Domain X</th></tr>";
foreach ($urls as $url) {
$info = parse_url_all($url);
echo "<tr><td>".$url."</td><td>".$info['host'].
"</td><td>".$info['domain']."</td><td>".$info['domainX']."</td></tr>";
}
echo "</table></div>";
The output will be as follows for the URL's listed:
As you can see, the domain name and the domain name without the extension are consistently extracted whatever the URL that is presented to the function.
I hope that this helps.
/^(?:www\.)?(.*?)\.(?:com|au\.uk|co\.in)$/
There are two ways
Using split
Then just parse that string
var domain;
//find & remove protocol (http, ftp, etc.) and get domain
if (url.indexOf('://') > -1) {
domain = url.split('/')[2];
} if (url.indexOf('//') === 0) {
domain = url.split('/')[2];
} else {
domain = url.split('/')[0];
}
//find & remove port number
domain = domain.split(':')[0];
Using Regex
var r = /:\/\/(.[^/]+)/;
"http://stackoverflow.com/questions/5343288/get-url".match(r)[1]
=> stackoverflow.com
Hope this helps
I don't know of any libraries, but the string manipulation of domain names is easy enough.
The hard part is knowing if the name is at the second or third level. For this you will need a data file you maintain (e.g. for .uk is is not always the third level, some organisations (e.g. bl.uk, jet.uk) exist at the second level).
The source of Firefox from Mozilla has such a data file, check the Mozilla licensing to see if you could reuse that.
import urlparse
GENERIC_TLDS = [
'aero', 'asia', 'biz', 'com', 'coop', 'edu', 'gov', 'info', 'int', 'jobs',
'mil', 'mobi', 'museum', 'name', 'net', 'org', 'pro', 'tel', 'travel', 'cat'
]
def get_domain(url):
hostname = urlparse.urlparse(url.lower()).netloc
if hostname == '':
# Force the recognition as a full URL
hostname = urlparse.urlparse('http://' + uri).netloc
# Remove the 'user:passw', 'www.' and ':port' parts
hostname = hostname.split('#')[-1].split(':')[0].lstrip('www.').split('.')
num_parts = len(hostname)
if (num_parts < 3) or (len(hostname[-1]) > 2):
return '.'.join(hostname[:-1])
if len(hostname[-2]) > 2 and hostname[-2] not in GENERIC_TLDS:
return '.'.join(hostname[:-1])
if num_parts >= 3:
return '.'.join(hostname[:-2])
This code isn't guaranteed to work with all URLs and doesn't filter those that are grammatically correct but invalid like 'example.uk'.
However it'll do the job in most cases.
It is not possible without using a TLD list to compare with as their exist many cases like http://www.db.de/ or http://bbc.co.uk/ that will be interpreted by a regex as the domains db.de (correct) and co.uk (wrong).
But even with that you won't have success if your list does not contain SLDs, too. URLs like http://big.uk.com/ and http://www.uk.com/ would be both interpreted as uk.com (the first domain is big.uk.com).
Because of that all browsers use Mozilla's Public Suffix List:
https://en.wikipedia.org/wiki/Public_Suffix_List
You can use it in your code by importing it through this URL:
http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
Feel free to extend my function to extract the domain name, only. It won't use regex and it is fast:
http://www.programmierer-forum.de/domainnamen-ermitteln-t244185.htm#3471878
Basically, what you want is:
google.com -> google.com -> google
www.google.com -> google.com -> google
google.co.uk -> google.co.uk -> google
www.google.co.uk -> google.co.uk -> google
www.google.org -> google.org -> google
www.google.org.uk -> google.org.uk -> google
Optional:
www.google.com -> google.com -> www.google
images.google.com -> google.com -> images.google
mail.yahoo.co.uk -> yahoo.co.uk -> mail.yahoo
mail.yahoo.com -> yahoo.com -> mail.yahoo
www.mail.yahoo.com -> yahoo.com -> mail.yahoo
You don't need to construct an ever-changing regex as 99% of domains will be matched properly if you simply look at the 2nd last part of the name:
(co|com|gov|net|org)
If it is one of these, then you need to match 3 dots, else 2. Simple. Now, my regex wizardry is no match for that of some other SO'ers, so the best way I've found to achieve this is with some code, assuming you've already stripped off the path:
my #d=split /\./,$domain; # split the domain part into an array
$c=#d; # count how many parts
$dest=$d[$c-2].'.'.$d[$c-1]; # use the last 2 parts
if ($d[$c-2]=~m/(co|com|gov|net|org)/) { # is the second-last part one of these?
$dest=$d[$c-3].'.'.$dest; # if so, add a third part
};
print $dest; # show it
To just get the name, as per your question:
my #d=split /\./,$domain; # split the domain part into an array
$c=#d; # count how many parts
if ($d[$c-2]=~m/(co|com|gov|net|org)/) { # is the second-last part one of these?
$dest=$d[$c-3]; # if so, give the third last
$dest=$d[$c-4].'.'.$dest if ($c>3); # optional bit
} else {
$dest=$d[$c-2]; # else the second last
$dest=$d[$c-3].'.'.$dest if ($c>2); # optional bit
};
print $dest; # show it
I like this approach because it's maintenance-free. Unless you want to validate that it's actually a legitimate domain, but that's kind of pointless because you're most likely only using this to process log files and an invalid domain wouldn't find its way in there in the first place.
If you'd like to match "unofficial" subdomains such as bozo.za.net, or bozo.au.uk, bozo.msf.ru just add (za|au|msf) to the regex.
I'd love to see someone do all of this using just a regex, I'm sure it's possible.
/[^w{3}\.]([a-zA-Z0-9]([a-zA-Z0-9\-]{0,65}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}/gim
usage of this javascript regex ignores www and following dot, while retaining the domain intact. also properly matches no www and cc tld
Could you just look for the word before .com (or other) (the order of the other list would be the opposite of the frequency see here
and take the first matching group
i.e.
window.location.host.match(/(\w|-)+(?=(\.(com|net|org|info|coop|int|co|ac|ie|co|ai|eu|ca|icu|top|xyz|tk|cn|ga|cf|nl|us|eu|de|hk|am|tv|bingo|blackfriday|gov|edu|mil|arpa|au|ru)(\.|\/|$)))/g)[0]
You can test it could by copying this line into the developers' console on any tab
This example works in the following cases:
So if you just have a string and not a window.location you could use...
String.prototype.toUrl = function(){
if(!this && 0 < this.length)
{
return undefined;
}
var original = this.toString();
var s = original;
if(!original.toLowerCase().startsWith('http'))
{
s = 'http://' + original;
}
s = this.split('/');
var protocol = s[0];
var host = s[2];
var relativePath = '';
if(s.length > 3){
for(var i=3;i< s.length;i++)
{
relativePath += '/' + s[i];
}
}
s = host.split('.');
var domain = s[s.length-2] + '.' + s[s.length-1];
return {
original: original,
protocol: protocol,
domain: domain,
host: host,
relativePath: relativePath,
getParameter: function(param)
{
return this.getParameters()[param];
},
getParameters: function(){
var vars = [], hash;
var hashes = this.original.slice(this.original.indexOf('?') + 1).split('&');
for (var i = 0; i < hashes.length; i++) {
hash = hashes[i].split('=');
vars.push(hash[0]);
vars[hash[0]] = hash[1];
}
return vars;
}
};};
How to use.
var str = "http://en.wikipedia.org/wiki/Knopf?q=1&t=2";
var url = str.toUrl;
var host = url.host;
var domain = url.domain;
var original = url.original;
var relativePath = url.relativePath;
var paramQ = url.getParameter('q');
var paramT = url.getParamter('t');
For a certain purpose I did this quick Python function yesterday. It returns domain from URL. It's quick and doesn't need any input file listing stuff. However, I don't pretend it works in all cases, but it really does the job I needed for a simple text mining script.
Output looks like this :
http://www.google.co.uk => google.co.uk
http://24.media.tumblr.com/tumblr_m04s34rqh567ij78k_250.gif => tumblr.com
def getDomain(url):
parts = re.split("\/", url)
match = re.match("([\w\-]+\.)*([\w\-]+\.\w{2,6}$)", parts[2])
if match != None:
if re.search("\.uk", parts[2]):
match = re.match("([\w\-]+\.)*([\w\-]+\.[\w\-]+\.\w{2,6}$)", parts[2])
return match.group(2)
else: return ''
Seems to work pretty well.
However, it has to be modified to remove domain extensions on output as you wished.
how is this
=((?:(?:(?:http)s?:)?\/\/)?(?:(?:[a-zA-Z0-9]+)\.?)*(?:(?:[a-zA-Z0-9]+))\.[a-zA-Z0-9]{2,3})
(you may want to add "\/" to end of pattern
if your goal is to rid url's passed in as a param you may add the equal sign as the first char, like:
=((?:(?:(?:http)s?:)?//)?(?:(?:[a-zA-Z0-9]+).?)*(?:(?:[a-zA-Z0-9]+)).[a-zA-Z0-9]{2,3}/)
and replace with "/"
The goal of this example to get rid of any domain name regardless of the form it appears in.
(i.e. to ensure url parameters don't incldue domain names to avoid xss attack)
All answers here are very nice, but all will fails sometime.
So i know it is not common to link something else, already answered elsewhere, but you'll find that you have to not waste your time into impossible thing.
This because domains like mydomain.co.uk there is no way to know if an extracted domain is correct.
If you speak about to extract by URLs, something that ever have http or https or nothing in front (but if it is possible nothing in front, you have to remove
filter_var($url, filter_var($url, FILTER_VALIDATE_URL))
here below, because FILTER_VALIDATE_URL do not recognize as url a string that do not begin with http, so may remove it, and you can also achieve with something stupid like this, that never will fail:
$url = strtolower('hTTps://www.example.com/w3/forum/index.php');
if( filter_var($url, FILTER_VALIDATE_URL) && substr($url, 0, 4) == 'http' )
{
// array order is !important
$domain = str_replace(array("http://www.","https://www.","http://","https://"), array("","","",""), $url);
$spos = strpos($domain,'/');
if($spos !== false)
{
$domain = substr($domain, 0, $spos);
} } else { $domain = "can't extract a domain"; }
echo $domain;
Check FILTER_VALIDATE_URL default behavior here
But, if you want to check a domain for his validity, and ALWAYS be sure that the extracted value is correct, then you have to check against an array of valid top domains, as explained here:
https://stackoverflow.com/a/70566657/6399448
or you'll NEVER be sure that the extracted string is the correct domain. Unfortunately, all the answers here sometime will fails.
P.s the unique answer that make sense here seem to me this (i did not read it before sorry. It provide the same solution, even if do not provide an example as mine above mentioned or linked):
https://stackoverflow.com/a/569219/6399448
I know you actually asked for Regex and were not specific to a language. But In Javascript you can do this like this. Maybe other languages can parse URL in a similar way.
Easy Javascript solution
const domain = (new URL(str)).hostname.replace("www.", "");
Leave this solution in js for completeness.
In Javascript, the best way to do this is using the tld-extract npm package. Check out an example at the following link.
Below is the code for the same:
var tldExtract = require("tld-extract")
const urls = [
'http://www.mail.yahoo.co.in/',
'https://mail.yahoo.com/',
'https://www.abc.au.uk',
'https://github.com',
'http://github.ca',
'https://www.google.ru',
'https://google.co.uk',
'https://www.yandex.com',
'https://yandex.ru',
]
const tldList = [];
urls.forEach(url => tldList.push(tldExtract(url)))
console.log({tldList})
which results in the following output:
0: Object {tld: "co.in", domain: "yahoo.co.in", sub: "www.mail"}
1: Object {tld: "com", domain: "yahoo.com", sub: "mail"}
2: Object {tld: "uk", domain: "au.uk", sub: "www.abc"}
3: Object {tld: "com", domain: "github.com", sub: ""}
4: Object {tld: "ca", domain: "github.ca", sub: ""}
5: Object {tld: "ru", domain: "google.ru", sub: "www"}
6: Object {tld: "co.uk", domain: "google.co.uk", sub: ""}
7: Object {tld: "com", domain: "yandex.com", sub: "www"}
8: Object {tld: "ru", domain: "yandex.ru", sub: ""}
Found a custom function which works in most of the cases:
function getDomainWithoutSubdomain(url) {
const urlParts = new URL(url).hostname.split('.')
return urlParts
.slice(0)
.slice(-(urlParts.length === 4 ? 3 : 2))
.join('.')
}
You need a list of what domain prefixes and suffixes can be removed. For example:
Prefixes:
www.
Suffixes:
.com
.co.in
.au.uk
#!/usr/bin/perl -w
use strict;
my $url = $ARGV[0];
if($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+)\.[^\/]+/g) {
print $3;
}
/^(?:https?:\/\/)?(?:www\.)?([^\/]+)/i
Just for knowledge:
'http://api.livreto.co/books'.replace(/^(https?:\/\/)([a-z]{3}[0-9]?\.)?(\w+)(\.[a-zA-Z]{2,3})(\.[a-zA-Z]{2,3})?.*$/, '$3$4$5');
# returns livreto.co
I know the question is seeking a regex solution but in every attempt it won't work to cover everything
I decided to write this method in Python which only works with urls that have a subdomain (i.e. www.mydomain.co.uk) and not multiple level subdomains like www.mail.yahoo.com
def urlextract(url):
url_split=url.split(".")
if len(url_split) <= 2:
raise Exception("Full url required with subdomain:",url)
return {'subdomain': url_split[0], 'domain': url_split[1], 'suffix': ".".join(url_split[2:])}
Let's say we have this: http://google.com
and you only want the domain name
let url = http://google.com;
let domainName = url.split("://")[1];
console.log(domainName);
Use this
(.)(.*?)(.)
then just extract the leading and end points.
Easy, right?