Find and Replace with correct URL - regex

There are region-specific URLs for various websites, like google.co.in or google.co.uk for google.com. For the major sites (Google, Facebook, LinkedIn), I want to replace these region-specific URLs with the all-region URL.
For example, a Google link should be redirected to https://www.google.com/webhp?pws=0&gl=us&gws_rd=cr.
The solution I was trying:
1) Take the google.co.in part of the URL (using a regex) and replace it with google.com (using re:replace).
2) To store the original and replacement URLs, I'm thinking of using an orddict, where {Key, Value} = {"...//google.co.region/...","...//google.com/..."}. The region can be in, uk, or anything else, so how do I take that region into account if I'm using an orddict as the key store?
But I'm not sure how to actually implement this in Erlang, or whether my proposed solution will work properly.
I'm doing this for my messenger app: whenever a user enters a URL, I don't want the preview to be localized to the region where my server is located; it should at least be shown in English. (As of now, for Facebook, my app shows the preview in Russian.)

There's a built-in regex module in Erlang: http://erlang.org/doc/man/re.html
As for your solution, it feels like a crutch for functionality better achieved with smarter networking. For example, you could try making the preview requests on the client's side rather than on the server.
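That said, if you do go down the find-and-replace route, here is a rough sketch of the idea. It's written in Python purely for illustration, and the host patterns and canonical targets are my own assumptions; in Erlang the substitution itself would be a call to re:replace/4 from the module linked above.

```
import re

# Hypothetical mapping: a regex matching any regional variant of a host -> the
# all-region host to substitute. Add or adjust entries for the sites you care about.
CANONICAL_HOSTS = {
    r"(www\.)?google\.(co\.)?[a-z]{2,3}": "www.google.com",
    r"([a-z]{2}-[a-z]{2}\.)?facebook\.com": "www.facebook.com",
    r"([a-z]{2}\.)?linkedin\.com": "www.linkedin.com",
}

def normalize_url(url):
    """Replace a region-specific host with its all-region equivalent."""
    for pattern, replacement in CANONICAL_HOSTS.items():
        # Only touch the host part, i.e. the text directly after "://".
        new_url, count = re.subn(r"(?<=://)" + pattern, replacement, url, count=1)
        if count:
            return new_url
    return url

print(normalize_url("https://www.google.co.in/search?q=erlang"))
# -> https://www.google.com/search?q=erlang
```

This also answers the orddict question indirectly: rather than keying on one full URL per region, you can key on a single pattern per site that already absorbs the region part.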

Related

Facebook Ads API WebsiteURL validation regex

I'm looking for the regex that Facebook uses to validate the Website URL or Display Link entered when a user creates an ad in Ads Manager.
I need this because the platform we're developing uses the Facebook Marketing API to create ads on Facebook. To avoid API request failures from Facebook, I would like to validate the URL before we fire a request to the API.
I've been using a few different regexes to perform the validation, but the issue keeps coming back. That's why I'm bringing the question to you.
So my question is: has anyone here used a URL validation regex that works well with Facebook? I couldn't find anything in the Facebook docs about this validation, so any help is welcome.
Here is the list of URLs I've been using for validation: https://pastebin.com/0zU6MSme
Update: I moved the list to Pastebin because the previous link didn't work.
The original list I took from here: Original URL List
The issue you are experiencing with certain URLs such as http://➡.ws/䨹 being unexpectedly rejected is due to internationalized domain names. Domain names need to be ASCII text, so you have to convert the domain to its IDN (Punycode) form before sending it to Facebook (Unicode query strings are OK). The browser does this automatically, but apparently Facebook doesn't.
There are libraries like punycode and encoding.py that you could use to convert the domain name to IDN before sending your request.
As for a regex, I would expect the one Django uses to work - see Python - How to validate a url in python ? (Malformed or not) - except you should disallow ftp, and apply the regex after converting to IDN.
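For example, here is a minimal sketch of the conversion using Python's built-in idna codec (which implements the older IDNA 2003 rules; the third-party idna package implements IDNA 2008). Only the host is converted; the path and query are left alone:

```
from urllib.parse import urlsplit, urlunsplit

def host_to_idn(url):
    """Convert the host of a URL to its ASCII (Punycode) form."""
    parts = urlsplit(url)
    ascii_host = parts.hostname.encode("idna").decode("ascii")
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(host_to_idn("http://➡.ws/䨹"))
# -> http://xn--hgi.ws/䨹
```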

Cookie not kept when moving from html to perl page

One of my clients uses Sellerdeck as their shopping cart solution. I am currently implementing a service for them that relies heavily on cookies.
The cookie is set on a product page which has a URI that is something like http://www.mydomain.co.uk/retail/acatalog/A11-Insect-Net.html. When I browse around the site, I can see the cookie set on all pages, like it is supposed to.
Then when I go into the checkout process, Sellerdeck apparently starts using Perl, because the URI changes to something like http://www.mydomain.co.uk/cgi-bin/retail/ca001000.pl. The weird thing is that, although we're still on the same domain, I can't see the cookie. When I go back to the product pages it is there again.
Does anyone know why this might be?
Turns out the cookie was tied to a specific path and not to /. Fixed now.
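For anyone who hits the same symptom: it is exactly what happens when a cookie's Path attribute is scoped to the directory of the page that set it rather than to the site root. A small illustration using Python's standard library (the cookie name and paths are made up for the example):

```
from http.cookies import SimpleCookie

# If Path is omitted, browsers default it to the directory of the page that set
# the cookie, so /retail/acatalog/ pages see it but /cgi-bin/... pages do not.
scoped = SimpleCookie()
scoped["session"] = "abc123"
scoped["session"]["path"] = "/retail/acatalog/"

# Path=/ makes the cookie visible on every path of the domain, including the
# Perl checkout pages under /cgi-bin/.
site_wide = SimpleCookie()
site_wide["session"] = "abc123"
site_wide["session"]["path"] = "/"

print(scoped.output())     # Set-Cookie: session=abc123; Path=/retail/acatalog/
print(site_wide.output())  # Set-Cookie: session=abc123; Path=/
```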

Need to track what websites a user visits after leaving my site

I would like to track what websites my site's visitors go to after they leave.
Would it be possible to place a cookie in their browser when they visit my site, so that later, if they go to Facebook.com or stackoverflow.com, my cookie would retrieve the browser's URL data and send it back to my server?
I could then look at this data and know that my visitors had gone to Facebook.com and stackoverflow.com after they left my site.
Is this possible using cookies?
Thanks for the help.
No. Cookies are not executed or anything. They are just dumb bits of data.
You would need to be able to execute code on the page they are visiting afterwards.
What I presume you are really asking is how to track your outbound links.
This is mainly done with JavaScript: you need to intercept click events on outbound anchor links and send an event notification as described here, or use the hitCallback method, before completing the redirection to the external website. For Google Analytics, see the documentation. Or you could do it via a custom JS implementation that sends the info back to your server instead.
Alternatively, you could replace all outbound links on the server side in your HTML source, point every link to your server first, and redirect from there to the external site (see the sketch below). But using redirects for this purpose is not really recommended, unless you are an ad network or a search engine company that requires such a method.
Lastly, there is an alternative method using the HTML5 ping attribute, but as of this writing the feature has either been removed or not yet fully implemented across browsers.
Either way, you can't track where your visitors go beyond the first-level outbound links from your site.
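To illustrate the server-side variant mentioned above: every outbound link on your pages points at a small redirect endpoint on your own server, which records the target and then issues the redirect. A rough sketch using Python's standard library (the /out path, the url parameter, and the logging are all made-up example names):

```
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class OutboundRedirect(BaseHTTPRequestHandler):
    def do_GET(self):
        # Outbound links are rewritten to /out?url=<external-url>
        query = parse_qs(urlparse(self.path).query)
        target = query.get("url", [None])[0]
        if not target:
            self.send_error(400, "missing url parameter")
            return
        print(f"outbound click: {target}")  # replace with real logging/analytics
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), OutboundRedirect).serve_forever()
```

In practice you would want to whitelist or sign the url parameter, since an unrestricted endpoint like this is an open redirect that can be abused.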

What's a reasonable instance where this regex might not catch a webmail referrer in Google Analytics?

^mail\.(.*)?|(.*)?(web|\.)mail(.*)? is the exact regex I'm looking to scrutinize.
For example,
e3.mail.yahoo.com
webmail.example.com
hotmail.com
mail.aol.com
etc.
To be totally honest, it's a fruitless effort: even if you somehow manage to rewrite all of the email domains that referred people to your site, there are 3 reasons it won't work:
You can't possibly account for all of the email domains out there.
If the email is hosted on HTTPS, and your pages are HTTP, you won't see a referrer anyways.
A very significant portion of the email using population uses non-web mail, like Outlook, Entourage, Mac Mail, iPhone Mail, Blackberry Mail, Android Gmail, to name a few, that never have a referrer.
Instead, if you're looking to segment all of your email referrals for tracking in Google Analytics, you should use utm parameters in your URLs.
If you tag your URLs with utm_source and utm_medium, you'll be able to track them, regardless of the 3 restrictions listed above.
Traditionally, you'd set utm_medium to be email, and utm_source to be the mailing list name, and utm_campaign for the name of the specific campaign.
You can get assistance in building the URLs here: http://www.google.com/support/analytics/bin/answer.py?answer=55578
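For example, a quick sketch of tagging a landing-page URL with those parameters (the source and campaign names here are made up):

```
from urllib.parse import urlencode, urlsplit, urlunsplit

def tag_url(url, source, medium, campaign):
    """Append utm_source / utm_medium / utm_campaign to a URL."""
    parts = urlsplit(url)
    utm = urlencode({"utm_source": source, "utm_medium": medium,
                     "utm_campaign": campaign})
    query = f"{parts.query}&{utm}" if parts.query else utm
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))

print(tag_url("https://www.example.com/offer", "june_newsletter", "email", "summer_sale"))
# -> https://www.example.com/offer?utm_source=june_newsletter&utm_medium=email&utm_campaign=summer_sale
```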
Even though links in email messages should be tagged with utm_xxx parameters, I like to clean and group my referral sources into clusters as much as possible. It's the best way to effectively understand which sources of traffic are missing proper tagging, and then prioritize and fix them.
The regex I use is the following, and honestly it works pretty well (it catches more than 95% of the webmails that show up as referrals; those referrals can be split over dozens of subdomains, as with Yahoo or Live, which dilutes their visibility as a source):
(messag|courrier|zimbra|imp|mail)(.*)\.(.*)\..{2,4}
You may update the subdomain keywords with values that are common in your area. The end of the pattern catches any domain name followed by a TLD of 2-4 characters.
I output the result as:
Output To -> Constructor : Campaign Source : Webmail - $A3
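To sanity-check which referrers the pattern does and doesn't catch, here is a tiny test harness (written with Python's re module, whose syntax is compatible with the pattern above; the sample hostnames are just examples):

```
import re

WEBMAIL = re.compile(r"(messag|courrier|zimbra|imp|mail)(.*)\.(.*)\..{2,4}")

samples = [
    "e3.mail.yahoo.com",    # caught: "mail" followed by two more dot-separated parts
    "webmail.example.com",  # caught
    "mail.aol.com",         # caught
    "zimbra.company.fr",    # caught
    "hotmail.com",          # NOT caught: only one dot after the "mail" keyword
]

for host in samples:
    print(host, "->", bool(WEBMAIL.search(host)))
```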

Is it dangerous to leave your Django admin directory under the default url of admin?

Is it dangerous to have the admin interface of a Django app accessible at just the plain old /admin/ URL? For security, should it be hidden under an obfuscated URL, such as a unique UUID?
Also, if you do create such an obfuscated link to your admin interface, how can you avoid having anyone find out where it is? Does Googlebot know how to find that URL if there is no link to it anywhere on your site or on the internet?
You might want to watch out for dictionary attacks. The safest thing to do is IP restrict access to that URL using your web server configuration. You could also rate limit access to that URL - I posted an article about this last week.
If a URL is nowhere on the internet, the Googlebot can't know about it ... unless somebody tells it about it. Unfortunately, many users have toolbars installed in their browser which submit every URL the browser visits to various servers (e.g. Alexa, Google).
So keeping a URL secret will not work in the long run.
Also, a UUID is hard to remember and to type, leading to additional support requests ("What was the URL again?").
But I still strongly suggest changing the URL (e.g. to /myadmin/). This will foil automated scanning and attack tools, so if one day a "great Django worm" hits the internet, you have a much lower chance of being hit.
People using phpMyAdmin have had this experience for the last few years: changing the default URL avoids most attacks.
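For reference, moving the admin off the default path is a one-line change in your URL configuration. A sketch for a recent Django release ('myadmin/' is just an example; older releases use the url()/include() style instead of path()):

```
# urls.py
from django.contrib import admin
from django.urls import path

urlpatterns = [
    # Serve the admin at /myadmin/ instead of the default /admin/
    path('myadmin/', admin.site.urls),
]
```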
While there is no harm in adding an extra layer of protection (an obfuscated URL), enforcing good password choices (checking password strength, and checking that the password isn't in a large list of common ones) would be a much better use of your time.
Assuming you've picked a good password, no, it's not dangerous. People may see the page, but they won't be able to get in anyway.
If you don't want Google to index a directory, you can use a robots.txt file to control that.