Parse redirection URL

Parse redirection URL - regex

I am analyzing a URL from a malicious e-mail, which I parse using BeautifulSoup. I get this URL:
https://www.google.com/url?q=http://my.%42%41%44%2e%43%4F&sa=D&usg=AFQjCNGTKogvWUF40RsyeAXrGi6uQrlhoQ
This URL makes google.com redirect to http://my.BAD.CO. Given a URL like the one above, how can I tell that it will trigger a redirect?
I want an indication that this is a redirect, and I want to get two separate URLs:
http://my.BAD.CO and https://www.google.com/url?q=http://5sr0s.%61%6b%68%6f%72%61%62%2e%72%75&sa=D&usg=AFQjCNGTKogvWUF40RsyeAXrGi6uQrlhoQ
where http://my.BAD.CO is the decoded form of the percent-encoded target URL http://my.%42%41%44%2e%43%4F
If the only solution is a custom RegEx like this
(?i)(http|https)://(www.|)google.com/url\?q=(http|https)://(\S+)\&usg=\S+
followed by a call to urllib.parse.unquote, will it cover all corner cases?
Are there other ways to redirect besides https://www.google.com/url... ?
I found another way to redirect: via https://www.google.de/url?sa=t&url=

I ended up with a regex
(?i)^(http|https)://(www.|)google.(ac|ad|aero|ae|af|ag|ai|al|am|an|ao|aq|arpa|ar|asia|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|biz|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|cat|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|coop|com|co|cr|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|ec|edu|ee|eg|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gov|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|info|int|in|io|iq|ir|is|it|je|jm|jobs|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mil|mk|ml|mm|mn|mobi|mo|mp|mq|mr|ms|mt|museum|mu|mv|mw|mx|my|mz|name|na|nc|net|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|org|pa|pe|pf|pg|ph|pk|pl|pm|pn|pro|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|tc|td|tel|tf|tg|th|tj|tk|tl|tm|tn|to|tp|travel|tr|tt|tv|tw|tz|ua|ug|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|xn--0zwm56d|xn--11b5bs3a9aj6g|xn--80akhbyknj4f|xn--9t4b11yi5a|xn--deba0ad|xn--g6w251d|xn--hgbk6aj7f53bba|xn--hlcj6aya9esc7a|xn--jxalpdlp|xn--kgbechtv|xn--zckzah|ye|yt|yu|za|zm|zw)/url\?.+$
or, in a more readable form:
(?i)^(http|https)://(www.|)google.(com|de)/url\?.+$
Many people considered the question not worth anyone's effort; I got -4 for it. Some questions only appear to be trivial. I still hope that there is a better solution to the problem. I did not find a list of web sites that allow URL redirection the way google.com/url?q does.
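As an alternative to a hand-written regex, the standard library's URL parser can do both the splitting and the percent-decoding. The following is only a minimal sketch: the function name split_google_redirect and the GOOGLE_HOSTS allowlist are assumptions made for this example (the allowlist mirrors the two hosts in the readable regex above), and it only knows the q and url parameters mentioned in the question.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: a small allowlist mirroring the "readable" regex (com|de).
GOOGLE_HOSTS = {"google.com", "google.de"}


def split_google_redirect(url):
    """Return (redirector_url, target_url), or None if url is not a known redirect."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return None
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host not in GOOGLE_HOSTS or parts.path != "/url":
        return None
    # parse_qs percent-decodes the values, so no separate unquote() call is needed.
    query = parse_qs(parts.query)
    for key in ("q", "url"):
        if query.get(key):
            return url, query[key][0]
    return None
```

For the URL from the question this returns the original URL together with http://my.BAD.CO; for anything that is not on the allowlist it returns None. Other redirectors (URL shorteners, tracking links) would need their own rules.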

Related

HTTPS equivalent of Django's HttpResponse

For some reason I need a views.py that returns only some text. Normally, I'd use HttpResponse("text") for this. However, in this case I require the text to be sent over HTTPS, to counter the inevitable mixed content warning.
What is the simplest way of sending pure text via Django (1.7.11) over HTTPS?

Django's documentation for HttpRequest.build_absolute_uri reads:
Mixing HTTP and HTTPS on the same site is discouraged, therefore
build_absolute_uri() will always generate an absolute URI with the
same scheme the current request has. If you need to redirect users to
HTTPS, it’s best to let your Web server redirect all HTTP traffic to
HTTPS.
The docs make clear that
the method of communication is entirely the responsibility of the server
as Daniel Roseman commented.
My preferred choice is to force HTTPS throughout a site; however, it is possible to do it only for a certain page.
The above can be achieved by any of the following:
Upgrading to a secure and supported release of Django, where SECURE_SSL_REDIRECT together with SecurityMiddleware will redirect all traffic to HTTPS.
Asking your hosting provider for advice on how this could be implemented on their servers.
Using the Apache config files.
Using .htaccess to redirect a single page.
There are also other, off-the-beaten-path hackish solutions, like a snippet that can be used as a decorator in urls.py to force HTTPS, or a custom middleware that redirects certain URLs to HTTPS.
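For the first option in the list, a settings fragment might look roughly like the sketch below. It assumes a Django release that ships SecurityMiddleware and uses the MIDDLEWARE setting name (older releases used MIDDLEWARE_CLASSES); the rest of the project's middleware is deliberately left out.

```python
# settings.py (fragment, not a complete settings file)

MIDDLEWARE = [
    # SecurityMiddleware is what acts on SECURE_SSL_REDIRECT.
    "django.middleware.security.SecurityMiddleware",
    # ... the project's other middleware goes here ...
]

# Redirect every plain-HTTP request to its HTTPS equivalent.
SECURE_SSL_REDIRECT = True

# Assumption: only needed when a proxy terminates TLS and sets this header.
# SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
```

With this in place the view itself can keep returning a plain HttpResponse; the scheme is decided before the view runs.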

I've run into the mixed content problem as well. In my experience, I could not use HttpResponse objects without running into trouble. I was never totally sure why, though, and eventually found a way "around" it.
My solution was to use a JsonResponse object instead and return JSON strings. It is a kind of workaround, with the views returning something like:
from django.http import JsonResponse
def my_view(request):  # hypothetical view name
    mytext = 'stuff blablabla'
    return JsonResponse({'response_text': mytext})
This is super easy to parse, and OK with HTTPS.
Maybe not what you're looking for, but I hope it helps you find your way.

htaccess redirect makes infinite loop - is there another way?

I need to redirect this URL (http://www.example.com/learn) to this URL (http://www.example.com/learn-it).
The problem is that the rule also matches the redirected URL, which causes an infinite loop.
This does not work:
Redirect 301 http://www.example.com/learn http://www.example.com/learn-it

.htaccess 301 Redirect
The smoothest way to redirect your visitors is to use an .htaccess redirect. There is no delay, because the server checks for an .htaccess file before it serves a page to the browser. If it finds a redirect there, the old page never loads; instead, visitors are sent directly to the new page.
These are a few .htaccess redirect codes that I've used and that might come in handy for you. This is not a complete list by any means, but it took me ages to find out how to do these, so I'll save you the hassle and list them here. Oh, and please don't email me with questions about how these work. Like I said, I found these with the help of others. I have no idea in the slightest how to write this stuff and take no credit (or responsibility) for how they work.
If you're more technically minded than I am and want the information straight from the source, check the Apache Tutorial: .htaccess files for more detailed info.
Important notes about htaccess redirection
Always be sure to upload .htaccess files in ASCII mode; sending them up as binary will break them (and usually make your server very, very unhappy).
.htaccess is an Apache feature, so it does not work if you're on a Windows server running IIS.
Make sure you triple-check your changes. Clear your cache and look, and test the server headers to make sure you see a 301 (that means it's permanent), not a 302 (temporary), unless you are absolutely sure you really mean temporary.
Since some operating systems don't allow you to create a file with nothing before the ".", you may need to save it as something.htaccess; some may even have to save it as htaccess.txt and rename it once it's uploaded.
Make sure your FTP program will show .htaccess files (FileZilla does and is free). It is a bit hard to edit something you can't see ;)
Double-check that you're not overwriting an old one (some servers already place one there for your custom 404 pages etc.).
Make sure you replace example.com with your own site's URL ;-)
To Move a single page
Quick, easy and seamless for your visitors.
Redirect 301 /oldpage.html http://www.example.com/newpage.html
To Move an entire site
This will catch any traffic on your old site and redirect it to your index page on your new server. If you want to redirect each page to its new spot, this isn't the one for you.
Redirect 301 / http://www.example.com/
For a detailed explanation of how to redirect a page using .htaccess, read this.

You should use RedirectMatch, which takes a regex and so can match the exact URI:
RedirectMatch 301 ^/learn/?$ /learn-it
Make sure to clear your browser cache before testing this.
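To see which paths the anchored pattern covers, it can be tried outside Apache. The sketch below uses Python's re module on the same pattern; Apache uses its own regex engine, but for a pattern this simple the behaviour should be the same. The helper name is_redirected is made up for this example.

```python
import re

# Same pattern as in the RedirectMatch directive above.
LEARN_RULE = re.compile(r"^/learn/?$")


def is_redirected(path):
    """True if the rule would fire for this URL path."""
    return LEARN_RULE.search(path) is not None


for path in ("/learn", "/learn/", "/learn-it", "/learn/more"):
    print(path, is_redirected(path))
```

Only /learn and /learn/ match, so the redirect target /learn-it is not redirected again.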

Django weird url call error

I have a Django app with a catch-all URL (say, a 404 page) that requests are sent to when no other URL matches. Now, if any URL is called as
mysite.com/something
I am redirected to the 404 page. But
mysite.com/something/
works fine.
The catch-all URL is added at the end of all the other patterns:
url(r'^.*/',theview),
When I remove the catch-all URL from urls.py, the problem goes away and the above URL works (without the / at the end). Why does this happen?

First of all, it would be a good idea to link to your previous post and mention that you are using a hack that I gave you, because (A) it's not a normal setup and (B) someone might come up with a better idea than mine.
Secondly, you're seeing this behaviour because of normal URL processing. The URLs mysite.com/something and mysite.com/something/ are not the same. In Django's URL patterns, the difference would be:
url(r'^something/$')
url(r'^something$')
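The difference between the two patterns can be checked directly with Python's re module. This is only an illustration: it assumes, as Django's regex-based url() does, that the pattern is applied to the request path without its leading slash, and the helper name matching_patterns is made up.

```python
import re

WITH_SLASH = re.compile(r"^something/$")
WITHOUT_SLASH = re.compile(r"^something$")


def matching_patterns(path):
    """Return the names of the patterns that match the given path."""
    names = []
    if WITH_SLASH.search(path):
        names.append("with_slash")
    if WITHOUT_SLASH.search(path):
        names.append("without_slash")
    return names


print(matching_patterns("something/"))  # ['with_slash']
print(matching_patterns("something"))   # ['without_slash']
```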
Since the difference is so minor, in a normal setup, after failing to find a URL pattern that matches without the trailing slash, Django's CommonMiddleware* will automatically append one and try again. Only then does it give up and send you to a 404 page.
However, in your setup, the catch-all URL prevents the second round because it does apply to the URL without the forward slash. My solution? Don't worry about it. The only reason you're using this hack is that DEBUG=True shows a debug page instead of your custom 404 page, a problem you won't face when moving to a production environment.
*and a big thanks to @Alasdair, who pointed this out in the comments

Correct escaping of % in the URL with Apache

I have a Django project where I have a search page which takes input through a POST and redirect to /search/<search string>/ and this page renders the result. The percentage sign (%) is used as a wildcard in the search (tes%er returns testuser, tester, etc and the url looks like this then: example.com/search/tes%25er/) and everything works fine with the Django development server. If I manually write tes%er in the url it changes to tes%25er automatically.
Now I'm deploying on an Apache server with mod_wsgi, and when my search page redirects to example.com/search/tes%er/ I get the server error: Bad Request. Your browser sent a request that this server could not understand. If I manually add '25' to the URL, so that the % sign is encoded as on the development server, it works fine.
Is there a way for Apache to automatically escape the %-sign and create a url that works, understand % unescaped or do I need to do ugly hacks in my search page that builds the url? (I'd rather not do ugly hacks like this cause then the users can't manually add % to the url and get it to work).
Edit: The code that sends the query from the search page to the search url.
if form.is_valid():
    if 'search_user' in request.POST:
        q = request.POST['search_user']
        return redirect('/search/'+q)

As Ignacio already suggested, you should not redirect to an invalid URL. So, to answer your question:
you cannot (or perhaps it is better to say 'should not') ask your Apache server to escape your URL. The reason you escape a URL is that some characters have a special meaning. For example, take a query string:
somedomain.com/?key=value
If you wanted to use a ? or a = in your value, you would have a problem, because the server would treat them as query-string delimiters.
The same goes for the % symbol. When Apache sees a % symbol, it expects a percent-encoded character to follow and tries to decode it. If your query string contains %20, Apache will translate this to a space, while you meant "wildcard followed by 20".
In summary: Apache decodes your string, so you do not want it to encode it.
But this does not solve your problem. You can solve your problem by changing your code to the following:
# Python 2 import; on Python 3 use: from urllib.parse import urlencode
from urllib import urlencode
if form.is_valid():
    if 'search_user' in request.POST:
        q = request.POST['search_user']
        # urlencode() takes a mapping and escapes the value, so % becomes %25
        return redirect('/search/?' + urlencode({'q': q}))
In case you wonder what happens if a user types /search/?q=% directly: in that case they have a problem, because they have typed an invalid address.
Hope this helps :-).
Wout
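The encoding and decoding steps discussed in this answer can be tried in isolation. The sketch below is written for Python 3 (on Python 2 the same functions live in the urllib module) and uses the search string from the question; the variable names are only illustrative.

```python
from urllib.parse import quote, unquote, urlencode

search = "tes%er"

# Escaping for use as a path segment: % becomes %25.
path_url = "/search/" + quote(search, safe="") + "/"
print(path_url)   # /search/tes%25er/

# Escaping for use as a query-string value.
query_url = "/search/?" + urlencode({"q": search})
print(query_url)  # /search/?q=tes%25er

# Decoding, which is what the server does with an incoming URL.
print(unquote("tes%25er"))  # tes%er
print(repr(unquote("%20")))  # ' '
```

The last line shows why an unescaped % is ambiguous: a literal "%20" typed by a user would be read as a space.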

How do short URLs services work?

How do services like TinyURL or Metamark work?
Do they simply associate the tiny URL key with a [virtual?] web page which merely provides an "HTTP redirect" to the original URL? Or is there more "magic" to it?
[original wording]
I often use URL shortening services like TinyURL, Metamark, and others, but every time I do, I wonder how these services work. Do they create a new file that will redirect to another page or do they use subdomains?

No, they don't use files. When you click on a link like that, an HTTP request is sent to their server with the full URL, like http://bit.ly/duSk8wK (links to this question). They read the path part (here duSk8wK), which maps to a record in their database. In the database, they find a description (sometimes), your name (sometimes) and the real URL. Then they issue a redirect, which is an HTTP 302 response with the target URL in the Location header.
This direct redirect is important. If you were to use files, or first load HTML and then redirect, the browser would add TinyURL to the history, which is not what you want. Also, the site that is redirected to will see the referrer (the site that you originally came from) as being the site the TinyURL link is on (i.e., twitter.com, your own site, wherever the link is). This is just as important, so that site owners can see where people are coming from. This, too, would not work if a page were loaded that then redirects.
PS: there are more types of redirect. HTTP 301 means a permanent redirect. If that were used, the browser would not request the bit.ly or TinyURL site again, and those sites want to count the hits. That's why HTTP 302 is used, which is a temporary redirect. The browser will ask tinyurl.com or bit.ly again each time, which makes it possible to count the hits for you (some tiny URL services offer this).
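The lookup-then-redirect step described above can be sketched in a few lines. This is not how any particular service is implemented: the LINKS dictionary stands in for the database, the target URL is a placeholder, and handle_request is a hypothetical function that returns the status code and headers a server would send.

```python
# Hypothetical in-memory stand-in for the shortener's database.
LINKS = {"duSk8wK": "https://example.com/some/long/page"}


def handle_request(path):
    """Map a request path such as '/duSk8wK' to (status, headers)."""
    key = path.lstrip("/")
    target = LINKS.get(key)
    if target is None:
        return 404, {}
    # 302 = temporary redirect, so the browser asks again next time
    # and the service gets a chance to count the hit.
    return 302, {"Location": target}


print(handle_request("/duSk8wK"))
print(handle_request("/unknown"))
```

A real service would send these values as an HTTP response; no HTML body is needed, which is why nothing from the shortener ends up in the browser history.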

Others have answered how the redirects work, but you should also know how they generate their tiny URLs. You'll often hear, mistakenly, that they create a hash of the URL in order to generate the unique code for the shortened URL. This is incorrect in most cases: they aren't using a hashing algorithm (where you could potentially have collisions).
Most of the popular URL shortening services simply take the ID of the URL in the database and convert it to either Base 36 [a-z0-9] (case insensitive) or Base 62 [a-zA-Z0-9] (case sensitive).
A simplified example of a TinyURL Database Table:
ID      URL                     VisitCount
1       www.google.com          26
2       www.stackoverflow.com   2048
3       www.reddit.com          64
...
20103   www.digg.com            201
20104   www.4chan.com           20
Web frameworks that allow flexible routing make handling the incoming URLs really easy (Ruby, ASP.NET MVC, etc.).
So, on your web server you might have a route action that looks like this (pseudocode):
Route: www.mytinyurl.com/{UrlID}
Route Action: RouteURL(UrlID);
This routes any incoming request that has text after your domain www.mytinyurl.com to the associated method, RouteURL, and passes the text after the forward slash in the URL to that method.
So, let's say you requested: www.mytinyurl.com/fif
"fif" would then be passed to your method, RouteURL(String UrlID). RouteURL would convert "fif" from Base 36 to its base 10 equivalent, 20103 (15*36*36 + 18*36 + 15), and a database request would be made to look up the URL stored under ID 20103 (in this case, www.digg.com). You would also increase the visit count for Digg by one before redirecting to the correct URL.
This is a really simplified example but you should be able to get the general idea.
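The ID-to-code conversion can be sketched as follows. The alphabets and function names are assumptions for this example (real services may order or restrict their characters differently); with the Base 36 alphabet shown, ID 20103 from the table comes out as "fif".

```python
import string

BASE36 = string.digits + string.ascii_lowercase   # 0-9, a-z
BASE62 = BASE36 + string.ascii_uppercase          # 0-9, a-z, A-Z


def encode_id(number, alphabet=BASE36):
    """Convert a non-negative database ID to a short code."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


def decode_id(code, alphabet=BASE36):
    """Convert a short code back to the database ID."""
    base = len(alphabet)
    number = 0
    for char in code:
        number = number * base + alphabet.index(char)
    return number


print(encode_id(20103))   # fif
print(decode_id("fif"))   # 20103
```

Because the code is derived from a unique ID rather than from a hash of the URL, two different URLs can never end up with the same code.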

As an extension to @A Salcedo's answer:
Some URL shortening services (Tinyarro.ws) go to extremes by using Unicode (UTF-8) characters in the shortened URL, which allows a larger number of websites before an additional symbol has to be added. Since most Unicode characters are accepted for use (IRIs, RFC 3987, are handled by most browsers), that bumps the count from 62 sites per symbol to ~1,112,064.
To put that in perspective, one can encode 1.2366863e+12 sites with 2 symbols (1,112,064*1,112,064). In November 2009, shortened links on bit.ly were accessed 2.1 billion times (around that time, bit.ly and TinyURL were the most widely used URL-shortening services), which is ~600 times less than what fits in just 2 symbols. So, over the full existence of all URL shortening services, it should last at least another 20 years before a third symbol has to be added.
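The figures above can be reproduced with a few lines of arithmetic. This is an upper bound only: it counts every Unicode code point outside the surrogate range, although not all of them are practical in a URL. The variable names are illustrative.

```python
# 0x110000 code points in total, minus the 0x800 surrogates.
UNICODE_SCALARS = 0x110000 - 0x800
print(UNICODE_SCALARS)        # 1112064

# Number of distinct two-symbol codes.
two_symbols = UNICODE_SCALARS ** 2
print(two_symbols)            # 1236686340096

# Compared with the 2.1 billion bit.ly accesses quoted above.
ratio = two_symbols / 2.1e9
print(round(ratio))           # 589
```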

In simple words, a URL shortener maps an arbitrarily long sequence of characters (the original, long, crappy URL) to a short and slick sequence of characters. This is nothing but hashing, which is most commonly used to create lookup tables, HashMaps, MD5 hashes for cryptographic purposes, etc.
To explain the URL shortening process, I have created a demo project on GitHub and also a blog post. Do refer to these and let me know if they were helpful.
Blog Post : URL Shortening
