Retrieve static file from Amazon S3 bucket - amazon-web-services

I am trying to configure nginx so that whenever there is a bad gateway response, it fetches static HTML content from an S3 bucket.
The URL structure of the request is some_bucket/folder1/folder2/text
The data is stored in the S3 bucket with the directory structure s3.amazonaws.com/some_bucket/folder1/folder2/folder1_folder2.html
I am not able to determine the values of folder1 and folder2, so I cannot build the HTML file name dynamically and use proxy_pass.
I also tried try_files, but I think that does not work for URLs.
Any idea how to tackle this problem?
Thanks.

An nginx S3 proxy can handle dynamically built URLs. You can also hide a directory, and even part of a private URL such as the AWS key.
For instance, suppose the base URL is:
https://your_bucket.s3.amazonaws.com/readme.txt?AWSAccessKeyId=YOUR_ONLY_ACCESS_KEY&Signature=sagw4gsafdhsd&Expires=3453445231
The resulting URL:
https://your_server/proxy_private_file/readme.txt?st=sagw4gsafdhsd&e=3453445231
The configuration is not difficult:
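location ~* ^/proxy_private_file/(.*) {
# proxy_pass uses a variable host, so nginx resolves it at run time and needs a resolver
# (8.8.8.8 is only an example; point it at a DNS server you trust)
resolver 8.8.8.8;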
set $s3_bucket 'your_bucket.s3.amazonaws.com';
set $aws_access_key 'AWSAccessKeyId=YOUR_ONLY_ACCESS_KEY';
set $url_expires 'Expires=$arg_e';
set $url_signature 'Signature=$arg_st';
set $url_full '$1?$aws_access_key&$url_expires&$url_signature';
proxy_http_version 1.1;
proxy_set_header Host $s3_bucket;
proxy_set_header Authorization '';
proxy_hide_header x-amz-id-2;
proxy_hide_header x-amz-request-id;
proxy_hide_header Set-Cookie;
proxy_ignore_headers "Set-Cookie";
proxy_buffering off;
proxy_intercept_errors on;
proxy_pass http://$s3_bucket/$url_full;
}
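As a rough illustration of what the location above does with the short URL, here is a small Python emulation (not nginx itself). The function names upstream_url and raw_args are hypothetical, and the bucket and key are the placeholders from the example; the query values are copied without URL-decoding, as nginx's $arg_* variables are.

```python
import re
from urllib.parse import urlsplit

# Placeholders taken from the example configuration above.
S3_BUCKET = "your_bucket.s3.amazonaws.com"
AWS_ACCESS_KEY = "AWSAccessKeyId=YOUR_ONLY_ACCESS_KEY"

# Same pattern as the location; ~* means case-insensitive.
LOCATION = re.compile(r"^/proxy_private_file/(.*)", re.IGNORECASE)


def raw_args(query: str) -> dict:
    """Split a query string into name -> value without URL-decoding,
    keeping the first occurrence of each name."""
    args = {}
    for pair in query.split("&"):
        name, _, value = pair.partition("=")
        args.setdefault(name, value)
    return args


def upstream_url(request_url: str):
    """Return the URL proxy_pass would request, or None if the location does not match."""
    parts = urlsplit(request_url)
    match = LOCATION.match(parts.path)
    if match is None:
        return None
    args = raw_args(parts.query)
    url_expires = "Expires=" + args.get("e", "")        # set $url_expires 'Expires=$arg_e'
    url_signature = "Signature=" + args.get("st", "")  # set $url_signature 'Signature=$arg_st'
    url_full = f"{match.group(1)}?{AWS_ACCESS_KEY}&{url_expires}&{url_signature}"
    return f"http://{S3_BUCKET}/{url_full}"


print(upstream_url(
    "https://your_server/proxy_private_file/readme.txt?st=sagw4gsafdhsd&e=3453445231"))
```

Running it prints the upstream URL with the access key, expiry and signature put back in the order used by $url_full.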
See the full configuration for more details.

This is what I did, for anyone (probably a newbie) who may encounter this problem.
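location ~* ^/some_bucket/(.*)/(.*)/.* {
# proxy_pass contains variables, so the S3 host name is resolved at run time
# and a resolver is required (8.8.8.8 is only an example)
resolver 8.8.8.8;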
proxy_pass http://s3.amazonaws.com/some_bucket/$1/$2/$1_$2.html;
}
~* means a case-insensitive regex match
^ anchors the match at the start of the request URI
() captures the matched text so it can be used later as $1, $2, ...
For example,
a user requests www.example.com/some_bucket/folder1/folder2/text
It is then processed as follows:
~* makes the match case-insensitive (for a case-sensitive match, drop the * and use just ~)
^ anchors the match at the start of the URI /some_bucket/folder1/folder2/text; the host www.example.com is not part of what location matches against
/some_bucket/ is matched, then
.* means any number of any characters (for digits only, use [0-9]* instead)
() ensures that the matched values get captured
So $1 captures folder1
and $2 captures folder2 (the pattern still needs two more slashes after the first group, so $1 cannot swallow folder2)
Then
.* without parentheses matches the remaining characters (here, text) but does not capture them
Now the captured values can be used to find the file in the Amazon bucket using
proxy_pass http://s3.amazonaws.com/some_bucket/$1/$2/$1_$2.html
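The same capture logic can be tried outside nginx with Python's re module, which treats this pattern the same way PCRE does. This is only an emulation for experimenting; s3_url is a hypothetical helper. The second call shows a side effect of the greedy (.*) groups: with a deeper path, the extra segments end up in $1.

```python
import re

# Same pattern as the nginx location; ~* -> re.IGNORECASE.
LOCATION = re.compile(r"^/some_bucket/(.*)/(.*)/.*", re.IGNORECASE)


def s3_url(uri: str):
    """Return the URL proxy_pass would request for this URI, or None."""
    match = LOCATION.match(uri)
    if match is None:
        return None
    one, two = match.group(1), match.group(2)  # $1 and $2
    return f"http://s3.amazonaws.com/some_bucket/{one}/{two}/{one}_{two}.html"


print(s3_url("/some_bucket/folder1/folder2/text"))
print(s3_url("/some_bucket/a/b/c/text"))  # greedy: $1 = "a/b", $2 = "c"
```

With ([^/]+) in place of each (.*), $1 and $2 would instead always be the first two path segments.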
The DigitalOcean tutorial on nginx server and location block selection may be helpful: https://www.digitalocean.com/community/tutorials/understanding-nginx-server-and-location-block-selection-algorithms

Related

Get URL parameter and save it in a variable or cookie

I have a new WordPress installation that replaces an old site.
The old site had dynamic referral URLs for users:
mysite.com/123456 or mysite.com/somename
Now I need to intercept whatever comes after the /, i.e. 123456 or somename, and save it in a session variable or a cookie (I have full access to the server).
I did some tests with this code:
location / {
add_header Set-Cookie "secret_code=$args;Domain=$site_name;Path=/;Max-Age=31536000;Secure;HTTPOnly" always;
try_files $uri $uri/ /index.php?$args;
}
But the cookie ends up holding other parameters, most likely from requests that WordPress itself makes.
I also tried this:
location ~ ^/(.+)$ {
add_header Set-Cookie "secret_code=$1;Domain=$site_name;Path=/;Max-Age=31536000;Secure;HTTPOnly" always;
try_files $uri $uri/ /index.php?$args;
}
But this does not work either: nginx no longer runs PHP, and the PHP files get downloaded instead.
What is the best way to solve this problem?
Thank you guys
I'm not sure if this is all you wanted, but here is a regex that matches the word characters (letters, digits, underscore) after the '/':
Tested; it works for both mysite.com/123456 and mysite.com/somename.
/(?<=mysite.com\/)\w+/
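To experiment with the pattern, Python's re module supports the same fixed-width lookbehind. In this sketch the /.../ delimiters are dropped and the dot in mysite.com is escaped so it only matches a literal dot; it only tests the regex, not how it would be wired into nginx or WordPress.

```python
import re

# Word characters (letters, digits, underscore) that follow "mysite.com/".
REFERRAL = re.compile(r"(?<=mysite\.com/)\w+")

for url in ("mysite.com/123456", "mysite.com/somename"):
    match = REFERRAL.search(url)
    print(url, "->", match.group() if match else None)
```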

Nginx Access Log Name Based on Non-WWW Host

So,
I've got a server with around 30 virtual host configurations, each in their own separate file. My main aim at this point is to name the access log based on the $host variable.
At the moment, I'm using the following, inside of my HTTP block to be applied to all conf files:
http {
access_log /var/log/nginx/$host.access.log;
}
I'd like to be able to rewrite the above $host without the www., and just keep the domain itself. I've found the following solution for that:
if ($host ~* www\.(.*)) {
set $domain $1;
rewrite ^(.*)$ http://$domain$1 permanent;
}
The only problem is that 'if' directives are not allowed inside the 'http' block. Is there any way I can achieve this while still being within the 'http' block? Maybe using 'map'?
Thanks in advance,
Tom
You should use a map:
http {
map $host $hostw {
default $host;
~*^www\.(.*) $1;
}
access_log /var/log/nginx/$hostw.access.log;
}
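The map's behaviour can be emulated in a few lines of Python to check which log file a given host would end up in. hostw and access_log_path are hypothetical names; the regex is the one from the map, with ~* translated to re.IGNORECASE, and a host without the www. prefix falls through to the default.

```python
import re

# Mirrors: map $host $hostw { default $host; ~*^www\.(.*) $1; }
WWW = re.compile(r"^www\.(.*)", re.IGNORECASE)


def hostw(host: str) -> str:
    match = WWW.match(host)
    return match.group(1) if match else host  # no match -> "default $host"


def access_log_path(host: str) -> str:
    return f"/var/log/nginx/{hostw(host)}.access.log"


print(access_log_path("www.example.com"))
print(access_log_path("example.com"))
```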

Weird redirect with proxy_pass in if statement

I have an SPA (single-page application) site, let's say under https://example.com, and an API for it under https://api.example.com
I want to serve server-rendered content to specific user agents like googlebot, facebookexternalhit, etc.
So, if a user goes to https://example.com/brandon/things they get the SPA, but if a bot goes to the same URL it gets a server-rendered page with all the proper meta and Open Graph tags.
My server rendered pages with proper matching are under https://api.example.com/ssr/
So for example if bot hits https://example.com/brandon/things it should get content from https://api.example.com/ssr/brandon/things
I almost got it working by using proxy_pass inside an nginx if statement, pointed at the Django application (which returns server-rendered output), but unfortunately there's one edge case that makes it behave weirdly.
My implementation:
server {
listen 80;
server_name example.com; # url of SPA
index index.html;
root /srv/example_spa/public/dist; # directory of SPA index.html
# $ssr variable that tells if we should use server side rendered page
set $ssr 0;
if ($http_user_agent ~* "googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|facebot|twitterbot|developers\.google\.com|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator|redditbot") {
set $ssr 1;
}
# location block that serves proxy_pass when the $ssr matches
# or if the $ssr doesn't match it serves SPA application index.html
location / {
if ($ssr = 1) {
proxy_pass http://127.0.0.1:9505/ssr$uri$is_args$args;
}
try_files $uri /index.html;
}
}
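The decision this server block makes can be sketched in Python to see which backend URL a request would be proxied to. ssr_target is a hypothetical helper; the user-agent pattern is copied from the if test (~* becomes re.IGNORECASE and, as in nginx, the regex may match anywhere in the string). It does not model try_files or any redirect the backend sends back.

```python
import re

# Same alternation as the if ($http_user_agent ~* "...") test.
BOT_UA = re.compile(
    r"googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver"
    r"|facebookexternalhit|facebot|twitterbot|developers\.google\.com|rogerbot"
    r"|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest"
    r"|slackbot|vkShare|W3C_Validator|redditbot",
    re.IGNORECASE,
)


def ssr_target(user_agent: str, uri: str, args: str = ""):
    """Return the backend URL for bots, or None when the SPA should be served."""
    if BOT_UA.search(user_agent) is None:  # $ssr stays 0 -> try_files serves the SPA
        return None
    is_args = "?" if args else ""          # what nginx's $is_args expands to
    return f"http://127.0.0.1:9505/ssr{uri}{is_args}{args}"


print(ssr_target("Mozilla/5.0 (compatible; Googlebot/2.1)", "/brandon/things"))
```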
But here's the problem:
Everything works dandy and sweet, except in one case.
User hits https://example.com/brandon/things/ and he gets SPA index.html - perfect.
User hits https://example.com/brandon/things and he gets SPA index.html - perfect.
Bot hits https://example.com/brandon/things/ and he gets server rendered page - perfect.
Bot hits https://example.com/brandon/things (without appended slash) and he gets redirected (301) to https://example.com/ssr/brandon/things - BAD BAD BAD
I've tried to make it work for a couple of hours now without luck.
What would you suggest? I know "if" in nginx is evil, but I don't know how to make it work without it...
Any help is appreciated
You need to alter the redirects that come back through proxy_pass:
location / {
proxy_redirect http://127.0.0.1/ssr/ http://$host/ssr/;
proxy_redirect /ssr/ /;
if ($ssr = 1) {
proxy_pass http://127.0.0.1:9505/ssr$uri$is_args$args;
}
try_files $uri /index.html;
}
It turns out this was an issue with my Django application's redirect. I thought I had the APPEND_SLASH option disabled, but it was enabled and issued a redirect whenever the URL had no trailing slash. And that redirect changed only the URI part, without changing the host to https://api.example.com, hence my confusion.
And I actually found two ways to fix that.
First, just use rewrite to append a slash when there isn't one.
location / {
if ($ssr = 1) {
rewrite ^([^.]*[^/])$ $1/ permanent;
proxy_pass http://127.0.0.1:9505/ssr$uri$is_args$args;
}
try_files $uri /index.html;
}
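A small Python check of the rewrite pattern (redirect_target is a hypothetical name) shows which URIs it touches: only paths with no dot that do not already end in a slash, so a bot's follow-up request, which has the slash, skips the rewrite and reaches proxy_pass.

```python
import re

# Same pattern as: rewrite ^([^.]*[^/])$ $1/ permanent;
NO_TRAILING_SLASH = re.compile(r"^([^.]*[^/])$")


def redirect_target(uri: str):
    """Return the URI nginx would 301-redirect to, or None if the rewrite does not apply."""
    match = NO_TRAILING_SLASH.match(uri)
    return match.group(1) + "/" if match else None


for uri in ("/brandon/things", "/brandon/things/", "/robots.txt"):
    print(uri, "->", redirect_target(uri))
```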
Second, modify proxy_pass to always add a slash after the $uri part, and make the server-side rendering application's URL config accept two slashes (//) at the end. It's a little hacky, but it has no side effects and works as it should.
Nginx config:
location / {
if ($ssr = 1) {
proxy_pass http://127.0.0.1:9505/ssr$uri/$is_args$args;
}
try_files $uri /index.html;
}
Django URL regex:
r'^ssr/(?P<username>[\w-]+)/(?P<slug>[\w-]+)(/|//)$'
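To check that the two pieces fit together, here is a Python sketch that builds the path proxy_pass requests and tests it against the Django pattern. proxied_path is a hypothetical helper; Django matches URL patterns against the path without its leading slash, which the sketch imitates by dropping the first character.

```python
import re

# The Django pattern from above.
SSR_URL = re.compile(r"^ssr/(?P<username>[\w-]+)/(?P<slug>[\w-]+)(/|//)$")


def proxied_path(uri: str) -> str:
    """Path requested by proxy_pass http://127.0.0.1:9505/ssr$uri/ (query string left out)."""
    return f"/ssr{uri}/"


for uri in ("/brandon/things", "/brandon/things/"):
    path = proxied_path(uri)
    match = SSR_URL.match(path[1:])  # drop the leading "/" as Django does
    print(path, "->", match.groupdict() if match else None)
```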

Can NGINX change the response code after a proxy_pass?

So I have an internal API server namespaced under /api/, and I want to pass all other requests to an Amazon S3 static site using proxy_pass. This all works fine; it's just that since Amazon is serving a single-page app, I want to always return the same HTML file. The way I did this with the S3 server was to set the index and error page to the same file. It all looks fine on the surface, but for all requests other than /, the S3 instance returns a 404. Can I use NGINX to change this to a 200 before returning it to the client?
server {
listen 80;
server_name example.com;
location /api/ {
# serve internal app
}
location / {
proxy_pass http://example.amazonaws.com/;
# ALWAYS RETURN A 200
}
}
You should be able to use the error_page and proxy_intercept_errors directives to achieve this. Something like this should do the trick.
location / {
proxy_pass http://example.amazonaws.com/;
proxy_intercept_errors on;
error_page 404 =200 /your_html_file;
}
See the nginx documentation for error_page and proxy_intercept_errors.
You can internally rewrite all URLs to the document you want served. This avoids the error handling cycle and problematic redirects.
It would be something like (untested):
location / {
proxy_pass http://example.amazonaws.com/;
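# 'break' keeps the rewritten URI in this location; without a flag nginx searches for a location again, matches / again and loops
rewrite ^.* /index.html break;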
}
Note that you will want to use only full or root-relative URLs in your document, because you don't know which subdirectory it is served from.
You'd also be wise to have JS code validate the URL and optionally redirect to one you consider valid. Otherwise 3rd party sites could link to offensive URLs and get them in search indexes!

Get nginx arguments and path for proxy_pass

I have this URL:
http://localhost:8888/images/upload/root/folderA/folderB?arg1=A&arg2=B
I want to redirect everything to:
http://localhost:8080/v1/files_upload/
so that it becomes something like:
http://localhost:8080/v1/files_upload/root/folderA/folderB?arg1=A&arg2=B
I have the following:
location ~ ^/images/upload/([^/]+)(/.*)\?(.*)$ {
upload_pass @after_upload;
...
...
}
location @after_upload {
proxy_pass http://localhost:8080/v1/files_put/$1/$2?$3;
}
I checked it: only $1 and $2 work, but the arguments in $3 are not sent to proxy_pass.
Thanks in advance!
The location directive doesn't match request arguments; it only checks the request path, so the \?(.*) part of the regex never matches anything. Drop it and use the $args variable instead (or the more specific $arg_arg1 and $arg_arg2):
location @after_upload {
proxy_pass http://localhost:8080/v1/files_put/$1/$2?$args;
}
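Here is a Python sketch of why $3 stays empty, assuming the \?(.*) part is removed from the location regex (as written, it can never match): nginx tests location against the path only, and the query string is available separately as $args. upstream is a hypothetical helper. Note that $2 keeps its leading slash, so the emulated URL contains root//folderA; capturing after the slash, e.g. ^/images/upload/([^/]+)/(.*)$, would avoid that.

```python
import re
from urllib.parse import urlsplit

# The original regex is tested against the path only, so \?(.*) can never match.
ORIGINAL = re.compile(r"^/images/upload/([^/]+)(/.*)\?(.*)$")
FIXED = re.compile(r"^/images/upload/([^/]+)(/.*)$")


def upstream(request_url: str):
    """Return the URL proxy_pass would request, or None if the location does not match."""
    parts = urlsplit(request_url)
    match = FIXED.match(parts.path)  # location sees the path, not the query string
    if match is None:
        return None
    one, two = match.groups()  # $1, $2
    # proxy_pass http://localhost:8080/v1/files_put/$1/$2?$args;
    return f"http://localhost:8080/v1/files_put/{one}/{two}?{parts.query}"


print(upstream("http://localhost:8888/images/upload/root/folderA/folderB?arg1=A&arg2=B"))
```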