Simple file generator service in Azure? - web-services

My client software needs to call a service online (preferably in Azure) that generates a document that is then retrievable from a URL. I believe it would go something like this:
Client calls service with parameters.
Service generates document and stores it in CDN.
Url pointing to CDN is delivered to client.
Client downloads the document via the URL. (Preferably a restricted number of times.)
What is the easiest/best way to accomplish this?

I'd use BLOBs in Azure storage. You can add the CDN option to Azure storage. Each blob has a unique URL that can be public or private. You could track reads in your system and then make the URL private.
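The "track reads in your system" part could be sketched roughly as follows. This is a minimal in-memory illustration, not Azure SDK code: LinkRegistry, publish, resolve and the example URL are hypothetical names, and a real service would keep the counters in durable storage and make the blob private once resolve starts returning None.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentLink:
    blob_url: str         # location of the generated document (hypothetical)
    remaining_reads: int  # downloads still allowed


class LinkRegistry:
    """In-memory stand-in for whatever store tracks reads (assumption)."""

    def __init__(self) -> None:
        self._links = {}  # token -> DocumentLink

    def publish(self, blob_url: str, max_reads: int) -> str:
        """Register a generated document and return an unguessable token."""
        token = secrets.token_urlsafe(16)
        self._links[token] = DocumentLink(blob_url, max_reads)
        return token

    def resolve(self, token: str) -> Optional[str]:
        """Return the blob URL and count the read, or None once exhausted."""
        link = self._links.get(token)
        if link is None or link.remaining_reads <= 0:
            return None
        link.remaining_reads -= 1
        return link.blob_url
```

The client would be handed the token (or a URL containing it) rather than the blob URL itself, so the service sees every download attempt and can stop serving after the allowed number of reads.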

Related

How to use the TranscribeStreamingClient in the browser with credentials

I want to offer real-time (RT) transcription in my browser app using AWS Transcribe. I see there is the TranscribeStreamingClient, which accepts optional AWS credentials. I assume these credentials are required for access to the S3 bucket?
What I want to know is how to set up auth in my app so that I can make sure my users don't go overboard with the number of minutes they transcribe.
For example:
I would expect to generate a pre-signed URL on my backend that expires in X seconds/minutes and pass it to the web client, which then uses it to handle the rest of the communication (similar to S3).
However, I don't see such an option, and the only solution that I keep circling back to is feeding the audio packets from the client to my backend, which then handles all the auth and just forwards them to the service via the streaming client there. This would be okay, but the documentation says that the TranscribeStreamingClient should be compatible with browser integrations. What am I missing?

Security concern in direct browser uploads to S3

The main security concern in direct JS browser uploads to S3 is that users will store their S3 credentials on the client side.
To mitigate this risk, the S3 documentation recommends using short-lived keys generated by an intermediate server:
A file is selected for upload by the user in their web browser.
The user’s browser makes a request to your server, which produces a temporary signature with which to sign the upload request.
The temporary signed request is returned to the browser in JSON format.
The browser then uploads the file directly to Amazon S3 using the signed request supplied by your server.
The problem with this flow is that I don't see how it helps in the case of public uploads.
Suppose my upload page is publicly available. That means the server API endpoint that generates the short-lived key needs to be public as well. A malicious user could then just find the address of the API endpoint and hit it every time they want to upload something. The server has no way of knowing if the request came from a real user on the upload page or from any other place.
Yeah, I could check the domain on the request coming in to the API and validate it, but the domain can easily be spoofed (when the request is not coming from a browser client).
Is this whole thing even a concern? The main risk is someone abusing my S3 account and uploading stuff to it. Are there other concerns that I need to know about? Can this be mitigated somehow?
"Suppose my upload page is publicly available. That means the server
API endpoint that generates the short-lived key needs to be public as
well. A malicious user could then just find the address of the API
endpoint and hit it every time they want to upload something. The
server has no way of knowing if the request came from a real user on
the upload page or from any other place."
If that concerns you, you would require your users to log in to your website somehow, and serve the API endpoint behind the same server-side authentication service that handles your login process. Then only authenticated users would be able to upload files.
You might also want to look into S3 pre-signed URLs.
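As a rough sketch of putting the key-issuing endpoint behind login: the handler below refuses to hand out anything unless the request carries a known session. SESSIONS, issue_upload_grant and GRANT_LIFETIME_SECONDS are hypothetical names, the session store is a plain dict for illustration, and the returned grant only stands in for the signed S3 request a real server would generate.

```python
import time
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical session store: session id -> user id.
SESSIONS = {"session-abc": "alice"}
GRANT_LIFETIME_SECONDS = 60  # assumed value; keep grants short-lived


def issue_upload_grant(session_id: Optional[str], filename: str,
                       now: Optional[float] = None) -> dict:
    """Return an upload grant for logged-in users, or a 401-style error."""
    user = SESSIONS.get(session_id) if session_id else None
    if user is None:
        return {"status": 401, "error": "login required"}
    now = time.time() if now is None else now
    # Keep only the final path component and scope the key to the user,
    # so every upload stays attributable to an account.
    safe_name = PurePosixPath(filename).name
    return {
        "status": 200,
        "key": f"uploads/{user}/{safe_name}",
        "expires_at": int(now) + GRANT_LIFETIME_SECONDS,
    }
```

Because each grant is tied to an account, abuse can be traced and limited per user instead of being anonymous.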

Securing publicly accessible REST endpoints

We have a REST endpoint that provides some back end services to our publicly available Web site. The web site does not require any user authentication to access its content. Anyone can access it anonymously.
Given this scenario, we would still like the back-end REST API to be somewhat secured, in the sense that only users of our Web site can call it.
We don't want a malicious user to run a script outside the browser bombarding it, for example.
We don't even want them to run a script automating the UI to access the endpoint.
I understand that a fully public endpoint without user authentication is somewhat impossible to secure. But can we restrict usage to valid scenarios?
Some ideas:
Use TLS/SSL for the communication - this protects the channel only.
Use some API key (that periodically expires) that the client/browser needs to pass to the server (a malicious user can still use the key).
Use the key to throttle the number of requests.
Use it in conjunction with a CSRF token?
Use CAPTCHA on the web site to ensure a human user (adds an element of annoyance for the end user).
Use IP whitelisting.
Use load balancing and scaling of server to handle loads.
I suppose this should be a scenario occurring in the wild.
What security steps are prevalent?
Is it possible to restrict usage via only the website and not via a script?
If it's not possible to secure, what kind of mitigations are used with such public REST endpoints?

Improvements on cookie based session management

"Instead of using cookies for authorization, server operators might
wish to consider entangling designation and authorization by treating
URLs as capabilities. Instead of storing secrets in cookies, this
approach stores secrets in URLs, requiring the remote entity to
supply the secret itself. Although this approach is not a panacea,
judicious application of these principles can lead to more robust
security." A. Barth
https://www.rfc-editor.org/rfc/rfc6265
What is meant by storing secrets in URLs? How would this be done in practice?
One technique that I believe fits this description is requiring clients to request URLs that are signed with HMAC. Amazon Web Services offers this technique for some operations, and I have seen it implemented in internal APIs of web companies as well. It would be possible to sign URLs server side with this or a similar technique and deliver them securely to the client (over HTTPS) embedded in HTML or in responses to XMLHttpRequests against an API.
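To make the idea concrete, here is a minimal sketch of signing and verifying a URL with HMAC, using only the Python standard library. It is not the scheme AWS uses; SECRET_KEY, sign_url, verify_url and the expires/sig parameter names are assumptions chosen for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET_KEY = b"server-side-secret"  # assumption: known only to the server


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the path and expiry, hex-encoded."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Append an expiry time and a signature to the path."""
    expires = int(time.time() if now is None else now) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False
    expected = _signature(parts.path, expires)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig.encode(), expected.encode())
```

The secret never leaves the server; the client only holds the signed URL, which stops working if the path or expiry is altered or once the expiry has passed.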
As an alternative to session cookies, I'm not sure what advantage such a technique would offer. However, in some situations, it is convenient or often the best way to solve a problem. For example, I've used similar techniques when:
Cross Domain
You need to give the browser access to a URL that is on another domain, so cookies are not useful, and you have the capability to sign a URL server side to give access, either on a redirect or with a long enough expiration that the browser has time to load the URL.
Examples: Downloading files from S3. Progressive playback of video from CloudFront.
Closed Source Limitations
You can't control what the browser or other client is sending, aside from the URL, because you are working with a closed source plugin of some kind and can't change its behavior. Again you sign the URL server side so that all the client has to do is GET the URL.
Examples: Loading video captioning and/or sprite files via WebVTT into a closed-source Flash video player. Sending a payload along with a federated single sign-on callback URL, when you need to ensure that the payload can't be changed in transit.
Credential-less Task Worker
You are sending a URL to something other than a browser, and that something needs to access the resource at that URL, and on top of that you don't want to give it actual credentials.
Example: You are running a queue consumer or task-based worker daemon or maybe an AWS Lambda function, which needs to download a file, process it, and send an email. Simply pre-sign all the URLs it will use, with a reasonable expiration, so that it can perform all the requests it needs to without any additional credentials.

privacy on Amazon S3

I have an app that lets users post and share files. Currently my server serves these files, but as the data grows I'm investigating using Amazon S3. However, I use dynamic rules for what is public and what is private between certain users etc., so the server is the only possible arbiter, i.e. permissions cannot be decided on the app/client end.
Simplistically, I guess I can let my server GET data from S3, then send it back to the app. But obviously then I'm paying for bandwidth twice, not to mention making my server do unnecessary work.
This seems like a fairly common problem, so I wonder how people typically solve it? (Like I read that Dropbox stores its data on S3.)
We have an application with pretty much the same requirements, and there's a really good solution available. S3 supports signed, expiring S3 URLs to access objects. If you have a private S3 object that you, but not others, can access, you can create such a URL. If you give that URL to someone else, he or she can use it to fetch the object that they normally have no access to.
So the solution for your use case is:
User does a GET to the URL on your web site
Your code verifies that the user should be able to see the object (via your application's custom, dynamic rules)
The web site returns a redirect response to a signed S3 URL that expires soon, say in 5 minutes
The user's web browser does a GET to that signed S3 URL. Since it's properly signed and hasn't yet expired, S3 returns the contents of the object directly to the user's browser.
The data goes from S3 to the user without ever traveling back out through your web site. Only users your application has authorized can get the data. And if a user bookmarks or shares the URL it won't work once the expiration time has passed.
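The steps above can be sketched as a small handler. The permission check and the signing step are passed in as functions, because both depend on your application: handle_download, can_read and presign are hypothetical names. With boto3, presign could plausibly wrap the S3 client's generate_presigned_url("get_object", ...) call with ExpiresIn set to the lifetime.

```python
from typing import Callable, Dict, Tuple

URL_LIFETIME_SECONDS = 300  # the "say in 5 minutes" from the steps above


def handle_download(
    user: str,
    object_key: str,
    can_read: Callable[[str, str], bool],
    presign: Callable[[str, int], str],
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a GET on the site's own download URL."""
    # Step 2: apply the application's custom, dynamic rules.
    if not can_read(user, object_key):
        return 403, {}
    # Step 3: redirect to a signed URL that expires soon.
    return 302, {"Location": presign(object_key, URL_LIFETIME_SECONDS)}


# Illustrative stand-ins for the two injected functions (assumptions).
ALLOWED = {("alice", "reports/q1.pdf")}


def demo_can_read(user: str, key: str) -> bool:
    return (user, key) in ALLOWED


def demo_presign(key: str, ttl: int) -> str:
    return f"https://bucket.example.invalid/{key}?ttl={ttl}"
```

Calling handle_download("alice", "reports/q1.pdf", demo_can_read, demo_presign) yields a 302 with a Location header, while any user not covered by the rules gets a 403 and never sees a signed URL.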