I am doing a multi-part file upload using clj-http. I am wondering if there is a way I can track the progress of a file upload. Perhaps, some function that gets called periodically with how much of the file has been uploaded so far?
clj-http uses Apache HTTP Client underneath, so you can reuse the solution presented in the answer to another question. However, it cannot be reused directly.
The code presented in the linked answer provides an enhanced implementation of HttpEntity. clj-http currently doesn't support providing your own instance of HttpEntity as your request body.
You have two options:
Submit a pull request to clj-http so it supports providing an instance of HttpEntity as the :body value (e.g. by adding another cond branch that checks for an HttpEntity value, or by turning wrap-input-coercion into a multimethod so you can extend it for HttpEntity).
Provide similar logic to that from the mentioned FileEntity and OutputStreamProgress, but in your own implementation of org.apache.http.entity.mime.content.ContentBody. clj-http supports providing these as values for multipart attachments. The drawback is that this tracks progress per attachment, not for the request as a whole; a sketch follows below.
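For illustration, here is a minimal sketch of the second option in Java. Only FileBody and writeTo come from Apache HttpMime; the ProgressListener interface is hypothetical:

```java
import java.io.File;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.http.entity.mime.content.FileBody;

// Hypothetical callback invoked as bytes are written to the wire.
interface ProgressListener {
    void transferred(long bytesWritten, long totalBytes);
}

// A ContentBody that counts every byte it writes out.
class ProgressFileBody extends FileBody {
    private final ProgressListener listener;

    ProgressFileBody(File file, ProgressListener listener) {
        super(file);
        this.listener = listener;
    }

    @Override
    public void writeTo(OutputStream outStream) throws IOException {
        super.writeTo(new FilterOutputStream(outStream) {
            private long transferred = 0;

            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                out.write(b, off, len); // `out` is the wrapped stream
                transferred += len;
                listener.transferred(transferred, getContentLength());
            }

            @Override
            public void write(int b) throws IOException {
                out.write(b);
                listener.transferred(++transferred, getContentLength());
            }
        });
    }
}
```

On the Clojure side you would then pass an instance of this class as the :content of a multipart part; check your clj-http version's documentation to confirm it accepts ContentBody values there.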
I've seen the example from the Azure GitHub repository, and you can see that when uploading data it uses a stream and also provides the MD5 hash of that stream.
My main purpose is to upload a file into Azure and provide the MD5; if the local MD5 and the one calculated by Azure don't match, it returns an error.
I also know that I can use the uploadFrom function, which takes a filename, opens it, and takes care of the "chunking", etc. The main problem is that it doesn't let me specify the MD5 hash the way upload does: uploadFrom accepts a different options structure, one without a TransactionalContentHash member.
Is there any functionality in Azure that would let me send a file and provide its MD5? I know I could open the file, read it in chunks, calculate the MD5, and send them one by one, but I want to avoid that headache if possible.
It depends on what you want to do. If you just want an online checksum calculator, it's not worth it.
If you do want that, the simplest way is probably two uploads: one for the file (the data) and one for a checksum file (the "control" file), with input validation, of course. From there, you calculate the MD5 of the data and check it against the control file, as in the sketch below.
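For reference, a minimal sketch of that server-side check in Java (the paths are placeholders, and the control file is assumed to contain the hex-encoded MD5):

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.util.HexFormat;

public class ControlFileCheck {
    public static boolean matches(Path data, Path controlFile) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // Stream the uploaded data through the digest without holding it in memory.
        try (InputStream in = new DigestInputStream(Files.newInputStream(data), md5)) {
            in.transferTo(OutputStream.nullOutputStream());
        }
        String actual = HexFormat.of().formatHex(md5.digest());
        String expected = Files.readString(controlFile).trim();
        return actual.equalsIgnoreCase(expected);
    }
}
```

The same digest code applies to the validation flow described below; only the source of the expected checksum changes.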
If you want to create a repository for another business process, then it's a different story. In fairness, what you described is not how a validation flow is designed at all. You don't ask a client for data plus their version of its checksum, then hash the data and check whether the two checksums are identical. The reason is simple: a server should never, ever trust a client.
Imagine this scenario: there is some data. It can be a file or transient data; it's irrelevant. I edit it (again, the purpose is irrelevant; I might do it to sabotage, for fun, or because I am infecting the data with malware). I know the checksum has changed and I know you would catch that. So what I do is send you the altered file and the new checksum, not the one the file had before I edited it. Now you check the data and see the checksums are identical. You tell me "thank you sir, you are honest and did not modify anything". But I did, and I lied, didn't I? :)
The real flow of a validation approach is this:
The client and server agree beforehand on what data will be sent, so you have a list of the valid checksums
The client sends the data to the server
The server calculates the checksum and checks whether it matches
If it doesn't, the server rejects the data; if it does, it moves on to the next step (the digest code sketched above applies here, with the expected value taken from the agreed list)
Now, I know there are sites that give you files and their checksums in a separate file, but that is the reversed flow: server to client. In theory, a client can trust a server.
There is an approach, though, that works in a way similar to the one you describe: the client sends data to the server (or to a centralized service that distributes the data to other clients) along with a sort of checksum, and the server uses it to validate that nobody changed the data along the way (as in MITM attacks).
In that case you should consider secure transmission (look up HTTPS and TLS) and digital signatures (also look up certificates and public/private keys). While this paradigm is usually used for web services rather than file uploads, it can be done if the file is packaged client-side by an app (for example as a .p7s file). For that you need a key exchange between your server and the client.
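For the digital-signature variant, the JDK already covers verification. A minimal sketch, assuming SHA256withRSA and a public key exchanged beforehand (your PKI may dictate a different algorithm):

```java
import java.security.PublicKey;
import java.security.Signature;

public class SignatureCheck {
    // Returns true if `sig` was produced over `data` by the private key
    // matching `publicKey` (exchanged with the client beforehand).
    public static boolean verify(byte[] data, byte[] sig, PublicKey publicKey)
            throws Exception {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(publicKey);
        verifier.update(data);
        return verifier.verify(sig);
    }
}
```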
Small caveat: MD5 is not secure. Don't rely on it for sensitive data.
I have a system that is able to produce a custom video (based on input text) faster than real-time.
I would like to create an HTTP endpoint: /create_video?description=dog riding a horse that, as part of the response, returns the URL to the produced video.
Videos can be quite long, and generation can take some time. Rather than waiting for it to complete, I would like to return the response as soon as the first frames are available, so that the user can watch instantly using the provided URL (we generate faster than real-time, so there will be no buffering). The URL must keep pointing to the generated video indefinitely (even months after generation).
I am using Google Cloud. What would be the recommended way to do that?
I could create a custom endpoint that serves the videos and has the described properties, but maybe something as simple as Cloud Storage could work (though I was not able to get it to serve reads while the write was not yet finalized)?
As per @Piotr Dabkowski, after doing some extra research it seems it's not that easy. My best idea is to implement a custom endpoint that streams the result while the file is being generated, using a temporary array entry in the DB. Once the file is fully generated (the DB entry will be empty and point to the Cloud Storage location), the endpoint redirects to Cloud Storage. A rough sketch of that idea follows.
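As a JDK-only illustration (VideoStore is a hypothetical facade over the DB entry and the bytes generated so far, not a GCP API):

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

interface VideoStore { // hypothetical facade over the DB + generator
    String finalUrl(String id);           // Cloud Storage URL once finished, else null
    byte[] readFrom(String id, long off); // blocks until new frames arrive; empty when done
}

public class StreamingVideoServer {
    public static void serve(VideoStore store) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/video", exchange -> handle(exchange, store));
        server.start();
    }

    static void handle(HttpExchange ex, VideoStore store) throws IOException {
        String id = ex.getRequestURI().getQuery(); // e.g. "id=abc123"; parse as needed
        String gcsUrl = store.finalUrl(id);
        if (gcsUrl != null) {
            // Fully generated: hand off to Cloud Storage permanently.
            ex.getResponseHeaders().add("Location", gcsUrl);
            ex.sendResponseHeaders(302, -1);
            return;
        }
        ex.getResponseHeaders().add("Content-Type", "video/mp4");
        ex.sendResponseHeaders(200, 0); // content length 0 => chunked streaming
        try (OutputStream out = ex.getResponseBody()) {
            long offset = 0;
            while (true) {
                byte[] chunk = store.readFrom(id, offset);
                if (chunk.length == 0) break; // generation finished
                out.write(chunk);
                out.flush(); // push frames to the client as they are produced
                offset += chunk.length;
            }
        }
    }
}
```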
I have a .war file deployed in Jetty (I didn't build it, and it's not possible to create a new instance).
An OPTIONS request to http://example.com/rest/object/{uuid} responds with HEAD, DELETE, GET, OPTIONS. The people who built the .war claim that it's not an issue with their file.
Is there a Jetty config file I can change to allow all the HTTP methods?
If this is something I have to do in a Jetty Java file, I am a Jetty noob, so please be verbose, or point me to some docs that I can read.
Note: I can POST via cURL, but not via HTTP...
Edit: (I was POSTing to a different endpoint with cURL)
This is not an answer to the question I posted, but it allowed me to understand what was happening, which led to the question being irrelevant for my particular issue. That said, I won't mark this as the answer, in the event the question is answerable (which I don't think it is; you'll see why in a minute).
The short answer is I was trying to POST to an endpoint that expects the object to already exist:
http://example.com/rest/object/{uuid}
What I should have done was use an endpoint, where if you POST, it understands you want to create a new object:
http://example.com/rest/object
The longer answer
Let's say you have a REST Endpoint that allows you to get a specific object, like this:
http://example.com/rest/object/{uuid}
Since you're dealing with a specific object, you typically wouldn't POST a new object there. POST means 'create a new object'. If you were able to POST there, you'd essentially be overwriting that object... and that's what PUT is for, though it's debatable. Also, PUT wasn't an option either...
So, because of its subjective nature, some web services limit request methods so that you must do things the way the architects intended. The REST server I am running doesn't have much in the way of documentation, so I was unaware of these restrictions, and I'm still puzzled by them.
What I thought I was saying was "OK, I want to create an object with this ID. So if I POST my data to that specific UUID, the server will know I want to create a new object." What I should have been saying is "OK, I want to create a new object. Its UUID is already defined in the data, so all I need to do is send it to the endpoint that handles those objects and expects a POST." Like this:
http://example.com/rest/object
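In code, the difference is just the request URI. A hedged sketch with Java's built-in HTTP client (the host, UUID, and JSON body are made up):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateObject {
    public static void main(String[] args) throws Exception {
        // The UUID travels in the body, so the collection endpoint is the target.
        String json = "{\"uuid\":\"123e4567-e89b-12d3-a456-426614174000\",\"name\":\"example\"}";
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://example.com/rest/object")) // not .../object/{uuid}
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode()); // expect 201 Created on success
    }
}
```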
I'm trying to use AWS API Gateway as a proxy in front of an image service.
I'm able to get the image to come through but it gets displayed as a big chunk of ASCII because Content-Type is getting set to "application/json".
Is there a way to tell the gateway NOT to change the source Content-Type at all?
I just want "image/jpeg", "image/png", etc. to come through.
I was trying to format a string to be returned without quotes and discovered the Integration Response functionality. I haven't tried this fix myself, but something along these lines should work:
Go to the Method Execution page of your Resource,
click on Integration Response,
expand Method Response Status 200,
expand Mapping Templates,
click "application/json",
click the pencil next to Output Passthrough,
change "application/json" to "image/png"
Hope it works!
I apologize, in advance, for giving an answer that does not directly answer the question and instead suggests you adopt a different approach... but based on the question and comments, and my own experience with what I believe to be a similar application, it seems like you may be using the wrong tool for the problem, or at least a tool that is not the optimal choice within the AWS ecosystem.
If your image service was running inside Amazon Lambda, the need for API Gateway would be more apparent. Absent that, I don't see it.
Amazon CloudFront provides fetching of content from a back-end server, caching of content (at over 50 "edge" locations globally), no charge for the storage of cached content, and you can configure up to 100 distinct hostnames pointing to a single Cloudfront distribution, in addition to the default xxxxxxxx.cloudfront.net hostname. It also supports SSL. This seems like what you are trying to do, and then some.
I use it, quite successfully, for exactly the scenario you describe: "a proxy in front of an image service." Exactly what my image service and your image service do may differ (mine is a resizer that can look up the source URL of missing/never-before-requested images, fetch, and resize), but fundamentally it seems like we're accomplishing a similar purpose.
Curiously, the pricing structure of CloudFront in some regions (such as us-east-1 and us-west-2) is such that it's not only cost-effective; using CloudFront can actually be almost $0.005 per gigabyte downloaded cheaper than not using it.
In my case, in addition to the back-end image service, I also have an S3 bucket with a single file in it, attached to a single path in the CloudFront distribution (as a second "custom origin"), for the sole purpose of serving up /robots.txt, to control direct access to my images by well-behaved crawlers. This allows the robots.txt file to be managed separately from the image service itself.
If this doesn't seem to address your need, feel free to comment and I will clarify or withdraw this answer.
@kjsc: we finally figured out how to get this working on an alternate question with base64-encoded image data, which you may find helpful in your solution:
AWS Gateway API base64Decode produces garbled binary?
To answer your question: to get the Content-Type to come through as a hard-coded value, you would first go into the Method Response screen and add a Content-Type header with whatever content type you want.
Then you'd go into the Integration Response screen and set the Content-Type to your desired value (image/png in this example). Wrap 'image/png' in single quotes.
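If you'd rather script those console steps, here is a hedged sketch with the AWS SDK for Java v2 (all IDs are placeholders, and the Content-Type header must already be declared on the method response, as described above):

```java
import java.util.Map;
import software.amazon.awssdk.services.apigateway.ApiGatewayClient;
import software.amazon.awssdk.services.apigateway.model.PutIntegrationResponseRequest;

public class FixContentType {
    public static void main(String[] args) {
        try (ApiGatewayClient api = ApiGatewayClient.create()) {
            api.putIntegrationResponse(PutIntegrationResponseRequest.builder()
                .restApiId("abc123")   // placeholder REST API id
                .resourceId("def456")  // placeholder resource id
                .httpMethod("GET")
                .statusCode("200")
                // Note the single quotes: they mark a literal header value.
                .responseParameters(Map.of(
                    "method.response.header.Content-Type", "'image/png'"))
                .build());
        }
    }
}
```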
OK, basically I'm fetching data from a website using curl and parsing the contents using CkHtmlToText.
My issue is how to fetch the new data the website writes over time.
For example website contents are as follow:
-test1
-test2
After 1 second contents are :
-test1
-test2
-test3
How do I fetch only the next line the website wrote that I haven't gotten yet, which is "test3"?
Any ideas? Thank you.
The language I'm using is Visual C++.
HTTP requests are stateless. You make a request, you get a result, then you make another completely independent request, you get another result, and so on. If the resource you are trying to access is changing over time, you need to make multiple requests, where each time you will get the full updated resource.
I imagine you may be describing a web page that automatically updates while you are looking at it (like a Twitter feed, for example). In that case, the response contains a script that allows the browser to fetch new data and inject it into the DOM. Unless you also plan to build the DOM and use a JavaScript engine (basically implementing a web browser) to run the script, this is probably not useful to you. Instead, you are better off finding an API that gives you data in a format that is easy to parse and get updates for. If this API is a REST API (built on HTTP), then you will still need to make independent requests to get updates.
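For example, a minimal polling loop (sketched in Java rather than the asker's Visual C++, but the idea is the same; the URL is a placeholder): it re-fetches the whole resource each time and prints only the lines it hasn't seen yet, e.g. "-test3".

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class LinePoller {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://example.com/feed")) // placeholder URL
            .build();
        List<String> seen = List.of();
        while (true) {
            // Each request is independent: we always receive the full resource.
            String body = client.send(request,
                HttpResponse.BodyHandlers.ofString()).body();
            List<String> lines = body.lines().toList();
            // Print only the lines that appeared since the last fetch.
            for (int i = seen.size(); i < lines.size(); i++) {
                System.out.println("new: " + lines.get(i));
            }
            seen = lines;
            Thread.sleep(1000); // re-fetch every second
        }
    }
}
```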