Transferring large files with web services

What is the best way to transfer large files with web services? At present we use the straightforward option: convert the binary data to Base64 and embed the encoded string in the SOAP envelope itself. But this slows down application performance considerably. Please suggest something for performance improvement.

In my opinion the best way to do this is to not do this!
Web services were simply not designed for transferring large files. You should really transfer a URL to the file and let the receiver of the message pull the file itself.
IMHO that is a better way to do this than encoding and sending the data inline.
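As a minimal sketch of that pull model, the consumer could fetch the referenced file over plain HTTP with libcurl (the function name, URL, and output path are placeholders; error handling is trimmed):

    #include <stdio.h>
    #include <curl/curl.h>

    /* Fetch the file referenced in the SOAP response over plain HTTP.
       With no write callback set, libcurl writes the response body to
       the FILE* passed via CURLOPT_WRITEDATA. */
    int fetch_file(const char *url, const char *out_path)
    {
        FILE *out = fopen(out_path, "wb");
        CURL *curl = curl_easy_init();
        if (!out || !curl)
            return 1;
        curl_easy_setopt(curl, CURLOPT_URL, url);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
        CURLcode res = curl_easy_perform(curl);
        curl_easy_cleanup(curl);
        fclose(out);
        return res == CURLE_OK ? 0 : 1;
    }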

Check out MTOM, a W3C standard designed to transfer binary files through SOAP.
From Wikipedia:
MTOM provides a way to send the binary data in its original binary form, avoiding any increase in size due to encoding it in text.
Related resources:
SOAP Message Transmission Optimization Mechanism
Message Transmission Optimization Mechanism (Wikipedia)

When to use raw binary and when base64?

I want to develop a service which receives files from users. At first, I was willing to implement uploads using raw binary in order to save time (Base64 increases file size by about 33%), but reading about Base64 it seems to be very useful if you don't want problems uploading files.
The question is: what are the downsides of implementing raw binary uploads, and in which cases does it make sense? I will develop both the client and the server, so I will have control over those two ends, but what about routers or the network in between: can they corrupt data that is not in Base64?
I'm trying to investigate what Dropbox or Google Drive do and why, but I can't find an article.
You won't have any problems using raw binary for file uploads. All Internet Protocol networking hardware is required to be 8-bit clean, that is, to transmit all 8 bits of every byte/octet.
If you use the TCP protocol, it guarantees reliable transmission of octets (bytes). Encoding with Base64 would be a waste of time and bandwidth.
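To make the 33% figure concrete: Base64 emits four output characters for every three input bytes, so the encoded size is 4 * ceil(n / 3). A quick sketch:

    #include <stdio.h>

    /* Base64 maps every 3 input bytes to 4 output characters
       (including padding), so encoded size = 4 * ceil(n / 3). */
    size_t base64_encoded_size(size_t n)
    {
        return 4 * ((n + 2) / 3);
    }

    int main(void)
    {
        size_t raw = 100 * 1024 * 1024;   /* a 100 MB upload */
        size_t enc = base64_encoded_size(raw);
        printf("%zu -> %zu bytes (~%.0f%% larger)\n",
               raw, enc, 100.0 * enc / raw - 100.0);
        return 0;
    }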

Should output be encoded at the API or client level?

We are moving our Web app architecture to being microservice based. We have an internal debate as to whether a REST API that provides content (in JSON, let's say) should be looking to encode content to make it safe, or whether the consumers that take that content and display it (in HTML, for example, or otherwise use it) should be responsible for that encoding. The use case is to prevent XSS attacks and similar.
The provider stance is "Well, we can't know how to encode it for everyone, or how you're going to use the content, so of course the consumers should encode the content."
The consumer stance is "There is one provider and multiple consumers, so it's more secure to do it once in the providing API than to hope that every consumer does it."
Are there any generally accepted best practices on this and why?
As a rule, data passing through "internal" processes (whatever "internal" might mean to us) should be stored or encoded in whatever "internal" format makes sense. The format chosen is typically designed to minimize encoding/decoding steps and to prevent data loss.
Then, upon output, data is encoded using whatever output format makes sense. Preventing data loss is important, but proper escaping and formatting are also key here.
So, for example, with internal APIs, data in binary format may be sufficient. But when you output JSON or HTML or XML or PDF, you have to encode and escape your data appropriately to fit the output format.
The important point here is that different output formats have different concepts of "safe". What's "safe" for HTML may not be safe for JSON, and what's safe for JSON may not be safe for SQL. Data is encoded upon output specifically so that you can use the proper encoding for the task. You cannot assume that this step is done for you ahead of time, nor should you put your output function in the position to determine whether or not encoding must be done. If you stick with the rule: "output function ALWAYS encodes for safety", then you will never have to worry about data injection attacks.
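As an illustration of "the output function always encodes", here is a minimal HTML-escaping sketch (the function name and escape set are only illustrative; production code should use a vetted encoder such as the OWASP ESAPI mentioned below):

    #include <stdio.h>

    /* Escape the characters that are dangerous in an HTML text context.
       A JSON or SQL output path would need a different encoder -- which
       is exactly why encoding belongs at the output step. */
    void html_escape(FILE *out, const char *s)
    {
        for (; *s; s++) {
            switch (*s) {
            case '&':  fputs("&amp;", out);  break;
            case '<':  fputs("&lt;", out);   break;
            case '>':  fputs("&gt;", out);   break;
            case '"':  fputs("&quot;", out); break;
            case '\'': fputs("&#x27;", out); break;
            default:   fputc(*s, out);       break;
            }
        }
    }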
I would say that the two important points are the following:
The encoding used by the provider MUST be specified with extreme clarity and precision in a reference document, so that all consumer implementors can know what to expect.
Whatever default encoding is used by the provider MUST keep all needed information, i.e. still be amenable to transcoding by any consumer who would wish to do it.
If you follow these two rules then you will have done 95% of the job for reliability and security.
As for your specific question, a good practice is a middle ground: the provider applies a "generic" encoding by default, but consumers can optionally ask for a specific encoding which the provider may then apply. This allows the provider to support a number of dumb, lightweight clients of possibly different kinds, and it can be extended later with extra encodings without breaking the API.
I firmly believe it is both the consumer and the provider that need to do their part in being good citizens in the security space.
As the provider, I want to make sure I deliver a secure product. I don't need to know the context in which my client is going to use my product; all I need to know is how I am going to deliver it. If my delivery is in JSON, then I can use that context to escape my data before sending it off, and similarly for XML, plain text, etc. Furthermore, there are transport methods that already aid in security. JSONP is one such delivery method; it ensures the payload is consumed appropriately.
As the consumer (and in our environment, by the way, no one is the final consumer; we are all providers to the final end client, mostly end users via a web browser), we also have to secure the data at our end. I would never trust a black-box API to do this job for me; I would always make a point of ensuring a secure payload. There are many tools out there, such as the ESAPI project from OWASP, that aid in sanitizing data by context. Remember that you are eventually sending this data on to the end user (browser), and if there is something awry you won't be able to pass the buck: your service will be viewed as the vulnerable one regardless of where the flaw lies. Additionally, as the consumer, you may not always be able to rely on the black-box provider to fix their flaws in a timely fashion. What if their support is lacking, or they have higher priorities? Does that mean you continue to ship a known flaw to your end users?
Security is about layers, and having safeguards at the source and end-points is always preferable.

Is there any compression available in libcurl?

I need to transfer a huge file from a local machine to a remote machine using libcurl with C++. Is there any compression option built into libcurl? As the data to be transferred is large (100 MB to 1 GB in size), it would be better if such an option were available in libcurl itself. I know we can compress the data ourselves and send it via libcurl, but I just want to know whether there is a better way of doing so.
Note: In my case, many client machines transfer such huge data to the remote server at regular intervals.
thanks,
Prabu
According to the documentation for PHP's curl_setopt() and the CURLOPT_ENCODING option, you may specify:
The contents of the "Accept-Encoding: " header. This enables decoding of the response. Supported encodings are "identity", "deflate", and "gzip". If an empty string, "", is set, a header containing all supported encoding types is sent.
Here are some examples (just hit search in your browser and type in "compression"), but I don't know how exactly it works and whether it expects already-gzipped data.
You may still use gzcompress() and send compressed chunks on your own (and I would do the task this way... you'll have better control over what's actually going on, and you'll be able to change the algorithms used).
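Note that in libcurl's C API the corresponding option only negotiates compression of the response, not of the data you upload. A minimal sketch (CURLOPT_ACCEPT_ENCODING is the current name; older libcurl releases call it CURLOPT_ENCODING):

    #include <curl/curl.h>

    CURL *curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/data");
    /* Empty string = offer every encoding libcurl supports (gzip,
       deflate, ...); libcurl then decompresses the response itself. */
    curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "");
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);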
You need to compress your file with zlib yourself and send it; some modifications will probably also be needed on the server side.
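A rough sketch of that approach, assuming the remote server cooperates and inflates the body (the URL, header choice, and error handling are simplified for illustration):

    #include <stdlib.h>
    #include <curl/curl.h>
    #include <zlib.h>

    /* Compress a buffer with zlib, then POST it with a header telling
       the (cooperating) server to inflate it. zlib's compress2() output
       is the zlib format, which matches HTTP's "deflate" token. */
    int upload_compressed(const char *url, const unsigned char *data, uLong size)
    {
        uLongf comp_size = compressBound(size);
        unsigned char *comp = malloc(comp_size);
        compress2(comp, &comp_size, data, size, Z_BEST_SPEED);

        CURL *curl = curl_easy_init();
        struct curl_slist *hdrs =
            curl_slist_append(NULL, "Content-Encoding: deflate");
        curl_easy_setopt(curl, CURLOPT_URL, url);
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, comp);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, (long)comp_size);
        CURLcode res = curl_easy_perform(curl);

        curl_slist_free_all(hdrs);
        curl_easy_cleanup(curl);
        free(comp);
        return res == CURLE_OK ? 0 : 1;
    }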

Transmit raw vertex information to XTK?

We're using XTK to display data processed and created on a server. In our particular case, it's a parallel isocontouring application. As it currently stands, we convert to the (textual) VTK format and pass the entire (imaginary) VTK file over the wire to the client, where XTK renders it. This introduces substantial overhead, as the text format outweighs the in-memory format by a considerable amount.
Is there a recommended mechanism available for transmitting binary data directly, either through an alternate format that is well-described or by constructing XTK primitives inside the JavaScript code itself?
Parsing an X.object from JSON should be supported. So you could generate the JSON on the server side and use the X.object(jsonobject) copy constructor to safely down-cast it. This should also give the advantage that the objects can be 'webgl-ready' and do not require any client-side parsing, which should result in instant loading.
I was planning to play with that myself soon but if you get anything to work, please let us know.
Just keep in mind that you need to match the X.object structure even in JSON. The best way to see what xtk expects is to JSON.stringify a webgl-ready X.object.
XMLHttpRequest, in its second specification (the latest one), allows cross-domain HTTP requests (but you must have control of the response headers, e.g. via PHP's header(), on the server side).
In addition, it allows sending ArrayBuffers, Blobs, or Documents (look here). Then, on the client side, you can write your own parser for that Blob or (I think it fits your case better) that ArrayBuffer using typed-array views (see doc here). XMLHttpRequest goes from client to server, but also look at HTML5 WebSockets; it seems they can transfer binary arrays too (they say it here: ).
In every case you will need a parser on the client side to transform the binary data into a string or an X.object.
I hope this helps.
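If you do go the raw-binary route, the server side (the isocontouring code is presumably C/C++) can simply emit the packed vertex array. A hypothetical sketch, assuming a little-endian server and a client that wraps the received ArrayBuffer in a Float32Array view:

    #include <stdio.h>

    /* Illustrative only: write n_vertices xyz triples as raw IEEE-754
       floats. The client reads the response as an ArrayBuffer and views
       it with Float32Array -- no text parsing needed. */
    size_t write_vertices(FILE *out, const float *xyz, size_t n_vertices)
    {
        return fwrite(xyz, sizeof(float), 3 * n_vertices, out);
    }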

WSDL possible to transfer a FILE type?

A "checkResult" service deployed on a node machine is defined to return the result on the node to a cluster controller that sends the request.The result on node ,which is in the form of file, may vary drastically in length,as is often the case with daily log files.
At first,i thought it might be ok just using a single string to pack the whole content of the file,so i defined
checkResult(inType *in,OutType *out)
where the OutType* is char*. Then i realized that the string could be in KB length or even more. So i wonder whether it is proper to use string here.
I googled a lot and could not find the max length permitted in wsdl(maybe conflict with the local maxbuffer length as well) and did not find any information about transferring a file type parameter either.
Using struct type may be suggested ,but it could be so nested for the file and difficult to parse when some of the elements inside could be nil and absent.
What'd you do when you need to return a file type result or large amount of data in a webservice?
p.s the server and client both in C.
When transferring a large amount of data in a (SOAP) web service request or response, it is generally better practice to use an attachment mechanism rather than including the data in the body. In rough order from broadest to narrowest adoption, the attachment mechanisms to consider are:
Message Transmission Optimization Mechanism (MTOM) - The newest of these specifications (http://www.w3.org/TR/soap12-mtom/), supported in many of the mainstream languages.
SOAP with Attachments - This specification (http://www.w3.org/TR/SOAP-attachments) has been around for many years and is supported in several languages, but notably not by Microsoft.
Direct Internet Message Encapsulation (DIME) - This specification (http://bgp.potaroo.net/ietf/all-ids/draft-nielsen-dime-02.txt) was pushed by Microsoft, and support has been provided in multiple languages/frameworks including Java and .NET.
Ideally, you would be able to work with a framework that generates code stubs directly from a WSDL describing an MTOM-based web service.
The critical parts of such a WSDL document include:
MTOM policy declaration
Policy application in the binding
Placeholder for the reference to the attachment in the types (schema) section
If you are working contract-first and have a WSDL in hand, the example in section 1.2 of this site (http://www.w3.org/Submission/WS-MTOMPolicy/) shows the simple additions to be made to declare and apply the MTOM policy. Appendix I of the same site shows an example of a schema element which allows a web service client or server to identify a reference to the MTOM attachment.
I have not implemented a web service or client in C, but a brief scan of recently-updated packages revealed gSoap (http://www.cs.fsu.edu/~engelen/soap.html) as a possibility for helping in your endeavors.
Give those documents a look and see if they help to advance your project.
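Since both ends are in C, here is a rough sketch of what the server-side stub might look like with gSoap. The xsd__base64Binary struct with __ptr/__size fields follows gSoap's documented convention, but the exact signature of checkResult is an assumption; the real names come from the code soapcpp2 generates for your WSDL:

    #include <stdio.h>
    #include "soapH.h"   /* generated by gSoap's soapcpp2 (assumption) */

    /* Sketch only: return the file's bytes as base64Binary content
       (gSoap can transmit this as an MTOM attachment when the
       SOAP_ENC_MTOM flag is set). Error handling is trimmed. */
    int ns__checkResult(struct soap *soap, struct InType *in,
                        struct xsd__base64Binary *out)
    {
        FILE *fp = fopen("/var/log/node-result.log", "rb");  /* placeholder */
        if (!fp)
            return soap_receiver_fault(soap, "cannot open result file", NULL);
        fseek(fp, 0, SEEK_END);
        long size = ftell(fp);
        rewind(fp);
        out->__ptr = (unsigned char *)soap_malloc(soap, size); /* freed with the context */
        out->__size = (int)fread(out->__ptr, 1, size, fp);
        fclose(fp);
        return SOAP_OK;
    }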