GCP Cloud CDN will not compress content when set to "Automatic"

GCP Cloud CDN does not compress any responses when the compression mode is set to AUTOMATIC, even though the docs say it should.
No compression takes place and no Content-Encoding header is sent, even though an Accept-Encoding header (gzip, deflate, br) is sent.
A prime example is an ~300 KB file that is not compressed, served through the CDN here: https://himmer.software/main.6a971e1e28a0da9a.js (as of 30.11.2022). The object itself sits in the connected backend bucket.
I feel I must be overlooking something; the MIME type and request headers are correct, so having the compression mode set to AUTOMATIC should return a compressed version of the object.
I set the compression mode to AUTOMATIC, set it back to DISABLED and back to AUTOMATIC again, and ran v1.compute.urlMaps.invalidateCache with /* on the load balancer (which invalidated all records AFAICT), but still nothing.
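For reference, here is a minimal reproduction sketch (Node 18+/TypeScript, using raw response headers with no automatic decompression); the URL is the one above:

// Request the object with an explicit Accept-Encoding header and print what the CDN returns.
import { request } from "node:https";

const req = request(
  "https://himmer.software/main.6a971e1e28a0da9a.js",
  { headers: { "Accept-Encoding": "gzip, deflate, br" } },
  (res) => {
    console.log("status:           ", res.statusCode);
    console.log("content-encoding: ", res.headers["content-encoding"]); // expected gzip or br, observed: undefined
    console.log("content-length:   ", res.headers["content-length"]);
    res.resume(); // drain the body so the socket can close
  }
);
req.end();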

Are you using the new Google Global Load Balancer or the Classic Load Balancer? Dynamic compression is currently supported only on the Classic GCLB, not on the new GCLB option.

Related

Does anyone know if Cloud Run supports HTTP/2 streaming, given that it does NOT support HTTP/1.1 streaming?

We have a streaming endpoint where data streams through our api.domain.com service to our backend.domain.com service, and as chunks are received in backend.domain.com we write those chunks to the database. In this way we can stream NDJSON requests into our servers and it is fast, very fast.
We were very disappointed to find out that the Cloud Run firewall does NOT support streaming, at least for HTTP/1.1 (tested via curl). curl speaks HTTP/2 to the Google Cloud Run firewall, but Google by default hits our servers with HTTP/1.1 (for some reason; though I saw an option to start the service in HTTP/2 mode that we have not tried).
What I mean by "they don't support streaming" is that Google does not send our servers the request UNTIL it has received the whole thing (i.e. not just the headers, it needs the entire body). This makes things very slow compared to streaming straight through firewall 1, Cloud Run service 1, firewall 2, Cloud Run service 2, database.
I am wondering whether Google's Cloud Run firewall by chance supports HTTP/2 streaming and actually forwards the request headers instead of waiting for the entire body.
I realize Google has body size limits, and I realize we respond to clients with 200 OK before the entire body is received (i.e. we stream back while the request is still being streamed in), so I am totally OK with Google killing the connection if the size limit is exceeded.
So my second question in this post is: if they do support streaming, what will they do when the size limit is exceeded, since I will already have responded with 200 OK at that point?
In this post, my definition of streaming is 'true streaming': you can stream a request into a system and that system can forward it to the next system, reading and forwarding continuously rather than waiting for the whole request. The Google Cloud Run firewall does NOT match my definition of streaming, since it does not pass through the chunks it receives. Our servers send data as they receive it, so if there are many hops there is no added latency, thanks to the webpieces webserver.
Unfortunately, Cloud Run doesn't support HTTP/2 end-to-end to the serving instance.
Server-side streaming is in ALPHA. Not sure if it helps solve your problem. If it does, please fill out the following form to opt in, thanks!
https://docs.google.com/forms/d/e/1FAIpQLSfjwvwFYFFd2yqnV3m0zCe7ua_d6eWiB3WSvIVk50W0O9_mvQ/viewform
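For anyone who wants to test this themselves, a rough sketch (Node 18+/TypeScript; the service URL is a placeholder) that writes NDJSON lines over several seconds. If the front end forwards chunks as they arrive, the service logs each line almost immediately; if it buffers, nothing arrives until the final write:

import { request } from "node:https";

const req = request("https://my-service-abc123-uc.a.run.app/ingest", {
  method: "POST",
  // No Content-Length, so Node sends the body with Transfer-Encoding: chunked.
  headers: { "Content-Type": "application/x-ndjson" },
});

req.on("response", (res) => {
  console.log("response status:", res.statusCode);
  res.resume();
});
req.on("error", (err) => console.error("request failed:", err.message));

// Write one NDJSON line per second; the receiving service would log the arrival time of each line.
let n = 0;
const timer = setInterval(() => {
  req.write(JSON.stringify({ seq: n, ts: Date.now() }) + "\n");
  if (++n === 10) {
    clearInterval(timer);
    req.end();
  }
}, 1000);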

Streaming vs chunking in WCF REST service

Can anyone explain the difference between chunking and streaming, and which method should be preferred when uploading big files from an iPad to a WCF REST service? Right now we get a timeout error when uploading big files from the iPad and we'd like to fix this. Our key requirement is that the WCF service should know whether the whole file was uploaded or not, so that if the client for some reason cannot upload the whole file, WCF does not perform any operation on the partially uploaded content (as far as I understand, a streamed upload won't allow us to implement this).
Some more questions that confuse me:
1) How do both of these modes work in terms of HTTP?
2) I found that in chunked mode there is a "Transfer-Encoding: chunked" header in the first request. Then the client sends chunks within separate requests to the server and a final zero-length request. Do I need to set the Transfer-Encoding header in every request? What other headers should be used?
3) Do I need to send only one HTTP request in streaming mode?
4) Do I need to tell the WCF service somehow that I'm sending streamed content?
5) Let's say the default connection timeout for the WCF service is 30 seconds. How does this timeout affect the streaming and chunking modes?
6) Can anyone explain in short how both of these modes should be implemented on the server and the client? (No code required, just a high-level description.) The more I read on this topic the more confused I get.
Many thanks!
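To make questions 1)-3) concrete: at the HTTP level both modes boil down to a single request; what differs is how the body length is communicated. A rough illustration (Node/TypeScript rather than WCF; the URL and file path are made up for the example):

import { request } from "node:http";
import { createReadStream, statSync } from "node:fs";

const file = "/tmp/big-upload.bin"; // placeholder path

// (a) Chunked: no Content-Length header, so the body goes out as a sequence of chunks
//     inside ONE request; the HTTP stack appends the terminating zero-length chunk.
const chunked = request("http://example.com/upload", { method: "POST" });
createReadStream(file).pipe(chunked);

// (b) Known length: still a single request, but Content-Length is set up front,
//     so the server can tell a complete upload from one that was cut off.
const fixed = request("http://example.com/upload", {
  method: "POST",
  headers: { "Content-Length": statSync(file).size },
});
createReadStream(file).pipe(fixed);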

Best Practice: AWS FTP with file processing

I'm looking for some direction on an AWS architectural decision. My goal is to allow users to FTP a file to an EC2 instance and then run some analysis on the file. My focus is to build this in as service-oriented a way as possible, and in the future scale it out for multiple clients, where each would have their own FTP server and processing queue with no co-mingling of data.
Currently I have a dev EC2 instance with vsftpd installed and a node.js process running Chokidar that continuously watches for new files to be dropped. When a file drops, I'd like another server or group of servers to be notified so they can fetch the file and process it.
Should the FTP server move the file to S3 and then use SQS to let the pool of processing servers know that it's ready for processing? Should I use SQS and then have the pool of servers SSH into the FTP instance (or some other approach) to get the file, rather than use S3 as an intermediary? Are there better approaches?
Any guidance is very much appreciated. Feel free to school me on any alternate ideas that might save money at high file volume.
I'd segregate it right down into small components.
Load balancer
FTP Servers in scaling group
Daemon on FTP Servers to move to S3 and then queue a job
Processing servers in scaling group
This way you can scale the ftp servers if necessary, or scale the processing servers (on SQS queue length or processor utilisation). You may end up with one ftp server and 5 processing servers, or vice versa - but at least this way you only scale at the bottleneck.
The other thing you may want to look at is AWS Data Pipeline, which (without knowing the details of your job) sounds like it's tailor-made for your use case.
S3 and queues are cheap, and it gives you more granular control around the different components to scale as appropriate. There are potentially some smarts around wildcard policies and IAM you could use to tighten the data segregation too.
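The "daemon on the FTP servers" step could look roughly like this (Node/TypeScript sketch using the AWS SDK v3; the bucket name, queue URL and watch directory are placeholders, not taken from the question):

import { stat } from "node:fs/promises";
import { createReadStream } from "node:fs";
import { basename } from "node:path";
import chokidar from "chokidar";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const s3 = new S3Client({});
const sqs = new SQSClient({});
const BUCKET = "my-upload-bucket";                                              // placeholder
const QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/file-jobs"; // placeholder

chokidar
  .watch("/srv/ftp/uploads", { ignoreInitial: true, awaitWriteFinish: true })
  .on("add", async (path) => {
    const key = basename(path);
    const { size } = await stat(path);

    // 1. Copy the uploaded file into S3 (ContentLength lets the SDK stream the file).
    await s3.send(new PutObjectCommand({
      Bucket: BUCKET,
      Key: key,
      Body: createReadStream(path),
      ContentLength: size,
    }));

    // 2. Queue a job so any processing server in the scaling group can pick it up.
    await sqs.send(new SendMessageCommand({
      QueueUrl: QUEUE_URL,
      MessageBody: JSON.stringify({ bucket: BUCKET, key }),
    }));
  });

On the other side, each processing server would long-poll the queue with ReceiveMessage, download the object from S3, run the analysis, and delete the message only once the job succeeds.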
Ideally I would try to process the file on the server where it is currently placed.
This will save a lot of network traffic and CPU load.
However, if you want one of the servers to act as a reverse proxy and load balance across a farm of servers, then I would notify a server with an HTTP call that the file has arrived. I would make the file available via FTP (since you already have a working vsftpd, that will not be a problem) and include the file's FTP URL in the HTTP call, so the server that will do the processing can fetch the file and start working on it immediately.
This way you will save money by not using S3 or SQS or any other additional services.
If the farm is made up of servers of equal capacity, then the algorithm for distributing the load should be round-robin; if the servers have different capacities, then the load should be distributed according to each server's performance.
For example, if server ONE has 3 times the performance of server THREE and server TWO has 2 times the performance of server THREE, then you can do:
1: Server ONE - forward 3 requests
2: Server TWO - forward 2 requests
3: Server THREE - forward 1 request
4: GOTO 1
Ideally there should be feedback from the servers reporting their current load, so the load balancer knows which one is the best candidate for the next request instead of using a hard-coded algorithm, since the requests probably don't need exactly equal amounts of resources to be processed; but this starts to look like the MapReduce paradigm and is out of scope... at least for the beginning. :)
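The 3:2:1 weighted distribution above could be sketched like this (TypeScript; the hostnames and the /process endpoint are invented for illustration):

type Backend = { url: string; weight: number };

// Yield each backend as many times per cycle as its weight, forever.
function* weightedRoundRobin(backends: Backend[]): Generator<string> {
  while (true) {
    for (const b of backends) {
      for (let i = 0; i < b.weight; i++) yield b.url;
    }
  }
}

const picker = weightedRoundRobin([
  { url: "http://server-one.internal", weight: 3 },
  { url: "http://server-two.internal", weight: 2 },
  { url: "http://server-three.internal", weight: 1 },
]);

// For each new file, notify the chosen backend with the file's FTP URL.
async function notify(ftpUrl: string): Promise<void> {
  const target = picker.next().value as string;
  await fetch(`${target}/process`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ftpUrl }),
  });
}

Plain round-robin for identical servers is just the same generator with all weights set to 1.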
If you want to stick with S3, you could use RioFS to mount an S3 bucket as a local filesystem on your FTP and processing servers. Then you could do the usual file operations (e.g. get a notification when a file is created or modified).
Like RioFS, s3fs-fuse utilizes FUSE to provide a filesystem that is locally mountable; s3fs-fuse is currently well maintained.
In contrast, swineherd-fs (a filesystem abstraction for S3, HDFS and the normal filesystem) allows a different (locally virtual) approach: all filesystem abstractions implement the same core methods, based on standard UNIX functions and Ruby's File class [...].
As the 'local abstraction layer' has only been thoroughly tested on Ubuntu Linux, I'd personally go for a more mainstream, solid, less experimental stack, i.e.:
a (sandboxed) vsftpd for FTP transfers,
(optionally) listen for filesystem changes, and finally
trigger middleman-s3_sync to do the cloud lift (or synchronize everything by itself).
Alternatively, and more experimental, there are some GitHub projects that might fit:
s3-ftp: an FTP server front-end that forwards all uploads to an S3 bucket (Clojure)
ftp-to-s3: an FTP server that uploads every file it receives to S3 (Python)
ftp-s3: an FTP front-end to S3 (Python)
Last but not least, I recommend the donationware Cyberduck if you are on OS X: a comfortable (and very FTP-like) client that interfaces with S3 directly. For Windows there is a freeware (with an optional PRO version) named S3 Browser.

ColdFusion Amazon S3 support for file upload: does it connect to a specific IP?

I'm trying to use S3 as an off-site file location for a database backup. On my home dev machine this works just fine; I just do a dump out of MySQL and then
<cffile action = "copy"
source = "#backupPath##filename#"
destination = "s3://myID:myKey@myBucket/#filename#">
and all is good. However, the production server at work is behind a router/firewall controlled/managed by a third party. I read somewhere that S3 needs port 843 open to work (and then lost that reference), but does the CF built-in function connect to a particular IP at Amazon, so I could ask for that port to be opened for just that IP?
I see that you found some answers via comments on Ray Camden's blog post about the S3 functionality, with information contributed by Steven Erat, but for the sake of completeness here on Stack Overflow and for others who may find this question, here is that information:
By default, all communication between your CF server and S3 is done over HTTPS on port 443. There is a Java system property (s3service.https-only) which defaults to true; if you set it to false, the communication will happen over HTTP instead of HTTPS. Sorry, I don't know how you might change it, unless maybe as a JVM argument.
The IP of any given bucket could be different (and possibly change over time), so you can't necessarily get by on opening a port for a single IP -- but luckily you shouldn't have to since it's all done over SSL/443.
What does use port 843 is the Amazon S3 console, an optional flash-based web interface for managing your bucket(s).
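If you want to confirm that the firewall allows outbound HTTPS to S3 from the production box at all, a quick check could look like this (Node/TypeScript sketch; the bucket name is a placeholder):

import { request } from "node:https";

request("https://my-backup-bucket.s3.amazonaws.com/", { method: "HEAD" }, (res) => {
  // Even a 403/404 proves that port 443 to S3 is reachable; only a network error means it is blocked.
  console.log("reachable over 443, status:", res.statusCode);
  res.resume();
})
  .on("error", (err) => console.error("blocked or unreachable:", err.message))
  .end();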

Compression HTTP

Is it possible to POST request data from the browser to the server in a compressed format?
If yes, how can we do that?
Compressing data sent from the browser to the server is not natively supported by browsers.
You'll have to find a workaround, using a client-side language (maybe a JavaScript GZip implementation, or a Java applet, or ...). Be sure to visually display to the user what the browser is doing and why it is taking some time.
I don't know the scope of your application, but on company websites you could just restrict input to compressed files. Ask your users to upload .zip/.7z/.rar/... files.
The server->client responses can be gzip compressed automagically by the server.
Compressing the client->server messages is not standard, so it will require some work on your part. Take your very large POST data and compress it client-side, using JavaScript. Then decompress it manually on the server side.
This will usually not be a beneficial thing to do unless your bandwidth usage is a major bottleneck. Compression requires both time and CPU usage to perform.
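As a rough sketch of that client-side approach in a modern browser (TypeScript, using the built-in CompressionStream API; the endpoint is a placeholder, and the server still has to gunzip the request body itself):

async function postGzipped(url: string, payload: unknown): Promise<Response> {
  const json = JSON.stringify(payload);

  // Compress the JSON with gzip entirely on the client.
  const gzipped = new Blob([json]).stream().pipeThrough(new CompressionStream("gzip"));
  const body = await new Response(gzipped).arrayBuffer();

  return fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Content-Encoding": "gzip", // informational only: the server must decompress manually
    },
    body,
  });
}

// Usage: await postGzipped("/api/bulk-upload", someLargeObject);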