I have a Java application that extracts content from a zip archive.
When launching it as a Fargate task, it produces the following error:
java.util.zip.ZipException: invalid block type
I could reproduce a similar zlib error locally by pointing the extraction at a non-writable directory; otherwise it works.
Using various directories inside the Docker layer did not help (I tried /tmp and WORKDIR, and I also tried User: root in the ContainerDefinition), and neither did mounting a writable volume in the ContainerDefinition.
According to the documentation, Fargate provides 10 GB for the writable upper Docker layer and 4 GB for a mounted volume. Why can't I extract the zip archive?
I cannot trace it further, as Fargate does not provide an option for this, and I could not get a more informative Java exception.
It turned out that the zip file I was trying to upload arrived with a zero buffer. I needed to add binary media type support to API Gateway for the zip file to upload correctly to the destination task.
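For reference, this is roughly how binary media type support can be added from the AWS CLI; the API ID abc123 and the stage name prod are placeholders, and the API must be redeployed for the setting to take effect:
# abc123 is a hypothetical REST API ID; the slash in application/zip is escaped as ~1
aws apigateway update-rest-api \
    --rest-api-id abc123 \
    --patch-operations op=add,path=/binaryMediaTypes/application~1zip
aws apigateway create-deployment --rest-api-id abc123 --stage-name prod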
Hi, I want to reach some files in a GCP bucket from the Cloud Shell terminal (for SFTP reasons). gcsfuse successfully mounts the parent directory, and it shows all the directories except the one I need. Any idea what I am doing wrong?
In Google Cloud Storage, object names ending in a slash (/) represent a directory, and all other object names represent a file. By default, directories are not implicitly defined; they exist only if a matching object ending in a slash (/) exists.
Since the usual file system operations like mkdir will do the right thing, if someone sets up a bucket's structure using only gcsfuse, they will not notice anything odd about this. However, if someone uses some other tool to set up objects in Google Cloud Storage (such as the storage browser in the Google Cloud Console), they may notice that not all objects are visible until they create leading directories for them.
For example, say someone uploaded an object demo/start.txt by choosing the folder upload option in the storage browser in the Google Cloud Console, and then mounted the bucket with gcsfuse. The file system will initially appear empty, since there is no demo/ object. However, if they subsequently run mkdir demo, they will now see a directory named demo containing a file named start.txt.
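As an illustration, the sequence would look roughly like this (the bucket name and mount point are placeholders):
gcsfuse my-bucket /mnt/my-bucket
ls /mnt/my-bucket          # appears empty, because there is no demo/ object
mkdir /mnt/my-bucket/demo  # creates the demo/ placeholder object
ls /mnt/my-bucket/demo     # now shows start.txt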
To mitigate this issue gcsfuse supports a flag called --implicit-dirs. When this flag is enabled, name lookup requests use the Google Cloud Storage API's Objects.list operation to search for objects that would implicitly define the existence of a directory with the name in question. So, in the example above, a directory named demo containing a file start.txt would appear.
So in your case, I suspect the item you are not able to see is a folder you uploaded to the Google Cloud Storage bucket with another tool. As you have already mounted gcsfuse on a directory, if you mount it again using the flag --implicit-dirs, it will throw an error. So I would suggest you unmount the directory by running the following command:
fusermount -u /path/to/mount/directory
Then mount the directory again by running the following command:
gcsfuse --implicit-dirs BUCKET_NAME /path/to/mount/directory
You can also create a new directory and mount that directory with gcsfuse without unmounting the existing mounted directory.
Please note that the flag --implicit-dirs has some drawbacks. I would recommend going through this GitHub issue for detailed information about it.
How can one download files from a GCP Storage bucket to a Container-Optimised OS (COS) on instance startup?
I know of the following solutions:
gcloud compute copy-files
SSH through console
SCP
Yet all of these have to be done manually and externally after an instance is started.
There is also cloud-init, yet I can't find any info on how to copy files from a Storage bucket with it. Examples seem to suggest that it's better to include the content of files in the cloud-init file directly, which is not something I want to do for security reasons. Is it possible to download files from a Storage bucket using cloud-init?
I considered using a startup script, yet COS lacks CLI tools such as gcloud or gsutil, so I can't run any such commands in a startup script.
I know I could copy the files manually and then save the image as a boot disk, but I'm hoping there are solutions that avoid having to do so.
Most of all, I'm assuming I'm not asking for something impossible, given that the COS instance setup allows me to specify Docker volumes that I could mount onto the starting container. This seems to suggest I should be able to have some private files on the instance the moment COS attempts to run my image on startup. But how?
Trying to execute a startup-script with a cloud-sdk image and copying files there, as suggested by Guillaume, didn't work for me for a while, showing this log. Eventually I realised that the cloud-sdk image is 2.41 GB when uncompressed and takes over 2 minutes to finish pulling. I tried again with an empty COS instance and the startup script completed successfully, downloading the data from a Storage bucket.
However, a 2.41 GB image and over 2 minutes of boot time sound like a bit of overkill for downloading a 2 KB file, don't they?
I'm glad to see a working solution to my question (thanks Guillaume!), although I'm still wondering: isn't there a nicer way to do this? I feel that this method is even less tidy than manually putting the files on the COS instance and then creating a machine image to use in the future.
Based on Guillaume's answer I created and published a gsutil wrapper image, available as voyz/gsutil_wrap. This way I am able to run a startup-script with the following command:
docker run -v /host/path:/container/path \
--entrypoint gsutil voyz/gsutil_wrap \
cp gs://bucket/path /container/path
It's essentially a copy of what Guillaume suggested, except it uses an image containing only the minimum setup required to run gsutil. As a result it weighs 0.22 GB and pulls within 10-20 seconds on average, as opposed to 2.41 GB and over 2 minutes respectively for the google/cloud-sdk image suggested by Guillaume.
Also, credit to this incredibly useful StackOverflow answer that allows gsutil to use the default service account for authentication.
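For completeness, a startup-script like that can be attached when creating the COS instance, for example from a local file; the instance name, zone and script path below are placeholders:
gcloud compute instances create my-cos-vm \
    --image-family cos-stable --image-project cos-cloud \
    --zone us-central1-a \
    --metadata-from-file startup-script=./startup.sh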
The startup-script is the correct place to do this. And YES, COS lacks some useful libraries.
BUT you can run containers! And, for example, the Google Cloud SDK container!
So, add this startup-script in the VM metadata:
key -> startup-script
value ->
docker run -v /local/path/to/copy/files:/dummy/container/path \
--entrypoint gsutil google/cloud-sdk \
cp gs://your_bucket/path/to/file /dummy/container/path
Note: the startup script is run as root. Perform a chmod/chown in your startup script if you need to change the file access mode.
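For example, to make the downloaded file readable by non-root users, the same startup-script can be extended (the path is the placeholder from above):
# after the docker run above, relax permissions on the downloaded file(s)
chmod -R a+r /local/path/to/copy/files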
Let me know if you need more explanation of this command line.
Of course, with a fresh COS image, the startup time is quite long (pulling the container image and extracting it).
To reduce the startup time, you can "bake" your image. I mean: start with a COS image, download/install what you want on it (or only perform a docker pull of the google/cloud-sdk container) and create a custom image from this.
This way, all the required dependencies will be present on the image and the boot will be quicker.
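A minimal sketch of that baking step, assuming a temporary instance named cos-bake in zone us-central1-a (both are placeholders):
# on the temporary COS instance: pre-pull the image so it ends up in the boot disk
docker pull google/cloud-sdk
# from your workstation: stop the instance and create a custom image from its disk
gcloud compute instances stop cos-bake --zone us-central1-a
gcloud compute images create cos-with-cloud-sdk \
    --source-disk cos-bake --source-disk-zone us-central1-a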
I want to mount a Google Cloud Storage bucket onto a directory on a local machine for processing.
I'm using a Manjaro environment and installed gcsfuse manually.
In gs://bucket01 there are directories containing jpg and json files.
go get -u github.com/googlecloudplatform/gcsfuse
$GOPATH/src/github.com/googlecloudplatform/gcsfuse
GOOGLE_APPLICATION_CREDENTIALS=/run/media/manjaro/gcp/key.json gcsfuse bucket01 /run/media/manjaro/gcp/bucket01
Using mount point: /run/media/manjaro/gcp/bucket01
Opening GCS connection...
Mounting file system...
File system has been successfully mounted.
cd /run/media/manjaro/gcp/bucket01
ls
# empty
# The expected outcome is data from gs://bucket01 populates /run/media/manjaro/gcp/bucket01
# Updates in /run/media/manjaro/gcp/bucket01 will also be seen in gs://bucket01
Am I using gcsfuse correctly?
Please try using implicit directories.
As mentioned above, by default there is no allowance for the implicit existence of directories. Since the usual file system operations like mkdir will do the right thing, if you set up a bucket's structure using only gcsfuse then you will not notice anything odd about this. If, however, you use some other tool to set up objects in GCS (such as the storage browser in the Google Developers Console), you may notice that not all objects are visible until you create leading directories for them.
gcsfuse supports a flag called --implicit-dirs that changes this behavior.
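In your case, a remount with the flag would look roughly like this (same key file and mount point as in your question):
fusermount -u /run/media/manjaro/gcp/bucket01
GOOGLE_APPLICATION_CREDENTIALS=/run/media/manjaro/gcp/key.json \
    gcsfuse --implicit-dirs bucket01 /run/media/manjaro/gcp/bucket01
ls /run/media/manjaro/gcp/bucket01
# the directories from gs://bucket01 should now be listed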
I had this running months ago, so I know it works, but I have created a new EC2 instance to deploy my code and I'm stuck at the first hurdle.
My Deployment Details runs as follows:
Application Stop - succeeded
Download Bundle - succeeded
BeforeInstall - Failed
Upon looking at the failed event, I get:
The CodeDeploy agent did not find an AppSpec file within the unpacked revision directory at revision-relative path "appspec.yml". The revision was unpacked to directory "C:\ProgramData/Amazon/CodeDeploy/57f7ec1b-0452-444e-840c-4deb4566e82d/d-WH9HTZAW0/deployment-archive", and the AppSpec file was expected but not found at path "C:\ProgramData/Amazon/CodeDeploy/57f7ec1b-0452-444e-840c-4deb4566e82d/d-WH9HTZAW0/deployment-archive/appspec.yml". Consult the AWS CodeDeploy Appspec documentation for more information at http://docs.aws.amazon.com/codedeploy/latest/userguide/reference-appspec-file.html
The thing is, if I jump onto my EC2 instance and paste in the full path, sure enough I see the YML file, along with the files that were in the ZIP file in my S3 bucket, so they have been successfully sent to the EC2 instance and unzipped.
So I'm sure it's not a permissions thing; the connection is clearly being made, and the S3 bucket, CodeDeploy and my EC2 instance are all happy.
I read various posts on Stack Overflow about renaming the AppSpec.yml file to "appspec.yml", "AppSpec.yaml" or "appspec.yaml", and still nothing works.
Anything obvious to try out?
OK, after a few days back and forth, the solution was incredibly annoying (and embarrassing)...
On my EC2 instance, the "File name extensions" option in Windows Explorer was unticked, so my AppSpec.yml was actually AppSpec.yml.txt.
If anyone else has a similar issue, do check this first!
How are you zipping the file? A lot of the time users end up "double-zipping". To check: if you unzip the .zip file, does it give you the files or a folder?
When you zip a folder on Windows, it basically creates a folder inside the zip archive, so the CodeDeploy agent cannot find the appspec.yml at the root. To zip the artifact, select all the files and then right-click to zip them in the same location. This avoids creating a new folder inside the zip.
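A command-line equivalent with the zip CLI on Linux/macOS (my-app and my-app.zip are hypothetical names), zipping the contents from inside the folder so appspec.yml stays at the archive root:
cd my-app                  # the folder that contains appspec.yml
zip -r ../my-app.zip .     # zip the contents, not the folder itself
unzip -l ../my-app.zip     # appspec.yml should appear at the top level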
I'm new to Docker and I'm trying to process a lot of data on AWS. Right now, the input for the scripts I want to run using my parent image is about 20 GB.
First, I tried just copying the data into my image's writable layer (using COPY), but then I got this error:
Sending build context to Docker daemon 20.53 GB
Error response from daemon: Error processing tar file(exit status 1): write /Input/dbsnp_138.b37.vcf: no space left on device
So I figured that 20 GB is too much to just store in my writable layer.
Then I looked at mounting a volume on the Docker host (using VOLUME), but wouldn't that also need to be written to the writable layer first? Wouldn't that give me the same error?
The way docker build works is this: when you run something like the command below,
docker build .
it sends the content of the current directory (the build context) to the Docker daemon. So if you are sending a 20 GB context, an additional 20+ GB of free space is needed.
When you mount a volume into a running container, it actually mounts your host folder, so no extra space is used. But if you delete the directory from inside the container, the files on the host will also be deleted.
So mounting the directory is a possible solution. Also, if there are files in your current directory that you do not want to send to the Docker daemon as part of the context, you should use a .dockerignore file to specify which files to ignore.
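A minimal sketch of both suggestions, with placeholder paths and image name:
# .dockerignore (in the build context root) -- keeps the 20 GB of input out of "docker build ."
#   Input/
# at runtime, bind-mount the data from the host instead of baking it into the image
docker run -v /path/on/host/Input:/Input my-image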