copy files during GCP instance creation from python - google-cloud-platform

I am using the googleapiclient in Python to launch VM instances. As part of that I am using the facility to run startup scripts to install Docker and other Python packages.
Now, one thing I would like to do is copy files to this instance, ideally during the instance creation stage, through Python code.
What might be the way to achieve this? Ideally, what would work is to be able to detect that the instance has booted and then copy these files over.

If I am hearing you correctly, you want files to be present inside the container that Docker is executing in your Compute Engine VM, and your startup script for the Compute Engine VM is installing Docker.
My recommendation is not to try and copy those files into the container but instead to have them available on the local file system of the Compute Engine VM. Configure your Docker startup to mount that directory from the Compute Engine VM into the Docker container. Inside the container, you would then have access to the desired files.
As for bringing the files into the Compute Engine environment in the first place, we have a number of options. The core question, however, is where the files start out in the first place.
One common approach is to keep the files that you want copied into the VM in a Google Cloud Storage (GCS) bucket/folder. From there, your startup script can use the GCS API or the gsutil command to copy the files from the GCS bucket to the local file system.
Another thought, and again, this depends on the nature of the files ... is that you can create a GCP disk that simply "contains" the files. When you now create a new Compute Engine instance, that instance could be defined to mount the disk which is shared read-only across all the VM instances.
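To make the GCS approach concrete, here is a minimal sketch of what the instance-creation call could look like from Python with googleapiclient, combining the suggestions above: the startup script pulls the files from a GCS bucket and mounts the directory into the container. The project, zone, bucket, and image names are placeholders, and it assumes Docker has already been installed by an earlier part of the startup script, as described in the question.

# Sketch: create a VM whose startup script copies files from a GCS bucket
# and mounts them into a Docker container. Project, zone, bucket, and image
# names are placeholders.
import googleapiclient.discovery

compute = googleapiclient.discovery.build("compute", "v1")

project = "my-project"      # assumption: your project ID
zone = "us-central1-a"      # assumption: your zone

# Bash startup script executed on first boot: fetch the files, then run the
# container with the directory mounted read-only at /data.
startup_script = """#!/bin/bash
mkdir -p /opt/app/files
gsutil -m cp -r gs://my-bucket/files/* /opt/app/files/
docker run -d -v /opt/app/files:/data:ro my-image
"""

config = {
    "name": "worker-1",
    "machineType": f"zones/{zone}/machineTypes/e2-medium",
    "disks": [{
        "boot": True,
        "autoDelete": True,
        "initializeParams": {
            "sourceImage": "projects/debian-cloud/global/images/family/debian-12",
        },
    }],
    "networkInterfaces": [{
        "network": "global/networks/default",
        "accessConfigs": [{"type": "ONE_TO_ONE_NAT", "name": "External NAT"}],
    }],
    # Give the VM read access to GCS through its service account scopes.
    "serviceAccounts": [{
        "email": "default",
        "scopes": ["https://www.googleapis.com/auth/devstorage.read_only"],
    }],
    "metadata": {"items": [{"key": "startup-script", "value": startup_script}]},
}

operation = compute.instances().insert(project=project, zone=zone, body=config).execute()
print("Instance creation started:", operation["name"])

The read-only devstorage scope is what lets gsutil on the VM read the bucket without shipping any credential files.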

First of all, I would suggest using a tool like Terraform or Google Deployment Manager to create cloud infrastructure instead of writing custom Python code and handling all the edge cases yourself.
If for some reason you can't use the above tools and a Python program is your only option, you can do the following:
1. Create a GCS bucket using the Python API and apply an appropriate bucket policy to protect the data.
2. Create a service account which has read permission on the above GCS bucket.
3. Launch the VM instance using the Python API and have your startup script install packages and run the Docker container. Attach the above service account, which has permission to read files from the GCS bucket.
4. Have a startup script in your Docker container which runs the `gsutil` command to fetch files from the GCS bucket and put them in the right place.
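A minimal sketch of step 1 with the google-cloud-storage client, assuming hypothetical project, bucket, and file names:

# Sketch of step 1: create a bucket and upload the files the VM will later
# fetch. Project, bucket, and file names are placeholders.
from google.cloud import storage

client = storage.Client(project="my-project")

# Create the bucket; uniform bucket-level access keeps the IAM policy simple.
bucket = client.create_bucket("my-instance-files", location="us-central1")
bucket.iam_configuration.uniform_bucket_level_access_enabled = True
bucket.patch()

# Upload the files that the startup script will pull with gsutil.
for local_path in ["config.yaml", "model.pkl"]:
    bucket.blob(f"files/{local_path}").upload_from_filename(local_path)

Granting the service account from step 2 the roles/storage.objectViewer role on this bucket is then enough for the VM to read the files.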
Hope this helps.
Again, if you can use tools like Terraform, that will make things easy.

Related

How to Deploy a docker container with volume in Cloud Run

I am trying to publish an application I wrote in .NET Core with Docker and a mounted volume. I can't really figure out or see any clear solution to my issue that will be cheap (it's for a university project).
I tried running docker-compose via a cloudbuild.yml linked in this post with no luck. I also tried putting my db file in a Firebase project and accessing it from the program, but that didn't work. I also read in the GCP documentation that I can probably use Filestore, but the pricing is way out of budget for me. I need to publish an SQLite database so my server can work correctly, that's it.
Any help would be really appreciated!
Basically, you can't mount a volume in Cloud Run. It's a stateless environment and you can't persist data on it. You have to use external storage to persist your data. See the runtime contract.
With the second-generation execution environment, you can now mount a Cloud Storage bucket with GCSFuse, and a Filestore path with NFS.

How to open a file that is stored in a bucket connected to the google cloud VM instance

I am new to google cloud, and I need to run a single python script in a compute engine.
I opened a new VM Compute Engine instance, created a new bucket and uploaded the script to the bucket, and I can see that the VM is connected to the bucket: when I run the command to list the buckets in the VM, it finds the bucket and shows that the script is indeed there.
What I'm missing is how do I run the script? Or, more generally, how do I access these files?
I was looking for a suitable command but could not find any, yet I have a feeling there should be such a command (since the VM can find the bucket and the files contained in it, I guess it can also access them somehow). How should I proceed to run the script from here?
The bucket's content is not attached to a volume in the VM. They are totally independent. With that being said, you first have to copy the python file from the bucket to your compute instance by using the gsutil cp command as below:
gsutil cp gs://my-bucket/main.py .
Once you have the file locally in your compute instance, you can simply run the python file.
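If you would rather do the same thing from Python instead of the shell, a minimal sketch with the google-cloud-storage client could look like this, reusing the bucket and file names from the gsutil example above:

# Sketch: download the script from the bucket to the local disk and run it.
import subprocess
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").blob("main.py")
blob.download_to_filename("main.py")

subprocess.run(["python3", "main.py"], check=True)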

Run java -jar inside AWS Glue job

I have a relatively simple task to do, but I am struggling with the best AWS service mix to accomplish it:
I have a simple Java program (provided by a 3rd party; I can't modify it, just use it) that I can run anywhere with java -jar --target-location "path on local disc". The program, once executed, creates a CSV file on the local disc in the path defined in --target-location.
Once the file is created I need to upload it to S3.
The way I'm doing it currently is by having a dedicated EC2 instance with Java installed; the first point is covered by java -jar ... and the second with the aws s3 cp ... command.
I'm looking for a better way of doing that (preferably serverless). I'm wondering if the above points can be accomplished with an AWS Glue job of type Python Shell? The second point (copying the local file to S3) I can most likely cover with boto3, but the first (the java -jar execution) I'm not sure about.
Am I forced to use an EC2 instance, or do you see a smarter way with AWS Glue?
Or would the most effective option be to build a Docker image (that contains these two instructions), register it in ECR and run it with AWS Batch?
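For reference, the wrapper described in the question (run the jar, then push the CSV to S3) is only a few lines of Python; whatever runtime you pick simply needs both a Java runtime and boto3 available. A rough sketch, with the jar path, output path, and bucket name as placeholders:

# Sketch: run the third-party jar, then upload the resulting CSV file(s) to
# S3. Paths, jar name, and bucket are placeholders; the environment is
# assumed to have Java and boto3 installed.
import glob
import os
import subprocess
import boto3

TARGET_DIR = "/tmp/output"            # the "path on local disc"
JAR_PATH = "/opt/app/thirdparty.jar"  # placeholder location of the jar

subprocess.run(
    ["java", "-jar", JAR_PATH, "--target-location", TARGET_DIR],
    check=True,
)

s3 = boto3.client("s3")
for path in glob.glob(os.path.join(TARGET_DIR, "*.csv")):
    s3.upload_file(path, "my-bucket", f"exports/{os.path.basename(path)}")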
I'm looking for better way of doing that (preferably serverless).
I cannot tell whether a serverless option is better; however, an EC2 instance will do the job just fine. Assuming you have CentOS on your instance, you may do it through
aaPanel GUI
Some useful web panels offer cron scheduled tasks, such as backing up some files from one directory to another S3 directory. I will use aaPanel as an example.
Install aaPanel
Install AWS S3 plugin
Configure the credentials in the plugin.
Cron
Add a scheduled task to back up files from "path on local disc" to AWS S3.
Rclone
A web panel goes beyond the scope of this question. Rclone is another useful tool I use to back up files from local disk to OneDrive, S3, etc.
Installation
curl https://rclone.org/install.sh | sudo bash
Sync
Sync a directory to the remote bucket, deleting any excess files in the bucket.
rclone sync -i /home/local/directory remote:bucket

Is it possible to create multiple google cloud instances and upload files to and download files from all of them at once?

I just started using Google Cloud. I want to create 10 virtual machines and upload files to them to run various scripts.
I have been doing it manually one by one. Is it possible to automate creating the servers all at the same time?
I have already tried using managed instance groups, but they are always on and they scale automatically; I need to control them individually.
Also, can I use a tool to upload files to all of them at once and download all the files from them at the same time?
You can automate creating VMs in various ways:
Use the Google Cloud SDK (either installed locally or through Cloud Shell). You can choose between public images or create a custom image to suit your needs. Or, if you just need some minor changes, use startup scripts while creating a VM to install/configure your apps.
Use Deployment Manager - create some templates of the VMs to deploy and, with a single command, have a number of them.
Both solutions give you complete control over the parameters of the VMs created. You can even upload (or download directly to the new VMs) the files you need.
Uploading the files to the VMs is also possible via the gcloud utility - you can create a script to upload any files to the VMs (basic shell scripting experience required); see the sketch below.
Lastly - here you can read more about the storage solutions available in GCP - but I'm guessing you will be using persistent disks & buckets. You can easily connect a bucket to your VM, mount it as a filesystem or just copy files to/from your VMs.
Ultimately you can read here about various ways of transferring files to GCP instances.
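As a concrete illustration of the gcloud route, here is a short sketch that creates a handful of identically configured VMs and then copies a local directory of scripts to each one. Instance names, zone, machine type, and paths are placeholders, and it assumes gcloud is installed and authenticated.

# Sketch: create N identical VMs with gcloud, then push files to each with
# `gcloud compute scp`. Names, zone, and paths are placeholders.
import subprocess

ZONE = "us-central1-a"
NAMES = [f"worker-{i}" for i in range(10)]

# gcloud accepts several instance names in a single create call.
subprocess.run(
    ["gcloud", "compute", "instances", "create", *NAMES,
     "--zone", ZONE, "--machine-type", "e2-medium"],
    check=True,
)

# Copy a local directory of scripts to every VM.
for name in NAMES:
    subprocess.run(
        ["gcloud", "compute", "scp", "--recurse", "--zone", ZONE,
         "./scripts", f"{name}:~/scripts"],
        check=True,
    )

Downloading results back is the same scp call with the source and destination swapped.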

How do I copy varying filenames into Vagrant box (in a platform independent way)?

I'm trying to use Vagrant to deploy to AWS using the vagrant-aws plugin.
This means I need a box, then I need to add a versioned jar (e.g. myApp-1.2.3-SNAPSHOT.jar) and some statically named files. This also needs to work on Windows or Linux machines.
I can use config.vm.synced_folder locally with a setup.sh to move the files I need using wildcards (e.g. cp myApp-*.jar), but the plugin only supports rsync, so only Linux.
TLDR; Is there a way to copy files using wildcards in Vagrant?
This means I need a box
Yes and no. Vagrant relies heavily on the box concept, but in the context of the AWS provider the box is a dummy box; the system will look at the aws.* variables to connect to your account.
Vagrant will spin up an EC2 instance and connect to it; you need to make sure the instance is linked with a security group that allows the connection and opens the port to your IP (at minimum).
If you are running a provisioner, please note the script is run on the EC2 instance, not on your local machine.
What I suggest is the following:
- copy the jar files that are necessary to S3 or somewhere the EC2 instance can easily access them (a wildcard upload sketch follows below)
- run the provisioner to fetch the files from this source (S3)
- let it go.
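Since the whole point is wildcard copies that work the same on Windows and Linux, the upload step itself can be done from plain Python with glob and boto3, independent of the host OS. A rough sketch, with the bucket name and the statically named file as placeholders:

# Sketch: upload every myApp-*.jar plus a statically named file to S3 so the
# provisioner on the EC2 instance can fetch them. The bucket name and the
# "application.properties" file are placeholders; glob handles the version
# wildcard the same way on Windows and Linux.
import glob
import os
import boto3

s3 = boto3.client("s3")
files = glob.glob("myApp-*.jar") + ["application.properties"]

for path in files:
    s3.upload_file(path, "my-deploy-bucket", os.path.basename(path))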
If you have a quick turnaround of files in development mode, you can push to a git repo from which the EC2 instance can pull the files and deploy the jar directly.