Schedule gsutil command - google-cloud-platform

I am using gsutil to load data from a file on my local machine into GCS. Now I want to schedule it so that the data is exported from the local machine to Google Cloud Storage every 20 minutes. How can I create this type of cron job, and where should I create it?

You should create a script within your local machine to achieve this task.
It could look like this (making sure you give the file executable permissions):
#!/bin/bash
PATH="$PATH":[REPLACE with /path/to/gsutil/]
# The ".boto" file contains the settings that help you do operations with your bucket: https://cloud.google.com/storage/docs/boto-gsutil
export BOTO_CONFIG=[REPLACE with "/home/username/.boto"]
# Upload the file with the relevant gsutil command
gsutil cp [REPLACE with OBJECT_LOCATION] gs://[REPLACE with DESTINATION_BUCKET_NAME]/
The cron schedule expression */20 * * * * triggers the job every 20 minutes, so edit the crontab file with crontab -e and add the following line:
*/20 * * * * [PATH-TO-SCRIPT]/[NAME-OF-SCRIPT].sh >/dev/null 2>&1
An online cron expression calculator such as crontab.guru is an excellent resource for building and checking cron schedule expressions, and most Linux cron tutorials walk through setting up a job like this step by step.
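Before relying on cron, it is worth confirming the script works on its own. A minimal check, assuming the script was saved as upload_to_gcs.sh (a hypothetical name):
chmod +x /path/to/upload_to_gcs.sh
/path/to/upload_to_gcs.sh && echo "upload OK"
If the manual run succeeds but the scheduled run fails, the difference is almost always the environment, which is why the script above sets PATH and BOTO_CONFIG explicitly.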

Related

AWS CLI command doesn't work from crontab

I have a CentOS 7.9 machine with the AWS CLI installed on it. I can successfully run this command on its own: aws s3 sync s3://bucketname /data/bucketname/ but when I try to run it via crontab, it doesn't work. All I see in /var/log/cron is that the command ran, but there is no data in the /data/bucketname directory.
This is what my crontab looks like:
*/5 * * * * sh /root/script.sh
And this is what my script looks like:
#!/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/opt/aws/bin
SHELL=/bin/bash
/bin/aws s3 sync s3://bucketname /data/bucketname/
which aws returns /bin/aws, so I tried to add that to my script.sh file, but that did not make a difference. Any help would be much appreciated.
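One hedged diagnostic step (an assumption, not part of the original post): redirect the command's output to a file, since cron silently discards it and sync failures are usually credential or environment errors:
#!/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/opt/aws/bin
# Log stdout and stderr so hidden errors become visible;
# /tmp/s3-sync.log is a hypothetical location.
/bin/aws s3 sync s3://bucketname /data/bucketname/ >> /tmp/s3-sync.log 2>&1
If the log shows "Unable to locate credentials", the root crontab is likely missing the AWS credentials your interactive shell picks up from ~/.aws or environment variables.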

How do I ensure crontab commands run once on AWS when there may be multiple instances?

I have a Django project running on AWS using Elastic Beanstalk. It can have between 1 and 6 instances running.
I load a crontab file to run some management/commands overnight. I have this config file:
container_commands:
  01_cron_job:
    command: "cp .ebextensions/crontab.txt /etc/cron.d/my_cron_jobs && chmod 644 /etc/cron.d/my_cron_jobs"
    #leader_only: true
The file copied across looks like:
# Set the cron to run with utf8 encoding
PYTHONIOENCODING=utf8
# Specify where to send email
MAILTO="me@gmail.com"
1 0 * * * root source /opt/python/current/env && nice /opt/python/current/app/src/manage.py clearsessions
15 0 * * * root source /opt/python/current/env && nice /opt/python/current/app/src/manage.py update_summary_stats >> /opt/python/log/update_summary_stats.log 2>&1
# this file needs a blank space as the last line otherwise it will fail
Within the config file, if I set leader_only to true then the commands don't run if the leader instance gets deleted at some point (for example because another instance was added during peak times and the leader deleted when it quietened). If I set leader_only to false then the crontab commands run on every instance.
What is the best way to set up crontab on AWS Elastic Beanstalk to only run once irrespective of the number of instances? Thank you
You could create a lock file (perhaps locally on a shared EFS mount, or externally using a service such as DynamoDB with Transactional Consistency or S3).
When your application creates this lock file it could then continue as normal, however should the file exist you would skip the script.
Doing this reduces the chance of a collision; however, I would also recommend adding some jitter to the start of the script (a sleep for a random number of seconds), as sketched below, to further reduce the chance that the scripts attempt to create the lock file at the same time.
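A minimal sketch of that approach, assuming a lock file on a shared EFS mount (the mount path is hypothetical; the job invocation is taken from the question):
#!/bin/bash
# One lock file per day, so the job runs at most once per night.
LOCKFILE="/mnt/efs/update_summary_stats.$(date +%F).lock"   # hypothetical EFS path

# Jitter: sleep a random 0-29 seconds so instances rarely race for the lock.
sleep $((RANDOM % 30))

# With noclobber set, the redirect fails if the file already exists,
# giving a create-if-absent check.
if ( set -o noclobber; echo "$HOSTNAME $$" > "$LOCKFILE" ) 2>/dev/null; then
    source /opt/python/current/env
    nice /opt/python/current/app/src/manage.py update_summary_stats
else
    echo "Another instance holds today's lock; skipping." >&2
fi
Cleaning up old lock files is left out of the sketch; since each day uses a new filename, stale locks never block the next night's run.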

How can I write a crontab script for a .sh file that copies a file from an AWS S3 folder to my local EC2 folder?

I want to write a crontab script that copies a file from an AWS S3 folder to my local AWS EC2 instance. I can run the command directly at the Linux command line:
aws s3 cp s3://sftp-test/test.txt /u02/app/oracle/test/
The test.txt file is then copied from S3 to my local /u02 folder. But when I put the command in a .sh file, it does not work. First I created a test.sh file:
#!/bin/sh
*/5 * * * * aws s3 cp s3://sftp-customs/test.txt /u02/app/oracle/test/
Second, I installed it as a crontab with
crontab test.sh
Then I can see my script when I run
crontab -l
But overall it does not work; I don't receive any file from the server. Thanks for your time, all. Please help with some advice.
The shell file must look like this (cron starts with a minimal environment, so sourcing your profile restores your PATH and credentials):
#!/bin/sh
source ~/.bash_profile #or .bashrc
aws s3 cp s3://sftp-customs/test.txt /u02/app/oracle/test/
and in cron (using crontab -e) you should enter
*/5 * * * * /path/to/test.sh
alternatively, create a new file (test.txt for example) with the content:
*/5 * * * * /path/to/test.sh
and execute command:
crontab test.txt
NB! The last command will replace your entire crontab.
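If you want to keep any existing entries, a common shell idiom (a sketch, not from the original answer) appends the new line to the current crontab instead of replacing it:
(crontab -l 2>/dev/null; echo "*/5 * * * * /path/to/test.sh") | crontab -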

Getting Data From A Specific Website Using Google Cloud

I have a machine learning project for which I have to get data from a website every 15 minutes, and I cannot use my own computer, so I will use Google Cloud. I am trying to use Google Compute Engine and I have a script for getting the data (here is the link: https://github.com/BurkayKirnik/Automatic-Crypto-Currency-Data-Getter/blob/master/code.py). This script gets data every 15 minutes and writes it to CSV files. I can run the code by opening an SSH terminal and executing it from there, but it stops working when I close the terminal. I also tried running it from a startup script, but it doesn't work that way either. How can I run this and save the CSV files? By the way, I have to install an API to run the code, and I am doing that in the startup script; there is no problem with that part.
Instances running in Google Cloud Platform can be configured with the same tools available in the operating system they run. If your instance is a Linux instance, the best method would be to use a cron job to execute your script repeatedly at your chosen interval.
Once you have accessed the instance via SSH, you can open the crontab configuration file by running the following command:
$ crontab -e
The above command will provide access to your personal crontab configuration (for the user you are logged in as). If you want to run the script as root you can use this instead:
$ sudo crontab -e
You can now edit the crontab configuration and add an entry that tells cron to execute your script at your required interval (in your case every 15 minutes).
Therefore, your crontab entry should look something like this:
*/15 * * * * /path/to/your/script.sh
Notice that the first field is the minutes field, so */15 tells the cron daemon to execute the script once every 15 minutes.
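For reference, the five schedule fields are, in order:
# minute  hour  day-of-month  month  day-of-week
# */15 in the minute field means "every 15th minute"
*/15 * * * * /path/to/your/script.sh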
Once you have edited the crontab configuration file, it is a good idea to restart the cron daemon to ensure the change you made will take place. To do this you can run:
$ sudo service cron restart
If you would like to check the status to ensure the cron service is running you can run:
$ sudo service cron status
Your script will now execute every 15 minutes.
In terms of storing the CSV files, you could either program your script to store them on the instance, or an alternative would be to use a Google Cloud Storage bucket. Files can be copied to buckets easily with the gsutil command (part of the Cloud SDK). It's also possible to mount buckets as a file system with Cloud Storage FUSE (gcsfuse).
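A minimal sketch of the gsutil approach, assuming your script writes CSVs to a local directory and you have already created a bucket (both names are hypothetical):
#!/bin/bash
# Copy the generated CSV files into the bucket; gsutil must be authenticated
# (it is by default on Compute Engine instances with storage access scopes).
gsutil cp /home/username/data/*.csv gs://my-ml-data-bucket/
This copy step can itself be scheduled with cron, alongside or inside the data-fetching job.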

Amazon Web Services - cronjob not running every minute

I have a cron job defined on an AWS EC2 instance (so it uses crontab). I ran crontab -e and added this line:
*/1 * * * * /usr/bin/php /path/to/file/file.php
I saved it with vim and it says crontab: installing new crontab. I also made the file's permissions 755 using chmod, and I put the proper shebang (#!/usr/bin/php -q) at the top of file.php. The script should update a database every minute, but I am not seeing the database get updated at all. What did I do wrong?
To debug, redirect the job's output to a log file and check it after a minute:
*/1 * * * * /usr/bin/php /path/to/file/file.php >> /var/log/your_cron.log 2>&1
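To confirm that cron is firing at all, you can also check the cron daemon's own log (a hedged suggestion; the path varies by distribution, and Amazon Linux typically uses /var/log/cron):
sudo tail /var/log/cron           # did cron run the job?
tail -f /var/log/your_cron.log    # what did the job itself print?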