Getting Data From A Specific Website Using Google Cloud - google-cloud-platform

I have a machine learning project for which I need to fetch data from a website every 15 minutes. I cannot use my own computer, so I want to use Google Cloud. I am trying Google Compute Engine, and I have a script that fetches the data (here is the link: https://github.com/BurkayKirnik/Automatic-Crypto-Currency-Data-Getter/blob/master/code.py). The script fetches data every 15 minutes and writes it to CSV files. I can run it by opening an SSH terminal and executing it from there, but it stops when I close the terminal. I also tried running it from a startup script, but that did not work either. How can I keep it running and save the CSV files? By the way, the code needs an API package, which I install in the startup script; that part works without problems.

Instances running in Google Cloud Platform can be configured with the same tools available in the operating system that they are running. If your instance is a Linux instance, the best method would be to use a cronjob to execute your script repeatedly at your chosen interval.
Once you have accessed the instance via SSH, you can open the crontab configuration file by running the following command:
$ crontab -e
The above command will provide access to your personal crontab configuration (for the user you are logged in as). If you want to run the script as root you can use this instead:
$ sudo crontab -e
You can now edit the crontab configuration and add an entry that tells cron to execute your script at your required interval (in your case every 15 minutes).
Therefore, your crontab entry should look something like this:
*/15 * * * * /path/to/your/script.sh
Notice the first entry is for minutes, so by using the */15, you are telling the cron daemon to execute the script once every 15 minutes.
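To make the step syntax concrete, here is a small sketch in plain shell that lists the minute values a step field such as */15 matches within one hour. The function name step_minutes is made up for illustration; cron does this matching internally.

```shell
#!/bin/sh
# step_minutes is a hypothetical helper, not part of cron.
# It prints the minutes (0-59) that a "*/STEP" minute field matches.
step_minutes() {
    step=$1
    minute=0
    result=""
    while [ "$minute" -le 59 ]; do
        result="$result $minute"
        minute=$((minute + step))
    done
    # Strip the leading space before printing.
    printf '%s\n' "${result# }"
}

step_minutes 15   # prints: 0 15 30 45
```

So the job runs at :00, :15, :30 and :45 of every hour, not 15 minutes after the moment you saved the crontab.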
Once you save and exit the editor, crontab installs the new configuration and the cron daemon picks it up on its own, so a restart is normally not required. If you want to be sure the change has taken effect, you can still restart the daemon:
$ sudo service cron restart
If you would like to check the status to ensure the cron service is running you can run:
$ sudo service cron status
Your script will now execute every 15 minutes.
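Cron runs jobs without a terminal and with a minimal environment, so it usually helps if the wrapper script named in the crontab entry uses absolute paths and writes its output to a log file. The following is only a sketch: run_and_log is a made-up helper, and the paths in the example call are placeholders. It also assumes the Python script takes one sample and exits; if it still contains its own 15-minute loop, cron would start an additional copy every 15 minutes.

```shell
#!/bin/sh
# Hypothetical contents for the wrapper script that cron calls.
# Usage: run_and_log WORKDIR LOGFILE COMMAND [ARGS...]
# The body runs in a subshell, so the cd does not affect the caller.
run_and_log() (
    workdir=$1
    logfile=$2
    shift 2
    cd "$workdir" || exit 1
    {
        echo "$(date '+%Y-%m-%d %H:%M:%S') running: $*"
        "$@"
        status=$?
        echo "exit status: $status"
        exit "$status"
    } >> "$logfile" 2>&1
)

# Example call (placeholder paths, uncomment and adjust):
# run_and_log /home/YOUR_USER/crypto /home/YOUR_USER/crypto/cron.log /usr/bin/python3 code.py
```

If the job misbehaves, the log file then shows when each run started, what the script printed, and its exit status.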
For storing the CSV files, your script can either write them to the instance's disk or, alternatively, you can use a Google Cloud Storage bucket. Files can be copied to a bucket easily with the gsutil command (part of the Cloud SDK). It is also possible to mount a bucket as a file system with Cloud Storage FUSE.
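As a rough illustration of the bucket option, the copy step could be wrapped like this. The function name upload_csv, the file path and the bucket name are all placeholders, and the sketch assumes the instance's service account is allowed to write to the bucket.

```shell
#!/bin/sh
# Usage: upload_csv LOCAL_FILE BUCKET_NAME
# Copies one file into the top level of a bucket with "gsutil cp".
# GSUTIL can be overridden (for example GSUTIL=echo) for a dry run.
upload_csv() {
    "${GSUTIL:-gsutil}" cp "$1" "gs://$2/"
}

# Example call (hypothetical file and bucket names):
# upload_csv /home/YOUR_USER/crypto/prices.csv my-crypto-data
```

Calling this at the end of each cron run keeps a copy of the data outside the instance, so the files survive even if the VM is deleted.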

Related

Startup script NOT running in instance

I have an instance running a Flask web app. So that the app starts when the VM boots, I have included a startup script:
#!/bin/sh
cd documentai_webapp
cd docai_webapp_instance_gcp
sudo python3 server.py
However, it is not executed at all. Can anyone help me? Thanks!
PS: When I execute this script manually inside the VM, it works perfectly fine.
For context, keep the following in mind:
For Linux startup scripts, you can use bash or non-bash file. To use a non-bash file, designate the interpreter by adding a #! to the top of the file. For example, to use a Python 3 startup script, add #! /usr/bin/python3 to the top of the file.
If you specify a startup script by using one of the procedures in this document, Compute Engine does the following:
Copies the startup script to the VM
Sets run permissions on the startup script
Runs the startup script as the root user when the VM boots (missing step from #Andoni)
For information about the various tasks related to startup scripts and when to perform each one, see the Overview.
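Because the startup script runs as root rather than as the user who tested it over SSH, a relative cd documentai_webapp is a likely point of failure: the directory is looked up from wherever the script runner starts, not from that user's home directory. A hedged sketch of the same script with an absolute path follows; the location under /home/YOUR_USER and the log file path are assumptions, not taken from the question.

```shell
#!/bin/sh
# Sketch of a startup script that does not depend on the working directory.
# APP_DIR and LOG_FILE are assumptions; replace YOUR_USER with the real account.
APP_DIR="/home/YOUR_USER/documentai_webapp/docai_webapp_instance_gcp"
LOG_FILE="/var/log/docai_webapp.log"

start_server() {
    app_dir=$1
    log_file=$2
    # Fail loudly if the directory is missing instead of running from the wrong place.
    cd "$app_dir" || { echo "startup script: cannot cd to $app_dir" >&2; return 1; }
    # sudo is unnecessary here because the startup script already runs as root.
    python3 server.py >> "$log_file" 2>&1
}

# Last line of the real startup script (left commented out in this sketch):
# start_server "$APP_DIR" "$LOG_FILE"
```

Note that Python packages installed only for the login user may not be visible to root, which would be another reason for the script to work manually but fail at boot.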

How to run shell script in EC2 at a specific time?

I want to run a shell script on an EC2 instance at a time of my choosing.
I have thought of three ways to do this:
To run shell script in EC2 from Lambda at a specific time using EventBridge.
https://aws.amazon.com/ko/blogs/compute/scheduling-ssh-jobs-using-aws-lambda/
To run SSM Run Command at a specific time using EventBridge
https://medium.com/the-cloud-architect/creating-your-own-chaos-monkey-with-aws-systems-manager-automation-6ad2b06acf20
To run shell script with cron by installing the crontab package on EC2
https://docs.aws.amazon.com/opsworks/latest/userguide/workingcookbook-extend-cron.html
Which method is the best in terms of performance or maintenance?
In my opinion, depending on the complexity of what you want to run, crontab is easy and lightweight. I am not entirely positive, but I'm pretty sure crontab is installed on EC2 by default.
To view the current scheduled cron entries, you can run the following: crontab -l
To edit the cron jobs, run the following: crontab -e
Note: It will use the default EDITOR which is typically either vi or vim.
You can find out more about the syntax for crontab here.
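If you would rather not open an editor, for example when provisioning the instance from a script, the crontab can also be updated non-interactively. The sketch below assumes a cron implementation whose crontab command accepts - to read the new table from standard input (Vixie cron and cronie do); add_cron_line is a made-up helper name and the entry shown is a placeholder.

```shell
#!/bin/sh
# Usage: existing-crontab-on-stdin | add_cron_line 'CRON ENTRY'
# Prints the existing entries and appends the new one only if it is not
# already present, so running it twice does not create duplicates.
add_cron_line() {
    new_line=$1
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -F: fixed string, -x: match the whole line, -q: quiet
    if ! printf '%s\n' "$existing" | grep -Fxq -- "$new_line"; then
        printf '%s\n' "$new_line"
    fi
}

# Example (uncomment to install the entry for the current user):
# crontab -l 2>/dev/null | add_cron_line '0 3 * * * /home/ec2-user/job.sh' | crontab -
```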

Run command from terminal window in AWS Instance at specified time or on start up

I have an AWS Cloud9 instance that starts running at 11:52 PM MST and stops at 11:59 PM MST. Inside the instance I have a Dockerfile that, when run with the correct mount, runs a set of C++ .cpp files that collect live web data. The ultimate goal is for this instance to be fully automatic, so that every night it collects the live web data for that date, which is why the instance is up at the very end of each day. Is it possible to have my AWS instance run a given command in a terminal window at a certain time, say 11:55 PM, or even on startup? That is, at that time or at startup, the command "docker run -it...." would be run within the instance.
Is automating this process possible? I have looked into CloudWatch events and think that might be the best way to go about automating this process but I am not quite sure how I would create a rule to fulfill the job. If it is not possible to automate a certain command within a terminal window, could I automate the dockerfile to run at a certain time?
Of course. Using the cron daemon you can automate not just Docker but any command. All you need to do is place your command in a shell script file, say doc.sh, in a directory of your choice.
SSH into your instance.
Open a terminal and type crontab -e
Enter an entry in this form: a b c d e /directory/command
where a = minute, b = hour, c = day of the month, d = month, e = day of the week.
/directory/command is the location of the script you want to run.
For more cron examples, see https://www.cyberciti.biz/faq/how-do-i-add-jobs-to-cron-under-linux-or-unix-oses/
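As an illustration of the five fields, the sketch below splits an example entry for 11:55 PM into its parts. The path /home/ec2-user/doc.sh and the helper name describe_cron_line are placeholders. Two caveats that are assumptions about your setup: cron uses the instance's system time zone, which may not be MST, and cron jobs have no terminal attached, so the -t in docker run -it would normally have to be dropped inside doc.sh.

```shell
#!/bin/sh
# describe_cron_line is a hypothetical helper that labels the fields of one entry.
describe_cron_line() {
    printf '%s\n' "$1" | {
        # The last variable receives the rest of the line (the command).
        read -r minute hour dom month dow command
        printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s command=%s\n' \
            "$minute" "$hour" "$dom" "$month" "$dow" "$command"
    }
}

describe_cron_line '55 23 * * * /home/ec2-user/doc.sh'
# prints: minute=55 hour=23 day-of-month=* month=* day-of-week=* command=/home/ec2-user/doc.sh
```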
If you have a Dockerfile that you want to run for a few minutes a day, you should look into Fargate. You can schedule an event with CloudWatch, run the container, and then shut it down when it's done.
It will probably cost around $0.01/day to run this.

Google cloud compute startup script ignored with no logging

I have a standard Debian 8.9 instance on google cloud compute (GCE) where my startup script is ignored.
In the custom metadata field, for startup-script, I am trying to run an Rscript (which is used for batch execution of R files), followed by a system shutdown, with the following:
#! /bin/bash
sudo /usr/bin/Rscript /home/myuser/launch_script.R
sudo shutdown -h now
Starting the instance is immediately followed by a shutdown and the Rscript is ignored. Removing the last line to shutdown causes the GCE instance to start, but the Rscript to be ignored. Running just "sudo /usr/bin/Rscript /home/myuser/launch_script.R" from the terminal results in the script being run. It has a chmod of 755, so I don't think this is a permissions issue.
In addition to this problem, I have read elsewhere that logging should happen in /var/log/, but there is nothing there. Instead, I have a bunch of log files (that only contain the start-up script and nothing else) in the root of my instance.
I got in touch with Google cloud support, who gave the following response:
The script definition is kept under /var/run/google.startup.script
If the script does not run initially, you can force it manually with:
$ sudo google_metadata_script_runner --script-type startup   # on Debian
$ sudo /usr/share/google/run-startup-scripts   # on Ubuntu and older images
I'm posting this information here, because it is not in their documentation (as of August 2017). I'm not sure how helpful it is, since the google.startup.script didn't exist in my case (using the latest Debian image on GCE), but I did run the other commands.
However, I think my main issues were:
I was using autossh to connect to a remote database. The startup-script was running before autossh. Building a 40 second delay into the script and running the script as a user (not sudo-type root) seems to have solved this problem for now. Autossh was being run as the main user, which I think gets loaded before lower-privilege user-defined scripts get loaded.
I was using some gcloud commands from the user account which had its own authentication issues. Running gcloud auth login as the user and ensuring correct permissions on my private key solved this.
Always remember to check the messages and syslog files in /var/log for troubleshooting. This allowed me to see the order of things being loaded at system-boot.
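A fixed 40-second delay works, but it is either longer than necessary or, on a slow boot, too short. One alternative, shown here only as a sketch and not as what I actually ran, is to poll until the dependency is ready and give up after a timeout. The helper name wait_for is made up, and the check command in the usage comment (nc -z against a local tunnel port) is an assumption about how the autossh tunnel is exposed.

```shell
#!/bin/sh
# Usage: wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Retries COMMAND once per second until it succeeds.
# Returns 0 on success, 1 if it still fails after TIMEOUT_SECONDS.
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example (hypothetical tunnel port; requires a netcat that supports -z):
# wait_for 60 nc -z localhost 5432 && /usr/bin/Rscript /home/myuser/launch_script.R
```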

Building project from cron task

When I build the project from the terminal with the 'xcodebuild' command it succeeds, but when I run the same script from a cron task I receive this error:
"Code Sign error: The identity '****' doesn't match any valid certificate/private key pair in the default keychain"
I think the problem lies in the settings and permissions of crontab; it seems cron cannot see my keychain.
Can anyone give me a terminal command that makes my keychain visible to cron?
I encountered a similar issue with trying to build nightly via cron. The only resolution I found was to create a plist in /Library/LaunchDaemons/ and load it via launchctl. The key necessary is "SessionCreate" otherwise you will quickly run in to problems similar to what was encountered with trying to use cron -- namely that your user login.keychain is not available to the process. "SessionCreate" is similar to "su -l" in that (as far as I understand) it simulates a login and thus default keychains you expect will be available; otherwise, you are stuck with only the System keychain despite the task running as your user.
I found the answers here (though not the current top answer) useful in troubleshooting this issue: Missing certificates and keys in the keychain while using Jenkins/Hudson as Continuous Integration for iOS and Mac development
Which account does your cron job run as?
That is most probably the problem!
You can add
echo `whoami`
at the beginning of your script to see with which user the script is launched.
Also, when a Bash script is launched from cron, it doesn't use the same environment variables (non-login shell) as when you launch it as a user.
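To see the difference for yourself, a small diagnostic along these lines can be run once from your terminal and once from cron, and the two outputs compared. The name dump_env is made up for this sketch.

```shell
#!/bin/sh
# dump_env is a hypothetical debugging helper: it prints the account and the
# parts of the environment that most often differ between a login shell and cron.
dump_env() {
    echo "user=$(id -un)"
    echo "home=${HOME:-unset}"
    echo "shell=${SHELL:-unset}"
    echo "path=${PATH:-unset}"
    echo "cwd=$(pwd)"
}

dump_env
```

Saved as a script and scheduled temporarily with its output redirected to a file, this typically shows a much shorter PATH under cron than in your interactive session.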
When the script launches from cron, it doesn't load your $HOME/.profile (or .bash_profile). Anything you run from cron has to be 100% self-sufficient in terms of its environment. I'd suggest you make yourself a file called something like "set_build_env.sh". It should contain everything from your .profile that you need to build, such as $PATH, $HOME, $CLASSPATH, etc. Then in your build script, load set_build_env.sh using the dot notation or the source command, as ericc said. You should also remove the build-specific lines from your .profile and then source set_build_env.sh from there too, so there is only one place to maintain. Example:
source /home/dmitry/set_build_env.sh #absolute path
. /home/dmitry/set_build_env.sh #dot-space notation same as "source"