I have a question about Amazon RDS. I only need the database online for about 2 hours a day, but I am dealing with quite a large database at around 1 GB.
I have two main questions:
Can I automate bringing my RDS database online and offline via scripts to save money?
When I take an RDS instance offline to stop the "work hours" counter running and billing me, will it still have the same content when I bring it back online (i.e. will all my data stay there, or will it come back as a blank DB)? If the data is lost, is there any way around this other than backing up to S3 and reimporting it every time?

If you wish to do this programmatically:
Snapshot the RDS instance using rds-create-db-snapshot
Delete the running instance using rds-delete-db-instance
Restore the database from the snapshot using rds-restore-db-instance-from-db-snapshot
You may also do all of this manually from the AWS Web Console.
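The three steps above can be sketched with the current unified `aws` CLI instead of the older `rds-*` tools named in the answer. This is a minimal sketch, not the answer's own script: `park_rds` and `unpark_rds` are hypothetical helper names, and it assumes the `aws` CLI is installed and configured with permission to manage the instance. The `aws rds wait` calls block until each step has finished, so the instance is only deleted once its snapshot is available.

```shell
# Hypothetical helpers: park an RDS instance as a snapshot, restore it later.

park_rds() {
    local instance="$1" snapshot="$2"
    # Snapshot first, wait for it, and only then delete the instance.
    aws rds create-db-snapshot \
        --db-instance-identifier "$instance" \
        --db-snapshot-identifier "$snapshot" &&
    aws rds wait db-snapshot-available \
        --db-snapshot-identifier "$snapshot" &&
    aws rds delete-db-instance \
        --db-instance-identifier "$instance" \
        --skip-final-snapshot &&
    aws rds wait db-instance-deleted \
        --db-instance-identifier "$instance"
}

unpark_rds() {
    local instance="$1" snapshot="$2"
    # Recreate the instance from the snapshot and wait until it accepts connections.
    aws rds restore-db-instance-from-db-snapshot \
        --db-instance-identifier "$instance" \
        --db-snapshot-identifier "$snapshot" &&
    aws rds wait db-instance-available \
        --db-instance-identifier "$instance"
}
```

Typical use would be `park_rds mydb mydb-parked` in the evening and `unpark_rds mydb mydb-parked` before work starts, where both names are placeholders. A restored instance may need settings such as its security groups reapplied afterwards.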

You can start EC2* instances using shell scripts, so I would expect the same to be possible for RDS.
But unlike EC2*, you cannot "stop" an RDS instance without "destroying" it. You need to create a DB snapshot when terminating your database, and you restore from that DB snapshot when restarting it.
*EC2: Elastic Compute Cloud, which rents out virtual servers.

Here's a script that will stop/start/reboot an RDS instance:
#!/bin/bash
# usage: ./SCRIPT lhdevices start
# export vars to run RDS CLI
export JAVA_HOME=/usr
export AWS_RDS_HOME=/home/mysql/RDSCli-1.15.001
export PATH=$PATH:/home/mysql/RDSCli-1.15.001/bin
export EC2_REGION=us-east-1
export AWS_CREDENTIAL_FILE=/home/mysql/RDSCli-1.15.001/keysLightaria.txt
if [ $# -ne 2 ]
then
    echo "Usage: $0 {MySQL-Instance Name} {Action either start, stop or reboot}"
    echo ""
    exit 1
fi
# the action is the second argument; the instance name is hard-coded below
ACTION=$2
shopt -s nocasematch
if [[ $ACTION == 'start' ]]
then
    echo "This will $ACTION a MySQL Instance"
    rds-restore-db-instance-from-db-snapshot lhdevices \
        --db-snapshot-identifier dbStart --availability-zone us-east-1a \
        --db-instance-class db.m1.small
    echo "Sleeping while instance is created"
    sleep 10m
    echo "waking..."
    rds-modify-db-instance lhdevices --db-security-groups kfarrell
    echo "Sleeping while instance is modified for security group name"
    sleep 5m
    echo "waking..."
elif [[ $ACTION == 'stop' ]]
then
    echo "This will $ACTION a MySQL Instance"
    # remove the previous snapshot so the name dbStart can be reused
    yes | rds-delete-db-snapshot dbStart
    echo "Sleeping while deleting old snapshot "
    sleep 10m
    #rds-create-db-snapshot lhdevices --db-snapshot-identifier dbStart
    # echo "Sleeping while creating new snapshot "
    # sleep 10m
    # echo "waking...."
    #rds-delete-db-instance lhdevices --force --skip-final-snapshot
    rds-delete-db-instance lhdevices --force --final-db-snapshot-identifier dbStart
    echo "Sleeping while instance is deleted"
    sleep 10m
    echo "waking...."
elif [[ $ACTION == 'reboot' ]]
then
    echo "This will $ACTION a MySQL Instance"
    rds-reboot-db-instance lhdevices
    echo "Sleeping while Instance is rebooted"
    sleep 5m
    echo "waking...."
else
    echo "Did not recognize command: $ACTION"
    echo "Usage: $0 {MySQL-Instance Name} {Action either start, stop or reboot}"
fi
shopt -u nocasematch

Amazon recently updated their CLI to include a way to start and stop RDS instances. stop-db-instance and start-db-instance detail the steps needed to perform these operations.
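As a rough illustration of those two commands, the wrapper below starts or stops an instance by name. `rds_power` is a hypothetical function name and the instance identifier is whatever you pass in; the sketch assumes a configured `aws` CLI. Note that a stopped instance still accrues storage charges, and AWS starts it again automatically after seven days, so a scheduled job that only ever starts the instance on demand would need to stop it again afterwards.

```shell
# Hypothetical wrapper around the RDS start/stop commands.
rds_power() {
    local action="$1" instance="$2"
    case "$action" in
        start)
            # Start the instance and block until it is available.
            aws rds start-db-instance --db-instance-identifier "$instance" &&
            aws rds wait db-instance-available --db-instance-identifier "$instance"
            ;;
        stop)
            aws rds stop-db-instance --db-instance-identifier "$instance"
            ;;
        *)
            echo "usage: rds_power {start|stop} INSTANCE" >&2
            return 2
            ;;
    esac
}
```

For the two-hours-a-day case this could be driven from cron, calling `rds_power start mydb` shortly before the window and `rds_power stop mydb` after it, with `mydb` as a placeholder.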


Best method to renew periodically your AWS access keys

I realized I had never renewed my AWS access keys, and they are credentials that should be renewed periodically to limit the damage if they leak.
So... what is the best way to renew them automatically without any impact, given that they are only used from my laptop?
Finally I created this bash script:
#!/usr/bin/env bash
set -e # exit on non-zero command
set -u # force vars to be declared
set -o pipefail # avoids errors in pipelines to be masked
echo "retrieving current access keys..."
current_access_key_list=$(aws iam list-access-keys | jq -r '.AccessKeyMetadata')
number_of_current_access_keys=$(echo "$current_access_key_list" | jq length)
current_access_key=$(echo "$current_access_key_list" | jq -r '.[]|.AccessKeyId')
# rotation needs a free slot, so refuse to run unless there is exactly one key
if [[ ! "$number_of_current_access_keys" == "1" ]]; then
    echo "ERROR: expected exactly 1 access key, found ${number_of_current_access_keys}"
    exit 1
fi
echo "Current access key is ${current_access_key}"
echo "creating a new access key..."
new_access_key=$(aws iam create-access-key)
access_key=$(echo "$new_access_key" | jq -r '.AccessKey.AccessKeyId')
access_key_secret=$(echo "$new_access_key" | jq -r '.AccessKey.SecretAccessKey')
echo "New access key is: ${access_key}"
echo "performing credentials backup..."
cp ~/.aws/credentials ~/.aws/credentials.bak
echo "changing local credentials..."
aws configure set aws_access_key_id "${access_key}"
aws configure set aws_secret_access_key "${access_key_secret}"
echo "wait 10 seconds to ensure new access_key is set..."
sleep 10
echo "check new credentials work fine"
aws iam get-user | jq -r '.User'
echo "removing old access key $current_access_key"
aws iam delete-access-key --access-key-id "$current_access_key"
echo "Congrats. You are using the new credentials."
echo "Feel free to remove the backup file:"
echo " rm ~/.aws/credentials.bak"
I placed that script in ~/.local/bin to ensure it is on the PATH, and then I added these lines at the end of my .bashrc and/or .zshrc files:
# rotate AWS keys if they are too old
if [[ -n "$(find ~/.aws -mtime +30 -name credentials)" ]]; then
    AWS_PROFILE=profile-1 rotate_aws_access_key
    AWS_PROFILE=profile-2 rotate_aws_access_key
fi
So any time I open a terminal (which is very frequent) it checks whether the credentials file has gone unmodified for more than one month and, if so, tries to renew my credentials automatically.
The worst thing that might happen is that it creates the new access key but fails to update my local credentials, which would force me to remove the new key by hand.
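The age check in the shell startup file can be isolated into a small function, which makes the threshold explicit and easy to try out on its own. This is only a sketch: `credentials_stale` is a hypothetical name, and it relies on `find -mtime +N`, which matches files last modified more than N whole days ago.

```shell
# Hypothetical helper: succeed when FILE was last modified more than DAYS days ago.
credentials_stale() {
    local file="$1" days="$2"
    # find prints the path only if the age test matches; a missing file prints nothing.
    [ -n "$(find "$file" -maxdepth 0 -mtime +"$days" 2>/dev/null)" ]
}
```

In a startup file this could read `if credentials_stale ~/.aws/credentials 30; then AWS_PROFILE=profile-1 rotate_aws_access_key; fi`. A missing credentials file counts as not stale, so nothing is rotated on a machine that has no keys configured.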

Environment Variables in newest AWS EC2 instance

I am trying to get environment variables into an EC2 instance (I am trying to run a Django app on Amazon Linux AMI 2018.03.0 (HVM), SSD Volume Type ami-0ff8a91507f77f867). How do you set them in the newest version of Amazon Linux, or at least get logging working so the problem can be traced?
user-data text (modified from here):
#!/bin/bash
#trying to get a file made
touch /tmp/testfile.txt
echo 'This and that' > /tmp/testfile.txt
#trying to log
echo 'Woot!' > /home/ec2-user/user-script-output.txt
#Trying to get the output logged to see what is going wrong
exec > >(tee /var/log/user-data.log|logger -t user-data ) 2>&1
#trying to log
#trying to store the ENVIRONMENT VARIABLES
# Functions
get_parameter_store_tags() {
    echo $($AWS ssm get-parameters-by-path --with-decryption --path ${PARAMETER_PATH} --region ${REGION})
}
params_to_env () {
    params=$1
    # If .Tags does not exist we assume ssm Parameters object.
    # assumption: SSM Parameters objects carry the key in .Name
    SELECTOR="Name"
    for key in $(echo $params | /usr/bin/jq -r ".[][].${SELECTOR}"); do
        value=$(echo $params | /usr/bin/jq -r ".[][] | select(.${SELECTOR}==\"$key\") | .Value")
        # keep the last path segment and turn it into an upper-case shell variable name
        key=$(echo "${key##*/}" | /usr/bin/tr ':' '_' | /usr/bin/tr '-' '_' | /usr/bin/tr '[:lower:]' '[:upper:]')
        export $key="$value"
        echo "$key=$value"
    done
}
# Get TAGS
if [ -z "$PARAMETER_PATH" ]
then
    echo "Please provide a parameter store path. -p option"
    exit 1
fi
TAGS=$(get_parameter_store_tags ${PARAMETER_PATH} ${REGION})
echo "Tags fetched via ssm from ${PARAMETER_PATH} ${REGION}"
echo "Adding new variables..."
params_to_env "$TAGS"
Notes:
What I think I know but am unsure of
the user-data script is only run when the instance is created, not when I stop and then start it, as mentioned here (although it also says [I think outdated] that the output is logged to /var/log/cloud-init-output.log)
I may not be starting the instance correctly
I don't know where to store the bash script so that it can be executed
What I have verified
the user-data text is on the instance: after ssh-ing in, curl shows the current text (#!/bin/bash …)
What I've tried
editing rc.local directly to export AWS_ACCESS_KEY_ID='JEFEJEFEJEFEJEFE' … and the like
putting them in the AWS Parameter Store (I can see them via the correct call, I just can't trace getting them into the EC2 instance without logs or without confirming that the user-data is being run)
putting ENV variables in Tags and importing them as mentioned here:
outputting the logs to other files as suggested here (I am not seeing any log files in the ssh session or in the system log)
viewing the System Log on the AWS web page to see any errors/logs by selecting the instance -> 'Actions' -> 'Instance Settings' -> 'Get System Log' (I am not seeing any commands run or log statements [only 1 unrelated mention of the word user])
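The key-renaming step in `params_to_env` is the part most easily checked in isolation, without an instance or any AWS call. The sketch below pulls just that transformation into a function so it can be tried in a local shell. `param_to_env_name` is a hypothetical name and the parameter path in the usage line is made up for illustration.

```shell
# Hypothetical helper: turn an SSM parameter name into an environment variable name.
param_to_env_name() {
    local name="$1"
    # Keep the last path segment, map ':' and '-' to '_', then upper-case it.
    printf '%s\n' "${name##*/}" | tr ':-' '__' | tr '[:lower:]' '[:upper:]'
}
```

For example, `param_to_env_name /myapp/prod/db-password` prints `DB_PASSWORD`.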

GCP Dataproc - configure YARN fair scheduler

I was trying to set up a Dataproc cluster that would compute only one job (or a specified maximum number of jobs) at a time, with the rest waiting in a queue.
I have found this solution, How to configure monopolistic FIFO application queue in YARN? , but as I'm always creating a new cluster, I needed to automate this. I have added this to cluster creation:
"softwareConfig": {
"properties": {
with another line in init action script:
sudo echo "<allocations><queueMaxAppsDefault>1</queueMaxAppsDefault></allocations>" > /etc/hadoop/conf/fair-scheduler.xml
and the cluster tells me this when I fetch its config:
'softwareConfig': {
'imageVersion': '1.2.27',
'properties': {
'capacity-scheduler:yarn.scheduler.capacity.root.default.ordering-policy': 'fair',
'': '134217728',
'': 'false',
'': '-Xmx4096m',
'': '5120',
'': '-Xmx4096m',
'distcp:mapreduce.reduce.memory.mb': '5120',
'hdfs:dfs.datanode.address': '',
'hdfs:dfs.datanode.http.address': '',
'hdfs:dfs.datanode.https.address': '',
'hdfs:dfs.datanode.ipc.address': '',
'hdfs:dfs.namenode.http-address': '',
'hdfs:dfs.namenode.https-address': '',
'hdfs:dfs.namenode.secondary.http-address': '',
'hdfs:dfs.namenode.secondary.https-address': '',
'mapred:mapreduce.job.maps': '189',
'mapred:mapreduce.job.reduce.slowstart.completedmaps': '0.95',
'mapred:mapreduce.job.reduces': '63',
'': '1',
'': '-Xmx4096m',
'': '5120',
'mapred:mapreduce.reduce.cpu.vcores': '1',
'': '-Xmx4096m',
'mapred:mapreduce.reduce.memory.mb': '5120',
'': '256',
'': '-Xmx4096m',
'': '1',
'': '5120',
'spark-env:SPARK_DAEMON_MEMORY': '3840m',
'spark:spark.driver.maxResultSize': '1920m',
'spark:spark.driver.memory': '3840m',
'spark:spark.executor.cores': '8',
'spark:spark.executor.memory': '37237m',
'': '640m',
'yarn:yarn.nodemanager.resource.memory-mb': '81920',
'yarn:yarn.resourcemanager.scheduler.class': 'org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler',
'yarn:yarn.scheduler.fair.allocation.file': '$HADOOP_CONF_DIR/fair-scheduler.xml',
'yarn:yarn.scheduler.fair.user-as-default-queue': 'false',
'yarn:yarn.scheduler.maximum-allocation-mb': '81920',
'yarn:yarn.scheduler.minimum-allocation-mb': '1024'
The file fair-scheduler.xml also contains the specified code (everything is on one line, but I don't think this could be the problem).
After all this, the cluster still acts as if the capacity scheduler were in charge. No idea why. Any recommendation would help.
As the init actions script runs after the cluster is created, the YARN service is already running by the time the script modifies yarn-site.xml.
So after modifying the XML config file and creating the other XML file, the YARN service needs to be restarted.
It can be done using this command:
sudo systemctl restart hadoop-yarn-resourcemanager.service
Also, since $HADOOP_CONF_DIR was not set (I thought it would be), the full path to the file has to be given. But then the initial YARN service won't start, because it can't find the file, which is only created later by the init actions script. So what I did was to add the last few lines to yarn-site.xml in the init actions script as well.
The code for init actions script is the following:
#!/bin/bash
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
# only the master runs the ResourceManager, so only it needs the scheduler config
if [[ "${ROLE}" == 'Master' ]]; then
    echo "<allocations>" > /etc/hadoop/conf/fair-scheduler.xml
    echo " <queueMaxAppsDefault>1</queueMaxAppsDefault>" >> /etc/hadoop/conf/fair-scheduler.xml
    echo "</allocations>" >> /etc/hadoop/conf/fair-scheduler.xml
    # drop the closing </configuration> line, append the property, then close the file again
    sed -i '$ d' /etc/hadoop/conf/yarn-site.xml
    echo " <property>" >> /etc/hadoop/conf/yarn-site.xml
    echo " <name>yarn.scheduler.fair.allocation.file</name>" >> /etc/hadoop/conf/yarn-site.xml
    echo " <value>/etc/hadoop/conf/fair-scheduler.xml</value>" >> /etc/hadoop/conf/yarn-site.xml
    echo " </property>" >> /etc/hadoop/conf/yarn-site.xml
    echo "</configuration>" >> /etc/hadoop/conf/yarn-site.xml
    systemctl restart hadoop-yarn-resourcemanager.service
fi
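The allocation file written by the chain of `echo` lines can also be produced with a single here-document, which keeps the XML readable and makes the per-queue application limit a parameter. This is a sketch only: `write_fair_scheduler` is a hypothetical function name, and the XML declaration on the first line is an addition that the original file does not have.

```shell
# Hypothetical helper: write a fair-scheduler allocation file to PATH
# that caps the number of running applications per queue at MAX_APPS.
write_fair_scheduler() {
    local path="$1" max_apps="$2"
    cat > "$path" <<EOF
<?xml version="1.0"?>
<allocations>
  <queueMaxAppsDefault>${max_apps}</queueMaxAppsDefault>
</allocations>
EOF
}
```

In the init action this would be called as `write_fair_scheduler /etc/hadoop/conf/fair-scheduler.xml 1` before the ResourceManager is restarted.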

Best way to automatically move backups of web server to an AWS server

I have a web server that produces .tar.gz backup files that I want to automatically transfer to an AWS server.
To accomplish this I have tried to write a bash script on the AWS server that will automatically check for a new backup at the web server and make a copy of the backup if it is more recent (preserving timestamps).
Is there an easier or more robust way to go about this?
Am I correct in my FTP script syntax?
# Credentials to access other machine
# path to the remoteBackups
# Loops indefinitely
#while [[ true ]]
# FTP to remote host and get the name of the most recent backup
ftp -inv $HOST<<-EOT
#Store name of most recent backup to FILE
# does this work or will it just save it to a variable FILE on the remote machine
FILE=`ls -t ~/Desktop/backups/*.tar.gz | head -1`
EOT
# For testing
echo $FILE
# Copy (preserving modification dates) file to the local remote backups folder on aws server
#scp -p -i <.pem> $FILE $remoteBackups
# Get the most recent back up from both directories
latestLocal=`ls -t ~/intranetBackups/*.tar.gz | head -1`
latestRemote=`ls -t $remoteBackups/*.tar.gz | head -1`
# For testing
echo $latestLocal
echo $latestRemote
# If the backup from the remote is newer then save to backups and sleep for 15 days
if [[ $latestLocal -ot $latestRemote ]]
then
    echo Transferring backup from $latestRemote to $latestLocal
    sleep 15d
else
    echo No new backup file found
    sleep 1d
fi
# If there are more than 20 backups delete the oldest
if [[ `ls -1 ~/intranetBackups | wc -l` -ge 20 ]]
then
    rm `ls -t ~/intranetBackups/*.tar.gz | tail -1`
    echo removed the oldest backup
else
    echo no file to be removed
fi
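The retention step at the end of the script (delete the oldest backup once there are too many) can be written so that it keeps the newest N archives however many have piled up. A minimal sketch follows, with `prune_backups` as a hypothetical function name. It assumes the archives end in `.tar.gz` and that their file names contain no newlines, since it parses `ls -t` output just as the script above does.

```shell
# Hypothetical helper: keep only the KEEP newest *.tar.gz files in DIR.
prune_backups() {
    local dir="$1" keep="$2"
    # ls -t lists newest first; tail -n +K prints from line K onward,
    # i.e. every archive older than the newest $keep.
    ls -t "$dir"/*.tar.gz 2>/dev/null | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
        rm -- "$old"
        echo "removed $old"
    done
}
```

Calling `prune_backups ~/intranetBackups 20` after each transfer would keep the directory at no more than 20 archives.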

Delete older than month AWS EC2 snapshots

Will the command below work to delete AWS EC2 snapshots that are older than a month?
aws describe-snapshots | grep -v (date +%Y-%m-) | grep snap- | awk '{print $2}' | xargs -n 1 -t aws delete-snapshot
Your command won't work mostly because of a typo: aws describe-snapshots should be aws ec2 describe-snapshots.
Anyway, you can do this without any other tools than aws:
snapshots_to_delete=$(aws ec2 describe-snapshots --owner-ids xxxxxxxxxxxx --query 'Snapshots[?StartTime<=`2017-02-15`].SnapshotId' --output text)
echo "List of snapshots to delete: $snapshots_to_delete"
# actual deletion
for snap in $snapshots_to_delete; do
    aws ec2 delete-snapshot --snapshot-id $snap
done
Make sure you always know what you are deleting, for example by echoing $snap before each deletion.
Also, adding --dry-run to aws ec2 delete-snapshot lets you check that the request itself has no errors.
There are two things to pay attention to in the first command:
--owner-ids - your unique account ID. It can be found manually in the top right corner of the AWS Console: Support->Support Center->Account Number xxxxxxxxxxxx
--query - JMESPath query which selects only snapshots created before the specified date (e.g. 2017-02-15): Snapshots[?StartTime<=`2017-02-15`].SnapshotId
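The two options described above can be wrapped in a small function that takes the cutoff date as an argument. This is a sketch rather than the answer's own code: `list_snapshots_before` is a hypothetical name, it uses `self` for `--owner-ids` in place of a literal account number, and it assumes a configured `aws` CLI.

```shell
# Hypothetical helper: print the IDs of your own snapshots started before CUTOFF (YYYY-MM-DD).
list_snapshots_before() {
    local cutoff="$1"
    # The filter compares the StartTime timestamp string against the cutoff date.
    aws ec2 describe-snapshots \
        --owner-ids self \
        --query "Snapshots[?StartTime<='${cutoff}'].SnapshotId" \
        --output text
}
```

Running `list_snapshots_before 2017-02-15` on its own is a safe way to review the list before feeding it to a deletion loop.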
+1 to @roman-zhuzha for getting me close. I did have trouble when $snapshots_to_delete didn't parse into a long string of snapshots separated by whitespace.
The script below does parse them into a long string of snapshot IDs separated by whitespace on my Ubuntu (trusty) 14.04 in bash with awscli 1.16:
#!/usr/bin/env bash
dry_run=1 # set this to 0 to actually delete
echo_progress=1 # set this to 0 to not echo stmnts
d=$(date +'%Y-%m-%d' -d '1 month ago')
if [ $echo_progress -eq 1 ]
then
    echo "Date of snapshots to delete (if older than): $d"
fi
snapshots_to_delete=$(aws ec2 describe-snapshots \
    --owner-ids xxxxxxxxxxxxx \
    --output text \
    --query "Snapshots[?StartTime<'$d'].SnapshotId" \
)
if [ $echo_progress -eq 1 ]
then
    echo "List of snapshots to delete: $snapshots_to_delete"
fi
for oldsnap in $snapshots_to_delete; do
    # some $oldsnaps will be in use, so you can't delete them
    # for "snap-a1234xyz" currently in use by "ami-zyx4321ab"
    # (and others it can't delete) add conditionals like this
    if [ "$oldsnap" = "snap-a1234xyz" ] ||
       [ "$oldsnap" = "snap-c1234abc" ]
    then
        if [ $echo_progress -eq 1 ]
        then
            echo "skipping $oldsnap known to be in use by an ami"
        fi
    else
        if [ $echo_progress -eq 1 ]
        then
            echo "deleting $oldsnap"
        fi
        if [ $dry_run -eq 1 ]
        then
            # dryrun will not actually delete the snapshots
            aws ec2 delete-snapshot --snapshot-id $oldsnap --dry-run
        else
            aws ec2 delete-snapshot --snapshot-id $oldsnap
        fi
    fi
done
Switch these variables as necessary:
dry_run=1 # set this to 0 to actually delete
echo_progress=1 # set this to 0 to not echo stmnts
Change the date -d string to a human readable version of the number of days, months, or years back you want to delete "older than":
d=$(date +'%Y-%m-%d' -d '15 days ago') # half a month
Find your account id and update these XXXX's to that number:
--owner-ids xxxxxxxxxxxxx \
Here is an example of where you can find that number:
If you run this from cron, you will only want to see errors and warnings. A frequent warning will be that there are snapshots in use. The two example snapshot IDs (snap-a1234xyz, snap-c1234abc) are skipped since deleting them would otherwise print something like:
An error occurred (InvalidSnapshot.InUse) when calling the DeleteSnapshot operation: The snapshot snap-a1234xyz is currently in use by ami-zyx4321ab
See the comments near the "snap-a1234xyz" example snapshot ID for how to handle this output.
And don't forget to check on the handy examples and references in the 1.16 aws cli describe-snapshots manual.
You can use 'self' in '--owner-ids' and delete the snapshots created before a specific date (e.g. 2018-01-01) with this one-liner:
for i in $(aws ec2 describe-snapshots --owner-ids self --query 'Snapshots[?StartTime<=`2018-01-01`].SnapshotId' --output text); do echo Deleting $i; aws ec2 delete-snapshot --snapshot-id $i; sleep 1; done;
The date condition must be within parentheses ():
aws ec2 describe-snapshots \
--owner-ids 012345678910 \
--query "Snapshots[?(StartTime<='2020-03-31')].[SnapshotId]"