check if google storage bucket is successfully created/is not empty using bash with large bucket size - google-cloud-platform

Currently I am using:
is_created=$(gsutil du -s ${bucket_name} 2> /dev/null || true)
if [ -z "${is_created}" ]; then
gsutil mb -p ${GCLOUD_PROJECT} -c regional -l ${gcloud_region} ${bucket_name}
fi
Yet since my bucket is large, it takes a long time to get the result. Is there another way around this?
PS I've tried:
gsutil -m du -s
and did not see a noticeable difference.

The gsutil du command is intended to report the size of the objects in a bucket. As a side effect you can use it to determine whether the bucket is accessible and non-empty.
Note that I'm saying accessible, not created: you can't tell the difference between a bucket that was never created, one that was created but isn't accessible (due to permissions or some failure of some sort), and one that is accessible but empty. In all but the last (empty) case, attempting to create the bucket will fail.
From a performance/duration perspective gsutil du isn't great: it can be quite slow on buckets with a lot of objects in them, as it spends time on size calculations that are irrelevant for your goal.
One thing to try instead is the gsutil ls command, intended to obtain just the list of objects in the bucket, which typically uses less CPU than gsutil du (no size info collection/calculations). Use it without options to avoid collecting additional object info unnecessarily; the object names alone are enough for the empty check.
Something along these lines maybe:
missing=$(gsutil ls "${bucket_name}" |& grep BucketNotFound | wc -l)
if [ "${missing}" == 1 ]; then
    gsutil mb -p "${GCLOUD_PROJECT}" -c regional -l "${gcloud_region}" "${bucket_name}"
fi
Or, even faster on buckets with many objects:
created=$(gsutil ls -p "${GCLOUD_PROJECT}" | grep "${bucket_name}" | wc -l)
if [ "${created}" == 0 ]; then
    gsutil mb -p "${GCLOUD_PROJECT}" -c regional -l "${gcloud_region}" "${bucket_name}"
fi

Related

Find how big a work directory can get during execution - linux

I have a cron job that does something with lots of data and then deletes all the temp files it creates. During the execution I get 'ERROR: Insufficient space in file WORK.AIB_CUSTOMER_DATA.DATA.' The current work directory has 50G free; when I run the code in another directory with 170G of free space, I don't get the error. I want to track the size of the working directory during the execution.
I'm afraid I might not fully understand your problem.
To get an idea of how fast it is growing in size, you could run a simple script like:
#!/bin/bash
while true
do
    # Uncomment this to check all partitions of the system:
    # df -h >> log.log
    # Uncomment this to check the files in the current folder:
    # du -sh * >> log.log
    sleep 1
done
Then analyze the logs and see the increase in size.
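Once the job has run, a quick way to pull the peak value out of such a log (assuming the loop above was changed to append raw byte counts, e.g. du -sb . >> log.log) could be:
# Print the largest size seen in the log, in bytes.
awk '$1 > max { max = $1 } END { print max " bytes" }' log.log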
I wrote this script and let it run during the job execution to monitor the directory size and record the maximum size the work directory reaches.
#!/bin/bash
Max=0
while true
do
    # -s gives one total per directory and -B1 reports it in bytes (the -h flag
    # is dropped so the value stays numeric); sum in awk in case the glob
    # matches more than one directory.
    SIZE=$(du -s -B1 /data/work/EXAMPLE_work* | awk '{ total += $1 } END { print total }')
    echo "size: $SIZE"
    echo "max: $Max"
    if [ "$SIZE" -ge "$Max" ]
    then
        echo "big size: $SIZE" > /home/mmm/Biggestsize.txt
        Max=$SIZE
    else
        echo "small size: $SIZE" > /home/mmm/sizeSmall.txt
    fi
    sleep 1
done

Create cron in chef

I want to create a cron job in Chef which checks the size of a log file and deletes it if it is larger than 30 MB. Here is my code:
cron_d 'ganglia_tomcat_thread_max' do
hour '0'
minute '1'
command "rm - f /srv/node/current/app/log/simplesamlphp.log"
only_if { ::File.size('/srv/node/current/app/log/simplesamlphp.log').to_f / 1024000 > 30 }
end
Can you help me with it, please?
Welcome to Stack Overflow!
I suggest going with an existing tool like "logrotate". There is a Chef cookbook available to manage logrotate.
Please note that "cron" in Chef manages the system cron service, which runs independently of Chef, so you'll have to do the file size check within the "command". It's also better to use the cron_d resource as documented here.
The way you create the cron_d resource, the cron task will only be added when your log file is larger than 30 MB at the time Chef runs; in all other cases the cron_d resource will not be created.
You can check this Ruby code
File.size('file').to_f / 2**20
to get the file size in megabytes. There is a slight difference in the result compared to dividing by 1024000, and I believe 2**20 is the more correct divisor.
So you can go with two solutions for your specific case:
1. Create a new cron_d resource when the log file is smaller than 30 MB to remove the existing cron entry, and provision your node periodically.
2. Move the file size check into the command itself with bash, glued together with &&. That way the file is only deleted when its size is greater than 30 MB; something like the sketch below. Note that
du -k file.txt | cut -f1
returns the size of the file in kilobytes (not bytes).
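Putting option 2 together, a minimal sketch of the command string (assuming the log path from the question; 30720 KB is 30 MB):
# Delete the log only when `du -k` reports more than 30720 KB (30 MB).
[ "$(du -k /srv/node/current/app/log/simplesamlphp.log | cut -f1)" -gt 30720 ] && \
    rm -f /srv/node/current/app/log/simplesamlphp.log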
To me, the more correct way to do this is to use the logrotate service and the Chef recipe for it.

How to implement blue/green deployments in AWS with Terraform without losing capacity

I have seen multiple articles discussing blue/green deployments and they consistently involve forcing recreation of the Launch Configuration and the Autoscaling Group. For example:
https://groups.google.com/forum/#!msg/terraform-tool/7Gdhv1OAc80/iNQ93riiLwAJ
This works great in general except that the desired capacity of the ASG gets reset to the default. So if my cluster is under load then there will be a sudden drop in capacity.
My question is this: is there a way to execute a Terraform blue/green deployment without a loss of capacity?
I don't have a full terraform-only solution to this.
The approach I have is to run a small script to get the current desired capacity, set a variable, and then use that variable in the asg.
handle-desired-capacity:
	@echo "Handling current desired capacity"
	@echo "---------------------------------"
	@if [ "$(env)" == "" ]; then \
		echo "Cannot continue without an environment"; \
		exit -1; \
	fi
	$(eval DESIRED_CAPACITY := $(shell aws autoscaling describe-auto-scaling-groups --profile $(env) | jq -SMc '.AutoScalingGroups[] | select((.Tags[]|select(.Key=="Name")|.Value) | match("prod-asg-app")).DesiredCapacity'))
	@if [ "$(DESIRED_CAPACITY)" == '' ]; then \
		echo Could not determine desired capacity.; \
		exit -1; \
	fi
	@if [ "$(DESIRED_CAPACITY)" -lt 2 -o "$(DESIRED_CAPACITY)" -gt 10 ]; then \
		echo Can only deploy between 2 and 10 instances.; \
		exit -1; \
	fi
	@echo "Desired Capacity is $(DESIRED_CAPACITY)"
	@sed -i.bak 's!desired_capacity = [0-9]*!desired_capacity = $(DESIRED_CAPACITY)!g' $(env)/terraform.tfvars
	@rm -f $(env)/terraform.tfvars.bak
	@echo ""
Clearly, this is as ugly as it gets, but it does the job.
I am looking to see if we can get the name of the ASG as an output from the remote state that I can then use on the next run to get the desired capacity, but I'm struggling to understand this enough to make it useful.
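In rough shell terms, that idea might look something like the sketch below. This assumes the ASG name has been exposed as a Terraform output called asg_name (a hypothetical name) and that the installed Terraform and AWS CLI support the flags used:
#!/usr/bin/env bash
set -euo pipefail

# Read the ASG name that the previous apply exported as an output.
asg_name=$(terraform output -raw asg_name)

# Ask AWS for the current desired capacity of that ASG.
desired=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg_name" \
    --query 'AutoScalingGroups[0].DesiredCapacity' --output text)

# Feed it back into the next apply so the capacity is not reset to the default.
terraform apply -var "desired_capacity=${desired}"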
As a second answer, I wrapped the AWSCLI + jq into a Terraform module.
https://registry.terraform.io/modules/digitickets/cli/aws/latest
module "current_desired_capacity" {
source = "digitickets/cli/aws"
assume_role_arn = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/OrganizationAccountAccessRole"
role_session_name = "GettingDesiredCapacityFor${var.environment}"
aws_cli_commands = ["autoscaling", "describe-auto-scaling-groups"]
aws_cli_query = "AutoScalingGroups[?Tags[?Key==`Name`]|[?Value==`digitickets-${var.environment}-asg-app`]]|[0].DesiredCapacity"
}
and
module.current_desired_capacity.result gives you the current desired capacity of the ASG you have nominated in the aws_cli_query.
Again, this is quite ugly, but the formalisation of this means you can now access a LOT of properties from AWS that are not yet available within Terraform.
This is a gentle hack. No resources are passed around and it was written purely with read-only for single scalar values in mind, so please use it with care.
As the author, I'd be happy to explain anything about this via the GitHub Issues page at https://github.com/digitickets/terraform-aws-cli/issues

C++ program significantly slower when run in bash

I have a query regarding bash. I have been running some of my own C++ programs in conjunction with commercial programs and controlling their interaction (via input and output files) through bash scripting. I am finding that if I run my C++ program alone in a terminal it completes in around 10–15 seconds, but when I run the same program through the bash script it can take up to 5 minutes to complete in each case.
Using System Monitor, I see that 100% of one CPU is used when I run the program directly in the terminal, whereas when I run it in bash (in a loop) a maximum of 60% CPU usage is recorded, which seems to be linked to the longer completion time (although the average CPU usage is higher across the 4 processors).
This is quite frustrating as until recently this was not a problem.
An example of the code:
#!/usr/bin/bash
DIR="$1"
TRCKDIR=$DIR/TRCKRSLTS
STRUCTDIR=$DIR
SHRTTRCKDIR=$TRCKDIR/SHRT_TCK_FILES
VTAL=VTAL.png
VTAR=VTAR.png
NAL=$(find $STRUCTDIR | grep NAL)
NAR=$(find $STRUCTDIR | grep NAR)
AMYL=$(find $STRUCTDIR | grep AMYL)
AMYR=$(find $STRUCTDIR | grep AMYR)
TCKFLS=($(find $TRCKDIR -maxdepth 1 | grep .fls))
numTCKFLS=${#TCKFLS[@]}
for i in $(seq 0 $[numTCKFLS-1]); do
filenme=${TCKFLS[i]}
filenme=${filenme%.t*}
filenme=${filenme##*/}
if [[ "$filenme" == *VTAL* || "$filenme" == *VTA_L* ]]; then
STREAMLINE_CUTTER -MRT ${TCKFLS[i]} -ROI1 $VTAL -ROI2 $NAL -op "$SHRTTRCKDIR"/"$filenme"_VTAL_NAL.fls
STREAMLINE_CUTTER -MRT ${TCKFLS[i]} -ROI1 $VTAL -ROI2 $AMYL -op "$SHRTTRCKDIR"/"$filenme"_VTAL_AMYL.fls
fi
if [[ "$filenme" == *VTAR* || "$filenme" == *VTA_R* ]];then
STREAMLINE_CUTTER -MRT ${TCKFLS[i]} -ROI1 $VTAR -ROI2 $NAR -op "$SHRTTRCKDIR"/"$filenme"_VTAR_NAR.fls
STREAMLINE_CUTTER -MRT ${TCKFLS[i]} -ROI1 $VTAR -ROI2 $AMYR -op "$SHRTTRCKDIR"/"$filenme"_VTAR_AMYR.fls
fi
done

calculate mp3 average volume

I need to know the average volume of an mp3 file so that when I convert it to mp3 (at a different bitrate) I can scale the volume too, to normalize it...
Therefore I need a command line tool / ruby library that gives me the average volume in dB.
You can use sox (an open source command line audio tool http://sox.sourceforge.net/sox.html) to normalize and transcode your files at the same time.
EDIT
Looks like it doesn't have options for bit-rate. Anyway, sox is probably overkill if LAME does normalization.
You can use LAME to encode to mp3. It has options for normalization, scaling, and bitrate. LAME also compiles to virtually any platform.
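For example, a rough sketch (the file names, scale factor and bitrate are placeholders; --mp3input, --scale and -b are standard LAME options):
# Decode the existing mp3, scale its amplitude by 0.9, and re-encode at 128 kbps.
lame --mp3input --scale 0.9 -b 128 input.mp3 output.mp3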
I wrote a little wrapper script, based on the above input:
#!/bin/sh
# Get the current volume (will reset to this later).
current=`amixer -c 0 get Master 2>&1 |\
awk '/%/ {
p=substr($4,2,length($4)-2);
if( substr(p,length(p)) == "%" )
{
p = substr(p,1,length(p)-1)
}
print p
}'`
# Figure out how loud the track is. The normal amplitude for a track is 0.1.
# Ludicrously low values are 0.05, high is 0.37 (!!?)
rm -f /tmp/$$.out
/usr/bin/mplayer -vo null -ao pcm:file=/tmp/$$.out $1 >/dev/null 2>&1
if [ $? = 0 ] ; then
amplitude=`/usr/bin/sox /tmp/$$.out -n stat 2>&1 | awk '/RMS.+amplitude/ {print $NF}'`
fi
rm -f /tmp/$$.out
# Set an appropriate volume for the track.
to=`echo $current $amplitude | awk '{printf( "%.0f%%", $1 * 0.1/$2 );}'`
echo $current $amplitude | awk '{print "Amplitude:", $2, " Setting volume to:", 10/$2 "%, mixer volume:", $1 * 0.1/$2}'
amixer -c 0 set Master $to >/dev/null 2>&1
mplayer -quiet -cache 2500 $1
# Reset the volume for next time.
amixer -c 0 set Master "$current%" >/dev/null 2>&1
It takes an extra second to start up playing the file, and relies on the ALSA mixer (amixer) to adjust the volume, but it does a really nice job of keeping you from having to constantly tweak the master volume. And it doesn't really care what the input format is: if mplayer can play it at all, it can extract the audio, so it should work fine with MP3, Ogg, AVI, whatever.
http://mp3gain.sourceforge.net/ is a well thought out solution for this.
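For example, something along these lines analyzes and adjusts files in place (a sketch, assuming mp3gain is installed):
# -r applies Track gain to each file; -k lowers the gain if needed to avoid clipping.
mp3gain -r -k *.mp3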