Find how big a work directory can get during execution - Linux - disk space

I have a cron job that processes a lot of data and then deletes all the temp files it creates. During execution I get 'ERROR: Insufficient space in file WORK.AIB_CUSTOMER_DATA.DATA.' The current work directory has 50G free; when I run the code in another directory with 170G free, I don't get the error. I want to track the size of the working directory during execution.

I'm afraid I might not fully understand your problem.
To get an idea of how fast the directory is growing, you could run a simple script like:
#!/bin/bash
while true
do
    # Uncomment this to check all partitions of the system.
    #df -h >> log.log
    # Uncomment this to check the files in the current folder.
    #du -sh * >> log.log
    sleep 1
done
Then analyze the logs and see the increase in size.
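For instance, if you log the directory total in plain kilobytes instead of human-readable sizes (say du -s . >> log.log, which is an assumption rather than the exact command above), the peak value can be pulled out afterwards with a one-liner:
# Each logged line looks like "<kilobytes><TAB>."; print the largest total seen.
sort -n log.log | tail -1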

I wrote this script and let it run during the job execution to monitor the directory size and record the maximum size reached by the work directory.
#!/bin/bash
Max=0
while true
do
    # Total size in bytes of all matching work directories (the -c "total" line).
    SIZE=`du -s -B1 -c /data/work/EXAMPLE_work* | awk 'END {print $1}'`
    echo size: $SIZE
    echo max: $Max
    if [ "$SIZE" -ge "$Max" ]
    then
        echo "big size: $SIZE" > /home/mmm/Biggestsize.txt
        Max=$SIZE
    else
        echo "small size: $SIZE" > /home/mmm/sizeSmall.txt
    fi
    sleep 1
done
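As a usage sketch (monitor_work.sh and run_job.sh are placeholder names, not the actual cron entry), the monitor can be started in the background just before the job and stopped once it finishes:
# monitor_work.sh is the script above; run_job.sh stands in for the cron job's command.
./monitor_work.sh &
MON_PID=$!
./run_job.sh
kill "$MON_PID"
cat /home/mmm/Biggestsize.txt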

Redis mass insert problem "ERR Protocol error: too big mbulk count string"

UPDATE
I split the file into multiple files of roughly 1.5 million lines each and had no issues.
I am attempting to pipe roughly 15 million lines of SADD and HSET commands, properly formatted for Redis mass insertion, into Redis 6.0.6, but it fails with the following message:
ERR Protocol error: too big mbulk count string
I use the following command:
echo -e "$(cat load.txt)" | redis-cli --pipe
I run the DBSIZE command in redis-cli and it shows no increase during the entire time.
I can also use the formatting app I wrote (a C++ app using the redis-plus-plus client library), which correctly formats the lines and writes them to std::cout, with the following command:
./app | redis-cli --pipe
but it exits right away and only sometimes produces the error message.
If I take roughly 400,000 lines from the load.txt file, put them in a smaller file, and use echo -e as above, it loads fine. The problem seems to be the large number of lines.
Any suggestions? It's not a formatting issue as far as I know. I could code my app to write all the commands to Redis directly, but mass insertion should be faster and I'd prefer that route.
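As a sketch of the workaround described in the update (file and chunk names are illustrative, not from the original post), the load file can be split into chunks of roughly 1.5 million lines and each chunk piped with cat, which is the form the Redis mass-insertion docs use and avoids pushing the whole file through echo -e:
# Split load.txt into chunks named chunk_aa, chunk_ab, ... of 1.5 million lines each.
split -l 1500000 load.txt chunk_
for f in chunk_*; do
    cat "$f" | redis-cli --pipe
done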

check if google storage bucket is successfully created/is not empty using bash with large bucket size

Currently I am using:
is_created=$(gsutil du -s ${bucket_name} 2> /dev/null || true)
if [ -z "${is_created}" ]; then
    gsutil mb -p ${GCLOUD_PROJECT} -c regional -l ${gcloud_region} ${bucket_name}
fi
Yet since my bucket is large, it takes a long time to return the result. Is there another way around this?
PS I've tried:
gsutil -m du -s
and did not see a noticeable difference.
The gsutil du command is intended to obtain info about the size of the bucket's objects. Sure, as a side effect you can use it to determine whether the bucket is accessible and non-empty.
Note that I'm saying accessible, not created, as you won't be able to tell the difference between a bucket that was not created, one that was created but is not accessible (due to permissions or some other failure), and one that is accessible but empty. Except for the last (empty) case, attempting to create the bucket will fail.
From a performance/duration perspective gsutil du isn't great: it can be quite slow on buckets with a lot of objects, since it spends time on size calculations that are irrelevant for your goal.
One thing to try is the gsutil ls command instead, which is intended to obtain just the list of objects in the bucket and typically uses less CPU than gsutil du (no size collection or calculation). Use it without options to avoid collecting additional object info unnecessarily; the object names alone are enough for the empty check.
Something along these lines maybe:
missing=$(gsutil ls ${bucket_name} |& grep BucketNotFound | wc -l)
if [ ${missing} == 1 ]; then
    gsutil mb -p ${GCLOUD_PROJECT} -c regional -l ${gcloud_region} ${bucket_name}
fi
Or, even faster on buckets with many objects:
created=$(gsutil ls -p ${GCLOUD_PROJECT} | grep ${bucket_name} | wc -l)
if [ ${created} == 0 ]; then
    gsutil mb -p ${GCLOUD_PROJECT} -c regional -l ${gcloud_region} ${bucket_name}
fi

Create cron in chef

I want to create a cron in Chef that checks the size of a log file and deletes it if it is larger than 30 MB. Here is my code:
cron_d 'ganglia_tomcat_thread_max' do
  hour '0'
  minute '1'
  command "rm -f /srv/node/current/app/log/simplesamlphp.log"
  only_if { ::File.size('/srv/node/current/app/log/simplesamlphp.log').to_f / 1024000 > 30 }
end
Can you help me with this, please?
Welcome to Stack Overflow!
I suggest you go with an existing tool like logrotate; there is a Chef cookbook available to manage logrotate.
Please note that cron in Chef manages the system cron service, which runs independently of Chef, so you'll have to do the file size check within the command itself. It's also better to use the cron_d resource as documented here.
The way you have written the cron_d resource, the cron task will be added only when your log file is larger than 30 MB at the time Chef runs; in all other cases the cron_d resource will not be created.
You can use this Ruby code to get the file size in megabytes:
File.size('file').to_f / 2**20
There is a slight difference in the result compared to dividing by 1024000, and I believe 2**20 is more correct.
So you can go with two solutions for your specific case:
create a new cron_d resource, when the log file is less than 30 MB, that removes the existing cron, and provision your node periodically;
or move the file size check into the command itself with bash, glued together with &&; in that case the file will be deleted only if its size is greater than 30 MB (see the sketch below).
du -k file.txt | cut -f1
will return the size of the file in kilobytes (not bytes), which you can compare against the threshold.
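A minimal sketch of that command, assuming the same log path as in the question (note the threshold check then runs at cron time, not at Chef converge time):
# du -m reports the file size in megabytes; delete the log only when it exceeds 30.
[ "$(du -m /srv/node/current/app/log/simplesamlphp.log | cut -f1)" -gt 30 ] && rm -f /srv/node/current/app/log/simplesamlphp.log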
To me, the correct way to do this is also to use the logrotate service and the Chef recipe for it.

C++ program significantly slower when run in bash

I have a query regarding bash. I have been running some of my own C++ programs in conjunction with commercial programs and controlling their interaction (via input and output files) through bash scripting. I am finding that if I run my C++ program alone in the terminal it completes in around 10–15 seconds, but when I run the same program through the bash script it can take up to 5 minutes in each case.
Using System Monitor, I find that 100% of one CPU is consistently used when I run the program directly in the terminal, whereas when I run it from bash (in a loop) a maximum of 60% CPU usage is recorded, and this seems to be linked to the longer completion time (although the average CPU usage across the 4 processors is higher).
This is quite frustrating as until recently this was not a problem.
An example of the code:
#!/usr/bin/bash
DIR="$1"
TRCKDIR=$DIR/TRCKRSLTS
STRUCTDIR=$DIR
SHRTTRCKDIR=$TRCKDIR/SHRT_TCK_FILES
VTAL=VTAL.png
VTAR=VTAR.png
NAL=$(find $STRUCTDIR | grep NAL)
NAR=$(find $STRUCTDIR | grep NAR)
AMYL=$(find $STRUCTDIR | grep AMYL)
AMYR=$(find $STRUCTDIR | grep AMYR)
TCKFLS=($(find $TRCKDIR -maxdepth 1 | grep .fls))
numTCKFLS=${#TCKFLS[@]}
for i in $(seq 0 $((numTCKFLS-1))); do
    filenme=${TCKFLS[i]}
    filenme=${filenme%.t*}
    filenme=${filenme##*/}
    if [[ "$filenme" == *VTAL* || "$filenme" == *VTA_L* ]]; then
        STREAMLINE_CUTTER -MRT ${TCKFLS[i]} -ROI1 $VTAL -ROI2 $NAL -op "$SHRTTRCKDIR"/"$filenme"_VTAL_NAL.fls
        STREAMLINE_CUTTER -MRT ${TCKFLS[i]} -ROI1 $VTAL -ROI2 $AMYL -op "$SHRTTRCKDIR"/"$filenme"_VTAL_AMYL.fls
    fi
    if [[ "$filenme" == *VTAR* || "$filenme" == *VTA_R* ]]; then
        STREAMLINE_CUTTER -MRT ${TCKFLS[i]} -ROI1 $VTAR -ROI2 $NAR -op "$SHRTTRCKDIR"/"$filenme"_VTAR_NAR.fls
        STREAMLINE_CUTTER -MRT ${TCKFLS[i]} -ROI1 $VTAR -ROI2 $AMYR -op "$SHRTTRCKDIR"/"$filenme"_VTAR_AMYR.fls
    fi
done
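One way to narrow down where the time goes (purely a measurement sketch, not part of the original script; the variable names are placeholders) is to time a single STREAMLINE_CUTTER invocation inside the loop and compare it with a run started directly in the terminal:
# Hypothetical wrapper around one invocation; appends wall-clock seconds to timing.log.
start=$(date +%s)
STREAMLINE_CUTTER -MRT "$fls" -ROI1 "$VTAL" -ROI2 "$NAL" -op "$out"
end=$(date +%s)
echo "$fls took $((end - start)) s" >> timing.log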

calculate mp3 average volume

I need to know the average volume of an mp3 file so that when I convert it to mp3 (at a different bitrate) I can scale the volume too, to normalize it...
Therefore I need a command line tool / ruby library that gives me the average volume in dB.
You can use sox (an open source command line audio tool http://sox.sourceforge.net/sox.html) to normalize and transcode your files at the same time.
EDIT
Looks like it doesn't have options for bit-rate. Anyway, sox is probably overkill if LAME does normalization.
You can use LAME to encode to mp3. It has options for normalization, scaling, and bitrate. LAME also compiles to virtually any platform.
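For example, a minimal sketch (file names and values are placeholders; check lame --help for the exact options in your build): --scale multiplies the decoded samples and -b fixes the output bitrate in kbps.
# Hypothetical files: scale the audio by 0.9 and encode at 192 kbps.
lame --scale 0.9 -b 192 input.wav output.mp3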
I wrote a little wrapper script, based on the above input:
#!/bin/sh
# Get the current volume (will reset to this later).
current=`amixer -c 0 get Master 2>&1 |\
  awk '/%/ {
    p=substr($4,2,length($4)-2);
    if( substr(p,length(p)) == "%" )
    {
      p = substr(p,1,length(p)-1)
    }
    print p
  }'`
# Figure out how loud the track is. The normal amplitude for a track is 0.1.
# Ludicrously low values are 0.05, high is 0.37 (!!?)
rm -f /tmp/$$.out
/usr/bin/mplayer -vo null -ao pcm:file=/tmp/$$.out $1 >/dev/null 2>&1
if [ $? = 0 ] ; then
    amplitude=`/usr/bin/sox /tmp/$$.out -n stat 2>&1 | awk '/RMS.+amplitude/ {print $NF}'`
fi
rm -f /tmp/$$.out
# Set an appropriate volume for the track.
to=`echo $current $amplitude | awk '{printf( "%.0f%%", $1 * 0.1/$2 );}'`
echo $current $amplitude | awk '{print "Amplitude:", $2, " Setting volume to:", 10/$2 "%, mixer volume:", $1 * 0.1/$2}'
amixer -c 0 set Master $to >/dev/null 2>&1
mplayer -quiet -cache 2500 $1
# Reset the volume for next time.
amixer -c 0 set Master "$current%" >/dev/null 2>&1
It takes an extra second to start up playing the file, and relies on alsamixer to adjust the volume, but it does a really nice job of keeping you from having to constantly tweak the master volume. And it doesn't really care what the input format is, since if mplayer can play it at all, it can extract the audio, so it should work fine with MP3, Ogg, AVI, whatever.
http://mp3gain.sourceforge.net/ is a well thought out solution for this.