Create cron in chef - amazon-web-services

I want to create a cron job in Chef that checks the size of a log file and deletes the file if it is larger than 30 MB. Here is my code:
cron_d 'ganglia_tomcat_thread_max' do
  hour '0'
  minute '1'
  command "rm -f /srv/node/current/app/log/simplesamlphp.log"
  only_if { ::File.size('/srv/node/current/app/log/simplesamlphp.log').to_f / 1024000 > 30 }
end
Can you help me with it, please?

Welcome to Stack Overflow!
I suggest going with an existing tool like logrotate. There is a Chef cookbook available to manage logrotate.
Please note that cron in Chef manages entries for the system cron service, which runs independently of Chef, so you'll have to do the file-size check inside the command itself. It's also better to use the cron_d resource, as documented here.
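For illustration, a minimal logrotate drop-in for that log could look like the following; the path matches your recipe, but the retention choices are just an assumption (the logrotate cookbook can generate an equivalent file for you):
# rotate only once the file exceeds 30 MB; keep no old copies, which effectively deletes the log
/srv/node/current/app/log/simplesamlphp.log {
    size 30M
    rotate 0
    missingok
    notifempty
}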

The way you have written the cron_d resource, the cron entry is only added when the log file is larger than 30 MB at the moment Chef runs; in all other cases the cron entry is not created at all.
You can use this Ruby expression
File.size('file').to_f / 2**20
to get the file size in megabytes; dividing by 2**20 (1,048,576) gives a slightly different result than dividing by 1024000, and I believe it is the more correct one.
So you can go with two solutions for your specific case:
1. Create an additional cron_d resource that removes the existing cron entry when the log file is smaller than 30 MB, and provision your node periodically so the check is re-evaluated.
2. Move the file-size check into the command itself, written in shell and glued together with &&; that way the file is deleted only if its size is greater than 30 MB. For example,
du -k file.txt | cut -f1
will return the size of the file in kilobytes (1024-byte blocks), not bytes.
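Put together, the command for your cron entry could look like this; just a sketch, assuming GNU du on the node (30 MB expressed as 30720 KB):
LOG=/srv/node/current/app/log/simplesamlphp.log; [ -f "$LOG" ] && [ "$(du -k "$LOG" | cut -f1)" -gt 30720 ] && rm -f "$LOG"
The -f test simply keeps the job quiet when the log file does not exist yet.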
To me, the most correct way to do this is still to use logrotate and the Chef cookbook for it.

Related

HTCondor - Partitionable slot not working

I am following the Center for High Throughput Computing and Introduction to Configuration tutorials on the HTCondor website to set up a partitionable slot. Before any configuration I run
condor_status
and get the following output.
I update the file 00-minicondor in /etc/condor/config.d by adding the following lines at the end of the file.
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=4
SLOT_TYPE_1_PARTITIONABLE = TRUE
and reconfigure
sudo condor_reconfig
Now with
condor_status
I get this output as expected. Now, I run the following command to check everything is fine
condor_status -af Name Slotype Cpus
and find slot1@ip-172-31-54-214.ec2.internal undefined 1 instead of slot1@ip-172-31-54-214.ec2.internal Partitionable 4 61295, which is what I would expect. Moreover, when I try to submit a job that asks for more than 1 CPU, it does not allocate space for it as it should (the job stays waiting forever).
I don't know if I made some mistake during the installation process or what could be happening. I would really appreciate any help!
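(For reference, a minimal submit file that requests more than one CPU looks like the sketch below; the executable and resource values are placeholders, not the exact job in question.)
# test.sub - hypothetical test job asking for 2 CPUs
executable      = /bin/sleep
arguments       = 300
request_cpus    = 2
request_memory  = 1024
log             = test.log
output          = test.out
error           = test.err
queue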
EXTRA INFO: In case it is of any help, I have installed HTCondor with the command
curl -fsSL https://get.htcondor.org | sudo /bin/bash -s -- --no-dry-run
on Ubuntu 18.04 running on an old p2.xlarge instance (it has 4 cores).
UPDATE: After rebooting the whole thing it seems to be working. I can now send jobs with different CPU requests and they start properly.
The only issue that I would say persists is that the memory allocation is not displayed properly in the status output, but in reality it is allocating enough memory for the job (in this case around 12 GB).
If I run again
condor_status -af Name Slotype Cpus
I still get output I am not supposed to get, but at least it now shows the correct number of CPUs (even if the slot type still says undefined).
What is the output of condor_q -better when the job is idle?

No space left on device in Sagemaker model training

I'm using a custom algorithm, shipped in a Docker image, running on a p2 instance with AWS SageMaker (a bit similar to https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb).
At the end of the training process, I try to write my model to the output directory that is mounted via SageMaker (as in the tutorial), like this:
model_path = "/opt/ml/model"
model.save(os.path.join(model_path, 'model.h5'))
Unfortunately, the model apparently grows too big over time and I get the following error:
RuntimeError: Problems closing file (file write failed: time = Thu Jul
26 00:24:48 2018
00:24:49 , filename = 'model.h5', file descriptor = 22, errno = 28,
error message = 'No space left on device', buf = 0x1a41d7d0, total
write[...]
So all my hours of GPU time are wasted. How can I prevent this from happening again? Does anyone know what the size limit is for models stored on SageMaker mounted directories?
When you train a model with Estimators, it defaults to 30 GB of storage, which may not be enough. You can use the train_volume_size param on the constructor to increase this value. Try with a large-ish number (like 100 GB) and see how big your model is. In subsequent jobs, you can tune the value down to something closer to what you actually need.
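For example, with the SageMaker Python SDK the parameter goes on the Estimator constructor. This is only a sketch using the v1 parameter name mentioned above (in newer SDK versions it was renamed to volume_size, alongside image_uri, instance_count and instance_type); the image, role and bucket values are placeholders:
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_name='123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest',  # placeholder image URI
    role='arn:aws:iam::123456789012:role/SageMakerRole',                       # placeholder execution role
    train_instance_count=1,
    train_instance_type='ml.p2.xlarge',
    train_volume_size=100,  # GB of EBS storage attached to the training instance (default is 30)
)
estimator.fit('s3://my-bucket/training-data/')  # placeholder input location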
Storage costs $0.14 per GB-month of provisioned storage. Partial usage is prorated, so giving yourself some extra room is a cheap insurance policy against running out of storage.
In the SageMaker Jupyter notebook, you can check free space on the filesystem(s) by running !df -h. For a specific path, try something like !df -h /opt.

Fluentd S3 output plugin not recognizing index

I am facing problems while using the S3 output plugin with Fluentd.
s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
Using %{index} at the end never resolves to _0, _1, and so on. I always end up with log file names such as
sflow_.log
while I need sflow_0.log.
Regards,
Can you paste your fluent.conf? It's hard to find the problem without the full configuration. File creation is mainly controlled by the time-slice settings and the buffer configuration. For example:
<buffer>
#type file or memory
path /fluentd/buffer/s3
timekey_wait 1m
timekey 1m
chunk_limit_size 64m
</buffer>
time_slice_format %Y%m%d%H%M
With the above, you create a file every minute; if within that minute the buffer limit is reached (or a flush is triggered for any other reason), another file is created with index 1 under the same minute.
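For context, a full match block for the S3 plugin could look roughly like this; it is only a sketch, the bucket, region and paths are placeholders, and it assumes the v1 buffer syntax shown above:
<match **>
  @type s3
  s3_bucket my-log-bucket
  s3_region us-east-1
  path sflow_
  s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
  time_slice_format %Y%m%d%H%M
  <buffer time>
    @type file
    path /fluentd/buffer/s3
    timekey 1m
    timekey_wait 1m
    chunk_limit_size 64m
  </buffer>
</match>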

Flume HDFS Sink generates lots of tiny files on HDFS

I have a toy setup sending log4j messages to HDFS using Flume. I'm not able to configure the HDFS sink to avoid many small files. I thought I could configure the HDFS sink to create a new file every time the file size reaches 10 MB, but it is still creating files of around 1.5 KB.
Here is my current flume config:
a1.sources=o1
a1.sinks=i1
a1.channels=c1
#source configuration
a1.sources.o1.type=avro
a1.sources.o1.bind=0.0.0.0
a1.sources.o1.port=41414
#sink config
a1.sinks.i1.type=hdfs
a1.sinks.i1.hdfs.path=hdfs://localhost:8020/user/myName/flume/events
#never roll-based on time
a1.sinks.i1.hdfs.rollInterval=0
#10MB=10485760
a1.sinks.il.hdfs.rollSize=10485760
#never roll base on number of events
a1.sinks.il.hdfs.rollCount=0
#channle config
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.sources.o1.channels=c1
a1.sinks.i1.channel=c1
It is a typo in your conf.
#sink config
a1.sinks.i1.type=hdfs
a1.sinks.i1.hdfs.path=hdfs://localhost:8020/user/myName/flume/events
#never roll-based on time
a1.sinks.i1.hdfs.rollInterval=0
#10MB=10485760
a1.sinks.il.hdfs.rollSize=10485760
#never roll base on number of events
a1.sinks.il.hdfs.rollCount=0
In the 'rollSize' and 'rollCount' lines you typed il (the letter l) instead of i1 (the digit 1).
Please try running with DEBUG logging; then you will find something like:
[SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.shouldRotate:465) - rolling: rollSize: 1024, bytes: 1024
Because of the il typo, the sink named i1 never receives those settings, so the default rollSize of 1024 bytes is being used.
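With the typo fixed, those two lines should read:
a1.sinks.i1.hdfs.rollSize=10485760
a1.sinks.i1.hdfs.rollCount=0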
HDFS Sink has a property hdfs.batchSize (default 100) which describes "number of events written to file before it is flushed to HDFS". I think that's your problem here.
Consider also checking all the other HDFS Sink properties in the Flume documentation.
This can possibly happen because of the memory channel and its capacity. I guess it's dumping data to HDFS as soon as its capacity becomes full. Did you try using a file channel instead of a memory channel?

RRDTool: RRD file not updating

My RRD file is not updating, what is the reason?
The graph shows the legend with: -nanv
I created the RRD file using this syntax:
rrdtool create ups.rrd --step 300 \
DS:input:GAUGE:600:0:360 \
DS:output:GAUGE:600:0:360 \
DS:temp:GAUGE:600:0:100 \
DS:load:GAUGE:600:0:100 \
DS:bcharge:GAUGE:600:0:100 \
DS:battv:GAUGE:600:0:100 \
RRA:AVERAGE:0.5:12:24 \
RRA:AVERAGE:0.5:288:31
Then I updated the file with this syntax:
rrdtool update ups.rrd N:$inputv:$outputv:$temp:$load:$bcharge:$battv
And graphed it with this:
rrdtool graph ups-day.png \
-t "ups " \
-s -1day \
-h 120 -w 616 \
-a PNG \
-cBACK#F9F9F9 \
-cSHADEA#DDDDDD \
-cSHADEB#DDDDDD \
-cGRID#D0D0D0 \
-cMGRID#D0D0D0 \
-cARROW#0033CC \
DEF:input=ups.rrd:input:AVERAGE \
DEF:output=ups.rrd:output:AVERAGE \
DEF:temp=ups.rrd:temp:AVERAGE \
DEF:load=ups.rrd:load:AVERAGE \
DEF:bcharge=ups.rrd:bcharge:AVERAGE \
DEF:battv=ups.rrd:battv:AVERAGE \
LINE:input#336600 \
AREA:input#32CD3260:"Input Voltage" \
GPRINT:input:MAX:" Max %lgv" \
GPRINT:input:AVERAGE:" Avg %lgv" \
GPRINT:input:LAST:"Current %lgv\n" \
LINE:output#4169E1:"Output Voltage" \
GPRINT:output:MAX:"Max %lgv" \
GPRINT:output:AVERAGE:" Avg %lgv" \
GPRINT:output:LAST:"Current %lgv\n" \
LINE:load#FD570E:"Load" \
GPRINT:load:MAX:" Max %lg%%" \
GPRINT:load:AVERAGE:" Avg %lg%%" \
GPRINT:load:LAST:" Current %lg%%\n" \
LINE:temp#000ACE:"Temperature" \
GPRINT:temp:MAX:" Max %lgc" \
GPRINT:temp:AVERAGE:" Avg %lgc" \
GPRINT:temp:LAST:" Current %lgc"
You will need at least 13 updates, each 5 minutes apart (i.e., 12 PDPs, primary data points), before you can get a single CDP (consolidated data point) written to your RRAs, enabling you to get a data point on the graph. This is because your smallest-resolution RRA has a count of 12, meaning you need 12 PDPs to make one CDP.
Until you have enough data to write a CDP, you have nothing to graph, and your graph will always have unknown data.
Alternatively, add a smaller resolution RRA (maybe Count 1) so that you do not need to collect data for so long before you have a full CDP.
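For example, the create command could gain an extra 1-step RRA like this; the row count of 288 (one day of 5-minute samples) is only a suggestion, and note that re-creating the file starts it empty:
rrdtool create ups.rrd --step 300 \
DS:input:GAUGE:600:0:360 \
DS:output:GAUGE:600:0:360 \
DS:temp:GAUGE:600:0:100 \
DS:load:GAUGE:600:0:100 \
DS:bcharge:GAUGE:600:0:100 \
DS:battv:GAUGE:600:0:100 \
RRA:AVERAGE:0.5:1:288 \
RRA:AVERAGE:0.5:12:24 \
RRA:AVERAGE:0.5:288:31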
The update script needs to be run at exactly the same interval as defined in your database.
I see it has a step value of 300, so the database should be updated every 5 minutes.
Just place your update script in a cron job (you can do the same for your graph script).
For example,
sudo crontab -e
If you are running it for the first time, choose your favorite editor (I usually go with Vim), then add the full path of your script and schedule it to run every 5 minutes. So add this (don't forget to adjust the paths):
*/5 * * * * /usr/local/update_script > /dev/null && /usr/local/graph_script > /dev/null
Save it, and wait a couple of minutes. I usually redirect the output to /dev/null because cron mails any output the scripts produce, which you normally don't want for a job that runs every 5 minutes.