Chef checking log file size

I want to write Chef code that checks the size of a log file and, if it exceeds 30 MB for example, deletes it via a cron job.
Can you help me with this, please?
This is my solution, but it's not what I want:
execute "echo '/srv/#{app['shortname']}/current/app/log/*.log {
rotate 1
missingok
copytruncate
sharedscripts
postrotate
/bin/rm -rf /srv/#{app['shortname']}/current/app/log/*.log*
endscript
}' >> /etc/logrotate.d/#{app['name']}-app-log-rotation" do
ignore_failure true
end

You can always create a file containing a shell script that deletes the file, and run that script via the Chef cron resource:
file '/tmp/foo.sh' do
  # delete the file once it grows past 30 MB (31457280 bytes)
  content 'if [ "$(wc -c < /tmp/y.txt)" -gt 31457280 ]; then rm -f /tmp/y.txt; fi'
  mode '0755'
end

cron 'name_of_cron_entry' do
  minute '3' # runs at minute 3 of every hour
  command 'sh -x /tmp/foo.sh > /tmp/backup 2>&1'
end
This will create a cron entry for you with that script.

You can try something like:
file '/tmp/y.txt' do
  # 1 MB = 1048576 bytes; delete once the file is larger than 30 MB.
  # The exist? guard is needed because File.size raises if the file is missing.
  only_if { ::File.exist?('/tmp/y.txt') && ::File.size('/tmp/y.txt').to_f / 1048576 > 30 }
  action :delete
end
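For reference, the same size check can also be done with find, which understands size thresholds natively; below is a minimal sketch you could run from a cron job (the /srv/myapp path is a hypothetical stand-in for the app's log directory, and GNU find is assumed for -delete):
# Delete any application log that has grown past 30 MB
find /srv/myapp/current/app/log -name '*.log' -size +30M -delete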

Related

Log Each Request And Output In EC2 Ubuntu or Linux

I have an AWS EC2 instance, and on this instance I have some cron jobs.
The cron jobs look like:
0 5 * * mon curl -Ssi -X POST http://example.com
And I have some manual outputs like:
echo "output: hello..."
I want to store these actions in a log file on the EC2 Ubuntu or Linux instance. Is that possible? Any suggestions?
Expected output:
[2021-10-10 ...] - POST http://example.com
[2021-10-11 ...] - output: hello...
Write the following script as ~/bin/site-detector:
#!/bin/bash
source ~/.bash_profile

log_file=/tmp/site-detector.log
curl -Ssi -X POST http://example.com >> "$log_file"
echo "Detected # $(date)" >> "$log_file"
echo " " >> "$log_file"
Make your script executable:
chmod a+x ~/bin/site-detector
Update your crontab:
0 5 * * mon ~/bin/site-detector
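If you also want each line to match the "[timestamp] - message" format from the expected output, here is a minimal sketch (the log helper function is my own hypothetical addition, not part of the original answer):
#!/bin/bash
# Hypothetical helper: prefix every message with "[YYYY-mm-dd HH:MM:SS] - "
log_file=/tmp/site-detector.log
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] - $*" >> "$log_file"; }

log "POST http://example.com"
curl -Ssi -X POST http://example.com >> "$log_file"
log "output: hello..."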

Is there a way to confirm user_data ran successfully with Terraform for EC2?

I'm wondering if it's possible to know when the script in user data finishes executing.
data "template_file" "script" {
template = file("${path.module}/installing.sh")
}
data "template_cloudinit_config" "config" {
gzip = false
base64_encode = false
# Main cloud-config configuration file.
part {
filename = "install.sh"
content = "${data.template_file.script.rendered}"
}
}
resource "aws_instance" "web" {
ami = "ami-04e7b4117bb0488e4"
instance_type = "t2.micro"
key_name = "KEY"
vpc_security_group_ids = [aws_default_security_group.default.id]
subnet_id = aws_default_subnet.default_az1.id
associate_public_ip_address = true
iam_instance_profile = "Role_S3"
user_data = data.template_cloudinit_config.config.rendered
tags = {
Name = "Terraform-Ansible"
}
}
And in the content of the script I have this.
Terraform tells me it successfully applied the changes, but the script is still running. Is there a way I can monitor that?
#!/usr/bin/env bash
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
echo BEGIN
sudo apt update
sudo apt upgrade -y
sudo apt install -y unzip
echo END
No, you cannot confirm the user data status from Terraform, because user data is just a launch script that runs once the EC2 instance has launched. But with some extra effort in the init script there is one way to check:
How to check User Data status while launching the instance in aws
If you do something like what is mentioned above to create a marker file once the user data completes, then you can try this to check for it:
resource "null_resource" "user_data_status_check" {
provisioner "local-exec" {
on_failure = "fail"
interpreter = ["/bin/bash", "-c"]
command = <<EOT
echo -e "\x1B[31m wait for few minute for instance warm up, adjust accordingly \x1B[0m"
# wait 30 sec
sleep 30
ssh -i yourkey.pem instance_ip ConnectTimeout=30 -o 'ConnectionAttempts 5' test -f "/home/user/markerfile.txt" && echo found || echo not found
if [ $? -eq 0 ]; then
echo "user data sucessfully executed"
else
echo "Failed to execute user data"
fi
EOT
}
triggers = {
#remove this once you test it out as it should run only once
always_run ="${timestamp()}"
}
depends_on = ["aws_instance.my_instance"]
}
So this script will check for the marker file on the newly launched server by ssh-ing in with a 30-second timeout and a maximum of 5 attempts.
Here are some pointers to remember:
User data shell scripts must start with the shebang (#!) characters and the path to the interpreter that should run the script (commonly /bin/bash).
Scripts entered as user data are run as the root user, so there is no need to use the sudo command in the init script.
When a user data script is processed, it is copied to and run from /var/lib/cloud/instances/instance-id/. The script is not deleted after it is run and can be found in this directory under the name user-data.txt. So to check whether your shell script made it to the server, look in this directory for that file.
The cloud-init output log file (/var/log/cloud-init-output.log) captures the console output of your user_data shell script. To see how your user_data shell script was executed and what it printed, check this file.
Source: https://www.middlewareinventory.com/blog/terraform-aws-ec2-user_data-example/
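As a complement to the marker-file approach, recent images ship a cloud-init CLI that reports this directly; a minimal sketch, assuming cloud-init 17.1 or later is present on the AMI:
# Blocks until all cloud-init stages (including user data) have finished;
# on recent cloud-init versions the exit code is non-zero if a stage failed.
cloud-init status --wait
echo "cloud-init finished: $(cloud-init status)"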
Well, I use these two ways to confirm.
At the end of the cloud-init config file, this line sends me a notification through WhatsApp (using callmebot). So no matter how long the setup takes, I always get notified when the instance is ready to use. I watch some series or read something in that time; no time wasted.
curl -X POST "https://api.callmebot.com/whatsapp.php?phone=12345678910&text=Ec2+transcoder+setup+complete&apikey=12345"
At the end of the cloud-init config, these lines run:
echo "for faster/visual confirmation of above execution.."
wget https://www.sample-videos.com/video123/mp4/720/big_buck_bunny_720p_1mb.mp4 -O /home/ubuntu/dpnd_comp.mp4
When I sign in to the instance, I can see the file directly.
And I'm loving it. Hope this helps someone. Also, don't forget to tell me your method too.

GCP Dataproc - configure YARN fair scheduler

I was trying to set up a Dataproc cluster that would compute only one job (or a specified maximum number of jobs) at a time, with the rest waiting in a queue.
I found this solution, How to configure monopolistic FIFO application queue in YARN?, but as I'm always creating a new cluster, I needed to automate this. I added this to the cluster creation:
"softwareConfig": {
"properties": {
"yarn:yarn.resourcemanager.scheduler.class":"org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
"yarn:yarn.scheduler.fair.user-as-default-queue":"false",
"yarn:yarn.scheduler.fair.allocation.file":"$HADOOP_CONF_DIR/fair-scheduler.xml",
}
}
with another line in the init action script:
sudo echo "<allocations><queueMaxAppsDefault>1</queueMaxAppsDefault></allocations>" > /etc/hadoop/conf/fair-scheduler.xml
and the cluster tells me this when I fetch its config:
'softwareConfig': {
  'imageVersion': '1.2.27',
  'properties': {
    'capacity-scheduler:yarn.scheduler.capacity.root.default.ordering-policy': 'fair',
    'core:fs.gs.block.size': '134217728',
    'core:fs.gs.metadata.cache.enable': 'false',
    'distcp:mapreduce.map.java.opts': '-Xmx4096m',
    'distcp:mapreduce.map.memory.mb': '5120',
    'distcp:mapreduce.reduce.java.opts': '-Xmx4096m',
    'distcp:mapreduce.reduce.memory.mb': '5120',
    'hdfs:dfs.datanode.address': '0.0.0.0:9866',
    'hdfs:dfs.datanode.http.address': '0.0.0.0:9864',
    'hdfs:dfs.datanode.https.address': '0.0.0.0:9865',
    'hdfs:dfs.datanode.ipc.address': '0.0.0.0:9867',
    'hdfs:dfs.namenode.http-address': '0.0.0.0:9870',
    'hdfs:dfs.namenode.https-address': '0.0.0.0:9871',
    'hdfs:dfs.namenode.secondary.http-address': '0.0.0.0:9868',
    'hdfs:dfs.namenode.secondary.https-address': '0.0.0.0:9869',
    'mapred-env:HADOOP_JOB_HISTORYSERVER_HEAPSIZE': '3840',
    'mapred:mapreduce.job.maps': '189',
    'mapred:mapreduce.job.reduce.slowstart.completedmaps': '0.95',
    'mapred:mapreduce.job.reduces': '63',
    'mapred:mapreduce.map.cpu.vcores': '1',
    'mapred:mapreduce.map.java.opts': '-Xmx4096m',
    'mapred:mapreduce.map.memory.mb': '5120',
    'mapred:mapreduce.reduce.cpu.vcores': '1',
    'mapred:mapreduce.reduce.java.opts': '-Xmx4096m',
    'mapred:mapreduce.reduce.memory.mb': '5120',
    'mapred:mapreduce.task.io.sort.mb': '256',
    'mapred:yarn.app.mapreduce.am.command-opts': '-Xmx4096m',
    'mapred:yarn.app.mapreduce.am.resource.cpu-vcores': '1',
    'mapred:yarn.app.mapreduce.am.resource.mb': '5120',
    'spark-env:SPARK_DAEMON_MEMORY': '3840m',
    'spark:spark.driver.maxResultSize': '1920m',
    'spark:spark.driver.memory': '3840m',
    'spark:spark.executor.cores': '8',
    'spark:spark.executor.memory': '37237m',
    'spark:spark.yarn.am.memory': '640m',
    'yarn:yarn.nodemanager.resource.memory-mb': '81920',
    'yarn:yarn.resourcemanager.scheduler.class': 'org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler',
    'yarn:yarn.scheduler.fair.allocation.file': '$HADOOP_CONF_DIR/fair-scheduler.xml',
    'yarn:yarn.scheduler.fair.user-as-default-queue': 'false',
    'yarn:yarn.scheduler.maximum-allocation-mb': '81920',
    'yarn:yarn.scheduler.minimum-allocation-mb': '1024'
  }
},
The file fair-scheduler.xml also contains the specified code (everything is on one line, but I don't think that could be the problem).
After all this, the cluster still acts as if the capacity scheduler were in charge. No idea why. Any recommendation would help.
Thanks.
Since the init actions script runs after the cluster is created, the YARN service is already running by the time the script modifies yarn-site.xml.
So after modifying the XML config file and creating the other XML file, the YARN service needs to be restarted.
It can be done using this command:
sudo systemctl restart hadoop-yarn-resourcemanager.service
Also, since $HADOOP_CONF_DIR was not set (I thought it would be), the full path to the file needs to be given instead. But then the initial YARN service won't start, because it can't find the file, which is only created later by the init actions script. So what I did was to append the last few lines to yarn-site.xml in the init actions script as well.
The code for the init actions script is the following:
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == 'Master' ]]; then
  # create the fair-scheduler allocation file
  echo "<allocations>" > /etc/hadoop/conf/fair-scheduler.xml
  echo "  <queueMaxAppsDefault>1</queueMaxAppsDefault>" >> /etc/hadoop/conf/fair-scheduler.xml
  echo "</allocations>" >> /etc/hadoop/conf/fair-scheduler.xml

  # drop the closing </configuration> tag from yarn-site.xml,
  # append the allocation-file property, then close the tag again
  sed -i '$ d' /etc/hadoop/conf/yarn-site.xml
  echo "  <property>" >> /etc/hadoop/conf/yarn-site.xml
  echo "    <name>yarn.scheduler.fair.allocation.file</name>" >> /etc/hadoop/conf/yarn-site.xml
  echo "    <value>/etc/hadoop/conf/fair-scheduler.xml</value>" >> /etc/hadoop/conf/yarn-site.xml
  echo "  </property>" >> /etc/hadoop/conf/yarn-site.xml
  echo "</configuration>" >> /etc/hadoop/conf/yarn-site.xml

  # restart the ResourceManager so the FairScheduler picks up the new config
  systemctl restart hadoop-yarn-resourcemanager.service
fi
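To confirm the switch actually took effect, one option (my assumption, not part of the original answer) is to query the ResourceManager REST API on the master node, where 8088 is the default RM web port:
# Should report fairScheduler rather than capacityScheduler after the restart
curl -s http://localhost:8088/ws/v1/cluster/scheduler | grep -o '"type":"[^"]*"'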

Remove old crons when redeploying to AWS Beanstalk

Not an expert on AWS and trying to fool around with cron jobs. For testing, I had a sample script send me emails every minute. Now I want to change it to once every 10 minutes (*/10 * * * *). These are the container commands I tried, and none of them seems to work.
I am using a config file and a txt file to define the crons.
Config file contents (with various ideas I read from online sources)
container_commands:
  00_remove_old_cron_jobs0:
    command: "rm -fr /etc/cron.d/cron_job"
  01_remove_old_cron_jobs1:
    command: "sudo sed -i 's/empty stuff//g' /etc/cron.d/cron_job"
  02_remove_old_cron_jobs2:
    command: "crontab -r || exit 0"
  03_cron_job:
    command: "cat .ebextensions/cron_job.txt > /etc/cron.d/cron_job && chmod 644 /etc/cron.d/cron_job"
    leader_only: true
cron_job.txt file contents:
# The newline at the end of this file is extremely important. Cron won't run without it.
0 * * * * ec2-user /usr/bin/php -q /var/www/html/cron1.php > /dev/null
0 * * * * ec2-user /usr/bin/php -q /var/www/html/html/cron2.php > /dev/null
*/10 * * * * ec2-user /usr/bin/php -q /var/www/html/cronTestEmailer.php > /dev/null
The test emailer script keeps firing every minute instead of every 10 minutes, and I don't know how to make sure the cron updates are reflected correctly.
You can achieve the same with the following .ebextensions config file.
files:
  "/etc/cron.d/mycron":
    mode: "000644"
    owner: root
    group: root
    content: |
      * * * * * root /usr/local/bin/myscript.sh

  "/usr/local/bin/myscript.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/bash
      date > /tmp/date
      # Your actual script content
      exit 0

commands:
  remove_old_cron:
    command: "rm -f /etc/cron.d/*.bak"
More details about the config file below:
files: creates a cron entry and a script file named myscript.sh. If a file with the same name already exists, it first moves the old file to .bak and then creates the file with the new contents.
commands: deletes all the .bak files.
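To double-check the result on the instance itself (hypothetical commands, not part of the original answer), you can eb ssh in after a deploy and inspect the cron drop-in directory:
# Confirm the new schedule landed and no stale cron_job file remains
sudo cat /etc/cron.d/mycron
sudo ls /etc/cron.d/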

No Output for Python Script Executed via Cron Job

Relatively new to running cron jobs in CentOS 6, I can't seem to get this Python script to execute properly. I would like this script to execute and then email me the output. I have been receiving emails, but they're empty.
So far, in Crontab I've tried entering:
*/10 * * * * cd /home/local/MYCOMPANY/purrone/MyPythonScripts_Dev1 && /usr/bin/python ParserScript_CampusInsiders.py > /var/log/cron`date +\%Y-\%m-\%d-\%H:\%M:\%S`-cron.log 2>&1 ; mailx -s "Feedparser Output" my#email.com
and
*/10 * * * * /home/local/MYCOMPANY/purrone/MyPythonScripts_Dev1/ParserScript_CampusInsiders.py > /var/log/cron`date +\%Y-\%m-\%d-\%H:\%M:\%S`-cron.log 2>&1 ; mailx -s "Feedparser Output" my#email.com
I have run chmod +x on the Python script to make it executable, and the script has #!/usr/bin/env python as its header. What am I doing wrong here?
The other problem might be that I shouldn't be using the log file? All I see in /var/log/cron when I open it with cat are entries like this, for example (no actual output from the script):
Jul 23 13:20:01 ent-mocdvsmg01 CROND[24681]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Jul 23 13:20:01 ent-mocdvsmg01 CROND[24684]: (MYJOB\purrone) CMD (/home/local/MYCOMPANY/purrone/MyPythonScripts_Dev1/ParserScript_CampusInsiders.py > /var/log/cron`date +\%Y-\%m-\%d-\%H:\%M:\%S`-cron.log 2>&1 ; mailx -s "Feedparser Output" my#email.com)
There is nothing going into your mailx input; it expects the message on stdin. Try running the command outside of crontab as a test until it sends a valid email. You could test with:
% echo hello |mailx -s test my#email.com
Note that cron can email you the output of its run. You just need to add a line to the top of crontab like:
MAILTO = you#email.com
The solution was to omit the redirect (>) and instead edit the crontab thusly:
*/15 * * * * /home/local/COMPANY/malvin/SilverChalice_CampusInsiders/SilverChalice_Parser.py | tee /home/local/COMPANY/malvin/SilverChalice_CampusInsiders`date +\%Y-\%m-\%d-\%H:\%M:\%S`-cron.log | mailx -s "SilverChalice CampusInsiders" my#email.com
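One caveat worth adding (my note, not from the original answer): a plain pipe forwards only stdout, so Python tracebacks printed to stderr would still be missing from the email. A sketch of the same pattern with stderr folded in (path and address hypothetical):
*/15 * * * * /path/to/script.py 2>&1 | tee /tmp/script-`date +\%Y-\%m-\%d`.log | mailx -s "Script output" you@example.com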