Google Cloud Ops Agent - google-cloud-platform

I am having an issue where the Google Cloud Ops Agent's logging subagent gathers a lot of data and fills up my entire Debian server's hard drive in about 3 weeks, due to the ever-increasing size of its log file.
I do not want to increase the size of my server's hard drive.
Does anyone know how to configure Google Cloud Ops Agent so that it only retains log data for the previous 7 days?
EDIT: the Google Cloud Ops Agent log file is stored at the path below:
/var/log/google-cloud-ops-agent/subagents/logging-module.log

I faced the same issue recently while using agent 2.11.0. And it's not just an enormous log file, it's also ridiculous CPU usage! Check it out in htop.
If you open the log file you'll see it spamming errors about buffer chunks. Apparently they got corrupted somehow, so the agent can't read them and ship them off, hence the high IO and CPU usage.
The solution is to stop the service:
sudo service google-cloud-ops-agent stop
Then clear all buffer chunks:
sudo rm -rf /var/lib/google-cloud-ops-agent/fluent-bit/buffers/
And delete log file if you want:
sudo rm -f /var/log/google-cloud-ops-agent/subagents/logging-module.log
Then start the agent:
sudo service google-cloud-ops-agent start
This helped me out.
Btw, this issue is described here, and it seems that Google "fixed" it as of 2.7.0-1. Whatever they mean by that, since we still hit it on 2.11.0...
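As for the original question about keeping only ~7 days of that subagent log: I'm not aware of a retention setting in the agent itself, but a logrotate rule over the file is one workaround (a sketch; on Debian it would go in /etc/logrotate.d/):
/var/log/google-cloud-ops-agent/subagents/logging-module.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    # copytruncate lets the agent keep writing to the same file handle
    copytruncate
}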

Copy files to Container-Optimised OS from a GCP Storage bucket

How can one download files from a GCP Storage bucket to a Container-Optimised OS (COS) on instance startup?
I know of the following solutions:
gcloud compute copy-files
SSH through console
SCP
Yet all of these have to be done manually and externally after an instance is started.
There is also cloud-init, yet I can't find any info on how to copy files from a Storage bucket with it. Examples seem to suggest that it's better to include the content of the files in the cloud-init file directly, which is not something I want to do for security reasons. Is it possible to download files from a Storage bucket using cloud-init?
I considered using a startup script, yet COS lacks CLI tools such as gcloud or gsutil, so I can't run any such commands in a startup script.
I know I could copy the files manually and then save the image as a boot disk, but I'm hoping there are solutions that avoid having to do so.
Most of all, I'm assuming I'm not asking for something impossible, given that the COS instance setup allows me to specify Docker volumes that I could mount onto the starting container. This seems to suggest I should be able to have some private files on the instance the moment COS attempts to run my image on startup. But how?
Trying to execute a startup-script with a cloud-sdk image and copying files there as suggested by Guillaume didn't work for me for a while, showing this log. Eventually I realised that the cloud-sdk image is 2.41GB when uncompressed and takes over 2 minutes to complete pulling. I tried again with an empty COS instance and the startup script completed successfully, downloading the data from a Storage bucket.
However, a 2.41GB image and over 2 minutes of boot time sound like a bit of overkill for downloading a 2KB file. Don't they?
I'm glad to see a working solution to my question (thanks Guillaume!) although I'm still wondering: isn't there a nicer way to do this? I feel that this method is even less tidy than manually putting the files on the COS instance and then creating a machine image to use in the future.
Based on Guillaume's answer I created and published a gsutil wrapper image, available as voyz/gsutil_wrap. This way I am able to run a startup-script with the following command:
docker run -v /host/path:/container/path \
--entrypoint gsutil voyz/gsutil_wrap \
cp gs://bucket/path /container/path
It's essentially a copy of what Guillaume suggested, except it is using an image containing only a minimum setup required to run gsutil. As a result it weighs 0.22GB and pulls within 10-20 seconds on average - as opposed to 2.41GB and over 2 minutes respectively for the google/cloud-sdk image suggested by Guillaume.
Also, credit to this incredibly useful StackOverflow answer that allows gsutil to use the default service account for authentication.
The startup-script is the correct place to do this. And YES, COS lacks some useful libraries.
BUT you can run containers! And, for example, the Google Cloud SDK container!
So, add this startup-script in the VM metadata:
key -> startup-script
value ->
docker run -v /local/path/to/copy/files:/dummy/container/path \
--entrypoint gsutil google/cloud-sdk \
cp gs://your_bucket/path/to/file /dummy/container/path
Note: the startup script is run as root. Perform a chmod/chown in your startup script if you need to change the file access mode or ownership.
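For example, appended to the startup-script above (the user name and downloaded file name are placeholders):
chown your_user:your_user /local/path/to/copy/files/file   # placeholder user and file name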
Let me know if you need more explanation of this command line.
Of course, with a fresh COS image, the startup time is quite long (the container image has to be pulled and extracted).
To reduce the startup time, you can "bake" your image. I mean: start from a COS image, download/install what you want on it (or simply perform a docker pull of the google/cloud-sdk container), and create a custom image from it.
This way, all the required dependencies will already be present on the image and boot will be quicker.
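A sketch of that baking step with gcloud, once the template VM has been prepared (the instance, zone, and image names below are hypothetical):
# stop the prepared instance, then create an image from its boot disk
gcloud compute instances stop cos-template --zone us-central1-a
gcloud compute images create my-baked-cos \
    --source-disk cos-template \
    --source-disk-zone us-central1-a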

Limit a Docker container's disk IO - AWS EBS/EC2 Instance

When I run a new install of WordPress or a simple build command for some of my web apps in Jenkins, the server grinds to a halt. In Netdata, it appears the culprit is high "iowait".
I know that I can increase the IOPS on the EBS volume but I'd rather just wait a longer time for the process to finish. Is there a way to limit IOPS on a docker container (in this case; my Jenkins container)?
Try the --device-read-iops and --device-write-iops options of the docker run command.
The command should be something like this:
docker run -itd --device-read-iops /dev/sda:100 --device-write-iops /dev/sda:100 image-name
NOTE: /dev/sda is the device name and 100 is the number of IOPS allowed per second.
You can also limit IO in terms of bytes per second using the
--device-read-bps and --device-write-bps options.
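For example, to cap throughput at roughly 10 MB per second in each direction (the device name and limits here are just illustrative):
docker run -itd --device-read-bps /dev/sda:10mb --device-write-bps /dev/sda:10mb image-name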
Check this documentation for more info.
https://docs.docker.com/engine/reference/run/

disk full issue in Hadoop yarn

I am using EMR and running a Spark Streaming job with YARN as the resource manager on Hadoop 2.7.3-amzn-0. I am facing a disk full issue, so I set two properties in yarn-site.xml: yarn.nodemanager.localizer.cache.cleanup.interval-ms to 600000 and yarn.nodemanager.localizer.cache.target-size-mb to 1024 (see the snippet below). Even so, my disk still fills up after some time and crosses the limit I configured. It seems my configured properties are not working, and I have to clean the disk manually with this command: rm -rf filecache/ usercache/
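For reference, this is roughly how the two properties are set in my yarn-site.xml:
<property>
  <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
  <value>600000</value>
</property>
<property>
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>1024</value>
</property>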
Every Spark job creates a directory in the filecache, like /mnt/yarn/usercache/hadoop/filecache/5631/__spark_libs__8149715201399895593.zip, containing all the .jar files.
What should I do to clean the filecache and usercache automatically? Which file and location do I have to change?
Can anyone please help?

Google cloud compute startup script ignored with no logging

I have a standard Debian 8.9 instance on Google Compute Engine (GCE) where my startup script is ignored.
In the custom metadata field, for startup-script, I am trying to run an Rscript (which is used for batch execution of R files), followed by a system shutdown, with the following:
#! /bin/bash
sudo /usr/bin/Rscript /home/myuser/launch_script.R
sudo shutdown -h now
Starting the instance is immediately followed by a shutdown and the Rscript is ignored. Removing the last shutdown line lets the GCE instance start, but the Rscript is still ignored. Running just "sudo /usr/bin/Rscript /home/myuser/launch_script.R" from the terminal results in the script being run. It has a chmod of 755, so I don't think this is a permissions issue.
In addition to this problem, I have read elsewhere that logging should happen in /var/log/, but there is nothing there. Instead, I have a bunch of log files (that only contain the start-up script and nothing else) in the root of my instance.
I got in touch with Google cloud support, who gave the following response:
script definition is kept under /var/run/google.startup.script
If the script does not run initially, you can force it manually with:
sudo google_metadata_script_runner --script-type startup   # for Debian
sudo /usr/share/google/run-startup-scripts                 # on Ubuntu and older images
I'm posting this information here, because it is not in their documentation (as of August 2017). I'm not sure how helpful it is, since the google.startup.script didn't exist in my case (using the latest Debian image on GCE), but I did run the other commands.
However, I think my main issues were:
I was using autossh to connect to a remote database, and the startup-script was running before autossh had established its tunnel. Building a 40-second delay into the script and running the R job as a regular user (not as root) seems to have solved this problem for now; a sketch of the adjusted script is at the end of this answer. Autossh was being run as the main user, which I think gets loaded before lower-privilege user-defined scripts get loaded.
I was using some gcloud commands from the user account which had its own authentication issues. Running gcloud auth login as the user and ensuring correct permissions on my private key solved this.
Always remember to check the messages and syslog files in /var/log for troubleshooting. This allowed me to see the order of things being loaded at system-boot.
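The adjusted startup-script described in the first point looked roughly like this (the 40-second delay, user name, and paths are specific to my setup; treat them as placeholders):
#! /bin/bash
# give autossh time to establish its tunnel before the R job starts (40s is an assumption)
sleep 40
# run the job as the regular user instead of root
sudo -u myuser /usr/bin/Rscript /home/myuser/launch_script.R
sudo shutdown -h now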

AWS S3 upload fails: RequestTimeTooSkewed

I'm using
aws s3 sync ~/folder/ s3:// --delete
to upload (and sync) a large number of files to an S3 bucket. Some - but not all - of the files fail, throwing this error message:
upload failed: to s3://bucketname/folder/
A client error (RequestTimeTooSkewed) occurred when calling the UploadPart operation: The difference between the request time and the current time is too large
I know that the cause of this error is usually a local time that's out of sync with Internet time, but I'm running NTP (on my Ubuntu PC) and the date/time seem absolutely accurate - and this error has only been reported for about 15 out of the forty or so files I've uploaded so far.
Some of the files are relatively large - up to about 70MB each - and my upload speeds aren't fantastic: could S3 possibly be comparing the initial and completion times and reporting their difference as an error?
Thanks,
The time verification happens at the start of your upload to S3, so it won't have to do with files taking too long to upload.
Try comparing your system time with what S3 is reporting and see if there is any significant time drift, just to make sure:
# Time from Amazon
$ curl http://s3.amazonaws.com -v
# Time on your local machine
$ date -u
(Time is returned in UTC)
I was running aws s3 cp inside a Docker container on a MacBook Pro and got this error. Restarting Docker for Mac fixed the issue.
Amazon keeps the S3 system clocks in sync via NTP; your machine's clock needs to stay in sync with them too.
Run
sudo apt-get install ntp
then open /etc/ntp.conf and add the following at the bottom:
server 0.amazon.pool.ntp.org iburst
server 1.amazon.pool.ntp.org iburst
server 2.amazon.pool.ntp.org iburst
server 3.amazon.pool.ntp.org iburst
Then run
sudo service ntp restart
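On newer systemd-based systems, timedatectl is an alternative to installing the ntp package (these commands are generic, not tied to Amazon's NTP pool):
# check whether the clock is currently synchronized
timedatectl status
# enable the built-in NTP client (systemd-timesyncd)
sudo timedatectl set-ntp true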
It now seems that it was the multipart uploads that were failing with aws s3. Using s3cmd instead works perfectly.
You have to sync the local time on your machine; it has drifted away from the actual (world) time.
I was having the issue on macOS.
I fixed it by going to
System Preferences -> Date & Time -> check the box "Set date and time automatically"
Restarting the machine fixed this issue for me.
In cmd, run aws configure.
Make sure a default region name is set (us-east-1, or whatever yours is); it must not be left empty:
Default region name [None] -->> × wrong
Default region name [us-east-1] -->> √ correct
Then create a bucket via the GUI on the AWS website and check its creation date and time.
Note down that date and time from AWS, and set the date and time of your PC to match it in your PC's settings.
Now try the command again in cmd:
aws s3 ls