Stackdriver Monitoring floods collectd uc_update: Value too old in syslog - google-cloud-platform

Let me preface this by stating that I am not a DevOps engineer, so my experience with Linux administration is limited.
I basically followed this How-To (https://cloud.google.com/monitoring/agent/install-agent) and installed the agent on my Google Compute Engine instance.
Everything works and I get the new metrics in my Stackdriver account; however, my syslog is flooded with messages like this:
instance-name collectd[26092]: uc_update: Value too old: name = <RandomNumber>/processes-all/ps_vm; value time = 1517218302.393; last cache update = 1517218302.393;
So I found this in my /opt/stackdriver/collectd/etc/collectd.conf file:
Hostname "RandomNumber"
Interval 60
This makes sense; we don't use collectd for anything else besides Stackdriver, so it fits that the <RandomNumber> that shows up in the problem message is the same as the Hostname in the Stackdriver config.
Next I checked https://collectd.org/faq.shtml
I ran this command for both /etc/collectd.conf and /opt/stackdriver/collectd/etc/collectd.conf:
grep -i LoadPlugin /etc/collectd.conf | egrep -v '^[[:space:]]*#' | sort | uniq -c
1 LoadPlugin cpu
1 LoadPlugin interface
1 LoadPlugin load
1 LoadPlugin memory
1 LoadPlugin network
1 LoadPlugin syslog
grep -i LoadPlugin /opt/stackdriver/collectd/etc/collectd.conf | egrep -v '^[[:space:]]*#' | sort | uniq -c
1 LoadPlugin "match_regex"
1 LoadPlugin aggregation
1 LoadPlugin cpu
1 LoadPlugin df
1 LoadPlugin disk
1 LoadPlugin exec
1 LoadPlugin interface
1 LoadPlugin load
1 LoadPlugin match_regex
1 LoadPlugin match_throttle_metadata_keys
1 LoadPlugin memory
1 LoadPlugin processes
1 LoadPlugin stackdriver_agent
1 LoadPlugin swap
1 LoadPlugin syslog
1 LoadPlugin tcpconns
1 LoadPlugin write_gcm
As you can see, there are no duplicated LoadPlugin entries.
I have run out of ideas; can someone help?
Thank you.
P.S.
We are using Debian Stretch and running lighttpd with PHP.
P.S. More information
Here is a more detailed log excerpt with the error so you can see the timestamps:
Jan 30 10:47:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/ps_cputime; value time = 1517309269.877; last cache update = 1517309269.877;
Jan 30 10:48:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/ps_cputime; value time = 1517309329.884; last cache update = 1517309329.884;
Jan 30 10:50:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/ps_rss; value time = 1517309449.881; last cache update = 1517309449.881;
Jan 30 10:50:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/io_octets; value time = 1517309449.881; last cache update = 1517309449.884;
Jan 30 10:52:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/ps_vm; value time = 1517309569.889; last cache update = 1517309569.889;
Jan 30 10:52:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/disk_octets; value time = 1517309569.890; last cache update = 1517309569.890;
Jan 30 10:52:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/disk_octets; value time = 1517309569.890; last cache update = 1517309569.894;
This is the output of the ps command:
ps -e
PID TTY TIME CMD
1 ? 00:01:28 systemd
2 ? 00:00:00 kthreadd
3 ? 00:00:24 ksoftirqd/0
5 ? 00:00:00 kworker/0:0H
7 ? 00:41:17 rcu_sched
8 ? 00:00:00 rcu_bh
9 ? 00:00:02 migration/0
10 ? 00:00:00 lru-add-drain
11 ? 00:00:03 watchdog/0
12 ? 00:00:00 cpuhp/0
13 ? 00:00:00 cpuhp/1
14 ? 00:00:03 watchdog/1
15 ? 00:00:01 migration/1
16 ? 00:11:58 ksoftirqd/1
18 ? 00:00:00 kworker/1:0H
19 ? 00:00:00 cpuhp/2
20 ? 00:00:03 watchdog/2
21 ? 00:00:01 migration/2
22 ? 00:03:16 ksoftirqd/2
24 ? 00:00:00 kworker/2:0H
25 ? 00:00:00 cpuhp/3
26 ? 00:00:03 watchdog/3
27 ? 00:00:02 migration/3
28 ? 00:03:11 ksoftirqd/3
30 ? 00:00:00 kworker/3:0H
31 ? 00:00:00 kdevtmpfs
32 ? 00:00:00 netns
33 ? 00:00:00 khungtaskd
34 ? 00:00:00 oom_reaper
35 ? 00:00:00 writeback
36 ? 00:00:00 kcompactd0
38 ? 00:00:00 ksmd
39 ? 00:01:02 khugepaged
40 ? 00:00:00 crypto
41 ? 00:00:00 kintegrityd
42 ? 00:00:00 bioset
43 ? 00:00:00 kblockd
44 ? 00:00:00 devfreq_wq
45 ? 00:00:00 watchdogd
49 ? 00:01:16 kswapd0
50 ? 00:00:00 vmstat
62 ? 00:00:00 kthrotld
63 ? 00:00:00 ipv6_addrconf
130 ? 00:00:00 scsi_eh_0
131 ? 00:00:00 scsi_tmf_0
133 ? 00:00:00 bioset
416 ? 07:01:34 jbd2/sda1-8
417 ? 00:00:00 ext4-rsv-conver
443 ? 00:02:37 systemd-journal
447 ? 00:00:00 kauditd
452 ? 00:00:01 kworker/0:1H
470 ? 00:00:01 systemd-udevd
483 ? 00:00:26 cron
485 ? 00:00:37 rsyslogd
491 ? 00:00:00 acpid
496 ? 00:00:49 irqbalance
497 ? 00:00:21 systemd-logind
498 ? 00:00:36 dbus-daemon
524 ? 00:00:00 edac-poller
612 ? 00:00:02 kworker/2:1H
613 ? 00:00:00 dhclient
674 ? 00:00:00 vsftpd
676 ttyS0 00:00:00 agetty
678 tty1 00:00:00 agetty
687 ? 00:01:18 ntpd
795 ? 4-19:58:17 mysqld
850 ? 00:00:15 sshd
858 ? 00:04:06 google_accounts
859 ? 00:00:33 google_clock_sk
861 ? 00:01:05 google_ip_forwa
892 ? 01:31:57 kworker/1:1H
1154 ? 00:00:00 exim4
1160 ? 00:00:01 kworker/3:1H
4259 ? 00:00:00 kworker/2:1
6090 ? 00:00:00 kworker/0:1
6956 ? 00:00:00 sshd
6962 ? 00:00:00 sshd
6963 pts/0 00:00:00 bash
6968 pts/0 00:00:00 su
6969 pts/0 00:00:00 bash
6972 ? 00:00:00 kworker/u8:2
7127 ? 00:00:00 kworker/3:2
7208 ? 00:00:00 php-fpm7.0
7212 ? 00:00:00 kworker/0:0
10516 ? 00:00:00 systemd
10517 ? 00:00:00 (sd-pam)
10633 ? 00:00:00 kworker/2:2
11569 ? 00:00:00 kworker/3:1
12539 ? 00:00:00 kworker/1:2
13625 ? 00:00:00 kworker/1:0
13910 ? 00:00:00 sshd
13912 ? 00:00:00 systemd
13913 ? 00:00:00 (sd-pam)
13920 ? 00:00:00 sshd
13921 ? 00:00:00 sftp-server
13924 ? 00:00:00 sftp-server
14016 pts/0 00:00:00 tail
14053 ? 00:00:03 php-fpm7.0
14084 ? 00:00:00 sshd
14090 ? 00:00:00 sshd
14091 pts/1 00:00:00 bash
14098 ? 00:00:01 php-fpm7.0
14099 pts/1 00:00:00 su
14100 pts/1 00:00:00 bash
14105 ? 00:00:00 sshd
14106 ? 00:00:00 sshd
14107 ? 00:00:00 php-fpm7.0
14108 pts/1 00:00:00 ps
17456 ? 00:00:03 kworker/u8:1
17704 ? 01:38:36 lighttpd
21624 ? 00:00:30 perl
25593 ? 00:00:00 sshd
25595 ? 00:00:00 systemd
25596 ? 00:00:00 (sd-pam)
25602 ? 00:00:00 sshd
25603 ? 00:00:00 sftp-server
25641 ? 00:00:00 sftp-server
27001 ? 00:00:00 gpg-agent
28953 ? 00:01:20 stackdriver-col
Output of the ps command filtered with grep:
root@instance-7:/home/# ps aux | grep collectd
root 6981 0.0 0.0 12756 976 pts/0 S+ 13:40 0:00 grep collectd
root 28953 0.1 1.1 1105712 41960 ? Ssl Jan29 3:16 /opt/stackdriver/collectd/sbin/stackdriver-collectd -C /opt/stackdriver/collectd/etc/collectd.conf -P /var/run/stackdriver-agent.pid

These should be normal messages from the Stackdriver agent (if the rate is, as you said, 2-3 messages per minute).
I suggest you install the ntp/ntpd service and sync it with a time server so that your system has the right time.
An example NTP server: pool.ntp.org
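As a rough sketch on Debian Stretch (the package and service names below are assumptions about a stock Debian setup; adjust them to your environment):
# install and enable the NTP daemon
sudo apt-get update && sudo apt-get install -y ntp
sudo systemctl enable --now ntp
# check that the daemon is syncing against a pool server
ntpq -p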

You are just getting a duplicate: the message shows identical timestamps for the new value that is about to be added to the internal cache and for the last value with the same name that was already added to the cache:
value time = 1517218302.393
last cache update = 1517218302.393
You can refer to the collectd FAQ page (https://collectd.org/faq.shtml). It explains this kind of message, including an example that matches the one you got.
You should check:
- Whether more than one collectd daemon is running on your instance. To see the collectd processes you can run:
ps aux | grep collectd
- Whether the timestamps increase with each message. If they do, another host may be reporting data using the same host name (see the snippet after this list).
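A quick way to compare the timestamps, assuming the agent logs to /var/log/syslog as on a default Debian system:
# show the 20 most recent occurrences so the "value time" fields can be compared
grep 'uc_update: Value too old' /var/log/syslog | tail -n 20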

Since those log entries do not seem to be affecting the instance, and if they are flooding your Stackdriver logging, you can exclude them from the default sink.
Using gcloud, this can be accomplished with the following command:
gcloud logging sinks update _Default --log-filter "$(echo $(gcloud logging sinks describe _Default --format "value(filter)") "AND NOT textPayload:\"uc_update: Value too old:\"")"
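To confirm the exclusion took effect you can print the sink's filter again; this simply re-runs the describe command that is already embedded in the update above:
gcloud logging sinks describe _Default --format "value(filter)"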

Related

Dataproc cluster fails to initialize

With the standard Dataproc image 1.5 (Debian 10, Hadoop 2.10, Spark 2.4), a Dataproc cluster cannot be created. The region is set to europe-west-2.
The Stackdriver log says:
"Failed to initialize node <name of cluster>-m: Component hdfs failed to activate See output in: gs://.../dataproc-startup-script_output"
Scanning through the output (gs://.../dataproc-startup-script_output), I can see the hdfs activation has failed:
Aug 18 13:21:59 activate-component-hdfs[2799]: + exit_code=1
Aug 18 13:21:59 activate-component-hdfs[2799]: + [[ 1 -ne 0 ]]
Aug 18 13:21:59 activate-component-hdfs[2799]: + echo 1
Aug 18 13:21:59 activate-component-hdfs[2799]: + log_and_fail hdfs 'Component hdfs failed to activate' 1
Aug 18 13:21:59 activate-component-hdfs[2799]: + local component=hdfs
Aug 18 13:21:59 activate-component-hdfs[2799]: + local 'message=Component hdfs failed to activate'
Aug 18 13:21:59 activate-component-hdfs[2799]: + local error_code=1
Aug 18 13:21:59 activate-component-hdfs[2799]: + local client_error_indicator=
Aug 18 13:21:59 activate-component-hdfs[2799]: + [[ 1 -eq 2 ]]
Aug 18 13:21:59 activate-component-hdfs[2799]: + echo 'StructuredError{hdfs, Component hdfs failed to activate}'
Aug 18 13:21:59 activate-component-hdfs[2799]: StructuredError{hdfs, Component hdfs failed to activate}
Aug 18 13:21:59 activate-component-hdfs[2799]: + exit 1
What am I missing?
EDIT
As @Dagang suggested, I SSH-ed into the master node and ran grep "activate-component-hdfs" /var/log/dataproc-startup-script.log. The output is here.
So the problem is that there is a user name called "pete{" for which the hadoop fs -mkdir -p command failed. User names with special characters, especially brackets and braces such as "()[]{}", can make the HDFS activation step fail during cluster creation.
So the easy solution is simply to remove the accidentally created user.
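As a minimal sketch of that cleanup, assuming "pete{" exists as a local Linux account on the master node (the account name is just the one from this question; adapt it as needed):
# list accounts whose names contain brackets or braces
grep -E '^[^:]*[][{}()]' /etc/passwd
# remove the offending account, quoting the name so the shell treats it literally
sudo userdel 'pete{'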

Select specific columns from a record using only 'sed' without using 'awk'

Here is some sample input I obtained from running ls -l:
-rwxr-xr-x 1 root root 1779 Jan 10 2014 zcmp
-rwxr-xr-x 1 root root 5766 Jan 10 2014 zdiff
-rwxr-xr-x 1 root root 142 Jan 10 2014 zegrep
-rwxr-xr-x 1 root root 142 Jan 10 2014 zfgrep
-rwxr-xr-x 1 root root 2133 Jan 10 2014 zforce
-rwxr-xr-x 1 root root 5940 Jan 10 2014 zgrep
lrwxrwxrwx 1 root root 8 Dec 5 2015 ypdomainname -> hostname
I would like to print out the last column and the 5th column, using ONLY sed, like this:
zcmp 1779
zdiff 5766
zegrep 142
zfgrep 142
zforce 2133
zgrep 5940
ypdomainname -> hostname 8
I'm trying to find a regex that matches, but have not succeeded, and I'm not allowed to use awk or cut either.
Thank you in advance.
Try this:
ls -l | sed -r 's/^(\S+\s+){5}(\S+\s+){3}/\1/' | sed 's/^\(.*\) \(.*\)$/\2\ \1/g'
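The first sed reduces each line to the size (field 5) followed by the file name, and the second swaps whatever is before the last space with whatever is after it. A possible single-command alternative, assuming GNU sed, that also keeps the "-> target" part of symlink lines intact:
# field 5 (size) is captured as \1, everything after the date as \2;
# a "total N" header line, if present, does not match and passes through unchanged
ls -l | sed -r 's/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +([^ ]+) +[^ ]+ +[^ ]+ +[^ ]+ +(.*)$/\2 \1/'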

Print the first line occurrence of each matching pattern with sed

I would like to filter the output of the utility last based on a variable set of usernames.
This is sample output from last unfiltered,
reboot system boot server Wed Apr 6 13:15 - 14:24 (01:09)
user1 pts/0 server Wed Apr 6 13:08 - 13:15 (00:06)
reboot system boot system Wed Apr 6 13:08 - 13:15 (00:06)
user1 pts/0 server Wed Apr 6 13:06 - down (00:01)
reboot system boot system Wed Apr 6 13:06 - 13:07 (00:01)
user1 pts/0 server Wed Apr 6 12:59 - down (00:06)
What I would like to do is pipe the output of last to sed and then, using sed, print the first occurrence of each specified user name, i.e. their most recent entry in wtmp. The output should appear like so:
reboot system boot server Wed Apr 6 13:15 - 14:24 (01:09)
user1 pts/0 server Wed Apr 6 13:08 - 13:15 (00:06)
The sed expression that I particularly like is,
last|sed '/user1/{p;q;}'
Unfortunately this only gives me the ability to match the first occurrence of one username. Using this syntax, is there a way I could specify multiple usernames? Thanks in advance!
awk is a better fit here than sed, due to its support for associative arrays:
last | awk '!seen[$1]++'
reboot system boot server Wed Apr 6 13:15 - 14:24 (01:09)
user1 pts/0 server Wed Apr 6 13:08 - 13:15 (00:06)
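If you only want a specific set of users rather than the first occurrence of every user, one possible extension of the same idea (the users variable and its contents are placeholders for this sketch):
users="reboot user1"
last | awk -v users="$users" 'BEGIN { n = split(users, u, " "); for (i = 1; i <= n; i++) want[u[i]] = 1 }
    want[$1] && !seen[$1]++'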

regexp to wrap a line with ${color} and $color

Is there a way to have this regex put ${color orange} at the beginning, and $color at the end of the line where the date is found?
DJS=`date +%_d`;
cat thisweek.txt | sed s/"\(^\|[^0-9]\)$DJS"'\b'/'\1${color orange}'"$DJS"'$color'/
With this expression I get this:
Saturday Aug 13 12pm - 9pm 4pm - 5pm
Sunday Aug 14 9:30am - 6pm 1pm - 2pm
Monday Aug 15 6:30pm - 11:30pm None
Tuesday Aug 16 6pm - 11pm None
Wednesday Aug 17 Not Currently Scheduled for This Day
Thursday Aug ${color orange}18$color Not Currently Scheduled for This Day
Friday Aug 19 7am - 3:30pm 10:30am - 11:30am
What I want to have is this:
Saturday Aug 13 12pm - 9pm 4pm - 5pm
Sunday Aug 14 9:30am - 6pm 1pm - 2pm
Monday Aug 15 6:30pm - 11:30pm None
Tuesday Aug 16 6pm - 11pm None
Wednesday Aug 17 Not Currently Scheduled for This Day
${color orange}Thursday Aug 18 Not Currently Scheduled for This Day$color
Friday Aug 19 7am - 3:30pm 10:30am - 11:30am
Actually, it works for me. Depending on your version of sed, you might need to pass -r. Also, as tripleee says, don't use cat here:
DJS=`date +%_d`
sed -r s/"\(^\|[^0-9]\)$DJS"'\b'/'\1${color orange}'"$DJS"'$color'/ thisweek.txt
EDIT: Ok, so with the new information I arrived at this:
sed -r "s/([^0-9]+19.+)/\${color orange}\1\$color/" thisweek.txt
This gives me the output
Saturday Aug 13 12pm - 9pm 4pm - 5pm
Sunday Aug 14 9:30am - 6pm 1pm - 2pm
Monday Aug 15 6:30pm - 11:30pm None
Tuesday Aug 16 6pm - 11pm None
Wednesday Aug 17 Not Currently Scheduled for This Day
Thursday Aug 18 Not Currently Scheduled for This Day
${color orange}Friday Aug 19 7am - 3:30pm 10:30am - 11:30am $color
(Note that it differs from yours since it's Friday, at least in my time zone.)
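To avoid hard-coding the day number, a variant of the same substitution that wraps the whole matching line using the $DJS variable from the question (a sketch assuming GNU sed, which supports \b in extended regexes, and that the day number never starts a line, as in thisweek.txt):
DJS=`date +%_d`
sed -r "s/^(.*[^0-9]$DJS\b.*)$/\${color orange}\1\$color/" thisweek.txt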

How to get the SPID in linux 2.6 from C++

I have a question: is there some way to get the SPID in Linux 2.6 from a C++ application? When I do a "ps -amT" I can see the threads in the process:
root@10.67.100.2:~# ps -amT
PID SPID TTY TIME CMD
1120 - pts/1 00:00:20 sncmdd
- 1120 - 00:00:00 -
- 1125 - 00:00:00 -
- 1126 - 00:00:00 -
- 1128 - 00:00:00 -
- 1129 - 00:00:09 -
- 1130 - 00:00:00 -
- 1131 - 00:00:09 -
1122 - pts/1 00:00:00 snstatusdemuxd
- 1122 - 00:00:00 -
- 1127 - 00:00:00 -
- 1132 - 00:00:00 -
- 1133 - 00:00:00 -
And then in the filesystem I can see the threads:
root@10.67.100.2:~# ls /proc/1120/task/
1120 1125 1126 1128 1129 1130 1131
So is there some way I can get the SPID from my application so I can somehow identify what my SPID is in each running thread?
Thanks!
/Mike
Edit: I should add that the PID returned from getpid() is the same in each thread.
When I add this code to my threads:
// Log thread information to syslog
syslog(LOG_NOTICE, "ibnhwsuperv: gettid()= %ld, pthread_self()=%ld", (long int)syscall(224), pthread_self());
I get this result:
Jan 1 01:24:13 10 ibnhwsupervd[1303]: ibnhwsuperv: gettid()= -1, pthread_self()=839027488
Neither of which looks like the SPID given by ps or in the proc filesystem.
Also, note that gettid does not return the SPID.
How about gettid()?
Edit: If your libc doesn't provide a gettid() wrapper, you can call the syscall directly like this:
#include <unistd.h>       /* syscall() */
#include <sys/syscall.h>  /* SYS_gettid */
pid_t tid = syscall(SYS_gettid);  /* the kernel thread ID, i.e. the SPID column shown by ps */
... or see the example on the manual page.