CloudWatch Custom Metric - Memory Utilization --from-cron not working

I managed to follow all the steps listed here to set up the AWS scripts that pick up the system's memory usage and report it to CloudWatch. The problem I'm having is that the data is not getting picked up in the CloudWatch console.
When I do
$ ~/aws-scripts-mon/mon-put-instance-data.pl --mem-util --verbose
The metric gets successfully sent to CloudWatch, and I can see it in the console.
But when I try to do the same through a cron job, it doesn't show up in the CloudWatch console.
To set up the cron job, I ran
$ sudo crontab -e
and added this line
*/5 * * * * ~/aws-scripts-mon/mon-put-instance-data.pl --mem-util --from-cron
saved and exited. When I check /var/log/syslog, it says that the metric was successfully sent, but for some reason I don't see it in the CloudWatch console. What am I missing here?
The syslog output is below for reference (with the IP masked):
Jan 18 22:55:01 ip-xxx-xx-xx-xx CRON[22536]: (root) CMD (~/aws-scripts-mon/mon-put-instance-data.pl --mem-util --from-cron)
Jan 18 22:55:01 ip-xxx-xx-xx-xx postfix/pickup[22530]: 7FF494449A: uid=0 from=<root>
Jan 18 22:55:01 ip-xxx-xx-xx-xx postfix/cleanup[22540]: 7FF494449A: message-id=<20170118225501.7FF494449A@ip-xxx-xx-xx-xx.localdomain>
Jan 18 22:55:01 ip-xxx-xx-xx-xx postfix/qmgr[21671]: 7FF494449A: from=<root@ip-xxx-xx-xx-xx.localdomain>, size=673, nrcpt=1 (queue active)
Jan 18 22:55:01 ip-xxx-xx-xx-xx postfix/local[22542]: warning: dict_nis_init: NIS domain name not set - NIS lookups disabled
Jan 18 22:55:01 ip-xxx-xx-xx-xx postfix/local[22542]: 7FF494449A: to=<root@ip-xxx-xx-xx-xx.localdomain>, orig_to=<root>, relay=local, delay=0.03, delays=0.02/0/0/0, dsn=2.0.0, status=sent (delivered to mailbox)
Jan 18 22:55:01 ip-xxx-xx-xx-xx postfix/qmgr[21671]: 7FF494449A: removed
Note: Using the absolute path in the cron job did the trick. I documented the various hiccups here.

Cron doesn't use the login shell's environment variables, so ~ might not resolve to your user's HOME directory the way it does in your manual tests (and since you edited root's crontab with sudo, ~ would resolve to root's home anyway). Try replacing it with the absolute path (e.g., /home/sarul/aws-scripts-mon/mon-put-instance-data.pl) and see if the script runs correctly.
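For example (a sketch; substitute your actual username and path), the crontab entry would become:
*/5 * * * * /home/sarul/aws-scripts-mon/mon-put-instance-data.pl --mem-util --from-cron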
If you're using local AWS credentials in the user's environment or in ~/.aws/config rather than an instance profile, you may need to make those credentials accessible to cron as well.
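If I remember the monitoring scripts' options correctly, they also accept a credentials file directly on the command line, which sidesteps the environment problem entirely (the path below is hypothetical):
*/5 * * * * /home/sarul/aws-scripts-mon/mon-put-instance-data.pl --mem-util --from-cron --aws-credential-file=/home/sarul/aws-scripts-mon/awscreds.conf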
Also note that the postfix syslog entries indicate that a mail message is being queued: cron mails a job's output to the crontab owner, so that message may contain an error reported by the script.
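Reading root's local mailbox is a quick way to check; the exact path varies by distro, but /var/mail/root is common:
$ sudo tail -n 50 /var/mail/root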

Related

Apache TLS Handshakes Timeout after DHCP Lease Renewal

I'm trying to figure out why my HTTPS sites go down every time my server's DHCP lease gets renewed.
It happens consistently, but HTTP sites continue to work just fine.
Restarting systemd-networkd brings the sites back, but until that happens the HTTPS sites are basically unreachable.
Any tips on where to look first?
The weird thing is that the sites come back after the next DHCP lease renewal, then go down again on the one after that; it alternates like this over and over.
This is what I see in syslog when it happens.
Apr 13 18:06:25 www-1 systemd-networkd[13973]: ens4: DHCP lease lost
Apr 13 18:06:25 www-1 systemd-networkd[13973]: ens4: DHCPv4 address 10.138.0.29/32 via 10.138.0.1
Apr 13 18:06:25 www-1 systemd-networkd[13973]: ens4: IPv6 successfully enabled
Apr 13 18:06:25 www-1 dbus-daemon[579]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.231' (uid=101 pid=13973 comm="/lib/systemd/systemd-networkd " label="unconfined")
Apr 13 18:06:25 www-1 systemd-networkd[13973]: ens4: Configured
Apr 13 18:06:25 www-1 systemd[1]: Starting Hostname Service...
Apr 13 18:06:25 www-1 dbus-daemon[579]: [system] Successfully activated service 'org.freedesktop.hostname1'
Apr 13 18:06:25 www-1 systemd[1]: Started Hostname Service.
Apr 13 18:06:25 www-1 systemd-hostnamed[17589]: Changed host name to 'www-1.us-west1-b.c.camp-fire-259800.internal'
This issue seems to be related to the following:
https://moss.sh/name-resolution-issue-systemd-resolved/
and
https://github.com/systemd/systemd/issues/9243
I've disabled systemd-resolved and am now using a static /etc/resolv.conf copied from /run/systemd/resolve/resolv.conf.
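Concretely, the workaround looked roughly like this (assuming /etc/resolv.conf was the usual symlink into /run/systemd/resolve):
$ sudo cp /run/systemd/resolve/resolv.conf /tmp/resolv.conf
$ sudo systemctl disable --now systemd-resolved
$ sudo rm /etc/resolv.conf
$ sudo mv /tmp/resolv.conf /etc/resolv.conf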
For internal DNS I'm using a private Google DNS Zone.
Thanks.

Trouble with email setup for MediaWiki site

I'm using Google Compute Engine, Bitnami, and Mailgun to set up a MediaWiki site (v1.33.1-1 on Debian 9). I'm very new to all of these things.
My Mailgun is properly set up and verified, and I'm following the documentation provided here: https://cloud.google.com/compute/docs/tutorials/sending-mail/using-mailgun
When I run:
echo 'Test passed.' | mail -s 'Test-Email' EMAIL@EXAMPLE.COM
And then:
tail -n 5 /var/log/syslog
These are my results:
root@bitnami-mediawiki-860c:~# tail -n 5 /var/log/syslog
Nov 15 03:58:39 bitnami-mediawiki-860c postfix/qmgr[13119]: 8E84FA13DA: from=<>, size=2918, nrcpt=1 (queue active)
Nov 15 03:58:39 bitnami-mediawiki-860c postfix/bounce[13144]: 7A557A13D9: sender non-delivery notification: 8E84FA13DA
Nov 15 03:58:39 bitnami-mediawiki-860c postfix/qmgr[13119]: 7A557A13D9: removed
Nov 15 03:58:39 bitnami-mediawiki-860c postfix/smtp[13142]: 8E84FA13DA: to=<root@bitnami-mediawiki-860c>, relay=none, delay=0.01, delays=0.01/0/0/0, dsn=5.4.4, status=bounced (Host or domain name not found. Name service error for name=bitnami-mediawiki-860c type=AAAA: Host not found)
Nov 15 03:58:39 bitnami-mediawiki-860c postfix/qmgr[13119]: 8E84FA13DA: removed
Can anyone tell me how to fix this? Be specific if you can, as I'm beginning from nearly zero prior knowledge.
Google Cloud blocks outbound port 25 by default; however, you can use different ports, e.g. 587 and 465. Those ports should work for sending mail from a VM instance, and the blocked port is the likely root cause of this not working as expected. As mentioned in the comments, it should also work with port 2525.
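The Mailgun tutorial linked in the question relays Postfix through Mailgun; the relevant /etc/postfix/main.cf settings look roughly like this when switched to port 2525 (domain and password are placeholders):
relayhost = [smtp.mailgun.org]:2525
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = static:postmaster@YOUR_DOMAIN:YOUR_MAILGUN_PASSWORD
smtp_sasl_security_options = noanonymous
After editing, restart Postfix (e.g., sudo service postfix restart) and re-run the mail test.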

Error: NoCredentialProviders: no valid providers in chain. Deprecated. error with dehydrated tool

I am trying to update certs on my servers with dehydrated and dehydrated-route53-hook-script.
Here is the complete command and error:
./xsys renewcerts
Running: cd certificates && ./dehydrated --cron
# INFO: Using main config file ..config/certificates/config
Processing mydomain.org with alternative names: dev-mydomain.org
+ Checking domain name(s) of existing cert... unchanged.
+ Checking expire date of existing cert...
+ Valid till Apr 21 11:47:17 2019 GMT (Less than 30 days). Renewing!
+ Signing domains...
+ Generating private key...
+ Generating signing request...
+ Requesting new certificate order from CA...
+ Received 2 authorizations URLs from the CA
+ Handling authorization for dev-mydomain.org
+ Handling authorization for mydomain.org
+ 2 pending challenge(s)
+ Deploying challenge tokens...
Error: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Could not find zone for dev-mydomain.org
Running: cd certificates && ./dehydrated --cleanup
It looks like the AWS credentials are failing, but from everything I can tell they are OK. I last ran this ~60 days ago, it ran fine then, and (as far as I know) nothing has changed.
Any ideas on where to look for a fix are appreciated.
Update
I found that this command is failing:
$ cli53 list
Error: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
So the root issue seems to be cli53. I have credentials in ~/.aws/credentials per the docs.
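For reference, the standard shape of that file (values are placeholders):
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY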
This ended up being an issue with cli53. I had a symlink as follows...
ls -la .aws/
total 0
drwxr-xr-x 3 myuser staff 96 Apr 5 15:33 .
drwxr-xr-x+ 143 myuser staff 4576 Apr 8 12:30 ..
lrwxr-xr-x 1 myuser staff 69 Apr 5 15:33 credentials -> /Users/myuser/ansible/myapp/_secrets/aws_credentials
...but I had recently moved the secrets to
/Users/myuser/apps/myapp/_secrets/aws_credentials
so the symlink was dangling and cli53 simply couldn't find the credentials.
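Re-pointing the symlink at the new location fixed it (paths as above):
$ ln -sfn /Users/myuser/apps/myapp/_secrets/aws_credentials ~/.aws/credentials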

BotoServerError: 400 Bad Request while sending email from EC2 Ubuntu instance

I am using the Django (Python) framework deployed on an AWS EC2 Ubuntu instance, sending email using boto and the AWS SES service.
My script used to work fine until a few days ago, when I started getting this error:
BotoServerError at /contact_us/
BotoServerError: 400 Bad Request
<ErrorResponse xmlns="http://ses.amazonaws.com/doc/2010-12-01/">
<Error>
<Type>Sender</Type>
<Code>RequestExpired</Code>
<Message>Request timestamp: Wed, 16 Mar 2016 16:57:21 GMT expired. It must be within 300 secs/ of server time.</Message>
</Error>
<RequestId>368a4b97-eb97-11e5-bf2d-8ff0675b134d</RequestId>
</ErrorResponse>
Exception Location: /usr/local/lib/python2.7/dist-packages/boto/ses/connection.py in _handle_error, line 177
Server time: Wed, 16 Mar 2016 16:57:21 +0000
SES works on UTC, and I have changed the EC2 instance's time to UTC as well.
How can I solve this issue?
Request timestamp: Wed, 16 Mar 2016 16:57:21 GMT expired. It must be within 300 secs/ of server time.
Since you said it stopped working a few days ago, this is most likely due to the recent daylight saving time change, and it is likely you are not running NTP to sync your clock.
Try this:
sudo ntpdate pool.ntp.org
which will sync your system clock once. If you want to make sure the time sync happens periodically, start the NTP daemon:
sudo service ntp stop          # stop the daemon so ntpdate can use the NTP port
sudo ntpdate -s pool.ntp.org   # step the clock once
sudo service ntp start         # restart the daemon to keep the clock in sync
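To sanity-check the clock offset without changing anything first, you can query a server (query mode only reports the offset):
$ ntpdate -q pool.ntp.org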

cron resource not working in AWS OpsWorks?

I have this script in my recipe:
cron "logs_processPageView" do
minute "*"
hour "*"
day "*"
month "*"
weekday "*"
command %Q{
echo "hi" >> /home/ubuntu/test.txt
}
action :create
end
When I run the recipe with OpsWorks, here is the corresponding log:
[Fri, 12 Jul 2013 02:42:48 +0000] DEBUG: Processing cron[logs_processPageView] on test1.localdomain
[Fri, 12 Jul 2013 02:42:48 +0000] DEBUG: Cron 'logs_processPageView' not found
[Fri, 12 Jul 2013 02:42:48 +0000] INFO: Added cron 'logs_processPageView'
I assumed the cron entry had been added to the crontab. But when I SSH'd into the instance, there was no test.txt, even after waiting an hour. There is also no new cron job when I run sudo crontab -l or crontab -l.
Why is the resource not adding the cron job?
I tried using the cron cookbook. There is a new file in /etc/cron.d/cronfile, but the cron job still doesn't run.
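For what it's worth, these are the checks I used while debugging (log path assumes Ubuntu):
$ cat /etc/cron.d/cronfile          # entries in /etc/cron.d need a user field
$ grep CRON /var/log/syslog | tail  # shows whether cron actually ran anything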
What have I done wrong, and how do I fix it?
This was a bug: OpsWorks was using Chef 0.9 (a very outdated Chef release). They have since upgraded to Chef 11.4, so you can try again; the script in my question works now.