Cron resource not working in AWS OpsWorks? - amazon-web-services

I have this script in my recipe:
cron "logs_processPageView" do
minute "*"
hour "*"
day "*"
month "*"
weekday "*"
command %Q{
echo "hi" >> /home/ubuntu/test.txt
}
action :create
end
When I run the recipe with OpsWorks, here is the corresponding log:
[Fri, 12 Jul 2013 02:42:48 +0000] DEBUG: Processing cron[logs_processPageView] on test1.localdomain
[Fri, 12 Jul 2013 02:42:48 +0000] DEBUG: Cron 'logs_processPageView' not found
[Fri, 12 Jul 2013 02:42:48 +0000] INFO: Added cron 'logs_processPageView'
I assumed the cron job had been added.
But when I SSH into the instance, there is no test.txt, even after waiting an hour. There is also no new cron job when I run sudo crontab -l or crontab -l.
Why is the resource not adding the cron job?
I tried using the cron cookbook. A new file appears in /etc/cron.d/cronfile, but the cron job still does not run.
What have I done wrong, and how do I fix it?

This was a bug: OpsWorks was running Chef 0.9, a very outdated Chef release.
They have since upgraded to Chef 11.4, so you can try again; the script in my question now works.
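Once a run succeeds, a quick way to confirm the entry landed (a sketch; Chef's cron resource manages root's crontab by default and tags its entries with a "# Chef Name:" comment):
sudo crontab -l -u root | grep -A1 "Chef Name: logs_processPageView"
If the entry is there, test.txt should start growing within a minute or two.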

Related

Jenkins suddenly started failing to provision agents in Amazon EKS

We are using the Kubernetes plugin to provision agents in EKS, and around 8:45 PM EST yesterday, with no apparent changes on our end (I'm the only admin, and I certainly wasn't doing anything then), we started having issues provisioning agents. I have rebooted the EKS node and the Jenkins master. I can confirm that kubectl works fine and lists 1 node running.
I'm suspecting something must have changed on the AWS side of things.
What's odd is that those ALPN errors don't show up anywhere else in our logs until just before this started happening. Googling around, I see people saying to ignore these "info" messages because the Java version doesn't support ALPN, but the fact that it's complaining about "HTTP/2" makes me wonder if Amazon changed something on their end to be HTTP/2-only?
I know this might seem too specific for a SO question, but if something did change with AWS that broke compatibility, I think this would be the right place.
From the Jenkins log at around 8:45:
INFO: Docker Container Watchdog check has been completed
Aug 29, 2019 8:42:05 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished DockerContainerWatchdog Asynchronous Periodic Work. 0 ms
Aug 29, 2019 8:45:04 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
INFO: Excess workload after pending Kubernetes agents: 1
Aug 29, 2019 8:45:04 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
INFO: Template for label eks: Kubernetes Pod Template
Aug 29, 2019 8:45:04 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Aug 29, 2019 8:45:04 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
INFO: Started provisioning Kubernetes Pod Template from eks with 1 executors. Remaining excess workload: 0
Aug 29, 2019 8:45:14 PM hudson.slaves.NodeProvisioner$2 run
INFO: Kubernetes Pod Template provisioning successfully completed. We have now 3 computer(s)
Aug 29, 2019 8:45:14 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Created Pod: jenkins-eks-39hfp in namespace jenkins
Aug 29, 2019 8:45:14 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Aug 29, 2019 8:45:14 PM io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1 onFailure
WARNING: Exec Failure: HTTP 403, Status: 403 -
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Aug 29, 2019 8:45:14 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
WARNING: Error in provisioning; agent=KubernetesSlave name: jenkins-eks-39hfp, template=PodTemplate{inheritFrom='', name='jenkins-eks', namespace='jenkins', slaveConnectTimeout=300, label='eks', nodeSelector='', nodeUsageMode=NORMAL, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], volumes=[HostPathVolume [mountPath=/var/run/docker.sock, hostPath=/var/run/docker.sock], EmptyDirVolume [mountPath=/tmp/build, memory=false]], containers=[ContainerTemplate{name='jnlp', image='infra-docker.artifactory.mycompany.io/jnlp-docker:latest', alwaysPullImage=true, workingDir='/home/jenkins/work', command='', args='-url http://jenkins.mycompany.io:8080 ${computer.jnlpmac} ${computer.name}', ttyEnabled=true, resourceRequestCpu='', resourceRequestMemory='', resourceLimitCpu='', resourceLimitMemory='', envVars=[KeyValueEnvVar [getValue()=/home/jenkins, getKey()=HOME]], livenessProbe=org.csanchez.jenkins.plugins.kubernetes.ContainerLivenessProbe@2043f440}], envVars=[KeyValueEnvVar [getValue()=/tmp/build, getKey()=BUILDDIR]], imagePullSecrets=[org.csanchez.jenkins.plugins.kubernetes.PodImagePullSecret@40ba07e2]}
io.fabric8.kubernetes.client.KubernetesClientException:
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:198)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Aug 29, 2019 8:45:14 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Terminating Kubernetes instance for agent jenkins-eks-39hfp
Aug 29, 2019 8:45:14 PM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
SEVERE: A thread (OkHttp Dispatcher/255634) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2c315338 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@2bddc643[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:632)
at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:678)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.scheduleReconnect(WatchConnectionManager.java:300)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$800(WatchConnectionManager.java:48)
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:213)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Ran into this today as AWS just pushed the update for the Go net/http CVE for K8s versions 1.12.x. That patch apparently broke the version of the Kubernetes plugin we were on. Updating to the latest version of the plugin, 1.18.3, resolved the issue.
https://issues.jenkins-ci.org/browse/JENKINS-59000?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel
Just realized my Kubernetes plugin had an update available. Applied that and it seems to be working fine now.
It's a bit difficult to see without direct troubleshooting, but the log points to Jenkins (through the Java API library) not being able to talk to the kube-apiserver because the request is being denied (HTTP 403).
It would be worth checking whether you can still talk to the cluster with a previously working KUBECONFIG using standard kubectl.
I speculate that the reason for the changed behavior could be an automatic EKS upgrade of your platform version. For example, EKS recently released a patch (~08/30/19) to address CVE-2019-9512 and CVE-2019-9514.
PS. I don't think the issue is related to dropping the HTTP/2 connection.
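As a quick sanity check along those lines (a sketch; the kubeconfig path and the jenkins namespace are assumptions taken from the log above):
kubectl --kubeconfig ~/.kube/config get nodes
kubectl auth can-i watch pods -n jenkins
The plugin opens a watch on the pod it creates, so a "no" from the second command (run with the credentials Jenkins uses) would line up with the HTTP 403 in the stack trace.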
Just update to the latest version of the Kubernetes plugin (1.18.3) and restart Jenkins. This worked for me.

Hyperledger Fabric: Peer nodes fail to restart with byfn script when machine is shut down while network is running

I have a hyperledger fabric network running on a single AWS instance using the default byfn script.
ERROR: Orderer, cli, CA docker containers show "Up" status. Peers show "Exited" status.
The error occurs when:
the byfn network is running and the machine is rebooted (not in my control, but due to some external reason);
the network is left running overnight without shutting the machine down; it shows the same status the next morning.
Error shown:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b0523a7b1730 hyperledger/fabric-tools:latest "/bin/bash" 23 seconds ago Up 21 seconds cli
bfab227eb4df hyperledger/fabric-peer:latest "peer node start" 28 seconds ago Exited (2) 23 seconds ago peer1.org1.example.com
6fd7e818fab3 hyperledger/fabric-peer:latest "peer node start" 28 seconds ago Exited (2) 19 seconds ago peer1.org2.example.com
1287b6d93a23 hyperledger/fabric-peer:latest "peer node start" 28 seconds ago Exited (2) 22 seconds ago peer0.org2.example.com
2684fc905258 hyperledger/fabric-orderer:latest "orderer" 28 seconds ago Up 26 seconds 0.0.0.0:7050->7050/tcp orderer.example.com
93d33b51d352 hyperledger/fabric-peer:latest "peer node start" 28 seconds ago Exited (2) 25 seconds ago peer0.org1.example.com
Attaching docker log: https://hastebin.com/ahuyihubup.cs
Only the peers fail to start up.
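For anyone reproducing this, the failing peers' own logs can be pulled directly (container names as per the byfn defaults):
docker logs peer0.org1.example.com 2>&1 | tail -n 20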
Steps I have tried to solve the issue:
docker start $(docker ps -aq), or starting individual peers manually.
byfn down, generate and then up again. Shows the same result as above.
Rolled back to previous versions of the Fabric binaries. Same result on 1.1, 1.2 and 1.4. With the older binaries, the error does not recur if the network is left running overnight, but it does when the machine is restarted.
Used older docker images such as 1.1 and 1.2.
Tried starting up only one peer, orderer and cli.
Changed network name and domain name.
Uninstalled docker, docker-compose and reinstalled.
Changed port numbers of all nodes.
Tried restarting without mounting any volumes.
The only thing that works is reformatting the AWS instance and reinstalling everything from scratch. Also, I am NOT using AWS blockchain template.
Any help would be appreciated. I have been stuck on this issue for a month now.
The error was resolved by adding the following lines to peer-base.yaml:
GODEBUG=netdns=go
dns_search: .
Thanks to @gari-singh for the answer:
https://stackoverflow.com/a/49649678/5248781
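For reference, a minimal sketch of where those two lines sit in peer-base.yaml (the service and image names follow the byfn defaults; treat the surrounding structure as an assumption, not the exact file):
services:
  peer-base:
    image: hyperledger/fabric-peer:latest
    environment:
      # Force Go's built-in DNS resolver instead of the cgo/libc one
      - GODEBUG=netdns=go
    # Keep the container from inheriting the host's DNS search domains
    dns_search: .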

Issue with awslogs service and CloudWatch Logs Agent on Ubuntu 16.04

On one of my AWS EC2 instances running Ubuntu 16.04, I'm getting the following errors filling up my /var/log/syslog.
Jul 17 18:11:21 Mysql-Slave systemd[1]: Stopped The CloudWatch Logs agent.
Jul 17 18:11:21 Mysql-Slave systemd[1]: Started The CloudWatch Logs agent.
Jul 17 18:11:26 Mysql-Slave systemd[1]: awslogs.service: Main process exited, code=exited, status=255/n/a
Jul 17 18:11:26 Mysql-Slave systemd[1]: awslogs.service: Unit entered failed state.
Jul 17 18:11:26 Mysql-Slave systemd[1]: awslogs.service: Failed with result 'exit-code'.
Jul 17 18:11:26 Mysql-Slave systemd[1]: awslogs.service: Service hold-off time over, scheduling restart.
Jul 17 18:11:26 Mysql-Slave systemd[1]: Stopped The CloudWatch Logs agent.
Jul 17 18:11:26 Mysql-Slave systemd[1]: Started The CloudWatch Logs agent.
Jul 17 18:11:32 Mysql-Slave systemd[1]: awslogs.service: Main process exited, code=exited, status=255/n/a
Jul 17 18:11:32 Mysql-Slave systemd[1]: awslogs.service: Unit entered failed state.
Jul 17 18:11:32 Mysql-Slave systemd[1]: awslogs.service: Failed with result 'exit-code'.
Jul 17 18:11:32 Mysql-Slave systemd[1]: awslogs.service: Service hold-off time over, scheduling restart.
Jul 17 18:11:32 Mysql-Slave systemd[1]: Stopped The CloudWatch Logs agent.
Jul 17 18:11:32 Mysql-Slave systemd[1]: Started The CloudWatch Logs agent.
The /var/log/awslogs.log contains these messages:
database is locked
2018-07-17 20:59:01,055 - cwlogs.push - INFO - 27074 - MainThread - Missing or invalid value for use_gzip_http_content_encoding config. Defaulting to using gzip encoding.
2018-07-17 20:59:01,055 - cwlogs.push - INFO - 27074 - MainThread - Using default logging configuration.
database is locked
2018-07-17 20:59:06,549 - cwlogs.push - INFO - 27104 - MainThread - Missing or invalid value for use_gzip_http_content_encoding config. Defaulting to using gzip encoding.
2018-07-17 20:59:06,549 - cwlogs.push - INFO - 27104 - MainThread - Using default logging configuration.
database is locked
2018-07-17 20:59:12,054 - cwlogs.push - INFO - 27110 - MainThread - Missing or invalid value for use_gzip_http_content_encoding config. Defaulting to using gzip encoding.
2018-07-17 20:59:12,054 - cwlogs.push - INFO - 27110 - MainThread - Using default logging configuration.
Any pointers in troubleshooting this will be of great help.
A similar issue was posted in the following link - https://forums.aws.amazon.com/thread.jspa?threadID=165134
I did the following:
a) Stopped the awslogs service
$ service awslogs stop ## Amazon Linux
OR
$ service awslogsd stop ## Amazon Linux 2
b) Deleted the agent-state file in /var/awslogs/state/ (I renamed it in my case)
$ mv agent-state agent-state.old ## Amazon Linux
OR
$ cd /var/lib/awslogs; mv agent-state agent-state.old ## Amazon Linux 2
c) Restarted the awslogs service
$ service awslogs start ## Amazon Linux
OR
$ sudo systemctl start awslogsd ## Amazon Linux 2
A new agent-state file was created as a result, and the errors mentioned in my post disappeared after this.
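Putting the three steps together on Amazon Linux 2 (a sketch following the paths above; on Amazon Linux 1 use the awslogs service and /var/awslogs/state/ instead):
sudo service awslogsd stop
sudo mv /var/lib/awslogs/agent-state /var/lib/awslogs/agent-state.old
sudo systemctl start awslogsd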
Please try the following commands based on your Linux version
sudo service awslogs start
If you are running Amazon Linux 2, try the below command
sudo systemctl start awslogsd
It took me 2 hours to figure this out.
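If the service keeps flapping after a restart, it may help to watch the unit and the agent log together (standard systemd tooling; the unit name differs per distro as noted above):
sudo systemctl status awslogsd
sudo tail -f /var/log/awslogs.log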
In my case, I found duplicate entries for some properties in the /etc/awslogs/awslogs.conf file.
(Not all were exact duplicates: some of the properties were commented out, and I had uncommented them to set values.)
That didn't work. Then I scrolled to the bottom of the file.
I found the following entries. Setting the values on these properties made it work.
[/var/log/messages]
datetime_format = %b %d %H:%M:%S
file = /home/ec2-user/application.log
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = MyProject
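A quick way to spot exact duplicate (uncommented) lines in that config, as a sketch:
grep -vE '^[[:space:]]*(#|$)' /etc/awslogs/awslogs.conf | sort | uniq -d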

CloudWatch Custom Metric - Memory Utilization --from-cron not working

I managed to follow all the steps listed here to set up the AWS scripts that pick up the system's memory usage and report it to CloudWatch. The problem I'm having is that it is not getting picked up in the CloudWatch console.
When I do
$ ~/aws-scripts-mon/mon-put-instance-data.pl --mem-util --verbose
The metric gets successfully sent to CloudWatch, and I can see it in the console.
But when I try to do the same through a cron job, it doesn't get picked up in the CloudWatch console.
To set up the cron job, I did
$ sudo crontab -e
and added this line
*/5 * * * * ~/aws-scripts-mon/mon-put-instance-data.pl --mem-util --from-cron
Saved and exited. When I check /var/log/syslog, it says that the metric was successfully sent, but for some reason I don't see it in the CloudWatch console. What am I missing here?
The syslog is below for reference (with IP masked):
Jan 18 22:55:01 ip-xxx-xx-xx-xx CRON[22536]: (root) CMD (~/aws-scripts-mon/mon-put-instance-data.pl --mem-util --from-cron)
Jan 18 22:55:01 ip-xxx-xx-xx-xx postfix/pickup[22530]: 7FF494449A: uid=0 from=<root>
Jan 18 22:55:01 ip-xxx-xx-xx-xx postfix/cleanup[22540]: 7FF494449A: message-id=<20170118225501.7FF494449A@ip-xxx-xx-xx-xx.localdomain>
Jan 18 22:55:01 ip-xxx-xx-xx-xx postfix/qmgr[21671]: 7FF494449A: from=<root@ip-xxx-xx-xx-xx.localdomain>, size=673, nrcpt=1 (queue active)
Jan 18 22:55:01 ip-xxx-xx-xx-xx postfix/local[22542]: warning: dict_nis_init: NIS domain name not set - NIS lookups disabled
Jan 18 22:55:01 ip-xxx-xx-xx-xx postfix/local[22542]: 7FF494449A: to=<root@ip-xxx-xx-xx-xx.localdomain>, orig_to=<root>, relay=local, delay=0.03, delays=0.02/0/0/0, dsn=2.0.0, status=sent (delivered to mailbox)
Jan 18 22:55:01 ip-xxx-xx-xx-xx postfix/qmgr[21671]: 7FF494449A: removed
Note: The absolute path in the cron job did the trick. Documented the various hiccups here.
Cron doesn't use the login shell's environment variables, so ~ might not resolve to your user's HOME directory as it would in your manual tests. Try replacing it with the absolute path (e.g., /home/sarul/aws-script-mon/mon-put-instance-data.pl) and see if it runs the script correctly.
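A minimal sketch of the corrected crontab entry, assuming the scripts live under /home/ubuntu (substitute your own home directory):
*/5 * * * * /home/ubuntu/aws-scripts-mon/mon-put-instance-data.pl --mem-util --from-cron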
If you're using local AWS credentials in the user's environment or ~/.aws/config rather than an instance profile, you might need to add these credentials somewhere accessible by cron as well.
Also note that the postfix syslog entries indicate that a mail message of some sort is being queued - perhaps related to an error reported by the script invoked by cron.

Cntlmd not starting under systemd on CentOS 7.1

Had a weird error trying to start cntlmd on CentOS 7.1.
systemctl start cntlmd results in the following in the logs (and yes, "becomming" is exactly how it's spelt in the logs :)):
systemd: Started SYSV: Cntlm is meant to be given your proxy address and becomming
Weird thing is:
it did run initially after installation;
the exact same config works perfectly on another machine (provisioned with Chef, so 100% the same config);
if I run it in the foreground it works, but through systemd it does not.
To "fix" it, I had to manually remove and reinstall, whereupon it worked again.
Anybody seen this error (Google reveals nothing) and know what's going on?
I realised that the /var/run/cntlm directory seemed to be "removed" after every boot. It turns out the directory is never created by systemd-tmpfiles on boot (thanks to this SO answer), which then resulted in:
Feb 29 06:13:04 node01 cntlm: Using following NTLM hashes: NTLMv2(1) NT(0) LM(0)
Feb 29 06:13:04 node01 cntlm[10540]: Daemon ready
Feb 29 06:13:04 node01 cntlm[10540]: Changing uid:gid to 996:995 - Success
Feb 29 06:13:04 node01 cntlm[10540]: Error creating a new PID file
because cntlm couldn't write its PID file, since /var/run/cntlm didn't exist.
So to get systemd-tmpfiles to create the /var/run/cntlm directory on boot, you need to add the following file at /usr/lib/tmpfiles.d/cntlm.conf:
d /run/cntlm 700 cntlm cntlm
Reboot and Bob's your uncle.
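If you'd rather not reboot, the same directory can be created on the spot from that snippet (standard systemd tooling), followed by a service restart:
sudo systemd-tmpfiles --create /usr/lib/tmpfiles.d/cntlm.conf
sudo systemctl restart cntlmd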