I want to create a CloudWatch alarm for disk-space utilization.
I've followed the AWS doc below:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scripts.html
The cron job is set up on my instance, and I've checked my system log as well:
Sep 22 12:20:01 ip-#### CRON[13921]: (ubuntu) CMD
(~/volumeAlarm/aws-scripts-mon/mon-put-instance-data.pl
--disk-space-util --disk-space-avail --disk-space-used --disk-path=/ --from-cron)
Sep 22 12:20:13 ip-#### CRON[13920]: (ubuntu) MAIL (mailed 1 byte of output; but got status 0x004b, #012)
Also, manually running the command
./mon-put-instance-data.pl --disk-space-util --disk-space-avail
--disk-space-used --disk-path=/
shows the result:
print() on closed filehandle MDATA at CloudWatchClient.pm line 167.
Successfully reported metrics to CloudWatch. Reference Id:####
But there are no metrics in the AWS console, so I can't set the alarm.
Please help if you have solved this problem.
The CloudWatch scripts get the instance's metadata and write it to a local file, /var/tmp/aws-mon/instance-id. If the file or folder has incorrect permissions so that the script cannot write to /var/tmp/aws-mon/instance-id, it may throw an error like "print() on closed filehandle MDATA at CloudWatchClient.pm line 167". Forgive the assumption, but a possible scenario is: the root user executed the mon-get-instance-stats.pl or mon-put-instance-data.pl scripts initially, and the scripts generated the file/folder in place; then a different user executed the CloudWatch scripts again, and this error showed up. To fix this, remove the folder /var/tmp/aws-mon/ and re-execute the CloudWatch scripts to regenerate the folder and files.
This is the answer I got from AWS support when I had the same issue; maybe it will help you too. Also check the AWSAccessKey configured for the EC2 instance.
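To apply that fix here, a minimal sketch, assuming the scripts live in ~/volumeAlarm/aws-scripts-mon as in the cron entry above:

# Remove the metadata cache created by the other user
sudo rm -rf /var/tmp/aws-mon/
# Re-run the script as the cron user (ubuntu) so the cache is
# recreated with permissions that user can write to
cd ~/volumeAlarm/aws-scripts-mon
./mon-put-instance-data.pl --disk-space-util --disk-space-avail --disk-space-used --disk-path=/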
As the title says, I am attempting to install the CloudWatch agent (CW agent) on my on-premises server (OPS).
After running this command from the AWS User Guide to start the CW agent:
& $Env:ProgramFiles\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1 -m ec2 -a start
I got this error:
****** processing cwagent-otel-collector ******
cwagent-otel-collector will not be started as it has not been configured yet.
****** processing amazon-cloudwatch-agent ******
AmazonCloudWatchAgent has been started
I didn't know what this was, so I searched and found that when someone else had this issue, they had not created a config file.
I did create a config file (named config.json by default) using the configuration wizard and I am still having the issue.
I have tried looking into a number of pages on that user guide, but nothing has resolved the issue.
Thank you in advance for any assistance you can provide.
This message is informational, not an error.
The CloudWatch agent is bundled with the AWS OpenTelemetry collector agent; they're actually two agents. The CloudWatch agent and the OTel collector have separate configuration files. If you provide a config for one and not the other, only the configured agent starts. This is expected behavior.
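If the agent still isn't picking up your settings, the wizard's config has to be loaded explicitly. A sketch, assuming the wizard saved config.json in the default agent directory and that this is an on-premises server (so -m onPremise rather than -m ec2):

# Load the wizard-generated config and (re)start the CloudWatch agent;
# -s restarts the agent after fetching the configuration
& "$Env:ProgramFiles\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m onPremise -c file:"$Env:ProgramFiles\Amazon\AmazonCloudWatchAgent\config.json" -s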
Thank you for taking the time to answer. I have since resolved the issue (recently).
Everything from the command I was using to the path where the file resided was incorrect.
Starting over and going through all the steps again with background information helped.
The first installation combined with learning everything for the first time produced the issue.
For anyone having this issue: when you hit a wall like this, I recommend starting over. I know it's not what anyone wants to do, but in the end it saved me time.
I followed the instructions here and configured the CloudWatch monitoring scripts to run via a cron job on my instance. The scripts run, but I keep getting the following error mailed to me (to the MAILTO address in the crontab file):
mon-put-instance-data.pl --mem-util --mem-used-incl-cache-buff --mem-used --mem-avail --aggregated --auto-scaling --from-cron
Use of uninitialized value $data_value in scalar chomp at /home/ec2-user/aws-scripts-mon/CloudWatchClient.pm line 137
I am using an IAM role instead of the credential file and the role has all the permissions mentioned in the link above.
While troubleshooting, I found:
The job is submitting data to cloudwatch and I confirm that I can see all the metrics in cloudwatch console.
There are no errors in /var/log/messages
If I run the script manually with the --verbose flag, I get a success message as well:
print() on closed filehandle MDATA at CloudWatchClient.pm line 167.
print() on closed filehandle MDATA at CloudWatchClient.pm line 167.
print() on closed filehandle MDATA at CloudWatchClient.pm line 167.
MemoryUtilization: 15.9675544623621 (Percent)
MemoryUsed: 1275.01171875 (Megabytes)
MemoryAvailable: 6710.00390625 (Megabytes)
print() on closed filehandle MDATA at CloudWatchClient.pm line 167.
No credential methods are specified. Trying default IAM role. Using IAM role <xxxx-prod-WebServerRole-1891EV5KJYJ49>
Endpoint: https://monitoring.eu-west-1.amazonaws.com
Payload: { /*Removed for brevity*/ }
Received HTTP status 200 on attempt 1
Successfully reported metrics to CloudWatch.
Reference Id: c44c28ff-63e7-11e7-903d-350b8f4c0dae
The error is intermittent but regular; I received the emails at 12:30, 13:21, 14:02, 14:48, and 15:16.
Any idea what is going on?
You probably are running two mon-put-instance-data.pl commands simultaneously. CloudWatchClient.pm reads and writes to a temporary file in a manner that can break if multiple copies are run at once. The breakage would only be sporadic, because it depends on the exact ordering of the race.
One fix is to use flock in your cron command to ensure mutual exclusion when running mon-put-instance-data.pl:
flock -w 30 /tmp/mon-put-instance-data.lockfile mon-put-instance-data.pl --mem-util --mem-used-incl-cache-buff --mem-used --mem-avail --aggregated --auto-scaling --from-cron
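For example, as a crontab entry (a sketch; the schedule and script path are assumptions, adjust to your setup; note that a cron entry must stay on one line):

# m h dom mon dow command
*/5 * * * * flock -w 30 /tmp/mon-put-instance-data.lockfile /home/ec2-user/aws-scripts-mon/mon-put-instance-data.pl --mem-util --mem-used-incl-cache-buff --mem-used --mem-avail --aggregated --auto-scaling --from-cron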
I'm trying to deploy apps using cf CLI commands with Jenkins, and I'm having a weird issue.
It works fine with 1 or 2 concurrent deployments, but if there are more than 3-4 jobs running, any cf CLI command randomly returns strange errors like:
No space targeted, use 'cf target -s' to target a space.
or
Server error, status code: 404, error code: 100004, message: The app
could not be found: 0da4xxxx-9476-473a-b77d-f02xxxxxx
However, there is no issue with the cf CLI itself if I run each cf command one by one.
(I'm only assigned to one org and one space, so there's no issue choosing a space/target, and the app is there if I run 'cf a' later.)
I fixed the config.json issue using this comment, but I'm still blocked by the strange behavior of the cf CLI. Any idea?
https://stackoverflow.com/a/35247160/5862540
The cf CLI stores your configured API endpoint and access & refresh tokens in a local file, $CF_HOME/config.json.
Most cf CLI commands read this file when you invoke them, and many commands write to the file when they finish. Writing is performed for two reasons: when your access token expires, the cf CLI automatically requests a new token from UAA and updates the one in config.json. Also, we simply don't have any logic to check if any updates were made that need persisting, so the file gets written out again just in case.
So it's important to configure a different CF_HOME for any parallel executions of cf CLI commands to avoid random errors. And if your config.json gets corrupted, just delete the file, then configure your API endpoint and log in again.
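A minimal sketch of what that looks like in a Jenkins shell step (the endpoint, org, space, and app name are placeholders):

# Give this job its own CF_HOME so parallel builds don't share config.json
export CF_HOME="$WORKSPACE/.cf"
mkdir -p "$CF_HOME"
cf api https://api.example.com
cf auth "$CF_USER" "$CF_PASSWORD"
cf target -o my-org -s my-space
cf push my-app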
I have encountered an issue when creating custom AMIs (images) from EC2 instances. If I start up a default Windows Server 2012 instance with a custom bootstrap/user-data script such as:
<powershell>
PowerShell "(New-Object System.Net.WebClient).DownloadFile('http://download.microsoft.com/download/3/2/2/3224B87F-CFA0-4E70-BDA3-3DE650EFEBA5/vcredist_x64.exe','C:\vcredist_x64.exe')"
</powershell>
it works as intended: it goes to the URL, downloads the file, and stores it on the C: drive.
But if I set up a Windows Server instance, create an image from it, store it as a custom AMI, and then deploy it with the exact same user-data script, it does not work. If I go to the instance metadata URL (http://169.254.169.254/latest/user-data), it shows the script was imported successfully, but it has not been executed.
After checking the error logs, I noticed this occurring regularly:
Failed to fetch instance metadata http://169.254.169.254/latest/user-data with exception The remote server returned an error: (404) Not Found.
Update 4/15/2017: For EC2Launch and Windows Server 2016 AMIs
Per AWS documentation for EC2Launch, Windows Server 2016 users can continue using the persist tags introduced in EC2Config 2.1.10:
For EC2Config version 2.1.10 and later, or for EC2Launch, you can use <persist>true</persist> in the user data to enable the plug-in after user data execution.
User data example:
<powershell>
insert script here
</powershell>
<persist>true</persist>
For subsequent boots:
Windows Server 2016 users must additionally configure and enable EC2Launch instead of EC2Config. EC2Config was deprecated on Windows Server 2016 AMIs in favor of EC2Launch.
Run the following PowerShell to schedule a Windows task that will run the user data on the next boot:
C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1 -Schedule
By design, this task is disabled after it is run for the first time. However, using the persist tag causes Invoke-UserData to schedule a separate task via Register-FunctionScheduler, to persist your user data on subsequent boots. You can see this for yourself at C:\ProgramData\Amazon\EC2-Windows\Launch\Module\Scripts\Invoke-Userdata.ps1.
Further troubleshooting:
If you're having additional issues with your user data scripts, you can find the user data execution logs at C:\ProgramData\Amazon\EC2-Windows\Launch\Log\UserdataExecution.log for instances sourced from the WS 2016 base AMI.
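To check it quickly, a one-liner sketch:

# Show the last lines of the user data execution log
Get-Content "C:\ProgramData\Amazon\EC2-Windows\Launch\Log\UserdataExecution.log" -Tail 50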
Original Answer: For EC2Config and older versions of Windows Server
User data execution is automatically disabled after the initial boot. When you created your image, it is probable that execution had already been disabled. This is configurable manually within C:\Program Files\Amazon\Ec2ConfigService\Settings\Config.xml.
The documentation for "Configuring a Windows Instance Using the EC2Config Service" suggests several options:
Programmatically create a scheduled task to run at system start using schtasks.exe /Create, and point the scheduled task to the user data script (or another script) at C:\Program Files\Amazon\Ec2ConfigServer\Scripts\UserScript.ps1.
Programmatically enable the user data plug-in in Config.xml.
Example, from the documentation:
<powershell>
# Load the EC2Config settings file
$EC2SettingsFile="C:\Program Files\Amazon\Ec2ConfigService\Settings\Config.xml"
$xml = [xml](get-content $EC2SettingsFile)
$xmlElement = $xml.get_DocumentElement()
$xmlElementToModify = $xmlElement.Plugins
# Enable the password and user-data plug-ins
foreach ($element in $xmlElementToModify.Plugin)
{
    if ($element.name -eq "Ec2SetPassword")
    {
        $element.State="Enabled"
    }
    elseif ($element.name -eq "Ec2HandleUserData")
    {
        $element.State="Enabled"
    }
}
# Save the updated settings
$xml.Save($EC2SettingsFile)
</powershell>
Starting with EC2Config version 2.1.10, you can use <persist>true</persist> to enable the plug-in after user data execution.
Example, from the documentation:
<powershell>
insert script here
</powershell>
<persist>true</persist>
Another solution that worked for me is to run Sysprep with EC2Launch.
The issue is that AWS doesn't reestablish the route to the metadata service (169.254.169.254) in your custom AMI. See the response by SanjitPatel in this post. So when I tried to use my custom AMI to create spot requests, my new instances were failing to find the user data.
Shutting down with Sysprep essentially forces AWS to redo all the setup work on the instance, as if it were running for the first time. So when you create your instance, shut it down with Sysprep, and then create your custom AMI, AWS will set up the metadata service route correctly for the new instances and execute your user data. This also avoids manually changing Windows tasks and executing user data on subsequent boots, as the persist tag does.
Here is a quick step-by-step:
Create an instance using one of the AWS Windows AMIs (Windows Server 2016 Nano Server doesn't support Sysprep), passing your desired user data (this may be optional, but it's good to make sure AWS wires the setup scripts correctly to handle user data).
Customize your instance as needed.
Shut down your instance with Sysprep: just open the EC2LaunchSettings application and click "Shutdown with Sysprep" (a scripted alternative is sketched after these steps). Full instructions here.
Create your custom AMI from the instance you just shut down.
Use your custom AMI to create other instances, passing user data on instance creation. User data will be executed on instance launch. In my case, I used Spot Request screen, which had a User Data text box.
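If you'd rather script the Sysprep step than use the EC2LaunchSettings GUI, the EC2Launch Sysprep script mentioned later in this thread should do the same thing; a sketch, assuming the standard EC2Launch install location:

# Sysprep and shut down the instance so the next boot from the
# resulting AMI runs first-boot setup (including user data) again
C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\SysprepInstance.ps1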
Hope this helps!
At the end of the initial bootstrap (user data) script, just append the persist tag as shown below. It works perfectly.
<powershell>
insert script here
</powershell>
<persist>true</persist>
For those people who got here from Google and are running a Server 2016 instance, it seems this is no longer possible.
Server 2016 doesn't have the EC2Config service, so you can't use the persist flag:
<persist>true</persist>
Described in Anthony Neace's post.
Server 2016 uses EC2Launch and I haven't yet seen how it's possible to run a script at every boot. You can run a script on the first boot, but subsequent boots will not run it.
I added the PowerShell script below to run during the AMI bake process, which helped me fix this issue. This was on Windows Server 2019.
# Paths to the EC2Launch scripts
$EC2LaunchInitInstance = "C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1"
$EC2LaunchSysprep = "C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\SysprepInstance.ps1"
# Schedule initialization (including user data) to run on the next boot
Invoke-Expression -Command "$EC2LaunchInitInstance -Schedule"
# Sysprep without shutting down, so the bake process can continue
Invoke-Expression -Command "$EC2LaunchSysprep -NoShutdown"
I'm trying to create a simple Data Pipeline with a single activity of type ShellCommandActivity. I've attached the configuration of the activity and the EC2 resource.
When I execute this, the Ec2Resource sits in the WAITING_ON_DEPENDENCIES state, then after some time changes to TIMEDOUT. The ShellCommandActivity is always in the CANCELED state. I see the instance launch and then very quickly change to the terminated state.
I've specified an S3 log file URL, but that never gets updated.
Can anyone give me any pointers? Also is there any guidance out there on debugging this?
Thanks!!
You are currently forcing your instance to shut down after 1 minute, which gives the TIMEDOUT status if the activity can't finish in that time. Try increasing it to 50 minutes.
Also make sure you are using an AMI that runs Amazon Linux and that you are using full absolute paths in your scripts.
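The shutdown timeout is the terminateAfter field on the Ec2Resource object; a sketch of the relevant fragment of a pipeline definition (the id and value here are examples, not taken from your configuration):

{
  "id": "MyEC2Resource",
  "type": "Ec2Resource",
  "terminateAfter": "50 Minutes"
}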
S3 log files are written as:
s3://bucket/folder/