AWS PowerShell - Get-CWMetricStatistics for EBS Read IOPS

I'm having some issues getting IOPS stats for EBS volumes, using this code:
Get-CWMetricList -Namespace AWS/EC2 |Select-Object * -Unique
Get-CWMetricList -Namespace AWS/EBS |Select-Object * -Unique
$StartDate = (Get-Date).AddDays(-3)
$EndDate = Get-Date
$ReadIOPS = Get-CWMetricStatistics -Namespace "AWS/EC2" -MetricName "DiskReadOps" -UtcStartTime $StartDate -UtcEndTime $EndDate -Period 300 -Statistics #("Average")
$ReadIOPS.Datapoints.Count
$ReadIOPS = Get-CWMetricStatistics -Namespace "AWS/EBS" -MetricName "VolumeReadOps" -UTCStartTime $StartDate -UTCEndTime $EndDate -Period 300 -Statistics #("Average")
$ReadIOPS.Datapoints.Count
The top two lines show that the namespace/metric names are correct. The rest should show that the first query, in the AWS/EC2 namespace, returns data, while the second, in the AWS/EBS namespace, doesn't.
The ultimate goal is to add a -Dimension tag and grab all read/write IOPS for a particular volume. This is why the AWS/EC2 namespace doesn't work for me, as I need to specify a volume ID rather than an instance ID.
Any ideas why I'm not picking up any datapoints in the latter query?

It turns out that EBS stats require a volume ID to be specified, although this isn't called out anywhere and no error is raised when it's missing.
I had stripped out the dimension to cast as wide a net as possible and get back to basics while troubleshooting. Adding it back in fixed the issue.
i.e. this works:
$Volume = 'vol-blah'
$dimension1 = New-Object Amazon.CloudWatch.Model.Dimension
$dimension1.set_Name("VolumeId")
$dimension1.set_Value($Volume)
$ReadIOPS = Get-CWMetricStatistics -Namespace "AWS/EBS" -MetricName "VolumeReadOps" -UTCStartTime $StartDate -UTCEndTime $EndDate -Period 300 -Statistics @("Average") -Dimension $dimension1
$ReadIOPS.Datapoints.Count
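For the read/write goal mentioned in the question, the same dimension object can simply be reused with the write-side metric. A minimal, untested sketch reusing $dimension1, $StartDate and $EndDate from above:
# VolumeWriteOps is the write-side counterpart of VolumeReadOps in the AWS/EBS namespace
$WriteIOPS = Get-CWMetricStatistics -Namespace "AWS/EBS" -MetricName "VolumeWriteOps" -UTCStartTime $StartDate -UTCEndTime $EndDate -Period 300 -Statistics @("Average") -Dimension $dimension1
$WriteIOPS.Datapoints.Count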

Related

Parse lines from messages in a Splunk query to be displayed as a chart on a dashboard

I generate events on multiple computers that list service names that aren't running. I want to make a chart that displays the top offending service names.
I can use the following to get a table for the dashboard:
ComputerName="*.ourDomain.com" sourcetype="WinEventLog:Application" EventCode=7223 SourceName="internalSystem"
| eval Date_Time=strftime(_time, "%Y-%m-%d %H:%M")
| table host, Date_Time, Message, EventCode
Typical Message(s) will contain:
The following services were not running after 5603 seconds and a start command has been sent:
Service1
Service2
The following services were not running after 985 seconds and a start command has been sent:
Service2
Service3
Using regex I can make a named group of everything but the first line with (?<Services>((?<=\n)).*)
However, I don't think this is the right approach, as I don't know how to turn this into counts for the chart.
So in essence, how do I grab and tally service names from messages in Splunk?
Edit 1:
Coming back to this after a few days.
I created a field extraction called "Services" with regex that grabs the contents of each message after the first line.
If I use | stats count BY Services it counts each message as a whole instead of the lines inside. The results look like this:
Service1 Service2 | Count: 1
Service2 Service3 | Count: 1
My intention is to have it treat each line as its own value so the results would look like:
Service1 | Count: 1
Service2 | Count: 2
Service3 | Count: 1
I tried | mvexpand Services but it didn't change the output so I assume I'm either using it improperly or it's not applicable here.
I think you can do it with the stats command.
| stats count by service
will give the number of appearances for each service. You can then choose the bar chart visualization to create a graph.
I ended up using split() and mvexpand to solve this problem.
This is what worked in the end:
My search
| eval events=split(Service, "
")
| mvexpand events
| eval events=replace(events, "[\n\r]", "")
| stats count BY events
I had to add the replace() because any event with just one service listed was being treated differently from an event with multiple services: after the split, each service in a multi-service event still had a trailing carriage return, hence the replace.
My end result was a dashboard chart built from these counts (screenshot not included).
For a chart drop-down that is clean:
index="yourIndex" "<searchCriteria>" | stats count(eval(searchmatch("
<searchCriteria>"))) as TotalCount
count(eval(searchmatch("search1"))) as Name1
count(eval(searchmatch("search2" ))) as Name2
count(eval(searchmatch("search3"))) as Name3
| transpose 5
| rename column as "Name", "row 1" as "Count"
Horizontal table example with percentages:
index=something "Barcode_Fail" OR "Barcode_Success"
| stats count(eval(searchmatch("Barcode_Success"))) as SuccessCount
    count(eval(searchmatch("Barcode_Fail"))) as FailureCount
    count(eval(searchmatch("Barcode_*"))) as Totals
| eval Failure_Rate=FailureCount/Totals
| eval Success_Rate=SuccessCount/Totals

AWS EC2 spot instance availability

I am using the API call request_spot_instances to create spot instances without specifying an availability zone. Normally a random AZ is picked by the API. The spot request sometimes returns a no-capacity status, whereas I can successfully request a spot instance through the AWS console in another AZ. What is the proper way to check the availability of spot capacity for a specific instance type before calling request_spot_instances?
There is no public API to check Spot Instance availability. Having said that, you can still achieve what you want by following the below steps:
Use request_spot_fleet instead, and configure it to launch a single instance.
Be flexible with the instance types you use: pick as many as you can and include them in the request. To help you pick the instances, check the Spot Instance Advisor for interruption and savings rates.
In the Spot Fleet request, set AllocationStrategy to capacityOptimized; this allows the fleet to allocate capacity from the most available Spot pools in your instance list and reduces the likelihood of Spot interruptions.
Don't set a max price (SpotPrice); the default Spot instance price will be used. The pricing model for Spot has changed and is no longer based on bidding, so Spot prices are more stable and don't fluctuate. A rough PowerShell sketch of these steps follows below.
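The question itself uses boto3, but since most of this thread is PowerShell, here is an untested sketch of those steps with the AWS Tools for PowerShell (Request-EC2SpotFleet). The AMI ID, subnet ID, instance types and fleet role ARN are placeholders to replace with your own, and the property names should be checked against your installed module/SDK version:
# Sketch: single-instance Spot Fleet request, several instance types, capacityOptimized allocation, no SpotPrice set
$config = New-Object Amazon.EC2.Model.SpotFleetRequestConfigData
$config.IamFleetRole = 'arn:aws:iam::111111111111:role/aws-ec2-spot-fleet-tagging-role'   # placeholder role ARN
$config.TargetCapacity = 1                    # launch a single instance
$config.AllocationStrategy = 'capacityOptimized'
$config.Type = 'request'                      # one-off request, not a maintained fleet
$config.LaunchSpecifications = New-Object 'System.Collections.Generic.List[Amazon.EC2.Model.SpotFleetLaunchSpecification]'
# Be flexible: include several instance types the workload can run on
foreach ($type in @('t3.large', 't3a.large', 'm5.large')) {
    $spec = New-Object Amazon.EC2.Model.SpotFleetLaunchSpecification
    $spec.ImageId = 'ami-00000000000000000'   # placeholder AMI
    $spec.SubnetId = 'subnet-00000000'        # placeholder subnet
    $spec.InstanceType = $type
    $config.LaunchSpecifications.Add($spec)
}
Request-EC2SpotFleet -SpotFleetRequestConfig $config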
This may be a bit overkill for what you are looking for, but with parts of the code below you can find the spot price history for the last hour (this can be changed). It'll give you the instance type, AZ, and additional information. From there you can loop through the instance types by AZ. If a spot instance doesn't come up in, say, 30 seconds, try the next AZ.
And to Ahmed's point in his answer, this information can be used in the spot fleet request instead of looping through the AZs. If you pass the wrong AZ or subnet in the spot fleet request, it may pass the dry-run API call but still fail the real call. Just a heads up on that if you are using the DryRun parameter.
Here's the output of the code that follows:
In [740]: df_spot_instance_options
Out[740]:
AvailabilityZone InstanceType SpotPrice MemSize vCPUs CurrentGeneration Processor
0 us-east-1d t3.nano 0.002 512 2 True [x86_64]
1 us-east-1b t3.nano 0.002 512 2 True [x86_64]
2 us-east-1a t3.nano 0.002 512 2 True [x86_64]
3 us-east-1c t3.nano 0.002 512 2 True [x86_64]
4 us-east-1d t3a.nano 0.002 512 2 True [x86_64]
.. ... ... ... ... ... ... ...
995 us-east-1a p2.16xlarge 4.320 749568 64 True [x86_64]
996 us-east-1b p2.16xlarge 4.320 749568 64 True [x86_64]
997 us-east-1c p2.16xlarge 4.320 749568 64 True [x86_64]
998 us-east-1d p2.16xlarge 14.400 749568 64 True [x86_64]
999 us-east-1c p3dn.24xlarge 9.540 786432 96 True [x86_64]
[1000 rows x 7 columns]
And here's the code:
import boto3
import pandas as pd
from datetime import datetime, timedelta

ec2c = boto3.client('ec2')
ec2r = boto3.resource('ec2')
#### The rest of this code maps the instance details to spot price in case you are looking for certain memory or cpu
paginator = ec2c.get_paginator('describe_instance_types')
response_iterator = paginator.paginate( )
df_hold_list = []
for page in response_iterator:
    df_hold_list.append(pd.DataFrame(page['InstanceTypes']))
df_instance_specs = pd.concat(df_hold_list, axis=0).reset_index(drop=True)
df_instance_specs['Spot'] = df_instance_specs['SupportedUsageClasses'].apply(lambda x: 1 if 'spot' in x else 0)
df_instance_spot_specs = df_instance_specs.loc[df_instance_specs['Spot']==1].reset_index(drop=True)
# unpack memory and cpu dictionaries
df_instance_spot_specs['MemSize'] = df_instance_spot_specs['MemoryInfo'].apply(lambda x: x.get('SizeInMiB'))
df_instance_spot_specs['vCPUs'] = df_instance_spot_specs['VCpuInfo'].apply(lambda x: x.get('DefaultVCpus'))
df_instance_spot_specs['Processor'] = df_instance_spot_specs['ProcessorInfo'].apply(lambda x: x.get('SupportedArchitectures'))
# list all spot-capable instance types (note: no memory/cpu filter is applied here)
instance_list = df_instance_spot_specs['InstanceType'].unique().tolist()
#---------------------------------------------------------------------------------------------------------------------
# You can use this section by itself to get the instance type and availability zone and loop through the instances you want;
# just modify instance_list with the instance(s) you want information for
#look only in us-east-1
client = boto3.client('ec2', region_name='us-east-1')
prices = client.describe_spot_price_history(
    InstanceTypes=instance_list,
    ProductDescriptions=['Linux/UNIX', 'Linux/UNIX (Amazon VPC)'],
    StartTime=(datetime.now() - timedelta(hours=1)).isoformat(),
    # AvailabilityZone='us-east-1a'
    MaxResults=1000)
df_spot_prices = pd.DataFrame(prices['SpotPriceHistory'])
df_spot_prices['SpotPrice'] = df_spot_prices['SpotPrice'].astype('float')
df_spot_prices.sort_values('SpotPrice', inplace=True)
#---------------------------------------------------------------------------------------------------------------------
# merge memory size and cpu information into this dataframe
df_spot_instance_options = df_spot_prices[['AvailabilityZone', 'InstanceType', 'SpotPrice']].merge(
    df_instance_spot_specs[['InstanceType', 'MemSize', 'vCPUs', 'CurrentGeneration', 'Processor']],
    left_on='InstanceType', right_on='InstanceType')

AWS glue delete all partitions

I defined several tables in AWS glue.
Over the past few weeks, I've had different issues with the table definitions which I had to fix manually: changing column names, or types, or the serialization lib. However, if I already have partitions created, repairing the table doesn't change them, so I have to delete all partitions manually and then repair.
Is there a simple way to do this, i.e. delete all partitions from an AWS Glue table?
I'm using the aws glue batch-delete-partition CLI command, but its syntax is tricky, there are limits on the number of partitions you can delete in one go, and the whole thing is cumbersome...
For now, I found this command-line solution, running aws glue batch-delete-partition iteratively for batches of 25 partitions using xargs
(here I am assuming there are at most 1000 partitions):
aws glue get-partitions --database-name=<my-database> --table-name=<my-table> | jq -cr '[ { Values: .Partitions[].Values } ]' > partitions.json
seq 0 25 1000 | xargs -I _ bash -c "cat partitions.json | jq -c '.[_:_+25]'" | while read X; do aws glue batch-delete-partition --database-name=<my-database> --table-name=<my-table> --partitions-to-delete=$X; done
Hope it helps someone, but I'd prefer a more elegant solution
Using python3 with boto3 looks a little bit nicer. Albeit not by much :)
Unfortunately, AWS doesn't provide a way to delete all partitions without batching them 25 at a time. Note that this will only delete the first page of partitions retrieved.
import boto3
glue_client = boto3.client("glue", "us-west-2")
def get_and_delete_partitions(database, table, batch=25):
    partitions = glue_client.get_partitions(
        DatabaseName=database,
        TableName=table)["Partitions"]
    for i in range(0, len(partitions), batch):
        to_delete = [{k: v[k]} for k, v in zip(["Values"] * batch, partitions[i:i + batch])]
        glue_client.batch_delete_partition(
            DatabaseName=database,
            TableName=table,
            PartitionsToDelete=to_delete)
EDIT: To delete all partitions (beyond just the first page), using paginators makes it look cleaner.
import boto3
glue_client = boto3.client("glue", "us-west-2")
def delete_partitions(database, table, partitions, batch=25):
    for i in range(0, len(partitions), batch):
        to_delete = [{k: v[k]} for k, v in zip(["Values"] * batch, partitions[i:i + batch])]
        glue_client.batch_delete_partition(
            DatabaseName=database,
            TableName=table,
            PartitionsToDelete=to_delete)

def get_and_delete_partitions(database, table):
    paginator = glue_client.get_paginator('get_partitions')
    itr = paginator.paginate(DatabaseName=database, TableName=table)
    for page in itr:
        delete_partitions(database, table, page["Partitions"])
Here is a PowerShell version FWIW:
$database = 'your db name'
$table = 'your table name'
# Set the variables above
$batch_size = 25
Set-DefaultAWSRegion -Region eu-west-2
$partition_list = Get-GLUEPartitionList -DatabaseName $database -TableName $table
$selected_partitions = $partition_list
# Uncomment and edit predicate to select only certain partitions
# $selected_partitions = $partition_list | Where-Object {$_.Values[0] -gt '2020-07-20'}
$selected_values = $selected_partitions | Select-Object -Property Values
for ($i = 0; $i -lt $selected_values.Count; $i += $batch_size) {
$chunk = $selected_values[$i..($i + $batch_size - 1)]
Remove-GLUEPartitionBatch -DatabaseName $database -TableName $table -PartitionsToDelete $chunk -Force
}
# Now run `MSCK REPAIR TABLE db_name.table_name` to add the partitions again
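If you want to kick off that last step from PowerShell too, something along these lines should work with the Athena cmdlets (Start-ATHQueryExecution). This is only a sketch: the S3 output location is a placeholder, and the flattened parameter names should be verified against your installed module version.
# Re-create the partitions by running MSCK REPAIR TABLE through Athena
Start-ATHQueryExecution -QueryString "MSCK REPAIR TABLE $database.$table" `
    -QueryExecutionContext_Database $database `
    -ResultConfiguration_OutputLocation 's3://my-athena-results-bucket/glue-repair/'   # placeholder bucket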

ESXi VM snapshot creation | PowerCLI

I'm trying to automate snapshot creation after checking datastore (DS) free space. It's getting tricky for VMs with multiple datastores attached: the script takes multiple snapshots for such VMs if the condition is satisfied for each one. Please help me understand where it's going wrong.
Consolidating free space:
$free = (Get-Datastore -VM $vm | Select @{N="FreeSpace";E={[math]::Round(($_.FreeSpaceMB)*100/($_.CapacityMB),0)}})
Now checking if free space is available in each DS the VM is connected to:
foreach ($ds in $free.FreeSpace)
{
if ($ds -gt 25)
{
get-vm $vm | new-snapshot -name "$cmr.$date" -Description $description
}
}
If I understand the question properly regarding dealing with multiple datastores, I would look at introducing a Sort-Object on the FreeSpaceMB property after the initial Get-Datastore, then selecting only the first datastore (which should have the least amount of free space available) and performing your calculation based on that.
Untested example:
$free = (Get-Datastore -VM $vm | Sort-Object -Property FreeSpaceMB | Select-Object -Property @{N="FreeSpace";E={[math]::Round(($_.FreeSpaceMB)*100/($_.CapacityMB),0)}} -First 1)
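Putting that together with the original condition, a rough, untested sketch that takes at most one snapshot per VM, gated on its fullest datastore, could look like this (reusing $vm, $cmr, $date and $description from the question):
# Percentage free on the datastore with the least free space
$minFree = Get-Datastore -VM $vm |
    Sort-Object -Property FreeSpaceMB |
    Select-Object -First 1 -Property @{N="FreeSpace";E={[math]::Round(($_.FreeSpaceMB)*100/($_.CapacityMB),0)}}
if ($minFree.FreeSpace -gt 25) {
    # Even the fullest datastore has more than 25% free, so take a single snapshot
    Get-VM $vm | New-Snapshot -Name "$cmr.$date" -Description $description
}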

Powershell disk monitoring

I have created the following script that returns the percentage of free space, the total space, etc. of every disk on my remote servers. The problem is that I want an extra column, "Warning", that prints Yes or No depending on whether the free space is below 10%. I tried an if statement but with no success. Please help.
Get-WmiObject Win32_LogicalDisk -Filter "DriveType=3" -ComputerName (Get-Content .\servers.txt) |
    Select-Object SystemName, DeviceID, VolumeName,
        @{Name="Size(GB)";Expression={"{0:N1}" -f ($_.Size/1GB)}},
        @{Name="FreeSpace(GB)";Expression={"{0:N1}" -f ($_.FreeSpace/1GB)}},
        @{Name="% Free";Expression={"{0:N1}" -f (($_.FreeSpace/$_.Size)*100)}},
        @{Name="Warning";Expression={????????}} |
    Format-Table -AutoSize | Out-File disk_monitor.txt
You can try something like
#{Name="Warning";Expression={ if((100 / $_.Size * $_.FreeSpace) -lt 10) { "Yes" } else { "No" }} };
This will calculate what percentage of disk space is available (100 / Size * FreeSpace) and, if it's less than 10 (as in, 10 percent), will return "Yes"; otherwise "No".
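For completeness, plugged into the original pipeline the calculated column would look roughly like this (untested sketch based on the snippet above):
Get-WmiObject Win32_LogicalDisk -Filter "DriveType=3" -ComputerName (Get-Content .\servers.txt) |
    Select-Object SystemName, DeviceID, VolumeName,
        @{Name="Size(GB)";Expression={"{0:N1}" -f ($_.Size/1GB)}},
        @{Name="FreeSpace(GB)";Expression={"{0:N1}" -f ($_.FreeSpace/1GB)}},
        @{Name="% Free";Expression={"{0:N1}" -f (($_.FreeSpace/$_.Size)*100)}},
        @{Name="Warning";Expression={ if ((100 / $_.Size * $_.FreeSpace) -lt 10) { "Yes" } else { "No" } }} |
    Format-Table -AutoSize | Out-File disk_monitor.txt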