How to expand multiple Properties in json output (ConvertFrom-Json/ConvertTo-Csv) - amazon-web-services

Running an AWS CLI command to export all the subnets, then running the below in PowerShell. When doing this, a couple of fields (Tags, CIDR blocks) come out as just "System.Object[]" in the CSV. If I could get them to expand or 'be selected', that would be great. Specifically, I only really want the tag "Name", but I'll take whatever is easier.
Get-Content -Path C:\Working\subnets.json |
ConvertFrom-Json |
Select-Object -expand * |
ConvertTo-Csv -NoTypeInformation |
Set-Content C:\Working\subnets12.csv
The JSON was created from: aws ec2 describe-subnets --output=json
Then I copied and pasted everything inside the { } into a text file.
I assume there is a better way to create the JSON file from a bash prompt, and it might even be possible to convert to CSV from bash instead of PowerShell.
I was expecting to be able to get a CSV that had all the details from the original JSON file.
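One way to flatten those collection-valued properties is to pick out scalar values with calculated properties before writing the CSV. A minimal sketch, assuming the file holds the unmodified describe-subnets output (so the array sits under a top-level Subnets property); the column selection is just illustrative:
# Flatten Tags and the IPv6 CIDR associations into scalar columns before exporting.
$subnets = (Get-Content -Path C:\Working\subnets.json -Raw | ConvertFrom-Json).Subnets
$subnets |
    Select-Object SubnetId, VpcId, AvailabilityZone, CidrBlock,
        @{ Name = 'Name';      Expression = { ($_.Tags | Where-Object Key -eq 'Name').Value } },
        @{ Name = 'Ipv6Cidrs'; Expression = { $_.Ipv6CidrBlockAssociationSet.Ipv6CidrBlock -join ';' } } |
    Export-Csv -Path C:\Working\subnets12.csv -NoTypeInformation
The calculated Name column pulls only the Name tag out of the Tags array, and any multi-valued CIDR associations are joined into a single delimited string, so nothing falls back to System.Object[].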

Related

AWS CLI - Put output into a readable format

So I have run the following command in my CLI and it returned values; however, they are unreadable. How would I format this into a table with a command?
# The opening of the loop is not shown in the question; presumably something like:
for i in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
  echo "Check if SSE is enabled for bucket -> ${i}"
  aws s3api get-bucket-encryption --bucket ${i} | jq -r '.ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault.SSEAlgorithm'
done
Would I need to change the command above?
You can specify an --output parameter when using the AWS CLI, or configure a default format using the aws configure command.
From Setting the AWS CLI output format - AWS Command Line Interface:
The AWS CLI supports the following output formats:
json – The output is formatted as a JSON string.
yaml – The output is formatted as a YAML string.
yaml-stream – The output is streamed and formatted as a YAML string. Streaming allows for faster handling of large data types.
text – The output is formatted as multiple lines of tab-separated string values. This can be useful to pass the output to a text processor, like grep, sed, or awk.
table – The output is formatted as a table using the characters +|- to form the cell borders. It typically presents the information in a "human-friendly" format that is much easier to read than the others, but not as programmatically useful.
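For example, the command from the question can be rendered as a table directly, with no jq step (the bucket name here is a placeholder):
aws s3api get-bucket-encryption --bucket my-example-bucket --output table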

AWS CLI filtering with JQ

I'm still trying to understand how to use JQ to get what I want. I want to get the size of all snapshots in my account older than a specific date and then add them up so that I can calculate cost. I am able to do this without the date filtering with this.
aws ec2 describe-snapshots --profile my_profile_name | jq "[.Snapshots[].VolumeSize] | add"
This returns a numerical value. Without jq, I'm also able to get a list of snapshots using "--query", but I don't think that carries over when using jq; I could be wrong.
aws ec2 describe-snapshots --profile my_profile_name --owner-ids self --query "Snapshots[?(StartTime<='2022-09-08')].[SnapshotId]"
I tried various arrangements using "select" along with my first example. However, I haven't been able to get anything returned yet. I appreciate any pointers.
This is the "select" that doesn't quite work.
aws ec2 describe-snapshots --profile my_profile_name | jq "[.Snapshots[]select(.StartTime < "2022-09-08")] | [.Snapshots[].VolumeSize] | add"
Edit 11/15/22
I was able to make progress and found a site that allows you to test jq. The example is able to select strings and numbers, but I'm having trouble with the date part. I don't understand how to interpret the date in the format that AWS provides. I can do the add part; I removed it to simplify the example.
This is the working "select" for a string. I can only do greater/less than when I use numbers and remove the quotes from the JSON section.
.Snapshots[] | select(.StartTime == "2022-11-14T23:28:39+00:00") | .VolumeSize
jq play example
It would appear that the jq expression you're looking for is:
[.Snapshots[] | select(.StartTime < "2022-09-08") | .VolumeSize] | add
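Put together with the command from the question, that might look like this (the single quotes around the filter avoid the nested-double-quote problem visible in the question; the quoting would need adjusting on Windows):
aws ec2 describe-snapshots --profile my_profile_name --owner-ids self | jq '[.Snapshots[] | select(.StartTime < "2022-09-08") | .VolumeSize] | add'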
Without an illustrative JSON, however, it's difficult to test this out; that's one of the reasons for the mcve guidelines.
A solution without jq is this:
aws ec2 describe-snapshots \
--owner-ids self \
--query "sum(Snapshots[?StartTime < '2022-11-12'].VolumeSize)"
But note that the comparison is not done "by date" but by literally comparing strings like 2016-11-02T01:25:28.000Z with 2022-11-12. That still orders correctly by date here, because these ISO-8601 UTC timestamps sort lexicographically.

automating file archival from ec2 to s3 based on last modified date

I want to write an automated job that will go through my files stored on the EC2 instance's storage and check the last-modified date. If the date is more than (x) days ago, the file should automatically get archived to my S3.
Also, I don't want to convert the file to a zip file for now.
What I don't understand is how to give the path of the EC2 instance storage, and how I put the condition on the last-modified date.
aws s3 sync your-new-dir-name s3://your-s3-bucket-name/folder-name
Please correct me if I understand this wrong: your requirement is to archive the older files. So you need a script that checks the modified time, and if a file has not been modified for X days, you make space by archiving it to S3 storage; you don't wish to keep the file locally. Is that correct?
Here is some advice: please provide OS information; this would help us suggest a shell script or a PowerShell script.
Here is a PowerShell script:
# List the files in the folder and record their full names and last-modified times.
$fileList = Get-ChildItem -Path 'C:\pathtofolder' -File
foreach ($file in $fileList) {
    $file | Select-Object -Property FullName, LastWriteTime | Export-Csv 'C:\fileAndDate.csv' -NoTypeInformation -Append
}
Then aws s3 cp to the S3 bucket.
You would do the same with a shell script.
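A minimal sketch of the age-based version in PowerShell (the folder path, bucket name, and 30-day threshold are placeholders; it assumes the AWS CLI is installed and the instance role can write to the bucket):
# Copy files that have not been modified in the last 30 days to S3.
$cutoff = (Get-Date).AddDays(-30)
Get-ChildItem -Path 'C:\pathtofolder' -File -Recurse |
    Where-Object { $_.LastWriteTime -lt $cutoff } |
    ForEach-Object {
        aws s3 cp $_.FullName "s3://your-s3-bucket-name/archive/$($_.Name)"
        # Remove-Item $_.FullName   # uncomment to delete the local copy after upload
    }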
Using aws s3 sync is a great way to back up files to S3. You could use a command like:
aws s3 sync /home/ec2-user/ s3://my-bucket/ec2-backup/
The first parameter (/home/ec2-user/) is where you can specify the source of the files. I recommend only backing-up user-created files, not the whole operating system.
There is no capability for specifying a number of days. I suggest you just copy all files.
You might choose to activate Versioning to keep copies of all versions of files in S3. This way, if a file gets overwritten you can still go back to a prior version. (Storage charges will apply for all versions kept in S3.)

Unable to save/modify imported build definition in VSTS

I am attempting to export a build from one project to another. Projects are in different collections. I have collection admin so perms should be good, but just to be sure I granted myself build and project admin.
I exported the build as JSON using the VSTS UI in the source project, then imported it into the target project. All the tasks are present, but the parameters are grayed out. I also cannot enable/disable tasks. There are some parameters that need to be filled in, such as the build agent; I was able to select the appropriate agent. I have no outstanding items at this point that the UI indicates I would need to address prior to saving. The save, discard, and queue options are all grayed out.
I can add a new phase, but I can't add any tasks to that phase. I also tried bringing up the yaml and compared it to the yaml in the source project, no differences.
Why can't I save my imported build definition?
Import succeeded after replacing all instances of the project and collection IDs in the JSON I was attempting to import. At the time of this post, those changes must be made manually.
UPDATE:
I tried just removing the offending properties rather than replacing them and that worked. I created a simple script to clear out those properties:
Param(
    [parameter(Mandatory=$true)]
    [alias("p")]
    $path
)

# Properties tied to the source project/collection that block the import.
$removeProperties = @("triggers","metrics","_links","authoredBy","queue","project")

# Load the exported definition, strip those properties, and write it back in place.
$json = Get-Content -Path $path -Raw | ConvertFrom-Json
foreach ($property in $removeProperties) {
    $json.PSObject.Properties.Remove($property)
}
$json | ConvertTo-Json -Depth 100 | Set-Content -Path $path -Force
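Saved as, say, Clean-BuildDefinition.ps1 (the script name and path here are just illustrative), it can be run against the exported JSON before importing it:
.\Clean-BuildDefinition.ps1 -p C:\exports\build-definition.json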

Does an EMR master node know its cluster ID?

I want to be able to create EMR clusters, and for those clusters to send messages back to some central queue. In order for this to work, I need to have some sort of agent running on each master node. Each one of those agents will have to identify itself in this message so that the recipient knows which cluster the message is about.
Does the master node know its ID (j-*************)? If not, then is there some other piece of identifying information that could allow the message recipient to infer this ID?
I've taken a look through the config files in /home/hadoop/conf, and I haven't found anything useful. I found the ID in /mnt/var/log/instance-controller/instance-controller.log, but it looks like it'll be difficult to grep for. I'm wondering where instance-controller might get that ID from in the first place.
You may look at /mnt/var/lib/info/ on the master node to find a lot of info about your EMR cluster setup. More specifically, /mnt/var/lib/info/job-flow.json contains the jobFlowId, i.e. the cluster ID.
You can use the pre-installed json parser (jq) to get the jobflow id.
cat /mnt/var/lib/info/job-flow.json | jq -r ".jobFlowId"
(updated as per @Marboni)
You can use the Amazon EC2 API to figure this out. The example below uses shell commands for simplicity; in real life you should use the appropriate API to do these steps.
First you should find out your instance ID:
INSTANCE=`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`
Then you can use your instance ID to find out the cluster id :
ec2-describe-instances $INSTANCE | grep TAG | grep aws:elasticmapreduce:job-flow-id
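ec2-describe-instances comes from the old EC2 API tools; with the current AWS CLI the equivalent lookup would be along these lines (a sketch; it assumes the instance's role is allowed to call ec2:DescribeInstances):
aws ec2 describe-instances --instance-ids $INSTANCE --query "Reservations[].Instances[].Tags[?Key=='aws:elasticmapreduce:job-flow-id'].Value" --output text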
Hope this helps.
As specified above, the information is in the job-flow.json file. This file has several other attributes, so, knowing where it's located, you can do it in a very easy way:
cat /mnt/var/lib/info/job-flow.json | grep jobFlowId | cut -f2 -d: | cut -f2 -d'"'
Edit: This command works on core nodes as well.
Another option - query the metadata server:
curl -s http://169.254.169.254/2016-09-02/user-data/ | sed -r 's/.*clusterId":"(j-[A-Z0-9]+)",.*/\1/g'
Apparently the Hadoop MapReduce job has no way to know which cluster it is running on - I was surprised to find this out myself.
BUT: you can use other identifiers for each map to uniquely identify the mapper which is running, and the job that is running.
These are specified in the environment variables passed on to each mapper. If you are writing a job in Hadoop streaming, using Python, the code would be:
import os

if 'map_input_file' in os.environ:
    fileName = os.environ['map_input_file']
if 'mapred_tip_id' in os.environ:
    mapper_id = os.environ['mapred_tip_id'].split("_")[-1]
if 'mapred_job_id' in os.environ:
    jobID = os.environ['mapred_job_id']
That gives you: input file name, the task ID, and the job ID. Using one or a combination of those three values, you should be able to uniquely identify which mapper is running.
If you are looking for a specific job: "mapred_job_id" might be what you want.