Is it possible to use the aws-cli start-query function without start-time/end-time? - amazon-web-services

I'm trying to use the aws logs start-query function, but I need something more dynamic than start-time/end-time (with Unix timestamps), like the last 5 minutes or something similar. Is this possible?

The AWS CLI doesn't offer a relative option like "last X minutes" for logs, regardless of which command you use to find them. But start-time and end-time are a fully flexible way to get logs - you just need to pass the proper values.
That means you can create your own script doing exactly what you need, i.e. it can calculate the start and end times and pass them to start-query.
Example of a simple calculation of start_time and end_time in bash:
#!/bin/bash
declare -i start_time
declare -i end_time
declare -i last_minutes
declare -i last_seconds
# "now" in epoch seconds, which is what start-query expects
end_time=$(date +%s)
# number of minutes to look back, passed as the first argument
last_minutes=$1
last_seconds=$last_minutes*60
start_time=$end_time-$last_seconds
echo "$start_time"
echo "$end_time"
You can invoke this script with the number of minutes to look back and it will calculate start_time and end_time. Then replace the echo statements with the actual command you need, e.g. aws logs start-query --start-time $start_time --end-time $end_time. You can add other options to the script depending on your needs as well.
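If your wrapper ends up living in Python anyway, the same calculation can feed boto3's start_query directly. This is just a sketch; the log group name and query string below are placeholders, not something from the question:
import time
import boto3

logs = boto3.client("logs")

def start_query_last_minutes(minutes, log_group, query):
    # start-query takes epoch seconds for both bounds
    end_time = int(time.time())
    start_time = end_time - minutes * 60
    return logs.start_query(
        logGroupName=log_group,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )

# Placeholder log group and query
response = start_query_last_minutes(
    5, "/aws/lambda/my-function", "fields @timestamp, @message | limit 20"
)
print(response["queryId"])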

Related

Trying to run a gcloud command that would list VM instances and show how many days old they are since their creation date

I'm trying to run a gcloud command that would list VM instances and show how many days old they are since creation date.
So far I only have the command below, but I don't know if there is a way to order them by how many days old they are since their creation date. I've been trying to add some sort of filter but haven't managed to work it out.
If there is any advice it would be much appreciated.
gcloud projects list --format="table[box,title='My Project List'](createTime:sort=1,name,projectNumber,projectId:label=ProjectID,parent.id:label=Parent)"
Cloud SDK (gcloud) includes projections for date() and duration(), but these would benefit from clearer documentation. I want, for example, to think that unit=1 returns the result as seconds (which would make what follows easier), but it does not (and I can't work out what it actually does; anyone?).
unit=1: I wonder whether this just unifies Timestamps to second precision? Since a Timestamp can have nanosecond precision, perhaps date(..., unit=1, ...) just always rounds to seconds?
Even though gcloud (and many other CLIs) attempt to provide this smorgasbord of functionality, it's often better to follow the UNIX credo and assemble a solution from parts.
createTime values are Timestamps, one of Google's well-known protobuf types.
Purely bash:
# createTime Timestamps for all my projects
TIMESTAMPS=$(\
  gcloud projects list \
  --format="value(createTime.date())")
# In seconds
NOW=$(date +%s)
for TIMESTAMP in ${TIMESTAMPS}
do
  # Parse the Google Timestamp and return seconds
  CREATED=$(date +%s --date="${TIMESTAMP}")
  # Difference in seconds since ${NOW}
  DIFF=$((${NOW}-${CREATED}))
  # Seconds --> Days
  DAYS=$((${DIFF}/3600/24))
  echo ${DAYS}
done
duration() may be useful, but I'm unfamiliar with these ISO 8601 durations:
gcloud projects list \
--format="value(createTime.duration())"
A more complete example:
# Multiple values are easier to parse as CSV
VALUES=$(\
  gcloud projects list \
  --format="csv[no-heading](projectId,createTime.date())")
# In seconds
NOW=$(date +%s)
for VALUE in ${VALUES}
do
  # Extract PROJECT_ID and TIMESTAMP from VALUE
  IFS=, read PROJECT_ID TIMESTAMP <<< "${VALUE}"
  # Parse the Google Timestamp and return seconds
  CREATED=$(date +%s --date="${TIMESTAMP}")
  # Difference in seconds since ${NOW}
  DIFF=$((${NOW}-${CREATED}))
  # Seconds --> Days
  DAYS=$((${DIFF}/3600/24))
  printf "%s\t%s\n" "${DAYS}" "${PROJECT_ID}"
done

Is there a way to find out the number of functions present in AWS Lambda?

I'm using the Boto3 Python module to communicate with AWS Lambda. I want to find out from code how many functions are present in the account. There are functions that list the functions, create paginators, and get a particular function, but is there a function that returns the total count of functions present in Lambda?
I'm writing code that parses through every Lambda function. I want to show a progress bar in the terminal indicating how many functions have been covered so far, so that the user gets a rough estimate of how much longer execution will take.
No, there is no function that simply returns a count of AWS Lambda functions in an account. (In fact, I don't recall seeing any AWS API calls that simply return counts in any of the AWS services.)
You would need to use list_functions(), but it only returns a maximum of 50 functions. If the list_functions() call returns a NextMarker value, then call the function again with that value in the Marker parameter.
The ListFunctions paginator can do this for you, but it will still involve an API call for every 50 results.
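As a minimal sketch of that counting loop with the paginator (boto3 handles the Marker/NextMarker handoff for you, but it still makes one ListFunctions call per page):
import boto3

lambda_client = boto3.client("lambda")

# The paginator repeatedly calls ListFunctions until NextMarker runs out
paginator = lambda_client.get_paginator("list_functions")

function_count = 0
for page in paginator.paginate():
    function_count += len(page["Functions"])

print(function_count)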
Using aws-cli and jq you can do something like:
aws lambda list-functions --max-items=10000 | jq -r '.Functions' | jq length
which gets all the functions in the account and returns them in the response (as a JSON array); piping the result through jq extracts the Functions array and counts the number of items in it.
Note that the number of Lambda functions in an account is usually fairly static, so if you want to report the percentage of functions you have finished processing, it's good enough to make this call only once in a while and cache the result, since it takes a few seconds to read all these resources!
I found that we can get the number of Lambda functions in an account using Boto3. Although the answer by John Rotenstein is one way to list all the functions, when you have tens of thousands of them it gets very slow. If you're looking for just the total number of Lambda functions, you can use the get_account_settings() function, which returns:
{
    'AccountLimit': {
        'TotalCodeSize': 123,
        'CodeSizeUnzipped': 123,
        'CodeSizeZipped': 123,
        'ConcurrentExecutions': 123,
        'UnreservedConcurrentExecutions': 123
    },
    'AccountUsage': {
        'TotalCodeSize': 123,
        'FunctionCount': 123
    }
}
You can find FunctionCount under AccountUsage.
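A minimal sketch of reading it with boto3 (a single API call, no pagination needed):
import boto3

lambda_client = boto3.client("lambda")

# One call returns account-wide limits and usage
settings = lambda_client.get_account_settings()
print(settings["AccountUsage"]["FunctionCount"])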

What gcloud command can be used to obtain a list of default compute engine service accounts across your organisation?

I have tried this command:
gcloud alpha scc assets list <ORGANISATION-ID> --filter "security_center_properties.resource.type="google.iam.ServiceAccount" AND resource_properties.name:\"Compute Engine default service account\""
but I am receiving the following error:
(gcloud.alpha.scc.assets.list) INVALID_ARGUMENT: Invalid filter.
When I remove the filter after AND, I don't get an error message, but I just see a >
Any ideas where I am going wrong?
I have reviewed this documentation to support me building the command but not sure which is the right filter to use.
I wonder if I should be filtering on the email of a Compute Engine default service account, which ends in "-compute@developer.gserviceaccount.com", but I can't identify the right filter for this.
The problem is the use of double quotes in the filter.
You need to type --filter and then give the filter like this: "FILTER_EXPRESSION".
One filter expression could be: security_center_properties.resource_type="google.compute.Instance"
But you cannot put a double quote inside a double-quoted block, so you need to escape it with a backslash (\); otherwise the shell interprets the first inner double quote as the end of the filter.
On the other hand, if you delete part of the command, the prompt shows '>' because there is an unterminated double-quoted block and the shell is waiting for you to finish the command.
So the filter that you want has to be like this, for example:
gcloud alpha scc assets list <ORGANIZATION ID> \
--filter "security_center_properties.resource_type=\"google.compute.Instance\" AND security_center_properties.resource_type=\"google.cloud.resourcemanager.Organization\""
I hope this explanation helps!

How to pass arguments to streaming job on Amazon EMR

I want to produce the output of my map function, filtering the data by dates.
In local tests, I simply call the application, passing the dates as parameters:
cat access_log | ./mapper.py 20/12/2014 31/12/2014 | ./reducer.py
Then the parameters are read in the map function:
#!/usr/bin/python
import sys

# Date range passed as command-line arguments
date1 = sys.argv[1]
date2 = sys.argv[2]
The question is:
How do I pass the date parameters to the mapper call on Amazon EMR?
I am a beginner in MapReduce and will appreciate any help.
First of all, when you run a local test (and you should, as often as possible), the correct format, in order to reproduce how MapReduce works, is:
cat access_log | ./mapper.py 20/12/2014 31/12/2014 | sort | ./reducer.py | sort
That is the way the Hadoop framework works.
If you are working with a big file, you should do it in steps to verify the results of each stage, meaning:
cat access_log | ./mapper.py 20/12/2014 31/12/2014 > map_result.txt
cat map_result.txt | sort > map_result_sorted.txt
cat map_result_sorted.txt | ./reducer.py > reduce_result.txt
cat reduce_result.txt | sort > map_reduce_result.txt
In regard to your main question:
It's the same thing.
If you are going to use the Amazon EMR web console to create your cluster, in the Add Step window you just write the following:
name: learning amazon emr
Mapper: (here they ask for the S3 path to your mapper; we will ignore that and just write our script name and parameters, no backslash...) mapper.py 20/12/2014 31/12/2014
Reducer: (the same as for the mapper) reducer.py (you can add params here too)
Input location: ...
Output location: ... (just remember to use a new output location every time, or your task will fail)
Arguments: -files s3://cod/mapper.py,s3://cod/reducer.py (use your own file paths here; even if you add only one file, use the -files argument)
That's it
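If you would rather add the step from code than from the console, here is a rough boto3 sketch. The cluster id and the input/output locations are placeholders, and running the streaming job through command-runner.jar is an assumption that should hold on recent EMR releases:
import boto3

emr = boto3.client("emr")

response = emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",  # placeholder cluster id
    Steps=[{
        "Name": "learning amazon emr",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # Streaming steps are typically launched via command-runner.jar
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-files", "s3://cod/mapper.py,s3://cod/reducer.py",
                "-mapper", "mapper.py 20/12/2014 31/12/2014",
                "-reducer", "reducer.py",
                "-input", "s3://cod/input/",       # placeholder
                "-output", "s3://cod/output/run1"  # placeholder, must be new each run
            ],
        },
    }],
)
print(response["StepIds"])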
If you are going to go further with argument passing, I suggest you read this guy's blog post on how to pass arguments so that you only need a single map/reduce file.
Hope it helped.

rrdtool xport - limit on DEFs

I have a script that generates command-line invocations of rrdtool xport based on input provided in a domain-specific language. This works well until the number of DEFs in the command line exceeds a certain number; it seems to be around 50. At that point the command simply returns without any output or error information.
Is there a limit on the number of DEFs in rrdtool xport? If so, can it be raised or circumvented?
The issue turned out to be the character limit on the command line sent to the shell via Python's os.system call. It can be worked around by creating a temporary executable script, writing the command line to that script, and executing it.
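A rough sketch of that workaround; the rrdtool command string below is just a stand-in for the generated xport invocation with its many DEFs:
import os
import stat
import tempfile

def run_long_command(command_line):
    # Write the very long command line into a temporary executable script
    # instead of handing it to the shell directly via os.system
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as script:
        script.write("#!/bin/sh\n")
        script.write(command_line + "\n")
        path = script.name
    # Mark the script executable, run it, then clean up
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    try:
        return os.system(path)
    finally:
        os.remove(path)

# Stand-in for the generated 'rrdtool xport ... DEF:... XPORT:...' command
run_long_command("rrdtool xport --start -1h DEF:a=data.rrd:value:AVERAGE XPORT:a:avg")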