Amazon CloudWatch Logs, parse datetime result of a function - amazon-web-services

In our logs, uptime is reported in milliseconds, and I am trying to format it in a CloudWatch Logs Insights query. But I found out that it is not possible to parse the result of a function; only the field values themselves can be parsed. Example:
fields
req.stats.uptime,
fromMillis(req.stats.uptime) as tstamp,
"1970-01-03T22:53:01.000+01:00" as tz
| sort @timestamp desc
| limit 1
| fields @timestamp
| parse tstamp "T*:*:*." as H, M, S
| filter req.url like /\/healthcheck/ and ispresent(req.stats.uptime)
If I parse "tstamp", I get nothing back (empty H, M, S), but if I parse "tz", I get the proper values.
Does someone know how to work around this problem? Unfortunately, I don't see any datetime formatting available in CloudWatch Logs Insights queries, and at the moment I think the best option is to format the value on the service side and write it into the logs already formatted. But maybe someone knows a better solution?
Thanks in advance.
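A possible workaround (a sketch, not from the thread, assuming Logs Insights supports floor() and the % remainder operator on numeric fields) is to skip string parsing entirely and compute the hour/minute/second components arithmetically from the millisecond uptime:
# hypothetical sketch: derive H/M/S from req.stats.uptime (milliseconds) without parse
fields req.stats.uptime
| filter req.url like /\/healthcheck/ and ispresent(req.stats.uptime)
| sort @timestamp desc
| limit 1
| fields floor(req.stats.uptime / 3600000) as H,
         floor((req.stats.uptime % 3600000) / 60000) as M,
         floor((req.stats.uptime % 60000) / 1000) as S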

Related

AWS Firehose dynamic partitioning and date parsing

I'm trying to do dynamic data partitioning by date with a Kinesis Data Firehose delivery stream. The payload I'm expecting is JSON, with this general format:
{
  "clientId": "ASGr496mndGs80oCC97mf",
  "createdAt": "2022-09-21T14:44:53.708Z",
  ...
}
I don't control the format of this date I'm working with.
I have my delivery firehose set to have "Dynamic Partitioning" and "Inline JSON Parsing" enabled (because both are apparently required per the AWS console UI).
I've got these set as "Dynamic Partitioning Keys"
year
.createdAt| strptime("%Y-%m-%dT%H:%M:%S.%fZ")| strftime("%Y")
month
.createdAt| strptime("%Y-%m-%dT%H:%M:%S.%fZ")| strftime("%m")
day
.createdAt| strptime("%Y-%m-%dT%H:%M:%S.%fZ")| strftime("%d")
hour
.createdAt| strptime("%Y-%m-%dT%H:%M:%S.%fZ")| strftime("%h")
But that gives me errors like: date "2022-09-21T18:30:04.431Z" does not match format "%Y-%m-%dT%H:%M:%S.%fZ".
It looks like strptime expects the decimal seconds to be padded out to 6 places, but I have 3. I don't control the format of the date I'm working with. These appear to be jq expressions, but I have exactly zero experience with jq, and the AWS documentation for this stuff leaves an awful lot to be desired.
Is there a way to get strptime to successfully parse this format, or to just ignore the minute, second, and millisecond part of the time (I only care about hours)?
Is there another way to achieve what I'm trying to do here?
You can try the following:
.createdAt | strptime("%Y-%m-%dT%H:%M:%S%Z") | strftime("%Y")
It trims the milliseconds while retaining the rest of the information in the datetime.
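If strptime keeps rejecting the fractional seconds, a simpler sketch (not from the answer above; it assumes plain jq string slicing is available in the inline-parsing engine) is to skip date parsing entirely and slice the fixed-width ISO-8601 string, since only the leading components are needed:
# hypothetical partitioning keys using jq string slicing on "2022-09-21T14:44:53.708Z"
year
.createdAt[0:4]    # "2022"
month
.createdAt[5:7]    # "09"
day
.createdAt[8:10]   # "21"
hour
.createdAt[11:13]  # "14"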

Regex in Spark SQL is resulting in wrong value but working fine on Amazon Athena

I am trying to extract text that exists inside root level brackets from a string in Spark-SQL. I have used the function regexp_extract() on both Spark-SQL and Athena on the same string with the same regex.
On Athena, it's working fine.
But on Spark-SQL, it is not returning the value as expected.
Query is:
SELECT regexp_extract('Russia (Federal Service of Healthcare)', '.+\s\((.+)\)', 1) AS cl
Output On Athena:
Federal Service of Healthcare
Output on Spark-SQL:
ia (Federal Service of Healthcare)
I've been banging my head against this but can't seem to find a solution.
This does the trick:
SELECT regexp_extract('Russia (Federal Service of Healthcare)', '.+\\s\\((.+)\\)', 1) AS cl
output:
+-----------------------------+
|cl |
+-----------------------------+
|Federal Service of Healthcare|
+-----------------------------+
The \s is not actually escaped in your example (the single backslash is consumed by the SQL parser, leaving a plain s), which is why the rest of the string falls into the capture group; you can also use the regexp_extract API directly, which makes for a cleaner solution:
.withColumn("cl", regexp_extract(col("name"), ".+\\s\\((.+)\\)", 1))
Good luck!
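Another option (a sketch, not part of the answer above, assuming your Spark version has the spark.sql.parser.escapedStringLiterals setting) is to turn off escape processing in SQL string literals so the original single-backslash pattern reaches the regex engine unchanged:
-- hypothetical: keep '\s' literal by disabling escaped string literals for the session
SET spark.sql.parser.escapedStringLiterals=true;
SELECT regexp_extract('Russia (Federal Service of Healthcare)', '.+\s\((.+)\)', 1) AS cl;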

S3 select - How can I query by non-standard timestamp comparison

I'm using a S3 bucket where the data is organized into files by an ID & year/month - meaning one file per ID & month.
In each (csv.gz) file each record has a timestamp in the format: YYYY-MM-dd HH:mm:ss (note the missing T).
Now, when querying the data I want to support datetime granularity down to seconds, so naturally I'd like to filter the data in S3 already, before handling it in Python.
I can't however find any method to do this.
The function TO_TIMESTAMP doesn't seem to support a user-provided format (it expects a T date/time separator), and combining SUBSTRING and CAST (CAST(SUBSTRING(my_timestamp_column, 1, 10) AS TIMESTAMP)) yields a "The query cannot be evaluated" error.
Is there any way around this?
The documentation states that the function TO_TIMESTAMP is "the inverse operation of TO_STRING" which is not quite true as the latter supports a time_format_pattern.
I think I had to solve the same or similar problem. As you said, the documentation (https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-glacier-select-sql-reference-date.html#s3-glacier-select-sql-reference-to-timestamp) states that the function TO_TIMESTAMP is the inverse operation of TO_STRING. But the documentation to me was misleading, because it does not make clear that the TO_TIMESTAMP function does support time_format_pattern as a second argument. The documentation shows that it only takes one argument, but it can in fact take two.
I was able to convert a non-standard timestamp 20190101T050000.000Z from type string to timestamp like so:
aws s3api select-object-content \
  --bucket foo_bucket \
  --key foo.json.gz \
  --expression "SELECT * FROM s3object s WHERE TO_TIMESTAMP(s.\"timestamp\", 'yMMdd''T''Hmmss.SSS''Z''') < TO_TIMESTAMP('20190101T050000.000Z', 'yMMdd''T''Hmmss.SSS''Z''')" \
  --expression-type 'SQL' \
  --input-serialization '{ "CompressionType": "GZIP", "JSON": {"Type": "DOCUMENT"}}' \
  --output-serialization '{"JSON": {"RecordDelimiter": "\n"}}' \
  /dev/shm/foo.json
Hope that helps somebody out.
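Applying the same two-argument form to the question's YYYY-MM-dd HH:mm:ss values would look roughly like this (a sketch, untested; my_timestamp_column is the question's placeholder name, and it assumes a literal space is accepted in the pattern the same way the quoted 'T' is above):
SELECT * FROM s3object s
WHERE TO_TIMESTAMP(s."my_timestamp_column", 'y-MM-dd HH:mm:ss')
      >= TO_TIMESTAMP('2019-01-01 05:00:00', 'y-MM-dd HH:mm:ss')
For a CSV input you would also need FileHeaderInfo set to USE in the input serialization so the header name can be referenced.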
I had the same issue over here. I went a step further and changed my CSV file so that the date field has the format required by the TIMESTAMP data type in S3 Select. The required format is described here: S3 data types.
So first, to answer the question: based on the S3 Select documentation, I think it is not possible to work with a date that doesn't have the T in it. Once you correct that, you will be able to work with the CAST function. Here is what I do:
select * from s3object as s where CAST('2020-01-01T' AS TIMESTAMP) < CAST('2021-01-01T' AS TIMESTAMP)
That works just fine; however, as you can see, I'm not passing s."Date" (which is the field header in my CSV file) because of the following error:
Attempt to convert from one data type to another failed at line 1, column 39: cast from STRING to TIMESTAMP.
I hope this helps a little bit, and I hope someone can help with this error.

Visualize time values over days in QuickSight

I have an event dataset in QuickSight, where each record has a timestamp field as following:
last_day_record_ts |
-------------------|
2020-01-19 05:46:55|
2020-01-20 05:55:37|
2020-01-21 06:00:12|
2020-01-22 06:12:57|
2020-01-23 06:02:15|
2020-01-24 06:15:35|
2020-01-25 06:20:05|
2020-01-26 05:55:48|
I want to build a visualization of the time values over days as a line chart (time of day on the Y axis, date on the X axis).
However, I find it difficult to get this in AWS QuickSight. Any ideas?
Instead of the desired result, QuickSight persistently gives just aggregated record counts (i.e. 1 for each day), not the time values themselves...
UPDATE: The workaround I found for now is to add calculated fields to the dataset in order to get numeric values instead of timestamp ones.
Calculated fields:
day_midnight | truncDate('DD',{last_day_record_ts})
time_diff_in_hours_dec | abs(dateDiff({last_day_record_ts},{day_midnight},"MI")) / 60
time_diff_in_hours_int | decimalToInt({time_diff_in_hours_dec})
time_diff_in_min | ({time_diff_in_hours_dec} - {time_diff_in_hours_int}) * 60
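An equivalent single calculated field (a sketch, not from the original post, assuming QuickSight's extract() function with 'HH' and 'MI' granularities) avoids the intermediate midnight field:
time_of_day_in_hours_dec | extract('HH', {last_day_record_ts}) + extract('MI', {last_day_record_ts}) / 60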
The only problem I still cannot solve is getting the Y-axis labels in HH:MM format (as in the green rectangle of the desired chart). For now, they are plain decimal numbers...
Unfortunately (after many attempts of my own), this type of visual does not appear to be possible in QuickSight at the time of writing.
QuickSight has many nice features, but it's still missing some (very basic, IMO) things, which makes it limiting for anyone working with data that is outside the expected use cases.

Redshift UNLOAD ignores DATESTYLE

The issue is quite simple. In a single session, I run:
set DATESTYLE to 'SQL,DMY';
Then I run an UNLOAD command, using a basic SELECT * FROM [table name]. The table has a DATE column.
The file output to S3 does not use the format I specified. How can I change the date format output by UNLOAD?
We created an AWS support ticket, and this was the response:
Upon reading your case I gather that you are looking to see if in UNLOAD command you could add a Data Conversion Parameter to specify the date/timestamp format instead of using the date conversions in select statement. Please let me know if I am missing out on any information.
Unfortunately at the moment we do not currently have the functionality to add the date parameter and the only way to change that would be to do it in the SQL query. I have created a feature request for the particular use case that you have requested but I wouldn't be able to give you an ETA for when the feature would be released.
You can use Datetime Format Strings - Amazon Redshift to output the date in a specific style, e.g.:
select sysdate,
to_char(sysdate, 'HH24:MI:SS') as seconds,
to_char(sysdate, 'HH24:MI:SS.MS') as milliseconds,
to_char(sysdate, 'HH24:MI:SS:US') as microseconds;
timestamp | seconds | milliseconds | microseconds
--------------------+----------+--------------+----------------
2015-04-10 18:45:09 | 18:45:09 | 18:45:09.325 | 18:45:09:325143
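To apply this to UNLOAD, the formatting has to happen inside the SELECT that UNLOAD wraps, as the support response above indicates. A rough sketch (table, column, bucket, and IAM role names are made up; note the doubled single quotes inside the UNLOAD string):
-- hypothetical example: format the DATE column as text inside the UNLOAD query
UNLOAD ('SELECT to_char(my_date_col, ''DD/MM/YYYY'') AS my_date_col, other_col FROM my_table')
TO 's3://my-bucket/unload-prefix/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV;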