Redshift UNLOAD ignores DATESTYLE

Issue is quite simple. In a single session, I run:
set DATESTYLE to 'SQL,DMY';
Then I run an UNLOAD command, using a basic SELECT * FROM [table name]. The table has a DATE column.
The file output to S3 does not use the format I specified. How can I change the date format output by UNLOAD?

We created an AWS support ticket, and this was the response:
Upon reading your case I gather that you are looking to see if in UNLOAD command you could add a Data Conversion Parameter to specify the date/timestamp format instead of using the date conversions in select statement. Please let me know if I am missing out on any information.
Unfortunately at the moment we do not currently have the functionality to add the date parameter and the only way to change that would be to do it in the SQL query. I have created a feature request for the particular use case that you have requested but I wouldn't be able to give you an ETA for when the feature would be released.

You can use TO_CHAR with the patterns described in "Datetime Format Strings - Amazon Redshift" to output the date in a specific style, e.g.:
select sysdate,
to_char(sysdate, 'HH24:MI:SS') as seconds,
to_char(sysdate, 'HH24:MI:SS.MS') as milliseconds,
to_char(sysdate, 'HH24:MI:SS:US') as microseconds;
timestamp           | seconds  | milliseconds | microseconds
--------------------+----------+--------------+-----------------
2015-04-10 18:45:09 | 18:45:09 | 18:45:09.325 | 18:45:09:325143
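Since the support response says the conversion has to happen in the SQL query, the TO_CHAR call needs to go inside the SELECT that UNLOAD runs. Below is a minimal Python sketch of building such a statement. The table, column, bucket, and role names are hypothetical, and 'DD/MM/YYYY' is only one possible approximation of the DMY style. Because UNLOAD takes the query as a quoted string, single quotes inside it are doubled.

```python
def build_unload(table, date_col, other_cols, s3_prefix, iam_role, fmt="DD/MM/YYYY"):
    """Return an UNLOAD statement that formats date_col with TO_CHAR.

    All identifiers are illustrative placeholders. Nothing here validates
    or escapes them, so only pass trusted values.
    """
    formatted = f"TO_CHAR({date_col}, '{fmt}') AS {date_col}"
    cols = ", ".join(list(other_cols) + [formatted])
    select = f"SELECT {cols} FROM {table}"
    # The query is embedded in a quoted string, so inner quotes are doubled.
    quoted = select.replace("'", "''")
    return f"UNLOAD ('{quoted}') TO '{s3_prefix}' IAM_ROLE '{iam_role}'"


stmt = build_unload(
    table="my_table",                # hypothetical table
    date_col="created_on",           # hypothetical DATE column
    other_cols=["id"],               # hypothetical remaining columns
    s3_prefix="s3://my-bucket/unload/my_table_",             # hypothetical
    iam_role="arn:aws:iam::123456789012:role/MyUnloadRole",  # hypothetical
)
print(stmt)
```

The trade-off is that SELECT * can no longer be used: every column has to be listed so that the date column can be wrapped individually.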

Related

AWS Firehose dynamic partitioning and date parsing

I'm trying to do dynamic data partitioning by date with a kinesis delivery/firehose stream. The payload I'm expecting is JSON, with this general format
{
"clientId": "ASGr496mndGs80oCC97mf",
"createdAt": "2022-09-21T14:44:53.708Z",
...
}
I don't control the format of this date I'm working with.
I have my delivery firehose set to have "Dynamic Partitioning" and "Inline JSON Parsing" enabled (because both are apparently required per the AWS console UI).
I've got these set as "Dynamic Partitioning Keys"
year
.createdAt| strptime("%Y-%m-%dT%H:%M:%S.%fZ")| strftime("%Y")
month
.createdAt| strptime("%Y-%m-%dT%H:%M:%S.%fZ")| strftime("%m")
day
.createdAt| strptime("%Y-%m-%dT%H:%M:%S.%fZ")| strftime("%d")
hour
.createdAt| strptime("%Y-%m-%dT%H:%M:%S.%fZ")| strftime("%H")
But that gives me errors like date \"2022-09-21T18:30:04.431Z\" does not match format \"%Y-%m-%dT%H:%M:%S.%fZ.
It looks like strptime expects decimal seconds to be padded out to 6 places, but I have 3. These seem to be jq expressions, but I have exactly zero experience with jq, and the AWS documentation for this leaves an awful lot to be desired.
Is there a way to get strptime to successfully parse this format, or to just ignore the minute, second, and millisecond part of the time (I only care about hours)?
Is there another way to achieve what I'm trying to do here?
You can try the following:
.createdAt | strptime("%Y-%m-%dT%H:%M:%S%Z") | strftime("%Y")
This drops the milliseconds while retaining the rest of the information in the datetime.
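To check which partition values a given payload should end up with, independently of Firehose, the same derivation can be sketched in Python. This is only a local sanity check and not something Firehose runs; the function name is made up for illustration. Note that Python's %f accepts one to six fractional digits, so the three-digit milliseconds parse here without trimming.

```python
from datetime import datetime, timezone


def partition_keys(created_at: str) -> dict:
    """Derive year/month/day/hour partition values from an ISO-8601 UTC string."""
    # Python's %f accepts 1 to 6 fractional digits, so ".708" is fine.
    parsed = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%S.%fZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    return {
        "year": parsed.strftime("%Y"),
        "month": parsed.strftime("%m"),
        "day": parsed.strftime("%d"),
        "hour": parsed.strftime("%H"),  # %H is the 24-hour clock hour
    }


keys = partition_keys("2022-09-21T14:44:53.708Z")
print(keys)
```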

S3 select - How can I query by non-standard timestamp comparison

I'm using a S3 bucket where the data is organized into files by an ID & year/month - meaning one file per ID & month.
In each (csv.gz) file each record has a timestamp in the format: YYYY-MM-dd HH:mm:ss (note the missing T).
Now, when querying the data I want to support datetime granularity down to seconds so naturally it's desired to filter the data in S3 already prior to managing the data in Python.
I can't however find any method to do this.
The function TO_TIMESTAMP doesn't support a user provided format (expects a T date/time separator) and combining SUBSTRING and CAST (CAST(SUBSTRING(my_timestamp_column, 1, 10) AS TIMESTAMP)) yields a The query cannot be evaluated error.
Is there any way around this?
The documentation states that the function TO_TIMESTAMP is "the inverse operation of TO_STRING" which is not quite true as the latter supports a time_format_pattern.
I think I had to solve the same or similar problem. As you said, the documentation (https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-glacier-select-sql-reference-date.html#s3-glacier-select-sql-reference-to-timestamp) states that the function TO_TIMESTAMP is the inverse operation of TO_STRING. But the documentation to me was misleading, because it does not make clear that the TO_TIMESTAMP function does support time_format_pattern as a second argument. The documentation shows that it only takes one argument, but it can in fact take two.
I was able to convert a non-standard timestamp 20190101T050000.000Z from type string to timestamp like so:
aws s3api select-object-content \
  --bucket foo_bucket \
  --key foo.json.gz \
  --expression "SELECT * FROM s3object s WHERE TO_TIMESTAMP(s.\"timestamp\", 'yMMdd''T''Hmmss.SSS''Z''') < TO_TIMESTAMP('20190101T050000.000Z', 'yMMdd''T''Hmmss.SSS''Z''')" \
  --expression-type 'SQL' \
  --input-serialization '{ "CompressionType": "GZIP","JSON": {"Type": "DOCUMENT"}}' \
  --output-serialization '{"JSON": {"RecordDelimiter": "\n"}}' \
  /dev/shm/foo.json
Hope that helps somebody out.
I had the same issue. I went a step further and changed my CSV file so that the date field uses the format required by the timestamp data type in S3 Select. The required format is described in the S3 Select data types documentation.
To answer the question first: based on the S3 Select documentation, I don't think it is possible to work with a date that lacks the T after it. Once you correct that, you can use the CAST function. This is what I did:
select * from s3object as s where CAST('2020-01-01T' AS TIMESTAMP) < CAST('2021-01-01T' AS TIMESTAMP)
That works fine. However, as you can see, I'm not passing s."Date", which is the header of the date field in my CSV file, because doing so produces the following error:
Attempt to convert from one data type to another failed at line 1, column 39: cast from STRING to TIMESTAMP.
I hope this helps a little, and I hope someone can help with this error.
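If server-side filtering cannot be made to work for a given file, the fallback is to filter in Python after the object has been retrieved. The sketch below assumes the rows are already parsed into dicts (for example with csv.DictReader) and reuses the my_timestamp_column name from the question; the helper name and sample rows are made up for illustration.

```python
from datetime import datetime

# Matches the "YYYY-MM-dd HH:mm:ss" layout described in the question.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def filter_rows(rows, start, end, ts_key="my_timestamp_column"):
    """Keep rows whose timestamp falls in the half-open range [start, end)."""
    kept = []
    for row in rows:
        ts = datetime.strptime(row[ts_key], TS_FORMAT)
        if start <= ts < end:
            kept.append(row)
    return kept


rows = [
    {"my_timestamp_column": "2020-01-01 00:00:05", "value": "a"},
    {"my_timestamp_column": "2020-01-01 00:00:30", "value": "b"},
    {"my_timestamp_column": "2020-01-01 00:01:00", "value": "c"},
]
selected = filter_rows(
    rows,
    start=datetime(2020, 1, 1, 0, 0, 10),
    end=datetime(2020, 1, 1, 0, 1, 0),
)
print(selected)
```

This gives second-level granularity but transfers the whole file, so it only makes sense when the per-ID, per-month files are small enough.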

Visualize time values over days in QuickSight

I have an event dataset in QuickSight, where each record has a timestamp field as following:
last_day_record_ts |
-------------------|
2020-01-19 05:46:55|
2020-01-20 05:55:37|
2020-01-21 06:00:12|
2020-01-22 06:12:57|
2020-01-23 06:02:15|
2020-01-24 06:15:35|
2020-01-25 06:20:05|
2020-01-26 05:55:48|
I want to build a line chart of these time values over days, plotting the time of day of each record against its date.
However, I find it difficult to get this in AWS QuickSight. Any ideas?
Instead of the desired result, QuickSight persistently gives just aggregated record values (i.e. 1 for each day) rather than the time values themselves.
UPDATE: The workaround I have found for now is to add calculated fields to the data set in order to get numeric values instead of timestamps.
Calculated fields:
day_midnight | truncDate('DD',{last_day_record_ts})
time_diff_in_hours_dec | abs(dateDiff({last_day_record_ts},{day_midnight},"MI")) / 60
time_diff_in_hours_int | decimalToInt({time_diff_in_hours_dec})
time_diff_in_min | ({time_diff_in_hours_dec} - {time_diff_in_hours_int}) * 60
The only problem I still cannot solve is getting the Y axis labels in HH:MM format. For now, they are numeric decimals.
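The arithmetic behind these calculated fields, plus the HH:MM label that is still missing, can be cross-checked outside QuickSight with a short Python sketch. It assumes that dateDiff with "MI" yields whole minutes, and the label key is an addition for illustration; it does not solve the axis labeling inside QuickSight itself.

```python
from datetime import datetime


def time_of_day_fields(ts: str) -> dict:
    """Mirror the calculated fields for one last_day_record_ts value."""
    record = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    day_midnight = record.replace(hour=0, minute=0, second=0)
    # Whole minutes since midnight (assumed to match dateDiff with "MI").
    minutes_since_midnight = int((record - day_midnight).total_seconds() // 60)
    hours_int, minutes = divmod(minutes_since_midnight, 60)
    return {
        "time_diff_in_hours_dec": minutes_since_midnight / 60,
        "time_diff_in_hours_int": hours_int,
        "time_diff_in_min": minutes,
        "label": f"{hours_int:02d}:{minutes:02d}",  # hypothetical HH:MM label
    }


fields = time_of_day_fields("2020-01-19 05:46:55")
print(fields)
```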
Unfortunately (after many attempts of my own), this type of visual does not appear to be possible in QuickSight at the time of writing.
QuickSight has many nice features, but it is still missing some (very basic, in my opinion) things, which makes it limiting for anyone working with data outside the expected use cases.

Amazon Cloudwatch Logs, parse datetime result of a function

In our logs, we get uptime in milliseconds, and I am trying to format it using an AWS CloudWatch query. But I found out that it is not possible to parse the result of a function; only field values themselves can be parsed. Example:
fields
req.stats.uptime,
fromMillis(req.stats.uptime) as tstamp,
"1970-01-03T22:53:01.000+01:00" as tz
| sort @timestamp desc
| limit 1
| fields @timestamp
| parse tstamp "T*:*:*." as H, M, S
| filter req.url like /\/healthcheck/ and ispresent(req.stats.uptime)
If I parse "tstamp", I get nothing (empty H, M, S), but if I parse "tz", I get proper values back.
Does anyone know how to avoid this problem? Unfortunately, I don't see any formatting functions available in CloudWatch queries, and at the moment I think the best approach may be to format the value on the service side and write the already formatted value into the logs. But maybe someone knows a better solution?
Thanks in advance.
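The service-side option mentioned above could look roughly like the following Python sketch, which turns the millisecond uptime into an H:M:S string before it is logged. The uptime_hms field name is hypothetical, and hours are deliberately not wrapped at 24 so that multi-day uptimes stay unambiguous.

```python
def format_uptime(uptime_ms: int) -> str:
    """Format a millisecond uptime as HH:MM:SS, with hours allowed past 24."""
    total_seconds = uptime_ms // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Hypothetical log payload carrying both the raw and the formatted value.
uptime_ms = 251_581_000
log_entry = {"uptime": uptime_ms, "uptime_hms": format_uptime(uptime_ms)}
print(log_entry)
```

With the formatted value stored as a plain field, the query can display it directly instead of parsing the output of fromMillis.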

AWS IoT Analytics Delta Window

I am having real problems getting the AWS IoT Analytics Delta Window (docs) to work.
I am trying to set it up so that every day a query is run to get the last 1 hour of data only. According to the docs the schedule feature can be used to run the query using a cron expression (in my case every hour) and the delta window should restrict my query to only include records that are in the specified time window (in my case the last hour).
The SQL query I am running is simply SELECT * FROM dev_iot_analytics_datastore and if I don't include any delta window I get the records as expected. Unfortunately when I include a delta expression I get nothing (ever). I left the data accumulating for about 10 days now so there are a couple of million records in the database. Given that I was unsure what the optimal format would be I have included the following temporal fields in the entries:
datetime : 2019-05-15T01:29:26.509
(A string formatted using ISO Local Date Time)
timestamp_sec : 1557883766
(A unix epoch expressed in seconds)
timestamp_milli : 1557883766509
(A unix epoch expressed in milliseconds)
There is also a value automatically added by AWS called __dt, which uses the same format as my datetime except that it seems to be accurate only to within 1 day, i.e. all values entered within a given day have the same value (e.g. 2019-05-15 00:00:00.00).
I have tried a range of expressions (including the suggested AWS expression) from both standard SQL and Presto as I'm not sure which one is being used for this query. I know they use a subset of Presto for the analytics so it makes sense that they would use it for the delta but the docs simply say '... any valid SQL expression'.
Expressions I have tried so far with no luck:
from_unixtime(timestamp_sec)
from_unixtime(timestamp_milli)
cast(from_unixtime(unixtime_sec) as date)
cast(from_unixtime(unixtime_milli) as date)
date_format(from_unixtime(timestamp_sec), '%Y-%m-%dT%h:%i:%s')
date_format(from_unixtime(timestamp_milli), '%Y-%m-%dT%h:%i:%s')
from_iso8601_timestamp(datetime)
What are the offset and time expression parameters that you are using?
Since delta windows are effectively filters inserted into your SQL, you can troubleshoot them by manually inserting the filter expression into your data set's query.
Namely, applying a delta window filter with -3 minute (negative) offset and 'from_unixtime(my_timestamp)' time expression to a 'SELECT my_field FROM my_datastore' query translates to an equivalent query:
SELECT my_field FROM
(SELECT * FROM "my_datastore" WHERE
(__dt between date_trunc('day', iota_latest_succeeded_schedule_time() - interval '1' day)
and date_trunc('day', iota_current_schedule_time() + interval '1' day)) AND
iota_latest_succeeded_schedule_time() - interval '3' minute < from_unixtime(my_timestamp) AND
from_unixtime(my_timestamp) <= iota_current_schedule_time() - interval '3' minute)
Try using a similar query (with no delta time filter) with the correct values for the offset and time expression and see what you get. The (__dt between ...) clause is just an optimization for limiting the scanned partitions; you can remove it for the purposes of troubleshooting.
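The window logic in the equivalent query can also be reasoned about offline. The Python sketch below mimics only the two timestamp comparisons (not the __dt partition filter), with plain datetimes standing in for iota_latest_succeeded_schedule_time() and iota_current_schedule_time(). The function name and the sample schedule times are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def delta_window(records, latest_succeeded, current, offset, ts_key="timestamp_sec"):
    """Keep records with latest_succeeded + offset < ts <= current + offset."""
    lower = latest_succeeded + offset
    upper = current + offset
    selected = []
    for record in records:
        # Epoch seconds, as in from_unixtime(timestamp_sec).
        ts = datetime.fromtimestamp(record[ts_key], tz=timezone.utc)
        if lower < ts <= upper:
            selected.append(record)
    return selected


# Assumed schedule times: an hourly run at 02:00 UTC after a run at 01:00 UTC.
current = datetime(2019, 5, 15, 2, 0, tzinfo=timezone.utc)
latest_succeeded = current - timedelta(hours=1)
records = [
    {"timestamp_sec": 1557883766},  # 2019-05-15 01:29:26 UTC, from the question
    {"timestamp_sec": 1557878400},  # 2019-05-15 00:00:00 UTC
]
selected = delta_window(records, latest_succeeded, current, timedelta(minutes=-3))
print(selected)
```

With a -3 minute offset the window here runs from 00:57 (exclusive) to 01:57 (inclusive), so only the first record is kept.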
Please try the following:
Set query to SELECT * FROM dev_iot_analytics_datastore
Data selection filter:
Data selection window: Delta time
Offset: -1 Hours
Timestamp expression: from_unixtime(timestamp_sec)
Wait for dataset content to run for a bit, say 15 minutes or more.
Check contents
After several weeks of testing and trying all the suggestions in this post along with many more, it appears that the extremely technical answer was to 'switch it off and back on'. I deleted the whole analytics stack and rebuilt everything with different names, and it now seems to be working!
It's important to note that, although I have flagged this as the correct answer because it was the actual resolution, both answers provided by @Populus and @Roger would have been correct had my deployment been functioning as expected.
I found by chance that changing SELECT * FROM datastore to SELECT id1, id2, ... FROM datastore solved the problem.