Redshift - Adding timezone offset (Varchar) to timestamp column


As part of an ETL into Redshift, one of the source tables has two columns:
original_timestamp - TIMESTAMP: the local time when the record was inserted, in whichever region
original_timezone_offset - VARCHAR: the offset from UTC
The data looks something like this:
original_timestamp original_timezone_offset
2011-06-22 11:00:00.000000 -0700
2014-11-29 17:00:00.000000 -0800
2014-12-02 22:00:00.000000 +0900
2011-06-03 09:23:00.000000 -0700
2011-07-28 03:00:00.000000 -0700
2011-05-01 01:30:00.000000 -0700
In my target table, I need to convert this to UTC (using the offset). How do I do it?
So far I have tried multiple things but dateadd() seems to be the closest solution. But the problem with dateadd() is, when I say:
SELECT original_timestamp, original_timezone_offset
,dateadd(H, original_timezone_offset, original_timestamp) as original_utc_time
it is adding/subtracting '700'/'800' hours instead of 7/8 hrs to the original timestamp because the offset is a VARCHAR and the values are like: -0700 etc.
Did anyone see this issue before? Appreciate any help/inputs. Thanks.

Just take the 'hours' part of the offset:
WITH t as (
SELECT '2011-06-22 11:00:00.000000'::timestamp as original_timestamp, '-0700' as original_timezone_offset
UNION ALL
SELECT '2014-11-29 17:00:00.000000'::timestamp,'-0800'
UNION ALL
SELECT '2014-12-02 22:00:00.000000'::timestamp,'+0900'
)
SELECT
original_timestamp,
original_timezone_offset,
DATEADD(hour, -1 * SUBSTRING(original_timezone_offset, 1, 3)::INT, original_timestamp) AS original_utc_time
FROM t
2011-06-22 11:00:00 -0700 2011-06-22 18:00:00
2014-11-29 17:00:00 -0800 2014-11-30 01:00:00
2014-12-02 22:00:00 +0900 2014-12-02 13:00:00
You'll need some additional fancy code if you have non-full-hour offsets (eg +0730).
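For offsets that are not whole hours, the minutes part has to be parsed as well. Here is a minimal Python sketch of that arithmetic (plain Python, not Redshift SQL; the helper names parse_offset and local_to_utc are made up for illustration). It treats UTC as the local time minus the offset:

```python
from datetime import datetime, timedelta


def parse_offset(offset: str) -> timedelta:
    """Turn a '+HHMM' or '-HHMM' string into a signed timedelta."""
    sign = -1 if offset[0] == "-" else 1
    hours, minutes = int(offset[1:3]), int(offset[3:5])
    return sign * timedelta(hours=hours, minutes=minutes)


def local_to_utc(local: datetime, offset: str) -> datetime:
    """UTC is the local wall-clock time minus its UTC offset."""
    return local - parse_offset(offset)


print(local_to_utc(datetime(2011, 6, 22, 11, 0), "-0700"))  # 2011-06-22 18:00:00
print(local_to_utc(datetime(2014, 12, 2, 22, 0), "+0730"))  # 2014-12-02 14:30:00
```

The same idea should carry over to SQL by converting the offset to minutes and using the minute date part, but that variant is untested here.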

First, recognize that if your timestamps are already in local time of the given offset, then you need to subtract that offset to convert back to UTC. In that first example you gave, 2011-06-22 11:00:00 -0700 is equivalent to 2011-06-22 18:00:00 UTC.
However, rather than trying to add or subtract these values yourself, you should let the AT TIME ZONE construct do the work for you. It will create a timestamptz that is in your supplied offset, and then you can use it again to convert to UTC.
(Note that you could use the CONVERT_TIMEZONE function instead, but that one is only understood by Redshift, whereas AT TIME ZONE also works on regular PostgreSQL.)
However, the problem is that the time zone offsets you have aren't in a format understood by these functions (see the time zone usage notes in the Redshift documentation). So, before we try to convert, let's translate your offset strings to an understood format.
We will want -0700 to become +07:00. The colon is required, and the sign must be flipped because it will be interpreted with the POSIX-style time zone format. In that format, positive values lie west of GMT instead of the usual conventions specified in ISO 8601.
concat(translate(substring(original_timezone_offset, 1, 3), '-+', '+-'),':',substring(original_timezone_offset, 4, 2))
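To see what that expression produces, here is the same string manipulation as a small Python sketch (the function name to_posix_offset is hypothetical and only mirrors the SQL above; it does no time zone conversion by itself):

```python
def to_posix_offset(offset: str) -> str:
    """Rewrite '-0700' as '+07:00': flip the sign, insert a colon."""
    # First three characters are the sign and the hours, like substring(x, 1, 3)
    flipped = offset[:3].translate(str.maketrans("-+", "+-"))
    # Characters four and five are the minutes, like substring(x, 4, 2)
    return f"{flipped}:{offset[3:5]}"


print(to_posix_offset("-0700"))  # +07:00
print(to_posix_offset("+0900"))  # -09:00
```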
Then we will use that with AT TIME ZONE to do the conversion:
(original_timestamp AT TIME ZONE <the above mess>) AT TIME ZONE 'UTC' AS utc_timestamp
Putting it all together...
WITH t as (
SELECT '2011-06-22 11:00:00.000000'::timestamp as original_timestamp, '-0700' as original_timezone_offset
UNION ALL
SELECT '2014-11-29 17:00:00.000000'::timestamp,'-0800'
UNION ALL
SELECT '2014-12-02 22:00:00.000000'::timestamp,'+0900'
)
SELECT
original_timestamp,
original_timezone_offset,
concat(translate(substring(original_timezone_offset, 1, 3), '-+', '+-'),':',substring(original_timezone_offset, 4, 2)) as modified_timezone_offset,
(original_timestamp AT TIME ZONE concat(translate(substring(original_timezone_offset, 1, 3), '-+', '+-'),':',substring(original_timezone_offset, 4, 2))) AT TIME ZONE 'UTC' AS utc_timestamptz
FROM t
Output:
2011-06-22 11:00:00 -0700 +07:00 2011-06-22 18:00:00
2014-11-29 17:00:00 -0800 +08:00 2014-11-30 01:00:00
2014-12-02 22:00:00 +0900 -09:00 2014-12-02 13:00:00

Related

Define and convert datetime in AWS Athena

I have a process where I need to match a UTC datetime against an EDT datetime.
As you know, US Eastern time is 4 or 5 hours behind UTC, depending on daylight saving time.
How can I define one datetime to be in UTC and another to be in EDT and match the two?
Something like (datetime_A is my EDT timestamp, and datetime_B is my UTC):
Where CAST((datetime_A as EDT) to UTC)=datetime_B
Thanks!

How to generate_series of every hour of every day of 1 year from the current timestamp

I have a query that generates every day of the year (shown below). What if I want a series of every hour of every day of the year, counting back from the current timestamp? Example: today is July 23, 2019 10:30:00 AM, and the result I am hoping to get is below:
2019-07-23 20:30:00
2019-07-23 20:00:00
2019-07-23 19:00:00
2019-07-23 18:00:00
.
.
.
2018-07-23 20:00:00
This is a Redshift (PostgreSQL 8.0.2) query for Eclipse Birt. Hoping to create a parameter for both date and time but seems difficult to achieve if 2 separate ranges.
select cast(convert_timezone('UTC','AEST',cast(now() as timestamp without time zone)) as date) - generate_series(0, 365) date,
to_char(cast(convert_timezone('UTC','AEST',cast(now() as timestamp without time zone)) as date) - generate_series(0, 365), 'dd/mm/yyyy') date_disp;

This is similar to your previous question.
Use:
SELECT date_trunc('hour', now()::timestamp) - generate_series(0, 24 * 365) * interval '1 hour'
This outputs:
2019-07-23 05:00:00
2019-07-23 04:00:00
etc
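For comparison, the same series can be sketched in plain Python: truncate to the top of the hour, then step back one hour at a time. The function name hourly_series and the fixed example timestamp are assumptions for illustration only:

```python
from datetime import datetime, timedelta


def hourly_series(now: datetime, days: int = 365) -> list:
    """Hourly timestamps from the current hour back `days` days, newest first."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    # range(24 * days + 1) mirrors generate_series(0, 24 * 365), which is inclusive
    return [top_of_hour - timedelta(hours=k) for k in range(24 * days + 1)]


series = hourly_series(datetime(2019, 7, 23, 5, 30))
print(series[0])    # 2019-07-23 05:00:00
print(series[1])    # 2019-07-23 04:00:00
print(series[-1])   # 2018-07-23 05:00:00
print(len(series))  # 8761
```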

You can use the Redshift DATEADD function with "h", "hr" or "hrs" as the first parameter; see the Redshift documentation for DATEADD and for date parts.

RRD Tool - confusing start time

I'm setting up an RRD database to store sensor data for 3 days in 12-hour intervals (43200s) = 6 rows in the RRA.
rrdtool create test.rrd --step 43200 --start 1562429286 DS:temp:GAUGE:86400:U:U RRA:AVERAGE:0:1:6
The database's start time is 1562429286 (06.07.2019 - 18:08:06).
When I dump the database:
rrdtool dump test.rrd
it says (output trimmed for clarity):
2019-07-04 02:00:00 CEST / 1562198400 NaN
2019-07-04 14:00:00 CEST / 1562241600 NaN
2019-07-05 02:00:00 CEST / 1562284800 NaN
2019-07-05 14:00:00 CEST / 1562328000 NaN
2019-07-06 02:00:00 CEST / 1562371200 NaN
2019-07-06 14:00:00 CEST / 1562414400 NaN
I expected rrdtool to give the next nearest timestamp (6.7.19 18:00) as the last entry ("starting point") instead. So why is it at 14:00?
At first this explanation (How to create a rrd file with a specific time?) made perfect sense to me for a small interval of 5m. But in my case I cannot follow the logic when the interval is bigger (12h).

This is because the RRA buckets are always normalised to be aligned to UTC (GMT). It is not visible if you are using a CDP (consolidated data point) width of an hour or less, but in your case the CDPs are 12 hours wide. Your local time (CEST) is offset by 2 hours from UTC, which results in apparent boundaries at 02:00 and 14:00 local time (if your local time were UTC, you'd be seeing 00:00 and 12:00 as expected).
This effect is much more noticeable when you are using 1-day rollups and are located in somewhere like New Zealand, when you'll see the CDP boundary appearing at noon rather than at midnight.
It is not currently possible to specify a different timezone to use as a base for the RRA buckets (this would make the data nonportable) though I believe it has been on the RRDTool feature request list for a number of years.
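The numbers in the dump are consistent with rounding the start time down to a multiple of the step, counted from the Unix epoch in UTC. A small Python sketch of that arithmetic follows; it only reproduces the timestamps shown in the question and is not a description of RRDtool's internals:

```python
from datetime import datetime, timezone

STEP = 43200        # 12-hour step from the rrdtool create command
START = 1562429286  # --start value from the question
ROWS = 6            # rows in the RRA

# Round the start time down to the nearest multiple of the step
last_boundary = START - START % STEP
# The earlier rows are spaced one step apart, oldest first
boundaries = [last_boundary - k * STEP for k in range(ROWS - 1, -1, -1)]

for ts in boundaries:
    utc = datetime.fromtimestamp(ts, tz=timezone.utc)
    print(ts, utc.strftime("%Y-%m-%d %H:%M UTC"))
```

Every boundary falls on 00:00 or 12:00 UTC, which is 02:00 or 14:00 in CEST.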

How do I get delivery slots between the times of 2pm and 5pm?

I have a model called DeliverySlot. Its attributes look like:
#<DeliverySlot:0x007f955a322cf0> {
:id => 2562,
:from => Sat, 31 Dec 2016 12:00:00 UTC +00:00
}
from is a datetime column. Delivery slots are an hour and 30 minutes apart from each other.
How can I get all delivery slots from Monday-Friday that are between the hour of 2pm (14:00) and 5pm (17:00)?
As of now I have:
Assuming, Time.now.utc.strftime('%A') is Monday.
DeliverySlot.where(from: (Time.now.utc..(Time.now.utc + 5.days).end_of_day))
I am using Postgres btw. Should I be using Postgres date functions? If so, which ones?

I know I am a bit late and you probably got this down pat by now.
I created a class method called slot that takes two parameters, both dates.
...
# Model.slot(Date.today, Date.today + 7.days)
def self.slot(start_date, end_date)
  start_date.upto(end_date) do |date|
    # Only Monday through Friday
    if ['Monday','Tuesday','Wednesday','Thursday','Friday'].include?(date.strftime('%A'))
      # Walk through the day in 30-minute steps (epoch seconds)
      (date.beginning_of_day.to_i .. date.end_of_day.to_i).step(30.minutes).each do |time|
        hhmm = Time.at(time).strftime('%R')
        # Keep times from 14:00 through 17:00 inclusive
        if hhmm >= '14:00' && hhmm <= '17:00'
          puts "Create Slot Date: #{date.strftime('%D')} Time: #{Time.at(time)}"
        end
      end
    end
  end
end
I hope that this helps
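The filtering rule itself (Monday to Friday, between 14:00 and 17:00) can also be written as a predicate over existing slot times. Below is a plain-Python sketch of that rule, not an ActiveRecord or Postgres query; the function name in_delivery_window and the sample slots are made up, and both ends of the window are assumed to be inclusive:

```python
from datetime import datetime, time


def in_delivery_window(moment: datetime) -> bool:
    """True for Monday-Friday times from 14:00 through 17:00 inclusive."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and time(14, 0) <= moment.time() <= time(17, 0)


slots = [
    datetime(2016, 12, 30, 12, 30),  # Friday, too early
    datetime(2016, 12, 30, 14, 0),   # Friday, inside the window
    datetime(2016, 12, 30, 15, 30),  # Friday, inside the window
    datetime(2016, 12, 31, 15, 0),   # Saturday, excluded
]
matching = [s for s in slots if in_delivery_window(s)]
print(matching)
```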

Pandas time series: groupby and sum from noon to noon

My pandas dataframe is structured like this (with 'date' as index):
starttime duration_seconds
date
2012-12-24 11:52:00 31800
2012-12-23 0:28:00 35940
2012-12-22 2:00:00 26820
2012-12-21 1:57:00 23520
2012-12-20 1:32:00 23100
2012-12-19 0:50:00 25080
2012-12-18 1:17:00 24780
2012-12-17 0:38:00 25440
2012-12-15 10:38:00 32760
2012-12-14 0:35:00 23160
2012-12-12 22:54:00 3960
2012-12-12 0:21:00 24060
2012-12-10 23:45:00 900
2012-12-11 11:00:00 24840
2012-12-10 0:27:00 25980
2012-12-09 19:29:00 4320
2012-12-09 3:00:00 29880
2012-12-08 2:07:00 34380
I use the following to groupby date and sum the total seconds each day:
df_sum = df.groupby(df.index.date).sum()
What I'd like to do is sum duration_seconds from noon on one day to noon on the following day. Is there an elegant (pandas) way of doing this? Thanks in advance!

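pd.TimeGrouper is a custom groupby class for time-interval grouping of NDFrames with a DatetimeIndex, TimedeltaIndex or PeriodIndex. (If your dataframe index is using date-strings, you'll need to convert it to a DatetimeIndex first by using df.index = pd.DatetimeIndex(df.index). Note that the output below was computed with an index holding only the dates, so every row sits at 00:00 and falls into the bin that starts at noon on the previous day; for a true noon-to-noon split, the index needs to include starttime as well.)
df.groupby(pd.TimeGrouper('24H')).sum() groups df using 24-hour intervals starting at time 00:00:00.
df.groupby(pd.TimeGrouper('24H', base=12)).sum() groups df using 24-hour intervals starting at time 12:00:00: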
In [90]: df.groupby(pd.TimeGrouper('24H', base=12)).sum()
Out[90]:
duration_seconds
2012-12-07 12:00:00 34380.0
2012-12-08 12:00:00 34200.0
2012-12-09 12:00:00 26880.0
2012-12-10 12:00:00 24840.0
2012-12-11 12:00:00 28020.0
2012-12-12 12:00:00 NaN
2012-12-13 12:00:00 23160.0
2012-12-14 12:00:00 32760.0
2012-12-15 12:00:00 NaN
2012-12-16 12:00:00 25440.0
2012-12-17 12:00:00 24780.0
2012-12-18 12:00:00 25080.0
2012-12-19 12:00:00 23100.0
2012-12-20 12:00:00 23520.0
2012-12-21 12:00:00 26820.0
2012-12-22 12:00:00 35940.0
2012-12-23 12:00:00 31800.0
Documentation on pd.TimeGrouper is a little sparse. It is a subclass of pd.Grouper and thus many of its parameters have the same meaning as those documented for pd.Grouper. You can find more examples of pd.TimeGrouper usage in the Cookbook. I found the base parameter by inspecting the source code. The base parameter in pd.TimeGrouper has the same meaning as the base parameter in df.resample, which is not surprising since df.resample is implemented using pd.TimeGrouper.
In fact, come to think of it, another way to compute the desired result is
df.resample('24H', base=12).sum()
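Current pandas releases no longer ship pd.TimeGrouper or the base argument. A rough equivalent, assuming pandas 1.1 or newer (where resample and pd.Grouper accept an offset argument), might look like the sketch below. The five rows are taken from the question, with date and starttime combined into one timestamp index, which is an assumption about how the data would be prepared:

```python
import pandas as pd

# A few rows from the question, indexed by full timestamp (date + starttime)
df = pd.DataFrame(
    {"duration_seconds": [29880, 4320, 25980, 900, 24840]},
    index=pd.DatetimeIndex(
        [
            "2012-12-09 03:00",
            "2012-12-09 19:29",
            "2012-12-10 00:27",
            "2012-12-10 23:45",
            "2012-12-11 11:00",
        ]
    ),
)

# 24-hour bins whose edges are shifted from midnight to noon
noon_to_noon = df.resample("24h", offset="12h")["duration_seconds"].sum()
print(noon_to_noon)
```

Each label is the noon that opens the bin, so the 2012-12-09 12:00 row covers everything from noon on the 9th up to, but not including, noon on the 10th.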
