Redshift uses wrong timezone when loading from S3 - amazon-web-services

Redshift allows loading time-related types from their epoch representation, as stated here, by setting the timeformat 'epochmillisecs' parameter of the COPY command. It works great for TIMESTAMP columns, but something is broken for TIME columns.
Epoch value 1636984022000 (ms) that corresponds to 13:47:02.572000 is being imported as 22:41:11 by Redshift. I can see that it matches 13:47:02 UTC at PST (-08:00).
I tried alter user awsuser set timezone to 'UTC' and set timezone to default, but neither seems to help. What am I missing?
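As a quick sanity check of the numbers in the question, the epoch value can be decoded outside Redshift. This is a minimal Python sketch, not Redshift's own conversion; the fixed -08:00 offset for PST is an assumption taken from the question.

```python
from datetime import datetime, timedelta, timezone

EPOCH_MS = 1636984022000  # value quoted in the question

# Decode the epoch milliseconds as an aware UTC datetime.
utc_dt = datetime.fromtimestamp(EPOCH_MS / 1000, tz=timezone.utc)

# Assumption: PST modeled as a fixed -08:00 offset.
pst = timezone(timedelta(hours=-8), "PST")
pst_dt = utc_dt.astimezone(pst)

print(utc_dt.isoformat())         # 2021-11-15T13:47:02+00:00
print(pst_dt.time().isoformat())  # 05:47:02
```

Neither result is 22:41:11, which suggests the value seen in Redshift is not explained by a plain UTC-to-PST shift.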

Sometimes timezone issues can be hard to diagnose because SQL clients perform a conversion on the field when displaying data.
A trick I use is to convert the timestamp to TEXT so that the SQL client does not alter its contents. So, try selecting the data as SELECT field::TEXT to verify how it is actually being stored.
This prevents the SQL client from performing any nicely-intentioned timezone conversion and lets you see the 'real' underlying data.

Related

ISO 8601 string instead of datetime

I have faced this problem in other projects, but right now we're working in Django / Python with Vue.js and PostgreSQL.
Nearly every 'date' field in our model is truly a date, not a datetime. The business users don't care about a time-part, in fact storing any such value is misleading. A good example is the effective date on a tax rate. From the user point of view, it takes effect at midnight on the specified date.
If we store this as a Django DateTimeField or DateField, it requires a time-part. It could be 0000h, but the value has no meaning to the business, because they will never need to see or compare anything with the time-part. The Django functionality is great if we want to store a real datetime, put it into the database as UTC, and then display it to users in their local time through the automatic time zone conversion. But we don't. If the effective date is March 1, 2019, it should display as such regardless of the user's timezone. If the date is stored as a datetime (2019, 3, 1, 0, 0, 0) and entered by someone in Toronto, it will appear as the previous calendar day for another user in Vancouver. Definitely not what we want. We could kludge it by setting the time-part to 1200h, but really?
We also have potential problems, depending on the internal representation in the database, when using SQL or tools that access the schema directly (e.g. BI tools). How do we know what time zone applies to the datetime value?
So, we're thinking of using Django CharField with ISO 8601 format (YYYY-MM-DD) instead. It will sort properly, it can be compared easily (or directly in some tools like SQL), and can be displayed without reformatting if the client is willing to use the standard. If we need to do date arithmetic, we can use the Python Standard Libraries datetime and calendar to convert from/to string. We'll need to use those to catch SQL injection attacks anyway.
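A minimal sketch of the string round trip described above, using only the standard library. The helper name parse_iso_date is hypothetical; it rejects anything that is not a zero-padded YYYY-MM-DD string, which also covers malformed or hostile input.

```python
from datetime import date, datetime, timedelta

def parse_iso_date(text):
    """Return a date for a strict YYYY-MM-DD string, or raise ValueError."""
    parsed = datetime.strptime(text, "%Y-%m-%d").date()
    # strptime tolerates unpadded fields such as "2019-3-1"; reject those too.
    if parsed.isoformat() != text:
        raise ValueError(f"not in YYYY-MM-DD form: {text!r}")
    return parsed

effective = parse_iso_date("2019-03-01")
day_before = (effective - timedelta(days=1)).isoformat()

# Plain string sorting of ISO dates matches chronological order.
ordered = sorted(["2019-03-01", "2018-12-31", "2019-01-15"])

print(day_before)  # 2019-02-28
print(ordered)     # ['2018-12-31', '2019-01-15', '2019-03-01']
```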
We will also need to deal with date entry through a Datepicker, converting to the ISO 8601 string before storing and back again when displaying for edit.
It appears to be a better way to represent what the business needs, and it gets rid of any timezone conversion issues.
There is certainly a lot of comment on datetime and time zone handling, but I haven't found anyone taking this approach to storing true dates. Am I missing an important 'gotcha'? We're early enough in the project that we can go either way, so I'm hoping to confirm that this will work before refactoring becomes a big job.
Have you considered using DateField?
This will only store the date part and not the time.
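A DateField value comes back as a Python datetime.date, which has no time-part and no time zone, so it cannot shift to another calendar day. The sketch below contrasts the two types in plain Python; the fixed offsets for Toronto (-05:00) and Vancouver (-08:00) are assumptions that hold for standard time on March 1.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: fixed standard-time offsets stand in for the real zones.
TORONTO = timezone(timedelta(hours=-5), "EST")
VANCOUVER = timezone(timedelta(hours=-8), "PST")

# A datetime at midnight in Toronto is still the previous evening in Vancouver.
effective_dt = datetime(2019, 3, 1, 0, 0, tzinfo=TORONTO)
seen_in_vancouver = effective_dt.astimezone(VANCOUVER)

# A plain date carries no time zone, so there is nothing to convert.
effective_date = date(2019, 3, 1)

print(seen_in_vancouver.date().isoformat())  # 2019-02-28
print(effective_date.isoformat())            # 2019-03-01
```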

Django datetime field fractional support

I have Django 1.9.5 installed with MariaDB 10.1.13 (also tested with MySQL 5.6.30) (which was updated from MySQL 5.5) and I am trying to get fractional seconds support in a date time field.
I have created a test model to try and get this working, here is the definition
from django.db import models

class History(models.Model):
    date = models.DateTimeField(null=True)
then in the shell I have run the following
History(date=datetime.now()).save()
and then when I query it
type(History.objects.get(id=1).date)
I get
<type 'NoneType'>
even though the entry appears in the database.
I can also use the field in a query
History.objects.all().order_by('date')
I know it works because inspecting the data shows that the order has changed
But I need to be able to return the date so I can compare it with another.
I was using MySQL 5.5 without fractional support, but there are records in my database that have the same datetime value, so the order_by didn't work. I was ordering by id, and that worked while the records were entered chronologically, but now that this isn't the case I need fractional support.
Any ideas?
For this sort of situation it might be better to store the date and time as a Unix timestamp in the database. There you get millisecond accuracy. You can convert to Python datetime objects at any time with fromtimestamp.
Last but not least, you can easily convert unix timestamps with javascript to date times in the user's own timezone. A process that's far more complex with django or any other server side tech.
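A minimal sketch of the timestamp approach, assuming the value is kept in an integer column as milliseconds since the Unix epoch. The helper names are hypothetical, and integer timedelta arithmetic is used here in place of fromtimestamp to avoid floating-point rounding of the millisecond part.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)

def to_epoch_ms(aware_dt):
    """Convert an aware datetime to whole milliseconds since the epoch."""
    return (aware_dt - EPOCH) // ONE_MS

def from_epoch_ms(ms):
    """Convert stored milliseconds back to an aware UTC datetime."""
    return EPOCH + ms * ONE_MS

first = datetime(2016, 4, 20, 10, 30, 15, 120000, tzinfo=timezone.utc)
second = datetime(2016, 4, 20, 10, 30, 15, 870000, tzinfo=timezone.utc)

# Two events in the same second still sort correctly as integers.
stored = sorted([to_epoch_ms(second), to_epoch_ms(first)])
print(stored[0] < stored[1])             # True
print(from_epoch_ms(stored[0]) == first)  # True
```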

Django 1.6 filter by hour and time zone issue

In my application I use time zones (USE_TZ=True) and all the dates I create in my code are aware UTC datetime objects. I use django.utils.timezone.now for the current date and the following helper function to ensure all the dates in my instances are what I expect:
@classmethod
def asUTCDate(cls, date):
    if not timezone.is_aware(date):
        return timezone.make_aware(date, timezone.utc)
    # Convert rather than relabel, so an aware non-UTC value keeps its instant.
    return date.astimezone(timezone.utc)
I also enforced the check of naive/aware dates using this snippet (like suggested in the doc):
import warnings
warnings.filterwarnings(
    'error', r"DateTimeField .* received a naive datetime",
    RuntimeWarning, r'django\.db\.models\.fields')
As I understand it so far, this is the right way to proceed (this is a quote from the Django documentation: "The solution to this problem is to use UTC in the code and use local time only when interacting with end users."), and it seems that my app is handling dates very well. However, I have just implemented a filter against a model that makes use of the Django 1.6 __hour lookup, and it forces the extraction to be based on the user's time zone. The result is something like:
django_datetime_extract('hour', "object"."date", Europe/Rome) = 15
but this breaks my query, since some results I was expecting are not included in the set, but when I use a __range to search between dates it seems to work as expected (objects with a date in the range are returned)… so it seems to me that Django takes into account timezones in queries only for the __hour filter… but I don't understand why… I was supposing that UTC is used everywhere except in templates where the displayed dates are formatted according to user tz, but maybe that's not true.
So my questions are: is the way I'm working with time zones right? Is __hour filter wrong or what?
It seems as though you're doing the right thing with dates. However, the documentation for all the date-related functionality, such as filtering by hour, includes this note:
When USE_TZ is True, datetime fields are converted to the current time zone before filtering.
For the range filter, this note doesn't exist because range can be used to filter not only on dates but also on other types, such as integers and characters; i.e., it is not necessarily datetime-aware.
In essence the problem comes down to this: where do you draw the line between 'interacting with users' where times are in a local timezone, and what is internal where times are in UTC? In your case, you could imagine a user entering in a search box to search for hour==3. Does that mean for example that your form code should do the conversion between hour==3 and the UTC equivalent? This would then require a special forms.HourField. Or perhaps the value (3) should be fed directly to the query where we know that we're searching on an hour field and so a conversion is required.
We really have to follow the documentation on this one.
Values which are going to be filtered against date/time fields using any of the specialised date/time filtering functions will be treated as being in the user's local time zone.
If using the range filter for dates no time conversions occur so you are expected to convert the user's entered local time value to UTC.
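The conversion described above can be sketched in plain Python. The helper local_to_utc is hypothetical, and the fixed +02:00 offset is an assumption standing in for Europe/Rome in summer; real code would take the offset from a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed +02:00 offset for Europe/Rome during summer time.
ROME_SUMMER = timezone(timedelta(hours=2), "CEST")

def local_to_utc(naive_local, tz):
    """Attach the user's zone to a naive local value, then convert to UTC."""
    return naive_local.replace(tzinfo=tz).astimezone(timezone.utc)

# The user searches for 15:00 to 16:00 local time.
start_utc = local_to_utc(datetime(2014, 6, 1, 15, 0), ROME_SUMMER)
end_utc = local_to_utc(datetime(2014, 6, 1, 16, 0), ROME_SUMMER)

print(start_utc.isoformat())  # 2014-06-01T13:00:00+00:00
print(end_utc.isoformat())    # 2014-06-01T14:00:00+00:00
```

Under these assumptions, hour 15 for the user is hour 13 in UTC, which is the gap between what an hour filter in the current time zone matches and what is stored.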

Django Timezone Handling with Postgres

I recently upgraded a Django project from 1.3 to 1.5 in order to start using the timezone handling. Because I am a moron, my Django timezone was set to "America/New_York" instead of using UTC. Turning Django's timezone support on automatically sets Postgres to UTC. Now I'm running into issues with running direct SQL queries. I can't seem to get the timezone filtering correct. Here's what I'm doing:
1. Accepting a timestamp field from a user form
2. Swapping that to UTC in my view with stamp.astimezone(timezone('UTC'))
3. Passing that as a parameter (start.strftime('%Y-%m-%d %H:%M:00%z')) in a raw SQL query using a django.db.connection's cursor
The query that gets executed (logging to the console) looks correct:
SELECT to_char(created, 'YYYY-MM-DD HH12:MIam'),
       COALESCE(heat_flow_1, 0.0) / 1000.0, COALESCE(heat_flow_2, 0.0) / 1000.0,
       COALESCE(heat_flow_3, 0.0) / 1000.0
FROM results_flattenedresponse
WHERE installation_id = '66'
  AND created BETWEEN TIMESTAMPTZ '2013-04-26 13:00:00+0000' AND TIMESTAMPTZ '2013-04-26 16:00:00+0000'
If I copy and paste that into PGAdmin, I get what I'm looking for: a set of rows starting at 9am EDT. However, the dataset that comes back from the django.db.connection's cursor has all of the dates pushed forward 4 hours (the difference between EDT and UTC). The rest of the data is actually correct, but the dates are being treated as UTC and pushed to the user's active timezone (even if I call deactivate). I feel like I have a mess of bad ideas wired together trying to fix this now, so I'm not sure which parts are good ideas and which are bad.
EDIT/ UPDATE
I've been trying a number of other things here and still getting bad results, but only when I query the database directly. The stuff in steps 2 and 3 above seems immaterial. The one quick fix I can see is actually setting the timezone in the query, e.g., SET timezone='America/New_York'; and then undoing it but that seems like a very bad idea for data integrity.
The other strange bit: I've set the Django timezone setting to UTC, but when I download a local copy and look at the data it's still marked as if it were set to America/New_York. I don't know if that's due to a setting on my local server or if there's a bug where the data isn't being localized properly by Django when it gets inserted into the second (non-default) database, though if that were the case I expect my problem would have gone away.
Since the server is now in UTC time, the created comes back as UTC by default, so the first bit needs to be
SELECT to_char(created AT TIME ZONE %s, 'YYYY-MM-DD HH12:MIam')
and pass in the timezone you want to see. This isn't the best approach in the world, but it works for me here because the queries are all run inside an object that has the relevant timezone as a property.
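One way this could look from Python is sketched below. The function build_heat_flow_query is hypothetical and only assembles the SQL text and parameter list; the table and column names are taken from the query in the question, and the selected columns are trimmed for brevity.

```python
from datetime import datetime, timezone

def build_heat_flow_query(installation_id, start_utc, end_utc, tz_name):
    """Return (sql, params) with the display time zone passed as a parameter."""
    sql = (
        "SELECT to_char(created AT TIME ZONE %s, 'YYYY-MM-DD HH12:MIam'), "
        "COALESCE(heat_flow_1, 0.0) / 1000.0 "
        "FROM results_flattenedresponse "
        "WHERE installation_id = %s "
        "AND created BETWEEN %s AND %s"
    )
    # Parameter order must match the order of the %s placeholders.
    params = [tz_name, installation_id, start_utc, end_utc]
    return sql, params

sql, params = build_heat_flow_query(
    66,
    datetime(2013, 4, 26, 13, 0, tzinfo=timezone.utc),
    datetime(2013, 4, 26, 16, 0, tzinfo=timezone.utc),
    "America/New_York",
)
print(sql.count("%s") == len(params))  # True
```

The pair would then be handed to a django.db.connection cursor as cursor.execute(sql, params), leaving the quoting of the values to the database driver.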

How should I store date/time objects in SQL?

I've had this question for a long time. The problem is this: most SQL servers I've worked with (namely MySQL) don't store timezone information with dates, so I assume they simply store dates as relative to the server's local timezone. This creates an interesting problem in the case where I'll have to migrate servers to different timezones, or if I create a cluster with servers spread out over different datacenters, or if I need to properly translate date/times into local values.
For example, if I persist a date like 2011-08-12 12:00:00 GMT-7, it is persisted as 2011-08-12 12:00:00. If I have an event which happens across the world at a specific time, I have to assume that my server is storing dates in GMT-0700, (let's not even add daylight savings time into the mix) then translate them into dates depending on each user's local timezone. If I have multiple servers storing dates in their own timezones, all of this fails miserably.
For frameworks like Hibernate and Django, how do they deal with this problem, if at all? Am I missing something, or is this a significant problem?
As I see it, the best choices are:
Convert all times to UTC when storing them in the database, and localize them from UTC for display
Store the UTC offset in minutes (at least one modern time zone is a multiple of ten minutes offset from UTC) in a separate column from the date/time
Store the timestamp as a string
In my current project we encountered this issue (we use Postgres) and decided to store all times in UTC and convert as needed in the application. No separate storage of the time zone for us. We also decided that all client-server interaction using timestamps would be in UTC, and that local time zones would ONLY be considered for user interaction. This has worked out well so far.
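The first two options can be combined, as in this minimal sketch: the value is stored as naive UTC, with the original UTC offset kept in minutes alongside it. The helper names to_storage and from_storage are hypothetical, and the sample value is the one from the question.

```python
from datetime import datetime, timedelta, timezone

def to_storage(aware_local):
    """Split an aware datetime into (naive UTC datetime, offset in minutes)."""
    utc_naive = aware_local.astimezone(timezone.utc).replace(tzinfo=None)
    offset_min = aware_local.utcoffset() // timedelta(minutes=1)
    return utc_naive, offset_min

def from_storage(utc_naive, offset_min):
    """Rebuild the original local wall-clock time from the stored pair."""
    tz = timezone(timedelta(minutes=offset_min))
    return utc_naive.replace(tzinfo=timezone.utc).astimezone(tz)

# 2011-08-12 12:00:00 at GMT-7, as in the question.
local = datetime(2011, 8, 12, 12, 0, tzinfo=timezone(timedelta(hours=-7)))
utc_naive, offset_min = to_storage(local)

print(utc_naive.isoformat())  # 2011-08-12T19:00:00
print(offset_min)             # -420
print(from_storage(utc_naive, offset_min) == local)  # True
```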
Since you tagged this question with Django, I'll add that the pytz module is extremely useful for dealing with locale timezone conversion.
Your answer for MySQL lies on this page: MySQL Server Time Zone Support
Basically, MySQL offers automatic time zone support for columns stored in UTC (TIMESTAMP) but not for columns that aren't (DATE, TIME, and DATETIME). For UTC columns you can set the time zone from the client using SET time_zone = timezone;. For non-UTC columns you must calculate it yourself.
You are so right. I've often run into this, and I tend to look the TZ up and store it in a static "VARS" table so that at least I can move it later.
Side notes:
Interestingly, the DATETIME manual doesn't even mention it
It is affected by NOW() and CURTIME() as shown here
HTH