timezones and doing analytics on tables - ruby-on-rails-4

This strange behavior recently came to my attention while I was testing my Rails app in my local environment, where I use an around_filter to set the time zone to that of the registered user (the default time zone is UTC).
I registered a new user in my app. My local time was 10pm GMT-5 (March 3), and this user's created_at time was saved to the database as 4am UTC (March 4). Now, I know that this time is saved in the database according to the time zone settings, but here comes the problem:
I use a graph to show daily registered users, and I call the following code to get the number of users registered in the last few days:
from ||= Date.today - 1.month
to ||= Date.today
where(created_at: from..to).group('DATE(created_at)').count
It reports that this user registered on March 4, whereas from my perspective the registration happened on March 3.
My question is:
How should I call the where function and group by the created_at column so that the dates come out correctly (according to my time zone)?
Or is there something else that I should be doing differently?

I'm not a rubyist, so I'll let someone else give the specific code, but I can answer from a general algorithmic perspective.
If you're storing UTC in the database, then you need to query by UTC as well.
In determining the range of the query (the from and to), you'll need to know the start and stop times for "today" in your local time zone, and convert those each to UTC.
For example, I'm in the US Pacific time zone, and today is March 7th, 2015.
from: 2015-03-07T00:00:00-08:00 = 2015-03-07T08:00:00Z
to: 2015-03-08T00:00:00-08:00 = 2015-03-08T08:00:00Z
If you want to subtract a month like you showed in the example, do it before you convert to UTC. And watch out for daylight saving time. There's no guarantee the offsets will be the same.
Also, you'll want to use a half-open interval range that excludes the upper bound. I believe in Ruby that this is done with three dots (...) instead of two.
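The range calculation described above can be sketched in Python (the answer deliberately leaves the Ruby code to others). This is a minimal illustration, assuming Python 3.9+ with the standard zoneinfo module; the helper name local_day_range_utc is hypothetical and not from the original post.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def local_day_range_utc(first_day, last_day, tz_name):
    """Return (start, end) in UTC covering first_day..last_day in tz_name.

    The interval is half-open: start <= t < end.
    """
    tz = ZoneInfo(tz_name)
    # Midnight at the start of the first local day.
    start_local = datetime.combine(first_day, time.min, tzinfo=tz)
    # Midnight at the start of the day after the last local day (exclusive).
    end_local = datetime.combine(last_day + timedelta(days=1), time.min, tzinfo=tz)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


start, end = local_day_range_utc(date(2015, 3, 7), date(2015, 3, 7), "America/Los_Angeles")
print(start.isoformat())  # 2015-03-07T08:00:00+00:00
print(end.isoformat())    # 2015-03-08T08:00:00+00:00
```

Because the date arithmetic happens on local dates before the conversion, a range that crosses a daylight saving transition still gets the right offset at each end.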
Grouping is usually a bit more difficult. I assume this is a query against a database, right? Well, if the db you're querying has time zone support, then you could use it to convert the date to your time zone before grouping. Something like this (pseudocode):
groupby(DATE(CONVERT_TZ(created_at,'UTC','America/Los_Angeles')))
Since you didn't state what DB you're using, I can't be more specific. CONVERT_TZ is available on MySQL, and I believe Oracle and Postgres both have time zone support as well.
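If the database cannot do the conversion, the same grouping can be done in application code after fetching the UTC timestamps. The Python sketch below is only an illustration of that idea, not the database approach described above. It assumes the asker's GMT-5 zone is America/New_York (the question does not say), and count_by_local_date is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def count_by_local_date(utc_timestamps, tz_name):
    """Count aware UTC datetimes per calendar date as seen in tz_name."""
    tz = ZoneInfo(tz_name)
    return Counter(ts.astimezone(tz).date() for ts in utc_timestamps)


created_at = [
    datetime(2015, 3, 4, 4, 0, tzinfo=timezone.utc),   # 23:00 on March 3 in New York
    datetime(2015, 3, 4, 15, 0, tzinfo=timezone.utc),  # 10:00 on March 4 in New York
]
counts = count_by_local_date(created_at, "America/New_York")
for day, n in sorted(counts.items()):
    print(day, n)
# 2015-03-03 1
# 2015-03-04 1
```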

Date.today uses your system's time zone (which, by the way, should always be UTC). If Rails is also configured for UTC, simply use Time.zone.now.to_date to get the current UTC date.
Otherwise, do:
Time.use_zone('UTC') do
  Time.zone.now.to_date
end
After this, display each created_at value in your current time zone with object.created_at.in_time_zone('EST').

Related

Storing wall-clock datetimes in Django/Postgres

I want to save a future wall-clock datetime for events in Django (I have timezone string stored separately).
I can't simply use the DateTimeField because it enforces timestamp with time zone and always saves time in current timezone. It doesn't handle DST or possible timezone changes between current date and the date of actual event.
I could use any of these options:
Pick any timezone to store timestamps and always throw this timezone away before applying actual timezone in Python.
Split timestamp to DateField and TimeField.
Store datetime as string.
Custom field that stores datetime as timestamp without time zone.
but it makes queries more difficult and seems quite weird.
Are there any better options I'm missing? This use case seems quite common, so I guess there is a better way to do it?
EDIT: my use case:
Let's say my user wants to book an appointment for 2019-12-20 10:00 and currently it's 2019-03-10. I know the timezone of this user (it's stored separately as a string like 'US/Eastern').
If I assume that EST starts on November 3, 2019, the best I can do is store the timestamp as 2019-12-20 15:00:00+00:00 (or 2019-12-20 10:00-05:00). I don't want this because:
I have no idea if my tzdata has correct information for future datetime
Even if it currently does, I have no idea if there would be any unexpected change in US/Eastern timezone and it becomes worse when it's not US. Future DST changes are not guaranteed.
If user moves to different timezone, I'll have to recalculate every single appointment while taking care about DST.
If tzdata changes during this recalculation... let's not think about that.
I'd prefer to store future dates as a naive datetime + timezone string like 'US/Eastern' and (almost) never construct a tz-aware datetime for any date more than a week away. Django + postgres currently forces me to use timestamp with time zone, which is great for logs and past events, but it has a fixed offset (not even a timezone name), so it isn't a good fit for future wall-clock datetimes.
For this use case, let's say that I don't care about ambiguous times: not many users want to book at 02:00 AM.
I see a few possible solutions:
Set USE_TZ = False and TIME_ZONE = 'UTC' and use calendar times. No conversions will be done, so essentially you're just storing the calendar time and getting it back as a naive datetime. The main problem is that this setting is global, and is not a good one for many uses (e.g. auto_now).
As above, but set USE_TZ = True. As long as you express your calendar times in UTC, there won't be any untoward conversions. The problem here is that you'll be getting aware datetimes, so you'll have to take care to ignore or remove the time zone everywhere.
Use a separate DateField and TimeField. This may or may not be a good solution depending on what kind of queries you're trying to run.
Create your own field that uses timestamp without time zone. (Or perhaps it already exists?)
Note that this issue has nothing to do with past versus future. It's about wanting to use a fixed moment in time versus a calendar (or wall clock) time. The points you raised are certainly valid objections to using a point in time to represent a calendar time.
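To make the distinction concrete, here is a small Python sketch of a calendar time kept as a naive datetime plus a zone name and only resolved to an instant on demand. It is independent of Django. The Appointment class is hypothetical, and America/New_York is used as the canonical identifier for the zone the question calls 'US/Eastern'.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class Appointment:
    wall_time: datetime  # naive: no tzinfo attached
    tz_name: str         # IANA identifier stored alongside

    def as_utc(self):
        """Resolve to an instant using whatever tz rules are installed right now."""
        aware = self.wall_time.replace(tzinfo=ZoneInfo(self.tz_name))
        return aware.astimezone(timezone.utc)


appt = Appointment(datetime(2019, 12, 20, 10, 0), "America/New_York")
print(appt.as_utc().isoformat())   # 2019-12-20T15:00:00+00:00

# If the user moves, only the zone name changes; the wall time stays 10:00.
moved = Appointment(appt.wall_time, "America/Los_Angeles")
print(moved.as_utc().isoformat())  # 2019-12-20T18:00:00+00:00
```

Since the instant is computed at read time, a later tzdata update changes the result without any stored value having to be rewritten.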

How to set Facebook SDK queries specific to time zones?

I want to be able to pull data depending on the time zone I'm currently located in.
For example, I have this query at the moment:
$response = $fb->get('/pageid/insights/page_impressions?period=day');
And I get this response:
But how would I go about showing the data in Eastern Time (ET), currently EDT or UTC-4? I'm assuming it's an additional "parameter" added to the query, but what would it be?
Unfortunately, you can't. All data is aggregated by days according to a fixed offset of UTC-7 (even when the Pacific time zone is on UTC-8).
You could adjust the time zone of the resulting timestamp, but that would be misleading, as the totals would no longer match the true daily totals for the time zone specified.
Really, an API like this (or any operation grouping timestamps by date) should consider a time zone - and that time zone should be specified by full IANA time zone identifier, such as America/New_York. Consider that UTC-4 is not sufficient, because US Eastern Time alternates between EST (UTC-5) and EDT (UTC-4).
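The effect of the bucketing zone on daily totals can be shown with a short Python sketch. The event data here is made up purely for illustration (one hypothetical impression per hour for 48 hours); it is not Facebook data and does not use the Facebook SDK.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Hypothetical raw events: one per hour for 48 hours from 2015-07-01T00:00Z.
events = [datetime(2015, 7, 1, tzinfo=timezone.utc) + timedelta(hours=h) for h in range(48)]

fixed_utc_minus_7 = timezone(timedelta(hours=-7))
new_york = ZoneInfo("America/New_York")

# Bucket the same events by calendar day under each zone.
by_fixed = Counter(e.astimezone(fixed_utc_minus_7).date() for e in events)
by_ny = Counter(e.astimezone(new_york).date() for e in events)

for day in sorted(by_fixed):
    print(day, by_fixed[day], by_ny[day])
# 2015-06-30 7 4
# 2015-07-01 24 24
# 2015-07-02 17 20
```

Once only the per-day totals are available, the Eastern Time totals cannot be recovered from the UTC-7 ones, which is why relabeling the timestamps afterwards is misleading.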
You could request Facebook add this feature, but AFAIK they do not currently offer it.
See also this related answer.

Django use UTC offsets for current timezone

Django attempts to address the timezone problem by storing dates internally in UTC and converting them to the client's timezone for display. This sounds fine and good in theory, until you realize two major things:
Many timezones can and do exist inside of the same UTC offset.
Since there are no timezone HTTP headers, we need to determine the timezone of the client manually, and this requires the use of JavaScript. However, JavaScript can only reliably determine the UTC offset of the client and may not guess the correct timezone.
With these two problems in mind, I assume a simple solution would be to ignore timezones, DST, etc. altogether and rely instead on the client's current UTC offset. On each page load, JavaScript on the client would update the client's cookie with the client's current UTC offset and middleware in Django would load that value for each request.
Here is the problem: Django makes use of get_current_timezone(), which retrieves its data from the value set when timezone.activate() was last called. timezone.activate() takes a timezone object as an argument.
Is there a way to use timezone.activate() with only a UTC offset?
The solution you describe, of getting the client's current UTC offset and sending back to the server, either via a cookie, or some other mechanism, is a common approach. Unfortunately it's flawed. Just because people do this doesn't make it a good idea.
The problem is that the offset you gather from the client is for a specific moment in time. However, you may not be working with that same moment in time on the server.
For example, you might call new Date().getTimezoneOffset() on the client, which gives you a value of 480, which is 480 minutes West of UTC, or UTC-08:00 (note the sign inversion). So you pass 480 to the server, load a date from the DB in UTC, and apply the offset. Except, perhaps the date you loaded was from several months ago, and the client's offset for that date was UTC-07:00. You have therefore applied the wrong offset, and produced a resulting value that is an hour off from what it should be.
A time zone cannot be identified by an offset alone. A time zone identifier looks like "America/Los_Angeles", not just UTC-8. This is a very common mistake. Read more under "time zone != offset" in the timezone tag wiki.
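The failure described above is easy to reproduce. The following Python sketch assumes the browser reported 480 during winter and that the stored timestamp is from the previous summer; both values are invented for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Offset reported by the browser today, in winter: getTimezoneOffset() == 480.
js_offset_minutes = 480
fixed = timezone(timedelta(minutes=-js_offset_minutes))  # note the sign inversion

# A timestamp loaded from the database, from the previous summer.
stored = datetime(2014, 7, 1, 19, 0, tzinfo=timezone.utc)

wrong = stored.astimezone(fixed)
right = stored.astimezone(ZoneInfo("America/Los_Angeles"))

print(wrong.isoformat())  # 2014-07-01T11:00:00-08:00
print(right.isoformat())  # 2014-07-01T12:00:00-07:00
```

The fixed offset yields a local time that is an hour off, while the named zone applies the offset that was in effect on that date.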
There are only two correct ways to handle this scenario:
Use a library like jsTimeZoneDetect or moment-timezone to guess the time zone of the browser, then let the user pick their time zone, defaulting to the guessed value. You can then use the selected or guessed time zone in your server-side code with Django or whatever.
Send only UTC to the client, and do the conversion from UTC to local time in the browser using JavaScript. (The browser understands the behavior of the local time zone where it is running, even if it has trouble identifying it.) The catch here is that older browsers might possibly convert older dates incorrectly due to a browser bug. But for the most part, this is still a reasonable approach.

How to communicate time data between different zone?

In my Django app I've got a Task model with some date and time fields:
from django.db import models

class Task(models.Model):
    date = models.DateField()
    start_time = models.TimeField(help_text='hh:mm')
    end_time = models.TimeField(help_text='hh:mm')
    # more stuff
I'll send some Task instances to some Android clients that will be in a time zone (TZ1) different from my server time zone (TZ2).
The start_time and end_time fields must be set to the target time zone (TZ1), i.e. if I enter '13:00' in the start_time field in the Task admin, it should be '13:00' in TZ1.
How can I set the start_time and end_time values to be TZ1 times? If I leave the values entered in the default admin I guess the times will be set to the server time zone (TZ2), right?
Then what's the best format to send these values (through JSON) to the Android clients to get the correct TZ2 time?
Now I'm using Python Datetime's isoformat(), which gives something like
2013-02-11T13:17:23.811680
but it has no time zone data...
This is not the best way to handle timezones.
The best way is to convert times to UTC as early as possible and convert them back as late as possible.
In other words, if I enter the current time here as Feb 11, 21:03, it should never be stored like that. Instead it should be changed to UTC before anything else happens.
That way, no matter what happens to it, it stays correct. If I send it to Inner Mongolia, it should stay as UTC right up until the point someone wants to look at it. Then and only then should it be converted (and for display only).
Following that rule will save you a lot of grief in any software that has to work across multiple timezones. Trust me on that: we fixed up a major telco after they'd implemented some hideous system that sent timezones across the wire, meaning that every point had to be able to convert to and from every timezone.
Getting them into UTC as quickly as possible, and only getting them back on demand, saved bucketloads of time and money.
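A rough Python sketch of that rule follows: convert to UTC when the value is entered, send UTC over the wire, and convert only for display. The function names and the two zones (Australia/Perth for the person entering the time, Europe/Madrid for the client) are assumptions made for the example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_wire(local_naive, tz_name):
    """Interpret a naive local time in tz_name and serialize it as UTC ISO 8601."""
    aware = local_naive.replace(tzinfo=ZoneInfo(tz_name))
    return aware.astimezone(timezone.utc).isoformat()


def from_wire(text, tz_name):
    """Parse a UTC ISO 8601 string and convert it for display in tz_name."""
    return datetime.fromisoformat(text).astimezone(ZoneInfo(tz_name))


wire = to_wire(datetime(2013, 2, 11, 21, 3), "Australia/Perth")
print(wire)  # 2013-02-11T13:03:00+00:00

shown = from_wire(wire, "Europe/Madrid")
print(shown.isoformat())  # 2013-02-11T14:03:00+01:00
```

Unlike the bare isoformat() output of a naive datetime, the string on the wire carries an explicit +00:00, so the receiver never has to guess which zone it was written in.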

Convert datetime from one timezone to another (native C++)

Customers from around the world can send certain 'requests' to my server application. All these customers are located in many different time zones.
For every request, I need to map the request to an internal C++ class instance. Every class instance has some information about its 'location', which is also indicated by a time zone.
Every customer can send requests relating to instances belonging to different time zones. To spare my customers from converting everything themselves to the time zone of the 'target' instance, I have to convert everything myself from one time zone to another. However, in native (unmanaged) C++ I can only find functions to convert between local time and GMT, not to or from a time zone other than the current one.
I could ask my customers to send every date time in UTC or GMT, but that does not solve my problem, as I still have to convert this to the time zone of the 'instance', which can be any time zone in the world.
I also don't seem to find a Windows function that does this. What I do find is a managed .Net class that does this, but I want to keep my application strictly unmanaged.
Are there any Windows (XP, Vista, 7, 2003, 2008) functions that I can use (and which I overlooked in the documentation), or are there any other free algorithms that can convert between one time zone and the other?
Notice that it is not the GMT-difference that is posing the problem, but the actual DST-transition moment that seems to depend on the time zone. E.g:
Western Europe goes from non-DST to DST the last Sunday before April 1st.
USA goes from non-DST to DST the 2nd Sunday after March 1st.
China has no DST.
Australia goes from non-DST to DST the 1st Sunday after October 1st.
All this DST-transition information is available somewhere in the Windows registry. The problem is: which Windows function can I use to exploit this information?
I don't know of a way to extract information about other TimeZones via the API: I've seen it done by querying the registry though (we do this in a WindowsCE-based product).
The TimeZones are defined as registry keys under
HKLM\Software\Microsoft\Windows NT\CurrentVersion\Time Zones
Each key contains several values, and the one which tells you about offsets & Daylight Savings is the TZI value. This is a binary blob, and it represents this structure:
typedef struct
{
    LONG m_nBias;
    LONG m_nStandardBias;
    LONG m_nDaylightBias;
    SYSTEMTIME m_stcStandardDate;
    SYSTEMTIME m_stcDaylightDate;
} TZI;
Look up MSDN's TIME_ZONE_INFORMATION page (http://msdn.microsoft.com/en-us/library/ms725481(v=VS.85).aspx) for how to interpret the Bias fields, and especially the StandardDate and DaylightDate fields -- they are gently abused to support constructs like "the last Saturday in April".
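As a language-neutral illustration of the blob layout and the "nth weekday" encoding, here is a Python sketch that unpacks such a structure and resolves a recurring rule to a date. The sample blob is built in code to mimic US Pacific rules and is not read from a real registry. The function names are hypothetical, and absolute dates (wYear != 0) are not handled.

```python
import calendar
import struct
from datetime import date

# Three 32-bit signed biases, then two SYSTEMTIME structs of eight 16-bit words.
TZI_FORMAT = "<3l8H8H"


def parse_tzi(blob):
    fields = struct.unpack(TZI_FORMAT, blob)
    return {
        "bias": fields[0],
        "standard_bias": fields[1],
        "daylight_bias": fields[2],
        "standard_date": fields[3:11],
        "daylight_date": fields[11:19],
    }


def transition_day(systemtime, year):
    """Resolve a recurring SYSTEMTIME rule (wYear == 0) to a date in `year`.

    wDayOfWeek counts from Sunday = 0; wDay is the occurrence (1-4, 5 = last).
    """
    _, month, day_of_week, occurrence = systemtime[:4]
    # Python's weekday() counts from Monday = 0, so shift the Windows value.
    target = (day_of_week - 1) % 7
    days = [
        d for d in range(1, calendar.monthrange(year, month)[1] + 1)
        if date(year, month, d).weekday() == target
    ]
    return date(year, month, days[min(occurrence, len(days)) - 1])


# Illustrative blob: bias 480 min, DST bias -60 min,
# standard time from the 1st Sunday of November, DST from the 2nd Sunday of March.
sample = struct.pack(TZI_FORMAT, 480, 0, -60,
                     0, 11, 0, 1, 2, 0, 0, 0,
                     0, 3, 0, 2, 2, 0, 0, 0)
tzi = parse_tzi(sample)
print(transition_day(tzi["daylight_date"], 2015))  # 2015-03-08
print(transition_day(tzi["standard_date"], 2015))  # 2015-11-01
```

The same field order and rule interpretation carry over directly to a native C++ reader of the registry value.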
HTH