getstream aggregate feed - group by time with strftime

My goal is to aggregate activities together, but only if they happen within x minutes of each other.
It looks like I need to use the strftime syntax. Is this possible with that syntax? I am also confused about which activity property I should assign this group string to.
Thanks!

You can use strftime to ensure activities that happen in the same hour or on the same day are grouped together. It needs to be a fixed window, though; it can't be based on the distance between activities.
This blog post explains the underlying system for aggregation:
http://blog.getstream.io/aggregated-feeds-demystified/
Note that on enterprise plans we can set up custom aggregation formats that handle more complex use cases (such as the grouping you're looking for).
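To make the fixed-window point concrete, here is a minimal sketch in plain Python (not Stream's own aggregation template syntax). The `group_key` helper and the sample activities are hypothetical; the only assumption is that the group string is built from the verb plus a strftime-formatted timestamp. Two activities three minutes apart still land in different groups when they straddle the hour boundary.

```python
from collections import defaultdict
from datetime import datetime


def group_key(verb, when, fmt="%Y-%m-%d %H"):
    """Build a fixed-window group string: one bucket per verb per clock hour."""
    return f"{verb}_{when.strftime(fmt)}"


# Hypothetical activities: (verb, timestamp)
activities = [
    ("like", datetime(2015, 1, 1, 10, 5)),
    ("like", datetime(2015, 1, 1, 10, 58)),
    ("like", datetime(2015, 1, 1, 11, 1)),  # only 3 minutes after the previous one
]

groups = defaultdict(list)
for verb, when in activities:
    groups[group_key(verb, when)].append(when)

for key, members in sorted(groups.items()):
    print(key, len(members))
```

The 10:58 and 11:01 activities end up in separate groups because the window is tied to the clock hour, not to the gap between activities.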

Related

What is the right way to query with different filters on DynamoDB?

I save my order data in a DynamoDB table. The partition key is orderId and the sort key is timestamp. Each order has many other attributes such as category, userName, price, items, and status. I am going to build a filter service that lets clients query orders based on these attributes. I'd also like to add a limit to the query for pagination. But I have run into some limitations in DynamoDB.
In order to support querying different fields, I have two options:
1. Create a GSI for each attribute. This is very expensive, but querying each attribute is very performant. This solution doesn't support combining multiple attributes in the filter.
2. Attach a filter expression to a SCAN to include the attribute conditions. SCAN is not very performant in the first place. Also, the filter expression is applied after the limit, which means a response is very likely to contain fewer items than the limit the user requested.
So what is a good way to achieve this in DynamoDB?
Unfortunately, there is no magic way to solve your problems, and there is no DynamoDB feature that you missed. Indeed, as you said, making each of the attributes available for efficient queries requires a GSI, which will cost you additional money, but that's reasonable. Indeed, as you said, there is no efficient way to search for an intersection of requirements on two different attributes. And indeed, the "limit" feature doesn't quite do what you want, so you'll need to emulate your page size in the client code (asking for more pages until the desired number of items is received), potentially with unacceptably high latency.
It sounds like what you really need is a search engine. These have exactly the features that you asked for. You'll still be paying for these features (indexing of individual columns still takes up CPU and disk space, and intersecting multiple attribute searches still requires significant work at query time), but search engines are designed for exactly these operations, and do them more efficiently and with lower latency (which is important for interactive searches, the bread and butter of search engines).
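The client-side emulation of a page size mentioned above can be sketched as follows. This is an illustration only: `fake_scan` is a hypothetical in-memory stand-in for a DynamoDB Scan (it is not a boto3 call), written to mimic the behavior described in the question, where the limit caps the items evaluated and the filter runs afterwards. `fetch_filtered_page` is likewise a made-up helper name.

```python
def fake_scan(table, start, limit, predicate):
    """Stand-in for a Scan: `limit` caps items evaluated, the filter is applied after."""
    chunk = table[start:start + limit]
    next_start = start + limit if start + limit < len(table) else None
    return [item for item in chunk if predicate(item)], next_start


def fetch_filtered_page(table, predicate, page_size, start=0):
    """Keep requesting until page_size matches are collected or the table ends."""
    matched = []
    cursor = start
    while cursor is not None and len(matched) < page_size:
        # Ask only for what is still missing, so the page never overshoots
        # and the returned cursor stays valid for the next page.
        remaining = page_size - len(matched)
        items, cursor = fake_scan(table, cursor, remaining, predicate)
        matched.extend(items)
    return matched, cursor


orders = [
    {"orderId": i, "status": "shipped" if i % 3 == 0 else "pending"}
    for i in range(20)
]
page, cursor = fetch_filtered_page(
    orders, lambda o: o["status"] == "shipped", page_size=3
)
print([o["orderId"] for o in page], cursor)
```

Requesting only the remaining count keeps the cursor exact but costs more round trips when few items match, which is the latency problem the answer warns about.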
You can add the limit for pagination using the limit attribute in the query. But can you please be more specific about your access patterns: are your clients going to query all the orders, or only the orders belonging to them?

Kendo datasource and templates, specifically for Scheduler widget

I've noticed that the scheduler widget does things a bit differently from other widgets. In fact, I read in the documentation that the DS is a different one:
"http://docs.kendoui.com/api/framework/schedulerdatasource"
Anyways, on to my two questions.
1. When I was writing a template for the day cells, I noticed that if I used a value called 'date' it would automatically use the correct date value for that day cell. But I never created this date variable, nor did I include it in my datasource. So where did it come from? If it's provided by the framework, what other values like this one are available to me? Where can I find documentation on this?
2. For Kendo widgets, when you apply a datasource and a template, each datasource item is automatically mapped to one item in the widget (e.g. one row in the grid, one item in the list view, etc.). It's a one-to-one correlation. But this is not the case for the scheduler datasource since, as I stated above, it is a different type of datasource (a SchedulerDataSource). The scheduler datasource mandates that each item have a start date and an end date so it can be mapped to the corresponding cell. Hence, this destroys the one-to-one relationship between datasource item and day [template]. How can I revert to the behavior the datasource has with other widgets? Do I have to somehow configure it to replace the SchedulerDataSource with the original datasource? I want to preserve the one-to-one correlation between my datasource and my day template.
Just to give a generic example of what I am trying to accomplish, imagine that instead of entries with time slots, I want my scheduler to display daily summaries of how many hours I worked out, how many calories I ate, how many hours I slept, etc. But I do not want to associate those amounts with hours in the day.
--
Sorry that was technically more than two questions.
But thanks in advance!
-B
Straight to your questions:
1. The options available in the eventTemplate are listed in the documentation.
2. The SchedulerDataSource does one thing more than the regular DataSource: it expands recurring events. This means that for one event that repeats over two days, the SchedulerDataSource creates two data items, one for each day. If you don't have any recurring events, then you have the one-to-one mapping. The scheduler can only be bound to a SchedulerDataSource instance (it will throw an exception otherwise).
It looks like the scheduler may not be the widget you are looking for. If you just want to display a list of items, the ListView or Grid widgets may be a better fit.

Django 1.6 filter by hour and time zone issue

In my application I use time zones (USE_TZ=True) and all the dates I create in my code are aware UTC datetime objects. I use django.utils.timezone.now for the current date and the following helper function to ensure all the dates in my instances are what I expect:
    from django.utils import timezone

    @classmethod
    def asUTCDate(cls, date):
        # Naive datetimes are assumed to already be in UTC.
        if not timezone.is_aware(date):
            return timezone.make_aware(date, timezone.utc)
        # Aware datetimes are relabelled as UTC; replace() does not convert the time.
        return date.replace(tzinfo=timezone.utc)
I also enforced the check of naive/aware dates using this snippet (as suggested in the docs):
    import warnings
    warnings.filterwarnings(
        'error', r"DateTimeField .* received a naive datetime",
        RuntimeWarning, r'django\.db\.models\.fields')
As far as I understand, this is the right way to proceed (a quote from the Django documentation: "The solution to this problem is to use UTC in the code and use local time only when interacting with end users."), and my app seems to handle dates very well. However, I have just implemented a filter against a model that uses the Django 1.6 __hour lookup, and it forces the extraction to be based on the user's time zone. The result is something like:
django_datetime_extract('hour', "object"."date", Europe/Rome) = 15
This breaks my query, since some results I was expecting are not included in the set. When I use a __range to search between dates, however, it works as expected (objects with a date in the range are returned). So it seems to me that Django takes time zones into account in queries only for the __hour filter, but I don't understand why. I had assumed that UTC is used everywhere except in templates, where displayed dates are formatted according to the user's time zone, but maybe that's not true.
So my questions are: is the way I'm working with time zones right? Is the __hour filter wrong, or what?
It seems as though you're doing the right thing with dates. However, the documentation for every date-related lookup, such as filtering by hour, includes this note:
When USE_TZ is True, datetime fields are converted to the current time zone before filtering.
For the range filter, this note doesn't exist because range can be used to filter not only on dates but also on other types, such as integers and characters; i.e. it is not necessarily datetime-aware.
In essence the problem comes down to this: where do you draw the line between 'interacting with users' where times are in a local timezone, and what is internal where times are in UTC? In your case, you could imagine a user entering in a search box to search for hour==3. Does that mean for example that your form code should do the conversion between hour==3 and the UTC equivalent? This would then require a special forms.HourField. Or perhaps the value (3) should be fed directly to the query where we know that we're searching on an hour field and so a conversion is required.
We really have to follow the documentation on this one.
Values which are going to be filtered against date/time fields using any of the specialised date/time filtering functions will be treated as being in the user's local time zone.
If using the range filter for dates no time conversions occur so you are expected to convert the user's entered local time value to UTC.
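The difference between the two rules can be sketched with the standard library alone. This does not call Django; it only reproduces the arithmetic. The fixed +01:00 offset is an assumption standing in for Europe/Rome in winter (no DST handling), and `local_hour_bounds_utc` is a hypothetical helper showing how a user's local hour could be turned into UTC bounds for a range filter.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +01:00 offset standing in for Europe/Rome in winter.
ROME_WINTER = timezone(timedelta(hours=1))


def local_hour_bounds_utc(day, hour, tz):
    """Return the UTC [start, end) interval covering one local clock hour."""
    start_local = datetime(day.year, day.month, day.day, hour, tzinfo=tz)
    start_utc = start_local.astimezone(timezone.utc)
    return start_utc, start_utc + timedelta(hours=1)


# A value stored in UTC, as the question describes.
stored = datetime(2014, 1, 15, 14, 30, tzinfo=timezone.utc)

utc_hour = stored.hour                            # what the code sees in UTC
local_hour = stored.astimezone(ROME_WINTER).hour  # what an hour lookup compares

start, end = local_hour_bounds_utc(stored, 15, ROME_WINTER)
print(utc_hour, local_hour, start.isoformat(), end.isoformat())
```

Here the stored hour is 14 in UTC but 15 in the user's zone, so a filter on hour 15 matches it while a filter on hour 14 does not; the computed UTC bounds are what a range filter would need instead.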

Custom Date Aggregate Function

I want to sort my Store models by their opening times. The Store model has an is_open function, which checks the store's opening time ranges and returns a boolean indicating whether it is open. The problem is that I don't want to sort my queryset manually, for efficiency reasons. I thought that if I wrote a custom annotate function, I could filter the query more efficiently.
So I googled and found that I can extend Django's aggregate class. From what I understood, I have to use predefined SQL functions like MAX, AVG, etc. The thing is, I want to check whether today's date falls within a given list of time intervals. Can anyone tell me which SQL function name I should use?
Edit
I'd like to post the code here, but it's real spaghetti. It is a page-long piece of code that only generates time intervals and checks for the suitable one.
I want to avoid:
    alg = lambda s: not (s.is_open() and s.reachable)
    sorted(stores, key=alg)
and replace it with:
    Store.objects.annotate(is_open=CheckOpen(datetime.today())).order_by('is_open')
But I'm totally lost as to how to write CheckOpen...
Have a look at the docs for extra().
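The idea behind that suggestion is to let the database compute the flag as an extra selected column and sort on it. The sketch below shows only the SQL side of that idea, using the standard library's sqlite3 rather than Django. The table layout (a single opens/closes interval per store, stored as 'HH:MM' text) and the sample rows are assumptions made for illustration; the real model reportedly has a list of intervals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one opening interval per store, times as 'HH:MM' text.
conn.execute("CREATE TABLE store (name TEXT, opens TEXT, closes TEXT)")
conn.executemany(
    "INSERT INTO store VALUES (?, ?, ?)",
    [
        ("bakery", "06:00", "14:00"),
        ("diner", "11:00", "23:00"),
        ("kiosk", "08:00", "20:00"),
    ],
)

now = "15:30"
# The boolean expression becomes a computed column that the database sorts on.
rows = conn.execute(
    "SELECT name, (opens <= ? AND ? < closes) AS is_open "
    "FROM store ORDER BY is_open DESC, name",
    (now, now),
).fetchall()
print(rows)
```

With several intervals per store, the expression would probably become a subquery or join against an intervals table, but the ordering principle stays the same.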

Google Analytics exclude empty custom variable in a custom report

I have a custom variable set for all visitors; for our registered users it's some value, for unregistered users, it's empty.
I can find unregistered users in an advanced segment using the settings Exclude Custom Variable (Value 02) Matching Regexp .+ -- works brilliantly.
But I need a report of unregistered visitors for a dashboard, and tried to do the same thing with a filter. I have a metric of Visits and a dimension of something all visitors will have (e.g. Browser). My filter is identical to the one in the advanced segment, but the result is not brilliant: I get no visits. I have tried to Include with a regex ^$ but no love there, either.
Any ideas what I am doing wrong?
To help you understand the problem and the solution yourself, let me illustrate how data recording works in any collection process (Google Analytics is one of the tools used for data collection and analysis):
To record and analyse data, you first decide what you want to record, and then how. Maybe this how is where Google Analytics comes in for you. The data that you want to see is the metric; it has a name and a (usually numeric) value. Each dimension is a way in which you want to separate or drill down into the various views of the data. As an example, if you want to know how many visitors visited your site every day, and you want to be able to see which source they came through, Daily Visitor Count is your metric and Source is your dimension.
The important thing to understand here is that dimensions and metrics are not bound together. What I mean is that just because you decided that Daily Visitor Count should be viewable by Source, a source is not added to every update of the Daily Visitor Count metric. In order to view the metric by the dimension, you need to record a value for the dimension every time you record the metric.
If you don't record a dimension for a metric, then you cannot retrieve those metric values by applying a filter on the dimension. A dimension filter only gives you access to the values recorded for the dimension, not to all metrics: dimensions don't contain values of metrics; only metrics can optionally carry values for dimensions.
So when you query "dimension equals regex +*", it works, with both include and exclude, but you cannot query metrics with an empty dimension using a dimension filter. The best way would be to always record a standard or default value for the dimension every time you record the metric, something like (not set) or unknown, so that you can separate these visits.
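The behavior described above can be modelled with a toy example. This is only a sketch of the explanation given here, not of Google Analytics internals: the `hits` list, the `member` key, and `filter_by_dimension` are all hypothetical. The assumption encoded in the helper is that a dimension filter can only test rows that actually recorded the dimension.

```python
import re

# Hypothetical recorded visits; the second one never recorded the custom variable.
hits = [
    {"browser": "Chrome", "member": "gold"},
    {"browser": "Firefox"},
    {"browser": "Safari", "member": "(not set)"},  # explicit default value
]


def filter_by_dimension(rows, name, pattern, include=True):
    """Apply an include/exclude regex filter, but only to rows that have the dimension."""
    candidates = [row for row in rows if name in row]
    return [
        row for row in candidates
        if bool(re.search(pattern, row[name])) == include
    ]


# Excluding every non-empty value leaves nothing: the Firefox visit was never a candidate.
excluded = filter_by_dimension(hits, "member", r".+", include=False)

# With a default value recorded, the unregistered visit can be selected directly.
defaults = filter_by_dimension(hits, "member", r"^\(not set\)$")
print(excluded, defaults)
```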
Hope that helps. :)
I just hope you understand what you were trying to do is conceptually wrong, though it could still have been made technically feasible.