Complex filtering queries with multiple attributes - amazon-web-services

I have a page where I am listing some entities and providing an interface with multiple filtering options. To simplify the question, let's say I am listing various movies on that page. A denormalized row for a single movie entry would look like this:
producer_id: Partition Key - (e.g: PRODUCER#213141)
movie_id: Sort Key - (e.g: MOVIE#887347)
producer_name: (e.g: "Warner Bros")
movie_name: (e.g: "Harry Potter")
status: (e.g: "ON_SHOW")
publish_date: (e.g: "2020.01.01")
type: (e.g: "fantasy")
language: (e.g: "English")
I want to enable filtering by using a composite attribute as the sort key of a GSI. My composite attribute would look something like this:
GS1SK: "harry_potter#2020.01.01#fantasy#English#ON_SHOW"
And the partition key for this secondary index would be simply the producer_id.
So let's say a user comes to the page and wants to filter the movies with the given filtering options. An example access pattern would be:
Get all of the sci-fi movies produced by Warner Bros that have the status ON_SHOW.
However, the problem starts here. Since the combined attributes are not hierarchical, the composite attribute can't be used for this scenario. In the access pattern above, the user did not specify any date and simply wanted results for any date range. But given the structure of the composite attribute, it is impossible to filter on the later attributes, such as movie type or movie language, without first specifying the attributes that precede them, including the date.
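To make the limitation concrete, here is a minimal pure-Python sketch that mimics a begins_with key condition on the composite sort key. It does not call DynamoDB; the sample movies and the helper names (begins_with, filter_by_fields) are made up for illustration, and only the GS1SK layout comes from the description above.

```python
# Hypothetical sort-key values following the layout
# movie_name#publish_date#type#language#status
ITEMS = [
    "harry_potter#2020.01.01#fantasy#English#ON_SHOW",
    "inception#2010.07.16#sci-fi#English#ON_SHOW",
    "tenet#2020.08.26#sci-fi#English#ARCHIVED",
]


def begins_with(sort_keys, prefix):
    """Mimic a begins_with key condition: only a leading prefix can match."""
    return [sk for sk in sort_keys if sk.startswith(prefix)]


def filter_by_fields(sort_keys, movie_type=None, status=None):
    """Mimic a filter expression: every key is read, then checked per field."""
    matches = []
    for sk in sort_keys:
        _name, _date, mtype, _language, item_status = sk.split("#")
        if movie_type is not None and mtype != movie_type:
            continue
        if status is not None and item_status != status:
            continue
        matches.append(sk)
    return matches


# Works: the leading attributes (name, then date) are known.
prefix_hits = begins_with(ITEMS, "inception#2010")

# Does not work: "sci-fi" is the third segment, so it is never a prefix.
skip_hits = begins_with(ITEMS, "sci-fi#")

# Filtering after the read finds the item, but every key had to be scanned.
filtered = filter_by_fields(ITEMS, movie_type="sci-fi", status="ON_SHOW")
```

The prefix query only helps when every attribute to the left of the one being filtered is known, which is exactly what the ad-hoc filtering page cannot guarantee.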
I know DynamoDB is not the best fit for such complex querying. However, I think providing filtering options on a listing page is a really typical scenario that even the simplest products should support. My question is: what kind of approach should I use to satisfy these filtering needs?
Maybe I am getting the idea wrong behind composite attributes?
Should I use filter expressions, because there is no way to do such advanced filtering with composite GSIs?
Maybe for such scenarios, I should be considering Elasticsearch or Amazon Athena?
I will need to provide even more filtering options on other pages of my application, such as filtering listed users by their demographic information. Do you think I should consider migrating to an RDBMS rather than using a NoSQL database?
What I really want to do is to provide these filtering features without any filtering expressions to reduce RCU usage, and increase the efficiency in my queries. I would appreciate any kind of help and advice. Thanks.

It sounds like you understand composite attributes perfectly and have a solid grasp of your options. You've stumbled across one of the weaknesses of DynamoDB. It's challenging to support this kind of ad-hoc search functionality with DynamoDB.
I've seen this problem solved using a tool like Elasticsearch (your option #3). A common pattern is to enable DynamoDB Streams, which can be used to keep the Elasticsearch index up to date. It's a bit more infrastructure to set up, but the search capabilities are much more flexible than what you get with DynamoDB alone.
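As a rough sketch of the stream-to-index step, the function below turns one DynamoDB Streams record into an "index" or "delete" action for a search index. It assumes the stream is configured to include the new image, uses the producer_id and movie_id keys from the question, and handles only string, number, and boolean attributes. The function names and the document id format are hypothetical, and sending the action to the search cluster is left out.

```python
def unwrap(attr):
    """Convert one DynamoDB-typed value, e.g. {"S": "x"}, to a plain value."""
    (type_tag, value), = attr.items()
    if type_tag == "S" or type_tag == "BOOL":
        return value
    if type_tag == "N":
        # Numbers arrive as strings in stream records.
        return float(value) if "." in value else int(value)
    raise ValueError(f"unsupported attribute type: {type_tag}")


def to_index_action(record):
    """Map a stream record to (action, document_id, document_or_None)."""
    keys = record["dynamodb"]["Keys"]
    # Hypothetical id scheme: partition key and sort key joined with "|".
    doc_id = f'{unwrap(keys["producer_id"])}|{unwrap(keys["movie_id"])}'
    if record["eventName"] == "REMOVE":
        return ("delete", doc_id, None)
    # INSERT and MODIFY both carry the full new item when NewImage is enabled.
    image = record["dynamodb"]["NewImage"]
    document = {name: unwrap(value) for name, value in image.items()}
    return ("index", doc_id, document)


sample_insert = {
    "eventName": "INSERT",
    "dynamodb": {
        "Keys": {
            "producer_id": {"S": "PRODUCER#213141"},
            "movie_id": {"S": "MOVIE#887347"},
        },
        "NewImage": {
            "producer_id": {"S": "PRODUCER#213141"},
            "movie_id": {"S": "MOVIE#887347"},
            "movie_name": {"S": "Harry Potter"},
            "status": {"S": "ON_SHOW"},
        },
    },
}
sample_remove = {
    "eventName": "REMOVE",
    "dynamodb": {"Keys": sample_insert["dynamodb"]["Keys"]},
}

insert_action = to_index_action(sample_insert)
remove_action = to_index_action(sample_remove)
```

In a real setup this would typically run in a function subscribed to the stream, looping over the batch of records it receives.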

Related

How to set up many to one relationship in Sitecore

Let me preface by saying I'm pretty new to both Sitecore and C#, so be gentle. I'm setting up products as Sitecore items for a new website we are working on. Each product could have dozens of replacement parts associated with it. Each part may or may not be associated with multiple products. I'm trying to determine the best way to set this up, taking into account creating and possibly reusing parts across products, and also how best to associate parts with products.
Use a list data type such as Multilist or Treelist (with Search) on the Product template for the parts.
Avoid putting lots of items in a single folder, as this can cause performance issues in the CMS (up to 100 is okay). So create a tree structure for the parts if you have many of them, or use a bucket. Buckets are good for many thousands of items.

Couchdb "Get By Type" view design

So for my documents, I have a type property that defines them. And for almost all of these 'types', I need a 'get by type' call.
Now the question is which one of these designs is more efficient:
Have a single view that has a key with the 'type' that maps all of the documents
Have a view for each 'type' that just maps those types, and I can query the view to get all the documents in the view?
It depends on how many "types" you have in your db. If only a few, go with the "view per type" approach and you'll be fine and have nicer API URLs.
However, once you have around 70 types of documents (my case) in a single database, it becomes obvious that this approach no longer works and that you need one single view to filter docs by type: you'll never forget to add a special view for a new doc type, and you don't need to clean up outdated views. As a bonus, a single view lets you retrieve docs of multiple types with a single request and have only one replication task that syncs multiple types of docs between databases. The same is true for any other fields that are common to all or most docs (like author, updated_at, etc.).
The final decision is yours, but it's better to take the route that frees you from additional work; one additional query parameter is a small price to pay to relax.
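As an illustration of the single-view approach, the sketch below builds the query URL for one view keyed by type. The database name, design document name ("docs"), and view name ("by_type") are made up, and the map function shown in the comment is only an assumption about how such a view might be defined. The key and keys query parameters take JSON-encoded values.

```python
import json
from urllib.parse import urlencode

# Assumed map function for the hypothetical "by_type" view (JavaScript):
#   function (doc) { if (doc.type) { emit(doc.type, null); } }


def by_type_url(base, db, types, ddoc="docs", view="by_type"):
    """Build a view query URL for one type (key) or several types (keys)."""
    if len(types) == 1:
        params = {"key": json.dumps(types[0])}
    else:
        # Compact separators keep the encoded JSON array free of spaces.
        params = {"keys": json.dumps(types, separators=(",", ":"))}
    params["include_docs"] = "true"
    return f"{base}/{db}/_design/{ddoc}/_view/{view}?{urlencode(params)}"


single = by_type_url("http://localhost:5984", "mydb", ["article"])
multi = by_type_url("http://localhost:5984", "mydb", ["post", "comment"])
```

The second call shows the bonus mentioned above: several types come back from one request against the same view.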
I think the latter is best. Have a view for each type that queries/filters for that particular type. This allows you, from the Futon views drop down, to very quickly display lists of docs of the particular type(s). Almost like you're looking at "tables". But not really ;-)

How do I combine page views for a URL when they have different query strings in Google Analytics?

I am trying to do some reporting on page views on a site and the results are being listed like the following:
www.example.com/directory/ - 100 views
www.example.com/directory/?id=123456 - 10 views
www.example.com/directory/?id=987654 - 5 views
What filter do I need to create to view the results as:
www.example.com/directory/ - 100 views
www.example.com/directory/?id=* - 15 views
Thanks in advance
Yes, getting historical data grouped together is going to mean using something like Google Docs, Excel, Tableau Software, Analytics Canvas, etc.
Moving forward...
One of the simplest ways of keeping things grouped in GA is to set up an advanced profile filter. You'll want to use this with a new profile; keeping a "raw" or "empty" profile is highly advisable for when you actually want to look at those individual URLs.
That said, here's a filter pattern that should work for you:
Go to Admin > Filters (under the View Column)
+ New Filter > Create new Filter > Name it
Filter Type = Custom filter > Advanced
Here's the pattern:
Field A: www\.example\.com\/directory\/\?id=.+
Output To: www\.example\.com\/directory\/\?id=\*
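For historical data exported from GA, the same grouping can be sketched outside the tool. The snippet below applies the Field A pattern from above to the three example rows from the question and sums the views; the function and variable names are made up, and it only approximates what the profile filter does.

```python
import re
from collections import Counter

# Same idea as the Field A pattern: any /directory/ URL with an id query string.
ID_PAGE = re.compile(r"www\.example\.com/directory/\?id=.+")
REWRITTEN = "www.example.com/directory/?id=*"


def group_pageviews(rows):
    """Sum views, collapsing every ?id=... URL into one rewritten key."""
    totals = Counter()
    for url, views in rows:
        key = REWRITTEN if ID_PAGE.fullmatch(url) else url
        totals[key] += views
    return dict(totals)


rows = [
    ("www.example.com/directory/", 100),
    ("www.example.com/directory/?id=123456", 10),
    ("www.example.com/directory/?id=987654", 5),
]
grouped = group_pageviews(rows)
```

With the question's numbers, the plain directory URL keeps its 100 views and the two id URLs collapse into a single row with 15.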
Another way to aggregate the same URI with multiple query strings is to change the primary dimension to 'Page Title' under Behavior > Site Content > All Pages.
The best way to do this for your historical data is unfortunately in an Excel pivot table. You can get it in the UI, but only by creating a custom report and searching for very specific directories.
Check out the documentation on excluding query strings in your GA profile. Maybe create a new profile and write an advanced rule to rewrite all "id" pages to "/directory/product-page".
A totally different approach is to use custom variables or custom dimensions and to stop looking in the normal "Behavior" reports section (used to be called "Content" in GA) – custom dims are available using Google Analytics Universal Analytics only, which means starting a new web property and possibly running both code snippets concurrently (totally safe to do).
Personally I find custom dimensions a bit easier to work with than custom variables, and I generally think that it's a good idea to start exploring the new Google Analytics.
The nice thing about either of these approaches is that you can still keep the full page path data in the same profile as your custom dimension / variables information; it'll stay in the Behavior section where it belongs with all the other page paths.
Where I'm going with this...
You can create a new dimension such as "page type" and then call it "products", "posts", "articles", or whatever these id #s represent in this /directory/; then you can look at metrics across the dimension like pageviews, time on page, etc. by page type.
You can even create other dimensions to help describe them in more detail, such as breaking down blog posts or products into their different categories; i.e. hierarchical dimensions. Once you start using this kind of thing you may wonder what you ever did without it!
I think it's fair that I stop this answer now since it's not about how to set up custom variables or custom dimensions; those links should get you started (it's really not difficult).
Note: You can use PHP to fill in the dimension information in the GA tracking snippet dynamically based on the page that is being viewed (again, that's another question).

Stackoverflow like tag system form for django?

What I am trying to create is a site for resources. Basically, you add resources such as books and videos via links. Now, with any resource site that caters to a variety of resources, you need to tag them in order to understand what kind of resource you are using.
For example, if you make notes on something like Chemistry or key points from a talk on, let's say, "Django", then these are text documents. Thus you would want them under a TEXT tag.
So, when you are making a form for this kind of thing, what form field would you use? For example, my knee-jerk approach is to simply make a text area field and then separate the different tags with commas. This can be prone to many problems, so I'd like to know the best approach to solving it. Basically, is there an easy way to validate the data input? Would forms.ChoiceField be the best approach, or is there something superior?
https://www.djangopackages.com/grids/g/tagging/ is your best bet, most specifically https://github.com/alex/django-taggit. If you want to run your own tagging system, take a look at the source code for some ideas.
EDIT: The easiest way to display this in a form would be to use a ModelMultipleChoiceField. This allows you to select multiple tags for a single resource, and handles server-side validation and conversion to the actual Tag instances. However, I think most people would agree this option looks hideous, and it is certainly not user-friendly if there is a large number of possible tags.
If you're using jQuery, another option is to use Django_select2. This is what I have personally used in a similar situation, and it handles a large number of possible tags very well. Django_select2 is a thin wrapper around jQuery's Select2 plugin, with a bit of added functionality (most notably the AutoView and AutoModelSelect2Field). This provides a hybrid between a text field and a select list, allowing you to search all tags and easily select multiple tags. See http://ivaynberg.github.io/select2/ for examples of what you can achieve.
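If you do end up running your own tagging system with the comma-separated text input from the question, the validation step might look something like this. It is a minimal sketch in plain Python, not taken from django-taggit; the allowed-character rule, the MAX_TAGS limit, and the parse_tags name are all assumptions chosen for illustration.

```python
import re

# Assumed rule: lowercase letters and digits, plus a few symbols after the first character.
TAG_RE = re.compile(r"^[a-z0-9][a-z0-9+#.-]*$")
MAX_TAGS = 5  # assumed limit


def parse_tags(raw):
    """Split a comma-separated string into a clean, de-duplicated tag list."""
    tags = []
    for piece in raw.split(","):
        tag = piece.strip().lower()
        if not tag:
            continue  # ignore empty pieces such as "a,,b"
        if not TAG_RE.match(tag):
            raise ValueError(f"invalid tag: {tag!r}")
        if tag not in tags:
            tags.append(tag)
    if len(tags) > MAX_TAGS:
        raise ValueError(f"at most {MAX_TAGS} tags are allowed")
    return tags


parsed = parse_tags("Django, chemistry ,django,, Text")
```

In a Django form, logic like this would typically live in the field's clean method, with the ValueError translated into a validation error for the user.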

django design pattern/best practice: filtering a queryset

I have a page where I'm displaying the results of a queryset to the user.
What I'd like to do is allow the user to click on a link in order to apply a filter.
Currently what I do is have the links pass "get" parameters to the page in order to apply filters. The filters can be references to other models or custom filters (e.g. an unassigned filter)
In order to provide a decent user experience the implementation needs to do a few things
in the view:
check that the filter parameter passed is valid
check what type of filter it is (based on other models or a custom filter) in order to apply the correct condition to the queryset
(optional) a way to make the filters cumulative (i.e. you can keep adding filters)
in the Template:
display the correct result set based on the filter chosen
when displaying the filters, recognize which filter we have applied so that the current applied filter is displayed as text not a hyperlink.
I'm thinking this must be common enough that someone must have like a design pattern or best practice figured out for this other than the obvious whack of if/else statements in the view and the template.
Is there?
I find the way the Django admin handles this kind of functionality a great pattern. If you're not familiar, check out the list_filter option in the admin. It's similar to what you're describing, but yours is a bit more generic. Perhaps this will help you ponder some ideas?
First, for the actual querystring chunk, you're simply passing the Django ORM lookup key and value pair, e.g. ?sites__id__exact=1, tags__in=words, etc. Since you want to allow for cross-model lookups, you'd need to provide another part in the string to include the model name, which is not too tough.
For checking if the filter is valid, you can simply ensure that the model/field lookup is valid. By splitting the parts of each QS chunk, you can identify the model, the field name, the lookup, and the value. Then, use Django's built-in functionality to validate that the field name exists on the model. You can do this with ForeignKeys too. Here's how Django does it
You can keep adding filters pretty easily to this. You'll be providing your view and the form that's displaying these filters with some context, so it'll persist and re-populate for the user. Also, you could just as easily persist the query string. Basically, you'd have the same read / parsing functionality here at all times, nothing really different.
I think the keys are automating and keeping it as DRY as possible. Don't succumb to a bunch of if statements. It's really easy to pass these lookups into the ORM, safely too, and it's really easy to catch bad lookups and provide the user with a meaningful error message.
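A minimal sketch of that idea, without the if/else chain: validate each GET parameter against a whitelist and turn it into keyword arguments for the ORM. This is not how the Django admin implements it; the whitelists, field names, and the build_filter_kwargs function are hypothetical, and the code is plain Python so it runs without a Django project.

```python
# Hypothetical whitelists; a real view would derive these from its models.
ALLOWED_FIELDS = {"sites__id", "tags", "status"}
ALLOWED_LOOKUPS = {"exact", "in", "isnull", "icontains"}


def build_filter_kwargs(params):
    """Turn query-string params into filter kwargs plus a list of rejected keys."""
    kwargs, errors = {}, []
    for key, value in params.items():
        field, sep, lookup = key.rpartition("__")
        if not sep or lookup not in ALLOWED_LOOKUPS:
            # No recognised lookup suffix: treat the whole key as a field name.
            field, lookup = key, "exact"
        if field not in ALLOWED_FIELDS:
            errors.append(key)
            continue
        if lookup == "in":
            value = value.split(",")
        elif lookup == "isnull":
            value = value.lower() in ("1", "true")
        kwargs[f"{field}__{lookup}"] = value
    return kwargs, errors


params = {
    "sites__id__exact": "1",
    "tags__in": "a,b",
    "password__startswith": "x",
    "status": "open",
}
filter_kwargs, rejected = build_filter_kwargs(params)
```

In a view, the result would be applied with something like queryset.filter(**filter_kwargs), and the rejected keys can be reported back to the user. Because every parameter goes through the same function, filters accumulate naturally as more are added to the query string.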
I hope that helps you on your path! :)