Google Datastore ancestor query returning data too far down - google-cloud-platform

I have an "inbox/messaging" structure that I'm working on, that allows for multiple kinds of parents. As in, people can leave comments on a few different kinds of objects. For this example, let's say someone is leaving a comment on a Article object.
The way we've formatted our data, the comment is created as a Message object, and that object is a child of Article (and Article is a child of Account). So when we query for the list of messages, we simply ask for all Messages that are children of that instance of Article. That looks like this:
Message.query(ancestor=source_key)
source_key here is the Key of the article we're viewing.
Great, this works really well and is pretty fast.
Now we want to add replies to those Message objects. I figure we'll just store replies the same way we add Messages to Articles. Which is to say, a Reply is simply another instance of Message, and the parent of that reply object is the message it's replying to. So basically, instead of leaving a comment on an Article, you're leaving a comment on a Message.
This sounds good on paper but it seems that in practice, the Key it ends up getting is structured like so:
Key('Account', 5629499534213120, 'Article', 5946158883012608, 'Message', 6509108836433920)
Which turns out, when we query for the list of messages, it return the replies as well in the response, as if they aren't replies at all.
Some questions:
Is there any way we can do like a "shallow" query? To strictly get only the immediate children of that Article?
I've read more on how ancestor queries work and because ancestor queries have a 1 write per second limitation, I'm now wondering if it may be better to change how we store this to where a Message is not the child of Article, and instead maybe have a KeyProperty of Article exist on Message, if that makes sense. And maybe no parent for Message. There could be lots of people leaving a comment on an article, or also lots of people leaving replies to those comments. But even so, Article is a child of Account too, along with a lot of other kinds of objects, and generally we don't run into any issues with lots of different writes. So would we even run into this write limitation?
EDIT: I've moved on a little bit and am trying to query only replies for a given message, so I'm looking for all messages that have a parent (ancestor) of another message.
So given this key as the ancestor: Key('Account', 5629499534213120, 'Article', 5946158883012608, 'Message', 5663034638860288)
I query our message table, and I get back that exact same key (as well as other messages). How is that possible? If I'm specifying an ancestor, in what world does it make sense that I would get back the same object I'm using to query the ancestor with? The parent of that message is just:
Key('Account', 5629499534213120, 'Article', 5946158883012608)
So, obviously the ancestor doesn't strictly match there. Why would my query return it then? Hastebin of what, basically, I'm running into: https://hastebin.com/karojolisi.py

Regarding the question on write limitation, if you are using the Cloud Firestore in Datastore mode, then the limitation of 1 write per second is by entity and not entity group.
See https://cloud.google.com/datastore/docs/firestore-or-datastore
"Writes to an entity group are no longer limited to 1 per second."
and https://cloud.google.com/datastore/docs/concepts/limits
"Maximum write rate to an entity" is "1 per sec"
So, irrespective of which approach you take, with datastore mode, writes shouldn't be a concern as the messages and replies are not expected to be edited. Unless of course, if you have any kind of aggregate information like the number of replies for a given message which require updating the parent message record with each child record.
Regarding your main question of querying only the messages for an article and not their replies, one option is to have a field called article_id and populate this only for the top level messages and have this also in the index (prefix of the ancestor composite index). The reason to recommend article_id and not a boolean is, since this field is indexed, it is better to have the field not be based on a narrow range of values.
The reason to prefer this approach to storing the messages in a separate table is that all messages belonging to an article will be stored close by with the initial approach and that is better for read performance.

Related

Django Master/Detail

I am designing a master/detail solution for my app. I have searched for ever in the django docs, also here and elsewhere I could, so I guess the answer is not that obvious, despite being an answer many people look for - not only in django, but in every language, I think.
Generally, in most cases, the master already exists: for example, the Django Docs illustrate the Book example, where we already have an Author and we want to add several Books for that Author.
In my case, the parent is not yet present on the database think of a purchase order, for instance.
I have thought to divide the process in two steps: the user would start to fill in the info for the master model (regular form) and then proceed to another view to add the lines (inline formset). But I don't think this is the best process at all - there are a lot of possible flaws in it.
I also thought about creating a temporary parent object in a different table and only having a definitive master when the children are finally created. But it still doesn't look clean.
Because of that, for my app it would be ideal to create the master object at the same time as the detail objects (lines) - again, like an order.
Is there a way where I can have the same view to manage both master and detail? Like this I would receive both in the same POST request and it would make a lot more sense, not to say it would be much cleaner.
Sorry if it's too long, and thank you in advance!
So I found out that in my case the process could actually be split in two phases.
For this I simply use the traditional model form and inline formset.
But! I also found out that there could be several answers to this:
We could get crazy and build some spaceship in AJAX that would get the job done, simply by sending a JSON object (in which the lines could be an array of objects)
Django also has its ways and it's possible to send multiple forms in the same request! (thank you #mousetail for the tip).
Of course, be there as it may, there are many ways to build a house, these are just the ones I found out.

Decide which model to retrieve data on Django Rest Framework

I'm trying to build a simple API which should do the following:
A user makes a request on /getContent endpoint, with the geographical coordinates for this user. By content, it means audio files.
Every time they send a request, we should just get a random object from a Model and return the URL field from it, to be consumed by the front end. For this, it can be a random one, it doesn't matter much which one.
Also, we should keep tracking about the requests each user makes. This way, we can check how many requests the user has made, and when they were made.
Every 5 requests or so, the idea is to send the user a customized content based on their location. My idea is to store this content in another model, since it would have many more fields in comparison from the standard content.
Basically, at every request, I'd check if it's time to send a special content. If not, just send the random one. Otherwise, I'd check if the time is appropriate, and the user is within a valid location based on the special content's data in the model. If this validation passes, we send the URL of the special content, otherwise, we just send the random one.
I'm having a hard time figuring out the best way to design this. My initial idea is to have two different models:
Model 1: Standard content. It has some fields to its meta data, such as duration, title and other stuff like this.
Model 2: Custom content. Besides the meta data, it should contain the geographical data, and the datetime range. This will allow the checking to be made if the content should be played or not.
Now it's the part I'm pretty much clueless. How to make it all work together?
QUESTIONS
Maybe storing every single request data from every user, and checking this data might not be very effective. It would require some writing at every request instead of just reading.
Since I'd be using two different models, how can I make the decision to happen in the view? I mean, the final output would be the same, an URL. But I'd have to make the decision process to happen in the view on which model to use.
I appreciate the help!

How to extend the event/occurrence models in django-scheduler

I'd like to augment events/occurrences in django-scheduler with three things:
Location
Invitees
RSVPs
For Location, my initial thought was to subclass Event and add Location as a foreign key to a Location class, but my assumption is that each occurrence saved won't then include Location, so if the location changes for one occurrence, I'll have nowhere to store that information.
In this situation, is it recommended to create an EventRelation instead? Will I then be able to specify a different Location for one occurrence in a series? The EventRelation solution seems untidy to me, I'd prefer to keep models in classes for clarity and simplicity.
I think Invitees is the same problem, so presumably I should use a similar solution?
For RSVPs, I intend to make an RSVP class with Occurrence as a foreign key, and as far as I can tell that should work without any issues as long as I save the occurrence before attaching it to an RSVP?
I've read all the docs, all the GitHub issues, various StackOverflow threads, the tests, the model source, etc, but it's still unclear what the "right" way to do it is.
I found a PR which introduces abstract models: https://github.com/llazzaro/django-scheduler/pull/389 which looks like exactly what I want, but I'm reluctant to use code which was seemingly abandoned 18 months ago as I won't get the benefit of future improvements.
EDIT: I'm now thinking that another way to do this would be to have just one object linked to the event using EventRelation, so I'd have an "EventDetails" object connected to the Event via EventRelation, then include FKs to Location, Guests, etc from that object.
I should then also be able to subclass my EventDetails object with different kinds of events and attach those too. I'll give it a go ant see if it works!
Just in case anyone find this and is wondering the same thing: I ended up ditching Django-scheduler and using Django-recurrence instead. Had to do a bit more work myself, but it was easier to create the custom event types that I was looking for. Worked pretty well!

Role of django filters. Filtering or upfront formatting within the view?

I would like to hear your opinion about this.
I have a django application, where the data obtained from the model are rough. To make them nicer I have to do some potentially complex, but not much, operation.
An example, suppose you have a model where the US state is encoded as two letter codes. In the html rendering, you want to present the user the full state name. I have a correspondence two-letters -> full name in another db table. Let's assume I don't want to perform joins.
I have two choices
have the view code extract the two-letter information from the model, then perform a query against the second table, obtain the full name, and put it in the context. The template renders the full state name.
create a custom filter which accepts two-letter codes, hits the db, and returns the full-length name. Have the view pass the two-letter information into the context, and put in the template the piping into the filter. The filter renders the two-letter code as a full string.
Now, these solutions seem equivalent, but they could not be, also from the design point of view. I'm kind of skeptical where to draw the line between the filter responsibility and the view responsibility. Solution 1 is doing the task of the filter in solution 2, it's just integrated within the view itself. Of course, if I have to call the filter multiple times within the same page, solution 1 is probably faster (unless the filter output is memoized).
What's your opinion in terms of design, proper coding and performance?
It seems to me your model should have a method to do the conversion. It seems like extra work to make a filter, and I don't think most Django developers would expect that sort of thing in a filter.
A filter is meant to be more generic - formatting and displaying data rather than lookups.
My oppinion is that the first solution is much cleaner from a design point of view. I'd like to see the template layer as just the final stage of presentation, where all the information is passed (in its final form) by the view.
It's better to have all the "computation logic" in the view. That, way:
It's much easier to read and comprehend (especially for a third party).
I you need to change something, you can focus on a particular view method and be sure that everything you need to change is in there (no need to switch back and forth from view to template).
As for performance, I think your point is right. If you want to do the same lookup multiple times, the 2nd solution is worse.
Edit:
Referring to ashchristopher's comment, I was actually trying to say that it definitely does not belong to the template. It's never really clear what business logic is and where the line between "data provision" and "business logic" lies. In this case it seems that maybe ashchristopher is right. Transformation of state codes to full state names is probably a more database related encoding matter, than a business logic one.

Building a topic hierarchy for indexing content

Im looking to build a topic map to catagorize content.
For example the Topic 'Art' may have sub categories of 'Art History', 'Painting', 'Sculpture' etc etc.
I've crawled a few online resources, but I've hit a problem related to how I wish to use the hierarchy.
I've got a lot of content that I wish to index by topic. So to give the above example, if a user searches for 'Art' then they will not only get anything that mentions 'Art', but also anything that mentions 'Painting', even if it doesnt mention 'Art'. Fair enough.
But if, in another part of my heirarchy, I have 'House Maintenance', for example, then that might also have a subtopic of 'Painting'.
But then if a user searches for 'Art', my engine will say 'well, Painting is a sub category of 'Art', so I'll include this peice of content thats all about the best colour to paint your bathroom walls....
Has anyone come across this problem before? I've tried googling, but without knowing the exact terminology its hard to make headway....
EDIT: More succinctly, 'Painting' is a subtopic of 'Art', but if something is about 'Painting' then it doesnt neecssarily follow that its about 'Art', since 'Art' is not the only parent of 'Painting'.
In "topic maps", as it is understood in the related standard you can set different "scopes" to a topic. So "painting" may be part of two scopes, with different meanings.
A topic map:
http://www.ontopia.net/page.jsp?id=vizigator
Scope:
http://www.ontopia.net/topicmaps/materials/tao.html#stp-scope
If the Topic Map you are creating is built on Topic Maps technology, then subjectIdentifiers can be used to distinguish between two Topics with the same name (both named "Painting") that actually represent two different Subjects (Painting as an Art form, and Painting in the sense of home renovation).
If someone queries about Art and you drill down to Painting, then you can return only those entries related to 'Painting as an Art form' because those Painting entries are no longer thrown together on one heap.
Turning up late to this party (you've probably already built it or moved on or found an answer) but thought I'd throw in my 2 cents having worked on a high end Topic Map based CMS.
What you are missing out in your description is how topics are linked together. Topic are linked together via Associations that in themselves have Type's and Roles. So yes painting would be a child of art and of house maintenance but they would be linked differently.
Defining your type and role is up to you really, there is no hard and fast rules its really just down to your own leanings. So
Topic: Art
Association: Source=Art, Reference=Painitng, Type=Culture, Role=Practice
Topic: House Maintenance
Association: Soruce=House Maintenance, Reference=Painting, Type=DIY, Role=Activity
I suck at categorisation but hopefully you can see what I'm getting at. You'd filter your searches based on the type and role. So if someone searched for art you'd return painting and if you wanted to dig deeper and return co-related topics you are talking about returning Culture associated topics and not DIY associated topics.
Topic Maps if done right are extremely flexible, you've also got scope and language baked in too if you do it right. You should be able to link the same topics together in a 100 different ways and see the data differently depending on your starting point.
Information Architecture for the World Wide Web would give you a good start on organizing information... it's a good read, but might not be so technically detailed.
Since you want to process House/Painting and Art/Painting differently, then it seems like you'll need two distinct entries for Painting (one for each meaning). Which one you associate a given 'lump of text' with could be based on context clues from the text itself, if your text processor is powerful enough.
For example, whenever you have a conflict like this, look in the text - do you see other words there? Like 'sink', 'wall', 'hard wood', or 'windows'? Or do you see other terms like 'Monet', 'impressionism', 'canvas', and 'gallery'? That'll allow you to automate the decision, and should be fairly accurate. The only snag is that this presumes you have a fairly healthy dictionary of 'related terms' lying around somewhere.
On the user-end, when Painting is selected, you'd simply have to either merge all the results together, or present the user an option to select which parent topic they want to be viewing results from.
I don't know of a specific name for that, but I don't think it should really be a problem, either. All it calls for is that Art/Painting and House Maintenance/Painting are understood as separate entities. Someone searching for "art" gets subcategories of Art, so gets Art/Painting. Someone searching for "house maintenance" gets subcategories of House Maintenance, so gets House Maintenance/Painting. Someone searching for "painting" gets Art/Painting and House Maintenance/Painting, which is appropriate.