Finding nested duplicates in OCL

I am facing a challenge when trying to check for duplicates in OCL.
Here is a simplification of the class diagram:
                                  +-----------+
                                  |ChapterName|
                                  +-----------+
                                        ^ 0..* chapterNames
                                        |
                                        ^
                                        V
            +-------+ books 0..*      +----+
            |Catalog|<>-------------->|Book|
            +-------+                 +----+
  catalogs 0..* ^                       ^ 0..* books
                |                       |
              +----+ customers 0..*  +--------+
              |Shop|<>-------------->|Customer|
              +----+                 +--------+
The attributes for each class are declared as follows:
ChapterName: Name
Catalog: Category
Problem:
What I want to check is if a customer has any books with duplicate chapter names, that also belong to a specific category in catalog.
I haven't managed to wrap my head around the logic. What I have so far is:
context Shop
self.customers.books->select(cubks | cubks =
self.catalogs.books->select(cabks | cabks = cubks)->first())
...Which should find the books from the catalog which a customer has.
Question: How can I add further constraints to solve the problem above?
Also. I am using Eclipse, EMF, and the OCL console from within Eclipse.

context Shop::checkForDuplicates(catalog : Catalog) : Boolean
post: result =
    self.customers.books->flatten()->select(book |
        catalog.books->includes(book)
    )->forAll(book |
        book.chapterNames->asSet()->size() = book.chapterNames->size()
    )
customers is a Set; books is either a Bag or a Set (depending on whether duplicate books are allowed; I'll assume it is a Bag, though it doesn't matter). Then customers.books is a bag of bags of books (one bag per customer) and customers.books->flatten() is a bag of all the books owned by customers.
catalog.books is either a Bag or a Set (doesn't matter). The select operation returns only those books that are contained in the given catalog (and belong to some Customer, since we are selecting from the bag constructed before).
book.chapterNames is a Sequence (I assume that the association is ordered) with the name of chapters in that book. forAll returns true iff for every element in the collection (i.e. for every book in the given catalog, which is owned by a customer), the body evaluates as true.
The trick now is relying on the operation Sequence::asSet(), which returns all the elements from the sequence with duplicates removed. The size of the sequence is then equal to the size of the set iff no element was removed (i.e. if every element was unique).
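The set-size comparison is easy to try outside OCL. The following is a minimal Python sketch of the same check, not OCL and not tied to EMF; the book titles, chapter names and the `catalog_books` set are made-up sample data.

```python
from collections import Counter


def has_unique_chapters(chapter_names):
    # Same idea as asSet()->size() = size(): a set drops duplicates,
    # so the sizes match only when every name is unique.
    return len(set(chapter_names)) == len(chapter_names)


def duplicate_chapter_names(chapter_names):
    # Names that occur more than once, in first-seen order.
    counts = Counter(chapter_names)
    return [name for name in counts if counts[name] > 1]


# Hypothetical sample data: book title -> ordered chapter names.
customer_books = {
    "Book A": ["Intro", "Basics", "Intro"],
    "Book B": ["Start", "Middle", "End"],
    "Book C": ["One", "One"],
}
catalog_books = {"Book A", "Book B"}  # books listed in the given catalog

# Keep only the customer books that are also in the catalog (the select),
# then test each remaining book (the forAll).
checked = {
    title: has_unique_chapters(chapters)
    for title, chapters in customer_books.items()
    if title in catalog_books
}
all_unique = all(checked.values())
```

Here "Book C" is skipped because it is not in the catalog, and `all_unique` is false because "Book A" repeats a chapter name.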

Related

Order of Django Queryset results

I'm having trouble understanding why a Queryset is being returned in the order it is. We have authors listed for articles and those get stored in a ManyToMany table called: articlepage_authors
We need to be able to pick, on an article by article basis, what order they are returned and displayed in.
For example, article with id 44918 has authors 13752 (‘Lee Bodding’) and 13751 (‘Mark Lee’).
I called these in the shell, which returns:
Out[6]: <QuerySet [<User: Mark Lee (MarkLee#uss778.net)>, <User: Lee Bodding (LeeBodding#uss778.net)>]>
Calling this in postgres: SELECT * FROM articlepage_authors;
shows that user Lee Bodding id=13752 is stored first in the table.
 id  | articlepage_id | user_id
-----+----------------+---------
   1 |          44508 |    7781
   2 |          44508 |    7775
   3 |          44514 |   17240
….
 465 |          44916 |   17171
 468 |          44918 |   13752
 469 |          44918 |   13751
No matter what I try e.g. deleting the authors, adding ‘Lee Bodding’, saving the article, then adding ‘Mark Lee’, and vice versa – I can still only get a query set which returns ‘Mark Lee’ first.
I am not sure how else to debug this.
One solution would be to add another field which defines the order of authors, but I’d like to understand what’s going on here first. Something seems to be defining the order already, and it’d be better to manage that.
You can add an order_by to your queryset to make records appear in the order you want. Depending on the database, you may need to create an index on that field to keep the query fast. From the Django documentation:
By default, results returned by a QuerySet are ordered by the ordering tuple given by the ordering option in the model’s Meta. You can override this on a per-QuerySet basis by using the order_by method.
Example:
Entry.objects.filter(pub_date__year=2005).order_by('-pub_date', 'headline')
The result above will be ordered by pub_date descending, then by headline ascending. The negative sign in front of "-pub_date" indicates descending order. Ascending order is implied.
You pair that with an extra to order by the many-to-many ID:
.extra(select={
'creation_seq': 'articlepage_authors.id'
}).order_by("creation_seq")
If you're using django > 1.10, you can just use the field directly without the extra:
.order_by('articlepage_authors.id')
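To see what ordering by the many-to-many row id does at the SQL level, here is a small sketch using Python's built-in sqlite3 module instead of Django and PostgreSQL. The `auth_user` table name and its columns are assumptions made for the demo; only `articlepage_authors` and the ids come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE articlepage_authors (
        id INTEGER PRIMARY KEY,
        articlepage_id INTEGER,
        user_id INTEGER
    );
    INSERT INTO auth_user VALUES (13751, 'Mark Lee'), (13752, 'Lee Bodding');
    INSERT INTO articlepage_authors VALUES
        (468, 44918, 13752),
        (469, 44918, 13751);
""")


def authors_in_added_order(article_id):
    # Without ORDER BY the database may return rows in any order;
    # sorting on the join-table id gives the order the links were created in.
    rows = conn.execute(
        """
        SELECT u.name
        FROM auth_user AS u
        JOIN articlepage_authors AS aa ON aa.user_id = u.id
        WHERE aa.articlepage_id = ?
        ORDER BY aa.id
        """,
        (article_id,),
    ).fetchall()
    return [name for (name,) in rows]


names = authors_in_added_order(44918)
```

With the explicit ORDER BY, 'Lee Bodding' (row 468) comes before 'Mark Lee' (row 469), matching the order in the table.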

Cassandra, schema and process design for concurrent writes

This is a long-winded question about Cassandra schema design. I'd like input from the experts here on a use case I'm working on. All input, suggestions, and criticism are welcome. Here is my question.
We would like to collect REVIEWS from our USERS about some PAPERS we are about to publish. For each paper we seek 3 reviews, but we send out review invites to 3*2 = 6 users. All 6 users can submit their reviews to our system, but only the first 3 count, and these first 3 reviewers will be rewarded for their work.
In our Cassandra DB, there are three tables: USER, PAPER and REVIEW. The USER and PAPER tables are simple: each user corresponds to a row in the USER table with a unique USER_ID; similarly, each paper has a unique PAPER_ID in the PAPER table.
The REVIEW table looks like this
CREATE TABLE REVIEW(
    PAPER_ID uuid,
    USER_ID uuid,
    REVIEW_CONTENT text,
    PRIMARY KEY(PAPER_ID, USER_ID)
);
We use PAPER_ID as the partition key of the REVIEW table so that all reviews of a given paper are stored in a single Cassandra row. For each paper we have, we pick 6 users, insert 6 entries into the REVIEW table and send out 6 invites to those users. So, for paper "P1", there are 6 entries in the REVIEW table that look like this
----------------------------------------------------
PAPER_ID | USER_ID | REVIEW_CONTENT |
----------------------------------------------------
P1 | U1 | null |
----------------------------------------------------
P1 | U2 | null |
----------------------------------------------------
P1 | U3 | null |
----------------------------------------------------
P1 | U4 | null |
----------------------------------------------------
P1 | U5 | null |
----------------------------------------------------
P1 | U6 | This paper ... |
---------------------------------------------------
... | ... | ... |
Users submit reviews via a web browser over HTTP. At the back end, we use the following process to handle submitted reviews (using paper "P1" as an example):
1. Use partition key "P1" to get all 6 entries out of the REVIEW table.
2. Find out how many of these 6 entries have non-null values in the REVIEW_CONTENT column (a non-null value indicates that the corresponding user has already submitted his review. For example, in the above table, user "U6" has submitted his review, while the other 5 have not yet).
3. If this number >= 3, we already have enough reviews; return to the current reviewer with a message like "Thanks, we already had enough reviews."
4. If this number < 2, save the current review to the corresponding entry in the REVIEW table and return to the reviewer with a message like "Your review has been accepted." (E.g. if the current reviewer is "U1", then fill the REVIEW_CONTENT column of the "P1, U1" entry with the current review content.)
5. If this number = 2, this is the most complicated case, as the current submission is the last one we'll accept. In this case, we first save the current review to the REVIEW table, then we find the ids of all three users that have submitted reviews (including the current user) and record their ids in a transaction table to pay them rewards later.
But this process does not work. The problem is that it does not handle concurrent submissions correctly. Consider the following case: two users have already submitted their reviews, and meanwhile 3 other users submit their reviews through three concurrent runs of the process above. At step 5, each of the three will think he is the 3rd and last submitter and insert new records into the transaction table. This leads to double counting: a single user may be rewarded more than once for the same review.
Another problem with this process is that it may never reach step 5. Let's say there are no submissions in the REVIEW table, and 4 users submit their reviews at the same time. All of them save their reviews at step 4. After this, every later submitter will be rejected, as there are already 4 accepted reviews. But since we never reach step 5, no ids will be recorded in the transaction table and users will never get any rewards.
So here comes my question: How should I handle my use case using Cassandra as the back-end DB? Will Cassandra COUNTER help? If so, how? I have not thought through how to use COUNTER yet, but this blog (http://aphyr.com/posts/294-call-me-maybe-cassandra) warned that Cassandra COUNTER is not safe (quote "Consequently, Cassandra counters will over- or under-count by a wide range during a network partition.") Will Cassandra's Compare and Set (CAS) feature help? If so, how? Again, the same blog warned that "Cassandra lightweight transactions are not even close to correct."
Rather than creating empty entries in your review table, I would consider leaving it empty and only filling it as the reviews are submitted. To handle concurrency, add a timeuuid field as a clustering column, so that rows within a paper's partition are sorted by submission time:
CREATE TABLE review(
    paper_id uuid,
    submission_time timeuuid,
    user_id uuid,
    content text,
    PRIMARY KEY (paper_id, submission_time)
);
When a user makes their submission, add the entry to the table. Then AFTER the write is successful, query the table (on only the paper_id) and find out if the user's id is one of the first three. Respond to the user accordingly. Since you're committed to a small set of reviewers, the extra overhead of fetching all the reviews should be minimal (especially since you wouldn't need to include the content column in the query).
If you need to track who's reviewing the papers, add a set of user ids to the paper table and write the six user ids there.
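The write-first, check-afterwards idea can be simulated without a cluster. The sketch below is plain Python, not Cassandra driver code: `ReviewTable` is a hypothetical in-memory stand-in for the review table, an increasing counter plays the role of the timeuuid clustering column, and threads stand in for concurrent HTTP requests. It only illustrates why each submitter can decide independently whether it is one of the first three.

```python
import itertools
import threading


class ReviewTable:
    """In-memory stand-in for the review table (assumption, not Cassandra)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seq = itertools.count()  # stand-in for the timeuuid column
        self._rows = {}                # paper_id -> list of (seq, user_id)

    def insert(self, paper_id, user_id):
        # Every submission is written; nothing is rejected at write time.
        with self._lock:
            row = (next(self._seq), user_id)
            self._rows.setdefault(paper_id, []).append(row)

    def first_reviewers(self, paper_id, limit=3):
        # Read the partition back in clustering order, keep the first few.
        with self._lock:
            rows = sorted(self._rows.get(paper_id, []))
        return [user_id for _, user_id in rows[:limit]]


def submit_review(table, paper_id, user_id):
    # Step 1: write. Step 2: only after the write, check the position.
    table.insert(paper_id, user_id)
    return user_id in table.first_reviewers(paper_id)


def simulate(n_users=6):
    table = ReviewTable()
    results = {}

    def worker(user_id):
        results[user_id] = submit_review(table, "P1", user_id)

    threads = [
        threading.Thread(target=worker, args=("U%d" % i,))
        for i in range(1, n_users + 1)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table, results


table, results = simulate()
accepted = sorted(user for user, ok in results.items() if ok)
```

Because a row that is among the first three when its writer reads it back can never be displaced by a later write, exactly three submitters see themselves as accepted, whatever order the threads run in. In a real cluster this additionally depends on the consistency levels used for the write and the read, which the simulation does not model.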

Django: duplicates when filtering on many to many field

I've got the following models in my Django app:
from django.db import models

class Book(models.Model):
    name = models.CharField(max_length=100)
    keywords = models.ManyToManyField('Keyword')

class Keyword(models.Model):
    name = models.CharField(max_length=100)
I've got the following keywords saved:
science-fiction
fiction
history
science
astronomy
On my site a user can filter books by keyword, by visiting /keyword-slug/. The keyword_slug variable is passed to a function in my views, which filters Books by keyword as follows:
def get_books_by_keyword(keyword_slug):
    books = Book.objects.all()
    keywords = keyword_slug.split('-')
    # narrow the queryset once per word in the slug
    for k in keywords:
        books = books.filter(keywords__name__icontains=k)
    return books
This works for the most part. However, whenever I filter with a keyword that contains a string that appears more than once in the keywords table (e.g. science-fiction and fiction), the same book appears more than once in the resulting QuerySet.
I know I can add distinct to only return unique books, but I'm wondering why I'm getting duplicates to begin with, and really want to understand why this works the way it does. Since I'm only calling filter() on successfully filtered QuerySets, how does the duplicate book get added to the results?
The 2 models in your example are represented by 3 tables: book, keyword and a book_keyword relation table that manages the M2M field.
When you use keywords__name in a filter call, Django uses an SQL JOIN to merge all 3 tables. This allows you to filter objects in the 1st table by values from another table.
The SQL will be like this:
SELECT `book`.`id`,
       `book`.`name`
FROM `book`
INNER JOIN `book_keyword` ON (`book`.`id` = `book_keyword`.`book_id`)
INNER JOIN `keyword` ON (`book_keyword`.`keyword_id` = `keyword`.`id`)
WHERE (`keyword`.`name` LIKE '%fiction%')
After JOIN your data looks like
| Book Table | Relation table | Keyword table |
|---------------------|------------------------------------|------------------------------|
| Book ID | Book name | relation_book_id | relation_key_id | Keyword ID | Keyword name |
|---------|-----------|------------------|-----------------|------------|-----------------|
| 1 | Book 1 | 1 | 1 | 1 | Science-fiction |
| 1 | Book 1 | 1 | 2 | 2 | Fiction |
| 2 | Book 2 | 2 | 2 | 2 | Fiction |
Then, when the data is loaded from the DB into Python, you only receive the columns from the book table. As you can see, Book 1 appears twice there, once for each matching keyword.
This is how many-to-many relations and JOINs work.
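The duplication can be reproduced without Django. This sketch uses Python's built-in sqlite3 module with the three tables and the sample rows from the join table above; the table and column names follow the SQL in this answer, not necessarily the names Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_keyword (book_id INTEGER, keyword_id INTEGER);
    INSERT INTO book VALUES (1, 'Book 1'), (2, 'Book 2');
    INSERT INTO keyword VALUES (1, 'science-fiction'), (2, 'fiction');
    INSERT INTO book_keyword VALUES (1, 1), (1, 2), (2, 2);
""")

# The same JOIN as above; {distinct} is filled in with '' or 'DISTINCT'.
JOIN_SQL = """
    SELECT {distinct} book.id, book.name
    FROM book
    INNER JOIN book_keyword ON book.id = book_keyword.book_id
    INNER JOIN keyword ON book_keyword.keyword_id = keyword.id
    WHERE keyword.name LIKE '%fiction%'
    ORDER BY book.id
"""

# One result row per matching (book, keyword) pair, so Book 1 shows up twice.
with_duplicates = conn.execute(JOIN_SQL.format(distinct="")).fetchall()

# DISTINCT collapses identical rows, which is what .distinct() asks for.
without_duplicates = conn.execute(JOIN_SQL.format(distinct="DISTINCT")).fetchall()
```

Book 1 is linked to two keywords that both match '%fiction%', so the plain JOIN returns it once per keyword.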
Direct quote from the Docs: https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Successive filter() calls further restrict the
set of objects, but for multi-valued relations, they apply to any
object linked to the primary model, not necessarily those objects that
were selected by an earlier filter() call.
In your case, because keywords is a multi-valued relation, your chain of .filter() calls filters based only on the original model and not on the previous queryset.

Need Modeling Help For An Ordering Form

I'd like to create a Django project for my company's purchasing department. This would be my first project in Django, so sorry if this comes off as rudimentary. The workflow would look something like this:
user registers for an account > signs in > can create, edit, view, or delete a purchase order.
I'm getting tripped up on the modeling. Presumably I can create and authenticate users using django.contrib.auth. Also, since this is mainly a form saving/printing application I would use a ModelForm to generate my forms based on my models since the users will be making changes to the form data that will need to be saved. A simplified version of the purchase order form in question looks something like this:
| Vendor | Date | Lead Time | Arrival Date | Buyer_Name |
+--------+-------+-----------+--------------+------------+
| FooBar |1-1-12 | 30 | 2-1-12 | Mr. Bar |
+--------+-------+-----------+--------------+------------+
+--------+-------+-----------+--------------+------------+
| SKU | Description | Quantity | Price | Dimensions |
+--------+-------------+----------+-------+--------------+
|12345 | Soft Bar | 38 | 5.75 | 16 X 5 X 8 |
+--------+-------------+----------+-------+--------------+
|12346 | Hard Bar | 12 | 5.75 | 16 X 5 X 8 |
+--------+-------------+----------+-------+--------------+
|12347 | Medium Bar | 17 | 5.75 | 16 X 5 X 8 |
+--------+-------------+----------+-------+--------------+
As you can see, the main purchase order form has a header that identifies the Vendor being ordered from, the current date, lead time, arrival date, and the buyer's name who is filling the form out. Under that is a line-by-line order detail for three different SKUs. Ideally, each PurchaseOrder should be able to have many SKUs added to it.
What is the best way to model something like this? Do I create a User, PurchaseOrder, and SKU model? Then add a FK to the SKU Model that points to the PurchaseOrder Model's PK or is there some other, more correct, way to do something like this? Thanks in advance for any help.
[Edit]
Django had what I was looking for all along. Since this is essentially a nested form, I could make use of Formsets.
Here are two helpful links to get started:
https://docs.djangoproject.com/en/1.4/topics/forms/formsets/
https://docs.djangoproject.com/en/1.4/topics/forms/modelforms/#model-formsets
Use Django's built-in User model (you can look at the source to see the definition, but it is similar to the code below for the other models). Other than that, I would suggest a model for every object you mentioned.
Don't add a FK to the SKU Model since SKU can exist without being in a purchase order (if I understand the problem correctly).
models.py
from django.contrib.auth.models import User
from django.db import models

class Vendor(models.Model):
    name = models.CharField(max_length=200)
    # other fields

class SKU(models.Model):
    description = models.CharField(max_length=200)
    # other fields

class PurchaseOrder(models.Model):
    # on_delete is required from Django 2.0 onwards
    purchaser = models.ForeignKey(User, on_delete=models.CASCADE)
    name = models.CharField(max_length=200)
    # the ManyToManyField lets one purchase order hold several SKUs
    skus = models.ManyToManyField(SKU)
    # other fields
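The header-plus-lines shape of the form can be tried out in plain Python before committing to models. This sketch uses dataclasses rather than Django models, and `OrderLine` is a hypothetical name: it carries the per-line quantity and price, which a bare many-to-many link between an order and its SKUs has no column for. In Django that role is usually played by an intermediate model sitting between the order and the SKU.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


@dataclass
class OrderLine:
    # One row of the order detail: which SKU, how many, at what price.
    sku: str
    description: str
    quantity: int
    price: Decimal

    def subtotal(self) -> Decimal:
        return self.quantity * self.price


@dataclass
class PurchaseOrder:
    # Header fields from the top of the form, plus any number of lines.
    vendor: str
    buyer_name: str
    lines: List[OrderLine] = field(default_factory=list)

    def total(self) -> Decimal:
        return sum((line.subtotal() for line in self.lines), Decimal("0"))


# Sample data taken from the form in the question.
order = PurchaseOrder(vendor="FooBar", buyer_name="Mr. Bar")
order.lines.append(OrderLine("12345", "Soft Bar", 38, Decimal("5.75")))
order.lines.append(OrderLine("12346", "Hard Bar", 12, Decimal("5.75")))
order.lines.append(OrderLine("12347", "Medium Bar", 17, Decimal("5.75")))
order_total = order.total()
```

Keeping quantity and price on the line rather than on the SKU means the same SKU can appear on many orders with different quantities.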

Django: union of different queryset on the same model

I'm programming a search on a model and I have a problem.
My model is almost like:
class Serials(models.Model):
    id = models.AutoField(primary_key=True)
    code = models.CharField("Code", max_length=50)
    name = models.CharField("Name", max_length=2000)
and I have in the database tuples like these:
1 BOSTON The new Boston
2 NYT New York journal
3 NEWTON The old journal of Mass
4 ANEWVIEW The view of the young people
If I search for the string new, what I want to have is:
first the names that start with the string
then the codes that start with the string
then the names that contain the string
then the codes that contain the string
So the previous list should appear in the following way:
2 NYT New York journal
3 NEWTON The old journal of Mass
1 BOSTON The new Boston
4 ANEWVIEW The view of the young people
The only way I found to get this kind of result is to make different searches (if I put "OR" in a single search, I lose the order I want).
My problem is that the template code that shows the result is really redundant and honestly very ugly, because I have to repeat the same code for all 4 querysets. And the worst thing is that I cannot use pagination!
Now, since the structure of the different querysets is the same, I'm wondering if there is a way to join the 4 querysets and give the template only one queryset.
You can make those four queries and then chain them inside your program:
import itertools
result = itertools.chain(qs1, qs2, qs3, qs4)
but this doesn't seem too nice, because you have to make four queries.
You can also write your own sql using raw sql, for example:
Serials.objects.raw(sql_string)
Also look at this:
How to combine 2 or more querysets in a Django view?
You should also be able to do qs1 | qs2 | qs3 | qs4. This will give you duplicates, however.
What you might want to look into is Q() objects:
from django.db.models import Q
value = "new"
Serials.objects.filter(Q(name__startswith=value) |
                       Q(code__startswith=value) |
                       Q(name__contains=value) |
                       Q(code__contains=value)).distinct()
I'm not sure if it will handle the ordering if you do it this way, as this would rely on the db doing that.
Indeed, even using qs1 | qs2 may cause the order to be determined by the db. That might be the drawback (and reason why you might need at least two queries).
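One way to keep a single result list and still get the requested order is to compute a rank per row and sort on it. The sketch below does this in plain Python over the sample tuples from the question; `match_rank` and `search` are hypothetical helper names, and the matching is made case-insensitive as an assumption so that "new" matches "New York journal". In Django a similar rank could be expressed in the query itself with conditional expressions (`Case`/`When`) and then used in `order_by`.

```python
def match_rank(code, name, term):
    # Lower rank means earlier in the results; None means no match.
    code, name, term = code.lower(), name.lower(), term.lower()
    if name.startswith(term):
        return 0  # names that start with the string
    if code.startswith(term):
        return 1  # codes that start with the string
    if term in name:
        return 2  # names that contain the string
    if term in code:
        return 3  # codes that contain the string
    return None


def search(rows, term):
    ranked = []
    for row in rows:
        rank = match_rank(row[1], row[2], term)
        if rank is not None:
            ranked.append((rank, row))
    # sort() is stable, so rows with equal rank keep their original order.
    ranked.sort(key=lambda pair: pair[0])
    return [row for _, row in ranked]


# Sample rows from the question: (id, code, name).
serials = [
    (1, "BOSTON", "The new Boston"),
    (2, "NYT", "New York journal"),
    (3, "NEWTON", "The old journal of Mass"),
    (4, "ANEWVIEW", "The view of the young people"),
]
result_ids = [row[0] for row in search(serials, "new")]
```

Each row gets exactly one rank, so there are no duplicates to remove, and the single list can be handed to a paginator.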