How to make a Group by in OCL?

I have an association between Person and Department, and I want to check if Person.function is unique
Person.allInstances->isUnique(function)
but I want to check if function is unique for Persons in the same department and not between all persons. I can have the same function but not in the same department.
I don't know how to use isUnique for each department (Persons group by department).

If isUnique should apply within a particular scope, use that scope as the source, for example department.persons->isUnique(function). Written as an invariant, and assuming the association end on Department is named persons, that could read: context Department inv: self.persons->isUnique(function). (This is the approach you should take anyway, to avoid the generally over-powerful and inefficient allInstances() wherever possible.)
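
For readers less familiar with OCL, the same per-department check can be sketched in Python. This is only an illustration of the semantics, not OCL tooling; the Person class and its attribute names are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical stand-in for the modelled Person; names are assumptions.
    name: str
    department: str
    function: str


def functions_unique_per_department(persons):
    """Return True if no two persons in the same department share a function."""
    seen = defaultdict(set)  # department -> functions already taken there
    for p in persons:
        if p.function in seen[p.department]:
            return False
        seen[p.department].add(p.function)
    return True


staff = [
    Person("Ann", "Sales", "manager"),
    Person("Bob", "Sales", "clerk"),
    Person("Cid", "IT", "manager"),  # same function, other department: allowed
]
```

Adding a second "manager" to the IT department would make the check fail, which mirrors what isUnique does when its source is a single department's persons.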

Related

DynamoDB one-to-one

Hello stackoverflow community,
This question is about modeling one-to-one relationships with multiple entities involved.
Say we have an application about students. Each Student has:
Profile (name, birth date...)
Grades (math score, geography...)
Address (city, street...).
Requirements:
The Profile, Grades and the Address only belong to one Student each time (i.e. one-to-one).
A Student has to have all Profile, Grades and Address data present (there is no student without grades for example).
Updates can happen to all fields, but the profile data mostly remain untouched.
We access the data based on a Student and not by querying for the address or something else (a query could be "give me the grades of student John", or "give me profile and address of student John", etc).
All fields put together are below DynamoDB's 400 KB item size limit.
The question is how would you design it? Put all data as a single row/item or split it to Profile, Grades and Address items?

My solution is to go with keeping all data in one row defined by the studentId as the PK and the rest of the data follow in a big set of columns. So one item looks like [studentId, name, birthDate, mathsGrade, geographyGrade, ..., city, street].
I find that this gives me transactional inserts/updates (with the downside that I always have to work with the full item, of course), and when querying I can ask for only the subset of data needed each time.
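
As a rough sketch of "asking for a subset", the function below builds the parameters for a low-level GetItem call with a ProjectionExpression. The table name "students" and the helper name are assumptions; attribute names are passed through placeholders so that reserved words such as name do not break the expression.

```python
def build_get_item_params(student_id, attributes, table_name="students"):
    """Build GetItem parameters that return only the listed attributes.

    table_name is a hypothetical name; the key attribute studentId comes
    from the single-item design described above.
    """
    # Map each attribute to a placeholder (#a0, #a1, ...) to avoid clashes
    # with DynamoDB reserved words.
    names = {f"#a{i}": attr for i, attr in enumerate(attributes)}
    return {
        "TableName": table_name,
        "Key": {"studentId": {"S": student_id}},
        "ProjectionExpression": ", ".join(names),
        "ExpressionAttributeNames": names,
    }


params = build_get_item_params("john", ["mathsGrade", "geographyGrade"])
```

The resulting dictionary could be passed to a boto3 DynamoDB client as client.get_item(**params). Note that a projection trims what is returned, but read capacity is still consumed based on the size of the whole item.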
On top of the above, this solution fits with two of the most important AWS guidelines about dynamo:
keep everything in a single table and
pre-join data whenever possible.
The reason for my question is that I could only find one topic on stackoverflow about one-to-one modeling in DynamoDB, and the suggested solution (also heavily up-voted) was in favor of keeping the data in separate tables, something that reminds me of a relational-DB kind of design (see the solution here).
I understand that in that context the author tried to keep a more generic use case and probably support more complex queries, but it feels like the option of putting everything together was fully devalued.
For that reason I'd like to open that discussion here and listen to other opinions.

A Basic Implementation
Considering the data and access patterns you've described, I would set up a single student-data table with a partition key that allows me to query by the student, and a sort key that allows me to narrow down my results even further based on the entity I want to access. One way of doing that would be to use some kind of identifier for a student, say studentID, and then something more generalized for the sort key like entityID, or simply SK.
At the application layer, I would classify each Item under one possible entity (profile, grades, address) and store data relevant to that entity in any number of attributes that I would need on that Item.
An example of how that data might look for a student named john smith:
{ studentId: "john", entityId: "profile", firstName: "john", lastName: "smith" }
{ studentId: "john", entityId: "grades", math2045: 96.52, eng1021: 89.93 }
{ studentId: "john", entityId: "address", state: "CA", city: "fresno" }
With this schema, all your access patterns are available:
"give me the math grades of student john"
PartitionKey = "john", SortKey = "grades"
And if you store the address within the student's profile entity, you can accomplish "give me profile and address of student John" in one shot (multiple queries should be avoided when possible):
PartitionKey = "john", SortKey = "profile"
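
The key conditions above could be turned into a low-level Query request along these lines. This is a minimal sketch: the helper name is hypothetical, and the key attribute names studentId and entityId are taken from the example items.

```python
def build_entity_query(student_id, entity_id, table_name="student-data"):
    """Build Query parameters for one entity (profile, grades, address)
    belonging to one student."""
    return {
        "TableName": table_name,
        # Equality on the partition key AND on the sort key.
        "KeyConditionExpression": "studentId = :pk AND entityId = :sk",
        "ExpressionAttributeValues": {
            ":pk": {"S": student_id},
            ":sk": {"S": entity_id},
        },
    }


grades_query = build_entity_query("john", "grades")
profile_query = build_entity_query("john", "profile")
```

With boto3, each dictionary could be passed as client.query(**grades_query). Dropping the sort key condition would return every entity stored for that student in a single request.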
Consider
Keep in mind, you need to take into account how frequently you are reading/writing data when designing your table. This is a very rudimentary design, and may need tweaking to ensure that you're not setting yourself up for major cost or performance issues down the road.
The basic idea that this implementation demonstrates is that denormalizing your data (in this case, across the different entities you've established) can be a very powerful way to leverage DynamoDB's speed, and also leave yourself with plenty of ways to access your data efficiently.
Problems & Limitations
Specific to your application, there is one potential problem that stands out: it seems quite possible that the grades Items will balloon to the point where they are impossible to manage and become expensive to read/write/update. As you start storing more and more students, and each student takes more and more courses, your grades entities will expand with them. Say the average student takes anywhere from 35-40 classes and gets a grade for each of them; you don't want to have to manage 35-40 attributes on an item if you don't have to. You also may not want every single grade back every time you ask for a student's grades. Maybe you start storing more data on each grade entity like:
{ math1024Grade: 100, math1024Instructor: "Dr. Jane Doe", math1024Credits: 4 }
Now for each class, you're storing at least 2 extra attributes. That Item with 35-40 attributes just jumped up to 105-120 attributes.
On top of performance and cost issues, your access patterns could start to evolve and become more demanding. You may only want grades from the student's major, or a certain type of class like humanities, sciences, etc, which is currently unavailable. You will only ever be able to get every single grade from each student. You can apply a FilterExpression to your request and remove some of the unwanted Items, but you're still paying for all the data you've read.
With the current solution, we are leaving a lot on the table in terms of optimizations in performance, flexibility, maintainability, and cost.
Optimizations
One way to address the lack of flexibility in your queries, and possible bloating of grades entities, is the concept of a composite sort key.
Using a composite sort key can help you break down your entities even further, making them more manageable to update and giving you more flexibility when you're querying. You would also wind up with much smaller and more manageable items, and although the number of items you store would increase, you'll save on cost and performance. With more targeted queries, you get back only the data you need, so you're not paying extra read units for data you're throwing away. The amount of data a single Query request can return is limited as well, so you may cut down on the number of round trips you are making.
That composite sort key could look something like this, for grades:
{ studentId: "john", entityId: "grades#MATH", math2045: 96.52, math3082: 91.34 }
{ studentId: "john", entityId: "grades#ENG", eng1021: 89.93, eng2203: 93.03 }
Now, you get the ability to say "give me all of John's MATH course grades" while still being able to get all the grades (by using the begins_with operation on the sort key when querying).
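
To make the prefix matching concrete, here is a small in-memory simulation of how a key condition such as studentId = :pk AND begins_with(entityId, :prefix) would select items. It does not talk to DynamoDB; the query function is a stand-in written only for this illustration.

```python
ITEMS = [
    {"studentId": "john", "entityId": "profile", "firstName": "john", "lastName": "smith"},
    {"studentId": "john", "entityId": "grades#MATH", "math2045": 96.52, "math3082": 91.34},
    {"studentId": "john", "entityId": "grades#ENG", "eng1021": 89.93, "eng2203": 93.03},
]


def query(items, student_id, sort_key_prefix):
    """Simulate a Query: exact match on the partition key, prefix match on
    the sort key, results ordered by sort key."""
    matches = [
        item for item in items
        if item["studentId"] == student_id
        and item["entityId"].startswith(sort_key_prefix)
    ]
    return sorted(matches, key=lambda item: item["entityId"])


math_only = query(ITEMS, "john", "grades#MATH")  # one subject
all_grades = query(ITEMS, "john", "grades#")     # every grades item
```

The "#" separator is just a convention; any delimiter that cannot appear inside the individual key parts would work the same way.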
If you think you'll want to start storing more course information under grades entities, you can suffix your composite sort key with the course name, number, identifier, etc. Now you can get all of a student's grades, all of a student's grades within a subject, and all the data about a student's grade within a subject, like its instructor, credits, year taken, semester, start date, etc.
These optimizations are all possible solutions, but may not fit your application, so again keep that in mind.
Resources
Here are some resources that should help you come up with your own solution, or ways to tweak the ones I've provided above to better suit you.
AWS re:Invent 2019: Data modeling with Amazon DynamoDB (CMY304)
AWS re:Invent 2018: Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB (DAT401)
Best Practices for Using Sort Keys to Organize Data
NoSQL Design For DynamoDB
And keep this one in mind especially when you are considering cost/performance implications for a high-traffic application:
Best Practices for Designing and Using Partition Keys Effectively

User Friendly Unique Identifier For DynamoDB

In my DynamoDB table named users, I need a unique identifier, which is easy for users to remember.
In an RDBMS I can use an auto-increment id to meet the requirement.
As there is no way to have an auto-increment id in DynamoDB, is there a way to meet this requirement?
I could keep the last used id in another table (lastIdTable), retrieve it before adding a new document, increment that number, and save the updated number in both tables (lastIdTable and users), but that would be very inefficient.
UPDATE
Please note that there's no way of using an existing attribute or getting users input for this purpose.

Since it seems you must create a memorable userId without any information about the user, I’d recommend that you create a random phrase of 2-4 simple words from a standard dictionary.
For example, you might generate the phrase correct horse battery staple. (I know this is a userId and not a password, but the memorability consideration still applies.)
Whether you use a random number (which has similar memorability to a sequential number) or a random phrase (which I think is much more memorable), you will need to do a conditional write with the condition that the ID does not already exist, and if it does exist, you should generate a new ID and try again.
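
The generate, conditionally write, and retry loop could be sketched as follows. The in-memory table stands in for DynamoDB so the example runs on its own; against the real service the equivalent would be a PutItem with the condition attribute_not_exists(userId), retried when it fails with ConditionalCheckFailedException. The word list, class, and function names are all assumptions for illustration.

```python
import secrets

# Tiny illustrative word list; a real one would be a much larger dictionary.
WORDS = ["correct", "horse", "battery", "staple", "river", "maple", "orbit", "lemon"]


class IdTakenError(Exception):
    """Raised when the conditional write finds the ID already in use."""


class InMemoryUsers:
    """Stand-in for the users table with a put-if-absent operation."""

    def __init__(self):
        self._items = {}

    def put_if_absent(self, user_id, attrs):
        if user_id in self._items:
            raise IdTakenError(user_id)
        self._items[user_id] = attrs


def random_phrase(n_words=3):
    return "-".join(secrets.choice(WORDS) for _ in range(n_words))


def create_user(table, attrs, make_id=random_phrase, max_attempts=5):
    """Try fresh IDs until one is free, giving up after max_attempts."""
    for _ in range(max_attempts):
        user_id = make_id()
        try:
            table.put_if_absent(user_id, attrs)
            return user_id
        except IdTakenError:
            continue  # collision: generate another ID and retry
    raise RuntimeError("could not find a free ID")
```

The larger the dictionary and the more words per phrase, the rarer collisions become, so in practice the retry path should seldom run.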

An email address seems like the best choice.
Either as a partition key, or use a GUID as the partition key and have a Global Secondary Index over email address.
Or as Matthew suggested in a comment, let the users pick a user name.

Docker container naming strategy might give you some idea. https://github.com/moby/moby/blob/master/pkg/namesgenerator/names-generator.go
It produces names that are unique (within limits) yet human-friendly.
Examples
awesome_einstein
nasty_weinstein
perv_epstein
A similar one: https://github.com/jjmontesl/codenamize

Google Org Charts Display Functionality

I'm using Google Org charts and I need to have a child element have 3 parents above it - is this possible? For example a situation where an employee has three bosses.

No, the parent column only accepts one id. Which is lucky for the employee with 3 bosses because maybe it will help them sort out who the employee actually reports to.
As a conceptual work-around you could establish the three bosses as a single entity like "The Triumvirate", "The Tribunal", or whatever and then put the employee under that entity. Or have a node with 3 comma-separated names like "Mike, John, Susan", and then use "Mike, John, Susan" as the parent node for the poor confused employee.
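
As a sketch of that work-around, the rows for the chart's data table (node, parent, tooltip) could be generated like this and handed to the page as JSON. The helper and the employee name "Alex" are made up for the example.

```python
import json


def org_chart_rows(employee, bosses, top=""):
    """Return [node, parent, tooltip] rows with the bosses merged into a
    single combined node that acts as the employee's only parent."""
    combined = ", ".join(bosses)
    return [
        [combined, top, ""],       # merged boss node; empty parent means root
        [employee, combined, ""],  # employee reports to the merged node
    ]


rows = org_chart_rows("Alex", ["Mike", "John", "Susan"])
rows_json = json.dumps(rows)
```

Each node still has exactly one parent, which is what the chart requires; the three names simply share one box.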
This is one case where, while I know these things happen, when you're formalizing this you should really be asking, "Why does this employee have 3 bosses?" It's really very confusing for both the employees and the bosses 99% of the time. It is often best to pick one boss for them to report to and then have all the bosses communicate sideways to each other. The only exception that I can think of is a shared receptionist somewhere like a Doctor's Office with multiple doctors. And even then it might help for the receptionist to have one formal boss who has the right to discipline, fire, give them a raise, and two other superiors that just use their services. Helps a lot if the employee encounters conflicting orders.
But of course, that's not what you asked for. But it is probably why they didn't make nodes support multiple parent nodes.

How to get reverse relationship in Django's ORM

I usually tend to avoid giving such explicit examples, but in this case it's necessary.
I have 5 entities:
Student
Group
StudentGroup
CourseGroup
Course (not relevant -- for completeness purpose only)
StudentGroup represents students who are part of a group. A CourseGroup is a course the whole group is taking part in.
I want to get all students that are part of a Group and are taking part in a specific Course. So far, I've only managed to get all students in a group:
students = Student.objects.filter(studentgroup=1)
Not sure why I can say studentgroup=1 but it's fortunate. However there's no studentgroupcourse=1 :) Any help?
Edit: My models are: http://pastebin.com/07z1iEcw

Assuming CourseGroup has a ForeignKey to StudentGroup and a ForeignKey to Course:
Student.objects.filter(studentgroup=1, studentgroup__coursegroup__course=your_course)

Designing sets of data and support class extension OO approach, in c++

I'm currently working on a project and a design question came up.
I have a class named Key which is composed of several Fields. Field is a base class, and derived classes such as Age, Name, etc. implement it. Inside the Key class there is an attribute which is an array of Fields, holding the different kinds of Field.
class Key {
private:
    Field* fieldList;  // array holding the different kinds of Field
};
I'm working in a team, and a design choice came up that I couldn't defend because I didn't know how to answer the following problem... or maybe the lack of one? I trust that you'll be able to open my mind on this.
The purpose of the Key class is to hold several fields. It exists because I'm going to handle data of this kind:
(Name, Age....)
This is how I imagined it would look once implemented:
Key myKey = Key();
Age newAge = Age(50);
myKey.add(newAge);
This is what the prototype of the add method of the Key class would look like:
void Key::add(Field);
As you may have guessed, since the Key class has an array of Fields, this method receives a Field, and since Age is also a Field through inheritance, this works like a charm. The same can be said of the Name class and other classes that could come up in the future.
This is the same idea as in a database, where rows hold the data and columns correspond to attributes, so a given column always holds the same type of attribute.
We would also like to compare two Keys by just one of their Fields, for example:
Let's say I have two Keys with this data:
(John, 50) <- myKey1
(Paul, 60) <- myKey2
My method to do this would look like this:
myKey1.compareTo(myKey2, 2)
This would tell me whether the 2nd attribute of myKey1 is greater than, equal to, or less than the 2nd attribute of myKey2.
There's a problem with this. When I used the add method, I added Fields of different types in arbitrary order, say Age first, then Name second, etc. The Key object added them to its internal array in order of appearance.
So when I use the compareTo method, nothing guarantees that the 2nd element of both objects' arrays holds the same kind of Field, say the Name Field. If it doesn't, I could end up comparing a Name with an Age, because internally the Key only holds an array of Fields, which are all the same type as far as the Key class knows.
This was my approach, but what I couldn't answer is how to fix this problem.
Another member of my team proposed that we implement a method on the Key class for each of the existing fields, that is:
myKey.addAge(newAge);
myKey.addName(newName);
Inside it would still have the Field array, but this time the class can guarantee that Age goes in the 1st position of the array and Name in the 2nd, because each method would make sure of it.
The obvious problem with this is that I would have to add a method for each type of Field that exists. That means that if in the future I wish to add, say, a birth date, and so create a new Date class, I'll have to add a method addDate, and so on.
Another reason my team member gave when pointing out why my approach was bad is that "we can't trust an external user to add the Fields in the order they're supposed to be in".
So to conclude:
With the first approach, the Key class depends on the programmer who adds the Fields to make sure they are in the right order, but as a benefit there is no need to add a method for each type of field.
With the second approach, the Key class makes sure the order is right by implementing a method for each type of Field that exists, but then, with each new type of Field created, the class would grow bigger and bigger.
Any ideas on this? Is there a workaround?
Thanks in advance, and I apologize if I wasn't clear; I'll add new details if needed.

Expanding on @tp1's excellent idea of an ID field in the Field class and an enum, you can actually make it very flexible. If you are comfortable limiting the number of field types to 32, you could even take a set of flags as the ID in compareTo. Then you could compare multiple fields at the same time. Does that approach make sense?
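
A minimal sketch of the ID-plus-flags idea follows, written in Python rather than C++ to keep it short. The names FieldId, Key.add, and compare_to are assumptions, not the poster's actual classes. Fields are stored by ID instead of by insertion order, so the order of add calls no longer matters.

```python
from enum import IntFlag


class FieldId(IntFlag):
    # One bit per field type, so several IDs can be combined with "|".
    NAME = 1
    AGE = 2


class Key:
    def __init__(self):
        self._fields = {}  # FieldId -> value

    def add(self, field_id, value):
        self._fields[field_id] = value

    def compare_to(self, other, ids):
        """Compare on every field whose flag is set in ids, in FieldId order.

        Returns -1, 0 or 1.
        """
        for fid in FieldId:
            if ids & fid:
                a, b = self._fields[fid], other._fields[fid]
                if a != b:
                    return -1 if a < b else 1
        return 0


my_key1 = Key()
my_key1.add(FieldId.NAME, "John")
my_key1.add(FieldId.AGE, 50)

my_key2 = Key()
my_key2.add(FieldId.AGE, 60)      # added in a different order on purpose
my_key2.add(FieldId.NAME, "Paul")
```

Here my_key1.compare_to(my_key2, FieldId.AGE) compares only the ages, while passing FieldId.NAME | FieldId.AGE compares by name first and then by age. Adding a new field type means adding one enum member rather than a new method on Key.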
