I'm using Google Org charts and I need to have a child element have 3 parents above it - is this possible? For example a situation where an employee has three bosses.
No, the parent column only accepts one id. Which is lucky for the employee with 3 bosses because maybe it will help them sort out who the employee actually reports to.
As a conceptual workaround you could establish the three bosses as a single entity like "The Triumvirate", "The Tribunal", or whatever, and then put the employee under that entity. Or have a node with 3 comma-separated names like "Mike, John, Susan", and then use "Mike, John, Susan" as the parent node for the poor confused employee.
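A minimal sketch of the combined-node workaround, assuming the chart is fed rows of [node, parent] with one parent per row. The helper name combined_parent_rows is made up for illustration and is not part of any charting API:

```python
def combined_parent_rows(employee, bosses):
    """Return [node, parent] rows that place `employee` under one merged boss node.

    The merged node is named by joining the boss names with commas, so the
    chart still only ever sees a single parent id per node.
    """
    parent = ", ".join(bosses)           # e.g. "Mike, John, Susan"
    return [
        [parent, ""],                    # merged node, no parent of its own
        [employee, parent],              # employee hangs off the merged node
    ]


rows = combined_parent_rows("Pat", ["Mike", "John", "Susan"])
```

The rows can then be handed to whatever code populates the chart's data table.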
This is one case where, while I know these things happen, when you're formalizing this you should really be asking, "Why does this employee have 3 bosses?" It's really very confusing for both the employees and the bosses 99% of the time. It is often best to pick one boss for them to report to and then have all the bosses communicate sideways to each other. The only exception that I can think of is a shared receptionist somewhere like a Doctor's Office with multiple doctors. And even then it might help for the receptionist to have one formal boss who has the right to discipline, fire, give them a raise, and two other superiors that just use their services. Helps a lot if the employee encounters conflicting orders.
But of course, that's not what you asked for. But it is probably why they didn't make nodes support multiple parent nodes.
Hello stackoverflow community,
This question is about modeling one-to-one relationships with multiple entities involved.
Say we have an application about students. Each Student has:
Profile (name, birth date...)
Grades (math score, geography...)
Address (city, street...).
Requirements:
The Profile, Grades and the Address only belong to one Student each time (i.e. one-to-one).
A Student has to have all Profile, Grades and Address data present (there is no student without grades for example).
Updates can happen to all fields, but the profile data mostly remain untouched.
We access the data based on a Student and not by querying for the address or something else (a query could be "give me the grades of student John", or "give me profile and address of student John", etc).
All fields put together are below the 400 KB item size limit of DynamoDB.
The question is how would you design it? Put all data as a single row/item or split it to Profile, Grades and Address items?
My solution is to go with keeping all data in one row defined by the studentId as the PK and the rest of the data follow in a big set of columns. So one item looks like [studentId, name, birthDate, mathsGrade, geographyGrade, ..., city, street].
I find that this way I can have transactional inserts/updates (with the downside that I always have to work with the full item, of course), and while querying I can ask for the subset of data needed each time.
On top of the above, this solution fits with two of the most important AWS guidelines about dynamo:
keep everything in a single table and
pre-join data whenever possible.
The reason for my question is that I could only find one topic on Stack Overflow about one-to-one modeling in DynamoDB, and the suggested solution (also heavily up-voted) was in favor of keeping the data in separate tables, something that reminds me of a relational-DB kind of design (see the solution here).
I understand that in that context the author tried to keep a more generic use case and probably support more complex queries, but it feels like the option of putting everything together was fully devalued.
For that reason I'd like to open that discussion here and listen to other opinions.
A Basic Implementation
Considering the data and access patterns you've described, I would set up a single student-data table with a partition key that allows me to query by the student, and a sort key that allows me to narrow down my results even further based on the entity I want to access. One way of doing that would be to use some kind of identifier for a student, say studentID, and then something more generalized for the sort key like entityID, or simply SK.
At the application layer, I would classify each Item under one possible entity (profile, grades, address) and store data relevant to that entity in any number of attributes that I would need on that Item.
An example of how that data might look for a student named john smith:
{ studentId: "john", entityId: "profile", firstName: "john", lastName: "smith" }
{ studentId: "john", entityId: "grades", math2045: 96.52, eng1021: 89.93 }
{ studentId: "john", entityId: "address", state: "CA", city: "fresno" }
With this schema, all your access patterns are available:
"give me the math grades of student john"
PartitionKey = "john", SortKey = "grades"
and if you store the address within the student's profile entity, you can accomplish "give me profile and address of student John" in one shot (multiple queries should be avoided when possible)
PartitionKey = "john", SortKey = "profile"
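To make the key lookups above concrete, here is a small sketch that mimics them against an in-memory list. It is only a stand-in for the table, not a DynamoDB client; get_entity is a hypothetical helper, and the items are the example items from this answer:

```python
# In-memory stand-in for the student-data table (not a real DynamoDB call).
ITEMS = [
    {"studentId": "john", "entityId": "profile", "firstName": "john", "lastName": "smith"},
    {"studentId": "john", "entityId": "grades", "math2045": 96.52, "eng1021": 89.93},
    {"studentId": "john", "entityId": "address", "state": "CA", "city": "fresno"},
]


def get_entity(items, student_id, entity_id):
    """Return the one item matching partition key + sort key, or None."""
    for item in items:
        if item["studentId"] == student_id and item["entityId"] == entity_id:
            return item
    return None


grades = get_entity(ITEMS, "john", "grades")    # "give me the grades of john"
profile = get_entity(ITEMS, "john", "profile")  # "give me the profile of john"
```

Because every entity shares the studentId partition key, each access pattern is an exact-key lookup rather than a scan.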
Consider
Keep in mind, you need to take into account how frequently you are reading/writing data when designing your table. This is a very rudimentary design, and may need tweaking to ensure that you're not setting yourself up for major cost or performance issues down the road.
The basic idea that this implementation demonstrates is that denormalizing your data (in this case, across the different entities you've established) can be a very powerful way to leverage DynamoDB's speed, and also leave yourself with plenty of ways to access your data efficiently.
Problems & Limitations
Specific to your application, there is one potential problem that stands out, which is that it seems very feasible the grades Items start to balloon to the point where they are impossible to manage and become expensive to read/write/update. As you start storing more and more students, and each student takes more and more courses, your grades entities will expand with them. Say the average student takes anywhere from 35-40 classes and gets a grade for each of them, you don't want to have to manage 35-40 attributes on an item if you don't have to. You also may not want back every single grade every time you ask for a student's grades. Maybe you start storing more data on each grade entity like:
{ math1024Grade: 100, math1024Instructor: "Dr. Jane Doe", math1024Credits: 4 }
Now for each class, you're storing at least 2 extra attributes. That Item with 35-40 attributes just jumped up to 105-120 attributes.
On top of performance and cost issues, your access patterns could start to evolve and become more demanding. You may only want grades from the student's major, or a certain type of class like humanities, sciences, etc, which is currently unavailable. You will only ever be able to get every single grade from each student. You can apply a FilterExpression to your request and remove some of the unwanted Items, but you're still paying for all the data you've read.
With the current solution, we are leaving a lot on the table in terms of optimizations in performance, flexibility, maintainability, and cost.
Optimizations
One way to address the lack of flexibility in your queries, and possible bloating of grades entities, is the concept of a composite sort key.
Using a composite sort key can help you break down your entities even further, making them more manageable to update and providing you more flexibility when you're querying. Additionally, you would wind up with much smaller and more manageable items, and although the number of items you store would increase, you'll save on cost and performance. With more optimized queries, you'll get only the data you need back, so you're not paying those extra read units for data you're throwing away. The amount of data a single Query request can return is limited as well, so you may cut down on the number of round trips you are making.
That composite sort key could look something like this, for grades:
{ studentId: "john", entityId: "grades#MATH", math2045: 96.52, math3082: 91.34 }
{ studentId: "john", entityId: "grades#ENG", eng1021: 89.93, eng2203: 93.03 }
Now, you get the ability to say "give me all of John's MATH course grades" while still being able to get all the grades (by using the begins_with operation on the sort key when querying).
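A rough sketch of how the prefix match behaves, again using an in-memory list instead of a real table. The query helper is hypothetical; it imitates a Query on the partition key with a begins_with condition on the sort key by using str.startswith:

```python
# In-memory stand-in for the table, using the composite sort keys from above.
ITEMS = [
    {"studentId": "john", "entityId": "profile", "firstName": "john"},
    {"studentId": "john", "entityId": "grades#MATH", "math2045": 96.52, "math3082": 91.34},
    {"studentId": "john", "entityId": "grades#ENG", "eng1021": 89.93, "eng2203": 93.03},
]


def query(items, student_id, sort_key_prefix=""):
    """Return a student's items whose sort key starts with the given prefix."""
    matches = [
        item for item in items
        if item["studentId"] == student_id
        and item["entityId"].startswith(sort_key_prefix)
    ]
    # Results are ordered by sort key here to keep the output predictable.
    return sorted(matches, key=lambda item: item["entityId"])


math_only = query(ITEMS, "john", "grades#MATH")   # one subject
all_grades = query(ITEMS, "john", "grades#")      # every grades item
```

The same prefix idea extends to longer keys: the more segments the sort key carries, the narrower the slice a single prefix can select.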
If you think you'll want to start storing more course information under grades entities, you can suffix your composite sort key with the course name, number, identifier, etc. Now you can get all of a student's grades, all of a student's grades within a subject, and all the data about a student's grade within a subject, like its instructor, credits, year taken, semester, start date, etc.
These optimizations are all possible solutions, but may not fit your application, so again keep that in mind.
Resources
Here are some resources that should help you come up with your own solution, or ways to tweak the ones I've provided above to better suit you.
AWS re:Invent 2019: Data modeling with Amazon DynamoDB (CMY304)
AWS re:Invent 2018: Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB (DAT401)
Best Practices for Using Sort Keys to Organize Data
NoSQL Design For DynamoDB
And keep this one in mind especially when you are considering cost/performance implications for high-traffic application:
Best Practices for Designing and Using Partition Keys Effectively
Assume that I have a list of employee names from a database (thousands, potentially tens-of-thousands in the near future). To make the problem simpler assume that each firstname/lastname combination is unique (a big if, but a tangent).
I also have a RSS stream of news content that pertains to the business (again, could be in the hundreds of items per day).
What I would like to do is detect if an employee's name appears in the several-paragraph news item and, if so, 'tag' the item with the person it's talking about.
There may be more than one employee named in a single news item so breaking the loop after the first positive match isn't a possibility.
I can certainly brute force things: for every news item, loop over each and every employee name and if a regex expression returns a match, make note of it.
Is there a simpler way in ColdFusion or should I just get on with my nested loops?
Just throwing this out there as something you could do...
It sounds like you'll almost always have significantly more employee names than words per post. Here's how I might handle it:
Have an always-running CF app that will pull in the feeds and onAppStart
Grab all employees from your db
Create an app-scoped look up struct with first names as keys and a struct of last names as values ( you could also add middle names sibling to last names with a 3rd tier if desired ).
So one key in the look up might be "Vanessa" with a struct with 2 keys ( "Johnson" and "Forta" ) as its value.
Then, each article you parse, just listToArray with a space as a delimiter and loop through the array doing a simple structKeyExists with each token. For matches, check the next item in the array as a last name.
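The lookup idea can be sketched in Python (not ColdFusion) roughly as follows. The function names are made up for illustration, a dict of sets stands in for the nested structs, and the tokenizer uses a regex rather than a plain split on spaces so that trailing punctuation does not hide a match:

```python
import re


def build_lookup(employees):
    """Map lowercase first name -> set of lowercase last names."""
    lookup = {}
    for first, last in employees:
        lookup.setdefault(first.lower(), set()).add(last.lower())
    return lookup


def find_names(text, lookup):
    """Return every (first, last) pair from the lookup that appears in text."""
    tokens = re.findall(r"[A-Za-z'-]+", text)
    found = set()
    # Walk adjacent token pairs: a known first name followed by one of
    # its known last names counts as a match.
    for first, following in zip(tokens, tokens[1:]):
        last_names = lookup.get(first.lower())
        if last_names and following.lower() in last_names:
            found.add((first.lower(), following.lower()))
    return found


lookup = build_lookup([("Vanessa", "Johnson"), ("Vanessa", "Forta"), ("Ben", "Forta")])
tags = find_names("Today Vanessa Johnson met Ben Forta.", lookup)
```

Each article costs one pass over its tokens regardless of how many employees exist, and no early exit is needed to collect several names from one item.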
I'd guess this would be much more performant, processing-wise, than running however many searches. It would also take almost no time to code, and you can feed in any future sources extremely simply (your checker takes one argument, any text on Earth).
Interested to see what route you go and whether your experiments expose anything new about performance in CF.
Matthew, you have a tall order there, and there are really multiple parts to the challenge/solution. But just in terms of comparing a list of values to a given set of text to see if one of them occurs in there, you'll find that there's no single CF function for it. Because of that, I created a new one, findList, available at cflib:
http://cflib.org/index.cfm?event=page.udfbyid&udfid=1908
It's not perfect, nor as optimal as it could be, but it may be a useful first step for you, or give you some ideas. That said, it suited my need (determining whether a given blog comment had a reference to any of the blacklisted words). I show it comparing a list of URLs, but it could be any words at all. Hope that's a little helpful.
Another option worth exploring is leveraging the Solr engine that ships with CF now. It will do the string search heavy lifting for you and you can probably focus on dynamically keeping your collections up to date and optimized as new feed items come in.
Good luck!
For fun, I've been programming a Risk clone in C++ and I need some help with the territories/Continents part of it: setting them up so that each territory knows what territories are adjacent to it, what Continent it is a part of, who currently controls it and, of course, the number of armies currently in it.
Likewise, the Continent needs to know all the territories that are in it, so a player who controls the whole Continent gets corresponding reinforcement bonus for that Continent.
Currently, I think using a std::set may be the best choice, but I need some suggestions on how to set it up.
Create a graph where each Territory object has an array (vector/whatever) of other territories it is adjacent to. Then have a Continent object for each continent which has a list of territories that are in it.
At end of each turn check to see that all territories in a continent all belong to the same player and if so give that player the extra resources defined by the continent. The territories themselves will be updated after each fight in a turn.
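Here is a small sketch of that layout, written in Python for brevity rather than C++. All class names, field names, and bonus values are assumptions for illustration; territories refer to their neighbours by name, and each continent lists the names of its members:

```python
from dataclasses import dataclass, field


@dataclass
class Territory:
    name: str
    continent: str
    owner: str = ""
    armies: int = 0
    neighbors: list = field(default_factory=list)  # names of adjacent territories


@dataclass
class Continent:
    name: str
    bonus: int
    territories: list = field(default_factory=list)  # names of member territories


def connect(a, b):
    """Record adjacency in both directions."""
    a.neighbors.append(b.name)
    b.neighbors.append(a.name)


def continent_bonus(continent, territories, player):
    """Return the continent's bonus if `player` owns every territory in it."""
    if not continent.territories:
        return 0
    owns_all = all(territories[n].owner == player for n in continent.territories)
    return continent.bonus if owns_all else 0


territories = {
    "A1": Territory("A1", "Alpha", owner="red", armies=3),
    "A2": Territory("A2", "Alpha", owner="red", armies=1),
    "B1": Territory("B1", "Beta", owner="blue", armies=2),
}
connect(territories["A1"], territories["A2"])
connect(territories["A2"], territories["B1"])
alpha = Continent("Alpha", bonus=2, territories=["A1", "A2"])
beta = Continent("Beta", bonus=3, territories=["B1"])
```

At the end of a turn, calling continent_bonus for each continent and player gives the extra reinforcements; the adjacency lists are what attack validation would consult.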
A std::vector should be more than sufficient; no need to complicate things.
You might consider using the boost graph library to make the country graph. A std::map could then take countries to continents, or a std::multimap to go the other direction.
I am working on a django model and not being a database expert I could use some advice. I essentially have a model which contains a many to many relationship with another model. But I need to store unique values for each relationship each time I include something.
So for instance in chemistry you may have many elements that include hydrogen, but each element has a unique amount of hydrogen in it. So for instance a water entry would be connected to hydrogen and oxygen and the amount would be two hydrogen atoms and one oxygen.
I want hydrogen and water in this scenario to be stored in the database as elements, so I can query against them for other elements using them.
What is the best way to model this?
Thanks!
Read the documentation here and pay close attention to the Beatles example; it's exactly what you need.
Person -> Element
Group -> Chemical_Compound
Membership -> Element_2_Chemical
Element_2_Chemical should have an int field which details how many elements you have in each chemical compound.
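The shape of that mapping can be sketched with plain Python dataclasses. This is not Django model code, only an illustration of the data the intermediate table would hold; the class ElementInCompound plays the role of Element_2_Chemical, and all names here are assumptions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    symbol: str


@dataclass(frozen=True)
class ChemicalCompound:
    name: str


@dataclass(frozen=True)
class ElementInCompound:
    """One row of the link table: which element, in which compound, how many."""
    compound: ChemicalCompound
    element: Element
    count: int


def compounds_containing(links, element):
    """Return the sorted names of all compounds that use the given element."""
    return sorted({link.compound.name for link in links if link.element == element})


hydrogen, oxygen, carbon = Element("H"), Element("O"), Element("C")
water, methane = ChemicalCompound("water"), ChemicalCompound("methane")
links = [
    ElementInCompound(water, hydrogen, 2),
    ElementInCompound(water, oxygen, 1),
    ElementInCompound(methane, carbon, 1),
    ElementInCompound(methane, hydrogen, 4),
]
uses_hydrogen = compounds_containing(links, hydrogen)
```

In Django the link rows would live in the intermediate model named by the many-to-many field's through option, with count as the extra field.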
In your metaphor, you say "I want hydrogen and water in this scenario to be stored in the database as elements, so I can query against them for other elements using them."
Does it mean that "water" may be on either side of the relationship you are modeling? Does "water" relate to "hydrogen" in (almost) the same way as "milk" relates to "water"?
If the answer is Yes, then you should use a directed-acyclic-graph model (hopefully, you won't have cycles in your relationship: A->B->C->A). Look into the django-dag ( http://pypi.python.org/pypi/django-dag/ ) and django-treebeard-dag ( http://pypi.python.org/pypi/django-treebeard-dag/0.2 ) packages.
If the answer is No, so you have a clear distinction between what's a "container" and what's a "containee", use a normal many-to-many relationship between two different models, like the "Membership" example in the Django documentation ( https://docs.djangoproject.com/en/dev/topics/db/models/#extra-fields-on-many-to-many-relationships ).
In any case you'll have to add more info to the "edge" of the relationship.
Following your chemical metaphor strictly, you may not be modeling enough information, because some molecules have the same composition but different structures (they are called "isomers"). For instance, pentane, 2-methylbutane and 2,2-dimethylpropane all have five carbons and twelve hydrogens, but they are very different from one another...
With this I am saying that when you are doing an "enhanced many-to-many" it's generally a complex model, so take care of not leaving anything out of it.
Trust me, I've put a lot of thought into this problem before asking here, and I think I got a solution, but I'd like to see what you guys can come up with before settling for my own :)
SCENARIO:
In the domain we have 3 entities: a treatment, a beautyshop and an employee. A beautyshop can hire 0 to many employees. Now, a beautyshop has a list of possible treatments it can do for its customers. Each treatment has a description, a duration and a price. Each employee has a similar list, but each employee can specialize each treatment (different price or duration), add new treatments or "remove" treatments derived from the beautyshop.
.. This seems like a rather common problem to me, so I was hoping someone could come up with something clever :)
So far, I'm thinking about letting each treatment have a unique id, and then letting the employee list insert treatments of its own which will have the same id as the one from the shop. These employee treatments will then override the shop ones with the same id.
Thanks in advance
Are we talking about the objective representation of the problem or about the database representation of the problem? If it's the objective representation, then a specialized treatment should just be a subclass of the generic treatment.
With the relational database representation of the problem, things get a bit harder:
beautyshop ---= employee
beautyshop ---= treatment_type
treatment_type ---= treatment
employee =--= treatment
(---= is one to many, =--= is many to many).
But how do we get a list of treatments available in a beautyshop? We don't. Instead we get a list of treatments available from all beautyshop employees. That said, if a beautyshop has 0 employees, it serves no treatments.
You might use null fields in treatment table to indicate that the particular employee serves this treatment with default properties. If the treatment_type's defaults change for particular beautyshop, then all treatments are updated.
I would suggest adding some kind of inheritance/specialization mechanism to the Treatments by adding a parentTreatment reference to the Treatment class. You would have a set of standard Treatments, and each Employee would be able to select and customize them.
The BeautySalons wouldn't explicitly store any Treatments, a transient and volatile getAvailableTreatments() method would iterate over the associated Employees and aggregate the parent Treatments of the Treatments offered by each Employee.
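A minimal sketch of the parentTreatment idea, combined with the "null means use the default" convention mentioned earlier. The field and function names are assumptions, and the salon's list is computed from its employees rather than stored:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Treatment:
    description: str
    price: Optional[float] = None        # None means "inherit from parent"
    duration_min: Optional[int] = None   # None means "inherit from parent"
    parent: Optional["Treatment"] = None

    def effective_price(self):
        if self.price is not None:
            return self.price
        return self.parent.effective_price() if self.parent else None

    def effective_duration(self):
        if self.duration_min is not None:
            return self.duration_min
        return self.parent.effective_duration() if self.parent else None


def available_treatments(employee_treatments):
    """Aggregate the standard (parent) treatments behind every employee's list.

    `employee_treatments` maps an employee name to that employee's treatments.
    A treatment with no parent is the employee's own addition and counts as is.
    """
    names = set()
    for treatments in employee_treatments.values():
        for t in treatments:
            root = t.parent if t.parent is not None else t
            names.add(root.description)
    return sorted(names)


haircut = Treatment("Haircut", price=30.0, duration_min=45)   # standard treatment
anna_cut = Treatment("Haircut", price=35.0, parent=haircut)   # Anna changes the price only
bob_cut = Treatment("Haircut", parent=haircut)                # Bob keeps the defaults
beard = Treatment("Beard trim", price=15.0, duration_min=20)  # Bob's own addition
salon = available_treatments({"Anna": [anna_cut], "Bob": [bob_cut, beard]})
```

An employee "removes" a standard treatment simply by not having a child of it in their list, and a change to a standard treatment's defaults reaches every child that left the field unset.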
Why do you want to have different treatments with the same ID?
I'd rather set up a "custom treatment" ID.