Relational model: Tables or primary and foreign keys?

When talking about relational databases, it seems that most people refer to the primary and foreign key 'relations' as the reason for the 'relational database' terminology.
This is causing me considerable confusion because the textbook linked below states explicitly "A common misconception is that the name "relational" has to do with relationships between tables (that is, foreign keys). Actually, the true source for the model's name is the mathematical concept relation. A relation in the relational model is what SQL calls a table."
http://www.valorebooks.com/textbooks/training-kit-exam-70-461-querying-microsoft-sql-server-2012-microsoft-press-training-kit-1st-edition/9780735666054#default=buy&utm_source=Froogle&utm_medium=referral&utm_campaign=Froogle&date=11/12/15
Furthermore the next source explicitly refers to the tables as the relations and not the primary/foreign keys.
https://docs.oracle.com/javase/tutorial/jdbc/overview/database.html
However it seems common knowledge almost anywhere else I look or read that the primary and foreign keys are the relations.
Does anyone have a reason for the inconsistency?

Foreign key constraints are a kind of relation - a subset relation - but these aren't the relations from which the model derives its name. Rather, the relations of the relational model refer to finitary relations. Ted Codd wrote in his 1970 paper A Relational Model of Data for Large Shared Data Banks that "The term relation is used here in its accepted mathematical sense. Given sets S1, S2, ... Sn (not necessarily distinct), R is a relation on these n sets if it is a set of n-tuples each of which has its first element from S1, its second element from S2, and so on." Thus, he was describing a structure which can be represented by a table, if we follow some rules like ignoring duplicate rows and the order of rows (it's a set, after all).
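To make Codd's definition concrete, here is a minimal sketch in Python. The domains `names` and `cities` and the relation `lives_in` are made-up illustrations, not anything from the paper; the point is only that a relation is a set of tuples drawn from the Cartesian product of its domains.

```python
from itertools import product

# Hypothetical domains S1 and S2 (illustrative names only).
names = {"Alice", "Bob"}
cities = {"Paris", "Oslo"}

# A relation on (names, cities) is any subset of their Cartesian product.
lives_in = {("Alice", "Paris"), ("Bob", "Oslo")}
assert lives_in <= set(product(names, cities))

# Because it is a set, duplicate rows and row order carry no meaning.
same_relation = {("Bob", "Oslo"), ("Alice", "Paris"), ("Bob", "Oslo")}
print(lives_in == same_relation)  # True
```

Note that no keys, primary or foreign, appear anywhere in this definition.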
Another common misunderstanding is that foreign key constraints represent relationships between entities. They don't. Relationships are represented as sets/tables of rows of associated values. The keys of two or more entities will be recorded together in a row, whether it's in an "entity table" or a "relationship table". Foreign key constraints only enforce integrity; they don't link entities or tables. Tables can be joined on any predicate function, and foreign key constraints play no role in that.
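As a rough illustration of joining on an arbitrary predicate, the sketch below uses Python's built-in sqlite3 module. The `employee` and `pay_band` tables are invented for the example and declare no keys or foreign key constraints at all, yet the join works because it only needs a predicate over the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables with no primary keys and no foreign key constraints.
conn.executescript("""
    CREATE TABLE employee (name TEXT, salary INTEGER);
    CREATE TABLE pay_band (label TEXT, low INTEGER, high INTEGER);
    INSERT INTO employee VALUES ('Ann', 30000), ('Raj', 75000);
    INSERT INTO pay_band VALUES ('junior', 0, 49999), ('senior', 50000, 999999);
""")

# The join predicate is a range test, not an equality between key columns.
rows = conn.execute("""
    SELECT e.name, b.label
    FROM employee AS e
    JOIN pay_band AS b ON e.salary BETWEEN b.low AND b.high
    ORDER BY e.name
""").fetchall()
print(rows)  # [('Ann', 'junior'), ('Raj', 'senior')]
```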
Most people learn database concepts from blogs, tutorials and answers ranked by popularity. Most people have never read a decent database book, let alone papers by the inventors and students of the relational model of data. Most programmers and corporations want to get the product released and have little time or appreciation for logic, theory and philosophy. It's an inherently complicated field - see Bill Kent's book Data and Reality for an exploration of this complexity. Thus, most of what you'll find on the internet are half-truths at best as people try to make sense of a difficult topic.
People are familiar with records and pointers, due to their prevalence in mainstream programming languages, and they certainly look and sound a lot like entities and relationships. If entities are represented by tables/records, attributes by fields/columns, then 1-to-1 / 1-to-many relationships between entities must be an association between records/tables, right? It's a simple idea, and that makes it difficult to correct. The popularity of object/relational mapping and object-oriented domain models derives from this simple idea (and from well-spoken and sociable authors, unlike the surly attitudes of some relational proponents) but also further entrenches it.
Peter Chen (author of The Entity-Relationship Model - Toward a Unified View of Data) made some effort to be rigorous, distinguishing "entity relations" and "relationship relations". In his view, entities were real-world concepts which were represented in a database as values, and described via association of values in rows. Relationships between entities were similarly represented by association of values in rows. The E-R model's distinction between relationships and attributes is somewhat redundant (attributes are just binary relationships) and there's little benefit in distinguishing entity tuples from relationship tuples. In fact, I believe it serves to reinforce the confusion. Its superficial similarity to the older network model helped its adoption but also served to maintain the latter, as developers adopted new terminology while maintaining old practices.
Object-role modeling (aka NIAM, by Sjir Nijssen and Terry Halpin) does away with attributes and focuses on domains, roles and relations. It's more elegant than E-R and much closer to a true relational model, but its strengths (logical, comprehensive, move away from the network model) are also its weaknesses (learning curve, more complicated diagrams, less amenable as a vehicle for familiar techniques).
Ted Codd remarked in the paper mentioned above that "The network model, on the other hand, has spawned a number of confusions, not the least of which is mistaking the derivation of connections for the derivation of relations." This is as true today as it was then. The relational model which he described has since been built on by many others, including Chris Date whose book An Introduction to Database Systems is one of the most comprehensive sources on the topic.
I'm naming all these authors because one more opinion on either side isn't going to clear up your confusion. Rather, go to the sources and study them for yourself. Yes, it's hard work, but your efforts will be repaid in the quality of understanding you'll gain.

Related

Does a many-to-many relationship hurt model performance?

Generally, we recommend minimizing the use of bi-directional
relationships. They can negatively impact on model query performance,
and possibly deliver confusing experiences for your report users.
Link: https://learn.microsoft.com/en-us/power-bi/guidance/relationships-bidirectional-filtering
Documentation clearly says that bi-directional filtering hurts model performance.
Does many to many relationship also hurt model performance? The documentation (https://learn.microsoft.com/en-us/power-bi/guidance/relationships-many-to-many) doesn't mention this.
The reason for asking this question is - my understanding was that model performance is based on table expansion, and since many to many relationship doesn't support table expansion, does this imply that it will have bad performance?
Whereas bidirectional relationship doesn't affect table expansion (in an intra group 1:n relationship). Yet it is said that bidirectional relationship has bad performance.
So is table expansion not a factor that affects model performance?

There is an extensive article and associated video on many-to-many performance here: https://www.sqlbi.com/articles/different-options-to-model-many-to-many-relationships-in-power-bi-and-tabular/

Are there conventions for naming 'through' models in Django

Django has some very clear conventions for naming models:
do so in the singular
describe the object the model represents
using capWords convention etc.
When you are using a 'through' model, to describe a many-to-many relationship however, you are no longer describing an object, but a relationship between objects.
Having googled a bit, I can't find any guidance on naming 'through' models. The example used in the django docs (Musician and Band) is named Membership. This is kind of perfect because membership exactly describes the relationship between the musician and the band. But in so many other cases there doesn't seem to be a word (or even phrase) to describe such a relationship.
Take for example, the other situation used in the django docs (for a 'normal' many-to-many field) of Pizza and Topping. There doesn't seem to be a good word to describe how a pizza relates to a topping; and if I therefore needed a through field to add additional information (e.g. maybe I have a primary topping and a secondary topping) I end up with a naming difficulty.
In practice there are two (maybe more, but two that I can think of) ways to proceed:
Call the through model something along the lines of ThingOneThingTwoRelationship e.g. PizzaToppingRelationship. I guess this works, but it's kind of ugly and verbose.
Try and name the field after the additional info it stores, e.g. ToppingSignificance. Less ugly than option 1, but has other drawbacks. For one, if the model grows to contain additional information the name is no longer particularly descriptive. Take the 'band' example from the django docs: imagine we started with just the joining_date and called the Membership model MemberJoiningDate. As that model grew to include reason and leaving_date, the name would no longer be apt.
So what am I actually asking...
Are there any known conventions (not opinions) for naming through fields? I'm guessing there's nothing official or I would have found it on the django site, but are there any conventions that are just standards that are generally accepted by folk that have been doing this a while?
Failing that, are there any django style guides that discuss this (two-scoops etc. - if I had my copy available I'd look it up)?
Failing either of the above, are there any conventions that could be borrowed from general relational database parlance.
... yes, I know - naming things is hard.

which diagram to use in NoSql (Mcd, Merise, UML)

Again, sorry for my silly question, but it seems that what I've learned from relational databases should be "erased": there are no joins, so how the hell will I use Merise and UML in NoSQL?
http://en.wikipedia.org/wiki/Class_diagram
this one will not work for NoSql?

How you organize your project is independent of the technology used for persistence. In particular, UML or ERD or any such tool doesn't apply to relational databases any more than it does to document databases.
The idea that NoSQL has "No Joins" is both silly and unhelpful. It's totally correct that (most) document databases do not provide a join operator, but that just means that when you do need a join, you do it in the application code instead of the query language. The basic facts of organizing your project stay the same.
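A join done in application code can be as small as the sketch below. The `authors` and `posts` lists stand in for two document collections, and the field names (`_id`, `author_id`) are assumptions made for the example rather than the API of any particular document database.

```python
# Hypothetical documents, as they might come back from two collections.
authors = [
    {"_id": "a1", "name": "Ann"},
    {"_id": "a2", "name": "Raj"},
]
posts = [
    {"_id": "p1", "author_id": "a1", "title": "Hello"},
    {"_id": "p2", "author_id": "a2", "title": "Joins"},
    {"_id": "p3", "author_id": "a9", "title": "Orphan"},  # no such author
]

def join_posts_with_authors(posts, authors):
    """Inner join of posts to authors on posts.author_id == authors._id."""
    authors_by_id = {author["_id"]: author for author in authors}
    return [
        {"title": post["title"], "author": authors_by_id[post["author_id"]]["name"]}
        for post in posts
        if post["author_id"] in authors_by_id  # unmatched posts are dropped
    ]

joined = join_posts_with_authors(posts, authors)
print(joined)
```

Building the lookup dictionary first is the same idea as a hash join: each post is matched in constant time instead of scanning all authors.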
Another difference is that document databases make expressing some things easier, and other things harder. For example, it's often easier to express an entity relationship constraint in a relational database, but it's easier to express an inheritance hierarchy in a document database. Both notions can be supported by both technologies, and you will certainly use them when your application needs them, regardless of the technology you end up using.
In short, you should design your application without first choosing a persistence technology. Once you've got a good idea what you want to persist, you may have a better idea of which technology is a better fit. It may be the case that what you really need is both, or you might need something totally different from either.
EDIT: The idea of a foreign key is no more magical than simply saying "This is a name of that kind of thing". It happens that many SQL databases provide some very terse and useful features for dealing with this sort of thing; specifically, constraints (this column references this other relation, so it cannot take a value unless there is a corresponding row in the referent), and cascades (if the status of the referent changes, make the corresponding change to the reference). This does make it easy to keep data consistent even at the lowest level, since there's no way to tell the database to enter a state where the referent is missing.
The important thing to distinguish, though, is that the idea of giving a database entity (a row in a relational database, a document in document databases) a name is distinct from the notion of schema constraints. One of the nice things about document databases is that they can easily combine or reorient where data lives so that you don't always have to have a referent that actually exists. Most document databases use the document class as part of the key, and so you can still be sure that the key is meaningful, even when the referent doesn't actually exist.
But most of the time, you actually do want the referent to exist; you don't want a blog post to have an author unless that author actually exists in your system. Just how well this is supported depends a lot on the particular document database. Some databases do provide triggers, or other tools, to enforce the integrity of the reference, but many others only provide some kind of transactional capability, which requires that the integrity be enforced in the application.
The point is: for most kinds of database, every value in the database has some kind of identifier. In a relational database, that's a triple of relation:column:key, and in a document database it's usually something like the pair document_class:path. When you need one entity to refer to another, you use whatever sort of key you need to identify that datum for that kind of database. Foreign key constraints found in RDBMSes are just (really useful) syntactic sugar for "if not referent exists then raise ForeignKeyError", which could be implemented with equal power in other ways, if that's helpful for your particular use.
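The "if not referent exists then raise ForeignKeyError" idea can be sketched in a few lines. `ToyDocumentStore` below is a made-up in-memory stand-in, not the API of any real document database, and it ignores concurrency entirely; a real application would need the database's transactional features to make the check and the write atomic.

```python
class ForeignKeyError(Exception):
    """Raised when a document refers to a key that is not in the store."""


class ToyDocumentStore:
    """Hypothetical in-memory store keyed by (document_class, doc_id)."""

    def __init__(self):
        self._docs = {}

    def put(self, document_class, doc_id, body, references=()):
        # Application-level referential check: every referent must exist.
        for ref in references:
            if ref not in self._docs:
                raise ForeignKeyError(f"{ref[0]}:{ref[1]} does not exist")
        self._docs[(document_class, doc_id)] = body


store = ToyDocumentStore()
store.put("author", "ann", {"name": "Ann"})
store.put("post", "p1", {"title": "Hello"}, references=[("author", "ann")])

try:
    store.put("post", "p2", {"title": "Orphan"}, references=[("author", "bob")])
except ForeignKeyError as exc:
    print("rejected:", exc)  # rejected: author:bob does not exist
```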

Why are relational databases needed?

Specifically thinking of web apps,
(1) why are relationships(ie:foreign keys) in RDBMS even useful?
The web apps I write have logic built-in that validates user input against required fields. I see no real use for foreign keys and thus no real use for relational databases.
Besides, if I were to put all the required field validation logic in the RDBMS(ie:MySQL) it would simply return a vague error. At least with PHP-based validation I know which field is missing and I can notify the user(though with Javascript-based validation this would almost NEVER happen anyway).
(2) Was there a point in the past where RDBMS were useful for some reason or is there a reason they are useful now that I'm not aware of?
I really need some insight on this topic. I simply can't come up with a good answer.

I will come at this from a different angle.
I work at a place where we had a database that had no foreign key constraints, default values, or other data checks whatsoever in their initial records database. The lead engineer's excuse for this was something similar to what you have described above. "The application will ensure the referential integrity".
The problem is, we did not have a standard data layer (like an object relational mapping) over the top of the database. We had multiple programmatic sources that fed into the same tables. It was funny because after a while, you could tell which parts of the code created which rows in the table. Sometimes the links lined up, sometimes they didn't. Sometimes the links were NULL (when they shouldn't be), and sometimes they were 0. We even had a few cyclic records which was fun.
My point is, you never know when you are going to need to write a quick script to batch import records, or write a new subsystem that references the same tables. It behooves us as programmers to program as defensively as possible. We can't assume that those who come after us will know as much (if anything) about how our schema should be used.

I'm not much of an SQL lover, but even I must say that the relational structure has its advantages.
It doesn't only allow validation. By providing the database with metadata describing the relations between the actual pieces of information stored, a great number of optimizations are possible.
This makes it possible to quickly retrieve large, complex datasets. It also reduces the number of queries needed to make modifications and keep the data coherent, since most of the "book-keeping" is carried out automatically on the DB side of the connection.

One incredibly useful feature of foreign keys in most relational databases is cascades.
Suppose you have a families table and a persons table. Each family can have multiple people, but a person can only belong in one family (one-to-many relationship). If you have foreign keys and you delete a family row, the database can automatically update all the related people, either by deleting them or setting their foreign keys to null.
If you do not have this constraint, you must handle this situation yourself, in your own code.
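Here is a small sketch of that families/persons cascade using Python's built-in sqlite3 module. The column names and sample rows are assumptions for illustration. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection; other databases enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
conn.executescript("""
    CREATE TABLE families (
        id      INTEGER PRIMARY KEY,
        surname TEXT NOT NULL
    );
    CREATE TABLE persons (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        family_id INTEGER REFERENCES families(id) ON DELETE CASCADE
    );
    INSERT INTO families VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO persons VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 2);
""")

# Deleting the family row removes its persons too, with no application code.
conn.execute("DELETE FROM families WHERE id = 1")
remaining = conn.execute("SELECT name FROM persons ORDER BY id").fetchall()
print(remaining)  # [('Cy',)]
```

Swapping `ON DELETE CASCADE` for `ON DELETE SET NULL` would keep the person rows and null out their `family_id` instead.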

RDBMSs are still very useful. Not sure why you wouldn't think so. Foreign key constraints can be used to maintain referential integrity (in other words, to provide a simple way to express 1:1, 1:many and many:many relationships). RDBMSs are also useful because there was a rich theory accompanying practical developments, unlike previous DBMSs. In particular, relational calculus/algebra are nice since they allow for good query optimization, normalization, etc.
Not sure if that really answers your question. Wikipedia might list some advantages of RDBMSs.

(1) why are relationships(ie:foreign keys) in RDBMS even useful?
First off, I think you are talking about foreign key CONSTRAINTS. Foreign keys are just a logical design feature that says that this entity matches up with that one.
The reasons foreign key constraints are useful are:
They help you adhere to the DRY (Don't repeat yourself) principle. Sure your app validates the relationship, but does it do it in several places? Are there multiple apps that access the same DB? Do you have to repeat the logic in each app? Hey, you could pull that logic out and use a common DLL for access to that data that enforces that logic. Better yet, what if that was built into the RDBMS so I didn't have to write custom code to do something so routine? Bam. Foreign key constraints.
If your app enforces the foreign key validations, how do you force users who are working directly in the DB to honor your rules? I know, I know. You shouldn't let users into the back-end directly, but you just try telling that to the data analysts when they have a project for corporate and you are the bottleneck.
As to the vague error. Wouldn't your argument be better stated as RDBMS X has vague errors when data fails foreign key constraint checks? The way you have generalized it, you could also argue that we should use paper ledgers instead of computers because the constraint had a vague error.
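On the vague-error point, the two approaches are not mutually exclusive: the constraint can stay in the database while the application turns its error into something specific. The sketch below assumes SQLite via Python's sqlite3 module and invented `families`/`persons` tables; the only constraint that can fail here is the foreign key, so any `IntegrityError` is treated as a missing family. A real application would inspect the error more carefully.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
conn.executescript("""
    CREATE TABLE families (id INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE persons (name TEXT, family_id INTEGER REFERENCES families(id));
    INSERT INTO families VALUES (1, 'Smith');
""")

def add_person(conn, name, family_id):
    """Insert a person; translate a constraint failure into a clear message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO persons (name, family_id) VALUES (?, ?)",
                (name, family_id),
            )
    except sqlite3.IntegrityError as exc:
        # The database's own message is terse, so the app adds the context.
        return f"Cannot add {name!r}: family {family_id} does not exist ({exc})"
    return "ok"

print(add_person(conn, "Ann", 1))
print(add_person(conn, "Bob", 99))
```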
(2) Was there a point in the past where RDBMS were useful for some reason or is there a reason they are useful now that I'm not aware of?
Yeah, that would be now, yesterday and probably long into the future.
I could go on forever about the reasons, but here is the big one...
It provides a common structured file format that is easy to extend and easy for other applications to leverage. You may be too young to remember when every dang system had its own proprietary structured file format, but it sucked. Plus, it forced you to re-invent the wheel constantly in terms of things like indexing, a query language, locking, etc.

"I see no real use for foreign keys and thus no real use for
relational databases"
Judging by this remark, you seem to be underestimating what a relational database is for. Foreign key constraints aren't a defining feature of relational databases and certainly aren't the only reason for using such databases. The relational database model is a powerful and effective way to represent data and it remains so even if you decide you don't want to implement a foreign key constraint. I will therefore assume the question you really meant to ask is: Why are foreign keys useful in relational databases?
A foreign key constraint is just one kind of data integrity constraint. You can of course implement integrity rules outside the database but the DBMS is designed and optimised to do the job for you and is generally the most efficient place to do it because it is closest to the data structures. If you did it outside the database then you would have at least an extra round trip to retrieve the necessary data. You would also have to replicate the DBMS's locking/concurrency model in your application code.
The database optimiser can take advantage of constraints in the database to improve the performance of queries. It can't do that if the rules only exist in your application code.
If you have many applications sharing the same database then implementing data integrity rules in every application is impractical and expensive to maintain. Centralising the constraint logic makes more sense.
Various CASE tools and DBA tools will take advantage of database constraints, can reverse engineer them and use them to assist development and maintenance tasks.
In practice the meaning and function of a database constraint versus some procedural code that validates data only on entry is very different. If X is implemented in a database constraint then I know it is valid for every piece of data in the database. If X is implemented in the application when data is entered then I only know it applies to future data - I can't be sure it applies to everything already in the database (maybe X was only implemented today and didn't apply to the data entered yesterday).
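The difference between "validated on entry" and "true of everything stored" can be seen in a small sketch. It assumes SQLite via Python's sqlite3 module, with enforcement deliberately switched off to imitate a period when only the application checked references; the `author` and `post` tables are invented for the example. SQLite's `PRAGMA foreign_key_check` then reports rows that violate the declared foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enforcement off: imitates a database that relied on application checks.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (
        id        INTEGER PRIMARY KEY,
        title     TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Ann');
    INSERT INTO post VALUES (10, 'fine', 1), (11, 'orphan', 42);
""")

# Each reported row names the child table and the rowid of the offending row.
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
for violation in violations:
    print("table:", violation[0], "rowid:", violation[1])
```

Rules that live only in the application offer no equivalent way to ask whether the data already stored obeys them.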

Because they maintain the integrity of the database. If you have all your business logic in the application then in theory they are not needed, but are still useful as a safeguard against bad data.

Does SAP BusinessObjects require a Universe for relational database?

Goal: I wish users to be able to directly connect to a RDBMS (e.g., MS SQL Server) and do some queries with possible cross references.
Tool: SAP BusinessObjects XI Enterprise
Description:
The main reason is that Universe creation is pretty techy. Imagine the SQL DB structure changing frequently, maybe even daily. Hence the synchronization issues.
Is BO capable of doing a cross reference using the BO query GUI, usable by non-techy users, to generate a request like:
SELECT
Classroom.Location
FROM
Student,
Classroom
WHERE
Student.Name = 'Foo' AND
Student.ClassroomName = Classroom.Name
...with only an ODBC connection and no Universe (or an autogenerated Universe)?
If yes, does it require foreign keys to be defined?
If no, is there a simple way to create and update (synch) a BO Universe directly from the DB structure? Maybe using their new XML format?

Good question.
Background
I have implemented one very large and "complex" banking database, 500+ tables, that the customer bought BO for. The "complex" is in quotes because although I created a pure 5NF (correctly Normalised to 5NF) RDB, and most developers and the power users did not find it "complex", some developers and users found it "complex". The first BO consultant could not even create a working Universe, and overran his budgeted one month. The second BO consultant created the entire Universe in 10 days. The whole structure (one 5NF RDB; 5 apps; one Universe; web reporting) all worked beautifully.
But as a result of that exercise, it was clear to me that although the Universe is very powerful, it is only required to overcome the impediments of an un-normalised database, or a data warehouse that has tables from many different source databases, which then need to be viewed together as one logical table. The first consultant was simply repeating what he was used to, doing his techie thing, and did not understand what a Normalised db meant. The second realisation was that BO Universe was simply not required for a true (normalised) RDB.
Therefore on the next large banking project, in which the RDB was pretty much 120% of the previous RDB, I advised against BO, and purchased Crystal Reports instead, which was much cheaper. It provided all the reports that users required, but it did not have the "slice and dice" capability or the data cube on the local PC. The only extra work I had to do was to provide a few Views to ease the "complex" bits of the RDB, all in a day's work.
Since then, I have been involved in assignments that use BO, and fixed problems, but I have not used XI (and its auto-generated Universe). Certainly, I have a preference for simple reporting tools, and for avoiding the Universe altogether, an approach which has been proved many times.
In general then, yes, BO Query GUI (even pre-XI) will absolutely read the RDB catalogue directly and you can create and execute any report you want from that, without a Universe. Your example is no sweat at all. "Cross references" are no sweat at all. The non-techie users can create and run such reports themselves. I have done scores of these; it takes minutes. Sometimes (eg. for Supertype-Subtype structures), creating Views eases this exercise even further.
Your Question
Exposes issues which are obstacles to that.
What is coming across is that you do not have a Relational Database. Pouring some data into a container called "relational DBMS" does not transform that content into a Relational Database.
one aspect of a true RDB is that all definitions are in the ISO/IEC/ANSI standard SQL catalogue.
if your "foreign keys" are not in the catalogue then you do not have Foreign Keys, you do not have Referential Integrity that is defined, maintained by the server.
you probably do not have Rules and Check Constraints either; therefore you do not have Data Integrity that is defined and maintained by the server.
Noting your comments regarding changing "db" structure. Evidently then, you have not normalised the data.
If the data was normalised correctly, then the structure will not change.
Sure, the structure will be extended (columns added; new tables added) but the existing structure of Entities and Attributes will not change, because they have been (a) modelled correctly and (b) normalised
therefore any app code written, or any BO Universe built (and reports created from that), are not vulnerable to such extensions to the RDB; they continue running merrily along.
Yes of course they cannot get at the new columns and new tables, but providing that is part of the extension; the point is the existing structure, and everything that was dependent on it, is stable.
Noting your example query. That is prima facie evidence of complete lack of normalisation: Student.ClassroomName is a denormalised column. Instead of existing once for every Student, it should exist once for each Classroom.
I am responding to your question only, but it should be noted that lack of normalisation will result in many other problems, not immediately related to your question: massive data duplication; Update Anomalies; lack of independence between the "database" and the "app" (changes in one will affect the other); lack of integrity (data and referential); lack of stability, and therefore a project that never ends.
Therefore you not only have some "structure" that changes almost daily, you have no structure in the "structure" of that, that does not change. That level of ongoing change is classic to the Prototype stage in a project; it has not yet settled down to the Development stage.
If you use BO, or the auto-generated Universe, you will have to auto-generate the Universe daily. And then re-create the report definition daily. The users may not like the idea of re-developing a Universe plus their reports daily. Normally they wait for the UAT stage of a project, if not the Production stage.
if you have Foreign Keys, since they are in the Standard SQL catalogue, BO will find them
if you do not have Foreign Keys, but you have some sort of "relation" between files, and some sort of naming convention from which such "relations" can be inferred, BO has a check box somewhere in the auto-generate window, that will "infer foreign keys from column names". Of course, it will find "relations" that you may not have intended.
if you do not have naming conventions, then there is nothing that BO can use to infer such "relations". There is only so much magic that a product can perform
and you still have the problem of "structure" changing all the time, so whatever magic you are relying on today may not work tomorrow.
Answer
Business Objects, Crystal Reports, and all high end to low end report tools, are primarily written for Relational Databases, which reside in an ISO/IEC/ANSI Standard SQL DBMS. That means, if the definition is in the catalogue, they will find it. The higher end tools have various additional options (that's what you pay for) to assist with overcoming the limitations of sub-standard contents of an RDBMS, culminating in the Universe; but as you are aware, that takes a fair amount of effort and technical qualification to implement.
The best advice I can give you, therefore, is to get a qualified modeller and model your data; such that it is stable, free of duplication, and your code is stable, etc, etc; such that simple (or heavy duty) report tools can be used to (a) define reports easily and (b) run those report definitions without changing them daily. You will find that the "structure" that changes daily, doesn't. What is changing daily is your understanding of the data.
Then, your wish will come true, the reports can be easily defined once, by the users, "cross references" and all, without a Universe, and they can be run whenever they like.
Related Material
This, your college or project, is not the first in the universe to be attempting to either (a) model their data or (b) implement a Database, relational or not. You may be interested in the work that others have already done in this area, as often much information is available free, in order to avoid re-inventing the wheel, especially if your project does not have qualified staff. Here is a simplified version (they are happy for me to publish a generic version but not the full customer-specific version) of a recent project I did for a local college; I wrote the RDB, they wrote the app.
Simplified College Data Model
Readers who are not familiar with the Relational Modelling Standard may find IDEF1X Notation useful.
Response to Comments
To be clear then. First a definition.
a Relational Database is, in chronological order, in the context of the last few days of 2010, with over 25 years of commonly available true relational technology [over 35 years of hard-to-use relational technology], for which there are many applicable Standards, and using such definitions (Wikipedia being unfit to provide said definitions, due to the lack of technical qualification in the contributors):
adheres to the Relational Model as a principle
Normalised to at least Third Normal Form (you need 5NF to be completely free of data duplication and Update Anomalies)
complies with the various existing Standards (as applicable to each particular area)
modelled by a qualified and capable modeller
is implemented in ISO/IEC/ANSI Standard SQL (that's the Declarative Referential Integrity ala Foreign Key definitions; Rule and Check constraints; Domains; Datatypes)
is Open Architecture (to be used by any application)
treated as a corporate asset, of substantial value
and therefore reasonably secured against unauthorised access; data and referential integrity; uncontrolled change (unplanned changes affecting other users, etc).
Without that, you cannot enjoy the power, performance, ease of change, and ease of use, of a Relational Database.
What it is not, is the content of an RDBMS platform. Pouring unstructured or un-organised data into a container labelled "Relational Database Engine" does not magically transform the content into the label of the container.
Therefore, if it is reasonably (not perfectly, not 100% Standard-compliant) a Relational Database, the BO Universe is definitely not required to access and use it to its full capability (limited only by functions of the report tool).
If it has no DRI (FK definitions), and no older style "defined keys" and no naming conventions (from which "relations" can be derived) and no matching datatypes, then no report tool (or human being) will be able to find anything.
It is not just the FK definitions.
Depending on exactly which bits of a Relational Database have been implemented in the data heap, and on the capability of the report tool (how much the licence costs), some capability somewhere within the two ends of the spectrum is possible. BO without the Universe is the best of breed for report tools; their Crystal Reports item is about half the grunt. The Universe is required to provide the database definitions for the non-database.
Then there is the duplication issue. Imagine how a user is going to feel when they find out that the data that they finally got through to, after 3 months, turns out to be a duplicate that no one keeps up-to-date.
"Database" Object Definition
If you have unqualified developers or end users implementing "tables" in the "database", then there is no limit to the obstacles and contradictions they place on themselves. ("Here, I've got an RDBMS but the content isn't; I've got BO but it can't; I've got encryption but I've copied the payroll data to five places, so that people can get at it when they forget their encryption key".) Every time I think I have seen the limit of insanity, someone posts a question on SO, and teaches me again that there is no limit to insanity.
BO via an ODBC connection is capable of doing JOIN (cross reference) without Universe as long as there are the correct FK defined?
(ODBC has nothing to do with it; it will operate the same via a native connection or via a browser.)
For that one time, re FKs defined correctly, yes. But the purpose of my long response is to identify that there are many other factors.
It isn't a BO or BO Universe question, it is "just how insane are the users' definitions and duplication". FKs could work sometimes and not others; could work today and not tomorrow.
