Why are relational databases needed? - foreign-keys

Specifically thinking of web apps,
(1) why are relationships(ie:foreign keys) in RDBMS even useful?
The web apps I write have logic built-in that validates user input against required fields. I see no real use for foreign keys and thus no real use for relational databases.
Besides, if I were to put all the required field validation logic in the RDBMS(ie:MySQL) it would simply return a vague error. At least with PHP-based validation I know which field is missing and I can notify the user(though with Javascript-based validation this would almost NEVER happen anyway).
(2) Was there a point in the past where RDBMS were useful for some reason or is there a reason they are useful now that I'm not aware of?
I really need some insight on this topic. I'm simply can't come up with a good answer.

I will come at this from a different angle.
I work at a place where we had a database that had no foreign key constraints, default values, or other data checks whatsoever in their initial records database. The lead engineer's excuse for this was something similar to what you have described above. "The application will ensure the referential integrity".
The problem is, we did not have a standard data layer (like an object relational mapping) over the top of the database. We had multiple programmatic sources that fed into the same tables. It was funny because after a while, you could tell which parts of the code created which rows in the table. Sometimes the links lined up, sometimes they didn't. Sometimes the links were NULL (when they shouldn't be), and sometimes they were 0. We even had a few cyclic records which was fun.
My point is, you never know when you are going to need to write a quick script to batch import records, or write a new subsystem that references the same tables. It behooves us as programmers to program as defensively as possible. We can't assume that those who come after us will know as much (if anything) about how our schema should be used.

I'm not much of an SQL lover, but even I must say that the relational structure has its advantages.
It doesn't only allow validation. By providing the database with metadata describing the relations between the actual pieces information stored, a great number of optimizations are possible.
This makes it possible to quickly retrieve large, complex datasets. It also reduces the number of queries needed to make modifications and keep the data coherent, since most of the "book-keeping" is carried out automatically on the DB side of the connection.

One incredibly useful feature of foreign keys in most relational databases are cascades.
Suppose you have a families table and a persons table. Each family can have multiple people, but a person can only belong in one family (one-to-many relationship). If you have foreign keys and you delete a family row, the database can automatically update all the related people, either by deleting them or setting their foreign keys to null.
If you do not have this constraint, you must handle this situation yourself, in your own code.

RDBMSs are still very useful. Not sure why you wouldn't think so. Foreign key constraints can be used to maintain referential integrity (in other words, to provide a simple way to express 1:1, 1:many and many:many relationships. RDBMSs are also useful because there was a rich theory accompanying practical developments, unlike previous DBMSs. In particular, relational calculus/algebra are nice since they allow for good query optimization, normalization, etc.
Not sure if that really answers your question. Wikipedia might list some advantages of RDBMSs.

(1) why are relationships(ie:foreign keys) in RDBMS even useful?
First off, I think you are talking about foreign key CONSTRAINTS. Foreign keys are just a logical design feature that says that this entity matches up with that one.
The reason foreign key constraints are useful are:
They help you adhere to the DRY (Don't repeat yourself) principle. Sure your app validates the relationship, but does it do it in several places? Are there multiple apps that access the same DB? Do you have to repeat the logic in each app? Hey, you could pull that logic out and use a common DLL for access to that data that enforces that logic.Better yet, what if that was built into the RDMBS so I didn't have to write custom code to do something so routine? Bam. Foreign key constraints.
If your app enforces the foreign key validations, how do you force users who are working directly in the DB to honor your rules? I know, I know. You shouldn't let users into the back-end directly, but you just try telling that to the data analysts when they have a project for corporate and you are the bottleneck.
As to the vague error. Wouldn't your argument be better stated as RDBMS X has vague errors when data fails foreign key constraint checks? The way you have generalized it, you could also argue that we should use paper ledgers instead of computers because the constraint had a vague error.
(2) Was there a point in the past where RDBMS were useful for some reason or is there a reason they are useful now that I'm not aware of?
Yeah, that would be now, yesterday and probably long into the future.
I could go on forever about the reasons, but here is the big one...
It provides a common structured file format that is easy to extend, leverage by other applications. You may be too young to remember when every dang system had it's own proprietary structured file format, but it sucked. Plus, it forced you re-invent the wheel constantly in terms of things like indexing, a query language, locking, etc.

"I see no real use for foreign keys and thus no real use for
relational databases"
Judging by this remark, you seem to be underestimating what a relational database is for. Foreign key constraints aren't a defining feature of relational databases and certainly aren't the only reason for using such databases. The relational database model is a powerful and effective way to represent data and it remains so even if you decide you don't want to implement a foreign key constraint. I will therefore assume the question you really meant to ask is: Why are foreign keys useful in relational databases?
A foreign key constraint is just one kind of data integrity constraint. You can of course implement integrity rules outside the database but the DBMS is designed and optimised to do the job for you and is generally the most efficient place to do it because it is closest to the data structures. If you did it outside the database then you would have at least an extra round trip to retrieve the necessary data. You would also have to replicate the DBMS's locking/concurrency model in your application code.
The database optimiser can take advantage of constraints in the database to improve the performance of queries. It can't do that if the rules only exist in your application code.
If you have many applications sharing the same database then implementing data integrity rules in every application is impractical and expensive to maintain. Centralising the constraint logic makes more sense.
Various CASE tools and DBA tools will take advantage of database constraints, can reverse engineer them and use them to assist development and maintenance tasks.
In practice the meaning and function of a database constraint versus some procedural code that validates data only on entry is very different. If X is implemented in a database constraint then I know it is valid for every piece of data in the database. If X is implemented in the application when data is entered then I only know it applies to future data - I can't be sure it applies to everything already in the database (maybe X was only implemented today and didn't apply to the data entered yesterday).

Because they maintain the integrity of the database. If you have all your business logic in the application then in theory they are not needed, but are still useful as a safeguard against bad data.

Related

Is there a persistency layer for list/queue containers?

Is there some kind of persistency layer that can be used for a regularly-modified list/queue container that stores strings?
The data in the list is just strings, nothing fancy. It could be useful, though, to store a key or hash with each string for definite references, so I thought I'd wrap each string in a struct with an extra key field.
The persistency should be saved on each modification, more or less, as spontaneous power offs might happen.
I looked into Boost::Serialisation and it seems easy to use, but I guess I'd have to write the whole queue everytime it gets modified to close the file and be safe for power offs, as I see no journaling option there.
I saw SQLite, but it could be over the top as I don't need relations or any sophisticated queries.
And I don't want to reinvent the wheel by doing it manually in some files.
Is there anything available worth looking into?
I have few experience with C++ and an OS beneath, so I'm unaware of what's available and what's suitable. And couldn't find any better.
Potentially simpler alternative to relational databases, when you don't need those relations, are "nosql" databases. A document oriented database might be a reasonable choice based on the description.

Django Best Practices -- Migrating Data

I have a table with data that must be filled by users. Once this data is filled, the status changes to 'completed' (status is a field inside data).
My question is, is it good practice to create a table for data to be completed and another one with completed data? Or should I only make one table with both types of data, distinguished by the status?
Not just Django
This is actually a very good general question, not necessarily specific to Django. But Django, through easy use of linked tables (ForeignKey, ManyToMany) is a good use case for One table.
One table, or group of tables
One table has some advantages:
No need to copy the data, just change the Status field.
If there are linked tables then they don't need to be copied
If you want to remove the original data (i.e., avoid keeping redundant data) then this avoids having to worry about deleting the linked data (and deleting it in the right sequence).
If the original add and the status change are potentially done by different processes then one table is much safer - i.e., marking the field "complete" twice is harmless but trying to delete/add a 2nd time can cause a lot of problems.
"or group of tables" is a key here. Django handles linked tables really well, so but doing all of this with two separate groups of linked tables gets messy, and easy to forget things when you change fields or data structures.
One table is the optimal way to approach this particular case. Two tables requires you to enforce data integrity and consistency within your application, rather than relying on the power of your database, which is generally a very bad idea.
You should aim to normalize your database (within reason) and utilize the database's built-in constraints as much as possible to avoid erroneous data, including duplicates, redundancies, and other inconsistencies.
Here's a good write-up on several common database implementation problems. Number 4 covers your 2-table option pretty well.
If you do insist on using two tables (please don't), then at least be sure to use an artificial primary key (IE: a unique value that is NOT just the id) to help maintain integrity. There may be matching id integer values in each table that match, but there should only ever be one version of each artificial primary key value between the two tables. Again, though, this is not the recommended approach, and adds complexity in your application that you don't otherwise need.

Django functions vs database functions

What is the best way to implement functions while writing an app in django? For example, I'd like to have a function that would read some data from other tables, merge then into the result and update user score based on it.
I'm using postgresql database, so I can implement it as database function and use django to directly call this function.
I could also get all those values in python, implement is as django function.
Since the model is defined in django, I feel like I shouldn't define functions in the database construction but rather implement them in python. Also, if I wanted to recreate the database on another computer, I'd need to hardcode those functions and load them into database in order to do that.
On the other hand, if the database is on another computer such function would need to call database multiple times.
Which is preferred option when implementing an app in django?
Also, how should I handle constraints, that I'd like the fields to have? Overloading the save() function or adding constraints to database fields by hand?
This is a classic problem: do it in the code or do it in the DBMS? For me, the answer comes from asking myself this question: is this logic/functionality intrinsic to the data itself, or is it intrinsic to the application?
If it is intrinsic to the data, then you want to avoid doing it in the application. This is particularly true where more than one app is going to be accessing / modifying the data. In which case you may be implementing the same logic in multiple languages / environments. This is a situation that is ripe with ways to screw up—now or in the future.
On the other hand, if this is just one app's way of thinking about the data, but other apps have different views (pun intended), then do it in the app.
BTW, beware of premature optimization. It is good to be aware of DB accesses and their costs, but unless you are talking big data, or a very time sensitive UI, then machine-time, and to a lesser degree user-time, is less important than your time. Getting v1.0 out the door is often more important. As the inimitable Fred Brooks said, "Plan to throw one away; you will anyhow."

which diagram to use in NoSql (Mcd, Merise, UML)

again, sorry for my silly question, but it seems that what i've learned from Relation Database should be "erased", there is no joins, so how the hell will i draw use Merise and UML in NoSql?
http://en.wikipedia.org/wiki/Class_diagram
this one will not work for NoSql?
How you organize your project is an independent notion of the technology used for persistence; In particular; UML or ERD or any such tool doesn't particularly apply to relational databases any more than it does to document databases.
The idea that NoSQL has "No Joins" is both silly and unhelpful. It's totally correct that (most) document databases do not provide a join operator; but that just means that when you do need a join, you do it in the application code instead of the query language; The basic facts of organizing your project stay the same.
Another difference is that document databases make expressing some things easier, and other things harder. For example, it's often easier to an entity relationship constraint in a relational database, but it's easier to express an inheritance heirarchy in a document database. Both notions can be supported by both technologies, and you will certainly use them when your application needs them; regardless of the technology you end up using.
In short, you should design your application without first choosing a persistence technology. Once you've got a good idea what you want to persist, You may have a better idea of which technology is a better fit. It may be the case that what you really need is both, or you might need something totally different from either.
EDIT: The idea of a foreign key is no more magical than simply saying "This is a name of that kind of thing". It happens that many SQL databases provide some very terse and useful features for dealing with this sort of thing; specifically, constraints (This column references this other relation; so it cannot take a value unless there is a corresponding row in the referant), and cascades, (If the status of the referent changes, make the corresponding change to the reference). This does make it easy to keep data consistent even at the lowest level, since there's no way to tell the database to enter a state where the referent is missing.
The important thing to distinguish though, is that the idea of giving a database entity (A row in a relational database, document in document databases) is distinct from the notion of schema constraints. One of the nice things about document databases is that they can easily combine or reorient where data lives so that you don't always have to have a referant that actually exists; Most document databases use the document class as part of the key, and so you can still be sure that the key is meaningful, even if when the referent doesn't actually exist.
But most of the time, you actually do want the referent to exist; you don't want a blog post to have an author unless that author actually exists in your system. Just how well this is supported depends a lot on the particular document database; Some databases do provide triggers, or other tools, to enforce the integrity of the reference, but many others only provide some kind of transactional capability, which requires that the integrity be enforced in the application.
The point is; for most kinds of database, every value in the database has some kind of identifier; in a relational database, that's a triple of relation:column:key; and in a document database it's usually something like the pair document_class:path. When you need one entity to refer to another, you use whatever sort of key you need to identify that datum for that kind of database. Foreign Key constraints found in RDBMses are just (really useful) syntactic sugar for "if not referant exists then raise ForeignKeyError", which could be implemented with equal power in other ways, if that's helpful for your particular use.

Does SAP BusinessObjects require a Universe for relational database?

Goal: I wish users to be able to directly connect to a RDBMS (e.g., MS SQL Server) and do some queries with possible cross references.
Tool: SAP BusinessObjects XI Enterprise
Description:
The main reason is that Universe creation is pretty techy. Imagine the SQL DB structure changing frequently, may be even daily. Hense the synchronization issues.
Is BO capable of doing a cross reference using the BO query GUI usable by non-techy do generate a request like:
SELECT
Classroom.Location
FROM
Student,
Classroom
WHERE
Student.Name = 'Foo' AND
Student.ClassroomName = Classroom.Name
...with only a ODBC connection and no Universe (or an autogenerated Universe)?
If yes, does it require foreign keys to be defined?
If no, is there a simple way to create and update (synch) a BO Universe directly from the DB structure? May be using their new XML format?
Good question.
Background
I have implemented one very large and "complex" banking database, 500+ tables, that the customer bought BO for. The "complex" is in quotes because although I created a pure 5NF (correctly Normalised to 5NF) RDB, and most developers and the power users did not find it "complex", some developers and users found it "complex". The first BO consultant could not even create a working Universe, and overran his budgeted one month. The second BO consultant created the entire Universe in 10 days. The whole structure (one 5NF RDB; 5 apps; one Universe; web reporting) all worked beautifully.
But as a result of that exercise, it was clear to me that although the Universe is very powerful, it is only required to overcome the impediments of an un-normalised database, or a data warehouse that has tables from many different source databases, which then need to be viewed together as one logical table. The first consultant was simply repeating what he was used to, doing his techie thing, and did not understand what a Normalised db meant. The second realisation was that BO Universe was simply not required for a true (normalised) RDB.
Therefore on the next large banking project, in which the RDB was pretty much 120% of the previous RDB, I advised against BO, and purchased Crystal Reports instead, which was much cheaper. It provided all the reports that users required, but it did not have the "slice and dice" capability or the data cube on the local PC. The only extra work I had to do was to provide a few Views to ease the "complex" bits of the RDB, all in a days work.
Since then, I have been involved in assignments that use BO, and fixed problems, but I have not used XI (and its auto-generated Universe). Certainly, a preponderance towards simple reporting tools, and avoiding the Universe altogether, which has been proved many times.
In general then, yes, BO Query GUI (even pre-XI) will absolutely read the RDB catalogue directly and you can create and execute any report you want from that, without an Universe. Your example is no sweat at all. "Cross references" are no sweat at all. The non-techie users can create and run such reports themselves. I have done scores of these, it takes minutes. Sometimes (eg. for Supertype-Subtype structures), creating Views eases this exercise even further.
Your Question
Exposes issues which are obstacles to that.
What is coming across is that you do not have a Relational Database. Pouring some data into a container called "relational DBMS" does not transform that content into a Relational Database.
one aspect of a true RDB is that all definitions are in the ISO/IEC/ANSI standard SQL catalogue.
if our "foreign keys" are not in the catalogue then you do not have Foreign Keys, you do not have Referential Integrity that is defined, maintained by the server.
you probably do not have Rules and Check Constraints either; therefore you do not have Data Integrity that is defined and maintained by the server.
Noting your comments regarding changing "db" structure. Evidently then, you have not normalised the data.
If the data was normalised correctly, then the structure will not change.
Sure, the structure will be extended (columns added; new tables added) but the existing structure of Entities and Attributes will not change, because they have been (a) modelled correctly and (b) normalised
therefore any app code written, or any BO Universe built (and reports created from that), are not vulnerable to such extensions to the RDB; they continue running merrily along.
Yes of course they cannot get at the new columns and new tables, but providing that is part of the extension; the point is the existing structure, and everything that was dependent on it, is stable.
Noting your example query. That is prima facie evidence of complete lack of normalisation: Student.ClassroomName is a denormalised column. Instead of existing once for every Student, it should exist once for each Classroom.
I am responding to your question only, but it should be noted that lack of normalisation will result in many other problems, not immediately related to your question: massive data duplication; Update Anomalies; lack of independence between the "database" and the "app" (changes in one will affect the other); lack of integrity (data and referential); lack of stability, and therefore a project that never ends.
Therefore you not only have some "structure" that changes almost daily, you have no structure in the "structure" of that, that does not change. That level of ongoing change is classic to the Prototype stage in a project; it has not yet settled down to the Development stage.
If you use BO, or the auto-generated Universe, you will have to auto-generate the Universe daily. And then re-create the report definition daily. The users may not like the idea of re-developing an Universe plus their reports daily. Normally they wait for the UAT stage of a project, if not the Production stage.
if you have Foreign Keys, since they are in the Standard SQL catalogue, BO will find them
if your do not have Foreign Keys, but you have some sort of "relation" between files, and some sort of naming convention from which such "relations" can be inferred, BO has a check box somewhere in the auto-generate window, that will "infer foreign keys from column names". Of course, it will find "relations" that you may not have intended.
if you do not have naming conventions, then there is nothing that BO can use to infer such "relations". there is only so much magic that a product can perform
and you still have the problem of "structure" changing all the time, so whatever magic you are relying on today may not work tomorrow.
Answer
Business Objects, Crystal reports, and all high end to low end report tools, are primarily written for Relational Databases, which reside in an ISO/IEC/ANSI Standard SQL DBMS. that means, if the definition is in the catalogue, they will find it. The higher end tools have various additional options (that's what you pay for) to assist with overcoming the limitations of sub-standard contents of a RDBMS, culminating in the Universe; but as you are aware takes a fair amount of effort and technical qualification to implement.
The best advise I can give you therefore, is to get a qualified modeller and model your data; such that it is stable, free of duplication, and your code is stable, etc, etc; such that simple (or heavy duty) report tools can be used to (a) define reports easily and (b) run those report definitions without changing them daily. You will find that the "structure" that changes daily, doesn't. What is changing daily is your understanding of the data.
Then, your wish will come true, the reports can be easily defined once, by the users, "cross references" and all, without an Universe, and they can be run whenever they like.
Related Material
This, your college or project, is not the first in the universe to be attempting to either (a) model their data or (b) implement a Database, relational or not. You may be interested in the work that other have already done in this area, as often much information is available free, in order to avoid re-inventing the wheel, especially if your project does not have qualified staff. Here is a simplified version (they are happy for me to publish a generic version but not the full customer-specific version) of a recent project I did for a local college; I wrote the RDB, they wrote the app.
Simplified College Data Model
Readers who are not familiar with the Relational Modelling Standard may find IDEF1X Notation useful.
Response to Comments
To be clear then. First a definition.
a Relational Database is, in chronological order, in the context of the last few days of 2010, with over 25 years of commonly available true relational technology [over 35 years of hard-to-use relational technology], for which there are many applicable Standards, and using such definitions (Wikipedia being unfit to provide said definitions, due to the lack of technical qualification in the contributors):
adheres the the Relational Model as a principle
Normalised to at least Third Normal Form (you need 5NF to be completely free of data duplication and Update Anomalies)
complies with the various existing Standards (as applicable to each particular area)
modelled by a qualified and capable modeller
is implemented in ISO/IEC/ANSI Standard SQL (that's the Declarative Referential Integrity ala Foreign Key definitions; Rule and Check constraints; Domains; Datatypes)
is Open Architecture (to be used by any application)
treated as as a corporate asset, of substantial value
and therefore reasonably secured against unauthorised access; data and referential integrity; uncontrolled change (unplanned changes affecting other users, etc).
Without that, you cannot enjoy the power, performance, ease of change, and ease of use, of a Relational Database.
What it is not, is the content of an RDBMS platform. Pouring unstructured or un-organised data into a container labelled "Relational Database Engine" does not magically transform the content into the label of the container.
Therefore, if it is reasonably (not perfect, not 100% Standard-complaint), a Relational Database, the BO Universe is definitely not required to access and use it to it full capability (limited only by functions of the report tool).
If it has no DRI (FK definitions), and no older style "defined keys" and no naming conventions (from which "relations can be derived) and no matching datatypes, then no report tool (or human being) will be able to find anything.
It is not just the FK definitions.
Depending on exactly which bits of a Relational Database has been implemented in the data heap, and on the capability of the report tool (how much the licence costs), some capability somewhere within the two ends of the spectrum, is possible. BO without the Universe is the best of breed for report tools; their Crystal Reports item is about half the grunt. The Universe is required to provide the database definitions for the non-database.
Then there is the duplication issue. Imagine how an user is going to feel when they find out that the data that they finally got through to, after 3 months, turns out to be a duplicate that no one keeps up-to-date.
"Database" Object Definition
If you have unqualified developers or end users implementing "tables" in the "database", then there is no limit to the obstacles and contradictions they place on themselves. ("Here, I've got an RDBMS but the content isn't; I've got BO but it can't; I've got encryption but I've copied the payroll data to five places, so that people can get at it when they forget their encryption key".) Every time I think I have seen the limit of insanity, someone posts a question on SO, and teaches me again that there is no limit to insanity.
BO via an ODBC connection is capable of doing JOIN (cross reference) without Universe as long as there are the correct FK defined?
(ODBC has nothing to do with it; it will operate the same via a native connection or via a browser.)
For that one time, re FKs defined correctly, yes. But the purpose of my long response is to identify the that are many other factors.
It isn't a BO or BO Universe question, it is "just how insane are the users' definitions and duplication". FKs could work sometimes and not others; could work today and not tomorrow.