I am in an Enterprise environment.
I have to design an HRM database (Human Resource Management).
This database must contain information for each employee (e.g. Name, Last Name, Company, etc.).
Several applications must be connected to this database.
Each of these applications performs different functions and manages other data to associate with the HRM database (e.g. courses, activities, vehicles, etc.).
Each application is managed by some users.
Each application must have the possibility to create a new "employee record" in the HRM database.
A general scheme is the following:
General scheme
Q: What is the best database design and software architecture to do this?
I would use Laravel as a framework for each application and PostgreSQL as database.
Q: What is the best database design and software architecture to do this?
I think the database design and software architecture depend on the framework, Laravel. First, I recommend reading the documentation: Laravel 4.2 Laravel Quickstart - Eloquent ORM
database design
Laravel uses an ORM, which means the tables are abstracted by the framework. To represent the data, first work out which data models (= tables) are required.
To design the data models, start from the real-world requirements (for example, the users table must have id, first_name, last_name).
After that, read up on the ActiveRecord pattern in this article: Active record pattern. Laravel's Eloquent ORM is an ActiveRecord implementation, similar to Ruby on Rails's ActiveRecord.
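As a rough illustration of the ActiveRecord pattern mentioned above, here is a minimal sketch. It is written in Python with the standard sqlite3 module rather than with Laravel's Eloquent, and the `User` class, the in-memory database and the sample names are assumptions made only for the example (the `users` columns follow the example in this answer). The point is that each object wraps one table row and knows how to save and load itself:

```python
import sqlite3

# Hypothetical in-memory database standing in for the HRM database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)


class User:
    """Active Record: one instance wraps one row and persists itself."""

    def __init__(self, first_name, last_name, id=None):
        self.id = id
        self.first_name = first_name
        self.last_name = last_name

    def save(self):
        # Insert a new row, or update the existing row if an id is already set.
        if self.id is None:
            cur = conn.execute(
                "INSERT INTO users (first_name, last_name) VALUES (?, ?)",
                (self.first_name, self.last_name),
            )
            self.id = cur.lastrowid
        else:
            conn.execute(
                "UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
                (self.first_name, self.last_name, self.id),
            )
        return self

    @classmethod
    def find(cls, user_id):
        # Load one row by primary key, or return None if it does not exist.
        row = conn.execute(
            "SELECT first_name, last_name, id FROM users WHERE id = ?",
            (user_id,),
        ).fetchone()
        return cls(*row) if row else None


user = User("Ada", "Lovelace").save()
user.last_name = "King"
user.save()
loaded = User.find(user.id)
```

An ORM such as Eloquent generates this kind of SQL for you, so in practice you would only declare the model class.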
FYI
Mapping Objects to Relational Databases: O/R Mapping In Detail
software architecture
TBD
Related
I'm designing a website where security of data is an issue.
I've read this book: https://books.agiliq.com/projects/django-multi-tenant/en/latest/index.html
I'm still thinking about the right database structure for users.
And I'm hesitating between a shared database with isolated schemas, isolated databases in a shared app, or completely isolated tenants using Docker.
As security of data is an issue, I would like to avoid putting all the users in the same table in the database or in different schemas in the same database. However, I don't understand well whether I should put each user in a separate database (create a database per user, SQLite for example, but I don't know if it would communicate well with Postgres). What is the best practice for this in terms of security?
I'm wondering how these options affect database speed compared to a shared database with a shared schema, which was the basic configuration of the course I attended on Django.
I don't have good knowledge of databases, so your help on the performance issue would be much appreciated!
Also, if I want to do some stats and use tenants' data, how difficult is it to query completely isolated tenants using Docker or isolated databases, in particular if each user is a separate Docker container or database?
This is more of an architectural question than a technological one per se.
I am currently building a business website/social network that needs to store large volumes of data and use that data to draw analytics (consumer behavior).
I am using Django and a PostgreSQL database.
Now my question is: I want to expand this architecture to include a data warehouse. The ideal would be: the operational DB would be the current Django PostgreSQL database, and the data warehouse would be something additional, preferably in a multidimensional model.
We are still in a very early phase, we are going to test with 50 users, so something primitive such as a one-column table for starters would be enough.
I would like to know if somebody has experience with this situation and could recommend a framework to create a data warehouse, all while maintaining the operational DB with the Django models for ease of use (if possible).
Thank you in advance!
Here are some cool Open Source tools I used recently:
Kettle - great ETL tool, you can use this to extract the data from your operational database into your warehouse. Supports any database with a JDBC driver and makes it very easy to build e.g. a star schema.
Saiku - nice Web 2.0 frontend built on Pentaho Mondrian (MDX implementation). This allows your users to easily build complex aggregation queries (think Pivot table in Excel), and the Mondrian layer provides caching etc. to make things go fast. Try the demo here.
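To give an idea of what a star schema looks like once an ETL tool such as Kettle has loaded it, here is a small sketch using Python's standard sqlite3 module. The table names (`dim_user`, `dim_date`, `fact_purchase`), the columns and the sample rows are all assumptions for illustration and do not come from any of the tools above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table in the middle, with a foreign key to each dimension table.
conn.executescript(
    """
    CREATE TABLE dim_user (user_key INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE fact_purchase (
        user_key INTEGER REFERENCES dim_user(user_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        amount   REAL
    );
    """
)

# Hypothetical rows that an ETL job would normally copy from the operational DB.
conn.executemany("INSERT INTO dim_user VALUES (?, ?)", [(1, "DE"), (2, "US")])
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?)",
    [(20240101, "2024-01"), (20240201, "2024-02")],
)
conn.executemany(
    "INSERT INTO fact_purchase VALUES (?, ?, ?)",
    [(1, 20240101, 10.0), (1, 20240201, 5.0), (2, 20240101, 7.5)],
)

# A typical analytical query: aggregate the facts, grouped by dimension attributes.
totals = conn.execute(
    """
    SELECT u.country, d.month, SUM(f.amount)
    FROM fact_purchase AS f
    JOIN dim_user AS u ON u.user_key = f.user_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY u.country, d.month
    ORDER BY u.country, d.month
    """
).fetchall()
```

A frontend such as Saiku issues this kind of grouped aggregation for the user, which is why the warehouse is usually kept separate from the operational tables.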
My answer does not necessarily apply to data warehousing. In your case I see the possibility to implement a NoSQL database solution alongside an OLTP relational storage, which in this case is PostgreSQL.
Why consider NoSQL? In addition to the obvious scalability benefits, NoSQL databases offer a number of advantages that will probably apply to your scenario. For instance, the flexibility of having records with different sets of fields, and key-based access.
Since you're still in "trial" stage you might find it easier to decide for a NoSQL database solution depending on your hosting provider. For instance AWS have SimpleDB, Google App Engine provide their own DataStore, etc. However there are plenty of other NoSQL solutions you can go for that have nice Python bindings.
In Django, the suggested software architecture is to put all business logic and data access in models.
But, some colleagues have suggested that the data access layer should be separate from the business logic (business service layer). Their justification is that the data access layer can isolate changes if a different data source is used. They also say that there is business logic that can be in more than one model.
But, when I start coding using the separate data access and business logic layers, the data access layer is simple (basically the model code that defines the db schema) and it does not seem to add much value.
Is there really value in separating out the data access from django models or does django already provide a sufficient data access layer with its ORM?
I'm looking for developers who have implemented a fair number of Django apps, to find out what their opinion is. This is for a small to medium sized web app.
After three years of Django development, I've learned the following.
The ORM is the access layer. Nothing more is needed.
50% of the business logic goes in the model. Some of this is repeated or amplified in the Forms.
20% of the business logic goes in Forms. All data validation, for example, is in the forms. In some cases, the forms will narrow a general domain (allowed in the model) to some subset that's specific to the problem, the business or the industry.
20% of the business logic winds up in other modules in the application. These modules are above the models and forms, but below the view functions, RESTful web services and command-line apps.
10% of the business logic winds up in command-line apps using the management command interface. This is file loads, extracts, and random bulk changes.
It's very important that view functions and RESTful web services do approximately nothing. They use models, forms, and other modules as much as possible. The view functions and RESTful web services are limited to dealing with the vagaries of HTTP and the various data formats (JSON, HTML, XML, YAML, whatever.)
Trying to invent Yet Another Access Layer is a zero-value exercise.
The answer depends on the requirements of your application.
For applications that will always use a relational database and can be coupled to a specific ORM, you do not need to separate data access and models. The Django ORM is based on the active record design pattern, in which data access and model live together. The pro is simplicity; the con is less flexibility.
Separating data access and model is only necessary when the developer wants to completely decouple the data access layer from the business logic. You can do this with the data mapper design pattern. Some ORMs support this design pattern, such as SQLAlchemy. The pro is more flexibility; the con is more complexity.
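To make the contrast concrete, here is a minimal sketch of the data mapper pattern using only Python's standard library (not Django or SQLAlchemy). The `Employee` and `EmployeeMapper` names, the `employees` table and the sample data are assumptions made up for the example. The domain object holds data and business logic and knows nothing about storage, while a separate mapper class does all the database work:

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Employee:
    """Plain domain object: no knowledge of how or where it is stored."""

    first_name: str
    last_name: str
    id: Optional[int] = None

    def full_name(self) -> str:
        # Business logic lives here, independent of the data source.
        return f"{self.first_name} {self.last_name}"


class EmployeeMapper:
    """Data Mapper: moves Employee objects to and from the database."""

    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS employees "
            "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
        )

    def insert(self, employee: Employee) -> Employee:
        cur = self.connection.execute(
            "INSERT INTO employees (first_name, last_name) VALUES (?, ?)",
            (employee.first_name, employee.last_name),
        )
        employee.id = cur.lastrowid
        return employee

    def find(self, employee_id: int) -> Optional[Employee]:
        row = self.connection.execute(
            "SELECT first_name, last_name, id FROM employees WHERE id = ?",
            (employee_id,),
        ).fetchone()
        return Employee(*row) if row else None


mapper = EmployeeMapper(sqlite3.connect(":memory:"))
saved = mapper.insert(Employee("Grace", "Hopper"))
found = mapper.find(saved.id)
```

Swapping the data source then means writing another mapper, while `Employee` stays untouched. That is the extra flexibility, and the extra class is the extra complexity.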
I recommend the book "Patterns of Enterprise Application Architecture" written by Martin Fowler for more details.
In my case the separate system is a web-service (but it could conceivably be anything).
My question is what are the best practices when you integrate against a separate system such as a web-service when it comes to data?
Example: Web-service provides a list of products. Products are grouped using categories. You can get all products in a sub-category. You can get a specific product by its id (an integer) or its name (a unique value).
In my application:
I display the list of categories and products - and the user can choose the product and specify an order quantity.
Should I store the name of the category or the id of the category?
Should I store the name of the product or the id of the product?
How should I name the field in the database that stores the data from the web-service
(CategoryId or WsCategoryId: so that by convention one knows where the value is coming from?)
Any other best practices?
Any other references?
From your question I understand that the web service's interface looks something like this:
/product/
/product/{ProductId}
/product/{ProductName}
/product/category/{CategoryId}
Since you are asking if you should store CategoryName, I assume that it is unique (same as ProductName).
I also assume that the web service handles cases where products or categories are renamed transparently (i.e. by providing a redirect or any other means which allow you to detect this and handle it accordingly). If it doesn't, do not consider storing names as references to products or categories - always use IDs.
I would provide the same answer to your questions #1 and #2. Even though uniqueness of ProductName and CategoryName will technically allow you to store them in your application as unique identifiers of products and categories, I would opt for storing their IDs instead. The main decision point would be your storage medium. Since you are using a database, and the web service allows you to access objects by unique numerical IDs, database normalization rules should apply - hence you should store IDs.
The above however assumes that you are using a relational database - if you are using a NoSQL database, I assume that storing names instead of IDs would be a viable option as well (at least as far as I can tell with my current understanding of NoSQL solutions, unfortunately I don't have any practical experience with any of them yet).
Regarding question #3 - I would stick with the naming conventions that you already use in your database. There are many different conventions for naming tables and columns out there, so I really doubt that there are any standardized conventions on how to name columns referencing web service objects. I would name them according to your existing naming conventions and in a way that makes the purpose of the columns clear to everybody who is using the system. Note that if there is a chance that you will be using other web services in the future, you should consider keeping the name of the service in the column name rather than using a generic ws prefix - e.g. AmazonProductId or AmazonCategoryId.
I'll try to point out a few items from my experience, but I would not label them as best practices - just topics to think about.
In my experience, I found it useful to treat data from web services in the same fashion as the data from a database - at least from an application's perspective, where your storage layer would be abstracted from application logic. By this I mean that you should think about and prepare for similar scenarios regardless of whether your storage medium is a database or a web service. Just like databases, web services can go down, both can have their data or integrity corrupted, and both will require you to sanitize or otherwise process data on input.
Caching of data should be an item which is high on your list - apart from the obvious performance reasons, it can allow you to deal with outages of the web service (to an extent limited by which data you cache).
An example would be that your application displays a list of the most frequently purchased products. If your application stores only IDs of products, you will have to do one or more requests to the web service in order to retrieve the names of all products which you need to display in the list. If you cache product names locally or in your database, you will achieve better performance, conserve your resources and you will also have a failsafe scenario in case the web service goes down.
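The caching idea above could look roughly like the following sketch. Everything in it is an assumption for illustration: the `ProductNameCache` class, the `fetch_name` callable (standing in for a real call to the web service) and the fake service used to exercise it. Fresh entries are served from the cache, expired entries are refreshed from the service, and if the service is down the stale value is served instead of failing:

```python
import time


class ProductNameCache:
    """Caches product names and falls back to stale values during outages."""

    def __init__(self, fetch_name, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch_name = fetch_name  # hypothetical callable: product_id -> name
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # product_id -> (name, time stored)

    def get(self, product_id):
        entry = self._entries.get(product_id)
        now = self._clock()
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # still fresh, no request needed
        try:
            name = self._fetch_name(product_id)
        except Exception:
            # Broad catch for the sketch; real code would catch the client's
            # specific network errors. Serve the stale value if there is one.
            if entry is not None:
                return entry[0]
            raise
        self._entries[product_id] = (name, now)
        return name


# Fake web service used only to exercise the cache.
state = {"up": True}
calls = []


def fake_fetch(product_id):
    calls.append(product_id)
    if not state["up"]:
        raise ConnectionError("web service unavailable")
    return f"Product {product_id}"


cache = ProductNameCache(fake_fetch, ttl_seconds=0.0)  # ttl 0: always refresh
first = cache.get(7)   # fetched from the (fake) service
state["up"] = False
second = cache.get(7)  # service is down, stale cached name is returned
```

The same structure works if the cache lives in a database table instead of in memory; only the `_entries` storage would change.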
Referential integrity is another important aspect to think about when working with web services. As the web service is completely separate from your database, you do not have the option to create foreign keys as you would do in a database-only solution. This means that data changes in the web service (e.g. product updates or deletions) can break the integrity of data in your database.
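Since foreign keys are not available, one possible substitute is a periodic check that compares the IDs stored locally with what the web service still knows. The sketch below is only an assumption about how that could look: `find_orphaned_ids` and the `product_exists` callable are hypothetical names, and a set stands in for the real service.

```python
def find_orphaned_ids(stored_ids, product_exists):
    """Return the locally stored product IDs that the web service no longer knows.

    `product_exists` is a hypothetical callable (product_id -> bool) that would
    wrap a lookup against the web service.
    """
    return sorted(pid for pid in set(stored_ids) if not product_exists(pid))


# A set plays the role of the remote catalogue for this example.
remote_catalog = {1, 2, 5}
orphans = find_orphaned_ids([1, 2, 3, 3, 4], lambda pid: pid in remote_catalog)
```

What to do with the orphaned IDs (flag the affected orders, keep the cached name, hide the product) is an application decision.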
Regarding references, these depend mostly on the type of web service that you are about to use (you didn't specify which service you will be using). If the service is based on REST principles, I can recommend Restful Web Services by Leonard Richardson and Sam Ruby. Even though it isn't focused on application/service integration as such, it's a great introduction into REST.
We are tasked with migrating an existing set of entities (currently POCOs persisted with NHibernate against an MSSQL database) to now persist to some kind of web service (yet to be built, either RESTful or SOAP-based, and that we control).
I like how NHibernate encapsulates the persistence concerns and lets us maintain a logic-rich, persistence-agnostic domain model. Is there any way to make NHibernate talk to a web service at the back end instead of a SQL database directly? In other words, can "service instead of SQL database" be treated as a persistence implementation detail and allow us to continue to use NHibernate?
Am I asking the right question? :)
NHibernate is an ORM. It maps between objects and relational tables. It does not map between objects and web services. You need to use a different API for persistence oriented web services. You can create a set of interfaces that are implemented by both your NHibernate layer (for the relational database) and the web service layer to make it appear like it's one API.
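The shared-interface idea can be sketched as follows. The example is in Python rather than C#, and every name in it is an assumption for illustration: the `CustomerRepository` interface, the customer entity, the `/customers/{id}` paths and the `send` callable that stands in for a real HTTP or SOAP client. A dictionary plays the role of the ORM-backed store.

```python
from abc import ABC, abstractmethod


class CustomerRepository(ABC):
    """Persistence interface that the domain code depends on."""

    @abstractmethod
    def get(self, customer_id: int) -> dict: ...

    @abstractmethod
    def save(self, customer: dict) -> None: ...


class DatabaseCustomerRepository(CustomerRepository):
    """Stand-in for the ORM-backed implementation (a dict plays the database)."""

    def __init__(self):
        self._rows = {}

    def get(self, customer_id):
        return self._rows[customer_id]

    def save(self, customer):
        self._rows[customer["id"]] = dict(customer)


class WebServiceCustomerRepository(CustomerRepository):
    """Stand-in for the service-backed implementation.

    `send` is a hypothetical callable (method, path, payload) -> response dict.
    """

    def __init__(self, send):
        self._send = send

    def get(self, customer_id):
        return self._send("GET", f"/customers/{customer_id}", None)

    def save(self, customer):
        self._send("PUT", f"/customers/{customer['id']}", customer)


def rename_customer(repo: CustomerRepository, customer_id: int, new_name: str) -> dict:
    """Domain-level code: works unchanged with either implementation."""
    customer = repo.get(customer_id)
    customer["name"] = new_name
    repo.save(customer)
    return customer


# Fake transport so the sketch runs without a real service.
remote = {1: {"id": 1, "name": "Old"}}


def fake_send(method, path, payload):
    customer_id = int(path.rsplit("/", 1)[1])
    if method == "GET":
        return dict(remote[customer_id])
    remote[customer_id] = dict(payload)
    return payload


db_repo = DatabaseCustomerRepository()
db_repo.save({"id": 1, "name": "Old"})
via_db = rename_customer(db_repo, 1, "New")
via_ws = rename_customer(WebServiceCustomerRepository(fake_send), 1, "New")
```

The domain model stays persistence-agnostic because it only sees the interface; the choice between the database and the service becomes a wiring decision.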
The question is valid, but unfortunately the answer is no. Ideally you'd design your services to return objects that are relevant to the consumers of your service and you'd use NHibernate to get the data from your database which the service can then use to return to the consumer.
I was actually reading an interesting article about exposing your data through your service layer the other day. http://davybrion.com/blog/2010/05/why-you-shouldnt-expose-your-entities-through-your-services/ It offered an interesting perspective on the role services play in serving data to your applications.