Can Elasticsearch be used as a database in a Django Project? - django

Can we use Elasticsearch directly as the primary database in Django? I have tried to find a solution or a workaround but could not find any relevant information. Everywhere it is said that Elasticsearch can be used as a search engine on top of some other primary database. But as I understand it, Elasticsearch is a NoSQL database, so there should be a way to use it as the primary database in a Django project.
Please help if anyone has any ideas about this.

The short answer is no.
Stack Overflow already has an answer here, and it is still valid: Using ElasticSearch as Database/Storage with Django
ES is not ACID compliant.
Indexing is not immediate (newly indexed documents become searchable only after the next index refresh), so any kind of load would be an issue.
It's very weakly consistent.
Use it together with a proper database, where it will help with real-time searches, analytics, expensive queries, etc., but treat it as derived data.
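To make "treat it as derived data" concrete, here is a minimal, framework-free sketch. It is an illustration under stated assumptions, not Elasticsearch's or Django's API: Python's built-in sqlite3 stands in for the primary relational database, and a toy in-memory inverted index (the hypothetical `DerivedIndex` class) stands in for Elasticsearch. Writes are committed to the primary store first, and the index can always be thrown away and rebuilt from it.

```python
import re
import sqlite3
from collections import defaultdict

# Primary, transactional store: the source of truth (stand-in for PostgreSQL)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")


class DerivedIndex:
    """Toy inverted index standing in for Elasticsearch (hypothetical, not the ES API)."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of document ids

    def index(self, doc_id, text):
        for token in re.findall(r"\w+", text.lower()):
            self.postings[token].add(doc_id)

    def search(self, term):
        # .get() avoids creating empty entries in the defaultdict
        return sorted(self.postings.get(term.lower(), set()))


def save_article(db, idx, title):
    # 1. Commit to the primary database first...
    with db:
        cur = db.execute("INSERT INTO article (title) VALUES (?)", (title,))
    # 2. ...then update the derived index; if this step fails, no data is lost
    idx.index(cur.lastrowid, title)
    return cur.lastrowid


def rebuild(db):
    """The index is disposable: regenerate it entirely from the primary DB."""
    idx = DerivedIndex()
    for doc_id, title in db.execute("SELECT id, title FROM article"):
        idx.index(doc_id, title)
    return idx


idx = DerivedIndex()
save_article(db, idx, "Using Elasticsearch with Django")
save_article(db, idx, "Django ORM tips")
print(idx.search("django"))  # [1, 2]
```

In a real Django project, step 2 would typically be hooked into `transaction.on_commit()` or a `post_save` signal, so the search index is only updated after the database transaction has succeeded.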

Related

MERN Stack in AWS

I am new to the MERN stack and have managed to build an app. I want to deploy it on AWS, but the problem is that I have to use DocumentDB instead of MongoDB. Do I need to rewrite my code to do this? Can I use the same Mongoose methods? Please help; I am very new to this.
DocumentDB is API-compatible with MongoDB for the most part (that's its whole claim to fame), so you most likely won't have to change anything.
There are, however, some limitations and differences between the two systems, which are documented here. Unfortunately, the article is too long to summarize briefly, so I'm just going to include its list of subtopics; check out the docs for more details:
Admin Databases and Collections
cursor maxTimeMS
explain()
Field Name Restrictions
Index Builds
Lookup with empty key in path
MongoDB APIs, Operations, and Data Types
mongodump and mongorestore Utilities
Result Ordering
Retryable Writes
Sparse Index
Storage Compression
Using $elemMatch Within an $all Expression
$distinct and $elemMatch Indexing
$lookup
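One item on that list, Retryable Writes, affects the connection string rather than the Mongoose calls themselves: DocumentDB does not support retryable writes, so AWS's example connection strings set `retryWrites=false`, together with TLS and replica-set options. The minimal sketch below assembles such a URI; the option set mirrors AWS's examples as I understand them (an assumption worth checking against the current DocumentDB docs), and the user, password, host name, and CA bundle file name are placeholders. The resulting string is driver-agnostic, so the same URI could be passed to `mongoose.connect()`.

```python
from urllib.parse import quote_plus, urlencode


def documentdb_uri(user, password, host, port=27017, ca_file="global-bundle.pem"):
    """Build a MongoDB-style URI with options assumed suitable for Amazon DocumentDB."""
    options = {
        "tls": "true",                            # DocumentDB clusters are reached over TLS
        "tlsCAFile": ca_file,                     # placeholder path to the CA bundle
        "replicaSet": "rs0",
        "readPreference": "secondaryPreferred",
        "retryWrites": "false",                   # retryable writes are not supported
    }
    # Credentials must be percent-encoded, e.g. "@" in a password becomes %40
    return (f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"
            f"?{urlencode(options)}")


print(documentdb_uri("app", "p@ss", "docdb.example.com"))
```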

Django + PostgreSQL with bi-directional replication

First, let me introduce my use case: I am working on a Django application (a GraphQL API using Graphene) that runs in the cloud but also has local instances in customers' local networks.
For example, there is one application in the cloud and three instances on local networks (each a local Django app instance with a PostgreSQL server with BDR enabled). When there is a network connection, we use bi-directional replication to keep the data fresh; when there is no connectivity, we use the local instances. Here is a simplified infrastructure diagram for illustration.
So, if I want to use BDR, I can't do DELETE and UPDATE operations in the ORM. I have to generate UUIDs for my entities, and every change is just a new record with updated data for the same UUID. The latest record for a given UUID is the valid one. Removal is just another flag. So far everything seems fine; the problem starts when I want to use, for example, a many-to-many relationship. The relationship relies on the database primary keys, and I have to handle removal somehow. Can you help me find the best way to solve this issue? I have a few ideas, but I do not want to make a bad decision:
I could try to override ManyToManyField to work with my UUIDs and the special removal flag. It looks like a nice idea because everything should work as before (Graphene will find the relations, etc.), but I am afraid of "invisible" consequences.
I could create my own models to simulate the many-to-many relationship. It's much more work, but it should work just fine.
Have you had to solve a similar issue before? Is there some kind of good practice, or is this just building a highway to hell (AC/DC is pretty cool)?
Or, if you think there is a better way to build the service architecture, I would love to hear your ideas.
Thanks in advance.
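The append-only scheme described in the question (every change is a new row for the same UUID, the latest row wins, removal is a flag), combined with the second idea (a hand-rolled many-to-many table), can be sketched outside Django with the standard library's sqlite3. This is a minimal sketch under stated assumptions: the table and function names are hypothetical, and "latest" is decided by a `ts` column that is assumed to be comparable across nodes. In a real BDR setup, clock skew and conflicting concurrent writes need separate handling.

```python
import sqlite3
import uuid

# Append-only tables: no UPDATE or DELETE is ever issued (hypothetical schema)
SCHEMA = """
CREATE TABLE entity_version (
    uuid TEXT NOT NULL, ts INTEGER NOT NULL, payload TEXT,
    deleted INTEGER NOT NULL DEFAULT 0, PRIMARY KEY (uuid, ts));
CREATE TABLE link_version (
    left_uuid TEXT NOT NULL, right_uuid TEXT NOT NULL, ts INTEGER NOT NULL,
    deleted INTEGER NOT NULL DEFAULT 0, PRIMARY KEY (left_uuid, right_uuid, ts));
"""


def save(conn, ent, ts, payload, deleted=False):
    """'Update' or 'delete' an entity by appending a new version row."""
    conn.execute("INSERT INTO entity_version VALUES (?, ?, ?, ?)",
                 (ent, ts, payload, int(deleted)))


def link(conn, left, right, ts, deleted=False):
    """Add (or, with deleted=True, remove) a many-to-many edge by appending a row."""
    conn.execute("INSERT INTO link_version VALUES (?, ?, ?, ?)",
                 (left, right, ts, int(deleted)))


def current(conn, ent):
    """Payload of the latest version, or None if the entity is missing or flagged removed."""
    row = conn.execute("SELECT payload, deleted FROM entity_version WHERE uuid = ? "
                       "ORDER BY ts DESC LIMIT 1", (ent,)).fetchone()
    return None if row is None or row[1] else row[0]


def related(conn, left):
    """UUIDs linked to `left`: latest edge per pair, not removed, target still alive."""
    rows = conn.execute("""
        SELECT l.right_uuid FROM link_version l
        WHERE l.left_uuid = ? AND l.deleted = 0
          AND l.ts = (SELECT MAX(ts) FROM link_version
                      WHERE left_uuid = l.left_uuid AND right_uuid = l.right_uuid)
        ORDER BY l.right_uuid""", (left,)).fetchall()
    return [r for (r,) in rows if current(conn, r) is not None]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
article, tag1, tag2 = (str(uuid.uuid4()) for _ in range(3))
save(conn, article, 1, "Draft")
save(conn, article, 2, "Published")         # an "UPDATE" is just a newer row
save(conn, tag1, 1, "django")
save(conn, tag2, 1, "postgres")
link(conn, article, tag1, 1)
link(conn, article, tag2, 1)
link(conn, article, tag2, 3, deleted=True)  # unlinking appends a removal row
print(current(conn, article), related(conn, article) == [tag1])  # Published True
```

Because edges reference UUIDs rather than database primary keys, rows created on different nodes can be replicated without key collisions, at the cost of writing the "latest row wins" queries yourself instead of relying on Django's built-in `ManyToManyField` machinery.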

Django supported alternative to noSQL

We need reasonable insert and query speed over huge tables, so I considered using a NoSQL adapter with Django. Unfortunately:
Django does not provide official support for NoSQL databases.
In our original schema, some big data sets are related to other big data sets, which makes data duplication unacceptable.
Project deadlines are enemies of hot stuff like this.
So, as far as I can see, PostgreSQL should be the way to go for this scenario, right?!
Please let me know any other details that may be relevant to this question!
Bonus points to anyone who can point out useful database techniques such as database sharding...
Well, there is a fork of the Django project that uses MongoDB as the backend. You can read about it here. The code on GitHub is here. To give you a heads-up, MongoDB is a NoSQL DB that supports sharding and replication, so I think this might be what you are looking for.
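For the PostgreSQL route in the question, the simplest form of sharding is application-level hash sharding: each row lives on one of N databases, chosen by a stable hash of a sharding key. A minimal sketch, assuming four hypothetical database aliases and a customer id as the sharding key (neither comes from the question):

```python
import hashlib

# Hypothetical aliases; each would need a matching entry in settings.DATABASES
SHARD_ALIASES = ["shard_0", "shard_1", "shard_2", "shard_3"]


def shard_for(key, aliases=SHARD_ALIASES):
    """Map a sharding key (e.g. a customer id) to one database alias.

    hashlib is used instead of the built-in hash(), whose result for str is
    randomized per process and would send the same key to different shards.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return aliases[int.from_bytes(digest[:8], "big") % len(aliases)]


# Possible use with the Django ORM (Order is a hypothetical model):
#   Order.objects.using(shard_for(customer_id)).filter(customer_id=customer_id)
print(shard_for("customer-42"))
```

Plain modulo placement moves most keys to a different shard whenever the number of shards changes; consistent hashing or an explicit key-to-shard lookup table avoids that at the cost of extra machinery. Cross-shard joins and foreign keys are also lost, which matters here because the question's large tables reference each other.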

Data Warehouse and Django

This is more of an architectural question than a technological one per se.
I am currently building a business website/social network that needs to store large volumes of data and use that data to derive analytics (consumer behavior).
I am using Django and a PostgreSQL database.
Now my question is: I want to expand this architecture to include a data warehouse. Ideally, the operational DB would be the current Django PostgreSQL database, and the data warehouse would be something additional, preferably with a multidimensional model.
We are still at a very early stage and are going to test with 50 users, so something primitive, such as a one-column table, would be enough for starters.
I would like to know whether anyone has experience with this situation and could recommend a framework for creating a data warehouse, while maintaining the operational DB with the Django models for ease of use (if possible).
Thank you in advance!
Here are some cool open-source tools I used recently:
Kettle - a great ETL tool. You can use it to extract the data from your operational database into your warehouse. It supports any database with a JDBC driver and makes it very easy to build, e.g., a star schema.
Saiku - a nice Web 2.0 frontend built on Pentaho Mondrian (an MDX implementation). It allows your users to easily build complex aggregation queries (think pivot tables in Excel), and the Mondrian layer provides caching etc. to make things go fast. Try the demo here.
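As a rough illustration of the ETL-into-a-star-schema idea behind Kettle, here is a minimal sketch that uses only Python's sqlite3 module in place of PostgreSQL and Kettle. All table names (`app_purchase`, `dim_product`, `dim_date`, `fact_purchase`) and the sample rows are hypothetical. The operational table plays the role of a Django model's table; the ETL step fills two dimension tables and a fact table, and the final query is the kind of aggregation Saiku/Mondrian would let users build interactively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Operational (OLTP) table, standing in for a Django model's table
CREATE TABLE app_purchase (id INTEGER PRIMARY KEY, user_id INTEGER,
                           product TEXT, amount REAL, created TEXT);
-- Star schema: dimension tables plus one fact table
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dim_date (date_key TEXT PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_purchase (product_key INTEGER, date_key TEXT,
                            user_id INTEGER, amount REAL);
""")
conn.executemany(
    "INSERT INTO app_purchase (user_id, product, amount, created) VALUES (?, ?, ?, ?)",
    [(1, "book", 12.0, "2024-01-05"), (2, "book", 8.0, "2024-01-20"),
     (1, "pen", 2.5, "2024-02-02")],
)


def etl(conn):
    """Extract from the operational table, transform, and load the star schema."""
    with conn:
        # Dimensions: INSERT OR IGNORE skips rows that already exist
        conn.execute("INSERT OR IGNORE INTO dim_product (name) "
                     "SELECT DISTINCT product FROM app_purchase")
        conn.execute("INSERT OR IGNORE INTO dim_date (date_key, year, month) "
                     "SELECT DISTINCT created, CAST(substr(created, 1, 4) AS INTEGER), "
                     "CAST(substr(created, 6, 2) AS INTEGER) FROM app_purchase")
        # Facts: full reload for simplicity, swapping product names for surrogate keys
        conn.execute("DELETE FROM fact_purchase")
        conn.execute("INSERT INTO fact_purchase (product_key, date_key, user_id, amount) "
                     "SELECT d.product_key, p.created, p.user_id, p.amount "
                     "FROM app_purchase p JOIN dim_product d ON d.name = p.product")


def revenue_by_month_and_product(conn):
    """A pivot-style aggregation over the star schema."""
    return conn.execute("""
        SELECT dd.year, dd.month, dp.name, SUM(f.amount)
        FROM fact_purchase f
        JOIN dim_date dd ON dd.date_key = f.date_key
        JOIN dim_product dp ON dp.product_key = f.product_key
        GROUP BY dd.year, dd.month, dp.name
        ORDER BY dd.year, dd.month, dp.name""").fetchall()


etl(conn)
print(revenue_by_month_and_product(conn))  # [(2024, 1, 'book', 20.0), (2024, 2, 'pen', 2.5)]
```

The fact table is fully reloaded on each run to keep the sketch idempotent; with real data volumes an incremental load (for example, tracking the last processed id or timestamp) would be the usual choice.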
My answer does not necessarily apply to data warehousing. In your case, I see the possibility of implementing a NoSQL database solution alongside OLTP relational storage, which in this case is PostgreSQL.
Why consider NoSQL? In addition to the obvious scalability benefits, NoSQL databases offer a number of advantages that will probably apply to your scenario, for instance the flexibility of having records with different sets of fields, and key-based access.
Since you're still in the "trial" stage, you might find it easier to choose a NoSQL solution based on your hosting provider. For instance, AWS has SimpleDB and Google App Engine provides its own Datastore. However, there are plenty of other NoSQL solutions with nice Python bindings that you could go for.

Django nonrel: accessing different NoSQL databases at the same time?

I'm new to the NoSQL world. From the forums and articles I've read, many users "mix" NoSQL tools, for example using Cassandra and MongoDB together to build a "powerful system". Because I'm starting with MongoDB, I downloaded the DjanMon project (I'm a Django fan ^_^), and of course I also downloaded the special version of Django that supports NoSQL, Django NonRel. I noticed that the settings file doesn't "oblige" you to use one specific NoSQL solution, unlike Django with an RDBMS, where you must specify MySQL, PostgreSQL, or another backend. So, is it possible to mix several (or at least two) NoSQL solutions using Django, for example MongoDB + Cassandra?
There's nothing to stop you from using multiple storage solutions, whether SQL or NoSQL. However, the NoSQL solutions all have different architectures, data models, and APIs (for example, MongoDB is a document-oriented database, whereas Cassandra is column-oriented), so you usually can't swap one for another without some effort.
Can you clarify what you are actually trying to achieve? I.e. why are you interested in mixing these two specific solutions?
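On the point that nothing stops you from using multiple storage solutions: stock Django (since 1.2) supports several databases in one project. Each backend gets an alias in the `DATABASES` setting, and a database router listed in `DATABASE_ROUTERS` picks the alias per model. The sketch below assumes the same mechanism carries over to Django-nonrel with a NoSQL engine (`django_mongodb_engine` is used as the example ENGINE; whether a Cassandra backend for nonrel exists and behaves the same way is not confirmed here). The app label `events`, the alias `documents`, and the module path are hypothetical.

```python
# settings.py (sketch): one relational and one document-store alias
DATABASES = {
    "default": {"ENGINE": "django.db.backends.sqlite3", "NAME": "main.sqlite3"},
    "documents": {"ENGINE": "django_mongodb_engine", "NAME": "docs"},  # assumed nonrel backend
}
DATABASE_ROUTERS = ["myproject.routers.AppLabelRouter"]  # hypothetical dotted path


# myproject/routers.py (sketch)
class AppLabelRouter:
    """Send models of selected apps to a dedicated database alias."""

    route_app_labels = {"events": "documents"}  # hypothetical app label -> alias

    def _alias(self, app_label):
        return self.route_app_labels.get(app_label, "default")

    def db_for_read(self, model, **hints):
        # None means "no opinion", so Django falls back to the "default" alias
        return self.route_app_labels.get(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return self.db_for_read(model, **hints)

    def allow_relation(self, obj1, obj2, **hints):
        # Relations across different backends cannot be enforced, so block them
        return self._alias(obj1._meta.app_label) == self._alias(obj2._meta.app_label)

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Create each app's tables/collections only in its own database
        return self._alias(app_label) == db
```

Even with routing in place, each backend keeps its own data model, so queries that span a MongoDB model and a Cassandra model would still have to be combined in application code.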