FlywayDB and data replication

The FlywayDB documentation does not mention anything about using it together with data replication.
Can FlywayDB be used with Oracle RAC and DB2 Q Replication? Are there any FlywayDB-specific guidelines related to data replication?

Data replication is handled by each database engine itself.
FlywayDB's focus is quite different: it is a solution for database change management (data oriented, of course).
For instance, creating a schema with FlywayDB in a standalone Oracle database is the same process as creating it in an Oracle RAC database.
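As a minimal sketch of that point, assuming the Flyway command-line client is installed, the only thing that changes between a standalone instance and a RAC cluster is the JDBC URL (hostnames, service names and credentials below are hypothetical):

```python
# Driving the Flyway CLI from Python: the migration process is
# identical, only the connection URL differs.
import subprocess

STANDALONE_URL = "jdbc:oracle:thin:@//db.example.com:1521/ORCL"
RAC_SCAN_URL = "jdbc:oracle:thin:@//rac-scan.example.com:1521/ORCL_SERVICE"

def migrate(jdbc_url: str) -> None:
    # "flyway migrate" applies all pending versioned migrations.
    subprocess.run(
        ["flyway", f"-url={jdbc_url}", "-user=app", "-password=secret", "migrate"],
        check=True,
    )

migrate(STANDALONE_URL)  # the same call works against the RAC SCAN address
```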

Related

How can I retrieve the list of all databases in an AWS RDS DB instance via API?

Is there any way to retrieve the list of all databases (or at least database names) in an AWS RDS DB instance via API or SDK (lang doesn't matter)?
The describe-db-instances action doesn't serve my needs, as it only returns the "name of the initial database of this instance that was provided at create time"...
I couldn't find anything in the docs or on the web 🤷‍♂️
I would like to avoid running an SQL query and stick to the API as much as possible.
I've already seen this SO question; however, it is specific to Boto3, whereas I'm not limited to any particular AWS SDK.
Thank you all!
While you're not limited to boto3, you are limited to what the AWS API offers (and to my knowledge boto3 covers pretty much all of it).
The DescribeDBInstances API call returns a list of DBInstance objects.
Each object has the attribute DBName, which is pretty much the only information the AWS API exposes about the layout of the database schema:
DBName
The meaning of this parameter differs according to the database engine you use.
MySQL, MariaDB, SQL Server, PostgreSQL: Contains the name of the initial database of this instance that was provided at create time, if one was specified when the DB instance was created. This same name is returned for the life of the DB instance. Type: String
Oracle: Contains the Oracle System ID (SID) of the created DB instance. Not shown when the returned parameters do not apply to an Oracle DB instance. Type: String
That's essentially all you're going to get using the AWS API. If you want more, you need to connect to the instance and use the RDBMS to query that information. RDS as a service doesn't actually manage the schemas on your instance (beyond creating an initial one if you want it to) and doesn't expose them.
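For completeness, a minimal boto3 sketch of what the API does give you (region and credentials assumed to be configured in the environment):

```python
# Lists each RDS instance together with its DBName attribute,
# the only schema-level information the AWS API exposes.
import boto3

rds = boto3.client("rds")
for instance in rds.describe_db_instances()["DBInstances"]:
    # DBName may be absent, e.g. if no initial database was created
    print(instance["DBInstanceIdentifier"], instance.get("DBName"))
```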
While Mark B and Maurice are 100% correct, there's a workaround that might come in handy for others, as the user Optimus replied to my question on the AWS forum:
It's not possible because each database engine has a different implementation of the database concept, so this varies from engine to engine.
What you can do is use the ExecuteStatement RDS Data API command (https://docs.aws.amazon.com/rdsdataservice/latest/APIReference/API_ExecuteStatement.html) to run a SELECT that queries the data dictionary of that specific engine on that RDS instance, and from that you can get the database names.
It does not avoid running the SQL, but it is a standardized way to get this information regardless of the instance's DB engine, as the result output always has the same structure.
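A rough boto3 sketch of that workaround; note that the RDS Data API only works for Aurora clusters with the Data API enabled, and both ARNs and the dictionary query below are hypothetical:

```python
# Queries the engine's data dictionary through the RDS Data API,
# so no direct database connection is needed.
import boto3

rds_data = boto3.client("rds-data")
response = rds_data.execute_statement(
    resourceArn="arn:aws:rds:eu-west-1:123456789012:cluster:my-cluster",
    secretArn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:my-secret",
    database="postgres",
    sql="SELECT datname FROM pg_database",  # engine-specific dictionary query
)
for record in response["records"]:
    print(record[0]["stringValue"])
```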
For me this solution is not good enough, as I don't have the DB instance's secret ARN, which is required for the ExecuteStatement action.
However, I thought others may use this workaround.
Thank you all!

Using Amazon Redshift for analytics for a Django app with Postgresql as the database

I have a working Django web application that currently uses Postgresql as the database. Moving forward I would like to perform some analytics on the data and also generate reports etc. I would like to make use of Amazon Redshift as the data warehouse for the above goals.
In order not to affect the performance of the existing Django web application, I was thinking of writing a NEW Django application that would leverage a READ-ONLY replica of the PostgreSQL database and continuously write data from the read-only replica to Amazon Redshift. My thinking is that the NEW Django application could handle some or all of the Extract, Transform and Load (ETL) functions.
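Something like the following Django settings is what I have in mind for the new application (hostnames and database names are placeholders), using Django's multi-database support to pin reads to the replica:

```python
# settings.py of the NEW Django application: "default" points at the
# primary, "replica" at the read-only PostgreSQL replica.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",
        "HOST": "primary.example.com",
    },
    "replica": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",
        "HOST": "replica.example.com",
    },
}
```

The ETL code could then read with `.using("replica")` on any queryset and push the transformed rows to Redshift.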
My questions are as follows:
1. Does the Django ORM work well with Amazon Redshift? If yes, how does one handle the model schema translations? Any pointers in this regard would be greatly appreciated.
2. Is there any better alternative to achieve the goals listed above?
Thanks in advance.

Amazon Redshift Framework (Oracle Data Warehouse Migration)

We are currently planning to migrate a 50 TB Oracle data warehouse to Amazon Redshift.
Data from different OLTP data sources is first staged in an Oracle staging database and then loaded into the data warehouse. The data is currently transformed using tons of PL/SQL stored procedures, both within the staging database and while loading into the data warehouse.
OLTP Data Source 1 --> JMS (MQ) Real-time --> Oracle STG Database --> Oracle DW
Note: JMS MQ consumer writes data into staging database
OLTP Data Source 2 --> CDC Incremental Data (once in 10 mins) --> Oracle STG Database --> Oracle DW
Note: Change Data Capture on the source side data gets loaded into staging database once in 10 mins.
What would be the best framework for migrating this stack entirely to Amazon Redshift? Which AWS components could the different pieces be migrated to?
Wow, sounds like a big piece of work. There are quite a few things going on here that all need to be considered.
Your best starting point is probably AWS Database Migration Service (https://aws.amazon.com/dms/). This can do a lot of work for you in regards to converting your schemas and highlighting areas that you will have to migrate manually.
You should consider S3 to be your primary staging area. You need to land all (or almost all) the data in S3 before loading to Redshift. Give very careful consideration to how the data is laid out. In particular, I recommend that you use partitioning prefixes (s3://my_bucket/YYYYMMDDHHMI/files or s3://my_bucket/year=YYYY/month=MM/day=DD/hour=HH/minute=MI/files).
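To illustrate the second layout, a small sketch of how the extract files might be keyed (bucket and file names hypothetical):

```python
# Builds an S3 key with Hive-style partition prefixes so downstream
# tools (Redshift COPY, Glue, Athena) can prune by date/time.
import datetime

import boto3

now = datetime.datetime.utcnow()
key = (
    f"year={now:%Y}/month={now:%m}/day={now:%d}/"
    f"hour={now:%H}/minute={now:%M}/extract.csv"
)
boto3.client("s3").upload_file("extract.csv", "my_bucket", key)
```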
Your PL/SQL logic will not be portable to Redshift. You'll need to convert the non-SQL parts to either bash or Python and use an external tool to run the SQL parts in Redshift. I'd suggest that you start with Apache Airflow (Python) or Azkaban (bash). If you want to stay pure AWS then you can try Data Pipeline (not recommended) or wait for AWS Glue to be released (looks promising - untested).
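Whichever orchestrator you pick, the SQL-running part of each task boils down to something like the sketch below (connection details, table name, and IAM role are all hypothetical):

```python
# A task body that loads one staged S3 prefix into Redshift with COPY;
# this is the kind of step Airflow or Azkaban would schedule.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.eu-west-1.redshift.amazonaws.com",
    port=5439, dbname="dw", user="etl", password="secret",
)
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("""
        COPY sales_staging
        FROM 's3://my_bucket/year=2017/month=06/day=01/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
        CSV;
    """)
conn.close()
```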
You may be able to use Amazon Kinesis Firehose for the work that's currently done by JMS but the ideal use of Kinesis is quite different from the typical use of JMS (AFAICT).
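If you do go down the Firehose route, the producer side that would replace the JMS publisher is only a few lines (stream name and payload hypothetical); keep in mind that Firehose buffers and delivers in batches rather than message-by-message:

```python
# Publishes one source record to a Firehose delivery stream that is
# configured to land data in S3 (and optionally on into Redshift).
import json

import boto3

firehose = boto3.client("firehose")
firehose.put_record(
    DeliveryStreamName="oltp-events",
    Record={"Data": (json.dumps({"order_id": 42, "amount": 9.99}) + "\n").encode()},
)
```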
Good luck

WSO2 Stratos - Multi-tenant application development

I am exploring the product WSO2 Stratos and have watched some of the webinar recordings. I would like to create an application and expose it as SaaS. One of the WebEx recordings covers this in detail, but it does not explain multi-tenancy for data storage. Is there a tutorial available for this? I would like to use a shared schema for data storage. What kind of database can I use for this (e.g. MySQL, MongoDB, Cassandra)? Is it possible to use a framework like Athena? I am just trying to do a kind of POC, and then I need to decide whether this platform really fits the application I am thinking of building.
You can create databases through the WSO2 Storage Server in StratosLive, which can be accessed via storage.stratoslive.wso2.com. You need to create a database and attach a user to it. Then you can access that database from your webapp (you will get a JDBC URL) as you would in the normal case. You can also create Cassandra keyspaces in the Storage Server, but we don't have MongoDB support at the moment. There is no documentation on this yet.
Yes, you're right. Multi-tenant data architecture is up to the user to decide. This white paper from Microsoft explains multi-tenant data architecture nicely. The whitepaper however is written assuming you're using an RDBMS. I haven't played around with Athena so it's difficult to say how it'll map with what Stratos provides. The data architecture might be different when you're using a NoSQL DB and different DBs have different ways of filtering a set of data by a given tenant (or an ID). So probably going by the whitepaper it'll map to,
Different DBs -> Different keyspaces
Different tables -> Different column families
Shared schema -> Shared column family
It's better to define your application's characteristics beforehand and then choose an appropriate DB.
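For the shared-schema/shared-column-family case, tenant filtering then happens on every read; a rough sketch with the DataStax Python driver (keyspace, table, and column names are hypothetical):

```python
# Reads one tenant's rows from a column family shared by all tenants;
# tenant_id is the partition key, so the filter is efficient.
from cassandra.cluster import Cluster

session = Cluster(["cassandra.example.com"]).connect("shared_keyspace")
rows = session.execute(
    "SELECT * FROM customer_data WHERE tenant_id = %s", ("tenant_a",)
)
for row in rows:
    print(row)
```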

Data Warehouse and Django

This is more of an architectural question than a technological one per se.
I am currently building a business website/social network that needs to store large volumes of data and use that data to draw analytics (consumer behavior).
I am using Django and a PostgreSQL database.
Now my question is: I want to expand this architecture to include a data warehouse. The ideal would be: the operational DB would be the current Django PostgreSQL database, and the data warehouse would be something additional, preferably in a multidimensional model.
We are still in a very early phase, we are going to test with 50 users, so something primitive such as a one-column table for starters would be enough.
I would like to know if somebody has experience with this situation and could recommend a framework for creating a data warehouse, all while maintaining the operational DB with the Django models for ease of use (if possible).
Thank you in advance!
Here are some cool Open Source tools I used recently:
Kettle - great ETL tool, you can use this to extract the data from your operational database into your warehouse. Supports any database with a JDBC driver and makes it very easy to build e.g. a star schema.
Saiku - nice Web 2.0 frontend built on Pentaho Mondrian (MDX implementation). This allows your users to easily build complex aggregation queries (think Pivot table in Excel), and the Mondrian layer provides caching etc. to make things go fast. Try the demo here.
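To make the Kettle step concrete, here is the kind of extract-and-load a transformation would perform, sketched in plain Python (connection details and table names hypothetical):

```python
# Copies yesterday's events from the operational database into a
# simple fact table in the warehouse -- what a Kettle job automates.
import psycopg2

src = psycopg2.connect(dbname="appdb", host="oltp.example.com")
dwh = psycopg2.connect(dbname="warehouse", host="dwh.example.com")

with src.cursor() as read, dwh.cursor() as write:
    read.execute("""
        SELECT user_id, action, created_at::date
        FROM events
        WHERE created_at >= current_date - 1
    """)
    for user_id, action, day in read:
        write.execute(
            "INSERT INTO fact_activity (user_id, action, day) VALUES (%s, %s, %s)",
            (user_id, action, day),
        )
dwh.commit()
```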
My answer does not necessarily apply to data warehousing. In your case I see the possibility to implement a NoSQL database solution alongside an OLTP relational storage, which in this case is PostgreSQL.
Why consider NoSQL? In addition to the obvious scalability benefits, NoSQL offers a number of advantages that will probably apply to your scenario, for instance the flexibility of having records with different sets of fields, and key-based access.
Since you're still in the "trial" stage, you might find it easier to decide on a NoSQL database solution depending on your hosting provider. For instance, AWS has SimpleDB, Google App Engine provides its own Datastore, etc. However, there are plenty of other NoSQL solutions with nice Python bindings.