"source and sink in the copy activity must be connected via the same self-hosted integration runtime." - azure-virtual-machine

I have a requirement to run an Azure Data Factory pipeline that copies data from an on-premises database to a database on an Azure SQL Server virtual machine (we use a VM database server due to limitations with Azure SQL Database and Azure SQL Managed Instance).
Hence I have created two self-hosted integration runtimes: one for the on-premises VM database server, and another for the Azure VM data server.
But when I validate or run the pipeline, I get the error message below:
"source and sink in the copy activity must be connected via the same self-hosted integration runtime."
Can someone please suggest a possible solution, if any?

Basically it means that you cannot use 2 self-hosted integration runtimes in a single copy activity. What you are trying to do is copy from a source behind one self-hosted IR to a sink behind a different one.
You will need to use 2 copy activities and an intermediate storage. For example:
1st copy: take the data from the source SQL database to Blob storage as a CSV.
2nd copy: grab that CSV and insert its data into the sink database.
This way, you will be using only 1 IR for each copy activity.
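If it helps, here is a rough, assumption-laden sketch of that staged pipeline using the azure-mgmt-datafactory Python SDK; every dataset, factory and resource group name is a placeholder, and the linked services and datasets are assumed to exist already:

```python
# Hypothetical sketch only: two chained copy activities with a Blob staging
# dataset in between. Names in angle brackets and the dataset names are
# placeholders, not real resources.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ActivityDependency, CopyActivity, DatasetReference, DelimitedTextSink,
    DelimitedTextSource, PipelineResource, SqlServerSink, SqlServerSource,
)

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Stage 1: on-premises SQL Server -> CSV in Blob storage (runs on the on-prem IR)
to_blob = CopyActivity(
    name="OnPremToBlob",
    inputs=[DatasetReference(type="DatasetReference", reference_name="OnPremSqlDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="StagingCsvDataset")],
    source=SqlServerSource(),
    sink=DelimitedTextSink(),
)

# Stage 2: CSV in Blob storage -> SQL Server on the Azure VM (runs on the VM IR)
to_vm_sql = CopyActivity(
    name="BlobToVmSql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="StagingCsvDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="VmSqlDataset")],
    source=DelimitedTextSource(),
    sink=SqlServerSink(),
    depends_on=[ActivityDependency(activity="OnPremToBlob", dependency_conditions=["Succeeded"])],
)

adf.pipelines.create_or_update(
    "<resource-group>", "<data-factory-name>", "StagedCopyPipeline",
    PipelineResource(activities=[to_blob, to_vm_sql]),
)
```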
Hope this helped!

Related

How to read data from Google-storage to Cloud-run dynamically?

I have a Dash application running on Google Cloud Run. This application needs some data in order to work, and it reads it from Google Cloud Storage.
This data in Google Cloud Storage is updated once a week, and I am looking for a way to read the new data without having to re-deploy a new version of the application every week. Otherwise, the application keeps reading the data already loaded in memory (the old data).
I tried to call a function that downloads the new data (on Google Cloud Run's server), but I couldn't load the data into the app because it is already running and reading the data it loaded into memory.
First of all, stop wasting your time trying to update Cloud Run with Cloud Functions. Cloud Run containers are immutable (as any container), and the only way to change data baked into the image is to build a new container (the solution that you don't want).
That said, you still have 2 solutions to achieve that:
Read the data from Cloud Storage when your container starts:
either create a bash script that loads the data from Cloud Storage with gsutil and then starts your binary, and put that bash script in the entrypoint of your container,
or use the Cloud Storage client libraries in your Cloud Run service to load (and reload) the data at runtime; a sketch of this option is shown below.
Use the 2nd gen execution environment of Cloud Run and mount the bucket as a volume on Cloud Run.
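If you go with the client-library route, a minimal sketch might look like this (the bucket and object names are assumptions); re-reading the object on a schedule, or on each request, lets the running service pick up the weekly refresh without a redeploy:

```python
# Minimal sketch: re-read the weekly file from Cloud Storage inside the
# running Cloud Run service. BUCKET_NAME and BLOB_NAME are assumptions.
from google.cloud import storage

BUCKET_NAME = "my-dash-data-bucket"   # placeholder: your bucket
BLOB_NAME = "weekly/data.csv"         # placeholder: your object path

def load_latest_data() -> bytes:
    """Download the current version of the object every time it is called."""
    client = storage.Client()
    blob = client.bucket(BUCKET_NAME).blob(BLOB_NAME)
    return blob.download_as_bytes()
```

Calling load_latest_data() from a background refresh (or lazily, when the in-memory copy is older than some threshold) replaces the stale data without rebuilding the container.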

What is the best way to replicate data from Oracle GoldenGate on-premises to AWS (SQL or NoSQL)?

What is the best way to replicate data from Oracle GoldenGate on-premises to AWS (SQL or NoSQL)?
I was just checking this for Azure.
My company is looking for solutions for moving data to the cloud, with the following requirements:
Minimal impact on on-prem legacy/3rd-party systems.
No Oracle DB instances on the cloud side.
Minimum "hops" for the data between the source and destination.
PaaS over IaaS solutions.
Out-of-the-box features over native code and in-house development.
Oracle Server 12c or above.
Some custom filtering solution.
Some custom transformations.
** Filtering can be done in GoldenGate, in NiFi, in Azure mapping data flows, or in ksqlDB.
The solutions divide into two groups:
If the solution is allowed to touch/read the log files of the Oracle server,
you can use Azure ADF, Azure Synapse, K2View, Apache NiFi, or the Oracle CDC adapter for Big Data (check versions) to move data directly to the cloud, buffered by Kafka; however, the data inside Kafka will be in a special-schema JSON format.
If you must use a GoldenGate trail file as input to your sync/ETL paradigm, you can
use a custom data provider that translates the trail file into a flow file for NiFi (you need to write it yourself; see this 2-star project on GitHub for a direction), or
use the GitHub project with GoldenGate for Big Data and Kafka over Kafka Connect to also get translated SQL DML and DDL statements, which makes the solution much more readable.
Other solutions are corner cases, but I hope this gives you what you needed.
In my company's case we have Oracle as the source DB and Snowflake as the target DB. We've built the following processing sequence:
An on-premises OGG Extract works with the on-premises Oracle DB.
A Datapump sends trails to another host.
On this host, an OGG for Big Data Replicat processes the trails and then sends the result as JSON to an AWS S3 bucket.
Since Snowflake can handle JSON as a source of data and works with S3 buckets, it loads the JSON files into staging tables where further processing takes place.
You can read more about this approach here: https://www.snowflake.com/blog/continuous-data-replication-into-snowflake-with-oracle-goldengate/
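As a loose illustration of that last step (not taken from the blog post; the stage name, table name and connection parameters are placeholders), the load from S3 into a Snowflake staging table is a single COPY INTO over an external stage:

```python
# Hypothetical sketch: load the JSON files written by OGG for Big Data to S3
# into a Snowflake staging table. @ogg_s3_stage is an assumed external stage
# that points at the S3 bucket; connection parameters are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)

with conn.cursor() as cur:
    # One VARIANT column keeps the raw JSON documents as-is for later parsing.
    cur.execute("CREATE TABLE IF NOT EXISTS ogg_staging (doc VARIANT)")
    cur.execute("""
        COPY INTO ogg_staging
        FROM @ogg_s3_stage
        FILE_FORMAT = (TYPE = 'JSON')
    """)

conn.close()
```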

Datastore Emulator Query/Issue

I have installed the Google Datastore emulator on my local machine, and along with it written a sample Spring Boot application that performs CRUD operations on Datastore.
When I hit the REST endpoints through Postman, I can actually see the data get inserted into Datastore in the GCP console.
Can someone help me by clearing up the questions below:
1> Even though I am using an emulator locally, does the data get inserted into the actual Datastore in the cloud (GCP)?
2> What is the purpose of the emulator (if question 1 is correct)?
No data is inserted on Datastore servers; everything is local, as mentioned here:
The emulator simulates Datastore by creating /WEB-INF/appengine-generated/local_db.bin in a specified data directory and storing data in local_db.bin. By default, the emulator uses the data directory ~/.config/gcloud/emulators/datastore/. The local_db.bin file persists between sessions of the emulator. You can set up multiple data directories and think of each as a separate, local Datastore mode instance. To clear the contents of a local_db.bin file, stop the emulator and manually delete the file.
There are multiple uses, for example:
To develop and test your application locally without writing actual data to the servers, hence avoiding charges during the development process.
To help you generate indexes for your production Firestore in Datastore mode instance and delete unneeded indexes, which can then be exported into production.
Edit
In order to use the emulator on the same machine, it is recommended to set the environment variables automatically, as mentioned in the documentation.
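The asker's app is Spring Boot, but the idea is the same in any client library; here is a Python sketch (the host/port and project id are assumptions, and `gcloud beta emulators datastore env-init` prints the real values for your emulator):

```python
# Sketch: point the client at the local emulator so nothing reaches GCP.
# "localhost:8081" and "my-local-project" are assumptions; use the values
# your emulator actually printed when it started.
import os
from google.cloud import datastore

os.environ["DATASTORE_EMULATOR_HOST"] = "localhost:8081"
os.environ["DATASTORE_PROJECT_ID"] = "my-local-project"

client = datastore.Client(project="my-local-project")

key = client.key("Task", "sample")
entity = datastore.Entity(key=key)
entity.update({"done": False})
client.put(entity)          # stored in the local local_db.bin, not in the cloud
print(client.get(key))
```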

Can I tell Google Cloud SQL to restore my backup to a completely different database?

Since there is a nightly backup of SQL, we are wondering about a good way to restore this backup to a different database in the same MySQL server instance. We have prod_xxxx for all our production databases AND we have staging_xxxx for all our staging databases (yes, not ideal, in that they are all on the same MySQL instance right now).
Anyway, we would love to restore all tables/constraints/etc. and data from prod_incomingdb to staging_incomingdb. Is this possible in Cloud SQL?
Since this is a production instance, I recommend you perform a backup before starting, in order to avoid any data corruption.
To clone a database within the same instance, there is no direct way to perform the task (this is a missing feature in MySQL).
I followed this path in order to successfully clone a database within the same MySQL Cloud SQL instance.
1.- Create a dump of the desired database using the Google Cloud Console (web UI) by following these steps.
*It is very important to dump only the desired database, in SQL format; please do not select multiple databases for the dump.
After the process finishes, the dump will be available in a Google Cloud Storage bucket.
2.- Download the dump file to a Compute Engine VM or to any local machine with Linux.
3.- Replace the database name (the old one) in the USE clauses.
I used this sed command on my downloaded dump to change the database name:
sed -i 's/USE `employees`;/USE `emp2`;/g' employees.sql
*This can take some seconds depending on the size of your file.
4.- Upload the updated file to the Cloud Storage bucket.
5.- Create a new empty database on your Cloud SQL instance; in this case my target database is called emp2.
6.- Import the modified dump by following these steps.
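If you prefer not to pull the dump onto a VM, steps 2-4 can also be scripted; here is a rough sketch with the google-cloud-storage client (bucket and object names are assumptions, and the whole dump is held in memory, so this only suits modest dump sizes):

```python
# Sketch of steps 2-4: download the exported dump, rewrite the USE clause,
# and upload the modified copy. Bucket/object names are placeholders, and
# the rewrite is the same substitution as the sed command above.
from google.cloud import storage

BUCKET = "my-sql-export-bucket"        # placeholder
SOURCE_OBJECT = "employees.sql"        # placeholder: the exported dump
TARGET_OBJECT = "employees_emp2.sql"   # placeholder: the modified dump

client = storage.Client()
bucket = client.bucket(BUCKET)

dump = bucket.blob(SOURCE_OBJECT).download_as_text()
dump = dump.replace("USE `employees`;", "USE `emp2`;")
bucket.blob(TARGET_OBJECT).upload_from_string(dump)
```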
I could not figure out the nightly backups, as a restore seems to cover an entire instance, so I think the answer to the above is no. I did find out that I can export and then import (not exactly what I wanted, though, as I didn't want to be exporting our DB during the day; but for now we may go with that and automate a nightly export later).

Backup strategy for django

I recently deployed a couple of web applications built using Django (on WebFaction).
These are some of the first projects of this scale that I am working on, so I wanted to know what an effective backup strategy is for maintaining backups both on WebFaction and in an alternate location.
EDIT:
What do I want to back up?
The database and user-uploaded media. (My code is managed via git.)
I'm not sure there is a one-size-fits-all answer, especially since you haven't said what you intend to back up. My usual MO:
Source code: use source control such as svn or git. This means that you will usually have dev, deploy and repository copies of the code (especially with a DVCS).
Database: this also depends on usage, but usually:
Have a dump_database.py management command that introspects settings and, for each DB, outputs the correct dump command (taking into consideration the DB type and also the database name); see the sketch after this list.
Have a cron job on another server that connects through SSH to the application server, executes the dump-db management command, tars the SQL file with the DB name + timestamp as the file name, and uploads it to another server (Amazon's S3 in my case).
Media files: e.g. user uploads. Keep a cron job on another server that can SSH into the application server and call rsync to copy them to another server.
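As a rough illustration of the dump_database.py idea above (this is not a standard Django command; the command name, handled engines and file naming are all assumptions):

```python
# yourapp/management/commands/dump_database.py -- hypothetical sketch.
# Introspects settings.DATABASES and runs the matching dump tool per backend.
import subprocess
import time

from django.conf import settings
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Dump every configured database to a timestamped .sql file"

    def handle(self, *args, **options):
        stamp = time.strftime("%Y%m%d-%H%M%S")
        for alias, db in settings.DATABASES.items():
            name = db["NAME"]
            outfile = f"{name}-{stamp}.sql"
            engine = db["ENGINE"]
            if "postgresql" in engine:
                # Assumes the password comes from PGPASSWORD or ~/.pgpass.
                cmd = ["pg_dump", "-U", db["USER"], "-f", outfile, name]
            elif "mysql" in engine:
                cmd = ["mysqldump", "-u", db["USER"],
                       f"--password={db['PASSWORD']}",
                       f"--result-file={outfile}", name]
            else:
                self.stderr.write(f"Skipping {alias}: unsupported engine {engine}")
                continue
            self.stdout.write(f"Dumping {alias} -> {outfile}")
            subprocess.run(cmd, check=True)
```

The cron job mentioned above can then simply run `python manage.py dump_database` over SSH, tar the result and ship it off the box.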
The thing to keep in mind, though, is the intended purpose of the backup.
If it is accidental data loss (be it disk failure, a bug or SQL injection) or simply restoring, you can keep those cron jobs on the same server.
If you also want to be safe in case the server is compromised, you cannot keep the remote backup credentials (SSH keys, Amazon secret, etc.) on the application server! Otherwise an attacker will gain access to the backup server.