Migrate Apache Cassandra to Amazon DynamoDB

I want to migrate the database from Apache Cassandra to Amazon DynamoDB.
I am following this user guide
https://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/agents.cassandra.html
When I try to create a clone datacenter for extraction, it throws an error.

If you read through that document, you'll find that the conversion tool only supports older versions of Cassandra: 3.11.2, 3.1.1, 3.0, and 2.1.20.
There will be a lot of configuration items in your cassandra.yaml that are not compatible with the conversion tool, including replica_filtering_protection, since that property was not added until C* 3.0.22 / 3.11.8 (CASSANDRA-15907).
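One rough way to see how large that gap is would be to diff the keys in your live cassandra.yaml against the default cassandra.yaml shipped with the Cassandra version the clone datacenter will run. Here is only a sketch, assuming PyYAML is installed and using placeholder paths; it is a heuristic, since some valid settings are commented out in the default file:

# Sketch: flag cassandra.yaml settings that the older Cassandra version used by the
# clone datacenter does not know about. Paths are placeholders.
import yaml

with open("/etc/cassandra/cassandra.yaml") as f:          # your live config
    live = yaml.safe_load(f) or {}
with open("cassandra-3.11.2-default.yaml") as f:          # default file from the target version
    target_default = yaml.safe_load(f) or {}

unknown = sorted(set(live) - set(target_default))
print("Settings not present in the target version's defaults:")
for key in unknown:                                        # e.g. replica_filtering_protection
    print(" -", key)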
You'll need to engage AWS Support to figure out what migration options are available to you. Cheers!

Related

When I try fetch data from Amazon Keyspaces with Pyspark, I get Unsupported partitioner: com.amazonaws.cassandra.DefaultPartitioner Error

I'm not experienced with Java or the Hadoop ecosystem. I configured my Spark cluster to connect to Amazon Keyspaces using the spark-cassandra-connector from DataStax, and I'm using PySpark to fetch data from Cassandra. I can successfully connect to the Keyspaces/Cassandra cluster, but when I try to fetch data from it:
df = spark.sql("SELECT * FROM cass.tutorialkeyspace.tutorialtable")
print("Table Row Count: ")
print(df.count())
I get this error:
Unsupported partitioner: com.amazonaws.cassandra.DefaultPartitioner
Yes, the keyspace and table exist and have data. How can I fix or work around this? Thanks!
As an FYI, Keyspaces now supports using the RandomPartitioner, which enables reading and writing data in Apache Spark by using the open-source Spark Cassandra Connector.
Docs: https://docs.aws.amazon.com/keyspaces/latest/devguide/spark-integrating.html
Launch announcement: https://aws.amazon.com/about-aws/whats-new/2022/04/amazon-keyspaces-read-write-data-apache-spark/
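If you go that route, a minimal PySpark sketch of the connection setup could look like the following; the region endpoint, catalog name, credentials, and connector version are assumptions, so check them against the integration guide above:

# Sketch, not a drop-in config: endpoint, catalog name, and credentials are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("keyspaces-read")
    # Connector version is an assumption; pick one matching your Spark/Scala version.
    .config("spark.jars.packages", "com.datastax.spark:spark-cassandra-connector_2.12:3.1.0")
    .config("spark.sql.extensions", "com.datastax.spark.connector.CassandraSparkExtensions")
    .config("spark.sql.catalog.cass", "com.datastax.spark.connector.datasource.CassandraCatalog")
    # Keyspaces service endpoint for your region, TLS port 9142.
    .config("spark.cassandra.connection.host", "cassandra.us-east-1.amazonaws.com")
    .config("spark.cassandra.connection.port", "9142")
    .config("spark.cassandra.connection.ssl.enabled", "true")
    # Service-specific credentials generated for an IAM user.
    .config("spark.cassandra.auth.username", "my-keyspaces-user-at-123456789012")
    .config("spark.cassandra.auth.password", "my-service-specific-password")
    .getOrCreate()
)

# Per the integration guide, Keyspaces should be switched to the RandomPartitioner first, e.g. via cqlsh:
#   UPDATE system.local SET partitioner='org.apache.cassandra.dht.RandomPartitioner' WHERE key='local';
df = spark.sql("SELECT * FROM cass.tutorialkeyspace.tutorialtable")
print("Table Row Count:", df.count())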
The Spark Cassandra Connector relies on a specific partitioner implementation to define data splits, etc. There is no workaround for this problem right now, until somebody adds an implementation of the corresponding TokenFactory to that code. It shouldn't be very complex; it just needs to be done by someone who is interested in it.
Thank you for the feedback. At this time, you can write to Keyspaces using the Cassandra Spark Connector; reading requires support for token range. Please see the following doc page for the list of supported APIs: https://docs.aws.amazon.com/keyspaces/latest/devguide/cassandra-apis.html
Although we don't have timelines to share at the moment, we prioritize our roadmap based on customer feedback. We are releasing new features all the time. To learn more about our roadmap and upcoming features please contact your AWS Account manager.

AWS DocumentDB does not support MongoDB 4.0

I do not understand why AWS DocumentDB does not support MongoDB versions above 3.6. Should I use MongoDB 3.6, or 4.0 and above?
Amazon DocumentDB now supports MongoDB 4.0 compatibility including transactions: https://aws.amazon.com/about-aws/whats-new/2020/11/amazon-documentdb-with-mongodb-compatibility-adds-support-for-mongodb-4-and-transactions/
Document DB is compatible with only MongoDB 3.6.
See: https://aws.amazon.com/documentdb/features/
Whether you want to use 3.6 or 4.0, or even 4.2 or 4.4, depends largely on what you want to do with the DB.
The pro of using DocumentDB is that it is a managed service, so you don't have to worry much about setting it up.
The con is that you will not get features introduced after version 3.6, for example multi-document transactions (see the sketch below), new operators in the aggregation pipeline, bug fixes, etc.
To figure out the exact changes, check https://docs.mongodb.com/manual/release-notes/
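For context, here is a purely hypothetical pymongo sketch of what a multi-document transaction (a 4.0 feature) looks like; the connection string, database, and collection names are placeholders:

# Hypothetical sketch: multi-document transactions need MongoDB 4.0+ (or DocumentDB
# with 4.0 compatibility); they are not available on a 3.6 engine.
from pymongo import MongoClient

client = MongoClient("mongodb://user:pass@my-cluster.example.com:27017/?replicaSet=rs0")  # placeholder URI
orders = client.shop.orders
stock = client.shop.stock

with client.start_session() as session:
    with session.start_transaction():
        # Both writes commit or abort together -- impossible on a 3.6 engine.
        orders.insert_one({"sku": "abc", "qty": 1}, session=session)
        stock.update_one({"sku": "abc"}, {"$inc": {"qty": -1}}, session=session)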
You can install MongoDB on EC2 instances; this ensures you get the latest and greatest MongoDB. However, this comes with the added work of managing the MongoDB instance, its backups, high-availability considerations, etc.
Do note: no matter what you decide, I would recommend using the latest drivers available today, so that you have the freedom to move to the latest version of self-installed MongoDB or to upgrade DocumentDB engine versions as they become available.
AWS DocumentDB uses its own database engine, compatible with the MongoDB 3.6 API.
The MongoDB 4.0 API is not yet supported.
MongoDB Atlas is also available on AWS as a fully managed database service: https://aws.amazon.com/de/quickstart/architecture/mongodb/
The AWS Quick Start sets up the infrastructure to support a MongoDB deployment in a flexible, scalable, and cost-effective manner on the AWS Cloud.

Installing Sitecore 9.2 in AWS

Can anyone provide an answer to the query below?
I want to install Sitecore 9.2 on AWS. Does the installation process require SQL VMs?
Or can someone point me to the right article on this?
Thanks in advance.
From my own experience, Sitecore XM can use AWS RDS for the database; whether that is a good idea you must decide for yourself. As for the installation: Sitecore 9 uses contained databases, which may break the SIF installation. You can enable contained database authentication in AWS RDS (see the sketch after the links below) or use a normal database user account, but you will need a workaround, such as installing on SQL Server first and migrating to RDS, installing manually without SIF, or adjusting SIF.
For more information see:
https://jeroen-de-groot.com/2018/07/19/deploying-sitecore-9-in-aws-rds/
https://sitecore.stackexchange.com/questions/11047/sitecore-9-installation-using-sql-active-directory-user/11063
https://sitecore.stackexchange.com/questions/13859/why-do-we-require-contained-database-for-sitecore-9
https://sheenumalhi.wordpress.com/2019/02/19/sitecore-9-with-aws-rds/
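If you go the RDS route, contained database authentication is enabled through the instance's DB parameter group rather than sp_configure. Here is a hypothetical boto3 sketch; the parameter group name and region are placeholders, and you should verify the exact parameter name for your RDS SQL Server engine version:

# Hypothetical sketch: enable contained database authentication on an RDS SQL Server
# instance via its DB parameter group (parameter group name and region are placeholders).
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

rds.modify_db_parameter_group(
    DBParameterGroupName="sitecore-sqlserver-params",
    Parameters=[
        {
            "ParameterName": "contained database authentication",
            "ParameterValue": "1",
            "ApplyMethod": "pending-reboot",  # static parameter, applied after reboot
        }
    ],
)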

Does Google Dataproc support Apache Impala?

I am new to using cloud services and navigating Google's Cloud Platform is quite intimidating. When it comes to Google Dataproc, they do advertise Hadoop, Spark and Hive.
My question is, is Impala available at all?
I would like to do some benchmarking projects using all four of these tools, and I require Apache Impala alongside Spark/Hive.
No. Dataproc is a managed cluster service that supports Hadoop, Spark, Hive, and Pig using its default images.
Check this link for more information about the image version list for Dataproc:
https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-versions
You can also try creating a new Dataproc cluster instead of using the default setup.
For example, you can create a Dataproc cluster with HUE (Hadoop User Experience), an interface for working with Hadoop clusters built by Cloudera. The advantage here is that HUE has Apache Impala as a default component; it also has Pig, Hive, etc. So it's a pretty good solution for using Impala.
Another solution would be to build your own cluster from scratch, but that is not a good idea unless you want to customize everything. That way, you can install Impala yourself.
Here is a link for more information:
https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/tree/master/hue
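For illustration, here is a hypothetical sketch of creating such a cluster with the HUE initialization action through the google-cloud-dataproc Python client; the project ID, region, cluster name, and the exact initialization-action path are placeholders, so take the script location from the repository linked above:

# Sketch: create a Dataproc cluster that runs the HUE initialization action.
# Project, region, cluster name, and script path are placeholders.
from google.cloud import dataproc_v1

project_id = "my-project"
region = "us-central1"

cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": "hue-benchmark-cluster",
    "config": {
        "initialization_actions": [
            {"executable_file": f"gs://goog-dataproc-initialization-actions-{region}/hue/hue.sh"}
        ],
    },
}

operation = cluster_client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
result = operation.result()  # blocks until the cluster is created
print(result.cluster_name)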
Dataproc provides SSH access to the master and workers, so it is possible to install additional software. According to the Impala documentation, you would need to:
Ensure the Impala requirements are met.
Set up Impala on the cluster by building from source.
Remember that it is recommended to install the impalad daemon on each DataNode.
Cloud Dataproc supports Hadoop, Spark, Hive, and Pig by default on the cluster. You can install additional optional components such as Zookeeper, Jupyter, Anaconda, Kerberos, Druid, and Presto (you can find the complete list here). In addition, you can install a large set of open-source components using initialization actions.
Impala is not supported as an optional component, and there is no initialization-action script for it yet. You could get it to work on Dataproc with HDFS, but making it work with GCS may require non-trivial changes.

What is the best WSO2 upgrade strategy?

What is the best / flexible WSO2 upgrade strategy?
We are currently upgrading WSO2 DSS 3.0.1 to DSS 3.1.1, and there are some difficult changes to make in the .dbs files, one by one.
wso2dss-3.0.1
<data name="BASE_PERSON_DataService" serviceNamespace=
"http://company.mn/base/BASE_PERSON">
wso2dss-3.1.1
<data description="multiple services per each table" enableBatchRequests="false"
enableBoxcarring="false" name="BASE_PERSON_DataService"
serviceNamespace="http://company.mn/base/BASE_PERSON" serviceStatus="active">
What is the easy way to do this? We have many data services (.dbs files).
Regards,
Eba
As far as I know, there is usually no standard migration tool or procedure available. Check that the newer version uses a compliant schema for the WSO2 registry database and so on; maybe it's the same, or you just need to create additional tables. Sometimes you find migration scripts in the dbscripts folder. You should also check for differences in the newer XML configuration files and adjust your older custom configuration to the new format (usually few or no changes are required). As far as the artifacts are concerned, I have never heard of any way to convert them. If there are many of them, I would probably try a script and some regex to batch modify and adjust them to the new format (see the sketch below).
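Purely as an illustration of that batch approach (file paths and default attribute values are assumptions; compare them against a .dbs file generated by DSS 3.1.1):

# Sketch: add the attributes that DSS 3.1.1 expects on the <data> element of each
# .dbs file, leaving existing attributes untouched. Paths and defaults are assumptions.
import glob
import xml.etree.ElementTree as ET

NEW_DEFAULTS = {
    "enableBatchRequests": "false",
    "enableBoxcarring": "false",
    "serviceStatus": "active",
}

for path in glob.glob("dataservices/**/*.dbs", recursive=True):
    tree = ET.parse(path)
    root = tree.getroot()          # the <data> element
    changed = False
    for attr, default in NEW_DEFAULTS.items():
        if attr not in root.attrib:
            root.set(attr, default)
            changed = True
    if changed:
        tree.write(path, encoding="UTF-8", xml_declaration=True)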
These are the steps you should follow if you are upgrading:
Step 1 - Deploy artifacts (dbs files, data sources, drivers)
Copy the deployed data services from the current installation to the new installation by copying the repository/deployment/server folder. (All .dbs files are backward compatible, so whatever worked in WSO2 DSS 3.0.1 should work on DSS 3.1.1.) Also note that you need to copy the data source configuration if you have created Carbon data sources, so copy master-datasources.xml from repository/conf/datasources to the new installation.
Also copy all the content of repository/components/lib to the new installation to ensure that the JDBC drivers are properly installed. (These copies can be scripted; see the sketch after these steps.)
Step 2 - Change the configuration files
Apply the same changes you made to the configuration files inside OLD_DSS/repository/conf to NEW_DSS/repository/conf (if you made any such changes).
Note - If you have set up registry mounting, make sure you apply it to the new installation as before by changing the relevant configuration files, such as
carbon.xml, axis2.xml, user-mgt.xml, and mgt-transports.xml.
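If you prefer to script the Step 1 copies, here is a minimal sketch; the installation paths are placeholders:

# Sketch: copy deployed artifacts, Carbon data source config, and JDBC drivers
# from the old DSS installation to the new one. Paths are placeholders.
import shutil

OLD = "/opt/wso2dss-3.0.1"
NEW = "/opt/wso2dss-3.1.1"

# Step 1: deployed data services (.dbs files) and other artifacts
shutil.copytree(f"{OLD}/repository/deployment/server",
                f"{NEW}/repository/deployment/server", dirs_exist_ok=True)  # Python 3.8+

# Carbon data source definitions
shutil.copy2(f"{OLD}/repository/conf/datasources/master-datasources.xml",
             f"{NEW}/repository/conf/datasources/master-datasources.xml")

# JDBC driver jars
shutil.copytree(f"{OLD}/repository/components/lib",
                f"{NEW}/repository/components/lib", dirs_exist_ok=True)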