How can I perform data lineage in GCP? [closed] - google-cloud-platform

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
When we build a data lake on GCP with Cloud Storage, and process data with services such as Dataproc and Dataflow, how can we generate a data lineage report in GCP?

Google Cloud Platform doesn't have a serverless data lineage offering.
Instead, you may want to install Apache Atlas on a Google Cloud Dataproc cluster and use it for data lineage.
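Once Atlas is running on the Dataproc master node, its REST API can return the lineage graph for any catalogued entity. A minimal sketch, assuming a placeholder host name, default Atlas port 21000, and a hypothetical entity GUID (none of these values come from the question):

```python
import base64
import json
import urllib.request

# Placeholder host; Atlas serves its web UI and REST API on port 21000 by default.
ATLAS_BASE = "http://my-dataproc-master:21000/api/atlas/v2"

def lineage_url(guid, direction="BOTH", depth=3):
    """Build the Atlas v2 lineage endpoint URL for an entity GUID."""
    return f"{ATLAS_BASE}/lineage/{guid}?direction={direction}&depth={depth}"

def fetch_lineage(guid, user="admin", password="admin"):
    """GET the lineage graph (entities plus relations) for one entity."""
    req = urllib.request.Request(lineage_url(guid))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The returned JSON contains `guidEntityMap` and `relations`, which you can walk to produce an upstream/downstream lineage report.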

Google Cloud Data Fusion supports lineage in the Enterprise edition. You can use Data Fusion to build and orchestrate pipelines, with Dataproc and Dataflow as the execution capacity for running them. An introduction to CDF lineage can be found in the documentation here: https://cloud.google.com/data-fusion/docs/tutorials/lineage
If you do not otherwise use CDF capabilities, it is a bit of an overkill for lineage alone. Lineage support in Google Cloud Data Catalog would be optimal for many of my use cases, but unfortunately Data Catalog does not currently support lineage. I hope it is on the product roadmap and will be supported in the future.

Related

Hashicorp Vault on aws with cross region active - active setup | CFT, Terraform [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
I need to set up HashiCorp Vault on AWS in HA with a cross-region active-active setup, and it has to be fully automated. Which IaC tool would be best: CloudFormation, for which I found very little documentation on a Vault setup, or Terraform?
Has someone achieved this with a fully automated approach on AWS?
Terraform is not a configuration management tool; it's an IaC tool. You can use Terraform to create the underlying infrastructure for your Vault setup, but it should not be used to provision applications on that infrastructure. Of course, you can install applications on your EC2 instances using the remote-exec provisioner, but provisioners should be a last resort.
So I would think of using Terraform to create the infrastructure for the Vault setup, and some other tool, like Ansible or Puppet, to provision software on it. Using IaC tools for configuration management will create major technical debt in the long run.
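Whichever tools do the provisioning, a cross-region active-active setup also needs health checks to route traffic to the right node. Vault's documented /v1/sys/health endpoint encodes node state in the HTTP status code, which is handy for wiring up load-balancer or DNS health checks. A hedged sketch (the address format is a placeholder, not a real deployment):

```python
import urllib.error
import urllib.request

# Status codes per Vault's documented /v1/sys/health endpoint.
HEALTH_CODES = {
    200: "active",
    429: "standby",
    472: "dr-secondary",
    473: "performance-standby",
    501: "not-initialized",
    503: "sealed",
}

def node_role(status_code):
    """Map a /v1/sys/health HTTP status code to a node role."""
    return HEALTH_CODES.get(status_code, "unknown")

def check_vault(addr):
    """Probe one node; addr like 'https://vault.eu-west-1.example.com:8200' (placeholder)."""
    try:
        with urllib.request.urlopen(f"{addr}/v1/sys/health") as resp:
            return node_role(resp.status)
    except urllib.error.HTTPError as err:  # non-2xx codes still carry the node state
        return node_role(err.code)
```

Pointing a Route 53 health check or an ALB at the same endpoint per region gives you automatic failover between the active clusters.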

Refactoring legacy infrastructure on AWS [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 2 years ago.
Can somebody please recommend some sources on how best to approach refactoring a legacy AWS infrastructure? That is, how to reduce downtime, migrate data stores (such as DynamoDB or S3) optimally, etc. Thanks in advance!
There are a number of approaches you can take to do this.
AWS has a lot of great resources on "migration"; as an initial thought, take a look at the 6 Strategies for Migrating Applications to the Cloud. Even though you're already in the AWS Cloud, it is a great time to evaluate whether you have anything you can replace or that is no longer needed.
A number of services assist with migration; for migrating data stores, take a look at the two services below, which should cover most of your data needs:
Database Migration Service
Data Pipeline
For S3 you would need to migrate to another bucket, as bucket names are globally unique. If you want to keep the name, you will need to delete the original bucket first. If the bucket is served publicly, try putting a CloudFront distribution in front of it and switching the origin to the new bucket afterwards.
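When copying between buckets, it helps to verify the migration by diffing the two listings. A small illustrative sketch, assuming you have already pulled key-to-ETag maps for each bucket (for example from boto3's list_objects_v2 paginator); the function itself is pure and not an AWS API:

```python
def pending_objects(source, destination):
    """Return source keys that are missing from the destination,
    or present with a different ETag (i.e. still need copying)."""
    return sorted(
        key for key, etag in source.items()
        if destination.get(key) != etag
    )

# Hypothetical listings for illustration:
src = {"logs/2023/a.gz": "etag-1", "logs/2023/b.gz": "etag-2"}
dst = {"logs/2023/a.gz": "etag-1"}
```

Here `pending_objects(src, dst)` reports that only `logs/2023/b.gz` remains to be copied, so repeated runs converge to an empty list once the buckets match.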
For architecting your new infrastructure take a look at the AWS Well-Architected Framework.
AWS has also produced a number of migration whitepapers; some are specific to particular technologies and some are more general.

What is Equivalent of Amazon Glacier in Microsoft Azure environment? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
I am doing some research where I need to find the corresponding tool for archiving. So, what is the equivalent of Amazon Glacier in the Azure environment?
The Azure equivalent of AWS S3 is Azure Storage. S3 defines several storage classes; Glacier is one of them.
Azure Storage has the concept of access tiers. I think the Archive tier is the closest match to S3 Glacier; another option would be the Cool tier. Which one to choose depends on how frequently the data is accessed.
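As a rule of thumb, the tiers map to Azure's documented minimum retention periods: blobs in the Cool tier incur an early-deletion charge before roughly 30 days, and Archive before roughly 180 days (and Archive data is offline, taking hours to rehydrate). A hedged sketch of that decision; the thresholds are illustrative assumptions, so check current Azure pricing before relying on them:

```python
def suggest_tier(days_between_accesses):
    """Suggest a blob access tier from the expected access interval.
    Thresholds mirror Azure's documented minimum retention periods
    (assumed here for illustration: Cool ~30 days, Archive ~180 days)."""
    if days_between_accesses >= 180:
        return "Archive"  # offline tier; hours to rehydrate, cheapest storage
    if days_between_accesses >= 30:
        return "Cool"     # online but optimized for infrequent access
    return "Hot"          # frequently accessed data
```

For yearly compliance archives this picks Archive, the Glacier-like tier, while monthly reporting data lands in Cool.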
The equivalent service is Azure Storage. You can always explore further from Microsoft's comparison site:
AWS to Azure services comparison

How to connect Hive/Presto via JDBC to Amazon QuickSight (SPICE) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 3 years ago.
I would like to connect a Hive/Presto query engine within AWS to Amazon QuickSight (SPICE) to query the data. Usually I would use JDBC (https://prestodb.io/docs/current/installation/jdbc.html) for this.
Is this already possible? Or is there another way to set up this connection?
This is not currently possible with QuickSight. See: https://quicksight.aws/
From the link:
"Choose your data source
You can upload CSV or Excel files; ingest data from AWS data sources such as Amazon Redshift, Amazon RDS, Amazon Aurora, Amazon Athena, and Amazon S3; connect to databases like SQL Server, MySQL, and PostgreSQL, in the cloud or on-premises; or connect to SaaS applications like Salesforce. Future releases will let you ingest data from Amazon EMR, Amazon DynamoDB, and Amazon Kinesis as well as other cloud applications."

Comparison between Amazon web services (AWS) or Rackspace cloud servers? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
There are two major cloud computing offerings: by Amazon through AWS, and by Rackspace through Rackspace Cloud. I would like to know more about the pros and cons of one platform over the other. That will help me decide on a platform for my future applications.
Please see some of these links to better analyze and understand the differences between Amazon's cloud servers and Rackspace Cloud.
Things come into my mind:
Amazon's server stack offers choices for possibly everything, but Rackspace's server stack is fixed.
You can control everything in your server stack with Amazon, but not with Rackspace.
You can mix and match various services (EBS, EIP, S3, etc.) on Amazon to suit your budget; you can't with Rackspace, since you are priced for the whole stack.
On Amazon, from a single EBS-backed AMI you can launch many different instance types.
Difference:
http://www.distractable.net/tech/amazon-aws-ec2-vs-rackspace-high-level-comparison/
Goodbye Rackspace:
http://code.mixpanel.com/amazon-vs-rackspace/
Performance Analysis:
http://www.thebitsource.com/featured-posts/rackspace-cloud-servers-versus-amazon-ec2-performance-analysis/