Data Learn (AWS MSK with Google Dataflow connection issue) - google-cloud-platform

My Google Dataflow job keeps failing while I create a "Kafka to BigQuery" pipeline against the AWS MSK public-access bootstrap server. Is there any way to solve this issue?
I set up public access on the bootstrap server by disabling unauthenticated access and enabling IAM-based access, and I updated the security groups to allow access from anywhere. I also created a topic from an EC2 instance terminal using the private connection string.
In GCP, I created a Dataflow job from the Kafka to BigQuery template and passed the public-access bootstrap connection string, the topic created from EC2, and the BigQuery table ID.
Expectation: the job is created successfully and the table is populated from the Kafka messages sent to the MSK cluster.
Can a Google Dataflow (Kafka to BigQuery) job act as a consumer for AWS MSK (Kafka)?
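Before debugging the template itself, it can help to confirm that the public bootstrap string is reachable with IAM authentication from outside AWS. Below is a minimal sketch using kafka-python and the aws-msk-iam-sasl-signer package; the region, bootstrap string (public IAM access uses port 9198) and topic name are placeholder assumptions.

# Connectivity sanity check against the MSK public bootstrap servers using
# IAM auth (SASL/OAUTHBEARER). Region, bootstrap string and topic are placeholders.
from kafka import KafkaConsumer
from aws_msk_iam_sasl_signer import MSKAuthTokenProvider

class MSKTokenProvider:
    # Supplies short-lived IAM auth tokens to the Kafka client.
    def token(self):
        auth_token, _expiry_ms = MSKAuthTokenProvider.generate_auth_token("us-east-1")
        return auth_token

consumer = KafkaConsumer(
    "my-topic",
    bootstrap_servers="b-1-public.example.kafka.us-east-1.amazonaws.com:9198",
    security_protocol="SASL_SSL",
    sasl_mechanism="OAUTHBEARER",
    sasl_oauth_token_provider=MSKTokenProvider(),
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,
)
for record in consumer:
    print(record.value)

If this consumer authenticates and reads messages, the MSK side is working, and the remaining issue is usually that the Dataflow job's Kafka authentication settings do not match the cluster (IAM/SASL over TLS rather than plaintext).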

Related

Is the AWS Data Pipeline service being deprecated?

When I navigate to the AWS Data Pipeline console, it shows this banner:
Please note that Data Pipeline service is in maintenance mode and we are not planning to expand the service to new regions. We plan to remove console access by 02/28/2023.
Will the AWS Data Pipeline service be gone in the near future?
Maintenance Mode
Console access to the AWS Data Pipeline service will be removed on April 30, 2023. On this date, you will no longer be able to access AWS Data Pipeline through the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that AWS Data Pipeline service is in maintenance mode and we are not planning to expand the service to new regions.
Alternatives
For alternatives to AWS Data Pipeline, please refer to:
AWS Glue
AWS Step Functions
Amazon Managed Workflows for Apache Airflow
For information about migrating from AWS Data Pipeline, please refer to the AWS Data Pipeline migration documentation.
Contact
AWS will provide customers with at least 12 months notice before any service is deprecated.
If you have any questions or concerns, please reach out to AWS Support.

Connecting to Amazon Redshift from Azure Data Factory

We are attempting to connect to an Amazon Redshift instance from Azure Data Factory as a linked service.
Steps Taken:
Provisioned Self Hosted Integration Runtime (Azure)
Created user access to database within Redshift (AWS)
Whitelisted the IP addresses of the SHIR within the security group (AWS)
Built the linked service to Redshift using the login, server address and database name (Azure)
From testing we know that this user login works with this database for other sources, and in general the process has worked for other technologies.
A screenshot of the error message received can be seen here
Any suggestions would be greatly appreciated :)
To connect to Amazon Redshift from Azure, look at using the AWS SDK for .NET and the Amazon Redshift Data API. You can use the .NET service client to write logic that performs CRUD operations on a Redshift cluster.
You can create a service client in .NET with this code:
var dataClient = new AmazonRedshiftDataAPIServiceClient(RegionEndpoint.USWest2);
Ref docs here:
https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/RedshiftDataAPIService/TRedshiftDataAPIServiceClient.html

GCS bucket to Snowflake Stage data transfer

We are creating a storage integration and then creating a stage with this storage integration. For this we use STORAGE_PROVIDER = 'GCS' on GCP,
so Snowflake automatically creates a service account for us. Is there a way we can use our own service account, or create a new one and replace the Snowflake-created service account with it?
We have come to know that private connectivity to Snowflake internal stages is currently not supported on GCP.
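For reference, the flow being described looks roughly like the sketch below, using snowflake-connector-python; the account, credentials, integration name and bucket path are placeholder assumptions. The STORAGE_GCP_SERVICE_ACCOUNT property returned by DESC STORAGE INTEGRATION is the Snowflake-managed service account in question.

# Sketch of the GCS storage-integration flow with snowflake-connector-python.
# Account, credentials, integration name and bucket path are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...", role="ACCOUNTADMIN"
)
cur = conn.cursor()

# Creating the integration makes Snowflake provision its own GCP service account.
cur.execute("""
    CREATE STORAGE INTEGRATION IF NOT EXISTS gcs_int
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'GCS'
      ENABLED = TRUE
      STORAGE_ALLOWED_LOCATIONS = ('gcs://my-bucket/my-path/')
""")

# DESC shows the Snowflake-managed service account that must be granted
# access to the bucket in GCP IAM.
cur.execute("DESC STORAGE INTEGRATION gcs_int")
for prop, _prop_type, value, _default in cur.fetchall():
    if prop == "STORAGE_GCP_SERVICE_ACCOUNT":
        print("Snowflake-managed service account:", value)

A stage then references the integration with CREATE STAGE ... STORAGE_INTEGRATION = gcs_int.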

Does GCP have an equivalent of AWS's custom Glue connector for Snowflake access?

We've got some data in Snowflake that we'd like to pull into our GCP environment, on a periodic basis. One of our engineers has done the equivalent setup on AWS on a previous project, using the documentation here. I find this setup to be a lot simpler than setting up a push data flow, which requires creating an integration and service account from the Snowflake side, then granting the service account some IAM permissions in GCP.
Can anyone tell me if GCP offers a similar pull-based connector API/setup for Snowflake?
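A pull-based flow on GCP can be sketched as a scheduled job (for example a Cloud Composer task) that queries Snowflake with its Python connector and loads the result into BigQuery. The connection details, query and table ID below are placeholder assumptions.

# Minimal pull sketch: query Snowflake, then load the result into BigQuery.
# All connection details, the query and the table ID are placeholders.
import snowflake.connector
from google.cloud import bigquery

sf_conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="...",
    warehouse="MY_WH",
    database="MY_DB",
    schema="PUBLIC",
)
cur = sf_conn.cursor()
cur.execute("SELECT * FROM my_source_table")
df = cur.fetch_pandas_all()  # requires the pandas/pyarrow extras of the connector

# The service account running the job needs write access to the target dataset.
bq = bigquery.Client()
load_job = bq.load_table_from_dataframe(df, "my_project.my_dataset.my_table")
load_job.result()  # wait for the load to finish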

Access AWS S3 data via Web Identity From GCP without using keys

I want to access data residing in an AWS S3 bucket from a GCP Cloud Composer environment's service account.
I followed this link, but it also relies on key creation.
Is there a way to connect to AWS S3 from GCP via roles only?
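A keyless pattern is possible if an AWS IAM role is set up to trust Google as an OIDC web-identity provider: the Composer service account mints an identity token, exchanges it for temporary credentials with STS AssumeRoleWithWebIdentity, and uses those with S3. In the sketch below the role ARN, audience, region and bucket name are placeholder assumptions, and the audience must match the condition in the role's trust policy.

# Keyless S3 access from a GCP service account via AWS web-identity federation.
# Assumes an AWS IAM role whose trust policy accepts Google-issued OIDC tokens;
# the role ARN, audience, region and bucket name are placeholders.
import boto3
import google.auth.transport.requests
from google.oauth2 import id_token

# 1. Mint an OIDC identity token for the Composer service account
#    (served by the metadata server on Composer/GCE, so no key file is needed).
request = google.auth.transport.requests.Request()
gcp_token = id_token.fetch_id_token(request, "sts.amazonaws.com")

# 2. Exchange it for temporary AWS credentials.
sts = boto3.client("sts", region_name="us-east-1")
creds = sts.assume_role_with_web_identity(
    RoleArn="arn:aws:iam::123456789012:role/gcp-s3-reader",
    RoleSessionName="composer-session",
    WebIdentityToken=gcp_token,
)["Credentials"]

# 3. Use the temporary credentials with S3.
s3 = boto3.client(
    "s3",
    region_name="us-east-1",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
for obj in s3.list_objects_v2(Bucket="my-bucket").get("Contents", []):
    print(obj["Key"])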