Dataflow job fails without a proper error when run on a Shared VPC - google-cloud-platform

I am trying to run a Dataflow job using the "Cloud Spanner to text file on Cloud Storage" Dataflow template. My Dataflow job runs on a Shared VPC, but Spanner is not a resource that lives on the VPC. The job fails, but there is no proper error message when it does. When I clone the same job and run it on the default VPC, things work and the job succeeds. Can someone help me understand what is going on and where I should look? Is there an issue with Dataflow communicating with Spanner? If so, is there a resource that could help fix this?

Please ensure the following requirements are met:
The Shared VPC network that you select is an auto mode network.
You are a Service Project Admin with project-level permissions to the whole Shared VPC host project. This means that a Shared VPC Admin has granted you the Compute Network User role for the whole host project, so you are able to use all of its networks and subnetworks.
Ref - https://cloud.google.com/dataflow/docs/guides/specifying-networks#network_parameter
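If the network requirements check out, the next thing to verify is that the job is actually launched with the Shared VPC subnetwork. As a rough sketch (not an official sample; the project, instance, bucket and subnet names are placeholders, and the template parameter names should be double-checked against the template's documentation), the Cloud Spanner to GCS Text template can be launched with an explicit subnetwork like this:

from googleapiclient.discovery import build  # pip install google-api-python-client

dataflow = build("dataflow", "v1b3")  # uses application-default credentials

project = "my-service-project"    # placeholder: project running the Dataflow job
region = "us-central1"            # placeholder: job region
host_project = "my-host-project"  # placeholder: Shared VPC host project

body = {
    "jobName": "spanner-to-gcs-text",
    "parameters": {                                   # placeholder values below
        "spannerProjectId": project,
        "spannerInstanceId": "my-instance",
        "spannerDatabaseId": "my-database",
        "spannerTable": "my_table",
        "textWritePrefix": "gs://my-bucket/export/",
    },
    "environment": {
        "tempLocation": "gs://my-bucket/tmp/",        # placeholder
        # Fully qualified Shared VPC subnetwork URL, as in the docs linked above.
        "subnetwork": "https://www.googleapis.com/compute/v1/projects/"
                      + host_project + "/regions/" + region + "/subnetworks/my-subnet",
    },
}

request = dataflow.projects().locations().templates().launch(
    projectId=project,
    location=region,
    gcsPath="gs://dataflow-templates/latest/Spanner_to_GCS_Text",
    body=body,
)
print(request.execute())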

Related

AWS EMR jupyter error 403 Forbidden (Workspace is not attached to cluster)

I have a simple notebook in EMR. I have no running clusters. From the notebook's open page itself I request a new cluster, so my expectation is that all the params necessary to ensure a good workspace-cluster connection are in place. I observe that the release is emr-5.36.0 and that the applications Hadoop, Spark, Livy, Hive, and JupyterEnterpriseGateway are all included. I am using default security groups.
Both the cluster and the workspace hosts start, but upon opening Jupyter (or JupyterLab) the kernel launch fails with the message "Error 403: Workspace is not attached to cluster." All attempts at "jiggling" the kernel -- choosing a different one, doing a start/stop, etc. -- yield the same error.
There are a number of docs plus answers here on SO, but these tend to revolve around trying to use EC2 instances instead of EMR, messing with master vs. core nodes, forgetting the JupyterEnterpriseGateway, and the like. Again, you'd think that a cluster launched directly from the notebook would work.
Any clues?
I have done this many times before and it always works with the "create new cluster" option; default security groups are not an issue.
Here is an image of one from before:
One thing that could cause this error, and which you have not made clear, is that EMR will not let you open the notebook as root. So do not use the root AWS account to create the cluster/notebook. Create and use an IAM user that has permissions to launch the cluster.
I tried with the admin policy attached.
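If you want to rule out the console path entirely, one option is to launch an equivalent cluster under the IAM user's credentials with boto3. This is only a sketch under assumptions (the region, instance types and the default EMR roles are guesses; adjust to your own VPC/subnet):

import boto3

emr = boto3.client("emr", region_name="us-east-1")  # assumed region

response = emr.run_job_flow(
    Name="notebook-cluster",   # placeholder name
    ReleaseLabel="emr-5.36.0",
    Applications=[
        {"Name": "Hadoop"},
        {"Name": "Spark"},
        {"Name": "Livy"},
        {"Name": "Hive"},
        {"Name": "JupyterEnterpriseGateway"},
    ],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 1},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,  # keep it alive for the notebook
    },
    JobFlowRole="EMR_EC2_DefaultRole",  # assumed default instance profile
    ServiceRole="EMR_DefaultRole",      # assumed default service role
)
print(response["JobFlowId"])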

Where to keep the Dataflow and Cloud composer python code?

It probably is a silly question. In my project we'll be using Dataflow and Cloud Composer. For that I had asked for permission to create a VM instance in the GCP project to keep both the Dataflow and Cloud Composer Python programs. But the client asked me the reason for creating a VM instance and told me that Dataflow can be executed without a VM instance.
Is that possible? If yes, how do I achieve it? Can anyone please explain it? It'll be really helpful to me.
You can run Dataflow pipelines or manage Composer environments from your own computer once your credentials are authenticated and you have both the Google Cloud SDK and the Dataflow Python library installed. However, this depends on how you want to manage your resources. I prefer to use a VM instance to have all the resources I use in the cloud, where it is easier to set up VPC networks that include different services. Also, saving data from a VM instance into GCS buckets is usually faster than from an on-premise computer/server.
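To make the first option concrete, here is a minimal sketch of a Beam pipeline submitted to Dataflow straight from a local machine after gcloud auth application-default login and pip install apache-beam[gcp] (the project, bucket and region names are placeholders):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                # placeholder
    region="us-central1",                # placeholder
    temp_location="gs://my-bucket/tmp",  # placeholder
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")     # placeholder input
        | "LineLengths" >> beam.Map(len)
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/lengths")  # placeholder output
    )

Cloud Composer is a bit different: DAG files are uploaded to the environment's GCS dags bucket rather than executed from a VM, so no VM is needed there either.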

Deploy different resources using Deployment Manager?

I'm planning to use Deployment Manager to deploy a new project for each of our clients.
I'm just wondering: can I do the following using Deployment Manager, or put it into a script/YAML, so it deploys all the components at once through the command shell?
create a new GCP project
create a VPC for the client with a custom subnet assigned
create a VM and set its network to the custom VPC/subnet
create an App Engine app with different services using the yaml file
create storage buckets
create a Cloud SQL for PostgreSQL instance
What I have tried so far: I can deploy the VM only through Deployment Manager. I can create the other resources individually using the command line, but not through Deployment Manager in one single step.
Thanks for your help.
Deployment Manager should work perfectly for this type of setup. There are a few minor caveats though.
You need to have a project in place where you can run Deployment Manager from.
You will need to grant the Deployment Manager service account all the required permissions before creating the deployment (such as Project Creator at the org level). The service account is [PROJECT_NUMBER]@cloudservices.gserviceaccount.com.
Next, you will want to call each of the resources individually in your Deployment Manager manifest. Luckily, all of these resource APIs are supported by DM:
Projects to create the project.
** All following resources should make a reference to this resource to create a dependency, so that DM does not try to create them before the project exists... which would result in a failure
VPC and VMs: use something like this
** This includes adding GKE clusters at the end and a VPC peering you won't need, but it demonstrates the creation of a VPC, subnets, firewall rules and a VM
App Engine
GCS Bucket
SQL instance
As long as your overall config is less than 1 MB, you can place all these resources into a single config.
If you are new to DM, I recommend trying each of these resources individually to make sure that you have the syntax correct. Trying to debug syntax errors with multiple resources is much more difficult.
I also recommend using the --preview flag before creating or updating resources so that you can make sure that your configurations or changes will come into effect the way you planned.
Finally, you can either write all of this directly into a YAML config, or create templates using either Jinja or Python 2, which can be imported into your config.yaml.
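As a rough illustration of the template route (a sketch only; the organization ID, names, and the project-override behaviour of the native types are assumptions worth verifying against the DM samples and the toolkit below), a Python template that creates the project and a VPC depending on it could look like:

# Sketch of a Deployment Manager Python template (imported from config.yaml).
# It creates a project and then a VPC that references it, so DM orders them
# correctly. Org ID, project ID and network name come from placeholder properties.
def GenerateConfig(context):
    project_id = context.properties['projectId']
    return {
        'resources': [
            {
                'name': project_id,
                'type': 'cloudresourcemanager.v1.project',
                'properties': {
                    'name': project_id,
                    'projectId': project_id,
                    'parent': {
                        'type': 'organization',
                        'id': context.properties['orgId'],
                    },
                },
            },
            {
                'name': 'client-vpc',
                'type': 'compute.v1.network',
                'properties': {
                    # $(ref...) creates the dependency on the project resource above.
                    'project': '$(ref.{}.projectId)'.format(project_id),
                    'autoCreateSubnetworks': False,
                },
            },
        ]
    }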
Please take a look at the Deployment Manager Cloud Foundation Toolkit, which is a set of well-designed templates.

Network default is not accessible to Dataflow Service account

I am having issues starting a Dataflow job (2018-07-16_04_25_02-6605099454046602382) in a project without a local VPC network; I get this error:
Workflow failed. Causes: Network default is not accessible to Dataflow
Service account
There is a Shared VPC connected to the project with a network called default and a subnet default in us-central1; however, the service account used to run the Dataflow job doesn't seem to have access to it. I have given the dataflow-service-producer service account the Compute Network User role, without any noticeable effect. Any ideas on how I can proceed?
Using subnetworks in Cloud Dataflow requires specifying the subnetwork parameter when running the pipeline. However, for subnetworks located in a Shared VPC network, you must use the complete URL in the following format, as you mentioned.
https://www.googleapis.com/compute/v1/projects/<HOST_PROJECT>/regions/<REGION>/subnetworks/<SUBNETWORK>
Additionally, in these cases it is recommended to verify that you have added the project's Dataflow service account to the Shared VPC project's IAM table and granted it the "Compute Network User" role, to ensure that the service has the required access.
Finally, the official Google documentation for the subnetwork parameter is already available with detailed information about this matter.
Using the --subnetwork option with the following (undocumented) fully qualified subnetwork format made the Dataflow job run, where {PROJECT} is the name of the project hosting the Shared VPC and {REGION} matches the region you run your Dataflow job in.
--subnetwork=https://www.googleapis.com/compute/alpha/projects/{PROJECT}/regions/{REGION}/subnetworks/{SUBNETWORK}
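If you submit the pipeline from Python rather than the command line, the same fully qualified URL can be passed as a pipeline option. A minimal sketch, assuming the Beam Python SDK and placeholder project/bucket/subnet names:

from apache_beam.options.pipeline_options import PipelineOptions

host_project = "my-host-project"  # placeholder: Shared VPC host project
region = "us-central1"            # must match the region the job runs in

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-service-project",        # placeholder: service project
    region=region,
    temp_location="gs://my-bucket/tmp",  # placeholder
    # Fully qualified Shared VPC subnetwork URL.
    subnetwork="https://www.googleapis.com/compute/v1/projects/"
               + host_project + "/regions/" + region + "/subnetworks/my-subnet",
)
# Pass `options` to beam.Pipeline(options=options) as usual.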

AWS Glue ETL job from AWS Redshift to S3 fails

I am trying out the AWS Glue service to ETL some data from Redshift to S3. The crawler runs successfully and creates the meta table in the data catalog; however, when I run the ETL job (generated by AWS) it fails after around 20 minutes saying "Resource unavailable".
I cannot see AWS Glue logs or error logs created in CloudWatch. When I try to view them it says "Log stream not found. The log stream jr_xxxxxxxxxx could not be found. Check if it was correctly created and retry."
I would appreciate it if you could provide any guidance to resolve this issue.
So basically, the job you add to Glue will only run if there is not too much traffic in the region where your Glue job lives. If there are no resources available, you need to either manually re-add the job, or you can subscribe to CloudWatch events via SNS.
Also, there are parameters you can pass to the job, such as MaxRetries and Timeout.
If you get a "Resource unavailable", it won't trigger a retry because the job did not fail; it just never even started. But if you set the timeout to, let's say, 60 minutes, it will trigger an error after that time, decrement your retry pool, and re-launch the job.
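For reference, those settings can be supplied when the job is defined. A boto3 sketch under assumptions (the job name, role ARN, script location and region are placeholders):

import boto3

glue = boto3.client("glue", region_name="us-east-1")  # assumed region

glue.create_job(
    Name="redshift-to-s3-etl",                          # placeholder
    Role="arn:aws:iam::123456789012:role/GlueJobRole",  # placeholder role ARN
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/redshift_to_s3.py",  # placeholder
    },
    MaxRetries=2,  # automatic re-runs if the job itself fails
    Timeout=60,    # minutes; the job errors out (and can retry) after this long
)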
The closest thing I see to Glue documentation on this is here:
If you encounter errors in AWS Glue, use the following solutions to help you find the source of the problems and fix them. Note: The AWS Glue GitHub repository contains additional troubleshooting guidance in AWS Glue Frequently Asked Questions.
Error: Resource Unavailable
If AWS Glue returns a resource unavailable message, you can view error messages or logs to help you learn more about the issue. The following tasks describe general methods for troubleshooting.
• A custom DNS configuration without reverse lookup can cause AWS Glue to fail. Check your DNS configuration. If you are using Amazon Route 53 or Microsoft Active Directory, make sure that there are forward and reverse lookups. For more information, see Setting Up DNS in Your VPC.
• For any connections and development endpoints that you use, check that your cluster has not run out of elastic network interfaces.
I have recently struggled with "Resource unavailable" thrown by a Glue job.
Also, I was not able to make a direct connection in Glue using RDS -- it said "no suitable security group found".
I faced this issue while trying to connect with AWS RDS and Redshift.
The problem was with the Security Group that Redshift was using. You need to place a self-referencing inbound rule in the Security Group.
For those who don't know what a self-referencing inbound rule is, follow these steps:
1) Go to the Security Group you are using (VPC -> Security Group).
2) In the Inbound Rules, select Edit Inbound Rules.
3) Add a Rule:
a) Type - All Traffic
b) Protocol - All
c) Port Range - All
d) Source - Custom; in the space available, type the first characters of your security group and select it.
e) Save it.
It's done!
If this condition was missing from your Security Group inbound rules, try creating the connection again; this time you should be able to create it, and the job should work as well.
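If you prefer to add the rule from code instead of the console, this is roughly equivalent with boto3 (the security group ID and region are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
sg_id = "sg-0123456789abcdef0"                      # placeholder: your security group ID

ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[
        {
            "IpProtocol": "-1",                        # all protocols / all ports
            "UserIdGroupPairs": [{"GroupId": sg_id}],  # source = the group itself
        }
    ],
)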