Kafka to Cloud Dataflow: Timeout expired while fetching topic metadata

I have set up Kafka in my GCP Cloud Shell terminal, and I am trying to read data from a Kafka topic into Cloud Dataflow using a flex template.
I am able to read the data using the console consumer on the terminal:
user#cloudshell:~/kafka_project/kafka/bin $ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic txtopic --from-beginning
Sample data:
{"prod_id": 100, "prod_name": "Redmi 100 X", "price": 8000,
"event_time": "2021-12-04 16:32:32"}
{"prod_id": 102, "prod_name": "Redmi 102 X", "price": 8500,
"event_time": "2021-12-04 16:33:02"}
But while running the Dataflow pipeline I am getting the error below:
{"container_id":"2ab55e9b52e11de98a17a4546ca119af7b607490008bdb0be273e0cb4c13f6dc","severity":"INFO","time":"2021/12/04 16:39:35.135010","line":"exec.go:64","message":"Caused by: **org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata**"}
I suspect Cloud Dataflow is not able to connect to the bootstrap server or topic, but I am unable to figure out the solution. Please help.
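One thing worth noting: `localhost:9092` refers to the Dataflow worker itself, not the Cloud Shell VM where Kafka is running, so a metadata fetch from the pipeline would time out exactly as shown above. As a quick sanity check (a sketch, not part of the pipeline), you can test plain TCP reachability of the bootstrap server from whatever environment the client runs in:

```python
import socket

def broker_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Kafka's metadata fetch times out the same way when this kind of
    connection cannot be established from the client's network.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the bootstrap server the pipeline is configured with.
# broker_reachable("localhost", 9092)
```

If this returns False from the pipeline's environment, the usual fix is to point the pipeline at an address the Dataflow workers can actually reach and configure Kafka's `advertised.listeners` to match.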

Related

Phillips-Labs terraform-aws-github-runner not creating ec2 instance

I am trying to set up self-hosted runners for GitHub using Terraform with the Phillips-Labs terraform-aws-github-runner module. I can see the GitHub webhook sending and receiving messages, the SQS queue receiving messages, and those messages being retrieved. The scale-up Lambda is firing, and I see the following logs:
2023-01-31 11:50:15.879 INFO [scale-up:22b11002-76d2-5596-9451-4c51746730c2 index.js:119051 scaleUp] Received workflow_job from {my-org}/terraform-aws-github-self-hosted-runners
{}
2023-01-31 11:50:15.880 INFO [scale-up:22b11002-76d2-5596-9451-4c51746730c2 index.js:119084 scaleUp] Received event
{
"runnerType": "Org",
"runnerOwner": "my-org",
"event": "workflow_job",
"id": "11002102910"
}
2023-01-31 11:50:16.188 DEBUG [gh-auth:22b11002-76d2-5596-9451-4c51746730c2 index.js:118486 createAuth] GHES API URL: {"runnerType":"Org","runnerOwner":"my-org","event":"workflow_job","id":"11002102910"}
2023-01-31 11:50:16.193 WARN [scale-runners:22b11002-76d2-5596-9451-4c51746730c2 index.js:118529 Runtime.handler] Ignoring error: error:1E08010C:DECODER routines::unsupported
{
"runnerType": "Org",
"runnerOwner": "my-org",
"event": "workflow_job",
"id": "11002102910"
}
I do not see any EC2 instances being created. I suspect the GHES API URL: log entry should have a value after it, but I'm not certain. Also, the final log line says it is ignoring an error...
I have confirmed my private key pem file is stored as a multi-line secret in secrets manager.
Any advice would be much appreciated!
It looks like not all of the permissions needed by the GitHub App are documented. I needed to add a subscription to the Workflow run event.

How to set NumberOfWorkers and WorkerType to G.2X in AWS Glue via Airflow?

I am trying to create a Glue job with the configuration 'NumberOfWorkers': 10, 'WorkerType': 'G.2X'. Here's my code for job creation:
glue_job_step = AwsGlueJobOperator(
    job_name=glue_job_name,
    job_desc="AWS Glue Job with Airflow",
    script_location="s3://<bucket_name>/scripts/test_spark_hello.py",
    create_job_kwargs={'GlueVersion': '3.0', 'NumberOfWorkers': 10, 'WorkerType': 'G.2X'},
    num_of_dpus=10,
    concurrent_run_limit=1,
    script_args=None,
    retry_limit=0,
    region_name=region_name,
    s3_bucket="s3_bucket_name",
    iam_role_name=glue_iam_role,
    run_job_kwargs=None,
    wait_for_completion=True,
    task_id='glue_job_step',
    dag=dag
)
And I am facing the following error:
Failed to run aws glue job, error: An error occurred (InvalidInputException) when calling the CreateJob operation: Please do not set Allocated Capacity if using Worker Type and Number of Workers.
Note:
If I remove 'NumberOfWorkers' and 'WorkerType', the job works with G.1X workers and 10 worker nodes. But I am just wondering how to upgrade to G.2X workers.
I am using AWS MWAA for the Airflow infrastructure.
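A possible reading of the error message: the operator's num_of_dpus parameter is what populates AllocatedCapacity in the CreateJob request, and the Glue API rejects AllocatedCapacity when WorkerType and NumberOfWorkers are also set. A sketch of the same task with num_of_dpus dropped (assuming your provider version otherwise accepts the same arguments):

```python
glue_job_step = AwsGlueJobOperator(
    task_id='glue_job_step',
    job_name=glue_job_name,
    job_desc="AWS Glue Job with Airflow",
    script_location="s3://<bucket_name>/scripts/test_spark_hello.py",
    # num_of_dpus removed: it maps to AllocatedCapacity, which the CreateJob
    # API rejects when WorkerType and NumberOfWorkers are also present
    create_job_kwargs={'GlueVersion': '3.0', 'NumberOfWorkers': 10, 'WorkerType': 'G.2X'},
    concurrent_run_limit=1,
    retry_limit=0,
    region_name=region_name,
    s3_bucket="s3_bucket_name",
    iam_role_name=glue_iam_role,
    wait_for_completion=True,
    dag=dag,
)
```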

Greengrass_HelloWorld lambda doesn't publish to Amazon IoT console

I have been following the documentation step by step, and I didn't face any errors. I configured, deployed, and made a subscription to the hello/world topic just as the documentation details. However, when I arrived at the testing step here: https://docs.aws.amazon.com/greengrass/latest/developerguide/lambda-check.html
No messages were showing up on the IoT console (subscription view, hello/world). I am using the Greengrass core daemon, which runs on my Ubuntu machine; it is active and listens on port 8000. I don't think there is anything wrong with my local device, because the group was deployed successfully and because I can see the communication going both ways in Wireshark.
I have these logs on my machine: /home/##/Desktop/greengrass/ggc/var/log/system/runtime.log:
[2019-09-28T06:57:42.492-07:00][INFO]-===========================================
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Version: 1.9.3-RC3
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Root: /home/##/Desktop/greengrass
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Write Directory: /home/##/Desktop/greengrass/ggc
[2019-09-28T06:57:42.492-07:00][INFO]-Group File Directory: /home/##/Desktop/greengrass/ggc/deployment/group
[2019-09-28T06:57:42.492-07:00][INFO]-Default Lambda UID: 122
[2019-09-28T06:57:42.492-07:00][INFO]-Default Lambda GID: 127
[2019-09-28T06:57:42.492-07:00][INFO]-===========================================
[2019-09-28T06:57:42.492-07:00][INFO]-The current core is using the AWS IoT certificates with fingerprint. {"fingerprint": "90##4d"}
[2019-09-28T06:57:42.492-07:00][INFO]-Will persist worker process info. {"dir": "/home/##/Desktop/greengrass/ggc/ggc/core/var/worker/processes"}
[2019-09-28T06:57:42.493-07:00][INFO]-Will persist worker process info. {"dir": "/home/##/Desktop/greengrass/ggc/ggc/core/var/worker/processes"}
[2019-09-28T06:57:42.494-07:00][INFO]-No proxy URL found.
[2019-09-28T06:57:42.495-07:00][INFO]-Started Deployment Agent to listen for updates.
[2019-09-28T06:57:42.495-07:00][INFO]-Connecting with MQTT. {"endpoint": "a6##ws-ats.iot.us-east-2.amazonaws.com:8883", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.497-07:00][INFO]-The current core is using the AWS IoT certificates with fingerprint. {"fingerprint": "90##4d"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection successful. {"attemptId": "GVko", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection established. {"endpoint": "a6##ws-ats.iot.us-east-2.amazonaws.com:8883", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection connected. Start subscribing. {"clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-Deployment agent connected to cloud.
[2019-09-28T06:57:42.685-07:00][INFO]-Start subscribing. {"numOfTopics": 2, "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-Trying to subscribe to topic $aws/things/simulators_gg_Core-gda/shadow/update/delta
[2019-09-28T06:57:42.727-07:00][INFO]-Trying to subscribe to topic $aws/things/simulators_gg_Core-gda/shadow/get/accepted
[2019-09-28T06:57:42.814-07:00][INFO]-All topics subscribed. {"clientId": "simulators_gg_Core"}
[2019-09-28T06:58:57.888-07:00][INFO]-Daemon received signal: terminated.
[2019-09-28T06:58:57.888-07:00][INFO]-Shutting down daemon.
[2019-09-28T06:58:57.888-07:00][INFO]-Stopping all workers.
[2019-09-28T06:58:57.888-07:00][INFO]-Lifecycle manager is stopped.
[2019-09-28T06:58:57.888-07:00][INFO]-IPC server stopped.
/home/##/Desktop/greengrass/ggc/var/log/system/localwatch/localwatch.log:
[2019-09-28T06:57:42.491-07:00][DEBUG]-will keep the log files for the following lambdas {"readingPath": "/home/##/Desktop/greengrass/ggc/var/log/user", "lambdas": "map[]"}
[2019-09-28T06:57:42.492-07:00][WARN]-failed to list the user log directory {"path": "/home/##/Desktop/greengrass/ggc/var/log/user"}
Thanks in advance.
I had a similar issue on another platform (a Jetson Nano). I could not get a response after going through the AWS instructions for setting up a simple Lambda with IoT Greengrass. In my search for answers I discovered that AWS has a qualification test suite for any device you connect.
It goes through an automated process of deploying and testing a Lambda function (as well as other functionality), reports results for each step, and the docs provide troubleshooting info for failures.
By going through those tests I was able to narrow down the issues with my setup, installation, and configuration. Here is a link to the test: https://docs.aws.amazon.com/greengrass/latest/developerguide/device-tester-for-greengrass-ug.html
If you follow the 'Next Topic' links, it will take you through the complete test. Fair warning: it's extensive and will take some time, but it gave me far more detailed insight than a hello world does.

AWS SageMaker - BlazingText BatchTransform no output

I have trained a BlazingText model and followed this guide:
https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html
The "Sample JSON request" invoke endpoint works perfectly, so I switched to a
batch transform job with content-type application/jsonlines and created a file in S3 with data in the following format:
{"source": "source_0"}
The job ran successfully, but no output was sent to S3. The last line in the CloudWatch logs is:
"[79] [INFO] Booting worker with pid: 79"
Does anyone know what went wrong?
I have found the issue. The batch transform selects a folder as input, so the S3 data source should be S3Prefix instead of ManifestFile.
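For reference, here is a minimal sketch of the input block for that kind of job, assuming it is submitted via boto3's create_transform_job; the bucket and prefix are placeholders:

```python
def build_transform_input(s3_uri):
    """TransformInput for a batch transform job that reads every JSON Lines
    object under an S3 prefix (rather than going through a manifest file)."""
    return {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",  # the fix: S3Prefix, not ManifestFile
                "S3Uri": s3_uri,
            }
        },
        "ContentType": "application/jsonlines",
        "SplitType": "Line",  # each {"source": ...} line is one record
    }

# This dict would be passed as TransformInput to
# sagemaker_client.create_transform_job(...), alongside TransformOutput
# and TransformResources.
```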

Mesos DCOS offer declined on AWS

I created a DC/OS cluster on AWS using the default config, and added two Kafka brokers using the CLI:
(DCOS) PS C:\DCOS> dcos kafka broker list
brokers:
  id: 1
  active: false
  state: stopped
  resources: cpus:2.00, mem:8192, heap:1024, port:auto
  failover: delay:1m, max-delay:10m
  stickiness: period:10m, expires:2016-03-22 15:58:51-04
When I start the broker, I see that the offer from the master was declined:
I0322 20:56:38.954476 1316 master.cpp:5350] Sending 2 offers to framework d8c03032-ebab-4c88-80cb-e2de92e3c4c4-0001 (kafka) at scheduler-fff6da19-e31e-4518-864e-2dfcdc31a5d2#10.0.3.104:53766
I0322 20:56:38.966846 1320 master.cpp:3673] Processing DECLINE call for offers: [ d8c03032-ebab-4c88-80cb-e2de92e3c4c4-O7389 ] for framework d8c03032-ebab-4c88-80cb-e2de92e3c4c4-0001 (kafka) at scheduler-fff6da19-e31e-4518-864e-2dfcdc31a5d2#10.0.3.104:53766
I0322 20:56:38.967591 1319 master.cpp:3673] Processing DECLINE call for offers: [ d8c03032-ebab-4c88-80cb-e2de92e3c4c4-O7390 ] for framework d8c03032-ebab-4c88-80cb-e2de92e3c4c4-0001 (kafka) at scheduler-fff6da19-e31e-4518-864e-2dfcdc31a5d2#10.0.3.104:53766
I0322 20:56:40.043771 1318 http.cpp:512] HTTP GET for /master/state-summary from 10.0.6.116:60000 with User-Agent='python-requests/2.6.0 CPython/3.4.2 Linux/4.1.7-coreos-r1'
I'm not able to find any relevant logs on the slaves to see what is going on.
/var/log/mesos has some files with no relevant info. As per the docs I should see syslog output in /var/log/messages, but that file doesn't exist (the default config provisions CoreOS). I tried journalctl but didn't find anything there either. Not sure how to debug this.