Camunda 8 Python client issue

I tried the Camunda community Python client from this repo: https://github.com/camunda-community-hub/camunda-8-code-studio/tree/main/src/PythonCloudWorker. I have set up a Camunda 8 SaaS account to run the tasks from the repo.
I get the following error when I run the Python file. Any suggestions appreciated.
communda_connect.py:59: DeprecationWarning: There is no current event loop
loop = asyncio.get_event_loop()
E0118 00:29:19.302897000 6259650560 hpack_parser.cc:1218] Error parsing metadata: error=invalid value key=content-type value=text/plain; charset=utf-8
E0118 00:29:19.307140000 6259650560 hpack_parser.cc:1218] Error parsing metadata: error=invalid value key=content-type value=text/plain; charset=utf-8
E0118 00:29:19.310754000 6259650560 hpack_parser.cc:1218] Error parsing metadata: error=invalid value key=content-type value=text/plain; charset=utf-8
Traceback (most recent call last):
env/lib/python3.10/site-packages/grpc/aio/_call.py", line 236, in _raise_for_status
raise _create_rpc_error(await self.initial_metadata(), await
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
status = StatusCode.UNIMPLEMENTED
details = "Received http2 header with status: 404"
debug_error_string = "UNKNOWN:Error received from peer ipv4:32.12.17.224:443 {created_time:"2023-01-18T00:29:19.304994+05:30", grpc_status:12, grpc_message:"Received http2 header with status: 404"}"
>
During handling of the above exception, another exception occurred:
env/lib/python3.10/site-packages/pyzeebe/grpc_internals/zeebe_adapter_base.py", line 33, in _handle_grpc_error
raise pyzeebe_error
pyzeebe.errors.zeebe_errors.UnkownGrpcStatusCodeError

The problem was that I had not passed the region parameter, so it was defaulting to bru-2.
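import os
from pyzeebe import create_camunda_cloud_channel
camunda_region = os.environ.get('CAMUNDA_CLUSTER_REGION')  # pass the region explicitly instead of relying on the default
channel = create_camunda_cloud_channel(client_id=zeebe_client_id, client_secret=zeebe_client_secret, cluster_id=camundacloud_cluster_id, region=camunda_region)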
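One way to avoid silently connecting with the wrong region is to fail fast when a variable is unset. The following is only a minimal sketch: CAMUNDA_CLUSTER_REGION is the variable used above, while the other three variable names and the helper camunda_channel_kwargs are assumptions made up for illustration.

```python
import os


def camunda_channel_kwargs(env=os.environ):
    """Collect keyword arguments for create_camunda_cloud_channel from env."""
    # Only CAMUNDA_CLUSTER_REGION comes from the post; the other three
    # variable names are assumptions made for this sketch.
    names = {
        "client_id": "ZEEBE_CLIENT_ID",
        "client_secret": "ZEEBE_CLIENT_SECRET",
        "cluster_id": "CAMUNDA_CLUSTER_ID",
        "region": "CAMUNDA_CLUSTER_REGION",
    }
    # Treat unset and empty variables the same way: both are reported.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in names.items()}
```

The result could then be passed on as create_camunda_cloud_channel(**camunda_channel_kwargs()), so a missing region raises an error before any gRPC call is made.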


Creating Connection for RedshiftDataOperator

I went through the Airflow documentation for AWS Redshift. There are two operators that can execute a SQL query: RedshiftSQLOperator and RedshiftDataOperator. I have already implemented my job with RedshiftSQLOperator, but I want to use RedshiftDataOperator instead, because I want to go through the AWS API rather than the Postgres connection that RedshiftSQLOperator uses.
RedshiftDataOperator Documentation
I read this documentation and saw that there is an aws_conn_id parameter. But when I try to use the same connection ID, I get an error.
[2023-01-11, 04:55:56 UTC] {base.py:68} INFO - Using connection ID 'redshift_default' for task execution.
[2023-01-11, 04:55:56 UTC] {base_aws.py:206} INFO - Credentials retrieved from login
[2023-01-11, 04:55:56 UTC] {taskinstance.py:1889} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/redshift_data.py", line 146, in execute
self.statement_id = self.execute_query()
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/redshift_data.py", line 124, in execute_query
resp = self.hook.conn.execute_statement(**filter_values)
File "/home/airflow/.local/lib/python3.7/site-packages/botocore/client.py", line 415, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/botocore/client.py", line 745, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (UnrecognizedClientException) when calling the ExecuteStatement operation: The security token included in the request is invalid.
The task definition:
redshift_data_task = RedshiftDataOperator(
    task_id='redshift_data_task',
    database='rds',
    region='ap-southeast-1',
    aws_conn_id='redshift_default',
    sql="""
    call some_procedure();
    """
)
What should I fill in for the Airflow connection? The documentation gives no example of the values I should enter in Airflow. Thanks.
Airflow RedshiftDataOperator Connection Required Value
Have you tried using the Amazon Redshift connection? There is both an option for authenticating using your Redshift credentials:
Connection ID: redshift_default
Connection Type: Amazon Redshift
Host: <your-redshift-endpoint> (for example, redshift-cluster-1.123456789.us-west-1.redshift.amazonaws.com)
Schema: <your-redshift-database> (for example, dev, test, prod, etc.)
Login: <your-redshift-username> (for example, awsuser)
Password: <your-redshift-password>
Port: <your-redshift-port> (for example, 5439)
(source)
and an option for using an IAM role (there is an example in the first link).
Disclaimer: I work at Astronomer :)
EDIT: Tested the following with Airflow 2.5.0 and Amazon provider 6.2.0:
Added the IP of my Airflow instance to the VPC security group with "All traffic" access.
Airflow Connection with the connection id aws_default, Connection type "Amazon Web Services", extra: { "aws_access_key_id": "<your-access-key-id>", "aws_secret_access_key": "<your-secret-access-key>", "region_name": "<your-region-name>" }. All other fields blank. I used a root key for my toy-aws. If you use other credentials you need to make sure that IAM role has access and the right permissions to the Redshift cluster (there is a list in the link above).
Operator code:
red = RedshiftDataOperator(
    task_id="red",
    database="dev",
    sql="SELECT * FROM dev.public.users LIMIT 5;",
    cluster_identifier="redshift-cluster-1",
    db_user="awsuser",
    aws_conn_id="aws_default"
)
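The same "Amazon Web Services" connection can also be supplied without the UI. The sketch below assumes Airflow 2.3 or later, where a connection may be given as JSON in an AIRFLOW_CONN_<CONN_ID> environment variable; the helper name aws_connection_json is hypothetical and the values are the placeholders from the answer above.

```python
import json


def aws_connection_json(access_key_id, secret_access_key, region_name):
    """Serialize an 'Amazon Web Services' connection in Airflow's JSON format."""
    return json.dumps({
        "conn_type": "aws",
        # Same keys as the "extra" field shown in the answer above.
        "extra": {
            "aws_access_key_id": access_key_id,
            "aws_secret_access_key": secret_access_key,
            "region_name": region_name,
        },
    })


# Placeholder values only; never commit real keys.
print("AIRFLOW_CONN_AWS_DEFAULT=" + aws_connection_json(
    "<your-access-key-id>", "<your-secret-access-key>", "<your-region-name>"))
```

Exporting the printed variable in the scheduler and worker environment should make the connection available under the id aws_default, which is the id the operator code above refers to.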

Celery unexpectedly closes TCP connection

I'm using RabbitMQ 3.8.2 with Erlang 22.2.7 and having a problem while consuming tasks. My configuration is django-celery-rabbitmq. While publishing messages to a queue, everything goes fine until the length of the queue reaches 1200 messages. After this point RabbitMQ starts to close AMQP connections with the following errors:
...
2022-11-01 09:35:25.327 [info] <0.20608.9> accepting AMQP connection <0.20608.9> (185.121.83.107:60447 -> 185.121.83.116:5672)
2022-11-01 09:35:25.483 [info] <0.20608.9> connection <0.20608.9> (185.121.83.107:60447 -> 185.121.83.116:5672): user 'rabbit_admin' authenticated and granted access to vhost '/'
...
2022-11-01 09:36:59.129 [warning] <0.19994.9> closing AMQP connection <0.19994.9> (185.121.83.108:36149 -> 185.121.83.116:5672, vhost: '/', user: 'rabbit_admin'):
client unexpectedly closed TCP connection
...
[error] <0.11162.9> closing AMQP connection <0.11162.9> (185.121.83.108:57631 -> 185.121.83.116:5672):
{writer,send_failed,{error,enotconn}}
...
2022-11-01 09:35:48.256 [error] <0.20201.9> closing AMQP connection <0.20201.9> (185.121.83.108:50058 -> 185.121.83.116:5672):
{inet_error,enotconn}
...
Then the django-celery consumer disappears from queue list, messages become "ready" and celery pods are unable to ack the message after the job is finished with the following error:
ERROR: [2022-11-01 09:20:23] /usr/src/app/project/celery.py:114 handle_message Error while handling Rabbit task: [Errno 104] Connection reset by peer
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/amqp/connection.py", line 514, in channel
return self.channels[channel_id]
KeyError: None
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/src/app/project/celery.py", line 76, in handle_message
message.ack()
File "/usr/local/lib/python3.10/site-packages/kombu/message.py", line 125, in ack
self.channel.basic_ack(self.delivery_tag, multiple=multiple)
File "/usr/local/lib/python3.10/site-packages/amqp/channel.py", line 1407, in basic_ack
return self.send_method(
File "/usr/local/lib/python3.10/site-packages/amqp/abstract_channel.py", line 70, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/usr/local/lib/python3.10/site-packages/amqp/method_framing.py", line 186, in write_frame
write(buffer_store.view[:offset])
File "/usr/local/lib/python3.10/site-packages/amqp/transport.py", line 347, in write
self._write(s)
ConnectionResetError: [Errno 104] Connection reset by peer
I have noticed that the message size also affects this behavior. In the above case each message is roughly 1000-1500 characters long. If I decrease that to 50 characters, the threshold at which RabbitMQ starts to close AMQP connections shifts to 4000-5000 messages.
I suspect that RabbitMQ is running short of some resource, but I don't know how to find out what exactly is going wrong. If I run htop on the server, I see that the 2 available CPUs are never under high load (less than 20% each) and RAM usage is 400 MB of 3840 MB. So nothing seems to be wrong. Is there any resource checking command for RabbitMQ? The tasks do not take long to complete, about 10 seconds each.
Also maybe there are some missing heartbeats from the client (I had the problem earlier, but not now, there are currently no error messages about that).
Also if I run sudo journalctl --system | grep rabbitmq, I get the following output:
......
May 24 05:15:49 oms-git.omsystem sshd[809111]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=43.154.63.169 user=rabbitmq
May 24 05:15:51 oms-git.omsystem sshd[809111]: Failed password for rabbitmq from 43.154.63.169 port 37010 ssh2
May 24 05:15:51 oms-git.omsystem sshd[809111]: Disconnected from authenticating user rabbitmq 43.154.63.169 port 37010 [preauth]
May 24 16:12:32 oms-git.omsystem sudo[842182]: ad : TTY=pts/3 ; PWD=/var/log/rabbitmq ; USER=root ; COMMAND=/usr/bin/tail -f -n 1000 rabbit@XXX-git.log
......
Maybe there is another issue with the firewall here, but I don't see any error messages about that in /var/log/rabbitmq/rabbit@XXX.log.
My Celery configuration on client is like:
CELERY_TASK_IGNORE_RESULT = True
CELERY_RESULT_BACKEND = 'django-db'
CELERY_CACHE_BACKEND = 'django-cache'
CELERY_SEND_EVENTS = False
CELERY_BROKER_POOL_LIMIT = 30
CELERY_BROKER_HEARTBEAT = 30
CELERY_BROKER_CONNECTION_TIMEOUT = 600
CELERY_PREFETCH_MULTIPLIER = 1
CELERY_SEND_EVENTS = False
CELERY_WORKER_CONCURRENCY = 1
CELERY_TASK_ACKS_LATE = True
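A quick sanity check on names like these, offered only as a sketch: it assumes the project loads its Django settings with app.config_from_object('django.conf:settings', namespace='CELERY'), which the post does not show. Under that assumption each key is lowercased after the prefix is stripped, and a key that does not map to a documented setting would not take effect. The helper name unmapped_settings is hypothetical.

```python
# New-style Celery setting names that the config above appears to target.
KNOWN_SETTINGS = {
    "task_ignore_result", "result_backend", "cache_backend",
    "broker_pool_limit", "broker_heartbeat", "broker_connection_timeout",
    "worker_prefetch_multiplier", "worker_send_task_events",
    "worker_concurrency", "task_acks_late",
}


def unmapped_settings(django_settings, namespace="CELERY"):
    """Return namespaced keys whose lowercase remainder is not a known setting."""
    prefix = namespace + "_"
    return sorted(
        key for key in django_settings
        if key.startswith(prefix)
        and key[len(prefix):].lower() not in KNOWN_SETTINGS
    )


config = {
    "CELERY_TASK_ACKS_LATE": True,
    "CELERY_PREFETCH_MULTIPLIER": 1,
    "CELERY_SEND_EVENTS": False,
    "CELERY_BROKER_HEARTBEAT": 30,
}
print(unmapped_settings(config))
# prints ['CELERY_PREFETCH_MULTIPLIER', 'CELERY_SEND_EVENTS']
```

If the namespace assumption holds, the two reported keys would correspond to CELERY_WORKER_PREFETCH_MULTIPLIER and CELERY_WORKER_SEND_TASK_EVENTS. Whether that has any bearing on the closed connections is not established by the post.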
Currently I'm running the pod using following command:
celery -A project.celery worker -l info -f /var/log/celery/celery.log -Ofair
Also I have tried various arguments to limit prefetch or turn off the heartbeat, but it didn't work:
celery -A project.celery worker -l info -f /var/log/celery/celery.log --without-heartbeat --without-gossip --without-mingle
celery -A project.celery worker -l info -f /var/log/celery/celery.log --prefetch-multiplier=1 --pool=solo --
I expect there to be no limit on queue length, and every Celery pod in my Kubernetes cluster to consume and ack messages without errors.

ExportDicomData request of Google Cloud Healthcare API in GitHub tutorial never finishes

I'm trying the AutoML Vision ML codelab from the Cloud Healthcare API GitHub tutorials.
https://github.com/GoogleCloudPlatform/healthcare/blob/master/imaging/ml_codelab/breast_density_auto_ml.ipynb
I ran the "Export DICOM data" cell in the "Convert DICOM to JPEG" section, and the request succeeded, as did all the preceding cells.
But waiting for the operation to complete times out, and the operation never finishes.
(The ExportDicomData request status on the Dataset page stays "Running" for more than a day. I tried many times, and all the requests piled up in the "Running" state. A few times I started over from scratch, with the same result.)
What I have done so far:
1) Removed "output_config" because it causes an INVALID ARGUMENT error.
https://github.com/GoogleCloudPlatform/healthcare/issues/133
2) Enable Cloud Resource Manager API since it is needed.
This is the cell code.
# Path to export DICOM data.
dicom_store_url = os.path.join(HEALTHCARE_API_URL, 'projects', project_id, 'locations', location, 'datasets', dataset_id, 'dicomStores', dicom_store_id)
path = dicom_store_url + ":export"
# Headers (send request in JSON format).
headers = {'Content-Type': 'application/json'}
# Body (encoded in JSON format).
# output_config = {'output_config': {'gcs_destination': {'uri_prefix': jpeg_folder, 'mime_type': 'image/jpeg; transfer-syntax=1.2.840.10008.1.2.4.50'}}}
output_config = {'gcs_destination': {'uri_prefix': jpeg_folder, 'mime_type': 'image/jpeg; transfer-syntax=1.2.840.10008.1.2.4.50'}}
body = json.dumps(output_config)
resp, content = http.request(path, method='POST', headers=headers, body=body)
assert resp.status == 200, 'error exporting to JPEG, code: {0}, response: {1}'.format(resp.status, content)
print('Full response:\n{0}'.format(content))
# Record operation_name so we can poll for it later.
response = json.loads(content)
operation_name = response['name']
This is the result of waiting.
Waiting for operation completion...
Full response:
{
"name": "projects/my-datalab-tutorials/locations/us-central1/datasets/sample-dataset/operations/18300485449992372225",
"metadata": {
"@type": "type.googleapis.com/google.cloud.healthcare.v1beta1.OperationMetadata",
"apiMethodName": "google.cloud.healthcare.v1beta1.dicom.DicomService.ExportDicomData",
"createTime": "2019-08-18T10:37:49.809136Z"
}
}
AssertionError Traceback (most recent call last)
<ipython-input-18-1a57fd38ea96> in <module>()
21 timeout = time.time() + 10*60 # Wait up to 10 minutes.
22 path = os.path.join(HEALTHCARE_API_URL, operation_name)
---> 23 _ = wait_for_operation_completion(path, timeout)
<ipython-input-18-1a57fd38ea96> in wait_for_operation_completion(path, timeout)
15
16 print('Full response:\n{0}'.format(content))
---> 17 assert success, "operation did not complete successfully in time limit"
18 print('Success!')
19 return response
AssertionError: operation did not complete successfully in time limit
API Version is v1beta1.
I was wondering if somebody has any suggestion.
Thank you.
After retrying several times and leaving it running overnight, it finally succeeded. I don't know why.
There was a recent update to the codelabs. The error message is due to the timeout in the codelab and not the actual operation. This has been addressed in the update. Please let me know if you are still running into any issues!
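To make the distinction between the client-side timeout and the operation itself concrete, here is a minimal polling sketch. It is not the codelab's code: wait_for_operation and fetch_operation are hypothetical names, and it relies only on the fact that a Google long-running operation resource carries a "done" field once it has finished.

```python
import time


def wait_for_operation(fetch_operation, timeout_s, poll_interval_s=5.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until the operation reports done, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        # fetch_operation is any callable returning the operation as a dict,
        # for example a GET on HEALTHCARE_API_URL/<operation_name>.
        operation = fetch_operation()
        if operation.get("done"):
            return operation
        if clock() >= deadline:
            # Only the waiting stops here; the server-side export keeps running.
            raise TimeoutError(
                "client stopped waiting; the operation may still be running")
        sleep(poll_interval_s)
```

A TimeoutError from this function says nothing about the export failing, so the operation name can be polled again later with a longer timeout.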

Salt masters behind ELB have flaky connection to minions

I am running the following setup at AWS:
Elastic Loadbalancer in front of two EC2 machines (Amazon Linux) with a docker container that the salt-master runs in
Two EC2 instances with salt-minions installed
The 'master' value in the minion config is set to the dns of the loadbalancer (SaltMaster-env-vpc-test.szfegmankg.us-east-1.elasticbeanstalk.com)
The ELB accepts all traffic from the minions
The Salt-masters accept all traffic from the ELB as well as from the minions
The Salt-masters PKI Folder is shared between the two masters
The Salt-masters have the same private+public keys
The Salt-masters run on 2017.7.1
The Salt-minions run on 2016.11.5 (I tried it with 2017.7.1, but got the same results)
The Salt-minions accept all traffic from the ELB as well as from the masters
The master config looks as follows:
open_mode: True
worker_threads: 20
auto_accept: True
log_level: error
log_level_logfile: debug
extension_modules: srv/salt/ext
rest_cherrypy:
  port: 8000
  disable_ssl: True
  debug: True
external_auth:
  pam:
    saltdev:
      - .*
      - '@runner'
# Setting the job_cache to redis.
# The redis config settings are generated at the start of the docker container and
# will be written into /etc/salt/master.d/redis.conf
master_job_cache: redis
cache: redis
pki_dir: /etc/salt/pki/master/efs
The minion config looks as follows:
id: WIN-AB3GO7BJ72I
log_file: C:\salt.log
multiprocessing: False
log_level_logfile: debug
pki_dir: /conf/pki/minion
master: SaltMaster-env-vpc-test.szfegmankg.us-east-1.elasticbeanstalk.com
master_type: str
master_alive_interval: 30
open_mode: True
root_dir: c:\salt
ipc_mode: tcp
recon_default: 1000
recon_max: 199000
recon_randomize: True
In the master log files, I can see on both masters:
2017-09-05 10:06:18,118 [salt.utils.verify][DEBUG ][35] This salt-master instance has accepted 2 minion keys.
A salt-key -L on both masters yield the same result:
Accepted Keys:
WIN-AB3GO7BJ72I
WIN-EDMP9VB716B
Denied Keys:
Unaccepted Keys:
Rejected Keys:
So it looks like all is fine and everything should work. However, a test.ping is extremely flaky. Sometimes it works, but most of the time it doesn't.
Most of the time neither master gets any return from the minion and on the minion side I can see in the log that the minion never receives the message to execute 'test.ping' from the master.
Example 1:
test.ping from Master1:
root@d7383ff8f8bf:/# salt 'WIN-EDMP9VB716B' test.ping
[ERROR ] Exception raised when processing __virtual__ function for salt.loaded.int.cache.consul. Module will not be loaded: 'module' object has no attribute 'Consul'
[ERROR ] An un-handled exception was caught by salt's global exception handler:
KeyError: 'redis.ls'
Traceback (most recent call last):
File "/usr/bin/salt", line 10, in <module>
salt_main()
File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 476, in salt_main
client.run()
File "/usr/lib/python2.7/dist-packages/salt/cli/salt.py", line 173, in run
for full_ret in cmd_func(**kwargs):
File "/usr/lib/python2.7/dist-packages/salt/client/__init__.py", line 805, in cmd_cli
**kwargs):
File "/usr/lib/python2.7/dist-packages/salt/client/__init__.py", line 1597, in get_cli_event_returns
connected_minions = salt.utils.minions.CkMinions(self.opts).connected_ids()
File "/usr/lib/python2.7/dist-packages/salt/utils/minions.py", line 577, in connected_ids
search = self.cache.ls('minions')
File "/usr/lib/python2.7/dist-packages/salt/cache/__init__.py", line 244, in ls
return self.modules[fun](bank, **self._kwargs)
File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1113, in __getitem__
func = super(LazyLoader, self).__getitem__(item)
File "/usr/lib/python2.7/dist-packages/salt/utils/lazy.py", line 101, in __getitem__
raise KeyError(key)
KeyError: 'redis.ls'
Traceback (most recent call last):
File "/usr/bin/salt", line 10, in <module>
salt_main()
File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 476, in salt_main
client.run()
File "/usr/lib/python2.7/dist-packages/salt/cli/salt.py", line 173, in run
for full_ret in cmd_func(**kwargs):
File "/usr/lib/python2.7/dist-packages/salt/client/__init__.py", line 805, in cmd_cli
**kwargs):
File "/usr/lib/python2.7/dist-packages/salt/client/__init__.py", line 1597, in get_cli_event_returns
connected_minions = salt.utils.minions.CkMinions(self.opts).connected_ids()
File "/usr/lib/python2.7/dist-packages/salt/utils/minions.py", line 577, in connected_ids
search = self.cache.ls('minions')
File "/usr/lib/python2.7/dist-packages/salt/cache/__init__.py", line 244, in ls
return self.modules[fun](bank, **self._kwargs)
File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1113, in __getitem__
func = super(LazyLoader, self).__getitem__(item)
File "/usr/lib/python2.7/dist-packages/salt/utils/lazy.py", line 101, in __getitem__
raise KeyError(key)
KeyError: 'redis.ls'
I am aware that the redis error will be fixed soon https://github.com/saltstack/salt/issues/43295
Example 2:
test.ping from Master1, ~ 1 Minute after Example 1:
root@d7383ff8f8bf:/# salt 'WIN-EDMP9VB716B' test.ping
WIN-EDMP9VB716B:
True
Also during my tests, a test.ping from Master2 never succeeded.
I would like to know if there is some flaw in my setup that I am not seeing, or if Salt only works with an HA Proxy as an ELB?
Or maybe Salt doesn't work at all behind an ELB?
See https://github.com/saltstack/salt/issues/43368 for more answers.
TL;DR
Because there is no session stickiness for TCP connections, it is currently not possible to work with a Salt master that is behind an ELB if you use the ELB's IP/name as the entry point.
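A toy model may help picture this. It is a deliberate simplification, not Salt's code: it assumes each minion holds one long-lived subscription to a master's publish port and that the balancer hands new TCP connections to backends in turn. The names assign_connections and reachable_from are made up for the sketch.

```python
from itertools import cycle


def assign_connections(minions, masters):
    """Hand each minion's publish-port connection to the next master in turn."""
    backend = cycle(masters)
    return {minion: next(backend) for minion in minions}


def reachable_from(master, subscriptions):
    """Minions that would receive a job published by the given master."""
    return sorted(m for m, target in subscriptions.items() if target == master)


subscriptions = assign_connections(
    ["WIN-AB3GO7BJ72I", "WIN-EDMP9VB716B"], ["master1", "master2"])
print(reachable_from("master1", subscriptions))
```

In this model a job published from one master only reaches the minions whose connection happened to land on that master, which would match the observation that test.ping works from one master some of the time and never from the other.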

Invalid Token Error Using AWS Logs

I have been battling this for hours and it's driving me nuts. I installed the log agent and set it up correctly.
I can access the instance with this command: eb ssh
However, when I run the command sudo service awslogs restart, I get weird errors like
2017-06-12 16:31:41,899 - cwlogs.push.publisher - WARNING - 31909 -
Thread-7 - Caught exception: An error occurred
(UnrecognizedClientException) when calling the PutLogEvents operation:
The security token included in the request is invalid.
2017-06-12 16:31:41,899 - cwlogs.threads - ERROR - 31909 - Thread-7 -
Exception caught in <EventBatchPublisher(Thread-7, started daemon
140242458298112)>
Traceback (most recent call last):
I have changed the credentials multiple times, all to no avail.
Also, I get this error in the awslogs.log file:
2017-06-12 16:31:40,862 - cwlogs.push.reader -
WARNING - 31909 - Thread-8 - Fall back to previous event time:
{'timestamp': 1497246644000, 'start_position': 7142L, 'end_position':
7246L}, previousEventTime: 1497246644000, reason: timestamp could not
be parsed from message.
I am using the following format:
[/var/log/tomcat8/catalina.out]
datetime_format = %d-%b-%Y %H:%M:%S
file = /var/log/tomcat8/catalina.out
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = Catalina
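The "timestamp could not be parsed from message" warning can be reproduced offline by testing log lines against the datetime_format above. This is a sketch under two assumptions: that the agent interprets the format with strptime-style codes, and that month abbreviations are parsed in an English or C locale. The helper name parse_leading_timestamp and the sample lines in the usage note are made up.

```python
from datetime import datetime

DATETIME_FORMAT = "%d-%b-%Y %H:%M:%S"  # same value as datetime_format above


def parse_leading_timestamp(line, fmt=DATETIME_FORMAT):
    """Return the datetime at the start of line, or None if it does not match."""
    # This format always renders to a fixed width, so only that many
    # leading characters are handed to strptime.
    width = len(datetime(2000, 1, 1).strftime(fmt))
    try:
        return datetime.strptime(line[:width], fmt)
    except ValueError:
        return None
```

A line such as "12-Jun-2017 16:31:40.862 INFO [main] example" parses, while a continuation line such as a stack-trace frame returns None. Lines of the second kind are where an agent would have to fall back to the previous event time, as the warning describes.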
Any help at this point will be appreciated.
Kindly prepend "sudo" to the "aws configure" command, i.e. run sudo aws configure.