Slack Bot deployed in Cloud Foundry returns 502 Bad Gateway errors

In Slack, I have set up an app with a slash command. The app works well when I use a local ngrok server.
However, when I deploy the app server to PCF, it returns 502 errors:
[CELL/0] [OUT] Downloading droplet...
[CELL/SSHD/0] [OUT] Exit status 0
[APP/PROC/WEB/0] [OUT] Exit status 143
[CELL/0] [OUT] Cell e6cf018d-0bdd-41ca-8b70-bdc57f3080f1 destroying container for instance 28d594ba-c681-40dd-4514-99b6
[PROXY/0] [OUT] Exit status 137
[CELL/0] [OUT] Downloaded droplet (81.1M)
[CELL/0] [OUT] Cell e6cf018d-0bdd-41ca-8b70-bdc57f3080f1 successfully destroyed container for instance 28d594ba-c681-40dd-4514-99b6
[APP/PROC/WEB/0] [OUT] ⚡️ Bolt app is running! (development server)
[OUT] [APP ROUTE] - [2021-12-23T20:35:11.460507625Z] "POST /slack/events HTTP/1.1" 502 464 67 "-" "Slackbot 1.0 (+https://api.slack.com/robots)" "10.0.1.28:56002" "10.0.6.79:61006" x_forwarded_for:"3.91.15.163, 10.0.1.28" x_forwarded_proto:"https" vcap_request_id:"7fe6cea6-180a-4405-5e5e-6ba9d7b58a8f" response_time:0.003282 gorouter_time:0.000111 app_id:"f1ea0480-9c6c-42ac-a4b8-a5a4e8efe5f3" app_index:"0" instance_id:"f46918db-0b45-417c-7aac-bbf2" x_cf_routererror:"endpoint_failure (use of closed network connection)" x_b3_traceid:"31bf5c74ec6f92a20f0ecfca00e59007" x_b3_spanid:"31bf5c74ec6f92a20f0ecfca00e59007" x_b3_parentspanid:"-" b3:"31bf5c74ec6f92a20f0ecfca00e59007-31bf5c74ec6f92a20f0ecfca00e59007"
Besides endpoint_failure (use of closed network connection), I also see:
x_cf_routererror:"endpoint_failure (EOF (via idempotent request))"
x_cf_routererror:"endpoint_failure (EOF)"
In PCF, I created an https:// route for the app. This is the URL I put into my Slack App's "Redirect URLs" section as well as my Slash command URL.
In Slack, the URLs end with /slack/events
This configuration all works locally, so I assume I have missed some configuration in PCF.
Manifest.yml:
applications:
- name: kafbot
  buildpacks:
    - https://github.com/starkandwayne/librdkafka-buildpack/releases/download/v1.8.2/librdkafka_buildpack-cached-cflinuxfs3-v1.8.2.zip
    - https://github.com/cloudfoundry/python-buildpack/releases/download/v1.7.48/python-buildpack-cflinuxfs3-v1.7.48.zip
  instances: 1
  disk_quota: 2G
  # health-check-type: process
  memory: 4G
  routes:
    - route: "kafbot.apps.prod.fake_org.cloud"
  env:
    KAFKA_BROKER: 10.32.17.182:9092,10.32.17.183:9092,10.32.17.184:9092,10.32.17.185:9092
    SLACK_BOT_TOKEN: ((slack_bot_token))
    SLACK_SIGNING_SECRET: ((slack_signing_key))
  command: python app.py

When x_cf_routererror says endpoint_failure, it means the application did not handle the request that Gorouter sent to it, for whatever reason.
From there, you want to look at response_time. If the response time is high (typically almost exactly the timeout value, like 60s), your application is not responding quickly enough. If the value is low, it points to a connection problem, such as Gorouter being unable to establish a TCP connection to the application.
Normally this shouldn't happen. The platform runs a health check that makes sure the application is up and listening for requests before routing traffic to it; if the app never listens on its assigned port, the instance is not considered started.
In this particular case, the manifest has health-check-type: process, which disables the standard port-based health check in favor of a process-based one. That allows the application to start up even if it's not listening on the right port, so when Gorouter sends a request to the expected port, it cannot connect. Side note: typically, you'd only use process-based health checks if your application is not listening for incoming requests at all.
The platform passes in a $PORT environment variable (currently always 8080, though that could change in the future). You need to make sure your app is listening on that port, and that it binds to 0.0.0.0, not localhost or 127.0.0.1, so connections from outside the container are accepted.
This ensures that Gorouter can deliver requests to your application on the agreed-upon port.
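For a Bolt for Python app like the one in these logs, that usually means reading PORT from the environment when starting the app. A minimal sketch, assuming Bolt's built-in development server (the slash command name and handler here are hypothetical):

import os
from slack_bolt import App

# Credentials come from the environment, matching the manifest's env block.
app = App(
    token=os.environ["SLACK_BOT_TOKEN"],
    signing_secret=os.environ["SLACK_SIGNING_SECRET"],
)

# Hypothetical slash command handler, just to make the sketch complete.
@app.command("/kafbot")
def handle_command(ack, respond):
    ack()
    respond("kafbot is alive")

if __name__ == "__main__":
    # Listen on the platform-assigned port, falling back to 8080 for local runs.
    # Bolt's development server serves /slack/events by default (matching the
    # Slack URLs above) and listens on all interfaces; a production WSGI server
    # would need to be told host="0.0.0.0" explicitly.
    app.start(port=int(os.environ.get("PORT", 8080)))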

Related

SSL_connect returned=1 errno=0 state=SSLv2/v3 read server hello A: (null) when communicating with Chef Automate server

I am having difficulty connecting to my Chef Automate server, hosted on AWS OpsWorks.
I usually connect to it at least once per day; however, since the start of the week I have been unable to.
There is some weekly maintenance performed on the server on Fridays, but that normally passes without me noticing any impact.
When I try to communicate with the server I get the following error:
knife environment from file environments/production.json
ERROR: SSL Validation failure connecting to host: crmpicco-production-lay0vgyp4ighjsxv.us-east-1.opsworks-cm.io - SSL_connect returned=1 errno=0 state=SSLv2/v3 read server hello A: (null)
ERROR: SSL Error connecting to https://crmpicco-production-lay0vgyp4ighjsxv.us-east-1.opsworks-cm.io/organizations/rfc1872/environments/production, retry 1/5
In the events, I can see the following:
2022-08-26T12:25:26Z Maintenance completed successfully
2022-08-26T12:24:54Z Updating stack arn:aws:cloudformation:us-east-1:367114569123:stack/aws-opsworks-cm-instance-mc-prod-chef-1661515433111/27c16c50-2537-22ed-80ab-12a4e5696267 to associate EIP 2.51.125.211
2022-08-26T12:24:23Z Updating stack arn:aws:cloudformation:us-east-1:367114569123:stack/aws-opsworks-cm-instance-mc-prod-chef-1660910626222/fad95750-1fb6-22ed-817f-0aca43928f1d to disassociate EIP 2.51.125.211
2022-08-26T12:24:11Z Checking health of new instance
I have tried a knife ssl fetch, but that is also unable to communicate with the server.

strongswan configuration and traffic on tunnel problem IKEv2

I'm new to this area. I tried to configure a strongSwan site-to-site VPN between two CentOS 7 instances (in different regions) on Google Cloud Platform. I followed these guides:
https://blog.ruanbekker.com/blog/2018/02/11/setup-a-site-to-site-ipsec-vpn-with-strongswan-and-preshared-key-authentication/
https://www.tecmint.com/setup-ipsec-vpn-with-strongswan-on-centos-rhel-8/
https://medium.com/@georgeswizzalonge/how-to-setup-a-site-to-site-vpn-connection-with-strongswan-32d4ed034ae2
This ipsec.conf comes from site A:
config setup
    charondebug="all"
    strictcrlpolicy=no
    uniqueids = yes

conn sg-to-jkt
    authby=secret
    left=%defaultroute
    leftid=34.xx.xx.xxx
    leftsubnet=10.xxx.x.xx/24
    right=34.xxx.xxx.xxx
    rightsubnet=10.xxx.x.x/24
    ike=aes256-sha2_256-modp1024!
    esp=aes256-sha2_256!
    keyingtries=0
    ikelifetime=1h
    lifetime=8h
    dpddelay=30
    dpdtimeout=120
    dpdaction=restart
    auto=start
ipsec.secrets file for site A:
site-A site-B : PSK "someencryptedkey"
This is the ipsec.conf for site B:
config setup
    charondebug="all"
    strictcrlpolicy=no
    uniqueids = yes

conn jkt-to-sg
    authby=secret
    left=%defaultroute
    leftid=34.xxx.xxx.xxx
    leftsubnet=10.xxx.x.x/24
    right=34.xx.xx.xxx
    rightsubnet=10.xxx.x.xx/24
    ike=aes256-sha2_256-modp1024!
    esp=aes256-sha2_256!
    keyingtries=0
    ikelifetime=1h
    lifetime=8h
    dpddelay=30
    dpdtimeout=120
    dpdaction=restart
    auto=start
ipsec.secrets file for site B:
site-B site-A : PSK "someencryptedkey"
My questions are:
Why does the strongswan service become dead/inactive (per systemctl status strongswan) every time I restart it with strongswan restart? (Note: the strongSwan tunnel stays up.)
● strongswan.service - strongSwan IPsec IKEv1/IKEv2 daemon using ipsec.conf
Loaded: loaded (/usr/lib/systemd/system/strongswan.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Sun 2020-10-11 16:37:06 UTC; 32min ago
There is no traffic in the ESP protocol: tcpdump esp does not display anything, yet the strongSwan tunnel is up. I also noticed that the status output differs from the examples in the guides: it reports "ESP in UDP SPIs" instead of "ESP SPIs". Is there any difference, or is something else going on?
Thank you for your help and advice.
You might check your systemd service file strongswan.service and change the Type= option.
By default you have Type=simple, which works for many systemd services, but it does not work when the command in ExecStart launches another process and then exits. Consider explicitly specifying Type=forking in the [Service] section so that systemd knows to track the spawned process rather than the initial one.
From man systemd.service:
If set to forking, it is expected that the process configured with ExecStart= will call fork() as part of its start-up. The parent process is expected to exit when start-up is complete and all communication channels are set up. The child continues to run as the main daemon process. This is the behavior of traditional UNIX daemons. If this setting is used, it is recommended to also use the PIDFile= option, so that systemd can identify the main process of the daemon. systemd will proceed with starting follow-up units as soon as the parent process exits.
Additionally, I found another Stack Overflow thread with a similar issue.
But please see man systemd.service for the appropriate type.
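As a sketch, a drop-in override could look like the following, assuming the packaged unit's start command forks (the PIDFile path is an assumption; adjust it to wherever charon writes its PID on your system):

# /etc/systemd/system/strongswan.service.d/override.conf
[Service]
Type=forking
PIDFile=/var/run/charon.pid

Then run systemctl daemon-reload and restart the service for the change to take effect.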
For your second question, you might check your firewall; I found another similar case at this link.
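One additional note on the "ESP in UDP SPIs" output: it indicates the tunnel is using NAT traversal, so the ESP packets are encapsulated in UDP on port 4500 and a plain tcpdump esp will not show them. Assuming the default NAT-T port, something like this should reveal the tunnel traffic (and your GCP firewall rules would need to allow UDP 500 and 4500 rather than raw ESP):

tcpdump -n udp port 4500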

Gremlin remote command fails with timeout error: Host did not respond in a timely fashion

I connected to a remote Gremlin server via the Gremlin Groovy shell. The connection succeeded, but any remote command I try to execute gives a timeout error, even the command :> 1+1.
gremlin> :remote connect tinkerpop.server conf/senthil.yaml
==>Connected - 10.40.40.65/10.40.40.65:50080
gremlin> :> 1+1
Host did not respond in a timely fashion - check the server status and submit again.
Display stack trace? [yN]
org.apache.tinkerpop.gremlin.groovy.plugin.RemoteException: Host did not respond in a timely fashion - check the server status and submit again.
at org.apache.tinkerpop.gremlin.console.groovy.plugin.DriverRemoteAcceptor.submit(DriverRemoteAcceptor.java:120)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
at org.apache.tinkerpop.gremlin.console.commands.SubmitCommand.execute(SubmitCommand.groovy:41)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
at org.codehaus.groovy.tools.shell.Shell.execute(Shell.groovy:101)
at org.codehaus.groovy.tools.shell.Groovysh.super$2$execute(Groovysh.groovy)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
This is my conf file: remote.yaml
hosts: [10.40.40.65]
port: 50080
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
I'm using dynamodb + titan.
You might not have a truly successful connection. The console (and the underlying driver) is optimistic in that it doesn't fail a connection until a request is sent, as it expects the server may come online "later". I would go back to investigating whether the server is actually running, whether you have the right IP, whether the server's host property is set to something like "localhost" while you are connecting remotely, whether the port is open, whether you are using a compatible version of TinkerPop, etc.
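For the host binding in particular, a sketch of the relevant lines in the server-side gremlin-server.yaml (values assumed from the question; many sample configs default host to localhost, which rejects remote connections):

host: 0.0.0.0
port: 50080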

Spark - Remote Akka Client Disassociated

I am setting up Spark 0.9 on AWS and am finding that when launching the interactive Pyspark shell, my executors / remote workers are first being registered:
14/07/08 22:48:05 INFO cluster.SparkDeploySchedulerBackend: Registered executor:
Actor[akka.tcp://sparkExecutor@ip-xx-xx-xxx-xxx.ec2.internal:54110/user/Executor#-862786598] with ID 0
and then disassociated almost immediately, before I have the chance to run anything:
14/07/08 22:48:05 INFO cluster.SparkDeploySchedulerBackend: Executor 0 disconnected,
so removing it
14/07/08 22:48:05 ERROR scheduler.TaskSchedulerImpl: Lost an executor 0 (already
removed): remote Akka client disassociated
Any idea what might be wrong? I've tried adjusting the Spark options spark.akka.frameSize and spark.akka.timeout, but I'm pretty sure this is not the issue, since (1) I'm not running anything to begin with, and (2) my executors are disconnecting a few seconds after startup, well within the default 100s timeout.
Thanks!
Jack
I had a very similar problem, if not the same one.
It started working for me once the workers connected to the master using the exact same name the master thought it had.
My log messages were something like:
ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker@idc1-hrm1.heylinux.com:7078] -> [akka.tcp://sparkMaster@vagrant-centos64.vagrantup.com:7077]: Error [Association failed with [akka.tcp://sparkMaster@vagrant-centos64.vagrantup.com:7077]].
ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker@192.168.121.127:7078] -> [akka.tcp://sparkMaster@idc1-hrm1.heylinux.com:7077]: Error [Association failed with [akka.tcp://sparkMaster@idc1-hrm1.heylinux.com:7077]]
WARN util.Utils: Your hostname, idc1-hrm1 resolves to a loopback address: 127.0.0.1; using 192.168.121.187 instead (on interface eth0)
So check the log of the master and see what name it thinks it has.
Then use that very same name on the workers.
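A sketch of how that looks for a standalone cluster of that era (the hostname is taken from the logs above; SPARK_MASTER_IP was the Spark 0.9-era setting, renamed SPARK_MASTER_HOST in later releases):

# conf/spark-env.sh on the master: pin the name the master binds and advertises
export SPARK_MASTER_IP=idc1-hrm1.heylinux.com

# on each worker, connect using exactly that name
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://idc1-hrm1.heylinux.com:7077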

What is the reason for the AWS health status becoming RED?

I've deployed an application to AWS elastic beanstalk.
After starting the application, it runs well. But after 5 minutes (I set the health check to every 5 minutes) it fails, and when I access the URL I get an HTTP 503 error back.
From the event info, I only get the message that the health status went from YELLOW to GREEN.
But how can I get more detailed info, and what can I do about this error?
BTW: I don't understand whether the RED health status is what prevents the application from starting up, or whether something else failed first, causing the application to fail and the health status to then become RED.
Elastic Load Balancing has a health check daemon that checks the path you've provided for a 200-range HTTP status.
If there is a problem with the application, or it's not returning a 2xx status code, or you've misconfigured the health check URL, the status will go RED.
Two things you can do to see what's going on:
Hit the hostname of an individual instance in your web browser, particularly the health check path (a quick way to do this from a shell is sketched after this list). Are you seeing what you expected?
SSH into the instance and check the logs in /var/log and /opt/elasticbeanstalk/var/log. Are there any errors that you can find?
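A minimal sketch of that first check from a shell; the instance hostname and health check path here are placeholders:

# prints just the HTTP status code; anything outside 200-299 (or a hang) explains the RED status
curl -s -o /dev/null -w "%{http_code}\n" http://ec2-xx-xx-xx-xx.compute-1.amazonaws.com/healthcheck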
Without knowing more about your application, stack or container type, that's the best I can do.
I hope this helps! :)