I am using neo4j-enterprise-3.0.4 in a cluster on AWS with the Bolt protocol. I'm using HAProxy to route requests to the master and the slaves in the HA cluster.
These are the settings in my HAProxy haproxy.cfg:
global
log 127.0.0.1 local0
log 127.0.0.1 local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
maxconn 256
defaults
log global
mode tcp
option tcplog
option dontlognull
timeout connect 30s
timeout client 2h
timeout server 2h
frontend http-in
bind *:81
acl write_method method POST DELETE PUT
acl write_hdr hdr_val(X-Write) eq 1
acl write_payload payload(0,0) -m reg -i CREATE|MERGE|SET|DELETE|REMOVE
acl tx_cypher_endpoint path_beg /db/data/transaction
http-request set-var(txn.tx_cypher_endpoint) bool(true) if tx_cypher_endpoint
use_backend neo4j-master if write_hdr
use_backend neo4j-master if tx_cypher_endpoint write_payload
use_backend neo4j-all if tx_cypher_endpoint
use_backend neo4j-master if write_method
default_backend neo4j-all
backend neo4j-all
option httpchk GET /db/manage/server/ha/available HTTP/1.0\r\nAuthorization:\ Basic\ [code]
acl tx_cypher_endpoint var(txn.tx_cypher_endpoint),bool
stick-table type integer size 1k expire 70s # slightly higher than org.neo4j.server.transaction.timeout
stick match path,word(4,/) if tx_cypher_endpoint
stick store-response hdr(Location),word(6,/) if tx_cypher_endpoint
server neo4j-1 192.0.0.250:7687 check port 7474
server neo4j-2 192.0.0.251:7687 check port 7474
server neo4j-3 192.0.0.252:7687 check port 7474
backend neo4j-master
option httpchk GET /db/manage/server/ha/master HTTP/1.0\r\nAuthorization:\ Basic\ [code]
server neo4j-1 192.0.0.250:7687 check port 7474
server neo4j-2 192.0.0.251:7687 check port 7474
server neo4j-3 192.0.0.252:7687 check port 7474
listen admin
bind *:82
mode http
stats enable
stats uri /haproxy?stats
stats realm Haproxy\ Statistics
stats auth admin:admin
Sometimes I get this error when I run Cypher in the browser:
"errors": [
{
"code": "Neo.ClientError.Transaction.TransactionNotFound",
"message": "Unrecognized transaction id. Transaction may have timed out and been rolled back."
}
]
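For context, the stick match / stick store-response rules in the neo4j-all backend are there because the transactional Cypher endpoint is stateful: the first POST to /db/data/transaction returns a Location header containing the transaction URL, and every follow-up request for that transaction has to reach the same instance, otherwise Neo4j answers with TransactionNotFound. A rough sketch of that flow with the Python requests library (host, credentials and query are placeholders, and in this setup requests would normally go through the proxy rather than straight to one instance):
import requests

# Placeholder host and credentials.
BASE = "http://neo4j-host:7474"
AUTH = ("neo4j", "password")

# Opening a transaction returns a Location header such as
# http://neo4j-host:7474/db/data/transaction/42
r = requests.post(BASE + "/db/data/transaction", json={"statements": []}, auth=AUTH)
tx_url = r.headers["Location"]

# Later statements and the final commit must land on the same instance,
# otherwise the response is Neo.ClientError.Transaction.TransactionNotFound.
requests.post(tx_url, json={"statements": [{"statement": "MATCH (n) RETURN count(n)"}]}, auth=AUTH)
requests.post(tx_url + "/commit", json={"statements": []}, auth=AUTH)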
I also tried the following HAProxy configuration, but I still have the same problem. These are the settings in my second haproxy.cfg:
global
log 127.0.0.1 local0
log 127.0.0.1 local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
maxconn 256
defaults
log global
mode tcp
option tcplog
option dontlognull
timeout connect 30s
timeout client 2h
timeout server 2h
frontend http-in
bind *:81
acl write_method method POST DELETE PUT
acl write_hdr hdr_val(X-Write) eq 1
acl write_payload payload(0,0) -m reg -i CREATE|MERGE|SET|DELETE|REMOVE
acl tx_cypher_endpoint path_beg /db/data/transaction
http-request set-var(txn.tx_cypher_endpoint) bool(true) if tx_cypher_endpoint
use_backend neo4j-master if write_hdr
use_backend neo4j-master if tx_cypher_endpoint write_payload
use_backend neo4j-all if tx_cypher_endpoint
use_backend neo4j-master if write_method
default_backend neo4j-all
backend neo4j-all
option httpchk GET /db/manage/server/ha/master HTTP/1.0\r\nAuthorization:\ Basic\ [code]
acl tx_cypher_endpoint var(txn.tx_cypher_endpoint),bool
stick-table type integer size 1k expire 70s # slightly higher than org.neo4j.server.transaction.timeout
stick match path,word(4,/) if tx_cypher_endpoint
stick store-response hdr(Location),word(6,/) if tx_cypher_endpoint
server neo4j-1 192.0.0.250:7687 check port 7474
server neo4j-2 192.0.0.251:7687 check port 7474
server neo4j-3 192.0.0.252:7687 check port 7474
backend neo4j-master
option httpchk GET /db/manage/server/ha/slave HTTP/1.0\r\nAuthorization:\ Basic\ [code]
server neo4j-1 192.0.0.250:7687 check port 7474
server neo4j-2 192.0.0.251:7687 check port 7474
server neo4j-3 192.0.0.252:7687 check port 7474
listen admin
bind *:82
mode http
stats enable
stats uri /haproxy?stats
stats realm Haproxy\ Statistics
stats auth admin:admin
So I am not sure why this is happening. Is it because of HAProxy, AWS, or Bolt? When I switch the protocol to HTTP, everything works well and I do not get the error.
I fixed this problem by adding these parameters to the HAProxy .cfg:
a backend neo4j-browser with mode http and option prefer-last-server. Now HAProxy works like a charm and I do not get the error any more.
global
log 127.0.0.1 local0
log 127.0.0.1 local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
maxconn 256
defaults
log global
mode tcp
option tcplog
option dontlognull
timeout connect 30s
timeout client 2h
timeout server 2h
frontend http-in
bind *:81
acl write_method method POST DELETE PUT
acl write_hdr hdr_val(X-Write) eq 1
acl write_payload payload(0,0) -m reg -i CREATE|MERGE|SET|DELETE|REMOVE
acl tx_cypher_endpoint path_beg /db/data/transaction
http-request set-var(txn.tx_cypher_endpoint) bool(true) if tx_cypher_endpoint
use_backend neo4j-master if write_hdr
use_backend neo4j-master if tx_cypher_endpoint write_payload
use_backend neo4j-all if tx_cypher_endpoint
use_backend neo4j-master if write_method
default_backend neo4j-all
frontend http-browse
bind *:83
mode http
default_backend neo4j-browser
backend neo4j-all
option httpchk GET /db/manage/server/ha/available HTTP/1.0\r\nAuthorization:\ Basic\ [code]
acl tx_cypher_endpoint var(txn.tx_cypher_endpoint),bool
stick-table type integer size 1k expire 70s # slightly higher than org.neo4j.server.transaction.timeout
stick match path,word(4,/) if tx_cypher_endpoint
stick store-response hdr(Location),word(6,/) if tx_cypher_endpoint
server neo4j-1 192.0.0.250:7687 check port 7474
server neo4j-2 192.0.0.251:7687 check port 7474
server neo4j-3 192.0.0.252:7687 check port 7474
backend neo4j-master
option httpchk GET /db/manage/server/ha/master HTTP/1.0\r\nAuthorization:\ Basic\ [code]
server neo4j-1 192.0.0.250:7687 check port 7474
server neo4j-2 192.0.0.251:7687 check port 7474
server neo4j-3 192.0.0.252:7687 check port 7474
backend neo4j-browser
mode http
option prefer-last-server
option httpchk GET /db/manage/server/ha/master HTTP/1.0\r\nAuthorization:\ Basic\ [code]
server neo4j-1 192.0.0.250:7474 check
server neo4j-2 192.0.0.251:7474 check
server neo4j-3 192.0.0.252:7474 check
listen admin
bind *:82
mode http
stats enable
stats uri /haproxy?stats
stats realm Haproxy\ Statistics
stats auth admin:admin
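To sanity-check the working setup, here is a rough sketch of reading through the proxy with the official neo4j Python driver, assuming a recent driver version (the 1.x driver that shipped alongside Neo4j 3.0 uses a different import path) and assuming the Bolt frontend is reachable on port 81 of the HAProxy host:
from neo4j import GraphDatabase

# Placeholder address and credentials; point this at the HAProxy frontend
# that forwards Bolt traffic (port 81 in the config above).
driver = GraphDatabase.driver("bolt://haproxy-host:81", auth=("neo4j", "password"))

with driver.session() as session:
    # A simple read to confirm routing through the proxy works.
    result = session.run("MATCH (n) RETURN count(n) AS nodes")
    print(result.single()["nodes"])

driver.close()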
I have a Flask-based website, and it works fine when I run it on my local PC at localhost.
I cloned the app onto a recently built Ubuntu PC and tried to host the website there. I installed Apache2 and set up the conf file (sudo nano /etc/apache2/sites-available/000-default.conf) as follows:
<VirtualHost *:80>
# The ServerName directive sets the request scheme, hostname and port that
# the server uses to identify itself. This is used when creating
# redirection URLs. In the context of virtual hosts, the ServerName
# specifies what hostname must appear in the request's Host: header to
# match this virtual host. For the default virtual host (this file) this
# value is not decisive as it is used as a last resort host regardless.
# However, you must set it for any further virtual host explicitly.
#ServerName www.example.com
ServerName 10.11.220.58
ServerAdmin webmaster@localhost
#DocumentRoot /var/www/html
#DocumentRoot /home/user_name/imageFIFA/frameworks.ai.image-inference
WSGIScriptAlias / /home/user_name/imageFIFA/frameworks.ai.image-inference/app.wsgi
<Directory /home/user_name/imageFIFA/frameworks.ai.image-inference>
Order allow,deny
Allow from all
</Directory>
# Available loglevels: trace8, ..., trace1, debug, info, notice, warn,
# error, crit, alert, emerg.
# It is also possible to configure the loglevel for particular
# modules, e.g.
#LogLevel info ssl:warn
ErrorLog /home/user_name/imageFIFA/frameworks.ai.image-inference/logs/error.Log
CustomLog /home/user_name/imageFIFA/frameworks.ai.image-inference/logs/acess.log combined
</VirtualHost>
I created an app.wsgi file in the Flask project folder (/home/user_name/imageFIFA/frameworks.ai.image-inference) as follows:
import sys
sys.path.insert(0,'/home/user_name/imageFIFA/frameworks.ai.image-inference')
from interceptor import app as application
My Flask app is located at
/home/ad_pgooneti/imageFIFA/frameworks.ai.image-inference
When I run my front-end Flask app and test the response with wget, I get the following:
--2022-09-17 12:24:36-- http://localhost:5000/
Resolving localhost (localhost)... 127.0.0.1, ::1
Connecting to localhost (localhost)|127.0.0.1|:5000... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8943 (8.7K) [text/html]
Saving to: 'index.html'
index.html 100%[===================================>] 8.73K --.-KB/s in 0s
2022-09-17 12:24:36 (236 MB/s) - 'index.html' saved [8943/8943]
But if I use the server IP address 10.11.220.58 with wget, I get the following:
--2022-09-17 12:29:43-- http://10.11.220.58/
Connecting to 10.11.220.58:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2022-09-17 12:29:43 ERROR 403: Forbidden.
So I feel that the redirection is not working. The Apache2 server is running fine when I check its status:
* apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2022-09-17 12:02:23 PDT; 29min ago
Docs: https://httpd.apache.org/docs/2.4/
Process: 1392385 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
Main PID: 1392411 (apache2)
Tasks: 55 (limit: 19140)
Memory: 10.4M
CGroup: /system.slice/apache2.service
|-1392411 /usr/sbin/apache2 -k start
|-1392412 /usr/sbin/apache2 -k start
`-1392413 /usr/sbin/apache2 -k start
Sep 17 12:02:23 ImageFIFA systemd[1]: Starting The Apache HTTP Server...
Sep 17 12:02:23 ImageFIFA systemd[1]: Started The Apache HTTP Server.
Any help with why it's not working? The Flask app runs on the default port 5000, but I am not sure where I am supposed to specify that in the conf and app.wsgi files.
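For what it's worth, under mod_wsgi the port 5000 setting is not used at all: Apache imports the application object from app.wsgi and serves it on the VirtualHost port (80 in the config above), while port 5000 only matters for Flask's built-in development server. A rough sketch of how the Flask module typically looks (interceptor is the module name from app.wsgi; the route is illustrative, not the actual code):
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from Flask"

if __name__ == "__main__":
    # Only the built-in development server uses port 5000; under mod_wsgi this
    # block never runs and Apache serves the app on the VirtualHost port (80).
    app.run(host="0.0.0.0", port=5000)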
I have a FastAPI app hosted on an EC2 instance, with an ELB securing the endpoints using SSL.
The app runs using a docker-compose.yml file:
version: '3.8'
services:
fastapi:
build: .
ports:
- 8000:8000
command: uvicorn app.main:app --host 0.0.0.0 --reload
volumes:
- .:/kwept
environment:
- CELERY_BROKER_URL=redis://redis:6379/0
- CELERY_RESULT_BACKEND=redis://redis:6379/0
depends_on:
- redis
worker:
build: .
command: celery worker --app=app.celery_worker.celery --loglevel=info --logfile=app/logs/celery.log
volumes:
- .:/kwept
environment:
- CELERY_BROKER_URL=redis://redis:6379/0
- CELERY_RESULT_BACKEND=redis://redis:6379/0
depends_on:
- fastapi
- redis
redis:
image: redis:6-alpine
command: redis-server --appendonly yes
volumes:
- redis_data:/data
volumes:
redis_data:
Until Friday evening, the ELB endpoint was working absolutely fine and I could use it. But since this morning I have suddenly started getting a 502 Bad Gateway error. I have made no changes to the code or the settings on AWS.
The ELB listener settings on AWS:
The target group that is connected to the EC2 instance
When I log into the EC2 instance and check the logs of the Docker container that is running the FastAPI app, I see the following:
These logs show that the app is starting correctly.
I have not configured any health checks specifically; I just have the default settings.
Output of netstat -ntlp:
I have the logs on the ELB:
http 2022-07-21T06:47:12.458060Z app/dianee-tools-elb/de7eb044e99165db 162.142.125.221:44698 172.31.31.173:443 -1 -1 -1 502 - 41 277 "GET http://18.197.14.70:80/ HTTP/1.1" "-" - - arn:aws:elasticloadbalancing:eu-central-1:xxxxxxxxxx:targetgroup/dianee-tools/da8a30452001c361 "Root=1-62d8f670-711975100c6d9d4038d73544" "-" "-" 0 2022-07-21T06:47:12.457000Z "forward" "-" "-" "172.31.31.173:443" "-" "-" "-"
http 2022-07-21T06:47:12.655734Z app/dianee-tools-elb/de7eb044e99165db 162.142.125.221:43836 172.31.31.173:443 -1 -1 -1 502 - 158 277 "GET http://18.197.14.70:80/ HTTP/1.1" "Mozilla/5.0 (compatible; CensysInspect/1.1; +https://about.censys.io/)" - - arn:aws:elasticloadbalancing:eu-central-1:xxxxxxxxxx:targetgroup/dianee-tools/da8a30452001c361 "Root=1-62d8f670-5ceb74c8530832f859038ef6" "-" "-" 0 2022-07-21T06:47:12.654000Z "forward" "-" "-" "172.31.31.173:443" "-" "-" "-"
http 2022-07-21T06:47:12.949509Z app/dianee-tools-elb/de7eb044e99165db 162.142.125.221:48556 - -1 -1 -1 400 - 0 272 "- http://dianee-tools-elb-yyyyyy.eu-central-1.elb.amazonaws.com:80- -" "-" - - - "-" "-" "-" - 2022-07-21T06:47:12.852000Z "-" "-" "-" "-" "-" "-" "-"
I see you are using the EC2 launch type. I suggest you SSH into the container and try curling localhost on port 8080; it should return your application page. After that, check the same on the instance as well, since you have mapped the container to port 8080. If this also works, try modifying the target group port to 8080, which is the port on which your application works. If the same setup is working on other resources, it could be that you are using redirection. If this doesn't help, fetch the full logs using https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-logs-collector.html
If your application is working on port 8000, you need to modify the target group to perform the health check there. Once the target group port is changed to 8000, the health check should go through.
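If you want the target group to probe something lightweight, a minimal sketch of a dedicated health route in FastAPI (the /health path is just an example, not something the original app necessarily has):
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health():
    # Returns 200 OK so the ALB target group health check can pass.
    return {"status": "ok"}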
What is "502 Bad Gateway"?
The HyperText Transfer Protocol (HTTP) 502 Bad Gateway server error response code indicates that the server, while acting as a gateway or proxy, received an invalid response from the upstream server.
HTTP protocols
http - port number: 80
https - port number: 443
From the docker-compose.yml file you are exposing port "8000", which will not work.
Possible solutions
Using NGINX
Install NGINX and add this server config:
server {
listen 80;
listen 443 ssl;
# ssl on;
# ssl_certificate /etc/nginx/ssl/server.crt;
# ssl_certificate_key /etc/nginx/ssl/server.key;
# server_name <DOMAIN/IP>;
location / {
proxy_pass http://127.0.0.1:8000;
}
}
Changing the port to 80 or 443 in the docker-compose.yml file
My suggestion is to use NGINX.
Make sure you've set the keep-alive parameter of your web server (in your case uvicorn) to something higher than the default idle timeout of the AWS ALB, which is 60s. This way you make sure the service doesn't close the HTTP keep-alive connection before the ALB does.
For uvicorn it will be: uvicorn app.main:app --host 0.0.0.0 --timeout-keep-alive=65
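The programmatic equivalent, in case the server is started from Python rather than the CLI (a sketch; 65 just needs to be larger than the ALB's 60-second idle timeout):
import uvicorn

if __name__ == "__main__":
    # timeout_keep_alive defaults to 5 seconds; raising it above the ALB's 60-second
    # idle timeout keeps uvicorn from closing keep-alive connections before the ALB does.
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, timeout_keep_alive=65)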
I have a gRPC server behind nginx, and I'm trying to write clients for it in different languages.
The Python version works well:
import grpc

cred = grpc.ssl_channel_credentials()
channel = grpc.secure_channel(NAME, cred)
stub = MyServiceStub(channel)
But the analogous C++ code doesn't:
auto channel = grpc::CreateChannel(NAME, grpc::SslCredentials(grpc::SslCredentialsOptions()));
auto stub = MyService::NewStub(channel);
nginx version 1.16.1 (built with OpenSSL 1.0.2k-fips 26 Jan 2017), with this config:
server {
listen 80;
server_name NAME;
location / {
return 302 grpc://NAME$request_uri;
}
}
server {
listen 443 ssl http2;
server_name NAME;
ssl_certificate chain.pem;
ssl_certificate_key privkey.pem;
location / {
grpc_pass grpc://IP;
grpc_read_timeout 3600;
}
}
The nginx access log prints "PRI * HTTP/2.0" 400 157 "-" "-" "-", and the debug log for a bad request shows:
[debug] 24255#0: *103047 http check ssl handshake
[debug] 24255#0: *103047 http recv(): 1
[debug] 24255#0: *103047 plain http
[debug] 24255#0: *103047 http wait request handler
but a good request gets:
[debug] 30192#0: *105029 http check ssl handshake
[debug] 30192#0: *105029 http recv(): 1
[debug] 30192#0: *105029 https ssl handshake: 0x16
[debug] 30192#0: *105029 tcp_nodelay
[debug] 30192#0: *105029 ssl get session: 5F263490:3
The gRPC version is 1.34.0, and the C++ client outputs these errors:
Failed parsing HTTP/2
Expected SETTINGS frame as the first frame, got frame type 80
failed to connect to all addresses
Trying to connect an http1.x server
I found a similar Go problem, "How to find out why Nginx return 400 while use it as http/2 load balancer?", but I don't know how to pass "h2" in C++.
Is the server running on host "host_name" (XX.XX.XX.XX)
and accepting TCP/IP connections on port 5432?
This is the typical error message when trying to set up a DB server, but I just cannot fix it.
My Django DB settings:
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql',
'NAME': 'db_name',
'USER': 'db_user',
'PASSWORD': 'db_pwd',
'HOST': 'host_name',
'PORT': '5432',
}
}
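To rule Django out, a quick standalone connection test with psycopg2 and the same parameters can help (a rough sketch using the placeholder values from the settings above):
import psycopg2

# Same placeholder values as in DATABASES above.
conn = psycopg2.connect(
    host="host_name",
    port=5432,
    dbname="db_name",
    user="db_user",
    password="db_pwd",
)
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone()[0])
conn.close()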
I added this to pg_hba.conf:
host all all 0.0.0.0/0 md5
host all all ::/0 md5
In postgresql.conf, I replaced:
listen_addresses = 'localhost' with listen_addresses = '*'
and restarted PostgreSQL:
/etc/init.d/postgresql stop
/etc/init.d/postgresql start
but I am still getting the same error. What is interesting:
I can ping XX.XX.XX.XX from outside and it works, but I cannot telnet:
telnet XX.XX.XX.XX
Trying XX.XX.XX.XX...
telnet: connect to address XX.XX.XX.XX: Connection refused
telnet: Unable to connect to remote host
If I telnet to port 22 from outside, it works:
telnet XX.XX.XX.XX 22
Trying XX.XX.XX.XX...
Connected to server_name.
Escape character is '^]'.
SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
If I telnet to port 5432 from inside the DB server, I get this:
telnet XX.XX.XX.XX 5432
Trying XX.XX.XX.XX...
Connected to XX.XX.XX.XX.
Escape character is '^]'.
The same port from outside:
telnet XX.XX.XX.XX 5432
Trying XX.XX.XX.XX...
telnet: connect to address XX.XX.XX.XX: Connection refused
telnet: Unable to connect to remote host
nmap from inside:
Host is up (0.000020s latency).
Not shown: 998 closed ports
PORT STATE SERVICE
22/tcp open ssh
5432/tcp open postgresql
nmap from outside:
Starting Nmap 7.60 ( https://nmap.org ) at 2018-01-24 07:01 CET
and then no response.
It sounds like a firewall issue, but I don't know where to look. What am I doing wrong and what could the issue be?
Any help is appreciated.
BTW: I can log in to PostgreSQL inside the server, and it works:
psql -h host_name -U user_name -d db_name
psql (9.4.15)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
db_name =>
The issue was, as I guessed, a firewall blocking these ports. I tried to communicate with the hosting company, but in the end I had to move the server to another hosting company, and it worked with the exact same settings.
I have a Security Group that allows ports 80, 443, 22, and 8089:
Ports Protocol Source security-group
22 tcp 0.0.0.0/0 [check]
8089 tcp 0.0.0.0/0 [check]
80 tcp 0.0.0.0/0 [check]
443 tcp 0.0.0.0/0 [check]
However, when I test the connection using a Python program I wrote:
import socket
import sys

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
p = sys.argv[1]
try:
    s.connect(('public-dns', int(p)))
    print('Port ' + str(p) + ' is reachable')
except socket.error as e:
    print('Error on connect: %s' % e)
s.close()
All ports are reachable except 8089:
python test.py 80
Port 80 is reachable
python test.py 22
Port 22 is reachable
python test.py 443
Port 443 is reachable
python test.py 8089
Error on connect: [Errno 61] Connection refused
The reason you are able to connect successfully via localhost (127.0.0.1) and not externally is that your server application is listening on the localhost interface only. This means that only connections originating from the instance itself will be able to reach that process.
To correct this, you will want to configure your application to listen on either the local IP address of the interface or on all interfaces (0.0.0.0).
This shows that it is wrong (listening on 127.0.0.1 only):
~ $ sudo netstat -tulpn | grep 9966
tcp 0 0 127.0.0.1:9966 0.0.0.0:* LISTEN 4961/python
Here it is working correctly (listening on all interfaces):
~ $ sudo netstat -tulpn | grep 9966
tcp 0 0 0.0.0.0:9966 0.0.0.0:* LISTEN 5205/python
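A minimal illustration of the difference with Python's socket module (port 9966 taken from the netstat output above):
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

# Binding to '127.0.0.1' only accepts connections from the instance itself;
# binding to '0.0.0.0' (all interfaces) also accepts external connections,
# provided the security group and any host firewall allow the port.
srv.bind(('0.0.0.0', 9966))
srv.listen(5)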
Besides the AWS security groups (which it looks like you have set correctly), you also need to make sure that any internal firewall on the host is also open for all the ports specified.