How to give kubernetes node a host name that it can reach? - amazon-web-services

I am trying to set up a one-master, one-node k8s cluster; however, when joining my node to the cluster via:
kubeadm join 10.1.3.238:6443 --token 2xm3il.sqjbsq7ebn5yaz4x \
--discovery-token-ca-cert-hash sha256:7fb7e9ca3ee452928fd413bc3ecb4cb8bc50a99d52b73a39a5c758d240054c4e
it gives this output:
[WARNING Hostname]: hostname "k8s-node1" could not be reached
[WARNING Hostname]: hostname "k8s-node1": lookup k8s-node1 on 10.1.0.2:53: no such host
I have tried setting the hostnames of the instances to k8s-master and k8s-node1, and I also added them to the /etc/hosts file. When I run cat /etc/hosts on my master I get:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
ip-10-1-3-16 k8s-node1
ip-10-1-3-16 k8s-master
and when I run cat /etc/hosts on my worker I get:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
ip-10-1-3-16 k8s-node1
ip-10-1-3-16 k8s-master
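For reference, each /etc/hosts line must begin with an IP address followed by the names for that address; the entries shown above have a DNS-style name in the IP column. A sketch of the intended layout, assuming (a guess based on the join command and the instance names, not confirmed by the question) the master's private IP is 10.1.3.238 and the node's is 10.1.3.16:

```
127.0.0.1   localhost
10.1.3.238  k8s-master
10.1.3.16   k8s-node1
```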

Related

502 Bad Gateway error on fastapi app hosted on EC2 instance + ELB

I have a FastAPI app hosted on an EC2 instance, with an ELB securing the endpoints via SSL.
The app runs via a docker-compose.yml file:
version: '3.8'
services:
  fastapi:
    build: .
    ports:
      - 8000:8000
    command: uvicorn app.main:app --host 0.0.0.0 --reload
    volumes:
      - .:/kwept
    environment:
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/0
    depends_on:
      - redis
  worker:
    build: .
    command: celery worker --app=app.celery_worker.celery --loglevel=info --logfile=app/logs/celery.log
    volumes:
      - .:/kwept
    environment:
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/0
    depends_on:
      - fastapi
      - redis
  redis:
    image: redis:6-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
volumes:
  redis_data:
Until Friday evening, the ELB endpoint was working absolutely fine and I could use it. But since this morning, I have suddenly started getting a 502 Bad Gateway error. I have made no changes to the code or to the settings on AWS.
The ELB listener settings on AWS:
The target group that is connected to the EC2 instance
When I log into the EC2 instance & check the logs of the docker container that is running the fastapi app, I see the following:
These logs show that the app is starting correctly
I have not configured any health checks specifically. I just have the default settings
Output of netstat -ntlp
I have the logs on the ELB:
http 2022-07-21T06:47:12.458060Z app/dianee-tools-elb/de7eb044e99165db 162.142.125.221:44698 172.31.31.173:443 -1 -1 -1 502 - 41 277 "GET http://18.197.14.70:80/ HTTP/1.1" "-" - - arn:aws:elasticloadbalancing:eu-central-1:xxxxxxxxxx:targetgroup/dianee-tools/da8a30452001c361 "Root=1-62d8f670-711975100c6d9d4038d73544" "-" "-" 0 2022-07-21T06:47:12.457000Z "forward" "-" "-" "172.31.31.173:443" "-" "-" "-"
http 2022-07-21T06:47:12.655734Z app/dianee-tools-elb/de7eb044e99165db 162.142.125.221:43836 172.31.31.173:443 -1 -1 -1 502 - 158 277 "GET http://18.197.14.70:80/ HTTP/1.1" "Mozilla/5.0 (compatible; CensysInspect/1.1; +https://about.censys.io/)" - - arn:aws:elasticloadbalancing:eu-central-1:xxxxxxxxxx:targetgroup/dianee-tools/da8a30452001c361 "Root=1-62d8f670-5ceb74c8530832f859038ef6" "-" "-" 0 2022-07-21T06:47:12.654000Z "forward" "-" "-" "172.31.31.173:443" "-" "-" "-"
http 2022-07-21T06:47:12.949509Z app/dianee-tools-elb/de7eb044e99165db 162.142.125.221:48556 - -1 -1 -1 400 - 0 272 "- http://dianee-tools-elb-yyyyyy.eu-central-1.elb.amazonaws.com:80- -" "-" - - - "-" "-" "-" - 2022-07-21T06:47:12.852000Z "-" "-" "-" "-" "-" "-" "-"
I see you are using the EC2 launch type. I'd suggest SSHing into the container and trying to curl localhost on port 8080; it should return your application page. After that, check the same on the instance as well, since you have mapped the container to port 8080. If this also works, try changing the target group port to 8080, the port on which your application works. If the same setup is working on other resources, it could be that you are using redirection. If this doesn't help, fetch the full logs using https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-logs-collector.html
If your application is working on port 8000, you need to modify the target group to perform its health check there. Once the target group port is changed to 8000, the health check should go through.
What is a "502 Bad Gateway"?
The HyperText Transfer Protocol (HTTP) 502 Bad Gateway server error response code indicates that the server, while acting as a gateway or proxy, received an invalid response from the upstream server.
HTTP protocols
http - port number: 80
https - port number: 443
In your docker-compose.yml file you are exposing port 8000, which will not work.
Possible solutions
Using NGINX
Install NGINX and add the following server config:
server {
    listen 80;
    listen 443 ssl;
    # ssl on;
    # ssl_certificate /etc/nginx/ssl/server.crt;
    # ssl_certificate_key /etc/nginx/ssl/server.key;
    # server_name <DOMAIN/IP>;
    location / {
        proxy_pass http://127.0.0.1:8000;
    }
}
Changing the port to 80 or 443 in the docker-compose.yml file
My suggestion is to use nginx.
Make sure you've set the Keep-Alive parameter of your web server (in your case uvicorn) to something greater than the AWS ALB's default idle timeout, which is 60s. This way the service doesn't close the HTTP Keep-Alive connection before the ALB does.
For uvicorn it will be: uvicorn app.main:app --host 0.0.0.0 --timeout-keep-alive=65

postgresql - django.db.utils.OperationalError: could not connect to server: Connection refused

Is the server running on host "host_name" (XX.XX.XX.XX)
and accepting TCP/IP connections on port 5432?
This is the typical error message when trying to set up a remote DB server, but I just cannot fix it.
My Django DB settings:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'db_name',
        'USER': 'db_user',
        'PASSWORD': 'db_pwd',
        'HOST': 'host_name',
        'PORT': '5432',
    }
}
I added to pg_hba.conf
host all all 0.0.0.0/0 md5
host all all ::/0 md5
I replaced in postgresql.conf:
listen_addresses = 'localhost' to listen_addresses = '*'
and restarted PostgreSQL:
/etc/init.d/postgresql stop
/etc/init.d/postgresql start
but I am still getting the same error. What's interesting:
I can ping XX.XX.XX.XX from outside and it works, but I cannot telnet:
telnet XX.XX.XX.XX
Trying XX.XX.XX.XX...
telnet: connect to address XX.XX.XX.XX: Connection refused
telnet: Unable to connect to remote host
If I telnet to port 22 from outside, it works:
telnet XX.XX.XX.XX 22
Trying XX.XX.XX.XX...
Connected to server_name.
Escape character is '^]'.
SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
If I telnet to port 5432 from inside the db server, I get this:
telnet XX.XX.XX.XX 5432
Trying XX.XX.XX.XX...
Connected to XX.XX.XX.XX.
Escape character is '^]'.
The same port from outside:
telnet XX.XX.XX.XX 5432
Trying XX.XX.XX.XX...
telnet: connect to address XX.XX.XX.XX: Connection refused
telnet: Unable to connect to remote host
nmap from inside:
Host is up (0.000020s latency).
Not shown: 998 closed ports
PORT STATE SERVICE
22/tcp open ssh
5432/tcp open postgresql
nmap from outside:
Starting Nmap 7.60 ( https://nmap.org ) at 2018-01-24 07:01 CET
and no response.
It sounds like a firewall issue, but I don't know where to look. What am I doing wrong, and what could the issue be?
Any help is appreciated.
By the way, I can log in to PostgreSQL from inside the server; it works:
psql -h host_name -U user_name -d db_name
psql (9.4.15)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
db_name =>
The issue was, as I guessed, a firewall blocking these ports. I tried to get it resolved with the hosting company, but in the end I had to move the server to a different hosting company, where the exact same settings worked.
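As a general diagnostic for cases like this: a "Connection refused" (the server sends an RST) usually means no listener is bound to that address, or a firewall REJECT rule; a silent hang/timeout usually means a firewall DROP rule. A minimal sketch of that check (host and port are placeholders, not values from the question):

```python
import socket

def check_port(host, port, timeout=3.0):
    """Probe a TCP port and classify the result."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        return "refused"   # RST received: no listener, or a REJECT rule
    except socket.timeout:
        return "filtered"  # no reply at all: typical of a DROP rule
    finally:
        s.close()

# e.g. check_port("XX.XX.XX.XX", 5432)
```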

uWSGI configuration with FlaskApp

In an effort to switch to Nginx, I'm running into configuration problems and getting a 502 Gateway error. Here is the error log on connection:
tail -f error.log
2016/10/08 16:09:31 [crit] 21682#21682: *29 connect() to unix:/var/www/FlaskApp/FlaskApp/runserver.sock failed (13: Permission denied) while connecting to upstream, client: 73.188.249.47, server: ceejcalvert.com, request: "GET / HTTP/1.1", upstream: "uwsgi://unix:/var/www/FlaskApp/FlaskApp/runserver.sock:", host: "website.com"
If I'm in a terminal on the server, I can get the site up by manually pointing uWSGI at the socket; this command gets everything working:
uwsgi -s /var/www/FlaskApp/FlaskApp/runserver.sock -w runserver:app --chmod-socket=666
The issue is I cannot get it working in daemon mode. My configuration is as follows:
$ cat /etc/systemd/system/runserver.service
[Unit]
Description=uWSGI instance to serve runserver
After=network.target
[Service]
User=username
Group=www-data
WorkingDirectory=/var/www/FlaskApp/FlaskApp
Environment="PATH=/var/www/FlaskApp/FlaskApp/venv/bin"
ExecStart=/var/www/FlaskApp/FlaskApp/venv/bin/runserver.sock --ini runserver.ini
[Install]
WantedBy=multi-user.target
...
cat /var/www/FlaskApp/FlaskApp/runserver.ini
[uwsgi]
module = wsgi:app
master = true
processes = 5
logto = /home/jmc856/error.log
socket = runserver.sock
chmod-socket = 666
vacuum = true
die-on-term = true
Assume sites-available is linked to sites-enabled.
cat /etc/nginx/sites-available/runserver
server {
    listen 80;
    server_name website.com;
    location / {
        include uwsgi_params;
        uwsgi_pass unix:/var/www/FlaskApp/FlaskApp/runserver.sock;
    }
}
Anything obvious I'm missing?
When I run the following, I get exit code 3.
sudo systemctl status runserver
runserver.service - uWSGI instance to serve runserver
Loaded: loaded (/etc/systemd/system/runserver.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sat 2016-10-08 16:08:45 EDT; 20min ago
Main PID: 22365 (code=exited, status=203/EXEC)
Oct 08 16:08:45 FirstNameLastName.com systemd[1]: Stopped uWSGI instance to serve runserver.
Oct 08 16:08:45 FirstNameLastName.com systemd[1]: Started uWSGI instance to serve runserver.
Oct 08 16:08:45 FirstNameLastName.com systemd[1]: runserver.service: Main process exited, code=exited, status=203/EXEC
Oct 08 16:08:45 FirstNameLastName.com systemd[1]: runserver.service: Unit entered failed state.
Oct 08 16:08:45 FirstNameLastName.com systemd[1]: runserver.service: Failed with result 'exit-code'.
Solved my issue. For the most part my configurations were fine. Here is a checklist of things to verify if you get 502 gateway errors.
1) I first added absolute paths to all config files. For example, I changed my systemd config to:
$ cat /etc/systemd/system/runserver.service
[Unit]
Description=uWSGI instance to serve runserver
After=network.target
[Service]
User=username
Group=www-data
WorkingDirectory=/var/www/FlaskApp/FlaskApp
Environment="PATH=/var/www/FlaskApp/FlaskApp/venv/bin"
ExecStart=/var/www/FlaskApp/FlaskApp/venv/bin/uwsgi --ini /var/www/FlaskApp/FlaskApp/runserver.ini
[Install]
WantedBy=multi-user.target
2) Changed .ini file to directly call uWSGI app:
cat /var/www/FlaskApp/FlaskApp/runserver.ini
[uwsgi]
chdir=/var/Webpage/
wsgi-file = wsgi.py
callable = app
master = true
processes = 5
logto = /home/error.log
socket = runserver.sock
chmod-socket = 666
vacuum = true
die-on-term = true
3) Ensure the Flask app host is 0.0.0.0:
if __name__ == "__main__":
    app.run(host='0.0.0.0')
4) Use these commands to find out where things are failing.
Make sure the configs have proper syntax:
$ sudo nginx -t
Make sure the nginx daemon is running properly:
$ systemctl status nginx.service
Ensure the uWSGI instance serving {app} is running:
$ systemctl
If all is good and you are still seeing errors, search for failures in:
$ sudo journalctl
And
$ sudo tail -f /var/log/nginx/error.log
If everything is running properly, make sure you run the following:
$ sudo systemctl restart {app}
$ sudo systemctl start {app}
$ sudo systemctl enable {app}
That last command was something I forgot, and it kept me from realizing for a long time that my configuration was fine. In my case {app} was 'runserver'.
I've been using this Docker image for all my nginx and flask apps
https://github.com/tiangolo/uwsgi-nginx-flask-docker

AWS Port in Security Group but Can't Connect

I have a Security Group that opens ports 80, 443, 22, and 8089:
Ports  Protocol  Source     security-group
22     tcp       0.0.0.0/0  [check]
8089   tcp       0.0.0.0/0  [check]
80     tcp       0.0.0.0/0  [check]
443    tcp       0.0.0.0/0  [check]
When I test the connection using a Python program I wrote:
import socket
import sys

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
p = sys.argv[1]
try:
    s.connect(('public-dns', int(p)))
    print 'Port ' + str(p) + ' is reachable'
except socket.error as e:
    print 'Error on connect: %s' % e
s.close()
However, all ports are fine except 8089:
python test.py 80
Port 80 is reachable
python test.py 22
Port 22 is reachable
python test.py 443
Port 443 is reachable
python test.py 8089
Error on connect: [Errno 61] Connection refused
The reason why you are able to connect successfully via localhost (127.0.0.1) and not externally is because your server application is listening on the localhost adapter only. This means that only connections originating from the instance itself will be able to connect to that process.
To correct this, you will want to configure your application to listen on either the local IP address of the interface or on all interfaces (0.0.0.0).
This shows that it is wrong (listening on 127.0.0.1 only):
~ $ sudo netstat -tulpn | grep 9966
tcp 0 0 127.0.0.1:9966 0.0.0.0:* LISTEN 4961/python
Here it is working right (listening on all interfaces):
~ $ sudo netstat -tulpn | grep 9966
tcp 0 0 0.0.0.0:9966 0.0.0.0:* LISTEN 5205/python
Besides the AWS security groups (which look like you have set correctly), you also need to make sure that if there is an internal firewall on the host, that it is also open for all the ports specified.
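The difference the two netstat outputs show can be reproduced with a minimal Python sketch of the two bind addresses (the ports here are ephemeral ones picked by the OS):

```python
import socket

# Bound to the loopback adapter: only connections originating from the
# instance itself can reach this socket.
loopback_only = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
loopback_only.bind(("127.0.0.1", 0))
loopback_addr = loopback_only.getsockname()[0]

# Bound to all interfaces: this is what netstat shows as 0.0.0.0:PORT,
# and what external clients can actually connect to.
all_interfaces = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
all_interfaces.bind(("0.0.0.0", 0))
public_addr = all_interfaces.getsockname()[0]

print(loopback_addr)  # 127.0.0.1
print(public_addr)    # 0.0.0.0

loopback_only.close()
all_interfaces.close()
```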

nginx, fastcgi and 502 errors with spawn issues

I am trying to get FastCGI to work with nginx. I know the config file is correct because it worked before, and I suspect the problem is my C++ program and how I set up the .fcgi file to be served by nginx. These are the steps I take. I am using Ubuntu, nginx, and C++ with FastCGI. What did I do wrong?
1) Compile the program
g++ -o rtbCookieServer.fcgi rtbCookieServer.o -lfcgi++ -lboost_system -lcgicc -L/home/cpp/mongo-cxx-driver-v2.0 -I/home/cpp/mongo-cxx-driver-v2.0/mongo
2) move rtbCookieServer.fcgi into /var/www
3) sudo chmod a+x /var/www/rtbCookieServer.fcgi
4) Run the below
spawn-fcgi.standalone -u root -g root -G www-data -a 127.0.0.1 -p 9000 -f /var/www/rtbCookieServer.fcgi
spawn-fcgi: child spawned successfully: PID: 2398
If I try to run the command as root, I get this:
spawn-fcgi: I will not set uid to 0
5) Browse to http://127.0.0.1/rtbCookieServer.fcgi, where I get a 502 error and this error in my log file:
2012/01/23 15:19:03 [error] 1189#0: *1 upstream closed prematurely FastCGI stdout while reading response header from upstream, client: 127.0.0.1, server: localhost, request: "GET /rtbCookieServer.fcgi HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "127.0.0.1"
When I look at what is listening on port 9000, I get the below along with some other diagnostics:
sudo lsof -i :9000
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rtbCookie 2398 marktest 0u IPv4 17598 0t0 TCP localhost:9000 (LISTEN)
netstat -an | grep 9000
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
ps auxww | grep rtbCookieServer.fcgi
1000 2398 0.0 0.0 24616 360 ? Ss 15:08 0:00 /var/www/rtbCookieServer.fcgi
Now: 1) Why does the command say rtbCookie and not rtbCookieServer? Even when I kill the process and rerun the spawn command, it still says rtbCookie. Should it not say rtbCookieServer? Also, why does it say marktest for the user rather than root?
For diagnostics, I run ./rtbCookieServer.fcgi --9000 and get the expected output.
Here are my file permissions:
-rwxr-xr-x 1 root root 1580470 2012-01-23 14:28 rtbCookieServer.fcgi
Here is my config file:
server {
    listen 80;
    server_name localhost;
    location ~ \.fcgi$ {
        root /var/www;
        include /etc/nginx/fastcgi_params;
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_index index.html;
        fastcgi_param SCRIPT_FILENAME /$fastcgi_script_name;
        include fastcgi_params;
    }
}
It says rtbCookie because lsof uses fixed width columns and rtbCookie is all that fits.
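The truncation is easy to check: by default lsof limits the COMMAND column to nine characters (run lsof +c 0 to see full command names), and the first nine characters of rtbCookieServer are exactly rtbCookie:

```python
# lsof's COMMAND column defaults to nine characters, so longer names are cut off.
name = "rtbCookieServer"
print(name[:9])  # rtbCookie
```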
It sounds like nginx gets confused while processing the headers you send back. I suspect you have a slight formatting error in your response. Each header should end with \r\n.
Between the last header and the body of your response there must be an empty line, also ending with \r\n.
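A sketch of a well-formed CGI-style response built by hand, following those rules: every header line ends with \r\n, and a single blank line (also \r\n) separates the headers from the body. The header values and body here are illustrative, not taken from the question's program.

```python
# Build a CGI-style response: CRLF-terminated headers, blank line, then body.
headers = [
    "Status: 200 OK",
    "Content-Type: text/html; charset=utf-8",
]
body = "<html><body>ok</body></html>"

# "\r\n" after each header, plus one extra "\r\n" for the blank separator line.
response = "\r\n".join(headers) + "\r\n" + "\r\n" + body
print(repr(response))
```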