Docker swarm containers not able to access the internet

I am trying to set up a swarm cluster in AWS, but the containers on the host are not able to access the internet. From inside a container, ping fails both for name resolution and for direct connectivity to an IP.
Before creating this ticket I had a look at this issue, but I don't think there is CIDR overlap in my case.
I have the following configurations:
Public Subnet CIDR : 10.2.1.0/24
Name server (DNS) inside this is: 10.2.0.2
Ingress overlay network --> 10.255.0.0/16
docker_gwbridge --> 172.18.0.0/16
I have also tried creating a new overlay network (192.168.1.0/24) and a new docker_gwbridge (10.11.0.0/16), with no luck.
I am creating the service with these options (mount and env parameters removed):
docker service create --publish 8098:8098 <Imagename>
Please note that when I created the overlay network myself, I also added the option --network my-overlay to the create command.
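For reference, a rough sketch of the commands I run in that case (subnet as above, image name elided):
docker network create --driver overlay --subnet 192.168.1.0/24 my-overlay
docker service create --publish 8098:8098 --network my-overlay <Imagename>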
Any pointers as to what I might be missing/doing wrong?
Edit 1: Adding more info
Below is the container inspect output when I do not create a new overlay network and go with the default one:
"NetworkSettings": {
"Bridge": "",
"SandboxID": "eb***",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {
"5005/tcp": null,
"8080/tcp": null
},
"SandboxKey": "/var/run/docker/netns/e***9",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "",
"Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"MacAddress": "",
"Networks": {
"ingress": {
"IPAMConfig": {
"IPv4Address": "10.255.0.4"
},
"Links": null,
"Aliases": [
"30**"
],
"NetworkID": "g7w**",
"EndpointID": "291***",
"Gateway": "",
"IPAddress": "10.255.0.4",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:4***"
}
And below is the output when I create the overlay network myself:
"Networks": {
"ingress": {
"IPAMConfig": {
"IPv4Address": "10.255.0.4"
},
"Links": null,
"Aliases": [
"42***"
],
"NetworkID": "jl***3",
"EndpointID": "792***86c",
"Gateway": "",
"IPAddress": "10.255.0.4",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:4***"
},
"my-overlay": {
"IPAMConfig": {
"IPv4Address": "192.168.1.3"
},
"Links": null,
"Aliases": [
"42**"
],
"NetworkID": "4q***",
"EndpointID": "4c***503",
"Gateway": "",
"IPAddress": "192.168.1.3",
"IPPrefixLen": 24,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:4***"
}

I am answering my own question, as I found out that the reason for this behavior was my custom Chef recipe for the Docker installation. I was setting iptables=false in the Docker config, and hence networking did not work for any Docker container other than those in host network mode.
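If you want to check whether your own daemon has the same problem, a quick look (assuming the setting lives in /etc/docker/daemon.json or is passed as a dockerd flag) is:
ps aux | grep [d]ockerd                    # look for an --iptables=false flag
grep -i iptables /etc/docker/daemon.json   # look for "iptables": false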
I got the following advice from Bret (a Docker champion in the Docker community), which helped me get to the root of the problem. In short, it was an issue with something I was doing wrong, but I am posting the suggestion below in case you want to troubleshoot such issues in the future.
Hey Manish,
Suggestion: get a single container working correctly without swarm or overlays before trying them.
so you should be able to just docker run --rm nginx:alpine ping 8.8.8.8 and get a response.
That verifies that containers on that host have a way to the internet.
Then try docker run --rm nginx:alpine ping google.com and see that you get a response.
That verifies DNS resolution is working.
Then you can try creating a single overlay network on one node in a single-node swarm:
docker swarm init
docker network create --driver overlay --attachable mynet
docker run --rm --network mynet nginx:alpine ping google.com
That verifies they have internet and DNS on an overlay network.
If you then add multiple nodes and have issues, you likely need to ensure all swarm nodes can talk to each other over the swarm ports; you will find a link to the firewall port list in The Swarm Section under the Creating a 3-Node Swarm Cluster resources.
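For completeness, the swarm ports Bret mentions are 2377/tcp (cluster management), 7946/tcp and udp (node-to-node communication), and 4789/udp (overlay/VXLAN traffic). On a host using ufw, opening them would look roughly like this (on AWS you would allow the same ports in the security group):
sudo ufw allow 2377/tcp   # swarm cluster management
sudo ufw allow 7946/tcp   # node-to-node communication
sudo ufw allow 7946/udp
sudo ufw allow 4789/udp   # overlay network (VXLAN) traffic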

As Manish said, first try to ping a public address without an overlay network:
docker run --rm nginx:alpine ping 8.8.8.8
If it doesn't work, then you have a problem with the firewall or something else.
In my case, the iptables DOCKER-USER chain was blocking containers from reaching the public network.
So I flushed the DOCKER-USER chain:
sudo iptables -F DOCKER-USER
Then reinitialized:
sudo iptables -I DOCKER-USER -i eth0 -s 0.0.0.0/0 -j ACCEPT
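To confirm the chain now lets traffic through, you can list its rules, e.g.:
sudo iptables -L DOCKER-USER -n -v --line-numbers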

I had a similar issue and was able to fix it by configuring the daemon.json file in the /etc/docker directory. Add the following lines if they are not already present:
"iptables": true,
"dns": ["8.8.8.8", "8.8.4.4"]
Your daemon.json file should then look something like this:
{
    "labels": ....,
    "data-root": ....,
    "max-concurrent-downloads": ....,
    "iptables": true,
    "dns": ["8.8.8.8", "8.8.4.4"]
}
Then restart the Docker service:
sudo service docker restart
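On systemd-based hosts the equivalent is systemctl. Either way, a quick sanity check afterwards is to repeat the ping test from the earlier answer, for example:
sudo systemctl restart docker
docker run --rm nginx:alpine ping -c 3 google.com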

Related

AWS EKS logging to CloudWatch - how to send logs only, without metrics?

I would like to forward the logs of select services running on my EKS cluster to CloudWatch for cluster-independent storage and better observability.
Following the quickstart outlined at https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-EKS-quickstart.html I've managed to get the logs forwarded via the Fluent Bit service, but that has also generated 170 Container Insights metrics channels. Not only are those metrics not required, but they also appear to cost a fair bit.
How can I disable the collection of cluster metrics such as cpu / memory / network / etc, and only keep forwarding container logs to CloudWatch? I'm having a very hard time finding any documentation on this.
I think I figured it out - the cloudwatch-agent daemonset from the quickstart guide is what's sending the metrics, but it is not required for log forwarding. None of the objects in the quickstart YAML file whose names relate to cloudwatch-agent are needed for log forwarding.
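If you already applied the quickstart manifest, removing the metrics side comes down to deleting those objects again; assuming the default names from that manifest (cloudwatch-agent daemonset and cwagentconfig configmap in the amazon-cloudwatch namespace), something like:
kubectl delete daemonset cloudwatch-agent -n amazon-cloudwatch
kubectl delete configmap cwagentconfig -n amazon-cloudwatch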
As suggested by Toms Mikoss, you need to delete the metrics object in your configuration file. This file is the one that you pass to the agent when starting it.
This applies to "on-premises" Linux installations. I haven't tested this on Windows or on EC2, but I imagine it will be similar. The AWS documentation here says that you can also distribute the configuration via SSM, but again, I imagine the answer is still applicable.
Example of file with metrics:
{
    "agent": {
        "metrics_collection_interval": 60,
        "run_as_user": "root"
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/nginx.log",
                        "log_group_name": "nginx",
                        "log_stream_name": "{hostname}"
                    }
                ]
            }
        }
    },
    "metrics": {
        "metrics_collected": {
            "cpu": {
                "measurement": [
                    "cpu_usage_idle",
                    "cpu_usage_iowait"
                ],
                "metrics_collection_interval": 60,
                "totalcpu": true
            }
        }
    }
}
Example of file without metrics:
{
    "agent": {
        "metrics_collection_interval": 60,
        "run_as_user": "root"
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/nginx.log",
                        "log_group_name": "nginx",
                        "log_stream_name": "{hostname}"
                    }
                ]
            }
        }
    }
}
For reference, the command to start the agent on Linux on-premises servers:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config \
-m onPremise -s -c file:configuration-file-path
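If useful, the same ctl script has a status action you can run afterwards to confirm the agent restarted with the trimmed configuration:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a status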
More details in the AWS Documentation here

Error while deploying web app to AWS Elastic Beanstalk

I am getting the below error while deploying to AWS Elastic Beanstalk from Travis CI.
Service:AmazonECS, Code:ClientException, Message:Container list cannot be empty., Class:com.amazonaws.services.ecs.model.ClientException
.travis.yml:
sudo: required
language: generic
services:
  - docker
before_install:
  - docker build -t sathishpskdocker/react-test -f ./client/Dockerfile.dev ./client
script:
  - docker run -e CI=true sathishpskdocker/react-test npm test
after_success:
  - docker build -t sathishpskdocker/multi-client ./client
  - docker build -t sathishpskdocker/multi-nginx ./nginx
  - docker build -t sathishpskdocker/multi-server ./server
  - docker build -t sathishpskdocker/multi-worker ./worker
  # Log in to the docker CLI
  - echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_ID" --password-stdin
  # Take those images and push them to docker hub
  - docker push sathishpskdocker/multi-client
  - docker push sathishpskdocker/multi-nginx
  - docker push sathishpskdocker/multi-server
  - docker push sathishpskdocker/multi-worker
deploy:
  provider: elasticbeanstalk
  region: 'us-west-2'
  app: 'multi-docker'
  env: 'Multidocker-env'
  bucket_name: elasticbeanstalk-us-west-2-194531873493
  bucket_path: docker-multi
  on:
    branch: master
  access_key_id: $AWS_ACCESS_KEY
  secret_access_key: $AWS_SECRET_KEY
Dockerrun.aws.json:
{
    "AWSEBDockerrunVersion": 2,
    "containerDefintions": [
        {
            "name": "client",
            "image": "sathishpskdocker/multi-client",
            "hostname": "client",
            "essential": false,
            "memory": 128
        },
        {
            "name": "server",
            "image": "sathishpskdocker/multi-server",
            "hostname": "api",
            "essential": false,
            "memory": 128
        },
        {
            "name": "worker",
            "image": "sathishpskdocker/multi-worker",
            "hostname": "worker",
            "essential": false,
            "memory": 128
        },
        {
            "name": "nginx",
            "image": "sathishpskdocker/multi-nginx",
            "hostname": "nginx",
            "essential": true,
            "portMappings": [
                {
                    "hostPort": 80,
                    "containerPort": 80
                }
            ],
            "links": ["client", "server"],
            "memory": 128
        }
    ]
}
Only the deploy step is failing, with the error:
Service:AmazonECS, Code:ClientException, Message:Container list cannot be empty., Class:com.amazonaws.services.ecs.model.ClientException
Ah, never mind, it's my mistake. There is a typo in the Dockerrun config file, which wrongly reads containerDefintions instead of containerDefinitions.
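In other words, the whole fix was renaming that key; with GNU sed, for example:
sed -i 's/containerDefintions/containerDefinitions/' Dockerrun.aws.json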
Thanks to everyone who took a look at my question. Cheers!

Creating a GCP resource and getting its IP address

I need to create a new Nexus server on GCP. I have decided to use an NFS mount point for data storage. Everything must be done with Ansible (the instance has already been created with Terraform).
I need to get the dynamic IP assigned by GCP and create the mount point.
It works fine with the gcloud command, but how do I get only the IP?
Code:
- name: get info
  shell: gcloud filestore instances describe nfsnexus --project=xxxxx --zone=xxxxx --format='get(networks.ipAddresses)'
  register: ip

- name: Print all available facts
  ansible.builtin.debug:
    msg: "{{ip}}"
result:
ok: [nexus-ppd.preprod.d-aim.com] => {
    "changed": false,
    "msg": {
        "ansible_facts": {
            "discovered_interpreter_python": "/usr/bin/python3"
        },
        "changed": true,
        "cmd": "gcloud filestore instances describe nfsnexus --project=xxxxx --zone=xxxxx --format='get(networks.ipAddresses)'",
        "delta": "0:00:00.763235",
        "end": "2021-03-14 00:33:43.727857",
        "failed": false,
        "rc": 0,
        "start": "2021-03-14 00:33:42.964622",
        "stderr": "",
        "stderr_lines": [],
        "stdout": "['1x.x.x.1xx']",
        "stdout_lines": [
            "['1x.x.x.1xx']"
        ]
    }
}
Thanks
Just use the proper format string, e.g. to get the first IP:
--format='get(networks.ipAddresses[0])'
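So the full command becomes something like this (project/zone placeholders as in the question):
gcloud filestore instances describe nfsnexus --project=xxxxx --zone=xxxxx --format='get(networks.ipAddresses[0])'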
Found the solution; just add this:
- name:
  debug:
    msg: "{{ip.stdout_lines}}"
I'm feeling so stupid :(. I should stop working after 2 AM :)
Thanks

Installing authorized_keys file under custom user for Ubuntu AWS

I'm trying to set up an Ubuntu server and log in with a non-default user. I've used cloud-config with the user data to set up an initial user, and Packer to provision the server:
system_info:
  default_user:
    name: my_user
    shell: /bin/bash
    home: /home/my_user
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
Packer logs in and provisions the server as my_user, but when I launch an instance from the AMI, AWS installs the authorized_keys file under /home/ubuntu/.ssh/.
Packer config:
{
  "variables": {
    "aws_profile": ""
  },
  "builders": [{
    "type": "amazon-ebs",
    "profile": "{{user `aws_profile`}}",
    "region": "eu-west-1",
    "instance_type": "c5.large",
    "source_ami_filter": {
      "most_recent": true,
      "owners": ["099720109477"],
      "filters": {
        "name": "*ubuntu-xenial-16.04-amd64-server-*",
        "virtualization-type": "hvm",
        "root-device-type": "ebs"
      }
    },
    "ami_name": "my_ami_{{timestamp}}",
    "ssh_username": "my_user",
    "user_data_file": "cloud-config"
  }],
  "provisioners": [{
    "type": "shell",
    "pause_before": "10s",
    "inline": [
      "echo 'run some commands'"
    ]
  }]
}
Once the server has launched, both ubuntu and my_user users exist in /etc/passwd:
my_user:1000:1002:Ubuntu:/home/my_user:/bin/bash
ubuntu:x:1001:1003:Ubuntu:/home/ubuntu:/bin/bash
At what point does the ubuntu user get created, and is there a way to install the authorized_keys file under /home/my_user/.ssh at launch instead of ubuntu?
To persist the default user when using the AMI to launch new EC2 instances, you have to change the value in /etc/cloud/cloud.cfg and update this part:
system_info:
  default_user:
    # Update this!
    name: ubuntu
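If you want to bake that change into the image from a Packer shell provisioner, a rough sketch (assuming the stock Ubuntu cloud.cfg, where the default user is named ubuntu) is:
# my_user is the example name from the question; adjust to taste
sudo sed -i 's/name: ubuntu/name: my_user/' /etc/cloud/cloud.cfg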
You can add your public keys when you create the user using cloud-init. Here is how you do it.
users:
  - name: <username>
    groups: [ wheel ]
    sudo: [ "ALL=(ALL) NOPASSWD:ALL" ]
    shell: /bin/bash
    ssh-authorized-keys:
      - ssh-rsa AAAAB3Nz<your public key>...
Adding an additional SSH user account with cloud-init

SSH timeout error when building AWS AMI with Vagrant

I am trying to set up an AWS AMI Vagrant provision: http://www.packer.io/docs/builders/amazon-ebs.html
I am using the standard .json config:
{
  "type": "amazon-instance",
  "access_key": "YOUR KEY HERE",
  "secret_key": "YOUR SECRET KEY HERE",
  "region": "us-east-1",
  "source_ami": "ami-d9d6a6b0",
  "instance_type": "m1.small",
  "ssh_username": "ubuntu",
  "account_id": "0123-4567-0890",
  "s3_bucket": "packer-images",
  "x509_cert_path": "x509.cert",
  "x509_key_path": "x509.key",
  "x509_upload_path": "/tmp",
  "ami_name": "packer-quick-start {{timestamp}}"
}
It connects fine, and I see it create the instance in my AWS account. However, I keep getting Timeout waiting for SSH as an error. What could be causing this problem and how can I resolve it?
As I mentioned in my comment above, this is just because sometimes it takes more than a minute for an instance to launch and be SSH-ready.
If you want, you could set the timeout to be longer - the default timeout with Packer is 1 minute.
So you could set it to 5 minutes by adding the following to your json config:
"ssh_timeout": "5m"