I would like to create a new CentOS 7 VM with the kickstart option using the Go library libvirt-go.
To create the new VM I need an XML config, which I generate with the libvirt-go-xml package.
Here is my function that builds a domain struct, which I later marshal into the XML document.
func defineDomain(domainName string, vcpu *libvirtxml.DomainVCPU, disks []libvirtxml.DomainDisk, interfaces []libvirtxml.DomainInterface, memory *libvirtxml.DomainMemory) *libvirtxml.Domain {
    domainId := 10
    domain := &libvirtxml.Domain{
        XMLName: xml.Name{
            Space: "Hello",
            Local: "World",
        },
        Type:        "kvm",
        ID:          &domainId,
        Name:        domainName,
        UUID:        uuid.Must(uuid.NewV4()).String(),
        Title:       domainName,
        Description: domainName,
        Metadata: &libvirtxml.DomainMetadata{
            XML: "",
        },
        Memory: memory,
        VCPU:   vcpu,
        OS: &libvirtxml.DomainOS{
            BootDevices: []libvirtxml.DomainBootDevice{
                libvirtxml.DomainBootDevice{
                    Dev: "hd",
                },
            },
            Kernel:  "",
            Initrd:  "/home/markus/workspace/worker-management/centos/kvm-centos.ks",
            Cmdline: "ks=file:/home/markus/workspace/worker-management/centos/kvm-centos.ks method=http://repo02.agfa.be/CentOS/7/os/x86_64/",
            Type: &libvirtxml.DomainOSType{
                Arch: "x86_64",
                Type: "hvm",
            },
        },
        OnCrash:    "restart",
        OnPoweroff: "destroy",
        OnReboot:   "restart",
        Devices: &libvirtxml.DomainDeviceList{
            Emulator:   "/usr/bin/kvm-spice",
            Disks:      disks,
            Interfaces: interfaces,
            Graphics: []libvirtxml.DomainGraphic{
                libvirtxml.DomainGraphic{
                    VNC: &libvirtxml.DomainGraphicVNC{
                        AutoPort: "yes",
                        Listen:   "127.0.0.1",
                        Keymap:   "de",
                        Listeners: []libvirtxml.DomainGraphicListener{
                            libvirtxml.DomainGraphicListener{
                                Address: &libvirtxml.DomainGraphicListenerAddress{
                                    Address: "127.0.0.1",
                                },
                            },
                        },
                    },
                },
            },
        },
    }
    return domain
}
When I try to create the new VM from this XML document, I get the following error.
2018/09/25 08:12:45 virError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2018-09-25T06:12:45.683418Z qemu-system-x86_64: -append only allowed with -kernel option')
I set the Kernel option to an empty string because I don't know what to put there.
What exactly do I need to specify for the kernel option so that my VM boots properly, and
where can I find good documentation about setting the kernel option?
You've provided the wrong Initrd:
Initrd: "/home/markus/workspace/worker-management/centos/kvm-centos.ks",
This is not an initrd, but your kickstart file (which you correctly specified in Cmdline).
Specifying Kernel and Initrd is not recommended anyway. They are intended for booting a VM from a kernel that lives outside the VM instance itself, which is almost never what you want.
Instead, Initrd should also be the empty string, just like Kernel. The VM will then boot from the virtual boot media (hard drive, ISO image, etc.) that you provide.
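To make that concrete, here is a minimal sketch of how the DomainOS part of the struct from the question could look with Kernel, Initrd, and Cmdline left empty, so the VM boots from its boot devices instead (field names as used in the question; whether "hd" or "cdrom" is the right first boot device depends on how you attach your installation media):
osCfg := &libvirtxml.DomainOS{
    Type: &libvirtxml.DomainOSType{
        Arch: "x86_64",
        Type: "hvm",
    },
    // Kernel, Initrd and Cmdline are left at their zero value (""), so libvirt
    // does not attempt a direct kernel boot and -append is never passed to QEMU.
    Kernel:  "",
    Initrd:  "",
    Cmdline: "",
    // Boot from the attached virtual media instead.
    BootDevices: []libvirtxml.DomainBootDevice{
        {Dev: "hd"}, // or "cdrom" if the installer ISO is attached as a disk
    },
}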
I am following this document to set up distributed tracing: https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengistio-intro-topic.htm#exploring_istio_observability
My cluster is on GKE (GCP) for testing purposes; I installed Istio on top of it, followed the document, and set up the services.
The services are up and running with Prometheus, Grafana, Jaeger, and Zipkin.
It is failing at the step "Performing Distributed Tracing with OCI Application Performance Monitoring".
I tried updating the ConfigMap for the sidecar injector so that I can push tracing details to the Zipkin domain.
I configured the Zipkin domain and am using public-span for now in the ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
name: istio-custom-bootstrap-config
namespace: default
data:
custom_bootstrap.json: |
{
"tracing": {
"http": {
"name": "envoy.tracers.zipkin",
"typed_config": {
"#type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
"collector_cluster": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with data upload endpoint of your apm domain]
"collector_endpoint": "/20200101/observations/private-span?dataFormat=zipkin&dataFormatVersion=2&dataKey=2C6YOLQSUZ5Q7IGN", // [Replace with the private datakey of your apm domain. You can also use public datakey but change the observation type to public-span]
"collectorEndpointVersion": "HTTP_JSON",
"trace_id_128bit": true,
"shared_span_context": false
}
}
},
"static_resources": {
"clusters": [{
"name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with data upload endpoint of your apm domain:443]
"type": "STRICT_DNS",
"lb_policy": "ROUND_ROBIN",
"load_assignment": {
"cluster_name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with data upload endpoint of your apm domain]
"endpoints": [{
"lb_endpoints": [{
"endpoint": {
"address": {
"socket_address": {
"address": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with data upload endpoint of your apm domain]
"port_value": 443
}
}
}
}]
}]
},
"transport_socket": {
"name": "envoy.transport_sockets.tls",
"typed_config": {
"#type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
"sni": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com" // [Replace this with data upload endpoint of your apm domain]
}
}
}]
}
}
The above ConfigMap is not working as expected; the sidecar was crashing due to the missing connect_timeout key, although after adding it to the ConfigMap the sidecar no longer shows that error.
There is no error in the Zipkin or istiod containers, and I am not sure how to debug further.
Error log:
2022-03-30T05:59:33.146580Z info FLAG: --concurrency="2"
2022-03-30T05:59:33.146632Z info FLAG: --domain="default.svc.cluster.local"
2022-03-30T05:59:33.146642Z info FLAG: --help="false"
2022-03-30T05:59:33.146648Z info FLAG: --log_as_json="false"
2022-03-30T05:59:33.146672Z info FLAG: --log_caller=""
2022-03-30T05:59:33.146678Z info FLAG: --log_output_level="default:info"
2022-03-30T05:59:33.146682Z info FLAG: --log_rotate=""
2022-03-30T05:59:33.146687Z info FLAG: --log_rotate_max_age="30"
2022-03-30T05:59:33.146693Z info FLAG: --log_rotate_max_backups="1000"
2022-03-30T05:59:33.146699Z info FLAG: --log_rotate_max_size="104857600"
2022-03-30T05:59:33.146704Z info FLAG: --log_stacktrace_level="default:none"
2022-03-30T05:59:33.146715Z info FLAG: --log_target="[stdout]"
2022-03-30T05:59:33.146725Z info FLAG: --meshConfig="./etc/istio/config/mesh"
2022-03-30T05:59:33.146730Z info FLAG: --outlierLogPath=""
2022-03-30T05:59:33.146736Z info FLAG: --proxyComponentLogLevel="misc:error"
2022-03-30T05:59:33.146741Z info FLAG: --proxyLogLevel="warning"
2022-03-30T05:59:33.146747Z info FLAG: --serviceCluster="reviews.default"
2022-03-30T05:59:33.146753Z info FLAG: --stsPort="0"
2022-03-30T05:59:33.146760Z info FLAG: --templateFile=""
2022-03-30T05:59:33.146767Z info FLAG: --tokenManagerPlugin="GoogleTokenExchange"
2022-03-30T05:59:33.146784Z info Version 1.8.0-c87a4c874df27e37a3e6c25fa3d1ef6279685d23-Clean
2022-03-30T05:59:33.146991Z info Obtained private IP [10.4.1.6]
2022-03-30T05:59:33.147107Z info Apply proxy config from env {"tracing":{"zipkin":{"address":"caadc76wvdp7edddddddccclii.apm-agt.ap-mumbai-1.oci.oraclecloud.com:443"}},"proxyMetadata":{"DNS_AGENT":""}}
2022-03-30T05:59:33.148650Z info Effective config: binaryPath: /usr/local/bin/envoy
concurrency: 2
configPath: ./etc/istio/proxy
controlPlaneAuthPolicy: MUTUAL_TLS
discoveryAddress: istiod.istio-system.svc:15012
drainDuration: 45s
envoyAccessLogService: {}
envoyMetricsService: {}
parentShutdownDuration: 60s
proxyAdminPort: 15000
proxyMetadata:
DNS_AGENT: ""
serviceCluster: reviews.default
statNameLength: 189
statusPort: 15020
terminationDrainDuration: 5s
tracing:
zipkin:
address: caadc76wvdp7edddddddccclii.apm-agt.ap-mumbai-1.oci.oraclecloud.com:443
2022-03-30T05:59:33.148721Z info Proxy role: &model.Proxy{RWMutex:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, Type:"sidecar", IPAddresses:[]string{"10.4.1.6"}, ID:"reviews-v1-5d6559df86-qbg6b.default", Locality:(*envoy_config_core_v3.Locality)(nil), DNSDomain:"default.svc.cluster.local", ConfigNamespace:"", Metadata:(*model.NodeMetadata)(nil), SidecarScope:(*model.SidecarScope)(nil), PrevSidecarScope:(*model.SidecarScope)(nil), MergedGateway:(*model.MergedGateway)(nil), ServiceInstances:[]*model.ServiceInstance(nil), IstioVersion:(*model.IstioVersion)(nil), VerifiedIdentity:(*spiffe.Identity)(nil), ipv6Support:false, ipv4Support:false, GlobalUnicastIP:"", XdsResourceGenerator:model.XdsResourceGenerator(nil), WatchedResources:map[string]*model.WatchedResource(nil)}
2022-03-30T05:59:33.148732Z info JWT policy is third-party-jwt
2022-03-30T05:59:33.148777Z info PilotSAN []string{"istiod.istio-system.svc"}
2022-03-30T05:59:33.148827Z info sa.serverOptions.CAEndpoint == istiod.istio-system.svc:15012 Citadel
2022-03-30T05:59:33.148916Z info Using CA istiod.istio-system.svc:15012 cert with certs: var/run/secrets/istio/root-cert.pem
2022-03-30T05:59:33.149082Z info citadelclient Citadel client using custom root: istiod.istio-system.svc:15012 -----BEGIN CERTIFICATE-----
MIIC/DCCAeSgAwIBAgIQOzOVPb98v+UHCpf80MI1pTANBgkqhkiG9w0BAQsFADAY
MRYwFAYDVQQKEw1jbHVzdGVyLmxvY2FsMB4XDTIyMDMyOTE3MzIyOFoXDTMyMDMy
NjE3MzIyOFowGDEWMBQGA1UEChMNY2x1c3Rlci5sb2NhbDCCASIwDQYJKoZIhvcN
AQEBBQADggEPADCCAQoCggEBAO4j6Sa5VoFCUctY/ehMsFfXjejVHE05PzgaTt0x
zGK6WDLd4bQHVxiEERs2bQcPYP55T+AqBo4cyU5BFi7gEvrVdfHDMGdl4f3rhojB
RNdPLw9axyBNulOYBGIOIthpYY45fPLqvADQmU6GIUqcpg83zuwiyufbaCuElVuJ
h3eMebBQL6zsm+4BFZOTECvjMMpH/HSjOKdW/XsUU71FSVPo9q6devzLgCquZemO
kWHGjTtibwPcyRTZiL9FgBMnFF5gXe5K8FauIQlgkTDTWPj99n2FPGrfgEEC+z3q
O12NYi41zdY9RTk7f6kFHTzLRcGQ8ItG9MRebfZSfDqudCsCAwEAAaNCMEAwDgYD
VR0PAQH/BAQDAgIEMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYEFKAN5Ltn7oIN
l+9yoTfvOIvhBdTCMA0GCSqGSIb3DQEBCwUAA4IBAQCOKu1XEvJKXwRR/VNaL19L
iTIsC5csW4Dg1Z8aFQk+1UwroBbsdjCkPiwK0FJKHMoobIOtSbjn9k+OaUfv4pZo
D8dsDznqGJpkkiZ7zviwmpS3+B2YHoKFRs0ZXHu4hC081AUFjfFvFcwjtfPYKSGU
KqtxKPuvXCVGqaPdmkg5J4gG5q+Yutxno4m3VxGVocuHzXI9/Kox2Lz0C3royfF7
XoTxNy08TzkjDPuPCLqYy85zFOM7PzuuuK7ZIkdXpKbStIWLbjkciqLPzwi18JaH
eyS1/hORUC7AKMj8a3fKWrFsRiMu4Mdv+knnQ1ntLqb5Vy85VTvNFAvAB7mwD/NN
-----END CERTIFICATE-----
2022-03-30T05:59:33.219251Z info sds SDS gRPC server for workload UDS starts, listening on "./etc/istio/proxy/SDS"
2022-03-30T05:59:33.219548Z info xdsproxy Initializing with upstream address istiod.istio-system.svc:15012 and cluster Kubernetes
2022-03-30T05:59:33.219346Z info sds Start SDS grpc server
2022-03-30T05:59:33.220303Z info xdsproxy adding watcher for certificate var/run/secrets/istio/root-cert.pem
2022-03-30T05:59:33.220894Z info Starting proxy agent
2022-03-30T05:59:33.222017Z info Opening status port 15020
2022-03-30T05:59:33.222278Z info Received new config, creating new Envoy epoch 0
2022-03-30T05:59:33.222328Z info Epoch 0 starting
2022-03-30T05:59:33.239683Z info Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster reviews.default --service-node sidecar~10.4.1.6~reviews-v1-5d6559df86-qbg6b.default~default.svc.cluster.local --local-address-ip-version v4 --bootstrap-version 3 --log-format-prefix-with-location 0 --log-format %Y-%m-%dT%T.%fZ %l envoy %n %v -l warning --component-log-level misc:error --config-yaml {
"tracing": {
"http": {
"name": "envoy.tracers.zipkin",
"typed_config": {
"#type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
"collector_cluster": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com",
"collector_endpoint": "/20200101/observations/public-span?dataFormat=zipkin&dataFormatVersion=2&dataKey=MAYH36IJELZRXTEETKL7QEA7NPA5UNEI",
"collectorEndpointVersion": "HTTP_JSON",
"trace_id_128bit": true,
"shared_span_context": false
}
}
},
"static_resources": {
"clusters": [{
"name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com:443",
"connect_timeout": "5s",
"type": "STRICT_DNS",
"lb_policy": "ROUND_ROBIN",
"load_assignment": {
"cluster_name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com",
"endpoints": [{
"lb_endpoints": [{
"endpoint": {
"address": {
"socket_address": {
"address": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com",
"port_value": 443
}
}
}
}]
}]
},
"transport_socket": {
"name": "envoy.transport_sockets.tls",
"typed_config": {
"#type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
"sni": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com"
}
}
}]
}
}
--concurrency 2]
2022-03-30T05:59:33.315619Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.315693Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.316469Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.316542Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.390651Z info xdsproxy Envoy ADS stream established
2022-03-30T05:59:33.391110Z info xdsproxy connecting to upstream XDS server: istiod.istio-system.svc:15012
2022-03-30T05:59:33.396461Z warning envoy main there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
2022-03-30T05:59:33.478768Z info sds resource:ROOTCA new connection
2022-03-30T05:59:33.479543Z info sds Skipping waiting for gateway secret
2022-03-30T05:59:33.479419Z info sds resource:default new connection
2022-03-30T05:59:33.479917Z info sds Skipping waiting for gateway secret
2022-03-30T05:59:33.682346Z info cache Root cert has changed, start rotating root cert for SDS clients
2022-03-30T05:59:33.682714Z info cache GenerateSecret default
2022-03-30T05:59:33.683386Z info sds resource:default pushed key/cert pair to proxy
2022-03-30T05:59:34.079948Z info cache Loaded root cert from certificate ROOTCA
2022-03-30T05:59:34.080300Z info sds resource:ROOTCA pushed root cert to proxy
2022-03-30T05:59:34.154971Z warning envoy config gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) 10.8.14.87_14250: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_20001: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.1.76_3000: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_9411: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.1.191_15021: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_9080: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.5.43_443: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_15010: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_15014: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.14.87_14268: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_80: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_9090: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
virtualInbound: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
2022-03-30T05:59:35.844720Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 reje
After 2-3 days of debugging I was able to resolve the distributed tracing issue with Istio, Zipkin, and OCI APM.
Note: With the root user it was not working, so I created a compartment in OCI, created an IAM policy and a group, and gave the group full access to the compartment.
I added the root user to that group and, oddly, it started working, while with the root user directly and the default policy it did not.
Ref doc for the policy: https://docs-uat.us.oracle.com/en/cloud/paas/application-performance-monitoring/apmgn/perform-oracle-cloud-infrastructure-prerequisite-tasks.html
Working sidecar ConfigMap
The connect_timeout key is required; without it the sidecar fails and the pods never reach the Ready state. Port 443, as mentioned in the official documentation, is not required.
apiVersion: v1
kind: ConfigMap
metadata:
name: istio-custom-bootstrap-config
data:
custom_bootstrap.json: |
{
"tracing": {
"http": {
"name": "envoy.tracers.zipkin",
"typed_config": {
"#type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
"collector_cluster": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com",
"collector_endpoint": "/20200101/observations/public-span?dataFormat=zipkin&dataFormatVersion=2&dataKey=M7SOSHXXXXXXXXXXXXXXXXXXXUZEHOGRSA",
"collector_endpoint_version": "HTTP_JSON",
"trace_id_128bit": true,
"shared_span_context": false
}
}
},
"static_resources": {
"clusters": [{
"name": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com",
"type": "STRICT_DNS",
"connect_timeout": "5s",
"lb_policy": "ROUND_ROBIN",
"load_assignment": {
"cluster_name": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com",
"endpoints": [{
"lb_endpoints": [{
"endpoint": {
"address": {
"socket_address": {
"address": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com",
"port_value": 443
}
}
}
}]
}]
},
"transport_socket": {
"name": "envoy.transport_sockets.tls",
"typed_config": {
"#type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
"sni": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com"
}
}
}]
}
}
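One detail that is easy to miss when following the same walkthrough: this custom bootstrap ConfigMap only affects pods that explicitly reference it. With Istio's custom-bootstrap feature that is done through a pod annotation; below is a minimal sketch of a Deployment pod template wired to the ConfigMap above (the workload name and image are placeholders, and the annotation name should be verified against your Istio version):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reviews-v1               # placeholder workload name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reviews
  template:
    metadata:
      labels:
        app: reviews
      annotations:
        # Tell the injected sidecar to merge in the custom bootstrap defined above
        sidecar.istio.io/bootstrapOverride: "istio-custom-bootstrap-config"
    spec:
      containers:
        - name: reviews
          image: example.registry/reviews:latest   # placeholder image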
Istio config
sampling: 100 pushes essentially all traces to Zipkin and the OCI APM domain. I also enabled enableTracing: true.
Read more at: https://istio.io/latest/docs/tasks/observability/distributed-tracing/mesh-and-proxy-config/
data:
mesh: |-
accessLogFile: /dev/stdout
enableTracing: true
defaultConfig:
discoveryAddress: istiod.istio-system.svc:15012
proxyMetadata: {}
tracing:
sampling: 100
zipkin:
address: aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com:443
enablePrometheusMerge: true
rootNamespace: istio-system
outboundTrafficPolicy:
mode: ALLOW_ANY
trustDomain: cluster.local
meshNetworks: 'networks: {}'
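For anyone wondering where this mesh: block lives: with a default istioctl install it is the data of the istio ConfigMap in the istio-system namespace, so it can be edited in place (restarting the workload pods afterwards so the sidecars pick up the new defaults is a safe bet):
# Edit the mesh configuration (the data.mesh key shown above)
kubectl -n istio-system edit configmap istio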
OCI console
I need to create a new Nexus server on GCP. I have decided to use an NFS mount point for data storage. Everything must be done with Ansible (the instance is already created with Terraform).
I need to get the dynamic IP assigned by GCP and create the mount point.
It works fine with the gcloud command, but how do I get only the IP?
Code:
- name: get info
shell: gcloud filestore instances describe nfsnexus --project=xxxxx --zone=xxxxx --format='get(networks.ipAddresses)'
register: ip
- name: Print all available facts
ansible.builtin.debug:
msg: "{{ip}}"
result:
ok: [nexus-ppd.preprod.d-aim.com] => {
"changed": false,
"msg": {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": true,
"cmd": "gcloud filestore instances describe nfsnexus --project=xxxxx --zone=xxxxx --format='get(networks.ipAddresses)'",
"delta": "0:00:00.763235",
"end": "2021-03-14 00:33:43.727857",
"failed": false,
"rc": 0,
"start": "2021-03-14 00:33:42.964622",
"stderr": "",
"stderr_lines": [],
"stdout": "['1x.x.x.1xx']",
"stdout_lines": [
"['1x.x.x.1xx']"
]
}
}
Thanks
Just use the proper format string, e.g. to get the first IP:
--format='get(networks.ipAddresses[0])'
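Putting that together with the task from the question, a sketch of the tasks (project and zone placeholders kept as in the question); with get(networks.ipAddresses[0]) gcloud prints the bare address, so ip.stdout holds just the IP:
- name: Get the Filestore instance IP
  shell: gcloud filestore instances describe nfsnexus --project=xxxxx --zone=xxxxx --format='get(networks.ipAddresses[0])'
  register: ip

- name: Show the bare IP
  ansible.builtin.debug:
    msg: "{{ ip.stdout }}"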
Found the solution, just add this:
- name:
debug:
msg: "{{ip.stdout_lines}}"
I'm feeling so stupid :(, I should stop working after 2 AM :)
Thx
As a learning exercise, I'm trying to set up a docker swarm on two test AWS EC2 instances, but I'm running into a problem when I try to access the service from the IP address of the worker node.
On the master server, I ran docker swarm init. Then, on the worker, I took the output token and ran docker swarm join --token <token> <Master Private IP>:2377
Then I did a simple docker service create -p 80:80 --name nginx nginx on the master, followed by a docker service scale nginx=2. Now, checking with docker service ps nginx gives the following:
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
idux5dftj9oj nginx.1 nginx:latest ip-172-31-13-2 Running Running 12 minutes ago
2nwfw3fncybj nginx.2 nginx:latest ip-172-31-14-130 Running Running 38 seconds ago
I've opened the inbound ports on the security groups according to this guide, specifically:
TCP port 2377
TCP and UDP port 7946
UDP port 4789
The master and worker servers have the same security group, so I just set the source to itself.
When I run curl http://localhost on the master, it gives me this, which proves it works:
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
<!-- Omitting this for brevity -->
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<!-- Omitting this for brevity -->
</body>
But on the worker, I just get curl: (7) Failed to connect to localhost port 80: Connection refused
A docker ps on the worker gives me:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b37770b153db nginx:latest "nginx -g 'daemon of…" 34 minutes ago Up 34 minutes 80/tcp nginx.2.2nwfw3fncybjj7qzeierlx0xr
Running docker service inspect nginx on the master gives:
[
{
"ID": "887xm47oavn367w0o4bo1nmce",
"Version": {
"Index": 652
},
"CreatedAt": "2019-05-19T07:50:54.491113206Z",
"UpdatedAt": "2019-05-19T08:02:53.454804111Z",
"Spec": {
"Name": "nginx",
"Labels": {},
"TaskTemplate": {
"ContainerSpec": {
"Image": "nginx:latest#sha256:23b4dcdf0d34d4a129755fc6f52e1c6e23bb34ea011b315d87e193033bcd1b68",
"Init": false,
"StopGracePeriod": 10000000000,
"DNSConfig": {},
"Isolation": "default"
},
"Resources": {
"Limits": {},
"Reservations": {}
},
"RestartPolicy": {
"Condition": "any",
"Delay": 5000000000,
"MaxAttempts": 0
},
"Placement": {
"Platforms": [
{
"Architecture": "amd64",
"OS": "linux"
},
{
"OS": "linux"
},
{
"Architecture": "arm64",
"OS": "linux"
},
{
"Architecture": "386",
"OS": "linux"
},
{
"Architecture": "ppc64le",
"OS": "linux"
},
{
"Architecture": "s390x",
"OS": "linux"
}
]
},
"ForceUpdate": 0,
"Runtime": "container"
},
"Mode": {
"Replicated": {
"Replicas": 2
}
},
"UpdateConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"RollbackConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"EndpointSpec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 80,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
}
},
"PreviousSpec": {
"Name": "nginx",
"Labels": {},
"TaskTemplate": {
"ContainerSpec": {
"Image": "nginx:latest#sha256:23b4dcdf0d34d4a129755fc6f52e1c6e23bb34ea011b315d87e193033bcd1b68",
"Init": false,
"DNSConfig": {},
"Isolation": "default"
},
"Resources": {
"Limits": {},
"Reservations": {}
},
"Placement": {
"Platforms": [
{
"Architecture": "amd64",
"OS": "linux"
},
{
"OS": "linux"
},
{
"Architecture": "arm64",
"OS": "linux"
},
{
"Architecture": "386",
"OS": "linux"
},
{
"Architecture": "ppc64le",
"OS": "linux"
},
{
"Architecture": "s390x",
"OS": "linux"
}
]
},
"ForceUpdate": 0,
"Runtime": "container"
},
"Mode": {
"Replicated": {
"Replicas": 1
}
},
"EndpointSpec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 80,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
}
},
"Endpoint": {
"Spec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 80,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
},
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 80,
"PublishedPort": 80,
"PublishMode": "ingress"
}
],
"VirtualIPs": [
{
"NetworkID": "6scdvoeno2tviu4zgyldmq6b4",
"Addr": "10.255.0.82/16"
}
]
}
}
]
Here's the master's docker info
Containers: 3
Running: 3
Paused: 0
Stopped: 0
Images: 4
Server Version: 18.09.6
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: q4h5ahgxf1xwuyi2aotyt20iy
Is Manager: true
ClusterID: r88oqh59x74bl1kqrcg5od2qd
Managers: 1
Nodes: 2
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 172.31.13.2
Manager Addresses:
172.31.13.2:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-1021-aws
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.945GiB
Name: ip-172-31-13-2
ID: RM34:I2IM:EJ2V:W74X:ECSD:ABCC:ZB4T:B7UO:OIWW:SUQ2:ILDB:HQLQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
And here's the worker's docker info
Containers: 3
Running: 3
Paused: 0
Stopped: 0
Images: 4
Server Version: 18.09.5
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: slya32xwjmklumhm23bt7xs6m
Is Manager: false
Node Address: 172.31.14.130
Manager Addresses:
172.31.13.2:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-1021-aws
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.945GiB
Name: ip-172-31-14-130
ID: X7FI:3VCW:OCVI:5XSX:HJ24:2NOD:NQYU:SEYL:JVIJ:J4DI:F5UL:NKZT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: bizmd
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
As far as I've read, there should not be any problems after adding the worker to the swarm and creating a service. Despite that, the worker cannot access the nginx service that it is already hosting.
What could be causing this issue?
I had the idea to check which ports were actually open on my worker server (as opposed to just which were opened on the firewall).
netstat -tulpn showed me:
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp6 0 0 :::9443 :::* LISTEN -
tcp6 0 0 :::22 :::* LISTEN -
udp 19968 0 127.0.0.53:53 0.0.0.0:* -
udp 0 0 172.31.14.130:68 0.0.0.0:* -
udp 0 0 0.0.0.0:4789 0.0.0.0:* -
I noticed that no process was listening on 7946, which is one of the ports that needed to be open. So I restarted the Docker service: sudo service docker restart
After the restart finished, I saw a process start up and consume the port. Sure enough, I was then able to run curl localhost against either node.
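For anyone hitting the same thing, a quick way to confirm the swarm ports are actually bound after the restart (same netstat flags as above; on a worker you should at least see 7946 on tcp/udp and 4789 on udp, while 2377 is only bound on managers):
# List listening sockets and keep only the swarm ports
sudo netstat -tulpn | grep -E ':2377|:7946|:4789'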
I'm building a custom platform to run our application. Our default VPC has been deleted, so according to the documentation I have to specify the VPC and subnet ID almost everywhere. The command I run for ebp looks like the following:
ebp create -v --vpc.id vpc-xxxxxxx --vpc.subnets subnet-xxxxxx --vpc.publicip
The above spins up the Packer environment without any issue; however, when Packer starts to build an instance I get the following error:
2017-12-07 18:07:05 UTC+0100 ERROR [Instance: i-00f376be9fc2fea34] Command failed on instance. Return code: 1 Output: 'packer build' failed, the build log has been saved to '/var/log/packer-builder/XXX1.0.19-builder.log'. Hook /opt/elasticbeanstalk/hooks/packerbuild/build.rb failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
2017-12-07 18:06:55 UTC+0100 ERROR 'packer build' failed, the build log has been saved to '/var/log/packer-builder/XXX:1.0.19-builder.log'
2017-12-07 18:06:55 UTC+0100 ERROR Packer failed with error: '--> HVM AMI builder: VPCIdNotSpecified: No default VPC for this user status code: 400, request id: 28d94e8c-e24d-440f-9c64-88826e042e9d'
Both the template and platform.yaml specify the VPC ID and subnet ID; however, this is not taken into account by Packer.
platform.yaml:
version: "1.0"
provisioner:
type: packer
template: tomcat_platform.json
flavor: ubuntu1604
metadata:
maintainer: <Enter your contact details here>
description: Ubuntu running Tomcat
operating_system_name: Ubuntu Server
operating_system_version: 16.04 LTS
programming_language_name: Java
programming_language_version: 8
framework_name: Tomcat
framework_version: 7
app_server_name: "none"
app_server_version: "none"
option_definitions:
- namespace: "aws:elasticbeanstalk:container:custom:application"
option_name: "TOMCAT_START"
description: "Default application startup command"
default_value: ""
option_settings:
- namespace: "aws:ec2:vpc"
option_name: "VPCId"
value: "vpc-xxxxxxx"
- namespace: "aws:ec2:vpc"
option_name: "Subnets"
value: "subnet-xxxxxxx"
- namespace: "aws:elb:listener:80"
option_name: "InstancePort"
value: "8080"
- namespace: "aws:elasticbeanstalk:application"
option_name: "Application Healthcheck URL"
value: "TCP:8080"
tomcat_platform.json:
{
"variables": {
"platform_name": "{{env `AWS_EB_PLATFORM_NAME`}}",
"platform_version": "{{env `AWS_EB_PLATFORM_VERSION`}}",
"platform_arn": "{{env `AWS_EB_PLATFORM_ARN`}}"
},
"builders": [
{
"type": "amazon-ebs",
"region": "eu-west-1",
"source_ami": "ami-8fd760f6",
"instance_type": "t2.micro",
"ami_virtualization_type": "hvm",
"ssh_username": "admin",
"ami_name": "Tomcat running on Ubuntu Server 16.04 LTS (built on {{isotime \"20060102150405\"}})",
"ami_description": "Tomcat running on Ubuntu Server 16.04 LTS (built on {{isotime \"20060102150405\"}})",
"vpc_id": "vpc-xxxxxx",
"subnet_id": "subnet-xxxxxx",
"associate_public_ip_address": "true",
"tags": {
"eb_platform_name": "{{user `platform_name`}}",
"eb_platform_version": "{{user `platform_version`}}",
"eb_platform_arn": "{{user `platform_arn`}}"
}
}
],
"provisioners": [
{
"type": "file",
"source": "builder",
"destination": "/tmp/"
},
{
"type": "shell",
"execute_command": "chmod +x {{ .Path }}; {{ .Vars }} sudo {{ .Path }}",
"scripts": [
"builder/builder.sh"
]
}
]
}
I'd appreciate any ideas on how to make this work as expected. I found a couple of issues reported against Packer, but they seem to have been resolved on their side, and the documentation just says that the template must specify the target VPC and subnet.
The AWS documentation is a little misleading in this instance. You do need a default VPC in order to create a custom platform. From what I've seen, this is because the VPC flags that you are passing in to the ebp create command aren't passed along to the packer process that actually builds the platform.
To get around the error, you can create a new default VPC that you use only for custom platform creation.
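If the default VPC has been deleted, it can be recreated with the AWS CLI; a one-line sketch (region taken from the template in the question, adjust to yours):
aws ec2 create-default-vpc --region eu-west-1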
Packer looks for a default VPC (Packer's default behavior) while creating the resources required for building a custom platform, which includes launching an EC2 instance, creating a security group, etc. However, if a default VPC is not present in the region (for example, if it was deleted), the Packer build task fails with the following error:
Packer failed with error: '--> HVM AMI builder: VPCIdNotSpecified: No default VPC for this user status code: 400, request id: xyx-yxyx-xyx'
To fix this error, set the following attributes in the "builders" section of the 'template.json' file so that Packer uses a custom VPC and subnets while creating the resources (see the sketch after the list):
▸ vpc_id
▸ subnet_id
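For example, the relevant fragment of the amazon-ebs builder would look like the one already shown in the question (IDs are placeholders):
"builders": [
  {
    "type": "amazon-ebs",
    "region": "eu-west-1",
    "vpc_id": "vpc-xxxxxx",
    "subnet_id": "subnet-xxxxxx",
    "associate_public_ip_address": "true"
  }
]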
I'm trying to push a very simple Rails app to a DigitalOcean droplet. Unfortunately, I'm unable to continue the deployment: I get stuck with a very elusive error message:
info: [agent] get agent status
info: [agent] agent is running: true
info: [agent] get agent status
info: [agent] agent is running: true
info: [agent] get agent status
info: [agent] agent is running: true
info: [agent] get agent status
info: [agent] agent is running: true
info: Connecting to http://192.168.50.4:2375
PLAY [all] *********************************************************************
TASK [setup] *******************************************************************
fatal: [default]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "/bin/sh: 1: /usr/bin/python: not found\r\n", "msg": "MODULE FAILURE", "parsed": false}
NO MORE HOSTS LEFT *************************************************************
to retry, use: --limit #playbooks/setup.retry
PLAY RECAP *********************************************************************
default : ok=0 changed=0 unreachable=0 failed=1
Am I the only one who has ever encountered this problem?
Here is my Azkfile too:
/**
* Documentation: http://docs.azk.io/Azkfile.js
*/
// Adds the systems that shape your system
systems({
'apptelier-website': {
// Dependent systems
depends: [],
// More images: http://images.azk.io
image: {docker: 'azukiapp/ruby:2.3.0'},
// Steps to execute before running instances
provision: [
"bundle install --path /azk/bundler"
],
workdir: "/azk/#{manifest.dir}",
shell: "/bin/bash",
command: ["bundle", "exec", "rackup", "config.ru", "--pid", "/tmp/ruby.pid", "--port", "$HTTP_PORT", "--host", "0.0.0.0"],
wait: 20,
mounts: {
'/azk/#{manifest.dir}': sync("."),
'/azk/bundler': persistent("./bundler"),
'/azk/#{manifest.dir}/tmp': persistent("./tmp"),
'/azk/#{manifest.dir}/log': path("./log"),
'/azk/#{manifest.dir}/.bundle': path("./.bundle")
},
scalable: {"default": 1},
http: {
domains: [
'#{env.HOST_DOMAIN}', // used if deployed
'#{env.HOST_IP}', // used if deployed
'#{system.name}.#{azk.default_domain}' // default azk domain
]
},
ports: {
// exports global variables
http: "3000/tcp"
},
envs: {
// Make sure that the PORT value is the same as the one
// in ports/http below, and that it's also the same
// if you're setting it in a .env file
RUBY_ENV: "production",
RAILS_ENV: "production",
RACK_ENV: 'production',
WORKER_RETRY: 1,
BUNDLE_APP_CONFIG: '/azk/bundler',
APP_URL: '#{system.name}.#{azk.default_domain}'
}
},
deploy: {
image: {docker: 'azukiapp/deploy-digitalocean'},
mounts: {
'/azk/deploy/src': path('.'),
'/azk/deploy/.ssh': path('#{env.HOME}/.ssh'), // Required to connect with the remote server
'/azk/deploy/.config': persistent('deploy-config')
},
// This is not a server. Just run it with `azk deploy`
scalable: {default: 0, limit: 0},
envs: {
GIT_REF: 'master',
AZK_RESTART_COMMAND: 'azk restart -Rvv',
BOX_SIZE: '512mb'
}
}
});
Thanks for the help!
Edouard, nice to meet you. I'm from the azk core team, and I'm not sure whether you're using the latest deployment image.
Please follow these steps to update it:
adocker pull azukiapp/deploy-digitalocean:0.0.7;
Edit the deploy system in your Azkfile and add the tag 0.0.7 to the deployment image to ensure you're using the latest one. It should look like image: {docker: 'azukiapp/deploy-digitalocean:0.0.7'}; (a full sketch of the updated deploy block follows these steps).
Next, be sure you have the env var DEPLOY_API_TOKEN set in your .env file. If you don't have it set yet, take a look at Step 7 of the article we published on the DigitalOcean Community Tutorials: https://www.digitalocean.com/community/tutorials/how-to-deploy-a-rails-app-with-azk#step-7-%E2%80%94-obtaining-a-digitalocean-api-token
Finally, re-run the deploy command:
azk deploy clear-cache;
azk deploy
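For reference, here is the sketch mentioned above: the deploy system from your Azkfile with only the image line changed to pin the tag:
deploy: {
  image: {docker: 'azukiapp/deploy-digitalocean:0.0.7'}, // pinned deployment image
  mounts: {
    '/azk/deploy/src': path('.'),
    '/azk/deploy/.ssh': path('#{env.HOME}/.ssh'), // required to connect to the remote server
    '/azk/deploy/.config': persistent('deploy-config')
  },
  // This is not a server. Just run it with `azk deploy`
  scalable: {default: 0, limit: 0},
  envs: {
    GIT_REF: 'master',
    AZK_RESTART_COMMAND: 'azk restart -Rvv',
    BOX_SIZE: '512mb'
  }
}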
Please let me know if this is enough to solve your problem.