OCI APM domain with Istio zipkin not pushing tracing details - istio

I am following this document to set up distributed tracing: https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengistio-intro-topic.htm#exploring_istio_observability
My cluster is on GKE (GCP) for testing purposes; I installed Istio on top of it, followed the document, and set up the services.
The services are up and running with Prometheus, Grafana, Jaeger and Zipkin.
It fails at the step: Performing Distributed Tracing with OCI Application Performance Monitoring.
I tried updating the ConfigMap for the sidecar injector so that I can push tracing details to the Zipkin domain.
I configured the Zipkin domain and am using public-span for now in the ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-custom-bootstrap-config
  namespace: default
data:
  custom_bootstrap.json: |
    {
      "tracing": {
        "http": {
          "name": "envoy.tracers.zipkin",
          "typed_config": {
            "@type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
            "collector_cluster": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with the data upload endpoint of your APM domain]
            "collector_endpoint": "/20200101/observations/private-span?dataFormat=zipkin&dataFormatVersion=2&dataKey=2C6YOLQSUZ5Q7IGN", // [Replace with the private data key of your APM domain. You can also use a public data key, but change the observation type to public-span]
            "collectorEndpointVersion": "HTTP_JSON",
            "trace_id_128bit": true,
            "shared_span_context": false
          }
        }
      },
      "static_resources": {
        "clusters": [{
          "name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with the data upload endpoint of your APM domain:443]
          "type": "STRICT_DNS",
          "lb_policy": "ROUND_ROBIN",
          "load_assignment": {
            "cluster_name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with the data upload endpoint of your APM domain]
            "endpoints": [{
              "lb_endpoints": [{
                "endpoint": {
                  "address": {
                    "socket_address": {
                      "address": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com", // [Replace this with the data upload endpoint of your APM domain]
                      "port_value": 443
                    }
                  }
                }
              }]
            }]
          },
          "transport_socket": {
            "name": "envoy.transport_sockets.tls",
            "typed_config": {
              "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
              "sni": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com" // [Replace this with the data upload endpoint of your APM domain]
            }
          }
        }]
      }
    }
The above ConfigMap was not working as expected: the sidecar was crashing due to the missing connect_timeout key, although after adding it to the ConfigMap the sidecar no longer showed that error.
There are no errors in the Zipkin or istiod containers, and I am not sure how to debug further.
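For reference, a few checks that can help narrow this kind of problem down (a sketch; the pod name is taken from the logs below, and the Envoy admin port 15000 is the Istio default):

# Dump the Envoy bootstrap actually rendered for the pod and inspect the tracer section
istioctl proxy-config bootstrap reviews-v1-5d6559df86-qbg6b -n default

# Confirm the APM cluster exists and resolves from inside the sidecar
kubectl exec reviews-v1-5d6559df86-qbg6b -n default -c istio-proxy -- curl -s localhost:15000/clusters | grep apm-agt

# Check the tracer stats for spans sent vs. dropped
kubectl exec reviews-v1-5d6559df86-qbg6b -n default -c istio-proxy -- curl -s localhost:15000/stats | grep -i zipkin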
Error log :
2022-03-30T05:59:33.146580Z info FLAG: --concurrency="2"
2022-03-30T05:59:33.146632Z info FLAG: --domain="default.svc.cluster.local"
2022-03-30T05:59:33.146642Z info FLAG: --help="false"
2022-03-30T05:59:33.146648Z info FLAG: --log_as_json="false"
2022-03-30T05:59:33.146672Z info FLAG: --log_caller=""
2022-03-30T05:59:33.146678Z info FLAG: --log_output_level="default:info"
2022-03-30T05:59:33.146682Z info FLAG: --log_rotate=""
2022-03-30T05:59:33.146687Z info FLAG: --log_rotate_max_age="30"
2022-03-30T05:59:33.146693Z info FLAG: --log_rotate_max_backups="1000"
2022-03-30T05:59:33.146699Z info FLAG: --log_rotate_max_size="104857600"
2022-03-30T05:59:33.146704Z info FLAG: --log_stacktrace_level="default:none"
2022-03-30T05:59:33.146715Z info FLAG: --log_target="[stdout]"
2022-03-30T05:59:33.146725Z info FLAG: --meshConfig="./etc/istio/config/mesh"
2022-03-30T05:59:33.146730Z info FLAG: --outlierLogPath=""
2022-03-30T05:59:33.146736Z info FLAG: --proxyComponentLogLevel="misc:error"
2022-03-30T05:59:33.146741Z info FLAG: --proxyLogLevel="warning"
2022-03-30T05:59:33.146747Z info FLAG: --serviceCluster="reviews.default"
2022-03-30T05:59:33.146753Z info FLAG: --stsPort="0"
2022-03-30T05:59:33.146760Z info FLAG: --templateFile=""
2022-03-30T05:59:33.146767Z info FLAG: --tokenManagerPlugin="GoogleTokenExchange"
2022-03-30T05:59:33.146784Z info Version 1.8.0-c87a4c874df27e37a3e6c25fa3d1ef6279685d23-Clean
2022-03-30T05:59:33.146991Z info Obtained private IP [10.4.1.6]
2022-03-30T05:59:33.147107Z info Apply proxy config from env {"tracing":{"zipkin":{"address":"caadc76wvdp7edddddddccclii.apm-agt.ap-mumbai-1.oci.oraclecloud.com:443"}},"proxyMetadata":{"DNS_AGENT":""}}
2022-03-30T05:59:33.148650Z info Effective config: binaryPath: /usr/local/bin/envoy
concurrency: 2
configPath: ./etc/istio/proxy
controlPlaneAuthPolicy: MUTUAL_TLS
discoveryAddress: istiod.istio-system.svc:15012
drainDuration: 45s
envoyAccessLogService: {}
envoyMetricsService: {}
parentShutdownDuration: 60s
proxyAdminPort: 15000
proxyMetadata:
DNS_AGENT: ""
serviceCluster: reviews.default
statNameLength: 189
statusPort: 15020
terminationDrainDuration: 5s
tracing:
zipkin:
address: caadc76wvdp7edddddddccclii.apm-agt.ap-mumbai-1.oci.oraclecloud.com:443
2022-03-30T05:59:33.148721Z info Proxy role: &model.Proxy{RWMutex:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, Type:"sidecar", IPAddresses:[]string{"10.4.1.6"}, ID:"reviews-v1-5d6559df86-qbg6b.default", Locality:(*envoy_config_core_v3.Locality)(nil), DNSDomain:"default.svc.cluster.local", ConfigNamespace:"", Metadata:(*model.NodeMetadata)(nil), SidecarScope:(*model.SidecarScope)(nil), PrevSidecarScope:(*model.SidecarScope)(nil), MergedGateway:(*model.MergedGateway)(nil), ServiceInstances:[]*model.ServiceInstance(nil), IstioVersion:(*model.IstioVersion)(nil), VerifiedIdentity:(*spiffe.Identity)(nil), ipv6Support:false, ipv4Support:false, GlobalUnicastIP:"", XdsResourceGenerator:model.XdsResourceGenerator(nil), WatchedResources:map[string]*model.WatchedResource(nil)}
2022-03-30T05:59:33.148732Z info JWT policy is third-party-jwt
2022-03-30T05:59:33.148777Z info PilotSAN []string{"istiod.istio-system.svc"}
2022-03-30T05:59:33.148827Z info sa.serverOptions.CAEndpoint == istiod.istio-system.svc:15012 Citadel
2022-03-30T05:59:33.148916Z info Using CA istiod.istio-system.svc:15012 cert with certs: var/run/secrets/istio/root-cert.pem
2022-03-30T05:59:33.149082Z info citadelclient Citadel client using custom root: istiod.istio-system.svc:15012 -----BEGIN CERTIFICATE-----
MIIC/DCCAeSgAwIBAgIQOzOVPb98v+UHCpf80MI1pTANBgkqhkiG9w0BAQsFADAY
MRYwFAYDVQQKEw1jbHVzdGVyLmxvY2FsMB4XDTIyMDMyOTE3MzIyOFoXDTMyMDMy
NjE3MzIyOFowGDEWMBQGA1UEChMNY2x1c3Rlci5sb2NhbDCCASIwDQYJKoZIhvcN
AQEBBQADggEPADCCAQoCggEBAO4j6Sa5VoFCUctY/ehMsFfXjejVHE05PzgaTt0x
zGK6WDLd4bQHVxiEERs2bQcPYP55T+AqBo4cyU5BFi7gEvrVdfHDMGdl4f3rhojB
RNdPLw9axyBNulOYBGIOIthpYY45fPLqvADQmU6GIUqcpg83zuwiyufbaCuElVuJ
h3eMebBQL6zsm+4BFZOTECvjMMpH/HSjOKdW/XsUU71FSVPo9q6devzLgCquZemO
kWHGjTtibwPcyRTZiL9FgBMnFF5gXe5K8FauIQlgkTDTWPj99n2FPGrfgEEC+z3q
O12NYi41zdY9RTk7f6kFHTzLRcGQ8ItG9MRebfZSfDqudCsCAwEAAaNCMEAwDgYD
VR0PAQH/BAQDAgIEMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYEFKAN5Ltn7oIN
l+9yoTfvOIvhBdTCMA0GCSqGSIb3DQEBCwUAA4IBAQCOKu1XEvJKXwRR/VNaL19L
iTIsC5csW4Dg1Z8aFQk+1UwroBbsdjCkPiwK0FJKHMoobIOtSbjn9k+OaUfv4pZo
D8dsDznqGJpkkiZ7zviwmpS3+B2YHoKFRs0ZXHu4hC081AUFjfFvFcwjtfPYKSGU
KqtxKPuvXCVGqaPdmkg5J4gG5q+Yutxno4m3VxGVocuHzXI9/Kox2Lz0C3royfF7
XoTxNy08TzkjDPuPCLqYy85zFOM7PzuuuK7ZIkdXpKbStIWLbjkciqLPzwi18JaH
eyS1/hORUC7AKMj8a3fKWrFsRiMu4Mdv+knnQ1ntLqb5Vy85VTvNFAvAB7mwD/NN
-----END CERTIFICATE-----
2022-03-30T05:59:33.219251Z info sds SDS gRPC server for workload UDS starts, listening on "./etc/istio/proxy/SDS"
2022-03-30T05:59:33.219548Z info xdsproxy Initializing with upstream address istiod.istio-system.svc:15012 and cluster Kubernetes
2022-03-30T05:59:33.219346Z info sds Start SDS grpc server
2022-03-30T05:59:33.220303Z info xdsproxy adding watcher for certificate var/run/secrets/istio/root-cert.pem
2022-03-30T05:59:33.220894Z info Starting proxy agent
2022-03-30T05:59:33.222017Z info Opening status port 15020
2022-03-30T05:59:33.222278Z info Received new config, creating new Envoy epoch 0
2022-03-30T05:59:33.222328Z info Epoch 0 starting
2022-03-30T05:59:33.239683Z info Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster reviews.default --service-node sidecar~10.4.1.6~reviews-v1-5d6559df86-qbg6b.default~default.svc.cluster.local --local-address-ip-version v4 --bootstrap-version 3 --log-format-prefix-with-location 0 --log-format %Y-%m-%dT%T.%fZ %l envoy %n %v -l warning --component-log-level misc:error --config-yaml {
"tracing": {
"http": {
"name": "envoy.tracers.zipkin",
"typed_config": {
"#type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
"collector_cluster": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com",
"collector_endpoint": "/20200101/observations/public-span?dataFormat=zipkin&dataFormatVersion=2&dataKey=MAYH36IJELZRXTEETKL7QEA7NPA5UNEI",
"collectorEndpointVersion": "HTTP_JSON",
"trace_id_128bit": true,
"shared_span_context": false
}
}
},
"static_resources": {
"clusters": [{
"name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com:443",
"connect_timeout": "5s",
"type": "STRICT_DNS",
"lb_policy": "ROUND_ROBIN",
"load_assignment": {
"cluster_name": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com",
"endpoints": [{
"lb_endpoints": [{
"endpoint": {
"address": {
"socket_address": {
"address": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com",
"port_value": 443
}
}
}
}]
}]
},
"transport_socket": {
"name": "envoy.transport_sockets.tls",
"typed_config": {
"#type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
"sni": "aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com"
}
}
}]
}
}
--concurrency 2]
2022-03-30T05:59:33.315619Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.315693Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.316469Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.316542Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2022-03-30T05:59:33.390651Z info xdsproxy Envoy ADS stream established
2022-03-30T05:59:33.391110Z info xdsproxy connecting to upstream XDS server: istiod.istio-system.svc:15012
2022-03-30T05:59:33.396461Z warning envoy main there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
2022-03-30T05:59:33.478768Z info sds resource:ROOTCA new connection
2022-03-30T05:59:33.479543Z info sds Skipping waiting for gateway secret
2022-03-30T05:59:33.479419Z info sds resource:default new connection
2022-03-30T05:59:33.479917Z info sds Skipping waiting for gateway secret
2022-03-30T05:59:33.682346Z info cache Root cert has changed, start rotating root cert for SDS clients
2022-03-30T05:59:33.682714Z info cache GenerateSecret default
2022-03-30T05:59:33.683386Z info sds resource:default pushed key/cert pair to proxy
2022-03-30T05:59:34.079948Z info cache Loaded root cert from certificate ROOTCA
2022-03-30T05:59:34.080300Z info sds resource:ROOTCA pushed root cert to proxy
2022-03-30T05:59:34.154971Z warning envoy config gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) 10.8.14.87_14250: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_20001: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.1.76_3000: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_9411: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.1.191_15021: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_9080: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.5.43_443: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_15010: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_15014: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
10.8.14.87_14268: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_80: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
0.0.0.0_9090: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
virtualInbound: envoy.tracers.zipkin: unknown cluster 'aaaabbbb.apm-agt.us-ashburn-1.oci.oraclecloud.com'
2022-03-30T05:59:35.844720Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 reje

After 2-3 days of debugging I was able to resolve the distributed tracing issue with Istio, Zipkin and OCI APM.
Note: with the root user it was not working, so I created a compartment in OCI, created an IAM policy and a group, and gave the group full access to the compartment.
I added the root user to that group and, weirdly, it started working, while with the root user directly and the default policy it did not.
Ref doc for the policy: https://docs-uat.us.oracle.com/en/cloud/paas/application-performance-monitoring/apmgn/perform-oracle-cloud-infrastructure-prerequisite-tasks.html
Working sidecar ConfigMap
The connect_timeout key is required; without it the sidecar fails, and as a result the pods never reach the Ready state. The :443 suffix on the cluster name, as shown in the official documentation, is not required.
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-custom-bootstrap-config
data:
  custom_bootstrap.json: |
    {
      "tracing": {
        "http": {
          "name": "envoy.tracers.zipkin",
          "typed_config": {
            "@type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
            "collector_cluster": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com",
            "collector_endpoint": "/20200101/observations/public-span?dataFormat=zipkin&dataFormatVersion=2&dataKey=M7SOSHXXXXXXXXXXXXXXXXXXXUZEHOGRSA",
            "collector_endpoint_version": "HTTP_JSON",
            "trace_id_128bit": true,
            "shared_span_context": false
          }
        }
      },
      "static_resources": {
        "clusters": [{
          "name": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com",
          "type": "STRICT_DNS",
          "connect_timeout": "5s",
          "lb_policy": "ROUND_ROBIN",
          "load_assignment": {
            "cluster_name": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com",
            "endpoints": [{
              "lb_endpoints": [{
                "endpoint": {
                  "address": {
                    "socket_address": {
                      "address": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com",
                      "port_value": 443
                    }
                  }
                }
              }]
            }]
          },
          "transport_socket": {
            "name": "envoy.transport_sockets.tls",
            "typed_config": {
              "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext",
              "sni": "aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com"
            }
          }
        }]
      }
    }
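The custom bootstrap only takes effect on pods that are annotated to use it; for completeness, a sketch of wiring it up (the deployment name is a placeholder; the annotation is the one the custom-bootstrap approach relies on):

kubectl apply -f custom-bootstrap.yaml

# Point the injected sidecar at the custom bootstrap ConfigMap, then restart
# the workload so the pods are re-injected with it
kubectl patch deployment reviews-v1 --type merge -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.istio.io/bootstrapOverride":"istio-custom-bootstrap-config"}}}}}'
kubectl rollout restart deployment reviews-v1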
Istio config
sampling: 100 pushes essentially all traces to Zipkin and the OCI APM domain. I also enabled enableTracing: true.
Read more at: https://istio.io/latest/docs/tasks/observability/distributed-tracing/mesh-and-proxy-config/
data:
  mesh: |-
    accessLogFile: /dev/stdout
    enableTracing: true
    defaultConfig:
      discoveryAddress: istiod.istio-system.svc:15012
      proxyMetadata: {}
      tracing:
        sampling: 100
        zipkin:
          address: aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com:443
    enablePrometheusMerge: true
    rootNamespace: istio-system
    outboundTrafficPolicy:
      mode: ALLOW_ANY
    trustDomain: cluster.local
  meshNetworks: 'networks: {}'
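The same settings can also be applied without hand-editing the istio ConfigMap; a sketch using istioctl (flag paths per the Istio doc linked above, the address is my APM data upload endpoint):

istioctl install \
  --set meshConfig.enableTracing=true \
  --set meshConfig.defaultConfig.tracing.sampling=100 \
  --set meshConfig.defaultConfig.tracing.zipkin.address=aacytncaaaaaaaal2a.apm-agt.ap-mumbai-1.oci.oraclecloud.com:443

# Existing sidecars keep their old proxy config until restarted
kubectl rollout restart deployment -n default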
OCI console

Related

Connecting KafkaSource to SASL-enabled Kafka broker of AWS MSK cluster for Knative Eventing

We are trying to implement an event-driven architecture for our applications using Knative Eventing.
We wish to connect to an Apache Kafka cluster (AWS MSK) and have those messages flow through Knative Eventing.
Using the following guide we have deployed the kind: KafkaSource, but it fails to connect to the MSK brokers when SASL authentication is enabled on the MSK cluster side.
https://knative.dev/docs/eventing/sources/kafka-source/#enabling-sasl-for-kafkasources
Note: we are able to connect over plaintext communication with no authentication.
Please suggest a way to connect the KafkaSource to MSK brokers with SASL enabled.
Please find the KafkaSource here:
apiVersion: sources.knative.dev/v1beta1
kind: KafkaSource
metadata:
  name: kafka-source
spec:
  consumerGroup: kntive-groups
  bootstrapServers:
    - my-cluster-kafka-bootstrap.kafka:9096 # MSK broker
  topics:
    - knative-input-topic
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: my-app-service
    uri: /myAppUrl
  net:
    sasl:
      enable: true
      user:
        secretKeyRef:
          name: msk-secret
          key: user
      password:
        secretKeyRef:
          name: msk-secret
          key: password
      type:
        secretKeyRef:
          name: msk-secret
          key: saslType
    tls:
      enable: false
      caCert:
        secretKeyRef:
          name: msk-secret-tf
          key: ca.crt
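For reference, a minimal sketch of the secret this manifest points at (key names are taken from the spec above, values are placeholders; the Sarama log below shows the mechanism resolving to SCRAM-SHA-512, and note that MSK SASL/SCRAM listeners on port 9096 normally require TLS to be enabled as well):

kubectl create secret generic msk-secret -n my-ns \
  --from-literal=user='msk-username' \
  --from-literal=password='msk-password' \
  --from-literal=saslType='SCRAM-SHA-512'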
Please find the logs here. (Notice that it always throws one common error irrespective of any mistake in the code: "client has run out of available brokers to talk to". But the brokers are actually reachable.)
kubectl get kafkasource kafka-source -n my-ns -o yaml
status:
  conditions:
  - lastTransitionTime: "2023-02-03T06:10:15Z"
    message: 'kafka: client has run out of available brokers to talk to (Is your cluster
      reachable?)'
    reason: ClientCreationFailed
    status: "False"
    type: ConnectionEstablished
  - lastTransitionTime: "2023-02-03T06:10:15Z"
    status: Unknown
    type: Deployed
  - lastTransitionTime: "2023-02-03T06:10:15Z"
    status: Unknown
    type: InitialOffsetsCommitted
  - lastTransitionTime: "2023-02-03T06:10:15Z"
    message: 'kafka: client has run out of available brokers to talk to (Is your cluster
      reachable?)'
    reason: ClientCreationFailed
    status: "False"
    type: Ready
  - lastTransitionTime: "2023-02-03T06:10:15Z"
    status: "True"
    type: SinkProvided
Logs:
kubectl logs deployment.apps/kafka-controller-manager -n knative-eventing
{
"level": "info",
"ts": "2023-02-03T06:21:18.865Z",
"logger": "kafka-controller",
"caller": "client/config.go:288",
"msg": "Built Sarama config: &{Admin:{Retry:{Max:5 Backoff:100ms} Timeout:3s} Net:{MaxOpenRequests:5 DialTimeout:30s ReadTimeout:30s WriteTimeout:30s TLS:{Enable:false Config:<nil>} SASL:{Enable:true Mechanism:SCRAM-SHA-512 Version:0 Handshake:true AuthIdentity: User:maas-ml-testuser2 Password: SCRAMAuthzID: SCRAMClientGeneratorFunc:0x1861980 TokenProvider:<nil> GSSAPI:{AuthType:0 KeyTabPath: KerberosConfigPath: ServiceName: Username: Password: Realm: DisablePAFXFAST:false}} KeepAlive:0s LocalAddr:<nil> Proxy:{Enable:false Dialer:<nil>}} Metadata:{Retry:{Max:3 Backoff:250ms BackoffFunc:<nil>} RefreshFrequency:10m0s Full:true Timeout:0s AllowAutoTopicCreation:true} Producer:{MaxMessageBytes:1000000 RequiredAcks:1 Timeout:10s Compression:none CompressionLevel:-1000 Partitioner:0x17cf660 Idempotent:false Return:{Successes:true Errors:true} Flush:{Bytes:0 Messages:0 Frequency:0s MaxMessages:0} Retry:{Max:3 Backoff:100ms BackoffFunc:<nil>} Interceptors:[]} Consumer:{Group:{Session:{Timeout:10s} Heartbeat:{Interval:3s} Rebalance:{Strategy:0x2f67290 Timeout:1m0s Retry:{Max:4 Backoff:2s}} Member:{UserData:[]}} Retry:{Backoff:2s BackoffFunc:<nil>} Fetch:{Min:1 Default:1048576 Max:0} MaxWaitTime:250ms MaxProcessingTime:100ms Return:{Errors:true} Offsets:{CommitInterval:0s AutoCommit:{Enable:true Interval:1s} Initial:-2 Retention:0s Retry:{Max:3}} IsolationLevel:0 Interceptors:[]} ClientID:sarama RackID: ChannelBufferSize:256 ApiVersionsRequest:true Version:1.0.0 MetricRegistry:0xc002ca4080}",
"commit": "394f005-dirty",
"knative.dev/controller": "knative.dev.eventing-kafka.pkg.source.reconciler.source.Reconciler",
"knative.dev/kind": "sources.knative.dev.KafkaSource",
"knative.dev/traceid": "4d6b80c4-2116-4acb-b5bc-e1d074c2a380",
"knative.dev/key": "coal-dev/uat-kafka-source"
}
{
"level": "error",
"ts": "2023-02-03T06:21:19.654Z",
"logger": "kafka-controller",
"caller": "source/kafkasource.go:184",
"msg": "unable to create a kafka client",
"commit": "394f005-dirty",
"knative.dev/controller": "knative.dev.eventing-kafka.pkg.source.reconciler.source.Reconciler",
"knative.dev/kind": "sources.knative.dev.KafkaSource",
"knative.dev/traceid": "4d6b80c4-2116-4acb-b5bc-e1d074c2a380",
"knative.dev/key": "coal-dev/uat-kafka-source",
"error": "kafka: client has run out of available brokers to talk to (Is your cluster reachable?)",
"stacktrace": "knative.dev/eventing-kafka/pkg/source/reconciler/source.(*Reconciler).ReconcileKind\n\tknative.dev/eventing-kafka/pkg/source/reconciler/source/kafkasource.go:184\nknative.dev/eventing-kafka/pkg/client/injection/reconciler/sources/v1beta1/kafkasource.(*reconcilerImpl).Reconcile\n\tknative.dev/eventing-kafka/pkg/client/injection/reconciler/sources/v1beta1/kafkasource/reconciler.go:239\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/pkg#v0.0.0-20220818004048-4a03844c0b15/controller/controller.go:542\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/pkg#v0.0.0-20220818004048-4a03844c0b15/controller/controller.go:491"
}
{
"level": "error",
"ts": "2023-02-03T06:21:19.655Z",
"logger": "kafka-controller",
"caller": "kafkasource/reconciler.go:302",
"msg": "Returned an error",
"commit": "394f005-dirty",
"knative.dev/controller": "knative.dev.eventing-kafka.pkg.source.reconciler.source.Reconciler",
"knative.dev/kind": "sources.knative.dev.KafkaSource",
"knative.dev/traceid": "4d6b80c4-2116-4acb-b5bc-e1d074c2a380",
"knative.dev/key": "coal-dev/uat-kafka-source",
"targetMethod": "ReconcileKind",
"error": "kafka: client has run out of available brokers to talk to (Is your cluster reachable?)",
"stacktrace": "knative.dev/eventing-kafka/pkg/client/injection/reconciler/sources/v1beta1/kafkasource.(*reconcilerImpl).Reconcile\n\tknative.dev/eventing-kafka/pkg/client/injection/reconciler/sources/v1beta1/kafkasource/reconciler.go:302\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/pkg#v0.0.0-20220818004048-4a03844c0b15/controller/controller.go:542\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/pkg#v0.0.0-20220818004048-4a03844c0b15/controller/controller.go:491"
}
{
"level": "error",
"ts": "2023-02-03T06:21:19.655Z",
"logger": "kafka-controller",
"caller": "controller/controller.go:566",
"msg": "Reconcile error",
"commit": "394f005-dirty",
"knative.dev/controller": "knative.dev.eventing-kafka.pkg.source.reconciler.source.Reconciler",
"knative.dev/kind": "sources.knative.dev.KafkaSource",
"knative.dev/traceid": "4d6b80c4-2116-4acb-b5bc-e1d074c2a380",
"knative.dev/key": "coal-dev/uat-kafka-source",
"duration": 0.813508291,
"error": "kafka: client has run out of available brokers to talk to (Is your cluster reachable?)",
"stacktrace": "knative.dev/pkg/controller.(*Impl).handleErr\n\tknative.dev/pkg#v0.0.0-20220818004048-4a03844c0b15/controller/controller.go:566\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/pkg#v0.0.0-20220818004048-4a03844c0b15/controller/controller.go:543\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/pkg#v0.0.0-20220818004048-4a03844c0b15/controller/controller.go:491"
}
{
"level": "info",
"ts": "2023-02-03T06:21:19.655Z",
"logger": "kafka-controller.event-broadcaster",
"caller": "record/event.go:285",
"msg": "Event(v1.ObjectReference{Kind:\"KafkaSource\", Namespace:\"coal-dev\", Name:\"uat-kafka-source\", UID:\"1b6dd5c4-539a-424a-811c-fd16a5d2468d\", APIVersion:\"sources.knative.dev/v1beta1\", ResourceVersion:\"56774522\", FieldPath:\"\"}): type: 'Warning' reason: 'InternalError' kafka: client has run out of available brokers to talk to (Is your cluster reachable?)",
"commit": "394f005-dirty"
}

Data Prepper Pipelines + OpenSearch Trace Analytics

I'm using the latest version of AWS OpenSearch, but when I go to the Trace Analytics dashboard it does not show the traces sent by Data Prepper.
Manual OpenTelemetry instrumented application
Data Prepper is running in Docker (opensearchproject/data-prepper:latest)
OpenSearch is running on the latest version
Sample Configuration
data-prepper-config.yaml
ssl: false
pipelines.yaml
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    - otel_trace_raw:
  sink:
    - opensearch:
        hosts: [ "https://opensearch-domain" ]
        username: "admin"
        password: "admin"
        index_type: trace-analytics-raw
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: ["https://opensearch-domain"]
        username: "admin"
        password: "admin"
        index_type: trace-analytics-service-map
remote-collector.yaml
...
exporters:
  otlp/data-prepper:
    endpoint: data-prepper-address:21890
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/data-prepper]
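For context, a sketch of how these files are typically mounted when running the Data Prepper container (the in-container paths are assumptions based on recent opensearchproject/data-prepper images; 21890 is the otel_trace_source default port):

docker run --name data-prepper -p 21890:21890 \
  -v $(pwd)/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml \
  -v $(pwd)/data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml \
  opensearchproject/data-prepper:latest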
When I go to the Query Workbench and run the query SELECT * FROM otel-v1-apm-span, I get the list of received trace spans, but I'm unable to see anything on the Trace Analytics dashboard (both Traces and Services). It's just an empty dashboard.
I'm also getting a warning:
WARN org.opensearch.dataprepper.plugins.processor.oteltrace.OTelTraceRawProcessor - Missing trace group for SpanId: xxxxxxxxxxxx
The traceGroupFields are also empty:
"traceGroupFields": {
"endTime": null,
"durationInNanos": null,
"statusCode": null
}
Is there something wrong with my setup? Any help is appreciated.
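In case it helps anyone debugging the same setup, the two index patterns behind the dashboard can be checked directly (a sketch; the endpoint and credentials are the placeholders used above). The Traces page reads the raw spans and the Services page reads the service map, so both need data:

# Raw spans (backs the Traces page)
curl -s -u admin:admin "https://opensearch-domain/_cat/indices/otel-v1-apm-span*?v"

# Service map (backs the Services page); if this is empty, service_map_stateful
# is not receiving or emitting anything
curl -s -u admin:admin "https://opensearch-domain/_cat/indices/otel-v1-apm-service-map*?v"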

How to publish kubernetes LoadBalancer Ingress URL to aws route53

Today when I launch an app using Kubernetes over AWS, it exposes a publicly visible LoadBalancer Ingress URL; however, to link that to my domain and make the app accessible to the public, I need to manually go into the AWS Route53 console in a browser on every launch. Can I update the AWS Route53 type A resource record to match the latest Kubernetes LoadBalancer Ingress URL from the command line?
Kubernetes over gcloud shares this challenge of having to either predefine a static IP, which is used in the launch config, or manually do browser-based domain linkage after launch. On AWS I was hoping I could use something similar to this from the command line:
aws route53domains update-domain-nameservers ???
OR: can I predefine an AWS Kubernetes LoadBalancer Ingress, similar to using a predefined static IP on gcloud?
To show the deployed app's LoadBalancer Ingress URL, issue:
kubectl describe svc
... output
Name: aaa-deployment-407
Namespace: ruptureofthemundaneplane
Labels: app=bbb
pod-template-hash=4076262206
Selector: app=bbb,pod-template-hash=4076262206
Type: LoadBalancer
IP: 10.0.51.82
LoadBalancer Ingress: a244bodhisattva79c17cf7-61619.us-east-1.elb.amazonaws.com
Port: port-1 80/TCP
NodePort: port-1 32547/TCP
Endpoints: 10.201.0.3:80
Port: port-2 443/TCP
NodePort: port-2 31248/TCP
Endpoints: 10.201.0.3:443
Session Affinity: None
No events.
UPDATE:
I'm getting an error trying a new command line technique (hat tip to the comment from error2007s) ... issue this:
aws route53 list-hosted-zones
... outputs
{
  "HostedZones": [
    {
      "ResourceRecordSetCount": 6,
      "CallerReference": "2D58A764-1FAC-DEB4-8AC7-AD37E74B94E6",
      "Config": {
        "PrivateZone": false
      },
      "Id": "/hostedzone/Z3II3949ZDMDXV",
      "Name": "chainsawhaircut.com."
    }
  ]
}
Important bit used below: hosted zone Z3II3949ZDMDXV
Now I craft the following, using this Doc (and this Doc as well), as the file /change-resource-record-sets.json (NOTE: I can successfully change a type A record using a similar CLI call ... however, here I need to change type A with an AliasTarget of the LoadBalancer Ingress URL):
{
  "Comment": "Update record to reflect new IP address of fresh deploy",
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "chainsawhaircut.com.",
      "Type": "A",
      "TTL": 60,
      "AliasTarget": {
        "HostedZoneId": "Z3II3949ZDMDXV",
        "DNSName": "a244bodhisattva79c17cf7-61619.us-east-1.elb.amazonaws.com",
        "EvaluateTargetHealth": false
      }
    }
  }]
}
On the command line I then issue:
aws route53 change-resource-record-sets --hosted-zone-id Z3II3949ZDMDXV --change-batch file:///change-resource-record-sets.json
which gives this error message:
An error occurred (InvalidInput) when calling the ChangeResourceRecordSets operation: Invalid request
Any insights?
Here is the logic needed to update an AWS Route53 type A resource record with the value from a freshly minted Kubernetes LoadBalancer Ingress URL.
Step 1 - identify your hosted zone id by issuing:
aws route53 list-hosted-zones
... from the output, here is the clip for my domain:
"Id": "/hostedzone/Z3II3949ZDMDXV",
... importantly, never populate the JSON with hostedzone Z3II3949ZDMDXV; it is only used as a CLI parameter ... there is a second, similarly named token HostedZoneId, which is entirely different.
Step 2 - see the current value of your Route53 domain record ... issue:
aws route53 list-resource-record-sets --hosted-zone-id Z3II3949ZDMDXV --query "ResourceRecordSets[?Name == 'scottstensland.com.']"
... output
[
  {
    "AliasTarget": {
      "HostedZoneId": "Z35SXDOTRQ7X7K",
      "EvaluateTargetHealth": false,
      "DNSName": "dualstack.asomepriorvalue39e7db-1867261689.us-east-1.elb.amazonaws.com."
    },
    "Type": "A",
    "Name": "scottstensland.com."
  },
  {
    "ResourceRecords": [
      {
        "Value": "ns-1238.awsdns-26.org."
      },
      {
        "Value": "ns-201.awsdns-25.com."
      },
      {
        "Value": "ns-969.awsdns-57.net."
      },
      {
        "Value": "ns-1823.awsdns-35.co.uk."
      }
    ],
    "Type": "NS",
    "Name": "scottstensland.com.",
    "TTL": 172800
  },
  {
    "ResourceRecords": [
      {
        "Value": "ns-1238.awsdns-26.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400"
      }
    ],
    "Type": "SOA",
    "Name": "scottstensland.com.",
    "TTL": 900
  }
]
... in the above, notice the value of
"HostedZoneId": "Z35SXDOTRQ7X7K",
which is the second, similarly named token. Do NOT use the wrong Hosted Zone ID.
Step 3 - put the below into your change file aws_route53_type_A.json (for the syntax, see the Doc linked in the comment above):
{
  "Comment": "Update record to reflect new DNSName of fresh deploy",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "AliasTarget": {
          "HostedZoneId": "Z35SXDOTRQ7X7K",
          "EvaluateTargetHealth": false,
          "DNSName": "dualstack.a0b82c81f47d011e6b98a0a28439e7db-1867261689.us-east-1.elb.amazonaws.com."
        },
        "Type": "A",
        "Name": "scottstensland.com."
      }
    }
  ]
}
To identify the value for the above field "DNSName" ... after the Kubernetes app deploys on AWS, it responds with a LoadBalancer Ingress, as shown in the output of the CLI command:
kubectl describe svc --namespace=ruptureofthemundaneplane
... as in:
LoadBalancer Ingress: a0b82c81f47d011e6b98a0a28439e7db-1867261689.us-east-1.elb.amazonaws.com
... even though my goal is to execute a command line call, I can do this manually by going into the AWS console in the browser ... pull up my domain on Route53 ...
... In the browser's editable picklist text box (circled in green in the console screenshot) I noticed the URL magically gets prepended with dualstack. Previously I was missing that magic string ... so the JSON key "DNSName" wants this:
dualstack.a0b82c81f47d011e6b98a0a28439e7db-1867261689.us-east-1.elb.amazonaws.com.
Finally, execute the change request:
aws route53 change-resource-record-sets --hosted-zone-id Z3II3949ZDMDXV --change-batch file://./aws_route53_type_A.json
... output
{
  "ChangeInfo": {
    "Status": "PENDING",
    "Comment": "Update record to reflect new DNSName of fresh deploy",
    "SubmittedAt": "2016-07-13T14:53:02.789Z",
    "Id": "/change/CFUX5R9XKGE1C"
  }
}
.... now to confirm the change is live, run this to show the record:
aws route53 list-resource-record-sets --hosted-zone-id Z3II3949ZDMDXV
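Putting the steps together, the whole update can be scripted on each deploy (a sketch using the names from the examples above; note again the two different zone ids):

#!/bin/bash
# Grab the fresh ELB hostname straight from the service
ELB=$(kubectl get svc aaa-deployment-407 --namespace=ruptureofthemundaneplane \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# UPSERT the alias record; the domain's hosted zone id is the CLI parameter,
# while the ELB's own hosted zone id goes inside AliasTarget
aws route53 change-resource-record-sets --hosted-zone-id Z3II3949ZDMDXV \
  --change-batch "{
    \"Changes\": [{
      \"Action\": \"UPSERT\",
      \"ResourceRecordSet\": {
        \"Name\": \"scottstensland.com.\",
        \"Type\": \"A\",
        \"AliasTarget\": {
          \"HostedZoneId\": \"Z35SXDOTRQ7X7K\",
          \"DNSName\": \"dualstack.${ELB}.\",
          \"EvaluateTargetHealth\": false
        }
      }
    }]
  }"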
You can also use the external-dns project.
The AWS-specific setup can be found here.
After installation it can be used with an annotation, e.g.: external-dns.alpha.kubernetes.io/hostname: nginx.external-dns-test.my-org.com.
Note that the IAM permissions need to be set properly.
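A minimal sketch of a Service wired up for external-dns (the hostname is the example annotation above; other names are placeholders):

kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    external-dns.alpha.kubernetes.io/hostname: nginx.external-dns-test.my-org.com.
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
EOF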

Can't access my Kubernetes service even after exposing it with LoadBalancer

I have created a replication controller in Kubernetes with the following configuration:
{
  "kind": "ReplicationController",
  "apiVersion": "v1",
  "metadata": {
    "name": "guestbook",
    "labels": {
      "app": "guestbook"
    }
  },
  "spec": {
    "replicas": 1,
    "selector": {
      "app": "guestbook"
    },
    "template": {
      "metadata": {
        "labels": {
          "app": "guestbook"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "guestbook",
            "image": "username/fsharp-microservice:v1",
            "ports": [
              {
                "name": "http-server",
                "containerPort": 3000
              }
            ],
            "command": ["fsharpi", "/home/SuaveServer.fsx"]
          }
        ]
      }
    }
  }
}
The code of the service running on port 3000 is basically this:
#r "Suave.dll"
#r "Mono.Posix.dll"
open Suave
open Suave.Http
open Suave.Successful
open System
open System.Net
open System.Threading
open System.Diagnostics
open Mono.Unix
open Mono.Unix.Native
let app = OK "PONG"
let port = 3000us
let config =
{ defaultConfig with
bindings = [ HttpBinding.mk HTTP IPAddress.Loopback port ]
bufferSize = 8192
maxOps = 10000
}
open System.Text.RegularExpressions
let cts = new CancellationTokenSource()
let listening, server = startWebServerAsync config app
Async.Start(server, cts.Token)
Console.WriteLine("Server should be started at this point")
Console.ReadLine()
After I created the replication controller, I can see the pod:
$ kubectl create -f guestbook.json
replicationcontroller "guestbook" created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
guestbook-0b9py 1/1 Running 0 32m
I want to access my web service, so I create a service with type=LoadBalancer to expose port 3000, with the following configuration:
{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "guestbook",
    "labels": {
      "app": "guestbook"
    }
  },
  "spec": {
    "ports": [
      {
        "port": 3000,
        "targetPort": "http-server"
      }
    ],
    "selector": {
      "app": "guestbook"
    },
    "type": "LoadBalancer"
  }
}
Here is the result:
$ kubectl create -f guestbook-service.json
service "guestbook" created
$ kubectl get services
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
guestbook 10.0.82.40 3000/TCP 7s
kubernetes 10.0.0.1 <none> 443/TCP 3h
$ kubectl describe services
Name: guestbook
Namespace: default
Labels: app=guestbook
Selector: app=guestbook
Type: LoadBalancer
IP: 10.0.82.40
LoadBalancer Ingress: a43eee4a008cf11e68f210a4fa30c03e-1918213320.us-west-2.elb.amazonaws.com
Port: <unset> 3000/TCP
NodePort: <unset> 30877/TCP
Endpoints: 10.244.1.6:3000
Session Affinity: None
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
18s 18s 1 {service-controller } Normal CreatingLoadBalancer Creating load balancer
17s 17s 1 {service-controller } Normal CreatedLoadBalancer Created load balancer
Name: kubernetes
Namespace: default
Labels: component=apiserver,provider=kubernetes
Selector: <none>
Type: ClusterIP
IP: 10.0.0.1
Port: https 443/TCP
Endpoints: 172.20.0.9:443
Session Affinity: None
No events.
The "External IP" column is empty
I have tried to access the service using "LoadBalancer Ingress" but DNS name can't be resolved.
If I check in the AWS console - load balancer is created (but in the details panel there is a message "0 of 2 instances in service" because of health-checks).
I have also tried to expose my RC using kubectl expose --type=Load-Balancer, but result is the same.
What is the problem?
I fixed the error.
The problem was in the actual service: it needs to listen on 0.0.0.0 instead of 127.0.0.1 or localhost, so that it listens on every available network interface. In the Suave config above, that means binding to IPAddress.Any rather than IPAddress.Loopback. More details on the difference between 0.0.0.0 and 127.0.0.1: https://serverfault.com/questions/78048/whats-the-difference-between-ip-address-0-0-0-0-and-127-0-0-1
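A quick way to verify the fix from the outside (a sketch; whether netstat exists inside the image is an assumption, and the NodePort is the one from the service description above):

# Inside the pod the listener should now show 0.0.0.0:3000, not 127.0.0.1:3000
kubectl exec guestbook-0b9py -- netstat -tln

# From any cluster node the NodePort should answer
curl http://<node-ip>:30877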

AWS Elastic Beanstalk environment running multi-container Docker fails to start with Health: Severe

I am trying to launch an AWS Elastic Beanstalk environment running multi-container Docker, but it fails with the following list of events:
2016-01-18 16:58:57 UTC+0100 WARN Removed instance [i-a7162d2c] from your environment due to a EC2 health check failure.
2016-01-18 16:57:57 UTC+0100 WARN Environment health has transitioned from Degraded to Severe. None of the instances are sending data.
2016-01-18 16:47:58 UTC+0100 WARN Environment health has transitioned from Pending to Degraded. Command is executing on all instances. Command failed on all instances.
2016-01-18 16:43:58 UTC+0100 INFO Added instance [i-a7162d2c] to your environment.
2016-01-18 16:43:27 UTC+0100 INFO Waiting for EC2 instances to launch. This may take a few minutes.
2016-01-18 16:41:58 UTC+0100 INFO Environment health has transitioned to Pending. There are no instances.
2016-01-18 16:41:54 UTC+0100 INFO Created security group named: awseb-e-ih2exekpvz-stack-AWSEBSecurityGroup-M2O11DNNCJXW
2016-01-18 16:41:54 UTC+0100 INFO Created EIP: 52.48.132.172
2016-01-18 16:41:09 UTC+0100 INFO Using elasticbeanstalk-eu-west-1-936425941972 as Amazon S3 storage bucket for environment data.
2016-01-18 16:41:08 UTC+0100 INFO createEnvironment is starting.
Health status is marked "Severe" and I have the following logs:
98 % of CPU is in use.
Initialization failed at 2016-01-18T15:54:33Z with exit status 1 and error: Hook /opt/elasticbeanstalk/hooks/preinit/02ecs.sh failed.
. /opt/elasticbeanstalk/hooks/common.sh
/opt/elasticbeanstalk/bin/get-config container -k ecs_cluster
EB_CONFIG_ECS_CLUSTER=awseb-figure-test-ih2exekpvz
/opt/elasticbeanstalk/bin/get-config container -k ecs_region
EB_CONFIG_ECS_REGION=eu-west-1
/opt/elasticbeanstalk/bin/get-config container -k support_files_dir
EB_CONFIG_SUPPORT_FILES_DIR=/opt/elasticbeanstalk/containerfiles/support
is_baked ecs_agent
[[ -f /etc/elasticbeanstalk/baking_manifest/ecs_agent ]]
true
aws configure set default.output json
aws configure set default.region eu-west-1
echo ECS_CLUSTER=awseb-figure-test-ih2exekpvz
grep -q 'ecs start/'
initctl status ecs
initctl start ecs
ecs start/running, process 8418
TIMEOUT=120
jq -r .ContainerInstanceArn
curl http://localhost:51678/v1/metadata
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 51678: Connection refused
My configuration:
Environment type: Single Instance
Instance type: Medium
Root volume type: SSD
Root Volume Size: 8GB
Zone: EU-West
My Dockerrun.aws.json:
{
  "AWSEBDockerrunVersion": "2",
  "containerDefinitions": [
    {
      "essential": true,
      "memory": 252,
      "links": [
        "redis"
      ],
      "mountPoints": [
        {
          "containerPath": "/srv/express",
          "sourceVolume": "_WebExpress"
        },
        {
          "containerPath": "/srv/express/node_modules",
          "sourceVolume": "SrvExpressNode_Modules"
        }
      ],
      "name": "web",
      "image": "figure/web:latest",
      "portMappings": [
        {
          "containerPort": 3000,
          "hostPort": 3000
        }
      ]
    },
    {
      "essential": true,
      "memory": 252,
      "image": "redis",
      "name": "redis",
      "portMappings": [
        {
          "containerPort": 6379,
          "hostPort": 6379
        }
      ]
    }
  ],
  "family": "",
  "volumes": [
    {
      "host": {
        "sourcePath": "./web/express"
      },
      "name": "_WebExpress"
    },
    {
      "host": {
        "sourcePath": "/srv/express/node_modules"
      },
      "name": "SrvExpressNode_Modules"
    }
  ]
}
@Kilianc: if you are creating an Elastic Beanstalk application using the multi-container Docker platform, then you have to add the following list of policies to the default role "aws-elasticbeanstalk-ec2-role", or you can create your own role (a CLI sketch follows below):
AWSElasticBeanstalkWebTier
AWSElasticBeanstalkMulticontainerDocker
AWSElasticBeanstalkWorkerTier
Thanks
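A sketch of attaching those managed policies from the CLI (the role name is the default one mentioned above):

for POLICY in AWSElasticBeanstalkWebTier AWSElasticBeanstalkMulticontainerDocker AWSElasticBeanstalkWorkerTier; do
  aws iam attach-role-policy \
    --role-name aws-elasticbeanstalk-ec2-role \
    --policy-arn "arn:aws:iam::aws:policy/${POLICY}"
done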
It appears I forgot to set up the policies and permissions as described in the AWS Elastic Beanstalk documentation:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_docker_ecstutorial.html