Data Prepper Pipelines + OpenSearch Trace Analytics - amazon-web-services

I'm using the latest version of AWS OpenSearch, but when I go to the Trace Analytics dashboard it does not show the traces sent by Data Prepper.
Manually OpenTelemetry-instrumented application
Data Prepper is running in Docker (opensearchproject/data-prepper:latest)
OpenSearch is running on the latest version
Sample Configuration
data-prepper-config.yaml
ssl: false
pipelines.yaml
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    - otel_trace_raw:
  sink:
    - opensearch:
        hosts: [ "https://opensearch-domain" ]
        username: "admin"
        password: "admin"
        index_type: trace-analytics-raw
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: ["https://opensearch-domain"]
        username: "admin"
        password: "admin"
        index_type: trace-analytics-service-map
remote-collector.yaml
...
exporters:
  otlp/data-prepper:
    endpoint: data-prepper-address:21890
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/data-prepper]
When I go to Query Workbench and run the query SELECT * FROM otel-v1-apm-span, I get the list of received trace spans. But I can't see any charts on the Trace Analytics dashboard (neither Traces nor Services); it's just an empty dashboard.
I'm also getting a warning:
WARN org.opensearch.dataprepper.plugins.processor.oteltrace.OTelTraceRawProcessor - Missing trace group for SpanId: xxxxxxxxxxxx
The traceGroupFields are also empty.
"traceGroupFields": {
"endTime": null,
"durationInNanos": null,
"statusCode": null
}
Is there something wrong with my setup? Any help is appreciated.

Related

client disconnected before any response GCLB

We deployed our site behind GCLB:
LB -> Cloud Run -> App Engine API
Cloud Run is hosting a React site, and App Engine hosts a Golang API.
After 12 hours we started to see a decline in the number of clicks in Google Analytics, but traffic stayed pretty much the same.
Our assumption is that we are "losing" traffic somehow. I can see two main issues in the logs:
404s for addresses of old site components.
client disconnected before any response errors.
I can understand the 404 errors; they are cached requests looking for old site components.
But I don't understand the client disconnected error, and whether it's related to our "lost" traffic.
Any suggestions on how to analyze our "lost" traffic?
UPDATE:
I found some correlation with the client disconnected error.
The requestUrl contains image resources, for example:
images/zoom.png?v1.0
The backend service name is empty: backend_service_name: ""
Not sure how it can be empty; I mapped all the resources and hosts.
LOG
{
  "insertId": "cs2fmdg2eo8nba",
  "jsonPayload": {
    "cacheId": "FRA-1209ea83",
    "#type": "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry",
    "statusDetails": "client_disconnected_before_any_response"
  },
  "httpRequest": {
    "requestMethod": "GET",
    "requestUrl": "https://travelpricedrops.com/images/aero.png?v1.0",
    "requestSize": "78",
    "userAgent": "Mozilla/5.0 (iPhone; CPU iPhone OS 14_8 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1",
    "remoteIp": "109.104.52.1",
    "referer": "https://travelpricedrops.com/passthru?tab=front&vert=flights&origin-iata=LEJ&destination-iata=JFK&departure-time=2021-12-26T11%3A00%3A00Z&cabin-class=economy&num-adults=1&num-youth=0&rental-duration=6&dta=48&return-time=2022-01-01T11%3A00%3A00Z&f=cf&fuid=1102&b=k&buid=1043",
    "cacheLookup": true,
    "latency": "0.071958s"
  },
  "resource": {
    "type": "http_load_balancer",
    "labels": {
      "zone": "global",
      "backend_service_name": "",
      "forwarding_rule_name": "tpd-int-https-ipv4",
      "target_proxy_name": "int-tpd-target-proxy-2",
      "url_map_name": "int-tpd",
      "project_id": "tpdrops"
    }
  },
  "timestamp": "2021-11-09T06:13:55.121455Z",
  "severity": "INFO",
  "logName": "projects/tpdrops/logs/requests",
  "trace": "projects/tpdrops/traces/13821ba38ae9e3191381f3f64b0a7b1a",
  "receiveTimestamp": "2021-11-09T06:13:55.343086132Z",
  "spanId": "a5ae86336a24bc32"
}
Config
**gcloud compute forwarding-rules describe tpd-int-https-ipv4**
IPAddress: 34.149.93.11
IPProtocol: TCP
creationTimestamp: '2021-08-30T11:49:06.047-07:00'
description: ''
fingerprint: CIAg3TcEb9Y=
id: '1815919129513727693'
kind: compute#forwardingRule
labelFingerprint: 42WmSpB8rSM=
loadBalancingScheme: EXTERNAL
name: tpd-int-https-ipv4
networkTier: PREMIUM
portRange: 443-443
selfLink: https://www.googleapis.com/compute/v1/projects/tpdrops/global/forwardingRules/tpd-int-https-ipv4
target: https://www.googleapis.com/compute/v1/projects/tpdrops/global/targetHttpsProxies/int-tpd-target-proxy-2
**gcloud compute backend-services describe tpd-prod-back**
affinityCookieTtlSec: 0
backends:
- balancingMode: UTILIZATION
  capacityScaler: 0.0
  group: https://www.googleapis.com/compute/v1/projects/tpdrops/regions/us-central1/networkEndpointGroups/tpd-front
cdnPolicy:
  cacheKeyPolicy:
    includeHost: true
    includeProtocol: true
    includeQueryString: true
  cacheMode: CACHE_ALL_STATIC
  clientTtl: 3600
  defaultTtl: 3600
  maxTtl: 86400
  negativeCaching: false
  requestCoalescing: true
  serveWhileStale: 86400
  signedUrlCacheMaxAgeSec: '0'
connectionDraining:
  drainingTimeoutSec: 0
creationTimestamp: '2021-10-25T04:09:29.908-07:00'
description: ''
enableCDN: true
fingerprint: 5FNZk6GXJTw=
iap:
  enabled: false
id: '6357784085114072710'
kind: compute#backendService
loadBalancingScheme: EXTERNAL
logConfig:
  enable: true
  sampleRate: 1.0
name: tpd-prod-back
port: 80
portName: http
protocol: HTTP
selfLink: https://www.googleapis.com/compute/v1/projects/tpdrops/global/backendServices/tpd-prod-back
sessionAffinity: NONE
timeoutSec: 30
**gcloud compute url-maps describe int-tpd**
creationTimestamp: '2021-08-29T06:08:35.918-07:00'
defaultService: https://www.googleapis.com/compute/v1/projects/tpdrops/global/backendServices/tpd-prod-back
fingerprint: trtG9xBMlvE=
hostRules:
- hosts:
  - acpt.travelpricedrops.com
  pathMatcher: path-matcher-2
- hosts:
  - int.travelpricedrops.com
  pathMatcher: path-matcher-1
- hosts:
  - api.acpt.travelpricedrops.com
  pathMatcher: path-matcher-3
- hosts:
  - api.int.travelpricedrops.com
  pathMatcher: path-matcher-4
- hosts:
  - api.travelpricedrops.com
  pathMatcher: path-matcher-5
- hosts:
  - travelpricedrops.com
  pathMatcher: path-matcher-6
id: '6018005644614187068'
kind: compute#urlMap
name: int-tpd
pathMatchers:
- defaultService: https://www.googleapis.com/compute/v1/projects/tpdrops/global/backendServices/tpd-acpt-back
  name: path-matcher-2
- defaultService: https://www.googleapis.com/compute/v1/projects/tpdrops/global/backendServices/tpd-int-http
  name: path-matcher-1
- defaultService: https://www.googleapis.com/compute/v1/projects/tpdrops/global/backendServices/tpd-api-acpt
  name: path-matcher-3
- defaultService: https://www.googleapis.com/compute/v1/projects/tpdrops/global/backendServices/tpd-api-int
  name: path-matcher-4
- defaultService: https://www.googleapis.com/compute/v1/projects/tpdrops/global/backendServices/tpd-api
  name: path-matcher-5
- defaultService: https://www.googleapis.com/compute/v1/projects/tpdrops/global/backendServices/tpd-prod-back
  name: path-matcher-6
selfLink: https://www.googleapis.com/compute/v1/projects/tpdrops/global/urlMaps/int-tpd
**gcloud compute target-http-proxies describe int-tpd-target-proxy-2**
ERROR: (gcloud.compute.target-http-proxies.describe) Could not fetch resource:
- The resource 'projects/tpdrops/global/targetHttpProxies/int-tpd-target-proxy-2' was not found
Your load balancer's configuration looks OK; you have an SSL-secured HTTPS frontend on port 443 pointing to an HTTP backend on port 80, which means that SSL is terminated at the load balancer and traffic is sent as plain HTTP to your backend.
The error you're getting means (per the documentation) that the client disconnected before the load balancer could reply:
client_disconnected_before_any_response - The connection to the client was broken before the load balancer sent any response.
Now, to answer your questions:
1. Since the images are served directly by your app (I didn't see any host/path rules saying otherwise), make sure the application can serve the images in time. Set your application response timeout to 10 seconds or more and this should solve the issue. Have a look at this discussion, which may be quite useful for you.
1.1. There is also a configurable request timeout for Cloud Run services; you can check it by running gcloud run services describe SERVICE_NAME (see the sketch at the end of this answer).
2. The backend_service_name: "" string you mentioned may be empty; nothing to worry about, this is expected behavior.
Additionally, have a look at the backend service timeout section of Timeouts and retries in external load balancing, which may also shed some light on your case.
Lastly, have a look at How to debug failed requests with client_disconnected_before_any_response.
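As a rough sketch of point 1.1: the request timeout shows up in the YAML that gcloud run services describe SERVICE_NAME --format yaml returns, and it can be raised with gcloud run services update SERVICE_NAME --timeout=SECONDS. The service and image names below are made up for illustration, not taken from your setup:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: tpd-front                      # hypothetical Cloud Run service name
spec:
  template:
    spec:
      timeoutSeconds: 300              # per-request timeout; raise it if image responses are slow
      containers:
      - image: gcr.io/tpdrops/react-site   # hypothetical image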

GCP Deployment Manager "ResourceErrorCode":"400" while database user creation

I am experimenting with Deployment Manager, and each time I try to deploy a SQL instance with a DB on it and 2 users, some of the tasks fail. Most of the time it is the users:
conf.yaml:
resources:
- name: mycloudsql
  type: gcp-types/sqladmin-v1beta4:instances
  properties:
    name: mycloudsql-01
    backendType: SECOND_GEN
    instanceType: CLOUD_SQL_INSTANCE
    databaseVersion: MYSQL_5_7
    region: europe-west6
    settings:
      tier: db-f1-micro
      locationPreference:
        zone: europe-west6-a
      activationPolicy: ALWAYS
      dataDiskSizeGb: 10
- name: mydjangodb
  type: gcp-types/sqladmin-v1beta4:databases
  properties:
    name: django-db-01
    instance: $(ref.mycloudsql.name)
    charset: utf8
- name: sqlroot
  type: gcp-types/sqladmin-v1beta4:users
  properties:
    name: root
    host: "%"
    instance: $(ref.mycloudsql.name)
    password: root
- name: sqluser
  type: gcp-types/sqladmin-v1beta4:users
  properties:
    name: user
    instance: $(ref.mycloudsql.name)
    password: user
Error:
PS C:\Users\user\Desktop\Python\GCP> gcloud --project=sound-catalyst-263911 deployment-manager deployments create dm-sql-test-11 --config conf.yaml
The fingerprint of the deployment is TZ_wYom9Q64Hno6X0bpv9g==
Waiting for create [operation-1589869946223-5a5fa71623bc9-1912fcb9-bc59aafc]...failed.
ERROR: (gcloud.deployment-manager.deployments.create) Error in Operation [operation-1589869946223-5a5fa71623bc9-1912fcb9-bc59aafc]: errors:
- code: RESOURCE_ERROR
location: /deployments/dm-sql-test-11/resources/sqluser
message: '{"ResourceType":"gcp-types/sqladmin-v1beta4:users","ResourceErrorCode":"400","ResourceErrorMessage":{"code":400,"message":"Precondition
check failed.","status":"FAILED_PRECONDITION","statusMessage":"Bad Request","requestPath":"https://www.googleapis.com/sql/v1beta4/projects/sound-catalyst-263911/instances/mycloudsql-01/users","httpMethod":"POST"}}'
- code: RESOURCE_ERROR
location: /deployments/dm-sql-test-11/resources/sqlroot
message: '{"ResourceType":"gcp-types/sqladmin-v1beta4:users","ResourceErrorCode":"400","ResourceErrorMessage":{"code":400,"message":"Precondition
check failed.","status":"FAILED_PRECONDITION","statusMessage":"Bad Request","requestPath":"https://www.googleapis.com/sql/v1beta4/projects/sound-catalyst-263911/instances/mycloudsql-01/users","httpMethod":"POST"}}'
Console View:
It doesn't say what that failing precondition is. Or am I missing something?
It seems the installation of the database is not completed by the time Deployment Manager starts to create the users, despite the reference notation being used in the YAML code to take care of dependencies. That is why you receive the "FAILED_PRECONDITION" error.
As a workaround you can split the deployment into two parts:
Create a CloudSQL instance and a database;
Create users.
This does not look elegant, but it works.
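A minimal sketch of that second part, assuming the instance/database deployment has already finished. The file name users.yaml is just an illustration, and the instance is referenced by its literal name because $(ref...) only resolves within a single deployment:
# users.yaml - hypothetical second deployment, created after the first one succeeds
resources:
- name: sqlroot
  type: gcp-types/sqladmin-v1beta4:users
  properties:
    name: root
    host: "%"
    instance: mycloudsql-01   # literal instance name from the first deployment
    password: root
- name: sqluser
  type: gcp-types/sqladmin-v1beta4:users
  properties:
    name: user
    instance: mycloudsql-01
    password: user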
Alternatively, you can consider using Terraform. Fortunately, the Cloud Shell instance comes with Terraform pre-installed. There is sample Terraform code for Cloud SQL out there, for example this one:
CloudSQL deployment with Terraform

Google endpoint path template "Path does not match any requirement URI template."

Hi all. I created an OpenAPI spec in YAML and used it to create an endpoint that maps two Cloud Functions, using path templating to route the calls; the Google SDK CLI reports no errors.
Now I call POST https://myendpointname-3p5hncu3ha-ew.a.run.app/v1/setdndforrefcli/12588/dnd?key=[apikey], because it is mapped by the OpenAPI spec below, and it replies with "Path does not match any requirement URI template.".
I don't know why the path template in the endpoint does not work. I added path_translation: APPEND_PATH_TO_ADDRESS to stop Google using the CONSTANT_ADDRESS default, which appends the id to the query string as [name of cloud function]?GETid=12588 and overwrites query parameters with the same name.
Can somebody tell me how I can debug the endpoint or the error in the OpenAPI spec (which shows a green check OK icon in Endpoints)?
# [START swagger]
swagger: '2.0'
info:
  description: "Get data "
  title: "Cloud Endpoint + GCF"
  version: "1.0.0"
host: myendpointname-3p5hncu3ha-ew.a.run.app
# [END swagger]
basePath: "/v1"
#consumes:
#  - application/json
#produces:
#  - application/json
schemes:
  - https
paths:
  /setdndforrefcli/{id}/dnd:
    post:
      summary:
      operationId: setdndforrefcli
      parameters:
        - name: id # is the id parameter in the path
          in: path # is the parameter where is in query for rest or path for restful
          required: true
          type: integer
          format: int64
          minimum: 1
      security:
        - api_key: []
      x-google-backend:
        address: https://REGION-PROJECT-ID.cloudfunctions.net/mycloudfunction
        path_translation: APPEND_PATH_TO_ADDRESS
        protocol: h2
      responses:
        '200':
          description: A successful response
          schema:
            type: string
# [START securityDef]
securityDefinitions:
  # This section configures basic authentication with an API key.
  api_key:
    type: "apiKey"
    name: "key"
    in: "query"
# [END securityDef]
I had the same error, but after doing some tests I was able to use path templating (/endpoint/{id}) successfully. I resolved this issue as follows:
1. gcloud endpoints services deploy openapi-functions.yaml \
     --project PROJECT
Here you will get a new service configuration that you will use in the next steps.
2. chmod +x gcloud_build_image
   ./gcloud_build_image -s SERVICE \
     -c NEW_SERVICE_CONFIGURATION -p PROJECT
It is very important to change the service configuration with every new deployment of the managed service.
3. gcloud run deploy SERVICE \
     --image="gcr.io/PROJECT/endpoints-runtime-serverless:SERVICE-NEW_SERVICE_CONFIGURATION" \
     --allow-unauthenticated \
     --platform managed \
     --project=PROJECT

GCP Deployment Manager Delete RESOURCE_ERROR

I created a Deployment Manager template (Python) to create a GKE zonal cluster (a v1beta1 feature). When I run gcloud deployment-manager deployments create <deploymentname> --config <config.yaml>, the GKE cluster is created as expected.
I used type: gcp-types/container-v1beta1:projects.zones.clusters in my Python template.
However, when I run the delete command on DM, i.e. gcloud deployment-manager deployments delete <deploymentname>, I get the following error.
The error says that the field name could not be found, even though I did specify name in my config.yaml file:
Error in Operation [operation-1536152440470-5751f5c88f9f3-5ca3a167-d12a593d]: errors:
- code: RESOURCE_ERROR
  location: /deployments/test-project-gke-xhqgxn6pkd/resources/test-gkecluster-xhqgxn6pkd
  message: "{"ResourceType":"gcp-types/container-v1beta1:projects.zones.clusters","ResourceErrorCode":"400","ResourceErrorMessage":{"code":400,"message":"Invalid JSON payload received. Unknown name "name": Cannot bind query parameter. Field 'name' could not be found in request message.","status":"INVALID_ARGUMENT","details":[{"#type":"type.googleapis.com/google.rpc.BadRequest","fieldViolations":[{"description":"Invalid JSON payload received. Unknown name "name": Cannot bind query parameter. Field 'name' could not be found in request message."}]}],"statusMessage":"Bad Request","requestPath":"https://container.googleapis.com/v1beta1/projects/test-project/zones/us-east1-b/clusters/","httpMethod":"GET"}}"
Here's the sample config.yaml
imports:
- path: templates/gke/gke.py
  name: gke.py
resources:
- name: ${CLUSTER_NAME}
  type: gke.py
  properties:
    zone: ${ZONE}
    cluster:
      name: ${CLUSTER_NAME}
      description: test gke cluster
      network: ${NETWORK_NAME}
      subnetwork: ${SUBNET_NAME}
      initialClusterVersion: ${CLUSTER_VERSION}
      nodePools:
      - name: ${NODEPOOL_NAME}
        initialNodeCount: ${NODE_COUNT}
        config:
          machineType: ${MACHINE_TYPE}
          diskSizeGb: 100
          imageType: cos
          oauthScopes:
          - https://www.googleapis.com/auth/compute
          - https://www.googleapis.com/auth/devstorage.read_only
          - https://www.googleapis.com/auth/logging.write
          - https://www.googleapis.com/auth/monitoring
          localSsdCount: ${LOCALSSD_COUNT}
Any ideas what I'm missing here?

Cloud Foundry Loggregator failed to establish a connection

Good Morning,
I have a problem with the CF Doppler. Using cf logs app --recent I get an error:
cf logs app FAILED Error dialing trafficcontroller server: read tcp
10.0.0.6:45719->139.25.25.233:4443: i/o timeout. Please ask your Cloud Foundry Operator to check the platform configuration
(trafficcontroller is wss://doppler.de.cloudlab.com:4443).
The same problem occurs with cf push.
We are using CF 239 and CF CLI 6.22.1.
The doppler config is:
- name: doppler
  instances: 1
  vm_type: medium
  azs: [INDIA]
  stemcell: ubuntu-trusty
  templates:
  - {name: doppler, release: cf}
  - {name: metron_agent, release: cf}
  - {name: syslog_drain_binder, release: cf}
  networks:
  - name: private
  properties:
    doppler_endpoint:
      shared_secret: password
- name: loggregator_trafficcontroller
  instances: 1
  vm_type: medium
  azs: [INDIA]
  stemcell: ubuntu-trusty
  templates:
  - {name: loggregator_trafficcontroller, release: cf}
  - {name: metron_agent, release: cf}
  - {name: route_registrar, release: cf}
  networks:
  - name: private
  properties:
    route_registrar:
      routes:
      - name: doppler
        registration_interval: 20s
        port: 8081
        uris:
        - "doppler.<%= system_domain %>"
      - name: loggregator
        registration_interval: 20s
        port: 8080
        uris:
        - "loggregator.<%= system_domain %>"
nc is able to establish a connection to the router.
nc -vz 139.25.25.233 4443
Connection to 139.25.25.233 4443 port [tcp/*] succeeded!
Any ideas?
Update:
Some more information:
$ cf curl /v2/info
{
  "name": "",
  "build": "",
  "support": "",
  "version": 0,
  "description": "CloudFoundry IN",
  "authorization_endpoint": "https://login.de.cloudlab.com",
  "token_endpoint": "https://uaa.de.cloudlab.com",
  "min_cli_version": null,
  "min_recommended_cli_version": null,
  "api_version": "2.57.0",
  "app_ssh_endpoint": "ssh.de.cloudlab.com:2222",
  "app_ssh_host_key_fingerprint": null,
  "app_ssh_oauth_client": "ssh-proxy",
  "logging_endpoint": "wss://loggregator.de.cloudlab.com:4443",
  "doppler_logging_endpoint": "wss://doppler.de.cloudlab.com:443"
}
The Cloud Foundry CLI has been upgraded to 6.22.2; because of that, you might face issues with the existing CF CLI installed on your box for pushing the app. Please try installing the latest CLI from the location below and verify:
https://github.com/cloudfoundry/cli/releases
It looks like your Cloud Foundry operator changed the Doppler endpoint port from 4443 to 443.
The cf CLI caches the Doppler endpoint URL locally, and it is still pointing at the old port.
Use cf api or cf login -a to set your endpoint again; that will refresh the locally cached data.
Finally, I found it. It was a problem with the connection to the DNS server; the DNS was not answering. After fixing that, it works.