Send logs from specific pod to external server - amazon-web-services

We need to send a very large amount of logs to a Splunk server from only one k8s pod (a pod with a huge traffic load). I looked at the docs and found this:
https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent
However, there is a note in the docs stating that this approach can cause significant resource consumption. Is there any other, more efficient option? These pods handle traffic and we cannot add extra load that could put their stability at risk...

There's an official solution to get Kubernetes logs: Splunk Connect for Kubernetes. Under the hood it also uses fluentd for the logging part.
https://github.com/splunk/splunk-connect-for-kubernetes
You will find a sample config and a methodology to test it on microK8s first to get acquainted with the config and deployment: https://mattymo.io/deploying-splunk-connect-for-kubernetes-on-microk8s-with-helm/
And if you only want logs from a specific container, you can use this section of the values file to select only the logs from the container you're interested in:
fluentd:
  # path of logfiles, default /var/log/containers/*.log
  path: /var/log/containers/*.log
  # paths of logfiles to exclude. object type is array as per fluentd specification:
  # https://docs.fluentd.org/input/tail#exclude_path
  exclude_path:
  #  - /var/log/containers/kube-svc-redirect*.log
  #  - /var/log/containers/tiller*.log
  #  - /var/log/containers/*_kube-system_*.log (to exclude `kube-system` namespace)
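Another way (a sketch, not taken from the linked docs) is to narrow path so that only the log files of the pod you care about are tailed at all; my-app and my-namespace below are placeholders for your own pod and namespace names:

fluentd:
  # Tail only the high-traffic pod's container logs; everything else on the node is ignored.
  # Container log files follow the pattern <pod>_<namespace>_<container>-<id>.log
  path: /var/log/containers/my-app*_my-namespace_*.log
  exclude_path: []

This keeps the fluentd sidecar/daemonset from reading every container's log files, which should reduce its resource footprint on the node.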


Druid can not see/read GOOGLE_APPLICATION_CREDENTIALS defined on env path

I installed apache-druid-0.22.1 as a cluster (master, data and query nodes) and enabled “druid-google-extensions” by adding it to the array druid.extensions.loadList in common.runtime.properties.
Finally, I defined GOOGLE_APPLICATION_CREDENTIALS (which has the value of the service account JSON, as defined in https://cloud.google.com/docs/authentication/production) as an environment variable of the user that runs the Druid services.
However, I got the following error when I tried to ingest data from GCS buckets:
Error: Cannot construct instance of org.apache.druid.data.input.google.GoogleCloudStorageInputSource, problem: Unable to provision, see the following errors:
1) Error in custom provider, java.io.IOException: The Application Default Credentials are not available. They are available if running on Google App Engine, Google Compute Engine, or Google Cloud Shell. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
  at org.apache.druid.common.gcp.GcpModule.getHttpRequestInitializer(GcpModule.java:60) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.common.gcp.GcpModule)
  at org.apache.druid.common.gcp.GcpModule.getHttpRequestInitializer(GcpModule.java:60) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.common.gcp.GcpModule)
  while locating com.google.api.client.http.HttpRequestInitializer for the 3rd parameter of org.apache.druid.storage.google.GoogleStorageDruidModule.getGoogleStorage(GoogleStorageDruidModule.java:114)
  at org.apache.druid.storage.google.GoogleStorageDruidModule.getGoogleStorage(GoogleStorageDruidModule.java:114) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.storage.google.GoogleStorageDruidModule)
  while locating org.apache.druid.storage.google.GoogleStorage
1 error
at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 180] (through reference chain: org.apache.druid.indexing.overlord.sampler.IndexTaskSamplerSpec["spec"]->org.apache.druid.indexing.common.task.IndexTask$IndexIngestionSpec["ioConfig"]->org.apache.druid.indexing.common.task.IndexTask$IndexIOConfig["inputSource"])
A case reported on this matter caught my attention, but I cannot see any verified solution to that case. Please help me.
We want to take data from GCP into an on-prem Druid cluster. We don't want to run the cluster in GCP, so we need to solve this problem.
For future visitors:
If you run Druid via systemd, you need to add the required environment variables to the systemd service file, to ensure they are always delivered to Druid regardless of user or environment changes.
You must define GOOGLE_APPLICATION_CREDENTIALS so that it points to a file path, not so that it contains the file content.
In a cluster (like Kubernetes), it's usual to mount a volume with the file in it, and to set the env var to point to that mounted file.
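As a minimal sketch of the systemd approach (the unit name, drop-in path and key-file path below are placeholders, not details from the question):

# /etc/systemd/system/druid.service.d/gcp-credentials.conf  (hypothetical unit and path)
[Service]
# Must point to the key file on disk, not contain the JSON itself
Environment="GOOGLE_APPLICATION_CREDENTIALS=/etc/druid/gcp/service-account.json"

After adding the drop-in, run systemctl daemon-reload and restart the Druid services so the variable is picked up.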

Where to find node logs in AWS EMR cluster?

I have a PySpark program running on an AWS EMR cluster.
The cluster config is: emr-5.31.0, Hadoop 2.10.0, Hive 2.3.7, Hue 4.7.1, Pig 0.17.0.
The program processes some files on the HDFS file system, but at some point it starts getting errors.
In the Amazon console - YARN applications - application_XXX (Spark) - executors - driver - stderr, I see:
'could not obtain block ... file=
A little before this message there is 'Task 0 in stage 35 failed 4 times. aborting job'.
If I go to the Amazon console - YARN applications - application_XXX (Spark) - stages - 35 - tasks - 0 - stdout, I don't see anything bad at first glance except a lot of 'GC (Allocation Failure)' messages.
In its stderr there is a WARN: 'Could not obtain block XXX, file= No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException.
If I go to the monitoring tab - node status - I see that one node became unhealthy at that time, and that's it. The number of nodes also changed in the 'live data nodes', 'MR total nodes', 'MR active nodes' and 'MR lost nodes' charts.
As I understand it, the task cannot find the file on HDFS because the node it was hosted on became unhealthy.
My question is: where can I find the reason the node became unhealthy? I wasn't able to find any other logs in the Amazon console. Maybe there are some node-local places where this reason is stored?
Hi, I launched an EMR cluster myself some time ago and don't remember the details about the logs, but consulting the docs here:
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-web-log-files.html
It states that the logs are stored on the machines (to which I assume you have the keys), and they are also archived to S3 by default. I'm not sure in which bucket they will be created.
Best Regards :)
On the Summary page for your EMR cluster there is a section named "Configuration details".
Below that, there is a label named "Log URI". It points to an S3 URI, but there is also a small folder icon.
Click on that icon and you can browse to the logs on the nodes for your EMR cluster.
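If you prefer the CLI, the same Log URI can be read with a describe call (the cluster ID below is a placeholder):

# Print the S3 log destination configured for the cluster (j-XXXXXXXXXXXXX is a placeholder ID)
aws emr describe-cluster --cluster-id j-XXXXXXXXXXXXX --query 'Cluster.LogUri' --output text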
Actually, for Amazon there are more logs accessible via the S3 location: there are logs for the node boot and configuration part, and logs from the services running on the node (HDFS and YARN), which is what I was looking for. The path looks like this: s3 location/cluster id/node/node id/applications - here I was able to find the HDFS and YARN logs.
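As a sketch of browsing that layout from the CLI (bucket name, cluster ID and instance ID below are placeholders, not values from the question):

# List the per-node logs; substitute your cluster's Log URI, cluster ID and EC2 instance ID
aws s3 ls s3://my-emr-logs/logs/j-XXXXXXXXXXXXX/node/
aws s3 ls s3://my-emr-logs/logs/j-XXXXXXXXXXXXX/node/i-0123456789abcdef0/applications/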

how to check if gcloud backend service/url map are ready

Is there a way to determine if a backend service is ready? I ask because I run a script that creates a backend and then a URL map that uses this backend. The problem is that I sometimes get errors saying the backend is not ready for use. I need to be able to pause until the backend is ready before I create the URL map. I could check the error response for the phrase 'is not ready', but this isn't reliable for future versions of gcloud. This is somewhat related to another post I recently made on how to reliably check for gcloud errors.
I could also say the same for the URL map: when I create a proxy that uses the URL map, I sometimes get an error saying the URL map is not ready.
Here's an example of what I'm experiencing:
gcloud compute url-maps add-path-matcher app-url-map \
    --path-matcher-name=web-path-matcher \
    --default-service=web-backend \
    --new-hosts="example.com" \
    --path-rules="/*=web-backend"

ERROR: (gcloud.compute.url-maps.add-path-matcher) Could not fetch resource:
 - The resource 'projects/my-project/global/backendServices/web-backend' is not ready

gcloud compute target-https-proxies create app-https-proxy \
    --url-map app-url-map \
    --ssl-certificates app-ssl-cert

ERROR: (gcloud.compute.target-https-proxies.create) Could not fetch resource:
 - The resource 'projects/my-project/global/urlMaps/app-url-map' is not ready
gcloud -v
Google Cloud SDK 225.0.0
beta 2018.11.09
bq 2.0.37
core 2018.11.09
gsutil 4.34
I would assume it's gcloud alpha resources list ...
See the Error Messages page of the Resource Manager documentation and scroll down to the bottom, where it reads:
notReady - The API server is not ready to accept requests.
which equals HTTP 503, SERVICE_UNAVAILABLE.
Adding the --verbosity option might provide some more details.
See the documentation.
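A possible workaround (not an official readiness API) is to simply retry the dependent gcloud call until the "is not ready" window has passed. A minimal sketch, reusing the resource names from the question:

#!/usr/bin/env bash
# Retry the url-map change until the backend service stops reporting "not ready".
for attempt in $(seq 1 30); do
  if gcloud compute url-maps add-path-matcher app-url-map \
       --path-matcher-name=web-path-matcher \
       --default-service=web-backend \
       --new-hosts="example.com" \
       --path-rules="/*=web-backend"; then
    echo "url-map updated on attempt ${attempt}"
    break
  fi
  echo "backend not ready yet, retrying in 10s..." >&2
  sleep 10
done

The same loop can wrap the target-https-proxies create call while waiting for the URL map to become ready.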

Prometheus Federation match params do not work

I have been trying to achieve federation in my Prometheus setup. While doing this, I want to exclude some metrics from being scraped by my scraper Prometheus.
Here is my federation config:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'xxxxxxxx'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job!="kubernetes-nodes"}'
    static_configs:
      - targets:
          - 'my-metrics-source'
As can be seen from the config, I want to exclude any metric that has the kubernetes-nodes job label and retrieve all the rest. However, when I deploy this config, no metrics are scraped at all.
Is this a bug in Prometheus, or have I simply misunderstood how the match params work?
If you really need to do this, you need a primary vector selector that actually includes results.
Otherwise you'll get the error: vector selector must contain at least one non-empty matcher.
So for example with these matchers you'll get what you are trying to achieve:
curl -G --data-urlencode 'match[]={job=~".+", job!="kubernetes-nodes"}' http://your-url.example.com/federate
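Translated back into the scrape config from the question, only the params section changes (shown here on its own, at the same indentation as in the config above):

    params:
      'match[]':
        - '{job=~".+", job!="kubernetes-nodes"}'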
As a safety measure to avoid you accidentally writing an instant vector that returns all the time series in your Prometheus, selectors must contain at least one matcher that does not match the empty string. Your selector has no such matcher (job!="kubernetes-nodes" matches an empty job label), so this is giving you an error.
You could add a selector such as __name__=~".+"; however, at a higher level this is an abuse of federation, as it is not meant for pulling entire Prometheus servers. See https://www.robustperception.io/federation-what-is-it-good-for/

Is it possible to edit configuration nodes in a Node-Red flow?

In Node-Red, I'm using some Amazon Web Services nodes (from module node-red-node-aws), and I would like to read some configuration settings from a file (e.g. the access key ID & the secret key for the S3 nodes), but I can't find a way to set everything up dynamically, as this configuration has to be made in a config node, which can't be used in a flow.
Is there a way to do this in Node-Red?
Thanks!
Unless a node implementation specifically allows for dynamic configuration, this is not something that Node-RED does generically.
One approach I have seen is to have a flow update itself using the runtime's admin REST API - see https://nodered.org/docs/api/admin/methods/post/flows/
That requires you to first GET the current flow configuration, modify the flow definition with the desired values, and then POST it back.
That approach is not suitable in all cases; the config node still only has a single active configuration.
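A rough sketch of that GET-modify-POST cycle, assuming a local Node-RED on the default port 1880 with no adminAuth configured:

# Fetch the currently deployed flows
curl -s http://localhost:1880/flows -o flows.json
# ... edit flows.json to set the config node's properties (e.g. with jq or a small script) ...
# Push the modified flows back to the runtime
curl -X POST -H "Content-Type: application/json" --data @flows.json http://localhost:1880/flows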
Another approach, if the configuration is statically held in a file, is to insert those values into your flow configuration before starting Node-RED - i.e., have a place-holder config node configuration in the flow that you insert the credentials into.
Finally, you can use environment variables: if you set the configuration node's property to be something like $(MY_AWS_CREDS), then the runtime will substitute that environment variable on start-up.
You can update your package.json start script to start Node-RED with your desired credentials as environment variables:
"scripts": {
"start": "AWS_SECRET_ACCESS_KEY=<SECRET_KEY> AWS_ACCESS_KEY_ID=<KEY_ID> ./node_modules/.bin/node-red -s ./settings.js"
}
This worked perfectly for me when using the node-red-contrib-aws-dynamodb node. Just leave the credentials in the node blank and they get picked up from your environment variables.