Regex to capture a deployment task - regex

Deployment task '[5.1.4] **Configure Redis, service bus and Update Databases and Samples**'
with id '04d8e453-7f22-420d' and with scenario_id '9349bff9-9e41-4c26-9a90'
Given the above text, I need a regex which should give this output:
Configure Redis, service bus and Update Databases and Samples

Assuming that the text ends with a single quote, this is the regex you are looking for:
Deployment task '\[.*?\]\s*([^']+)
And here is an example how you can grab the value:
[regex]::Match($yourString, "Deployment task '\[.*?\]\s*([^']+)").Groups[1].Value

Related

Send logs from specific pod to external server

We need to send large (very) amount of logs to Splunk server from only one k8s pod( pod with huge traffic load), I look at the docs and found this:
https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent
However, there is a Note in the docs, that is stating about a significant resource consumption. Is there any other option to do it? I mean more efficient ? As these pods handle traffic and we cannot add the additional load, that can risk it stability...
There's an official solution to get Kubernets logs: Splunk Connect for Kubernetes. Under the hood it also uses fluentd for the logging part.
https://github.com/splunk/splunk-connect-for-kubernetes
You will find a sample config and a methodology to test it on microK8s first to get acquainted with the config and deployment: https://mattymo.io/deploying-splunk-connect-for-kubernetes-on-microk8s-with-helm/
And if you only want logs from a specific container you can use this section of the values file to select only logs from the container you're interested in:
fluentd:
# path of logfiles, default /var/log/containers/*.log
path: /var/log/containers/*.log
# paths of logfiles to exclude. object type is array as per fluentd specification:
# https://docs.fluentd.org/input/tail#exclude_path
exclude_path:
# - /var/log/containers/kube-svc-redirect*.log
# - /var/log/containers/tiller*.log
# - /var/log/containers/*_kube-system_*.log (to exclude `kube-system` namespace)

AWS Waiter TasksStopped failed: taskId length should be one of

For some reason I am getting the following error:
Waiter TasksStopped failed: taskId length should be one of [32,36]
I really don't know what taskId is supposed to mean and aws documentation isn't helping. Does anyone know what is going wrong in this pipeline script?
- step:
name: Run DB migrations
script:
- >
export BackendTaskArn=$(aws cloudformation list-stack-resources \
--stack-name=${DEXB_PRODUCTION_STACK} \
--output=text \
--query="StackResourceSummaries[?LogicalResourceId=='BackendECSTask'].PhysicalResourceId")
- >
SequelizeTask=$(aws ecs run-task --cluster=${DEXB_PRODUCTION_ECS_CLUSTER} --task-definition=${BackendTaskArn} \
--overrides='{"containerOverrides":[{"name":"NodeBackend","command":["./node_modules/.bin/sequelize","db:migrate"]}]}' \
--launch-type=EC2 --output=text --query='tasks[0].taskArn')
- aws ecs wait tasks-stopped --cluster=${DEXB_PRODUCTION_ECS_CLUSTER} --tasks ${SequelizeTask}
AWS introduced a new ARN format for tasks, container instances, and services. This format now contains the cluster name, which might break scripts and applications that were counting on the ARN only containing the task resource ID.
# Previous format (taskId contains hyphens)
arn:aws:ecs:$region:$accountID:task/$taskId
# New format (taskI does not contain hyphens)
arn:aws:ecs:$region:$accountId:task/$clusterName/$taskId
Until March 31, 2021, it will be possible to opt-out of this change per-region, using https://console.aws.amazon.com/ecs/home?#/settings. In order to change the behavior for the whole account, you will need to use the Root IAM user.
It turns out I had a duplicate task running in the background. I went to the ECS clusters page and stopped the duplicate task. However this may be dangerous to do if you have used cloudformation to set up your tasks and services. Proceed cautiously if you're in the same boat.
We were bit with this cryptic error message, and what it actually means is that the task_id you're sending to cloudformation script is invalid. Task ids must have a length of 32 or 36 chars.
In our case, an undocumented change in the way AWS sent back taskArn key value was causing us to grab the incorrect value, and sending an unrelated string as the task_id. AWS detected this and blew up. So double check the task_id string and you should be good.

Prometheus Federation match params do not work

I have been trying to achive federation in my Prometheus setup. While doing this, I want to exclude some metrics to be scraped by my scraper Prometheus.
Here is my federation config:
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'xxxxxxxx'
scrape_interval: 15s
honor_labels: true
metrics_path: '/federate'
params:
'match[]':
- '{job!="kubernetes-nodes"}'
static_configs:
- targets:
- 'my-metrics-source'
As it can be seen from the config, I want to exclude any metric that has kubernetes-nodes job label, and retrieve the rest of the metrics. However, when I deploy my config, no metric is scraped.
Is it a bug in Prometheus or I simply misunderstood how the match params work?
If you really need to do this you need a primary vector selector which includes results.
Otherwise you'll get the error vector selector must contain at least one non-empty matcher.
So for example with these matchers you'll get what you are trying to achieve:
curl -G --data-urlencode 'match[]={job=~".+", job!="kubernetes-nodes"}' http://your-url.example.com/federate
As a safety measure to avoid you accidentally writing an instant vector that returns all the time series in your Prometheus, selectors must contain at least one matcher that does not match the empty string. Your selector has no such matcher (job!="kubernetes-nodes" matches an empty job label), so this is giving you an error.
You could add a selector such as __name__=~".+" however at a higher level this is an abuse of federation as it is not meant for pulling entire Prometheus servers. See https://www.robustperception.io/federation-what-is-it-good-for/

Monitor AWS Service Status using Splunk

Problem
Dependency on AWS Services status
If you depend on Amazon AWS service to operate, you need to keep a close eye on the status of their services. Amazon uses the website http://status.aws.amazon.com/, which provides links to RSS feeds to specific services in specific regions.
Potential Errors
Our service uses S3, CloudFront, and other services to operate. We'd like to be informed on any service that might go down during hours of operations, and automate what we should do in case something goes wrong.
Splunk Logging
We use Splunk for Logging all of our services.
Requirement
For instance, if errors occurs in the application while writing to S3, we'd like to know if that was caused by a potential outage in AWS.
How to monitor the Status RSS feed in Splunk?
Is there an HTTP client for that? A background service?
Solution
You can use the Syndication Input app to collect the RSS feed data from the AWS Status
Create a query that fetches the RSS Items that have errors and stores in Splunk indexes under the syndication sourcetype.
Create an alert based on the query, a since field so that we can adjust the alerts over time.
How
Ask your Splunk team to install the app "Syndication Input" on the environments you need.
After that, just collect each of the RSS feeds needed and add them to the Settings -> Data Input -> Syndication Feed. Take all the URLs from the Amazon Status RSS feeds and use them as Splunk Data Input, filling out the form with certain interval:
http://status.aws.amazon.com/rss/cloudfront.rss
http://status.aws.amazon.com/rss/s3-us-standard.rss
http://status.aws.amazon.com/rss/s3-us-west-1.rss
http://status.aws.amazon.com/rss/s3-us-west-2.rss
When you are finished, the Syndication App has the following:
Use the search for the errors when the occur, adjusting the “since” date so that you can create an alert for the results. I added a day in the past just for display purpose.
since should be some start day you will start monitoring AWS. This helps the query to result in any new event when Amazon publishes new errors captured from the text Informational message:.
The query should not return anything new because the since will not return any date.
Since the token RESOLVED is appended to a new RSS feed item, we exclude them from the alerts.
.
sourcetype=syndication "Informational message:" NOT "RESOLVED"
| eval since=strptime("2010-08-01", "%Y-%m-%d")
| eval date=strptime(published_parsed, "%Y-%m-%dT%H:%M:%SZ")
| rex field=summary_detail_base "rss\/(?<aws_object>.*).rss$"
| where date > since
| table aws_object, published_parsed, id, title, summary
| sort -published_parsed
Create an Alert with the Query. For instance, to send an email:

Is there an api to send notifications based on job outputs?

I know there are api to configure the notification when a job is failed or finished.
But what if, say, I run a hive query that count the number of rows in a table. If the returned result is zero I want to send out emails to the concerned parties. How can I do that?
Thanks.
You may want to look at Airflow and Qubole's operator for airflow. We use airflow to orchestrate all jobs being run using Qubole and in some cases non Qubole environments. We DataDog API to report success / failures of each task (Qubole / Non Qubole). DataDog in this case can be replaced by Airflow's email operator. Airflow also has some chat operator (like Slack)
There is no direct api for triggering notification based on results of a query.
However there is a way to do this using Qubole:
-Create a work flow in qubole with following steps:
1. Your query (any query) that writes output to a particular location on s3.
2. A shell script - This script reads result from your s3 and fails the job based on any criteria. For instance in your case, fail the job if result returns 0 rows.
-Schedule this work flow using "Scheduler" API to notify on failure.
You can also use "Sendmail" shell command to send mail based on results in step 2 above.