kubectl: display all (pod name, container name) pairs

Suppose I have 2 pods podA and podB, each one having 2 containers.
I'd like to display:
podA containerA
podA containerB
podB containerA
podB containerB
Using jsonpath, I'm only able to display this kind of output
podA containerA
containerB
podB containerA
containerB
using this command:
kubectl get pods -o jsonpath="{range .items[*]}{.metadata.name}{range .spec.containers[*]}{.name}{'\n'}{end}{end}"
Is it possible to repeat the pod name using only a kubectl command?

I think not.
kubectl's JSONPath implementation has no support for variable assignment, so it's not possible to capture the Pod's name in the outer loop (range .items[*]) and reference it from within the inner loop once the inner range rescopes the pipeline.
I understand the intent in wanting to use only kubectl, but that restricts you to kubectl's JSONPath implementation.
jq is a more general-purpose JSON processor that it is reasonable to expect (or ask) your users to have installed, and it does permit capturing the Pod's name.
I don't have access to a cluster but something like:
FILTER='
  .items[]
  | .metadata.name as $pod
  | .spec.containers[].name as $container
  | $pod + " " + $container
'
jq -r "${FILTER}"
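Putting it together, and assuming kubectl is pointed at a cluster shaped like your example (untested on my side), the full pipeline and the output it should produce look like this:
kubectl get pods --output json | jq -r "${FILTER}"
podA containerA
podA containerB
podB containerA
podB containerB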

Related

Display slice length in kubectl custom columns output

Let's say I want to list pods, and show their name and the number of containers they're running. If I just want the image tags themselves, I could do something like
λ kubectl get pods --output custom-columns='NAME:.metadata.namespace,IMAGES:.spec.containers[*].image'
NAME IMAGES
prometheus-system quay.io/prometheus/prometheus:v2.21.0,quay.io/prometheus-operator/prometheus-config-reloader:v0.42.1,jimmidyson/configmap-reload:v0.4.0
prometheus-system quay.io/prometheus-operator/prometheus-operator:v0.42.1
But how do I make it display just the number of containers? In other words, what do I put for the selector to get the length of the slice, to give me output like this instead?
λ kubectl get pods --output custom-columns='NAME:.metadata.namespace,CONTAINERS:<what goes here?>'
NAME CONTAINERS
prometheus-system 3
prometheus-system 1
(Eventually, I want to put this on a CRD to display the length of a list in its default output, but I figure this use case is more reproducible, and therefore easier to relate to. IIUC - but please correct me if I'm wrong! - a solution that works for this question, will also work for the display-columns of a CRD...)

Is it possible to use the aws-cli start-query function without start-time/end-time?

I'm trying to use the aws logs start-query function, but I need something more dynamic than start-time/end-time (with a unix timestamp), like the last 5 minutes or something like that. Is this possible?
The AWS CLI doesn't offer a "last X minutes" option for logs, regardless of which function you use to find them. But start-time and end-time are a fully flexible way to scope the logs - you just need to pass the proper values.
That means you can create your own script that does exactly what you need, i.e. it calculates the start and end time and simply passes them to the start-query function.
Example of a simple calculation of start_time and end_time in bash (note that aws logs start-query expects epoch seconds, so the offset is calculated in seconds):
#!/bin/bash
# Usage: pass the number of minutes to look back as the first argument.
declare -i start_time
declare -i end_time
declare -i last_minutes
declare -i last_seconds
end_time=$(date +%s)            # now, as epoch seconds
last_minutes=$1
last_seconds=$last_minutes*60   # declare -i makes this an arithmetic assignment
start_time=$end_time-$last_seconds
echo "$start_time"
echo "$end_time"
So you can invoke this script with the number of minutes to look back, and it will calculate start_time and end_time. Then you just need to invoke the command you actually want, e.g. aws logs start-query --start-time $start_time --end-time $end_time, instead of printing start_time and end_time. You can add other options to the script depending on your needs as well.
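A minimal end-to-end sketch of the same idea (the log group name and the query string are placeholders, not values from your setup):
#!/bin/bash
# Query the last N minutes of a CloudWatch Logs log group with Logs Insights.
set -euo pipefail

minutes=$1       # e.g. 5
log_group=$2     # e.g. /aws/lambda/my-function (placeholder)

end_time=$(date +%s)                       # now, epoch seconds
start_time=$(( end_time - minutes * 60 ))  # N minutes ago

aws logs start-query \
  --log-group-name "$log_group" \
  --start-time "$start_time" \
  --end-time "$end_time" \
  --query-string 'fields @timestamp, @message | sort @timestamp desc | limit 20'
Keep in mind that start-query only returns a queryId; you then poll aws logs get-query-results --query-id <id> to read the actual matches.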

allocate largest partition with ansible when partition name varies

I'm using ansible to configure some AWS servers. All the servers have a 1024 GB, or larger, partition. I want to allocate this partition and assign it to /data.
I have an ansible script that does this on my test machine. However, when I ran it against all my AWS machines it failed on some of them, complaining that the /dev/nvme1n1 device doesn't exist. The problem is that some of the servers have a 20 GB root partition separate from the larger partition, and some don't. That means sometimes the partition I care about is nvme1n1 and sometimes it's nvme0n1.
I don't want to place a variable in the hosts file, since that file is dynamically loaded from AWS anyway. Given that, what is the easiest way to look up the largest device and get its name in ansible, so I can tell ansible to allocate whichever device is largest?
I assume that when you talk about "partitions" you mean "disks", as a partition will have a name like nvme0n1p1, while the disk would be called nvme0n1.
That said, I have not found an "ansible way" to do this, so I usually parse lsblk and do some grep magic. In your case this is what you need to run:
- name: find largest disk
  shell: |
    set -euo pipefail
    lsblk -bl -o SIZE,NAME | grep -P 'nvme\dn\d$' | sort -nr | awk '{print $2}' | head -n1
  args:
    executable: /bin/bash
  register: largest_disk

- name: print name of largest disk
  debug:
    msg: "{{ largest_disk.stdout }}"
You can then use the name of the disk in the parted module to do whatever you need with it (there is a sketch of that after the explanation below).
Apart from that, you should add some checks before formatting the disk so you don't overwrite anything: if your playbook already ran on a host, the disk might already be formatted and contain data, and in that case you don't want to format it again and destroy the existing data.
Explanation:
lsblk -bl -o SIZE,NAME prints the sizes and names of all block devices
grep -P 'nvme\dn\d$' keeps only the disks (partitions have a pXX suffix at the end, remember?)
sort -nr sorts the output numerically by the first column (so you get the largest on top)
awk '{print $2}' prints only the second column (the name that is)
head -n1 returns the first line (containing the name of the largest disk)
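As a follow-up, here is a sketch of how the registered name could feed the parted, filesystem and mount modules; the /data mount point comes from the question, ext4 is just an assumed filesystem type, and you should add your own safety checks before running this against disks that may already hold data:
- name: create a single partition spanning the disk
  parted:
    device: "/dev/{{ largest_disk.stdout }}"
    number: 1
    state: present

- name: create a filesystem on the new partition
  filesystem:
    fstype: ext4   # assumed choice, pick whatever you use
    dev: "/dev/{{ largest_disk.stdout }}p1"

- name: mount it on /data
  mount:
    path: /data
    src: "/dev/{{ largest_disk.stdout }}p1"
    fstype: ext4
    state: mounted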

What gcloud command can be used to obtain a list of default compute engine service accounts across your organisation?

I have tried this command:
gcloud alpha scc assets list <ORGANISATION-ID> --filter "security_center_properties.resource.type="google.iam.ServiceAccount" AND resource_properties.name:\"Compute Engine default service account\""
but I am receiving the following error:
(gcloud.alpha.scc.assets.list) INVALID_ARGUMENT: Invalid filter.
When I remove the filter after AND, I don't get an error message, but I just see a > prompt.
Any ideas where I am going wrong?
I have reviewed this documentation to support me building the command but not sure which is the right filter to use.
I wonder if I should be filtering on the email of a compute engine default service account, which ends in "-compute@developer.gserviceaccount.com", but I can't identify the right filter for this.
The problem is the use of unescaped " characters inside the filter.
You need to type --filter and then put the filter in quotes, like this: "FILTER_EXPRESSION".
One filter expression could be: security_center_properties.resource_type="google.compute.Instance"
But you cannot put a double quote inside a double-quoted block without escaping it with a backslash (\); otherwise the shell interprets the first inner double quote as the end of the filter.
On the other hand, if you delete part of the command, the prompt shows > because there is an unterminated double-quoted block and the shell is waiting for you to finish the command.
So the filter you want has to look like this, for example:
gcloud alpha scc assets list <ORGANIZATION ID> \
--filter "security_center_properties.resource_type=\"google.compute.Instance\" AND security_center_properties.resource_type=\"google.cloud.resourcemanager.Organization\""
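Applying the same escaping to the filter from your question (note that the field is resource_type, not resource.type) would look something like this; treat it as an untested sketch:
gcloud alpha scc assets list <ORGANISATION-ID> \
  --filter "security_center_properties.resource_type=\"google.iam.ServiceAccount\" AND resource_properties.name:\"Compute Engine default service account\""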
I hope that this explanation could help you!

How to pass arguments to streaming job on Amazon EMR

I want to produce the output of my map function, filtering the data by dates.
In local tests, I simply call the application passing the dates as parameters as:
cat access_log | ./mapper.py 20/12/2014 31/12/2014 | ./reducer.py
Then the parameters are taken in the map function
#!/usr/bin/python
import sys

date1 = sys.argv[1]
date2 = sys.argv[2]
The question is:
How do I pass the date parameters to the map calling on Amazon EMR?
I am a beginner in Map reduce. Will appreciate any help.
First of all, when you run a local test (and you should, as often as possible), the correct format (in order to reproduce how map-reduce works) is:
cat access_log | ./mapper.py 20/12/2014 31/12/2014 | sort | ./reducer.py | sort
That is the way the Hadoop framework works.
If you are working with a big file, you should do it in steps so you can verify the result of each stage, meaning:
cat access_log | ./mapper.py 20/12/2014 31/12/2014 > map_result.txt
cat map_result.txt | sort > map_result_sorted.txt
cat map_result_sorted.txt | ./reducer.py > reduce_result.txt
cat reduce_result.txt | sort > map_reduce_result.txt
In regard to your main question:
It's the same thing.
If you are going to use the Amazon web console to create your cluster, in the Add step window you just write the following:
Name: learning amazon emr
Mapper: (here they ask for the s3 path to your mapper; we will ignore that and just write our script name and its parameters, no backslash) mapper.py 20/12/2014 31/12/2014
Reducer: (the same as for the mapper) reducer.py (you can add params here too)
Input location: ...
Output location: ... (just remember to use a new output location every time, or your step will fail)
Arguments: -files s3://cod/mapper.py,s3://cod/reducer.py (use your own file paths here; even if you add only one file, still use the -files argument)
That's it
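If you'd rather script it than click through the console, the equivalent streaming step via the AWS CLI would look roughly like this (a sketch: the cluster id, input and output locations are placeholders, and the quoting around the mapper arguments may need adjusting for your shell):
aws emr add-steps --cluster-id j-XXXXXXXXXXXX --steps \
  'Type=STREAMING,Name=learning amazon emr,ActionOnFailure=CONTINUE,Args=[-files,s3://cod/mapper.py,s3://cod/reducer.py,-mapper,mapper.py 20/12/2014 31/12/2014,-reducer,reducer.py,-input,s3://yourbucket/input/,-output,s3://yourbucket/output/run-001/]'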
If you are going to dig further into the whole arguments topic, I suggest you look at a blog post on how to pass arguments so that you can use only a single map/reduce file.
Hope it helps.