When I try to list anything, my result is not grouped as a table (as in the video); each region is listed separately with its descriptions, something like this:
NAME: us-west3
CPUS: 0/24
DISKS_GB: 0/4096
ADDRESSES: 0/8
RESERVED_ADDRESSES: 0/8
STATUS: UP
TURNDOWN_DATE:
NAME: us-west4
CPUS: 0/24
DISKS_GB: 0/4096
ADDRESSES: 0/8
RESERVED_ADDRESSES: 0/8
STATUS: UP
TURNDOWN_DATE:
Please try:
gcloud config set accessibility/screen_reader False
and then repeat the command (for example, gcloud compute regions list). When the accessibility/screen_reader property is enabled, gcloud renders tables as line-by-line key/value output, which is exactly what you are seeing.
I'm defining a data series for testing a Prometheus alert using the container_last_seen metric from the cadvisor exporter.
How do I enter timestamp series values, as returned by the container_last_seen metric? I'm testing on an Apple Mac the Prometheus alerts that run in production on Linux boxes.
Here's one thing I tried:
input_series:
  - series: |
      container_last_seen{container_label_com_docker_swarm_service_name="service1",env="prod",instance="10.0.0.1"}
    values: '1563968832+0x61'
It seems whatever I put in the values for the series is not accepted.
I've also tried durations: '0h+1mx60'
Since time() - container_last_seen{...} is a legal expression, container_last_seen is definitely a timestamp, and I would expect a timestamp to be represented by a Unix epoch number. Executing the query in Prometheus does return Unix epoch times, but putting those numbers in a series is rejected with the error below.
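For reference, promtool's expanding notation 'a+bxn' produces the n+1 samples a, a+b, ..., a+n*b, so '1563968832+0x61' should expand to 62 identical epoch values. A minimal sketch of the shape I'd expect to be accepted (some_metric is a hypothetical stand-in name):
input_series:
  - series: 'some_metric{env="prod"}'  # hypothetical metric, quoted on a single line
    values: '1563968832+0x61'          # 62 samples, all 1563968832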
promtool is recognising the different types but giving much the same error:
➜ promtool test rules alertrules-service-oriented-test.yml
Unit Testing: alertrules-service-oriented-test.yml
FAILED:
1:1: parse error: unexpected number "0" in series values
If the values are '1h+0mx61', promtool correctly identifies the values as durations:
1:1: parse error: unexpected duration "1h" in series values
Note that when this test is commented out, there is no 1:1: parse error and the tests complete successfully, so the problem is not in some out-of-sight part of the test file.
Thanks for any insights.
Here's the alert:
alertrules.yaml:
- name: containers
  interval: 15s
  rules:
    - alert: prod_container_crashing
      expr: |
        count by (instance, container_label_com_docker_swarm_service_name)
        (
          count_over_time(container_last_seen{container_label_com_docker_swarm_service_name!="",env="prod"}[15m])
        ) - 1 > 2
      for: 5m
      labels:
        service: prod
        type: container
        severity: critical
      annotations:
        summary: "pdce {{ $labels.container_label_com_docker_swarm_service_name }}"
        description: "{{ $labels.container_label_com_docker_swarm_service_name }} in prod cluster on {{ $labels.instance }} is crashing"
and here's the test file:
alertrules_test.yml:
rule_files:
  - alertrules.yml
evaluation_interval: 1m
tests:
  - name: container_tests
    interval: 15s
    input_series:
      - series: |
          container_last_seen{container_label_com_docker_swarm_service_name="service1",env="prod",instance="10.0.0.1"}
        values: '1563968832+0x61'
    alert_rule_test:
      - eval_time: 15m
        alertname: prod_container_crashing
        exp_alerts:
          - exp_labels:
              service: prod
              type: container
              severity: critical
            exp_annotations:
              summary: prod service1
              description: service1 in prod cluster on 10.0.0.1 is crashing
When the series: value is all on one line, without a > or | YAML block scalar indicator, e.g.
- series: container_last_seen{container_label_com_docker_swarm_service_name="service1",env="prod",instance="10.0.0.1"}
  values: '1563968832+0x61'
the error goes away; I don't know why, so this doesn't appear to be a data typing issue. (Possibly the | block scalar keeps a trailing newline in the series string, which confuses the parser.) It's a shame for readability reasons; either Prometheus or the Go YAML library may have a squeaky wheel in its implementation.
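If the trailing newline is indeed the culprit, a minimal sketch that keeps the multi-line readability would be YAML's strip indicator |-, which drops the final newline (an untested assumption on my part):
input_series:
  - series: |-
      container_last_seen{container_label_com_docker_swarm_service_name="service1",env="prod",instance="10.0.0.1"}
    values: '1563968832+0x61'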
I have two roughly formatted .yml files with key/value pairs in them. I imported the values of both files successfully into a running playbook using the include_vars module.
Now I want to compare the value of a key/value pair from file/list 1 to all of the keys of file/list 2, and when there is a match, print and preferably save/register the value of the matching key from file/list 2.
Essentially I am comparing a machine name to an IP list to grab the IP the machine needs out of that list. The name is "dynamic" and different each time the playbook is run, as file/list 1 is dynamically repopulated on each run.
Examples:
file/list 1 contents
machine_serial: m60
s_iteration: a
site_name: dud
t_number: '001'
file/list 2 contents
m51: 10.2.5.201
m52: 10.2.5.202
m53: 10.2.5.203
m54: 10.2.5.204
m55: 10.2.5.205
m56: 10.2.5.206
m57: 10.2.5.207
m58: 10.2.5.208
m59: 10.2.5.209
m60: 10.2.5.210
m61: 10.2.5.211
In a nutshell, I want the file/list 1 machine_serial key, whose value is currently m60, to find its matching key in file/list 2, and then print and/or preferably register its value of 10.2.5.210.
What I've tried so far:
Playbook:
- name: IP gleaning comparison.
  hosts: localhost
  remote_user: ansible
  become: yes
  become_method: sudo
  vars:
    ansible_ssh_pipelining: yes
  tasks:
    - name: Try to do a variable import of the file1 file.
      include_vars:
        file: ~/active_ct-scanner-vars.yml
        name: ctfile1_vars
      become: no
    - name: Try to do an import of file2 file for lookup comparison to get an IP match.
      include_vars:
        file: ~/machine-ip-conversion.yml
        name: ip_vars
      become: no
    - name: Best, but failing attempt to get the value of the match-up IP.
      debug:
        msg: "{{ item }}"
      when: ctfile1_vars.machine_serial == ip_vars
      with_items:
        - "{{ ip_vars }}"
Every task except the final one works perfectly.
The output of the failing final task:
TASK [Best, but failing attempt to get the value of the match-up IP.] ***********************************************************************************
skipping: [localhost] => (item={'m51': '10.200.5.201', 'm52': '10.200.5.202', 'm53': '10.200.5.203', 'm54': '10.200.5.204', 'm55': '10.200.5.205', 'm56': '10.200.5.206', 'm57': '10.200.5.207', 'm58': '10.200.5.208', 'm59': '10.200.5.209', 'm60': '10.200.5.210', 'm61': '10.200.5.211'})
skipping: [localhost]
What I hoped for hasn't happened; the task is simply skipped and doesn't iterate over the list like I was hoping, so there must be a problem somewhere. Hopefully there is an easy solution that I just missed. What could be the correct answer?
Given the files
shell> cat active_ct-scanner-vars.yml
machine_serial: m60
s_iteration: a
site_name: dud
t_number: '001'
shell> cat machine-ip-conversion.yml
m58: 10.2.5.208
m59: 10.2.5.209
m60: 10.2.5.210
m61: 10.2.5.211
Read the files
- include_vars:
    file: active_ct-scanner-vars.yml
    name: ctfile1_vars
- include_vars:
    file: machine-ip-conversion.yml
    name: ip_vars
Q: "Compare the machine name to an IP list and grab the IP."
A: Both variables ip_vars and ctfile1_vars are dictionaries. Use ctfile1_vars.machine_serial as index in ip_vars
match_up_IP: "{{ ip_vars[ctfile1_vars.machine_serial] }}"
gives
match_up_IP: 10.2.5.210
Example of a complete playbook for testing
- hosts: localhost
  gather_facts: false
  vars:
    match_up_IP: "{{ ip_vars[ctfile1_vars.machine_serial] }}"
  tasks:
    - include_vars:
        file: active_ct-scanner-vars.yml
        name: ctfile1_vars
    - include_vars:
        file: machine-ip-conversion.yml
        name: ip_vars
    - debug:
        var: match_up_IP
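If you prefer to register the value in a task, as the question mentions, rather than declare it in vars, a set_fact task after the two include_vars tasks is a minimal equivalent sketch:
- set_fact:
    match_up_IP: "{{ ip_vars[ctfile1_vars.machine_serial] }}"
- debug:
    var: match_up_IP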
I am looking to train a model using Google Cloud's new service - the Unified AI Platform. To do so I am using a config.yaml that looks like this:
workerPoolSpecs:
  workerPoolSpec:
    machineSpec:
      machineType: n1-highmem-16
      acceleratorType: NVIDIA_TESLA_P100
      acceleratorCount: 2
    replicaCount: 1
    pythonPackageSpec:
      executorImageUri: us-docker.pkg.dev/cloud-aiplatform/training/tf-gpu.2-4:latest
      packageUris: gs://path/to/bucket/unified_ai_platform/src_dist/trainer-0.1.tar.gz
      pythonModule: trainer.task
  workerPoolSpec:
    machineSpec:
      machineType: n1-highmem-16
      acceleratorType: NVIDIA_TESLA_P100
      acceleratorCount: 2
    replicaCount: 2
    pythonPackageSpec:
      executorImageUri: us-docker.pkg.dev/cloud-aiplatform/training/tf-gpu.2-4:latest
      packageUris: gs://path/to/bucket/unified_ai_platform/src_dist/trainer-0.1.tar.gz
      pythonModule: trainer.task
However, for distributed training I am unable to understand how to pass multiple workerPoolSpecs in this file. The example YAML file provided does not cover the case of multiple workerPoolSpecs, even though the documentation says that "You can specify multiple worker pool specs in order to create a custom job with multiple worker pools".
Any help in this regard will be appreciated.
Answering my own question: workerPoolSpecs must be a YAML list, with one entry per worker pool. The config.yaml file should look like this:
workerPoolSpecs:
  - machineSpec:
      machineType: n1-standard-16
      acceleratorType: NVIDIA_TESLA_P100
      acceleratorCount: 2
    replicaCount: 1
    containerSpec:
      imageUri: gcr.io/path/to/container:v2
      args:
        - --model-dir=gs://path/to/model
        - --tfrecord-dir=gs://path/to/training/data/
        - --epochs=2
  - machineSpec:
      machineType: n1-standard-16
      acceleratorType: NVIDIA_TESLA_P100
      acceleratorCount: 2
    replicaCount: 2
    containerSpec:
      imageUri: gcr.io/path/to/container:v2
      args:
        - --model-dir=gs://path/to/models
        - --tfrecord-dir=gs://path/to/training/data/
        - --epochs=2
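For completeness, the same list form should work with pythonPackageSpec in place of containerSpec, matching the question's original setup; this is a sketch reusing the question's paths, not verified against the service:
workerPoolSpecs:
  - machineSpec:
      machineType: n1-highmem-16
      acceleratorType: NVIDIA_TESLA_P100
      acceleratorCount: 2
    replicaCount: 1
    pythonPackageSpec:
      executorImageUri: us-docker.pkg.dev/cloud-aiplatform/training/tf-gpu.2-4:latest
      packageUris: gs://path/to/bucket/unified_ai_platform/src_dist/trainer-0.1.tar.gz
      pythonModule: trainer.task
  - machineSpec:
      machineType: n1-highmem-16
      acceleratorType: NVIDIA_TESLA_P100
      acceleratorCount: 2
    replicaCount: 2
    pythonPackageSpec:
      executorImageUri: us-docker.pkg.dev/cloud-aiplatform/training/tf-gpu.2-4:latest
      packageUris: gs://path/to/bucket/unified_ai_platform/src_dist/trainer-0.1.tar.gz
      pythonModule: trainer.task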
I have a simple playbook that runs Cisco NX-OS commands, and it runs successfully.
I would like to know how to save all the results into a file, regardless of how many hosts I have, and how to use a Survey to input the filename.
Currently, here is my code:
---
- name: run multiple commands on remote nodes
  nxos_command:
    commands:
      - show clock
      - show int status
      - show cdp neigh
      - show int desc
      - show port-channel summ
      - show vpc
      - show vpc role
I tried this code:
---
- name: run multiple commands on remote nodes
  register: myshell_output
  nxos_command:
    commands:
      - show clock
      - show int status
      - show cdp neigh
      - show int desc
      - show port-channel summ
      - show vpc
      - show vpc role
- name: Saving data to local file
  copy:
    content: "{{ myshell_output.stdout|join('\n') }}"
    dest: "/tmp/hello.txt"
  delegate_to: localhost
It gives me this error:
FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'ansible.utils.unsafe_proxy.AnsibleUnsafeText object' has no attribute 'stdout'\n\nThe error appears to be in '/tmp/awx_1869_7__9l_9l/project/roles/bcpcommands/tasks/main.yml': line 3, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: run multiple commands on remote nodes\n ^ here\n"}
Normally I limit the hosts with the Ansible Tower LIMIT field.
Ideally, could the output file also include the hostname and the commands that I entered?
Thanks
You probably got the indenting wrong. Try:
---
- hosts: my_host
  tasks:
    - name: run multiple commands on remote nodes
      nxos_command:
        commands: "{{ item }}"
      loop:
        - show clock
        - show int status
        - show cdp neigh
        - show int desc
        - show port-channel summ
        - show vpc
        - show vpc role
      register: myshell_output
    - debug:
        msg: "{{ myshell_output }}"
    - name: Saving data to local file and include hostname
      copy:
        # with loop + register, the output lives in myshell_output.results,
        # one entry per command; each entry's stdout is a list of responses
        content: "{{ myshell_output.results | map(attribute='stdout') | flatten | join('\n') }} hostname: {{ inventory_hostname }}"
        dest: "/tmp/hello.txt"
      delegate_to: localhost
Edit my_host to match your target host or group.
The debug task lets you verify that the registered variable actually contains the stdout data (under results when a loop is used); if it is not there, the copy task will fail.
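To take the filename from a Tower Survey and keep one file per host, a sketch along these lines should work; survey_filename is a hypothetical variable name, use whatever your Survey actually defines:
- name: Save output to a survey-named file, one per host
  copy:
    content: "{{ myshell_output.results | map(attribute='stdout') | flatten | join('\n') }}"
    # survey_filename is assumed to be supplied by the Tower Survey
    dest: "/tmp/{{ survey_filename }}_{{ inventory_hostname }}.txt"
  delegate_to: localhost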
I've provisioned a keyspace on AWS, and to make sure it can achieve our desired performance I'm trying to run the cassandra-stress tool against it and compare the results with the other architectures we're experimenting with.
I managed to connect to it using the following cqlshrc:
[connection]
port = 9142
factory = cqlshlib.ssl.ssl_transport_factory
[ssl]
validate = true
certfile = /root/.cassandra/AmazonRootCA1.pem
And the following command (hoping that Python 3 support arrives soon enough; according to their Jira ticket, development was completed this February):
cqlsh cassandra.eu-central-1.amazonaws.com 9142 -u "myuser-at-722222222222" -p "12/12ZmHmtD1klsDk9cgqt/XXXXXXXXxUz6Sy687z/U=" --ssl --cqlversion="3.4.4"
Surprisingly or not, when using the official AWS guides things tend to work.
So I went on and tried connecting the cassandra-stress tool to the same keyspace (I keep it inside a Docker container, as I'd rather keep my OS free of Java).
First I converted the AWS AmazonRootCA1.pem into cassandra_truststore.jks using the following commands (explained here):
openssl x509 -outform der -in AmazonRootCA1.pem -out temp_file.der
keytool -import -alias cassandra -keystore cassandra_truststore.jks -file temp_file.der
Now when I'm trying to run the actual tool like this:
./cassandra-stress write -node cassandra.eu-central-1.amazonaws.com -port native=9142 thrift=9142 jmx=9142 -transport truststore=/root/.cassandra/cassandra_truststore.jks truststore-password=mypassword -mode native cql3 user="myuser-at-722222222222" password="12/12ZmHmtD1klsDk9cgqt/XXXXXXXXxUz6Sy687z/U="
I'm getting the following error:
******************** Stress Settings ********************
Command:
Type: write
Count: -1
No Warmup: false
Consistency Level: LOCAL_ONE
Target Uncertainty: 0.020
Minimum Uncertainty Measurements: 30
Maximum Uncertainty Measurements: 200
Key Size (bytes): 10
Counter Increment Distibution: add=fixed(1)
Rate:
Auto: true
Min Threads: 4
Max Threads: 1000
Population:
Sequence: 1..1000000
Order: ARBITRARY
Wrap: true
Insert:
Revisits: Uniform: min=1,max=1000000
Visits: Fixed: key=1
Row Population Ratio: Ratio: divisor=1.000000;delegate=Fixed: key=1
Batch Type: not batching
Columns:
Max Columns Per Key: 5
Column Names: [C0, C1, C2, C3, C4]
Comparator: AsciiType
Timestamp: null
Variable Column Count: false
Slice: false
Size Distribution: Fixed: key=34
Count Distribution: Fixed: key=5
Errors:
Ignore: false
Tries: 10
Log:
No Summary: false
No Settings: false
File: null
Interval Millis: 1000
Level: NORMAL
Mode:
API: JAVA_DRIVER_NATIVE
Connection Style: CQL_PREPARED
CQL Version: CQL3
Protocol Version: V4
Username: myuser-at-722222222222
Password: *suppressed*
Auth Provide Class: null
Max Pending Per Connection: 128
Connections Per Host: 8
Compression: NONE
Node:
Nodes: [cassandra.eu-central-1.amazonaws.com]
Is White List: false
Datacenter: null
Schema:
Keyspace: keyspace1
Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Strategy Pptions: {replication_factor=1}
Table Compression: null
Table Compaction Strategy: null
Table Compaction Strategy Options: {}
Transport:
factory=org.apache.cassandra.thrift.TFramedTransportFactory; truststore=/root/.cassandra/cassandra_truststore.jks; truststore-password=mypassword; keystore=null; keystore-password=null; ssl-protocol=TLS; ssl-alg=SunX509; store-type=JKS; ssl-ciphers=TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA;
Port:
Native Port: 9142
Thrift Port: 9142
JMX Port: 9142
Send To Daemon:
*not set*
Graph:
File: null
Revision: unknown
Title: null
Operation: WRITE
TokenRange:
Wrap: false
Split Factor: 1
java.lang.RuntimeException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: cassandra.eu-central-1.amazonaws.com/3.127.48.183:9142 (com.datastax.driver.core.exceptions.TransportException: [cassandra.eu-central-1.amazonaws.com/3.127.48.183] Channel has been closed))
at org.apache.cassandra.stress.settings.StressSettings.getJavaDriverClient(StressSettings.java:220)
at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpacesNative(SettingsSchema.java:79)
at org.apache.cassandra.stress.settings.SettingsSchema.createKeySpaces(SettingsSchema.java:69)
at org.apache.cassandra.stress.settings.StressSettings.maybeCreateKeyspaces(StressSettings.java:228)
at org.apache.cassandra.stress.StressAction.run(StressAction.java:57)
at org.apache.cassandra.stress.Stress.run(Stress.java:143)
at org.apache.cassandra.stress.Stress.main(Stress.java:62)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: cassandra.eu-central-1.amazonaws.com/3.127.48.183:9142 (com.datastax.driver.core.exceptions.TransportException: [cassandra.eu-central-1.amazonaws.com/3.127.48.183] Channel has been closed))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:233)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1424)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:403)
at org.apache.cassandra.stress.util.JavaDriverClient.connect(JavaDriverClient.java:160)
at org.apache.cassandra.stress.settings.StressSettings.getJavaDriverClient(StressSettings.java:211)
... 6 more
I've tried changing some parameters, such as the JKS password (just in case I had it wrong), but I got a different error message, so that's probably not the issue.
Did I miss something?
Try using TLP Stress instead.
tlp-stress run RandomPartitionAccess -d 10m --host cassandra.us-east-1.amazonaws.com --port 9142 --username alice --password fLyWYFlTCD5J2gzGAZ --ssl --max-requests 4000 --dc us-east-2 --threads 10
https://thelastpickle.com/tlp-stress/