robotframework: Dictionary has no key error - list

When I run the test DICT-WITH-LISTS, I get an error that says the dictionary has no key. But as you can see, I do print the dictionary's keys, and the key whose value is meant to be a list appears to be there.
I am not sure what I am doing wrong, or why the key shows up differently in the error message. I would greatly appreciate any help in resolving this issue.
*** Variables ***
${p1}=    P-1
${f1}=    F-2
${p2}=    P-2
${p3}=    P-3

*** Test Cases ***
DICT-WITH-LISTS
    [tags]    run
    ${resNameList}=    Create List
    ${laneInfoList}=    Create Dictionary    L0=res1    L1=res2    L2=res3    L3=res4    L4=res5
    ${lane2resNameMappingList}=    Create List
    log to console    LANE ID LIST : ${laneInfoList.keys()}
    :FOR    ${laneID}    IN    @{laneInfoList.keys()}
    \    Append To List    ${resNameList}    ${laneInfoList['${laneID}']}
    \    ${lane2resName}=    Create dictionary    lane=${laneID}    resName=${laneInfoList['${laneID}']}
    \    Append To List    ${lane2resNameMappingList}    ${lane2resName}
    ${resName2resObjMappingList}=    Create Dictionary
    @{resNameList}=    Remove Duplicates    ${resNameList}
    log to console    res LIST: @{resNameList}
    ${totalres}=    Get length    ${resNameList}
    :FOR    ${index}    IN RANGE    0    ${totalres}
    \    log to console    INDEX:${index} res:${resNameList}[${index}] AFMObj:afm${index}
    \    set to dictionary    ${resName2resObjMappingList}    ${resNameList}[${index}]=afm${index}
    log to console    lane2resNameMappingList: ${lane2resNameMappingList}
    log to console    resName2resObjMappingList: ${resName2resObjMappingList}
    set global variable    ${lane2resNameMappingList}
    set global variable    ${resName2resObjMappingList}
    ${totalObjList}=    create list    ${p1}    ${f1}    ${p2}    ${p3}
    ${totalObjsInTest}=    Get length    ${totalObjList}
    ${totalLanesAvailable}=    Get length    ${lane2resNameMappingList}
    ${totalObjs}=    Set Variable IF    ${totalObjsInTest} > ${totalLanesAvailable}    ${totalLanesAvailable}    ${totalObjsInTest}
    ${object2resMappingList}=    Create Dictionary
    ${resList}=    Create List
    log to console    Attaching res for a total of ${totalObjs} objects based on current availability!
    :FOR    ${index}    IN RANGE    0    ${totalObjs}
    \    log to console    Obj:${totalObjList}[${index}]
    \    log to console    res: ${lane2resNameMappingList[${index}]['resName']}
    \    log to console    res Obj:${resName2resObjMappingList['${lane2resNameMappingList[${index}]['resName']}']}
    \    log to console    Lane:${lane2resNameMappingList[${index}]['lane']}
    \    log to console    BW:${lane2resNameMappingList[${index}]['lane']}
    \    ${isKeyPresent}=    Run Keyword And Return Status    Dictionary Should Contain Key    ${object2resMappingList}    ${totalObjList}[${index}]
    \    log to console    KEY:${isKeyPresent}
    \    Run Keyword Unless    ${isKeyPresent}    set to dictionary    ${object2resMappingList}    ${totalObjList}[${index}]=${resList}
    \    log to console    AFTER:object2resMappingList: ${object2resMappingList}
    \    log to console    OBJ:${totalObjList}[${index}] VALUE:${resName2resObjMappingList['${lane2resNameMappingList[${index}]['resName']}']}
    \    log to console    DICT : ${object2resMappingList.keys()}
    \    log to console    DICT : &{object2resMappingList} KEY:${totalObjList}[${index}]
    \    Run Keyword    Append To List    &{object2resMappingList}[${totalObjList}[${index}]]    ${resName2resObjMappingList['${lane2resNameMappingList[${index}]['resName']}']}
    \    ${resList}=    Create List
    log to console    object2resMappingList: ${object2resMappingList}
Output:
DICT-WITH-LISTS ...LANE ID LIST : odict_keys(['L0', 'L1', 'L2', 'L3', 'L4'])
....res LIST: ['res1', 'res2', 'res3', 'res4', 'res5']
DICT-WITH-LISTS .INDEX:0 res:res1 AFMObj:afm0
INDEX:1 res:res2 AFMObj:afm1
INDEX:2 res:res3 AFMObj:afm2
INDEX:3 res:res4 AFMObj:afm3
INDEX:4 res:res5 AFMObj:afm4
.lane2resNameMappingList: [{'lane': 'L0', 'resName': 'res1'}, {'lane': 'L1', 'resName': 'res2'}, {'lane': 'L2', 'resName': 'res3'}, {'lane': 'L3', 'resName': 'res4'}, {'lane': 'L4', 'resName': 'res5'}]
.resName2resObjMappingList: {'res1': 'afm0', 'res2': 'afm1', 'res3': 'afm2', 'res4': 'afm3', 'res5': 'afm4'}
DICT-WITH-LISTS ....Attaching res for a total of 4 objects based on current availability!
.Obj:P-1
res: res1
res Obj:afm0
Lane:L0
BW:L0
KEY:False
AFTER:object2resMappingList: {'P-1': []}
OBJ:P-1 VALUE:afm0
DICT : odict_keys(['P-1'])
DICT : {'P-1': []} KEY:P-1
DICT-WITH-LISTS | FAIL |
Dictionary '&{object2resMappingList}' has no key '['P-1', 'F-2', 'P-2', 'P-3'][0'.
Test | FAIL |
1 critical test, 0 passed, 1 failed
1 test total, 0 passed, 1 failed

Change the way you are referencing the key, from:
Run Keyword    Append To List    &{object2resMappingList}[${totalObjList}[${index}]]    ${resName2resObjMappingList['${lane2resNameMappingList[${index}]['resName']}']}
to:
Run Keyword    Append To List    ${object2resMappingList['${totalObjList}[${index}]']}    ${resName2resObjMappingList['${lane2resNameMappingList[${index}]['resName']}']}
With the &{object2resMappingList}[...] item-access syntax, the nested ${totalObjList}[${index}] lookup is not resolved as a key: ${totalObjList} expands to the whole list's string representation and the first ] is taken as the end of the item access, which is exactly the mangled key shown in the failure, '['P-1', 'F-2', 'P-2', 'P-3'][0'. With the extended variable syntax ${object2resMappingList['...']}, the inner lookup is resolved first and the dictionary access is evaluated as an expression, so the list stored under that key is passed to Append To List.

Related

Nextflow sarek pipeline on AWS batch

How do I run the Nextflow sarek pipeline using AWS Batch in a Cloud9 environment?
I tried, and I am getting "Essential container in task exited".
In order to run jobs using AWS Batch, Nextflow requires access to the AWS CLI (i.e. the aws command) from within each of the containers that the pipeline specifies. To do this, you will need to create a custom AMI and use Conda (or another package manager) to install the AWS CLI tool. Ensure that your AMI also has Docker installed (see the Docker installation docs).
The reason is that when the AWS CLI tool is installed using Conda, it will use the version of Python supplied by Conda. If you don't use Conda and install the AWS CLI with something like pip, the aws command will attempt to run using the version of Python found in the running container, which won't be able to find the necessary dependencies.
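For reference, a minimal sketch of how the Conda-based AWS CLI install on the custom AMI might look (the install prefix and package source are assumptions; the prefix must match the aws.batch.cliPath you configure later):
# Sketch: install Miniconda plus the AWS CLI on the instance used to build the custom AMI.
# The ~/miniconda prefix is an assumption; it must match aws.batch.cliPath in the Nextflow config.
cd $HOME
sudo yum install -y bzip2 wget          # Amazon Linux; use apt-get on Debian/Ubuntu
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -f -p $HOME/miniconda
$HOME/miniconda/bin/conda install -c conda-forge -y awscli
$HOME/miniconda/bin/aws --version       # sanity check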
In your IAM settings, create an ecsInstanceRole and attach the AmazonS3FullAccess and AmazonEC2ContainerServiceforEC2Role policies. Then, when configuring a Compute Environment for AWS Batch, you will need to specify this instance role in step 1. Make sure to also supply the custom AMI ID (created above) when configuring the instance (under additional configuration) in step 2. You can then create a Job Queue and attach the compute environment to it. Finally, create an S3 bucket to write the results to.
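If you prefer the CLI to the console for the IAM part, a rough equivalent sketch (the role and instance profile names below are the conventional ones; adjust as needed):
# Sketch: create the ecsInstanceRole and attach the two policies via the AWS CLI.
aws iam create-role --role-name ecsInstanceRole \
    --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
aws iam attach-role-policy --role-name ecsInstanceRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy --role-name ecsInstanceRole \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
# Make the role usable by EC2 instances in the compute environment.
aws iam create-instance-profile --instance-profile-name ecsInstanceRole
aws iam add-role-to-instance-profile --instance-profile-name ecsInstanceRole --role-name ecsInstanceRole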
Then get started with Cloud9 by creating and opening up an environment. The first job is to install Nextflow and move it to somewhere in your $PATH:
$ curl -s https://get.nextflow.io | bash
$ mkdir ~/bin && mv nextflow ~/bin
Then, with the following in ~/.nextflow/config for example:
plugins {
    id 'nf-amazon'
}

process {
    executor = 'awsbatch'
    queue = 'test-queue'
    errorStrategy = 'retry'
    maxRetries = 3
}

aws {
    batch {
        cliPath = '/home/ec2-user/miniconda/bin/aws'
    }
    region = 'us-east-1'
}
Test the pipeline:
$ nextflow run nf-core/sarek \
-ansi-log false \
-revision 3.1.1 \
-profile test \
-work-dir s3://mybucket/work \
--outdir s3://mybucket/results
Results:
N E X T F L O W ~ version 22.10.3
Pulling nf-core/sarek ...
downloaded from https://github.com/nf-core/sarek.git
Launching `https://github.com/nf-core/sarek` [chaotic_cray] DSL2 - revision: 96749f7421 [3.1.1]
------------------------------------------------------
        [nf-core and nf-core/sarek ASCII art banner]
        nf-core/sarek v3.1.1
------------------------------------------------------
Core Nextflow options
revision : 3.1.1
runName : chaotic_cray
launchDir : /home/ec2-user
workDir : /mybucket/work
projectDir : /home/ec2-user/.nextflow/assets/nf-core/sarek
userName : ec2-user
profile : test
configFiles : /home/ec2-user/.nextflow/config, /home/ec2-user/.nextflow/assets/nf-core/sarek/nextflow.config
Input/output options
input : /home/ec2-user/.nextflow/assets/nf-core/sarek/tests/csv/3.0/fastq_single.csv
outdir : s3://mybucket/results
Main options
split_fastq : 0
intervals : https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/genome.interval_list
tools : strelka
Reference genome options
genome : null
dbsnp : https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/dbsnp_146.hg38.vcf.gz
fasta : https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/genome.fasta
germline_resource : https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/gnomAD.r2.1.1.vcf.gz
known_indels : https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/mills_and_1000G.indels.vcf.gz
snpeff_db : WBcel235.105
snpeff_genome : WBcel235
snpeff_version : 5.1
vep_genome : WBcel235
vep_species : caenorhabditis_elegans
vep_cache_version : 106
vep_version : 106.1
igenomes_base : s3://ngi-igenomes/igenomes
igenomes_ignore : true
Institutional config options
config_profile_name : Test profile
config_profile_description: Minimal test dataset to check pipeline function
Max job request options
max_cpus : 2
max_memory : 6.5GB
max_time : 8.h
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/sarek for your analysis please cite:
* The pipeline
https://doi.org/10.12688/f1000research.16665.2
https://doi.org/10.5281/zenodo.4468605
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
https://github.com/nf-core/sarek/blob/master/CITATIONS.md
------------------------------------------------------
WARN: There's no process matching config selector: .*:FREEC_SOMATIC -- Did you mean: FREEC_SOMATIC?
WARN: There's no process matching config selector: .*:FILTERVARIANTTRANCHES -- Did you mean: FILTERVARIANTTRANCHES?
WARN: There's no process matching config selector: NFCORE_SAREK:SAREK:CRAM_QC_NO_MD:SAMTOOLS_STATS -- Did you mean: NFCORE_SAREK:SAREK:CRAM_QC_RECAL:SAMTOOLS_STATS?
[0a/34e54c] Submitted process > NFCORE_SAREK:SAREK:PREPARE_INTERVALS:GATK4_INTERVALLISTTOBED (genome)
[68/90b2eb] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_DBSNP (dbsnp_146.hg38.vcf)
[58/00228d] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:SAMTOOLS_FAIDX (genome.fasta)
[87/c64131] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:GATK4_CREATESEQUENCEDICTIONARY (genome.fasta)
[91/5140a7] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:BWAMEM1_INDEX (genome.fasta)
[a2/823190] Submitted process > NFCORE_SAREK:SAREK:PREPARE_INTERVALS:CREATE_INTERVALS_BED (genome.interval_list)
[c2/b42dd9] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_KNOWN_INDELS (mills_and_1000G.indels.vcf)
Staging foreign file: https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/fastq/test_1.fastq.gz
[87/cb0449] Submitted process > NFCORE_SAREK:SAREK:FASTQC (test-test_L1)
[f4/86267b] Submitted process > NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_SPLIT (chr22_1-40001)
[eb/dea090] Submitted process > NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP:BWAMEM1_MEM (test)
[4c/f5096d] Submitted process > NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (test)
[b4/ebcc15] Submitted process > NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:INDEX_MARKDUPLICATES (test)
[c0/8de864] Submitted process > NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:CRAM_QC_MOSDEPTH_SAMTOOLS:SAMTOOLS_STATS (test)
[be/d73b9d] Submitted process > NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:CRAM_QC_MOSDEPTH_SAMTOOLS:MOSDEPTH (test)
[68/acdf3e] Submitted process > NFCORE_SAREK:SAREK:BAM_BASERECALIBRATOR:GATK4_BASERECALIBRATOR (test)
[79/cff52c] Submitted process > NFCORE_SAREK:SAREK:BAM_APPLYBQSR:GATK4_APPLYBQSR (test)
[5b/cde6db] Submitted process > NFCORE_SAREK:SAREK:BAM_APPLYBQSR:CRAM_MERGE_INDEX_SAMTOOLS:INDEX_CRAM (test)
[20/d44d7e] Submitted process > NFCORE_SAREK:SAREK:CRAM_QC_RECAL:SAMTOOLS_STATS (test)
[99/f6362e] Submitted process > NFCORE_SAREK:SAREK:CRAM_QC_RECAL:MOSDEPTH (test)
[0f/892e88] Submitted process > NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:STRELKA_SINGLE (test)
[69/ca112a] Submitted process > NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:BCFTOOLS_STATS (test)
[82/2d90d6] Submitted process > NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_COUNT (test)
[cd/5be221] Submitted process > NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_QUAL (test)
[b8/142b75] Submitted process > NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_SUMMARY (test)
[25/397520] Submitted process > NFCORE_SAREK:SAREK:CUSTOM_DUMPSOFTWAREVERSIONS (1)
[f6/a9cc92] Submitted process > NFCORE_SAREK:SAREK:MULTIQC
Waiting for file transfers to complete (1 files)
-[nf-core/sarek] Pipeline completed successfully-

How to input shell parameters in AWS CLI

I've got two shell parameters
AID="subnet-00000"
BID="subnet-11111"
And I can't execute the statement below.
aws rds create-db-subnet-group \
--db-subnet-group-name dbsubnet-$service_name \
--db-subnet-group-description "dbsubnet-$service_name" \
--subnet-ids '[$AID, $BID]'
The error message says:
Expecting value: line 1 column 2 (char 1)
How can I put my parameters into aws cli statement?
Since you've used single quotes, the variables won't be expanded. You can also drop the square brackets:
aws rds create-db-subnet-group \
--db-subnet-group-name dbsubnet-$service_name \
--db-subnet-group-description "dbsubnet-$service_name" \
--subnet-ids $AID $BID
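If you do want to keep the variables inside quoted arguments (for example, values that might contain spaces), use double quotes so the shell still expands them; a small sketch of the same command:
aws rds create-db-subnet-group \
    --db-subnet-group-name "dbsubnet-$service_name" \
    --db-subnet-group-description "dbsubnet-$service_name" \
    --subnet-ids "$AID" "$BID"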

Find Google Cloud Platform Operations Performed by a User

Is there a way to track what Google Cloud Platform operations were performed by a user? We want to audit our costs and track usage accordingly.
Edit: there's a Cloud SDK (gcloud) command:
compute operations list
that lists actions taken on Compute Engine instances. Is there a way to see what user performed these actions?
While you can't see a list of gcloud commands executed, you can see a list of API actions. The gcloud beta logging surface helps with listing/reading logs, but it's a bit clunky to use; try checking the logs in the Cloud Console instead.
If you wish to only track Google Cloud Project (GCP) Compute Engine (GCE) operations with the list command for the operations subgroup, you are able to use the --filter flag to see operations performed by a given user $GCE_USER_NAME:
gcloud compute operations list \
--filter="user=$GCE_USER_NAME" \
--limit=1 \
--sort-by="~endTime"
#=>
NAME TYPE TARGET HTTP_STATUS STATUS TIMESTAMP
$GCP_COMPUTE_OPERATION_NAME start $GCP_COMPUTE_INSTANCE_NAME 200 DONE 1970-01-01T00:00:00.001-00:00
Note: feeding the string "~endTime" into the --sort-by flag puts the most recent GCE operation first.
It might help to retrieve the entire log object in JSON:
gcloud compute operations list \
--filter="user=$GCE_USER_NAME" \
--format=json \
--limit=1 \
--sort-by="~endTime"
#=>
[
  {
    "endTime": "1970-01-01T00:00:00.001-00:00",
    . . .
    "user": "$GCP_COMPUTE_USER"
  }
]
or YAML:
gcloud compute operations list \
--filter="user=$GCE_USER_NAME" \
--format=yaml \
--limit=1 \
--sort-by="~endTime"
#=>
---
endTime: '1970-01-01T00:00:00.001-00:00'
. . .
user: $GCP_COMPUTE_USER
You are also able to use the Cloud SDK (gcloud) to explore all audit logs, not just audit logs for GCE; it is incredibly clunky, as the other existing answer points out. However, for anyone who wants to use gcloud instead of the console:
gcloud logging read \
'logName : "projects/$GCP_PROJECT_NAME/logs/cloudaudit.googleapis.com"
protoPayload.authenticationInfo.principalEmail="$GCE_USER_NAME"
severity>=NOTICE' \
--freshness="1d" \
--limit=1 \
--order="desc" \
--project=$GCP_PROJECT_NAME
#=>
---
insertId: . . .
. . .
protoPayload:
  '#type': type.googleapis.com/google.cloud.audit.AuditLog
  authenticationInfo:
    principalEmail: $GCP_COMPUTE_USER
  . . .
. . .
The read command defaults to YAML format, but you can also get your audit logs in JSON:
gcloud logging read \
'logName : "projects/$GCP_PROJECT_NAME/logs/cloudaudit.googleapis.com"
protoPayload.authenticationInfo.principalEmail="$GCE_USER_NAME"
severity>=NOTICE' \
--format=json \
--freshness="1d" \
--limit=1 \
--order="desc" \
--project=$GCP_PROJECT_NAME
#=>
[
  {
    . . .
    "protoPayload": {
      "#type": "type.googleapis.com/google.cloud.audit.AuditLog",
      "authenticationInfo": {
        "principalEmail": "$GCE_USER_NAME"
      },
      . . .
    },
    . . .
  }
]

telegraf - exec plugin - aws ec2 ebs volume info - metric parsing error, reason: [missing fields] or Errors encountered: [ invalid number]

Machine - CentOS 7.2 or Ubuntu 14.04/16.xx
Telegraf version: 1.0.1
Python version: 2.7.5
Telegraf supports an input plugin named exec. First, please see EXAMPLE 2 in that plugin's README. I can't use the JSON format as it only consumes numeric values for metrics. As per the docs:
If using JSON, only numeric values are parsed and turned into floats. Booleans and strings will be ignored.
So, the idea is simple: you specify a script in the exec plugin section, which should emit some meaningful info (in either JSON or, in my case, influx data format, since I have some metrics which contain non-numeric values) that you would want to catch/show somewhere in a cool dashboard, for example a Wavefront dashboard.
Basically one can use these metrics, tags, sources from where these metrics are coming from to find out various info about memory, cpu, disk, networking, other meaningful info and also create alerts using those if something unwanted happens.
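To make the mechanics concrete, here is a hypothetical minimal exec script (not part of my setup; the measurement and field names are made up) that emits a single valid line in influx data format. Telegraf runs the command each interval and parses whatever it prints to stdout:
#!/bin/bash
# Hypothetical minimal exec-plugin script: one measurement, one tag, two fields.
# Numeric field values are bare; string field values must be double-quoted.
echo "ebs_demo,host=$(hostname) state=\"in-use\",size=8"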
OK, I came up with this python script available here:
#!/usr/bin/python
# sudo pip install boto3 if you don't have it on your machine.
import boto3


def generate(key, value):
    """
    Creates a nicely formatted Key(Value) item for output
    """
    return '{}="{}"'.format(key, value)
    #return '{}={}'.format(key, value)


def main():
    ec2 = boto3.resource('ec2', region_name="us-west-2")
    volumes = ec2.volumes.all()
    for vol in volumes:
        # You don't need to wrap everything in `str` unless it is not a string
        # By default most things will come back as a string
        # unless they are very obviously not (complex, date time, etc)
        # but since we are printing these (and formatting them into strings)
        # the cast to string will be implicit and we don't need to make it
        # explicit
        # vol is already a fully returned volume you are essentially DOUBLING
        # your API calls when you do this
        #iv = ec2.Volume(vol.id)
        output_parts = [
            # Volume level details
            generate('create_time', vol.create_time),
            generate('availability_zone', vol.availability_zone),
            generate('volume_id', vol.volume_id),
            generate('volume_type', vol.volume_type),
            generate('state', vol.state),
            generate('size', vol.size),
            generate('iops', vol.iops),
            generate('encrypted', vol.encrypted),
            generate('snapshot_id', vol.snapshot_id),
            generate('kms_key_id', vol.kms_key_id),
        ]
        for _ in vol.attachments:
            # Will get any attachments and since it is a list
            # we should write this to handle MULTIPLE attachments
            output_parts.extend([
                generate('InstanceId', _.get('InstanceId')),
                generate('InstanceVolumeState', _.get('State')),
                generate('DeleteOnTermination', _.get('DeleteOnTermination')),
                generate('Device', _.get('Device')),
            ])
        # only process when there are tags to process
        if vol.tags:
            for _ in vol.tags:
                # Get all of the tags
                output_parts.extend([
                    generate(_.get('Key'), _.get('Value')),
                ])
        # output everything at once..
        print ','.join(output_parts)


if __name__ == '__main__':
    main()
This script talks to AWS EC2 EBS volumes, outputs all the values it can find (usually what you see in the AWS EC2 EBS volume console), and formats that info into comma-separated key=value pairs, which I'm redirecting to a .csv log file.
We don't want to run the Python script all the time (AWS API limits / cost factor).
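For example, a crontab entry along these lines (the paths are assumptions; adapt to your setup) regenerates the file twice a day while Telegraf keeps reading the cached copy every minute:
# Hypothetical crontab entry: refresh the cached volume info at 00:05 and 12:05.
5 0,12 * * * /usr/bin/python /opt/scripts/aws-vol-info.py > /tmp/aws-vol-info.csv 2>> /tmp/aws-vol-info.err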
So, once the .csv file is created, I created this small shell script which I'll set in Telegraf's exec plugin's section.
Shell script /tmp/aws-vol-info.sh set in Telegraf exec plugin is:
#!/bin/bash
cat /tmp/aws-vol-info.csv
Telegraf configuration file created using exec plugin (/etc/telegraf/telegraf.d/exec-plugin-aws-info.conf):
#--- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec
[[inputs.exec]]
commands = ["/tmp/aws-vol-info.sh"]
## Timeout for each command to complete.
timeout = "5s"
# Data format to consume.
# NOTE json only reads numerical measurements, strings and booleans are ignored.
data_format = "influx"
name_suffix = "_telegraf_execplugin"
I tweaked the .py script (the generate function) to produce the following three types of output format (.csv file) and wanted to test how Telegraf would handle this data before enabling the config file (/etc/telegraf/telegraf.d/catch-aws-ebs-info.conf) and restarting the telegraf service.
Format 1: (with double quotes around every value)
create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_id="vol-058e1d47dgh721121",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"
Testing the Telegraf configuration against the config directory gives me the following error.
Command: $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 00:37:48 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:37:48Z E! Errors encountered: [ metric parsing error, reason: [invalid field format], buffer: [create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_id="vol-058e1d47dgh721121",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"], index: [372]]
[vagrant@myvagrant ~] $
Format 2: (without any " double quotes)
create_time=2017-01-09 23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90] secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app
Getting same error while testing Telegraf's configuration for exec plugin:
2017/03/10 00:45:01 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:45:01Z E! Errors encountered: [ metric parsing error, reason: [invalid value], buffer: [create_time=2017-01-09 23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90] secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app], index: [63]]
Format 3: (this format has no double quotes, and space characters in the values are replaced with the _ character)
create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app
Still didn't work, getting same error:
[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 00:50:30 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:50:30Z E! Errors encountered: [ metric parsing error, reason: [missing fields], buffer: [create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app], index: [476]]
Format 4: If I follow influx line protocol as per this page: https://docs.influxdata.com/influxdb/v1.2/write_protocols/line_protocol_tutorial/
awsebs,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1
I'm getting this error:
[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 02:34:30 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T02:34:30Z E! Errors encountered: [ invalid number]
HOW can I get rid of this error and get telegraf to work with exec plugin (which runs the .sh script)?
Other Info:
Python script will run once/twice per day (via cron) and telegraf will run every 1 minute (to run exec plugin - which runs .sh script - which will cat the .csv file so that telegraf can consume it in influx data format).
https://galaxy.ansible.com/wavefrontHQ/wavefront-ansible/
https://github.com/influxdata/telegraf/issues/2525
It seems like the rules are very strict, I should have looked more closely.
The syntax of the output of any program that you want Telegraf to consume MUST match or follow the INFLUX LINE PROTOCOL format shown below, and also all the RULES that come with it.
For ex:
weather,location=us-midwest temperature=82 1465839830100400200
  |    -------------------- --------------  |
  |             |             |             |
  |             |             |             |
+-----------+--------+-+---------+-+---------+
|measurement|,tag_set| |field_set| |timestamp|
+-----------+--------+-+---------+-+---------+
You can read more about what's measurement, tag, field and optional(timestamp) here: https://docs.influxdata.com/influxdb/v1.2/write_protocols/line_protocol_tutorial/
Important rules are:
1) There must be a , and no space between measurement and tag set.
2) There must be a space between tag set and field set.
3) For tag keys, tag values, and field keys, always use a backslash character \ if you need to escape any character in the measurement name, tag set, or field set (keys or values)!
4) You can't escape \ with \
5) Line Protocol handles emojis with no problem :)
6) TAG / TAG set (tags, comma separated) is OPTIONAL.
7) FIELD / FIELD set (fields, comma separated) - At least ONE is required per line.
8) TIMESTAMP (last value shown in the format) is OPTIONAL.
9) VERY IMPORTANT QUOTING rules are below:
a) Never double or single quote the timestamp. It's not valid Line Protocol. '123123131312313' or "1231313213131" won't work even if the number itself is valid.
b) Never single quote field values (even if they’re strings!). It’s also not valid Line Protocol. i.e. fieldname='giga' won't work.
c) Do not double or single quote measurement names, tag keys, tag values, and field keys. NOTE: THIS does say !!! tag values !!!! so careful.
d) Do not double quote field values that are ONLY in floats, integers, or booleans format, otherwise InfluxDB will assume that those values are strings.
e) Do double quote field values that are strings.
f) AND the MOST IMPORTANT one (which will save you from going BALD): if a FIELD value is written without double quotes on one line because you think it's an integer or float (for example, fields like size or iops), and on some other line (anywhere in the file that Telegraf reads/parses via the exec plugin) that same field has a non-numeric value (i.e. a string), then you'll get the Errors encountered: [ invalid number error.
So to fix it, the RULE is: if any possible FIELD value for a FIELD key is a string, then you MUST wrap it in " on every line; it doesn't matter that the value is 1, 200 or 1.5 on some lines (for example, iops can be 1 or 5) while on other lines it is something else entirely (iops can be None). See the two-line sketch after the error output below.
Error message: Errors encountered: [ invalid number
[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 11:13:18 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T11:13:18Z E! Errors encountered: [ invalid number metric parsing error, reason: [invalid field format], buffer: [awsebsvol,host=myvagrant ], index: [25]]
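To illustrate rule 9f, here is a hypothetical pair of lines (made up, not from my data) that would trigger it, because iops is a bare number on the first line and a bare string on the second:
awsec2ebs,volume_id=vol-aaa size=8,iops=100
awsec2ebs,volume_id=vol-bbb size=8,iops=None
Quoting iops as a string on every line (iops="100", iops="None") keeps the field type consistent and the parser happy.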
So, after all this learning, it's clear that first I was missing the Influx Line protocol format and ALSO the RULES!!
Now the output that I want my Python script to generate should look like this (according to the Influx Line Protocol). You can just change the .sh file and use sed "s/^/awsec2ebs,/", or sed "s/^/awsec2ebs,sourcehost=$(hostname) /" (note the space before the closing sed / character), and then you can keep " around any key=value pair. I did change the .py file to not use " for the size field (iops ultimately stays quoted because its value can be None).
Anyways, if the output is something like this:
awsec2ebs,volume_id=vol-058e1d47dgh721121 create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"
In the above final working solution, I created a measurement named awsec2ebs, then put a , between that measurement and the tag key volume_id, did NOT use any ' or " quotes around the tag value, and then put a space character between the tag set and the field set (I only wanted one tag for now; you can have more tags, comma separated, as long as you follow the rules).
Finally I ran the command, which worked like a shenzi!
$ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 03:33:54 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
> awsec2ebs_telegraf_execplugin,volume_id=vol-058e1d47dgh721121,host=myvagrant volume_type="gp2",iops="100",kms_key_id="None",role="app",size="8",encrypted="False",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",Name="[company-2b-app90] secondary",snapshot_id="snap-06h1h1b91bh662avn",DeleteOnTermination="True",mirror="secondary",cluster="company",autoscale="true",high_availability="1",create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",state="in-use",Device="/dev/sda1",hostname="company-2b-app90-i-0jjb1boop26f42f50" 1489116835000000000
[vagrant@myvagrant ~] $ echo $?
0
In the above example, size is the only field which will always be a numeric value, so we don't need to wrap it with " (but it's up to you). Recall the MOST IMPORTANT rule (9f) above and the error it generates.
So final python file is:
#!/usr/bin/python
# Do `sudo pip install boto3` first
import boto3


def generate(key, value, qs, qe):
    """
    Creates a nicely formatted Key(Value) item for output
    """
    return '{}={}{}{}'.format(key, qs, value, qe)


def main():
    ec2 = boto3.resource('ec2', region_name="us-west-2")
    volumes = ec2.volumes.all()
    for vol in volumes:
        # You don't need to wrap everything in `str` unless it is not a string
        # By default most things will come back as a string
        # unless they are very obviously not (complex, date time, etc)
        # but since we are printing these (and formatting them into strings)
        # the cast to string will be implicit and we don't need to make it
        # explicit
        # vol is already a fully returned volume you are essentially DOUBLING
        # your API calls when you do this
        #iv = ec2.Volume(vol.id)
        output_parts = [
            # Volume level details
            generate('volume_id', vol.volume_id, '"', '"'),
            generate('create_time', vol.create_time, '"', '"'),
            generate('availability_zone', vol.availability_zone, '"', '"'),
            generate('volume_type', vol.volume_type, '"', '"'),
            generate('state', vol.state, '"', '"'),
            generate('size', vol.size, '', ''),
            # The following vol.iops variable can be a number or None, so you must
            # wrap it with double quotes otherwise the "invalid number" error will come.
            generate('iops', vol.iops, '"', '"'),
            generate('encrypted', vol.encrypted, '"', '"'),
            generate('snapshot_id', vol.snapshot_id, '"', '"'),
            generate('kms_key_id', vol.kms_key_id, '"', '"'),
        ]
        for _ in vol.attachments:
            # Will get any attachments and since it is a list
            # we should write this to handle MULTIPLE attachments
            output_parts.extend([
                generate('InstanceId', _.get('InstanceId'), '"', '"'),
                generate('InstanceVolumeState', _.get('State'), '"', '"'),
                generate('DeleteOnTermination', _.get('DeleteOnTermination'), '"', '"'),
                generate('Device', _.get('Device'), '"', '"'),
            ])
        # only process when there are tags to process
        if vol.tags:
            for _ in vol.tags:
                # Get all of the tags
                output_parts.extend([
                    generate(_.get('Key'), _.get('Value'), '"', '"'),
                ])
        # output everything at once..
        print ','.join(output_parts)


if __name__ == '__main__':
    main()
Final aws-vol-info.sh is:
#!/bin/bash
cat aws-vol-info.csv | sed "s/^/awsebsvol,host=`hostname|head -1|sed "s/[ \t][ \t]*/_/g"` /"
The final Telegraf exec plugin config file is (/etc/telegraf/telegraf.d/exec-plugin-aws-info.conf; you can give it any name ending in .conf):
#--- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec
[[inputs.exec]]
commands = ["/some/valid/path/where/csvfileexists/aws-vol-info.sh"]
## Timeout for each command to complete.
timeout = "5s"
# Data format to consume.
# NOTE json only reads numerical measurements, strings and booleans are ignored.
data_format = "influx"
name_suffix = "_telegraf_exec"
Run the following and everything will work now:
$ telegraf --config-directory=/etc/telegraf --test --input-filter=exec

How to paginate over an AWS CLI response?

I'm trying to paginate over EC2 Reserved Instance offerings, but can't seem to paginate via the CLI (see below).
% aws ec2 describe-reserved-instances-offerings --max-results 20
{
"NextToken": "someToken",
"ReservedInstancesOfferings": [
{
...
}
]
}
% aws ec2 describe-reserved-instances-offerings --max-results 20 --starting-token someToken
Parameter validation failed:
Unknown parameter in input: "PaginationConfig", must be one of: DryRun, ReservedInstancesOfferingIds, InstanceType, AvailabilityZone, ProductDescription, Filters, InstanceTenancy, OfferingType, NextToken, MaxResults, IncludeMarketplace, MinDuration, MaxDuration, MaxInstanceCount
The documentation found in [1] says to use start-token. How am I supposed to do this?
[1] http://docs.aws.amazon.com/cli/latest/reference/ec2/describe-reserved-instances-offerings.html
With deference to a 2017 solution by marjamis, which must have worked on a prior CLI version, here is a working approach for paginating through AWS results in bash, from a Mac laptop and aws-cli/2.1.2:
# The scope of this example requires that credentials are already available or
# are passed in with the AWS CLI command.
# The parsing example uses jq, available from https://stedolan.github.io/jq/

# The below command is the one being executed and should be adapted appropriately.
# Note that the max items may need adjusting depending on how many results are returned.
aws_command="aws emr list-instances --max-items 333 --cluster-id $active_cluster"
unset NEXT_TOKEN

function parse_output() {
  if [ ! -z "$cli_output" ]; then
    # The output parsing below also needs to be adapted as needed.
    echo $cli_output | jq -r '.Instances[] | "\(.Ec2InstanceId)"' >> listOfinstances.txt
    NEXT_TOKEN=$(echo $cli_output | jq -r ".NextToken")
  fi
}

# The command is run and output parsed in the below statements.
cli_output=$($aws_command)
parse_output

# The below while loop runs until either the command errors due to throttling or
# comes back with a pagination token. In the case of being throttled / throwing
# an error, it sleeps for three seconds and then tries again.
while [ "$NEXT_TOKEN" != "null" ]; do
  if [ "$NEXT_TOKEN" == "null" ] || [ -z "$NEXT_TOKEN" ] ; then
    echo "now running: $aws_command "
    sleep 3
    cli_output=$($aws_command)
    parse_output
  else
    echo "now paginating: $aws_command --starting-token $NEXT_TOKEN"
    sleep 3
    cli_output=$($aws_command --starting-token $NEXT_TOKEN)
    parse_output
  fi
done #pagination loop
Looks like some busted documentation.
If you run the following, this works:
aws ec2 describe-reserved-instances-offerings --max-results 20 --next-token someToken
Translating the error message: it expected NextToken, which is represented as --next-token on the CLI.
If you continue to read the reference documentation that you provided, you will learn that:
--starting-token (string)
A token to specify where to start paginating. This is the NextToken from a previously truncated response.
Moreover:
--max-items (integer)
The total number of items to return. If the total number of items available is more than the value specified in max-items then a NextToken will be provided in the output that you can use to resume pagination.
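Putting both flags together for the command in the question, a sketch (someToken stands in for the NextToken value returned by the previous call):
aws ec2 describe-reserved-instances-offerings --max-items 20
aws ec2 describe-reserved-instances-offerings --max-items 20 --starting-token someToken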