Using jq to parse JSON output of AWS CLI tools with Lightsail

I'm trying to modify a script to automate lightsail snapshots, and I am having trouble modifying the jq query.
I'm trying to parse the output of aws lightsail get-instance-snapshots
This is the original line from the script:
aws lightsail get-instance-snapshots | jq '.[] | sort_by(.createdAt) | select(.[0].fromInstanceName == "WordPress-Test-Instance") | .[].name'
which returns a list of snapshot names with one per line.
I need to modify the query so that it does not return all snapshots, but only those whose name starts with 'autosnap'. I'm doing this because the script rotates snapshots, and I don't want it to delete snapshots I create manually (which will not start with 'autosnap').
Here is a redacted sample output from aws lightsail get-instance-snapshots:
{
"instanceSnapshots": [
{
"location": {
"availabilityZone": "all",
"regionName": "*****"
},
"arn": "*****",
"fromBlueprintId": "wordpress_4_9_2_1",
"name": "autosnap-WordPress-Test-Instance-2018-04-16_01.46",
"fromInstanceName": "WordPress-Test-Instance",
"fromBundleId": "nano_1_2",
"supportCode": "*****",
"sizeInGb": 20,
"createdAt": 1523843190.117,
"fromAttachedDisks": [],
"fromInstanceArn": "*****",
"resourceType": "InstanceSnapshot",
"state": "available"
},
{
"location": {
"availabilityZone": "all",
"regionName": "*****"
},
"arn": "*****",
"fromBlueprintId": "wordpress_4_9_2_1",
"name": "Premanent-WordPress-Test-Instance-2018-04-16_01.40",
"fromInstanceName": "WordPress-Test-Instance",
"fromBundleId": "nano_1_2",
"supportCode": "*****",
"sizeInGb": 20,
"createdAt": 1523842851.69,
"fromAttachedDisks": [],
"fromInstanceArn": "*****",
"resourceType": "InstanceSnapshot",
"state": "available"
}
]
}
I would have thought something like this would work, but I'm not having any luck after many attempts...
aws lightsail get-instance-snapshots | jq '.[] | sort_by(.createdAt) | select(.[0].fromInstanceName == "WordPress-Test-Instance") | select(.[0].name | test("autosnap")) |.[].name'
Any help would be greatly appreciated!

The basic query for making the selection you describe would be:
.instanceSnapshots | map(select(.name|startswith("autosnap")))
(If you didn't need to preserve the array structure, you could go with:
.instanceSnapshots[] | select(.name|startswith("autosnap"))
)
You could then perform additional filtering by extending the pipeline.
If you were to use test/1, the appropriate invocation would be test("^autosnap") or perhaps test("^autosnap-").
Example
.instanceSnapshots
| map(select(.name|startswith("autosnap")))
| map(select(.fromInstanceName == "WordPress-Test-Instance"))
| sort_by(.createdAt)
| .[].name
The two successive selects could of course be compacted into one. For efficiency, the sorting should be done as late as possible.
Postscript
Although you might indeed be able to get away with commencing the pipeline with .[] instead of .instanceSnapshots, the latter is advisable in case the JSON schema changes. In a sense, the whole point of data formats like JSON is to make it easy to write queries that are robust with respect to (sane) schema-evolution.
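For readers who want to check the selection logic without jq, here is a minimal Python sketch of the same pipeline (filter by name prefix and instance name, then sort by createdAt). The function name auto_snapshot_names and the trimmed SAMPLE data are assumptions made for illustration; only the field names come from the question.

```python
import json

# Trimmed copy of the sample output from the question (only the fields used here).
SAMPLE = json.loads("""
{
  "instanceSnapshots": [
    {"name": "autosnap-WordPress-Test-Instance-2018-04-16_01.46",
     "fromInstanceName": "WordPress-Test-Instance",
     "createdAt": 1523843190.117},
    {"name": "Premanent-WordPress-Test-Instance-2018-04-16_01.40",
     "fromInstanceName": "WordPress-Test-Instance",
     "createdAt": 1523842851.69}
  ]
}
""")


def auto_snapshot_names(payload, instance_name, prefix="autosnap"):
    """Return the names of matching snapshots, oldest first (hypothetical helper)."""
    selected = [
        snap for snap in payload["instanceSnapshots"]
        if snap["name"].startswith(prefix)
        and snap["fromInstanceName"] == instance_name
    ]
    # Sort after filtering, so only the surviving snapshots are sorted.
    selected.sort(key=lambda snap: snap["createdAt"])
    return [snap["name"] for snap in selected]


if __name__ == "__main__":
    for name in auto_snapshot_names(SAMPLE, "WordPress-Test-Instance"):
        print(name)
```

As in the jq version, the key "instanceSnapshots" is addressed by name rather than by position, so the code keeps working if other top-level keys are added later.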

Related

How to get time zone boundaries in BigQuery?

Given the following GCP services:
BigQuery
Cloud Storage
Cloud Shell
What is the easiest way to create a BigQuery table with the following two-column structure?
Column | Description          | Type      | Primary key
tzid   | Time zone identifier | STRING    | x
bndr   | Boundaries           | GEOGRAPHY |
For example:
tzid           | bndr
Africa/Abidjan | POLYGON((-5.440683 4.896553, -5.303699 4.912035, -5.183637 4.923927, ...))
Africa/Accra   | POLYGON((-0.136231 11.13951, -0.15175 11.142384, -0.161168 11.14698, ...))
Pacific/Wallis | MULTIPOLYGON(((-178.350043 -14.384951, -178.344628 -14.394109, ...)))
1. Download and unzip timezones.geojson.zip from Evan Siroky's repository to your computer.
Coordinates are structured as follows (GeoJSON format):
{
"type": "FeatureCollection",
"features":
[
{
"type":"Feature",
"properties":
{
"tzid":"Africa/Abidjan"
},
"geometry":
{
"type":"Polygon",
"coordinates":[[[-5.440683,4.896553],[-5.303699,4.912035], ...]]]
}
},
{
"type":"Feature",
"properties": ...
}
]
}
BigQuery cannot load a GeoJSON FeatureCollection directly; it expects jsonl (newline-delimited JSON), with one feature per line. Steps 3 to 5 convert the file to jsonl format.
2. Upload the file timezones_geojson.json to Cloud Storage (gs://your-bucket/).
3. Move the file to the Cloud Shell virtual machine:
gsutil mv gs://your-bucket/timezones_geojson.json .
4. Parse the file timezones_geojson.json with jq, select the "features" array and write one line per element:
jq -c '.features[]' timezones_geojson.json > timezones_jsonl.json
The previous format will be transformed to the following (shown on several lines here for readability; with -c, jq writes each feature on a single line):
{
"type":"Feature",
"properties":
{
"tzid":"Africa/Abidjan"
},
"geometry":
{
"type":"Polygon",
"coordinates":[[[-5.440683,4.896553],[-5.303699,4.912035], ... ]]]
}
}
{
"type":"Feature",
"properties":...
"geometry":...
}
5. Move the jsonl file to Cloud Storage:
gsutil mv timezones_jsonl.json gs://your-bucket/
6. Load the jsonl file into BigQuery:
bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON --json_extension=GEOJSON your_dataset.timezones gs://your-bucket/timezones_jsonl.json
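If jq is not available, the conversion from a FeatureCollection to jsonl can be sketched in a few lines of Python using only the standard library. The function name feature_collection_to_jsonl and the tiny sample collection are made up for illustration; this is a sketch of the same idea as jq -c '.features[]', not a replacement for the steps above.

```python
import json


def feature_collection_to_jsonl(geojson_text):
    """Return one compact JSON document per feature, one feature per line."""
    collection = json.loads(geojson_text)
    return "".join(
        # separators=(",", ":") drops the spaces, similar to jq's -c output
        json.dumps(feature, separators=(",", ":")) + "\n"
        for feature in collection["features"]
    )


if __name__ == "__main__":
    # Made-up FeatureCollection; names and coordinates are illustrative only.
    sample = json.dumps({
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature",
             "properties": {"tzid": "Example/One"},
             "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}},
            {"type": "Feature",
             "properties": {"tzid": "Example/Two"},
             "geometry": {"type": "Point", "coordinates": [1.0, 1.0]}},
        ],
    })
    print(feature_collection_to_jsonl(sample), end="")
```

For a real file, read timezones_geojson.json into a string, pass it to the function and write the result to timezones_jsonl.json. The whole file is held in memory, which is an assumption that may not suit very large inputs.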

jq filter JSON array based on value in list

I want to use jq to return the RuleArn value if the following condition matches, i.e. if .[].Conditions[].Values[] has an element that matches app.fantastic.com.
My JSON array:
[
{
"Conditions": [
{
"Field": "http-header",
"HttpHeaderConfig": {
"Values": [
"dark"
],
"HttpHeaderName": "Environment"
}
},
{
"Values": [
"app.fantastic.com"
],
"Field": "host-header",
"HostHeaderConfig": {
"Values": [
"app.fantastic.com"
]
}
}
],
"IsDefault": false,
"Priority": "3",
"RuleArn": "iwantthisvalue"
}
]
I've tried these:
| jq -r '.[] | select(.Conditions[].Values[]=="app.fantastic.com")'
and
| jq -r '.[] | select(.Conditions[].Values[] | has("app.fantastic.com"))'
I get the following error:
jq: error (at :144): Cannot index array with string "Conditions"
And with this:
| jq '.[].Conditions[].Values | index ("app.fantastic.com") == 0 | .RuleArn'
This is the error I get:
jq: error (at :47): Cannot index boolean with string "RuleArn"
Note: I do not want a solution using --query of AWS cli as I use --query already to get a smaller JSON payload which I want to filter further using jq.
The root of the difficulty you are running into is that the problem statement is improper: .[].Conditions[].Values[] does not correspond to your JSON.
Using jq 1.5 or later, you could use the following filter:
.[]
| first(select(.Conditions | .. | objects
| select(has("Values") and
(.Values|index("app.fantastic.com")))))
| .RuleArn
Since .Values occurs in several places, it's not clear what the precise requirements are, but you might also wish to consider:
.[]
| select(.Conditions[].Values? | index("app.fantastic.com"))
| .RuleArn
or (less efficiently, but it works with jq 1.4 or later):
.[]
| select(.Conditions[].Values[]? == "app.fantastic.com")
| .RuleArn
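To make the intended matching rule explicit, here is a rough Python equivalent of the last two filters: a rule is selected if any of its conditions has a top-level "Values" list containing the wanted string. The helper names are hypothetical, and the sample is a trimmed copy of the JSON from the question. It does not search nested objects such as HostHeaderConfig, which the first (recursive) filter does.

```python
RULES = [
    {
        "Conditions": [
            {"Field": "http-header",
             "HttpHeaderConfig": {"Values": ["dark"],
                                  "HttpHeaderName": "Environment"}},
            {"Values": ["app.fantastic.com"],
             "Field": "host-header",
             "HostHeaderConfig": {"Values": ["app.fantastic.com"]}},
        ],
        "IsDefault": False,
        "Priority": "3",
        "RuleArn": "iwantthisvalue",
    }
]


def conditions_mention(rule, wanted):
    """True if any condition's top-level "Values" list contains `wanted`."""
    return any(
        # Conditions without a "Values" key are treated as having no values.
        wanted in condition.get("Values", [])
        for condition in rule.get("Conditions", [])
    )


def matching_rule_arns(rules, wanted):
    """Return the RuleArn of every rule that mentions `wanted` (once per rule)."""
    return [rule["RuleArn"] for rule in rules if conditions_mention(rule, wanted)]


if __name__ == "__main__":
    print(matching_rule_arns(RULES, "app.fantastic.com"))
```

Unlike a jq select over a stream, any() reports each rule at most once, even if several conditions match.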

Newman / Postman - Cannot replace value for a key in the Environment JSON, from command line

I'm new to both Postman and Newman.
I have created a simple test which uses the environment variables JSON for some property values.
It runs fine when the value for this key is hardcoded in environment.json, but it fails if I try to pass/replace the value for the key from the command line.
I do not have a global variables JSON and, if possible, would prefer not to use one.
Here is my command-line:
newman run "C:\Users\Automation\Postman\postman_autotest.json" --folder "AUTO" --global-var "client_secret=XXXX" --environment "C:\Users\Automation\Postman\postman_environment.json"
This value is essential for the API to work/connect, so I'm getting a 400 error back.
Here is this key in environment.json:
{
"id": "673a4256-f5a1-7497-75aa-9e47b1dbad4a",
"name": "Postman Env Vars",
"values": [
{
"key": "client_secret",
"value": "",
"description": {
"content": "",
"type": "text/plain"
},
"enabled": true
}
],
"_postman_variable_scope": "environment",
"_postman_exported_at": "2019-04-03T20:31:04.829Z",
"_postman_exported_using": "Postman/6.7.4"
}
Just a thought... You can use a wrapper PowerShell script to write the secret into a temporary copy of the environment file at runtime, run Newman against that copy, then delete it.
[CmdletBinding()]
Param (
    [Parameter(Mandatory)]
    [string]$Secret
)
$envFile = "C:\Users\Automation\Postman\postman_environment.json"
$envFileWithKey = "C:\Users\Automation\Postman\postman_environment_w_key.json"
# Load the exported environment and set the value of its first variable (client_secret)
$json = Get-Content $envFile -Raw | ConvertFrom-Json
$json.values[0].value = $Secret
# Write the modified environment to a temporary file and run Newman against it
ConvertTo-Json $json -Depth 10 | Out-File $envFileWithKey -Force
newman run "C:\Users\Automation\Postman\postman_autotest.json" --folder "AUTO" --environment $envFileWithKey
# Remove the temporary file so the secret does not stay on disk
Remove-Item -Path $envFileWithKey
Then just:
.\RunAutomation.ps1 -Secret "this_is_a_secret_sshhhhhh"
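The wrapper above edits the first entry of "values" by position. As a hedged alternative, the same rewrite can be sketched in Python so that the variable is looked up by its key instead, which still works if the environment later gains more variables. The function name with_env_value is hypothetical; the structure of the environment file is taken from the export shown in the question.

```python
import copy
import json


def with_env_value(environment, key, value):
    """Return a copy of a Postman environment dict with `key` set to `value`."""
    updated = copy.deepcopy(environment)
    for entry in updated["values"]:
        if entry["key"] == key:
            entry["value"] = value
            return updated
    # Fail loudly rather than silently running with an empty secret.
    raise KeyError(f"no variable named {key!r} in environment")


if __name__ == "__main__":
    # Trimmed copy of the exported environment from the question.
    environment = {
        "name": "Postman Env Vars",
        "values": [
            {"key": "client_secret", "value": "", "enabled": True},
        ],
        "_postman_variable_scope": "environment",
    }
    patched = with_env_value(environment, "client_secret", "XXXX")
    print(json.dumps(patched, indent=2))
```

The patched dict would then be written to a temporary file and passed to newman with --environment, as in the PowerShell script.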

Get multiple variations from Google Translate API

When I make a query to the Translate API:
https://translation.googleapis.com/language/translate/v2?key=$API_KEY&q=hello&source=en&target=e
I only get 1 result:
{
"data": {
"translations": [
{
"translatedText": "....."
}
]
}
}
Is it possible to get all variations (alternatives) of that word, not only 1 translation?
Microsoft Azure supports this through its dictionary lookup: https://learn.microsoft.com/en-us/azure/cognitive-services/translator/reference/v3-0-dictionary-lookup
For example, a POST to https://api.cognitive.microsofttranslator.com/dictionary/lookup?api-version=3.0&from=en&to=es with the body
[
{"Text":"hello"}
]
gives you a list of translations like this:
[
{
"normalizedSource": "hello",
"displaySource": "hello",
"translations": [
{
"normalizedTarget": "diga",
"displayTarget": "diga",
"posTag": "OTHER",
"confidence": 0.6909,
"prefixWord": "",
"backTranslations": [
{
"normalizedText": "hello",
"displayText": "hello",
"numExamples": 1,
"frequencyCount": 38
}
]
},
{
"normalizedTarget": "dime",
"displayTarget": "dime",
"posTag": "OTHER",
"confidence": 0.3091,
"prefixWord": "",
"backTranslations": [
{
"normalizedText": "tell me",
"displayText": "tell me",
"numExamples": 1,
"frequencyCount": 5847
},
{
"normalizedText": "hello",
"displayText": "hello",
"numExamples": 0,
"frequencyCount": 17
}
]
}
]
}
]
You can see 2 different translations in this case.
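Once such a response has been received, pulling out the alternatives is a short loop. The sketch below works on a trimmed copy of the response shown above and makes no HTTP call; the function name ranked_translations is an assumption for illustration.

```python
# Trimmed copy of the dictionary lookup response shown above.
RESPONSE = [
    {
        "normalizedSource": "hello",
        "displaySource": "hello",
        "translations": [
            {"normalizedTarget": "diga", "displayTarget": "diga",
             "posTag": "OTHER", "confidence": 0.6909},
            {"normalizedTarget": "dime", "displayTarget": "dime",
             "posTag": "OTHER", "confidence": 0.3091},
        ],
    }
]


def ranked_translations(lookup_response):
    """Return (displayTarget, confidence) pairs, most confident first."""
    pairs = [
        (translation["displayTarget"], translation["confidence"])
        for entry in lookup_response
        for translation in entry["translations"]
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for target, confidence in ranked_translations(RESPONSE):
        print(f"{target}\t{confidence}")
```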
The Translation API service doesn't support the retrieval of multiple translations of a word, as mentioned in the FAQ documentation:
Is it possible to get multiple translations of a word?
No. This feature is only available via the web interface at translate.google.com
If this doesn't cover your needs, you can use the Send Feedback button, located at the lower left and upper right corners of the service's public documentation, or use the Issue Tracker tool to raise a Translation API feature request and notify Google about the desired functionality.
Approach mapping Wiktionary using POS tags, related terms and Google-translated word.
TL;DR
The question is titled 'get-multiple-variations-from-google-translate-api', but in short, you (still) currently can't do this by using Google's service alone (as of Sept. 2022). It seems most companies, such as Google, want to continue charging for this service. This answer provides an approach using a (free) service as a pivot to get the term, related terms, and their POS (Parts of Speech) e.g. noun, verb, etc. before translating those terms and then re-querying the service.
This alternative creates a small pipeline that queries Wiktionary before (on the source language), and after (on the translated terms target language) the translation (using Google).
The small pipeline is written in python and bash.
Rationale
Because Google only translates word to word, we can first get the word senses and corresponding synonyms for each POS (Part of Speech), translate each of them, and then match the word senses in the target language using a tool such as Wiktionary.
Wiktionary
Fortunately, someone has already created a python library to query Wiktionary for multiple languages.
Script to get definitions / synonyms from Wiktionary (using python):
(requires wiktionaryparser )
e.g. python -m pip install wiktionaryparser
import json
import sys
from wiktionaryparser import WiktionaryParser
parser = WiktionaryParser()
# sys.argv[1] is a language, e.g. 'english'; sys.argv[2] is the word to look up
parser.set_default_language(sys.argv[1])
print(
    json.dumps(
        [
            [
                {
                    'pos': d.get('partOfSpeech'),
                    'text': d.get('text'),
                    # first example sentence only, or an empty list if there are none
                    'examples': d.get('examples')[0] if d.get('examples') else [],
                    'related': d.get('relatedWords'),
                }
                for d in w.get('definitions')
            ]
            for w in parser.fetch(sys.argv[2])
        ],
        indent=2,
    )
)
Google translate + Wiktionary
The bash script below gets Wiktionary definitions, splits on synonym lists and correlates translations based on POS (Part of Speech).
To be honest, this script is a bit convoluted and uses a lot of utilities, but it works. Anyone wanting something more robust could refactor it into Python, like the Wiktionary part.
This GitHub post provided the part of the script below that calls the free Google Translate API.
#!/bin/bash
sl=$1
tl=$2
wiki_sl=$3
wiki_tl=$4
string=$5
ua='Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
#echo "$string"
result="{\"${sl}\":[],\"${tl}\":[]}"
#set -x
# read -r keeps the backslash escapes in each JSON line intact
while IFS= read -r line; do
    # line could be better named 'synonym' here
    pos="$(echo ${line} | jq -r ".pos")"
    sl_result="$(echo $line | jq . -c)"
    tl_result=""
    opt_single="single?client=gtx&sl=${sl}&tl=${tl}&dt=t&q=${string//[[:blank:]]/+}"
    full_url="http://translate.googleapis.com/translate_a/${opt_single}"
    response=$(curl -sA "${ua}" "${full_url}")
    tl_word="$(echo ${response} | jq -r '.[[0][0]][] | .[0:1][0]')"
    # skip translations that consist of more than one word
    echo "${tl_word}" | grep -q " " && continue 1
    tl_result_new="$(python ./get_wiki.py "${wiki_tl}" "${tl_word}" | jq -r -c --arg POS "$pos" '.[][] | select(.pos==$POS)'),"
    # making json
    tl_result="[${tl_result_new}"
    # iterate over synonyms
    while IFS= read -r qry; do
        opt_single="single?client=gtx&sl=${sl}&tl=${tl}&dt=t&q=${qry//[[:blank:]]/+}"
        full_url="http://translate.googleapis.com/translate_a/${opt_single}"
        response=$(curl -sA "${ua}" "${full_url}")
        tl_word="$(echo ${response} | jq -r '.[[0][0]][] | .[0:1][0]')"
        echo "${tl_word}" | grep -q " " && continue 1
        tl_result_new="$(python ./get_wiki.py "${wiki_tl}" "${tl_word}" | jq -r -c --arg POS "$pos" '.[][] | select(.pos==$POS)'),"
        # adding to json
        tl_result="${tl_result},${tl_result_new}"
    done< <(echo "${line}" | jq -c -r ' .related[].words[]' | \
        sed -e 's/.*://;s/"//g;s/^ *//g;s/ *$//g' | tr ',' '\n')
    # note: this keeps only the most recent entry (trailing comma stripped),
    # not the list accumulated in the loop above
    tl_result="$(echo "${tl_result_new}" | sed 's/,$//g')"
    [ -z "${tl_result}" ] && tl_result=null
    [ -z "${sl_result}" ] && sl_result=null
    result="{\"${sl}\":${sl_result},\"${tl}\":${tl_result}}"
    echo "$result" | jq "."
done< <(python ./get_wiki.py "$wiki_sl" "$string" | \
    jq -c -r '.[][]|select(.related[].relationshipType=="synonyms")') 2> /dev/null | jq -c '[.]'
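The sed expression in the inner loop strips everything up to the last colon on the line, so an entry such as "(provide assistance to): aid, assist, ...; See also Thesaurus:help" is reduced to just "help". As a possible starting point for the Python refactoring, here is a sketch of a more careful split. It is a different technique from the sed one-liner, not a port of it, and the function name split_synonyms is hypothetical. It assumes the entry format seen in the sample output: an optional "(sense):" label, comma-separated terms, and an optional "; See also ..." tail.

```python
def split_synonyms(entry):
    """Split one Wiktionary synonym string into individual terms."""
    # Drop a trailing "; See also ..." cross-reference, if present.
    body = entry.split(";", 1)[0].strip()
    # Entries that are only a cross-reference carry no terms.
    if body.lower().startswith("see also"):
        return []
    # Drop a leading "(sense):" label; keep the text after it.
    if body.startswith("(") and "):" in body:
        body = body.split("):", 1)[1]
    return [term.strip() for term in body.split(",") if term.strip()]


if __name__ == "__main__":
    examples = [
        "(action given to provide assistance): aid, assistance",
        "(provide assistance to): aid, assist, come to the aid of, "
        "help out; See also Thesaurus:help",
        "See also Thesaurus:body",
    ]
    for example in examples:
        print(split_synonyms(example))
```

Multi-word terms such as "come to the aid of" are kept here; the bash script would go on to skip any whose translation contains a space.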
How to use:
The first two arguments are for Google: the source language and the target language, in that order, as two-letter codes.
The next two arguments are for Wiktionary: the source language and the target language, each as a full word (e.g. 'English', 'French', etc.).
The final (fifth) argument is the single word to be translated.
./translate.sh en pt english portuguese help
Because it is a web-scraping library, the Python 'wiktionaryparser' lib occasionally breaks and throws an error, which is why I add 2> /dev/null to silence stderr.
./translate.sh en pt english portuguese help 2> /dev/null
This script isn't perfect, but it is a starting point and a proof-of-concept to show you this is possible using a free tool such as wiktionary.
English to Portuguese
$ ./translate.sh en pt english portuguese help 2> /dev/null
Output:
[
{
"en": {
"pos": "noun",
"text": [
"help (usually uncountable, plural helps)",
"(uncountable) Action given to provide assistance; aid.",
"(usually uncountable) Something or someone which provides assistance with a task.",
"Documentation provided with computer software, etc. and accessed using the computer.",
"(usually uncountable) One or more people employed to help in the maintenance of a house or the operation of a farm or enterprise.",
"(uncountable) Correction of deficits, as by psychological counseling or medication or social support or remedial training."
],
"examples": "I need some help with my homework.",
"related": [
{
"relationshipType": "synonyms",
"words": [
"(action given to provide assistance): aid, assistance"
]
}
]
},
"pt": {
"pos": "noun",
"text": [
"assistência f (plural assistências)",
"assistance, aid, help",
"protection"
],
"examples": [],
"related": [
{
"relationshipType": "related terms",
"words": [
"assistir"
]
}
]
}
}
]
[
{
"en": {
"pos": "verb",
"text": [
"help (third-person singular simple present helps, present participle helping, simple past helped or (archaic) holp, past participle helped or (archaic) holpen)",
"(transitive) To provide assistance to (someone or something).",
"(transitive) To assist (a person) in getting something, especially food or drink at table; used with to.",
"(transitive) To contribute in some way to.",
"(intransitive) To provide assistance.",
"(transitive) To avoid; to prevent; to refrain from; to restrain (oneself). Usually used in nonassertive contexts with can."
],
"examples": "Risk is everywhere. […] For each one there is a frighteningly precise measurement of just how likely it is to jump from the shadows and get you. “The Norm Chronicles” […] aims to help data-phobes find their way through this blizzard of risks.",
"related": [
{
"relationshipType": "synonyms",
"words": [
"(provide assistance to): aid, assist, come to the aid of, help out; See also Thesaurus:help",
"(contribute in some way to): contribute to",
"(provide assistance): assist; See also Thesaurus:assist"
]
}
]
},
"pt": {
"pos": "verb",
"text": [
"ajudar (first-person singular present indicative ajudo, past participle ajudado)",
"to help, aid; to assist"
],
"examples": "Ajude-me! ― Help me!",
"related": [
{
"relationshipType": "related terms",
"words": [
"ajuda",
"ajudante"
]
}
]
}
}
]
English to Latin
$ ./translate.sh en la english latin body | jq '.'
[
{
"en": {
"pos": "noun",
"text": [
"body (countable and uncountable, plural bodies)",
"Physical frame.",
"Main section.",
"Coherent group.",
"Material entity.",
"(printing) The shank of a type, or the depth of the shank (by which the size is indicated).",
"(geometry) A three-dimensional object, such as a cube or cone."
],
"examples": "I saw them walking from a distance, their bodies strangely angular in the dawn light.",
"related": [
{
"relationshipType": "synonyms",
"words": [
"See also Thesaurus:body",
"See also Thesaurus:corpse"
]
}
]
},
"la": {
"pos": "noun",
"text": [
"cadāver n (genitive cadāveris); third declension",
"A corpse, cadaver, carcass"
],
"examples": [],
"related": []
}
}
]
When it doesn't work
Sometimes there is no output at all.
Shortcomings of this approach, and going further
Although a lot of words and synonyms are present on Wiktionary, the synonyms are not always inside the 'related' field; sometimes they are in the 'text' field, which gives the word senses. I suspect that the partial information wiktionaryparser provides mirrors what is on the Wiktionary site.
One could use any dictionary tool, or online thesaurus, such as wordnet, to first get possible POS tags and a word's synsets, or query a fasttext model to get a word's nearest neighbors, then filter only words that are nearest neighbors from the 'text' field in wiktionary.

Is there any easy way / API to find out the number of pipelines on a gocd server?

Sorry for the brief question, but just wondering if there's an API to find out the number of pipelines on a GoCD server.
The Pipeline Groups API will give you what you need after some JSON parsing.
$ curl 'https://ci.example.com/go/api/config/pipeline_groups' \
-u 'username:password'
Returns:
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
[
{
"pipelines": [
{
"stages": [
{
"name": "up42_stage"
}
],
"name": "up42",
"materials": [
{
"description": "URL: https://github.com/gocd/gocd, Branch: master",
"fingerprint": "2d05446cd52a998fe3afd840fc2c46b7c7e421051f0209c7f619c95bedc28b88",
"type": "Git"
}
],
"label": "${COUNT}"
}
],
"name": "first"
}
]
You can grab the config.xml file, either from the config repo or via HTTP, and parse it.
As an alternative, you can just get the cctray file from your server at http://yourgoserver/go/cctray.xml and parse it.
It contains information about all the pipelines (including their stages).
I would recommend using yagocd:
from yagocd import Yagocd
go = Yagocd(server='https://build.gocd.io')
# login as guest
go._session.get('https://build.gocd.io/go/plugin/interact/gocd.guest.user.auth.plugin/index')
print(len(list(go.pipelines)))
Yes, of course. You can get the desired output in different ways. The easiest is the GoCD support URL (https://example.com/go/api/support), which returns the number of pipelines along with other statistical information, but it requires admin privileges.
If the user does not have admin privileges, use the GoCD pipeline_groups API instead. The command below should give you the exact result with jq (a JSON processor):
$ curl 'https://example.com/go/api/config/pipeline_groups' -u 'username:password' | jq -r '.[] | .pipelines[].name' | wc -l
NOTE: Even so, only Go Administrator users will get the actual number of pipelines.
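The counting done by jq and wc -l above can also be sketched in Python once the pipeline_groups response has been parsed. The sample below is a trimmed copy of the response shown earlier in this thread, and the function name count_pipelines is made up for illustration; fetching the JSON is left out.

```python
# Trimmed copy of the pipeline_groups response shown earlier.
PIPELINE_GROUPS = [
    {
        "name": "first",
        "pipelines": [
            {"name": "up42", "stages": [{"name": "up42_stage"}]},
        ],
    }
]


def count_pipelines(pipeline_groups):
    """Return the total number of pipelines across all pipeline groups."""
    # Groups without a "pipelines" key are counted as empty.
    return sum(len(group.get("pipelines", [])) for group in pipeline_groups)


if __name__ == "__main__":
    print(count_pipelines(PIPELINE_GROUPS))
```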