Ansible regex_search/regex_findall - regex

I'm trying to parse the output of a command that returned a line like this (there's more output, but this is the line that I'm after):
Remaining Time: 3 Minutes and 12 Seconds
And when there is no time left it returns a line like this:
Remaining Time: 0 Seconds
I'd like to extract the number of minutes and seconds, so I can feed it to GNU date -d. First I tried this:
- name: determine how much time we have left
  set_fact:
    time_left: "{{ cmd_output.stdout | regex_search(time_left_regex, '\\1', '\\2') | join(' ') }}"
  vars:
    time_left_regex: 'Remaining Time: ([0-9]+ Minutes) and ([0-9]+ Seconds)'
But this does not handle the case when there is no time left. So I then tried something like this:
- name: determine how much time we have left
  set_fact:
    time_left: "{{ cmd_output.stdout | regex_findall(time_left_regex, '\\1') }}"
  vars:
    time_left_regex: 'Next Execution:.*([0-9]{1,2} (Minutes|Seconds))'
But this only returns something like:
ok: [localhost] => {
    "msg": "time left: [[u'2 Seconds', u'Seconds']]"
}
I think I'm on the right track but I need a better regex, so maybe somebody can help me out here?
Thank you so much in advance.

You can make the minutes part optional. The minutes will be in group 1 and the seconds will be in group 2.
Remaining Time: (?:([0-9]+ Minutes) and )?([0-9]+ Seconds)
Regex demo
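To check the pattern outside Ansible, here is a minimal Python sketch that applies the same regex to both sample lines. The function name time_left is only illustrative, and joining the groups with a space mirrors the join(' ') from the question; this is a sketch, not the Ansible filter itself.

```python
import re

# Same pattern as above: the "N Minutes and " part sits in an optional non-capturing group.
TIME_LEFT = re.compile(r"Remaining Time: (?:([0-9]+ Minutes) and )?([0-9]+ Seconds)")


def time_left(output):
    """Return e.g. '3 Minutes 12 Seconds', or None if no line matches (hypothetical helper)."""
    match = TIME_LEFT.search(output)
    if match is None:
        return None
    # Group 1 is None when the minutes part is absent, so keep only the groups that matched.
    return " ".join(part for part in match.groups() if part)


print(time_left("Remaining Time: 3 Minutes and 12 Seconds"))
print(time_left("Remaining Time: 0 Seconds"))
```

The resulting string is in a form that could be passed on to date -d, as the question intends.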

It's possible to split the string (line) and combine a dictionary. For example
- set_fact:
    time_left: "{{ time_left|default({})|
                   combine({myline[item]: myline[item+1]}) }}"
  loop: "{{ range(0, myline|length + 1, 3)|list }}"
  vars:
    myline: "{{ cmd_output.stdout.split(':').1.split()|reverse|list }}"
- debug:
    var: time_left
for various command outputs
cmd_output.stdout: 'Remaining Time: 3 Minutes and 12 Seconds'
cmd_output.stdout: 'Remaining Time: 0 Seconds'
cmd_output.stdout: 'Remaining Time: 2 Days and 7 Hours and 3 Minutes and 12 Seconds'
gives (respectively)
"time_left": {
    "Minutes": "3",
    "Seconds": "12"
}
"time_left": {
    "Seconds": "0"
}
"time_left": {
    "Days": "2",
    "Hours": "7",
    "Minutes": "3",
    "Seconds": "12"
}
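The split-and-combine idea can be sketched in plain Python to see why stepping by 3 works. After reversing, the tokens read unit, value, "and", unit, value, and so on. The function name parse_remaining is an assumption made for this illustration.

```python
def parse_remaining(line):
    """Map each unit to its value, e.g. {'Seconds': '12', 'Minutes': '3'} (illustrative helper)."""
    # Take the text after the first colon, split on whitespace, and reverse it,
    # so the tokens read: unit, value, 'and', unit, value, 'and', ...
    tokens = line.split(":")[1].split()[::-1]
    # Every third token starting at 0 is a unit; the token right after it is its value.
    return {tokens[i]: tokens[i + 1] for i in range(0, len(tokens), 3)}


print(parse_remaining("Remaining Time: 3 Minutes and 12 Seconds"))
print(parse_remaining("Remaining Time: 0 Seconds"))
print(parse_remaining("Remaining Time: 2 Days and 7 Hours and 3 Minutes and 12 Seconds"))
```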

Related

Need to generate a list based on one item name and number of items

I have two variables:
name: "abc232323defg10"
cycle: "4"
I need to generate a list:
list:
- abc232323defg10
- abc232323defg9
- abc232323defg8
- abc232323defg7
where:
abc232323defg9 = abc232323defg(10-(cycle-3)),
abc232323defg8 = abc232323defg(10-(cycle-2)),
abc232323defg7 = abc232323defg(10-(cycle-1))
The variable "cycle" is the same as the number of items in the list, and I already have an item whose last 2 characters are the "largest number" (that is number 10 in the example). The other items should have their last two characters counted down from this "largest number" (decreasing by one for each cycle). Cycle is never bigger than the "largest number", but can be equal.
Order in the list is not relevant.
PS last two characters can be any number or even combination of one letter (a-z,A-Z) and one number. So it can be t1, or e9, or 88... that is why I think I need regex.
Any idea?
Given the variables
_name: "abc232323defg10"
cycle: "4"
Declare the variables prefix and index by splitting _name
prefix: "{{ _name|regex_replace('^(.+?)(\\d+)$', '\\1') }}"
index: "{{ _name|regex_replace('^(.+?)(\\d+)$', '\\2') }}"
give
prefix: abc232323defg
index: 10
Declare the list indexes
indexes: "{{ query('sequence', params) }}"
params: "start={{ index|int }} end={{ index|int - cycle|int + 1 }} stride=-1"
give
indexes: ['10', '9', '8', '7']
Finally, create product of the prefix and indexes, and join them
names: "{{ [prefix]|product(indexes)|map('join')|list }}"
gives the expected result
names:
- abc232323defg10
- abc232323defg9
- abc232323defg8
- abc232323defg7
Example of a complete playbook for testing
- hosts: localhost
  vars:
    _name: "abc232323defg10"
    cycle: "4"
    prefix: "{{ _name|regex_replace('^(.+?)(\\d+)$', '\\1') }}"
    index: "{{ _name|regex_replace('^(.+?)(\\d+)$', '\\2') }}"
    indexes: "{{ query('sequence', params) }}"
    params: "start={{ index|int }} end={{ index|int - cycle|int + 1 }} stride=-1"
    names: "{{ [prefix]|product(indexes)|map('join')|list }}"
  tasks:
    - debug:
        msg: |
          prefix: {{ prefix }}
          index: {{ index }}
          indexes: {{ indexes }}
          names:
            {{ names|to_nice_yaml|indent(2) }}
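For comparison, the same prefix/index split and countdown can be sketched in Python. The regex is the one used in the playbook above; the function name generate_names and the ValueError for names without trailing digits are assumptions made for this sketch.

```python
import re


def generate_names(name, cycle):
    """Return `cycle` names counting down from the trailing number of `name` (illustrative)."""
    # Lazy prefix plus trailing digits, the same split as the regex_replace calls above.
    match = re.match(r"^(.+?)(\d+)$", name)
    if match is None:
        raise ValueError("name does not end in digits: " + name)
    prefix, index = match.group(1), int(match.group(2))
    # Count down from the largest number, one step per cycle.
    return [f"{prefix}{i}" for i in range(index, index - cycle, -1)]


print(generate_names("abc232323defg10", 4))
```

With a suffix such as t1, only the trailing digit is treated as the index and the letter stays in the prefix.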

How to search and match pattern to get a value in ansible

My variable info has the value below (the actual data is much larger).
I am trying to search for the specific word XYZ_data_001 and get the size information, which comes after the pattern physical disk:
XYZ_data_001 file system device, special, dsync off, directio on, physical disk, 16384.00 MB, Free: 0.00 MB 2 0 6 0 8388607
XYZ_data_002 file system device, special, dsync off, directio on, physical disk, 16384.00 MB, Free: 0.00 MB 2 0 13 0 8388607
Here is what I tried:
- name: Print size
  ansible.builtin.debug:
    msg: "{{ info | regex_search('XYZ_data_001(.+)') | split('physical disk,') | last }}"
This gives me the output below:
ok: [testhost] => {
    "msg": " 16384.00 MB, Free: 0.00 MB 2 0 6 0 8388607 "
}
Thanks in advance
You can use
{{ info | regex_search('XYZ_data_001\\b.*physical disk,\\s*(\\d[\\d.]*)', '\\1') }}
Details:
XYZ_data_001 - a XYZ_data_001 string
\b - a word boundary
.* - any text (any zero or more chars other than line break chars as many as possible)
physical disk, - a literal string
\s* - zero or more whitespaces
(\d[\d.]*) - Group 1 (\1): a digit and then zero or more digits or dots.
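The pattern can be tried in Python against the two sample lines from the question. In this sketch the device name is passed in and escaped, which is an addition for illustration; the rest of the pattern is unchanged, and disk_size is a hypothetical helper name.

```python
import re

INFO = """\
XYZ_data_001 file system device, special, dsync off, directio on, physical disk, 16384.00 MB, Free: 0.00 MB 2 0 6 0 8388607
XYZ_data_002 file system device, special, dsync off, directio on, physical disk, 16384.00 MB, Free: 0.00 MB 2 0 13 0 8388607
"""


def disk_size(info, device):
    """Return the number after 'physical disk,' on the device's line, or None (illustrative)."""
    # '.' does not match newlines by default, so '.*' stays on the device's own line.
    pattern = re.escape(device) + r"\b.*physical disk,\s*(\d[\d.]*)"
    match = re.search(pattern, info)
    return match.group(1) if match else None


print(disk_size(INFO, "XYZ_data_001"))
```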
There are two filters in the community.general collection that will help you create dictionaries from the info.
Split the lines, split and trim the items, and use the filter community.general.dict to create the list of dictionaries
info_dict1: "{{ info.splitlines()|
                map('split', ',')|
                map('map', 'trim')|
                map('zip', ['dev', 'spec', 'dsync', 'dir', 'disk', 'size', 'free'])|
                map('map', 'reverse')|
                map('community.general.dict') }}"
gives
info_dict1:
  - dev: XYZ_data_001 file system device
    dir: directio on
    disk: physical disk
    dsync: dsync off
    free: 'Free: 0.00 MB 2 0 6 0 8388607'
    size: 16384.00 MB
    spec: special
  - dev: XYZ_data_002 file system device
    dir: directio on
    disk: physical disk
    dsync: dsync off
    free: 'Free: 0.00 MB 2 0 13 0 8388607'
    size: 16384.00 MB
    spec: special
Split the attribute dev and use the filter community.general.dict_kv to create the list of dictionaries with the attribute device
info_dev: "{{ info_dict1|
              map(attribute='dev')|
              map('split')|
              map('first')|
              map('community.general.dict_kv', 'device') }}"
gives
info_dev:
- device: XYZ_data_001
- device: XYZ_data_002
Combine the dictionaries
info_dict2: "{{ info_dict1|zip(info_dev)|map('combine') }}"
gives
info_dict2:
  - dev: XYZ_data_001 file system device
    device: XYZ_data_001
    dir: directio on
    disk: physical disk
    dsync: dsync off
    free: 'Free: 0.00 MB 2 0 6 0 8388607'
    size: 16384.00 MB
    spec: special
  - dev: XYZ_data_002 file system device
    device: XYZ_data_002
    dir: directio on
    disk: physical disk
    dsync: dsync off
    free: 'Free: 0.00 MB 2 0 13 0 8388607'
    size: 16384.00 MB
    spec: special
This way you can add other attributes if needed.
Q: "Search for specific word XYZ_data_001 and get the size."
A: Create a dictionary device_size
device_size: "{{ info_dict2|items2dict(key_name='device', value_name='size') }}"
gives
device_size:
  XYZ_data_001: 16384.00 MB
  XYZ_data_002: 16384.00 MB
Search the dictionary
- debug:
    msg: "Size of XYZ_data_001 is {{ device_size.XYZ_data_001 }}"
gives
msg: Size of XYZ_data_001 is 16384.00 MB
Example of a complete playbook for testing
- hosts: localhost
  vars:
    info: |
      XYZ_data_001 file system device, special, dsync off, directio on, physical disk, 16384.00 MB, Free: 0.00 MB 2 0 6 0 8388607
      XYZ_data_002 file system device, special, dsync off, directio on, physical disk, 16384.00 MB, Free: 0.00 MB 2 0 13 0 8388607
    info_dict1: "{{ info.splitlines()|
                    map('split', ',')|
                    map('map', 'trim')|
                    map('zip', ['dev', 'spec', 'dsync', 'dir', 'disk', 'size', 'free'])|
                    map('map', 'reverse')|
                    map('community.general.dict') }}"
    info_dev: "{{ info_dict1|
                  map(attribute='dev')|
                  map('split')|
                  map('first')|
                  map('community.general.dict_kv', 'device') }}"
    info_dict2: "{{ info_dict1|zip(info_dev)|map('combine') }}"
    device_size: "{{ info_dict2|items2dict(key_name='device', value_name='size') }}"
  tasks:
    - debug:
        var: info_dict1
    - debug:
        var: info_dev
    - debug:
        var: info_dict2
    - debug:
        var: device_size
    - debug:
        msg: "Size of XYZ_data_001 is {{ device_size.XYZ_data_001 }}"
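As a rough cross-check of the data flow (split lines, split and trim items, zip with field names, add a device key, then build a device-to-size dictionary), here is a Python sketch. The names FIELDS and parse_info are assumptions for this illustration and are not part of any Ansible collection.

```python
INFO = """\
XYZ_data_001 file system device, special, dsync off, directio on, physical disk, 16384.00 MB, Free: 0.00 MB 2 0 6 0 8388607
XYZ_data_002 file system device, special, dsync off, directio on, physical disk, 16384.00 MB, Free: 0.00 MB 2 0 13 0 8388607
"""

# Same field names as in the zip() step of the playbook.
FIELDS = ["dev", "spec", "dsync", "dir", "disk", "size", "free"]


def parse_info(info):
    """Turn each non-empty line into a dict keyed by FIELDS plus 'device' (illustrative)."""
    records = []
    for line in info.splitlines():
        if not line.strip():
            continue
        values = [item.strip() for item in line.split(",")]
        record = dict(zip(FIELDS, values))
        # The device name is the first word of the 'dev' field.
        record["device"] = record["dev"].split()[0]
        records.append(record)
    return records


device_size = {record["device"]: record["size"] for record in parse_info(INFO)}
print("Size of XYZ_data_001 is", device_size["XYZ_data_001"])
```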

How do I extract transcript with multiple speakers from Google Video Intelligence API Speech Transcription JSON output using jq?

I'm testing out Google Video Intelligence speech-to-text for transcribing podcast episodes with multiple speakers.
I've extracted an example and published that to a gist: output.json.
cat file.json | jq '.response.annotationResults[].speechTranscriptions[].alternatives[] | {startTime: .words[0].startTime, segment: .transcript }'
The command above prints the startTime of each segment, along with the segment itself (jq-output.json):
{
  "time": "6.400s",
  "segment": "Hi, my name is Melinda Smith from Noble works. ...snip"
}
{
  "time": "30s",
  "segment": " Any Graham as a tool for personal and organizational ...snip"
}
What I'm aiming for is to have the speakerTag for each segment included in my jq output.
This is where I'm stuck... to start, each array within .alternatives[] contains .transcript, a string containing that segment, .confidence, and .words[], an array with each word of that segment and the time it was spoken.
That part of the JSON is how I get the first part of the output. Then, after it's gone through each segment of the transcript, at the bottom, it has one last .alternatives[] array, containing (again) each word from the entire transcript, one at a time, along with its startTime, endTime, and speakerTag.
Here's a simplified example of what I mean:
speechTranscriptions:
  alternatives:
    transcript: "Example transcript segment"
    words:
      word: "Example"; startTime: 0s;
      word: "transcript"; startTime: 1s;
      word: "segment"; startTime: 2s;
  alternatives:
    transcript: "Another transcript segment"
    words:
      word: "Another"; startTime: 3s;
      word: "transcript"; startTime: 4s;
      word: "segment"; startTime: 5s;
  alternatives:
    words:
      word: "Example"; startTime: 0s; speakerTag: 1;
      word: "transcript"; startTime: 1s; speakerTag: 1;
      word: "segment"; startTime: 2s; speakerTag: 1;
      word: "Another"; startTime: 3s; speakerTag: 2;
      word: "transcript"; startTime: 4s; speakerTag: 2;
      word: "segment"; startTime: 5s; speakerTag: 2;
What I was thinking is to somehow go through the jq-output.json, and match each startTime with its corresponding speakerTag found in the original Video Intelligence API output.
.response.annotationResults[].speechTranscriptions[].alternatives[] | ( if .words[].speakerTag then {time: .words[].startTime, speaker: .words[].speakerTag} else empty end)
I tried a few variations of this, with the idea to print out only start-time and speakerTag, then match the values in my next step. My problem was not understanding how to only print the startTime if it has a corresponding speakerTag.
As mentioned in the comments, it would be preferable to generate this result in one command, but I was just trying to break the problem down into parts I could attempt to understand.
Q: "My problem was not understanding how to only print the startTime if it has a corresponding speakerTag."
This could be accomplished using the filter:
.response.annotationResults[].speechTranscriptions[].alternatives[].words[]
| select(.speakerTag)
| {time: .startTime, speaker: .speakerTag}
So perhaps the following is a solution (or at least close to a solution) to the main problem:
.response.annotationResults[].speechTranscriptions[].alternatives[]
| (INDEX(.words[] | select(.speakerTag); .startTime) | map_values(.speakerTag)) as $dict
| {startTime: .words[0].startTime, segment: .transcript}
| . + {speaker: $dict[.startTime]}
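Since the question describes the speaker tags as living in a separate, final alternative, one possible variation is to build the lookup once across all alternatives and then annotate each segment. The Python sketch below works under that assumption. It also assumes that segment words carry no speakerTag and that the startTime strings match exactly between the two places. The SAMPLE data is made up from the simplified example in the question, and segments_with_speakers is a hypothetical name.

```python
def segments_with_speakers(data):
    """Return one dict per transcript segment with startTime, segment, and speaker."""
    alternatives = [
        alt
        for result in data["response"]["annotationResults"]
        for transcription in result.get("speechTranscriptions", [])
        for alt in transcription.get("alternatives", [])
    ]
    # Lookup built once from every word that carries a speakerTag.
    speaker_at = {
        word["startTime"]: word["speakerTag"]
        for alt in alternatives
        for word in alt.get("words", [])
        if "speakerTag" in word
    }
    segments = []
    for alt in alternatives:
        words = alt.get("words", [])
        # Assumption: segment alternatives are the ones whose words lack speakerTag.
        if not words or "speakerTag" in words[0]:
            continue
        start = words[0]["startTime"]
        segments.append({
            "startTime": start,
            "segment": alt.get("transcript", ""),
            "speaker": speaker_at.get(start),  # None if no tagged word starts at this time
        })
    return segments


# Made-up data shaped like the simplified example in the question.
SAMPLE = {"response": {"annotationResults": [{"speechTranscriptions": [
    {"alternatives": [{"transcript": "Example transcript segment",
                       "words": [{"word": "Example", "startTime": "0s"}]}]},
    {"alternatives": [{"transcript": "Another transcript segment",
                       "words": [{"word": "Another", "startTime": "3s"}]}]},
    {"alternatives": [{"words": [
        {"word": "Example", "startTime": "0s", "speakerTag": 1},
        {"word": "Another", "startTime": "3s", "speakerTag": 2}]}]},
]}]}}

for segment in segments_with_speakers(SAMPLE):
    print(segment)
```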

Limit ansible playbook task concurrency

I'm updating several hosts with ansible at the same time, however I have a limitation...
I have to download artifacts from a common repository with no more than 3 simultaneous downloads!
The current solution I have is to limit the whole playbook to max three concurrent tasks
strategy: linear
serial: 3
Is it possible to limit concurrency only for a particular task rather than the whole playbook?
There's no direct way. Only workarounds like run_once loop with delegate_to or multiplying the task with loop and executing only one item per host.
See issue #12170, which is closed with "won't fix" status for details.
delegate_to loop:
- mytask: ..
  delegate_to: "{{item}}"
  run_once: true
  # many diff ways to make the loop
  with_inventory_hostnames: all
multiplied task:
- name: target task
  debug: msg="Performing task on {{ inventory_hostname }}, item is {{ item }}"
  with_items: "{{ play_hosts }}"
  when: "hostvars[item].inventory_hostname == inventory_hostname"
Yes, it's possible to only limit the concurrency of a certain task.
You just need to add the throttle keyword to your download task.
Example:
- name: Download payload.tar.gz
  get_url:
    url: https://example.com/path/payload.tar.gz
    dest: /mnt/scratch/payload.tar.gz
    mode: '0640'
  throttle: 3
Note that throttle was introduced in Ansible 2.9.
Thanks to the previous comments and #12170 I have come up with my proposal, which still is not working as desired for the download case.
Note that 2 is the maximum number of concurrent task executions desired.
- name: Download at ratio three at most
  win_get_url:
    url: http://ipv4.download.thinkbroadband.com/100MB.zip
    dest: c:/ansible/100MB.zip
    force: yes
  with_sequence: start=0 end={{ (( play_hosts | length ) / 2 ) | round (0, 'floor') | int }}
  when: "(( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) == (item | int)"
Although the when condition matches only certain hosts on each iteration, I can still see all the servers performing the download at the same time.
Another way of testing it is to debug a message and add a delay between iterations. This way it is clear that only two hosts are executed at each iteration.
- debug:
    msg: "Item {{ item }} with modulus {{ (( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) }}"
  with_sequence: start=0 end={{ (( play_hosts | length ) / 2 ) | round (0, 'floor') | int }}
  loop_control:
    pause: 2
  when: "(( ansible_play_batch.index(inventory_hostname) % (( play_hosts | length ) / 2 )) | round) == (item | int)"
This seems like a case of the XY problem.
Why not download the files once to your controller, and then use the copy task to fan out from the controller to each host?
(I guess bandwidth concerns between your controller and hosts may cause issues for large files, but it's probably not going to be much different to limiting the download to 3 hosts anyway.)
You can override the forks setting in ansible.cfg. The default value is 5.
ansible.cfg
[defaults]
forks = 3

Sorting logs using regex?

I'm trying to figure out how to sort logs, for example:
User: test
Level: user
Domain: localhost
Time: 12pm
Blah: INFO
Date: 07-12-2016
Ip: 127.0.0.1
I would like the output text to look like this (there are also tab characters in the logs):
User:Level:Domain:Time:Blah:Date:IP
If I understand your question correctly, you're talking not about sorting, but about parsing. You have log strings which you want to convert to another format. The regex to match your log string would be
(?P<User>[^:]+):(?P<Level>[^:]+):(?P<Domain>[^:]+):(?P<Time>[^:]+):(?P<Blah>[^:]+):(?P<Date>[^:]+):(?P<IP>[^:]+)
However, since you have so many groups, the pattern can be built much more compactly. Here's an example in Python:
import re
logString = "User:Level:Domain:Time:Blah:Date:IP"
logGroups = ["User", "Level", "Domain", "Time", "Blah", "Date", "IP"]
reLogGroups = "(?P<"+">[^:]+):(?P<".join(logGroups)+">[^:]+)"
matchLogGroups = re.search(reLogGroups,logString)
if matchLogGroups:
    counter = 1
    for logGroup in logGroups:
        print(str(counter)+". " + logGroup + ": " + matchLogGroups.group(logGroup) + "\n")
        counter += 1
The output is
1. User: User
2. Level: Level
3. Domain: Domain
4. Time: Time
5. Blah: Blah
6. Date: Date
7. IP: IP
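If the input is the multi-line "Key: value" record shown in the question, one possible reading of the request is to collect the values and join them with colons. The sketch below assumes that reading. The helper name parse_record and the choice to strip surrounding whitespace (including tabs) are assumptions, not something the question confirms.

```python
LOG = """\
User: test
Level: user
Domain: localhost
Time: 12pm
Blah: INFO
Date: 07-12-2016
Ip: 127.0.0.1
"""


def parse_record(text):
    """Collect 'Key: value' lines into a dict, preserving line order (illustrative)."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may themselves contain colons.
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    return record


record = parse_record(LOG)
print(":".join(record.values()))
```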