Why can I only download the first episode of a Bilibili series with youtube-dl? - youtube-dl

I can download the first episode of a series.
youtube-dl https://www.bilibili.com/video/av90163846?p=1
Now I want to download all episodes of the series.
for i in $(seq 1 55)
do
youtube-dl https://www.bilibili.com/video/av90163846?p=$i
done
None of the episodes after the first can be downloaded; every attempt prints the same output:
[BiliBili] 90163846: Downloading webpage
[BiliBili] 90163846: Downloading video info page
[download] 【合集300集全】地道美音 美国中小学教学 自然科学 社会常识-90163846.flv has already been downloaded
Could you try it and see what happens? How can I fix this?
@Christos Lytras, something strange happens with your code:
for i in $(seq 1 55)
do
youtube-dl https://www.bilibili.com/video/av90163846?p=$i -o "%(title)s-%(id)s-$i.%(ext)s"
done
It does download videos from Bilibili, but all the downloaded files have different names and the same content: every file is identical to the first episode. Try it and you will see.

This error occurs because the output filename does not depend on the ?p= parameter. youtube-dl's default output template is %(title)s-%(id)s.%(ext)s, and the title and id are the same for every part, so each subsequent download resolves to the same filename as the previous one and is skipped because a file with that name already exists. The solution is to use the --output (-o) filesystem option to set a filename template that includes an index taken from the loop variable i.
Filesystem Options:
-o, --output TEMPLATE    Output filename template, see the "OUTPUT TEMPLATE" for all the info
OUTPUT TEMPLATE
The -o option allows users to indicate a template for the output file names.
The basic usage is not to set any template arguments when downloading a single file, like in youtube-dl -o funny_video.flv "https://some/video". However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to python string formatting operations. For example, %(NAME)s or %(NAME)05d. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations.
Allowed names along with sequence type are:
id (string): Video identifier
title (string): Video title
url (string): Video URL
ext (string): Video filename extension
...
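Since the template is expanded with Python's %-style formatting against a dict of video metadata, the two forms above can be previewed in plain Python. This is a minimal sketch: the playlist_index key and all values are placeholders chosen for illustration, not what the Bilibili extractor actually returns.

```python
# Placeholder metadata; the real dict is filled in by the site extractor.
info = {"id": "90163846", "title": "Example title", "ext": "flv", "playlist_index": 7}

# %(NAME)s inserts the value as a string.
default_name = "%(title)s-%(id)s.%(ext)s" % info
# %(NAME)05d formats an integer zero-padded to width 5.
padded_name = "%(playlist_index)05d-%(title)s.%(ext)s" % info

print(default_name)  # Example title-90163846.flv
print(padded_name)   # 00007-Example title.flv
```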
In your case, to include the loop variable i in the output filename, you can use something like this:
for i in $(seq 1 55)
do
youtube-dl "https://www.bilibili.com/video/av90163846?p=$i" -o "%(title)s-%(id)s-$i.%(ext)s"
done
This builds the filename from the title, the id, the loop variable i as an index, and ext for the video extension.
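A minimal sketch of why the original loop kept reporting "has already been downloaded": assuming, as the log in the question suggests, that every ?p= URL yields the same id and title (the title below is a placeholder), the default template maps all 55 iterations to one filename, while appending the shell's $i makes the names distinct.

```python
# Assumption: each ?p=N request reports the same metadata (placeholder title).
info = {"id": "90163846", "title": "Example title", "ext": "flv"}

DEFAULT_TEMPLATE = "%(title)s-%(id)s.%(ext)s"  # youtube-dl's default output template

# Without an index, every episode maps to the same filename.
default_names = {DEFAULT_TEMPLATE % info for i in range(1, 56)}

# The shell expands $i before youtube-dl sees the template, so each
# iteration passes a template with a different literal suffix.
indexed_names = {("%(title)s-%(id)s-" + str(i) + ".%(ext)s") % info for i in range(1, 56)}

print(len(default_names))  # 1
print(len(indexed_names))  # 55
```

Distinct filenames do not guarantee distinct content, though: if the extractor ignores ?p= and always returns the first part, every file will still contain episode 1, which is what the follow-up comment in the question observed.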
You can check the Output Template variables for more options defining the filename.
UPDATE
Apparently, bilibili.com uses some JavaScript to set up the video player and fetch the video files, and there seems to be no way to extract the whole playlist using youtube-dl. I suggest you use Lux, which supports Bilibili playlists out of the box. It has installers for all major operating systems, and you can use it like this to download the whole playlist:
lux -p https://www.bilibili.com/video/av90163846
or, if you only want to download up to video 55, you can use the -end 55 CLI option like this:
lux -end 55 -p https://www.bilibili.com/video/av90163846
You can use the -start, -end or -items option to specify the download range of the list:
-start
Playlist video to start at (default 1)
-end
Playlist video to end at
-items
Playlist video items to download. Separated by commas like: 1,5,6,8-10
For bilibili playlists only:
-eto
File name of each bilibili episode doesn't include the playlist title
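The -items syntax (for example 1,5,6,8-10) can be illustrated with a small, hypothetical parser. This is not Lux's own code (Lux is written in Go); it simply assumes that a-b ranges are inclusive on both ends and that duplicates are ignored.

```python
def parse_items(spec):
    """Expand an items spec such as "1,5,6,8-10" into a sorted list of indices.

    Hypothetical helper: assumes inclusive ranges and 1-based indices.
    """
    result = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError("bad range: " + part)
            result.update(range(start, end + 1))
        else:
            result.add(int(part))
    return sorted(result)


print(parse_items("1,5,6,8-10"))  # [1, 5, 6, 8, 9, 10]
```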
If you only want to get information about a playlist without downloading any files, use the -i command-line option like this:
lux -i -p https://www.bilibili.com/video/av90163846
This will output something like this:
Site: 哔哩哔哩 bilibili.com
Title: 【合集300集全】地道美音 美国中小学教学 自然科学 社会常识 P1 【001】Parts of Plants
Type: video
Streams: # All available quality
[64] -------------------
Quality: 高清 720P
Size: 308.24 MiB (323215935 Bytes)
# download with: lux -f 64 ...
[32] -------------------
Quality: 清晰 480P
Size: 201.57 MiB (211361230 Bytes)
# download with: lux -f 32 ...
[16] -------------------
Quality: 流畅 360P
Size: 124.75 MiB (130809508 Bytes)
# download with: lux -f 16 ...
Site: 哔哩哔哩 bilibili.com
Title: 【合集300集全】地道美音 美国中小学教学 自然科学 社会常识 P2 【002】Life Cycle of a Plant
Type: video
Streams: # All available quality
[64] -------------------
Quality: 高清 720P
Size: 227.75 MiB (238809781 Bytes)
# download with: lux -f 64 ...
[32] -------------------
Quality: 清晰 480P
Size: 148.96 MiB (156191413 Bytes)
# download with: lux -f 32 ...
[16] -------------------
Quality: 流畅 360P
Size: 94.82 MiB (99425641 Bytes)
# download with: lux -f 16 ...


Name input/output files in Snakemake according to a variable (not a wildcard) in config.yaml

I am trying to edit and run a Snakemake pipeline. In a nutshell, the pipeline calls a default genome aligner (minimap) and produces output files named after it. I am trying to add a variable, aligner, to config.yaml to specify the aligner I want to call. Also (and this is where I am actually stuck), the output files should be named after the aligner specified in config.yaml.
My config.yaml looks like this:
# this config.yaml is passed to Snakefile in pipeline-structural-variation subfolder.
# Snakemake is run from this pipeline-structural-variation folder; it is necessary to
# pass an appropriate path to the input-files (the ../ prefix is sufficient for this demo)
aligner: "ngmlr" # THIS IS THE VARIABLE I AM ADDING TO THIS FILE. VALUES COULD BE minimap or ngmlr
# FASTQ file or folder containing FASTQ files
# check if this has to be gzipped
input_fastq: "/nexusb/Gridion/20190917PGD2staal2/PD170815/PD170815_cat_all.fastq.gz" # original is ../RawData/GM24385_nf7_chr20_af.fastq.gz
# FASTA file containing the reference genome
# note that the original reference sequence contains only the sequence of chr20
reference_fasta: "/nexus/bhinckel/19/ONT_projects/PGD_breakpoint/ref_hg19_local/hg19_chr1-y.fasta" # original is ../ReferenceData/human_g1k_v37_chr20_50M.fasta
# Minimum SV length
min_sv_length: 300000 # original value was 40
# Maximum SV length
max_sv_length: 1000000 # original value was 1000000. Note that the value I used to run the pipeline for the sample PD170677 was 100000000000, which will be coerced to NA in the R script (/home/bhinckel/ont_tutorial_sv/ont_tutorial_sv.R)
# Min read length. Shorter reads will be discarded
min_read_length: 1000
# Min mapping quality. Reads with lower mapping quality will be discarded
min_read_mapping_quality: 20
# Minimum read support required to call a SV (auto for auto-detect)
min_read_support: 'auto'
# Sample name
sample_name: "PD170815" # original value was GM24385.nf7.chr20_af. Note that this can be a list
Below are the sections of my Snakefile that generate output files with the suffix _minimap2.bam, which I would like to replace with either _minimap2.bam or _ngmlr.bam, depending on aligner in config.yaml:
# INPUT BAM folder
bam = None
if "bam" in config:
    bam = os.path.join(CONFDIR, config["bam"])
# INPUT FASTQ folder
FQ_INPUT_DIRECTORY = []
if not bam:
    if not "input_fastq" in config:
        print("\"input_fastq\" not specified in config file. Exiting...")
    FQ_INPUT_DIRECTORY = os.path.join(CONFDIR, config["input_fastq"])
    if not os.path.exists(FQ_INPUT_DIRECTORY):
        print("Could not find {}".format(FQ_INPUT_DIRECTORY))
    MAPPED_BAM = "{sample}/alignment/{sample}_minimap2.bam"  # Original
    #MAPPED_BAM = "{sample}/alignment/{sample}_{alignerName}.bam"  # this did not work
    #MAPPED_BAM = f"{sample}/alignment/{sample}_{config['aligner']}.bam"  # this did not work either
else:
    MAPPED_BAM = find_file_in_folder(bam, "*.bam", single=True)
...
if config['aligner'] == 'minimap':
    rule index_minimap2:
        input:
            REF = FA_REF
        output:
            "{sample}/index/minimap2.idx"
        threads: config['threads']
        conda: "env.yml"
        shell:
            "minimap2 -t {threads} -ax map-ont --MD -Y {input.REF} -d {output}"
    rule map_minimap2:
        input:
            FQ = FQ_INPUT_DIRECTORY,
            IDX = rules.index_minimap2.output,
            SETUP = "init"
        output:
            BAM = "{sample}/alignment/{sample}_minimap2.bam",
            BAI = "{sample}/alignment/{sample}_minimap2.bam.bai"
        conda: "env.yml"
        threads: config["threads"]
        shell:
            "cat_fastq {input.FQ} | minimap2 -t {threads} -K 500M -ax map-ont --MD -Y {input.IDX} - | samtools sort -# {threads} -O BAM -o {output.BAM} - && samtools index -# {threads} {output.BAM}"
else:
    print(f"Aligner is {config['aligner']} - skipping indexing step for minimap2")
    rule map_ngmlr:
        input:
            REF = FA_REF,
            FQ = FQ_INPUT_DIRECTORY,
            SETUP = "init"
        output:
            BAM = "{sample}/alignment/{sample}_minimap2.bam",
            BAI = "{sample}/alignment/{sample}_minimap2.bam.bai"
        conda: "env.yml"
        threads: config["threads"]
        shell:
            "cat_fastq {input.FQ} | ngmlr -r {input.REF} -t {threads} -x ont - | samtools sort -# {threads} -O BAM -o {output.BAM} - && samtools index -# {threads} {output.BAM}"
I initially tried to create an alignerName parameter, similar to the sample parameter, as shown below:
# Parameter: sample_name
sample = "sv_sample01"
if "sample_name" in config:
    sample = config['sample_name']
###############
#
# code below created by me
#
###############
# Parameter: aligner_name
alignerName = "defaultAligner"
if "aligner" in config:
    alignerName = config['aligner']
Then I tried to use {alignerName} wherever minimap2 appears in my input/output file names (see the commented-out MAPPED_BAM definitions above), but this throws an error. I guess Snakemake interprets {alignerName} as a wildcard, whereas what I want is simply to pass the value of config['aligner'] into the input/output file names. I also tried an f-string (MAPPED_BAM = f"{sample}/alignment/{sample}_{config['aligner']}.bam"), but that did not work either.
You are close!
Snakemake interprets wildcards 'last' (when it matches requested files to rules), while an f-string is interpreted first, by Python, as soon as the Snakefile is read. To keep a literal curly brace in an f-string, escape it with another curly brace, like so:
>>> print(f"{{keep curly}}")
{keep curly}
So all you need is:
MAPPED_BAM = f"{{sample}}/alignment/{{sample}}_{config['aligner']}.bam"
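A quick way to check the escaping outside Snakemake: the sketch below uses a plain dict as a stand-in for Snakemake's config object (the ngmlr value and the sample name are assumptions taken from the question's config.yaml), and uses str.format only as an analogy for how Snakemake later fills in the {sample} wildcard.

```python
# Stand-in for the dict Snakemake builds from config.yaml (assumed value).
config = {"aligner": "ngmlr"}

# Doubled braces are emitted as literal braces, so "{sample}" survives as a
# wildcard, while config['aligner'] is substituted immediately.
MAPPED_BAM = f"{{sample}}/alignment/{{sample}}_{config['aligner']}.bam"
print(MAPPED_BAM)  # {sample}/alignment/{sample}_ngmlr.bam

# Snakemake resolves wildcards later; str.format is only an analogy here.
print(MAPPED_BAM.format(sample="PD170815"))  # PD170815/alignment/PD170815_ngmlr.bam
```

The same escaping should apply inside a rule, e.g. BAM = f"{{sample}}/alignment/{{sample}}_{config['aligner']}.bam" in the output section, so one rule per aligner can share the naming scheme.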

Another "LaTeX Error: Missing \begin{document}" in rmarkdown

I'm automating a PDF report using rmarkdown. I use a macro to run the code. I can run the code once and it works with no problems. When I call the macro again, it appears to work, but when creating the PDF I get the error "LaTeX Error: Missing \begin{document}".
This is what I get the first time:
output file: L:/Statunit/morton/NCC R markdown reports/NCC Reports/NCC_Dashboard_Report_Dave.knit.md
"C:/Program Files/RStudio/bin/pandoc/pandoc" +RTS -K512m -RTS "L:/Statunit/morton/NCC R markdown reports/NCC Reports/NCC_Dashboard_Report_Dave.utf8.md" --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output pandoc9e03c3032cf.tex --template "C:\Users\Mortond\Documents\R\win-library\3.5\rmarkdown\rmd\latex\default-1.17.0.2.tex" --highlight-style tango --latex-engine xelatex --variable graphics=yes --variable "geometry:margin=1in" --variable "compact-title:yes" --include-in-header "C:\Users\Mortond\AppData\Local\Temp\Rtmp8cWvvQ\rmarkdown-str9e022b75c22.html"
Output created: Report-254-225573.pdf
The second time, I call the same code, changing only the report name (so the data is the same), and I get:
output file: L:/Statunit/morton/NCC R markdown reports/NCC Reports/NCC_Dashboard_Report_Dave.knit.md
"C:/Program Files/RStudio/bin/pandoc/pandoc" +RTS -K512m -RTS "L:/Statunit/morton/NCC R markdown reports/NCC Reports/NCC_Dashboard_Report_Dave.utf8.md" --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output pandoc9e01f0a74c5.tex --template "C:\Users\Mortond\Documents\R\win-library\3.5\rmarkdown\rmd\latex\default-1.17.0.2.tex" --highlight-style tango --latex-engine xelatex --variable graphics=yes --variable "geometry:margin=1in" --variable "compact-title:yes"
! LaTeX Error: Missing \begin{document}.
Error: Failed to compile Report-253-225573.tex. See Report-253-225573.log for more info.
My YAML header is:
---
title: ''
header-includes:
- \usepackage{fancyhdr}
- \addtolength{\headheight}{1.0cm} % make more space for the header
- \pagestyle{fancyplain} % use fancy for all pages except chapter start
- \lhead{\includegraphics[height=1.2cm]{TJC_logo_color.png}} % left logo
- \renewcommand{\headrulewidth}{0pt} % remove rule below header
output:
  pdf_document:
    latex_engine: xelatex
  word_document: default
  html_document: default
urlcolor: blue
classoption: landscape
---
My code that calls the R Markdown file is:
render_report = function(b, h, p) {
  rmarkdown::render(
    "L:/Statunit/morton/NCC R markdown reports/NCC Dashboard Report Dave.Rmd",
    params = list(
      b1 = b,
      h1 = h,
      p1 = p
    ),
    output_file = paste0("Report-", h, "-", p, ".pdf")
  )
}
render_report(b = "xxxx Hospital, Inc.", h = '253', p = '225573')
The relevant part of the log file is:
("C:\Users\Mortond\AppData\Local\Programs\MiKTeX 2.9\tex\latex\graphics-def\xet
ex.def"
File: xetex.def 2017/06/24 v5.0h Graphics/color driver for xetex
))
\Gin#req#height=\dimen160
\Gin#req#width=\dimen161
)
("C:\Users\Mortond\AppData\Local\Programs\MiKTeX 2.9\tex\latex\oberdiek\grffile
.sty"
Package: grffile 2017/06/30 v1.18 Extended file name support for graphics (HO)
Package grffile Info: Option `multidot' is set to `true'.
Package grffile Info: Option `extendedchars' is set to `false'.
Package grffile Info: Option `space' is set to `true'.
Package grffile Info: \Gin#ii of package `graphicx' fixed on input line 494.
)
("C:\Users\Mortond\AppData\Local\Programs\MiKTeX 2.9\tex\latex\parskip\parskip.
sty"
Package: parskip 2018-08-24 v2.0a non-zero parskip adjustments
)
("C:\Users\Mortond\AppData\Local\Programs\MiKTeX 2.9\tex\latex\titling\titling.
sty"
Package: titling 2009/09/04 v2.1d maketitle typesetting
\thanksmarkwidth=\skip53
\thanksmargin=\skip54
\droptitle=\skip55
)
("C:\Users\Mortond\AppData\Local\Programs\MiKTeX 2.9\tex\latex\fancyhdr\fancyhd
r.sty"
Package: fancyhdr 2017/06/30 v3.9a Extensive control of page headers and footer
s
\f#nch#headwidth=\skip56
\f#nch#O#elh=\skip57
\f#nch#O#erh=\skip58
\f#nch#O#olh=\skip59
\f#nch#O#orh=\skip60
\f#nch#O#elf=\skip61
\f#nch#O#erf=\skip62
\f#nch#O#olf=\skip63
\f#nch#O#orf=\skip64
)
! LaTeX Error: Missing \begin{document}.
See the LaTeX manual or LaTeX Companion for explanation.
Type H <return> for immediate help.
...
l.90 \addtolength{\headheight}{1.0cm} \%
make more space for the header
Here is how much of TeX's memory you used:
22493 strings out of 427767
408844 string characters out of 3146884
530389 words of memory out of 3000000
26423 multiletter control sequences out of 15000+200000
532722 words of font info for 28 fonts, out of 3000000 for 9000
1328 hyphenation exceptions out of 8191
45i,0n,68p,816b,443s stack positions out of 5000i,500n,10000p,200000b,50000s
No pages of output.
So why does it work once and not a second time? If I exit RStudio and start it up again, it appears to work. I've tried .rs.restartR() to no avail, as well as rm(list = ls(envir = globalenv()), envir = globalenv()) and gc() to clean things up.
Any thoughts? I appreciate you reading through all this.
I'm not sure whether I had the same issue, but I found that my document compiled the first time and failed the second. I suspected a cache issue and added cache.rebuild=T to the options of the chunk that includes the parent:
<<echo=F, cache=T, message=F, warning=F, cache.rebuild=T>>=
set_parent('../../Parent.Rnw')
@
Just FYI, the parent's header not only includes the LaTeX setup but also sources my main .R file with the calculations.
Anyway, if someone experiences a similar problem, try adding cache.rebuild=T to your included script(s).

youtube-dl error: Cannot download a video and extract audio into the same file

I used the exact same youtube-dl command without the playlist option to download individual audio files, and it worked. But when I use it for this playlist, I get an error: Cannot download a video and extract audio into the same file! Use "(ext)s.%(ext)s" instead of "(ext)s" as the output template
Running on Windows 10. Any help would be greatly appreciated!
PS C:\xxx\FFMPEG> .\YouTubeBatchAudioPlaylistIndexes.bat
C:\xxx\FFMPEG>call bin\youtube-dl.exe -x --audio-format "mp3" --audio-quality 3 --batch-file="songs.txt" --playlist-items 4,6,7,8,10,11,16,17,20,21,23,25,27,28,31,33,36,38,39,41,43,45,46,48,50 -o"C:\Users\xxx\Downloads\%(title)s.%(ext)s" --verbose
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-x', '--audio-format', 'mp3', '--audio-quality', '3', '--batch-file=songs.txt', '--playlist-items', '4,6,7,8,10,11,16,17,20,21,23,25,27,28,31,33,36,38,39,41,43,45,46,48,50', '-oC:\\Users\\xxx\\Downloads\\(ext)s', '--verbose']
[debug] Batch file urls: ['https://www.youtube.com/watch?v=anurOHpo0aY&index=4&list=PLlRluznmnq9f7OMI4avwFyV2xMVxlV3_w&t=0s']
Usage: youtube-dl.exe [OPTIONS] URL [URL...]
youtube-dl.exe: error: Cannot download a video and extract audio into the same file! Use "C:\Users\xxx\Downloads\(ext)s.%(ext)s" instead of "C:\Users\xxx\Downloads\(ext)s" as the output template
If you look at the output, you see that the percent signs in your output template were gobbled up:
(...) '-oC:\\Users\\xxx\\Downloads\\(ext)s', '--verbose']
That is because in a batch file you need to write %% to get a literal percent sign, and since call expands percent signs a second time, you need to double that again, like this:
call bin\youtube-dl.exe -x --audio-format "mp3" --audio-quality 3 ^
--batch-file="songs.txt" --playlist-items ^
4,6,7,8,10,11,16,17,20,21,23,25,27,28,31,33,36,38,39,41,43,45,46,48,50 ^
-o "C:\Users\xxx\Downloads\%%%%(title)s.%%%%(ext)s" --verbose
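The two rounds of percent expansion can be modelled in a few lines of Python. This is a simplified sketch, not cmd.exe's real parser: it only handles %% and %NAME% (an undefined NAME expands to nothing, as in a batch file) and ignores argument references such as %1. It reproduces the gobbled (ext)s template from the verbose output and shows that %%%% survives both passes.

```python
import re

# Simplified model of one batch-file percent-expansion pass:
# "%%" becomes a literal "%", and "%NAME%" becomes the variable's value
# (or nothing if NAME is undefined).
_PERCENT = re.compile(r"%%|%([^%]*)%")


def expand(line, env=None):
    env = env or {}

    def repl(match):
        if match.group(0) == "%%":
            return "%"
        return env.get(match.group(1), "")

    return _PERCENT.sub(repl, line)


def batch_then_call(line, env=None):
    # Parsing the batch file is one pass; `call` performs a second one.
    return expand(expand(line, env), env)


print(batch_then_call(r"C:\Users\xxx\Downloads\%(title)s.%(ext)s"))
print(batch_then_call(r"C:\Users\xxx\Downloads\%%%%(title)s.%%%%(ext)s"))
```

In the first case "%(title)s.%" is treated as an undefined variable name and removed, leaving C:\Users\xxx\Downloads\(ext)s; in the second, each %%%% collapses to %% and then to %, so youtube-dl receives %(title)s.%(ext)s.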

telegraf - exec plugin - aws ec2 ebs volume info - metric parsing error, reason: [missing fields] or Errors encountered: [ invalid number]

Machine - CentOS 7.2 or Ubuntu 14.04/16.xx
Telegraf version: 1.0.1
Python version: 2.7.5
Telegraf supports an input plugin named exec. First, please see Example 2 in its README. I can't use the JSON format, as it only consumes numeric values for metrics. As per the docs:
If using JSON, only numeric values are parsed and turned into floats. Booleans and strings will be ignored.
So the idea is simple: you specify a script in the exec plugin section, and it should print some meaningful info (in either JSON or Influx data format; in my case Influx, because some of my metrics contain non-numeric values) that you want to capture and show in a dashboard, for example a Wavefront dashboard.
Basically, you can use these metrics, their tags, and the sources they come from to find out various info about memory, CPU, disk, networking, and other resources, and also to create alerts if something unwanted happens.
OK, I came up with this Python script:
#!/usr/bin/python
# sudo pip install boto3 if you don't have it on your machine.
import boto3
def generate(key, value):
    """
    Creates a nicely formatted Key(Value) item for output
    """
    return '{}="{}"'.format(key, value)
    #return '{}={}'.format(key, value)
def main():
    ec2 = boto3.resource('ec2', region_name="us-west-2")
    volumes = ec2.volumes.all()
    for vol in volumes:
        # You don't need to wrap everything in `str` unless it is not a string
        # By default most things will come back as a string
        # unless they are very obviously not (complex, date time, etc)
        # but since we are printing these (and formatting them into strings)
        # the cast to string will be implicit and we don't need to make it
        # explicit
        # vol is already a fully returned volume you are essentially DOUBLING
        # your API calls when you do this
        #iv = ec2.Volume(vol.id)
        output_parts = [
            # Volume level details
            generate('create_time', vol.create_time),
            generate('availability_zone', vol.availability_zone),
            generate('volume_id', vol.volume_id),
            generate('volume_type', vol.volume_type),
            generate('state', vol.state),
            generate('size', vol.size),
            generate('iops', vol.iops),
            generate('encrypted', vol.encrypted),
            generate('snapshot_id', vol.snapshot_id),
            generate('kms_key_id', vol.kms_key_id),
        ]
        for attachment in vol.attachments:
            # Will get any attachments and since it is a list
            # we should write this to handle MULTIPLE attachments
            output_parts.extend([
                generate('InstanceId', attachment.get('InstanceId')),
                generate('InstanceVolumeState', attachment.get('State')),
                generate('DeleteOnTermination', attachment.get('DeleteOnTermination')),
                generate('Device', attachment.get('Device')),
            ])
        # only process when there are tags to process
        if vol.tags:
            for tag in vol.tags:
                # Get all of the tags
                output_parts.extend([
                    generate(tag.get('Key'), tag.get('Value')),
                ])
        # output everything at once (works on Python 2.7 and 3)
        print(','.join(output_parts))
if __name__ == '__main__':
    main()
This script talks to the AWS EC2 API, outputs all the values it can find for each EBS volume (usually what you see in the AWS EC2 EBS volume console), and formats that info as comma-separated key=value pairs, which I redirect to a .csv log file.
We don't want to run the Python script all the time (AWS API limits / cost).
So, once the .csv file is created, a small shell script set in the exec plugin section of Telegraf just prints it.
Shell script /tmp/aws-vol-info.sh set in Telegraf exec plugin is:
#!/bin/bash
cat /tmp/aws-vol-info.csv
Telegraf configuration file created using exec plugin (/etc/telegraf/telegraf.d/exec-plugin-aws-info.conf):
#--- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec
[[inputs.exec]]
commands = ["/tmp/aws-vol-info.sh"]
## Timeout for each command to complete.
timeout = "5s"
# Data format to consume.
# NOTE json only reads numerical measurements, strings and booleans are ignored.
data_format = "influx"
name_suffix = "_telegraf_execplugin"
I tweaked the generate function in the Python script to produce the following output formats (in the .csv file), and wanted to test how Telegraf would handle this data before enabling the config file (/etc/telegraf/telegraf.d/catch-aws-ebs-info.conf) and restarting the Telegraf service.
Format 1 (every value wrapped in " double quotes):
create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_id="vol-058e1d47dgh721121",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"
Testing telegraf configuration on the telegraf directory gives me the following error.
Command: $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
[vagrant#myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 00:37:48 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:37:48Z E! Errors encountered: [ metric parsing error, reason: [invalid field format], buffer: [create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_id="vol-058e1d47dgh721121",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"], index: [372]]
[vagrant#myvagrant ~] $
Format 2 (no " double quotes):
create_time=2017-01-09 23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90] secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app
I get a similar parsing error (this time reason: [invalid value]) while testing Telegraf's configuration for the exec plugin:
2017/03/10 00:45:01 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:45:01Z E! Errors encountered: [ metric parsing error, reason: [invalid value], buffer: [create_time=2017-01-09 23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90] secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app], index: [63]]
Format 3 (no " double quotes and no spaces in the values; spaces were replaced with the _ character):
create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app
It still didn't work; I get a similar error (reason: [missing fields]):
[vagrant#myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 00:50:30 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:50:30Z E! Errors encountered: [ metric parsing error, reason: [missing fields], buffer: [create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app], index: [476]]
Format 4 (following the InfluxDB line protocol as described at https://docs.influxdata.com/influxdb/v1.2/write_protocols/line_protocol_tutorial/):
awsebs,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1
I'm getting this error:
[vagrant#myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 02:34:30 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T02:34:30Z E! Errors encountered: [ invalid number]
How can I get rid of this error and get Telegraf to work with the exec plugin (which runs the .sh script)?
Other Info:
The Python script will run once or twice per day (via cron), and Telegraf will run every minute: the exec plugin runs the .sh script, which cats the .csv file so that Telegraf can consume it in Influx data format.
https://galaxy.ansible.com/wavefrontHQ/wavefront-ansible/
https://github.com/influxdata/telegraf/issues/2525
It seems the rules are very strict; I should have looked more closely.
The output of any program that you want Telegraf to consume MUST match the INFLUX LINE PROTOCOL format shown below and follow all the RULES that come with it.
For ex:
weather,location=us-midwest temperature=82 1465839830100400200
  |    -------------------- --------------  |
  |             |             |             |
  |             |             |             |
+-----------+--------+-+---------+-+---------+
|measurement|,tag_set| |field_set| |timestamp|
+-----------+--------+-+---------+-+---------+
You can read more about measurements, tags, fields, and the optional timestamp here: https://docs.influxdata.com/influxdb/v1.2/write_protocols/line_protocol_tutorial/
Important rules are:
1) There must be a , and no space between measurement and tag set.
2) There must be a space between tag set and field set.
3) In measurement names, tag keys, tag values, and field keys, always use a backslash \ to escape special characters such as spaces and commas!
4) You can't escape \ with \
5) Line Protocol handles emojis with no problem :)
6) TAG / TAG set (comma-separated tags) is OPTIONAL.
7) FIELD / FIELD set (fields, comma separated) - At least ONE is required per line.
8) TIMESTAMP (last value shown in the format) is OPTIONAL.
9) VERY IMPORTANT QUOTING rules are below:
a) Never double or single quote the timestamp. It's not valid Line Protocol: '123123131312313' or "1231313213131" won't work even if the number itself is valid.
b) Never single quote field values (even if they’re strings!). It’s also not valid Line Protocol. i.e. fieldname='giga' won't work.
c) Do not double or single quote measurement names, tag keys, tag values, and field keys. NOTE: this includes tag values, so be careful!
d) Do not double quote field values that are ONLY in floats, integers, or booleans format, otherwise InfluxDB will assume that those values are strings.
e) Do double quote field values that are strings.
f) And the MOST IMPORTANT one (which will save you from going bald): if a field value is written without double quotes in one line (i.e. you think it's an integer or float, e.g. the size or iops fields), and the same field has a non-numeric value (i.e. a string) in some other line anywhere in the output that Telegraf parses with the exec plugin, you'll get the error message Errors encountered: [ invalid number.
So to fix it, the RULE is: if any possible value for a field key is a string, you MUST wrap that field's value in " on every line, regardless of whether it is 1, 200 or 1.5 in some lines (e.g. iops can be 1 or 5) and a string in others (e.g. iops can be None).
Error message: Errors encountered: [ invalid number
[vagrant#myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 11:13:18 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T11:13:18Z E! Errors encountered: [ invalid number metric parsing error, reason: [invalid field format], buffer: [awsebsvol,host=myvagrant ], index: [25]]
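Rule f can be enforced mechanically by looking at all lines before writing any of them. The sketch below is a hypothetical helper, not part of Telegraf or boto3: it assumes each volume has already been collected into a dict, and it treats only Python int/float values as numeric, so any field that is a string, None, or a boolean anywhere gets quoted everywhere. It does not escape quotes inside values.

```python
def fields_needing_quotes(records):
    """Return the field keys that must be double-quoted on EVERY line.

    Simplifying assumption: only int/float values count as numeric;
    anything else (str, None, bool) turns the whole field into a string field.
    """
    quoted = set()
    for record in records:
        for key, value in record.items():
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                quoted.add(key)
    return quoted


def render(records, measurement):
    quoted = fields_needing_quotes(records)
    lines = []
    for record in records:
        fields = ",".join(
            '{}="{}"'.format(k, v) if k in quoted else "{}={}".format(k, v)
            for k, v in record.items()
        )
        lines.append("{} {}".format(measurement, fields))
    return lines


# iops is numeric for one volume but None for another, so it is quoted on both lines.
volumes = [{"size": 8, "iops": 100}, {"size": 500, "iops": None}]
for line in render(volumes, "awsec2ebs"):
    print(line)
```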
So, after all this learning, it's clear that I was missing both the Influx line protocol format AND its RULES!
Now, the output I want my Python script to generate should look like this (according to the INFLUX LINE PROTOCOL). Alternatively, you can just change the .sh file to prepend the measurement with sed "s/^/awsec2ebs,/" or sed "s/^/awsec2ebs,sourcehost=$(hostname) /" (note the space before the closing / of the sed expression), and then you can have " around any key=value pair. I changed the .py file to not use " for the size and iops fields.
Anyway, the output should look something like this:
awsec2ebs,volume_id=vol-058e1d47dgh721121 create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"
In the above final working solution, I created a measurement named awsec2ebs, followed by a comma (no space) and the tag key volume_id. The tag value is not wrapped in ' or " quotes, and a single space separates the tag set from the field set. I only wanted one tag for now; you can add more tags as a comma-separated list, following the rules.
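The same structure (measurement, comma, tag set, space, field set, optional timestamp) can be produced by a small formatter instead of hand-built strings. This is a minimal sketch, not Telegraf's or InfluxDB's own code: the helper names are hypothetical, it escapes only commas, equals signs, and spaces in names, and it escapes only double quotes inside string field values.

```python
def _escape(text, chars):
    # Prefix each special character with a backslash.
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def escape_measurement(name):
    return _escape(name, ", ")


def escape_key(text):
    # Used for tag keys, tag values and field keys (never quoted, rule c).
    return _escape(text, ",= ")


def format_field_value(value):
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        # Unquoted numbers (rule d); InfluxDB treats them as floats unless suffixed with i.
        return repr(value)
    # Everything else becomes a double-quoted string field (rule e).
    return '"{}"'.format(str(value).replace('"', '\\"'))


def to_line(measurement, tags, fields, timestamp=None):
    if not fields:
        raise ValueError("line protocol needs at least one field")  # rule 7
    head = escape_measurement(measurement)
    for key, value in tags.items():
        head += ",{}={}".format(escape_key(key), escape_key(str(value)))
    field_set = ",".join(
        "{}={}".format(escape_key(k), format_field_value(v)) for k, v in fields.items()
    )
    line = "{} {}".format(head, field_set)
    if timestamp is not None:
        line += " {}".format(int(timestamp))  # never quoted (rule a)
    return line


print(to_line("awsec2ebs", {"volume_id": "vol-058e1d47dgh721121"},
              {"volume_type": "gp2", "size": 8, "Name": "[company-2b-app90] secondary"}))
```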
Finally, I ran the following command, which worked like a charm:
$ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 03:33:54 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
> awsec2ebs_telegraf_execplugin,volume_id=vol-058e1d47dgh721121,host=myvagrant volume_type="gp2",iops="100",kms_key_id="None",role="app",size="8",encrypted="False",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",Name="[company-2b-app90] secondary",snapshot_id="snap-06h1h1b91bh662avn",DeleteOnTermination="True",mirror="secondary",cluster="company",autoscale="true",high_availability="1",create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",state="in-use",Device="/dev/sda1",hostname="company-2b-app90-i-0jjb1boop26f42f50" 1489116835000000000
[vagrant@myvagrant ~] $ echo $?
0
In the example above, size is the only field whose value is always a number, so it doesn't need to be wrapped in " (quoting it is also fine, as long as you do it on every line). Recall the MOST IMPORTANT rule above and the error it generates.
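The rule can also be applied programmatically when you build all lines in one run: a field may be left unquoted only if its value is a real number (not None, not a boolean) in every row where it appears. numeric_fields_for is a hypothetical helper. Note that it only sees the rows of the current run; a field that happens to be numeric today (like iops) can still be None tomorrow, so hardcoding the choice, as the final script does for size, is the safer option.

```python
from numbers import Real


def numeric_fields_for(rows):
    """
    Return the set of field keys whose value is a real number (not bool, not None)
    in every row that contains the key. Any field that is a string or None in at
    least one row must be quoted on every line.
    """
    keys = set()
    for row in rows:
        keys.update(row)
    numeric = set()
    for key in keys:
        if all(isinstance(row[key], Real) and not isinstance(row[key], bool)
               for row in rows if key in row):
            numeric.add(key)
    return numeric


rows = [
    {'size': 8, 'iops': 100, 'state': 'in-use'},
    {'size': 500, 'iops': None, 'state': 'available'},
]
print(sorted(numeric_fields_for(rows)))
```

Here only size qualifies: iops is None in the second row, so it has to be quoted everywhere, exactly as in the script.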
So the final Python file is:
#!/usr/bin/env python3
# Do `sudo pip install boto3` first
import boto3

def generate(key, value, qs, qe):
    """
    Creates a nicely formatted key=value item for output.
    qs/qe are the opening/closing quotes: '"' for string fields, '' for numeric ones.
    """
    value = str(value)
    if qs == '"':
        # Inside a quoted field value, \ and " must be backslash-escaped
        value = value.replace('\\', '\\\\').replace('"', '\\"')
    return '{}={}{}{}'.format(key, qs, value, qe)

def main():
    ec2 = boto3.resource('ec2', region_name="us-west-2")
    volumes = ec2.volumes.all()
    for vol in volumes:
        # You don't need to wrap anything in str() here: generate() converts
        # every value (including datetimes like create_time) to a string.
        # vol is already a fully returned volume; you are essentially DOUBLING
        # your API calls when you do this:
        #iv = ec2.Volume(vol.id)
        output_parts = [
            # Volume level details
            generate('volume_id', vol.volume_id, '"', '"'),
            generate('create_time', vol.create_time, '"', '"'),
            generate('availability_zone', vol.availability_zone, '"', '"'),
            generate('volume_type', vol.volume_type, '"', '"'),
            generate('state', vol.state, '"', '"'),
            generate('size', vol.size, '', ''),
            # vol.iops can be a number or None, so it must ALWAYS be wrapped in
            # double quotes, otherwise the "invalid number" error will come.
            generate('iops', vol.iops, '"', '"'),
            generate('encrypted', vol.encrypted, '"', '"'),
            generate('snapshot_id', vol.snapshot_id, '"', '"'),
            generate('kms_key_id', vol.kms_key_id, '"', '"'),
        ]
        for att in vol.attachments:
            # Attachments come back as a list, so handle MULTIPLE attachments
            output_parts.extend([
                generate('InstanceId', att.get('InstanceId'), '"', '"'),
                generate('InstanceVolumeState', att.get('State'), '"', '"'),
                generate('DeleteOnTermination', att.get('DeleteOnTermination'), '"', '"'),
                generate('Device', att.get('Device'), '"', '"'),
            ])
        # Only process when there are tags to process
        if vol.tags:
            for tag in vol.tags:
                # Get all of the tags
                output_parts.append(generate(tag.get('Key'), tag.get('Value'), '"', '"'))
        # Output everything at once
        print(','.join(output_parts))

if __name__ == '__main__':
    main()
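The final aws-vol-info.sh is:
#!/bin/bash
# Prefix every CSV line with the measurement name and a host tag.
# Whitespace in the hostname is replaced with "_", and the CSV is read
# from the script's own directory (telegraf may run it from another cwd).
host=$(hostname | head -1 | sed 's/[[:space:]][[:space:]]*/_/g')
sed "s/^/awsebsvol,host=${host} /" "$(dirname "$0")/aws-vol-info.csv"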
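If you would rather not depend on sed, the same prefixing step could be done in Python. This is a sketch under assumptions: prefix_lines is a hypothetical helper, it reuses the measurement name awsebsvol from the shell script, and unlike the sed version it skips empty lines, since a bare "awsebsvol,host=... " prefix with no fields is not valid line protocol.

```python
import re


def prefix_lines(lines, measurement, host):
    """
    Prepend 'measurement,host=<host> ' to every non-empty line, like the sed call.
    Runs of spaces/tabs in the hostname are collapsed to '_' (same idea as the sed
    expression). Empty lines are skipped.
    """
    host = re.sub(r'[ \t]+', '_', host.strip())
    prefix = '{},host={} '.format(measurement, host)
    return [prefix + line.rstrip('\n') for line in lines if line.strip()]


print('\n'.join(prefix_lines(['size=8,state="in-use"\n', '\n'], 'awsebsvol', 'myvagrant')))
```

In a real script you might pass socket.gethostname() as host and an open file object for the CSV as lines.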
The final Telegraf exec plugin config file is /etc/telegraf/telegraf.d/exec-plugin-aws-info.conf (you can give it any name ending in .conf):
#--- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec
[[inputs.exec]]
commands = ["/some/valid/path/where/csvfileexists/aws-vol-info.sh"]
## Timeout for each command to complete.
timeout = "5s"
# Data format to consume.
# NOTE json only reads numerical measurements, strings and booleans are ignored.
data_format = "influx"
name_suffix = "_telegraf_exec"
Run the following command, and everything should work now:
$ telegraf --config-directory=/etc/telegraf --test --input-filter=exec

Garbled text when constructing emails with vmime

Hey, my Qt C++ program has a part where it needs to send the first 128 characters or so of the output of a bash command to an email address. The output from the tty is captured in a text box in my GUI called textEdit_displayOutput and put into a message I built using vmime's messageBuilder (the object m_vmMessage). Here is the relevant code snippet:
m_vmMessage.getTextPart()->setCharset( vmime::charsets::US_ASCII );
m_vmMessage.getTextPart()->setText( vmime::create < vmime::stringContentHandler > ( ui->textEdit_displayOutput->toPlainText().toStdString() ) );
vmime::ref < vmime::message > msg = m_vmMessage.construct();
vmime::utility::outputStreamAdapter out( std::cout );
msg->generate( out );
Sending 'ls /' and a newline to bash makes vmime produce terminal output like this:
ls /=0Abin etc=09 initrd.img.old mnt=09 sbin=09 tmp=09 vmlinuz.o=
ld=0Aboot farts=09 lib=09=09 opt=09 selinux usr=0Acdrom home=09 =
lost+found=09 proc srv=09 var=0Adev initrd.img media=09 root =
Whereas it should look more like this:
ls /
bin etc initrd.img.old mnt sbin tmp vmlinuz.old
boot farts lib opt selinux usr
cdrom home lost+found proc srv var
dev initrd.img media root sys vmlinuz
18:22>
The output seems to be truncated around 'root'; nothing after it is displayed.
How do I encode and piece together the email properly? Or does vmime just display it like that on purpose, and the actual content of the email is complete and properly formatted?
Thanks!
=0A is a line feed (LF) character.
=09 is a horizontal tab (HT).
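This is MIME's quoted-printable transfer encoding for your non-printing (control) characters: each one is written as = followed by its two-digit hex code, and an = at the very end of a line is a soft line break that is removed when the body is decoded. A mail client decodes it back to the original text.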
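To check that a quoted-printable body decodes back to the original text, you can try a sample with Python's standard quopri module. This is only a check outside the Qt program, and the sample below is a shortened, hand-made string in the style of the output in the question, not the exact vmime output.

```python
import quopri

# A body in the same style as the vmime output (quoted-printable):
#   =0A -> line feed, =09 -> tab, and "=" at the end of a line is a soft line break.
encoded = (
    b"ls /=0Abin=09etc=09mnt=0A=\n"
    b"boot=09lib=09opt=0A"
)

decoded = quopri.decodestring(encoded)
print(decoded.decode("ascii"))
```

The decoded bytes are "ls /\nbin\tetc\tmnt\nboot\tlib\topt\n": the soft line break disappears and the escapes turn back into real newlines and tabs.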