Pandas pattern match and csv conversion of dataFrame - regex

I have the below code which i'm using to parse my data from the text file thats contains the Multiple Fields and hundreds of columns names where i'm choosing the required fields at the time of pandas processing via read_csv which works fine, it it only works with encoding='cp1252' .
There are five key fields which i'm looking for as ['Hostname', 'IP Address', 'Aux Site', 'OS Version', 'Network Name'],
In the pattern section which i'm using a variable patt i'm looking for key words/strings as "AIX|CentOS|RHEL|SunOS|SuSE|Ubuntu|Fedora|\?" which i beleive doesn't care about the case sensitiveness.
which is get matched into the column OS Version but i'm using the litral ? mark to match the ? which is working but at the same time it also gets the Windows 10 ??? which i only wants ? if its there in the OS Version field.
Secondly, When its converting the df2.to_csv the columns are not delimited rather coming into one which later i'm delimitting manually, How we can ensure that each field is correctly process as a CSV file.
#!/python/v3.6.1/bin/python3
import pandas as pd
##### Python pandas, widen output display to see more columns. ####
pd.set_option('display.height', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)
##################### END OF THE Display Settings ###################
patt = "AIX|CentOS|RHEL|SunOS|SuSE|Ubuntu|Fedora|\?"
col_names = ['Hostname', 'IP Address', 'Aux Site', 'CPU Model', 'CDN Version', 'OS Version', 'Kernel Version', 'LDAP Profile', 'Network Name']
df1 = pd.read_csv('/home/karn/plura/Test/Python_Panda/host.txt', delimiter = "\t", usecols=col_names, encoding='cp1252', dtype='unicode')
df2 = df1[df1['OS Version'].str.contains(patt, na=False)][['Hostname', 'IP Address', 'Aux Site', 'OS Version', 'Network Name']]
df2['Hostname'] = df2['Hostname'].str.replace("*", "")
df2.to_csv("HostList_from_Surveys.csv", sep='\t', encoding='utf-8', index=False)
Below is the data sample Image for view:
Below is again same data in text format in case to reproduce.
Hostname IP Address Aux Site OS Version Network Name
host01 192.168.1.1 yoko RHEL 5.5 CISCO
host02 192.168.1.2 chelmsford AIX 6.1
host03 192.168.1.3 sanjose RHEL 5.5
host04 192.168.1.4 rosh CentOS 6.8 CISCO
host05 192.168.1.5 noida3 CentOS 5.10 CISCO
host06 192.168.1.6 rosh RHEL 6.5 CISCO
host07 192.168.1.7 noida3 RHEL 6.5 CISCO
host08 192.168.1.8 san jose RHEL 6.5 CISCO
host09 192.168.1.9 noida3 RHEL 5.5
host10 192.168.1.10 sophia RHEL 5.5 AVAYA
host11 192.168.1.11 sanjose RHEL 5.5 AVAYA
host12 192.168.1.12 sanjose RHEL 5.3 AVAYA
host13 192.168.1.13 sanjose RHEL 5.8 AVAYA
host14 192.168.1.14 sanjose Ubuntu 14.04.1
any help will be much appreciated.

I suggest you use
patt = "(?s)AIX|CentOS|RHEL|SunOS|SuSE|Ubuntu|Fedora|(?<!\?)\?(?!\?)"
This pattern matches
(?s) - the re.DOTALL option inline equivalent that makes . match line break chars
AIX|CentOS|RHEL|SunOS|SuSE|Ubuntu|Fedora - one of the substring alternatives
| - or
(?<!\?)\?(?!\?) - a question mark not enclosed with other question marks.

Related

How to get the specific content from the log file described bellow?

I Have a log file which is generated by nmap, which is something like this:
Nmap scan report for gateway (10.0.0.1)
Host is up (0.0060s latency).
MAC Address: 10:BE:F5:FC:9C:65 (D-Link International)
Nmap scan report for 10.0.0.2
Host is up (0.055s latency).
MAC Address: 7C:78:7E:E8:1C:2A (Samsung Electronics)
Nmap scan report for 10.0.0.3
Host is up (0.059s latency).
MAC Address: 54:60:09:83:6E:B6 (Google)
Nmap scan report for 10.0.0.200
Host is up (-0.093s latency).
MAC Address: 5C:B9:01:02:5F:D8 (Hewlett Packard)
Nmap scan report for manoj-notebook (10.0.0.4)
Host is up.
Nmap done: 256 IP addresses (5 hosts up) scanned in 16.84 seconds
It keeps on changing as the new devices connect to the network or existing device disconnects from the network. I want to fetch the ip address example: 10.0.0.1, mac address example: 10:BE:F5:FC:9C:65 and the device name example: D-Link International in a single list something like:
result = [['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.200', '10.0.0.4'], ['10:BE:F5:FC:9C:65', '7C:78:7E:E8:1C:2A', '54:60:09:83:6E:B6', '5C:B9:01:02:5F:D8'], ['D-Link International', 'Samsung Electronics', 'Google', 'Hewlett Packard']]
I tried the following regular expression to match IP address, MAC Address and Device name:
ipPattern = re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', temp)
macPattern = re.findall(r'(?:.*?s: ){2}(.*)(?= \))', temp)
devicePattern = re.findall(r'(?:.*?\(){2}(.*)(?=\))', temp)
I'm able to match the IP Address but unable to match mac address and device name. How to match the same and store it in a single list? Thank you.
Also if I could get a pattern to fetch latency from the log file example: 0.0060s it would be a cherry on top. Thank you.
You can use the following expressions:
ipPattern : \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
macPattern : (?:[0-9A-F]{2}:){2,}[0-9A-F]{2}\b
(?:[0-9A-F]{2}:)+ Non capturing group for sequence of pairs of alphanumerical values followed by :.
[0-9A-F]+\b Final pair of alphanumerical value, followed by word boundary.
devicePattern : (?<=\()[^)0-9.]*(?=\))
(?<=\() Negative lookbehind for bracket ).
[^)0-9.]* Negated character set, matches anything that is not a ) or . or digits.
(?=\)) Positive lookahead for ).
latency : -?\d+\.\d+s(?=\slatency)
-?\d+\.\d+s Match - optionally, digits, full stop, more digits and s.
(?=\slatency) Positive lookahead, assert that what follows whitespace and latency.
Python snippet:
import re
import itertools
temp = """
b'\nStarting Nmap 7.60 ( https://nmap.org ) at 2018-08-03 19:44 IST\nNmap scan report for gateway (10.0.0.1)\nHost is up (0.0070s latency).\nMAC Address: 10:BE:F5:FC:9C:65 (D-Link International)\nNmap scan report for 10.0.0.3\nHost is up (0.11s latency).\nMAC Address: 54:60:09:83:6E:B6 (Google)\nNmap scan report for 10.0.0.5\nHost is up (0.11s latency).\nMAC Address: 7C:78:7E:A4:73:8C (Samsung Electronics)\nNmap scan report for 10.0.0.200\nHost is up (0.027s latency).\nMAC Address: 5C:B9:01:02:5F:D8
"""
ipPattern = re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', temp)
macPattern= re.findall(r'(?:[0-9A-F]{2}:){2,}[0-9A-F]{2}\b',temp)
devicePattern = re.findall(r'(?<=\()[^)0-9.]*(?=\))',temp)
latency = re.findall(r'-?\d+\.\d+s(?=\slatency)',temp)
print(ipPattern)
print(macPattern)
print(devicePattern)
print(latency)
Prints:
['10.0.0.1', '10.0.0.3', '10.0.0.5', '10.0.0.200']
['10:BE:F5:FC:9C:65', '54:60:09:83:6E:B6', '7C:78:7E:A4:73:8C', '5C:B9:01:02:5F:D8']
['D-Link International', 'Google', 'Samsung Electronics']
['0.0070s', '0.11s', '0.11s', '0.027s']
For joining in a single list use:
mylist = itertools.chain([ipPattern], [macPattern], [devicePattern], [latency])
print(list(mylist))
Prints:
[['10.0.0.1', '10.0.0.3', '10.0.0.5', '10.0.0.200'], ['10:BE:F5:FC:9C:65', '54:60:09:83:6E:B6', '7C:78:7E:A4:73:8C', '5C:B9:01:02:5F:D8'], ['D-Link International', 'Google', 'Samsung Electronics'], ['0.0070s', '0.11s', '0.11s', '0.027s']]

telegraf - exec plugin - aws ec2 ebs volumen info - metric parsing error, reason: [missing fields] or Errors encountered: [ invalid number]

Machine - CentOS 7.2 or Ubuntu 14.04/16.xx
Telegraf version: 1.0.1
Python version: 2.7.5
Telegraf supports an INPUT plugin named: exec. First please see EXAMPLE 2 in the README doc there. I can't use JSON format as it only consumes Numeric values for metrics. As per the docs:
If using JSON, only numeric values are parsed and turned into floats. Booleans and strings will be ignored.
So, the idea is simple, you specify a script in exec plugin section, which should spit some meaningful info(in either JSON -or- influx data format in my case as I have some metrics which contains non-numeric values) which you would want to catch/show somewhere in a cool dashboard like for example Wavefront Dashboard shown here:
:
Basically one can use these metrics, tags, sources from where these metrics are coming from to find out various info about memory, cpu, disk, networking, other meaningful info and also create alerts using those if something unwanted happens.
OK, I came up with this python script available here:
#!/usr/bin/python
# sudo pip install boto3 if you don't have it on your machine.
import boto3
def generate(key, value):
"""
Creates a nicely formatted Key(Value) item for output
"""
return '{}="{}"'.format(key, value)
#return '{}={}'.format(key, value)
def main():
ec2 = boto3.resource('ec2', region_name="us-west-2")
volumes = ec2.volumes.all()
for vol in volumes:
# You don't need to wrap everything in `str` unless it is not a string
# By default most things will come back as a string
# unless they are very obviously not (complex, date time, etc)
# but since we are printing these (and formatting them into strings)
# the cast to string will be implicit and we don't need to make it
# explicit
# vol is already a fully returned volume you are essentially DOUBLING
# your API calls when you do this
#iv = ec2.Volume(vol.id)
output_parts = [
# Volume level details
generate('create_time', vol.create_time),
generate('availability_zone', vol.availability_zone),
generate('volume_id', vol.volume_id),
generate('volume_type', vol.volume_type),
generate('state', vol.state),
generate('size', vol.size),
generate('iops', vol.iops),
generate('encrypted', vol.encrypted),
generate('snapshot_id', vol.snapshot_id),
generate('kms_key_id', vol.kms_key_id),
]
for _ in vol.attachments:
# Will get any attachments and since it is a list
# we should write this to handle MULTIPLE attachments
output_parts.extend([
generate('InstanceId', _.get('InstanceId')),
generate('InstanceVolumeState', _.get('State')),
generate('DeleteOnTermination', _.get('DeleteOnTermination')),
generate('Device', _.get('Device')),
])
# only process when there are tags to process
if vol.tags:
for _ in vol.tags:
# Get all of the tags
output_parts.extend([
generate(_.get('Key'), _.get('Value')),
])
# output everything at once..
print ','.join(output_parts)
if __name__ == '__main__':
main()
This script will talk to AWS EC2 EBS volumes and outputs all values it can find (usually what you see in AWS EC2 EBS volume console) and format that info into a meaningful CSV format which I'm redirecting to a .csv log file.
We don't want to run the python script all the time (AWS API limits / cost factor).
So, once the .csv file is created, I created this small shell script which I'll set in Telegraf's exec plugin's section.
Shell script /tmp/aws-vol-info.sh set in Telegraf exec plugin is:
#!/bin/bash
cat /tmp/aws-vol-info.csv
Telegraf configuration file created using exec plugin (/etc/telegraf/telegraf.d/exec-plugin-aws-info.conf):
#--- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec
[[inputs.exec]]
commands = ["/tmp/aws-vol-info.sh"]
## Timeout for each command to complete.
timeout = "5s"
# Data format to consume.
# NOTE json only reads numerical measurements, strings and booleans are ignored.
data_format = "influx"
name_suffix = "_telegraf_execplugin"
I tweaked the .py (Python script for generate function) to generate the following three type of output formats (.csv file) and wanted to test how telegraf would handle this data before I enable the config file (/etc/telegraf/telegraf.d/catch-aws-ebs-info.conf) and restart telegraf service.
Format 1: (with double quotes " wrapped for every value)
create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_id="vol-058e1d47dgh721121",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"
Testing telegraf configuration on the telegraf directory gives me the following error.
Command: $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
[vagrant#myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 00:37:48 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:37:48Z E! Errors encountered: [ metric parsing error, reason: [invalid field format], buffer: [create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_id="vol-058e1d47dgh721121",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"], index: [372]]
[vagrant#myvagrant ~] $
Format 2: (without any " double quotes)
create_time=2017-01-09 23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90] secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app
Getting same error while testing Telegraf's configuration for exec plugin:
2017/03/10 00:45:01 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:45:01Z E! Errors encountered: [ metric parsing error, reason: [invalid value], buffer: [create_time=2017-01-09 23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90] secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app], index: [63]]
Format 3: (this format doesn't have any " double quote and space character in the values). Substituted space with _ character.
create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app
Still didn't work, getting same error:
[vagrant#myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 00:50:30 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:50:30Z E! Errors encountered: [ metric parsing error, reason: [missing fields], buffer: [create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app], index: [476]]
Format 4: If I follow influx line protocol as per this page: https://docs.influxdata.com/influxdb/v1.2/write_protocols/line_protocol_tutorial/
awsebs,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1
I'm getting this error:
[vagrant#myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 02:34:30 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T02:34:30Z E! Errors encountered: [ invalid number]
HOW can I get rid of this error and get telegraf to work with exec plugin (which runs the .sh script)?
Other Info:
Python script will run once/twice per day (via cron) and telegraf will run every 1 minute (to run exec plugin - which runs .sh script - which will cat the .csv file so that telegraf can consume it in influx data format).
https://galaxy.ansible.com/wavefrontHQ/wavefront-ansible/
https://github.com/influxdata/telegraf/issues/2525
It seems like the rules are very strict, I should have looked more closely.
Syntax of the output of any program that you can to consume MUST match or follow INFLUX LINE PROTOCOL format shown below and also all the RULES which comes with it.
For ex:
weather,location=us-midwest temperature=82 1465839830100400200
| -------------------- -------------- |
| | | |
| | | |
+-----------+--------+-+---------+-+---------+
|measurement|,tag_set| |field_set| |timestamp|
+-----------+--------+-+---------+-+---------+
You can read more about what's measurement, tag, field and optional(timestamp) here: https://docs.influxdata.com/influxdb/v1.2/write_protocols/line_protocol_tutorial/
Important rules are:
1) There must be a , and no space between measurement and tag set.
2) There must be a space between tag set and field set.
3) For tag keys, tag values, and field keys always use a backslash character \ to escape if you want to escape any character in measurement name, tag or field set name and their values!
4) You can't escape \ with \
5) Line Protocol handles emojis with no problem :)
6) TAG / TAG set (tags comma separated) in OPTIONAL
7) FIELD / FIELD set (fields, comma separated) - At least ONE is required per line.
8) TIMESTAMP (last value shown in the format) is OPTIONAL.
9) VERY IMPORTANT QUOTING rules are below:
a) Never double or single quote the timestamp. It’s not valid Line Protocol. '123123131312313' or "1231313213131" won't work if that # is valid.
b) Never single quote field values (even if they’re strings!). It’s also not valid Line Protocol. i.e. fieldname='giga' won't work.
c) Do not double or single quote measurement names, tag keys, tag values, and field keys. NOTE: THIS does say !!! tag values !!!! so careful.
d) Do not double quote field values that are ONLY in floats, integers, or booleans format, otherwise InfluxDB will assume that those values are strings.
e) Do double quote field values that are strings.
f) AND the MOST IMPORTANT one (which will save you from getting BALD): If a FIELD value is set without double quote / i.e. you think it's an integer value or float in one line (for ex: anyone will say fields size or iops) and in some other lines (anywhere in the file that telegraf will read/parse using exec plugin) if you have a non-integer value set (i.e. a String), then you'll get the following error message Errors encountered: [ invalid number error.
So to fix it, the RULE is, if any possible FIELD value for a FIELD key is a string, then you MUST make sure to use " to wrap it (in every lines), it doesn't matter whether it has value 1, 200 or 1.5 in some lines (for ex: iops can be 1, 5) and in some other lines that value (iops can be None).
Error message: Errors encountered: [ invalid number
[vagrant#myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 11:13:18 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T11:13:18Z E! Errors encountered: [ invalid number metric parsing error, reason: [invalid field format], buffer: [awsebsvol,host=myvagrant ], index: [25]]
So, after all this learning, it's clear that first I was missing the Influx Line protocol format and ALSO the RULES!!
Now, my output that I want my python script to generate should be like this (acc. to the INFLUX LINE PROTOCOL). You can just change the .sh file and use sed "s/^/awsec2ebs,/" or also do sed "s/^/awsec2ebs,sourcehost=$(hostname) /" (note: the space before the closing sed / character) and then you can have " around any key=value pair. I did change .py file to not use " for size and iops fields.
Anyways, if the output is something like this:
awsec2ebs,volume_id=vol-058e1d47dgh721121 create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"
In the above final working solution, I created a measurement named awsec2ebs then gave , between this measurement and tag key volume_id and for tag value, I did NOT use any ' or " quotes and then I gave a space character (as I just wanted only one tag for now otherwise you can have more tag using command separated way and following the rules) between tag set and field set.
Finally ran the command:
$ telegraf --config-directory=/etc/telegraf --test --input-filter=exec which worked like a shenzi!
2017/03/10 03:33:54 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
> awsec2ebs_telegraf_execplugin,volume_id=vol-058e1d47dgh721121,host=myvagrant volume_type="gp2",iops="100",kms_key_id="None",role="app",size="8",encrypted="False",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",Name="[company-2b-app90] secondary",snapshot_id="snap-06h1h1b91bh662avn",DeleteOnTermination="True",mirror="secondary",cluster="company",autoscale="true",high_availability="1",create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",state="in-use",Device="/dev/sda1",hostname="company-2b-app90-i-0jjb1boop26f42f50" 1489116835000000000
[vagrant#myvagrant ~] $ echo $?
0
In the above example, size is the only field which will always be a number/numeric value, so we don't need to wrap it with " but it's up to you. Recall the MOST IMPORTANT rule.. above and the error it generates.
So final python file is:
#!/usr/bin/python
#Do `sudo pip install boto3` first
import boto3
def generate(key, value, qs, qe):
"""
Creates a nicely formatted Key(Value) item for output
"""
return '{}={}{}{}'.format(key, qs, value, qe)
def main():
ec2 = boto3.resource('ec2', region_name="us-west-2")
volumes = ec2.volumes.all()
for vol in volumes:
# You don't need to wrap everything in `str` unless it is not a string
# By default most things will come back as a string
# unless they are very obviously not (complex, date time, etc)
# but since we are printing these (and formatting them into strings)
# the cast to string will be implicit and we don't need to make it
# explicit
# vol is already a fully returned volume you are essentially DOUBLING
# your API calls when you do this
#iv = ec2.Volume(vol.id)
output_parts = [
# Volume level details
generate('volume_id', vol.volume_id, '"', '"'),
generate('create_time', vol.create_time, '"', '"'),
generate('availability_zone', vol.availability_zone, '"', '"'),
generate('volume_type', vol.volume_type, '"', '"'),
generate('state', vol.state, '"', '"'),
generate('size', vol.size, '', ''),
#The following vol.iops variable can be a number or None so you must wrap it with double quotes otherwise "invalid number" error will come.
generate('iops', vol.iops, '"', '"'),
generate('encrypted', vol.encrypted, '"', '"'),
generate('snapshot_id', vol.snapshot_id, '"', '"'),
generate('kms_key_id', vol.kms_key_id, '"', '"'),
]
for _ in vol.attachments:
# Will get any attachments and since it is a list
# we should write this to handle MULTIPLE attachments
output_parts.extend([
generate('InstanceId', _.get('InstanceId'), '"', '"'),
generate('InstanceVolumeState', _.get('State'), '"', '"'),
generate('DeleteOnTermination', _.get('DeleteOnTermination'), '"', '"'),
generate('Device', _.get('Device'), '"', '"'),
])
# only process when there are tags to process
if vol.tags:
for _ in vol.tags:
# Get all of the tags
output_parts.extend([
generate(_.get('Key'), _.get('Value'), '"', '"'),
])
# output everything at once..
print ','.join(output_parts)
if __name__ == '__main__':
main()
Final aws-vol-info.sh is:
#!/bin/bash
cat aws-vol-info.csv | sed "s/^/awsebsvol,host=`hostname|head -1|sed "s/[ \t][ \t]*/_/g"` /"
Final telegraf exec plugin config file is (/etc/telegraf/telegraf.d/exec-plugin-aws-info.conf) give any name with .conf:
#--- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec
[[inputs.exec]]
commands = ["/some/valid/path/where/csvfileexists/aws-vol-info.sh"]
## Timeout for each command to complete.
timeout = "5s"
# Data format to consume.
# NOTE json only reads numerical measurements, strings and booleans are ignored.
data_format = "influx"
name_suffix = "_telegraf_exec"
Run: and everything will work now!
$ telegraf --config-directory=/etc/telegraf --test --input-filter=exec

Simple Python Regex to separate the words

Image link here.
I have the following router output:
config t
Enter configuration commands, one per line. End with CNTL/Z.
n7k(config)# port-profile demo_ethernet
n7k(config-port-prof)# ?
bandwidth Set bandwidth informational parameter
beacon Disable/enable the beacon for an interface
cdp Configure CDP interface parameters
channel-group Configure port channel parameters
delay Specify interface throughput delay
description Enter port-profile description of maximum 80 characters
From the above output, I want this:
['bandwidth' , 'beacon' , 'cdp' , 'channel-group' , 'delay' , 'description']
I am trying
m = re.compile('(\w+\s\w+)')
n = m.findall(buffer)
And getting this output:
['config t', 'Enter configuration', 'one per', 'End with', 'profile demo_ethernet', 'Set bandwidth', 'informational parameter', 'enable the', 'beacon for', 'an interface', 'Configure CDP', 'interface parameters', 'Configure port', 'channel parameters', 'Specify interface', 'throughput delay', 'Enter port', 'profile description', 'of maximum', '80 characters']
This regex should work (assuming whatever line you want to match have space in starting)
^\s+([\w-]+)\s+.+$
Regex Demo
Python code (For simplicity I have taken all input in a single string)
p = re.compile(r'^\s+([\w-]+)\s+.+$', re.MULTILINE)
test_str = "config t\nEnter configuration commands, one per line. End with CNTL/Z.\nn7k(config)# port-profile demo_ethernet\nn7k(config-port-prof)# ?\n bandwidth Set bandwidth informational parameter\n beacon Disable/enable the beacon for an interface\n cdp Configure CDP interface parameters\n channel-group Configure port channel parameters\n delay Specify interface throughput delay\n description Enter port-profile description of maximum 80 characters"
print(re.findall(p, test_str))
Ideone Demo

How to extrapolate data from an nmap scan result

I'm still quite new to Python and I'm currently looking at network scanning for available hosts. With my current code, I can search an IP range to determine if hosts are available or not. However, how can I restrict what information the nmap scan results show me, or is there a function I need to be using to only show the host IP address, scan time and if its available?
#!/usr/bin/env python
import nmap
import sys
nm = nmap.PortScannerAsync()
def callback_result(host, scan_result):
print '------------------'
print host, scan_result
try:
nm.scan('192.168.1.86-87', arguments='-O -v', callback=callback_result)
while nm.still_scanning():
print('<<< Scanning >>>')
nm.wait(2)
except KeyboardInterrupt:
print 'Cancelling current operation'
sys.exit()
except KeyError as e:
pass
This provides the output which is broad and contains too much information;
192.168.1.87 {'nmap': {'scanstats': {'uphosts': u'0', 'timestr': u'Wed Apr 8 13:28:29 2015', 'downhosts': u'1', 'totalhosts': u'1', 'elapsed': u'3.77'},
'scaninfo': {u'tcp': {'services': u'1,3-4,6-7,9,13,17,19-26,30,32-33,37,42-43,49,53,70,79-85,88-90,99-100,106,109-111,113,119,125,135,139,143-
144,146,161,163,179,199,211-212,222,254-
256,259,264,280,301,306,311,340,366,389,406-407,416-417,425,427,443-445,458,464-
465,481,497,500,512-515,524,541,543-545,548,554-555,563,587,593,616-617,625,631,636,646,648,666-
668,683,687,691,700,705,711,714,720,722,726,749,765,777,783,787,800-
801,808,843,873,880,888,898,900-903,911-912,981,987,990,992-993,995,999-
1002,1007,1009-1011,1021-1100,1102,1104-1108,1110-1114,1117,1119,1121-
1124,1126,1130-1132,1137-1138,1141,1145,1147-1149,1151-1152,1154,1163-
1166,1169,1174-1175,1183,1185-1187,1192,1198-1199,1201,1213,1216-1218,1233-1234,1236,1244,1247-1248,1259,1271-1272,1277,1287,1296,1300-1301,1309-1311,1322,1328,1334,1352,1417,1433-1434,1443,1455,1461,1494,1500-1501,1503,1521,1524,1533,1556,1580,1583,1594,1600,1641,1658,1666,1687-1688,1700,1717-1721,1723,1755,1761,1782-1783,1801,1805,1812,1839-1840,1862-1864,1875,1900,1914,1935,1947,1971-1972,1974,1984,1998-2010,2013,2020-2022,2030,2033-2035,2038,2040-2043,2045-2049,2065,2068,2099-2100,2103,2105-2107,2111,2119,2121,2126,2135,2144,2160-2161,2170,2179,2190-2191,2196,2200,2222,2251,2260,2288,2301,2323,2366,2381-2383,2393-2394,2399,2401,2492,2500,2522,2525,2557,2601-2602,2604-2605,2607-2608,2638,2701-2702,2710,2717-2718,2725,2800,2809,2811,2869,2875,2909-2910,2920,2967-2968,2998,3000-3001,3003,3005-3007,3011,3013,3017,3030-3031,3052,3071,3077,3128,3168,3211,3221,3260-3261,3268-3269,3283,3300-3301,3306,3322-3325,3333,3351,3367,3369-3372,3389-3390,3404,3476,3493,3517,3527,3546,3551,3580,3659,3689-3690,3703,3737,3766,3784,3800-3801,3809,3814,3826-3828,3851,3869,3871,3878,3880,3889,3905,3914,3918,3920,3945,3971,3986,3995,3998,4000-4006,4045,4111,4125-4126,4129,4224,4242,4279,4321,4343,4443-4446,4449,4550,4567,4662,4848,4899-4900,4998,5000-5004,5009,5030,5033,5050-5051,5054,5060-5061,5080,5087,5100-5102,5120,5190,5200,5214,5221-5222,5225-5226,5269,5280,5298,5357,5405,5414,5431-5432,5440,5500,5510,5544,5550,5555,5560,5566,5631,5633,5666,5678-5679,5718,5730,5800-5802,5810-5811,5815,5822,5825,5850,5859,5862,5877,5900-5904,5906-5907,5910-5911,5915,5922,5925,5950,5952,5959-5963,5987-5989,5998-6007,6009,6025,6059,6100-6101,6106,6112,6123,6129,6156,6346,6389,6502,6510,6543,6547,6565-6567,6580,6646,6666-6669,6689,6692,6699,6779,6788-6789,6792,6839,6881,6901,6969,7000-7002,7004,7007,7019,7025,7070,7100,7103,7106,7200-7201,7402,7435,7443,7496,7512,7625,7627,7676,7741,7777-7778,7800,7911,7920-7921,7937-7938,7999-8002,8007-8011,8021-8022,8031,8042,8045,8080-8090,8093,8099-8100,8180-8181,8192-8194,8200,8222,8254,8290-8292,8300,8333,8383,8400,8402,8443,8500,8600,8649,8651-8652,8654,8701,8800,8873,8888,8899,8994,9000-9003,9009-9011,9040,9050,9071,9080-9081,9090-9091,9099-9103,9110-9111,9200,9207,9220,9290,9415,9418,9485,9500,9502-9503,9535,9575,9593-9595,9618,9666,9876-9878,9898,9900,9917,9929,9943-9944,9968,9998-10004,10009-10010,10012,10024-10025,10082,10180,10215,10243,10566,10616-10617,10621,10626,10628-10629,10778,11110-11111,11967,12000,12174,12265,12345,13456,13722, 13782-
13783,14000,14238,14441-14442,15000,15002-15004,15660,15742,16000-
16001,16012,16016,16018,16080,16113,16992-16993,17877,17988,18040,18101,18988,19101,19283,19315,19350,19780,19801,19842,20
000,20005,20031,20221-20222,20828,21571,22939,23502,24444,24800,25734-
25735,26214,27000,27352-27353,27355-
27356,27715,28201,30000,30718,30951,31038,31337,32768-32785,33354,33899,34571-
34573,35500,38292,40193,40911,41511,42510,44176,44442-
44443,44501,45100,48080,49152-49161,49163,49165,49167,49175-49176,49400,49999-50003,50006,50300,50389,50500,50636,50800,51103,51493,52673,52822,52848,52869,54
045,54328,55055-55056,55555,55600,56737-
56738,57294,57797,58080,60020,60443,61532,61900,62078,63331,64623,64680,65000,65
129,65389', 'method': u'syn'}}, 'command_line': u'nmap -oX - -O -v
192.168.1.87'}, 'scan': {u'192.168.1.87': {'status': {'state': u'down',
'reason': u'no-response'}, 'hostname': '', 'vendor': {}, 'addresses': {u'ipv4':
u'192.168.1.87'}}}}
You can address this from two directions: what actions Nmap takes, and what you do with the output.
The Nmap options in your program (-O -v) instruct Nmap to do the following things:
Increase verbosity (-v). This doesn't matter for python-nmap because it uses the XML output, which doesn't change based on verbosity.
Check if the host is up (default).
Check for a reverse-DNS name for the host (default).
Scan the top 1000 TCP ports on the host (default).
Fingerprint the host's OS based on TCP/IP stack quirks (-O).
If all you want is whether the host is up, you should leave off the -O and use some other options to turn off the other parts of Nmap's default behavior:
-n will turn off reverse-DNS name resolution.
-sn will turn off the port scan.
The scan information like time will always be printed.
Secondly, your callback function currently just prints the string representation of the scan object. If you want less output, then use string formatting to select the object attributes that you want to print.

Vagrant usbfilter make the guest machine entered an invalid state

Based on the following instructions:
https://gist.github.com/dergachev/3866825#vagrant-setup
Ubuntu Linaro
uname -a
Linux ken-desktop 3.11.0-18-generic #32-Ubuntu SMP Tue Feb 18 21:11:14 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
cat /proc/version
Linux version 3.11.0-18-generic (buildd#toyol) (gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu8) ) #32-Ubuntu SMP Tue Feb 18 21:11:14 UTC 2014
Virtualbox-4.2
VBoxManage --version
4.2.16_Ubuntur86992
Vagrant 1.5 vagrant_1.5.0_x86_64.deb
In a cookbooks folder I cloned the following chef cookbooks:
git clone git://github.com/opscode-cookbooks/vim.git
git clone git://github.com/opscode-cookbooks/git.git
git clone git://github.com/opscode-cookbooks/apt.git
git clone git://github.com/tiokksar/chef-oh-my-zsh-solo.git
git clone git://github.com/opscode-cookbooks/openssl.git
git clone git://github.com/getaroom/chef-couchbase.git
I also installed this:
https://github.com/dotless-de/vagrant-vbguest
vagrant plugin install vagrant-vbguest
I try to make a nice Vagrantfile creating a precise64 VM with usb automatically mounted.
But each time I try to add an usbfilter on my virtualbox VM, I end up with that message:
% vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'hashicorp/precise64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'hashicorp/precise64' is up to date...
==> default: Setting the name of the VM: smartofficeVM_default_1395303674511_42792
==> default: The cookbook path '/home/ken/smartofficeVM/databags' doesn't exist. Ignoring...
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
==> default: Forwarding ports...
default: 3000 => 3000 (adapter 1)
default: 22 => 2222 (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
default: Error: Connection refused. Retrying...
default: Error: Connection refused. Retrying...
default: Error: Connection refused. Retrying...
default: Error: Connection refused. Retrying...
default: Error: Connection refused. Retrying...
The guest machine entered an invalid state while waiting for it
to boot. Valid states are 'starting, running'. The machine is in the
'poweroff' state. Please verify everything is configured
properly and try again.
If the provider you're using has a GUI that comes with it,
it is often helpful to open that and watch the machine, since the
GUI often has more helpful error messages than Vagrant can retrieve.
For example, if you're using VirtualBox, run `vagrant up` while the
VirtualBox GUI is open.
my configuration file is the following:
% cat Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
# All Vagrant configuration is done here. The most common configuration
# options are documented and commented below. For a complete reference,
# please see the online documentation at vagrantup.com.
# Every Vagrant virtual environment requires a box to build off of.
config.vm.box = "hashicorp/precise64"
# The url from where the 'config.vm.box' box will be fetched if it
# doesn't already exist on the user's system.
# config.vm.box_url = "http://domain.com/path/to/above.box"
# Create a forwarded port mapping which allows access to a specific port
# within the machine from a port on the host machine. In the example below,
# accessing "localhost:8080" will access port 80 on the guest machine.
# config.vm.network "forwarded_port", guest: 80, host: 8080
# Create a private network, which allows host-only access to the machine
# using a specific IP.
# config.vm.network "private_network", ip: "192.168.33.10"
# Create a public network, which generally matched to bridged network.
# Bridged networks make the machine appear as another physical device on
# your network.
config.vm.network "public_network"
# If true, then any SSH connections made will enable agent forwarding.
# Default value: false
# config.ssh.forward_agent = true
# Share an additional folder to the guest VM. The first argument is
# the path on the host to the actual folder. The second argument is
# the path on the guest to mount the folder. And the optional third
# argument is a set of non-required options.
# config.vm.synced_folder "../data", "/vagrant_data"
# Provider-specific configuration so you can fine-tune various
# backing providers for Vagrant. These expose provider-specific options.
# Example for VirtualBox:
#
# config.vm.provider "virtualbox" do |vb|
# # Don't boot with headless mode
# vb.gui = true
#
# # Use VBoxManage to customize the VM. For example to change memory:
# vb.customize ["modifyvm", :id, "--memory", "1024"]
# end
#
# View the documentation for the provider you're using for more
# information on available options.
# Enable provisioning with Puppet stand alone. Puppet manifests
# are contained in a directory path relative to this Vagrantfile.
# You will need to create the manifests directory and a manifest in
# the file hashicorp/precise32.pp in the manifests_path directory.
#
# An example Puppet manifest to provision the message of the day:
#
# # group { "puppet":
# # ensure => "present",
# # }
# #
# # File { owner => 0, group => 0, mode => 0644 }
# #
# # file { '/etc/motd':
# # content => "Welcome to your Vagrant-built virtual machine!
# # Managed by Puppet.\n"
# # }
#
# config.vm.provision "puppet" do |puppet|
# puppet.manifests_path = "manifests"
# puppet.manifest_file = "site.pp"
# end
# Enable provisioning with chef solo, specifying a cookbooks path, roles
# path, and data_bags path (all relative to this Vagrantfile), and adding
# some recipes and/or roles.
#
config.vm.provision "chef_solo" do |chef|
chef.cookbooks_path = "cookbooks"
#chef.roles_path = "../my-recipes/roles"
chef.data_bags_path = "databags"
#chef.add_role "web"
chef.add_recipe "apt"
chef.add_recipe "zsh"
chef.add_recipe "chef-oh-my-zsh-solo"
chef.add_recipe "vim"
chef.add_recipe "git"
chef.add_recipe "openssl"
chef.add_recipe "couchbase::server"
# setup users (from data_bags/users/*.json)
# chef.add_recipe "users::sysadmins" # creates users and sysadmin group
# chef.add_recipe "users"
# chef.add_recipe "users::sysadmin_sudo" # adds %sysadmin group to sudoers
# homesick_agent and its dependencies
# chef.add_recipe "root_ssh_agent::ppid" # maintains agent during 'sudo su root'
# chef.add_recipe "ssh_known_hosts"
# populates /etc/ssh/ssh_known_hosts from data_bags/ssh_known_hosts/*.json
# You may also specify custom JSON attributes:
#chef.json = { :users => "admin" }
chef.json = {
"couchbase" => {
"server"=> {
"password" => "123"
}
}
}
chef.log_level = :debug
end
# Enable provisioning with chef server, specifying the chef server URL,
# and the path to the validation key (relative to this Vagrantfile).
#
# The Opscode Platform uses HTTPS. Substitute your organization for
# ORGNAME in the URL and validation key.
#
# If you have your own Chef Server, use the appropriate URL, which may be
# HTTP instead of HTTPS depending on your configuration. Also change the
# validation key to validation.pem.
#
# config.vm.provision "chef_client" do |chef|
# chef.chef_server_url = "https://api.opscode.com/organizations/ORGNAME"
# chef.validation_key_path = "ORGNAME-validator.pem"
# end
#
# If you're using the Opscode platform, your validator client is
# ORGNAME-validator, replacing ORGNAME with your organization name.
#
# If you have your own Chef Server, the default validation client name is
# chef-validator, unless you changed the configuration.
#
# chef.validation_client_name = "ORGNAME-validator"
end
On detail is: If I remove the following lines, it's starting properly (but no usb available)
vb.customize ["modifyvm", :id, "--usb", "on"]
vb.customize ["modifyvm", :id, "--usbehci", "on"]
EDIT
Logs from Vlogs file
cat VBox.log
VirtualBox VM 4.2.16_Ubuntu r86992 linux.amd64 (Sep 21 2013 11:46:57) release log
00:00:00.033561 Log opened 2014-03-20T08:21:15.686771000Z
00:00:00.033570 OS Product: Linux
00:00:00.033572 OS Release: 3.11.0-18-generic
00:00:00.033575 OS Version: #32-Ubuntu SMP Tue Feb 18 21:11:14 UTC 2014
00:00:00.033610 DMI Product Name:
00:00:00.033624 DMI Product Version:
00:00:00.033756 Host RAM: 3882MB total, 3328MB available
00:00:00.033763 Executable: /usr/lib/virtualbox/VBoxHeadless
00:00:00.033765 Process ID: 10288
00:00:00.033767 Package type: LINUX_64BITS_GENERIC (OSE)
00:00:00.039722 Installed Extension Packs:
00:00:00.039747 VNC (Version: 4.2.16 r86992; VRDE Module: VBoxVNC)
00:00:00.046777 SUP: Loaded VMMR0.r0 (/usr/lib/virtualbox/VMMR0.r0) at 0xffffffffa0518020 - ModuleInit at ffffffffa052e0f0 and ModuleTerm at ffffffffa052e390
00:00:00.046820 SUP: VMMR0EntryEx located at ffffffffa052f510, VMMR0EntryFast at ffffffffa052f240 and VMMR0EntryInt at ffffffffa052f230
00:00:00.049809 OS type: 'Ubuntu_64'
00:00:00.073143 File system of '/home/ken/VirtualBox VMs/smartofficeVM_default_1395303674511_42792/Snapshots' (snapshots) is unknown
00:00:00.073166 File system of '/home/ken/VirtualBox VMs/smartofficeVM_default_1395303674511_42792/box-disk1.vmdk' is ext4
00:00:00.091096 VMSetError: /build/buildd/virtualbox-4.2.16-dfsg/src/VBox/Main/src-client/ConsoleImpl2.cpp(2300) int Console::configConstructorInner(PVM, util::AutoWriteLock*); rc=VERR_NOT_FOUND
00:00:00.091111 VMSetError: Implementation of the USB 2.0 controller not found!
00:00:00.091113 Because the USB 2.0 controller state is part of the saved VM state, the VM cannot be started. To fix this problem, either install the 'Oracle VM VirtualBox Extension Pack' or disable USB 2.0 support in the VM settings
00:00:00.217513 ERROR [COM]: aRC=NS_ERROR_FAILURE (0x80004005) aIID={db7ab4ca-2a3f-4183-9243-c1208da92392} aComponent={Console} aText={Implementation of the USB 2.0 controller not found!
00:00:00.217535 Because the USB 2.0 controller state is part of the saved VM state, the VM cannot be started. To fix this problem, either install the 'Oracle VM VirtualBox Extension Pack' or disable USB 2.0 support in the VM settings (VERR_NOT_FOUND)}, preserve=false
00:00:00.224473 Power up failed (vrc=VERR_NOT_FOUND, rc=NS_ERROR_FAILURE (0X80004005))
VAGRANT= debug vragant up log
http://pastebin.com/2GMhmy9T
Anybody has some expertise on the topic?
Thank you very much.
SOLUTION: I though it was already installed... when reading : 00:00:00.039722 Installed Extension Packs: 00:00:00.039747 VNC (Version: 4.2.16 r86992; VRDE Module: VBoxVNC) But in fact I have to install the extension guest pack on the Host too. It's a bit confusing. thank you very much. You can add a proper answer, I ll validate it.
The following line in VBox logs:
00:00:00.217535 Because the USB 2.0 controller state is part of the saved VM state, the VM cannot be started. To fix this problem, either install the 'Oracle VM VirtualBox Extension Pack' or disable USB 2.0 support in the VM settings (VERR_NOT_FOUND)}, preserve=false
Highlights that you have to install the VirtualBox Extension Pack in order to fix the issue.
Download and install VirtualBox extension pack from there (according to your VirtualBox version). It may solve your problem.