How to get the specific content from the log file described below? - regex

I have a log file generated by nmap, which looks something like this:
Nmap scan report for gateway (10.0.0.1)
Host is up (0.0060s latency).
MAC Address: 10:BE:F5:FC:9C:65 (D-Link International)
Nmap scan report for 10.0.0.2
Host is up (0.055s latency).
MAC Address: 7C:78:7E:E8:1C:2A (Samsung Electronics)
Nmap scan report for 10.0.0.3
Host is up (0.059s latency).
MAC Address: 54:60:09:83:6E:B6 (Google)
Nmap scan report for 10.0.0.200
Host is up (-0.093s latency).
MAC Address: 5C:B9:01:02:5F:D8 (Hewlett Packard)
Nmap scan report for manoj-notebook (10.0.0.4)
Host is up.
Nmap done: 256 IP addresses (5 hosts up) scanned in 16.84 seconds
It keeps changing as new devices connect to the network or existing devices disconnect from it. I want to fetch the IP address (example: 10.0.0.1), the MAC address (example: 10:BE:F5:FC:9C:65) and the device name (example: D-Link International) into a single list, something like:
result = [['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.200', '10.0.0.4'], ['10:BE:F5:FC:9C:65', '7C:78:7E:E8:1C:2A', '54:60:09:83:6E:B6', '5C:B9:01:02:5F:D8'], ['D-Link International', 'Samsung Electronics', 'Google', 'Hewlett Packard']]
I tried the following regular expression to match IP address, MAC Address and Device name:
ipPattern = re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', temp)
macPattern = re.findall(r'(?:.*?s: ){2}(.*)(?= \))', temp)
devicePattern = re.findall(r'(?:.*?\(){2}(.*)(?=\))', temp)
I'm able to match the IP Address but unable to match mac address and device name. How to match the same and store it in a single list? Thank you.
Also if I could get a pattern to fetch latency from the log file example: 0.0060s it would be a cherry on top. Thank you.

You can use the following expressions:
ipPattern : \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
macPattern : (?:[0-9A-F]{2}:){2,}[0-9A-F]{2}\b
(?:[0-9A-F]{2}:){2,} Non-capturing group matching two or more pairs of uppercase hexadecimal digits, each pair followed by :.
[0-9A-F]{2}\b Final pair of hexadecimal digits, followed by a word boundary.
devicePattern : (?<=\()[^)0-9.]*(?=\))
(?<=\() Positive lookbehind for an opening bracket (.
[^)0-9.]* Negated character set, matches anything that is not a ) or . or digits.
(?=\)) Positive lookahead for ).
latency : -?\d+\.\d+s(?=\slatency)
-?\d+\.\d+s Match - optionally, digits, full stop, more digits and s.
(?=\slatency) Positive lookahead, asserts that a whitespace character and the word latency follow.
Python snippet:
import re
import itertools
temp = """
b'\nStarting Nmap 7.60 ( https://nmap.org ) at 2018-08-03 19:44 IST\nNmap scan report for gateway (10.0.0.1)\nHost is up (0.0070s latency).\nMAC Address: 10:BE:F5:FC:9C:65 (D-Link International)\nNmap scan report for 10.0.0.3\nHost is up (0.11s latency).\nMAC Address: 54:60:09:83:6E:B6 (Google)\nNmap scan report for 10.0.0.5\nHost is up (0.11s latency).\nMAC Address: 7C:78:7E:A4:73:8C (Samsung Electronics)\nNmap scan report for 10.0.0.200\nHost is up (0.027s latency).\nMAC Address: 5C:B9:01:02:5F:D8
"""
ipPattern = re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', temp)
macPattern= re.findall(r'(?:[0-9A-F]{2}:){2,}[0-9A-F]{2}\b',temp)
devicePattern = re.findall(r'(?<=\()[^)0-9.]*(?=\))',temp)
latency = re.findall(r'-?\d+\.\d+s(?=\slatency)',temp)
print(ipPattern)
print(macPattern)
print(devicePattern)
print(latency)
Prints:
['10.0.0.1', '10.0.0.3', '10.0.0.5', '10.0.0.200']
['10:BE:F5:FC:9C:65', '54:60:09:83:6E:B6', '7C:78:7E:A4:73:8C', '5C:B9:01:02:5F:D8']
['D-Link International', 'Google', 'Samsung Electronics']
['0.0070s', '0.11s', '0.11s', '0.027s']
For joining in a single list use:
mylist = itertools.chain([ipPattern], [macPattern], [devicePattern], [latency])
print(list(mylist))
Prints:
[['10.0.0.1', '10.0.0.3', '10.0.0.5', '10.0.0.200'], ['10:BE:F5:FC:9C:65', '54:60:09:83:6E:B6', '7C:78:7E:A4:73:8C', '5C:B9:01:02:5F:D8'], ['D-Link International', 'Google', 'Samsung Electronics'], ['0.0070s', '0.11s', '0.11s', '0.027s']]
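Note that the separate findall calls return parallel lists that fall out of step when a host has no MAC line or no vendor (as with manoj-notebook in the question). A minimal sketch of one alternative that groups the fields per host, line by line, follows. The function name parse_hosts and the dictionary keys are illustrative choices, not part of the answer above, and the sample is a shortened copy of the question's log.

```python
import re

# Shortened sample taken from the log in the question.
LOG = """\
Nmap scan report for gateway (10.0.0.1)
Host is up (0.0060s latency).
MAC Address: 10:BE:F5:FC:9C:65 (D-Link International)
Nmap scan report for 10.0.0.2
Host is up (0.055s latency).
MAC Address: 7C:78:7E:E8:1C:2A (Samsung Electronics)
Nmap scan report for manoj-notebook (10.0.0.4)
Host is up.
Nmap done: 256 IP addresses (5 hosts up) scanned in 16.84 seconds
"""

IP_RE = re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}\b')
# The vendor in brackets is optional, so it is wrapped in an optional group.
MAC_RE = re.compile(r'MAC Address: ((?:[0-9A-F]{2}:){5}[0-9A-F]{2})(?: \((.*)\))?')
LATENCY_RE = re.compile(r'\((-?\d+\.\d+s) latency\)')


def parse_hosts(log):
    """Return one dict per 'Nmap scan report' block (hypothetical helper)."""
    hosts = []
    for line in log.splitlines():
        if line.startswith('Nmap scan report for'):
            ip = IP_RE.search(line)
            # Start a new host record; missing fields stay None.
            hosts.append({'ip': ip.group(0) if ip else None,
                          'mac': None, 'vendor': None, 'latency': None})
        elif hosts and line.startswith('Host is up'):
            m = LATENCY_RE.search(line)
            if m:
                hosts[-1]['latency'] = m.group(1)
        elif hosts and line.startswith('MAC Address:'):
            m = MAC_RE.match(line)
            if m:
                hosts[-1]['mac'] = m.group(1)
                hosts[-1]['vendor'] = m.group(2)
    return hosts


for host in parse_hosts(LOG):
    print(host)
```

Each record keeps its own IP, MAC, vendor and latency together, so a host without a MAC line simply has None in those fields instead of shifting the other lists.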

Related

Grepping two patterns from event logs

I am seeking to extract timestamps and ip addresses out of log entries containing a varying amount of information. The basic structure of a log entry is:
<timestamp>, <token_1>, <token_2>, ... ,<token_n>, <ip_address> <token_n+2>, <token_n+3>, ... ,<token_n+m>,-
The number of tokens n between the timestamp and ip address varies considerably.
I have been studying regular expressions and am able to grep timestamps as follows:
grep -o "[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}"
And ip addresses:
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
But I have not been able to grep both patterns out of log entries which contain both. Every log entry contains a timestamp, but not every entry contains an ip address.
Input:
2021-04-02T09:06:44.248878+00:00,Creation Time,EVT,WinEVTX,[4624 / 0x1210] Source Name: Microsoft-Windows-Security-Auditing Message string: An account was successfully logged on.\n\nSubject:\n\tSecurity ID:\t\tS-1-5-18\n\tAccount Name:\t\tREDACTED$\n\tAccount Domain:\t\tREDACTED\n\tLogon ID:\t\tREDACTED\n\nLogon Type:\t\t\t10\n\nNew Logon:\n\tSecurity ID:\t\tREDACTED\n\tAccount Name:\t\tREDACTED\n\tAccount Domain:\t\tREDACTED\n\tLogon ID:\t\REDACTED\n\tLogon GUID:\t\tREDACTED\n\nProcess Information:\n\tProcess ID:\t\tREDACTED\n\tProcess Name:\t\tC:\Windows\System32\winlogon.exe\n\nNetwork Information:\n\tWorkstation:\tREDACTED\n\tSource Network Address:\t255.255.255.255\n\tSource Port:\t\t0\n\nDetailed Authentication Information:\n\tLogon Process:\t\tUser32 \n\tAuthentication Package:\tNegotiate\n\tTransited Services:\t-\n\tPackage Name (NTLM only):\t-\n\tKey Length:\t\t0\n\nThis event is generated when a logon session is created. It is generated on the computer that was accessed.\n\nThe subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service or a local process such as Winlogon.exe or Services.exe.\n\nThe logon type field indicates the kind of logon that occurred. The most common types are 2 (interactive) and 3 (network).\n\nThe New Logon fields indicate the account for whom the new logon was created i.e. the account that was logged on.\n\nThe network fields indicate where a remote logon request originated. 
Workstation name is not always available and may be left blank in some cases.\n\nThe authentication information fields provide detailed information about this specific logon request.\n\t- Logon GUID is a unique identifier that can be used to correlate this event with a KDC event.\n\t- Transited services indicate which intermediate services have participated in this logon request.\n\t- Package name indicates which sub-protocol was used among the NTLM protocols.\n\t- Key length indicates the length of the generated session key. This will be 0 if no session key was requested. Strings: ['S-1-5-18' 'DEVICE_NAME$' 'NETWORK' 'REDACTED' 'REDACTED' 'USERNAME' 'WORKSTATION' 'REDACTED' '10' 'User32 ' 'Negotiate' 'REDACTED' '{REDACTED}' '-' '-' '0' 'REDACTED' 'C:\\Windows\\System32\\winlogon.exe' '255.255.255.255' '0' '%%1833'] Computer Name: REDACTED Record Number: 1068355 Event Level: 0,winevtx,OS:REDACTED,-
Desired Output:
2021-04-02T09:06:44, 255.255.255.255
$ sed -En 's/.*([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}).*[^0-9]([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/\1, \2/p' file
2021-04-02T09:06:44, 255.255.255.255
Your regexps can be reduced by removing some of the explicit repetition though:
$ sed -En 's/.*([0-9]{4}(-[0-9]{2}){2}T([0-9]{2}:){2}[0-9]{2}).*[^0-9](([0-9]{1,3}\.){3}[0-9]{1,3}).*/\1, \4/p' file
2021-04-02T09:06:44, 255.255.255.255
It could be simpler still if all of the lines in your log file start with a timestamp:
$ sed -En 's/([^,.]+).*[^0-9](([0-9]{1,3}\.){3}[0-9]{1,3}).*/\1, \2/p' file
2021-04-02T09:06:44, 255.255.255.255
If you are looking for lines that contain both patterns, it may be easiest to do it in two separate searches.
If you're searching your log file for lines that contain both "dog" and "cat", it's usually easiest to do this:
grep dog filename.txt | grep cat
The grep dog will find all lines in the file that match "dog", and then the grep cat will search all those lines for "cat".
You seem not to know the meaning of the "-o" switch.
Regular "grep" (without "-o") means: give the entire line where the pattern can be found. Adding "-o" means: only show the pattern.
Combining two "grep" in a logical AND-clause can be done using a pipe "|", so you can do this:
grep <pattern1> <filename> | grep <pattern2>
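Because not every entry contains an IP address, it can also help to process each line separately and report a missing address explicitly. The following is a minimal Python sketch of that idea, not one of the answers above: the function name extract is hypothetical, it assumes every line starts with the timestamp, and it takes the last address on the line, as the greedy sed expressions do.

```python
import re

# Timestamp at the start of the line, truncated to whole seconds.
TS_RE = re.compile(r'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}')
# Dotted quad that is not part of a longer run of digits and dots.
IP_RE = re.compile(r'(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])')


def extract(line):
    """Return (timestamp, ip_or_None), or None if the line has no timestamp."""
    ts = TS_RE.match(line)
    if ts is None:
        return None
    ips = IP_RE.findall(line)
    # Take the last address on the line; None when the entry has no address.
    return ts.group(0), (ips[-1] if ips else None)


# Shortened stand-ins for real log entries (raw strings keep the literal \t and \n).
with_ip = r'2021-04-02T09:06:44.248878+00:00,Creation Time,EVT,Source Network Address:\t255.255.255.255\n\tSource Port:\t\t0,winevtx,-'
without_ip = '2021-04-02T09:06:45.000000+00:00,Creation Time,EVT,winevtx,-'

for entry in (with_ip, without_ip):
    print(extract(entry))
```

Lines without an address still yield their timestamp, which the piped grep approach would drop.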

Is there a way to record multiple lines at once with TextFSM?

I want to use TextFSM to parse the output of the Check Point firewall command cphaprob -a if, executed via Netmiko. The final generated list is not well formatted.
I have already tried many combinations of TextFSM commands, but maybe I just fail to understand how it works.
Original command output
Below is the original cphaprob -a if output. I want to parse the virtual context (e.g. 'vcont 0'), interface names (e.g. 'bond0'), virtual interfaces (e.g. 'bond0.2121') and their IP addresses (e.g. '10.105.0.42').
vcont 0:
------
Required interfaces: 2
Required secured interfaces: 1
eth0 UP non sync(non secured), multicast
eth1 UP sync(secured), broadcast
Virtual cluster interfaces: 1
eth0 10.105.0.42
vcont 1:
------
Required interfaces: 3
Required secured interfaces: 1
eth1 UP sync(secured), broadcast
bond0 UP non sync(non secured), multicast, bond Load Sharing (bond0.2101)
bond1 UP non sync(non secured), multicast, bond Load Sharing (bond1.2126)
Virtual cluster interfaces: 3
bond0.2121 10.65.29.21
bond1.2122 10.65.29.22
bond1.2123 10.65.29.23
vcont 2:
------
Required interfaces: 3
Required secured interfaces: 1
eth1 UP sync(secured), broadcast
bond1 UP non sync(non secured), multicast, bond Load Sharing (bond1.2127)
bond0 UP non sync(non secured), multicast, bond Load Sharing (bond0.2102)
Virtual cluster interfaces: 2
bond1.4242 10.65.29.42
bond0.4243 10.65.29.43
TextFSM template
# template for ```cphaprob -a if``` command.
Value Context (\S+\s\d+)
Value List Interface (\S+)
Value List VirtualInterface (\S+)
Value List IPv4 (\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})

Start
  ^${Context}:
  ^${Interface}.*(UP|DOWN|Disconnected)
  ^Virtual cluster interfaces: \d+ -> Cluster

Cluster
  ^${VirtualInterface}\s+${IPv4} -> Record Start
Expected results
$ python tests/test_checkpoint_functions.py
[['vcont 0', ['eth0', 'eth1'], ['eth0'], ['10.105.0.42']],
['vcont 1', ['eth1', 'bond0', 'bond1'], ['bond0.2121', 'bond1.2122', 'bond1.2123'], ['10.65.29.21', '10.65.29.22', '10.65.29.23']],
['vcont 2', ['eth1', 'bond1', 'bond0'], ['bond1.4242', 'bond0.4243'], ['10.65.29.42', '10.65.29.43']]]
Actual results
$ python tests/test_checkpoint_functions.py
[['vcont 0', ['eth0', 'eth1'], ['eth0'], ['10.105.0.42']],
['vcont 1', ['eth1', 'bond0', 'bond1'], ['bond0.2121'], ['10.65.29.21']],
['vcont 2', ['eth1', 'bond1', 'bond0'], ['bond1.4242'], ['10.65.29.42']]]
As you can see I only get the 1st occurrence of the virtual interfaces and their corresponding IP addresses. The reason may be that in my template in Cluster state I record right after ^${VirtualInterface}\s+${IPv4} -> Record Start. I just can't figure out how to get all virtual interfaces and IP addresses in their corresponding lists.
Value Context (\S+\s\d+)
Value List Interface (\S+)
Value List VirtualInterface (\S+)
# Add escaping for "."
Value List IPv4 (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})

Start
  ^${Context}:
  ^${Interface}.*(UP|DOWN|Disconnected)
  ^Virtual cluster interfaces: \d+ -> Cluster

Cluster
  ^${VirtualInterface}\s+${IPv4}
  # You only get the first VirtualInterface and IPv4 entry because
  # you Record (and go back to Start) after the first match.
  # Instead, you can Record after you have matched all entries.
  # I look for what I know will be the start of the next entry in the table,
  # and use Continue.Record.
  # Continue keeps looking for matches for the line currently being parsed,
  # but moves on to the next rule instead of starting back at the top of the state.
  # Continue cannot be combined with a state change,
  # which is why the change to the Start state happens in the following rule.
  ^.+: -> Continue.Record
  ^${Context}: -> Start
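The same "record when the next header starts" idea can be illustrated without TextFSM. The following is a plain-Python sketch under stated assumptions, not a replacement for the template: the function name parse and the three regular expressions are illustrative, and the sample is a shortened copy of the command output from the question.

```python
import re

# First two contexts from the command output in the question.
OUTPUT = """\
vcont 0:
------
Required interfaces: 2
Required secured interfaces: 1
eth0 UP non sync(non secured), multicast
eth1 UP sync(secured), broadcast
Virtual cluster interfaces: 1
eth0 10.105.0.42
vcont 1:
------
Required interfaces: 3
Required secured interfaces: 1
eth1 UP sync(secured), broadcast
bond0 UP non sync(non secured), multicast, bond Load Sharing (bond0.2101)
bond1 UP non sync(non secured), multicast, bond Load Sharing (bond1.2126)
Virtual cluster interfaces: 3
bond0.2121 10.65.29.21
bond1.2122 10.65.29.22
bond1.2123 10.65.29.23
"""

CONTEXT_RE = re.compile(r'^(\S+\s\d+):')
IFACE_RE = re.compile(r'^(\S+).*(?:UP|DOWN|Disconnected)')
VIRTUAL_RE = re.compile(r'^(\S+)\s+(\d{1,3}(?:\.\d{1,3}){3})$')


def parse(output):
    """Return [context, interfaces, virtual_interfaces, ipv4s] per context."""
    records, current = [], None
    for raw in output.splitlines():
        line = raw.strip()
        m = CONTEXT_RE.match(line)
        if m:
            # A new header closes the previous record (the "Record" step).
            if current is not None:
                records.append(current)
            current = [m.group(1), [], [], []]
            continue
        if current is None:
            continue
        m = VIRTUAL_RE.match(line)
        if m:
            current[2].append(m.group(1))
            current[3].append(m.group(2))
            continue
        m = IFACE_RE.match(line)
        if m:
            current[1].append(m.group(1))
    # The last record is closed at end of input.
    if current is not None:
        records.append(current)
    return records


for record in parse(OUTPUT):
    print(record)
```

The record is only appended when the next "vcont N:" header (or the end of the input) is reached, which is why all virtual interfaces of a context end up in the same list.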

indexing output of linux command in c++

I want to get the IP address for a given URL.
I am currently using this
std::string i;
std::string pingStr = (std::string)"nslookup " +"www.yahoo.com" ;
i = system (pingStr.c_str());
but the output is
Server: 127.0.1.1
Address: 127.0.1.1#53
Non-authoritative answer:
www.yahoo.com canonical name = atsv2-fp-shed.wg1.b.yahoo.com.
Name: atsv2-fp-shed.wg1.b.yahoo.com
Address: 106.10.250.10
Q: Is there any way I can get only the IP address?
Use the getaddrinfo(3) function to look up IP addresses, IPv4 or IPv6, in usable form.
You can use the following command:
nslookup www.yahoo.com | grep Address: | sed -n 2p
grep Address: keeps only the lines that contain "Address:".
sed -n 2p prints the second of those two lines.
You can then strip the "Address:" prefix from that line in C++.
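The getaddrinfo(3) suggestion above avoids parsing nslookup output altogether. As a rough illustration, here it is in Python rather than C++, using socket.getaddrinfo, which wraps the same system call. The function name resolve is hypothetical, and the sketch only collects the address strings.

```python
import socket


def resolve(hostname):
    """Return the unique IP addresses (IPv4 and IPv6) found for hostname."""
    # Restricting to TCP avoids one duplicate entry per socket type.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is (ip, port) or (ip, port, flow, scope)
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# A numeric address resolves without any DNS lookup.
print(resolve("127.0.0.1"))
# For a real name, e.g. resolve("www.yahoo.com"), the result depends on DNS.
```

In C++ the equivalent is to call getaddrinfo directly and walk the returned addrinfo list, converting each address with getnameinfo or inet_ntop.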

Simple Python Regex to separate the words

I have the following router output:
config t
Enter configuration commands, one per line. End with CNTL/Z.
n7k(config)# port-profile demo_ethernet
n7k(config-port-prof)# ?
bandwidth Set bandwidth informational parameter
beacon Disable/enable the beacon for an interface
cdp Configure CDP interface parameters
channel-group Configure port channel parameters
delay Specify interface throughput delay
description Enter port-profile description of maximum 80 characters
From the above output, I want this:
['bandwidth' , 'beacon' , 'cdp' , 'channel-group' , 'delay' , 'description']
I am trying
m = re.compile('(\w+\s\w+)')
n = m.findall(buffer)
And getting this output:
['config t', 'Enter configuration', 'one per', 'End with', 'profile demo_ethernet', 'Set bandwidth', 'informational parameter', 'enable the', 'beacon for', 'an interface', 'Configure CDP', 'interface parameters', 'Configure port', 'channel parameters', 'Specify interface', 'throughput delay', 'Enter port', 'profile description', 'of maximum', '80 characters']
This regex should work (assuming every line you want to match starts with whitespace):
^\s+([\w-]+)\s+.+$
Python code (For simplicity I have taken all input in a single string)
import re
p = re.compile(r'^\s+([\w-]+)\s+.+$', re.MULTILINE)
test_str = "config t\nEnter configuration commands, one per line. End with CNTL/Z.\nn7k(config)# port-profile demo_ethernet\nn7k(config-port-prof)# ?\n bandwidth Set bandwidth informational parameter\n beacon Disable/enable the beacon for an interface\n cdp Configure CDP interface parameters\n channel-group Configure port channel parameters\n delay Specify interface throughput delay\n description Enter port-profile description of maximum 80 characters"
print(re.findall(p, test_str))

How to extrapolate data from an nmap scan result

I'm still quite new to Python and I'm currently looking at network scanning for available hosts. With my current code, I can search an IP range to determine whether hosts are available. However, how can I restrict what information the nmap scan results show me, or is there a function I should be using to show only the host IP address, the scan time and whether it's available?
#!/usr/bin/env python
# Python 2 syntax (print statements), as in the original question.
import nmap
import sys
nm = nmap.PortScannerAsync()
def callback_result(host, scan_result):
    print '------------------'
    print host, scan_result
try:
    nm.scan('192.168.1.86-87', arguments='-O -v', callback=callback_result)
    while nm.still_scanning():
        print('<<< Scanning >>>')
        nm.wait(2)
except KeyboardInterrupt:
    print 'Cancelling current operation'
    sys.exit()
except KeyError as e:
    pass
This produces output that is too broad and contains too much information:
192.168.1.87 {'nmap': {'scanstats': {'uphosts': u'0', 'timestr': u'Wed Apr 8 13:28:29 2015', 'downhosts': u'1', 'totalhosts': u'1', 'elapsed': u'3.77'},
'scaninfo': {u'tcp': {'services': u'1,3-4,6-7,9,13,17,19-26,30,32-33,37,42-43,49,53,70,79-85,88-90,99-100,106,109-111,113,119,125,135,139,143-
144,146,161,163,179,199,211-212,222,254-
256,259,264,280,301,306,311,340,366,389,406-407,416-417,425,427,443-445,458,464-
465,481,497,500,512-515,524,541,543-545,548,554-555,563,587,593,616-617,625,631,636,646,648,666-
668,683,687,691,700,705,711,714,720,722,726,749,765,777,783,787,800-
801,808,843,873,880,888,898,900-903,911-912,981,987,990,992-993,995,999-
1002,1007,1009-1011,1021-1100,1102,1104-1108,1110-1114,1117,1119,1121-
1124,1126,1130-1132,1137-1138,1141,1145,1147-1149,1151-1152,1154,1163-
1166,1169,1174-1175,1183,1185-1187,1192,1198-1199,1201,1213,1216-1218,1233-1234,1236,1244,1247-1248,1259,1271-1272,1277,1287,1296,1300-1301,1309-1311,1322,1328,1334,1352,1417,1433-1434,1443,1455,1461,1494,1500-1501,1503,1521,1524,1533,1556,1580,1583,1594,1600,1641,1658,1666,1687-1688,1700,1717-1721,1723,1755,1761,1782-1783,1801,1805,1812,1839-1840,1862-1864,1875,1900,1914,1935,1947,1971-1972,1974,1984,1998-2010,2013,2020-2022,2030,2033-2035,2038,2040-2043,2045-2049,2065,2068,2099-2100,2103,2105-2107,2111,2119,2121,2126,2135,2144,2160-2161,2170,2179,2190-2191,2196,2200,2222,2251,2260,2288,2301,2323,2366,2381-2383,2393-2394,2399,2401,2492,2500,2522,2525,2557,2601-2602,2604-2605,2607-2608,2638,2701-2702,2710,2717-2718,2725,2800,2809,2811,2869,2875,2909-2910,2920,2967-2968,2998,3000-3001,3003,3005-3007,3011,3013,3017,3030-3031,3052,3071,3077,3128,3168,3211,3221,3260-3261,3268-3269,3283,3300-3301,3306,3322-3325,3333,3351,3367,3369-3372,3389-3390,3404,3476,3493,3517,3527,3546,3551,3580,3659,3689-3690,3703,3737,3766,3784,3800-3801,3809,3814,3826-3828,3851,3869,3871,3878,3880,3889,3905,3914,3918,3920,3945,3971,3986,3995,3998,4000-4006,4045,4111,4125-4126,4129,4224,4242,4279,4321,4343,4443-4446,4449,4550,4567,4662,4848,4899-4900,4998,5000-5004,5009,5030,5033,5050-5051,5054,5060-5061,5080,5087,5100-5102,5120,5190,5200,5214,5221-5222,5225-5226,5269,5280,5298,5357,5405,5414,5431-5432,5440,5500,5510,5544,5550,5555,5560,5566,5631,5633,5666,5678-5679,5718,5730,5800-5802,5810-5811,5815,5822,5825,5850,5859,5862,5877,5900-5904,5906-5907,5910-5911,5915,5922,5925,5950,5952,5959-5963,5987-5989,5998-6007,6009,6025,6059,6100-6101,6106,6112,6123,6129,6156,6346,6389,6502,6510,6543,6547,6565-6567,6580,6646,6666-6669,6689,6692,6699,6779,6788-6789,6792,6839,6881,6901,6969,7000-7002,7004,7007,7019,7025,7070,7100,7103,7106,7200-7201,7402,7435,7443,7496,7512,7625,7627,7676,7741,7777-7778,7800,7911,7920-7921,7937-7938,7999-8002,8007-8011,8021-8022,8031,8042,8045,8080-8090,8093,8099-8100,8180-8181,
8192-8194,8200,8222,8254,8290-8292,8300,8333,8383,8400,8402,8443,8500,8600,8649,8651-8652,8654,8701,8800,8873,8888,8899,8994,9000-9003,9009-9011,9040,9050,9071,9080-9081,9090-9091,9099-9103,9110-9111,9200,9207,9220,9290,9415,9418,9485,9500,9502-9503,9535,9575,9593-9595,9618,9666,9876-9878,9898,9900,9917,9929,9943-9944,9968,9998-10004,10009-10010,10012,10024-10025,10082,10180,10215,10243,10566,10616-10617,10621,10626,10628-10629,10778,11110-11111,11967,12000,12174,12265,12345,13456,13722, 13782-
13783,14000,14238,14441-14442,15000,15002-15004,15660,15742,16000-
16001,16012,16016,16018,16080,16113,16992-16993,17877,17988,18040,18101,18988,19101,19283,19315,19350,19780,19801,19842,20
000,20005,20031,20221-20222,20828,21571,22939,23502,24444,24800,25734-
25735,26214,27000,27352-27353,27355-
27356,27715,28201,30000,30718,30951,31038,31337,32768-32785,33354,33899,34571-
34573,35500,38292,40193,40911,41511,42510,44176,44442-
44443,44501,45100,48080,49152-49161,49163,49165,49167,49175-49176,49400,49999-50003,50006,50300,50389,50500,50636,50800,51103,51493,52673,52822,52848,52869,54
045,54328,55055-55056,55555,55600,56737-
56738,57294,57797,58080,60020,60443,61532,61900,62078,63331,64623,64680,65000,65
129,65389', 'method': u'syn'}}, 'command_line': u'nmap -oX - -O -v
192.168.1.87'}, 'scan': {u'192.168.1.87': {'status': {'state': u'down',
'reason': u'no-response'}, 'hostname': '', 'vendor': {}, 'addresses': {u'ipv4':
u'192.168.1.87'}}}}
You can address this from two directions: what actions Nmap takes, and what you do with the output.
The Nmap options in your program (-O -v) instruct Nmap to do the following things:
Increase verbosity (-v). This doesn't matter for python-nmap because it uses the XML output, which doesn't change based on verbosity.
Check if the host is up (default).
Check for a reverse-DNS name for the host (default).
Scan the top 1000 TCP ports on the host (default).
Fingerprint the host's OS based on TCP/IP stack quirks (-O).
If all you want is whether the host is up, you should leave off the -O and use some other options to turn off the other parts of Nmap's default behavior:
-n will turn off reverse-DNS name resolution.
-sn will turn off the port scan.
The scan information like time will always be printed.
Secondly, your callback function currently just prints the string representation of the scan object. If you want less output, then use string formatting to select the object attributes that you want to print.
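As a sketch of the second point, the callback can pick out only the fields of interest from the result dictionary. The function name summarize is hypothetical, and the sample dictionary is a trimmed copy of the output shown in the question, so the key names are assumed to match what python-nmap passes to the callback.

```python
def summarize(host, scan_result):
    """Return a one-line summary: host, state, elapsed time and timestamp."""
    stats = scan_result['nmap']['scanstats']
    # A host may be missing from 'scan'; fall back to 'unknown' in that case.
    host_info = scan_result['scan'].get(host, {})
    state = host_info.get('status', {}).get('state', 'unknown')
    return '{0} is {1} (scan took {2}s, finished {3})'.format(
        host, state, stats['elapsed'], stats['timestr'])


# Trimmed copy of the result shown in the question.
sample = {
    'nmap': {'scanstats': {'uphosts': '0', 'timestr': 'Wed Apr 8 13:28:29 2015',
                           'downhosts': '1', 'totalhosts': '1', 'elapsed': '3.77'}},
    'scan': {'192.168.1.87': {'status': {'state': 'down', 'reason': 'no-response'},
                              'hostname': '', 'vendor': {},
                              'addresses': {'ipv4': '192.168.1.87'}}},
}

print(summarize('192.168.1.87', sample))
```

Inside callback_result, printing summarize(host, scan_result) in place of the whole dictionary would then give one short line per host; combined with arguments='-sn -n' in the scan call, the scan itself also does less work.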