Sorting logs using regex? - regex

I'm trying to figure out how to sort logs, for example:
User: test
Level: user
Domain: localhost
Time: 12pm
Blah: INFO
Date: 07-12-2016
Ip: 127.0.0.1
I would like the output text to be this (the fields are separated by tab spaces):
User:Level:Domain:Time:Blah:Date:IP

If I understand your question correctly, you're talking not about sorting but about parsing: you have log strings that you want to convert to another format. The regex to match your log string would be
(?P<User>[^:]+):(?P<Level>[^:]+):(?P<Domain>[^:]+):(?P<Time>[^:]+):(?P<Blah>[^:]+):(?P<Date>[^:]+):(?P<IP>[^:]+)
However, since you have so many groups, the pattern can be built much more efficiently. Here's an example in Python:
import re

logString = "User:Level:Domain:Time:Blah:Date:IP"
logGroups = ["User", "Level", "Domain", "Time", "Blah", "Date", "IP"]

# Build one named group per field, separated by colons
reLogGroups = "(?P<" + ">[^:]+):(?P<".join(logGroups) + ">[^:]+)"

matchLogGroups = re.search(reLogGroups, logString)
if matchLogGroups:
    counter = 1
    for logGroup in logGroups:
        print(str(counter) + ". " + logGroup + ": " + matchLogGroups.group(logGroup))
        counter += 1
The output is
1. User: User
2. Level: Level
3. Domain: Domain
4. Time: Time
5. Blah: Blah
6. Date: Date
7. IP: IP
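If your log actually arrives in the multi-line form shown in the question and you just want one tab-separated line per entry, here is a minimal sketch (it assumes the field labels appear exactly as in your sample):
import re

logEntry = """User: test
Level: user
Domain: localhost
Time: 12pm
Blah: INFO
Date: 07-12-2016
Ip: 127.0.0.1"""

# Collect "Label: value" pairs, then print the values tab-separated in a fixed order
fields = ["User", "Level", "Domain", "Time", "Blah", "Date", "Ip"]
values = dict(re.findall(r"^(\w+):\s*(.+)$", logEntry, re.MULTILINE))
print("\t".join(values[field] for field in fields))
# test    user    localhost    12pm    INFO    07-12-2016    127.0.0.1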

Related

NiFi: ReplaceTextWithMapping processor

I have the following insert statements:
insert into temp1 values (test1, test2)
insert into temp2 values (test3)
Expected results:
insert into temp1 values (100, 200)
insert into temp2 values (300)
Essentially, I want to replace the literals test1 and test2 in the first query with the values 100 and 200 respectively, and in the second query replace test3 with the value 300. Can someone help with the mapping file for the above use case?
I tried with the following, but it doesn't have any effect.
Search Value (RegEx) -> Replacement Value
(1)(.*values.*)(.*test1)(.*,)(.*test2) -> $2 val1 $4 val2
(2)(.*values.*)(.*test1) -> $2 val3
If this is literally the extent of the mapping you need to perform, a regular ReplaceText processor is enough: configure it to detect every instance of test followed by a single digit and replace it with that digit followed by 00, which produces the desired output.
If you need to use ReplaceTextWithMapping for more complex lookups, the mapping file must be of the format:
search_value_1 replacement_value_1
search_value_2 replacement_value_2
etc.
The delimiter between the search and replacement values is \t.
--------------------------------------------------
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Wed Dec 07 10:48:24 PST 2016'
Key: 'lineageStartDate'
Value: 'Wed Dec 07 10:48:24 PST 2016'
Key: 'fileSize'
Value: '66'
FlowFile Attribute Map Content
Key: 'filename'
Value: '56196144045589'
Key: 'path'
Value: './'
Key: 'uuid'
Value: 'f6b28eb0-73b5-4d94-86c2-b7a5d4cc991e'
--------------------------------------------------
insert into temp1 values (100, 200)
insert into temp2 values (300)
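If you do end up using ReplaceTextWithMapping for these literal lookups, the mapping file for this example could be as small as the following (one lookup per line, search value and replacement separated by a tab):
test1	100
test2	200
test3	300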

How do I sort through lines of a text document to find a phrase based on a date?

I am writing a program that logs jobs into a file and then sorts and organises the jobs by date. The entries are lists that are just appended to the end of a text file. They appear in the file like so:
2017-01-31,2016-05-24,test1
2016-05-15,2016-05-24,test2
2016-06-15,2016-05-24,test3
2016-07-16,2016-05-24,test4
They follow this format: due date, date entered, job title. I would like to be able to print the jobs from the text file to the Python shell in order of date, with the job that has the closest date first. I was thinking of turning each line into an item in a list, doing something with the due date characters, and sorting that way. I can't figure out how to keep everything together if I do it that way, though. Any thoughts?
Use datetime.datetime.strptime to parse the date strings into datetime objects. Then just sort the list of jobs by date and output them.
from datetime import datetime

date_str_format = '%Y-%m-%d'

jobs = []
with open('jobs.txt', 'r') as f:
    for line in f:
        date_due, date_entered, title = line.split(',')
        jobs.append((datetime.strptime(date_due, date_str_format),
                     datetime.strptime(date_entered, date_str_format),
                     title.strip()))

jobs.sort()

for date_due, _, title in jobs:
    print '{} (due {})'.format(title, date_due)
Here are the contents of jobs.txt:
2017-01-31,2016-05-24,test1
2016-05-15,2016-05-24,test2
2016-06-15,2016-05-24,test3
2016-07-16,2016-05-24,test4
And the output...
test2 (due 2016-05-15 00:00:00)
test3 (due 2016-06-15 00:00:00)
test4 (due 2016-07-16 00:00:00)
test1 (due 2017-01-31 00:00:00)
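If you'd rather not see the 00:00:00 in that output, one small tweak (same data, just formatting the datetime back into a date string when printing) is:
for date_due, _, title in jobs:
    print '{} (due {})'.format(title, date_due.strftime(date_str_format))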
I think this does what you want since you've picked a nice date format:
lines = """
2017-01-31,2016-05-24,test1
2016-05-15,2016-05-24,test2
2016-06-15,2016-05-24,test3
2016-07-16,2016-05-24,test4
"""
sorted([entry.split(",") for entry in lines.split("\n") if any(entry)], reverse=True)
In Python 2.7 shell:
>>> lines = """
... 2017-01-31,2016-05-24,test1
... 2016-05-15,2016-05-24,test2
... 2016-06-15,2016-05-24,test3
... 2016-07-16,2016-05-24,test4
... """
>>>
>>> lines_sorted = sorted([entry.split(",") for entry in lines.split("\n") if any(entry)], reverse=True)
>>> for line in lines_sorted:
... print line
...
['2017-01-31', '2016-05-24', 'test1']
['2016-07-16', '2016-05-24', 'test4']
['2016-06-15', '2016-05-24', 'test3']
['2016-05-15', '2016-05-24', 'test2']
>>>
Using string formatting and argument unpacking with *:
output_str = "Due date: {0}\nDate entered: {1}\nJob title: {2}\n"
entries_sorted = sorted([entry.split(",") for entry in lines.split("\n") if any(entry)], reverse=True)
for entry in entries_sorted:
    print output_str.format(*entry)
Output:
Due date: 2017-01-31
Date entered: 2016-05-24
Job title: test1
Due date: 2016-07-16
Date entered: 2016-05-24
Job title: test4
Due date: 2016-06-15
Date entered: 2016-05-24
Job title: test3
Due date: 2016-05-15
Date entered: 2016-05-24
Job title: test2
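One caveat: reverse=True puts the latest due date first. If "the job with the closest date being first" means the earliest due date at the top, drop reverse=True; because the strings are in YYYY-MM-DD form, plain lexicographic order is already chronological:
entries_sorted = sorted([entry.split(",") for entry in lines.split("\n") if any(entry)])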

Rearrange words in a given sentence in python

I'm very new to Python.
I get serial data on a COM port in a fixed format as a string like this:
"21-12-2015 10:12:05 005 100 10.5 P"
The format is 'date time id count data data'
Here I don't need the count or the first data value; instead I want to add one more data value and send this out again through another COM port.
I want to rearrange this and give output as
21-12-2015 10:12:05
SI.NO: 1451
Result: 10.5 P
My attempt:
ip = '21-12-2015_10:12:05_005_100_10.5 P'
dt = ip[0]+ip[1]+ip[3]+..... #save date as dt
tm = ip[9]+ip[10]+ip[11]+.... etc
and at the end
Result = dt + tm +"\n" + " "+ "SI.NO"+.......
Please suggest a good approach for doing this in Python 2.7.11.
If you can mention some ideas, I will search for the code.
Thank you
You can split up your string on whitespace into fields with split and build a new string using Python's string formatting syntax:
ip = "21-12-2015 10:12:05 005 100 10.5 P"
fields = ip.split()
s = '{date} {time}\n SI.NO: {sino}\n Result: {x} {y}'.format(
date=fields[0],
time=fields[1],
sino=1451, # Provide your own counter here
x=fields[4],
y=fields[5])
print s
21-12-2015 10:12:05
SI.NO: 1451
Result: 10.5 P
It isn't clear from your question whether your fields are separated by spaces or underscores. In the latter case, use fields = ip.split('_').
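If the fields really are separated by underscores as in your attempt (so the last field keeps its internal space, e.g. '10.5 P'), a sketch along the same lines would be:
ip = '21-12-2015_10:12:05_005_100_10.5 P'
fields = ip.split('_')  # ['21-12-2015', '10:12:05', '005', '100', '10.5 P']
s = '{date} {time}\n SI.NO: {sino}\n Result: {result}'.format(
    date=fields[0],
    time=fields[1],
    sino=1451,  # Provide your own counter here
    result=fields[4])
print s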

How to parse through a string in perl to extract certain value?

I have following string
> show box detail
2 boxes:
1) Box ID: 1
IP: 127.0.0.1
Interface: 1/1
Priority: 31
2) Box ID: 2
IP: 192.68.1.1
Interface: 1/2
Priority: 31
How do I get the Box IDs from the above string in Perl?
The number of boxes can vary, so based on the number of boxes n, how do I extract the Box IDs if the show box detail output can go up to n nodes in the same format?
my @ids = $string =~ /Box ID: ([0-9]+)/g;
More restrictive:
my @ids = $string =~ /^[0-9]+\) Box ID: ([0-9]+)$/mg;
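For comparison only, the equivalent extraction in Python would be a re.findall (a sketch; the question itself is about Perl):
import re
ids = re.findall(r'Box ID: ([0-9]+)', text)  # text holds the show box detail output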

Search then Extract

I have a text file with multiple records. I want to search by name and date; for example, if I type JULIUS CESAR as the name, then all the data about JULIUS should be extracted. What if I want to extract only the Information part?
Record number: 1
Date: 08-Oct-08
Time: 23:45:01
Name: JULIUS CESAR
Address: BAGUIO CITY, Philippines
Information:
I lived in Peza Loakan, Bagiou City
A Computer Engineering student
An OJT at TIPI.
23 years old.
Record number: 2
Date: 09-Oct-08
Time: 23:45:01
Name: JOHN Castro
Address: BAGUIO CITY, Philippines
Information:
I lived in Peza Loakan, Bagiou City
A Electronics Comm. Engineering Student at SLU.
An OJT at TIPI.
My Hobby is Programming.
Record number: 3
Date: 08-Oct-08
Time: 23:45:01
Name: CESAR JOSE
Address: BAGUIO CITY, Philippines
Information:
Hi,,
I lived Manila City
A Computer Engineering student
Working at TIPI.
If it is one line per entry, you could use a regular expression such as:
$name = "JULIUS CESAR";
Then use:
/$name/i
to test if each line is about "JULIUS CESAR." Then you simply have to use the following regex to extract the information (once you find the line):
/Record number: (\d+) Date: (\d+)-(\w+)-(\d+) Time: (\d+):(\d+):(\d+) Name: $name Address: ([\w\s]+), ([\w\s]+?) Information: (.+?)$/i
$1 = record number
$2-$4 = date
$5-$7 = time
$8-$9 = address
$10 = information
I would write a code example, but my perl is rusty. I hope this helps :)
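If you are not tied to Perl, here is a minimal Python sketch of the same idea; the file name records.txt and the exact field labels are assumptions based on the sample above:
name = 'JULIUS CESAR'
with open('records.txt') as f:
    text = f.read()

# Each record starts with "Record number:", so split the file on that header
for record in text.split('Record number:'):
    if 'Name: ' + name in record:
        # Everything after the "Information:" label is the free-text part
        info = record.split('Information:', 1)[1].strip()
        print info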
In PHP, you can run a SQL select statement like:
"SELECT * FROM your_table WHERE name LIKE 'JULIUS%';"
PHP has native functions that return your results as associative arrays, and I'm pretty sure they keep the row order. Then you can just do something like this:
echo implode(" ", $whole_row);
Hope this is what you're looking for!