I am trying to extract timestamps from a log file by pattern matching, using this code:
searchfile = open(file, "r")
pattern1 = ("STRING1", "STRING2")
pattern2 = ("STRING1", "STRING3")
for line in searchfile:
    if all(s in line for s in pattern1):
        time1 = line.strip().split()
        print("Start: " + time1[3])
    if all(s in line for s in pattern2):
        time2 = line.strip().split()
        print("End: " + time2[3])
However, when I run it, the "End" line is printed twice, with the first "End" inheriting the value of the "Start".
Output below:
Start: 00:00:01:543
End: 00:00:01:543
End: 00:00:01:841
Start: 00:00:05:645
End: 00:00:05:645
End: 00:00:05:903
Start: 00:00:12:408
End: 00:00:12:408
End: 00:00:12:640
Start: 00:00:14:648
End: 00:00:14:648
End: 00:00:14:871
Start: 00:00:22:677
End: 00:00:22:677
End: 00:00:22:916
I can't use a separate for loop for the "End" time because the output would no longer be in sequential "Start-End" order. I tried assigning a different variable name for the "End" time, but the results are the same.
Just found that the problem is in the pattern2 string filters. The old criteria strings matched two sets of lines, including the lines already matched by pattern1. After adding another criteria string to pattern2, it works now:
Start: 00:00:01:543
End: 00:00:01:841
Start: 00:00:05:645
End: 00:00:05:903
Start: 00:00:12:408
End: 00:00:12:640
Start: 00:00:14:648
End: 00:00:14:871
Start: 00:00:22:677
End: 00:00:22:916
Start: 00:00:27:419
End: 00:00:27:649
Start: 00:00:31:633
End: 00:00:31:858
Start: 00:00:35:605
End: 00:00:35:857
Start: 00:00:40:314
End: 00:00:40:597
Start: 00:00:41:875
End: 00:00:42:110
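A minimal sketch of why the duplicate "End" lines appear and one way to avoid them, assuming the "Start" lines happen to contain every string of the "End" pattern as well. The function name extract_times, the sample log lines and the key strings are all made up for illustration; only the all(...) test and the split()[3] field come from the question. Using elif means a line that already matched the start pattern is never tested against the end pattern.

```python
def extract_times(lines, start_keys, end_keys, field=3):
    """Return ("Start" | "End", timestamp) pairs in file order."""
    events = []
    for line in lines:
        parts = line.strip().split()
        if all(key in line for key in start_keys):
            events.append(("Start", parts[field]))
        elif all(key in line for key in end_keys):
            # Reached only when the start pattern did not match this line
            events.append(("End", parts[field]))
    return events


# Hypothetical log lines: the first one contains "JOB", "begin" AND "run",
# so two independent `if` tests would report it as both Start and End.
sample = [
    "2020-01-01 app INFO 00:00:01:543 JOB begin run",
    "2020-01-01 app INFO 00:00:01:841 JOB finished run",
]
start_keys = ("JOB", "begin")
end_keys = ("JOB", "run")

for label, stamp in extract_times(sample, start_keys, end_keys):
    print(label + ": " + stamp)
```

Tightening the end pattern, as described above, fixes the same problem at the source; the elif only guards against the overlap.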
Examples
"123456" would be ["123", "456"].
"1234567891011" would be ["123", "456", "789", "10", "11"].
I have come up with this logic using regex to solve the challenge but I am being asked if there is a way to make the logic shorter.
def ft(str)
end
The result from the scan contains a lot of whitespace, so after the join operation I am left with either a double dash or triple dashes; I used .gsub(/-+/, '-') to fix that. I also noticed that sometimes there is a dash at the beginning or the end of the string, so I used .gsub(/^-|-$/, '') to fix that too.
Any Ideas?
Slice the string in chunks of max 3 digits: s.scan(/.{1,3}/)
Check if the last chunk has only 1 character. If so, take the last char of the chunk before and prepend it to the last.
Glue the chunks together using join(" ")
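As a rough illustration, the three steps above could be sketched in Python as follows. The thread itself is about Ruby, so this is only a translation of the idea; chunk_digits is a hypothetical name, and the final join is left to the caller.

```python
import re


def chunk_digits(text):
    """Split the digits of `text` into groups of 3, never ending on a group of 1."""
    digits = re.sub(r"\D", "", text)
    # Step 1: slice into chunks of at most 3 digits
    chunks = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    # Step 2: if the last chunk has a single digit, borrow one from the chunk before
    if len(chunks) > 1 and len(chunks[-1]) == 1:
        chunks[-1] = chunks[-2][-1] + chunks[-1]
        chunks[-2] = chunks[-2][:-1]
    return chunks


# Step 3: glue the chunks together
print(" ".join(chunk_digits("1234567891011")))
```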
Inspired by @steenslag's recommendation. (There are quite a few other ways to achieve the same with varying levels of verbosity and esotericism.)
Here is how I would go about it:
def format_str(str)
  numbers = str.delete("^0-9").scan(/.{1,3}/)
  # there are a number of ways this operation can be performed
  numbers.concat(numbers.pop(2).join.scan(/../)) if numbers.last.length == 1
  numbers.join('-')
end
Breakdown:
numbers = str.delete("^0-9") - delete any non numeric character from the string
.scan(/.{1,3}/) - scan them into groups of 1 to 3 characters
numbers.concat(numbers.pop(2).join.scan(/../)) if numbers.last.length == 1 - If the last element has a length of 1 then remove the last 2 elements join them and then scan them into groups of 2 and add these groups back to the Array
numbers.join('-') - join the numbers with a hyphen to return a formatted String
Example:
require 'securerandom'
10.times do
  s = SecureRandom.hex
  puts "Original: #{s} => Formatted: #{format_str(s)}"
end
# Original: fd1bbce41b1c784ce6ad5303d868bbe9 => Formatted: 141-178-465-303-86-89
# Original: af04bd4d4d6beb5a0412a692d5d3d42d => Formatted: 044-465-041-269-253-42
# Original: 9a1833a43cbef51c3f3c21baa66fe996 => Formatted: 918-334-351-332-166-996
# Original: 4104ae13c998cec896997b9919bdafb3 => Formatted: 410-413-998-896-997-991-93
# Original: 0eb49065472240ba32b3c029f897b30d => Formatted: 049-065-472-240-323-029-897-30
# Original: 4c68f9f68e8f6132c0ed5b966d639cf4 => Formatted: 468-968-861-320-596-663-94
# Original: 65987ee04aea8fb533dbe38c0fea7d63 => Formatted: 659-870-485-333-807-63
# Original: aa8aaf1cf59b52c9ad7db6d4b1ae0cbb => Formatted: 815-952-976-410
# Original: 8eb6b457059f91fd06ccbac272db8f4e => Formatted: 864-570-599-106-272-84
# Original: 1c65825ed59dcdc6ec18af969938ea57 => Formatted: 165-825-596-189-699-38-57
That being said to modify your existing code this will work as well:
def format_str(str)
  str
    .delete("^0-9")
    .scan(/(?=\d{5})\d{3}|(?=\d{3}$)\d{3}|\d{2}/)
    .join('-')
end
Here are three more ways to do that.
Use String#scan with a regular expression
def fmt(str)
  str.delete("^0-9").scan(/\d{2,3}(?!\d\z)/)
end
The regular expression reads, "match two or three digits provided they are not followed by a single digit at the end of the string". (?!\d\z) is a negative lookahead (which is not part of the match). As matches are greedy by default, the regex engine will always match three digits if possible.
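For comparison, the same lookahead idea can be sketched with Python's re module. This is an illustration rather than part of the Ruby answer: fmt_py is a made-up name, and \Z is used as Python's end-of-string anchor in place of Ruby's \z.

```python
import re


def fmt_py(text):
    """Groups of 2 or 3 digits, never leaving a single digit at the end."""
    digits = re.sub(r"\D", "", text)
    # Greedy \d{2,3} prefers 3 digits; the negative lookahead forces a
    # backtrack to 2 digits when exactly one digit would be left over.
    return re.findall(r"\d{2,3}(?!\d\Z)", digits)


for s in ["5551231234", "18883319", "123456", "1234567891011"]:
    print(s, fmt_py(s))
```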
Solve by recursion
def fmt(str)
  recurse(str.delete("^0-9"))
end

def recurse(s)
  case s.size
  when 2, 3
    [s]
  when 4
    [s[0, 2], s[2, 2]]
  else
    [s[0, 3], *recurse(s[3..])]
  end
end
Determine the last two matches from the size of the string
def fmt(str)
  s = str.delete("^0-9")
  if s.size % 3 == 1
    s[0..-5].scan(/\d{3}/) << s[-4, 2] << s[-2, 2]
  else
    s.scan(/\d{2,3}/)
  end
end
All methods exhibit the following behaviour.
["5551231234", "18883319", "123456", "1234567891011"].each do |str|
  puts "#{str}: #{fmt(str)}"
end
5551231234: ["555", "123", "12", "34"]
18883319: ["188", "833", "19"]
123456: ["123", "456"]
1234567891011: ["123", "456", "789", "10", "11"]
An approach:
def foo(s)
  s.gsub(/\D/, '').scan(/\d{1,3}/).each_with_object([]) do |x, arr|
    if x.size > 1 || arr.empty?
      arr << x
    else
      y = arr.last
      arr[-1] = y[0...-1]
      arr << "#{y[-1]}#{x}"
    end
  end
end
Remove all non-digit characters, then scan for 1 to 3 digit chunks. Iterate over them. If it's the first chunk or the chunk has more than one digit, add it to the array. Otherwise the chunk is a lone trailing digit: take the last digit from the previous chunk, prepend it to the current chunk, and add that to the array.
Alternatively, without generating a new array.
def foo(s)
  s.gsub(/\D/, '').scan(/\d{1,3}/).instance_eval do |y|
    y.each_with_index do |x, i|
      if x.size == 1
        y[i] = "#{y[i-1][-1]}#{x}"
        y[i-1] = y[i-1][0...-1]
      end
    end
  end
end
Without changing your code too much and without adjusting your actual regex, I might suggest replacing scan with split in order to avoid all the extra nil values; replacing gsub with tr which is much faster; and then using reject(&:empty?) to loop through and remove any blank array elements before joining with whatever character you want:
string = "12345fg\n\t\t\t 67"
string.tr("^0-9", "")
  .split(/(?=\d{5})(\d{3})|(?=\d{3}$)(\d{3})|(\d{1,2})/)
  .reject(&:empty?)
  .join("-")
#=> 123-45-67
Not suggesting this is the best approach, but wanted to offer a little food for thought:
You can basically reduce the logic for your challenge to test for 1 single condition and to use 2 very simple pattern matches:
Condition to test for: Number of characters is more than 3 and has a modulo(3) of 1. This condition will require the use of both pattern matches.
All other conditions will use a single pattern match so no reason to test for those.
This could probably be made a little less verbose but it’s all spelled out pretty well for clarity:
def format(s)
  n = s.delete("^0-9")
  regex_1 = /.{1,3}/
  regex_2 = /../
  if [n.length-3, 0].max.modulo(3) == 1
    a = n[0..-5].scan(regex_1) + n[-4..-1].scan(regex_2)
  else
    a = n.scan(regex_1)
  end
  a.join("-")
end
I have a set of coordinates, split at random into groups by empty lines:
19.815857300 39.791813400
19.816105700 39.791921800
19.816220800 39.791984600
19.816271400 39.792010000
19.786375895 39.678097997
19.783813875 39.677022719
19.782758486 39.676590122
and so on... a lot of coordinates :)
I want to insert a specific text at the beginning and at the end of the first coordinate pair (the first line of each "paragraph").
The same goes for the last line of each paragraph, but with a different text entry.
The coordinates between them also need another text before and after.
So the result I would like to achieve should be something like:
BEGIN_SEGMENT 0 50 10 19.815857300 39.791813400 0.00000000
SHAPE_POINT 19.816105700 39.791921800 0
SHAPE_POINT 19.816220800 39.791984600 0
END_SEGMENT 20 19.816271400 39.792010000 0.00000000
BEGIN_SEGMENT 0 50 10 19.786375895 39.678097997 0.00000000
SHAPE_POINT 19.783813875 39.677022719 0
END_SEGMENT 20 19.782758486 39.676590122 0.00000000
etc..etc..
Any ideas on how this could be done with notepad++ and regular expressions?
Thanx in advance!
Solution using python 3.6.
It reads in a file "input.txt" - line by line - until it gets an empty line.
On each start it recreates the file "result.txt".
Each found paragraph is appended to "result.txt" - until done.
Now you just need to get a python environment ;o) and feed it your file.
import datetime
import random

# recreate empty file with timestamp
with open("result.txt", "w") as r:
    r.write(str(datetime.datetime.now()) + "\n")

with open("input.txt", "r") as f:
    while True:
        paragraph = []
        line = f.readline()
        if line == "":
            break  # readline returns "\n" for empty lines and "" only at end of file
        while len(line.strip()) > 0:
            # accumulate lines
            paragraph.append(line.strip())
            line = f.readline()
        if not paragraph:
            continue  # jumps back up to while True:
        rv = []
        for idx in range(0, len(paragraph)):
            if idx == 0:
                rv.append("BEGIN_SEGMENT 0 50 " + str(random.randint(1,10000)) + " " + paragraph[idx] + " 0.00000000 " + "\n")
            elif idx != (len(paragraph) - 1):
                rv.append("SHAPE_POINT " + paragraph[idx] + " 0" + "\n")
            else:
                rv.append("END_SEGMENT " + str(random.randint(1,10000)) + " " + paragraph[idx] + " 0.00000000" + "\n" + "\n")
        # append results of paragraph
        with open("result.txt", "a") as resFile:
            for line in rv:
                resFile.write(line)
# edited code for random numbers instead of fixed ones
Not complete - you'll have to adjust the first and last rows by hand.
Step 1: end segment / begin segment
FIND: \r\n(\d.*)\r\n\r\n
REPLACE: \r\nEND_SEGMENT\t$1\r\n\r\nBEGIN_SEGMENT\t
Step 2: shape point
FIND: ^(\d.*)$
REPLACE: SHAPE_POINT\t$1
Assumes that there will be only one blank line between segments.
You can use https://regexper.com/ to display a diagram of the expressions.
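To see what the two replacement steps do, here is a small Python sketch of the same substitutions using re.sub. It is only an illustration of the Notepad++ recipe: mark_segments is a hypothetical name, the sample text is made up, and plain \n line endings are assumed instead of \r\n. As noted above, the very first and very last lines still have to be adjusted by hand.

```python
import re


def mark_segments(text):
    # Step 1: the last line of a paragraph plus the blank line after it
    # becomes END_SEGMENT, and the next line is prefixed with BEGIN_SEGMENT.
    text = re.sub(r"\n(\d.*)\n\n", r"\nEND_SEGMENT\t\1\n\nBEGIN_SEGMENT\t", text)
    # Step 2: every line that still starts with a digit is a shape point.
    return re.sub(r"^(\d.*)$", r"SHAPE_POINT\t\1", text, flags=re.MULTILINE)


sample = "1 1\n2 2\n3 3\n\n4 4\n5 5\n6 6\n"
print(mark_segments(sample))
```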
How do I replace characters in the following data using Python?
GSA*HC*11177*NYSfH-EfC*23130303*0313*1*R*033330103298
STEM*333*3001*0030303238
BHAT*3319*33*33377*23330706*031829*RTRCP
NUM4*41*2*My Break Room Place*****6*1133337
I want to replace all characters after the first occurrence of '*'. All characters must be replaced except '*'.
Example input:
NUM4*41*2*My Break Room Place*****6*1133337
example output:
NUM4*11*1*11 11111 1111 11111*****1*1111111
Fairly simple: use a callback that returns group 1 unaltered if it matched, and otherwise returns the replacement character 1.
Note: this would also work on multi-line strings. If you need that, just add (?m) to the beginning of the regex: (?m)(?:(^[^*]*\*)|[^*\s])
You'd probably want to test the string for the * character first.
( ^ [^*]* \* ) # (1), BOS/BOL up to first *
| # or,
[^*\s] # Not a * nor whitespace
Python
import re

def repl(m):
    # Keep the prefix up to and including the first * (group 1) unaltered
    if m.group(1):
        return m.group(1)
    return "1"

s = 'NUM4*41*2*My Break Room Place*****6*1133337'
if '*' in s:
    newstr = re.sub(r'(^[^*]*\*)|[^*\s]', repl, s)
    print(newstr)
else:
    print('* not found in string')
Output
NUM4*11*1*11 11111 1111 11111*****1*1111111
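If you want to use regex, you can use this one with re.sub: (?<=\*)[^\*]+. Pass a function as the replacement so that every non-whitespace character in each match becomes 1 (a plain '1' replacement would collapse each run between asterisks into a single character).
import re
inputs = ['GSA*HC*11177*NYSfH-EfC*23130303*0313*1*R*033330103298',
          'STEM*333*3001*0030303238',
          'BHAT*3319*33*33377*23330706*031829*RTRCP',
          'NUM4*41*2*My Break Room Place*****6*1133337']
outputs = [re.sub(r'(?<=\*)[^\*]+', lambda m: re.sub(r'\S', '1', m.group()), inputline) for inputline in inputs]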
I'm trying to get the message string out of this VMG file. I only want the strings after the Date line and before "END:VBODY".
The best I've got so far is this regex: BEGIN:VBODY([^\n]*\n+)+END:VBODY
Can anyone help refine it?
N:
TEL:+65123345
END:VCARD
BEGIN:VENV
BEGIN:VBODY
Date:8/11/2013 11:59:00 PM
thi is a test message
Hello this is a test message on line 2
END:VBODY
END:VENV
END:VENV
END:VMSG
If you want to use regex, you can modify your current regex a little, because the $0 group has what you are looking for.
BEGIN:VBODY\n?((?:[^\n]*\n+)+?)END:VBODY
Basically, ([^\n]*\n+)+ turned into (?:[^\n]*\n+)+? (making this part lazy might be safer),
and that whole part is then wrapped in parentheses: ((?:[^\n]*\n+)+?)
I added \n? before this to make the output a little cleaner.
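A quick way to check the capture group is to run the same pattern through Python's re module, as sketched below. The thread is about C#, so this is only an illustration: extract_body is a hypothetical name, the sample is a shortened copy of the file from the question with \n line endings, and dropping the first line to skip the Date header is an assumption about what the asker wants.

```python
import re

# Same pattern as above: a lazy, non-capturing line group wrapped in one capture group
VBODY = re.compile(r"BEGIN:VBODY\n?((?:[^\n]*\n+)+?)END:VBODY")


def extract_body(vmg_text):
    """Return everything between BEGIN:VBODY and END:VBODY, or None if absent."""
    match = VBODY.search(vmg_text)
    return match.group(1) if match else None


sample = (
    "BEGIN:VENV\n"
    "BEGIN:VBODY\n"
    "Date:8/11/2013 11:59:00 PM\n"
    "thi is a test message\n"
    "Hello this is a test message on line 2\n"
    "END:VBODY\n"
    "END:VENV\n"
)

body = extract_body(sample)
# Assumption: the first line of the body is always the Date line
message_lines = body.splitlines()[1:]
print(message_lines)
```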
A non-regex solution might be something like this:
string str = @"N:
TEL:+65123345
END:VCARD
BEGIN:VENV
BEGIN:VBODY
Date:8/11/2013 11:59:00 PM
thi is a test message
Hello this is a test message on line 2
END:VBODY
END:VENV
END:VENV
END:VMSG";
int startId = str.IndexOf("BEGIN:VBODY")+11; // 11 is the length of "BEGIN:VBODY"
int endId = str.IndexOf("END:VBODY");
string result = str.Substring(startId, endId-startId);
Console.WriteLine(result);
Output:
Date:8/11/2013 11:59:00 PM
thi is a test message
Hello this is a test message on line 2
Here is a solution using Regular Expressions,
string text = @"N:
TEL:+65123345
END:VCARD
BEGIN:VENV
BEGIN:VBODY
Date:8/11/2013 11:59:00 PM
thi is a test message
Hello this is a test message on line 2
END:VBODY
END:VENV
END:VENV
END:VMSG";
string pattern = @"BEGIN:VBODY(?<Value>[a-zA-Z0-9\r\n.\S\s ]*)END:VBODY";//Pattern to match text.
Regex rgx = new Regex(pattern, RegexOptions.Multiline);//Initialize a new Regex class with the above pattern.
Match match = rgx.Match(text);//Capture any matches.
if (match.Success)//If a match is found.
{
    string value2 = match.Groups["Value"].Value;//Capture match value.
    MessageBox.Show(value2);
}
and now a non-regex solution,
string text = @"N:
TEL:+65123345
END:VCARD
BEGIN:VENV
BEGIN:VBODY
Date:8/11/2013 11:59:00 PM
thi is a test message
Hello this is a test message on line 2
END:VBODY
END:VENV
END:VENV
END:VMSG";
int startindex = text.IndexOf("BEGIN:VBODY") + ("BEGIN:VBODY").Length;//The start index of the text just after BEGIN:VBODY...
int length = text.IndexOf("END:VBODY") - startindex;//Length of text till END...
if (startindex >= 0 && length >= 1)
{
    string value = text.Substring(startindex, length);//This is the text you need.
    MessageBox.Show(value);
}
else
{
    MessageBox.Show("No match found.");
}
Hope it helps.
I am trying to read a file that looks as follows:
Data Sampling Rate: 256 Hz
*************************
Channels in EDF Files:
**********************
Channel 1: FP1-F7
Channel 2: F7-T7
Channel 3: T7-P7
Channel 4: P7-O1
File Name: chb01_02.edf
File Start Time: 12:42:57
File End Time: 13:42:57
Number of Seizures in File: 0
File Name: chb01_03.edf
File Start Time: 13:43:04
File End Time: 14:43:04
Number of Seizures in File: 1
Seizure Start Time: 2996 seconds
Seizure End Time: 3036 seconds
So far I have this code:
fid1= fopen('chb01-summary.txt')
data=struct('id',{},'stime',{},'etime',{},'seizenum',{},'sseize',{},'eseize',{});
if fid1 ==-1
error('File cannot be opened ')
end
tline= fgetl(fid1);
while ischar(tline)
i=1;
disp(tline);
end
I want to use regexp to find the expressions and so I did:
line1 = '(.*\d{2} (\.edf)'
data{1} = regexp(tline, line1);
tline=fgetl(fid1);
time = '^Time: .*\d{2]}: \d{2} :\d{2}' ;
data{2}= regexp(tline,time);
tline=getl(fid1);
seizure = '^File: .*\d';
data{4}= regexp(tline,seizure);
if data{4}>0
stime = '^Time: .*\d{5}';
tline=getl(fid1);
data{5}= regexp(tline,seizure);
tline= getl(fid1);
data{6}= regexp(tline,seizure);
end
I tried using a loop to find the line at which file name starts with:
for (firstline<1) || (firstline>1 )
firstline= strfind(tline, 'File Name')
tline=fgetl(fid1);
end
and now I'm stumped.
Suppose that I am at the line at which the information is there, how do I store the information with regexp? I got an empty array for data after running the code once...
Thanks in advance.
I find it the easiest to read the lines into a cell array first using textscan:
%// Read lines as strings
fid = fopen('input.txt', 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
and then apply regexp on it to do the rest of the manipulations:
%// Parse field names and values
C = regexp(C{:}, '^\s*([^:]+)\s*:\s*(.+)\s*', 'tokens');
C = [C{:}]; %// Flatten the cell array
C = reshape([C{:}], 2, []); %// Reshape into name-value pairs
Now you have a cell array C of field names and their corresponding (string) values, and all you have to do is plug it into struct in the correct syntax (using a comma-separated list in this case). Note that the field names have spaces in them, so this needs to be taken care of before they can be used (e.g. replace them with underscores):
C(1, :) = strrep(C(1, :), ' ', '_'); %// Replace spaces with underscores
data = struct(C{:});
Here's what I get for your input file:
data =
Data_Sampling_Rate: '256 Hz'
Channel_1: 'FP1-F7'
Channel_2: 'F7-T7'
Channel_3: 'T7-P7'
Channel_4: 'P7-O1'
File_Name: 'chb01_03.edf'
File_Start_Time: '13:43:04'
File_End_Time: '14:43:04'
Number_of_Seizures_in_File: '1'
Seizure_Start_Time: '2996 seconds'
Seizure_End_Time: '3036 seconds'
Of course, it is possible to prettify it even more by converting all relevant numbers to numerical values, grouping the 'channel' fields together and such, but I'll leave this to you. Good luck!
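For readers without MATLAB, the same name/value idea can be sketched in Python. This is an illustration under assumptions, not a port of the answer: parse_fields is a hypothetical name, the regular expression is a close analogue of the one above rather than an exact copy, and the sample lines are taken from the question. A dict keeps only the last value for a repeated name, which is consistent with the output above showing only chb01_03.edf.

```python
import re

# Name before the first colon, value after it; lines without a value are skipped
FIELD = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def parse_fields(lines):
    """Map 'Field Name: value' lines to {'Field_Name': 'value'}."""
    fields = {}
    for line in lines:
        match = FIELD.match(line)
        if match:
            name = match.group(1).replace(" ", "_")
            fields[name] = match.group(2)
    return fields


sample = [
    "Data Sampling Rate: 256 Hz",
    "*************************",
    "Channels in EDF Files:",
    "Channel 1: FP1-F7",
    "File Name: chb01_02.edf",
    "File Start Time: 12:42:57",
    "File Name: chb01_03.edf",
]
print(parse_fields(sample))
```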