Split regex matches into multiple lines - regex

I'm using regex to read a line, gather all the matches, and print each match on a new line.
So far I have read the line and extracted the data I need, but the code prints it all on a single line.
Is there a way to print each match separately?
Here is the code I have been using:
import os
import re
msg = "0,0.000000E+000,NCAP,64Q34,39,39,1028,NCAP,1,1,NCAP"
text = [msg.split(',')]
which gives me [['0', '0.000000E+000', 'NCAP', '64Q34', '39', '39', '1028', 'NCAP', '1', '1', 'NCAP']].
Searching for data between ' ' will get me the individual results.
Using the code below will find all matches but it keeps it all as one line, giving me the same as the input.
text = str(text)
line = text.strip()
m = re.findall("'(.+?)'", line)
found = str(m)
print(found+ '\n')

I am unsure what you are trying to capture with regexes, but from what I understand you want to split msg on commas ',' and print each element on a new line.
msg = "0,0.000000E+000,NCAP,64Q34,39,39,1028,NCAP,1,1,NCAP"
msg = msg.split(',')
for m in msg:
    print(m)
>>> 0
0.000000E+000
NCAP
...
This will print each element of msg on a new line - the elements of msg are split up by ','.
I would also use an online interactive regex tester to try your regexes in real time and see which expressions to use (make sure to select the Python flavor).
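The attempt in the question prints everything on one line because str(m) turns the whole list into a single string. Here is a minimal sketch of two ways to print one item per line, using the msg from the question (the names fields and joined are only illustrative):

```python
msg = "0,0.000000E+000,NCAP,64Q34,39,39,1028,NCAP,1,1,NCAP"

# split() already returns a flat list of strings; no regex is needed.
fields = msg.split(',')

# Option 1: join the items with newlines and print once.
joined = '\n'.join(fields)
print(joined)

# Option 2: unpack the list and let print() put a newline between items.
print(*fields, sep='\n')
```

The same two options work on the list returned by re.findall, since that is also a list of strings.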

Related

Conditionally extracting the beginning of a regex pattern

I have a list of strings containing the names of actors in a movie that I want to extract. In some cases, the actor's character name is also included which must be ignored.
Here are a couple of examples:
# example 1
input = 'Levan Gelbakhiani as Merab\nAna Javakishvili as Mary\nAnano Makharadze'
expected_output = ['Levan Gelbakhiani', 'Ana Javakishvili', 'Anano Makharadze']
# example 2
input = 'Yoosuf Shafeeu\nAhmed Saeed\nMohamed Manik'
expected_output = ['Yoosuf Shafeeu', 'Ahmed Saeed', 'Mohamed Manik']
Here is what I've tried to no avail:
import re
output = re.findall(r'(?:\\n)?([\w ]+)(?= as )?', input)
output = re.findall(r'(?:\\n)?([\w ]+)(?: as )?', input)
output = re.findall(r'(?:\\n)?([\w ]+)(?:(?= as )|(?! as ))', input)
The \n in the input string are new line characters. We can make use of this fact in our regex.
Essentially, each line always begins with the actor's name. After the actor's name, there could be either the word as, or the end of the line.
Using this info, we can write the regex like this:
^(?:[\w ]+?)(?:(?= as )|$)
First, we assert that we must be at the start of the line ^. Then we match some word characters and spaces lazily [\w ]+?, until we see (?:(?= as )|$), either as or the end of the line.
In code,
output = re.findall(r'^(?:[\w ]+?)(?:(?= as )|$)', input, re.MULTILINE)
Remember to use the multiline option. That is what makes ^ and $ mean "start/end of line".
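As a quick check, here is a small sketch that runs the regex above against both examples from the question. The names ACTOR_RE, actor_names, example_1 and example_2 are illustrative and not part of the original code. Note that [\w ] does not cover hyphens or apostrophes, so names containing those would need a wider character class.

```python
import re

# Same pattern as above; re.MULTILINE makes ^ and $ match at every line boundary.
ACTOR_RE = re.compile(r'^(?:[\w ]+?)(?:(?= as )|$)', re.MULTILINE)

def actor_names(cast):
    """Return the actor name found at the start of each line of `cast`."""
    return ACTOR_RE.findall(cast)

example_1 = 'Levan Gelbakhiani as Merab\nAna Javakishvili as Mary\nAnano Makharadze'
example_2 = 'Yoosuf Shafeeu\nAhmed Saeed\nMohamed Manik'

print(actor_names(example_1))
print(actor_names(example_2))
```

Because the pattern has no capturing groups, findall returns the full match for each line.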
You can do this without using a regular expression as well.
Here is the code:
output = [x.split(' as ')[0] for x in input.split('\n')]
I guess you can combine the values obtained from two regex matches:
re.findall(r'(?:\n)?(.+)(?:\W[a][s].*?)|(?:\n)?(.+)$', input)
gives
[('Levan Gelbakhiani', ''), ('Ana Javakishvili', ''), ('', 'Anano Makharadze')]
from which you filter the empty strings out
output = list(map(lambda x : list(filter(len, x))[0], output))
gives
['Levan Gelbakhiani', 'Ana Javakishvili', 'Anano Makharadze']

Regex in python trouble

I have a text file that I would like to search to see how many times certain words occur in it. I'm getting the wrong count for the words.
File is here
code:
import re
with open('SysLog.txt', 'rt') as myfile:
    for line in myfile:
        m = re.search('guest', line, re.M|re.I)
        if m is not None:
            m.group(0)
            print( "Found it.")
            print('Found',len(m.group()), m.group(),'s')
            break
    for line in myfile:
        n = re.search('Worm', line)
        if n is not None:
            n.group(0)
            print("\n\tNext Match.")
            print('Found', len(n.group()), n.group(), 's')
            break
    for line in myfile:
        o = re.search('anonymous', line)
        if o is not None:
            o.group(0)
            print("\n\tNext Match.")
            print('Found', len(o.group()), o.group(), 's')
            break
There is no need to use a regex; you can use str.count() to make the process much simpler:
with open('SysLog.txt', 'rt') as myfile:
    text = myfile.read()
for word in ('guest', 'Worm', 'anonymous'):
    print("\n\tNext Match.")
    print('Found', text.count(word), word, 's')
To test this, I downloaded the file and ran the code above, and got the output:
Next Match.
Found 4 guest s
Next Match.
Found 91 Worm s
Next Match.
Found 18 anonymous s
which is correct if you do a find on the document in a text editor!
*As a sidenote, I'm not sure why you want to print a tab (\t) before 'Next Match' each time as it just looks weird in the output but it doesn't matter :)
There are multiple problems with your code:
re.search will only give you the first match, if any; this does not have to be a problem, though, as it seems like the word is only expected to appear once per line; otherwise, use re.findall
the line n.group(0) does not do anything without an assignment
len(n.group()) does not give you the number of matches, but the length of the matched string
you break after the first line in the file
myfile is an iterator, so once the first for line in myfile loop has finished, the other two won't have any lines left to loop (it will never finish because of the break anyway, though)
as already noted, you do not need regular expressions at all
One (among many) possible way of doing this would be this (not tested):
counts = {"Worm": 0, "guest": 0, "anonymous": 0}
with open('SysLog.txt', 'rt') as myfile:
    for line in myfile:
        for word in counts:
            # counts each line at most once per word
            if word in line:
                counts[word] += 1
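Note that the loop above counts lines that contain a word, not occurrences, and the matching is case-sensitive. If every occurrence should be counted regardless of case, a sketch along these lines may help. The helper count_words and the sample text are hypothetical; the sample stands in for SysLog.txt, whose contents are not shown here.

```python
def count_words(text, words):
    """Count case-insensitive occurrences of each word in `text`."""
    lowered = text.lower()
    return {word: lowered.count(word.lower()) for word in words}

# Hypothetical sample standing in for the contents of SysLog.txt.
sample = (
    "Guest login failed\n"
    "Worm detected; worm quarantined\n"
    "anonymous guest session opened\n"
)

counts = count_words(sample, ("guest", "worm", "anonymous"))
print(counts)
```

Like str.count() in the other answer, this counts substrings, so "guest" would also be counted inside a longer word such as "guests".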

How to remove the last character along with newline character from every line in python?

In my program I am getting output that is a string with content like the following:
TD_MAP1:
TD_MAP2:
TD_MAP5:
TD_MAP4:
Now I want to convert it to a list containing only useful info like:
['TD_MAP1','TD_MAP2','TD_MAP5','TD_MAP4']
Can we make it through strip()?
You can use split. It converts your string into a list by splitting it at a delimiter.
Example:
a = "TD_MAP1: TD_MAP2: TD_MAP5: TD_MAP4:"
b = a.split(":")
# b will be equal to ['TD_MAP1', ' TD_MAP2', ' TD_MAP5', ' TD_MAP4', '']
# you can remove the trailing empty string with b.pop(-1)
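Since the output in the question has one entry per line, a variant that splits on lines and strips the trailing colon avoids both the leading spaces and the empty last element. This is a minimal sketch; the variable raw is an assumed stand-in for the program's output string.

```python
# Assumed stand-in for the program output shown in the question.
raw = "TD_MAP1:\nTD_MAP2:\nTD_MAP5:\nTD_MAP4:\n"

# splitlines() drops the newline characters; rstrip(':') drops the trailing colon.
# Blank lines are skipped so they do not produce empty entries.
names = [line.strip().rstrip(':') for line in raw.splitlines() if line.strip()]
print(names)
```

So strip() alone is not enough, but rstrip(':') applied per line does the job.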

Python Search File For Specific Word And Find Exact Match And Print Line

I wrote a script to print the lines containing a specific word from a Bible text file. The problem is that I can't match the exact word; instead, it prints lines containing any variation of the word.
For example, if I search for "am", it prints sentences with words that contain "am", such as "lame" and "name".
Instead, I want it to print only the sentences that contain the word "am" itself,
i.e. "I am your saviour", "Here I am", etc.
Here is the code I use:
import re
text = raw_input("enter text to be searched:")
shakes = open("bible.txt", "r")
for line in shakes:
    if re.match('(.+)' +text+ '(.+)', line):
        print line
This is another approach to take to complete your task, it may be helpful although it doesn't follow your current approach very much.
The test.txt file I fed as input had four sentences:
This is a special cat. And this is a special dog. That's an average cat. But better than that loud dog.
When you run the program, include the text file. In command line, that'd look something like:
python file.py test.txt
This is the accompanying file.py:
import fileinput
key = raw_input("Please enter the word you wish to search for: ")
#print "You've selected: ", key, " as your key-word."
with open('test.txt') as f:
    # read the file as one string so it can be split into sentences
    content = f.read()
#print "This is the CONTENT", content
list_of_sentences = content.split(".")
for sentence in list_of_sentences:
    words = sentence.split(" ")
    for word in words:
        if word == key:
            print sentence
For the keyword "cat", this returns:
This is a special cat
That's an average cat
(note the periods are no longer there).
I think that if you put spaces inside the strings on either side of text, like this:
'(.+) ' + text + ' (.+)'
that would do the trick, if I correctly understand what is going on in the code.
re.findall may be useful in this case:
print re.findall(r"([^.]*?" + text + "[^.]*\.)", shakes.read())
Or even without regex:
print [sentence + '.' for sentence in shakes.read().split('.') if text in sentence]
reading this text file:
I am your saviour. Here I am. Another sentence.
Second line.
Last line. One more sentence. I am done.
both give same results:
['I am your saviour.', ' Here I am.', ' I am done.']
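To match "am" only as a whole word, and not inside "lame" or "name", one option is the regex word boundary \b. The following is a sketch written for Python 3; the helper lines_with_word and the sample lines are hypothetical and stand in for the lines of bible.txt.

```python
import re

def lines_with_word(lines, word):
    """Return the lines that contain `word` as a whole word."""
    # \b matches at a word boundary; re.escape guards against regex
    # metacharacters in the user's search text.
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    return [line for line in lines if pattern.search(line)]

# Hypothetical sample lines standing in for bible.txt.
sample_lines = [
    "I am your saviour",
    "The lame man spoke his name",
    "Here I am",
]

matches = lines_with_word(sample_lines, "am")
for line in matches:
    print(line)
```

Passing re.IGNORECASE to re.compile would also match "Am" at the start of a sentence.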

Python and Regex with special characters

I can't get my regex to work as desired in my Python 3 code.
I am trying to parse a file to find a specific pattern (the exact pattern is Total Optimized).
I am doing this because the file can contain lines which say "Total Optimization (Active)" and other permutations. I have tried the following lines. None work.
PkOp = re.compile(r'Total Optimized\t\d')
PkOp = re.compile(r'Total Optimized\t\d')
PkOp = re.compile(r'Total Optimized\t[^(Active)]')
My basic code (simplified here) just prints the matching line. If I got that working, I would then choose the array item I wanted, such as
PkOp = PkOpArray[4]
App = re.compile(r'Appliance\s(Active)')
PkOp = re.compile(r"Total Optimized\t\d")
with open("SteelheadMetric2.txt","r") as f:
    with open("mydumbfile.txt","w") as o:
        for line in f:
            line = line.lstrip()
            matches = PkOp.findall(line)
            for firestick in matches:
                PkOpArray = line.split()
                PkOp = PkOpArray
                print(PkOp)
Mostly I get this error
matches = PkOp.findall(line)
AttributeError: 'list' object has no attribute 'findall'
If I remove the slash characters I can get it to show lines with 'Total Optimization' or 'Appliance' whatever. I just can't be more specific in what I want.
What am I missing? It works fine if I just compile a text string but to use special regex like whitespace, tab digit it fails. The regex checks out in notepad++
When you write PkOp = PkOpArray you have just changed your regex into a list.
If you delete that line, and change your print(PkOp) to print(PkOpArray), it should fix your problem, assuming the rest of your code is correct.
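The fix can be sketched as follows: keep the compiled pattern under a name that the loop never reassigns. The name PK_OP_RE and the sample lines are assumptions for illustration; the real layout of SteelheadMetric2.txt is not shown in the question.

```python
import re

# Keep the compiled pattern in its own name so the loop never overwrites it.
PK_OP_RE = re.compile(r'Total Optimized\t\d')

# Hypothetical lines; the real file contents are not shown in the question.
sample_lines = [
    "  Total Optimized\t1234\t5678\n",
    "  Total Optimization (Active)\t42\n",
    "Appliance (Active)\n",
]

matched_fields = []
for line in sample_lines:
    line = line.lstrip()
    if PK_OP_RE.search(line):
        # split() with no argument splits on any run of whitespace.
        matched_fields.append(line.split())

print(matched_fields)
```

Two side notes on the other patterns in the question: in Appliance\s(Active) the parentheses create a group, so matching literal parentheses needs \(Active\), and [^(Active)] is a character class that excludes single characters rather than the word "Active".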