I have a log in which part of the text is a request ID, and I need to extract a value from it.
Example: RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book
Can anyone please help me extract Book from it?
Consider string splitting instead
>>> s = "RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book"
>>> s.split("_")[-1]
'Book'
String splitting will likely be more efficient. If you must use regular expressions, here is an example:
#!/usr/bin/env python3
import re
print(
    re.findall(r"^\w+_\d+_(\w+)$", 'RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book')
)
# output: ['Book']
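For comparison, here is a minimal sketch of both approaches side by side. The helper names last_field_split and last_field_regex are made up for illustration, and both assume the wanted value is whatever follows the last underscore.

```python
import re

def last_field_split(request_id):
    # rsplit with maxsplit=1 cuts only at the last underscore
    return request_id.rsplit("_", 1)[-1]

def last_field_regex(request_id):
    # [^_]+$ matches the run of non-underscore characters at the end
    match = re.search(r"[^_]+$", request_id)
    return match.group() if match else None

s = "RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book"
print(last_field_split(s))
print(last_field_regex(s))
```

Both calls print Book for the sample ID. The regex version returns None when nothing matches, which may be easier to check for than an unexpected split result.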
Related
I need help with a regex that is not producing the desired results. Below is my code:
import re
text='<u+0001f48e> repairs <u+0001f6e0><u+fe0f>your loved<u+2764><u+fe0f>one on the spot<u+26a1>'
regex=re.compile(r'[<u+\w+]+>')
txt=regex.findall(text)
print(txt)
Output
['<u+0001f48e>', '<u+0001f6e0>', '<u+fe0f>', 'loved<u+2764>', '<u+fe0f>', 'spot<u+26a1>']
I know the regex is not correct. I want the output to be:
'<u+0001f48e>', '<u+0001f6e0><u+fe0f>', '<u+2764><u+fe0f>', '<u+26a1>'
import re
regex = re.compile(r'<u\+[0-9a-f]+>')
text = '<u+0001f48e> repairs <u+0001f6e0><u+fe0f>your loved<u+2764><u+fe0f>one on the spot<u+26a1>'
print(regex.findall(text))
# output:
['<u+0001f48e>', '<u+0001f6e0>', '<u+fe0f>', '<u+2764>', '<u+fe0f>', '<u+26a1>']
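That is not exactly what you want, but it's almost there.
Now, to get the output you are looking for, we wrap the tag pattern in a non-capturing group and repeat it, so that a run of consecutive tags is returned as a single match: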
import re
regex = re.compile(r'((?:<u\+[0-9a-f]+>)+)')
text = '<u+0001f48e> repairs <u+0001f6e0><u+fe0f>your loved<u+2764><u+fe0f>one on the spot<u+26a1>'
print(regex.findall(text))
# output:
['<u+0001f48e>', '<u+0001f6e0><u+fe0f>', '<u+2764><u+fe0f>', '<u+26a1>']
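Why not add an optional search for a second tag:
regex = re.compile(r'<u\+\w+>(?:<u\+fe0f>)?')
This works with your example. The + has to be escaped to match a literal plus sign, and the optional group is non-capturing so that findall returns the whole match rather than the groups.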
I have been working on a Python project and don't have much experience yet. If I have the string Synset'dog.n.01' and I want to extract only the string dog, what should I do?
I mean, I want to extract whatever string sits between Synset' and .n.01'
I suggest using re (regex):
import re
s = "Synset'dog.n.01'"
# Escape the dots so they match literal periods instead of any character
result = re.search(r"Synset'(.*)\.n\.01'", s)
print(result.group(1))
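If the part-of-speech letter and the sense number can vary (an assumption; the question only shows .n.01), a slightly more general sketch could look like this. The helper name synset_lemma is hypothetical.

```python
import re

def synset_lemma(text):
    # Non-greedy (.+?) stops at the first ".<letter>.<digits>'" suffix
    match = re.search(r"Synset'(.+?)\.[a-z]\.\d+'", text)
    return match.group(1) if match else None

print(synset_lemma("Synset'dog.n.01'"))  # dog
```

Returning None on a failed match avoids the AttributeError you would get from calling group() on a missing match object.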
I have a string path as:
"Z:\results\cfg3\clear1"
I need a Python string method to capture whatever number comes after cfg but before the next \ . Note that the parts of the string before and after \cfgN\ can change, so I cannot rely on string length. So, basically, with the following 2 versions
"Z:\results\cfg3\clear1"
"Z:\results1\enhanced\cfg1\clear2\final"
the script should return
cfg3 and cfg1 as the answers.
Any ideas using regular expression?
>>> import re
>>> re.findall(r'cfg\d+', r"Z:\results\cfg3\clear1")
['cfg3']
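As a minimal sketch covering both paths from the question: the helper name find_cfg is made up, and the pattern assumes that cfg plus its digits always forms a complete path component. The paths are written as raw strings so the backslashes are not treated as escape sequences.

```python
import re

paths = [
    r"Z:\results\cfg3\clear1",
    r"Z:\results1\enhanced\cfg1\clear2\final",
]

def find_cfg(path):
    # (?<=\\) requires a backslash right before "cfg"; (?=\\|$) requires
    # a backslash or the end of the string right after the digits
    match = re.search(r"(?<=\\)cfg\d+(?=\\|$)", path)
    return match.group() if match else None

for p in paths:
    print(find_cfg(p))
```

This prints cfg3 and then cfg1. If cfg can appear in the middle of a longer folder name, the simpler pattern cfg\d+ without the lookarounds may be the better fit.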
I am trying to write a Python script to analyse a text data file. I want the script to do the following:
find all the time values in one line and compare them. This is my first time writing regular expression syntax, so I started with a small script.
My script is:
import sys
txt = open('1.txt','r')
a = []
for eachLine in txt:
    a.append(eachLine)
import re
pattern = re.compile('\d{2}:\d{2}:\d{2}')
for i in xrange(len(a)):
    print pattern.match(a[i])
#print a
The output is always None.
My text file looks like the picture:
What is the problem? Please help me, thanks a lot.
I am using Python 2.7.2 on Windows XP SP3.
Didn't you miss one of the ":" in your regex? I think you meant
re.compile('\d{2}:\d{2}:\d{2}')
The other problems are:
First, if you want to search the whole text, use search instead of match, because match only succeeds at the start of the string. Second, to access your result you need to call group() on the match object returned by search.
Try it:
import re

pattern = re.compile(r'\d{2}:\d{2}:\d{2}')
with open('1.txt', 'r') as txt:
    for line in txt:
        match = pattern.search(line)
        # search() returns None when the line contains no time
        if match:
            print(match.group())
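To make the difference between match, search and findall concrete, here is a small sketch. The sample line is made up, since the real contents of 1.txt are not shown.

```python
import re

pattern = re.compile(r"\d{2}:\d{2}:\d{2}")
line = "start 08:15:30 end 09:45:10"  # made-up sample line

print(pattern.match(line))           # None: the line does not start with a time
print(pattern.search(line).group())  # 08:15:30
print(pattern.findall(line))         # ['08:15:30', '09:45:10']
```

match only looks at the beginning of the string, search returns the first hit anywhere in it, and findall returns every hit as a list of strings.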
I think you're missing the colons and dots in your regex. Also try using re.search or re.findall instead on the entire text. Like this:
import re
text = open("./1.txt", "r").read()  # or readlines() to make a list of lines
pattern = re.compile(r'\d{2}:\d{2}:\d{2}')
matches = pattern.findall(text)
for i in matches:
    print(i)
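Since the original goal was to compare the times found in one line, one possible follow-up is to convert the matched strings with datetime.strptime. This is only a sketch: the sample line and the helper name times_in_line are assumptions, and strptime raises ValueError for strings such as 99:99:99 that match the pattern but are not valid times.

```python
import re
from datetime import datetime

pattern = re.compile(r"\d{2}:\d{2}:\d{2}")

def times_in_line(line):
    # Convert every HH:MM:SS match in the line to a datetime.time object
    return [datetime.strptime(t, "%H:%M:%S").time() for t in pattern.findall(line)]

line = "start 08:15:30 end 09:45:10"  # made-up sample line
times = times_in_line(line)
print(times[0] < times[1])  # True
```

time objects support the usual comparison operators, so the results can be compared or sorted directly.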
Given this as input:
<IMG alt="Just do it." src="http://25.media.moo.com/moo_kjasdf0nd_500.jpg">
How can I get as the output:
Just-do-it.jpg
Here's a solution using Python's re:
>>> import re
>>> input = '''<IMG alt="Just do it." src="http://25.media.moo.com/moo_kjasdf0nd_500.jpg">'''
>>> pattern = '''.*alt="([^"]*).*src=".*([.][^.]+)"'''
>>> re.match(pattern,input).groups()
('Just do it.', '.jpg')
>>>
I'll leave assembling the parts as an exercise :)
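For completeness, one way to assemble the two captured parts is sketched below. It assumes the rule implied by the expected output: drop the trailing period from the alt text, replace spaces with hyphens, and append the extension.

```python
import re

tag = '<IMG alt="Just do it." src="http://25.media.moo.com/moo_kjasdf0nd_500.jpg">'
pattern = r'.*alt="([^"]*).*src=".*([.][^.]+)"'

alt, ext = re.match(pattern, tag).groups()
# Drop the trailing period, then turn spaces into hyphens
filename = alt.rstrip(".").replace(" ", "-") + ext
print(filename)  # Just-do-it.jpg
```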
I think regex is not necessary for this.
Just parse the string in the right way.
First, strip off the '<', '>' and the tag name.
Then split the remaining by whitespace.
Split each part by '=' to get the attribute names and values.
Then find out the ones with attribute names 'alt' and 'src', then combine their values to get the file name.
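The steps above can be sketched as follows. One caveat: a plain whitespace split would cut the alt value "Just do it." into pieces, so this sketch swaps in shlex.split from the standard library, which keeps quoted values together. The helper name filename_from_tag is made up for illustration, and the sketch assumes every attribute has the form name="value".

```python
import os
import shlex

def filename_from_tag(tag):
    # Strip the angle brackets, then split on whitespace outside quotes
    parts = shlex.split(tag.strip("<>"))
    # parts[0] is the tag name; the rest look like name=value
    attrs = dict(part.split("=", 1) for part in parts[1:])
    stem = attrs["alt"].rstrip(".").replace(" ", "-")
    ext = os.path.splitext(attrs["src"])[1]
    return stem + ext

tag = '<IMG alt="Just do it." src="http://25.media.moo.com/moo_kjasdf0nd_500.jpg">'
print(filename_from_tag(tag))  # Just-do-it.jpg
```

For arbitrary real-world HTML, a proper parser such as the standard library's html.parser is likely to be more robust than either this or a regex.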