Regex: all strings containing "tomcat/logs"

How can I match all strings containing tomcat/logs?
For example: /home/tomcat/logs, /etc/tomcat/logs, /home/folder/tomcat/logs
Thanks.
Edit:
I'm using this to exclude backup directories, so I need a plain regular expression that is independent of any specific language.

You can do something like this (in Python):
>>> import re
>>> string_to_find_in = '/home/tomcat/logs'
>>> m = re.search(r'(.*tomcat/logs)', string_to_find_in)
>>> m.group(0)
'/home/tomcat/logs'
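Since the question asks for a language-independent pattern: the literal tomcat/logs on its own already matches any string that contains it, because an unanchored regex may match anywhere in the string. The leading .* is only needed in tools that require the whole string to match, and there you would use .*tomcat/logs.* instead; the / only needs escaping where it is also the regex delimiter (for example in sed or JavaScript literals). A minimal Python sketch of using the plain pattern to exclude paths; the path list and the filtering step are illustrative assumptions, not part of the original answer:

```python
import re

# Unanchored literal pattern: matches wherever "tomcat/logs" occurs.
PATTERN = re.compile(r"tomcat/logs")

# Hypothetical list of directories to filter.
paths = [
    "/home/tomcat/logs",
    "/etc/tomcat/logs",
    "/home/folder/tomcat/logs",
    "/home/folder/backup",
]

# Keep only the paths that do NOT contain tomcat/logs, i.e. exclude them.
kept = [p for p in paths if not PATTERN.search(p)]
print(kept)  # ['/home/folder/backup']
```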


Regex to extract text from request id

I have a log in which one part is a request ID, and I need to extract some text from it.
Example: RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book
Can anyone help me extract "Book" from it?
Consider string splitting instead:
>>> s = "RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book"
>>> s.split("_")[-1]
'Book'
String splitting is probably simpler and more efficient. If you must use a regular expression, here is an example:
#!/usr/bin/env python3
import re

# Prefix, "_", a run of digits, "_", then capture the trailing word.
print(
    re.findall(r"^\w+_\d+_(\w+)$", 'RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book')
)
# output: ['Book']
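Two further variants, as a minimal sketch (not from the original answers): str.rsplit with maxsplit=1 splits only at the last underscore, and a regex anchored with $ that looks for the last "_" followed by non-underscore characters avoids backtracking through the long numeric part. Both assume the wanted text itself never contains an underscore.

```python
import re

s = "RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book"

# Option 1: split once, from the right; only the last "_" matters.
last_by_split = s.rsplit("_", 1)[-1]

# Option 2: "_" followed by non-underscore characters up to the end of the string.
m = re.search(r"_([^_]+)$", s)
last_by_regex = m.group(1) if m else None

print(last_by_split, last_by_regex)  # Book Book
```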

Regex matches string, but doesn't group correctly

While matching an email address, after I match something like yasar#webmail, I want to capture one or more (\.\w+) groups (what I'm actually doing is a little more complicated; this is just an example). I tried adding (\.\w+)+, but it only captures the last match. For example, yasar#webmail.something.edu.tr matches, but only .tr is captured after the yasar#webmail part, so I lose the .something and .edu groups. Can I do this with Python regular expressions, or would you suggest matching everything first and splitting the subpatterns later?
The re module doesn't support repeated captures, but the third-party regex module does:
>>> import regex
>>> m = regex.match(r'([.\w]+)#((\w+)(\.\w+)+)', 'yasar#webmail.something.edu.tr')
>>> m.groups()
('yasar', 'webmail.something.edu.tr', 'webmail', '.tr')
>>> m.captures(4)
['.something', '.edu', '.tr']
In your case I'd go with splitting the repeated subpatterns later. It leads to simple, readable code; see, e.g., the code in Li-aung Yip's answer.
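For comparison, a minimal sketch of the "split later" approach using only the standard re module. The pattern is the one from the regex example above; with re, group 4 keeps only its last repetition, so the pieces are recovered by scanning the text of group 2 again with findall().

```python
import re

s = "yasar#webmail.something.edu.tr"
m = re.match(r"([.\w]+)#((\w+)(\.\w+)+)", s)

# With re, a repeated group only remembers its last repetition.
print(m.group(4))  # .tr

# Recover every repetition by re-scanning the full captured domain.
parts = re.findall(r"\.\w+", m.group(2))
print(parts)  # ['.something', '.edu', '.tr']
```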
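You can work around (\.\w+)+ capturing only its last repetition by wrapping the whole repetition in one group instead: ((?:\.\w+)+). The inner (?:...) does not capture, so the outer group holds the full run (e.g. .something.edu.tr), which you can then split.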
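A minimal sketch of that workaround with the re module; the surrounding pattern for the user and host parts is an assumption modelled on the other answers:

```python
import re

s = "yasar#webmail.something.edu.tr"

# (?:\.\w+)+ repeats without capturing; the outer (...) captures the whole run.
m = re.match(r"[\w.]+#(\w+)((?:\.\w+)+)", s)
print(m.groups())  # ('webmail', '.something.edu.tr')

# Split the run; the leading "." produces an empty first item, so drop it.
pieces = m.group(2).split(".")[1:]
print(pieces)  # ['something', 'edu', 'tr']
```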
This will work:
>>> import re
>>> regexp = r"[\w\.]+#(\w+)(\.\w+)?(\.\w+)?(\.\w+)?(\.\w+)?(\.\w+)?"
>>> email_address = "william.adama#galactica.caprica.fleet.mil"
>>> m = re.match(regexp, email_address)
>>> m.groups()
('galactica', '.caprica', '.fleet', '.mil', None, None)
But it's limited to six groups: the host name plus at most five dotted parts. A better way to do this would be:
>>> m = re.match(r"[\w\.]+#(.+)", email_address)
>>> m.groups()
('galactica.caprica.fleet.mil',)
>>> m.group(1).split('.')
['galactica', 'caprica', 'fleet', 'mil']
Note that these regexes are fine as long as the email addresses are simple, but there are many valid addresses they will break on. See this question for a detailed treatment of email-address regexes.
This is what you are looking for:
>>> import re
>>> s="yasar#webmail.something.edu.tr"
>>> r=re.compile(r"\.\w+")
>>> m=r.findall(s)
>>> m
['.something', '.edu', '.tr']

How to specify string variables as unicode strings for pattern and text in regex matching?

>>> import re
>>> re.match(u'^[一二三四五六七]、', u'一、')
If the pattern and the text are stored in variables (for example, they were read from text files),
>>> myregex='^[一二三四五六七]、'
>>> mytext='一、'
How shall I specify myregex and mytext to re.match, in the same way as re.match(u'^[一二三四五六七]、', u'一、')? Thanks.
In Python 2, simply decode both byte strings before matching:
re.match(myregex.decode('utf-8'), mytext.decode('utf-8'))
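In Python 3, str is already a Unicode type (and has no decode() method), so the decode step only applies when you actually have bytes. A minimal sketch, assuming the pattern and text arrive as UTF-8 bytes, e.g. from files opened in binary mode; the encode() calls below merely stand in for that file content. Opening the files in text mode with encoding='utf-8' gives you str directly and skips the decode entirely.

```python
import re

# Stand-ins for UTF-8 bytes read from files opened with mode 'rb' (assumption).
raw_pattern = '^[一二三四五六七]、'.encode('utf-8')
raw_text = '一、'.encode('utf-8')

# Decode once to str; re then works on Unicode text directly.
myregex = raw_pattern.decode('utf-8')
mytext = raw_text.decode('utf-8')

m = re.match(myregex, mytext)
print(m.group())  # 一、
```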

Retrieving contents of a CSS Selector

I would like to extract "1381912680" from the following code:
[<abbr class="timestamp" data-utime="1381912680"></abbr>]
Using Python 2.7, this is what I currently have in my code to get to that stage:
s = soup.find_all("abbr", { "class" : "timestamp" })
print s
Should I use regex or can BS do it on its own?
EDIT
I tried using a regex, but with no luck:
import re
regex = 'data-utime=\"(\d+)\"'
x = re.compile(regex)
x2 = re.findall(x, s)
print x2
I got: TypeError: expected string or buffer
class is a reserved word in Python, so BeautifulSoup uses the class_ keyword argument instead:
s= soup.find("abbr", class_="timestamp")
But the <abbr> element is empty, so read the value from its data-utime attribute (e.g. s['data-utime']) rather than from its text.
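BeautifulSoup can do this on its own, without a regex. A minimal sketch, assuming bs4 with the built-in "html.parser": find() returns a single Tag (or None), and indexing a Tag like a dict returns the attribute value; for several matches, loop over find_all().

```python
from bs4 import BeautifulSoup

html = '<abbr class="timestamp" data-utime="1381912680"></abbr>'
soup = BeautifulSoup(html, "html.parser")

# One element: index the Tag like a dict to read the attribute.
tag = soup.find("abbr", class_="timestamp")
print(tag["data-utime"])  # 1381912680

# Several elements: collect the attribute from each Tag in the ResultSet.
stamps = [t["data-utime"] for t in soup.find_all("abbr", class_="timestamp")]
print(stamps)  # ['1381912680']
```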
You could use the following regex to extract the number within the double quotes:
(?<=data-utime=\")[^\"]*
The Python code would be:
>>> import re
>>> s = '[<abbr class="timestamp" data-utime="1381912680"></abbr>]'
>>> m = re.findall(r'(?<=data-utime=\")[^\"]*', s)
>>> m
['1381912680']
Explanation:
(?<=data-utime=\") is a lookbehind: it asserts that the current position is immediately preceded by data-utime=" without including that text in the match.
[^\"]* matches any character except " zero or more times, i.e. everything up to the closing ".
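For comparison, the capturing-group pattern from the question's EDIT gives the same result once it is applied to an actual string. A minimal sketch: re.findall() needs a str, so the TypeError in the question most likely came from passing it the list-like ResultSet returned by find_all(); applying the regex to str(...) of that result, or reading the attribute directly, avoids it.

```python
import re

s = '[<abbr class="timestamp" data-utime="1381912680"></abbr>]'

# Lookbehind: the prefix must precede the match but is not part of it.
by_lookbehind = re.findall(r'(?<=data-utime=")[^"]*', s)

# Capturing group: with exactly one group, findall() returns that group's text.
by_group = re.findall(r'data-utime="(\d+)"', s)

print(by_lookbehind, by_group)  # ['1381912680'] ['1381912680']
```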

Regular expression syntax in python

I'm trying to write a Python script to analyse a data .txt file. I want the script to find all the time values in each line and compare them. This is my first time writing regex syntax, so I wrote a small script first.
My script is:
import sys
txt = open('1.txt','r')
a = []
for eachLine in txt:
    a.append(eachLine)
import re
pattern = re.compile('\d{2}:\d{2}:\d{2}')
for i in xrange(len(a)):
    print pattern.match(a[i])
#print a
The output is always None.
My text file looks like the picture:
What's the problem? Please help. Thanks a lot.
I'm using Python 2.7.2 on Windows XP SP3.
Didn't you miss one of the ":" in your regex? I think you meant
re.compile('\d{2}:\d{2}:\d{2}')
The other problems are:
First, match() only matches at the beginning of the string; to find the pattern anywhere in the line, use search() instead. Second, to get the matched text, call group() on the match object returned by search().
Try this:
import re

txt = open('1.txt', 'r')
a = []
for eachLine in txt:
    a.append(eachLine)

pattern = re.compile(r'\d{2}:\d{2}:\d{2}')
for line in a:
    match = pattern.search(line)  # search() scans the whole line
    if match:  # search() returns None for lines without a time
        print(match.group())
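To see the match()/search() difference in isolation, here is a minimal sketch with a made-up log line (hypothetical; it only assumes some text appears before the time):

```python
import re

pattern = re.compile(r"\d{2}:\d{2}:\d{2}")
line = "log entry at 12:34:56"  # hypothetical line with text before the time

# match() only tries position 0, where "l" is not a digit.
print(pattern.match(line))  # None
# search() scans forward until the pattern fits somewhere in the line.
print(pattern.search(line).group())  # 12:34:56
```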
I think you're missing the colons and dots in your regex. Also try using re.search or re.findall on the entire text instead, like this:
import re

with open("./1.txt", "r") as f:
    text = f.read()  # or f.readlines() to get a list of lines
pattern = re.compile(r'\d{2}:\d{2}:\d{2}')
matches = pattern.findall(text)
for i in matches:
    print(i)
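The question's actual goal was to find all the times on a line and compare them. A minimal sketch, assuming every time is written as zero-padded HH:MM:SS (the sample line is made up, not taken from the question's file): collect the matches per line with findall(), then parse them with datetime.strptime so they compare as time values. strptime raises ValueError for impossible times such as 99:99:99.

```python
import re
from datetime import datetime

# \b keeps the pattern from matching inside longer digit runs.
TIME = re.compile(r"\b\d{2}:\d{2}:\d{2}\b")

def times_in_line(line):
    """Return every HH:MM:SS value found in line as a datetime.time."""
    return [datetime.strptime(t, "%H:%M:%S").time() for t in TIME.findall(line)]

# Hypothetical sample line; the real file layout is an assumption.
line = "start 08:15:02 end 08:14:59 retry 09:00:00"
times = times_in_line(line)

print(max(times))              # 09:00:00
print(times == sorted(times))  # False: the times are not in increasing order
```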