I have a string path as:
"Z:\results\cfg3\clear1"
I need a Python string method to capture whatever number comes after cfg but before the next \ . Note that the text before and after the \cfgN\ component can change, so I cannot rely on string length. So, basically, with the following 2 versions
"Z:\results\cfg3\clear1"
"Z:\results1\enhanced\cfg1\clear2\final"
the script should return
cfg3 and cfg1 as the answers.
Any ideas using a regular expression?
>>> import re
>>> re.findall(r'.*(cfg\d+).*', r"Z:\results\cfg3\clear1")
['cfg3']
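To cover both example paths from the question, a minimal sketch using re.search might look like this. The helper name extract_cfg is hypothetical, and the paths are written as raw strings so that sequences such as \r are not interpreted as escapes.

```python
import re

def extract_cfg(path):
    """Return the first 'cfg<digits>' component found in path, or None.

    Hypothetical helper for illustration; not part of any library.
    """
    match = re.search(r'cfg\d+', path)
    return match.group(0) if match else None

# Raw strings keep the backslashes literal
print(extract_cfg(r"Z:\results\cfg3\clear1"))
print(extract_cfg(r"Z:\results1\enhanced\cfg1\clear2\final"))
```

re.search stops at the first match, which is enough here because each path is assumed to contain a single cfg component.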
Related
I have a log in which part of the text is a request ID, and I need to extract one piece of it.
Ex: RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book,
Can anyone please help with extracting Book from it?
Consider string splitting instead
>>> s = "RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book"
>>> s.split("_")[-1]
'Book'
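If only the last field is needed, a small variation is to split from the right so the rest of the string is left alone. This is just a sketch; the variable names are illustrative.

```python
s = "RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book"

# rsplit with maxsplit=1 splits only at the last underscore
last = s.rsplit("_", 1)[-1]

# rpartition returns (head, separator, tail) around the last underscore
head, sep, tail = s.rpartition("_")

print(last)
print(tail)
```

Both give the text after the final underscore. If the string contains no underscore at all, both return the whole string unchanged.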
String splitting will likely be more efficient. If you must use regular expressions, here is an example:
#!/usr/bin/env python3
import re
print(
    re.findall(r"^\w+_\d+_(\w+)$", 'RES_1621480647_49610052479341623017223137119508459972977816017376903362_Book')
)
# output: ['Book']
I am trying to extract words from string content like
export {AbcClient} from ...
export {AdcClient} from ..
How can I use a regular expression to get an array of strings? In this example the result should be [AbcClient, AdcClient].
Thanks
Most programming languages have the ability to do a regex find all. In Python, we can try:
import re
inp = """export {AbcClient} from ...
export {AdcClient} from .."""
matches = re.findall(r'\bexport \{(.*?)\}', inp)
print(matches)
This prints:
['AbcClient', 'AdcClient']
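A slightly stricter variant, as a sketch: anchor each match to the start of a line and only accept identifier characters inside the braces. The extra spaces in the second sample line are a hypothetical variation, not something from the question.

```python
import re

# Sample input modeled on the question; the spacing in line 2 is a made-up variation
inp = """export {AbcClient} from ...
export { AdcClient } from .."""

# re.MULTILINE makes ^ match at the start of every line
names = re.findall(r'^export\s*\{\s*(\w+)\s*\}', inp, flags=re.MULTILINE)
print(names)
```

This assumes one name per export statement, as in the question's examples.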
Currently I have the following:
for child in root:
    if child.attrib['startDateTime'] == fr'2019-11-10T{test_time}:\d{{2}}':
        print('Found')
Which isn't working. The goal here is to match the datetime string with my own string where test_time is formatted as 'HH:MM', and the seconds digits can be anything from 00 - 60.
Is this the correct approach for a problem like this? Or am I better off converting to datetime objects?
It's not the f-string that's the problem. The r prefix on a string doesn't mean "regex", it means "raw" - i.e. backslashes are taken literally. For regex, use the re module. Here's an example using Pattern.match:
import re
regex = fr'2019-11-10T{test_time}:\d{{2}}'
pattern = re.compile(regex)
for child in root:
    if pattern.match(child.attrib['startDateTime']):
        print('Found')
You can put a regexp in an f-string, but you need to use the re module to match with it, not ==.
if re.match(fr'2019-11-10T{test_time}:\d{{2}}', child.attrib['startDateTime']):
    print('Found')
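On the other part of the question (converting to datetime objects), here is a minimal sketch, assuming the attribute is an ISO 8601 string such as '2019-11-10T14:30:45' with no timezone. The helper name matches_minute and its day parameter are hypothetical.

```python
from datetime import datetime

def matches_minute(stamp, test_time, day="2019-11-10"):
    """Return True if stamp falls on `day` at the HH:MM given by test_time.

    Hypothetical helper; assumes stamp looks like 'YYYY-MM-DDTHH:MM:SS'.
    """
    parsed = datetime.fromisoformat(stamp)
    wanted = datetime.strptime(f"{day}T{test_time}", "%Y-%m-%dT%H:%M")
    # Drop seconds so any second within the minute compares equal
    return parsed.replace(second=0, microsecond=0) == wanted

print(matches_minute("2019-11-10T14:30:45", "14:30"))
```

One caveat: datetime only accepts seconds from 00 to 59, so a value of 60 would raise ValueError here, whereas the regex version would accept it.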
I am trying to extract some questions from a web site using BeautifulSoup, and want to use regular expression to get these questions from the web. Is my regular expression incorrect? And how can I combine soup.find_all with re.compile?
I have tried the following:
from bs4 import BeautifulSoup
import requests
from urllib.request import urlopen
import urllib
import re
url = "https://www.sanfoundry.com/python-questions-answers-variable-names/"
headers = {'User-Agent':'Mozilla/5.0'}
page = requests.get(url)
soup = BeautifulSoup(page.text, "lxml")
a = soup.find_all("p")
for m in a:
    print(m.get_text())
Now I have some text containing the questions like "1. Is Python case sensitive when dealing with identifiers?". I want to use r"[^.!?]+\?" to filter out the unwanted text, but I have the following error:
a = soup.find_all("p" : re.compile(r'[^.!?]+\?'))
a = soup.find_all("p" : re.compile(r'[^.!?]+\?'))
^
SyntaxError: invalid syntax
I checked my regular expression on https://regex101.com, it seems right. Is there a way to combine the regular expression and soup.find_all together?
One way to find p elements containing a ? is to
define a criterion function:
def criterion(tag):
    return tag.name == 'p' and re.search(r'\?', tag.text)
and use it in find_all:
pars = soup.find_all(criterion)
But you want to print only questions, not the whole paragraphs
from pars.
To match these questions, define a pattern:
pat = re.compile(r'\d+\.\s[^?]+\?')
(a sequence of digits, a dot, a space, then a sequence of chars other
than ? and finally a ?).
Note that, in general, one paragraph may contain multiple
questions. So the loop processing the found paragraphs should
use findall to find all questions in the current paragraph
(the result is a list of the matched strings)
and then print all of them on separate lines, using
join with \n as the separator.
So the whole loop should be:
for m in pars:
    questions = pat.findall(m.get_text())
    print('\n'.join(questions))
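To try the pattern without fetching the page, here is a small offline sketch. The text variable is a hypothetical stand-in for what m.get_text() might return for a paragraph holding two questions and their answer options.

```python
import re

pat = re.compile(r'\d+\.\s[^?]+\?')

# Hypothetical paragraph text standing in for m.get_text()
text = ("1. Is Python case sensitive when dealing with identifiers? a) yes b) no "
        "2. What is the maximum possible length of an identifier? a) 31 characters")

questions = pat.findall(text)
print('\n'.join(questions))
```

Only the numbered questions are matched; the answer options are skipped because they do not start with digits followed by a dot and a space.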
Not a big regex fan, so tried this:
for q in a:
    for i in q:
        if '?' in i:
            print(i)
Output:
1. Is Python case sensitive when dealing with identifiers?
2. What is the maximum possible length of an identifier?
3. Which of the following is invalid?
4. Which of the following is an invalid variable?
5. Why are local variable names beginning with an underscore discouraged?
6. Which of the following is not a keyword?
8. Which of the following is true for variable names in Python?
9. Which of the following is an invalid statement?
10. Which of the following cannot be a variable?
I have been working on a Python project and don't have much experience. If I have the string Synset'dog.n.01' and want to extract only the string dog, what should I do?
That is, I want to extract whatever string sits between Synset' and .n.01'.
I suggest using re (regex):
import re
s = "Synset'dog.n.01'"
# Escape the dots so they match literal periods rather than any character
result = re.search(r"Synset'(.*)\.n\.01'", s)
print(result.group(1))
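If other synset names need handling too, a more general sketch could look like this. It assumes names follow the word.pos.number shape seen in 'dog.n.01' (a single-letter part of speech and a numeric sense). The helper name synset_lemma is hypothetical.

```python
import re

def synset_lemma(text):
    """Return the word part of a string like "Synset'dog.n.01'", or None.

    Hypothetical helper; assumes a single-letter part of speech
    followed by a numeric sense number.
    """
    match = re.search(r"Synset'(.+?)\.[a-z]\.\d+'", text)
    return match.group(1) if match else None

print(synset_lemma("Synset'dog.n.01'"))
```

Checking the match for None before calling group avoids an AttributeError when the input does not have the expected shape.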