Python regex within an F string? - regex

Currently I have the following:
for child in root:
    if child.attrib['startDateTime'] == fr'2019-11-10T{test_time}:\d{{2}}':
        print('Found')
Which isn't working. The goal here is to match the datetime string with my own string where test_time is formatted as 'HH:MM', and the seconds digits can be anything from 00 - 60.
Is this the correct approach for a problem like this? Or am I better off converting to datetime objects?

It's not the f-string that's the problem. The r prefix on a string doesn't mean "regex", it means "raw" - i.e. backslashes are taken literally. For regex, use the re module. Here's an example using Pattern.match:
import re
regex = fr'2019-11-10T{test_time}:\d{{2}}'
pattern = re.compile(regex)
for child in root:
    if pattern.match(child.attrib['startDateTime']):
        print('Found')

You can put a regexp in an f-string, but you need to use the re module to match with it, not ==.
if re.match(fr'2019-11-10T{test_time}:\d{{2}}', child.attrib['startDateTime']):
    print('Found')
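On the second part of the question: either approach can work. Here is a minimal sketch, assuming test_time is a plain 'HH:MM' string and the timestamps look like the one in the question. The stamps list and the same_minute helper are made up for illustration. re.escape guards against regex metacharacters in the interpolated value, and fullmatch rejects trailing text that match would let through.

```python
import re
from datetime import datetime

test_time = "14:30"  # hypothetical value in HH:MM form

# Doubled braces survive the f-string as a literal {2} quantifier.
pattern = re.compile(rf"2019-11-10T{re.escape(test_time)}:\d{{2}}")

# Made-up sample values standing in for child.attrib['startDateTime'].
stamps = ["2019-11-10T14:30:05", "2019-11-10T14:31:00", "2019-11-10T14:30:59Z"]
regex_hits = [s for s in stamps if pattern.fullmatch(s)]


def same_minute(stamp, day="2019-11-10", hhmm=test_time):
    """datetime alternative: parse the stamp, then compare down to the minute."""
    try:
        parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False  # not in the expected format
    return parsed.strftime("%Y-%m-%d %H:%M") == f"{day} {hhmm}"


datetime_hits = [s for s in stamps if same_minute(s)]
```

The regex version is shorter; the datetime version also rejects impossible values such as hour 25, which the regex would not check.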

Related

Regex match everything not between a pair of characters

Suppose a string (representing elapsed time in the format HH:MM:ss) like this:
"123:59:00"
I want to match everything except the digits for the minutes, i.e. the regex should match the hours and seconds parts and not the number between the colons:
"123:59:00"
In the example, the 59 should be the only part unmatched.
Is there any way to accomplish this with a js regex?
EDIT: I'm asking explicitly for a regex, because I'm using the Notion Formula API and can only use JS regex here.
You don't necessarily need to use RegEx for this. Use split() instead.
const timeString = "12:59:00";
const [hours, _, seconds] = timeString.split(":");
console.log(hours, seconds);
If you want to use Regex you can use the following:
const timeString = "12:59:00";
const matches = timeString.match(/(?<hours>^\d{2}(?=:\d{2}:))|(?<seconds>(?<=:\d{2}:)\d{2}$)/g);
console.log(matches);
// if you want to include the colons use this
const matchesWithColons = timeString.match(/(?<hours>^\d{2}:(?=\d{2}:))|(?<seconds>(?<=:\d{2}):\d{2}$)/g);
console.log(matchesWithColons);
You can drop the named groups ?<hours> and ?<seconds>.
Using split() might be the most canonical way to go, but here is a regex approach using match():
var input = "123:59:00";
var parts = input.match(/^[^:]+|[^:]+$/g);
console.log(parts);
If you want to also capture the trailing/leading colons, then use this version:
var input = "123:59:00";
var parts = input.match(/^[^:]+:|:[^:]+$/g);
console.log(parts);
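For comparison, the two patterns from this answer should behave the same way in Python's re module. This is only a sketch to show what each pattern returns; the variable names are made up, and the original question still needs the JS form.

```python
import re

elapsed = "123:59:00"

# Leading run of non-colons, or trailing run of non-colons.
without_colons = re.findall(r"^[^:]+|[^:]+$", elapsed)

# Same idea, but each match keeps the colon next to the minutes.
with_colons = re.findall(r"^[^:]+:|:[^:]+$", elapsed)
```

Because the character class is [^:] rather than \d{2}, the hours part may have any number of digits.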
Could also work
/^([0-9]{2})\:[0-9]{2}\:([0-9]{2})$/mg

how to trim the specific lines starting and ending with character in a string in java

I have stored a multiline string in Java as shown in the code below. It prints the following output:
aa
bb
hhh me $ hdddhd hhhdhhdhh
hrx
$
dddsss
I don't need the line starting with hhh me $, the lines in between, and everything up to $.
I need to get output as
aa
bb
hrx
dddsss
I have tried this in Eclipse:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class dummyFile {
    public static void main(String[] args) throws FileNotFoundException {
        String line = new StringBuilder()
            .append("aa\n\n")
            .append("bb\n\n")
            .append("hhh me $ hdddhd hhhdhhdhh\n\n")
            .append("hrx\n\n")
            .append("$\n\n")
            .append("dddsss")
            .toString();
        System.out.println(line);
        String pattern = "hhh me (.)";
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(line);
        if (m.find())
        {
            System.out.println(m.group(1));
        }
        if (line.contains("hhh me "+ m.group(1)))
        {
            line.replace(
                line.substring(
                    line.indexOf("banner mod " +m.group(1)),
                    line.lastIndexOf(m.group(1))+1
                ),
                ""
            )
            .replace("\n\n", "\n");
        }
        System.out.println(line);
    }
}
Could someone please help?
Phew, that was a fun one (if you're insane like me!)
(?!.*?\$.*?)^.+?(?:\n\n|$).*?
You'll need the regex options global and multiline. For most regex instances that's just a matter of formatting it like:
/(?!.*?\$.*?)^.+?(?:\n\n|$).*?/gm
Java has no regex literals, though. I believe you pass Pattern.MULTILINE to Pattern.compile and call Matcher.find() in a loop to collect every match, but I'm not 100% sure.
That pattern will give you multiple matches, which you can glue back together with StringBuilder, for example.
If you REALLY want, I'll edit my answer and break down exactly what it's doing if you need me to.
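Going by the expected output in the question (which keeps hrx), one reading of the task is "keep every non-empty line that does not contain $". Below is a rough sketch of that reading in Python rather than Java, just to show the multiline flag and the step of gluing the matches back together. The pattern is a simplified stand-in, not the one above.

```python
import re

text = "aa\n\nbb\n\nhhh me $ hdddhd hhhdhhdhh\n\nhrx\n\n$\n\ndddsss"

# With re.MULTILINE, ^ and $ anchor at every line boundary, so each match
# is one whole non-empty line with no literal dollar sign in it.
kept_lines = re.findall(r"^[^$\n]+$", text, flags=re.MULTILINE)

# Glue the surviving lines back together.
result = "\n".join(kept_lines)
```

If the intent is instead to cut everything from the first $ through the closing $, this sketch does not do that.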
This sounds a lot like homework that I don't want to do for you. But I'll throw some stuff up here that will hopefully help you figure it out.
Your regex isn't going to match what you want. (.) will capture a single character, and it won't capture new line characters. So you'll have to fix that. + matches one or more of the previous character set and * matches zero or more of the previous character. Seems like you also want to make sure you're matching from $ to $. You're working inside Java strings so you have to escape it.
Try something like this for your regex:
final String pattern = "hhh me \\$([a-zA-Z\\s\n\r]*)\\$";
Then in Eclipse or in Java Docs look around the Matcher class for some helpful methods to find/replace matches you've got (The stuff inside () in a regular expression).
Maybe something like Matcher.replaceFirst() will help.

can regex be used to index/slice parts of string?

So I have a list of serial numbers in the following format:
Serial Number: CN073GTT74445714892L
I was wondering if regex can be used to extract just the last 6 chars?
So in this case, it is 14892L
Forgot to mention: there is other unrelated text in the document, so how would I make sure the match is always after "Serial Number: "?
EDIT - this worked (?<=\s.{29}).{6}$
You can do it with a regex:
.{6}$
But you can also do it without a regex, which is the advisable solution. E.g. in Ruby:
"CN073GTT74445714892L"[-6..-1]
in Python:
In [4]: "CN073GTT74445714892L"[-6:]
Out[4]: '14892L'
Regex is ideally used to identify patterns. If it's only the last 6 digits you're interested in, then a normal string manipulation will work too.
e.g in Python, you could use:
serial = "CN073GTT74445714892L"  # avoid naming a variable str; it shadows the built-in
serial[-6:]
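For the follow-up about other text in the document, one option is to let the regex find the serial numbers and leave the "last 6" part to slicing. A small sketch, assuming each serial sits on its own line after "Serial Number: "; the sample document is invented.

```python
import re

# Invented sample document with unrelated lines around the serials.
document = """Order notes: handle with care
Serial Number: CN073GTT74445714892L
Warehouse: B2
Serial Number: AB123456789XYZ
"""

# Step 1: anchor on the label and capture the whole serial.
serials = re.findall(r"Serial Number:\s*(\S+)", document)

# Step 2: plain slicing for the last six characters.
last_six = [s[-6:] for s in serials]

# Regex-only variant: lazily skip characters until six remain before end of line.
regex_only = re.findall(r"Serial Number:\s*\S*?(\S{6})$", document, flags=re.MULTILINE)
```

Both variants assume a serial is at least six characters long and contains no whitespace.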

python regex for parsing filenames

I'm the worst for regex in general, but in python... I need help in fixing my regex for parsing filenames, e.g:
>>> from re import search, I, M
>>> x="/almac/data/vectors_puces_T12_C1_00_d2v_H50_corr_m10_70.mtx"
>>> for i in range(6):
...     print search(r"[vectors|pairs]+_(\w+[\-\w+]*[0-9]{0,4})([_T[0-9]{2,3}_C[1-9]_[0-9]{2}]?)(_[d2v|w2v|coocc\w*|doc\w*]*)(_H[0-9]{1,4})(_[sub|co[nvs{0,2}|rr|nc]+]?)(_m[0-9]{1,3}[_[0-9]{0,3}]?)",x, M|I).group(i)
...
It gives the following output:
vectors_puces_T12_C1_00_d2v_H50_corr_m10_70
puces_T
12_C1_00
_d2v
_H50
_corr
However, what I need is
vectors_puces_T12_C1_00_d2v_H50_corr_m10_70
puces
T12_C1_00
_d2v
_H50
_corr
I don't know what exactly is wrong. Thank you
One problem is that \w would also match underscore which you want to be a delimiter between puces and T12_C1_00 in this case. Replace the \w with A-Za-z\-. Also, you should put the underscore between the appropriate saving groups:
(?:vectors|pairs)_([A-Za-z\-]+[0-9]{0,4})_([T[0-9]{2,3}_C[1-9]_[0-9]{2}]?)...
                                     HERE^
Works for me:
>>> import re
>>> re.search(r"(?:vectors|pairs)_([A-Za-z\-]+[0-9]{0,4})_([T[0-9]{2,3}_C[1-9]_[0-9]{2}]?)(_[d2v|w2v|coocc\w*|doc\w*]*)(_H[0-9]{1,4})(_[sub|co[nvs{0,2}|rr|nc]+]?)(_m[0-9]{1,3}[_[0-9]{0,3}]?)",x, re.M|re.I).groups()
('puces', 'T12_C1_00', '_d2v', '_H50', '_corr', '_m10_70')
I've also replaced the [vectors|pairs] with (?:vectors|pairs) which is, I think, what you've actually meant - match either vectors or pairs literal strings, (?:...) is a syntax for a non-capturing group.
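To see why the square brackets matter, here is a tiny sketch. The test strings are made up; the point is that [vectors|pairs] is a set of single characters, not a choice between two words.

```python
import re

# A character class matches any ONE of the listed characters, so repeating it
# accepts any jumble of those letters (and the literal pipe).
jumble_in_class = re.fullmatch(r"[vectors|pairs]+", "srotcev") is not None

# A non-capturing group with | matches one of the two whole words.
jumble_in_group = re.fullmatch(r"(?:vectors|pairs)", "srotcev") is not None
word_in_group = re.fullmatch(r"(?:vectors|pairs)", "pairs") is not None
```

The same applies to the other bracketed alternations in the original pattern, such as [d2v|w2v|coocc\w*|doc\w*].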
I'm not sure what your goal is, but you seem to be interested in what's between each underscore, so it may be simpler to split by it:
import os
path, filename = os.path.split(x)
filename = filename.split('.')[0]  # keep only the part before the extension
fileparts = filename.split('_')
fileparts will then be this list:
vectors
puces
T12
C1
00
d2v
H50
corr
m10
70
And you can validate / inspect any part, e.g. if fileparts[0] == 'vectors' or tpart = fileparts[2:4]...

RegEx for a price in £

i have: \£\d+\.\d\d
should find: £6.95 £16.95 etc
+ is one or more
\. is the dot
\d is for a digit
am i wrong? :(
JavaScript for Greasemonkey
// ==UserScript==
// @name CurConvertor
// @namespace CurConvertor
// @description noam smadja
// @include http://www.zavvi.com/*
// ==/UserScript==
textNodes = document.evaluate(
    "//text()",
    document,
    null,
    XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
    null);
var searchRE = /\£[0-9]\+.[0-9][0-9];
var replace = 'pling';
for (var i=0;i<textNodes.snapshotLength;i++) {
    var node = textNodes.snapshotItem(i);
    node.data = node.data.replace(searchRE, replace);
}
when i change the regex to /Free for example it finds and changes. but i guess i am missing something!
Had this written up for your last question just before it was deleted.
Here are the problems you're having with your GM script.
You're checking absolutely every text node on the page for some reason. This isn't causing it to break but it's unnecessary and slow. It would be better to look for text nodes inside .price nodes and .rrp .strike nodes instead.
When creating new regexp objects in this way, backslashes must be escaped, ex:
var searchRE = new RegExp('\\d\\d','gi');
not
var searchRE = new RegExp('\d\d','gi');
So you can add the backslashes, or create your regex like this:
var searchRE = /\d\d/gi;
Your actual regular expression is only checking for numbers like ##ANYCHARACTER##, and will ignore £5.00 and £128.24
Your replacement needs to be either a string or a callback function, not a regular expression object.
Putting it all together
textNodes = document.evaluate(
    "//p[contains(@class,'price')]/text() | //p[contains(@class,'rrp')]/span[contains(@class,'strike')]/text()",
    document,
    null,
    XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
    null);
var searchRE = /£(\d+\.\d\d)/gi;
var replace = function(str,p1){return "₪" + ( (p1*5.67).toFixed(2) );}
for (var i=0,l=textNodes.snapshotLength;i<l;i++) {
    var node = textNodes.snapshotItem(i);
    node.data = node.data.replace(searchRE, replace);
}
Changes:
XPath now includes only p.price and p.rrp span.strike nodes
Search regular expression created with /regex/ instead of new RegExp
Search variable now includes target currency symbol
Replace variable is now a function that replaces the currency symbol with a new symbol, and multiplies the first captured substring (the number) by 5.67
for loop sets a variable to the snapshot length at the beginning of the loop, instead of checking textNodes.snapshotLength at the beginning of every loop.
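The callback-replacement idea is not specific to JavaScript. As a rough sketch, the same search-and-convert step could look like this in Python, where re.sub accepts a function. The 5.67 rate is the one used in the script above; the sample text and the names RATE and convert are invented.

```python
import re

RATE = 5.67  # conversion rate taken from the script above


def convert(match):
    """Replace one matched price with the converted amount and new symbol."""
    amount = float(match.group(1)) * RATE
    return "₪" + format(amount, ".2f")


# Invented sample text; the capture group holds the number without the £ sign.
converted = re.sub(r"£(\d+\.\d\d)", convert, "Now £6.95 (was £16.95)")
```

re.sub replaces every match by default, so there is no equivalent of the g flag to remember.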
Hope that helps!
[edit]Some of these points don't apply, as the original question changed a few times, but the final script is relevant, and the points may still be of interest to you for why your script was failing originally.
You are not wrong, but there are a few things to watch out for:
The £ sign is not a standard ASCII character, so you may have encoding issues, or you may need to enable a Unicode option on your regular expression.
The use of \d is not supported in all regular expression engines. [0-9] or [[:digit:]] are other possibilities.
To get a better answer, say which language you are using, and preferably also post your source code.
£[0-9]+(,[0-9]{3})*\.[0-9]{2}$
This will match anything from £d.dd to £d[dd]*,ddd.dd, so it can handle prices in the hundreds as well as the millions.
The above regexp is not strict about comma placement. It also accepts, for example: £1123213123.23
If you want a stricter regexp, and you're 100% sure that the prices will always follow the comma and period conventions, then use
£[0-9]{1,3}(,[0-9]{3})*\.[0-9]{2}$
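To make the difference between the two patterns concrete, here is a quick sketch that runs both over a few sample prices. The sample list and the names LOOSE and STRICT are made up; the patterns are copied from above.

```python
import re

LOOSE = re.compile(r"£[0-9]+(,[0-9]{3})*\.[0-9]{2}$")
STRICT = re.compile(r"£[0-9]{1,3}(,[0-9]{3})*\.[0-9]{2}$")

# Invented samples: two plain prices, one with separators, one long run of
# digits with no separators, and one with a misplaced comma.
samples = ["£6.95", "£16.95", "£1,234,567.89", "£1123213123.23", "£12,34.56"]

loose_ok = [s for s in samples if LOOSE.search(s)]
strict_ok = [s for s in samples if STRICT.search(s)]
```

Only the strict pattern rejects the unseparated run of digits; both reject the misplaced comma.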
Try your regexps here to see what works for you and what not http://tools.netshiftmedia.com/regexlibrary/
It depends on what flavour of regex you are using - what is the programming language?
Some older regex flavours require the + to be escaped - sed and vi for example.
Also some older versions of regex do not recognise \d as matching a digit.
Most modern regex follow the perl syntax and £\d+\.\d\d should do the trick, but it does also depend on how the £ is encoded - if the string you are matching encodes it differently from the regex then it will not match.
Here is an example in Python 2 - the £ character is represented differently in a regular (byte) string and a unicode string (prefixed with a u):
>>> "£"
'\xc2\xa3'
>>> u"£"
u'\xa3'
>>> import re
>>> print re.match("£", u"£")
None
>>> print re.match(u"£", "£")
None
>>> print re.match(u"£", u"£")
<_sre.SRE_Match object at 0x7ef34de8>
>>> print re.match("£", "£")
<_sre.SRE_Match object at 0x7ef34e90>
>>>
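The session above is Python 2. In Python 3 the same pitfall shows up as str versus bytes: a str pattern only works on str input, a bytes pattern only on bytes, and mixing them raises TypeError instead of quietly returning None. A small sketch with invented sample text:

```python
import re

text = "Was £16.95, now £6.95"   # str: £ is a single code point
raw = text.encode("utf-8")       # bytes: £ becomes the two bytes C2 A3

str_re = re.compile(r"£(\d+\.\d{2})")
bytes_re = re.compile(r"£(\d+\.\d{2})".encode("utf-8"))

prices = str_re.findall(text)
bytes_prices = bytes_re.findall(raw)

# Mixing the two kinds is an error rather than a silent non-match.
try:
    str_re.findall(raw)
    mixed_error = False
except TypeError:
    mixed_error = True
```

Decoding the input once and working in str throughout is usually the simpler route.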
£ isn't an ASCII character, so you need to work out encodings. Depending on the language, you will either need to escape the byte(s) of £ in the regex, or convert all the strings into Unicode before applying the regex.
In Ruby you could just write the following
/£\d+\.\d{2}/
Using the braces to specify number of digits after the point makes it slightly clearer