My regular expression: https://regex101.com/r/oF7pM8/1
This is what I get: http://joxi.ru/J2b54KaI40bbwm
But I need to capture every "num" value (all the digits) and collect them in an array named "num".
The result I need is:
name = house
num = [3 4 5 6 7 8 9]
What am I doing wrong?
P.S. This is a Python regular expression.
The pattern must find all the numbers separately (as an array).
Does (?P<name>house)(?:\s(?P<num>(\d\s+)+)\d?)+? do the job?
My additions to your original in bold: (?P<name>house)(?:\s(?P<num>(\d\s+)+)\d?)+?
Then only the last digit is found, not all of them. I need all of them.
re.match does match every number, but a capturing group inside a repetition keeps only its last iteration, so only the last one is returned. Since you have to post-process the matches anyway in order to assign them to the Python variables name and num, make the pattern simple:
import re
test_string = 'house 3 44 555 6666 777 88 9'
m = re.match(r'(house)((\s\d+)+)', test_string)
name = m.group(1)
num = [int(s) for s in m.group(2).split()]
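The following is a minimal sketch of why the named group in the question ends up holding only one digit, and of the "capture once, split afterwards" approach. The group name nums and the variable last_only are illustrative names, not part of the original question.

```python
import re

test_string = 'house 3 4 5 6 7 8 9'

# A capturing group inside a repetition is overwritten on every iteration,
# so after the match it holds only the final digit.
m = re.match(r'(?P<name>house)(?:\s(?P<num>\d))+', test_string)
last_only = m.group('num')

# Capture the whole run of numbers in one group, then split it in Python.
m2 = re.match(r'(?P<name>house)(?P<nums>(?:\s\d+)+)', test_string)
name = m2.group('name')
num = [int(s) for s in m2.group('nums').split()]

print(name, num, last_only)
```

Python's built-in re module does not keep the individual iterations of a repeated group, which is why the splitting step happens outside the regex.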
I have a set of strings that contain some letters, occasionally a single digit, and then somewhere a run of 2 or 3 digits. I need to match that run of 2 or 3 digits.
I have this:
\w*(\d{2,3})\w*
but then for strings like
AAA1AAA12A
AAA2AA123A
it matches '12' and '23' respectively, i.e. it fails to pick the three digits in the second case.
How do I get those 3 digits?
Here is how you would do it in Java.
The regex matches a group of 2 or 3 digits that has a non-digit on each side.
The while loop uses find() to keep finding matches and prints each captured group. The single digits 1 and 2 and the four-digit 1223 are ignored.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "AAA1AAA12Aksk2ksksk21sksksk123ksk1223sk";
String regex = "\\D(\\d{2,3})\\D";
Matcher m = Pattern.compile(regex).matcher(s);
while (m.find()) {
    System.out.println(m.group(1)); // print only the captured digits
}
prints
12
21
123
Looks like the correct answer would be:
\w*?(\d{2,3})\w*
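Basically, making the preceding expression lazy does the job: the leading \w*? gives up characters as early as possible, so the digit group can take all three digits.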
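A short Python sketch comparing the greedy and lazy prefixes on the two sample strings from the question. The lookaround variant at the end is an alternative I am assuming fits the same goal (a run of exactly 2 or 3 digits); it is not taken from the answers above.

```python
import re

samples = ['AAA1AAA12A', 'AAA2AA123A']

# Greedy prefix: \w* swallows as much as it can and leaves only the
# minimum two digits for the capturing group.
greedy = [re.match(r'\w*(\d{2,3})\w*', s).group(1) for s in samples]

# Lazy prefix: \w*? stops as early as possible, so the group can take
# all three digits when they are available.
lazy = [re.match(r'\w*?(\d{2,3})\w*', s).group(1) for s in samples]

# Alternative: digit runs of length 2 or 3 that are not part of a longer run.
runs = [re.findall(r'(?<!\d)\d{2,3}(?!\d)', s) for s in samples]

print(greedy, lazy, runs)
```

Unlike a pattern that requires a non-digit character on each side, the lookarounds also accept a digit run at the very start or end of the string.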
I want to craft a Ruby regex that matches strings containing exactly x integers. The string should also be able to contain other words.
Example:
If x = 2, the following should match:
"There were 3 cars going over 45 miles per hour"
where the first integer is 3 and the second integer is 45.
I would write
str = "There were 3 cars going over 45 miles per hour"
str.scan(/\d+/).size == 2
#=> true
or
str.gsub(/\d+/).count == 2
#=> true
The latter has the advantage that str.gsub(/\d+/) returns an enumerator, whereas str.scan(/\d+/) creates a temporary array.
See the form of String#gsub that takes an argument but no block, and Enumerable#count.
You may use the general regex pattern:
^\D*(?:\d+\D+){2}$
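You may replace the {2} with however many numbers you expect in the string. Note that this pattern requires at least one non-digit character after each number, so a string that ends in a digit will not match.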
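For comparison, here is a sketch of both approaches in Python (the thread itself is about Ruby). The function names are hypothetical, and the single-regex variant is an assumed adjustment that also accepts strings ending in a digit.

```python
import re

def has_exactly_n_integers(text, n):
    """Count the digit runs and compare with n."""
    return len(re.findall(r'\d+', text)) == n

def has_exactly_n_integers_regex(text, n):
    """Single-pattern variant for n >= 1: one number, then n - 1 more,
    each separated from the previous one by at least one non-digit."""
    pattern = r'\D*\d+(?:\D+\d+){%d}\D*' % (n - 1)
    return re.fullmatch(pattern, text) is not None

sentence = "There were 3 cars going over 45 miles per hour"
print(has_exactly_n_integers(sentence, 2))
print(has_exactly_n_integers_regex(sentence, 2))
```

Counting the matches is usually easier to read; the single pattern is mainly useful when the check has to live inside a larger regex.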
I have a list of strings each telling me after how many iterations an algorithm converged.
string_list = [
"Converged after 1 iteration",
"Converged after 20 iterations",
"Converged after 7 iterations"
]
How can I extract the number of iterations? The result would be [1, 20, 7]. I tried with a regex. Apparently (?<=after )(.*)(?= iteration*) will give me anything in between "after" and "iteration", but this doesn't work:
occursin(string_list[1], r"(?<=after )(.*)(?= iteration*)")
There's a great little Julia package that makes creating regexes easier called ReadableRegex, and as luck would have it the first example in the readme is an example of finding every integer in a string:
julia> using ReadableRegex
julia> reg = @compile look_for(
           maybe(char_in("+-")) * one_or_more(DIGIT),
           not_after = ".",
           not_before = NON_SEPARATOR)
r"(?:(?<!\.)(?:(?:[+\-])?(?:\d)+))(?!\P{Z})"
That regex can now be broadcast over your list of strings:
julia> collect.(eachmatch.(reg, string_list))
3-element Vector{Vector{RegexMatch}}:
[RegexMatch("1")]
[RegexMatch("20")]
[RegexMatch("7")]
To extract information out of a regex, you want to use match and captures:
julia> convergeregex = r"Converged after (\d+) iteration"
r"Converged after (\d+) iteration"
julia> match(convergeregex, string_list[2]).captures[1]
"20"
julia> parse.(Int, [match(convergeregex, s).captures[1] for s in string_list])
3-element Vector{Int64}:
1
20
7
\d+ matches a series of digits (here, the number of iterations), and the parentheses around it indicate that you want the part of the string matched by it to be placed in the captures array of the result.
You don't need the lookbehind and lookahead operators (?<=, ?=) here.
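The same capture-group idea can be sketched in Python for comparison (the thread itself uses Julia). The variable names below are illustrative only.

```python
import re

string_list = [
    "Converged after 1 iteration",
    "Converged after 20 iterations",
    "Converged after 7 iterations",
]

# The parentheses capture just the digits; the surrounding words only
# anchor the match and are not part of the captured group.
pattern = re.compile(r'Converged after (\d+) iteration')
iterations = [int(pattern.search(s).group(1)) for s in string_list]

print(iterations)
```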
I have already managed to extract the information from a single line. Now I have a list of lines like this:
1 1 838028476391 4 23 36 P 1/820-01 *
2 1 838028476490 4 23 36 P 1/820-17 *
3 1 838028474271 4 23 36 P 1/820-21 *
4 1 838028476292 4 23 36 P 1/820-21 *
5 1 838028474263 4 23 36 P 1/820-23 *
6 1 838028473802 4 23 36 P 1/820-21 *
And I need the 12-digit number from every line. I tried this code:
Dim re As String
Dim re18 As String
re18 = "(\d{12})"
Dim r3 As New RegExp
r3.Pattern = re18
r3.IgnoreCase = True
r3.MultiLine = True
If r3.Test(Body) Then
Dim m3 As MatchCollection
Set m3 = r3.Execute(Body)
If m3.Item(0).SubMatches.Count > 0 Then
Dim number
For j = 1 To m3.Count
Set number = m3.Item(j - 1)
MsgBox ("Number: " + number)
Next
End If
End If
I only get the first match. Even if I debug the macro and view m3 in the Watch window, there is only 1 match. I also tried to use the quantifiers * or + after \d{12}.
How do I get this RegEx working?
I also have another question about RegEx: if I want to match something AFTER a specific word, I would put the word at the beginning of the pattern, followed by the numbers or whatever I want. If I execute this regex, does the match INCLUDE the word I put at the beginning of my pattern?
For example: "BUS \d{12}", where I only want the numbers as the result but know that BUS stands before the numbers.
You need to use the Global option (r3.Global = True), not MultiLine. MultiLine changes the behavior of the anchors (^ and $) so they match the beginning and end of each line, not just the beginning and end of the whole text. Global is the option that tells it to find all the matches, not just the first one.
You probably don't need to use the SubMatches property either. Your regex has only the one capturing group, which captures the whole match. That means each match's SubMatches collection will contain only one item, SubMatches(0), and it will hold exactly the same text as the match itself. (Notice that the index of the first group is 0, not 1 as you would expect from working with other regex tools.)
Your second question is where the SubMatches property comes in. If you wanted to find every 12-digit number that follows the word "BUS" you would use a regex like this:
BUS\s*(\d{12})
...and you would retrieve the number from each match like this:
r3.Global = True
Set m3 = r3.Execute(Body)
For Each myMatch In m3
    ' SubMatches(0) holds the text of the first capturing group
    MsgBox "Number: " & myMatch.SubMatches(0)
Next
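As a rough analogue in Python rather than VBA, re.findall returns every match, much like Execute with Global = True, and returns only the captured group when the pattern contains one. The first two lines of body come from the question; bus_text is hypothetical sample data made up for illustration.

```python
import re

body = (
    "1 1 838028476391 4 23 36 P 1/820-01 *\n"
    "2 1 838028476490 4 23 36 P 1/820-17 *\n"
)

# No capturing group: findall returns each whole match.
all_numbers = re.findall(r'\d{12}', body)

# Hypothetical text in which each number follows the word BUS.
bus_text = "BUS 838028476391 and BUS 838028476490"

# One capturing group: findall returns only the group, so the word BUS
# must be present but is not included in the result.
bus_numbers = re.findall(r'BUS\s*(\d{12})', bus_text)

print(all_numbers, bus_numbers)
```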
I'm trying to extract the year from a date.
The dates are in formats like these:
"Nov.-Dec. 2010"
"Aug. 30 2011-Sept. 3 2011"
"21-21 Oct. 1997"
My regular expression is:
q = re.compile("\d\d\d\d")
a = q.findall(date)
So, obviously, the list has two items for a string like "Aug. 30 2011-Sept. 3 2011":
["2011","2011"]
I don't want the repetition. How do I avoid it?
You could use a backreference in the regex, which matches only when the same four digits appear a second time and leaves the year in group 1:
(\d{4}).*\1
Or you could keep the current regex and put this logic in the Python code:
if len(a) > 1 and a[0] == a[1]:
    ...
Use the following function:
import re

def getUnique(date):
    q = re.compile(r"\d\d\d\d")
    output = []
    for x in q.findall(date):
        # keep only the first occurrence of each year
        if x not in output:
            output.append(x)
    return output
It's O(n^2) though, because not in scans the output list again for each element of the input list.
See "How to remove duplicates from Python list and keep order?"
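One way to avoid the quadratic scan is to let a dict do the de-duplication. This is a minimal sketch and assumes Python 3.7 or later, where dicts preserve insertion order; the function name unique_years is illustrative.

```python
import re

def unique_years(date):
    """Return the four-digit years in date, without repeats, in order."""
    years = re.findall(r'\d{4}', date)
    # dict.fromkeys keeps the first occurrence of each key, in order.
    return list(dict.fromkeys(years))

print(unique_years("Aug. 30 2011-Sept. 3 2011"))
print(unique_years("Nov.-Dec. 2010"))
```

For date strings this short the difference in speed is negligible; the main gain is that the intent fits on one line.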