How to get the last item by regex?
"Read the information failed.
111 a bcd sam
SAM Error Log not up supported"
I tried this:
111\s(.*)$
but it gives me
output = "a bcd sam"
But for the line that starts with 111, I want the output of the regex to be
output = "sam" // for the line that starts with 111
Also, how should I change it if there is whitespace before 111?
You can test this at https://regex-golang.appspot.com/assets/html/index.html
Note that 111\s(.*)$ matches 111 anywhere inside the string (the first occurrence) and then captures into Submatch 1 any 0+ characters up to the end of the string.
If there is a space before the last sam, you may use
^111.*\s(\S+)$
Pattern explanation:
^ - start of string
111 - a literal substring 111
.* - any characters, 0 or more, as many as possible up to the last...
\s - whitespace
(\S+) - Submatch 1 capturing one or more non-whitespace characters
$ - end of string.
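As a quick check of the single-line pattern above, here is a minimal Python sketch. It assumes the input is exactly one line that starts with 111; the helper name last_token is hypothetical and only used for illustration.

```python
import re

# Same pattern as above: the line must start with 111, and the last
# whitespace-delimited token is captured in group 1.
LAST_TOKEN = re.compile(r'^111.*\s(\S+)$')


def last_token(line):
    """Return the last token of a line starting with 111, else None."""
    m = LAST_TOKEN.search(line)
    return m.group(1) if m else None


print(last_token('111 a bcd sam'))
```

Because of the ^ anchor, a line with leading whitespace before 111 is not matched by this variant.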
If you want to get the line that starts with 111 (and any leading whitespace is allowed) and has some whitespace after which your submatch is located, you may consider either
(?m)^\s*111[^\r\n]*\s(\S+)$
(a . is replaced with [^\r\n] because in Go regex, a dot . matches any character except a line feed by default, so it would still match a carriage return), or - to make sure you only match horizontal whitespace:
(?m)^[^\S\r\n]*111[^\r\n]*[^\S\r\n](\S+)$
Explanation:
(?m)^ - start of the line (due to the (?m) MULTILINE modifier, the ^ now matches a line start and $ will match the line end)
[^\S\r\n]* - zero or more whitespaces except LF and CR (=horizontal whitespace)
111 - a literal 111
[^\r\n]* - any 0+ characters other than CR and LF as many as possible up to the last....
[^\S\r\n] - horizontal space
(\S+) - Submatch 1 capturing 1+ non-whitespace chars
$ - end of line (prepend with [^\S\r\n]* or [^\S\n]* to allow trailing horizontal whitespace)
Here is a Go demo:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	s := `Read SMART Log Directory failed.
 111 a bcd sam
 111 sam
SMART Error Log not supported`
	// (?m) makes ^ and $ match at line boundaries.
	rex := regexp.MustCompile(`(?m)^[^\S\r\n]*111[^\r\n]*[^\S\r\n](\S+)$`)
	fmt.Printf("%#v\n", rex.FindAllStringSubmatch(s, -1))
}
Output: [][]string{[]string{" 111 a bcd sam", "sam"}, []string{" 111 sam", "sam"}}
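For readers who are not using Go, the same pattern can be tried in Python. This is only a sketch: the sample text mirrors the Go demo (with one leading space before each 111), and the variable names are made up for illustration.

```python
import re

# Sample text mirroring the Go demo above (assumed: one leading space before 111).
text = (
    "Read SMART Log Directory failed.\n"
    " 111 a bcd sam\n"
    " 111 sam\n"
    "SMART Error Log not supported"
)

# (?m) makes ^ and $ match at line boundaries; [^\S\r\n] is horizontal whitespace.
pattern = re.compile(r'(?m)^[^\S\r\n]*111[^\r\n]*[^\S\r\n](\S+)$')

# With a single capture group, findall returns just the captured strings.
last_tokens = pattern.findall(text)
print(last_tokens)
```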
This should work for you:
\s(\w+)$ // The output will be `sam`
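This captures the last run of word characters that follows a whitespace character, anchored at the end of the string by $. Without the (?m) flag, $ matches only at the end of the whole input, so apply it to the 111 line on its own.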
You can use this:
^\s*1{3}.*\s(\S+)$
^ start of line
\s* 0 or more occurrences of space in the beginning
1{3} followed by three ones (i.e. 111)
.* followed by anything
\s followed by space
(\S+)$ ending with non-space characters. First capture group.
Related
I have the following regular expression that extracts everything after the first two letters
^([A-Za-z]{2})(\w+)($) $2
Now I want to extract nothing if the data doesn't start with letters.
Example:
AA123 -> 123
123 -> ""
Can this be accomplished by regex?
Introduce an alternative to match any one or more chars from start to end of string if your regex does not match:
^(?:([A-Za-z]{2})(\w+)|.+)$
See the regex demo. Details:
^ - start of string
(?: - start of a container non-capturing group:
([A-Za-z]{2})(\w+) - Group 1: two ASCII letters, Group 2: one or more word chars
| - or
.+ - one or more chars other than line break chars, as many as possible (use [\w\W]+ to match any chars including line break chars)
) - end of a container non-capturing group
$ - end of string.
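A small Python sketch of how the alternation behaves on the two examples from the question. The function name after_two_letters is hypothetical; returning an empty string when group 2 did not participate is my reading of the requested "" result.

```python
import re

# Either two ASCII letters followed by word chars (groups 1 and 2),
# or a fallback branch that matches the whole line without capturing.
PATTERN = re.compile(r'^(?:([A-Za-z]{2})(\w+)|.+)$')


def after_two_letters(value):
    """Return what follows the two leading letters, or '' otherwise."""
    m = PATTERN.match(value)
    # Group 2 is None when the fallback branch (.+) matched.
    return (m.group(2) or '') if m else ''


print(repr(after_two_letters('AA123')))
print(repr(after_two_letters('123')))
```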
Your pattern already captures 1 or more word characters after matching 2 ASCII letters. The $ does not have to be in a group, and this $2 should not be in the pattern.
^([A-Za-z]{2})(\w+)$
Another option could be a pattern with a conditional, capturing data in group 2 only if group 1 exist.
^([A-Z]{2})?(?(1)(\w+)|.+)$
^ Start of string
([A-Z]{2})? Capture 2 uppercase chars in optional group 1
(? Conditional
(1)(\w+) If we have group 1, capture 1+ word chars in group 2
| Or
.+ Match the whole line with at least 1 char to not match an empty string
) Close conditional
$ End of string
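Python's re module supports this (?(1)...|...) conditional syntax, so the pattern can be tried there directly. A minimal sketch, with a hypothetical helper name:

```python
import re

# Group 1 is optional; the conditional (?(1)...|...) only captures
# group 2 when group 1 actually matched.
COND = re.compile(r'^([A-Z]{2})?(?(1)(\w+)|.+)$')


def conditional_extract(value):
    """Return group 2 if the value starts with two uppercase letters, else ''."""
    m = COND.match(value)
    return (m.group(2) or '') if m else ''


print(repr(conditional_extract('AA123')))
print(repr(conditional_extract('123')))
```

Note that this variant uses [A-Z], so lowercase prefixes such as aa123 fall into the else branch.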
For a match only, you could use other variations: \K, as in ^[A-Za-z]{2}\K\w+$, or a lookbehind assertion, as in (?<=^[A-Za-z]{2})\w+$
I need one regex to capture a string up to a :, but the problem is that the : is not always there.
At the moment I am able to capture the groups when the : is present, but not when it is missing.
Not sure what I am doing wrong.
strings to capture
XXX 1 A:B (working)
XXX 1 A: (working)
XXX A (not working)
My regex:
^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>.*)(?=\:)(?:.)*$
You can use
^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>.*?)(?::.*)?$
See the regex demo. Details:
^ - start of string
(?P<grp1>[A-Z]{3,10}) - Group "grp1": three to ten uppercase letters
\s - a whitespace
(?P<grp2>.*?) - Group "grp2": any zero or more chars other than line break chars, as few as possible
(?::.*)? - an optional group matching a colon and then any zero or more chars other than line break chars, as many as possible
$ - end of string.
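The (?P<name>...) syntax is also what Python uses, so here is a short Python sketch that runs the pattern over the three strings from the question. The variable names are made up for illustration.

```python
import re

# Lazy grp2 plus an optional ":..." tail, as described above.
PATTERN = re.compile(r'^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>.*?)(?::.*)?$')

samples = ['XXX 1 A:B', 'XXX 1 A:', 'XXX A']

# groupdict() returns the named groups of each match as a dict.
results = [PATTERN.match(s).groupdict() for s in samples]
for sample, groups in zip(samples, results):
    print(sample, '->', groups)
```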
Optionally match a single : after it
^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>[^:\r\n]*)(?::[^:\r\n]*)?$
^ Start of string
(?P<grp1>[A-Z]{3,10}) Group grp1
\s Match a whitespace char
(?P<grp2>[^:\r\n]*) Group grp2, match zero or more chars other than : or a newline
(?::[^:\r\n]*)? Optionally match a single : followed by zero or more chars other than : or a newline
$ End of string
I am looking to get to the next line of data within a text file. Here is an example of data from the file I am working with.
0519 ABF 244 AN A1 ADV STUFF 1.0 2.0 Somestuff 018 0155 MTWTh 10:30A 11:30A 20 20 0 6.7
Somestuff 011 0145 MTWTh 12:30P 1:30P
I have been trying to move to the next line in a variety of ways, such as matching the line break with \n, using \s+ to consume the large space after 6.7, and using the m modifier (//m), but I have not found a working result yet.
Here is some example code
while !regex_file.eof?
  line = regex_file.gets.chomp
  if line =~ /^.*?\d{4}\s+[A-Z]+\s+\d{3}.+$/
    puts line
  end
end
Using https://rubular.com/ this particular set of code matches my desired output for the first line
0519 ABF 244 AN A1 ADV STUFF 1.0 2.0 Somestuff 018 0155 MTWTh 10:30A 11:30A 20 20 0 6.7
but it does not match the next line, and I haven't figured out how to match it:
Somestuff 011 0145 MTWTh 12:30P 1:30P
Try something like this: the \n matches the newline, and after it you can apply your own rules to match whatever comes on the next line:
^.*\d{4}\s+[A-Z]+\s+\d{3}.+\n.*$
I've made an arbitrary assumption about the requirements for matching the second line. It is more demanding than the requirements for matching the first that are reflected in your regex, but I thought the additional complexity would have some educational value for you.
Here is a regular expression (untested) for matching both lines. Note you don't need ^.*? at the beginning of the regex and for the part of the regex that matches the first line .+$ adds nothing, so I removed it. After all you are just matching each line separately (line), and will display the entire line if there's a match. As well, the end-of-string anchor \z is more appropriate than the end-of-line anchor ($), though either can be used.
r = /
  (?:                # begin non-capture group
    \d{4}            # match 4 digits
    \s+              # match > 0 whitespaces
    [A-Z]+           # match > 0 uppercase letters
    \s+              # match > 0 whitespaces
    \d{3}            # match 3 digits
  |                  # or
    \b               # match a (zero-width) word break
    [A-Z]            # match 1 uppercase letter
    [a-z]*           # match >= 0 lowercase letters
    \s+              # match > 0 whitespaces
    \d{3}            # match 3 digits
    \s+              # match > 0 whitespaces
    \d{4}            # match 4 digits
    \s+              # match > 0 whitespaces
    [A-Za-z]+        # match > 0 letters
    (?:              # begin non-capture group
      \s+            # match > 0 whitespaces
      (?:            # begin a non-capture group
        0?\d         # match a digit, optionally preceded by 0
      |              # or
        1[012]       # match 1 followed by 0, 1 or 2
      )              # end non-capture group
      :              # match a colon
      [0-5][0-9]     # match 0-5 followed by 0-9
      [AP]           # match A or P (the AM or PM suffix)
    ){2}             # end non-capture group and execute twice
  )                  # end non-capture group
/x                   # free-spacing regex definition mode
This regular expression is conventionally written as follows.
r = /(?:\d{4}\s+[A-Z]+\s+\d{3}|\b[A-Z][a-z]*\s+\d{3}\s+\d{4}\s+[A-Za-z]+(?:\s+(?:0?\d|1[012]):[0-5][0-9][AP]){2})/
You might go through the file, printing matching lines with puts, as follows:
File.foreach(fname) { |line| puts line if line.match? r }
See IO::foreach, which is a very convenient method for reading files line-by-line. Note IO class methods (such as foreach) are commonly invoked with File as their receiver. That's OK, as File.superclass #=> IO, so File inherits those methods from IO.
When used without a block foreach returns an enumerator, which is often convenient as well. If, for example, you wished to return an array of matching lines (rather than puts them), you could write:
File.foreach(fname).with_object([]) do |line, arr|
  arr << line.chomp if line.match? r
end
Your current regex:
^.*?\d{4}\s+[A-Z]+\s+\d{3}.+$
matches in this order:
the beginning of the line (^)
zero or more characters non-greedy .*?
four digits (\d{4})
one or more spaces (\s+)
one or more capital letters ([A-Z]+)
one or more spaces
three digits (\d{3})
one or more characters (.+)
the end of the line ($)
The second line of your file is:
Somestuff 011 0145 MTWTh 12:30P 1:30P
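The regex starts matching at 0145 MTWT, but then fails: the lowercase h that follows is neither an uppercase letter ([A-Z]+) nor whitespace (\s+), and even past it, 12: would not satisfy \d{3}.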
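The breakdown above can be checked quickly. The question uses Ruby, but for single-line input this regex behaves the same way in Python's re, so here is a small Python sketch with the two sample lines held in a list (reading them from a file is left out, and the names are hypothetical).

```python
import re

# The regex from the question, unchanged.
LINE_RE = re.compile(r'^.*?\d{4}\s+[A-Z]+\s+\d{3}.+$')

lines = [
    '0519 ABF 244 AN A1 ADV STUFF 1.0 2.0 Somestuff 018 0155 MTWTh '
    '10:30A 11:30A 20 20 0 6.7',
    'Somestuff 011 0145 MTWTh 12:30P 1:30P',
]

# True where the line matches, False where it does not.
matched = [bool(LINE_RE.search(line)) for line in lines]
for line, ok in zip(lines, matched):
    print(ok, line[:30])
```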
I would like to know how to capture text only if the beginning of a line matches a certain string, but I don't want to capture the beginning string itself.
For example, if I have the text:
BEGIN_TAG: Text To Capture
WRONG_TAG: Text Not to Capture
I want to capture:
Text To Capture
From the line that begins with BEGIN_TAG:, not the line that begins with WRONG_TAG:
I know how to select the line that begins with the desired text: ^BEGIN_TAG:\W?(.*)
but this also selects the text "BEGIN_TAG:". I don't want that; I only want the text after "BEGIN_TAG:".
I am using PCRE regex
Instead of a positive lookbehind that does not allow unknown width patterns, you may use a match reset operator \K:
^BEGIN_TAG:\W?\K.*
Details:
^ - in Sublime, start of a line
BEGIN_TAG: - a string of literal chars
\W? - 1 or 0 non-word chars
\K - the match reset operator that discards all text matched so far
.* - any 0+ chars other than linebreak characters (the rest of the line) that are the only chars that will be kept in the matched text.
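The question is about PCRE, where \K is available. Python's built-in re module does not support \K, so if you ever need the same result there, a plain capture group does the job. A minimal sketch, using the sample text from the question:

```python
import re

text = "BEGIN_TAG: Text To Capture\nWRONG_TAG: Text Not to Capture"

# (?m) lets ^ match at each line start; group 1 holds the text after the tag.
# findall returns only the captured group, so the tag itself is left out.
captured = re.findall(r'(?m)^BEGIN_TAG:\W?(.*)', text)
print(captured)
```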
You can use lookbehind. Then, the text in the lookbehind group isn't part of the whole match. You can see it as an anchor like \b, ^, etc.
You then get:
(?<=^BEGIN_TAG:\W)(\w.*)$
Explained:
(?<= # Positive lookbehind group
^ # Start of line / string
BEGIN_TAG: # Literal
\W # A non-word character ([^a-zA-Z0-9_])
)
( # First and only capturing group (probably not needed)
\w # A word character ([a-zA-Z0-9_])
.* # Any character, any number of times
)
$ # End of line / string
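Since this lookbehind has a fixed width, it should also work in engines that only allow fixed-width lookbehinds. Here is a sketch in Python's re, assuming multiline input, so re.MULTILINE is used to make ^ and $ match at line boundaries:

```python
import re

text = "BEGIN_TAG: Text To Capture\nWRONG_TAG: Text Not to Capture"

# The lookbehind asserts the tag and one non-word char to the left,
# so they are not part of the overall match.
m = re.search(r'(?<=^BEGIN_TAG:\W)(\w.*)$', text, flags=re.MULTILINE)
matched_text = m.group(0) if m else None
print(matched_text)
```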
I need to detect the last digits in a string, as they are the indexes for my strings. They may be as large as 2^64, so it's not convenient to check only the last character of the string, then the second-to-last, and so on.
A string may look like asdgaf1_hsg534, i.e. there may be other digits in the string too, but they are somewhere in the middle and not adjacent to the index I want to get.
Here is a method using re.sub:
import re

strings = ['asdgaf1_hsg534', 'asdfh23_hsjd12', 'dgshg_jhfsd86']
for s in strings:
    print(re.sub(r'.*?([0-9]*)$', r'\1', s))
Output:
534
12
86
Explanation:
The function takes a regular expression, a replacement string, and the string you want to do the replacement on: re.sub(regex,replace,string)
The regex '.*?([0-9]*)$' matches the whole string and captures the number that precedes the end of the string. Parentheses are used to capture the parts of the match we are interested in; \1 refers to the first capture group, \2 to the second, etc.
.*? # Matches anything (non-greedy)
([0-9]*) # Zero or more digits (captured)
$ # Followed by the end-of-string identifier
So we are replacing the whole string with just the captured number we are interested in. In Python we need a raw string for the replacement: r'\1'. If the string doesn't end with digits, an empty string will be returned.
twosixfour = "get_the_numb3r_2_^_64__18446744073709551615"
print(re.sub(r'.*?([0-9]*)$', r'\1', twosixfour))
>>> 18446744073709551615
A simple regex can detect digits at the end of the string:
'\d+$'
$ matches the end of the string. \d+ matches one or more digits. The + operator is greedy by default, meaning it matches as many digits as possible. So this will match all of the digits at the end of the string.
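A minimal Python sketch of this approach, using re.search instead of re.sub. The helper name trailing_index is hypothetical; converting to int is an assumption about how the index will be used (Python integers are arbitrary precision, so values around 2^64 are fine).

```python
import re


def trailing_index(s):
    """Return the digits at the end of s as an int, or None if there are none."""
    m = re.search(r'\d+$', s)
    return int(m.group()) if m else None


for s in ['asdgaf1_hsg534', 'asdfh23_hsjd12', 'dgshg_jhfsd86', 'test']:
    print(s, '->', trailing_index(s))
```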
If you want to use re.sub and make sure that at least one digit is present at the end of the line, use the quantifier + to match 1 or more digits (\d+). That way the whole line is not removed when it contains no digits, or none at the end of the line.
^.*?(\d+)$
^ Start of line
.*? Match any char except a newline, as few as possible (non-greedy)
(\d+) Capture group 1, match 1+ digits
$ End of line
Or using a negative lookbehind
^.*(?<!\d)(\d+)$
^ Start of line
.* Match any char except a newline as much as possible
(?<!\d)(\d+) Assert no digits directly to the left, then capture 1+ digits in group 1
$ End of line
When using re.match, you can omit the ^ anchor, and you might also use \A and \Z to assert the start and the end of the string.
import re

strings = ['asdgaf1_hsg534', 'asdfh23_hsjd12', 'dgshg_jhfsd86', 'test']
for s in strings:
    print(re.sub(r".*?(\d+)$", r'\1', s))
Output
534
12
86
test
If a non-digit must be present before the digits, you could use a negated character class with a single capture group.
^.*[^\d\r\n](\d+)
^ Start of line
.* Match any char except a newline as much as possible
[^\d\r\n] Negated character class, match any char except a digit or a newline
(\d+) Capture group 1, match 1+ digits
To get the last digits in the string (not necessarily at the end of the string)
^.*?(\d+)[^\r\n\d]*$
^ Start of line
.*? Match any char except a newline, as few as possible (non-greedy)
(\d+) Capture group 1, match 1+ digits
[^\r\n\d]* Negated character class, match 0+ times any char except a newline or digit
$ End of line
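To illustrate the difference from the end-anchored versions, here is a small Python sketch of this last pattern. The helper name last_digit_run and the test string abc12def345xyz are made up for illustration.

```python
import re

# Capture the last run of digits on the line, even when non-digit
# characters follow it.
LAST_DIGITS = re.compile(r'^.*?(\d+)[^\r\n\d]*$')


def last_digit_run(s):
    """Return the last run of digits in s as a string, or None."""
    m = LAST_DIGITS.match(s)
    return m.group(1) if m else None


print(last_digit_run('abc12def345xyz'))
print(last_digit_run('asdgaf1_hsg534'))
```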