I am simply trying to take the following text:
Password length (length ordered)
5 = 1 (0.37%)
6 = 1 (0.37%)
7 = 1 (0.37%)
8 = 157 (58.58%)
9 = 55 (20.52%)
10 = 33 (12.31%)
11 = 12 (4.48%)
12 = 6 (2.24%)
13 = 2 (0.75%)
and capture every line between the Password length header line and the first blank line (\n\n). Here's what I am currently doing:
data[/(?<=Password length)(.*?)(?=\n\n)/m]
but that captures (length ordered) in the first line.
I have tried to do something like this:
[44] pry(main)> data[/(?<=Password length.*?\n)(.*?)(?=\n\n)/m]
(eval):2: invalid pattern in look-behind: /(?<=Password length.*?\n)(.*?)(?=\n\n)/m
The idea was to skip everything after Password length up to the end of that line, but as you can see above, I get an "invalid pattern in look-behind" error.
What should I be doing instead?
You can use
data[/Password length.*\R(?m:(.*?))\R{2}/, 1]
See the Rubular demo. Details:
Password length - a literal string
.* - the rest of the line
\R - a line break sequence
(?m:(.*?)) - An inline modifier group where . matches any char including line break chars, capturing group 1 matching any zero or more chars but as few as possible
\R{2} - double line break sequence.
The 1 argument returns the value inside the first capturing group only (see the str[regexp, capture] → new_str or nil reference).
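As a minimal sketch of the capture-group form, the snippet below runs the pattern on a shortened copy of the sample. The `Next section` line is a hypothetical addition so that the block is followed by a blank line, as the question assumes.

```ruby
# Shortened sample from the question; "Next section" is a made-up trailer
# that gives the block a terminating blank line.
data = <<~TEXT
  Password length (length ordered)
  5 = 1 (0.37%)
  6 = 1 (0.37%)
  8 = 157 (58.58%)

  Next section
TEXT

# The second argument (1) asks String#[] for capture group 1 only.
block = data[/Password length.*\R(?m:(.*?))\R{2}/, 1]

# Split the captured block into individual rows.
rows = block.split(/\R/)
puts rows
```

If no blank line follows the block, this form returns nil, because `\R{2}` has nothing to match.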
An alternative:
data[/Password length.*\R\K.*(?:\R(?!\R).*)*/]
See this Rubular demo. Details:
Password length.*\R - Password length, the rest of the line and a line break sequence
\K - match reset operator, it removes all text matched so far from the match memory buffer
.* - a line, any zero or more chars other than line break chars as many as possible
(?:\R(?!\R).*)* - zero or more lines that do not end with a double line break sequence, where \R(?!\R) matches a line break not immediately followed by another line break sequence, and .* matches the rest of the line.
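A short sketch comparing the two patterns, under the assumption that the input may or may not contain a blank line after the block. Both sample strings are made up for illustration. The `\K` form consumes line by line, so it also works when the block runs to the end of the input.

```ruby
# Hypothetical inputs: one with a blank line after the block, one without.
with_blank    = "Password length (length ordered)\n5 = 1 (0.37%)\n6 = 1 (0.37%)\n\nTrailing text\n"
without_blank = "Password length (length ordered)\n5 = 1 (0.37%)\n6 = 1 (0.37%)"

line_by_line = /Password length.*\R\K.*(?:\R(?!\R).*)*/
lazy_dot     = /Password length.*\R(?m:(.*?))\R{2}/

# \K drops the header from the reported match, so no capture group is needed.
a = with_blank[line_by_line]
b = without_blank[line_by_line]

# The lazy-dot form needs the double line break and finds nothing here.
c = without_blank[lazy_dot, 1]

puts a
p c
```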
Related
I have a one line string that looks like this:
{"took":125,"timed_out":false,"_shards":{"total":10,"successful":10,"skipped":0,"failed":0}}{"took":365,"timed_out":false,"_shards":{"total":10,"successful":10,"skipped":0,"failed":0}}{"took":15,"timed_out":false,"_shards":{"total":10,"successful":10,"skipped":0,"failed":0}}
I would like to extract all the numbers after the "took" part, so in my case the output would look like this:
125
365
15
What I've tried so far is using took":(\d{1,6}),"(.*) as a regex. But since it's a one-line string, it only extracts the first occurrence and ignores the others.
You can use
Find What: took":(\d+)|(?s)(?:(?!took":\d).)*
Replace With: (?{1}$1\n)
Details:
took": - literal text
(\d+) - one or more digits captured into Group 1
| - or
(?s) - set the DOTALL mode on (. matches line break chars now)
(?:(?!took":\d).)* - any single char, zero or more times, as many as possible, that does not start a took": + digit char sequence.
The (?{1}$1\n) conditional replacement pattern replaces this way:
(?{1} - if Group 1 is matched
$1\n - replace the match with Group 1 and a newline
) - else, replace with an empty string.
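Outside an editor, the same extraction can be sketched in a few lines of Ruby. This is only an illustration, assuming the input is available as a string; the sample below is shortened from the question. The first call collects Group 1 directly, and the second mirrors the find-and-replace logic with a block standing in for the conditional replacement.

```ruby
# Shortened version of the one-line input from the question.
line = '{"took":125,"timed_out":false}{"took":365,"timed_out":false}{"took":15,"timed_out":false}'

# Direct extraction: scan returns one array of captures per match.
took_values = line.scan(/"took":(\d+)/).flatten

# Mirror of the replacement: keep Group 1 plus a newline when it took part
# in the match, otherwise drop the matched text. The /m flag is Ruby's way
# of letting . match line break chars.
replaced = line.gsub(/took":(\d+)|(?:(?!took":\d).)*/m) { $1 ? "#{$1}\n" : "" }

puts took_values
```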
I am struggling to find a regex that matches a word that must appear between two others.
Put simply, my constraints are:
must begin with the word line con
must end with the next appearance of the word line
between these two must have the word session-timeout
may also contain other words between line con and line.
I wish to match on any block of text starting with line con and ending with the next instance of the word line, however I need them to only match if the word session-timeout is between them.
Bonus points if you can tell me how to match any number larger than 10 after session-timeout (e.g. session-timeout 12 would match).
An example of where I would want it to match is:
line con 0
session-timeout 14
stopbits 1
line aux 0
stopbits 1
line vty 0 4
However, this should not match
line con 0
session-timeout 8
stopbits 1
line aux 0
stopbits 1
line vty 0 4
session-timeout 13
line vty 0 5
So far I have the regex (line con)(\s|\S)+?(session-timeout ){1}([0-9])(\s|\S)+?(line). However, if it does not find a session-timeout within the block, it simply runs past the first line keyword, which is where I want it to stop looking.
Any help would be massively appreciated!
You can use
(line con)(?:(?!line\b)[\s\S])+?(session-timeout\s+)([1-9][0-9]+)[\s\S]+?(line)
See the regex demo. Details:
(line con) - Group 1: line con string
(?:(?!line\b)[\s\S])+? - any char, one or more occurrences but as few as possible, that does not start the whole word line, so the search for session-timeout stops at the next line keyword
(session-timeout\s+) - Group 2: session-timeout string and one or more whitespaces
([1-9][0-9]+) - Group 3: a number equal to 10 or larger (to allow leading zeros, add 0* before [1-9])
[\s\S]+? - any one or more chars, as few as possible
(line) - Group 4: line.
Adjust the capturing groups as per your requirements.
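A minimal Ruby sketch for checking a pattern of this shape against the two samples from the question. One assumption is made explicit here: the tempered token excludes the whole word `line` (written as `(?!line\b)`), so that the search cannot run past the next `line` keyword. The variable names are illustrative only.

```ruby
# Tempered token stops at the next "line" keyword (assumption stated above).
pattern = /(line con)(?:(?!line\b)[\s\S])+?(session-timeout\s+)([1-9][0-9]+)[\s\S]+?(line)/

should_match = "line con 0\nsession-timeout 14\nstopbits 1\nline aux 0\nstopbits 1\nline vty 0 4\n"
should_not_match = "line con 0\nsession-timeout 8\nstopbits 1\nline aux 0\nstopbits 1\n" \
                   "line vty 0 4\nsession-timeout 13\nline vty 0 5\n"

hit = pattern.match(should_match)
miss = pattern.match(should_not_match)

puts hit[3]   # the timeout value captured in Group 3
p miss        # nil when the console block has no timeout of 10 or more
```

In the second sample, `session-timeout 13` belongs to a `line vty` block, so it must not count for `line con`; that is what the lookahead inside the tempered token enforces.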
I'm trying to write a regex which will match all or part of the first part of a query and all of the second part.
The text comes in the form of:
=S
==S-S
===DC1
===DC3
====REF
=====SUB=F
AB123-05.abc
It needs to match a line beginning with a = symbol. There may be one to five lines, each starting with one to five = symbols. The first part is to match the = symbol(s) and the second part whatever is after them.
The third part needs to match the last line:
AB123-05.abc
I started out with the regex:
([=]{1,5})(.+)
Which correctly matches the = symbols and whatever is after it but did not match the last line.
So, I changed it to:
([=]{1,5})(.+)(\n[\S]+\.abc)
which now only matches the last line and the line above. You can see what I have done here...
https://regex101.com/r/VtW9PY/2/
So basically the first match is to be the = symbol(s) at the beginning of a line, and the second match is the remainder of the line after the = symbol(s).
The third match is the last line. There may not be any lines beginning with a = symbol, in which case the last line becomes the first match.
I'm doing this in VBA.
Make sure you set RegExp.Multiline = True and then use the following regex:
^(?:={1,5}(.+)|(.*)$(?![\s\S]))
See the regex demo.
Details
^ - start of a line (here)
(?: - start of a non-capturing group so that ^ could apply to both the alternatives:
={1,5}(.+) - 1 to 5 = chars and then any 1+ chars other than a line break char as many as possible captured into Group 1 (match.Submatches(0))
| - or
(.*)$(?![\s\S]) - any 0+ chars other than a line break char as many as possible captured into Group 2 (match.Submatches(1)) up to the end of the text. Note that $ alone can't be used since it will match the end of any line due to RegExp.Multiline = True; the (?![\s\S]) lookahead requires that no char at all, line breaks included, follows.
) - end of the group.
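The alternation can be tried out quickly in Ruby as a rough sketch; this is not VBA code. Two assumptions apply: in Ruby, `^` and `$` always work per line, so no multiline switch is needed, and the end-of-text check is written as `(?![\s\S])` because `.` does not match a line break. The sample text is the one from the question, without a trailing newline.

```ruby
# Sample from the question, joined without a trailing line break.
text = ["=S", "==S-S", "===DC1", "===DC3", "====REF", "=====SUB=F", "AB123-05.abc"].join("\n")

# Group 1: the rest of a line after 1 to 5 "=" chars.
# Group 2: a line that is followed by nothing at all (the last line).
pattern = /^(?:={1,5}(.+)|(.*)$(?![\s\S]))/

# scan yields one [group1, group2] pair per match; the unused group is nil.
pairs = text.scan(pattern)

pairs.each do |after_equals, last_line|
  if after_equals
    puts "prefixed line: #{after_equals}"
  else
    puts "last line: #{last_line}"
  end
end
```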
I have a pipe delimited file which has a line
H||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||
I want to substitute the date (28092017) with a regex "[0-9]{8}" if the first character is "H"
I tried the following example to test my understanding, where I'm trying to substitute "a" with "i".
str = "|123||a|"
str.gsub /\|(.*?)\|(.*?)\|(.*?)\|/, "\|\\1\|\|\\1\|i\|"
But this gives the output
"|123||123|i|"
Any clue how this can be achieved?
You may replace the first occurrence of 8 digits inside pipes if a string starts with H using
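s = "H||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||"
# '\1' re-inserts Group 1; the eight zeros after it are literal replacement text
p s.gsub(/\A(H.*?\|)[0-9]{8}(?=\|)/, '\100000000')
# or, with \K, neither a group nor a backreference is needed
p s.gsub(/\AH.*?\|\K[0-9]{8}(?=\|)/, '00000000')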
See the Ruby demo. Here, the value is replaced with 8 zeros.
Pattern details
\A - start of string (^ is the start of a line in Ruby)
(H.*?\|) - Capturing group 1 (you do not need it when using the variation with \K): H, then any 0+ chars other than line break chars, as few as possible, and then a | char
\K - match reset operator that discards the text matched so far
[0-9]{8} - eight digits
(?=\|) - the next char must be |, but it is not added to the match value since it is a positive lookahead that does not consume text.
The \1 in the first gsub is a replacement backreference to the value in Group 1.
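For a whole file, the same substitution can be applied line by line. The sketch below assumes the records are already split into strings; the second record, starting with `D`, is a hypothetical non-header line added to show that it is left untouched.

```ruby
records = [
  "H||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||",
  "D||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||" # hypothetical non-header record
]

# sub replaces only the first match; \A ties the pattern to records starting with H.
masked = records.map { |rec| rec.sub(/\AH.*?\|\K[0-9]{8}(?=\|)/, "00000000") }

puts masked
```

Note that 1010032000 is skipped: after its first eight digits comes another digit rather than a |, so the lookahead fails there and the match moves on to the next field.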
In reference to a previous question
Python data extract from text file - script stops before expected data match
How can I capture a match and the previous two lines?
I tried this but get:
unterminated subpattern at position 0 (line 1, column 1)
output = re.findall('(.*\r\n{2}random data.',f.read(), re.DOTALL)
You may use
re.findall(r'(?:.*\r?\n){2}.*random data.*', s)
Note that you can't use re.DOTALL here: with it, .* will match up to the end of the input, and you will only get the last occurrence.
See the Python demo
Pattern details
(?:.*\r?\n){2} - 2 occurrences of a sequence of
.* - any 0+ chars other than line break chars, as many as possible (a line)
\r?\n - a line ending (CRLF or LF)
.*random data.* - a line containing random data substring.
See the regex demo.
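The effect of the dot-matches-newline mode can be seen in a small experiment. The sketch below uses Ruby's `String#scan` rather than Python's `re.findall`, with `/m` as Ruby's counterpart of `re.DOTALL`; the input string is made up for illustration.

```ruby
# Hypothetical input with two "random data" lines, each preceded by two lines.
s = "a\nb\nrandom data 1\nc\nd\ne\nrandom data 2\n"

# Without dot-matches-newline, each match is exactly three lines long.
per_line = s.scan(/(?:.*\r?\n){2}.*random data.*/)

# With /m, the greedy .* runs across line breaks and swallows everything
# into a single match.
dotall = s.scan(/(?:.*\r?\n){2}.*random data.*/m)

p per_line
p dotall.length
```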