Simplify my multiline regex: match a line plus the next x lines

I have the following Haml code:
%ul#sub-nav
%li.li1
%a{:href => "/dsadasd/"} dasdasd
%li.li2
%a.selected{:href => "/asdadasd"} Tasdada /asdas
%li.li3
%a{:href => "/dasd/"} asdasd
%li.li4
%a{:href => "/wdasn/"} das
I seem to be able to match this in IntelliJ's RubyMine IDE with the following repetitive regex:
%ul#sub-nav\n.*\n^.*\n^.*\n^.*\n^.*\n^.*\n^.*\n^.*
This looks way too repetitive. Help appreciated.

If you want to match %ul#sub-nav plus the eight following lines, this should do:
%ul#sub-nav(\n.*$){8}
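To try the quantified form outside the IDE, a small Python sketch like the following may help. It assumes the Haml from the question is held in a string called `haml` (a name made up here, with one extra trailing line added for the check) and that `$` needs the multiline flag to match at every line end, as it does in Python's `re` module.

```python
import re

# The Haml snippet from the question, plus one extra line that should NOT be matched.
haml = """%ul#sub-nav
%li.li1
%a{:href => "/dsadasd/"} dasdasd
%li.li2
%a.selected{:href => "/asdadasd"} Tasdada /asdas
%li.li3
%a{:href => "/dasd/"} asdasd
%li.li4
%a{:href => "/wdasn/"} das
%p trailing line
"""

# (\n.*$){8} repeats "newline, then the rest of that line" exactly eight times.
pattern = re.compile(r"%ul#sub-nav(\n.*$){8}", re.MULTILINE)

match = pattern.search(haml)
matched_lines = match.group(0).split("\n")
print(len(matched_lines))   # the %ul line plus the eight lines after it
print(matched_lines[-1])    # last line covered by the match
```

Changing the `8` is then the only edit needed when the block grows or shrinks.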

Related

Combine multiple regular expressions in to one

Given the following sample strings, how can the highlighted parts be extracted using regex?
x => x.One.Two[0].Three.get_Item(0).Four[0].Five
x => x.One.Two[0].Three.get_Item(0).Four[0].Five.get_Item(0)
x => x.One.Two[0].Three.get_Item(0).Four[0].Five[0]
x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five)
x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five.get_Item(0))
x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five[0])
x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five, Object)
x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five.get_Item(0), Object)
x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five[0], Object)
So far, I have come up with a couple of different regex patterns, but ideally I'd like a single regex that handles all of the above cases.
This is what I have so far:
\.(.+)(?<!\d)\)$ and \.(.+), Object\) and \.(.+)
Here's the sample data to play with: https://regex101.com/r/jxqsQl/2
I'd appreciate any help you can provide.
This regex will do what you want. It looks for multiple groups of a . and a word, followed optionally by digits enclosed in [] or ():
(?:\.\w+(?:[[(]\d+[)\]])?)+
Demo on regex101
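For anyone who wants to check the pattern in code rather than on regex101, here is a rough Python sketch. The variable names are invented, only three of the sample strings are used, and the `[` inside the character class is escaped purely because Python otherwise warns about a possible nested set; the pattern is meant to behave the same as the one above.

```python
import re

# Same idea as the pattern above: repeated ".word" groups, each optionally
# followed by digits wrapped in [] or ().
PATH = re.compile(r"(?:\.\w+(?:[\[(]\d+[)\]])?)+")

samples = [
    "x => x.One.Two[0].Three.get_Item(0).Four[0].Five",
    "x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five.get_Item(0))",
    "x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five[0], Object)",
]

# search() returns the first (and here only) dotted path in each string.
extracted = [PATH.search(s).group(0) for s in samples]
for path in extracted:
    print(path)
```

Each printed path starts at the first dot, so the leading `x` and any `Convert(` wrapper or `, Object)` suffix are left out.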

How to build regex for following, i have done half?

http://con.google.com/video/daTUTOEl-OMBccnY9.mp4
I need to check for a URL that has all of these attributes:
/https?:\/\/con\.google\.com\/video\//
With the above regex I can get as far as video/, but after that I need to check whether the rest is separated by a - or not:
daTUTOEl => 9 characters, both numbers and letters
OMBccnY9 => 9 characters, both numbers and letters
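Those are 8 characters, not 9.
Try the following regex. \w{8} matches exactly eight word characters, and the dot before mp4 is escaped so that it only matches a literal dot:
https?:\/\/con\.google\.com\/video\/\w{8}-\w{8}\.mp4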
See it working here
https://regex101.com/r/IvNKc1/1
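If the check has to run in code, a sketch along these lines could work. It assumes each half is exactly eight ASCII letters or digits (stricter than \w, which also allows an underscore), that the dot before mp4 must be a literal dot, and that the whole string has to match; the function name is made up for the example.

```python
import re

# Assumption: each half is exactly 8 ASCII letters/digits, as in the sample URL.
VIDEO_URL = re.compile(
    r"https?://con\.google\.com/video/[A-Za-z0-9]{8}-[A-Za-z0-9]{8}\.mp4"
)

def is_video_url(url: str) -> bool:
    """Return True when the whole string has the expected shape."""
    return VIDEO_URL.fullmatch(url) is not None

print(is_video_url("http://con.google.com/video/daTUTOEl-OMBccnY9.mp4"))  # sample URL
print(is_video_url("http://con.google.com/video/daTUTOEl_OMBccnY9.mp4"))  # no hyphen
print(is_video_url("http://con.google.com/video/daTUTOEl-OMBccnY9xmp4"))  # no literal dot
```

Using `fullmatch` means nothing may come before the scheme or after `.mp4`, which a plain search would not enforce.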

Does using multiline in logstash filter print out the data?

I am trying to use multiline to combine a number of lines in a logfile that share the same starting symbol. In my case the starting symbol is #S#. It would look something like this:
#S# dsifj sdfojosf sfjosdfoisdjf
#S# dsfj sdojifoig dfpkgokdfgk 89s7fsjlk sdf
#S# lsdffm dg;;dfgl djfg 930`e`fsd
...
...
...
Note: The random characters are just used to imitate the content of the actual log.
The following is what I wrote for the multiline statement:
multiline {
type => "table_init"
pattern => "#S#"
negate => true
what => "next"
}
I am assuming what I wrote does combine them into one line, but I am wondering whether this prints out the line or whether I need to use grok to parse the entire line before it prints. Any thoughts and input will be helpful. Thank you.
If you are trying to match up all lines that DO match "#S#", then you should have negate set to false. You use negate when you want to get all lines that DO NOT match a certain pattern.
As for your actual question, multiline takes all the relevant lines and puts them into the "message" field, including newline characters (\n, and I assume \r if you are running Windows as well though I have never checked). You can then grok this entire message to get the data you want.
So if you set up your output like so:
output { stdout { codec => rubydebug } }
You should find that the outputted message will read something like:
"message" => "#S# dsifj sdfojosf sfjosdfoisdjf\n#S# dsfj sdojifoig dfpkgokdfgk 89s7fsjlk sdf\n#S# lsdffm dg;;dfgl djfg 930`e`fsd"
if you set up your multiline filter correctly.
Hope this helps!

Parse labeled param strings with Regex

Can anyone help me with this one?
My objective here is to grab some info from a text file, present the user with it and ask for values to replace that info so to generate a new output. So I thought of using regular expressions.
My variables would be of the format: {#<num>[|<value>]}.
Here are some examples:
{#1}
{#2|label}
{#3|label|help}
{#4|label|help|something else}
So after some research and experimenting, I came up with this expression: \{\#(\d{1,})(?:\|{1}(.+))*\}
which works pretty well on most occasions, except on something like this:
{#1} some text {#2|label} some more text {#3|label|help}
In this case variables 2 & 3 are matched as a single occurrence rather than as 2 separate matches...
I've already tried using a lookahead for the trailing } of the expression, but I didn't manage to get it working.
I'm targeting this expression for use in C#, should that further help anyone...
I like the results from this one:
\{\#(\d+)(?:|\|(.+?))\}
This returns 3 groups, counting the whole match as the first. The second group is the number (1, 2, 3) and the third group is the arguments ('label', 'label|help').
I prefer to remove the * in favor of | in order to capture all the arguments after the first pipe in the last grouping.
A regular expression which can be used would be something like
\{\#(\d+)(?:\|([^|}]+))*\}
This will prevent reading over any closing }.
Another possible solution (with slightly different behaviour) would be to use a non-greedy matcher (.+?) instead of the greedy version (.+).
Note: I also removed the {1} and replaced {1,} with + which are equivalent in your case.
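As a rough illustration of the negated character class, here is a Python sketch (the question targets C#, so treat this as a stand-in). One assumption to flag: Python keeps only the last repetition of a repeated capture group, whereas .NET exposes every repetition through Group.Captures, so this variant captures the whole argument list in one group and splits it afterwards. The names `VARIABLE` and `parsed` are invented for the example.

```python
import re

# Group 1: the number. Group 2: everything after it, e.g. "|label|help" (may be empty).
# [^|}]+ cannot run past a "|" or "}", so each {#...} is matched on its own.
VARIABLE = re.compile(r"\{#(\d+)((?:\|[^|}]+)*)\}")

text = "{#1} some text {#2|label} some more text {#3|label|help}"

parsed = []
for m in VARIABLE.finditer(text):
    number = int(m.group(1))
    # Drop the empty piece before the leading "|" and keep the arguments.
    args = m.group(2).split("|")[1:]
    parsed.append((number, args))

print(parsed)  # three separate entries, one per variable
```

Variables 2 and 3 now come back as separate matches because no part of the pattern can consume a closing brace.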
Try this:
\{\#(\d+)(?:\|[^|}]+)*\}
In C#:
MatchCollection matches = Regex.Matches(mystring,
@"\{\#(\d+)(?:\|[^|}]+)*\}");
It prevents the label and help from eating the | or }.
matches[0].Value => {#1}
matches[0].Groups[0].Value => {#1}
matches[0].Groups[1].Value => 1
matches[1].Value => {#2|label}
matches[1].Groups[0].Value => {#2|label}
matches[1].Groups[1].Value => 2
matches[2].Value => {#3|label|help}
matches[2].Groups[0].Value => {#3|label|help}
matches[2].Groups[1].Value => 3

Regex Help, How do I make order of expressions not matter?

I can't figure out how to make the order of the incoming string parameters (price, merchant, category) not matter to the regex. My regex matches parts of the string but not the string as a whole. I need to be able to add \A and \Z to it.
Pattern:
(,?price:(;?(((\d+(\.\d+)?)|min)-((\d+(\.\d+)?)|max))|\d+)+){0,1}(,?merchant:\d+){0,1}(,?category:\d+){0,1}
Sample Strings:
price:1.00-max;3-12;23.34-12.19,category:3
merchant:25,price:1.00-max;3-12;23.34-12.19,category:3
price:1.00-max;3-12;23.34-12.19,category:3,merchant:25
category:3,price:1.00-max;3-12;23.34-12.19,merchant:25
Note: I'm going to add ?: to all my groups after I get it working.
You should probably just parse this string through normal parsing. Split it at the commas, then split each of those pieces into two by the colons. You can store validation regexes if you'd like to check each of those inputs individually.
If you do it through regex, you'll probably have to end up saying "this combination OR this combination OR this combination", which will hurt real bad.
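A minimal sketch of that split-then-validate approach, written in Python since the question does not name a language. The per-field validators are adapted from the pattern in the question, and the function name, the dictionary layout, and the decision to reject repeated or unknown fields are all assumptions made for the example.

```python
import re

# Hypothetical per-field validators, adapted from the pattern in the question.
NUMBER = r"\d+(?:\.\d+)?"
RANGE = rf"(?:{NUMBER}|min)-(?:{NUMBER}|max)"
VALIDATORS = {
    "price": re.compile(rf"(?:{RANGE}|\d+)(?:;(?:{RANGE}|\d+))*"),
    "merchant": re.compile(r"\d+"),
    "category": re.compile(r"\d+"),
}

def parse_params(query: str) -> dict:
    """Split on commas, then on the first colon; reject unknown, repeated or invalid fields."""
    result = {}
    for piece in query.split(","):
        name, sep, value = piece.partition(":")
        validator = VALIDATORS.get(name)
        if not sep or validator is None or name in result:
            raise ValueError(f"unexpected field: {piece!r}")
        if not validator.fullmatch(value):
            raise ValueError(f"invalid value for {name}: {value!r}")
        result[name] = value
    return result

print(parse_params("category:3,price:1.00-max;3-12;23.34-12.19,merchant:25"))
```

Because each piece is looked up by name, the order of the fields in the input stops mattering, and each validator stays short enough to read.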
You have three options:
1. You can enumerate all the possible orders. For 3 variables there are 6 possibilities. Obviously this doesn't scale;
2. You can accept possible duplicates; or
3. You can break the string up and then parse it.
(2) means something like:
/(\b(price|category|merchant):(...).*?)*/
The real problem you're facing here is that "each field at most once, in any order" is a poor fit for a regular expression. A regular expression describes a DFSM (deterministic finite state machine) or DFA (deterministic finite automaton). A DFA has no memory beyond its current state, so the expression can only "remember" which fields it has already seen if you spell out every possible order.
To get real memory you have to add it, usually in the form of a stack, which yields a PDA (pushdown automaton).
It's exactly the same problem people face when they try and parse HTML with regexes and get stuck on tag nesting issues and similar.
Basically you accept some edge conditions (like repeated values), split the string by comma and then parse or you're just using the wrong tool for the job.
How about don't try and do it all with one Cthulhugex?
/price:([^,]*)/
/merchant:([^,]*)/
/category:([^,]*)/
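Applied in code, the three small patterns might be used like this. It is a Python sketch with invented names (`FIELDS`, `extract_fields`), and it only extracts values: it does not check that the string contains nothing else, so it would need a separate validation step if that matters.

```python
import re

# One small pattern per field; the order of fields in the input no longer matters.
FIELDS = {
    "price": re.compile(r"price:([^,]*)"),
    "merchant": re.compile(r"merchant:([^,]*)"),
    "category": re.compile(r"category:([^,]*)"),
}

def extract_fields(query: str) -> dict:
    """Return the captured value for each field, or None when it is absent."""
    found = {}
    for name, pattern in FIELDS.items():
        m = pattern.search(query)
        found[name] = m.group(1) if m else None
    return found

print(extract_fields("category:3,price:1.00-max;3-12;23.34-12.19,merchant:25"))
print(extract_fields("price:1.00-max;3-12;23.34-12.19,category:3"))
```

A missing field simply comes back as `None`, which mirrors the optional groups in the original pattern.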
<?php
// Sample strings from the question, one per line.
$string=<<<EOF
price:1.00-max;3-12;23.34-12.19,category:3
merchant:25,price:1.00-max;3-12;23.34-12.19,category:3
price:1.00-max;3-12;23.34-12.19,category:3,merchant:25
category:3,price:1.00-max;3-12;23.34-12.19,merchant:25
EOF;
// Turn line breaks into commas so every key:value pair is comma-separated, then split.
$s = preg_replace("/\n+/",",",$string);
$s = explode(",",$s);
print_r($s);
Output:
$ php test.php
Array
(
[0] => price:1.00-max;3-12;23.34-12.19
[1] => category:3
[2] => merchant:25
[3] => price:1.00-max;3-12;23.34-12.19
[4] => category:3
[5] => price:1.00-max;3-12;23.34-12.19
[6] => category:3
[7] => merchant:25
[8] => category:3
[9] => price:1.00-max;3-12;23.34-12.19
[10] => merchant:25
)