given the following sample strings, how can the highlighted parts be extracted using regex?
x => x.One.Two[0].Three.get_Item(0).Four[0].Five
x => x.One.Two[0].Three.get_Item(0).Four[0].Five.get_Item(0)
x => x.One.Two[0].Three.get_Item(0).Four[0].Five[0]
x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five)
x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five.get_Item(0))
x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five[0])
x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five, Object)
x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five.get_Item(0), Object)
x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five[0], Object)
so far, i was able to come up with a couple of different regex patterns but ideally i'd like to have a single regex that will handle all the above cases.
this is what i have so far:
\.(.+)(?<!\d)\)$ and \.(.+), Object\) and \.(.+)
here's the sample data to play with: https://regex101.com/r/jxqsQl/2
appreciate any help you can provide...
This regex will do what you want. It looks for one or more groups of a . and a word, each optionally followed by digits enclosed in [] or ():
(?:\.\w+(?:[[(]\d+[)\]])?)+
Demo on regex101
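As a rough check, the pattern above can be run against a few of the sample strings. This is only a sketch in Ruby (the question does not say which engine is in use); the constant name ACCESS_PATH is made up for the example, and the "[" inside the first character class is escaped because Ruby gives an unescaped "[" inside a class special meaning.

```ruby
# Ruby port of the pattern above. The only change is the escaped "[" in the
# first character class. ACCESS_PATH is a hypothetical name for this sketch.
ACCESS_PATH = /(?:\.\w+(?:[\[(]\d+[)\]])?)+/

samples = [
  'x => x.One.Two[0].Three.get_Item(0).Four[0].Five',
  'x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five.get_Item(0))',
  'x => Convert(x.One.Two[0].Three.get_Item(0).Four[0].Five[0], Object)'
]

# String#[] with a regex returns the first match (or nil if nothing matches).
extracted = samples.map { |sample| sample[ACCESS_PATH] }
extracted.each { |path| puts path }
# .One.Two[0].Three.get_Item(0).Four[0].Five
# .One.Two[0].Three.get_Item(0).Four[0].Five.get_Item(0)
# .One.Two[0].Three.get_Item(0).Four[0].Five[0]
```

Note that the match keeps the leading dot; whether that belongs to the wanted part is not clear from the question, so strip it afterwards if it does not.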
I have the string below and I need to get all the values between Pizzahut: and |.
ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|
I have the regular expression .scan(/(?<=Pizzahut:)([.*\s\S]+)(?=\|)/) but it fetches
"j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|"
Result should be: j34532jdhgj,3242237,67688873rg
You can use
s='ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|'
p s.scan(/Pizzahut:([^|]+)/).flatten
# => ["j34532jdhgj", "3242237", "67688873rg"]
See this Ruby demo and the Rubular demo.
It does not look likely that Pizzahut appears as part of another word, but if that is possible, use a version with a word boundary, /\bPizzahut:([^|]+)/.
The Pizzahut:([^|]+) matches Pizzahut: and then captures into Group 1 any one or more chars other than a pipe (with ([^|]+)).
Note that String#scan returns the captures only if a pattern contains a capturing group, so you do not need to use lookarounds.
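A small comparison may make the last two points concrete. The input string here is invented for the illustration (the NotPizzahut key is hypothetical) so that the word boundary has something to reject:

```ruby
# Made-up input: "NotPizzahut" is a hypothetical key that merely ends in "Pizzahut".
s = 'ABC:1|Pizzahut:a1|NotPizzahut:b2|Pizzahut:c3|'

# With a capturing group, scan returns one array of captures per match.
with_group = s.scan(/Pizzahut:([^|]+)/).flatten

# The word boundary rejects "Pizzahut" when it is the tail of a longer word.
with_boundary = s.scan(/\bPizzahut:([^|]+)/).flatten

# Without a capturing group, scan returns the whole matches, so a lookbehind
# is needed to keep the "Pizzahut:" prefix out of the result.
with_lookbehind = s.scan(/(?<=Pizzahut:)[^|]+/)

p with_group      # ["a1", "b2", "c3"]
p with_boundary   # ["a1", "c3"]
p with_lookbehind # ["a1", "b2", "c3"]
```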
I'm not sure why you're jumping to a regex solution here; that input string clearly looks structured to me, and you would probably do better by splitting it on the delimiters to convert it into a more convenient data structure.
Something like this:
input = "ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg"
converted_input = input
  .split('|') #=> ["ABC:2fg45rdvsg", "Pizzahut:j34532jdhgj", ... ]
  .map { |pair| pair.split(':') } #=> [["ABC", "2fg45rdvsg"], ["Pizzahut", "j34532jdhgj"], ... ]
  .group_by(&:first) #=> {"ABC"=>[["ABC", "2fg45rdvsg"]], "Pizzahut"=>[["Pizzahut", "j34532jdhgj"], ... ], "Dominos"=>[["Dominos", "3424232"]], ... }
  .transform_values { |v| v.flat_map(&:last) }
(The above series of transformations is just one possible way; you could probably come up with a dozen similar alternative steps to convert this input into the same hash shown below! For example, by using reduce or even the CSV library.)
Which gives you the final result:
converted_input = {
  "ABC" => ["2fg45rdvsg"],
  "Pizzahut" => ["j34532jdhgj", "3242237", "67688873rg"],
  "Dominos" => ["3424232"],
  "Wendys" => ["3462783"]
}
Now that the data is formatted conveniently, obtaining data like your original request becomes trivial:
converted_input["Pizzahut"].join(',') #=> "j34532jdhgj,3242237,67688873rg"
(Although quite likely it would be more suitable to leave it as an Array, not a comma-separated String!!)
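As one example of the alternative transformations mentioned in the parenthetical above, here is a sketch that builds the same hash in a single pass with each_with_object. The variable name grouped is made up for the example, and the limit of 2 on the inner split is an assumption that a value might itself contain a colon.

```ruby
input = 'ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|'

# The default block gives every new key an empty array, so values can be
# appended without checking whether the key has been seen before.
grouped = input.split('|').each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |pair, acc|
  key, value = pair.split(':', 2) # split only on the first colon
  acc[key] << value
end

p grouped['Pizzahut']           # ["j34532jdhgj", "3242237", "67688873rg"]
p grouped['Pizzahut'].join(',') # "j34532jdhgj,3242237,67688873rg"
```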
I have a list of strings, and need to build the regular expression from them, using Regexp#union. I need the resulting pattern to be case insensitive.
The #union method itself does not accept options/modifiers, hence I currently see two options:
strings = %w|one two three|
Regexp.new(Regexp.union(strings).to_s, true)
and/or:
Regexp.union(*strings.map { |s| /#{s}/i })
Both variants look a bit weird.
Is there an ability to construct a case-insensitive regular expression by using Regexp.union?
The simple starting place is:
words = %w[one two three]
/#{ Regexp.union(words).source }/i # => /one|two|three/i
You probably want to make sure you're only matching words so tweak it to:
/\b#{ Regexp.union(words).source }\b/i # => /\bone|two|three\b/i
Without grouping, the \b anchors bind only to the first and last alternatives, so for correctness as well as clarity I prefer using a non-capturing group:
/\b(?:#{ Regexp.union(words).source })\b/i # => /\b(?:one|two|three)\b/i
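To illustrate what the non-capturing group changes in practice, here is a small sketch. The test word "network" is my own choice: it happens to contain "two" in the middle.

```ruby
words = %w[one two three]

# Alternation has the lowest precedence, so this reads as
# (\bone) | (two) | (three\b): the middle alternative has no boundary at all.
ungrouped = /\b#{Regexp.union(words).source}\b/i

# With the group, both boundaries apply to every alternative.
grouped = /\b(?:#{Regexp.union(words).source})\b/i

p 'network'[ungrouped]   # "two"
p 'network'[grouped]     # nil
p 'foo TWO bar'[grouped] # "TWO"
```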
Using source is important. When you create a Regexp object, it has an idea of the flags (i, m, x) that apply to that object and those get interpolated into the string:
"#{ /foo/i }" # => "(?i-mx:foo)"
"#{ /foo/ix }" # => "(?ix-m:foo)"
"#{ /foo/ixm }" # => "(?mix:foo)"
or
(/foo/i).to_s # => "(?i-mx:foo)"
(/foo/ix).to_s # => "(?ix-m:foo)"
(/foo/ixm).to_s # => "(?mix:foo)"
That's fine when the generated pattern stands alone, but when it's being interpolated into a string to define other parts of the pattern the flags affect each sub-expression:
/\b(?:#{ Regexp.union(words) })\b/i # => /\b(?:(?-mix:one|two|three))\b/i
Dig into the Regexp documentation and you'll see that ?-mix turns off "ignore-case" inside (?-mix:one|two|three), even though the overall pattern is flagged with i, resulting in a pattern that doesn't do what you want, and is really hard to debug:
'foo ONE bar'[/\b(?:#{ Regexp.union(words) })\b/i] # => nil
Instead, source removes the inner expression's flags making the pattern do what you'd expect:
/\b(?:#{ Regexp.union(words).source })\b/i # => /\b(?:one|two|three)\b/i
and
'foo ONE bar'[/\b(?:#{ Regexp.union(words).source })\b/i] # => "ONE"
You can build your patterns using Regexp.new and passing in the flags:
regexp = Regexp.new('(?:one|two|three)', Regexp::EXTENDED | Regexp::IGNORECASE) # => /(?:one|two|three)/ix
but as the expression becomes more complex it becomes unwieldy. Building a pattern using string interpolation remains easier to understand.
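For the plain case in the question (a union with no extra anchors), the two ideas above can be combined: take the source of the union and hand it to Regexp.new with the flag. This is a sketch rather than the only way to do it; the second list of strings is invented to show that Regexp.union escapes metacharacters in its string arguments and that source keeps those escapes.

```ruby
words = %w[one two three]

# source drops the (?-mix:...) wrapper, so the IGNORECASE flag applies.
pattern = Regexp.new(Regexp.union(words).source, Regexp::IGNORECASE)

p pattern                # /one|two|three/i
p pattern.casefold?      # true
p 'foo ONE bar'[pattern] # "ONE"

# Invented input with regex metacharacters: the escaping survives source.
escaped = Regexp.new(Regexp.union(['1.5', 'a+b']).source, Regexp::IGNORECASE)
p escaped.match?('A+B')  # true
p escaped.match?('125')  # false
```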
You've overlooked the obvious.
strings = %w|one two three|
r = Regexp.union(strings.flat_map do |word|
len = word.size
(2**len).times.map { |n|
len.times.map { |i| n[i]==1 ? word[i].upcase : word[i] } }
end.map(&:join))
"'The Three Little Pigs' should be read by every building contractor" =~ r
#=> 5
Can anyone help me with this one?
My objective here is to grab some info from a text file, present it to the user, and ask for values to replace that info so as to generate new output. So I thought of using regular expressions.
My variables would be of the format: {#<num>[|<value>]}.
Here are some examples:
{#1}
{#2|label}
{#3|label|help}
{#4|label|help|something else}
So after some research and experimenting, I came up with this expression: \{\#(\d{1,})(?:\|{1}(.+))*\}
which works pretty well on most occasions, except on something like this:
{#1} some text {#2|label} some more text {#3|label|help}
In this case variables 2 & 3 are matched on a single occurrence rather than on 2 separate matches...
I've already tried to use lookahead commands for the trailing } of the expression, but I didn't manage to get it.
I'm targeting this expression for use in C#, in case that helps anyone...
I like the results from this one:
\{\#(\d+)(?:|\|(.+?))\}
This returns 3 groups, counting the whole match as the first. The second group is the number (1, 2, 3) and the third group is the arguments ('label', 'label|help').
I prefer to remove the * in favor of | in order to capture all the arguments after the first pipe in the last grouping.
A regular expression which can be used would be something like
\{\#(\d+)(?:\|([^|}]+))*\}
This will prevent reading over any closing }.
Another possible solution (with slightly different behaviour) would be to use a non-greedy matcher (.+?) instead of the greedy version (.+).
Note: I also removed the {1} and replaced {1,} with + which are equivalent in your case.
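Here is a sketch of how the pattern behaves on the problem line from the question. It is written in Ruby rather than C#, so treat it as an illustration of the regex only. One adjustment is my own: in Ruby a capture group inside a repeated group keeps only its last repetition, so the sketch captures the whole "|a|b|c" tail in one group and splits it afterwards. The names VARIABLE and parse_variables are hypothetical.

```ruby
# Group 1: the number. Group 2: the whole "|label|help" tail (possibly empty).
VARIABLE = /\{\#(\d+)((?:\|[^|}]+)*)\}/

def parse_variables(text)
  text.scan(VARIABLE).map do |num, tail|
    # "|label|help".split('|') yields ["", "label", "help"]; drop the empty head.
    { num: Integer(num), args: tail.split('|').reject(&:empty?) }
  end
end

result = parse_variables('{#1} some text {#2|label} some more text {#3|label|help}')
result.each { |variable| p variable }
```

Because [^|}] can never cross a closing brace, variables 2 and 3 come back as separate matches.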
Try this:
\{\#(\d+)(?:\|[^|}]+)*\}
In C#:
MatchCollection matches = Regex.Matches(mystring,
    @"\{\#(\d+)(?:\|[^|}]+)*\}");
It prevents the label and help from eating the | or }.
matches[0].Value => {#1}
matches[0].Groups[0].Value => {#1}
matches[0].Groups[1].Value => 1
matches[1].Value => {#2|label}
matches[1].Groups[0].Value => {#2|label}
matches[1].Groups[1].Value => 2
matches[2].Value => {#3|label|help}
matches[2].Groups[0].Value => {#3|label|help}
matches[2].Groups[1].Value => 3
I have the following Haml code:
%ul#sub-nav
%li.li1
%a{:href => "/dsadasd/"} dasdasd
%li.li2
%a.selected{:href => "/asdadasd"} Tasdada /asdas
%li.li3
%a{:href => "/dasd/"} asdasd
%li.li4
%a{:href => "/wdasn/"} das
I seem to be able to match this with the following repetitive regex - %ul#sub-nav\n.*\n^.*\n^.*\n^.*\n^.*\n^.*\n^.*\n^.* in intellij's rubymine ide.
This looks way too repetitive. Help appreciated.
If you want to match %ul#sub-nav plus the eight following lines, this should do:
%ul#sub-nav(\n.*$){8}
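A quick way to try the quantified version outside the IDE is sketched below in Ruby. The indentation of the Haml and the trailing %p line are assumptions added for the example (the markup in the question lost its indentation), and the capturing group is made non-capturing since the group contents are not needed.

```ruby
# Assumed indentation; the final %p line is invented to show where the match stops.
haml = <<~'HAML'
  %ul#sub-nav
    %li.li1
      %a{:href => "/dsadasd/"} dasdasd
    %li.li2
      %a.selected{:href => "/asdadasd"} Tasdada /asdas
    %li.li3
      %a{:href => "/dasd/"} asdasd
    %li.li4
      %a{:href => "/wdasn/"} das
  %p not part of the list
HAML

# "." does not match a newline by default, so each (\n.*$) consumes exactly one line.
block = haml[/%ul#sub-nav(?:\n.*$){8}/]

puts block
puts block.lines.size # 9
```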
I can't figure out how to make the order of the incoming string parameters (price, merchant, category) not matter to the regex. My regex matches the parts of the string but not the string as a whole. I need to be able to add \A \Z to it.
Pattern:
(,?price:(;?(((\d+(\.\d+)?)|min)-((\d+(\.\d+)?)|max))|\d+)+){0,1}(,?merchant:\d+){0,1}(,?category:\d+){0,1}
Sample Strings:
price:1.00-max;3-12;23.34-12.19,category:3
merchant:25,price:1.00-max;3-12;23.34-12.19,category:3
price:1.00-max;3-12;23.34-12.19,category:3,merchant:25
category:3,price:1.00-max;3-12;23.34-12.19,merchant:25
Note: I'm going to add ?: to all my groups after I get it working.
You should probably just parse this string through normal parsing. Split it at the commas, then split each of those pieces into two by the colons. You can store validation regexes if you'd like to check each of those inputs individually.
If you do it through regex, you'll probably have to end up saying "this combination OR this combination OR this combination", which will hurt real bad.
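The split-then-validate idea could look roughly like the sketch below. The validator patterns are simplified guesses based on the pattern in the question (a price is taken to be one or more ;-separated ranges), and the names VALIDATORS and parse_filters are hypothetical. Unknown keys, repeated keys, and invalid values all make the whole input invalid here, which is a design choice rather than a requirement from the question.

```ruby
NUMBER = /\d+(?:\.\d+)?/
RANGE  = /(?:#{NUMBER.source}|min)-(?:#{NUMBER.source}|max)/

# One anchored validation regex per key (simplified assumptions).
VALIDATORS = {
  'price'    => /\A#{RANGE.source}(?:;#{RANGE.source})*\z/,
  'merchant' => /\A\d+\z/,
  'category' => /\A\d+\z/
}.freeze

# Returns a hash of key => value, or nil if any part is invalid.
def parse_filters(input)
  input.split(',').each_with_object({}) do |pair, result|
    key, value = pair.split(':', 2)
    validator = VALIDATORS[key]
    return nil if validator.nil? || value.nil? || result.key?(key) || !validator.match?(value)

    result[key] = value
  end
end

p parse_filters('category:3,price:1.00-max;3-12;23.34-12.19,merchant:25')
p parse_filters('price:1-2,price:3-4') # nil (duplicate key)
```

The order of the keys no longer matters, because each piece is checked on its own.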
You have three options:
(1) You can enumerate all the possible orders. For 3 variables there are 6 possibilities. Obviously this doesn't scale;
(2) You can accept possible duplicates; or
(3) You can break the string up and then parse it.
(2) means something like:
/(\b(price|category|merchant):(...).*?)*/
The real problem you're facing here is that you're asking a regular expression to keep track of which keys it has already seen. A regular expression describes a DFSM (deterministic finite state machine) or DFA (deterministic finite automaton). A finite automaton has no memory beyond its current state, so the expression can't "remember" what else there has been unless every possible ordering is written out as its own path.
To get to that you have to add a "memory" usually in the form of a stack, which yields a PDA (pushdown automaton).
It's exactly the same problem people face when they try and parse HTML with regexes and get stuck on tag nesting issues and similar.
Basically you accept some edge conditions (like repeated values), split the string by comma and then parse or you're just using the wrong tool for the job.
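To show why enumerating the orders does not scale, here is a sketch that generates the enumerated pattern mechanically. It makes two simplifying assumptions that are not in the question: all three keys must be present, and any comma-free text is accepted as a value. The names KEYS and ANY_ORDER are made up.

```ruby
KEYS = %w[price merchant category].freeze

# Every ordering of the keys: 3! = 6 of them.
orderings = KEYS.permutation.to_a

# One alternative per ordering, e.g. "price:[^,]+,merchant:[^,]+,category:[^,]+".
alternatives = orderings.map do |order|
  order.map { |key| "#{key}:[^,]+" }.join(',')
end
ANY_ORDER = /\A(?:#{alternatives.join('|')})\z/

p orderings.size # 6
p ANY_ORDER.match?('merchant:25,price:1.00-max;3-12;23.34-12.19,category:3') # true
p ANY_ORDER.match?('price:1,price:2,category:3')                             # false

# The number of alternatives grows factorially with the number of keys.
factorials = (1..6).map { |n| (1..n).reduce(:*) }
p factorials # [1, 2, 6, 24, 120, 720]
```

Making the keys optional, as in the question, would multiply the alternatives further, which is the point where splitting the string becomes the simpler tool.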
How about don't try and do it all with one Cthulhugex?
/price:([^,]*)/
/merchant:([^,]*)/
/category:([^,]*)/
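Applied one at a time, the three patterns above might be used like this in Ruby. This is a sketch: String#[] with a capture index returns that group, or nil when the pattern does not match, which is how a missing key shows up.

```ruby
line = 'category:3,price:1.00-max;3-12;23.34-12.19,merchant:25'

# Each lookup is independent, so the order of the keys in the input is irrelevant.
price    = line[/price:([^,]*)/, 1]
merchant = line[/merchant:([^,]*)/, 1]
category = line[/category:([^,]*)/, 1]

p price    # "1.00-max;3-12;23.34-12.19"
p merchant # "25"
p category # "3"

# A key that is absent simply yields nil.
missing = 'price:1.00-max,category:3'[/merchant:([^,]*)/, 1]
p missing # nil
```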
<?php
$string=<<<EOF
price:1.00-max;3-12;23.34-12.19,category:3
merchant:25,price:1.00-max;3-12;23.34-12.19,category:3
price:1.00-max;3-12;23.34-12.19,category:3,merchant:25
category:3,price:1.00-max;3-12;23.34-12.19,merchant:25
EOF;
$s = preg_replace("/\n+/",",",$string);
$s = explode(",",$s);
print_r($s);
output
$ php test.php
Array
(
[0] => price:1.00-max;3-12;23.34-12.19
[1] => category:3
[2] => merchant:25
[3] => price:1.00-max;3-12;23.34-12.19
[4] => category:3
[5] => price:1.00-max;3-12;23.34-12.19
[6] => category:3
[7] => merchant:25
[8] => category:3
[9] => price:1.00-max;3-12;23.34-12.19
[10] => merchant:25
)