eval certain regex from file to replace chars in string - regex

I'm new to ruby so please excuse my ignorance :)
I just learned about eval and I read about its dark sides.
what I've read so far:
When is eval in Ruby justified?
Is 'eval' supposed to be nasty?
Ruby Eval and the Execution of Ruby Code
so what I have to do is to read a file in which there are some text such as /e/ 3 which will replace each e with a 3 after evaluation.
so here what i did so far:(working but..)
def evaluate_lines
result="elt"
IO.foreach("test.txt") do |reg|
reg=reg.chomp.delete(' ')
puts reg
result=result.gsub(eval(reg[0..2]),"#{reg[3..reg.length]}" )
p result
end
end
contents of the test.txt file
/e/ 3
/l/ 1
/t/ 7
/$/ !
/$/ !!
this only works because I know the length of the lines in the file.
so assuming my file has the following /a-z/ 3 my program would be not able to do what is expected from it.
Note
I tried using Regexp.new reg and this resulted with the following /\/e\/3/ which isn't very helpful in this case.
simple example to the `Regexp
str="/e/3"
result="elt"
result=result.gsub(Regexp.new str)
p result #outputs: #<Enumerator: "elt":gsub(/\/e\/3/)>
i already tried stripping off the slashes but even though this wont deliver the desired result thus the gsub() takes two parameters, such as this gsub(/e/, "3").
for the usage of the Regexp, I have already read Convert a string to regular expression ruby

While you can write something to parse that file, it rapidly gets complicated because you have to parse regular expressions. Consider /\/foo\\/.
There are a number of incomplete solutions. You can split on whitespace, but this will fail on /foo bar/.
re, replace = line.split(/\s+/, 2)
You can use a regex. Here's a first stab.
match = "/3/ 4".match(%r{^/(.*)/\s+(.+)})
This fails on escaped /, we need something more complex.
match = '/3\// 4'.match(%r{\A / ((?:[^/]|\\/)*) / \s+ (.+)}x)
I'm going to guess it was not your teacher's intent to have you parsing regexes. For the purposes of the assignment, splitting on whitespace is probably fine. You should clarify with your teacher.
This is a poor data format. It is non-standard, difficult to parse, and has limitations on the replacement. Even a tab-delimited file would be better.
There's little reason to use a non-standard format these days. The simplest thing is to use a standard data format for the file. YAML or JSON are the most obvious choices. For such simple data, I'd suggest JSON.
[
{ "re": "e", "replace": "3" },
{ "re": "l", "replace": "1" }
]
Parsing the file is trivial, use the built-in JSON library.
require 'json'
specs = JSON.load("test.json")
And then you can use them as a list of hashes.
specs.each do |spec|
# No eval necessary.
re = Regexp.new(spec["re"])
# `gsub!` replaces in place
result.gsub!(re, spec["replace"])
end
The data file is extensible. For example, if later you want to add regex options.
[
{ "re": "e", "replace": "3" },
{ "re": "l", "replace": "1", "options": ['IGNORECASE'] }
]
While the teacher may have specified a poor format, pushing back on bad requirements is good practice for being a developer.

Here's a really simple example that uses vi notation like s/.../.../ and s/.../.../g:
def rsub(text, spec)
_, mode, repl, with, flags = spec.match(%r[\A(.)\/((?:[^/]|\\/)*)/((?:[^/]|\\/)*)/(\w*)\z]).to_a
case (mode)
when 's'
if (flags.include?('g'))
text.gsub(Regexp.new(repl), with)
else
text.sub(Regexp.new(repl), with)
end
end
end
Note the matcher looks for non-slash characters ([^/]) or a literal-slash combination (\\/) and splits out the two parts accordingly.
Where you can get results like this:
rsub('sandwich', 's/and/or/')
# => "sorwich"
rsub('and/or', 's/\//,/')
# => "and,or"
rsub('stack overflow', 's/o/O/')
# => "stack Overflow"
rsub('stack overflow', 's/o/O/g')
# => "stack OverflOw"
The principle here is you can use a very simple regular expression to parse out your input regular expression and feed that cleaned up data into Regexp.new. There is absolutely no need for eval here, and if anything that severely limits what you can do.
With a little work you could alter that regular expression to parse what's in your existing file and make it do what you want.

Related

Regex- to get part of String

I have got below string and I need to Get all the values Between Pizzahut: and |.
ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|
I have got RegExpression .scan(/(?<=Pizzahut:)([.*\s\S]+)(?=\|)/) but it fetches
"j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|"
Result should be: 34532jdhgj,3242237,67688873rg
You can use
s='ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg|'
p s.scan(/Pizzahut:([^|]+)/).flatten
# => ["j34532jdhgj", "3242237", "67688873rg"]
See this Ruby demo and the Rubular demo.
It does not look possible that you have Pizzahut as a part of another word, but it is possible, use a version with a word boundary, /\bPizzahut:([^|]+)/.
The Pizzahut:([^|]+) matches Pizzahut: and then captures into Group 1 any one or more chars other than a pipe (with ([^|]+)).
Note that String#scan returns the captures only if a pattern contains a capturing group, so you do not need to use lookarounds.
I'm not sure why you're jumping to a regex solution here; that input string clearly looks structured to me, and you would probably do better by splitting it on the delimiters to convert it into a more convenient data structure.
Something like this:
input = "ABC:2fg45rdvsg|Pizzahut:j34532jdhgj|Dominos:3424232|Pizzahut:3242237|Wendys:3462783|Pizzahut:67688873rg"
converted_input = input
.split('|') #=> ["ABC:2fg45rdvsg", "Pizzahut:j34532jdhgj", ... ]
.map { |pair| pair.split(':') } #=> [["ABC", "2fg45rdvsg"], ["Pizzahut", "j34532jdhgj"], ... ]
.group_by(&:first) #=> {"ABC"=>[["ABC", "2fg45rdvsg"]], "Pizzahut"=>[["Pizzahut", "j34532jdhgj"], ... ], "Dominos"=>[["Dominos", "3424232"]], ... ]
.transform_values { |v| v.flat_map(&:last) }
(The above series of transformations is just one possible way; you could probably come up with a dozen similar alternative steps to convert this input into the same hash shown below! For example, by using reduce or even the CSV library.)
Which gives you the final result:
converted_input = {
"ABC" => ["2fg45rdvsg"],
"Pizzahut" => ["j34532jdhgj", "3242237", "67688873rg"],
"Dominos" => ["3424232"],
"Wendys" => ["3462783"]
}
Now that the data is formatted conveniently, obtaining data like your original request becomes trivial:
converted_input["Pizzahut"].join(',') #=> "j34532jdhgj,3242237,67688873rg"
(Although quite likely it would be more suitable to leave it as an Array, not a comma-separated String!!)

A Perl 6 Regex to match a Perl 6 delimited comment

Anyone have a Perl 6 regular expression that will match Perl 6 delimited comments? I would prefer something that's short rather than a full grammar, but I rule out nothing.
As an example of what I am looking for, I want something that can parse the comments in here:
#`{ foo {} bar }
#`« woo woo »
say #`(
This is a (
long )
multiliner()) "You rock!"
#`{{ { And don't forget the tricky repeating delimiters }}
My overall goal is to be able to take a source file and strip the pod and comments and then do interesting things with the code that is left. Stripping line comments and pod is pretty easy, but delimited comments requires additional finesse. I also want this solution to be small and using only Perl 6 core so I can stick it in my dotfiles repo without having external dependencies.
Matching your examples
my %openers-closers = < { } « » ( ) >; # (many more in reality)
my #openers = %openers-closers.keys; # { « ( ...
my ($open, $close); # possibly multiple chars
my token comment { '#`' <&open> <&middle> <&close> }
my token open {
# Store first delimiter char: Slurp as many as are repeated:
( ( #openers ) $0* )
# Store the full (possibly multiple character) delimiters:
{ $open = ~$0; $close = %openers-closers{$0[0]} x $0.chars }
}
my token middle {
:my $nest-level; # for tracking nesting
[
# Continue if nested: or if not at unnested end delimiter:
[ <?{$nest-level}> || <!&close> ]
# Match either a nested delimiter: or a single character:
( $open || $close || . )
# Keep track of nesting:
{ $_ = ~$0.tail; # set topic to latest match in list
$nest-level++ when $open; $nest-level-- when $close }
]*
}
my token close { $close }
.say for $your-examples ~~ m:g / <.&comment> /
displays:
「{ foo {} bar }」
「« woo woo »」
「(
This is a (
long )
multiliner())」
「{{ { And don't forget the tricky repeating delimiters }}」
Hopefully the code is self-explanatory if you know Raku regexes. Please use the comments if you want clarification of any of it.
Looking at related Rakudo source code
I wrote the above without referring to Rakudo's source code. (I wanted to see what I came up with without doing so.)
But I've now looked at the source code, which imo would be a more or less mandatory thing to do for anyone trying to do what you're trying to do and serious about understanding how well it might work in the general case.
As I starting point, I was particularly interested in seeing if I could figure out why feeding this code to rakudo (2018.12):
#`{{ {{ And don't forget the tricky repeating delimiters } }}
yields the rather LTA (Less Than Awesome) compiler error:
Starter {{ is immediately followed by a combining codepoint...
This doesn't look directly relevant to your question but I encountered it when trying to understand the nested delimiter rules.
So when I got to this part of my answer I started by searching the Rakudo repo for "immediately followed". That led to a fail-terminator method in the Raku grammar. (Perhaps not of interest to you but it is to me.)
Here's what else I found in the standard grammar that imo is directly related to what you're trying to do, or at least understanding precisely what the code says the rules are about matching comments:
The comment:sym<#`(...)> token that parses these comments. This leads to:
The list of openers. This list should replace the measly 3 opener/closer pairs in my code that just match your examples.
The quibble token. This seems to be a generic "parse 'quoted' (delimited) thing". It leads to:
The babble token. This establishes a "start" and "stop" with this code:
$<B>=[<?before .>]
{
# Work out the delimiters.
my $c := $/;
my #delims := $c.peek_delimiters($c.target, $c.pos);
my $start := #delims[0];
my $stop := #delims[1];
The rule peek_delimiters is not in the Raku grammar file.
A search in the Rakudo repo shows it's not anywhere in Rakudo or Raku.
A search in NQP yields a routine in nqp's grammar (from which the Raku grammar inherits, which is why the peek_delimiters call works and why I looked in NQP when I didn't find it in Rakudo/Raku).
I'll stop at this point to draw a conclusion.
Conclusion
You've got a regex. It might work out as you intend. I don't know.
If you end up investigating the above Rakudo/NQP code and understand it well enough to write a walk through of what quibble, babble, nibble, et al do, or discover a good existing write up (I haven't searched for one yet), please add a comment to this answer linking to it. I'll do likewise. TIA!

A regex to match between delimiters except when there is a colon that is not between double quotes?

This one is a little bit complicated and I'm not sure if it can be done.
The regex need to match everything between a , (comma) or [] (square brackets).
It must not match if there is a :
And now the tricky part.
If the : is between " " it can match.
I managed to create a regex that does everything except the last.
(?<=[[,])[^:]+?(?=[],])
So this is what it needs to match.
[ ItemName:Data, More Data, With a number "as: " item name]
I'm going to keep testing. Lets see if someone solves it.
It sounds like you're trying to specify a language that's really to complicated to parse using only regular expressions. Here's a pattern that matches what you've described, but probably won't work perfectly. It doesn't use look behinds so you need to select the first match group to get the contents.
/[\[,](("[^"\]]*"|[^:\[])*?)[\]\,]/
/[\[,] # Opening bracket or comma.
(("[^"\]]*" # Anything not including the closing bracket, in quotes...
|[^:\[] # or not including the colon...
))*? # repeated any number of times.
[\]\,]/x # Closing bracket or comma.
An example usage in Python:
import re
pattern = re.compile(r"""[\[,](("[^"\]]*"|[^:\[])*?)[\]\,]""", re.DEBUG)
for match in pattern.finditer('[1 2 3] [4 5] [6 : 7], "8 : 9", '):
print match.group(1)
Producing output:
1 2 3
4 5
"8 : 9"
I have good experience in using (perl) regexps in practise, so let me share my experience. If you are handling complex cases like this it is almost always best to do it step by step, unless you are in special ciscumstances (for example speed of execution is crucial).
So in this case I woud simply do it in two steps. First explode the data to chunks, i.e. something like (depending on your language)
split(/[][,]/)
and than accept or remove individual parts. In this case just remove parts which match this expression
/^([^"]*:.*|.*:[^"])$/
i.e. parts which include semicolon not surrounded with parantheses.
Clearly this deos not solve all the cases like With a number "as: " : "item" name, but I agree with Jeremy, than if you are trying to implement complicated syntax language, than it might not be the right thing to just throw few regexpes on it without deeper analysis (i.e. answering what exactly it should accept in wierd cases like [ 1:1, 2":"2,3":":3,4":":":"4,5":":"5], ...) and using appropriate aprroach to solve it (recursive syntax parser)

Regexp: Keyword followed by value to extract

I had this question a couple of times before, and I still couldn't find a good answer..
In my current problem, I have a console program output (string) that looks like this:
Number of assemblies processed = 1200
Number of assemblies uninstalled = 1197
Number of failures = 3
Now I want to extract those numbers and to check if there were failures. (That's a gacutil.exe output, btw.) In other words, I want to match any number [0-9]+ in the string that is preceded by 'failures = '.
How would I do that? I want to get the number only. Of course I can match the whole thing like /failures = [0-9]+/ .. and then trim the first characters with length("failures = ") or something like that. The point is, I don't want to do that, it's a lame workaround.
Because it's odd; if my pattern-to-match-but-not-into-output ("failures = ") comes after the thing i want to extract ([0-9]+), there is a way to do it:
pattern(?=expression)
To show the absurdity of this, if the whole file was processed backwards, I could use:
[0-9]+(?= = seruliaf)
... so, is there no forward-way? :T
pattern(?=expression) is a regex positive lookahead and what you are looking for is a regex positive lookbehind that goes like this (?<=expression)pattern but this feature is not supported by all flavors of regex. It depends which language you are using.
more infos at regular-expressions.info for comparison of Lookaround feature scroll down 2/3 on this page.
If your console output does actually look like that throughout, try splitting the string on "=" when the word "failure" is found, then get the last element (or the 2nd element). You did not say what your language is, but any decent language with string splitting capability would do the job. For example
gacutil.exe.... | ruby -F"=" -ane "print $F[-1] if /failure/"

Using perl to split a line that may contain whitespace

Okay, so I'm using perl to read in a file that contains some general configuration data. This data is organized into headers based on what they mean. An example follows:
[vars]
# This is how we define a variable!
$var = 10;
$str = "Hello thar!";
# This section contains flags which can be used to modify module behavior
# All modules read this file and if they understand any of the flags, use them
[flags]
Verbose = true; # Notice the errant whitespace!
[path]
WinPath = default; # Keyword which loads the standard PATH as defined by the operating system. Append with additonal values.
LinuxPath = default;
Goal: Using the first line as an example "$var = 10;", I'd like to use the split function in perl to create an array that contains the characters "$var" and "10" as elements. Using another line as an example:
Verbose = true;
# Should become [Verbose, true] aka no whitespace is present
This is needed because I will be outputting these values to a new file (which a different piece of C++ code will read) to instantiate dictionary objects. Just to give you a little taste of what it might look like (just making it up as I go along):
define new dictionary
name: [flags]
# Start defining keys => values
new key name: Verbose
new value val: 10
# End dictionary
Oh, and here is the code I currently have along with what it is doing (incorrectly):
sub makeref($)
{
my #line = (split (/=/)); # Produces ["Verbose", " true"];
}
To answer one question, why I am not using Config::Simple, is that I originally did not know what my configuration file would look like, only what I wanted it to do. Making it up as I went along - at least what seemed sensible to me - and using perl to parse the file.
The problem is I have some C++ code that will load the information in the config file, but since parsing in C or C++ is :( I decided to use perl. It's also a good learning exercise for me since I am new to the language. So that's the thing, this perl code is not really apart of my application, it just makes it easier for the C++ code to read the information. And, it is more readable (both the config file, and the generated file). Thanks for the feedback, it really helped.
If you're doing this parsing as a learning exercise, that's fine. However, CPAN has several modules that will do a lot of the work for you.
use Config::Simple;
Config::Simple->import_from( 'some_config_file.txt', \my %conf );
split splits on a regular expression, so you can simply put the whitespace around the = sign into its regex:
split (/\s*=\s*/, $line);
You obviously do not want to remove all whitespace, or such a line would be produced (whitespace missing in the string):
$str="Hellothere!";
I guess that only removing whitespace from the beginning and end of the line is sufficient:
$line =~ s/^\s*(.*?)\s*$/$1/;
A simpler alternative with two statements:
$line =~ s/^\s+//;
$line =~ s/\s+$//;
Seems like you've got it. Strip the whitespaces before splitting.
sub makeref($)
{
s/\s+//g;
my #line = (split(/=/)); # gets ["verbose", "true"]
}
This code does the trick (and is more efficient without reversing).
for (#line) {
s/^\s+//;
s/\s+$//;
}
You probably have it all figured out, but I thought I'd add a little. If you
sub makeref($)
{
my #line = (split(/=/));
foreach (#line)
{
s/^\s+//g;
s/\s+$//g;
}
}
then you will remove the whitespace before and after both the left and right side. That way something like:
this is a parameter = all sorts of stuff here
will not have crazy spaces.
!!Warning: I probably don't know what I'm talking about!!