Multiple grep (with regex's) functions not working in Perl script


Having trouble with a script right now.
Trying to filter out portions of a file and put them into a scalar. Here is the code.
@value = (grep { m/(III[ABC])/g and m//g } <$fh>);
print @value;
@value = (grep { m/[012]iii/g } (<$fh>));
print @value;
When I run the first grep, the values appear in the print statement. But when I run the second grep, the second print statement doesn't print anything. Does adding a second grep cancel out the effectiveness of the first grep?
I know that the first and second grep both work, because when I commented out the first grep, the second grep worked.
All I really want to do is filter the information into multiple separate arrays. I am really confused about how to fix this problem, since I am planning on adding more greps to the script.

The first read of <$fh> in list context reads to the end of the file, so the second invocation has nothing left to read. That is also why the second grep works when you comment out the first one.
The code below adds to the same array; use the commented-out line instead if you want separate arrays. The regex is simplified: the original needs a comment of its own, and it doesn't affect the actual question. Please put it back the way it was if that is what you really meant.
You can either rewind the filehandle after all lines have been read
my @vals = grep { /III[ABC]/ } <$fh>;
seek $fh, 0, 0;
# ready for reading again from the beginning
push @vals, grep { /[012]iii/ } <$fh>;
# or: my @vals_2 = grep { /[012]iii/ } <$fh>;
Or you can read all lines into an array that you can then process repeatedly.
my @original = <$fh>;
my @vals = grep { /III[ABC]/ } @original;
push @vals, grep { m/[012]iii/ } @original;
# or assign to a different array
If you don't need the results stored in this particular order, it would be far more efficient to read the file line by line, processing and adding as you go.
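A minimal sketch of that line-by-line approach, assuming the two simplified patterns from above. The sample data, the in-memory filehandle and the array names @upper and @lower are made up for illustration:

```perl
use strict;
use warnings;

# Made-up sample data, read through an in-memory filehandle in place of the real file
my $sample = "IIIA alpha\n0iii beta\nIIIC and 2iii\nnothing here\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory handle: $!";

my (@upper, @lower);
while (my $line = <$fh>) {
    # One pass over the input; each line is tested against every pattern
    push @upper, $line if $line =~ /III[ABC]/;
    push @lower, $line if $line =~ /[012]iii/;
}
close $fh;

print scalar(@upper), " lines matched III[ABC], ",
      scalar(@lower), " lines matched [012]iii\n";
```

Adding another filter then means adding one more array and one more push line inside the loop, and the file is still read only once.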
Update
I simplified the originally posted regex in order to focus on the question at hand, since the exact condition inside the block has no bearing on it. See the Note below. Thanks to ikegami for bringing it up and for explaining that // "repeats the last successful query".
The m//g is tricky and I removed it.
grep checks a condition and passes a line through if the condition evaluates to true. In that scalar context the /g modifier has effects that are a very different story, so I removed it.
For the same reason, the capturing () is unneeded, so I removed it.
Cleaning up the syntax helps readability here, so I removed the m.
Note on regex
In scalar context the /g modifier does the following, per perlrequick:
successive matches against a string will have //g jump from match to match
The empty pattern m//g also has effects that are far from obvious, as stated above.
Taken together these produce non-trivial results in my tests and need mental tracing to understand. I removed them from the code here since leaving them raises the question of whether they are intended trickery or subtle bugs, thus distracting from the actual question -- which they do not affect at all.

I don't know what you think the g modifier does, but it makes no sense here.
I don't know what you think // (a match with an empty pattern) does, but it makes no sense here.
In list context, <$fh> returns all remaining lines in the file. It returns nothing the second time you evaluate it, since you already reached the end of the file the first time you evaluated it.
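This end-of-file behaviour is easy to reproduce. The sketch below uses an in-memory filehandle with made-up sample data (an assumption, standing in for the real file), reads it twice in list context, then rewinds with seek and reads once more:

```perl
use strict;
use warnings;

# Made-up two-line input, opened as an in-memory filehandle
my $data = "IIIA\n1iii\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";

my @first  = <$fh>;    # list context: reads every remaining line
my @second = <$fh>;    # already at end of file, so this list is empty

seek($fh, 0, 0);       # rewind to the beginning
my @third  = <$fh>;    # all lines are available again
close $fh;

printf "first=%d second=%d third=%d\n",
    scalar @first, scalar @second, scalar @third;
```

The second read comes back empty, which is what happened to the second grep in the question.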
Fix:
my @lines = <$fh>;
my @values1 = grep { /III[ABC]/ && /.../ } @lines;
my @values2 = grep { /[012]iii/ } @lines;
Of course, replace ... with what you meant to use there.

Related

complex search/delete/move/replace operation using sed?

After several hours of searching and experimenting, I'm hoping someone can either help me or rub my nose in a post I've missed, which actually would be helpful as well, come to think of it...
Problem:
I've made a quick-and-dirty fix, adding security checks, in several dozen PHP scripts (which we use to enhance Smarty's capabilities).
Example of input(part1):
///// SMARTY AUTH /////
$auth['model'] = isset($params['model']) ? $params['model'] : null;
$auth['requiredLevel'] = isset($params['requiredlevel']) ? $params['requiredlevel'] : null;
$auth['baseAuthorizationLevel'] = isset($params['_authorizationlevel']) ? $params['_authorizationlevel'] : null;
$auth['defaultRequiredLevel'] = AuthorizationLevel::AULE_WRITE;
$auth['baseModel'] = $smarty->getTemplateVars('model');
///// SMARTY AUTH /////
...which I'd like to replace with a much cleaner solution we've come up with. Now here's the rub: one section of the file contains a block of lines, luckily with very distinct delimiter lines. One line in that block contains a piece of code that needs to be merged into a replacement string, and that string replaces a second pattern in a line that follows the block, optionally with a variable number of lines in between.
I'm having trouble figuring out how to piece this nested code together, as the shorthand code of sed is quite confusing to me.
So far I've tried to assemble the code needed to capture the first block, but sed keeps giving me the same error each time: extra characters after command
Here are some of the attempts I've made:
sed -n 'p/^\/\/\/\/\/ SMARTY AUTH \/\/\/\/\/\\n.*\\n.*\\n.*\\n.*AULE_\([A-Z_]*\);$^.*$^^\/\/\/\/\/ SMARTY AUTH \/\/\/\/\/$/' function.xls_form.php
sed -n 'p/\(^.*SMARTY AUTH.*$^.*$^.*$^.*$^.*AULE_\([A-Z_]*\);$^.*$^.*SMARTY AUTH.*$/' function.xls_form.php
The second part is relatively easy compared to the first:
sed -ei'.orig' 's/RoleContextAuthorizations::smartyAuth(\$auth)/$smarty->hasAccess(\$params,AuthorizationLevel::AULE_\1)/' *.php
where \1 would be the matched snippet from the first part...
Edit:
The first codeblock is an example of input part 1 which needs to be removed; part 2 is RoleContextAuthorizations::smartyAuth($auth) which needs to be replaced with $smarty->hasAccess($params, AuthorizationLevel::AULE_<snippet from part1>)
/edit
Hoping somebody can point me in the right direction, Many thanks in advance!!!

The hold space is going to be key to solving this. You can copy material from the pattern space (where sed normally works) into the hold space, and do various operations with the hold space, etc.
You need to find the AuthorizationLevel::AULE_WRITE type text within the block markers, and copy that to the hold space, and then delete the text within the block markers. And then separately find the other pattern and replace it with information from the hold space.
Given that the markers use slashes, it is also time to use a custom search marker which is introduced by a backslash. The following could be in a file script.sed, to be used as:
sed -f script.sed function.xls_form.php
When you're sure it's working, you can play with -i options to overwrite the original.
\%///// SMARTY AUTH /////%,\%///// SMARTY AUTH /////% {
    /.*\(AuthorizationLevel::AULE_[A-Z]\{1,\}\).*/{
        s//$smarty->hasAccess($params,\1);/
        x
    }
    d
}
/RoleContextAuthorizations::smartyAuth($auth)/x
The first line searches for the start and end marker, using \% to change the delimiter to %. There's then a group of actions in braces. The second line searches for the authorization level and starts a second group of actions. The substitute command replaces the line with the desired output line. The x swaps the pattern space and the hold space, copying the desired output line to the hold space (and copying the empty hold space to the pattern space — it's x for eXchange pattern and hold spaces). This has saved the AuthorizationLevel information. The inner block ends; the outer block deletes the line and continues the execution. Note that there's no need to escape the $ symbol most of the time — it would matter if it was at the end of a pattern (there's a difference between /a\$/ and /a$/, but no difference between /b$c/ and /b\$c/).
The last line then looks for the RoleContextAuthorizations line and swaps it with the hold space. Everything else is just let through.
Given a data file containing:
Gibberish
Rhubarb
///// SMARTY AUTH /////
$auth['model'] = isset($params['model']) ? $params['model'] : null;
$auth['requiredLevel'] = isset($params['requiredlevel']) ? $params['requiredlevel'] : null;
$auth['baseAuthorizationLevel'] = isset($params['_authorizationlevel']) ? $params['_authorizationlevel'] : null;
$auth['defaultRequiredLevel'] = AuthorizationLevel::AULE_WRITE;
$auth['baseModel'] = $smarty->getTemplateVars('model');
///// SMARTY AUTH /////
More gibberish
More rhubarb - it is good with strawberries, especially in yoghurt
RoleContextAuthorizations::smartyAuth($auth);
Trailing gibbets — ugh; worse are trailing giblets
Finish - EOF
The output from sed -f script.sed data is:
$ sed -f script.sed data
Gibberish
Rhubarb
More gibberish
More rhubarb - it is good with strawberries, especially in yoghurt
$smarty->hasAccess($params,AuthorizationLevel::AULE_WRITE);
Trailing gibbets — ugh; worse are trailing giblets
Finish - EOF
$
I think that's what was wanted.
You can convert the file of sed script into a single line of gibberish, but that's left as an exercise for the reader — it isn't very hard, but GNU sed and BSD (macOS) sed have different rules for when you need semicolons as part of a single line command; you were warned. There are also differences in the rules for the -i option between the GNU and BSD variants of sed.
If you have to preserve some portions of the RoleContextAuthorizations::smartyAuth line, you have to work harder, but it can probably be done. For example, you can add the hold space to the current pattern space with the G command, and then edit the information into the right places. It is simplest if every place the line occurs needs to look the same apart from the AULE_XYZ string — that's what I've assumed here.
Also, note that using x rather than h or g is lazy — but doesn't matter if there's only one RoleContextAuthorizations::smartyAuth line. Using the alternatives would mean that if a file has multiple RoleContextAuthorizations::smartyAuth lines, then you'd be able to make the same substitution in each, unless there's another ///// SMARTY AUTH ///// in the file.
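For comparison, the same remember-then-substitute idea can be sketched in Perl rather than sed. This is not the script above, only an illustration under assumptions: the helper name rewrite_auth is hypothetical, the sample input is a shortened version of the data file, and the remembered level is kept after use, so the behaviour is closer to the h/g variant than to the x version.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the sed answer): drops the delimited block,
# remembers the AULE level found in it, and rewrites later smartyAuth lines.
sub rewrite_auth {
    my ($text) = @_;
    my $marker = '///// SMARTY AUTH /////';
    my ($in_block, $level, @out) = (0);
    for my $line (split /^/, $text) {
        if (index($line, $marker) == 0) {    # a delimiter line toggles the block state
            $in_block = !$in_block;
            next;                            # delimiter lines are deleted
        }
        if ($in_block) {
            # Plays the role of the hold space: remember the level, delete the line
            $level = $1 if $line =~ /(AuthorizationLevel::AULE_[A-Z_]+)/;
            next;
        }
        if (defined $level
            && $line =~ /RoleContextAuthorizations::smartyAuth\(\$auth\)/) {
            $line = "\$smarty->hasAccess(\$params,$level);\n";
        }
        push @out, $line;
    }
    return join '', @out;
}

my $sample = <<'END';
Gibberish
///// SMARTY AUTH /////
$auth['defaultRequiredLevel'] = AuthorizationLevel::AULE_WRITE;
///// SMARTY AUTH /////
More gibberish
RoleContextAuthorizations::smartyAuth($auth);
Finish - EOF
END

print rewrite_auth($sample);
```

One deliberate difference from the sed script: if no level has been seen yet, the smartyAuth line is left untouched instead of being swapped for an empty hold space.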

mIRC Search for multiple words in text file

I am trying to search a text file that will return a result if more than one word is found in that line. I don't see this explained in the documentation and I have tried various loops with no success.
What I would like to do is something similar to this:
$read(name.txt, s, word1|word2|word3)
or even something like this:
$read(name.txt, w, word1*|*word2*|*word3)
I don't know RegEx that well so I'm assuming this can be done with that but I don't know how to do that.

The documentation in the client itself is good, but I also recommend this site: http://en.wikichip.org/wiki/mirc. For your problem there is a nice article: http://en.wikichip.org/wiki/mirc/text_files
All the info is taken from there, so credit goes to wikichip.
alias testForString {
  while ($read(file.txt, nw, *test*, $calc($readn + 1))) {
    var %line = $v1
    ; you can add your own words to the regex; separate them with a pipe (|)
    noop $regex(%line,/(word1|word2|word3|test)/g)
    echo -a Amount of results: $regml(0)
  }
}
$readn is an identifier that returns the number of the line that $read() matched. It is used to resume the search for the pattern (in this case *test*) on the next line.
In the code above, $readn starts at 0. We use $calc() to start at line 1. After every match, $read() starts searching on the next line. When there are no more matches after the specified line, $read() returns $null, terminating the loop.
The w switch makes the search use a wildcard pattern.
The n switch prevents the text that is read from being evaluated as if it were mSL code. In almost EVERY case you must use the n switch, unless you really need that evaluation. Improper use of the $read() identifier without the 'n' switch could leave your script highly vulnerable.
The result is stored in a variable named %line so you can use it later if you need it.
After that we use noop to execute a regex that matches your needs. You can then use $regml(0) to find the number of matches for your regex. Using an if-statement you can check whether there are two or more matches.
Hope you find this helpful, if there's anything unclear, I will try to explain it better.
EDIT
@cp022
I can't comment, so I'll post my comment here, so how does that help in any way to read content from a text file?

TCL: Backslash issue (regsub)

I have an issue while trying to read a member of a list like \\server\directory
The issue comes when I try to get this variable using the lindex command, that proceeds with TCL substitution, so the result is:
\serverdirectory
So I think I need to use a regsub command to avoid the backslash substitution, but I have not found the correct procedure.
An example of what I want would be:
set mistring "\\server\directory"
regsub [appropriate regular expression here]
puts "mistring: '$mistring'" ==> "mistring: '\\server\directory'"
I have checked some posts about this, and keeping the \\ works, but I still have problems keeping a single \ followed by any other character that could appear there.
UPDATE: specific example. What I am actually trying to keep is the initial format of an element in a list. The list is received by an outer application. The original code is something like this:
set mytable $__outer_list_received
puts "Table: '$mytable'"
for { set i 0 } { $i < [llength $mytable] } { incr i } {
    set row [lindex $mytable $i]
    puts "Row: '$row'"
    set elements [lindex $row 0]
    puts "Elements: '$elements'"
}
The output of this, in this case is:
Table: '{{
address \\server\directory
filename foo.bar
}}'
Row: '{
address \\server\directory
filename foo.bar
}'
Elements: '
address \\server\directory
filename foo.bar
'
So I try to get the value of address (in this specific case, \\server\directory) in order to write it in a configuration file, keeping the original format and data.
I hope this clarifies the problem.

If you don't want substitutions, put the problematic string inside curly braces.
% puts "\\server\directory"
\serverdirectory
and it's not what you want. But
% puts {\\server\directory}
\\server\directory
as you need.

Since this is fundamentally a problem on Windows (and Tcl always treats backslashes in double-quotes as instructions to perform escaping substitutions) you should consider a different approach (otherwise you've got the problem that the backslashes are gone by the time you can apply code to “fix” them). Luckily, you've got two alternatives. The first is to put the string in {braces} to disable substitutions, just like a C# verbatim string literal (but that uses @"this" instead). The second is perhaps more suitable:
set mistring [file nativename "//server/directory"]
That ensures that the platform native directory separator is used on Windows (and nowadays does nothing on other platforms; back when old MacOS9 was supported it was much more magical). Normally, you only need this sort of thing if you are displaying full pathnames to users (usually a bad idea, GUI-wise) or if you are passing the name to some API that doesn't like forward slashes (notably when going as an argument to a program via exec but there are other places where the details leak through, such as if you're using the dde, tcom or twapi packages).

A third, although ugly, option is to double the backslashes while using double quotes: write \\\\ instead of \\, and \\ instead of \. When the substitution occurs, it should give you what you want. Of course, this will not help much if the substitution is performed a second time.

Perl splitting text string (from HTML page, text document, etc.) by line into array?

This is kind of a weird question, at least for me, as I don't exactly understand what is fully involved in this. Basically, I have been doing this process where I save a scraped document (such as a web page) to a .txt file. Then I can easily use Perl to read this file and put each line into an array. However, it is not doing this based on any visible thing in the document (i.e., it is not going by HTML linebreaks); it just knows where a new line is, based on the .txt format.
However, I would like to cut this process out and do the same thing from within a variable: I would have what would have been the contents of the .txt file in a string, and I want to parse it in the same way, line by line. The problem is that I don't really understand how Perl would be able to tell where a new line is. I'm not going by HTML linebreaks, because often what I'm scraping is just a web-based .txt file (which presents to my scraper, WWW::Mechanize, as a web page), so there is no HTML to go by. I figure I can do this using other parameters, such as blank spaces, but I am interested to know if there is a way to do this by line. Any info is appreciated.
I'd like to cut the actual saving of a file to reduce issues related to permissions on servers I use and also am just curious if I can make the process more efficient.

Here's an idea that might help you: you can open from strings as well as files.
So if you used to do this:
open( my $io, '<', 'blah.txt' ) or die "Could not open blah.txt! - $!";
my @list = <$io>;
You can just do this:
open( my $io, '<', \$text_I_captured ) or die "Could not open in-memory handle! - $!";
my @list = <$io>;

It's hard to tell what your code's doing since we don't have it in front of us; it would be easier to help if you posted what you had. However, I'll give it a shot. If you scrape the text into a variable, you will have a string which may have embedded line breaks. These will either be \n (the traditional Unix newline) or \r\n (the traditional Windows newline sequence). Just as you can split on a space to get (a first approximation of) the words in a sentence, you can instead split on the newline sequence to get the lines in the text. Thus, the single line you'll need should be
my @lines = split(/\r?\n/, $scraped_text);
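To see how this compares with reading from a string through a filehandle, here is a small sketch. The sample text is made up, and the variable names @by_split and @by_handle are only for illustration:

```perl
use strict;
use warnings;

# Made-up sample mixing Windows (\r\n) and Unix (\n) line endings
my $scraped_text = "first line\r\nsecond line\nthird line\n";

# split removes the line endings and drops the trailing empty field
my @by_split = split /\r?\n/, $scraped_text;

# an in-memory filehandle keeps the line endings, just as reading a file would
open(my $io, '<', \$scraped_text) or die "Cannot open in-memory handle: $!";
my @by_handle = <$io>;
close $io;

printf "split: %d lines, handle: %d lines\n",
    scalar @by_split, scalar @by_handle;
```

Both give one element per line. The difference is that the split version has the endings stripped, while the filehandle version still needs chomp (and would keep any \r) if the endings are unwanted.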

Use the $/ variable; it determines what to break lines on. So:
local $/ = " ";
while(<FILE>)...
would give you chunks separated by spaces. Just set it back to "\n" to get back to the way it was - or better yet, go out of the local $/ scope and let the global one come back, just in case it was something other than "\n" to begin with.
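A minimal sketch of that scoping advice, assuming a made-up string read through an in-memory filehandle. The changed separator is confined to a bare block, so the previous value comes back automatically:

```perl
use strict;
use warnings;

my $data = "alpha beta gamma";    # made-up input
open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";

my @chunks;
{
    local $/ = " ";               # records now end at a space, inside this block only
    @chunks = <$fh>;
}
close $fh;

# Outside the block $/ has its previous value again
print "chunk='$_'\n" for @chunks;
print "separator restored\n" if $/ eq "\n";
```

Each chunk keeps its trailing space, in the same way that lines normally keep their trailing newline.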
You can also remove the record separator altogether:
local $/ = undef;
to read whole files in one slurp, and then iterate through them however you like. Just be aware that if you do a split or a splice, you may end up copying the string over and over, using lots of CPU and lots of memory. One way to do it with less is:
# perl -de 0
> $_="foo\nbar\nbaz\n";
> while( /\G([^\n]*)\n/go ) { print "line='$1'\n"; }
line='foo'
line='bar'
line='baz'
If you're breaking apart things by newlines, for example. \G matches either the beginning of the string or the end of the last match, within a /g-tagged regex.
Another weird tidbit is $/=\10... if you give it a scalar reference to an integer (here 10), you can get record-length chunks:
# cat fff
eurgpuwergpiuewrngpieuwngipuenrgpiunergpiunerpigun
# perl -de 0
$/ = \10;
open FILE, "<fff";
while(<FILE>){ print "chunk='$_'\n"; }
chunk='eurgpuwerg'
chunk='piuewrngpi'
chunk='euwngipuen'
chunk='rgpiunergp'
chunk='iunerpigun'
chunk='
'
More info: http://www.perl.com/pub/a/2004/06/18/variables.html
If you combine this with FM's answer of using:
$data = "eurgpuwergpiuewrngpieuwngipuenrgpiunergpiunerpigun";
open STRING, "<", \$data;
while(<STRING>){ print "chunk='$_'\n"; }
I think you can get every combination of what you need...

What's a good Perl regex to untaint an absolute path?

Well, I tried and failed so, here I am again.
I need to match my abs path pattern.
/public_html/mystuff/10000001/001/10/01.cnt
I am in taint mode etc..
#!/usr/bin/perl -Tw
use CGI::Carp qw(fatalsToBrowser);
use strict;
use warnings;
$ENV{PATH} = "bin:/usr/bin";
delete @ENV{qw(IFS CDPATH BASH_ENV ENV)};
I need to open the same file a couple times or more and taint forces me to untaint the file name every time. Although I may be doing something else wrong, I still need help constructing this pattern for future reference.
my $file = "$var[5]";
if ($file =~ /(\w{1}[\w-\/]*)/) {
$under = "/$1\.cnt";
} else {
ErroR();
}
You can see by my beginner attempt that I am close to clueless.
I had to add the forward slash and extension to $1 due to my poorly constructed, but working, regex.
So, I need help learning how to fix my expression so $1 represents /public_html/mystuff/10000001/001/10/01.cnt
Could someone hold my hand here and show me how to make:
$file =~ /(\w{1}[\w-\/]*)/ match my absolute path /public_html/mystuff/10000001/001/10/01.cnt ?
Thanks for any assistance.

Edit: Using $ in the pattern (as I did before) is not advisable here because it can match \n at the end of the filename. Use \z instead because it unambiguously matches the end of the string.
Be as specific as possible in what you are matching:
my $fn = '/public_html/mystuff/10000001/001/10/01.cnt';
if ( $fn =~ m!
^(
/public_html
/mystuff
/[0-9]{8}
/[0-9]{3}
/[0-9]{2}
/[0-9]{2}\.cnt
)\z!x ) {
print $1, "\n";
}
Alternatively, you can reduce the vertical space taken by the code by putting what I assume to be a common prefix, '/public_html/mystuff', in a variable, combining the various components in a qr// construct (see perldoc perlop), and then using the conditional operator ?: as follows:
#!/usr/bin/perl
use strict;
use warnings;
my $fn = '/public_html/mystuff/10000001/001/10/01.cnt';
my $prefix = '/public_html/mystuff';
my $re = qr!^($prefix/[0-9]{8}/[0-9]{3}/[0-9]{2}/[0-9]{2}\.cnt)\z!;
$fn = $fn =~ $re ? $1 : undef;
die "Filename did not match the requirements" unless defined $fn;
print $fn, "\n";
Also, I cannot reconcile using a relative path as you do in
$ENV{PATH} = "bin:/usr/bin";
with using taint mode. Did you mean
$ENV{PATH} = "/bin:/usr/bin";

You talk about untainting the file path every time. That's probably because you aren't compartmentalizing your program steps.
In general, I break up these sort of programs into stages. One of the earlier stages is data validation. Before I let the program continue, I validate all the data that I can. If any of it doesn't fit what I expect, I don't let the program continue. I don't want to get half-way through something important (like inserting stuff into a database) only to discover something is wrong.
So, when you get the data, untaint all of it and store the values in a new data structure. Don't use the original data or the CGI functions after that. The CGI module is just there to hand data to your program. After that, the rest of the program should know as little about CGI as possible.
I don't know what you are doing, but it's almost always a design smell to take actual filenames as input.
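The validate-once approach described above could be sketched roughly as follows. The helper name untaint_cnt_path and the %validated hash are hypothetical, and the pattern is the strict one suggested in the other answer; adjust it to whatever your real paths look like.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured (and therefore untainted) path,
# or dies if the input does not have exactly the expected shape.
sub untaint_cnt_path {
    my ($raw) = @_;
    my ($clean) = $raw =~
        m!\A(/public_html/mystuff/[0-9]{8}/[0-9]{3}/[0-9]{2}/[0-9]{2}\.cnt)\z!
        or die "Filename did not match the requirements\n";
    return $clean;
}

# Validation stage: untaint everything once, up front, and keep only the results
my %validated = (
    counter_file =>
        untaint_cnt_path('/public_html/mystuff/10000001/001/10/01.cnt'),
);

# Later stages use %validated only and can open the file as often as needed
print "$validated{counter_file}\n";
```

Because the rest of the program only ever sees the values in %validated, the file name does not have to be untainted again each time it is opened.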
