How to apply negative regex on array in perl?

Having this:
foo.pl:
#!/usr/bin/perl -w
@heds = map { /_h.+/ and s/^(.+)_.+/$1/ and "$_.hpp" } @ARGV;
@fls = map { !/_h.+/ and "$_.cpp" } @ARGV;
print "heds: @heds\nfls: @fls";
I want to separate headers from source files, and when I give input:
$./foo.pl a b c_hpp d_hpp
heds: e.hpp f.hpp
fls: e.cpp f.cpp a.cpp b.cpp
The headers are separated correctly, but all of the files end up in the sources list. Why? I applied the negative regex !/_h.+/ in the mapping, so files matching *_h* should not be taken into account, but they are. How do I fix it?
Even this does not work:
@fls = map { if(!/_h.+/){ "$_.cpp" } } @ARGV;
It still takes every file, despite the condition.

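The map { } for @heds includes a substitution, which operates on $_. Inside map, $_ is an alias for each element of @ARGV, so the substitution changes @ARGV itself: by the time @fls is built, the _hpp suffixes are already gone and every name passes the negative match. Just reorder the mappings to avoid the effect on @fls and you get the desired result. Note, though, that if you need @ARGV after these mappings, it is no longer the original @ARGV, as happened in your example code.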
#!/usr/bin/perl -w
@fls = map { !/_h.+/ and "$_.cpp" } @ARGV;
@heds = map { /_h.+/ and s/^(.+)_.+/$1/ and "$_.hpp" } @ARGV;
print "heds: @heds\nfls: @fls\n";
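One more thing to watch: map returns a value for every element, so !/_h.+/ and "$_.cpp" still yields an empty string for each name that fails the test, and those show up as stray spaces in the output. Here is a minimal sketch of an alternative, assuming Perl 5.14 or later for the non-destructive s///r. The helper name split_files is made up for illustration. grep does the filtering, map does the renaming, and @ARGV is left untouched:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split names into headers and sources
# without modifying the caller's list.
sub split_files {
    my @names = @_;
    # grep selects the matching names; map transforms them.
    # s///r (Perl 5.14+) returns the changed copy and leaves $_ alone.
    my @heds = map { s/^(.+)_.+/$1/r . ".hpp" } grep { /_h.+/ } @names;
    my @fls  = map { "$_.cpp" } grep { !/_h.+/ } @names;
    return (\@heds, \@fls);
}

my ($heds, $fls) = split_files(@ARGV);
print "heds: @$heds\nfls: @$fls\n";
```

With the arguments a b c_hpp d_hpp this prints heds: c.hpp d.hpp and fls: a.cpp b.cpp.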

Related

Extracting a function body by name with BASH and regex

I have some automatically generated code from MATLAB Coder. I would like to make a script to find my entries in a large file. I've successfully plowed my way through regex with BASH to get the main function with main\( *([^)]+?)\), and then the body with /\{([^}]+)\}/; however, I'm having a terrible time gluing those together. All I need is the function names contained in main().
I realize that this could be a terrible exercise, but the automatically generated code gives me simple functions that look like:
int main(int argc, const char * const argv[])
{
(void)argc;
(void)argv;
/* Initialize the application. You do not need to do this more than one time. */
RT_initialize();
/* Invoke the entry-point functions. You can call entry-point functions multiple times. */
main_RT();
/* Terminate the application. You do not need to do this more than one time. */
RT_terminate();
return 0;
}
I would like to extract that function and its body, but my regex is poorer than I recalled.
Any guidance would be greatly appreciated.
A simple way to fairly reliably extract the entire function body is to run the code through a formatter first:
indent -kr < mymain.c | sed -n '/^int main(/,/^}/p'
cflow can give you a function call graph. eg:
cflow -d2 mymain.c
Because of some restrictions that come with being on BSD, I ended up with the following BASH function, which gets the body of a C function by name from a C source file. It was only tested with the well-formatted C code from MATLAB's Coder.
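function getFunctionInC(){
    TMPFILEIDENT="/tmp/indent.$$.tmp" # temp file for the formatted source
    indent "$1" "$TMPFILEIDENT"
    awk -v name="$2" '
    BEGIN { state = 0; last = ""; }
    # function name at line start: print the previous line (return type), start printing
    $0 ~ "^" name "\\(" { print last; state = 1; }
    { if (state == 1) print; }
    # a closing brace in column 0 ends the function
    $0 ~ /^}/ { if (state) state = 2; }
    { last = $0; }
    ' "$TMPFILEIDENT"
    rm -f "$TMPFILEIDENT"
}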
The formatting is terrible on the outputs, but I can easily pull the function names to dynamically create defines. Thanks to everyone who read the question.

multiple addition of header file even after checking

I want to add #include <stdint.h> to a .C file in case it's not present, using Perl.
MY CODE SNIPPET
my $flag = 0;
my $pos = 0;
open(FILE, $input) or die $!;
my @lines = <FILE>;
foreach(@lines)
{
    $pos++;
    #checks for #include where it can add stdint.h
    if ($_ =~ (m/#include/))
    {
        #prevents multiple addition for each header file
        if($flag == 0)
        {
            #checks whether stdint already present or not
            unless($_ =~ m/#include <stdint.h>/ )
            {
                splice @lines,$pos,0,"#include <stdint.h>"."\n";
                $flag = 1;
            }
        }
    }
}
But my code adds stdint.h every time it runs, so the include is added again on every run.
What's wrong with the code?
unless($_ =~ m/#include <stdint.h>/){
It doesn't work even if I use
unless($_ =~ m/<stdint.h>/){
Imagine you have this C file:
#include <stdio.h>
#include <stdint.h>
int main(int argc, char ** argv) {
return 0;
}
What is supposed to happen when this goes through your script?
Nothing, because <stdint.h> is already included.
What actually happens though? This is where learning to use the Perl debugger or simply tracing by hand is really useful.
flag and pos are initialized to 0. The first line in the file is #include <stdio.h>, which is not #include <stdint.h>, so your code immediately assumes the header is missing and adds it.
So your code inserts #include <stdint.h> after the first include line that is not <stdint.h> itself, regardless of whether <stdint.h> appears earlier or later in the file. Any other include line triggers the insertion.
What you should actually do is gather all of the include lines in an array, then search them for <stdint.h>, adding it only if it isn't in the complete list.
Here is a way to do it:
open(my $FILE, '<', $input) or die $!;
my @lines = <$FILE>;
my $pos = 0;
my $insert_pos = 1; # add stdint.h even if there are no other includes
foreach (@lines) {
    $pos++;
    if (/#include/) {
        $insert_pos = $pos; # remember the position just after this include
        if (/#include <stdint.h>/) {
            $insert_pos = 0; # already present: nothing to insert
            last;
        }
    }
}
if ($insert_pos) {
    splice @lines, $insert_pos, 0, "#include <stdint.h>"."\n";
}
This is an awful thing to be doing to a C project.
What you have coded adds #include <stdint.h> right after the first #include line, and has no effect on files that don't #include anything.
However, if you want to "edit" a file using Perl, then you should use Tie::File.
The code in your question would look like this:
use strict;
use warnings;
use Tie::File;
my ($input) = @ARGV;
tie my @c_file, 'Tie::File', $input or die qq{Unable to open C file "$input": $!};
for my $i (0 .. $#c_file) {
    next unless $c_file[$i] =~ /#include/;
    splice @c_file, $i + 1, 0, '#include <stdint.h>'; # insert right after the first #include
    last;
}
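As a sketch of the "check the whole file first" idea from the earlier answer, the helper below works on a list of lines, so it could be fed from a plain array of lines read from the file. The name ensure_stdint is hypothetical. It assumes each include sits on its own line, and it inserts after the last #include, or at the top when there is none:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines with #include <stdint.h> added
# after the last #include, unless some line already includes it.
sub ensure_stdint {
    my @lines = @_;

    # Look at every line first; do nothing if the header is already there.
    return @lines if grep { /^\s*#\s*include\s*<stdint\.h>/ } @lines;

    # Index of the last #include line, if any.
    my ($last) = grep { $lines[$_] =~ /^\s*#\s*include\b/ } reverse 0 .. $#lines;
    my $at = defined $last ? $last + 1 : 0;

    splice @lines, $at, 0, "#include <stdint.h>\n";
    return @lines;
}
```

Because the check covers the whole list before anything is inserted, running it a second time on its own output changes nothing.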

perl script to read content between marks

In Perl, how do I read the contents between two marks? The source data looks like this:
START_HEAD
ddd
END_HEAD
START_DATA
eee|234|ebf
qqq| |ff
END_DATA
--Generate at 2011:23:34
I only want to get the data between "START_DATA" and "END_DATA". How can I do this?
sub readFile(){
open(FILE, "<datasource.txt") or die "file is not found";
while(<FILE>){
if(/START_DATA/){
record(\*FILE);#start record;
}
}
}
sub record($){
my $fileHandle = $_[0];
while(<fileHandle>){
print $_."\n";
if(/END_DATA/) return ;
}
}
I wrote this code, but it doesn't work. Do you know why?
Thanks
You can use the range operator:
perl -ne 'print if /START_DATA/ .. /END_DATA/'
The output will include the *_DATA lines, too, but it should not be so hard to get rid of them.
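One way to drop the marker lines, sketched here as a small sub rather than a one-liner: in scalar context the range operator returns a sequence number, which is 1 on the line that starts the range and ends in "E0" on the line that closes it. The helper name between_marks is made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines strictly between the two markers.
sub between_marks {
    my @out;
    for (@_) {
        # Flip-flop: false outside the range, 1 on the START_DATA line,
        # 2, 3, ... inside, and a value ending in "E0" on the END_DATA line.
        my $seq = /START_DATA/ .. /END_DATA/;
        push @out, $_ if $seq && $seq > 1 && $seq !~ /E0$/;
    }
    return @out;
}
```

Called as print between_marks(<$fh>) on the sample data, this should print only the eee|234|ebf and qqq| |ff lines.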
Besides a few typos, your code is not too far off. Had you used
use strict;
use warnings;
you might have figured it out yourself. Here's what I found:
Don't use prototypes unless you need them and know what they do. A sub declaration with a prototype is sub my_function (prototype) {, but you can leave out the prototype and just use sub my_function {.
while (<fileHandle>) { is missing the $ sigil that marks it as a scalar variable rather than a bareword filehandle. It should be <$fileHandle>.
print $_."\n"; will add an extra newline, because each line read still ends with its own. Just print; will do what you expect.
if(/END_DATA/) return; is a syntax error. Braces are not optional in Perl in this case, unless you reverse the statement.
Use either:
return if (/END_DATA/);
or
if (/END_DATA/) { return }
Below is the cleaned up version. I commented out your open() while testing, so this would be a functional code example.
use strict;
use warnings;
readFile();
sub readFile {
#open(FILE, "<datasource.txt") or die "file is not found";
while(<DATA>) {
if(/START_DATA/) {
recordx(\*DATA); #start record;
}
}
}
sub recordx {
my $fileHandle = $_[0];
while(<$fileHandle>) {
print;
if (/END_DATA/) { return }
}
}
__DATA__
START_HEAD
ddd
END_HEAD
START_DATA
eee|234|ebf
qqq| |ff
END_DATA
--Generate at 2011:23:34
This is a pretty simple thing to do with regular expressions if you read the whole file into one string. Use the /s or /m flags (single-line or multi-line): /s allows . to match newlines, so you can do /start_data(.+)end_data/is.
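A minimal sketch of that approach, assuming the whole file fits in memory and the markers sit on lines of their own. The helper name data_section is hypothetical, and the non-greedy .*? is used so the match stops at the first END_DATA:

```perl
use strict;
use warnings;

# Hypothetical helper: takes the whole file as one string and returns
# the text between the marker lines, or undef if they are not found.
sub data_section {
    my ($text) = @_;
    # /m lets ^ and $ match at line boundaries; /s lets . match newlines.
    return $text =~ /^START_DATA\n(.*?)^END_DATA$/ms ? $1 : undef;
}

# Usage: pass the data file as the first argument.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };   # slurp the whole file
    print data_section($text) // '';
}
```

The trade-off against the line-by-line versions is memory: the regex needs the entire input in one scalar.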

Why isn't my Perl script finding bad indentation with my regex match?

My work's coding standard uses this bracket indentation:
some declaration
	{
	stuff = other stuff;
	};
control structure, function, etc()
	{
	more stuff;
	for(some amount of time)
		{
		do something;
		}
	more and more stuff;
	}
I'm writing a perl script to detect incorrect indentation. Here's what I have in the body of a while(<some-file-handle>):
# $prev holds the previous line in the file
# $current holds the current in the file
if($prev =~ /^(\t*)[^;]+$/ and $current =~ /^(?<=!$1\t)[\{\}].+$/) {
print "$file # line ${.}: Bracket indentation incorrect\n";
}
Here, I'm trying to match:
$prev: A line not ended with a semi-colon, followed by...
$current: A line not having the number of leading tabs+1 of the previous line.
This doesn't seem to match anything, at the moment.
The $prev regex needs some modification.
It should be something like \t*, then .+, then not ending in a semicolon.
Also, the $current regex should match:
anything ending in ; or { or } that does not have the number of leading tabs+1 of the previous line.
EDIT
Perl code to try the $prev regex:
#!/usr/bin/perl -l
open(FP,"example.cpp");
while(<FP>)
{
if($_ =~ /^(\t*)[^;]+$/) {
print "got the line: $_";
}
}
close(FP);
//example.cpp
for(int i = 0;i<10;i++)
{
//not this;
//but this
}
//output
got the line: {
got the line: //but this
got the line: }
It did not detect the line with the for loop...
Am I missing something?
I see a couple of problems:
Your $prev regex matches all lines which do not have a ; anywhere, which will break on lines like for (int x = 1; x < 10; x++).
If the indent of the opening { is incorrect, you will not detect that.
Try this instead. It matches any line that does not end in ; or { (followed by any whitespace):
/^(\s*).*[^{;]\s*$/
Now you should change your strategy so that if you see a line which does not end in { or ;, you increment the indent counter.
If you see a line which ends in }; or }, decrement your indent counter.
Compare all lines against this:
/^\t{$counter}[^\s]/
So...
$counter = 0;
if (!($curr =~ /^\t{$counter}[^\s]/)) {
    # error detected
}
if ($curr =~ /\};?\s*$/) {
    $counter--;
} elsif ($curr =~ /^(\s*).*[^{;]\s*$/) {
    $counter++;
}
sorry for not styling my code according to your standards... :)
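For what it's worth, here is the counter idea as a self-contained sketch. It assumes tab-only indentation and braces that sit at the same depth as the block they enclose, as in the question. The name bad_indent_lines is made up, and it knows nothing about strings, comments or multi-line statements:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based numbers of lines whose leading
# tabs do not match the expected depth.
sub bad_indent_lines {
    my @lines = @_;
    my ($depth, @bad) = (0);
    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        next if $line =~ /^\s*$/;                        # ignore blank lines
        push @bad, $i + 1 if $line !~ /^\t{$depth}\S/;   # exactly $depth tabs expected
        if    ($line =~ /\};?\s*$/) { $depth-- if $depth > 0 }   # block closed
        elsif ($line !~ /[{;]\s*$/) { $depth++ }                 # block header: body goes one deeper
    }
    return @bad;
}
```

Feeding it the lines of a file, for example bad_indent_lines(<$fh>), gives back the line numbers to report. An empty list means nothing was flagged.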
And you intend to only count tabs (not spaces) for indentation?
Writing this kind of checker is complicated. Just think about all the possible constructs that use braces but should not change indentation:
s{some}{thing}g
qw{ a b c }
grep { defined } @a
print "This is just a { provided to confuse";
print <<END;
This {
$is = not $code
}
END
But anyway, if the issues above aren't important to you, consider whether the semicolon is important at all in your regex. After all, writing
while($ok)
{
    sort { some_op($_) }
    grep { check($_) }
    my_func(
        map { $_->[0] } @list
    );
}
should be possible.
Have you considered looking at Perltidy?
Perltidy is a Perl script that reformats Perl code into set standards. Granted, what you have isn't part of the Perl standard, but you can probably tweak the curly braces via the configuration file Perltidy uses. If all else fails, you can hack through the code. After all, Perltidy is just a Perl script.
I haven't really used it, but it might be worth looking into. Your problem is trying to locate all the various edge cases, and making sure you're handling them correctly. You can parse 100 programs only to find that the 101st reveals problems in your formatter. Perltidy has been used by thousands of people on millions of lines of code. If there is an issue, it has probably already been found.

awk: Either modify or append a line, based on its existence

I have a small awk script that does some in-place file modifications (to a Java .properties file, to give you an idea). This is part of a deployment script affecting a bunch of users.
I want to be able to set defaults, leaving the rest of the file at the user's preferences. This means appending a configuration line if it is missing, modifying it if it is there, leaving everything else as it is.
Currently I use something like this:
# initialize
BEGIN {
some_value_set = 0
other_value_set = 0
some_value_default = "some.value=SOME VALUE"
other_value_default = "other.value=OTHER VALUE"
}
# modify existing lines
{
if (/^some\.value=.*/)
{
gsub(/.*/, some_value_default)
some_value_set = 1
}
else if (/^other\.value=.*/)
{
gsub(/.*/, other_value_default)
other_value_set = 1
}
print $0
}
# append missing lines
END {
if (some_value_set == 0) print some_value_default
if (other_value_set == 0) print other_value_default
}
Especially when the number of lines I want to control gets larger, this is increasingly cumbersome. My awk knowledge is not all that great, and the above just feels wrong - how can I streamline this?
P.S.: If possible, I'd like to stay with awk. Please don't just recommend that using Perl/Python/whatever would be much easier. :-)
BEGIN {
defaults["some.value"] = "SOME VALUE"
defaults["other.value"] = "OTHER VALUE"
}
{
for (key in defaults) {
pattern = key
gsub(/\./, "\\.", pattern)
if (match($0, "^" pattern "=.*")) {
gsub(/=.*/, "=" defaults[key])
delete defaults[key]
}
}
print $0
}
END {
for (key in defaults) {
print key "=" defaults[key]
}
}
My AWK is rusty, so I won't provide actual code.
Initialize an array with the regular expressions and values.
For each line, iterate the array and do appropriate substitutions. Clean out used entries.
At end, iterate the array and append lines for remaining entries.