awk error "makes too many open files" - regex

I have an awk-based splitter that splits a huge file based on a regex. The problem is that I get a "too many open files" error even though I have a conditional close. I would be very grateful if you could help me figure out what I am doing wrong.
awk 'BEGIN { system("mkdir -p splitted/sub"++j) }
/<doc/{x="F"++i".xml";}{
if (i%5==0 ){
++i;
close("splitted/sub"j"/"x);
system("mkdir -p splitted/sub"++j"/");
}
else{
print > ("splitted/sub"j"/"x);
}
}' wiki_parsed.xml

The simple answer is that close isn't being called often enough. Here's an illustrative example of why:
Using an input file like:
<doc somestuff
another line
yet another line
<doc the second
still more data
<doc the third
<doc the fourth
<doc the fifth
I can make an executable awk file based on your script like:
#!/usr/bin/awk -f
BEGIN { system_(++j) }
/<doc/{x=++i}
{
if (i%5==0 ){ ++i; close_(j"/"x); system_(++j) }
else{ open_(j"/"x) }
}
function call_f(funcname, arg) { print funcname"("arg")" }
function system_(cnt) { call_f( "system", cnt ) }
function open_(f) { if( !(f in a) ) { call_f( "open", f ); a[f]++ } }
function close_(f) { call_f( "close", f ) }
which if I put into a file called awko can be run like awko data to produce the following:
system(1)
open(1/1)
open(1/2)
open(1/3)
open(1/4)
close(1/5)
system(2)
The script I made just shows how many times you're calling each function, by shadowing each real call with a local function that has a trailing _. Notice how many times open() is printed compared to close() for the same arguments: four files are opened, but the only close() is for 1/5, a file that was never opened. I also renamed print > to open_ to illustrate that it is what opens the files (once per file name).
If I change the executable awk file to the following, you can see close being called enough:
#!/usr/bin/awk -f
BEGIN { system_(++j) }
/<doc/{ close_(j"/"x); x=++i } # close_() call is moved to here.
{
if (i%5==0 ){ ++i; system_(++j) }
else{ open_(j"/"x) }
}
function call_f(funcname, arg) { print funcname"("arg")" }
function system_(cnt) { call_f( "system", cnt ) }
function open_(f) { if( !(f in a) ) { call_f( "open", f ); a[f]++ } }
function close_(f) { call_f( "close", f ) }
which gives the following output:
system(1)
close(1/)
open(1/1)
close(1/1)
open(1/2)
close(1/2)
open(1/3)
close(1/3)
open(1/4)
close(1/4)
system(2)
Here close() is called once more than strictly necessary: the first call, close(1/), names a file that was never opened. With a real close() call that is harmless, since nothing is open under that name and no actual close is attempted. In every other case, each open() is matched by a close() call.
Moving your close() call in your script as in the second example script should fix your error.

This is the version I got working perfectly:
awk 'BEGIN { system("mkdir -p splitted/sub"++j) }
/<doc/{x="F"++i".xml";}{
if (i%1995==0 ){
++i;
system("mkdir -p splitted/sub"++j"/");
}
else{
print >> ("splitted/sub"j"/"x);
close("splitted/sub"j"/"x);
}
}' wiki_parsed.xml


Why doesn't QTextBrowser::clear() work immediately when I call it?

I want to achieve the following: when I click the "start" button, the textBrowser should be cleared immediately and then show something to the user. But when I use QTextBrowser::clear(), it doesn't take effect immediately.
//constructor
connect(StartBtn,SIGNAL(clicked()),this,SLOT(interfaceStart()));
bool MainWindow::interfaceStart()
{
ui->textBrowser->clear(); // this call does not seem to take effect until the while(fgets...) loop below has finished
if(interfaceLine->text()!=QString("") && interfaceSpinBox->text()!="0"){
ui->textBrowser->insertPlainText(QString("Starting capture interface %1 \n\n").arg(interfaceLine->text()));//this line also doesn't work
std::string cc = std::string("tshark -i ")
+std::string((interfaceLine->text()).toLocal8Bit().data())
+std::string(" -a duration:")
+std::string(interfaceSpinBox->text().toLocal8Bit().data());
char buf[1024];
char const *command=cc.c_str();
std::cout<<command;
std::cout.flush(); // by the time this line runs, the console shows the command, but textBrowser has not been cleared yet
FILE *ptr;
if((ptr=popen(command, "r"))!=NULL)
{
while(fgets(buf, 1024, ptr)!=NULL)
{
ui->textBrowser->insertPlainText(buf);
}
pclose(ptr);
ptr = NULL;
}
return true;
}
QMessageBox::warning(this,QString("Error"),QString("Interface %1 doesn't exist or time is 0!").arg(interfaceLine->text()));
return false;
}
I see the console print the command first, but textBrowser isn't cleared. Only once popen() has finished is textBrowser cleared. Why?
How can I make clear() take effect right away? Can I flush the textBrowser?

ifstream - monitor updates to file

I am using ifstream to open a file and read line by line and print to console.
Now, I also want to make sure that if the file gets updated, it reflects. My code should handle that.
I tried seeking to the end of the file and then checking for new entries with peek. However, that did not work.
Here's the code I used:
bool ifRead = true;
while (1)
{
if (ifRead)
{
if (!file2read.eof())
{
//valid file. not end of file.
while (getline(file2read, line))
printf("Line: %s \n", line.c_str());
}
else
{
file2read.seekg(0, file2read.end);
ifRead = false;
}
}
else
{
//I thought this would check if new content is added.
//in which case, "peek" will return a non-EOF value. else it will always be EOF.
if (file2read.peek() != EOF)
ifRead = true;
}
}
}
Any suggestions on what could be wrong, or how I could do this?
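One commonly used approach, sketched here under assumptions: once getline fails at end of file, the stream's eofbit and failbit stay set, so every later read (including peek) fails until clear() is called. The helper below, read_new_lines (a name made up for this sketch), drains whatever is currently available and then clears the flags so the next call can pick up appended data. It assumes the file is only ever appended to, and that the stream implementation re-reads the underlying file after clear(), which is common but not something the standard spells out.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns every line that can be read right now.
// After the loop the stream has eofbit/failbit set, so the flags are
// cleared to let a later call see data appended in the meantime.
std::vector<std::string> read_new_lines(std::ifstream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    in.clear();  // without this, all further reads (and peek) keep failing
    return lines;
}
```

A caller would poll this in a loop and sleep between calls (for example with std::this_thread::sleep_for) rather than spin. Note that a line which is still being written may come back in pieces, because getline also returns a final fragment that has no newline yet.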

Parsing digits from command line argv

I want to change a Perl script that executes a loop a number of times, so that the number of iterations is passed as a command-line option. The program already accepts some options and I need it to accept a new parameter, but this is the first Perl script I have ever seen, so I don't know how to change it.
The start of the program (which parses the command-line options) is:
if ($#ARGV >= 1) {
for ($i = 1; $i <= $#ARGV; $i++) {
if ($ARGV[$i] =~ /^\-/) {
if ($ARGV[$i] =~ /\-test/) {
# do something
}
} else {
# do something other
}
}
}
I think that I must put something like:
if ($ARGV[$i] =~ /^\-L40/)
But that only matches 40. I don't know how to parse the number attached to the -L parameter so that I can use it as the loop limit.
Thanks in advance, and sorry if a similar question already exists; I couldn't find one.
use Getopt::Long qw( );
sub usage {
print(STDERR "usage: prog [--test] [-L NUM]\n");
exit(1);
}
GetOptions(
'test' => \my $opt_test,
'L=i' => \my $opt_L,
)
or usage();
die("-L must be followed by a positive integer\n")
if defined($opt_L) && $opt_L < 1;
Something like:
my $loopLimit = 1; # default
if ($#ARGV >= 1)
{
for ($i = 1; $i <= $#ARGV; $i++)
{
if ($ARGV[$i] =~ /^\-/)
{
if ($ARGV[$i] =~ /\-test/)
{
# do something
}
elsif ($ARGV[$i] =~ /\-L(\d+)/) # -L followed by digits
{
$loopLimit = $1;
}
}
else
{
# do something other
}
}
}

TCL: loops How to get out of inner most loop to outside?

In the code below, once check_access is set to 0, how do I preserve that value and go straight to the if condition below ($check_root && $check_access)? break only terminates the inner loop; as I understand it, the outer loops will continue.
} else {
set check_access 0
break
}
}
}
if {$check_root && $check_access} {
set result 1
} else {
set result 0
}
The break and continue operations only go out one level of looping. If you need more than that, consider refactoring so that you can just return. Alternatively, try a custom exception in Tcl 8.6:
try {
foreach a $longList1 {
foreach b $longList2 {
if {[someCondition $a $b]} {
# Custom non-error exception
return -level 0 -code 123
}
}
}
} on 123 {} {
# Do nothing; we're out of the loop
}
break jumps to the end of the innermost loop only, and Tcl has no goto. But return, unless it's inside a catch or similar, exits a procedure which is like jumping to the end of it. So if you make the outermost loop the last command of the procedure (if your code is top-level, you have to put it in a procedure first to be able to use return), you can use return as a multi-break. Just move the commands after the loop out of the procedure and into the caller's code:
proc callMe {} {
foreach ... {
foreach ... {
if ... {
return
}
}
}
# move this code from here...
}
callMe
# ...to here
Another way is to put in extra tests:
set done 0
foreach ... {
foreach ... {
foreach ... {
if ... {
set done 1
break
}
}
if {$done} {break}
}
if {$done} {break}
}

Problem with using function remove(..) from stdio.h C++

Hello, my question is why the following function fails to delete the file whose name is given in dir1.
I use the remove function, but there seems to be some kind of problem with it.
Please help me.
#include <stdio.h>
void test(char* dir1,char* dir2)
{
FILE * file1,* file2;
file1=fopen(dir1,"r");
file2=fopen(dir2,"w");
if(!file1){ return;}
int inpch;
char* string = new char[10];
string[9]='\0';
int br=0;
do
{
while((inpch=fgetc(file1))!=EOF)
{
string[br]=char(inpch);
br++;
if(br==9)break;
}
if(br!=9)
{
string[br]='\0';
fputs(string,file2);
return;
}
else
{
fputs(string,file2);
br=0;
}
}while(true);
fclose(file1);
remove(dir1);/// I DON'T UNDERSTAND WHY IT DOESN'T DELETE THE FILE.
fclose(file2);
}
The do-while(true) loop never ends on its own, and the break inside it only leaves the inner while loop. The only way out is the return inside the following if, which is taken once the end of the input file is reached, so the function returns before it reaches the end and never calls remove:
if(br!=9)
{
string[br]='\0';
fputs(string,file2);
return; //<------------ here you're returning!
}
Did you mean to write break rather than return? That looks like where the problem lies.
Why don't you check the return value and the error code (errno)? They tell you exactly why the function didn't succeed.
Replace your remove call with this:
if( remove( "myfile.txt" ) != 0 )
perror( "Error deleting file" );
else
puts( "File successfully deleted" );
and it should tell you what happened.