Grepping a 20g file in bash - regex
Question about code performance: I'm trying to run ~25 regex rules against a ~20g text file. The script should output matches to text files; each regex rule generates its own file. See the pseudocode below:
regex_rules=~/Documents/rulesfiles/regexrulefile.txt
for tmp in *.unique20gbfile.suffix; do
while read line
# Each $line in the looped-through file contains a regex rule, e.g.,
# egrep -i '(^| )justin ?bieber|(^| )selena ?gomez'
# $rname is a unique rule name generated by a separate bash function
# exported to the current shell.
do
cmd="$line $tmp > ~/outputdir/$tmp.$rname.filter.piped &"
eval $cmd
done < $regex_rules
done
Couple thoughts:
Is there a way to loop the text file just once, evaluating all rules and splitting to individual files in one go? Would this be faster?
Is there a different tool I should be using for this job?
Thanks.
This is the reason grep has a -f option. Reduce your regexrulefile.txt to just the regexps, one per line, and run
egrep -f regexrulefile.txt the_big_file
This produces all the matches in a single output stream, but you can do your loop thing on it afterward to separate them out. Assuming the combined list of matches isn't huge, this will be a performance win.
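As a rough sketch of that two-step idea (not the answerer's exact commands): the big file is scanned once with grep -f, and the per-rule split then runs only over the much smaller combined output. The directory grep_f_demo, the file names, and the sample lines are made up for illustration, and the rule file is assumed to hold one extended regex per line with the "egrep -i" prefix removed.

```shell
#!/usr/bin/env bash
# Sketch: one combined pass with grep -f, then cheap per-rule passes over the
# combined output. All paths and sample data below are hypothetical.
demo=./grep_f_demo
rm -rf "$demo" && mkdir -p "$demo"

# Stand-in for the 20 GB input file.
printf '%s\n' 'justin bieber tour dates' 'nothing to see here' \
    'Selena Gomez interview' 'weather report' > "$demo/big.txt"

# One extended regex per line, without the "egrep -i" prefix.
printf '%s\n' '(^| )justin ?bieber' '(^| )selena ?gomez' > "$demo/rules.txt"

# Pass 1: the only scan of the big file.
grep -E -i -f "$demo/rules.txt" "$demo/big.txt" > "$demo/combined.txt"

# Pass 2: split the combined matches into one file per rule.
n=0
while IFS= read -r rule; do
    n=$((n + 1))
    grep -E -i -e "$rule" "$demo/combined.txt" > "$demo/rule$n.txt"
done < "$demo/rules.txt"

wc -l "$demo"/rule*.txt
```

The second pass still runs one grep per rule, but each of those reads only the lines that matched at least one rule, which is where the saving comes from when matches are rare.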
I did something similar with lex. Of course, it runs every other day, so YMMV. It is very fast, even on several hundred megabyte files on a remote windows share. It takes only a few seconds to process. I don't know how comfortable you are hacking up a quick C program, but I've found this to be the fastest, easiest solution for large scale regex problems.
Parts redacted to protect the guilty:
/**************************************************
start of definitions section
***************************************************/
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <getopt.h>
#include <errno.h>
char inputName[256];
// static insert variables
//other variables
char tempString[256];
char myHolder[256];
char fileName[256];
char unknownFileName[256];
char stuffFileName[256];
char buffer[5];
/* we are using pointers to hold the file locations, and allow us to dynamically open and close new files */
/* also, it allows us to obfuscate which file we are writing to, otherwise this couldn't be done */
FILE *yyTemp;
FILE *yyUnknown;
FILE *yyStuff;
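// flags for command line options
static int help_flag = 0;
// counts lines routed to the "unknown" file (incremented in the \n rule below)
static int unknownCount = 0;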
%}
%option 8bit
%option nounput nomain noyywrap
%option warn
%%
/************************************************
start of rules section
*************************************************/
(\"A\",\"(1330|1005|1410|1170)\") {
strcat(myHolder, yytext);
yyTemp = &(*yyStuff);
} //stuff files
. { strcat(myHolder, yytext); }
\n {
if (&(*yyTemp) == &(*yyUnknown))
unknownCount += 1;
strcat(myHolder, yytext);
//print to file we are pointing at, whatever it is
fprintf(yyTemp, "%s", myHolder);
strcpy(myHolder, "");
yyTemp = &(*yyUnknown);
}
<<EOF>> {
strcat(myHolder, yytext);
fprintf(yyTemp, "%s", myHolder);
strcpy(myHolder, "");
yyTemp = &(*yyUnknown);
yyterminate();
}
%%
/****************************************************
start of code section
*****************************************************/
int main(int argc, char **argv)
{
/****************************************************
The main method drives the program. It gets the filename from the
command line, and opens the initial files to write to. Then it calls the lexer.
After the lexer returns, the main method finishes out the report file,
closes all of the open files, and prints out to the command line to let the
user know it is finished.
****************************************************/
int c;
// the gnu getopt library is used to parse the command line for flags
// afterwards, the final option is assumed to be the input file
while (1) {
static struct option long_options[] = {
/* These options set a flag. */
{"help", no_argument, &help_flag, 1},
/* These options don't set a flag. We distinguish them by their indices. */
{0, 0, 0, 0}
};
/* getopt_long stores the option index here. */
int option_index = 0;
c = getopt_long (argc, argv, "h",
long_options, &option_index);
/* Detect the end of the options. */
if (c == -1)
break;
switch (c) {
case 0:
/* If this option set a flag, do nothing else now. */
if (long_options[option_index].flag != 0)
break;
printf ("option %s", long_options[option_index].name);
if (optarg)
printf (" with arg %s", optarg);
printf ("\n");
break;
case 'h':
help_flag = 1;
break;
case '?':
/* getopt_long already printed an error message. */
break;
default:
abort ();
}
}
if (help_flag == 1) {
printf("proper syntax is: yourProgram.exe [OPTIONS]... INFILE\n");
printf("splits csv file into multiple files\n");
printf("Option list: \n");
printf("--help print help to screen\n");
printf("\n");
return 0;
}
//get the filename off the command line and redirect it to input
//if there is no filename then use stdin
if (optind < argc) {
FILE *file;
file = fopen(argv[optind], "r");
if (!file) {
fprintf (stderr, "%s: Couldn't open file %s; %s\n", argv[0], argv[optind], strerror (errno));
exit(errno);
}
yyin = file;
strcpy(inputName, argv[optind]);
}
else {
printf("no input file set, using stdin. Press ctrl-c to quit");
yyin = stdin;
strcpy(inputName, "\b\b\b\b\bagainst stdin");
}
//set up initial file names
strcpy(fileName, inputName);
strncpy(unknownFileName, fileName, strlen(fileName)-4);
strncpy(stuffFileName, fileName, strlen(fileName)-4);
strcat(unknownFileName, "_UNKNOWN_1.csv");
strcat(stuffFileName, "_STUFF_1.csv");
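//open files for writing
yyout = stdout;
yyUnknown = fopen(unknownFileName,"w");
yyStuff = fopen(stuffFileName,"w");
if (!yyUnknown || !yyStuff) {
fprintf(stderr, "%s: Couldn't open output files; %s\n", argv[0], strerror(errno));
exit(1);
}
//lines go to the unknown file until a rule redirects them
yyTemp = yyUnknown;
yylex();
//close open files
fclose(yyUnknown);
fclose(yyStuff);
printf("Lexer finished running %s\n",fileName);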
return 0;
}
To build this flex program, have flex installed, and use this makefile (adjust the paths):
TARGET = project.exe
TESTBUILD = project
LEX = flex
LFLAGS = -Cf
CC = i586-mingw32msvc-gcc
CFLAGS = -O -Wall
INSTALLDIR = /mnt/J/Systems/executables
.PHONY: default all clean install uninstall cleanall
default: $(TARGET)
all: default install
OBJECTS = $(patsubst %.l, %.c, $(wildcard *.l))
%.c: %.l
	$(LEX) $(LFLAGS) -o $@ $<
.PRECIOUS: $(TARGET) $(OBJECTS)
$(TARGET): $(OBJECTS)
	$(CC) $(OBJECTS) $(CFLAGS) -o $@
linux: $(OBJECTS)
	gcc $(OBJECTS) $(CFLAGS) -lm -g -o $(TESTBUILD)
cleanall: clean uninstall
clean:
	-rm -f *.c
	-rm -f $(TARGET)
	-rm -f $(TESTBUILD)
uninstall:
	-rm -f $(INSTALLDIR)/$(TARGET)
install:
	cp -f $(TARGET) $(INSTALLDIR)
A quick (!= too fast) Perl solution:
#!/usr/bin/perl
use strict; use warnings;
We preload the regexes so that we read their file only once. They are stored in the array @regex. The regex file is the first file given as an argument.
open REGEXES, '<', shift(@ARGV) or die;
# chomp each rule first; otherwise the trailing newline becomes part of the pattern
my @regex = map { chomp; qr/$_/ } <REGEXES>;
# use the following if the file still includes the egrep:
# my @regex = map {
# s/^egrep \s+ -i \s+ '? (.*?) '? \s* $/$1/x;
# qr{$_}
# } <REGEXES>;
close REGEXES or die;
We go through each remaining file that was given as argument:
while (@ARGV) {
my $filename = shift @ARGV;
We pre-open files for efficiency:
my @outfile = map {
open my $fh, '>', "outdir/$filename.$_.filter.piped"
or die "Couldn't open outfile for $filename, rule #$_";
$fh;
} (1 .. scalar(@regex));
open BIGFILE, '<', $filename or die;
We print all lines that match a rule to the specified file.
while (not eof BIGFILE) {
my $line = <BIGFILE>;
for my $ruleNo (0..$#regex) {
print {$outfile[$ruleNo]} $line if $line =~ $regex[$ruleNo];
# if only the first match is interesting:
# if ($line =~ $regex[$ruleNo]) {
# print {$outfile[$ruleNo]} $line;
# last;
# }
}
}
Cleaning up before the next iteration:
foreach (@outfile) {
close $_ or die;
}
close BIGFILE or die;
}
print "Done";
Invocation: $ perl ultragrepper.pl regexFile bigFile1 bigFile2 bigFile3 etc. Anything quicker would have to be written directly in C. Your hard-disk data transfer speed is the limit.
This should run quicker than the bash equivalent because it avoids re-opening files and re-compiling regexes, and no new processes have to be spawned for external tools. But we could also fork several worker processes, one per input file (at least NumOfProcessors * 2 workers may be sensible):
local $SIG{CHLD} = undef;
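while (@ARGV) {
my $filename = shift @ARGV; # shift before forking so the parent's loop terminates
next if fork(); # parent: move on to the next file
...; # child: process $filename as above
last; # child: stop after its one file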
}
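The same one-worker-per-file idea can be sketched in shell with background jobs and wait. This is only an illustration: process_file is a hypothetical stand-in for the real per-file matching loop, and the directory parallel_demo and its sample files are made up.

```shell
#!/usr/bin/env bash
# Sketch: one background worker per input file, then wait for all of them.
demo=./parallel_demo
rm -rf "$demo" && mkdir -p "$demo"
printf '%s\n' 'justin bieber' 'other' > "$demo/a.txt"
printf '%s\n' 'other' 'Justin Bieber again' 'justinbieber' > "$demo/b.txt"

# Hypothetical worker: stands in for the per-file matching loop.
process_file() {
    grep -E -i -e '(^| )justin ?bieber' "$1" > "$1.matches"
}

for f in "$demo"/a.txt "$demo"/b.txt; do
    process_file "$f" &    # one background process per input file
done
wait                       # block until every worker has exited

wc -l "$demo"/*.matches
```

Each worker writes to its own output file, so no locking is needed. If the disk is the bottleneck, as the answer suggests it will be, adding workers beyond a handful is unlikely to help.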
I also decided to come back here and write a perl version, before noticing that amon had already done it. Since it's already written, here's mine:
#!/usr/bin/perl -W
use strict;
# The search spec file consists of lines arranged in pairs like this:
# file1
# [Ff]oo
# file2
# [Bb]ar
# The first line of each pair is an output file. The second line is a perl
# regular expression. Every line of the input file is tested against all of
# the regular expressions, so an input line can end up in more than one
# output file if it matches more than one of them.
sub usage
{
die "Usage: $0 search_spec_file [inputfile...]\n";
}
@ARGV or usage();
my @spec;
my $specfile = shift();
open my $spec, '<', $specfile or die "$specfile: $!\n";
while(<$spec>) {
chomp;
my $outfile = $_;
my $regexp = <$spec>;
# check for a missing second line before chomping it
defined($regexp) or die "$specfile: Invalid: Odd number of lines\n";
chomp $regexp;
open my $out, '>', $outfile or die "$outfile: $!\n";
push @spec, [$out, qr/$regexp/];
}
close $spec;
while(<>) {
for my $spec (@spec) {
my ($out, $regexp) = @$spec;
print $out $_ if /$regexp/;
}
}
Reverse the structure: read the big file once, line by line, and loop over the rules for each line, so that every match is performed on a single line.
regex_rules=~/Documents/rulesfiles/regexrulefile.txt
for tmp in *.unique20gbfile.suffix; do
while IFS= read -r line ; do
while IFS= read -r rule
# Each $rule read from the rules file contains a regex command, e.g.,
# egrep -i '(^| )justin ?bieber|(^| )selena ?gomez'
# $rname is a unique rule name generated by a separate bash function
# exported to the current shell.
do
# Only the rule is eval'ed; the data line is passed on stdin and never executed.
printf '%s\n' "$line" | eval "$rule" >> ~/outputdir/"$tmp.$rname.filter.piped" &
done < "$regex_rules"
done < "$tmp"
done
At this point, though, you could (and probably should) use bash's or Perl's built-in regex matching rather than firing up a separate egrep process for each match. You might also be able to split the file and run parallel processes. (Note that I also corrected > to >>.)
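A minimal sketch of the built-in variant, assuming bash and a hypothetical rule file with one "name<TAB>regex" pair per line (the original rule file stores whole egrep commands instead, so it would need converting first). The directory onepass_demo and the sample data are made up.

```shell
#!/usr/bin/env bash
# Sketch: read the input once and test every rule with bash's [[ =~ ]],
# so no egrep process is started per line. Paths and data are hypothetical.
demo=./onepass_demo
rm -rf "$demo" && mkdir -p "$demo"

printf '%s\n' 'justin bieber tour dates' 'nothing to see here' \
    'selena gomez and justin bieber' > "$demo/big.txt"

# Hypothetical rule file format: <rule name><TAB><extended regex>
printf '%s\t%s\n' bieber '(^| )justin ?bieber' gomez '(^| )selena ?gomez' \
    > "$demo/rules.tsv"

# Load all rules into arrays and truncate each rule's output file.
names=()
patterns=()
while IFS=$'\t' read -r name pattern; do
    names+=("$name")
    patterns+=("$pattern")
    : > "$demo/$name.txt"
done < "$demo/rules.tsv"

shopt -s nocasematch    # case-insensitive matching, like egrep -i

# Single pass over the input; a line may land in several output files.
while IFS= read -r line; do
    for i in "${!patterns[@]}"; do
        if [[ $line =~ ${patterns[i]} ]]; then
            printf '%s\n' "$line" >> "$demo/${names[i]}.txt"
        fi
    done
done < "$demo/big.txt"

wc -l "$demo"/bieber.txt "$demo"/gomez.txt
```

This shows the structure only. A bash read loop is slow on a file of this size, so for a real 20 GB input the Perl versions above, which follow the same single-pass shape, are likely the better fit.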
Related
Redirect ffmpeg console output to a string or a file in C++
I'm trying to use ffmpeg to do some operations for me. It's really simple for now. I want to omit the ffmpeg output in my console, either redirecting them to strings or a .txt file that I can control. I'm on Windows 10. I have tried _popen (with and "r" and "w") and system("ffmpeg command > output.txt")', with no success. #include <iostream> #include <stdio.h> using namespace std; #define BUFSIZE 256 int main() { /* 1. x = system("ffmpeg -i video.mp4 -i audio.mp4 -c copy output.mp4 > output.txt"); */ /* 2. FILE* p; p = _popen("ffmpeg -i video.mp4 -i audio.mp4 -c copy output.mp4", "w"); _pclose(p); */ /* 3. char cmd[200] = { "ffmpeg -i video.mp4 -i audio.mp4 -c copy output.mp4" }; char buf[BUFSIZE]; FILE* fp; if ((fp = _popen(cmd, "r")) == NULL) { printf("Error opening pipe!\n"); return -1; } while (fgets(buf, BUFSIZE, fp) != NULL) { // Do whatever you want here... // printf("OUTPUT: %s", buf); } if (_pclose(fp)) { printf("Command not found or exited with error status\n"); return -1; } */ return 0; } Further in the development, I would like to know when the ffmpeg process finished (maybe I can monitor the ffmpeg return value?) or to display only the last line if the some error occurred.
I got it to work. In solution 1, I added " 2>&1" to the end of the command string. Found it here: "ffmpeg command line write output to a text file". Thanks!
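The reason this works is that ffmpeg writes its log to stderr, so a plain > captures nothing. A small shell sketch of the difference, using a hypothetical fake_tool function as a stand-in for ffmpeg (the directory redirect_demo and the messages are made up); the 2>&1 syntax itself is the same in cmd.exe on Windows:

```shell
#!/usr/bin/env bash
# Sketch: why "> file" alone misses the log and "> file 2>&1" captures it.
demo=./redirect_demo
rm -rf "$demo" && mkdir -p "$demo"

# Hypothetical stand-in for ffmpeg: logs to stderr and exits non-zero.
fake_tool() {
    echo "frame=  10 fps=0.0" >&2
    echo "done" >&2
    return 3
}

# stdout only: the file stays empty because everything went to stderr.
fake_tool > "$demo/stdout_only.txt" 2>/dev/null

# stdout and stderr: 2>&1 must come after the file redirection.
fake_tool > "$demo/both.txt" 2>&1
status=$?

echo "exit status: $status"    # the tool's return value, for success checks
tail -n 1 "$demo/both.txt"     # show only the last logged line
```

Checking the exit status and printing only the last line of the log covers the two follow-up wishes in the question (knowing when the process finished successfully, and showing just the final message on error).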
How to wait for t32rem DO script to complete?
It seems that doing t32rem localhost DO script.cmm is non-blocking. How can I block in a shell script until the cmm script is done? Here is an abbreviated example: $ time t32rem localhost wait 5s real 0m5.048s $ cat wait-5s.cmm WAIT 5s ENDDO $ time t32rem localhost do wait-5s real 0m0.225s I can try to do some sort of t32rem localhost wait STATE.RUN() based on whatever the exact script is doing but this is not a very good solution. Reading through api_remote.pdf it does note that T32_Cmd for DO is non-blocking and recommends polling using T32_GetPractice but it's not clear how to translate this to t32rem.
In my opinion your question is a rather good one.
First the bummer: t32rem is not suitable for waiting on the execution of a script. In fact, t32rem cancels any running script with T32_Stop() before executing a command. (You can find the source code of t32rem in your TRACE32 installation at "C:\T32\demo\api\capi\test\t32rem.c".)
So your suggestion to use t32rem localhost wait STATE.RUN() will definitely not work, because it would cancel the running script. Furthermore, STATE.RUN() returns the running state of the debugged CPU and not of the PRACTICE interpreter.
So you do have to use T32_GetPractice() to wait for the PRACTICE script to terminate. To use T32_GetPractice(), you have to link the "API for Remote Control and JTAG Access in C", either statically or dynamically, to an application that launches your script.
For dynamic linking (e.g. from a Python script), load "C:\T32\demo\api\capi\dll\t32api.dll". (Depending on your host operating system you might need t32api64.dll, t32api.so, or t32api64.so instead.)
For static linking (e.g. from a binary application written in C), add the files from "C:\T32\demo\api\capi\src" to your project.
And here is the code to write a command line application t32do, which starts a PRACTICE script and waits until the script terminates:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include "t32.h"
int main(int argc, char *argv[])
{
    int pstate;
    const char *script;
    if (argc == 4 && !strncmp(argv[2],"port=", 5)) {
        if ( T32_Config( "PORT=", argv[2]+5 ) == -1 ) {
            printf("Port number %s not accepted\n", argv[2] );
            exit(2);
        }
        script = argv[3];
    } else {
        if (argc != 3) {
            printf( "Usage: t32do <host> [port=<n>] <script>\n" );
            exit(2);
        }
        script = argv[2];
    }
    if ( T32_Config( "NODE=", argv[1] ) == -1 ) {
        printf( "Hostname %s not accepted\n", argv[1] );
        exit(2);
    }
    if ( T32_Init() != 0 || T32_Attach(1) != 0){
        printf( "Failed to connect to TRACE32\n" );
        exit(2);
    }
    if ( T32_Cmd_f("DO \"%s\"", script) != 0 ){ // Launch PRACTICE script
        printf( "Failed to start PRACTICE script\n" );
        T32_Exit();
        exit(1);
    }
    while (T32_GetPracticeState(&pstate) == 0 && pstate != 0){ // Wait until PRACTICE script terminates
        usleep(10000);
    }
    T32_Exit();
    return 0;
}
Put the source in a file named t32do.c in "C:\T32\demo\api\capi\src" and build the application with the following makefile, which works on both Windows (by using the MinGW compiler of Cygwin) and Linux:
BIN := t32do
OBJ := t32do.o hremote.o hlinknet.o
OS := $(shell uname -s)
ifneq ($(findstring CYGWIN,$(OS)),)
CC := x86_64-w64-mingw32-gcc
LOPT := -lws2_32
COPT := -DT32HOST_LE
endif
ifneq ($(findstring Linux,$(OS)),)
CC := gcc
COPT := -DT32HOST_LE
endif
all: $(BIN)
$(BIN): $(OBJ)
	$(CC) $^ -s -o $@ $(LOPT)
%.o: %.c t32.h
	$(CC) -c $(COPT) -o $@ $<
clean:
	-rm $(OBJ) $(BIN)
If it compiles and links fine, you'll get an application t32do.exe. Use it in the form: t32do <host> [port=<n>] <practice script>
My example code above is licensed under Creative Commons Zero 1.0. Use it any way you wish, in any code you want.
My PowerShell Script Not Working As Expected (for compiling C++ files)
I am a C++ programmer. I wanted to automate the task of compiling, running and debugging of a program into one neat PowerShell script. But it unexpectedly throws unrelated error which I don't know why. The program takes C++ file(s) as input, produces a compiled .exe file and runs the program, all at once. It also takes other little debugging options. if (!($args.count -ge 1)) { Write-Host "Missing arguments: Provide the filename to compile" exit } $isRun = 1 $EXE_NM = [IO.Path]::GetFileNameWithoutExtension($args[0]) $GPP_ARGS = "-o " + $EXE_NM + ".exe" $count = 0 foreach ($op in $args) { if ($op -eq "-help" -or $op -eq "-?") { Write-Host "Format of the command is as follows:-" Write-Host "cpr [filename.cpp] {additional files}" Write-Host "{-add [compiler options] (all options of -add should be in double quotes altogether)}" Write-Host "[-d (short for -add -g)] [-nr (do not run automatically)]" exit } elseif ($op.Contains(".cxx") -or $op.Contains(".cpp")) { $op = """$op""" $GPP_ARGS += " " + $op } elseif ($op -eq "-add") { if (($count+1) -ne $args.Count) { $GPP_ARGS += " " + $args[$count+1] } } elseif ($op -eq "-d") { $GPP_ARGS += " -g" } elseif ($op -eq "-nr") { $isRun = 0 } $count += 1 } $err = & g++.exe $GPP_ARGS 2>&1 if ($LastExitCode -eq 0) { if ($isRun -eq 1) { if ($isDebug -eq 1) { gdb.exe $EXE_NM } else { iex $EXE_NM } } if ($err.length -ne 0) { Write-Host $err -ForegroundColor "Yellow" } } else { Write-Host "$err" -ForegroundColor "Red" } For example: When I try to do cpr.ps1 HCF.cpp it throws the following error: g++.exe: fatal error: no input files compilation terminated. I have ensured that the .cpp file exists in the current working directory.
I second the recommendation of using make rather than writing your own build script. A simple re-usable Makefile isn't that difficult to write:
CXX = g++.exe
CPPFLAGS ?= -O2 -Wall
# get all files whose extension begins with c and is at least 2 characters long,
# i.e. foo.cc, foo.cpp, foo.cxx, ...
# NOTE: this also includes files like foo.class, etc.
SRC = $(wildcard *.c?*)
# pick the first file from the above list and change the extension to .exe
APP = $(basename $(word 1, $(SRC))).exe
$(APP): $(SRC)
	$(CXX) $(CPPFLAGS) -o $@ $<
.PHONY: run
run: $(APP)
	@./$(APP)
.PHONY: debug
debug: $(APP)
	@gdb $(APP)
.PHONY: clean
clean:
	$(RM) $(APP)
make builds the program (if required).
make run executes the program after building it.
make debug runs the program in gdb.
make clean deletes the program.
You can override the default CPPFLAGS by defining an environment variable:
$env:CPPFLAGS = '-g'
make
FFMPEG and convert swf to image?
I want to use ffmpeg to convert swf to png ,and I can't extract image from some kind of swf like: http://rapidshare.com/files/450953994/Movie1.swf and I use this code in bat file(1.bat) cws2fws Movie1.swf 3.swf ffmpeg -i 3.swf -f image2 -vcodec png tese%d.png Please help me!! I only want to convert swf to image also suggestion other way sound helpful?
Mencoder doesn't support the compression:
[swf @ 0xc230a0]Compressed SWF format not supported.
Give http://www.swftools.org/download.html a try (I tried it myself after compiling swftools, but without success). swfextract returns:
$ swfextract test.swf
Objects in file test.swf:
[-i] 1 Shape: ID(s) 1
[-f] 1 Frame: ID(s) 0
No video, no sound, no png…
Update
After several attempts, swfrender from swftools does the job. There is an undocumented pagerange option. From swfrender.c:
int args_callback_option(char*name,char*val)
{
    if(!strcmp(name, "V")) {
        printf("swfrender - part of %s %s\n", PACKAGE, VERSION);
        exit(0);
    } else if(!strcmp(name, "o")) {
        […]
    } else if(!strcmp(name, "p")) {
        pagerange = val;
        return 1;
    } else if(!strcmp(name, "s")) {
        […]
    return 0;
}
Knowing that, you could write a shell script (here quick and dirty with bash):
#!/bin/bash
let count=1
swfinput=$1
while :
do
    output=`basename $swfinput .swf`$count.png
    swfrender $swfinput -p $count -o $output
    if [ ! -f $output ]; then
        break
    fi
    echo swfrender $swfinput -p $count -o $output
    ((count++))
done
That's it…
Run C or C++ file as a script
So this is probably a long shot, but is there any way to run a C or C++ file as a script? I tried:
#!/usr/bin/gcc main.c -o main; ./main
int main(){ return 0; }
But it says:
./main.c:1:2: error: invalid preprocessing directive #!
Short answer:
//usr/bin/clang "$0" && exec ./a.out "$@"
int main(){
    return 0;
}
The trick is that your text file must be both valid C/C++ code and shell script. Remember to exit from the shell script before the interpreter reaches the C/C++ code, or invoke exec magic.
Run with chmod +x main.c; ./main.c. A shebang like #!/usr/bin/tcc -run isn't needed because unix-like systems will already execute the text file within the shell.
(adapted from this comment)
I used it in my C++ script:
//usr/bin/clang++ -O3 -std=c++11 "$0" && ./a.out; exit
#include <iostream>
int main() {
    for (auto i: {1, 2, 3})
        std::cout << i << std::endl;
    return 0;
}
If your compilation line grows too much you can use the preprocessor (adapted from this answer) as this plain old C code shows:
#if 0
clang "$0" && ./a.out
rm -f ./a.out
exit
#endif
int main() {
    return 0;
}
Of course you can cache the executable:
#if 0
EXEC=${0%.*}
test -x "$EXEC" || clang "$0" -o "$EXEC"
exec "$EXEC"
#endif
int main() {
    return 0;
}
Now, for the truly eccentric Java developer:
/*/../bin/true
CLASS_NAME=$(basename "${0%.*}")
CLASS_PATH="$(dirname "$0")"
javac "$0" && java -cp "${CLASS_PATH}" ${CLASS_NAME}
rm -f "${CLASS_PATH}/${CLASS_NAME}.class"
exit
*/
class Main {
    public static void main(String[] args) {
        return;
    }
}
D programmers simply put a shebang at the beginning of the text file without breaking the syntax:
#!/usr/bin/rdmd
void main(){}
See:
https://unix.stackexchange.com/a/373229/23567
https://stackoverflow.com/a/12296348/199332
For C, you may have a look at tcc, the Tiny C Compiler. Running C code as a script is one of its possible uses.
$ cat /usr/local/bin/runc
#!/bin/bash
sed -n '2,$p' "$@" | gcc -o /tmp/a.out -x c++ - && /tmp/a.out
rm -f /tmp/a.out
$ cat main.c
#!/bin/bash /usr/local/bin/runc
#include <stdio.h>
int main() {
    printf("hello world!\n");
    return 0;
}
$ ./main.c
hello world!
The sed command takes the .c file and strips off the hash-bang line. 2,$p means print lines 2 to end of file; "$@" expands to the command-line arguments to the runc script, i.e. "main.c".
sed's output is piped to gcc. Passing - to gcc tells it to read from stdin, and when you do that you also have to specify the source language with -x since it has no file name to guess from.
Since the shebang line will be passed to the compiler, and # indicates a preprocessor directive, it will choke on a #!. What you can do is embed the makefile in the .c file (as discussed in this xkcd thread):
#if 0
make $@ -f - <<EOF
all: foo
foo.o:
	cc -c -o foo.o -DFOO_C $0
bar.o:
	cc -c -o bar.o -DBAR_C $0
foo: foo.o bar.o
	cc -o foo foo.o bar.o
EOF
exit;
#endif
#ifdef FOO_C
#include <stdlib.h>
extern void bar();
int main(int argc, char* argv[]) {
    bar();
    return EXIT_SUCCESS;
}
#endif
#ifdef BAR_C
#include <stdio.h>
void bar() {
    puts("bar!");
}
#endif
The #if 0 / #endif pair surrounding the makefile ensures the preprocessor ignores that section of text, and the EOF marker marks where the make command should stop parsing input.
CINT: CINT is an interpreter for C and C++ code. It is useful e.g. for situations where rapid development is more important than execution time. Using an interpreter the compile and link cycle is dramatically reduced facilitating rapid development. CINT makes C/C++ programming enjoyable even for part-time programmers.
You might want to checkout ryanmjacobs/c which was designed for this in mind. It acts as a wrapper around your favorite compiler. #!/usr/bin/c #include <stdio.h> int main(void) { printf("Hello World!\n"); return 0; } The nice thing about using c is that you can choose what compiler you want to use, e.g. $ export CC=clang $ export CC=gcc So you get all of your favorite optimizations too! Beat that tcc -run! You can also add compiler flags to the shebang, as long as they are terminated with the -- characters: #!/usr/bin/c -Wall -g -lncurses -- #include <ncurses.h> int main(void) { initscr(); /* ... */ return 0; } c also uses $CFLAGS and $CPPFLAGS if they are set as well.
#!/usr/bin/env sh
tail -n +$(( $LINENO + 1 )) "$0" | cc -xc - && { ./a.out "$@"; e="$?"; rm ./a.out; exit "$e"; }
#include <stdio.h>
int main(int argc, char const* argv[]) {
    printf("Hello world!\n");
    return 0;
}
This properly forwards the arguments and the exit code too.
Quite a short proposal would exploit:
The current shell script being the default interpreter for unknown types (without a shebang or a recognizable binary header).
The "#" being a comment in shell and "#if 0" disabling code.
#if 0
F="$(dirname $0)/.$(basename $0).bin"
[ ! -f $F -o $F -ot $0 ] && { c++ "$0" -o "$F" || exit 1 ; }
exec "$F" "$@"
#endif
// Here starts my C++ program :)
#include <iostream>
#include <unistd.h>
using namespace std;
int main(int argc, char **argv) {
    if (argv[1])
        clog << "Hello " << argv[1] << endl;
    else
        clog << "hello world" << endl;
}
Then you can chmod +x your .cpp files and then ./run.cpp.
You could easily give flags for the compiler.
The binary is cached next to the source file and updated when necessary.
The original arguments are passed to the binary: ./run.cpp Hi
It doesn't reuse the a.out, so that you can have multiple binaries in the same folder.
Uses whatever c++ compiler you have in your system.
The binary name starts with "." so that it is hidden from the directory listing.
Problems:
What happens on concurrent executions?
A variant of John Kugelman's answer can be written this way:
#!/bin/bash
t=`mktemp`
sed '1,/^\/\/code/d' "$0" | g++ -o "$t" -x c++ - && "$t" "$@"
r=$?
rm -f "$t"
exit $r
//code
#include <stdio.h>
int main() {
    printf("Hi\n");
    return 0;
}
Here's yet another alternative:
#if 0
TMP=$(mktemp -d)
cc -o ${TMP}/a.out ${0} && ${TMP}/a.out ${@:1} ; RV=${?}
rm -rf ${TMP}
exit ${RV}
#endif
#include <stdio.h>
int main(int argc, char *argv[])
{
    printf("Hello world\n");
    return 0;
}
I know this question is not a recent one, but I decided to throw my answer into the mix anyway. With Clang and LLVM, there is no need to write out an intermediate file or call an external helper program/script (apart from clang/clang++/lli). You can just pipe the output of clang/clang++ to lli.
#if 0
CXX=clang++
CXXFLAGS="-O2 -Wall -Werror -std=c++17"
CXXARGS="-xc++ -emit-llvm -c -o -"
CXXCMD="$CXX $CXXFLAGS $CXXARGS $0"
LLICMD="lli -force-interpreter -fake-argv0=$0 -"
$CXXCMD | $LLICMD "$@" ; exit $?
#endif
#include <cstdio>
int main (int argc, char **argv) {
    printf ("Hello llvm: %d\n", argc);
    for (auto i = 0; i < argc; i++) {
        printf("%d: %s\n", i, argv[i]);
    }
    return 3==argc;
}
The above, however, does not let you use stdin in your C/C++ script. If bash is your shell, then you can do the following to use stdin:
#if 0
CXX=clang++
CXXFLAGS="-O2 -Wall -Werror -std=c++17"
CXXARGS="-xc++ -emit-llvm -c -o -"
CXXCMD="$CXX $CXXFLAGS $CXXARGS $0"
LLICMD="lli -force-interpreter -fake-argv0=$0"
exec $LLICMD <($CXXCMD) "$@"
#endif
#include <cstdio>
int main (int argc, char **argv) {
    printf ("Hello llvm: %d\n", argc);
    for (auto i = 0; i < argc; i++) {
        printf("%d: %s\n", i, argv[i]);
    }
    for (int c; EOF != (c=getchar()); putchar(c));
    return 3==argc;
}
Several answers suggest keeping the shebang (#!), but it's illegal for the gcc compiler, so several solutions cut it out. In addition, it is possible to insert a preprocessor #line directive so that compiler messages point at the right line when the C code is wrong.
#!/bin/bash
#if 0
xxx=$(mktemp)
awk -v src="$0" 'BEGIN { print "#line 2 \"" src "\""; first=1 } { if (first) first=0; else print $0 }' "$0" |\
g++ -x c++ -o "${xxx}" - && "${xxx}" "$@"
rv=$?
\rm "${xxx}"
exit $rv
#endif
#include <iostream>
int main(int argc,char *argv[]) {
    std::cout<<"Hello world"<<std::endl;
}
As stated in a previous answer, if you use tcc as your compiler, you can put a shebang #!/usr/bin/tcc -run as the first line of your source file. However, there is a small problem with that: if you want to compile that same file, gcc will throw an error: invalid preprocessing directive #! (tcc will ignore the shebang and compile just fine). If you still need to compile with gcc one workaround is to use the tail command to cut off the shebang line from the source file before piping it into gcc: tail -n+2 helloworld.c | gcc -xc - Keep in mind that all warnings and/or errors will be off by one line. You can automate that by creating a bash script that checks whether a file begins with a shebang, something like if [[ $(head -c2 $1) == '#!' ]] then tail -n+2 $1 | gcc -xc - else gcc $1 fi and use that to compile your source instead of directly invoking gcc.
Just wanted to share: thanks to Pedro's explanation of solutions using the #if 0 trick, I have updated my fork of TCC (Sugar C) so that all examples can be called with a shebang-style header, finally with no errors when viewing the source in the IDE. Now code displays beautifully using clangd in VS Code for project sources. The first lines of the samples look like:
#if 0
/usr/local/bin/sugar `basename $0` $@ && exit;
// above is a shebang hack, so you can run: ./args.c <arg 1> <arg 2> <arg N>
#endif
The original intention of this project has always been to use C as if it were a scripting language, using TCC as the base under the hood, but with a client that prioritizes RAM output over file output (without the use of the -run directive). You can check out the project at: https://github.com/antonioprates/sugar
I like to use this as the first line at the top of my programs:
For C (technically: gnu C as I've specified it below):
///usr/bin/env ccache gcc -Wall -Wextra -Werror -O3 -std=gnu17 "$0" -o /tmp/a -lm && /tmp/a "$@"; exit
For C++ (technically: gnu++ as I've specified it below):
///usr/bin/env ccache g++ -Wall -Wextra -Werror -O3 -std=gnu++17 "$0" -o /tmp/a -lm && /tmp/a "$@"; exit
ccache helps ensure your compiling is a little more efficient. Install it in Ubuntu with sudo apt update && sudo apt install ccache.
For Go (golang) and some explanations of the lines above, see my other answer here: What's the appropriate Go shebang line?