Extracting a function body by name with BASH and regex

I have some automatically generated code from MATLAB Coder. I would like to write a script that finds my entry points in a large file. I've successfully plowed my way through regex with BASH to match the main function with main\( *([^)]+?)\), and then the body with /\{([^}]+)\}/; however, I'm having a terrible time gluing those together. All I need is the function names contained in main().
I realize that this could be a terrible exercise, but the automatically generated code gives me simple functions that look like:
int main(int argc, const char * const argv[])
{
(void)argc;
(void)argv;
/* Initialize the application. You do not need to do this more than one time. */
RT_initialize();
/* Invoke the entry-point functions. You can call entry-point functions multiple times. */
main_RT();
/* Terminate the application. You do not need to do this more than one time. */
RT_terminate();
return 0;
}
I would like to extract that function and its body, but my regex is poorer than I recalled.
Any guidance would be greatly appreciated.

A simple way to fairly reliably extract the entire function body is to run the code through a formatter first:
indent -kr < mymain.c | sed -n '/^int main(/,/^}/p'
cflow can give you a function call graph, e.g.:
cflow -d2 mymain.c

Because of some restrictions of being on BSD, I ended up with the following BASH function, which gets the body of a function from a C source file by name. It was only tested with the well-formatted C code from MATLAB Coder.
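function getFunctionInC(){
    TMPFILEIDENT="/tmp/indent.$$.tmp" # temp file for the formatted source
    indent "$1" "$TMPFILEIDENT"
    # $2 is passed in with -v so the function name is not spliced into the awk program
    awk -v name="$2" '
        BEGIN { state = 0; last = ""; }
        # print the previous line too (the return type, when indent puts it on its own line)
        $0 ~ ("^" name "\\(") { print last; state = 1; }
        { if (state == 1) print; }
        $0 ~ /^}/ { if (state) state = 2; }
        { last = $0; }
    ' "$TMPFILEIDENT"
    rm -f "$TMPFILEIDENT" # clean up the temp file
}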
The formatting is terrible on the outputs, but I can easily pull the function names to dynamically create defines. Thanks to everyone who read the question.
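For the last step mentioned above, pulling the called function names out of an extracted body, here is a rough C++ sketch. It is not part of the original script: the helper name calledFunctions and the keyword list are made up for illustration, and it assumes the body has already been extracted and contains only /* ... */ comments.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: lists identifiers that are directly followed by '('
// in a function body, after dropping /* ... */ comments and a few C keywords.
std::vector<std::string> calledFunctions(const std::string& body) {
    static const std::regex blockComment(R"(/\*[\s\S]*?\*/)");
    static const std::regex call(R"(\b([A-Za-z_]\w*)\s*\()");
    static const std::set<std::string> keywords = {
        "if", "for", "while", "switch", "return", "sizeof"};

    // Replace each comment with a space so words inside comments are ignored.
    const std::string code = std::regex_replace(body, blockComment, " ");

    std::vector<std::string> names;
    for (std::sregex_iterator it(code.begin(), code.end(), call), end; it != end; ++it) {
        const std::string name = (*it)[1].str();
        if (keywords.count(name) == 0) names.push_back(name);
    }
    return names;
}
```

Casts such as (void)argc are skipped because no identifier comes directly before the parenthesis. A real C parser (or cflow) will be more robust than this on anything less regular than generated code.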

Related

C++ accepting command line argument with "-" symbol

I am new to C++ and am trying to read command-line arguments specified as below.
./helloworld -i input_file -o outputfile -s flag3 -t flag4
I tried hardcoding the flags by index as below
int main(int argc, char *argv[]) {
// argv[1] corresponds to -i
// argv[2] corresponds to input_file
// argv[3] corresponds to -o
// argv[4] corresponds to outputfile
// argv[5] corresponds to -s
// argv[6] corresponds to flag3
// argv[7] corresponds to -t
// argv[8] corresponds to flag4
}
Then I realized the order can change, so I can't use hardcoded indices. I used an
unordered_map<string, string> with -i, -o, -s, -t as keys and inputfile, outputfile, flag3, flag4 as values.
This works fine, but I was wondering whether there is a better way to do the same.
Oh my gosh. Okay, you can do this manually, and I'll show you some code. But please look at getopt(). It already helps you out quite a bit, but it takes a little to get used to.
But here's how you could code it manually:
int index = 1;
while (index < argc) {
string cmnd = argv[index++];
if (cmnd == "-i") {
if (index >= argc) {
usage(); // This should provide help on calling your program.
exit(1);
}
inputFileName = argv[index++];
}
else if (cmnd == "-whatever") {
// Continue to process all your other options the same way
}
}
Now, this isn't how anyone does this. We use some version of getopt(). There's another one I like called getopt_long, I believe. You'll want to dig something up like that. Then I put my own wrapper around all of that so I can do some really cool things.
If you want to see the wrapper I use: https://github.com/jplflyer/ShowLib.git and look at the OptionHandler.h and .cpp. It's pretty cool. I think there's an example of how to use it somewhere.
But you need to know how it works under the hood, so for your first programs, maybe do it manually like I've shown you.
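To make the getopt() suggestion concrete, here is a minimal sketch using POSIX getopt from <unistd.h>. The helper name parseOptions and the choice to collect values into a std::map are assumptions for illustration, and the option string "i:o:s:t:" simply mirrors the four flags from the question.

```cpp
#include <unistd.h>  // POSIX getopt, optarg, optind, opterr
#include <map>
#include <string>

// Hypothetical helper: collects -i, -o, -s and -t (each taking a value) into a map.
// Returns false when getopt reports an unknown option or a missing value.
bool parseOptions(int argc, char* const argv[], std::map<char, std::string>& out) {
    optind = 1;  // restart scanning so the helper can be called more than once
    opterr = 0;  // keep getopt from printing its own error messages
    int opt;
    while ((opt = getopt(argc, argv, "i:o:s:t:")) != -1) {
        if (opt == '?') return false;  // unknown option or missing argument
        out[static_cast<char>(opt)] = optarg;
    }
    return true;
}
```

From main you would call parseOptions(argc, argv, opts) and then read opts['i'], opts['o'] and so on. The order of the flags on the command line no longer matters.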
You can use a third-party library to parse command-line arguments.
For example: https://github.com/mirror/tclap

Ignore C comments and include statements using sed (//, /**, **/, #)

So I've got some example C code
/**
example text
**/
#include <stdio.h>
int main(){
int example = 0;
// example text
return;
}
How would I specifically use sed to ignore all lines starting with // or # while also ignoring lines in the range of /** to **/?
I've tried things along the lines of sed -E '/(^#|\/\*/,/\*\/|^\/\/)/!s/example/EXAMPLE/g' but I have a feeling I'm not using the | correctly as it pops an error saying "unmatched ("
My desired final output should be
/**
example text
**/
#include <stdio.h>
int main(){
int EXAMPLE = 0;
// example text
return;
}
The change from the sed command would have changed instances of the word "example" in the program to the uppercase version "EXAMPLE", and what I'm trying to do is make sure words on commented lines are not being changed.
sed may not be the right tool for this job, as sin and melpomene mention in the comments, but the command below will do the trick for your particular exercise:
sed -E '/(#|\/\/)/b ; /\/\*\*/,/\*\*\//b; s/example/EXAMPLE/g' file
/**
example text
**/
#include <stdio.h> example
int main(){
int EXAMPLE = 0;
// example text
return;
}
The sed command b branches to a label:
'b LABEL'
Unconditionally branch to LABEL. The LABEL may be omitted, in
which case the next cycle is started.
In other words, instead of negating a pattern like /pattern/! you can use /pattern/b without a label and when /pattern/ is found sed jumps (because of b) to the next cycle skipping the substitution s/example/EXAMPLE/g command.
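Your attempt does not work because it tries to combine, with a logical OR |, plain patterns like # or // and a range like /\/\*\*/,/\*\*\//. A range is an address made of two patterns, not a pattern itself, so it cannot appear inside a regex alternation.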
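If the branching is easier to follow in a general-purpose language, the per-line logic of that sed script can be sketched in C++ as below. This is only an illustration of the control flow (each "b" becomes "copy the line unchanged and move on"); the helper name replaceOutsideComments is made up, and like the sed command it matches # and // anywhere in a line.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper mirroring the sed script: lines that would hit a "b"
// are copied unchanged, every other line gets the substitution.
std::string replaceOutsideComments(const std::string& source) {
    std::istringstream in(source);
    std::string line, result;
    bool inBlock = false;  // true after a line containing /** until one containing **/
    while (std::getline(in, line)) {
        const bool hashOrSlashes = line.find('#') != std::string::npos ||
                                   line.find("//") != std::string::npos;
        if (hashOrSlashes) {
            // /(#|\/\/)/b : skip the rest of the script for this line
        } else if (inBlock) {
            // inside the /**,**/ range: skip, and close the range on **/
            if (line.find("**/") != std::string::npos) inBlock = false;
        } else if (line.find("/**") != std::string::npos) {
            inBlock = true;  // first line of the range, also skipped
        } else {
            // s/example/EXAMPLE/g
            const std::string from = "example", to = "EXAMPLE";
            for (std::size_t pos = 0;
                 (pos = line.find(from, pos)) != std::string::npos;
                 pos += to.size()) {
                line.replace(pos, from.size(), to);
            }
        }
        result += line + '\n';
    }
    return result;
}
```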

Parsing integer related to char c++

I'm doing a project that reads reaction strings from a file, in a form such as (5A+3B=c+10D). I need to parse each reaction string so that I can extract (split out) the integer value beside each character and put the values into a vector; for the reaction above, the vector is [5 3 1 10].
I thought about the std::strtok function, but I don't think it can separate the integer values.
Can anyone help me?
Here is my try:
int main()
{
std::string input;
std::getline(std::cin, input);
std::stringstream stream(input);
while(1) {
int n;
stream >> n;
char * pch;
pch = strtok (input," ");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.");
}
return 0;
}
}
To do some serious parsing work, you need to learn some language theory. Fortunately, it isn't very difficult.
The method we are going to cover here is what is called top-down recursive parsing.
The full source listing would be too long for this forum, so I will present some pseudo-code instead.
The first thing you need to do is define your grammar: what is considered valid and what is not. You can represent a grammar like this:
formula := term
        := term + formula
        := term - formula
term    := variable
        := coefficient variable
So a formula C + 2D can be represented as
formula
  term
    variable
      C
  +
  formula
    term
      coefficient
        2
      variable
        D
With this in mind, we first solve a simpler problem. There are only a few types of things we need from the input string:
+
-
coefficient
variable
Only these four things are valid input; you may also want to skip spaces. Splitting the input string into these four types of things is called lexical analysis. We typically implement a so-called scanner to do this.
A scanner typically looks like this:
class Scanner
{
public:
Scanner(const char* text);
Token GetToken(); // The current token
void Scan(); // read the next token
};
Next, you will want to group these tokens into a tree like the one shown above. This logic is typically called parsing, and it is implemented as a parser. You can implement a parser in many ways; here is one way to do it with a top-down predictive parser:
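class Parser
{
public:
private:
    // variable
    bool ParseVariable()
    {
        if (s.GetToken() is variable) { s.Scan(); return true; }
        return false; // not a variable: report failure to the caller
    }
    // term := variable | coefficient variable
    bool ParseTerm()
    {
        if (s.GetToken() is variable) { s.Scan(); return true; }
        if (s.GetToken() is coefficient) { s.Scan(); return this->ParseVariable(); }
        return false; // neither alternative of the rule matched
    }
    Scanner s;
};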
The rest of the code follows the same pattern. Obviously one can extend the return type of those Parse() methods to return something useful to the caller and assemble the representation you need for your purpose.
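As a compilable illustration of the same idea, here is a small sketch that folds the scanner into the parser and collects the coefficients as it goes. It is not the full parser described above: the class name CoefficientParser, the '=' rule (reaction := formula = formula) and the "missing coefficient counts as 1" rule are assumptions added to fit the example in the question, and the input is assumed to come without the surrounding parentheses.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch of a recursive-descent parser for the grammar above (names are hypothetical).
class CoefficientParser {
public:
    explicit CoefficientParser(const std::string& text) : text_(text) {}

    // reaction := formula '=' formula   (assumed extension of the grammar)
    std::vector<int> parseReaction() {
        parseFormula();
        skipSpaces();
        if (pos_ >= text_.size() || text_[pos_] != '=')
            throw std::runtime_error("expected '='");
        ++pos_;
        parseFormula();
        skipSpaces();
        if (pos_ != text_.size()) throw std::runtime_error("unexpected trailing input");
        return coefficients_;
    }

private:
    // formula := term | term '+' formula | term '-' formula
    void parseFormula() {
        parseTerm();
        skipSpaces();
        if (pos_ < text_.size() && (text_[pos_] == '+' || text_[pos_] == '-')) {
            ++pos_;
            parseFormula();
        }
    }

    // term := variable | coefficient variable
    void parseTerm() {
        skipSpaces();
        int coefficient = 1;  // assumption: a bare variable has coefficient 1
        if (pos_ < text_.size() && isDigit(text_[pos_])) {
            coefficient = 0;
            while (pos_ < text_.size() && isDigit(text_[pos_]))
                coefficient = coefficient * 10 + (text_[pos_++] - '0');
        }
        if (pos_ >= text_.size() || !isAlpha(text_[pos_]))
            throw std::runtime_error("expected a variable");
        while (pos_ < text_.size() && isAlpha(text_[pos_])) ++pos_;  // consume the variable
        coefficients_.push_back(coefficient);
    }

    void skipSpaces() {
        while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_;
    }
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
    static bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }

    std::string text_;
    std::size_t pos_ = 0;
    std::vector<int> coefficients_;
};
```

Calling CoefficientParser("5A+3B=c+10D").parseReaction() yields the vector 5, 3, 1, 10 from the question. Each grammar rule maps to one member function, which is the main point of the technique.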
For my own purposes, I wrote a few parsers for different languages. You can take a look at them as samples.
This is a sample in Python.
https://github.com/cshung/MiscLab/blob/master/GreatestCommonDivisor/polynomial_module.py
This is a sample in C++ with a small twist, I parsed the string backwards to avoid 'left recursion'
https://github.com/cshung/Competition/blob/master/Competition/LEET_BASIC_CALCULATOR.cpp
To see a top-down parser in action in a real-life product, see this example in ChakraCore, which I proudly worked on some time ago.
https://github.com/Microsoft/ChakraCore/blob/master/lib/Parser/Parse.cpp

How to pass command-line arguments as a string to an embedded Python script executed from C++?

I have a C++ program which exposes a Python interface to execute users' embedded Python scripts.
The user inserts the path of the Python script to run and the command-line arguments.
Then the script is executed through
boost::python::exec_file(filename, main_globals, main_globals)
To pass the command-line arguments to the Python script we have to set them through the Python C-API function
PySys_SetArgv(int argc, char** argv)
before calling exec_file().
But this requires tokenizing the user's string containing the command-line arguments to get the list of arguments, and then passing them back to the Python interpreter through PySys_SetArgv.
And that's more than a mere waste of time, because this way the main C++ program has to take responsibility for tokenizing the command-line string without knowing the logic behind it, which is only defined in the user's custom script.
A much nicer and cleaner approach would be something like this in metacode:
string command_line_args = '-v -p "filename" -t="anotherfile" --list="["a", "b"]" --myFunnyOpt'
exec_file( filename, command_line_args, ...)
I spent hours looking at the Boost and Python C-API documentation but I did not find anything useful.
Do you know if there is a way to achieve this, i.e. passing a whole string of command line
arguments to an embedded Python script from C++?
Update:
As Steve suggested in the comments here below, I solved my problem tokenizing the input string, following https://stackoverflow.com/a/8965249/320369.
In my case I used:
// defining the separators
std::string escape_char = "\\"; // the escape character
std::string sep_char = " "; // empty space as separator
std::string quote_char = ""; // empty string --> we don't want a quote char
boost::escaped_list_separator<char> sep( escape_char, sep_char, quote_char );
because I wanted to be able to parse tuples containing strings as well, like:
'--option-two=("A", "B")'
and if you use:
escaped_list_separator<char> sep('\\', ' ', '"');
as in the original post, you don't get the quoted strings tokenized correctly.
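For readers without Boost at hand, the general idea of that separator configuration (split on spaces, let a backslash escape the next character, give quotes no special meaning) can be approximated with the standard library alone. The sketch below is a simplified stand-in, not Boost's implementation, and the helper name splitArgs is made up. Unlike a general-purpose tokenizer it simply drops empty tokens produced by repeated spaces.

```cpp
#include <string>
#include <vector>

// Simplified stand-in (not Boost's escaped_list_separator): splits on unescaped
// spaces, lets a backslash escape the next character, treats quotes as plain text.
std::vector<std::string> splitArgs(const std::string& line) {
    std::vector<std::string> tokens;
    std::string current;
    bool escaped = false;
    for (char c : line) {
        if (escaped) {
            current += c;  // take the escaped character literally
            escaped = false;
        } else if (c == '\\') {
            escaped = true;
        } else if (c == ' ') {
            if (!current.empty()) {  // a space ends the current token
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

The resulting strings can then be turned into the argc/argv pair that PySys_SetArgv expects.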
Since you are not averse to executing an external file, you can use a helper program to make your shell command do the parsing for you. Your helper program could be:
#include <stdio.h>
int main (int argc, char *argv[])
{
for (int i = 1; i < argc; ++i) printf("%s\n", argv[i]);
return 0;
}
And then you could have code that sends your single string of arguments to the helper program (perhaps using popen) and reads back the parsed arguments, each arg on a separate line.
unparsed_line.insert(0, "./parser_helper ");
FILE *helper = popen(unparsed_line.c_str(), "r");
std::vector<std::string> args;
std::string arg;
while (fgetstring(arg, helper)) {
    args.push_back(arg);
}
pclose(helper);
// Build argv only after args is complete: push_back may reallocate the
// vector and invalidate c_str() pointers taken earlier.
std::vector<const char *> argv;
for (const std::string &a : args) {
    argv.push_back(a.c_str());
}
The fgetstring routine is something I wrote that is like a cross between fgets and std::getline. It reads from the FILE * one line at a time, populating a std::string argument.
static bool
fgetstring (std::string &s, FILE *in)
{
bool ok = false;
std::string r;
char buf[512];
while (fgets(buf, sizeof(buf), in) != 0) {
ok = true;
r += buf;
if (*r.rbegin() == '\n') {
r.resize(r.size()-1);
break;
}
}
if (ok) s = r;
return ok;
}
I seem to remember a post on SO that had a routine similar to this, but I couldn't find it. I'll update my post if I find it later.

awk: Either modify or append a line, based on its existence

I have a small awk script that does some in-place file modifications (to a Java .properties file, to give you an idea). This is part of a deployment script affecting a bunch of users.
I want to be able to set defaults, leaving the rest of the file at the user's preferences. This means appending a configuration line if it is missing, modifying it if it is there, leaving everything else as it is.
Currently I use something like this:
# initialize
BEGIN {
some_value_set = 0
other_value_set = 0
some_value_default = "some.value=SOME VALUE"
other_value_default = "other.value=OTHER VALUE"
}
# modify existing lines
{
if (/^some\.value=.*/)
{
gsub(/.*/, some_value_default)
some_value_set = 1
}
else if (/^other\.value=.*/)
{
gsub(/.*/, other_value_default)
other_value_set = 1
}
print $0
}
# append missing lines
END {
if (some_value_set == 0) print some_value_default
if (other_value_set == 0) print other_value_default
}
Especially when the number of lines I want to control gets larger, this is increasingly cumbersome. My awk knowledge is not all that great, and the above just feels wrong - how can I streamline this?
P.S.: If possible, I'd like to stay with awk. Please don't just recommend that using Perl/Python/whatever would be much easier. :-)
BEGIN {
defaults["some.value"] = "SOME VALUE"
defaults["other.value"] = "OTHER VALUE"
}
{
for (key in defaults) {
pattern = key
gsub(/\./, "\\.", pattern)
if (match($0, "^" pattern "=.*")) {
gsub(/=.*/, "=" defaults[key])
delete defaults[key]
}
}
print $0
}
END {
for (key in defaults) {
print key "=" defaults[key]
}
}
My AWK is rusty, so I won't provide actual code.
Initialize an array with the regular expressions and values.
For each line, iterate the array and do appropriate substitutions. Clean out used entries.
At end, iterate the array and append lines for remaining entries.