Regex to add hungarian notation - c++

I'm parsing .h and .cpp files and I need to find/replace all non-Hungarian notated variables with their Hungarian equivalents. "Augh, why?!" you ask? My employer requires Hungarian notation, 'nuff said.
Let's just deal with ints for now.
Given any of these cases...
int row; // no hungarian prefix
int nrow(9); // incorrect capitalization
int number; // hmm...
int nnumber = getValue(); // uh oh!
They should be changed to:
int nRow;
int nRow(9); // obviously ctor args and assignments shouldn't change
int nNumber;
int nNumber = getValue();
I'm shooting for just a single one-line call to s/// ideally.
For added challenge, if someone can get something to change ALL instances of this variable after the "type check" with int, that would earn you some brownie points.
Here's what I have so far:
s/(int\s+)(?!n)(\w+)/$1n\u$2/g;
This doesn't match things like int nrow or int number though.
Thanks in advance!

No.
More explicitly, you are trying to compile a program using a regex. It does not work that way.
For example, your one-liner already forgets about function parameters, and it cannot handle any user-defined type (struct, enum). Not to mention that .cpp suggests C++, which suggests classes.
Also, what happens with method, function, and inline comments?
My advice would be to find a grammar compiler and feed it the C++ grammar, so that for every definition you get the name and type written to a file. Then you can have regex fun with that. You could also record every place each variable is used, in order to replace them automatically later.
That is a lot more complex than a simple regex, but the simple regex will fail so often that in the end you will be changing the code manually.
On a positive note, maybe when you tell your boss how much the change costs, he will think it over.
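To make the objection concrete, here is a minimal sketch of the "just ints" rewrite using std::regex. The helper name addIntPrefix and the rule "a name counts as already prefixed only if it is n followed by an uppercase letter" are assumptions made for illustration, not something from the question. It is meant to show where a purely textual approach goes wrong, not to be a usable tool.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: renames every identifier that directly follows the
// keyword `int`. It knows nothing about comments, strings, or scope.
std::string addIntPrefix(const std::string& line) {
    static const std::regex decl(R"re(\bint\s+([A-Za-z_]\w*))re");
    std::string out;
    std::size_t last = 0;
    for (std::sregex_iterator it(line.begin(), line.end(), decl), end; it != end; ++it) {
        const std::smatch& m = *it;
        std::string name = m[1].str();
        const std::size_t pos = static_cast<std::size_t>(m.position(1));
        const std::size_t len = static_cast<std::size_t>(m.length(1));
        out.append(line, last, pos - last);  // copy the text before the identifier
        // Assumption: "n" + uppercase letter means the prefix is already there.
        const bool alreadyPrefixed =
            name.size() > 1 && name[0] == 'n' &&
            std::isupper(static_cast<unsigned char>(name[1]));
        if (!alreadyPrefixed) {
            name[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(name[0])));
            name.insert(name.begin(), 'n');
        }
        out += name;
        last = pos + len;
    }
    out.append(line, last, std::string::npos);  // copy the remainder of the line
    return out;
}
```

With this rule, "int row;" becomes "int nRow;" and "int number;" becomes "int nNumber;", but "int nrow(9);" becomes "int nNrow(9);" because nothing in the text distinguishes nrow from number. A comment such as "// an int value" is rewritten too, and so is "int main(void)".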

Is there a way to say the object only once when I have to use it again and again?

Take for example:
int main(void){
numberComparator comparator1;
comparator1.setA(78.321);
comparator1.showA();
comparator1.setB('c');
comparator1.setB("Yes");
comparator1.setB(124.213);
comparator1.showB();
comparator1.setB(12);
return 0;
}
Instead of saying comparator1 over and over again, can I do something shorter?
I understand that this doesn't really change how the program works, but it would make it easier to test a class I write.
I am doing overloading so that for an assortment of inputs into my comparator, my program can handle them without making the results go crazy. In this case, I want the input to be an int, but what if the input isn't?
The answer could be lying around the internet, but as my title may imply, I do not know how to state the question.
You are looking for something like the with keyword, which is part of, for example, the Pascal language.
Unfortunately, C++ doesn't provide a similar feature. Using a reference, you can shorten the name of the object and somewhat alleviate the pain, i.e.
Comparator comparator1;
...
{
Comparator& cr = comparator1;
cr.a();
cr.b();
cr.c();
}
It depends. If numberComparator has a "fluent" interface, then each member function will return a reference to *this, and you can write:
comparator1
.setA(78.321)
.showA()
.setB('c')
.setB("Yes")
.setB(124.213)
.showB()
.setB(12);
Note that this is a bitch to debug by step-into (you have to step into every function until you get to the one you are interested in).
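For reference, a minimal sketch of what such a fluent class could look like. The class name NumberComparator, the a() and b() getters, and the double-only setters are assumptions for illustration; the asker's real class has more overloads.

```cpp
#include <iostream>

// Hypothetical fluent version of the comparator: every setter and printer
// returns a reference to *this so that calls can be chained.
class NumberComparator {
public:
    NumberComparator& setA(double a) { a_ = a; return *this; }
    NumberComparator& setB(double b) { b_ = b; return *this; }
    NumberComparator& showA() { std::cout << "A = " << a_ << '\n'; return *this; }
    NumberComparator& showB() { std::cout << "B = " << b_ << '\n'; return *this; }
    double a() const { return a_; }
    double b() const { return b_; }

private:
    double a_ = 0.0;
    double b_ = 0.0;
};
```

Usage would then be a single statement such as comparator1.setA(78.321).showA().setB(124.213).showB();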
The alternative of course is "use a shorter name".
int main(void){
numberComparator c1;
c1.setA(78.321);
c1.showA();
c1.setB('c');
c1.setB("Yes");
c1.setB(124.213);
c1.showB();
c1.setB(12);
return 0;
}
There is really no point in having a particularly long name if it is limited in scope to a few lines. For a local variable, if it isn't limited in scope to a few lines, your function is probably too long.

Referenced parameters in C++

This is more of an ethical question.
If I have the following code:
void changeInt(int& value)
{
value = 7;
}
and I do:
int number = 3;
changeInt(number);
number will have the value 7.
I know that when the new stack frame is created for the changeInt function, new variables are created and value will refer to number.
My concern is that a caller who isn't paying attention can be fooled into thinking the argument is passed by value, when in the function's frame a reference is actually created.
I know he can look in the header files and it's a perfectly legitimate expression, but I still find it a bit unethical :)
I think this should somehow be marked and enforced by syntax, like in C#, where you have the ref keyword.
What do you guys think?
This is one of those things where references are less clear than pointers. However, using pointers may lead to something like this:
changeInt(NULL);
when they actually should have done:
changeInt(&number);
which is just as bad. If the function is as clearly named as this, it's hardly a mystery that it actually changes the value passed in.
Another solution is of course to do:
int calculateNewInt(/* may need some input */)
{
return 7;
}
Now
int number = 3;
...
number = calculateNewInt();
is quite obviously (potentially) changing number.
But if the name of the function "sounds like it changes the input value", then it's definitely fair to change the value. If in doubt, read the documentation. If you write code that has local variables that you don't want altered, make them const.
const int number = 3;
changeInt(number); /* compile error: a non-const reference cannot bind to a const int */
(Of course, that means the number is not changeable elsewhere either).
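The three calling styles discussed above can be put side by side in a short sketch. The names changeIntByRef and changeIntByPtr are made up for the comparison, and the null check is one possible policy, not the only one.

```cpp
// Reference parameter: the call site reads changeIntByRef(number),
// with no visual hint that number may be modified.
void changeIntByRef(int& value) { value = 7; }

// Pointer parameter: the call site reads changeIntByPtr(&number), which is
// more explicit, but the function now has to cope with a null pointer.
bool changeIntByPtr(int* value) {
    if (value == nullptr) return false;  // nothing to change
    *value = 7;
    return true;
}

// Return value: the assignment at the call site makes the change obvious.
int calculateNewInt() { return 7; }
```

Only the last style makes the modification visible without reading the declaration, at the cost of being limited to a single result.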
"I know he can look in the header files and it's a perfectly legitimate expression, but I still find it a bit unethical :)"
I think that's perfectly normal and part of the language. Actually, this is one of the bad things about C and C++: you have to check the headers all the time when dealing with an unknown API, since when calling a function you don't pass by reference explicitly.
That's not the case for all systems languages though. IIRC Rust makes it obligatory to pass references explicitly.

How do I treat string variables as actual code?

That probably wasn't very clear. Say I have a char *a = "reg". Now, I want to write a function that, on reading the value of a, instantiates an object of a particular class and names it reg.
So for example, say I have a class Register, and a separate function create(char *). I want something like this:
void create(char *s) //s == "reg"
{
//what goes here?
Register reg; // <- this should be the result
}
It should be reusable:
void create(char *s) //s == "second"
{
//what goes here?
Register second; // <- this should be the result
}
I hope I've made myself clear. Essentially, I want to treat the value in a variable as a separate variable name. Is this even possible in C/C++? If not, anything similar? My current solution is to hash the string, and the hash table would store the relevant Register object at that location, but I figured that was pretty unnecessary.
Thanks!
Variable names are compile-time artifacts. They don't exist at runtime. It doesn't make sense in C++ to create a dynamically-named variable. How would you refer to it?
Let's say you had this hypothetical create function, and wrote code like:
create("reg");
reg.value = 5;
This wouldn't compile, because the compiler doesn't know what reg refers to in the second line.
C++ doesn't have any way to look up variables at runtime, so creating them at runtime is a nonstarter. A hash table is the right solution for this. Store objects in the hash table and look them up by name.
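A minimal sketch of that approach, assuming a Register type with a single value member (the real class is not shown in the question) and a hypothetical RegisterTable wrapper around std::unordered_map:

```cpp
#include <string>
#include <unordered_map>

// Stand-in for the asker's class; the `value` member is an assumption.
struct Register {
    int value = 0;
};

// Hypothetical wrapper: objects are stored by name and looked up at runtime.
class RegisterTable {
public:
    // Creates the register if it does not exist yet and returns it.
    Register& create(const std::string& name) { return table_[name]; }

    // Returns nullptr when no register with that name was created.
    Register* find(const std::string& name) {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, Register> table_;
};
```

The hypothetical create("reg"); reg.value = 5; from above then becomes table.create("reg").value = 5; with the name resolved through the table rather than by the compiler.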
This isn't possible. C++ does not offer any facilities to process code at runtime. Given the nature of a typical C++ implementation (which compiles to machine code ahead of time, losing all information about source code), this isn't even remotely feasible.
Like I said in my comment:
What's the point? A variable name is something the compiler, but -most importantly- you, the programmer, should care about. Once the application is compiled, the variable name could be whatever... it could be mangled and senseless, it doesn't matter anymore.
You read/write code, including var-names. Once compiled, it's down to the hardware to deal with it.
Neither C nor C++ has an eval function
Simply because: you only compile what you need, eval implies input later-on that may make no sense, or require other dependencies.
C/C++ are compiled ahead of time, eval implies evaluation at runtime. The C process would then imply: pre-process, compile and link the string, in such a way that it still is part of the current process...
Even if it were possible, eval is always said to be evil, that goes double for languages like the C family that are meant to run reliably, and are often used for time-critical operations. The right tool for the job and all that...
A HashTable with objects that have hash, key, Register, collision members is the sensible thing to do. It's not that much overhead anyway...
Still feel like you need this?
Look into the vast number of scripting languages that are out there. Perl, Python... They're all better suited to do this type of stuff
If you need some variable creation and lookup you can either:
Use one of the scripting languages, as suggested by others
Make the lookup explicitly, yourself. The simplest approach is by using a map, which would map a string to your register object. And then you can have:
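#include <map>
#include <string>
// Key by std::string: a const char* key would compare pointer addresses, not text.
std::map<std::string, Register*> table;
Register* create(const std::string& name) {
    Register* r = new Register();
    table[name] = r;
    return r;
}
Register* lookup(const std::string& name) {
    // find() does not insert a null entry for unknown names, unlike operator[]
    auto it = table.find(name);
    return it == table.end() ? nullptr : it->second;
}
void destroy(const std::string& name) {
    auto it = table.find(name);
    if (it == table.end()) return;
    delete it->second;
    table.erase(it);
}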
Obviously, each time you want to access a variable created this way, you have to go through the call to lookup.

How local constants are stored in c++ library files

I am writing a library where I need to use some constant integers. I have declared a const int as a local variable in my C function, e.g. const int test = 45325;
Now I want to hide this constant. That is, if I share this library as a .so with someone, they should not be able to find out its value.
Is it possible to hide constant integers defined inside a library? Please help.
Here is my sample code
int doSomething()
{
const int abc = 23456;
int def = abc + 123;
}
doSomething is defined as a local function in my cpp file. I refer to this constant for some calculations inside the same function.
If I understand right, you're not so much worried about an exported symbol (since it's a plain normal local variable, I'd not worry about that anyway), but about anyone finding out that constant at all (probably because it is an encryption key or a magic constant for a license check, or something like that).
This is something that is, in principle, impossible. Someone who has the binary code (which is necessarily the case in a library) can figure it out if he wants to. You can make it somewhat harder by calculating this value in an obscure way (but be aware of compiler optimizations), but even so this only makes it trivially harder for someone who wants to find out. It will just mean that someone won't see "mov eax, 45325" in the disassembly right away, but it probably won't keep someone busy for more than a few minutes either way.
The constant will always be contained in the library in some form, even if it is as instructions to load it into a register, for the simple reason that the library needs it at runtime to work with it.
If this is meant as some sort of a secret key, there is no good way to protect it inside the library (in fact, the harder you make it, the more people will consider it a sport to find it).
The simplest is probably to just do a wrapper class for them
struct Constants
{
static int test();
...
then you can hide the constant in the .cpp file
You can declare it as
extern const int test;
and then have it actually defined in a compilation unit somewhere (.cpp file).
You could also use a function to obtain the value.
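The last two suggestions could be sketched as follows, shown as one block with the header part and the .cpp part separated by comments. The names test_value and constants.h are assumptions for illustration. Note that this only keeps the value out of the header; as explained in the first answer, it is still present in the compiled library.

```cpp
// --- would go in the public header (e.g. constants.h, hypothetical name) ---
extern const int test_value;  // declaration only, the initializer is not visible

struct Constants {
    static int test();        // accessor; the body lives in the .cpp file
};

// --- would go in one .cpp file inside the library ---
const int test_value = 45325;

int Constants::test() { return 45325; }
```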

finding a function name and counting its LOC

So you know off the bat, this is a project I've been assigned. I'm not looking for an answer in code, but more a direction.
What I've been told to do is go through a file and count the actual lines of code while at the same time recording the function names and individual lines of code for the functions. The problem I am having is determining a way when reading from the file to determine if the line is the start of a function.
So far, I can only think of having a string array of data types (int, double, char, etc.), searching for one of those in the line, then searching for the parenthesis, and then checking for the absence of a semicolon (so I know it isn't just the declaration of the function).
So my question is, is this how I should go about this, or are there other methods in which you would recommend?
The code in which I will be counting will be in C++.
Three approaches come to mind.
1. Use regular expressions. This is fairly similar to what you're thinking of. Look for lines that look like function definitions. This is fairly quick to do, but can go wrong in many ways.
char *s = "int main() {"
is not a function definition, but sure looks like one.
char
* /* eh? */
s
(
int /* comment? // */ a
)
// hello, world /* of confusion
{
is a function definition, but doesn't look like one.
Good: quick to write, can work even in the face of syntax errors. Bad: can easily misfire on things that look like (or fail to look like) the "normal" case.
Variant: First run the code through, e.g., GNU indent. This will take care of some (but not all) of the misfires.
2. Use a proper lexer and parser. This is a much more thorough approach, and you may be able to reuse an open source lexer/parser (e.g., from gcc).
Good: Will be 100% accurate (will never misfire). Bad: One missing semicolon and it spews errors.
3. See if your compiler has some debug output that might help. This is a variant of (2), but using your compiler's lexer/parser instead of your own.
Your idea can work in 99% (or more) of the cases. Only a real C++ compiler can do 100%, in which case I'd compile with debug information (g++ -g -S prog.cpp), and get the function names and line numbers from the debug information in the assembly output (prog.s).
My thoughts for the 99% solution:
Ignore comments and strings.
Document that you ignore preprocessor directives (#include, #define, #if).
Anything between a toplevel { and } is a function body, except after typedef, class, struct, union, namespace and enum.
If you have a class, struct or union, you should be looking for method bodies inside it.
The function name is sometimes tricky to find, e.g. in long (*f(int))(char); .
Make sure your parser works with template functions and template classes.
For recording function names I use PCRE and the regex
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
and then filter out names like "if", "while", "do", "for", "switch". Note that the function name is (\w+), group 1.
Of course it's not a perfect solution but a good one.
I feel that doing the parsing manually is going to be quite a difficult task. I would probably use an existing tool such as RSM, redirect the output to a CSV file (assuming you are on Windows), and then parse the CSV file to gather the required information.
Find a decent SLOC count program, e.g., SLOCCounter. Not only can you count SLOC, but you have something against which to compare your results. (Update: here's a long list of them.)
Interestingly, the number of non-comment semicolons in a C/C++ program is a decent SLOC count.
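A minimal sketch of that count, assuming a hypothetical helper countCodeSemicolons that skips comments, string literals, and character literals. Raw string literals, digit separators, and backslash line continuations are deliberately not handled.

```cpp
#include <cstddef>
#include <string>

// Counts semicolons that appear outside comments, string literals,
// and character literals, using a small state machine.
std::size_t countCodeSemicolons(const std::string& src) {
    enum class State { Code, LineComment, BlockComment, String, Char };
    State state = State::Code;
    std::size_t count = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
        switch (state) {
        case State::Code:
            if (c == '/' && next == '/') { state = State::LineComment; ++i; }
            else if (c == '/' && next == '*') { state = State::BlockComment; ++i; }
            else if (c == '"') state = State::String;
            else if (c == '\'') state = State::Char;
            else if (c == ';') ++count;
            break;
        case State::LineComment:
            if (c == '\n') state = State::Code;
            break;
        case State::BlockComment:
            if (c == '*' && next == '/') { state = State::Code; ++i; }
            break;
        case State::String:
            if (c == '\\') ++i;                    // skip the escaped character
            else if (c == '"') state = State::Code;
            break;
        case State::Char:
            if (c == '\\') ++i;                    // skip the escaped character
            else if (c == '\'') state = State::Code;
            break;
        }
    }
    return count;
}
```

A for (;;) header contributes two semicolons, which is one reason the result is an approximation of SLOC rather than an exact figure.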
How about writing a shell script to do this? An AWK program perhaps.