How to count all global variables in a cpp file

Is there a C++ code parser that can solve this problem? For example:
// B.cpp : Defines the entry point for the console application.
//
#include<iostream>
#include<vector>
#include<algorithm>
size_t N,M;
const size_t MAXN = 40000;
std::vector<std::pair<size_t,size_t> > graph[MAXN],query[MAXN],qr;
size_t p[MAXN], ancestor[MAXN];
bool u[MAXN];
size_t ansv[MAXN];
size_t cost[MAXN];
size_t find_set(size_t x){
return x == p[x] ? x : p[x] = find_set(p[x]);
}
void unite(size_t a, size_t b, size_t new_ancestor){
}
void dfs(size_t v,size_t ct){
}
int main(int argc, char* argv[]){
return 0;
}
This file has 10 global variables : ancestor, ansv, cost, graph, M, N, p, qr, query, u

You could invoke the compiler and count the exported global variables with the following shell command:
$ g++ -O0 -c B.cpp && nm B.o | grep ' B ' | wc -l
10
If you drop the line count, you get their names:
$ g++ -O0 -c B.cpp && nm B.o | egrep ' [A-Z] ' | egrep -v ' [UTW] '
00000004 B M
00000000 B N
00111740 B ancestor
00142480 B ansv
00169580 B cost
00000020 B graph
000ea640 B p
000ea620 B qr
00075320 B query
00138840 B u
Let's see how this works.
g++ -O0 -c B.cpp: Invokes the compiler without optimizations, so that the output (B.o by default) is essentially the compiled file with no identifiers removed.
nm B.o: Runs nm, a tool that (quoting its documentation) "list[s] symbols from object files". For example, if "the symbol is in the uninitialized data section", it is marked with a "B".
We want global symbols (an uppercase type letter) but not U (undefined), T (code in the text section) or W (weak symbols). This is what the two egrep filters in the second command do.

Related

sed command line to change "aas=aas" into "aas = aas"

A question for the regular expression / sed experts out there:
I need to beautify some C++ code.
The code is littered with various versions of the assignment operator with different kinds of spacing.
i.e.
a=b
a =b
a= B
a = b
a= b
A = B // the correct format, which sed must leave untouched
There should be exactly one space on each side of the =. Any extra spaces must be removed.
I need a script that scans all files in a folder and its subfolders and does the search and replace as needed.
There are some variations, such as a+=b.
I run OS X but also have Linux and Windows machines available.
Any help is much appreciated.
You can use this sed to insert a single space before and after all = operators:
Input file:
cat file
a ==b
a=b
a =b
a/=b
a *=b
a+= b
a-= b
a= B
a%= B
a = b
a= b
A = B
sed command:
sed -E 's~[[:blank:]]*([-+*/%=]?=)[[:blank:]]*~ \1 ~g' file
a == b
a = b
a = b
a /= b
a *= b
a += b
a -= b
a = B
a %= B
a = b
a = b
A = B
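This is the regex used for matching (with ~ as the delimiter):
~[[:blank:]]*([-+*/%=]?=)[[:blank:]]*~ matches zero or more blanks, then an optional one of the characters - + * / % = immediately before a literal =, then zero or more blanks. The operator is captured in group #1. Note that !, < and > are not in the character class, so !=, <= and >= would be split (a!=b becomes a! = b); the class needs extending if your code uses those operators.
This is the pattern used in the replacement:
~ \1 ~ which means a space before and after the string captured in group #1.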
You may be interested in doing it with Perl.
A simple file.cpp:
#include <iostream>
int main(){
int i = 3;
i += 3;
i-=3;
i * = 3; // not valid C++, included just to be sure
i/=3;
int i2
=
3;
if( i
=
= i2 ){} // not valid C++, included just to be sure
}
perl -lpe '$/=undef;s/\s*([=!%\*\/+-])?\s*(=)\s*/ $1$2 /g' file.cpp
the output:
#include <iostream>
int main(){
int i = 3;
i += 3;
i -= 3;
i *= 3;
i /= 3;
int i2 = 3;
if( i == i2 ){}
}

Calculate MD5 of a string in C++

I have a nice example of memory mapped files that calculates the MD5 hash of a file. That works fine with no problems.
I would like to change it to calculate the MD5 hash of a string.
So the example is:
(Add #include <openssl/md5.h> to run this code, plus the Boost headers if you want to run the version that reads a file.)
unsigned char result[MD5_DIGEST_LENGTH];
boost::iostreams::mapped_file_source src(path);
MD5((unsigned char*)src.data(), src.size(), result);
std::ostringstream sout;
sout<<std::hex<<std::setfill('0');
for(long long c: result)
{
sout<<std::setw(2)<<(long long)c;
}
return sout.str();
The change I made is:
std::string str("Hello");
unsigned char result[MD5_DIGEST_LENGTH];
MD5((unsigned char*)str.c_str(), str.size(), result);
std::ostringstream sout;
sout<<std::hex<<std::setfill('0');
for(long long c: result)
{
sout<<std::setw(2)<<(long long)c;
}
return sout.str();
But this produces the result:
8b1a9953c4611296a827abf8c47804d7
While the command $ md5sum <<< Hello gives the result:
09f7e02f1290be211da707a266f153b3
Why don't the results agree? Which one is wrong?
Thanks.
EDIT:
The accepted answer below is right. The correct way to call md5sum from the terminal is:
$ printf '%s' "Hello" | md5sum
This keeps the trailing newline out of the input.
You are passing a final newline to the md5sum program, but not to your code.
You can see that the bash <<< operator adds a newline:
$ od -ta <<<Hello
0000000 H e l l o nl
0000006
To avoid this, use printf:
$ printf '%s' Hello | od -ta
0000000 H e l l o
0000005
$ printf '%s' Hello | md5sum
8b1a9953c4611296a827abf8c47804d7 -
Alternatively, you could include a newline in your program version:
std::string str("Hello\n");

$ symbol in c++

I read the following code from an open source library. What confuses me is the usage of dollar sign. Can anyone please clarify the meaning of $ in the code. Your help is greatly appreciated!
__forceinline MutexActive( void ) : $lock(LOCK_IS_FREE) {}
void lock ( void );
__forceinline void unlock( void ) {
__memory_barrier(); // compiler must not schedule loads and stores around this point
$lock = LOCK_IS_FREE;
}
protected:
enum ${ LOCK_IS_FREE = 0, LOCK_IS_TAKEN = 1 };
Atomic $lock;
There is a gcc switch, -fdollars-in-identifiers, which explicitly allows $ in identifiers.
Perhaps they enable it and use the $ as something that is highly unlikely to clash with normal names.
-fdollars-in-identifiers
Accept $ in identifiers. You can also explicitly prohibit use of $ with the option -fno-dollars-in-identifiers. (GNU C allows $ by
default on most target systems, but there are a few exceptions.)
Traditional C allowed the character $ to form part of identifiers.
However, ISO C and C++ forbid $ in identifiers.
See the gcc documentation. Hopefully the link stays good.
It is being used as part of an identifier.
[C++11: 2.11/1] defines an identifier as "an arbitrarily long sequence of letters and digits." It defines "letters and digits" in a grammar given immediately above, which names only numeric digits, lower- and upper-case roman letters, and the underscore character explicitly, but does also allow "other implementation-defined characters", of which this is presumably one.
In this scenario the $ has no special meaning other than as part of an identifier — in this case, the name of a variable. There is no special significance with it being at the start of the variable name.
Even if the dollar sign is not valid in identifiers according to the standard, a compiler may accept it. For example, Visual Studio (and I think gcc too, but I'm not sure about that) seems to accept it.
Check this doc : http://msdn.microsoft.com/en-us/library/565w213d(v=vs.80).aspx
and this : Are dollar-signs allowed in identifiers in C++03?
The C++ standard says:
The basic source character set consists of 96 characters: the space
character, the control characters representing horizontal tab,
vertical tab, form feed, and new-line, plus the following 91 graphical
characters: a b c d e f g h i j k l m n o p q r s t u v w x y z A B C
D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
There is no $ in the basic source character set described above; the $ character in your code is an extension to the basic source character set, which implementations are not required to support. Consider Britain, for example, where the pound symbol (£ or ₤) is used in place of the dollar symbol ($).

Is it possible to print only user defined variables by NM?

There are a lot of system variables in the output of nm. It looks like this:
N
_CRT_MT
_CRT_fmode
_CRT_glob
Dictionary::variable4
namespace1::variable1
__cpu_features
__crt_xc_end__
__crt_xc_start__
__crt_xi_end__
__crt_xi_start__
__crt_xl_start__
__crt_xp_end__
__crt_xp_start__
__crt_xt_end__
__crt_xt_start__
__tls_end__
__tls_start__
__xl_a
__xl_c
__xl_d
__xl_z
_argc
_argv
_bss_end__
_bss_start__
_data_end__
_data_start__
_end__
_fmode
_tls_end
_tls_index
_tls_start
_tls_used
mingw_initltsdrot_force
mingw_initltsdyn_force
mingw_initltssuo_force
variable0
variable10
Is it possible to print only user-defined variables, in this case variable10, variable0, namespace1::variable1, Dictionary::variable4 and N?
Not that I know of. But at least you can safely filter all variables starting with double underscore or underscore + uppercase letter, as these are reserved for the implementation:
$ nm -j foo | grep -v '^_[A-Z]\|^__\+[A-Za-z]'
N
Dictionary::variable4
namespace1::variable1
_argc
_argv
_bss_end__
_bss_start__
_data_end__
_data_start__
_end__
_fmode
_tls_end
_tls_index
_tls_start
_tls_used
mingw_initltsdrot_force
mingw_initltsdyn_force
mingw_initltssuo_force
variable0
variable10
You can probably filter more by adding additional patterns that reliably denote implementation-defined identifiers.
Alternatively, create an empty executable (i.e. one which contains no user-defined symbols) and compute the difference of the output of nm on each executable via comm:
$ # Preparation
$ echo 'int main() { }' > mt.cpp
$ g++ -o mt.out mt.cpp
$ nm -j mt.out > mt.symbols
$
$ nm -j your_exe > your_exe.symbols
$ comm -23 your_exe.symbols mt.symbols
N
Dictionary::variable4
namespace1::variable1
variable0
variable10

nested classes in GDB

In my C++ program I have a nested class defined as follows:
class A {
class B {
// ...
};
// ...
};
When I try casting a pointer in GDB like this:
set $b = (A::B*)p
I get "A syntax error in expression".
I'm not familiar with the symbol (or debugging) information stored in the ELF files. I'm wondering what's wrong with my casting here and how to refer to a nested class in GDB.
The answer is to enclose the class name in single quotes:
set $b = ('A::B'*)p
See http://sourceware.org/bugzilla/show_bug.cgi?id=8693
Works for me (using current CVS GDB, as well as 7.3.1):
$ cat t.cc
struct A {
struct B {
int x;
};
int y;
};
int main()
{
A::B ab, *p = &ab;
return 0;
}
$ gcc -g t.cc && gdb -q ./a.out
(gdb) b main
Breakpoint 1 at 0x4005b8: file t.cc, line 10.
(gdb) r
Breakpoint 1, main () at t.cc:10
10 A::B ab, *p = &ab;
(gdb) p (A::B*)0x1
$1 = (A::B *) 0x1
(gdb) set $a = (A::B*)0x1
(gdb) p $a
$2 = (A::B *) 0x1
(gdb) quit