I need to figure out what this obfuscated C++ code (written by someone else) does. I've figured out pretty much everything except one tricky part:
bool part1(char *flag)
{
int *t = (int *) memfrob(flag, 8);
unsigned int b[] = {3164519328, 2997125270};
for (int i = 0; i < 2; b[i] = ~b[i], ++i);
return !(0<:t:>-0<:b:>+1<:t:>-1<:b:>);
}
What is going on in the return statement of this function? I have no idea what these colons mean...
I've tried googling what the colon operator does in C++, but found only answers about class constructors and the conditional expression, neither of which seems relevant to this problem.
The code is making use of two-letter alternative tokens, also known as "digraphs". Specifically, <: is [, and :> is ].
So syntax like 0<:t:> is just 0[t]. Because a[b] is defined as *(a + b) and addition is commutative, the array and the subscript can be swapped, so this is just t[0].
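A minimal sketch that makes the equivalence visible: the three functions below (hypothetical names, written only for this illustration) read the same element using the digraph spelling, the swapped-subscript spelling, and the ordinary spelling.

```cpp
// Three spellings of the same element access. <: and :> are the standard
// digraphs for [ and ], and a[b] is defined as *(a + b), so 1[t] == t[1].
int read_digraph(const int *t) { return 1<:t:>; }  // digraph spelling of 1[t]
int read_swapped(const int *t) { return 1[t]; }    // *(1 + t)
int read_normal(const int *t)  { return t[1]; }    // *(t + 1)
```

All three compile to the same load; only the spelling differs.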
A great tool for deobfuscating code like this is cppinsights.io. Running the function through it shows that the code is just doing some arithmetic on the array values (you can ignore the static_cast it inserts; it's not important for understanding the transformation).
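To make the decoded logic concrete, here is a hand-deobfuscated sketch of part1(). It assumes a 32-bit int and does by hand what memfrob() (a GNU extension) does, XOR every byte with 42, but on a local copy rather than in place; the name part1_readable is made up for illustration. Note that, like the original expression, it tests whether the sum of the two differences is zero modulo 2^32, which is not strictly the same as requiring t[0] == b[0] and t[1] == b[1].

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical readable version of part1() (assumption: 32-bit int).
// Unlike the original, it does not modify `flag`.
bool part1_readable(const char *flag) {
    unsigned char bytes[8];
    for (int i = 0; i < 8; ++i)
        bytes[i] = static_cast<unsigned char>(flag[i]) ^ 42;  // what memfrob does

    // Reinterpret the 8 bytes as two 32-bit words, as the (int *) cast does,
    // but via memcpy to avoid alignment and aliasing problems.
    std::uint32_t t[2];
    std::memcpy(t, bytes, sizeof t);

    // The original loop flips every bit of both constants: b[i] = ~b[i].
    const std::uint32_t b[2] = {~3164519328u, ~2997125270u};

    // 0<:t:> is t[0], 1<:b:> is b[1], and so on. Mixing int and unsigned in the
    // original makes this unsigned (wrap-around) arithmetic; "!(x)" is "x == 0".
    return t[0] - b[0] + t[1] - b[1] == 0;
}
```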
I am trying to write a lexer. When I try to copy the digit buffer into an array of char, I get a "core dumped" error, although I have done the same thing with identifiers without any error.
#include<fstream>
#include<iostream>
#include<cctype>
#include <cstring>
#include<typeinfo>
using namespace std;
int isKeyword(char buffer[]){
char keywords[22][10] = {"break","case","char","const","continue","default", "switch",
"do","double","else","float","for","if","int","long","return","short",
"sizeof","struct","void","while","main"};
int i, flag = 0;
for(i = 0; i < 22; ++i){
if(strcmp(keywords[i], buffer) == 0)
{
flag = 1;
break;
}
}
return flag;
}
int isSymbol_Punct(char word)
{
int flag = 0;
char symbols_punct[] = {'<','>','!','+','-','*','/','%','=',';','(',')','{', '}','.'};
for(int x= 0; x< 15; ++x)
{
if(word==symbols_punct[x])
{
flag = 1;
break;
}
}
return flag;
}
int main()
{
char buffer[15],buffer1[15];
char identifier[30][10];
char number[30][10];
memset(&identifier[0], '\0', sizeof(identifier));
memset(&number[0], '\0', sizeof(number));
char word;
ifstream fin("program.txt");
if(!fin.is_open())
{
cout<<"Error while opening the file"<<endl;
}
int i,k,j,l=0;
while (!fin.eof())
{
word = fin.get();
if(isSymbol_Punct(word)==1)
{
cout<<"<"<<word<<", Symbol/Punctuation>"<<endl;
}
if(isalpha(word))
{
buffer[j++] = word;
// cout<<"buffer: "<<buffer<<endl;
}
else if((word == ' ' || word == '\n' || isSymbol_Punct(word)==1) && (j != 0))
{
buffer[j] = '\0';
j = 0;
if(isKeyword(buffer) == 1)
cout<<"<"<<buffer<<", keyword>"<<endl;
else
{
cout<<"<"<<buffer<<", identifier>"<<endl;
strcpy(identifier[i],buffer);
i++;
}
}
else if(isdigit(word))
{
buffer1[l++] = word;
cout<<"buffer: "<<buffer1<<endl;
}
else if((word == ' ' || word == '\n' || isSymbol_Punct(word)==1) && (l != 0))
{
buffer1[l] = '\0';
l = 0;
cout<<"<"<<buffer1<<", number>"<<endl;
// cout << "Type is: "<<typeid(buffer1).name() << endl;
strcpy(number[k],buffer1);
k++;
}
}
cout<<"Identifier Table"<<endl;
int z=0;
while(strcmp(identifier[z],"\0")!=0)
{
cout <<z<<"\t\t"<< identifier[z]<<endl;
z++;
}
// cout<<"Number Table"<<endl;
// int y=0;
// while(strcmp(number[y],"\0")!=0)
// {
// cout <<y<<"\t\t"<< number[y]<<endl;
// y++;
// }
}
I get this error when I copy buffer1 into number[k] using strcpy, and I don't understand why it isn't being copied. When I printed the type of buffer1 (to check whether strcpy was the problem), I got A_15; I searched for it but did not find any relevant information.
The reason is this line in main():
int i,k,j,l=0;
You might think that this initializes i, j, k, and l to 0, but in fact it only initializes l to 0. i, j, and k are declared here, but not initialized to anything. As a result, they contain random garbage, so if you use them as array indices you are likely to end up overshooting the bounds of the array in question.
At that point, anything could happen—in other words, this is undefined behavior. One likely outcome, which is probably happening to you, is that your program tries to access memory that hasn't been assigned to it by the operating system, at which point it crashes (a segmentation fault).
To give a concrete demonstration of what I mean, consider the following program:
#include <iostream>
#include <string>
void print_var(std::string name, int v)
{
std::cout << name << ": " << v << "\n";
}
int main(void)
{
int i, j, k, l = 0;
print_var("i", i); // i, j and k are read uninitialized (undefined behavior)
print_var("j", j);
print_var("k", k);
print_var("l", l);
return 0;
}
When I ran this, I got the following:
i: 32765
j: -113535829
k: 21934
l: 0
As you can see, i, j, and k all came out such that using them as indices into any of the arrays you declared would exceed their bounds. Unless you are very lucky, this will happen to you, too.
You can fix this by initializing each variable separately:
int i = 0;
int j = 0;
int k = 0;
int l = 0;
Initializing each on its own line makes the initializations easier to see, helping to prevent mistakes.
A few side notes:
I was able to spot this issue immediately because I have my development environment configured to flag lines that provoke compiler warnings. Using a variable before it has been initialized should provoke such a warning with any reasonable compiler, so you can fix problems like this as you run into them. Your development environment may support the same feature (and if it doesn't, you might consider switching to one that does). If nothing else, you can turn on warnings during compilation (by passing -Wall -Wextra to your compiler or the like; check its documentation for the specifics).
Since you declared your indices as int, they are signed integers, which means they can hold negative values (as j did in my demonstration). If you index into an array with a negative index, you end up accessing memory "before" the start of the array, so you will be in trouble even with an index of -1 (remember that when you subscript a C-style array, it decays to a pointer to its first element, and a[i] is just *(a + i)). Also, int probably has only 32 bits in your environment, so if you're writing 64-bit code, it's possible to define arrays too large for an int to fully cover, even if you were to index into the array from the middle. For these sorts of reasons, it's generally a good idea to type raw array indices as std::size_t, which is unsigned and can represent the size of the largest possible array in your target environment.
You describe this as C++ code, but I don't see much C++ here aside from the I/O streams. C++ has a lot of amenities that can help you guard against bugs compared to C-style code (which has to be written with great care). For example, you could replace your C-style arrays here with instances of std::array, which has a member function at() that does subscripting with bounds checking; that would have thrown a helpful exception in this case instead of letting your program segfault. Also, it doesn't seem like you have a particular need for fixed-size arrays in this case, so you may be better off using std::vector; it automatically grows to accommodate new elements, helping you avoid writing outside its bounds. Both support range-based for loops, which save you from needing to deal with indices by hand at all. You might enjoy Bjarne's A Tour of C++, which gives a nice overview of idiomatic C++ and will make all the woolly reference material easier to parse. (And if you want to pick up some nice C habits, both K&R and Kernighan and Pike's The Practice of Programming can save you much pain and tears.)
Some general hints that might help you avoid this kind of crash entirely, by design:
As this is C++, you should rely on established C++ data types and idioms as far as possible. I know that parts of parser/lexer writing can become quite low-level, but at least for what you want to achieve here, you should take advantage of them. Avoid plain arrays as far as possible; use std::vector of uint8_t and/or std::string, for instance.
Related to the previous point, and a consequence of it: always iterate with checked bounds! You don't need to try to outsmart your compiler's optimizer, at least not here. In general, avoid duplicating container size information (like the hard-coded 22 and 15 in your loops); the C++ containers mentioned above already carry their size with them. If that isn't possible in some rare case, use a named constant declared right at the definition/initialization of the data.
Give your variables meaningful names, and declare them as close as possible to where they are used.
isXXX functions (at least yours) should return bool: you never return anything other than 0 or 1.
A personal recommendation that is a bit controversial as a general rule: use early returns and abort conditions! Even when your check for file-opening errors fails, your program carries on.
Try to keep your functions small and free of boilerplate, and use separate functions for distinct sub-tasks!
Try to avoid using namespace std at global scope! Even without exotic build schemes like unity builds, this can become very error-prone in larger projects.
The arrays keywords and symbols_punct should at least be static const. The optimizer can probably work that out on its own, but the qualifiers help you (and other readers) understand the code quickly. Try to use classes here to group the things that belong together in a readable, adaptable, easily modifiable and reusable way. Always keep in mind that you, and maybe other developers, might want to understand your code months from now.
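Putting several of these hints together, the two classification helpers from the question might look like this. It is a sketch assuming C++17 (for std::string_view); isSymbolPunct is simply a renamed isSymbol_Punct, and this is one reasonable design among several. A char buffer like the question's still converts to std::string_view automatically.

```cpp
#include <string_view>

// bool results, static tables, and no hand-counted sizes like 22 or 15.
bool isKeyword(std::string_view word) {
    static constexpr std::string_view keywords[] = {
        "break", "case", "char", "const", "continue", "default", "switch",
        "do", "double", "else", "float", "for", "if", "int", "long",
        "return", "short", "sizeof", "struct", "void", "while", "main"};
    for (std::string_view keyword : keywords)  // the array knows its own length
        if (keyword == word)
            return true;  // early return once found
    return false;
}

bool isSymbolPunct(char c) {
    static constexpr std::string_view symbols = "<>!+-*/%=;(){}.";
    return symbols.find(c) != std::string_view::npos;
}
```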
In Java you can do something like:
Scanner sc = new Scanner(System.in);
int total = 0;
for(int i = 0; i<something;i++){
total+=sc.nextInt(); // <<< Doesn't require an extra variable
}
And my question is: can you do something similar in C or C++? And if so, is it better?
This is what I currently do:
int total = 0;
int aux; // <<< Need an extra variable to read input
for(int i = 0; i<something;i++){
scanf("%d",&aux);
total+=aux; // <<< and add the read value here
}
The obvious way to do it in C++ would be something like this:
int total = std::accumulate(std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
0);
As it stands, this reads all the ints it can from the input file rather than requiring a separate specification of the number of input values. You could specify an N if you wanted to badly enough, but at least in my experience, you're not very likely to want that.
If you really want to specify N directly, the cleanest way to handle the situation would probably be to define an accumulate_n that works about like std::accumulate:
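#include <cstddef>
template <class InIt, class T>
T accumulate_n(InIt in, std::size_t n, T init) {
if (n == 0)
return init;
init += *in; // an istream_iterator has already read the first value
for (std::size_t i = 1; i < n; ++i)
init += *++in; // advance, then read: never reads (or blocks on) an extra value
return init;
}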
You'd use this about like the previous version, but (obviously enough) specifying the number of values to read:
int total = accumulate_n(std::istream_iterator<int>(std::cin),
something,
0);
I suppose I should add that (especially for production code) you'd probably want to add some constraints on the template parameters in the accumulate_n definition above. I also haven't tried to do anything about the possibility of bad input, such as containing something other than a number, or simply containing fewer items than specified. These can be dealt with, but offhand I don't remember how Java deals with them; I'd probably have to do some thinking/research to find/figure out exactly what reaction to such problems would be most appropriate.
Reading a sequence of values from an input stream in C++ usually looks like this (adapted to your sample):
int total = 0;
int aux;
while(std::cin >> aux) {
// break on 'something' condition
total += aux;
}
So I don't see a way to do it without an auxiliary variable that actually receives the value read from the std::istream, unless you write a wrapper class yourself that provides that Java-like behavior.
"can you do something similar in C or C++ ? and if there is, is it better?"
I doubt it's worth writing such a wrapper class for std::istream in C++ (you could simply use the std::accumulate() algorithm, as mentioned in @JerryCoffin's answer).
For C, there's no alternative that I can see or know of.
I came across a statement that I didn't understand. Can anyone please explain it to me?
It is a C++ program to sort data.
#define PRINT(DATA,N) for(int i=0; i<N; i++) { cout<<"["<<i<<"]"<<DATA[i]<<endl; } cout<<endl;
Also, when I tried to rearrange the statement into the format below, I got a compilation error!
#define PRINT(DATA,N)
for(int i=0; i<N; i++)
{
cout<<"["<<i<<"]"<<DATA[i]<<endl;
}
cout<<endl;
It's a macro: each time you write PRINT(DATA,N), the preprocessor replaces it with the entire for loop (including the loop variable i), with your arguments substituted for DATA and N.
You're missing a \ at the end of each line (except the last). It tells the preprocessor that the macro continues on the next line. (Look at "Multi-statement Macros in C++".)
If you use a macro, put parentheses around its parameters: (DATA) and (N). The substitution is purely textual, so without them an argument such as buf + 1 for DATA would expand to buf + 1[i], which is not what you meant.
Don't use a macro unless you REALLY must; many problems can arise from it (it doesn't respect scope, and so on). You can write an inline function instead, or use std::copy_n as Nawaz proposed.
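Putting these points together, a multi-line version of the question's macro might look like the sketch below. The do { ... } while (0) wrapper is the usual idiom for multi-statement macros and is an addition, not part of the original macro; it makes PRINT(...); behave like a single statement, for example in an if/else without braces.

```cpp
#include <iostream>

// Every line except the last ends in a backslash; parameters are parenthesized.
#define PRINT(DATA, N)                                              \
    do {                                                            \
        for (int i = 0; i < (N); i++) {                             \
            std::cout << "[" << i << "]" << (DATA)[i] << std::endl; \
        }                                                           \
        std::cout << std::endl;                                     \
    } while (0)
```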
It can be used if you define it properly. But... just because it can be used does not mean that it should be used.
Use std::copy_n:
std::copy_n(data, n, std::ostream_iterator<X>(std::cout, " "));
That will print all the n items from data to the stdout, each separated by a space. Note that in the above code, X is the type of data[i].
Or write a proper function (not macro) to print in your own defined format. Preferably a function template with begin and end as function parameters. Have a look at how algorithms from the Standard library work and are implemented. That will help you to come up with a good generic design of your code. Explore and experiment with the library generic functions!
This isn't something you want to use a macro for.
Write a template function that does the exact same thing:
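#include <cstddef>
#include <iostream>
template<typename T>
void PRINT(const T &data, std::size_t n){
for (std::size_t i=0;i<n;++i)
std::cout << "["<<i<<"]"<<data[i]<<std::endl;
std::cout << std::endl; // the macro also ends with a blank line
}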
You should really avoid using macros. The only case where I find you NEED macros is when you need the name of the input (as a string) or the location (__LINE__ or __FILE__), e.g.:
#define OUT(x) #x<<"="<<x<<"; "
#define DEB std::cerr<<"In "<<__FILE__<<":"<<__LINE__<<": "
for use in printing like this:
DEB << OUT(i)<<OUT(val[i])<<OUT(some_func(val[i],3))<<endl;
Which will print
In file.cc:153: i=4; val[i]=10; some_func(val[i],3)=4.32;
This is functionality you can't get without macros. Anything you CAN do without macros, you SHOULD.
I want to generate some variables using a for loop. Look at the code below:
for (char ch='a'; ch<='z'; ch++)
int ch=0;
It's just an example: after running the code above, I want to have int a, int b, int c, ...
Another example:
for (int i=0; i<10; i++)
int NewiEnd=0;
After running the code above, we would have int New1End, int New2End, etc.
I hope I'm clear enough. How can I do such a thing in C++?
No, not possible, not exactly. However, this is possible:
std::map<char,int> vars;
for (char ch='a'; ch<='z'; ch++)
vars[ch] = 0;
std::cout << vars['a'] << vars['b'] << vars['c'];
You can also have std::map<std::string, int>.
std::map<std::string,int> vars;
for (int i=0; i<10; i++)
vars["New" + std::to_string(i) + "End"] = 0;
std::cout << vars["New5End"];
What you're trying to do isn't possible in C or C++.
What you seem to want is a map of the type:
std::map<std::string, int> ints;
This will let you call "variables" by name:
ints["a"] = 0;
ints["myVariable"] = 10;
Or as given in your example:
std::map<char, int> ints;
for (char ch='a'; ch<='z'; ch++)
ints[ch] = 0;
If you are only going to use 'a' to 'z', you could use an array of ints:
int ints['z' + 1];
ints['a'] = 0;
ints['z'] = 0;
But this allocates unnecessary space for the ASCII characters below 'a'.
In C/C++ the variable names have "gone away" by the time the code has been compiled and run. You can't print out the name of an existing variable at run time via "reflection"...much less make new named variables. People looking for this feature find out that the only general way to do it comes down to using the preprocessor:
generic way to print out variable name in c++
The preprocessor could theoretically be applied to your problem as well, with certain constraints:
Writing a while loop in the C preprocessor
But anyone reading your code would probably drive a stake through your heart, and be justified in doing so. Both Sunday-morning laziness and a strong belief that it's not what you (should) want lead me not to try to write a working example. :-)
(For the curious: the preprocessor is not Turing-complete, although there are some "interesting" experiments.)
The nature of C/C++ is to have you build up named tables on an as-needed basis. The languages that offer this feature by default make you pay for the runtime tracking of names whether you wind up using reflection or not, and that's not in the spirit of this particular compiled language. Others have given you answers that are more on the right track.