Clang AST find only syntax errors - c++

I'm using Clang to create some internal static code analyzers. For one of the analyzers, we need to take a raw string and check if it has any syntax errors.
We shouldn't treat missing symbols, missing headers, invalid function calls, etc. as invalid syntax, since the only goal is to check whether the input is syntactically valid C/C++ code.
I thought initially that I could do it with ASTUnit:
auto AST = tooling::buildASTFromCodeWithArgs(MyCode,
    Args,
    "input.cc",
    "clang-tool",
    std::make_shared<PCHContainerOperations>(),
    tooling::getClangStripDependencyFileAdjuster(),
    tooling::FileContentMappings(),
    &DiagConsumer);
llvm::outs() << "hasUncompilableErrorOccurred " << AST->getDiagnostics().hasUncompilableErrorOccurred() << "\n";
llvm::outs() << "hasUnrecoverableErrorOccurred " << AST->getDiagnostics().hasUnrecoverableErrorOccurred() << "\n";
llvm::outs() << "hasErrorOccurred " << AST->getDiagnostics().hasErrorOccurred() << "\n";
Given two inputs, Hello world and #include <undefined.h>, all three calls above print 1 for both, even though #include <undefined.h> is a syntactically correct C directive; its only problem (unlike Hello world, which is not valid C code) is that undefined.h is missing. Similarly, the input int* p = malloc(sizeof(int)); produces an error in all of these calls if stdlib.h wasn't included.
I want to ignore such errors, so that every case except Hello world is considered valid code.
I also tried iterating over the tokens with a raw Lexer, but it doesn't give me enough information:
Lexer Lex(CharRange.getBegin(), PP->getLangOpts(), Text.data(),
          Text.data(), Text.data() + Text.size());
Token RawTok;
do {
    Lex.LexFromRawLexer(RawTok);
    llvm::outs() << "\t- " << RawTok.getKind() << "\n";
} while (RawTok.isNot(tok::eof));
I'd love to get any suggestions!

Related

if-statement on different typeid's

I am currently working on some code and I am trying to use an if statement on a variable that was read from a .txt file into a std::string. It's supposed to look like
if (a.variable == "string") {}
When I use
std::cout << a.variable << std::endl;
std::cout << "string" << std::endl;
I get the same results but when using
std::cout << typeid(a.variable).name() << std::endl;
std::cout << typeid("string").name() << std::endl;
I get different results:
NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
and
A5_c.
Could this be the reason why the if-statement failed? Unless I am incorrect, the first typeid stands for a basic string.
I am grateful for any input!
The code I use for reading it looks like:
std::string::size_type beginoption = section.find("=",position);
beginoption = beginoption +1;
std::string::size_type endoption = section.find("\n",position);
optionstorage = section.substr(beginoption, endoption - beginoption);
Two objects in C++ don't have to be the same type to compare as equal. You can compare string objects to string literals because there is an operator== overload that accepts std::string and char const * arguments. (The typeid() operator returns a different value because the two expressions have different types; one is a string object and the other is a char array -- but you can indeed still compare them.)
You mentioned that your "if statement is failing" but when you inspect the contents of the strings, they appear to be the same -- they may actually not be the same. For example, in your code, if a.variable has trailing whitespace, you would not see this in the output and yet the strings would also not be equal.
Try writing both strings surrounded by some characters. I suspect that you will see there is some extra whitespace somewhere:
std::cout << '[' << a.variable << ']' << std::endl;
std::cout << '[' << "string" << ']' << std::endl;
Consider also displaying a.variable.size(). If it's not 6, then the two strings cannot be equal since they have different lengths.
For the purposes of typeid, a.variable is of type std::string while the string literal "string" is of type char const [7].
That explains the output of
std::cout << typeid(a.variable).name() << std::endl;
std::cout << typeid("string").name() << std::endl;

filesystem::* strange results in windows filesystem paths with extended chars

The code doesn't do anything useful. It's just trial-and-error code to figure out what's going on:
fs::path path("e:\\Σtest");
cout<<path << " exsits="<< fs::exists(path) << " is dir=" << fs::is_directory(path) << std::endl;
fs::path pathL(L"e:\\Σtest");
cout<<pathL << " exsits="<< fs::exists(pathL) << " is dir=" << fs::is_directory(pathL) << std::endl;
fs::path pathu(u"e:\\Σtest");
cout<<pathu << " exsits="<< fs::exists(pathu) << " is dir=" << fs::is_directory(pathu) << std::endl;
Output:
e:\Σtest exsits=0 is dir=0
e:\Σtest exsits=0 is dir=0
e:\Σtest exsits=0 is dir=0
I am sure that the folder Σtest exists. I guess encoding is involved somehow, but I can't figure out what problem we have encountered here. Can someone explain the output?
EDIT:
Following @cpplearner's advice to pass /utf-8 to the compiler, the output changes (the console code page was also switched to UTF-8 with chcp 65001):
e:\Σtest exsits=0 is dir=0
e:\?test exsits=1 is dir=1
e:\?test exsits=1 is dir=1
The question remains the same: what is happening here?

Cryptonote C++ compile error with invalid operands

I am working with the CryptoNote repo for a project and have reached the point where I need to compile the binaries.
When I run make, I get the following error:
/Documents/huntcoin/src/CryptoNoteCore/SwappedMap.h:185:14: error: invalid operands of types ‘<unresolved overloaded function type>’ and ‘const char [24]’ to binary ‘operator<<’
std::count << "SwappedMap cache hits: " << m_cacheHits << ", misses: " << m_cacheMisses << " (" << std::fixed << std::setprecision(2) << static_cast<double>(m_cacheMisses) / (m_cacheHits + m_cacheMisses) * 100 << "%)" << std::endl;
~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
I am not very familiar with C++; it might be a simple parenthesis error, but it could be something more.
For some context, the previous make error I got was that std::cout was not defined, which I assumed was just a typo for count. Maybe that was wrong as well.
Any help with C++ or cryptonote would be much appreciated!
You've got an extra n that is causing you trouble. The code should read:
std::cout << "SwappedMap c.....
std::cout is the standard output (console output) stream, while std::count is not a stream at all: it is the counting algorithm from <algorithm>, which is why the error mentions an <unresolved overloaded function type>.
std::cout is declared in the <iostream> header, so all you need to do is add this line next to the other #include directives at the top of your file:
#include <iostream>
Cheers

C++ single quotes syntax

I am learning C++ and just started reading "Programming Principles and Practice" by Bjarne Stroustrup and he uses this code to illustrate a point:
#include "std_lib_facilities.h"
using namespace std;
int main() // C++ programs start by executing the function main
{
    char c = 'x';
    int i1 = c;
    int i2 = 'x';
    char c2 = i1;
    cout << c << ' << i1 << ' << c2 << '\n';
    return 0;
}
I am familiar in general with the difference between double and single quotes in the C++ world, but would someone kindly explain the construction and purpose of the section ' << i1 << '
Thanks
cout << c << ' << i1 << ' << c2 << '\n';
appears to be a typo in the book. I see it in Programming Principles and Practice Using C++ (Second Edition) Second printing. I do not see it listed in the errata.
According to the book, the intended output is
x 120 x
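But what actually happens here is that ' << i1 << ' is parsed as a single multicharacter literal containing everything between the two quotes. A multicharacter literal has type int and an implementation-defined value, so cout prints it as an integer rather than as text (wcout would not help, since this is not a wide character). With GCC the value is most likely 540818464 -> 0x203C3C20 -> the ASCII values of ' ', '<', '<', ' ', i.e. the last four characters of the literal. The end result is output something like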
x540818464x
and a warning or two from the compiler because while it's valid C++ code, it's almost certainly not what you want to be doing.
The line should most likely read
cout << c << ' ' << i1 << ' ' << c2 << '\n';
which will output the expected x 120 x
In other words, Linker3000, you are not crazy and not misunderstanding the example code.
Does anyone know whom I should contact to report errata, or to get a clarification on the off chance there is some top-secret sneakiness going way over my head?
Before answering your question, here is a little background on what that code is actually doing. Also note that there is a typo in the example: the string constant should have been double-quoted:
cout << c << " << i1 << " << c2 << "\n";
In C++, operators can be overloaded so that they mean different things for different types of operands. In the case of cout, the << operator is overloaded as the "insertion operator". Think of it as taking the operand on the right and inserting it (or sending it) into the operand on the left.
For example,
cout << "Hello World";
This takes the string "Hello World", and sends it to cout for processing.
So what beginners do not get is what something like this means:
cout << "Hello" << " World";
This does the same thing, but because << is left-associative, the insertions are performed from left to right. To make this work, each insertion returns a reference to the cout object itself. Why is this important? Because the statement above is actually two separate operator evaluations:
(cout << "Hello") << " World";
This first inserts "Hello" into cout, which outputs it, and then evaluates the next insertion operator. Because the first insertion returns cout itself, after (cout << "Hello") is executed you have the following still to be evaluated:
cout << " World";
This expression injects " World" into the cout object, which then outputs " World", with the net effect being that you see "Hello World" just like the first time.
So in your example, what is it doing?
cout << c << " << i1 << " << c2 << "\n";
This is evaluated left to right as follows:
((((cout << c) << " << i1 << ") << c2) << "\n"); => Outputs value of c
((((cout ) << " << i1 << ") << c2) << "\n"); => Outputs string " << i1 << "
((( cout ) << c2) << "\n"); => Outputs value of c2
(( cout ) << "\n"); => Outputs newline character
( cout ); => No more output
Expression completes and returns the cout object as the expression value.
Assuming c='x' and c2='x', the final output from this expression is the following character string output on a single line:
x << i1 << x
For beginners, all those insertion operators look a little strange. That is because you are dealing with objects. You could build up the string as a complete formatted object before inserting it into cout, and while that would make the cout expression look simpler, we generally do not do that in C++ because it makes your code more complex and error-prone. Note also that there is nothing special about the cout object. If you wanted to output to the standard error stream, you would use cerr instead. If you wanted to output to a file, you would instantiate a stream object that writes to the desired file. The rest of the code in your example would be the same.
In C, the same thing would be done procedurally using a format string:
printf("%c << i1 << %c\n", c, c2);
This is allowed in C++ too, because C++ is largely compatible with C and includes the C standard library. Many C++ programmers still use this output method, but that is because those programmers learned C first and may not have fully embraced the object-oriented nature of C++.
Note that you may also have seen the << operator in the context of mathematical expressions like:
A = A << 8;
In this case, << is the bitwise left-shift operator. It has nothing to do with output to cout. It shifts the bits in A to the left by eight positions: bits shifted out on the left are discarded and zeros are shifted in on the right, so it is not a rotate.

C++ substr method - "invalid use of ‘this’ in non-member function"

I tried to compile the following code
std::string key = "DISTRIB_DESCRIPTION=";
std::cout << "last five characters: " << key.substr(this.end()-5) << '\n';
And the compiler says
error: invalid use of ‘this’ in non-member function
std::cout << "last five characters: " << key.substr(this.end()-5) << '\n';
^
substr is a "public member function" of std::string, why can't I use this?
I know I could just reference key again instead of this, but my original code was
std::cout << "Description: " << line.substr(found+key.length()+1).substr(this.begin(),this.length()-1) << '\n';
In the second use of substr, the string does not have a name, so the only way to refer to it would be this. I fixed it with
std::cout << "Description: " << line.substr(found+key.length()+1,line.length()-found-key.length()-2) << '\n';
But I am now curious to why this won't work.
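this is only available inside a non-static member function of a class, where it points to the object the function was called on. In your particular case, it seems obvious to you that this should refer to key, but the compiler sees no reason for that. (Even inside a member function, this is a pointer, so it would be this->end(), not this.end().)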
Also, string.substr() takes an integer indicating the beginning position. string.end() returns an iterator, which will not work. What you likely want to do here is call string.length().
Simply replace the first piece of code with:
std::cout << "last five characters: " << key.substr(key.length()-5) << '\n';
And you should be okay.