std::codecvt do_out skipping characters after 1:N conversions - c++

I'm trying to write an automatic indenter, but it skips characters whenever it adds new characters to the stream. I've debugged it and verified that from_next and to_next, as well as from and to, hold the expected values.
I've surely missed something in the spec, but here is my code; maybe you can help me:
virtual result_t do_out(state_type& state, const intern_type* from, const intern_type* from_end, const intern_type*& from_next,
extern_type* to, extern_type* to_end, extern_type*& to_next) const override
{
auto result = std::codecvt_base::noconv;
while (from < from_end && to < to_end)
{
if (getState(state).missingWhitespaces > 0u && *from != '\n')
{
while (getState(state).missingWhitespaces > 0u && to < to_end)
{
*to = ' ';
to++;
getState(state).missingWhitespaces--;
}
if (to < to_end)
{
result = std::codecvt_base::partial;
}
else
{
result = std::codecvt_base::partial;
break;
}
}
else
{
*to = *from;
if (*from == '\n')
{
getState(state).missingWhitespaces = tabSize * indentLevel;
}
to++;
from++;
}
}
from_next = from;
to_next = to;
return result;
};
The state object is also working properly. The problem only occurs in between function calls.
Edit: Changing the result after if (to < to_end) to std::codecvt_base::ok doesn't solve the problem either.

After some more digging I found the solution to my problem. I got a detailed explanation of std::codecvt from this website: http://stdcxx.apache.org/doc/stdlibref/codecvt.html
It turned out that I had forgotten to override these two methods:
virtual int do_length(state_type& state, const extern_type *from, const extern_type *end, size_t max) const;
Determines and returns n, where n is the number of elements of extern_type in the source range [from,end) that can be converted to max or fewer characters of intern_type, as if by a call to in(state, from, from_end, from_next, to, to_end, to_next) where to_end == to + max.
Sets the value of state to correspond to the shift state of the
sequence starting at from + n.
Function do_length must be called under the following preconditions:
state is either initialized to the beginning of a sequence or equal to
the result of the previous conversion on the sequence.
from <= end is well-defined and true.
Note that this function does not behave similarly to the C Standard
Library function mbsrtowcs(). See the mbsrtowcs.cpp example program
for an implementation of this function using the codecvt facet.
virtual int do_max_length() const throw();
Returns the maximum value that do_length() can return for any valid combination of its first three arguments, with the fourth argument max set to 1.
I implemented them this way and it worked:
virtual int do_length(state_type& state, const extern_type* from, const extern_type* end, size_t max) const override
{
auto numberOfCharsAbleToCopy = max;
numberOfCharsAbleToCopy -= std::min(static_cast<unsigned int>(numberOfCharsAbleToCopy), getState(state).missingWhitespaces);
bool newLineToAppend = false;
for (auto c = from + getState(state).missingWhitespaces; c < end && numberOfCharsAbleToCopy > 0u; c++)
{
if (*c == '\n' && !newLineToAppend)
{
newLineToAppend = true;
}
else if (*c != '\n' && newLineToAppend)
{
numberOfCharsAbleToCopy -= std::min(tabSize * indentLevel, numberOfCharsAbleToCopy);
if (numberOfCharsAbleToCopy == 0u)
{
break;
}
newLineToAppend = false;
}
}
return numberOfCharsAbleToCopy;
}
virtual int do_max_length() const throw() override
{
return tabSize * indentLevel;
}
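The from_next/to_next contract of a 1:N conversion can be exercised without any stream by calling out() directly with a deliberately small output buffer. The following is only a minimal sketch under stated assumptions: DoublingCodecvt and convert_in_chunks are hypothetical names, and the facet simply writes every character twice instead of indenting. It stops with partial whenever the whole expansion of the next character does not fit, and the caller resumes at from_next.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical facet: writes every input character twice (a 1:2 conversion).
class DoublingCodecvt : public std::codecvt<char, char, std::mbstate_t> {
protected:
    result do_out(state_type&, const char* from, const char* from_end,
                  const char*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        while (from < from_end) {
            if (to_end - to < 2) {  // no room for the whole expansion
                from_next = from;   // report exactly what was consumed ...
                to_next = to;       // ... and exactly what was produced
                return partial;
            }
            *to++ = *from;
            *to++ = *from;
            ++from;
        }
        from_next = from;
        to_next = to;
        return ok;
    }
    bool do_always_noconv() const noexcept override { return false; }
    int do_max_length() const noexcept override { return 2; }
};

// Drives the facet with a 3-byte output buffer, resuming at from_next each time.
std::string convert_in_chunks(const DoublingCodecvt& facet, const std::string& input) {
    std::mbstate_t state = std::mbstate_t();
    std::string output;
    char buffer[3];
    const char* from = input.data();
    const char* const from_end = from + input.size();
    while (from < from_end) {
        const char* from_next = from;
        char* to_next = buffer;
        const std::codecvt_base::result r = facet.out(
            state, from, from_end, from_next, buffer, buffer + sizeof buffer, to_next);
        if (r == std::codecvt_base::error || to_next == buffer) break;  // no progress
        output.append(buffer, to_next);
        from = from_next;
    }
    return output;
}
```

If characters go missing in a test like this, the bug is in do_out itself; if they only go missing through a file stream, the other overrides (do_length, do_max_length, do_always_noconv) are the more likely suspects.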

Related

segfault, but not in valgrind or gdb

In my project, there is a library that has code to load an fbx using the FBX SDK 2017.1 from Autodesk.
Loading the fbx crashes in both debug and release builds. The crash takes one of two forms, seemingly at random:
most of the time, it is simply "Segmentation fault";
every once in a while, it is a dump of all the libraries that may be involved in the crash, along with a mention of a problem with a realloc() call. From the context of the message, I haven't been able to make out which realloc that may be (the message is followed by a dump of all the libs that are linked).
The code does contain realloc() calls, specifically in the allocation of buffers used in a custom implementation of an FbxStream
Most of the code path is entirely identical for windows, only a number of platform specific sections have been re-implemented. On windows, it runs as expected.
What strikes me is that if I run the program in either gdb or valgrind, the crash disappears! So I set out to find uninitialized members/values, but so far I could not find anything suspicious. I used CppDepend/CppCheck and VS2012 code analysis, but both came up empty on un-initialized variables/members
To give some background on FBX loading; the FBX SDK has a number of ways to deal with different types of resources (obj, 3ds, fbx,..). They can be loaded from file or from stream. To support large files, the stream option is the more relevant option. The code below is far from perfect, but what interests me mostly at present is the reason why valgrind/gdb would not crash. I've left the SDK documentation on top of ReadString, since it's the most complex one.
class MyFbxStream : public FbxStream{
uint32 m_FormatID;
uint32 m_Error;
EState m_State;
size_t m_Pos;
size_t m_Size;
const Engine::Buffer* const m_Buffer;
MyFbxStream& operator = (const MyFbxStream& other) const;
public:
MyFbxStream(const Engine::Buffer* const buffer)
: m_FormatID(0)
, m_Error(0)
, m_State(eClosed)
, m_Pos(0)
, m_Size(0)
, m_Buffer(buffer) {};
virtual ~MyFbxStream() {};
virtual bool Open(void* pStreamData) {
m_FormatID = *(uint32*)pStreamData;
m_Pos = 0;
m_State = eOpen;
m_Size = m_Buffer->GetSize();
return true;
}
virtual bool Close() {
m_Pos = m_Size = 0;
m_State = eClosed;
return true;
}
virtual int Read(void* pData, int pSize) const {
const unsigned char* data = (m_Buffer->GetBase(m_Pos));
const size_t bytesRead = m_Pos + pSize > m_Buffer->GetSize() ? (m_Buffer->GetSize() - m_Pos) : pSize;
const_cast<MyFbxStream*>(this)->m_Pos += bytesRead;
memcpy(pData, data, bytesRead);
return (int)bytesRead;
}
/** Read a string from the stream.
* The default implementation is written in terms of Read() but does not cope with DOS line endings.
* Subclasses may need to override this if DOS line endings are to be supported.
* \param pBuffer Pointer to the memory block where the read bytes are stored.
* \param pMaxSize Maximum number of bytes to be read from the stream.
* \param pStopAtFirstWhiteSpace Stop reading when any whitespace is encountered. Otherwise read to end of line (like fgets()).
* \return pBuffer, if successful, else NULL.
* \remark The default implementation terminates the \e pBuffer with a null character and assumes there is enough room for it.
* For example, a call with \e pMaxSize = 1 will fill \e pBuffer with the null character only. */
virtual char* ReadString(char* pBuffer, int pMaxSize, bool pStopAtFirstWhiteSpace = false) {
assert(!pStopAtFirstWhiteSpace); // "Not supported"
const size_t pSize = pMaxSize - 1;
if (pSize) {
const char* const base = (const char* const)m_Buffer->GetBase();
char* cBuffer = pBuffer;
const size_t totalSize = std::min(m_Buffer->GetSize(), (m_Pos + pSize));
const char* const maxSize = base + totalSize;
const char* sum = base + m_Pos;
bool done = false;
// first align the copy on alignment boundary (4byte)
while ((((size_t)sum & 0x3) != 0) && (sum < maxSize)) {
const unsigned char c = *sum++;
*cBuffer++ = c;
if ((c == '\n') || (c == '\r')) {
done = true;
break;
} }
// copy from alignment boundary to boundary (4byte)
if (!done) {
int64 newBytesRead = 0;
uint32* dBuffer = (uint32*)cBuffer;
const uint32* dBase = (uint32*)sum;
const uint32* const dmaxSize = ((uint32*)maxSize) - 1;
while (dBase < dmaxSize) {
const uint32 data = *(const uint32*const)dBase++;
*dBuffer++ = data;
if (((data & 0xff) == 0x0a) || ((data & 0xff) == 0x0d)) { // third bytes, 4 bytes read..
newBytesRead -= 3;
done = true;
break;
} else {
const uint32 shiftedData8 = data & 0xff00;
if ((shiftedData8 == 0x0a00) || (shiftedData8 == 0x0d00)) { // third bytes, 3 bytes read..
newBytesRead -= 2;
done = true;
break;
} else {
const uint32 shiftedData16 = data & 0xff0000;
if ((shiftedData16 == 0x0a0000) || (shiftedData16 == 0x0d0000)) { // second byte, 2 bytes read..
newBytesRead -= 1;
done = true;
break;
} else {
const uint32 shiftedData24 = data & 0xff000000;
if ((shiftedData24 == 0x0a000000) || (shiftedData24 == 0x0d000000)) { // first byte, 1 bytes read..
done = true;
break;
} } } } }
newBytesRead += (int64)dBuffer - (int64)cBuffer;
if (newBytesRead) {
sum += newBytesRead;
cBuffer += newBytesRead;
} }
// copy anything beyond the last alignment boundary (4byte)
if (!done) {
while (sum < maxSize) {
const unsigned char c = *sum++;
*cBuffer++ = c;
if ((c == '\n') || (c == '\r')) {
done = true;
break;
} } }
const size_t bytesRead = cBuffer - pBuffer;
if (bytesRead) {
const_cast<MyFbxStream*>(this)->m_Pos += bytesRead;
pBuffer[bytesRead] = 0;
return pBuffer;
} }
pBuffer = NULL;
return NULL;
}
virtual void Seek(const FbxInt64& pOffset, const FbxFile::ESeekPos& pSeekPos) {
switch (pSeekPos) {
case FbxFile::ESeekPos::eBegin: m_Pos = pOffset; break;
case FbxFile::ESeekPos::eCurrent: m_Pos += pOffset; break;
case FbxFile::ESeekPos::eEnd: m_Pos = m_Size - pOffset; break;
}
}
virtual long GetPosition() const { return (long)m_Pos; }
virtual void SetPosition(long position) { m_Pos = position; }
virtual void ClearError() { m_Error = 0; }
virtual int GetError() const { return m_Error; }
virtual EState GetState() { return m_State; }
virtual int GetReaderID() const { return m_FormatID; }
virtual int GetWriterID() const { return -1; } // readonly stream
virtual bool Flush() { return true; } // readonly stream
virtual int Write(const void* /*d*/, int /*s*/) { assert(false); return 0; } // readonly stream
};
I assume that there may be undefined behavior related to malloc/free/realloc operations that somehow do not occur in gdb. But if this is the case, I also expect the Windows binaries to have problems.
Also, I don't know if this is relevant, but when I trace into the Open() function and print the value of the "m_Buffer" pointer (or of "this"), I get a pointer value starting with 0xfffffff.., which to a Windows programmer looks like a problem. Can I draw the same conclusion on Linux, though, given that I also saw this happen in static function calls etc.?
if I run the program in either gdb or valgrind, the crash disappears!
There are two possible explanations:
There are multiple threads, the code exhibits a data race, and both GDB and Valgrind significantly affect execution timing.
GDB disables address randomization; Valgrind significantly affects program layout, and the crash is sensitive to the exact layout.
The steps I would take:
Set ulimit -c unlimited, run the program and get it to dump core, then use post-mortem analysis in GDB.
Run the program under GDB, use set disable-randomization off and see if you can get to crash point that way.
Run the program with Helgrind or DRD, Valgrind's thread error detectors.

program crash in std::sort() sometimes, can't reproduce

Description:
My program sometimes crashes in std::sort(). I wrote a minimal program to reproduce the situation, but it runs without any problem. Here is the minimal example:
typedef struct st {
int it;
char ch;
char charr[100];
vector<string *> *vs;
} st;
bool function(st *&s1, st *&s2) {
static int i = 1;
cout<<i<<" "<<&s1<<" "<<&s2<<endl;
++i;
return s1->it > s2->it;
}
int main(int argc, char **argv) {
vector<st *> ar;
for (int i = 0; i < 100; ++i) {
st *s = new st;
s->it = urandom32();
ar.push_back(s);
}
ar.clear();
for (int i = 0; i < 100; ++i) {
st *s = new st;
s->it = urandom32();
ar.push_back(s);
}
sort(ar.begin(), ar.end(), function);
return 0;
}
Here is the GDB stack info:
0 0x00007f24244d9602 in article_cmp (cand_article_1=0x7f23fd297010, cand_article_2=0x4015)
at src/recom_frame_worker.h:47
1 0x00007f24244fc41b in std::__unguarded_partition<__gnu_cxx::__normal_iterator > >,
cand_article*, bool ()(cand_article, cand_article*)> (__first=,
__last=, __pivot=@0x7f230412b350: 0x7f23fd297010,
__comp=0x7f24244d95e1 )
at /usr/include/c++/4.8.3/bits/stl_algo.h:2266
2 0x00007f24244f829c in std::__unguarded_partition_pivot<__gnu_cxx::__normal_iterator > >, bool
()(cand_article, cand_article*)> (__first=, __last=,
__comp=0x7f24244d95e1 )
at /usr/include/c++/4.8.3/bits/stl_algo.h:2296
3 0x00007f24244f1d88 in std::__introsort_loop<__gnu_cxx::__normal_iterator > >, long,
bool ()(cand_article, cand_article*)> (__first=, __last=,
__depth_limit=18,
__comp=0x7f24244d95e1 )
at /usr/include/c++/4.8.3/bits/stl_algo.h:2337
4 0x00007f24244ed6e5 in std::sort<__gnu_cxx::__normal_iterator > >, bool
()(cand_article, cand_article*)> (
__first=, __last=, __comp=0x7f24244d95e1 )
at /usr/include/c++/4.8.3/bits/stl_algo.h:5489
article_cmp is called in sort(article_result->begin(), article_result->end(), article_cmp); and article_result is a vector<cand_article*> *. cand_article is a struct.
Here is the definition of article_cmp:
bool article_cmp(cand_article* cand_article_1, cand_article* cand_article_2) {
return cand_article_1 -> display_time >= cand_article_2 -> display_time;
}
Here is a piece of code where the crash happens:
article_result->clear();
for(vec_iter = _channel_data -> begin(); vec_iter != _channel_data -> end(); vec_iter++) {
cand_article* cand = to_cand_group(*vec_iter);
if(cand == NULL) continue;
// refresh open loadmore
if(m_request.req_type == 1) {
if(cand -> display_time > m_request.start){
article_result->push_back(cand);
}
}else if(m_request.req_type == 2){
if(cand -> display_time < m_request.end){
article_result->push_back(cand);
}
}else{
article_result->push_back(cand);
}
}
sort(article_result->begin(), article_result->end(), article_cmp);
Question:
I don't know how to handle this kind of core dump. Is 0x4015 a kernel-space address? Any suggestions on how to fix this kind of bug? Sorry, I can't reproduce the situation with a minimal program. This happens in a single thread, so you don't need to consider multi-threading issues.
The rule is "if std::sort crashes, you have an invalid comparison function". Your comparison function is:
bool article_cmp(cand_article* lhs, cand_article* rhs) {
return lhs -> display_time >= rhs -> display_time;
}
This is not a strict weak ordering. In particular, if the display times are equal it returns true, which means that if you swap the arguments it will still return true ... and that is not allowed. You need:
bool article_cmp(cand_article* lhs, cand_article* rhs) {
return lhs -> display_time > rhs -> display_time;
}
The reason your simplified example works (congratulations for at least trying to simplify), is that you simplified the comparison function so it is valid. If the return statement was return s1->it >= s2->it;, and you used a smaller range of values, it too would probably crash.
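The difference between the two comparators can be checked directly. A strict weak ordering must never report an element as ordered before itself, and must never report both cmp(a, b) and cmp(b, a). The sketch below tests those two necessary conditions on a sample; Article, broken_cmp, valid_cmp and violates_strict_ordering are hypothetical stand-ins for the question's types, not the original code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the question's cand_article.
struct Article {
    long display_time;
};

// Mirrors the question's comparator: returns true for equal display times.
bool broken_cmp(const Article* a, const Article* b) {
    return a->display_time >= b->display_time;
}

// Mirrors the fixed comparator: strictly greater.
bool valid_cmp(const Article* a, const Article* b) {
    return a->display_time > b->display_time;
}

// Returns true if cmp breaks irreflexivity or asymmetry on the sample.
// Passing this check is necessary, not sufficient, for a strict weak ordering.
template <class Cmp>
bool violates_strict_ordering(const std::vector<const Article*>& items, Cmp cmp) {
    for (const Article* a : items) {
        if (cmp(a, a)) return true;  // an element must not precede itself
        for (const Article* b : items) {
            if (cmp(a, b) && cmp(b, a)) return true;  // both directions true
        }
    }
    return false;
}
```

A check like this can run in a debug build over the real data just before the call to std::sort, which makes an invalid comparator fail loudly instead of corrupting memory.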
Incidentally, a much more natural C++ declaration of your example structure would look like:
struct st { // No need for that typedef in C++
int it;
char ch;
std::string charr; // ... or *possibly* std::array<char,100>.
std::vector<std::string> vs; // Strings and vectors best held by value
};
Also note that I have actually used the std:: prefix.
Your minimal program leaks memory: it removes all the items from the vector but does not release the memory they use. If your items are big enough, the program might crash after eating up all the memory. That's why your minimal program still runs fine: the items there are very small.
I would change your program to:
typedef struct st {
int it;
char ch;
char charr[100];
vector<string *> *vs;
} st;
bool function(st *&s1, st *&s2) {
static int i = 1;
cout<<i<<" "<<&s1<<" "<<&s2<<endl;
++i;
return s1->it > s2->it;
}
int main(int argc, char **argv) {
vector<st *> ar;
for (int i = 0; i < 100; ++i) {
st *s = new st;
s->it = urandom32();
ar.push_back(s);
}
// release all the memory used by ar's items first
for (vector<st *>::iterator it = ar.begin(); it != ar.end(); ++it)
delete *it;
ar.clear();
for (int i = 0; i < 100; ++i) {
st *s = new st;
s->it = urandom32();
ar.push_back(s);
}
sort(ar.begin(), ar.end(), function);
return 0;
}

C++ std::set<string> Alphanumeric custom comparator

I'm solving a problem that involves sorting the non-redundant permutations of a string.
For example, if the input string is "8aC", then the output should be ordered like {"Ca8", "C8a", "aC8", "a8C", "8Ca", "8aC"}. I chose the C++ std::set because each time I insert a string into the set, it is automatically kept sorted and duplicates are eliminated. The output is fine.
But I want to sort the set in an alphanumeric order that differs from the default one. I want to customize the set's comparator so that the priority is: upper case > lower case > digit.
I tried to customize the comparator, but it was quite frustrating. How can I customize the sorting order of the set? Here's my code.
set<string, StringCompare> setl;
for (i = 0; i < f; i++)
{
setl.insert(p[i]); //p is String Array. it has the information of permutation of String.
}
for (set<string>::iterator iter = setl.begin(); iter != setl.end(); ++iter)
cout << *iter << endl; //printing set items. it works fine.
struct StringCompare
{
bool operator () (const std::string s_left, const std::string s_right)
{
/*I want to use my character comparison function in here, but have no idea about that.
I'm not sure about that this is the right way to customize comparator either.*/
}
};
int compare_char(const char x, const char y)
{
if (char_type(x) == char_type(y))
{
return ( (int) x < (int) y) ? 1 : 0 ;
}
else return (char_type(x) > char_type(y)) ? 1 : 0;
}
int char_type(const char x)
{
int ascii = (int)x;
if (ascii >= 48 && ascii <= 57) // digit
{
return 1;
}
else if (ascii >= 97 && ascii <= 122) // lowercase
{
return 2;
}
else if (ascii >= 65 && ascii <= 90) // uppercase
{
return 3;
}
else
{
return 0;
}
}
You are almost there, but you should compare your strings lexicographically.
I made a few small changes to your code.
int char_type( const char x )
{
if ( isupper( x ) )
{
// upper case has the highest priority
return 0;
}
if ( islower( x ) )
{
return 1;
}
if ( isdigit( x ) )
{
// digit has the lowest priority
return 2;
}
// something else
return 3;
}
bool compare_char( const char x, const char y )
{
if ( char_type( x ) == char_type( y ) )
{
// same type so that we are going to compare characters
return ( x < y );
}
else
{
// different types
return char_type( x ) < char_type( y );
}
}
struct StringCompare
{
bool operator () ( const std::string& s_left, const std::string& s_right ) const
{
std::string::const_iterator iteLeft = s_left.begin();
std::string::const_iterator iteRight = s_right.begin();
// we are going to compare each character in strings
while ( iteLeft != s_left.end() && iteRight != s_right.end() )
{
if ( compare_char( *iteLeft, *iteRight ) )
{
return true;
}
if ( compare_char( *iteRight, *iteLeft ) )
{
return false;
}
++iteLeft;
++iteRight;
}
// either of strings reached the end.
if ( s_left.length() < s_right.length() )
{
return true;
}
// otherwise.
return false;
}
};
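The same ordering can also be expressed with std::lexicographical_compare and a per-character predicate, which handles the "shorter string first" rule on its own. The sketch below is only an illustration: char_key, KeyedCompare and ordered_permutations are hypothetical names, and std::next_permutation is swapped in to generate the permutations because the question does not show how its array p is filled.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Sort key for one character: (class rank, character).
// Rank order: upper case < lower case < digit < anything else.
std::pair<int, char> char_key(char c) {
    const unsigned char u = static_cast<unsigned char>(c);
    const int rank = std::isupper(u) ? 0 : std::islower(u) ? 1 : std::isdigit(u) ? 2 : 3;
    return std::make_pair(rank, c);
}

struct KeyedCompare {
    bool operator()(const std::string& l, const std::string& r) const {
        return std::lexicographical_compare(
            l.begin(), l.end(), r.begin(), r.end(),
            [](char x, char y) { return char_key(x) < char_key(y); });
    }
};

// Collects every distinct permutation of s in the custom order.
std::vector<std::string> ordered_permutations(std::string s) {
    // next_permutation visits all permutations only from a sorted start.
    std::sort(s.begin(), s.end());
    std::set<std::string, KeyedCompare> out;
    do {
        out.insert(s);
    } while (std::next_permutation(s.begin(), s.end()));
    return std::vector<std::string>(out.begin(), out.end());
}
```

For the input "8aC" this yields the order given in the question: Ca8, C8a, aC8, a8C, 8Ca, 8aC.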
Your comparator is right. I would turn the parameters into const references, like this:
bool operator () (const std::string &s_left, const std::string &s_right)
and start with this simple implementation:
return s_left < s_right;
This gives the default behaviour and will give you confidence that you are on the right track.
Then start comparing one char at a time, with a for loop over the length of the shorter of the two strings. You can get chars out of a string with operator[] (e.g. s_left[i]).
You're very nearly there with what you have.
In your comparison functor you are given two std::strings. What you need to do is to find the first position where the two strings differ. For that, you can use std::mismatch from the standard library. This returns a std::pair filled with iterators pointing to the first two elements that are different:
auto iterators = std::mismatch(std::begin(s_left), std::end(s_left),
std::begin(s_right), std::end(s_right));
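Now, provided neither iterator is at the end of its string (which happens when the strings are equal or one is a prefix of the other), you can dereference the two iterators we've been given to get the characters: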
char c_left = *iterators.first;
char c_right = *iterators.second;
You can pass those two characters to your compare_char function and it should all work :-)
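Put together, a comparator built on std::mismatch might look like the sketch below. It assumes C++14 for the four-iterator overload of std::mismatch; char_rank, char_less and MismatchCompare are hypothetical names standing in for the question's char_type, compare_char and StringCompare. The two end-iterator checks cover equal strings and the case where one string is a prefix of the other.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Rank of a character class: upper case 0, lower case 1, digit 2, other 3.
int char_rank(char c) {
    const unsigned char u = static_cast<unsigned char>(c);
    if (std::isupper(u)) return 0;
    if (std::islower(u)) return 1;
    if (std::isdigit(u)) return 2;
    return 3;
}

// Orders by class rank first, then by character value within a class.
bool char_less(char x, char y) {
    const int rx = char_rank(x);
    const int ry = char_rank(y);
    return rx != ry ? rx < ry : x < y;
}

struct MismatchCompare {
    bool operator()(const std::string& l, const std::string& r) const {
        const auto its = std::mismatch(l.begin(), l.end(), r.begin(), r.end());
        if (its.second == r.end()) return false;  // equal, or r is a prefix of l
        if (its.first == l.end()) return true;    // l is a proper prefix of r
        return char_less(*its.first, *its.second);
    }
};
```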
Not absolutely sure about this, but you may be able to use an enumerated class to your advantage, or an array, and choose to read from certain indices in whichever order you like.
You can use one enumerated class to define the order in which you would like to output data and another that contains the data to be output; then you can set up a loop that assigns the values to the output in a permuted way!
namespace CustomeType
{
enum Outs { Ca8= 0,C8a, aC8, a8C, 8Ca, 9aC };
enum Order{1 = 0 , 2, 3 , 4 , 5};
void PlayCard(Outs input)
{
if (input == Ca8) // Enumerator is visible without qualification
{
string[] permuted;
permuted[0] = Outs[0];
permuted[1] = Outs[1];
permuted[2] = Outs[2];
permuted[3] = Outs[3];
permuted[4] = Outs[4];
}// else use a different order
else if (input == Ca8) // this might be much better
{
string[] permuted;
for(int i = 0; i<LessThanOutputLength; i++)
{
//use order 1 to assign values from Outs
}
}
}
}
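This should work (assuming the strings are alphanumeric):
bool operator () (const std::string& s_left, const std::string& s_right) const
{
// only compare positions that exist in both strings
for(std::size_t i = 0; i < s_left.size() && i < s_right.size(); i++){
if(s_left[i] == s_right[i]) continue; // equal characters decide nothing
if(isupper(s_left[i])){
if(isupper(s_right[i])) return s_left[i] < s_right[i];
else if(islower(s_right[i]) || isdigit(s_right[i]))return true;
}
else if(islower(s_left[i])){
if(islower(s_right[i])) return s_left[i] < s_right[i];
else if(isdigit(s_right[i])) return true;
else if(isupper(s_right[i])) return false;
}
else if(isdigit(s_left[i])){
if(isdigit(s_right[i])) return s_left[i] < s_right[i];
else if(islower(s_right[i]) || isupper(s_right[i])) return false;
}
}
// all shared positions are equal: the shorter string sorts first
return s_left.size() < s_right.size();
}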

if() skipping my variable check

I have following code:
std::vector<std::string> GetSameID(std::vector<string>& allFiles, int id) {
std::vector<std::string> returnVector;
for(std::vector<string>::iterator it = allFiles.begin(); it != allFiles.end(); ++it) {
if(GetID(*it) == id) {
int index = (*it).find("_CH2.raw");
if(index > 0) {
continue; //this works
}
if(0 < ((*it).find("_CH2.raw"))) {
continue; //this doesn't
}
string ext = PathFindExtension((*it).c_str());
if(ext == ".raw") {
returnVector.push_back(*it);
}
}
}
return returnVector;
}
My issue is, why is the if(0 < ((*it).find("_CH2.raw"))) not working that way? My files are named
ID_0_X_0_Y_128_CH1.raw
ID_0_X_0_Y_128_CH2.raw
(different ID, X and Y, for Channel 1 and Channel 2 on the oscilloscope).
When I do it the long way around (assign index, and then check index), it works, I don't understand though why the short version, which is more readable imo, is not working.
According to http://en.cppreference.com/w/cpp/string/basic_string/find, string::find() returns a size_t, which is an unsigned type, so it can never be less than zero.
When it doesn't find something, it returns string::npos, which is the largest value of that unsigned type, so 0 < find(...) is also true when nothing is found and the short version skips every file. When you shove npos into an int (implicitly converting it) it becomes a negative value, which is why your first version works.
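A short comparison of the two forms makes the difference visible. This is a sketch with hypothetical helper names (contains, short_form); the file names are the ones from the question. Comparing against std::string::npos is the usual way to ask whether a substring is present, and it also handles a match at index 0, which both versions in the question miss.

```cpp
#include <cassert>
#include <string>

// True when needle occurs anywhere in name, including at index 0.
bool contains(const std::string& name, const std::string& needle) {
    return name.find(needle) != std::string::npos;
}

// The question's short form: true for "found after index 0" AND for
// "not found", because npos is a very large unsigned value.
bool short_form(const std::string& name, const std::string& needle) {
    return 0 < name.find(needle);
}
```

With these helpers, the check in the loop would read if (contains(*it, "_CH2.raw")) continue;.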

How to find out whether a member function is const or volatile with libclang?

I have an instance of CXCursor of kind CXCursor_CXXMethod. I want to find out if the function is const or volatile, for example:
class Foo {
public:
void bar() const;
void baz() volatile;
void qux() const volatile;
};
I could not find anything useful in the documentation of libclang. I tried clang_isConstQualifiedType and clang_isVolatileQualifiedType but these always seem to return 0 on C++ member function types.
I can think of two approaches:
Using the libclang lexer
The code which appears in this SO answer works for me; it uses the libclang tokenizer to break a method declaration apart, and then records any keywords outside of the method parentheses.
It does not access the AST of the code, and as far as I can tell doesn't involve the parser at all. If you are sure the code you investigate is proper C++, I believe this approach is safe.
Disadvantages: This solution does not appear to take into account preprocessing directives, so the code has to be processed first (e.g., passed through cpp).
Example code (the file to parse must be the first argument to your program, e.g. ./a.out bla.cpp):
#include "clang-c/Index.h"
#include <string>
#include <set>
#include <iostream>
std::string GetClangString(CXString str)
{
const char* tmp = clang_getCString(str);
if (tmp == NULL) {
return "";
} else {
std::string translated = std::string(tmp);
clang_disposeString(str);
return translated;
}
}
void GetMethodQualifiers(CXTranslationUnit translationUnit,
std::set<std::string>& qualifiers,
CXCursor cursor) {
qualifiers.clear();
CXSourceRange range = clang_getCursorExtent(cursor);
CXToken* tokens;
unsigned int numTokens;
clang_tokenize(translationUnit, range, &tokens, &numTokens);
bool insideBrackets = false;
for (unsigned int i = 0; i < numTokens; i++) {
std::string token = GetClangString(clang_getTokenSpelling(translationUnit, tokens[i]));
if (token == "(") {
insideBrackets = true;
} else if (token == "{" || token == ";") {
break;
} else if (token == ")") {
insideBrackets = false;
} else if (clang_getTokenKind(tokens[i]) == CXToken_Keyword &&
!insideBrackets) {
qualifiers.insert(token);
}
}
clang_disposeTokens(translationUnit, tokens, numTokens);
}
int main(int argc, char *argv[]) {
CXIndex Index = clang_createIndex(0, 0);
CXTranslationUnit TU = clang_parseTranslationUnit(Index, 0,
argv, argc, 0, 0, CXTranslationUnit_None);
// Set the file you're interested in, and the code location:
CXFile file = clang_getFile(TU, argv[1]);
int line = 5;
int column = 6;
CXSourceLocation location = clang_getLocation(TU, file, line, column);
CXCursor cursor = clang_getCursor(TU, location);
std::set<std::string> qualifiers;
GetMethodQualifiers(TU, qualifiers, cursor);
for (std::set<std::string>::const_iterator i = qualifiers.begin(); i != qualifiers.end(); ++i) {
std::cout << *i << std::endl;
}
clang_disposeTranslationUnit(TU);
clang_disposeIndex(Index);
return 0;
}
Using libclang's Unified Symbol Resolution (USR)
This approach involves using the parser itself, and extracting qualifier information from the AST.
Advantages: Seems to work for code with preprocessor directives, at least for simple cases.
Disadvantages: My solution parses the USR, which is undocumented, and might change in the future. Still, it's easy to write a unit-test to guard against that.
Take a look at $(CLANG_SRC)/tools/libclang/CIndexUSRs.cpp, it contains the code that generates a USR, and therefore contains the information required to parse the USR string. Specifically, lines 523-529 (in LLVM 3.1's source downloaded from www.llvm.org) for the qualifier part.
Add the following function somewhere:
void parseUsrString(const std::string& usrString, bool* isVolatile, bool* isConst, bool *isRestrict) {
size_t bangLocation = usrString.find("#");
if (bangLocation == std::string::npos || bangLocation == usrString.length() - 1) {
*isVolatile = *isConst = *isRestrict = false;
return;
}
bangLocation++;
int x = usrString[bangLocation];
*isConst = x & 0x1;
*isVolatile = x & 0x4;
*isRestrict = x & 0x2;
}
and in main(),
CXString usr = clang_getCursorUSR(cursor);
const char *usr_string = clang_getCString(usr);
std::cout << usr_string << "\n";
bool isVolatile, isConst, isRestrict;
parseUsrString(usr_string, &isVolatile, &isConst, &isRestrict);
printf("restrict, volatile, const: %d %d %d\n", isRestrict, isVolatile, isConst);
clang_disposeString(usr);
Running on Foo::qux() from
#define BLA const
class Foo {
public:
void bar() const;
void baz() volatile;
void qux() BLA volatile;
};
produces the expected result of
c:@C@Foo@F@qux#5
restrict, volatile, const: 0 1 1
Caveat: you might have noticed that libclang's source suggests my code should use isVolatile = x & 0x2 and not 0x4, so you may need to replace 0x4 with 0x2. It's possible that my implementation (OS X) has them swapped.
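The bit arithmetic can be checked in isolation, without libclang. The sketch below decodes the character after the first '#' using the same masks as parseUsrString above (0x1 const, 0x2 restrict, 0x4 volatile). MethodQualifiers and decode_usr_qualifiers are hypothetical names, the USR format is undocumented, and the input is assumed to have the usual @-separated form with a single '#' before the qualifier character, so treat this as an illustration of the masks rather than a reliable parser.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type; the mask assignment follows the answer above.
struct MethodQualifiers {
    bool is_const;
    bool is_restrict;
    bool is_volatile;
};

// Decodes the character that follows the first '#' in a USR string.
MethodQualifiers decode_usr_qualifiers(const std::string& usr) {
    const std::string::size_type hash = usr.find('#');
    if (hash == std::string::npos || hash + 1 >= usr.size()) {
        return MethodQualifiers{false, false, false};  // no qualifier character
    }
    // '5' is 0x35; only the low three bits matter for the masks.
    const int bits = usr[hash + 1];
    return MethodQualifiers{(bits & 0x1) != 0, (bits & 0x2) != 0, (bits & 0x4) != 0};
}
```

A unit test built around a function like this is one way to notice if a future libclang release changes the encoding.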