This article is about “dissecting” C++ records (classes and structs) to determine their constituent constructors, methods, fields, and so on. Libclang is, of course, designed to do this. However, libclang does not understand the extra “keywords” and pseudo-macros inherent to Qt programming; I will describe the solution I found for parsing this non-standard code.
In case you are not familiar with Qt, the designers added several “keywords” to C++ to enable the Qt signal/slot concept. I won’t go into details on this subject here; please refer to this posting for more information: Signals & Slots. In short, you use the signals: “access specification” to introduce one or more signal specifications to a class, and the [public|protected|private] slots: access specification to introduce one or more slots (to which signal emissions are dispatched). There are also a number of Qt pseudo-macros, chief among them Q_PROPERTY, that are of interest when parsing a QObject.
The Qt framework uses a program called moc (for Meta Object Compiler) to parse QObject declarations. Moc understands, among other things, the signals and slots keywords and generates a new C++ file called moc_<header name>.cpp. Here is a good introduction to moc: Meta Object Compiler. Moc isn’t very flexible, and it doesn’t work as an exploration tool of the sort I want. That’s where libclang came into the picture. My goal isn’t to replace moc, at least not at this point. Nor am I trying to create a Clang plugin that would integrate the parsing of QObjects into the C++ compilation process. I want to create a simple code indexing system, and a corresponding API, to enable simple programmatic access to a code base.
With libclang, you can programmatically explore the structure of any valid C, C++, or Objective-C code. Libclang helps move C++ tooling into a new dimension. However, a Qt QObject declaration is NOT valid C++ code, because of the custom signal/slot keywords added to it, so libclang will not parse a QObject header as written. No problem: simply redefine signals: to public:, for instance, as moc does! But you then lose the information about which methods in a class are signals, which are slots, and which are “normal” methods.
I started with the clang_tokenize function; it takes a libclang CXCursor and produces an array of tokens, including the signal and slot tokens. At this point I could tell that a class declaration contained signals and/or slots, and their location in the source code. But, I still didn’t have enough information; I briefly tried “parsing” the token array myself to figure out
Poring over the Clang Doxygen pages, I found clang_annotateTokens. This function takes the list of tokens found with clang_tokenize, and annotates them with CXCursor information. Now we can tell what cursor a token is associated with, the type of the cursor, and other information. We now can see the signals: and slots: markers in the code, AND completely understand the method signatures and any inline code.
I wrote a simple program that demonstrates annotation; I provide with the source a simple header file, Counter.h, to demonstrate C++ token annotation:
    #include <QObject>

    class Counter : public QObject
    {
        Q_OBJECT

    public:
        Counter();
        explicit Counter(int initialValue=0);

        void increment();
        void reset();
        int getCount() const;

    public slots:
        void add(int i);
        int subtract(int i);

    signals:
        void countUpdated(int i);

    private:
        int count_;
    };
Invoking ClangAnnotation Counter.h Counter results in this output:
     1  Found class Counter
     2  Token             Cursor            Cursor Kind              Cursor Type
     3  =================================================================================
     4  class             Counter           ClassDecl                Counter
     5  Counter           Counter           ClassDecl                Counter
     6  :                 Counter           ClassDecl                Counter
     7  public            class QObject     C++ base class specifier QObject
     8  QObject           class QObject     TypeRef                  QObject
     9  {                 Counter           ClassDecl                Counter
    10  Q_OBJECT          Counter           ClassDecl                Counter
    11  public                              CXXAccessSpecifier
    12  :                                   CXXAccessSpecifier
    13  Counter           Counter           CXXConstructor           void ()
    14  (                 Counter           CXXConstructor           void ()
    15  )                 Counter           CXXConstructor           void ()
    16  ;                 Counter           ClassDecl                Counter
    17  explicit          Counter           CXXConstructor           void (int)
    18  Counter           Counter           CXXConstructor           void (int)
    19  (                 Counter           CXXConstructor           void (int)
    20  int               initialValue      ParmDecl                 int
    21  initialValue      initialValue      ParmDecl                 int
    22  =                 initialValue      ParmDecl                 int
    23  0                                   IntegerLiteral           int
    24  )                 Counter           CXXConstructor           void (int)
    25  ;                 Counter           ClassDecl                Counter
    26  void              increment         CXXMethod                void ()
    27  increment         increment         CXXMethod                void ()
    28  (                 increment         CXXMethod                void ()
    29  )                 increment         CXXMethod                void ()
    30  ;                 Counter           ClassDecl                Counter
    31  void              reset             CXXMethod                void ()
    32  reset             reset             CXXMethod                void ()
    33  (                 reset             CXXMethod                void ()
    34  )                 reset             CXXMethod                void ()
    35  ;                 Counter           ClassDecl                Counter
    36  int               getCount          CXXMethod                int () const
    37  getCount          getCount          CXXMethod                int () const
    38  (                 getCount          CXXMethod                int () const
    39  )                 getCount          CXXMethod                int () const
    40  const             getCount          CXXMethod                int () const
    41  ;                 Counter           ClassDecl                Counter
    42  public                              CXXAccessSpecifier
    43  slots                               CXXAccessSpecifier
    44  :                                   CXXAccessSpecifier
    45  void              add               CXXMethod                void (int)
    46  add               add               CXXMethod                void (int)
    47  (                 add               CXXMethod                void (int)
    48  int               i                 ParmDecl                 int
    49  i                 i                 ParmDecl                 int
    50  )                 add               CXXMethod                void (int)
    51  ;                 Counter           ClassDecl                Counter
    52  int               subtract          CXXMethod                int (int)
    53  subtract          subtract          CXXMethod                int (int)
    54  (                 subtract          CXXMethod                int (int)
    55  int               i                 ParmDecl                 int
    56  i                 i                 ParmDecl                 int
    57  )                 subtract          CXXMethod                int (int)
    58  ;                 Counter           ClassDecl                Counter
    59  signals           Counter           ClassDecl                Counter
    60  :                                   CXXAccessSpecifier
    61  void              countUpdated      CXXMethod                void (int)
    62  countUpdated      countUpdated      CXXMethod                void (int)
    63  (                 countUpdated      CXXMethod                void (int)
    64  int               i                 ParmDecl                 int
    65  i                 i                 ParmDecl                 int
    66  )                 countUpdated      CXXMethod                void (int)
    67  ;                 Counter           ClassDecl                Counter
    68  private                             CXXAccessSpecifier
    69  :                                   CXXAccessSpecifier
    70  int               count_            FieldDecl                int
    71  count_            count_            FieldDecl                int
    72  ;                 Counter           ClassDecl                Counter
    73  }                 Counter           ClassDecl                Counter
    74  ;                                   InvalidFile
Had we instead invoked ClangAnnotation Counter.h (without the second argument, the class name), we would see every class that is explicitly or implicitly pulled into the source code by #include statements; ClangAnnotation is written to extract only class declarations. The table clearly shows the many-to-one relationship of tokens to cursors. Note lines 42-44 and 59-60: the slots and signals tokens are easily found. Compare them with lines 11 and 12, a valid C++ access specifier. Clang tokenizes the Qt keywords fine, but it becomes confused when it tries to work out the syntax and semantics of lines 42-43 and 59-60.
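To make the value of the table concrete, here is a minimal sketch of how its rows could be used to sort methods into signals, slots, and ordinary methods. It does not call libclang at all: AnnotatedToken, classifyMethods, and counterSampleTokens are hypothetical names invented for this illustration, and the sample rows are copied by hand from lines 26-30, 42-51, and 59-67 of the output above. It also assumes that signals and slots never appear as ordinary identifiers, which holds in Qt code because they are macros.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one row of the annotation table: the token
// spelling, plus the spelling and kind of the cursor it was mapped to.
struct AnnotatedToken {
    std::string token;
    std::string cursor;
    std::string kind;
};

// Walk the tokens in source order, remember which section we are in, and
// file each method name under "signals", "slots", or "methods".
std::map<std::string, std::vector<std::string>>
classifyMethods(const std::vector<AnnotatedToken>& tokens)
{
    std::map<std::string, std::vector<std::string>> result;
    std::string section = "methods";

    for (const AnnotatedToken& t : tokens) {
        if (t.token == "signals") {
            section = "signals";
        } else if (t.token == "slots") {
            section = "slots";
        } else if (t.kind == "CXXAccessSpecifier" && t.token != ":") {
            // Plain public/protected/private; a "slots" token that
            // follows immediately will override this.
            section = "methods";
        } else if (t.kind == "CXXMethod" && t.token == t.cursor) {
            // The one token that spells the method's own name.
            result[section].push_back(t.cursor);
        }
    }
    return result;
}

// A hand-copied excerpt of the table (assumption: same order as the output).
std::vector<AnnotatedToken> counterSampleTokens()
{
    return {
        {"void", "increment", "CXXMethod"},
        {"increment", "increment", "CXXMethod"},
        {"(", "increment", "CXXMethod"},
        {")", "increment", "CXXMethod"},
        {";", "Counter", "ClassDecl"},
        {"public", "", "CXXAccessSpecifier"},
        {"slots", "", "CXXAccessSpecifier"},
        {":", "", "CXXAccessSpecifier"},
        {"void", "add", "CXXMethod"},
        {"add", "add", "CXXMethod"},
        {"(", "add", "CXXMethod"},
        {"int", "i", "ParmDecl"},
        {"i", "i", "ParmDecl"},
        {")", "add", "CXXMethod"},
        {";", "Counter", "ClassDecl"},
        {"signals", "Counter", "ClassDecl"},
        {":", "", "CXXAccessSpecifier"},
        {"void", "countUpdated", "CXXMethod"},
        {"countUpdated", "countUpdated", "CXXMethod"},
        {"(", "countUpdated", "CXXMethod"},
        {"int", "i", "ParmDecl"},
        {"i", "i", "ParmDecl"},
        {")", "countUpdated", "CXXMethod"},
        {";", "Counter", "ClassDecl"},
    };
}
```

Calling classifyMethods(counterSampleTokens()) files increment under "methods", add under "slots", and countUpdated under "signals". In a real tool the rows would come from clang_tokenize and clang_annotateTokens rather than a hand-written list.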
Now let’s look at source; I’m not going to go into all the details, refer to these other articles I wrote to learn more about the basics of libclang:
An Introduction to Clang Part 2
Let’s look at an important part of main. As the program output above shows, Clang tokenizes the Qt keywords fine but cannot make sense of them syntactically. As soon as Clang encounters this syntax error, it will stop parsing, and we want it to keep going past that point. So we do what the moc tool does: we redefine the Qt keywords that cause the problem with ‘-D’ options. We also specify include paths to our Qt installation.
    const char* args[] = { "-c", "-x", "c++",
                           "-D", "MYEXPORTDECL=",
                           "-D", "slots=",
                           "-D", "signals=public",
                           "-D", "Q_OBJECT=",
                           "-D", "Q_PROPERTY(...)=",
                           "-I", "/usr/local/Trolltech/Qt-4.8.5/include",
                           "-I", "/usr/local/Trolltech/Qt-4.8.5/include/QtCore" };
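To see what these definitions do to the header, here is a rough sketch that applies the object-like ones as whole-word text substitutions. This is only an approximation for illustration: the real preprocessor works on tokens and also handles function-like macros such as Q_PROPERTY(...), which this sketch does not attempt. The names applyDefines and qtDefines are hypothetical and are not part of ClangAnnotation or libclang.

```cpp
#include <cctype>
#include <map>
#include <string>

// Replace every whole identifier found in `defines` with its replacement
// text, leaving all other characters untouched.
std::string applyDefines(const std::string& source,
                         const std::map<std::string, std::string>& defines)
{
    std::string out;
    std::string word;

    // Emit the identifier collected so far, substituted if it is defined.
    auto flush = [&]() {
        if (word.empty())
            return;
        auto it = defines.find(word);
        out += (it != defines.end()) ? it->second : word;
        word.clear();
    };

    for (char c : source) {
        if (std::isalnum(static_cast<unsigned char>(c)) || c == '_') {
            word += c;   // still inside an identifier
        } else {
            flush();     // identifier ended; emit it, then the separator
            out += c;
        }
    }
    flush();
    return out;
}

// The object-like definitions from the args array above.
std::map<std::string, std::string> qtDefines()
{
    return { {"MYEXPORTDECL", ""},
             {"slots", ""},
             {"signals", "public"},
             {"Q_OBJECT", ""} };
}
```

With these definitions, applyDefines("public slots:", qtDefines()) yields "public :" and applyDefines("signals:", qtDefines()) yields "public:". Both are ordinary access specifiers, which is why the parser can continue, and also why the parsed result alone no longer says which methods were signals or slots.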
Most of the work is done in the visitor function:
    12  CXChildVisitResult visitor(CXCursor cursor, CXCursor parent,
    13                             CXClientData clientData)
    14  {
    15      string cursorSpelling = toStdString(cursor);
    16
    17      CXTranslationUnit tu = clang_Cursor_getTranslationUnit(cursor);
    18      CXSourceRange range = clang_getCursorExtent(cursor);
    19
    20      if ( cursor.kind == CXCursor_ClassDecl )
    21      {
    22          if ( className == "" || className == cursorSpelling )
    23          {
    24              cout << "Found class " << cursorSpelling << endl;
    25
    26              //
    27              // Now we'll tokenize the range, which encompasses the whole class,
    28              // and annotate it.
    29              //
    30              CXToken* tokens = 0;
    31              unsigned int numTokens;
    32              clang_tokenize(tu, range, &tokens, &numTokens);
    33
    34              CXCursor cursors[numTokens];  // variable-length array: a GCC/Clang extension
    35              clang_annotateTokens(tu, tokens, numTokens, cursors);
    36
    37              cout << std::left << setw(18) << "Token" << setw(18) << "Cursor" << setw(25) <<
    38                  "Cursor Kind" << setw(24) << "Cursor Type" << endl;
    39              cout << "=================================================================================" <<
    40                  endl;
    41              for ( unsigned int idx=0; idx<numTokens; ++idx )
    42              {
    43                  CXType type = clang_getCursorType(cursors[idx]);
    44                  string cursorSpelling = toStdString(cursors[idx]);
    45                  string tokenSpelling = toStdString(tokens[idx], tu);
    46                  string typeSpelling = toStdString(type);
    47                  cout << std::left << setw(18) << tokenSpelling << setw(18) << cursorSpelling <<
    48                      setw(25) << toStdString(cursors[idx].kind) << setw(24) << typeSpelling << endl;
    49              }
    50              if ( numTokens > 0 )
    51                  cout << endl << endl;
    52          }
    53      }
    54
    55      return CXChildVisit_Continue;
    56  }
Line 20 ensures that we process only class declarations, while line 22 determines whether to process only the class named on the command line, or all classes. Lines 30 to 32 tokenize the extent of the cursor that is passed in, which in this case covers the entire class declaration. Lines 34 to 35 perform the annotation of the tokens, giving us an array of CXCursor objects the same size as the token array. Then we simply print the table. Line 55 returns CXChildVisit_Continue, which tells libclang to move on to the next sibling cursor without descending into the children of the current one.
My next article will describe how I further dissect the class declaration to obtain method and field information, and provide the beginnings of an API.
The source for this article can be found on GitHub at this URL: git@github.com:MarkVTech/ClangAnnotation.git.