Dissecting C++ Classes and QObjects with Libclang

This article is about “dissecting” C++ records (classes and structs) to determine their constituent constructors, methods, fields, etc.  Libclang is, of course, designed to do this.  But, libclang does not understand the extra “keywords” and pseudo-macros inherent to Qt programming; I will describe the solution I found to parsing this non-standard code.

In case you are not familiar with Qt, the designers added several “keywords” to C++ to enable the Qt signal/slot concept. I won’t go into details on this subject here; please refer to this posting for more information: Signals & Slots.  In short, you use the signals: “access specification” to introduce one or more signal specifications to a class, and the [public|protected|private] slots: access specification to introduce one or more slots (to which signal emissions are dispatched). There are also a number of Qt pseudo-macros, chief among them Q_PROPERTY, that are of interest when parsing a QObject.

The Qt framework uses a program called moc (for Meta Object Compiler) to parse QObject declarations.  Moc understands, among other things, the signals and slots keywords and generates a new C++ file called moc_<header name>.cpp.  Here is a good introduction to moc: Meta Object Compiler.  Moc isn’t very flexible; it doesn’t work as an exploration tool of the sort I want. That’s where libclang came into the picture.  My goal isn’t to replace moc, at least not at this point.  Nor am I trying to create a Clang plugin, such that the process of parsing QObjects would be integrated into the C++ compilation process. I want to create a simple code indexing system, and corresponding API, to enable simple programmatic access to a code base.

With libclang, you can programmatically explore the structure of any valid C/C++/Objective-C code. Libclang helps move C++ tooling into a new dimension.  However, a Qt QObject declaration is NOT valid C++ code, due to the custom signal/slot keywords added to it, so libclang will not parse a QObject header as written.  No problem: simply redefine signals: to public:, for instance, as moc does!  But you then lose the information about which methods in a class are signals, which are slots, and which are “normal” methods.

I started with the clang_tokenize function; it takes a libclang CXCursor and produces an array of tokens, including the signal and slot tokens.  At this point I could tell that a class declaration contained signals and/or slots, and their location in the source code.  But, I still didn’t have enough information; I briefly tried “parsing” the token array myself to figure out

Poring over the Clang Doxygen pages, I found clang_annotateTokens.  This function takes the list of tokens found with clang_tokenize, and annotates them with CXCursor information.  Now we can tell what cursor a token is associated with, the type of the cursor, and other information.  We now can see the signals: and slots: markers in the code, AND completely understand the method signatures and any inline code.

I wrote a simple program that demonstrates annotation. The source includes a simple header file, Counter.h, to exercise C++ token annotation:

#include <QObject>

class Counter : public QObject
{
Q_OBJECT
public:
    Counter();
    explicit Counter(int initialValue=0);

    void increment();
    void reset();
    int getCount() const;

public slots:
    void add(int i);
    int subtract(int i);

signals:
    void countUpdated(int i);

private:
    int count_;
};

 

Invoking ClangAnnotation Counter.h Counter results in this output:

      1 Found class Counter
      2 Token             Cursor            Cursor Kind              Cursor Type
      3 =================================================================================
      4 class             Counter           ClassDecl                Counter
      5 Counter           Counter           ClassDecl                Counter
      6 :                 Counter           ClassDecl                Counter
      7 public            class QObject     C++ base class specifier QObject
      8 QObject           class QObject     TypeRef                  QObject
      9 {                 Counter           ClassDecl                Counter
     10 Q_OBJECT          Counter           ClassDecl                Counter
     11 public                              CXXAccessSpecifier
     12 :                                   CXXAccessSpecifier
     13 Counter           Counter           CXXConstructor           void ()
     14 (                 Counter           CXXConstructor           void ()
     15 )                 Counter           CXXConstructor           void ()
     16 ;                 Counter           ClassDecl                Counter
     17 explicit          Counter           CXXConstructor           void (int)
     18 Counter           Counter           CXXConstructor           void (int)
     19 (                 Counter           CXXConstructor           void (int)
     20 int               initialValue      ParmDecl                 int
     21 initialValue      initialValue      ParmDecl                 int
     22 =                 initialValue      ParmDecl                 int
     23 0                                   IntegerLiteral           int
     24 )                 Counter           CXXConstructor           void (int)
     25 ;                 Counter           ClassDecl                Counter
     26 void              increment         CXXMethod                void ()
     27 increment         increment         CXXMethod                void ()
     28 (                 increment         CXXMethod                void ()
     29 )                 increment         CXXMethod                void ()
     30 ;                 Counter           ClassDecl                Counter
     31 void              reset             CXXMethod                void ()
     32 reset             reset             CXXMethod                void ()
     33 (                 reset             CXXMethod                void ()
     34 )                 reset             CXXMethod                void ()
     35 ;                 Counter           ClassDecl                Counter
     36 int               getCount          CXXMethod                int () const
     37 getCount          getCount          CXXMethod                int () const
     38 (                 getCount          CXXMethod                int () const
     39 )                 getCount          CXXMethod                int () const
     40 const             getCount          CXXMethod                int () const
     41 ;                 Counter           ClassDecl                Counter
     42 public                              CXXAccessSpecifier
     43 slots                               CXXAccessSpecifier
     44 :                                   CXXAccessSpecifier
     45 void              add               CXXMethod                void (int)
     46 add               add               CXXMethod                void (int)
     47 (                 add               CXXMethod                void (int)
     48 int               i                 ParmDecl                 int
     49 i                 i                 ParmDecl                 int
     50 )                 add               CXXMethod                void (int)
     51 ;                 Counter           ClassDecl                Counter
     52 int               subtract          CXXMethod                int (int)
     53 subtract          subtract          CXXMethod                int (int)
     54 (                 subtract          CXXMethod                int (int)
     55 int               i                 ParmDecl                 int
     56 i                 i                 ParmDecl                 int
     57 )                 subtract          CXXMethod                int (int)
     58 ;                 Counter           ClassDecl                Counter
     59 signals           Counter           ClassDecl                Counter
     60 :                                   CXXAccessSpecifier
     61 void              countUpdated      CXXMethod                void (int)
     62 countUpdated      countUpdated      CXXMethod                void (int)
     63 (                 countUpdated      CXXMethod                void (int)
     64 int               i                 ParmDecl                 int
     65 i                 i                 ParmDecl                 int
     66 )                 countUpdated      CXXMethod                void (int)
     67 ;                 Counter           ClassDecl                Counter
     68 private                             CXXAccessSpecifier
     69 :                                   CXXAccessSpecifier
     70 int               count_            FieldDecl                int
     71 count_            count_            FieldDecl                int
     72 ;                 Counter           ClassDecl                Counter
     73 }                 Counter           ClassDecl                Counter
     74 ;                                   InvalidFile

Had we instead invoked ClangAnnotation Counter.h (without the second argument, the class name), we would see all classes that are explicitly or implicitly pulled into the source code by #include statements. ClangAnnotation is written to extract only class declarations.  The table clearly shows the many-to-one relationship of tokens to cursors.  Note lines 42-44 and 59-60; the signals and slots tokens are easily found.  Compare them with lines 11 and 12, a valid C++ access specifier.  Clang tokenizes the Qt keywords fine, but will become confused when it tries to figure out the syntax and semantics of 42-43 and 59-60.
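To make the marker hunt concrete, here is a minimal sketch that works on token spellings alone, such as the strings clang_getTokenSpelling returns. The names SectionMarker and findSectionMarkers are made up for this illustration and are not part of ClangAnnotation. It also assumes the lowercase signals and slots spellings only, ignoring Q_SIGNALS and Q_SLOTS.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record describing one section marker found in a class body.
struct SectionMarker {
    std::string kind;        // "signals", "slots", or "plain"
    std::string access;      // "public", "protected", "private"; empty for signals
    std::size_t tokenIndex;  // index of the first token of the marker
};

// Scan a class's token spellings for "public:", "public slots:", "signals:"
// and friends. A "public" that is not followed by ":" or "slots :" (for
// example in a base class specifier) is ignored.
std::vector<SectionMarker> findSectionMarkers(const std::vector<std::string>& toks)
{
    std::vector<SectionMarker> out;
    auto isAccess = [](const std::string& s) {
        return s == "public" || s == "protected" || s == "private";
    };
    for (std::size_t i = 0; i + 1 < toks.size(); ++i) {
        if (toks[i] == "signals" && toks[i + 1] == ":") {
            out.push_back({"signals", "", i});
        } else if (isAccess(toks[i])) {
            if (toks[i + 1] == ":") {
                out.push_back({"plain", toks[i], i});
            } else if (i + 2 < toks.size() &&
                       toks[i + 1] == "slots" && toks[i + 2] == ":") {
                out.push_back({"slots", toks[i], i});
            }
        }
    }
    return out;
}
```

Fed the token column of the table above, this should return four markers: the plain public: section, public slots:, signals:, and private:. The public in the base class specifier is skipped because QObject follows it rather than a colon.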

Now let’s look at the source. I’m not going to go into all the details; refer to these other articles I wrote to learn more about the basics of libclang:

An Introduction to Clang

An Introduction to Clang Part 2

Let’s look at an important part of main.  When Clang hits the syntax errors caused by the Qt keywords, it stops parsing.  We want it to keep going past that point.  So we do what the moc tool does: we redefine the Qt keywords that cause the problem with ‘-D’ options.  We also specify include paths to our Qt installation.

    const char* args[] = { "-c", "-x", "c++",
        "-D", "MYEXPORTDECL=",
        "-D", "slots=",
        "-D", "signals=public",
        "-D", "Q_OBJECT=",
        "-D", "Q_PROPERTY(...)=",
        "-I", "/usr/local/Trolltech/Qt-4.8.5/include",
        "-I", "/usr/local/Trolltech/Qt-4.8.5/include/QtCore" };
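The Qt paths above are hard-coded for one machine. As a rough sketch of how the same argument list could be built for any installation, the helper below takes the include root and a list of module directories. The names makeClangArgs and toArgv, and both parameters, are assumptions introduced here and do not appear in ClangAnnotation. The "-c" and MYEXPORTDECL entries are left out for brevity.

```cpp
#include <string>
#include <vector>

// Build the arguments that neutralize the Qt keywords so Clang can parse a
// QObject header. qtIncludeRoot and modules are hypothetical parameters,
// e.g. "/usr/local/Trolltech/Qt-4.8.5/include" and {"QtCore"}.
std::vector<std::string> makeClangArgs(const std::string& qtIncludeRoot,
                                       const std::vector<std::string>& modules)
{
    std::vector<std::string> args = {
        "-x", "c++",
        "-D", "slots=",             // "public slots:" becomes "public :"
        "-D", "signals=public",     // "signals:" becomes "public:"
        "-D", "Q_OBJECT=",          // expands to nothing
        "-D", "Q_PROPERTY(...)=",   // swallow the whole pseudo-macro call
        "-I", qtIncludeRoot
    };
    for (const std::string& m : modules) {
        args.push_back("-I");
        args.push_back(qtIncludeRoot + "/" + m);
    }
    return args;
}

// libclang takes its arguments as an array of C strings. The returned
// pointers stay valid only while `args` is alive and unmodified.
std::vector<const char*> toArgv(const std::vector<std::string>& args)
{
    std::vector<const char*> argv;
    argv.reserve(args.size());
    for (const std::string& a : args)
        argv.push_back(a.c_str());
    return argv;
}
```

The result of toArgv can then be handed to clang_parseTranslationUnit as its command_line_args, with the vector's size as num_command_line_args.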

Most of the work is done in the visitor function:

     12 CXChildVisitResult visitor(CXCursor cursor, CXCursor parent,
     13     CXClientData clientData)
     14 {
     15     string cursorSpelling = toStdString(cursor);
     16 
     17     CXTranslationUnit tu = clang_Cursor_getTranslationUnit(cursor);
     18     CXSourceRange range = clang_getCursorExtent(cursor);
     19 
     20     if ( cursor.kind == CXCursor_ClassDecl )
     21     {
     22         if ( className == "" || className == cursorSpelling )
     23         {
     24             cout << "Found class " << cursorSpelling << endl;
     25 
     26             //
     27             // Now we'll tokenize the range, which encompasses the whole class,
     28             // and annotate it.
     29             //
     30             CXToken* tokens = 0;
     31             unsigned int numTokens;
     32             clang_tokenize(tu, range, &tokens, &numTokens);
     33 
     34             CXCursor cursors[numTokens];
     35             clang_annotateTokens(tu, tokens, numTokens, cursors);
     36 
     37             cout << std::left << setw(18) << "Token" << setw(18) << "Cursor" << setw(25) <<
     38                 "Cursor Kind"  << setw(24) << "Cursor Type" << endl;
     39             cout << "=================================================================================" <<
     40                 endl;
     41             for ( unsigned int idx=0; idx<numTokens; ++idx )
     42             {
     43                 CXType type = clang_getCursorType(cursors[idx]);
     44                 string cursorSpelling = toStdString(cursors[idx]);
     45                 string tokenSpelling = toStdString(tokens[idx], tu);
     46                 string typeSpelling = toStdString(type);
     47                 cout << std::left << setw(18) << tokenSpelling << setw(18) << cursorSpelling <<
     48                     setw(25) << toStdString(cursors[idx].kind) << setw(24) <<  typeSpelling << endl;
     49             }
     50             if ( numTokens > 0 )
     51                 cout << endl << endl;
     52         }
     53     }
     54 
     55     return CXChildVisit_Continue;
     56 }
     57 

Line 20 ensures that we process only class declarations, while line 22 determines whether to process only the class named on the command line, or all classes. Lines 30 to 32 tokenize the source range of the cursor that is passed in, which in this case spans the entire class declaration. Lines 34 to 35 perform the annotation of the tokens, giving us an array of CXCursor objects the same size as the token array.  Then we simply print the table.  Line 55 tells libclang to continue processing the file.
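Once tokens and cursors sit side by side, the two columns can be combined: the token spelling says which section we are in, and the cursor kind says which tokens name a method. The following is only a sketch of that idea, not the approach ClangAnnotation itself takes. Row and classifyMethods are hypothetical names, the kind strings are assumed to match the "Cursor Kind" column printed above, and overloaded methods collapse into one entry because the map is keyed by name.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One row of the annotation table (hypothetical type for this sketch).
struct Row {
    std::string token;   // token spelling
    std::string cursor;  // spelling of the cursor the token maps to
    std::string kind;    // cursor kind as printed, e.g. "CXXMethod"
};

// Label each method "signal", "slot", or "method" according to the most
// recent section marker seen before it in the token stream.
std::map<std::string, std::string> classifyMethods(const std::vector<Row>& rows)
{
    std::map<std::string, std::string> result;
    std::string section = "method";
    for (std::size_t i = 0; i < rows.size(); ++i) {
        const Row& r = rows[i];
        const bool colonNext = i + 1 < rows.size() && rows[i + 1].token == ":";
        const bool isAccess = r.token == "public" || r.token == "protected" ||
                              r.token == "private";
        if (r.token == "signals" && colonNext) {
            section = "signal";
        } else if (r.token == "slots" && colonNext) {
            section = "slot";
        } else if (isAccess && colonNext && r.kind == "CXXAccessSpecifier") {
            section = "method";  // a plain C++ access specifier ends a Qt section
        } else if (r.kind == "CXXMethod" && r.token == r.cursor) {
            // The token that spells the method's own name identifies it once.
            result[r.cursor] = section;
        }
    }
    return result;
}
```

Constructors and fields are skipped because their cursor kinds are CXXConstructor and FieldDecl rather than CXXMethod.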

My next article will describe how I further dissect the class declaration to obtain method and field information, and provide the beginnings of an API.

The source for this article can be found on GitHub at this URL: git@github.com:MarkVTech/ClangAnnotation.git.
