Type Spelling in ruby-bindgen
This document explains why generating correct, fully-qualified C++ type spellings from libclang is inherently complex, and how ruby-bindgen addresses that complexity.
TL;DR
- There is no single libclang API that always returns the right C++ spelling for code generation.
- Canonical types are often semantically correct but syntactically wrong for generated bindings.
- ruby-bindgen reconstructs type spellings based on cursor kind and context, with canonicalization only as a fallback.
The objective is not to compute a canonical or normalized type. The objective is to emit a C++ spelling that:
- Compiles correctly at the binding site
- Preserves user-facing API intent (typedefs, aliases, dependent names)
- Is stable across compilers and standard library implementations
The Problem
When generating Rice bindings, ruby-bindgen must emit fully-qualified C++ type names that are valid in generated code.
For example:
// Required
Constructor<cv::Mat, const cv::Range&, const cv::Range&>
// Incorrect (missing namespace qualification)
Constructor<Mat, const Range&, const Range&>
libclang exposes multiple APIs for retrieving type information, but none provide a complete, correct spelling for all C++ constructs.
This is a fundamental consequence of C++’s type system and libclang’s design goals.
Why a Canonical-Based Approach Fails
An initial strategy was to rely on type.canonical.spelling and then filter out implementation details. This approach fails in several unavoidable cases.
Failure Modes
| Feature | Result from canonical.spelling | Why this is incorrect |
|---|---|---|
| Typedefs / aliases | SizeArray → int[3] | Destroys alias intent and public API spelling |
| Templates | iterator → std::iterator | Loses specialization and template context |
| Dependent types | Missing typename, incorrect qualification | Canonical types erase dependency information |
| Namespaces | Over- or under-qualified names | Ignores lexical context |
In addition, canonical spellings frequently expose implementation details that must never appear in generated bindings:
- __gnu_cxx::__normal_iterator<...> (libstdc++)
- _Ty, _Alloc, _Vector_iterator (MSVC STL)
Filtering these reliably is not possible without reconstructing the original spelling logic, which defeats the purpose of canonicalization.
Key Insight
canonical.spelling answers:
“What is this type semantically?”
ruby-bindgen must answer:
“How must this type be written so that user code compiles correctly and reflects the original API?”
These questions are fundamentally different.
What libclang Provides
libclang exposes several partial representations of a type, each optimized for a different purpose:
| API | Returns | Limitation |
|---|---|---|
| type.spelling | Source-level spelling | Often unqualified |
| type.canonical.spelling | Fully desugared type | Erases typedefs and context |
| declaration.qualified_name | Namespace-qualified name | Drops template arguments |
| declaration.qualified_display_name | Name + template parameters | May omit enclosing namespaces |
No single API preserves both spelling fidelity and correct qualification.
ruby-bindgen’s Strategy
ruby-bindgen does not attempt to normalize all types through a single representation.
Instead, type_spelling reconstructs the correct spelling based on cursor kind and context, using canonical information only as a constrained fallback.
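The dispatch idea can be sketched in a few lines of Ruby. This is only an illustration of choosing a strategy from cursor kind and context: the Decl struct, the spelling_strategy method, and the strategy symbols are hypothetical stand-ins, not the ffi-clang API or the actual TypeSpeller code.

```ruby
# Hypothetical stand-in for a libclang declaration cursor (not the ffi-clang API).
Decl = Struct.new(:kind, :in_class_template, keyword_init: true)

# Picks a spelling strategy from the cursor kind and its context.
# The returned symbols are illustrative labels, not real method names.
def spelling_strategy(decl)
  case decl.kind
  when :cursor_class_template
    :reconstruct_template_arguments
  when :cursor_typedef_decl, :cursor_type_alias_decl
    # A typedef inside a class template is dependent and needs `typename`.
    decl.in_class_template ? :emit_dependent_typename : :preserve_alias_name
  when :cursor_class_decl
    :qualify_from_declaration
  else
    # Canonicalization is only the last resort.
    :canonical_fallback
  end
end

point2i    = Decl.new(kind: :cursor_typedef_decl, in_class_template: false)
value_type = Decl.new(kind: :cursor_typedef_decl, in_class_template: true)

spelling_strategy(point2i)    # => :preserve_alias_name
spelling_strategy(value_type) # => :emit_dependent_typename
```

The point of the sketch is that the same cursor kind can need different handling depending on where the declaration lives, which is why a single normalized representation is not enough.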
Design Principle
Spelling fidelity is primary; canonicalization is secondary and opportunistic.
Cursor-Specific Handling
cursor_class_template
Template definitions (e.g., template<typename T> class Vec).
- Reconstruct template arguments explicitly
- Use qualify_dependent_types_in_template_args
- Do not consult @type_name_map (template parameters must remain dependent)
cursor_typedef_decl inside a class template
Dependent typedefs (e.g., DataType<_Tp>::value_type).
- Emit typename (required by the C++ standard)
- Combine qualified_name with qualified_display_name
- Preserve dependency instead of resolving it
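A minimal sketch of how the two names might be combined, assuming qualified_name yields the namespace-qualified class name without template arguments and qualified_display_name yields the class name with its template parameters. The dependent_typedef_spelling helper is hypothetical and operates on plain strings rather than libclang cursors.

```ruby
# Hypothetical helper: spells a typedef that is nested in a class template.
#   qualified_name         - e.g. "cv::DataType"   (namespaces, no template args)
#   qualified_display_name - e.g. "DataType<_Tp>"  (template params, maybe no namespace)
#   member                 - e.g. "value_type"     (the typedef name)
def dependent_typedef_spelling(qualified_name, qualified_display_name, member)
  # Take the namespace prefix from qualified_name ("cv").
  namespace = qualified_name.rpartition("::").first

  # Take the simple class name plus its parameter list from the display name.
  head, angle, params = qualified_display_name.partition("<")
  class_with_params = "#{head.split('::').last}#{angle}#{params}"

  # `typename` tells the compiler that the dependent name refers to a type.
  scope = [namespace, class_with_params].reject(&:empty?).join("::")
  "typename #{scope}::#{member}"
end

dependent_typedef_spelling("cv::DataType", "DataType<_Tp>", "value_type")
# => "typename cv::DataType<_Tp>::value_type"
```

The template parameter _Tp is carried through untouched, so the emitted name stays dependent instead of being resolved to a concrete type.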
cursor_typedef_decl (non-dependent)
Public typedefs (e.g., typedef Point_<int> Point2i).
- Preserve the typedef name
- Do not desugar to the underlying type
- Qualify template arguments via @type_name_map
cursor_type_alias_decl
C++11 alias declarations (using Name = Type;).
- Treated identically to cursor_typedef_decl
- Required for cross-compiler support (MSVC favors using)
cursor_class_decl and related types
Concrete types and template instantiations.
- Start from fully_qualified_name
- Optionally consult canonical.spelling
  - Only when it does not introduce implementation types
- Qualify template arguments using qualify_template_args
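One way to gate the canonical fallback is to reject any spelling that contains a reserved identifier. C++ reserves names that contain a double underscore or that begin with an underscore followed by an uppercase letter, and standard library internals such as __normal_iterator or _Ty follow that pattern. The sketch below assumes this heuristic; the helper names are hypothetical, and the real check in ruby-bindgen may differ.

```ruby
# Matches identifiers reserved for the C++ implementation:
# any name containing "__", or a name starting with "_" plus an uppercase letter.
RESERVED_IDENTIFIER = /\b(?:\w*__\w*|_[A-Z]\w*)\b/

# Hypothetical predicate: does this spelling expose standard library internals?
def leaks_implementation_detail?(spelling)
  spelling.match?(RESERVED_IDENTIFIER)
end

# Hypothetical chooser: use the canonical spelling only when it is clean,
# otherwise keep the spelling reconstructed from the declaration.
def choose_spelling(declared, canonical)
  leaks_implementation_detail?(canonical) ? declared : canonical
end

choose_spelling("std::vector<int>::iterator",
                "__gnu_cxx::__normal_iterator<int *, std::vector<int>>")
# => "std::vector<int>::iterator"

choose_spelling("Mat", "cv::Mat")
# => "cv::Mat"
```

This only detects leaked internals. It does not recover the original spelling, which is why the declaration-based name remains the starting point.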
The @type_name_map
During translation unit processing, ruby-bindgen builds a map from simple identifiers to fully-qualified names:
{
"Range" => "cv::Range",
"Mat" => "cv::Mat",
"Pixel" => "iter::Pixel"
}
This map is used to qualify unqualified template arguments, not to rewrite dependent names or template parameters.
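A rough sketch of how such a map could be applied to a type spelling. The qualify_with_map helper and its template_params option are hypothetical; the idea is to replace only bare identifiers, skip names that are already qualified, and leave template parameters alone.

```ruby
TYPE_NAME_MAP = {
  "Range" => "cv::Range",
  "Mat"   => "cv::Mat",
  "Pixel" => "iter::Pixel"
}.freeze

# Hypothetical helper: qualifies bare identifiers found in the map.
# The lookbehind skips identifiers preceded by a word character or ":",
# so a name that already follows "::" is never rewritten.
def qualify_with_map(spelling, map, template_params: [])
  spelling.gsub(/(?<![\w:])[A-Za-z_]\w*/) do |identifier|
    # Template parameters must stay dependent, so they are never looked up.
    next identifier if template_params.include?(identifier)

    map.fetch(identifier, identifier)
  end
end

qualify_with_map("const Range&", TYPE_NAME_MAP)
# => "const cv::Range&"

qualify_with_map("std::vector<Mat>", TYPE_NAME_MAP)
# => "std::vector<cv::Mat>"

qualify_with_map("cv::Range", TYPE_NAME_MAP)
# => "cv::Range"

qualify_with_map("Ptr<Pixel>", TYPE_NAME_MAP, template_params: ["Pixel"])
# => "Ptr<Pixel>"
```

A map keyed on simple identifiers cannot tell apart two types with the same short name in different namespaces, so a lookup like this is best limited to template arguments that are otherwise unqualified.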
Where Canonicalization Works
canonical.spelling is useful only in limited cases:
- Non-dependent cursor_class_decl types
- Situations where alias preservation is irrelevant
- As a fallback sanity check for namespace qualification
It is not the primary source of truth.
Why Not Use Clang’s C++ API?
Clang’s C++ API (libTooling) provides:
- PrintingPolicy
- Direct AST printers for fully-qualified names
- Precise control over dependent type emission
However:
- ruby-bindgen uses ffi-clang, which exposes only libclang’s C API
- libclang is designed for IDEs and static analysis tools, not code generation
- Reimplementing spelling logic is unavoidable in this environment
ruby-bindgen’s type spelling logic exists specifically to bridge this gap.
Summary
There is no single libclang call that can produce correct C++ type spellings in all cases.
The complexity in ruby-bindgen is inherent:
- C++ has typedefs, aliases, templates, dependent types, and contextual name lookup
- libclang exposes these differently depending on cursor kind
- Correct code generation requires spelling reconstruction, not canonicalization
The current design reflects these constraints deliberately.
Code Locations
All type-spelling logic lives in the RubyBindgen::Generators::Rice::TypeSpeller class:
lib/ruby-bindgen/generators/rice/type_speller.rb
- Top-level entry points: type_spelling, type_spellings, qualified_class_name, qualified_display_name
- Declared / unexposed type handling: type_spelling_declared, type_spelling_unexposed, type_spelling_pointer
- Template handling: qualify_template_args, qualify_dependent_types_in_template_args, qualify_template_parameter_packs, qualify_class_template_typedefs
- Static member qualification: qualify_class_static_members
The fully_qualified_name helper used by TypeSpeller is provided
directly by FFI::Clang::Type (ffi-clang ≥ 0.16).