Architecture
ruby-bindgen uses libclang, via ffi-clang, to parse C and C++ header files.
Libclang represents the Abstract Syntax Tree (AST) as a hierarchy of cursors. Each cursor is a node representing a declaration, statement, or expression. For example, parsing this header:
namespace cv {
  class Mat {
    Mat(int rows, int cols, int type);
    int rows;
    bool empty() const;
  };
}
Produces a cursor tree like:
namespace (cv)
└── class_decl (Mat)
    ├── constructor (Mat)
    │   ├── parm_decl (rows)
    │   ├── parm_decl (cols)
    │   └── parm_decl (type)
    ├── field_decl (rows)
    └── cxx_method (empty)
For C and C++ bindings, ruby-bindgen uses a visitor pattern-style traversal to walk this tree. Each cursor kind is dispatched to a corresponding visit_* method (e.g., visit_class_decl, visit_cxx_method), which generates binding code via ERB templates.
flowchart TD
subgraph Input
H["C/C++ headers"]
Y["bindings.yaml"]
end
subgraph Parsing
FC["ffi-clang"]
AST["AST"]
FC --> AST
end
subgraph "Code Generation"
V1["Generator"]
ERB1["ERB templates"]
V1 --> ERB1
end
subgraph Output
F["Rice/FFI files"]
end
H --> FC
Y --> V1
AST --> V1
ERB1 --> F
For CMake bindings, ruby-bindgen runs as a second pass, scanning the output directory for previously generated *-rb.cpp files.
flowchart TD
subgraph Input
Y["cmake-bindings.yaml"]
S["*-rb.cpp"]
end
subgraph "Code Generation"
V2["CMake generator"]
ERB2["ERB templates"]
V2 --> ERB2
end
subgraph Output
F["CMakeLists.txt / CMakePresets.json"]
end
Y --> V2
S --> V2
ERB2 --> F
Source Layout
The key classes live under lib/ruby-bindgen/. Each output format (Rice, FFI, CMake) has its own directory under generators/ containing both the generator implementation and its ERB templates.
lib/ruby-bindgen/
├── config.rb # YAML config loading
├── inputter.rb # Header file discovery
├── outputter.rb # File writing with cleanup
├── parser.rb # ffi-clang AST parsing
├── name_mapper.rb # Exact/regex name remapping
├── namer.rb # C++ → Ruby name conversion
├── symbols.rb # skip / version / override matching
├── symbol_entry.rb # Per-symbol skip/version/override record
├── symbol_candidates.rb # Candidate name generation for lookup
├── type_pointer_formatter.rb # Pointer type formatting helpers
├── version.rb
├── refinements/ # Extensions to ffi-clang and stdlib classes
│   ├── cursor.rb # Cursor: ruby_name, cruby_name, anonymous_definer, namer
│   └── string.rb # String: camelize, underscore, upcase_first
└── generators/
    ├── generator.rb # Base class shared by generators
    ├── rice/ # C++ Rice binding generator
    │   ├── rice.rb # Main C++ generator (AST visitor); the helpers
    │   │           # below are nested under Rice (e.g.
    │   │           # RubyBindgen::Generators::Rice::TypeSpeller).
    │   ├── type_speller.rb # Reconstructs qualified C++ type names
    │   ├── type_index.rb # Index of typedefs and template instantiations
    │   ├── signature_builder.rb # Builds Rice method signatures
    │   ├── template_resolver.rb # Class template resolution
    │   ├── iterator_collector.rb # Detects begin/end iterator pairs
    │   ├── function_pointer.rb # Function pointer typedef handling
    │   ├── reference_qualifier.rb # Reference / const qualifiers
    │   └── *.erb # ERB templates
    ├── ffi/ # C FFI binding generator
    │   ├── ffi.rb # Main C generator
    │   └── *.erb # ERB templates
    └── cmake/ # CMake build file generator
        ├── cmake.rb # Main CMake generator
        ├── guard.rb # Per-target build guards
        └── *.erb # ERB templates
Most generator methods delegate to ERB templates for code generation. Each template receives the current cursor and any generator state as local variables, and outputs a string of generated code.
For example, the Rice generator's cxx_method.erb template generates a define_method call:
define_method<%= signature %>("<%= name %>", &<%= qualified_parent %>::<%= cursor.spelling %>,
<%= all_args.join(", ") %>)
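Rendering such a template can be sketched with Ruby's standard ERB library. This is a minimal illustration, not ruby-bindgen's actual rendering helper: the template string is a simplified stand-in for cxx_method.erb, and the method name render_method and its parameters are hypothetical.

```ruby
require "erb"

# Simplified stand-in for a template such as cxx_method.erb (hypothetical content).
TEMPLATE = 'define_method("<%= name %>", &<%= qualified_parent %>::<%= spelling %>)'

# Render the template; each hash key becomes a local variable inside the template.
def render_method(name:, qualified_parent:, spelling:)
  ERB.new(TEMPLATE).result_with_hash(
    name: name, qualified_parent: qualified_parent, spelling: spelling
  )
end

puts render_method(name: "empty?", qualified_parent: "cv::Mat", spelling: "empty")
```

Passing values through result_with_hash keeps the template free of references to generator internals, which is one plausible way to read "receives the current cursor and any generator state as local variables".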
Key Classes
Config
The Config class loads the YAML configuration file and resolves platform-specific settings. It detects whether to use clang: or clang-cl: based on RUBY_PLATFORM (mswin uses clang-cl:; Linux, macOS, and MinGW use clang:), resolves relative paths against the config file's directory, and provides hash-like access to all configuration values.
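The platform check and path resolution described above might look roughly like the following. Both helper names (toolchain_key, resolve_path) are hypothetical and only illustrate the behavior; the real Config class may structure this differently.

```ruby
# Pick the toolchain section of the YAML config for a given platform string.
# MSVC builds of Ruby report a platform containing "mswin"; Linux, macOS,
# and MinGW fall through to the clang: section.
def toolchain_key(platform = RUBY_PLATFORM)
  platform.include?("mswin") ? "clang-cl" : "clang"
end

# Resolve a possibly relative path against the directory of the config file.
# Absolute paths are returned unchanged.
def resolve_path(path, config_path)
  File.expand_path(path, File.dirname(config_path))
end

puts toolchain_key
puts resolve_path("include", "/project/config/bindings.yaml")
```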
Inputter
The Inputter class discovers header files to process. Given a base directory and glob patterns from the config (match: and skip:), it iterates over matching files and yields both absolute and relative paths.
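As a rough sketch of this discovery step, the helper below globs under a base directory, drops anything matching a skip pattern, and yields both path forms. The name each_header and its keyword arguments are assumptions for illustration, not the Inputter's actual interface.

```ruby
require "tmpdir"
require "fileutils"

# Yield [absolute_path, relative_path] for every file under base_dir that
# matches a match: glob and none of the skip: globs. Hypothetical helper.
def each_header(base_dir, match:, skip: [])
  relative_paths = match.flat_map { |pattern| Dir.glob(pattern, base: base_dir) }
  relative_paths.uniq.sort.each do |relative|
    next if skip.any? { |pattern| File.fnmatch?(pattern, relative, File::FNM_PATHNAME) }

    yield File.join(base_dir, relative), relative
  end
end

# Usage with a throwaway directory tree.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "core", "detail"))
  FileUtils.touch(File.join(dir, "core", "mat.hpp"))
  FileUtils.touch(File.join(dir, "core", "detail", "impl.hpp"))

  each_header(dir, match: ["**/*.hpp"], skip: ["**/detail/*.hpp"]) do |absolute, relative|
    puts "#{relative} -> #{absolute}"
  end
end
```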
Parser
The Parser class wraps ffi-clang's Index and drives the processing loop:
def generate(visitor)
  visitor.visit_start
  inputter.each do |path, relative_path|
    translation_unit = @index.parse_translation_unit(path, clang_args, [],
      [:detailed_preprocessing_record, :skip_function_bodies])
    visitor.visit_translation_unit(translation_unit, path, relative_path)
  end
  visitor.visit_end
end
For each header file, it calls parse_translation_unit, which returns a translation unit object. Visitors access the root cursor via translation_unit.cursor.
Parse options include :skip_function_bodies (we only need declarations, not implementations) and :detailed_preprocessing_record (to see preprocessor directives).
The parser also checks diagnostics after each translation unit and raises on fatal/error diagnostics.
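The diagnostic check can be sketched as below. To keep the example runnable without libclang, it uses a stand-in Diagnostic struct with severity and spelling fields; the real objects come from ffi-clang, and the helper name check_diagnostics! is hypothetical.

```ruby
# Stand-in for a libclang diagnostic (assumption: severity is a symbol such
# as :warning, :error, or :fatal, and spelling is the message text).
Diagnostic = Struct.new(:severity, :spelling)

# Raise if any diagnostic is an error or worse; warnings and notes pass through.
def check_diagnostics!(diagnostics, path)
  failures = diagnostics.select { |d| %i[error fatal].include?(d.severity) }
  return if failures.empty?

  messages = failures.map { |d| "#{d.severity}: #{d.spelling}" }
  raise "Failed to parse #{path}:\n#{messages.join("\n")}"
end

check_diagnostics!([Diagnostic.new(:warning, "unused variable")], "mat.hpp")
puts "warnings alone do not abort generation"
```

Failing early on errors avoids generating bindings from a partially parsed translation unit, where missing declarations would otherwise go unnoticed.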
Outputter
The Outputter class writes generated files to the output directory. It tracks all written paths and applies whitespace cleanup (removing excessive blank lines and blank lines before closing braces) to keep the output tidy.
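A whitespace cleanup of this kind can be approximated with two regular-expression passes. This is only a sketch of the idea; the function name tidy and the exact patterns are assumptions, not the Outputter's implementation.

```ruby
# Collapse runs of blank lines and drop blank lines directly before a closing brace.
def tidy(source)
  source
    .gsub(/\n(?:[ \t]*\n){2,}/, "\n\n")          # keep at most one blank line in a row
    .gsub(/\n(?:[ \t]*\n)+([ \t]*\})/, "\n\\1")  # no blank line before "}"
end

puts tidy("void init()\n{\n  a();\n\n\n\n  b();\n\n}\n")
```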
Namer
The Namer class converts C++ names to Ruby conventions:
- CamelCase class names stay as-is (Ruby classes are CamelCase)
- camelCase and PascalCase method names become snake_case
- isFoo() / is_foo() become foo? (any bool-returning method whose name starts with is, plus any zero-arg bool-returning method, gets the ? suffix)
- C++ operators map to Ruby operators (operator+ → +, operator== → ==)
- operator[] maps to both [] and []= (if the return type is a reference)
- operator() maps to call
- Conversion operators like operator int() map to to_i, operator string() to to_s
- C variable names for Rice classes use the rb_c prefix (e.g., rb_cCvMat)
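The method-name rules can be illustrated with a simplified sketch. The helpers underscore and ruby_method_name are written for this example only and cover just the snake_case and predicate rules; the real Namer handles operators, conversions, and more edge cases.

```ruby
# Convert camelCase / PascalCase to snake_case.
def underscore(name)
  name
    .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2') # split an acronym from the next word
    .gsub(/([a-z\d])([A-Z])/, '\1_\2')     # split a lowercase/digit from an uppercase
    .downcase
end

# Ruby method name for a C++ method (simplified version of the rules above).
def ruby_method_name(cpp_name, returns_bool:, arg_count:)
  name = underscore(cpp_name)
  return name unless returns_bool

  if name.start_with?("is_")
    "#{name.delete_prefix('is_')}?" # isFoo() / is_foo() -> foo?
  elsif arg_count.zero?
    "#{name}?"                      # zero-arg bool method -> predicate
  else
    name
  end
end

puts ruby_method_name("isContinuous", returns_bool: true, arg_count: 0)
puts ruby_method_name("getRows", returns_bool: false, arg_count: 0)
```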
Extensions to ffi-clang and stdlib
The lib/ruby-bindgen/refinements/ directory holds monkey-patches that
extend ffi-clang and Ruby's String. Despite the directory name, they are
not Ruby refinements: they are loaded globally, so any code that loads
ruby-bindgen sees the additions.
- Cursor (refinements/cursor.rb): adds ruby_name, cruby_name, anonymous_definer, and a class-level namer accessor that the generators set during generate.
- String (refinements/string.rb): adds camelize, underscore, and upcase_first for name conversion.
Reconstructing qualified C++ type names is handled by the TypeSpeller class
(lib/ruby-bindgen/generators/rice/type_speller.rb), which calls
FFI::Clang::Type#fully_qualified_name (provided directly by ffi-clang ≥
0.16) and post-processes the result. See Type Spelling.
Rice Generator Details
The Rice generator is the largest part of the codebase because C++ has the most features to handle. Some notable aspects:
Traversal
The generator traverses the AST recursively. Each cursor kind is dispatched to a visit_* method (e.g., visit_class_decl, visit_cxx_method) which checks whether the cursor should be skipped and, if not, renders an ERB template to generate the binding code.
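The dispatch-and-skip pattern can be sketched without libclang by using a stand-in cursor struct. Everything here is illustrative: the Cursor struct, the SketchVisitor class, and the skip-by-spelling rule are assumptions, and the handlers record strings where the real generator would render ERB templates.

```ruby
# Minimal stand-in for a libclang cursor: a kind, a spelling, and child cursors.
Cursor = Struct.new(:kind, :spelling, :children)

class SketchVisitor
  attr_reader :output

  def initialize(skip: [])
    @skip = skip
    @output = []
  end

  # Dispatch on the cursor kind, e.g. :class_decl -> visit_class_decl,
  # then recurse into the children. Kinds without a handler emit nothing.
  def visit(cursor, depth = 0)
    return if @skip.include?(cursor.spelling)

    handler = "visit_#{cursor.kind}"
    send(handler, cursor, depth) if respond_to?(handler, true)
    cursor.children.each { |child| visit(child, depth + 1) }
  end

  private

  def visit_namespace(cursor, depth)
    emit("namespace #{cursor.spelling}", depth)
  end

  def visit_class_decl(cursor, depth)
    emit("class #{cursor.spelling}", depth)
  end

  def visit_cxx_method(cursor, depth)
    emit("method #{cursor.spelling}", depth)
  end

  def emit(text, depth)
    @output << ("  " * depth) + text
  end
end

tree = Cursor.new(:namespace, "cv", [
  Cursor.new(:class_decl, "Mat", [
    Cursor.new(:field_decl, "rows", []),
    Cursor.new(:cxx_method, "empty", [])
  ])
])

visitor = SketchVisitor.new(skip: ["rows"])
visitor.visit(tree)
puts visitor.output
```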
Filtering
Before generating code for a cursor, the visitor applies several filters. See filtering for details.
Template Handling
C++ class templates require special treatment. See templates for details.
Type Spelling
Reconstructing correct C++ type names from libclang's type information is one of the trickiest parts of the codebase. The TypeSpeller class (lib/ruby-bindgen/generators/rice/type_speller.rb) handles:
- Namespace qualification (cv::Mat not Mat)
- Template argument qualification (std::vector<cv::Point> not std::vector<Point>)
- Typedef resolution (knowing when to use the alias vs the underlying type)
- Dependent types in templates (adding typename where required)
- Elaborated types (enum Foo vs Foo)
Default Arguments
Libclang provides limited information about default argument values. ruby-bindgen extracts default values from the source text and wraps them in static_cast with fully qualified types to ensure they compile in the generated context:
// Original: void resize(int size, const Scalar& value = Scalar())
// Generated:
Arg("value") = static_cast<const cv::Scalar&>(cv::Scalar())
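The final string assembly for such an argument is straightforward once the qualified type and default expression are known. The helper below is a hypothetical sketch of that last step only; extracting the default from source text and qualifying the names is the harder part and is not shown.

```ruby
# Build the Rice Arg(...) expression for a parameter. When a default value is
# present, wrap it in static_cast with the fully qualified parameter type.
def rice_arg(name, qualified_type: nil, qualified_default: nil)
  arg = %(Arg("#{name}"))
  return arg if qualified_default.nil?

  "#{arg} = static_cast<#{qualified_type}>(#{qualified_default})"
end

puts rice_arg("size")
puts rice_arg("value", qualified_type: "const cv::Scalar&", qualified_default: "cv::Scalar()")
```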