Architecture
ruby-bindgen uses libclang, via ffi-clang, to parse C and C++ header files.
Libclang represents the Abstract Syntax Tree (AST) as a hierarchy of cursors. Each cursor is a node representing a declaration, statement, or expression. For example, parsing this header:
namespace cv {
  class Mat {
    Mat(int rows, int cols, int type);
    int rows;
    bool empty() const;
  };
}
This produces a cursor tree like:
namespace (cv)
└── class_decl (Mat)
    ├── constructor (Mat)
    │   ├── parm_decl (rows)
    │   ├── parm_decl (cols)
    │   └── parm_decl (type)
    ├── field_decl (rows)
    └── cxx_method (empty)
For C and C++ bindings, ruby-bindgen uses the visitor pattern to walk this tree. Each cursor kind is dispatched to a corresponding visit_* method (e.g., visit_class_decl, visit_cxx_method), which generates binding code via ERB templates.
flowchart TD
subgraph Input
H["C/C++ headers"]
Y["bindings.yaml"]
end
subgraph Parsing
FC["ffi-clang"]
AST["AST"]
FC --> AST
end
subgraph "Code Generation"
V1["Visitor"]
ERB1["ERB templates"]
V1 --> ERB1
end
subgraph Output
F["Rice/FFI files"]
end
H --> FC
Y --> V1
AST --> V1
ERB1 --> F
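The dispatch from cursor kind to visit_* method can be sketched in plain Ruby. This is a minimal illustration rather than ruby-bindgen's actual code: FakeCursor stands in for an ffi-clang cursor, and SketchVisitor with its method bodies is hypothetical.

```ruby
# Stand-in for an ffi-clang cursor: a kind, a spelling, and child cursors.
FakeCursor = Struct.new(:kind, :spelling, :children)

# Hypothetical visitor: each cursor kind is routed to a visit_<kind> method.
class SketchVisitor
  attr_reader :output

  def initialize
    @output = []
  end

  def visit(cursor)
    handler = "visit_#{cursor.kind}"
    if respond_to?(handler, true)
      send(handler, cursor)
    else
      visit_children(cursor) # no handler for this kind: just descend
    end
  end

  def visit_children(cursor)
    cursor.children.each { |child| visit(child) }
  end

  def visit_namespace(cursor)
    visit_children(cursor)
  end

  def visit_class_decl(cursor)
    @output << "class #{cursor.spelling}"
    visit_children(cursor)
  end

  def visit_cxx_method(cursor)
    @output << "method #{cursor.spelling}"
  end
end

tree = FakeCursor.new(:namespace, "cv", [
  FakeCursor.new(:class_decl, "Mat", [
    FakeCursor.new(:field_decl, "rows", []),
    FakeCursor.new(:cxx_method, "empty", [])
  ])
])

visitor = SketchVisitor.new
visitor.visit(tree)
puts visitor.output
```

In this sketch a kind without a handler (field_decl here) produces no output of its own; the real visitors generate code through ERB templates instead of appending strings.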
For CMake bindings, ruby-bindgen runs as a second pass, scanning the output directory for previously generated *-rb.cpp files.
flowchart TD
subgraph Input
Y["bindings.yaml"]
S["*-rb.cpp"]
end
subgraph "Code Generation"
V2["CMake visitor"]
ERB2["ERB templates"]
V2 --> ERB2
end
subgraph Output
F["CMakeLists.txt / CMakePresets.json"]
end
Y --> V2
S --> V2
ERB2 --> F
Source Layout
The key classes live under lib/ruby-bindgen/. Each output format (Rice, FFI, CMake) has its own directory under visitors/ containing both the visitor implementation and its ERB templates.
lib/ruby-bindgen/
├── config.rb              # YAML config loading
├── inputter.rb            # Header file discovery
├── outputter.rb           # File writing with cleanup
├── parser.rb              # ffi-clang AST parsing
├── namer.rb               # C++ → Ruby name conversion
├── version.rb
├── refinements/           # Extensions to ffi-clang classes
│   ├── cursor.rb
│   ├── type.rb
│   ├── translation_unit.rb
│   ├── string.rb
│   └── source_range.rb
└── visitors/
    ├── rice/              # C++ Rice binding generator
    │   ├── rice.rb        # Visitor (~2100 lines)
    │   └── *.erb          # ERB templates
    ├── ffi/               # C FFI binding generator
    │   ├── ffi.rb         # Visitor
    │   └── *.erb          # ERB templates
    └── cmake/             # CMake build file generator
        ├── cmake.rb       # Visitor
        └── *.erb          # ERB templates
Most visitor methods delegate to ERB templates for code generation. Each template receives the current cursor and any visitor state as local variables, and outputs a string of generated code.
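How a template receives its locals can be sketched with Ruby's standard ERB library. The template text, the MethodCursor struct, and its values below are made up for illustration; ruby-bindgen's real templates and helper methods differ.

```ruby
require "erb"

# Hypothetical stand-in for a cursor exposing the two names the template uses.
MethodCursor = Struct.new(:ruby_name, :qualified_name)

# Simplified template text, not one of ruby-bindgen's templates.
TEMPLATE = 'define_method("<%= cursor.ruby_name %>", &<%= cursor.qualified_name %>)'

cursor = MethodCursor.new("empty", "cv::Mat::empty")

# result_with_hash exposes each hash key as a local variable in the template.
code = ERB.new(TEMPLATE).result_with_hash(cursor: cursor)
puts code
```

Passing state as a hash keeps the template free of instance variables, which is one way to read "receives the current cursor and any visitor state as local variables".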
For example, the Rice visitor's cxx_method.erb template generates a define_method call:
define_method<<%= method_signature(cursor) %>("<%= cursor.ruby_name %>", &<%= cursor.qualified_name %>,
<%= arguments(cursor) %>).
Key Classes
Config
The Config class loads the YAML configuration file and resolves platform-specific settings. It detects whether to use clang: or clang-cl: based on RUBY_PLATFORM, resolves relative paths against the config file's directory, and provides hash-like access to all configuration values.
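A rough sketch of those three responsibilities follows. It is not the real Config class: the SketchConfig name, the input and args keys, and the rule that an mswin platform selects clang-cl are all assumptions made for the example.

```ruby
require "yaml"

# Hypothetical sketch: pick a toolchain section, resolve paths, expose values.
class SketchConfig
  def initialize(yaml_text, config_dir, platform = RUBY_PLATFORM)
    @data = YAML.safe_load(yaml_text)
    @config_dir = config_dir
    @platform = platform
  end

  # Assumption: MSVC builds of Ruby ("mswin") use the clang-cl section.
  def toolchain_key
    @platform.match?(/mswin/) ? "clang-cl" : "clang"
  end

  def toolchain
    @data[toolchain_key]
  end

  # Relative paths are resolved against the config file's directory.
  def path(key)
    File.expand_path(@data[key], @config_dir)
  end

  # Hash-like access to raw values.
  def [](key)
    @data[key]
  end
end

yaml = <<~YAML
  input: include
  clang:
    args: ["-xc++"]
  clang-cl:
    args: ["/std:c++17"]
YAML

config = SketchConfig.new(yaml, "/project/config", "x86_64-linux")
puts config.toolchain_key
puts config.path("input")
```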
Inputter
The Inputter class discovers header files to process. Given a base directory and glob patterns from the config (match: and skip:), it iterates over matching files and yields both absolute and relative paths.
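Header discovery along these lines could look like the sketch below. SketchInputter and the sample file names are hypothetical, and the real Inputter may match and skip files differently; the sketch only shows globbing, skip filtering, and yielding both path forms.

```ruby
require "pathname"
require "tmpdir"
require "fileutils"

# Hypothetical sketch of header discovery with match and skip glob patterns.
class SketchInputter
  def initialize(base_dir, match:, skip: [])
    @base_dir = File.expand_path(base_dir)
    @match = match
    @skip = skip
  end

  # Yields the absolute path and the path relative to the base directory.
  def each
    base = Pathname.new(@base_dir)
    paths = @match.flat_map { |pattern| Dir.glob(File.join(@base_dir, pattern)) }
    paths.uniq.sort.each do |path|
      relative = Pathname.new(path).relative_path_from(base).to_s
      next if @skip.any? { |pattern| File.fnmatch?(pattern, relative, File::FNM_PATHNAME) }

      yield path, relative
    end
  end
end

found = []
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "core"))
  FileUtils.touch(File.join(dir, "core", "mat.hpp"))
  FileUtils.touch(File.join(dir, "core", "private.hpp"))

  inputter = SketchInputter.new(dir, match: ["**/*.hpp"], skip: ["**/private.hpp"])
  inputter.each { |_path, relative| found << relative }
end
puts found
```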
Parser
The Parser class wraps ffi-clang's Index and drives the processing loop:
def generate(visitor)
  visitor.visit_start
  inputter.each do |path, relative_path|
    translation_unit = @index.parse_translation_unit(path, clang_args, [],
      [:detailed_preprocessing_record, :skip_function_bodies])
    visitor.visit_translation_unit(translation_unit, path, relative_path)
  end
  visitor.visit_end
end
For each header file, it calls parse_translation_unit, which returns a translation unit object. Visitors access the root cursor via translation_unit.cursor.
Parse options include :skip_function_bodies (we only need declarations, not implementations) and :detailed_preprocessing_record (to see preprocessor directives).
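From the visitor's side, this loop implies three hooks: visit_start, visit_translation_unit, and visit_end. The sketch below shows the order in which they fire, using a hypothetical LoggingVisitor and a stand-in loop that skips libclang entirely (the translation unit is nil).

```ruby
# Hypothetical visitor that only records the order of callbacks.
class LoggingVisitor
  attr_reader :events

  def initialize
    @events = []
  end

  def visit_start
    @events << "start"
  end

  def visit_translation_unit(_translation_unit, _path, relative_path)
    @events << "unit #{relative_path}"
  end

  def visit_end
    @events << "end"
  end
end

# Stand-in for the processing loop, without parsing anything.
def sketch_generate(visitor, headers)
  visitor.visit_start
  headers.each do |path, relative_path|
    translation_unit = nil # a real run would parse the header here
    visitor.visit_translation_unit(translation_unit, path, relative_path)
  end
  visitor.visit_end
end

logger = LoggingVisitor.new
sketch_generate(logger, [["/src/core/mat.hpp", "core/mat.hpp"]])
puts logger.events
```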
Outputter
The Outputter class writes generated files to the output directory. It tracks all written paths and applies whitespace cleanup (removing excessive blank lines and blank lines before closing braces) to keep the output tidy.
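The whitespace cleanup can be approximated with two regular expressions. The cleanup function below is a sketch under the assumption that "excessive" means more than one consecutive blank line; the exact rules in Outputter may differ.

```ruby
# Hypothetical cleanup pass over generated source text.
def cleanup(text)
  text
    .gsub(/\n{3,}/, "\n\n")                    # collapse runs of blank lines to one
    .gsub(/\n[ \t]*\n([ \t]*\})/, "\n\\1")     # drop a blank line before a closing brace
end

source = "void f()\n{\n  foo();\n\n\n\n}\n"
puts cleanup(source)
```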
Namer
The Namer class converts C++ names to Ruby conventions:
- CamelCase class names stay as-is (Ruby classes are CamelCase)
- camelCase and PascalCase method names become snake_case
- isFoo()/hasFoo() become foo?
- C++ operators map to Ruby operators (operator+ → +, operator== → ==)
- operator[] maps to both [] and []= (if the return type is a reference)
- operator() maps to call
- Conversion operators like operator int() map to to_i, operator string() to to_s
- C variable names for Rice classes use the rb_c prefix (e.g., rb_cCvMat)
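A few of these rules can be expressed as a short sketch. The function names and the operator table below are hypothetical and cover only part of the list; they are not the Namer class itself.

```ruby
# Hypothetical subset of the operator mapping.
OPERATORS = {
  "operator+"  => "+",
  "operator==" => "==",
  "operator()" => "call"
}.freeze

# camelCase or PascalCase to snake_case.
def underscore(name)
  name.gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
      .gsub(/([a-z\d])([A-Z])/, '\1_\2')
      .downcase
end

def ruby_method_name(cpp_name)
  return OPERATORS[cpp_name] if OPERATORS.key?(cpp_name)

  # isFoo / hasFoo become the predicate foo?
  if (match = cpp_name.match(/\A(?:is|has)([A-Z].*)\z/))
    return "#{underscore(match[1])}?"
  end

  underscore(cpp_name)
end

%w[getRows ConvertTo isContinuous hasAlpha operator+ operator()].each do |name|
  puts "#{name} -> #{ruby_method_name(name)}"
end
```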
Refinements to ffi-clang
ruby-bindgen extends ffi-clang's classes using Ruby refinements in lib/ruby-bindgen/refinements/:
- Cursor - adds ruby_name, cruby_name, qualified_name, class_name_cpp, and methods for finding children by kind
- Type - adds fully_qualified_spelling for reconstructing C++ type names with proper namespace qualification and template arguments
- TranslationUnit - adds includes to extract #include directives
- String - adds camelize and underscore for name conversion
- SourceRange - adds text for extracting source text from a range
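For readers unfamiliar with refinements, the sketch below shows the mechanism on String. The module names and the underscore body are illustrative assumptions, not ruby-bindgen's code; the point is that the added method is visible only where the refinement is activated with using.

```ruby
# Hypothetical refinement adding an underscore method to String.
module StringNaming
  refine String do
    def underscore
      gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
    end
  end
end

module NamingDemo
  using StringNaming # the refinement is active only inside this module body

  def self.snake(name)
    name.underscore
  end
end

puts NamingDemo.snake("fullyQualifiedSpelling")

# Outside NamingDemo, String is unchanged.
puts "abc".respond_to?(:underscore)
```

Compared with monkey patching, this scoping keeps the extensions from leaking into other code that also loads ffi-clang.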
Rice Visitor Details
The Rice visitor is the largest of the three (~2100 lines) because C++ has the most language features to handle. Some notable aspects:
Traversal
The visitor traverses the AST recursively. Each cursor kind is dispatched to a visit_* method (e.g., visit_class_decl, visit_cxx_method) which checks whether the cursor should be skipped and, if not, renders an ERB template to generate the binding code.
Filtering
Before generating code for a cursor, the visitor applies several filters. See filtering for details.
Template Handling
C++ class templates require special treatment. See templates for details.
Type Spelling
Reconstructing correct C++ type names from libclang's type information is one of the trickiest parts of the codebase. The type_spelling family of methods handles:
- Namespace qualification (cv::Mat not Mat)
- Template argument qualification (std::vector<cv::Point> not std::vector<Point>)
- Typedef resolution (knowing when to use the alias vs the underlying type)
- Dependent types in templates (adding typename where required)
- Elaborated types (enum Foo vs Foo)
Default Arguments
Libclang provides limited information about default argument values. ruby-bindgen extracts default values from the source text and wraps them in static_cast with fully qualified types to ensure they compile in the generated context:
// Original: void resize(int size, const Scalar& value = Scalar())
// Generated:
Arg("value") = static_cast<const cv::Scalar&>(cv::Scalar())
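The final formatting step is simple once the qualified type and value are known. The helper below is a hypothetical sketch of that last step only; extracting the default from source text and qualifying the names, which is the hard part, is not shown.

```ruby
# Hypothetical helper: wrap an already-qualified default value in static_cast.
def default_argument(name, qualified_type, qualified_value)
  %(Arg("#{name}") = static_cast<#{qualified_type}>(#{qualified_value}))
end

puts default_argument("value", "const cv::Scalar&", "cv::Scalar()")
```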