Enumerators

Although not used that frequently, Ruby supports enumerators that enable both internal and external iteration. The easiest way to create an enumerator is to not pass a block to an enumerable method. For example:

a = [1, 2, 3, 4, 5]

# Common way to iterate
a.each do |i|
  puts i
end

# Get an enumerator instead
enumerator = a.each

# Now  use it
enumerator.map |i|
  i * 2
end

Rice has built-in support to returning enumerators for STL containers such as std::vector, std::map and std::unordered_map.

Implementing enumerators is tricky - and in fact requires a number of Rice features. So let’s take a look at how enumerator support is implemented for std::vector.

Implementation

Let’s start with looking at the code:

define_method("each", [](T& vector) -> const std::variant<std::reference_wrapper<T>, Object>
{
  if (!rb_block_given_p())
  {
    auto rb_size_function = [](VALUE recv, VALUE argv, VALUE eobj) -> VALUE
    {
      // Since we can't capture the vector from above (because then we can't send
      // this lambda to rb_enumeratorize_with_size), extract it from recv
      T* receiver = Data_Object<T>::from_ruby(recv);
      return detail::To_Ruby<size_t>().convert(receiver->size());
    };

    return rb_enumeratorize_with_size(detail::selfThread, Identifier("each").to_sym(), 0, nullptr, rb_size_function);
  }

  for (Value_T& item : vector)
  {
    VALUE element = detail::To_Ruby<Value_T>().convert(item);
    detail::protect(rb_yield, element);
  }

  return std::ref(vector);
});

We will go through each section in more detail below.

Method Signature

First, Rice defines an each method to support Ruby’s enumerable module. Its signature looks like this:

define_method("each", [](T& vector) -> const std::variant<std::reference_wrapper<T>, Object>

Since std::vector doesn’t have an each method, Rice creates a lambda function instead that interacts with the vector. The vector is passed in by reference T& to avoid a copy.

Even more interestingly, notice the return type is a std::variant. This is needed because the method can either return a Ruby enumerator or the vector.

In the first case, returning the vector is the same as returning this from a C++ member function or self from a Ruby function. This allows methods to be chained together - for example vector.a.b.

We have to return a reference to the vector and not a copy. Besides being potentially wasteful, a copy would result in creating a new Ruby object. Thus self would no longer be self - which would be quite unexpected. However, std::variants cannot container references and thus what we need to return is a std::reference_wrapper<T>.

In the second case, we want to return a new Ruby enumerator which has a type of VALUE. However, we can’t directly return a VALUE because Rice will interpret it as an unsigned long long (which in fact is what it is). Instead, we return a Rice::Object. For more information refer to Return class.

Creating an Enumerator

Next, let’s look at the code that returns an enumerator:

if (!rb_block_given_p())
{
  auto rb_size_function = [](VALUE recv, VALUE argv, VALUE eobj) -> VALUE
  {
    // Since we can't capture the vector from above (because then we can't send
    // this lambda to rb_enumeratorize_with_size), extract it from recv
    T* receiver = Data_Object<T>::from_ruby(recv);
    return detail::To_Ruby<size_t>().convert(receiver->size());
  };

  return rb_enumeratorize_with_size(detail::selfThread, Identifier("each").to_sym(), 0, nullptr, rb_size_function);
}

If a block is not provided by the user, then the method should return an enumerator. The enumerator is created like this:

return rb_enumeratorize_with_size(detail::selfThread, Identifier("each").to_sym(), 0, nullptr, rb_size_function);

Notice the first parameter of rb_enumeratorize_with_size requires a Ruby instance and not a C++ instance. The correct Ruby instance is the one that is wrapping the C++ instance which is stored in a thread-local variable called selfThread.

Supporting Enumerator Size

The rb_enumeratorize_with_size call includes an optional pointer to a function that can return the size of the enumerated object, in this case the vector. That is implemented as another lambda function:

auto rb_size_function = [](VALUE recv, VALUE argv, VALUE eobj) -> VALUE
{
  // Since we can't capture the vector from above (because then we can't send
  // this lambda to rb_enumeratorize_with_size), extract it from recv
  T* receiver = Data_Object<T>::from_ruby(recv);
  return detail::To_Ruby<size_t>().convert(receiver->size());
};

Since this lambda is being sent to C code, it cannot capture any local variables. Thus it does not have direct access to the T& vector parameter. Instead, it needs to extract the vector from the Ruby object wrapping the vector:

T* receiver = Data_Object<T>::from_ruby(recv);

It then needs to determine the vector size and return it back as a Ruby object:

return detail::To_Ruby<size_t>().convert(receiver->size());

Yielding to a Block

Finally we get to the most common use case by far - yielding values to a passed in block:

for (Value_T& item : vector)
{
  VALUE element = detail::To_Ruby<Value_T>().convert(item);
  detail::protect(rb_yield, element);
}

The code is fairly simple. Iterate over each item in the vector by reference (no copies!), wrap it in a Ruby object, and return it to the block. Note the call to rb_yield is done via detail::protect in case Ruby raises an exception.

Returning Self

Last we return self is a common practice in Ruby to enable method chaining. Self in this case is the Ruby object wrapping the vector. By returning a reference to the vector, Rice is smart enough to map it back to the original Ruby object.

return std::ref(vector);

As explained above, we need to put the vector inside a std::reference_wrapper to include it in the returned variant.