Recutils, GOOPS and virtual slots
For the past month or so I’ve been contributing to GNU Recutils, a set of tools for editing human-readable plain text databases. It’s a cool project in its own right: I’ve been using recutils myself for tracking workouts and storing cooking recipes. The cool part is that recfiles are meant to be both human-readable and machine-readable, so the same file can be processed programmatically and edited by hand in a simple text editor.
The powerful querying facilities of recutils are what turn it into a thing of beauty. In particular, selection expressions are small expressions for querying recfiles. For instance, here’s how I would query exercises in my workout log for squats:
recsel -t Exercise -e "Name ~ 'Squat'" workouts.rec
This matches records of type Exercise whose Name field matches the regular expression Squat (the ~ operator does regular expression matching), so it finds every exercise variety with the word Squat in its name.
The machine readability makes it easy to write programs or tools that interact with recfiles. I’ve also become the maintainer of rec-mode, the Emacs major mode for recfiles. The major mode makes heavy use of the command line tools of the recutils suite to provide automatic fixing and parsing of recfiles.
if it’s possible to put Lisp in it, someone will
For fun and profit, I’ve also been writing GNU Guile bindings for librec, the library powering recutils itself. The bindings interface with the C library directly using Guile’s amazing C extensions. I was interested in using recfiles in a Guile program, and while it would not have been too difficult to write a parser myself, I thought it was more important not to write one myself. What is more, Guile makes it almost too easy to wrap libraries: I had a functioning Scheme interface for parsing records in less than an hour.
Let’s explore what that interface looks like. We start with the simplest data type in librec, fields.
A recutils record is defined as an ordered collection of fields. Below is a record of three fields:
Book: Structure and Interpretation of Computer Programs
Author: Harold Abelson
Author: Gerald Sussman
The inner field type of librec is defined as rec_field_t, which is an opaque data type wrapping rec_field_s:
typedef struct rec_field_s *rec_field_t;
The underlying rec_field_s structure is a bit more complicated since it includes location data for the field, but for our example imagine it contains just the members name and value, which are null-terminated strings. You don’t need to know anything about that, since librec offers an extensive API for working with the opaque types.
To make a new field, you would write:
rec_field_t field = rec_field_new("Author", "Harold Abelson");
To get the name and value, you use rec_field_name and rec_field_value:
const char *name = rec_field_name(field); /* "Author" */
const char *value = rec_field_value(field); /* "Harold Abelson" */
To modify its name or value, you can use:
rec_field_set_name(field, "Book");
rec_field_set_value(field, "Structure and Interpretation of Computer Programs");
How do we wrap these into Guile, using C extensions? To start with, we can simply make some Scheme procedures that work with plain pointers and pass that pointer value around.
static void
destroy_field (void *ptr)
{
  /* Finalizer: called by the garbage collector on unreachable fields. */
  rec_field_destroy ((rec_field_t) ptr);
}

SCM_DEFINE (scm_field_new, "new-field", 2, 0, 0, (SCM scm_name, SCM scm_value),
            "Make a new field from a string and value.")
{
  SCM_ASSERT_TYPE (scm_is_string (scm_name), scm_name, 1, "new-field", "string");
  SCM_ASSERT_TYPE (scm_is_string (scm_value), scm_value, 2, "new-field", "string");
  const char *name = scm_to_utf8_string (scm_name);
  const char *value = scm_to_utf8_string (scm_value);
  rec_field_t field = rec_field_new (name, value);
  if (!field)
    return SCM_BOOL_F;
  return scm_from_pointer (field, destroy_field);
}
This uses two functions: destroy_field, a finalizer that lets the garbage collector get rid of unused fields, and scm_field_new, defined using the SCM_DEFINE macro. The procedure is straightforward: assert that both parameters are strings, convert them to const char*, create the field and return it if that succeeded, otherwise return Scheme false, #f. The last line creates a pointer object to store the pointer address, and passes destroy_field as the finalizer for the garbage collector.
In the Guile REPL, it looks like this:
scheme@(recutils)> (new-field "foo" "bar")
$2 = #<pointer 0x7fc0654040f0>
OK, it seems to be a pointer all right. Let’s define some helper procedures to work with that:
SCM_DEFINE (scm_field_get_name, "field-name", 1, 0, 0, (SCM ptr),
            "Get the name of a field")
{
  rec_field_t field = (rec_field_t) scm_to_pointer (ptr);
  const char *name = rec_field_name (field);
  return scm_from_utf8_string (name);
}

SCM_DEFINE (scm_field_get_value, "field-value", 1, 0, 0, (SCM ptr),
            "Get the value of a field")
{
  rec_field_t field = (rec_field_t) scm_to_pointer (ptr);
  const char *value = rec_field_value (field);
  return scm_from_utf8_string (value);
}
Loading this extension into the REPL, we get
scheme@(recutils)> (new-field "foo" "bar")
$1 = #<pointer 0x7fa123d0b980>
scheme@(recutils)> (field-name $1)
$2 = "foo"
scheme@(recutils)> (field-value $1)
$3 = "bar"
What about modifying the field? Well, that’s easy:
SCM_DEFINE (scm_field_set_name, "set-field-name!", 2, 0, 0, (SCM ptr, SCM scm_name),
            "Set the name of a field")
{
  /* The string is the second argument, hence position 2. */
  SCM_ASSERT_TYPE (scm_is_string (scm_name), scm_name, 2, "set-field-name!", "string");
  rec_field_t field = (rec_field_t) scm_to_pointer (ptr);
  const char *name = scm_to_utf8_string (scm_name);
  bool result = rec_field_set_name (field, name);
  return scm_from_bool (result);
}

SCM_DEFINE (scm_field_set_value, "set-field-value!", 2, 0, 0, (SCM ptr, SCM scm_value),
            "Set the value of a field")
{
  /* The string is the second argument, hence position 2. */
  SCM_ASSERT_TYPE (scm_is_string (scm_value), scm_value, 2, "set-field-value!", "string");
  rec_field_t field = (rec_field_t) scm_to_pointer (ptr);
  const char *value = scm_to_utf8_string (scm_value);
  bool result = rec_field_set_value (field, value);
  return scm_from_bool (result);
}
Using all this in the REPL yields:
scheme@(recutils)> (new-field "foo" "bar")
$1 = #<pointer 0x7ffcac406530>
scheme@(recutils)> (set-field-name! $1 "Blah")
$2 = #t
scheme@(recutils)> (set-field-value! $1 "Test")
$3 = #t
scheme@(recutils)> (field-name $1)
$4 = "Blah"
scheme@(recutils)> (field-value $1)
$5 = "Test"
There we go!
the smell of raw pointers
OK, this looks great. But somehow it feels funny to pass a raw pointer object around as a parameter. Ideally, I’d like to define some sort of structure that wraps the raw pointer into something less raw. Well, it turns out Guile has exactly that in the define-wrapped-pointer-type macro! With the above constructor and procedures, we can go further:
(define-wrapped-pointer-type
  field-ptr field-ptr? wrap-field-ptr unwrap-field-ptr
  (lambda (ptr port)
    (format port "#<field-ptr name=~s value=~s 0x~x>"
            (field-name (unwrap-field-ptr ptr))
            (field-value (unwrap-field-ptr ptr))
            (pointer-address (unwrap-field-ptr ptr)))))
The macro defines a type name (field-ptr), a predicate (field-ptr?), procedures for wrapping and unwrapping, and lastly a printer for pretty-printing our pointer. The printer outputs a human-readable representation of the wrapped pointer, in which we leverage the procedures defined above, field-name and field-value.
scheme@(recutils)> (wrap-field-ptr (new-field "Author" "Harold Abelson"))
$2 = #<field-ptr name="Author" value="Harold Abelson" 0x7f8a6950b2d0>
scheme@(recutils)> (field-ptr? $2)
$3 = #t
scheme@(recutils)> (unwrap-field-ptr $2)
$4 = #<pointer 0x7f8a6950b2d0>
This makes it a bit easier to pass around field values so that we can treat them like structures, or records in Scheme parlance. That said, constructing the values is still a bit tedious, especially now that our Scheme user would have to constantly wrap and unwrap values if they are to work with a field.
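One way to soften the wrap/unwrap tedium, short of GOOPS, is to hide it inside ordinary procedures. The following is only a sketch under stated assumptions: it does not use librec at all, the counter type and its procedures are hypothetical names, and the "foreign" memory is simply a four-byte bytevector.

```scheme
(use-modules (system foreign) (rnrs bytevectors))

;; Hypothetical wrapped pointer type standing in for field-ptr.
(define-wrapped-pointer-type counter
  counter?
  wrap-counter unwrap-counter
  (lambda (c port)
    (display "#<counter 0x" port)
    (display (number->string (pointer-address (unwrap-counter c)) 16) port)
    (display ">" port)))

;; Constructor: allocate 4 bytes, store N, return the wrapped pointer.
(define (make-counter n)
  (let ((bv (make-bytevector 4 0)))
    (bytevector-s32-native-set! bv 0 n)
    (wrap-counter (bytevector->pointer bv))))

;; Accessors unwrap internally, so callers never touch the raw pointer.
(define (counter-value c)
  (bytevector-s32-native-ref (pointer->bytevector (unwrap-counter c) 4) 0))

(define (set-counter-value! c n)
  (bytevector-s32-native-set! (pointer->bytevector (unwrap-counter c) 4) 0 n))

(define c (make-counter 41))
(set-counter-value! c (+ 1 (counter-value c)))
(counter-value c) ; => 42
```

This keeps the pointer out of sight, but every accessor still has to be written by hand and there is no keyword constructor.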
What if we could work with fields as if they were pure Scheme objects and the underlying machinery – pointers and so forth – would be hidden from us? Well, we can use GOOPS, but first let’s digress into the exciting world of FFI.
why not dynamic FFI?
These days the Guile manual recommends using Dynamic FFI when working with a foreign function interface. That is, the above examples are just C code, but we could have done the same with just regular Scheme using the (system foreign) module. This is what I would do in many other languages (Common Lisp, Python, and so on…). In such a case, I could make my Scheme module completely separate from recutils and librec, since I just need the dynamic library libguile-recutils.so for it to function. But there are subtle reasons why writing these extensions in C is a good idea.
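For comparison, here is roughly what the dynamic FFI style looks like. This is a minimal sketch that binds strlen from the C library already loaded into the Guile process rather than anything from librec; the names c-strlen and string-length/c are made up for the example, and it assumes a platform where (dynamic-link) with no arguments exposes the C library's symbols.

```scheme
(use-modules (system foreign))

;; Look up strlen among the symbols of the running process and turn it
;; into a Scheme procedure: one pointer argument, size_t return value.
(define c-strlen
  (pointer->procedure size_t
                      (dynamic-func "strlen" (dynamic-link))
                      (list '*)))

;; Convert the Scheme string to a C string before calling.
(define (string-length/c str)
  (c-strlen (string->pointer str)))

(string-length/c "Harold Abelson") ; => 14
```

No C code is compiled here; the price is that every signature is declared by hand on the Scheme side.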
As I went ahead and wrote the bindings, I had a curious thought: I’m writing functionality for working with recfiles from Guile. But what about adding Guile facilities to recutils? What about letting recutils users extend the programs using Scheme? Wouldn’t it be cool if instead of recutils selection expressions I could pass Scheme programs as the query language? Indeed, this was a topic worth exploring!
The consequence of this was that now I was adding code to recutils itself to link against Guile, which means I will already have a dependency on the Guile C library libguile. So, since I’m now already working with the C API of Guile, limiting myself to the strange world of dynamic FFI was starting to feel rather tedious.
From the start I wanted to work with the real deal: the wrapper types of the Guile extensions would be real wrappers. Each field in Scheme would be represented by a librec C struct underneath. This is so that I can leverage the bidirectional design above, and there is no need to parse or convert values twice when crossing language barriers. So, how do we make a Scheme API that is both nice to use and still C structs underneath? Well, the answer is GOOPS and object-oriented programming!
GOOPS, virtual slots, and you
Working with raw pointers and even pointer records can be painful. It would be much better if we could make fields like this:
(make <field> #:name "Author" #:value "Gerald Sussman")
This makes an instance of a GOOPS class, <field>. The constructor takes two keyword arguments, #:name and #:value, for the field’s name and value.
How can we get a class whose getters and setters (in terms of slot-ref and slot-set!) work on the underlying pointer? Easy enough, the answer is virtual slots!
If we were to define an ordinary class with slots name and value, Guile would allocate memory for those, and if we are to juggle the pointer alongside all of this, both the name and value would live in two places: once behind the pointer (in C world) and once in Scheme, as a slot in the class.
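To make the duplication concrete, here is a small sketch of how the two copies can drift apart. It is self-contained and does not touch librec: a mutable vector plays the role of the C struct, and <copying-field> and copied-name are hypothetical names used only for this illustration.

```scheme
(use-modules (oop goops))

;; Stand-in for the C-side struct: a mutable vector #(name value).
(define store (vector "Author" "Harold Abelson"))

(define-class <copying-field> ()
  (store #:init-keyword #:store)
  ;; Ordinary slot: holds its own copy of the name.
  (name #:init-keyword #:name #:accessor copied-name))

(define f (make <copying-field>
            #:store store
            #:name (vector-ref store 0)))

;; Change the name behind the object's back, as C code might.
(vector-set! store 0 "Book")

(copied-name f)      ; => "Author" (the slot copy is now stale)
(vector-ref store 0) ; => "Book"
```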
But first, how do we create a class <field> that wraps a pointer? Easy enough, we can use #:init-form as the slot option:
(define-class <field> ()
  ;; Internal pointer type.
  (ptr #:init-form (wrap-field-ptr (new-field "" ""))
       #:init-keyword #:ptr))
The use of #:init-form causes the expression that follows it to be evaluated every time the class is instantiated, creating a field with an empty name and value.
To get the signature we desire above, we need to use virtual slots. These let us supply our own getter and setter through the #:slot-ref and #:slot-set! options respectively; both work on the raw pointer instead of occupying memory like a normal slot would. This is achieved using #:allocation #:virtual:
(define-class <field> ()
  (ptr #:init-form (wrap-field-ptr (new-field "" ""))
       #:init-keyword #:ptr)
  (name #:init-keyword #:name #:allocation #:virtual
        #:accessor field-name
        #:slot-ref (lambda (field)
                     (%field-name (unwrap-field-ptr (slot-ref field 'ptr))))
        #:slot-set! (lambda (field name)
                      (set-field-name!
                       (unwrap-field-ptr (slot-ref field 'ptr)) name)))
  (value #:init-keyword #:value #:allocation #:virtual
         #:accessor field-value
         #:slot-ref (lambda (field)
                      (%field-value (unwrap-field-ptr (slot-ref field 'ptr))))
         #:slot-set! (lambda (field value)
                       (set-field-value!
                        (unwrap-field-ptr (slot-ref field 'ptr)) value))))
Note that the procedures we defined previously in C were renamed to %field-name and %field-value, since they would otherwise conflict with the names given to the #:accessor slot option.
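If you want to try virtual slots without building the C extension, the same shape can be sketched in plain Guile. This is an illustration under assumptions, not the bindings themselves: a two-element vector stands in for the librec struct, and <vfield>, vfield-name and vfield-value are hypothetical names. The store slot is listed first on the assumption that GOOPS initializes slots in definition order, so it exists before the keyword arguments are written through the virtual setters.

```scheme
(use-modules (oop goops))

(define-class <vfield> ()
  ;; Real slot: the backing store, created fresh for every instance.
  (store #:init-form (vector "" ""))
  ;; Virtual slots: no storage of their own, they read and write the store.
  (name #:init-keyword #:name
        #:allocation #:virtual
        #:accessor vfield-name
        #:slot-ref (lambda (f) (vector-ref (slot-ref f 'store) 0))
        #:slot-set! (lambda (f v) (vector-set! (slot-ref f 'store) 0 v)))
  (value #:init-keyword #:value
         #:allocation #:virtual
         #:accessor vfield-value
         #:slot-ref (lambda (f) (vector-ref (slot-ref f 'store) 1))
         #:slot-set! (lambda (f v) (vector-set! (slot-ref f 'store) 1 v))))

(define f (make <vfield> #:name "Author" #:value "Gerald Sussman"))

(vfield-name f)                    ; => "Author"
(set! (vfield-value f) "Harold Abelson")
(vector-ref (slot-ref f 'store) 1) ; => "Harold Abelson"
```

The keyword arguments and the accessor both end up in the backing store, which is the behaviour the pointer-backed class relies on.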
So using #:virtual lets us write GOOPS classes and not worry about double allocation. It looks like a regular GOOPS class, but it is actually modifying a pointer to a C struct underneath using a C API. Moreover, the biggest benefit of this is the ability to pass values in the constructor. If we didn’t have #:virtual, we’d have to write separate accessor methods like this:
(define-method (f-name (field <field>))
  (%field-name (unwrap-field-ptr (slot-ref field 'ptr))))

(define-method ((setter f-name) (field <field>) name)
  (set-field-name! (unwrap-field-ptr (slot-ref field 'ptr)) name))
But the problem with this and any other approach is that you’d still have memory allocated for the slots: every <field> would carry unnecessary name and value slots. I think the only way to get this behaviour if #:virtual were not available would be to write a custom initialize method. I think the same applies in other CLOS-like systems (and CLOS itself), but I’m not sure.
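For the curious, a custom initialize method along those lines might look like the sketch below. It is an assumption-laden illustration rather than code from the bindings: the vector again stands in for the C struct, and <ifield>, ifield-name and ifield-value are hypothetical names.

```scheme
(use-modules (oop goops))

(define-class <ifield> ()
  ;; Only the backing store is a slot; there are no name or value slots.
  (store #:init-form (vector "" "")))

;; Pick #:name and #:value out of the initargs by hand, after the
;; default initialization has created the store.
(define-method (initialize (f <ifield>) initargs)
  (next-method)
  (apply (lambda* (#:key (name "") (value "") #:allow-other-keys)
           (vector-set! (slot-ref f 'store) 0 name)
           (vector-set! (slot-ref f 'store) 1 value))
         initargs))

(define-method (ifield-name (f <ifield>))
  (vector-ref (slot-ref f 'store) 0))

(define-method (ifield-value (f <ifield>))
  (vector-ref (slot-ref f 'store) 1))

(define f (make <ifield> #:name "Author" #:value "Gerald Sussman"))
(ifield-name f)  ; => "Author"
(ifield-value f) ; => "Gerald Sussman"
```

It works, but the keyword handling that virtual slots provide for free has to be spelled out by hand.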
context is everything, friends
I don’t think many Guile users will find Scheme bindings for recutils that useful in themselves, as a library. Guix uses recfiles in its search output, but its record generation is hand-written, and its usage is not deep enough to warrant using the library.
But I think a case can be made for recutils itself: if recutils were to develop extensibility via Guile, the extension mechanism could load the recutils Scheme module as its base runtime. I discussed the idea on IRC with Jose Marchesi, the Recutils author and maintainer, and he thought it was a good idea as long as there’s someone there to maintain it.
Maybe this will fly, I don’t know. I don’t see any big technical barriers that would keep it from working, even if it amounts to just adding Scheme bindings without extending recutils itself. That said, every now and then I run into the limitations of selection expressions, so being able to use Scheme as a measure of last resort would be interesting, if nothing else.
As of early December 2020 I have bindings for parsing and creating records and fields, so expect an early release of the Scheme bindings to appear within the next few months.
Have I mentioned I also plan to make Common Lisp bindings? Well, now I have, but that’s another story!