Erlang Driver Walkthrough with Berkeley DB in C

While thinking about ErlFS, I realized I'd need at least two subsystems: one for finding the node that holds the data I am looking for, and one to actually store that data. The first will be covered by Chordial. For the storage portion, I was originally going to use a homegrown method where I would just store the data in files named after their keys. I decided I should explore the various possibilities first, so I did a little research and decided to take a crack at writing a port driver for Erlang. Thanks go to Kevin Smith for his article on writing linked-in drivers for Erlang.

There are several ways to write an Erlang driver. One is to spawn your program as a separate OS process and communicate with it over a pipe; another is to load your program into the same memory space as the VM (a linked-in driver) and communicate directly through in-memory vectors. The first is safe: if your program crashes, your Erlang application can recover using standard OTP principles, such as supervisors. The linked-in model is much more dangerous: if your program crashes, the whole Erlang VM crashes with it. It is, however, much faster, as it does not have to copy memory and have the OS switch context on each call. This is the method we will cover.

Writing your first Erlang driver can be difficult, as examples are scarce and the documentation can be obscure. In this article, I will walk you through writing one for a common purpose: writing to a database.

All sources can be found at the project's GitHub page.

Build Directory and Makefiles

First let's start with a simple OTP build directory structure. I'll create a new folder called erl_bdb_store and in that make the following folders: ebin, include, priv, and src. This is what it should look like:

  • erl_bdb_store/
    • ebin (to store the Erlang bytecode/binaries)
    • include (to store Erlang include files)
    • priv (to store non-Erlang projects/code)
    • src (to store Erlang source code)
Next, we'll need a couple of simple makefiles. Please note that these currently only compile under Linux (tested on Ubuntu). The first one just goes to the ./priv and ./src directories and runs the Makefile in each.

    ./Makefile:


    The next one compiles the C source (the meat-and-potatoes of the driver) into a shared library. This is different on each operating system and will need to be changed for each. I will do this sometime in the future.

    ./priv/Makefile:


    The Erlang Makefile is simple.

    ./src/Makefile:


    Erlang Source

    Now we will make our Erlang wrapper for the driver. It is a standard OTP gen_server callback module.

    ./src/bdb_store.erl:

    The two important parts are the init function and the handle_call function. The init function loads and starts the driver.


    The handle_call function communicates with the driver to put, get and delete records from the database file. This just forwards and translates the commands to the actual driver, where all of the real work happens:


    Notice that the primary difference between the calls is the first byte of each Message binary. The driver inspects this byte to decide which function to perform:


    C Source

    The C code does most of the work. It handles creating all of the error messages and performing all of the commands the Erlang source communicates to it.

    Let's take a look at the C source piece-by-piece.

    First we'll create a header file with settings and function prototypes.
    ./priv/bdb_drv.h:

    At the top we include the necessary Erlang headers, followed by a couple of standard C headers and then the Berkeley DB header. Next we define the path where we want our database to store its data, and then constants for the byte values of the commands sent to the driver from the Erlang VM. The _bdb_drv_t struct is very important: we cannot use global variables, so all of our state must live in a struct. The reason is that the Erlang VM can run many instances of the driver at once (one for each open port), and every instance needs its own state. The last entries in the header file are our function prototypes.
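
To make the point about per-instance state concrete, here is a minimal standalone sketch. It is not the header from this project: the command byte values, the ops counter and the instance_* function names are assumptions made for illustration, and opaque pointers stand in for the real ErlDrvPort and Berkeley DB handle so that the snippet compiles without erl_driver.h or db.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical command bytes; the real values are defined in bdb_drv.h. */
#define CMD_PUT 1
#define CMD_GET 2
#define CMD_DEL 3

/* Per-instance state. The real struct would hold an ErlDrvPort and a
 * Berkeley DB handle; opaque stand-ins keep this sketch standalone. */
typedef struct _bdb_drv_t {
    void *port;          /* stand-in for ErlDrvPort */
    void *db;            /* stand-in for DB * */
    unsigned long ops;   /* hypothetical counter, only to make state visible */
} bdb_drv_t;

/* Models the start callback: allocate fresh state for this instance. */
bdb_drv_t *instance_start(void *port)
{
    bdb_drv_t *st = calloc(1, sizeof *st);
    if (st != NULL)
        st->port = port;
    return st;
}

/* Models handling one command: only this instance's state changes. */
void instance_command(bdb_drv_t *st)
{
    assert(st != NULL);
    st->ops++;
}

/* Models the stop callback: release this instance's state. */
void instance_stop(bdb_drv_t *st)
{
    free(st);
}
```

Because nothing is global, two instances started side by side never see each other's counters or handles.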


    Now to our C implementation file (./priv/bdb_drv.c).
    Include our header file:


    Specify the callbacks we will be implementing:

    This driver_entry struct defines the callback functions that the Erlang VM will invoke at various points. In this example, we implement start, stop and outputv. When the Erlang VM starts an instance of our driver it calls start, when it sends a message to the driver it calls outputv, and when it stops the instance it calls stop. If outputv is defined, the VM calls it in preference to output; output is used only when outputv is left undefined. output receives the data copied into a flat buffer, whereas outputv receives an I/O vector that refers to the data directly, so there is no copying overhead and it is faster. See the driver_entry documentation for more detail.


    Next is boilerplate which tells the VM which struct to use as the state holder:


    Here, we define the start function and open up the database for reading and writing.

    If Berkeley DB returns an error, it is propagated to the Erlang VM as a typical {error, Reason} tuple.


    The stop function closes the database and releases the driver:


    outputv will serve as our entry point when a message is sent to the driver:

    It interprets the first byte of the message to determine which function to call.
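
The dispatch step can be sketched in isolation. This is only an illustration under assumptions: it works on a flat byte buffer rather than the I/O vector that outputv really receives, and the CMD_* values, the handler_t enum and the dispatch name are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command bytes; the real values are defined in bdb_drv.h. */
#define CMD_PUT 1
#define CMD_GET 2
#define CMD_DEL 3

/* Which handler a message should be routed to. */
typedef enum { H_PUT, H_GET, H_DEL, H_UNKNOWN } handler_t;

/* Inspect the first byte of the message and pick a handler.
 * Empty messages and unrecognized bytes fall through to H_UNKNOWN. */
handler_t dispatch(const unsigned char *msg, size_t len)
{
    assert(msg != NULL || len == 0);
    if (len == 0)
        return H_UNKNOWN;
    switch (msg[0]) {
    case CMD_PUT: return H_PUT;
    case CMD_GET: return H_GET;
    case CMD_DEL: return H_DEL;
    default:      return H_UNKNOWN;
    }
}
```

Keeping the routing in one small switch means every other function can assume it was called for the right command.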


    As its name implies, the put function inserts a record into the database:

    The function takes the 20 bytes that follow the command byte and uses them as the key. That is 160 bits, enough for a SHA1 hash. All of the remaining bytes are used as the value to be stored. This function returns the atom ok or the typical {error, Reason} tuple.
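
The message layout described above (one command byte, a 20-byte key, then the value) can be split with plain pointer arithmetic. The sketch below assumes the message is available as one contiguous buffer; kv_view, KEY_SIZE and split_put_message are hypothetical names, and the length check is an addition of this sketch rather than something the original code is known to do.

```c
#include <assert.h>
#include <stddef.h>

#define KEY_SIZE 20   /* 160 bits, the size of a SHA1 digest */

/* Non-owning views into the original message buffer. */
typedef struct {
    const unsigned char *key;
    size_t key_len;
    const unsigned char *value;
    size_t value_len;
} kv_view;

/* Split <command byte><20-byte key><value...> into key and value.
 * Returns 0 on success, -1 if the message is too short to hold a key. */
int split_put_message(const unsigned char *msg, size_t len, kv_view *out)
{
    assert(msg != NULL && out != NULL);
    if (len < 1 + KEY_SIZE)
        return -1;
    out->key = msg + 1;                    /* skip the command byte */
    out->key_len = KEY_SIZE;
    out->value = msg + 1 + KEY_SIZE;       /* everything after the key */
    out->value_len = len - 1 - KEY_SIZE;   /* may be zero */
    return 0;
}
```

In a Berkeley DB driver, the two views would typically be copied into the data and size fields of the DBT structures passed to the put call.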


    The get function is similar to put, except that we have to free the memory allocated by Berkeley DB after the record has been retrieved:

    First we call the erl_driver function driver_alloc_binary (see erl_driver#driver_alloc_binary), which returns a reference-counted binary whose count starts at 1. We then return the value with driver_output_term, which increases the reference count to 2, and call driver_free_binary, which brings it back to 1. When the Erlang VM has finished with the binary, it drops its own reference as well, which brings the count to 0, and the binary is freed from memory.
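
The reference-count lifecycle described above can be traced with a toy model. This is not the erl_driver implementation and does not call any erl_driver function: model_bin, model_alloc, model_retain and model_release are hypothetical stand-ins that only mimic the counting (allocate at 1, retain when the term is handed to the VM, release once from the driver and once from the VM).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted driver binary. */
typedef struct {
    long refc;              /* number of holders */
    size_t size;            /* payload size in bytes */
    unsigned char *bytes;   /* payload */
} model_bin;

/* Models driver_alloc_binary: the caller holds the first reference. */
model_bin *model_alloc(size_t size)
{
    model_bin *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->bytes = malloc(size > 0 ? size : 1);
    if (b->bytes == NULL) {
        free(b);
        return NULL;
    }
    b->refc = 1;
    b->size = size;
    return b;
}

/* Models the VM taking its own reference when the term is output. */
void model_retain(model_bin *b)
{
    assert(b != NULL && b->refc > 0);
    b->refc++;
}

/* Models dropping one reference; frees the binary when none remain.
 * Returns the number of references left. */
long model_release(model_bin *b)
{
    long left;
    assert(b != NULL && b->refc > 0);
    left = --b->refc;
    if (left == 0) {
        free(b->bytes);
        free(b);
    }
    return left;
}
```

The practical rule the model illustrates: the driver releases exactly the one reference it obtained from the allocation, and never touches the binary again afterwards.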


    The del function deletes a record and is the simplest of the database functions:


    Finally we have our catch-all function which will be called if an unrecognized command byte is sent. It returns the tuple {error, unkown_command}:


    Usage

    Let's compile it and jump into an Erlang shell to test it out:


    Now, let's start the gen_server we wrote and perform a couple tests:


    That's it for the walkthrough. If you have any questions, please post a comment or use the contact form. You can also email me at the email address listed in the source. I hope it helps you on a future project! Please let me know if it does!
