Oct 25, 2018 • 14 min read

Eavesdropping on GenServers Via Tracing

“How can I verify, during testing, that a GenServer or process received a message?”

I have asked myself that on a number of occasions while working with Elixir, especially when interacting with dependencies outside of my control that generate processes and GenServers.

There are many answers to the question, including but not limited to:

creating a module that delegates calls to the dependency and using the project’s test config to swap in a mock module
passing a mock module or GenServer directly to the caller you wish to test
using Mox for general mocking
using erlang’s trace/3 on the target GenServer and monitoring its messages

The focus of this post will be on the latter of the bunch - utilizing Erlang’s trace/3 function.

Getting Started

If you’re familiar with creating your own project in Elixir, you can follow along by dropping the code below in a project and running the tests. Or, you can clone and reference the full example project in github.

The Sender and Receiver

Let’s say we’re building a framework that uses a GenServer (hereafter called MySender) to perform various tasks. During the execution of these tasks, MySender will call out to various other GenServers, including one in a dependency named BlackBoxReceiver. While we have some control over MySender, its logic is very complicated and spaghettified, and we have no control over BlackBoxReceiver. When the framework calls MySender, it executes an amount of business logic both before and after calling BlackBoxReceiver. For testing purposes, we need to verify both that a call was made by MySender to BlackBoxReceiver, as well as the contents of call itself.

defmodule TracingInTests.MySender do
  @moduledoc "Data sender."

  use GenServer
  alias TracingInTests.BlackBoxReceiver

  def start_link(receiver_pid) do
    GenServer.start_link(__MODULE__, receiver_pid)
  end

  @doc """
  Executes a lot of business logic using `data`, and gives you actionable
  results in return.
  """
  def action(sender_pid, data), do: GenServer.call(sender_pid, {:action, data})

  def init(receiver_pid) do
    {:ok, receiver_pid}
  end

  def handle_call({:action, data}, _from, receiver_pid) do
    # business logic executed here
    result = BlackBoxReceiver.store(receiver_pid, data)
    # more business logic executed here
    {:reply, result, receiver_pid}
  end
end

And what about BlackBoxReceiver? In practice, BlackBoxReceiver is a black box we can’t crack open. However, for the sake of this blog post we’ll use the implementation below:

defmodule TracingInTests.BlackBoxReceiver do
  @moduledoc "Black box receiver we have no control over."

  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, [])

  @doc "Store your data in the black box."
  def store(ref, data), do: GenServer.call(ref, {:store, data})

  def init(_), do: {:ok, nil}

  def handle_call({:store, _data}, _from, state) do
    {:reply, {:ok, :data_received}, state}
  end
end

Having both modules ready to go, we know MySender should call BlackBoxReceiver. In fact, we’re pretty sure it’s all working because executing code by hand in the shell looks to be working, and our unit tests cover a lot of angles. Unfortunately, we still have this nagging feeling that we haven’t covered all the bases. So how can we verify that MySender actually sent that vital message to BlackBoxReceiver?

Tracing the Receiver

What we need is a way to watch BlackBoxReceiver and be notified when a particular type of message is sent to it. That’s where Erlang’s trace/3 comes in. When a process executes trace/3 with a pid and list of flags, it will be notified when particular things happen to the process being traced. This is perfect for our test case, where we want to verify BlackBoxReceiver received a particular message. Let’s start by using trace/3 in a test case.

defmodule TracingInTestsTest do
  @moduledoc false

  use ExUnit.Case

  alias TracingInTests.{BlackBoxReceiver, MySender}

  test "verify a process received a particular message pattern" do
    # we start instances of both BlackBoxReceiver and MySender
    {:ok, receiver_pid} = BlackBoxReceiver.start_link()
    {:ok, sender_pid} = MySender.start_link(receiver_pid)

    # the juicy bit: we use trace/3 to trace BlackBoxReceiver
    :erlang.trace(receiver_pid, true, [:receive])

    # Next, let's call MySender's workhorse function
    {:ok, :data_received} = MySender.action(sender_pid, a: 1)

    # Now that MySender has finished, let's see if the message was sent
    assert_receive({:trace, ^receiver_pid, :receive, {:"$gen_call", _, {:store, _}}})
  end
end

Now run it via mix test and you will get something like:

> mix test
.

Finished in 0.02 seconds
1 test, 0 failures

Randomized with seed 986991

The test passes! Great, we have verified both that MySender sent a message to BlackBoxReceiver, and that the message was received. But… what exactly happened?

Erlang’s trace/3

Let’s start with the first line of interest in the unit test:

> :erlang.trace(receiver_pid, true, [:receive])
1

To understand what that line means, we need to open a browser tab to erlang’s documentation on trace/3. If you are not familiar with Erlang documentation, I’ll paraphrase the trace/3’s signature description…

erlang:trace(PidPortSpec, How, FlagList) -> integer() translates into:

In Elixir, you would make the call with :erlang.trace(..., ..., ...). In Erlang, it’s erlang:trace(..., ..., ...).
PidPortSpec and its list of possible values means you can pass in a pid or port, or any of a list of atoms representing different categories, such as ‘all’, ‘existing_processes’ and more. In Erlang, atoms are lowercase values like all, whereas Elixir uses :all.
How is a boolean value that determines whether the flags you specify in FlagList will be used when tracing PidPortSpec.
FlagList is a list of atom flags that act as a filter for the types of trace messages the calling process will receive, and/or options for the tracing itself
-> means the following type of data will be returned by this function call, just like Elixir specs and ::.
integer() is the type of return value meaning:

Returns a number indicating the number of processes that matched PidPortSpec. If PidPortSpec is a process identifier, the return value is 1. If PidPortSpec is all or existing, the return value is the number of processes running. If PidPortSpec is new, the return value is 0.

Now to apply that information to the line from our test case:

> :erlang.trace(receiver_pid, true, [:receive])
1

:erlang.trace is the call
receiver_pid is the pid we are tracing
true means we want to apply the flags to the trace
[:receive] is saying we are only interested in messages being sent to receiver_pid
the return value is a 1 since we are passing a pid in for PidPortSpec

Now that we understand trace/3’s signature and what our particular call means, let’s look at what it’s actually. We can accomplish that by running iex -S mix run from the command line within our test project and running commands like I have done below. Note that I’ve added spacing to improve readability:

> iex -S mix run
Erlang/OTP 20 [erts-9.3.3.3] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Interactive Elixir (1.6.6) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> alias TracingInTests.{BlackBoxReceiver, MySender}
[TracingInTests.BlackBoxReceiver, TracingInTests.MySender]
iex(2)> {:ok, receiver_pid} = BlackBoxReceiver.start_link()
{:ok, #PID<0.116.0>}
iex(3)> {:ok, sender_pid} = MySender.start_link(receiver_pid)
{:ok, #PID<0.118.0>}

iex(4)> :erlang.trace(receiver_pid, true, [:receive])
1
iex(5)> flush()
:ok

iex(6)> {:ok, :data_received} = MySender.action(sender_pid, a: 1)
{:ok, :data_received}
iex(7)> flush()
{:trace, #PID<0.116.0>, :receive,
 {:"$gen_call", {#PID<0.118.0>, #Reference<0.2707531522.1177288705.212023>},
  {:store, [a: 1]}}}
:ok

iex(8)> {:ok, :data_received} = MySender.action(sender_pid, b: 2)
{:ok, :data_received}
iex(9)> flush()
{:trace, #PID<0.116.0>, :receive,
 {:"$gen_call", {#PID<0.118.0>, #Reference<0.2707531522.1177288705.212048>},
  {:store, [b: 2]}}}
:ok

Leave your iex shell from above open, we’ll need it again shortly. For now, we’ll jump down to iex(5)>, in which we call IEx.Helpers.flush/0. Flush empties the shell’s message inbox and prints out what it found and returns :ok. When we first call it, it has nothing to show. But after we made our call to MySender.action/2, we flushed again. This time, the shell had a message in its inbox:

{:trace, #PID<0.116.0>, :receive,
 {:"$gen_call", {#PID<0.118.0>, #Reference<0.2707531522.1177288705.212023>},
  {:store, [a: 1]}}}

We received a tuple in the format {:trace, <traced process' pid>, <trace type>, <message>}. The traced process’ pid was #PID<0.116.0>, the pid of BlackBoxReceiver. The trace type was :receive, meaning this is a message received by BlackBoxReceiver. And the message was the one sent to BlackBoxReceiver by #PID<0.118.0>, MySender. Finally, you see the exact data that was sent - {:store, [a: 1]}.

Now go back to your iex shell. Instead of flushing the message box, we’re going to handle the messages directly:

iex(10)> {:ok, :data_received} = MySender.action(sender_pid, c: 3)
{:ok, :data_received}

iex(11)> {:trace, bbr_pid, :receive, {:"$gen_call", {mysender_pid, _}, data}} = receive do msg -> msg end
{:trace, #PID<0.116.0>, :receive,
 {:"$gen_call", {#PID<0.118.0>, #Reference<0.2707531522.1177288705.212070>},
  {:store, [c: 3]}}}

iex(12)> bbr_pid
#PID<0.116.0>
iex(13)> mysender_pid
#PID<0.118.0>
iex(14)> data
{:store, [c: 3]}

iex(15)> {:ok, :data_received} = MySender.action(sender_pid, {:d, 4})
{:ok, :data_received}
iex(16)> Process.info(self(), :messages)
{:messages,
 [
   {:trace, #PID<0.116.0>, :receive,
    {:"$gen_call", {#PID<0.118.0>, #Reference<0.2707531522.1177288705.212113>},
     {:store, {:d, 4}}}}
 ]}

First, we have MySender perform another action that will cause it to send a message to BlackBoxReceiver. Next, we write a shorthand version of a receive block receive do msg -> msg end to grab the top message off our inbox. Finally, we use pattern matching to grab the important pieces of the message, then evaluate them to see what they hold.

Also, I tossed in Process.info/2, which when called on a process with :messages will return the list of messages in the process’ inbox.

Now that we understand just what trace/3 is doing, let’s now review what the line in our test case containing assert_receive is doing.

ExUnit’s assert_receive/3

ExUnit provides a number of possible assertions that can be used in your unit tests. One of those is ExUnit.Assertions.assert_receive/3, which lets a developer check whether the test process has a particular message sitting in its inbox. assert_receive takes a pattern and waits as long as timeout to verify the process has received a message in the pattern’s format.

Let’s review the assert_receive used in our test:

assert_receive({:trace, ^receiver_pid, :receive, {:"$gen_call", _, {:store, _}}})

The above should make more sense now that we know how tracing works and the types of messages it sends to the process performing the tracing. The test is asserting that it received a trace message and asserts a match against BlackBoxReceiver’s pid via ^receiver_pid. Finally, it validates the payload of MySender’s message to BlackBoxReceiver: {:store, _}.

Conclusion

Erlang’s tracing capabilities are immense and powerful. This blog post doesn’t even scratch the surface of what’s possible with trace/3 and Erlang’s myriad other tracing features. In fact, it’s more like a faint, short-lived breeze blowing over its skin.

However, it’s more than capable of fulfilling our original need, answering the question, “How can I verify, during testing, that a GenServer or process received a message?”

I highly recommend reading through trace/3’s documentation and playing with its many options in the iex shell.

Final Code

All of the code necessary to get your feet wet with :erlang.trace/3 can be found in this elixir project.

Post by: rishenko