Track-switching in a large Elixir web application

was called will use the new value, but other instances will continue to use their cached value until they are cycled out.

To fix that, we need to extract

    value = get_value_from_redis()
    Application.put_env(:my_application, :use_new_logic?, value)

into a process wrapping these lines in an endless loop, sleeping for a couple of seconds during each round.
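Such a refresher process could look roughly like the following. This is a minimal sketch, not the code we ran: the module name ToggleRefresher is hypothetical, and the fetch function passed in stands in for get_value_from_redis/0. Instead of sleeping in a literal loop, it schedules a message to itself, which has the same effect inside a GenServer.

```elixir
defmodule ToggleRefresher do
  # Hypothetical module name, for illustration only.
  use GenServer

  # fetch_fun stands in for get_value_from_redis/0.
  def start_link(fetch_fun, interval_ms \\ 2_000) do
    GenServer.start_link(__MODULE__, {fetch_fun, interval_ms})
  end

  @impl true
  def init({fetch_fun, interval_ms}) do
    state = %{fetch_fun: fetch_fun, interval_ms: interval_ms}
    # Refresh once at startup so the cache is populated immediately.
    refresh(state)
    {:ok, state}
  end

  @impl true
  def handle_info(:refresh, state) do
    refresh(state)
    {:noreply, state}
  end

  # Copies the backing-store value into the application environment
  # and schedules the next round.
  defp refresh(%{fetch_fun: fetch_fun, interval_ms: interval_ms}) do
    Application.put_env(:my_application, :use_new_logic?, fetch_fun.())
    Process.send_after(self(), :refresh, interval_ms)
  end
end

# Usage: a constant function replaces the Redis lookup here.
{:ok, _pid} = ToggleRefresher.start_link(fn -> true end, 2_000)
```

Started under the application's supervision tree, every instance would pick up a changed toggle value within one interval.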

Our app was still under only mild load and had multiple instances running only during deployments, so we gladly skipped this step.

Testing with a feature toggle

We want to be able to test both new-world and old-world code in the same codebase.

Some of the code exercised by our tests would have to call the feature toggle.

This leads to an interesting problem: The implementation given above relies on shared state, held by the cache and in the backing store — but tests requiring different values for the toggle should be able to run in parallel.

Let’s investigate this a bit:

- Code exercised from a unit test should not need to call the toggle function. Such code is intended to be called from higher levels that would use the toggle, if needed. (This depends a bit on your application architecture and how you break up your units, but it probably holds true in a web application, where it can be expected that the feature toggle is only called from a controller.)
- Request and end-to-end tests might execute code that would run through the toggle router. In their case, we either need to give up test parallelism, or change the toggle code so that different tests can hold different opinions on the toggle value.

So, we need to find mechanisms for passing the desired toggle value from some tests down to the implementation.

Passing the toggle value down in a request test

A request test (in Phoenix parlance, an endpoint test using ConnTest) builds a special Plug.Conn struct for each request, sends it into the Phoenix application stack (the application endpoint), receives it back and asserts against it.

You can think of ConnTest as a lightweight utility to short-circuit the web server that is normally preparing and passing down the Conn.

We could simply piggy-back the desired toggle state from the test on the conn, using put_private/3.

This approach would be in line with the general attitude toward explicitness in the Elixir world.

This explicitness is very pleasant to work with: it makes data flow more obvious, and code easier to search, read, understand and debug.

We decided to take a short-cut for these reasons:

- Both the feature toggle and the required testing modifications had a limited lifetime; they were meant to be thrown away after switching to the new code for good.
- Very few people were required to understand the code for switching to the new code logic.
- Our request test setup was inconsistent, so the approach of modifying the conn would not be that simple.

In a request test, both the test code and the controller code execute in the same Erlang process.

You can verify this by putting IO.inspect(self()) into both and seeing the same PID twice in the output.

This allows us to use a less-well-known feature of Erlang to pass the per-test toggle value on: the process dictionary.

The process dictionary works as a key-value store and implements hidden state within a process.

It is an oddity in the architecture, and passing data through it is usually frowned upon.

Using data side-loaded in the process dictionary is the opposite of handling it explicitly.

On rare occasions, it can be a really helpful technique.

Here’s how we can use it for our purposes:

    # In the test setup:
    Process.put(:test_use_new_logic?, true)

    # New switchboard code:
    defmodule MyApplication.Switchboard do
      def use_new_logic? do
        case Process.get(:test_use_new_logic?) do
          nil ->
            # old switchboard code here (persistence and cache)
          test_value ->
            test_value
        end
      end

      # ...
    end

There are quite a number of tests that need this kind of setup, so I want to make the test setup simpler (and simpler to remove).

We can use ExUnit’s tagging mechanism for this.

Tests can be tagged with metadata individually or in bulk like this:

    defmodule SomeTest do
      use ExUnit.Case

      @moduletag tag1: value1  # for all tests in the module

      @tag tag2: value2        # only for this test
      test "something" do
        # ...
      end
    end

The idea is to tag tests that require setup for stubbing the toggle value, and implement the stubbing inside a setup block.

The tags of a test are available to each setup block.
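As a small runnable illustration of tags reaching a setup block, here is a sketch that needs nothing but ExUnit. The module name TagSketchTest is made up for this example; the tag and process dictionary key are the ones used in this article.

```elixir
ExUnit.start(autorun: false)

defmodule TagSketchTest do
  use ExUnit.Case, async: true

  # The context passed to setup contains the tags of the current test.
  setup tags do
    Process.put(:test_use_new_logic?, tags.use_new_code?)
    :ok
  end

  @tag use_new_code?: true
  test "setup sees the tag of this test" do
    assert Process.get(:test_use_new_logic?) == true
  end

  @tag use_new_code?: false
  test "another test can choose a different value" do
    assert Process.get(:test_use_new_logic?) == false
  end
end

# Run the suite explicitly, since autorun was disabled above.
result = ExUnit.run()
```

Both tests can run in parallel with other modules because each one only touches its own process dictionary.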

To avoid repeating the setup block as well, we extract it like this:

    defmodule FeatureToggleTest do
      defmacro __using__(_) do
        # The setup block must be quoted so that it is injected
        # into the test module that calls `use FeatureToggleTest`.
        quote do
          setup tags do
            Process.put(:test_use_new_logic?, tags.use_new_code?)
            # Setup callbacks must return :ok, a keyword list or a map.
            :ok
          end
        end
      end
    end

and use this shared setup in our request tests like this:

    defmodule SomeRequestTest do
      use MyApplication.ConnTest
      use FeatureToggleTest

      @tag use_new_code?: true
      test "something" do
        # ...
      end
    end

If we forget to set up the stubbing for a request test, the implementation will still run through the persistence and cache code in the feature toggle.

We can simply raise an exception up there to find and then properly tag all such tests.
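One way to picture that temporary guard is the sketch below. It is an assumption-laden stand-in, not our production module: SwitchboardSketch and its env argument are hypothetical, and Application.get_env/3 replaces the real persistence and cache lookup.

```elixir
defmodule SwitchboardSketch do
  # Hypothetical stand-in for MyApplication.Switchboard.
  # The env argument is an assumption made to keep the sketch self-contained.
  def use_new_logic?(env \\ :prod) do
    case Process.get(:test_use_new_logic?) do
      nil when env == :test ->
        # No stub found: the calling test was not tagged.
        raise "feature toggle not stubbed, tag this test with use_new_code?"

      nil ->
        # Stand-in for the old switchboard code (persistence and cache).
        Application.get_env(:my_application, :use_new_logic?, false)

      test_value ->
        test_value
    end
  end
end

# A stubbed process gets its own value back.
Process.put(:test_use_new_logic?, true)
stubbed = SwitchboardSketch.use_new_logic?(:test)
Process.delete(:test_use_new_logic?)
```

Once every request test is tagged, the raising clause can be removed again.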

Note that each test runs in its own process, so no cleanup of the process dictionary is necessary.
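The isolation this relies on is easy to check in plain Elixir. The following sketch only assumes the key name used above: a value stored in one process dictionary is not visible from another process.

```elixir
# Store a value in the dictionary of the current process.
Process.put(:test_use_new_logic?, true)

# A freshly started process has its own dictionary and does not see the value.
seen_in_other_process =
  Task.async(fn -> Process.get(:test_use_new_logic?) end)
  |> Task.await()

seen_in_this_process = Process.get(:test_use_new_logic?)

IO.inspect(seen_in_this_process, label: "this process")
IO.inspect(seen_in_other_process, label: "other process")
```

The same isolation is the reason this trick stops working as soon as the request is handled by a different process than the test.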

Passing the toggle value from a full-stack test

Keeping the toggle value in a process’ state won’t help us much when writing full-stack tests using a browser to interact with our site.

Requests to our web application are typically initiated by an action inside the browser, like clicking a link, as instructed by the test.

So an integration test passes information to the browser, running in a different operating system process, which then issues a web request.

The request is then handled in an Erlang process different from the one running the test.

We need a mechanism for communicating from the test process to the process handling the request.

The SQL Sandbox already does this!

Ecto and Phoenix allow us to run end-to-end tests in parallel, to the effect that the rendered page content reflects the state of the database as set up by a test.

This window into the database content is state shared between the test and the controller servicing the request, across process boundaries!

Indeed, the Phoenix/Ecto stack has already solved a problem similar to ours.

Here is a brief overview of the stack and the data flow involved:

- Each test process checks out a connection from the SQL Sandbox and claims ownership. All database interaction through this connection happens inside a transaction, and all database effects are invisible outside of it.
- The test configures the framework responsible for controlling the browser session (Hound or Wallaby) with metadata containing the test’s PID.
- When the web request is processed, this metadata is used to grant the process handling the request an allowance on the connection owned by the test process.
- Any queries in the web request will subsequently use the same database connection, and act inside the same transaction as the test code.

For the curious, the cross-process mechanism works by adding a payload to the user-agent header, to be parsed in the code starting here.
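To make the idea of a header payload tangible, here is a sketch of the round trip. The module name and the "(Sketch/...)" format are invented for illustration and are not the format the sandbox uses; only the general technique (serialize a term, append it to the user-agent string, parse it back in the request process) mirrors what is described above.

```elixir
defmodule MetadataHeaderSketch do
  # Hypothetical module and header format, for illustration only.

  # Serializes the metadata and appends it to the user-agent string.
  def encode(user_agent, metadata) do
    payload = metadata |> :erlang.term_to_binary() |> Base.url_encode64()
    "#{user_agent} (Sketch/#{payload})"
  end

  # Extracts the metadata again; returns an empty map if none is present.
  def decode(user_agent) do
    with [_, payload] <- Regex.run(~r{\(Sketch/([A-Za-z0-9_\-=]+)\)}, user_agent),
         {:ok, binary} <- Base.url_decode64(payload) do
      # :safe refuses to create atoms that do not exist yet.
      :erlang.binary_to_term(binary, [:safe])
    else
      _ -> %{}
    end
  end
end

metadata = %{owner: self(), use_new_code?: true}
header = MetadataHeaderSketch.encode("Mozilla/5.0", metadata)

# Decode in another process, as a request handler would.
decoded =
  Task.async(fn -> MetadataHeaderSketch.decode(header) end)
  |> Task.await()
```

The decoded map carries the test’s PID and our flag into a process that otherwise shares nothing with the test.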

Although Phoenix.Ecto.SQL.Sandbox has SQL in its name, we can use it for our purposes as well.

There is a test case template for feature tests (these execute the application code end-to-end), a file named similar to test/support/feature_case.ex, that roughly looks like this:

    defmodule MyApplication.FeatureCase do
      use ExUnit.CaseTemplate

      using do
        quote do
          # ... aliases and imports
        end
      end

      setup tags do
        :ok = Ecto.Adapters.SQL.Sandbox.checkout(MyApplication.Repo)

        unless tags[:async] do
          Ecto.Adapters.SQL.Sandbox.mode(MyApplication.Repo, {:shared, self()})
        end

        metadata = Phoenix.Ecto.SQL.Sandbox.metadata_for(MyApplication.Repo, self())

        # Wallaby specific, but looks almost the same when using Hound
        {:ok, session} = Wallaby.start_session(metadata: metadata)
        {:ok, session: session}
      end
    end

The last paragraph of this code computes the necessary metadata for the Ecto SQL sandbox mechanism and passes it on to the end-to-end testing framework (Wallaby in our case).

We add one line to amend the test framework metadata with information from the test metadata tags:

    metadata = Map.put(metadata, :use_new_code?, tags.use_new_code?)

Step 2: Mark tests to use old or new code

The setup in step 1 takes exactly the same test tags as we used above for request tests.

We tag all end-to-end tests that require our mechanism in the same way as we did for request tests.

If we forget to tag an end-to-end test, we get an immediate failure because the above setup code is executed, and tags.use_new_code? requires the :use_new_code? key to be present in the metadata map tags.
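The failure comes from how Elixir treats dot access on maps: a missing key raises a KeyError instead of returning nil. A minimal sketch, with a hypothetical module MetadataSketch wrapping the one-line amendment from step 1:

```elixir
defmodule MetadataSketch do
  # Hypothetical wrapper around the amendment shown in step 1.
  def amend(metadata, tags) do
    Map.put(metadata, :use_new_code?, tags.use_new_code?)
  end
end

# A tagged test: the flag is copied into the metadata.
tagged = MetadataSketch.amend(%{}, %{async: true, use_new_code?: false})

# An untagged test: dot access on the missing key raises a KeyError.
untagged =
  try do
    MetadataSketch.amend(%{}, %{async: true})
  rescue
    error in KeyError -> {:error, error.key}
  end
```

Using tags[:use_new_code?] instead would silently yield nil, which is exactly what we want to avoid here.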

Step 3: Extract the metadata and pass the flag on to the toggle router

As part of the standard setup for asynchronous end-to-end tests, a plug in the application endpoint is used to extract the metadata and pass it on to Ecto’s SQL sandbox.

We do a similar thing right next to it:

    defmodule MyApp.Endpoint do
      use Phoenix.Endpoint, otp_app: :my_app

      if Application.get_env(:my_app, :sql_sandbox) do
        plug Phoenix.Ecto.SQL.Sandbox
        plug :extract_feature_toggle # <- ours!
      end

      def extract_feature_toggle(conn, _) do
        conn
        |> get_req_header("user-agent")
        |> List.first
        |> Phoenix.Ecto.SQL.Sandbox.decode_metadata
        |> case do
          %{use_new_code?: flag} ->
            Process.put(:test_use_new_logic?, flag)

          _ ->
            # No metadata was passed. Happens when hit by request test,
            # not end-to-end test. Do nothing.
            :ok
        end

        conn
      end

      # ...
    end

In the setup for end-to-end tests, we instructed the browser testing framework to add the value for our feature toggle as metadata to all requests.

The extract_feature_toggle function plug tries to extract this value.

If present, it writes it to the process dictionary.

We have already written our toggle function to accept the toggle value from there because our request tests use that mechanism.

PLEASE NOTE that the if Application.get_env(:my_app, :sql_sandbox) conditional around our function plug is REALLY important here!

We must never use Phoenix.Ecto.SQL.Sandbox in production code, since it eventually calls :erlang.binary_to_term() to deserialize the payload.

An attacker could craft requests with a prepared user-agent header to make this call generate Erlang atoms, which are never garbage collected, until resources are exhausted and our app crashes.
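The atom-creation risk can be reproduced with plain term decoding. This sketch assumes a hand-built external-term binary for a made-up atom name; it does not reproduce the sandbox’s payload format, it only shows what unrestricted :erlang.binary_to_term/1 does with attacker-controlled input, compared to the :safe option.

```elixir
# Made-up atom name; it appears only as a string, so no such atom exists yet.
name = "sketch_atom_never_defined_7c1e"

# External term format: version byte 131, tag 119 (small UTF-8 atom),
# one length byte, then the atom name.
payload = <<131, 119, byte_size(name), name::binary>>

# With :safe, decoding refuses to create atoms that do not exist.
safe_result =
  try do
    :erlang.binary_to_term(payload, [:safe])
  rescue
    ArgumentError -> :rejected
  end

# Without :safe, the atom is created and stays in the atom table.
unsafe_result = :erlang.binary_to_term(payload)

IO.inspect(safe_result, label: "safe")
IO.inspect(is_atom(unsafe_result), label: "atom created")
```

Repeating the unsafe call with ever-changing names is what eventually fills the atom table.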

Conclusions and final thoughts

Having both old-world and new-world code side-by-side during the transition had some effect on the application code in various places.

Obviously, we need a database schema that can service both worlds.

The same holds for our database factory.

A couple more places were affected after all, and a good amount of careful planning was strictly required for our approach.

We are glad we took this route, however.

When we changed the feature toggle to using the new code in production, we quickly realized a mistake and went back.

This meant no downtime, no stress for us, and only a minimal delay needed for fixing the issue and re-deploying.

A few hours later we decided that the new code worked as desired.

What followed was a couple of days of removing the old implementation and what had become obsolete, starting with the old-world tests, and eventually dropping columns in the database.

All the testing specific modifications shown above were deliberately minimal and easy to find, hence easy to remove.

It looks like we used the frameworks in a way they were not really intended for.

While the mechanism for passing metadata to an in-browser test run is documented, the work required for getting it back out is not immediately obvious.




Phoenix.Ecto.SQL.Sandbox exposes decode_metadata publicly, but not extract_metadata, which we had to replicate.

It speaks to the ecosystem and community that the necessary steps quickly became clear when looking at the code and trying a few things out.

My general impression is that the abstractions around the popular frameworks written in Elixir are mostly paper-thin, and the result is low volume of implementation code that is easy to understand.

With several years of experience in building and maintaining Elixir applications, we can help you build applications that can change as your business does.

Get in touch!

Originally published at 9elements.com on February 20, 2019.

