This post belongs to the Three real-world examples of distributed Elixir series.
- A gentle introduction to distributed Elixir.
- The tale of the periodic singleton process and the three different global registries.
- The distributed download requester and progress tracker.
- The distributed application version monitor.
When deploying a project to a production environment, you can either scale it vertically (adding more resources to the single instance where it runs) or horizontally (adding multiple instances). If you don't like putting all your eggs in one basket and choose the horizontal approach, Elixir offers all the distributed features you need out of the box, without any additional dependencies, letting you build a cluster out of the different instances of your application.
Clustering your service allows you to do very interesting things, from spawning new processes on any instance to sending messages between cluster nodes, letting you build very creative solutions. Although this might sound complex, in reality it is straightforward to achieve, since these distributed capabilities are integrated into the language you already know, letting you design your applications in a totally different way. Throughout this post series, we will explore three different real-world use cases of distributed Elixir, but first, let's go back to basics and see how to build an Elixir cluster. Let's get cracking!
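As a small teaser of what this enables: once two nodes are connected, spawning a process on the remote one is a one-liner. A hypothetical snippet, assuming a second node named n2@127.0.0.1 is already part of the cluster (the output appears locally because the spawned process inherits the caller's group leader):
iex(n1@127.0.0.1)1> Node.spawn(:"n2@127.0.0.1", fn -> IO.puts("Hello from #{node()}") end)
Hello from n2@127.0.0.1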
Building a simple cluster
Let's generate a basic OTP application, and start iterating from there:
➜ mix new simple_cluster --sup
...
➜ cd simple_cluster
To build the cluster, we need two things:
- To provide a name for the current application instance.
- To connect the different nodes once the application starts.
The first one is straightforward, and we can achieve it by adding the --name
argument to the start command:
➜ iex --name n1@127.0.0.1 -S mix
Erlang/OTP 24 [erts-12.0.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]
Interactive Elixir (1.12.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(n1@127.0.0.1)1>
Easy, right? Note the iex prompt iex(n1@127.0.0.1)1>, which contains the node name we just assigned. Let's start a new node in a different terminal window, this time using the n2 name:
➜ iex --name n2@127.0.0.1 -S mix
Erlang/OTP 24 [erts-12.0.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]
Interactive Elixir (1.12.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(n2@127.0.0.1)1>
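A quick note before connecting them: nodes only accept connections from peers sharing the same Erlang cookie. On a single machine this works out of the box, since both nodes read it from the same ~/.erlang.cookie file, but across machines you may need to set it explicitly (my_secret_cookie below is just a placeholder):
➜ iex --name n1@127.0.0.1 --cookie my_secret_cookie -S mix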
Now that we have both nodes up and running, let's create the cluster by manually connecting the nodes using Elixir's Node.connect/1:
iex(n1@127.0.0.1)1> Node.connect :"n2@127.0.0.1"
true
iex(n1@127.0.0.1)2>
To confirm that everything went fine, let's run Node.list/0 on each node, which returns the list of nodes to which the current instance has connected:
iex(n1@127.0.0.1)2> Node.list
[:"n2@127.0.0.1"]
iex(n1@127.0.0.1)3>
iex(n2@127.0.0.1)2> Node.list
[:"n1@127.0.0.1"]
iex(n2@127.0.0.1)3>
Our first clustered application is ready, yay! However, connecting manually to each of the nodes from iex
is less than ideal. There is a more convenient way of doing it, which is adding a sys.config file to the root of the project with the following content:
[{kernel,
  [
    {sync_nodes_optional, ['n1@127.0.0.1', 'n2@127.0.0.1']},
    {sync_nodes_timeout, 5000}
  ]}
].
This file sets configuration values at application start; in this particular case:
- sync_nodes_optional: the list of possible nodes in the cluster; startup continues even if some of them are unreachable.
- sync_nodes_timeout: the time, in milliseconds, to wait for the nodes to synchronize.
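As an aside, not needed for this walkthrough: the kernel application also supports a stricter sync_nodes_mandatory option, which makes the node refuse to boot if the listed nodes don't show up within the timeout. A minimal sketch:
[{kernel,
  [
    %% Boot fails if these nodes are unreachable within the timeout
    {sync_nodes_mandatory, ['n1@127.0.0.1', 'n2@127.0.0.1']},
    {sync_nodes_timeout, 5000}
  ]}
].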
To apply this configuration, let's start each node passing the configuration file through the --erl parameter:
➜ iex --name n1@127.0.0.1 --erl "-config sys.config" -S mix
Erlang/OTP 24 [erts-12.0.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]
Interactive Elixir (1.12.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(n1@127.0.0.1)1>
➜ iex --name n2@127.0.0.1 --erl "-config sys.config" -S mix
Erlang/OTP 24 [erts-12.0.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]
Interactive Elixir (1.12.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(n2@127.0.0.1)1>
To confirm that the nodes are connected, let's create a module that spawns a process on each node that observes any change in the cluster membership:
# ./lib/simple_cluster/observer.ex
defmodule SimpleCluster.Observer do
  use GenServer

  require Logger

  def start_link(_), do: GenServer.start_link(__MODULE__, %{})

  @impl GenServer
  def init(state) do
    # Subscribe to node status changes in the cluster
    :net_kernel.monitor_nodes(true)
    {:ok, state}
  end

  @impl GenServer
  def handle_info({:nodedown, node}, state) do
    # A node left the cluster
    Logger.info("--- Node down: #{node}")
    {:noreply, state}
  end

  def handle_info({:nodeup, node}, state) do
    # A new node joined the cluster
    Logger.info("--- Node up: #{node}")
    {:noreply, state}
  end
end
This simple GenServer calls :net_kernel.monitor_nodes/1 on initialization, subscribing to node status changes in the cluster. From then on, it receives a {:nodeup, node} or {:nodedown, node} message whenever a node joins or leaves the cluster. Let's add this GenServer to the main supervision tree of the application:
# ./lib/simple_cluster/application.ex
defmodule SimpleCluster.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      SimpleCluster.Observer
    ]

    opts = [strategy: :one_for_one, name: SimpleCluster.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
Let's start our two nodes again and see what happens. If we start n2, we can see the following log message in n1:
06:02:40.129 [info] --- Node up: n2@127.0.0.1
If we stop n2, we can see the corresponding log message:
06:05:22.051 [info] --- Node down: n2@127.0.0.1
To test out communication between the nodes, let's add a new module that sends a message to all the nodes in the cluster, printing the result:
# ./lib/simple_cluster/ping.ex
defmodule SimpleCluster.Ping do
  use GenServer

  require Logger

  def start_link(_) do
    GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  end

  def ping do
    # Call the Ping process on every other node and log the replies
    Node.list()
    |> Enum.map(&GenServer.call({__MODULE__, &1}, :ping))
    |> Logger.info()
  end

  @impl GenServer
  def init(state), do: {:ok, state}

  @impl GenServer
  def handle_call(:ping, from, state) do
    Logger.info("--- Receiving ping from #{inspect(from)}")
    {:reply, {:ok, node(), :pong}, state}
  end
end
This GenServer has two different parts. First of all, it exposes a public ping/0 function, which takes all the nodes in the cluster and sends each of them a :ping message using GenServer.call/3. This function accepts any of the following as its first parameter:
server() :: pid() | name() | {atom(), node()}
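To make those three forms concrete, here is a hypothetical snippet (the pid variable and the remote node name are placeholders for illustration):
# By pid
GenServer.call(pid, :ping)
# By registered name, on the local node
GenServer.call(SimpleCluster.Ping, :ping)
# By registered name, on a remote node
GenServer.call({SimpleCluster.Ping, :"n2@127.0.0.1"}, :ping)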
By using {__MODULE__, &1} we are saying: send :ping to the process registered under the name SimpleCluster.Ping on the node &1. This takes us to the second part of the module, the handle_call(:ping, from, state) callback, which receives the incoming message, logs the sender, and replies with an {:ok, node(), :pong} tuple. Let's add this module to the main supervision tree, restart our instances, and see it in action:
# ./lib/simple_cluster/application.ex
defmodule SimpleCluster.Application do
  ...

  def start(_type, _args) do
    children = [
      SimpleCluster.Observer,
      SimpleCluster.Ping
    ]

    ...
  end
end
iex(n1@127.0.0.1)1> SimpleCluster.Ping.ping
06:33:19.704 [info] [{:ok, :"n2@127.0.0.1", :pong}]
:ok
iex(n2@127.0.0.1)1>
06:33:19.701 [info] --- Receiving ping from {#PID<6589.174.0>, [:alias | #Reference<6589.2917998909.4144300034.261849>]}
Nodes have automagically connected, and processes can communicate between them as expected. Note the from value logged by n2: a {pid, tag} tuple that identifies the caller and the request. Nevertheless, this is again far from ideal in a real-world application deployed to a production environment. How would we handle dynamic IPs? How would we manage new nodes joining or leaving the cluster? Thankfully, there is a library that addresses this for us.
Automatic cluster formation with libcluster
libcluster provides a mechanism for automatically forming clusters of Erlang nodes, with either static or dynamic node membership, offering a wide variety of strategies and even letting you create your own. We will not dive too deep into its internal details in this series, but you can look at its different strategies in its official docs. To use it, let's get rid of the sys.config
file, and add libcluster
to our application dependencies:
➜ rm sys.config
# mix.exs
defmodule SimpleCluster.MixProject do
  use Mix.Project

  ...

  defp deps do
    [
      {:libcluster, "~> 3.3"}
    ]
  end
end
Don't forget to run the corresponding mix deps.get :P. To form the cluster, libcluster uses different strategies, and in this particular case we will use the Cluster.Strategy.Epmd strategy, in which we can set the list of hosts just as we did with the former sys.config file. Let's go ahead and add the cluster supervisor and its configuration to the main supervision tree:
# ./lib/simple_cluster/application.ex
defmodule SimpleCluster.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      {Cluster.Supervisor, [topologies(), [name: SimpleCluster.ClusterSupervisor]]},
      SimpleCluster.Observer,
      SimpleCluster.Ping
    ]

    opts = [strategy: :one_for_one, name: SimpleCluster.Supervisor]
    Supervisor.start_link(children, opts)
  end

  defp topologies do
    [
      example: [
        strategy: Cluster.Strategy.Epmd,
        config: [
          hosts: [
            :"n1@127.0.0.1",
            :"n2@127.0.0.1"
          ]
        ]
      ]
    ]
  end
end
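As a side note, switching to dynamic membership later is mostly a matter of swapping the strategy. A quick sketch using libcluster's multicast-based Cluster.Strategy.Gossip, which discovers peers on the local network without a static host list:
defp topologies do
  [
    example: [
      # Peers are discovered via UDP multicast; no host list needed
      strategy: Cluster.Strategy.Gossip
    ]
  ]
end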
Now we can start both nodes without the --erl
flag:
➜ iex --name n1@127.0.0.1 -S mix
Erlang/OTP 24 [erts-12.0.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]
07:06:38.384 [warn] [libcluster:example] unable to connect to :"n2@127.0.0.1"
Interactive Elixir (1.12.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(n1@127.0.0.1)1>
07:06:56.968 [info] --- Node up: n2@127.0.0.1
iex(n1@127.0.0.1)2> SimpleCluster.Ping.ping()
:ok
07:07:36.098 [info] [{:ok, :"n2@127.0.0.1", :pong}]
07:10:10.305 [info] --- Node down: n2@127.0.0.1
Everything is working like before, yay! Now that we have the basics of a clustered Elixir application ready, in the following posts we will implement three creative solutions on top of it, starting with the simplest one, in which we will build a singleton process across the cluster in charge of executing a periodic task.
Happy coding!