Processes


Concurrency is one of Elixir's most celebrated features, largely because of the many benefits associated with concurrent programming. In the words of Joe Armstrong, one of the creators of Erlang:

Concurrent programming can be used to improve performance, to create scalable and fault-tolerant systems, and to write clear and understandable programs for controlling real-world applications. - Programming Erlang, Section 1.2

A feature set this powerful (performant, scalable, and fault-tolerant) surely comes with a catch. After all, doesn't everyone want their programs to be blazing fast, scalable, and fault-tolerant? As it turns out, writing concurrent programs is often extremely complicated. The attempt often leads to bugs that are hard to diagnose and even harder to fix, such as race conditions, deadlocks, and starvation. In fact, while giving a talk at an Apple conference, Steve Jobs said, “The way the processor industry is going is to add more and more cores, but nobody knows how to program those things...”. Jobs was referring to the difficulty of programming an application to run across multiple cores and use system resources efficiently. In Elixir, however, concurrent programming becomes much simpler, thanks to the Erlang Virtual Machine.

Elixir process or OS process

Elixir processes are not operating system processes. They are managed entirely by the Erlang VM, which removes much of the notorious complexity involved with concurrency. The VM handles the spawning of processes, their scheduling, and even their distribution across cores to make the best use of system resources. So, what exactly is a process? It is a lightweight, independent unit of execution that can be spawned in a matter of microseconds. It's not uncommon for an Elixir/Erlang application to spawn thousands, hundreds of thousands, or even millions of processes. Processes are the fundamental unit of concurrency in Elixir, and the resulting model resembles the Actor concurrency model.
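To get a feel for how cheap processes are, here is a minimal sketch that spawns 10,000 short-lived processes and times the whole batch. The count is an arbitrary choice for illustration, and the timing will vary from machine to machine. It uses spawn/1, which is introduced in the next section.

```elixir
# Spawn 10,000 processes that each do nothing and exit,
# and measure how long the whole batch takes.
{microseconds, pids} =
  :timer.tc(fn ->
    for _ <- 1..10_000, do: spawn(fn -> :ok end)
  end)

IO.puts("Spawned #{length(pids)} processes in #{microseconds} microseconds")
```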

Actor concurrency model

The concurrency model in Elixir can be defined as follows:

  • Each actor is a process
  • Processes perform a specific task
  • Processes are independent; they share no memory
  • Processes communicate via message passing
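The four points above can be sketched in a few lines. This is only an illustration under assumed names: the `{:sum, total}` message shape and the summing task are made up for the example. It relies on spawn, send, and receive, which the following sections cover in detail.

```elixir
parent = self()
numbers = [1, 2, 3]

# An actor is a process with one task: here, summing a list.
# `numbers` is copied into the new process; nothing is shared.
spawn(fn ->
  total = Enum.sum(numbers)
  # The only way to hand the result back is to send a message.
  send(parent, {:sum, total})
end)

# Wait for the other process to report its result.
result =
  receive do
    {:sum, total} -> total
  end

IO.puts("Sum computed by the other process: #{result}")
```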

The easiest way to spawn a new process is with the spawn/1 function. It lives in the Kernel module, which is imported automatically, so it can be called directly from the iex shell. The function takes a single argument, an anonymous function to execute, and returns a process identifier, also known as a pid. Every process has a unique pid.

Let's start up an iex session to look at some examples. It's worth noting that the iex shell itself is a process. We can retrieve its pid with the self() function. Take a look:

iex(1)> self()
#PID<0.84.0>

The pid is one of Elixir's built-in data types. Your number will likely be different from mine, but that's okay. Each process is uniquely identified for the life of the application or, in this case, the life of the iex session. Now let's create a new process.

iex(2)> spawn(fn -> IO.puts "hi" end)
hi
#PID<0.91.0>

We created a new process that executed a function. After its execution, the process exits. We can verify this by using Process.alive?/1. Let's spawn the same function again, but this time bind the returned pid to a variable.

iex(3)> pid = spawn(fn -> IO.puts "hi" end)
hi
#PID<0.93.0>

iex(4)> Process.alive?(pid)
false

iex(5)> me = self()
#PID<0.84.0>

iex(6)> Process.alive?(me)
true

By storing the process id in the variable pid, we can verify that the process terminated after executing. Our iex session, however, is still alive, as we would expect. Merely spawning processes isn't very useful, nor is it much fun. Let's add some message passing between processes and see how that works.

Passing Messages

We spawn a new process with spawn and we send a message with send. The send/2 function takes two arguments: a pid and a message. This is much like sending a letter by postal mail: you need the address where the letter should go and the message itself. The syntax of send looks like this:

send(pid, message)
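As a quick check, a process can send a message to itself. The sketch below shows that send/2 returns the message it was given and that the message then waits in the mailbox. The `{:greeting, "hello"}` tuple is just an example payload.

```elixir
# Send a message to the current process. send/2 returns the message itself.
returned = send(self(), {:greeting, "hello"})
IO.inspect(returned)

# The message now sits in this process's mailbox until it is received.
{:message_queue_len, pending} = Process.info(self(), :message_queue_len)
IO.puts("Messages waiting: #{pending}")
```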

Processes can receive messages with receive. The receive block explicitly lists the types of messages a process should expect. The syntax of a receive block is as follows:

receive do
  {sender, message1} -> action_to_take
  {sender, message2} -> action_to_take
end

In the receive block above, we expect the sender to send a message as a tuple containing its pid and the message. Here message1 and message2 stand in for specific patterns, such as a particular string or atom. Messages do not have to take the form {sender, "message"}, but it's a common practice. A frequently used alternative is {:atom, message}, where the receiver pattern matches on the :atom. Depending on the message, the appropriate action is taken. What if the sender were to send something like message3, which isn't listed in the receive block? What would happen?

Each process responds to messages based on what is in its receive block. Incoming messages go into the process's mailbox, and receive checks them in the order they arrived. An unexpected message still goes into the mailbox; it just won't be processed. It will sit there, taking up space, until either the process dies or the message is dealt with manually, for example with the flush() helper in iex.
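The behavior described above can be observed directly. In this sketch the current process sends itself two messages, but its receive block only matches one of them. The `:expected` and `:unexpected` tags are made-up names for the example.

```elixir
# Send one message the receive block ignores and one it handles.
send(self(), {:unexpected, "nobody handles this"})
send(self(), {:expected, "handled"})

# receive skips the message that matches no clause and takes the next one.
handled =
  receive do
    {:expected, text} -> text
  end

# The unmatched message is still waiting in the mailbox.
{:message_queue_len, leftover} = Process.info(self(), :message_queue_len)
IO.puts("Handled: #{handled}, still waiting: #{leftover}")
```

In an iex session, calling flush() at this point would print and remove the leftover message.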

We can prevent unexpected messages from piling up by specifying an action to take for all messages that aren't explicitly listed. We accomplish this the same way we specify an "anything else" clause within a function: with the underscore ( _ ) character.

receive do
  {sender, message1} -> action_to_take
  {sender, message2} -> action_to_take
  _                  -> action_to_take
end

It's important to remember to put the catch-all pattern, the underscore, last. Clauses are tried from top to bottom, so if you put it on the first line it will match every message and the clauses below it will never run. That isn't the behavior we're looking for. We want to use it as a catch-all, and therefore it goes on the last line of the receive block.
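Here is a small sketch of a receive block with the catch-all in last position. The module name `CatchAllDemo`, the function `handle_next`, and the returned tuples are hypothetical, and the `after 0` clause is an addition for this example so the function returns instead of waiting when the mailbox is empty.

```elixir
defmodule CatchAllDemo do
  # Hypothetical helper: handles the next message in the mailbox.
  def handle_next do
    receive do
      {:good, msg} -> {:handled_good, msg}
      {:bad, msg} -> {:handled_bad, msg}
      # The catch-all goes last so the specific clauses are tried first.
      _ -> :discarded
    after
      # Assumption for this sketch: return at once if no message is waiting.
      0 -> :empty
    end
  end
end

send(self(), {:good, "it works"})
send(self(), :something_else)

IO.inspect(CatchAllDemo.handle_next())
IO.inspect(CatchAllDemo.handle_next())
IO.inspect(CatchAllDemo.handle_next())
```

Because the catch-all consumes anything the earlier clauses reject, nothing is left behind in the mailbox.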

As we saw earlier, once a process finishes executing, it dies. This is the desired behavior: we use processes to perform some work, and once they are finished we don't want them sitting around taking up valuable computing resources. But what if we want our process to continue handling messages? In this case, we need the function the process is running to call itself again after it has finished handling a message. Does this sound familiar? It's the definition of recursion! Consider this good news, bad news delivery module:

defmodule Processes101 do
  def listen do
    receive do
      {:good, msg} -> IO.puts("I've got good news, #{msg}")
      {:bad, msg} -> IO.puts("Sorry to inform you but #{msg}")
      _ -> IO.puts("Unknown message")
    end

    # Recurse so the process keeps waiting for the next message.
    listen()
  end
end

The last thing the listen function does is call itself. In this way, after handling a good, bad, or unknown message, it calls itself again and continues listening.
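To see the loop in action, the sketch below spawns a listener and sends it several messages. The module is repeated under the hypothetical name `Processes101Sketch` so the snippet runs on its own, and the 100 ms pause is an arbitrary value chosen only to give the listener time to print.

```elixir
defmodule Processes101Sketch do
  # Same loop as the module above, repeated so this snippet is self-contained.
  def listen do
    receive do
      {:good, msg} -> IO.puts("I've got good news, #{msg}")
      {:bad, msg} -> IO.puts("Sorry to inform you but #{msg}")
      _ -> IO.puts("Unknown message")
    end

    listen()
  end
end

listener = spawn(fn -> Processes101Sketch.listen() end)

send(listener, {:good, "you passed the exam"})
send(listener, {:bad, "it is raining"})
send(listener, :anything_else)

# Give the listener a moment to print, then confirm it is still running.
Process.sleep(100)
IO.puts("Listener alive? #{Process.alive?(listener)}")
```

Unlike the one-shot processes spawned earlier, this one is still alive after handling its messages, because it is waiting in receive for the next one. The recursive call is the last expression in the function, so the loop can run indefinitely without growing the stack.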

Example:

Interactive Elixir (1.6.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> actor = spawn( fn ->
...(1)> receive do
...(1)> {sender, message} -> IO.puts "Received #{message} from process #{inspect sender}"
...(1)> end
...(1)> end)
#PID<0.93.0>
iex(2)> send actor, {self(), "Hallo, ich bin Frederick"}
Received Hallo, ich bin Frederick from process #PID<0.84.0>
{#PID<0.84.0>, "Hallo, ich bin Frederick"}
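Including the sender's pid in the message, as in the session above, is what allows the receiver to answer. The following sketch extends the idea with a reply. The `{:reply, text}` shape, the upcasing, and the one-second timeout are assumptions made for the example.

```elixir
# The actor replies to whoever sent the message, using the pid in the tuple.
actor =
  spawn(fn ->
    receive do
      {sender, message} -> send(sender, {:reply, String.upcase(message)})
    end
  end)

send(actor, {self(), "hello"})

# Wait for the reply, but do not block forever if none arrives.
reply =
  receive do
    {:reply, text} -> text
  after
    # Assumption: one second is plenty for a local reply.
    1_000 -> :timeout
  end

IO.inspect(reply)
```

This actor handles a single message and then exits, just like the one in the session above.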
