Collections in Elixir (data types continued)
Elixir provides several collection data structures that are well suited to functional programming. In this section we will cover char lists, binaries, tuples, lists, keyword lists, maps, MapSets and structs. Some of these data types were introduced in the previous chapter. Let's build on what we've learned so far to gain a more comprehensive understanding of Elixir collections.
Char Lists
Char lists are literally lists of characters. A few examples in iex demonstrate how they work.
# note the single quotes
iex(1)> is_list('banana')
true
# note the double quotes
iex(2)> is_binary("banana")
true
If we enclose the word "banana" in single quotes we have a list of characters. If we enclose it in double quotes we have a UTF-8 encoded binary (a string). Recall from the section Data Types that strings are surrounded by double quotes. Each character in a char list corresponds to a code point assigned by the Unicode standard. We can check the code point for a character by placing a question mark, ?, directly in front of it.
iex(3)> ?b
98
iex(4)> ?a
97
So, the code point for the letter "b" is 98 and for "a" it's 97. Let's inspect further to gain a better understanding of what's going on behind the scenes.
iex(5)> i 'banana'
Term
'banana'
Data type
List
Description
This is a list of integers that is printed as a sequence of characters
delimited by single quotes because all the integers in it represent valid
ASCII characters. Conventionally, such lists of integers are referred to as
"charlists" (more precisely, a charlist is a list of Unicode codepoints,
and ASCII is a subset of Unicode).
Raw representation
[98, 97, 110, 97, 110, 97]
Reference modules
List
Implemented protocols
IEx.Info, Collectable, Enumerable, Inspect, List.Chars, String.Chars
There you have it. With single quotes, 'banana' is simply a list of code points, printed as characters for readability. To be specific, the code points are [98, 97, 110, 97, 110, 97]. Char lists are primarily used when interfacing with Erlang or with any library that does not accept binaries as arguments.
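As a minimal sketch of moving between the two representations, the snippet below converts a string to a char list and back. String.to_charlist/1 and List.to_string/1 are standard library functions that the text above does not mention; they are used here only for illustration.

```elixir
# Convert a string (binary) into a char list.
chars = String.to_charlist("banana")

# Force IO.inspect to show the raw integers instead of characters.
IO.inspect(chars, charlists: :as_lists)
# prints [98, 97, 110, 97, 110, 97]

# The ? syntax yields the same code points.
IO.inspect(chars == [?b, ?a, ?n, ?a, ?n, ?a])
# prints true

# Convert back to a string, for example after receiving a char list from Erlang.
word = List.to_string(chars)
IO.inspect(word)
# prints "banana"
IO.inspect(is_binary(word))
# prints true
```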
Binaries
Binaries are useful when accessing data as a sequence of bytes. In Elixir, binary literals are enclosed in double angle brackets, << >>.
iex(1)> box = <<1, 2, 3, 4, 255>>
<<1, 2, 3, 4, 255>>
iex(2)> byte_size box
5
iex(3)> bit_size box
40
Most users, at least initially, will have little use for binaries. They are sometimes required when working with network packets or media files. At this point, it's only important to understand that they exist and be familiar with the literal syntax.
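To make the literal syntax a little more concrete, here is a small sketch that takes the box binary from the session above apart with pattern matching. The rest::binary modifier is standard Elixir syntax not covered in the text; the variable names first and rest are arbitrary.

```elixir
box = <<1, 2, 3, 4, 255>>

# Match the first byte and keep the remaining bytes as a binary.
<<first, rest::binary>> = box
IO.inspect(first)
# prints 1
IO.inspect(rest)
# prints <<2, 3, 4, 255>>

# Strings are binaries too: these two bytes are the code points for "b" and "a".
IO.inspect(<<98, 97>> == "ba")
# prints true
IO.inspect(byte_size("banana"))
# prints 6
```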
Tuples
Tuples are ordered collections of elements. They can hold any number of elements, but by convention they usually hold only two or three, enclosed in curly brackets and separated by commas. For example, { :ok, data } is a two-element tuple made up of the atom :ok and the variable data. Tuples serve an incredibly important purpose in Elixir in terms of pattern matching.
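The following sketch shows the kind of pattern matching that tuples make possible. The anonymous function parse is hypothetical and exists only to produce {:ok, data} and {:error, reason} tuples; Integer.parse/1, elem/2 and tuple_size/1 are standard functions.

```elixir
# Hypothetical helper: wrap the result of parsing in a tagged tuple.
parse = fn text ->
  case Integer.parse(text) do
    {number, ""} -> {:ok, number}
    _ -> {:error, :not_a_number}
  end
end

# Pattern matching binds data only if the first element is :ok.
{:ok, data} = parse.("42")
IO.inspect(data)
# prints 42

# case picks a branch based on the shape of the tuple.
case parse.("abc") do
  {:ok, n} -> IO.puts("got #{n}")
  {:error, reason} -> IO.inspect(reason)
end
# prints :not_a_number

# Elements can also be read by zero-based index.
IO.inspect(tuple_size({:ok, data}))
# prints 2
IO.inspect(elem({:ok, data}, 0))
# prints :ok
```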
Lists
The figure below represents an Elixir linked list. The list is easily recognizable by its square brackets, [ ], and consists of five elements separated by commas. In Elixir, the head of a list refers to the first element. There is a special function, hd, which returns the head of a list. The tail of a list refers to the list minus the head. There is a special function, tl, which returns the tail of a list.
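A short sketch of hd and tl in action follows. The five-element list used here is an assumed stand-in and not necessarily the list shown in the figure; the [head | tail] syntax is standard Elixir and expresses the same split as a pattern.

```elixir
# An assumed five-element list for illustration.
list = [1, 2, 3, 4, 5]

IO.inspect(hd(list))
# prints 1
IO.inspect(tl(list))
# prints [2, 3, 4, 5]

# The same split written as a pattern match.
[head | tail] = list
IO.inspect(head == hd(list) and tail == tl(list))
# prints true

# Prepending builds a new list whose tail is the old list.
new_list = [0 | list]
IO.inspect(new_list)
# prints [0, 1, 2, 3, 4, 5]
```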
Keyword Lists
Lists are a central part of Elixir. A keyword list is an associative data structure built on a list: each entry is a two-element tuple whose first item is a key, given as an atom. The syntax for defining a keyword list is [key: value]. Internally each entry is stored as a tuple, {:key, value}. We can check this by starting a session in iex:
# Create a list of tuples
iex(1)> list = [{:key, "value"}, {:this, "that"}]
[key: "value", this: "that"]
# notice the return value from elixir in the form [key: value]
iex(2)> list == [key: "value", this: "that"]
true
Keyword lists have special characteristics that make them especially useful:
- Keys must be atoms
- Keys are ordered
- Keys can be repeated
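The three characteristics above can be observed with a small sketch. The opts list is a made-up example, and the Keyword module functions used here are standard library calls that the text does not otherwise introduce.

```elixir
# A made-up keyword list with a repeated key.
opts = [color: "red", size: 10, color: "blue"]

# Lookups return the first matching entry.
IO.inspect(Keyword.get(opts, :color))
# prints "red"
IO.inspect(opts[:size])
# prints 10

# Repeated keys are all kept, in insertion order.
IO.inspect(Keyword.get_values(opts, :color))
# prints ["red", "blue"]
IO.inspect(Keyword.keys(opts))
# prints [:color, :size, :color]

# Each entry really is a two-element tuple.
IO.inspect(hd(opts))
# prints {:color, "red"}
```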
Maps
Maps are similar to keyword lists except that they are unordered, each key can appear only once, and any data type can act as a key. Maps are created with the syntax %{}. Thus, a map literal would look like the following:
iex(1)> states = %{ "NY" => "New York", "PA" => "Pennsylvania" }
%{"NY" => "New York", "PA" => "Pennsylvania"}
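Here is a brief sketch of reading and updating the states map. Map.get/3, Map.put/3 and map_size/1 are standard functions; the mixed map and the "TX" lookup are invented for illustration.

```elixir
states = %{"NY" => "New York", "PA" => "Pennsylvania"}

# Access by key; a missing key can fall back to a default.
IO.inspect(states["NY"])
# prints "New York"
IO.inspect(Map.get(states, "TX", "unknown"))
# prints "unknown"

# Putting an existing key replaces its value, so the size is unchanged.
states = Map.put(states, "NY", "New York State")
IO.inspect(map_size(states))
# prints 2

# Keys may be of any type; atom keys also allow dot access.
mixed = %{:atom_key => 1, 2 => "two", {1, 2} => :tuple_key}
IO.inspect(mixed.atom_key)
# prints 1
IO.inspect(mixed[{1, 2}])
# prints :tuple_key

# Maps can be pattern matched on a subset of their keys.
%{"PA" => name} = states
IO.inspect(name)
# prints "Pennsylvania"
```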
MapSets
The Elixir documentation refers to MapSet as the "go to" set data structure in Elixir. A set can contain elements of any kind but, by definition, cannot contain duplicates. Inserting an element that is already present is simply a no-op: the set is returned unchanged. An empty MapSet can be constructed using MapSet.new/0.
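The sketch below illustrates the duplicate handling described above. Besides MapSet.new/0, it uses MapSet.put/2, MapSet.member?/2, MapSet.size/1 and MapSet.new/1 (which builds a set from an existing enumerable); these are standard functions not named in the text.

```elixir
# Start with an empty set and add elements; the second :hearts is ignored.
set =
  MapSet.new()
  |> MapSet.put(:hearts)
  |> MapSet.put(:spades)
  |> MapSet.put(:hearts)

IO.inspect(MapSet.size(set))
# prints 2
IO.inspect(MapSet.member?(set, :spades))
# prints true

# Building a set from a list drops the duplicates.
from_list = MapSet.new([1, 2, 2, 3, 3, 3])
IO.inspect(from_list |> MapSet.to_list() |> Enum.sort())
# prints [1, 2, 3]
```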
Structs
Structs are defined using defstruct, followed by a list of field names or a keyword list of fields with default values. They must be defined inside of a module, and the struct takes its name from the module it's defined in. For example:
defmodule Card do
  defstruct [:value, :suit]
end
# Example: %Card{value: 8, suit: "Hearts"}
In the example above, the Card struct is defined with two fields, "suit" and "value". You may notice that a struct (short for structure) looks remarkably similar to a Map. This is because a struct is simply an extension of a map. The benefit of using structs is compile-time checking to ensure only the fields defined in the struct are used.
For example, let's say we try to create a Card struct with an age field, which the struct does not define, like this:
%Card{value: 8, suit: "Hearts", age: 1}
Elixir will raise an error at compile time (a KeyError reporting that the key :age was not found in the struct), telling us that our Card struct is malformed. We'll get the same KeyError if we spell one of the fields incorrectly, "suite" instead of "suit", for example. Structs ensure that our data adheres to a specific definition.
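The compile-time check cannot be demonstrated in a script that also has to run, so the sketch below uses struct!/2, a standard Kernel function not mentioned in the text, which applies the same field check at runtime and raises a KeyError for unknown keys. The card values are made up for illustration.

```elixir
defmodule Card do
  defstruct [:value, :suit]
end

# A valid struct behaves like a map with a fixed set of keys.
card = %Card{value: 8, suit: "Hearts"}
IO.inspect(card.suit)
# prints "Hearts"
IO.inspect(is_map(card))
# prints true

# The update syntax only accepts fields that already exist.
updated = %{card | value: 9}
IO.inspect(updated.value)
# prints 9

# struct!/2 rejects the undefined :age field at runtime.
result =
  try do
    struct!(Card, value: 8, suit: "Hearts", age: 1)
  rescue
    e in KeyError -> {:error, e.key}
  end

IO.inspect(result)
# prints {:error, :age}
```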