Ruby - Reading Files


Working with files in Ruby is straightforward. The IO and File classes mix in the Enumerable module and define the each method it relies on. This allows us to iterate over a file object in the same manner as other collections. Assign the file to a variable to make it easy to work with. The variable name "file" will do just fine:

>> file = File.open('file.txt', 'r')

We open the file by calling the open method and passing in the file name. The second parameter is the mode in which we wish to open the file. The modes are:

Mode   Explanation
'r'    Read-only, start at beginning of file
'r+'   Read-Write, start at beginning of file
'w'    Write-only, truncates existing file or creates new file
'w+'   Read-Write, truncates existing file or creates new file
'a'    Write-only, start at end of file if it exists, otherwise creates
'a+'   Read-Write, start at end of file if it exists, otherwise creates
'b'    Binary file mode (a modifier combined with one of the modes above, e.g. 'rb')
't'    Text file mode (a modifier combined with one of the modes above, e.g. 'rt')
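The difference between the truncating and appending modes in the table can be seen in a short sketch. The method name compare_modes and the file name modes.txt are made up for illustration, and a temporary directory is used so that no real file is touched.

```ruby
require "tmpdir"

# Hypothetical helper: writes to `path` with three different modes and
# returns the file contents after the append step and after the second
# 'w' step.
def compare_modes(path)
  File.open(path, "w") { |f| f.puts "first" }   # create (or truncate) and write
  File.open(path, "a") { |f| f.puts "second" }  # append; "first" is kept
  after_append = File.read(path)

  File.open(path, "w") { |f| f.puts "third" }   # truncate; earlier lines are lost
  after_truncate = File.read(path)

  [after_append, after_truncate]
end

Dir.mktmpdir do |dir|
  after_append, after_truncate = compare_modes(File.join(dir, "modes.txt"))
  p after_append    # "first\nsecond\n"
  p after_truncate  # "third\n"
end
```

Opening with 'w' a second time discards what was there, which is why an appending mode is the more cautious choice when existing content matters.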

Although six modes of opening files seem complex at first, the choice depends on only two factors: operation and existence. What operation do you wish to perform (read, write, or both), and does the file exist yet or are you creating it? If you are only reading from a file then "r" will be satisfactory. If you intend to add content to a file then "a+" is the safest option. Using "a+" appends to the file if it exists, protecting you from overwriting any content. At this point our file has been opened and is saved to the variable 'file'. We can proceed with reading the file:

>> file.each_line {|line| puts line}
whaaaaaaaaaaaat
#<File:file.txt>
>> file.close
nil

To read out each line of our file we used the each_line method, followed by a simple code block. Ruby printed the one line of text in our file and then returned the file object. The last step is to close the file by calling close. Each time a file is opened in Ruby it remains open until it is closed, so it is important to remember to close files after working with them. Alternatively, a block can be passed to File.open. When the block ends the file closes automatically. Here's what that looks like:

>> File.open('file.txt') do |file|
  file.each_line {|line| puts line}
end
# the file closes when the block exits
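To check the closing behaviour for yourself, the closed? method reports whether a file handle is still open. The following is a minimal sketch: the helper name closing_demo is made up for illustration, and a temporary file stands in for file.txt.

```ruby
require "tempfile"

# Hypothetical helper: opens `path` once with a block and once without,
# and reports the state of each handle.
def closing_demo(path)
  handle = nil
  open_inside_block = nil

  File.open(path, "r") do |file|
    handle = file
    open_inside_block = !file.closed?   # still open while the block runs
  end
  closed_after_block = handle.closed?   # closed automatically by File.open

  manual = File.open(path, "r")         # no block: we must close it ourselves
  manual.close
  [open_inside_block, closed_after_block, manual.closed?]
end

Tempfile.create("closing-demo") do |tmp|
  tmp.write("whaaaaaaaaaaaat\n")
  tmp.flush
  p closing_demo(tmp.path)   # [true, true, true]
end
```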

Using the File class along with the open and close methods is one way of reading files. There are other options for accomplishing this as well, of course. By now you should be used to Ruby giving you multiple ways to do things. File is a subclass of Ruby's IO class, which is the basis for all input and output in Ruby. We can call the class method read on IO and pass in the file name:

>> IO.read('file.txt')
"whaaaaaaaaaaaat\n"

As you can see, this is a more convenient way of reading from a file. What took three lines before has been reduced to a single line of code.
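Because File inherits from IO, the same read call is also available as File.read. The sketch below checks the class relationship and compares the two calls. The helper name read_whole_file is made up for illustration, and a temporary file stands in for file.txt.

```ruby
require "tempfile"

# Hypothetical helper: returns the entire contents of `path` as one string.
def read_whole_file(path)
  File.read(path)
end

Tempfile.create("read-demo") do |tmp|
  tmp.write("whaaaaaaaaaaaat\n")
  tmp.flush

  p File.superclass                            # IO
  p File.include?(Enumerable)                  # true
  p read_whole_file(tmp.path)                  # "whaaaaaaaaaaaat\n"
  p IO.read(tmp.path) == File.read(tmp.path)   # true
end
```

Keep in mind that read loads the whole file into memory as a single string, so each_line may be a better fit for very large files.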

Comma Separated Values

It's a common practice to store data in lists where the values are separated by commas, i.e. CSV files. Knowing how to work with CSV files is a basic requirement for any developer. Eventually, most of the data we work with will be gathered from API calls. Extracting and organizing data from CSV files is a good place to start with that end in mind. Download the example data set, SAMPLE DATA, to follow along. The data consists of the names of 20 people with their associated phone numbers, emails, and ages.

Place the file in your working directory. For example, create a new directory from the command line with mkdir followed by the directory name ruby-files. Move the file into the directory either by dragging and dropping or with the mv command. The name of the file is 'Sample Data - Sheet1.csv', which has several spaces and capital letters, making it difficult to work with. Let's rename it to something that's easier to work with from Ruby; 'sample-data.csv' will work great. Right click and 'rename' the file, or use the mv command again with the syntax mv oldname newname. Finally, create a new Ruby file and open it up inside your favorite editor. Inside the new Ruby file we'll use the readlines method to read and print the file line by line. Copy the following code into your Ruby file.

lines = File.readlines "sample-data.csv"
lines.each do |line|
  puts line
end
# If you've saved your csv file under a different name you'll need to supply that name to 'readlines'.

This snippet is quite straightforward. We are utilizing the readlines method from the File class to store the lines of the file as an array in our variable 'lines'. Then we use the each method to iterate over each line and puts to print it to the screen. Save the Ruby file as 'readfile.rb' and from the command line run 'ruby readfile.rb'. (Note: Your terminal should be in the working directory, i.e. the 'ruby-files' directory if you've been following along.) The output to the terminal should be each line of the csv file. Here are the first three lines for reference:

first_name,last_name,phone,email,age
Robert,Warner,6103968877,[email protected],30
Samantha,Davis,5709984390,[email protected],31

We could continue to parse the results manually using methods such as split and each_with_index. The issue with this is that we would be reinventing the wheel, so to speak. Ruby provides this functionality in the csv library, which we can use once we load it into our program. So, at the top of the file place require "csv". This will load the csv library.

Why isn't the library included by default?

Ruby's core contains only what is needed for the most basic work with the language. Additional functionality, such as the standard library, can easily be loaded with a require statement. This gives Ruby a lighter memory footprint. Third-party libraries can be loaded in the same way. RubyGems is a package manager that allows developers to use third-party programs and libraries in a self-contained format called 'gems'. Gems are the Ruby equivalent of JavaScript's npm packages. The ability to extend the language is one of the core features of both Ruby and JavaScript.
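A quick way to see this in practice is to check for the CSV constant before and after the require statement. This is only a sketch: the helper name csv_available? is made up for illustration, and the first check assumes nothing else in your environment has already loaded the library.

```ruby
# Hypothetical helper: checks for the CSV constant without referencing it
# directly, so no NameError is raised when the library is not loaded.
def csv_available?
  Object.const_defined?(:CSV)
end

puts csv_available?   # usually false in a fresh Ruby process
require "csv"         # loads the library from disk
puts csv_available?   # true
```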


Once we require the csv library we no longer need to use the readlines method. We will make use of CSV's open method to load the file. Referring back to the first line of output, we see the first line contains header information (column descriptions as in name, email, etc.). We can provide additional parameters to indicate the presence of headers and convert those headers to symbols to make them easier to work with. We can refactor our code to this:

# bring in csv library
require "csv"
# open file and convert headers to symbols
contents = CSV.open "sample-data.csv", headers: true, header_converters: :symbol
# iterate over each row
contents.each do |row|
  # look up the first name by its header symbol and print it
  name = row[:first_name]
  puts name
end

After converting the headers to symbols we can access each field in a row by its header symbol. This provides a much easier way to print the first name than manually parsing the file ourselves. We can access the other columns in the same way:

require "csv"

contents = CSV.open "sample-data.csv", headers: true, header_converters: :symbol
contents.each do |row|
  name = row[:first_name]
  age = row[:age]
  puts "#{name} is #{age} years old"
end

Here are the first three rows of output for reference:

Robert is 30 years old
Samantha is 31 years old
Elliot is 19 years old
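Note that CSV.open without a block leaves the file open, just like File.open. One possible alternative is CSV.foreach, which takes the same header options and closes the file once every row has been visited. The sketch below is self-contained: the helper name describe_people is made up for illustration, and the temporary file holds two rows shaped like the sample data, with placeholder example.com email addresses.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: returns one "<name> is <age> years old" sentence
# per data row of the CSV file at `path`.
def describe_people(path)
  sentences = []
  CSV.foreach(path, headers: true, header_converters: :symbol) do |row|
    sentences << "#{row[:first_name]} is #{row[:age]} years old"
  end
  sentences
end

Tempfile.create(["sample-data", ".csv"]) do |tmp|
  # Placeholder rows; the email addresses are invented for this sketch.
  tmp.write(<<~CSV_DATA)
    first_name,last_name,phone,email,age
    Robert,Warner,6103968877,robert@example.com,30
    Samantha,Davis,5709984390,samantha@example.com,31
  CSV_DATA
  tmp.flush
  puts describe_people(tmp.path)
end
```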

The csv library has made extracting data from our file relatively simple. Remember, without the csv library we would need to use split to separate each line into an array of columns. Then we would need to tell Ruby to ignore the first line because it holds the header info. And finally we would specify that we want to print the column at index 0 for the first name and the column at index 4 for the age.
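For comparison, the manual steps just described might look roughly like this. It is a sketch only: the helper name first_names_and_ages is made up for illustration, and the input lines are placeholders shaped like the sample data, with invented example.com email addresses.

```ruby
# Hypothetical helper: takes an array of raw CSV lines (as returned by
# File.readlines) and returns [first_name, age] pairs.
def first_names_and_ages(lines)
  results = []
  lines.each_with_index do |line, index|
    next if index.zero?                   # skip the header line
    columns = line.chomp.split(",")       # separate the line into columns
    results << [columns[0], columns[4]]   # index 0: first name, index 4: age
  end
  results
end

sample_lines = [
  "first_name,last_name,phone,email,age\n",
  "Robert,Warner,6103968877,robert@example.com,30\n",
  "Samantha,Davis,5709984390,samantha@example.com,31\n"
]

first_names_and_ages(sample_lines).each do |name, age|
  puts "#{name} is #{age} years old"
end
```

A plain split on commas also breaks as soon as a quoted field contains a comma, which is one more reason to prefer the csv library.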

These examples demonstrate working with a CSV file and are intentionally simple. However, if you're developing something more complicated it's a good idea to check for an existing library or gem before deciding to build your own. Chances are somebody else has had a similar issue and a solution is readily available. Read more about the CSV library or check out some of the awesome gems available at RubyGems.
