Tuesday, September 21, 2010

Huge speed gain migrating large tables in Rails

Have you ever needed to make multiple column adds/removes/changes to a large table in Rails in a single migration? If so, you've probably noticed that each of these individual changes issues a single ALTER TABLE command to the database, which can take a long time on a large table. At least with MySQL, combining these all into a single ALTER TABLE dramatically reduces the amount of time it takes for your migration to run.

Rather than write custom SQL for this, the folks at XING have created a very cool Rails plugin called alter_table. Their blog post, Alter Table Rails Plugin, gives more background on this. Check it out, it's a good, quick read.

One thing they didn't mention in their post was what kind of performance gains they observed. I was curious, so I did a quick-and-dirty experiment.

Consider an existing table named "songs" that has 330K records in it. First, I tried just adding a column named "foo" using the normal migration style we're all accustomed to:

class AddSongFoo < ActiveRecord::Migration
def self.up
add_column :songs, :foo, :decimal, :precision => 15, :scale => 10
add_index :songs, :foo
end
def self.down
# ...
end
end
view raw gistfile1.rb hosted with ❤ by GitHub

On my MacBook Pro, this is what the timings for this looked like:

scottwb-mbp% rake db:migrate
(in /Users/scottwb/src/testapp)
== AddSongFoo: migrating =================================================
-- add_column(:songs, :foo, :decimal, {:precision=>15, :scale=>10})
-> 445.2030s
-- add_index(:songs, :foo)
-> 508.5548s
== AddSongFoo: migrated (953.7581s) ======================================
view raw gistfile1.txt hosted with ❤ by GitHub

Now, after rolling that back, installing the alter_table plugin (as per their instructions), I rewrote this migration to use the new alter_table method:

class AddSongFoo < ActiveRecord::Migration
def self.up
alter_table :songs do |t|
t.add_column :foo, :decimal, :precision => 15, :scale => 10
t.add_index :foo
end
end
def self.down
# ...
end
end
view raw gistfile1.rb hosted with ❤ by GitHub

Now, on the same machine, this migration takes roughly half the time because it does all the alterations in a single pass. Here are the new timings:

scottwb-mbp% rake db:migrate
(in /Users/scottwb/src/testapp)
== AddSongFoo: migrating =================================================
-- execute("ALTER TABLE `songs` ADD `foo` decimal(15,10), ADD INDEX `index_songs_on_foo` (`foo`)")
-> 509.8871s
== AddSongFoo: migrated (509.8876s) ======================================
view raw gistfile1.txt hosted with ❤ by GitHub

In real life, I had a table with almost 10 million rows that I needed to do 10 alterations to. Using this plugin, I cut my migration time down by almost 10x! If you find yourself doing more than one alteration on the same table in a single migration, I highly recommend checking out alter_table. You can find their source code at:


No comments: