Friday, October 5, 2007

Copy model data between databases

Update: I have modified the task to use save_with_valdation(false) and made it available
as a download here.

------------------


This is an update on the post I made yesterday about how easy it is to copy model data between production and development databases.

The problem with the approach mentioned here is that the id of the original model data is not preserved. As a result, if you copy models with relationships defined using foreign keys, the relationships are not maintained.

I found that instead of using [model].create, I can use [model].new and then set the id explicitly.
model_data.each do |m|
new = model_data.new m.attributes
new.id = m.id
new.save!
end
I wrote up a rake task to do something like
rake db:cloneModelData MODEL=User,Account BATCH=Y
And the task is
namespace :db do
desc "Clone the data associated with a model from the production database to the development database.\n\n" +
" # If the development database contains any data related to the model, it is purged before the copy is made.\n" +
" # Specify the model class name on the command line as MODEL=X,Y,Z (no spaces)\n" +
" # Source database as SOURCE=db (production by default)\n" +
" # Destination database as DESTINATION=db (development by default)\n\n" +
" # To enable batch mode BATCH=Y (By default, the task queries the user to view the yaml and \n" +
" # write it to a file (just in case-call me chicken if you wish)\n" +
" # e.g. rake db:cloneModelData MODEL=User,Account,Post BATCH=Y\n"
task :cloneModelData => :environment do

batchmode = ENV['BATCH'] ? ENV['BATCH'] =~ /^y$/i : false
sourcedb = ENV['SOURCE'] || 'production'
puts "Connecting to source database '#{sourcedb}'"
ActiveRecord::Base.establish_connection(sourcedb)

unless ENV['MODEL']
print "Please specify the model class name(s) seperated by commas: "
model_names = $stdin.gets.strip
else
model_names = ENV['MODEL'].strip
end

data = {}
model_names.split(/,/).each do |model_name|

begin
klass = Kernel.const_get model_name
rescue
raise "Hmm unable to get a Class object corresponding to #{model_name}. Check if you specified the correct model name: #{$!}"
end

unless klass.superclass == ActiveRecord::Base
raise "#{model_name}.superclass (#{klass.superclass}) is not ActiveRecord::Base"
end

puts "\nCollecting data for Model Class - #{model_name}"

# Collect all the data from te production database
data[model_name] = klass.find(:all)
puts "Found #{data[model_name].length} records in the #{sourcedb} database"

unless batchmode
print "\nDo you want to see a yaml dump? [y|N]: "
if $stdin.gets.strip =~ /^y/i
puts data[model_name].to_yaml
end
print "\nDo you want to direct the yaml dump to a file? [Y|n]: "
unless $stdin.gets.strip =~ /^n/i
print "\nPlease specify the file name : "
File.open($stdin.gets.strip, 'w') do |f|
f.write data[model_name].to_yaml
end
end
end

end

destdb = ENV['DESTINATION'] || 'development'
puts "\n--------------------------------------\n\nEstablising connection with the '#{destdb}' database\n\n"

ActiveRecord::Base.establish_connection(destdb)

puts "Now deleting model related data from the #{destdb} db\n"

data.keys.each do |model_name|

klass = Kernel.const_get model_name

olddata = klass.find(:all)

puts "\nFound #{olddata.length} #{model_name} records in the #{destdb} database"
unless batchmode
print "Do you want to see a yaml dump? [y|N]: "
if $stdin.gets.strip =~ /^y/i
puts olddata.to_yaml
end
end

puts "Deleting old data"
olddata.each {|f| f.destroy}

puts "Now copying the #{sourcedb} model data into the #{destdb} database"
print "Copying #{model_name} records: "
i = 0
data[model_name].each do |d|
begin
print "#{ i+=1}, "

obj = klass.new d.attributes
obj.id = d.id
obj.save!

#klass.reflect_on_all_associations.each { |a|
# puts classname + " -> " + a.name.to_s.camelize.singularize + " [label="+a.macro.to_s+"]"
#}

rescue
raise "Error saving model data: #{$!}"
end
end
puts "\nDone copying #{model_name}\n"
end
puts "\nDone."

end
end

Njoy!

8 comments:

Alex said...

This only works on databases that are local to each other (on the same computer/network) correct?

- Chetan Patil said...

It depends on whatever is set up in your rails config/database.yml for the production or development databases (which can be set up on on seperate servers). It uses ActiveRecord::Base.establish_connection which reads the database.yml to get the connection details.
- c

Alex said...

But isn't each setting in database.yml relative to the current environment(machine)? If so, why would I have a remove host for my database?

- Chetan Patil said...

Not sure I completely understand your question.

The database can reside on another machine or behind a proxy for a cluster of databases.

Take a look at how I set up rails for connecting with an Oracle database here - http://justbarebones.blogspot.com/2007/
09/ror-oracle-sqlplus-and-life.html.
The database lives on a remote server while my rails stack is running on a vmware instance with 256 MB ram.

You would use a similar configuration to connect to a mysql database on a remote server.

Of course, using the Unix domain sockets is much faster than using TCP/IP sockets, but I can then share the same database between multiple rails stacks as well as keep the overall cost of the machine that runs the rails stack down.

Giovanni said...

This is a really GREAT piece of code. It made it quite easy to migrate data from MySQL to SQLite.

To have it working in my case, however, I had to specify the order in which models are processed, due to the dependencies among them.

One model is persisted only when all the models it references have already been persisted.

To achieve that, I just had to replace line 65

data.keys.each do |model_name|

with

model_names.split(/,/).each do |model_name|

When executing the task, I obviously displayed the models in such an order that the most dependent ones appear in the end, so that their dependencies are persisted before them:

rake db:cloneModelData MODEL=Address,Department,Employee BATCH=Y\n

Zoran said...

This is very useful! Unfortunately it doesn't work for globalized models.

Hint: don't forget to set your current sequence ID of every table after running rake...

klochner said...

Thanks - I found this useful for restoring (accidentally) deleted data from a backup database image.

BP said...

Very helpful. I have never seen that trick of changing db connections on the fly.

I wanted to preserve id's across databases so I ended up using new and assigning id's. I found I had to remove the id from the attributes otherwise I got an error. What I ended up with was:

objs.each {|o|
atts = o.attributes
atts.delete("id")
new_o=Model.new atts
new_o.id =o.id
new_o.save_with_validation false }