I had a huge MySQL dump that took forever (as in: days) to import, while I actually just wanted to have the full database structure with some data to use on my development machine.
After trying several suggestions on how to speed up slow MySQL dump imports (none of which brought any significant improvement), I chose to import only some rows per table, which was enough for my needs. Since editing the file itself was not an option, I used a short Ruby script to do the filtering.
Here is how:
```bash
pv huge.dump | ruby -e 'ARGF.each_line { |l| m = l.match(/^INSERT INTO \`.+\` .+ VALUES \((\d+),/); puts l if !m || m[1].to_i < 200_000 || l =~ /schema_migrations/ }' | mysql -uUSER -pSECRET DATABASE_NAME
```
The command above does the following:
- It sends `huge.dump` to stdout. You could do that with `cat`, but I chose to use `pv` for a nice progress bar.
- The output is piped into a Ruby script (a standalone version is shown below) which:
  - Checks, line by line, if the current line is an `INSERT` statement.
  - If it is not, it's printed (to stdout).
  - If it is, we only print it when it inserts an ID smaller than 200'000 (IDs are always the first column, so we can check against "`VALUES (\d+,`").
  - If it includes any mention of `schema_migrations`, we also print it (because we want them all).
- The result of the Ruby script is piped into the mysql client. Replace `USER` and `SECRET` with your database credentials, and `DATABASE_NAME` with the database you are going to import into.
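For readability, here is the inline Ruby part as a standalone script. This is only a sketch of the same logic; the file name `filter_dump.rb` and the 200'000 cutoff are examples you would adapt to your own dump.

```ruby
# filter_dump.rb -- the inline filter from the command above, spelled out.
# Reads a MySQL dump from stdin (or from a file argument) and prints only
# the lines we want to import.

MAX_ID = 200_000 # example cutoff, adjust as needed

ARGF.each_line do |line|
  match = line.match(/^INSERT INTO \`.+\` .+ VALUES \((\d+),/)

  keep = match.nil? ||                   # not an INSERT (schema etc.): keep
         match[1].to_i < MAX_ID ||       # batch starts with a small ID: keep
         line =~ /schema_migrations/     # schema_migrations rows: always keep

  puts line if keep
end
```

You would then call it in place of the `ruby -e` part, e.g. `pv huge.dump | ruby filter_dump.rb | mysql -uUSER -pSECRET DATABASE_NAME`.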
Note the following:
- This is far from perfect.
- I assume a significant amount of time is spent in Ruby. This could probably be improved by using tools such as `sed` and `awk`, but I did not want to go down that road.
- Because of encoding issues this can break the imported data in many ways.
- It also would not work as expected for tables that are missing an `id` column (which you shouldn't do), or where that column is not the first.
- Dumps usually insert records in batches, so we only check the first ID that is inserted per batch (see the example below).
- Records in the database may very well not match up and be considered invalid, just because associated records (with higher IDs) might be missing, or similar.
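To illustrate the batching caveat: here is a made-up example of what a batched `INSERT` might look like, and how the filter above only ever sees its first ID. The table name and values are invented for demonstration.

```ruby
# A made-up batched INSERT similar to what mysqldump produces.
# Only the first ID is inspected, so this whole line is kept even though
# it also inserts records far above the 200'000 cutoff.
line = "INSERT INTO `users` (`id`, `name`) VALUES (1,'Alice'),(250000,'Bob'),(400000,'Carol');"

match = line.match(/^INSERT INTO \`.+\` .+ VALUES \((\d+),/)
puts match[1]                 # => 1
puts match[1].to_i < 200_000  # => true -- the entire batch passes the filter
```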
Posted by Arne Hartherz to makandra dev (2014-01-15 10:44)