I had a huge MySQL dump that took forever (as in: days) to import, while I actually just wanted to have the full database structure with some data to use on my development machine.
After trying several suggestions on how to speed up slow MySQL dump imports (none of which resulted in any significant improvement), I chose to import just some rows per table, which was enough for my needs. Since editing the file was not an option, I used a short Ruby script to manage that.
Here is how:
pv huge.dump | ruby -e 'ARGF.each_line { |l| m = l.match(/^INSERT INTO \`.+\` .+ VALUES \((\d+),/); puts l if !m || m[1].to_i < 200_000 || l =~ /schema_migrations/ }' | mysql -uUSER -pSECRET DATABASE_NAME
The command above does the following:
- It sends huge.dump to stdout. You could do that with cat, but I chose to use pv for a nice progress bar.
- The output is piped into a Ruby script which:
  - Checks, line by line, if the current line is an INSERT statement.
  - If it is not, it's printed (to stdout).
  - If it is, we only print it when it inserts an ID smaller than 200'000 (IDs are always the first column, so we can check against "VALUES (\d+,").
  - If it includes any mention of schema_migrations we also print it (because we want them all).
- The result of the Ruby script is piped into the mysql client. Replace USER and SECRET with your database credentials, and DATABASE_NAME with the database you are going to import into.
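For readability, the filtering logic of the one-liner can be sketched as a small script. This is only an illustration: the names INSERT_WITH_ID, keep_line? and filter_dump are made up for this sketch, and the sample dump is invented. Note that the pattern expects some text (such as a column list) between the table name and VALUES, so the sample uses INSERT statements with column lists.

```ruby
require "stringio"

# Same pattern as in the one-liner: matches INSERT statements and captures
# the first value of the first row, which is assumed to be the ID.
INSERT_WITH_ID = /^INSERT INTO `.+` .+ VALUES \((\d+),/

# Hypothetical helper: decides whether a line of the dump is passed on.
def keep_line?(line, max_id: 200_000)
  match = line.match(INSERT_WITH_ID)
  return true if match.nil?                          # not a matching INSERT: keep
  return true if line.include?("schema_migrations")  # keep all migrations
  match[1].to_i < max_id                             # keep only "small" IDs
end

# Hypothetical helper: copies all kept lines from one IO to another.
def filter_dump(input, output, max_id: 200_000)
  input.each_line do |line|
    output.puts(line) if keep_line?(line, max_id: max_id)
  end
end

# Invented sample dump, just to show the behavior.
sample = StringIO.new(<<~'SQL')
  CREATE TABLE `users` (`id` int, `name` varchar(255));
  INSERT INTO `users` (`id`, `name`) VALUES (1,'Alice'),(2,'Bob');
  INSERT INTO `users` (`id`, `name`) VALUES (250000,'Carol');
  INSERT INTO `schema_migrations` (`version`) VALUES ('20140115104400');
SQL

# Prints every line except the INSERT that starts with ID 250000.
filter_dump(sample, $stdout)
```

To use something like this on a real dump, you would call filter_dump(ARGF, $stdout) instead of the sample and pipe the dump through the script, just like with the one-liner.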
Note the following:
- This is far from perfect.
- I assume a significant amount of time is spent in Ruby. This could probably be improved by using tools such as sed and awk, but I did not want to go down that road.
- Because of encoding issues this can break the imported data in many ways.
- It also would not work as expected for tables that are missing an id column (which you shouldn't do), or where that column is not the first.
- Dumps usually insert records in batches, so we check only for the first ID that is inserted per batch.
- Records in the database may very well not match up and be considered invalid, just because associated records (with higher IDs) might be missing, or similar.
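To illustrate the point about batches: the following sketch uses an invented batched INSERT statement whose first ID is just below the limit. Only that first ID is checked, so the rows with higher IDs are imported as well. The scan used to list all IDs is deliberately naive and only works for this simple sample (it would be fooled by string values that contain something like "(123,").

```ruby
# Same pattern as in the one-liner: captures only the first ID of a statement.
pattern = /^INSERT INTO `.+` .+ VALUES \((\d+),/

# Invented batch that starts just below the limit of 200_000.
batch = "INSERT INTO `users` (`id`, `name`) VALUES (199999,'a'),(200000,'b'),(200001,'c');"

# The ID that the filter actually looks at.
first_id = batch.match(pattern)[1].to_i

# Naive scan for the leading number of every row in the batch.
all_ids = batch.scan(/\((\d+),/).flatten.map(&:to_i)

puts "ID checked by the filter: #{first_id}"
puts "IDs imported with this statement: #{all_ids.inspect}"
```

So the limit of 200'000 is only approximate; how far a table overshoots depends on how many rows the dump puts into one INSERT statement.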
Posted by Arne Hartherz to makandra dev (2014-01-15 10:44)