Tips and best practices for migrating a legacy website into Drupal 7 with the Migrate API

drupal_articleLast year, we have migrated a site from a legacy CMS with a lot of custom code into Drupal 7. With the Migrate module (which is now in Drupal 8 core by the way), it was not really hard to do, even though the source database was not really well-designed. The site had around 10k users and 100k entries which ended up being nodes in Drupal, with around 700 videos to migrate. While this is still not a huge migration by any means, it was big enough to make us think about the best practices for Drupal migrations.

We have collected some tips and tricks if you need to do anything similar:

Building the right environment for the migration work

If you would like to work on a migration efficiently, you will need to plan a bit. There are some steps which can save you a lot of time down the road:

  • As always, a good IDE can help a lot. At NewPush, we are using PhpStorm – in that, keeping both the migration code, the source and the target databases opened is straightforward. Also, it is really easy to filter the database table displays looking for a given value which comes really handy. Anyway, any tool that you can use is fine, as long as you are able to quickly check the difference between the source and target versions of the data, which is essential.
  • I know this can be a bit of a pain sometimes, but still: try to ensure that the site can be built from scratch if necessary. This is where techniques like Features step in. I will not explain this in detail since it is outside the scope of the article. Just make sure you ask yourself the question: “what if I really mess up the configuration of this site?” (In a real world scenario you will often need to adjust things like content types, field settings etc. etc. and you will probably need a method for keeping the final configuration in a safe place.)
  • Before getting started, try to figure out a good way for validating the migration. This is kind of easy if you have 20-30 items to move over, but when you need to deal with 50k+ nodes, it is not going to be that trivial – especially if the source data is not very clean.

How to work with your migrations

The next thing to work on is optimizing your day-to-day work. Making sure that you can perform the basic tasks fast is essential, and you will need to figure out the best methods for testing and avoiding errors as well.

  • Use Drush as much as possible. It is faster and less error-prone than UI. There are a lot of handy parameters for migration work. Talking about the parameters, the –feedback and the –limit switches are really handy for quick testing. With the –idlist parameter, you can exactly specify what to import, which is great for checking edge cases.
  • Try to run the full migration from time to time. This is usually not very convenient in a development environment, so having access to another box where you can leave a migration running for hours can make things quite a bit easier.
  • Don’t forget to roll back a migration before modifying the migration code – it is better to avoid database inconsistency issues.

Migration coding tips

Talking about coding the migration itself, there are several things to consider, most of them are pretty basic coding best practices:

  • Try to extract the common functionality into a parent class. You will probably need some cleanup/convert routines; this is the best place for them.
  • Try to document your assumptions. Especially when dealing with not-so-clean data this can be really useful. (Will you remember why the row with ID 2182763 should be excluded from a given migration? I will not, so I always try to add a bunch of comments whenever I find such an edge case.)
  • Use the dd() function provided by Devel – this can make things easier to follow.
  • You will most likely run into the “Class MigrateSomething no longer exists” error at some time. Give it a drush migrate-deregister –orphans and it will go away.

How to optimize the performance of your migration code

It is no surprise that running a full migration often takes a very long time.

  • Other modules can greatly affect the speed of migration/rollback. Try to migrate with only a minimal set of modules to speed things up. This is the easiest way to get some extra speed. (And also if you are on a dev machine, make sure that Xdebug is not active. That is a “great” way to make things much slower.)
  • Using the migrate_instrument_*() functionality, you can figure out the bottlenecks in your migration.
  • In hook_migrate_api(), you can temporarily disable some hooks to speed up the migration.

I hope you have found some useful tips in this article. If you have a question, a tip to add, or perhaps an interesting story about a migration that you have done, just share them in the comments, we are all ears!


Loading of over 1 Trillion RDF Triples

Franz Inc., a leading supplier of Graph Database technology, with critical support from Stillwater SuperComputing Inc. and Intel, today announced it has achieved its goal of being the first to load and query a NoSQL database with a trillion RDF statements. RDF (also known as triples or quads), the cornerstone of the Semantic Web, provides a more flexible way to represent data than relational database and is at the heart of the W3C push for the Semantic Web.

Dr. Aasman point out: “NoSQL databases like Hadoop and Cassandra fail on joins. Big Enterprise, big web companies and big government intelligence organizations are all looking into big data to work with massive amounts of semi-unstructured data. They are finding that NoSQL databases are wonderful if one needs access to a single object in an ocean of billions of objects, however, they also find that the current NoSQL databases fall short if you need to run graph database operations that require many complicated joins. A typical example would be performing a social network analysis query on a large telecom call detail record database.”