Tips and best practices for migrating a legacy website into Drupal 7 with the Migrate API

Last year, we migrated a site from a legacy CMS with a lot of custom code into Drupal 7. With the Migrate module (which is now in Drupal 8 core, by the way), it was not really hard to do, even though the source database was not well designed. The site had around 10k users and 100k entries, which ended up being nodes in Drupal, with around 700 videos to migrate. While this is not a huge migration by any means, it was big enough to make us think about best practices for Drupal migrations.

We have collected some tips and tricks in case you need to do anything similar:

Building the right environment for the migration work

If you would like to work on a migration efficiently, you will need to plan a bit. There are some steps which can save you a lot of time down the road:

  • As always, a good IDE can help a lot. At NewPush, we use PhpStorm, which makes it straightforward to keep the migration code and the source and target databases open side by side. It is also really easy to filter the database table views for a given value, which comes in really handy. That said, any tool is fine as long as it lets you quickly compare the source and target versions of the data, which is essential.
  • I know this can be a bit of a pain sometimes, but still: try to ensure that the site can be built from scratch if necessary. This is where techniques like Features step in. I will not explain this in detail since it is outside the scope of the article. Just make sure you ask yourself the question: “what if I really mess up the configuration of this site?” (In a real-world scenario you will often need to adjust things like content types and field settings, and you will probably need a method for keeping the final configuration in a safe place.)
  • Before getting started, try to figure out a good way for validating the migration. This is kind of easy if you have 20-30 items to move over, but when you need to deal with 50k+ nodes, it is not going to be that trivial – especially if the source data is not very clean.
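
One simple way to approach validation, sketched here under assumptions: collect the IDs present in the source and the IDs that actually arrived in the target (for example from the map table the Migrate module keeps for each migration), then diff the two lists. The function name `example_find_missing_ids` and the sample IDs are hypothetical.

```php
<?php
/**
 * Returns the source IDs that have no counterpart in the target.
 *
 * Hypothetical helper: both arguments are plain arrays of IDs, however
 * you choose to collect them (SQL queries, CSV exports, and so on).
 */
function example_find_missing_ids(array $source_ids, array $migrated_ids) {
  // array_diff() keeps the values of the first array that are absent from
  // the second; array_values() reindexes the result from zero.
  return array_values(array_diff($source_ids, $migrated_ids));
}

// Made-up sample data: row 103 never reached the target.
$missing = example_find_missing_ids(array(101, 102, 103), array(101, 102));
echo count($missing) . " item(s) missing: " . implode(', ', $missing) . "\n";
```

A count comparison catches gross failures; a diff like this also tells you which rows to inspect.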

How to work with your migrations

The next thing to work on is optimizing your day-to-day work. Making sure that you can perform the basic tasks fast is essential, and you will need to figure out the best methods for testing and avoiding errors as well.

  • Use Drush as much as possible. It is faster and less error-prone than the UI. There are a lot of handy parameters for migration work: the --feedback and --limit switches are really handy for quick testing, and with the --idlist parameter you can specify exactly what to import, which is great for checking edge cases.
  • Try to run the full migration from time to time. This is usually not very convenient in a development environment, so having access to another box where you can leave a migration running for hours can make things quite a bit easier.
  • Don’t forget to roll back a migration before modifying the migration code – it is better to avoid database inconsistency issues.

Migration coding tips

Talking about coding the migration itself, there are several things to consider, most of which are pretty basic coding best practices:

  • Try to extract the common functionality into a parent class. You will probably need some cleanup/convert routines; this is the best place for them.
  • Try to document your assumptions. Especially when dealing with not-so-clean data this can be really useful. (Will you remember why the row with ID 2182763 should be excluded from a given migration? I will not, so I always try to add a bunch of comments whenever I find such an edge case.)
  • Use the dd() function provided by Devel – this can make things easier to follow.
  • You will most likely run into the “Class MigrateSomething no longer exists” error at some point. Give it a drush migrate-deregister --orphans and it will go away.
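
As a rough illustration of the parent-class tip, here is a minimal sketch. The class name `ExampleLegacyMigration` and both helper methods are hypothetical, and the small shim at the top exists only so the snippet runs outside Drupal; on a real site the Migrate module supplies `Migration`.

```php
<?php
// Standalone shim so this sketch runs outside Drupal; on a real site the
// Migrate module provides the Migration class and this block is skipped.
if (!class_exists('Migration')) {
  abstract class Migration {}
}

/**
 * Hypothetical parent class holding cleanup routines shared by migrations.
 */
abstract class ExampleLegacyMigration extends Migration {

  /**
   * Collapses runs of whitespace into single spaces and trims the result.
   */
  public static function cleanText($value) {
    return trim(preg_replace('/\s+/', ' ', (string) $value));
  }

  /**
   * Converts a legacy date string to a Unix timestamp, or NULL if unparsable.
   */
  public static function toTimestamp($value) {
    $timestamp = strtotime((string) $value);
    return $timestamp === FALSE ? NULL : $timestamp;
  }

}

echo ExampleLegacyMigration::cleanText("  Hello \n  world ") . "\n";
```

Each concrete migration would then extend this class and call, for instance, `self::cleanText()` from its `prepareRow()` method, so every edge-case comment lives next to the routine that handles it.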

How to optimize the performance of your migration code

It is no surprise that running a full migration often takes a very long time.

  • Other modules can greatly affect the speed of migration/rollback. Try to migrate with only a minimal set of modules to speed things up. This is the easiest way to get some extra speed. (And also if you are on a dev machine, make sure that Xdebug is not active. That is a “great” way to make things much slower.)
  • Using the migrate_instrument_*() functionality, you can figure out the bottlenecks in your migration.
  • In hook_migrate_api(), you can temporarily disable some hooks to speed up the migration.
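
The hook-disabling tip might look roughly like the sketch below. It assumes the `disable_hooks` key supported by recent 7.x-2.x releases of Migrate, so check the documentation for the version you run. The module name `example`, the migration `ExampleNode`, and the listed modules are placeholders.

```php
<?php
/**
 * Implements hook_migrate_api() for a hypothetical module named "example".
 */
function example_migrate_api() {
  return array(
    'api' => 2,
    'migrations' => array(
      'ExampleNode' => array(
        'class_name' => 'ExampleNodeMigration',
      ),
    ),
    // Assumption: skip these modules' hook implementations while migrating.
    // The module names are placeholders; list the ones that slow you down.
    'disable_hooks' => array(
      'node_insert' => array('pathauto', 'xmlsitemap'),
      'node_update' => array('pathauto', 'xmlsitemap'),
    ),
  );
}
```

Whatever you disable here does not run for migrated content, so plan a follow-up step (for example a bulk alias update) once the migration is done.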

I hope you have found some useful tips in this article. If you have a question, a tip to add, or perhaps an interesting story about a migration that you have done, share it in the comments; we are all ears!


WHMCS OpenSRS sync error for a domain

Problem: OpenSRS domain sync error

Rarely, for a domain under OpenSRS management, the following error shows up for each operation: “Details could not be retrieved for yourdomain.com. Error: Check your browser's encoding type, and modify it to reflect your password's encoding type. For support regarding yourdomain.com, please contact your reseller”

Solution: the OpenSRS domain password

Initially we looked at the troubleshooting guide, but couldn’t find the relevant info. The WHMCS support folks pointed us in the right direction:

This is a very rare error, it is caused by a character in the domain’s password field in the mod_opensrs table that OpenSRS’ API does not recognise.
To resolve the issue, simply change the domain’s password via the OpenSRS control panel to one that does not contain special characters, then using a tool such as phpMyAdmin or RazorSQL edit the domain’s record in the mod_opensrs table of the WHMCS database and enter the new password directly.
As always, please back up the database before making changes.
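
Before storing a new password, it can help to check it programmatically. The sketch below is only an illustration: it assumes that "special characters" means anything other than ASCII letters and digits, which the support reply does not spell out, and the function name is hypothetical.

```php
<?php
/**
 * Returns TRUE if the password consists only of ASCII letters and digits.
 *
 * Assumption: "special characters" means anything outside A-Z, a-z, 0-9.
 */
function example_password_is_plain($password) {
  // The D modifier stops "$" from matching before a trailing newline.
  return is_string($password) && preg_match('/^[A-Za-z0-9]+$/D', $password) === 1;
}

var_dump(example_password_is_plain('Abc123'));
```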


Online fax service with SSL API

Problem

You need to create an online application that is capable of sending a FAX securely (PCI, HIPAA or other compliance).

Solution

After trying TrustFax and eFax, neither of which has a secure API, Ralph found that MetroFax has an SSL API for developers and the cost is reasonable.

The following SDK as well as some supporting documentation below will help you get started: WsfSDK

The MetroFax webservice gateway is available at:

https://wsf.metrofax.com/webservice.asmx

And there is supporting documentation (NDoc) available below:

https://wsf.metrofax.com/doc

The attached SDK contains sample implementations of numerous common methods.


How to avoid the phpBB worm with Apache Rewrite Engine

This solution was suggested by Raymond Dijkxhoorn on BugTraq:

If you cannot fix it (virtual servers) fast for all your clients you could also try with
something like this:

        RewriteEngine On
        RewriteCond %{QUERY_STRING} ^(.*)echr(.*) [OR]
        RewriteCond %{QUERY_STRING} ^(.*)esystem(.*)
        RewriteRule ^.*$                                -               [F]

We had some vhosts where this worked just fine. On our systems we didnt see any valid
request with echr and esystem, just be gentle with it, it works for me, it could work
for you ;)
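
To see which requests such a rule would reject before enabling it, you could run logged query strings through an equivalent check. The sketch below mirrors the two RewriteCond patterns, which simply look for "echr" or "esystem" anywhere in the raw (not URL-decoded) query string. The function name and the sample strings are made up.

```php
<?php
/**
 * Mirrors the two RewriteCond patterns: TRUE if the raw query string
 * contains "echr" or "esystem" anywhere (case-sensitive, like the rule).
 */
function example_query_string_is_blocked($query_string) {
  return preg_match('/echr|esystem/', $query_string) === 1;
}

// Made-up sample query strings.
foreach (array('t=42&start=15', 'a=1&b=echr(0x27)') as $sample) {
  echo $sample . ' => ' . (example_query_string_is_blocked($sample) ? 'blocked' : 'allowed') . "\n";
}
```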


How to validate emails using PHP

Complete and thorough PHP email validation and verification can be found at PHPClasses.org: Email Validation:

<<This is a PHP class that attempts to validate a given e-mail address at three levels: matching the address against an RFC compliant regular expression, verifying the existence of the destination SMTP server by verifying the respective DNS MX record, and connecting to that server to see if the given address is accepted as a valid recipient. The class also features a debugging output option that lets you see the remote SMTP server connection and data exchange dialog to see the real cause why an apparently valid address may not be accepting messages>>

Here is the code for the class:

<?php
/*
* email_validation.php
*
* @(#) $Header: /home/mlemos/cvsroot/emailvalidation/email_validation.php,v 1.24 2008/12/28 07:29:35 mlemos Exp $
*
*/

class email_validation_class
{
  var $email_regular_expression="^([-!#$%&'*+./0-9=?A-Z^_`a-z{|}~])+@([-!#$%&'*+/0-9=?A-Z^_`a-z{|}~]+\.)+[a-zA-Z]{2,6}$";
  var $timeout=0;
  var $data_timeout=0;
  var $localhost="";
  var $localuser="";
  var $debug=0;
  var $html_debug=0;
  var $exclude_address="";
  var $getmxrr="GetMXRR";

  var $next_token="";
  var $preg;
  var $last_code="";

  Function Tokenize($string,$separator="")
  {
    if(!strcmp($separator,""))
    {
      $separator=$string;
      $string=$this->next_token;
    }
    for($character=0;$character<strlen($separator);$character++)
    {
      if(GetType($position=strpos($string,$separator[$character]))=="integer")
        $found=(IsSet($found) ? min($found,$position) : $position);
    }
    if(IsSet($found))
    {
      $this->next_token=substr($string,$found+1);
      return(substr($string,0,$found));
    }
    else
    {
      $this->next_token="";
      return($string);
    }
  }

  Function OutputDebug($message)
  {
    $message.="\n";
    if($this->html_debug)
      $message=str_replace("\n","<br />\n",HtmlEntities($message));
    echo $message;
    flush();
  }

  Function GetLine($connection)
  {
    for($line="";;)
    {
      if(@feof($connection))
        return(0);
      $line.=@fgets($connection,100);
      $length=strlen($line);
      if($length>=2
      && substr($line,$length-2,2)=="\r\n")
      {
        $line=substr($line,0,$length-2);
        if($this->debug)
          $this->OutputDebug("S $line");
        return($line);
      }
    }
  }

  Function PutLine($connection,$line)
  {
    if($this->debug)
      $this->OutputDebug("C $line");
    return(@fputs($connection,"$line\r\n"));
  }

  Function ValidateEmailAddress($email)
  {
    if(IsSet($this->preg))
    {
      if(strlen($this->preg))
        return(preg_match($this->preg,$email));
    }
    else
    {
      $this->preg=(function_exists("preg_match") ? "/".str_replace("/", "\/", $this->email_regular_expression)."/" : "");
      return($this->ValidateEmailAddress($email));
    }
    return(eregi($this->email_regular_expression,$email)!=0);
  }

  Function ValidateEmailHost($email,&$hosts)
  {
    if(!$this->ValidateEmailAddress($email))
      return(0);
    $user=$this->Tokenize($email,"@");
    $domain=$this->Tokenize("");
    $hosts=$weights=array();
    $getmxrr=$this->getmxrr;
    if(function_exists($getmxrr)
    && $getmxrr($domain,$hosts,$weights))
    {
      $mxhosts=array();
      for($host=0;$host<count($hosts);$host++)
        $mxhosts[$weights[$host]]=$hosts[$host];
      KSort($mxhosts);
      for(Reset($mxhosts),$host=0;$host<count($mxhosts);Next($mxhosts),$host++)
        $hosts[$host]=$mxhosts[Key($mxhosts)];
    }
    else
    {
      if(strcmp($ip=@gethostbyname($domain),$domain)
      && (strlen($this->exclude_address)==0
      || strcmp(@gethostbyname($this->exclude_address),$ip)))
        $hosts[]=$domain;
    }
    return(count($hosts)!=0);
  }

  Function VerifyResultLines($connection,$code)
  {
    while(($line=$this->GetLine($connection)))
    {
      $this->last_code=$this->Tokenize($line," -");
      if(strcmp($this->last_code,$code))
        return(0);
      if(!strcmp(substr($line, strlen($this->last_code), 1)," "))
        return(1);
    }
    return(-1);
  }

  Function ValidateEmailBox($email)
  {
    if(!$this->ValidateEmailHost($email,$hosts))
      return(0);
    if(!strcmp($localhost=$this->localhost,"")
    && !strcmp($localhost=getenv("SERVER_NAME"),"")
    && !strcmp($localhost=getenv("HOST"),""))
      $localhost="localhost";
    if(!strcmp($localuser=$this->localuser,"")
    && !strcmp($localuser=getenv("USERNAME"),"")
    && !strcmp($localuser=getenv("USER"),""))
      $localuser="root";
    for($host=0;$host<count($hosts);$host++)
    {
      $domain=$hosts[$host];
      /* preg_match() stands in for the original ereg() call, which was removed in PHP 7. */
      if(preg_match('/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/',$domain))
        $ip=$domain;
      else
      {
        if($this->debug)
          $this->OutputDebug("Resolving host name \"".$hosts[$host]."\"...");
        if(!strcmp($ip=@gethostbyname($domain),$domain))
        {
          if($this->debug)
            $this->OutputDebug("Could not resolve host name \"".$hosts[$host]."\".");
          continue;
        }
      }
      if(strlen($this->exclude_address)
      && !strcmp(@gethostbyname($this->exclude_address),$ip))
      {
        if($this->debug)
          $this->OutputDebug("Host address of \"".$hosts[$host]."\" is the exclude address");
        continue;
      }
      if($this->debug)
        $this->OutputDebug("Connecting to host address \"".$ip."\"...");
      if(($connection=($this->timeout ? @fsockopen($ip,25,$errno,$error,$this->timeout) : @fsockopen($ip,25))))
      {
        $timeout=($this->data_timeout ? $this->data_timeout : $this->timeout);
        if($timeout
        && function_exists("socket_set_timeout"))
          socket_set_timeout($connection,$timeout,0);
        if($this->debug)
          $this->OutputDebug("Connected.");
        if($this->VerifyResultLines($connection,"220")>0
        && $this->PutLine($connection,"HELO $localhost")
        && $this->VerifyResultLines($connection,"250")>0
        && $this->PutLine($connection,"MAIL FROM: <$localuser@$localhost>")
        && $this->VerifyResultLines($connection,"250")>0
        && $this->PutLine($connection,"RCPT TO: <$email>")
        && ($result=$this->VerifyResultLines($connection,"250"))>=0)
        {
          if($result)
          {
            if($this->PutLine($connection,"DATA"))
              $result=($this->VerifyResultLines($connection,"354")!=0);
          }
          else
          {
            if(strlen($this->last_code)
            && !strcmp($this->last_code[0],"4"))
              $result=-1;
          }
          if($this->debug)
            $this->OutputDebug("This host states that the address is ".($result ? ($result>0 ? "valid" : "undetermined") : "not valid").".");
          @fclose($connection);
          if($this->debug)
            $this->OutputDebug("Disconnected.");
          return($result);
        }
        if($this->debug)
          $this->OutputDebug("Unable to validate the address with this host.");
        @fclose($connection);
        if($this->debug)
          $this->OutputDebug("Disconnected.");
      }
      else
      {
        if($this->debug)
          $this->OutputDebug("Failed.");
      }
    }
    return(-1);
  }
};

?>

Here is the test code for implementing the class:

<?php
/*
* test_email_validation.html
*
* @(#) $Header: /home/mlemos/cvsroot/emailvalidation/test_email_validation.php,v 1.11 2003/12/12 15:25:52 mlemos Exp $
*
*/

?><HTML>
<HEAD>
<TITLE>Test for Manuel Lemos's PHP E-mail validation class</TITLE>
</HEAD>
<BODY>
<H1><CENTER>Test for Manuel Lemos's PHP E-mail validation class</CENTER></H1>
<HR>
<?php
require("email_validation.php");

$validator=new email_validation_class;

/*
* If you are running under Windows or any other platform that does not
* have enabled the MX resolution function GetMXRR() , you need to
* include code that emulates that function so the class knows which
* SMTP server it should connect to verify if the specified address is
* valid.
*/
if(!function_exists("GetMXRR"))
{
  /*
  * If possible specify in this array the address of at least one local
  * DNS that may be queried from your network.
  */
  $_NAMESERVERS=array();
  include("getmxrr.php");
}
/*
* If GetMXRR function is available but it is not functional, you may
* use a replacement function.
*/
/*
else
{
$_NAMESERVERS=array();
if(count($_NAMESERVERS)==0)
Unset($_NAMESERVERS);
include("rrcompat.php");
$validator->getmxrr="_getmxrr";
}
*/

/* how many seconds to wait before each attempt to connect to the
destination e-mail server */
$validator->timeout=10;

/* how many seconds to wait for data exchanged with the server.
set to a non zero value if the data timeout will be different
than the connection timeout. */
$validator->data_timeout=0;

/* user part of the e-mail address of the sending user
(info@phpclasses.org in this example) */
$validator->localuser="info";

/* domain part of the e-mail address of the sending user */
$validator->localhost="phpclasses.org";

/* Set to 1 if you want to output the dialog with the
destination mail server */
$validator->debug=1;

/* Set to 1 if you want the debug output to be formatted to be
displayed properly in a HTML page. */
$validator->html_debug=1;

/* When it is not possible to resolve the e-mail address of
destination server (MX record) eventually because the domain is
invalid, this class tries to resolve the domain address (A
record). If it fails, usually the resolver library assumes that
could be because the specified domain is just the subdomain
part. So, it appends the local default domain and tries to
resolve the resulting domain. It may happen that the local DNS
has an * for the A record, so any sub-domain is resolved to some
local IP address. This  prevents the class from figuring if the
specified e-mail address domain is valid. To avoid this problem,
just specify in this variable the local address that the
resolver library would return with gethostbyname() function for
invalid global domains that would be confused with valid local
domains. Here it can be either the domain name or its IP address. */
$validator->exclude_address="";

if(IsSet($_GET["email"]))
  $email=$_GET["email"];
if(IsSet($email)
&& strcmp($email,""))
{
  /* Escape the address before echoing it back into the page. */
  $safe_email=HtmlSpecialChars($email);
  if(($result=$validator->ValidateEmailBox($email))<0)
    echo "<H2><CENTER>It was not possible to determine if <TT>$safe_email</TT> is a valid deliverable e-mail box address.</CENTER></H2>\n";
  else
    echo "<H2><CENTER><TT>$safe_email</TT> is ".($result ? "" : "not ")."a valid deliverable e-mail box address.</CENTER></H2>\n";
}
else
{
  $port=(strcmp($port=getenv("SERVER_PORT"),"") ? intval($port) : 80);
  $site="http://".(strcmp($site=getenv("SERVER_NAME"),"") ? $site : "localhost").($port==80 ? "" : ":".$port).GetEnv("REQUEST_URI");
  echo "<H2>Access this page using a URL like: $site?email=<A HREF=\"$site?email=mlemos@acm.org\"><TT>your@test.email.here</TT></A></H2>\n";
}
?>
<HR>
</BODY>
</HTML>
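
If you only need the first of the three levels described above (a syntax check), PHP's built-in filter extension may be enough. The sketch below is a different, much simpler technique than the class, not a replacement for its DNS and SMTP checks. The function name is hypothetical.

```php
<?php
/**
 * Syntax-only check using PHP's built-in filter extension.
 *
 * This contacts neither DNS nor the SMTP server, so it covers only the
 * first of the three validation levels.
 */
function example_email_syntax_is_valid($email) {
  return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(example_email_syntax_is_valid('user@example.com'));
var_dump(example_email_syntax_is_valid('not-an-email'));
```

For the DNS level, PHP's `checkdnsrr()` function could be added on top, at the cost of a network lookup per address.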


What is PHP or ASP?

ASP, PHP, and JSP are all server-side scripting technologies that let you write dynamic websites. Dynamic websites contain business logic that performs functions such as handling a signup form or running a full-blown eCommerce application. They also contain presentation logic, which changes the look and feel of the site based on preferences set by either the customer or the webmaster.
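
To make the distinction concrete, here is a tiny PHP sketch with one function for each kind of logic. The function names, the greeting, and the "theme" preference are all invented for the illustration.

```php
<?php
// Business logic: decide what to say based on the visitor's input.
function example_greeting($name) {
  $name = trim((string) $name);
  return $name === '' ? 'Hello, guest!' : 'Hello, ' . $name . '!';
}

// Presentation logic: wrap the result in markup chosen by a preference,
// escaping the text so it is safe to embed in HTML.
function example_render_page($name, $theme = 'light') {
  $greeting = htmlspecialchars(example_greeting($name), ENT_QUOTES, 'UTF-8');
  $class = ($theme === 'dark' ? 'dark' : 'light');
  return '<body class="' . $class . '"><h1>' . $greeting . '</h1></body>';
}

echo example_render_page('Ann', 'dark') . "\n";
```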