Jets in the sky

Last week I shared the first steps for merging our multiple webs into the one you’re seeing right now. In that post I explained how I exported the databases of the three source sites, how I cleaned them so that they only contain relevant tables, and how I loaded all those tables in the same database without any collisions.

The database I used for the merge was the following:

Tables ready before the migration.
Tables ready before the migration.

As you can see, there were four blocks: three of them corresponded to our previous websites, which were WPrincipiante, Nelio A/B Testing, and (the old version of) Nelio Software. You can easily spot them because of the prefixes we used. The fourth block contained the tables we used in the final multi-site installation.

Today, I describe the mapping we defined between source and destination tables, so that we could move content from one end to the other properly. Even though this seems an easy step, it’s actually the most important one for merging sites—a clear mapping guides the whole migration process. Let’s go!

Warm-up

Our new website is a multi-site WordPress installation, available in English (neliosoftware.com) and Spanish (neliosoftware.com/es/). The goal of this migration was to merge our three blogs and had them all under the same domain. On the one hand, Nelio A/B Testing and (the old) Nelio Software blogs, both in English, had to be merged in the English version of our new site. On the other hand, WPrincipiante (a Spanish blog) had to be moved to the Spanish version.

There’s plenty of data that could be moved from one site to the other, but we decided we’d only move “blogcontent. That is, users, posts, comments, categories, and tags. Note that the order in which I mentioned the elements is not random—some elements depend on others, so we had to start with those that don’t depend on anything else, or we would have had trouble dealing with identifiers. Users don’t refer to any other element, so they were easily moved. Then we moved posts, which hold a reference to their authors (that is, to users). Finally, we moved comments (which link to both posts and users), and categories and tags (which classify posts).

Moving Users

These were the tables we had to move to the destination tables of our multi-site installation:

  • abtesting_users and abtesting_usermeta,
  • neliosoftware_users and neliosoftware_usermeta,
  • wprincipiante_users and wprincipiante_usermeta.

In case you don’t know, multi-site WordPress installations use multiple table sets (one per sub-site). However, user-related tables are shared among all installations. Therefore, we only needed to consider two destination tables: wp_users and wp_usermeta.

In our case, the only users we had in our source websites were 11 post authors. In order to do that, we simply needed to insert each source user into the destination table. This is as simple as running a query on the source users table (i.e. abtesting_users, neliosoftware_users, and wprincipiante_users) and insert each match into the multi-site’s wp_users table

One of the things you should take into account when merging content from different sources is identifier collisions. For example, if we have a user named john.wayne in abtesting_users (whose ID is 3) and another user named mary.smith in neliosoftware_users (whose ID is also 3), when I try to insert the second (mary) I’ll get an error telling me that there’s already a user (john) with that same ID.

In order to overcome this problem, we used offsets, which simply added a certain number to source IDs so that they didn’t collide with already used IDs. For instance, assume we first move abtesting_users, and assume the highest ID used in that table is 7. Well, that’s the offset we have to apply to the next table (neliosoftware_users). By applying this idea to the previous example, we’d first moved john.wayne, whose ID is 3, and then we’d move mary.smith, whose ID would be 10 (her source ID was 3, but we added an offset: 7+3=10).

You have to be very careful when working with offsets. As I stated a few lines above, elements are related with each other (e.g. posts hold a reference to their authors). You always need to know the exact offsets you applied to a particular table, so that you can apply the same offset to the references. For example, when we moved Nelio Software posts to the destination table, we had to add the offset we computed in the previous paragraph to the post_author column, so that the author was linked to the proper user. Thus, for example, a post written by mary.smith had “user 3” as its author in the source database, but should have “user 10” as its author in the destination database (which is the new ID of our beloved Mary).

Finally, we had to move the following three tables: abtesting_usermeta, neliosoftware_usermeta y wprincipiante_usermeta, which contain meta-information related to our users such as social profiles, description, personal data, and so on. There was plenty of data here that we wanted to keep, but there was also a lot of stuff that we wanted to get rid of.

Filas de la tabla abtesting_usermeta para el usuario con identificador número 4.
Rows of table abtesting_usermeta for the user with ID number. You see that there are many rows with metadata trash.

So, how do you tell what’s relevant and what isn’t? Well, I’m afraid you have to decide by yourself. In our case, we looked at the meta_key column, which contains the unique name of the meta-field, and we “guessed” whether that field could be relevant or not. My advice here is: focus on one user only and look at all the meta-data you have for him (the previous screenshot shows an example). When you see something that clearly makes sense, write down the meta’s name and move the data. If it doesn’t, just get rid of it. Once you’re done, just move the relevant metas. Oh! And don’t forget to add the proper offset to user_id ?

Another important detail (“god is in the details”) when moving users is their permissions. Permissions are defined in two different meta-fields in usermeta: wp_capabilities and wp_user_level. Source permissions are automatically inherited for the main sub-site (in our case, neliosoftware.com). But if we wanted a user to have the same permissions in one of our sub-sites (for example, neliosoftware.com/es/, whose site ID is 3), we had to define two extra meta-fields: wp_3_capabilities and wp_3_user_level.

And that’s all we had to do to move our users! I know it’s been slightly complex, but I really hope you got all the concepts right ?. As you’ll see shortly, we applied the same ideas over and over again, so don’t be afraid of reading the remaining paragraphs ?. Just bear this ind mind: look at the source tables and save the relevant data, whilst getting rid of anything else.

Moving Posts

Post data is stored in the following tables:

  • abtesting_posts and abtesting_postmeta,
  • neliosoftware_posts and neliosoftware_postmeta,
  • wprincipiante_posts and wprincipiante_postmeta.

Since we wanted to run a multi-language site, we had to move content to one sub-site or the other depending on its language. In our case, this was quite straightforward. On the one hand, we moved English content (from Nelio A/B Testing and Nelio Software) to wp_posts and wp_postmeta. On the other hand, we moved Spanish content (from WPrincipiante) to wp_3_posts and wp_3_postmeta.

The process was exactly the same we just applied. First, we selected the published posts (post_status is 'publish') in the source posts tables (i.e. abtesting_posts, neliosoftware_posts, and wprincipiante_posts). Since we were not interested in moving pages, we only focused on the entries that were posts (that is, entries whose post_type is 'post').

It’s also worth mentioning that the posts table does not only contain pages and posts, but also meta-data of the files you uploaded to your media library. Since we were also interested in moving files to our destination WordPress site (we didn’t want to lose any images, did we?), we had to select them too. Therefore, our SQL query also had to include entries whose post_type was 'attachment':

SELECT * 
FROM prefix_posts
WHERE (post_type = 'post' AND post_status = 'publish') OR
      (post_type = 'attachment')

* Simply replace prefix_posts with the appropriate value (i.e. abtesting_posts, neliosoftware_posts, or wprincipiante_posts, as required).

Again, don’t forget to apply offsets. In this particular case, we had to compute new offsets for post IDs and we had to fix post_author‘s values applying the offsets we computed when we moved users.

Another important aspect while migrating posts is column guid. WordPress uses the value of this column to generate your RSS feeds properly. You have to change its value so that there are no collisions in your destination database. In our case, we applied the following rules:

foreach row r in wprincipiante_posts:
if r.post_type = 'post' then
  r.guid = 'https://neliosoftware.com/es/?p=' + r.ID
if r.post_type = 'attachment' then
  r.guid = r.guid->replace( 'http://wprincipiante.es/wp-content/uploads/',  
                   'https://neliosoftware.com/es/wp-content/uploads/sites/3/' )
foreach row r in abtesting_posts or neliosoftware_posts:
if r.post_type = 'post' then
  r.guid = 'https://neliosoftware.com/?p=' + r.ID
si r.post_type = 'attachment' then
  r.guid = r.guid->replace( '*/wp-content/uploads/', 
                   'https://neliosoftware.com/wp-content/uploads/' )

Don’t be scared! ? The previous fragment is actually quite simple ?. Let’s break it down. If we’re fixing a post’s guid  (post_type = 'post') we simply need to use the post’s short URL, which is built by simply appending the query string ?p=post.ID to your domain.

If, on the other hand, we’re fixing an file (post_type = 'attachment') we simply need to make sure that the path in the URL points to the real destination path. In our case, all images from WPrincipiante would then be under the uploads of one sub-site (https://neliosoftware.com/es/wp-content/uploads/sites/3/) and all the images from the other two sites would be under https://neliosoftware.com/es/wp-content/uploads/.

Once this is clear, we could then move the content from the old tables to the new ones. As I said, English content was moved to wp_posts and Spanish content was moved to wp_3_posts. Now let’s take a look at postmeta tables.

These tables contain meta-information of your posts. They’re actually quite similar to user meta-information, which we already analyzed in depth. Again, the easiest way to tell what to move and what to get rid of is to focus on one post and look at each meta-field individually. An important (and standard) example of this meta-information is _thumbnail_id, which tells WordPress the featured image used by a post.

When moving meta-data, don’t forget to add the offsets you just computed for post IDs to post_id. Also, remember to apply this offset to your _thumbnail_ids, or your posts might end up using either an incorrect image or none at all.

After this, you know everything you need to know to move post-related content. Once you’re done, you’ll have effectively merged all your content into a single blog! ?

Moving Comments

Once we already moved users and posts, it was time we moved comments too. Luckily, this part was quite easy (at least, easier than the previous ones). These were our source tables:

  • abtesting_comments  and abtesting_commentmeta,
  • neliosoftware_comments and neliosoftware_commentmeta,
  • wprincipiante_comments and wprincipiante_commentmeta.

In our case, commentmeta tables had no relevant information, so I completely ignored them. This is the SQL query I run to get the comments I wanted to move:

SELECT * 
FROM prefix_comments
WHERE comment_approved <> 'spam'

which basically returns all comments, excluding those marked as spam. We had to move every single comment in the result to the final tables… but, again, we couldn’t forget to apply the offsets we computed so far ?

Just as we did in the users and posts tables, we first computed new offsets for our comment IDs (in the column named comment_ID). Then, we added post offsets to comment_post_ID (which links the comment to a certain post) and user offsets to user_id (which links the comment to its author).

And there you go! We just moved comments and we got rid of all the spam we had in just a couple of lines!

Moving Categories and Tags

We followed a completely different approach for moving categories and tags. When we looked at our source categories and tags, we quickly realized that there were too many of both. As you may already know, categories and tags classify your content so that it can be easily found by your visitors. If there are too many (for instance, we had a lot of tags used by one post only), they aren’t very helpful.

We decided to simplify both taxonomies: our new blog would only have 4 categories (Community, Online Marketing, WordPress, and Inside Nelio) and we’d only use up to 25 tags. We also decided we wouldn’t create any new categories or tags from now on; we should be able to classify all our content within these categories and tags.

Once we knew the “final” categories and tags we wanted, we simply needed to map the old ones with the new ones. This wasn’t easy (as I said, there were a lot of categories and tags), but I think the final result is pretty good. There’s no general rule for deciding this mapping—it’s up to you! But, regardless of the mapping you decide, the steps for applying it are the same.

In the end, the mapping rules you’ll have will look similar to:

if post had category C1, C2, C3, ..., or Cn
then add new category X
if post had tag T1, T2, T3, ..., or Tn
then add new tag Y

which tell us how posts have to categorized and tagged in the new website. In our case, we defined over 200 rules (don’t worry, I’m not sharing them ?).

To apply all these rules efficiently, you’ll need a function in your code that retrieves the current categorization and tagging of your posts (by looking at the following tables: terms, term_taxonomy y term_relationships) and creates the new rows in the destination tables (wp_term_relationships or wp_3_term_relationships, in our case).

Taxonomy management in WordPress is slightly complex, so make sure you read the Codex first and learn how it works before attempting to automatically generate database entries.

That’s all, folks! Well, almost…

It wasn’t that hard, was it? I warned you—you must care for details if you want to complete a successful migration. If you read the whole post, congrats! ? I’m happy you invested some of your time in learning how complex a migration can be! And I do hope you also learned how to overcome some of the common issues you might encounter ?.

In the third (and last, I promise ?) post of this series we’ll talk about post-processing. Essentially, we’ll see how to smooth all the edges and end up with an awesome website! And I’ll also detail some of the problems we encountered (yes, we had some ?), and how we overcame them.

I honestly think it’ll be an interesting post, probably easier than this one. So I want you to be there! And, as always, if you have any questions you’d like me to answer, just add a comment and let me know.

Featured image by Marc Wieland.

Leave a Reply

Your email address will not be published. Required fields are marked *

I have read and agree to the Nelio Software Privacy Policy

Your personal data will be located on SiteGround and will be treated by Nelio Software with the sole purpose of publishing this comment here. The legitimation is carried out through your express consent. Contact us to access, rectify, limit, or delete your data.