Use SQL to Extract Posts, Categories, Tags, and Comments from WordPress

Updated on

This post shows how to extract content (posts, categories, tags, and comments) out of a WordPress database using SQL.

SQL Query to Extract WordPress Posts with Categories and Tags

Here’s a SQL query to get the posts out of WordPress along with the associated categories and tags.

SELECT
    p.ID id,
    p.post_title,
    p.post_author,
    p.post_date,
    p.post_date_gmt,
    p.post_excerpt,
    p.post_content,
    p.post_status,
    p.comment_status,
    p.post_name,
    p.post_modified,
    p.post_modified_gmt,
    p.post_content_filtered,
    p.post_parent,
    p.guid,
    p.post_type,
    p.post_mime_type,
    p.comment_count,
    GROUP_CONCAT(DISTINCT c.`name`) as categories,
    GROUP_CONCAT(DISTINCT t.`name`) as tags
FROM wp_posts p
LEFT JOIN wp_term_relationships cr
ON (p.`id`=cr.`object_id`)
LEFT JOIN wp_term_taxonomy ct
ON (ct.`term_taxonomy_id`=cr.`term_taxonomy_id`
    AND ct.`taxonomy`='category')
LEFT JOIN wp_terms c ON
(ct.`term_id`=c.`term_id`)
LEFT JOIN wp_term_relationships tr
ON (p.`id`=tr.`object_id`)
LEFT JOIN wp_term_taxonomy tt
ON (tt.`term_taxonomy_id`=tr.`term_taxonomy_id`
    AND tt.`taxonomy`='post_tag')
LEFT JOIN wp_terms t
ON (tt.`term_id`=t.`term_id`)
-- I wanted to exclude a few specific post types
WHERE post_type
NOT IN (
    'revision',
    'attachment',
    'nav_menu_item',
    'oembed_cache',
    'postman_sent_mail'
)
GROUP BY p.id;

The above query is partially based on this code.

SQL Query to Extract WordPress Comments

This SQL query will extract comments out of WordPress. I used a Ruby script to dynamically insert the post IDs into the SQL. (I created a single, machine-readable JSON file with all the posts and comments for later processing. The next step for me was to convert each post into a markdown file so that they could be loading into Gatsby, which is the framework that I was migrating to.)

SELECT
    comment_ID comment_id,
    comment_post_ID comment_post_id,
    comment_author,
    comment_author_url,
    comment_date,
    comment_date_gmt,
    comment_content,
    comment_approved,
    comment_agent,
    comment_type,
    comment_parent,
    user_id
FROM wp_comments
WHERE comment_post_id = 123 -- change this to your post ID
AND comment_approved = '1';

Using Ruby and mysql2

When I processed my WordPress database, I used a Ruby gem called mysql2. It’s an easy way to perform SQL queries and process each database row as a Ruby hash.

# This isn't working code -- it's just to illustrate the basic idea
require 'json'
require 'mysql2'

@client = Mysql2::Client.new(
  host: 'localhost',
  username: 'db_username',
  password: 'db_password',
  database: 'db_name'
)

results = @client.query("
  your SQL goes here
")

results.each do |row|
  # each row in the database is now accessible in this loop as `row`
end

content = 'put your final output for the file here'

# Turn the Ruby hash into JSON and write it to a file.
json = JSON.pretty_generate(content)
File.open('content.json', 'w') { |f| f.write json }

WP Gatsby Markdown Exporter

If you’re migrating to Gatsby.js, there’s also a WordPress plugin called wp-gatsby-markdown-exporter that might help some people. It can be run as a CLI on the server. I found that it didn’t do exactly what I wanted, but other people might find it useful, and the source code might be a starting point for customization.

Get Help Migrating Your WordPress Site

If you have questions or tips, leave a comment below.

Tagged with: Programming DatabasesSQLWordPressRuby

Feedback and Comments

What did you think about this page? Do you have any questions, or is there anything that could be improved? You can leave a comment after clicking on an icon below.