Migrating from Textpattern to Hugo
Over the last few weeks, I've been working on switching my blog from Textpattern to Hugo and migrating the content (mainly my posts) to Markdown.
Unfortunately, Textpattern does not provide an export in a convenient format: there are ways to export a DB dump as a backup, and there are plugins for exporting data, but nothing out of the box.
So I decided to dump my posts from the database myself, and I'd like to show how.
Find connection params
If, like me, you're struggling to find the credentials for your Textpattern database, it's best to have a look in the configuration file, config.php:
$ cat textpattern/config.php
<?php
$txpcfg['db'] = 'textpattern';
$txpcfg['user'] = 'txpuser';
$txpcfg['pass'] = '###';
$txpcfg['host'] = 'localhost:3306';
$txpcfg['table_prefix'] = '';
$txpcfg['txpath'] = '/var/www/kleemans.ch/textpattern';
$txpcfg['dbcharset'] = 'utf8';
$txpcfg['siteurl'] = "https://kleemans.ch";
define('PROTOCOL', 'https://');
?>
Connect to MySQL database
After getting the connection parameters, we can connect to the database:
$ mysql --host=localhost --user=txpuser --password=###
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 500
Server version: 5.7.27-0ubuntu0.16.04.1 (Ubuntu)
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
...
After this we tell MySQL which database we want to use...
mysql> use textpattern;
... and we can have a look at the tables:
mysql> show tables;
+-----------------------+
| Tables_in_textpattern |
+-----------------------+
| rah_sitemap |
| rah_sitemap_prefs |
| textpattern |
| txp_category |
| txp_css |
| txp_discuss |
| txp_discuss_nonce |
| txp_file |
| txp_form |
| txp_image |
| txp_lang |
| txp_link |
| txp_log |
| txp_page |
| txp_plugin |
| txp_prefs |
| txp_section |
| txp_token |
| txp_users |
+-----------------------+
19 rows in set (0.00 sec)
This gives an overview of the whole Textpattern system. Parts of it are explained in the Textpattern documentation: Textpattern Schema reference.
I could only find the schema for the newest version (4.9), but it matched my version (4.6.2) as well.
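The posts themselves live in the textpattern table. Since the export script further down picks columns by index, it's worth checking the column order of your own version directly (plain MySQL, nothing Textpattern-specific):
mysql> describe textpattern;
In my dump, Posted ended up at index 1, Title at index 6, and Body at index 8; those are the indexes the script below relies on.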
Exporting content
Next up is actually exporting/dumping the blog content. I tried several approaches, but ended up with the simplest one.
I tried making SELECT * INTO OUTFILE work, but this requires elevated rights (the FILE privilege), and even after getting admin access I still wasn't able to export, because the server was running in a mode where file exports weren't allowed.
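For reference, the attempt looked roughly like this (a sketch; the output path is illustrative, and on many servers the secure_file_priv setting restricts or disables writing files entirely, which is presumably the mode I ran into):
-- Needs the FILE privilege; secure_file_priv may forbid the target path
SELECT * FROM textpattern
INTO OUTFILE '/tmp/textpattern.tsv'
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';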
So I stuck with the most simple export (actually the first one I tried):
mysql --user=txpuser --password textpattern -B -e "select * from textpattern;" > dump.tsv
The downside was that the result was not a "proper" TSV/CSV: one post would span multiple lines without the usual quoting character, as in the following example. A line would start with post ID, date, author, etc., separated by proper tabs, but the content itself would continue on new lines.
851 2020-09-21 06:33:46 NULL adrianus 2020-09-21 06:33:46 adrianus Automate downloading songs from youtube Streaming [...]
\n
\n![](/images/420.png)
\n
\n## Getting song charts
...
Other approaches, like this example using regex, would allow for a "," delimiter, but a post would still be spread across multiple lines.
Generating posts from dump
With an export in my hands, the rest was quite simple. I just needed to map the columns to attributes (for example, the column at index 6 holds the title) and generate files from the content.
Note that because posts span several lines, we can't just assume one post per line; instead, lines are "stitched" together until we recognize the beginning of a new post.
# Read dump & concatenate lines: continuation lines of a post body start
# with a literal '\n' escape, everything else starts a new post
raw_posts = []
raw_post = ''
with open('dump.tsv', 'r') as dump_file:
    for row in dump_file.readlines()[1:]:  # skip the header row printed by mysql -B
        if row.startswith('\\n'):
            raw_post += row  # continuation of the current post
        else:
            if raw_post:
                raw_posts.append(raw_post)
            raw_post = row  # a new post begins
if raw_post:
    raw_posts.append(raw_post)  # flush the final post, which has no successor
print('Read', len(raw_posts), 'raw posts.')
import os

# Build markdown posts: map the tab-separated columns to front-matter attributes
posts = []
for raw_post in raw_posts:
    row = raw_post.split('\t')
    posted = row[1]
    title = row[6]
    body = row[8].replace('\\n', '').replace('\\t', ' ')
    category1 = row[13]
    category2 = row[14]
    status = row[18]
    section = row[21]
    keywords = row[23]
    url = row[25]
    content = '---\ntitle: ' + title + '\n'
    content += 'date: ' + posted + '\n'
    content += 'category: ' + category1 + '\n'
    content += 'keywords: ' + keywords + '\n'
    if status not in ('4', '5'):  # Textpattern status: 4 = live, 5 = sticky
        content += 'draft: true\n'
    content += '---\n\n'
    content += body
    filename = posted.split(' ')[0] + '_' + url + '.md'
    posts.append({'filename': filename, 'content': content})

# Write files
os.makedirs('posts', exist_ok=True)  # make sure the output directory exists
for post in posts:
    with open('posts/' + post['filename'], 'w') as post_file:
        post_file.write(post['content'])
print('Finished, wrote', len(posts), 'posts.')
I still had to do some manual cleanup after that (including converting Textile hyperlinks to Markdown), but a basic export and one file per post still saved a lot of time.
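If you want to script part of that cleanup, a regex along these lines handles the simple "text":url case (a rough sketch based on my reading of the Textile syntax; edge cases like trailing punctuation or quoted link titles still need a manual pass):
import re

# Rewrite simple Textile links ("text":url) to Markdown ([text](url))
def textile_links_to_markdown(text):
    return re.sub(r'"([^"]+)":(https?://\S+)', r'[\1](\2)', text)

print(textile_links_to_markdown('See "my post":https://kleemans.ch/foo for details'))
# See [my post](https://kleemans.ch/foo) for details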
Thanks for reading!