It’s been a crazy month, with lots of drama all over the place. Here at thisweekincas.com, we had an episode where the site was all screwed up and not loading or only partially loading, blank white pages, and the whole bit. During the process of keeping it together and trying to restore full functionality, numerous database imports and exports were performed under a variety of circumstance. During the rush, apparently the most recent database backup file was somehow uncompressed outside of MySQL before final import.

Several days later, that decompression/unzipping basically converted every quotation mark, em dash, en dash, ellipses and other special characters into some really ugly-looking codes.

What are they?

I think what happened is that the restoration database that we ended up using had been opened in a file/text editor. It’s just a guess, and sort of irrelevant, but the text editor converted our UTF-8 characters into some other character set, like ISO-8859-1. So after restoration, we ended up with hundreds of these weird characters in the database – quotes, hyphens, dashes, and ellipses were all converted to Klingon:

“ = left quote = “ †= right quote = ” ‘ = left single quote = ‘ ’ = right single quote = ’ — = en dash = – – = em dash = — • = hyphen = – … = ellipsis = …

Identifying most of these characters was relatively painless, but the en-dash and em-dash characters may be reversed (i.e., – = em dash, and — = en dash). Testing the other character replacements in the database was easy, but discerning between instances of em & en dashes proved futile. So do your own testing and make good backups before making any mass changes. Hopefully someone can help us out with more of the specifics.

Clean ‘em up

Before making any changes to your database, make sure you have a good backup (or three). Then to clean up these weird characters from the WordPress database, use a program like phpMyAdmin to execute the following queries.

Clean up post_content UPDATE wp_posts SET post_content = REPLACE(post_content, ‘“’, ‘“’); UPDATE wp_posts SET post_content = REPLACE(post_content, ‘‒, ‘”’); UPDATE wp_posts SET post_content = REPLACE(post_content, ‘’’, ‘’’); UPDATE wp_posts SET post_content = REPLACE(post_content, ‘‘’, ‘‘’); UPDATE wp_posts SET post_content = REPLACE(post_content, ‘—’, ‘-‘); UPDATE wp_posts SET post_content = REPLACE(post_content, ‘–’, ‘—’); UPDATE wp_posts SET post_content = REPLACE(post_content, ‘•’, ‘-‘); UPDATE wp_posts SET post_content = REPLACE(post_content, ‘…’, ‘…’);

Clean up comment_content UPDATE wp_comments SET comment_content = REPLACE(comment_content, ‘“’, ‘“’); UPDATE wp_comments SET comment_content = REPLACE(comment_content, ‘‒, ‘”’); UPDATE wp_comments SET comment_content = REPLACE(comment_content, ‘’’, ‘’’); UPDATE wp_comments SET comment_content = REPLACE(comment_content, ‘‘’, ‘‘’); UPDATE wp_comments SET comment_content = REPLACE(comment_content, ‘—’, ‘-‘); UPDATE wp_comments SET comment_content = REPLACE(comment_content, ‘–’, ‘—’); UPDATE wp_comments SET comment_content = REPLACE(comment_content, ‘•’, ‘-‘); UPDATE wp_comments SET comment_content = REPLACE(comment_content, ‘…’, ‘…’);

Other tables

While cleaning up the DigWP database, several other weird characters also showed up in various places, but they were very few in number. I also noticed several instances of converted quotes, dashes, and hyphens scattered around in some other tables, mostly in the options table, buried deep within temporary rss_ data. So I didn’t bother with anything beyond the post_content and comment_content tables, but easily could have done so by modifying the previous queries like so:

UPDATE [table_name] SET [col_name] = REPLACE([col_name], ‘…’, ‘…’);

Just replace [table_name] with whatever table you want to clean up, col_name with the column name, and then replicate or edit the query with the proper character replacements.

Lesson learned

Take-home message: don’t open your database in a text editor. But if you do, execute these SQL queries for easy clean-up.

Contents