scríbhinní ⁊ deocháin do chuallucht na Gaelainne


Músgraí WYSIWYM WP 2 — Add Semantic Functionality to the WordPress Graphical Text Editor

Author: Mícheál Ó Lochlainn, Áras Shorcha Ní Ghuairim, Acadamh na hOllscolaíochta Gaeilge, Ollscoil na hÉireann, Gaillimh.

Published: 5th November 2015.

Updated: 11th February 2017.


WordPress, plug-in, WYSIWYM, WYSIWYG, semantic, archiving, digitisation, Gaelic script, Insular script, seana-chló, Celtic languages, HTML5.


Don't want to read the blurb? Go straight to the downloads:

Don't know what semantic HTML is? Have a read of this first, then come back.

This article presents the second version of Músgraí WYSIWYM WP, a WordPress plug-in to modify the functionality of TinyMCE, the WordPress graphical text editor. It strips TinyMCE of its WYSIWYG functionality, which writes visually-based non-semantic and semantically compromised markup, and replaces it with semantically sound WYSIWYM functionality of its own.

I originally created Músgraí WYSIWYM WP because I don't like the way the WordPress / TinyMCE italic and bold buttons mark up selected text with the HTML tags <em> (which carries the semantic meaning of 'emphatic stress') and <strong> ('strong importance'). There are plenty of other reasons to italicise and embolden text strings, and they having nothing to do with emphasis or importance. Doesn't matter if you're a human being and you have your sight of course but the blind and disabled use the Internet too and non-semantic markup can spoil the output of their text reading devices. And a fair share of the handy Web functionality we all enjoy depends on the machine mining and processing of website data.
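To make the distinction concrete, here is a minimal Python sketch of picking an HTML tag by the reason for the styling rather than by how it looks. It is not the plug-in's code: the `REASON_TO_TAG` table and the `mark_up` helper are hypothetical names made up for the illustration, and the reasons listed are only a few of the uses HTML5 gives for these elements.

```python
from html import escape

# Hypothetical lookup table: why the text is styled -> which HTML5 element says so.
REASON_TO_TAG = {
    "emphatic stress": "em",                    # changes the meaning of the sentence
    "strong importance": "strong",              # warnings, things not to be missed
    "title of a work": "cite",                  # book, film, song titles
    "foreign phrase or technical term": "i",    # alternate voice, no added stress
    "keyword drawn to attention": "b",          # attention without added importance
}


def mark_up(text: str, reason: str) -> str:
    """Wrap text in the element that matches the reason it is being styled."""
    tag = REASON_TO_TAG[reason]
    return f"<{tag}>{escape(text)}</{tag}>"


print(mark_up("Ulysses", "title of a work"))
print(mark_up("seana-chló", "foreign phrase or technical term"))
print(mark_up("never", "emphatic stress"))
```

All three results would normally be drawn in italics by a browser, but only the last one tells a screen reader or a data miner that the word is stressed.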

I've also come across, in informal, semi-professional and professional contexts, too many WordPress users with very good intentions but extremely bad hearing, and they paving the path by 'structuring' content with semantically meaningless visual ornamentation like text colour, alignment and so on — all used inconsistently of course. So when I started work on the plug-in I decided to see what I could do about that as well. (This isn't a WordPress-only problem; it happens wherever structured data needs to be entered consistently into electronic systems. I wonder how many petabytes of Word documents there are in the world, and they written and filed in such a hames of a way that only the author can unravel them and if he or she ever goes under a bus the data in them will be as good as lost.)

With all this said, nothing in this article is intended to criticise either WordPress or TinyMCE. They're excellent general-purpose tools that were not specifically made to create semantically meaningful WYSIWYM content. It is to their designers' credit that they can be modified to do so.

It is not that plug-ins weren't already available to semantify WordPress / TinyMCE; there's rakes of them. But all the ones I could find seemed to focus on giving the user more and more options and more and more freedom. Keys to the sweet shop. Músgraí WYSIWYM WP takes the less-is-more approach and tightly restricts what the user can do, making it harder for un-skilled or un-disciplined data entry operatives to lose the run of themselves and go clicking every button in sight just because they think the results look good.

Although with all that said, it is worth pointing out that, no more than WordPress itself, Músgraí WYSIWYM WP is just a tool and it is neither idiot-proof nor abuse-proof. You have to use it properly to get the best out of it. Anyone can pick up a wood chisel and use it to drive screws.

I should also mention that Músgraí WYSIWYM WP (or its individual modules) might or might not play nicely with other plug-ins that make changes to the TinyMCE interface; it depends entirely on the changes. This isn't quite by design but it is near enough. By definition, less-is-more taketh away as well as giveth. This plug-in is a specialised tool that does a specialised job. And special tools don't fit every toolbox. You just have to see how you get on.

The benefits of semantically sound markup to a WordPress site

So why bother? Because like backups and redundancy, semantically sound WYSIWYM electronic document authoring is A Good Thing. In the context of the Músgraí WYSIWYM WP plug-in, the reasons can be summed up like so:

Changes since Músgraí WYSIWYM WP 1.0

I was happy enough with the first version of the plug-in but even though the design was modular enough I always thought that it was a bit of a God object. The jobs done by each module were fairly varied but other than going under the bonnet and hacking the PHP code yourself there was no way to disable (or better yet, remove) the ones you didn't need. A lot of people might have a use for some of them but not many would need them all together. I wasn't too happy with the way I was after allocating certain functionalities across the modules either. Músgraí WYSIWYM WP version 2 sees after all that. There are three main changes:


See the next page for screenshots of Músgraí WYSIWYM WP in operation.

But does it write valid markup?

When I released version 1 of this plug-in, one or two people wondered if the markup written by Músgraí WYSIWYM WP was, in actual fact, standards-compliant. It was, and it is. It was one of the things I tested for during development, by feeding fragments of the plug-in's markup to two W3C validators: the Markup Validator and the Nu Html Checker. Both returned fewer errors than Steps have released double concept albums.

Just a guess but I'd say they doubted you could use Irish language class name values, whether they were semantically meaningful or not. I'm not aware of any formal standard for semantically-based class names, nor of one that insists they be all in The Queen's English. There are some widely-used conventions for commonly-used elements but I don't believe they extend down to this level of specificity. The fact is though, there's nothing in the manual that says class names have to speak English and be décent, or be limited to ASCII characters. Anyone who's still not convinced, here's a HTML5 document containing Músgraí WYSIWYM WP markup. Submit it to the validators for yourselves and see what they say.
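For anyone who would like a quick local look before going to the validators, here is a minimal Python sketch that reads non-ASCII class names back out of a fragment using the standard library's HTML parser. The class names `téarma-teicniúil` and `botún-litrithe` are made up for the illustration and are not necessarily the ones the plug-in writes. It only shows that such names pass through an ordinary parser intact; it is no substitute for the W3C validators.

```python
import re
from html.parser import HTMLParser

# HTML splits a class attribute into tokens on ASCII whitespace only.
ASCII_WHITESPACE = re.compile(r"[ \t\n\f\r]+")


class ClassCollector(HTMLParser):
    """Collect every class token found on any start tag."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.extend(t for t in ASCII_WHITESPACE.split(value) if t)


# Hypothetical fragment with Irish-language class names.
fragment = (
    '<p>Focal <i class="téarma-teicniúil">seana-chló</i> agus '
    '<u class="botún-litrithe">focla</u>.</p>'
)

collector = ClassCollector()
collector.feed(fragment)
print(collector.classes)
```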

Músgraí WYSIWYM WP plug-in core

Updated on 27th April 2016 to version 2.1.2. Style preview now enabled in the styleselect dropdown menu.

If all you want is a WordPress-based website that writes semantically clean, minimalist content, without rakes of WYSIWYG deócháin, the core on its own should do you.

Download the plug-in core

Install the plug-in core

Just extract the músgraí-wysiwym-wp directory from the tar.gz or zip file, upload it to the WordPress plugins directory and activate as normal. The plug-in core installs a link to a status page on the Dashboard menu.

If you're upgrading from Músgraí WYSIWYM WP version 1, follow these instructions.

Músgraí WYSIWYM WP plug-in modules

Install one or more plug-in modules to get extended semantically-based functionality.

To install a module, just extract it from the tar.gz or zip file and upload it into the plugin core directory (not the WordPress plugins directory). Once uploaded, an entry for the module should appear on the status page.

To update a module, delete the existing one before uploading the new one.

Note: As of version 1.2.0 of all Extended WYSIWYM Functionality (xxxxxx) modules, they're no longer either-or jobs. You can install whatever pick-n-mix you want.

Stable Músgraí WYSIWYM WP modules

Extended WYSIWYM Functionality (General)

This module adds general purpose, semantically sound markup tools to the TinyMCE editor. These are located in the formats dropdown, on the first row of TinyMCE tools. It also adds a simple visual 'house style' for these elements to the TinyMCE editor and to pages and posts as presented by WordPress to website visitors.

Updated on 1st March 2016 to version 1.2.0. Installation of multiple Extended WYSIWYM Functionality (xxxxxx) modules together now permitted. The module was also renamed from Extended WYSIWYM Functionality (Basic) so any older versions will have to be deleted before this one is installed.

Updated on 15th February 2016 to version 1.1.1. Citation functionality expanded. Underline, using the HTML u tag, now available for marking up spelling mistakes. Not a common use case but it is a semantically valid use of the tag and could be handy when quoting or citing textual content, especially in archival contexts.

Extended WYSIWYM Functionality (Digital Archiving)

This module adds tools for marking up digital realisations of original textual artefacts, where WordPress is being used as a digital archiving platform.

At the moment, there's only four tools in this set: replicate a single underline that indicates stressed emphasis in the original artefact, replicate a double underline that indicates stressed emphasis in the original artefact, replicate a purely ornamental (that is, carrying no semantic meaning) single underline in the original artefact and replicate a purely ornamental double underline in the original artefact. I could have added more variations on the theme but there are so many of them, where would you stop? I'll be adding other tools to the module out in the time though.

These tools all apply appropriate semantic markup while allowing the user to create visually faithful digital realisations of physical originals. They are located in the formats dropdown on the first row of TinyMCE tools.

Alphabets (Gaelic)

This module adds button sets to TinyMCE for entering Gaelic and Insular Script characters: dotted consonants, small r and s with long tails, Tironian et ⁊ɼl…

Preserve HTML Character Entities

For the moment anyway, Músgraí WYSIWYM WP writes certain specialised and lesser-used textual characters into pages and posts in the form of HTML character entities. Left to its own devices, TinyMCE catches these during save operations and converts them into the actual character. Preserve HTML Character Entities stops it from doing that.

In this modern day and age of Unicode and UTF-8, entities really shouldn't be necessary. They are — or at least they should be — a relic of a bygone era when Max Headroom was bigger than Madonna and the only textual characters worth supporting on any computer, anywhere on the face of the Earth, were the ninety-five that you needed to write things in American English.

Sadly, there's still no shortage of computer systems (and techies) that haven't moved with the times in this regard and although WordPress and TinyMCE aren't among them I decided to play it safe — mostly as insurance against content migration to, through or by one of these dinosauruses.

If it was certain that the content would be kept on WordPress forever there wouldn't be any need for this, since it can reasonably be supposed that WordPress will always render its own pages and posts correctly. But things do change and as management, staff and fashions come and go they bring and take their own software and platform preferences with them. Textual characters other than the basic alphanumerics and punctuation of US English have a habit of getting lost or corrupted — sometimes beyond recovery — during platform, software and framework migrations. (Don't they sometimes even fail to survive the journey of a simple email message from sender to recipient?)

So, if you think a bit of 'migration-proofing' might be handy out in the time, install this module. If you don't, don't!
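The idea behind the 'migration-proofing' can be sketched in a few lines of Python. This is an illustration only, under assumptions of its own: it turns every non-ASCII character into a decimal numeric character reference, whereas the plug-in writes entities for certain specialised characters only, and the exact form it uses is not shown here. The `to_entities` name is made up for the example.

```python
import html


def to_entities(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# "Músgraí", a Tironian et and a dotted b, written with escapes so the
# source of this example is itself plain ASCII.
sample = "M\u00fasgra\u00ed \u204a \u1e03"

encoded = to_entities(sample)
print(encoded)

# Whatever reads the entities back (a browser, or html.unescape here)
# recovers the original characters exactly.
assert html.unescape(encoded) == sample
```

The encoded string contains nothing but ASCII, so a system that mangles anything outside US English has nothing to get its teeth into.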

Version 1.5.0 of the module (released 11th February 2017) updates it to support entities written by the new Specialised Characters (Irish Linguistics) module.

Specialised Characters (General)

This module adds button sets to TinyMCE for entering a small number of general-use characters that haven't any keys on normal keyboards: non-breaking space (best practice for Gaelic / Celtic family names, to tie elements like 'Ó', 'Mac' and 'ap' to what follows them), non-breaking hyphen (useful for the prefixes 'n‑' and 't‑' in Irish, which should never be split across lines from the following word), figure dash (used as a numerical delimiter in things like telephone numbers but not used to specify ranges; read more about figure dashes), n‑dash (generally, to specify numerical ranges such as the years 1878–1967; read about n‑dashes), m‑dash (generally — although there are some subtleties — a colon or brackets alternative; and read about m‑dashes), ellipsis (…), primes (inches and feet; seconds and minutes) ⁊ɼl…
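To show what a few of these characters are for, here is a minimal Python sketch that applies three of the conventions described above to plain typed text. It is not how the module works (the module adds buttons to the editor, and the user decides where each character goes). The function names and the regular expressions are assumptions made for the illustration, and they are deliberately simple.

```python
import re

NBSP = "\u00a0"       # non-breaking space
NB_HYPHEN = "\u2011"  # non-breaking hyphen
EN_DASH = "\u2013"    # n-dash

VOWELS = "aeiouáéíóúAEIOUÁÉÍÓÚ"


def tie_irish_prefixes(text: str) -> str:
    """Swap the hyphen after a stand-alone 'n' or 't' prefix for a non-breaking one."""
    return re.sub(rf"\b([nt])-(?=[{VOWELS}])", rf"\1{NB_HYPHEN}", text)


def tie_surname_particles(text: str) -> str:
    """Tie 'Ó', 'Mac' and 'ap' to the capitalised name that follows them."""
    return re.sub(r"\b(Ó|Mac|ap) (?=[A-ZÁÉÍÓÚ])", rf"\1{NBSP}", text)


def year_range(text: str) -> str:
    """Use an n-dash between two four-digit years."""
    return re.sub(r"\b(\d{4})-(\d{4})\b", rf"\1{EN_DASH}\2", text)


print(repr(tie_irish_prefixes("an t-uisce")))
print(repr(tie_surname_particles("Mícheál Ó Lochlainn")))
print(repr(year_range("1878-1967")))
```

A pattern-based pass like this will miss cases and catch a few it shouldn't, which is one argument for leaving the choice to a person with a button.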

Specialised Characters (Irish Linguistics)

This module supersedes the Specialised Characters (Linguistic) module and may be of use to scholars working in the Irish linguistics field. It adds button sets to TinyMCE for entering Irish phonemic notational symbols which can't easily be typed using standard keyboards. There are two sets.

The first is the full palette of IPA symbols specified by Wikipedia for textual representation of Irish phonemes in their articles.

The second is a subset of the 'domestic' Irish notational convention. Most of the symbols in this convention are standard Roman letters that you can just type in, so the module only provides the awkward ones: a (the double-storey small Latin a), ɑ (the single-storey a, also called script a), ɩ (the small Latin iota), ɪ (the Latin small capital I), ə (schwa) ⁊rl… I say 'convention' and not 'standard' because that's exactly what it is. Yes, there's a 'core' notation which finds favour with the devout but scholars of Irish linguistics have a long tradition, which survives to modern times, of devising personal remixes to best suit the work in hand. This module supports the 'core' notation but extends it slightly with symbols found in the corpus.

Experimental Músgraí WYSIWYM WP modules

All the experimental modules are for Celtic language annotation; at the moment anyway. But there's so many of them it was making the page untidy. I have them moved to their own article.

Upgrading from Músgraí WYSIWYM WP version 1

To upgrade from Músgraí WYSIWYM WP version 1:

To keep full Músgraí WYSIWYM WP version 1 functionality, install these modules: