gaoluinn

scríbhinní ⁊ deocháin do chuallucht na Gaelainne


Celtic Language Annotation Modules for Músgraí WYSIWYM WP

Author: Mícheál Ó Lochlainn, Áras Shorcha Ní Ghuairim, Acadamh na hOllscolaíochta Gaeilge, Ollscoil na hÉireann, Gaillimh.

Published: 18th April 2016.

Updated: 1st July 2016.

Keywords

WordPress, plug-in, WYSIWYM, WYSIWYG, semantic, archiving, digitisation, Gaelic script, Insular script, seana-chló, Celtic languages, HTML5.

Introduction

Fo fo! The Músgraí WYSIWYM WP modules presented here are all experimental: either betas or clay models. That said, most of them are probably fit for real-world use, but they might still get a bit of tweaking that could break backwards compatibility. Always read the label.

These modules add linguistic options to the formats dropdown on the first row of tools in the WordPress graphical editor (TinyMCE). The options allow the user to mark up text and document sections as being written in one of the Insular Celtic languages, or in one of the non-Celtic languages that are, or have been, spoken natively in the various Insular Celtic territories. The markup can be used to specify both language and territory.
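As a rough illustration of what marking up a run of text for language and territory amounts to, the Python sketch below wraps a phrase in an HTML element carrying a `lang` attribute. The helper name `mark_up`, the choice of a `span` element and the sample phrases are assumptions made for illustration only; the markup the modules actually write is defined by the Geata Bán template, not by this snippet.

```python
from html import escape


def mark_up(text: str, lang_tag: str, element: str = "span") -> str:
    """Wrap text in an element whose lang attribute holds a language tag.

    Hypothetical helper: it only illustrates the idea and is not the
    modules' own output format.
    """
    safe_tag = escape(lang_tag, quote=True)
    return f'<{element} lang="{safe_tag}">{escape(text)}</{element}>'


# Welsh as written in Argentina (language "cy", region "AR").
print(mark_up("Bore da", "cy-AR"))
# Scottish Gaelic as written in Canada (language "gd", region "CA").
print(mark_up("Madainn mhath", "gd-CA"))
```

The tag is kept as a single string so that the same helper serves for a bare language code or for a language paired with a territory.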

The modules generally come in pairs: a simple markup version for general-purpose use and a detailed markup version for scholarly types and linguistic power users. The detailed markup versions are a bit everything-and-the-kitchen-sink, to the point that some of them support ancient languages that might not have extant literatures or corpora, if they ever had them at all, or ones that have only limited extant graffiti. 'Experimental': the clue's in the question. I'm just revving them up to see what they do.

The surviving — or reviving — Insular Celtic languages are Irish, Scottish and Manx Gaelic (the Goidelic languages) and Welsh, Cornish and Breton (the Brittonic ones). For the purposes of this work, the Celtic territories are Ireland, Scotland, The Isle of Man, Wales, Cornwall and Brittany, along with parts of Canada and Argentina. These modules are focussed on the languages and dialects of historically established, generationally continuous and geographically contiguous native speaker communities as opposed to, with genuine but realistic respect, well-meaning innovations and nebulous networks where it is at least one hundred years and five thousand L1 speakers too early to tell.

The under-the-bonnet (HTML) aspect of the markup complies fully with the appropriate standard: BCP 47, which draws on ISO 639, ISO 15924 and ISO 3166-1. See Richard Ishida's article on the subject if you want to find out more. I am after taking advantage of the private use extensions permitted by BCP 47 to develop certain functionality, but any text marked up by the modules is still entirely machine-parsable and linguistically minable.
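To show how a tag of this kind breaks down into its parts, here is a minimal Python sketch that splits a simplified BCP 47 tag into language (ISO 639), script (ISO 15924), region (ISO 3166-1) and private-use subtags. The names `LanguageTag` and `parse_tag` are made up for the example, the pattern deliberately ignores variants and other extensions, and the private-use value `x-demo` is a placeholder, not one of the values the modules use.

```python
import re
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str               # ISO 639 code, e.g. "ga"
    script: Optional[str]       # ISO 15924 code, e.g. "Latg"
    region: Optional[str]       # ISO 3166-1 code, e.g. "IE"
    private_use: Optional[str]  # whatever follows the "x" singleton


# Simplified pattern: language, optional script, optional region and an
# optional private-use sequence. Variants and extensions are left out.
_TAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?:-x-(?P<private>[a-z0-9]{1,8}(?:-[a-z0-9]{1,8})*))?$",
    re.IGNORECASE,
)


def parse_tag(tag: str) -> LanguageTag:
    """Split a tag into subtags, normalising case by convention."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed tag: {tag!r}")
    script = match["script"]
    region = match["region"]
    private = match["private"]
    return LanguageTag(
        language=match["language"].lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
        private_use=private.lower() if private else None,
    )


# Irish in the Gaelic variant of the Latin script ("Latg"), Ireland.
print(parse_tag("ga-Latg-IE"))
# Welsh with a placeholder private-use subtag (an assumption).
print(parse_tag("cy-x-demo"))
```

Because the private-use part sits after the `x` singleton, a parser that does not understand it can still read the language, script and region in front of it.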

The visible aspect of the markup is restricted for the time being though. Just because the markup is machine-parsable doesn't necessarily mean that the content has to be visually ornamented. If you're not careful, styling documents to mark text written in different languages and dialects will end up looking like a ransom note or like the video for Ashes to Ashes only with the colour turned up. (Not to mention the behaviour in Braille readers, although the markup does provide a key that speech synthesisers could use to utter the text in an appropriate voice.) So for now I have each language's markup styled to be highlighted with a different colour when viewed in the TinyMCE editor, and with shades of the same within the language groups, but on pages and posts as presented by WordPress to website visitors they have no styling at all. We'll see how it goes.
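One way editor-only highlighting of this sort could be expressed is as CSS rules keyed on the `lang` attribute. The Python sketch below generates such rules; the function name `editor_rules` and the colour values are placeholders chosen for the example and are not taken from the modules themselves.

```python
from typing import Mapping


def editor_rules(colours: Mapping[str, str]) -> str:
    """Build one CSS rule per language code.

    The [lang|="xx"] selector matches lang="xx" and any value that
    begins with "xx-", so regional and script subtags share a colour.
    """
    rules = []
    for code, colour in colours.items():
        rules.append(f'[lang|="{code}"] {{ background-color: {colour}; }}')
    return "\n".join(rules)


# Placeholder palette: related shades within one language group.
GOIDELIC = {"ga": "#d8f0d8", "gd": "#c0e4c0", "gv": "#a8d8a8"}

print(editor_rules(GOIDELIC))
```

A stylesheet like this would be loaded in the editor only and left out of the stylesheet served to visitors, which keeps the published pages unstyled while the `lang` attributes stay in the HTML.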

Geata Bán

The Celtic Language Annotation Modules are based on customised HTML markup that isn't particularly complicated but that was carefully put together. The markup for specifying each language and dialect is based on controlled sets of values set out in a template called Geata Bán, which I designed specifically for this purpose. The boring, under-the-bonnet technical specs are documented on this page.

The plug-in modules

For now, most of these modules only come with an English user interface. They can be downloaded as both zip and tar.gz archives.

Brittonic languages

Extended WYSIWYM Functionality (Simple Markup for Breton)

This module allows the user to mark up text as being written in Breton or in French French.

Extended WYSIWYM Functionality (Detailed Markup for Breton)

This module allows the user to mark up text as being written in Old, Middle or Modern Breton; or in Old, Middle or Modern French.

Extended WYSIWYM Functionality (Simple Markup for Cornish)

Mark up text as being written in Cornish (orthography-neutral); or in Cornish- or British English.

Extended WYSIWYM Functionality (Detailed Markup for Cornish)

Mark up text as being written in Old, Middle or Modern Cornish (also specify orthographies of Revived Cornish); in Old or Middle English; or in Modern Cornish- or British English.

Extended WYSIWYM Functionality (Simple Markup for Welsh)

Mark up text as being written in Welsh (including Patagonian), in British English or in Argentinian Spanish.

Extended WYSIWYM Functionality (Detailed Markup for Welsh)

Mark up text as being written in Old, Middle or Modern Welsh (including Patagonian); in Old, Middle or Modern English; in Argentinian Spanish or in Welsh Romani.

Goidelic languages

The three surviving Goidelic languages are really just parts of the one linguistic continuum. Of course, the geographic separations were always going to cause some amount of divergence but, no more than the surviving Brittonic languages, the isolating encroachment of an uncouth jargon is after widening the gap.

When speaking English, 'Irish', 'Gaelic' / 'Gàidhlig' and 'Manx' would be more natural names for the flavours of the language local to Ireland, Scotland and The Isle of Man respectively. So to allow for the Scottish use of 'Gaelic', which might mean any or all of them unless there's a context, the terms 'Irish Gaelic', 'Scottish Gaelic' and 'Manx Gaelic' are used here for clarity — even if they are a bit stilted.

Extended WYSIWYM Functionality (Simple Markup for Manx Gaelic)

Mark up text as being written in Manx Gaelic, in Manx English or in British English.

Extended WYSIWYM Functionality (Detailed Markup for Manx Gaelic)

Mark up text as being written in Manx Gaelic; in Old or Middle English; in Modern Manx English or in Modern British English.

Extended WYSIWYM Functionality (Simple Markup for Scottish Gaelic)

Mark up text as being written in Scottish Gaelic (specifying it as originating in Scotland, Britain or Canada, and as being in either the Roman or the Gaelic script); in Scottish-, British- or Canadian English; or in Canadian French.

Extended WYSIWYM Functionality (Detailed Markup for Scottish Gaelic)

Mark up text as being written in Scottish Gaelic (specifying it as originating in Scotland, Britain or Canada, and as being in either the Roman or the Gaelic script); in Old or Middle English; in Modern Scottish-, British- or Canadian English; in Canadian French; in Scots; or in Scottish Cant (Traveller Scots).

Extended WYSIWYM Functionality (Simple Markup for Irish Gaelic)

Mark up text as being written in Irish Gaelic (specifying it as being in either the Roman or the Gaelic script); or in Irish- or British English.

Extended WYSIWYM Functionality (Detailed Markup for Irish Gaelic)

Coming…

Extinct historical languages

Extended WYSIWYM Functionality (Extinct Historical Languages)

Mark up text as being written in Cumbric, Pictish or Anglo-Norman.

Isn't it funny what standards will do if you let them? Yes, the markup written by this module is standards-compliant and no, the module itself isn't meant to be taken too seriously… Unless your other car's a TARDIS. Don't forget to bring your tape recorder.

Early versions

Extended WYSIWYM Functionality (Celtic Linguistics)

This module was just a proof of concept and is not being developed. It works away fine but is not really for use in production.

It adds tools to the TinyMCE editor to let the user mark up textual content as being in any of the living Insular Celtic languages or in most of the non-Celtic languages with L1 speaker-bases in the Celtic territories (British English, Canadian English, French French, Canadian French and Argentinian Spanish).

These tools are located in the formats dropdown on the first row of TinyMCE tools. Visual styling is applied only in the TinyMCE editor.
