Wednesday, 21 April 2010

Introduction to the UMdC file format for Ancient Egyptian

UMdC (Unicode Manuel de Codage) is a new file format for documents containing Ancient Egyptian. This informal note is aimed at people familiar with versions of the ‘Manuel de Codage’ (MdC) protocol as used in applications such as InScribe, JSesh, MacScribe and WinGlyph. My objective here is to explain a little about what UMdC is, why I’ve devised this new format and how I envisage it being used. The good news is that UMdC is highly compatible with MdC, so there is little in the way of a learning curve and no need to ditch existing software tools and methodologies.

Manuel de Codage
A scheme for representing Ancient Egyptian was published in 1988 as Manual for the Encoding of Hieroglyphic Texts for Computer-input (Jan Buurman, Nicolas Grimal, Jochen Hallof, Michael Hainsworth and Dirk van der Plas, Informatique et Egyptologie 2, Paris 1988). This is generally known as Manuel de Codage or simply MdC. It is useful to refer to the original scheme as MdC88.

MdC88 was never a formal specification and has been interpreted and extended in several ways for use by applications that work with Ancient Egyptian. There is no ‘standard’ MdC, only dialects. In many simple cases this is not a problem: everybody agrees what ‘+sO34-N37:Y1-’ represents as hieroglyphs. However, for more complex texts there is scope for ambiguity, confusion and incompatibility.

UMdC basics
Here is a list of some UMdC characteristics. My goal has been to keep things as simple as possible and to avoid scenarios which may be useful for some purposes but are not, in my opinion, appropriate to address in an MdC-like approach. This list is not a specification; I simply want to give a flavour of what is involved.
  1. UMdC files must use the ‘.umdc’ file extension (i.e. umdc file type) except in special circumstances. MdC88 did not define rules for file names so several alternatives are in use.
  2. UMdC files must use UTF-8 (Unicode 8 bit) encoding. Unicode allows most modern languages to be written, from English to Kanji to Arabic and Hebrew. MdC88 specified ASCII, so even the accented characters popular in some European languages are not present. Dialects of MdC often use the ISO-8859-1 (Latin-1 Western European) 8 bit coding or similar, but more by accident than design, and there is a lot of scope for confusion.
  3. UMdC files begin with the 8 characters ++++UMdC so software knows this is really meant to be a UMdC file and can proceed accordingly. MdC88 compatible software will interpret this sequence as a comment. It is permissible to precede this sequence with the Unicode BOM (some text editors such as Windows Notepad add the BOM and it would be confusing not to accept this) although doing so may throw some software!
  4. UMdC follows MdC88 in stating that all text content is preceded by +l (normal text), +b (normal, bold text), +i (normal, italic text), +t (transliteration), +c (Coptic), +g (Greek), +s (hieroglyphs) and ++ (comment). The ! and !! conventions for end of line and end of page are used. This means UMdC is very compatible with MdC at one level. The rules here are, however, more tightly defined, as will be detailed in specifications.
  5. UMdC adds the notion of umdc-instruction. All umdc-instructions begin with the three characters +++. The beauty of this approach is that MdC88-compatible software interprets a umdc-instruction as a comment, so although the information may go unused, hieroglyph segments etc. survive unchanged. UMdC uses umdc-instructions for most new functionality such as rich text formatting options.
  6. UMdC version 1.0 requires that Gardiner codes and mnemonics are based on EGPZ 1.0 specifications. MdC88 defined Gardiner codes but this set was superseded, so although everyone agrees what “A1” and "n" mean, the same is not true beyond the common Egyptian Grammar set.
  7. Applications can elect to use application-specific umdc-instructions to implement features such as special hieroglyph layout options or non-EGPZ coding conventions. This is not encouraged except where unavoidable and there are rules.
  8. UMdC itself cannot be extended by an application provider, only as an official change to the specification. A non-complying UMdC file counts as an error pure and simple; there are no 'dialects'. Rules govern future official extensions to avoid breaking software written to the current specification.
In short, the UMdC file containing

++++UMdC+lHello +sV9:W24-O49-!

corresponds to MdC88

+lHello +sV9:W24-O49-!
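
As a rough illustration of points 2 and 3 above, here is a minimal Python sketch of how software might recognise a UMdC file. The function names (decode_umdc, body_after_signature) are hypothetical and not part of any UMdC specification. The sketch assumes the signature must sit at the very start of the decoded text, straight after an optional UTF-8 BOM.

```python
import codecs

# The 8-character signature described in point 3 of the list.
UMDC_SIGNATURE = "++++UMdC"


def decode_umdc(data: bytes) -> str:
    """Decode raw file bytes as UTF-8 and check for the UMdC signature.

    An optional UTF-8 BOM is accepted and dropped. Raises ValueError if
    the bytes are not valid UTF-8 or the signature is missing.
    (Assumption: nothing else may precede the signature.)
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("not valid UTF-8") from exc
    if not text.startswith(UMDC_SIGNATURE):
        raise ValueError("missing ++++UMdC signature")
    return text


def body_after_signature(text: str) -> str:
    """Return the content that follows the signature.

    For the one-line example in this note, this is the MdC88 text shown.
    """
    return text[len(UMDC_SIGNATURE):]


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + "++++UMdC+lHello +sV9:W24-O49-!".encode("utf-8")
    print(body_after_signature(decode_umdc(sample)))
```

Note that MdC88-compatible software would not need the second function at all, since it reads the signature as a comment; it is here only to show the correspondence between the two example lines.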

UMdC Development Roadmap
I am working on UMdC documentation (to go on http://www.egpz.com/). This consists of user-oriented material, a technical reference, and implementation guidance for software writers and others who want to support UMdC.

I am also working on a complete implementation of a document editor for umdc files to be included in InScribeX Web (Preview 4) (to go on http://www.inscribex.com/).

UMdC support is also being included in a second edition of the InScribe 2004 software, namely InScribe 2004SE. As part of this, the ‘.InScribe’ file format is being adapted to be 100% UMdC compatible. Support for ‘.InScribe’ and ‘.umdc’ file integration into Windows search is another useful feature.

My aim is to get most if not all of this work completed in advance of the Informatique et Égyptologie 2010 meeting to be held early July in Liège. The project is unfunded so we shall see!

Tuesday, 20 April 2010

Silverlight 4 Release, Moonlight 3 Preview, and InScribeX Web

Last Thursday Microsoft released Silverlight 4, a significant milestone for the InScribeX Web project since SL4 brings a useful set of new functionality for delivering the InScribeX Web approach to Ancient Egyptian. Silverlight 4 runs on Windows (XP, Vista, 7) and Mac OS X (Intel systems). There is no platform change since version 3, beyond the Google Chrome web browser now being officially supported.

Meanwhile Novell continues development of Moonlight, the open source Linux equivalent to Silverlight. Moonlight is running several months behind Silverlight with the current release Moonlight 2 corresponding to Silverlight 2. Moonlight 3 release is expected sometime this Summer although previews have been available since February and the latest version (preview 6) made available last week is in pretty good shape.

With that background, I thought it would be useful to state now how these changes are affecting InScribeX Web.

InScribeX Web Preview 2
I made this version available in July last year to run on Silverlight 2 not long before Silverlight 3 appeared. This version is still available and runs with Moonlight 2 on Linux as well as Silverlight 2, 3 or 4. I expect this version to remain live, unchanged, until a few weeks after Moonlight 3 release. At that point I expect to retire this version as there will be no need to continue Silverlight 2 compatibility.
 
InScribeX Web Preview 3
This version is written for Silverlight 3 and tested to work with Moonlight 3 previews and the Silverlight 4 release. I've held back on making this available until Silverlight 4 has been released and out in the field for a couple of weeks, so an early May release date is probable. As it stands this version is not greatly enhanced over Preview 2, although I've changed the interface to make more effective use of screen space and made a bunch of behind the scenes changes. This redesign especially benefits netbooks and other low resolution devices. I'll probably migrate some features of Preview 4, in particular some UMdC support, back into Preview 3 for the sake of Linux users once Moonlight 3 is available and before the Moonlight 4 release. Once Moonlight 4 is available (Winter?) this version can be retired and Preview 4 used cross platform.
 
InScribeX Web Preview 4
Preview 4 requires Silverlight 4 or later and makes use of some of the new functionality, notably to enable rich text editing and printing. Probably late May/early June for the first cut then some incremental changes to follow during the rest of the year. Preview 4 introduces and supports a new file format for texts incorporating Ancient Egyptian, namely UMdC (Unicode Manuel de Codage).
 
About UMdC
The question of file formats for hieroglyphs and Ancient Egyptian has been a thorn in my side for some time. The problem is not how to devise more interesting and powerful ways of representing Egyptian but rather how to evolve current ways of working with hieroglyphs without adding unnecessary complications in foreseeable future directions. I've finally settled on this UMdC approach as the simplest solution to remove this blockage. A topic I hope to cover in more depth tomorrow.