Thursday, 30 September 2010

Simplified Egyptian: A brief Introduction

I coined the term ‘Simplified Egyptian’ several years ago as a technical approach to making Ancient Egyptian in hieroglyphs more useable in the modern digital world (see HieroglyphsEverywhere.pdf, Bob Richmond, 2006).

The snag in creating an implementation has long been external factors such as the status of web browsers and Word Processors, along with the associated de-facto or formal industry standards. The devil is in the detail and there are many idiosyncrasies in modern technology once one departs from the everyday. A notion like Simplified Egyptian would be no more than a curiosity if it were not widely accessible on personal computers and other digital devices.

One factor in the equation was the need to include Egyptian Hieroglyphs in the Unicode standard (published in Unicode 5.2, October 2009). Implementations of 5.2 are slowly becoming available; for instance Google web search now accepts hieroglyphs although Microsoft Bing and Yahoo search do not yet. Another key factor is support by Internet browsers. Firefox 3.6 looks viable now and I expect the latest versions of other popular browsers to support Egyptian to some degree within the next few months.

As various pieces of the technical puzzle appear to be coming together in the 2011 timeframe I thought it would be useful to summarise now what I see Simplified Egyptian being about. I envisage putting more flesh on the bones in future blog posts on a prototype implementation as leisure time permits (this is an unfunded project at present so time is the enemy).

Simplified Egyptian (SE) works as follows.

1. Define a subset of the Unicode 5.2 list of characters for Egyptian Hieroglyphs, avoiding variants and rarely used (in Middle Egyptian) characters.
2. Define fixed rules for combining hieroglyphs into groups so these rules can be implemented in TrueType/OpenType fonts or alternative rendering methods.
3. Use left to right writing direction.
4. Define data tables and algorithms for text manipulation and sorting.
5. Define ‘normalized forms’ for guidance on ‘correct’ ways of writing and processing Simplified Egyptian.
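
As a rough illustration of steps 1 and 4, here is a minimal Python sketch of checking text against a palette and sorting with a data table. The palette and the ordering table are hypothetical placeholders invented for this example; they are not the actual SE subset or sort order, which are not listed in this post. Only the block range U+13000 to U+1342F comes from the Unicode standard.

```python
# Egyptian Hieroglyphs block as defined in Unicode 5.2.
BLOCK_START, BLOCK_END = 0x13000, 0x1342F

# Hypothetical palette: three arbitrary code points, NOT the real SE list.
SE_PALETTE = {0x1313F, 0x131CB, 0x13171}

# Hypothetical sort table: code point -> rank. A real table would cover the palette.
SE_ORDER = {0x131CB: 0, 0x1313F: 1, 0x13171: 2}


def in_hieroglyph_block(ch):
    """True if the character lies in the Egyptian Hieroglyphs block."""
    return BLOCK_START <= ord(ch) <= BLOCK_END


def outside_palette(text, palette=SE_PALETTE):
    """List hieroglyphs used in text that are not in the palette (first use order)."""
    found = []
    for ch in text:
        if in_hieroglyph_block(ch) and ord(ch) not in palette and ch not in found:
            found.append(ch)
    return found


def sort_key(word, order=SE_ORDER):
    """Key for sorted(): rank of each sign; signs missing from the table sort last."""
    return [order.get(ord(ch), len(order)) for ch in word]


if __name__ == "__main__":
    sample = "\U0001313F\U00013000"
    print([hex(ord(ch)) for ch in outside_palette(sample)])
    print(sorted(["\U0001313F", "\U000131CB"], key=sort_key))
```

A table-driven key like this keeps the sorting rules in data rather than code, which is the spirit of step 4, although a real implementation would also need to deal with groups and normalized forms.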

A more recent notion - Super-Simplified Egyptian (SSE) - takes these principles further by identifying an even more condensed subset of the Hieroglyphic script, a proper subset of SE, with a palette of fewer than 200 hieroglyphs.

There is no question that the SE method is highly anachronistic, SSE extremely so. Nevertheless, there is some utility in the approach.

I am also aware that superficially what I’m proposing suggests a flavour of modernised Egyptian at odds with the requirements of Egyptology for working with an ancient language with script usage that evolved over 3000 years. I will make no apologies for the fact that this is indeed one application and if SE encourages wider understanding of Egyptian, albeit at a reduced technical level, that is no bad thing in my opinion. Nevertheless, the most interesting aspect from my own perspective is the question of how to use such a mechanism to enable improvements for academically sound publication and study of ancient texts in the context of 3000 years of language/script evolution. A non-trivial topic I shall not touch on further today.

My plan is to make available some small working examples of Simplified Egyptian on a series of web pages during the next few weeks. These examples use a WOFF (Web Open Font Format) font derived from my InScribe font. The reasons for doing this now are twofold.

1. A new generation of Internet Browsers pays greater attention to industry standards and should be capable of supporting Simplified Egyptian. Firefox 4 and Internet Explorer 9 are in Beta at the moment and I want to make the samples available for browser testing in case there is a need to shake out any browser bugs.

2. InScribe 3 for Windows will not use Simplified Egyptian. Originally it was my intention that SE would be a feature but it turned out to introduce too many complications in modes of use. Nevertheless, InScribe 3 retains some ‘SE-friendly’ characteristics and I want to be able to test these for real on the Web as I complete work on the software.

The samples will not work for users of older browser technology (right now this is a high ninety something per cent of internet capable devices). My short term concern is only that an elegant and simple to use implementation works as and when devices gain adequate support for internet standards.

That is not to say that workarounds can’t be contrived for devices whose manufacturers or users are not able or prepared to adapt to the new standards-based internet landscape. I'm happy to hear of any proposals.

Right now, this means use Firefox 3.6 or later to view samples as intended. I’m also tracking Chrome, Internet Explorer 9 Beta and Safari releases.

Wednesday, 29 September 2010

Browser of the month: Firefox

My post yesterday Quick test for Ancient Egyptian in web browsers (September 2010) actually exposed three bugs.

None of which involved Firefox. In fact Firefox 3.6 and later correctly display transliterations and hieroglyphs on a Windows system with a suitable Unicode 5.2 font containing hieroglyphs and the other characters.

The bugs are:
1. Latest releases of Chrome, Internet Explorer (8 and 9 Beta) and Safari do not pick up that there is a local font with hieroglyphs. Basically a bug with Unicode 5.2 support I think. Attn: Apple, Google, Microsoft.
2. The same three browsers incorrectly process characters in the SMP given as character references in UTF-16 surrogate form, e.g. &#xD80C;&#xDD3F;. Firefox is correct in displaying this pair as two bad characters per HTML specifications. This UTF-16 type of surrogate representation is not valid HTML: in my example the correct character reference is &#x1313F; (𓄿). Attn: Apple, Google, Microsoft.
3. The Blogspot post editing software gratuitously changed my UTF-8 text into surrogate-pair character references such as &#xD80C;&#xDD3F; for no apparent reason. The editing software is also buggy when I try to re-edit the post. Attn: Google, Blogger.
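
To make the distinction in bug 2 concrete, here is a small Python sketch that computes both forms for a character outside the BMP. The function names are my own for illustration. The arithmetic is the standard UTF-16 surrogate calculation, and Python's `html.unescape` is used only as a convenient stand-in for an HTML parser's handling of numeric references.

```python
import html


def numeric_reference(ch):
    """Valid HTML numeric character reference: one reference for the code point."""
    return "&#x{:X};".format(ord(ch))


def utf16_surrogates(ch):
    """High and low UTF-16 surrogates for a character above U+FFFF."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("BMP characters have no surrogate pair")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def surrogate_references(ch):
    """The invalid form: each surrogate written as its own character reference."""
    high, low = utf16_surrogates(ch)
    return "&#x{:X};&#x{:X};".format(high, low)


if __name__ == "__main__":
    vulture = "\U0001313F"  # the hieroglyph used as the example in bug 2
    good = numeric_reference(vulture)
    bad = surrogate_references(vulture)
    print(good, html.unescape(good) == vulture)
    # Each surrogate reference decodes to U+FFFD, i.e. two "bad characters".
    print(bad, [hex(ord(c)) for c in html.unescape(bad)])
```

The surrogate form only makes sense inside UTF-16 encoded data; a character reference names a code point, so surrogate values have no business appearing in one.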

So a gold star to Mozilla/Firefox.

The wooden spoon ought to go to Google for hitting all three bugs but in mitigation I'll observe that hieroglyphs are now supported by Google search (unlike the situation with Microsoft Bing and Yahoo who haven't even caught up with Unicode 5.1 never mind 5.2. A tribute to corporate lethargy - wake up guys).

After discovering the Blogger bug, I've opened a secondary blog on WordPress - Journal of Total Obscurity 2. For the time being this remains my main blog but WordPress will be used for posts with hieroglyphs.

I've retained my original post here 'Quick test for Ancient Egyptian in web browsers (September 2010)' so bugs can be monitored but I've uploaded the correct version at Test page for Ancient Egyptian Hieroglyphs in Unicode (September 2010) on WordPress.

Tuesday, 28 September 2010

Quick test for Ancient Egyptian in web browsers (September 2010)

A quick test note to check Ancient Egyptian in Web browsers.

If you have a (Unicode 5.2 compatible) Egyptian font installed on your system, the next few lines ought to make sense:


(in MdC this Egyptian transliteration reads +taa*AA*iIwWbBpPfFmMnNrRhh*HH*xx*XX*ss*SS*qq*kKgGtt*TT*dd*DD*)


(in MdC these Egyptian hieroglyphs read +s-A-i-y-a-w-b-p-f-m-n-r-h-H-x-X-s-S-q-k-g-t-T-d-D)

In fact, this is a FAIL for hieroglyphs today on Windows for Chrome (6.0.472.63), Firefox (3.6.10), Internet Explorer 9 (Beta 9.0.7930), and Safari (5.0.2). Only Firefox successfully displays the transliteration.

Tantalizingly, the Firefox edit box does work:

Technically, all a browser needs to do is enumerate all fonts on the host system and, if the font implicit in the HTML is not present, use any font that supports the characters if available. Perhaps there needs to be some magic setting in the TrueType fonts for the browsers to work, although this ought not to be necessary, so I will count this as a multi-browser bug.
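
The fallback idea can be sketched in a few lines of Python. This is a simplified illustration and not how any particular browser works: the font names and coverage sets are made up, and the fallback is applied to a whole string, whereas real browsers generally choose fallback fonts character by character. The sketch also falls back when the requested font is installed but lacks the characters, which is my own assumption.

```python
def pick_font(text, requested, installed):
    """Choose a font able to display every character in text.

    installed maps a font name to the set of code points it covers
    (hypothetical data; a real system would read this from the font files).
    Returns the requested font if it is present and sufficient, otherwise the
    first installed font covering all the characters, otherwise None.
    """
    needed = {ord(ch) for ch in text}
    if requested in installed and needed <= installed[requested]:
        return requested
    for name, coverage in installed.items():
        if needed <= coverage:
            return name
    return None


if __name__ == "__main__":
    # Made-up fonts: one Latin-only, one covering the hieroglyph block.
    fonts = {
        "LatinOnly": set(range(0x20, 0x7F)),
        "EgyptianSample": set(range(0x13000, 0x13430)),
    }
    print(pick_font("\U0001313F", "LatinOnly", fonts))
    print(pick_font("abc", "LatinOnly", fonts))
```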

The lines should read:

Update. This site, Blogger, turned my HTML hieroglyph strings into entities, e.g. hieroglyphs in UTF-8 into surrogate-pair character references such as &#xD80C;&#xDD3F; etc. Firefox has a bug in this case (entities in Unicode SMP) but not when raw UTF-8 is used in HTML, so Firefox is very close to working; indeed it is good for many web pages. Blogger is a bit broken: the entities are simply confusing and bring nothing to the party.

Monday, 27 September 2010

Q. Why is my screen too small?

A. It’s a tradition.

Last month I mentioned the 25 year old RM Nimbus PC-186 and its 640x250 display. 250 was the number of lines displayable on a ‘CGA’ class CRT monitor of that time (more precisely at 50/60Hz non-interlaced). The 14" CGA was the only mass produced monitor available at a reasonable price in 1985 and it was that fact as much as the cost of the driver electronics that influenced the low resolution choice of display mode. By 1988, 14" ‘VGA’ type monitors were in mass production at 640x480 resolution and these soon gained higher definition 800x600.

During 1987-1993 one part of my job with RM involved working with a series of US Silicon Valley based companies who were growing the capabilities of PC graphics systems into the affordable market. Computer graphics has always been a personal interest so it was fun to be involved in bringing out the then emerging technology that is nowadays taken for granted. My main role was writing device drivers for Windows and working with the chip designers to boost performance. During this period, the ‘holy grail’ was to reach 1024x768 24bit colour with an inexpensive design, a point we reached for the first time with a Cirrus Logic chip in 1993. This hit acceptable performance goals for Windows 3.1, removing the need for the transient 256 colour type displays popular for a while but problematic from an application programming point of view.

Two flies in the ointment. 1. Computer monitor manufacturers took a long time to come around to the obvious fact that 14"/15" CRT displays were too small for applications like word processing and spreadsheets. The sweet spot was 17"/19" but it seemed to take forever before it was accepted this was a volume market and the price benefits of mass production held sway (21" and above were cool but too unwieldy in CRT except for specialist applications such as CAD). 2. Most employers, schools and universities regarded it as acceptable to save a hundred dollars or so even if that meant seeing armies of highly paid employees and students hunched over small monitors peering at a fraction of a spreadsheet or page of text.

So much for history, though I’ll repeat the point that it seems to be a well-established tradition to use displays that are too small for purpose. Eventually things get better so nowadays good flat screen displays for desktop computers are very affordable. Although last year I visited the newsroom of a popular newspaper and it was almost laughable to see journalists and typesetters using displays that were obviously too small to efficiently work with a tabloid format. All to save the cost of a lunch or two!

Moore's Law in the twenty-first century. Electronics shrinking to give high functionality with reduced power consumption, and the consequential growth of small format computing: laptops, netbooks, smartphones, tablets, eBook readers. In each case the same pattern. Early devices have less than usable screen sizes and not just for reasons of manufacturing cost. Product marketing tries to avoid the fact that the emperor has no clothes. Keen leading edge users, in denial, claim it's all OK. Markets learn and devices gradually move to something more ergonomic and pleasant to use.

This topic came to mind while I was tweaking an InScribe design for netbooks (typically a usable 1024x600 10" display nowadays after that unfortunate early fad for the 90s-retro 800x480 resolution on 7" in 2008) and reading today’s announcement of the upcoming RIM ‘PlayBook’ device (7" LCD, too small for its aspirations in my opinion; see the, I expect ill-fated, Dell Streak). Not that 7"/8" is a bad format for many purposes (note to Amazon with the 6" Kindle, and to Sony: try measuring a paperback book!).

Incidentally whatever the flaws in the first generation iPad, 9.7" is not dramatically smaller than the optimum size for purpose so kudos to Apple for bucking the usual pattern (although I personally think 11-12" touchscreen hits the right compromise between portability and function).

So if you find yourself peering at the internet through a 3.5" supposedly state of the art smartphone remember that for users to suffer for a while is a tradition, you are paying the price of being a part of history in the making, and things will soon get better (better for smartphones I suspect means about a 4.2-4.5" with narrow bezel compromise in current tech).

PS. Inches not metric; another tradition.

Thursday, 23 September 2010

Document embedding and OLE

There are a number of technical issues concerned with what I'm attempting to accomplish with the InScribe software. The biggest thorn in my side is the issue of embedding, visualising, and editing embedded data in compound documents. This is therefore something of a background note on the topic.

The notion of embedding or linking an ‘object’ in a document is commonplace. Web pages incorporate pictures, videos and specialized objects such as Flash or Silverlight interactive components. Word processing documents likewise contain objects, sometimes interactive objects, alongside the text. The idea of ‘compound document’ goes back to the 1980s.

The current situation with embedded objects is chaotic. Examples. Microsoft Office and Open Office, the two most popular office suites, have different schemes for add-ins. Even limiting attention to one vendor, Microsoft Office has added features in the 2003, 2007 then 2010 editions: all well and good but makes it difficult to support a diverse user base (Office 2003 is still widely used). Firefox, Chrome, Internet Explorer each has its own plugin approach and the story gets more complicated when considering non-Windows Internet browsers.

We are sorely missing standard, flexible, open approaches to embedding. On the web side of the coin, HTML5 is a move in the right direction though in itself no panacea despite what some less technically minded commentators may say on the subject (a topic for a future blog entry!).

A concrete example. There is no bottom line hardware-related reason nowadays that a simple photo editor plug-in software component could not operate on devices as diverse as eBook readers, smartphones, tablet computers, as well as notebook and desktop computers. Such a plug-in could enrich many applications, not only web browsing but word processors and camera management tools. In our Babel like world of 2010, to accomplish such a thing in software would involve writing versions for iOS, OS X, Android, Windows XP, Windows 7, Symbian, Kindle, Blackberry, Gnome, Wii... the list goes on. The developer then needs to tackle application-specific rules for plug-ins (if such exist). A relatively simple piece of software is almost impossible to deploy widely.

I am not advocating one ring to rule them all, simply highlighting the severe lack of standard ways for applications and components to interoperate and deploy.

In fact, for Microsoft Windows applications there has been one solution for almost 20 years. OLE (Object Linking and Embedding) was introduced to Windows in 1990 and expanded substantially to version 2 in 1993. Parts of the OLE system were renamed ActiveX controls in 1996. I’ve exploited OLE with the InScribe software to enable in-place editing of Ancient Egyptian in Word Processing and other applications.

OLE has never been an ideal technology solution. Parts are overcomplicated and error prone during development. OLE design is too specific to classic Windows architecture. There was originally insufficient attention given to security issues although this is now largely addressed. Nevertheless, for Windows applications, OLE provided some solutions to the big problem of making applications and components work together by defining standard rules for interoperability of functions like embedding and compound documents.

Unfortunately, OLE development by Microsoft pretty much stopped well over a decade ago, more a victim of fashion rather than any logical reason I suspect. One side effect is that something as fundamental as how copy and paste works between Windows applications is still stuck in a 1990s time-warp.

As the internet grew and new technologies such as Java and .Net became available, inasmuch as there was any attempt to address application interoperability, solutions tended to be product specific. Microsoft themselves targeted ‘the enterprise’ with less focus on the general personal computer user. Rather than an OLE philosophy where third parties can expand the capabilities of Windows and its applications in a general way, Microsoft Office became focussed on enterprise oriented Office specific add-ons. Linux and other alternatives failed to rise to the challenge of developing a more open and flexible approach.

OLE continues to be supported in Windows and Office, but pretty much maintenance only. OpenOffice and other products continue likewise. I have not looked at the latest Adobe Creative Suite (CS5) but recall an earlier move around CS3 to remove some OLE functionality. Microsoft Office Word 2010 runs OLE embedding in a compatibility mode. I don’t expect OLE to disappear in the next few years but it is certainly becoming less usable.

If anyone reading this understands why the issue of application interoperability and OLE type functionality is missing from .Net and WPF I'd be delighted to find out.

This slow decline of OLE and the lack of practical modern alternatives proved a stumbling block in my development of a new version of the InScribe for Windows software. I expect other developers are in a similar situation of having to make some undesirable compromises in order to get a product released. On a positive note, the problem has stimulated a number of interesting ideas for future directions and I hope to touch on these here during the next few months.

Meanwhile a call for anyone working on document embedding to please learn from the past. Let's try to ensure whatever is latest and greatest at least accomplishes what OLE did 20 years ago (and still almost does).