Silverlight made it onto the BBC News on Tuesday – Coders decry Silverlight change. Take an unfortunate choice of words by a senior executive or two; add the reactive and ill-informed commentators on some web message boards; then mix in some natural concerns from developers. Bang! Tempests in teacups, the media love them.
Personally speaking, I find it reassuring to observe that the amateur tradition is alive and well in Microsoft and at least one major multinational company is not self-wrapped in a cloak of PR and spin-doctoring. That being said, the last few minutes of Doctor Who The Christmas Invasion ought to be made compulsory viewing for all senior executives.
As a developer I'm happy so long as .Net is treated as a strategic family of products. Thanks to Novell it may become so on Unix/Linux too (even if the Linux ‘community’ is slow to recognize what the third wave of Unix is really about). Hey, there's another tabloid headline: C/C++ is dead!
I hope I'm not alone in being pleased to learn Silverlight 5 is not being rushed out. Especially if it means some of the niggles are resolved and the SL/WP7/WPF portability model improved. And Unicode 6.0 of course! A Mix 2011 Beta with Summer release please.
Two real news stories for developers:
An interesting talk at PDC 2010 for C# developers: ‘The Future of C# and Visual Basic’ by Anders Hejlsberg – don’t be put off like I almost was by the Visual Basic tag; it is hardly mentioned so we are not subjected yet again to the irony implicit in the keyword Dim. The main theme is simplification of asynchronous programming with the new await keyword for the next .Net revision. Along with parallel constructs, this pattern brings very useful ways of exploiting multi-core processors to .Net in a clean software design. The talk is summarised here.
Developers in the .Net/WPF/Silverlight space should also check out PDC 2010: 3-Screen Coding: Sharing code between Windows Phone, Silverlight, and .NET by Shawn Burke. I alluded to the value of portable code last month in Of Characters and Strings although I didn’t highlight the .Net 4 changes that enable sharing of binary assemblies (a topic in its own right). The new tooling for Visual Studio to assist in creating Portable Assemblies, as previewed by Shawn, should be very helpful in managing the shared assembly model. It should also help focus Microsoft development on removing some of the irritating incompatibilities between Silverlight and WPF.
I just can’t wait for await.
Bob Richmond
Thursday, 4 November 2010
Friday, 22 October 2010
Of Characters and Strings (in .Net, C#, Silverlight …): Part 2
"Your manuscript I’ve read my friend
And like the half you’ve pilfered best;
Be sure the piece you yet may mend –
Take courage man and steal the rest."
Anon (c. 1805)
As mentioned in Of Characters and Strings (in .Net, C#, Silverlight …): Part 1 many of the text processing features of .Net are such that code can be re-used over Silverlight, WPF, XNA, Mono, Moonlight etc. In this note I want to draw attention to a few key points and differences over platforms. It is far from a comprehensive analysis and I’ve skipped some important topics including collation, the variety of ways of displaying text, and input methods.
A quick aside: I failed to highlight in Part 1 how 16-bit confusion in software is not limited to .Net, similar issues arise in Java (although Java has some helpful UTF-32 methods that .Net lacks) and the situation is far worse in the world of C/C++.
I’ll restrict attention to Silverlight 4, Silverlight 3 for Windows Phone 7 OS (WPOS7.0), and .Net 4. In case the significance of the opening quote passes the reader by, it is to be hoped that Silverlight 5 and any new branches on the .Net tree feature increased compatibility, deprecating certain .Net 4 elements where appropriate.
.Net Text in general
Many Silverlight (and WPF) programmers rely mainly on controls and services written by others to deal with text so are unfamiliar with the lower level implications – in this case the main thing is to make sure you test input, processing, and display components with text that contains non-BMP characters, combining character sequences, and a variety of scripts and fonts. You may also like to check out different culture settings. Relying on typing qwerty123 on a UK/US keyboard and using the default font is a sure-fire recipe for bugs in all but the simplest scenarios.
.Net text is oriented around UTF-16 so you will be using Unicode all or almost all the time. The dev. tools, e.g. Visual Studio 2010 and Expression Blend 4, work pretty well with Unicode and cope with the basics of non-BMP characters. I find the main nuisance with the VS 2010 code editor is that only one font is used at a given time; I can’t currently combine the attractive Consolas font with fall-back to other fonts when a character is undefined in Consolas.
For portability over .Net implementations, the best bet at present is to write the first cut in Silverlight 3 since for this topic SL3 is not far off being a subset of .Net 4. A good design will separate the text functionality support from the platform-specific elements, e.g. it’s not usually a good idea to mix text processing implementation in with the VM in a MVVM design or in the guts of a custom control. I often factor out my text processing into an assembly containing only portable code.
Text display on the various platforms is a large topic in its own right with a number of issues for portability once you go beyond using standard controls in SL, WPF etc. I’ll not go into this here. Likewise with text input methods.
Versions of Unicode
It would be really useful if the various flavours and versions of .Net were assertive on text functionality, e.g. Supports Unicode 6.0 (the latest version) or whatever. By support, I mean the relevant Unicode data tables are used (e.g. scripts, character info) not to imply that all or even most functionality is available for all related languages, scripts and cultures. Currently, the reality is much more fuzzy and there are issues derived from factors such as ISO/Unicode scripts missing in OpenType. Methods like Char.IsLetter can be useful in constructing a 'version detector' to figure out what is actually available at runtime.
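The 'version detector' idea can be sketched roughly as follows. This is only an illustration: the class name UnicodeProbe and the choice of probe characters are my own assumptions, and a single letter per version is a crude indicator rather than proof that the full tables for that version are present.

```csharp
using System;

public static class UnicodeProbe
{
    // Hypothetical helper: true if the runtime's character tables classify
    // the character starting at index 0 of 'probe' as a letter.
    public static bool KnowsLetter(string probe)
    {
        return Char.IsLetter(probe, 0);
    }

    // Reports the newest probed Unicode version whose sample letter is recognised.
    // The probe characters are illustrative choices, not an official list.
    public static string NewestRecognised()
    {
        if (KnowsLetter("\u0840")) return "6.0 or later";   // MANDAIC LETTER HALQA
        if (KnowsLetter("\uA4D0")) return "5.2";            // LISU LETTER BA
        if (KnowsLetter("\uA500")) return "5.1";            // VAI SYLLABLE EE
        if (KnowsLetter("\U00010900")) return "5.0";        // PHOENICIAN LETTER ALF
        return "older than 5.0";
    }
}
```

The result depends entirely on the runtime in use, so the same assembly may report different answers on different .Net flavours.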
Char structure
The .Net Char structure is a UTF-16 code unit, not a Unicode character. Current MSDN documentation still needs further updates by Microsoft to avoid this confusion (as noted in Part 1). Meanwhile the distinction should be kept in mind by developers using Char. For instance although the method Char.IsLetter(Char) returns true if the code unit in question is a letter-type character in the BMP, IsLetter(Char) is not in general a function you would use when looking for letters in a UTF-16 array.
It is therefore often a good idea to use strings or ‘string literal’ to represent characters, in preference to Char and ‘character literal’. Inexperienced programmers or those coming from a C/C++ background may find this odd to begin with, being familiar with patterns like
for (int i = 0; i < chars.Length; i++) { if (chars[i] == 'A') { /* etc. */ } }
Fortunately, Char provides methods to work correctly with characters, for instance Char.IsLetter(String, Int32). Char.IsLetter("A",0) returns true and the function works equally well for Char.IsLetter("\U00010900",0) so long as your platform supports Unicode 5.0 (the version of Unicode that introduced U+10900 𐤀 PHOENICIAN LETTER ALF). Note that in C# the \u escape takes exactly four hexadecimal digits, so a non-BMP character needs the eight-digit \U form.
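As a minimal sketch of the string-and-index pattern, here is a letter counter that steps over surrogate pairs. The class and method names are hypothetical, and U+10900 is written as "\U00010900" because the C# \u escape only accepts four hex digits.

```csharp
using System;

public static class LetterCounter
{
    // Counts letters by code point rather than by UTF-16 code unit.
    public static int CountLetters(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            // The (String, Int32) overload looks at the whole surrogate pair
            // when one starts at index i.
            if (Char.IsLetter(text, i)) count++;

            // Step over the low surrogate so the pair is only visited once.
            if (Char.IsSurrogatePair(text, i)) i++;
        }
        return count;
    }
}
```

For example, CountLetters("A\U00010900b1") visits four characters held in five code units; whether the Phoenician letter is counted depends on the Unicode tables of the runtime.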
Char is largely portable apart from a few quirks. I find it especially puzzling that Char.ConvertFromUtf32 and Char.ConvertToUtf32 are missing from SL 3/4. Faced with this I wrote my own conversions and use these even on .Net 4.0 where the methods are available in Char, this way keeping my code portable and efficient.
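For illustration, portable conversions along these lines need only integer arithmetic. This is a sketch rather than the code I actually ship: the names Utf32Portable, FromCodePoint and ToCodePoint are made up for the example, and unlike Char.ConvertToUtf32 the sketch returns an unpaired surrogate as-is instead of throwing.

```csharp
using System;

public static class Utf32Portable
{
    // Code point -> UTF-16 string (one or two code units).
    public static string FromCodePoint(int codePoint)
    {
        if (codePoint < 0 || codePoint > 0x10FFFF ||
            (codePoint >= 0xD800 && codePoint <= 0xDFFF))
        {
            throw new ArgumentOutOfRangeException("codePoint");
        }
        if (codePoint < 0x10000)
        {
            return new string((char)codePoint, 1);
        }
        int offset = codePoint - 0x10000;
        char high = (char)(0xD800 + (offset >> 10));   // top 10 bits
        char low = (char)(0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new string(new char[] { high, low });
    }

    // UTF-16 string + index -> code point starting at that index.
    public static int ToCodePoint(string text, int index)
    {
        char c = text[index];
        if (c >= 0xD800 && c <= 0xDBFF && index + 1 < text.Length)
        {
            char d = text[index + 1];
            if (d >= 0xDC00 && d <= 0xDFFF)
            {
                return 0x10000 + ((c - 0xD800) << 10) + (d - 0xDC00);
            }
        }
        return c; // BMP code unit, or an unpaired surrogate returned as-is
    }
}
```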
I also wrote a CharU32 structure for use in UTF-32 processing (along with a StringU32) rather than extend Char to contain methods like IsLetter(int) for UTF-32 (as is done in Java). Makes for cleaner algorithms and something Microsoft may want to consider for .Net futures.
The CharUnicodeInfo class in SL 3/4 is a subset, missing GetDecimalDigit, a rather specialist helper for working with decimal numbers in non-Latin scripts and in most cases one would use GetNumericValue which is portable.
CharUnicodeInfo.GetUnicodeCategory and Char.GetUnicodeCategory have subtle differences.
String class
The .Net String is a sequential collection of Char objects whose value is immutable (i.e. its value is never modified but can be replaced). Informally, String encapsulates a read-only UTF-16 array. As with Char, the MSDN documentation tends to confuse Unicode character with UTF-16 code unit.
Comments above on use of string and string literal for Char also apply here. For instance the method IndexOf(Char) can only be used to find a Char, i.e. code unit. IndexOf(String) must be used to find an arbitrary Unicode character. If you try entering '\U00010900' in C# in Visual Studio 2010, you will be warned “Too many characters in character literal”, a reminder that .NET character literals are not quite characters and "\U00010900" is needed.
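A small sketch of searching for a character held as a string. The wrapper name is hypothetical; StringComparison.Ordinal is my choice here so that the search compares code units exactly and is not affected by culture rules.

```csharp
using System;

public static class FindDemo
{
    // Finds a character (passed as a string, so it may be non-BMP)
    // and returns its position measured in UTF-16 code units, or -1.
    public static int FindCharacter(string text, string character)
    {
        return text.IndexOf(character, StringComparison.Ordinal);
    }
}
```

In "ab\U00010900c" the Phoenician letter is found at index 2 and the following "c" at index 4, because the letter occupies two code units.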
Much of String is portable. I’ll just mention a few differences:
.Net 4.0 has extra String methods like StartsWith(String, Boolean, CultureInfo), a method which gives more flexibility in working with multiple cultures. SL 3/4 is oriented around current culture and invariant culture so not well suited to multilingual applications.
The whole model of culture and locale in software is way too complex to go into here, I’ll just say that the old fallacies like physical presence in Iceland means a user can speak Icelandic and never spends US dollars or that individuals use only one language have no place in the 21st Century.
SL 3/4 is missing the Normalize method, a very handy function when working with Unicode.
SL 3/4 has fewer constructors available to the programmer than .Net 4 but some of those missing are error prone and wouldn’t normally be used in .Net 4.
StringBuilder class
Useful, gives measurable performance improvements when constructing a lot of strings.
Very portable, just a few differences, I’ve not encountered these in real life code yet. Rather odd that Append(Decimal) is missing from SL 3/4 though.
StringInfo class
StringInfo is designed to work with actual Unicode characters so is very useful. I get the impression it is less well known among programmers than it ought to be.
Largely portable. However SL 3/4 is missing the SubstringByTextElements methods and LengthInTextElements both of which are handy for processing Unicode correctly in readable code. I’ve found myself simulating something similar to gain portability, although that being said, the portable enumeration methods make for a more common pattern.
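The kind of simulation mentioned above can be sketched with the enumerator alone. The class and method names below are hypothetical stand-ins for LengthInTextElements and SubstringByTextElements, not the framework's own implementation.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class TextElements
{
    // Number of text elements (surrogate pairs and combining sequences count once).
    public static int Count(string text)
    {
        int count = 0;
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(text);
        while (e.MoveNext()) count++;
        return count;
    }

    // Substring addressed by text element index rather than code unit index.
    public static string Substring(string text, int start, int length)
    {
        StringBuilder sb = new StringBuilder();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(text);
        int index = 0;
        while (e.MoveNext())
        {
            if (index >= start && index < start + length)
            {
                sb.Append(e.GetTextElement());
            }
            index++;
        }
        return sb.ToString();
    }
}
```

For "a\U00013000e\u0301" the String.Length is 5 but Count gives 3: the letter a, the hieroglyph, and e with its combining acute accent.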
Encoding classes
Encoding classes are often used while reading or writing text streams or converting from one text format to another, usually with a class derived from Encoding: SL 3/4 only provides the UnicodeEncoding (UTF-16) and UTF8Encoding classes, in contrast to .Net 4 which also has ASCIIEncoding, UTF7Encoding, and UTF32Encoding.
Dropping ASCII and UTF-7 is understandable, but the omission of UTF32Encoding is hard to see as anything other than an oversight, given that UTF-32 is a useful weapon in the armoury of Unicode programming (although hardly ever a good idea for file streams). One of the first things I did in the early days of Silverlight 2 beta was to write conversion code. I hope Microsoft will add this to Silverlight 5, although for portability reasons it may be years before we can settle down with it.
The base Encoding class in .Net 4 has quite a number of extra methods and properties for Mail and Browser purposes as well as codepage handling. I tend to agree with the decision in SL to drop some of this complexity and discourage use of legacy stream formats even if that means occasionally custom code is required to convert ‘Windows ANSI’, ‘Mac OS Roman’ and the ISO/IEC 8859 series of 8 bit encodings.
The main features of UTF8Encoding such as BOM handling and exceptions are portable. Unicode normalization features are missing from SL 3/4 again, as with the String class.
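A brief sketch of the two UTF8Encoding features just mentioned, using the two-argument constructor (emit BOM, throw on invalid bytes). The wrapper class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class Utf8Demo
{
    // The byte order mark that a BOM-emitting UTF8Encoding reports as its preamble.
    public static byte[] Bom()
    {
        return new UTF8Encoding(true).GetPreamble();
    }

    // True if the bytes decode cleanly as UTF-8.
    public static bool IsValidUtf8(byte[] data)
    {
        // Second argument: throw on invalid bytes instead of substituting U+FFFD.
        UTF8Encoding strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(data, 0, data.Length);
            return true;
        }
        catch (ArgumentException)
        {
            // DecoderFallbackException derives from ArgumentException.
            return false;
        }
    }
}
```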
Not the most portable part of .Net.
Tuesday, 19 October 2010
Of Characters and Strings (in .Net, C#, Silverlight …): Part 1
“The time has come,” the Walrus said,
“To talk of many things:
Of shoes—and ships—and sealing-wax—
Of characters—and strings—
And why the sea# is boiling hot—
And whether pigs have wings.”
(With apologies to Lewis Carroll, the Walrus, and the Carpenter).
During discussion of my comments ISO/Unicode scripts missing in OpenType on the Unicode mailing list, the point came up about desirability of greater understanding of Unicode among programmers and others involved with software development. For a start, there is one popular myth to dispel, the subject of this post which I hope to be the first of several notes on Unicode in .Net.
Myth debunk: a Unicode character is neither a cabbage nor a 16 bit code.
The origin of 16-bit confusion lies in the history of Unicode. Twenty years ago there were two initiatives underway to replace the already out-dated and problematic variety of 7/8-bit character encodings used to represent characters in modern scripts. A true Babel of ‘standard’ encodings back then made it impractical to write software to work with the world’s writing systems without a tremendous level of complexity. Unicode was originally conceived as a 16 bit coding to replace this mess. Meanwhile, the International Organization for Standardization (ISO) was working on ISO 10646 the ‘Universal Character Set’ UCS with space for many more characters than a 16-bit encoding has room for. The original ISO proposals for encoding were widely regarded as over complex so the ISO/Unicode approaches were merged by the time Unicode 2.0 was released in 1996. ISO 10646 now defines the Universal Character Set for Unicode. With unification, the notion of 16-bit characters became obsolete although a 16-bit encoding method remains (UTF-16) along with the popular 8-bit coding (UTF-8) and a 32-bit coding (UTF-32). Each encoding has its virtues. UTF stands for Unicode Transformation Format.
To understand what constitutes the Unicode notion of ‘character’, refer to http://www.unicode.org/versions/Unicode6.0.0/ (or the earlier version while the text of 6.0 is being completed). I will try to summarize briefly.
1. An abstract character is a unit of information for representation, control or organization of textual data. A Unicode abstract character is an abstract character encoded by the Unicode standard. Abstract characters not directly encoded in Unicode may well be capable of being represented by a Unicode combining character sequence. Each Unicode abstract character is assigned a unique name. Some combining sequences are also given names in Unicode, asserting their function as abstract characters.
2. A Unicode encoded character can be informally thought of as an abstract character along with its assigned Unicode code point (an integer in the range 0 to 10FFFF hexadecimal, the Unicode codespace). As noted above it is also assigned a unique name.
3. A Unicode character or simply character is normally used as shorthand for the term Unicode encoded character.
Here are two useful ways of describing Unicode characters:
U+006D LATIN SMALL LETTER M
U+13000 EGYPTIAN HIEROGLYPH A001
U+1F61C FACE WITH STUCK-OUT TONGUE AND WINKING EYE
And similar with the actual character displayed
U+006D – m – LATIN SMALL LETTER M
U+13000 – 𓀀 – EGYPTIAN HIEROGLYPH A001
U+1F61C – 😜 – FACE WITH STUCK-OUT TONGUE AND WINKING EYE
The first form is often preferable in scenarios where font support might not be present to display the actual character although on this blog I prefer to use the characters to encourage font diversity.
Note the conventional use of hexadecimal to state the value of the Unicode code point. This convention is different to that commonly used in HTML, where characters as numeric entities are usually written using decimal numbers rather than hexadecimal, e.g. &#77824; for 𓀀 (13000 hexadecimal equals 77824 decimal). HTML also accepts the hexadecimal form &#x13000;.
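The two notations can be produced from the same integer, as in this small sketch (the class and method names are hypothetical):

```csharp
using System;
using System.Globalization;

public static class CodePointNotation
{
    // Unicode convention: U+ followed by at least four hexadecimal digits.
    public static string ToUPlus(int codePoint)
    {
        return "U+" + codePoint.ToString("X4", CultureInfo.InvariantCulture);
    }

    // HTML decimal numeric entity for the same code point.
    public static string ToDecimalEntity(int codePoint)
    {
        return "&#" + codePoint.ToString(CultureInfo.InvariantCulture) + ";";
    }
}
```

ToUPlus(0x13000) gives U+13000 and ToDecimalEntity(0x13000) gives &#77824;.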
From a programming perspective, the simplest way of representing Unicode is UTF-32 where each code point fits comfortably into a 32 bit data structure, e.g. in C# a uint or int (C/C++ programmers note C# defines as 32 bit, the size does not vary with CPU register size). Not entirely trivial because there may still be combining sequences. However UTF-32 is not used all that much in practice, not least because of memory cost.
Nowadays, most files containing Unicode text use UTF-8 encoding. UTF-8 uses 1 byte (octet) to encode the traditional 128 ASCII characters and up to 4 bytes to encode other characters. XML and HTML files are popular file formats that use Unicode (Mandatory in XML, optional in HTML where a surprising amount of the web, possibly 50%, still uses legacy encodings). I strongly recommend UTF-8 for text files rather than UTF-16 or legacy 8-bit encodings aka code pages etc. Having worked on several multilingual content-intensive projects, this is the golden rule, although I won’t expand further today on the whys and wherefores. [However I ought to mention the catch that is the ‘Byte order mark’, a byte sequence (0xEF, 0xBB, 0xBF) sometimes used at the start of a UTF-8 stream to assert UTF-8 not legacy text; this can confuse the novice particularly with ‘.txt’ files which can be Unicode or legacy. Windows Notepad uses BOM for Unicode text files. Visual Studio 2010 also uses BOM to prefix data in many file types including XML, XAML and C# code.]
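To see the 1 to 4 byte range in practice, here is a minimal sketch using Encoding.UTF8.GetByteCount (the wrapper class name is hypothetical, and the sample characters are my own picks):

```csharp
using System;
using System.Text;

public static class Utf8Sizes
{
    // Number of bytes needed to encode the text as UTF-8 (no BOM included).
    public static int ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }
}
```

ByteCount gives 1 for "m" (U+006D), 2 for "\u00E9" (é), 3 for "\u20AC" (€) and 4 for "\U00013000" (𓀀).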
UTF-16 is very popular with software writers working in C/C++ and .Net languages such as C#. A version of UTF-16 was the standard data format for Unicode 1.0. Unicode characters with character codes less than 0x10000 are said to belong to the Unicode BMP (Basic Multilingual Plane) and these are represented by one 16 bit number in UTF-16, other characters require two 16 bit numbers i.e. two UTF-16 codes from a range that do not encode characters, the so called surrogate code points dedicated to this purpose. As of Unicode 6.0, fewer than 50% of characters belong to the BMP but BMP characters account for a huge proportion of text in practice. This is by design; all popular modern languages have most script/writing system requirements addressed by the BMP and there are even specialist scripts such as Coptic defined here. Processing UTF-16 is often more efficient than UTF-8 and in most cases uses half the memory of UTF-32, all in all a good practical compromise solution.
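A quick way to see the one-or-two code unit split is to dump the UTF-16 code units of a string. The helper below is a sketch with a hypothetical name:

```csharp
using System;
using System.Text;

public static class Utf16Dump
{
    // Lists the UTF-16 code units of a string as four-digit hex values.
    public static string CodeUnits(string text)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            if (i > 0) sb.Append(' ');
            sb.Append(((int)text[i]).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

CodeUnits("m") gives 006D, whereas CodeUnits("\U00013000") gives D80C DC00 and CodeUnits("\U0001F61C") gives D83D DE1C: a high surrogate followed by a low surrogate in each non-BMP case.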
Which brings me back to the 16-bit myth. The fact that so many popular characters belong to the BMP and only require one code unit in UTF-16 means it is easy to be mistaken into thinking most means all. The problem doesn’t even arise with UTF-8 and UTF-32 but the fact is much software uses UTF-16, indeed UTF-16 is essentially the native text encoding for Windows and .Net.
Example sources of 16-bit confusion:
The article on character sets at http://www.microsoft.com/typography/unicode/cs.htm is brazen:

This article is dated to 1997 but was probably written much earlier. Windows NT 3.1 (1993) was notable as the first computer operating system to use Unicode as its native text encoding and Microsoft deserves credit for this, alongside Apple who also did much to help early uptake of Unicode (but would not have a new operating system until OSX was released in 2001). I’m quoting this as an example of the fact that there are many old documents on the Web, confusing even when from reputable sources. I should mention, in contrast, much of MSDN (and indeed much of the relevant information on Wikipedia) is pretty up to date and reliable although not perfect on this subject.
The definition of the .Net Char structure on MSDN, http://msdn.microsoft.com/en-us/library/system.char.aspx, is much more recent.

Er, no. Char is not a Unicode character. It is a 16 bit Unicode code unit in UTF-16. Actually, this is explained later on in the Char documentation but the headline message is confusing and encourages programmers to use Char inappropriately.
The reasons I chose the Microsoft examples rather than the myriad of other confusing statements on the web are twofold. Firstly I'm focussing on .Net, C# etc. here. Secondly, Microsoft are generally ahead of the game with Unicode compared with other development systems which makes errors stand out more.
Fact is .Net actually works very well for software development with Unicode. The basic classes such as 'String' are Unicode (String is UTF-16) and it is almost true to say it is harder to write legacy than modern.
I had hoped to get a little further on the actual technicalities of working with Unicode characters and avoiding 16-bit pitfalls but time has proved the enemy. Another day.
Just three useful (I hope) points on .Net to conclude.
1. Code that works with String and Char should avoid BMP-thinking, e.g. if you want to parse a String, either avoid tests like IsLetter(Char) or wrap their usage in logic that also handles surrogates.
2. String, Char and the useful StringInfo class belong to the System namespaces and are pretty portable over the gamut of .Net contexts including Silverlight, WPF, XNA as well as the Novell parallel universe with Mono, MonoTouch, Moonlight etc. With a little care it can be straightforward to write text processing code that works across the board to target Windows, Mac, Linux, WP7 and whatever comes next.
3. Always test text-related code with strings that include non-BMP characters, and preferably also with data that includes combining sequences and usage instances of OpenType features such as ligatures.
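As one illustration of the points above, reversing a string Char by Char splits surrogate pairs and detaches combining marks, whereas working by text element keeps them intact. The sketch below uses the portable enumerator; the class name is hypothetical and this is just one way to do it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class SafeReverse
{
    // Reverses a string by text element, so surrogate pairs and
    // base-plus-combining sequences stay together.
    public static string Reverse(string text)
    {
        List<string> elements = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(text);
        while (e.MoveNext())
        {
            elements.Add(e.GetTextElement());
        }
        elements.Reverse();

        StringBuilder sb = new StringBuilder(text.Length);
        foreach (string element in elements)
        {
            sb.Append(element);
        }
        return sb.ToString();
    }
}
```

Test strings such as "a\U00013000b" (non-BMP) and "xe\u0301" (combining sequence) are exactly the sort of data that exposes BMP-thinking in code like this.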
“To talk of many things:
Of shoes—and ships—and sealing-wax—
Of characters—and strings—
And why the sea# is boiling hot—
And whether pigs have wings.”
(With apologies to Lewis Carroll, the Walrus, and the Carpenter).
During discussion of my comments ISO/Unicode scripts missing in OpenType on the Unicode mailing list, the point came up about desirability of greater understanding of Unicode among programmers and others involved with software development. For a start, there is one popular myth to dispel, the subject of this post which I hope to be the first of several notes on Unicode in .Net.
Myth debunk: a Unicode character is neither a cabbage nor a 16 bit code.
The origin of 16-bit confusion lies in the history of Unicode. Twenty years ago there were two initiatives underway to replace the already out-dated and problematic variety of 7/8-bit character encodings used to represent characters in modern scripts. A true Babel of ‘standard’ encodings back then made it impractical to write software to work with the worlds writing systems without a tremendous level of complexity. Unicode was originally conceived as a 16 bit coding to replace this mess. Meanwhile, the International Organization for Standardization (ISO) was working on ISO 10646 the ‘Universal Character Set’ UCS with space for many more characters than a 16-bit encoding has room for. The original ISO proposals for encoding were widely regarded as over complex so the ISO/Unicode approaches were merged by the time Unicode 2.0 was released in 1996. ISO 10646 now defines the Universal Character Set for Unicode. With unification, the notion of 16-bit characters became obsolete although a 16-bit encoding method remains (UTF-16) along with the popular 8-bit coding (UTF-8) and a 32-bit coding (UTF-32). Each encoding has its virtues. UTF stands for Unicode Transformation Format.
To understand what constitutes the Unicode notion of ‘character’, refer to http://www.unicode.org/versions/Unicode6.0.0/ (or the earlier version while the text of 6.0 is being completed). I will try to summarize briefly.
1. An abstract character is a unit of information for representation, control or organization of textual data. A Unicode abstract character is an abstract character encoded by the Unicode standard. Abstract characters not directly encoded in Unicode may well be capable of being represented by a Unicode combining character sequence. Each Unicode abstract character is assigned a unique name. Some combining sequences are also given names in Unicode, asserting their function as abstract characters.
2. A Unicode encoded character can be informally thought of as an abstract character along with its assigned Unicode code point (an integer in the range 0 to 10FFFF hexadecimal, the Unicode codespace). As noted above it is also assigned a unique name.
3. A Unicode character or simply character is normally used as shorthand for the term Unicode encoded character.
Here are two useful ways of describing Unicode characters:
U+006D LATIN SMALL LETTER M
U+13000 EGYPTIAN HIEROGLYPH A001
U+1F61C FACE WITH STUCK-OUT TONGUE AND WINKING EYE
And similar with the actual character displayed
U+006D – m – LATIN SMALL LETTER M
U+13000 – 𓀀 – EGYPTIAN HIEROGLYPH A001
U+1F61C – 😜 – FACE WITH STUCK-OUT TONGUE AND WINKING EYE
The first form is often preferable in scenarios where font support might not be present to display the actual character although on this blog I prefer to use the characters to encourage font diversity.
Note the conventional use of hexadecimal to state the value of the Unicode code point. HTML numeric character references, by contrast, are most often written in decimal, e.g. &#77824; (13000 hexadecimal equals 77824 decimal), although HTML also accepts a hexadecimal form, &#x13000;.
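As a quick check of the hexadecimal/decimal arithmetic above, here is a minimal sketch. Python is used purely as neutral notation (my choice, nothing specific to HTML or .Net), and the helper names u_plus and decimal_entity are made up for illustration.

```python
import html


def u_plus(code_point: int) -> str:
    """Return the conventional U+ notation (at least four hex digits)."""
    return "U+{:04X}".format(code_point)


def decimal_entity(code_point: int) -> str:
    """Return an HTML decimal numeric character reference for a code point."""
    return "&#{};".format(code_point)


hieroglyph = 0x13000  # EGYPTIAN HIEROGLYPH A001

print(u_plus(hieroglyph))          # U+13000
print(decimal_entity(hieroglyph))  # &#77824;

# html.unescape resolves the reference back to the single character.
print(html.unescape(decimal_entity(hieroglyph)) == chr(hieroglyph))  # True
```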
From a programming perspective, the simplest way of representing Unicode is UTF-32, where each code point fits comfortably into a 32 bit data structure, e.g. in C# a uint or int (C/C++ programmers note that C# defines these as 32 bit; the size does not vary with CPU register size). Even this is not entirely trivial because there may still be combining sequences. However, UTF-32 is not used all that much in practice, not least because of memory cost.
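To make the UTF-32 point concrete, here is a small sketch, again in Python as neutral notation. The helper name utf32_units is hypothetical. It shows one 32-bit unit per code point, and that a combining sequence still takes more than one unit.

```python
import struct


def utf32_units(text: str) -> list:
    """Encode text as little-endian UTF-32 and return the 32-bit units."""
    data = text.encode("utf-32-le")  # explicit byte order, so no BOM is added
    return list(struct.unpack("<{}I".format(len(data) // 4), data))


# m, EGYPTIAN HIEROGLYPH A001, FACE WITH STUCK-OUT TONGUE AND WINKING EYE
sample = "m\U00013000\U0001F61C"
print([hex(u) for u in utf32_units(sample)])  # ['0x6d', '0x13000', '0x1f61c']

# One unit per code point, but a combining sequence is still two code points:
# LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT.
print(len(utf32_units("e\u0301")))  # 2
```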
Nowadays, most files containing Unicode text use UTF-8 encoding. UTF-8 uses 1 byte (octet) to encode the traditional 128 ASCII characters and up to 4 bytes to encode other characters. XML and HTML files are popular file formats that use Unicode (mandatory in XML, optional in HTML where a surprising amount of the web, possibly 50%, still uses legacy encodings). I strongly recommend UTF-8 for text files rather than UTF-16 or legacy 8-bit encodings aka code pages etc. Having worked on several multilingual content-intensive projects, this is the golden rule, although I won’t expand further today on the whys and wherefores. [However I ought to mention the catch that is the ‘Byte order mark’, a byte sequence (0xEF, 0xBB, 0xBF) sometimes used at the start of a UTF-8 stream to assert UTF-8 not legacy text; this can confuse the novice, particularly with ‘.txt’ files which can be Unicode or legacy. Windows Notepad uses BOM for Unicode text files. Visual Studio 2010 also uses BOM to prefix data in many file types including XML, XAML and C# code.]
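The byte counts and the BOM catch can be seen in a few lines. This is only a sketch in Python (neutral notation; the helper utf8_length is a name invented here), using the standard codecs module for the BOM constant.

```python
import codecs


def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for the given text."""
    return len(ch.encode("utf-8"))


print(utf8_length("m"))           # 1  (U+006D, ASCII)
print(utf8_length("\u00e9"))      # 2  (U+00E9)
print(utf8_length("\u20ac"))      # 3  (U+20AC)
print(utf8_length("\U00013000"))  # 4  (U+13000, outside the BMP)

# The UTF-8 'byte order mark' is the three bytes EF BB BF.
print(codecs.BOM_UTF8 == b"\xef\xbb\xbf")  # True

data = codecs.BOM_UTF8 + "hello".encode("utf-8")
# A plain UTF-8 decode keeps the mark as U+FEFF at the start of the text...
print(repr(data.decode("utf-8")))      # '\ufeffhello'
# ...whereas Python's 'utf-8-sig' codec strips it when present.
print(repr(data.decode("utf-8-sig")))  # 'hello'
```

A stray U+FEFF at the start of a string is a common cause of ‘identical’ strings failing to compare equal, which is why the novice gets caught.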
UTF-16 is very popular with software writers working in C/C++ and .Net languages such as C#. A version of UTF-16 was the standard data format for Unicode 1.0. Unicode characters with code points less than 0x10000 are said to belong to the Unicode BMP (Basic Multilingual Plane) and these are represented by one 16 bit number in UTF-16. Other characters require two 16 bit numbers, i.e. two UTF-16 code units taken from a range that does not encode characters, the so-called surrogate code points dedicated to this purpose. As of Unicode 6.0, fewer than 50% of characters belong to the BMP but BMP characters account for a huge proportion of text in practice. This is by design; all popular modern languages have most script/writing system requirements addressed by the BMP and there are even specialist scripts such as Coptic defined here. Processing UTF-16 is often more efficient than UTF-8 and in most cases uses half the memory of UTF-32, all in all a good practical compromise solution.
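The surrogate arithmetic is short enough to write out. The following is a sketch in Python (neutral notation again; to_utf16_units and encoder_units are hypothetical helper names) that maps one code point to its UTF-16 code units and cross-checks the result against Python's own UTF-16 encoder.

```python
def to_utf16_units(code_point: int) -> list:
    """Return the one or two 16-bit UTF-16 code units for a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points do not encode characters")
    if code_point < 0x10000:
        return [code_point]              # BMP: a single code unit
    offset = code_point - 0x10000        # a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits go in the low surrogate
    return [high, low]


def encoder_units(code_point: int) -> list:
    """The same answer obtained from Python's built-in UTF-16 encoder."""
    data = chr(code_point).encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


for cp in (0x006D, 0x13000, 0x1F61C):
    units = to_utf16_units(cp)
    print("U+{:04X}".format(cp), [hex(u) for u in units], units == encoder_units(cp))
# U+006D ['0x6d'] True
# U+13000 ['0xd80c', '0xdc00'] True
# U+1F61C ['0xd83d', '0xde1c'] True
```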
Which brings me back to the 16-bit myth. The fact that so many popular characters belong to the BMP and only require one code unit in UTF-16 means it is easy to be mistaken into thinking most means all. The problem doesn’t even arise with UTF-8 and UTF-32 but the fact is much software uses UTF-16, indeed UTF-16 is essentially the native text encoding for Windows and .Net.
Example sources of 16-bit confusion:
The article on character sets at http://www.microsoft.com/typography/unicode/cs.htm is brazen:

This article is dated to 1997 but was probably written much earlier. Windows NT 3.1 (1993) was notable as the first computer operating system to use Unicode as its native text encoding and Microsoft deserves credit for this, alongside Apple who also did much to help early uptake of Unicode (but would not have a new operating system until OSX was released in 2001). I’m quoting this as an example of the fact that there are many old documents on the Web, confusing even when from reputable sources. I should mention, in contrast, much of MSDN (and indeed much of the relevant information on Wikipedia) is pretty up to date and reliable although not perfect on this subject.
The definition of the .Net Char structure on MSDN, http://msdn.microsoft.com/en-us/library/system.char.aspx, is much more recent.

Er, no. Char is not a Unicode character. It is a 16 bit Unicode code unit in UTF-16. Actually, this is explained later on in the Char documentation but the headline message is confusing and encourages programmers to use Char inappropriately.
The reasons I chose the Microsoft examples rather than the myriad of other confusing statements on the web are twofold. Firstly I'm focussing on .Net, C# etc. here. Secondly, Microsoft are generally ahead of the game with Unicode compared with other development systems which makes errors stand out more.
Fact is .Net actually works very well for software development with Unicode. The basic classes such as 'String' are Unicode (String is UTF-16) and it is almost true to say it is harder to write legacy than modern.
I had hoped to get a little further on the actual technicalities of working with Unicode characters and avoiding 16-bit pitfalls but time has proved the enemy. Another day.
Just three useful (I hope) points on .Net to conclude.
1. Code that works with String and Char should avoid BMP-thinking, e.g. if you want to parse a String, either avoid tests like IsLetter(Char) or wrap their usage in logic that also handles surrogates.
2. String, Char and the useful StringInfo class belong to the System namespaces and are pretty portable over the gamut of .Net contexts including Silverlight, WPF, XNA as well as the Novell parallel universe with Mono, MonoTouch, Moonlight etc. With a little care it can be straightforward to write text processing code that works across the board to target Windows, Mac, Linux, WP7 and whatever comes next.
3. Always test text-related code with strings that include non-BMP characters, and preferably also with data that includes combining sequences and usage instances of OpenType features such as ligatures.
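To illustrate points 1 and 3, here is a minimal sketch of the difference between testing each 16-bit code unit on its own and testing whole code points. It is written in Python as neutral notation and models a UTF-16 string as a list of integers; the helper names are invented for this example. In .Net itself, comparable building blocks include Char.IsSurrogatePair, Char.ConvertToUtf32 and the IsLetter(String, Int32) overload, but the code below does not attempt to reproduce their exact behaviour.

```python
import unicodedata


def utf16_units(text: str) -> list:
    """Model a UTF-16 string as a list of 16-bit code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_points_from_utf16(units):
    """Yield code points from UTF-16 code units, pairing up surrogates."""
    i, n = 0, len(units)
    while i < n:
        unit = units[i]
        if (0xD800 <= unit <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield unit  # a BMP code point, or a lone surrogate passed through
            i += 1


def is_letter(code_point: int) -> bool:
    """True if the general category is one of the letter categories (L*)."""
    return unicodedata.category(chr(code_point)).startswith("L")


# Test data in the spirit of point 3: a BMP letter, a non-BMP letter,
# and a combining sequence (e + COMBINING ACUTE ACCENT).
text = "m\U00013000e\u0301"
units = utf16_units(text)

# BMP-thinking: each surrogate half is tested alone and is not a letter.
print([is_letter(u) for u in units])
# [True, False, False, True, False]

# Surrogate-aware: the hieroglyph is recognised as one letter.
print([is_letter(cp) for cp in code_points_from_utf16(units)])
# [True, True, True, False]
```

Note that the combining accent is still reported separately in the second list; grouping combining sequences into text elements is a further step beyond surrogate handling.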
Saturday, 7 August 2010
Missing in Silverlight 4: a functional GlyphTypeface class
Warning. Obscurity level: HIGH.
This note is primarily aimed at the Silverlight development team in Microsoft Redmond. Other Silverlight developers may also want to understand a limitation of Silverlight 4.
Background
Applications that require superscripts, subscripts, and other rich text functionality need control of character/glyph positional placement. Advanced typography is also useful in applications such as e-book readers where it is often desirable to accurately represent the look and feel of the book. Specialist applications that do mathematical typography (and my ancient Egyptian work) need this kind of precision. From a developer perspective, it is the GlyphTypeface class in the .Net/WPF System.Windows.Media namespace that provides much of the required functionality for WPF applications.
The problem
The Silverlight 4 documentation available from Microsoft (see http://msdn.microsoft.com/en-us/library/system.windows.media.glyphtypeface(VS.95).aspx) states
“The GlyphTypeface object is a low-level text object that corresponds to a single face of a font family as represented by an OpenType font file, or serialized as a block of memory in a document. Each glyph defines metrics that specify how it aligns with other glyphs. The correct GlyphTypeface to use for a run of characters in a given logical font is normally determined by the Silverlight font system. The GlyphTypeface object provides properties and methods for the following:
· Obtaining font face common metrics, such as the ratio of ascent and descent to em size.
· Obtaining metrics, outlines, and bitmaps for individual glyphs.”
Er no! The WPF 4 version of GlyphTypeface indeed does this. However Silverlight 4 only supports reading the name of the font and its version number. All the useful functionality is missing. The documentation quoted only applies to WPF.
It is therefore impossible in general to implement advanced typography in Silverlight. A big hole - typography has been possible since Windows 3.1, the first release (1992) to incorporate scalable (TrueType) fonts. [Note: sure there are clumsy workarounds in very special circumstances but I won’t go into those today].
The solution
Expand the GlyphTypeface in Silverlight 5 to provide all missing functionality except where this conflicts for some reason with the Silverlight security model. In particular, discovery of the black box for a glyph is essential, as is CharacterToGlyphMap (without which the ‘Glyphs’ class has only limited use). A fairly small amount of straightforward work in the Silverlight runtime yields a big benefit to third party developers and should also help functional enhancement to controls such as RichTextBox.
Note: Windows Phone 7 is also lacking functionality here.
Monday, 10 May 2010
InScribeX Web Preview 3 released
I have just released Preview 3 of the InScribeX Web software on http://www.inscribex.com/. This version replaces Preview 2 for Windows and Mac users and works with Silverlight version 3 or 4. Linux users will probably want to stick with Preview 2 which runs with Moonlight 2 for the time being (see note below).
As illustrated, the user interface has been changed to require less screen space. This is very useful on low resolution displays, especially those found on netbooks. I have also chosen this two page view for the dictionaries so English-Egyptian and Egyptian to English can be viewed simultaneously (although it is probable that additional ways of working with the dictionaries will follow at some point).
Some features I had hoped to include in Preview 3 have been deferred in order that the software works with the current pre-release of Moonlight 3 (Moonlight is the equivalent to Silverlight for Linux systems). I hope to update Preview 3 over the summer to track Moonlight development and make a few additions and changes to functionality, the most interesting being to add some basic UMdC editing features and include some revised dictionary content.
Preview 3 is about 25% smaller than Preview 2 so loads faster over the web.
Coming soon ... InScribe Web Preview 4
Preview 4 is being developed in parallel to Preview 3 and I've adopted a development approach to allow components to be shared between the two versions. This sounds rather complicated but makes sense from my development perspective as part of the strategy of making InScribeX cross-platform over a range of computers and other devices. For the majority of Windows and Mac users, all this means is you should use Preview 3 for the time being then switch to Preview 4 when it is available (best guess sometime this summer).
Preview 4 takes advantage of new features in Silverlight 4 to enable printing and rich text editing of Egyptian texts among other enhancements. Watch this space.
InScribe Web on Linux
Moonlight 2 was released in December 2009 as a Linux Firefox plugin (this can be downloaded for popular modern Linux distributions from www.go-mono.com/moonlight/download.aspx). Moonlight 2 enables InScribe Web Preview 2 operation on Linux systems.
Pre-release 'alpha quality' Moonlight 3 plugins for Firefox and Chrome browsers on Linux can be downloaded from go-mono.com/moonlight/prerelease.aspx. InScribe Web Preview 2 appears to work as with Moonlight 2. InScribe Web Preview 3 mostly appears to run okay on the most recent (April) plugin versions. However one unavoidable problem at the moment is the full dictionaries take an extremely long time to load. I've therefore limited the dictionaries to 100 entries under Linux for the time being until the Moonlight bug is fixed (a good reason to stick with InScribe Web Preview 2). I'm planning to track Moonlight 3 pre-release versions towards release, updating Preview 3 if necessary and feasible.
All being well, Moonlight 3 will be released by Novell by Autumn with full Silverlight 3 compatibility so I can retire InScribe Web Preview 2 leaving Preview 3 a fully cross platform solution for Windows/Mac/Linux.
Tuesday, 20 April 2010
Silverlight 4 Release, Moonlight 3 Preview, and InScribeX Web
Last Thursday Microsoft released Silverlight 4, a significant milestone for the InScribeX Web project since SL4 brings a useful set of new functionality for delivering the InScribeX Web approach to Ancient Egyptian. Silverlight 4 runs on Windows (XP, Vista, 7) and Mac OSX (Intel systems). There is no platform change since version 3 beyond the Google Chrome web browser now being officially supported.
Meanwhile Novell continues development of Moonlight, the open source Linux equivalent to Silverlight. Moonlight is running several months behind Silverlight with the current release Moonlight 2 corresponding to Silverlight 2. Moonlight 3 release is expected sometime this Summer although previews have been available since February and the latest version (preview 6) made available last week is in pretty good shape.
With that background, I thought it would be useful to state now how these changes are affecting InScribeX Web.
InScribe Web Preview 2
I made this version available in July last year to run on Silverlight 2 not long before Silverlight 3 appeared. This version is still available and runs with Moonlight 2 on Linux as well as Silverlight 2, 3 or 4. I expect this version to remain live, unchanged, until a few weeks after Moonlight 3 release. At that point I expect to retire this version as there will be no need to continue Silverlight 2 compatibility.
InScribe Web Preview 3
This version is written for Silverlight 3 and tested to work with Moonlight 3 previews and Silverlight 4 release. I've held back on making this available until Silverlight 4 was released and out in the field for a couple of weeks so probably an early May release date. As it stands this version is not greatly enhanced over preview 2 although I've changed the interface to make more effective use of screen space and added a bunch of behind the scenes changes. This redesign especially benefits netbooks and other low resolution devices. I'll probably migrate some features of Preview 4 back into Preview 3 for the sake of Linux users once Moonlight 3 is available and before the Moonlight 4 release. In particular some UMdC support. Once Moonlight 4 is available (Winter?) this version can be retired and Preview 4 used cross platform.
InScribe Web Preview 4
Preview 4 requires Silverlight 4 or later and makes use of some of the new functionality, notably to enable rich text editing and printing. Probably late May/early June for the first cut then some incremental changes to follow during the rest of the year. Preview 4 introduces and supports a new file format for texts incorporating Ancient Egyptian, namely UMdC (Unicode Manuel de Codage).
About UMdC
The question of file formats for hieroglyphs and Ancient Egyptian has been a thorn in my side for some time. The problem is not how to devise ways of representing Egyptian in interesting and more powerful representations but rather how to evolve current ways of working with hieroglyphs without adding unnecessary complications in foreseeable future directions. I've finally settled on this UMdC approach as the simplest solution to remove this blockage. A topic I hope to cover in more depth tomorrow.
Thursday, 17 December 2009
Novell releases Moonlight 2. InScribeX Web 2 now available on Linux.
Novell has today released Moonlight 2, their open source Linux implementation of Silverlight 2.
After trying the first Moonlight 2 preview back in May and testing out several preview versions and nine ‘beta’ versions, needless to say the first thing I checked was the current (July) release of InScribeX Web to see if it is working at last. Sigh of relief!
InScribeX Web is software for working with Ancient Egyptian, including the Basic Egyptian Hieroglyphs added recently to Unicode (5.2). InScribeX has therefore now hit an early goal of running cross-platform on Windows, Mac and Linux.
The Novell press release (www.novell.com/nl-nl/news/press/new-release-of-moonlight-now-available/) also announced an update to their agreement with Microsoft to include Microsoft support for development/testing of Moonlight versions 3 and 4. Novell is working towards a Q1 2010 preview of Moonlight 3 for release in Q3 with Moonlight 4 to follow ‘shortly thereafter’. Miguel de Icaza describes some technical features, including parts of Moonlight 3 functionality already present in 2 at his blog, tirania.org/blog/archive/2009/Dec-17.html.
As noted here last week, my InScribeX Web development is now targeting Silverlight 4 as 'InScribeX Web 4' for the spring 2010 timeframe. Reasons include better desktop deployment, printing, rich text support and improved InScribe 2004SE interoperability. Whether some IW4 features might find their way into a Moonlight/Silverlight 2 or 3 compatible version for Linux is an open question. As always, time is the enemy.
Caveats. Some devices such as the Amazon Kindle use Linux but are not user configurable. Hand held devices in general would not be ideal for InScribeX Web because of input and/or small screen size, even if Silverlight or Moonlight were available. Likewise games consoles where I'd need to add controller support and a redesigned interface for widespread accessibility. All the same, apparently Silverlight or Moonlight implementations for Windows Mobile, XBox 360, PlayStation 3 and Wii are in various stages of development so it will be interesting to see what possibilities arise during 2010.
Wednesday, 9 December 2009
Designing InScribeX Web version 4: Introduction
During the next few months I’m hoping to find enough time to complete the next version of InScribeX Web (see http://www.inscribex.com/ for the current technical preview). The idea of InScribeX Web is to provide some useful tools for working with Ancient Egyptian in Unicode without the need to buy or install specialist software.
For technical reasons, the next preview of InScribeX Web is unlikely to be online before March/April 2010 so I’ve decided that the best way forward is to blog on the subject so interested parties can follow the development work as it happens. There is no substitute for using software, rather than reading about it, but at least this way gives some opportunity for feedback.
Incidentally I’ve only just discovered my email spam filter had grown too aggressive (it’s a balancing act when one has a public email address) so please try again if you have attempted but failed to contact me in recent weeks.
In parallel with this development I’m continuing to work on a new version of InScribe 2004, namely InScribe 2004SE (Second Edition), which despite the 2004 handle is in fact a major functional upgrade which enables use of Unicode and refreshes the software to take advantage of new features in Windows Vista and 7 while retaining the mode of use and features of the original edition. This is relevant to InScribeX Web as the two are being designed to complement each other when the commercial InScribe 2004SE software is installed.
InScribeX Web uses the Microsoft Silverlight plugin for Web Browsers. The main reason for this choice is simple: I needed the most cost effective way of creating advanced internet software. The project is unfunded so there was no scope for the luxury of developing under more time consuming alternatives such as Adobe AIR or Google Gears (fortunately, as Gears is no longer being developed in favour of Google changing tack to an as yet to be clarified HTML 5 approach in ChromeOS etc.). Practicalities aside, Silverlight also allows for fun graphics and other effects and I must admit I rather enjoy having these facilities to hand as a refreshing change to the more formal approach necessary in the InScribe 2004SE development.
The choice of Silverlight is not without controversy, nor without complication from a developer perspective. The current InScribeX Web preview was written for Silverlight 2. Silverlight 3 was released in July adding new features and Silverlight 4 announced in November for release in the spring. If that is not enough, the Linux equivalent (Moonlight, developed as an open source project by Novell) is running some distance behind Silverlight itself with Moonlight 2 not expected to be released until early next year (when I built the InScribe Web technical preview, the Moonlight release was expected late Summer).
To sidestep the version complications, InScribeX Web development is now targeting Silverlight 4 (expected in March/April 2010; I am currently using the developer-only Beta preview) and for those interested in such matters I’m using Visual Studio 2010 Beta 2 (with some Expression Blend) as the development environment. There are some major benefits using 4 which I’ll run over another day. It’s a pure guess but I’m half expecting version 4 to be the point at which Silverlight hits prime time; we shall see.
Unfortunately, this is not good news for Linux users! My thinking at the moment is to wait until Moonlight 2 is released and the current technical preview working and then make a call on what to do. I’m actually very keen on making InScribeX available on Linux (the X means cross-platform) but it would be perverse to penalize the 95% (or whatever it is) of the internet population who can use Silverlight.
