No Disconnect

Things I’d rather not forget

Archive for the ‘Unicode’ Category

Further Arabic Unicode Lessons

Machine-readable information about Arabic shaping (i.e., information about which characters join and which do not) is available in Unicode’s ArabicShaping.txt file. This file includes entries for characters outside of the Unicode Arabic ranges, including glyphs from Hebrew and Syriac.

The “Joining_Group” field in ArabicShaping.txt seems, approximately, to mean “shape”: characters in the same group share a base shape and differ only in their dots and other diacritics.
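As a rough illustration of how the file can be read, here is a minimal Python sketch. It assumes the four-field layout used by ArabicShaping.txt (code point; short name; joining type; joining group, separated by semicolons, with `#` starting a comment). The names `ShapingEntry`, `parse_arabic_shaping`, and `group_by_joining_group` are made up for this example, and the sample lines are a tiny hand-copied excerpt, not the full file.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class ShapingEntry(NamedTuple):
    code_point: int
    name: str
    joining_type: str
    joining_group: str


def parse_arabic_shaping(lines: Iterable[str]) -> List[ShapingEntry]:
    """Parse lines in the ArabicShaping.txt layout, skipping comments and blanks."""
    entries = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 4:
            continue  # not a data line in the assumed layout
        code_point, name, joining_type, joining_group = fields
        entries.append(
            ShapingEntry(int(code_point, 16), name, joining_type, joining_group)
        )
    return entries


def group_by_joining_group(entries: Iterable[ShapingEntry]) -> Dict[str, List[int]]:
    """Map each joining group name to the code points that belong to it."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for entry in entries:
        groups[entry.joining_group].append(entry.code_point)
    return dict(groups)


# Illustrative excerpt only; the real file has many more entries.
SAMPLE = """\
# code point; short name; joining type; joining group
0627; ALEF; R; ALEF
0628; BEH; D; BEH
062A; TEH; D; BEH
0647; HEH; D; HEH
"""

for group, members in group_by_joining_group(
    parse_arabic_shaping(SAMPLE.splitlines())
).items():
    print(group, [f"U+{cp:04X}" for cp in members])
```

In the excerpt, BEH and TEH land in the same group, which matches the “same shape, different dots” reading of Joining_Group.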

There are 29 joining groups, of which 8 are non-right-joining. So there are 29 × 8 = 232 two-group sequences in which the first (rightmost) glyph is non-right-joining. Further information about the joining groups:

  • There are three heh-related singleton groups: HEH (0647), HEH GOAL (06C1 and 06C2, the latter being a ligature), and HAMZA ON HEH GOAL (06C3).
  • No_Joining_Group symbols simply don’t connect.
  • The singleton group SWASH KAF (06AA) comes with no explanation.
  • Other groups are intuitive.
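The pair count quoted above is just a product of the two group counts. A small Python check, using placeholder labels rather than the real group names (which live in ArabicShaping.txt), might look like this:

```python
from itertools import product

TOTAL_GROUPS = 29       # figure quoted in the post
NON_RIGHT_JOINING = 8   # figure quoted in the post

# Placeholder labels; which groups are non-right-joining is not modeled here.
all_groups = [f"group_{i}" for i in range(TOTAL_GROUPS)]
first_candidates = all_groups[:NON_RIGHT_JOINING]

# Every (first, second) pair where the first group is non-right-joining.
pairs = list(product(first_candidates, all_groups))
print(len(pairs))
```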

Scheherazade is missing these Unicode 5.1 glyphs: 0608, 063B, 063C, 063D, 063E, 063F, 076E, 076F, 0770, 0771, 0772, 0773, 0774, 0775, 0776, 0777, 0778, 0779, 077A, 077B, 077C, 077D, 077E, 077F.
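A list like this is easier to scan as ranges. The following Python sketch collapses the code points above into contiguous runs; `collapse_ranges` and `format_range` are hypothetical helper names for this example, and the code says nothing about the font itself.

```python
from typing import Iterable, List, Tuple

MISSING_HEX = (
    "0608, 063B, 063C, 063D, 063E, 063F, 076E, 076F, 0770, 0771, 0772, 0773, "
    "0774, 0775, 0776, 0777, 0778, 0779, 077A, 077B, 077C, 077D, 077E, 077F"
)
missing = [int(token, 16) for token in MISSING_HEX.split(", ")]


def collapse_ranges(code_points: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse code points into sorted, inclusive (start, end) runs."""
    ranges: List[Tuple[int, int]] = []
    for cp in sorted(set(code_points)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)  # extend the current run
        else:
            ranges.append((cp, cp))  # start a new run
    return ranges


def format_range(start: int, end: int) -> str:
    """Render a run as U+XXXX or U+XXXX..U+YYYY."""
    if start == end:
        return f"U+{start:04X}"
    return f"U+{start:04X}..U+{end:04X}"


print(", ".join(format_range(start, end) for start, end in collapse_ranges(missing)))
```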

Written by nodisconnect

September 14, 2009 at 3:13 am

Posted in Unicode

Arabic Unicode Summary

There are only two blocks that the Unicode standard (5.1) recommends be used for Arabic script: U+0600 through U+06FF and U+0750 through U+077F. The former block is the basic character set; the latter is an extended set.

Two other blocks, U+FB50 through U+FDFF (Arabic Presentation Forms-A) and U+FE70 through U+FEFF (Arabic Presentation Forms-B), are “presentation forms” that are included for compatibility with previous standards. The Unicode standard does not recommend using those codepoints.
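One way to see the relationship between the two kinds of blocks is Python's standard `unicodedata` module. The sketch below assumes that NFKC normalization is an acceptable way to fold presentation forms back to their base letters; NFKC also rewrites other compatibility characters, so it may be too aggressive for some text. The helper names `in_recommended_block` and `fold_presentation_forms` are invented for this example, and the block ranges are the two recommended ones given earlier in this post.

```python
import unicodedata

# The two blocks the post lists as recommended for Arabic script.
RECOMMENDED_BLOCKS = [(0x0600, 0x06FF), (0x0750, 0x077F)]


def in_recommended_block(ch: str) -> bool:
    """True if the single character ch falls in one of the recommended blocks."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in RECOMMENDED_BLOCKS)


def fold_presentation_forms(text: str) -> str:
    """Map compatibility characters (including presentation forms) to base letters."""
    return unicodedata.normalize("NFKC", text)


# Initial, medial, and final presentation forms of BEH.
shaped = "\uFE91\uFE92\uFE90"
folded = fold_presentation_forms(shaped)

print([f"U+{ord(ch):04X}" for ch in shaped])
print([f"U+{ord(ch):04X}" for ch in folded])
print(all(in_recommended_block(ch) for ch in folded))
```

After folding, a shaping engine is expected to pick the contextual forms again from the base letters, which is why the presentation-form codepoints are not needed in stored text.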

In the Unicode Standard book (available for 5.0), Chapter 2 is a useful overview. Chapter 8 has details about Arabic script.

SIL Scheherazade supports almost everything in the Unicode 4.1 standard. From Unicode 4.1 to Unicode 5.0, there were no changes to the Arabic blocks. From Unicode 5.0 to 5.1, a few dozen characters were apparently added. I have no idea whether any currently available font includes all of these shapes.

Written by nodisconnect

July 22, 2009 at 9:08 pm

Posted in Unicode
