uax 9
1.0.0
Implementation of the Unicode Standard Annex #9 bidirectional text algorithm
About UAX-9
This is an implementation of the bidirectional text algorithm described in Unicode Standard Annex #9. It provides a convenient way to handle text bidirectionality.
Bidirectional text occurs when text of different directionality is mixed. For instance, if Arabic text, which is typically right-to-left, is interspersed with Roman numerals, which are typically left-to-right, then the Roman numerals need to be rendered in reversed order relative to the surrounding text to produce the correct display order.
The Unicode Bidirectional Algorithm implemented by this library handles the reordering of such text into a canonical order that can then be used by text rendering engines to produce correctly laid out text.
Note that this algorithm does not analyse line breaking. You must provide the appropriate line breaking opportunities yourself; see UAX-14. The algorithm also does not handle paragraph breaks, and instead expects you to deliver properly segmented strings for analysis.
How To
The system will compile binary database files on first load. Should anything go wrong during this process, a note is produced on load. If you would like to prevent this automated loading, push uax-9-no-load to *features* before loading. You can then manually load the database files when convenient through load-databases.
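As a minimal sketch of the opt-out described above: the text only gives the feature name as uax-9-no-load, so spelling it as the keyword :uax-9-no-load is an assumption here, based on the usual convention for *features* entries.

```lisp
;; Assumption: the feature is the keyword :UAX-9-NO-LOAD.
;; PUSHNEW avoids adding it twice if this form is evaluated again.
(pushnew :uax-9-no-load *features*)
```

This form has to be evaluated before the system is loaded. Once the system is loaded, calling load-databases from the ORG.SHIRAKUMO.ALLOY.UAX-9 package performs the deferred step.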
Once loaded, you can compute the directional levels of a string with the levels function. To use this information and produce a reordering index vector, pass its result to the reorder function. Note that when iterating through these characters, the level of each character needs to be taken into consideration, as some characters need to be mirrored when right-to-left oriented. You can detect right-to-left levels by testing whether they are odd. You can then retrieve the potentially mirrored variant of the character through mirror-at.
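The manual iteration described above could look roughly like the following sketch. The helper name collect-display-characters is hypothetical, and the mirroring function is passed in as a parameter so that the code can be read and tried without the library loaded. It is assumed to behave like mirror-at, taking a string and an index and returning the character plus a manual-mirror flag.

```lisp
;; Hypothetical helper, not part of the library.
;; LEVELS and INDEXES are vectors such as those produced by levels and
;; reorder; MIRROR-FN is assumed to have the calling convention of mirror-at.
(defun collect-display-characters (string levels indexes mirror-fn)
  "Return a list of (CHARACTER . MANUAL-MIRROR) pairs in the order of INDEXES."
  (loop for index across indexes
        collect (if (oddp (aref levels index))
                    ;; Odd level: right-to-left, so ask for the mirrored variant.
                    (multiple-value-bind (character manual-mirror)
                        (funcall mirror-fn string index)
                      (cons character manual-mirror))
                    ;; Even level: left-to-right, use the character unchanged.
                    (cons (char string index) nil))))
```

With the library loaded, LEVELS would come from levels, INDEXES from reorder, and MIRROR-FN would be the mirror-at function itself.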
Alternatively you can also iterate through the string directly in the correct character order (including mirroring) using do-in-order. Also note that some characters will require manual mirroring in the rendering engine as no equivalent mirrored characters exist in Unicode.
External Files
The following files were retrieved from external resources, last accessed on 4.9.2019.
BidiBrackets.txt
https://www.unicode.org/Public/UCD/latest/ucd/BidiBrackets.txt
BidiCharacterTest.txt
https://www.unicode.org/Public/UCD/latest/ucd/BidiCharacterTest.txt
BidiMirroring.txt
https://www.unicode.org/Public/UCD/latest/ucd/BidiMirroring.txt
BidiTest.txt
https://www.unicode.org/Public/UCD/latest/ucd/BidiTest.txt
DerivedBidiClass.txt
https://www.unicode.org/Public/UCD/latest/ucd/DerivedBidiClass.txt
At the time, Unicode 12.1 was considered the latest version.
Definition Index
-
ORG.SHIRAKUMO.ALLOY.UAX-9
No documentation provided.
-
EXTERNAL SPECIAL-VARIABLE *BIDI-BRACKETS-TABLE-FILE*
Variable containing the absolute path of the brackets table file.
See LOAD-DATABASES
See COMPILE-DATABASES
-
EXTERNAL SPECIAL-VARIABLE *BIDI-CLASS-DATABASE-FILE*
Variable containing the absolute path of the bidi class database file.
See LOAD-DATABASES
See COMPILE-DATABASES
-
EXTERNAL SPECIAL-VARIABLE *BIDI-MIRRORING-TABLE-FILE*
Variable containing the absolute path of the mirroring table file.
See LOAD-DATABASES
See COMPILE-DATABASES
-
EXTERNAL CONDITION NO-DATABASE-FILES
Warning signalled when LOAD-DATABASES is called and the files are not present.
Two restarts are active when this condition is signalled:
COMPILE --- Call COMPILE-DATABASES.
ABORT --- Abort loading the databases, leaving them in their previous state.
See LOAD-DATABASES
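One way to choose the COMPILE restart programmatically is sketched below. Both helper names are hypothetical, and the restart is looked up by its printed name because the documentation does not say which package the restart symbol lives in. The condition type is passed as a parameter so the sketch does not depend on the library being loaded.

```lisp
;; Hypothetical helper: invoke the active restart whose name is spelled NAME,
;; regardless of the package of the restart's symbol. Returns NIL if no
;; such restart is active.
(defun invoke-restart-named (name condition)
  (let ((restart (find name (compute-restarts condition)
                       :key #'restart-name :test #'string=)))
    (when restart
      (invoke-restart restart))))

;; Hypothetical helper: run THUNK, and whenever a warning of CONDITION-TYPE
;; is signalled, pick the COMPILE restart instead of letting it propagate.
(defun call-with-database-compilation (thunk condition-type)
  (handler-bind ((warning (lambda (condition)
                            (when (typep condition condition-type)
                              (invoke-restart-named "COMPILE" condition)))))
    (funcall thunk)))
```

With the library loaded, THUNK would be a function that calls LOAD-DATABASES, and CONDITION-TYPE would be the NO-DATABASE-FILES symbol from the ORG.SHIRAKUMO.ALLOY.UAX-9 package.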
-
EXTERNAL FUNCTION CALL-IN-ORDER
- FUNCTION
- STRING
- &OPTIONAL
- LEVELS
- INDEXES
Calls the function per character in proper order over the string.
The function must accept two arguments:
CHARACTER --- The character to display.
MANUAL-MIRROR --- Whether the rendering engine should draw the character mirrored.
This function will iterate over the string in the proper order to respect bidirectionality. If INDEXES is not passed, it is automatically computed through REORDER on the levels. If LEVELS is not passed, it is automatically computed through LEVELS on the string.
Note that the CHARACTER passed to the function is already mirrored if a mirrored character exists in Unicode. This means you do not need to call MIRROR-AT yourself.
See LEVELS
See REORDER
See MIRROR-AT
See DO-IN-ORDER
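A callback with the two-argument signature described above might be built as in this sketch. The name make-glyph-collector is hypothetical; it simply records what it is called with so that the display order can be inspected afterwards.

```lisp
;; Hypothetical helper, not part of the library. Returns two closures:
;; the first accepts (CHARACTER MANUAL-MIRROR) and records each call,
;; the second returns the recorded (CHARACTER . MANUAL-MIRROR) pairs
;; in the order in which they were received.
(defun make-glyph-collector ()
  (let ((glyphs '()))
    (values (lambda (character manual-mirror)
              (push (cons character manual-mirror) glyphs))
            (lambda ()
              (reverse glyphs)))))
```

The first returned closure is the kind of function that could be passed as the FUNCTION argument of CALL-IN-ORDER. A real renderer would draw the glyph there instead of collecting it, flipping it when MANUAL-MIRROR is true.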
-
EXTERNAL FUNCTION COMPILE-DATABASES
Compiles the database files from their sources.
This will load an optional part of the system and compile the database files to an efficient byte representation. If the compilation is successful, LOAD-DATABASES is called automatically.
See *BIDI-CLASS-DATABASE-FILE*
See *BIDI-BRACKETS-TABLE-FILE*
See *BIDI-MIRRORING-TABLE-FILE*
See LOAD-DATABASES
-
EXTERNAL FUNCTION LEVELS
- STRING
- &KEY
- BASE-DIRECTION
- LINE-BREAKS
- START
- END
Computes the directional level for every code point in the string.
Returns two values:
LEVELS --- A vector of levels for each code point in the input string. Has the length of the input string.
BASE-DIRECTION --- The base direction of the string. If BASE-DIRECTION was :AUTO, this is the determined direction.
BASE-DIRECTION must be one of three values:
:LEFT-TO-RIGHT
:RIGHT-TO-LEFT
:AUTO (default)
This designates how the text is interpreted at its base level. When this is :AUTO, the base level is determined automatically by scanning for the first directional code point in the string.
LINE-BREAKS should be a list of indexes into the string. Each index designates a code point after which a line break is inserted. This is used to normalise the levels across breaks. If you pass this argument, you must pass the same argument to REORDER. If you do not pass this, the line end is assumed to be at the end of the string.
The values in the levels vector designate which direction the code point at this index should have. If the level is even, the direction is LEFT-TO-RIGHT; if it is odd, RIGHT-TO-LEFT. You will need this information yourself to determine whether to display code points mirrored or not when rendering their glyphs.
See REORDER
See MIRROR-AT
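The even/odd rule for levels can be written down directly. In this small sketch the function names are hypothetical, and the keywords are borrowed from the BASE-DIRECTION values purely for readability.

```lisp
;; Hypothetical helpers, not part of the library.
(defun level-direction (level)
  "Map a single level to a direction: even is left-to-right, odd is right-to-left."
  (if (evenp level) :left-to-right :right-to-left))

(defun level-directions (levels)
  "Map a sequence of levels to a list of direction keywords."
  (map 'list #'level-direction levels))
```

Applied to a levels vector, this gives one direction keyword per code point, which a renderer could use to decide where mirroring needs to be considered.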
-
EXTERNAL FUNCTION LOAD-DATABASES
Loads the databases from their files into memory.
If one of the files is missing, a warning of type NO-DATABASE-FILES is signalled. If the loading succeeds, T is returned.
See *BIDI-CLASS-DATABASE-FILE*
See *BIDI-BRACKETS-TABLE-FILE*
See *BIDI-MIRRORING-TABLE-FILE*
See NO-DATABASE-FILES
-
EXTERNAL FUNCTION MIRROR-AT
- STRING
- I
Returns the mirrored character at the given position in the string.
Returns two values:
CHARACTER --- The character to display. This may either be the same character as was passed in, or its mirror sibling.
MANUAL-MIRROR --- Whether the character needs to be displayed in a mirrored way in the renderer.
If MANUAL-MIRROR is T, the returned character will be the same as the character at that point in the string. The rendering engine displaying the character must ensure that it is drawn mirrored instead. If MANUAL-MIRROR is NIL, the returned character can be drawn in all cases to achieve the correct mirroring behaviour.
Note that you should only invoke this function to retrieve the mirror pair if the level of the character at that position is odd and thus right-to-left.
-
EXTERNAL FUNCTION REORDER
- LEVELS
- &KEY
- LINE-BREAKS
- INDEXES
Computes a reordering of indexes into the string to process the code points in the correct order.
Returns one value, the reordered index vector, which has the same length as the input LEVELS vector. The vector is filled with indexes into the original string. Iterating through this index vector in order should provide the correct ordering for the resulting code points when rendering along the base direction.
LINE-BREAKS should be a list of indexes into the string. Each index designates a code point after which a line break is inserted. This argument must be the same as what you passed to LEVELS to get the levels vector.
INDEXES is the index vector that is permuted and returned. You can pass this to save on allocation. If not passed, a vector the length of the levels vector is created. If passed, you should make sure that the indexes in the vector make sense, meaning they should typically be in ascending order starting with 0.
See LEVELS
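To illustrate what such an index vector looks like, here is a minimal sketch of the reversal step from UAX #9 (rule L2): from the highest level down to the lowest odd level, every contiguous run of code points at that level or higher is reversed. This is not the library's implementation. The name reorder-sketch is hypothetical, line breaks are ignored, and the result is in left-to-right visual order, which may differ from the conventions REORDER uses.

```lisp
;; Hypothetical illustration of UAX #9 rule L2 for a single line.
;; LEVELS must be a vector of non-negative integers.
(defun reorder-sketch (levels)
  "Return a vector of indexes into LEVELS in left-to-right visual order."
  (let* ((size (length levels))
         (indexes (make-array size)))
    ;; Start from the identity permutation 0, 1, 2, ...
    (dotimes (i size)
      (setf (aref indexes i) i))
    (when (plusp size)
      (let ((highest (reduce #'max levels))
            ;; With no odd level present, the loop below runs zero times.
            (lowest-odd (reduce #'min (remove-if #'evenp levels)
                                :initial-value most-positive-fixnum)))
        (loop for level from highest downto lowest-odd
              do (let ((start nil))
                   ;; Scan one position past the end to flush a trailing run.
                   (dotimes (i (1+ size))
                     (if (and (< i size)
                              (>= (aref levels (aref indexes i)) level))
                         (unless start (setf start i))
                         (when start
                           ;; Reverse the run occupying positions START..I-1.
                           (replace indexes (reverse (subseq indexes start i))
                                    :start1 start)
                           (setf start nil))))))))
    indexes))
```

For example, the levels #(0 0 1 1 1 0) describe two left-to-right code points, a right-to-left run of three, and one more left-to-right code point; only the middle run is reversed, so the sketch returns #(0 1 4 3 2 5).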
-
EXTERNAL MACRO DO-IN-ORDER
- CHARACTER
- MANUAL-MIRROR
- STRING
- &OPTIONAL
- LEVELS
- INDEXES
- &BODY
- BODY
Iterates over the string in bidirectional order, binding CHARACTER and MANUAL-MIRROR for each character.
This is a convenience macro around CALL-IN-ORDER.
See CALL-IN-ORDER
-