Microsoft Typography | Developer information | Specifications | OpenType Layout tag registry


OpenType Layout tag registry

The Tag Registry defines the OpenType Layout tags that Microsoft supports. OpenType Layout tags are 4-byte character strings that identify the scripts, language systems, baselines, and features in an OpenType Layout font. The registry establishes conventions for naming and using these tags. Registered tags have a specific meaning and convey precise information to developers and text-processing clients of OpenType Layout. Microsoft encourages font developers to use registered tags to ensure compatibility and ease of use across fonts, applications, and operating systems.

This chapter contains sample sets of commonly used tags for scripts, language systems, and baselines. Microsoft will supply a list of additional tags upon request.

In addition, the Feature Tag section defines all the features and feature tags Microsoft has developed and registered to date. This section includes a description of the function of each feature.

Microsoft expects the list of registered tags and features to expand over time. The most recent version of the registry will be available on Microsoft's ftp and World Wide Web sites.

Microsoft welcomes nominations for new and useful features to register. For more information about feature and feature tag registration, see the Feature Tag section.


Script tags

Script tags identify the scripts represented in an OpenType Layout font. Script tags are defined by Microsoft Typography and correspond to the contiguous character code ranges in Unicode.

All tags are 4-byte character strings composed of a limited set of ASCII characters in the 0x20-0x7E range. A script tag consists of four or fewer lowercase letters. If a script tag consists of three or fewer lowercase letters, the letters are followed by the requisite number of spaces (0x20), each consisting of a single byte.
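As an illustration of the padding rule, the following Python sketch builds a 4-byte script tag from a short name. The function `make_script_tag` is hypothetical and is not part of any Microsoft or OpenType API; it only enforces the lowercase-letter and space-padding conventions stated above.

```python
def make_script_tag(name: str) -> bytes:
    """Build a 4-byte script tag from one to four lowercase ASCII letters.

    Hypothetical helper: names shorter than four letters are padded on the
    right with spaces (0x20), one byte per space.
    """
    if not 1 <= len(name) <= 4:
        raise ValueError(f"script tag must have 1-4 letters, got {name!r}")
    if not all("a" <= ch <= "z" for ch in name):
        raise ValueError(f"script tag must use lowercase ASCII letters: {name!r}")
    # ljust pads with ASCII spaces up to four characters; ASCII gives one byte each.
    return name.ljust(4, " ").encode("ascii")
```

For instance, `make_script_tag("lao")` yields `b"lao "` and `make_script_tag("yi")` yields `b"yi  "`.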

Some of the most commonly used script tags are shown below. A full list of script tags is available from Microsoft.


Script                Script Tag
Arabic                arab
Armenian              armn
Bengali               beng
Bopomofo              bopo
Braille               brai
Canadian Syllabics    cans
Cherokee              cher
CJK Ideographic       hani
Cyrillic              cyrl
Devanagari            deva
Ethiopic              ethi
Georgian              geor
Greek                 grek
Gujarati              gujr
Gurmukhi              guru
Hangul Jamo           jamo
Hangul                hang
Hebrew                hebr
Hiragana              hira
Kannada               knda
Katakana              kana
Khmer                 khmr
Lao                   lao
Latin                 latn
Malayalam             mlym
Mongolian             mong
Myanmar               mymr
Ogham                 ogam
Oriya                 orya
Runic                 runr
Sinhala               sinh
Syriac                syrc
Tamil                 taml
Telugu                telu
Thaana                thaa
Thai                  thai
Tibetan               tibt
Yi                    yi


Language system tags

Language system tags identify the language systems supported in an OpenType Layout font. Microsoft uses the standard language system tag names defined in the Windows Natural Language Support API document (called NLSAPI.doc), in Appendix A: Locales and Language ID's. This document is available on the Microsoft Developers Network CD released by Microsoft quarterly, or it can be acquired directly from Microsoft Typography.

All tags are 4-byte character strings composed of a limited set of ASCII characters in the 0x20-0x7E range. If a language system tag consists of three or fewer uppercase letters, the letters are followed by the requisite number of spaces (0x20), each consisting of a single byte.


Language System Tag    Language System
ABA                    Abaza
ABK                    Abkhazian
ADY                    Adyghe
AFK                    Afrikaans
AFR                    Afar
AGW                    Agaw
ALT                    Altai
AMH                    Amharic
ARA                    Arabic
ARI                    Aari
ARK                    Arakanese
ASM                    Assamese
ATH                    Athapaskan
AVR                    Avar
AWA                    Awadhi
AYM                    Aymara
AZE                    Azeri
BAD                    Badaga
BAG                    Baghelkhandi
BAL                    Balkar
BAU                    Baule
BBR                    Berber
BCH                    Bench
BCR                    Bible Cree
BEL                    Belarussian
BEM                    Bemba
BEN                    Bengali
BGR                    Bulgarian
BHI                    Bhili
BHO                    Bhojpuri
BIK                    Bikol
BIL                    Bilen
BKF                    Blackfoot
BLI                    Balochi
BLN                    Balante
BLT                    Balti
BMB                    Bambara
BML                    Bamileke
BRE                    Breton
BRH                    Brahui
BRI                    Braj Bhasha
BRM                    Burmese
BSH                    Bashkir
BTI                    Beti
CAT                    Catalan
CEB                    Cebuano
CHE                    Chechen
CHG                    Chaha Gurage
CHH                    Chattisgarhi
CHI                    Chichewa
CHK                    Chukchi
CHP                    Chipewyan
CHR                    Cherokee
CHU                    Chuvash
CMR                    Comorian
COP                    Coptic
CRE                    Cree
CRR                    Carrier
CRT                    Crimean Tatar
CSL                    Church Slavonic
CSY                    Czech
DAN                    Danish
DAR                    Dargwa
DCR                    Woods Cree
DEU                    German (Standard)
DGR                    Dogri
DHV                    Dhivehi
DJR                    Djerma
DNG                    Dangme
DNK                    Dinka
DUN                    Dungan
DZN                    Dzongkha
EBI                    Ebira
ECR                    Eastern Cree
EDO                    Edo
EFI                    Efik
ELL                    Greek
ENG                    English
ERZ                    Erzya
ESP                    Spanish
ETI                    Estonian
EUQ                    Basque
EVK                    Evenki
EVN                    Even
EWE                    Ewe
FAN                    French Antillean
FAR                    Farsi
FIN                    Finnish
FJI                    Fijian
FLE                    Flemish
FNE                    Forest Nenets
FON                    Fon
FOS                    Faroese
FRA                    French (Standard)
FRI                    Frisian
FRL                    Friulian
FTA                    Futa
FUL                    Fulani
GAD                    Ga
GAE                    Gaelic
GAG                    Gagauz
GAL                    Galician
GAR                    Garshuni
GAW                    Garhwali
GEZ                    Ge'ez
GIL                    Gilyak
GMZ                    Gumuz
GON                    Gondi
GRN                    Greenlandic
GRO                    Garo
GUA                    Guarani
GUJ                    Gujarati
HAI                    Haitian
HAL                    Halam
HAR                    Harauti
HAU                    Hausa
HAW                    Hawaiian
HBN                    Hammer-Banna
HIL                    Hiligaynon
HIN                    Hindi
HMA                    High Mari
HND                    Hindko
HO                     Ho
HRI                    Harari
HRV                    Croatian
HUN                    Hungarian
HYE                    Armenian
IBO                    Igbo
IJO                    Ijo
ILO                    Ilokano
IND                    Indonesian
ING                    Ingush
INU                    Inuktitut
IRI                    Irish
IRT                    Irish Traditional
ISL                    Icelandic
ISM                    Inari Sami
ITA                    Italian
IWR                    Hebrew
JAV                    Javanese
JII                    Yiddish
JAN                    Japanese
JUD                    Judezmo
JUL                    Jula
KAB                    Kabardian
KAC                    Kachchi
KAL                    Kalenjin
KAN                    Kannada
KAR                    Karachay
KAT                    Georgian
KAZ                    Kazakh
KEB                    Kebena
KGE                    Khutsuri Georgian
KHA                    Khakass
KHK                    Khanty-Kazim
KHM                    Khmer
KHS                    Khanty-Shurishkar
KHV                    Khanty-Vakhi
KHW                    Khowar
KIK                    Kikuyu
KIR                    Kirghiz
KIS                    Kisii
KKN                    Kokni
KLM                    Kalmyk
KMB                    Kamba
KMN                    Kumaoni
KMO                    Komo
KMS                    Komso
KNR                    Kanuri
KOD                    Kodagu
KOK                    Konkani
KON                    Kikongo
KOP                    Komi-Permyak
KOR                    Korean
KOZ                    Komi-Zyrian
KPL                    Kpelle
KRI                    Krio
KRK                    Karakalpak
KRL                    Karelian
KRM                    Karaim
KRN                    Karen
KRT                    Koorete
KSH                    Kashmiri
KSI                    Khasi
KSM                    Kildin Sami
KUI                    Kui
KUL                    Kulvi
KUM                    Kumyk
KUR                    Kurdish
KUU                    Kurukh
KUY                    Kuy
KYK                    Koryak
LAD                    Ladin
LAH                    Lahuli
LAK                    Lak
LAM                    Lambani
LAO                    Lao
LAT                    Latin
LAZ                    Laz
LCR                    L-Cree
LDK                    Ladakhi
LEZ                    Lezgi
LIN                    Lingala
LMA                    Low Mari
LMB                    Limbu
LMW                    Lomwe
LSB                    Lower Sorbian
LSM                    Lule Sami
LTH                    Lithuanian
LUB                    Luba
LUG                    Luganda
LUH                    Luhya
LUO                    Luo
LVI                    Latvian
MAJ                    Majang
MAK                    Makua
MAL                    Malayalam Traditional
MAN                    Mansi
MAR                    Marathi
MAW                    Marwari
MBN                    Mbundu
MCH                    Manchu
MCR                    Moose Cree
MDE                    Mende
MEN                    Me'en
MIZ                    Mizo
MKD                    Macedonian
MLE                    Male
MLG                    Malagasy
MLN                    Malinke
MLR                    Malayalam Reformed
MLY                    Malay
MND                    Mandinka
MNG                    Mongolian
MNI                    Manipuri
MNK                    Maninka
MNX                    Manx Gaelic
MOK                    Moksha
MOL                    Moldavian
MON                    Mon
MRI                    Maori
MTH                    Maithili
MTS                    Maltese
MUN                    Mundari
NAG                    Naga-Assamese
NAN                    Nanai
NAS                    Naskapi
NCR                    N-Cree
NDB                    Ndebele
NDG                    Ndonga
NEP                    Nepali
NEW                    Newari
NHC                    Norway House Cree
NIS                    Nisi
NIU                    Niuean
NKL                    Nkole
NLD                    Dutch
NOG                    Nogai
NOR                    Norwegian
NSM                    Northern Sami
NTA                    Northern Tai
NTO                    Esperanto
NYN                    Nynorsk
OCR                    Oji-Cree
OJB                    Ojibway
ORI                    Oriya
ORO                    Oromo
OSS                    Ossetian
PAA                    Palestinian Aramaic
PAL                    Pali
PAN                    Punjabi
PAP                    Palpa
PAS                    Pashto
PGR                    Polytonic Greek
PIL                    Pilipino
PLG                    Palaung
PLK                    Polish
PRO                    Provencal
PTG                    Portuguese
QIN                    Chin
RAJ                    Rajasthani
RCR                    R-Cree
RBU                    Russian Buriat
RIA                    Riang
RMS                    Rhaeto-Romanic
ROM                    Romanian
ROY                    Romany
RSY                    Rusyn
RUA                    Ruanda
RUS                    Russian
SAD                    Sadri
SAN                    Sanskrit
SAT                    Santali
SAY                    Sayisi
SEK                    Sekota
SEL                    Selkup
SGO                    Sango
SHN                    Shan
SIB                    Sibe
SID                    Sidamo
SIG                    Silte Gurage
SKS                    Skolt Sami
SKY                    Slovak
SLA                    Slavey
SLV                    Slovenian
SML                    Somali
SMO                    Samoan
SNA                    Sena
SND                    Sindhi
SNH                    Sinhalese
SNK                    Soninke
SOG                    Sodo Gurage
SOT                    Sotho
SQI                    Albanian
SRB                    Serbian
SRK                    Saraiki
SRR                    Serer
SSL                    South Slavey
SSM                    Southern Sami
SUR                    Suri
SVA                    Svan
SVE                    Swedish
SWA                    Swadaya Aramaic
SWK                    Swahili
SWZ                    Swazi
SXT                    Sutu
SYR                    Syriac
TAB                    Tabasaran
TAJ                    Tajiki
TAM                    Tamil
TAT                    Tatar
TCR                    TH-Cree
TEL                    Telugu
TGN                    Tongan
TGR                    Tigre
TGY                    Tigrinya
THA                    Thai
THT                    Tahitian
TIB                    Tibetan
TKM                    Turkmen
TMN                    Temne
TNA                    Tswana
TNE                    Tundra Nenets
TNG                    Tonga
TOD                    Todo
TRK                    Turkish
TSG                    Tsonga
TUA                    Turoyo Aramaic
TUL                    Tulu
TUV                    Tuvin
TWI                    Twi
UDM                    Udmurt
UKR                    Ukrainian
URD                    Urdu
USB                    Upper Sorbian
UYG                    Uyghur
UZB                    Uzbek
VEN                    Venda
VIT                    Vietnamese
WA                     Wa
WAG                    Wagdi
WCR                    West-Cree
WEL                    Welsh
WLF                    Wolof
XHS                    Xhosa
YAK                    Yakut
YBA                    Yoruba
YCR                    Y-Cree
YIC                    Yi Classic
YIM                    Yi Modern
ZHP                    Chinese Phonetic
ZHS                    Chinese Simplified
ZHT                    Chinese Traditional
ZND                    Zande
ZUL                    Zulu


Baseline tags

This section defines the standard OpenType Layout baseline tags that Microsoft supports. A registered baseline tag has a specific meaning when used in the horizontal writing direction (in the 'BASE' table's HorizAxis table), the vertical writing direction (in the 'BASE' table's VertAxis table), or both, and conveys information to font users about a baseline's use. For example, the "romn" baseline tag is commonly used to identify the baseline for laying out Latin text in the horizontal direction, the vertical direction, or both. For compatibility and ease of use, Microsoft encourages font developers to use registered baseline tags.

This version of the Tag Registry identifies the baselines that Microsoft has implemented to date. All baseline tags are 4-byte character strings composed of a limited set of ASCII characters in the 0x20-0x7E range. Baseline tags consist of four lowercase letters.


"hang"
    Baseline for HorizAxis: The hanging baseline. This is the horizontal line from which syllables seem to hang in Indic scripts similar to Devanagari.
    Baseline for VertAxis: The hanging baseline (which now appears vertical) for Indic characters rotated 90 degrees clockwise for vertical writing mode.

"icfb"
    Baseline for HorizAxis: Ideographic character face bottom edge baseline. (See section Ideographic Character Face below for usage.)
    Baseline for VertAxis: Ideographic character face left edge baseline. (See section Ideographic Character Face below for usage.)

"icft"
    Baseline for HorizAxis: Ideographic character face top edge baseline. (See section Ideographic Character Face below for usage.)
    Baseline for VertAxis: Ideographic character face right edge baseline. (See section Ideographic Character Face below for usage.)

"ideo"
    Baseline for HorizAxis: Ideographic em-box bottom edge baseline. (See section Ideographic Em-Box below for usage.)
    Baseline for VertAxis: Ideographic em-box left edge baseline. If this tag is present in the VertAxis, the value must be set to 0. (See section Ideographic Em-Box below for usage.)

"idtp"
    Baseline for HorizAxis: Ideographic em-box top edge baseline. (See section Ideographic Em-Box below for usage.)
    Baseline for VertAxis: Ideographic em-box right edge baseline. If this tag is present in the VertAxis, the value is strongly recommended to be set to head.unitsPerEm. (See section Ideographic Em-Box below for usage.)

"math"
    Baseline for HorizAxis: The baseline about which mathematical characters are centered.
    Baseline for VertAxis: The baseline about which mathematical characters, when rotated 90 degrees clockwise for vertical writing mode, are centered.

"romn"
    Baseline for HorizAxis: The baseline used by simple alphabetic scripts such as Latin, Cyrillic and Greek.
    Baseline for VertAxis: The alphabetic baseline for characters rotated 90 degrees clockwise for vertical writing mode. (This would not apply to alphabetic characters that remain upright in vertical writing mode, since these characters are not rotated.)

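The VertAxis constraints in the table lend themselves to a simple check. In this Python sketch, a BASE axis is modeled as a plain dictionary from baseline tag to coordinate in font units; that representation, and the name `check_vert_axis`, are assumptions for illustration and do not reflect how the 'BASE' table is actually encoded. The idtp test reports a strong recommendation, not a hard requirement.

```python
# Baseline tags registered in this version of the registry.
REGISTERED_BASELINE_TAGS = frozenset({"hang", "icfb", "icft", "ideo", "idtp", "math", "romn"})


def check_vert_axis(vert_axis: dict, units_per_em: int) -> list:
    """Return a list of problems found in VertAxis baseline values.

    Hypothetical helper: vert_axis maps baseline tags to coordinates in
    font units, e.g. {"ideo": 0, "idtp": 1000}.
    """
    problems = []
    for tag in sorted(vert_axis):
        if tag not in REGISTERED_BASELINE_TAGS:
            problems.append(f"unregistered baseline tag {tag!r}")
    # VertAxis.ideo, if present, must be 0.
    if vert_axis.get("ideo", 0) != 0:
        problems.append("VertAxis.ideo must be 0")
    # VertAxis.idtp, if present, is strongly recommended to equal head.unitsPerEm.
    if "idtp" in vert_axis and vert_axis["idtp"] != units_per_em:
        problems.append("VertAxis.idtp should equal head.unitsPerEm")
    return problems
```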

Ideographic Em-Box

[ The notation <Axis>.<Baseline Tag> is used in the following description to mean the baseline tag as defined in the specified axis. For example, HorizAxis.ideo means the ideo baseline tag as defined in the HorizAxis of the BASE table. See above for a list of registered baseline tags. ]

A font's ideographic em-box is the rectangle that defines a standard escapement around the full-width ideographic glyphs of the font, for both the horizontal and vertical writing directions. It is usually a square, but may be non-square as in the case of fonts used in Japanese newspaper layout that have a vertically condensed design.

The left, right, top and bottom edges of the ideographic em-box are to be determined as follows:

ideoEmboxLeft = 0

If HorizAxis.ideo defined:
    ideoEmboxBottom = HorizAxis.ideo
    If HorizAxis.idtp defined:
        ideoEmboxTop = HorizAxis.idtp
    Else:
        ideoEmboxTop = HorizAxis.ideo + head.unitsPerEm
    If VertAxis.idtp defined:
        ideoEmboxRight = VertAxis.idtp
    Else:
        ideoEmboxRight = head.unitsPerEm
    If VertAxis.ideo defined and non-zero:
        Warning: Bad VertAxis.ideo value
Else If this is a CJK font:
    ideoEmboxBottom = OS/2.sTypoDescender
    ideoEmboxTop = OS/2.sTypoAscender
    ideoEmboxRight = head.unitsPerEm
Else:
    ideoEmbox cannot be determined for this font
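The em-box procedure can be expressed as a small Python function. This is a sketch under the assumption that the caller has already read the relevant values (BASE baselines, head.unitsPerEm, the OS/2 typographic ascender and descender) and decided whether the font is CJK; the names `ideo_embox` and `EmBox` are hypothetical, and Python's `warnings` module stands in for the warning step.

```python
import warnings
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class EmBox:
    """Ideographic em-box edges in font units (hypothetical type)."""
    left: int
    bottom: int
    right: int
    top: int


def ideo_embox(
    horiz: Dict[str, int],  # BASE HorizAxis baselines, e.g. {"ideo": -120}
    vert: Dict[str, int],   # BASE VertAxis baselines
    units_per_em: int,      # head.unitsPerEm
    typo_descender: int,    # OS/2.sTypoDescender
    typo_ascender: int,     # OS/2.sTypoAscender
    is_cjk: bool,           # decided by the caller, e.g. from OS/2.ulUnicodeRange
) -> Optional[EmBox]:
    """Return the ideographic em-box, or None if it cannot be determined."""
    if "ideo" in horiz:
        bottom = horiz["ideo"]
        # Without idtp, the em-box is taken to be one em tall.
        top = horiz.get("idtp", bottom + units_per_em)
        right = vert.get("idtp", units_per_em)
        if vert.get("ideo", 0) != 0:
            warnings.warn("Bad VertAxis.ideo value")
        return EmBox(left=0, bottom=bottom, right=right, top=top)
    if is_cjk:
        # Fall back to the OS/2 typographic metrics.
        return EmBox(left=0, bottom=typo_descender, right=units_per_em, top=typo_ascender)
    return None  # the em-box cannot be determined for this font
```

With the Kozuka Mincho value from the example below (HorizAxis.ideo = -120 on a 1000-unit em), the function returns an em-box with bottom -120, top 880, and right 1000.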

Determining whether a font is CJK (Chinese, Japanese, or Korean) or not, as in the second-last "Else" clause above, can be done by checking the CJK-related bits of the OS/2.ulUnicodeRange fields.
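One way to perform this check is sketched below in Python. The set of bits treated as CJK-related is an assumption made for this example, since the registry does not list them; the bit numbers follow the OS/2 ulUnicodeRange assignments, where bits 32-63 are stored in ulUnicodeRange2. The function name `is_cjk_font` is hypothetical.

```python
from typing import Sequence

# OS/2.ulUnicodeRange bit numbers treated as CJK-related in this sketch.
# The selection is an assumption; adjust it to the policy you need.
CJK_UNICODE_RANGE_BITS = (
    48,  # CJK Symbols and Punctuation
    49,  # Hiragana
    50,  # Katakana
    51,  # Bopomofo
    52,  # Hangul Compatibility Jamo
    54,  # Enclosed CJK Letters and Months
    55,  # CJK Compatibility
    56,  # Hangul Syllables
    59,  # CJK Unified Ideographs
)


def is_cjk_font(ul_unicode_range: Sequence[int]) -> bool:
    """ul_unicode_range holds ulUnicodeRange1..4 as four 32-bit integers."""
    for bit in CJK_UNICODE_RANGE_BITS:
        field, offset = divmod(bit, 32)  # e.g. bit 59 is bit 27 of ulUnicodeRange2
        if (ul_unicode_range[field] >> offset) & 1:
            return True
    return False
```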

Note that font designers can specify a HorizAxis.ideo baseline in their non-CJK fonts; this can be used by applications when aligning the font with an ideographic font used on the same line of text, when the user has specified ideographic em-box alignment.

The ideographic em-box center baseline is defined as halfway between the ideographic em-box top and bottom baselines in the horizontal axis, and halfway between the ideographic em-box left and right baselines in the vertical axis. These center baselines are defined in whole character units. The division used in the calculation must round to the character unit nearest 0 if needed. Thus, for maximal precision of center baseline placement, vendors should ensure that opposite edges of the ideographic em-box are an even number of character units apart.
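The rounding rule can be illustrated with a short Python helper. It interprets "round to the character unit nearest 0" as truncation toward zero; the name `center_baseline` is hypothetical. Because Python's `//` operator rounds toward negative infinity, negative sums are handled separately.

```python
def center_baseline(low: int, high: int) -> int:
    """Halfway point between two edges in font units, truncated toward zero."""
    total = low + high
    # // floors toward negative infinity, so negate negative sums first
    # to obtain truncation toward zero.
    return total // 2 if total >= 0 else -((-total) // 2)
```

For the Kozuka Mincho em-box in the example below, `center_baseline(-120, 880)` gives 380; an odd span such as `center_baseline(-121, 880)` gives 379, which is why an even span is preferred.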

Example:

The values of the ideographic baseline tags for the Kozuka Mincho font family (designed on a 1000-unit em) are:

HorizAxis.ideo = -120; HorizAxis.idtp = 880.
Since this describes a square ideographic em-box, it is sufficient to record only the following:
HorizAxis.ideo = -120.
If HorizAxis.ideo is not present, then the following will be used for the ideographic em-box bottom and top, since this is a CJK font:
OS/2.sTypoDescender = -120; OS/2.sTypoAscender = 880.

Compatibility notes:

  1. Most applications expect full-width ideographs in a CJK font to be exactly one em wide, so it is strongly recommended that VertAxis.idtp, if present, be set to head.unitsPerEm. (The idtp baseline tag was introduced in OpenType 1.3.)

  2. While the OpenType specification allows for CJK fonts' OS/2.sTypoDescender and OS/2.sTypoAscender fields to specify metrics different from the HorizAxis.ideo and HorizAxis.idtp in the 'BASE' table, CJK font developers should be aware that existing applications may not read the 'BASE' table at all but simply use the OS/2.sTypoDescender and OS/2.sTypoAscender fields to describe the bottom and top edges of the ideographic em-box. If developers want their fonts to work correctly with such applications, they should ensure that any ideographic em-box values in the 'BASE' table of their CJK fonts describe the same bottom and top edges as the OS/2.sTypoDescender and OS/2.sTypoAscender fields.

  3. Applications on platforms other than Windows that don't parse the 'OS/2' table won't have access to the OS/2.sTypoDescender and OS/2.sTypoAscender fields, since these metrics are currently exposed only through Windows APIs. Thus, CJK fonts will typically have the same descender value recorded in hhea.Descender, OS/2.sTypoDescender, and HorizAxis.ideo (if present), and the same ascender value recorded in hhea.Ascender, OS/2.sTypoAscender, and HorizAxis.idtp (if present).

See the section "OpenType CJK Font Guidelines" for more information about constructing CJK fonts.


Ideographic Character Face

[ The notation <Axis>.<Baseline Tag> is used in the following description to mean the baseline tag as defined in the specified axis. For example, HorizAxis.icfb means the icfb baseline tag as defined in the HorizAxis of the BASE table. See above for a list of registered baseline tags. ]

The ideographic character face (ICF), also known as the average character face (ACF), specifies the approximate bounding box of the full-width ideographic and kana glyphs in a CJK font. (This is different from the FontBBox, as described in the PostScript programming language, which is the bounding box of all glyphs in the font.) In Japanese, the term for ICF is heikin jizura.

It is typically expressed as a percentage that represents the ratio of the length of an ICF box edge to the length of an ideographic em-box edge, and is conceptualized as a square centered within the ideographic em-box. However, in OpenType, the ICF box's left, bottom, right, and top edges are specified as the VertAxis.icfb, HorizAxis.icfb, VertAxis.icft, and HorizAxis.icft baselines, respectively, thus giving font designers the flexibility to specify a non-square and/or non-centered ICF box.

Font designers should set the values of the ICF box edges according to how tight or loose they want the font to appear when text is set with no tracking or kerning (beta gumi in Japanese). The part of the ideographic em-box left over around the ICF box therefore serves as the default escapement of the font.

Applications can use the ICF box as an alignment tool, to ensure that glyphs touch the edges of the text frame and that page objects are visually aligned to text edges. It is also useful for aligning glyphs of different sizes on the same line. In the traditional Japanese paper-based workflow, the ICF box was often used for these purposes. It provides optically aligned results that are superior to those obtained with the ideographic em-box.

HorizAxis.icfb is the minimum piece of information required to define the ICF in a CJK font. First, the ideographic em-box dimensions must be calculated as described in the section "Ideographic Em-Box" above. The ICF edges are then calculated in the following order:

If HorizAxis.icfb defined:
    icfBottom = HorizAxis.icfb
    margin = HorizAxis.icfb - ideoEmboxBottom
    If HorizAxis.icft defined:
        icfTop = HorizAxis.icft
    Else:
        icfTop = ideoEmboxTop - margin
    If VertAxis.icfb defined:
        icfLeft = VertAxis.icfb
    Else:
        icfLeft = margin
    If VertAxis.icft defined:
        icfRight = VertAxis.icft
    Else:
        icfRight = ideoEmboxRight - icfLeft
Else:
    ICF cannot be determined for this font
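The ICF procedure can likewise be sketched in Python, taking an already computed em-box as input. The names `icf_box` and `Box` are hypothetical, and the BASE axes are modeled as dictionaries of baseline values in font units, which is an assumption of this sketch rather than the table's real encoding.

```python
from typing import Dict, NamedTuple, Optional


class Box(NamedTuple):
    """Box edges in font units (hypothetical type)."""
    left: int
    bottom: int
    right: int
    top: int


def icf_box(horiz: Dict[str, int], vert: Dict[str, int], embox: Box) -> Optional[Box]:
    """Derive the ICF box from BASE baselines and a precomputed em-box."""
    if "icfb" not in horiz:
        return None  # ICF cannot be determined for this font
    bottom = horiz["icfb"]
    margin = horiz["icfb"] - embox.bottom
    top = horiz.get("icft", embox.top - margin)
    # The default left edge equals the margin; this relies on the
    # em-box left edge being 0, as the em-box procedure defines it.
    left = vert.get("icfb", margin)
    right = vert.get("icft", embox.right - left)
    return Box(left, bottom, right, top)
```

Applied to the Kozuka Mincho Extra Light value HorizAxis.icfb = -79 with the em-box (0, -120, 1000, 880), the sketch yields left 41, bottom -79, right 959, and top 839, matching the full set of values listed in the example below.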

For the last case above, i.e. fonts that don't have ICF information in their 'BASE' table, an application may choose to apply a heuristic such as calculating the bounding box of some or all of the ideographic and kana glyphs, and then averaging its margin with the ideographic em-box.

The ICF center baseline is defined as halfway between the ICF top and bottom baselines in the horizontal axis, and halfway between the ICF left and right baselines in the vertical axis. These center baselines are defined in whole character units. The division used in the calculation must round to the character unit nearest 0 if needed. Thus, for maximal precision of center baseline placement, vendors should ensure that opposite edges of the ICF box are an even number of character units apart.

Example:

The values of the ICF baselines for the Extra Light and Heavy weights of the Kozuka Mincho font family (designed on a 1000-unit em, with ideographic em-box as given in the example in the previous section) are:

Kozuka Mincho Extra Light:
VertAxis.icfb = 41; HorizAxis.icfb = -79;
VertAxis.icft = 959; HorizAxis.icft = 839.
Since this describes a square ICF centered in a square ideographic em-box, it is sufficient to record only the following:
HorizAxis.icfb = -79.

Kozuka Mincho Heavy:
VertAxis.icfb = 26; HorizAxis.icfb = -94;
VertAxis.icft = 974; HorizAxis.icft = 854.
It is sufficient to record only:
HorizAxis.icfb = -94.

It is strongly recommended that each edge of the ICF box be equidistant from the corresponding edge of the ideographic em-box, because this makes results more predictable in applications that use these values. That is, for fonts based on a square ideographic em-box, the ICF box should be a centered square.
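A font tool could verify this recommendation with a check like the following Python sketch, which assumes both boxes are given as (left, bottom, right, top) tuples in font units; the function name `icf_is_equidistant` is hypothetical.

```python
from typing import Tuple

Edges = Tuple[int, int, int, int]  # (left, bottom, right, top) in font units


def icf_is_equidistant(embox: Edges, icf: Edges) -> bool:
    """True if every ICF edge is inset from the em-box by the same amount."""
    e_left, e_bottom, e_right, e_top = embox
    i_left, i_bottom, i_right, i_top = icf
    insets = {
        i_left - e_left,      # left inset
        i_bottom - e_bottom,  # bottom inset
        e_right - i_right,    # right inset
        e_top - i_top,        # top inset
    }
    return len(insets) == 1
```

The Kozuka Mincho Extra Light values from the example above pass this check, with a uniform inset of 41 units.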

See the section "OpenType CJK Font Guidelines" for more information about constructing CJK fonts.


Feature tags

Features provide information about how to use the glyphs in a font to render a script or language. For example, an Arabic font might have a feature for substituting initial glyph forms, and a Kanji font might have a feature for positioning glyphs vertically. All OpenType Layout features define data for glyph substitution, glyph positioning, or both.

Each OpenType Layout feature has a feature tag that identifies its typographic function and effects. By examining a feature's tag, a text-processing client can determine what a feature does and decide whether to implement it. All tags are 4-byte character strings composed of a limited set of ASCII characters in the 0x20-0x7E range. Microsoft-registered feature tags use four lowercase letters. For instance, the "mark" feature manages the placement of diacritical marks, and the "swsh" feature renders swash glyphs.

This version of the Tag Registry describes all the OpenType Layout features Microsoft has developed to date. It also includes details that identify the lookups that Microsoft uses to implement each feature. Lookup information is provided for reference purposes only; the set of lookups used to implement a feature will vary across system platforms, applications, fonts, and font developers.

A feature definition may not provide all the information required to properly implement glyph substitution or positioning actions. In many cases, a text-processing client may need to supply additional data. For example, the function of the "init" feature is to provide initial glyph forms. Nothing in the feature's lookup tables indicates when or where to apply this feature during text processing. To correctly use the "init" feature in Arabic text where initial glyph forms appear at the beginning of words, text-processing clients must be able to identify the first glyph position in each word before making the glyph substitution. In all cases, the text-processing client is responsible for applying, combining, and arbitrating among features and rendering the result.

The tag space defined by tags consisting of four uppercase letters (A-Z) with no punctuation, spaces, or numbers, is reserved as a vendor space. Font vendors may use such tags to identify private features. For example, the feature tag "PKRN" might designate a private feature that may be used to kern punctuation marks. Microsoft does not guarantee the compatibility or usability of private features, and it cannot ensure that two font vendors will not choose the same tag for a private feature.
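The naming conventions above can be summarized in a small Python classifier. It is a sketch: it only inspects the form of a tag, so a result of "registered-style" means the tag uses four lowercase letters as the registered tags in this version of the registry do, not that it actually appears in the registry. The function name and category labels are hypothetical.

```python
def classify_feature_tag(tag: str) -> str:
    """Classify a feature tag by the naming conventions of this registry."""
    # Every tag is exactly four characters from the printable ASCII range 0x20-0x7E.
    if len(tag) != 4 or not all(0x20 <= ord(ch) <= 0x7E for ch in tag):
        return "invalid"
    # Microsoft-registered feature tags use four lowercase letters.
    if all("a" <= ch <= "z" for ch in tag):
        return "registered-style"
    # Four uppercase letters A-Z form the reserved vendor space.
    if all("A" <= ch <= "Z" for ch in tag):
        return "vendor-private"
    return "other"
```

For example, "swsh" is classified as registered-style and "PKRN" as vendor-private, while "PKR1" falls outside both conventions.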


To register features

Microsoft encourages font developers to use registered feature tags when implementing registered features. However, font developers also may define and register their own features.

Microsoft welcomes nominations for new features and feature tags to register. To qualify for registration, a feature must have a single function that is clearly identified by its tag. The function of the feature should be defined at the lowest useful level and must be distinctly different from the functions of currently registered features. When font developers register feature tags and functions with Microsoft, they do not have to supply implementation details.

Microsoft reserves the right to officially assign feature tags in the Microsoft Tag Registry. Although Microsoft has reserved the feature and feature tag definitions listed here, Microsoft fonts do not contain all of the features.

Registered features



this page was last updated 22 March 2001
© 2001 Microsoft Corporation. All rights reserved. Terms of use.
comments to the MST group: ttwsite@microsoft.com

 
