Exploring IICore—Part 1


Adobe

Today’s article is the very first one that references IICore (International Ideographs Core), which is best described as a region-agnostic subset that includes the most commonly used CJK Unified Ideographs in Unicode, and is intended for use in memory-challenged devices and environments. Included are 9,810 ideographs, the bulk of which are in the URO (9,706), with the remaining ones in Extensions A (42) and B (62).
IICore is instantiated as the kIICore property of the Unihan Database, and documented in UAX #38. The kIICore property values consist of an initial letter—A, B, or C—that indicates priority, followed by one or more letters that specify a source that more or less corresponds to a region: G, H, J, K, M, P (short for KP), and T.
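
To make the value format concrete, here is a minimal sketch that splits a kIICore value into its priority letter and its set of source letters. The regular expression is an assumption based only on the description above, and the helper names `parse_kiicore` and `is_tagged` are hypothetical, not part of any Unicode tooling.

```python
import re

# Assumed shape, taken from the prose description: one priority letter
# (A, B, or C) followed by one or more source letters.
KIICORE_RE = re.compile(r"^([ABC])([GHJKMPT]+)$")


def parse_kiicore(value: str) -> tuple[str, frozenset[str]]:
    """Split a kIICore value into (priority, set of source letters)."""
    match = KIICORE_RE.match(value)
    if match is None:
        raise ValueError(f"not a kIICore value: {value!r}")
    priority, sources = match.group(1), match.group(2)
    if len(set(sources)) != len(sources):
        raise ValueError(f"repeated source letter in {value!r}")
    return priority, frozenset(sources)


def is_tagged(value: str, source: str) -> bool:
    """Return True if the value lists the given source letter."""
    return source in parse_kiicore(value)[1]


if __name__ == "__main__":
    print(parse_kiicore("ATJHKMP"))
    print(is_tagged("ATJHKMP", "K"), is_tagged("AP", "K"))
```

Keeping the priority separate from the sources matters here because "tagged K" always refers to the source letters, never to the first character.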

In Part 1 of what may eventually become a multiple-part series about IICore, I will briefly explore the ideographs that are tagged “K” for Korean use, and point out some that, judging from the mappings to the KS X 1001 standard, should have been tagged “K” but were not.
A total of 4,744 ideographs are tagged “K” in their kIICore property values. Of these, 138 are outside of KS X 1001. We’ll come back to them at the end of this article.
Curiously, all but 14 of the 4,620 ideographs in the KS X 1001 standard are tagged “K” in their kIICore property values. Those 14 are not tagged “K”, yet are still included in IICore via other sources. The table below lists them and their kIICore property values, along with a related ideograph, if any:

| Ideograph | kIICore | Related Ideograph | kIICore (Related) |
|---|---|---|---|
| 塞 U+585E | AGTJHMP | 塞 U+F96C | n/a |
| 奬 U+596C | AP | 獎 U+734E | ATHKM |
| 復 U+5FA9 | ATJHMP | 復 U+F966 | n/a |
| 慄 U+6144 | ATJHMP | 慄 U+F9D9 | n/a |
| 戀 U+6200 | ATHMP | 戀 U+F990 | n/a |
| 撚 U+649A | ATJHMP | 撚 U+F991 | n/a |
| 栗 U+6817 | AGTJHMP | 栗 U+F9DA | n/a |
| 渗 U+6E17 | AG | 滲 U+6EF2 | ATJHKMP |
| 耉 U+8009 | AP | 耈 U+8008 | CK |
| 胄 U+80C4 | AGTJP | 冑 U+5191 | ATJHK |
| 詰 U+8A70 | ATJHMP | NONE | n/a |
| 諾 U+8AFE | ATJHMP | 諾 U+F95D | n/a |
| 輦 U+8F26 | ATJHP | 輦 U+F998 | n/a |
| 默 U+9ED8 | AGTHMP | 黙 U+9ED9 | AJK |

Eight of the ideographs can be explained by guessing that an initial version of IICore may have included the corresponding CJK Compatibility Ideographs, which were subsequently stripped out. For another five, the related ideographs (U+734E 獎, U+6EF2 滲, U+8008 耈, U+5191 冑, and U+9ED9 黙) were apparently the preferred code points in the very popular HWP (Hangul Word Processor) app, which was likely used to enter the ideographs by those who compiled the list for Korea (ROK), so the “K” tag ended up on those instead. The only plausible explanation for U+8A70 詰 seems to be that it happens to be the very last hanja (aka ideograph) in the KS X 1001 standard, and may have fallen victim to an inadvertent off-by-one error.
The obvious fix here is to simply tag the 14 characters in the left column of the table with “K” in their kIICore property values, which will make KS X 1001 support complete. The best part is that it will not change the number of ideographs in IICore.
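
The proposed fix can be sketched as a small transformation over the 14 values from the left column of the table. This is only an illustration: the helper `add_source` is hypothetical, and the source-letter order `GTJHKMP` is an assumption inferred from the values quoted in this article, not something taken from UAX #38.

```python
# Source-letter order inferred from the values quoted in this article
# (for example "ATJHKMP" and "AGTJHMP"); treat it as an assumption.
SOURCE_ORDER = "GTJHKMP"


def add_source(value: str, source: str = "K") -> str:
    """Return the kIICore value with the given source letter added."""
    priority, sources = value[0], set(value[1:])
    sources.add(source)
    # Rebuild the source letters in the assumed canonical order.
    return priority + "".join(c for c in SOURCE_ORDER if c in sources)


# Code points and kIICore values from the left column of the table.
UNTAGGED = {
    0x585E: "AGTJHMP", 0x596C: "AP",     0x5FA9: "ATJHMP",
    0x6144: "ATJHMP",  0x6200: "ATHMP",  0x649A: "ATJHMP",
    0x6817: "AGTJHMP", 0x6E17: "AG",     0x8009: "AP",
    0x80C4: "AGTJP",   0x8A70: "ATJHMP", 0x8AFE: "ATJHMP",
    0x8F26: "ATJHP",   0x9ED8: "AGTHMP",
}

FIXED = {cp: add_source(value) for cp, value in UNTAGGED.items()}

if __name__ == "__main__":
    for cp, value in FIXED.items():
        print(f"U+{cp:04X}\t{UNTAGGED[cp]} -> {value}")
```

Because the fix only rewrites the values of entries that already exist, the mapping has the same 14 keys before and after, which mirrors the point that the size of IICore stays unchanged.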
Going back to the 138 ideographs outside of KS X 1001 that are tagged “K” in their kIICore property values, it turns out that the following seven do not have a kIRG_KSource property value, which raises the proverbial red flag:

| Ideograph | kIICore | Source References |
|---|---|---|
| 媴 U+5AB4 | CK | G5-4047, HB2-DD43, T2-4249 |
| 琟 U+741F | CK | G3-3F59, H-98CA, KP1-5945, T3-3D35 |
| 璤 U+74A4 | CK | GE-3354, H-FC71, T3-6567 |
| 璸 U+74B8 | BTK | G3-3F71, HB2-F040, KP1-59CB, T2-622D |
| 砇 U+7807 | CK | G5-577A, KP1-5FAC, T3-2E3B |
| 穦 U+7A66 | CK | GE-3642, KP1-62B1, T3-5A65 |
| 黙 U+9ED9 | AJK | GE-4874, J0-4C5B, T4-5560 |
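
The red-flag check described above can be sketched roughly as follows, assuming input in the tab-separated layout of the Unihan data files (code point, property name, value, with `#` comment lines). The sample lines reuse values from the table, except for the U+734E kIRG_KSource entry, whose value is a placeholder rather than real data. The function names are hypothetical.

```python
from collections import defaultdict

# Unihan-style lines: code point, property, value, separated by tabs.
SAMPLE_LINES = [
    "# comment lines start with '#'",
    "U+5AB4\tkIICore\tCK",
    "U+5AB4\tkIRG_GSource\tG5-4047",
    "U+5AB4\tkIRG_HSource\tHB2-DD43",
    "U+5AB4\tkIRG_TSource\tT2-4249",
    "U+9ED9\tkIICore\tAJK",
    "U+9ED9\tkIRG_GSource\tGE-4874",
    "U+9ED9\tkIRG_JSource\tJ0-4C5B",
    "U+9ED9\tkIRG_TSource\tT4-5560",
    "U+734E\tkIICore\tATHKM",
    "U+734E\tkIRG_KSource\tK0-XXXX",  # placeholder, NOT the real value
]


def load_properties(lines):
    """Collect {code point: {property: value}} from Unihan-style lines."""
    props = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        code_point, prop, value = line.split("\t", 2)
        props[int(code_point[2:], 16)][prop] = value
    return props


def k_tagged_without_k_source(props):
    """Code points tagged "K" in kIICore that lack a kIRG_KSource value."""
    return sorted(
        cp for cp, p in props.items()
        # Skip the first character: it is the priority, not a source.
        if "K" in p.get("kIICore", "")[1:] and "kIRG_KSource" not in p
    )


if __name__ == "__main__":
    flagged = k_tagged_without_k_source(load_properties(SAMPLE_LINES))
    print([f"U+{cp:04X}" for cp in flagged])  # ['U+5AB4', 'U+9ED9']
```

To run this against real data, the same functions could be fed the lines of the Unihan file that carries the kIICore and kIRG_KSource properties (Unihan_IRGSources.txt in recent releases, as far as I know).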

Unfortunately, the person who compiled the “K” portion of IICore passed away, so we may never know exactly why these ideographs were tagged “K” in their kIICore property values. Only U+9ED9 黙, which makes an appearance in both tables, can be explained by being the preferred code point for the HWP app.
🐡