Today’s article is the very first one that references IICore (International Ideographs Core), which is best described as a region-agnostic subset that includes the most commonly used CJK Unified Ideographs in Unicode, and is intended for use in memory-challenged devices and environments. Included are 9,810 ideographs, the bulk of which are in the URO (9,706), with the remaining ones in Extensions A (42) and B (62).
IICore is instantiated as the kIICore property of the Unihan Database, and documented in UAX #38. The kIICore property values consist of an initial letter—A, B, or C—that indicates priority, followed by one or more letters that specify a source that more or less corresponds to a region: G, H, J, K, M, P (short for KP), and T.
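The value format described above can be sketched as a small parser. This is a minimal illustration, assuming the pattern is exactly one priority letter followed by one or more source letters. The name `parse_kiicore` and the sample value are hypothetical and are used here only for demonstration.

```python
import re

# Assumed pattern, taken from the prose description: one priority letter
# (A, B, or C) followed by one or more source letters.
KIICORE_RE = re.compile(r"^([ABC])([GHJKMPT]+)$")

def parse_kiicore(value):
    """Split a kIICore value into (priority, set of source letters)."""
    match = KIICORE_RE.match(value)
    if match is None:
        raise ValueError(f"unexpected kIICore value: {value!r}")
    priority, sources = match.groups()
    return priority, set(sources)

# Illustrative value only; check the Unihan Database for real entries.
priority, sources = parse_kiicore("AGTJHKMP")
print(priority, "K" in sources)
```

Returning the sources as a set makes the "is this ideograph tagged K?" question a simple membership test.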
In Part 1 of what may eventually become a multiple-part series about IICore, I will briefly explore the ideographs that are tagged “K” for Korean use, and point out some that should have been tagged “K” based on their mappings to the KS X 1001 standard.
A total of 4,744 ideographs are tagged “K” in their kIICore property values. Of these, 138 are outside of KS X 1001. We’ll come back to them at the end of this article.
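Counts like these could be reproduced with a short script. The sketch below assumes the Unihan data is available as tab-separated lines of code point, property name, and value, with `#` marking comments, and that the reader supplies a set of KS X 1001 code points built from a mapping table of their choice. The function names and the sample lines are invented for illustration and do not reflect real Unihan data.

```python
def read_property(lines, prop):
    """Collect {code point: value} for one property from Unihan-style lines."""
    values = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, name, value = line.split("\t", 2)
        if name == prop:
            values[int(code[2:], 16)] = value  # "U+4E00" -> 0x4E00
    return values

def k_tagged(iicore):
    """Code points whose source letters (after the priority letter) include K."""
    return {cp for cp, value in iicore.items() if "K" in value[1:]}

def split_by_ksx1001(tagged, ksx1001):
    """Partition K-tagged code points into those inside and outside KS X 1001."""
    return tagged & ksx1001, tagged - ksx1001

# Made-up sample lines and a made-up KS X 1001 set, for demonstration only.
sample = [
    "# comment line",
    "U+4E00\tkIICore\tAGTJHKMP",
    "U+4E01\tkIICore\tBGT",
    "U+4E02\tkIICore\tCK",
]
ksx1001 = {0x4E00, 0x4E01}

inside, outside = split_by_ksx1001(k_tagged(read_property(sample, "kIICore")), ksx1001)
print(len(inside), len(outside))
```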
Of the 4,744 ideographs tagged “K”, 4,744 minus 138 leaves 4,606 that are in KS X 1001, which means that 14 of the 4,620 ideographs in the KS X 1001 standard are not tagged “K” in their kIICore property values. Curiously, all 14 are nonetheless included in IICore. The table below lists them and their kIICore property values, along with a related ideograph, if any:
Eight of the ideographs can be explained by guessing that an initial version of IICore may have included the corresponding CJK Compatibility Ideographs, which were subsequently stripped out. Another five (U+734E 獎, U+6EF2 滲, U+8008 耈, U+5191 冑, and U+9ED9 黙) can be explained because they were apparently the preferred code points for the very popular HWP (Hangul Word Processor) app, which was likely used to enter the ideographs by those who compiled the list for Korea (ROK). The only possible explanation for U+8A70 詰 seems to be that it happens to be the very last hanja (aka ideograph) in the KS X 1001 standard, and may have fallen victim to an inadvertent off-by-one error.
The obvious fix here is to simply tag the 14 characters in the left column of the table with “K” in their kIICore property values, which will make KS X 1001 support complete. The best part is that it will not change the number of ideographs in IICore, because all 14 are already included.
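To illustrate why this fix leaves the size of IICore unchanged, here is a minimal sketch that adds the “K” tag to existing entries only. The helper names, the sample dictionary, and the choice to append “K” at the end of the value are all assumptions made for illustration. The actual ordering of source letters in the Unihan Database may differ.

```python
def add_k_tag(value):
    """Append K to the source letters of a kIICore value if it is missing."""
    return value if "K" in value[1:] else value + "K"

def tag_missing(iicore, ksx1001):
    """Return a copy of iicore in which every KS X 1001 member carries K."""
    return {
        cp: add_k_tag(value) if cp in ksx1001 else value
        for cp, value in iicore.items()
    }

# Made-up entries for demonstration; not real Unihan data.
iicore = {0x4E00: "AGTJHKMP", 0x4E01: "BGT", 0x4E02: "CJ"}
ksx1001 = {0x4E00, 0x4E01}

fixed = tag_missing(iicore, ksx1001)
print(len(iicore) == len(fixed), fixed[0x4E01])
```

Only existing values are rewritten and no keys are added, so the number of entries before and after is the same.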
Going back to the 138 ideographs outside of KS X 1001 that are tagged “K” in their kIICore property values, it turns out that the following seven do not have a kIRG_KSource property value, which raises the proverbial red flag:
G5-4047, HB2-DD43, T2-4249
G3-3F59, H-98CA, KP1-5945, T3-3D35
GE-3354, H-FC71, T3-6567
G3-3F71, HB2-F040, KP1-59CB, T2-622D
G5-577A, KP1-5FAC, T3-2E3B
GE-3642, KP1-62B1, T3-5A65
GE-4874, J0-4C5B, T4-5560
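A check like the one that surfaced these seven ideographs might look like the following sketch. It assumes tab-separated Unihan-style lines in which kIICore and kIRG_KSource values are both present. The function name and the sample lines are hypothetical and do not reflect real Unihan data.

```python
def k_tagged_without_ksource(lines):
    """List code points tagged K in kIICore that have no kIRG_KSource value."""
    tagged_k = set()
    has_ksource = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, name, value = line.split("\t", 2)
        if name == "kIICore" and "K" in value[1:]:
            tagged_k.add(code)
        elif name == "kIRG_KSource":
            has_ksource.add(code)
    # Sort numerically so code points beyond the BMP order correctly.
    return sorted(tagged_k - has_ksource, key=lambda c: int(c[2:], 16))

# Made-up sample lines, for demonstration only.
sample_lines = [
    "U+4E00\tkIICore\tAGTJHKMP",
    "U+4E00\tkIRG_KSource\tK0-0000",
    "U+4E02\tkIICore\tCK",
    "U+4E03\tkIICore\tBGT",
]
print(k_tagged_without_ksource(sample_lines))
```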
Unfortunately, the person who compiled the “K” portion of IICore passed away, so we may never know exactly why these ideographs were tagged “K” in their kIICore property values. Only U+9ED9 黙, which makes an appearance in both tables, can be explained by being the preferred code point for the HWP app.