Unicode

Unicode, also called thong-iōng-bé (通用碼) or bān-kok-bé (萬國碼; Mandarin: Wanguoma), is a character encoding standard. The word Unicode combines the English elements uni and code: uni means "universal" and code means "symbol code". A central idea of Unicode is to design a single encoding that can handle all of the world's writing systems.

Simply put, Unicode is an international standard. Its goal is to encode the characters used to write all of the world's languages. It maps each character to an integer, and this integer is called the character's code point. Text can thus be converted into numbers, which is what makes it possible to process and store it with a computer.
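The mapping between characters and code points can be tried out directly. The following is a minimal sketch in Python, whose built-in `ord` and `chr` convert between a character and its code point; the sample string is only an illustration.

```python
# Map each character to its code point (an integer) and back again.
text = "A\u00e1\u660e"   # "A", "á" (precomposed), "明"

for ch in text:
    code_point = ord(ch)            # character -> integer
    assert chr(code_point) == ch    # integer -> character
    print(f"{ch!r} -> U+{code_point:04X} (decimal {code_point})")

code_points = [ord(ch) for ch in text]
```

The `U+` notation is simply the code point written in hexadecimal.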

Unicode has some technical limitations and problems, and it has also drawn some criticism. Even so, it has gradually become the mainstream encoding for software internationalization and for multilingual software environments. Microsoft Windows NT and its successors Microsoft Windows 2000 and Microsoft Windows XP use UTF-16 to store text inside the system. Unix-like systems such as Linux, BSD (OpenBSD, FreeBSD), and Mac OS X use UTF-8 to represent multilingual text.

Origin

Early computer encodings were designed mainly for English and were suited only to English text. Letters used by the other major European languages were added gradually. However, different countries needed, and added, different letters. The result was a large number of encodings that could not interoperate. Data stored with a French encoding came out garbled when it was read and processed with a German encoding. Software designed around one language's encoding could handle only that language, and modifying it to handle another language was laborious. Handling more than one language on a computer was very difficult. Taking the world's other languages and scripts into account only made the problem worse.
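The kind of garbling described above can be reproduced with any two incompatible single-byte encodings. The sketch below is only an illustration: it assumes ISO 8859-1 and ISO 8859-5 (both available as Python codecs) as stand-ins, not the historical French and German encodings the text refers to.

```python
original = "\u00e9t\u00e9"            # "été", French for "summer"

raw = original.encode("latin-1")      # bytes as stored under ISO 8859-1
garbled = raw.decode("iso8859_5")     # the same bytes read as ISO 8859-5 (Cyrillic)

print(original, raw, garbled)         # the accented letters come back as other letters

# With a single encoding that covers both scripts, nothing is misread.
mixed = "\u00e9t\u00e9 \u043b\u0435\u0442\u043e"   # French and Russian in one string
round_trip = mixed.encode("utf-8").decode("utf-8")
```

The bytes themselves never change; only the table used to interpret them does.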

If there were one encoding that could handle all of the world's scripts, exchanging data between languages would become simple, and so would processing several languages at once. Software designed around such an encoding, even if it was originally written for one particular language, could be modified fairly easily to support other languages and scripts. These benefits were the motivation behind the early push for Unicode.

To understand the motivation for promoting an encoding standard like Unicode, one first needs to understand what an encoding is. Take English as an example. English needs 26 uppercase letters (ABC...XYZ), 26 lowercase letters (abc...xyz), the Arabic numerals (0123456789), and some punctuation marks. To process English on a computer, a lookup table is needed that maps each character to a unique binary number. What matters is that everyone uses the same table. Only then can people exchange data, and only then can the binary numbers be translated back into English without error.

For details, see: ASCII

How many characters an encoding can map depends on how many bits it uses to store each code. A 7-bit binary number covers the range 0 to 2^7-1 = 127 (2^7 is read "2 to the 7th power"). So a 7-bit encoding can map at most 128 characters. By the same reasoning, an 8-bit encoding can map at most 256 characters, and a 16-bit encoding can map at most 2^16 = 65,536 characters. The more bits an encoding uses, the more characters it can map, but the more RAM is needed to store each character.
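The arithmetic above can be checked with a few lines of Python. This is a small sketch; the helper name `capacity` is made up for the illustration.

```python
def capacity(bits: int) -> int:
    """Number of distinct values that fit in `bits` bits."""
    return 2 ** bits

table = {bits: capacity(bits) for bits in (7, 8, 16)}
for bits, count in table.items():
    print(f"{bits:2d}-bit encoding: code values 0..{count - 1} ({count} in total)")

# Storage cost of the same letter under a 1-byte and a 2-byte encoding.
ascii_size = len("A".encode("ascii"))
utf16_size = len("A".encode("utf-16-be"))
```

The last two lines show the trade-off: the wider encoding can map more characters but spends twice the memory on each one.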

On early computers RAM was a precious resource, so small encodings were used. For English, 7 bits turned out to be enough, which led to the 7-bit ASCII encoding standard. However, other European languages written in the Latin alphabet need some letters with diacritics, such as 'å', or some ligatures, such as 'œ'. These characters are not included in ASCII. European countries therefore began defining 8-bit encodings. In these 8-bit encodings, code points 0 through 127 are identical to ASCII.
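The relationship between ASCII and an 8-bit extension can be checked as follows. The sketch assumes ISO 8859-1 (Python codec `latin-1`) as one example of such an 8-bit encoding; other 8-bit encodings place different letters in the upper half.

```python
ascii_bytes = bytes(range(128))     # every 7-bit value, 0..127

# The 8-bit encoding decodes these bytes exactly as ASCII does.
same_as_ascii = ascii_bytes.decode("latin-1") == ascii_bytes.decode("ascii")

# 'å' has no ASCII code, but it has a single-byte code in ISO 8859-1.
try:
    "\u00e5".encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

latin1_byte = "\u00e5".encode("latin-1")
print(same_as_ascii, fits_in_ascii, latin1_byte)
```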

Precomposed and composed characters

To support as many scripts as possible within a limited code space, Unicode uses composed characters. Take the letter á as an example. Unicode gives this letter its own code point (U+00E1). But we can also think of it as a (code point U+0061) combined with ˊ. Unicode defines a combining ˊ (code point U+0301). When the two numbers U+0061 U+0301 appear in sequence, they are to be understood as the a represented by the preceding U+0061 and the ˊ represented by the following U+0301, combined into á. Representing the Latin letter á with the single number U+00E1 is called a precomposed character. Representing it with U+0061 U+0301 is called a composed character. A character such as U+0301 is called a combining character.
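The two representations of á can be inspected with Python's standard `unicodedata` module. This is a minimal sketch; the NFD and NFC normalization forms used here are part of the Unicode standard but are not named in the text above.

```python
import unicodedata

precomposed = "\u00e1"     # á as a single code point
combined = "a\u0301"       # a followed by the combining acute accent

print(len(precomposed), len(combined))     # number of code points in each string
mark_name = unicodedata.name("\u0301")
mark_class = unicodedata.combining("\u0301")   # non-zero for combining characters

# NFD splits a precomposed character; NFC joins a base and its mark.
decomposed = unicodedata.normalize("NFD", precomposed)
recomposed = unicodedata.normalize("NFC", combined)
```

Both strings look the same when rendered, yet they compare as unequal until one of them is normalized.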

A base character may be followed by one or more combining characters to form a composed character. If every such composed character were turned into a precomposed character with its own code point, a great many code points would be used up, because the number of possible combinations is very large. However, earlier encodings generally did not use composition. To make it easier to exchange data with legacy encodings, the letters of the major European languages generally have corresponding precomposed characters in Unicode. Because one letter may therefore have more than one representation (precomposed and composed), Unicode specifies when two representations are to be treated as the same (equivalent). This rule is called canonical equivalence.
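One common way to test canonical equivalence is to normalize both strings to the same form and compare the results. The sketch below does this with NFD; the function name `canonically_equivalent` is made up for the illustration, and the Vietnamese letter ế is used only because it carries two marks.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# One base character followed by two combining characters.
e_two_marks = "e\u0302\u0301"    # e + circumflex + acute
e_one_mark = "\u00ea\u0301"      # precomposed ê + acute
e_precomposed = "\u1ebf"         # ế as a single code point

print(canonically_equivalent(e_two_marks, e_precomposed))
print(canonically_equivalent(e_one_mark, e_precomposed))

# Swapping the two marks gives a different letter, so it is not equivalent.
e_swapped = "e\u0301\u0302"
print(canonically_equivalent(e_swapped, e_precomposed))
```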

Consider the Han character '明' as a hypothetical example (in practice Unicode treats 明 as a single character, not as a combination of two). This character is normally printed with a single glyph and has its own code point. But it could also be split into two glyphs, '日' and '月'. Printing it with those two glyphs might produce something like '日月', which looks bad.

Still, printing '明' as two glyphs has one advantage: it reduces the number of glyphs required. With one glyph per character, printing '日', '月', and '明' takes 3 glyphs. If '明' is printed as two glyphs, only 2 are needed. The number of available glyphs is always finite. To print more characters than there are glyphs, characters have to be split into more than one glyph; in other words, glyphs are composed (assembled) into new characters.

Composing one character from two or more glyphs requires some typesetting techniques; otherwise the printed character looks like '日月', which is ugly.

Display problems

Displaying composed characters correctly requires fairly sophisticated font rendering technology, and this technology is not part of the Unicode standard. Because most scripts need only precomposed characters, software support for composed characters is still inadequate. Font technologies capable of displaying composed characters correctly include OpenType (defined by Adobe Systems and Microsoft), AAT (defined by Apple Computer), and Graphite (defined by SIL International). However, most software does not make use of these font technologies, and most fonts do not support them either, so composed characters cannot be displayed correctly. Take Pe̍h-ōe-jī as an example: many Pe̍h-ōe-jī letters have no precomposed form in Unicode, which means they must be written with combining sequences. Most software garbles these letters when displaying them.
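Whether a given letter has a precomposed form can be checked by composing it with NFC and seeing whether a combining character is left over. The sketch below is an illustration only; the helper name `needs_combining` is made up, and the result depends on the Unicode version shipped with the Python interpreter.

```python
import unicodedata

def needs_combining(letter: str) -> bool:
    """True if a combining character remains after NFC composition."""
    composed = unicodedata.normalize("NFC", letter)
    return any(unicodedata.combining(ch) for ch in composed)

samples = {
    "a + acute (U+0301)": "a\u0301",
    "o + dot above right (U+0358)": "o\u0358",
    "a + vertical line above (U+030D)": "a\u030d",
}
for label, letter in samples.items():
    print(f"{label}: needs combining sequence = {needs_combining(letter)}")
```

The second and third samples are letters used in Pe̍h-ōe-jī; because they stay as two code points, displaying them well depends on the font and rendering technologies listed above.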

Bidirectional text

Some writing systems are written from left to right, such as the Latin script. Others are written from right to left, such as Hebrew and Arabic.
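Unicode records a direction class for every character, and Python exposes it through `unicodedata.bidirectional`. The following is a small sketch with three sample letters chosen for illustration; the class names `L`, `R`, and `AL` come from the Unicode character database, not from the text above.

```python
import unicodedata

samples = {
    "Latin a": "a",
    "Hebrew alef": "\u05d0",
    "Arabic alef": "\u0627",
}

# "L" = left-to-right, "R" = right-to-left, "AL" = right-to-left Arabic letter.
bidi_classes = {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}
for name, bidi_class in bidi_classes.items():
    print(f"{name}: {bidi_class}")
```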

Code point range (hexadecimal) | UTF-16 (binary) | UTF-8 (binary) | Notes
------------------------------ | --------------- | -------------- | -----
000000 - 00007F | 00000000 0xxxxxxx | 0xxxxxxx | ASCII equivalence range; byte begins with zero
000080 - 0007FF | 00000xxx xxxxxxxx | 110xxxxx 10xxxxxx | first byte begins with 110, the following byte begins with 10
000800 - 00FFFF | xxxxxxxx xxxxxxxx | 1110xxxx 10xxxxxx 10xxxxxx | first byte begins with 1110, the following bytes begin with 10
010000 - 10FFFF | 110110xx xxxxxxxx 110111xx xxxxxxxx | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | UTF-16 requires surrogates; an offset of 0x10000 is subtracted, so the bit pattern is not identical with UTF-8
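The bit patterns in the table can be turned into code almost line by line. The sketch below is an illustration rather than a production encoder: the function names `utf8_encode` and `utf16_units` are made up, and the code assumes a valid code point outside the surrogate range D800 to DFFF.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point following the UTF-8 column of the table."""
    if cp <= 0x7F:                                  # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                                 # 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0x3F)])
    if cp <= 0xFFFF:                                # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (cp >> 12),
                      0b10000000 | ((cp >> 6) & 0x3F),
                      0b10000000 | (cp & 0x3F)])
    return bytes([0b11110000 | (cp >> 18),          # 11110xxx and three 10xxxxxx
                  0b10000000 | ((cp >> 12) & 0x3F),
                  0b10000000 | ((cp >> 6) & 0x3F),
                  0b10000000 | (cp & 0x3F)])


def utf16_units(cp: int) -> list:
    """Return the 16-bit code units for one code point (UTF-16 column)."""
    if cp <= 0xFFFF:
        return [cp]
    offset = cp - 0x10000                 # subtract the offset; 20 bits remain
    high = 0xD800 | (offset >> 10)        # 110110xx xxxxxxxx
    low = 0xDC00 | (offset & 0x3FF)       # 110111xx xxxxxxxx
    return [high, low]


# Compare against Python's built-in codecs, one code point per table row.
for cp in (0x41, 0xE1, 0x660E, 0x1F600):
    units = b"".join(u.to_bytes(2, "big") for u in utf16_units(cp))
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
    assert units == chr(cp).encode("utf-16-be")
    print(f"U+{cp:04X}: UTF-8 {utf8_encode(cp).hex()}  UTF-16 {units.hex()}")
```

The last row of the table is the only one where the two encodings rearrange the bits differently, which is why `utf16_units` subtracts 0x10000 before splitting the value.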

Consortium

  This section has not been written yet.

Han unification

  This section has not been written yet.

Fonts

  This section has not been written yet.

Version history

  • 1991: Unicode 1.0
  • 1993: Unicode 1.1
  • 1996: Unicode 2.0
  • 1998: Unicode 2.1
  • 1999: Unicode 3.0
  • 2001: Unicode 3.1
  • 2002: Unicode 3.2
  • 2003: Unicode 4.0
  • 2005: Unicode 4.1
  • 2006: Unicode 5.0

Pe̍h-ōe-jī and Unicode

Chhiáⁿ chham-khó Taigi Unicode chit-phiⁿ bûn-chiuⁿ.
