Characters 1: Latin, Cyrillic, Arabic, Thai, Tibetan, Phonetic (compact). Tables 1-16

UTF-8 lists of the Unicode and HTML characters in table form; cf. utf8-zeichentabelle.de, unicode-table.com.

**Basic Latin + Latin-1 Supplement + Latin Extended-A + Latin Extended-B** => 512 characters (UTF-8) from U+00000 hex = 0 decimal
No. 1	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
00000
00020	 	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
00040	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
00060	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~
00080
000A0		¡	¢	£	¤	¥	¦	§	¨	©	ª	«	¬		®	¯	°	±	²	³	´	µ	¶	·	¸	¹	º	»	¼	½	¾	¿
000C0	À	Á	Â	Ã	Ä	Å	Æ	Ç	È	É	Ê	Ë	Ì	Í	Î	Ï	Ð	Ñ	Ò	Ó	Ô	Õ	Ö	×	Ø	Ù	Ú	Û	Ü	Ý	Þ	ß
000E0	à	á	â	ã	ä	å	æ	ç	è	é	ê	ë	ì	í	î	ï	ð	ñ	ò	ó	ô	õ	ö	÷	ø	ù	ú	û	ü	ý	þ	ÿ
1.2
00100	Ā	ā	Ă	ă	Ą	ą	Ć	ć	Ĉ	ĉ	Ċ	ċ	Č	č	Ď	ď	Đ	đ	Ē	ē	Ĕ	ĕ	Ė	ė	Ę	ę	Ě	ě	Ĝ	ĝ	Ğ	ğ
00120	Ġ	ġ	Ģ	ģ	Ĥ	ĥ	Ħ	ħ	Ĩ	ĩ	Ī	ī	Ĭ	ĭ	Į	į	İ	ı	Ĳ	ĳ	Ĵ	ĵ	Ķ	ķ	ĸ	Ĺ	ĺ	Ļ	ļ	Ľ	ľ	Ŀ
00140	ŀ	Ł	ł	Ń	ń	Ņ	ņ	Ň	ň	ŉ	Ŋ	ŋ	Ō	ō	Ŏ	ŏ	Ő	ő	Œ	œ	Ŕ	ŕ	Ŗ	ŗ	Ř	ř	Ś	ś	Ŝ	ŝ	Ş	ş
00160	Š	š	Ţ	ţ	Ť	ť	Ŧ	ŧ	Ũ	ũ	Ū	ū	Ŭ	ŭ	Ů	ů	Ű	ű	Ų	ų	Ŵ	ŵ	Ŷ	ŷ	Ÿ	Ź	ź	Ż	ż	Ž	ž	ſ
00180	ƀ	Ɓ	Ƃ	ƃ	Ƅ	ƅ	Ɔ	Ƈ	ƈ	Ɖ	Ɗ	Ƌ	ƌ	ƍ	Ǝ	Ə	Ɛ	Ƒ	ƒ	Ɠ	Ɣ	ƕ	Ɩ	Ɨ	Ƙ	ƙ	ƚ	ƛ	Ɯ	Ɲ	ƞ	Ɵ
001A0	Ơ	ơ	Ƣ	ƣ	Ƥ	ƥ	Ʀ	Ƨ	ƨ	Ʃ	ƪ	ƫ	Ƭ	ƭ	Ʈ	Ư	ư	Ʊ	Ʋ	Ƴ	ƴ	Ƶ	ƶ	Ʒ	Ƹ	ƹ	ƺ	ƻ	Ƽ	ƽ	ƾ	ƿ
001C0	ǀ	ǁ	ǂ	ǃ	Ǆ	ǅ	ǆ	Ǉ	ǈ	ǉ	Ǌ	ǋ	ǌ	Ǎ	ǎ	Ǐ	ǐ	Ǒ	ǒ	Ǔ	ǔ	Ǖ	ǖ	Ǘ	ǘ	Ǚ	ǚ	Ǜ	ǜ	ǝ	Ǟ	ǟ
001E0	Ǡ	ǡ	Ǣ	ǣ	Ǥ	ǥ	Ǧ	ǧ	Ǩ	ǩ	Ǫ	ǫ	Ǭ	ǭ	Ǯ	ǯ	ǰ	Ǳ	ǲ	ǳ	Ǵ	ǵ	Ƕ	Ƿ	Ǹ	ǹ	Ǻ	ǻ	Ǽ	ǽ	Ǿ	ǿ
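
The 32-column layout used in these tables can be reproduced programmatically. The following is a minimal sketch, not the method used to build this page: the function names `table_rows` and `render_table` are hypothetical, and blanking out control, format, and unassigned code points (Unicode general categories starting with `C`) is an assumption made so that rows such as 00000 and 00080 come out empty.

```python
import unicodedata


def table_rows(start, count=512, width=32):
    """Yield (label, cells) for `count` code points beginning at `start`."""
    for row_start in range(start, start + count, width):
        label = f"{row_start:05X}"  # five hex digits, e.g. 00020
        cells = []
        for cp in range(row_start, row_start + width):
            ch = chr(cp)
            # Assumption: hide characters without a visible glyph of their own
            # (general categories Cc, Cf, Cn, Co, Cs all start with "C").
            hidden = unicodedata.category(ch).startswith("C")
            cells.append("" if hidden else ch)
        yield label, cells


def render_table(number, start):
    """Return one tab-separated table of 512 code points as a string."""
    header = [f"No. {number}"] + [f"{col:X}" for col in range(32)]
    lines = ["\t".join(header)]
    for label, cells in table_rows(start):
        lines.append("\t".join([label] + cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(1, 0x0000))
```

Each table covers 16 rows of 32 code points, so table n starts at code point (n - 1) * 512.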

English documentation

U+00000 Basic Latin (Alphabet, 128 codes from 0000–007F. Languages: English, German, French, Italian, Polish): The Basic Latin (or C0 Controls and Basic Latin) Unicode block is the first block of the Unicode Standard, and the only block that is encoded in one byte in UTF-8. The block contains all the letters and control codes of the ASCII encoding. It has been included in its present form since version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.
The classical Latin alphabet, also known as the Roman alphabet, is a writing system that evolved from the visually similar Cumaean Greek version of the Greek alphabet. The Greek alphabet, including the Cumaean version, descended from the Phoenician abjad. The Etruscans who ruled early Rome adopted and modified the Cumaean Greek alphabet. The Etruscan alphabet was in turn adopted and further modified by the ancient Romans to write the Latin language. During the Middle Ages scribes adapted the Latin alphabet for writing Romance languages, direct descendants of Latin, as well as Celtic, Germanic, Baltic, and some Slavic languages. With the age of colonialism and Christian evangelism, the Latin script spread beyond Europe, coming into use for writing indigenous American, Australian, Austronesian, Austroasiatic, and African languages. More recently, linguists have also tended to prefer the Latin script or the International Phonetic Alphabet (itself largely based on Latin script) when transcribing or creating written standards for non-European languages, such as the African reference alphabet. The term Latin alphabet may refer either to the alphabet used to write Latin or to other alphabets based on the Latin script, which is the basic set of letters common to the various alphabets descended from the classical Latin one, such as the English alphabet. These Latin alphabets may discard letters, like the Rotokas alphabet, or add new letters, like the Danish and Norwegian alphabets. Letter shapes have evolved over the centuries, including the creation for Medieval Latin of lower-case forms which did not exist in the Classical period.
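
The one-byte property of Basic Latin follows from how UTF-8 assigns byte lengths to code point ranges. A small sketch (the helper name `utf8_length` is hypothetical; the thresholds are those of the UTF-8 encoding) compares the rule with Python's own encoder:

```python
def utf8_length(cp):
    """Number of bytes UTF-8 uses for the code point `cp`."""
    if cp < 0x80:        # U+0000..U+007F: Basic Latin (ASCII)
        return 1
    if cp < 0x800:       # U+0080..U+07FF
        return 2
    if cp < 0x10000:     # U+0800..U+FFFF
        return 3
    return 4             # U+10000..U+10FFFF


# Cross-check the rule against the built-in encoder for a few characters.
for ch in ("A", "ß", "Ω", "क"):
    encoded = ch.encode("utf-8")
    assert len(encoded) == utf8_length(ord(ch))
    print(f"U+{ord(ch):04X} {ch} -> {encoded.hex()}")
```

By this rule every code point from U+0080 up to U+07FF takes two bytes, and three-byte sequences begin at U+0800.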

U+00080 Latin-1 Supplement (128 codes from 0080–00FF): The Latin-1 Supplement (also called C1 Controls and Latin-1 Supplement) is the second block in the Unicode Standard. It encodes the upper range of ISO 8859-1, from 80 (U+0080) to FF (U+00FF). The C1 controls (0080–009F) are not graphic characters. The block has been included in its present form, with the same character repertoire, since version 1.0 of the Unicode Standard, where it was known as Latin 1.
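
Because Basic Latin and this block together mirror ISO 8859-1, each Latin-1 byte value equals the Unicode code point of the character it encodes. A minimal sketch using Python's built-in `latin-1` codec (the helper name `latin1_to_utf8` is hypothetical):

```python
def latin1_to_utf8(data):
    """Re-encode ISO 8859-1 bytes as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


# Every byte 0x00..0xFF decodes to the code point with the same number.
for value in range(0x100):
    assert ord(bytes([value]).decode("latin-1")) == value

# "ü" is the single byte FC in Latin-1 but needs two bytes in UTF-8.
print(latin1_to_utf8(b"\xfc").hex())
```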

U+00100 Latin Extended-A (Alphabet, 128 codes from 0100–017F. Languages: Celtic, Sami, Maltese, Turkish): Latin Extended-A is a block of the Unicode Standard. It encodes Latin letters from the Latin ISO character sets other than Latin-1 (which is already encoded in the Latin-1 Supplement block), as well as legacy characters from the ISO 6937 standard. The block has been in the Unicode Standard since version 1.0 with its entire character repertoire, except for the Latin Small Letter Long S, which was added during unification with ISO 10646 in version 1.1.

U+00180 Latin Extended-B (Alphabet, 208 codes from 0180–024F. Languages: Slovenian, Croatian): Latin Extended-B is a block (0180–024F) of the Unicode Standard. It has been included since version 1.0, where it was allocated only the code points U+0180..U+01FF and contained 113 characters. During unification with ISO 10646 for version 1.1, the block was expanded and another 65 characters were added. In version 3.0, the last thirty available code points in the block were assigned. The Latin Extended-B block contains ten subheadings for groups of characters: Non-European and historic Latin, African letters for clicks, Croatian digraphs matching Serbian Cyrillic letters, Pinyin diacritic-vowel combinations, Phonetic and historic letters, Additions for Slovenian and Croatian, Additions for Romanian, Miscellaneous additions, Additions for Livonian, and Additions for Sinology. The Non-European and historic letters, African clicks, Croatian digraphs, Pinyin, and the first part of the Phonetic and historic letters were present in Unicode 1.0; additional Phonetic and historic letters were added in version 3.0; the remaining Phonetic and historic letters, as well as the rest of the sub-blocks, were added in version 1.1.
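
The block boundaries quoted in these descriptions are enough to classify a character by block. The sketch below hard-codes only the four ranges of this first table (it is not a complete block list), and `block_of` is a hypothetical helper, not a standard-library function:

```python
from bisect import bisect_right

# (first code point, last code point, block name), sorted by first code point.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def block_of(ch):
    """Return the block name for `ch`, or None if it is outside BLOCKS."""
    cp = ord(ch)
    i = bisect_right(_STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None


for ch in "Aßſǅ":
    print(f"U+{ord(ch):04X}", ch, block_of(ch))
```

A binary search keeps the lookup fast once the list is extended to further blocks; the size of each block is simply `last - first + 1`.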

**Latin Extended-B + IPA Extensions + Spacing Modifier Letters + Combining Diacritical Marks + Greek and Coptic** => 512 characters (UTF-8) from U+00200 hex = 512 decimal
No. 2	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
00200	Ȁ	ȁ	Ȃ	ȃ	Ȅ	ȅ	Ȇ	ȇ	Ȉ	ȉ	Ȋ	ȋ	Ȍ	ȍ	Ȏ	ȏ	Ȑ	ȑ	Ȓ	ȓ	Ȕ	ȕ	Ȗ	ȗ	Ș	ș	Ț	ț	Ȝ	ȝ	Ȟ	ȟ
00220	Ƞ	ȡ	Ȣ	ȣ	Ȥ	ȥ	Ȧ	ȧ	Ȩ	ȩ	Ȫ	ȫ	Ȭ	ȭ	Ȯ	ȯ	Ȱ	ȱ	Ȳ	ȳ	ȴ	ȵ	ȶ	ȷ	ȸ	ȹ	Ⱥ	Ȼ	ȼ	Ƚ	Ⱦ	ȿ
00240	ɀ	Ɂ	ɂ	Ƀ	Ʉ	Ʌ	Ɇ	ɇ	Ɉ	ɉ	Ɋ	ɋ	Ɍ	ɍ	Ɏ	ɏ	ɐ	ɑ	ɒ	ɓ	ɔ	ɕ	ɖ	ɗ	ɘ	ə	ɚ	ɛ	ɜ	ɝ	ɞ	ɟ
00260	ɠ	ɡ	ɢ	ɣ	ɤ	ɥ	ɦ	ɧ	ɨ	ɩ	ɪ	ɫ	ɬ	ɭ	ɮ	ɯ	ɰ	ɱ	ɲ	ɳ	ɴ	ɵ	ɶ	ɷ	ɸ	ɹ	ɺ	ɻ	ɼ	ɽ	ɾ	ɿ
00280	ʀ	ʁ	ʂ	ʃ	ʄ	ʅ	ʆ	ʇ	ʈ	ʉ	ʊ	ʋ	ʌ	ʍ	ʎ	ʏ	ʐ	ʑ	ʒ	ʓ	ʔ	ʕ	ʖ	ʗ	ʘ	ʙ	ʚ	ʛ	ʜ	ʝ	ʞ	ʟ
002A0	ʠ	ʡ	ʢ	ʣ	ʤ	ʥ	ʦ	ʧ	ʨ	ʩ	ʪ	ʫ	ʬ	ʭ	ʮ	ʯ	ʰ	ʱ	ʲ	ʳ	ʴ	ʵ	ʶ	ʷ	ʸ	ʹ	ʺ	ʻ	ʼ	ʽ	ʾ	ʿ
002C0	ˀ	ˁ	˂	˃	˄	˅	ˆ	ˇ	ˈ	ˉ	ˊ	ˋ	ˌ	ˍ	ˎ	ˏ	ː	ˑ	˒	˓	˔	˕	˖	˗	˘	˙	˚	˛	˜	˝	˞	˟
002E0	ˠ	ˡ	ˢ	ˣ	ˤ	˥	˦	˧	˨	˩	˪	˫	ˬ	˭	ˮ	˯	˰	˱	˲	˳	˴	˵	˶	˷	˸	˹	˺	˻	˼	˽	˾	˿
2.2
00300	̀	́	̂	̃	̄	̅	̆	̇	̈	̉	̊	̋	̌	̍	̎	̏	̐	̑	̒	̓	̔	̕	̖	̗	̘	̙	̚	̛	̜	̝	̞	̟
00320	̠	̡	̢	̣	̤	̥	̦	̧	̨	̩	̪	̫	̬	̭	̮	̯	̰	̱	̲	̳	̴	̵	̶	̷	̸	̹	̺	̻	̼	̽	̾	̿
00340	̀	́	͂	̓	̈́	ͅ	͆	͇	͈	͉	͊	͋	͌	͍	͎	͏	͐	͑	͒	͓	͔	͕	͖	͗	͘	͙	͚	͛	͜	͝	͞	͟
00360	͠	͡	͢	ͣ	ͤ	ͥ	ͦ	ͧ	ͨ	ͩ	ͪ	ͫ	ͬ	ͭ	ͮ	ͯ	Ͱ	ͱ	Ͳ	ͳ	ʹ	͵	Ͷ	ͷ	͸	͹	ͺ	ͻ	ͼ	ͽ	;	Ϳ
00380	΀	΁	΂	΃	΄	΅	Ά	·	Έ	Ή	Ί	΋	Ό	΍	Ύ	Ώ	ΐ	Α	Β	Γ	Δ	Ε	Ζ	Η	Θ	Ι	Κ	Λ	Μ	Ν	Ξ	Ο
003A0	Π	Ρ	΢	Σ	Τ	Υ	Φ	Χ	Ψ	Ω	Ϊ	Ϋ	ά	έ	ή	ί	ΰ	α	β	γ	δ	ε	ζ	η	θ	ι	κ	λ	μ	ν	ξ	ο
003C0	π	ρ	ς	σ	τ	υ	φ	χ	ψ	ω	ϊ	ϋ	ό	ύ	ώ	Ϗ	ϐ	ϑ	ϒ	ϓ	ϔ	ϕ	ϖ	ϗ	Ϙ	ϙ	Ϛ	ϛ	Ϝ	ϝ	Ϟ	ϟ
003E0	Ϡ	ϡ	Ϣ	ϣ	Ϥ	ϥ	Ϧ	ϧ	Ϩ	ϩ	Ϫ	ϫ	Ϭ	ϭ	Ϯ	ϯ	ϰ	ϱ	ϲ	ϳ	ϴ	ϵ	϶	Ϸ	ϸ	Ϲ	Ϻ	ϻ	ϼ	Ͻ	Ͼ	Ͽ

English documentation

U+00250 IPA Extensions (Alphabet, 96 codes from 0250–02AF): IPA Extensions is a block (0250–02AF) of the Unicode Standard that contains full-size letters used in the International Phonetic Alphabet (IPA). Both modern and historical characters are included, as well as former IPA signs and non-IPA phonetic letters. Additional characters employed for phonetics, like the palatalization sign, are encoded in the blocks Phonetic Extensions (1D00–1D7F) and Phonetic Extensions Supplement (1D80–1DBF). Diacritics are found in the Spacing Modifier Letters (02B0–02FF) and Combining Diacritical Marks (0300–036F) blocks. Now that phonetic symbols can be represented in Unicode, ASCII-based systems such as X-SAMPA or Kirshenbaum are being supplanted. Within the Unicode blocks there are also a few former IPA characters no longer in international use by linguists. The IPA Extensions block has been present in Unicode since version 1.0, and was unchanged through the unification with ISO 10646. The block was filled out with extensions for representing disordered speech in version 3.0, and Sinology phonetic symbols in version 4.0.
The International Phonetic Alphabet (unofficially, though commonly, abbreviated IPA) is an alphabetic system of phonetic notation based primarily on the Latin alphabet. It was devised by the International Phonetic Association as a standardized representation of the sounds of oral language. The IPA is used by lexicographers, foreign language students and teachers, linguists, speech-language pathologists, singers, actors, constructed language creators, and translators. It is designed to represent only those qualities of speech that are part of oral language: phones, phonemes, intonation, and the separation of words and syllables. To represent additional qualities of speech, such as tooth gnashing, lisping, and sounds made with a cleft palate, an extended set of symbols called the Extensions to the IPA may be used. IPA symbols consist of one or more elements of two basic types, letters and diacritics. For example, the sound of the English letter 't' may be transcribed in IPA with a single letter, [t], or with a letter plus diacritics, [t̺ʰ], depending on how precisely you want to describe its features in the context. Slashes are often used to signal broad or phonemic transcription; thus, /t/ is less specific and could refer to either [t̺ʰ] or [t], depending on the context and language. Letters or diacritics may be added, removed, or modified by the International Phonetic Association. As of the 2005 revision, there are 107 letters, 52 diacritics, and four prosodic marks in the IPA. These are shown in the current IPA chart on the website of the IPA. Letters from this block are also popular for flipping text upside down for fun.

U+002B0 Spacing Modifier Letters (Alphabet, 80 codes from 02B0–02FF): Spacing Modifier Letters is a Unicode block containing characters for the IPA (International Phonetic Alphabet), UPA (Uralic Phonetic Alphabet or Finno-Ugric transcription system), and other phonetic transcriptions. Included are the IPA tone marks, and modifiers for aspiration and palatalization.

U+00300 Combining Diacritical Marks (Alphabet, 112 codes from 0300–036F): Combining Diacritical Marks is a Unicode block containing the most common combining characters. It also contains the Combining Grapheme Joiner, which prevents canonical reordering of combining characters and, despite the name, actually separates characters that would otherwise be considered a single grapheme in a given context.
A diacritic /daɪ.əˈkrɪtɨk/ (also diacritical mark, diacritical point, or diacritical sign) is a glyph added to a letter, or basic glyph. The term derives from the Greek διακριτικός (diakritikós, “distinguishing”, from ancient Greek διά (diá, through) and κρίνω (krínein, to separate)). Diacritic is primarily an adjective, though sometimes used as a noun, whereas diacritical is only ever an adjective. Some diacritical marks, such as the acute (´) and grave (`), are often called accents. Diacritical marks may appear above or below a letter, or in some other position such as within the letter or between two letters. The main use of diacritical marks in the Latin script is to change the sound-values of the letters to which they are added. Examples from English are the diaereses in naïve and Noël, which show that the vowel with the diaeresis mark is pronounced separately from the preceding vowel; the acute and grave accents, which can indicate that a final vowel is to be pronounced, as in saké and poetic breathèd; and the cedilla under the “c” in the borrowed French word façade, which shows it is pronounced /s/ rather than /k/. In other Latin alphabets, they may distinguish between homonyms, such as the French là (“there”) versus la (“the”), which are both pronounced /la/. In Gaelic type, a dot over a consonant indicates lenition of the consonant in question. In other alphabetic systems, diacritical marks may perform other functions. Vowel pointing systems, namely the Arabic harakat ( ـَ, ـُ, ـِ, etc.) and the Hebrew niqqud ( ַ, ֶ, ִ, ֹ , ֻ, etc.) systems, indicate sounds (vowels and tones) that are not conveyed by the basic alphabet. The Indic virama ( ् etc.) and the Arabic sukūn ( ـْـ ) mark the absence of a vowel. Cantillation marks indicate prosody. Other uses include the Early Cyrillic titlo ( ◌҃ ) and the Hebrew gershayim ( ״ ), which, respectively, mark abbreviations or acronyms, and Greek diacritical marks, which showed that letters of the alphabet were being used as numerals. In the Hanyu Pinyin official romanization system for Chinese, diacritics are used to mark the tones of the syllables in which the marked vowels occur. In orthography and collation, a letter modified by a diacritic may be treated either as a new, distinct letter or as a letter–diacritic combination. This varies from language to language, and may vary from case to case within a language. In some cases, letters are used as “in-line diacritics” in place of ancillary glyphs, because they modify the sound of the letter preceding them, as in the case of the “h” in English “sh” and “th”.
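
How a base letter and a combining mark relate to a precomposed character can be seen with Unicode normalization. A minimal sketch using Python's standard `unicodedata` module (the helper `strip_marks` is hypothetical, and dropping every character with a nonzero canonical combining class is a simplification that is not appropriate for all scripts):

```python
import unicodedata


def strip_marks(text):
    """Decompose (NFD), then drop characters with a nonzero combining class."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "e" followed by U+0301 COMBINING ACUTE ACCENT composes to a single character.
composed = unicodedata.normalize("NFC", "e\u0301")
# U+00E7 (c with cedilla) decomposes into "c" plus U+0327 COMBINING CEDILLA.
decomposed = unicodedata.normalize("NFD", "\u00e7")

print(len(composed), [f"U+{ord(c):04X}" for c in decomposed])
print(strip_marks("façade naïve"))
```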

U+00370 Greek and Coptic (Alphabet, 144 codes from 0370–03FF. Languages: Greek, Coptic): Greek and Coptic is the Unicode block for representing modern (monotonic) Greek. It was originally also used for writing Coptic, using the similar Greek letters in addition to the uniquely Coptic additions. Beginning with version 4.1 of the Unicode Standard, a separate Coptic block has been included in Unicode, allowing for mixed Greek/Coptic text that is stylistically contrastive, as is convention in scholarly works. Writing polytonic Greek requires the use of combining characters or the precomposed vowel + tone characters in the Greek Extended character block.
The Greek alphabet is the script that has been used to write the Greek language since the 8th century BC. It was derived from the earlier Phoenician alphabet, and was the first alphabetic script to have distinct letters for vowels as well as consonants. As such, it became the ancestor of numerous other European and Middle Eastern alphabets, including Latin and Cyrillic. Apart from its use in writing the Greek language, both in its ancient and its modern forms, the Greek alphabet today also serves as a source of technical symbols and labels in many domains of mathematics, science and other fields. In its classical and modern forms, the alphabet has 24 letters, ordered from alpha to omega. Like Latin and Cyrillic, Greek originally had only a single form of each letter; it developed the letter case distinction between upper-case and lower-case forms in parallel with Latin during the modern era. Sound values and conventional transcriptions for some of the letters differ between Ancient Greek and Modern Greek usage, owing to phonological changes in the language. In traditional (“polytonic”) Greek orthography, vowel letters can be combined with several diacritics, including accent marks, so-called “breathing” marks, and the iota subscript. In common present-day usage for Modern Greek since the 1980s, this system has been simplified to a so-called “monotonic” convention. The Coptic alphabet is the script used for writing the Coptic language. The repertoire of glyphs is based on the Greek alphabet augmented by letters borrowed from the Egyptian Demotic and is the first alphabetic script used for the Egyptian language. There are several alphabets, as the Coptic writing system may vary greatly among the various dialects and subdialects of the Coptic language.

**Cyrillic + Cyrillic Supplement + Armenian + Hebrew** => 512 characters (UTF-8) from U+00400 hex = 1024 decimal
No. 3	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
00400	Ѐ	Ё	Ђ	Ѓ	Є	Ѕ	І	Ї	Ј	Љ	Њ	Ћ	Ќ	Ѝ	Ў	Џ	А	Б	В	Г	Д	Е	Ж	З	И	Й	К	Л	М	Н	О	П
00420	Р	С	Т	У	Ф	Х	Ц	Ч	Ш	Щ	Ъ	Ы	Ь	Э	Ю	Я	а	б	в	г	д	е	ж	з	и	й	к	л	м	н	о	п
00440	р	с	т	у	ф	х	ц	ч	ш	щ	ъ	ы	ь	э	ю	я	ѐ	ё	ђ	ѓ	є	ѕ	і	ї	ј	љ	њ	ћ	ќ	ѝ	ў	џ
00460	Ѡ	ѡ	Ѣ	ѣ	Ѥ	ѥ	Ѧ	ѧ	Ѩ	ѩ	Ѫ	ѫ	Ѭ	ѭ	Ѯ	ѯ	Ѱ	ѱ	Ѳ	ѳ	Ѵ	ѵ	Ѷ	ѷ	Ѹ	ѹ	Ѻ	ѻ	Ѽ	ѽ	Ѿ	ѿ
00480	Ҁ	ҁ	҂	҃	҄	҅	҆	҇	҈	҉	Ҋ	ҋ	Ҍ	ҍ	Ҏ	ҏ	Ґ	ґ	Ғ	ғ	Ҕ	ҕ	Җ	җ	Ҙ	ҙ	Қ	қ	Ҝ	ҝ	Ҟ	ҟ
004A0	Ҡ	ҡ	Ң	ң	Ҥ	ҥ	Ҧ	ҧ	Ҩ	ҩ	Ҫ	ҫ	Ҭ	ҭ	Ү	ү	Ұ	ұ	Ҳ	ҳ	Ҵ	ҵ	Ҷ	ҷ	Ҹ	ҹ	Һ	һ	Ҽ	ҽ	Ҿ	ҿ
004C0	Ӏ	Ӂ	ӂ	Ӄ	ӄ	Ӆ	ӆ	Ӈ	ӈ	Ӊ	ӊ	Ӌ	ӌ	Ӎ	ӎ	ӏ	Ӑ	ӑ	Ӓ	ӓ	Ӕ	ӕ	Ӗ	ӗ	Ә	ә	Ӛ	ӛ	Ӝ	ӝ	Ӟ	ӟ
004E0	Ӡ	ӡ	Ӣ	ӣ	Ӥ	ӥ	Ӧ	ӧ	Ө	ө	Ӫ	ӫ	Ӭ	ӭ	Ӯ	ӯ	Ӱ	ӱ	Ӳ	ӳ	Ӵ	ӵ	Ӷ	ӷ	Ӹ	ӹ	Ӻ	ӻ	Ӽ	ӽ	Ӿ	ӿ
3.2
00500	Ԁ	ԁ	Ԃ	ԃ	Ԅ	ԅ	Ԇ	ԇ	Ԉ	ԉ	Ԋ	ԋ	Ԍ	ԍ	Ԏ	ԏ	Ԑ	ԑ	Ԓ	ԓ	Ԕ	ԕ	Ԗ	ԗ	Ԙ	ԙ	Ԛ	ԛ	Ԝ	ԝ	Ԟ	ԟ
00520	Ԡ	ԡ	Ԣ	ԣ	Ԥ	ԥ	Ԧ	ԧ	Ԩ	ԩ	Ԫ	ԫ	Ԭ	ԭ	Ԯ	ԯ	԰	Ա	Բ	Գ	Դ	Ե	Զ	Է	Ը	Թ	Ժ	Ի	Լ	Խ	Ծ	Կ
00540	Հ	Ձ	Ղ	Ճ	Մ	Յ	Ն	Շ	Ո	Չ	Պ	Ջ	Ռ	Ս	Վ	Տ	Ր	Ց	Ւ	Փ	Ք	Օ	Ֆ	՗	՘	ՙ	՚	՛	՜	՝	՞	՟
00560	ՠ	ա	բ	գ	դ	ե	զ	է	ը	թ	ժ	ի	լ	խ	ծ	կ	հ	ձ	ղ	ճ	մ	յ	ն	շ	ո	չ	պ	ջ	ռ	ս	վ	տ
00580	ր	ց	ւ	փ	ք	օ	ֆ	և	ֈ	։	֊	֋	֌	֍	֎	֏	֐	֑	֒	֓	֔	֕	֖	֗	֘	֙	֚	֛	֜	֝	֞	֟
005A0	֠	֡	֢	֣	֤	֥	֦	֧	֨	֩	֪	֫	֬	֭	֮	֯	ְ	ֱ	ֲ	ֳ	ִ	ֵ	ֶ	ַ	ָ	ֹ	ֺ	ֻ	ּ	ֽ	־	ֿ
005C0	׀	ׁ	ׂ	׃	ׄ	ׅ	׆	ׇ	׈	׉	׊	׋	׌	׍	׎	׏	א	ב	ג	ד	ה	ו	ז	ח	ט	י	ך	כ	ל	ם	מ	ן
005E0	נ	ס	ע	ף	פ	ץ	צ	ק	ר	ש	ת	׫	׬	׭	׮	ׯ	װ	ױ	ײ	׳	״	׵	׶	׷	׸	׹	׺	׻	׼	׽	׾	׿

English documentation

U+00400 Cyrillic (Alphabet, 256 codes from 0400–04FF. Languages: Russian, Ukrainian, Bulgarian): Cyrillic is a Unicode block containing the characters used to write the most widely used languages with a Cyrillic orthography. The core of the block is based on the ISO 8859-5 standard, with additions for minority languages and historic orthographies. The Cyrillic script /sɨˈrɪlɪk/ is an alphabetic writing system employed across Eastern Europe and North and Central Asia. It is based on the Early Cyrillic alphabet, which was developed in the First Bulgarian Empire during the 9th century AD at the Preslav Literary School. It is the basis of alphabets used in various languages, past and present, in parts of Southeastern Europe and Northern Eurasia, especially those of Slavic origin, and non-Slavic languages influenced by Russian. As of 2011, around 252 million people in Eurasia use it as the official alphabet for their national languages; about half of them are in Russia. Cyrillic is thus one of the most used writing systems in the world. Cyrillic is derived from the Greek uncial script, augmented by letters from the older Glagolitic alphabet, including some ligatures. These additional letters were used for sounds not found in Greek. The script is named in honor of the two Byzantine brothers, Saints Cyril and Methodius, who had earlier created the Glagolitic alphabet. Modern scholars believe that Cyrillic was developed and formalized by early disciples of Cyril and Methodius. With the accession of Bulgaria to the European Union on 1 January 2007, Cyrillic became the third official script of the European Union, following the Latin and Greek scripts.

U+00500 Cyrillic Supplement (Alphabet, 48 codes from 0500–052F. Language: Komi): Cyrillic Supplement is a Unicode block containing Cyrillic letters for writing several minority languages, including Abkhaz, Kurdish, Komi, Mordvin, Aleut, Azerbaijani, and Jakovlev's Chuvash orthography.

U+00530 Armenian (Alphabet, 96 codes from 0530–058F. Language: Armenian): Armenian is a Unicode block containing characters for writing the Armenian language, in both the traditional Western Armenian and the reformed Eastern Armenian orthographies. Five Armenian ligatures are encoded in the Alphabetic Presentation Forms block. The Armenian language (classical: հայերէն; reformed: հայերեն hayeren) is an Indo-European language spoken by the Armenians. It is the official language of the Republic of Armenia and the self-proclaimed Nagorno-Karabakh Republic. It has historically been spoken throughout the Armenian Highlands and today is widely spoken in the Armenian diaspora. Armenian has its own unique script, the Armenian alphabet, invented in 405 AD by Mesrop Mashtots. Scholars classify Armenian as an independent branch of the Indo-European language family. It is of particular interest to linguists for its distinctive phonological developments within the Indo-European languages. Armenian shares a number of major innovations with Greek, and some linguists group these two languages with Phrygian and the Indo-Iranian family into a higher-level subgroup of Indo-European, which is defined by such shared changes as the augment. More recently, other scholars have proposed a Balkan grouping including Greek, Phrygian, Armenian, and Albanian. Armenia was a monolingual country by the second century BC. Its language has a long literary history, with a fifth-century Bible translation as its oldest surviving text. There are two standardized modern literary forms, Eastern Armenian and Western Armenian, with which most contemporary dialects are mutually intelligible.

U+00590 Hebrew (Alphabet, 112 codes from 0590–05FF. Languages: Hebrew, Yiddish): Hebrew is a Unicode block containing characters for writing Hebrew, Yiddish, Ladino, and other Jewish diaspora languages.
Hebrew is a West Semitic language of the Afroasiatic language family. Historically, it is regarded as the language of the Hebrew Israelites and their ancestors, although the language was not referred to by the name Hebrew in the Tanakh. The earliest examples of written Paleo-Hebrew date from the 10th century BC and were primitive drawings. Since the language used in the earliest inscription remains unknown, it cannot be proven whether it was in fact Hebrew or another local language. Hebrew had ceased to be an everyday spoken language somewhere between 200 and 400 CE, having declined since the Bar Kochba War. Aramaic and, to a lesser extent, Greek were already in use as international languages, especially among elites and immigrants. Hebrew nevertheless survived into the medieval period as the language of Jewish liturgy, rabbinic literature, intra-Jewish commerce, and poetry. Then, in the 19th century, it was revived as a spoken and literary language. According to Ethnologue, it is now spoken by 9 million people worldwide, including 7 million who are from Israel. The United States has the second largest Hebrew-speaking population, with about 221,593 fluent speakers, mostly from Israel. Modern Hebrew is one of the two official languages of Israel (the other is Arabic). Pre-modern Hebrew is used for prayer and study in Jewish communities all around the world today. Ancient Hebrew is also the liturgical language of the Samaritans, while modern Hebrew or Arabic is their vernacular. As a foreign language, it is studied mostly by Jews and students of Judaism and Israel, and by archaeologists and linguists specializing in the Middle East and its civilizations, as well as by theologians in Christian seminaries. The Torah (the first five books), and most of the rest of the Hebrew Bible, are written in Biblical Hebrew. Much of its present form is written in the dialect that scholars believe flourished around the 6th century BC, around the time of the Babylonian exile. For this reason, Hebrew has been referred to by Jews as Leshon HaKodesh (לשון הקדש), “The Holy Language”, since ancient times.

**Arabic + Syriac + Arabic Supplement + Thaana + NKo** => 512 characters (UTF-8) from U+00600 hex = 1536 decimal
No. 4	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
00600	؀	؁	؂	؃	؄	؅	؆	؇	؈	؉	؊	؋	،	؍	؎	؏	ؐ	ؑ	ؒ	ؓ	ؔ	ؕ	ؖ	ؗ	ؘ	ؙ	ؚ	؛	؜	؝	؞	؟
00620	ؠ	ء	آ	أ	ؤ	إ	ئ	ا	ب	ة	ت	ث	ج	ح	خ	د	ذ	ر	ز	س	ش	ص	ض	ط	ظ	ع	غ	ػ	ؼ	ؽ	ؾ	ؿ
00640	ـ	ف	ق	ك	ل	م	ن	ه	و	ى	ي	ً	ٌ	ٍ	َ	ُ	ِ	ّ	ْ	ٓ	ٔ	ٕ	ٖ	ٗ	٘	ٙ	ٚ	ٛ	ٜ	ٝ	ٞ	ٟ
00660	٠	١	٢	٣	٤	٥	٦	٧	٨	٩	٪	٫	٬	٭	ٮ	ٯ	ٰ	ٱ	ٲ	ٳ	ٴ	ٵ	ٶ	ٷ	ٸ	ٹ	ٺ	ٻ	ټ	ٽ	پ	ٿ
00680	ڀ	ځ	ڂ	ڃ	ڄ	څ	چ	ڇ	ڈ	ډ	ڊ	ڋ	ڌ	ڍ	ڎ	ڏ	ڐ	ڑ	ڒ	ړ	ڔ	ڕ	ږ	ڗ	ژ	ڙ	ښ	ڛ	ڜ	ڝ	ڞ	ڟ
006A0	ڠ	ڡ	ڢ	ڣ	ڤ	ڥ	ڦ	ڧ	ڨ	ک	ڪ	ګ	ڬ	ڭ	ڮ	گ	ڰ	ڱ	ڲ	ڳ	ڴ	ڵ	ڶ	ڷ	ڸ	ڹ	ں	ڻ	ڼ	ڽ	ھ	ڿ
006C0	ۀ	ہ	ۂ	ۃ	ۄ	ۅ	ۆ	ۇ	ۈ	ۉ	ۊ	ۋ	ی	ۍ	ێ	ۏ	ې	ۑ	ے	ۓ	۔	ە	ۖ	ۗ	ۘ	ۙ	ۚ	ۛ	ۜ	۝	۞	۟
006E0	۠	ۡ	ۢ	ۣ	ۤ	ۥ	ۦ	ۧ	ۨ	۩	۪	۫	۬	ۭ	ۮ	ۯ	۰	۱	۲	۳	۴	۵	۶	۷	۸	۹	ۺ	ۻ	ۼ	۽	۾	ۿ
4.2
00700	܀	܁	܂	܃	܄	܅	܆	܇	܈	܉	܊	܋	܌	܍	܎	܏	ܐ	ܑ	ܒ	ܓ	ܔ	ܕ	ܖ	ܗ	ܘ	ܙ	ܚ	ܛ	ܜ	ܝ	ܞ	ܟ
00720	ܠ	ܡ	ܢ	ܣ	ܤ	ܥ	ܦ	ܧ	ܨ	ܩ	ܪ	ܫ	ܬ	ܭ	ܮ	ܯ	ܰ	ܱ	ܲ	ܳ	ܴ	ܵ	ܶ	ܷ	ܸ	ܹ	ܺ	ܻ	ܼ	ܽ	ܾ	ܿ
00740	݀	݁	݂	݃	݄	݅	݆	݇	݈	݉	݊	݋	݌	ݍ	ݎ	ݏ	ݐ	ݑ	ݒ	ݓ	ݔ	ݕ	ݖ	ݗ	ݘ	ݙ	ݚ	ݛ	ݜ	ݝ	ݞ	ݟ
00760	ݠ	ݡ	ݢ	ݣ	ݤ	ݥ	ݦ	ݧ	ݨ	ݩ	ݪ	ݫ	ݬ	ݭ	ݮ	ݯ	ݰ	ݱ	ݲ	ݳ	ݴ	ݵ	ݶ	ݷ	ݸ	ݹ	ݺ	ݻ	ݼ	ݽ	ݾ	ݿ
00780	ހ	ށ	ނ	ރ	ބ	ޅ	ކ	އ	ވ	މ	ފ	ދ	ތ	ލ	ގ	ޏ	ސ	ޑ	ޒ	ޓ	ޔ	ޕ	ޖ	ޗ	ޘ	ޙ	ޚ	ޛ	ޜ	ޝ	ޞ	ޟ
007A0	ޠ	ޡ	ޢ	ޣ	ޤ	ޥ	ަ	ާ	ި	ީ	ު	ޫ	ެ	ޭ	ޮ	ޯ	ް	ޱ	޲	޳	޴	޵	޶	޷	޸	޹	޺	޻	޼	޽	޾	޿
007C0	߀	߁	߂	߃	߄	߅	߆	߇	߈	߉	ߊ	ߋ	ߌ	ߍ	ߎ	ߏ	ߐ	ߑ	ߒ	ߓ	ߔ	ߕ	ߖ	ߗ	ߘ	ߙ	ߚ	ߛ	ߜ	ߝ	ߞ	ߟ
007E0	ߠ	ߡ	ߢ	ߣ	ߤ	ߥ	ߦ	ߧ	ߨ	ߩ	ߪ	߫	߬	߭	߮	߯	߰	߱	߲	߳	ߴ	ߵ	߶	߷	߸	߹	ߺ	߻	߼	߽	߾	߿

English documentation

U+00600 Arabic (Alphabet, 256 codes from 0600–06FF. Languages: Arabic, Persian, Kurdish): Arabic is a Unicode block containing the standard letters and the most common diacritics of the Arabic script, and the Arabic-Indic digits. The Arabic script is a writing system used for writing several languages of Asia and Africa, such as Arabic, the Sorani and Luri dialects of Kurdish, Persian, Pashto, and Urdu. Until the 16th century, it was even used to write some texts in Spanish. After the Latin script, Chinese characters, and Devanagari, it is the fourth-most widely used writing system in the world. The Arabic script is written from right to left in a cursive style. In most cases the letters transcribe consonants, or consonants and a few vowels, so most Arabic alphabets are abjads. The script was first used to write texts in Arabic, most notably the Qurʼān, the holy book of Islam. With the spread of Islam, it came to be used to write languages of many language families, leading to the addition of new letters and other symbols, with some versions, such as Kurdish, Uyghur, and old Bosnian, being abugidas or true alphabets. It is also the basis for a rich tradition of Arabic calligraphy.
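
The right-to-left direction is recorded per character in the Unicode Character Database as a bidirectional class. A minimal sketch using Python's standard `unicodedata` module (the helper `is_rtl` is hypothetical, and treating only the strong classes `R` and `AL` as right-to-left is a simplifying assumption; real text layout requires the full Unicode Bidirectional Algorithm):

```python
import unicodedata


def is_rtl(ch):
    """True if `ch` has a strong right-to-left bidirectional class."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


# LATIN CAPITAL LETTER A, ARABIC LETTER BEH, ARABIC-INDIC DIGIT ZERO
for ch in ("A", "\u0628", "\u0660"):
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), is_rtl(ch))
```

Digits carry their own classes rather than `AL`, which reflects that numbers inside right-to-left text are still laid out left to right.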

U+00700 Syriac (Alphabet, 80 codes from 0700–074F. Languages: Syriac, Arabic): The Syriac script consists of 22 letters derived from the corresponding letters of the older Aramaic alphabet. The direction of the script is from right to left; the writing is cursive, and most of the letters are connected within a word. The type of script used in older manuscripts (up to the end of the 5th century) is known as Estrangelo (ʔesṭrangelå, from the Greek στρογγύλη ‛round’). Inscriptions in Estrangelo are known from the monumental epigraphy of the 1st century AD from Osroene. After the division of the Syrian Church into Nestorians and Jacobites, each of these two groups developed its own type of script. The East Syriac ('Nestorian', 'Chaldean' or 'Assyrian') script appeared at the beginning of the 7th century. In Syriac, it is called madnḥāyā (lit. 'eastern'). The outlines of the East Syriac script are closer to Estrangelo than those of the West Syriac. The West Syriac ('Jacobite' or 'Maronite') script has been known in the manuscript book tradition since the end of the 8th century. In (Western) Syriac, it is called serto (lit. 'dash, letter'), which derives from serṭo pšiṭo ('simple/regular script'). Paleographic data show that serto dates back to the cursive found in documents on parchment from Edessa at the beginning of the 3rd century. The letters represent only consonants. At the end of the 7th or the beginning of the 8th century, two systems of vowel signs were created. The East introduced a system of dots, written above and below the letters, to denote 8 vowels: 4 long and 4 short. In the West, represented by the Jacobites in particular, the same goal was reached by using small and slightly modified Greek letters, placed either above or below the letters; 5 vowels were distinguished.

U+00750 Arabic Supplement (Alphabet, 48 codes from 0750–077F. Languages: Arabic, Persian, Kurdish): Arabic Supplement is a Unicode block that contains Arabic letters and variants mostly used for writing African (non-Arabic) languages.

U+00780 Thaana (Alphabet, 64 codes from 0780–07BF. Language: Maldivian): Thaana is a Unicode block containing characters for writing the Dhivehi and Arabic languages in the Maldives. Thaana, Taana or Tāna ( ތާނަ‎ in Tāna script) is the modern writing system of the Maldivian language spoken in the Maldives. Thaana has characteristics of both an abugida (diacritics, vowel-killer strokes) and a true alphabet (all vowels are written), with consonants derived from indigenous and Arabic numerals, and vowels derived from the vowel diacritics of the Arabic abjad. Its orthography is largely phonemic. The Thaana script first appeared in a Maldivian document towards the beginning of the 18th century in a crude initial form known as Gabulhi Thaana, which was written in scripta continua. This early script slowly developed: its characters came to slant 45 degrees and became more graceful, and spaces were added between words. Over time it gradually replaced the older Dhives Akuru alphabet. The oldest written sample of the Thaana script is found on the island of Kanditheemu in Northern Miladhunmadulu Atoll. It is inscribed on the door posts of the main Hukuru Miskiy (Friday mosque) of the island and dates back to 1008 AH (AD 1599) and 1020 AH (AD 1611), when the roof of the building was built and then renewed during the reigns of Ibrahim Kalaafaan (Sultan Ibrahim III) and Hussain Faamuladeyri Kilege (Sultan Hussain II) respectively. Thaana, like Arabic, is written right to left. It indicates vowels with diacritic marks derived from Arabic. Each letter must carry either a vowel or a sukun (which indicates “no vowel”). The only exception to this rule is nūnu which, when written without a diacritic, indicates prenasalization of a following stop.

U+007C0 NKo (Alphabet, 64 codes from 07C0–07FF. Language: N'Ko): NKo is a Unicode block containing characters for the Manding languages of West Africa, including Bamanan, Jula, Maninka, Mandinka, and a common literary language, Kangbe, also called N'Ko. N'Ko is both a script devised by Solomana Kante in 1949 as a writing system for the Manding languages of West Africa, and the name of the literary language written in the script. The term N'Ko means “I say” in all Manding languages. The script has a few similarities to the Arabic script, notably its direction (right-to-left) and the connected letters. It obligatorily marks both tone and vowels.

**Samaritan + Mandaic + Syriac Supplement + Arabic Extended-B + Arabic Extended-A + Devanagari + Bengali** => 512 characters (UTF-8) from U+00800 hex = 2048 decimal
No. 5	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
00800	ࠀ	ࠁ	ࠂ	ࠃ	ࠄ	ࠅ	ࠆ	ࠇ	ࠈ	ࠉ	ࠊ	ࠋ	ࠌ	ࠍ	ࠎ	ࠏ	ࠐ	ࠑ	ࠒ	ࠓ	ࠔ	ࠕ	ࠖ	ࠗ	࠘	࠙	ࠚ	ࠛ	ࠜ	ࠝ	ࠞ	ࠟ
00820	ࠠ	ࠡ	ࠢ	ࠣ	ࠤ	ࠥ	ࠦ	ࠧ	ࠨ	ࠩ	ࠪ	ࠫ	ࠬ	࠭	࠮	࠯	࠰	࠱	࠲	࠳	࠴	࠵	࠶	࠷	࠸	࠹	࠺	࠻	࠼	࠽	࠾	࠿
00840	ࡀ	ࡁ	ࡂ	ࡃ	ࡄ	ࡅ	ࡆ	ࡇ	ࡈ	ࡉ	ࡊ	ࡋ	ࡌ	ࡍ	ࡎ	ࡏ	ࡐ	ࡑ	ࡒ	ࡓ	ࡔ	ࡕ	ࡖ	ࡗ	ࡘ	࡙	࡚	࡛	࡜	࡝	࡞	࡟
00860	ࡠ	ࡡ	ࡢ	ࡣ	ࡤ	ࡥ	ࡦ	ࡧ	ࡨ	ࡩ	ࡪ	࡫	࡬	࡭	࡮	࡯	ࡰ	ࡱ	ࡲ	ࡳ	ࡴ	ࡵ	ࡶ	ࡷ	ࡸ	ࡹ	ࡺ	ࡻ	ࡼ	ࡽ	ࡾ	ࡿ
00880	ࢀ	ࢁ	ࢂ	ࢃ	ࢄ	ࢅ	ࢆ	ࢇ	࢈	ࢉ	ࢊ	ࢋ	ࢌ	ࢍ	ࢎ	࢏	࢐	࢑	࢒	࢓	࢔	࢕	࢖	ࢗ	࢘	࢙	࢚	࢛	࢜	࢝	࢞	࢟
008A0	ࢠ	ࢡ	ࢢ	ࢣ	ࢤ	ࢥ	ࢦ	ࢧ	ࢨ	ࢩ	ࢪ	ࢫ	ࢬ	ࢭ	ࢮ	ࢯ	ࢰ	ࢱ	ࢲ	ࢳ	ࢴ	ࢵ	ࢶ	ࢷ	ࢸ	ࢹ	ࢺ	ࢻ	ࢼ	ࢽ	ࢾ	ࢿ
008C0	ࣀ	ࣁ	ࣂ	ࣃ	ࣄ	ࣅ	ࣆ	ࣇ	ࣈ	ࣉ	࣊	࣋	࣌	࣍	࣎	࣏	࣐	࣑	࣒	࣓	ࣔ	ࣕ	ࣖ	ࣗ	ࣘ	ࣙ	ࣚ	ࣛ	ࣜ	ࣝ	ࣞ	ࣟ
008E0	࣠	࣡	࣢	ࣣ	ࣤ	ࣥ	ࣦ	ࣧ	ࣨ	ࣩ	࣪	࣫	࣬	࣭	࣮	࣯	ࣰ	ࣱ	ࣲ	ࣳ	ࣴ	ࣵ	ࣶ	ࣷ	ࣸ	ࣹ	ࣺ	ࣻ	ࣼ	ࣽ	ࣾ	ࣿ
5.2
00900	ऀ	ँ	ं	ः	ऄ	अ	आ	इ	ई	उ	ऊ	ऋ	ऌ	ऍ	ऎ	ए	ऐ	ऑ	ऒ	ओ	औ	क	ख	ग	घ	ङ	च	छ	ज	झ	ञ	ट
00920	ठ	ड	ढ	ण	त	थ	द	ध	न	ऩ	प	फ	ब	भ	म	य	र	ऱ	ल	ळ	ऴ	व	श	ष	स	ह	ऺ	ऻ	़	ऽ	ा	ि
00940	ी	ु	ू	ृ	ॄ	ॅ	ॆ	े	ै	ॉ	ॊ	ो	ौ	्	ॎ	ॏ	ॐ	॑	॒	॓	॔	ॕ	ॖ	ॗ	क़	ख़	ग़	ज़	ड़	ढ़	फ़	य़
00960	ॠ	ॡ	ॢ	ॣ	।	॥	०	१	२	३	४	५	६	७	८	९	॰	ॱ	ॲ	ॳ	ॴ	ॵ	ॶ	ॷ	ॸ	ॹ	ॺ	ॻ	ॼ	ॽ	ॾ	ॿ
00980	ঀ	ঁ	ং	ঃ	঄	অ	আ	ই	ঈ	উ	ঊ	ঋ	ঌ	঍	঎	এ	ঐ	঑	঒	ও	ঔ	ক	খ	গ	ঘ	ঙ	চ	ছ	জ	ঝ	ঞ	ট
009A0	ঠ	ড	ঢ	ণ	ত	থ	দ	ধ	ন	঩	প	ফ	ব	ভ	ম	য	র	঱	ল	঳	঴	঵	শ	ষ	স	হ	঺	঻	়	ঽ	া	ি
009C0	ী	ু	ূ	ৃ	ৄ	৅	৆	ে	ৈ	৉	৊	ো	ৌ	্	ৎ	৏	৐	৑	৒	৓	৔	৕	৖	ৗ	৘	৙	৚	৛	ড়	ঢ়	৞	য়
009E0	ৠ	ৡ	ৢ	ৣ	৤	৥	০	১	২	৩	৪	৫	৬	৭	৮	৯	ৰ	ৱ	৲	৳	৴	৵	৶	৷	৸	৹	৺	৻	ৼ	৽	৾	৿
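
The tables on this page can be read arithmetically: each row label is the hexadecimal code point of the first cell in that row, and the column header (0 to 1F) is the offset within the row. A minimal Python sketch of that lookup follows; the helper name `cell` is hypothetical and not part of the original page.

```python
def cell(row_label: str, column: str) -> str:
    """Return the character at a table cell.

    row_label: hexadecimal row label as printed in the table, e.g. "00900".
    column:    hexadecimal column header, "0" through "1F".
    """
    code_point = int(row_label, 16) + int(column, 16)
    return chr(code_point)


# The heading of this table gives U+00800 hex as 2048 in decimal.
assert int("00800", 16) == 2048

# Row 00900, column 15 is code point U+0915.
print(cell("00900", "15"))  # क
```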

English documentation

U+00800 Samaritan (Alphabet, 64 codes from 0800–083F. Language in ): Samaritan is a Unicode block containing characters used for writing Samaritan Hebrew (0590–05FF) and Aramaic. The Samaritan alphabet is used by the Samaritans for religious writings, including the Samaritan Pentateuch, writings in Samaritan Hebrew, and for commentaries and translations in Samaritan Aramaic and occasionally Arabic. Samaritan is a direct descendant of the Paleo-Hebrew alphabet, which was a variety of the Phoenician alphabet in which large parts of the Hebrew Bible were originally penned. All these scripts are believed to be descendants of the Proto-Sinaitic script. That script was used by the ancient Israelites, both Jews and Samaritans. The better-known “square script” Hebrew alphabet traditionally used by Jews is a stylized version of the Aramaic alphabet which they adopted from the Persian Empire (which in turn adopted it from the Arameans). After the fall of the Persian Empire, Judaism used both scripts before settling on the Aramaic form. For a limited time thereafter, the use of paleo-Hebrew (proto-Samaritan) among Jews was retained only to write the Tetragrammaton, but soon that custom was also abandoned.

U+00840 Mandaic (Alphabet, 32 codes from 0840–085F. Language mandaic in ): Mandaic is a Unicode block containing characters of the Mandaic script used for writing the historic Eastern Aramaic, also called Classical Mandaic, and the modern Neo-Mandaic language. The Mandaic alphabet is based on the Aramaic alphabet, and is used for writing the Mandaic language. The Mandaic name for the script is Abagada or Abaga, after the first letters of the alphabet. Rather than the ancient Semitic names for the letters (alaph, beth, gimal), the letters are known as â, bâ, gâ and so on.

U+00860 Syriac Supplement (, 16 codes from 0860–086F. Language in ): The block contains additional letters for the Syriac script. They were used to write Suriani Malayalam, also known as Syriac Malayalam or Karshoni. Until the 19th century, it was used by the Saint Thomas Christians of Kerala in south-west India.

U+00870 Arabic Extended-B (, 48 codes from 0870–089F. Language in ): The Arabic Extended-B block consists primarily of characters used for the Quran. They help readers recite religious texts correctly, indicating different pronunciation variants and marking short or long pauses. Additionally, the block includes letter variants for non-Arabic languages, currency symbols, and an abbreviation mark.

U+008A0 Arabic Extended-A (Alphabet, 96 codes from 08A0–08FF. Language in ): This block contains letters that supplement the Arabic block (0600–06FF), along with vowel signs and tone marks, for Rohingya, Berber, Belarusian, Tatar, Bashkir, and African languages. It also includes Quranic annotation signs.

U+00900 Devanagari (Abugida, 128 codes from 0900–097F. Language sanskrit, hindi in ): Devanagari is a Unicode block containing characters for writing Hindi, Marathi, Sindhi, Nepali and Sanskrit. In its original incarnation, the code points U+0900..U+0954 were a direct copy of the characters A0-F4 from the 1988 ISCII standard. The Bengali (0980–09FF), Gurmukhi (0A00–0A7F), Gujarati (0A80–0AFF), Oriya (0B00–0B7F), Tamil (0B80–0BFF), Telugu (0C00–0C7F), Kannada (0C80–0CFF), and Malayalam (0D00–0D7F) blocks were similarly all based on their ISCII encodings. Devanagari, also called Nagari, is an abugida alphabet of India and Nepal. It is written from left to right, does not have distinct letter cases, and is recognisable (along with most other North Indic scripts, with a few exceptions like Gujarati and Oriya) by a horizontal line that runs along the top of full letters. Since the 19th century, it has been the most commonly used script for writing Sanskrit. Devanagari is used to write Hindi, Nepali, Marathi, Konkani, Bodo and Maithili among other languages and dialects. It was formerly used to write Gujarati. Because it is the standardised script for the Hindi, Nepali, Marathi, Konkani and Bodo languages, Devanagari is one of the most used and adopted writing systems in the world.
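
The "direct copy" relationship described above amounts to a fixed offset between a byte value and a code point. The sketch below illustrates only that arithmetic, using the ranges quoted in the text for the 1988 ISCII standard. The function name `iscii1988_to_devanagari` is hypothetical, and this is not a real ISCII decoder: later ISCII revisions and the other script blocks are not covered.

```python
DEVANAGARI_BASE = 0x0900
ISCII_FIRST, ISCII_LAST = 0xA0, 0xF4  # byte range quoted in the text


def iscii1988_to_devanagari(byte: int) -> str:
    """Map a byte in A0-F4 to the similarly positioned Devanagari code point."""
    if not ISCII_FIRST <= byte <= ISCII_LAST:
        raise ValueError(f"byte 0x{byte:02X} is outside A0-F4")
    return chr(DEVANAGARI_BASE + (byte - ISCII_FIRST))


# The two ends of the quoted range land on U+0900 and U+0954.
assert ord(iscii1988_to_devanagari(0xA0)) == 0x0900
assert ord(iscii1988_to_devanagari(0xF4)) == 0x0954
```

The same offset pattern matches the ranges quoted for the other Indic blocks on this page, for example A1-ED onto U+0981..U+09CD for Bengali, with the block's start address in place of 0x0900.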

U+00980 Bengali (Abugida, 128 codes from 0980–09FF. Language bengali in ): Bengali is a Unicode block containing characters for the Bengali, Assamese, Bishnupriya Manipuri, Daphla, Garo, Hallam, Khasi, Mizo, Munda, Naga, Rian, and Santali languages. In its original incarnation, the code points U+0981..U+09CD were a direct copy of the Bengali characters A1-ED from the 1988 ISCII standard, as well as several Assamese ISCII characters in the U+09F0 column. The Devanagari, Gurmukhi, Gujarati, Oriya, Tamil (0B80–0BFF), Telugu (0C00–0C7F), Kannada (0C80–0CFF), and Malayalam (0D00–0D7F) blocks were similarly all based on ISCII encodings. The Bengali alphabet or Bangla alphabet is the writing system for the Bengali language and is the 6th most widely used writing system in the world. The script is shared by Assamese with minor variations, and is the basis for other writing systems like Meithei and Bishnupriya Manipuri. Historically, the script has also been used to write the Sanskrit language in the region of Bengal. From a classificatory point of view, the Bengali script is an abugida, i.e. its vowel graphemes are mainly realized not as independent letters, but as diacritics attached to its consonant letters. It is written from left to right and lacks distinct letter cases. It is recognizable, like other Brahmic scripts, by a distinctive horizontal line, known as matra, that runs along the tops of the letters and links them together. The Bengali script is, however, less blocky and presents a more sinuous shape.

**Gurmukhi + Gujarati + Oriya + Tamil** => 512 characters UTF-8 from U+00A00 hex = 2560 decimal
Nr 6	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
00A00	਀	ਁ	ਂ	ਃ	਄	ਅ	ਆ	ਇ	ਈ	ਉ	ਊ	਋	਌	਍	਎	ਏ	ਐ	਑	਒	ਓ	ਔ	ਕ	ਖ	ਗ	ਘ	ਙ	ਚ	ਛ	ਜ	ਝ	ਞ	ਟ
00A20	ਠ	ਡ	ਢ	ਣ	ਤ	ਥ	ਦ	ਧ	ਨ	਩	ਪ	ਫ	ਬ	ਭ	ਮ	ਯ	ਰ	਱	ਲ	ਲ਼	਴	ਵ	ਸ਼	਷	ਸ	ਹ	਺	਻	਼	਽	ਾ	ਿ
00A40	ੀ	ੁ	ੂ	੃	੄	੅	੆	ੇ	ੈ	੉	੊	ੋ	ੌ	੍	੎	੏	੐	ੑ	੒	੓	੔	੕	੖	੗	੘	ਖ਼	ਗ਼	ਜ਼	ੜ	੝	ਫ਼	੟
00A60	੠	੡	੢	੣	੤	੥	੦	੧	੨	੩	੪	੫	੬	੭	੮	੯	ੰ	ੱ	ੲ	ੳ	ੴ	ੵ	੶	੷	੸	੹	੺	੻	੼	੽	੾	੿
00A80	઀	ઁ	ં	ઃ	઄	અ	આ	ઇ	ઈ	ઉ	ઊ	ઋ	ઌ	ઍ	઎	એ	ઐ	ઑ	઒	ઓ	ઔ	ક	ખ	ગ	ઘ	ઙ	ચ	છ	જ	ઝ	ઞ	ટ
00AA0	ઠ	ડ	ઢ	ણ	ત	થ	દ	ધ	ન	઩	પ	ફ	બ	ભ	મ	ય	ર	઱	લ	ળ	઴	વ	શ	ષ	સ	હ	઺	઻	઼	ઽ	ા	િ
00AC0	ી	ુ	ૂ	ૃ	ૄ	ૅ	૆	ે	ૈ	ૉ	૊	ો	ૌ	્	૎	૏	ૐ	૑	૒	૓	૔	૕	૖	૗	૘	૙	૚	૛	૜	૝	૞	૟
00AE0	ૠ	ૡ	ૢ	ૣ	૤	૥	૦	૧	૨	૩	૪	૫	૬	૭	૮	૯	૰	૱	૲	૳	૴	૵	૶	૷	૸	ૹ	ૺ	ૻ	ૼ	૽	૾	૿
6.2
00B00	଀	ଁ	ଂ	ଃ	଄	ଅ	ଆ	ଇ	ଈ	ଉ	ଊ	ଋ	ଌ	଍	଎	ଏ	ଐ	଑	଒	ଓ	ଔ	କ	ଖ	ଗ	ଘ	ଙ	ଚ	ଛ	ଜ	ଝ	ଞ	ଟ
00B20	ଠ	ଡ	ଢ	ଣ	ତ	ଥ	ଦ	ଧ	ନ	଩	ପ	ଫ	ବ	ଭ	ମ	ଯ	ର	଱	ଲ	ଳ	଴	ଵ	ଶ	ଷ	ସ	ହ	଺	଻	଼	ଽ	ା	ି
00B40	ୀ	ୁ	ୂ	ୃ	ୄ	୅	୆	େ	ୈ	୉	୊	ୋ	ୌ	୍	୎	୏	୐	୑	୒	୓	୔	୕	ୖ	ୗ	୘	୙	୚	୛	ଡ଼	ଢ଼	୞	ୟ
00B60	ୠ	ୡ	ୢ	ୣ	୤	୥	୦	୧	୨	୩	୪	୫	୬	୭	୮	୯	୰	ୱ	୲	୳	୴	୵	୶	୷	୸	୹	୺	୻	୼	୽	୾	୿
00B80	஀	஁	ஂ	ஃ	஄	அ	ஆ	இ	ஈ	உ	ஊ	஋	஌	஍	எ	ஏ	ஐ	஑	ஒ	ஓ	ஔ	க	஖	஗	஘	ங	ச	஛	ஜ	஝	ஞ	ட
00BA0	஠	஡	஢	ண	த	஥	஦	஧	ந	ன	ப	஫	஬	஭	ம	ய	ர	ற	ல	ள	ழ	வ	ஶ	ஷ	ஸ	ஹ	஺	஻	஼	஽	ா	ி
00BC0	ீ	ு	ூ	௃	௄	௅	ெ	ே	ை	௉	ொ	ோ	ௌ	்	௎	௏	ௐ	௑	௒	௓	௔	௕	௖	ௗ	௘	௙	௚	௛	௜	௝	௞	௟
00BE0	௠	௡	௢	௣	௤	௥	௦	௧	௨	௩	௪	௫	௬	௭	௮	௯	௰	௱	௲	௳	௴	௵	௶	௷	௸	௹	௺	௻	௼	௽	௾	௿

English documentation

U+00A00 Gurmukhi (Abugida, 128 codes from 0A00–0A7F. Language punjabi in ): Gurmukhi is a Unicode block containing characters for the Punjabi language, as it is written in India. In its original incarnation, the code points U+0A02..U+0A4C were a direct copy of the Gurmukhi characters A2-EC from the 1988 ISCII standard. The Devanagari (0900–097F), Bengali (0980–09FF), Gujarati (0A80–0AFF), Oriya (0B00–0B7F), Tamil (0B80–0BFF), Telugu (0C00–0C7F), Kannada (0C80–0CFF), and Malayalam (0D00–0D7F) blocks were similarly all based on their ISCII encodings. Gurmukhi is the most common script used for writing the Punjabi language in India. An abugida derived from the Laṇḍā script and ultimately descended from Brahmi, Gurmukhi was standardised by the second Sikh guru, Guru Angad, in the 16th century. The whole of the Guru Granth Sahib's 1430 pages are written in this script. The name Gurmukhi is derived from the Old Punjabi term “gurumukhī”, meaning “from the mouth of the Guru”. Modern Gurmukhi has thirty-eight consonants (vianjan), nine vowel symbols (lāga mātrā), two symbols for nasal sounds (bindī and ṭippī), and one symbol which duplicates the sound of any consonant (addak). In addition, four conjuncts are used: three subjoined forms of the consonants Rara, Haha and Vava, and one half-form of Yayya. Use of the conjunct forms of Vava and Yayya is increasingly scarce in modern contexts. Gurmukhi is primarily used in the Punjab state of India where it is the sole official script for all official and judicial purposes. The script is also widely used in the Indian states of Haryana, Himachal Pradesh, Jammu and the national capital of Delhi, with Punjabi being one of the official languages in these states. Gurmukhi has been adapted to write other languages, such as Braj Bhasha, Khariboli (and other Hindustani dialects), Sanskrit and Sindhi.

U+00A80 Gujarati (Abugida, 128 codes from 0A80–0AFF. Language in ): Gujarati is a Unicode block containing characters for writing the Gujarati language. In its original incarnation, the code points U+0A81..U+0AD0 were a direct copy of the Gujarati characters A1-F0 from the 1988 ISCII standard. The Devanagari (0900–097F), Bengali (0980–09FF), Gurmukhi (0A00–0A7F), Oriya (0B00–0B7F), Tamil (0B80–0BFF), Telugu (0C00–0C7F), Kannada (0C80–0CFF), and Malayalam (0D00–0D7F) blocks were similarly all based on their ISCII encodings. The Gujarati script, which like all Nāgarī writing systems is strictly speaking an abugida rather than an alphabet, is used to write the Gujarati and Kutchi languages. It is a variant of Devanāgarī script differentiated by the loss of the characteristic horizontal line running above the letters and by a small number of modifications in the remaining characters. With a few additional characters, added for this purpose, the Gujarati script is also often used to write Sanskrit and Hindi. Gujarati numerical digits are also different from their Devanagari counterparts.

U+00B00 Oriya (Abugida, 128 codes from 0B00–0B7F. Language oriya in ): Oriya is a Unicode block containing characters for the Oriya, Khondi, and Santali languages of Orissa state, India. In its original incarnation, the code points U+0B01..U+0B4D were a direct copy of the Oriya characters A1-ED from the 1988 ISCII standard. The Devanagari (0900–097F), Bengali (0980–09FF), Gujarati (0A80–0AFF), Gurmukhi (0A00–0A7F), Tamil (0B80–0BFF), Telugu (0C00–0C7F), Kannada (0C80–0CFF) and Malayalam (0D00–0D7F) blocks were similarly all based on their ISCII encodings. Oriya (oṛiā), officially spelled Odia, is an Indian language belonging to the Indo-Aryan branch of the Indo-European language family. It is the predominant language of the Indian state of Odisha, where native speakers comprise 80% of the population, and it is spoken in parts of West Bengal, Jharkhand, Chhattisgarh and Andhra Pradesh. Oriya is one of the many official languages in India; it is the official language of Odisha and the second official language of Jharkhand. Oriya is the sixth Indian language to be designated a Classical Language in India, on the basis of having a long literary history and not having borrowed extensively from other languages.

U+00B80 Tamil (Abugida, 128 codes from 0B80–0BFF. Language tamil, sanskrit in ): Tamil is a Unicode block containing characters for the Tamil, Badaga, and Saurashtra languages of Tamil Nadu, India, Sri Lanka, Singapore, and Malaysia. In its original incarnation, the code points U+0B02..U+0BCD were a direct copy of the Tamil characters A2-ED from the 1988 ISCII standard. The Devanagari (0900–097F), Bengali (0980–09FF), Gujarati (0A80–0AFF), Gurmukhi (0A00–0A7F), Oriya (0B00–0B7F), Telugu (0C00–0C7F), Kannada (0C80–0CFF) and Malayalam (0D00–0D7F) blocks were similarly all based on their ISCII encodings. The Tamil script (tamiḻ ariccuvaṭi) is an abugida script that is used by the Tamil people in India, Sri Lanka, Malaysia and elsewhere, to write the Tamil language, as well as to write the liturgical language Sanskrit, using consonants and diacritics not represented in the Tamil alphabet. Certain minority languages such as Saurashtra, Badaga, Irula, and Paniya are also written in the Tamil script.

**Telugu + Kannada + Malayalam + Sinhala** => 512 characters UTF-8 from U+00C00 hex = 3072 decimal
Nr 7	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
00C00	ఀ	ఁ	ం	ః	ఄ	అ	ఆ	ఇ	ఈ	ఉ	ఊ	ఋ	ఌ	఍	ఎ	ఏ	ఐ	఑	ఒ	ఓ	ఔ	క	ఖ	గ	ఘ	ఙ	చ	ఛ	జ	ఝ	ఞ	ట
00C20	ఠ	డ	ఢ	ణ	త	థ	ద	ధ	న	఩	ప	ఫ	బ	భ	మ	య	ర	ఱ	ల	ళ	ఴ	వ	శ	ష	స	హ	఺	఻	఼	ఽ	ా	ి
00C40	ీ	ు	ూ	ృ	ౄ	౅	ె	ే	ై	౉	ొ	ో	ౌ	్	౎	౏	౐	౑	౒	౓	౔	ౕ	ౖ	౗	ౘ	ౙ	ౚ	౛	౜	ౝ	౞	౟
00C60	ౠ	ౡ	ౢ	ౣ	౤	౥	౦	౧	౨	౩	౪	౫	౬	౭	౮	౯	౰	౱	౲	౳	౴	౵	౶	౷	౸	౹	౺	౻	౼	౽	౾	౿
00C80	ಀ	ಁ	ಂ	ಃ	಄	ಅ	ಆ	ಇ	ಈ	ಉ	ಊ	ಋ	ಌ	಍	ಎ	ಏ	ಐ	಑	ಒ	ಓ	ಔ	ಕ	ಖ	ಗ	ಘ	ಙ	ಚ	ಛ	ಜ	ಝ	ಞ	ಟ
00CA0	ಠ	ಡ	ಢ	ಣ	ತ	ಥ	ದ	ಧ	ನ	಩	ಪ	ಫ	ಬ	ಭ	ಮ	ಯ	ರ	ಱ	ಲ	ಳ	಴	ವ	ಶ	ಷ	ಸ	ಹ	಺	಻	಼	ಽ	ಾ	ಿ
00CC0	ೀ	ು	ೂ	ೃ	ೄ	೅	ೆ	ೇ	ೈ	೉	ೊ	ೋ	ೌ	್	೎	೏	೐	೑	೒	೓	೔	ೕ	ೖ	೗	೘	೙	೚	೛	೜	ೝ	ೞ	೟
00CE0	ೠ	ೡ	ೢ	ೣ	೤	೥	೦	೧	೨	೩	೪	೫	೬	೭	೮	೯	೰	ೱ	ೲ	ೳ	೴	೵	೶	೷	೸	೹	೺	೻	೼	೽	೾	೿
7.2
00D00	ഀ	ഁ	ം	ഃ	ഄ	അ	ആ	ഇ	ഈ	ഉ	ഊ	ഋ	ഌ	഍	എ	ഏ	ഐ	഑	ഒ	ഓ	ഔ	ക	ഖ	ഗ	ഘ	ങ	ച	ഛ	ജ	ഝ	ഞ	ട
00D20	ഠ	ഡ	ഢ	ണ	ത	ഥ	ദ	ധ	ന	ഩ	പ	ഫ	ബ	ഭ	മ	യ	ര	റ	ല	ള	ഴ	വ	ശ	ഷ	സ	ഹ	ഺ	഻	഼	ഽ	ാ	ി
00D40	ീ	ു	ൂ	ൃ	ൄ	൅	െ	േ	ൈ	൉	ൊ	ോ	ൌ	്	ൎ	൏	൐	൑	൒	൓	ൔ	ൕ	ൖ	ൗ	൘	൙	൚	൛	൜	൝	൞	ൟ
00D60	ൠ	ൡ	ൢ	ൣ	൤	൥	൦	൧	൨	൩	൪	൫	൬	൭	൮	൯	൰	൱	൲	൳	൴	൵	൶	൷	൸	൹	ൺ	ൻ	ർ	ൽ	ൾ	ൿ
00D80	඀	ඁ	ං	ඃ	඄	අ	ආ	ඇ	ඈ	ඉ	ඊ	උ	ඌ	ඍ	ඎ	ඏ	ඐ	එ	ඒ	ඓ	ඔ	ඕ	ඖ	඗	඘	඙	ක	ඛ	ග	ඝ	ඞ	ඟ
00DA0	ච	ඡ	ජ	ඣ	ඤ	ඥ	ඦ	ට	ඨ	ඩ	ඪ	ණ	ඬ	ත	ථ	ද	ධ	න	඲	ඳ	ප	ඵ	බ	භ	ම	ඹ	ය	ර	඼	ල	඾	඿
00DC0	ව	ශ	ෂ	ස	හ	ළ	ෆ	෇	෈	෉	්	෋	෌	෍	෎	ා	ැ	ෑ	ි	ී	ු	෕	ූ	෗	ෘ	ෙ	ේ	ෛ	ො	ෝ	ෞ	ෟ
00DE0	෠	෡	෢	෣	෤	෥	෦	෧	෨	෩	෪	෫	෬	෭	෮	෯	෰	෱	ෲ	ෳ	෴	෵	෶	෷	෸	෹	෺	෻	෼	෽	෾	෿

English documentation

U+00C00 Telugu (Abugida, 128 codes from 0C00–0C7F. Language telugu in ): Telugu is a Unicode block containing characters for the Telugu, Gondi, and Lambadi languages of Andhra Pradesh, India. In its original incarnation, the code points U+0C01..U+0C4D were a direct copy of the Telugu characters A1-ED from the 1988 ISCII standard. The Devanagari (0900–097F), Bengali (0980–09FF), Gujarati (0A80–0AFF), Gurmukhi (0A00–0A7F), Oriya (0B00–0B7F), Tamil (0B80–0BFF), Kannada (0C80–0CFF) and Malayalam (0D00–0D7F) blocks were similarly all based on their ISCII encodings. Telugu script, an abugida from the Brahmic family of scripts, is used to write the Telugu language, a language found in the South Indian states of Andhra Pradesh and Telangana as well as several other neighbouring states. It gained prominence during the Vengi Chalukyan era. It shares high similarity with its sibling Kannada script.

U+00C80 Kannada (Abugida, 128 codes from 0C80–0CFF. Language dravidian in ): Kannada is a Unicode block containing characters for the Kannada and Tulu languages. In its original incarnation, the code points U+0C82..U+0CCD were a direct copy of the Kannada characters A2-ED from the 1988 ISCII standard. The Devanagari (0900–097F), Bengali (0980–09FF), Gujarati (0A80–0AFF), Gurmukhi (0A00–0A7F), Oriya (0B00–0B7F), Tamil (0B80–0BFF), Telugu (0C00–0C7F), and Malayalam (0D00–0D7F) blocks were similarly all based on their ISCII encodings. The Kannada alphabet is an abugida of the Brahmic family, used primarily to write the Kannada language, one of the Dravidian languages of southern India. Several minor languages, such as Tulu, Konkani, Kodava, and Beary, also use alphabets based on the Kannada script. The Kannada and Telugu scripts share high mutual intelligibility with each other, and are often considered to be regional variants of a single script. Similarly, Goykanadi, a variant of Old Kannada, has been historically used to write Konkani in the state of Goa.

U+00D00 Malayalam (Abugida, 128 codes from 0D00–0D7F. Language dravidian in ): Malayalam is a Unicode block containing characters for the Malayalam language. In its original incarnation, the code points U+0D02..U+0D4D were a direct copy of the Malayalam characters A2-ED from the 1988 ISCII standard. The Devanagari (0900–097F), Bengali (0980–09FF), Gujarati (0A80–0AFF), Gurmukhi (0A00–0A7F), Oriya (0B00–0B7F), Tamil (0B80–0BFF), Telugu (0C00–0C7F) and Kannada (0C80–0CFF) blocks were similarly all based on their ISCII encodings. The Malayalam script (Malayāḷalipi), also known as Kairali script, is a Brahmic script used commonly to write the Malayalam language, which is the principal language of the Indian state of Kerala, spoken by 35 million people in the world. Like many other Indic scripts, it is an alphasyllabary (abugida), a writing system that is partially “alphabetic” and partially syllable-based. The modern Malayalam alphabet has 15 vowel letters, 41 consonant letters, and a few other symbols. The Malayalam script is a Vattezhuttu script, which had been extended with Grantha script symbols to represent Indo-Aryan loanwords. The script is also used to write several minority languages such as Paniya, Betta Kurumba, and Ravula. The Malayalam language itself was historically written in several different scripts.

U+00D80 Sinhala (Abugida, 128 codes from 0D80–0DFF. Language sinhalese, sanskrit in ): Sinhala is a Unicode block containing characters for the Sinhala and Pali languages of Sri Lanka, and is also used for writing Sanskrit in Sri Lanka. The Sinhala allocation is loosely based on the ISCII standard, except that Sinhala contains extra prenasalized consonant letters, leading to inconsistencies with other ISCII-Unicode script allocations. The Sinhalese alphabet is an abugida used by the Sinhala people in Sri Lanka and elsewhere to write the Sinhala language and also the liturgical languages Pali and Sanskrit. Being a member of the Brahmic family of scripts, the Sinhalese script can trace its ancestry back more than 2,000 years. Sinhalese is often considered two alphabets, or an alphabet within an alphabet, due to the presence of two sets of letters. The core set, known as the śuddha siṃhala or eḷu hōḍiya, can represent all native phonemes. In order to render Sanskrit and Pali words, an extended set, the miśra siṃhala, is available.

**Thai + Lao + Tibetan** => 512 characters UTF-8 from U+00E00 hex = 3584 decimal
Nr 8	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
00E00	฀	ก	ข	ฃ	ค	ฅ	ฆ	ง	จ	ฉ	ช	ซ	ฌ	ญ	ฎ	ฏ	ฐ	ฑ	ฒ	ณ	ด	ต	ถ	ท	ธ	น	บ	ป	ผ	ฝ	พ	ฟ
00E20	ภ	ม	ย	ร	ฤ	ล	ฦ	ว	ศ	ษ	ส	ห	ฬ	อ	ฮ	ฯ	ะ	ั	า	ำ	ิ	ี	ึ	ื	ุ	ู	ฺ	฻	฼	฽	฾	฿
00E40	เ	แ	โ	ใ	ไ	ๅ	ๆ	็	่	้	๊	๋	์	ํ	๎	๏	๐	๑	๒	๓	๔	๕	๖	๗	๘	๙	๚	๛	๜	๝	๞	๟
00E60	๠	๡	๢	๣	๤	๥	๦	๧	๨	๩	๪	๫	๬	๭	๮	๯	๰	๱	๲	๳	๴	๵	๶	๷	๸	๹	๺	๻	๼	๽	๾	๿
00E80	຀	ກ	ຂ	຃	ຄ	຅	ຆ	ງ	ຈ	ຉ	ຊ	຋	ຌ	ຍ	ຎ	ຏ	ຐ	ຑ	ຒ	ຓ	ດ	ຕ	ຖ	ທ	ຘ	ນ	ບ	ປ	ຜ	ຝ	ພ	ຟ
00EA0	ຠ	ມ	ຢ	ຣ	຤	ລ	຦	ວ	ຨ	ຩ	ສ	ຫ	ຬ	ອ	ຮ	ຯ	ະ	ັ	າ	ຳ	ິ	ີ	ຶ	ື	ຸ	ູ	຺	ົ	ຼ	ຽ	຾	຿
00EC0	ເ	ແ	ໂ	ໃ	ໄ	໅	ໆ	໇	່	້	໊	໋	໌	ໍ	໎	໏	໐	໑	໒	໓	໔	໕	໖	໗	໘	໙	໚	໛	ໜ	ໝ	ໞ	ໟ
00EE0	໠	໡	໢	໣	໤	໥	໦	໧	໨	໩	໪	໫	໬	໭	໮	໯	໰	໱	໲	໳	໴	໵	໶	໷	໸	໹	໺	໻	໼	໽	໾	໿
8.2
00F00	ༀ	༁	༂	༃	༄	༅	༆	༇	༈	༉	༊	་	༌	།	༎	༏	༐	༑	༒	༓	༔	༕	༖	༗	༘	༙	༚	༛	༜	༝	༞	༟
00F20	༠	༡	༢	༣	༤	༥	༦	༧	༨	༩	༪	༫	༬	༭	༮	༯	༰	༱	༲	༳	༴	༵	༶	༷	༸	༹	༺	༻	༼	༽	༾	༿
00F40	ཀ	ཁ	ག	གྷ	ང	ཅ	ཆ	ཇ	཈	ཉ	ཊ	ཋ	ཌ	ཌྷ	ཎ	ཏ	ཐ	ད	དྷ	ན	པ	ཕ	བ	བྷ	མ	ཙ	ཚ	ཛ	ཛྷ	ཝ	ཞ	ཟ
00F60	འ	ཡ	ར	ལ	ཤ	ཥ	ས	ཧ	ཨ	ཀྵ	ཪ	ཫ	ཬ	཭	཮	཯	཰	ཱ	ི	ཱི	ུ	ཱུ	ྲྀ	ཷ	ླྀ	ཹ	ེ	ཻ	ོ	ཽ	ཾ	ཿ
00F80	ྀ	ཱྀ	ྂ	ྃ	྄	྅	྆	྇	ྈ	ྉ	ྊ	ྋ	ྌ	ྍ	ྎ	ྏ	ྐ	ྑ	ྒ	ྒྷ	ྔ	ྕ	ྖ	ྗ	྘	ྙ	ྚ	ྛ	ྜ	ྜྷ	ྞ	ྟ
00FA0	ྠ	ྡ	ྡྷ	ྣ	ྤ	ྥ	ྦ	ྦྷ	ྨ	ྩ	ྪ	ྫ	ྫྷ	ྭ	ྮ	ྯ	ྰ	ྱ	ྲ	ླ	ྴ	ྵ	ྶ	ྷ	ྸ	ྐྵ	ྺ	ྻ	ྼ	྽	྾	྿
00FC0	࿀	࿁	࿂	࿃	࿄	࿅	࿆	࿇	࿈	࿉	࿊	࿋	࿌	࿍	࿎	࿏	࿐	࿑	࿒	࿓	࿔	࿕	࿖	࿗	࿘	࿙	࿚	࿛	࿜	࿝	࿞	࿟
00FE0	࿠	࿡	࿢	࿣	࿤	࿥	࿦	࿧	࿨	࿩	࿪	࿫	࿬	࿭	࿮	࿯	࿰	࿱	࿲	࿳	࿴	࿵	࿶	࿷	࿸	࿹	࿺	࿻	࿼	࿽	࿾	࿿

English documentation

U+00E00 Thai (Abugida, 128 codes from 0E00–0E7F. Language thai in ): Thai is a Unicode block containing characters for the Thai, Lanna Tai, and Pali languages. It is based on the Thai Industrial Standards 620-2529 and 620-2533. Thai script (Thai: อักษรไทย; rtgs: akson thai; ʔàksɔ̌ːn tʰāj) is used to write the Thai language and other languages in Thailand. It has 44 consonant letters (Thai: พยัญชนะ, phayanchana), 15 vowel symbols (Thai: สระ, sara) that combine into at least 28 vowel forms, and four tone diacritics (Thai: วรรณยุกต์ or วรรณยุต, wannayuk or wannayut). Although commonly referred to as the “Thai alphabet”, the character set is in fact not a true alphabet but an abugida, a writing system in which each consonant may invoke an inherent vowel sound. In the case of the Thai script this is an implied 'a' or 'o'. Consonants are written horizontally from left to right, with vowels arranged above, below, to the left, or to the right of the corresponding consonant, or in a combination of positions. Thai has its own set of Thai numerals which are based on the Hindu-Arabic numeral system (Thai: เลขไทย, lek thai), but the standard western Hindu-Arabic numerals (Thai: เลขฮินดูอารบิก, lek hindu arabik) are also commonly used.

U+00E80 Lao (Abugida, 128 codes from 0E80–0EFF. Language laotian in ): Lao is a Unicode block containing characters for the languages of Laos. The characters of the Lao block are allocated to be equivalent to the similarly positioned characters of the Thai block immediately preceding it. The Lao alphabet, Akson Lao (Lao: ອັກສອນລາວ ʔáksɔ̌ːn láːw), is the main script used to write the Lao language and other minority languages in Laos. It is ultimately of Indic origin; the alphabet includes 27 consonants (ພະຍັນຊະນະ), 7 consonantal ligatures (ພະຍັນຊະນະປະສົມ), 33 vowels (ສະຫລະ) (some based on combinations of symbols), and 4 tone marks (ວັນນະຍຸດ). According to Article 89 of the Amended Constitution of 2003 of the Lao People's Democratic Republic, the Lao alphabet is the official script for the official language. It is also used to transcribe minority languages in the country, although some minority language speakers continue to use their traditional writing systems, while the Hmong have adopted the Roman alphabet. An older version of the script was also used by the ethnic Lao of Thailand's Isan region, who make up a third of Thailand's population, before Isan was incorporated into Siam, until its use was banned and supplemented with the very similar Thai alphabet in 1871, although the region remained distant culturally and politically until further government campaigns and integration into the Thai state (Thaification) were imposed in the 20th century. The letters of the Lao alphabet are very similar to those of the Thai alphabet, which has the same roots. They differ in that Thai still has more letters for writing a single sound, and in the more circular style of writing in Lao. Lao, like most Indic scripts, is traditionally written from left to right.
Lao is traditionally considered an abugida script, in which certain 'implied' vowels are unwritten, but recent spelling reforms make this definition somewhat problematic, as all vowel sounds today are marked with diacritics when written according to the spelling standard propagated and promoted by the Lao PDR. However, most Lao outside of Laos, and many inside Laos, continue to write according to former spelling standards, which continue the use of the implied vowel, maintaining the Lao script's status as an abugida. Vowels can be written above, below, in front of, or behind consonants, with some vowel combinations written before, over and after. Spaces for separating words and punctuation were traditionally not used, but a space is used and functions in place of a comma or period. The letters have no majuscule or minuscule (upper and lower case) differentiations.
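
The parallel allocation of the Thai and Lao blocks mentioned above means that a Lao character sits 0x80 code points after the similarly positioned Thai character. A small Python sketch of that relationship follows. The helper name `thai_to_lao_position` is hypothetical, and the result is only the similarly positioned code point: it may be unassigned in the Lao block, and it is not a transliteration.

```python
THAI_START, THAI_END = 0x0E00, 0x0E7F
LAO_OFFSET = 0x0E80 - 0x0E00  # the Lao block starts 0x80 after the Thai block


def thai_to_lao_position(ch: str) -> str:
    """Return the code point at the same block position in the Lao block."""
    cp = ord(ch)
    if not THAI_START <= cp <= THAI_END:
        raise ValueError(f"U+{cp:04X} is not in the Thai block")
    return chr(cp + LAO_OFFSET)


# U+0E01 (first Thai consonant in the table) maps to U+0E81.
print(thai_to_lao_position("\u0E01"))  # ກ
```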

U+00F00 Tibetan (Abugida, 256 codes from 0F00–0FFF. Language tibetan in ): Tibetan is a Unicode block containing characters for the Tibetan, Dzongkha, and other languages of Tibet, Bhutan, Nepal, and northern India. The Tibetan Unicode block is unique for having been allocated as a standard virama-based encoding for version 1.0, removed from the Unicode Standard when unifying with ISO 10646 for version 1.1, then reintroduced as an explicit root/subjoined encoding, with a larger block size in version 2.0. The Tibetan alphabet is an abugida of Indic origin used to write the Tibetan language as well as Dzongkha, the Sikkimese language, Ladakhi, and sometimes Balti. The printed form of the alphabet is called uchen script (Wylie: dbu-can; “with a head”) while the hand-written cursive form used in everyday writing is called umê script (Wylie: dbu-med; “headless”). The alphabet is very closely linked to a broad ethnic Tibetan identity. Besides Tibet, it has also been used for Tibetan languages in Bhutan, India, Nepal, and Pakistan. The Tibetan alphabet is ancestral to the Limbu alphabet, the Lepcha alphabet, and the multilingual 'Phags-pa script. The Tibetan alphabet is romanized in a variety of ways. This article employs the Wylie transliteration system.

**Myanmar + Georgian + Hangul Jamo** => 512 characters UTF-8 from U+01000 hex = 4096 decimal
Nr 9	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
01000	က	ခ	ဂ	ဃ	င	စ	ဆ	ဇ	ဈ	ဉ	ည	ဋ	ဌ	ဍ	ဎ	ဏ	တ	ထ	ဒ	ဓ	န	ပ	ဖ	ဗ	ဘ	မ	ယ	ရ	လ	ဝ	သ	ဟ
01020	ဠ	အ	ဢ	ဣ	ဤ	ဥ	ဦ	ဧ	ဨ	ဩ	ဪ	ါ	ာ	ိ	ီ	ု	ူ	ေ	ဲ	ဳ	ဴ	ဵ	ံ	့	း	္	်	ျ	ြ	ွ	ှ	ဿ
01040	၀	၁	၂	၃	၄	၅	၆	၇	၈	၉	၊	။	၌	၍	၎	၏	ၐ	ၑ	ၒ	ၓ	ၔ	ၕ	ၖ	ၗ	ၘ	ၙ	ၚ	ၛ	ၜ	ၝ	ၞ	ၟ
01060	ၠ	ၡ	ၢ	ၣ	ၤ	ၥ	ၦ	ၧ	ၨ	ၩ	ၪ	ၫ	ၬ	ၭ	ၮ	ၯ	ၰ	ၱ	ၲ	ၳ	ၴ	ၵ	ၶ	ၷ	ၸ	ၹ	ၺ	ၻ	ၼ	ၽ	ၾ	ၿ
01080	ႀ	ႁ	ႂ	ႃ	ႄ	ႅ	ႆ	ႇ	ႈ	ႉ	ႊ	ႋ	ႌ	ႍ	ႎ	ႏ	႐	႑	႒	႓	႔	႕	႖	႗	႘	႙	ႚ	ႛ	ႜ	ႝ	႞	႟
010A0	Ⴀ	Ⴁ	Ⴂ	Ⴃ	Ⴄ	Ⴅ	Ⴆ	Ⴇ	Ⴈ	Ⴉ	Ⴊ	Ⴋ	Ⴌ	Ⴍ	Ⴎ	Ⴏ	Ⴐ	Ⴑ	Ⴒ	Ⴓ	Ⴔ	Ⴕ	Ⴖ	Ⴗ	Ⴘ	Ⴙ	Ⴚ	Ⴛ	Ⴜ	Ⴝ	Ⴞ	Ⴟ
010C0	Ⴠ	Ⴡ	Ⴢ	Ⴣ	Ⴤ	Ⴥ	჆	Ⴧ	჈	჉	჊	჋	჌	Ⴭ	჎	჏	ა	ბ	გ	დ	ე	ვ	ზ	თ	ი	კ	ლ	მ	ნ	ო	პ	ჟ
010E0	რ	ს	ტ	უ	ფ	ქ	ღ	ყ	შ	ჩ	ც	ძ	წ	ჭ	ხ	ჯ	ჰ	ჱ	ჲ	ჳ	ჴ	ჵ	ჶ	ჷ	ჸ	ჹ	ჺ	჻	ჼ	ჽ	ჾ	ჿ
9.2
01100	ᄀ	ᄁ	ᄂ	ᄃ	ᄄ	ᄅ	ᄆ	ᄇ	ᄈ	ᄉ	ᄊ	ᄋ	ᄌ	ᄍ	ᄎ	ᄏ	ᄐ	ᄑ	ᄒ	ᄓ	ᄔ	ᄕ	ᄖ	ᄗ	ᄘ	ᄙ	ᄚ	ᄛ	ᄜ	ᄝ	ᄞ	ᄟ
01120	ᄠ	ᄡ	ᄢ	ᄣ	ᄤ	ᄥ	ᄦ	ᄧ	ᄨ	ᄩ	ᄪ	ᄫ	ᄬ	ᄭ	ᄮ	ᄯ	ᄰ	ᄱ	ᄲ	ᄳ	ᄴ	ᄵ	ᄶ	ᄷ	ᄸ	ᄹ	ᄺ	ᄻ	ᄼ	ᄽ	ᄾ	ᄿ
01140	ᅀ	ᅁ	ᅂ	ᅃ	ᅄ	ᅅ	ᅆ	ᅇ	ᅈ	ᅉ	ᅊ	ᅋ	ᅌ	ᅍ	ᅎ	ᅏ	ᅐ	ᅑ	ᅒ	ᅓ	ᅔ	ᅕ	ᅖ	ᅗ	ᅘ	ᅙ	ᅚ	ᅛ	ᅜ	ᅝ	ᅞ	ᅟ
01160	ᅠ	ᅡ	ᅢ	ᅣ	ᅤ	ᅥ	ᅦ	ᅧ	ᅨ	ᅩ	ᅪ	ᅫ	ᅬ	ᅭ	ᅮ	ᅯ	ᅰ	ᅱ	ᅲ	ᅳ	ᅴ	ᅵ	ᅶ	ᅷ	ᅸ	ᅹ	ᅺ	ᅻ	ᅼ	ᅽ	ᅾ	ᅿ
01180	ᆀ	ᆁ	ᆂ	ᆃ	ᆄ	ᆅ	ᆆ	ᆇ	ᆈ	ᆉ	ᆊ	ᆋ	ᆌ	ᆍ	ᆎ	ᆏ	ᆐ	ᆑ	ᆒ	ᆓ	ᆔ	ᆕ	ᆖ	ᆗ	ᆘ	ᆙ	ᆚ	ᆛ	ᆜ	ᆝ	ᆞ	ᆟ
011A0	ᆠ	ᆡ	ᆢ	ᆣ	ᆤ	ᆥ	ᆦ	ᆧ	ᆨ	ᆩ	ᆪ	ᆫ	ᆬ	ᆭ	ᆮ	ᆯ	ᆰ	ᆱ	ᆲ	ᆳ	ᆴ	ᆵ	ᆶ	ᆷ	ᆸ	ᆹ	ᆺ	ᆻ	ᆼ	ᆽ	ᆾ	ᆿ
011C0	ᇀ	ᇁ	ᇂ	ᇃ	ᇄ	ᇅ	ᇆ	ᇇ	ᇈ	ᇉ	ᇊ	ᇋ	ᇌ	ᇍ	ᇎ	ᇏ	ᇐ	ᇑ	ᇒ	ᇓ	ᇔ	ᇕ	ᇖ	ᇗ	ᇘ	ᇙ	ᇚ	ᇛ	ᇜ	ᇝ	ᇞ	ᇟ
011E0	ᇠ	ᇡ	ᇢ	ᇣ	ᇤ	ᇥ	ᇦ	ᇧ	ᇨ	ᇩ	ᇪ	ᇫ	ᇬ	ᇭ	ᇮ	ᇯ	ᇰ	ᇱ	ᇲ	ᇳ	ᇴ	ᇵ	ᇶ	ᇷ	ᇸ	ᇹ	ᇺ	ᇻ	ᇼ	ᇽ	ᇾ	ᇿ

English documentation

U+01000 Myanmar (Abugida, 160 codes from 1000–109F. Language burmese in ): The Burmese script (MLCTS: mranma akkha.ra) is an abugida in the Brahmic family, used for writing Burmese. It is an adaptation of the Old Mon script or the Pyu script. In recent decades, other alphabets using the Mon script, including Shan and Mon itself, have been restructured according to the standard of the now-dominant Burmese alphabet. Besides the Burmese language, the Burmese alphabet is also used for the liturgical languages of Pali and Sanskrit. The characters are rounded in appearance because the traditional palm leaves used for writing on with a stylus would have been ripped by straight lines. It is written from left to right and requires no spaces between words, although modern writing usually contains spaces after each clause to enhance readability. The earliest evidence of the Burmese alphabet is dated to 1035, while a casting made in the 18th century of an old stone inscription points to 984. Burmese calligraphy originally followed a square format but the cursive format took hold from the 17th century when popular writing led to the wider use of palm leaves and folded paper known as parabaiks. The alphabet has undergone considerable modification to suit the evolving phonology of the Burmese language. There are several systems of transliteration into the Latin alphabet; for this article, the MLC Transcription System is used.

U+010A0 Georgian (Alphabet, 96 codes from 10A0–10FF. Language georgian, abkhazian in ): Georgian is a Unicode block containing the Mkhedruli and Asomtavruli Georgian characters used to write Modern Georgian, Svan, and Mingrelian languages. Another lower case, Nuskhuri, is encoded in a separate Georgian Supplement block, which is used with the Asomtavruli to write the ecclesiastical Khutsuri Georgian script. The Georgian scripts are the three writing systems used to write the Georgian language: Asomtavruli, Nuskhuri and Mkhedruli. Their letters are equivalent, sharing the same names and alphabetical order and all three are unicameral (make no distinction between upper and lower case). Although each continues to be used, Mkhedruli (see below) is taken as the standard for Georgian and its related Kartvelian languages. The scripts originally had 38 letters. Georgian is currently written in a 33-letter alphabet, as five of the letters are obsolete in that language. The Mingrelian alphabet uses 36: the 33 of Georgian, one letter obsolete for that language, and two additional letters specific to Mingrelian and Svan. That same obsolete letter, plus a letter borrowed from Greek, are used in the 35-letter Laz alphabet. The fourth Kartvelian language, Svan, is not commonly written, but when it is it uses the letters of the Mingrelian alphabet, with an additional obsolete Georgian letter and sometimes supplemented by diacritics for its many vowels.

U+01100 Hangul Jamo (Abugida, 256 codes from 1100–11FF. Language korean in ): Hangul Jamo is a Unicode block containing positional (Choseong, Jungseong, and Jongseong) forms of the Hangul consonant and vowel clusters. They can be used to dynamically compose syllables that are not available as precomposed Hangul syllables in Unicode, specifically archaic syllables containing sounds that have since merged phonetically with other sounds in modern pronunciation.
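
The dynamic composition described above can be tried with Unicode normalization. The following is a minimal sketch using Python's standard `unicodedata` module; the sample jamo sequences are chosen for illustration only.

```python
import unicodedata

# Choseong HIEUH + jungseong A + jongseong NIEUN from the Hangul Jamo block.
modern = "\u1112\u1161\u11AB"

# NFC composes a modern jamo sequence into one precomposed syllable.
syllable = unicodedata.normalize("NFC", modern)
print(syllable, f"U+{ord(syllable):04X}")  # 한 U+D55C

# NFD decomposes the syllable back into the same three jamo.
assert unicodedata.normalize("NFD", syllable) == modern

# An archaic initial consonant (U+1140) has no precomposed syllable,
# so the sequence stays as two separate jamo code points after NFC.
archaic = "\u1140\u1161"
archaic_nfc = unicodedata.normalize("NFC", archaic)
print(len(archaic_nfc))  # 2
```

Whether the archaic sequence is drawn as a single syllable block depends on the font and text renderer, not on normalization.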

**Ethiopic + Ethiopic Supplement + Cherokee** => 512 characters UTF-8 from U+01200 hex = 4608 decimal
Nr 10	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
01200	ሀ	ሁ	ሂ	ሃ	ሄ	ህ	ሆ	ሇ	ለ	ሉ	ሊ	ላ	ሌ	ል	ሎ	ሏ	ሐ	ሑ	ሒ	ሓ	ሔ	ሕ	ሖ	ሗ	መ	ሙ	ሚ	ማ	ሜ	ም	ሞ	ሟ
01220	ሠ	ሡ	ሢ	ሣ	ሤ	ሥ	ሦ	ሧ	ረ	ሩ	ሪ	ራ	ሬ	ር	ሮ	ሯ	ሰ	ሱ	ሲ	ሳ	ሴ	ስ	ሶ	ሷ	ሸ	ሹ	ሺ	ሻ	ሼ	ሽ	ሾ	ሿ
01240	ቀ	ቁ	ቂ	ቃ	ቄ	ቅ	ቆ	ቇ	ቈ	቉	ቊ	ቋ	ቌ	ቍ	቎	቏	ቐ	ቑ	ቒ	ቓ	ቔ	ቕ	ቖ	቗	ቘ	቙	ቚ	ቛ	ቜ	ቝ	቞	቟
01260	በ	ቡ	ቢ	ባ	ቤ	ብ	ቦ	ቧ	ቨ	ቩ	ቪ	ቫ	ቬ	ቭ	ቮ	ቯ	ተ	ቱ	ቲ	ታ	ቴ	ት	ቶ	ቷ	ቸ	ቹ	ቺ	ቻ	ቼ	ች	ቾ	ቿ
01280	ኀ	ኁ	ኂ	ኃ	ኄ	ኅ	ኆ	ኇ	ኈ	኉	ኊ	ኋ	ኌ	ኍ	኎	኏	ነ	ኑ	ኒ	ና	ኔ	ን	ኖ	ኗ	ኘ	ኙ	ኚ	ኛ	ኜ	ኝ	ኞ	ኟ
012A0	አ	ኡ	ኢ	ኣ	ኤ	እ	ኦ	ኧ	ከ	ኩ	ኪ	ካ	ኬ	ክ	ኮ	ኯ	ኰ	኱	ኲ	ኳ	ኴ	ኵ	኶	኷	ኸ	ኹ	ኺ	ኻ	ኼ	ኽ	ኾ	኿
012C0	ዀ	዁	ዂ	ዃ	ዄ	ዅ	዆	዇	ወ	ዉ	ዊ	ዋ	ዌ	ው	ዎ	ዏ	ዐ	ዑ	ዒ	ዓ	ዔ	ዕ	ዖ	዗	ዘ	ዙ	ዚ	ዛ	ዜ	ዝ	ዞ	ዟ
012E0	ዠ	ዡ	ዢ	ዣ	ዤ	ዥ	ዦ	ዧ	የ	ዩ	ዪ	ያ	ዬ	ይ	ዮ	ዯ	ደ	ዱ	ዲ	ዳ	ዴ	ድ	ዶ	ዷ	ዸ	ዹ	ዺ	ዻ	ዼ	ዽ	ዾ	ዿ
10.2
01300	ጀ	ጁ	ጂ	ጃ	ጄ	ጅ	ጆ	ጇ	ገ	ጉ	ጊ	ጋ	ጌ	ግ	ጎ	ጏ	ጐ	጑	ጒ	ጓ	ጔ	ጕ	጖	጗	ጘ	ጙ	ጚ	ጛ	ጜ	ጝ	ጞ	ጟ
01320	ጠ	ጡ	ጢ	ጣ	ጤ	ጥ	ጦ	ጧ	ጨ	ጩ	ጪ	ጫ	ጬ	ጭ	ጮ	ጯ	ጰ	ጱ	ጲ	ጳ	ጴ	ጵ	ጶ	ጷ	ጸ	ጹ	ጺ	ጻ	ጼ	ጽ	ጾ	ጿ
01340	ፀ	ፁ	ፂ	ፃ	ፄ	ፅ	ፆ	ፇ	ፈ	ፉ	ፊ	ፋ	ፌ	ፍ	ፎ	ፏ	ፐ	ፑ	ፒ	ፓ	ፔ	ፕ	ፖ	ፗ	ፘ	ፙ	ፚ	፛	፜	፝	፞	፟
01360	፠	፡	።	፣	፤	፥	፦	፧	፨	፩	፪	፫	፬	፭	፮	፯	፰	፱	፲	፳	፴	፵	፶	፷	፸	፹	፺	፻	፼	፽	፾	፿
01380	ᎀ	ᎁ	ᎂ	ᎃ	ᎄ	ᎅ	ᎆ	ᎇ	ᎈ	ᎉ	ᎊ	ᎋ	ᎌ	ᎍ	ᎎ	ᎏ	᎐	᎑	᎒	᎓	᎔	᎕	᎖	᎗	᎘	᎙	᎚	᎛	᎜	᎝	᎞	᎟
013A0	Ꭰ	Ꭱ	Ꭲ	Ꭳ	Ꭴ	Ꭵ	Ꭶ	Ꭷ	Ꭸ	Ꭹ	Ꭺ	Ꭻ	Ꭼ	Ꭽ	Ꭾ	Ꭿ	Ꮀ	Ꮁ	Ꮂ	Ꮃ	Ꮄ	Ꮅ	Ꮆ	Ꮇ	Ꮈ	Ꮉ	Ꮊ	Ꮋ	Ꮌ	Ꮍ	Ꮎ	Ꮏ
013C0	Ꮐ	Ꮑ	Ꮒ	Ꮓ	Ꮔ	Ꮕ	Ꮖ	Ꮗ	Ꮘ	Ꮙ	Ꮚ	Ꮛ	Ꮜ	Ꮝ	Ꮞ	Ꮟ	Ꮠ	Ꮡ	Ꮢ	Ꮣ	Ꮤ	Ꮥ	Ꮦ	Ꮧ	Ꮨ	Ꮩ	Ꮪ	Ꮫ	Ꮬ	Ꮭ	Ꮮ	Ꮯ
013E0	Ꮰ	Ꮱ	Ꮲ	Ꮳ	Ꮴ	Ꮵ	Ꮶ	Ꮷ	Ꮸ	Ꮹ	Ꮺ	Ꮻ	Ꮼ	Ꮽ	Ꮾ	Ꮿ	Ᏸ	Ᏹ	Ᏺ	Ᏻ	Ᏼ	Ᏽ	᏶	᏷	ᏸ	ᏹ	ᏺ	ᏻ	ᏼ	ᏽ	᏾	᏿

English documentation

U+01200 Ethiopic (Abugida, 384 codes from 1200–137F. Language in ): The Ethiopian script (Ge'ez alphabet, ግዕዝ) is an abugida (consonant-syllabic script) originally developed to write the ancient Ethiopian language called Ge'ez in the state of Aksum. In the languages that use the Ethiopian script, it goes by the name of Fidäl (ፊደል), which means 'writing' or 'alphabet'. The Ethiopian script is also used for writing other languages. The most common are Amharic and Tigrinya, from Eritrea and Ethiopia. It is also used for some of the 'Gurage' languages, as well as Meken and many other Ethiopian languages. Eritrea employs it for Tigre and traditionally for the Cushitic language called Bilin. The script can also be found in some other Horn of Africa languages, such as Oromo, although Oromo has since switched to alphabets based on Latin. In 1956 Sheikh Bakri Sapalo, a scholar, poet, and religious teacher, invented a syllabary that resembled the Ethiopian script in structure. Its basic characters served as the basis for writing the Oromo language.

U+01380 Ethiopic Supplement (Abugida, 32 codes from 1380–139F): Ethiopic Supplement is a Unicode block containing extra Ge'ez characters for writing the Sebatbeit language, and Ethiopic tone marks.
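
The code counts quoted in these block headers follow directly from the range endpoints: an inclusive range from the first to the last code point holds last − first + 1 positions. A minimal Python sketch of that arithmetic (the helper name `block_size` is hypothetical and introduced only for illustration):

```python
def block_size(first: int, last: int) -> int:
    """Number of code points in the inclusive range first..last."""
    return last - first + 1

# Ranges as quoted in the block headers above.
ethiopic = block_size(0x1200, 0x137F)
ethiopic_supplement = block_size(0x1380, 0x139F)

print(ethiopic)             # 384
print(ethiopic_supplement)  # 32
```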

U+013A0 Cherokee (Syllabary, 96 codes from 13A0–13FF. Language: Cherokee): The Cherokee script is a syllabic script invented for the Cherokee language in 1819 by Sequoyah (also known as George Gist or George Guess). His creation of the syllabary is particularly noteworthy because he could not read any script. He first experimented with logograms, but his system later developed into a syllabary. Descendants of Sequoyah claim that the script was invented long before Sequoyah was born and that he was merely the last member of a special clan that guarded it, but there is no evidence for this. A year later, in 1820, thousands of Cherokee had learned to read and write in this script, and by 1830 reportedly 90% of the Cherokee were literate in it. The Cherokee script was used for more than a hundred years. It was published in books, religious texts, almanacs and newspapers (in particular, the Cherokee Phoenix newspaper). Today the script still exists and plays a very important role in the life of the Cherokee. For example, you need to speak and write Cherokee to get the status of a full member of the tribe. In addition, the authorities are trying to revive and popularize both the writing and the Cherokee language. The writing system consists of 85 syllabic signs, each of which represents a syllable rather than a single phoneme. Some of them resemble Latin, Greek and even Cyrillic letters but stand for completely different sounds (for example, the sign for /a/ looks like Latin D). Not all phonemic oppositions are marked in writing. For example, /g/ and /k/ are distinguished only in syllables with /a/. The script also has no marks for vowel length or for tonal differences, and there is no accepted way to write consonant clusters.
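
Because several Cherokee signs look like Latin capitals, comparing code points is a more reliable way to tell them apart than comparing shapes. A small sketch using Python's standard `unicodedata` module; the pairing of the sign for /a/ with Latin D follows the example in the text:

```python
import unicodedata

cherokee_a = "\u13a0"  # Ꭰ, the first code point of the Cherokee block
latin_d = "D"

for ch in (cherokee_a, latin_d):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The two characters look alike but are never equal as strings.
print(cherokee_a == latin_d)  # False
```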

**Unified Canadian Aboriginal Syllabics** => 512 characters, UTF-8, from U+01400 hex = 5120 decimal
Nr 11	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
01400	᐀	ᐁ	ᐂ	ᐃ	ᐄ	ᐅ	ᐆ	ᐇ	ᐈ	ᐉ	ᐊ	ᐋ	ᐌ	ᐍ	ᐎ	ᐏ	ᐐ	ᐑ	ᐒ	ᐓ	ᐔ	ᐕ	ᐖ	ᐗ	ᐘ	ᐙ	ᐚ	ᐛ	ᐜ	ᐝ	ᐞ	ᐟ
01420	ᐠ	ᐡ	ᐢ	ᐣ	ᐤ	ᐥ	ᐦ	ᐧ	ᐨ	ᐩ	ᐪ	ᐫ	ᐬ	ᐭ	ᐮ	ᐯ	ᐰ	ᐱ	ᐲ	ᐳ	ᐴ	ᐵ	ᐶ	ᐷ	ᐸ	ᐹ	ᐺ	ᐻ	ᐼ	ᐽ	ᐾ	ᐿ
01440	ᑀ	ᑁ	ᑂ	ᑃ	ᑄ	ᑅ	ᑆ	ᑇ	ᑈ	ᑉ	ᑊ	ᑋ	ᑌ	ᑍ	ᑎ	ᑏ	ᑐ	ᑑ	ᑒ	ᑓ	ᑔ	ᑕ	ᑖ	ᑗ	ᑘ	ᑙ	ᑚ	ᑛ	ᑜ	ᑝ	ᑞ	ᑟ
01460	ᑠ	ᑡ	ᑢ	ᑣ	ᑤ	ᑥ	ᑦ	ᑧ	ᑨ	ᑩ	ᑪ	ᑫ	ᑬ	ᑭ	ᑮ	ᑯ	ᑰ	ᑱ	ᑲ	ᑳ	ᑴ	ᑵ	ᑶ	ᑷ	ᑸ	ᑹ	ᑺ	ᑻ	ᑼ	ᑽ	ᑾ	ᑿ
01480	ᒀ	ᒁ	ᒂ	ᒃ	ᒄ	ᒅ	ᒆ	ᒇ	ᒈ	ᒉ	ᒊ	ᒋ	ᒌ	ᒍ	ᒎ	ᒏ	ᒐ	ᒑ	ᒒ	ᒓ	ᒔ	ᒕ	ᒖ	ᒗ	ᒘ	ᒙ	ᒚ	ᒛ	ᒜ	ᒝ	ᒞ	ᒟ
014A0	ᒠ	ᒡ	ᒢ	ᒣ	ᒤ	ᒥ	ᒦ	ᒧ	ᒨ	ᒩ	ᒪ	ᒫ	ᒬ	ᒭ	ᒮ	ᒯ	ᒰ	ᒱ	ᒲ	ᒳ	ᒴ	ᒵ	ᒶ	ᒷ	ᒸ	ᒹ	ᒺ	ᒻ	ᒼ	ᒽ	ᒾ	ᒿ
014C0	ᓀ	ᓁ	ᓂ	ᓃ	ᓄ	ᓅ	ᓆ	ᓇ	ᓈ	ᓉ	ᓊ	ᓋ	ᓌ	ᓍ	ᓎ	ᓏ	ᓐ	ᓑ	ᓒ	ᓓ	ᓔ	ᓕ	ᓖ	ᓗ	ᓘ	ᓙ	ᓚ	ᓛ	ᓜ	ᓝ	ᓞ	ᓟ
014E0	ᓠ	ᓡ	ᓢ	ᓣ	ᓤ	ᓥ	ᓦ	ᓧ	ᓨ	ᓩ	ᓪ	ᓫ	ᓬ	ᓭ	ᓮ	ᓯ	ᓰ	ᓱ	ᓲ	ᓳ	ᓴ	ᓵ	ᓶ	ᓷ	ᓸ	ᓹ	ᓺ	ᓻ	ᓼ	ᓽ	ᓾ	ᓿ
11.2
01500	ᔀ	ᔁ	ᔂ	ᔃ	ᔄ	ᔅ	ᔆ	ᔇ	ᔈ	ᔉ	ᔊ	ᔋ	ᔌ	ᔍ	ᔎ	ᔏ	ᔐ	ᔑ	ᔒ	ᔓ	ᔔ	ᔕ	ᔖ	ᔗ	ᔘ	ᔙ	ᔚ	ᔛ	ᔜ	ᔝ	ᔞ	ᔟ
01520	ᔠ	ᔡ	ᔢ	ᔣ	ᔤ	ᔥ	ᔦ	ᔧ	ᔨ	ᔩ	ᔪ	ᔫ	ᔬ	ᔭ	ᔮ	ᔯ	ᔰ	ᔱ	ᔲ	ᔳ	ᔴ	ᔵ	ᔶ	ᔷ	ᔸ	ᔹ	ᔺ	ᔻ	ᔼ	ᔽ	ᔾ	ᔿ
01540	ᕀ	ᕁ	ᕂ	ᕃ	ᕄ	ᕅ	ᕆ	ᕇ	ᕈ	ᕉ	ᕊ	ᕋ	ᕌ	ᕍ	ᕎ	ᕏ	ᕐ	ᕑ	ᕒ	ᕓ	ᕔ	ᕕ	ᕖ	ᕗ	ᕘ	ᕙ	ᕚ	ᕛ	ᕜ	ᕝ	ᕞ	ᕟ
01560	ᕠ	ᕡ	ᕢ	ᕣ	ᕤ	ᕥ	ᕦ	ᕧ	ᕨ	ᕩ	ᕪ	ᕫ	ᕬ	ᕭ	ᕮ	ᕯ	ᕰ	ᕱ	ᕲ	ᕳ	ᕴ	ᕵ	ᕶ	ᕷ	ᕸ	ᕹ	ᕺ	ᕻ	ᕼ	ᕽ	ᕾ	ᕿ
01580	ᖀ	ᖁ	ᖂ	ᖃ	ᖄ	ᖅ	ᖆ	ᖇ	ᖈ	ᖉ	ᖊ	ᖋ	ᖌ	ᖍ	ᖎ	ᖏ	ᖐ	ᖑ	ᖒ	ᖓ	ᖔ	ᖕ	ᖖ	ᖗ	ᖘ	ᖙ	ᖚ	ᖛ	ᖜ	ᖝ	ᖞ	ᖟ
015A0	ᖠ	ᖡ	ᖢ	ᖣ	ᖤ	ᖥ	ᖦ	ᖧ	ᖨ	ᖩ	ᖪ	ᖫ	ᖬ	ᖭ	ᖮ	ᖯ	ᖰ	ᖱ	ᖲ	ᖳ	ᖴ	ᖵ	ᖶ	ᖷ	ᖸ	ᖹ	ᖺ	ᖻ	ᖼ	ᖽ	ᖾ	ᖿ
015C0	ᗀ	ᗁ	ᗂ	ᗃ	ᗄ	ᗅ	ᗆ	ᗇ	ᗈ	ᗉ	ᗊ	ᗋ	ᗌ	ᗍ	ᗎ	ᗏ	ᗐ	ᗑ	ᗒ	ᗓ	ᗔ	ᗕ	ᗖ	ᗗ	ᗘ	ᗙ	ᗚ	ᗛ	ᗜ	ᗝ	ᗞ	ᗟ
015E0	ᗠ	ᗡ	ᗢ	ᗣ	ᗤ	ᗥ	ᗦ	ᗧ	ᗨ	ᗩ	ᗪ	ᗫ	ᗬ	ᗭ	ᗮ	ᗯ	ᗰ	ᗱ	ᗲ	ᗳ	ᗴ	ᗵ	ᗶ	ᗷ	ᗸ	ᗹ	ᗺ	ᗻ	ᗼ	ᗽ	ᗾ	ᗿ

English documentation

U+01400 Unified Canadian Aboriginal Syllabics (Abugida, 640 codes from 1400–167F. Language: Cree): Unified Canadian Aboriginal Syllabics is a Unicode block containing characters for writing Inuktitut, Carrier, several dialects of Cree, and Canadian Athabaskan languages. Additions for some dialects of Cree and for Ojibwe are in the Unified Canadian Aboriginal Syllabics Extended block. Canadian Aboriginal syllabic writing, or simply syllabics, is a family of abugidas (consonant-based scripts) used to write a number of Aboriginal Canadian languages of the Algonquian, Inuit, and (formerly) Athabaskan language families. They are valued for their distinctiveness from the Latin script of the dominant languages and for the ease with which literacy can be achieved; by the late 19th century the Cree had achieved one of the highest rates of literacy in the world. Canadian syllabics are currently used to write all of the Cree languages from Naskapi (spoken in Quebec) to the Rocky Mountains, including Eastern Cree, Woods Cree, Swampy Cree and Plains Cree. They are also used to write Inuktitut in the eastern Canadian Arctic, and they are co-official with the Latin script in the territory of Nunavut. The script is used regionally for the other large Canadian Algonquian language, Ojibwe, in Western Canada, as well as for Blackfoot, where it is considered obsolete. Among the Athabaskan languages further to the west, syllabics have been used to write Dakelh (Carrier), Chipewyan, Slavey, Tłı̨chǫ (Dogrib) and Dane-zaa (Beaver). In the United States this kind of writing occurs in communities that straddle the border, but it is mostly a Canadian phenomenon.
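
The table above starts at U+1400 (5120 decimal) and shows 512 code points as 16 rows of 32 columns. The sketch below rebuilds such rows in Python; the function name `table_rows` and its parameters are assumptions made for illustration and are not part of the original page:

```python
def table_rows(start: int, count: int = 512, width: int = 32):
    """Yield (row_label, characters) pairs for a code chart.

    start: first code point of the table, e.g. 0x1400
    count: number of code points in the table
    width: number of columns per row
    """
    for row_start in range(start, start + count, width):
        label = f"{row_start:05X}"  # five hex digits, as in the row labels
        chars = [chr(cp) for cp in range(row_start, row_start + width)]
        yield label, chars

rows = list(table_rows(0x1400))
print(int("1400", 16))           # 5120
print(len(rows))                 # 16
print(rows[0][0], rows[-1][0])   # 01400 015E0
print("\t".join(rows[0][1][:4]))  # first four cells of the first row
```

The same call with `0x1600`, `0x1800` and so on would produce the row labels of the following tables.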

**Unified Canadian Aboriginal Syllabics + Ogham + Runic + Tagalog + Hanunoo + Buhid + Tagbanwa + Khmer** => 512 characters, UTF-8, from U+01600 hex = 5632 decimal
Nr 12	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
01600	ᘀ	ᘁ	ᘂ	ᘃ	ᘄ	ᘅ	ᘆ	ᘇ	ᘈ	ᘉ	ᘊ	ᘋ	ᘌ	ᘍ	ᘎ	ᘏ	ᘐ	ᘑ	ᘒ	ᘓ	ᘔ	ᘕ	ᘖ	ᘗ	ᘘ	ᘙ	ᘚ	ᘛ	ᘜ	ᘝ	ᘞ	ᘟ
01620	ᘠ	ᘡ	ᘢ	ᘣ	ᘤ	ᘥ	ᘦ	ᘧ	ᘨ	ᘩ	ᘪ	ᘫ	ᘬ	ᘭ	ᘮ	ᘯ	ᘰ	ᘱ	ᘲ	ᘳ	ᘴ	ᘵ	ᘶ	ᘷ	ᘸ	ᘹ	ᘺ	ᘻ	ᘼ	ᘽ	ᘾ	ᘿ
01640	ᙀ	ᙁ	ᙂ	ᙃ	ᙄ	ᙅ	ᙆ	ᙇ	ᙈ	ᙉ	ᙊ	ᙋ	ᙌ	ᙍ	ᙎ	ᙏ	ᙐ	ᙑ	ᙒ	ᙓ	ᙔ	ᙕ	ᙖ	ᙗ	ᙘ	ᙙ	ᙚ	ᙛ	ᙜ	ᙝ	ᙞ	ᙟ
01660	ᙠ	ᙡ	ᙢ	ᙣ	ᙤ	ᙥ	ᙦ	ᙧ	ᙨ	ᙩ	ᙪ	ᙫ	ᙬ	᙭	᙮	ᙯ	ᙰ	ᙱ	ᙲ	ᙳ	ᙴ	ᙵ	ᙶ	ᙷ	ᙸ	ᙹ	ᙺ	ᙻ	ᙼ	ᙽ	ᙾ	ᙿ
01680		ᚁ	ᚂ	ᚃ	ᚄ	ᚅ	ᚆ	ᚇ	ᚈ	ᚉ	ᚊ	ᚋ	ᚌ	ᚍ	ᚎ	ᚏ	ᚐ	ᚑ	ᚒ	ᚓ	ᚔ	ᚕ	ᚖ	ᚗ	ᚘ	ᚙ	ᚚ	᚛	᚜	᚝	᚞	᚟
016A0	ᚠ	ᚡ	ᚢ	ᚣ	ᚤ	ᚥ	ᚦ	ᚧ	ᚨ	ᚩ	ᚪ	ᚫ	ᚬ	ᚭ	ᚮ	ᚯ	ᚰ	ᚱ	ᚲ	ᚳ	ᚴ	ᚵ	ᚶ	ᚷ	ᚸ	ᚹ	ᚺ	ᚻ	ᚼ	ᚽ	ᚾ	ᚿ
016C0	ᛀ	ᛁ	ᛂ	ᛃ	ᛄ	ᛅ	ᛆ	ᛇ	ᛈ	ᛉ	ᛊ	ᛋ	ᛌ	ᛍ	ᛎ	ᛏ	ᛐ	ᛑ	ᛒ	ᛓ	ᛔ	ᛕ	ᛖ	ᛗ	ᛘ	ᛙ	ᛚ	ᛛ	ᛜ	ᛝ	ᛞ	ᛟ
016E0	ᛠ	ᛡ	ᛢ	ᛣ	ᛤ	ᛥ	ᛦ	ᛧ	ᛨ	ᛩ	ᛪ	᛫	᛬	᛭	ᛮ	ᛯ	ᛰ	ᛱ	ᛲ	ᛳ	ᛴ	ᛵ	ᛶ	ᛷ	ᛸ	᛹	᛺	᛻	᛼	᛽	᛾	᛿
12.2
01700	ᜀ	ᜁ	ᜂ	ᜃ	ᜄ	ᜅ	ᜆ	ᜇ	ᜈ	ᜉ	ᜊ	ᜋ	ᜌ	ᜍ	ᜎ	ᜏ	ᜐ	ᜑ	ᜒ	ᜓ	᜔	᜕	᜖	᜗	᜘	᜙	᜚	᜛	᜜	᜝	᜞	ᜟ
01720	ᜠ	ᜡ	ᜢ	ᜣ	ᜤ	ᜥ	ᜦ	ᜧ	ᜨ	ᜩ	ᜪ	ᜫ	ᜬ	ᜭ	ᜮ	ᜯ	ᜰ	ᜱ	ᜲ	ᜳ	᜴	᜵	᜶	᜷	᜸	᜹	᜺	᜻	᜼	᜽	᜾	᜿
01740	ᝀ	ᝁ	ᝂ	ᝃ	ᝄ	ᝅ	ᝆ	ᝇ	ᝈ	ᝉ	ᝊ	ᝋ	ᝌ	ᝍ	ᝎ	ᝏ	ᝐ	ᝑ	ᝒ	ᝓ	᝔	᝕	᝖	᝗	᝘	᝙	᝚	᝛	᝜	᝝	᝞	᝟
01760	ᝠ	ᝡ	ᝢ	ᝣ	ᝤ	ᝥ	ᝦ	ᝧ	ᝨ	ᝩ	ᝪ	ᝫ	ᝬ	᝭	ᝮ	ᝯ	ᝰ	᝱	ᝲ	ᝳ	᝴	᝵	᝶	᝷	᝸	᝹	᝺	᝻	᝼	᝽	᝾	᝿
01780	ក	ខ	គ	ឃ	ង	ច	ឆ	ជ	ឈ	ញ	ដ	ឋ	ឌ	ឍ	ណ	ត	ថ	ទ	ធ	ន	ប	ផ	ព	ភ	ម	យ	រ	ល	វ	ឝ	ឞ	ស
017A0	ហ	ឡ	អ	ឣ	ឤ	ឥ	ឦ	ឧ	ឨ	ឩ	ឪ	ឫ	ឬ	ឭ	ឮ	ឯ	ឰ	ឱ	ឲ	ឳ	឴	឵	ា	ិ	ី	ឹ	ឺ	ុ	ូ	ួ	ើ	ឿ
017C0	ៀ	េ	ែ	ៃ	ោ	ៅ	ំ	ះ	ៈ	៉	៊	់	៌	៍	៎	៏	័	៑	្	៓	។	៕	៖	ៗ	៘	៙	៚	៛	ៜ	៝	៞	៟
017E0	០	១	២	៣	៤	៥	៦	៧	៨	៩	៪	៫	៬	៭	៮	៯	៰	៱	៲	៳	៴	៵	៶	៷	៸	៹	៺	៻	៼	៽	៾	៿

English documentation

U+01600 Unified Canadian Aboriginal Syllabics (Abugida, 640 codes from 1400–167F. Language: Cree): continuation of the block that begins at U+01400; see the description under table 11.

U+01680 Ogham (Alphabet, 32 codes from 1680–169F. Languages: Primitive Irish, Pictish): Ogham is a Unicode block containing characters for representing Old Irish inscriptions. Ogham /ˈɒɡəm/ (Modern Irish [ˈoːmˠ] or [ˈoːəmˠ]; Old Irish: ogam [ˈɔɣamˠ]) is an Early Medieval alphabet used primarily to write the early Irish language (in the so-called “orthodox” inscriptions, 4th to 6th centuries), and later the Old Irish language (so-called scholastic ogham, 6th to 9th centuries). There are roughly 400 surviving orthodox inscriptions on stone monuments throughout Ireland and western Britain; the bulk of them are in the south of Ireland, in Counties Kerry, Cork and Waterford. The largest number outside of Ireland is in Pembrokeshire in Wales. The vast majority of the inscriptions consist of personal names. According to the High Medieval Bríatharogam, names of various trees can be ascribed to individual letters. The etymology of the word ogam or ogham remains unclear. One possible origin is from the Irish og-úaim 'point-seam', referring to the seam made by the point of a sharp weapon.

U+016A0 Runic (Alphabet, 96 codes from 16A0–16FF): Runic is a Unicode block containing characters for writing Futhark runic inscriptions. Although many of the characters appear similar, they should not be confused with the J.R.R. Tolkien-designed Cirth, which has a separate ConScript Unicode Registry encoding. Unicode 7.0 added further Runic characters, including three that were used only by Tolkien, for example on the maps in The Hobbit; these are distinct from Cirth. Runes (Proto-Norse: ᚱᚢᚾᛟ (runo), Old Norse: rún) are the letters in a set of related alphabets known as runic alphabets, which were used to write various Germanic languages before the adoption of the Latin alphabet and for specialised purposes thereafter. The Scandinavian variants are also known as futhark or fuþark (derived from the first six letters of the alphabet: F, U, Þ, A, R, and K); the Anglo-Saxon variant is futhorc or fuþorc (due to sound changes undergone in Old English by the names of those six letters). Runology is the study of the runic alphabets, runic inscriptions, runestones, and their history; it forms a specialised branch of Germanic linguistics. The earliest runic inscriptions date from around 150 AD. The characters were generally replaced by the Latin alphabet as the cultures that had used runes underwent Christianisation, by approximately 700 AD in central Europe and 1100 AD in northern Europe. However, the use of runes persisted for specialised purposes in northern Europe. Until the early 20th century, runes were used in rural Sweden for decorative purposes in Dalarna and on Runic calendars. The three best-known runic alphabets are the Elder Futhark (around 150–800 AD), the Anglo-Saxon Futhorc (400–1100 AD), and the Younger Futhark (800–1100 AD). 
The Younger Futhark is divided further into the long-branch runes (also called Danish, although they were also used in Norway and Sweden); short-branch or Rök runes (also called Swedish-Norwegian, although they were also used in Denmark); and the stavlösa or Hälsinge runes (staveless runes). The Younger Futhark developed further into the Marcomannic runes, the Medieval runes (1100–1500 AD), and the Dalecarlian runes (around 1500–1800 AD). Historically, the runic alphabet is a derivation of the Old Italic alphabets of antiquity, with the addition of some innovations. Which variant of the Old Italic family in particular gave rise to the runes is uncertain. Suggestions include Raetic, Etruscan, or Old Latin as candidates. At the time, all of these scripts had the same angular letter shapes suited for epigraphy, which would become characteristic of the runes. The process of transmission of the script is unknown. The oldest inscriptions are found in Denmark and northern Germany, not near Italy. A “West Germanic hypothesis” suggests transmission via Elbe Germanic groups, while a “Gothic hypothesis” presumes transmission via East Germanic expansion.
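
Every character in this table lies between U+0800 and U+FFFF, so each one takes three bytes in UTF-8. The sketch below checks this for the Proto-Norse word ᚱᚢᚾᛟ quoted in the Runic description; the code points are written as escapes so the example does not depend on font support:

```python
word = "\u16b1\u16a2\u16be\u16df"  # ᚱᚢᚾᛟ (runo)

for ch in word:
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"U+{ord(ch):04X} -> {hex_bytes} ({len(encoded)} bytes)")

# Four characters, twelve bytes.
print(len(word), len(word.encode("utf-8")))  # 4 12
```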

U+01700 Tagalog (Abugida, 32 codes from 1700–171F. Language: Tagalog): Tagalog is a Unicode block containing characters of the pre-Spanish Philippine Baybayin script used for writing the Tagalog language. Tagalog /təˈɡɑːlɒɡ/ is an Austronesian language spoken as a first language by a quarter of the population of the Philippines and as a second language by the majority. It is the first language of Philippine Region IV (CALABARZON and MIMAROPA), of Bulacan and of Metro Manila. Its standardized form, officially named Filipino, is the national language and one of two official languages of the Philippines, the other being English. It is related to other Philippine languages such as the Bikol languages, Ilokano, the Visayan languages, and Kapampangan, and more distantly to other Austronesian languages such as Indonesian, Hawaiian and Malagasy.

U+01720 Hanunoo (Abugida, 32 codes from 1720–173F. Language: Hanunó'o): Hanunoo is a Unicode block containing characters used for writing the Hanunó'o language. Hanunó'o is one of the indigenous scripts of the Philippines and is used by the Mangyan peoples of southern Mindoro to write the Hanunó'o language. It is an abugida descended from the Brahmic scripts, closely related to Baybayin, and is known for being written vertically from bottom to top, rather than downward like nearly all other vertical scripts (it is, however, read horizontally from left to right). It is usually written on bamboo by incising characters with a knife. Most known Hanunó'o inscriptions are relatively recent because bamboo is perishable, which makes the history of the script difficult to trace.

U+01740 Buhid (Abugida, 32 codes from 1740–175F. Language: Buhid): Buhid is a Unicode block containing characters for writing the Buhid language of the Philippines. Buhid is a Brahmic script of the Philippines, closely related to Baybayin, and is used today by the Mangyans to write their language, Buhid.

U+01760 Tagbanwa (Abugida, 32 codes from 1760–177F): Tagbanwa is a Unicode block containing characters for writing the Tagbanwa languages. Tagbanwa, also known as Apurahuano, is one of the writing systems of the Philippines. The Tagbanwa languages (Aborlan, Calamian, and Central), which are Austronesian languages with about 8,000 speakers in the central and northern regions of Palawan, are dying out as the younger generations of Tagbanwa are learning Cuyonon and Tagalog.

U+01780 Khmer (Abugida, 128 codes from 1780–17FF. Language: Khmer): Khmer is a Unicode block containing characters for writing the Khmer, or Cambodian, language. The Khmer alphabet or Khmer script (IPA: [ʔaʔsɑː kʰmaːe]) is an abugida, which means that it is a consonant-driven script. It is used to write the Khmer language (the official language of Cambodia) and also Pali in the Buddhist liturgy of Cambodia and Thailand. The Khmer script was adapted from the Pallava script, a variant of the Grantha alphabet descended from the Brahmi script, which was used in southern India and South East Asia during the 5th and 6th centuries AD. I know, this chain seems complicated, but doesn't all linguistics? Anyway, the oldest Khmer inscription was found at Angkor Borei District in Takéo Province, south of Phnom Penh, and dates back to 611. The modern Khmer script differs a lot from the earlier forms seen on the inscriptions of the Angkor ruins. The Thai (0E00–0E7F) and Lao (0E80–0EFF) scripts descend from an older form of the Khmer script. Khmer is written from left to right. Words within one sentence or phrase are usually run together with no spaces between them. Consonant clusters within a word are “stacked”, with the second (and occasionally third) consonant being written in reduced form under the main consonant. Originally there were 35 consonant characters, but modern Khmer uses only 33. Each character in fact represents a consonant sound together with an inherent vowel, either â or ô. You might remember that Khmer is an abugida. That's why vowel sounds are more commonly represented as dependent vowels: additional marks that accompany a consonant character and indicate what vowel sound is to be pronounced after that consonant (or consonant cluster). Most dependent vowels have two different pronunciations, depending in most cases on the inherent vowel of the consonant to which they are added. 
In some positions, a consonant written with no dependent vowel is taken to be followed by the sound of its inherent vowel. Needless to say, there are also a number of diacritics used to indicate further modifications in pronunciation. The script also includes its own numerals and punctuation marks.
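
In Unicode text, the stacking described above is not stored as separate subscript glyphs. A subscript consonant is encoded as the sign COENG (U+17D2) followed by the ordinary consonant, and the font draws the reduced form. The sketch below lists the code points of the word ខ្មែរ ("Khmer") with Python's `unicodedata`; the example word is an illustration added here and is not taken from the original text:

```python
import unicodedata

khmer = "\u1781\u17d2\u1798\u17c2\u179a"  # ខ្មែរ

for ch in khmer:
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")

# One COENG means one stacked (subscript) consonant in this word.
stacked = khmer.count("\u17d2")
print(stacked)  # 1
```

The word looks like three or four shapes on screen but is stored as five code points, which matters when counting characters or truncating Khmer strings.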

**Mongolian + Unified Canadian Aboriginal Syllabics Extended + Limbu + Tai Le + New Tai Lue + Khmer Symbols** => 512 characters, UTF-8, from U+01800 hex = 6144 decimal
Nr 13	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
01800	᠀	᠁	᠂	᠃	᠄	᠅	᠆	᠇	᠈	᠉	᠊	᠋	᠌	᠍	᠎	᠏	᠐	᠑	᠒	᠓	᠔	᠕	᠖	᠗	᠘	᠙	᠚	᠛	᠜	᠝	᠞	᠟
01820	ᠠ	ᠡ	ᠢ	ᠣ	ᠤ	ᠥ	ᠦ	ᠧ	ᠨ	ᠩ	ᠪ	ᠫ	ᠬ	ᠭ	ᠮ	ᠯ	ᠰ	ᠱ	ᠲ	ᠳ	ᠴ	ᠵ	ᠶ	ᠷ	ᠸ	ᠹ	ᠺ	ᠻ	ᠼ	ᠽ	ᠾ	ᠿ
01840	ᡀ	ᡁ	ᡂ	ᡃ	ᡄ	ᡅ	ᡆ	ᡇ	ᡈ	ᡉ	ᡊ	ᡋ	ᡌ	ᡍ	ᡎ	ᡏ	ᡐ	ᡑ	ᡒ	ᡓ	ᡔ	ᡕ	ᡖ	ᡗ	ᡘ	ᡙ	ᡚ	ᡛ	ᡜ	ᡝ	ᡞ	ᡟ
01860	ᡠ	ᡡ	ᡢ	ᡣ	ᡤ	ᡥ	ᡦ	ᡧ	ᡨ	ᡩ	ᡪ	ᡫ	ᡬ	ᡭ	ᡮ	ᡯ	ᡰ	ᡱ	ᡲ	ᡳ	ᡴ	ᡵ	ᡶ	ᡷ	ᡸ	᡹	᡺	᡻	᡼	᡽	᡾	᡿
01880	ᢀ	ᢁ	ᢂ	ᢃ	ᢄ	ᢅ	ᢆ	ᢇ	ᢈ	ᢉ	ᢊ	ᢋ	ᢌ	ᢍ	ᢎ	ᢏ	ᢐ	ᢑ	ᢒ	ᢓ	ᢔ	ᢕ	ᢖ	ᢗ	ᢘ	ᢙ	ᢚ	ᢛ	ᢜ	ᢝ	ᢞ	ᢟ
018A0	ᢠ	ᢡ	ᢢ	ᢣ	ᢤ	ᢥ	ᢦ	ᢧ	ᢨ	ᢩ	ᢪ	᢫	᢬	᢭	᢮	᢯	ᢰ	ᢱ	ᢲ	ᢳ	ᢴ	ᢵ	ᢶ	ᢷ	ᢸ	ᢹ	ᢺ	ᢻ	ᢼ	ᢽ	ᢾ	ᢿ
018C0	ᣀ	ᣁ	ᣂ	ᣃ	ᣄ	ᣅ	ᣆ	ᣇ	ᣈ	ᣉ	ᣊ	ᣋ	ᣌ	ᣍ	ᣎ	ᣏ	ᣐ	ᣑ	ᣒ	ᣓ	ᣔ	ᣕ	ᣖ	ᣗ	ᣘ	ᣙ	ᣚ	ᣛ	ᣜ	ᣝ	ᣞ	ᣟ
018E0	ᣠ	ᣡ	ᣢ	ᣣ	ᣤ	ᣥ	ᣦ	ᣧ	ᣨ	ᣩ	ᣪ	ᣫ	ᣬ	ᣭ	ᣮ	ᣯ	ᣰ	ᣱ	ᣲ	ᣳ	ᣴ	ᣵ	᣶	᣷	᣸	᣹	᣺	᣻	᣼	᣽	᣾	᣿
13.2
01900	ᤀ	ᤁ	ᤂ	ᤃ	ᤄ	ᤅ	ᤆ	ᤇ	ᤈ	ᤉ	ᤊ	ᤋ	ᤌ	ᤍ	ᤎ	ᤏ	ᤐ	ᤑ	ᤒ	ᤓ	ᤔ	ᤕ	ᤖ	ᤗ	ᤘ	ᤙ	ᤚ	ᤛ	ᤜ	ᤝ	ᤞ	᤟
01920	ᤠ	ᤡ	ᤢ	ᤣ	ᤤ	ᤥ	ᤦ	ᤧ	ᤨ	ᤩ	ᤪ	ᤫ	᤬	᤭	᤮	᤯	ᤰ	ᤱ	ᤲ	ᤳ	ᤴ	ᤵ	ᤶ	ᤷ	ᤸ	᤹	᤺	᤻	᤼	᤽	᤾	᤿
01940	᥀	᥁	᥂	᥃	᥄	᥅	᥆	᥇	᥈	᥉	᥊	᥋	᥌	᥍	᥎	᥏	ᥐ	ᥑ	ᥒ	ᥓ	ᥔ	ᥕ	ᥖ	ᥗ	ᥘ	ᥙ	ᥚ	ᥛ	ᥜ	ᥝ	ᥞ	ᥟ
01960	ᥠ	ᥡ	ᥢ	ᥣ	ᥤ	ᥥ	ᥦ	ᥧ	ᥨ	ᥩ	ᥪ	ᥫ	ᥬ	ᥭ	᥮	᥯	ᥰ	ᥱ	ᥲ	ᥳ	ᥴ	᥵	᥶	᥷	᥸	᥹	᥺	᥻	᥼	᥽	᥾	᥿
01980	ᦀ	ᦁ	ᦂ	ᦃ	ᦄ	ᦅ	ᦆ	ᦇ	ᦈ	ᦉ	ᦊ	ᦋ	ᦌ	ᦍ	ᦎ	ᦏ	ᦐ	ᦑ	ᦒ	ᦓ	ᦔ	ᦕ	ᦖ	ᦗ	ᦘ	ᦙ	ᦚ	ᦛ	ᦜ	ᦝ	ᦞ	ᦟ
019A0	ᦠ	ᦡ	ᦢ	ᦣ	ᦤ	ᦥ	ᦦ	ᦧ	ᦨ	ᦩ	ᦪ	ᦫ	᦬	᦭	᦮	᦯	ᦰ	ᦱ	ᦲ	ᦳ	ᦴ	ᦵ	ᦶ	ᦷ	ᦸ	ᦹ	ᦺ	ᦻ	ᦼ	ᦽ	ᦾ	ᦿ
019C0	ᧀ	ᧁ	ᧂ	ᧃ	ᧄ	ᧅ	ᧆ	ᧇ	ᧈ	ᧉ	᧊	᧋	᧌	᧍	᧎	᧏	᧐	᧑	᧒	᧓	᧔	᧕	᧖	᧗	᧘	᧙	᧚	᧛	᧜	᧝	᧞	᧟
019E0	᧠	᧡	᧢	᧣	᧤	᧥	᧦	᧧	᧨	᧩	᧪	᧫	᧬	᧭	᧮	᧯	᧰	᧱	᧲	᧳	᧴	᧵	᧶	᧷	᧸	᧹	᧺	᧻	᧼	᧽	᧾	᧿

English documentation

U+01800 Mongolian (Alphabet, 176 codes from 1800–18AF. Language: Mongolian): Mongolian is a Unicode block containing characters for dialects of Mongolian, Manchu, and Sibe. The script is traditionally written in vertical lines from top to bottom, with the lines running from left to right across the page, although the Unicode code charts show the characters rotated to horizontal orientation. Many alphabets have been devised for the Mongolian language over the centuries, based on a variety of scripts. The oldest, called simply the Mongolian script, has been the predominant script during most of Mongolian history and is still in active use today in the Inner Mongolia region of China. It has spawned several alphabets, either as attempts to fix its perceived shortcomings or to allow the notation of other languages, such as Sanskrit and Tibetan (0F00–0FFF). In the 20th century, Mongolia first switched to the Latin script and then almost immediately replaced it with the Cyrillic script for compatibility with the Soviet Union, its political ally of the time. Mongols in Inner Mongolia and other parts of China, on the other hand, continue to use alphabets based on the traditional Mongolian script.

U+018B0 Unified Canadian Aboriginal Syllabics Extended (Abugida, 80 codes from 18B0–18FF. Language: Cree): Unified Canadian Aboriginal Syllabics Extended is a Unicode block containing extensions to the Canadian syllabics contained in the Unified Canadian Aboriginal Syllabics Unicode block for some dialects of Cree, Ojibwe, Dene, and Carrier.

U+01900 Limbu (Abugida, 80 codes from 1900–194F. Language: Limbu): The Limbu script is used to write the Limbu language. It is an abugida derived from the Tibetan script.

U+01950 Tai Le (Abugida, 48 codes from 1950–197F): Tai Le is the name of the Tai Nüa script, the script used for the Tai Nüa language. Tai Nüa (ᥖᥭᥰᥖᥬᥳᥑᥨᥒᥰ; also called Tai Nɯa, Dehong Dai, or Chinese Shan; own name: Tai2 Lə6, which means “upper Tai” or “northern Tai”; Chinese: Dǎinǎyǔ 傣哪语 or Déhóng Dǎiyǔ 德宏傣语; Thai: ภาษาไทเหนือ or ภาษาไทใต้คง) is one of the languages spoken by the Dai people in China, especially in the Dehong Dai and Jingpo Autonomous Prefecture in the southwest of Yunnan province. It is closely related to the other Tai languages. Speakers of this language across the border in Myanmar are known as Shan. It should not be confused with Tai Lü (Xishuangbanna Dai). There are also Tai Nüa speakers in Thailand.

U+01980 New Tai Lue (Alphabet, 96 codes from 1980–19DF): New Tai Lue is a Unicode block containing characters for writing the Tai Lü language. New Tai Lue script, also known as Simplified Tai Lue, is an alphabet used to write the Tai Lü language. Developed in China in the 1950s, New Tai Lue is based on the traditional Tai Tham alphabet developed ca. 1200 AD. The government of China promoted the alphabet as a replacement for the older script; teaching the script was not mandatory, however, and as a result many are illiterate in New Tai Lue. In addition, communities in Burma, Laos, Thailand and Vietnam still use the Tai Tham alphabet.

U+019E0 Khmer Symbols (32 codes from 19E0–19FF): Khmer Symbols is a Unicode block containing lunar date symbols used in the writing system of the Khmer (Cambodian) language.
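
Table 13 mixes six blocks in one 512-code chart, so finding the block for a given code point means searching the ranges listed above. A minimal sketch with a sorted list and `bisect` from the standard library; the names `BLOCKS`, `STARTS` and `block_of` are hypothetical, and the ranges are copied from the block headers above:

```python
from bisect import bisect_right

# (first, last, name), sorted by first code point.
BLOCKS = [
    (0x1800, 0x18AF, "Mongolian"),
    (0x18B0, 0x18FF, "Unified Canadian Aboriginal Syllabics Extended"),
    (0x1900, 0x194F, "Limbu"),
    (0x1950, 0x197F, "Tai Le"),
    (0x1980, 0x19DF, "New Tai Lue"),
    (0x19E0, 0x19FF, "Khmer Symbols"),
]
STARTS = [first for first, _, _ in BLOCKS]

def block_of(ch: str):
    """Return the block name for a single character, or None if not covered."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and BLOCKS[i][0] <= cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None

print(block_of("\u1820"))  # Mongolian
print(block_of("\u1950"))  # Tai Le
print(block_of("A"))       # None
```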

**Buginese + Tai Tham + Combining Diacritical Marks Extended + Balinese + Sundanese + Batak** => 512 characters, UTF-8, from U+01A00 hex = 6656 decimal
Nr 14	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
01A00	ᨀ	ᨁ	ᨂ	ᨃ	ᨄ	ᨅ	ᨆ	ᨇ	ᨈ	ᨉ	ᨊ	ᨋ	ᨌ	ᨍ	ᨎ	ᨏ	ᨐ	ᨑ	ᨒ	ᨓ	ᨔ	ᨕ	ᨖ	ᨗ	ᨘ	ᨙ	ᨚ	ᨛ	᨜	᨝	᨞	᨟
01A20	ᨠ	ᨡ	ᨢ	ᨣ	ᨤ	ᨥ	ᨦ	ᨧ	ᨨ	ᨩ	ᨪ	ᨫ	ᨬ	ᨭ	ᨮ	ᨯ	ᨰ	ᨱ	ᨲ	ᨳ	ᨴ	ᨵ	ᨶ	ᨷ	ᨸ	ᨹ	ᨺ	ᨻ	ᨼ	ᨽ	ᨾ	ᨿ
01A40	ᩀ	ᩁ	ᩂ	ᩃ	ᩄ	ᩅ	ᩆ	ᩇ	ᩈ	ᩉ	ᩊ	ᩋ	ᩌ	ᩍ	ᩎ	ᩏ	ᩐ	ᩑ	ᩒ	ᩓ	ᩔ	ᩕ	ᩖ	ᩗ	ᩘ	ᩙ	ᩚ	ᩛ	ᩜ	ᩝ	ᩞ	᩟
01A60	᩠	ᩡ	ᩢ	ᩣ	ᩤ	ᩥ	ᩦ	ᩧ	ᩨ	ᩩ	ᩪ	ᩫ	ᩬ	ᩭ	ᩮ	ᩯ	ᩰ	ᩱ	ᩲ	ᩳ	ᩴ	᩵	᩶	᩷	᩸	᩹	᩺	᩻	᩼	᩽	᩾	᩿
01A80	᪀	᪁	᪂	᪃	᪄	᪅	᪆	᪇	᪈	᪉	᪊	᪋	᪌	᪍	᪎	᪏	᪐	᪑	᪒	᪓	᪔	᪕	᪖	᪗	᪘	᪙	᪚	᪛	᪜	᪝	᪞	᪟
01AA0	᪠	᪡	᪢	᪣	᪤	᪥	᪦	ᪧ	᪨	᪩	᪪	᪫	᪬	᪭	᪮	᪯	᪰	᪱	᪲	᪳	᪴	᪵	᪶	᪷	᪸	᪹	᪺	᪻	᪼	᪽	᪾	ᪿ
01AC0	ᫀ	᫁	᫂	᫃	᫄	᫅	᫆	᫇	᫈	᫉	᫊	᫋	ᫌ	ᫍ	ᫎ	᫏	᫐	᫑	᫒	᫓	᫔	᫕	᫖	᫗	᫘	᫙	᫚	᫛	᫜	᫝	᫞	᫟
01AE0	᫠	᫡	᫢	᫣	᫤	᫥	᫦	᫧	᫨	᫩	᫪	᫫	᫬	᫭	᫮	᫯	᫰	᫱	᫲	᫳	᫴	᫵	᫶	᫷	᫸	᫹	᫺	᫻	᫼	᫽	᫾	᫿
14.2
01B00	ᬀ	ᬁ	ᬂ	ᬃ	ᬄ	ᬅ	ᬆ	ᬇ	ᬈ	ᬉ	ᬊ	ᬋ	ᬌ	ᬍ	ᬎ	ᬏ	ᬐ	ᬑ	ᬒ	ᬓ	ᬔ	ᬕ	ᬖ	ᬗ	ᬘ	ᬙ	ᬚ	ᬛ	ᬜ	ᬝ	ᬞ	ᬟ
01B20	ᬠ	ᬡ	ᬢ	ᬣ	ᬤ	ᬥ	ᬦ	ᬧ	ᬨ	ᬩ	ᬪ	ᬫ	ᬬ	ᬭ	ᬮ	ᬯ	ᬰ	ᬱ	ᬲ	ᬳ	᬴	ᬵ	ᬶ	ᬷ	ᬸ	ᬹ	ᬺ	ᬻ	ᬼ	ᬽ	ᬾ	ᬿ
01B40	ᭀ	ᭁ	ᭂ	ᭃ	᭄	ᭅ	ᭆ	ᭇ	ᭈ	ᭉ	ᭊ	ᭋ	ᭌ	᭍	᭎	᭏	᭐	᭑	᭒	᭓	᭔	᭕	᭖	᭗	᭘	᭙	᭚	᭛	᭜	᭝	᭞	᭟
01B60	᭠	᭡	᭢	᭣	᭤	᭥	᭦	᭧	᭨	᭩	᭪	᭫	᭬	᭭	᭮	᭯	᭰	᭱	᭲	᭳	᭴	᭵	᭶	᭷	᭸	᭹	᭺	᭻	᭼	᭽	᭾	᭿
01B80	ᮀ	ᮁ	ᮂ	ᮃ	ᮄ	ᮅ	ᮆ	ᮇ	ᮈ	ᮉ	ᮊ	ᮋ	ᮌ	ᮍ	ᮎ	ᮏ	ᮐ	ᮑ	ᮒ	ᮓ	ᮔ	ᮕ	ᮖ	ᮗ	ᮘ	ᮙ	ᮚ	ᮛ	ᮜ	ᮝ	ᮞ	ᮟ
01BA0	ᮠ	ᮡ	ᮢ	ᮣ	ᮤ	ᮥ	ᮦ	ᮧ	ᮨ	ᮩ	᮪	᮫	ᮬ	ᮭ	ᮮ	ᮯ	᮰	᮱	᮲	᮳	᮴	᮵	᮶	᮷	᮸	᮹	ᮺ	ᮻ	ᮼ	ᮽ	ᮾ	ᮿ
01BC0	ᯀ	ᯁ	ᯂ	ᯃ	ᯄ	ᯅ	ᯆ	ᯇ	ᯈ	ᯉ	ᯊ	ᯋ	ᯌ	ᯍ	ᯎ	ᯏ	ᯐ	ᯑ	ᯒ	ᯓ	ᯔ	ᯕ	ᯖ	ᯗ	ᯘ	ᯙ	ᯚ	ᯛ	ᯜ	ᯝ	ᯞ	ᯟ
01BE0	ᯠ	ᯡ	ᯢ	ᯣ	ᯤ	ᯥ	᯦	ᯧ	ᯨ	ᯩ	ᯪ	ᯫ	ᯬ	ᯭ	ᯮ	ᯯ	ᯰ	ᯱ	᯲	᯳	᯴	᯵	᯶	᯷	᯸	᯹	᯺	᯻	᯼	᯽	᯾	᯿

English documentation

U+01A00 Buginese (Abugida, 32 codes from 1A00–1A1F. Languages: Buginese, Makassarese, Mandar): Buginese is a Unicode block containing characters for writing the Buginese language of Sulawesi. The Lontara script is a Brahmic script traditionally used for the Bugis, Makassarese, and Mandar languages of Sulawesi in Indonesia. It is also known as the Buginese script, as Lontara documents written in this language are the most numerous. It was largely replaced by the Latin alphabet during the period of Dutch colonization, though it is still used today to a limited extent. The term Lontara is derived from the Malay name for the palmyra palm, lontar, whose leaves are traditionally used for manuscripts. In Buginese, this script is called urupu sulapa eppa, which means “four-cornered letters”, referencing the Bugis-Makassar belief in the four elements that shaped the universe: fire, water, air, and earth.

U+01A20 Tai Tham (Abugida, 144 codes from 1A20–1AAF): Tai Tham is a Unicode block containing characters of the Lanna script used for writing the Northern Thai (Kham Mueang), Tai Lü, and Khün languages. The Tai Tham script (Tai Lü: ᦒᧄ, Tham, “scripture”), also known as the Lanna script, Tua Mueang, Tham or Yuan script, is used for three living languages: Northern Thai, Tai Lü and Khün. In addition, the Lanna script is used for Lao Tham (or old Lao) and other dialect variants in Buddhist palm-leaf manuscripts and notebooks. The Northern Thai language is a close relative of Thai and a member of the Chiang Saeng language family. It is spoken by nearly 6,000,000 people in Northern Thailand and several thousand in Laos, of whom few are literate in the Lanna script. The script is still read by older monks. Northern Thai has six linguistic tones and Thai only five, making transcription into the Thai alphabet problematic. There is some resurgent interest in the script among younger people, but an added complication is that the modern spoken form, Kham Mueang, differs in pronunciation from the older form. There are 670,000 speakers of Tai Lü, of whom those born before 1950 are literate in the Lanna script. The script has also continued to be taught in the monasteries. There are 120,000 speakers of Khün, for which Lanna is the only script.

U+01AB0 Combining Diacritical Marks Extended (80 codes from 1AB0–1AFF): Combining Diacritical Marks Extended is a Unicode block containing diacritical marks used in German dialectology.

U+01B00 Balinese (Abugida, 128 codes from 1B00–1B7F. Languages: Balinese, Sasak): Balinese is a Unicode block containing characters for the Balinese language (basa Bali). The Balinese script, natively known as Aksara Bali and Hanacaraka, is an abugida used on the island of Bali, Indonesia, commonly for writing the Austronesian Balinese language, Old Javanese, and the liturgical language Sanskrit. With some modifications, the script is also used to write the Sasak language of the neighboring island of Lombok. The script is a descendant of the Brahmi script, and so has many similarities with the modern scripts of South and Southeast Asia. The Balinese script, along with the Javanese script, is considered the most elaborate and ornate among the Brahmic scripts of Southeast Asia. Though everyday use of the script has largely been supplanted by the Latin alphabet, the Balinese script remains prevalent in many of the island's traditional ceremonies and is strongly associated with the Hindu religion. The script is mainly used today for copying lontar, or palm-leaf manuscripts, containing religious texts.

U+01B80 Sundanese (Abugida, 64 codes from 1B80–1BBF. Language: Sundanese): Sundanese is a Unicode block containing modern characters for writing the Sundanese language of the island of Java. Sundanese script (Aksara Sunda) is the writing system used by the Sundanese people. It is based on the Old Sundanese script (Aksara Sunda Kuno), which was used by the ancient Sundanese between the 14th and 18th centuries.

U+01BC0 Batak (Abugida, 64 codes from 1BC0–1BFF. Language: Batak): Batak is a Unicode block containing characters for writing the Batak dialects of Karo, Mandailing, Pakpak, Simalungun, and Toba. The Batak script, natively known as surat Batak, surat na sapulu sia (the nineteen letters), or si-sia-sia, is an abugida used to write the Austronesian Batak languages spoken by several million people on the Indonesian island of Sumatra. The script may be derived from the Kawi and Pallava scripts, themselves ultimately derived from the Brahmi script of India, or from a hypothetical Proto-Sumatran script influenced by Pallava.

**Lepcha + Ol Chiki + Cyrillic Extended-C + Georgian Extended + Sundanese Supplement + Vedic Extensions + Phonetic Extensions + Phonetic Extensions Supplement + Combining Diacritical Marks Supplement** => 512 characters, UTF-8, from U+01C00 hex = 7168 decimal
Nr 15	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
01C00	ᰀ	ᰁ	ᰂ	ᰃ	ᰄ	ᰅ	ᰆ	ᰇ	ᰈ	ᰉ	ᰊ	ᰋ	ᰌ	ᰍ	ᰎ	ᰏ	ᰐ	ᰑ	ᰒ	ᰓ	ᰔ	ᰕ	ᰖ	ᰗ	ᰘ	ᰙ	ᰚ	ᰛ	ᰜ	ᰝ	ᰞ	ᰟ
01C20	ᰠ	ᰡ	ᰢ	ᰣ	ᰤ	ᰥ	ᰦ	ᰧ	ᰨ	ᰩ	ᰪ	ᰫ	ᰬ	ᰭ	ᰮ	ᰯ	ᰰ	ᰱ	ᰲ	ᰳ	ᰴ	ᰵ	ᰶ	᰷	᰸	᰹	᰺	᰻	᰼	᰽	᰾	᰿
01C40	᱀	᱁	᱂	᱃	᱄	᱅	᱆	᱇	᱈	᱉	᱊	᱋	᱌	ᱍ	ᱎ	ᱏ	᱐	᱑	᱒	᱓	᱔	᱕	᱖	᱗	᱘	᱙	ᱚ	ᱛ	ᱜ	ᱝ	ᱞ	ᱟ
01C60	ᱠ	ᱡ	ᱢ	ᱣ	ᱤ	ᱥ	ᱦ	ᱧ	ᱨ	ᱩ	ᱪ	ᱫ	ᱬ	ᱭ	ᱮ	ᱯ	ᱰ	ᱱ	ᱲ	ᱳ	ᱴ	ᱵ	ᱶ	ᱷ	ᱸ	ᱹ	ᱺ	ᱻ	ᱼ	ᱽ	᱾	᱿
01C80	ᲀ	ᲁ	ᲂ	ᲃ	ᲄ	ᲅ	ᲆ	ᲇ	ᲈ	Ᲊ	ᲊ	᲋	᲌	᲍	᲎	᲏	Ა	Ბ	Გ	Დ	Ე	Ვ	Ზ	Თ	Ი	Კ	Ლ	Მ	Ნ	Ო	Პ	Ჟ
01CA0	Რ	Ს	Ტ	Უ	Ფ	Ქ	Ღ	Ყ	Შ	Ჩ	Ც	Ძ	Წ	Ჭ	Ხ	Ჯ	Ჰ	Ჱ	Ჲ	Ჳ	Ჴ	Ჵ	Ჶ	Ჷ	Ჸ	Ჹ	Ჺ	᲻	᲼	Ჽ	Ჾ	Ჿ
01CC0	᳀	᳁	᳂	᳃	᳄	᳅	᳆	᳇	᳈	᳉	᳊	᳋	᳌	᳍	᳎	᳏	᳐	᳑	᳒	᳓	᳔	᳕	᳖	᳗	᳘	᳙	᳚	᳛	᳜	᳝	᳞	᳟
01CE0	᳠	᳡	᳢	᳣	᳤	᳥	᳦	᳧	᳨	ᳩ	ᳪ	ᳫ	ᳬ	᳭	ᳮ	ᳯ	ᳰ	ᳱ	ᳲ	ᳳ	᳴	ᳵ	ᳶ	᳷	᳸	᳹	ᳺ	᳻	᳼	᳽	᳾	᳿
15.2
01D00	ᴀ	ᴁ	ᴂ	ᴃ	ᴄ	ᴅ	ᴆ	ᴇ	ᴈ	ᴉ	ᴊ	ᴋ	ᴌ	ᴍ	ᴎ	ᴏ	ᴐ	ᴑ	ᴒ	ᴓ	ᴔ	ᴕ	ᴖ	ᴗ	ᴘ	ᴙ	ᴚ	ᴛ	ᴜ	ᴝ	ᴞ	ᴟ
01D20	ᴠ	ᴡ	ᴢ	ᴣ	ᴤ	ᴥ	ᴦ	ᴧ	ᴨ	ᴩ	ᴪ	ᴫ	ᴬ	ᴭ	ᴮ	ᴯ	ᴰ	ᴱ	ᴲ	ᴳ	ᴴ	ᴵ	ᴶ	ᴷ	ᴸ	ᴹ	ᴺ	ᴻ	ᴼ	ᴽ	ᴾ	ᴿ
01D40	ᵀ	ᵁ	ᵂ	ᵃ	ᵄ	ᵅ	ᵆ	ᵇ	ᵈ	ᵉ	ᵊ	ᵋ	ᵌ	ᵍ	ᵎ	ᵏ	ᵐ	ᵑ	ᵒ	ᵓ	ᵔ	ᵕ	ᵖ	ᵗ	ᵘ	ᵙ	ᵚ	ᵛ	ᵜ	ᵝ	ᵞ	ᵟ
01D60	ᵠ	ᵡ	ᵢ	ᵣ	ᵤ	ᵥ	ᵦ	ᵧ	ᵨ	ᵩ	ᵪ	ᵫ	ᵬ	ᵭ	ᵮ	ᵯ	ᵰ	ᵱ	ᵲ	ᵳ	ᵴ	ᵵ	ᵶ	ᵷ	ᵸ	ᵹ	ᵺ	ᵻ	ᵼ	ᵽ	ᵾ	ᵿ
01D80	ᶀ	ᶁ	ᶂ	ᶃ	ᶄ	ᶅ	ᶆ	ᶇ	ᶈ	ᶉ	ᶊ	ᶋ	ᶌ	ᶍ	ᶎ	ᶏ	ᶐ	ᶑ	ᶒ	ᶓ	ᶔ	ᶕ	ᶖ	ᶗ	ᶘ	ᶙ	ᶚ	ᶛ	ᶜ	ᶝ	ᶞ	ᶟ
01DA0	ᶠ	ᶡ	ᶢ	ᶣ	ᶤ	ᶥ	ᶦ	ᶧ	ᶨ	ᶩ	ᶪ	ᶫ	ᶬ	ᶭ	ᶮ	ᶯ	ᶰ	ᶱ	ᶲ	ᶳ	ᶴ	ᶵ	ᶶ	ᶷ	ᶸ	ᶹ	ᶺ	ᶻ	ᶼ	ᶽ	ᶾ	ᶿ
01DC0	᷀	᷁	᷂	᷃	᷄	᷅	᷆	᷇	᷈	᷉	᷊	᷋	᷌	᷍	᷎	᷏	᷐	᷑	᷒	ᷓ	ᷔ	ᷕ	ᷖ	ᷗ	ᷘ	ᷙ	ᷚ	ᷛ	ᷜ	ᷝ	ᷞ	ᷟ
01DE0	ᷠ	ᷡ	ᷢ	ᷣ	ᷤ	ᷥ	ᷦ	ᷧ	ᷨ	ᷩ	ᷪ	ᷫ	ᷬ	ᷭ	ᷮ	ᷯ	ᷰ	ᷱ	ᷲ	ᷳ	ᷴ	᷵	᷶	᷷	᷸	᷹	᷺	᷻	᷼	᷽	᷾	᷿

English documentation

U+01C00 Lepcha (Abugida, 80 codes from 1C00–1C4F. Language: Lepcha): Lepcha (also known as Rong or Rong-Ring) is a Unicode block containing characters for writing the Lepcha language of Sikkim and West Bengal, India. The script is an abugida (a consonant-based script in which the vowels are written as marks that depend on the consonants). Unusually for an abugida, syllable-final consonants in Lepcha are written with diacritical marks. The Lepcha script derives from the Tibetan script, possibly with some outside influence. Tradition holds that the script was devised at the beginning of the 18th century by prince Chakdor Namgyal of the Namgyal dynasty of Sikkim, or by the scholar Thikúng Men Salóng in the 17th century. Early Lepcha manuscripts were written vertically, apparently under Chinese influence. Later the writing became horizontal, but the letters kept their orientation, rotated 90 degrees from their Tibetan prototypes; this rotation, which must have happened in the 18th century, is reflected in the unusual way of writing final consonants. The Lepcha script has a long history of use, and many books were published in it. It flourished briefly at the end of the 19th and the beginning of the 20th century. It is believed that it is no longer in use today.

U+01C50 Ol Chiki (Alphabet, 48 codes from 1C50–1C7F. Language: Santali): Ol Chiki is a Unicode block containing the Ol Chiki symbols. The script, also called Ol Cemet' (Santali: ol 'writing', cemet' 'learning'), was created by Raghunath Murmu in 1925 for writing the Santali language. Santali used to be written with the Latin alphabet. Since Santali, like most languages in the south of India, is not an Indo-Aryan language, Indic scripts did not have letters for all of Santali's phonemes, especially its stop consonants and vowels, which made it difficult to write the language accurately in an unmodified Indic script. A detailed analysis was given by Dr. Byomkes Chakrabarti in his 'Comparative Study of Santali and Bengali'. Missionaries (first of all Paul Olaf Bodding, a Norwegian) brought the Latin script, which is better at representing Santali stops, phonemes and nasal sounds with the use of diacritical marks and accents. Unlike most Indic scripts, which derive from Brahmi, Ol Chiki is not an abugida: whereas other Indic scripts are driven by consonants, Ol Chiki gives vowels equal representation with consonants. Although it was designed specifically for the language, one letter could not be assigned to each phoneme, because the sixth vowel remained problematic. Ol Chiki has 30 letters, whose forms are intended to evoke natural shapes. This was confirmed by the linguist Norman Zide, who said “The shapes of the letters are not arbitrary, but reflect the names for the letters, which are words, usually the names of objects or actions representing conventionalized form in the pictorial shape of the characters.” It is written from left to right.

U+01C80 Cyrillic Extended-C (Alphabet, 16 codes from 1C80–1C8F. Language in ): This block contains additional letter forms from the earliest Cyrillic alphabet, Old Slavic. That alphabet was created in the First Bulgarian Empire in the 9th century and contained 48 letters. It was later used for the Church Slavonic language.

U+01C90 Georgian Extended (, 48 codes from 1C90–1CBF. Language in ): The Georgian script Mkhedruli has no capital letters, so sentences in Mkhedruli normally begin with an ordinary lowercase letter. The capital variants of the Georgian letters form a separate set, Mtavruli, which is encoded in this Unicode block. It is used to write text in all capitals or to highlight an important word.
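
The relationship between Mkhedruli and Mtavruli can be tried out with ordinary case conversion. This is a small sketch, assuming a Python version whose Unicode database already includes the Mtavruli letters (Unicode 11 or later, which should hold for Python 3.7 and newer); the function name `to_mtavruli` is invented for this example.

```python
def to_mtavruli(text):
    """Convert Mkhedruli letters to their Mtavruli capitals via the standard uppercase mapping."""
    return text.upper()

# The Georgian word for "Georgia", written with escapes so the source stays ASCII.
word = "\u10e1\u10d0\u10e5\u10d0\u10e0\u10d7\u10d5\u10d4\u10da\u10dd"
capitals = to_mtavruli(word)

print(word, "->", capitals)
print([f"U+{ord(c):04X}" for c in capitals])  # code points in the 1C90-1CBF block
```

Calling `capitals.lower()` should give back the original Mkhedruli word.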

U+01CC0 Sundanese Supplement (Abugida, 16 codes from 1CC0–1CCF. Language sundanese in ): Sundanese Supplement is a Unicode block containing punctuation characters for Sundanese.

U+01CD0 Vedic Extensions (, 48 codes from 1CD0–1CFF. Language in ): Vedic Extensions is a Unicode block containing characters for representing tones and other Vedic symbols in Devanagari and other Indic scripts.

U+01D00 Phonetic Extensions (, 128 codes from 1D00–1D7F. Language in ): Phonetic Extensions is a Unicode block containing phonetic characters used in the Uralic Phonetic Alphabet, Old Irish phonetic notation, the Oxford English Dictionary and American dictionaries, and Americanist and Russianist phonetic notations. Its character set is continued in the following Unicode block, Phonetic Extensions Supplement.

U+01D80 Phonetic Extensions Supplement (, 64 codes from 1D80–1DBF. Language in ): Phonetic Extensions Supplement is a Unicode block containing characters for specialized and deprecated forms of the International Phonetic Alphabet.

U+01DC0 Combining Diacritical Marks Supplement (, 64 codes from 1DC0–1DFF. Language in ): Combining Diacritical Marks Supplement is a Unicode block containing combining characters for the Uralic Phonetic Alphabet and Medievalist notations. It extends the diacritic characters found in the Combining Diacritical Marks block (0300–036F). Diacritics are mostly used in consonantal and syllabic writing systems, not as independent characters but as additional signs that change or clarify the meaning. Sometimes diacritical signs are required to be smaller than the letters. Related names include glyphs, accents (a narrower term in meaning and context) and diacritics (the professional term that linguists commonly use). The set of diacritics belonging to a particular script or text is likewise referred to as its diacritics. How many diacritics can one letter carry? A single letter may carry more than one diacritic at the same time, as in ặ, ṩ, ᶑ. The vowel signs in scripts such as Hebrew, Arabic, and Syriac are often confused with diacritics because they look similar. However, they mostly act as a special type of letter and therefore serve a different function. When are diacritics used? They come in handy when the letters of an alphabet are not enough to express certain sounds or meanings. The main alternative to diacritics is a combination of two letters (a digraph), three letters (a trigraph) or more that conveys one sound. For instance, the sound /ʃ/ is written with the digraph sh in English and ch in French, whereas German uses the trigraph sch. Are there languages that write this sound with a single letter? Yes: Czech uses š, where the diacritic signals the pronunciation.
Diacritics are used with both consonant and vowel letters. Their key drawback is that they fill the writing with small but important details: forgetting or skipping one can lead to serious mistakes. Many languages, however, use no diacritics at all (English) or only a few (Russian). In some cases diacritical letters tend to be replaced with digraphs. The German ö, for example, becomes oe in plain-text versions, but since the introduction of the umlaut this practice is almost out of use.
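
How a letter with several diacritics breaks down into a base letter plus combining marks can be inspected with canonical decomposition (NFD). The snippet below is a minimal sketch using Python's standard `unicodedata` module; the helper name `decompose` is made up for this illustration.

```python
import unicodedata

def decompose(char):
    """Return the base character and the names of the combining marks that follow it in NFD."""
    nfd = unicodedata.normalize("NFD", char)
    base = nfd[0]
    marks = [unicodedata.name(c) for c in nfd[1:] if unicodedata.combining(c)]
    return base, marks

# U+1EB7 (a with breve and dot below), U+1E69 (s with dot below and dot above), U+0161 (s with caron)
for ch in ("\u1eb7", "\u1e69", "\u0161"):
    base, marks = decompose(ch)
    print(ch, "=", base, "+", marks)
```

Not every letter that looks modified decomposes this way: a character such as ᶑ has no canonical decomposition, so the function returns it unchanged with an empty list of marks.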

**Latin Extended Additional + Greek Extended** => 512 characters UTF-8 from U+01E00 hex = 7680 decimal
No. 16	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10	11	12	13	14	15	16	17	18	19	1A	1B	1C	1D	1E	1F
01E00	Ḁ	ḁ	Ḃ	ḃ	Ḅ	ḅ	Ḇ	ḇ	Ḉ	ḉ	Ḋ	ḋ	Ḍ	ḍ	Ḏ	ḏ	Ḑ	ḑ	Ḓ	ḓ	Ḕ	ḕ	Ḗ	ḗ	Ḙ	ḙ	Ḛ	ḛ	Ḝ	ḝ	Ḟ	ḟ
01E20	Ḡ	ḡ	Ḣ	ḣ	Ḥ	ḥ	Ḧ	ḧ	Ḩ	ḩ	Ḫ	ḫ	Ḭ	ḭ	Ḯ	ḯ	Ḱ	ḱ	Ḳ	ḳ	Ḵ	ḵ	Ḷ	ḷ	Ḹ	ḹ	Ḻ	ḻ	Ḽ	ḽ	Ḿ	ḿ
01E40	Ṁ	ṁ	Ṃ	ṃ	Ṅ	ṅ	Ṇ	ṇ	Ṉ	ṉ	Ṋ	ṋ	Ṍ	ṍ	Ṏ	ṏ	Ṑ	ṑ	Ṓ	ṓ	Ṕ	ṕ	Ṗ	ṗ	Ṙ	ṙ	Ṛ	ṛ	Ṝ	ṝ	Ṟ	ṟ
01E60	Ṡ	ṡ	Ṣ	ṣ	Ṥ	ṥ	Ṧ	ṧ	Ṩ	ṩ	Ṫ	ṫ	Ṭ	ṭ	Ṯ	ṯ	Ṱ	ṱ	Ṳ	ṳ	Ṵ	ṵ	Ṷ	ṷ	Ṹ	ṹ	Ṻ	ṻ	Ṽ	ṽ	Ṿ	ṿ
01E80	Ẁ	ẁ	Ẃ	ẃ	Ẅ	ẅ	Ẇ	ẇ	Ẉ	ẉ	Ẋ	ẋ	Ẍ	ẍ	Ẏ	ẏ	Ẑ	ẑ	Ẓ	ẓ	Ẕ	ẕ	ẖ	ẗ	ẘ	ẙ	ẚ	ẛ	ẜ	ẝ	ẞ	ẟ
01EA0	Ạ	ạ	Ả	ả	Ấ	ấ	Ầ	ầ	Ẩ	ẩ	Ẫ	ẫ	Ậ	ậ	Ắ	ắ	Ằ	ằ	Ẳ	ẳ	Ẵ	ẵ	Ặ	ặ	Ẹ	ẹ	Ẻ	ẻ	Ẽ	ẽ	Ế	ế
01EC0	Ề	ề	Ể	ể	Ễ	ễ	Ệ	ệ	Ỉ	ỉ	Ị	ị	Ọ	ọ	Ỏ	ỏ	Ố	ố	Ồ	ồ	Ổ	ổ	Ỗ	ỗ	Ộ	ộ	Ớ	ớ	Ờ	ờ	Ở	ở
01EE0	Ỡ	ỡ	Ợ	ợ	Ụ	ụ	Ủ	ủ	Ứ	ứ	Ừ	ừ	Ử	ử	Ữ	ữ	Ự	ự	Ỳ	ỳ	Ỵ	ỵ	Ỷ	ỷ	Ỹ	ỹ	Ỻ	ỻ	Ỽ	ỽ	Ỿ	ỿ
16.2
01F00	ἀ	ἁ	ἂ	ἃ	ἄ	ἅ	ἆ	ἇ	Ἀ	Ἁ	Ἂ	Ἃ	Ἄ	Ἅ	Ἆ	Ἇ	ἐ	ἑ	ἒ	ἓ	ἔ	ἕ	἖	἗	Ἐ	Ἑ	Ἒ	Ἓ	Ἔ	Ἕ	἞	἟
01F20	ἠ	ἡ	ἢ	ἣ	ἤ	ἥ	ἦ	ἧ	Ἠ	Ἡ	Ἢ	Ἣ	Ἤ	Ἥ	Ἦ	Ἧ	ἰ	ἱ	ἲ	ἳ	ἴ	ἵ	ἶ	ἷ	Ἰ	Ἱ	Ἲ	Ἳ	Ἴ	Ἵ	Ἶ	Ἷ
01F40	ὀ	ὁ	ὂ	ὃ	ὄ	ὅ	὆	὇	Ὀ	Ὁ	Ὂ	Ὃ	Ὄ	Ὅ	὎	὏	ὐ	ὑ	ὒ	ὓ	ὔ	ὕ	ὖ	ὗ	὘	Ὑ	὚	Ὓ	὜	Ὕ	὞	Ὗ
01F60	ὠ	ὡ	ὢ	ὣ	ὤ	ὥ	ὦ	ὧ	Ὠ	Ὡ	Ὢ	Ὣ	Ὤ	Ὥ	Ὦ	Ὧ	ὰ	ά	ὲ	έ	ὴ	ή	ὶ	ί	ὸ	ό	ὺ	ύ	ὼ	ώ	὾	὿
01F80	ᾀ	ᾁ	ᾂ	ᾃ	ᾄ	ᾅ	ᾆ	ᾇ	ᾈ	ᾉ	ᾊ	ᾋ	ᾌ	ᾍ	ᾎ	ᾏ	ᾐ	ᾑ	ᾒ	ᾓ	ᾔ	ᾕ	ᾖ	ᾗ	ᾘ	ᾙ	ᾚ	ᾛ	ᾜ	ᾝ	ᾞ	ᾟ
01FA0	ᾠ	ᾡ	ᾢ	ᾣ	ᾤ	ᾥ	ᾦ	ᾧ	ᾨ	ᾩ	ᾪ	ᾫ	ᾬ	ᾭ	ᾮ	ᾯ	ᾰ	ᾱ	ᾲ	ᾳ	ᾴ	᾵	ᾶ	ᾷ	Ᾰ	Ᾱ	Ὰ	Ά	ᾼ	᾽	ι	᾿
01FC0	῀	῁	ῂ	ῃ	ῄ	῅	ῆ	ῇ	Ὲ	Έ	Ὴ	Ή	ῌ	῍	῎	῏	ῐ	ῑ	ῒ	ΐ	῔	῕	ῖ	ῗ	Ῐ	Ῑ	Ὶ	Ί	῜	῝	῞	῟
01FE0	ῠ	ῡ	ῢ	ΰ	ῤ	ῥ	ῦ	ῧ	Ῠ	Ῡ	Ὺ	Ύ	Ῥ	῭	΅	`	῰	῱	ῲ	ῳ	ῴ	῵	ῶ	ῷ	Ὸ	Ό	Ὼ	Ώ	ῼ	´	῾	῿
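
The table layout can be read as simple arithmetic: the hexadecimal row label plus the hexadecimal column header gives the code point of a cell. The following sketch illustrates this under that assumption; the function name `code_point` is invented for the example and is not part of any library.

```python
def code_point(row_label, column):
    """Code point of a table cell: hex row label plus column offset (0 to 0x1F)."""
    if not 0 <= column <= 0x1F:
        raise ValueError("column must be between 0 and 0x1F")
    return int(row_label, 16) + column

# First cell of the table: U+01E00 hex is 7680 decimal.
print(code_point("01E00", 0))

# Row 01EA0, column 17 (hex).
cp = code_point("01EA0", 0x17)
ch = chr(cp)
print(f"U+{cp:05X}", ch, "decimal", cp, "UTF-8 bytes", ch.encode("utf-8").hex())
```

Every character in this table lies between U+0800 and U+FFFF and therefore takes three bytes in UTF-8.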

English documentation

U+01E00 Latin Extended Additional (, 256 codes from 1E00–1EFF. Language in ): Latin Extended Additional is a block of the Unicode standard. The characters in this block are mostly precomposed combinations of Latin letters with one or more general diacritical marks. There are also a few Medievalist characters.

U+01F00 Greek Extended (, 256 codes from 1F00–1FFF. Language in ): Greek Extended is a Unicode block containing the accented vowels necessary for writing polytonic Greek. The regular, unaccented Greek characters can be found in the Greek and Coptic block. Greek Extended was encoded in version 1.1 of the Unicode Standard and had received no additions up to version 6.2. As an alternative to Greek Extended, combining characters can be used to represent the accents and breathing marks of polytonic Greek.
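
The equivalence between a precomposed Greek Extended character and a sequence of combining characters can be checked with Unicode normalization. This is a small sketch using Python's standard `unicodedata` module; the helper `describe` and the variable names are chosen for this illustration only.

```python
import unicodedata

def describe(text):
    """List each character of text as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]

precomposed = "\u1f04"  # Greek Extended: small alpha with psili and oxia
sequence = unicodedata.normalize("NFD", precomposed)  # base letter + combining marks

print(describe(precomposed))
print(describe(sequence))

# Recomposing the sequence (NFC) gives back the single Greek Extended character.
print(unicodedata.normalize("NFC", sequence) == precomposed)
```

Here the decomposed form consists of the plain alpha from the Greek and Coptic block followed by two marks from the Combining Diacritical Marks block.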