1: Text | 2: Symbol | 3: Asien | 4: Asien | 5: Asien | 6: Yi, Vai | 7: Hangul | 8: Privat | 9: Ägäisch | 10: Keil | 11: Anatol | 12: Bamum | 13: Tangut | 14: Kana | 15: Symbol | 16: Picto | 17: CJK | 18: CJK | 19: CJK | 20: CJK | 21: CJK | 22: CJK | 23: CJK | 24: CJK | 113:Tags | Details.
U+00000 Control character (32 codes): Part of The Basic Latin (or C0 Controls and Basic Latin) Unicode block.
U+00020 Basic Latin (alphabet over 96 codes. Language english, german, french, italian, polish in England, USA, Germany, France, Italy, Poland): The Basic Latin (or C0 Controls and Basic Latin) Unicode block is the first block of the Unicode standard, and the only block which is encoded in one byte in UTF-8. The block contains all the letters and control codes of the ASCII encoding. The Basic Latin block was included in its present from version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire. The classical Latin alphabet, also known as the Roman alphabet, is a writing system that evolved from the visually similar Cumaean Greek version of the Greek alphabet. The Greek alphabet, including the Cumaean version, descended from the Phoenician abjad. The Etruscans who ruled early Rome adopted and modified the Cumaean Greek alphabet. The Etruscan alphabet was in turn adopted and further modified by the ancient Romans to write the Latin language. During the Middle Ages scribes adapted the Latin alphabet for writing Romance languages, direct descendants of Latin, as well as Celtic, Germanic, Baltic, and some Slavic languages. With the age of colonialism and Christian evangelism, the Latin script spread beyond Europe, coming into use for writing indigenous American, Australian, Austronesian, Austroasiatic, and African languages. More recently, linguists have also tended to prefer the Latin script or the International Phonetic Alphabet (itself largely based on Latin script) when transcribing or creating written standards for non-European languages, such as the African reference alphabet. The term Latin alphabet may refer to either the alphabet used to write Latin (as described in this article), or other alphabets based on the Latin script, which is the basic set of letters common to the various alphabets descended from the classical Latin one, such as the English alphabet. These Latin alphabets may discard letters, like the Rotokas alphabet, or add new letters, like the Danish and Norwegian alphabets. Letter shapes have evolved over the centuries, including the creation for Medieval Latin of lower-case forms which did not exist in the Classical period.
U+00080 Latin-1 Supplement (128 codes): The Latin-1 Supplement (also called C1 Controls and Latin-1 Supplement) is the second Unicode block in the Unicode standard. It encodes the upper range of ISO 8859-1: 80 (U+0080) — FF (U+00FF). Controls C1 (0080–009F) are not graphic. The C1 Controls and Latin-1 Supplement block has been included in its present form, with the same character repertoire since version 1.0 of the Unicode Standard, where it was known as Latin 1
U+00100 Latin Extended-A (alphabet over 128 codes. Language celtic, sami, maltese, turkish in Scotland, Wales, Ireland, Norway, Finland, Sweden, Malta, Turkey): Latin Extended-A is a block of the Unicode Standard. It encodes Latin letters from the Latin ISO character sets other than Latin-1 (which is already encoded in the Latin-1 Supplement block) and also legacy characters from the ISO 6937 standard. The Latin Extended-A block has been in the Unicode Standard since version 1.0, with its entire character repertoire, except for the Latin Small Letter Long S, which was added during unification with ISO 10646 in version 1.1
U+00180 Latin Extended-B (alphabet over 208 codes. Language slovenian, croatian in Slovenia, Croatia, Rumania, Libya): Latin Extended-B is a block (0180-024F) of the Unicode Standard. It has been included since version 1.0, where it was only allocated to the code points U+0180..U+01FF and contained 113 characters. During unification with ISO 10646 for version 1.1, the block was expanded, and another 65 characters were added. In version 3.0, the last thirty available code points in the block were assigned. The Latin Extended-B block contains ten subheadings for groups of characters: Non-European and historic Latin, African letters for clicks, Croatian digraphs matching Serbian Cyrillic letters, Pinyin diacritic-vowel combinations, Phonetic and historic letters, Additions for Slovenian and Croatian, Additions for Romanian, Miscellaneous additions, Additions for Livonian, and Additions for Sinology. The Non-European and historic, African clicks, Croatian digraphs, Pinyin, and the first part of the Phonetic and historic letters were present in Unicode 1.0; additional Phonetic and historic letters were added for version 3.0; and other Phonetic and historic, as well as the rest of the sub-blocks were the characters added for version 1.1.
U+00200 Latin Extended-B (alphabet over 208 codes. Language slovenian, croatian in Slovenia, Croatia, Rumania, Libya): Latin Extended-B is a block (0180-024F) of the Unicode Standard. It has been included since version 1.0, where it was only allocated to the code points U+0180..U+01FF and contained 113 characters. During unification with ISO 10646 for version 1.1, the block was expanded, and another 65 characters were added. In version 3.0, the last thirty available code points in the block were assigned. The Latin Extended-B block contains ten subheadings for groups of characters: Non-European and historic Latin, African letters for clicks, Croatian digraphs matching Serbian Cyrillic letters, Pinyin diacritic-vowel combinations, Phonetic and historic letters, Additions for Slovenian and Croatian, Additions for Romanian, Miscellaneous additions, Additions for Livonian, and Additions for Sinology. The Non-European and historic, African clicks, Croatian digraphs, Pinyin, and the first part of the Phonetic and historic letters were present in Unicode 1.0; additional Phonetic and historic letters were added for version 3.0; and other Phonetic and historic, as well as the rest of the sub-blocks were the characters added for version 1.1.
U+00250 IPA Extensions (alphabet over 96 codes): IPA Extensions is a block (0250–02AF) of the Unicode standard that contains full size letters used in the International Phonetic Alphabet (IPA). Both modern and historical characters are included, as well as former and proposed IPA signs and non-IPA phonetic letters. Additional characters employed for phonetics, like the palatalization sign, are encoded in the blocks Phonetic Extensions (1D00–1D7F) and Phonetic Extensions Supplement (1D80–1DBF). Diacritics are found in the Spacing Modifier Letters (02B0–02FF) and Combining Diacritical Marks (0300–036F) blocks. With IPA´s ability to use Unicode for the presentation of phonetic symbols, ASCII-based systems such as X-SAMPA or Kirshenbaum are being supplanted. Within the Unicode blocks there are also a few former IPA characters no longer in international use by linguists. The IPA Extensions block has been present in Unicode since version 1.0, and was unchanged through the unification with ISO 10646. The block was filled out with extensions for representing disordered speech in version 3.0, and Sinology phonetic symbols in version 4.0 The International Phonetic Alphabet (unofficially—though commonly—abbreviated IPA) is an alphabetic system of phonetic notation based primarily on the Latin alphabet. It was devised by the International Phonetic Association as a standardized representation of the sounds of oral language. The IPA is used by lexicographers, foreign language students and teachers, linguists, speech-language pathologists, singers, actors, constructed language creators, and translators.The IPA is designed to represent only those qualities of speech that are part of oral language: phones, phonemes, intonation, and the separation of words and syllables. To represent additional qualities of speech, such as tooth gnashing, lisping, and sounds made with a cleft palate, an extended set of symbols called the Extensions to the IPA may be used.IPA symbols are composed of one or more elements of two basic types, letters and diacritics. For example, the sound of the English letter t may be transcribed in IPA with a single letter, , or with a letter plus diacritics, , depending on how precise one wishes to be. Often, slashes are used to signal broad or phonemic transcription; thus, /t/ is less specific than, and could refer to, either or , depending on the context and language.Occasionally letters or diacritics are added, removed, or modified by the International Phonetic Association. As of the most recent change in 2005, there are 107 letters, 52 diacritics, and four prosodic marks in the IPA. These are shown in the current IPA chart, posted below in this article and at the website of the IPA.
U+002B0 Spacing Modifier Letters (alphabet over 80 codes): Spacing Modifier letters is a Unicode block containing characters for the IPA (International Phonetic Alphabet), UPA (Uralic Phonetic Alphabet or Finno-Ugric transcription system), and other phonetic transcriptions. Included are the IPA tone marks, and modifiers for aspiration and palatalization.
U+00300 Combining Diacritical Marks (alphabet over 112 codes): Combining Diacritical Marks is a Unicode block containing the most common combining characters. It also contains the Combining Grapheme Joiner, which prevents canonical reordering of combining characters, and despite the name, actually separates characters that would otherwise be considered a single grapheme in a given context. A diacritic /daɪ.əˈkrɪtɨk/ – also diacritical mark, diacritical point, or diacritical sign – is a glyph added to a letter, or basic glyph. The term derives from the Greek διακριτικός (diakritikós, «distinguishing», from ancient Greek διά (diá, through) and κρίνω (krínein, to separate)). Diacritic is primarily an adjective, though sometimes used as a noun, whereas diacritical is only ever an adjective. Some diacritical marks, such as the acute (´) and grave (`), are often called accents. Diacritical marks may appear above or below a letter, or in some other position such as within the letter or between two letters. The main use of diacritical marks in the Latin script is to change the sound-values of the letters to which they are added. Examples from English are the diaereses in naïve and Noël, which show that the vowel with the diaeresis mark is pronounced separately from the preceding vowel; the acute and grave accents, which can indicate that a final vowel is to be pronounced, as in saké and poetic breathèd; and the cedilla under the «c» in the borrowed French word façade, which shows it is pronounced /s/ rather than /k/. In other Latin alphabets, they may distinguish between homonyms, such as the French là (»there») versus la (»the»), which are both pronounced . In Gaelic type, a dot over a consonant indicates lenition of the consonant in question. In other alphabetic systems, diacritical marks may perform other functions. Vowel pointing systems, namely the Arabic harakat ( ـَ, ـُ, ـُ, etc.) and the Hebrew niqqud ( ַ, ֶ, ִ, ֹ , ֻ, etc.) systems, indicate sounds (vowels and tones) that are not conveyed by the basic alphabet. The Indic virama ( ् etc.) and the Arabic sukūn ( ـْـ ) mark the absence of a vowel. Cantillation marks indicate prosody. Other uses include the Early Cyrillic titlo ( ◌҃ ) and the Hebrew gershayim ( ״ ), which, respectively, mark abbreviations or acronyms, and Greek diacritical marks, which showed that letters of the alphabet were being used as numerals. In the Hanyu Pinyin official romanization system for Chinese, diacritics are used to mark the tones of the syllables in which the marked vowels occur. In orthography and collation, a letter modified by a diacritic may be treated either as a new, distinct letter or as a letter–diacritic combination. This varies from language to language, and may vary from case to case within a language. In some cases, letters are used as «in-line diacritics» in place of ancillary glyphs, because they modify the sound of the letter preceding them, as in the case of the «h» in English «sh» and «th».
U+00370 Greek Coptic (alphabet over 144 codes. Language greek, coptic in Greece): Greek and Coptic is the Unicode block for representing modern (monotonic) Greek. It was originally used for writing Coptic, using the similar Greek letters, in addition to the uniquely Coptic additions. Beginning with version 4.1 of the Unicode Standard, a separate Coptic block has been included in Unicode, allowing for mixed Greek/Coptic text that is stylistically contrastive, as is convention in scholarly works. Writing polytonic Greek requires the use of combining characters or the precomposed vowel + tone characters in the Greek Extended character block. The Greek alphabet is the script that has been used to write the Greek language since the 8th century BC. It was derived from the earlier Phoenician alphabet, and was the first alphabetic script to have distinct letters for vowels as well as consonants. As such, it became the ancestor of numerous other European and Middle Eastern alphabets, including Latin and Cyrillic. Apart from its use in writing the Greek language, both in its ancient and its modern forms, the Greek alphabet today also serves as a source of technical symbols and labels in many domains of mathematics, science and other fields. In its classical and modern forms, the alphabet has 24 letters, ordered from alpha to omega. Like and Cyrillic, Greek originally had only a single form of each letter; it developed the letter case distinction between upper-case and lower-case forms in parallel with Latin during the modern era. Sound values and conventional transcriptions for some of the letters differ between Ancient Greek and Modern Greek usage, owing to phonological changes in the language. In traditional (»polytonic») Greek orthography, vowel letters can be combined with several diacritics, including accent marks, so-called «breathing» marks, and the iota subscript. In common present-day usage for Modern Greek since the 1980s, this system has been simplified to a so-called «monotonic» convention The Coptic alphabet is the script used for writing the Coptic language. The repertoire of glyphs is based on the Greek alphabet augmented by letters borrowed from the Egyptian Demotic and is the first alphabetic script used for the Egyptian language. There are several alphabets, as the Coptic writing system may vary greatly among the various dialects and subdialects of the Coptic language.
U+00400 Cyrillic (alphabet over 256 codes. Language russian, ukrainian, bulgarian in Russia, Ukraine, Bulgaria, Serbia, Macedonia, Moldova): Cyrillic is a Unicode block containing the characters used to write the most widely used languages with a Cyrillic orthography. The core of the block is based on the ISO 8859-5 standard, with additions for minority languages and historic orthographies. The Cyrillic script /sɨˈrɪlɪk/ is an alphabetic writing system employed across Eastern Europe, North and Central Asian countries. It is based on the Early Cyrillic, which was developed in the First Bulgarian Empire during the 9th century AD at the Preslav Literary School. It is the basis of alphabets used in various languages, past and present, in parts of Southeastern Europe and Northern Eurasia, especially those of Slavic origin, and non-Slavic languages influenced by Russian. As of 2011, around 252 million people in Eurasia use it as the official alphabet for their national languages. About half of them are in Russia. Cyrillic is one of the most used writing systems in the world.Cyrillic is derived from the Greek uncial script, augmented by letters from the older Glagolitic alphabet, including some ligatures. These additional letters were used for sounds not found in Greek. The script is named in honor of the two Byzantine brothers, Saints Cyril and Methodius, who created the Glagolitic alphabet earlier on. Modern scholars believe that Cyrillic was developed and formalized by early disciples of Cyril and Methodius.With the accession of Bulgaria to the European Union on 1 January 2007, Cyrillic became the third official script of the European Union, following the Latin and Greek scripts.
U+00500 Cyrillic Supplement (alphabet over 48 codes. Language komi in Russia): Cyrillic Supplement is a Unicode block containing Cyrillic letters for writing several minority languages, including Abkhaz, Kurdish, Komi, Mordvin, Aleut, Azerbaijani, and Jakovlev´s Chuvash orthography.
U+00530 Armenian (alphabet over 96 codes. Language armenian in Armenia): Armenian is a Unicode block containing characters for writing the Armenian language, both the traditional Western Armenian and reformed Eastern Armenian orthographies. Five Armenian ligatures are encoded in the Alphabetic Presentation Forms block. The Armenian language (classical: հայերէն; reformed: հայերեն hayeren) is an Indo-European language spoken by the Armenians. It is the official language of the Republic of Armenia and the self-proclaimed Nagorno-Karabakh Republic. It has historically been spoken throughout the Armenian Highlands and today is widely spoken in the Armenian diaspora. Armenian has its own unique script, the Armenian alphabet, invented in 405 AD by Mesrop Mashtots.Linguists classify Armenian as an independent branch of the Indo-European language family. It is of interest to linguists for its distinctive phonological developments within the Indo-European languages. Armenian shares a number of major innovations with Greek, and some linguists group these two languages with Phrygian and the Indo-Iranian family into a higher-level subgroup of Indo-European, which is defined by such shared innovations as the augment. More recently, others have proposed a Balkan grouping including Greek, Phrygian, Armenian, and Albanian.Armenia was a monolingual country no later than by the second century BC. Its language has long literary history, with a fifth-century Bible translation as its oldest surviving text. There are two standardized modern literary forms, Eastern Armenian and Western Armenian, with which most contemporary dialects are mutually intelligible.
U+00590 Hebrew (alphabet over 112 codes. Language hebrew, yiddish in Israel): Hebrew is a Unicode block containing characters for writing the Hebrew, Yiddish, Ladino, and other Jewish diaspora languages. Hebrew is a West Semitic language of the Afroasiatic language family. Historically, it is regarded as the language of the Hebrew Israelites and their ancestors, although the language was not referred to by the name Hebrew in the Tanakh. The earliest examples of written Paleo-Hebrew date from the 10th century BCE, in the form of primitive drawings, although «the question of the language used in this inscription remained unanswered, making it impossible to prove whether it was in fact Hebrew or another local language».Hebrew had ceased to be an everyday spoken language somewhere between 200 and 400 CE, declining since the aftermath of the Bar Kochba War. Aramaic and to a lesser extent Greek were already in use as international languages, especially among elites and immigrants. It survived into the medieval period as the language of Jewish liturgy, rabbinic literature, intra-Jewish commerce, and poetry. Then, in the 19th century, it was revived as a spoken and literary language, and, according to Ethnologue, is now the language of 9 million people worldwide, of whom 7 million are from Israel. The United States has the second largest Hebrew speaking population, with about 221,593 fluent speakers, mostly from Israel.Modern Hebrew is one of the two official languages of Israel (the other being Arabic), while pre-modern Hebrew is used for prayer or study in Jewish communities around the world today. Ancient Hebrew is also the liturgical tongue of the Samaritans, while modern Hebrew or Arabic is their vernacular. As a foreign language, it is studied mostly by Jews and students of Judaism and Israel, and by archaeologists and linguists specializing in the Middle East and its civilizations, as well as by theologians in Christian seminaries.The Torah (the first five books), and most of the rest of the Hebrew Bible, is written in Biblical Hebrew, with much of its present form specifically in the dialect that scholars believe flourished around the 6th century BCE, around the time of the Babylonian exile. For this reason, Hebrew has been referred to by Jews as Leshon HaKodesh (לשון הקדש), «The Holy Language», since ancient times.
U+00600 Arabic (alphabet over 256 codes. Language arabic, persian, kurd in Algeria, Bahrain, Egypt, Iraq, Iran, Kuwait, Afghanistan, Pakistan, India, Fiji Islands): Arabic is a Unicode block, containing the standard letters and the most common diacritics of the Arabic script, and the Arabic-Indic digits. The Arabic script is a writing system used for writing several languages of Asia and Africa, such as Arabic, the Sorani and Luri dialects of Kurdish, Persian, Pashto, and Urdu. Even until the 16th century, it was used to write some texts in Spanish. After the Latin script, Chinese characters, and Devanagari, it is the fourth-most widely used writing system in the world.The Arabic script is written from right to left in a cursive style. In most cases the letters transcribe consonants, or consonants and a few vowels, so most Arabic alphabets are abjads.The script was first used to write texts in Arabic, most notably the Qurʼān, the holy book of Islam. With the spread of Islam, it came to be used to write languages of many language families, leading to the addition of new letters and other symbols, with some versions, such as Kurdish, Uyghur, and old Bosnian being abugidas or true alphabets. It is also the basis for a rich tradition of Arabic calligraphy.
U+00750 Arabic Supplement (alphabet over 48 codes. Language arabic, persian, kurd in Algeria, Bahrain, Egypt, Iraq, Iran, Kuwait, Afghanistan, Pakistan, India, Fiji Islands): Arabic Supplement is a Unicode block that encodes Arabic letter variants mostly used for writing African (non-Arabic) languages.
U+00780 Thaana (alphabet over 64 codes. Language maldivian in Maldives): Thaana is a Unicode block containing characters for writing the Dhivehi and Arabic languages in the Maldives. Thaana, Taana or Tāna ( ތާނަ in Tāna script) is the modern writing system of the Maldivian language spoken in the Maldives. Thaana has characteristics of both an abugida (diacritic, vowel-killer strokes) and a true alphabet (all vowels are written), with consonants derived from indigenous and Arabic numerals, and vowels derived from the vowel diacritics of the Arabic abjad. Its orthography is largely phonemic. The Thaana script first appeared in a Maldivian document towards the beginning of the 18th century in a crude initial form known as Gabulhi Thaana which was written scripta continua. This early script slowly developed, its characters slanting 45 degrees, becoming more graceful and spaces were added between words. As time went by it gradually replaced the older Dhives Akuru alphabet. The oldest written sample of the Thaana script is found in the island of Kanditheemu in Northern Miladhunmadulu Atoll. It is inscribed on the door posts of the main Hukuru Miskiy (Friday mosque) of the island and dates back to 1008 AH (AD 1599) and 1020 AH (AD 1611) when the roof of the building were built and the renewed during the reigns of Ibrahim Kalaafaan (Sultan Ibrahim III) and Hussain Faamuladeyri Kilege (Sultan Hussain II) respectively.Thaana, like Arabic, is written right to left. It indicates vowels with diacritic marks derived from Arabic. Each letter must carry either a vowel or a sukun (which indicates «no vowel»). The only exception to this rule is nūnu which, when written without a diacritic, indicates prenasalization of a following stop.
U+007C0 NKo (alphabet over 64 codes. Language nko in Guinea, Mali): NKo is a Unicode block containing characters for the Manding languages of West Africa, including Bamanan, Jula, Maninka, Mandinka, and a common literary language, Kangbe, also called N´Ko. N´Ko is both a script devised by Solomana Kante in 1949, as a writing system for the Manding languages of West Africa, and the name of the literary language itself written in the script. The term N´Ko means I say in all Manding languages. The script has a few similarities to the Arabic script, notably its direction (right-to-left) and the connected letters. It obligatorily marks both tone and vowels.
U+00800 Samaritan (alphabet over 64 codes): Samaritan is a Unicode block containing characters used for writing Samaritan Hebrew and Aramaic. The Samaritan alphabet is used by the Samaritans for religious writings, including the Samaritan Pentateuch, writings in Samaritan Hebrew, and for commentaries and translations in Samaritan Aramaic and occasionally Arabic.Samaritan is a direct descendant of the Paleo-Hebrew alphabet, which was a variety of the Phoenician alphabet in which large parts of the Hebrew Bible were originally penned. All these scripts are believed to be descendants of the Proto-Sinaitic script. That script was used by the ancient Israelites, both Jews and Samaritans. The better-known «square script» Hebrew alphabet traditionally used by Jews is a stylized version of the Aramaic alphabet which they adopted from the Persian Empire (which in turn adopted it from the Arameans). After the fall of the Persian Empire, Judaism used both scripts before settling on the Aramaic form. For a limited time thereafter, the use of paleo-Hebrew (proto-Samaritan) among Jews was retained only to write the Tetragrammaton, but soon that custom was also abandoned.
U+00840 Mandaic (alphabet over 32 codes. Language mandaic): Mandaic is a Unicode block containing characters of the Mandaic script used for writing the historic Eastern Aramaic, also called Classical Mandaic, and the modern Neo-Mandaic language. The Mandaic alphabet is based on the Aramaic alphabet, and is used for writing the Mandaic language. The Mandaic name for the script is Abagada or Abaga, after the first letters of the alphabet. Rather than the ancient Semitic names for the letters (alaph, beth, gimal), the letters are known as â, bâ, gâ and so on.
U+00860 Syriac Supplement
U+008A0 Arabic Extended-A (alphabet over 96 codes): Arabic Extended-A is a Unicode block encoding Qur´anic annotations and letter variants used for various non-Arabic languages.
U+00900 Devanagari (abugida over 128 codes. Language sanskrit, hindi in India, Pakistan, Fiji Islands, Mauritius): Devanagari is a Unicode block containing characters for writing Hindi, Marathi, Sindhi, Nepali and Sanskrit. In its original incarnation, the code points U+0900..U+0954 were a direct copy of the characters A0-F4 from the 1988 ISCII standard. The Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings. Devanagari, also called Nagari, is an abugida alphabet of India and Nepal. It is written from left to right, does not have distinct letter cases, and is recognisable (along with most other North Indic scripts, with a few exceptions like Gujarati and Oriya) by a horizontal line that runs along the top of full letters. Since the 19th century, it has been the most commonly used script for writing Sanskrit. Devanagari is used to write Hindi, Nepali, Marathi, Konkani, Bodo and Maithili among other languages and dialects. It was formerly used to write Gujarati. Because it is the standardised script for the Hindi, Nepali, Marathi, Konkani and Bodo languages, Devanagari is one of the most used and adopted writing systems in the world.
U+00980 Bengali (abugida over 128 codes. Language bengali in India, Bangladesh, Butane): Bengali is a Unicode block containing characters for the Bengali, Assamese, Bishnupriya Manipuri, Daphla, Garo, Hallam, Khasi, Mizo, Munda, Naga, Rian, and Santali languages. In its original incarnation, the code points U+0981..U+09CD were a direct copy of the Bengali characters A1-ED from the 1988 ISCII standard, as well as several Assamese ISCII characters in the U+09F0 column. The Devanagari, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on ISCII encodings. The Bengali alphabet or Bangla alphabet is the writing system for the Bengali language and is the 6th most widely used writing system in the world. The script is shared by Assamese with minor variations, and is the basis for the other writing systems like Meithei and Bishnupriya Manipuri. Historically, the script has also been used to write the Sanskrit language in the region of Bengal.From a classificatory point of view, the Bengali script is an abugida, i.e. its vowel graphemes are mainly realized not as independent letters, but as diacritics attached to its consonant letters. It is written from left to right and lacks distinct letter cases. It is recognizable, as other Brahmic scripts, by a distinctive horizontal line running along the tops of the letters that links them together which is known as matra. The Bengali script is however less blocky and presents a more sinuous shape.
U+00A00 Gurmukhi (abugida over 128 codes. Language punjabi in India, Pakistan): Gurmukhi is a Unicode block containing characters for the Punjabi language, as it is written in India. In its original incarnation, the code points U+0A02..U+0A4C were a direct copy of the Gurmukhi characters A2-EC from the 1988 ISCII standard. The Devanagari, Bengali, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings. Gurmukhi is the most common script used for writing the Punjabi language in India. An abugida derived from the Laṇḍā script and ultimately descended from Brahmi, Gurmukhi was standardised by the second Sikh guru, Guru Angad, in the 16th century. The whole of the Guru Granth Sahib´s 1430 pages are written in this script. The name Gurmukhi is derived from the Old Punjabi term «gurumukhī», meaning «from the mouth of the Guru».Modern Gurmukhi has thirty-eight consonants (vianjan), nine vowel symbols (lāga mātrā), two symbols for nasal sounds (bindī and ṭippī), and one symbol which duplicates the sound of any consonant (addak). In addition, four conjuncts are used: three subjoined forms of the consonants Rara, Haha and Vava, and one half-form of Yayya. Use of the conjunct forms of Vava and Yayya is increasingly scarce in modern contexts.Gurmukhi is primarily used in the Punjab state of India where it is the sole official script for all official and judicial purposes. The script is also widely used in the Indian states of Haryana, Himachal Pradesh, Jammu and the national capital of Delhi, with Punjabi being one of the official languages in these states. Gurmukhi has been adapted to write other languages, such as Braj Bhasha, Khariboli (and other Hindustani dialects), Sanskrit and Sindhi.
U+00A80 Gujarati (abugida over 128 codes in India, Pakistan, Uganda, Tanzania, Kenya): Gujarati is a Unicode block containing characters for writing the Gujarati language. In its original incarnation, the code points U+0A81..U+0AD0 were a direct copy of the Gujarati characters A1-F0 from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Oriya, Tamil, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings. The Gujarati script, which like all Nāgarī writing systems is strictly speaking an abugida rather than an alphabet, is used to write the Gujarati and Kutchi languages. It is a variant of Devanāgarī script differentiated by the loss of the characteristic horizontal line running above the letters and by a small number of modifications in the remaining characters.With a few additional characters, added for this purpose, the Gujarati script is also often used to write Sanskrit and Hindi.Gujarati numerical digits are also different from their Devanagari counterparts.
U+00B00 Oriya (abugida over 128 codes. Language oriya in India): Oriya is a Unicode block containing characters for the Oriya, Khondi, and Santali languages of Orissa state, India. In its original incarnation, the code points U+0B01..U+0B4D were a direct copy of the Oriya characters A1-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gujarati, Gurmukhi, Tamil, Telugu, Kannada and Malayalam blocks were similarly all based on their ISCII encodings. Oriya (oṛiā), officially spelled Odia, is an Indian language belonging to the Indo-Aryan branch of the Indo-European language family. It is the predominant language of the Indian state of Odisha, where native speakers comprise 80% of the population, and it is spoken in parts of West Bengal, Jharkhand, Chhattisgarh and Andhra Pradesh. Oriya is one of the many official languages in India; it is the official language of Odisha and the second official language of Jharkhand. Oriya is the sixth Indian language to be designated a Classical Language in India, on the basis of having a long literary history and not having borrowed extensively from other languages.
U+00B80 Tamil (abugida over 128 codes. Language tamil, sanskrit in India, Sri-Lanka, Singapore, Malaysia, Kenya): Tamil is a Unicode block containing characters for the Tamil, Badaga, and Saurashtra languages of Tamil Nadu India, Sri Lanka, Singapore, and Malaysia. In its original incarnation, the code points U+0B02..U+0BCD were a direct copy of the Tamil characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gujarati, Gurmukhi, Oriya, Telugu, Kannada and Malayalam blocks were similarly all based on their ISCII encodings. The Tamil script (tamiḻ ariccuvaṭi) is an abugida script that is used by the Tamil people in India, Sri Lanka, Malaysia and elsewhere, to write the Tamil language, as well as to write the liturgical language Sanskrit, using consonants and diacritics not represented in the Tamil alphabet. Certain minority languages such as Saurashtra, Badaga, Irula, and Paniya are also written in the Tamil script.
U+00C00 Telugu (abugida over 128 codes. Language telugu in India): Telugu is a Unicode block containing characters for the Telugu, Gondi, and Lambadi languages of Andhra Pradesh, India. In its original incarnation, the code points U+0C01..U+0C4D were a direct copy of the Telugu characters A1-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gujarati, Gurmukhi, Oriya, Tamil, Kannada and Malayalam blocks were similarly all based on their ISCII encodings. Telugu script, an abugida from the Brahmic family of scripts, is used to write the Telugu language, a language found in the South Indian states of Andhra Pradesh and Telangana as well as several other neighbouring states. It gained prominence during the Vengi Chalukyan era. It shares high similarity with its sibling Kannada script.
U+00C80 Kannada (abugida over 128 codes. Language dravidian in India, Goa): Kannada is a Unicode block containing characters for the Kannada and Tulu languages. In its original incarnation, the code points U+0C82..U+0CCD were a direct copy of the Kannada characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gujarati, Gurmukhi, Oriya, Tamil, Telugu, and Malayalam blocks were similarly all based on their ISCII encodings. The Kannada alphabet is an abugida of the Brahmic family, used primarily to write the Kannada language, one of the Dravidian languages of southern India. Several minor languages, such as Tulu, Konkani, Kodava, and Beary, also use alphabets based on the Kannada script. The Kannada and Telugu scripts share high mutual intellegibility with each other, and are often considered to be regional variants of single script. Similarly, Goykanadi, a variant of Old Kannada, has been historically used to write Konkani in the state of Goa.
U+00D00 Malayalam (abugida over 128 codes. Language dravidian in India, Goa): Malayalam is a Unicode block containing characters for the Malayalam language. In its original incarnation, the code points U+0D02..U+0D4D were a direct copy of the Malayalam characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gujarati, Gurmukhi, Oriya, Tamil, Telugu and Kannada blocks were similarly all based on their ISCII encodings. The Malayalam script (Malayāḷalipi; IPA: ), also known as Kairali script, is a Brahmic script used commonly to write the Malayalam language—which is the principal language of the Indian state of Kerala, spoken by 35 million people in the world. Like many other Indic scripts, it is an alphasyllabary (abugida), a writing system that is partially “alphabetic” and partially syllable-based. The modern Malayalam alphabet has 15 vowel letters, 41 consonant letters, and a few other symbols. The Malayalam script is a Vattezhuttu script, which had been extended with Grantha script symbols to represent Indo-Aryan loanwords. The script is also used to write several minority languages such as Paniya, Betta Kurumba, and Ravula. The Malayalam language itself was historically written in several different scripts.
U+00D80 Sinhala (abugida over 128 codes. Language sinhalese, sanskrit in Sri-Lanka, India): Sinhala is a Unicode block containing characters for the Sinhala and Pali languages of Sri Lanka, and is also used for writing Sanskrit in Sri Lanka. The Sinhala allocation is loosely based on the ISCII standard, except that Sinhala contains extra prenasalized consonant letters, leading to inconsistencies with other ISCII-Unicode script allocations. The Sinhalese alphabet is an abugida used by the Sinhala people in Sri Lanka and elsewhere to write the Sinhala language and also the liturgical languages Pali and Sanskrit. Being a member of the Brahmic family of scripts, the Sinhalese script can trace its ancestry back more than 2,000 years.Sinhalese is often considered two alphabets, or an alphabet within an alphabet, due to the presence of two sets of letters. The core set, known as the śuddha siṃhala or eḷu hōḍiya, can represent all native phonemes. In order to render Sanskrit and Pali words, an extended set, the miśra siṃhala, is available.
U+00E00 Thai (abugida over 128 codes. Language thai in Thailand, India): Thai is a Unicode block containing characters for the Thai, Lanna Tai, and Pali languages. It is based on the Thai Industrial Standards 620-2529 and 620-2533. Thai script (Thai: อักษรไทย; rtgs: akson thai; ʔàksɔ̌ːn tʰāj) is used to write the Thai language and other languages in Thailand. It has 44 consonant letters (Thai: พยัญชนะ, phayanchana), 15 vowel symbols (Thai: สระ, sara) that combine into at least 28 vowel forms, and four tone diacritics (Thai: วรรณยุกต์ or วรรณยุต, wannayuk or wannayut).Although commonly referred to as the «Thai alphabet», the character set is in fact not a true alphabet but an abugida, a writing system in which each consonant may invoke an inherent vowel sound. In the case of the Thai script this is an implied ´a´ or ´o´. Consonants are written horizontally from left to right, with vowels arranged above, below, to the left, or to the right of the corresponding consonant, or in a combination of positions.Thai has its own set of Thai numerals which are based on the Hindu Arabic numeral system (Thai: เลขไทย, lek thai), but the standard western Hindu-Arabic numerals (Thai: เลขฮินดูอารบิก, lek hindu arabik) are also commonly used.
U+00E80 Lao (abugida over 128 codes. Language laotian in Thailand, Laos): Lao is a Unicode block containing characters for the languages of Laos. The characters of the Lao block allocated to be equivalent to the similarly positioned characters of the Thai block immediately preceding it. The Lao alphabet, Akson Lao (Lao: ອັກສອນລາວ ʔáksɔ̌ːn láːw), is the main script used to write the Lao language and other minority languages in Laos. It is ultimately of Indic origin, the alphabet includes 27 consonants (ພະຍັນຊະນະ ), 7 consonantal ligatures (ພະຍັນຊະນະປະສົມ ), 33 vowels (ສະຫລະ ) (some based on combinations of symbols), and 4 tone marks (ວັນນະຍຸດ ). According to Article 89 of Amended Constitution of 2003 of the Lao People´s Democratic Republic, the Lao alphabet is the official script to the official language, but is also used to transcribe minority languages in the country, but some minority language speakers continue to use their traditional writing systems while the Hmong have adopted the Roman Alphabet. An older version of the script was also used by the ethnic Lao of Thailand´s Isan region, who make up a third of Thailand´s population, before Isan was incorporated into Siam, until its use was banned and supplemented with the very similar Thai alphabet in 1871, although the region remained distant culturally and politically until further government campaigns and integration into the Thai state (Thaification) were imposed in the 20th century. The letters of the Lao Alphabet are very similar to the Thai alphabet, which has the same roots. They differ in the fact, that in Thai there are still more letters to write one sound and the more circular style of writing in Lao.Lao, like most indic scripts, is traditionally written from left to right. Traditionally considered an abugida script, where certain ´implied´ vowels are unwritten, recent spelling reforms make this definition somewhat problematic, as all vowel sounds today are marked with diacritics when written according the Lao PDR´s propagated and promoted spelling standard. However most Lao outside of Laos, and many inside Laos, continue to write according to former spelling standards, which continues the use of the implied vowel maintaining the Lao script´s status as an abugida. Vowels can be written above, below, in front of, or behind consonants, with some vowel combinations written before, over and after. Spaces for separating words and punctuations were traditionally not used, but a space is used and functions in place of a comma or period. The letters have no majuscule or minuscule (upper and lower case) differentiations.
U+00F00 Tibetan (abugida over 256 codes. Language tibetan in China, India, Butane, Nepal, Pakistan): Tibetan is a Unicode block containing characters for the Tibetan, Dzongkha, and other languages of Tibet, Bhutan, Nepal, and northern India. The Tibetan Unicode block is unique for having been allocated as a standard virama-based encoding for version 1.0, removed from the Unicode Standard when unifying with ISO 10646 for version 1.1, then reintroduced as an explicit root/subjoined encoding, with a larger block size in version 2.0. The Tibetan alphabet is an abugida of Indic origin used to write the Tibetan language as well as Dzongkha, the Sikkimese language, Ladakhi, and sometimes Balti. The printed form of the alphabet is called uchen script (Wylie: dbu-can; «with a head») while the hand-written cursive form used in everyday writing is called umê script (Wylie: dbu-med; «headless»).The alphabet is very closely linked to a broad ethnic Tibetan identity. Besides Tibet, it has also been used for Tibetan languages in Bhutan, India, Nepal, and Pakistan. The Tibetan alphabet is ancestral to the Limbu alphabet, the Lepcha alphabet, and the multilingual ´Phags-pa script.The Tibetan alphabet is romanized in a variety of ways. This article employs the Wylie transliteration system.
U+01000 Myanmar (abugida over 160 codes. Language burmese in Myanmar, Thailand, Bangladesh, Malaysia): The Burmese script (MLCTS: mranma akkha.ra; pronounced: ) is an abugida in the Brahmic family, used for writing Burmese. It is an adaptation of the Old Mon script or the Pyu script. In recent decades, other alphabets using the Mon script, including Shan and Mon itself, have been restructured according to the standard of the now-dominant Burmese alphabet. Besides the Burmese language, the Burmese alphabet is also used for the liturgical languages of Pali and Sanskrit.The characters are rounded in appearance because the traditional palm leaves used for writing on with a stylus would have been ripped by straight lines. It is written from left to right and requires no spaces between words, although modern writing usually contains spaces after each clause to enhance readability.The earliest evidence of the Burmese alphabet is dated to 1035, while a casting made in the 18th century of an old stone inscription points to 984. Burmese calligraphy originally followed a square format but the cursive format took hold from the 17th century when popular writing led to the wider use of palm leaves and folded paper known as parabaiks. The alphabet has undergone considerable modification to suit the evolving phonology of the Burmese language.There are several systems of transliteration into the Latin alphabet; for this article, the MLC Transcription System is used.
U+010A0 Georgian (alphabet over 96 codes. Language georgian, abkhazian in Georgia, Abkhazia): Georgian is a Unicode block containing the Mkhedruli and Asomtavruli Georgian characters used to write Modern Georgian, Svan, and Mingrelian languages. Another lower case, Nuskhuri, is encoded in a separate Georgian Supplement block, which is used with the Asomtavruli to write the ecclesiastical Khutsuri Georgian script. The Georgian scripts are the three writing systems used to write the Georgian language: Asomtavruli, Nuskhuri and Mkhedruli. Their letters are equivalent, sharing the same names and alphabetical order and all three are unicameral (make no distinction between upper and lower case). Although each continues to be used, Mkhedruli (see below) is taken as the standard for Georgian and its related Kartvelian languages.The scripts originally had 38 letters. Georgian is currently written in a 33-letter alphabet, as five of the letters are obsolete in that language. The Mingrelian alphabet uses 36: the 33 of Georgian, one letter obsolete for that language, and two additional letters specific to Mingrelian and Svan. That same obsolete letter, plus a letter borrowed from Greek, are used in the 35-letter Laz alphabet. The fourth Kartvelian language, Svan, is not commonly written, but when it is it uses the letters of the Mingrelian alphabet, with an additional obsolete Georgian letter and sometimes supplemented by diacritics for its many vowels.
U+01100 Hangul Jamo (abugida over 256 codes. Language korean in North Korea, South Korea, China, Japan, Indonesia): Hangul Jamo is a Unicode block containing positional (Choseong, Jungseong, and Jongseong) forms of the Hangul consonant and vowel clusters. They can be used to dynamically compose syllables that are not available as precomposed Hangul syllables in Unicode, specifically archaic syllables containing sounds that have since merged phonetically with other sounds in modern pronunciation.
U+01200 Ethiopic (abugida over 384 codes in Ethiopia, Eritrea, Somalia, Sudan, Israel): Ethiopic is a Unicode block containing characters for writing the Ge´ez, Tigrinya, Amharic, Tigre, and Oromo languages.Ge´ez (ግዕዝ Gəʿəz), (also known as Ethiopic) is a script used as an abugida (syllable alphabet) for several languages of Ethiopia and Eritrea. It originated as an abjad (consonant-only alphabet) and was first used to write Ge´ez, now the liturgical language of the Ethiopian Orthodox Tewahedo Church and the Eritrean Orthodox Tewahedo Church. In Amharic and Tigrinya, the script is often called fidäl (ፊደል), meaning «script» or «alphabet».The Ge´ez script has been adapted to write other, mostly Semitic, languages, particularly Amharic in Ethiopia, and Tigrinya in both Eritrea and Ethiopia. It is also used for Sebatbeit, Me´en, and most other languages of Ethiopia. In Eritrea it is used for Tigre, and it has traditionally been used for Blin, a Cushitic language. Tigre, spoken in western and northern Eritrea, is considered to resemble Ge´ez more than do the other derivative languages. Some other languages in the Horn of Africa, such as Oromo, used to be written using Ge´ez, but have migrated to Latin-based orthographies.For the representation of sounds, this article uses a system that is common (though not universal) among linguists who work on Ethiopian Semitic languages. This differs somewhat from the conventions of the International Phonetic Alphabet.
U+01380 Ethiopic Supplement (abugida over 32 codes): Ethiopic Supplement is a Unicode block containing extra Ge´ez characters for writing the Sebatbeit language, and Ethiopic tone marks.
U+013A0 Cherokee (syllabary over 96 codes. Language cherokee in USA): Cherokee is a Unicode block containing the syllabic characters for writing the Cherokee language. The Cherokee syllabary is a syllabary invented by Sequoyah to write the Cherokee language in the late 1810s and early 1820s. His creation of the syllabary is particularly noteworthy in that he could not previously read any script. He first experimented with logograms, but his system later developed into a syllabary. In his system, each symbol represents a syllable rather than a single phoneme; the 85 (originally 86) characters in the Cherokee syllabary provide a suitable method to write Cherokee. Some symbols do resemble the Latin, Greek and even the Cyrillic scripts´ letters, but the sounds are completely different (for example, the sound /a/ is written with a letter that resembles Latin D).
U+01400 Unified Canadian Aboriginal Syllabics (abugida over 640 codes. Language cree in Canada, USA): Unified Canadian Aboriginal Syllabics is a Unicode block containing characters for writing Inuktitut, Carrier, several dialects of Cree, and Canadian Athabascan languages. Additions for some Cree dialects, Ojibwe, and Dene can be found at the Unified Canadian Aboriginal Syllabics Extended block. Canadian Aboriginal syllabic writing, or simply syllabics, is a family of abugidas (consonant-based alphabets) used to write a number of Aboriginal Canadian languages of the Algonquian, Inuit, and (formerly) Athabaskan language families. They are valued for their distinctiveness from the Latin script of the dominant languages and for the ease with which literacy can be achieved; indeed, by the late 19th century the Cree had achieved what may have been one of the highest rates of literacy in the world.Canadian syllabics are currently used to write all of the Cree languages from Naskapi (spoken in Quebec) to the Rocky Mountains, including Eastern Cree, Woods Cree, Swampy Cree and Plains Cree. They are also used to write Inuktitut in the eastern Canadian Arctic; there they are co-official with the Latin script in the territory of Nunavut. They are used regionally for the other large Canadian Algonquian language, Ojibwe in Western Canada, as well as for Blackfoot, where they are obsolete. Among the Athabaskan languages further to the west, syllabics have been used at one point or another to write Dakelh (Carrier), Chipewyan, Slavey, Tłı̨chǫ (Dogrib) and Dane-zaa (Beaver). Syllabics have occasionally been used in the United States by communities that straddle the border, but are principally a Canadian phenomenon.
U+01600 Unified Canadian Aboriginal Syllabics (abugida over 640 codes. Language cree in Canada, USA): Unified Canadian Aboriginal Syllabics is a Unicode block containing characters for writing Inuktitut, Carrier, several dialects of Cree, and Canadian Athabascan languages. Additions for some Cree dialects, Ojibwe, and Dene can be found at the Unified Canadian Aboriginal Syllabics Extended block. Canadian Aboriginal syllabic writing, or simply syllabics, is a family of abugidas (consonant-based alphabets) used to write a number of Aboriginal Canadian languages of the Algonquian, Inuit, and (formerly) Athabaskan language families. They are valued for their distinctiveness from the Latin script of the dominant languages and for the ease with which literacy can be achieved; indeed, by the late 19th century the Cree had achieved what may have been one of the highest rates of literacy in the world.Canadian syllabics are currently used to write all of the Cree languages from Naskapi (spoken in Quebec) to the Rocky Mountains, including Eastern Cree, Woods Cree, Swampy Cree and Plains Cree. They are also used to write Inuktitut in the eastern Canadian Arctic; there they are co-official with the Latin script in the territory of Nunavut. They are used regionally for the other large Canadian Algonquian language, Ojibwe in Western Canada, as well as for Blackfoot, where they are obsolete. Among the Athabaskan languages further to the west, syllabics have been used at one point or another to write Dakelh (Carrier), Chipewyan, Slavey, Tłı̨chǫ (Dogrib) and Dane-zaa (Beaver). Syllabics have occasionally been used in the United States by communities that straddle the border, but are principally a Canadian phenomenon.
U+01680 Ogham (alphabet over 32 codes. Language primitive irish, pictish in Scotland, Ireland, Wales): Ogham is a Unicode block containing characters for representing Old Irish inscriptions. Ogham /ˈɒɡəm/ (Modern Irish ˈoːmˠ or ˈoːəmˠ; Old Irish: ogam ˈɔɣamˠ) is an Early Medieval alphabet used primarily to write the early Irish language (in the so-called «orthodox» inscriptions, 4th to 6th centuries), and later the Old Irish language (so-called scholastic ogham, 6th to 9th centuries). There are roughly 400 surviving orthodox inscriptions on stone monuments throughout Ireland and western Britain; the bulk of them are in the south of Ireland, in Counties Kerry, Cork and Waterford. The largest number outside of Ireland is in Pembrokeshire in Wales.The vast majority of the inscriptions consist of personal names.According to the High Medieval Bríatharogam, names of various trees can be ascribed to individual letters.The etymology of the word ogam or ogham remains unclear. One possible origin is from the Irish og-úaim ´point-seam´, referring to the seam made by the point of a sharp weapon.
U+016A0 Runic (alphabet over 96 codes. Language old italic, runic): Runic is a Unicode block containing characters for writing Futhark runic inscriptions. Although many of the characters appear similar, they should not be confused with the J.R.R. Tolkien-designed Cirth, which has a separate ConScript Unicode Registry encoding. However, in Unicode 7.0 some additional Runic characters were added, including three Runic characters that were used only by Tolkien, for example in the maps of Hobbit: these are different from Cirth. Runes (Proto-Norse: ᚱᚢᚾᛟ (runo), Old Norse: rún) are the letters in a set of related alphabets known as runic alphabets, which were used to write various Germanic languages before the adoption of the Latin alphabet and for specialised purposes thereafter. The Scandinavian variants are also known as futhark or fuþark (derived from their first six letters of the alphabet: F, U, Þ, A, R, and K); the Anglo-Saxon variant is futhorc or fuþorc (due to sound changes undergone in Old English by the names of those six letters). Runology is the study of the runic alphabets, runic inscriptions, runestones, and their history. Runology forms a specialised branch of Germanic linguistics. The earliest runic inscriptions date from around 150 AD. The characters were generally replaced by the Latin alphabet as the cultures that had used runes underwent Christianisation, by approximately 700 AD in central Europe and 1100 AD in northern Europe. However, the use of runes persisted for specialized purposes in northern Europe. Until the early 20th century, runes were used in rural Sweden for decorative purposes in Dalarna and on Runic calendars. The three best-known runic alphabets are the Elder Futhark (around 150–800 AD), the Anglo-Saxon Futhorc (400–1100 AD), and the Younger Futhark (800–1100 AD). The Younger Futhark is divided further into the long-branch runes (also called Danish, although they were also used in Norway and Sweden); short-branch or Rök runes (also called Swedish-Norwegian, although they were also used in Denmark); and the stavlösa or Hälsinge runes (staveless runes). The Younger Futhark developed further into the Marcomannic runes, the Medieval runes (1100–1500 AD), and the Dalecarlian runes (around 1500–1800 AD). Historically, the runic alphabet is a derivation of the Old Italic alphabets of antiquity, with the addition of some innovations. Which variant of the Old Italic family in particular gave rise to the runes is uncertain. Suggestions include Raetic, Etruscan, or Old Latin as candidates. At the time, all of these scripts had the same angular letter shapes suited for epigraphy, which would become characteristic of the runes. The process of transmission of the script is unknown. The oldest inscriptions are found in Denmark and northern Germany, not near Italy. A «West Germanic hypothesis» suggests transmission via Elbe Germanic groups, while a «Gothic hypothesis» presumes transmission via East Germanic expansion.
U+01700 Tagalog (abugida over 32 codes. Language tagalog, runic in North Philippines): Tagalog is a Unicode block containing characters of the pre-Spanish Philippine Baybayin script used for writing the Tagalog language. Tagalog /təˈɡɑːlɒɡ/ (Tagalog: ) is an Austronesian language spoken as a first language by a quarter of the population of the Philippines and as a second language by the majority. It is the first language of the Philippine region IV (CALABARZON and MIMAROPA), of Bulacan and of Metro Manila. Its standardized form, officially named Filipino, is the national language and one of two official languages of the Philippines, the other being English.It is related to other Philippine languages such as the Bikol languages, Ilokano, the Visayan languages, and Kapampangan, and more distantly to other Austronesian languages such as Indonesian, Hawaiian and Malagasy.
U+01720 Hanunoo (abugida over 32 codes. Language hanunoo in Philippines): Hanunoo is a Unicode block containing characters used for writing the Hanunó´o language. Hanunó’o is one of the indigenous scripts of the Philippines and is used by the Mangyan peoples of southern Mindoro to write the Hanunó´o language. It is an abugida descended from the Brahmic scripts, closely related to Baybayin, and is famous for being written vertical but written upward, rather than downward as nearly all other scripts (however, it´s read horizontally left to right). It is usually written on bamboo by incising characters with a knife. Most known Hanunó´o inscriptions are relatively recent because of the perishable nature of bamboo. It is therefore difficult to trace the history of the script.
U+01740 Buhid (abugida over 32 codes. Language buhid in Philippines): Buhid is a Unicode block containing characters for writing the Buhid language of the Philippines. Buhid is a Brahmic script of the Philippines, closely related to Baybayin, and is used today by the Mangyans to write their language, Buhid.
U+01760 Tagbanwa (abugida over 32 codes in Philippines): Tagbanwa is a Unicode block containing characters for writing the Tagbanwa languages. Tagbanwa, also known as Apurahuano, is one of the writing systems of the Philippines. The Tagbanwa languages (Aborlan, Calamian, and Central), which are Austronesian languages with about 8,000 speakers in the central and northern regions of Palawan, are dying out as the younger generations of Tagbanwa are learning Cuyonon and Tagalog.
U+01780 Khmer (abugida over 128 codes. Language khmer in Cambodia, Vietnam, Thailand, Laos, China): Khmer is a Unicode block containing characters for writing the Khmer, or Cambodian, language. The Khmer alphabet or Khmer script (IPA: ʔaʔsɑː kʰmaːe) is an abugida (alphasyllabary) script used to write the Khmer language (the official language of Cambodia). It is also used to write Pali in the Buddhist liturgy of Cambodia and Thailand.It was adapted from the Pallava script, a variant of the Grantha alphabet descended from the Brahmi script, which was used in southern India and South East Asia during the 5th and 6th centuries AD. The oldest dated inscription in Khmer was found at Angkor Borei District in Takéo Province south of Phnom Penh and dates from 611. The modern Khmer script differs somewhat from precedent forms seen on the inscriptions of the ruins of Angkor. The Thai and Lao scripts are descended from an older form of the Khmer script.Ancient Khmer script engraved on stone.Khmer is written from left to right. Words within the same sentence or phrase are generally run together with no spaces between them. Consonant clusters within a word are «stacked», with the second (and occasionally third) consonant being written in reduced form under the main consonant. Originally there were 35 consonant characters, but modern Khmer uses only 33. Each such character in fact represents a consonant sound together with an inherent vowel – either â or ô.There are some independent vowel characters, but vowel sounds are more commonly represented as dependent vowels – additional marks accompanying a consonant character, and indicating what vowel sound is to be pronounced after that consonant (or consonant cluster). Most dependent vowels have two different pronunciations, depending in most cases on the inherent vowel of the consonant to which they are added. In some positions, a consonant written with no dependent vowel is taken to be followed by the sound of its inherent vowel. There are also a number of diacritics used to indicate further modifications in pronunciation. The script also includes its own numerals and punctuation marks.
U+01800 Mongolian (alphabet over 176 codes. Language mongolian in China, Mongolia, Afghanistan): Mongolian is a Unicode block containing characters for dialects of Mongolian, Manchu, and Sibe languages. It is traditionally written in vertical lines Top-Down, right across the page, although the Unicode code charts cite the characters rotated to horizontal orientation. Many alphabets have been devised for the Mongolian language over the centuries, and from a variety of scripts. The oldest, called simply the Mongolian script, has been the predominant script during most of Mongolian history, and is still in active use today in the Inner Mongolia region of China. It has spawned several alphabets, either as attempts to fix its perceived shortcomings, or to allow the notation of other languages, such as Sanskrit and Tibetan. In the 20th century, Mongolia first switched to the Latin script, and then almost immediately replaced it with the Cyrillic script for compatibility with the Soviet Union, its political ally of the time. Mongols in Inner Mongolia and other parts of China, on the other hand, continue to use alphabets based on the traditional Mongolian script.
U+018B0 Unified Canadian Aboriginal Syllabics Extended (abugida over 80 codes. Language cree in Canada, USA): Unified Canadian Aboriginal Syllabics Extended is a Unicode block containing extensions to the Canadian syllabics contained in the Unified Canadian Aboriginal Syllabics Unicode block for some dialects of Cree, Ojibwe, Dene, and Carrier.
U+01900 Limbu (abugida over 80 codes. Language limbu in Nepal, India, Kashmir, Pakistan): The Limbu script is used to write the Limbu language. The Limbu script is an abugida derived from the Tibetan script.
U+01950 Tai Le (abugida over 48 codes in Vietnam, Laos, Myanmar, Thailand): Tai Le is the name of Tai Nüa script, the script used for the Tai Nüa language.Tai Nüa (Tai Nüa: ᥖᥭᥰᥖᥬᥳᥑᥨᥒᥰ) (also called Tai Nɯa, Dehong Dai, or Chinese Shan; own name: Tai2 Lə6, which means «upper Tai» or «northern Tai», or ᥖᥭᥰᥖᥬᥳᥑᥨᥒᥰ ; Chinese: Dǎinǎyǔ 傣哪语 or Déhóng Dǎiyǔ 德宏傣语; Thai: ภาษาไทเหนือ, pronounced or ภาษาไทใต้คง, pronounced ) is one of the languages spoken by the Dai people in China, especially in the Dehong Dai and Jingpo Autonomous Prefecture in the southwest of Yunnan province. It is closely related to the other Tai languages. Speakers of this language across the border in Myanmar are known as Shan. It should not be confused with Tai Lü (Xishuangbanna Dai). There are also Tai Nüa speakers in Thailand.
U+01980 New Tai Lue (alphabet over 96 codes in Vietnam, Laos, Myanmar, Thailand): New Tai Lue is a Unicode block containing characters for writing the Tai Lü language. New Tai Lue script, also known as Simplified Tai Lue, is an alphabet used to write the Tai Lü language. Developed in China in the 1950s, New Tai Lue is based on the traditional Tai Le alphabet developed ca. 1200 AD. The government of China promoted the alphabet for use as a replacement for the older script; teaching the script was not mandatory, however, and as a result many are illiterate in New Thai Lue. In addition, communities in Burma, Laos, Thailand and Vietnam still use the Tai Le alphabet.
U+019E0 Khmer Symbols (32 codes): Khmer Symbols is a Unicode block containing lunar date symbols, used in the writing system of the Khmer (Cambodian) language.
U+01A00 Buginese (abugida over 32 codes. Language buginese, makassarese, mandar in Indonesia): Buginese is a Unicode block containing characters for writing the Buginese language of Sulawesi. The Lontara script is a Brahmic script traditionally used for the Bugis, Makassarese, and Mandar languages of Sulawesi in Indonesia. It is also known as the Buginese script, as Lontara documents written in this language are the most numerous. It was largely replaced by the Latin alphabet during the period of Dutch colonization, though it is still used today to a limited extent. The term Lontara is derived from the Malay name for palmyra palm, lontar, whose leaves are traditionally used for manuscripts. In Buginese, this script is called urupu sulapa eppa which means «four-cornered letters», referencing the Bugis-Makasar belief of the four elements that shaped the universe: fire, water, air, and earth.
U+01A20 Tai Tham (abugida over 144 codes in Thailand, Burma, Laos, Cambodia, China, Vietnam): Tai Tham is a Unicode block containing characters of the Lanna script used for writing the Northern Thai (Kam Mu´ang), Tai Lü, and Khün languages. The Tai Tham script (Northern Thai pronunciation: , tua mueanɡ; Tai Lü: ᦒᧄ , Tham, «scripture»), also known as the Lanna script or Tua Mueang, is used for three living languages: Northern Thai (that is, Kham Mueang), Tai Lü and Khün. In addition, the Lanna script is used for Lao Tham (or old Lao) and other dialect variants in Buddhist palm leaves and notebooks. The script is also known as Tham or Yuan script.The Northern Thai language is a close relative of Thai and member of the Chiang Saeng language family. It is spoken by nearly 6,000,000 people in Northern Thailand and several thousand in Laos of whom few are literate in Lanna script. The script is still read by older monks. Northern Thai has six linguistic tones and Thai only five, making transcription into the Thai alphabet problematic. There is some resurgent interest in the script among younger people, but an added complication is that the modern spoken form, called Kammuang, differs in pronunciation from the older form.There are 670,000 speakers of Tai Lü of whom those born before 1950 are literate in Lanna script. The script has also continued to be taught in the monasteries. There are 120,000 speakers of Khün for which Lanna is the only script.
U+01AB0 Combining Diacritical Marks Extended (80 codes): Combining Diacritical Marks Extended is a Unicode block containing diactritical marks used in German dialectology.
U+01B00 Balinese (abugida over 128 codes. Language balinese, sasak in Bali, Lombok, Indonesia): Balinese is a Unicode block containing characters for the basa Bali language. The Balinese script, natively known as Aksara Bali and Hanacaraka, is an abugida used in the island of Bali, Indonesia, commonly for writing the Austronesian Balinese language, Old Javanese, and the liturgical language Sanskrit. With some modifications, the script is also used to write the Sasak language, used in the neighboring island of Lombok. The script is a descendant of the Brahmi script, and so has many similarities with the modern scripts of South and Southeast Asia. The Balinese script, along with the Javanese script, is considered the most elaborate and ornate among Brahmic scripts of Southeast Asia.Though everyday use of the script has largely been supplanted by the Latin alphabet, the Balinese script has significant prevalence in many of the island´s traditional ceremonies and is strongly associated with the Hindu religion. The script is mainly used today for copying lontar or palm leaf manuscripts containing religious texts.
U+01B80 Sundanese (abugida over 64 codes. Language sundanese in Indonesia, Jakarta): Sundanese is a Unicode block containing modern characters for writing the Sundanese language of the island of Java. Sundanese script (Aksara Sunda) is a writing system which is used by the Sundanese people. It is built based on Old Sundanese script (Aksara Sunda Kuno) which was used by the ancient Sundanese between the 14th and 18th centuries.
U+01BC0 Batak (abugida over 64 codes. Language batak in Indonesia): Batak is a Unicode block containing characters for writing the Batak dialects of Karo, Mandailing, Pakpak, Simalungun, and Toba. The Batak script, natively known as surat Batak, surat na sapulu sia (the nineteen letters), or si-sia-sia, is an abugida used to write the Austronesian Batak languages spoken by several million people on the Indonesian island of Sumatra. The script may derived from the Kawi and Pallava script, ultimately derived from the Brahmi script of India, or from the hypothetical Proto-Sumatran script influenced by Pallava.
U+01C00 Lepcha (abugida over 80 codes. Language lepcha in India, Butane, Nepal): Lepcha is a Unicode block containing characters for writing the Lepcha language of Sikkim and West Bengal, India. The Lepcha script, or Róng script is an abugida used by the Lepcha people to write the Lepcha language. Unusually for an abugida, syllable-final consonants are written as diacritics.
U+01C50 Ol Chiki (alphabet over 48 codes. Language santali in India, Butane, Nepal, Bangladesh): Ol Chik is a Unicode block containing characters of the Ol Chiki, or Ol Cemet´ script used for writing the Santali language during the early 20th century. The Ol Chiki script, also known as Ol Cemetʼ (Santali: ol ´writing´, cemet ´ ´learning´), Ol Ciki, Ol, and sometimes as the Santali alphabet, was created in 1925 by Raghunath Murmu for the Santali language.Previously, Santali had been written with the Latin alphabet. But because Santali is not an Indo-Aryan language (like most other languages in the south of India), Indic scripts did not have letters for all of Santali´s phonemes, especially its stop consonants and vowels, which made writing the language accurately in an unmodified Indic script difficult. The detailed analysis was given by Dr. Byomkes Chakrabarti in his ´Comparative Study of Santali and Bengali´. Missionaries (first of all Paul Olaf Bodding, a Norwegian) brought the Latin script, which is better at representing Santali stops, phonemes and nasal sounds with the use of diacritical marks and accents. Unlike most Indic scripts, which are derived from Brahmi, Ol Chiki is not an abugida, with vowels given equal representation with consonants. Additionally, it was designed specifically for the language, but one letter could not be assigned to each phoneme because the sixth vowel in Ol Chiki is still problematic.Ol Chiki has 30 letters, the forms of which are intended to evoke natural shapes. Linguist Norman Zide said «The shapes of the letters are not arbitrary, but reflect the names for the letters, which are words, usually the names of objects or actions representing conventionalized form in the pictorial shape of the characters.» It is written from left to right.
U+01C80 Cyrillic Supplement (alphabet over 48 codes. Language komi in Russia): Cyrillic Supplement is a Unicode block containing Cyrillic letters for writing several minority languages, including Abkhaz, Kurdish, Komi, Mordvin, Aleut, Azerbaijani, and Jakovlev´s Chuvash orthography.
U+01CC0 Sundanese Supplement (abugida over 16 codes. Language sundanese): Sundanese Supplement is a Unicode block containing punctuation characters for Sundanese.
U+01CD0 Vedic Extensions (48 codes): Vedic Extensions is a Unicode block containing characters for representing tones and other vedic symbols in Devanagari and other Indic scripts.
U+01D00 Phonetic Extensions (128 codes): Phonetic Extensions is a Unicode block containing phonetic characters used in the Uralic Phonetic Alphabet, Old Irish phonetic notation, the Oxford English dictionary and American dictionaries, and Americanist and Russianist phonetic notations. Its character set is continued in the following Unicode block, Phonetic Extensions Supplement.
U+01D80 Phonetic Extensions Supplement (64 codes): Phonetic Extensions Supplement is a Unicode block containing characters for specialized and deprecated forms of the International Phonetic Alphabet.
U+01DC0 Combining Diacritical Marks Supplement (64 codes): Combining Diacritical Marks Supplement is a Unicode block containing combining characters for the Uralic Phonetic Alphabet and Medievalist notations. It is an extension of the diacritic characters found in the Combining Diacritical Marks block.
U+01E00 Latin Extended Additional (256 codes in England, USA, Germany, France, Italy, Poland): Latin Extended Additional is a block of the Unicode standard. The characters in this block are mostly precomposed combinations of Latin letters with one or more general diacritical marks. There are also a few Medievalist characters.
U+01F00 Greek Extended (256 codes in Greece): Greek Extended is a Unicode block containing the accented vowels necessary for writing polytonic Greek. The regular, unaccented Greek characters can be found in the Greek and Coptic (Unicode block). Greek Extended was encoded in version 1.1 of the Unicode Standard as is, having had no additions up to 6.2. As an alternative to Greek Extended, combining characters can be used to represent the tones and breath marks of polytonic Greek.
1: Text | 2: Symbol | 3: Asien | 4: Asien | 5: Asien | 6: Yi, Vai | 7: Hangul | 8: Privat | 9: Ägäisch | 10: Keil | 11: Anatol | 12: Bamum | 13: Tangut | 14: Kana | 15: Symbol | 16: Picto | 17: CJK | 18: CJK | 19: CJK | 20: CJK | 21: CJK | 22: CJK | 23: CJK | 24: CJK | 113:Tags | Details.