Noob Academic

Toward a knowledge-to-text controlled natural language for isiZulu by Maria Keet & Langa Khumalo

February 18, 2016 | 5 Minute Read

Today's post covers a paper by Keet and Khumalo; its title is the one given above. IsiZulu is a Nguni language. The Nguni languages also include isiXhosa, isiNdebele and siSwati. Of these, isiZulu is the most widely spoken, used by 22.7% of the population of South Africa. Even so, isiZulu and the other languages just mentioned are scarce in software. In this paper, the authors look at how a controlled natural language could be built for writing business rules or an ontology in isiZulu. In other words, an ontology or a set of business rules is to be rendered as isiZulu text. The grammar of isiZulu (and of the other Nguni languages) shows that the usual template-based approach will not work here. The problem is that these languages use a system of noun classes; linguists describe them as agglutinative, with verb conjugation that takes a concord for each noun class. The authors present what they have done so far and what lies ahead.

South Africa is a country whose constitution protects all of its languages, yet the funding directed at Human Language Technologies (HLT) is small. It is smallest of all in the area of knowledge processing. The scarcity of HLTs is well known, and the need for them has already been voiced. For example, the Department of Science and Technology has an office known as the National Indigenous Knowledge Systems Office (NIKSO). This office set up a project called the National Recordal System, which was launched in 2013. Its job is to store, protect and disseminate indigenous knowledge[1]. What matters here is that the project lets people search for knowledge in all South African languages. Beyond this project, the University of KwaZulu-Natal has started requiring students to take isiZulu, and it is working to develop the language by coining scientific terms in isiZulu. Large companies such as Facebook, Google and Microsoft are also trying to support South African languages in their software.

Google Translate supports isiZulu: the tool attempts to translate English into isiZulu. Recently it seems to support isiXhosa as well. Much of the time, though, it produces incorrect sentences. To work properly, all of these tools and others like them need controlled natural languages, natural language generation systems, machine translation and multilingualism in knowledge representation, so that semantics-driven end-user interfaces can be built. Some work has already been done on natural language understanding (NLU) for the Nguni languages. There are also projects in Europe that try to build ontologies for all languages. Having examined these projects, the authors conclude that they will not work as they stand for the Nguni languages.

Building or specifying a grammar from scratch, as Kuhn[2] and Ranta[3] describe, takes time. The existing literature describing the grammar and other aspects of isiZulu and the other Nguni languages is old. It is still important, but updating it could take a long time, and we know that working systems are needed now. This means the place to start is with controlled natural languages as a step toward natural language generation, which in turn helps linguists, computer scientists and other specialists do their work. All of this pushes the authors to build incrementally, starting with constructs that are used frequently, such as quantification, implication, etc.

The authors use OWL and Semantic Web technologies because these are mature and widely used. Readers who want more background can turn to the work of Bouayad-Agha et al[4]. Controlled natural languages that target OWL are common; the best known is Attempto Controlled English (ACE). Logic-based models are another area of interest, the most prominent being Object-Role Modelling. The problem is that most of these approaches rely on templates, which is why they suit English best. Authors including Jarrar and others[5] have already pointed out the problems with using templates for other languages. All of this leads to the following questions:

  1. Which verbalisation patterns in isiZulu are appropriate for the logic constructs?
  2. Can they be realised with templates alone, with templates plus some rules, or is a full grammar needed?
  3. What do the answers to the questions above imply for the design of this controlled natural language?

To try to answer these questions, the authors developed algorithms to verbalise subsumption, negation, existential and universal quantification, and conjunction. The grammar of isiZulu is not simple, and because of that it is clear that templates cannot handle all of the constructs listed above. The problem is that in isiZulu the noun determines the form of other words in the sentence. In other words, quantifiers, negation and verbs all depend on the noun. The short example below, which uses quantification, shows why templates will not work.

Consider a template for English for a simple axiom with quantification, which can be, e.g.

All [noun1 pl.] [verb 3rd pers. pl.] at least one [noun2]

The words for ‘all’ and the ‘at least one’ in isiZulu, however, depend on the noun class of [noun1] (or [noun1 pl.]) and [noun2], respectively. For instance:

bonke oSolwazi bafundisa isifundo esisodwa

‘all professors teach at least one course’

compared to

konke ukusebenza kuyawufeza umsebenzi onqunyiwe owodwa

‘all operations achieve at least one task’
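The contrast between the two sentences can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm from the paper: the function name and the lookup tables are hypothetical, the tables cover only the four words that appear in the examples above, and the noun class labels (2a and 15 for the subjects, 7 for isifundo, 3 for umsebenzi) are our own annotations following the usual noun class numbering. The verb is passed in already conjugated, which sidesteps the concord problem entirely.

```python
# Hypothetical lookup tables, limited to the forms attested in the two
# example sentences. The noun class labels are assumptions made for
# illustration; a real system would need every class and full concord rules.
ALL_BY_CLASS = {"2a": "bonke", "15": "konke"}
AT_LEAST_ONE_BY_CLASS = {"7": "esisodwa", "3": "owodwa"}


def verbalise_all_some(noun1, class1, verb, noun2, class2):
    """Verbalise 'all noun1 verb at least one noun2'.

    The quantifier words are chosen by noun class instead of being
    fixed strings, which is what a plain English-style template cannot do.
    The verb must already carry the correct concords.
    """
    if class1 not in ALL_BY_CLASS:
        raise ValueError(f"no 'all' form known for noun class {class1}")
    if class2 not in AT_LEAST_ONE_BY_CLASS:
        raise ValueError(f"no 'at least one' form known for noun class {class2}")
    all_word = ALL_BY_CLASS[class1]
    one_word = AT_LEAST_ONE_BY_CLASS[class2]
    # In the examples, 'all' precedes the subject and 'at least one'
    # follows the object noun phrase.
    return f"{all_word} {noun1} {verb} {noun2} {one_word}"


print(verbalise_all_some("oSolwazi", "2a", "bafundisa", "isifundo", "7"))
print(verbalise_all_some("ukusebenza", "15", "kuyawufeza", "umsebenzi onqunyiwe", "3"))
```

Even in this toy version the fixed words of the English template have turned into table lookups keyed on noun class, and the verb concords are still not generated, which hints at why templates alone fall short.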

When we analyse this to see what is going on, we find that "oSolwazi" is in noun class 2a and "ukusebenza" is in noun class 15. For that reason we end up with "bonke" for the first noun and "konke" for the second. There are further details we will not go into here; readers who want the full picture can find it in the paper by Keet and Khumalo[6]. Everything above shows that the existing tools do not suit the Nguni languages. This paper continues work started by Keet and Khumalo[7], in which they developed a new algorithm. The present paper focuses on isiZulu itself and shows the breadth of the language: its grammar and the other features that matter for verbalisation.

References:

  • [1] Council for Scientific and Industrial Research (CSIR), Meraka Institute, http://www.csir.co.za/meraka/National_Recordal_System.html
  • [2] Kuhn, T. (2013). A principled approach to grammars for controlled natural languages and predictive editors. Journal of Logic, Language and Information, 12, 13–48
  • [3] Ranta, A. (2011). Grammatical framework: Programming with multilingual grammars. Stanford: CSLI Publications.
  • [4] Bouayad-Agha, N., Casamayor, G., & Wanner, L. (2014). Natural language generation in the context of the semantic web. Semantic Web Journal, 5(6), 493–513
  • [5] Jarrar, M., Keet, C. M., & Dongilli, P. (2006). Multilingual verbalization of ORM conceptual models and axiomatized ontologies. Starlab technical report, Vrije Universiteit Brussel, Belgium.
  • [6] Keet, C.M. and Khumalo, L., Toward a knowledge-to-text controlled natural language of isiZulu. Language Resources and Evaluation, pp.4.
  • [7] Keet, C. M., & Khumalo, L. (2014a). Basics for a grammar engine to verbalize logical theories in isiZulu. In A. Bikakis et al., (Eds.), Proceedings of the 8th International Web Rule Symposium (RuleML’14)