Planning Effective Learning for the Visually Impaired in Federal University in Osun State, Nigeria: Technology Perspective

The school system at the university level in developing nations provides the most impactful environment for effective learning and for building the human capital required for a future knowledge economy. Unfortunately, less attention is given to providing appropriate platforms for disabled students to learn effectively in the university system. Notably, visually impaired individuals rely on listening to speech to gain information quickly and aid learning. In this paper, a learning framework was implemented and tested from educational and technological perspectives. The study focused on combining human-like behavior with a computer application to build a one-way interactive medium between the computer and the user. Using qualitative and quantitative research, we focused on acoustic signal processing without the incorporation of visual input. The respondents used for the study include students, lecturers, specialists in special education, and technology experts. The result is a prototype system that translates text into speech, offering a synthesized speech representation of the text entered into the system to aid effective learning. Tests with different users revealed that majorities of 84.70%, 85.30%, 84%, and 86.70% of the users rated the system as having high-quality content, clarity, attractiveness, and responsiveness, respectively.


INTRODUCTION
Education in all parts of the world has become the primary facilitator of ever-increasing knowledge, engendering quality development of both humans and systems (Gambo et al., 2018). The school system provides the most impactful environment for effective learning (Schleicher, 2011; Paine et al., 2016), innovation, and building the human capital required for a future knowledge economy. Unfortunately, less attention is given to providing appropriate platforms for the disabled (for example, the visually impaired) to learn effectively in the university system.
Visual impairment among students makes learning challenging and affects academic performance (Layton and Lock, 2001; Otyola et al., 2017). Students learn from their lecturers mostly by listening to lectures and reading lecture notes or relevant materials; students with good vision benefit from both listening and reading. In contrast, students with visual impairment mostly find reading very difficult because of their poor sight (Fox, 2012). Students who are visually impaired and students who have good vision do not read at the same pace. In the Nigerian case, visually impaired individuals rely on listening to speech to gain information quickly and effectively to aid learning.
A typical example is the current situation at Obafemi Awolowo University (OAU), Ile-Ife, the only federal university in Osun State. Several students across the faculties of the university are visually impaired. Unfortunately, there is no adequate provision for these students to learn effectively alongside their peers who are not visually impaired.
Meanwhile, the school system expects all categories of students to perform well. Individuals with this impairment rely on their ability to hear speech to gain information quickly and effectively to aid learning, because some of the visually impaired cannot see at all, while others have vision too low to comprehend written text. To this end, a system is needed to give a synthesized speech representation of text for effective study, thereby engendering active learning (Garrison and Kanuka, 2004; Hew and Brush, 2007). Such a system can be developed from a technological perspective by leveraging available Information and Communication Technology (ICT) tools (Eligi and Mwantimwa, 2017). We therefore envisage a system that can turn text into speech so that the visually impaired can access all relevant lecture materials through hearing. As observed in Wongkai et al. (2012), text-to-speech engines are widely used by blind and visually impaired students to support reading electronic text on computers through the hearing channel. Interaction with a text-to-speech system is facilitated by listening to audio recordings of speech and narration (Beal and Shaw, 2009; Balan et al., 2014). Thus, it becomes crucial to have a system that gives correct prosody and pronunciation analysis from written text and processes the sound signal dynamically during the speech conversion.
Therefore, this paper aims to plan a learning framework for implementing and testing a text-to-speech system from educational and technological perspectives. The goal is to develop a software model that offers a computer-generated spoken voice for the text entered into the prototype system, thereby aiding effective learning. In doing so, the study harnesses technology's potential to provide a solution to effective learning for the visually impaired.
In the context of this research, a text-to-speech system is an application that allows an individual to type out a message and have it spoken by the machine (Parr, 2012). The ultimate goal is to help visually impaired people and people with reading disabilities access information quickly. The system reads entire words to the user; hence, the pronunciations of articles and other common words are usually fed to the system, while the remaining words, including complex ones, are computed by rule and read accordingly. Text-to-speech applications have been used effectively to read text documents to visually impaired students, who then do not always need to look at the book or screen and read the entire document themselves (Isewon et al., 2012).

Effective Learning from Technological Perspective
The education sector is undergoing a dramatic change triggered by technologies, competitive forces, and changes in its operational model. Higher institutions, saddled with the responsibility of delivering quality learning, have benefited most in this area: for instance, in the need to plan, determine, and ensure student academic performance, the adequate keeping of admission records, and the effective processing of results, to mention a few. These applications provide valuable insight into the applicability of technology in education and its support for effective learning. The introduction and use of ICT have brought about better services by making systems, businesses, and operations more pervasive and productive in today's society (Gambo et al., 2018). In the educational sector today, ICT is a significant catalyst in driving and achieving the success of major operations targeted towards quality and practical learning (Enkhtsogt and Kim, 2018). One such operation is the planning of sound education to support all effective learning strategies (Mitchell and Sutherland, 2020). The use of ICT in engendering quality learning in educational planning lends itself to more student-centered learning settings (Noor-Ul-Amin, 2013), which is why it is essential to extend the planning of quality education to those with disabilities. For example, ICT usage in the educational system engenders the introduction of new pedagogical practices involving assistive technology to aid effective learning (Kreijns et al., 2013). The work of Rao (2013) appraised ICT as a change agent for education and justified the need for its application among every category of students (Wastiau et al., 2013; Albugami and Ahmed, 2015).
Without a doubt, the educational system as a whole has a keen interest in adopting the many-sided benefits of ICT applications. Specifically, ICT can be used to proffer solutions that allow people with disabilities to enjoy learning alongside those without disabilities. Again, this justifies the global reforms in education and the growing demands on ICT. In education, ICT is a broad, deep, and rapidly growing field of study (Moursund, 2005) that utilizes a wide range of technologies to collect, store, edit, retrieve, and transfer information in various forms. Education at all levels is one of the major factors of national development and global competitiveness. Thus, this paper attests that when planning for education, it is essential to support effective learning for people with disabilities, most notably the visually impaired, using technology.

LITERATURE REVIEW
Speech is one of the oldest and most natural means of information exchange between humans (Tamilnadu, 2010). Over the years, attempts have been made to develop vocally interactive systems that realize speech synthesis, called Text-To-Speech (TTS). TTS synthesis is a technology that provides a means of converting written text (in this work, primarily in the English language) from a descriptive form to a spoken form that is easily understandable by the end-user (Kaladharan, 2015). TTS systems are often implemented in Java and on many other platforms, such as Visual Basic and C++.
According to Raghavendray (2008), a TTS converter is software that allows the visually challenged to read and understand various documents through automatic speech generated by the machine, conveying the information to the user in a preferred language and preferred voice based on the text input. The objective of TTS is to convert an arbitrary given text into a corresponding spoken waveform (Shiga et al., 2020). Text processing and speech generation are the two primary components of a TTS system. The text-processing component processes the given input text and produces an appropriate sequence of phonemic units. These phonemic units are then realized by the speech-generation component, either by synthesizing from parameters or by selecting units from a large speech corpus.
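The text-processing stage described above can be sketched in Java as a dictionary lookup with a rule-based fallback. This is a minimal illustration under assumptions, not the authors' implementation: the `LEXICON` map and the one-letter-one-phoneme fallback are hypothetical stand-ins for a full pronunciation dictionary and trained letter-to-sound rules.

```java
import java.util.*;

public class TextProcessor {
    // Hypothetical mini-lexicon mapping words to phoneme sequences (ARPAbet-style).
    private static final Map<String, String[]> LEXICON = Map.of(
        "the", new String[]{"DH", "AH"},
        "cat", new String[]{"K", "AE", "T"}
    );

    /** Convert input text into a sequence of phonemic units. */
    public static List<String> toPhonemes(String text) {
        List<String> phonemes = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue;
            String[] known = LEXICON.get(word);
            if (known != null) {
                phonemes.addAll(Arrays.asList(known));  // dictionary lookup
            } else {
                // Fallback: naive one-letter-one-phoneme rule; real systems
                // apply trained letter-to-sound rules here.
                for (char c : word.toCharArray()) {
                    phonemes.add(String.valueOf(c).toUpperCase());
                }
            }
        }
        return phonemes;
    }
}
```

A speech-generation component would then realize each unit in this list as a waveform segment, either from parameters or from a recorded speech corpus, as the paragraph above explains.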
In Ismail (2004), the TTS process was reported to have four main steps for processing the speech signal: acoustic processing, feature extraction, feature classification, and character tagging. These four steps are employed in the TTS model (Sasirekha, 2014) to generate the synthetic speech.
Besides, several TTS systems exist in the market today. Some are open-source (i.e., available for free on the internet), while others are proprietary (i.e., not free at all). Examples of such TTS systems include:
i. the Multi-domain TTS (MdTTS) synthesis by automatic text classification (Francesca et al., 2014). The MdTTS describes a synthesis strategy for generating speech across different domains and increasing the flexibility of high-quality TTS systems. However, the MdTTS performance was validated subjectively through several perceptual tests, which gives room for bias, and the focus was not educational;
ii. the text processing for TTS systems in Indian languages by Anand and Prakash (2013), which addressed issues of Font-to-Akshara mapping, pronunciation rules for Aksharas, and text normalization in the context of building TTS systems in Indian languages. However, the system is limited to Indian languages, and the planned framework is not adaptable to the English language;
iii. the TTS review by Juergen (2014), which described how a TTS engine works. His work was limited to front-end and back-end issues such as concatenative synthesis and formant synthesis. Criteria for TTS evaluation, such as accuracy, intelligibility, and naturalness, were identified, but many aspects, such as prosody generation and natural language processing, were skimmed for space reasons;
iv. the work of Fan et al. (2014) on TTS synthesis and Automatic Speech Recognition, which focused on the use of Natural Language Processing (NLP) techniques in producing voice from an input text (TTS synthesis) and on the inverse process of producing a written transcription from an input voice utterance, known as Automatic Speech Recognition (ASR). The authors provided a framework upon which additional insights were gained in planning the framework used in this paper.
In addition, there have been many attempts in research and industry to make synthesized speech more effective, natural, and stress-free; examples include Shirbahadurkar and Bormane (2009), Buza et al. (2009), and Saraswathi and Vishalaksh (2011). However, these attempts face several challenges that this paper tries to address. For example, in Saraswathi and Vishalaksh (2011), it is challenging to process unrestricted text, and finding the optimal concatenation point in the text is difficult. In Shirbahadurkar and Bormane (2009), the quality of text analysis and prosody, and the creation of a large speech database for concatenation synthesis, are challenging, while the syllable and accentuation rules in Buza et al. (2009) are incomplete. Furthermore, Amanpreet (2009) focuses only on automatic syllable segmentation for Indian languages; the choice of syllable units was due to the syllable-centric nature of Indian languages, and sub-word syllable units, as reflected in the Indian languages, are used for speech segmentation. This means the realization of one word is affected by adjacent words, which results in a lack of generalization.
In their design, Vieri and Veri (2013) stated that a blind person cannot see the length of an input text when starting to listen to it with the help of a speech synthesizer. Thus, an important feature is to give, in advance, some information about the text that is to be read. The synthesizer has some difficulties when checking the document and calculating the estimated duration of reading it to the listener. There are also several problems in text preprocessing, such as numerals, abbreviations, and acronyms. (A syllable is the part of a word containing a single vowel sound, pronounced as a unit during text-to-phonetic conversion.)

Isewon et al. (2014) designed a TTS conversion system using the concatenative synthesis method, in which segments of sub-word units are tied together to form a complete speech chain. The system provided an interface for converting any English input text to speech. Despite its ability to solve the problem, the system did not include a database to store text files; hence, there is no continuous repetition of speech after reading through all the text content.

Suhas et al. (2015) designed a TTS synthesizer using articulatory synthesis. A mechanical and acoustic model was used to develop the system, and a vocal database, built by extracting acoustic units from speech, was used for the speech analysis. However, the voice-signal analysis, speech segmentation, vocal-database construction, and text analysis used cannot be transferred to any other method, and the output is far from a natural voice.

Bhargava et al. (2015) designed a reading assistant for the visually impaired using image processing techniques. The system designed and implemented can convert any text file into an audio file; however, the processing time increases when the document is large.
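The preprocessing problems noted above (numerals, abbreviations, acronyms) are typically handled by a text-normalization pass before phonetic conversion. The Java sketch below is illustrative only; the `ABBREV` table and the single-digit rule are hypothetical simplifications of a real TTS front end.

```java
public class TextNormalizer {
    // Hypothetical abbreviation table; a real front end uses a much larger one.
    private static final java.util.Map<String, String> ABBREV = java.util.Map.of(
        "dr.", "doctor", "st.", "street", "e.g.", "for example"
    );
    private static final String[] ONES = {"zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"};

    /** Expand abbreviations and spell out single digits so the synthesizer
     *  receives plain words instead of raw symbols. */
    public static String normalize(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split("\\s+")) {
            String t = ABBREV.getOrDefault(token.toLowerCase(), token);
            if (t.matches("\\d")) t = ONES[Integer.parseInt(t)]; // single digit
            if (out.length() > 0) out.append(' ');
            out.append(t);
        }
        return out.toString();
    }
}
```

For example, `normalize("Dr. lives at 5 Oak St.")` yields "doctor lives at five Oak street", which a synthesizer can pronounce directly.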

RESEARCH METHOD
This study focuses on an ideal combination of human-like behavior with a computer application to build a one-way interactive medium between the computer and the user. It also focuses only on acoustic signal processing, without the incorporation of visual input, and the target users are visually impaired students at Obafemi Awolowo University (OAU), Ile-Ife, Nigeria. OAU is the only federal university domiciled in Osun State and was established between 1961 and 1962.

Planning the Framework
Qualitative and quantitative research approaches were used in collecting and analyzing data. The Rapid Application Development Framework (RADF), as shown in Figure 1, was strictly followed in this research (Dennis et al., 2015). In the RADF, functional modules are developed in parallel as prototypes and are then integrated into a complete system, which makes it easier to incorporate changes during the development process. The most crucial aspect of this model is ensuring that the prototypes developed are reusable. The RADF distributes the analysis, design, build, and test phases into a series of short, iterative development cycles.
The data (also known as the functional requirements) was collected for analysis purposes at the planning stage. The use case approach was used to analyze data within the planned framework to implement the prototype TTS system. The requirements gathering and analyses were carried out between planning and strategizing to determine the targeted users, expected functionalities of the TTS system from users, system functionalities, the necessary data for the system, and the system's likely feedback.
A structured interview guide was used to elicit these data from eighty (80) students of OAU, Ile-Ife, who are visually impaired or have poor vision. During the interview, the questions asked bordered on the challenges these students face during the learning process, most notably during lectures. The purposive sampling technique was used to select these students from different faculties and departments of the university to identify the challenges they face when learning. In planning the design for effective learning, the Unified Modelling Language (UML) was used to describe the relationship between users and the system operations. Specifically, use case, activity, sequence, and class diagrams were used to represent the planned framework. The implementation of the design was done using the Java programming language and MySQL, with the Java NetBeans IDE used for the development of both the front end and the back end. The system was tested using alpha and beta testing based on user experience: alpha testing helped detect issues and bugs before releasing the system to the users or the public, while real users performed the beta testing.

Data Collection
In the course of data collection, oral interviews were carried out with the target users, i.e., students from OAU, Ile-Ife, who are visually impaired or have poor vision, in order to gather sufficient information for the development of the prototype system. An additional twenty-five (25) students with poor vision in the university were also interviewed to learn about the challenges they face while reading in school. Some of the expectations of these visually impaired students for the system are shown in Table 1.

Table 1: Expectations of the visually impaired students for the system
1. Users should be able to use the application with ease.
2. Users should be able to add and load any text, basically in the English language, into the system. This is inclusive of text files.
3. The system should read the text file to the user according to what is contained in the text field.
4. Users can save text that has been read.
5. Users can reload previously read text into the text field.
6. Users can listen again to previous text saved in the system to aid assimilation.

Personal interaction with experts in special education was carried out to gain a broader view of using ICT tools to help address the challenges mentioned above. Specifically, five (5) experts in the Faculty of Education, OAU, Ile-Ife, and two (2) external experts who are currently involved in special education training were consulted on possible ways to make learning exciting and educative enough for the visually impaired. Based on the interaction with these experts, the requirements in Table 2 were elicited as the feedback the system is expected to provide.

Table 2: Feedback the system is expected to provide
2. A read menu to read the text provided in the text field to the user.
3. A save menu to save the text that has been read to the user.
4. A clear menu to clear the text content in the text field.
5. A database design to store all text read to the user of the application.
6. A repeat checkbox to repeat reading of the entire text content without clicking on the start playing menu after reading through all the text.
The mode of learning and reading of the visually impaired students after receiving lectures was also observed, to determine the essential features needed in the system to make it more useful for them during their study.

Design Specification
The design phase is one of the activities within the RADF. The essence of the design specification is to reduce the difficulty in understanding and defining the requirements. First, we have the planned architectural design (PAD) for the system, as shown in Figure 2. The PAD describes the overall structure of the TTS system. As shown in Figure 2, the application is divided into two main modules. The first is the main application module (the Text-to-Phoneme module), which includes the essential GUI components that handle the basic operations of the application, such as the input of parameters for conversion, either via file or direct keyboard input. The second module (the Phoneme-to-Speech module) is the main conversion engine integrated into the main module; it accepts the data and performs the conversion into speech.

Figure 2: Planned Architectural Design (PAD)
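The two PAD modules could be expressed as Java interfaces along the following lines. The names and signatures here are hypothetical illustrations of the module boundaries, not the authors' actual classes.

```java
import java.util.List;

/** Illustrative boundary of the Text-to-Phoneme module (hypothetical name). */
interface TextToPhonemeModule {
    /** Accepts text from the GUI (file or keyboard input) and yields phonemic units. */
    List<String> process(String inputText);
}

/** Illustrative boundary of the Phoneme-to-Speech module (hypothetical name). */
interface PhonemeToSpeechModule {
    /** Accepts phonemic units and renders them as an audio waveform (raw bytes here). */
    byte[] synthesize(List<String> phonemes);
}

/** Trivial wiring of the two modules, matching the PAD's data flow. */
class TtsPipeline {
    static byte[] run(String text, TextToPhonemeModule t2p, PhonemeToSpeechModule p2s) {
        return p2s.synthesize(t2p.process(text));
    }
}
```

Separating the modules behind interfaces like these is one way to realize the RADF goal of reusable, independently developed prototypes.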
Additional design specifications were produced with the ArgoUML tool. The use case diagram in Figure 3 represents the system's functional requirements in terms of actors, roles, and interactions. As Figure 3 reflects, the TTS software requirements are captured through descriptions of scenarios called use cases, which enable the system components to communicate in performing an expected operation (Jacobson, 1987).

Figure 3: Use case diagram of the system
In Figure 4, an activity diagram depicts the flow of activities that the system can perform. From the start point, the user can type text into the text field area provided in the application, or load or copy text from any location into the text field. If the text provided is in English, the system reads out the text contained in the text field. The user can save the text before or after it has been read, or clear the text field to reload or type in another text. If the user wants to open saved content that has been previously read, the system allows this, and the user can listen to such text again. If no text is provided in the text field, the system does not process any input and no speech is generated.
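The decision logic of this activity flow can be captured in a few lines. The method below is a hypothetical sketch of the flow, not the authors' code: it returns how many passes the synthesizer would make over the text field's contents.

```java
public class ReadingController {
    /** Decide what the system should do for a given text-field state.
     *  Returns the number of times the text is spoken: 0 when the field is
     *  empty (no input means no speech is generated), otherwise one pass plus
     *  'repeats' extra passes when repeat mode is enabled. */
    public static int passesToSpeak(String textField, boolean repeatMode, int repeats) {
        if (textField == null || textField.trim().isEmpty()) {
            return 0;                        // empty field: no speech generated
        }
        return repeatMode ? 1 + repeats : 1; // read once, plus repeats if enabled
    }
}
```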

Figure 4: Activity diagram of the system

Figure 5 shows the class diagram used to create the classes and relationships for the system's problem domain. As Figure 5 shows, the class names, methods, and attributes of the system's classes, together with their relationships, are stated.

Implementation of the System
The system was implemented in a software and hardware environment. An interface allows users to interact with the system by entering or loading text files. The structure of the prototype system is divided into two main modules: (i) the main application module, which allows users to interact with the system, and (ii) a second module that contains the system functionalities and is integrated into the main module for the acceptance of data and its conversion.
The home page provides the GUI that makes communication and interaction easy between the user and the system through windows, menus, and icons that a mouse can manipulate. The home menu is the first screen the system displays when it is launched. The home GUI, as shown in Figure 7, is the first screen the user sees when the TTS application is loaded. The screen consists of the system's name, a text field, start talking, pause talking, and stop talking actions, a save option, a clear text content option, and a field showing all saved text before and after reading. The menu list contains different intonations of speech synthesis based on the user's selection, together with the repeat mode checkbox.

Figure 8 shows a screenshot of the interface when the 'more options' button is clicked, revealing additional features embedded in the system. These include a volume option to increase the system's sound volume, so that the user hears the speech clearly and audibly.

Figure 9 shows a screenshot of the system reading text in repeat mode when text is loaded into the system. The user can listen several times to the synthesized speech representation of the text loaded, or text copied from any location on the computer into the text field, without clicking on the 'start talking' option again after the system has finished reading. The 'repeat mode' checkbox on the right-hand panel of the interface enables this feature when the user clicks on it.

Figure 9: Screenshot of the system reading text in repeat mode when text is loaded

Figure 10 shows a screenshot of the system saving the text read, for future reference. The 'save menu' option allows the user to save a large volume of text content to the system for review or rereading whenever the need arises. The system permits users to save the text provided in the text field either before or after it has been read out.

Figure 10: Screenshot of the system saving the text that has been read for future reading

Figure 11 shows the action page displayed whenever the user tries to delete saved content from the system. As the user deletes the saved content, a pop-up message asks whether to delete the selected saved content permanently or to reverse the action.

SYSTEM TESTING
The alpha and beta testing techniques were used to test the system. The goal is to detect errors and bugs before the system is delivered, installed, and made operational, and to obtain feedback from users' experience of the system for further improvement. Specifically, all the system components, and the system as a whole, were tested to ensure adequate functionality and that the system meets the specifications stated earlier.
On the one hand, alpha testing was carried out in every phase of the system development to ensure that all the modules function as expected. Bugs encountered in some modules were fixed before proceeding to the next module of the system. On the other hand, beta testing was carried out by allowing users to test and rate the system based on criteria such as clarity, attractiveness, responsiveness, and content quality. The rating scale is 1-10, where 1 is worst and 10 is best. The results obtained from users are shown in Table 3. The criteria used for beta testing are explained below:
a) Clarity: how well the user can interact with the system, communicating meaning and function with ease of use and efficiency.
b) Attractiveness: how pleasing or appealing the user interface is to the user.
c) Responsiveness: how quickly the system responds to the text provided in the text field to synthesize its speech representation.
d) Content quality: the level of excellence of the speech produced by the system.
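Since the scale maximum is 10, a mean user rating converts to a percentage by multiplying by ten; this is presumably how figures such as 84.70% arise from the 1-10 ratings. The sketch below shows that conversion; the sample ratings in it are illustrative, not the study's raw data.

```java
public class BetaTestSummary {
    /** Convert mean user ratings on a 1-10 scale into a percentage
     *  (e.g., a mean rating of 8.47 maps to 84.7%). */
    public static double toPercent(int[] ratings) {
        double sum = 0;
        for (int r : ratings) sum += r;
        return (sum / ratings.length) * 10.0; // scale max is 10, so mean*10 = percent
    }
}
```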

RESULTS AND DISCUSSION
Specifically, the result is a prototype system that translates text into speech, offering a synthesized speech representation of the text entered into the system to aid effective learning for the visually impaired. The test was carried out by fifteen (15) different users with a rating scale of 1-10. Table 3 shows the results of the alpha and beta tests carried out with real users. Figure 13 describes the testing result of the software for attractiveness: with 84% attractiveness, the users found the system appealing to use. Finally, Figure 15 shows the testing result of the software for content quality, which borders on the user getting precisely what they want from the software; we achieved over 90% agreement in terms of the content quality of our system.

CONCLUSION AND FUTURE WORK
In this paper, we have justified the need to harness the potential of ICTs in educational planning towards achieving effective learning for the visually impaired. Specifically, the paper leveraged ICT use in supporting students with visual impairment, thereby improving their academic performance. A system that can provide a synthesized speech representation of text and read it out to the user with different intonations was presented. The paper also concluded that the system developed converts input text into synthesized speech in the English language. The developed system will help people with learning disabilities by assisting them in accessing large volumes of text content through synthesized speech representation. It also provides visually challenged individuals with an interface for interaction towards an effective learning process, making it easier to work through a large volume of text.
Furthermore, students who are not visually impaired but prefer to listen rather than read their notes can benefit from the system. They have the opportunity to listen to the notes given by their lecturers in different voices, that is, male and female voices. They may listen to the notes as many times as they wish in a day, giving a better comprehension of the subject matter. Moreover, they may decide to listen while cooking, driving, bathing, or even while dressing up for a class. The TTS can accommodate a high volume of text; students who find it very difficult to read a large volume of text may decide to listen to it instead, and the application makes it possible for a student to listen to a whole textbook. Besides, it has storage capacity: any notes that students would like to refer to later can be stored in the system easily.
Further research on this work is essential to create engines for localized Nigerian languages, making TTS technology more accessible to a broader range of Nigerians, especially in tertiary institutions. Suffice it to say, an application that prevents eye strain by enabling users of TTS technology to sit and listen to text input comfortably is highly desirable.