YouGlish: A web-sourced corpus for bolstering L2 pronunciation in language education

ABSTRACT


INTRODUCTION
The ubiquity and inexorable progress of technology entail its incorporation into the field of language education. With the advent of computers, an autonomy-fostering area called computer-assisted language learning (henceforth CALL), which is technology-driven and inextricably linked to many other disciplines (Beatty, 2013), corpus linguistics being one of these, came into our lives in three phases: (1) behavioristic (1960s to 1980s), (2) communicative (early 1990s to late 1990s), and (3) integrative (late 1990s to the present) (Richards, 2015).
The evolution of CALL from the behavioristic perspective to the integrative one might also be construed as the transition from teacher-centeredness to learner-centeredness (Richards, 2015). Endorsed by corpus linguistics, this personalized learning exhibits itself in data-driven learning (DDL), "an inductive approach to learning in which learners acquire an understanding of language patterns and rules by becoming more involved in researching corpora, usually through use of a computer-based concordancing program" (Beatty, 2013, p. 68). By enabling learners to access corpora of spoken and written texts and to see how words or phrases are used in their immediate contexts, these programs supply learners with authenticity, transparency, and autonomy (Bardovi-Harlig & Mossman, 2022).
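To make the notion of a concordancing program concrete, the following is a minimal keyword-in-context (KWIC) sketch in Python. It is an illustration only and does not reflect how YouGlish or any particular concordancer is implemented; the function name `kwic`, the `width` parameter, and the two-sentence corpus are hypothetical and were chosen purely for demonstration.

```python
import re


def kwic(texts, query, width=30):
    """Return one keyword-in-context line per match of `query` in `texts`.

    Hypothetical helper: matching is case-insensitive and restricted to
    whole words; `width` is the number of context characters kept on each side.
    """
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so that the keywords line up.
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Hypothetical mini-corpus standing in for transcribed speech.
corpus = [
    "We need to schedule the meeting for next week.",
    "Her schedule is full, so the talk was moved.",
]

for line in kwic(corpus, "schedule"):
    print(line)
```

Aligning every occurrence of the search item in a single column is what allows learners to compare its immediate contexts at a glance and to induce patterns for themselves.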
One of the ramifications of CALL has been computer-assisted pronunciation teaching (hereafter CAPT), which, as the name suggests, is specifically concerned with the teaching of pronunciation through exposure to authentic audio-visual materials in individualized and stress-free learning environments (Luo, 2016; Rogerson-Revell, 2021). Among the latest products of CAPT is YouGlish (an abbreviation of You English) (Miller, 2019), an online corpus with extensive genuine videos that can serve as a phonetic concordancer to be utilized in teaching pronunciation along with vocabulary and aural comprehension. It might be asserted that YouGlish, like other products of CAPT, provides individualized learning opportunities, increased availability of L2 input, a conducive learning environment, aural discrimination and focused attention, self-paced learning, visual support, rapid and personal feedback through automatic speech recognition (ASR) (Dai & Wu, 2021; Hsu, 2016; McCrocklin, 2016), attention to both segmentals and suprasegmentals, multiple speaker exemplars, longer stretches of discourse, adaptability and flexibility for various learning styles, and increased intrinsic motivation owing to its fun and gamified aspects (Henrichsen, 2021). Even the simplest form of CAPT software provides extensive aural input since it is believed that comprehension precedes production (Flege, 1995). Authentic and natural speech models are offered by this kind of CAPT software, such as TED talks and YouGlish, in the hope that comprehension practice through these pieces of software will eventually lead to accurate production.
As correct pronunciation is deemed salient for successful communication (Levis, 2018; Martin, 2020; Pennington & Rogerson-Revell, 2019; Uchida & Sugimoto, 2020) along with multiword prefabricated lexical chunks or formulaic expressions (Hinkel, 2022; Pleyer, 2023), YouGlish might serve as an effective and practicable means for the learning and teaching of phonetic and lexical structures. To this end, this research aims to outline the theoretical rationale underlying YouGlish, specify its characteristics, expound on its implementations in language classrooms, and delineate some concerns regarding its utilization. In this regard, this research intends to provide language teachers with both theoretical and practical insights about YouGlish.

GENERAL DESCRIPTION
YouGlish is a tool that delivers rapid and impartial responses regarding the non-prescriptive usage of languages by actual people in real-life situations (Miller, 2019). Based on YouTube, this pronunciation corpus provides access to millions of genuine and contextualized videos in its database, unlike the instructed English to which language learners are typically exposed. The website also serves as a video pronunciation dictionary, allowing language learners to access a wide range of videos on the selected pronunciation samples (Miller, 2019). The website supports searching for pronunciation samples in 16 languages: Arabic, Chinese, Dutch, English, French, German, Hebrew, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Turkish, and sign language. On the website, a search bar allows users to search for English pronunciation samples in Australian, UK, and US accents.
Users who wish to search for the pronunciation of a particular lexical item simply type in that word or phrase and are then presented with a multitude of videos containing the pronunciation of the selected search item. The videos start playing automatically, and the same video continues to play unless users click on 'next'. Captioned texts appear below the video, with the search item highlighted. The video pane allows users to play, pause, and replay videos, adjust their speed (i.e., slow, normal, and fast), and rewind them by five seconds. Further options are provided by the caption bar, such as sending feedback to the administrator, sharing the video on Facebook, on Twitter, and through Gmail, saving the video to users' personal lists on YouGlish, and a YouTube button that opens the video in a new tab.
Along with these, the website also presents some additional pronunciation resources below the video pane. Users are provided with 'nearby words', a list of words adjacent to the search item. Those who wish to improve their pronunciation might click on one of these words in the pane. The 'phonetic' section allows users to see both traditional and modern phonetic transcriptions of the search item along with its syllables and supplies them with a bank of words that are syllabically and segmentally similar. Another option that users are supplied with is 'tips to improve your English pronunciation', under which a list of ideas and suggestions is given for polishing their pronunciation. The suggestions supplied on the website are (1) breaking the pronunciation sample down into its sounds (and saying it out loud until consistent production is achieved), (2) recording oneself articulating the search item in full sentences so that mistakes can easily be noticed on subsequent viewings, (3) searching for tutorials on YouTube for the target pronunciation of the sample, and (4) focusing on one accent by picking one accent and sticking to it, for a blend of accents might confuse beginner learners.
A number of other strategies are offered for further practice, such as (1) working on word/sentence reduction (e.g., what are you going to do this weekend→what you gonna do this weekend), (2) tackling intonation (i.e., stress, rhythm, and pause), despite the difficulty of mastering it, as it conveys moods and attitudes, and (3) subscribing to English teaching channels on YouTube.
The suggestions or strategies on this website are fixed; that is, they do not change with subsequent searches. With each new query, only the new search item is inserted into the same body of text (i.e., the list of suggestions/strategies).
Creating an account on YouGlish gives users access to additional features, including a personal database that encompasses previously searched items and video samples reached through the site. In their account, users can navigate between 'my tracks' and 'my words', through which they might access their earlier searches and personally added words, the definitions of which are instantly populated by the site. They might edit these words through the 'notes' and 'category' tabs, which allow them to make notes about words and add their categories (i.e., part of speech, topical categorization such as food, sports, etc.). Users may access further practice on any words saved under the 'my words' tab. Simply by clicking on the 'D' button that appears to the right of the 'definition', an information package about the selected word is revealed. This package includes further vocabulary resources under four categories: (1) lexical information (i.e., definitions, verb patterns, adjective forms, and singularity/plurality), (2) synonyms (i.e., words or phrases with similar or nearly the same meanings), (3) 'from the web' (i.e., web-sourced definitions of the search item, each with its source link), and Users may also watch videos including pronunciation samples of the selected word/phrase just by clicking on the play icon next to the 'D' button.
In addition to these, users can sign up for free 'five-minute lessons' or 'word of the day', which are accessed through Twitter or email. When registering via email, users receive five-minute lessons of real English conversations to improve their pronunciation at regular intervals, with such options as every day, twice a week, once a week, or once a month. What distinguishes these five-minute lessons from the regular use of YouGlish on the main page is the set of settings offered in each lesson. Learners are provided with options to adjust the frequency of watching the videos and are given four-second intervals to repeat what they have heard. Users might also register for 'word of the day', which is sent at regular intervals as in the five-minute lessons. Upon registration, learners are supplied with definitions, synonyms, word usages, translations into 52 languages, visual support if available, and pronunciation videos of the words.

PEDAGOGICAL FIT
YouGlish might be rooted in a handful of theoretical principles, and its manifold uses could be explicated through certain approaches. The first of these might be the noticing hypothesis (Schmidt, 1990), which suggests that the learning of certain language aspects is facilitated provided that learners are consciously aware of these aspects in the language they encounter. By exposing themselves to genuine and natural pronunciation samples, YouGlish users might be expected to develop a conscious awareness of those samples and hopefully incorporate these new linguistic features into their linguistic competence. In other words, it can be maintained that YouGlish might supply learners with opportunities to transition from implicit language knowledge to procedural knowledge (Richards, 2015). The noticing hypothesis is also closely related to discovery learning (Goldhawk, 2023), according to which learning happens by discovery, prioritizing reflection, thinking, experimenting, and exploring, as well as to inductive learning, a term that is often used interchangeably with discovery learning and depicts the type of learning process wherein learners discover the rules through observation (Goldhawk, 2023).
YouGlish might also be grounded in the natural approach (Krashen & Terrell, 1983), particularly the monitor, input, and affective filter hypotheses. With sufficient time to choose and apply a certain rule, enough focus on form, and knowledge of the rules, learners can monitor the output of the acquired system through conscious learning. On YouGlish, learners can practice unfamiliar phonetic and lexical items and hopefully become competent in producing those items via the awareness of their usages exhibited in the authentic videos in its database. With regard to comprehensible input, it can be asserted that learners might understand the phonetic and lexical items based on the natural and genuine contexts in which they are used, as illustrated in the videos accessed through YouGlish. It might also be assumed that learners' so-called affective filter is lowered during practice on YouGlish, for it supplies them with a stress-free environment wherein no assessment is undertaken.
Based on a microcomputer-based approach to foreign language learning that regards learners' discovery of language forms as salient and considers teachers linguistic informants who supply learners with strategies for discovery, DDL enables learners to access the facts of linguistic performance, in contrast to other approaches that encompass linguistic competence (Johns, 1991). Through the use of concordancers (YouGlish is regarded here as a phonetic concordancer), this type of learning is individualized and teacher-supported, and it supplies learners with numerous opportunities for discovery learning through authentic and natural materials. A substantial amount of data (millions of videos) is presented to learners via YouGlish, which hopefully contributes to the linguistic competence of learners at the phonetic and lexical levels.
Another theoretical principle behind YouGlish might be claimed to be genre theory (Kress & Knapp, 1992) and text-based instruction/genre-based teaching (Burns, 2012), which encapsulate the explicit teaching of the structure of various written and spoken text types. Text here refers to the structured sequences of language used in specific contexts in specific ways, while genre pertains to the body of texts used by members of a discourse community (Richards & Rogers, 2014). It must, however, be noted here that YouGlish supplies learners with spoken texts, namely profuse videos from diversified contexts ranging from academic to non-academic. From the perspective of the genre approach or text-based instruction, learners are expected to discover target phonetic and lexical structures and develop proficiency through repeated exposure to real-life spoken texts. Whereas guided practice (Richards & Rogers, 2014) is ordinarily provided by teachers through whole texts as learners develop language skills for meaningful communication, on this platform it is ensured through the videos accessed.
The lexical approach (Lewis, 1993), which derives from the belief that the building blocks of language learning and communication are lexis (more specifically, multi-word combinations) rather than grammar, functions, or notions (Richards & Rogers, 2014), might be considered yet another theoretical principle underlying YouGlish. Multiword prefabricated chunks can be utilized as search items on YouGlish, and prolific videos about the selected items across a wide variety of genres can be accessed by learners. Through frequent and regular practice, multiword chunks might be encountered on several occasions across varying contexts; thus, learners might be provided with copious examples of the usages of these lexical items.

CLASSROOM IMPLEMENTATIONS
The effectiveness of technology for teaching pronunciation has been demonstrated in many studies (Martinsen et al., 2017; McCrocklin, 2016; Pennington & Rogerson-Revell, 2019; Tsai, 2019). However, language teachers may be daunted by the quantity and varying quality of such technologies. To address this, Yoshida (2018) provides a number of criteria for the selection of technological tools, which are (1) appropriateness to learning objectives, (2) quality and accuracy, (3) practicality of use, and (4) cost.
If we are to make a quick evaluation of YouGlish based on this set of criteria, YouGlish might be considered appropriate for the teaching of L2 segmental pronunciation (of English in particular), for it provides increased opportunities for learners to be exposed to a large number of speech samples with specific concentration on a lexical item. Autonomous learners who are extensively exposed to the production of pronunciation samples can be expected to familiarize themselves with their accurate articulation and thus improve their comprehension and production. Learners who have not yet achieved autonomy might need the help of 'more knowledgeable others' (who, in this case, are language teachers) to assist and guide them through the learning process or until autonomy is achieved. With regard to the second criterion for the selection of technological tools, it might be asserted that YouGlish may not completely fulfill the requirements in terms of quality and accuracy. This is because there is no filter on the website that discriminates between native and nonnative speakers (McCarthy, 2018) or identifies instances of mispronunciation. This is when language teachers get involved in the teaching and learning process and help learners differentiate between right and wrong. This might, however, bring up another issue: the qualifications of teachers (e.g., how qualified or competent are teachers in English pronunciation to be able to help learners?). This issue, however, lies beyond the scope of this research. It might be argued that YouGlish satisfies the third criterion, practicality of use, in that this corpus is easy to learn and use and provides original materials for its users. With regard to the last criterion, YouGlish is already available to everyone, which means it is cost-free. It does, however, require a stable internet connection and a smart device to access the website, which should not be a concern in 21st-century educational settings.
Having selected YouGlish for the teaching and learning of L2 segmental pronunciation, the next step would be to specify its applications in language education. Prior to elaborating on the utilization of this website for language teaching, it should be noted that pronunciation is intricately related to the four main language skills: listening, speaking, reading, and writing. Pronunciation can be considered a part of speaking since a lack of correct pronunciation might lead to misunderstandings or communication breakdowns (Pennington & Rogerson-Revell, 2019; Uchida & Sugimoto, 2020). It is safe to say that speaking is also intricately connected to listening, as spoken activities, by nature, necessitate that the message be articulated intelligibly and perceived accurately for them to be completed favorably (Sicola & Darcy, 2015). Since mispronunciation of words, phrases, or lexical items might cause misconceptions or communicative failures (Veivo & Mutta, 2022), it can be held that pronunciation is also linked to vocabulary. Despite the fact that vocabulary teaching is often associated with definitions, parts of speech, etc., it should be borne in mind that pronunciation (e.g., vowels, consonants, word stress, and word endings) is another aspect of vocabulary teaching (Sardegna & Jarosz, 2022). The complex relationship between orthography and pronunciation can both promote and inhibit target-like production while reading aloud (Sicola & Darcy, 2015, p. 481); knowledge of the phonetic alphabet will thus help learners achieve the articulation of words in reading texts to the degree that they are familiar with them.
With regard to the relationship between pronunciation and writing, it can be stated that the mechanics of writing include the teaching of L2 symbols such as letters; considering the differences between L1 and L2, teachers might utilize a number of activities related to spelling, word construction, and sentence-level writing that take symbol-sound correspondences into account. Given all these, pronunciation cannot be divorced from the four main language skills; therefore, the applications of YouGlish in language education may have to cover these skills rather than pronunciation alone. However, the primary focus will be on pronunciation with references to the four main language skills.
YouGlish might primarily be used for the teaching of L2 pronunciation. By searching for the target words or phrases, learners or teachers can access the pronunciation of the selected item by various real speakers in authentic and real-world contexts. By highlighting the query item, YouGlish provides its users with opportunities to focus on individual words/phrases, more specifically on both the segmental features of pronunciation, such as the pronunciation of phonemes in word-initial, word-medial, and word-final positions, syllables, and phonetic transcription, and suprasegmental features, such as the place of stress in the selected lexical items. It also provides users with some strategies to enhance their pronunciation through repeated exposure to the search item until they gain mastery over it. YouGlish further enables its users to keep track of their practice through the creation of an account whereby they can add study words to their lists.
The provision of phonetic transcriptions of the query items by YouGlish might be associated with the writing skill, especially the orthographic aspect of pronunciation. It must, however, be noted that only autonomous learners or learners with a certain knowledge of phonetic transcription can utilize this feature. Still, this website cannot be said to promote phonetic transcription at the word level since there are no further explanations of sound-letter correspondences or any other related matters about which users may inquire. It might thus be argued that this feature of pronunciation (phonetic transcription) is underrepresented on the website. This can be regarded as an area to be improved, or it can be ignored considering the true essence of YouGlish, that is, concentration on the pronunciation of individual words/phrases. YouGlish might be claimed to serve as a phonetic concordancer, for it presents a copious number of videos related to the query item. With these features, users might familiarize themselves with the authentic usages of search items across various settings.
Another benefit of YouGlish relates to the learning of vocabulary through meaningful chunks embedded in various real-life contexts from academic to non-academic. This feature of YouGlish concurs with the findings of previous studies on the use of technology in enhancing vocabulary knowledge (Huang, 2015). YouGlish is a web-based corpus housing millions of videos, which makes it a practical tool for teaching vocabulary through corpora. In their meta-analysis, Boulton and Cobb (2017) revealed that corpus-based learning, or DDL, results in better learning outcomes than traditional approaches. Users might also benefit from YouGlish in terms of collocations, which play a significant role in both receptive and productive language processing (Farshi & Tavakoli, 2021) and assist fluency in both spoken and written language (McGuire, 2009). It has also been reported that corpora might be utilized to teach collocational competence (Masrai, 2022). All these findings support the utilization of YouGlish as a vast corpus for the teaching and learning of lexical items.
YouGlish might also contribute to the preparation of visual pronunciation dictionaries by lexicographers and the provision of manifold pronunciation variants of English to language learners. The pronunciation variants of words are generally identified through pronunciation preference polls (Oladipupo & Akinola, 2022). This method has its drawbacks and might be replaced by YouGlish, which, with its extensive database, exposes language users to the most frequent pronunciation variants in three mainstream English accents (McCarthy, 2018). It might also be utilized as an immense source for pronunciation lexicographers with its large-scale archive incorporating authentic lexical usage across manifold contexts. Considering the positive effect of visuals on vocabulary retention (Teng, 2023) and their conceivable contribution to facilitating the production of pronunciation samples (Chen, 2022), this feature of YouGlish might be regarded as significant.
More recently, researchers have studied the use of corpora to teach genre (Cotos et al., 2017), and YouGlish could be employed in this regard. With its large-scale archive of real-life videos across varied genres including music, sports, gaming, news, science, politics, etc., a genre-based corpus of spoken texts (videos) can be accumulated for future use. Teachers might profit from this corpus, for it includes massive amounts of topic-based input across diversified contexts. Additionally, English for specific purposes (ESP) vocabulary might be taught through specialized vocabulary lists of videos formed earlier. This feature of YouGlish, the use of corpora in teaching ESP vocabulary, overlaps with previous findings in the literature as well (Bancroft-Billings, 2020; Otto, 2021).
Last but not least, aural comprehension is another area that could be enhanced through YouGlish. Subject-specific or general listening comprehension exercises can be created by language teachers and implemented in classrooms. YouGlish might also be extra helpful for learners in terms of aural comprehension since it supplies them with captions placed under videos. The use of multimedia for the teaching of listening and vocabulary is advocated because contextualized visuals aid the understanding of input. Previous studies in the literature endorse the integration of captioned videos for their positive effect on aural comprehension and vocabulary learning (Hsieh, 2020; Teng, 2023).

PEDAGOGICAL CONSIDERATIONS
As with many technological aids for language education, YouGlish is not without some drawbacks, which fall into two categories: (1) those stemming from the website itself and (2) those related to its implementation in foreign language pedagogy.
The first concern for YouGlish could be the malfunctioning of the accents section; that is, videos may appear in jumbled order (McCarthy, 2018). This might solely affect those who use YouGlish specifically for the purpose of studying English accents. Also, the provision of only three accents of English on the website could be another shortcoming, for English has numerous regional accents and dialects (Schneider, 2020). In tandem with these weaknesses, the absence of a mechanism that allows users to distinguish between native and nonnative speakers is another issue to be considered.
With regard to the considerations for the implementation of YouGlish in language education, it might be asserted that this technological tool is appropriate for relatively advanced or autonomous learners when it is applied as a tool for self-paced learning. In other words, learners with lower-level proficiency might need the assistance of a 'more knowledgeable other' (Vygotsky & Cole, 1978). From a sociocultural perspective of learning, lower-level learners using YouGlish may need mediation as they take on more responsibility for their learning over time (Richards, 2015). Language teachers might accordingly get involved in this process of mediation. For relatively advanced or autonomous learners, it can be maintained that this scaffolding is supplied by the authentic and natural speech patterns accessed through YouGlish.
The fact that YouGlish concentrates primarily on the segmental pronunciation of English is another shortcoming of the website. Two issues arise from this statement. First, it remains unclear how well segmental pronunciation is mastered via YouGlish. Second, word stress is the only suprasegmental pronunciation feature handled on the website. Considering the significance of both segmental and suprasegmental features of pronunciation in terms of effective communication (Pennington, 2021; Suzukida & Saito, 2022; Yenkimaleki & van Heuven, 2021), YouGlish might be claimed to require reconsideration and improvement in its content, such as the addition of new features and the teaching of both types of pronunciation features.
Despite the need for amelioration, YouGlish is a fairly recent website; thus, more research is needed to make a critical evaluation of it. Thus far, the studies conducted on YouGlish have focused on its effectiveness for EFL learners' speaking competence (Fu & Yang, 2019), its use for the learning and retention of commonly mispronounced English words (Kartal & Korucu-Kis, 2020), its review (Karatay, 2017), its utilization as a tool for pronunciation dictionary lexicography (McCarthy, 2018), and the description and evaluation of the site's applications for English language learning and teaching (Miller, 2019). This leaves considerable room for further discussion and evaluation of YouGlish. Future research might therefore focus on its applications in English language learning and teaching across a variety of educational settings so that thorough evaluations can be made of YouGlish.

CONCLUSIONS
To conclude, this review delved into the exploitation of YouGlish as a powerful tool for the teaching and learning of L2 pronunciation by (1) outlining theoretical rationales behind this website, (2) explicating this technological tool, (3) portraying its usages in language education, and (4) enumerating the concerns for its application.
Descriptive in nature, this paper might also add to our understanding of how YouGlish can be utilized for educational purposes. It must, however, be emphasized that further research is needed to gain valuable insights into its implementation. It can thus be suggested that future research might focus on its extensive use in teaching both segmental and suprasegmental features of English pronunciation, the retention and learning of vocabulary in various contexts, and its applications as a phonetic concordancer.
Funding: No external funding was received for this article.
Ethics declaration: The author declared that the review does not require ethical approval since it does not include human participation and the information is freely available in the public domain (https://youglish.com/).

Declaration of interest:
The author declares no competing interests.
Availability of data and materials: All data generated or analyzed during this study are available from the author upon appropriate request.
Yenkimaleki, M., & van Heuven, V. J. (2021)