Voice Emotion Productions by Children and Adults with Cochlear Implants
Boys Town National Research Hospital
University of Nebraska-Lincoln
Jenni L. Sis, Auditory Prostheses & Perception Laboratory, Boys Town National Research Hospital & Department of Special Education and Communication Disorders, University of Nebraska-Lincoln.
This research was supported by NIH R01 DC014233 and the Human Subjects Recruitment Core of NIH P30 DC004662.
Correspondence concerning this article should be addressed to Jenni Sis, University of Nebraska-Lincoln, Lincoln, NE. Contact: [email protected]
Purpose: Despite the success of cochlear implant technology, one limitation is the insufficient transmission of pitch information. The recognition of voice emotion relies, at least in part, on the ability of the individual to extract voice pitch cues. Previous studies have shown deficits in voice emotion recognition by children with cochlear implants. However, little is known about their production of emotions. This study investigated how identifiable the happy and sad productions of children with cochlear implants are to children with normal hearing, compared them acoustically to the happy and sad productions of children with normal hearing, and compared the results to those of post-lingually deaf adults with cochlear implants.
Materials & Methods: Children with cochlear implants (implanted before age 2) and children with normal hearing each recorded emotion-neutral sentences in both a happy and a sad way. A group of children with normal hearing heard each production and indicated the intended emotion in a single-interval, two-alternative, forced-choice procedure (response choices: happy, sad), and a different group of listeners participated in a single-interval, five-alternative, forced-choice procedure (response choices: happy, sad, neutral, angry, scared). A group of normal hearing adults listened to the productions of adults with cochlear implants and adults with normal hearing and indicated the intended emotions in a two-alternative, forced-choice procedure. Acoustical analyses of the child productions were used to analyze variables including the mean and variance of the fundamental frequency (F0), and intensity.
Results: Significant deficits were shown in how recognizable children with cochlear implants’ happy/sad productions were relative to the children with normal hearing productions in both the two-alternative and the five-alternative tasks. While the children with cochlear implants showed deficits in their production of emotional prosody, the sentences they produced were highly intelligible. Acoustical analyses of the productions showed smaller contrasts between children with cochlear implants’ happy/sad productions compared to the children with normal hearing happy/sad productions.
Conclusions: These results showed that while present-day cochlear implants and clinical habilitation protocols allow pediatric cochlear implant users to achieve excellent phonetic speech production, significant deficits remain in their production of speech prosody. Children with cochlear implants may benefit from early intervention services with a focus on prosodic aspects of speech.
Keywords: cochlear implants, emotion production, perception, prosody, emotion recognition
Voice Emotion Productions by Children and Adults with Cochlear Implants
In a child’s developmental years, communication and peer-to-peer interactions are essential for social development. To engage in social interactions, a child must understand the message and the speaker’s intent, mood, and emotion, and then respond appropriately. Speech integrates both linguistic and indexical information. The linguistic information contains the content, or “what is being said,” and the indexical information carries the emotion, dialect, or gender (Levi & Pisoni, 2007; Munsen et al., 2006). Although cochlear implants (CIs) are successful in supporting the recognition of speech, or “what is being said,” studies have shown that children born with hearing loss and children implanted with CIs have deficits in the ability to recognize emotion, or “how it is said” (Murray & Arnott, 1993; Chatterjee et al., 2015). Understanding emotion is important in building relationships, helps with self-awareness, and can impact social cognition and proper social development (Cutting & Dunn, 1999; Eisenberg et al., 2010). As early as 7 months (Grossmann, 2010), a child can discern different facial cues of the people around them, and children first begin expressing emotion through imitation (Eisenberg et al., 2010).
As the brain develops, children learn to understand emotion through experiences and use this knowledge to engage in social interactions with others. Knowledge of emotion appears as early as 5 years of age and continues to improve with age (Sauter et al., 2013). Hearing, along with vision, provides important cues for emotion recognition. Emotion understanding is a multisensory process that integrates auditory and visual cues and begins in early development (Stein & Rowland, 2011; Fengler et al., 2017). However, when visual cues are not available, such as in telephone conversations, in dimly lit environments, or when obstacles obscure the face, the listener must rely on auditory cues alone. Children born with hearing loss have decreased access to the auditory cues that help convey emotions during the critical time for emotional development.
Children born deaf or with a severe hearing impairment are without acoustic hearing and have poor access to the auditory cues that aid in the development of speech. When sensory amplification, such as a hearing aid, is ineffective, a cochlear implant (CI) can give children with severe to profound hearing loss sound awareness and the ability to communicate through spoken language (Kirk et al., 2002). Over 26,000 CIs have been implanted in children in the US (NIH, 2013), in some as young as 12 months (FDA). Unlike their normal hearing peers, who learn through acoustic signals, CI users must learn language through a degraded signal. Bypassing the cochlear hair cells, electrodes surgically implanted in the cochlea deliver electrical impulses directly to the auditory nerve. The brain of a child with a cochlear implant integrates the information received and develops language through this electrical signal. Children who were born without the ability to hear are now able to recognize speech and communicate through spoken language. The success in language development through a degraded electrical input could be due to the plasticity of a young brain. Sharma et al. (2002) reported that the sensitive period for plasticity is greatest at a younger age and declines as the child ages. Studies have shown that pre-lingually deaf children implanted at an early age have a reduced risk of experiencing auditory deprivation, which decreases the risk of developing severe delays in language and verbal intelligence (Bracken & Cato, 1986).
Although CIs are successful in the recognition of speech, one important technological limitation is the insufficient transmission of voice pitch information, which impacts emotion perception and production. Voice emotion is conveyed auditorily through suprasegmental aspects or features of speech, which include pitch, rate, and intensity. These features overlay phonetic speech to provide additional information such as the mood and the intended emotion of the speaker (Prosodic Features), or the “how it is said”. Voice emotion recognition depends, at least in part, on one’s ability to extract voice pitch cues and distinguish dynamic pitch changes. Murray and Arnott (1993) reported that the pitch envelope is the most important cue used in discerning emotions (anger, happiness, fear, sadness, disgust). Pitch changes occur as the fundamental frequency (i.e., the rate of vocal fold vibration) varies during speech and are one of the most reliable cues for intonation (Peng et al., 2012). The normal hearing cochlea contains narrow auditory nerve filters, and the specific tonotopic organization from the cochlea up through the auditory cortex allows perception of fine differences in pitch (Leaver & Rauschecker, 2016; Von Bekesy, 1949). CI technology allows CI users to obtain some pitch information from the limited electrode channels located in the area of the cochlea corresponding to that portion of the speech signal. However, the wide spread of electrical current away from the electrodes (Shannon, 1983) across the remaining auditory neurons causes poor pitch sensitivity. The fine structure of the signal is absent, leaving only the temporal envelope, which does not contribute to fine pitch perception and sensitivity (Shannon, 1983; Chatterjee & Peng, 2008), making identification of emotion difficult.
Due to the poor transmission of pitch in CIs, the inability to identify harmonic pitch (Zeng, 2002) may cause deficits in the ability to recognize and produce speech prosody (Deroche et al., 2016; Chatterjee & Peng, 2008), to distinguish questions from statements (Peng et al., 2008), and to recognize lexical tones (Barry et al., 2002; Peng et al., 2004; Peng et al., 2017), as well as difficulty with music and pitch perception (Kong et al., 2004).
Insufficient pitch transmission in CIs impacts the ability to recognize dynamic pitch changes. Deroche et al. (2016) demonstrated that children with CIs had deficits in the ability to identify dynamic changes in pitch compared to their normal hearing (NH) peers. Some languages, such as Mandarin, depend heavily on lexical tone to differentiate words semantically; that is, changing the fundamental frequency changes the lexical meaning. Peng et al. (2004) reported that children with CIs (CCIs) had difficulty producing and identifying different lexical tones, which could be due to the poor coding of voice pitch information. The loss of spectro-temporal detail also results in poor transmission of the dynamic pitch changes needed to distinguish questions from statements. Peng et al. (2008) reported that question/statement tasks were more difficult for CI users because of poor identification of dynamic changes in the fundamental frequency at the end of the sentence. This difficulty may carry into the school years and impact academic performance.
Previous studies have shown deficits in voice emotion recognition by CCIs compared to normal hearing peers. A previous study in the Auditory Prosthesis & Perception Lab (Chatterjee et al., 2015) investigated voice emotion recognition by CCIs and adults with cochlear implants (ACIs) and compared them to children with normal hearing (CNH) and adults with normal hearing (ANH). CCIs and ACIs performed more poorly than normal hearing children and adults when listening to productions from a female and a male talker in five different emotions (neutral, angry, happy, sad, scared). Luo et al. (2007) found the same deficiency in emotion recognition performance in CCIs compared to CNH when listening to emotions (angry, happy, anxious, sad, neutral). This study also suggested that CI users rely more on intensity and rate cues to discern voice emotions because of their limited access to salient pitch cues. Due to the poor pitch input from the device, there may be a heavier reliance on visual cues than on auditory cues in emotion recognition (Fengler et al., 2017; Ludlow et al., 2010). However, deficits have been reported even in facial emotion recognition by CCIs. Wiefferink et al. (2012) reported that even with auditory stimulation, CCIs have reduced emotion understanding and recognition, possibly due to the lack of auditory stimulation at an early age.
A few studies have reported on the productions of emotion by CCIs. Studies have shown deficits in CCIs’ imitative productions of emotion (Nakata et al., 2012; Wang et al., 2013; Mildner & Koska, 2014). In Nakata et al. (2012), the CCIs heard a model’s utterance and were then asked to imitate the model’s emotion as closely as possible. NH adult students rated on a 10-point scale how closely each utterance matched the emotion of the model’s utterance. The CCIs’ imitations of voice emotion were rated significantly lower than those of their NH peers. In the Wang et al. (2013) study, CCIs and CNH were asked to imitate as closely as possible a modeled child’s utterances in a “happy” and a “sad” emotion. Adult listeners heard and rated the closeness of each production to the model utterance. When the imitative sentences were low-pass filtered to eliminate any phonetic cues, the CCIs’ emotions remained harder to identify than those of their NH peers. The CCIs’ imitations scored lower than the CNHs’ imitations, and CCIs were less accurate in reproducing prosodic cues of emotion.
Due to the assumption that perception leads to production (Edwards, 1974; Purvis et al., 2001), the difficulty CI users have in perceiving vocal emotion, caused by the lack of dynamic pitch cues, could suggest difficulties in their production of vocal emotion. However, there is a lack of evidence showing a relationship between emotion perception and production in this area of research. Peng et al. (2004) reported no correlation between lexical tone perception and production; however, Peng et al. (2008) showed a significant correlation between question/statement production and perception. Both studies take into account the ability to hear and use voice prosody.
In cochlear implant users, the ability to correctly identify emotions is positively correlated with a higher self-reported quality of life (Schorr et al., 2009) and improved social development (Eisenberg et al., 2010). The perception of emotion by normal hearing children and children with CIs plays an important role in social and developmental aspects of life (Eisenberg et al., 2010). Socially, CIs have been shown to help with social knowledge (Kusche, Garfield, & Greenberg, 1983) and social adjustment (Vernon & Greenberg, 1999). Given the importance of emotion understanding in young children, the previous studies on emotion recognition by CCIs, and the limited studies on imitative productions of emotions, we wanted to investigate the emotion productions of CCIs.
The purpose of this study was to identify how well the emotions (happy/sad) produced by CCIs were recognized by their normal hearing peers, since emotion understanding is necessary for the proper development of social relationships. Speech intelligibility measurements were included to confirm that the sentences produced by the CCIs were intelligible, so that emotion production was the measure of interest. Since CCIs have limited access to pitch cues through their device, the study acoustically analyzed and compared the CCIs’ productions of happy/sad emotions to the productions of CNH. Unlike CCIs, adults with CIs (ACIs) learned language acoustically and lost their hearing at an older age, so their development of emotion was learned through acoustic cues. The study therefore also compared the emotion productions of the two CI groups (CCIs and ACIs).
We hypothesized that 1) CCIs’ productions would be less identifiable to normal hearing peers than CNHs’ productions, 2) ACIs’ productions would be just as identifiable as ANHs’ productions, 3) the intelligibility of the CCIs’ speech would not play a role in identifying their intended emotions, and 4) acoustic analyses would confirm smaller contrasts between emotions in the CCIs’ productions than in the CNHs’ productions.
This study focused on the auditory aspects of the emotions “happy” and “sad” because of their highly contrastive suprasegmental properties and the ease with which a young child could identify them. Dynamic changes in pitch were of interest during acoustic analyses in this study due to the limited access to salient pitch cues in CI technology. Dynamic changes in pitch differ between the emotions “happy” and “sad” such that “sadness” has a lower and narrower pitch range with fewer pitch fluctuations, while “happiness” has a higher and wider pitch range with more pitch fluctuations (Murray & Arnott, 1993).
Group 1 included seven child talkers with CIs (2 male, 5 female, age range: 7.0-18.14 years, mean age: 11.74, median age: 11.89, s.d.: 4.27, mean duration of CI use: 10.34 years) from the lab’s previous data collection (Damm & Chatterjee, 2016), group 2 included six child talkers with CIs (2 male, 4 female, age range: 7.9 – 18.49 years, mean age: 13.50, median age: 14.025, s.d.: 4.07, mean duration of CI use: 13.02 years), and group 3 included nine child talkers with NH (4 male, 5 female, age range: 6.56-18.1 years, mean age: 12.5, median age: 12.86, s.d.: 4.37) (Damm & Chatterjee, 2016). These groups recorded the material used in this study. The CCIs were pre-lingually deaf and implanted before the age of two years, because children implanted at an earlier age show significantly greater language acquisition and growth than later implanted children (Tomblin et al., 2005; Tye-Murray et al., 1995) (Tables 1 – 2).
Adult talkers included nine normal hearing adults (ANH) (3 male, 6 female, age range: 21 – 45 years) and ten post-lingually deaf adults implanted with CIs (ACI) (4 male, 6 female, age range: 27 – 75 years), who recorded the same production stimuli (Table 3).
All productions were recorded in the Auditory Prosthesis and Perception laboratory (APPL) at Boys Town National Research Hospital. All information was obtained in a written subjective case history form from the talkers or the talkers’ parents prior to participating.
Table 1: CCI Group 1
Child CI Participant  Age at Testing (years)  Age at Implantation (years)  Duration of CI Use (years)  Gender  Manufacturer/Device  Pre/Postlingual Deafness
CICH02                18.14                   2                            16.14                       Male    Cochlear Nucleus     Prelingual
CICH03                11.89                   1.4                          10.49                       Female  Advanced Bionics     Prelingual
CICH13                7.72                    0.83                         6.89                        Female  Advanced Bionics     Prelingual
CICH18                17.2                    1.7                          15.5                        Female  Advanced Bionics     Prelingual
CICH19                7                       0.9                          6.1                         Female  Advanced Bionics     Prelingual
CICH20                7.6                     1.1                          6.5                         Male    Advanced Bionics     Prelingual
CICH22                12.62                   1.86                         10.76                       Female  Advanced Bionics     Prelingual
Table 2: CCI Group 2
Child CI Participant  Age at Testing (years)  Age at Implantation (years)  Duration of CI Use (years)  Gender  Manufacturer/Device  Pre/Postlingual Deafness
CICH35                12.73                   1                            11.73                       Male    Advanced Bionics     Prelingual
CICH36                16.27                   1.5                          14.77                       Female  Med-El               Prelingual
CICH37                18.49                   1.5                          16.99                       Female  Advanced Bionics     Prelingual
CICH38                7.9                     1.25                         6.65                        Female  Cochlear Nucleus     Prelingual
CICH39                16.61                   1.17                         15.44                       Female  Advanced Bionics     Prelingual
CICH40                14.025                  1.5                          12.53                       Male    Advanced Bionics     Prelingual
Table 3: Adult CI
Adult CI Participant  Age at Testing (years)  Age at Implantation (years)  Duration of CI Use (years)  Gender  Manufacturer/Device  Cause of Hearing Loss
C01                   37                      31                           6                           Female  Advanced Bionics     Meningitis
C03                   67                      55                           12                          Male    Advanced Bionics     Noise exposure
C05                   68                      63                           5                           Female  Advanced Bionics     Meningioma
C06                   75                      55                           20                          Female  Advanced Bionics     Unknown
C07                   68                      67                           1                           Female  Advanced Bionics     Family history
N5                    53                      50                           3                           Female  Cochlear             Sudden hearing loss
N6                    51                      44                           7                           Male    Cochlear             Noise exposure/mumps
N7                    57                      51                           6                           Female  ***                  Unknown
N15                   61                      ***                          61                          Male    Cochlear             Unknown
N16                   27                      25                           2                           Male    Cochlear             Unknown
*** Information not provided in the case history.
The CCIs, CNH, ACIs, and ANH talkers read 20 simple emotion-neutral sentences (e.g., “This is it,” “She is back,” “It’s my turn”) in a happy way and the same 20 sentences in a sad way for a total of 40 sentences. Each talker said the 20 sentences with a happy emotion three times, followed by saying the sentences with a sad emotion three times. Emotion-neutral sentences were chosen to avoid any bias during emotion production and during the task. No training, modeling, or feedback was given to the talker.
All of the productions were recorded in a sound booth using an AKG C 2000 B microphone. The talkers sat in a chair 12 inches from the microphone, which was aligned with the talker’s mouth. Adobe Audition software (sample rate: 44,100 Hz, mono-channel, 16-bit) was used to record the productions at a recording level of -12 dB. Using Praat software, the 20 happy sentences and 20 sad sentences were trimmed and saved into individual files, for a total of 40 stimuli per talker. The second production of each recorded sentence was used unless the production included non-speech artifacts, in which case the third or first production was used. The productions were put through a 75 Hz high-pass filter in Adobe Audition.
Listeners were seated in a sound-treated booth directly in front of a sound field loudspeaker and computer screen. Using the software Emognition 2.0.3 (BTNRH, 2014), calibration was performed before each talker’s set of sentences was played. The 1 kHz calibration tone was based on the average root mean square intensity of the talker’s production set. The tone was played from the loudspeaker and the volume was adjusted to 65 dB SPL at the participant’s left ear, based on the reading of the sound level meter, to keep conditions consistent across participants. There was no training prior to testing and no feedback was given throughout the test. Each sentence was presented twice in a randomized order. The participants indicated emotions using the Emognition 2.0.3 software. Each emotion word (e.g. “happy”, “sad”), along with a corresponding black and white emotion smiley face, was placed in a white box, and the boxes were arranged vertically on the right side of the computer screen (happy on top, sad on bottom). Participants indicated their perceived emotion by using the computer mouse to click on the appropriate box. Reaction times, confusion matrices, and percent correct scores were recorded.
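The calibration step can be illustrated with a short sketch. It assumes each production is available as a list of samples scaled to the range -1 to 1; the function names (`rms`, `mean_rms`, `calibration_tone`) and the choice to average per-recording RMS values are illustrative assumptions, not a description of how Emognition computes its tone.

```python
import math

def rms(samples):
    """Root mean square amplitude of one recording."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mean_rms(productions):
    """Average the per-recording RMS values over a talker's production set."""
    return sum(rms(p) for p in productions) / len(productions)

def calibration_tone(target_rms, freq_hz=1000.0, sample_rate=44100, duration_s=1.0):
    """Generate a sine tone whose RMS matches target_rms.

    A sine with peak amplitude A has RMS A / sqrt(2), so A = target_rms * sqrt(2).
    """
    amplitude = target_rms * math.sqrt(2.0)
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * k / sample_rate)
            for k in range(n)]

# Hypothetical production set: two short made-up "recordings"
productions = [[0.1, -0.1, 0.1, -0.1], [0.3, -0.3, 0.3, -0.3]]
target = mean_rms(productions)   # average RMS across the set
tone = calibration_tone(target)  # 1 kHz tone at the same RMS
```

Matching the tone's digital RMS to the production set only fixes the relative level; the absolute presentation level (65 dB SPL here) still has to be set at the listener's position with a sound level meter, as described above.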
In task 1, twenty-one CNH listeners (9 male, 12 female, age range: 6.84-18.49, mean age 13.77, s.d.: 3.167, median 14.83) listened to group 1 CCI productions in a single-interval, two-alternative forced-choice (2AFC) procedure (Damm & Chatterjee, 2016). Eleven CNH listeners (7 male, 4 female, age range 6.49-16.88) listened to group 3 CNH productions in a single-interval, 2AFC procedure (Damm & Chatterjee, 2016). A new set of 22 CNH participants (13 male, 9 female, age range: 6.88 – 18.15 years; mean: 11.02; median: 10.31; s.d.: 3.24) listened to group 2 CCI productions in a single-interval, 2AFC procedure.
In task 2, the procedure was set up similarly to task 1. In this task, 11 CNH listeners (4 male, 7 female, age range: 7.87 – 12.91, mean age: 10.17 years, s.d.: 1.76, median: 9.90 years) listened to group 3 CNH productions and a different set of 11 CNH listeners (4 male, 7 female, age range: 7.24 – 13.97, mean age: 10.32, s.d.: 2.05, median: 10.24) listened to group 1 CCI productions and indicated the intended emotion in a single-interval, 5AFC procedure (choices: happy, scared, neutral, sad, angry). Participants were unaware that the sentences were spoken only in a happy or sad way. Each CNH listener heard either the group 1 CCI or the CNH productions, not both, to avoid possible fatigue over a long period of testing. The first 10 of the 22 CNH listeners in task 2 were counterbalanced by which set of productions they would listen to; for example, one participant would listen to the CNHs’ productions, then the next participant would listen to the CCIs’ productions. Because of the resulting differences in ages between the listener groups, the remaining CNH listeners were age-matched to balance the listener age groups. Eleven ANH (1 male, 9 female, age range: 19.33-25.03) listened to a randomized set of both group 1 CCI and group 3 CNH productions in a single-interval, 5AFC procedure.
In task 3, a group of 12 adults (5 male, 7 female, age range 20 – 31 years) listened to either the ACI or the ANH productions in a single-interval 2AFC procedure; six (age range: 20-21 years) listened to the ACI productions and six (age range: 21-31 years) listened to the ANH productions.
All listener participants were screened at a level of 20 dB HL for normal hearing from 250 – 8000 Hz. Participants were recruited using Boys Town National Research Hospital volunteer database and tested at Boys Town National Research Hospital in Omaha, NE.
A group of ten normal hearing adults listened to and repeated back twenty sentences randomly chosen from the CCIs’ happy/sad production sets. The total words correct were recorded by a research assistant for each speaker and converted to total percent words correct.
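The conversion to percent words correct can be sketched as follows. The per-sentence tallies are hypothetical, and pooling words across sentences before dividing (rather than averaging per-sentence percentages) is an assumption; the function name `percent_words_correct` is illustrative.

```python
def percent_words_correct(scored_sentences):
    """Pool word tallies over sentences and convert to a percentage.

    scored_sentences: list of (words_correct, words_total), one pair per sentence.
    """
    correct = sum(c for c, _ in scored_sentences)
    total = sum(t for _, t in scored_sentences)
    if total == 0:
        raise ValueError("no words to score")
    return 100.0 * correct / total

# Hypothetical tallies for one talker: four three-word sentences
tallies = [(3, 3), (3, 3), (2, 3), (3, 3)]
score = percent_words_correct(tallies)  # 11 of 12 words repeated correctly
```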
Acoustic Analyses of Productions and Statistical Analysis
Acoustic analyses were conducted using Praat (Boersma, 2001) software. Statistical analyses and graphs were completed using the software R (R Core Team, 2014), using packages lme4 (Bates et al., 2015) and ggplot2 (Wickham, 2009).
Percent correct scores obtained from CNH participants listening to CCIs’ productions were significantly lower than those from CNH participants listening to CNHs’ productions (p=0.016). Figure 1 includes data previously collected in the Auditory Prosthesis and Perception Lab (Damm & Chatterjee, 2016) and the data collected for this capstone paper. On the x-axis, the “CI Talker” group includes group 1 and 2 CCIs’ productions. The y-axis represents the percent correct scores the child listeners received when listening to the productions of the CCIs and CNH. The child listeners obtained a mean score of 83.87% (median = 87.9%, min = 59.82%, max = 96.61%) when listening to the CCIs’ productions and a mean score of 95.02% (median = 97.11%, min = 81.59%, max = 99.09%) when listening to the CNHs’ productions.
In our lab’s previous statistical analysis (Damm & Chatterjee, 2016) using group 1 CCI productions, listener and talker effects were identified. Figures were not included in this paper. A listener age effect showed that younger children had greater difficulty recognizing the talkers’ intended emotion, with older child listeners having higher percent correct scores. The CCIs with an earlier age of implantation (AOI) had improved emotion production scores. Children implanted at a later date, with longer device experience, produced more recognizable emotions. Initial analyses showed no effects of talkers’ chronological age once AOI and TIS (time in sound, or experience with the device) were taken into account.
[Figure 1 plot. Y-axis: Emotion Recognition Score (% Correct). Labels: Talkers aged 6-18 years; Listeners aged 6-18 years.]
Figure 1: Scores obtained with CCIs’ productions were significantly lower than those with CNHs’ productions.
Percent correct scores with adults listening to ACIs’ and ANHs’ productions showed that both groups were highly recognizable. However, ACIs’ productions were less recognizable than ANHs’ productions (independent t-test, p=0.019). Adults listening to ACIs’ productions obtained a mean score of 96.23% (median = 96.15%, min = 92.5%, max = 99.17%). The adults listening to the ANHs’ productions obtained a mean score of 98.58% (min = 96.67%, max = 99.79%) (Figure 2).
[Figure 2 plot. Y-axis: Emotion Recognition Scores (% Correct). Label: Adult Talkers, Adult Listeners.]
Figure 2: Scores obtained with ACIs’ and ANHs’ productions were highly recognizable to adult NH listeners. ACIs’ productions were recognizable but less so than ANHs’ productions (independent t-test, p=0.019).
In Task 2, the chance level is 20% (as the listeners have five choices). Consistent with the results of Task 1, performance was better with the CNHs’ productions than with the CCIs’ productions (Figure 3). This was statistically verified for the CNH listener data (independent samples t-test, p=0.01) and for the ANH listener data (LME analysis, F(1, 158)=84.17, p<0.001). The mean proportion correct, computed from the confusion matrices and averaged across both happy and sad productions, is indicated by the squares and circles; whisker bars show the standard error (s.e.). The children who listened to the CCIs’ productions obtained a mean score of 40.71% (s.e. = .062). The children who listened to the CNHs’ productions obtained a mean score of 59.34% (s.e. = .041). The adults who listened to the CCIs’ productions obtained a mean score of 40.54% (s.e. = .070) and the adults who listened to the CNHs’ productions obtained a mean score of 70.4% (s.e. = .051).
Figure 3: In Task 2, the chance level is 20% (as the listeners have five choices). Consistent with the results of Task 1, performance in this 5AFC task was better with the CNHs’ productions than with the CCIs’ productions.
[Figure 4 plot. Y-axis: Total Number (out of 20). Panels: Adult NH Listeners; Child NH Listeners.]
Figure 4: Confusion Matrices: The mean and s.e. of the total number of happy (red) and sad (blue) sentences produced by group 1 CCIs assigned by the listeners to each of the five emotion choices are shown.
Figure 4 shows results from the confusion matrices. The red circles indicate how many times “happy” was identified as each emotion listed on the x-axis. The blue triangles indicate how many times “sad” was identified as each emotion. The total number of sentences in each of the two categories was 20. Ideally, there should be peaks at “Happy” and “Sad”, with very few confusions. This is the pattern observed in the bottom plots with CNHs’ productions (Tables 4 and 5). In the CNH productions, “happy” was labeled as “happy” approximately 75% of the time and as “neutral” only 15% of the time. Listeners scored lower when identifying “sad” emotions; however, sad productions were still recognizable. Happy and sad emotions were significantly different from neutral (happy: t-test, p = .000031; sad: t-test, p = .000174). “Sad” was labeled as “sad” 65.9% of the time, followed by “neutral” at 14.7%, “scared” at 7.9%, “angry” at 7.3%, and “happy” at 4.14%. On the other hand, the CCIs’ productions show a more diffuse picture, with smaller peaks at “Happy” and “Sad” and more confusions, as seen in the top two panels. Preliminary analyses (t-tests) showed no significant differences between the proportion of “happy” sentences assigned to the “happy” and “neutral” categories by child listeners, while “sad” sentences fared somewhat better (a marginally significant difference between “sad” and “neutral”, p=0.058). Similarly, NH adults confused “happy” and “sad” productions by CCIs most frequently with the “neutral” emotion (no significant difference).
Intended emotion  Happy     Scared    Neutral   Sad       Angry
Happy             75.08333  3.75000   15.30556  0.38889   5.47222
Scared            0         0         0         0         0
Neutral           0         0         0         0         0
Sad               4.13889   7.91667   14.72222  65.91667  7.30556
Angry             0         0         0         0         0
Table 4: Confusion matrices for NH listeners listening to CNH productions show the mean percent correct (%). The emotions in the second column indicate the intended spoken emotion. The emotions on the top row indicate the listeners’ response. “Happy” was identified as “happy” most of the time, followed by “neutral”. “Sad” was identified as “sad” most of the time, followed by “neutral”.
Intended emotion  Happy     Scared    Neutral   Sad       Angry
Happy             37.75     7.214286  37.60714  3.821429  13.60714
Scared            0         0         0         0         0
Neutral           0         0         0         0         0
Sad               7.25      11.32143  36.07143  43.32143  2.035714
Angry             0         0         0         0         0
Table 5: Confusion matrices for NH listeners listening to CCIs’ productions show the mean percent correct (%). The CCIs’ emotions were harder to identify compared to the CNHs’ productions. “Happy” was labeled both as “happy” and “neutral”. Happy and sad emotions are not significantly different from neutral.
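The tabulation behind Tables 4 and 5 can be sketched as below: each trial pairs the talker's intended emotion with the listener's response, and each row is expressed as a percentage of that intended emotion's trials. The trial data are hypothetical, and the function name `confusion_percentages` is an illustrative assumption rather than the lab's analysis code.

```python
EMOTIONS = ["happy", "scared", "neutral", "sad", "angry"]

# With five response choices, guessing yields 20% correct.
CHANCE_5AFC = 100.0 / len(EMOTIONS)

def confusion_percentages(trials):
    """Build row-normalized confusion percentages.

    trials: list of (intended, response) pairs.
    Returns {intended: {response: percent of that intended emotion's trials}}.
    """
    counts = {}
    for intended, response in trials:
        row = counts.setdefault(intended, {e: 0 for e in EMOTIONS})
        row[response] += 1
    table = {}
    for intended, row in counts.items():
        n = sum(row.values())
        table[intended] = {e: 100.0 * c / n for e, c in row.items()}
    return table

# Hypothetical responses to 20 happy and 20 sad sentences
trials = (
    [("happy", "happy")] * 15 + [("happy", "neutral")] * 4 + [("happy", "angry")]
    + [("sad", "sad")] * 13 + [("sad", "neutral")] * 5 + [("sad", "scared")] * 2
)
table = confusion_percentages(trials)
```

Because talkers produced only happy and sad sentences, only those two rows are populated, which is why the remaining rows of Tables 4 and 5 are all zeros.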
CCI production scores showed high intelligibility (median = 97% correct; whisker bars represent the maximum and minimum scores, 100% and 91% correct, respectively) (Figure 5).
[Figure 5 plot. Y-axis: Speech Intelligibility Score (% words recognized).]
Figure 5: Intelligibility: CCI production scores showed high intelligibility when measuring the percent of words recognized by NH adult listeners.
Results from the acoustic analysis are shown in box plots in Figures 6, 6.1, and 6.2. The y-axis represents the ratio of mean F0, the ratio of F0 range, and the intensity difference (dB) for each talker’s “happy” production compared to the same talker’s “sad” production, for each sentence (x-axis). In each figure, green and blue indicate the CNH and CCI groups, respectively. Acoustic analyses showed smaller F0 mean contrasts (t(19)=3.44, p=0.003), F0 range contrasts (t(19.31)=3.64, p=0.002), and intensity differences (t(19)=3.32, p=0.004) between “happy” and “sad” emotions in the productions of the CCI talkers than in those of the CNH talkers. Linear mixed-effects models with subject- and sentence-based random intercepts showed significant effects of Group (CCI, CNH) in all three measures. We note that the CNH also show larger inter-subject variability in all three measures than the CCIs.
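The three happy/sad contrast measures can be illustrated with a minimal sketch. It assumes F0 tracks (in Hz, voiced frames only) and a mean intensity (in dB) have already been extracted for each sentence, for example with Praat; the F0 range is taken as max/min, following the axis label of Figure 6.1. The function names and all values are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def f0_range(f0_track):
    """F0 range expressed as the ratio max/min of the voiced F0 samples (Hz)."""
    return max(f0_track) / min(f0_track)

def happy_sad_contrasts(happy_f0, sad_f0, happy_db, sad_db):
    """Contrast one talker's happy and sad productions of the same sentence."""
    return {
        # Ratio of mean F0, happy:sad (1.0 means no contrast)
        "mean_f0_ratio": mean(happy_f0) / mean(sad_f0),
        # Ratio of F0 ranges, happy:sad
        "f0_range_ratio": f0_range(happy_f0) / f0_range(sad_f0),
        # Intensity difference in dB, happy minus sad (0.0 means no contrast)
        "intensity_diff_db": happy_db - sad_db,
    }

# Hypothetical F0 tracks (Hz) and mean intensities (dB) for one sentence
happy_f0 = [300.0, 400.0, 200.0, 300.0]
sad_f0 = [200.0, 240.0, 160.0, 200.0]
contrasts = happy_sad_contrasts(happy_f0, sad_f0, happy_db=70.0, sad_db=64.0)
```

Using ratios for the F0 measures and a difference for intensity keeps each contrast independent of a talker's overall voice pitch and recording level, so talkers of different ages and sexes can be pooled per sentence.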
[Figure 6 axis label: Ratio of Mean F0, Happy:Sad]
Figure 6: The mean F0 of each CCI and CNH happy sentence was compared to the corresponding sad sentence and averaged across the speaker groups. The figure above shows the average ratio of F0 of happy and sad productions per sentence. Each of the CNHs’ sentences had a larger difference of mean F0 between happy and sad sentences compared to the CCIs’ sentences.
[Figure 6.1 axis label: Ratio of F0 Range (max/min F0), Happy:Sad]
Figure 6.1: The F0 range of each CCI and CNH happy sentence was compared to the corresponding sad sentence and averaged across the speaker groups. The figure above shows the average ratio of F0 range of happy and sad productions per sentence. The CNHs’ sentences showed a larger difference of F0 range between happy and sad sentences compared to the CCIs’ sentences, but greater variability was seen across sentences.
[Figure 6.2 axis label: Intensity Difference in dB, Happy – Sad]
Figure 6.2: The intensity (dB) of each CCI and CNH happy sentence was compared to that of the corresponding sad sentence and averaged across the speaker groups. The figure above shows the average intensity difference (dB) between happy and sad productions per sentence. The CNHs’ sentences showed greater intensity differences between happy and sad sentences than the CCIs’ sentences; that is, the CCIs’ productions showed smaller intensity contrasts between their happy and sad sentences.
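The three acoustic contrasts plotted in Figures 6, 6.1, and 6.2 can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: F0 range is taken as max/min F0 within a sentence (as the Figure 6.1 axis label suggests), all function names and numbers are hypothetical, and the group comparison uses a Welch (unequal-variance) t statistic as one plausible choice, since the text does not state which t-test variant was used.

```python
import math
from statistics import mean, variance


def f0_mean_ratio(happy_f0, sad_f0):
    """Happy:sad ratio of mean F0 (Hz) for one talker and sentence."""
    return mean(happy_f0) / mean(sad_f0)


def f0_range_ratio(happy_f0, sad_f0):
    """Happy:sad ratio of F0 range, with range assumed to be max/min F0."""
    happy_range = max(happy_f0) / min(happy_f0)
    sad_range = max(sad_f0) / min(sad_f0)
    return happy_range / sad_range


def intensity_difference_db(happy_db, sad_db):
    """Happy minus sad mean intensity, in dB."""
    return mean(happy_db) - mean(sad_db)


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical F0 tracks (Hz) and intensities (dB), illustrative only.
ratio = f0_mean_ratio([300, 320, 340], [200, 210, 220])
range_ratio = f0_range_ratio([200, 400], [200, 300])
diff_db = intensity_difference_db([70, 72], [64, 66])

# Hypothetical per-talker happy:sad mean F0 ratios for each group.
cnh_ratios = [1.5, 1.3, 1.6, 1.4]
cci_ratios = [1.1, 1.2, 1.0, 1.1]
t_stat, dof = welch_t(cnh_ratios, cci_ratios)

print(f"mean F0 ratio = {ratio:.3f}, range ratio = {range_ratio:.3f}")
print(f"intensity difference = {diff_db} dB")
print(f"Welch t = {t_stat:.2f}, df = {dof:.2f}")
```

A ratio near 1 (or a difference near 0 dB) means the talker made little acoustic distinction between the two emotions, which is the pattern reported for the CCI group. The mixed-effects models mentioned in the text would need a dedicated statistics package and are not sketched here.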
The purpose of this study was to investigate how recognizable CCIs’ productions of sentences said in a “happy” and “sad” way were compared to those of their normal hearing peers. In addition, we acoustically compared them to the productions of CNH. Although the CCI talkers’ productions were highly intelligible in terms of phoneme/word recognition, their intended emotions were significantly harder to identify than the CNH talkers’ emotions. Our first hypothesis was that CCIs’ productions would be less recognizable than those of their NH peers. Results from Task 1 showed a significant difference in how recognizable the emotion productions of CCIs were compared to the productions of CNH. Greater variability was seen in the recognition scores for the CCIs’ productions than in those for the CNHs’ productions. In Damm & Chatterjee’s (2016) analysis of group 1 CCI productions, the listener’s age was a significant predictor of performance; i.e., younger CNH listeners identified the CCI talkers’ intended emotions more poorly than older CNH listeners did. Other predictors included the talker’s age at implantation, time in sound, and year of implantation. It is noteworthy that even though all group 1 CCI participants were implanted before the age of 2, there was still an effect of age at implantation.
Many studies show benefits of early implantation due to neural plasticity. Studies have shown speech and vocabulary benefits (Connor et al., 2006) and better speech intelligibility if a child is implanted between the ages of two and five years compared to implantation at five to eight years (Tye-Murray et al., 1995). The benefit of early implantation is suggestive of greater neural plasticity, as measured by cortical auditory evoked potentials. The older a child is at implantation, the greater the risk of auditory deprivation and potential deficits in language development. Providing a child with access to speech and language during the critical period of language development shapes the way the child perceives and produces language through the strengthening of neural connections (Purves et al., 2001).
Data from Task 2 confirmed the findings from Task 1. CCIs’ intended emotions were harder to recognize than the CNH talkers’ intended emotions. Adult and child listeners performed similarly when listening to the CCI and CNH productions. However, for both adult and child listeners, CCIs’ productions were less recognizable when more response options were available. Analyses of the confusion matrices suggested that child and adult listeners mislabeled CCI talkers’ productions as “neutral” more frequently, suggesting greater uncertainty about the intended emotion.
We hypothesized that ACIs’ productions would be as identifiable as ANHs’ productions, since post-lingually deaf adults learn speech through acoustic cues and lose their hearing later in life. Although the post-lingually deaf adults’ productions were highly recognizable, they were still significantly less recognizable than those of their normal hearing counterparts. Further investigation is needed to better understand this significant difference between post-lingually deaf and normal hearing adults.
Our hypothesis that the CCIs’ productions would show smaller acoustic contrasts than the productions of CNH was confirmed. Acoustic analyses showed smaller contrasts in F0 mean, F0 range, and intensity between “happy” and “sad” emotions in the productions of CCIs than in the CNHs’ productions. This suggests that poor perception, possibly due to the degraded electrical signal of CI technology, influences the productions of CCIs compared to their normally hearing peers, who learn speech through acoustic cues.
This study had several limitations: 1) the set of CCI talkers was small; 2) facial emotion recognition, which may relate more closely to real-world scenarios, was not included; 3) the task relied on children expressing “happy” and “sad” emotions as best they could despite its artificial nature; 4) the five-alternative, forced-choice procedure included only the group 1 productions and not those of all CCI talkers (group 2); 5) the speech intelligibility test used adult listeners, so child listeners would need to be included to determine how intelligible the productions are to children; and 6) acoustic analysis was performed only on the children’s productions. Further acoustic analyses of the adult productions are underway.
Previous studies in this area of research have focused on emotion recognition and imitation, and both approaches have shown deficits in CCIs’ performance. Results from this study suggest that CCIs have deficits in their productions of the emotions “happy” and “sad” and that these productions are harder for NH peers to recognize. Since CCIs, unlike their normal hearing peers, learn speech and language through a degraded speech signal, it is reasonable to hypothesize that they may have difficulty producing speech prosody because of the lack of salient pitch cues from their device. This may indicate that a lack of prosody perception influences the CCIs’ productions of emotions, since prosody is an important cue for identifying and differentiating vocal emotions. However, emotion is identified by a combination of facial and auditory information. Real-world tests of emotion productions may need to study both vocal and facial emotion recognition.
These results indicate that pre-lingually deaf children with CIs have significant deficits in vocal emotion communication even though their productions were highly intelligible in terms of phoneme/word recognition. Device limitations restrict access to voice pitch information and may play a role in the deficits of the CCIs’ productions. Subjects implanted at an earlier age and with a longer duration of device use produced more identifiable emotions, suggesting a role for neural plasticity. Children implanted more recently may have benefited from improvements in technology and intervention services compared to children who received their implants in earlier years. These children may benefit from early intervention services with a focus on prosodic aspects of speech. Further research should focus on the relationship between emotion production and perception to obtain a broader view of the impact that limited pitch perception can have on the production of speech prosody.
Bracken, B., & Cato, L. (1986). Rate of conceptual development among deaf preschool and primary children as compared to a matched group of nonhearing impaired children. Psychology in the Schools, 23, 95-99.
Chatterjee, M. (2016). Personal Communication. Boys Town National Research Hospital, Omaha, NE.
Chatterjee, M., Zion, D., Deroche, M., Burianek, B., Limb, C., Goren, A., Kulkarni, A., & Christensen, J. (2015). Voice emotion recognition by cochlear-implanted children and their normally-hearing peers. Hearing Research, 322, 151-162.
Chatterjee, M., & Peng, S. (2008). Processing F0 with cochlear implants: Modulation frequency discrimination and speech intonation recognition. Hearing Research, 235(1-2), 143-156.
Connor, C., Craig, H., Raudenbush, S., Heavner, K., & Zwolan, T. (2006). The age at which young deaf children receive cochlear implants and their vocabulary and speech-production growth: is there an added value for early implantation? Ear & Hearing, 27(6), 628-644.
Cutting, A. & Dunn, J. (1999). Theory of mind, emotion understanding, language, and family background: individual differences and interrelations. Child Development, 70(4), 853-865.
Deroche, M., Kulkarni, A., Christensen, J., Limb, C., & Chatterjee, M. (2016). Deficits in the sensitivity to pitch sweeps by school-aged children wearing cochlear implants. Frontiers in Neuroscience, 10(73), 1-15.
Dyck, M., & Denver, E. (2003). Can the emotion recognition ability of deaf children be enhanced? A pilot study. Journal of Deaf Studies and Deaf Education, 8(3), 348-356.
Edwards, M. (1974). Perception and production in child phonology: the testing of four hypotheses. Journal of Child Language, 1(2), 205-219.
Eisenberg, N., Spinrad, T., & Eggum, N. (2010). Emotion-related self-regulation and its relation to children’s maladjustment. Annual Review of Clinical Psychology, 6, 495-525.
Emognition. Vers. 2.0.3. Omaha, NE: Boys Town National Research Hospital, 2014. Computer software.
Fengler, I., Nava, E., Villwock, A., Buchner, A., Lenarz, T., & Roder, B. (2017). Multisensory emotion perception in congenitally, early, and late deaf CI users. PLoS ONE, 12(10).
Grossmann, T. (2010). The development of emotion perception in face and voice during infancy. Restorative Neurology & Neuroscience, 28(2), 219-236.
Kirk, K., Ying, E., Miyamoto, R., O’Neill, T., Lento, C., & Fears, B. (2002). Effects of age at implantation in young children. Annals of Otology, Rhinology, & Laryngology, 111, 69-73.
Kong, Y., Cruz, R., Jones, J., & Zeng, F. (2004). Music perception with temporal cues in acoustic and electric hearing. Ear and Hearing, 25(2), 173-185.
Kusche, C., Garfield, T., & Greenberg, M. (1983). The understanding of social attributions in deaf adolescents. Journal of Clinical Child Psychology, 12, 153-160.
Leaver, A. & Rauschecker, J. (2016). Functional topography of human auditory cortex. The Journal of Neuroscience, 36(4), 1416-1428.
Lenden, J. & Flipsen, P. (2007). Prosody and voice characteristics of children with cochlear implants. Journal of Communication Disorders, 40, 66-81.
Levi, S. & Pisoni, D. (2007). Indexical and linguistic channels in speech perception: Some effects of voiceovers on advertising outcomes. Psycholinguistic Phenomena in Marketing Communications, 203-219.
Ludlow, A., Heaton, P., Rosset, D., Hills, P. & Deruelle, C. (2010). Emotion recognition in children with profound and severe deafness: Do they have a deficit in perceptual processing? Journal of Clinical and Experimental Neuropsychology, 32, 923-928.
Mildner, V. & Koska, T. (2014). Recognition and production of emotions in children with cochlear implants. Clinical Linguistics & Phonetics, 28(7-8), 543-554.
Munson, B., McDonald, E., DeBoe, N., & White, A. (2006). The acoustic and perceptual bases of judgements of women and men’s sexual orientation from read speech. Journal of Phonetics, 34, 202-240.
Murray, I. & Arnott, J. (1993). Towards the simulation of emotion in synthetic speech: A review of the literature of human vocal emotion. Journal of the Acoustical Society of America, 93, 1097-1108.
Nakata, T., Trehub, S., & Kanda, Y. (2012). Effect of cochlear implants on children’s perception and production of speech prosody. Journal of the Acoustical Society of America, 131(2), 1307-1314.
NIH (2013). Cochlear Implants. Retrieved March 28, 2018, from https://report.nih.gov/NIHfactsheets/ViewFactSheet.aspx?csid=83
Peng, S., Chatterjee, M., & Lu, N. (2012). Acoustic cue integration in speech intonation recognition with cochlear implants. Trends in Amplification, 16(2), 67-82.
Peng, S., Hui-Ping, L., Nelson, L., Yung-Song, L., Deroche, M., & Chatterjee, M. (2017). Processing of acoustic cues in lexical-tone identification by pediatric cochlear-implant recipients. Journal of Speech, Language, and Hearing Research, 60, 1223-1235.
Peng, S., Tomblin, B., Cheung, H., Lin, Y., & Wang, L. (2004). Perception and production of Mandarin tones in prelingually deaf children with cochlear implants. Ear and Hearing, 25(3), 251-264.
Peng, S., Tomblin, B., & Turner, C. (2008). Production and perception of speech intonation in pediatric cochlear implant recipients and individuals with normal hearing. Ear and Hearing, 29(3), 336-351.
Prosodic features of speech. Retrieved on 28 March 2018 from http://www.litnotes.co.uk/prosodicspeech.htm
Purves D, Augustine GJ, Fitzpatrick D, et al., editors. Neuroscience. 2nd edition. Sunderland (MA): Sinauer Associates; 2001. The Development of Language: A Critical Period in Humans. Available from: https://www.ncbi.nlm.nih.gov/books/NBK11007/
Schorr, E., Roth, F., & Fox, N. (2009). Quality of life for children with cochlear implants: perceived benefits and problems and the perception of single words and emotional sounds. Journal of Speech, Language, and Hearing Research, 52, 141-152.
Schvartz-Leyzac, K. & Chatterjee, M. (2015). Fundamental-frequency discrimination using noise-band-vocoded harmonic complexes in older listeners with normal hearing. Journal of the Acoustical Society of America, 138(3), 1687-1695.
Shannon, R., Zeng, F., Kamath, V., Wygonski, J., & Ekelid, M. (1995). Speech recognition with primarily temporal cues. Science, 270(5234), 303-304.
Shannon, R. (1983). Multichannel electrical stimulation of the auditory nerve in man. II. Channel interaction. Hearing Research, 12, 1-16.
Sharma, A., Dorman, M., & Spahr, A. (2002). A sensitive period for the development of the central auditory system in children with cochlear implants: implications for age of implantation. Ear & Hearing, 23(6), 532-539.
Tomblin, B., Barker, B., Spencer, L., Zhang, X., & Gantz, B. (2005). The effect of age at cochlear implant initial stimulation on expressive language growth in infants and toddlers. Journal of Speech, Language and Hearing Research, 48, 853-867.
Tye-Murray, N., Spencer, L., & Woodworth, G. (1995). Acquisition of speech by children who have prolonged cochlear implant experience. Journal of Speech and Hearing Research, 38, 327-337.
Vernon, M., & Greenberg, S. (1999). Violence in deaf and hard-of-hearing people: A review of the literature. Aggression and Violent Behavior, 4, 259-272.
Von Bekesy, G. (1949). The vibration of the cochlear partition in anatomical preparations and in models of the inner ear. Journal of the Acoustical Society of America, 21, 233-245.
Wang, D., Trehub, S., Volkova, A., & Van Lieshout, P. (2013). Child implant users’ imitation of happy- and sad- sounding speech. Frontiers in Psychology, 4, 1-8.
Wiefferink, C., Rieffe, C., Ketelaar, L., Raeve, L. & Frijns, J. (2012). Emotion understanding in deaf children with a cochlear implant. Journal of Deaf Studies and Deaf Education, 18(2), 175-186.