Orhun Uluşahin

The influence of contextual and talker f0 information on fricative perception

Orhun Uluşahin, Hans Rutger Bosker, James M. McQueen & Antje S. Meyer (2023)

Abstract

Speech perception is extensively influenced by contrastive acoustic context effects [1], such as the contrastive effect of fundamental frequency (F0) on the perception of voiceless fricatives’ spectral center of gravity (CoG). That is, lower-F0 contexts lead listeners to perceive a higher CoG, and higher-F0 contexts lead them to perceive a lower CoG [2]. However, it remains unknown whether knowledge of a talker’s typical F0 profile (as opposed to the immediate context) can have similar effects. This study therefore investigated whether talker-bound F0 information can cause perceptual biases in the same contrastive direction.
In Experiment 1, female native Dutch listeners (N = 10) categorized target words as the Dutch words sok “sock” (/sɔk/) or sjok “(I) trudge” (/ʃɔk/). The target words were created by replacing the original fricatives in a female native Dutch speaker’s natural utterances of the words sok and sjok with tokens from an 8-step fricative continuum between /s/ and /ʃ/ (modelled on the same speaker). The target words were preceded by the carrier sentence Nu komt het woord… “Now comes the word…” The carrier and the vowel /ɔ/ in the target words were pitch-shifted ±4 semitones to create High-F0 and Low-F0 conditions, respectively, accompanied by an unshifted Mid-F0 control condition. Across 240 randomized trials containing all F0 conditions and fricative steps, participants categorized ambiguous fricatives from the synthesized fricative continuum as being more /s/-like in the Low-F0 trials than in the High-F0 trials.
In Experiment 2, another group of participants (N = 32) listened to 20 minutes of speech from the same talker whose speech had been pitch-shifted ±4 semitones to create High-F0 and Low-F0 talker groups, respectively. After the exposure phase, participants performed a two-alternative forced-choice (2AFC) task in which they categorized words containing fricatives from a 5-step subset of the original 8-step continuum (i.e., original steps 3–7, chosen on account of ceiling effects observed in Experiment 1) as sok or sjok. Crucially, in the test phase, the carrier sentence and the F0 context manipulation were removed, given the tendency of proximal context to take over talker information (e.g., [3]). Thus, participants encountered the same Mid-F0 acoustic context on each trial. Despite the lack of variability in the immediate context, participants in the Low-F0 talker group perceived the synthesized fricative continuum as being more /s/-like than the High-F0 talker group did. This pattern persisted over a large number of trials (i.e., 160) but only became statistically robust after the first 40 trials, suggesting that participants may have needed to train themselves on the continuum. During this training period, participants from both groups overwhelmingly categorized targets as containing /ʃ/, and a global /ʃ/ bias was observed throughout the experiment, despite the subsequent divergence in response proportions.
Two further experiments (N = 32 in each) were run with methodological adjustments in an attempt to minimize the interference of the biases observed in Experiment 2. Experiment 3 was run online, and Experiment 4 was run in person. In both of these experiments, the 5-step subset of the original continuum was shifted to original steps 2–6 to sound more /s/-like, four practice trials with feedback (i.e., correct/incorrect) were added to the 2AFC task with original steps 1 and 8 as stronger endpoints, and breaks were removed from the 2AFC task. While these methodological changes eliminated the early /ʃ/ bias, they failed to eliminate the global /ʃ/ bias. Furthermore, neither experiment replicated the results of Experiment 2: the observed effects were assimilatory rather than contrastive (i.e., the High-F0 talker groups perceived the continuum as being more /s/-like than the Low-F0 groups), and this effect was significant in Experiment 4.
Overall, the effect of the immediate acoustic context in Experiment 1 aligns with previous work while the effect of talker F0 remains unclear. Further research is required to establish the reliability of the talker effects, and whether they are contrastive or assimilatory in nature.
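For readers unfamiliar with semitone units, the following is a minimal sketch of what a ±4 semitone shift means in hertz, using the standard equal-tempered relation (one semitone is a frequency ratio of 2^(1/12)). The function name `shift_f0` and the 200 Hz baseline are hypothetical and chosen only for illustration; the abstract does not report the talker's actual F0, nor the software used for pitch shifting.

```python
def shift_f0(f0_hz: float, semitones: float) -> float:
    """Return f0_hz shifted by the given number of semitones.

    Uses the equal-tempered relation: each semitone multiplies
    frequency by 2 ** (1 / 12).
    """
    return f0_hz * 2.0 ** (semitones / 12.0)


# Hypothetical Mid-F0 baseline (NOT a value reported in the abstract).
mid_f0 = 200.0

high_f0 = shift_f0(mid_f0, +4)  # High-F0 condition: +4 semitones
low_f0 = shift_f0(mid_f0, -4)   # Low-F0 condition: -4 semitones

print(f"High-F0: {high_f0:.1f} Hz")  # High-F0: 252.0 Hz
print(f"Low-F0:  {low_f0:.1f} Hz")   # Low-F0:  158.7 Hz
```

Under this assumption, the High-F0 and Low-F0 versions differ by 8 semitones, a frequency ratio of about 1.59, regardless of the baseline chosen.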

References

[1] C. Stilp, ‘Acoustic context effects in speech perception’, WIREs Cogn. Sci., vol. 11, no. 1, p. e1517, 2020, doi: 10.1002/wcs.1517.

[2] O. Niebuhr, ‘On the perception of “segmental intonation”: F0 context effects on sibilant identification in German’, EURASIP J. Audio Speech Music Process., vol. 2017, no. 1, p. 19, Aug. 2017, doi: 10.1186/s13636-017-0115-3.

[3] E. Reinisch, ‘Speaker-specific processing and local context information: The case of speaking rate’, Appl. Psycholinguist., vol. 37, no. 6, pp. 1397–1415, Nov. 2016, doi: 10.1017/S0142716415000612.
