Evaluation of a parametric pinna model for the calculation of head-related transfer functions

Abstract: Personalised head-related transfer functions (HRTFs) are crucial for binaural audio playback. There is currently great effort in the scientific community to facilitate HRTF acquisition for a wider audience. One approach is to numerically calculate HRTFs from a three-dimensional mesh of a listener. In this article, we introduce a parametric pinna model (PPM) aiming to represent a listener's pinna. In an evaluation performed for three listeners, the proposed PPM was aligned to a listener's pinna and the resulting mesh was compared to the corresponding reference mesh by means of a geometric error. Further, personalised HRTFs were calculated, localisation experiments were simulated, and simulated localisation performance was obtained. The HRTFs calculated from the aligned PPM yielded localisation performance similar to that obtained for HRTFs calculated from the reference meshes. This indicates that the proposed PPM is able to represent a listener's pinna and to yield perceptually valid results in terms of sound-localisation performance.


I. INTRODUCTION
Spatial hearing describes the ability of humans to assign direction and distance to an auditory event [1]. The incoming sound wave is filtered by the listener's anatomy, i.e., torso, head, and pinnae. This filtering is described by the head-related transfer functions (HRTFs) [2]. Because of their relatively large dimensions, torso and head mainly affect frequencies below 4 kHz; above this frequency limit, the pinnae shapes dominate the spectral effects [2]-[4], vastly varying across individual listeners.
HRTFs can be acoustically measured [2], but because the measurement setups are rather sophisticated [5], numerical calculation approaches have become an alternative [6]-[9]. For the calculation of personalised HRTFs, an exact representation of the 3D geometry (further denoted as "mesh") of the listener's pinnae is required [10]. Other anatomical structures such as the head and torso can be approximated by simple shapes, i.e., a sphere and an ellipsoid [11]. However, capturing the exact geometry of the pinnae is not a trivial endeavour: if the acquisition method requires sophisticated equipment or setup, such as laser scans [12], MRI [13], [14] or CT scans [10], the advantages provided by the numerical calculation method disappear. Accordingly, simple and accessible methods for the geometry acquisition are required. An example of a simple but promising method is the acquisition of 2D photos and calculation of the 3D pinna geometry by means of photogrammetry [15]. Unfortunately, these simple methods introduce issues concerning the accuracy and the uniformity of the resolution of the 3D representation.
The work presented in this chapter was supported by the Austrian Research Promotion Agency (FFG, project "softpinna" 871263) and the European Union (EU, project "SONICOM" 101017743, RIA action of Horizon 2020).
The main goal of this article is to simplify the personalisation of a pinna mesh as a basis for calculating personalised HRTFs. To this end, we describe a parametric pinna model (PPM) based on the preliminary model introduced in [16]. The PPM is attached to a template pinna mesh, which can be parametrically adapted to any target pinna mesh. With the PPM, we aim at reducing the dimensionality of the mesh, which consists of thousands of points required to represent the complex geometry of the pinna. We evaluated the PPM by manually adapting it to three target meshes and analysing the geometric errors and two localisation errors from simulated localisation experiments [17].

II. THE MODEL
The PPM was created in Blender 2.93 LTS (Blender Foundation, Netherlands) [18], utilising skeletal animation and morph target animation by means of "bendy bones" and "shape keys", respectively. With these tools, affine transformations can be applied to local areas and thus deform the shape of the mesh. The PPM is attached to a template mesh, which is a high-resolution average left pinna from the WiDESPREaD database [19].
Bendy bones represent an approximation of curvatures in the complex manifold of the pinna by Bézier curves, which are supported by control points at both their start and end points. Thereby, an armature is constructed that, once attached to the mesh, can roughly deform the pinna shape by adapting the bendy bones to the pinna structure. The influence of each bendy bone on its proximal vertices is defined by a process called "weight painting", see Fig. 1b. We also use shape keys to consider variations in the concave areas or other prominent features. While bendy bones were assigned to the contours of the pinna, the shape keys account for local changes in concave areas, defining the affine transformations of vertices. These concave areas are of special interest because they heavily influence the first two peaks (P1, P2) and the first notch (N1) of the HRTFs in the median plane [20].
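As an illustration of how a curve supported by control points at its start and end can follow a pinna contour, the following minimal sketch evaluates a cubic Bézier curve. It assumes a cubic curve with two inner handle points; the function names `cubic_bezier` and `sample_curve` and the example coordinates are hypothetical and do not reproduce Blender's bendy-bone implementation.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1].

    p0 and p3 are the start and end points; p1 and p2 are the inner
    handle points (an assumption made for this sketch).
    """
    s = 1.0 - t
    # Bernstein polynomials of degree 3
    coeffs = (s ** 3, 3.0 * s * s * t, 3.0 * s * t * t, t ** 3)
    points = (p0, p1, p2, p3)
    return tuple(
        sum(c * p[k] for c, p in zip(coeffs, points)) for k in range(3)
    )


def sample_curve(p0, p1, p2, p3, segments):
    """Return segments + 1 points spaced evenly in t along the curve."""
    return [
        cubic_bezier(p0, p1, p2, p3, i / segments)
        for i in range(segments + 1)
    ]


if __name__ == "__main__":
    # Hypothetical control points (arbitrary units)
    start, handle_a = (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
    handle_b, end = (1.0, 1.0, 0.0), (1.0, 0.0, 0.0)
    for point in sample_curve(start, handle_a, handle_b, end, 4):
        print(point)
```

Moving or rotating a start or end control point changes the handle points and thereby bends the whole curve, which is the kind of smooth contour adjustment the bendy bones are used for here.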
Similar to [21], the total mesh deformation controlled by our PPM can be described as
$$x = x_0 + \sum_{m=1}^{M} V_m a_m + \sum_{n=1}^{N} w_n b_n,$$
where $x$ describes the deformed target mesh and $x_0$ the template mesh linked with the armature. The first sum describes the deformation of the $M$ bendy bones $a_m$, weighted with the matrix $V_m$ modelling the rigid transformation of control points and the scaling of bendy bones. The user can manipulate the position and orientation of the control points and the size of the bendy bones, which then results in a deformation of the mesh. The second sum describes the $N$ shape keys $b_n$, weighted with $w_n$ describing their prominence. Table I lists the bendy bones with their indices. In the model implementation, the bendy bones have the postfix "Bendy" and their control bones at their start and end have the postfixes "Start" and "End", respectively. Table II shows the shape keys of the PPM. In total, our PPM consists of 31 bones and 18 shape keys, resulting in an overall dimensionality of 151. Note that bendy bone 31 is linked to the armature as a parent bone in order to be able to perform affine transformations, i.e., translation, rotation and scaling, on all vertices of the pinna equally. This is a necessary link enabling rough modification of the global position, orientation, and size of a pinna without any modification of its internal structure. The shape keys enable modification of local areas, especially the concave ones. When pushed to its boundary, a shape key describes a maximum deviation from the template. For example, one of the shape keys controls the depth of the cavum conchae while avoiding any overlap with the adjacent geometry such as the antihelix or the backside of the pinna.
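The structure of this deformation (template mesh plus a bone-driven term plus a shape-key-driven term) can be sketched in a few lines. The sketch below is a simplified stand-in, not the authors' implementation or Blender's solver: it assumes that each bone applies one affine transformation blended per vertex by its painted weight (linear blend skinning), and that each shape key stores a per-vertex offset scaled by its value. All names (`deform`, `apply_affine`, the dictionary keys) are hypothetical.

```python
def apply_affine(matrix, offset, point):
    """Apply a 3x3 matrix and a translation to a 3D point."""
    return tuple(
        sum(matrix[r][c] * point[c] for c in range(3)) + offset[r]
        for r in range(3)
    )


def deform(template, bones, shape_keys):
    """Deform a template mesh given as a list of (x, y, z) vertices.

    bones:      list of dicts with "matrix" (3x3), "offset" (3 values)
                and "weights" (one painted weight per vertex).
    shape_keys: list of dicts with "value" (prominence weight) and
                "offsets" (one displacement vector per vertex).
    Both layouts are assumptions made for this sketch.
    """
    deformed = []
    for i, vertex in enumerate(template):
        delta = [0.0, 0.0, 0.0]
        # Bone term: painted weight times the displacement the bone causes
        for bone in bones:
            moved = apply_affine(bone["matrix"], bone["offset"], vertex)
            for k in range(3):
                delta[k] += bone["weights"][i] * (moved[k] - vertex[k])
        # Shape-key term: prominence value times the stored offset
        for key in shape_keys:
            for k in range(3):
                delta[k] += key["value"] * key["offsets"][i][k]
        deformed.append(tuple(vertex[k] + delta[k] for k in range(3)))
    return deformed


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    bones = [{"matrix": identity, "offset": (0.0, 0.0, 1.0),
              "weights": [0.0, 1.0]}]
    keys = [{"value": 0.5,
             "offsets": [(0.0, 2.0, 0.0), (0.0, 0.0, 0.0)]}]
    print(deform(template, bones, keys))
```

In this toy example the bone only moves the second vertex (its painted weight is 1 there and 0 elsewhere), while the shape key only moves the first vertex, which mirrors the division of labour between contour-following bones and local shape keys.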
Figure 1a shows the PPM with the unmodified mesh template. Panels b to d of Fig. 1 show the region influenced by an exemplary bendy bone, i.e., its weight painting, the effect of the modification caused by the rotation of an exemplary bendy bone, and the effect of the modification of an exemplary shape key, respectively. Figure 2 shows an example of the PPM aligned to a target mesh. Notice the different positions and orientations of the armature components as compared to Fig. 1a.

III. EVALUATION
For the evaluation of the PPM, we manually aligned the PPM to target pinnae of three listeners (NH5, NH130, and NH131 corresponding to the IDs from the ARI database [22], tested in [23]). For the target meshes, we used three high-resolution left pinna meshes acquired via CT scans of a mold. Each mold was created by sealing the ear canals with ear protection and enclosing the pinna in a silicone material with a low viscosity. Then, the mold was scanned with a high-energy CT scanner and the layers of the images were processed to obtain a 3D mesh. For more details on this mesh acquisition process, see [10]. The target meshes also served as ground truth to compare with the aligned PPM. The adapted meshes were evaluated in two domains: in the geometric and in the psychoacoustic domain.
In the geometric domain, we used the Hausdorff distance metric [24] implemented in MeshLab v2020.06 (Istituto di Scienza e Tecnologie dell'Informazione & National Research Council, Italy) [25] to describe the similarity between the ground truth target mesh and the PPM-deformed template mesh. The Hausdorff distance as a scalar value describes the largest of the smallest distances $d$ between a point $a$ from mesh $A$ and a point $b$ from mesh $B$:
$$d_H(A, B) = \max_{a \in A} \min_{b \in B} d(a, b).$$
It can also be stated as a vector, i.e., assigning each vertex of one mesh the smallest distance value to a vertex of the other mesh. For the visualisation, the distance was obtained as a scalar for each vertex of the mesh provided by the PPM. For further analyses, we calculated statistics of the distances calculated for the PPM mesh, see Tab. III.
In the psychoacoustic domain, we calculated HRTFs using Mesh2HRTF v0.5.0 (Acoustics Research Institute, Austrian Academy of Sciences, Vienna & Technical University, Berlin) [26]. Then, a localisation experiment was simulated using the sagittal-plane localisation model [17] implemented in the Auditory Modeling Toolbox (AMT) [27]. The localisation experiment was simulated for sound sources located in the median plane, i.e., in the model, we used a lateral offset of 0. In the simulations, a sensitivity of 0.6 was used according to [17]. The localisation performance was evaluated by means of the quadrant error rates (QEs) and polar errors (PEs). The QE describes the rate of confusing the hemifield (e.g., front and back), i.e., the rate of localisation errors greater than 90°. The PE describes the root-mean-square error of responses within the correct hemifield, i.e., for errors smaller than 90°.
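The per-vertex distances and the scalar Hausdorff distance used in the geometric domain can be sketched as follows. This is a simplified vertex-to-vertex illustration under stated assumptions, not the MeshLab filter, which may sample the surfaces differently; the names `nearest_distances` and `hausdorff_summary` are hypothetical, and the use of the population standard deviation is an assumption.

```python
import math
import statistics


def nearest_distances(mesh_a, mesh_b):
    """For each vertex of mesh_a, the distance to the closest vertex of mesh_b."""
    return [min(math.dist(a, b) for b in mesh_b) for a in mesh_a]


def hausdorff_summary(mesh_a, mesh_b):
    """One-sided Hausdorff distance from mesh_a to mesh_b plus basic statistics.

    The brute-force search is quadratic in the number of vertices and is
    only meant to make the definition concrete.
    """
    distances = nearest_distances(mesh_a, mesh_b)
    return {
        "max": max(distances),  # largest of the smallest distances
        "median": statistics.median(distances),
        "mean": statistics.fmean(distances),
        "std": statistics.pstdev(distances),  # population std (assumption)
    }


if __name__ == "__main__":
    # Hypothetical toy "meshes" given as vertex lists (units: mm)
    ppm_mesh = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
    target_mesh = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(hausdorff_summary(ppm_mesh, target_mesh))
```

Note that this one-sided form is not symmetric: swapping the two meshes in the toy example gives a maximum of 0, because every target vertex coincides with a PPM vertex.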
For reference and comparison, the general localisation ability of the listeners was simulated by running the model with the acoustically measured HRTFs of the corresponding listener and with the numerically calculated HRTFs of the corresponding reference mesh.
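To make the two error measures concrete, the following minimal sketch computes a quadrant error rate and a local polar RMS error from discrete target and response angles. It is a simplification: the localisation model derives these measures from simulated response probabilities, whereas this sketch uses individual responses. The function names are hypothetical, and counting an error of exactly 90° as a local error is an assumption.

```python
import math


def wrap_error(target_deg, response_deg):
    """Signed difference response minus target, wrapped to [-180, 180)."""
    return (response_deg - target_deg + 180.0) % 360.0 - 180.0


def localisation_errors(targets_deg, responses_deg):
    """Return (QE in percent, PE in degrees) for paired polar angles.

    QE: share of responses with an absolute error above 90 degrees.
    PE: root-mean-square error of the remaining (local) responses.
    Assumes non-empty input lists of equal length.
    """
    errors = [wrap_error(t, r) for t, r in zip(targets_deg, responses_deg)]
    local = [e for e in errors if abs(e) <= 90.0]
    qe = 100.0 * (len(errors) - len(local)) / len(errors)
    if local:
        pe = math.sqrt(sum(e * e for e in local) / len(local))
    else:
        pe = float("nan")  # no local responses, PE is undefined
    return qe, pe


if __name__ == "__main__":
    # Hypothetical responses to a frontal target at 0 degrees
    targets = [0.0, 0.0, 0.0, 0.0]
    responses = [10.0, -10.0, 170.0, 350.0]
    print(localisation_errors(targets, responses))
```

In the toy example, the response at 170° is a front-back confusion and counts towards the QE only, while the response at 350° wraps to an error of -10° and contributes to the PE.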

IV. RESULTS
Figure 3 shows the Hausdorff distance (in mm) visualised for the three listeners. Note that the distance can be up to 1.5 mm, but the corresponding regions are located on the back side of the pinna or on the head geometry rather than within the pinna or the HRTF-relevant areas. Table III shows the statistics of the Hausdorff distance for the three listeners in terms of median, average, and standard deviation. The median distance is below 1 mm, which can be considered as a threshold for meshes yielding perceptually valid HRTFs [23]. When aligning the pinnae manually with the PPM, the Hausdorff distance provided feedback on the alignment quality. As a result, the median distance was below 1 mm for all alignments, indicating negligible geometric errors between the target and the PPM-aligned mesh.
Figure 4 shows the results from simulated localisation experiments in the median plane with HRTFs calculated based on the aligned target meshes. Each panel shows the simulated probability (encoded by the brightness) to respond at a response angle as a function of the target angle of the simulated sound source. The circles depict simulated positions of the highest probability, i.e., the most probable responses that would be obtained in an actual localisation experiment. The results indicate a good simulated localisation ability with HRTFs calculated based on the PPM-aligned meshes. As in actual experiments, the simulated localisation performance was best for sound sources at eye level, with an increased localisation uncertainty for sound sources located between 60° and 120°.
Table IV shows the QEs and PEs obtained for the three simulated listeners, calculated from the simulated probabilities shown in Fig. 4. This table also includes two references of listening with the listeners' own ears, simulated for the acoustically measured HRTFs (acoustic) and for the HRTFs calculated using the corresponding reference mesh (calculated). The resulting QEs and PEs were within the typical range of sound localisation with listener-specific HRTFs [28], indicating a good representation of the mesh by our PPM in terms of sound-localisation performance.

V. CONCLUSIONS
In this article, we described a parametric pinna model (PPM) designed for the numerical calculation of HRTFs. This PPM was evaluated by manually aligning it to a target mesh of a listener's pinna. This alignment, performed for three listeners, showed median geometric errors below 1 mm. The psychoacoustic measures showed simulated localisation performance comparable to that obtained for numerically calculated HRTFs of the corresponding listeners. The manual alignment took only a few minutes per listener, showing the promising capability of the PPM to help in numerical calculations of personalised HRTFs.
The PPM has a few limitations, though. Currently, because of the complex structure of the pinna, the weights of the bendy bones and shape keys need to be assigned carefully in order to avoid an overlap in the surfaces of the manifold. Further, some regions of the pinna template mesh are controlled by multiple shape keys whose weights need to be controlled coherently in order to avoid inconsistencies in the mesh. An automated synchronisation of such linked parameters and an automated alignment process would further ease the access to numerical HRTF calculation for a wide audience.

Fig. 1. a) Parametric pinna model (PPM) consisting of a template mesh and an armature implemented in Blender. The template mesh is depicted in light grey, bendy bones are described by the green curves, and their control bones are black spheres at the start and end of each bone, respectively. b) Example of the weight painting, i.e., the influence on proximal vertices around bendy bone 2 (Helix Low Bendy). c) Example of the alignment of the PPM to another pinna by moving and rotating the control bone Helix Low End. Note the difference between the unmodified template mesh in a) and the modified mesh here. d) Example of the effect of the shape keys: shape key 10, i.e., Fossa Triangularis depth, was increased to its largest prominence value.

Fig. 2. Example of the PPM aligned to a target pinna. Note that control and bendy bones do not necessarily lie on the mesh surface.

Fig. 3. Hausdorff distance between the PPM aligned to a listener's target pinna and the target pinna. a) NH5. b) NH130. c) NH131. In each panel, the distance distribution is depicted on the left side. It ranges from the smallest distance (bottom, red) to the largest distance (top, green).

Fig. 4. Simulation of localisation experiments with the aligned target pinnae, showing the response angle as a function of the target angle (in degrees), the brightness-coded probability, and circles depicting estimated localised positions of sound cues for listeners NH5, NH130, and NH131 of the ARI database (left, center, and right panels, respectively).

TABLE I. BENDY BONES OF THE PPM, INDEXED AS IMPLEMENTED IN THE MODEL INCLUDING THEIR CONTROL BONES AT THE START AND END OF THE RESPECTIVE BENDY BONE. THE PARENT BONE DEFINES THE GLOBAL POSITION AND ORIENTATION OF THE PINNA.

TABLE III. STATISTICS OF THE HAUSDORFF DISTANCE (IN MM) OBTAINED FOR THE

TABLE IV. SIMULATED LOCALISATION PERFORMANCE (QE IN % AND PE IN °) OBTAINED FOR THE ACOUSTICALLY MEASURED HRTFS (ACOUSTIC), HRTFS CALCULATED FROM THE REFERENCE MESHES (CALCULATED), AND HRTFS CALCULATED FROM THE ALIGNED MESHES (PPM).