Patent 11303997 - Method for controlling a microphone array and device for controlling a microphone array
This application claims the benefit of the foreign priority of German Patent Application No. 10 2019 134 541.3, filed on Dec. 16, 2019, the entirety of which is incorporated herein by reference.
The present principles relate to a device for controlling a microphone array and to a method for controlling a microphone array.
For capturing individual acoustic events in a large, essentially planar detection area in the presence of a high level of interference noise, WO2019/211487A1 proposes a microphone arrangement consisting of a circular arrangement of shotgun microphones that point radially outwards. Since a planar detection area requires no time-variant control of the audio beam along the dimension perpendicular to the detection plane, the microphone array directly uses the directivity of the individual microphones as a fixed directivity in this dimension. Within the plane, however, such an array allows time-variant acoustic beam steering with an almost constant beam pattern in all directions.
A typical example of such a large, essentially planar detection area with a simultaneously high level of interference noise is a sports field, where individual ball kick sounds or the sound of a referee's whistle are to be captured, for instance during a soccer match. For such a task, the possible detection area is the soccer field. In addition, there is typically a high level of background noise during a game in a sports stadium, which emanates mainly from the stands around the playing field. A peculiarity of ball sports in general is that both the ball and the players usually move very quickly, so that the beam steering needs to be fast in order to capture the ball kick sound. The microphone array should not be positioned on the playing field, but may be placed, e.g., at the edge of the field.
If the position of the acoustic target relative to the array can be tracked automatically (e.g. by visual tracking using video cameras), the beam steering can be fully automated, avoiding the need for a human operator. In this case, an automatic tracking system or tracker may provide so-called tracking data, i.e. position and velocity data of various target objects. The most important target object in this context is the ball. However, the following problems arise.
First, the tracking data have a latency, and this latency is itself uncertain. The tracking data used to control the beam direction are usually delivered with a certain latency, caused for instance by the image processing algorithms applied for visual tracking or by the transmission of the tracking data from the tracking system to the microphone array. For moving sound objects to be captured with the microphone array, this means that by the time the information about the object position arrives at the microphone array, the object is usually already located at a different position, so that the beam is steered in the wrong direction. Typically, the latency of the tracking data is time-invariant and, more importantly, not precisely known.
Second, the accuracy of the tracking data is uncertain: tracking systems are usually unable to provide the exact position of the tracked objects and instead provide it only with a certain positional accuracy, for instance in the form of a confidence interval.
Third, sound propagation is associated with a delay. The sound needs a certain time to propagate from the object triggering the sound event to the microphone arrangement. Assuming that the sound objects to be captured move within a certain maximum distance from the array (e.g. up to 50 m), this effect can be regarded as a kind of “negative latency” with respect to the tracking data processing: the beam steering has to wait until the sound corresponding to a certain position has arrived at the microphone array. In contrast to the tracking data latency, the negative latency due to sound propagation is time-variant, since it depends on the distance between the sound object and the microphone array.
These effects result in a poor capture quality of the sound object, since the beam direction is not correctly time-aligned (for instance, the beam is pointed in a certain direction too late or too early).
A suboptimal solution to the problem of tracking data latency in a real-time capturing system consists in simply delaying the audio signal by the expected mean latency before applying beamforming. This solution, however, considers neither the uncertainty in the latency of the tracking data nor time-variant object-to-array distances. These effects often result in a temporal misalignment, that is, a difference between the set beam direction and the actual direction of the sound object to be captured at that moment.
The present invention solves at least this problem. In one embodiment, the invention relates to a method for controlling a microphone array. In another embodiment, the invention relates to a device for controlling a microphone array. In yet another embodiment, the invention relates to a non-transitory computer-readable storage medium having stored thereon instructions that when executed on a computer cause the computer to execute the steps of the method. Further advantageous embodiments are disclosed in the following description and the dependent claims.
According to the invention, the latency of the tracking data (including its uncertainty) and the sound propagation are accounted for by varying the width of the steered audio beam over time. The beam is made as narrow as possible, but as wide as necessary to capture the desired object sound fully and reliably. This yields a time-variant beam width control for the microphone array, in which the width of the beam depends on at least the following parameters: the tracking data, i.e. the velocity of the moving object and its distance to the microphone array, and the tracking latency, i.e. the time the tracking data need to arrive at the microphone array. This allows the sound of a sound event triggered by the moving object to be captured reliably.
Further details and advantageous embodiments are depicted in the drawings, in which:
However, the tracking data relate to the ball position at the time $t_{TR}$, while the sound waves were created by the sound event at the time $t_E$. If the sound travelling time equals the tracker latency, the two times coincide, i.e. $t_E = t_{TR}$. Otherwise, the sound event was created at another position, at an earlier or later time $t_E$. Since the position, the trajectory $Tr_0$ (i.e. direction) and the velocity of the ball at $t_{TR}$ are known from the tracking data, and since the tracking latency is also known and the tracking accuracy can at least be estimated, the ball position at the time $t_E$ can be calculated.
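To illustrate this calculation, the following Python sketch assumes that the ball moves in a straight line at the constant velocity reported by the tracker and that sound travels at an assumed 343 m/s. It solves for the offset $\tau = t_E - t_{TR}$ at which sound emitted from the extrapolated position reaches the array exactly when the tracking data arrive, i.e. at $t_0 = t_{TR} + d_{TRACK}$. The function name `event_position` and the 2-D coordinate convention are hypothetical; this is an illustrative sketch, not the claimed method.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def event_position(p_tr, v, array_pos, d_track, c=SPEED_OF_SOUND):
    """Hypothetical helper: estimate where and when the sound arriving at the
    array at t0 = t_TR + d_track was emitted, assuming straight-line motion at
    constant velocity v (m/s) from the tracked position p_tr (m).

    Solves |q + v*tau| = c * (d_track - tau) for tau = t_E - t_TR, where
    q = p_tr - array_pos. tau may be negative (event before t_TR).
    Returns (tau, (x, y)).
    """
    qx, qy = p_tr[0] - array_pos[0], p_tr[1] - array_pos[1]
    vx, vy = v
    # Squaring both sides gives a*tau**2 + b*tau + k = 0.
    a = vx * vx + vy * vy - c * c  # negative, since the ball is slower than sound
    b = 2.0 * (qx * vx + qy * vy + c * c * d_track)
    k = qx * qx + qy * qy - (c * d_track) ** 2
    disc = math.sqrt(max(b * b - 4.0 * a * k, 0.0))
    # The smaller root satisfies tau <= d_track (sound emitted before it
    # arrives); the larger root is a spurious solution introduced by squaring.
    tau = (-b + disc) / (2.0 * a)
    return tau, (p_tr[0] + vx * tau, p_tr[1] + vy * tau)


# Assumed example: ball tracked 30 m from the array, moving straight towards
# it at 20 m/s, tracking latency 0.2 s.
tau, pos = event_position((30.0, 0.0), (-20.0, 0.0), (0.0, 0.0), 0.2)
print(f"t_E - t_TR = {tau:.4f} s, kick position x = {pos[0]:.2f} m")
```

In this assumed head-on example the kick happens about 0.12 s after $t_{TR}$, roughly 2.39 m closer to the array than the tracked position.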
If the distance between the position provided by the tracking system and the microphone array is larger than a maximum value $r_{MAX}$, the sound cannot travel this distance within the latency of the tracking system. In this case, the tracking data relating to the sound event have thus already arrived at the microphone array 40 (or at the external processing device, respectively) when the sound waves 50 arrive. This maximum distance is $r_{MAX} = v_S \cdot d_{TRACK}$, with $v_S$ being the speed of sound and $d_{TRACK}$ the tracking latency. That is,
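A minimal Python sketch of this threshold follows. It assumes a speed of sound of 343 m/s and, purely as an example, a tracking latency of 0.1 s; the function names `propagation_delay`, `r_max` and `is_critical` are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value


def propagation_delay(r, c=SPEED_OF_SOUND):
    """Time (s) the sound needs to travel r metres to the array."""
    return r / c


def r_max(d_track, c=SPEED_OF_SOUND):
    """Largest distance the sound can cover within the tracking latency:
    r_MAX = v_S * d_TRACK."""
    return c * d_track


def is_critical(r, d_track, c=SPEED_OF_SOUND):
    """True if r < r_MAX, i.e. sound emitted at the tracked position can reach
    the array before the corresponding tracking data have arrived."""
    return r < r_max(d_track, c)


d_track = 0.1  # s, assumed tracking latency for this example
print(f"r_MAX = {r_max(d_track):.1f} m")
for r in (10.0, 50.0):
    print(f"r = {r:4.1f} m: delay {propagation_delay(r) * 1000:.0f} ms, "
          f"{'critical' if is_critical(r, d_track) else 'not critical'}")
```

With these assumed values $r_{MAX}$ is about 34.3 m, so a ball 10 m from the array falls into the critical case, while one 50 m away does not.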
Moreover, there may be cases where the latency of the tracking system is not exactly known or the position data coming from the tracking system are erroneous. In such cases, a maximum possible latency may be specified as an upper limit, so that the width of the acoustic beam can also be controlled adaptively in these cases in order to account for these uncertainties. In general, it then makes sense to increase the beam width: the faster the object causing the sound event moves and the smaller the distance between the object and the microphone array, the larger the beam width should be.
The critical case, however, is the one where the distance between the position detected by the tracking system and the microphone array is smaller than the maximum value $r_{MAX}$. This case is considered in the following.
If, however, the sound propagation from the ball kick position to the array is taken into account, a narrower beam width that is still sufficient can be calculated. In particular, for all possible ball kick positions $p_{1,K}$, $p_{2,K}$, $p_{3,K}$ there exists a minimum time duration $d_{AIR,min}$ that the ball kick sound needs to propagate through the air to the microphone array. Accordingly, there is a maximum time duration $d_{BALL,max}$ during which the ball may have moved before being kicked such that the sound created by the kick arrives at the array at the time $t_0$. Both extremes occur together if the ball moves from the tracking position $p_{TR}$ along the trajectory $Tr_3$ directly towards the array and is kicked on this path at a distance $r_{real,max}$ from the tracking position $p_{TR}$. The distance $r_{real,max}$ may be derived from the fact that the sum of both time durations, $d_{BALL,max}$ $(= t_E - t_{TR})$ and $d_{AIR,min}$ $(= t_0 - t_E)$, must equal the tracker latency in order for the sound of the ball kick to arrive at the array at the time $t_0$, i.e.

$$d_{BALL,max} + d_{AIR,min} = t_0 - t_{TR} = d_{TRACK}.$$

Expressing the time durations by the corresponding distances and velocities according to

$$d_{BALL,max} = \frac{r_{real,max}}{v_{BALL}}, \qquad d_{AIR,min} = \frac{r - r_{real,max}}{v_S},$$

wherein $v_{BALL}$ denotes the ball velocity, $v_S$ denotes the speed of sound and $r$ denotes the distance between the microphone array and the tracking position, and solving for $r_{real,max}$ results in

$$r_{real,max} = \frac{v_{BALL}\,(v_S \cdot d_{TRACK} - r)}{v_S - v_{BALL}} = \frac{v_{BALL}\,(r_{MAX} - r)}{v_S - v_{BALL}},$$

which with the exemplary numbers mentioned above yields $r_{real,max} \approx 2.71$ m. This is the radius of a circular area $B_{real}$ centered on $p_{TR}$ that represents the real area of uncertainty of the ball kick position; it is smaller than the dashed circle $B_{Tr}$. Thus, the ball kick noise is reliably captured if, at the time $t_0$ (i.e. when the tracking data arrive), the beamformer is steered to generate the narrowest beam that still covers the smaller circle $B_{real}$. In the situation described above and shown in
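The closed-form radius can be turned into a beam width. The following sketch is illustrative only: it assumes a speed of sound of 343 m/s, treats $B_{real}$ as a circle of radius $r_{real,max}$ centred at distance $r$ from the array, and uses the elementary geometric fact that such a circle subtends a full angle of $2\arcsin(r_{real,max}/r)$ when seen from the array. The function names and the example values ($r$ = 30 m, $v_{BALL}$ = 20 m/s, $d_{TRACK}$ = 0.2 s) are assumptions and are not the exemplary numbers referred to above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def r_real_max(r, v_ball, d_track, c=SPEED_OF_SOUND):
    """Radius (m) of the uncertainty circle B_real around the tracked position.

    Solves r_real / v_ball + (r - r_real) / c = d_track for r_real, i.e.
    r_real = v_ball * (c * d_track - r) / (c - v_ball). For r >= r_MAX the
    expression is <= 0, so it is clipped to zero.
    """
    return max(0.0, v_ball * (c * d_track - r) / (c - v_ball))


def opening_angle_deg(r, radius):
    """Full angle (degrees) subtended at the array by a circle of the given
    radius whose centre lies at distance r from the array."""
    if radius >= r:
        return 360.0  # array inside the circle: no directional restriction possible
    return math.degrees(2.0 * math.asin(radius / r))


# Assumed example values (not the exemplary numbers of the description).
r, v_ball, d_track = 30.0, 20.0, 0.2
radius = r_real_max(r, v_ball, d_track)
print(f"r_real,max = {radius:.2f} m, opening angle = {opening_angle_deg(r, radius):.1f} deg")
```

For these assumed values the sketch gives a radius of about 2.39 m and an opening angle of roughly 9°.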
In general, the area $B_{real}$ of possible ball positions becomes smaller if the distance between the tracking position $p_{TR}$ and the array increases, if the ball velocity $v_{BALL}$ decreases, or if the maximum latency of the tracker becomes smaller. Further, the tracking accuracy can also be incorporated into the beam width control: the less accurate the tracking, the more the beam width has to be increased. Conversely, the more accurate the tracking is known to be, the narrower the beam can be. The smaller the calculated area $B_{real}$ of possible ball positions, the narrower the beam and the less unwanted ambient sound is captured. The increased focusing according to the invention therefore leads to an improved audio signal quality.
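How exactly the tracking accuracy enters the beam width control is not specified here. One simple possibility, assumed purely for illustration, is to add the tracker's positional uncertainty (e.g. the radius of its confidence interval, called `sigma_track` below) to $r_{real,max}$ and to clamp the result to an assumed minimum opening angle of 5°. The sketch checks the qualitative trends stated above; the function name `beam_width_deg` and all numbers are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value
MIN_ANGLE_DEG = 5.0     # assumed minimum opening angle


def beam_width_deg(r, v_ball, d_track_max, sigma_track=0.0, c=SPEED_OF_SOUND):
    """Hypothetical width rule: cover B_real, enlarged by an additive tracking
    uncertainty sigma_track (m), but never go below MIN_ANGLE_DEG."""
    r_real = max(0.0, v_ball * (c * d_track_max - r) / (c - v_ball))
    radius = r_real + sigma_track
    if radius >= r:
        return 360.0
    return max(MIN_ANGLE_DEG, math.degrees(2.0 * math.asin(radius / r)))


base = beam_width_deg(r=20.0, v_ball=20.0, d_track_max=0.2, sigma_track=0.5)
checks = {
    "larger distance": beam_width_deg(30.0, 20.0, 0.2, 0.5),
    "slower ball": beam_width_deg(20.0, 10.0, 0.2, 0.5),
    "smaller latency": beam_width_deg(20.0, 20.0, 0.1, 0.5),
    "more accurate tracking": beam_width_deg(20.0, 20.0, 0.2, 0.1),
}
print(f"baseline: {base:.1f} deg")
for label, width in checks.items():
    print(f"{label:>22}: {width:.1f} deg (narrower: {width < base})")
```

The additive margin is only one design choice; a tracker reporting a confidence interval could equally be handled by scaling the radius, as long as lower accuracy never makes the beam narrower.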
A basic idea of the disclosed beam width control is that a certain time elapses between the occurrence of the sound event at the sound source and the arrival of the sound at the microphone array, and that the sound source has already moved during this time.
For smaller distances $r < r_{MAX}$, however, the width or (azimuthal) opening angle of the directional characteristic is variable and depends on the velocity of the moving object 10, such that a higher velocity of the moving object 10 leads to a larger width or larger opening angle, respectively, of the directional characteristic. A minimum width or minimum opening angle, respectively, is obtained for $r = r_{MAX}$. This minimum is never undershot and may lie in a range of 5° to 10°, for instance. The variable directional characteristic may be generated, e.g., by modifying the filters of a filter-and-sum beamformer, for which modified filter coefficients retrieved from a memory 235, in which they are stored, may be used. For changing the direction, the individual delay values of the single microphone signals may be modified. In an embodiment, suitable delay values for the respective direction may also be retrieved from the memory 235. For other types of beamformers, other values that determine the beam width or opening angle may be modified, e.g. the weighting factors for Ambisonics signals in a modal beamformer.
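As an illustration of direction-dependent delay values, the following sketch computes far-field delay-and-sum delays for an assumed uniform circular array; delay-and-sum is the special case of a filter-and-sum beamformer in which each filter is a pure delay. The array geometry, the plane-wave (far-field) approximation, the 343 m/s speed of sound and the function name `steering_delays` are assumptions; in a real system such values might instead be precomputed and read from the memory 235. The sketch covers only the steering direction, not the width control.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def steering_delays(n_mics, radius, azimuth_deg, c=SPEED_OF_SOUND):
    """Far-field delay-and-sum delays (s) for n_mics capsules evenly spaced on
    a circle of the given radius (m), steered towards azimuth_deg.

    Capsule i sits at angle 2*pi*i/n_mics. A plane wave from the steering
    direction reaches capsule i earlier than the array centre by
    radius*cos(theta_i - phi)/c, so that capsule is delayed by this advance
    (shifted so that the smallest delay is zero) to align all wavefronts.
    """
    phi = math.radians(azimuth_deg)
    advances = [radius * math.cos(2.0 * math.pi * i / n_mics - phi) / c
                for i in range(n_mics)]
    smallest = min(advances)
    return [adv - smallest for adv in advances]


# Assumed example: 4 capsules on a 0.1 m radius, beam steered towards 0 deg.
for i, d in enumerate(steering_delays(4, 0.1, 0.0)):
    print(f"capsule {i}: {d * 1e3:.3f} ms")
```

In this example the capsule facing the steering direction receives the largest delay (about 0.58 ms) and the opposite capsule none.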
In one embodiment, the width or opening angle of the directional characteristic is also modified 140 in dependence on the tracking latency, wherein a larger tracking latency leads to a larger beam width or larger opening angle of the directional characteristic, and vice versa. In a further embodiment, the width or opening angle is also modified 150 in dependence on the distance between the moving object 10 and the microphone array, wherein a larger distance leads to a smaller width or smaller opening angle of the directional characteristic, and vice versa, and wherein the width or opening angle of the directional characteristic remains above a given non-zero minimum value.
In an embodiment, several of the microphone capsules belong to different microphones, each having its own directional characteristic, wherein the opening angle of the directional characteristic of the microphone array is calculated and variable in only one dimension (e.g. azimuth), while in another dimension (e.g. elevation) it is predetermined by the directional characteristic of the individual microphones and remains unchanged over time.
In an embodiment, updated positional information is received 110 from the tracking system, which may for instance be video-based, at regular time intervals of up to 100 ms, and the width or opening angle α of the beam is adapted to the updated positional information.
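The per-update adaptation can be sketched as follows. Each hypothetical `TrackingFrame` carries the ball position and velocity relative to the array in the plane of detection; for every update the sketch recomputes the steering azimuth and the opening angle from the closed-form radius of the uncertainty circle, assuming a maximum tracking latency of 0.2 s, a speed of sound of 343 m/s and a 5° minimum. The frame contents are invented example values, not tracker output.

```python
import math
from dataclasses import dataclass

SPEED_OF_SOUND = 343.0  # m/s, assumed value
MIN_ANGLE_DEG = 5.0     # assumed minimum opening angle


@dataclass
class TrackingFrame:
    """Hypothetical tracker update: ball position (m) and velocity (m/s)
    relative to the array, in the plane of detection."""
    x: float
    y: float
    vx: float
    vy: float


def beam_for_frame(frame, d_track_max, c=SPEED_OF_SOUND):
    """Return (azimuth_deg, opening_angle_deg) for one tracking update."""
    r = math.hypot(frame.x, frame.y)
    speed = math.hypot(frame.vx, frame.vy)
    r_real = max(0.0, speed * (c * d_track_max - r) / (c - speed))
    if r_real >= r:
        width = 360.0  # ball too close to restrict the beam direction
    else:
        width = max(MIN_ANGLE_DEG, math.degrees(2.0 * math.asin(r_real / r)))
    return math.degrees(math.atan2(frame.y, frame.x)), width


# Two assumed updates 100 ms apart: a fast ball, then the same ball slowed down.
updates = [TrackingFrame(15.0, 5.0, 25.0, 0.0), TrackingFrame(17.5, 5.0, 5.0, 0.0)]
for frame in updates:
    azimuth, width = beam_for_frame(frame, d_track_max=0.2)
    print(f"azimuth {azimuth:5.1f} deg, opening angle {width:5.1f} deg")
```

In this example the opening angle shrinks from about 30° for the fast ball to the 5° floor once the ball has slowed down.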
In embodiments, the invention may be implemented by a software-configurable computer or processor. The computer or processor may be configured by instructions stored on a non-transitory computer-readable storage medium. When executed on the computer or processor, the instructions cause it to execute the steps of the method described above.
The invention is particularly advantageous for use on sports fields or in sports stadiums in general, not only for soccer. However, it is clear that the invention may also be used in venues other than a sports stadium. While various embodiments have been described, it is clear that combinations of features of different embodiments may be possible, even if not expressly mentioned herein. Such combinations are considered to be within the scope of the present invention.