If you want to have the list of publications issued from a specific Individual Project (IP), write in the search field (IM2.IP). IP can have the following value: DMA, AP, VP, MPR, MCA, HMI, ISD, BMI
If you want to find joint publications between IPs, write in the search field (joint), click on search and then click on Keywords
If you want to display all the publications for a specific author, use the shortcut called -Authors- located in the main menu
Abstract: A speech/audio codec based on Frequency Domain Linear Prediction (FDLP) exploits auto-regressive modeling to approximate instantaneous energy in critical frequency sub-bands of relatively long input segments. The current version of the FDLP codec operating at 66 kbps has been shown to provide comparable subjective listening quality results to state-of-the-art codecs on similar bit-rates even without employing standard blocks such as entropy coding or simultaneous masking. This paper describes an experimental work to increase compression efficiency of the FDLP codec by employing entropy coding. Unlike conventional Huffman coding employed in current speech/audiocoding systems, we describe an efficient way to exploit arithmetic coding to entropy compress quantized spectral magnitudes of the sub-band FDLP residuals. Such an approach provides 11\% ( 3 kbps) bit-rate reduction compared to the Huffman coding algorithm ( 1 kbps).
Abstract: We describe novel audiocoding technique designed to be utilized at medium bit-rates. Unlike classical state-of-the-art audio coders that are based on short-term spectra, our approach uses relatively long temporal segments of audio signal in critical-band-sized sub-bands. We apply auto-regressive model to approximate Hilbert envelopes in frequency sub-bands. Residual signals (Hilbert carriers) are demodulated and thresholding functions are applied in spectral domain. The Hilbert envelopes and carriers are quantized and transmitted to the decoder. Our experiments focused on designing audio coder to provide broadcast radio-like quality audio around $10-20$kbps. Objective quality measures indicate comparable performance with the 3GPP-AMR speech codec standard for both speech and non-speech signals.
Abstract: In this paper, we describe additional experiments based on a novel audiocoding technique that uses an autoregressive model to approximate an audio signal's Hilbert envelope. This technique is performed over long segments (1000 ms) in critical-band-sized sub-bands. We have performed a series of experiments to find more efficient methods of quantizing the frequency components of the Hilbert carrier, which is the excitation found in the temporal audio signal. When using linear quantization, it was found that allocating 5 bits for transmitting the Hilbert carrier every 200 ms was sufficient. Other techniques, such as quantizing the first derivative of phase and using an iterative adaptive threshold, were examined.
Abstract: Frequency Domain Linear Prediction (FDLP) represents the technique for approximating temporal envelopes of a signal using autoregressive models. In this paper, we propose a wide-band audiocoding system exploiting FDLP. Specifically, FDLP is applied on critically sampled sub-bands to model the Hilbert envelopes. The residual of the linear prediction forms the Hilbert carrier, which is transmitted along with the envelope parameters. This process is reversed at the decoder to reconstruct the signal. In the objective and subjective quality evaluations, the FDLP based audio codec at $66$ kbps provides competitive results compared to the state-of-art codecs at similar bit-rates.
Abstract: Audio codec based on Frequency Domain Linear Prediction (FDLP) exploits auto-regressive modeling to approximate instantaneous energy in critical frequency sub-bands of relatively long input segments. Current version of the FDLP codec operating at 66 kbps has shown to provide comparable subjective listening quality results to the state-of-the-art codecs on similar bit-rates even without employing strategic blocks, such as entropy coding or simultaneous masking. This paper describes an experimental work to increase compression efficiency of the FDLP codec provided by employing entropy coding. Unlike traditionally used Huffman coding in current audiocoding systems, we describe an efficient way to exploit Arithmetic coding to entropy compress quantized magnitude spectral components of the sub-band FDLP residuals. Such approach outperforms Huffman coding algorithm and provides more than 3 kbps bit-rate reduction.
Abstract: Unlike classical state-of-the-art coders that are based on short-term spectra, our approach uses relatively long temporal segments of audio signal in critical-band-sized sub-bands. We apply auto-regressive model to approximate Hilbert envelopes in frequency sub-bands. Residual signals (Hilbert carriers) are demodulated and thresholding functions are applied in spectral domain. The Hilbert envelopes and carriers are quantized and transmitted to the decoder. Our experiments focused on designing speech/audio coder to provide broadcast radio-like quality audio around 15-25kbps. Obtained objective quality measures, carried out on standard speech recordings, were compared to the state-of-the-art 3GPP-AMR speech coding system.
Abstract: This paper describes employment of non-uniform QMF decomposition to increase the efficiency of a generic wide-band audiocoding system based on Frequency Domain Linear Prediction (FDLP). The base line FDLP codec, operating at high bit-rates ( 136 kbps), exploits a uniform QMF decomposition into 64 sub-bands followed by sub-band processing based on FDLP. Here, we propose a non-uniform QMF decomposition into 32 frequency sub-bands obtained by merging 64 uni- form QMF bands. The merging operation is performed in such a way that bandwidths of the resulting critically sampled sub-bands emulate the characteristics of the critical band filters in the human auditory system. Such frequency decomposition, when employed in the FDLP audio codec, results in a bit-rate reduction of 40\% over the base line. We also describe the complete audio codec, which provides high-fidelity audio compression at 66 kbps. In subjective listening tests, the FDLP codec outperforms MPEG-1 Layer 3 (MP3) and achieves similar qualities as MPEG-4 HE-AAC codec.
Abstract: Audiocoding based on Frequency Domain Linear Prediction (FDLP) uses auto-regressive models to approximate Hilbert envelopes in frequency sub-bands. Although the basic technique achieves good coding efficiency, there is a need to improve the reconstructed signal quality for tonal signals with impulsive spectral content. For such signals, the quantization noise in the FDLP codec appears as frequency components not present in the input signal. In this paper, we propose a technique of Spectral Noise Shaping (SNS) for improving the quality of tonal signals by applying a Time Domain Linear Prediction (TDLP) filter prior to the FDLP processing. The inverse TDLP filter at the decoder shapes the quantization noise to reduce the artifacts. Application of the SNS technique to the FDLP codec improves the quality of the tonal signals without affecting the bit-rate. Performance evaluation is done with Perceptual Evaluation of Audio Quality (PEAQ) scores and with subjective listening tests.
Abstract: Audiocoding based on Frequency Domain Linear Prediction (FDLP) uses auto-regressive model to approximate Hilbert envelopes in frequency sub-bands for relatively long temporal segments. Although the basic technique achieves good quality of the reconstructed signal, there is a need for improving the coding efficiency. In this paper, we present a novel method for the application of temporal masking to reduce the bit-rate in a FDLP based codec. Temporal masking refers to the hearing phenomenon, where the exposure to a sound reduces response to following sounds for a certain period of time (up to $200$ ms). In the proposed version of the codec, a first order forward masking model of the human ear is implemented and informal listening experiments using additive white noise are performed to obtain the exact noise masking thresholds. Subsequently, this masking model is employed in encoding the sub-band FDLP carrier signal. Application of the temporal masking in the FDLP codec results in a bit-rate reduction of about $10$\% without degrading the quality. Performance evaluation is done with Perceptual Evaluation of Audio Quality (PEAQ) scores and with subjective listening tests.
Abstract: This paper proposes an efficient video coding method based on audio-visual attention, which is motivated by the fact that cross-modal interaction significantly affects humans perception of multimedia experiences. First, we propose an audio-visual source localization method to locate the sound source in a video sequence. Then, its result is used for applying spatial blurring to the images in order to reduce redundant high-frequency information and achieve coding efficiency. We demonstrate the effectiveness of the proposed method for H.264/AVC coding along with the results of a subjective test.
Abstract: In this paper we propose an extension of the very low bit-rate speech coding technique, exploiting predictability of the temporal evolution of spectral envelopes, for wide-band audiocoding applications. Temporal envelopes in critically band-sized sub-bands are estimated using frequency domain linear prediction applied on relatively long time segments. The sub-band residual signals, which play an important role in acquiring high quality reconstruction, are processed using a heterodyning-based signal analysis technique. For reconstruction, their optimal parameters are estimated using a closed-loop analysis-by-synthesis technique driven by a perceptual model emulating simultaneous masking properties of the human auditory system. We discuss the advantages of the approach and show some properties on challenging audio recordings. The proposed technique is capable of encoding high quality, variable rate audio signals on bit-rates below $1$bit/sample.