ORBi

Fractional Linear Prediction Toolbox for MATLAB
Despotovic, Vladimir ; in Proc. of 21th International Carpathian Control Conference (ICCC) (2020, October)
This paper presents the Fractional Linear Prediction (FLP) Toolbox implemented in MATLAB with a supporting livescript interface that offers a user-friendly environment for the prediction of one-dimensional signals. Two versions of the FLP are implemented in the toolbox and presented here. While the first approach uses the “full” memory (the whole history of the signal), the second approach uses the “restricted” memory (two, three or four previous samples). Both FLP approaches are compared to the standard linear prediction and their performance is evaluated in examples using a test signal (sine wave signal) and a real-data signal (speech signal) as the input.

Speech Based Estimation of Parkinson’s Disease Using Gaussian Processes and Automatic Relevance Determination
Despotovic, Vladimir ; ; Schommer, Christoph in Neurocomputing (2020), 401
Parkinson’s disease is a progressive neurodegenerative disorder often accompanied by impairment in articulation, phonation, prosody and fluency of speech. In fact, speech impairment is one of the earliest Parkinson’s disease symptoms, and may be used for early diagnosis.
We present an experimental study of identification of Parkinson’s disease and assessment of disease progress from speech using Gaussian processes, which is further combined with Automatic Relevance Determination (ARD) for efficient feature selection. Hyperparameters of ARD covariance functions are learned for each individual feature and can therefore be used to evaluate feature importance. In that way only a small subset of highly relevant acoustic features is selected, leading to models with better performance and lower complexity. The performance of the proposed method was assessed on two datasets: the Parkinson’s disease detection dataset, which contains a range of biomedical voice measurements obtained from 31 subjects, 23 of them suffering from Parkinson’s disease and 8 healthy subjects; and the Parkinson’s telemonitoring dataset, containing biomedical voice measurements collected from 42 Parkinson’s disease patients for estimation of the disease progress. Gaussian process classification with automatic relevance determination is able to successfully discriminate between Parkinson’s disease patients and healthy controls with 96.92% accuracy, outperforming Support Vector Machines and decision tree ensembles (random forests, boosted and bagged decision trees). The usability of Gaussian processes is further confirmed in a regression task for tracking the progress of the disease.

Gaussian source coding based on variance-mismatched three-level scalar quantisation using Q-function approximations
; ; Despotovic, Vladimir in IET Communications (2020), 14(4), 594-602

Two-dimensional fractional linear prediction
; Despotovic, Vladimir ; in Computers and Electrical Engineering (2019), 77
Linear prediction (LP) has been applied with great success in coding of one-dimensional, time-varying signals, such as speech or biomedical signals. In the case of two-dimensional signal representation (e.g. images) the model can be extended by applying one-dimensional LP along two space directions (2D LP). Fractional linear prediction (FLP) is a generalisation of standard LP using the derivatives of non-integer (arbitrary real) order. While FLP was successfully applied to one-dimensional signals, there are no reported implementations in multidimensional space. In this paper two variants of two-dimensional FLP (2D FLP) are proposed and optimal predictor coefficients are derived. The experiments using various grayscale images confirm that the proposed 2D FLP models achieve performance comparable to 2D LP using the same support region of the predictor, but with one fewer predictor coefficient, enabling potential compression.

Audio signal processing using fractional linear prediction
; Despotovic, Vladimir in Mathematics (2019), 7(7)
Fractional linear prediction (FLP), as a generalization of conventional linear prediction (LP), was recently successfully applied in different fields of research and engineering, such as biomedical signal processing, speech modeling and image processing. The FLP model has a similar design to the conventional LP model, i.e., it uses a linear combination of “fractional terms” with different orders of fractional derivative.
Assuming only one “fractional term” and using a limited number of previous samples for prediction, an FLP model with “restricted memory” is presented in this paper and the closed-form expressions for calculation of FLP coefficients are derived. This FLP model is fully comparable with the widely used low-order LP, as it uses the same number of previous samples but fewer predictor coefficients, making it more efficient. Two different datasets, MIDI Aligned Piano Sounds (MAPS) and Orchset, were used for the experiments. Triads representing the chords composed of three randomly chosen notes and usual Western musical chords (both of them from the MAPS dataset) served as the test signals, while the piano recordings from the MAPS dataset and orchestra recordings from the Orchset dataset served as the musical signals. The results show enhancement of FLP over LP in terms of model complexity, whereas the performance is comparable.

Novel Two-Bit Adaptive Delta Modulation Algorithms
; ; Despotovic, Vladimir in Informatica (2019), 30(1), 117-134
This paper introduces two novel algorithms for the 2-bit adaptive delta modulation, namely 2-bit hybrid adaptive delta modulation and 2-bit optimal adaptive delta modulation. In 2-bit hybrid adaptive delta modulation, the adaptation is performed both at the frame level and the sample level, where the estimated variance is used to determine the initial quantization step size. In the latter algorithm, the estimated variance is used to scale the quantizer codebook optimally designed assuming Laplace distribution of the input signal.
The algorithms are tested using a speech signal and compared to constant factor delta modulation, continuously variable slope delta modulation and instantaneously adaptive 2-bit delta modulation, showing that the proposed algorithms offer higher performance and significantly wider dynamic range.

Optimal fractional linear prediction with restricted memory
; Despotovic, Vladimir ; in IEEE Signal Processing Letters (2019), 26(5), 760-764
Linear prediction is extensively used in modeling, compression, coding, and generation of speech signals. Various formulations of linear prediction are available, both in time and frequency domain, which start from different assumptions but result in the same solution. In this letter, we propose a novel, generalized formulation of the optimal low-order linear prediction using the fractional (non-integer) derivatives. The proposed fractional derivative formulation allows for the definition of a predictor with versatile behavior based on the order of fractional derivative. We derive the closed-form expressions of the optimal fractional linear predictor with restricted memory, and prove that the optimal first-order and the optimal second-order linear predictors are only its special cases. Furthermore, we empirically show that the optimal order of fractional derivative can be approximated by the inverse of the predictor memory, and thus, it is a priori known. Therefore, the complexity is reduced by optimizing and transferring only one predictor coefficient, i.e., one fewer parameter than the second-order linear predictor, at the same level of performance.
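
The first-order and second-order linear predictors that the abstract above names as special cases can be illustrated with a minimal sketch. It is not the letter's fractional method and is not taken from any of the listed papers: it only fits the classical low-order LP baselines by least squares on a sine test signal. The function names `lp_order1` and `lp_order2`, the test frequency, and the signal length are assumptions chosen for illustration.

```python
import math


def lp_order1(x):
    """Least-squares first-order predictor x[n] ~ a * x[n-1] (illustrative helper)."""
    num = sum(x[n] * x[n - 1] for n in range(1, len(x)))
    den = sum(x[n - 1] ** 2 for n in range(1, len(x)))
    return num / den


def lp_order2(x):
    """Least-squares second-order predictor x[n] ~ a1*x[n-1] + a2*x[n-2] (illustrative helper)."""
    # Accumulate the entries of the 2x2 normal equations over all n >= 2.
    r11 = r12 = r22 = p1 = p2 = 0.0
    for n in range(2, len(x)):
        r11 += x[n - 1] * x[n - 1]
        r12 += x[n - 1] * x[n - 2]
        r22 += x[n - 2] * x[n - 2]
        p1 += x[n] * x[n - 1]
        p2 += x[n] * x[n - 2]
    # Solve [[r11, r12], [r12, r22]] @ [a1, a2] = [p1, p2] by Cramer's rule.
    det = r11 * r22 - r12 * r12
    a1 = (p1 * r22 - r12 * p2) / det
    a2 = (r11 * p2 - r12 * p1) / det
    return a1, a2


if __name__ == "__main__":
    w = 0.3  # assumed normalized angular frequency of the sine test signal
    x = [math.sin(w * n) for n in range(200)]
    print("first-order coefficient:", lp_order1(x))
    print("second-order coefficients:", lp_order2(x))
```

For a pure sine wave the identity sin(wn) = 2cos(w) sin(w(n-1)) - sin(w(n-2)) holds exactly, so the second-order fit should recover coefficients close to 2cos(w) and -1, while the first-order fit can only approximate the signal with a single coefficient near cos(w).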

Signal prediction using fractional derivative models
; Despotovic, Vladimir in Bǎleanu, Dumitru; Mendes Lopes, António (Eds.) Handbook of Fractional Calculus with Applications (2019)
In this chapter the linear prediction (LP) and its generalisation to fractional linear prediction (FLP) are described with the possible applications to one-dimensional (1D) and two-dimensional (2D) signals. Standard test signals, such as the sine wave, the square wave, and the sawtooth wave, as well as the real-data signals, such as speech, electrocardiogram and electroencephalogram, are used for the numerical experiments for the 1D case, and grayscale images for the 2D case. The 1D FLP model is proposed to have a similar construction to the LP model, i.e. it uses a linear combination of fractional derivatives with different values of the fractional order. The 2D FLP model uses a linear combination of the fractional derivatives in two directions, horizontal and vertical. The scheme for the computation of the optimal predictor coefficients for both 1D and 2D FLP models is also provided. The performance of the proposed FLP models is compared to the performance of the LP models, confirming that the proposed FLP can be successfully applied in processing of 1D and 2D signals, giving comparable or better performance using the same or even smaller number of parameters.

An efficient two-digit adaptive delta modulation for Laplacian source coding
; ; Despotovic, Vladimir in International Journal of Electronics (2019), 106(7), 1085-1100
Delta Modulation (DM) is a simple waveform coding algorithm used mostly when timely data delivery is more important than the transmitted data quality. While the implementation of DM is fairly simple and inexpensive, it suffers from several limitations, such as slope overload and granular noise, which can be overcome using Adaptive Delta Modulation (ADM). This paper presents a novel 2-digit ADM with six-level quantization using variable-length coding, for encoding the time-varying signals modelled by Laplacian distribution. Two variants of quantizer are employed: a distortion-constrained quantizer, which is optimally designed for minimal mean-squared error (MSE), and a rate-constrained quantizer, which is suboptimal in the minimal MSE sense, but enables minimal loss in SQNR for the target bit rate. Experimental results using a real speech signal are provided, indicating that the proposed configuration outperforms the baseline ADM algorithms, including Constant Factor Delta Modulation (CFDM), Continuously Variable Slope Delta Modulation (CVSDM), 2-digit and 2-bit ADM, and operates in a much wider dynamic range.

One-parameter fractional linear prediction
Despotovic, Vladimir ; ; in Computers and Electrical Engineering (2018), 69
The one-parameter fractional linear prediction (FLP) is presented and the closed-form expressions for the evaluation of FLP coefficients are derived. Contrary to the classical first-order linear prediction (LP) that uses one previous sample and one predictor coefficient, the one-parameter FLP model is derived using the memory of two, three or four samples, while not increasing the number of predictor coefficients.
The first-order LP is only a special case of the proposed one-parameter FLP when the order of fractional derivative tends to zero. Based on the numerical experiments using test signals (sine test waves) and real-data signals (speech and electrocardiogram), a hypothesis for estimating the fractional derivative order used in the model is given. The one-parameter FLP outperforms the classical first-order LP in terms of the prediction gain, having comparable performance with the second-order LP, although using one fewer predictor coefficient.

Forward Adaptive Laplacian Source Coding Based on Restricted Quantization
; ; Despotovic, Vladimir et al in Information Technology and Control (2018), 47(2), 209-219
A novel solution for Laplacian source coding based on three-level quantization is proposed in this paper. The restricted three-level quantizer is designed by assuming the restricted Laplacian distribution of the input signal. The quantizer and Huffman encoder are jointly designed. A forward adaptive scheme was employed, where the adaptation to the signal variance (power) was performed on a frame-by-frame basis. We employ a switched model that consists of two restricted quantizers having unequal support regions. The simulation results (measured as SQNR) of the proposed scheme with a switched restricted three-level quantizer are compared to the cases when it involves a three-level unrestricted quantizer and the Lloyd-Max quantizers having N=2 and N=4 levels. It is shown that the proposed solution offers performance comparable to that of the N=4 level Lloyd-Max baseline with large savings in bit rate, while outperforming the two other baselines.

Machine learning techniques for semantic analysis of dysarthric speech: An experimental study
Despotovic, Vladimir ; ; in Speech Communication (2018), 99
We present an experimental comparison of seven state-of-the-art machine learning algorithms for the task of semantic analysis of spoken input, with a special emphasis on applications for dysarthric speech. Dysarthria is a motor speech disorder, which is characterized by poor articulation of phonemes. In order to cater for these non-canonical phoneme realizations, we employed an unsupervised learning approach to estimate the acoustic models for speech recognition, which does not require a literal transcription of the training data. Even for the subsequent task of semantic analysis, only weak supervision is employed, whereby the training utterance is accompanied by a semantic label only, rather than a literal transcription. Results on two databases, one of them containing dysarthric speech, are presented showing that Markov logic networks and conditional random fields substantially outperform other machine learning approaches. Markov logic networks have proved to be especially robust to recognition errors, which are caused by imprecise articulation in dysarthric speech.

Dual-mode quasi-logarithmic quantizer with embedded G.711 codec
; ; Despotovic, Vladimir et al in Journal of Electrical Engineering (2018), 69(1), 46-51
The G.711 codec has been accepted as a standard for high quality coding in many applications. A dual-mode quantizer, which combines the nonlinear logarithmic quantizer for restricted input signals and the G.711 quantizer for unrestricted input signals, is proposed in this paper. The parameters of the proposed quantizer are optimized, where the minimal distortion is used as the criterion. It is shown that the optimized version of the proposed quantizer provides 5.4 dB higher SQNR (Signal to Quantization Noise Ratio) compared to the G.711 quantizer, or equivalently it saves approximately 0.9 bit/sample in bit rate for the same signal quality. Although the complexity is slightly increased, we believe that due to the superior performance it can be successfully implemented for high-quality quantization.

Linear prediction of speech: The fractional derivative formula
Despotovic, Vladimir ; in Book of Abstracts, 2017 International Workshop on Fractional Calculus and Its Applications (2017, May)

Sentiment Analysis of Microblogs Using Multilayer Feed-forward Artificial Neural Networks
Despotovic, Vladimir ; in Computing and Informatics (2017), 36(5), 1127-1142
Sentiment analysis aims to extract public opinion on a particular topic, and microblogs, especially Twitter as the most influential platform, represent a significant source of information. The application to microblogs has to cope with difficulties, such as informal language with abbreviations, internet jargon, emoticons and hashtags that do not appear in conventional text documents.
A sentiment analysis technique for microblogs based on a feed-forward artificial neural network (ANN) with sigmoid activation function is proposed in this paper and compared to machine learning approaches, i.e. Multinomial Naive Bayes, Support Vector Machines and Maximum Entropy. Experiments were performed on the Stanford Twitter Sentiment corpus, a balanced dataset which contains noisy training labels weakly annotated using emoticons as sentiment indicators; and the SemEval-2014 Task 9 corpus, an unbalanced dataset which contains manually annotated training examples. The obtained results show that ANN produces superior or at least comparable results to state-of-the-art machine learning techniques.

Fractional-order speech prediction
Despotovic, Vladimir ; in International Conference on Fractional Differentiation and its Applications (ICFDA ‘16) (2016, July)

Semantic Analysis of Spoken Input Using Markov Logic Networks
Despotovic, Vladimir ; ; in Proceedings of the 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015) (2015, September)
We present a semantic analysis technique for spoken input using Markov Logic Networks (MLNs). MLNs combine graphical models with first-order logic. They are particularly suitable for providing inference in the presence of inconsistent and incomplete data, which are typical of an automatic speech recognizer’s (ASR) output in the presence of degraded speech. The target application is a speech interface to a home automation system to be operated by people with speech impairments, where the ASR output is particularly noisy.
In order to cater for dysarthric speech with non-canonical phoneme realizations, acoustic representations of the input speech are learned in an unsupervised fashion. While training data transcripts are not required for the acoustic model training, the MLN training requires supervision, however, at a rather loose and abstract level. Results on two databases, one of them for dysarthric speech, show that MLN-based semantic analysis clearly outperforms baseline approaches employing non-negative matrix factorization, multinomial naive Bayes models, or support vector machines.

An Evaluation of Unsupervised Acoustic Model Training for a Dysarthric Speech Interface
; Despotovic, Vladimir ; et al in Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014) (2014, September)
In this paper, we investigate unsupervised acoustic model training approaches for dysarthric-speech recognition. These models are first, frame-based Gaussian posteriorgrams, obtained from Vector Quantization (VQ), second, so-called Acoustic Unit Descriptors (AUDs), which are hidden Markov models of phone-like units, that are trained in an unsupervised fashion, and, third, posteriorgrams computed on the AUDs. Experiments were carried out on a database collected from a home automation task and containing nine speakers, of which seven are considered to utter dysarthric speech. All unsupervised modeling approaches delivered significantly better recognition rates than a speaker-independent phoneme recognition baseline, showing the suitability of unsupervised acoustic model training for dysarthric speech.
While the AUD models led to the most compact representation of an utterance for the subsequent semantic inference stage, posteriorgram-based representations resulted in higher recognition rates, with the Gaussian posteriorgram achieving the highest slot filling F-score of 97.02%.

Artificial Intelligence Techniques for Modelling of Temperature in the Metal Cutting Process
; Despotovic, Vladimir in Metallurgy – Advances in Materials and Processes (2014)

Design of nonlinear predictors for adaptive predictive coding of speech signals
Despotovic, Vladimir ; in Proceedings of the 21st Telecommunications Forum Telfor (TELFOR) (2013, November)
Linear predictive coding is probably the most frequently used technique in speech signal processing. Its main advantage comes from the analogy of the simplified vocal tract model with the speech production system. However, this neglects nonlinearities in the speech production process. The paper deals with nonlinear prediction of speech based on truncated Volterra series. A long-term one-tap Volterra predictor is designed in order to decrease computational complexity. Further improvements are obtained using a frame/subframe structure and fractional delay.