Learning Gamelan

April 5, 2016 — August 5, 2022

convolution

functional analysis

music

neural nets

nonparametric

signal processing

sparser than thou

Crib notes for a 2-year-long project, which I ultimately abandoned in late 2018, about approximating convnets with recurrent neural networks for analysing time series. The project currently exists purely as LaTeX files on my hard drive, which need to be imported for future reference. I did learn some useful tricks along the way about controlling the poles of IIR filters so that they can be learned by gradient descent. Most of the ideas, which concerned how to learn a sparse, linear, time-invariant filterbank, have since been surpassed by S4 models.
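
One common way to keep IIR poles under control during gradient descent is to optimise unconstrained parameters and map them to poles that are stable by construction. The following is a minimal sketch of that idea, not necessarily the trick used in this project. The names `pole_from_params`, `biquad_denominator`, `impulse_response` and the `max_radius` cap are all hypothetical, chosen for illustration. The pole radius is squashed through a sigmoid so it always stays inside the unit circle, while the pole angle is left free.

```python
import cmath
import math


def sigmoid(x):
    """Numerically stable logistic function, mapping any real into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pole_from_params(rho, theta, max_radius=0.999):
    """Map unconstrained reals (rho, theta) to a pole inside the unit circle.

    The radius is max_radius * sigmoid(rho), so no gradient step on rho can
    push the pole onto or outside the unit circle. max_radius is an assumed
    safety margin, not a value taken from the original project.
    """
    radius = max_radius * sigmoid(rho)
    return cmath.rect(radius, theta)


def biquad_denominator(rho, theta):
    """Coefficients (a1, a2) of 1 + a1 z^-1 + a2 z^-2 for a conjugate pole pair."""
    p = pole_from_params(rho, theta)
    # (1 - p z^-1)(1 - conj(p) z^-1) = 1 - 2 Re(p) z^-1 + |p|^2 z^-2
    return -2.0 * p.real, abs(p) ** 2


def impulse_response(rho, theta, n_steps):
    """Run the all-pole recursion y[t] = x[t] - a1*y[t-1] - a2*y[t-2] on an impulse."""
    a1, a2 = biquad_denominator(rho, theta)
    y1 = y2 = 0.0
    out = []
    for t in range(n_steps):
        x = 1.0 if t == 0 else 0.0
        y = x - a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out
```

In an autodiff framework the same mapping would be written with that framework's differentiable operations, and `rho` and `theta` would be the learned parameters. The point of the reparameterisation is that stability does not need to be enforced by projection or clipping after each update.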

I feel a certain class of audio signal should be easy to decompose and then learn in a musically useful way: those well approximated by LTI, nearly-linear, nearly-additive filterbanks with sparse activations. At the time we mostly handled musical signals via convnets, which is not satisfying, and I felt we could do better with a more appropriate architecture. This project was about finding that architecture.
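
To make that signal class concrete, here is a minimal sketch of the idealised version, in which the model is exactly linear and additive rather than only nearly so. A handful of sparse activations (onset times and amplitudes) drive a small bank of LTI filters, and the outputs are summed. The function names and the choice of damped cosines as impulse responses are assumptions for illustration only.

```python
import math


def damped_cosine(freq, decay, length):
    """Impulse response of one resonant filter: an exponentially decaying cosine.

    freq is in cycles per sample; decay in (0, 1) is the per-sample amplitude ratio.
    """
    return [decay ** t * math.cos(2.0 * math.pi * freq * t) for t in range(length)]


def synthesize(activations, kernels, n_samples):
    """Sum each kernel convolved with its sparse activation sequence.

    activations[k] is a dict {onset_time: amplitude}, storing only the
    nonzero entries; kernels[k] is the impulse response of filter k.
    """
    signal = [0.0] * n_samples
    for events, kernel in zip(activations, kernels):
        for onset, amp in events.items():
            for lag, h in enumerate(kernel):
                if onset + lag < n_samples:
                    signal[onset + lag] += amp * h
    return signal


# Two "notes": one struck at sample 3 on filter 0, one at sample 10 on filter 1.
kernels = [damped_cosine(0.05, 0.99, 64), damped_cosine(0.11, 0.97, 64)]
mixture = synthesize([{3: 1.0}, {10: 0.5}], kernels, 128)
```

The learning problem is the inverse of this: given only `mixture`, recover both the kernels and the sparse activations.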

1 References

Abdallah, and Plumbley. 2004. “Polyphonic Music Transcription by Non-Negative Sparse Coding of Power Spectra.” In.

Allen-Zhu, and Li. 2019. “Can SGD Learn Recurrent Neural Networks with Provable Generalization?” arXiv:1902.01028 [Cs, Math, Stat].

Alliney. 1992. “Digital Filters as Absolute Norm Regularizers.” IEEE Transactions on Signal Processing.

Antoniou. 2005. Digital signal processing: signals, systems and filters.

Arjovsky, Shah, and Bengio. 2016. “Unitary Evolution Recurrent Neural Networks.” In Proceedings of the 33rd International Conference on Machine Learning - Volume 48. ICML’16.

Ascher. 2008. Numerical methods for evolutionary differential equations. Computational science and engineering 5.

Atal. 2006. “The History of Linear Prediction.” IEEE Signal Processing Magazine.

Bach, and Jordan. 2006. “Learning Spectral Clustering, with Application to Speech Separation.” Journal of Machine Learning Research.

Bach, and Moulines. 2013. “Non-Strongly-Convex Smooth Stochastic Approximation with Convergence Rate O(1/n).” In arXiv:1306.2119 [Cs, Math, Stat].

Banitalebi-Dehkordi, and Banitalebi-Dehkordi. 2014. “Music Genre Classification Using Spectral Analysis and Sparse Representation of the Signals.” Journal of Signal Processing Systems.

Baydin, and Pearlmutter. 2014. “Automatic Differentiation of Algorithms for Machine Learning.” arXiv:1404.7456 [Cs, Stat].

Bayro-Corrochano. 2005. “The Theory and Use of the Quaternion Wavelet Transform.” Journal of Mathematical Imaging and Vision.

Ben Taieb, and Atiya. 2016. “A Bias and Variance Analysis for Multistep-Ahead Time Series Forecasting.” IEEE transactions on neural networks and learning systems.

Bengio, Y., Simard, and Frasconi. 1994. “Learning Long-Term Dependencies with Gradient Descent Is Difficult.” IEEE Transactions on Neural Networks.

Bengio, Samy, Vinyals, Jaitly, et al. 2015. “Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks.” In Advances in Neural Information Processing Systems 28. NIPS’15.

Bertin, Badeau, and Vincent. 2010. “Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription.” IEEE Transactions on Audio, Speech, and Language Processing.

Blackman, and Tukey. 1959. The measurement of power spectra from the point of view of communications engineering.

Blei, Kucukelbir, and McAuliffe. 2017. “Variational Inference: A Review for Statisticians.” Journal of the American Statistical Association.

Bogert, Healy, and Tukey. 1963. “The Quefrency Alanysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum and Saphe Cracking.” In.

Bora, Jalal, Price, et al. 2017. “Compressed Sensing Using Generative Models.” In International Conference on Machine Learning.

Bordes, Bottou, and Gallinari. 2009. “SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent.” Journal of Machine Learning Research.

Borzì, and Schulz. 2012. Computational Optimization of Systems Governed by Partial Differential Equations. Computational Science and Engineering Series.

Boulanger-Lewandowski, Bengio, and Vincent. 2012. “Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription.” In 29th International Conference on Machine Learning.

Bridle, and Brown. 1974. “An Experimental Automatic Word Recognition System.” JSRU Report.

Buch, Quinton, and Sturm. 2017. “NichtnegativeMatrixFaktorisierungnutzendesKlangsynthesenSystem (NiMFKS): Extensions of NMF-Based Concatenative Sound Synthesis.” In Proceedings of the 20th International Conference on Digital Audio Effects.

Cakir, Ozan, and Virtanen. 2016. “Filterbank Learning for Deep Neural Network Based Polyphonic Sound Event Detection.” In Neural Networks (IJCNN), 2016 International Joint Conference on.

Carabias-Orti, Virtanen, Vera-Candeas, et al. 2011. “Musical Instrument Sound Multi-Excitation Model for Non-Negative Spectrogram Factorization.” IEEE Journal of Selected Topics in Signal Processing.

Chang, Meng, Haber, Tung, et al. 2018. “Multi-Level Residual Networks from Dynamical Systems View.” In Proceedings of ICLR.

Chang, Meng, Haber, Ruthotto, et al. 2018. “Reversible Architectures for Arbitrarily Deep Residual Neural Networks.” In arXiv:1709.03698 [Cs, Stat].

Charles, Balavoine, and Rozell. 2016. “Dynamic Filtering of Time-Varying Sparse Signals via L1 Minimization.” IEEE Transactions on Signal Processing.

Chevillon. 2007. “Direct Multi-Step Estimation and Forecasting.” Journal of Economic Surveys.

Choi, Fazekas, Cho, et al. 2017. “A Tutorial on Deep Learning for Music Information Retrieval.” arXiv:1709.04396 [Cs].

Choi, Fazekas, and Sandler. 2016. “Automatic Tagging Using Deep Convolutional Neural Networks.” In Proceedings of ISMIR.

Choi, Fazekas, Sandler, et al. 2016. “Convolutional Recurrent Neural Networks for Music Classification.” In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Choi, Fazekas, Sandler, et al. 2017. “Transfer Learning for Music Classification and Regression Tasks.” In Proceeding of The 18th International Society of Music Information Retrieval (ISMIR) Conference 2017.

Chollet. 2016. “Xception: Deep Learning with Depthwise Separable Convolutions.” arXiv:1610.02357 [Cs].

Choromanska, Henaff, Mathieu, et al. 2015. “The Loss Surfaces of Multilayer Networks.” In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics.

Chung, Ahn, and Bengio. 2016. “Hierarchical Multiscale Recurrent Neural Networks.” arXiv:1609.01704 [Cs].

Chung, Gulcehre, Cho, et al. 2014. “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling.” In NIPS.

Chung, Kastner, Dinh, et al. 2015. “A Recurrent Latent Variable Model for Sequential Data.” In Advances in Neural Information Processing Systems 28.

Collins, Sohl-Dickstein, and Sussillo. 2016. “Capacity and Trainability in Recurrent Neural Networks.” In arXiv:1611.09913 [Cs, Stat].

Cooijmans, Ballas, Laurent, et al. 2016. “Recurrent Batch Normalization.” arXiv Preprint arXiv:1603.09025.

Cyrta, Trzciński, and Stokowiec. 2017. “Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings.” arXiv:1708.02840 [Cs].

Dai, Dai, Qu, et al. 2016. “Very Deep Convolutional Neural Networks for Raw Waveforms.” arXiv:1610.00087 [Cs].

Davis, and Mermelstein. 1980. “Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences.” IEEE Transactions on Acoustics, Speech, and Signal Processing.

Defferrard, Benzi, Vandergheynst, et al. 2017. “FMA: A Dataset For Music Analysis.” In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’2017), Suzhou, China.

Dieleman, and Schrauwen. 2014. “End to End Learning for Music Audio.” In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Doerr, Daniel, Schiegg, et al. 2018. “Probabilistic Recurrent State-Space Models.” arXiv:1801.10395 [Stat].

Duchi, Hazan, and Singer. 2011. “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.” Journal of Machine Learning Research.

Dumitrescu. 2017. Positive trigonometric polynomials and signal processing applications. Signals and communication technology.

Durbin, and Koopman. 2012. Time Series Analysis by State Space Methods. Oxford Statistical Science Series 38.

Eichler, Dahlhaus, and Dueck. 2016. “Graphical Modeling for Multivariate Hawkes Processes with Nonparametric Link Functions.” Journal of Time Series Analysis.

Ekanadham, Tranchina, and Simoncelli. 2011. “Recovery of Sparse Translation-Invariant Signals With Continuous Basis Pursuit.” IEEE Transactions on Signal Processing.

Elbaz, and Zibulevsky. 2017. “Perceptual Audio Loss Function for Deep Learning.” In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’2017), Suzhou, China.

Engel, Resnick, Roberts, et al. 2017. “Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders.” In PMLR.

Evensen. 2009. “The Ensemble Kalman Filter for Combined State and Parameter Estimation.” IEEE Control Systems.

Févotte, Bertin, and Durrieu. 2008. “Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis.” Neural Computation.

Finke, and Singh. 2016. “Approximate Smoothing and Parameter Estimation in High-Dimensional State-Space Models.” arXiv:1606.08650 [Stat].

Flamary, Févotte, Courty, et al. 2016. “Optimal Spectral Transportation with Application to Music Transcription.” In arXiv:1609.09799 [Cs, Stat].

Fonseca, Plakal, Ellis, et al. 2019. “Learning Sound Event Classifiers from Web Audio with Noisy Labels.” arXiv:1901.01189 [Cs, Eess, Stat].

Fraccaro, Sønderby, Paquet, et al. 2016. “Sequential Neural Models with Stochastic Layers.” In Advances in Neural Information Processing Systems 29.

Friston. 2008. “Variational Filtering.” NeuroImage.

Fukumizu, and Amari. 2000. “Local Minima and Plateaus in Hierarchical Structures of Multilayer Perceptrons.” Neural Networks.

Gal, and Ghahramani. 2015. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In Proceedings of the 33rd International Conference on Machine Learning (ICML-16).

———. 2016. “A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.” In arXiv:1512.05287 [Stat].

Gemmeke, Ellis, Freedman, et al. 2017. “Audio Set: An Ontology and Human-Labeled Dataset for Audio Events.” In Proceedings of ICASSP 2017.

Geronimo, and Woerdeman. 2004. “Positive Extensions, Fejér-Riesz Factorization and Autoregressive Filters in Two Variables.” Annals of Mathematics.

Ghosh. 2017. “Towards a New Interpretation of Separable Convolutions.” arXiv:1701.04489 [Cs, Stat].

Goertzel. 1958. “An Algorithm for the Evaluation of Finite Trigonometric Series.” The American Mathematical Monthly.

Goodfellow, Vinyals, and Saxe. 2014. “Qualitatively Characterizing Neural Network Optimization Problems.” arXiv:1412.6544 [Cs, Stat].

Goodwin, and Vetterli. 1999. “Matching Pursuit and Atomic Signal Models Based on Recursive Filter Banks.” IEEE Transactions on Signal Processing.

Goudarzi, Banda, Lakin, et al. 2014. “A Comparative Study of Reservoir Computing for Temporal Signal Processing.” arXiv:1401.2224 [Cs].

Graves. 2012. Supervised Sequence Labelling with Recurrent Neural Networks. Studies in Computational Intelligence, v. 385.

Green, and Bass. 1984. “Representing Periodic Waveforms with Nonorthogonal Basis Functions.” IEEE Transactions on Circuits and Systems.

Gregor, and LeCun. 2010. “Learning fast approximations of sparse coding.” In Proceedings of the 27th International Conference on Machine Learning (ICML-10).

———. 2011. “Efficient Learning of Sparse Invariant Representations.” arXiv:1105.5307 [Cs].

Gribonval, R. 2003. “Piecewise Linear Source Separation.” In Proc. Soc. Photographic Instrumentation Eng.

Gribonval, R., and Bacry. 2003. “Harmonic Decomposition of Audio Signals with Matching Pursuit.” IEEE Transactions on Signal Processing.

Gribonval, R., Figueras i Ventura, and Vandergheynst. 2006. “A Simple Test to Check the Optimality of a Sparse Signal Approximation.” Signal Processing, Sparse Approximations in Signal and Image Processing.

Grosse, Raina, Kwong, et al. 2007. “Shift-Invariant Sparse Coding for Audio Classification.” In The Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007).

Gruslys, Munos, Danihelka, et al. 2016. “Memory-Efficient Backpropagation Through Time.” In Advances in Neural Information Processing Systems 29.

Gu, Albert, Johnson, Goel, et al. 2021. “Combining Recurrent, Convolutional, and Continuous-Time Models with Linear State Space Layers.” In Advances in Neural Information Processing Systems.

Gu, Shixiang, Levine, Sutskever, et al. 2016. “MuProp: Unbiased Backpropagation for Stochastic Neural Networks.” In Proceedings of ICLR.

Haber, and Ruthotto. 2018. “Stable Architectures for Deep Neural Networks.” Inverse Problems.

Hamel, Davies, Yoshii, et al. 2013. “Transfer Learning In MIR: Sharing Learned Latent Representations For Music Audio Classification And Similarity.” In.

Hardt, Ma, and Recht. 2018. “Gradient Descent Learns Linear Dynamical Systems.” The Journal of Machine Learning Research.

Harris. 1978. “On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform.” Proceedings of the IEEE.

Haykin, ed. 2001. Kalman Filtering and Neural Networks. Adaptive and Learning Systems for Signal Processing, Communications, and Control.

Hazan, Levy, and Shalev-Shwartz. 2015. “Beyond Convexity: Stochastic Quasi-Convex Optimization.” In Advances in Neural Information Processing Systems 28.

Hazan, Singh, and Zhang. 2017. “Learning Linear Dynamical Systems via Spectral Filtering.” In NIPS.

Helén, and Virtanen. 2005. “Separation of Drums from Polyphonic Music Using Non-Negative Matrix Factorization and Support Vector Machine.” In Signal Processing Conference, 2005 13th European.

Helmholtz. 1863. Die Lehre von Den Tonempfindungen Als Physiologische Grundlage Für Die Theorie Der Musik.

Henaff, Jarrett, Kavukcuoglu, et al. 2011. “Unsupervised Learning of Sparse Features for Scalable Audio Classification.” In ISMIR.

Heyde. 1974. “On Martingale Limit Theory and Strong Convergence Results for Stochastic Approximation Procedures.” Stochastic Processes and Their Applications.

Hinton, Deng, Yu, et al. 2012. “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups.” IEEE Signal Processing Magazine.

Hochreiter. 1998. “The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions.” International Journal of Uncertainty Fuzziness and Knowledge Based Systems.

Hochreiter, Bengio, Frasconi, et al. 2001. “Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies.” In A Field Guide to Dynamical Recurrent Neural Networks.

Holan, Lund, and Davis. 2010. “The ARMA Alphabet Soup: A Tour of ARMA Model Variants.” Statistics Surveys.

Hornik. 1991. “Approximation Capabilities of Multilayer Feedforward Networks.” Neural Networks.

Hornik, Stinchcombe, and White. 1989. “Multilayer Feedforward Networks Are Universal Approximators.” Neural Networks.

Hoshen, Weiss, and Wilson. 2015. “Speech Acoustic Modeling from Raw Multichannel Waveforms.” In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on.

Hou, Lawrence, and Hero. 2016. “Penalized Ensemble Kalman Filters for High Dimensional Non-Linear Systems.” arXiv:1610.00195 [Physics, Stat].

Huang, Liu, Weinberger, et al. 2016. “Densely Connected Convolutional Networks.” arXiv:1608.06993 [Cs].

Hua, and Sarkar. 1990. “Matrix Pencil Method for Estimating Parameters of Exponentially Damped/Undamped Sinusoids in Noise.” IEEE Transactions on Acoustics, Speech and Signal Processing.

Huggins, and Zucker. 2007. “Greedy Basis Pursuit.” IEEE Transactions on Signal Processing.

Hürzeler, and Künsch. 2001. “Approximating and Maximising the Likelihood for a General State-Space Model.” In Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science.

Huszár. 2015. “How (Not) to Train Your Generative Model: Scheduled Sampling, Likelihood, Adversary?” arXiv:1511.05101 [Cs, Math, Stat].

Hyvärinen, and Hoyer. 2000. “Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces.” Neural Computation.

Ionides, Edward L., Bhadra, Atchadé, et al. 2011. “Iterated Filtering.” The Annals of Statistics.

Ionides, E. L., Bretó, and King. 2006. “Inference for Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences.

Jaeger. 2002. Tutorial on Training Recurrent Neural Networks, Covering BPPT, RTRL, EKF and the “Echo State Network” Approach.

Jing, Shen, Dubcek, et al. 2017. “Tunable Efficient Unitary Neural Networks (EUNN) and Their Application to RNNs.” In PMLR.

Johnson. 2012. “A Simple Explanation of A Spectral Algorithm for Learning Hidden Markov Models.” arXiv:1204.2477 [Cs, Stat].

Jost, Vandergheynst, and Frossard. 2006. “Tree-Based Pursuit: Algorithm and Properties.” IEEE Transactions on Signal Processing.

Jost, Vandergheynst, Lesage, et al. 2006. “MoTIF: An Efficient Algorithm for Learning Translation Invariant Dictionaries.” In 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings.

Jozefowicz, Zaremba, and Sutskever. 2015. “An Empirical Exploration of Recurrent Network Architectures.” In Proceedings of the 32nd International Conference on Machine Learning (ICML-15).

Jung. 2013. “An RKHS Approach to Estimation with Sparsity Constraints.” In Advances in Neural Information Processing Systems 29.

Kailath. 1980. Linear Systems. Prentice-Hall Information and System Science Series.

Kantas, Doucet, Singh, et al. 2009. “An Overview of Sequential Monte Carlo Methods for Parameter Estimation in General State-Space Models.” IFAC Proceedings Volumes, 15th IFAC Symposium on System Identification.

Karpathy, Johnson, and Fei-Fei. 2015. “Visualizing and Understanding Recurrent Networks.” arXiv:1506.02078 [Cs].

Kaul. 2020. “Linear Dynamical Systems as a Core Computational Primitive.” In Advances in Neural Information Processing Systems.

Kavčić, and Moura. 2000. “Matrices with Banded Inverses: Inversion Algorithms and Factorization of Gauss-Markov Processes.” IEEE Transactions on Information Theory.

Kingma, Salimans, Jozefowicz, et al. 2016. “Improving Variational Inference with Inverse Autoregressive Flow.” In Advances in Neural Information Processing Systems 29.

Klapuri, Virtanen, and Heittola. 2010. “Sound Source Separation in Monaural Music Signals Using Excitation-Filter Model and Em Algorithm.” In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

Knudson, Yates, Huk, et al. 2014. “Inferring Sparse Representations of Continuous Signals with Continuous Orthogonal Matching Pursuit.” In Advances in Neural Information Processing Systems 27.

Kolter, and Manek. 2019. “Learning Stable Deep Dynamics Models.” In Advances in Neural Information Processing Systems.

Kong, Xu, Wang, et al. 2017. “A Joint Detection-Classification Model for Audio Tagging of Weakly Labelled Data.” In Proceedings of ICASSP 2017.

Kreutz-Delgado, Murray, Rao, et al. 2003. “Dictionary Learning Algorithms for Sparse Representation.” Neural Computation.

Krishnamurthy, Can, and Schwab. 2022. “Theory of Gating in Recurrent Neural Networks.” Physical Review. X.

Krishnan, Shalit, and Sontag. 2015. “Deep Kalman Filters.” arXiv Preprint arXiv:1511.05121.

———. 2017. “Structured Inference Networks for Nonlinear State Space Models.” In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.

Krizhevsky, Sutskever, and Hinton. 2012. “Imagenet Classification with Deep Convolutional Neural Networks.” In Advances in Neural Information Processing Systems.

Kronland-Martinet, R., Guillemain, and Ystad. 1997. “Modelling of Natural Sounds by Time–Frequency and Wavelet Representations.” Organised Sound.

Kronland-Martinet, R, Guillemain, and Ystad. 2001. “From Sound Modeling to Analysis-Synthesis of Sounds.” In Workshop on Proceedings of MOSART Current Research Directions in Computer Music Workshop.

Kuleshov, Enam, and Ermon. 2017. “Audio Super-Resolution Using Neural Nets.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.

Kumar, and Raj. 2017. “Deep CNN Framework for Audio Event Recognition Using Weakly Labeled Web Data.” arXiv:1707.02530 [Cs].

Kutschireiter, Surace, Sprekeler, et al. 2015a. “A Neural Implementation for Nonlinear Filtering.” arXiv Preprint arXiv:1508.06818.

Kutschireiter, Surace, Sprekeler, et al. 2015b. “Approximate Nonlinear Filtering with a Recurrent Neural Network.” BMC Neuroscience.

Lamb, Goyal, Zhang, et al. 2016. “Professor Forcing: A New Algorithm for Training Recurrent Networks.” In Advances In Neural Information Processing Systems.

Laroche, Papadopoulos, Kowalski, et al. 2017. “Drum Extraction in Single Channel Audio Signals Using Multi-Layer Non Negative Matrix Factor Deconvolution.” In ICASSP.

Laurent, and von Brecht. 2016. “A Recurrent Neural Network Without Chaos.” arXiv:1612.06212 [Cs].

Law, West, and Mandel. 2009. “Evaluation of Algorithms Using Games: The Case of Music Tagging.” In.

Lee, Honglak, Grosse, Ranganath, et al. 2009. “Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations.” In Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09.

Lee, Jongpil, Park, Kim, et al. 2017. “Sample-Level Deep Convolutional Neural Networks for Music Auto-Tagging Using Raw Waveforms.” In arXiv:1703.01789 [Cs].

Leglaive, Badeau, and Richard. 2017. “Multichannel Audio Source Separation: Variational Inference of Time-Frequency Sources from Time-Domain Observations.” In Proc. 42nd International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Lei, and Zhang. 2017. “Training RNNs as Fast as CNNs.” arXiv:1709.02755 [Cs].

Lewicki, Michael S. 2002. “Efficient Coding of Natural Sounds.” Nature Neuroscience.

Lewicki, M S, and Sejnowski. 1999. “Coding Time-Varying Signals Using Sparse, Shift-Invariant Representations.” In NIPS.

Lewicki, Michael S., and Sejnowski. 2000. “Learning Overcomplete Representations.” Neural Computation.

Li, Yuhong, Cai, Zhang, et al. 2022. “What Makes Convolutional Models Great on Long Sequence Modeling?”

Li, Shuai, Li, Cook, et al. 2018. “Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN.” In arXiv:1803.04831 [Cs].

Lindström, Ionides, Frydendall, et al. 2012. “Efficient Iterated Filtering.” In IFAC-PapersOnLine (System Identification, Volume 16). 16th IFAC Symposium on System Identification.

Lipton, Berkowitz, and Elkan. 2015. “A Critical Review of Recurrent Neural Networks for Sequence Learning.” arXiv:1506.00019 [Cs].

Liu, Jen-Yu, Jeng, and Yang. 2016. “Applying Topological Persistence in Convolutional Neural Network for Music Audio Signals.” arXiv:1608.07373 [Cs].

Liu, Jane, and West. 2001. “Combined Parameter and State Estimation in Simulation-Based Filtering.” In Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science.

Li, Yanghao, Wang, Liu, et al. 2017. “Demystifying Neural Style Transfer.” In IJCAI.

Ljung, L. 1979. “Asymptotic Behavior of the Extended Kalman Filter as a Parameter Estimator for Linear Systems.” IEEE Transactions on Automatic Control.

Ljung, Lennart. 1999. System Identification: Theory for the User. Prentice Hall Information and System Sciences Series.

Ljung, Lennart, Pflug, and Walk. 2012. Stochastic Approximation and Optimization of Random Systems.

Mallat, and Zhang. 1993. “Matching Pursuits with Time-Frequency Dictionaries.” IEEE Transactions on Signal Processing.

Marelli, and Fu. 2010. “A Recursive Method for the Approximation of LTI Systems Using Subband Processing.” IEEE Transactions on Signal Processing.

Martens, and Sutskever. 2011. “Learning Recurrent Neural Networks with Hessian-Free Optimization.” In Proceedings of the 28th International Conference on Machine Learning. ICML’11.

———. 2012. “Training Deep and Recurrent Networks with Hessian-Free Optimization.” In Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science.

Masri, Bateman, and Canagarajah. 1997a. “A Review of Time–Frequency Representations, with Application to Sound/Music Analysis–Resynthesis.” Organised Sound.

———. 1997b. “The Importance of the Time–Frequency Representation for Sound/Music Analysis–Resynthesis.” Organised Sound.

Mattingley, and Boyd. 2010. “Real-Time Convex Optimization in Signal Processing.” IEEE Signal Processing Magazine.

McFee, Bertin-Mahieux, Ellis, et al. 2012. “The Million Song Dataset Challenge.” In.

McFee, and Ellis. 2011. “Analyzing Song Structure with Spectral Clustering.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Megretski. 2003. “Positivity of Trigonometric Polynomials.” In 42nd IEEE International Conference on Decision and Control (IEEE Cat. No.03CH37475).

Mehri, Kumar, Gulrajani, et al. 2017. “SampleRNN: An Unconditional End-to-End Neural Audio Generation Model.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.

Meinshausen, and Yu. 2009. “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data.” The Annals of Statistics.

Mermelstein, and Chen. 1976. “Distance Measures for Speech Recognition: Psychological and Instrumental.” In Pattern Recognition and Artificial Intelligence.

Meyer, Beutel, and Thiele. 2017. “Unsupervised Feature Learning for Audio Analysis.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.

Mhammedi, Hellicar, Rahman, et al. 2017. “Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections.” In PMLR.

Młynarski, and McDermott. 2017. “Learning Mid-Level Auditory Codes from Natural Sound Statistics.” arXiv:1701.07138 [Cs, q-Bio].

Mohammed, and Scheutzow. 1997. “Lyapunov Exponents of Linear Stochastic Functional-Differential Equations. II. Examples and Case Studies.” The Annals of Probability.

Monner, and Reggia. 2012. “A Generalized LSTM-Like Training Algorithm for Second-Order Recurrent Neural Networks.” Neural Networks.

Moorer. 1974. “The Optimum Comb Method of Pitch Period Analysis of Continuous Digitized Speech.” IEEE Transactions on Acoustics, Speech and Signal Processing.

Moradkhani, Sorooshian, Gupta, et al. 2005. “Dual State–Parameter Estimation of Hydrological Models Using Ensemble Kalman Filter.” Advances in Water Resources.

Mozer, Kazakov, and Lindsey. 2018. “State-Denoised Recurrent Neural Networks.” arXiv:1805.08394 [Cs].

Müller, Kurth, and Clausen. 2005a. “Audio Matching via Chroma-Based Statistical Features.” In Proc. Int. Conf. Music Info. Retrieval.

———. 2005b. “Chroma-Based Statistical Audio Features for Audio Matching.” In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

Narayan, Temchin, Recio, et al. 1998. “Frequency Tuning of Basilar Membrane and Auditory Nerve Fibers in the Same Cochleae.” Science.

Needell, and Tropp. 2008. “CoSaMP: Iterative Signal Recovery from Incomplete and Inaccurate Samples.” arXiv:0803.2392 [Cs, Math].

Nerrand, Roussel-Ragot, Personnaz, et al. 1993. “Neural Networks and Nonlinear Adaptive Filtering: Unifying Concepts and New Algorithms.” Neural Computation.

Nussbaum-Thom, Cui, Ramabhadran, et al. 2016. “Acoustic Modeling Using Bidirectional Gated Recurrent Convolutional Units.” In.

Oliveira, and Skelton. 2001. “Stability Tests for Constrained Linear Systems.” In Perspectives in Robust Control. Lecture Notes in Control and Information Sciences.

Pascanu, Mikolov, and Bengio. 2013. “On the Difficulty of Training Recurrent Neural Networks.” In arXiv:1211.5063 [Cs].

Peeters. 2004. “A Large Set of Audio Features for Sound Description (Similarity and Classification) in the CUIDADO Project.”

Pillonetto. 2016. “The Interplay Between System Identification and Machine Learning.” arXiv:1612.09158 [Cs, Stat].

Pons, Lidy, and Serra. 2016. “Experimenting with Musically Motivated Convolutional Neural Networks.” In 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI).

Pons, and Serra. 2018. “Randomly Weighted CNNs for (Music) Audio Classification.” arXiv:1805.00237 [Cs, Eess].

Preis, and Georgopoulos. 1999. “Wigner Distribution Representation and Analysis of Audio Signals: An Illustrated Tutorial Review.” Journal of the Audio Engineering Society.

Qu, Li, Dai, et al. 2016a. “Learning Filter Banks Using Deep Learning For Acoustic Signals.” arXiv:1611.09526 [Cs].

———, et al. 2016b. “Understanding Audio Pattern Using Convolutional Neural Network From Raw Waveforms.” arXiv:1611.09524 [Cs].

Rafii. 2018. “Sliding Discrete Fourier Transform with Kernel Windowing [Lecture Notes].” IEEE Signal Processing Magazine.

Ragazzini, and Zadeh. 1952. “The Analysis of Sampled-Data Systems.” Transactions of the American Institute of Electrical Engineers, Part II: Applications and Industry.

Rajan, Misra, and Murthy. 2017. “Melody Extraction from Music Using Modified Group Delay Functions.” International Journal of Speech Technology.

Rall. 1981. Automatic Differentiation: Techniques and Applications. Lecture Notes in Computer Science 120.

Ravelli, Richard, and Daudet. 2008. “Fast MIR in a Sparse Transform Domain.” In Int. Conf. Music Info. Retrieval.

Rawat, and Wang. 2017. “Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review.” Neural Computation.

Rebollo-Neira, Laura. 2007. “Oblique Matching Pursuit.” IEEE Signal Processing Letters.

Rebollo-Neira, L., and Lowe. 2002. “Optimized Orthogonal Matching Pursuit Approach.” IEEE Signal Processing Letters.

Robbins, and Monro. 1951. “A Stochastic Approximation Method.” The Annals of Mathematical Statistics.

Roberts, Engel, and Eck. 2017. “Hierarchical Variational Autoencoders for Music.” In NIPS Workshop on Machine Learning for Creativity and Design.

Robertson, and Plumbley. 2007. “B-Keeper: A Beat-Tracker for Live Performance.” In Proceedings of the 7th International Conference on New Interfaces for Musical Expression. NIME ’07.

Robertson, Stark, and Davies. 2013. “Percussive Beat Tracking Using Real-Time Median Filtering.” In Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.

Robertson, Stark, and Plumbley. 2011. “Real-Time Visual Beat Tracking Using a Comb Filter Matrix.” In Proceedings of the International Computer Music Conference 2011.

Routtenberg, and Tabrikian. 2010. “Blind MIMO-AR System Identification and Source Separation with Finite-Alphabet.” IEEE Transactions on Signal Processing.

Rubinstein, Bruckstein, and Elad. 2010. “Dictionaries for Sparse Representation Modeling.” Proceedings of the IEEE.

Sainath, T. N., Kingsbury, Mohamed, et al. 2013. “Learning Filter Banks Within a Deep Neural Network Framework.” In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

Sainath, Tara N., and Li. 2016. “Modeling Time-Frequency Patterns with LSTM Vs. Convolutional Architectures for LVCSR Tasks.” Submitted to Proc. Interspeech.

Sainath, Tara N., Weiss, Senior, et al. 2015. “Learning the Speech Front-End with Raw Waveform CLDNNs.” In INTERSPEECH.

Särelä, and Valpola. 2005. “Denoising Source Separation.” Journal of Machine Learning Research.

Schniter, and Rangan. 2012. “Compressive Phase Retrieval via Generalized Approximate Message Passing.” In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

Sefati, Cowan, and Vidal. 2015. “Linear Systems with Sparse Inputs: Observability and Input Recovery.” In 2015 American Control Conference (ACC).

Seuret, and Gouaisbaut. 2013. “Wirtinger-Based Integral Inequality: Application to Time-Delay Systems.” Automatica.

Shah, Kumar, Hauptmann, et al. 2018. “A Closer Look at Weak Label Learning for Audio Events.” arXiv:1804.09288 [Cs, Eess].

Sjöberg, Zhang, Ljung, et al. 1995. “Nonlinear Black-Box Modeling in System Identification: A Unified Overview.” Automatica, Trends in System Identification.

Smaragdis, Paris. 2004. “Non-Negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs.” In Independent Component Analysis and Blind Signal Separation. Lecture Notes in Computer Science.

Smaragdis, P., and Brown. 2003. “Non-Negative Matrix Factorization for Polyphonic Music Transcription.” In Applications of Signal Processing to Audio and Acoustics, 2003 IEEE Workshop on.

Smith, Steven W. 1997. The Scientist and Engineer’s Guide to Digital Signal Processing.

Smith, Julius O. 2007. Introduction to Digital Filters with Audio Applications.

Smith, Evan C., and Lewicki. 2004. “Learning Efficient Auditory Codes Using Spikes Predicts Cochlear Filters.” In Advances in Neural Information Processing Systems.

———. 2006. “Efficient Auditory Coding.” Nature.

Smith, Leslie N., and Topin. 2017. “Exploring Loss Function Topology with Cyclical Learning Rates.” arXiv:1702.04283 [Cs].

Söderström, and Stoica, eds. 1988. System Identification.

Soh, and Chandrasekaran. 2017. “A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers.” arXiv:1701.01207 [Cs, Math, Stat].

Stepleton, Pascanu, Dabney, et al. 2018. “Low-Pass Recurrent Neural Networks - A Memory Architecture for Longer-Term Correlation Discovery.” arXiv:1805.04955 [Cs, Stat].

Sutskever. 2013. “Training Recurrent Neural Networks.”

Sutskever, Martens, Dahl, et al. 2013. “On the Importance of Initialization and Momentum in Deep Learning.” In ICML (3).

Szegedy, Liu, Jia, et al. 2015. “Going Deeper with Convolutions.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Tallec, and Ollivier. 2017. “Unbiasing Truncated Backpropagation Through Time.”

Telgarsky. 2017. “Neural Networks and Rational Functions.” In PMLR.

Thickstun, Harchaoui, and Kakade. 2017. “Learning Features of Music from Scratch.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.

Tong, Bickett, Christiansen, et al. 2007. “Learning Grammatical Structure with Echo State Networks.” Neural Networks.

Tran, Hoffman, Saurous, et al. 2017. “Deep Probabilistic Programming.” In ICLR.

Triefenbach, Jalalvand, Demuynck, et al. 2013. “Acoustic Modeling With Hierarchical Reservoirs.” IEEE Transactions on Audio, Speech, and Language Processing.

Tropp, Wakin, Duarte, et al. 2006. “Random Filters for Compressive Sampling and Reconstruction.” In Proceedings of the IEEE International Conference Acoustics, Speech, and Signal Processing.

Tsipas, Vrysis, Dimoulas, et al. 2017. “Efficient Audio-Driven Multimedia Indexing Through Similarity-Based Speech / Music Discrimination.” Multimedia Tools and Applications.

Tufts, and Kumaresan. 1982. “Estimation of Frequencies of Multiple Sinusoids: Making Linear Prediction Perform Like Maximum Likelihood.” Proceedings of the IEEE.

Uncini. 2003. “Audio Signal Processing by Neural Networks.” Neurocomputing, Evolving Solution with Neural Networks.

van den Oord, Dieleman, Zen, et al. 2016. “WaveNet: A Generative Model for Raw Audio.” In 9th ISCA Speech Synthesis Workshop.

van Eeghem, and De Lathauwer. 2013. “Blind System Identification as a Compressed Sensing Problem.”

Vaz, Toutios, and Narayanan. 2016. “Convex Hull Convolutive Non-Negative Matrix Factorization for Uncovering Temporal Patterns in Multivariate Time-Series Data.” In.

Venkataramani, and Smaragdis. 2017. “End to End Source Separation with Adaptive Front-Ends.” arXiv:1705.02514 [Cs].

Venkataramani, Subakan, and Smaragdis. 2017. “Neural Network Alternatives to Convolutive Audio Models for Source Separation.” arXiv:1709.07908 [Cs, Eess].

Vincent, Bertin, and Badeau. 2008. “Harmonic and Inharmonic Nonnegative Matrix Factorization for Polyphonic Pitch Transcription.” In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

Virtanen, Tuomas. 2006. “Unsupervised Learning Methods for Source Separation in Monaural Music Signals.” In Signal Processing Methods for Music Transcription.

Virtanen, T. 2007. “Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria.” IEEE Transactions on Audio, Speech, and Language Processing.

Wang, Zhong-Qiu, Roux, Wang, et al. 2018. “End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction.” arXiv:1804.10204 [Cs, Eess, Stat].

Wang, Xinxi, and Wang. 2014. “Improving Content-Based and Hybrid Music Recommendation Using Deep Learning.” In Proceedings of the 22nd ACM International Conference on Multimedia. MM ’14.

Welch. 1967. “The Use of Fast Fourier Transform for the Estimation of Power Spectra: A Method Based on Time Averaging over Short, Modified Periodograms.” IEEE Transactions on Audio and Electroacoustics.

Werbos. 1988. “Generalization of Backpropagation with Application to a Recurrent Gas Market Model.” Neural Networks.

———. 1990. “Backpropagation Through Time: What It Does and How to Do It.” Proceedings of the IEEE.

Wiatowski, Grohs, and Bölcskei. 2018. “Energy Propagation in Deep Convolutional Neural Networks.” IEEE Transactions on Information Theory.

Williams, and Peng. 1990. “An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories.” Neural Computation.

Williams, and Zipser. 1989. “A Learning Algorithm for Continually Running Fully Recurrent Neural Networks.” Neural Computation.

Wisdom, Powers, Hershey, et al. 2016. “Full-Capacity Unitary Recurrent Neural Networks.” In Advances in Neural Information Processing Systems.

Wisdom, Powers, Pitton, et al. 2016. “Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery.” In Advances in Neural Information Processing Systems 29.

Wright, Beauchamp, Fitz, et al. 2001. “Analysis/Synthesis Comparison.” Organised Sound.

Wu, Zhang, Zhang, et al. 2016. “On Multiplicative Integration with Recurrent Neural Networks.” In Advances in Neural Information Processing Systems 29.

Wyse. 2017. “Audio Spectrogram Representations for Processing with Convolutional Neural Networks.” In Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [Cs.NE]).

Yaghoobi, Nam, Gribonval, et al. 2013. “Constrained Overcomplete Analysis Operator Learning for Cosparse Signal Modelling.” IEEE Transactions on Signal Processing.

Yin, Osher, Goldfarb, et al. 2008. “Bregman Iterative Algorithms for \(\ell_1\)-Minimization with Applications to Compressed Sensing.” SIAM Journal on Imaging Sciences.

Yoshii, and Goto. 2012. “Infinite Composite Autoregressive Models for Music Signal Analysis.” In.

Yu, D., and Deng. 2011. “Deep Learning and Its Applications to Signal and Information Processing [Exploratory DSP].” IEEE Signal Processing Magazine.

Yu, Dong, and Li. 2018. “Recent Progresses in Deep Learning Based Acoustic Models (Updated).” arXiv:1804.09298 [Cs, Eess].

Yu, Guoshen, and Slotine. 2009. “Audio Classification from Time-Frequency Texture.” In Acoustics, Speech, and Signal Processing, IEEE International Conference on.

Yu, Haizi, and Varshney. 2017. “Towards Deep Interpretability (MUS-ROVER II): Learning Hierarchical Representations of Tonal Music.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.

Zhang, Yuchen, Liang, and Wainwright. 2016. “Convexified Convolutional Neural Networks.” arXiv:1609.01000 [Cs].

Zhang, X., and Zbigniew. 2007. “Analysis of Sound Features for Music Timbre Recognition.” In International Conference on Multimedia and Ubiquitous Engineering, 2007. MUE ’07.

Zhu, Engel, and Hannun. 2016. “Learning Multiscale Features Directly from Waveforms.” In Interspeech 2016.

Zils, and Pachet. 2001. “Musical Mosaicing.” In Proceedings of DAFx-01.

Zinkevich. 2003. “Online Convex Programming and Generalized Infinitesimal Gradient Ascent.” In Proceedings of the Twentieth International Conference on Machine Learning. ICML’03.