Deep Neural Network Architectures for Polyphonic Pitch Detection in Violin Recordings

Bibliographic Details
Main Authors: Jia-Tai Lin, 林嘉泰
Other Authors: Wen-Yu Su
Format: Others
Language: en_US
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/hgc2kb
Description
Summary: Master's === National Cheng Kung University === Department of Computer Science and Information Engineering === 105 === Multiple pitch detection, an important issue in music information retrieval (MIR), is used in many applications, including note separation, chord recognition, and automatic music transcription, all of which rely on a robust pitch estimation algorithm. Neural networks have recently been studied for polyphonic pitch detection, but pitch estimation for bowed string instruments still leaves room for improvement. In this thesis, we investigate polyphonic pitch detection in violin recordings and apply three deep neural network (DNN) architectures to handle the problem of harmonic interference. Specifically, we use the RWC music database to build training datasets for the pitch estimation models, comprising single notes and two-note, three-note, and four-note chords. In addition, we consider the playing techniques pizzicato and vibrato at five different intensity levels. Based on the pitch range of the violin, we analyse suitable parameters and architectures for the DNNs using customized octave bands. Apart from the numbers of layers and nodes, the most important difference among the three architectures lies in their input layers. In Architecture Ⅰ (Arch-Ⅰ), each input octave band is considered independently. Architecture Ⅱ (Arch-Ⅱ), an extension of Arch-Ⅰ, additionally combines the second harmonics of the current octave band. In Architecture Ⅲ (Arch-Ⅲ), the input layer is composed of the current octave band and the estimation results of the lower octaves. We also develop a separate DNN for pitch classes and apply it to Arch-Ⅰ and Arch-Ⅱ to extract the correct pitch, and we use post-processing for pitch smoothing in all three architectures. Our evaluations show that Arch-Ⅲ achieves outstanding performance on violin solos and duos. Arch-Ⅰ and Arch-Ⅱ perform similarly on violin solos, but Arch-Ⅱ outperforms Arch-Ⅰ on violin duos.
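
The difference between the three input layers described in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes that the spectrum is already split into octave bands of equal size, and the function names, the 36 bins per octave, and the 12 note activations per octave are all hypothetical choices made for illustration. For Arch-Ⅱ it further assumes that the second harmonics are taken from the band one octave higher, since the second harmonic of a pitch lies one octave above its fundamental.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration (not from the thesis).
BINS_PER_OCTAVE = 36    # spectral bins in one octave band
NOTES_PER_OCTAVE = 12   # pitch activations estimated per octave


def arch1_input(bands, k):
    """Arch-I style input: octave band k on its own."""
    return bands[k]


def arch2_input(bands, k):
    """Arch-II style input: band k plus the band one octave higher,
    where the second harmonics of pitches in band k fall (assumption).
    The top band has no upper neighbour, so zeros are used instead."""
    if k + 1 < len(bands):
        upper = bands[k + 1]
    else:
        upper = np.zeros_like(bands[k])
    return np.concatenate([bands[k], upper])


def arch3_input(bands, k, estimates):
    """Arch-III style input: band k plus the pitch estimates of all
    lower octaves. Estimates for octave k and above are zeroed so the
    input length stays fixed (a design choice of this sketch)."""
    lower = estimates.copy()
    lower[k:] = 0.0
    return np.concatenate([bands[k], lower.ravel()])


# Example with 4 octave bands of dummy spectral data.
bands = np.arange(4 * BINS_PER_OCTAVE, dtype=float).reshape(4, BINS_PER_OCTAVE)
estimates = np.ones((4, NOTES_PER_OCTAVE))

x1 = arch1_input(bands, 1)
x2 = arch2_input(bands, 1)
x3 = arch3_input(bands, 2, estimates)
print(x1.shape, x2.shape, x3.shape)
```

In this sketch the Arch-Ⅲ style input implies that octaves are processed from low to high, because each octave needs the estimates of the octaves below it, whereas the Arch-Ⅰ and Arch-Ⅱ style inputs can be built for every octave independently.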