• Size: 314.98 KB
  • Uploaded: 2019-04-16 11:36:41
  • Status: Successfully converted

Some snippets from your converted document:

A CLASSIFICATION-BASED POLYPHONIC PIANO TRANSCRIPTION APPROACH USING LEARNED FEATURE REPRESENTATIONS

Juhan Nam (Stanford University, juhan@ccrma.stanford.edu), Jiquan Ngiam (Stanford University, jngiam@cs.stanford.edu), Honglak Lee (Univ. of Michigan, Ann Arbor, honglak@eecs.umich.edu), Malcolm Slaney (Yahoo! Research, malcolm@ieee.org)

ABSTRACT

Recently unsupervised feature learning methods have shown great promise as a way of extracting features from high-dimensional data, such as image or audio. In this paper, we apply deep belief networks to musical data and evaluate the learned feature representations on classification-based polyphonic piano transcription. We also suggest a way of training classifiers jointly for multiple notes to improve training speed and classification performance. Our method is evaluated on three public piano datasets. The results show that the learned features outperform the baseline features, and also our method gives significantly better frame-level accuracy than other state-of-the-art music transcription methods.

1. INTRODUCTION

Music transcription is the task of transcribing audio into a score. It is a challenging problem because multiple notes are often played at once (polyphony), and thus individual notes interfere by virtue of their harmonic relations.

A number of methods have been proposed since Moorer first attempted to use computers for automatic music transcription [10]. State-of-the-art methods can be categorized into three approaches: iterative F0 searches, joint source estimation and classification-based approaches. Iterative F0-searching methods first find the predominant F0 and subtract […]

[…] which corresponds to a note class. They are trained with short-time acoustic features and labels for the corresponding note class (i.e., note on/off) and then used to predict the note labels for new input data. Although classification-based approaches make minimum use of knowledge of acoustics, they show comparable results to iterative F0 searches and joint source estimation, particularly for piano music [9, 12]. However, when the training set is limited or the piano in the test set has different timbre, tuning or recording environments, classification-based approaches can overfit the training data, a problem common to many supervised learning tasks [13]. As a means to obtain features robust to acoustic variations, researchers have designed networks of adaptive oscillators on auditory filter banks or normalized spectrogram on the frequency axis [9, 12].

The majority of machine learning tasks rely on these kinds of hand-engineered approaches to extract features. Recently, on the other hand, unsupervised feature learning methods that automatically capture the statistical relationship in data and learn feature representations have shown great promise. In particular, deep belief networks have been successfully applied to many computer-vision and speech-recognition tasks as an alternative to typical feature-extraction methods, but also a few music-related tasks [4, 8].

In this paper, we apply […]
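The classification-based setup the snippet describes (one binary classifier per note class, trained on short-time acoustic features to predict note on/off per frame, with all notes trained jointly as one weight matrix) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the features, labels, and hyperparameters below are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_features, n_notes = 200, 16, 4

# Synthetic stand-ins for short-time spectral features and a linear ground
# truth that generates multi-label note on/off targets per frame.
X = rng.normal(size=(n_frames, n_features))
true_W = rng.normal(size=(n_features, n_notes))
Y = (X @ true_W > 0).astype(float)          # frame-level note on/off labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One logistic classifier per note class, but trained jointly: each column
# of W is the weight vector of an independent per-note classifier.
W = np.zeros((n_features, n_notes))
b = np.zeros(n_notes)
lr = 0.5
for _ in range(300):
    P = sigmoid(X @ W + b)                  # predicted on-probabilities
    grad = X.T @ (P - Y) / n_frames         # cross-entropy gradient
    W -= lr * grad
    b -= lr * (P - Y).mean(axis=0)

# Predict note labels for frames and measure frame-level accuracy.
pred = sigmoid(X @ W + b) > 0.5
frame_accuracy = (pred == Y.astype(bool)).mean()
print(f"training frame-level accuracy: {frame_accuracy:.2f}")
```

Keeping all per-note classifiers in a single weight matrix is one simple way to realize the joint training the abstract mentions; the actual paper feeds learned (deep belief network) features rather than raw frames into such classifiers.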
