May 26th, 2017
I can't exactly remember what my train of thought was when I started to dive into this, but as a person who almost exclusively listens to doujin Touhou remixes, that very likely had something to do with it. That being said, I wanted to see if I could train a model to generate Touhou music.
The first thing that came to mind was to use a neural network. A search for "neural network music generator" turned up a few different libraries that set out to accomplish this task, generally divided into two types: ones that take in MIDIs/sheet music and ones that try to analyze raw waveforms. So far, I've tried one model of each type (I imagine the others are fairly similar): Hexahedria's model and WaveNet.
Model 1:
Unfortunately, the WaveNet model I trained on the entire Touhou soundtrack ended up producing garbage results (which I would have known if I had read up on it some more beforehand. Give me back my time!).
Model 2:
Hexahedria's model has some pretty decent results for what it is. Originally, I used MIDIs ripped from Touhou 6-8 as its training corpus, which provided tolerable results given the complete failure I had seen with the WaveNet model.
Some samples from most to least tolerable:
Some obvious issues:
1) It's really dissonant.
2) There's tons of repetition/silence.
Model 2 (Slightly refined):
After reading up a bit more, I figured that limiting the instrumentation to piano alone would produce much clearer results, since it reduces the relations the neural net has to learn, while at the same time giving me a larger corpus to work with. I used the MIDIs provided on Kijiriki's YouTube channel. I find the results from this model more tolerable to listen to. The two issues mentioned above are still present, but seem slightly more bearable.
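For a sense of what a piano-roll model like this actually trains on, here's a rough pure-Python sketch of quantizing note events onto a fixed time grid. The function name, the 16th-note grid, and the pitch range are my own assumptions for illustration, not the model's actual preprocessing code:

```python
# Sketch (assumptions labeled): quantize (pitch, start_sec, dur_sec) note
# events into a binary piano-roll matrix of shape (time steps x pitches).
# Restricting the corpus to piano means each column is just "this key is
# down or not," with no instrument dimension for the net to learn.

LOW, HIGH = 21, 109   # assumed piano range: MIDI A0..C8 (88 keys)
STEP = 0.125          # assumed grid: one step = a 16th note at 120 BPM

def to_piano_roll(notes, total_time):
    """notes: list of (pitch, start_sec, dur_sec) tuples."""
    n_steps = int(total_time / STEP)
    roll = [[0] * (HIGH - LOW) for _ in range(n_steps)]
    for pitch, start, dur in notes:
        if not LOW <= pitch < HIGH:
            continue  # drop notes outside the piano range
        first = int(start / STEP)
        last = int((start + dur) / STEP)
        for t in range(first, min(last, n_steps)):
            roll[t][pitch - LOW] = 1
    return roll

# A two-note example: middle C for half a second, then E.
roll = to_piano_roll([(60, 0.0, 0.5), (64, 0.5, 0.5)], 1.0)
print(sum(row[60 - LOW] for row in roll))  # steps where C4 sounds -> 4
```

A real pipeline would also track note articulation separately from sustain (as Hexahedria's write-up describes), but the grid representation above is the basic idea.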
Some samples:
And another sample that sounded pretty good from a lower training iteration:
While I don't think anyone would mistake any of these results for a Touhou track in a blind test, I can hear the Touhou inspiration in them. It sounds like an amateur bashing away at his keyboard during a Touhou-inspired improv session, which is a pretty fair result in my book.
As a bit of proof that training this model actually accomplished something, I present a sample from a model that was trained on classical piano music but given the piano Touhou corpus as input to generate new music. I'd say it sounds quite different from the samples above.
http://a.pomf.cat/xivxpn.mp3
Further Research:
I might give the slightly refined second model some more training time, since I only let it run for 10k epochs (around 10 hours on my setup) and it still seemed to be improving with each iteration. To address some of the remaining problems, I'll be examining this tuner from Magenta and, under the assumption that it won't magically solve everything, I'll also look into this comment made on reddit as well as other papers on the subject.
Some Experiments and Other Stuff
Author: holyshin