Here’s how I would do art with machine learning if I had to

June 6, 2016 — February 1, 2022

buzzword
computers are awful
generative art
machine learning
making things
music
neural nets
photon choreography

I’ve a weakness for ideas that give me plausible deniability for making generative art while doing my maths homework.

NB I have recently tidied this page up but the content is not fresh; there is too much happening in the field to document.

Figure 1: Quasimondo: so do you.

This page is more chaotic than the already-chaotic median, sorry. Good luck making sense of it. The problem is that this notebook is in the anti-sweet spot of “stuff I know too much about to need notes but not working on enough to promote”.

Some neural networks are generative, in the sense that if you train ’em to classify things, they can also predict new members of the class. For example, run the model forwards and it recognises melodies; run it “backwards” and it composes melodies. Or rather, you maybe trained them to generate examples in the course of training them to detect examples. There are many definitional and practical wrinkles, and this ability is not unique to artificial neural networks, but it is a great convenience, and the gods of machine learning have blessed us with much infrastructure to exploit this feature, because it is close to actual profitable algorithms. Upshot: there is now a lot of computation and grad student labour directed at producing neural networks which, as a byproduct, can produce faces, chairs, film dialogue, symphonies and so on.
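To make the forwards/backwards idea concrete, here is a minimal sketch using a class-conditional Gaussian rather than a neural network (the ability is not unique to neural nets, after all). Everything in it is an assumption made for illustration: the toy "pitch" data, the labels `low` and `high`, and the helper names `fit`, `classify` and `generate`. It is a stand-in for the general principle, not how any particular melody model works.

```python
import math
import random

def fit(examples):
    """Fit a mean, variance and prior for each class of 1-D labelled data."""
    by_label = {}
    for x, label in examples:
        by_label.setdefault(label, []).append(x)
    model = {}
    for label, xs in by_label.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs) + 1e-6
        model[label] = (mean, var, len(xs) / len(examples))
    return model

def log_joint(model, x, label):
    """Log of prior times Gaussian likelihood for one class."""
    mean, var, prior = model[label]
    return (math.log(prior)
            - 0.5 * math.log(2 * math.pi * var)
            - (x - mean) ** 2 / (2 * var))

def classify(model, x):
    """Run the model 'forwards': pick the most probable class for x."""
    return max(model, key=lambda label: log_joint(model, x, label))

def generate(model, label, rng):
    """Run the model 'backwards': sample a new member of the class."""
    mean, var, _ = model[label]
    return rng.gauss(mean, math.sqrt(var))

rng = random.Random(0)
# Hypothetical toy data: "pitches" clustered around 60 (low) and 72 (high).
data = [(rng.gauss(60, 2), "low") for _ in range(200)]
data += [(rng.gauss(72, 2), "high") for _ in range(200)]

model = fit(data)
new_note = generate(model, "high", rng)
print(classify(model, 61.0), round(new_note, 1))
```

The same fitted parameters serve both directions: `classify` asks which class best explains an observation, while `generate` draws a fresh observation from a chosen class. Deep generative models replace the Gaussian with something far more expressive, but the division of labour is similar.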

Perhaps other people will be more across this?

Oh, and also Google’s AMI channel, and ml4artists, which publishes sweet machine-learning-for-artists topic guides.

There are NeurIPS streams about this now.

Figure 2

1 Visual synthesis

There is a lot going on, which I should triage. Important example: Maybe I should do generative art with neural diffusion networks.

1.1 AI image editors

See image editing with AI.

1.2 Style transfer and deep dreaming

You can do style transfer a number of ways, including NN inversion and GANs.
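For the inversion route, the approach of Gatys, Ecker, and Bethge (2015) summarises "style" by the Gram matrix of feature-map activations, which records which channels fire together while discarding where they fire. Here is a minimal sketch of that one ingredient, assuming feature maps are given as plain lists (channels by spatial positions); the toy numbers, the normalisation by the number of positions, and the names `gram` and `style_loss` are my assumptions. A real pipeline would take the features from a pretrained network such as VGG and optimise an image against this loss.

```python
def gram(features):
    """Gram matrix of a feature map given as channels x positions."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]

def style_loss(features_a, features_b):
    """Sum of squared differences between the two Gram matrices."""
    ga, gb = gram(features_a), gram(features_b)
    return sum((x - y) ** 2
               for row_a, row_b in zip(ga, gb)
               for x, y in zip(row_a, row_b))

# Hypothetical 2-channel feature map over 4 spatial positions.
style = [[1.0, 0.0, 2.0, 1.0],
         [0.5, 1.5, 0.0, 1.0]]
# The same activations with the spatial positions rearranged.
shuffled = [[row[i] for i in (2, 0, 3, 1)] for row in style]
# A feature map with different channel co-activation statistics.
other = [[0.0, 0.0, 1.0, 0.0],
         [1.0, 1.0, 1.0, 1.0]]

print(style_loss(style, shuffled))
print(style_loss(style, other))
```

The shuffled map scores as the same style because the Gram matrix ignores spatial layout, which is roughly why this loss captures texture and palette rather than composition.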

See those classic images from Google’s tripped-out image recognition systems. Here’s a good explanation of what is going on.

  • deep art:

    Our mission is to provide a novel artistic painting tool that allows everyone to create and share artistic pictures with just a few clicks. All you need to do is upload a photo and choose your favourite style. Our servers will then render your artwork for you.

  • For example, OPEN_NSFW was my favourite (NSFW).

  • Differentiable Image Parameterizations looks at style transfer with respect to different decompositions of the image surface. (There is stuff to follow up about checkerboard artefacts in NNs which I suspect is generally important.)

  • Self-Organising Textures stitches these two together, using a VGG discriminator as a loss function for training textures.

  • Deep dream generator does the classic deep-dreaming style perturbations.

  • fast style transfer.
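The deep-dreaming perturbations mentioned in the list above boil down to gradient ascent on the input: hold the network fixed and nudge the pixels so that some chosen unit fires harder. Here is a minimal sketch with a single tanh "neuron" standing in for a deep feature detector. The weights, the 4-pixel "image" and the function names are all hypothetical, and real implementations add tricks (multi-scale octaves, jitter, gradient normalisation) that I omit.

```python
import math

def activation(weights, image):
    """A single tanh unit standing in for a deep feature detector."""
    return math.tanh(sum(w * p for w, p in zip(weights, image)))

def dream_step(weights, image, step_size):
    """Move each pixel along the gradient of the activation."""
    act = activation(weights, image)
    slope = 1.0 - act ** 2  # derivative of tanh at the pre-activation
    return [p + step_size * slope * w for w, p in zip(weights, image)]

# Hypothetical fixed weights and a flat 4-pixel starting image.
weights = [0.5, -0.3, 0.8, 0.1]
image = [0.1, 0.1, 0.1, 0.1]

before = activation(weights, image)
for _ in range(20):
    image = dream_step(weights, image, step_size=0.1)
after = activation(weights, image)

print(round(before, 3), round(after, 3))
```

Pixels with positive weights get brighter and pixels with negative weights get darker, so the image drifts towards whatever the unit "wants to see". With a convolutional network in place of the single unit, the gradient would come from automatic differentiation, but the loop is the same shape.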

1.3 GANs

Clever uses of GANs that are not style transfer.

Figure 3

1.4 Incoming

2 Text synthesis

At one point, Ross Goodwin’s Adventures in narrated reality was state-of-the-art text generation using RNNs. He even made a movie of a script generated that way. But now, massive transformer models have left it behind technologically, if not humorously.

3 Music

3.1 Symbolic composition via scores/MIDI/etc

See ML for composition.

3.2 Audio synthesis

See analysis/resynthesis, voice fakes.

4 Incoming

5 References

Boulanger-Lewandowski, Bengio, and Vincent. 2012. “Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription.” In 29th International Conference on Machine Learning.
Bown, and Lexer. 2006. “Continuous-Time Recurrent Neural Networks for Generative and Interactive Musical Performance.” In Applications of Evolutionary Computing. Lecture Notes in Computer Science 3907.
Briot, and Pachet. 2020. “Deep Learning for Music Generation: Challenges and Directions.” Neural Computing and Applications.
Carlier, Danelljan, Alahi, et al. 2020. “DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation.” In.
Champandard. 2016. “Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks.” arXiv:1603.01768 [Cs].
Denton, Chintala, Szlam, et al. 2015. “Deep Generative Image Models Using a Laplacian Pyramid of Adversarial Networks.” arXiv:1506.05751 [Cs].
Dieleman, and Schrauwen. 2014. “End to End Learning for Music Audio.” In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Dosovitskiy, Springenberg, Tatarchenko, et al. 2014. “Learning to Generate Chairs, Tables and Cars with Convolutional Networks.” arXiv:1411.5928 [Cs].
Dumoulin, Shlens, and Kudlur. 2016. “A Learned Representation For Artistic Style.” arXiv:1610.07629 [Cs].
Gatys, Ecker, and Bethge. 2015. “A Neural Algorithm of Artistic Style.” arXiv:1508.06576 [Cs, q-Bio].
Goodfellow, Ian, Pouget-Abadie, Mirza, et al. 2014. “Generative Adversarial Nets.” In Advances in Neural Information Processing Systems 27. NIPS’14.
Goodfellow, Ian J., Shlens, and Szegedy. 2014. “Explaining and Harnessing Adversarial Examples.” arXiv:1412.6572 [Cs, Stat].
Gregor, Danihelka, Graves, et al. 2015. “DRAW: A Recurrent Neural Network For Image Generation.” arXiv:1502.04623 [Cs].
Gregor, and LeCun. 2010. “Learning fast approximations of sparse coding.” In Proceedings of the 27th International Conference on Machine Learning (ICML-10).
———. 2011. “Efficient Learning of Sparse Invariant Representations.” arXiv:1105.5307 [Cs].
Grosse, Salakhutdinov, Freeman, et al. 2012. “Exploiting Compositionality to Explore a Large Space of Model Structures.” In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
He, Wang, and Hopcroft. 2016. “A Powerful Generative Model Using Random Weights for the Deep Image Representation.” In Advances in Neural Information Processing Systems.
Hinton, and Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science.
Jetchev, Bergmann, and Vollgraf. 2016. “Texture Synthesis with Spatial Generative Adversarial Networks.” In Advances in Neural Information Processing Systems 29.
Jing, Yang, Feng, et al. 2017. “Neural Style Transfer: A Review.” arXiv:1705.04058 [Cs].
Johnson, Alahi, and Fei-Fei. 2016. “Perceptual Losses for Real-Time Style Transfer and Super-Resolution.” arXiv:1603.08155 [Cs].
Karras, Aila, Laine, et al. 2017. “Progressive Growing of GANs for Improved Quality, Stability, and Variation.” In Proceedings of ICLR.
Karras, Laine, and Aila. 2018. “A Style-Based Generator Architecture for Generative Adversarial Networks.” arXiv:1812.04948 [Cs, Stat].
Larsen, Sønderby, Larochelle, et al. 2015. “Autoencoding Beyond Pixels Using a Learned Similarity Metric.” arXiv:1512.09300 [Cs, Stat].
Lazaridou, Nguyen, Bernardi, et al. 2015. “Unveiling the Dreams of Word Embeddings: Towards Language-Driven Image Generation.” arXiv:1506.03500 [Cs].
Li, Wang, Liu, et al. 2017. “Demystifying Neural Style Transfer.” In IJCAI.
Luo, Chen, Hershey, et al. 2016. “Deep Clustering and Conventional Networks for Music Separation: Stronger Together.” arXiv:1611.06265 [Cs, Stat].
Malmi, Takala, Toivonen, et al. 2016. “DopeLearning: A Computational Approach to Rap Lyrics Generation.” arXiv:1505.04771 [Cs].
Mital. 2017. “Time Domain Neural Audio Style Transfer.” arXiv:1711.11160 [Cs].
Mnih, and Gregor. 2014. “Neural Variational Inference and Learning in Belief Networks.” In Proceedings of The 31st International Conference on Machine Learning. ICML’14.
Mordvintsev, Pezzotti, Schubert, et al. 2018. “Differentiable Image Parameterizations.” Distill.
Mordvintsev, Randazzo, Niklasson, et al. 2020. “Growing Neural Cellular Automata.” Distill.
Niklasson, Mordvintsev, Randazzo, et al. 2021. “Self-Organising Textures.” Distill.
Olah, Mordvintsev, and Schubert. 2017. “Feature Visualization.” Distill.
Sarroff, and Casey. 2014. “Musical Audio Synthesis Using Autoencoding Neural Nets.” In.
Sigtia, Benetos, Boulanger-Lewandowski, et al. 2015. “A Hybrid Recurrent Neural Network for Music Transcription.” In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Smith, and Lewicki. 2006. “Efficient Auditory Coding.” Nature.
Stanley. 2007. “Compositional Pattern Producing Networks: A Novel Abstraction of Development.” Genetic Programming and Evolvable Machines.
Sturm, Ben-Tal, Monaghan, et al. 2018. “Machine Learning Research That Matters for Music Creation: A Case Study.” Journal of New Music Research.
Sun, Liu, Zhang, et al. 2016. “Composing Music with Grammar Argumented Neural Networks and Note-Level Encoding.” arXiv:1611.05416 [Cs].
Theis, and Bethge. 2015. “Generative Image Modeling Using Spatial LSTMs.” arXiv:1506.03478 [Cs, Stat].
Ulyanov, Vedaldi, and Lempitsky. 2016. “Instance Normalization: The Missing Ingredient for Fast Stylization.” arXiv:1607.08022 [Cs].
———. 2017. “Improved Texture Networks: Maximizing Quality and Diversity in Feed-Forward Stylization and Texture Synthesis.” arXiv:1701.02096 [Cs].
Oord, Aäron van den, Dieleman, Zen, et al. 2016. “WaveNet: A Generative Model for Raw Audio.” In 9th ISCA Speech Synthesis Workshop.
Oord, Aäron van den, Kalchbrenner, and Kavukcuoglu. 2016. “Pixel Recurrent Neural Networks.” arXiv:1601.06759 [Cs].
Oord, Aäron van den, Kalchbrenner, Vinyals, et al. 2016. “Conditional Image Generation with PixelCNN Decoders.” arXiv:1606.05328 [Cs].
Walder. 2016a. “Modelling Symbolic Music: Beyond the Piano Roll.” arXiv:1606.01368 [Cs].
———. 2016b. “Symbolic Music Data Version 1.0.” arXiv:1606.02542 [Cs].
Wu, Shen, Hengel, et al. 2015. “What Value High Level Concepts in Vision to Language Problems?” arXiv:1506.01144 [Cs].
Wyse. 2017. “Audio Spectrogram Representations for Processing with Convolutional Neural Networks.” In Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [Cs.NE]).
Yu, D., and Deng. 2011. “Deep Learning and Its Applications to Signal and Information Processing [Exploratory DSP].” IEEE Signal Processing Magazine.
Yu, Haizi, and Varshney. 2017. “Towards Deep Interpretability (MUS-ROVER II): Learning Hierarchical Representations of Tonal Music.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.
Zhu, Krähenbühl, Shechtman, et al. 2016. “Generative Visual Manipulation on the Natural Image Manifold.” In Proceedings of European Conference on Computer Vision.
Zukowski, and Carr. 2017. “Generating Black Metal and Math Rock: Beyond Bach, Beethoven, and Beatles.” In 31st Conference on Neural Information Processing Systems (NIPS 2017).