This post is going to be a strange one. There are a few things that feed into the following information, all of which seem disparate but, hopefully, will come together to make some sort of sense.
As a disclaimer, I will state openly that I am new to the use of neural networks in general. On the other hand, I have nearly a decade's worth of intimate connection to Shakespeare. With all that in mind, I want to show you what I've made. Let's begin by examining one of my favorite sonnets:
Sonnet LXXIII
That time of year thou mayst in me behold
When yellow leaves, or none, or few, do hang
Upon those boughs which shake against the cold,
Bare ruin'd choirs, where late the sweet birds sang.
In me thou seest the twilight of such day
As after sunset fadeth in the west,
Which by and by black night doth take away,
Death's second self, that seals up all in rest.
In me thou see'st the glowing of such fire
That on the ashes of his youth doth lie,
As the death-bed whereon it must expire
Consumed with that which it was nourish'd by.
This thou perceivest, which makes thy love more strong,
To love that well which thou must leave ere long.
It's beautiful. There's no question about that. I'm not here to analyze the poetry itself, though that would be fun and might happen in a future post. No - the mission today is to try to make a computer generate things that might, plausibly, be Shakespearean.
Collecting and Cleaning the Data
Data collection is important for any project. Thankfully, Shakespeare has been dead for a long time and his lawyers let his copyright lapse some time ago. Project Gutenberg has a full collection of his sonnets available in plain text, which is what I'll be using to train my neural network. It will also feed the Markov text generation, a separate but interesting project alongside the neural network that I'll cover in a later post. If you'd like a copy of the cleaned text file that I created (cleaning basically just means removing the sonnet headings), feel free to check out this link to my GitHub project.
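For the curious, the cleaning itself only takes a few lines of Python. Here's a minimal sketch, assuming the raw Project Gutenberg text marks each sonnet with a Roman-numeral heading on its own line; the file names are placeholders rather than anything from my repo.

```python
import re

# Minimal cleaning sketch: drop Roman-numeral sonnet headings (e.g. "LXXIII")
# so only the lines of verse remain. File names are placeholders.
with open("sonnets_raw.txt", encoding="utf-8") as f:
    lines = f.readlines()

cleaned = [
    line for line in lines
    if not re.fullmatch(r"\s*[IVXLC]+\.?\s*", line)  # heading lines like "LXXIII"
]

with open("sonnets_clean.txt", "w", encoding="utf-8") as f:
    f.writelines(cleaned)
```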
Going forward, I'll try to be as transparent as possible about where my data comes from. As this was a fairly simple project, this section is pretty light.
Neural Networks and Learning
Neural networks are weird. I've watched a few dozen hours of videos on how, exactly, neural networks work. They're fascinating and alien in nature. They're a way for people, using algorithms, to train a computer to recognize something or process data without being given really specific instructions. The "training" is based on scoring the network against training data, so that each time it makes a guess it is told whether it was right or how badly it was wrong. To steal a graphic from Wikipedia for demonstration purposes: the astute observer will notice that a large section of that diagram is labeled "hidden". This does not mean that neural networks are magic; the layers in between the input and the output are called "hidden" simply because you never directly observe their values.
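To make the picture of layers a little more concrete, here's a tiny Keras sketch. The layer sizes and counts are arbitrary choices of mine for illustration, not the network from the diagram or the model I train later.

```python
import tensorflow as tf

# A toy network just to label the parts: the input and output layers are the
# ends you interact with; everything in between is a "hidden" layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),                     # input layer: 10 features
    tf.keras.layers.Dense(16, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(16, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(2, activation="softmax"),  # output layer: 2 classes
])
model.summary()
```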
All of this sounds neat, though it's both unlikely to be 100% accurate and nowhere near comprehensive on the subject. As I said, I'm not an expert and likely won't be for some time (if ever). I started this project to learn about neural networks and machine learning more generally, and this is merely the first step in a long process toward doing that.
What I Did
I'm using Python (as I always do) alongside a few packages and tutorials that I found pretty easy to get used to as a beginner. The process for getting everything together and running was as follows:
- Install Ubuntu on my personal laptop.
- Install Anaconda on the laptop.
- Install TensorFlow GPU into a conda environment.
- Find tutorials on training a neural network for text generation.
Installing Ubuntu
This was easy. All you have to do is load Ubuntu onto a thumb drive (I used Rufus to create the bootable drive) and install from the boot menu. I chose Ubuntu because I've used it before, though I was told that Arch Linux is actually better suited to easily installing TensorFlow GPU.
Installing Anaconda
This was much harder for me to do on Linux than it was on Windows. For one, I'm not super familiar with Linux environments, so installing anything is more convoluted simply from a lack of familiarity. Once I became more familiar with installing and uninstalling programs and Python packages, though, I actually found it to be as easy as, if not easier than, on Windows.
Installing TensorFlow GPU in a Conda Environment
Pretty easy to do, given the right instructions. Testing whether it's running properly is as easy as typing "nvidia-smi" into a terminal window while a training model is running and checking whether your GPU usage has gone up.
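If you'd rather check from inside Python, TensorFlow can also report whether it sees a GPU. This is a quick sketch using the TensorFlow 2.x API; older 1.x installs expose similar information through tf.test.is_gpu_available() instead.

```python
import tensorflow as tf

# Lists any GPUs TensorFlow can see; an empty list means it fell back to CPU.
print(tf.config.list_physical_devices("GPU"))
```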
Finding Tutorials on Neural Network Text Generation
This was marginally more difficult. I perused the net for a while before settling on a tutorial I liked, though it ended up producing results I was unhappy with. Eventually, I switched to a simpler package that produced about the same results.
Results
Here's where the fun comes in. Let's see what some of the resulting output looked like. I'll comment on the process as we go and try to give some insight into what is happening. We'll first look at my initial pass at neural network text generation and the results from that.
First Pass Using an Online Tutorial
The results from this were less than stellar, for a few reasons I'll get into. If you'd like to see "my code" (which is really just the code from the tutorial), check out the "training.ipynb" file in the GitHub repo linked above.
Before we start, I should explain a core concept in neural networks: epochs. An epoch is one complete pass of the full training dataset through the algorithm being trained. In general, a tiny number of epochs (for example, one or two) will not be enough to train a neural network to perform how you'd like it to. You might think that a huge number of epochs would train the model to be amazing at the task, but this is not necessarily the case. When training my model, I found that the loss began increasing after a certain point if I ran it for enough epochs.
A loss function is, in essence, a measure of how far your neural network's predictions are from the expected answers on a given epoch. A neural network requires training data, so it already has a batch of examples with known answers to run against, and the loss scores how wrong its guesses are on those examples. It's a non-negative number, and the closer it is to zero, the "better" the neural network is performing.
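To show where epochs and the loss function actually live in code, here's a rough Keras sketch. The vocabulary size, model shape, and random stand-in data are assumptions for illustration; the real model and training data come from the tutorial code and the sonnets file.

```python
import numpy as np
import tensorflow as tf

# Stand-in data: 1,000 sequences of 40 character IDs, each paired with the ID
# of the character that follows. The real pairs come from the sonnets text.
vocab_size = 65
x = np.random.randint(0, vocab_size, size=(1000, 40))
y = np.random.randint(0, vocab_size, size=(1000,))

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 32),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])

# The loss function scores each guess against the known next character;
# lower is better, with zero meaning a perfect fit on the training data.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# One epoch = one full pass over the training data.
history = model.fit(x, y, epochs=3)

# Keras records the loss after every epoch; this is the number I was watching.
print(history.history["loss"])
```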
When I first ran the training model, I was pretty disappointed. I gave the model three training epochs to begin with - as per the tutorial - and got very bad results. Here are a few examples, which I actually reran just for this post.
God tinde of wu feld merend his dy iwhyich tove, Ther welze af ogerume sferak CU thove pan, Bert therI th stp thvengnond, Wos challf aly iw than hed mey than be shre, phathe fry sadd, Nite mowjer solsyo nw par s thy IlrI woy te thais ing no ererere mee horis poraneg, by halg or mseens I dessco ghaten! pare; WThr erisud thinind wopelir bpilh susry ard mond Tlow hiis ho chongerbe, Toj pilly A the ou to byy vis ind ow tocm ka thaigr file, Theg, fone, And wfist lene wian thos buee, fous mantof thes the malat, AngAn s in shor sy witu sive moif told oude thensrenang? CRtor asiesmr whapurer high than? X X)FAn fyo slels thy int ofarn see, maperee H rowk sure, fif thauth dheurt; Set con bt my thind be whow wyor forpanded the allen e, sigl's bu: s alisneung yavif whand of in mo staty inowas trals thtind paclllft asthein mare. SaAnd mime fine co rsanthes silt mece hoy ;o f, F zo bebe, ghou imy peat ind swrang for feand fouf beres sidw, Shin in coud I Whuld at be
As you can see, my neural network was close to Shakespeare already. More seriously, my first attempted "solution" was to increase the number of epochs the model ran through. At this point, the loss was sitting at around 2.3, which from what I've seen is quite high for this sort of model. Here's an example of the model's output after ten epochs.
God? Tikeds ip cwats uppalftuct the stiy thy say keec My be is hew, of nights that neied by me farm it rid dith my arontores, dear mait, And batt; sond shaver rinds bept ot'ar'd stownich priend; Indefrawith pliesuscige bay tot thy lesss and, For that preaugy bations gostion, that your frows will with wight. I vosting, and gater sill cansed wor ast songte Co live so sham well, For the cond tither to jeight, But fir tham thou wepth tur you what thy abmexp hanst is then mortaich courls tight thow the onrers's farme's rens, Whou aim blong shile, Then bestyoun beauty on messight, and miching eyest, And sall lenome, thene ming, reauts of thoug truppine: Be; And thy heart weal acain you destrecs, Bus braws not sum ree Of horsh abine eut stail hilds askion I pair, Leste to see wetise love, ard dis colls, Thing, awing fall wieth dishads will faserest, Prenadien'd that all the solf, atifse inore. XXXIII Ag that de sunds one wist till dingrerl'? Sall thil our farus f
Ignoring the fact that this still looks like complete gibberish when you read it, two important things have changed: first, the sentence lengths are marginally more coherent, which means the model has slightly more of a sense of what it's meant to be doing. Second, it included a sonnet number, which I found funny. I decided to leave the sonnet numbers in at this point just for the laughs. If anything, it probably hurts the model, but I like to imagine it's reading off its best attempt at a thirty-third sonnet.
The loss at this point was closer to 1.7, an improvement over the earlier model, which only had three epochs. Now let's go ahead and cook with gas - I'm going to show the output at thirty epochs.