# How to Build a Cool Markov Chain Lyrics Generator (Python)

This Markov Chain Lyrics Generator is one of my favourite projects. Basically, it uses a type of prediction model (called a Markov Chain) to generate its own version of your favourite artist's lyrics. These end up being super hilarious sometimes, because they sound similar to the artist but just... not quite. AI generated stuff is totally my sense of humour so I've literally spent hours laughing at the responses. Call me lame, I dare you.

A Markov Chain is the simplest kind of prediction model - it only uses the previous state to predict the next one. In our case, it only uses the previous word to predict the next word in the lyrics.

Dive right into the code from GitHub [here](https://github.com/catmcgee/220-tutorials-markov-lyrics-generator).

I feel like this is one of those recipe blogs where you have to read through the rambling nonsense before getting to the actual recipe. Sorry about that. Let's get on with it.

## Step 1: Install dependencies

Python3 - [Download here](https://www.python.org/downloads/)

`re` - Python’s regex library. We’ll be using this when scraping a website to find the links to lyrics

`pip install re`

`urllib` \- URL library. We use this library to scrape the HTML off of a page and read it to a string.

`pip install urllib`

`markovify` - This library can generate a Markov Chain for us.

`pip install markovify`

## Step 2: Create a file for markovify to read from We’re going to be putting all of the artist’s lyrics into one file for markovify to read from.

```plaintext
originalLyrics = open('lyrics.txt', 'w')
```

## Step 3: Scrape the links to the lyrics

Here, we *could* manually input all the lyrics into the file, or use a paid API. But this way is more fun.

We’ll be using AZLyrics to scrape the lyrics of your artist and place them in `lyrics.txt`.

The first thing we need to do here is go to the artist’s page on AZLyrics and find every link on the page that links to a song. Have look at the Coldplay page on AZLyrics to see what I mean (we'll be using Coldplay in this tutorial. I don't really know why.)

Use the `urllib` library to request the HTML from the artist’s page, and convert it into a string using read so we can easily search it for links.

```plaintext
url = "https://www.azlyrics.com/c/coldplay.html"
artistHtml = urllib.request.urlopen(url)
artistHtmlStr = str(artistHtml.read())
To find the links on the page, we’ll be using the re library we imported which will find all a link.

links = re.findall('href="([^"]+)"', artistHtmlStr)
```

Now you have a list called `links` which contains all of the links on the artist page! Try printing the links to see what happens.

```plaintext
print(links)
```

Notice anything? If you look through the links, you’ll see that you have all of the links to the lyrics pages (great!) but also to other pages, like contact pages and even the CSS files. Another thing you should notice is that all of the lyrics links have the string `lyrics/coldplay` in them.

Now we have to filter those links so that we only get the lyrics pages. Initiate another list that will hold only lyrics links.

```plaintext
songLinks = []
```

We’ll loop through the `links` list, selecting everything that contains the string `lyrics/coldplay` and appending them to the `songLinks` list.

Still with me?

```plaintext
for x in links:
    if "lyrics/coldplay" in x:
        songLinks.append(x)
```

Print `songLinks` outside of the for loop to see what you end up with. Great - only the lyrics links! However, we still have another problem. The links do not have the full URL, only the end. And they have `..` in them. Before adding the links to the songLinks array, we need to append the beginning of the URL to the link, and replace `..` with an empty string. Update your if statement to look like this.

```plaintext
for x in links:
    if "lyrics/coldplay" in x:
        x = x.replace("..", "")
        x = "https://www.azlyrics.com/" + x
        songLinks.append(x)
```

Print songLinks again and see what happens. Now we have only full lyrics links. Awesome.

## Step 4: Scrape the lyrics from the lyrics links

The next step is to get the HTML from each link the `songLinks` list, find the actual lyrics and nothing else, and save it to the file we created in Step 1. Alright, let’s go.

Firstly, we’ll need to iterate through our `songLinks` list and scrape the HTML, converting it into a string. We already did this for the artist page in Step 2.

```plaintext
for x in songLinks:
    songHtml = urllib.request.urlopen(x)
    songHtmlStr = str(songHtml.read())
```

If we print `songHtmlStr` here, we’ll get everything on the song page. We don’t want that, or our prediction model will be taking from more text than just the lyrics. It'll be like 'yeah, baby, contact us, join the newsletter.' So we need to find where the lyrics are on the page, and split our string so that we’re only adding the lyrics to the file.

Look at a lyrics page on AZLyrics and right click the part where the lyrics are. When you click `inspect` (or `inspect element` depending on your browser), you’ll see that the lyrics start after a disclaimer (‘content by any third-party…’) and end at the end tag `</div>`. So we need to split our string twice - once to get all HTML after the disclaimer, and another time to get all the HTML before `</div>`.

Luckily, Python has a super easy way to split a string - `split()`. The split function takes an argument for the text in the string that marks the end of one split and the beginning of the other. It returns a list of strings, but we can simply get get the first or second item in that list.

To split our lyrics after the disclaimer, we pass two arguments - the disclaimer text, and `1` because we only want to split the string twice, not at every instance of the disclaimer.

```plaintext
split = songHtmlStr.split('content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->',1)
```

Then we split the new string, `splitHtml`, at the instance of `</div>`, again only splitting once, and the first item in the list will be our lyrics!

```plaintext
split_html = split[1]
    split = split_html.split('</div>',1)
    lyrics = split[0]
```

Try printing this `lyrics` list - you’ll see we have all the scraped lyrics! However, we have another problem. We’ve picked up all of the extra bits like `<br>` and `\n` that we don’t want. Use replace methods to replace each of these with empty strings or whatever you like. I used these ones in the video, but they’re not perfect - you can write your own!

```plaintext
lyrics = lyrics.replace('\\', '')
lyrics = lyrics.replace('\nn', '\n')
lyrics = lyrics.replace('<i>', '')
lyrics = lyrics.replace('</i>', '')
lyrics = lyrics.replace('[Chorus]', '')
```

Now all we have to do is write these lyrics to the file we had opened earlier! Once we’ve finished going through the loop, we’ll close the file - we no longer need to write anything to it.

Here’s what the whole for loop should look like.

```plaintext
for x in songLinks:
    songHtml = urllib.request.urlopen(x)
    songHtmlStr = str(songHtml.read()) 
    split = songHtmlStr.split('content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->',1)
    splitHtml = split[1]
    split = splitHtml.split('</div>',1)
    lyrics = split[0]
    lyrics = lyrics.replace('<br>', '\n')
    lyrics = lyrics.replace('\\', '')
    lyrics = lyrics.replace('\nn', '\n')
    lyrics = lyrics.replace('<i>', '')
    lyrics = lyrics.replace('</i>', '')
    lyrics = lyrics.replace('[Chorus]', '')
    originalLyrics.write(lyrics)
originalLyrics.close()
```

## Step 5: Generate the new lyrics

This step is where the magic happens, but it's also the easiest part. Python’s `markovify` does everything for us! We need to create two variables - one that can store the new generated lyrics, and one string variable that markovify can read from. To create this string, we’re going to read() our `lyrics.txt` file that we created and filled with lyrics. This time, we’ll open it in read-only mode ('r').

```plaintext
generatedlyrics = ()
file = open('lyrics.txt', 'r')
text = file.read()
```

We’ll pass our text variable to markovify to generate a markovify model. This is super easy:

```plaintext
markovifyTextModel = markovify.Text(text)
```

Now all we need to do is use that markovify model to generate a sentence. There are hundreds of things you can do with this model, but we’ll be using make\_sentence, a markovify method that just predicts one sentence.

```plaintext
generatedlyrics = markovifyTextModel.make_sentence()
```

Print generatedLyrics and you’ll see some predicted Coldplay lyrics! These can be absolutely hilarious, and sometimes quite morbid… my one in the video is 'I took my son.'

Top tip: You may want to put all the steps into different files or functions or comment out all the code before Step 5. You don’t need to create a new `lyrics.txt` file every time.

## Now it’s your turn!

To really make the most out of these tutorials, try to build something on top of this project. Here are some ideas:

* Try lots of different artists
    
* Make it into a Twitter bot - I actually created one for [Led Zeppelin](https://twitter.com/GeneratedZepp) a while ago, but it's inactive now
    
* Create functions so you don’t have to run the scraper every time
    
* Use filter instead of replace ([https://www.tutorialspoint.com/lambda-and-filter-in-python-examples](https://www.tutorialspoint.com/lambda-and-filter-in-python-examples))
    
* Ask the user what artist they would like at the beginning
    
* Make a GUI
    
* Generate more than one sentence - check out what else Markovify can do!
    
* Use the same technique to scrape other websites and create Markov Chains for scripts or speeches
    

And there you have it! A funny project you can show your friends. [DM me on Twitter](https://twitter.com/CatMcGeeCode) or comment here when you've build it and let me see some of your results!
