
Lab Session 08



Welcome to Lab Session 8: Python, Part 2

For this lab session we’ll continue with Python programming.

Let me know if you need help with any of the exercises! I can also help explain concepts from the lecture if there was anything you found particularly challenging.

Additionally, don’t forget that you can discuss the exercises with your classmates on MittUiB, the UiB Studyfellowship Discord server (channel: #ling123) or right here in ling123labs.com’s comments section.

The lecture notes for lecture 8 can be found here.


Exercise 1: Tokenizer with RE in Python

Take a look at the example in the lecture notes.
a) What are two important differences between the results of tokenize and word_tokenize?
b) What if we want to use word_tokenize but also want casefolding?
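The lecture's own tokenize is not reproduced in this post, so the sketch below uses a hypothetical regex-based stand-in to experiment with. The pattern `\w+`, the function name `tokenize` and the helper `casefold_tokens` are assumptions, not the lecture's code. Running it on a sentence with punctuation and a contraction, and then running word_tokenize on the same sentence, is one way to explore part a). `casefold_tokens` shows one possible approach to part b).

```python
import re

# Hypothetical stand-in for a regex tokenizer: casefold the whole text,
# then keep only runs of word characters (punctuation is dropped).
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    return TOKEN_RE.findall(text.casefold())


def casefold_tokens(tokens):
    """Casefold every token in an existing token list, e.g. the output of word_tokenize."""
    return [token.casefold() for token in tokens]


print(tokenize("The Cat sat. Didn't it?"))  # ['the', 'cat', 'sat', 'didn', 't', 'it']
print(casefold_tokens(["The", "Cat", "sat", "."]))  # ['the', 'cat', 'sat', '.']
```

With NLTK installed (and its tokenizer models downloaded), `casefold_tokens(word_tokenize(text))` would combine the two steps.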

Exercise 2: Zippers

What happens if you call next on a zip object (a lazy iterator) that is already exhausted?
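A minimal experiment for this exercise, using two short hypothetical lists. In Python 3, zip returns a lazy iterator, so next can be called on it directly until it runs out:

```python
# zip pairs up items lazily; nothing is computed until next() asks for it
pairs = zip(["a", "b"], [1, 2])

print(next(pairs))  # ('a', 1)
print(next(pairs))  # ('b', 2)

try:
    next(pairs)  # both pairs have been consumed, so the iterator is exhausted
except StopIteration:
    print("StopIteration raised")

# next() also accepts a default value, returned instead of raising
print(next(pairs, None))  # None
```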

Exercise 3: N-grams with Python

a) Once again, the results differ because tokenize and word_tokenize behave differently. When would you use one rather than the other?

b) In both cases, the resulting n-grams are generators. As long as you do not convert them to lists, you can use next to retrieve them one at a time.
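NLTK's `nltk.ngrams` returns a generator. To see the same lazy behaviour without NLTK, the sketch below builds n-grams with zip instead. The helper `zip_ngrams` is hypothetical and is not NLTK's function. Each call to next pulls out one n-gram, and whatever has not been consumed yet is still available afterwards:

```python
def zip_ngrams(tokens, n):
    """Return a lazy iterator of n-gram tuples (hypothetical helper, not nltk.ngrams).

    zip over n shifted copies of the list stops when the shortest copy runs
    out, so it produces max(0, len(tokens) - n + 1) tuples.
    """
    return zip(*(tokens[i:] for i in range(n)))


bigrams = zip_ngrams(["the", "cat", "sat", "down"], 2)
print(next(bigrams))  # ('the', 'cat')
print(next(bigrams))  # ('cat', 'sat')
print(list(bigrams))  # [('sat', 'down')], only what was left
```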

c) Look at the following example. Use the same method (str.join) to convert the list of n-gram tuples to a list of strings in which the words are separated by spaces.

>>> [" ".join(('alpha', 'beta', 'gamma'))]
['alpha beta gamma']
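Applied to a whole sequence of n-gram tuples, the same join fits naturally in a list comprehension. A minimal sketch, where the function name `ngrams_to_strings` and the sample trigrams are only illustrative:

```python
def ngrams_to_strings(ngram_tuples):
    """Join each n-gram tuple into one space-separated string."""
    return [" ".join(gram) for gram in ngram_tuples]


trigrams = [("alpha", "beta", "gamma"), ("beta", "gamma", "delta")]
print(ngrams_to_strings(trigrams))  # ['alpha beta gamma', 'beta gamma delta']
```

Because the comprehension simply iterates over its input, it also accepts an n-gram generator directly, with no need to convert it to a list first.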



Additional Exercises


Exercise 4: NLTK Stopwords

Create a function that removes stopwords from a given text.

Tip:

import nltk
nltk.download('stopwords')  # one-time download of the stopword corpus
from nltk.corpus import stopwords
stoplist = stopwords.words('english')
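One possible shape for the function, as a sketch. It takes the stoplist as a parameter (so it can be tried with a small hand-made list), splits the text on whitespace and compares casefolded tokens, since NLTK's English stopword list is all lowercase. The name `remove_stopwords` and the whitespace split are assumptions. A fuller solution might tokenize with word_tokenize instead, so that punctuation such as "mat." is separated from the word.

```python
def remove_stopwords(text, stoplist):
    """Return the tokens of `text` that are not in `stoplist`.

    Splits on whitespace for simplicity and casefolds each token before the
    lookup, so "The" is matched by the lowercase entry "the".
    """
    stopset = set(stoplist)  # set membership checks are fast compared to a list
    return [token for token in text.split() if token.casefold() not in stopset]


print(remove_stopwords("The cat sat on the mat", ["the", "on"]))  # ['cat', 'sat', 'mat']
```

Combined with the tip above, `remove_stopwords(text, stoplist)` would filter against NLTK's full English list.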


Exercise 5: Matplotlib basics

Use the matplotlib and pandas libraries to draw a line chart of Bitcoin price data from bitcoin-prices.csv:

Date,Open,High,Low,Close
Mar-09-2021,52272.97,54824.12,51981.83,54824.12
Mar-08-2021,51174.12,52314.07,49506.05,52246.52
Mar-07-2021,48918.68,51384.37,48918.68,51206.69
Mar-06-2021,48899.23,49147.22,47257.53,48912.38
Mar-05-2021,48527.03,49396.43,46542.51,48927.30
Mar-04-2021,50522.31,51735.09,47656.93,48561.17
Mar-03-2021,48415.81,52535.14,48274.32,50538.24
Mar-02-2021,49612.11,50127.51,47228.85,48378.99
Mar-01-2021,45159.50,49784.02,45115.09,49631.24

Tip: You can use pandas.read_csv('bitcoin-prices.csv', parse_dates=True, index_col=0) to read the data from the .csv file.

The output should look something like this:
[Figure: A line chart of Bitcoin price data, created using Matplotlib and Pandas]
Feel free to play around with the Matplotlib library. For example, you can experiment with the parameters color, linestyle, linewidth, marker and markersize to customize the graph.

Tip: If you plot four lines on a single chart (e.g. Open, High, Low and Close), set color to a list of four Matplotlib colors, one per line.
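A sketch of one way to produce the chart, assuming pandas and Matplotlib are installed. It embeds three rows of the CSV above through io.StringIO so it runs on its own; pass 'bitcoin-prices.csv' instead to use the full file. Rather than relying on parse_dates=True to guess the date format, it parses the Mon-DD-YYYY dates explicitly with pd.to_datetime. The function name `plot_prices`, the colors and the marker settings are illustrative choices, not requirements.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Three rows copied from bitcoin-prices.csv so the sketch is self-contained;
# for the exercise, pass 'bitcoin-prices.csv' instead of io.StringIO(CSV_DATA).
CSV_DATA = """Date,Open,High,Low,Close
Mar-09-2021,52272.97,54824.12,51981.83,54824.12
Mar-08-2021,51174.12,52314.07,49506.05,52246.52
Mar-07-2021,48918.68,51384.37,48918.68,51206.69
"""


def plot_prices(source):
    """Draw one line per price column and return the Matplotlib Axes (hypothetical helper)."""
    df = pd.read_csv(source, index_col=0)
    # Dates look like "Mar-09-2021", so state the format instead of letting pandas guess
    df.index = pd.to_datetime(df.index, format="%b-%d-%Y")
    df = df.sort_index()  # the file lists the newest day first; sort chronologically

    _, ax = plt.subplots()
    df.plot(
        ax=ax,
        color=["tab:blue", "tab:orange", "tab:green", "tab:red"],  # one color per column
        linestyle="-",
        linewidth=1.5,
        marker="o",
        markersize=4,
        title="Bitcoin price",
    )
    ax.set_xlabel("Date")
    ax.set_ylabel("Price")
    return ax


ax = plot_prices(io.StringIO(CSV_DATA))
```

Call `plt.show()` afterwards to open the chart in a window, or `plt.savefig(...)` to write it to an image file.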

You can also check out the matplotlib documentation in its entirety here!

This post is licensed under CC BY 4.0 by the author.