Welcome to Lab Session 8: Python, Part 2
For this lab session we’ll continue with Python programming.
Let me know if you need help with any of the exercises! I can also help explain concepts from the lecture if there was anything you found particularly challenging.
Additionally, don’t forget that you can discuss the exercises with your fellow classmates on MittUiB, the UiB Studyfellowship Discord server (channel: #ling123) or right here in ling123labs.com’s comments section.
The lecture notes for lecture 8 can be found here.
Exercise 1: Tokenizer with RE in Python
Take a look at the example in the lecture notes.
a) What are two important differences between the results of tokenize and word_tokenize?
b) What if we want to use word_tokenize but also want casefolding?
Exercise 2: Zippers
What happens if you try to take the next of a zip generator that is exhausted?
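As a starting point, here is a minimal sketch of how next steps through a zip object while it still has items. The list contents and variable names are only illustrative assumptions. It deliberately stops before the zip is exhausted, so you can try that last step yourself.

```python
# Illustrative data; any two sequences would do.
letters = ["a", "b", "c"]
numbers = [1, 2, 3]

# zip pairs the items up lazily; nothing is produced until we ask for it.
pairs = zip(letters, numbers)

# Each call to next() consumes exactly one pair.
first = next(pairs)
second = next(pairs)

print(first)   # ('a', 1)
print(second)  # ('b', 2)
```

One pair is still left in pairs at this point. Keep calling next(pairs) to see what happens after it is gone.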
Exercise 3: N-grams with Python
a) Once again, there is a difference between the results, due to a difference between tokenize and word_tokenize. Why would you use one or the other?
b) The resulting n-grams are in both cases generators. If you do not convert them to lists, you can use next.
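To illustrate what using next on an n-gram generator looks like, here is a small sketch. The ngrams function below is a hypothetical standard-library stand-in written for this example, not the function from the lecture notes, and the sentence is made up.

```python
def ngrams(tokens, n):
    """Hypothetical stand-in: lazily yield each n-gram as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

tokens = "the quick brown fox".split()
bigrams = ngrams(tokens, 2)

# next() pulls a single n-gram without building the whole list.
first = next(bigrams)
print(first)  # ('the', 'quick')

# Converting to a list now only gives what is left in the generator.
rest = list(bigrams)
print(rest)   # [('quick', 'brown'), ('brown', 'fox')]
```

Note that the first bigram is missing from rest: a generator can only be consumed once.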
c) Look at the following. Use such a method to convert the list of n-gram tuples to a list of strings in which the words are separated by spaces.
>>> [" ".join(('alpha', 'beta', 'gamma'))]
['alpha beta gamma']
Additional Exercises
Exercise 4: NLTK Stopwords
Create a function that removes stopwords from a given text.
Tip:
from nltk.corpus import stopwords
stoplist = stopwords.words('english')
Exercise 5: Matplotlib basics
Use the matplotlib and pandas libraries to draw a line chart of Bitcoin price data from bitcoin-prices.csv:
Date,Open,High,Low,Close
Mar-09-2021,52272.97,54824.12,51981.83,54824.12
Mar-08-2021,51174.12,52314.07,49506.05,52246.52
Mar-07-2021,48918.68,51384.37,48918.68,51206.69
Mar-06-2021,48899.23,49147.22,47257.53,48912.38
Mar-05-2021,48527.03,49396.43,46542.51,48927.30
Mar-04-2021,50522.31,51735.09,47656.93,48561.17
Mar-03-2021,48415.81,52535.14,48274.32,50538.24
Mar-02-2021,49612.11,50127.51,47228.85,48378.99
Mar-01-2021,45159.50,49784.02,45115.09,49631.24
Tip: You can use pandas.read_csv('bitcoin-prices.csv', parse_dates=True, index_col=0) to read the data from the .csv file.
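To see what these two arguments do, here is a small sketch that reads the first few rows of the data from an in-memory string instead of the file. Using io.StringIO is an assumption made only so the example runs without bitcoin-prices.csv on disk.

```python
import io

import pandas as pd

# In-memory stand-in for bitcoin-prices.csv (first three data rows only).
csv_text = """Date,Open,High,Low,Close
Mar-09-2021,52272.97,54824.12,51981.83,54824.12
Mar-08-2021,51174.12,52314.07,49506.05,52246.52
Mar-07-2021,48918.68,51384.37,48918.68,51206.69
"""

# index_col=0 makes the first column (Date) the row index;
# parse_dates=True asks pandas to try to parse that index as dates.
prices = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col=0)

print(prices.columns.tolist())  # the remaining columns: Open, High, Low, Close
print(prices["Close"].max())    # highest closing price in this excerpt
```

With the real file, replace io.StringIO(csv_text) with 'bitcoin-prices.csv'. It is worth checking prices.index afterwards to confirm that the dates were parsed as expected.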
The output should look something like this:
[Figure: A line chart of Bitcoin price data, created using Matplotlib and Pandas]
Feel free to play around with the Matplotlib library. For example, you can experiment with the parameters color, linestyle, linewidth, marker and markersize to customize the graph.
Tip: If you have 4 line charts on a single plot, you need to set color to a list of four matplotlib colors.
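The following sketch shows one way such a list of colors might be passed, assuming the chart is drawn with the DataFrame's own plot method. The small DataFrame is built by hand from three rows of the data, and the color names and style values are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display

import pandas as pd

# Three rows of the price data, oldest first, typed in by hand.
prices = pd.DataFrame(
    {
        "Open": [48918.68, 51174.12, 52272.97],
        "High": [51384.37, 52314.07, 54824.12],
        "Low": [48918.68, 49506.05, 51981.83],
        "Close": [51206.69, 52246.52, 54824.12],
    },
    index=pd.to_datetime(["2021-03-07", "2021-03-08", "2021-03-09"]),
)

# One color per column, in column order.
colors = ["tab:blue", "tab:orange", "tab:green", "tab:red"]

# The styling keywords are passed on to matplotlib's line plotting.
ax = prices.plot(color=colors, linestyle="--", linewidth=1.5,
                 marker="o", markersize=4)

print(len(ax.get_lines()))  # one line per column
```

In an interactive session you would drop the matplotlib.use("Agg") line and call matplotlib.pyplot.show() to display the figure.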
You can also check out the matplotlib documentation in its entirety here!