Lab Session 01


Welcome to the LING123 Lab Sessions!

In this very first lab session, we’ll begin by looking at some basic concepts in computational linguistics and the processing of language data. After that, we’ll get started with shell scripting!

Complete the exercises in order. Don’t hesitate to ask me or your fellow students for help if you get stuck! You can also discuss the exercises on MittUiB, on the UiB Studyfellowship Discord server (channel: #ling123) or right here in the comments section.

If you need help with the lab exercises during the course, or need to get in touch for any other reason, feel free to shoot me an email by clicking the letter icon in the navigation sidebar at the bottom left of this page. There are also many great resources on the web.
Remember: A programmer’s most useful tool is the search engine!

You can find the Lecture notes here.

Part 1: About computational linguistics and digital language data

Exercise 1.1: Computers in the study of language

  • a) Which areas of language study might benefit from computational data and tools?
  • b) Think of other applications for corpora besides machine translation.

Exercise 1.2: Corpora

  • a) Think of linguistically relevant expressions to search for.
  • b) Look at the NorGramTall blog, which uses a corpus to find out preferences in Norwegian.
  • c) What are some limitations of the web as a corpus?

Exercise 1.3: Searching in corpora

Tip: You need to log in with Feide in order to use Corpuscle. At Corpuscle, click “CLARIN SPF” at the top, search for “Feide”, and log in. You can then navigate to the Corpus list and select the “Child’s Rights” corpus. Select “Metadata” from the menu on the left-hand side and click “Accept”. You are now able to query the corpus by selecting “Query” from the menu. The same menu lets you create queries, search for collocations and create distributions.

When using AntConc, I suggest starting with a .txt file such as lofoten.txt.

  • a) Look for all words starting with child by means of the regular expression child.* in an English corpus, for instance, in the corpus Child Rights in Corpuscle. Make a word list. Find collocations. Find their distribution relative to country.
  • b) Try out AntConc, a tool with which you can analyze your own corpus.

Exercise 1.4: The trouble with language

  • a) What happens if a ligature (ffl) is considered different from its component letters?
  • b) Which different kinds of knowledge are necessary to understand language? How much "common sense" and knowledge about the world is necessary?
  • c) Some characters may be hard to distinguish, for instance different dashes, which could be a hyphen, minus sign, etc. (‐ - ⁃ -) and different characters similar to apostrophe (' ʼ ′ ʹ). Try to find a way to examine if they are different, or if they are the same character in various fonts.
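
One possible way to approach c) is to look at the bytes behind each character rather than at how it is rendered. The sketch below uses the shell introduced in Part 2. The helper name show_bytes is hypothetical, and the example assumes the text is encoded as UTF-8.

```shell
# show_bytes (hypothetical helper): print the bytes of its argument in hexadecimal.
show_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

show_bytes '-'   # ASCII hyphen-minus: a single byte, 2d
show_bytes '‐'   # a look-alike dash copied from the exercise text: several bytes in UTF-8
```

If two characters print different byte sequences, they are different characters, however similar they look in a given font.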

Part 2: Introduction to shell scripting

An example of shell scripting (using WSL Ubuntu). Intimidating, huh? Don’t worry, we’ll get you there in no time!

Software installations and setup

If you are using Windows, please make sure you have installed a compatible shell. I have posted an installation guide here.

Basic usage of the shell

When coding in the shell, you first enter the command, then any arguments. The command can be navigational, like cd, or it can be a program such as grep.
[command] [arg1] [arg2]...
The command and its arguments are separated by spaces. Any argument containing a space must be escaped with a backslash (\) or be quoted with '' or "".

To start things out, we can print text to the terminal like this:

echo "Hello world!"

Hello world!
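
To see how spaces, backslashes and quotes affect what a command receives, it can help to count the arguments. The following is a minimal sketch. count_args is a hypothetical helper defined only for this illustration, and the file name is made up.

```shell
# count_args (hypothetical helper): report how many arguments it was given.
count_args() {
    echo "$# argument(s)"
}

count_args my file.txt      # unquoted space: the shell splits this into 2 arguments
count_args my\ file.txt     # escaped space: 1 argument
count_args 'my file.txt'    # single quotes: 1 argument
count_args "my file.txt"    # double quotes: 1 argument
```

The same rule applies to any command: without the backslash or the quotes, a file called "my file.txt" would be treated as two separate files.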

The bash shell includes the command help, which lists the built-in commands. Typing help [command] gives more information about a specific one.
In addition, most commands and command-line programs have a manual, available by typing man [command]. More information about the command line can be found here.

Exercise 2.1: Getting started with shell scripting

  • a) Check your locale with the locale command. Change the locale to a different language and type date.
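
As a starting point, the commands below sketch one way to inspect the locale and to run date under a different one. The locale name nb_NO.UTF-8 is an assumption: it only takes effect if it is installed on your machine, so check the output of locale -a and substitute a name from that list.

```shell
# Show the current locale settings.
locale

# List a few of the locales installed on this machine.
locale -a | head -n 5

# Run date once with the plain "C" locale (English day and month names).
LC_ALL=C date

# Run date once with another locale (assumed name; pick one from locale -a).
LC_ALL=nb_NO.UTF-8 date
```

Setting the variable in front of a command changes the locale for that command only. To change it for the rest of the session, you could use export LC_ALL=... instead.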

Exercise 2.2: Additional exercises for basic shell usage and word counting

  • a) Acquaint yourself with the command line: try navigating directories, viewing files, etc. Try out various commands.
  • b) Try editing a file using a terminal text editor such as Vim, Nano, or Emacs. What are the benefits of using one of these instead of an IDE like Atom or PyCharm? What are the drawbacks?
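
For a), a first session might look something like the sketch below. The directory name lab01 and the file name notes.txt are hypothetical and only serve as an illustration.

```shell
pwd                               # print the directory you are currently in
ls -l                             # list its contents in long format

mkdir -p lab01                    # create a practice directory (hypothetical name)
cd lab01                          # move into it
echo "first line" > notes.txt     # create a small file
cat notes.txt                     # view the file
cd ..                             # go back up one level
```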

Exercise 2.3: Counting lines and words

  • The wc command can take more than one file as arguments. Try something like the following, which also uses the text file lofoten.txt:

         wc chess.txt lofoten.txt
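
If you do not have the course files at hand, the same behaviour can be tried on two small files created on the spot. This is only a sketch: the names sample_a.txt and sample_b.txt and their contents are made up.

```shell
# Create two small sample files (hypothetical names and contents).
printf 'one two three\nfour five\n' > sample_a.txt
printf 'six seven\n' > sample_b.txt

# One row per file (lines, words, bytes, file name), followed by a "total" row.
wc sample_a.txt sample_b.txt

# Restrict the report to lines only, or to words only.
wc -l sample_a.txt sample_b.txt
wc -w sample_a.txt sample_b.txt
```

Here sample_a.txt has 2 lines and 5 words, and sample_b.txt has 1 line and 2 words, so the total row should add these up.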

Exercise 2.4: Word counting on the Web

  • Find an article on the web. Copy the content to a new file in a terminal text editor (vim, nano, emacs).
  • If Ctrl+c and Ctrl+v (Cmd on macOS) don't work, try Ctrl+c and right click instead.
  • Use a word or a pattern of your choosing, and find the number of occurrences with the wc command.
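
One possible way to do the counting is to let grep find the matches and pass them on to wc. The sketch below is an assumption about how you might combine the two, not the only solution. The file name article.txt and its contents are made up.

```shell
# A tiny stand-in for the article you copied (hypothetical contents).
printf 'cod and more cod\nCod fishing in Lofoten\nno fish here\n' > article.txt

# Count the lines that contain the pattern (-i ignores case).
grep -i 'cod' article.txt | wc -l     # 2 lines contain "cod"

# Count every occurrence: -o prints each match on a line of its own.
grep -io 'cod' article.txt | wc -l    # 3 occurrences in total
```

The two numbers differ because the first line contains the pattern twice.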
This post is licensed under CC BY 4.0 by the author.