
Lab Session 12

Welcome to Lab Session 12: More XSLT and regular expressions

Figure: A visualization of the XSLT process

Today we’re going to continue working with XSLT and regular expressions. As always, you can solve the exercises alone or in groups, and I’ll go through the proposed solutions towards the end of the session. Let me know if you get stuck on any of the exercises and I’ll give you a hint on how to solve them!

Additionally, make sure to let me know if you have any questions regarding the second obligatory assignment. It is due Friday the 16th of April at 1PM in Inspera.



Loops and content selectors in XSLT

The XSL element <xsl:for-each> loops over every node matched by its select expression, while <xsl:value-of> outputs the text content of the first node matched by its select expression. <xsl:for-each> needs a closing tag (</xsl:for-each>), while <xsl:value-of> is normally written as a self-closing tag, e.g. <xsl:value-of select="title"/>.

The syntax can look something like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
  <body>
    <h2>Our Menu</h2>
    <table>
      <tr>
        <th>Title</th>
        <th>Price</th>
        <th>Description</th>
      </tr>
      <xsl:for-each select="menu/menu-item">
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="price"/></td>
        <td><xsl:value-of select="description"/></td>
      </tr>
      </xsl:for-each>
    </table>
  </body>
</html>
</xsl:template>
</xsl:stylesheet>

If an XML file with matching elements is provided as input to the XSLT program, the output would be a table in HTML with three columns: “Title”, “Price” and “Description”. The values in the table would be automatically extracted from each menu item in the XML document.
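
To make this concrete, here is a rough Python analogue of what the stylesheet does. It is not XSLT itself, and the menu document with its two items is made up for illustration; only the element names (menu, menu-item, title, price, description) come from the stylesheet above. In this sketch, findall plays the role of <xsl:for-each> and findtext plays the role of <xsl:value-of>.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document; only the element names come from the stylesheet.
MENU_XML = """<menu>
  <menu-item>
    <title>Waffles</title>
    <price>5.95</price>
    <description>Two waffles with syrup</description>
  </menu-item>
  <menu-item>
    <title>Toast</title>
    <price>4.50</price>
    <description>Thick slices of sourdough</description>
  </menu-item>
</menu>"""


def menu_rows(xml_text):
    """Return one (title, price, description) tuple per menu-item."""
    root = ET.fromstring(xml_text)  # root is the <menu> element
    rows = []
    # Like <xsl:for-each select="menu/menu-item">: visit each menu-item child.
    for item in root.findall("menu-item"):
        # Like <xsl:value-of select="...">: take the text of the named child.
        rows.append((item.findtext("title"),
                     item.findtext("price"),
                     item.findtext("description")))
    return rows


for row in menu_rows(MENU_XML):
    print(" | ".join(row))
```

Each printed line corresponds to one <tr> row that the stylesheet would produce for the same input.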


Exercise 1: Creating a wishlist for books with XSLT

Consider the following XML document describing a catalog of books:

<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications
      with XML.</description>
  </book>
  <book id="bk102">
    <author>King, Stephen</author>
    <title>The Dark Tower: The Gunslinger</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>1982-12-10</publish_date>
    <description>In the first book of this brilliant series, Stephen King introduces readers to one of his
      most enigmatic heroes, Roland of Gilead, The Last Gunslinger. He is a haunting figure,
      a loner on a spellbinding journey into good and evil. In his desolate world, which
      frighteningly mirrors our own, Roland pursues The Man in Black, encounters an alluring woman named Alice,
      and begins a friendship with the Kid from Earth called Jake. Both grippingly realistic and eerily dreamlike,
      The Gunslinger leaves readers eagerly awaiting the next chapter.</description>
  </book>
  <book id="bk103">
    <author>Haldeman, Joe</author>
    <title>The Forever War</title>
    <genre>Science Fiction</genre>
    <price>6.95</price>
    <publish_date>1974-12-03</publish_date>
    <description>The Earth's leaders have drawn a line in the interstellar sand—despite the fact that the fierce
      alien enemy that they would oppose is inscrutable, unconquerable, and very far away. A reluctant conscript
      drafted into an elite Military unit, Private William Mandella has been propelled through space and time to
      fight in the distant thousand-year conflict; to perform his duties without rancor and even rise up through
      military ranks. Pvt. Mandella is willing to do whatever it takes to survive the ordeal and return home.
      But "home" may be even more terrifying than battle, because, thanks to the time dilation caused by space
      travel, Mandella is aging months while the Earth he left behind is aging centuries.</description>
  </book>
  <book id="bk104">
    <author>Rothfuss, Patrick</author>
    <title>The Name of the Wind</title>
    <genre>Fantasy</genre>
    <price>7.95</price>
    <publish_date>2001-03-10</publish_date>
    <description>Told in Kvothe's own voice, this is the tale of the magically gifted young man who
      grows to be the most notorious wizard his world has ever seen. A high-action story written with a poet's hand,
      The Name of the Wind is a masterpiece that will transport readers into the body and mind
      of a wizard.</description>
  </book>
  <book id="bk105">
    <author>Kurzweil, Ray</author>
    <title>How to Create a Mind</title>
    <genre>Science</genre>
    <price>14.99</price>
    <publish_date>2012-11-13</publish_date>
    <description> In How to Create a Mind, Kurzweil presents a provocative exploration of the most important project
      in human-machine civilization—reverse engineering the brain to understand precisely how it works and using
      that knowledge to create even more intelligent machines.</description>
  </book>
</catalog>

Transform the XML document to HTML by editing the XSL stylesheet below. The result should be a table with two columns, “Title” and “Author”, with one row for each book in the catalog. Test your XSL stylesheet on the XML document and render the transformed output as a web page.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
    <h2>My Book Wishlist</h2>
    <table border="1">
      <tr bgcolor="#9acd32">
        <th>Title</th>
        <th>Author</th>
      </tr>
      <!-- ENTER YOUR XSL CODE FOR THE FOR-LOOP HERE-->
      <tr>
        <td><!-- ENTER YOUR XSL CODE FOR CONTENT SELECTION HERE--></td>
        <td><!-- ENTER YOUR XSL CODE FOR CONTENT SELECTION HERE--></td>
      </tr>
      <!-- ENTER YOUR XSL CODE TO CLOSE THE LOOP HERE-->
    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>

Tip: As an alternative to xsltproc in Linux and opening the result in a browser, you can use this website to test the XSLT code and this website to render the HTML output.




A regular expression cheat sheet

It’s always good to have a regular expression cheat sheet where you can look up syntax which you don’t remember. I don’t expect you to remember all of the different character combinations by heart, though you should definitely remember the most important ones. Anyway, here’s my cheat sheet from when I took LING123. You can just copy-paste the content into a .txt-file of your choosing:

### REGEX COMMANDS (Perl compatible) ###

# Matching characters to create expressions #
------------------------------------------------------------------------------------------------------------------------
  regex    |                         Explanation                                                                       |
------------------------------------------------------------------------------------------------------------------------
.          | Wildcard. Matches any character (but not newline)
\d         | Digit (0-9)
\D         | Not a digit (0-9)
\w         | Word character (a-z, A-Z, 0-9, _)
\W         | Not a word character
\s         | Whitespace (space, tab, newline)
\S         | Not whitespace (space, tab, newline)
\b         | Word boundary
\B         | Not a word boundary
^          | Beginning of a string
$          | End of a string
[]         | Matches characters in brackets
[^ ]       | Matches characters NOT in brackets
|          | Logical "Either/Or". One side of the vertical bar must be matched.
( )        | Capturing group (regular expression inside parentheses)
\\         | A literal backslash
[\b]       | Backspace (\b means backspace only inside a character class)
\n         | A new line
\r         | Return
\t         | A Horizontal tab
\v         | A vertical tab
c1-c2      | all characters from c1 to c2 in ascending order
*          | 0 or more matches of preceding expression
+          | 1 or more matches of preceding expression
?          | 0 or 1 match of preceding expression
{n}        | Exact number of matches of preceding expression
{n,m}      | Range of matches (minimum, maximum) of preceding expression
{n,}       | n or more matches of preceding expression
[:alnum:]  | all letters and digits
[:alpha:]  | all letters
[:blank:]  | all horizontal whitespace
[:cntrl:]  | all control characters
[:digit:]  | all digits
[:graph:]  | all printable characters, not including space
[:lower:]  | all lower case letters
[:upper:]  | all upper case letters
[:print:]  | all printable characters, including space
[:punct:]  | all punctuation characters
[:space:]  | all horizontal or vertical whitespace
\1         | Equivalent to inserting the text which matched the first group in the expression
\2         | Equivalent to inserting the text which matched the second group in the expression
(?=)       | Positive lookahead. q(?=u) matches a q that is followed by a u, without making the u part of the match.
(?!)       | Negative lookahead. q(?!u) does NOT match a q that is followed by a u.
(?<=regex) | Positive lookbehind.
(?<!regex) | Negative lookbehind.
-----------------------------------------------------------------------------------------------------------------------|
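
A few of the entries above can be tried out directly. The sketch below uses Python's re module, which shares most of this syntax with Perl-compatible regular expressions but does not support the POSIX classes such as [:alnum:]. The example strings are made up for illustration.

```python
import re

# \d and {n}: pull a four-digit number out of a sentence.
year = re.search(r"\d{4}", "LING123 was taught in 2021.").group()

# \b: match "cat" only as a whole word, not inside "concatenate".
whole_words = re.findall(r"\bcat\b", "cat concatenate cat.")

# ( ) and \1: find a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

# (?=...) and (?!...): the q in "quit" is followed by a u,
# the q in "Iraq" is not. We record the position of each matching q.
q_before_u = [m.start() for m in re.finditer(r"q(?=u)", "quit Iraq")]
q_not_before_u = [m.start() for m in re.finditer(r"q(?!u)", "quit Iraq")]

print(year, whole_words, doubled, q_before_u, q_not_before_u)
```

Note that the lookaheads only check what follows the q; the u itself is never part of the match.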


Exercise 2: Regular expressions for Norwegian dates

Write a regular expression for a Norwegian short date, e.g. 10.03.2004 or 10.03.04. The century and the leading zeros of the day and month should be optional.
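
If you want to check your attempts automatically, a small Python harness like the one below may help. It is only a sketch: the pattern NAIVE_DATE is a deliberately too strict starting point and not the solution, and apart from the two dates in the exercise text, the sample strings (and the assumption that a one-digit year should be rejected) are made up for illustration.

```python
import re

# Deliberately too strict starting point: two-digit day and month and a
# four-digit year. It is NOT the solution to the exercise.
NAIVE_DATE = r"\d{2}\.\d{2}\.\d{4}"

# The first two dates come from the exercise text; the rest are made-up
# cases covering the optional leading zeros and some non-dates.
SHOULD_MATCH = ["10.03.2004", "10.03.04", "1.3.2004", "1.3.04"]
SHOULD_NOT_MATCH = ["10/03/2004", "10.03", "10.03.2"]


def check(pattern):
    """Return the sample strings on which the pattern gives the wrong answer."""
    wrong = []
    for s in SHOULD_MATCH:
        if re.fullmatch(pattern, s) is None:
            wrong.append(s)
    for s in SHOULD_NOT_MATCH:
        if re.fullmatch(pattern, s) is not None:
            wrong.append(s)
    return wrong


# Lists the dates the naive pattern still gets wrong.
print(check(NAIVE_DATE))
```

Replace NAIVE_DATE with your own pattern and refine it until check returns an empty list.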


Exercise 3: Matching strings with word combinations

Write a regular expression which matches all strings that contain both "natural" and "language", either consecutively or with other words in between. Match the whole string, not just the part where these two words occur.

Here are some strings you can test with:

1) natural language processing

2) naturally long hair

3) natural-language-processing

4) - "Food which requires any kind of processing is not good for you. You need to eat stuff naturally, right out of the ground", he said.
- "Now you're speaking my language", said the farmer.

5) All natural languages will be extinct in 15 years. Elon Musk will make sure of that.

6) It's natural to want to go outside, be with friends, live your life. But that will have to wait for now.


Exercise 4: JFLAP and regular expressions

Look at the lecture notes on JFLAP.

a) Write the transition table for the non-deterministic variant.

b) Construct these automata in JFLAP and test them with input.

c) The examples all describe the same language, which is finite. Make a new automaton for the infinite language described by the regexp kr\. [12]0*\.00. Test. Also write a transition table for this automaton.

d) Modify the last regexp so that only an even number of 0’s before the decimal point are allowed, e.g. kr. 100.00, kr. 20000.00, etc., but not kr. 20.00. Make an automaton for this language and test.
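
Before building the automaton for part c), it can help to see which strings the regexp itself accepts, since the automaton should accept and reject exactly the same inputs. Here is a minimal Python sketch; the regexp is copied from part c), while the helper name in_language and the test strings not mentioned in the exercise are made up for illustration.

```python
import re

# The regexp from part c), copied from the exercise text.
PRICE = re.compile(r"kr\. [12]0*\.00")


def in_language(s):
    """True if the whole string s belongs to the language of PRICE."""
    return PRICE.fullmatch(s) is not None


samples = ["kr. 1.00", "kr. 20.00", "kr. 100.00", "kr. 20000.00",
           "kr. 30.00", "kr 10.00", "kr. 10.0"]
for s in samples:
    print(s, in_language(s))
```

The first four samples are in the language; the last three are not, because of the leading digit 3, the missing dot after kr, and the missing second decimal. You can reuse the same strings as test input in JFLAP.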

This post is licensed under CC BY 4.0 by the author.