Regular Chatbot

Learning Exercise

Introduction

Regular Expressions

Regular expressions (regexes) are sequences of characters that specify a search pattern in text.

Learning regular expression syntax is beyond the scope of this topic. We will focus on the expressions that jq provides to utilize regexes.

Regex Flavour

Different tools implement different versions of regular expressions. jq incorporates the Oniguruma regex library that is largely compatible with Perl v5.8 regexes.

The specific syntax used by jq version 1.7 can be found on the Oniguruma GitHub repo.

Caution

jq does not have any special syntax for regular expressions. They are simply expressed as strings. That means that any backslashes in the regular expression need to be escaped in the string.

For example, the digit character class (\d) must be written as "\\d".

Regex Functions

Regular expressions in jq are limited to a set of filters.

Simple Matching

When you need to know if a string matches a pattern, use the test filter.

STRING | test(REGEX)
STRING | test(REGEX; FLAGS)
STRING | test([REGEX, FLAGS])

This filter outputs a boolean result.

"Hello World!" | test("W")    # => true
"Goodbye Mars" | test("W")    # => false

Information About the Match

When you need to extract the substring that actually matched the pattern, use the match filter.

STRING | match(REGEX)
STRING | match(REGEX; FLAGS)
STRING | match([REGEX, FLAGS])

This filter outputs:

nothing if there was no match, or
an object containing various properties if there was a match.

This example looks for two identical consecutive vowels by using the backref syntax, \1.

"Hello World!" | match("([aeiou])\\1")
# => empty

"Goodbye Mars" | match("([aeiou])\\1")
# => {
#      "offset": 1,
#      "length": 2,
#      "string": "oo",
#      "captures": [
#        {
#          "offset": 1,
#          "length": 1,
#          "string": "o",
#          "name": null
#        }
#      ]
#    }

The match filter returns an object for each match. This example shows the "g" flag in action to find all the vowels.

"Goodbye Mars" | match("[aeiou]"; "g")
# => { "offset": 1, "length": 1, "string": "o", "captures": [] }
#    { "offset": 2, "length": 1, "string": "o", "captures": [] }
#    { "offset": 6, "length": 1, "string": "e", "captures": [] }
#    { "offset": 9, "length": 1, "string": "a", "captures": [] }

Captured Substrings

Similar to the match filter, the capture filter returns an object if there was a match.

STRING | capture(REGEX)
STRING | capture(REGEX; FLAGS)
STRING | capture([REGEX, FLAGS])

The returned object is a mapping of the named captures.

"JIRAISSUE-1234" | capture("(?<project>\\w+)-(?<issue_num>\\d+)")
# => {
#      "project": "JIRAISSUE",
#      "issue_num": "1234"
#    }

Just the Substrings

The scan filter is similar to match with the "g" flag.

STRING | scan(REGEX)
STRING | scan(REGEX; FLAGS)

scan will output a stream of substrings.

"Goodbye Mars" | scan("[aeiou]")
# => "o"
#    "o"
#    "e"
#    "a"

Use the [...] array constructor to capture the substrings.

"Goodbye Mars" | [ scan("[aeiou]") ]
# => ["o", "o", "e", "a"]

Note

Note that jq v1.6 does not implement the 2-argument scan function, even though the version 1.6 manual says it does:

[version 1.7 source code][src-scan-1.7]
[version 1.6 source code][src-scan-1.6]

[src-scan-1.7]: https://github.com/jqlang/jq/blob/11c528d04d76c9b9553781aa76b073e4f40da008/src/builtin.jq#L92) [src-scan-1.6]: https://github.com/jqlang/jq/blob/2e01ff1fb69609540b2bdc4e62a60499f2b2fb8e/src/builtin.jq#L90)

Splitting a String

If you know the parts of the string you want to keep, use match or scan. If you know the parts that you want to discard, use split.

STRING | split(REGEX; FLAGS)

Caution

The 1-arity split filter treats its argument as a fixed string.

To use a regex with split, you must provide the 2nd argument; it's OK to use an empty string.

An example that splits a string on arbitrary whitespace.

"first   second           third fourth" | split("\\s+"; "")
# => ["first", "second", "third", "fourth"]

Substitutions

The sub and gsub filters can transform the input string, replacing matched portions of the input with a replacement string.

To replace just the first match, use sub. To replace all the matches, use gsub.

STRING | sub(REGEX; REPLACEMENT)
STRING | sub(REGEX; REPLACEMENT; FLAGS)
STRING | gsub(REGEX; REPLACEMENT)
STRING | gsub(REGEX; REPLACEMENT; FLAGS)

"Goodnight kittens. Goodnight mittens." | sub("night"; " morning")
# => "Good morning kittens. Goodnight mittens."

"Goodnight kittens. Goodnight mittens." | gsub("night"; " morning")
# => "Good morning kittens. Good morning mittens."

The replacement text can refer to the matched substrings; use named captures and string interpolation.

"Some 3-letter acronyms: gnu, csv, png"
| gsub( "\\b(?<tla>[[:alpha:]]{3})\\b";     # find words 3 letters long
        "\(.tla | ascii_upcase)" )          # upper-case the match
# => "Some 3-letter acronyms: GNU, CSV, PNG"

Flags

In all the above filters, FLAGS is a string consisting of zero of more of the supported flags.

g - Global search (find all matches, not just the first)
i - Case insensitive search
m - Multi line mode ('.' will match newlines)
n - Ignore empty matches
p - Both s and m modes are enabled
s - Single line mode ('^' -> '\A', '$' -> '\Z')
l - Find longest possible matches
x - Extended regex format (ignore whitespace and comments)

For example

"JIRAISSUE-1234" | capture("(?<project>\\w+)-(?<issue_num>\\d+)")

# or with Extended formatting

"JIRAISSUE-1234" | capture("
                     (?<project>   \\w+ )  # the Jira project
                     -                     # followed by a hyphen
                     (?<issue_num> \\d+ )  # followed by digits
                   "; "x")

Instructions

You have been hired as a Regular Expression Specialist in a company that is developing a Chatbot.

It is in a very basic phase of development. Your mission is to use Regular Expressions to improve the Chatbot's ability to understand and generate natural language.

1. Check Valid Command

Apart from being smart, the Chatbot is also a loyal assistant. To ask the Chatbot something, the user must say the word "Chatbot" in the first position of the command. It doesn't matter if the keyword is in UPPERCASE or lowercase. The important part is the position of the word.

Implement the function is_valid_command that helps the Chatbot recognize when the user is giving a command.

"Chatbot, play a song from the 80's." | is_valid_command
# => true
"Hey Chatbot, where is the closest pharmacy?" | is_valid_command
# => false
"CHATBOT, do you have a solution for this challenge?" | is_valid_command
# => true

2. Remove Encrypted Emojis

The Chatbot has a difficult time understanding how humans use emojis to express their emotions. When the Chatbot receives user messages, each emoji is represented as the string "emoji" followed by an id number.

Implement the remove_emoji method which takes a string and removes all the emoji throughout the message.

Lines not containing emojis should be returned unmodified. Just remove the emoji string. Do not adjust the whitespace.

"I love playing videogames emoji3465 it's one of my hobbies" | remove_emoji
# => "I love playing videogames  it's one of my hobbies"

3. Check Valid Phone Number

At some point in the interaction with the Chatbot, the user will provide a phone number. The Chatbot can only call a number with a specific format.

Implement the check_phone_number function.

If the number is valid, the Chatbot answers with a message thanking the user and confirming the number. If the number is invalid, the Chatbot informs the user that the phone number is not valid.

The expected format is (+NN) NNN-NNN-NNN, where N is a digit.

"chatbot my phone number is (+34) 659-771-594" | check_phone_number
# => "Thanks! Your phone number is OK."
"chatbot, call me at 659-771-594" | check_phone_number
# => "Oops, it seems like I can't reach out to 659-771-594."

4. Get Website Link

The Chatbot is a really curious software. Even though it can search the internet for a particular topic, it likes to ask users about cool websites to visit to find relevant information.

Example conversation:

Chatbot: Hey User, I would like to learn how to code in JavaScript. Do you know any cool website where I could learn?

User: I learned a lot from exercism.org, there's lots of great stuff there.

Implement the function get_domains which returns an array of website domains.

"I learned a lot from exercism.org and google.com" | get_domains
# => ["exercism.org", "google.com"]

5. Greet the User

A polite Chatbot will speak to users by name. When a user introduces themselves, our Chatbot will detect their name and respond with a friendly greeting.

Write the function nice_to_meet_you.

If the input string contains "My name is Someone.", capture the name and return the string "Nice to meet you, Someone.".

"My name is Jean-Luc" | nice_to_meet_you
# => "Nice to meet you, Jean-Luc"

6. Very Simple CSV Parsing

Yielding to "creeping featuritis", we'll add a CSV parsing function to the Chatbot.

Implement the parse_csv function that takes a string and returns an array of the resulting fields. The field separator should be "comma plus optional whitespace".

We won't worry about any of the edge cases with the CSV format (such as fields containing commas).

"first, second,third,   fourth" | parse_csv
# => ["first", "second", "third", "fourth"]

Edit via GitHub

Language Tracks

#48in24 Challenge

Your Journey

Exercism Perks

Community Videos

Brief Introduction Series

Interviews & Stories

Discord

Forum

Getting started

Mentoring

Docs

Contributors

Donate

About Exercism

Our Impact

Insiders

SWAG