CPSC 231: Assignment 3
Lines and Spaces and Tabs, Oh My!

Due Friday Nov 7 at 4 PM

New Concepts to be applied for the assignment


Glossary

command line arguments:
Inputs given to a program at the same time that it is run. Examples of working with command line arguments can be found in the 'tutorials' directory under the subdirectory "oct19_25/assign_like".
command line qualifiers:
Command line qualifiers are a type of command line argument that modifies how a program executes, e.g., "ls -a -l".  Command line qualifiers typically are prefixed with a "-" to distinguish them from other command line arguments, but occasionally they are prefixed with a different character (such as "+", which is used in this assignment).
pass:
The "no-op" command for Python. When encountered, the interpreter literally "does nothing". It is often used as a 'filler' where a command is syntactically required (e.g., in empty functions).
prefix spaces/tabs:
The spaces or tabs used to indent a line of code in a python program; spaces and tabs that precede the instructions on a particular line of a Python program. In the example below <SP> represents a space, <PSP> represents prefix spaces, <Tab> represents a tab, and <PTab> represents a prefix tab:
    if (x > 0):
    <PTab><PSP><PSP>print<SP>("Positive")<Tab># printed only if x > 0
printable text:
All the ASCII characters that can be seen.  This is most of the characters with an ASCII value greater than 32.  See also white space.
redirection:
On both UNIX systems and Windows, a "<" followed by a file name redirects standard input to be from that file; a ">" followed by a file name redirects standard output to that file.  For example, the command line:
    % ls > temp.txt
will cause the file temp.txt to be created containing the text listing the current directory.  If this concept is not familiar to you, see the more detailed explanation.
Also you can refer to your tutorial notes for the week of Oct 12 - 18.
standard input:
The source from which the Python input() function reads.  This is normally the terminal (keyboard), but it could be redirected to come from a file. Here is an example that illustrates how the standard input can be redirected from a file.
standard output:
The place to which the Python print() function writes.  This is normally the terminal (screen), but it could be redirected to a file. Here is an example that illustrates how the standard output can be redirected to a file.
white space:
The ASCII characters that can't be seen, but usually have some effect.  This includes all the characters with ASCII value less than 33, and includes space, tab, CR, LF, etc. 

Introduction

Python is a fairly unusual language in that it uses indentation as part of its syntax.  But tabs and space characters can complicate things.  A single tab may look identical to a sequence of spaces when the program is printed or displayed on the screen, but the Python interpreter may see the two as very different levels of indentation.  This issue can lead to very difficult-to-find bugs in programs.  What we need are tools to 1) allow us to actually see the tabs and spaces as printable characters, and 2) detect and intelligently convert tabs to spaces or spaces to tabs.  The second problem is compounded because tabs can be "equivalent to" any number of spaces (typically 2, 3, 4, or 8 spaces are "equivalent to" a tab).


Assignment Description

Write a multi-functional UNIX-style utility program to help visualize and fix problems with tabs and spaces in Python programs.  This program will process Python program files in the following ways:
  1. Change tabs in the indenting to spaces
  2. Change spaces in the indenting to tabs
  3. Substitute spaces, tabs, and newlines for printable characters, maintaining formatting
  4. Undo 3.

Obviously, these 4 functions will not always be used at the same time.  Therefore, we will use command-line qualifiers to allow the user to specify the subset of functionalities he/she wants.  We do this in UNIX-eze, where we use a minus sign to introduce a short qualifier.  If our program is called "tabs" and [] represents optional:

  tabs [+t] [-t] [-T<integer>] [+v] [-v] [-help]

    +t           replaces prefix sequences of spaces of length T with a single tab
    -t           replaces prefix tabs with sequences of T spaces
    -T<integer>  the <integer> defines the space-to-tab ratio, T (default=4)
    +v           changes all spaces, tabs, and newlines to printable (visible) characters
    -v           undoes the effects of +v
    -help        prints out help text

(More detailed explanation and examples for these command-line qualifiers are provided in the text that follows e.g., "+v" and "-v" are explained under "Point D".)

The program will take its input from standard input (i.e., the user), and output its results to standard output (i.e., the screen).  That will allow us to type in Python commands from the console and get the results back immediately after we finish typing a line.  The program continues to read input until it encounters EOF (End-Of-File), which is a ctrl-D character on UNIX, Linux, and Mac (and, I believe, a ctrl-Z-return sequence in Windows).

But this kind of program is most useful when it can read a file and place its output in a different file.  So typically, this program will be run with redirected ("<") input from a file, and possibly redirected (">") output to a file.  These two command-line operators temporarily redirect input (output) so that it appears you typed in a file (or the output goes to a file, respectively).

 For example:

   $ python3 tabs.py +v +t -T4 < A3.py > temp.out         # 4 spaces becomes 1 tab (results in temp.out)
will change prefix space sequences of length 4 to tabs in the input file (A3.py), and save the result in temp.out.  Then, if we do:
   $ python3 tabs.py -v -t -T4 < temp.out > temp2.out    # 1 tab becomes 4 spaces (results in temp2.out)
we should find (assuming the original file was properly indented by 4s with spaces only) that:
   $ diff A3.py temp2.out
finds no differences (diff is a UNIX utility that can be used to determine if there are any 'differences' between files).

Functional requirements

A.  Command-line interface requirements

  1. Qualifiers +t and -t are incompatible.  If both appear on the command line, it is an error, and you should print "Qualifiers +t and -t cannot both be used together."
  2. Qualifiers +v and -v are incompatible.  If both appear on the command line, it is an error, and you should print "Qualifiers +v and -v cannot both be used together."
  3. If any argument on the command line is not recognized (whether it is prefixed by "-" or not), it is an error, and you should print "Unrecognized argument: " followed by the not-recognized argument.
  4. The -T qualifier must be followed immediately by an integer.  If it is not, it is an error and the program should print out "The -T qualifier must be immediately followed by an integer: -Txxx", where "xxx" is the offending text (e.g., entering -T12.5 would display the message "The -T qualifier must be immediately followed by an integer: -T12.5").
  5. The integer given after the -T qualifier must be between 2 and 8 inclusive. If it is not, it is an error and the program should print out "Tab sizes greater than 8 or less than 2 are disallowed : -Txxx", where "xxx" is the offending integer.
  6. If there is an error on the command line (see points 1 to 5), then the program should not run.  But ALL applicable error messages (from points 1 to 5) should be printed, then the following helpful information (synopsis) should be printed to remind the user how to properly use the program.
    Synopsis:
       tabs [+t] [-t] [-T<integer>] [+v] [-v] [-help]
         +t    -replaces prefix sequences of spaces of length T with a single tab 
         -t    -replaces prefix tabs with sequences of T spaces
         -T<integer> -the <integer> defines the space-to-tab ratio, T (default=4) 
         +v    -changes all spaces, tabs, and newlines to printable (visible) characters 
         -v    -undoes the effects of +v 
         -help -prints out this help text 
      +t and -t are incompatible 
      +v and -v are incompatible 
  7. If the ONLY command-line qualifier is -help, then the program should print the help text (see the help example below for the help text) and exit.
  8. If NO command-line qualifiers are present then the program just copies its input to output.
  9. The -help qualifier will be recognized in any case (all caps, all lower, or any mix) and may be abbreviated -h, -he, or -hel (in any case).  There is no abbreviation or case tolerance for any of the other qualifiers.
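
One possible way to organize the checks in points 1 to 9 is sketched below. It is only an illustration, not the instructors' solution: the function name parse_qualifiers and the dictionary it returns are made up for this example, and it assumes that anything other than plain decimal digits after -T (including a minus sign) counts as "not an integer". Errors are collected in a list so that ALL applicable messages can be printed before the synopsis.

```python
def parse_qualifiers(args):
    """Hypothetical helper: check the qualifiers in args (a list of strings).

    Returns (options, errors): a dictionary of settings and a list of
    error messages (empty if the command line is valid).
    """
    options = {"+t": False, "-t": False, "+v": False, "-v": False,
               "help": False, "T": 4}
    errors = []
    for arg in args:
        if arg in ("+t", "-t", "+v", "-v"):
            options[arg] = True
        elif arg.startswith("-T"):
            text = arg[2:]                      # whatever follows "-T"
            if not text.isdecimal():            # assumption: digits only
                errors.append("The -T qualifier must be immediately "
                              "followed by an integer: " + arg)
            elif not 2 <= int(text) <= 8:
                errors.append("Tab sizes greater than 8 or less than 2 "
                              "are disallowed : " + arg)
            else:
                options["T"] = int(text)
        elif len(arg) >= 2 and "-help".startswith(arg.lower()):
            options["help"] = True              # -h, -he, -hel, -help (any case)
        else:
            errors.append("Unrecognized argument: " + arg)
    if options["+v"] and options["-v"]:
        errors.append("Qualifiers +v and -v cannot both be used together.")
    if options["+t"] and options["-t"]:
        errors.append("Qualifiers +t and -t cannot both be used together.")
    return options, errors
```

In a real program the list passed in would be everything in sys.argv after the program name. If the returned error list is not empty, the program would print each message, then the synopsis, and stop.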

B.  +t requirements

  1. Space replacement occurs ONLY on the indenting white space on a line, and stops as soon as the first printable character is encountered.
  2. Tab stops are based on the -T<integer> qualifier which defaults to -T4.  
  3. Tab stops are treated like tab stops: it's not the case that you can simply replace all sequences of N spaces by a tab.  For example, assuming -T4, the line "<space><space><tab>pass" will become "<tab>pass".
  4. Extra spaces at the end of the indentation sequence (but before the first printable character) should be left there.  For example, assuming -T4, the line "<space><space><space><space><space><space>pass" will become "<tab><space><space>pass". If this isn't clear to you, see Tabs in text editors and word processors.
  5. It is not necessarily true that a -t operation applied to a file followed by a +t operation applied to the resulting file will result in the original file and output being textually identical.  However this sequence WILL yield textually identical files if the original file was indented using tabs only.  For example, the following command will yield identical files tabs.py and temp.txt if tabs.py was indented with tabs only:
        python3 tabs.py -t <tabs.py | python3 tabs.py +t >temp.txt
    If you don't understand the pipe ("|") above or you're running on Windows (where pipes don't work), the following commands are equivalent (but produce an intermediate file):
        python tabs.py -t < tabs.py > out1.txt  # tabs removed and effects saved in out1.txt
        python tabs.py +t < out1.txt > temp.txt # tabs put back and effects saved in temp.txt
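
The tab-stop behaviour described in points 3 and 4 can be sketched as follows. This is one possible approach rather than a required one, and the name spaces_to_tabs is invented for the example. It walks through the indentation one character at a time, keeps track of the current column, and emits a tab each time a tab stop is reached.

```python
def spaces_to_tabs(line, tab_size=4):
    """Hypothetical helper: convert the indentation of one line to tabs."""
    tabs = ""        # the tabs emitted so far
    pending = 0      # spaces seen since the last tab stop
    column = 0       # current column within the indentation
    i = 0
    while i < len(line) and line[i] in " \t":
        if line[i] == "\t":
            # A tab jumps to the next tab stop and absorbs any pending spaces.
            column += tab_size - column % tab_size
            tabs += "\t"
            pending = 0
        else:
            column += 1
            pending += 1
            if column % tab_size == 0:   # reached a tab stop using spaces only
                tabs += "\t"
                pending = 0
        i += 1
    # Left-over spaces (fewer than tab_size) stay in front of the text.
    return tabs + " " * pending + line[i:]
```

With the default tab size, six spaces followed by "pass" come back as one tab, two spaces, and "pass", which matches the example in point 4.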

C.  -t requirements

  1. Tab replacement occurs ONLY on the indenting white space on a line, and stops as soon as the first printable character is encountered.
  2. Tab stops are based on the -T<integer> qualifier which defaults to -T4.  
  3. Tab stops are treated like tab stops: it's not the case that you can simply replace all tabs by a sequence of N spaces.  For example, assuming -T4, the line "<space><space><tab>pass" will become "<space><space><space><space>pass".  If this isn't clear to you, see Tabs in text editors and word processors.
  4. It is not necessarily true that a +t operation applied to a file followed by a -t operation applied to the resulting file will result in the original file and output being textually identical.  However this sequence WILL yield textually identical files if the original file was indented using spaces only. For example, the following command will yield identical files tabs.py and temp.txt if tabs.py was indented with spaces only:
        python3 tabs.py +t <tabs.py | python3 tabs.py -t >temp.txt
    If you don't understand the pipe ("|") above or you're running on Windows (where pipes don't work), the following commands are equivalent (but produce an intermediate file):
        python tabs.py +t < tabs.py > out1.txt  # spaces replaced by tabs and effects saved in out1.txt
        python tabs.py -t < out1.txt > temp.txt # spaces put back and effects saved in temp.txt
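
A minimal sketch of the -t behaviour, assuming the built-in string method expandtabs() is used for the tab-stop arithmetic (the name tabs_to_spaces is made up for this example). Only the indentation is expanded; tabs that appear after the first printable character are left alone, as point 1 requires.

```python
def tabs_to_spaces(line, tab_size=4):
    """Hypothetical helper: convert the indentation of one line to spaces."""
    body = line.lstrip(" \t")                  # text after the indentation
    prefix = line[:len(line) - len(body)]      # the indentation itself
    # expandtabs() pads each tab out to the next multiple of tab_size.
    return prefix.expandtabs(tab_size) + body
```

Because the indentation always starts in column 0, expanding it separately from the rest of the line gives the same tab stops as expanding it in place.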

D.  +v requirements

  1. All space (ASCII code 32) characters will be replaced by a "·" (extended ASCII code 183) character.
  2. All tab characters (ASCII code 9) will be replaced by a "»" (extended ASCII code 187) character, followed by a tab character (ASCII code 9, to preserve formatting).
  3. All lines should end with a "¶" (extended ASCII code 182) character, followed by a real newline character ("\n", to preserve formatting).
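
The three substitutions above could be sketched like this. The function name make_visible is hypothetical, and the sketch assumes the line has already had its newline removed (as is the case for a line returned by input()), so that print() supplies the real newline afterwards.

```python
SPACE_CHAR = chr(183)     # raised dot, stands in for a space
TAB_CHAR = chr(187)       # ">>" character, placed in front of each tab
NEWLINE_CHAR = chr(182)   # backwards P, marks the end of the line

def make_visible(line):
    """Hypothetical helper: apply the +v substitutions to one line."""
    line = line.replace(" ", SPACE_CHAR)
    line = line.replace("\t", TAB_CHAR + "\t")   # keep the real tab for layout
    return line + NEWLINE_CHAR
```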

E.  -v requirements

  1. -v should exactly undo all formatting done by +v.
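
A sketch of the reverse direction, under the assumption that the original text did not itself contain any of the three substitution characters (if it did, they could not be told apart from the ones +v added). The name make_invisible is invented for the example.

```python
SPACE_CHAR = chr(183)     # raised dot, stands in for a space
TAB_CHAR = chr(187)       # ">>" character, placed in front of each tab
NEWLINE_CHAR = chr(182)   # backwards P, marks the end of the line

def make_invisible(line):
    """Hypothetical helper: undo the +v substitutions on one line."""
    if line.endswith(NEWLINE_CHAR):
        line = line[:-1]                          # drop the end-of-line marker
    line = line.replace(TAB_CHAR + "\t", "\t")    # marker + tab -> tab
    line = line.replace(SPACE_CHAR, " ")
    return line
```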

F.  functionality interaction requirements

  1. Assume file F has been processed by a +v operation.  A -t or +t operation acting on a file F should produce output identical to file F (because the first character on every line will be a printable character).  However, if a -v qualifier is also included on the command line, the output should be as if F had never been processed with the +v qualifier (because the -v qualifier will have removed the effect of the +v qualifier before the rest of the processing takes place).
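
The reasoning in point 1 can be checked with a short experiment. The helper prefix_length below is hypothetical; it simply counts the leading spaces and tabs on a line. A line that has been through +v starts with a printable character, so it has no prefix left for -t or +t to work on.

```python
SPACE_CHAR = chr(183)     # visible form of a space
NEWLINE_CHAR = chr(182)   # visible end-of-line marker

def prefix_length(line):
    """Hypothetical helper: number of leading space and tab characters."""
    return len(line) - len(line.lstrip(" \t"))

plain = "    pass"                                  # 4 real spaces of indent
visible = SPACE_CHAR * 4 + "pass" + NEWLINE_CHAR    # the same line after +v
```

Here prefix_length(plain) is 4 but prefix_length(visible) is 0, which is why -v has to be applied before -t or +t when both are requested.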

Non-functional Requirements

  1. Your program must be written in Python 3 and function properly on the UNIX machines in the CPSC undergrad lab.
  2. This assignment is about functions and program decomposition, so besides your main() function, and the getInput() function (given below), it is expected you will create AT LEAST 4 other functions (at least one function for each of the -v, +v, -t, and +t functionalities).  (The instructors' solution makes use of 8 additional functions, which is not to say that yours should necessarily use 8 -- it could use more or fewer.)

Examples

In the following examples, "%" is used for the operating system (O/S) prompt, and colour is used as follows:

blue
user-typed commands or input
black
text displayed by the O/S or program
red
documentation that explains what's happening during program execution

No-qualifier example (with input redirection)

% python3 tabs.py < A1.py     # Redirect standard input from file A1.py.
# Your name: Rob Kremer       # The output is an exact copy of A1.py 
# Student ID: 00999888
# Tutorial #: 05
'''
Created on Aug 25, 2014

@author: kremer
'''
...
print()
print("Weighted mini assignment grade "+"%1.2f"%miniAssn) 
print("Weighted assignment grade "+"%1.2f"%assn) 
print("Weighted midterm grade "+"%1.2f"%midterm) 
print("Weighted final exam grade "+"%1.2f"%final) 
print("Weighted term grade "+"%1.2f"%(miniAssn+assn+midterm+final))
%                             # note that the program terminates on its own after reading the input file

+v example

% python3 tabs.py +v < A1.py  # redirect standard input from file A1.py
#·Your·name:·Rob·Kremer¶      # The output is a copy of A1.py with prefix spaces changed to "·", etc.
#·Student·ID:·00999888¶
#·Tutorial·#:·05¶
'''¶
Created·on·Aug·25,·2014¶
¶
@author:·kremer¶
'''¶
¶
...
print()¶
print("Weighted·mini·assignment·grade·"+"%1.2f"%miniAssn)·¶
print("Weighted·assignment·grade·"+"%1.2f"%assn)·¶
print("Weighted·midterm·grade·"+"%1.2f"%midterm)·¶
print("Weighted·final·exam·grade·"+"%1.2f"%final)·¶
print("Weighted·term·grade·"+"%1.2f"%(miniAssn+assn+midterm+final))¶
%                             # note that the program terminates on its own after reading the input file

+t and -t examples (and some +v and -T) 

Note that in these examples, my terminal has an 8-space tab (usually the default for terminal programs), whereas the tabs program has a default 4-space tab.

% python3 tabs.py +t
        pass               # input: 8-space indent (which is 2 default 4-space tabs)
		pass       # output: 2 tabs (8 spaces each on the terminal)
      pass                 # input: 6-space indent 
	  pass             # output: a tab and 2 spaces
  	pass               # input: 2 spaces and a tab
	pass               # output: 1 tab
<ctrl-D>                           
% python3 tabs.py +t -T8           # same user input as above
        pass
	pass               # output: 1 tab
      pass
      pass                 # output: 6 spaces
  	pass
	pass               # output: 1 tab
<ctrl-D>
% python3 tabs.py +t +v            # these 2 runs are exactly the same as the above 2, but with +v to show what's going on
        pass
»	»	pass¶
      pass
»	··pass¶
  	pass
»	pass¶
<ctrl-D>
% python3 tabs.py +t +v -T8
        pass
»	pass¶
      pass
······pass¶
  	pass
»	pass¶
<ctrl-D>
% python3 tabs.py -t +v
		pass       # input: 2 tabs (8 spaces each on the terminal)
········pass¶
	  pass             # input: a tab and 2 spaces
······pass¶
  	pass               # input: 2 spaces and a tab
····pass¶
<ctrl-D>
%

-help example

% python3 tabs.py -HeLp

This program can process python program files in the following ways:
1. Change tabs in the indenting to spaces
2. Change spaces in the indenting to tabs
3. Substitute spaces, tabs, and newlines for printable characters, maintaining formatting
4. Undo 3.
see the synopsis (below) for details on the command line
interface. 

Typically, this program will be run with redirected "<" input from a file, and possibly
redirected ">" output to a file.  For example:
 
   $ python3 A3.py +v +t -T4 < A3.py > temp.out
 
will change prefix space sequences of length 4 to tabs in the text of the input file("A3.py"), 
and save it in "temp.out". Then, if we do:
 
   $ python3 A3.py -v -t -T4 < temp.out > temp2.out
 
we should find (assuming the original file was properly indented by 4s with spaces only) that:
 
   $ diff A3.py temp2.out
 
finds no differences (diff is a UNIX program).

Synopsis:
  tabs [+t] [-t] [-T<integer>] [+v] [-v] [-help]
    +t    -replaces prefix sequences of spaces of length T with a single tab
    -t    -replaces prefix tabs with sequences of T spaces
    -T<integer> -the <integer> defines the space-to-tab ratio, T (default=4)
    +v    -changes all spaces, tabs, and newlines to printable (visible) characters
    -v    -undoes the effects of +v
    -help -prints out this help text
+t and -t are incompatible
+v and -v are incompatible
%    # the program terminates because only the help text was requested

Error example

% python3 A3.py -t +t +v -v -V
Unrecognized argument: -V
Qualifiers +v and -v cannot both be used together.
Qualifiers +t and -t cannot both be used together.

Synopsis:
  tabs [+t] [-t] [-T<integer>] [+v] [-v] [-help]
    +t    -replaces prefix sequences of spaces of length T with a single tab
    -t    -replaces prefix tabs with sequences of T spaces
    -T<integer> -the <integer> defines the space-to-tab ratio, T (default=4)
    +v    -changes all spaces, tabs, and newlines to printable (visible) characters
    -v    -undoes the effects of +v
    -help -prints out this help text
+t and -t are incompatible
+v and -v are incompatible
%    # the program terminated because there were command-line errors

Hints & Help

This section is meant to help you with your program.  You CAN cut-and-paste the code given here into your code without citing it without fear of penalty.  Do not cut-and-paste any other code without citing though!

Defining your program

Since this is a fairly complex program, your TA may use a testing harness to verify that your program will run correctly.  Therefore you must make it possible for another program to include and run your program, so all your code that you normally run when you invoke your program from the command line should be embedded within a main program function.  In addition, you want to leave calling the main program to any test harness, but still call your program if the file is invoked directly from the command line.  To do that, use the following paradigm:

# imports go here

# global constants go here

# all your function definitions go here

def main(): # Or start()
    pass # Body of main() function goes here

if __name__ == "__main__": # true only when run directly, not when imported by a test harness
    main()

Command-line input

You will need to gather the user-specified qualifiers from the command line.  To do that you need to use sys.argv from the system library.  To import from the system library use the following import statement:

import sys # needed to collect command-line arguments (sys.argv)

sys.argv is a sequence containing the command line arguments.  The first element of sys.argv is the program name, and the remainder are the "words" (arguments) the user typed after the program name.  These do NOT include expressions (such as redirections ["<", ">", and ">>"] and pipes ["|"]) interpreted by the command-line processor.  Thus, the paradigm for reading command lines is as follows:

    firstArg = True
    for arg in sys.argv:
        if firstArg: # the first argument is always the program name, so ignore it
            firstArg = False
        elif (arg=="-t"):
            ...
        # ... more elif branches for the other qualifiers go here
        else: # if we got here, then we didn't recognize the argument
            ...
Examples of working with command line arguments can be found in the "tutorials" directory under the subdirectory "oct19_25/assign_like"

Input

The standard Python input() function does not handle EOF (End-Of-File) (the ctrl-D character) gracefully: If you type a ctrl-D in response to an input(), it will throw an exception (i.e., a run-time error and 'crash') and your program will terminate.  The problem is easily solved, but we have not covered that yet.  So use the following function in lieu of the standard input() function:

def getInput():
    """This function works exactly like input() (with no arguments), except that,
    instead of throwing an exception when it encounters EOF (End-Of-File), 
    it will return an EOF character (chr(4)).
    Returns: a line of input or EOF if EOFError occurs during input.
    """ 
    try:
        ret = input()
    except EOFError:
        ret = EOF
    return ret

You might not understand this code since it includes concepts that we haven't covered yet (such as exceptions).  That's okay for now -- just make sure you understand how to use it. Also note that, unlike Python's input(), it does not take any parameters. That's OK: because this program might be taking its input from a file, we should not prompt for each line of input.

An example of using the getInput() function can be found under the 'tutorials' directory under the subdirectory "oct19_25/assign_like" and the program name is display_sentences.py.
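
As a rough sketch of how getInput() might be used, the loop below copies every line of input to output until EOF is reached, which is the behaviour required when no qualifiers are given. The function name copy_lines and its line-count return value are made up for this example; getInput() and EOF are repeated only so that the sketch runs on its own.

```python
EOF = chr(4)  # End-Of-File marker returned by getInput()

def getInput():
    """Like input() with no arguments, but returns EOF instead of raising."""
    try:
        ret = input()
    except EOFError:
        ret = EOF
    return ret

def copy_lines():
    """Hypothetical helper: echo each input line until EOF; return the count."""
    count = 0
    line = getInput()
    while line != EOF:
        print(line)          # print() puts back the newline input() removed
        count += 1
        line = getInput()
    return count
```

The same loop shape works for the other qualifiers: transform the line just before the call to print().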

Substitutions for tab, space and newline:

The "visible" forms of the newline, tab, and space characters require the use of the chr() function. This function takes an integer as a parameter (which should be an ASCII code) and returns the character for that code.  For example chr(65) would return capital letter 'A' because the ASCII code for this character is 65. You can refer to the program [chr_example.py] to get a feel for how this function works.
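
For instance, the following few lines show chr() together with the built-in ord() function, which goes the other way (from a character to its code). The variable names are arbitrary.

```python
capital_a = chr(65)      # the character whose code is 65
code_of_a = ord("A")     # the code of the character 'A'
raised_dot = chr(183)    # the raised dot used to make spaces visible
```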

There are substitution characters for tab, space, and newline, as specified in requirements D1, D2 and D3.  The getInput() function also requires an EOF character.  These can be defined as constants as follows:
EOF = chr(4)            # A standard End-Of-File character (ascii value 4)
TAB_CHAR = chr(187)     # A ">>" character (as a single character) in the extended ascii set
                        #   Used to make a tab character visible.
SPACE_CHAR = chr(183)   # A raised dot character in the extended ascii set
                        #   Used to make a space character visible
NEWLINE_CHAR = chr(182) # A backwards P character in the extended ascii set
                        #   Used to make a newline character visible

Submitting your work:

  1. Assignments (the source code/'dot-py' file) must be electronically submitted according to the assignment submission requirements using D2L.
  2. As a reminder, you are not allowed to work in groups for this class. Copying the work of another student will be regarded as academic misconduct (cheating).  For additional details about what is and is not okay for this class please refer to the notes on misconduct for this course.
  3. Before you submit your assignment here is a [checklist] of items to be used in marking.

Using pre-written Python libraries

For this assignment, you can use any of the Python built-in functions.  You may also use any of the built-in string methods; you might find <str>.replace(), <str>.lstrip(), <str>.expandtabs(), <str>.upper(), and the 'slicing' [<start> : <end>] operator particularly useful.
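
A quick illustration of what each of these does, using a sample line indented with one tab and two spaces. The variable names are arbitrary and the example is only meant to show the methods, not how they must be combined in your solution.

```python
line = "\t  pass"                            # one tab and two spaces of indent

body = line.lstrip(" \t")                    # leading spaces and tabs removed
prefix = line[0 : len(line) - len(body)]     # slicing: just the indentation
widened = prefix.expandtabs(4)               # tab stops every 4 columns
dotted = widened.replace(" ", ".")           # every space replaced by a dot
shouted = "-HeLp".upper()                    # all letters converted to capitals
```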