Python Utilities

In this section, we look at a few of Python's many standard utility modules to solve common problems.

File System -- os, os.path, shutil

The *os* and *os.path* modules include many functions to interact with the file system. The *shutil* module can copy files.

  • os module docs
  • filenames = os.listdir(dir) -- list of filenames in that directory path (not including . and ..). The filenames are just the names in the directory, not their absolute paths.
  • os.path.join(dir, filename) -- given a filename from the above list, use this to put the dir and filename together to make a path
  • os.path.abspath(path) -- given a path, return an absolute form, e.g. /home/nick/foo/bar.html
  • os.path.dirname(path), os.path.basename(path) -- given dir/foo/bar.html, return the dirname "dir/foo" and basename "bar.html"
  • os.path.exists(path) -- true if it exists
  • os.mkdir(dir_path) -- makes one dir, os.makedirs(dir_path) makes all the needed dirs in this path
  • shutil.copy(source-path, dest-path) -- copy a file (dest path directories should exist)
## Example pulls filenames from a dir, prints their relative and absolute paths
def printdir(dir):
  filenames = os.listdir(dir)
  for filename in filenames:
    print (filename)  ## foo.txt
    print (os.path.join(dir, filename)) ## dir/foo.txt (relative to current dir)
    print (os.path.abspath(os.path.join(dir, filename))) ## /home/nick/dir/foo.txt

Exploring a module works well with the built-in python help() and dir() functions. In the interpreter, do an "import os", and then use these commands look at what's available in the module: dir(os), help(os.listdir), dir(os.path), help(os.path.dirname).

Using the subprocess Module

The recommended approach to invoking subprocesses is to use therun()function for all use cases it can handle. For more advanced use cases, the underlyingPopeninterface can be used directly.

The run()function was added in Python 3.5.

subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, capture_output=False, shell=False, cwd=None, timeout=None, check=False, encoding=None, errors=None, text=None, env=None, universal_newlines=None)

Run the command described by args. Wait for command to complete, then return a CompletedProcess instance.

The arguments shown above are merely the most common ones (hence the use of keyword-only notation in the abbreviated signature). The full function signature is largely the same as that of the Popen constructor - most of the arguments to this function are passed through to that interface. (timeout, input, check, and capture_output are not.)

If capture_output is true, stdout and stderr will be captured. When used, the internal Popen object is automatically created with stdout=PIPE and stderr=PIPE. The stdout and stderr arguments may not be supplied at the same time as capture_output. If you wish to capture and combine both streams into one, use stdout=PIPE and stderr=STDOUT instead of capture_output.

The timeout argument is passed toPopen.communicate(). If the timeout expires, the child process will be killed and waited for. The TimeoutExpired exception will be re-raised after the child process has terminated.

The input argument is passed toPopen.communicate() and thus to the subprocess’s stdin. If used it must be a byte sequence, or a string if encoding or errors is specified or text is true. When used, the internal Popen object is automatically created with stdin=PIPE, and the stdin argument may not be used as well.

If check is true, and the process exits with a non-zero exit code, a CalledProcessErrorexception will be raised. Attributes of that exception hold the arguments, the exit code, and stdout and stderr if they were captured.

If encoding or errors are specified, or text is true, file objects for stdin, stdout and stderr are opened in text mode using the specified encoding and errors or the io.TextIOWrapper default. The universal_newlines argument is equivalent to text and is provided for backwards compatibility. By default, file objects are opened in binary mode.

If env is not None, it must be a mapping that defines the environment variables for the new process; these are used instead of the default behavior of inheriting the current process’ environment. It is passed directly to Popen.

Examples:

>>> subprocess.run(["ls", "-l"])  # doesn't capture output
CompletedProcess(args=['ls', '-l'], returncode=0)

>>> subprocess.run("exit 1", shell=True, check=True)
Traceback (most recent call last):
  ...
subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1

>>> subprocess.run(["ls", "-l", "/dev/null"], capture_output=True)
CompletedProcess(args=['ls', '-l', '/dev/null'], returncode=0,
stdout=b'crw-rw-rw- 1 root root 1, 3 Jan 23 16:23 /dev/null\n', stderr=b'')

Exceptions

An exception represents a run-time error that halts the normal execution at a particular line and transfers control to error handling code. This section just introduces the most basic uses of exceptions. For example a run-time error might be that a variable used in the program does not have a value (ValueError .. you've probably seen that one a few times), or a file open operation error because a file does not exist (IOError). Learn more in the exceptions tutorial and see the entire exception list.

Without any error handling code (as we have done thus far), a run-time exception just halts the program with an error message. That's a good default behavior, and you've seen it many times. You can add a "try/except" structure to your code to handle exceptions, like this:

  try:
    ## Either of these two lines could throw an IOError, say
    ## if the file does not exist or the read() encounters a low level error.
    f = open(filename, 'rU')
    text = f.read()
    f.close()
  except IOError:
    ## Control jumps directly to here if any of the above lines throws IOError.
    sys.stderr.write('problem reading:' + filename)
  ## In any case, the code then continues with the line after the try/except

The try: section includes the code which might throw an exception. The except: section holds the code to run if there is an exception. If there is no exception, the except: section is skipped (that is, that code is for error handling only, not the "normal" case for the code). You can get a pointer to the exception object itself with syntax "except IOError as e: .." (e points to the exception object).

HTTP -- urllib and urlparse

The module *urllib* provides url fetching -- making a url look like a file you can read from. The *urlparse* module can take apart and put together urls.

  • urllib module docs
  • ufile = urllib.urlopen(url) -- returns a file like object for that url
  • text = ufile.read() -- can read from it, like a file (readlines() etc. also work)
  • info = ufile.info() -- the meta info for that request. info.gettype() is the mime type, e.g. 'text/html'
  • baseurl = ufile.geturl() -- gets the "base" url for the request, which may be different from the original because of redirects
  • urllib.urlretrieve(url, filename) -- downloads the url data to the given file path
  • urlparse.urljoin(baseurl, url) -- given a url that may or may not be full, and the baseurl of the page it comes from, return a full url. Use geturl() above to provide the base url.

In Python 3, urllib and urllib2 are merged into urllib.request, and urlparse becomes urllib.parse. All of their exceptions are in urllib.error.

## Given a url, try to retrieve it. If it's text/html,
## print its base url and its text.
def wget(url):
  ufile = urllib.urlopen(url)  ## get file-like object for url
  info = ufile.info()   ## meta-info about the url content
  if info.gettype() == 'text/html':
    print ('base url:' + ufile.geturl())
    text = ufile.read()  ## read all its text
    print (text)

The above code works fine, but does not include error handling if a url does not work for some reason. Here's a version of the function which adds try/except logic to print an error message if the url operation fails.

## Version that uses try/except to print an error message if the
## urlopen() fails.
def wget2(url):
  try:
    ufile = urllib.urlopen(url)
    if ufile.info().gettype() == 'text/html':
      print (ufile.read())
  except IOError:
    print 'problem reading url:', url

Exercise

To practice the file system and external-commands material, see the Copy Special Exercise. To practice the urllib material, see the Log Puzzle Exercise.