Documentation Keywords

Each patch in the tapestry of knowledge has associated with it a few key words or phrases. Once the keywords are known, searching the knowledge becomes much easier.

I use GIMP to help me find colors for my webpages. But GIMP shows color values in base 10, while HTML uses base 16. I got tired of doing the conversions by hand, so I wrote a pair of simple Python programs called 2hex.py and hex2.py. While writing this code, I had a familiar experience: trouble finding what I wanted in the documentation. Python documentation is some of the best around, but these problems still occur. Here is the tale:

It is easy to convert an integer into a string in hex format.

    i = 123
    print i, hex(i), '%X' % i

But how do I convert a string in hex format into an integer? First I try int(), which fails:

    try:
        i = int('0x3a')
    except Exception, x:
        print str(x)

Perhaps the built-in function hex has an opposite. I go to the Library Reference, 2.2 Built-in Functions, and find nothing. Next I use the UNIX utility grep to search all the documentation text for the letters "hex". Nothing. I remember the C function atoi. I remember that Python has this function in module string. There I find "(Also note: for a more flexible interpretation of numeric literals, use the built-in function eval().)" Problem solved. I was in the wrong mental space.

    s = '0x3a'
    print eval(s)
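
As an aside, eval() will run any expression it is handed, not just numeric literals. In later Python versions the built-in int() takes an optional base argument, which does the same job for hex strings. A minimal sketch, written in Python 3 syntax rather than the older syntax used elsewhere on this page (the helper name parse_hex is made up for illustration):

```python
def parse_hex(text):
    """Convert a hex string such as '3a' or '0x3a' to an integer.

    parse_hex is a hypothetical helper; int(text, 16) does the work.
    """
    return int(text, 16)


if __name__ == "__main__":
    print(parse_hex("0x3a"))   # with the 0x prefix
    print(parse_hex("3a"))     # without the prefix
    print(int("0x3a", 0))      # base 0: infer the base from the prefix
```

With base 16 the "0x" prefix is optional; with base 0 the prefix decides the base, much as it does for a literal in source code.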

The only thing of interest in this story is the process. I know a collection of facts about some system. I use these facts to try to uncover another fact. If I succeed, I either write down a note somewhere or memorize a new fact. The more I know about a system, the easier this process is. Imagine the pain I went through the first time I installed Linux: where are the X configuration files for RedHat 4.2? And I didn't know the word "configuration"!

I would be helped by (automatically generated) keyword links. I would click on a little icon beside the documentation for hex() and find all the other places in the documentation where the words "hexadecimal", "literal", "OverflowError", etc. occur. In some cases, sets of associated words will need to be supplied to the code that generates the links, for example "hexadecimal <--> base".
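
To make the idea concrete, here is a rough sketch of how such links might be generated. It is only an illustration in Python 3 syntax: the function names, the ASSOCIATED list, and the toy "documentation" strings are all invented here and do not come from any real documentation tool.

```python
import re
from collections import defaultdict

# Hypothetical sets of associated words; a real generator would need these
# supplied by hand, as the text suggests.
ASSOCIATED = [
    {"hexadecimal", "hex", "base"},
]


def build_keyword_index(sections):
    """Map each lowercase word to the set of section names that contain it."""
    index = defaultdict(set)
    for name, text in sections.items():
        for word in re.findall(r"[A-Za-z_]+", text.lower()):
            index[word].add(name)
    return index


def related_sections(keyword, index, associated=ASSOCIATED):
    """Return sorted names of sections mentioning keyword or an associated word."""
    words = {keyword.lower()}
    for group in associated:
        if keyword.lower() in group:
            words |= group
    found = set()
    for word in words:
        found |= index.get(word, set())
    return sorted(found)


if __name__ == "__main__":
    # Toy stand-ins for documentation sections, not real documentation text.
    docs = {
        "hex": "Convert an integer to a hexadecimal string.",
        "int": "Convert a string to an integer in the given base.",
        "len": "Return the number of items in a sequence.",
    }
    index = build_keyword_index(docs)
    print(related_sections("hexadecimal", index))
```

In this toy data the "int" section never uses the word "hexadecimal", yet it is found through the associated word "base". That is the kind of link a plain text search misses.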

These sets of associated words can be useful in a broader context. Suppose I want to answer the question "How do I write programs that do things with images?" by searching the Internet. Once I find the first magic phrase, say "image processing", I probably can find others, "computer vision", "image understanding", "machine vision", "medical imaging", etc. Here are some further examples:

I searched DejaNews for articles in comp.lang.python with the word "newbie" in the title. The first four articles listed that had answers got me:

Back to hex(). As I wrote this, I searched comp.lang.python for the string

    hex OR literal OR "word size" OR OverflowError

trying to locate questions about hex(). The results included:

    import string
    string.atoi(hex(150),0)
    eval('0xf')
    string.atoi('0xf', 16)

and a _lot_ of other stuff. It might be useful to embed canned news article searches into the documentation. Is there a better search string to use? Is there any software that can help eliminate irrelevant answers?
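
A canned search could be built from a set of associated words. The sketch below, in Python 3 syntax, only assembles a query string in the same OR style as the search above. The function name build_or_query is hypothetical, and whether a given search service accepts this syntax is an assumption.

```python
def build_or_query(terms):
    """Join search terms with OR, quoting any term that contains a space."""
    quoted = ['"%s"' % t if " " in t else t for t in terms]
    return " OR ".join(quoted)


if __name__ == "__main__":
    print(build_or_query(["hex", "literal", "word size", "OverflowError"]))
```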

Here are the little programs 2hex.py and hex2.py.


    #=== 2hex.py ==================================================

    import sys

    if len(sys.argv) < 2:
        raise Exception, 'Must have at least one argument.'

    for i in range(1, len(sys.argv)):
        print '%X' % int(sys.argv[i]),
    print

    #=== hex2.py ==================================================

    import sys

    if len(sys.argv) < 2:
        raise Exception, 'Must have at least one argument.'

    for i in range(1, len(sys.argv)):
        print eval('0x' + sys.argv[i]),
    print
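
For the web color case that started all this, the two conversions can be wrapped up so that a whole color is handled at once. This is a sketch in Python 3 syntax, not part of the original programs. The names rgb_to_html and html_to_rgb are invented, and it assumes the six-digit #RRGGBB form with components from 0 to 255.

```python
def rgb_to_html(red, green, blue):
    """Format three 0-255 decimal components as an HTML color like '#FF8000'."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("component out of range: %r" % (value,))
    return "#%02X%02X%02X" % (red, green, blue)


def html_to_rgb(color):
    """Parse an HTML color like '#FF8000' into a (red, green, blue) tuple."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hex digits: %r" % (color,))
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


if __name__ == "__main__":
    print(rgb_to_html(255, 128, 0))
    print(html_to_rgb("#FF8000"))
```

The %02X format pads each component to two digits, so a small value such as 5 comes out as 05 and the color string keeps its fixed length.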

The following one-line shell script has helped me a lot:

    grep "$1" /usr/doc/python-2.1/*/*.html

Edit the path to match your system.