Thursday, July 25, 2013

Pair Programming

Tom and I did some pair programming last week: he drove (wrote the code) and I navigated (dictated instructions). What we ended up with (link to GitHub) is a program that crawls through text looking for color words. It divides the text into lines, and the lines into words. Any time the words red, orange, yellow, green, blue, or purple come up, the other words in that line are filed into a matrix under that color. Once the algorithm is finished, it returns the set of words that have more than one correlation. For example, given the text The Loves of Krishna in Indian Painting and Poetry by W. G. Archer, the program returned this:

The only really interesting result here is that Krishna is indeed blue (the literal translation of the name is "black" or "dark," but most depictions give him blue skin). It's a single result, but still very exciting. One text is too small a dataset, so I'm building up some compilations of my favorite poets, transcendentalists, aesthetic philosophers, etc. I'll also tokenize the text by sentence rather than by line (except in poems), and weight the associated words by how close they are to the color word, so that in the line "red shoes by the newsstand," shoes gets more points than newsstand. I'll also add the words color, colors, colour, and colours.
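For anyone curious, the line-based crawl described at the top of this post can be sketched roughly like this in Python. This is not the code from our repository: the function names, the punctuation stripping, and my reading of "more than one correlation" as "filed under the same color more than once" are all assumptions for illustration.

```python
from collections import defaultdict

# The six color words the crawler watches for.
COLOR_WORDS = {"red", "orange", "yellow", "green", "blue", "purple"}

def color_matrix(text):
    """For each color word, count how often every other word shares a line with it."""
    matrix = defaultdict(lambda: defaultdict(int))
    for line in text.splitlines():
        # Lowercase and strip surrounding punctuation so "Krishna." matches "krishna".
        words = [w.strip(".,;:!?\"'()").lower() for w in line.split()]
        words = [w for w in words if w]
        for color in set(words) & COLOR_WORDS:
            for word in words:
                if word != color:
                    matrix[color][word] += 1
    return matrix

def repeated_associations(matrix):
    """Keep only words filed under a color more than once (assumed meaning of 'correlation')."""
    result = {}
    for color, row in matrix.items():
        repeats = {word: n for word, n in row.items() if n > 1}
        if repeats:
            result[color] = repeats
    return result

sample = "Krishna is blue\nThe blue god, Krishna.\nRed shoes"
print(repeated_associations(color_matrix(sample)))  # {'blue': {'krishna': 2}}
```

The nested dictionary plays the part of the matrix: rows are color words, columns are every other word, and the cells are co-occurrence counts.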

The results improved dramatically when I added words like black, dark, white, and light. These words are used much more often, particularly in metaphor. It occurred to me to start collecting those for Nila, my black and white painting robot, and I'm thrilled with the idea.
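Here is a rough sketch of where I'm heading, again in Python and again with hypothetical names (`weighted_matrix`, `split_sentences`, and the `is_poem` flag are mine, not from the repository). It splits prose into sentences and poems into lines, adds the newer words (black, dark, white, light, and color/colour), and scores each neighbor by 1 divided by its distance in words from the color word. That scoring rule is an assumption; any weight that falls off with distance would serve.

```python
import re
from collections import defaultdict

# The original six colors plus the additions mentioned in this post.
COLOR_WORDS = {"red", "orange", "yellow", "green", "blue", "purple",
               "black", "dark", "white", "light",
               "color", "colors", "colour", "colours"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def weighted_matrix(text, is_poem=False):
    """Score words by closeness to each color word: 1 / distance in words."""
    # Poems keep their line breaks as units; prose is split into sentences.
    units = text.splitlines() if is_poem else split_sentences(text)
    scores = defaultdict(lambda: defaultdict(float))
    for unit in units:
        words = re.findall(r"[a-z']+", unit.lower())
        for i, color in enumerate(words):
            if color not in COLOR_WORDS:
                continue
            for j, word in enumerate(words):
                if j != i:
                    # Adjacent words get 1.0, two words away 0.5, and so on.
                    scores[color][word] += 1.0 / abs(i - j)
    return scores

row = weighted_matrix("Red shoes by the newsstand.")["red"]
print(row["shoes"], row["newsstand"])  # 1.0 0.25
```

The sentence splitter is deliberately naive: initials like "W. G. Archer" would break a sentence in the wrong place, so a proper sentence tokenizer would be a better fit for real texts.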
