Signs: Words and Images
One thing that makes it difficult to separate literacy from visual literacy is that literacy itself is partly visual from the outset. In an essay titled “Visual Literacy or Literary Visualcy,” W. J. T. Mitchell (2008) described the confusion between seeing and reading, which arises because reading itself depends on the powers of sight:

If seeing is like reading, it is so only at the most rudimentary and literal levels. Reading strikes us immediately as a much more difficult acquired skill. Normally, before one can even begin to learn to read a language, one must already have learned how to speak it. If the writing system is phonetic, one will have to have learned the alphabet that coordinates the spoken with the written word: in this sense, we might note, the skill of reading is already a visual skill, since it involves the recognition of the distinct letters of the alphabet, and the linking of them with appropriate sounds. If the writing system is not phonetic, but ideographic or pictographic, then the demands of the visual system are even more profound. Chinese has over two thousand characters that must be memorized before one can begin to read, much less write. (p. 11)

A letter is a visual symbol, a semiotic sign; however, the phonetic letter (the alphabet) and the pictographic letter (the Chinese character) are different kinds of signs. Lupton and Miller (1996) have written, “Unlike pictographic or ideographic scripts, phonetic writing represents the signifier of language (its material sound) rather than the signified (its conceptual meaning or content). Whereas an ideogram depicts a concept, phonetic characters merely indicate sound” (p. 12). A written word is something that is seen and then linked with a physical sound, and this physical sound is in turn linked with a meaning. This kind of analysis derives from the semiotic work of Ferdinand de Saussure, whose ideas Lupton and Miller echo above. Saussure pointed out that “the linguistic system unites, not a thing and a name, but a concept and a sound-image” (Saussure, 2004, p. 61). In distinguishing sign, signified, and signifier, Saussure (2004) proposed “to retain the word sign to designate the whole and to replace concept and sound-image respectively by signified and signifier” (p. 62). A cursory study of semiotics suggests to me that literacy is not easily separable from the visual realm, since words are themselves partly visual signs.
Graphic designers, typographers, and calligraphers would all attest to the fact that reading is visual. Books, newspapers, magazines, and most other text-based materials are visual entities, whether or not they contain images, in that to one degree or another they all embody design principles. Many of these principles are meant to function in the background and go unnoticed. If legibility is compromised, or the reader is distracted by the design, then a font, the spacing, or some other design attribute must be awry. Leo Lionni, a well-known children’s book author who set his books only in the Century Schoolbook font, has said, “Typography should be seen and not heard, because reading is functional and should not be tampered with” (as cited in Heller & Pomeroy, 1997, p. 108). Lionni’s opinion may be that of the majority, but other graphic designers and children’s book authors have pushed the limits of the letter itself, warping the look of the text to represent its intended meaning visually. This practice genuinely demands skills in visual literacy, since the word is now also a picture, and its meaning is communicated through multiple modes. One early example of this technique is the way Lewis Carroll “painstakingly curled lines of hot metal type into the shape of a mouse’s tail in Alice’s Adventures in Wonderland” (Heller & Pomeroy, 1997, p. 108). Another is Margaret Wise Brown’s Noisy Book (1939), whose “machine-inspired” Futura font was set with uneven spaces and line breaks to approximate “the sounds of household appliances and larger machines” (Heller & Pomeroy, 1997, p. 108). This practice of giving things a double representation, by welding word and picture together, is quite common today. If you pick up an advertising flyer or a magazine, you will probably find at least one example of a word-image. It is also a technique I often use in Vidtionary’s video definitions.
I have, for instance, used bird feathers to spell the word ‘crow’, arils to spell ‘pomegranate’, and yellow-and-red leaves for ‘autumn’. A proponent of traditional literacy might not like to see the word played with in such a way, but I think this technique of combining word and image suits Vidtionary’s goal of linking the word with the object or action it represents. Paul Rand, a well-known twentieth-century designer, said that “Logos are memory aids that give you something to hook onto when you see it, and especially when you don’t see it” (Heller & Pomeroy, 1997, p. 172). Since words are arbitrary in relation to the concepts they represent, I think it makes sense, within the context of a visual dictionary, to melt the picture into the word in the hope of creating a stronger memory link.
Today, new technologies make it easier to integrate text and image, including moving images, as well as sound. Enabled by these technologies, hybrid formations of text and image increasingly permeate our society. Mitchell (2008) has pointed out that there is little in the way of purely visual media. He wrote, “Media are always mixtures of sensory and semiotic elements, and all the so-called visual media are mixed or hybrid formations, combining sound and sight, text and image” (p. 15). Gunther Kress (1998) has suggested that visual literacy should not be opposed to traditional literacy, arguing instead for the interaction of multimodal literacies. In his view, each medium has its own strengths and weaknesses, enabling certain types of communication while constraining others (p. 54). Kress considered images, in some cases, able to convey large amounts of information more efficiently than words. He wrote:

It is not an accident that the flight decks of airliners use visual and not verbal means for nearly all the information displayed. Nor is it an accident that dealers on the forex [foreign exchange] markets have information visually and not verbally displayed on their screens. In both cases, vast amounts of information have to be processed in microseconds. (p. 54)

He then suggested that new technologies, such as the computer screen, facilitate advances in the visual field and in media generally. It is now possible for one person, working alone, to integrate sound, visuals, and text into a composition of their own. Schools, however, and academia in general seem somewhat ill-prepared to adjust their curricula to accommodate the highly evolved tools now available for multimedia production. He stated:

Contemporary technologies of page or text production make it easy to combine different modes of representation – image can be combined with language, sound can be added to image, movement of image is possible … But he or she now has to understand the semiotic potentials of each mode – sound, visual, speech – and orchestrate them to accord with his or her design. (p. 56)

Kress’s point supports what I think should be the goal of a visual literacy program: learning to combine media freely while understanding the ‘semiotic potentials of each mode’.
In creating Vidtionary’s video definitions, I have to combine sound, images, and text. I think the methods and techniques I have learned, and those I still need to learn, are similar to those that educators should introduce to their students. These techniques include handling fonts, manipulating photos, creating simple graphics, editing film, 3D modeling, editing sound and audio, knowing where to access images and sounds, and, finally, combining various media elements. Visual literacy needs to encompass both the production and the interpretation of visuals, in the same way that traditional literacy encompasses writing as well as reading. Learning to produce images will naturally sharpen how we interpret them. The questions I have had to ask myself in creating video definitions are also questions that arise when we consider expanding literacy to include visual literacy, or text in a world of images. These questions concern how a word is interpreted and then how it should be visually represented.