Introduction (The Gossip on Voice Chips)
This essay develops a frequently asked question (FAQ) list for Voice Chips. Like the questions in most FAQs, these questions are not actually frequently asked, but they might be, and like every FAQ, the attempt is to structure the accumulation of experiences in a sociotechnical project.
Voice Chips and their newer partners, speech recognition chips, are small, low-power silicon chips that synthesize voice, play prerecorded voice messages, or recognize voice commands. Although this functionality is not new, what makes voice chips unique is that they are small and cheap enough to be deployed in many, in fact almost any, product. Sprinkled throughout the technosocial landscape, their presence in products is a (not quite arbitrary) sampling mechanism, and enables us to compare very different products. So their secondary function, the concern of this essay, is as a simple instrument to slice through the history of our attempts to swap attributes with machines and to understand the nuances of complex sociotechnical systems -- precisely because the systems are rendered in the form in which we can best recognize nuance: English, be it our own or the machines'.
These chips represent in-the-wild models of the interactions between humans and machines -- as reductive as they are comic, but at least a manageable examination. They are caricatures of more complex human-machine interactive systems. We will ask: What is the structure of participation scripted with these new products and with increasingly ubiquitous information technologies? By examining the "structure of participation" (who addresses what, what addresses whom, who listens, what hears and who or what acts... and other forms of participation elaborated later) rather than focusing on the interaction between the device and the "user," we pay attention to peripheral participation, the participation between users and around things; between users and things within systems. It is an approach to human (singular)-computer (singular) interaction that reconsiders interaction as a form of participation and escapes the simple dichotomy between social and technological.
The question we begin with is simply, when things can talk, what do they say? Our intent is to actually listen, and try to figure out whose voice it is and what it means. We then ask the complementary question: when we can talk to things (i.e., when there is speech recognition capacity embedded), what do we say? Who are we addressing? And what do we sound like? Are we polite, at least? And what is the appropriate way to talk to things (social norms)? Does it change language to be talking across the human/nonhuman divide, or how does it change us? Can we get some new insights into the old question, Is language uniquely human? We ask the voice chips these questions because they literally talk back, insisting on the scripts of participation that they were built with, reflecting the expectations and failures of our interactive technologies.
How have the things that voice chips say changed over the years since voice functionality was implemented? Does the "Chatty Cathy" of the sixties (a tape mechanism precursor to the voice chip) have anything different to say vis-à-vis the Barbie of the eighties or contemporary interactive toys? Or what is the relationship between novelty and familiarity, stability and instability, managed in these devices? Which things are given voices and which things are not, and why not? Why are they different from other talking hardware? What did the patents say they would say? And what did they actually say? What are the differences between these innovations as intellectual property and the novel devices as viable products? Exploring these questions tells us about the process of commodification of an ephemeral device, and explores the pattern of propagation of an innovation.
Where are the voices coming from? Who is hearing them, who isn't? Whose accent do they have? Does the failure of voice chips in automobiles predict anything for the future of speech recognition chips inserted in other modes of transportation, and other places? Do they work in public or private places? What does any of this tell us about ubiquitous computing? Do these voices actually work? Does a voice chip reminder not to leave your things behind, to watch your step, or to stand back actually make you take your things, watch your step, or stand back? How does the function of the product change the meaning of the voice? How does the voice change the product? How do nonhuman speech devices change language? And similarly: Now that we can talk to things, what do we say? What would we prefer to say? What would be the correct thing to say? What could we say? What does this tell us about the contingency of meaning?
Can You Capture Voice?
Voice is the icon of person. "To be given a voice" is shorthand for the fundamental units of democracy: voting, "being represented," or participating. A device of sociality and therefore interaction, it is used to interpellate a subject (presumably a person) into society (Althusser 1971), or as a performative device to instantiate social agreements and identities (Butler 1993). We will trace how the responsive and ephemeral social device of voice interaction is commodified and sold back to us.
What's So Special about Voice Chips?
Talking hardware has existed since before the time of Thomas Edison (who is generally credited with having invented the phonograph around 1877), when Alexander Graham Bell's telephone learnt to talk. The proliferation of talking hardware since has brought about the recording industry, the broadcast industry and the multimedia industry. Our exposure to voices (and other communicative sounds) that emanate from inanimate objects has become a significant part of our daily interactions: from radios to the more recent talking elevators, answering machine messages and prerecorded music, television, automated phone menus, automatic teller machines, alarms and alerts, each of which, as we will show, speaks in a language or dialect that makes little distinction between music, sound effects and articulated words, and privileges the situational function of language over the semantic and interactive.
There are, however, distinctions to make between the voice
chips, the concern of this essay, and noisy hardware more generally.
"Voice chips" refers colloquially to: Texas Instruments TSP50C04/06 and
TSP50C13/14/19 synthesizers; Motorola MC34018 or any other "speech
synthesis chip implemented in C-MOS to reproduce various kinds of
voices, and includes a digital/analog (D/A) converter, an ADPCM
synthesizer, an ADPCM ROM that can be configured by the manufacturer to
produce sound patterns simulating certain words, music or other
effects."
1
The speech recognition chip is exemplified by the ISD-SR3000
Embedded Speech Recognition Engine.
The voice chip differs from other technologies of automated
sound production in that it offers autonomous voices, as opposed to
broadcast voices. That is, voices which are not necessarily associated
with a performer, a brand, or any other preestablished identity. These
chips present what we will call "local talk" in products that refer to
themselves and don't often make claims to another's identity, or to the
faithful reproduction of someone else's voice. In fact, their sound
quality has effectively limited this. The "I" in "I'm sorry, I could not
process your request," or the "I will transfer you now" voice of the
automated operator
2
claims agency by using the first person pronoun. Presumably, the
machine is referring to itself when saying "I,"
3
because it is not identifiably anyone else.
Attributing agency to technologies is a strategy that has been
used by theorists to better understand the social role of technologies
(Latour 1988; Callon 1995). It is a strategy that dislodges the
immediate polarization of techniques and society, a strategy that
refuses reduction to a situation that is merely social or only
technological. Bruno Latour bases his Actor Network Theory -- a theory
that regards things as well as people as actors in any
sociotechnological assemblage -- on the ability of humans and nonhumans
to swap properties. He claims that "every activity implies a generalized
principle of symmetry or, at least, offers an ambiguous mythology that
disputes the unique position of humans." Michel Callon and John Law
(1982) have also explored nonhumans as agents, but their strategy starts
with an indisputable agent (a white male scientist) and strips away his
enabling network of humans and nonhumans to demonstrate that his agency,
his ability to act as a white male scientist, is distributed throughout
his network of people, places and instruments. The more traditional
(default) theory of technological determinism rests on the assumption
that technology has an agency apart from the people who design,
implement or operate it, and hence can determine social outcomes. Voice
chip products take these ideas literally and actually attribute, with
little debate or contest, the human capacity of speech to technological
devices. Voice chips humbly preempted the theory.
4
The voices of chips also differ from those of loudspeakers, TV/radio, and other broadcasting technologies in the social spaces they inhabit. Although radio and TV have become so portable that their voices can emanate from any vehicle, serving counter, or room, voice chip voices, by virtue of their peripheral relationship to the product, inhabit even more diverse social spaces. The identity of the voice that emanates from TV and radio reminds us that it is coming from elsewhere: "...for CBS News," "It is 8 o'clock GMT; this is London." And although Channel 9 is not a physical place, its resources and speech are organized around creating its identity, as an identifiable place on the dial. The voice chip that tells you "your keys are in the ignition" is not creating a Channel 9 identity, however. Its identity is "up for grabs," not quite settled; it speaks from a position of a product in the social space of daily use.
Similarly, recording media and hardware refer to what they
record. We know we are listening to someone when we listen to an Abba
CD. And although it is the tape player in the car that produces the
sound, we claim to be listening to the violin concerto itself. The tape
player as a product does not itself have a voice; it never pretends to
sing, speak or synthesize violin sounds itself. The recording industry
and associated technologies, born at a very different historical moment
from voice chips, came out of the performance tradition.
5
Its claim to represent someone, from the earliest promotions
using opera singers, to contemporary megastars, has focused the
technologies around "fidelity" issues. Additionally, telephones,
telephonic systems and the telecommunications industry, motivated by the
communication imperative, prioritize real-time voices passing to
real-time ears over fidelity. Simply stated, it is an industry that puts
technologies between people, things to communicate through, "overcoming
the tyranny of distance" (Minneman 1991). Invisible distance and
seamless technology reflect the recording industry's ambition to
"overcome the tyranny of time," enabling people to duplicate the
performance regardless of when or where it was originally performed.
Voice chips and their inferior sound quality do not refer beyond
themselves. Their position in a product becomes their position as a
product.
How Are Voice Chips Distributed?
Voice chips provide the opportunity to add "voice functionality"
to the whole consumer-based electronics industry. They are the
integrated circuits that can record, play and store sounds, and more
importantly, voice. They are the patented chips that play "Jingle Bells"
in Hallmark greeting cards.
6
They are the voice in the car that reminds you, "Your lights are
on."
7
They are the technology that makes dolls say "Meet me at the
mall,"
8
and gives voices to products ranging from picture frames to
pens.
9
The well-sung virtues of integrated circuits (chips) are that
they are cheap, tiny and require little power. Smaller than a baby's
fingernail, they have the force of a global industry behind them and an
entire economic sector invested in expanding their application.
Technically, they can be incorporated into any product without
significant changes in their housing, their circuit design, power
supply, or price. Wherever there is a flashing light, there could
instead, or as well, be a voice chip.
Although most personal computers can record and play voice, the
voice chip is different in that it is dedicated solely to that function.
The same integrated circuit technology found in calculators and
computers allows this tiny package to be placed ad hoc in consumer
devices. Their development exploited the silicon chip manufacturing
processes and its dedication to miniaturization. With sound storage
capacities ranging from seconds of on-board memory to minutes and hours
of recording time when configured with memory chips, they were conceived
to enable voices in existing hardware, to be incorporated into products.
They are the saccharin additive of consumer electronics.
10
They were first mass marketed in 1978 by Texas Instruments,
though they had existed in several forms before that, particularly in
the vending industry. It was not until seven years later, in 1985, that
the Special Interest Group in Computer-Human Interface (SIGCHI) of the
Association for Computing Machinery (ACM) professional society broke off
into their own conference from other more general computing conferences.
This institutionalization formalized the discussion in design
communities on the Human-Computer Interface as a site of scientific
investigation that differs from earlier formulations of this interface,
such as Engelbart's human augmentation thesis or Turing's
standing-in-for ideal (Bardini 1997), but whose concerns for evaluating
an interface tend toward task decomposition, with metrics of efficiency
still dominating (Dourish 2001). This liminal zone where people and
machine purportedly interact is where the voice chips were intended to
reside. The voice chips arrived to mediate, even to negotiate, this
boundary. Voice chips promised to make hardware "user-friendly," a
phrase that defines the technical imagination of the time, by turning
the person into an interchangeable standardized "user" and attributing a
personality (i.e., friendliness) to the device. In this context the
problem for designing user-friendly devices begins with the assumption
that the hardware has agency in the interaction.
Writes Turkle: "Marginal objects, objects with no clear place, play important roles. On the lines between categories, they draw attention to how we have drawn the lines. Sometimes in doing so they incite us to reaffirm the lines, sometimes to call them into question, stimulating different distinctions" (Turkle 1984, 31).
Do Marginal Voices Have Any Say in the Market?
Finally, before listening to the voices themselves, I want to emphasize the peripheral relationship of the voice chip to the product. It is the position of the voice chip as marginal, not particularly intended to be the primary function of the product, that increases the present curiosity in it. The motor vehicle, for example, is not purchased primarily for its talking capacity, and pens that speak are still useful for writing. This marginality gives voice chips a mobility to become distributed throughout the product landscape and mark, like fluorescent dye, a social geography of product voices.
The chips are usually deployed -- to borrow the economic sense of the term -- for their marginal effects, to distinguish one product (e.g., an alarm system) from another, and give it some marginal advantage over a competing product. However, the chips are not evenly distributed throughout competitive markets (e.g., consumer electronics) in the manner one would expect for the propagation of a low-cost technical innovation driven by market structure alone.
Although consumer preferences are often claimed to have a causal determination on the appearance or disappearance of marginal benefits, it is difficult to see how the well-developed paths of product distribution have the capacity to communicate those "preferences" developed after the point of purchase. Lending the market ultimate causality (or agency) ignores the specific experience of conversing with products, the micro-interactions that enact the market phenomenon, and occludes the attribution of agency to the voice chip products, insomuch as these products speak for themselves. The voice chip products themselves have something to say, although their voices are usually ignored. In this essay we will not be examining voice chip products in the interactions of daily use, as contrapuntal to market descriptions; however, by recognizing the social assumptions that determine their physical design, we frame the imagined interactions and social worlds in which these products make sense.
Hearing Voices?
The marginality of the product makes it difficult to
systematically study. Neither of the two largest manufacturers of voice
chips of various types (Motorola and Texas Instruments) keeps information
on what products incorporate this technology, partly because they can be
configured in many different ways -- not necessarily as voice chips --
and partly because products that talk are not a marketing category of
general interest. This essay traces voice chips in two ways: first via
the patent literature, and second through a more ad hoc method of
searching catalogues, electronics, and toy and department stores, to
compile a survey of products that have been available in the last six
years (my voice chip collection was begun in June 1996).
11
What is initially observable from the list of products and patents that contain voice chips is that there is no obvious systematic relationship between the products that include voice chips and the uses or purposes of those products. Except for children's toys, no particular electronics market sector is more saturated with talkative products. These chips are distributed throughout diverse products. However, we can view the voices as representatives, as in a democratic republic where voices are counted. Just as in a republic, each citizen has a vote, but most choose not to exercise it; likewise, most products could incorporate voice chips but most do not. We will count what we can.
19.1. The talking watch. (These images are drawn from the ephemeral propaganda form known as the product catalog. They have been deliberately pixelated.)
What Do Voice Chips Say?
A review of the patent literature yields a loose category scheme
or typology, not by where the voice chips appeared (a technology sector
approach that we will visit later), but by what they said. The patents
themselves hold a tentative relationship to the products. For only two
of the products on the market did I find the corresponding patents, the
CPR device
12
and the recordable pen.
13
Though patents do not directly reflect the marketed products,
they do represent a rather strange world of product generation, a
humidicrib for viable and unfeasible proto-products. Patents track how
products have been imagined and protected; while they do not by any
means demonstrate market success, they do reflect a conviction of their
worth, being invested in and protected. Patents are a step in the
process of becoming owned, are therefore worth money, and thereby
demonstrate how voice, a social technology, becomes property.
There were as of October 2001 only 163 North American patents that included a voice chip. (More recent years show a proportional increase.) In the context of the patent literature, the first thing to note is that this is a very small number -- compared, that is, to the integrated circuit patent literature more generally. The question "Why not more?" we will return to later. The federal trademark office offers a suggestive list of speech-invoking names, including: who's voice; provoice; primovox; ume voice; first voice; topvoice; voice power; truvoice; voiceplus; voicejoy; activevoice; vocalizer; speechpad; audiosignature. These monikers introduce how the voice is conceptualized in the realms of intellectual property, in a different form, claiming that these voices are premium (should be listened to?) in various ways. However, the voice chips themselves seem to fall into the following loose categories:
1. Translators, which range from reporting and alerting to alarming and threatening and include "interactive" instructional voices;
2. Transformers, which transform the voice;
3. Voice as Music, which makes speech indistinguishable from music or presents voice as sound effect;
4. Locating Voices, speaking from here to there about being here;
5. Expressive Voices, expressing love, regret, anger and affection;
6. Didactic Voices and Imitative Voices, mainly as in educational and whimsical children's toys;
7. Dialogue Products, which explicitly intend to be in dialogue with the user, as opposed to delivering instructions to a passive listener.
Products and patents often exist in more than one of these categories; for instance, the Automatic Teller Machine will not only apologize (expressive) for being out of order but will also simply function to translate the words on the screen into speech. This said, the categories remain, for the most part, distinguishable and useful.
Translators
A large category, this is the voice that translates the language
of buzzes and beeps into sentences -- whether English, French, or
Chinese. A translator is a chip that translates the universal flashing
LED, the lingua franca of the piezoelectric squeal, the date code, the
bar code, the telephone ringer adapter that translates that familiar
ring, the tingling insistent trill of an incoming call, into "a
well-known phrase of music"
14
(an approach that has since become popular in cell phones, where
this function is useful in differentiating whose phone is ringing), or
the unrelated patent that translates the caller identification signal
into a vocal announcement.
Within the translators there are distinct attitudes; for instance, the impassive reporting, almost a "voice of nature." This is exemplified by the patent for the menstrual cycle meter. The voice reports the date and time of ovulation, in addition to stating the gender more likely to be conceived at a particular date or time during a woman's fertility cycle. Another example is the patent for the "train defect detecting and enunciating system," which "reports detected faults in English." These chips speak with a "voice of reality," reporting "fact" by the authority of the instrument that triggers them.
Another type of translator claims more urgency than those that
simply report fact. These raise an alarm and expect a response. They are
less factual, more contestable perhaps. Take the "Writing device with
alarm,"
15
an "invention which relates to a writing device which can emit a
warning sound -- or appropriate verbal encouragement -- in order to
awaken a person who has fallen asleep while working or studying"; or the
baby rail device which exclaims, "The infant is on the rail, please
raise the rail"... and then if there is no subsequent response from an
attendant caregiver, raises it automatically.
16
A product on the market that will politely tell you if there is
water on the ground is pictured in figure 19.2. These voice chips ask
for and direct the involvement of their human counterparts -- they
assume "interactive humans."
19.2. Flood warning.
These chips articulate not only simple commands, but series of
instructions as well. The CPR device
17
in figure 19.3 guides the listener through the resuscitation
process. And finally, these chips translate menus of choices into
questions. The car temperature monitor that asks the driver, "Would you
like to change the temperature?" translates from the visual menu of
choices, but in the process also takes over the initiating role. What is
lost or gained in the translation generates many questions: Does
translating from squeals to a more articulate alarm make it any more
alarming? How do spoken instructions transform written instructions? We
will try to address these questions later.
19.3. CPR prompt rescue aid.
There is a notable set of aberrant but related patents that
exist in this "alarming" category: "Alarm system for sensing and for
vocally warning a person approaching a protected object,"
18
"Alarm system for sensing and for vocally warning of an
unauthorized approach towards a protected object or zone,"
19
and "Alarm system for sensing and for vocally warning a person
to step back from a protected object."
20
What seems almost like hair-splitting turns of phrase to get three separate patents has little technical consequence: the second patent has the extra functionality to detect authorized persons (or their official badge), and the third can, but need not, imply a different sensor -- but each implies a different attitude. Although all patents are contestable, patent attorneys typically advise that you would not be able to successfully claim as separate patents an alarm system that warned at 15 feet and one that alerted at two feet. The "novel use" being patented here depends on the wording: the phrasing of the instruction that determines the arrangement of the sensor and alarm/voice chip. On the strength of a differently worded warning, the importance of the technically defined product description seems to have diminished. Perhaps ElectroAcoustic Novelties, the owner of the patents, has a linguist generating an alarm system for other phrases. These patents seem to be articulating the semantics of the technology. The intentionality of the system is its voice.
19.4. Voice changer.
Transformers
Transformers are distinct from patents that translate the voice.
They translate in the other direction -- not from the buzzes and squeals
to spoken phrases, but from the human voice to a less particular voice.
For instance: to assist the hearing impaired, a chip that transforms
voices into a frequency range the listener can still hear (usually a
higher frequency); or the "Electronic Music Device" effecting a
"favorable musical tone." "The voice tone color can be imparted with a
musical effect, such as vibrato, or tone transformed."
21
Into this category fall children's products like the "YakBak," popular in the 1997-1999 seasons, which plays back a child's voice with a variety of distortions; and the silicon-based megaphones that allow children to imitate technological effects, or sound like machines. These are voice masks, for putting on the accent of techno-dialect. The socializing voices broadcast on radio and TV, the voices of authority heard over public address systems, and the techno-personalities of androids and robots are practiced and performed by playing with these devices. This is also the category of voice chips that is concentrated in products for the hearing impaired or the otherwise disabled, and for children. These transforming devices act as if to integrate these marginalized social roles into a sociotechnical mainstream.
Speech as Music
Many of the patents that are granted specifically collapse any
difference between music and speech. This contrasts with the careful
attention given to the meaning of the words used in the alarm system
family of the translators. An explicit example is the business card
receptacle, which solves the problem of having business cards stapled
onto letters -- making them more difficult to read -- and provides an
"improved receptacle that actively draws attention to the receptacle and
creates an interest in the recipient by use of audio signals, such as
sounds, voice messages, speech, sound effects, musical melodies, tones
or the like, to read and retain the enclosed object."
22
Another example is the Einstein quiz game that alternately
states, "Correct, you're a genius!" or sounds bells and whistles when
the player answers the question correctly. This interchangeability of
speech and music is common in the patent literature presumably because
there is no particular difference technically. In this way patents are
designed to stake claims -- the wider the claim the better. The lack of
specificity and the deliberate vagueness in the genre of intellectual
property law contradict the carefulness of copyright law, the dominant
institution for "owning" words.
Local Talk from a Distance
One would expect chips that afford miniaturization and inclusion
in many low-power products to be designed to address their local
audience, in contrast to booming public address systems or broadcast
technologies. However, several of these voice chip voices recirculate on
the already-established (human) voice highways, imagined to transmit
information as you or I would. The oil spill detector
23
that transmits via radio the GPS position of the accident, or
"the cell phone-based automatic emergency vehicle location system" that
reports the latitude and longitude into an automatically dialed cell
phone
24
-- these are examples of a voice chip standing in for and
exploiting the networks established for humans, transmitting as pretend
humans. This class of products, local agents speaking to remote sites,
is curious because the information can easily be transmitted efficiently
as signals of other types. Why not just transmit the digital signal
instead of translating it first into speech? The voice networks are more
"public access," more inclusive, if we count these products as part of
our public, too. The counterexample, of voice chips acting as the local
agent to perform centrally generated commands, is also common, as in the
credit card-actuated telecommunication access network that includes a
voice chip to interact locally with the customer while the actual
processing is done at the main switchboard. Although the voice is
generated locally, the decisions on what it will say (i.e., the
interactions) are not.
Expressives
The realm of expressiveness, often used to demarcate the boundaries between humanity and technology, is transgressed by voice chips. There are, of course, expressive voice chips ranging from a key ring that offers a choice of expletives, swear words and curses to the "portable parent" that plays stereotypical advice and parental orders to the array of Hallmark cards that wish you a very happy birthday, or say, "I love you." These expressive applications also remind us of the complexities of interpreting talking cards. The meaning of these products is of course dependent on the details of the situation, rather than on the actual words being uttered: who sent the card, and when; or what traffic situation preceded the triggering of the key ring expletive.
19.5. Recordable pen product.
These novelty devices lead into the most populous voice chip category: those intended for children. The toy department store Toys "R" Us currently has seven aisles of talking and sound-making products -- approximately 45 different talking books alone, in addition to various educational toys, dolls and figures that speak in character. The voices are intended for the entire age range, from the earliest squeaking rattles for babies, to strategy games for children 14 years of age and up -- for example, the "Talking Battle Ship," in which you can "hear the Navy Commander announce the action" as well as "exciting battle sounds." The categorization of the multitude of toys extends far beyond "expressive" types, from the encouraging voices inserted in educational toys ("Awesome!," "No, try again" or "You're rolling now") such as the Phonics learning system, the Prestige Space Scholar, and Einstein's trivia game, to the same recordable voice chips used for executive voice memo pads. Chips for children are placed in pens, balls, and YakBaks; then there is the multitude of imitative toys that emulate cute animals, nonfunctional power tools and many trademarked personae, from Tigger and Pooh to Disney's recent animation characters Sampson and Delilah, Ariel the mermaid, and others.
This listing demonstrates a cultural phenomenon that
enthusiastically embraces children interacting with machine voices, and
articulates the specific didactic attitudes projected onto products.
These technological socialization devices have already been subject to
analysis, as in Sherry Turkle's study of children's attitudes towards
"interactive" products. [25]
Barbie, for instance, was taken very seriously for what she had
to say about the polarized notions of gender she embodies. Since
Barbie's introduction in 1959 she has been given a voice three times
(each with slightly different technology); her most controversial voice
during the 1980s was censored for saying, "Math is hard." This
controversy rests on the assumption that voice chips are social actors
and do have determining power to affect attitudes -- in this case a
young Barbie player's attitude to math.
Although Barbie is currently silent, a myriad of talking dolls
remain, from Tamagotchi virtual pets, with their simple tweets, to
crying dolls that ask to be fed, and an ever-increasing taxonomy of
robotic dolls and creatures. The utility patent literature continues to
award "new and novel" applications in this area. One of the "new" voice
chip patents is for a doll that squeals when you pull her hair (dolls
that cry when they are wet or turned upside-down are technically
differentiated by their simple response triggers). [26]
There is also a new doll patent that covers an "electronic
speech control apparatus and methods and more particularly for...
talking in a conversational manner on different subjects, deriving
simulated emotions... methods for operating the same and applications in
talking toys and the like." [27]
The functional categories at work here are not linguistic, nor
do they resemble other ways in which a voice has been transformed into a
document -- for example, as in the copyright of a radio show. It would,
in other realms, be very difficult to get copyright on "talking in a
conversational way." In the material world the ownership of voice has
been redefined.
Recording Chips
This category encompasses many of the most recent voice chip products. It is the existence of these products that tests the nature of the communication we have with these technologies: do we, can we, converse with these products? This category draws from the other typologies but is distinguishable, for the most part, by the recording functionality that is the raison d'être of the product. It includes those products that perform a more specific speech function, one that could not alternatively be represented by lights, beeps, or visual display; that is, they are perhaps more communicative. It also includes the products that seem to hold dialogue.
The category's range of products includes the shower radio (see figure 19.6), which reinterprets bathing as a time for productive work, an opportunity to capture notes and ideas on a voice chip, consistent with the theory that there is an ongoing expansion of the work environment into "private" life. It also includes both the recordable pen and its business-card-size counterpart, the memo pad. Both the pen and the pad currently have many versions on the market, and they seem to be growing steadily more numerous. The YakBak is the parallel product for children, deploying the same technology with different graphics, and to radically different ends.
The growing popularity of this category compared to the others
raises a number of questions. Firstly, how do we understand why this
category is popular? Is the popularity driven by consumers because these
products are successful at what they do? And is what they do dialogue?
Or is it that the cost and portability of the technology make it an
affordable newtech symbol beyond what is attributable to its function
alone? Is this category popular because it alone can be marketed as a
work product? [28]
And then conversely, why are these devices not more popular? Why
is it that only a few types of products become the voice sites? Pens,
photo frames and memo pads are all documents of a sort, in contrast to
switches or menu choices.
According to the patent literature, "the failure of the market
place to find a need for voice capability on home appliances has
discouraged the use of voice chips in other products," [29]
but crediting the market with agency over design assumptions is
circular logic. This does express, however, the sentiment that many more
products could have speech functionality than do.
Although miniaturization has made these products possible, the concept of embedding recording capability in products has been possible with other technologies. There has been no technical barrier to providing recording capability in cars, or in any of the larger products -- a refrigerator, for instance -- certainly since the existence of cheap magnetic recording technologies. Why is it that now we want consumer products that talk to us?
It is striking that the majority of talking products on the market currently are for conversing with oneself. Although deeply narcissistic, this demonstrates a commodification of self-talk that transforms the conceptualization of the self into subjectivity in relationship with our products. It suggests, without subtlety, that the relationship with these products is a relationship with the self. The constitution of personal and social identity by means of the acquisition of goods in the marketplace (Shields 1992) -- the process of identifying products that provide the social roles we recognize and desire -- cannot be excluded from the consideration of the social role of pr