Real-Time Imaging (P. Laplante, A. Stoyenko, Eds.). Piscataway, NJ: IEEE Press, January, 1996, pp. 261-299.
As multimedia unfolds, it becomes apparent that it is more the expression of technology at work than of fundamental science. This is not uncharacteristic of the entire real-time imaging field, and to a large extent of the vast majority of new human practical experiences having the computer at their core. The peculiarity of this development should encourage us, in parallel with continued innovation and experimentation, to articulate a stimulating, open-ended theory of the field. Such a theory could open new avenues and help foster further innovation. It may be too early to attempt a grand-scheme science of multimedia, but some foundations can be laid. The task is far from superfluous. Given the inter- and cross-disciplinary nature of interactive multimedia, students, as well as practitioners and innovators, need to understand concepts and principles from the many disciplines that are supposed to converge here. In this spirit, the contributions the reader can expect from this text are: defining the field; characterizing current and possible applications; drawing attention to the critical component that we identify as the underlying aesthetics; showing how a heterogeneous generic configuration is designed and implemented; dealing with the critical issues of authoring and navigation; introducing a method for real-time image acquisition; and, finally, addressing issues of evolving networked multimedia. For clarity’s sake, examples are provided, solutions diagrammed, and applications discussed. While real-time imaging is the ongoing premise, the focus is on what makes it possible, and sometimes how, rather than on what it is, which is probably its most rapidly changing aspect.
1 Time and Data Types
There are as many definitions of multimedia as there are flavors of it. One characteristic of multimedia is its richness in data types, in particular time-defined types. Moreover, since the set of data types is open (i.e., more can become available at any time), richness is not only a characteristic but also a challenge. Richness in data types affords flexibility in conveying information, especially information pertinent to dynamic phenomena. It also supports the various functions of multimedia and its broad field of applications. Among the data types currently integrated in real-time imaging platforms are text, computer-generated images, electronic photography, imported still images, analog and digital video (in standard formats), high definition TV (analog and digital, real-time or pre-recorded), sound (real-time input or pre-recorded, in analog or digital format), real-time images from electron microscopy, radioastronomy, seismology, or manufacturing processes, animation (on film, on video, or computer-generated), film in its known formats, haptic data via transducers, and more. There is no limit, especially if we examine multimedia not in its canned forms (primarily CD-ROM), but as a dynamic system that can be used in almost any kind of human activity. Although to date games and documentary applications (historic accounts of events or accomplishments, in the form of “electronic books” subject to interactive queries; exemplary case studies in medicine, geology, physics, and other fields), as well as teleconferencing, probably dominate the field, interactive multimedia is already deployed in many other activities. It covers e-mail, telemedicine, design and engineering, CAD/CAM, and entertainment. Businesses adopted the technology to improve marketing, but also for more effective information processing. Education is rapidly integrating it in the dissemination of subject matter for which the book, slides, the overhead projector, and video are no longer adequate. 
Training based on interactive multimedia applications running over client-server networks is now a practical solution to a problem businesses and state agencies have faced for a long time. And, of course, the military made multimedia part of a new way to conceive and solve tasks in an age of fast changes in the methods and means of warfare. Regardless of the intended function, one thing should be clear from the outset: the fact that interactive multimedia integrates various data types is a curse and a blessing at the same time. One-dimensional, homogeneous means of communication of computer origin were governed by a more easily manageable discipline. In the multimedia space of many means, coordination becomes extremely difficult. What might have been, for better or worse, a matter of culture and intuition in the simpler world of computer graphics or desktop publishing now simply escapes human control. Unless we realize what we have to do to cope with complexity, chances are that more will not translate into better.
1.1 Hardware Structure
General purpose computers, as well as specialized computers providing high-performance processing, gave real-time imaging the power and diversity expected in more complex endeavors. Advanced graphics (including imaging), video, fast and effective rendering (extended to 3-D), not to mention the ever richer feature set of image display methods and technologies, are part of the technological advancement of multimedia. The dominant real-time imaging structure still unites a geometry engine and a raster processor. Along the polygon-processing pipeline, huge amounts of data flow to the display component, where texture mapping, transparency, local shading, etc., are integrated into what finally appears as an interactive image. Further progress, from operating system concepts on down (multithreading, for instance), and obviously in the area of parallel processing, will eventually make real-time more real. The most seductive, as well as the most bare-bones, multimedia configurations are based on the traditional real-time imaging structure. Nevertheless, in order to address goals for which the static image, and even computer animation, are less suited, this configuration was extended into a combination in which software and additional hardware (all kinds of boards) compensate for limitations inherent in the computer graphics structure. Of decisive importance were progress in high capacity storage and the early adoption of the standards that defined the generic CD-ROM medium. As a publishing medium, CD-ROM and its relatively limited number of variations (CD-ROM/XA for fully lip-synchronized, interleaved motion and sound, CD-I, CD-V, Photo-CD) made a viable alternative to proprietary formats possible. 
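To make the two-stage structure concrete, here is a minimal Python sketch of a geometry stage feeding a raster stage. It is a toy under stated assumptions, not any vendor's pipeline: the function names `geometry_stage` and `raster_stage` are hypothetical, the projection is a bare perspective divide, and the raster stage fills one flat-colored triangle into a small frame buffer with edge-function tests, leaving out texture mapping, transparency, and shading.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]

def geometry_stage(vertices: List[Vec3], width: int, height: int,
                   focal: float = 1.0) -> List[Vec2]:
    """Project camera-space vertices (z > 0) to screen coordinates."""
    screen = []
    for x, y, z in vertices:
        # Perspective divide, then map normalized [-1, 1] coordinates to pixels.
        nx, ny = focal * x / z, focal * y / z
        sx = (nx + 1.0) * 0.5 * width
        sy = (1.0 - ny) * 0.5 * height   # screen y grows downward
        screen.append((sx, sy))
    return screen

def edge(a: Vec2, b: Vec2, p: Vec2) -> float:
    """Signed-area test: which side of the edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def raster_stage(tri: List[Vec2], width: int, height: int,
                 color: int = 1) -> List[List[int]]:
    """Fill one triangle into a frame buffer by sampling pixel centers."""
    fb = [[0] * width for _ in range(height)]
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:
        return fb   # degenerate triangle covers no pixels
    for py in range(height):
        for px in range(width):
            p = (px + 0.5, py + 0.5)
            ws = (edge(b, c, p), edge(c, a, p), edge(a, b, p))
            # Inside when every edge test agrees in sign with the triangle's area.
            inside = all(w >= 0 for w in ws) if area > 0 else all(w <= 0 for w in ws)
            if inside:
                fb[py][px] = color
    return fb

if __name__ == "__main__":
    tri3d = [(-0.5, -0.5, 2.0), (0.5, -0.5, 2.0), (0.0, 0.5, 2.0)]
    W, H = 16, 8
    fb = raster_stage(geometry_stage(tri3d, W, H), W, H)
    for row in fb:
        print("".join("#" if v else "." for v in row))
```

The split mirrors the division of labor described above: the geometry stage works per vertex, while the raster stage works per pixel, which is where the "huge amounts of data" accumulate on the way to the display.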
As a result, the technical characteristics of the medium (access time in the first place) remain an area open to competitive solutions, while the format engages content providers in a competition to use the storage capacity for significant work and to make available appropriate supporting software tools (for navigation, indexing, retrieval, etc.).
1.2 A New Generic Medium
To account for the diversity of the various components of multimedia is already a very difficult task. Text on a page of an illustrated magazine or in a book (scientific treatise, popular encyclopedia, art publication) is unlike the same text in the context of a multimedia “page,” be it an e-mail message that integrates moving images and sound, or a motion clip. One can watch the author reading his poem, hear the voice, and change the typeface of the displayed text (if one chooses to have it displayed); one can add music (already existing or synthesized “on the fly”). One can animate words, or replace the reader with an actor who one thinks would do more justice to the poem. One can transform words into abstract shapes. We can, of course, visualize the world described poetically, whether as a realistic rendition or as an abstraction. All this has been shown in experimental or commercial products. The limit is that of our knowledge or imagination. Our respect for what we hear and see, our visual and musical abilities, and, yes, the multimedia platform we use affect the outcome. The same applies to applications that are not artistic or educational in nature.
Those cheap machines that integrate a CD-ROM drive, maybe even a laser disc player, and that their manufacturers call “the first multimedia station,” are not yet what they are proclaimed to be. If they were, they could support a design application suitable for architecture, product design, or event design (remember the Olympics?). The video sequence that an architect captured at the site of a commissioned building can be integrated in the architectural design; each new sketch can be turned into a 3-D rendition, placed on tape, and viewed from all possible angles. Fly-over, our obsession since the inception of computer animation, is as easy as movement through the not yet completely designed, let alone constructed, home, hotel, or university. Interactive programs, as well as non-interactive immersive virtual reality of multimedia intent, support walk-throughs. Motion is the feature (usually relying on some animation capability) supporting digital immersion. The balance sheet of the endeavor (e.g., how much such a building costs, how much a change in design or materials would affect time and cost) is part of such an interactive multimedia environment. So is the possibility of querying other databases in order to validate one design decision or another. The multimedia database is a powerful dimension of this new level of real-time imaging in action. New products, or events involving thousands, better yet hundreds of thousands or even millions, of people, can be designed in a multimedia environment. An event designer literally “sees” what was planned in a multimedia rendition before the event takes place, and maximizes the intended effect of each sequence. Certainly, if trivialized, multimedia turns out to be merely a sleeker form of presentation: instead of one slide projector, use ten (and never forget the fade-in, fade-out effects from a dissolve unit); and instead of a video presentation, use a wall of monitors into which images from various tapes are fed simultaneously. 
You know the rest.
Many innovative projects, integrating video, film, 3-D images, and (why not?) virtual reality methods and gear, result in multimedia artifacts never experienced before. To a great extent, the project to repair the Hubble telescope was a multimedia development with a virtual reality component. Technologically, all the parts exist for endeavors of such and even higher complexity. We can “write” digital images to videotape; we can integrate video into digital computer graphics and animation. We can edit video, film, and digital sequences from a multimedia station. We can output to hard disks, to digital or analog video, to film, to printed matter, and to holography. But in order to achieve quality and value, we need to understand what all this means. The business community is interested in getting interactive multimedia presentations over networks. Powerful servers and effective visual database management programs reflect this interest. Imagine a business meeting where one can effectively show how on-line point-of-sale financial results translate into an animation of redesigned products (“Forget the blue, it doesn’t sell! And change the length”), or a new store design (to avoid bottlenecks at checkout, or to minimize loss through shoplifting at Christmas time); or how a TV ad for a new product makes it into the games children play. A simulated world of the new genetics of individual diet can effectively translate into action taken by those concerned about how they look or how much they weigh, showing how swimming, walking, or simply becoming aware of the control they have over their own decisions affects their health and life expectancy. Or, for more demanding souls and minds, there can be generic shows of artists revered by the public, of the works we would like to hang on our walls at home, or of the music or poetry we would like to experience. Again, technologically all this is possible today. 
The challenge is in understanding the aesthetic implications of “speaking” and “writing” in the multi-dimensional language of multimedia, with time as the critical dimension.
Interactive multimedia can serve as a design tool and a visualization medium. It can be an originator of communication (using heterogeneous sources of information) and of political activism, or a medium for electronic publishing (laser discs, CD-ROM, CD-I, video in both analog and digital formats, and so on). It is a participatory medium, not merely a device with mixed output. Interactive multimedia is an exceptionally powerful educational environment: it allows for active discovery in the process of teaching and learning, supporting individualized “navigation” in the rich world of knowledge and experience.
1.3 The Gnoseological Platform
From among the many forms in which multimedia enters our practical experiences, the scientific endeavor leading to new knowledge stands out. Indeed, as knowledge in our age becomes more and more computational, the acquisition of knowledge and its experimental validation require gnoseological platforms on which real-time scientific goals can be effectively pursued. Fundamentally, the gnoseological platform made possible by real-time imaging has already benefited and changed science. It affords a new medium for representation, a medium for experimentation (simulation), and a new medium for designing (including the design of new experiments) based on the knowledge computationally acquired.
Knowledge in computational form succeeds where previous analytical attempts, many of them based on inductive procedures (from observations translated into data, to information processing), failed. Chaos theory is an example. However crude the graphics routines available to Lorenz, Feigenbaum, Mandelbrot, and others when they made their observations, without computers our realization of chaotic behavior (population growth, fluid dynamics, the physics of weather patterns, etc.) would have been delayed. On real-time systems, initial knowledge was often refined, fractals emerged, and the new science from which they stem was put to use in imaging technologies (for instance, in compression algorithms). Computational science became possible once computer representations (or visualizations, as they are called), in a medium richer than graph paper, photography, film, video, and even holograms, allowed for the development of a body of knowledge otherwise impossible. Radioastronomy illustrates this quite convincingly. The birth of a new galaxy is not a static subject, but a dynamic process requiring powerful visualization methods. Regardless of how spectacular all these are, we are already past this level. Although multimedia is far from being the reality some claim to design and others to manufacture and sell (we will return to this soon), it has already established the next level of possibilities and expectations on the gnoseological platform afforded by real-time imaging.
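The population-growth case can be reproduced in a few lines, which shows why computation mattered. The sketch below iterates the logistic map, x(n+1) = r x(n) (1 - x(n)), a textbook population model chosen here as an illustration (the helper names are hypothetical). At r = 2.8 two nearly identical starting populations converge to the same fixed point; at r = 4.0 a difference of one part in a billion grows to order one within a few dozen steps.

```python
def logistic_orbit(x0: float, r: float, steps: int) -> list:
    """Iterate x_{n+1} = r * x_n * (1 - x_n) and return the whole orbit."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit

def first_divergence(r: float, x0: float, delta: float, steps: int,
                     tol: float = 0.1):
    """First step at which two nearby orbits differ by more than tol, else None."""
    a = logistic_orbit(x0, r, steps)
    b = logistic_orbit(x0 + delta, r, steps)
    for n, (u, v) in enumerate(zip(a, b)):
        if abs(u - v) > tol:
            return n
    return None

if __name__ == "__main__":
    # r = 2.8: orbits settle to the fixed point 1 - 1/r; the tiny difference dies out.
    print(first_divergence(2.8, 0.2, 1e-9, 200))   # None
    # r = 4.0: chaotic regime; the 1e-9 difference is amplified to order one.
    print(first_divergence(4.0, 0.2, 1e-9, 200))
```

Nothing in the formula announces the difference between the two regimes; it only becomes visible by running it, which is the point the paragraph makes about computational knowledge.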
Scientific metaphors are as much metaphors as those of poetry. Only their intention is different. On the interactive multimedia gnoseological platform, hypotheses can be formulated visually, or even in some syncretic form that combines moving images, sounds, alphanumeric sequences, etc. Genetic research, as much as artificial life (AL) inquiries, is finding in multimedia a richer “language.” This composite “language” results from the cooperation and coordination of other languages. The production of meaning in this language is quite different from any other semantics we are aware of.
1.3.1 Science and Aesthetics
Despite the many differences among these three forms of computational acquisition, representation, and communication of knowledge, one important aspect appears to be shared among them: they all assume and rely on an underlying aesthetic component. More precisely, in all previous forms of knowledge, such as the language of theories, mathematical or logical formulations, or the positivist approach of experiments, the aesthetic component is reducible to what is called elegance (of formulations, formulae, experiments). Mathematicians, physicists, chemists, and many others in the sciences have written about the beauty of theories, a beauty that seems to parallel their appropriateness or even to validate their truth. But once science becomes computational, and computation offers powerful real-time imaging, the underlying aesthetic becomes more complex. Moreover, since decisions of an aesthetic nature, such as the selection of color codes (e.g., for the representation of abstract entities) or visual conventions (for dynamic phenomena), have to be made prior to the scientific formulation, it is clear that such aesthetic decisions “tint” the knowledge. Therefore, it is not unusual nowadays for research facilities (some of the supercomputing centers are known for this) to include on their teams people qualified to handle questions of aesthetics. In the world of graphics and visualization, graphic designers were sufficiently qualified to support scientists in their attempts to formulate new hypotheses visually. That in some cases images coming from the gnoseological platform are still ugly, or far from clear, is a phenomenon experienced inside and outside the scientific community. With the advent of animation, the situation became even more critical. Multimedia, which integrates real time, faces even bigger challenges. As a still new experience, it has probably failed on aesthetic grounds as often as, if not more often than, on technological and scientific grounds. 
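How a color decision precedes and shapes the reading of data can be seen in the simplest case: mapping a scalar field to color. The sketch assumes a piecewise-linear colormap built from anchor colors; both anchor lists (`grayscale` and `diverging`) are hypothetical choices for illustration, not any facility's convention. The same values passed through the two maps produce different visual emphasis, which is exactly the kind of prior aesthetic decision that "tints" what the viewer takes away.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

def make_colormap(anchors: List[RGB]):
    """Return a function mapping t in [0, 1] to RGB by linear interpolation."""
    if len(anchors) < 2:
        raise ValueError("need at least two anchor colors")
    segments = len(anchors) - 1

    def cmap(t: float) -> RGB:
        t = min(max(t, 0.0), 1.0)           # clamp out-of-range values
        pos = t * segments
        i = min(int(pos), segments - 1)     # segment that contains t
        f = pos - i                         # fraction within that segment
        a, b = anchors[i], anchors[i + 1]
        return tuple(round(a[k] + (b[k] - a[k]) * f) for k in range(3))

    return cmap

def normalize(values: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]

# Two hypothetical encodings of the same data.
grayscale = make_colormap([(0, 0, 0), (255, 255, 255)])
diverging = make_colormap([(0, 0, 255), (255, 255, 255), (255, 0, 0)])

if __name__ == "__main__":
    data = [0.0, 2.5, 5.0, 7.5, 10.0]
    for v, t in zip(data, normalize(data)):
        print(v, grayscale(t), diverging(t))
```

With the grayscale map the midpoint is an unremarkable middle gray; with the diverging map it becomes a neutral white around which blue and red read as opposites, a semantic claim the data themselves never made.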
This is why, before attempting to think of a multimedia configuration, regardless of its intended function (entertainment, education, business, medicine, gnoseological platform, etc.), one needs to account for its intrinsic aesthetic characteristics.
Indeed, the complexity of the tasks necessitating the deployment of interactive multimedia is paralleled (make no mistake about this!) by the aesthetic complexity on whose account multimedia finally succeeds or fails. Having asserted this, some further considerations regarding the matter have to be made.
2 What We Know and What We Have to Discover
Since Gutenberg (and the Chinese well before him), we have learned how to deal with the printed text. Since Niépce and Daguerre, we have learned heliography, i.e., how to “draw with light,” that is, how to take photographs. Since Muybridge, Dickson, and the Lumière brothers (at the end of the 19th century), we have learned to work with film; and later on, with Nipkow, Leblanc, and Zworykin, to use television. Our experience with computer graphics, over more than 25 years now, has also taught us many things about a new medium, as well as a new way of thinking. We have integrated the experiences of printing and computer graphics, making desktop publishing a reality. We have integrated scientific data acquisition, design knowledge, and imaging into powerful visualization. Aesthetic structure gave coherence to our knowledge, which is increasingly made up of fragments of specialized research.
When all seemed clear and settled, a new perspective opened: interactive multimedia. Curiously, while the graphics pipeline (described in the introductory lines), as it evolved in a rather static environment, is still maintained, the new perspective opened by multimedia is fundamentally dynamic. Interactive multimedia is one among several new technologies resulting from progress in computation. And it is one of the fastest growing, both technologically and in business terms. Unfortunately, it reached this status of success even before its many originators could take the time to understand what it is. They rushed to patent almost trivial aspects instead of asking themselves some fundamental questions. (Compton’s New Media Hypercard Handbook is the best known example.) For many, it is just a gimmick, a buzzword that brings in grant money and new clients. For others, it is a new name for what they did before, but now with faster computers, more memory, and better output devices.
2.1 The Challenge of Dynamics
For those really serious about what they do and how they use their talent, knowledge, and money, it is a field still in the process of defining itself. This process is not easy. We know a lot about printing, photography, typography, graphics, video, and film. Each carries aesthetic assumptions already integrated in our culture. Each field established expectations of quality. Experience in these media showed that type is not reducible to the rules of calligraphy, a picture is not a drawing, and a movie is much, much more than a sequence of moving photos. While there are common qualities, usually defined as aesthetic characteristics (symmetry, rhythm, harmony of colors or shapes), each of these media has its own condition. The constraints of type (even taking into consideration the vast difference between the hot type of yesteryear and today’s digital type) are dissimilar in nature from the constraints of photographic film. The qualifiers “slow” or “high-speed,” which describe sensitivity and granularity, are only the tip of the iceberg. So are the many different ways of processing and printing. The dynamic quality of cinematography is, despite appearances, different in nature from the dynamic quality of video. Film resolution, understood as part of the intrinsic aesthetic condition of the medium, has yet to be matched by any other visual medium we are aware of.
Multimedia, as already stated, is different from the particular media it integrates. In interactive multimedia, text and image and movement, the worlds of the digital and of the analog, sources of images, sounds (from reality or synthesized), animation, and everything else that people use in expressing themselves can be united. Interactive multimedia is effective when all its components are well designed and their integration results in an expressive unity open to the viewer’s active participation in the work, regardless of its function. Interactivity means the integration of the viewer in the work.
2.2 The Multimedia Book
The outcome of good interactive multimedia recalls the Gestalt principle: the resulting dynamic entity is more than the sum of its parts. Non-linearity and non-sequentiality, which are intrinsic to any visually oriented activity (integrating sound or not), confer upon interactive multimedia possibilities that no other medium or tool has when taken independently.
The new kind of thinking required by multimedia is not the result of combining what some knew about type, what others knew about photo or video cameras, and what yet others knew about computer graphics and animation, optical storage, and HyperCard (or any other hypertext embodiment). While film scripting and editing come closer to multimedia, they are still not at the same level of complexity. I was once commissioned to find out what it would take to create a CD-ROM of one of Isaac Asimov’s stories about the future (I, Robot). A book publisher, which quite conveniently owned the rights to the story, wanted to step into the world of interactive multimedia. The product had to allow an animated fictional character to guide the reader through the text, making visual each part of the world Asimov described. It also had to introduce the writer himself (at that time seriously ill, but enormously interested in a “new book” of interactions and anticipatory characteristics). A game, i.e., the making of a robot from parts described in the text, was also desired. This introduced an interactive dimension, a challenge to the “reader.” Finally, if the “reader” so wished, the story should be printable from the disc so that the pleasure and intellectual reward of a literate understanding of the text would continue the digital multimedia journey. To accomplish all this properly, one had to design an interactive digital book shell, not merely illustrate Asimov’s text. The project, if approached properly, is neither technologically nor aesthetically trivial. Such a project changes our cultural notion of a book, of reading, of interaction. It goes well beyond the mediocre SONY and Voyager products marketed as multimedia, and beyond what the publisher finally produced. To trivialize instead of adding new dimensions to past values is a danger interactive multimedia should avoid at any price.
2.3 Conflicting Demands
To design multimedia means many things. Obviously, it means understanding all that can fuse in the new syncretic language, being aware of technical and communicational aspects, being sensitive to the viewers’ cognitive characteristics, and being willing to challenge stereotypes. As someone said, the public at large, and technology innovators in particular, are more familiar with bad design and conditioned to prefer it. This is what we receive through all media most of the time. Still, for multimedia to succeed, it has to develop means and methods of expression that ensure an effective trade-off between functional expectations and aesthetic goals. Information-driven design is the place to start. Indeed, in defining the specific problem addressed in a given case, the designer acknowledges two sets of choices: a) what has to be achieved; b) how to achieve the goal. Functional considerations can be formalized in an information processing language. Aesthetic considerations are formal in nature, but functional in the context we work in. Contrast, for instance, is a powerful aesthetic element. Used appropriately, it enhances functionality. Used excessively, it can distract attention from other elements.
There is no list of aesthetic goals that can be checked off with the expectation that once each element is in place, the whole is right. But there is an aesthetic rationality that starts with a general expectation of elegance (the Latin root of the word associates elegance with selection, with choosing) and cultivates simplicity. A multimedia design needs to promote integrity. Corruption of expressive means (colors, shapes, rhythms, and others) is as bad as corruption of data. In order to achieve integrity, means of expression have to be kept to an expressive minimum. In order to achieve this minimum, which depends on the context, the designer works through stages of refinement. In so doing, each formal element is evaluated for how appropriate (or not) it is within the broad scope of the multimedia work.
Since multimedia is a mixed medium, its design starts by acknowledging time dependency. Scale, contrast, and proportion are all time sensitive. The language of multimedia might be visually dominant, but it is not reducible to what we see alone; it also depends on when and how we see it, and on how what we see and its timing enhance each other. Under certain circumstances, an image can be purposely fuzzy (not enough contrast) if the sound or another time-based element (movement) compensates. On paper, lack of contrast is deadly. In multimedia, it can be a powerful way to attract attention to something that otherwise would pass by in the flow of images.
Aesthetic integration is of extreme importance. This means acknowledging not only how various data types are properly processed, but also how a “house” is built from the many different “bricks” available in the digital realm. We will probably have to expand the Gestalt body of knowledge that deals with images (the figure/ground strategy, for instance) to see how composite multimedia can take advantage of the “hard-wired” characteristics of human holistic perception. Perceptual structuring (which is what Gestalt means) makes us aware of the role of proximity (what is close to what, in terms of an expanded notion of closeness, e.g., which sound is close to a fast-moving laser beam), similarity, continuity, and closure. This can translate into design strategies for grouping components of multimedia, for introducing hierarchies, or for building on shared conventions of aesthetic relationship (color contrast related to sound contrast, for instance).
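The expanded notion of proximity can be given a simple operational form. The following sketch is an assumption for illustration, not an established method: it treats each media event (a sound onset, a visual cut, a moving beam) as a timestamped item and chains events into one group whenever consecutive onsets fall within a chosen threshold, a temporal analogue of grouping by proximity. The `Event` type and the 0.2 s default threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    name: str
    kind: str      # e.g. "sound", "image", "motion"
    onset: float   # seconds from the start of the piece

def group_by_proximity(events: List[Event], threshold: float = 0.2) -> List[List[Event]]:
    """Chain events into groups when consecutive onsets are within threshold seconds."""
    groups: List[List[Event]] = []
    for ev in sorted(events, key=lambda e: e.onset):
        if groups and ev.onset - groups[-1][-1].onset <= threshold:
            groups[-1].append(ev)   # close in time to the previous event: same group
        else:
            groups.append([ev])     # gap too large: start a new perceptual group
    return groups

if __name__ == "__main__":
    timeline = [
        Event("laser sweep", "motion", 1.00),
        Event("whoosh", "sound", 1.10),
        Event("title fade", "image", 3.00),
        Event("chord", "sound", 3.15),
        Event("cut to map", "image", 6.00),
    ]
    for g in group_by_proximity(timeline):
        print([e.name for e in g])
```

Grouping across data types is deliberate here: the laser sweep and the whoosh land in one group even though one is visual and the other auditory, which is the cross-modal closeness the paragraph describes.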
2.4 The Road to Interactivity
Regardless of the purpose for which it is used, interactive multimedia needs to be well conceived, appropriately designed, and technologically comprehensible and manageable. This last expectation raises, among other things, critical aspects of the role interfaces play in interactive multimedia. Production tools already come with a language of operations that extends the platform-dependent user interface. Presentation tools introduce new conventions. Ideally, there should be no real gap between the two, and the viewer/participant should not be subjected to learning yet another interface (or to reading hundreds of pages of manuals). All these expectations are a tall order. Multimedia requires the integration of technological means, higher-level design of the heterogeneous ways and means people use to express themselves, and dynamic qualities. It is non-sequential; that is, it is a configurational space of expression and communication. It implies an appropriate understanding (cognitive, cultural, and sociological) of how people relate to such a wealth of means of expression, and of the dangers of manipulation and disconnectedness from reality. Last, and perhaps most difficult, it implies an understanding of interactivity as a way of unleashing human creativity, and of eventually making us all spiritually richer through vastly shared experiences.
3.1 Analog and Digital
In order to record computer graphics animation on video, one has to convert digital data into analog signals that meet standard display characteristics. A computer graphics monitor is typically a red-green-blue (RGB) component display receiving a non-interlaced 1280 x 1024 signal scanned at 75 Hz. Broadcast video, whether under the 1941 NTSC (National Television System Committee) standard or under the two European systems of more recent date known as PAL (Phase Alternating Line) and SECAM (Sequential Color with Memory, used mainly in France), is a composite interlaced signal of 525 lines (in the USA) or 625 lines (in Europe), scanned at about 30 frames per second (NTSC) or 25 frames per second (PAL and SECAM). The problem at hand is the conversion from the computer graphics display to broadcast video. The converse problem is how to convert a video signal in the standard formats mentioned into digital data. In its generality, this problem extends to high definition (analog and digital) television and recording technologies, seen as input, output, or both. Evidently, moving images are quite demanding in terms of memory and synchronization. Live video usually includes sound, and this adds another level of complexity.
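A back-of-the-envelope calculation shows why dedicated scan-conversion hardware is needed. The sketch compares approximate horizontal line rates using the figures quoted above, under a simplifying assumption stated in the code: blanking intervals are ignored, so real timings are somewhat higher, and color NTSC actually runs at 29.97 rather than exactly 30 frames per second. The helper names are hypothetical.

```python
def line_rate_hz(lines_per_frame: int, frames_per_second: float) -> float:
    """Approximate horizontal scan rate: lines drawn per second (blanking ignored)."""
    return lines_per_frame * frames_per_second

def pixel_rate_hz(width: int, height: int, refresh_hz: float) -> float:
    """Approximate active pixel rate of a non-interlaced display."""
    return width * height * refresh_hz

workstation = line_rate_hz(1024, 75)   # 1280 x 1024, non-interlaced, 75 Hz
ntsc = line_rate_hz(525, 30)           # 525 lines, ~30 frames/s (two interlaced fields each)
pal = line_rate_hz(625, 25)            # 625 lines, 25 frames/s

if __name__ == "__main__":
    print(f"workstation: {workstation / 1e3:.1f} kHz lines, "
          f"{pixel_rate_hz(1280, 1024, 75) / 1e6:.1f} Mpixel/s")
    print(f"NTSC: {ntsc / 1e3:.2f} kHz, PAL: {pal / 1e3:.3f} kHz")
    print(f"ratio workstation/NTSC: {workstation / ntsc:.1f}")
```

Even in this rough form the workstation scans lines almost five times faster than broadcast video, so the signal cannot simply be passed through; it has to be resampled in both resolution and timing.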
Keeping the entire approach simple, let us follow a concrete situation. Computer graphics animation is sent through a frame buffer to video-encoding hardware. Digital frames are thus converted into an analog component (RGB) signal. A scan converter lowers the scan rate of the component signal coming out of the buffer to the horizontal and vertical refresh rates of a video monitor; within the same process, the resolution is reduced. A video encoder (NTSC, PAL, or SECAM) then translates the non-interlaced component (RGB) signal into a composite interlaced signal. A so-called sync generator synchronizes the video output with broadcast timing standards. Genlock, i.e., locking the signal to the timing of the deflection (the “electron gun”) of the TV, is of extreme importance for video quality. A video camera could, under some circumstances, capture the desired image directly from the computer screen. But since this does not provide frame-accurate recording, the resulting video quality will be no more acceptable than a Polaroid picture compared to a quality photograph.
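Two of the encoder's tasks can be imitated in software to show what "non-interlaced to interlaced" and "component to composite" involve. The example below is a simplification, not a description of any particular encoder: it splits a progressive frame into its two fields (alternate lines) and converts RGB to the YIQ luminance/chrominance representation used by NTSC, with the commonly cited coefficients. It does not model subcarrier modulation, sync, or the downward resolution change, and the helper names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # R, G, B in [0, 1]
Frame = List[List[Pixel]]            # rows of pixels, top to bottom

def split_fields(frame: Frame) -> Tuple[Frame, Frame]:
    """Split a progressive frame into two interlaced fields."""
    even_field = frame[0::2]   # lines 0, 2, 4, ...
    odd_field = frame[1::2]    # lines 1, 3, 5, ...
    return even_field, odd_field

def rgb_to_yiq(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert RGB to YIQ: luminance Y plus chrominance I and Q."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return y, i, q

def encode_field(field: Frame) -> List[List[Tuple[float, float, float]]]:
    """Apply the color-space conversion to every pixel of one field."""
    return [[rgb_to_yiq(*px) for px in row] for row in field]

if __name__ == "__main__":
    white, red = (1.0, 1.0, 1.0), (1.0, 0.0, 0.0)
    frame = [[white, red] for _ in range(4)]   # tiny 2 x 4 test frame
    even, odd = split_fields(frame)
    print(len(even), len(odd))                 # 2 2
    print(encode_field(even)[0])
```

White maps to full luminance with zero chrominance, which is why a monochrome receiver can use the Y component alone.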
Another possible avenue is storing animation frames on analog or digital videodisc recorders (frame storage devices). Since they are also frame accurate and offer large storage capacity, such devices help handle the memory bottleneck of live video processing. Up to 80,000 frames (and, recently, up to 60 minutes of live video) can be recorded. These media are either write-once-read-many (WORM) or rewritable.
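The capacity figures translate into playing time, and into the memory the same footage would occupy if it were held uncompressed in a computer. The sketch assumes 30 frames per second and a 640 x 480 frame at 24 bits per pixel; these are illustrative assumptions, not properties of any specific recorder, and the helper names are hypothetical.

```python
def playing_time_minutes(frames: int, fps: float = 30.0) -> float:
    """Duration of a frame sequence at the given frame rate, in minutes."""
    return frames / fps / 60.0

def uncompressed_bytes(frames: int, width: int = 640, height: int = 480,
                       bytes_per_pixel: int = 3) -> int:
    """Storage needed to keep every frame as raw pixels (no compression)."""
    return frames * width * height * bytes_per_pixel

if __name__ == "__main__":
    frames = 80_000
    print(f"{playing_time_minutes(frames):.1f} min at 30 frames/s")
    print(f"{uncompressed_bytes(frames) / 1e9:.1f} GB uncompressed")
    # Frames needed for 60 minutes of live video at 30 frames/s:
    print(60 * 60 * 30)
```

Under these assumptions 80,000 frames amount to roughly 44 minutes of video but about 74 GB of raw pixels, which puts the "memory bottleneck" in concrete terms.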
On the video-to-computer side, an analogous sequence has to be followed in reverse. Again, some solutions are standard (and have meanwhile been integrated into some desktop workstations); others are proprietary.
Similar paths must be provided for the processing of sound. Usually, sound synthesis and interfaces to electronic instruments are also provided. The low-cost digital-to-analog converter chip was superseded by computer-based audio digitized at 16 bits and 44.1 kHz, the resolution of the standard audio CD. In recent years, some of these functions were integrated into computer workstations (from the pioneering NeXT station, of mixed memories, to Silicon Graphics, Sun, Hewlett-Packard, and the Apple Quadra machines). Analog-to-digital (A/D) converter circuits and digital signal processors (DSPs) are among the standard features of machines designed with multimedia applications in mind. The real-time performance of digital signal processors (e.g., the 24-bit DSP56001 from Motorola or the 32-bit DSP3210 from AT&T) ensures that sound is processed as it is input (via microphone, synthesizer, interface, electronic instruments, video channels, etc.). It must be pointed out that, with the increased attention to voice recognition, this technology is expected to become a standard in interactive multimedia. The DSP chip clearly helps in this task.
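The CD-audio resolution quoted above translates directly into a data rate and a storage budget. A minimal sketch, assuming uncompressed two-channel (stereo) PCM, which is how audio CDs store sound:

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def pcm_bytes(seconds: float, sample_rate_hz: int = 44_100,
              bits_per_sample: int = 16, channels: int = 2) -> float:
    """Storage needed for uncompressed PCM audio of the given duration, in bytes."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample, channels) * seconds / 8

print(pcm_bit_rate(44_100, 16, 2))   # 1411200 bits per second
print(pcm_bytes(60) / 1e6)           # about 10.6 MB per minute of stereo audio
```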
3 A Configuration Shell
Having looked at interactive multimedia from a functional perspective (from the gnoseological platform to the minimal combination of a computer and a CD-ROM player, or a battery of players) and from the perspective of the underlying aesthetics, we can conclude that different applications will require different configurations. There is no off-the-shelf multimedia system that can satisfy each and every demand. The major effort in designing a multimedia configuration lies in determining the set of data types and the aesthetic constraints culturally acknowledged with respect to each type. In addition, interfacing issues (process interfaces as well as user interfaces) become critical in providing the best return on investment. Generally speaking, although multimedia became possible precisely because faster real-time imaging machines, at the core of this platform, are cost effective, these platforms almost always incorporate expensive proprietary technologies. Moreover, awareness of intellectual property is a significant factor in pricing the visual databases from which multimedia applications will, in time, extract more and more information.
Having surveyed many kinds of applications, we can suggest a shell configuration: a design that was actually implemented for purposes as diverse as the design and production of multimedia products (including laser discs and CD-ROM), design research, modeling and simulation, and even knowledge acquisition and dissemination. Without going into detail, we have to explain some of the concepts and technologies incorporated in the configuration.
3.2 Voice and Handwriting Recognition
When searching through huge image and sound databases, it is counter-intuitive (and counterproductive) to be constrained to keyboard input. Yet the state of the art is still some steps away from commands based on voice recognition. Even handwriting recognition is not yet effective enough to justify a pen-based computer as the front end for the search engine. Be this as it may, chances are very good that in the not too distant future (2 to 5 years), new interactive multimedia configurations will use them routinely. To accommodate further progress in areas such as voice or handwriting recognition, as well as in other aspects of multimedia, a very useful design concept is modularity. It is not that we would simply pull out the keyboard and plug in a microphone at some future moment; rather, we could preserve the integrity of the configuration while working on the new interfaces, drivers, and routines that allow for better solutions. Together with modularity, we should build upon the expectation of networked multimedia, that is, distributed interactive multimedia. Of course, this layer introduces the constraints and standards of high-speed broadband communication (and the appropriate protocols), which extend beyond the scope of this text.
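The modularity argument can be illustrated with a small interface: the search engine depends only on an abstract query source, so a keyboard front end can later be exchanged for a voice or pen front end without touching the rest of the configuration. All class names here are hypothetical, and the speech module is a stub standing in for a future recognizer.

```python
from abc import ABC, abstractmethod
from typing import List

class QuerySource(ABC):
    """Any front end that can deliver a search query as text."""
    @abstractmethod
    def next_query(self) -> str: ...

class KeyboardSource(QuerySource):
    def __init__(self, typed: List[str]):
        self._typed = list(typed)  # pre-typed queries stand in for live keyboard input

    def next_query(self) -> str:
        return self._typed.pop(0)

class SpeechSource(QuerySource):
    """Hypothetical stub: a real module would wrap a speech recognizer."""
    def __init__(self, transcripts: List[str]):
        self._transcripts = list(transcripts)

    def next_query(self) -> str:
        return self._transcripts.pop(0).lower().strip()

class SearchEngine:
    """Knows nothing about the input device, only about the QuerySource interface."""
    def __init__(self, catalog: List[str]):
        self.catalog = catalog

    def run(self, source: QuerySource) -> List[str]:
        query = source.next_query().lower()
        return [item for item in self.catalog if query in item.lower()]

engine = SearchEngine(["Rembrandt portrait", "Expressionist landscape", "Dutch landscape"])
print(engine.run(KeyboardSource(["landscape"])))   # ['Expressionist landscape', 'Dutch landscape']
print(engine.run(SpeechSource(["  Portrait "])))   # ['Rembrandt portrait']
```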
3.3 A Multimedia Publisher Environment
The configuration I am presenting is an attempt to integrate the desktop, videotop, soundtop, and the production means necessary for output in a variety of media. Omitted are the specific input and output devices that a scientist would use in the gnoseological platform, or a business executive in a presentation (slide projector controllers, dissolve units, LCD displays, large-screen projectors, etc.). The configuration is broken into two modules: Editing and Recording. Other modules can be added as the nature of the activity requires. Instruments for monitoring the process, in particular the level and quality of signals, are indicated; their presence (or absence) does not affect functionality. Switching can be performed via physical patches or through soft-patching in a programmable router.
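Soft-patching in a programmable router amounts to maintaining a crosspoint table in which each destination listens to at most one source, while one source may feed many destinations. A minimal sketch; the device names are hypothetical placeholders, not the actual components of the configuration.

```python
from typing import Dict, List, Optional

class Router:
    """A crosspoint router: every destination listens to at most one source."""
    def __init__(self, sources: List[str], destinations: List[str]):
        self.sources = set(sources)
        self.routes: Dict[str, Optional[str]] = {d: None for d in destinations}

    def patch(self, source: str, destination: str) -> None:
        if source not in self.sources:
            raise ValueError(f"unknown source: {source}")
        if destination not in self.routes:
            raise ValueError(f"unknown destination: {destination}")
        self.routes[destination] = source  # replaces any previous patch on this destination

    def listeners(self, source: str) -> List[str]:
        """All destinations currently fed by a source."""
        return sorted(d for d, s in self.routes.items() if s == source)

# Hypothetical devices from an editing/recording setup.
router = Router(sources=["frame_buffer", "vtr_1", "videodisc"],
                destinations=["encoder", "monitor_a", "vtr_2"])
router.patch("frame_buffer", "encoder")
router.patch("frame_buffer", "monitor_a")   # view the signal before encoding
router.patch("vtr_1", "vtr_2")
print(router.listeners("frame_buffer"))     # ['encoder', 'monitor_a']
```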
In these two diagrams, provisions were made for monitors and measuring devices which, although they do not affect functionality, are of critical importance for the quality of the outcome. The principle is that of viewing images before and after each processing step. Regardless of how well hardware and software work, conversions always result in loss of detail or in various forms of shifting (colors, shapes, timebase, etc.). Even the first generation (original image or master copy) will not automatically guarantee the integrity of the sound, of the visual, or of their combination if the production does not adhere to strict rules of quality control and does not provide various correction facilities. Rendered animation sequences often pose major problems of color saturation (they appear "dirty" on TV monitors) or of brightness (luminance). A graphic designer, used to the medium of mixed pigments and print, is simply not equipped to address the complexities of video design. Video producers, in turn, often fail to grasp the distinction between a tape (a sequential-access medium, with constraints specific to magnetic recording) and a laser disc or, even better, a CD-ROM.
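Viewing images before and after each step can be complemented by a numeric check. Peak signal-to-noise ratio (PSNR) is used here as an assumed, commonly available measure; the text does not prescribe a metric. The sketch compares 8-bit grayscale images stored as flat lists, and the "conversion steps" are simulated by coarser quantization.

```python
import math
from typing import Sequence

def psnr(original: Sequence[int], processed: Sequence[int], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized 8-bit images."""
    if len(original) != len(processed):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no measurable loss
    return 10 * math.log10(peak ** 2 / mse)

# Simulated conversion chain: each step quantizes more coarsely (stand-in for real losses).
master = list(range(0, 256, 4))              # 64 pixel values
step1 = [v // 8 * 8 for v in master]
step2 = [v // 16 * 16 for v in step1]
for name, img in [("after step 1", step1), ("after step 2", step2)]:
    print(name, round(psnr(master, img), 1), "dB")
```

Measuring against the master, rather than against the previous step, shows how losses accumulate across generations.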
We mention all these aspects mainly in order to point specifically to areas where the underlying aesthetics is subject to corruption. Some of this corruption is reversible: a digitally scanned image can be corrected until it carries the values of the unaffected original. In other cases, such as digitizing a live video image, the corruption might not be correctable, or the reference might become unavailable. The consequences differ from application to application. Gaining knowledge about phenomena evidently requires extreme precision and faithfulness to detail, since any loss of detail taints the knowledge gained. For presentation purposes, the choice is somewhat broader. The unity among time-defined data types needs to be preserved in ways appropriate to the multimedia goal. But even in this case, distinctions must be made among the different times involved. Synchronization is implicit in the process; timelines are explicit in multimedia scripts; the real time of occurrences, natural phenomena, etc., is part of the larger temporal scheme of human life and experience. None of these can be treated without understanding how they might affect each other and the interactive quality of the multimedia experience.
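The explicit timeline of a multimedia script can be made concrete with a tiny model: each element occupies an interval on a shared clock, and the player asks which elements must be presented together at a given moment. A minimal sketch under assumed names (tracks, file names); real authoring systems add clock-drift correction and media-specific timing.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Cue:
    track: str       # e.g. "video", "narration", "caption" (hypothetical track names)
    media: str
    start: float     # seconds on the shared timeline
    duration: float

    def active_at(self, t: float) -> bool:
        return self.start <= t < self.start + self.duration

def active_cues(timeline: List[Cue], t: float) -> List[str]:
    """Media items that must be presented together at time t."""
    return [c.media for c in timeline if c.active_at(t)]

timeline = [
    Cue("video", "harbor.mov", 0.0, 12.0),
    Cue("narration", "intro.aif", 1.0, 8.0),
    Cue("caption", "Harbor at dawn", 2.0, 4.0),
]
print(active_cues(timeline, 3.0))    # ['harbor.mov', 'intro.aif', 'Harbor at dawn']
print(active_cues(timeline, 10.0))   # ['harbor.mov']
```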
4 Authoring and Navigation
Multimedia is often a real-time participatory interactive process. It can also be interactive communication, a publication or distribution medium, or a combination of these. Until the day when data, regardless of type, can be saved to a CD-ROM within a multimedia application the way we "write" data to a floppy or a hard disk, many more gigabytes and terabytes will have to pass through the pipelines of current publishing techniques. Nevertheless, for low-volume situations, such as occur in scientific applications, or for highly customized products, alternatives for proofing and limited publishing are already available as an extension of the desktop. The so-called compact disc-recordable (CD-R) offers such an alternative. Much as the desktop brought printing to the individual writer, multimedia routinely brings electronic publishing to the author.
Multimedia publication is of interest here not for its ever-changing technology, but for its many implications for managing and formatting data and for providing the means for successful navigation. It does not really matter whether the data winds up on a glass master or on a polycarbonate substrate with pre-patterned grooves topped by a photosensitive dye (the CD-R technology). What does matter is that heterogeneous contents (video, slides, sound, animation, etc.) are fused into a multimedia product that its viewers will access through regular CD-ROM players. These viewers will not know about the spiral track layout shared by CD-ROM, CD-audio, CD-I, and similar products (such as the Sony Data Discman and the Kodak Photo CD), or about the ISO 9660 file format with its strict file-naming conventions, but they will definitely realize how much more economical the medium is. From archival storage to testing work in progress (interface, interactivity, integration of various data types, etc.), CD-R provides the functions of a full-fledged CD-ROM. Once larger volumes are considered, the work can be economically produced as a full CD-ROM. Multimedia publications are 100 to 1000 times cheaper than paper. While we can no longer afford printed full-color catalogs of art, maps, furniture, cars, tourist offerings, new fashion collections, and much more, we can afford them on CD-ROM. Clinical data, teaching materials, and all kinds of catalogs can be published and interactively accessed. But for this to happen, one must either buy commercially available authoring programs or develop such programs in order to do justice to the content published. The same applies to retrieval tools.
Publishing software would need to meet the precise expectations of those using it: user interface, emulation of CD-ROM on hard disk, compliance with the ISO 9660 file-naming conventions (along with the possibility of ignoring them, e.g., when only in-house archiving is performed), format variety, control over the physical location of files, etc. The software will also have to meet expectations of a more general nature: the ability to integrate all the desired data types; good scripting facilities; and support for external commands, since the program may need to accept additional features (extensions). Last but not least, the publisher should be allowed to distribute run-time files created with the software.
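The ISO 9660 naming requirement, together with the option to ignore it for in-house archiving, can be sketched as a validator. It assumes the Level 1 rules as commonly summarized: a name of at most eight characters and an extension of at most three, drawn from uppercase letters, digits, and underscore, separated by a period, with an optional ";1"-style version suffix. The function name is hypothetical; consult the standard itself before relying on the exact rules.

```python
import re
from typing import List

# Level 1 file identifier: NAME.EXT with 1-8 and 0-3 d-characters, optional ";version".
LEVEL1_FILE = re.compile(r"^[A-Z0-9_]{1,8}\.[A-Z0-9_]{0,3}(;[0-9]+)?$")

def check_names(names: List[str], enforce_iso9660: bool = True) -> List[str]:
    """Return the names that violate Level 1 rules; empty when checking is switched off."""
    if not enforce_iso9660:   # e.g. in-house archiving where strict naming is not needed
        return []
    return [n for n in names if not LEVEL1_FILE.match(n)]

files = ["INTRO.MOV", "SLIDE_01.PCT;1", "Harbor at dawn.mov", "CATALOGUE.TXT"]
print(check_names(files))                          # ['Harbor at dawn.mov', 'CATALOGUE.TXT']
print(check_names(files, enforce_iso9660=False))   # []
```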
Multimedia is a non-sequential, non-linear medium, and publishing software should account for these characteristics. Non-sequential means that the various components do not have to be provided in the sequence of writing (letters, words, sentences); moreover, they do not have to be accessed as a linear tape is. Random access to video changes the condition of the medium, regardless of whether multimedia only integrates video or is output as video: it confers upon the medium dynamic qualities that make video interactive. Navigating through the wealth of multimedia is far more demanding than reading a book, watching a tape, or listening to music. Quite early in the evolution of multimedia, attempts were made to apply the experience of navigation in text-based contents. This is why hypertext (the visionary non-linear reading and interpretation of text suggested at the end of WW2 by Vannevar Bush in his prophetic "As We May Think") was examined as a candidate for handling complexities well beyond those of large collections of static information. In the meantime, hypertext has been embodied in commercial products (Apple's HyperCard™, Asymetrix's ToolBook™, and others).
For large collections of images associated with a comprehensive database, with video, and with sound, several ways to search, retrieve, and access can be defined. In what follows, I shall describe a simple hypermedia model inspired by the classic function of a docent (in a science or natural history museum, for instance). A docent is a guide knowledgeable about the contents of a collection. Docents continuously improve their knowledge. They can actively take notes as new data about the collection becomes available. They author articles or lectures on subsets of the collection. Assuming a collection of images on a laser disc (slides, video, sound), the following diagram explains the hypermedia situation to be handled by a docent:
On a more general level, we can think of a hypermedia document and of how information pertinent to it can be stored in card format (the card, discussed in Vannevar Bush's article, being the paradigm adopted regardless of the computer platform). Whether the content resides on laser disc, CD-ROM, hard drive, floppies, or slides at indexed positions in a remote random access slide projector, it is indexed on cards. Accordingly, the space of addresses is what the program searches through. Hypermedia extends the idea of hypertext to include still and moving images, and sound. Figure 5 shows the relationship between the Docent and a videodisc. The Docent contains the database that can be searched. The text can contain links to still images, motion picture clips, and sounds stored on some random access medium (videodisc, CD-ROM, remote random access slide projector, etc.). A link may point to a still image on the videodisc or to a motion sequence that includes sound. A single image may actually be a still frame from a motion sequence. A single Notebook Note Card can include text and recorded voice comments, as well as links to still images and to motion sequences with sound.
A Docent and its associate image database is a hypermedia document. Reading hypermedia documents is accomplished with the aid of links that create and define the web of the hypermedia document. The process of following links is called navigation or exploration. Hypermedia links can create very richly detailed presentations which simultaneously engage several of the viewer’s sensory modalities-visual, auditory, and kinesthetic.
Data Cards within the Docent contain all of the information the Docent knows about images on the videodisc. Each Data Card contains fields; every Data Card has the same number of fields and the same kinds of information.
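The card model described above (Data Cards with a uniform set of fields, and Note Cards mixing text, voice comments, and links to stills or motion sequences) can be sketched as plain data types. The field names, the use of videodisc frame numbers as addresses, and all class names are assumptions for illustration, not the Docent's actual internals.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass(frozen=True)
class MediaLink:
    """Address of media on a random access device, given as frame numbers (assumed)."""
    device: str                       # e.g. "videodisc", "cdrom", "slide_projector"
    start_frame: int
    end_frame: Optional[int] = None   # None (or equal to start_frame) means a still image
    with_sound: bool = False

    def is_still(self) -> bool:
        return self.end_frame is None or self.end_frame == self.start_frame

FIELDS = ("artist", "title", "medium", "genre")   # hypothetical field set, identical on every card

@dataclass
class DataCard:
    values: Dict[str, str]
    image: MediaLink

    def __post_init__(self):
        if set(self.values) != set(FIELDS):
            raise ValueError("every Data Card must carry exactly the same fields")

@dataclass
class NoteCard:
    text: str
    voice_comment: Optional[str] = None   # e.g. a path to a recorded comment
    links: List[MediaLink] = field(default_factory=list)

card = DataCard({"artist": "Rembrandt", "title": "Self-Portrait", "medium": "oil", "genre": "portrait"},
                MediaLink("videodisc", 10432))
note = NoteCard("Compare the lighting", links=[card.image, MediaLink("videodisc", 20000, 20450, True)])
print([link.is_still() for link in note.links])   # [True, False]
```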
4.1.1 Design Objectives
The primary design objective of Docent was to avoid overly complex navigation and information structures. The aim was a program that made it easy to navigate through large collections of text and images while providing powerful search tools for locating desired images. When creating hypertext and hypermedia programs, there is a tendency to create environments in which it is easy to "get lost" while examining links, making it difficult to get back to a particular point. Avoiding this phenomenon, the "Where am I?" problem, was a major organizational principle behind Docent. Breadcrumb trails are a typical hypermedia and hypertext navigation tool: as you look through the information, the system remembers where you have been, creating implicit links between bits of information. Usually, however, this means that one must backtrack along the trail to reach an intermediate point on it. (Jumping back to the beginning is usually quite easy.)
The organization of Docent is such that it is possible to jump from one part of the program to another very easily. Exploration using Docent is supposed to offer two major conveniences:
1. Structured search requests using Boolean logic, without assuming that users know what Boolean logic is (while not penalizing those who do);
2. An intuitive point of entry, i.e., not assuming that those searching already know what they are looking for (name of the book, melody of the song, color of a flower, etc.).
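The difference between retracing a breadcrumb trail step by step and jumping directly from one part of the program to another can be sketched as a navigation history that supports both. The class and method names are hypothetical, and `jump_to` assumes the target has already been visited (otherwise `list.index` raises `ValueError`).

```python
from typing import List

class Trail:
    """Breadcrumb trail of visited cards that also allows direct jumps to any earlier stop."""
    def __init__(self, start: str):
        self.stops: List[str] = [start]

    def visit(self, card: str) -> None:
        self.stops.append(card)

    def back(self) -> str:
        """Classic backtracking: one step at a time."""
        if len(self.stops) > 1:
            self.stops.pop()
        return self.stops[-1]

    def jump_to(self, card: str) -> str:
        """Jump directly to the most recent visit of an earlier stop, dropping later stops."""
        index = len(self.stops) - 1 - self.stops[::-1].index(card)
        del self.stops[index + 1:]
        return card

    def where_am_i(self) -> str:
        return " > ".join(self.stops)

trail = Trail("Index")
for card in ["Portraits", "Rembrandt", "Self-Portrait 1659", "Notes"]:
    trail.visit(card)
trail.jump_to("Rembrandt")     # one action instead of two calls to back()
print(trail.where_am_i())      # Index > Portraits > Rembrandt
```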
4.1.1.a Boolean Operators
In order to extract knowledge from a database, Boolean operators ("And," "Or," etc.) are frequently used. While an art historian might look for a work identified by the artist's name, the medium (oil, watercolor, ink, etc.), and the genre (landscape, portrait, still life), a salesperson might need to match specifications of size, material, price, and availability of spare parts catalogued in electronic format. Getting the search to come as close as possible to how people formulate their objectives requires an interface that is intuitive but also sufficiently precise. Driven by an iconic interface, the Docent allows the user to define search criteria.
These are evaluated by the program as they are entered. If a criterion is not available, the user is promptly informed. Informed users can enter multiple search criteria (combined with OR, AND, etc.). Multiple criteria affect the performance of the search engine only marginally. Searching can be performed in Automatic or Manual mode (in the latter, the search starts once the user initiates it).
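The behavior described above (each criterion checked as it is entered, the user told when a criterion is unavailable, and criteria combined with AND and OR) can be sketched as follows. Cards are assumed to be dictionaries of field values, and the criterion format is a hypothetical simplification of what an iconic interface would produce.

```python
from typing import Dict, List, Sequence, Tuple

Card = Dict[str, str]
Criterion = Tuple[str, str]   # (field, value), matched case-insensitively

def check_criterion(cards: List[Card], criterion: Criterion) -> None:
    """Tell the user right away if a field is unknown or a value never occurs."""
    field_name, value = criterion
    if not cards or field_name not in cards[0]:
        raise KeyError(f"no search category named {field_name!r}")
    if not any(c[field_name].lower() == value.lower() for c in cards):
        raise LookupError(f"{value!r} does not occur in {field_name!r}")

def matches(card: Card, criterion: Criterion) -> bool:
    field_name, value = criterion
    return card[field_name].lower() == value.lower()

def search(cards: List[Card], all_of: Sequence[Criterion] = (),
           any_of: Sequence[Criterion] = ()) -> List[Card]:
    """AND over all_of, combined with OR over any_of (an empty group imposes no constraint)."""
    for criterion in [*all_of, *any_of]:
        check_criterion(cards, criterion)
    return [c for c in cards
            if all(matches(c, k) for k in all_of)
            and (not any_of or any(matches(c, k) for k in any_of))]

cards = [
    {"artist": "Rembrandt", "medium": "oil", "genre": "portrait"},
    {"artist": "Rembrandt", "medium": "ink", "genre": "landscape"},
    {"artist": "Turner", "medium": "watercolor", "genre": "landscape"},
]
# Landscapes AND (ink OR oil): only the Rembrandt ink landscape qualifies.
print(search(cards, all_of=[("genre", "landscape")], any_of=[("medium", "ink"), ("medium", "oil")]))
```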
4.1.1.b Point of Entry
Regardless of the content of the database, and even of its function, the point of entry is critical: it helps the user find out what is available. A Search Index was put in place to inform the user about each Search Category (database field). In other words, the Docent tells the user what it will recognize as a criterion.
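A Search Index of this kind can be sketched as a listing, per Search Category, of the values the program will recognize; the same structure doubles as an inverted index from each value to the cards that hold it. The function names are hypothetical, and cards are assumed to be dictionaries as in a flat database.

```python
from collections import defaultdict
from typing import Dict, List, Set

def build_search_index(cards: List[Dict[str, str]]) -> Dict[str, Dict[str, Set[int]]]:
    """Map each category to its recognized values, and each value to the card numbers holding it."""
    index: Dict[str, Dict[str, Set[int]]] = defaultdict(lambda: defaultdict(set))
    for number, card in enumerate(cards):
        for category, value in card.items():
            index[category][value.lower()].add(number)
    return index

def recognized_terms(index: Dict[str, Dict[str, Set[int]]], category: str) -> List[str]:
    """What the user may enter for a category: the point of entry."""
    return sorted(index.get(category, {}))

cards = [
    {"artist": "Rembrandt", "genre": "Portrait"},
    {"artist": "Vermeer", "genre": "Interior"},
    {"artist": "Rembrandt", "genre": "Landscape"},
]
index = build_search_index(cards)
print(recognized_terms(index, "genre"))       # ['interior', 'landscape', 'portrait']
print(sorted(index["artist"]["rembrandt"]))   # [0, 2]
```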
4.2 Docent Notes
A major objective in designing the Docent was to keep a record of searches and to allow for activities such as annotation, note-taking, and authoring. As already mentioned, this is done in a utility called Notes (no relation to the Lotus product). Each successful search can be "written" to Notes, which means that subsets of images, sounds, video, or a combination of these can be selected. If the videodisc is about art history, the subset is like the set of slides a professor prepares in advance for a given subject ("Portraits by Rembrandt," "Expressionist Landscapes," or "The Mother in Luciente's Paintings"). For collections of images regarding design (CAD, for instance), engineering, architecture, or fashion, the subsets extract all that relates to an intended function. Each time a subset is "written" to Notes, the database information is automatically copied there, too. This allows for interactive multimedia lecturing, with prompts provided on the screen as the lecture is delivered. If the order within a selected set needs to change, the lecturer can interactively "shuffle" the "slides," or go back to any desired image.
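Writing a search result to Notes, with the database information copied along and the order later reshuffled by the lecturer, might look like the sketch below. The class and its methods are hypothetical illustrations of the behavior described, not the Docent's interface.

```python
import copy
from typing import Dict, List

class Notes:
    """Named subsets ("slide sets") of search results, each holding its own copy of the card data."""
    def __init__(self):
        self.sets: Dict[str, List[Dict[str, str]]] = {}

    def write(self, title: str, results: List[Dict[str, str]]) -> None:
        # Deep copy: later edits to the Notes do not alter the underlying database.
        self.sets[title] = copy.deepcopy(results)

    def shuffle(self, title: str, new_order: List[int]) -> None:
        """Reorder a set; new_order lists the current positions in the desired order."""
        current = self.sets[title]
        if sorted(new_order) != list(range(len(current))):
            raise ValueError("new_order must be a permutation of the current positions")
        self.sets[title] = [current[i] for i in new_order]

notes = Notes()
notes.write("Portraits by Rembrandt", [{"title": "A"}, {"title": "B"}, {"title": "C"}])
notes.shuffle("Portraits by Rembrandt", [2, 0, 1])
print([c["title"] for c in notes.sets["Portraits by Rembrandt"]])   # ['C', 'A', 'B']
```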
All Notes texts can be exported, together with images digitized from the videodisc, to a desktop platform. Thus a full authoring system for interactive multimedia "performance" is provided. This "reading" software emulates all the functions of the "writing" (i.e., publishing) software. It can be used to generate product catalogs, interactive networked presentations, articles, or new multimedia presentations.
The MetaDocent creates Docents for large collections of data of any kind. It is a high-level program that compiles a text database pertinent to multimedia components regardless of where these are physically located (on hard disks, videotapes, laser discs, CD-ROM). Basically, a MetaDocent is built on top of an empty Docent. It allows an informed developer to specify field names, describe field types, and determine formats and search criteria. The program verifies the proper length of records and the proper parsing of search indexes. The functionality of Notebooks is also defined at this level. An import facility checks the integrity of data imported from databases. Search indexes and external indexes are automatically generated. Once the MetaDocent finishes creating a particular Docent, a Yank function removes everything that pertained to the construction work (like builders removing the scaffolding after they finish a house).
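The developer-facing step of the MetaDocent (declaring field names, types, and formats, then checking imported records) can be sketched as a schema plus a validator. The `FieldSpec` format is an assumption, and "proper length of records" is interpreted here, hypothetically, as a maximum length per field.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class FieldSpec:
    name: str
    kind: type              # str or int in this sketch
    max_length: int         # maximum length of the field's text form
    searchable: bool = True # would drive search-index generation (not shown here)

def validate(record: Dict[str, str], schema: List[FieldSpec]) -> List[str]:
    """Return the problems found in one imported record (empty if it is valid)."""
    problems = []
    expected = {f.name for f in schema}
    if set(record) != expected:
        problems.append(f"fields {sorted(set(record) ^ expected)} missing or unexpected")
    for spec in schema:
        text = record.get(spec.name, "")
        if len(text) > spec.max_length:
            problems.append(f"{spec.name}: longer than {spec.max_length} characters")
        if spec.kind is int and text and not text.isdigit():
            problems.append(f"{spec.name}: not a whole number")
    return problems

schema = [FieldSpec("title", str, 40), FieldSpec("year", int, 4), FieldSpec("frame", int, 6, searchable=False)]
print(validate({"title": "Night Watch", "year": "1642", "frame": "10432"}, schema))   # []
print(validate({"title": "Night Watch", "year": "c.1642"}, schema))
```

The second record is reported three times: the missing `frame` field, and a `year` value that is both too long and not a whole number.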
4.3.1 From Entry Level to High End
Brief as this presentation had to be, it shows how one can derive a hypermedia environment from a hypertext structure. The role played by the user interface is critical, especially in view of the many complexities of multimedia production. The MetaDocent can be used as an authoring tool for creating CD-ROMs, videodiscs, or combinations in which all data types are present.
Many other tools on the market offer functionality appropriate to a variety of applications. Between low-end (entry-level) programs incorporating QuickTime or Video for Windows (for integrating video sequences or sound tracks) and high-end authoring systems, the difference is not only in price. At the high end, programmability is a characteristic. It can be embodied in visual programming, i.e., the manipulation of icons "underneath" which code is hidden. Such code can issue calls to hardware devices or respond to user input (interactivity). Chances are that the major computing platforms (Windows, OS/2, Macintosh, UNIX) will evolve toward acknowledging standard formats while maintaining the competitive edge of proprietary solutions. Object-oriented programming will probably play an important role in this direction.
4.3.2 Non-Persistent Data Modes
To follow a link in hypertext means to find a word or a significant language construct relevant to a search or interpretation. But when the source node consists of non-persistent data, following the link is misleading at best, since the data at the source exist only for a limited time. In addition, the critical issue of synchronization, i.e., the time relations among elements in a composite multimedia component, needs to be addressed. Interactive training modules and interactive kiosks of all types bring their own sets of requirements. On-screen responses differ from menu-driven instructions or tests. Cross-platform compatibility is probably another characteristic that will have to be addressed. While it is true that hypermedia is in many ways the meeting point of multimedia and hypertext, in many more ways it is a qualitatively new aspect of real-time computing, in the sense that it has complex timing schemes at its core. Such schemes are critical in maintaining the integrity of data and providing a coherent semantic framework. Once multimedia enters the world of networks, with client-server and distributed computing architectures, such demands become even more critical.
5 Digital Library
Multimedia is celebrated for the spectacular innovations that bring it into the public eye. From on-line shopping (predicted to grow twenty-fold within the next three years) to the Multimedia University, the public is experiencing advanced combinations of text, graphics, video, animation, and sound. But there are many more down-to-earth multimedia applications. In the course of developing multimedia systems and concepts, I realized that the critical issues of preservation and access go beyond what became known as the lucrative field of document management. Progress has indeed been made in converting the paper archives of the past into new digital archives, fully indexed and easily accessible. Scanners have been hard at work, and so have digitizers of all kinds. The new archives are many orders of magnitude better adapted to their function (often prescribed by law). But there are some limitations, not least the loss of "touch and feel," probably irrelevant for business data but not at all irrelevant for preserving manuscripts, books, and artifacts affected by physical and chemical agents. Along this line of cultural interest, I worked on major preservation projects, including some for prestigious libraries (the Vatican Library, the Library of Corvey Castle in northern Germany) whose collections need to be preserved and made available to a larger public. Multimedia proved to be significant in more than one way. In order to preserve, one has to define alternative media, and these are quite heterogeneous. To replace the film medium of microfiche (black-and-white or color), we had to find substitutes of no lesser quality (in maintaining faithfulness to the original), less subject to change over time, and easily accessible. Solutions differ as the contents differ. After a laser disc containing Vatican Library incunabula (Vat. Lat. 39), after several attempts with CD-ROM (incorporating the Docent), and after taking the expensive path of printing high quality facsimiles, the concept of the Digital Library finally crystallized. In short, this concept provides:
- fast, real-time image acquisition in HDTV format;
- transfer from HDTV format to a variety of media (from 35 mm slides to printing plates);
- integration into full-fledged multimedia with video, sound, text database, and animation.
The end result of the digital acquisition phase was a digital original that can be rendered compatible with any imaginable display technology. What is most attractive in this multimedia solution is that the object (book, manuscript) becomes a locus of interaction. Its digital facsimile can be endlessly approached, studied from many angles, distributed over networks, and integrated into the broader context to which it belongs. True, in the process of experimenting with this concept, we ran into the problems of standards: digital vs. analog, European vs. American formats, aspect ratio. Nevertheless, it became clear that from a suitable HDTV format it was rather trivial to derive videodiscs (at HDTV resolution as well as in NTSC, PAL, and SECAM), CD-ROMs at full resolution and in sub-sampled sizes, printed output, and separation films with overlay proofs. What remains to be tested is distribution over networks and a video-server for a client-server configuration that will replace the library as we know it, giving access to collections too frail to withstand further damage from being examined by readers. This dimension of multimedia, i.e., the ability to disseminate values that were unapproachable not too long ago, is in line with its potential for interactive communication. To make accessible what Vannevar Bush called our "bewildering store of knowledge" is one of the most exciting potentials of multimedia.
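Deriving sub-sampled versions from a single high-resolution digital original can be sketched as repeated 2 x 2 box-filter averaging, i.e., an image pyramid. This is an assumed, simplified technique for illustration; production work would use better resampling filters and color management. The image is a grayscale list of rows, and the small master is only a stand-in for a full HDTV-size scan.

```python
from typing import List

Image = List[List[int]]

def halve(image: Image) -> Image:
    """Average each 2x2 block into one pixel (an odd trailing row or column is dropped)."""
    h, w = len(image) // 2 * 2, len(image[0]) // 2 * 2
    return [[(image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def derivatives(master: Image, levels: int) -> List[Image]:
    """The master followed by successively halved copies."""
    out = [master]
    for _ in range(levels):
        out.append(halve(out[-1]))
    return out

master = [[(x + y) % 256 for x in range(192)] for y in range(108)]  # small stand-in for a scan
for img in derivatives(master, 3):
    print(len(img[0]), "x", len(img))
```

Because every derivative is computed from the digital original rather than from a previous analog copy, each output medium starts from the same reference.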
6 How and What Do We Interconnect?
Multimedia triggered innovation in many directions. It also set expectations for which further technological progress and theories are necessary. Among the many directions this progress will take, one can single out interconnection. Graphical display technology can clearly be further improved through special-purpose architectures. The evolution of many multimedia applications makes it unequivocal that the path leads from software to the bus to the motherboard, like salmon going upstream to spawn. This improvement path is probably more effective than the current strategy of connecting general-purpose machines. Nevertheless, a third possibility is even more exciting, and within the spirit of multimedia: the connection of various specialized machines. Maybe new graphics primitives will affect the architecture of the real-time imaging station. Maybe parallelism will be better adapted to animation tasks. But even if one machine embodies the best there is for imaging, we will still not have addressed the wide variety that can result from interconnecting various specialized machines. Casual transfer of high-resolution images at real-time rates necessitated new client workstations that can display images from video-servers. Such servers, still in their infancy, need scalable power and vast (really vast!) database storage, no matter how good our compression schemes are. They have to be endowed with high-bandwidth connections for fiber or wireless networking. Such servers need to coordinate the distributed processing of clients (capable of special functions) and to synchronize it with their own parallel processors. Extremely high floating-point performance will be required in some cases (for instance, in advanced modeling), advanced graphics (3-D rendering, for example) in others. Distribution of tasks requires very good communication among the various processing units.
The chips specialized for geometry, for rasterization, and for texture mapping (including solid texture) would have to coordinate their efforts better than the polygon and pixel processors now in use.
Obviously, swapping video e-mail or addressing video-rich databases does not yet require this advanced technology. Not even video-on-demand for digital cable TV systems needs more than what we have on LAN servers. The time dependency of multimedia data is a characteristic of the medium that was addressed only superficially in all the attempts to bring it to the networks. As long as our "intelligence" in dealing with data on networks resides in the network (switching hubs, frame relay, and various other network gear), we will be limited even in understanding what else multimedia can do. My argument is that very knowledgeable clients and a powerful specialized server will eventually make possible what no networking-based technology, as we now use it, can. Until then, we have to keep navigating through the maze of FDDI, ATM, and whatever else, while realizing more and more that the multimedia we experience and generate is probably no more than a beginning, as the UNIVAC once was for the computer revolution.
- W. J. Grosky, "Multimedia Information Systems," IEEE Multimedia, Spring 1994, p. 22.
- M. T. Maybury (Ed.), Intelligent Multimedia Interfaces, MIT Press, Cambridge, MA, 1994.
- V. Bush, "As We May Think," The Atlantic Monthly, vol. 176, no. 1, July 1945, pp. 101-108.
- J. Nielsen, Hypertext and Hypermedia, Academic Press, San Diego, 1990.
- A. C. Luther, Digital Video in the PC Environment, 2nd edition, McGraw-Hill, New York, 1989.
- E. Barrett (Ed.), The Society of Text: Hypertext, Hypermedia, and the Social Construction of Information, MIT Press, Cambridge, MA, 1989.