Extended Summary

To conclude this introduction, and to give the reader a preliminary overview of this thesis, we now present an extended summary of its chapters.

Chapter 1. Foundational Issues

In chapter 1 we will see the various building blocks upon which the thesis is built. We will now briefly summarize the most important ideas introduced in the different sections.

In section 1.1.1 we introduce the most important concepts related to the object-oriented paradigm. An object is a real-world or abstract entity made up of an identity, a state, and a behavior. A class is an abstraction of a set of objects that have the same behavior and represent the same kind of instances. The object-oriented paradigm can be applied in the different phases of the software life cycle, and UML supports most of the activities involved in them. Apart from the concepts of object and class and the different kinds of relationships that can be established between objects and classes, other concepts such as encapsulation, inheritance hierarchies, and polymorphism are important for fully understanding the paradigm. Its advantages can be summarized as follows: it maps more directly to real-world concepts, it enhances encapsulation, it improves information hiding, it promotes good structuring, and it favors reuse. Finally, it is important to note that, contrary to a common belief, object-orientation does not imply less efficient code; its techniques in fact make it easier to obtain efficient and robust results.

In the next section, 1.2, we define the main concepts related to models and systems. The most commonly accepted definition of a system is that of Hall and Fagen, who define a system as "a set of objects together with relationships between the objects and between their attributes." A model, on the other hand, is an abstract representation of a system with a well-defined purpose; different models may exist for a single system. It is also important to note that the birth of object-orientation is closely related to the study of system simulations by Kristen Nygaard. Finally, we define a metamodel as a model of models, that is, an abstract model that can be used to model a collection of related models.

In section 1.3 we address the issue of software framework development. Although many different definitions of a software framework can be given, the one in [Johnson and Foote, 1988] is probably the most commonly accepted. According to this definition, "a framework is a set of classes that embodies an abstract design for solutions to a family of problems". Frameworks offer a way to reuse analysis, design, and code. Frameworks can be classified, among other ways, into white-box and black-box. In white-box frameworks users extend existing classes, particularizing them for their specific needs. Black-box frameworks, on the other hand, offer ready-to-use components that can be used as building blocks for an application. Although different approaches may be used for developing a software framework, it is usually recommended to use an application-driven methodology, taking a limited number of existing applications as the driving force and favoring user feedback as much as possible. Finally, a well-designed software framework can become a sort of metamodel in itself, as it offers a model of models for a given domain.
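The difference between white-box and black-box use can be illustrated with a minimal C++ sketch. All names here (BlockProcessor, Scaler, Rectifier, runChain, demoChain) are hypothetical and chosen only for illustration; they are not taken from any of the frameworks discussed in this thesis.

```cpp
#include <cassert>
#include <vector>

// Hypothetical framework class: it owns the control flow and calls back
// into user code through a single extension point.
class BlockProcessor {
public:
    virtual ~BlockProcessor() = default;

    // Fixed algorithm skeleton provided by the framework.
    std::vector<double> run(const std::vector<double>& input) {
        std::vector<double> output;
        output.reserve(input.size());
        for (double sample : input) output.push_back(processSample(sample));
        return output;
    }

protected:
    // White-box hook: subclasses supply the per-sample behavior.
    virtual double processSample(double sample) = 0;
};

// Black-box component shipped with the framework: configured, not extended.
class Scaler : public BlockProcessor {
public:
    explicit Scaler(double factor) : factor_(factor) {}
protected:
    double processSample(double sample) override { return factor_ * sample; }
private:
    double factor_;
};

// White-box use: the application derives a new class for its own needs.
class Rectifier : public BlockProcessor {
protected:
    double processSample(double sample) override {
        return sample < 0.0 ? -sample : sample;
    }
};

// Components are used as building blocks by chaining them together.
std::vector<double> runChain(const std::vector<BlockProcessor*>& chain,
                             std::vector<double> signal) {
    for (BlockProcessor* stage : chain) {
        assert(stage != nullptr);
        signal = stage->run(signal);
    }
    return signal;
}

// Usage example: halve the signal, then rectify it.
std::vector<double> demoChain() {
    Scaler half(0.5);
    Rectifier rectify;
    std::vector<BlockProcessor*> chain{&half, &rectify};
    return runChain(chain, {-4.0, 2.0});
}
```

In this sketch the same base class serves both styles: Scaler is consumed as is, while Rectifier exists only because the application needed behavior that the hypothetical repository did not provide.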

Metadata is defined as "data about data". In section 1.4 we introduce the most important concepts and tools related to metadata in our domain of object-orientation and multimedia signal analysis. XML is a general-purpose markup language that is rapidly becoming the standard for metadata annotation of any sort. Building on this same language, MPEG-7 is a standard proposed by ISO for multimedia annotation. In addition, the Object Management Group (OMG) has proposed the MOF (Meta Object Facility) standard as a metadata management framework for object-oriented systems and technologies.

Graphical Models of Computation (MoCs) are abstract representations of a family of related computer-based systems that use a graph-based representation as the primary way of communicating information about the system. There are many different graphical MoCs, each of them particularly well-suited for some purpose. The most important ones are outlined in section 1.5. In the context of signal processing applications, Kahn Process Networks and related models such as Dataflow Networks are of particular importance. Although some authors argue that these models should be seen as instances of the Process-Oriented paradigm, we defend the thesis that process-orientation, or actor-orientation, is no more than a particular instance of the object-oriented paradigm.

Chapter 2. Environments for Audio and Music Processing

In this chapter we present a thorough overview of audio and music processing environments. Although they all have different scopes and motivations, we classify them into the categories summarized in the following list:

General purpose signal processing and multimedia frameworks: software frameworks for manipulating signals or multimedia components in a generic way. The most important examples in this category are Ptolemy and ET++.
Audio processing frameworks: software frameworks that offer tools and practices that are particularized to the audio domain.
  1. Analysis Oriented: audio processing frameworks that focus on the extraction of data and descriptors from an input signal. Marsyas is the most important framework analyzed in this sub-category.
  2. Synthesis Oriented: audio processing frameworks that focus on generating output audio from input control signals or scores. Here it is important to mention STK.
  3. General Purpose: general purpose audio processing frameworks offer tools both for analysis and synthesis. Of the ones presented in this sub-category, SndObj and CSL are in a similar position: each has some advantages and disadvantages, but neither is very mature.
Music processing frameworks: software frameworks that, instead of focusing on signal-level processing, focus on the manipulation of symbolic data related to music. Siren is probably the most prominent example in this category.
Audio and Music visual languages and applications: some environments base most of their tools around a graphical metaphor that they offer as an interface with the end user. In this category we include important examples such as the Max family or Kyma.
Music languages: in this category we present different languages that can be used to express musical information. We have excluded those having a graphical metaphor, which are already listed in the previous category.
  1. Music-N languages: Music-N languages base their proposal on the separation of musical information into static information about instruments and dynamic information about the score, understanding this score as a sequence of time-ordered note events. Music-N languages are also based on the concept of unit generator. The most important language included in this sub-category, because of its acceptance, is CSound.
  2. Score languages: these languages are simply ways of expressing the information in a musical score, usually based on a textual or readable format.

The foundations laid by this analysis of the state of the art in our particular domain will be used both for constructing our proposals and for comparing the final results.

Chapter 3. The CLAM Framework

In this chapter we present the CLAM framework. This software framework is a comprehensive environment for developing audio and music applications. It may also be used as a research platform for the same domain. CLAM can be seen both as the origin and as the proof of concept of the conceptual models and metamodels that are included in this thesis.

CLAM is written in C++ and is efficient, object-oriented, and cross-platform. It presents a clean and clear design that is the result of applying thorough software engineering techniques. The framework can be used as a black-box framework, relying on the offered repository, or as a white-box framework, extending its functionality through its infrastructure.

CLAM's repository is made up of a large collection of signal processing algorithms encapsulated as Processing classes and a number of data structures included in its Processing Data repository. The Processing repository basically includes algorithms for signal analysis, synthesis, and transformation. It also includes encapsulated platform and system-level tools such as audio and MIDI input/output, both in streaming and file mode. The Processing Data repository, in turn, offers the data types that are needed as inputs or outputs of the processing algorithms. These include classes such as Audio, Spectrum, or Fundamental Frequency. It also includes a collection of statistical Descriptors that can be obtained from the basic Processing Data objects.

CLAM's infrastructure, on the other hand, offers ways of extending the existing repository by deriving new Processing or Processing Data classes. In the case of Processing classes this is accomplished by a simple inheritance mechanism in which users are required to implement some particular behavior in their concrete Processing class. Mechanisms for composing with Processing objects, handling input and output data through Ports, and handling control data through Controls are also offered. The Processing Data infrastructure is based on CLAM's Dynamic Types. This is a special C++ class that, using macros and template metaprogramming techniques, offers a very simple way of creating data containers with a homogeneous interface and automatic services such as introspection or passivation facilities. CLAM's infrastructure is completed by a set of tools for platform abstraction, such as audio, MIDI, and multithreading handling mechanisms, a cross-platform, toolkit-independent visualization module, XML serialization facilities, and application skeletons.
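The idea of a macro-generated data container with a homogeneous interface, introspection, and passivation can be sketched as follows. This is only an illustration of the general technique (a so-called X-macro); the macro and class names (NOTE_ATTRIBUTES, NoteData, demoNote) are hypothetical and are not CLAM's actual Dynamic Type macros or classes, and the XML output performs no escaping.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical attribute list: one line per attribute of the container.
#define NOTE_ATTRIBUTES(X) \
    X(double, Pitch)       \
    X(double, Duration)    \
    X(std::string, Label)

class NoteData {
public:
    // Homogeneous interface: GetName() / SetName(value) for every attribute.
#define DECLARE_ACCESSORS(Type, Name)                      \
    const Type& Get##Name() const { return m##Name; }      \
    void Set##Name(const Type& value) { m##Name = value; }
    NOTE_ATTRIBUTES(DECLARE_ACCESSORS)
#undef DECLARE_ACCESSORS

    // Introspection: attribute names are available at run time.
    static std::vector<std::string> AttributeNames() {
        std::vector<std::string> names;
#define APPEND_NAME(Type, Name) names.push_back(#Name);
        NOTE_ATTRIBUTES(APPEND_NAME)
#undef APPEND_NAME
        return names;
    }

    // Passivation: every attribute is written out without per-class code.
    std::string ToXML() const {
        std::ostringstream out;
        out << "<NoteData>";
#define WRITE_ATTRIBUTE(Type, Name) \
        out << "<" #Name ">" << m##Name << "</" #Name ">";
        NOTE_ATTRIBUTES(WRITE_ATTRIBUTE)
#undef WRITE_ATTRIBUTE
        out << "</NoteData>";
        return out.str();
    }

private:
    // Storage: one value-initialized member per attribute.
#define DECLARE_MEMBER(Type, Name) Type m##Name{};
    NOTE_ATTRIBUTES(DECLARE_MEMBER)
#undef DECLARE_MEMBER
};

// Usage example: fill the container and serialize it.
std::string demoNote() {
    NoteData note;
    note.SetPitch(440.0);
    note.SetDuration(0.5);
    note.SetLabel("A4");
    assert(NoteData::AttributeNames().size() == 3);
    return note.ToXML();
}
```

The point of the design is that adding an attribute means adding a single line to the attribute list; accessors, the name list, and serialization follow automatically.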

CLAM also offers a number of usage examples and ready-to-use applications. These applications include SMSTools, a graphical environment for audio analysis/synthesis/transformation, and Salto, a spectral-sample-based sax and trumpet synthesizer. Another important application is the Network Editor, a graphical tool for creating CLAM Networks using a boxes-and-connections metaphor à la Max. This application can be used as a rapid prototyping and research tool. CLAM has also been used in many other internal projects, for instance for developing a voice processing VST plugin, a high-quality time-stretching algorithm, and content-based analysis applications.

Chapter 4. The Digital Signal Processing Object-Oriented Metamodel

In this chapter we present the Digital Signal Processing Object-Oriented Metamodel (or DSPOOM for short). This metamodel may be considered the main contribution of this thesis and is basically a result of abstracting the conceptual conclusions found in developing the CLAM framework.

DSPOOM combines the advantages of the object-oriented paradigm with system engineering techniques and particularly with graphical Models of Computation in order to offer a generic metamodel that can be instantiated to model any kind of signal processing related system.

To do so, the metamodel classifies signal processing objects into two basic categories: objects that process, or Processing objects, and objects that hold data, or Processing Data objects. Processing objects represent the object-oriented encapsulation of a process or algorithm. They include support for synchronous data processing and asynchronous event-driven control processing, as well as a configuration mechanism and an explicit life cycle. Data input and output to Processing objects is done through Ports, and control data is handled through the Control mechanism. Processing Data objects, on the other hand, must offer a homogeneous getter/setter interface and support for meta-object facilities such as reflection and automatic serialization services.
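A minimal C++ sketch may help to picture a Processing object with an explicit life cycle, synchronous data processing, and asynchronous control. The state names, method names (Configure, Start, Stop, Do, ConcreteDo, ReceiveGain), and the use of plain vectors in place of Ports are assumptions made for this illustration and do not reproduce the metamodel's or CLAM's actual interfaces.

```cpp
#include <cassert>
#include <vector>

using Block = std::vector<double>;  // stand-in for a Processing Data object

// Hypothetical base class: an explicit life cycle shared by all processings.
class Processing {
public:
    enum class State { Unconfigured, Ready, Running };
    virtual ~Processing() = default;

    State GetState() const { return state_; }

    // Configuration is only allowed while the object is not running.
    bool Configure() {
        if (state_ == State::Running) return false;
        state_ = State::Ready;
        return true;
    }
    bool Start() {
        if (state_ != State::Ready) return false;
        state_ = State::Running;
        return true;
    }
    bool Stop() {
        if (state_ != State::Running) return false;
        state_ = State::Ready;
        return true;
    }

    // Synchronous data processing: rejected unless the object is running.
    bool Do() {
        if (state_ != State::Running) return false;
        return ConcreteDo();
    }

protected:
    virtual bool ConcreteDo() = 0;  // supplied by each concrete class

private:
    State state_ = State::Unconfigured;
};

class Gain : public Processing {
public:
    Block input;   // stand-in for an input Port
    Block output;  // stand-in for an output Port

    // Asynchronous control: may arrive at any time between Do() calls.
    void ReceiveGain(double value) { gain_ = value; }

protected:
    bool ConcreteDo() override {
        output.clear();
        for (double sample : input) output.push_back(gain_ * sample);
        return true;
    }

private:
    double gain_ = 1.0;
};

// Usage example: walk through the life cycle and process one block.
Block demoGain() {
    Gain gain;
    gain.input = {1.0, -2.0};
    bool refused = !gain.Do();  // not running yet, so Do() is rejected
    assert(refused);
    (void)refused;
    gain.Configure();
    gain.Start();
    gain.ReceiveGain(0.5);      // control event arrives before the next Do()
    gain.Do();
    gain.Stop();
    return gain.output;
}
```

Keeping the state checks in the base class means that a concrete class only has to describe its algorithm, while illegal call sequences are rejected uniformly.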

The metamodel also presents mechanisms for composing statically and dynamically with basic DSPOOM objects. Static compositions are called Processing Composites and dynamic compositions are called Networks.
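The distinction between static and dynamic composition can be sketched in C++ as follows. The sketch assumes a deliberately simplified setting in which every stage maps one sample to one sample and stages form a linear chain; the names (Stage, ScaleThenClip, Network, Add, Connect, demoNetwork) are hypothetical and are not the metamodel's actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

using Sample = double;
using Stage = std::function<Sample(Sample)>;

// Static composition: the structure is fixed when the class is written.
class ScaleThenClip {
public:
    Sample Do(Sample x) const { return clip_(scale_(x)); }
private:
    Stage scale_ = [](Sample x) { return 2.0 * x; };
    Stage clip_ = [](Sample x) {
        return x > 1.0 ? 1.0 : (x < -1.0 ? -1.0 : x);
    };
};

// Dynamic composition: stages are added and connected at run time.
class Network {
public:
    void Add(const std::string& name, Stage stage) {
        stages_[name] = std::move(stage);
    }

    // Connects the output of `from` to the input of `to`.
    bool Connect(const std::string& from, const std::string& to) {
        if (!stages_.count(from) || !stages_.count(to)) return false;
        next_[from] = to;
        return true;
    }

    // Pushes one sample through the chain that starts at `first`.
    // Throws std::out_of_range if `first` is unknown; the hop limit
    // guards against accidental cycles.
    Sample Do(const std::string& first, Sample x) const {
        std::string current = first;
        for (std::size_t hops = 0; hops < stages_.size(); ++hops) {
            x = stages_.at(current)(x);
            auto link = next_.find(current);
            if (link == next_.end()) break;
            current = link->second;
        }
        return x;
    }

private:
    std::map<std::string, Stage> stages_;
    std::map<std::string, std::string> next_;
};

// Usage example: build at run time the same chain as ScaleThenClip.
Sample demoNetwork() {
    Network net;
    net.Add("scale", [](Sample x) { return 2.0 * x; });
    net.Add("clip", [](Sample x) {
        return x > 1.0 ? 1.0 : (x < -1.0 ? -1.0 : x);
    });
    bool connected = net.Connect("scale", "clip");
    assert(connected);
    (void)connected;
    return net.Do("scale", 0.75);
}
```

Both compositions compute the same result here; what differs is when the topology is decided, which is what makes the dynamic variant suitable for graphical patching tools.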

Finally, the DSPOOM metamodel can also be considered an object-oriented implementation of a graphical Model of Computation, namely Context-aware Dataflow Networks.

Chapter 5. The Object-Oriented Content Transmission Metamodel

Here we present an object-oriented metamodel for content processing and transmission called the Object-Oriented Content Transmission Metamodel, or OOCTM for short. This metamodel may be seen both as an extension and as a particularization of the Digital Signal Processing Object-Oriented Metamodel presented in the previous chapter. It offers a way of modeling signal processing applications that deal with any aspect of content-based processing, such as content analysis or content-based transformations.

The metamodel is based on two conceptual foundations: on the one hand, we call content any semantic information that is meaningful for the target user; on the other hand, applying one of the maxims of the object-oriented paradigm, we state that all semantic information contained in a given signal can be modeled as a collection of related objects.

Following the traditional Shannon and Weaver model for information transmission, our metamodel is divided into three main components: a semantic transmitter, a channel, and a semantic receiver. The semantic transmitter is in charge of performing a multilevel analysis of the signal, identifying objects, and finally building a multilevel object-based content description and encoding it in an appropriate format such as XML. The channel transports this metadata description; anything added in the channel is not considered noise unless it modifies the original meaning. Finally, the semantic receiver receives the multilevel content description, decodes it, and translates it into a synthesizer-readable format. The synthesizer included in the receiver then synthesizes the output signal.
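The transmitter, channel, and receiver chain can be sketched in C++ under strong simplifying assumptions: the "signal" is a frame-wise pitch track of integer pitch numbers, the only objects identified are notes (runs of equal pitch), and the description is encoded as plain text lines instead of XML. All function and type names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Note { int pitch; int frames; };

// Semantic transmitter, step 1: identify objects (notes) in the signal.
std::vector<Note> analyze(const std::vector<int>& pitchTrack) {
    std::vector<Note> notes;
    for (int pitch : pitchTrack) {
        if (!notes.empty() && notes.back().pitch == pitch) {
            ++notes.back().frames;       // same note continues
        } else {
            notes.push_back({pitch, 1}); // a new note starts
        }
    }
    return notes;
}

// Semantic transmitter, step 2: encode the content description as text.
std::string encode(const std::vector<Note>& notes) {
    std::ostringstream out;
    for (const Note& note : notes) {
        out << note.pitch << ' ' << note.frames << '\n';
    }
    return out.str();
}

// Channel: transports the description (here without any alteration).
std::string channel(const std::string& description) { return description; }

// Semantic receiver, step 1: decode the content description.
std::vector<Note> decode(const std::string& description) {
    std::istringstream in(description);
    std::vector<Note> notes;
    Note note{};
    while (in >> note.pitch >> note.frames) notes.push_back(note);
    return notes;
}

// Semantic receiver, step 2: synthesize an output signal from the notes.
std::vector<int> synthesize(const std::vector<Note>& notes) {
    std::vector<int> track;
    for (const Note& note : notes) {
        if (note.frames > 0) {
            track.insert(track.end(),
                         static_cast<std::size_t>(note.frames), note.pitch);
        }
    }
    return track;
}

// Usage example: the whole chain from input signal to output signal.
std::vector<int> transmit(const std::vector<int>& pitchTrack) {
    std::vector<Note> notes = analyze(pitchTrack);
    std::vector<Note> received = decode(channel(encode(notes)));
    assert(received.size() == notes.size());
    return synthesize(received);
}
```

In this toy setting the output happens to equal the input exactly. With a real audio signal only the description crosses the channel, so the synthesized result would preserve the melody rather than the waveform.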

It is important to note that we are not so much interested in the fidelity of the final synthesized signal to the original as in whether the original "meaning" is preserved and is useful for the final user.

The Object-Oriented Content Transmission Metamodel can be seen as an extension of the classical Shannon and Weaver model for information transmission. It is closely related to the Structured Audio metamodel and can also be seen as a step beyond parametric encoding. Finally, if we add a transformation function to the channel we end up with a general scheme for content-based transformations.

In this chapter we also give several examples of applications that represent particular instances of the metamodel or of some of its parts. We also present one sample application that instantiates the whole metamodel in order to transmit and synthesize a previously analyzed and extracted musical melody.

Chapter 6. An Object-Oriented Music Model

Finally, in this chapter we present an object-oriented music model that can be interpreted as an instance of the basic Digital Signal Processing Object-Oriented Metamodel, dealing in this case with higher-level symbolic musical data.

Following again the object-oriented paradigm we model a music system as a set of interrelated objects. These objects will in general belong to one of the following abstract classes: Instrument, Generator, Note or Score.

An Instrument is a generating Processing object that receives input controls and generates an output sound. An Instrument is in fact a logical grouping of autonomous units named Generators. A Generator is the atomic sound-producing unit in an instrument and can be controlled independently of the other generators (although it is often influenced by them). Examples are the six strings in a guitar or each of the keys in a piano.

A Note is the actual sounding object attached to each generator. A Note can be turned on and off and its properties depend on the internal state of its associated Generator and Instrument.

Finally, the internal state of the whole object-oriented music system changes in response to events that are sent to particular Instruments or Generators. A time-ordered collection of such events is known as a Score.
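The relations between Instrument, Generator, Note, and Score can be sketched in C++ as follows. This is a minimal illustration under our own assumptions: events only turn notes on or off, each event addresses a generator by index, and the method names (NoteOn, NoteOff, PlayUntil, soundingAt) are hypothetical rather than taken from MetriX or CLAM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Note: the sounding object attached to one generator.
struct Note {
    bool on = false;
    double pitch = 0.0;
};

// Generator: the atomic sound-producing unit (e.g. one guitar string).
class Generator {
public:
    void NoteOn(double pitch) { note_.on = true; note_.pitch = pitch; }
    void NoteOff() { note_.on = false; }
    const Note& GetNote() const { return note_; }
private:
    Note note_;
};

// Instrument: a logical grouping of generators.
class Instrument {
public:
    explicit Instrument(std::size_t generators) : generators_(generators) {}
    Generator& At(std::size_t index) { return generators_.at(index); }
    std::size_t SoundingNotes() const {
        return static_cast<std::size_t>(std::count_if(
            generators_.begin(), generators_.end(),
            [](const Generator& g) { return g.GetNote().on; }));
    }
private:
    std::vector<Generator> generators_;
};

// Event: a state change addressed to one generator at a given time.
struct Event {
    double time;
    std::size_t generator;
    bool on;
    double pitch;
};

// Score: a time-ordered collection of events.
class Score {
public:
    // Inserts the event so that the collection stays ordered by time.
    void Add(const Event& event) {
        auto position = std::upper_bound(
            events_.begin(), events_.end(), event,
            [](const Event& a, const Event& b) { return a.time < b.time; });
        events_.insert(position, event);
    }

    // Applies, in order, every event whose time is not later than `until`.
    void PlayUntil(double until, Instrument& instrument) const {
        for (const Event& event : events_) {
            if (event.time > until) break;
            Generator& generator = instrument.At(event.generator);
            if (event.on) generator.NoteOn(event.pitch);
            else generator.NoteOff();
        }
    }
private:
    std::vector<Event> events_;
};

// Usage example: number of sounding notes of a six-string instrument
// after playing a small score up to the given time.
std::size_t soundingAt(double time) {
    Score score;
    score.Add({1.0, 0, false, 0.0});   // events may be added out of order
    score.Add({0.0, 0, true, 82.4});
    score.Add({0.5, 1, true, 110.0});
    Instrument guitar(6);
    score.PlayUntil(time, guitar);
    assert(guitar.SoundingNotes() <= 6);
    return guitar.SoundingNotes();
}
```

The sketch keeps the sounding state in the generators, so the Score remains a passive, reusable description that can be replayed on any Instrument with enough generators.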

The abstract model described above is implemented in the MetriX language or in its XML-based version, MetriX-ML. MetriX-ML is a Music-N language and therefore offers a way of defining both Instruments and Scores. It is implemented in CLAM and, apart from the concepts previously presented, includes support for defining timbre spaces, break-point functions, and relations between control parameters in an Instrument.

2004-10-18