MODE and Siren

The MODE [Pope, 1991c, Pope, 1994, Pope, 1991b] was a collection of OO classes for general sound, event, event list and score processing, together with a music-oriented user interface toolkit, embedded in the Smalltalk-80 programming system. The MODE was superseded by Siren [Pope, 2001, Pope, 1998a, Pope, 2003, www-Siren] in 1998.

Both frameworks are the result of continuous iterations by their author, Stephen Travis Pope, in search of a tool for his compositions as well as a platform for practically demonstrating his research on object-oriented programming and software engineering [Pope, 1991d]. As such, they implement a quite particular vision of musical composition that is tightly integrated with an object-oriented model [Pope, 1991b, Pope, 1997]. Both frameworks have been developed with very little input from users and, as the author observes, ``(...) if Siren works well for other composers, it is because of its idiosyncratic approach, rather than its attempted generality'' [Pope, 2001].

Before taking a look at the MODE and Siren, let us summarize the different packages and versions the author has worked on.

The MODE was already the result of several iterations of Smalltalk-80 based toolkits for musical score and sound processing and performance. It was a reimplementation of the author's earlier package, the HyperScore ToolKit [Pope, 1987], which descended from DoubleTalk, which in turn descended from ARA. ARA was a Lisp system; DoubleTalk was a Petri net editing system in Smalltalk-80.

Siren is a software framework that includes a set of flexible and reusable components that are designed for extension and customization but also carries with it a ``way of thinking'' about music and composition. It has been developed in the Smalltalk language and it is intended to be used by Smalltalk-80 programmers. Previous systems strove for extreme flexibility at the expense of additional complexity but Siren makes decisions differently.

The system is designed to accept pluggable front ends and back ends. It is efficient enough for real-time composition and portable, as it runs on several operating systems. Application areas are sound and score editors, real-time algorithmic composition, and music performance front ends. The purpose of the framework is to provide comprehensive note, score and sound processing tools for the rapid prototyping of music applications. The framework includes an abstract music representation language, an interface for real-time I/O, a user interface framework, and a connection to object databases. It is also integrated with a scalable distributed processing framework.

Siren is a software framework for sound and music composition and production made of about 350 Smalltalk classes. It is platform independent and runs on Macintosh, Windows, and Unix-based computers. The Smalltalk code is available for free.

The motivation behind the MODE and now Siren was to build a powerful, flexible, and portable computer-based composer's tool and instrument. Siren is designed to support composition, off-line realization, and live performance. Other desired applications are music databases, music analysis and music scholarship and pedagogy.

On the other hand, the technical goal is to present good OO design principles and elegant state-of-the-art software engineering practice. The framework needs to be easily extensible and to provide abstract models of high-level musical constructs and flexible management of large datasets.

The main components or packages in Siren are:

It is possible to use inheritance for building specialized versions of existing components.

Smoke [Pope, 1992] is the ``kernel'' of Siren. It is a set of classes organized in meta-categories such as Magnitudes, Events and EventLists, Schedulers, or Interfaces. Smoke is described in terms of two description languages: a compact binary interchange format and a mapping onto concrete data structures.

According to the author, Smoke can be summarized as follows: music can be represented as a series of events. Events are simply property lists or dictionaries that can have named properties with arbitrary values. These properties may be music-specific objects, and for that reason models of many common musical magnitudes are provided.
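The idea of an event as a property list can be illustrated with a minimal Python sketch. This is not Smoke's actual Smalltalk interface: the class name, the indexing syntax and the property names used here (pitch, loudness, articulation) are assumptions chosen only for illustration.

```python
class Event:
    """Hypothetical stand-in for a Smoke event: a bag of named properties."""

    def __init__(self, **properties):
        # Any keyword becomes a named property; no fixed schema is imposed.
        self.properties = dict(properties)

    def __getitem__(self, name):
        return self.properties[name]

    def __setitem__(self, name, value):
        self.properties[name] = value

    def __contains__(self, name):
        return name in self.properties


# An event with a duration and two music-specific properties.
note = Event(duration=0.5, pitch="c3", loudness="mf")

# Properties can be added at any time, with arbitrary values.
note["articulation"] = "staccato"
```

Because nothing restricts which properties an event carries, the same class could describe a single note, a sound file cue or a whole section, which is consistent with the absence of a prescribed grain size for events.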

Music Magnitudes are extensible abstract representations for the properties of musical events such as pitch, duration and loudness. Each Music Magnitude can have different representations (e.g. pitch can be represented as an integer, float, string or fraction). Their primary behavior is that they can translate freely between their representations. MusicMagnitude objects are characterized by their identity, class, species and value (e.g. the pitch object representing the note C3 is a member of the class SymbolicPitch, of the species Pitch, and has a value of c3). Note that the combination of class and species provides a form of multiple inheritance.
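The following Python sketch shows how magnitudes of one species might translate between representations. Only the names Pitch and SymbolicPitch come from the text; MIDIPitch, HertzPitch, the method names, the octave convention (c3 taken as MIDI key 48, a4 as 440 Hz) and equal temperament are all assumptions of this example, not Siren's documented behavior.

```python
import math

NOTE_NAMES = ["c", "c#", "d", "d#", "e", "f", "f#", "g", "g#", "a", "a#", "b"]


class Pitch:
    """Abstract species marker: never instantiated and never subclassed here."""


class SymbolicPitch:
    species = Pitch  # the class differs per representation, the species is shared

    def __init__(self, name):
        self.value = name.lower()

    def as_midi(self):
        # Split "c#3" into note name "c#" and octave 3.
        name, octave = self.value[:-1], int(self.value[-1])
        return MIDIPitch(12 * (octave + 1) + NOTE_NAMES.index(name))

    def as_hertz(self):
        return self.as_midi().as_hertz()


class MIDIPitch:
    species = Pitch

    def __init__(self, key):
        self.value = int(key)

    def as_symbol(self):
        return SymbolicPitch(f"{NOTE_NAMES[self.value % 12]}{self.value // 12 - 1}")

    def as_hertz(self):
        # Equal temperament with key 69 tuned to 440 Hz (assumed).
        return HertzPitch(440.0 * 2.0 ** ((self.value - 69) / 12.0))


class HertzPitch:
    species = Pitch

    def __init__(self, hertz):
        self.value = float(hertz)

    def as_midi(self):
        return MIDIPitch(round(69 + 12 * math.log2(self.value / 440.0)))


c3 = SymbolicPitch("c3")
c3_key = c3.as_midi().value       # integer representation
c3_hertz = c3.as_hertz().value    # float representation
```

Keeping the species as a class attribute rather than a superclass mirrors the description above: each representation belongs to its own class, while the shared species tells clients that the objects are interchangeable pitches.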

The basic abstract model classes are Pitch, Loudness and Duration. They are abstract and have no subclasses; they are used as species for families of classes.

The basic event classes, Event and EventList, both of which derive from AbstractEvent, are used for describing musical structures. In Smoke, an event is simply an object that has a duration and possibly arbitrary other properties. The AbstractEvent in Smoke is modeled as a property-list dictionary with a duration. There is no prescribed grain size or level for events.

EventLists hold collections of events sorted by start time. Event lists are events in themselves and can therefore be nested into trees in a hierarchical structure. An event can be in more than one list at different relative start times and with different properties mapped onto it. Events do not know their start time, which is always relative to some outer scope.
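A small Python sketch can make the relative-time model concrete. The class names follow the text, but the methods add and flatten, and the representation of entries as (start, event) pairs, are assumptions made for this example rather than Smoke's interface.

```python
class Event:
    """An event knows its duration and properties, but not its start time."""

    def __init__(self, duration, **properties):
        self.duration = duration
        self.properties = properties


class EventList(Event):
    """An event that holds other events at start times relative to itself."""

    def __init__(self, **properties):
        self.properties = properties
        self.entries = []  # (relative_start, event) pairs, kept sorted by start

    @property
    def duration(self):
        # A list lasts until its last child ends.
        return max((s + e.duration for s, e in self.entries), default=0.0)

    def add(self, start, event):
        self.entries.append((start, event))
        self.entries.sort(key=lambda pair: pair[0])
        return self

    def flatten(self, offset=0.0):
        """Resolve nested relative times into absolute (start, event) pairs."""
        result = []
        for start, event in self.entries:
            if isinstance(event, EventList):
                result.extend(event.flatten(offset + start))
            else:
                result.append((offset + start, event))
        return sorted(result, key=lambda pair: pair[0])


motif = EventList().add(0.0, Event(0.5, pitch="c3")).add(0.5, Event(0.5, pitch="e3"))

# The same motif object appears twice in the piece, at different start times.
piece = EventList().add(0.0, motif).add(2.0, motif)
timeline = piece.flatten()
```

Since the start time is stored in the enclosing list and not in the event, reusing the motif at a second position costs one extra entry and no copy of the events.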

Events and EventLists are ``performed'' by the action of a scheduler that passes them to an interpretation object or Voice. Voices map event properties onto IO parameters.
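The division of labor between scheduler and voice can be sketched as follows in Python. Everything here is hypothetical: the voice classes, the play method, the loudness table and the non-real-time perform function merely stand in for the roles described above and are not Siren's classes.

```python
class MidiLikeVoice:
    """Hypothetical voice that maps symbolic properties to MIDI-style tuples."""

    VELOCITY = {"pp": 32, "mf": 80, "ff": 112}  # assumed mapping, for illustration

    def __init__(self):
        self.output = []

    def play(self, time, event):
        velocity = self.VELOCITY[event["loudness"]]
        self.output.append(("note_on", time, event["key"], velocity))
        self.output.append(("note_off", time + event["duration"], event["key"], 0))


class TextVoice:
    """Hypothetical voice that renders the same events as text lines."""

    def __init__(self):
        self.output = []

    def play(self, time, event):
        self.output.append(f"{time:.2f} {event['key']} {event['loudness']}")


def perform(timed_events, voice):
    """Stand-in scheduler: dispatches events in time order, without waiting."""
    for time, event in sorted(timed_events, key=lambda pair: pair[0]):
        voice.play(time, event)
    return voice.output


score = [
    (0.5, {"key": 52, "duration": 0.5, "loudness": "ff"}),
    (0.0, {"key": 48, "duration": 0.5, "loudness": "mf"}),
]
midi_output = perform(score, MidiLikeVoice())
text_output = perform(score, TextVoice())
```

The score itself contains no output-specific data, so swapping the voice is enough to send the same material to a different back end.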

NoteEvent classes are generic Events that represent musical notes with the default parameters pitch, amplitude and voice. Links between events and event lists can carry a symbolic description (e.g. isVariationOf, isTonalAnswerTo).

Sampled sounds can be properties of events. The Sound class allows reading and writing a number of file formats and maintains a list of named cue points in the sound.

Siren has classes for representing ``middle-level'' structures e.g. cluster, chord, ostinato or rubato. Music formats can be characterized in a very compact way. Two abstract classes are defined: EventGenerator and EventModifier. Composers can enrich the generator hierarchy for a specific composition.

EventGenerators can either return an EventList or behave like processes and be told to play and stop. The three abstract EventGenerators are Cluster, Cloud and Ostinato. Cluster classes describe a one-dimensional collection of pitches or rhythms (their events occur simultaneously or are repetitions of the same event). Concrete types of Clusters are chords and arpeggi. Cloud classes are random generators that produce notes from a given range. Most process-oriented generators take the form of Ostinati, which create repeating versions or variations of the input material or parameters.
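The three generator families can be approximated with plain Python functions that each return a list of (start, event) pairs. This is a simplified sketch of the behavior described above, with events as dictionaries and pitches as integer key numbers. The function names, parameters and the seed argument are assumptions, and the process-like play and stop behavior is not modeled.

```python
import random


def chord(root, intervals, duration):
    """Cluster-like generator: every event starts at the same time."""
    return [(0.0, {"key": root + i, "duration": duration}) for i in intervals]


def cloud(low, high, count, total_duration, seed=None):
    """Cloud-like generator: random keys drawn from the range [low, high]."""
    rng = random.Random(seed)  # a seed makes the result reproducible
    step = total_duration / count
    return [
        (n * step, {"key": rng.randint(low, high), "duration": step})
        for n in range(count)
    ]


def ostinato(pattern, pattern_duration, repetitions):
    """Ostinato-like generator: repeats the input material back to back."""
    result = []
    for r in range(repetitions):
        result.extend((r * pattern_duration + start, event) for start, event in pattern)
    return result


c_major = chord(48, [0, 4, 7], duration=1.0)
texture = cloud(60, 72, count=8, total_duration=4.0, seed=1)
riff = ostinato([(0.0, {"key": 48, "duration": 0.5}),
                 (0.5, {"key": 55, "duration": 0.5})], 1.0, repetitions=3)
```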

The EventModifier class models objects that hold a function object and a property name, so that they can apply the function to the given property of an Event or an EventList. EventModifiers can be lazy or eager. Eager EventModifiers apply the function as soon as they are given an EventList, while lazy ones wait until scheduling time.
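The difference between eager and lazy application can be shown with a short Python sketch. The class name comes from the text, but the constructor arguments, the apply method and the use of a generator to defer evaluation are assumptions of this example. Iterating over the lazy result plays the role of scheduling time.

```python
class EventModifier:
    """Applies a function to one named property of each event in a list."""

    def __init__(self, property_name, function, lazy=False):
        self.property_name = property_name
        self.function = function
        self.lazy = lazy

    def _modified(self, start, event):
        changed = dict(event)  # leave the original event untouched
        changed[self.property_name] = self.function(event[self.property_name])
        return start, changed

    def apply(self, timed_events):
        pending = (self._modified(start, event) for start, event in timed_events)
        # Eager: evaluate now. Lazy: hand back the generator for later evaluation.
        return pending if self.lazy else list(pending)


gain = {"value": 1.0}  # a parameter that changes before the events are scheduled


def scale(loudness):
    return loudness * gain["value"]


events = [(0.0, {"loudness": 0.5}), (1.0, {"loudness": 0.8})]

eager = EventModifier("loudness", scale).apply(events)
lazy = EventModifier("loudness", scale, lazy=True).apply(events)

gain["value"] = 0.5  # changed after apply, before "scheduling"

eager_result = [event["loudness"] for _, event in eager]
lazy_result = [event["loudness"] for _, event in lazy]
```

The eager result reflects the gain at the moment the modifier was applied, whereas the lazy result reflects the gain at the moment the events are consumed.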

By sending messages to the basic classes described above, one can write scripts (programs) of messages to Events, EventLists and Functions that describe simple musical processes. Regular messages from the Smalltalk-80 environment can be used to inspect objects.

Siren has a special structure that allows the same score to be played independently of the final synthesis method. Properties of events are encoded in an abstract symbolic way that is then expanded into device-specific or output format-specific parameters.

Siren also has a complete graphical environment that can be used to develop graphical applications for music processing.

One of the basic problems in making cross-platform music tools was the lack of good portable APIs for sound and MIDI I/O. This has been alleviated recently by cross-platform libraries such as PortAudio [Bencina and Burk, 2001], PortMIDI [www-PortMIDI] or LibSndFile [www-libsndfile]. All of these libraries are implemented in C/C++, which would make them difficult to integrate into Smalltalk, but VisualWorks has a powerful mechanism for interfacing Smalltalk code to C libraries.

For network and file-oriented I/O, Siren uses Open Sound Control (OSC) [Wright, 1998a]. Siren has also been used as a front end to CSL (see 2.3.3) through OSC messages.
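To give an idea of what travels over the wire, the following Python sketch encodes a single OSC message following the OSC 1.0 binary layout: a padded address string, a padded type tag string, and big-endian arguments. It is a generic illustration of the protocol, not Siren's implementation. The function name osc_message and the address /note are hypothetical.

```python
import struct


def _pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, as OSC strings require."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address, *args):
    """Encode an OSC message with int32, float32 and string arguments."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)  # 32-bit big-endian integer
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)  # 32-bit big-endian float
        elif isinstance(arg, str):
            tags += "s"
            payload += _pad(arg.encode("ascii"))
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")
    return _pad(address.encode("ascii")) + _pad(tags.encode("ascii")) + payload


# Hypothetical message: key number 48 with amplitude 0.5.
packet = osc_message("/note", 48, 0.5)
```

The resulting bytes could be sent as the payload of a UDP datagram, the transport most commonly used with OSC.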

The author of Siren has also implemented the Create Real-Time Application Manager (CRAM) [Pope et al., 2001] for large-scale distributed processing. Siren and CSL are designed to be used in distributed systems controlled by CORBA, with messages sent through OSC.

More recently, new class libraries have been added to support the use of large speech databases with phoneme segmentation and detailed feature extraction. The analysis core of the Siren speech database is the Segmenter, which uses a combination of time-domain and spectral-domain features to break continuous speech into phonemes.

Although, as already mentioned, most of the applications developed with the Siren/MODE framework are musical composition environments, the framework is sufficiently flexible to be used in different situations. Paleo, for instance, is a suite of sound and music analysis tools integrated with an OO persistence mechanism in Siren. Paleo uses dynamic feature vectors and on-demand indexing. Annotational information derived from analysis can be added to the database at any time. Paleo performs analysis of MIDI files and allows for complex queries.

Most of the advantages and disadvantages of Siren are related to its language of choice, Smalltalk. According to its author, the most important advantages of Smalltalk, and therefore of Siren, are the following:

Smalltalk is a simple programming language; the class library is quite compact yet extensive, especially when compared to C++; Smalltalk has an extensive development environment with code browsers and an in-place debugger; finally, it is important to point out that the language, libraries and IDE have been quite stable for the last 20 years.

As disadvantages he cites the following: Smalltalk is no longer a mainstream language; the VisualWorks/Smalltalk implementation is large (2000 classes + 300 Siren classes), which makes it a very large system to learn; as in Java, Smalltalk programs are generally compiled for a virtual machine, whose code may be interpreted, translated or cross-compiled at run-time, which provides cross-platform portability of object code but at the cost of some run-time performance; garbage collection makes development easier but also adds overhead; finally, the Siren package itself is complex and implements a very particular design approach. It does not include MIDI sequencing or common music notation editors due to lack of interest by the authors. They do with Siren what they cannot do with a combination of SuperCollider, Peak, Finale and ProTools.

New applications in different areas are planned for Siren [Pope, 2003], such as controlling graphical animation from Siren. But probably the most surprising news is that, after 20 years of Smalltalk development, the author is thinking of changing to a different language. At the time of this writing, they are experimenting with Ruby, Self, and SuperCollider. Porting Siren to another language means keeping Smoke's class library specification but abandoning its syntax, which is too Smalltalk-oriented. Smoke event lists are then written in the same language as the implementation being used.

2004-10-18