Interview with Stefano Mazzocchi

Author:
Thomas Bayer
Orientation in Objects GmbH
Date: August 2002

Summary

In July 2002, Thomas Bayer, CEO of Orientation in Objects, interviewed Stefano Mazzocchi, inventor of the Cocoon framework and a member of the Apache Software Foundation, for JavaMagazin. In this interview Mazzocchi talks about his involvement in the ASF, the differences between Cocoon and JSP, XSLT processing performance, the 'flowscript' engine and the next Cocoon version.

Stefano Mazzocchi

Thomas Bayer: How did you get involved with the ASF?

Stefano Mazzocchi: It was November '97. Together with two friends, Pierpaolo Fumagalli and Federico Barbieri, I got a contract to write a networked solution for a Microsoft Access based data management system for a computer shop. Pier was really into networks and suggested using a Linux/MySQL/Apache solution for the implementation. But there was another big alternative in Sun's Java Web Server, which had this great new feature called 'servlets' that appeared to us much more architecturally advanced than any other technology we knew.

But JWS was proprietary, so we looked around and found this 'java.apache.org' sister project of Apache where we downloaded 'JServ 0.9.7' and we were able to run servlets inside Apache. Attach a JDBC layer on top of MySQL and we found our architecture.

The problems came afterwards: JServ was a very young implementation and while it had very clever ideas (such as decoupling the JVM from the web server processes using a simple binary protocol through a socket), it was not really polished. My first patch was to change the use of 'Stack' to 'Vector' since the servlet logs were traced backwards, go figure :)

Anyway, I got more and more involved in JServ. So much so that at some point, I volunteered to be the release coordinator and I was accepted by the development community.

For my work on the various java.apache.org hosted projects, I was proposed for membership of the Apache Software Foundation in 1999 and voted in.

TB: What gave you the initial idea to develop an XML-based web framework?

SM: After JServ started to attract attention and Java on the server side was really catching on, we started to feel that java.apache.org could grow into a project with a wider scope and started hosting other software projects related to server-side Java.

At that time, I volunteered to help Jon Stevens (of Apache Turbine, Apache Velocity and Tigris Scarab fame) manage the HTML documentation of all the hosted projects, which was starting to become too big and messy.

After trying many different solutions, I started to believe that the problem was not in how to treat HTML, but in HTML itself, which was not really suited for the job. At ApacheCon '98, I had the pleasure of meeting Eric Prud'hommeaux, who introduced me to the wonders of namespaced XML. At that time, we were talking about making an RDF description for the Apache HTTPD configuration files, but that effort died away.

The idea of using some more semantic markup for our documentation was intriguing, but the picture was not really complete and I couldn't really find a valid answer.

The answer came while reading the first working draft of the XSL specification: with a transformation language, it was possible to write docs in a simple and specific markup and then transform it to different things.
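
A minimal sketch of this idea, assuming the standard JAXP API (javax.xml.transform) that ships with the JDK. The 'note' markup, the two stylesheets and the class name are made up for illustration and are not taken from the Apache documentation effort described here: the same small document is turned into HTML by one stylesheet and into plain text by another.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DocToManyFormats {
    // Hypothetical "simple and specific" markup for a short note.
    static final String DOC = "<note><title>Hello</title><body>World</body></note>";

    // Stylesheet 1: render the note as HTML.
    static final String TO_HTML =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/note'>"
      + "<html><body><h1><xsl:value-of select='title'/></h1>"
      + "<p><xsl:value-of select='body'/></p></body></html>"
      + "</xsl:template></xsl:stylesheet>";

    // Stylesheet 2: render the same note as plain text.
    static final String TO_TEXT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/note'>"
      + "<xsl:value-of select='title'/>: <xsl:value-of select='body'/>"
      + "</xsl:template></xsl:stylesheet>";

    /** Applies the given stylesheet to the given XML and returns the result as a string. */
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(DOC, TO_HTML));
        System.out.println(transform(DOC, TO_TEXT));
    }
}
```

The source document never changes; only the stylesheet decides what the output looks like.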

But there was something else: I've always been bugged by the fact that the Servlet API didn't allow the creation of servlet pipelines. When working on JServ, many people asked us to implement a feature of some other servlet engines called 'servlet chaining' that made it possible to create 'chains' of servlets. The concept was a 'hack' and we always rejected the notion.

Given the need for an XSLT post-processing stage, the lack of servlet filtering (which was later introduced in the Servlet API 2.3) and the need to avoid serializing and parsing the XML stream between stages, the need for an XML-based processing framework was obvious to me.
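
What "avoiding serializing/parsing between stages" can look like is sketched below with the standard SAX API from the JDK (org.xml.sax). This is only an illustration of the general technique, not Cocoon's actual code; the class names and the upper-casing stage are hypothetical. The input is parsed once, and each stage hands SAX events directly to the next one instead of writing out text that would have to be parsed again.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class SaxPipelineSketch {
    /** Middle stage: upper-cases character data and forwards every other event unchanged. */
    static class UpperCaseFilter extends XMLFilterImpl {
        UpperCaseFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            char[] upper = new String(ch, start, length).toUpperCase(Locale.ROOT).toCharArray();
            super.characters(upper, 0, upper.length);
        }
    }

    /** Final stage: collects the text content it receives as SAX events. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Wires parser -> filter -> collector and runs the pipeline over the given XML. */
    static String run(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            UpperCaseFilter filter = new UpperCaseFilter(parser);
            TextCollector sink = new TextCollector();
            filter.setContentHandler(sink);
            // The text is parsed exactly once; the stages exchange events, not strings.
            filter.parse(new InputSource(new StringReader(xml)));
            return sink.text.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<doc><p>hello</p></doc>"));
    }
}
```

More stages can be added by wrapping one filter in another; none of them needs to know where its events come from or where they go.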

TB: What is your current role in the Apache Software Foundation and your role in the development of Cocoon respectively?

SM: I'm currently a member of the foundation and the official representative for JSR 170 (Java Repository API) at the JCP; that's the only official role I have at the moment.

For Cocoon, while I was involved in most of the coding for the first generation, for the second generation I did mostly architectural design and community engineering. While it is not obvious at first, the two roles are very closely related in an open development community like Cocoon's. This is because design is done in the open, before the implementation, and this is a very dangerous approach because sometimes it generates lots of debate and friction. In fact, my leading role is not to impose my view, but to throw out 'random thoughts' to spark discussion and to create consensus from what emerges.

This is the real power of Cocoon: its healthy development community and continuous research into new directions, done with the involvement of everyone who cares about and invests in this technology. My role is to make sure things continue in this spirit.

TB: How much time did you invest in Cocoon? Did your studies suffer from your part-time work?

SM: Oh, it's really hard to tell, but for sure a lot of work, a lot of time and a lot of energy. I don't regret it at all because I've learned a lot, both from a technological point of view and, mostly, from a human point of view (which I consider the most important thing).

Also, yes, my college studies suffered from my involvement. In fact, at some point, I decided to quit for a while and dedicate my time to finishing my studies (which I did in July 2001).

TB: Do you have any benefits from your commitment to the development of Cocoon? Why do you think someone should program for an open source project?

SM: Many benefits. First of all knowledge: a few years of hard-core open source programming with a leading international community gives you more than any school could give you. Then the ability to work with others, remotely and at very high quality levels. Finally, economic reward: without the visibility that Cocoon and Apache in general gave me, I would not have been contacted by the many companies that want my consulting services.

TB: What is the difference between Cocoon and JSP, ASP or ColdFusion?

SM: While JSP, ASP and ColdFusion are web technologies based on the concept of server pages, Cocoon is a framework, thus something that architecturally is located underneath. A proof of this fact is that Cocoon can be used to post-process JSP pages or the fact that it includes its own XML-oriented server pages technology (XSP).

From a functional point of view, being implemented as a servlet, everything that can be done with a servlet or a JSP page can be done with Cocoon, but unlike JSP and servlets, Cocoon gives you a bunch of XML-related functionality, modules, libraries and tools that allow you to concentrate on your logic rather than on all the XML stuff.

TB: At the ApacheCon 2000 you said that Cocoon is "A small revolution". Why is Cocoon a revolution from your point of view?

SM: Cocoon is the first web technology to be designed around processing pipelines right from scratch. The UNIX operating system showed the world how powerful the concept of pipelines is: with a few general components and a powerful way to connect them together, most of the work can be done without having to write complex code, mostly by reusing what already exists, more or less like I enjoy doing with Lego.

TB: Some time ago you suggested using Xerces C++ instead of Xerces Java for dealing with XSLT transformation. Can we look forward to performance improvements and is there a danger of losing platform independence?

SM: Profiling shows that most of the time Cocoon uses to process a resource is taken by the XSLT processor if one is present in the pipeline (which is normally the case). I suggested that the use of native processors might improve performance, but we later found out that this is not the case: a clever implementation of the XSLT processor that was donated to the Apache Xalan project showed that a Java implementation that compiles stylesheets into bytecode can be even faster than a native implementation of the XSLT processor. So we decided to drop the native path (at least for now and for this).
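
The compile-once pattern is visible in the standard JAXP API through the Templates interface, sketched below. Whether the stylesheet really ends up as bytecode depends on the TransformerFactory implementation in use, so this shows only the usage pattern, not the internals of the compiler mentioned above; the stylesheet and the class name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CompiledStylesheetSketch {
    // Hypothetical stylesheet: turns <greeting to='...'/> into a line of text.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/greeting'>Hello, <xsl:value-of select='@to'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    /** Processes the stylesheet once; the returned Templates object can be reused. */
    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Applies an already compiled stylesheet to one document. */
    static String apply(Templates compiled, String xml) {
        try {
            StringWriter out = new StringWriter();
            compiled.newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Templates compiled = compile(XSL);   // done once
        System.out.println(apply(compiled, "<greeting to='Ada'/>"));
        System.out.println(apply(compiled, "<greeting to='Linus'/>"));
    }
}
```

The cost of reading and preparing the stylesheet is paid once, and every later request only pays for the transformation itself.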

TB: Many people are inspired by the Cocoon architecture, but are also worried about its performance. Do you consider Cocoon performant and what measures are taken to make Cocoon fast?

SM: Cocoon has been tested and used for very loaded systems and the testers were impressed by its performance, compared to the amount of processing that it does to produce resources. Moreover, tests show that most of the time is spent on XSLT processing and the compiled version of Xalan (XSLTC) has been measured to give a 400% to 600% performance improvement over the interpreted version that is used by Cocoon today.

Along with more proxy-friendly features and the improved caching engine that will appear in version 2.1, I think Cocoon will not have any problems being adopted even in critical environments.

TB: Can you give us an example of an application for which Cocoon would be the right tool and a second one for which it wouldn't be suitable?

SM: Cocoon was born as a web publishing framework and it really shines for multi-channel operations. For example, one of the major European mobile telephone companies is using Cocoon for their next generation mobile portal, where transforming content for various devices is the most important requirement.

But multi-channel operation is only an example of those uses where content must be transformed, adapted, aggregated, merged or otherwise incorporated. This includes portals, static web sites, documentation and all solutions where the different skills used to create the web site are brought by different people.

Applications where Cocoon doesn't really stand out (yet!) are data-centric web applications, where some design decisions taken for stateful publishing somewhat impact its ability to be as powerful for those applications.

But since it's getting harder and harder to separate a data-centric web application from a content-centric publishing web site, we are working really hard to rebalance the Cocoon design in the next major release so that it is equally powerful for all web needs.

TB: When starting to develop web applications one might ask himself whether to use Struts or Cocoon. Are "Model View Controller" applications possible with Cocoon?

SM: MVC is one of the possible examples of 'separation of concerns' (SoC). Since Cocoon is designed around SoC, it is entirely possible to design a web application in Cocoon using this pattern, even if, admittedly, webapp frameworks that have their focus on this are easier to use and much faster to learn than an XML-based framework.

But we are working on a new feature that might very well change this picture. Our idea is based on the fact that we want to design a web technology where the logic of the web application is not fragmented in several different places but nicely located in one place.

Ask yourself the difference between a web application and a command line application that share the same business logic. Which one is easier to write? The answer is pretty obvious and this is our goal: make it possible to write a web application as simply as a 'regular' application, with the framework taking care of everything else.

We call this 'logic' that glues the various resources 'flow'. The flow, even in advanced MVC frameworks like Struts or Turbine, is never located in one place. You can't simply look at a description of the flow and understand what the web application does. Writing web applications today is like writing a states/transition map for a 'finite state machine'. Scientists and researchers spent decades trying to avoid forcing humans to think like machines, and web development is now forcing this back on us. We hope to provide at least a first step in the right direction.
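
The contrast can be sketched in plain Java, under the assumption of a made-up two-question dialogue (ask for a name, then an age). Nothing here is Cocoon or flowscript API; all names are hypothetical. The first method is the 'state machine' style, where each request must look up where the conversation stands; the second is the same logic written as a 'regular' application that reads top to bottom.

```java
import java.util.Arrays;
import java.util.Iterator;

class FlowSketch {
    enum State { ASK_NAME, ASK_AGE, DONE }

    /** Per-user conversation state that has to survive between requests. */
    static final class Session {
        State state = State.ASK_NAME;
        String name;
        int age;
    }

    /** Web style: one call per request; the flow is scattered over the switch cases. */
    static String handleRequest(Session s, String input) {
        switch (s.state) {
            case ASK_NAME:
                s.name = input;
                s.state = State.ASK_AGE;
                return "How old are you?";
            case ASK_AGE:
                s.age = Integer.parseInt(input);
                s.state = State.DONE;
                return s.name + " is " + s.age;
            default:
                return "done";
        }
    }

    /** Command-line style: the same flow as straight-line code, no explicit states. */
    static String runLinear(Iterator<String> answers) {
        String name = answers.next();
        int age = Integer.parseInt(answers.next());
        return name + " is " + age;
    }

    public static void main(String[] args) {
        Session s = new Session();
        System.out.println(handleRequest(s, "Ada"));
        System.out.println(handleRequest(s, "36"));
        System.out.println(runLinear(Arrays.asList("Ada", "36").iterator()));
    }
}
```

With only two steps the difference is small, but every extra page adds another state and another set of transitions to the first version, while the second just grows by a line or two.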

TB: At what stage is the implementation of the "flow" functionality in Cocoon? Would you already recommend using it in real projects?

SM: The 'flowscript' engine is implemented and functional, but at the time of writing it must be considered at alpha stage because we cannot guarantee that the APIs and the various contracts with the users will remain untouched and backward compatible before the release. I would suggest starting to use it, but not in a production environment, at least not until we release a final version of Cocoon 2.1.

TB: Cocoon 2 was accompanied by a fundamental architectural revision. Are you satisfied with the current architecture? Are there major things which you would like to change?

SM: Even after two years, I'm still very satisfied with the Cocoon 2.x architecture. I personally think it doesn't require any major architectural change and we are not planning any. We might rebalance a few things here and there, but nothing as radical and backward incompatible as what happened in the transition between Cocoon 1.x and Cocoon 2.x.

TB: Do you think it's possible that the ideas and concepts of Cocoon have an influence on future JSP and Servlet specifications?

SM: I was part of the Servlet API Expert Group in the years 1998-2001 and some of the ideas that I've used in Cocoon were already proposed to and rejected by that expert group. For example, I proposed a SAX-based connector to servlet filters that allowed servlets to avoid the parsing/serialization overhead in servlet pipelines, but it was rejected. I also suggested the use of namespaces for JSP taglibs and that is happening.

So, yes, I think that Cocoon has already influenced the JSP and Servlet specifications a lot, at least in showing their weak points and showing possible alternative solutions. That said, we work very closely with the various spec leads and Apache representatives at the JCP and we are very happy when they ask us for feedback on proposed changes or alternative solutions.

TB: What features can we look forward to in the next version (2.1) of Cocoon?

SM: There is no official roadmap for the next version, so things might change, but the major feature will be the flow engine: it will be possible to describe not only the URI space with the sitemap, but also the flow logic between the various resources. Some are starting to name this feature MVC+, where the '+' is the use of 'continuations', the ability to save the state of the entire application without using sessions and without freezing threads (to implement this, we use a modified version of the Mozilla JavaScript engine: Rhino).

Another big difference is the fact that the default sitemap engine will be the interpreted version (unlike the compiled version that is used in the 2.0.x family). This means that sitemap reload times are reduced from seconds to milliseconds, which gives a big advantage during development.

We have also implemented the concept of 'writable sources', which are URI-referenced resources that we can now write to. This makes it possible for Cocoon to read and publish a resource, but also to write and edit a resource, closing the roundtrip loop of content editing.

A lot of work has been done on various new components, mostly for portals, authentication, form handling, content pagination, database updating and many more.

But this is just a very tentative list and since the development community is the one in control, things might change quickly and some features might be postponed to later releases, to allow a faster release cycle (which is what we would always like to have).

TB: Thank you for the interview.
