Wednesday 27 August 2008

Software Playlists

I wrote the text below as part of a whitepaper on Cloudsmith – you may find it interesting to read through.

In summary, Cloudsmith lets you browse and find useful bundles of software components which work together – software playlists – and then download ones of interest. Each one can contain components from different software repositories, and Cloudsmith knows where to go, and how to get to them.

I did a few short (approx. 4 mins each) YouTube videos to help you. The first one (sorry for the mug shot..) shows you how to browse and download a playlist in Cloudsmith; the second, how you can publish your own playlist, or distro, of interesting software and share it with everyone else, if you want to; and the third, how you can tell Cloudsmith about new software components which it doesn’t already know about.

Have fun.

Meanwhile, here’s some motivation for why we put Cloudsmith together…..


Douglas McIlroy wrote, for the NATO conference on software engineering in Germany in 1968:

"Software components (routines), to be widely applicable to different machines and users, should be available in families arranged according to precision, robustness, generality and time-space performance.

Existing sources of components - manufacturers, software houses, users' groups and algorithm collectors - lack the breadth of interest or coherence of purpose to assemble more than one or two such families, yet software production in the large would be so enormously helped by the availability of spectra of high quality routines, quite as mechanical design is abetted by the existence of families of structural shapes, screws and resistors.

We undoubtedly get the short end of the stick in confrontations with hardware people because they are industrialists, and we are the crofters. Software production today appears in the scale of industrialization somewhere below the more backward construction industries."

Some forty years later, some cynics may argue that we software people are still crofters compared to our peers in other disciplines of engineering. However, there is considerable progress, at the technology level, for component based engineering applied to the software industry: modern programming environments such as Java, C#, Ruby, Perl, etc; modern development environments such as Eclipse and XCode; and emerging runtime environments such as OSGi. Likewise there are repositories of software components online at Eclipse, Apache, Sourceforge, and Tigris, as well as many others. Discovering what software components are available is a modest challenge: you can use raw Google, or source code searchers such as Google CodeSearch, Koders, Krugle, Codase, or some such. Competency and skills across a development organisation can be accelerated by understanding how different components have been used together by architects and experienced developers.

However, the momentum behind component based software paradoxically is not without problems. Many developers make contributions, and it can be difficult to clearly see the gems against the morass of activity. Software appears inherently unstable. Problems are found, bugs are fixed, extensions are made, and patches released. When the industry as a whole is increasingly adopting componentization, when components are supplied by third parties, and are upstream in the component “supply chain”, there is a serious risk of accelerating instability.

It can take hours, or even longer, to deduce dependencies between components (whether open or closed source); to copy folders; to pack; to download; to then unpack; to establish deployment targets; and to carry out all the sundry other activities needed to actually build a system and get it to work. It is not uncommon for the skills involved to be concentrated in a very small number of key “build” or “configuration” developers, whose loss from an organization would cause serious concern and vulnerability.

Following other engineering disciplines to componentization should be “a good thing”, as McIlroy argues; but can we do more to enhance confidence and accountability?

The momentum behind software components is resulting in increasingly less pre-packaging by specific suppliers and aggregators, and increasingly more tailoring by both suppliers and consumers. There is some analogy with developments in the music industry. In 1968, when McIlroy made his comments at the NATO conference, and even as recently as the start of this decade, music distribution companies and their contracted artists sold pre-packaged songs as albums: an album was distributed as a complete image printed onto some distribution media - an LP record, tape, CD or DVD - and the selection of songs it contained was pre-determined by the supplier. Software vendors have been doing something similar.

Today, while such music albums are still available, it is more common for the public to download individual songs and tunes from online repositories and stores such as iTunes. Further, anyone can package together a selection of songs and tunes which they believe form an interesting juxtaposition, as a playlist. A favorite playlist may be shared with friends and others, enabling them to download and also listen to the same selection.

A playlist can be part of another, forming a larger collection. iTunes v3 introduced automatic updating of playlists, based on ratings, popular plays, keyword tags and play counts. The playlist concept has been extended to videos and photograph selections.

Software playlists are a new concept, introduced by Cloudsmith. A software playlist identifies a set of software components which can be usefully used together. Given a playlist, its constituent components can be automatically downloaded from their respective public or, as appropriate, proprietary software repositories and materialised onto a target machine. The publisher of a playlist - a company, or an individual - asserts that the specific components are mutually compatible, and usage and other metrics can confirm this. An example of a playlist, from Stefan Daume, might be a foundation for using the Seam rich client Java toolset and so list the combination of specific compatible versions of the Eclipse Classic IDE, JBoss Tools, Seam Core, JBoss AS, and PostgreSQL required to obtain the Seam environment.
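To make the idea concrete, here is a minimal sketch in Python of what a playlist might look like as data. This is only an illustration, not Cloudsmith's actual format: the class names, the `materialization_plan` method, the placeholder version strings and the `example.invalid` repository URLs are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    name: str
    version: str      # placeholder string, not a real release number
    repository: str   # hypothetical repository URL


@dataclass
class Playlist:
    name: str
    publisher: str
    components: List[Component] = field(default_factory=list)

    def materialization_plan(self) -> Dict[str, List[Component]]:
        """Group components by the repository each must be fetched from."""
        plan: Dict[str, List[Component]] = {}
        for component in self.components:
            plan.setdefault(component.repository, []).append(component)
        return plan


# Hypothetical repository locations; real ones would differ.
ECLIPSE = "https://eclipse.example.invalid"
JBOSS = "https://jboss.example.invalid"
POSTGRES = "https://postgresql.example.invalid"

seam_playlist = Playlist(
    name="Seam foundation",
    publisher="Stefan Daume",
    components=[
        Component("Eclipse Classic IDE", "x.y", ECLIPSE),
        Component("JBoss Tools", "x.y", JBOSS),
        Component("Seam Core", "x.y", JBOSS),
        Component("JBoss AS", "x.y", JBOSS),
        Component("PostgreSQL", "x.y", POSTGRES),
    ],
)

for repository, items in seam_playlist.materialization_plan().items():
    print(repository, [c.name for c in items])
```

The point of the sketch is that the playlist holds only names, versions and locations; the components themselves stay in their own repositories until someone materializes the playlist.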

Software playlists are an excellent way to encourage standard configurations and environments. Corporate “favorite” playlists, containing only approved executable binaries and libraries, can be enforced where appropriate. To the extent that playlists are shared in public (on the internet), different alternative configurations can quickly be appraised and compared for popularity and ubiquity. Software adoption trends across the industry can be monitored, and the stability of new releases can be tracked.

Changes to specific software playlists can also be monitored. Notifications can be received, for example, by using an RSS feed whenever one of the underlying components is updated. Equally, notifications can be received whenever a software playlist is in turn incorporated and nested inside another playlist: this can be one measure of adoption and ubiquity, akin to citation scores for top scientific papers and to pagerank algorithms.
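Nesting and the "citation" style adoption measure can be sketched in a few lines of Python. Again this is an illustration under assumptions, not how Cloudsmith computes anything: the playlist names, the dictionary layout and the functions `flatten` and `inclusion_count` are all hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical playlists: each lists its own components and the
# other playlists it nests.
PLAYLISTS: Dict[str, Dict[str, List[str]]] = {
    "java-base": {"components": ["jdk", "build-tool"], "includes": []},
    "web-stack": {"components": ["app-server"], "includes": ["java-base"]},
    "training-course": {"components": ["class-notes"], "includes": ["web-stack"]},
}


def flatten(name: str,
            playlists: Dict[str, Dict[str, List[str]]],
            _seen: Optional[Set[str]] = None) -> Set[str]:
    """Collect every component reachable from a playlist, following nesting."""
    seen = set() if _seen is None else _seen
    if name in seen:          # guard against playlists that nest each other
        return set()
    seen.add(name)
    entry = playlists[name]
    result = set(entry["components"])
    for nested in entry["includes"]:
        result |= flatten(nested, playlists, seen)
    return result


def inclusion_count(name: str,
                    playlists: Dict[str, Dict[str, List[str]]]) -> int:
    """Count the playlists that directly nest the given playlist."""
    return sum(name in entry["includes"] for entry in playlists.values())


print(sorted(flatten("training-course", PLAYLISTS)))
print(inclusion_count("java-base", PLAYLISTS))
```

A direct inclusion count is the simplest possible measure; a pagerank-like score would also weight each inclusion by how widely the including playlist is itself used.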

In a software development environment, public software playlists can provide valuable information on configurations found useful by other organizations and developers. A well defined (evidently stable and popular) playlist can save the time and effort otherwise needed to find workable configurations of different versions of various components.

Inside the corporate firewall, private (i.e. private to the corporation) playlists can be built from a mixture of proprietary software components from internal repositories, and from public components if appropriate. Playlists overcome the vulnerability of configuration skills being limited to a very small group of core developers and builders.

Software playlists can be a mechanism for describing a specific tailored configuration from a customer back to a software supplier, under a suitable support contract. They can also be a way for a vendor to release updates and patches in a limited distribution to appropriate customers.

The components – the “tunes” – within a software playlist need not be limited to executable modules and libraries. A software component can also be source code, or a test script, or documentation, or a presentation – in fact any soft copy of any information. A software playlist can, for example, describe the executable software, tutorials, exercises and class notes necessary for a particular training course. A prospective student – or the individual responsible for preparing a class room for a course – can then pre-load the material necessary for the course, from the playlist, and thus avoid time wastage for configuration activities during the course itself.

We have built and offer Cloudsmith to the global software community as a service to help find, assemble, load and track software artifacts, described by software playlists, as well as to help to encourage the construction of new ones.

Cloudsmith is a repository for software playlists: it contains information about components, but not the components themselves. Cloudsmith is thus not a software component repository, but augments such repositories. Software playlists can be easily constructed from others, and from those software components known to Cloudsmith. Publishing; sharing or protecting; finding and searching for; downloading and “materializing” components for; and monitoring the popularity and quality of software playlists are all simple, easily learnt, point-and-click activities. Playlists are named, grouped into folders, and can be given keyword tags. Playlists can be shared amongst a specified set of users, or made generally public. All the components necessary for a specific playlist can be materialized and installed on a specific machine with a single mouse click – and such “cloudlinks” can be shared via, for example, email or blogs. Versioning compatibility and specific machine environment differences can be automatically managed.

Public software component repositories, such as Eclipse, Apache, Sourceforge, Maven and Tigris, are largely already mapped by Cloudsmith, and so the components therein can be easily added into playlists. Cloudsmith understands all common industry versioning and meta-information formats for software repositories and build systems. Extending the map with further repositories, including private ones (for example inside a firewall), is straightforward. Components from private repositories can be restricted to private playlists for limited groups of users.

In a similar way, the Cloudsmith web site is a publicly accessible resource. When using public assets together with private ones, it is a common requirement to place key proprietary assets within your corporate firewall. We thus also provide private Cloudservers, which operate on a corporate intranet and co-operate with the main public Cloudsmith web site. A private Cloudserver can thus complement a private software repository.

Software components openly shared and accessible across the internet are changed (by third parties) in ways that sometimes may appear unpredictable, and perhaps even unwarranted from the perspective of your own use of them. By contrast, private software assets inside the corporate firewall can be managed and have their life cycles carefully controlled. A private Cloudserver can provide a useful interface to couple private and public assets, ensuring that specific versions and updates to public assets are only adopted inside the corporation within the firewall in a managed way.
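As a rough illustration of what adopting public updates "in a managed way" could mean, the Python sketch below lets a public release through only if that exact version has been approved internally. The component names, version strings, the `APPROVED` table and the `adopt_updates` function are all invented for the example; it is not a description of how a private Cloudserver is actually implemented.

```python
from typing import Dict, Set

# Hypothetical corporate approval list: component name -> approved versions.
APPROVED: Dict[str, Set[str]] = {
    "logging-lib": {"1.2", "1.3"},
    "xml-parser": {"2.0"},
}


def adopt_updates(public_releases: Dict[str, str],
                  approved: Dict[str, Set[str]]) -> Dict[str, str]:
    """Keep only the public releases whose exact version is approved."""
    return {
        name: version
        for name, version in public_releases.items()
        if version in approved.get(name, set())
    }


# Latest versions seen upstream (made-up data).
latest_public = {"logging-lib": "1.3", "xml-parser": "2.1", "new-lib": "0.1"}

print(adopt_updates(latest_public, APPROVED))
```

Here the unapproved `xml-parser` release and the unknown `new-lib` are both held back, so nothing changes inside the firewall until someone deliberately approves it.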

A private Cloudserver can also serve as the "site of record" for the adoption and consumption of public software assets across an organisation. By working via corporate software playlists on a private Cloudserver, rather than allowing the direct consumption of publicly available software components, the degree to which public assets are used can be controlled and monitored. This in turn can not only provide stability for software engineering activities, but also greatly assist the verification and audit processes necessary for good governance and IP management.

Douglas McIlroy stated, back in 1968:

"Existing sources of components - manufacturers, software houses, users' groups and algorithm collectors - lack the breadth of interest or coherence of purpose to assemble more than one or two such families, yet software production in the large would be so enormously helped by the availability of spectra of high quality routines, quite as mechanical design is abetted by the existence of families of structural shapes, screws and resistors."

Today, there is a substantial spectrum of software components available on a global scale, many of them of high quality.

The challenge now is to understand which software components work well with which others, and then how to understand and manage those configurations.