
Content Repository for the Platform

OBSOLETE

Can an off-the-shelf content repository (CR) provide the functionality needed for the back end of Synapse?

Such a CR would require the following features:

- CRUD on objects

- create universal IDs for objects

- Custom metadata on objects, supporting (1) composite keys (attributes) and/or (2) multiple values per attribute

- versioning of objects

- fine-grained authorization/security (would need to know how users and groups are represented)

- metadata for users and groups

- querying (paginated, sorted, and with results filtered by authorization); unstructured as well as structured queries

- support for an open-source database, e.g. MySQL

- integration with our authentication mechanism

- support for concurrent access by users

Nice to have:

- automatic indexing for search
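The core requirements above can be made concrete with a small in-memory sketch in Java (the language Jackrabbit is written in). It is hypothetical and not any product's API; the class and method names are ours. Each object gets a random UUID as its universal ID, attributes are multi-valued (each key maps to a list of values), and an update appends a new version instead of overwriting the old one.

```java
import java.util.*;

// Hypothetical in-memory model of the core requirements (not JCR's API):
// CRUD, universal IDs, multi-valued attributes, and linear versioning.
// Methods are synchronized as a coarse way to allow concurrent access.
final class InMemoryRepository {

    /** One immutable version of an object: its universal ID plus multi-valued attributes. */
    static final class Version {
        final UUID id;
        final Map<String, List<String>> attributes;

        Version(UUID id, Map<String, List<String>> attributes) {
            this.id = id;
            // Copy defensively so callers cannot change a stored version later.
            Map<String, List<String>> copy = new TreeMap<>();
            attributes.forEach((key, values) -> copy.put(key, List.copyOf(values)));
            this.attributes = Collections.unmodifiableMap(copy);
        }
    }

    // Object ID -> version history; the last entry is the current version.
    private final Map<UUID, List<Version>> store = new HashMap<>();

    /** Create: assigns a random UUID and stores the first version. */
    synchronized UUID create(Map<String, List<String>> attributes) {
        UUID id = UUID.randomUUID();
        List<Version> history = new ArrayList<>();
        history.add(new Version(id, attributes));
        store.put(id, history);
        return id;
    }

    /** Read: returns the current version, or empty if the object does not exist. */
    synchronized Optional<Version> read(UUID id) {
        List<Version> history = store.get(id);
        return history == null ? Optional.empty() : Optional.of(history.get(history.size() - 1));
    }

    /** Update: appends a new version instead of overwriting the previous one. */
    synchronized void update(UUID id, Map<String, List<String>> attributes) {
        historyOf(id).add(new Version(id, attributes));
    }

    /** Delete: removes the object together with its history. */
    synchronized boolean delete(UUID id) {
        return store.remove(id) != null;
    }

    /** Returns version n of an object (1-based, oldest first). */
    synchronized Version version(UUID id, int n) {
        return historyOf(id).get(n - 1);
    }

    /** Number of stored versions of an object. */
    synchronized int versionCount(UUID id) {
        return historyOf(id).size();
    }

    // Only called from synchronized methods above.
    private List<Version> historyOf(UUID id) {
        List<Version> history = store.get(id);
        if (history == null) throw new NoSuchElementException("no object " + id);
        return history;
    }
}
```

Authorization, querying, and persistence to a database are left out; those are what a real CR would add on top.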

Evaluation of Jackrabbit/Java Content Repository (JCR):

Answers to requirements above:

YES - CRUD on objects

YES, for "referenceable" nodes - create universal IDs for objects

YES, multiple values are allowed - custom metadata on objects, supporting (1) composite keys (attributes) and/or (2) multiple values per attribute

YES - versioning of objects

MAYBE (Bruce thought he saw that it was supported, but Dave found a comment to the contrary) - fine-grained authorization/security (would need to know how users and groups are represented)

Additional comments from Dave:

The JBoss DNA project didn't actually implement authorization, just authentication (with JAAS):
http://docs.jboss.org/jbossdna/0.2/manuals/reference/html/environment.html#authorization

Jackrabbit's JCR 2.0 implementation, on the other hand, looks like it DID implement a nice-looking node-level hierarchical authorization scheme. Their wiki discusses the features and pros/cons of the different styles of authZ:
http://wiki.apache.org/jackrabbit/AccessControl?highlight=%28authorization%29

but the API clearly has resource-based (node-based) authorization actually implemented in it:
http://jackrabbit.apache.org/api/2.2/org/apache/jackrabbit/api/security/JackrabbitAccessControlList.html

Given that, and its nice SVN/CVS-like versioning, the JCR 2.0 implementation looks a lot more attractive than the 1.x implementation.
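The idea behind a node-level hierarchical scheme can be sketched in a few lines. This is a simplified, hypothetical model, not Jackrabbit's actual evaluation algorithm or API: an entry on a node applies to that node and all its descendants, the nearest node with a matching entry decides, and a user's principals include both the user and their groups (one possible answer to how users and groups are represented).

```java
import java.util.*;

// Simplified, hypothetical model of node-level hierarchical access control.
// Not Jackrabbit's evaluation algorithm: only one ("read") privilege, and the
// nearest node (walking up toward the root) with a matching entry decides.
final class HierarchicalAcl {

    enum Effect { ALLOW, DENY }

    // Node path -> (principal name -> effect).
    private final Map<String, Map<String, Effect>> entries = new HashMap<>();

    void set(String path, String principal, Effect effect) {
        entries.computeIfAbsent(path, p -> new HashMap<>()).put(principal, effect);
    }

    /**
     * principals holds the user's own name plus the names of their groups.
     * On a single node, DENY for any principal beats ALLOW; with no matching
     * entry anywhere on the path, access is denied by default.
     */
    boolean canRead(String path, Set<String> principals) {
        String current = path;
        while (true) {
            Map<String, Effect> acl = entries.get(current);
            if (acl != null) {
                boolean allowed = false;
                for (String principal : principals) {
                    Effect effect = acl.get(principal);
                    if (effect == Effect.DENY) return false;
                    if (effect == Effect.ALLOW) allowed = true;
                }
                if (allowed) return true;
            }
            if (current.equals("/")) return false;
            current = parentOf(current);
        }
    }

    // "/a/b" -> "/a", "/a" -> "/"
    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }
}
```

With a group-level ALLOW on `/projects` and a user-level DENY on `/projects/secret`, the group grant is inherited by everything under `/projects` except the `/projects/secret` subtree for that one user.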

YES, by leveraging content nodes to represent this metadata - metadata for users and groups

YES, in version 2 (relatively new) - querying (paginated, sorted, and with results filtered by authorization); unstructured as well as structured queries

YES - support for an open-source database, e.g. MySQL

YES, via JAAS, for which Crowd has a plug-in - integration with our authentication mechanism

YES - support for concurrent access by users

Nice to have:

YES - automatic indexing for search
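For the querying requirement, one detail worth checking in any CR is the order of operations: results must be filtered by authorization before they are paginated, or a page can come back short even though more readable results exist. JCR 2.0's `javax.jcr.query.Query` exposes `setOffset(long)` and `setLimit(long)` for pagination; the sketch below does not call that API. It models the pipeline in plain Java, with hypothetical `matches` and `readable` predicates standing in for the query condition and the access check.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical query pipeline (not JCR's API): match -> authorize -> sort -> paginate.
final class AuthorizedQuery {

    /** A minimal search hit: a node path and a sortable name. */
    static final class Hit {
        final String path;
        final String name;
        Hit(String path, String name) { this.path = path; this.name = name; }
        @Override public String toString() { return name; }
    }

    static List<Hit> page(List<Hit> candidates,
                          Predicate<Hit> matches,    // the structured query condition
                          Predicate<Hit> readable,   // authorization check for the current user
                          Comparator<Hit> order,
                          long offset, long limit) {
        return candidates.stream()
                .filter(matches)
                .filter(readable)   // authorize BEFORE skipping/limiting
                .sorted(order)
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

For hits named `a` to `f` where `b` and `d` are unreadable, asking for offset 2, limit 2 returns `e` and `f`. Paging first and filtering second would have returned only `c` for the same request.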

Open questions:

- how widely are JCR and Jackrabbit adopted? How widely is the new version 2.0 adopted?

- what competing CR products are there?

- Can you specify a finite list of values (i.e. an "enum") for an attribute, or more generally integrate validation with annotation population?

- are the attributes sufficiently flexible?

- what are a "relative path", a "reference", and a "PATH" property type?

- what's an "unstructured node"?

- how does JAAS link to Crowd?

- does JCR support user groups?
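Until the "enum" question is answered for JCR itself, one fallback is to validate annotation values in our own code before they are written to the repository. The sketch below is hypothetical (the class and method names are ours, not JCR's): each constrained attribute has a finite set of allowed values, and unconstrained attributes pass through unchanged.

```java
import java.util.*;

// Hypothetical application-side validation of annotation values against
// finite "enum" lists, applied before values are written to the repository.
final class AnnotationValidator {

    // Attribute name -> allowed values; attributes not listed are unconstrained.
    private final Map<String, Set<String>> allowed = new HashMap<>();

    void defineEnum(String attribute, String... values) {
        allowed.put(attribute, new HashSet<>(Arrays.asList(values)));
    }

    /** Throws IllegalArgumentException if any value is outside the attribute's allowed set. */
    void validate(String attribute, List<String> values) {
        Set<String> permitted = allowed.get(attribute);
        if (permitted == null) return; // free-form attribute
        for (String value : values) {
            if (!permitted.contains(value)) {
                throw new IllegalArgumentException(
                        "'" + value + "' is not an allowed value for " + attribute);
            }
        }
    }
}
```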

Notes:

- There is a Spring template for JCR

Feasibility tasks:

- stress test querying

- exercise SQL-like querying

- integrate authentication and authorization with Crowd via JAAS
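The JAAS integration task mostly comes down to a `LoginModule` (the standard `javax.security.auth.spi.LoginModule` interface). The sketch below is hypothetical: instead of calling Crowd, it delegates the password check to a `crowdCheck` option (a `BiPredicate<String, char[]>` made up for this illustration), which is where a real Crowd client call would go. On success it adds a principal for the user to the `Subject`.

```java
import java.io.IOException;
import java.security.Principal;
import java.util.Arrays;
import java.util.Map;
import java.util.function.BiPredicate;
import javax.security.auth.Subject;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.auth.login.FailedLoginException;
import javax.security.auth.login.LoginException;
import javax.security.auth.spi.LoginModule;

// Hypothetical JAAS LoginModule that delegates the credential check to an
// external identity service such as Crowd. The "crowdCheck" option is an
// assumption made for this sketch; a real module would call a Crowd client.
final class CrowdDelegatingLoginModule implements LoginModule {

    /** Minimal Principal for the authenticated user. */
    static final class UserPrincipal implements Principal {
        private final String name;
        UserPrincipal(String name) { this.name = name; }
        @Override public String getName() { return name; }
        @Override public boolean equals(Object o) {
            return o instanceof UserPrincipal && ((UserPrincipal) o).name.equals(name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    private Subject subject;
    private CallbackHandler callbackHandler;
    private BiPredicate<String, char[]> crowdCheck;
    private String authenticatedUser;   // set by a successful login()
    private UserPrincipal principal;    // added to the Subject by commit()

    @Override
    @SuppressWarnings("unchecked")
    public void initialize(Subject subject, CallbackHandler callbackHandler,
                           Map<String, ?> sharedState, Map<String, ?> options) {
        this.subject = subject;
        this.callbackHandler = callbackHandler;
        this.crowdCheck = (BiPredicate<String, char[]>) options.get("crowdCheck");
    }

    @Override
    public boolean login() throws LoginException {
        NameCallback nameCb = new NameCallback("username: ");
        PasswordCallback passwordCb = new PasswordCallback("password: ", false);
        try {
            callbackHandler.handle(new Callback[] {nameCb, passwordCb});
        } catch (IOException | UnsupportedCallbackException e) {
            throw new LoginException("could not collect credentials: " + e.getMessage());
        }
        String user = nameCb.getName();
        char[] password = passwordCb.getPassword(); // returns a copy
        passwordCb.clearPassword();
        try {
            if (crowdCheck == null || user == null || password == null
                    || !crowdCheck.test(user, password)) {
                throw new FailedLoginException("authentication failed for " + user);
            }
        } finally {
            if (password != null) Arrays.fill(password, '\0'); // wipe our copy
        }
        authenticatedUser = user;
        return true;
    }

    @Override
    public boolean commit() {
        if (authenticatedUser == null) return false; // login() did not succeed
        principal = new UserPrincipal(authenticatedUser);
        subject.getPrincipals().add(principal);
        return true;
    }

    @Override
    public boolean abort() {
        return logout();
    }

    @Override
    public boolean logout() {
        if (principal != null) subject.getPrincipals().remove(principal);
        authenticatedUser = null;
        principal = null;
        return true;
    }
}
```

In deployment, JAAS would normally instantiate the module by name from a login configuration via `LoginContext`, which expects a public class with a public no-argument constructor; for a quick feasibility test it can also be exercised directly by calling `initialize`, `login`, and `commit`.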

Caveats about Jackrabbit/JCR:

JCR v.1 has problems with joins and with query performance.

JCR has no relational DB client.

JCR v.1 didn't have versioning or auth.

Dave B found issues during querying related to 'annotation types'.

Other products:

Content Repositories:

CouchDB (might have RDBMS client)

http://code.google.com/p/couchdb-python/ 

(Enterprise) Content Management Systems:

ECMs seem to be more complete systems (with a web UI) for managing documents online (think of Google's Blogspot), so this line of exploration may be a dead end.

- OpenCMS: an open-source content management system written in Java, http://demo.opencms.org/en/ ("enterprise content management")

- Nuxeo: a comprehensive free software/open-source Enterprise Content Management (ECM) platform, http://en.wikipedia.org/wiki/Enterprise_content_management

- dotCMS: a free software/open-source web content management system (WCMS)

- Alfresco: an open-source enterprise content management system

- Magnolia (CMS): an open-source content management system (CMS) developed by Magnolia International Ltd., based in Basel, Switzerland; uses Jackrabbit under the hood

- Hippo CMS: ...for multi-channel distribution like web sites and intranets; uses Jackrabbit under the hood

- Apache Lenya: a Java/XML open-source content management system based on the Apache Cocoon content management framework. Features include revision control, scheduling, search capabilities, workflow support, and browser-based WYSIWYG editors.