Content Repository for the Platform
OBSOLETE
Can an off-the-shelf content repository (CR) provide the functionality needed for the back end of Synapse?
Such a CR would require the following features:
- CRUD on objects
- create universal IDs for objects
- custom metadata on objects, with either (1) composite keys (attributes), and/or (2) multiple values per attribute
- versioning of objects
- fine-grained authorization/security (would need to know how users and groups are represented)
- metadata for users and groups
- querying (paginated, sorted, and filtering results based on authorization); unstructured as well as structured queries
- support for an open-source database, e.g. MySQL
- integrate with our authentication mechanism
- support concurrent access by users
Nice to have:
- automatic indexing for search
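The querying requirement above (paginated, sorted, and filtered by authorization) can be illustrated with a minimal in-memory sketch. It is not taken from any content repository API: the class AuthorizedQuerySketch, the Item type, and the group names are all hypothetical, and a real CR would push this work into its query engine. The one point it is meant to show is the ordering: the authorization filter runs before pagination, so pages are not shortened by hidden items.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from an existing repository API.
class AuthorizedQuerySketch {

    // A stored object: an ID, one sortable attribute, and the groups allowed to read it.
    static final class Item {
        final String id;
        final String name;
        final Set<String> readerGroups;

        Item(String id, String name, Set<String> readerGroups) {
            this.id = id;
            this.name = name;
            this.readerGroups = readerGroups;
        }
    }

    // Returns one page of the items the user may read, sorted by name.
    // Filtering on authorization happens first, then sorting, then paging.
    static List<Item> query(List<Item> all, Set<String> userGroups, int offset, int limit) {
        return all.stream()
                .filter(item -> item.readerGroups.stream().anyMatch(userGroups::contains))
                .sorted(Comparator.comparing((Item item) -> item.name))
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> all = List.of(
                new Item("1", "b", Set.of("admins")),
                new Item("2", "a", Set.of("users")),
                new Item("3", "c", Set.of("users", "admins")),
                new Item("4", "d", Set.of("users")));
        // Second page of size 2 for a member of "users", starting at offset 1.
        for (Item item : query(all, Set.of("users"), 1, 2)) {
            System.out.println(item.id + " " + item.name);
        }
    }
}
```

If pagination were applied before the authorization filter, a page could come back with fewer rows than requested, and the gaps would reveal that hidden objects exist.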
Evaluation of Jackrabbit/Java Content Repository (JCR):
Answers to requirements above:
YES - CRUD on objects
YES, for "referenceable" nodes - create universal IDs for objects
YES, multiple values are allowed - custom metadata on objects, with either (1) composite keys (attributes), and/or (2) multiple values per attribute
YES - versioning of objects
MAYBE (Bruce thought he saw that it was supported, but Dave found a comment to the contrary) - fine-grained authorization/security (would need to know how users and groups are represented)
Additional comments from Dave:
The JBoss DNA project didn't actually implement authorization, just authentication (with JAAS):
http://docs.jboss.org/jbossdna/0.2/manuals/reference/html/environment.html#authorization
Jackrabbit's JCR 2.0 implementation, by contrast, does appear to include a nice-looking node-level hierarchical authorization scheme. Their wiki discusses the features and pros/cons of the different styles of authZ:
http://wiki.apache.org/jackrabbit/AccessControl?highlight=%28authorization%29
and the API clearly has resource (node) based authorization actually implemented in it:
http://jackrabbit.apache.org/api/2.2/org/apache/jackrabbit/api/security/JackrabbitAccessControlList.html
Given that and their nice SVN/CVS-like versioning, the 2.0 spec implementation looks a lot more attractive than the 1.x implementation.
YES, by leveraging content nodes to represent this metadata - metadata for users and groups
YES, in version 2 (relatively new) - querying (paginated, sorted, and filtering results based on authorization); unstructured as well as structured queries
YES - support for an open-source database, e.g. MySQL
YES, via JAAS, for which Crowd has a plug-in - integrate with our authentication mechanism
YES - support concurrent access by users
Nice to have:
YES - automatic indexing for search
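To make the "node-level hierarchical authorization" idea from Dave's comments concrete, here is a minimal sketch of how a hierarchical scheme can resolve a read permission: look for an entry on the node itself, then on each ancestor, and fall back to deny. This is an illustration of the general technique only, not Jackrabbit's API or its actual evaluation rules; the class HierarchicalAclSketch, its methods, and the rule that an explicit deny wins at a given level are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of hierarchical access control; NOT the Jackrabbit API.
class HierarchicalAclSketch {

    // Absolute node path -> (principal -> true for allow, false for deny).
    private final Map<String, Map<String, Boolean>> entries = new HashMap<>();

    void set(String path, String principal, boolean allowed) {
        entries.computeIfAbsent(path, p -> new HashMap<>()).put(principal, allowed);
    }

    // Walks from the node up to the root. The nearest level with an entry for
    // any of the caller's principals decides; an explicit deny at that level wins.
    // Assumes absolute paths such as "/projects/a/data".
    boolean canRead(String path, Set<String> principals) {
        String current = path;
        while (true) {
            Map<String, Boolean> acl = entries.get(current);
            if (acl != null) {
                boolean matched = false;
                for (String principal : principals) {
                    Boolean allowed = acl.get(principal);
                    if (allowed != null) {
                        if (!allowed) {
                            return false; // explicit deny at this level
                        }
                        matched = true;
                    }
                }
                if (matched) {
                    return true;
                }
            }
            if (current.equals("/")) {
                return false; // no entry anywhere on the path: default deny
            }
            int slash = current.lastIndexOf('/');
            current = (slash <= 0) ? "/" : current.substring(0, slash);
        }
    }

    public static void main(String[] args) {
        HierarchicalAclSketch acl = new HierarchicalAclSketch();
        acl.set("/projects", "staff", true);
        acl.set("/projects/secret", "staff", false);
        System.out.println(acl.canRead("/projects/a/data", Set.of("staff")));
        System.out.println(acl.canRead("/projects/secret/x", Set.of("staff")));
    }
}
```

The appeal of this style is that a single entry high in the tree covers everything beneath it, and exceptions are expressed by adding an entry closer to the node.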
Open questions:
- how widely is JCR/Jackrabbit adopted? How widely is the new version 2.0 adopted?
- what competing CR products are there?
- Can you specify a finite list of values (i.e. an "enum") for an attribute, or more generally integrate validation with annotation population?
- are the attributes sufficiently flexible?
- what's a "relative path", a "reference", a "PATH" property type?
- what's an "unstructured node"?
- how does JAAS link to Crowd?
- does JCR support user groups?
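As a reference point for the "enum" question above, the behavior being asked for can be sketched as follows: attributes hold multiple values, and an attribute may optionally be restricted to a finite set of allowed values that is checked when a value is added. This does not answer whether JCR supports it; the class ConstrainedAnnotationsSketch and its methods are hypothetical and exist only to pin down what we would test for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of validated, multi-valued attributes; not a JCR API.
class ConstrainedAnnotationsSketch {

    // Attribute name -> permitted values. Attributes without an entry are unrestricted.
    private final Map<String, Set<String>> allowedValues = new HashMap<>();

    // Attribute name -> values, in insertion order (multiple values per attribute).
    private final Map<String, List<String>> values = new HashMap<>();

    void restrict(String attribute, Set<String> allowed) {
        allowedValues.put(attribute, allowed);
    }

    // Validation is integrated with population: a bad value is rejected on add.
    void add(String attribute, String value) {
        Set<String> allowed = allowedValues.get(attribute);
        if (allowed != null && !allowed.contains(value)) {
            throw new IllegalArgumentException(
                    "'" + value + "' is not an allowed value for '" + attribute + "'");
        }
        values.computeIfAbsent(attribute, a -> new ArrayList<>()).add(value);
    }

    List<String> get(String attribute) {
        return List.copyOf(values.getOrDefault(attribute, List.of()));
    }
}
```

A feasibility test against a candidate CR could then check the same three cases: an allowed value is stored, a disallowed value is rejected, and an unrestricted attribute accepts anything.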
Notes:
- There is a Spring template for JCR
Feasibility tasks:
- stress test querying
- exercise SQL-like querying
- integrate authentication and authorization with Crowd via JAAS
Caveats about Jackrabbit/JCR:
- JCR v.1 has problems with joins and with query performance.
- JCR has no relational DB client.
- JCR v.1 didn't have versioning or auth.
- Dave B found issues during querying related to 'annotation types'.
Other products:
Content Repositories:
CouchDB (might have RDBMS client)
http://code.google.com/p/couchdb-python/
(Enterprise) Content Management Systems:
ECM systems seem to be more complete systems (with a web UI) for managing documents online. (Think of Google's Blogspot.) So this line of exploration may be a dead end.
- OpenCMS: an open source content management system written in Java, described as "enterprise content management". http://demo.opencms.org/en/
- Nuxeo: a comprehensive free software/open source Enterprise Content Management (ECM) platform. http://en.wikipedia.org/wiki/Enterprise_content_management
- dotCMS: a free software/open source web content management system (wCMS).
- Alfresco: an open source enterprise content management system.
- Magnolia: an open source content management system (CMS) developed by Magnolia International Ltd., based in Basel, Switzerland. Uses Jackrabbit under the hood.
- Hippo CMS: "...for multi-channel distribution like web sites and intranets." Uses Jackrabbit under the hood.
- Apache Lenya: a Java/XML open source content management system based on the Apache Cocoon content management framework. Features include revision control, scheduling, search capabilities, workflow support, and browser-based WYSIWYG editors.