Can an off-the-shelf content repository (CR) provide the functionality needed for the back end of Synapse?
Such a CR would require the following features:
- CRUD on objects
- create universal IDs for objects
- custom metadata on objects, with (1) composite keys (attributes) and/or (2) multiple values per attribute
- versioning of objects
- fine-grained authorization/security (would need to know how users and groups are represented)
- metadata for users and groups
- querying (paginated, sorted, and filtering results based on authorization); unstructured as well as structured queries
- support for an open-source database, e.g. MySQL
- integrate with our authentication mechanism
- support concurrent access by users
Nice to have:
- automatic indexing for search
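To make a few of the requirements concrete (CRUD, universal IDs, multiple values per attribute, versioning), here is a minimal in-memory sketch. It is not the JCR API or any product's API: the class `InMemoryRepository`, its method names, and the choice of a random UUID as the universal ID are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory stand-in for a content repository; not a JCR API. */
class InMemoryRepository {

    /** One immutable snapshot of an object's metadata. */
    static final class Version {
        final int number;
        // Each attribute maps to a list, so an attribute may carry multiple values.
        final Map<String, List<String>> attributes;

        Version(int number, Map<String, List<String>> attributes) {
            this.number = number;
            // Deep copy so later edits by the caller cannot alter a stored version.
            Map<String, List<String>> copy = new HashMap<>();
            for (Map.Entry<String, List<String>> e : attributes.entrySet()) {
                copy.put(e.getKey(),
                        Collections.unmodifiableList(new ArrayList<>(e.getValue())));
            }
            this.attributes = Collections.unmodifiableMap(copy);
        }
    }

    // Object ID -> full version history, oldest first.
    private final Map<String, List<Version>> store = new HashMap<>();

    /** Create: assigns a universal ID (assumed here to be a random UUID) and stores version 1. */
    String create(Map<String, List<String>> attributes) {
        String id = UUID.randomUUID().toString();
        List<Version> history = new ArrayList<>();
        history.add(new Version(1, attributes));
        store.put(id, history);
        return id;
    }

    /** Read: returns the latest version of the object. */
    Version read(String id) {
        List<Version> history = historyOf(id);
        return history.get(history.size() - 1);
    }

    /** Read a specific version; version numbers start at 1. */
    Version readVersion(String id, int number) {
        List<Version> history = historyOf(id);
        if (number < 1 || number > history.size()) {
            throw new IllegalArgumentException("No version " + number + " for " + id);
        }
        return history.get(number - 1);
    }

    /** Update: appends a new version instead of overwriting; returns the new version number. */
    int update(String id, Map<String, List<String>> attributes) {
        List<Version> history = historyOf(id);
        int next = history.size() + 1;
        history.add(new Version(next, attributes));
        return next;
    }

    /** Delete: removes the object and its history; returns false if it did not exist. */
    boolean delete(String id) {
        return store.remove(id) != null;
    }

    private List<Version> historyOf(String id) {
        List<Version> history = store.get(id);
        if (history == null) {
            throw new IllegalArgumentException("No such object: " + id);
        }
        return history;
    }
}
```

Authorization, querying, persistence, and concurrent access are deliberately left out of this sketch; those are the areas where an off-the-shelf CR would have to earn its keep.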
Evaluation of Jackrabbit/Java Content Repository:
Answers to requirements above:
YES - CRUD on objects
YES, for "referenceable" nodes - create universal IDs for objects
YES, multiple values are allowed - custom metadata on objects, with (1) composite keys (attributes) and/or (2) multiple values per attribute
YES - versioning of objects
MAYBE (Bruce thought he saw that it was supported, but Dave found a comment to the contrary) - fine-grained authorization/security (would need to know how users and groups are represented)
YES, by leveraging content nodes to represent this metadata - metadata for users and groups
YES, in version 2 (relatively new) - querying (paginated, sorted, and filtering results based on authorization); unstructured as well as structured queries
YES - support open-source database, e.g. MySQL
YES, via JAAS, for which Crowd has a plug-in - integrate with our authentication mechanism
YES - support concurrent access by users
Nice to have:
YES - automatic indexing for search
Open questions:
- how widely is JCR/Jackrabbit adopted? How widely is the new version 2.0 adopted?
- what competing CR products are there?
- Can you specify a finite list of values (i.e. an "enum") for an attribute, or more generally integrate validation with annotation population?
- are the attributes sufficiently flexible?
- what's a "relative path", a "reference", a "PATH" property type?
- what's an "unstructured node"?
- how does JAAS link to Crowd?
- does JCR support user groups?
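To pin down what the "enum" question above is asking for, here is a minimal sketch of validating annotation values against a finite list at population time. It says nothing about whether or how JCR supports this; the class `AttributeValidator`, its methods, and the example attribute name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical validator; attribute names and allowed values are examples only. */
class AttributeValidator {

    // Attribute name -> finite set of permitted values.
    // Attributes absent from this map are treated as unconstrained.
    private final Map<String, Set<String>> allowedValues = new HashMap<>();

    /** Declares the finite list of values (the "enum") permitted for an attribute. */
    void restrict(String attribute, String... values) {
        allowedValues.put(attribute, new LinkedHashSet<>(Arrays.asList(values)));
    }

    /** Returns the offending values; an empty list means the values are acceptable. */
    List<String> violations(String attribute, List<String> values) {
        List<String> bad = new ArrayList<>();
        Set<String> allowed = allowedValues.get(attribute);
        if (allowed == null) {
            return bad; // no constraint registered for this attribute
        }
        for (String value : values) {
            if (!allowed.contains(value)) {
                bad.add(value);
            }
        }
        return bad;
    }
}
```

A repository that integrates validation with annotation population would run a check like `violations` before accepting a create or update, and reject the write if the result is non-empty.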
Notes:
- There is a Spring template for JCR
Feasibility tasks:
- stress test querying
- exercise SQL-like querying
- integrate authentication and authorization with Crowd via JAAS
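When exercising querying, it helps to fix the expected semantics up front: results are filtered by authorization first, then sorted, then paginated, so that hidden objects never consume slots on a page. The sketch below states that behavior over an in-memory list. It is a reference model to compare a candidate CR against, not JCR query code; `PagedQuery`, `Item`, `Page`, and the group-based read check are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/** Hypothetical query helper; names are illustrative, not part of JCR. */
class PagedQuery {

    /** Minimal object: a name to sort on and the groups allowed to read it. */
    static final class Item {
        final String name;
        final Set<String> readerGroups;

        Item(String name, Set<String> readerGroups) {
            this.name = name;
            this.readerGroups = readerGroups;
        }
    }

    /** One page of results plus the total number of objects visible to the user. */
    static final class Page {
        final List<Item> results;
        final int totalVisible;

        Page(List<Item> results, int totalVisible) {
            this.results = results;
            this.totalVisible = totalVisible;
        }
    }

    static Page run(List<Item> all, Set<String> userGroups, int offset, int limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be non-negative");
        }
        // 1. Authorization filter: keep items readable by at least one of the user's groups.
        List<Item> visible = new ArrayList<>();
        for (Item item : all) {
            if (!Collections.disjoint(item.readerGroups, userGroups)) {
                visible.add(item);
            }
        }
        // 2. Sort the visible items by name.
        visible.sort(Comparator.comparing((Item i) -> i.name));
        // 3. Paginate; clamp the window so out-of-range offsets yield an empty page.
        int from = Math.min(offset, visible.size());
        int to = (int) Math.min((long) from + limit, visible.size());
        return new Page(new ArrayList<>(visible.subList(from, to)), visible.size());
    }
}
```

A stress test could generate a large set of objects with mixed permissions, run the same query through the candidate CR, and compare both the page contents and the timing against this model.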
Caveats about Jackrabbit/JCR:
JCR v.1 has problems with joins and with query performance.
JCR has no relational DB client.
JCR v.1 didn't have versioning or auth.
Dave B found issues during querying related to 'annotation types'
Other products:
Content Repositories:
CouchDB (might have RDBMS client)
http://code.google.com/p/couchdb-python/
(Enterprise) Content Management Systems:
ECMs seem to be more complete systems (with a web UI) for managing documents online (think of Google Blogspot), so this line of exploration may be a dead end.
OpenCMS: an open source content management system written in Java ("enterprise content management"). http://demo.opencms.org/en/
Nuxeo: a comprehensive free software/open source Enterprise Content Management (ECM) platform. http://en.wikipedia.org/wiki/Enterprise_content_management
dotCMS: a free software/open source web content management system (wCMS).
Alfresco: an open source enterprise content management system.
Magnolia: an open source content management system (CMS) developed by Magnolia International Ltd., based in Basel, Switzerland. Uses Jackrabbit under the hood.
Hippo CMS: ...for multi-channel distribution like web sites and intranets. Uses Jackrabbit under the hood.
Apache Lenya: a Java/XML open source content management system based on the Apache Cocoon content management framework. Features include revision control, scheduling, search capabilities, workflow support, and browser-based WYSIWYG editors.