Goal:
Provide authentication / authorization for users of the Sage Platform.
Authentication: Verify a user's identity.
Authorization: Allow a user, or an application invoked by a user, to access data in the Platform.
The Platform comprises:
- Addama registry (running on Google App Engine, or "GAE")
- Addama feed service (on GAE)
- Addama file repository service (running on Amazon Elastic Compute Cloud, or "EC2")
- Addama Java Content Repository (JCR) service (on EC2)
- UI html files (on EC2)
- a Google Group
- shared Google Docs
- file repository, hosted at Sage, accessible via Secure CoPy (SCP)
Requirements and Design Constraints:
- Single sign-on. Once a user has signed on to the platform, they don't have to sign in to any of the components separately.
- One-stop user administration: Adding or removing a user in one place will apply to all components.
- "Security at all layers": No component can be part of the platform unless it adheres to (one of) our authentication mechanism(s). (Note, we could implement several mechanisms "under the hood", if the systems we are integrating require it.)
- Platform will have "arm's length" integration with Google Apps and Google Groups. (I.e. the rest of the system (Addama, Sage file repository) must work if Google tools are omitted.)
- We want to have full control over the UI (hence a custom approach using GWT instead of Google Sites) but link to the relevant Google Docs and Google Groups, and use the Google Docs UI and Google Groups UI when folks are interacting with those resources.
Analysis
There are just four components that need to perform authentication. (The others delegate authentication to the registry.) They are listed here, along with authentication options:
Addama registry GAE application (Google account, Google Apps account, OpenID federated authentication)
Google Apps (Google Apps account, SAML delegated authentication)
Google Group (Google account, Google Apps account)
Sage SSH server (standard unix login)
If the SSH server were eliminated (by migrating the hosted files to an Addama repository service) then a common denominator *might* be Google Apps account authentication, which in turn might be delegated to an external identity provider.
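The "common denominator" reasoning above can be checked mechanically by intersecting the option sets. This is only a sketch of the argument: the option labels are shortened versions of the ones in the list above, and the function name is made up for illustration.

```python
# Authentication options per component, copied (with shortened labels)
# from the list in the Analysis section.
AUTH_OPTIONS = {
    "Addama registry GAE application": {"Google account", "Google Apps account", "OpenID"},
    "Google Apps": {"Google Apps account", "SAML"},
    "Google Group": {"Google account", "Google Apps account"},
    "Sage SSH server": {"unix login"},
}


def common_auth_options(components):
    """Return the authentication options shared by every named component."""
    option_sets = [AUTH_OPTIONS[name] for name in components]
    return set.intersection(*option_sets) if option_sets else set()


all_components = list(AUTH_OPTIONS)
without_ssh = [name for name in all_components if name != "Sage SSH server"]

print(common_auth_options(all_components))  # nothing in common while SSH is included
print(common_auth_options(without_ssh))     # what remains once SSH is eliminated
```

With the SSH server included there is no shared option; without it, the only shared option is the Google Apps account.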
Design
This is a possible design approach, contingent upon the answers to certain open questions (listed below):
- Restrict the Addama registry application to be hosted on a proprietary domain, configured to authenticate via "Google Apps for your domain".
- Configure Google Apps on our domain to delegate authentication via SAML.
- Configure Google Groups on our domain to delegate authentication via SAML.
- Migrate local file repository to Addama service.
- Employ Atlassian Crowd as the administration console for user authentication.
Open Questions
- Are Atlassian's Crowd pricing, license models, and hosting options acceptable for our purposes? Do they prohibit integrating with NextBio?
(Note: Atlassian doesn't host Crowd, rather we download and host it ourselves. It's an Apache Tomcat application, with a variety of choices for databases.)
- What other SAML or OpenID identity provider (IdP) tools (providing UIs and/or aggregating other IdPs) are there?
- Can Google Apps and Google Groups use OpenID (instead of SAML) for authentication?
- Do we want to use Google Apps to see content we host elsewhere, or will Google Apps be the only place that docs are stored in this 'sprint'?
- Can "Google Group" membership be managed by an external authentication mechanism? (If not, then the Google Provisioning API can create accounts for users in our domain. A backup alternative might be to use GMail + a group alias rather than Google Groups for threaded discussions.)
- If we are doing "arm's length" integration with Google Apps, then what other providers should we plan for?
- Do we need 'audit logs', e.g. to show when users were added/removed and by whom?
Experiment to address key questions
1. Authenticate Google Apps, Groups using SAML and Crowd
- Set up Crowd trial edition (on local box or AWS)
- Change Google Apps demo domain to authenticate against Crowd
- Add user to Crowd
- Try to access Google Apps via this user (e.g. make a document)
- Try to access Google Groups via this user
2. Authenticate GAE app using SAML and Crowd
- Change/deploy GAE app, authenticating via Google Apps
- Try to log in to the GAE app via this user
(If not, can the GAE OpenID option work with Crowd, or can we bypass UserService and use some sort of OpenID connector to reach Crowd?)
3. Authorize using SAML, Crowd
- Define a group in Crowd
- Add a user to a group in Crowd
- Add a user to a group in Google Apps
- See if access to services can be selected based on such group membership.
4. Replace Crowd with Open Source Identity Provider
Repeat 1-3 above.
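For experiment 3, the question is whether access to a service can be decided from group membership. A minimal sketch of that check follows. The group names, user names, and service names are all hypothetical, and nothing here reflects how Crowd or Google Apps actually expose groups.

```python
# Hypothetical groups, users and services, invented for illustration only.
GROUP_MEMBERS = {
    "analysts": {"alice", "bob"},
    "admins": {"carol"},
}

# Which groups are allowed to use which service (also hypothetical).
SERVICE_GROUPS = {
    "feed-service": {"analysts", "admins"},
    "user-admin": {"admins"},
}


def groups_of(user):
    """Return the set of groups the user belongs to."""
    return {group for group, members in GROUP_MEMBERS.items() if user in members}


def may_access(user, service):
    """Allow access if the user is in at least one group permitted for the service."""
    return bool(groups_of(user) & SERVICE_GROUPS.get(service, set()))


print(may_access("alice", "feed-service"))
print(may_access("alice", "user-admin"))
```

The experiment would succeed if a membership change made in one place (Crowd) changes the outcome of a check like this in every component.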
Experiment execution
Set Up Crowd
- Added the entry "140.107.149.214 deflaux" to C:/windows/system32/drivers/etc/hosts on my laptop. I can now PuTTY/SSH into 'deflaux', which is a Linux box in Nicole's office.
- Followed http://confluence.atlassian.com/display/CROWD/Installing+Crowd+and+CrowdID
- downloaded zip file
- downloaded and installed WinSCP; connected to deflaux:22 using SCP protocol.
- unzipped zip file and copied contents to /usr/local/tomcat on deflaux
- per the instructions, created the directory /var/crowd-home and edited .../crowd-webapp/WEB-INF/classes/crowd-init.properties accordingly.
- ran sudo ./start_crowd.sh
but got
"The BASEDIR environment variable is not defined correctly. This environment variable is needed to run this program"
- Googled around for a solution. Found
- Did a bunch of trial-and-error and ended up trying
chmod -R 777 *
which seemed to do the trick: Instead of getting an error message I got:
Using CATALINA_BASE: /usr/local/tomcat/apache-tomcat
Using CATALINA_HOME: /usr/local/tomcat/apache-tomcat
Using CATALINA_TMPDIR: /usr/local/tomcat/apache-tomcat/temp
Using JRE_HOME: /usr
Using CLASSPATH: /usr/local/tomcat/apache-tomcat/bin/bootstrap.jar
- Instructions say to go to http://localhost:8095/crowd. I tried going to http://deflaux:8095/crowd and http://140.107.149.214:8095/crowd, but neither worked. Used 'sudo ps' and 'sudo lsof -i :8095' to show that the server is indeed running.
- Nicole poked a hole in the firewall on the box. Now I can go to http://deflaux:8095/ and see the web page. Woo hoo.
- Per the instructions at
I created the file /usr/local/tomcat/crowd.init.d, then from /etc/init.d
sudo ln /usr/local/tomcat/crowd.init.d crowd
Ran the web-based setup wizard: Got a 30-day license key and chose to use the 'embedded' database.
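The situation hit during set-up (server running according to 'ps' and 'lsof', but the page not loading remotely until the firewall was opened) can be checked from the client side with a plain TCP connection test. This is a small sketch using only the Python standard library; the function name is made up, and the demonstration runs against a throwaway local listener rather than the real box.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Self-contained demonstration against a temporary local listener.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

reachable_while_listening = can_connect("127.0.0.1", port)
listener.close()
reachable_after_close = can_connect("127.0.0.1", port)

print(reachable_while_listening, reachable_after_close)
```

From the laptop, a call such as can_connect("deflaux", 8095) returning False while 'lsof' on the box shows the server listening would point at the firewall or network rather than at Crowd itself.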
Change Google Apps demo domain to authenticate against Crowd
Following:
http://confluence.atlassian.com/display/CROWD/Configuring+the+Google+Apps+Connector
Note, this says "you will need the Premier, Education, or Partners edition of Google Apps." so I may not be able to use 42stories.com. I'll see how far I get before I'm stuck.
- Since we're using JDK 1.6, I followed the instructions which said to put the following two jars in <Crowd-Install>/crowd-webapp/WEB-INF/lib (<Crowd-Install>=/usr/local/tomcat)
1-) xml-security-1.4.2.jar
2-) commons-logging-1.1.1.jar
- At step 1.5 "...select one or more user directories..." I picked the single one listed, "Evaluation." I believe this is the default user directory I set up during installation.
Set "Allow all to authenticate" -> True.
Under "Permissions" I allowed Google Apps to add/modify/remove groups and users, but I'm not sure if Google Apps can actually do this. (Perhaps it can, through the Provisioning API!)
Step 2: Generate new keys. Afterwards the Configuration tab displayed:
Sign-in Page URL: http://deflaux:8095/crowd/console/plugin/secure/saml/samlauth.action
Sign-out Page URL: http://deflaux:8095/crowd/console/logoff.action
Change Password URL: http://deflaux:8095/crowd/console/user/viewchangepassword.action
DSA Key-pair Location: /var/crowd-home/plugin-data/crowd-saml-plugin
- Step 3. Configuring Google Apps to Recognise Crowd
Went to the 42stories Google Apps console: https://www.google.com/a/cpanel/42stories.com/Dashboard
There is no 'single sign-on (SSO)' link.
Switched to sagebionetworks.com, which DOES have a Premier edition of Google Apps. Followed Atlassian instructions to set up SSO.
Note: To Disable: Go to https://www.google.com/a/cpanel/sagebionetworks.com/SetupSSO, uncheck "Enable Single Sign-on", then Save Changes.
Step 4, trying it out:
I created a user called 'ssotest' with the password 'ssotest'. Performed the 'Authentication Test' which was successful.
Now for a true test, connecting to Google Apps on sagebionetworks.com using 'ssotest':
Went to http://sites.google.com/a/sagebionetworks.com
Clicked on 'sign in to Sage Bionetworks'
Entered ssotest / ssotest
Got "Google Apps - Invalid Email" error
I *can* log in as bruce.hoff. This is because sagebionetworks.com already has a bruce.hoff account.
Added 'mike.kellen', pw: ssotest to Crowd
It works!
Added 'nicole.deflaux', pw: drizzle to Crowd
It works!
Conclusion: Google Apps delegates password management, but not user management! A user must already exist in the Google Apps domain before their Crowd credentials will work.
Went to groups.google.com/a/sagebionetworks.com
It works! I.e. Google Groups delegates authentication too.
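The outcomes observed above can be summarized in a small model. This is a sketch of the observed behavior only, not of the SAML exchange itself: the two dictionaries are hypothetical stand-ins for the Crowd directory and the Google Apps domain, and the order of the two checks is an assumption.

```python
# Hypothetical stand-ins for the two user stores; these are not real API calls.
CROWD_PASSWORDS = {"ssotest": "ssotest", "mike.kellen": "ssotest"}
GOOGLE_APPS_USERS = {"bruce.hoff", "mike.kellen", "nicole.deflaux"}


def sso_login(username, password):
    """Model the observed result of signing in to Google Apps via Crowd."""
    # Password checking is delegated to Crowd.
    if CROWD_PASSWORDS.get(username) != password:
        return "Crowd rejects credentials"
    # The account itself must still exist on the Google Apps side.
    if username not in GOOGLE_APPS_USERS:
        return "Google Apps - Invalid Email"
    return "signed in"


print(sso_login("ssotest", "ssotest"))
print(sso_login("mike.kellen", "ssotest"))
```

In this model 'ssotest' authenticates against Crowd but is turned away by Google Apps, while 'mike.kellen' gets in because the account exists in both places.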
Big open question: If Crowd aggregates two directories, both having a user called john_smith, then whose credentials are used to log in to Google Apps?
Tried running Nicole's demo. Result: I was prompted for regular (non-Crowd) credentials. So this demo doesn't automatically delegate authentication the way Google Apps does.
This might be due to how the application was deployed. The application is associated with the sagebase.org domain, i.e. it is visible at:
https://appengine.google.com/a/sagebase.org
and the authentication choice is "Google Accounts API: The Google Accounts API includes all Gmail Accounts, but does not include accounts on Google Apps domains."
Info on how to deploy to a domain is here:
http://code.google.com/appengine/articles/auth.html
Create a Google App Engine application using Google Apps accounts to log-in
Installed GAE plug-in for Eclipse. It includes SDK v. 1.3.8.
Notes:
Q: What's the cumulative file size on the Sage SSH server?
A: About 2GB, considering the files in the directory /data/incoming on sage.fhcrc.org
Google Apps provides two APIs to help with authentication:
1. SAML Single Sign-On (SSO) Service: would allow *us* to create and maintain users and groups outside of Google.
http://code.google.com/googleapps/domain/sso/saml_reference_implementation.html
2. Google Apps Provisioning API: would allow us to programmatically create Google users and groups in our private domain. This would streamline adding users to Google Apps. If we used it as a total solution, then the non-Google applications (e.g. Addama) would have to go to Google for authentication, which violates the "arm's length" integration requirement.
3. OpenID sounds like an alternative to SAML:
http://www.google.com/support/forum/p/apps-apis/thread?tid=33a3707bd2ea7904&hl=en
In the case of OpenID, the user may have a Google Account, a Google Apps Account, or an account from any other domain that provides OpenID federated login.
Integration of GAE with OpenID:
http://code.google.com/appengine/docs/java/users/overview.html
4. At times like this, faced with a moral dilemma, I ask myself, "What Would Atlassian Do" (WWAD)?
4.1 Seraph is a very simple, pluggable J2EE web application security framework developed by Atlassian and used in its products.
http://confluence.atlassian.com/display/DEV/Single+Sign-on+Integration+with+JIRA+and+Confluence
4.2 Crowd is a single sign-on (SSO) application for as many users, web applications
and directory servers as you need, all through a single web interface.
http://www.atlassian.com/software/crowd/
Crowd centralises identity management, allowing you to take users from different directories
and manage them in one place. Multiple user directories can be centrally managed via Crowd's
administration console.
Crowd's OpenID authentication server, CrowdID, talks with websites and applications using
OpenID. It expands Crowd's SSO capabilities to applications outside your organisation's firewall.
http://confluence.atlassian.com/display/CROWD/Configuring+the+Google+Apps+Connector
To enable single sign-on in Google Apps, you will need the Premier, Education, or Partners edition of Google Apps.
The Crowd Google Apps connector does not support the automatic adding of users. If a user exists
in Crowd but not in Google Apps, then the user will not be able to log in to Google Apps.
To add an application (e.g. a GAE app like Addama registry):
http://confluence.atlassian.com/display/CROWDDEV/Application+Integration+Overview
Licensing and hosting Crowd:
- Crowd is not hosted by Atlassian. We have to run it ourselves. It runs on Windows, Linux or Mac and uses an Apache Tomcat app server:
http://confluence.atlassian.com/display/CROWD/Installing+Crowd+and+CrowdID
- Pricing: This is a little confusing, but it seems to say that it's $10 for up to 10 users, then $600/$1200 (academic/commercial) for up to 100 users
http://www.atlassian.com/software/crowd/pricing.jsp
Open source alternatives to Crowd:
http://code.google.com/googleapps/domain/open_source_projects.html#sso
- Addama authentication is via Servlet filters using GAE User Service OR a Google API-key.
- Addama services
- Sage SSH/SCP server authenticates using standard unix log-in.
- Addama handles authentication via Servlet Filters; the servlet config xml file shows what's in place.
- Addama white list: "user x can get these services, or anything under the branch."
Nicole's "an area for testing" is a "google apps for your domain" domain
http://www.google.com/a/sagebionetworks.com is a "test domain for Google Apps"
What's the difference between a "google account" and a "google apps account"?
A: the latter is newer and ultimately should subsume the former.
Does Google Apps support OpenID?
A: Only as an "Identity provider" (of the Google Apps ID) not as a service provider seeking authentication.
http://code.google.com/googleapps/domain/sso/openid_reference_implementation.html
3 ways to authenticate GAE
- Google accounts
- Google Apps account (on proprietary domain associated with Google)
- OpenID
Ours is a Google Apps Premier (="business"?) account
Notes on Addama Registry Filters:
org.systemsbiology.addama.coresvcs.gae.filters.StaticContentFilter
I don't think this has anything to do with authentication, rather it's a cache for static content.
Note: You can't even get this far without being authenticated.
Note: The white list (below) *authorizes*, and doesn't apply to static content.
org.systemsbiology.addama.coresvcs.gae.filters.UserServiceFilter
If logged-in Google Acct OR valid API Key, then allow, else deny.
org.systemsbiology.addama.coresvcs.gae.filters.WhiteListFilter
If the user is an Admin or is in a 'white list' for the requested resource, then allow, else deny.
org.systemsbiology.addama.coresvcs.gae.filters.DirectLinkFilter
Seems to handle a specific kind of request called a 'direct link' request.
(This MIGHT be a method for retrieving large files.)
org.systemsbiology.addama.coresvcs.gae.filters.AdminOnlyFilter
Filter out any requests NOT from an admin.
Applied only for addama/memcache/*
org.systemsbiology.addama.coresvcs.gae.filters.ProxiesFilter
Seems to forward certain requests (in particular, non-registry requests) to GAE's "URLFetchService".
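Taken together, the access-control filters above amount to a chain of allow/deny decisions. The sketch below models only the three filters that decide access. It is an illustration of the rules as described in these notes, not Addama's code: the filter order, the way an API key maps to a user, the example paths, and all names and data are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical data. How Addama really maps an API key to a user is not
# covered by the notes; here each key simply belongs to one user.
API_KEY_OWNERS = {"demo-key": "alice"}
WHITE_LIST = {"alice": ["/addama/repositories/demo"]}  # user -> allowed branches


@dataclass
class Request:
    path: str
    user: Optional[str] = None      # signed-in Google account, if any
    api_key: Optional[str] = None
    is_admin: bool = False


def effective_user(req):
    """The signed-in user, or else the owner of the API key (assumption)."""
    return req.user or API_KEY_OWNERS.get(req.api_key)


def user_service_filter(req):
    """Allow if a Google account is logged in OR the API key is valid."""
    return req.user is not None or req.api_key in API_KEY_OWNERS


def white_list_filter(req):
    """Allow admins, or users white-listed for the resource or a branch above it."""
    if req.is_admin:
        return True
    branches = WHITE_LIST.get(effective_user(req), [])
    return any(req.path == b or req.path.startswith(b + "/") for b in branches)


def admin_only_filter(req):
    """Only applies under addama/memcache/; all other paths pass through."""
    if req.path.startswith("/addama/memcache/"):
        return req.is_admin
    return True


# Assumed order; the real order is whatever the servlet config declares.
FILTER_CHAIN = [user_service_filter, white_list_filter, admin_only_filter]


def is_allowed(req):
    """A request is served only if every filter in the chain allows it."""
    return all(check(req) for check in FILTER_CHAIN)


print(is_allowed(Request("/addama/repositories/demo/file1", user="alice")))
print(is_allowed(Request("/addama/repositories/other", user="alice")))
```

The branch match in white_list_filter is one reading of "user x can get these services, or anything under the branch".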
- What does <security-constraint> in the GAE web.xml file mean?
A: from http://code.google.com/appengine/docs/java/users/overview.html
If you have pages that the user should not be able to access unless signed in, you can establish a security constraint for those pages in the deployment descriptor (the web.xml or app.yaml file). If a user accesses a URL with a security constraint and the user is not signed in, App Engine redirects the user to the sign-in page automatically (for Google Accounts or Google Apps authentication) or to the page at /_ah/login_required (for OpenID authentication), then directs the user back to the URL after signing in or registering successfully.
A security constraint can also require that the user be a registered administrator for the application. This makes it easy to build administrator-only sections of the site, without having to implement a separate authorization mechanism.
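The behavior quoted above can be modeled as a small decision function. This is a simplified sketch, not App Engine's implementation: the constraint list, the prefix matching, the sign-in page placeholder, and the "deny" outcome for a signed-in non-administrator are all assumptions made for illustration. Only the /_ah/login_required path comes from the quoted text.

```python
SIGN_IN_PAGE = "<Google sign-in page>"   # placeholder, not a real URL
OPENID_LOGIN_PATH = "/_ah/login_required"

# Hypothetical constraints: URL prefix -> required role.
# "*" means any signed-in user; "admin" means a registered administrator.
CONSTRAINTS = [
    ("/profile/", "*"),
    ("/admin/", "admin"),
]


def handle(path, signed_in, is_admin, auth_mode):
    """Return (action, redirect_target, return_to) for a request."""
    for prefix, role in CONSTRAINTS:
        if not path.startswith(prefix):
            continue
        if not signed_in:
            # Not signed in: send the user to sign in, remembering where to return.
            target = OPENID_LOGIN_PATH if auth_mode == "openid" else SIGN_IN_PAGE
            return ("redirect", target, path)
        if role == "admin" and not is_admin:
            # Assumed outcome; the quoted text does not say what happens here.
            return ("deny", None, None)
        return ("serve", None, None)
    # No constraint covers this path.
    return ("serve", None, None)


print(handle("/profile/me", False, False, "google"))
print(handle("/profile/me", False, False, "openid"))
print(handle("/admin/users", True, False, "google"))
```

The appeal for the registry is that admin-only areas would need no separate authorization mechanism, only a constraint in the deployment descriptor.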