
Authentication & Authorization



Goal:

Provide authentication / authorization for users of the Sage Platform. 

Authentication: Verify user's identity.
Authorization:  Allow a user, or an application invoked by a user, to access data in the Platform.

The Platform comprises:

- Addama registry (running on Google App Engine, or "GAE")

- Addama feed service (on GAE)

- Addama file repository service (running on Amazon Elastic Compute Cloud, or "EC2")

- Addama Java Content Repository (JCR) service (on EC2)

- UI html files (on EC2)

- a Google Group

- shared Google Docs

- file repository, hosted at Sage, accessible via Secure CoPy (SCP)

Requirements and Design Constraints:

- Single sign-on.  Once a user has signed on to the platform, they don't have to sign in to any of the components separately.

- One-stop user administration:  Adding or removing a user in one place will apply to all components.

- "Security at all layers":   No component can be part of the platform unless it adheres to (one of) our authentication mechanism(s).  (Note, we could implement several mechanisms "under the hood", if the systems we are integrating require it.)

- Platform will have 'arm's length' integration with Google Apps and Google Groups.  (I.e., the rest of the system (Addama, Sage file repository) must work if Google tools are omitted.)

- We want to have full control over the UI (hence a custom approach using GWT instead of Google Sites) but link to the relevant Google Docs and Google Groups, and use the Google Docs UI and Google Groups UI when folks are interacting with those resources.

Analysis

There are just four components that need to perform authentication. (The others delegate authentication to the registry.)  They are listed here, along with authentication options:

Addama registry GAE application (Google account, Google Apps account, OpenID federated authentication)

Google Apps (Google Apps account, SAML delegated authentication)

Google Group (Google account, Google Apps account)

Sage SSH server (standard unix login)

If the SSH server were eliminated (by migrating the hosted files to an Addama repository service) then a common denominator *might* be Google Apps account authentication, which in turn might be delegated to an external identity provider.

Design

This is a possible design approach, contingent upon the answers to certain open questions (listed below):

- Restrict the Addama registry application to be hosted on a proprietary domain, configured to authenticate via "Google Apps for your domain".

- Configure Google Apps on our domain to delegate authentication via SAML.

- Configure Google Groups on our domain to delegate authentication via SAML.

- Migrate local file repository to Addama service.

- Employ Atlassian Crowd as the administration console for user authentication.

Open Questions

- Are Atlassian's Crowd pricing, license models, and hosting options acceptable for our purposes?  Do they prohibit integrating with NextBio?
(Note: Atlassian doesn't host Crowd; rather, we download and host it ourselves. It's an Apache Tomcat application, with a variety of choices for databases.)

- What other SAML or OpenID identity provider (IdP) tools (providing UIs and/or aggregating other IdPs) are there?

- Can Google Apps and Google Groups use OpenID (instead of SAML) for authentication?

- Do we want to use Google Apps to see content we host elsewhere, or will Google Apps be the only place that docs are stored in this 'sprint'?

- Can "Google Group" membership be managed by an external authentication mechanism? (If not, then the Google Provisioning API can create accounts for them in our domain.  A back-up alternative might be to use GMail plus a group alias, rather than Google Groups, for threaded discussions.)

- If we are doing "arm's length" integration with Google Apps, then what other providers should we plan for?

- Do we need 'audit logs', e.g. to show when users were added/removed and by whom?

Experiment to address key questions

 1. Authenticate Google Apps, Groups using SAML and Crowd

- Set-up Crowd trial edition (on local box or AWS)
- Change Google Apps demo domain to authenticate against Crowd
- Add user to Crowd
- Try to access Google Apps via this user (e.g. make a document)
- Try to access Google Groups via this user

 2. Authenticate GAE app using SAML and Crowd

- Change/deploy GAE app, authenticating via Google Apps

- Try to log in to the GAE app via this user
 (If not, can the GAE OpenID option work with Crowd, or can we bypass UserService and use some sort of OpenID connector to reach Crowd?)

3. Authorize using SAML, Crowd

- Define a group in Crowd

- Add a user to a group in Crowd

- Add a user to a group in Google Apps
- See if access to services can be selected based on such group membership.

4. Replace Crowd with Open Source Identity Provider

Repeat 1-3 above.

Experiment execution

 Set Up Crowd

- Added the entry "140.107.149.214 deflaux" to C:/windows/system32/drivers/etc/hosts on my laptop.  I can now PuTTY/SSH into 'deflaux', which is a Linux box in Nicole's office.

- Followed http://confluence.atlassian.com/display/CROWD/Installing+Crowd+and+CrowdID

- downloaded zip file

- downloaded and installed WinSCP; connected to deflaux:22 using SCP protocol.

- unzipped zip file and copied contents to /usr/local/tomcat on deflaux

- per the instructions, created the directory /var/crowd-home and edited .../crowd-webapp/WEB-INF/classes/crowd-init.properties accordingly.
- ran sudo ./start_crowd.sh

but got

"The BASEDIR environment variable is not defined correctly. This environment variable is needed to run this program"

- Googled around for a solution.  Found

http://codingexplorer.wordpress.com/2009/01/12/the-basedir-environment-variable-is-not-defined-correctly-this-environment-variable-is-needed-to-run-this-program/

- Did a bunch of trial-and-error and ended up trying

chmod -R 777 *

which seemed to do the trick:  Instead of getting an error message I got:

Using CATALINA_BASE:   /usr/local/tomcat/apache-tomcat
Using CATALINA_HOME:   /usr/local/tomcat/apache-tomcat
Using CATALINA_TMPDIR: /usr/local/tomcat/apache-tomcat/temp
Using JRE_HOME:        /usr
Using CLASSPATH:       /usr/local/tomcat/apache-tomcat/bin/bootstrap.jar
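
A likely explanation, offered as an assumption rather than something verified on this box: Tomcat's setclasspath.sh of that era prints the BASEDIR message when the scripts under bin/ are not executable, and execute bits are easily lost when a zip is unpacked on Windows and copied over with SCP. If that is the cause, a narrower fix than chmod -R 777 (which also makes every file world-writable) would be to set the execute bit on the launch scripts only. A minimal sketch; the function name is hypothetical:

```shell
# Hypothetical helper: make only the *.sh launch scripts executable,
# leaving the permissions of every other file untouched.
make_scripts_executable() {
    install_dir="$1"   # e.g. /usr/local/tomcat (the install location used above)
    # u+x,g+x: owner and group may execute; nothing becomes world-writable.
    find "$install_dir" -type f -name '*.sh' -exec chmod u+x,g+x {} +
}

# Usage on the server (may need sudo, depending on who owns the files):
#   make_scripts_executable /usr/local/tomcat
```

This keeps jars, properties files and data directories at their original permissions, which matters if Crowd is later run as an unprivileged system user.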

- Instructions say to go to http://localhost:8095/crowd.  I tried going to http://deflaux:8095/crowd and http://140.107.149.214:8095/crowd, but neither worked.  Used 'sudo ps' and 'sudo lsof -i :8095' to show that the server is indeed running.

- Nicole poked a hole in the firewall on the box.  Now I can go to http://deflaux:8095/ and see the web page.  Woo hoo.
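
The "server is running but I can't reach it" situation above can be narrowed down by probing the port from both ends. A minimal sketch, assuming curl is installed on both machines; the function name is hypothetical:

```shell
# Hypothetical helper: succeed if an HTTP server answers on host:port.
# Any HTTP response (even a 404 or a redirect) counts as reachable;
# a refused or timed-out connection makes curl exit non-zero.
http_reachable() {
    host="$1"
    port="$2"
    curl --silent --output /dev/null --max-time 5 "http://$host:$port/"
}

# Usage:
#   on the server itself:  http_reachable localhost 8095 && echo "Crowd is up"
#   from the laptop:       http_reachable deflaux 8095 || echo "blocked or down"
```

If the check succeeds on the server but fails from the laptop, the application is up and something between the two machines (here, the host firewall) is blocking the port.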

- Per the instructions at

http://confluence.atlassian.com/display/CROWD/Setting+Crowd+to+Run+Automatically+and+Use+an+Unprivileged+System+User+on+UNIX

I created the file /usr/local/tomcat/crowd.init.d, then from /etc/init.d

sudo ln /usr/local/tomcat/crowd.init.d crowd
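
The command above creates a hard link. A symbolic link is a common alternative that also works when the two directories are on different filesystems. A minimal sketch; the function name is hypothetical, and the boot-registration commands in the comments depend on the distribution, which these notes do not record:

```shell
# Hypothetical helper: expose an init script under an init directory
# through a symbolic link.
link_init_script() {
    script="$1"     # e.g. /usr/local/tomcat/crowd.init.d
    init_dir="$2"   # e.g. /etc/init.d
    name="$3"       # e.g. crowd
    # Init scripts must be executable before the system can run them.
    chmod u+x "$script" && ln -s "$script" "$init_dir/$name"
}

# Usage on the server (needs root):
#   link_init_script /usr/local/tomcat/crowd.init.d /etc/init.d crowd
# Registering the script to run at boot is distribution-specific (assumption):
#   update-rc.d crowd defaults     # Debian/Ubuntu
#   chkconfig --add crowd          # Red Hat/CentOS
```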

Ran the web-based set up wizard:  Got a 30 day license key and chose to use the 'embedded' database.

Change Google Apps demo domain to authenticate against Crowd

Following:

http://confluence.atlassian.com/display/CROWD/Configuring+the+Google+Apps+Connector

Note, this says "you will need the Premier, Education, or Partners edition of Google Apps." so I may not be able to use 42stories.com.  I'll see how far I get before I'm stuck.

- Since we're using JDK 1.6, I followed the instructions which said to put the following two jars in <Crowd-Install>/crowd-webapp/WEB-INF/lib (<Crowd-Install>=/usr/local/tomcat)

1-) xml-security-1.4.2.jar
2-) commons-logging-1.1.1.jar
 

- At step 1.5 "...select one or more user directories..." I picked the single one listed, "Evaluation."  I believe this is the default user directory I set up during installation.

Set "Allow all to authenticate" -> True.

Under "Permissions" I allowed Google Apps to add/modify/remove groups and users, though I'm not sure whether Google Apps can actually do this.  (Perhaps it can, through the Provisioning API!)

Step 2: Generate new keys.  Afterwards the Configuration tab displayed:

       Sign-in Page URL:
       http://deflaux:8095/crowd/console/plugin/secure/saml/samlauth.action
       Sign-out Page URL:
       http://deflaux:8095/crowd/console/logoff.action
       Change Password URL:
       http://deflaux:8095/crowd/console/user/viewchangepassword.action
       DSA Key-pair Location:
       /var/crowd-home/plugin-data/crowd-saml-plugin
 
- Step 3. Configuring Google Apps to Recognise Crowd
Went to the 42stories Google Apps console: https://www.google.com/a/cpanel/42stories.com/Dashboard
There is no 'single sign-on (SSO)' link.

 Switched to sagebionetworks.com, which DOES have a premier version of Google Apps. Followed Atlassian instructions to set up SSO. 

Note: To disable SSO, go to https://www.google.com/a/cpanel/sagebionetworks.com/SetupSSO, uncheck "Enable Single Sign-on", then click "Save changes".

Step 4, trying it out:

I created a user called 'ssotest', with a password identical to the user name.  Performed the 'Authentication Test', which was successful.

Now for a true test, connecting to Google Apps on sagebionetworks.com using 'ssotest':

Went to http://sites.google.com/a/sagebionetworks.com
Click on 'sign in to Sage Bionetworks'
Entered ssotest / ssotest
got "Google Apps - Invalid Email" error

I *can* log in as bruce.hoff.  This is because
sagebionetworks.com already has a bruce.hoff account.

Added 'mike.kellen', pw: ssotest, to Crowd

It works!

Added 'nicole.deflaux', pw: drizzle to Crowd

It works!

Conclusion:  Google Apps delegates password management, but not user management!!  (A user must already exist in the Google Apps domain; Crowd only verifies the password.)

Went to groups.google.com/a/sagebionetworks.com

It works! I.e., Google Groups delegates authentication too.

Big open question:  If Crowd aggregates two directories, both having a user called john_smith, then whose credentials are used to log in to Google Apps?

Tried running Nicole's demo.  Result:  Was prompted for regular (non-Crowd) credentials.  So this demo doesn't automatically delegate to Crowd when Google Apps does.

This might be due to how the application was deployed.   The application is associated with the sagebase.org domain, i.e. it is visible at:

https://appengine.google.com/a/sagebase.org

and the authentication choice is "Google Accounts API: The Google Accounts API includes all Gmail Accounts, but does not include accounts on Google Apps domains."

Info on how to deploy to a domain is here:

http://code.google.com/appengine/articles/auth.html

Create a Google App Engine application using Google Apps accounts to log-in

Installed GAE plug-in for Eclipse.  It includes SDK v. 1.3.8.

Created an app 'sandbox-sagebionetworks.appspot.com' set to authenticate against users in the sagebionetworks.com domain.  Verified that the default app runs on the web.

Added a <security-constraint> to the web.xml and redeployed.  Result:  I get a "500 Internal Server Error" error when I click on the servlet link.  In the appengine control panel error log I see the message:

"Authentication for the Google Apps domain sagebionetworks.com can only be performed when requests are served from a subdomain of that domain or it has been approved through the Google Apps Control Panel."

I logged in using my sagebionetworks.com credentials, but got the same error again.

The problem is that I have not told GAE to delegate to Google Apps.

Went to appengine.google.com under the sandbox-sagebionetworks app
clicked 'Application Settings' then went to 'Add Domain'
entered 'sagebionetworks.com'

got the message:
Your users can access sandbox-sagebionetworks at:
https://sandbox-sagebionetworks.appspot.com

Now it works!  I can go to
https://sandbox-sagebionetworks.appspot.com/, click on the link, and get "Hello, world"

I am signed in as bruce.hoff@sagebionetworks.com.

I click 'Sign Out' from Google Apps and try the app url again.
Unexpectedly I CAN get to 'hello, world' (no authentication)
I close all windows

I try to go to google.com/a/sagebionetworks.com and am prompted for a log-in.

Now I go to the app, click on the "Sandbox" link and am prompted for a log-in. (Yea!)

Logged out of Google Apps
https://www.google.com/a/sagebionetworks.com/
then returned to the app and was prompted for a log-in,
so authentication seems to be working.

Now to delegate authentication to Crowd:

Logged in to the control panel for the sagebionetworks.com domain and went to 'advanced tools'
https://www.google.com/a/cpanel/sagebionetworks.com/Advanced#Advanced/subtab=0
Went to 'set up single sign-on (SSO)'
https://www.google.com/a/cpanel/sagebionetworks.com/SetupSSO
and clicked "Enable Single Sign-on" then "Save changes"
Logged out.

Went to
sites.google.com/a/sagebionetworks.com
and got the modified Google log in reflecting that SSO is activated.
Clicked "Sign in to Sage Bionetworks"
and got the Crowd log in screen.
Did NOT log in but rather went to
https://sandbox-sagebionetworks.appspot.com
Clicked on 'Sandbox'
and went to the Crowd log-in screen!! (Success!!)
Entered user: nicole.deflaux, p/w: drizzle (avoiding my own, administrative credentials)
Successfully ran the "Hello world" servlet.

Summary:
Google App Engine (GAE) can be configured to delegate authentication to Google Apps (on our domain),
which can in turn delegate authentication to an external SAML-based Identity
Provider.  Moreover, the authentication requirement for the GAE services can be completely
managed in the web.xml file using <security-constraint> tags.

In principle we can create a collaborative platform including Google Apps, Google Groups,
and Google App Engine authenticating web services (e.g. Addama) in which users experience
single sign-on using their native credentials and externally managed passwords.

Notes

Q: What's the cumulative file size on the Sage SSH server?

A: About 2GB, considering the files in the directory /data/incoming on sage.fhcrc.org

Google Apps provides two APIs to help with authentication:

1. SAML Single Sign-On (SSO) Service: would allow *us* to create and maintain users and groups outside of Google.

http://code.google.com/googleapps/domain/sso/saml_reference_implementation.html

2. Google Apps Provisioning API: would allow us to programmatically create Google users and groups in our private domain.  This would streamline adding users to Google Apps.  If we used it as a total solution, then the non-Google applications (e.g. Addama) would have to go to Google for authentication, which violates the 'arm's length' integration requirement.

3. OpenID sounds like an alternative to SAML:

http://www.google.com/support/forum/p/apps-apis/thread?tid=33a3707bd2ea7904&hl=en

In the case of OpenID, the user may have a Google Account, a Google Apps Account, or an account from any other domain that provides OpenID federated login.

Integration of GAE with OpenID:

http://code.google.com/appengine/docs/java/users/overview.html

4. At times like this, faced with a moral dilemma, I ask myself, "What Would Atlassian Do" (WWAD)?

4.1 Seraph is a very simple, pluggable J2EE web application security framework developed  by Atlassian and used in our products.

http://confluence.atlassian.com/display/DEV/Single+Sign-on+Integration+with+JIRA+and+Confluence

4.2 Crowd is a single sign-on (SSO) application for as many users, web applications
and directory servers you need — all through a single web interface.
http://www.atlassian.com/software/crowd/

Crowd centralises identity management, allowing you to take users from different directories
and manage them in one place. Multiple user directories can be centrally managed via Crowd's
administration console.

Crowd's OpenID authentication server, CrowdID, talks with websites and applications using
OpenID. It expands Crowd's SSO capabilities to applications outside your organisation's firewall.

http://confluence.atlassian.com/display/CROWD/Configuring+the+Google+Apps+Connector
To enable single sign-on in Google Apps, you will need the Premier, Education, or Partners edition of Google Apps.
The Crowd Google Apps connector does not support the automatic adding of users. If a user exists
in Crowd but not in Google Apps, then the user will not be able to log in to Google Apps.

To add an application (e.g. a GAE app like Addama registry):

http://confluence.atlassian.com/display/CROWDDEV/Application+Integration+Overview

Licensing and hosting Crowd:

- Crowd is not hosted by Atlassian.  We have to run it ourselves.  It runs on Windows, Linux or Mac and uses an Apache Tomcat app server:

http://confluence.atlassian.com/display/CROWD/Installing+Crowd+and+CrowdID

- Pricing:  This is a little confusing but it seems to say that it's $10 for up to 10 users then $600/$1200 for up to 100 users (academic/commercial)

http://www.atlassian.com/software/crowd/pricing.jsp

 Open source alternatives to Crowd:

http://code.google.com/googleapps/domain/open_source_projects.html#sso

- Addama authentication is via Servlet filters using GAE User Service OR a Google API-key.

- Addama services

- Sage SSH/SCP server authenticates using standard unix log-in.

- Addama handles authentication via Servlet Filters; the servlet config xml file shows what's in place.

- Addama white list: "user x can get these services, or anything under the branch."

Nicole's "an area for testing" is a "google apps for your domain" domain

 http://www.google.com/a/sagebionetworks.com is a "test domain for Google Apps"

What's the difference between a "google account" and a "google apps account"?
 A: the latter is newer and ultimately should subsume the former.

Does Google Apps support OpenID?

A: Only as an "Identity provider" (of the Google Apps ID) not as a service provider seeking authentication.

http://code.google.com/googleapps/domain/sso/openid_reference_implementation.html

3 ways to authenticate GAE:
- Google Accounts
- Google Apps account (on a proprietary domain associated with Google)
- OpenID

Ours is a Google Apps Premier (="business"?) account.

Notes on Addama Registry Filters:

org.systemsbiology.addama.coresvcs.gae.filters.StaticContentFilter
  I don't think this has anything to do with authentication, rather it's a cache for static content.  

  Note:  You can't even get this far without being authenticated.
  Note: The white list (below) *authorizes*, and doesn't apply to static content.

 org.systemsbiology.addama.coresvcs.gae.filters.UserServiceFilter
  If logged-in Google Acct OR valid API Key, then allow, else deny.

 org.systemsbiology.addama.coresvcs.gae.filters.WhiteListFilter
  If the user is an Admin or is in a 'white list' for the requested resource, then allow, else deny.

 org.systemsbiology.addama.coresvcs.gae.filters.DirectLinkFilter
  Seems to handle a specific kind of request called a 'direct link' request.
  (This MIGHT be a method for retrieving large files.)

 org.systemsbiology.addama.coresvcs.gae.filters.AdminOnlyFilter
  Filter out any requests NOT from an admin.
  Applied only for addama/memcache/*

 org.systemsbiology.addama.coresvcs.gae.filters.ProxiesFilter
  Seems to forward certain requests (in particular, non-registry requests) to GAE's "URLFetchService".

- What does <security-constraint> in the GAE web.xml file mean?
A: From http://code.google.com/appengine/docs/java/users/overview.html:
If you have pages that the user should not be able to access unless signed in, you can establish a security constraint for those pages in the deployment descriptor (the web.xml or app.yaml file). If a user accesses a URL with a security constraint and the user is not signed in, App Engine redirects the user to the sign-in page automatically (for Google Accounts or Google Apps authentication) or to the page at /_ah/login_required (for OpenID authentication), then directs the user back to the URL after signing in or registering successfully.

A security constraint can also require that the user be a registered administrator for the application. This makes it easy to build administrator-only sections of the site, without having to implement a separate authorization mechanism.
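
A minimal sketch of the two kinds of constraint described above, as they might appear inside the <web-app> element of web.xml. The role names `*` (any signed-in user) and `admin` (registered application administrator) are the ones the App Engine Java documentation describes; the resource names and URL patterns are hypothetical, chosen to echo the Sandbox servlet and the AdminOnlyFilter note above, and are not taken from the deployed application:

```xml
<!-- Any signed-in user may reach the sandbox servlet (hypothetical URL pattern). -->
<security-constraint>
    <web-resource-collection>
        <web-resource-name>sandbox</web-resource-name>
        <url-pattern>/sandbox/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
        <role-name>*</role-name>
    </auth-constraint>
</security-constraint>

<!-- Only registered administrators of the application may reach this branch
     (hypothetical URL pattern). -->
<security-constraint>
    <web-resource-collection>
        <web-resource-name>admin-only</web-resource-name>
        <url-pattern>/addama/memcache/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
        <role-name>admin</role-name>
    </auth-constraint>
</security-constraint>
```

With constraints like these, the sign-in redirect is handled by App Engine itself, so which identity provider ultimately checks the password (Google, or Crowd via SAML) does not appear in the application code.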


 
