Common Client Command set and Cache ("C4")


This is the specification of the command set for Synapse clients. The goal is to align the command sets of clients written in different languages, to ease users' transitions between languages. Additionally, we define the organization of the file cache so that the various clients arrange local copies of files consistently.

 

Synapse File Management

To motivate the design of the client-side cache and file manipulation commands, we review file management in Synapse. Synapse tracks the shared location of the files it stores (e.g. the location in Amazon S3 or some other web-accessible location) and also each file's MD5 hash. While Synapse records a file's name, it does not know where the file is located once any user has downloaded it. The client has a notion of a File object (defined below). This object has a slot for the ID of the File object in Synapse and a slot for the local file handle. When the client moves a file from the local machine to Synapse (via the "synStore" command, defined below), the file is uploaded from the local location to Synapse. When the client retrieves a File (via the "synGet" command, defined below), it may specify the local file handle to which the file is downloaded, or allow the client to place it in a default location. With this understanding, we can discuss how the clients cache files.

Cache Design Principles

The analytical clients provide for client-side caching of Synapse files, to avoid unnecessarily retrieving (large or many) files that have already been downloaded.   Downloaded files may be cached in one of two ways: 

(1) By default, downloaded files are put in a Read Only cache, organized by (MD5) hash. The organization is:

<cache root> / <hash prefix> / <hash>

where <cache root> is a folder whose location is configurable during client installation and whose default is within the user's home directory; <hash prefix> is the first two hex digits of the hash, and <hash> is the name used for the file, the full MD5 hash.  The reason for using the intermediate <hash prefix> directory is to avoid having too many files in a single folder. 

(2) Alternatively, a file may be downloaded to a folder location specified by the user. The location must be outside of the Read Only cache. We call this an "external cache location". In this case the file may be modified.
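
The Read Only cache layout described in (1) can be sketched as follows. This is a minimal illustration, not the clients' actual code: the function names `md5_hex` and `cache_path` and the example root folder are assumptions made for this sketch.

```python
import hashlib
from pathlib import Path


def md5_hex(data: bytes) -> str:
    """Return the MD5 hash of `data` as 32 lowercase hex digits."""
    return hashlib.md5(data).hexdigest()


def cache_path(cache_root: Path, md5: str) -> Path:
    """Build <cache root>/<hash prefix>/<hash> for a file with the given MD5.

    The prefix is the first two hex digits of the hash, which spreads the
    cached files over at most 256 sub-folders.
    """
    md5 = md5.lower()
    return Path(cache_root) / md5[:2] / md5


# Hypothetical cache root; the real default is chosen at client installation.
example_root = Path.home() / "synapse_cache_example"
example = cache_path(example_root, md5_hex(b"hello"))
```

With this layout, two entities that share the same file content map to the same cached file, because the path depends only on the hash.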

When synStore() is called to create an entity having a file in an external cache location, the local copy of the file is not moved.  It serves as a "cached copy" in its current location, as described below.

When synGet() is called, the client first retrieves the File metadata, including the MD5 hash. If no download location is specified, the client checks the default cache location. If the file is already there, it is not downloaded again; if absent, the file is downloaded. If the download destination is an external cache location and a local copy exists there, the client computes the MD5 of the local copy to determine whether it differs from the version in Synapse. If it differs, the client must (based on user choice) either (1) create a new, unique file name, (2) confirm overwrite or (3) keep the original file. (In the last case, a subsequent synStore() would overwrite the Synapse version with the local copy.)
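
The download decision just described can be sketched as a pure planning function. This is an illustration under stated assumptions, not the clients' implementation: `plan_download`, `unique_name` and `file_md5` are hypothetical names, the "name(1).ext" renaming scheme is an assumption, and the function only decides what to do (it performs no network transfer).

```python
import hashlib
from pathlib import Path
from typing import Optional, Tuple


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 of a file without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_name(target: Path) -> Path:
    """Return a sibling path such as 'a(1).txt' that does not exist yet."""
    counter = 1
    while True:
        candidate = target.with_name(f"{target.stem}({counter}){target.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1


def plan_download(remote_md5: str, file_name: str, cache_root: Path,
                  download_location: Optional[Path] = None,
                  ifcollision: str = "keep.both") -> Tuple[str, Path]:
    """Return ("download" or "skip", local path) for a synGet-style call."""
    if download_location is None:
        # Default Read Only cache: the path is determined by the hash alone.
        target = Path(cache_root) / remote_md5[:2] / remote_md5
        return ("skip" if target.exists() else "download", target)

    target = Path(download_location) / file_name
    if not target.exists():
        return ("download", target)
    if file_md5(target) == remote_md5:
        return ("skip", target)  # local copy is identical to Synapse's
    # A different file already sits at the destination: apply the user's choice.
    if ifcollision == "keep.original":
        return ("skip", target)
    if ifcollision == "overwrite.original":
        return ("download", target)
    if ifcollision == "keep.both":
        return ("download", unique_name(target))
    raise ValueError(f"unknown ifcollision value: {ifcollision!r}")
```

Separating the decision from the transfer keeps the collision rules testable without contacting Synapse.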

When synStore() is called to update an entity, if the file location is the default, Read Only cache, then the file is not uploaded.  If the file location is an external cache location, then the MD5 hash of the file is recomputed.  The file is uploaded if and only if the newly computed MD5 hash differs from that of the previously retrieved entity.
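
A minimal sketch of this upload rule follows. The names `should_upload` and `file_md5` are assumptions for illustration, and `previous_md5` stands for the hash recorded on the previously retrieved entity.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_upload(local_path: Path, cache_root: Path, previous_md5: str) -> bool:
    """Decide whether an update via synStore needs to upload the file.

    Files inside the Read Only cache are never uploaded. Files in an
    external cache location are uploaded only if their content changed.
    """
    local = Path(local_path).resolve()
    root = Path(cache_root).resolve()
    if root in local.parents:
        return False  # Read Only cache: content is unchanged by convention
    return file_md5(local) != previous_md5
```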

The effect of this architecture is that repeated uploads and downloads are avoided:

  • for repeated uploads or downloads of the same file (in one entity or across multiple entities) when using the default, Read Only cache for the file.
  • for repeated uploads or downloads of an entity's file when using an external cache location and when the local copy is not modified.

This strategy does not avoid repeated downloads of an entity's file to a variety of local folders (e.g. to the default location and then to an external one, or to two different external ones).  We feel that the potential efficiency gains of doing this are outweighed by the complexity of tracking multiple, mutable copies of a file.

 

Command Set

We conceptually divide the client commands into three levels: (1) Basic, (2) Power User and (3) Web API.

 

Command | Comments | R Syntax | Python Syntax | Command Line Syntax
1 – Basic Level    
Create a Synapse file handle in memory, specifying the path to the file in the local file system, the name in Synapse, and the Folder in Synapse. This step 'stages' a file to be sent to Synapse. Additional parameters (...) are interpreted as properties or annotations, in the manner of synSet(), defined below. If 'name' is omitted, it defaults to the file's name. If 'synapseStore' is TRUE then the file is uploaded to S3, else only the file location is saved. | The specified file doesn't move or get copied. | File(path="/path/to/file", synapseStore=T, name="foo", parentId="syn101", ...) | File(path="/path/to/file", synapseStore=True, name="foo", parentId="syn101", **kwargs) | NA
Create a Synapse file handle in memory which will be a serialized version of an in-memory object. Additional parameters (...) are interpreted as properties or annotations, in the manner of synSet(), defined below. If 'synapseStore' is TRUE then the file is uploaded to S3, else only the file location is saved. | The object is not serialized at this time. (We are hoping people will like calling the object a File, even though it takes an in-memory object as a parameter.) | File(obj=<obj ref>, synapseStore=T, name="foo", parentId="syn101", ...) | Will not be implemented in Python. | NA
Create a Synapse Record in memory, specifying the name and the Folder in Synapse. This step 'stages' a Record to be sent to Synapse. Additional parameters (...) are interpreted as properties or annotations, in the manner of synSet(), defined below. | Files aren't moved or copied. TODO: How do you specify file annotations (as distinct from Strings)? Shall we introduce in-memory wrappers around files and URLs to help distinguish them? | Record(name="foo", parentId="syn101", ...) | Record(name="foo", parentId="syn101", **kwargs) |
Create a Folder or Project in memory. Name and parentId are optional. | | Folder(name="foo", parentId="syn101", ...) / Project(name="foo", ...) | Folder(name="foo", parentId="syn101", **kwargs) / Project(name="foo", **kwargs) |
Set an entity's attribute (property or annotation) in memory. The client first checks properties, then annotations. (Setting the value to NULL deletes it in R; using the del operator deletes it in Python.) | TODO: we want to include files and (for R) in-memory objects | synAnnot(entity, name)<-value | entity.parentId="syn101" |
Get an entity's attribute value (property or annotation) from the object already in memory. | | synAnnot(entity, name); returns NULL if undefined | entity.name; throws exception if value is undefined |
Create or update an entity (File, Folder, etc.) in Synapse. May also specify (1) whether a name collision in an attempted 'create' should become an 'update', (2) whether to 'force' a new version to be created, (3) the list of entities 'used' to generate this one, (4) the list of entities 'executed' to generate this one, (5) the name of the generation activity, and (6) the description of the generation activity. | TODO: Give some examples. | synStore(entity, used, executed, activityName=NULL, activityDescription=NULL, createOrUpdate=T, forceVersion=T) | synStore(entity, used, executed, activityName=None, activityDescription=None, createOrUpdate=True, forceVersion=True) | synapse create --name NAME --parentid PARENTID --description DESCRIPTION --type TYPE --file PATH --update=T/F --forceVersion=T/F

Get an entity (file, folder, etc.) from the Synapse server, with its attributes (properties, annotations) and, optionally, with its associated file(s). 'ifcollision' is one of "keep.both", "keep.original", or "overwrite.original", telling the system what to do if a different file is found at the given local file location. | 'downloadFile' and 'load' are ignored for objects lacking Files. downloadFile=F with load=T is allowed; it means don't cache (a valid choice if the File lives on a network share). If a downloadLocation is not provided, a default, read-only cache location is used. If a downloadLocation IS provided, then the client must handle collisions with existing files. Note that 'downloadLocation' must be a folder, i.e. it cannot be used to rename files. | synGet(id, version, downloadFile=T, downloadLocation=NULL, ifcollision="keep.both", load=T) | synapse.get(id, version, downloadFile=True, downloadLocation=None, ifcollision="keep.both", load=True) | synapse get ID -v NUMBER
| | synGet(entity, downloadFile=T, downloadLocation=NULL, ifcollision="keep.both", load=T) | synapse.get(entity, downloadFile=True, downloadLocation=None, ifcollision="keep.both", load=True) |
Trash an entity and all of its children (move all Folders and Files within a Folder to the trash can). | | synTrash(id) / synTrash(entity) | synapse.trash() | synapse trash id
Open the web browser to the page for this entity. | | onWeb(entityId) / onWeb(entity) | onweb(entityId) / onweb(entity) | synapse onweb id
Log in | Get API key and write it to the user's properties file | synapseLogin(<user>,<pw>) | synapseLogin(<user>,<pw>) | NA
Log out | Delete API key from the properties file | synapseLogout() | synapseLogout() | NA
2 – Power User Level    
Execute query | TODO: pagination, e.g. the function returns an iterator. Look at current implementation in R client. | synQuery(queryString) | synapse.query(queryString) |
We talked about this, but is it needed? | | synGetEntity() | |
We talked about this, but is it needed? | | synStoreEntity() | |
Retrieve the wiki for an entity | TODO: Is it a requirement that we retrieve attachments? If not, do we retrieve file handles? Is this the id of the wiki or the wiki? | synGetWiki(id, version) / synGetWiki(entity) | synapse.getWiki(id, version) / synapse.getWiki(entity) |
| | synStoreWiki() | synapse.storeWiki() |
| | synGetAnnotations() | |
| | synSetAnnotations() | |
| | synGetProperties() | |
Access properties, throwing exception if property is not defined. | | synSetProperties() | |
| | synGetAnnotation() | |
| | synSetAnnotation() | |
Access property, throwing exception if property is not defined. | | synGetProperty() | |
Access property, throwing exception if property is not defined. Setting to NULL deletes. | | synSetProperty() | |
Create an Activity (provenance object) in memory. | | Activity(name, description, used, executed) | |
Create or update the Activity in Synapse | | synStoreActivity(activity) | |
Get the Activity which generated the given entity. | | synGetActivity(entity) / synGetActivity(entityId) | |
Set the Activity which generated the given entity | | synSetActivity(entity)<-activity | |
Empty trash can | | | |
Restore from trash can | | | |
Run code, capturing output, code and provenance relationship. | | synapseExecute(executable, args, resultParentId, codeParentId, resultEntityProperties = NULL, resultEntityName=NULL, replChar=".") | synapseExecute(executable, args, resultParentId, codeParentId, resultEntityProperties = None, resultEntityName=None, replChar=".") |
Create evaluation object | | Evaluation(name, description, status) | Evaluation(name, description, status) |
Join evaluation | | addParticipant(evaluation, principalId) | evaluation.addParticipant(principalId) |
Submit for evaluation | | submit(evaluation, entity) | evaluation.submit(entity) |
3 – Web API Level    
Execute GET request | | synRestGET(endpoint, uri) | |
Execute POST request | | synRestPOST(endpoint, uri, body) | |
Execute PUT request | | synRestPUT(endpoint, uri, body) | |
Execute DELETE request | | synRestDELETE(endpoint, uri) | |
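
The Basic Level rule for entity attributes in Python (check properties first, then annotations; raise an exception when a value is undefined; delete with the del operator) could look roughly like the sketch below. It is only an illustration: the class body and the `_PROPERTY_NAMES` list are assumptions, since this page does not define which names count as properties.

```python
class Entity:
    """In-memory entity whose attributes live in properties or annotations."""

    # Hypothetical list of property names; the real set is defined by Synapse.
    _PROPERTY_NAMES = ("id", "name", "parentId", "description")

    def __init__(self, **kwargs):
        # Bypass our own __setattr__ when creating the two backing stores.
        object.__setattr__(self, "properties", {})
        object.__setattr__(self, "annotations", {})
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __setattr__(self, key, value):
        # Known property names go to properties, anything else to annotations.
        if key in self._PROPERTY_NAMES:
            self.properties[key] = value
        else:
            self.annotations[key] = value

    def __getattr__(self, key):
        # Called only when normal lookup fails: properties first, then annotations.
        for store in (self.__dict__.get("properties", {}),
                      self.__dict__.get("annotations", {})):
            if key in store:
                return store[key]
        raise AttributeError(key)

    def __delattr__(self, key):
        for store in (self.properties, self.annotations):
            if key in store:
                del store[key]
                return
        raise AttributeError(key)


class Folder(Entity):
    """Folder staged in memory; name and parentId are optional."""
```

For example, `Folder(name="foo", parentId="syn101", tissue="liver")` would place `name` and `parentId` in properties and the hypothetical `tissue` value in annotations.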

 

Common Configuration File

This is a properties file in a standard place that is interpreted upon client initialization.  The location should be private for a user.

The format will be .properties (see http://en.wikipedia.org/wiki/.properties).
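
As a rough sketch of how a client might read such a file at initialization, the function below parses the simplest form of the .properties format: one `key=value` or `key:value` pair per line, with `#` or `!` starting a comment. It deliberately ignores line continuations and escape sequences, and `parse_properties` and the keys in the sample text are hypothetical, not part of this specification.

```python
def parse_properties(text: str) -> dict:
    """Parse simple .properties content into a dict of strings."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comments
        # The key ends at the first '=' or ':' on the line, whichever comes first.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            props[line] = ""  # a key with no value
            continue
        split_at = min(positions)
        props[line[:split_at].strip()] = line[split_at + 1:].strip()
    return props


# Hypothetical keys, for illustration only.
sample = """
# example configuration
username = alice
cache.root: /home/alice/synapse_cache
"""
config = parse_properties(sample)
```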

Things to specify in the common config file:

 

Appendix:  Current implementation of the file cache in the R Client:

  • files are cached (metadata used to be cached in entity.json)
  • cache is a mix of read-only and writable content
  • each entity version has a location within the cache that is based on its URI (e.g. .synapseCache/proddata.sagebase.org/<entityId>/<locationId>/version/<version>)
    • files.json specifies what resides within the archive
    • <fileName> file, which the R Client currently assumes to be a zip (this is immutable by convention until storeEntity is called)  (TODO:  What happens when it is not a zip archive?)
    • <fileName>_unpacked directory within which all unzipped content lives
      • this subdirectory is writable (by convention)
      • re-stores file if not an archive (both as <fileName> and <fileName>_unpacked/<fileName>)
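
For comparison with the hash-based layout, the URI-based layout listed above can be expressed as a small path-building sketch. The function name `legacy_cache_paths` and the example identifiers are hypothetical; only the folder structure is taken from the list.

```python
from pathlib import PurePosixPath


def legacy_cache_paths(cache_root: str, host: str, entity_id: str,
                       location_id: str, version: int, file_name: str) -> dict:
    """Return the paths the R Client layout uses for one entity version."""
    base = (PurePosixPath(cache_root) / host / entity_id / location_id
            / "version" / str(version))
    return {
        "files_json": base / "files.json",            # index of the archive contents
        "archive": base / file_name,                  # immutable by convention
        "unpacked": base / f"{file_name}_unpacked",   # writable by convention
    }


# Hypothetical identifiers, for illustration only.
paths = legacy_cache_paths(".synapseCache", "proddata.sagebase.org",
                           "syn101", "12", 3, "data.zip")
```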

 

 

 

 

 

 

 

 

 
