
Common Client Command set and Cache ("C4")

Version 43 (an older version of this page).

This is the specification of the command sets for Synapse clients.  The goal is to align the command sets for clients in different languages to ease users' transitions between languages.  Additionally, we define the organization of the file cache so that the various clients arrange local copies of files consistently.

 

| Command | Comments | R Syntax | Python Syntax | Command Line Syntax |
1 – Basic Level    
| Create a Synapse file handle in memory, specifying the path to the file in the local file system, the name in Synapse, and the Folder in Synapse. This step 'stages' a file to be sent to Synapse. Additional parameters (...) are interpreted as properties or annotations, in the manner of synSet(), defined below. If 'synapseUpload' is TRUE then synStore() (below) uploads the file to S3; otherwise only the location is persisted. If 'name' is omitted, it defaults to the file's name. | The specified file is not moved or copied. | File(path="/path/to/file", name="foo", parentId="syn101", synapseUpload=TRUE, ...) | File(path="/path/to/file", name="foo", parentId="syn101", synapseUpload=True, **kwargs) | NA |
| Create a Synapse file handle in memory which will be a serialized version of an in-memory object. Additional parameters (...) are interpreted as properties or annotations, in the manner of synSet(), defined below. | The object is not serialized at this time. (We are hoping people will like calling the object a File, even though it takes an in-memory object as a parameter.) | File(obj=<obj ref>, name="foo", parentId="syn101", synapseUpload=TRUE, ...) | Will not be implemented in Python. | NA |
| Create a Synapse Record in memory, specifying the name and the Folder in Synapse. This step 'stages' a Record to be sent to Synapse. Additional parameters (...) are interpreted as properties or annotations, in the manner of synSet(), defined below. | Files aren't moved or copied. TODO: How do you specify file annotations (as distinct from Strings)? Shall we introduce in-memory wrappers around files and urls to help distinguish them? | Record(name="foo", parentId="syn101", ...) | Record(name="foo", parentId="syn101", **kwargs) | |
| Create a Folder or Project in memory. Name and parentId are optional. | | Folder(name="foo", parentId="syn101", ...) / Project(name="foo", ...) | Folder(name="foo", parentId="syn101", **kwargs) / Project(name="foo", **kwargs) | |
| Set an entity's attribute (property or annotation) in memory. The client first checks properties, then annotations. Setting the value to NULL deletes it in R; the del operator deletes it in Python. | TODO: we want to include files and (for R) in-memory objects | synAnnot(entity, name)<-value | entity.parentId="syn101" | |
| Gets an entity's attribute value (property or annotation) from the object already in memory. | | synAnnot(entity, name); returns NULL if undefined | entity.name; throws exception if value is undefined | |
| Create or update an entity (File, Folder, etc.) in Synapse. May also specify (1) whether a name collision in an attempted 'create' should become an 'update', (2) whether to 'force' a new version to be created, (3) the list of entities 'used' to generate this one, (4) the list of entities 'executed' to generate this one, (5) the name of the generation activity, and (6) the description of the generation activity. | TODO: Give some examples. | synStore(entity, used, executed, activityName=NULL, activityDescription=NULL, createOrUpdate=T, forceVersion=T) | synStore(entity, used, executed, activityName=None, activityDescription=None, createOrUpdate=True, forceVersion=True) | synapse create --name NAME --parentid PARENTID --description DESCRIPTION --type TYPE --file PATH --update=T/F --forceVersion=T/F |

| Get an entity (file, folder, etc.) from the Synapse server, with its attributes (properties, annotations) and, optionally, with its associated file(s). | 'downloadFile' and 'load' are ignored for objects lacking Files. downloadFile=F with load=T is allowed and means "don't cache" (a valid choice if the File lives on a network share). If a downloadLocation is not provided, a default, read-only cache location is used. If a downloadLocation IS provided, then the client must handle collisions with existing files. | synGet(id, version, downloadFile=T, downloadLocation=NULL, load=T) | synapse.get(id, version, downloadFile=True, downloadLocation=None, load=True) | synapse get ID -v NUMBER |
| | | synGet(entity, downloadFile=T, load=T) | synapse.get(entity, downloadFile=True, load=True) | |
| Trash an entity and all of its children (move all Folders and Files within a Folder to the trash can). | | synTrash() | synapse.trash() | synapse trash id |
| Open the web browser to the page for this entity. | | onWeb(entityId) / onWeb(entity) | onweb(entityId) / onweb(entity) | synapse onweb id |
| log-in | get API key and write to user's properties file | synapseLogin(<user>,<pw>) | synapseLogin(<user>,<pw>) | NA |
| log-out | delete API key from properties file | synapseLogout() | synapseLogout() | NA |
| Run code, capturing output, code and provenance relationship. | | synapseExecute(executable, args, resultParentId, codeParentId, resultEntityProperties = NULL, resultEntityName=NULL, replChar=".") | synapseExecute(executable, args, resultParentId, codeParentId, resultEntityProperties = None, resultEntityName=None, replChar=".") | |
| Create evaluation object | | Evaluation(name, description, status) | Evaluation(name, description, status) | |
| Join evaluation | | addParticipant(evaluation, principalId) | evaluation.addParticipant(principalId) | |
| Submit for evaluation | | submit(evaluation, entity) | evaluation.submit(entity) | |
2 – Power User Level    
| Execute query | TODO: pagination, e.g. the function returns an iterator. Look at current implementation in R client. | synQuery(queryString) | synapse.query(queryString) | |
| we talked about this, but is it needed? | | synGetEntity() | | |
| we talked about this, but is it needed? | | synStoreEntity() | | |
| Retrieve the wiki for an entity | TODO: Is it a requirement that we retrieve attachments? If not, do we retrieve file handles? Is this the id of the wiki or the wiki? | synGetWiki(id, version) / synGetWiki(entity) | synapse.getWiki(id, version) / synapse.getWiki(entity) | |
| | | synStoreWiki() | synapse.storeWiki() | |
| | | synGetAnnotations() | | |
| | | synSetAnnotations() | | |
| | | synGetProperties() | | |
| Access properties, throwing exception if property is not defined. | | synSetProperties() | | |
| | | synGetAnnotation() | | |
| | | synSetAnnotation() | | |
| Access property, throwing exception if property is not defined. | | synGetProperty() | | |
| Access property, throwing exception if property is not defined. Setting to NULL deletes. | | synSetProperty() | | |
| Create an Activity (provenance object) in memory. | | Activity(name, description, used, executed) | | |
| Create or update the Activity in Synapse | | synStoreActivity(activity) | | |
| Get the Activity which generated the given entity. | | synGetActivity(entity) / synGetActivity(entityId) | | |
| Set the Activity which generated the given entity | | synSetActivity(entity)<-activity | | |
| Empty trash can | | | | |
| Restore from trash can | | | | |
3 – Web API Level    
| Execute GET request | | synRestGET(endpoint, uri) | | |
| Execute POST request | | synRestPOST(endpoint, uri, body) | | |
| Execute PUT request | | synRestPUT(endpoint, uri, body) | | |
| Execute DELETE request | | synRestDELETE(endpoint, uri) | | |
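
The attribute rules from the Basic Level commands (properties are checked first, then annotations; the del operator removes a value in Python; reading an undefined value raises an exception) can be illustrated with a minimal Python sketch. The Entity class below, its PROPERTY_NAMES set, and the two backing dictionaries are hypothetical stand-ins chosen for illustration, not the actual client implementation.

```python
class Entity:
    """Hypothetical in-memory entity used only to illustrate attribute lookup."""

    # Assumed property names; a real client would take these from the Synapse schema.
    PROPERTY_NAMES = {"id", "name", "parentId", "description"}

    def __init__(self, **kwargs):
        # Bypass our own __setattr__ while creating the two backing dicts.
        object.__setattr__(self, "properties", {})
        object.__setattr__(self, "annotations", {})
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __setattr__(self, key, value):
        # Properties are checked first; any other name is stored as an annotation.
        if key in self.PROPERTY_NAMES:
            self.properties[key] = value
        else:
            self.annotations[key] = value

    def __getattr__(self, key):
        # Called only when normal lookup fails, so the dicts themselves resolve normally.
        if key in self.properties:
            return self.properties[key]
        if key in self.annotations:
            return self.annotations[key]
        raise AttributeError(key)

    def __delattr__(self, key):
        if key in self.properties:
            del self.properties[key]
        elif key in self.annotations:
            del self.annotations[key]
        else:
            raise AttributeError(key)


e = Entity(name="foo", parentId="syn101", tissue="liver")
e.parentId = "syn202"        # stored as a property
tissue_before = e.tissue     # read back from the annotations
del e.tissue                 # removes the annotation
```

After this runs, the name and parentId values live in e.properties, and reading e.tissue raises AttributeError, matching the "throws exception if value is undefined" behavior listed for Python.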

 

Common Configuration File

This is a properties file in a standard place that is interpreted upon client initialization.  The location should be private to the user.

The format will be .properties, http://en.wikipedia.org/wiki/.properties
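
As a rough illustration of the .properties format, the following Python sketch parses the simplest form of such a file. It is a simplification: it handles '=' and ':' separators and '#' or '!' comment lines, but ignores line continuations and escape sequences. The function name and the keys in the sample are hypothetical; this page does not yet define which keys the common config file contains.

```python
def parse_properties(text):
    """Parse a simplified .properties document into a dict of strings."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comments
        # Split on whichever separator ('=' or ':') appears first.
        positions = [p for p in (line.find("="), line.find(":")) if p != -1]
        if not positions:
            props[line] = ""  # a key with no value
            continue
        cut = min(positions)
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


sample = """
# hypothetical keys, for illustration only
cache.root = /home/alice/.synapseCache
username: alice
"""
config = parse_properties(sample)
```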

Things to specify in the common config file:

Cache Design Principles

The function and design principles of the client-side file cache need to be made explicit and consistent across analytical clients.

Background:  Current implementation within the R Client:

  • files are cached (metadata used to be cached in entity.json)
  • cache is mix of read/write
  • each entity version has a location within the cache based on its URI (e.g. .synapseCache/proddata.sagebase.org/<entityId>/<locationId>/version/<version>)
    • files.json specifies what resides within the archive
    • <fileName> file, which the R Client currently assumes to be a zip (this is immutable by convention until storeEntity is called)  (TODO:  What happens when it is not a zip archive?)
    • <fileName>_unpacked directory within which all unzipped content lives
      • this subdirectory is writable (by convention)
      • re-stores file if not an archive (both as <fileName> and <fileName>_unpacked/<fileName>)

Revised approach

The analytical clients provide for client-side caching of Synapse files, to avoid unnecessarily retrieving (large or many) files that have already been downloaded.   Downloaded files may be cached in one of two ways:  By default, downloaded files are put in a Read Only cache, organized by (MD5) hash. The organization is:

<cache root> / <hash prefix> / <hash>

where <cache root> is a folder whose location is configurable during client installation and whose default is within the user's home directory; <hash prefix> is the first three hex digits of the hash, and <hash> is the name used for the file, the full MD5 checksum.  The reason for using the intermediate <hash prefix> directory is to avoid having too many files in a single folder. (TODO:  Alternative is to make <hash> a folder and place the file inside, with the original file name.  If so, the file name must be disregarded by the client.)
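
The layout above can be sketched in Python as follows. The function names are hypothetical, and the demo uses a temporary directory as a stand-in for the configurable cache root.

```python
import hashlib
import os
import tempfile


def md5_hex(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(cache_root, md5):
    """Build <cache root>/<hash prefix>/<hash>; the prefix is the first three hex digits."""
    return os.path.join(cache_root, md5[:3], md5)


# Demo: place a freshly "downloaded" file into the cache layout.
with tempfile.TemporaryDirectory() as cache_root:
    source = os.path.join(cache_root, "download.tmp")
    with open(source, "wb") as handle:
        handle.write(b"hello")
    checksum = md5_hex(source)
    target = cache_path(cache_root, checksum)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    os.replace(source, target)
    relative = os.path.relpath(target, cache_root)
```

With three hex digits there are at most 4096 prefix directories, which is what keeps any single folder from growing too large.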

Alternatively, a file may be downloaded to a location specified by the user.  The location must be outside of the Read Only cache. We call this an "external cache location".   In this case the file may be modified, but not moved.  The client keeps a map from <Synapse-FileHandleID> to the external cache location.  We will refer to this as the "external cache map."  The external cache map "overrides" the default, Read Only cache.

When synStore() is called to create or update an entity having a file in an external cache location, the local copy of the file is not moved.   Rather an entry is made in the external cache map.

When synGet() is called, the client first retrieves the file handle ID and MD5 from the entity, then checks whether the file already exists.  If the download destination is an external cache location, the client computes the MD5 of the local file to determine whether it differs from the version in Synapse.  If so, the client must either throw an exception or prompt the user to either (1) confirm overwrite or (2) keep the original file.  (If the latter, a subsequent synStore() would overwrite the Synapse version with the local copy.)
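
The check on an external cache location might look like the sketch below. The function name and the three status strings are hypothetical. How a client reacts to "modified" (exception versus prompt) is left open here, as it is in the text above.

```python
import hashlib
import os
import tempfile


def check_external_copy(path, synapse_md5):
    """Compare a local file in an external cache location with the MD5 held in Synapse.

    Returns "missing" (no local file, so download), "unchanged" (local copy matches,
    no download needed), or "modified" (local copy differs, so the client must raise
    or ask the user before overwriting).
    """
    if not os.path.exists(path):
        return "missing"
    with open(path, "rb") as handle:
        local_md5 = hashlib.md5(handle.read()).hexdigest()
    return "unchanged" if local_md5 == synapse_md5 else "modified"


# Demo with a temporary directory standing in for the user's chosen location.
with tempfile.TemporaryDirectory() as tmp:
    local = os.path.join(tmp, "data.txt")
    with open(local, "wb") as handle:
        handle.write(b"edited locally")
    status_same = check_external_copy(local, hashlib.md5(b"edited locally").hexdigest())
    status_diff = check_external_copy(local, hashlib.md5(b"original").hexdigest())
    status_none = check_external_copy(os.path.join(tmp, "absent.txt"), "")
```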

Note:  There are multiple ways to implement the external cache map.  One approach is to maintain the map in memory, serialize it to a file at the end of a session, and read it back into memory when starting the next session.    Such a file could be shared between clients, though concurrent access by multiple clients on a machine is a more complex issue.  The map could also be implemented as individual files, one per cached fileId, each containing information about its location.
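
A minimal sketch of the first approach (in-memory map, serialized between sessions) follows. The class name, the JSON file format, and the file name are assumptions made for illustration. The sketch writes through a temporary file so that a crash cannot leave a half-written map, but it does not attempt to solve concurrent access by multiple clients.

```python
import json
import os
import tempfile


class ExternalCacheMap:
    """Hypothetical map from Synapse file handle ID to an external cache location."""

    def __init__(self, map_file):
        self.map_file = map_file
        self.entries = {}
        if os.path.exists(map_file):
            # Start of session: read the map back into memory.
            with open(map_file) as handle:
                self.entries = json.load(handle)

    def record(self, file_handle_id, location):
        self.entries[str(file_handle_id)] = os.path.abspath(location)

    def lookup(self, file_handle_id):
        # Returns None when the file handle has no external location.
        return self.entries.get(str(file_handle_id))

    def save(self):
        # End of session: write to a temp file, then rename it over the old map.
        directory = os.path.dirname(os.path.abspath(self.map_file))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            json.dump(self.entries, handle)
        os.replace(tmp_path, self.map_file)


# Demo: one session records a location, the next session finds it again.
with tempfile.TemporaryDirectory() as tmp:
    map_file = os.path.join(tmp, "external_cache_map.json")
    first_session = ExternalCacheMap(map_file)
    first_session.record("12345", os.path.join(tmp, "analysis", "data.csv"))
    first_session.save()

    next_session = ExternalCacheMap(map_file)
    restored = next_session.lookup("12345")
    expected = os.path.abspath(os.path.join(tmp, "analysis", "data.csv"))
    unknown = next_session.lookup("999")
```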

Note:  This architecture supports a recursive synGet() function in which a Folder and all its children are retrieved.  The top level folder would be specified as an external cache location, and the children would be placed in folders/files following the Synapse hierarchy.  The user would not be permitted to modify the folder hierarchy (TODO: HOW?), but files could be edited and then the tree persisted with a recursive synStore().
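
To make the recursive layout concrete, the sketch below maps a Folder tree onto local paths, with the top-level folder placed at the chosen external cache location and children following the Synapse hierarchy. The nested-dictionary representation of the tree and the function name are hypothetical; a real client would obtain the hierarchy from Synapse.

```python
import os


def local_layout(node, location):
    """Yield (synapse_id, local_path) for a node and all of its descendants.

    node is a hypothetical dict with "id", "name" and an optional "children" list.
    The top-level node is placed at `location`; each child goes one level below
    its parent, under its own Synapse name.
    """
    yield node["id"], location
    for child in node.get("children", []):
        yield from local_layout(child, os.path.join(location, child["name"]))


tree = {
    "id": "syn1", "name": "study",
    "children": [
        {"id": "syn2", "name": "raw",
         "children": [{"id": "syn3", "name": "a.txt"}]},
        {"id": "syn4", "name": "README.txt"},
    ],
}
layout = dict(local_layout(tree, os.path.join("work", "study")))
```

Each resulting path could then be recorded in the external cache map, so that a later recursive synStore() can locate the edited files.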

 

TODO:  should an 'unzip' convenience function be provided and, if so, where should the unzipped files go by default?  My thought is that we should move away from zipping/unzipping behind the scenes.

 

 

 

 

 

 

 
