| Command | Comments | R Syntax | Python Syntax | Command Line Syntax |
1 – Basic Level
| Create a Synapse file handle in memory, specifying the path to the file in the local file system, the name in Synapse, and the Folder in Synapse.  This step 'stages' a file to be sent to Synapse. Additional parameters (...) are interpreted as properties or annotations, in the manner of synSet(), defined below.  If 'synapseUpload' is TRUE then the file is uploaded to S3, else only the location is persisted. | The specified file doesn't move or get copied at this time. | File(path="/path/to/file", name="foo", parentId="syn101", synapseUpload=TRUE, ...) | File(path="/path/to/file", name="foo", parentId="syn101", synapseUpload=True, **kwargs) | NA |
| Create a Synapse file handle in memory which will be a serialized version of an in-memory object.  Additional parameters (...) are interpreted as properties or annotations, in the manner of synSet(), defined below. | The object is not serialized at this time. (We are hoping people will like calling the object a File, even though it takes an in-memory object as a parameter.) | File(obj=<obj ref>, name="foo", parentId="syn101", synapseUpload=TRUE, ...) | Will not be implemented in Python. | NA |
| Create a Synapse Record in memory, specifying the paths to one or more files in the local file system, the name in Synapse, and the Folder in Synapse.  This step 'stages' a Record to be sent to Synapse. Additional parameters (...) are interpreted as properties or annotations, in the manner of synSet(), defined below. | Files aren't moved or copied. TODO:  How do you specify file annotations (as distinct from Strings)?  Shall we introduce in-memory wrappers around files and urls to help distinguish them? | Record(name="foo", parentId="syn101", ...) | Record(name="foo", parentId="syn101", **kwargs) | |
| Create a Folder or Project in memory. Name and parentId are optional. | | Folder(name="foo", parentId="syn101", ...) / Project(name="foo", ...) | Folder(name="foo", parentId="syn101", **kwargs) / Project(name="foo", **kwargs) | |
| Set an entity's attribute (property or annotation) in memory.  Client first checks properties, then goes to annotations; setting to NULL deletes it. | TODO:  we want to include files and (for R) in-memory objects | synAnnot(entity, name)<-value | entity.parentId="syn101" | |
| Gets an entity's attribute value (property or annotation) from the object already in memory. | | synAnnot(entity, name); returns NULL if undefined | entity.name; throws exception if value is undefined | |
| Create or update an entity (File, Folder, etc.) in Synapse.  May also specify (1) whether a name collision in an attempted 'create' should become an 'update', (2) whether to 'force' a new version to be created, (3) the list of entities 'used' to generate this one, (4) the list of entities 'executed' to generate this one, (5) the name of the generation activity, and (6) the description of the generation activity. | TODO:  Give some examples. | synStore(entity, used, executed, activityName=NULL, activityDescription=NULL, createOrUpdate=T, forceVersion=T) | synStore(entity, used, executed, activityName=None, activityDescription=None, createOrUpdate=True, forceVersion=True) | |
| Get an entity (file, folder, etc.) from the Synapse server, with its attributes (properties, annotations) and, optionally, with its associated file(s). | 'download' and 'load' are ignored for objects lacking Files.  OK for download=F and load=T, this means don't cache (a valid choice if the File lives on a network share).  If a downloadLocation is not provided, a default, read-only cache location is used. | synGet(id, version, downloadFile=T, downloadLocation=NULL, load=T) | synapse.get(id, version, downloadFile=True, downloadLocation=None, load=True) | synapse get ID -v NUMBER |
| | | synGet(entity, downloadFile=T, load=T) | synapse.get(entity, downloadFile=True, load=True) | |
| Open the web browser to the page for this entity. | | onWeb(entityId) / onWeb(entity) | onweb(entityId) / onweb(entity) | |
| log-in | get API key and write to user's properties file | synapseLogin(<user>,<pw>) | synapseLogin(<user>,<pw>) | |
| log-out | delete API key from properties file | synapseLogout() | synapseLogout() | |
| Run code, capturing output, code and provenance relationship. | | synapseExecute(executable, args, resultParentId, codeParentId, resultEntityProperties = NULL,  resultEntityName=NULL, replChar=".") | synapseExecute(executable, args, resultParentId, codeParentId, resultEntityProperties = None,  resultEntityName=None, replChar=".") | |
| Create evaluation object | | Evaluation(name, description, status) | Evaluation(name, description, status) | |
| Join evaluation | | addParticipant(evaluation, principalId) | evaluation.addParticipant(principalId) | |
| Submit for evaluation | | submit(evaluation, entity) | evaluation.submit(entity) | |
2 – Power User Level
| Execute query | TODO:  pagination, e.g. the function returns an iterator. Look at current implementation in R client. | synQuery(queryString) | | |
| we talked about this, but is it needed? | | synGetEntity() | | |
| we talked about this, but is it needed? | | synStoreEntity() | | |
| Delete an entity, and all of its children (e.g. all Folders and Files within a Folder). | | synDelete() | | |
| Retrieve the wiki for an entity | TODO: Is it a requirement that we retrieve attachments?  If not, do we retrieve file handles? | synGetWiki(id, version) / synGetWiki(entity) | | |
| | | synStoreWiki() | | |
| | | synGetAnnotations() | | |
| | | synSetAnnotations() | | |
| | | synGetProperties() | | |
| Access properties, throwing exception if property is not defined. | | synSetProperties() | | |
| | | synGetAnnotation() | | |
| | | synSetAnnotation() | | |
| Access property, throwing exception if property is not defined. | | synGetProperty() | | |
| Access property, throwing exception if property is not defined. Setting to NULL deletes. | | synSetProperty() | | |
| Create an Activity (provenance object) in memory. | | Activity(name, description, used, executed) | | |
| Create or update the Activity in Synapse | | synStoreActivity(activity) | | |
| Get the Activity which generated the given entity. | | synGetActivity(entity) / synGetActivity(entityId) | | |
| Set the Activity which generated the given entity | | synSetActivity(entity)<-activity | | |
3 – Web API Level
| Execute GET request | | synRestGET(endpoint, uri) | | |
| Execute POST request | | synRestPOST(endpoint, uri, body) | | |
| Execute PUT request | | synRestPUT(endpoint, uri, body) | | |
| Execute DELETE request | | synRestDELETE(endpoint, uri) | | |
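
The attribute rules in the Basic Level rows of the table (properties are checked before annotations, setting a value to NULL deletes it, and the Python client throws on an undefined value) can be sketched roughly as below. This is only an illustration: the Entity class and its PROPERTY_NAMES set are hypothetical stand-ins, not the real client classes, and the actual list of property names would presumably come from the entity schema.

```python
class Entity:
    """Toy stand-in for an in-memory Synapse entity (hypothetical, not the real client class)."""

    # Assumed property names, for illustration only.
    PROPERTY_NAMES = {"name", "parentId", "description"}

    def __init__(self, **kwargs):
        # Create the two backing dictionaries without going through __setattr__.
        object.__setattr__(self, "properties", {})
        object.__setattr__(self, "annotations", {})
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __setattr__(self, key, value):
        # Properties are checked first; any other name is treated as an annotation.
        target = self.properties if key in self.PROPERTY_NAMES else self.annotations
        if value is None:
            target.pop(key, None)  # None (NULL in R) deletes the attribute
        else:
            target[key] = value

    def __getattr__(self, key):
        # Only reached when normal attribute lookup fails.
        for store_name in ("properties", "annotations"):
            store = self.__dict__.get(store_name, {})
            if key in store:
                return store[key]
        raise AttributeError(f"{key!r} is not defined on this entity")


folder = Entity(name="foo", parentId="syn101", tissue="liver")
folder.parentId = "syn202"   # known property name: stored as a property
folder.tissue = None         # deletes the 'tissue' annotation
```

After these calls, folder.properties holds name and parentId, folder.annotations is empty, and reading folder.tissue raises AttributeError, which mirrors the "throws exception if value is undefined" behaviour listed for Python.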

...

The function and design principles of the client-side file cache need to be made explicit and consistent across analytical clients.

Background:  Current implementation within the R Client:

  • only entity files are cached (not metadata)
  • cache is mix of read/write
  • each entity version has a location within the cache based on its URI (e.g. .synapseCache/proddata.sagebase.org/<entityId>/<locationId>/version/<version>)
    • files.json specifies what resides within the archive
    • <fileName> file which R Client currently assumes to be a zip (this is immutable by convention until storeEntity is called)
    • <fileName>_unpacked directory within which all unzipped content lives
      • this subdirectory is writable (by convention)
      • re-stores file if not an archive (both as <fileName> and <fileName>_unpacked/<fileName>)

...

Revised approach

Each analytical client has a file cache on the local file system.  When a File or Record entity is fetched, if its associated files are fetched then they go into the cache.

When a file is added to an entity and pushed to Synapse, the local copy does not move.  Thus there is no guarantee that the local file associated with an entity is in the download cache.  (The entity must keep a reference to the local file handle.)  (The arguments for this principle are:  moving large files could be expensive, with wasteful duplication; the file might be a script that the user is editing.)

The files in the cache may be modified. That is, the client may not assume that a file in the cache is identical to the one in Synapse, even if the entity's e-tag has not changed. (TODO:  How DOES the client know a file has been modified? One way is to keep a copy of the md5 in memory upon synGet or synStore, and compare when storing again.)

TODO:  When the client does synGet() and the associated file(s) is/are in the cache already, how does it know whether to download again?  One way is to compute the local file(s) md5(s) and compare to those in Synapse.

TODO:  What if you add a script to a File entity, synStore(), exit, restart and synGet()?  You now have two copies of the file, one in the cache and the original elsewhere.  The user would like to keep editing the original and, upon synStore(), update again.  How does the client know to associate the original?  One solution:  Client keeps a map of <Syn FileHandle> -> <local location> which gets saved upon exit and loaded upon start-up and which 'overlays' or 'amends' the default cache mapping.

TODO: Read a remote file, edit it, lose the session, then repeat 'loadEntity', keep editing the local file, and push/store the entity.  What should happen?

TODO: How can we download a folder hierarchy and have that hierarchy reflected in the system?  One approach is to mimic the situation in which we upload a folder hierarchy, keeping pointers to files which are outside of the cache.  I.e. we download the files into a folder hierarchy outside of the normal cache and keep in-memory pointers to those file locations.  In this case Synapse would hold the "truth" about the hierarchy, so the user would not be allowed to move things around locally without messing up the cache.

The analytical clients provide for client-side caching of Synapse files, to avoid unnecessarily retrieving (large or many) files that have already been downloaded.  Downloaded files may be cached in one of two ways.  By default, downloaded files are put in a Read Only cache, organized by (MD5) hash. The organization is:

<cache root> / <hash prefix> / <hash>

where <cache root> is a folder whose location is configurable during client installation and whose default is within the user's home directory; <hash prefix> is the first three hex digits of the hash, and <hash> is the name used for the file, the full MD5 checksum.  The reason for using the intermediate <hash prefix> directory is to avoid having too many files in a single folder. (TODO:  Alternative is to make <hash> a folder and place the file inside, with the original file name.  If so, the file name must be disregarded by the client.)
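
As an illustration of the <cache root> / <hash prefix> / <hash> layout described above, the following is a minimal sketch. The function names and the default root (a .synapseCache folder in the user's home directory) are assumptions made for the example, not part of the specification.

```python
import hashlib
from pathlib import Path

# Assumed default; the text only says the root is configurable and defaults
# to somewhere within the user's home directory.
DEFAULT_CACHE_ROOT = Path.home() / ".synapseCache"


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(md5_hex, cache_root=DEFAULT_CACHE_ROOT):
    """Map an MD5 checksum to <cache root>/<first three hex digits>/<full checksum>."""
    md5_hex = md5_hex.lower()
    if len(md5_hex) != 32 or any(c not in "0123456789abcdef" for c in md5_hex):
        raise ValueError(f"not an MD5 hex digest: {md5_hex!r}")
    return Path(cache_root) / md5_hex[:3] / md5_hex
```

With a three-digit prefix there are at most 16^3 = 4096 intermediate folders, which is what keeps any single folder from growing too large.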

Alternatively, a file may be downloaded to a location specified by the user.  The location must be outside of the Read Only cache.  We call this an "external cache location".  In this case the file may be modified, but not moved.  The client keeps a map from <Synapse-FileHandleID> to the external cache location.  We will refer to this as the "external cache map."  The external cache map "overrides" the default, Read Only cache.

When synStore() is called to create or update an entity having a file in an external cache location, the local copy of the file is not moved.   Rather an entry is made in the external cache map.
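
One possible shape for the external cache map is sketched below, assuming it is persisted as a JSON file so that it survives client restarts. The class name, method names, file format and the example file handle IDs are all hypothetical; only the rules (the location must be outside the Read Only cache, and the map overrides it) come from the text above.

```python
import json
from pathlib import Path


class ExternalCacheMap:
    """Maps a Synapse file handle ID to a user-chosen local path (illustrative only)."""

    def __init__(self, map_file, cache_root):
        self.map_file = Path(map_file)
        self.cache_root = Path(cache_root).resolve()
        self.locations = {}
        if self.map_file.exists():
            # Reload entries saved by a previous session.
            self.locations = json.loads(self.map_file.read_text())

    def register(self, file_handle_id, local_path):
        """Record where a file lives; the file itself is not moved or copied."""
        resolved = Path(local_path).resolve()
        if resolved == self.cache_root or self.cache_root in resolved.parents:
            raise ValueError("external cache locations must lie outside the Read Only cache")
        self.locations[str(file_handle_id)] = str(resolved)

    def save(self):
        self.map_file.write_text(json.dumps(self.locations, indent=2))

    def resolve(self, file_handle_id, md5_hex):
        """The external map overrides the default Read Only cache location."""
        external = self.locations.get(str(file_handle_id))
        if external is not None:
            return Path(external)
        return self.cache_root / md5_hex[:3] / md5_hex
```

In this sketch synStore() would call register() followed by save(), and synGet() would call resolve() to decide where to look for the local copy.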

When synGet() is called, the client first retrieves the file handle ID and MD5 from the entity, then checks whether the file already exists locally.  If the download destination is an external cache location and a file is already there, the client computes the local file's MD5 to determine whether it differs from the version in Synapse.  If it differs, the client must either throw an exception or prompt the user to either (1) confirm overwrite or (2) keep the local file.  (If the latter, a subsequent synStore() would overwrite the Synapse version with the local copy.)
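
The decision logic in the preceding paragraph could be sketched as follows. The names fetch_file, Action and LocalFileConflict, and the download and confirm_overwrite callbacks, are assumptions for the example; a real client would plug in its own download routine and user prompt.

```python
import hashlib
from enum import Enum
from pathlib import Path


class Action(Enum):
    DOWNLOAD = "download from Synapse"
    USE_LOCAL = "keep the local copy"


class LocalFileConflict(Exception):
    """Raised when the local file differs from Synapse and nobody can be asked."""


def file_md5(path, chunk_size=1 << 20):
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_file(destination, synapse_md5, download, confirm_overwrite=None):
    """Decide whether to download, and return the action that was taken.

    download(destination) performs the transfer; confirm_overwrite(destination)
    returns True to overwrite the local file and False to keep it.
    """
    destination = Path(destination)
    if not destination.exists():
        action = Action.DOWNLOAD
    elif file_md5(destination) == synapse_md5.lower():
        action = Action.USE_LOCAL          # already up to date, skip the download
    elif confirm_overwrite is None:
        raise LocalFileConflict(f"{destination} differs from the version in Synapse")
    else:
        action = Action.DOWNLOAD if confirm_overwrite(destination) else Action.USE_LOCAL
    if action is Action.DOWNLOAD:
        download(destination)
    return action
```

Passing no confirm_overwrite callback corresponds to the "throw an exception" option, which is probably the safer default for non-interactive scripts.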

Note:  This architecture supports a recursive synGet() function in which a Folder and all its children are retrieved.  The top level folder would be specified as an external cache location, and the children would be placed in folders/files following the Synapse hierarchy.  The user would not be permitted to modify the folder hierarchy, but files could be edited and then the tree persisted with a recursive synSet().

 

TODO:  should an 'unzip' convenience function be provided and, if so, where should the unzipped files go by default?  My thought is that we should move away from zipping/unzipping behind the scenes.