This is the specification of the command set for Synapse clients.  The goal is to align the command sets for clients in different languages to ease users' transitions between languages.  Additionally, we define the organization of the file cache so that the various clients arrange local copies of files consistently.

Note:  This effort to create common command sets between R and Python has been subsumed by the creation of the "synapser" R package which wraps the Python client, automatically generating matching commands.  Nevertheless this document serves as a valuable reference for the core methods and especially the design of the client file cache mechanism.


Synapse File Management

The analytical clients provide for client-side caching of Synapse files, to avoid unnecessarily retrieving (large or many) files that are already available locally.  When a file is uploaded or downloaded the client keeps track of the location along with information to determine if it is later changed.  Specifically, the client maintains a "Cache Map" whose keys are Synapse FileHandle IDs and whose values are lists of local file locations.  Each file location has (1) a path on the local file system, and (2) a UTC 'last modified' time.  The use of this map is as follows:

Case: synStore is called to upload a new file to Synapse.

Action: A new entry is made in the Cache Map.

Case: synStore is called to upload a File object which has already been uploaded to Synapse (i.e. same name, Synapse ID, file path, parent, etc.)

Action:

  1. The associated 'last modified' time in the File Cache is compared to the 'last modified' time for the file.
  2. If the timestamps are the same no upload occurs.
  3. Otherwise the file is uploaded (generating a new FileHandle ID).
  4. A new Cache Map entry is created with the FileHandle ID and timestamp.

The old entry is left in place, since some other in-memory File object may reference the same local file.

Case: synGet is called for a File object which has not been downloaded locally.

Action:

  1. The File metadata are retrieved, including the FileHandle ID.
  2. Since there is no entry in the Cache Map, the file is downloaded and an entry made in the Cache Map.

Case: synGet is called for a File object which has been downloaded locally with a different target location.

Action:

  1. The File metadata are retrieved, including the FileHandle ID.
  2. An entry is found in the Cache Map for the given FileHandle ID, but not for the given location.
  3. If any currently downloaded file in the Cache Map for the FileHandle ID has an unchanged 'last modified' timestamp, it is copied to the new location.
  4. Otherwise, the file is downloaded from Synapse to the new location.
  5. A new Cache Map entry is created for the new file.

We do NOT make the new File object point to the cached file, since unexpected behavior would result when multiple File objects modify the same on-disk file.

Case: synGet is called for a File object which has been downloaded locally with the same target location.

Action:

  1. The File metadata are retrieved, including the FileHandle ID.
  2. An entry is found in the Cache Map for the given FileHandle ID and location.
  3. If the cached timestamp matches the 'last modified' timestamp for the file, no download occurs.
  4. If the local file is *missing*, then the file is downloaded.
  5. Otherwise, the action depends on the "ifcollision" mode specified for synGet:
      1. "overwrite.local":  The file is downloaded to the target location and the Cache Map entry is updated with a new timestamp;
      2. "keep.local":  No download occurs.  The File references the locally modified file at the given location;
      3. "keep.both":  The file is downloaded to the target location, but given a modified local file name.  A second entry for the FileHandle ID is made in the Cache Map.
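The synGet cases above can be summarized as a small decision function. This is only a sketch under stated assumptions, not the clients' implementation: the name decide_get_action, the in-memory shape of the Cache Map, and the returned labels are all hypothetical, and current_mtime is an injected callable that returns a file's present 'last modified' time (or None if the file is missing).

```python
from typing import Callable, Dict, Optional

# Hypothetical in-memory Cache Map: FileHandle ID -> {local path: cached 'last modified' time}
CacheMap = Dict[str, Dict[str, float]]


def decide_get_action(
    cache_map: CacheMap,
    file_handle_id: str,
    target_path: str,
    current_mtime: Callable[[str], Optional[float]],
    ifcollision: str = "keep.both",
) -> str:
    """Return a label describing what synGet would do; performs no I/O itself."""
    entries = cache_map.get(file_handle_id, {})
    if not entries:
        # Never downloaded locally: download and record a new entry.
        return "download"
    if target_path not in entries:
        # Downloaded before, but to a different location: reuse an unchanged copy if any.
        for path, cached in entries.items():
            if current_mtime(path) == cached:
                return "copy from " + path
        return "download"
    actual = current_mtime(target_path)
    if actual is None:
        # The cached file has gone missing.
        return "download"
    if actual == entries[target_path]:
        # The cached file is unchanged.
        return "no download"
    # The local file was modified: defer to the collision mode.
    modes = {
        "overwrite.local": "download and overwrite",
        "keep.local": "keep local file",
        "keep.both": "download under a modified name",
    }
    if ifcollision not in modes:
        raise ValueError("unknown ifcollision mode: " + ifcollision)
    return modes[ifcollision]
```

Keeping the decision separate from the download and copy steps makes each row of the table easy to check in isolation.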

    Cache Location

    When a file is downloaded, specifying the file location is optional.  By default, the file is placed in a 'cache folder' along with a Cache Map file.

    The organization of the file cache is:

    CACHE_ROOT/[Intermediate Folder]/[File Handle ID]/[File Name]

    where:

    • CACHE_ROOT is user configurable and defaults to ~/.synapseCache
    • [Intermediate Folder] is the [File Handle ID] mod 1000.  This extra level is to reduce fan-out when the number of downloaded files is much greater than 1000.
    • [File Handle ID] is the S3 file ID used to upload/download the file
    • [File Name] is the file name given by the file handle.  If there is a collision and "ifcollision" is "keep.both", then the name is modified by appending a number (e.g. file.txt may become file(1).txt)
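The cache layout and the "keep.both" renaming rule can be illustrated with a short sketch. The function names cache_folder and keep_both_name are hypothetical, and the sketch assumes the FileHandle ID is available as an integer. It joins paths with "/" and does not expand "~", which a real client would need to do before touching the disk.

```python
import posixpath
from typing import Iterable


def cache_folder(file_handle_id: int, cache_root: str = "~/.synapseCache") -> str:
    """Build CACHE_ROOT/[Intermediate Folder]/[File Handle ID]."""
    intermediate = file_handle_id % 1000  # extra level that limits fan-out
    return posixpath.join(cache_root, str(intermediate), str(file_handle_id))


def keep_both_name(file_name: str, existing: Iterable[str]) -> str:
    """Append (1), (2), ... before the extension until the name is unused."""
    taken = set(existing)
    if file_name not in taken:
        return file_name
    stem, extension = posixpath.splitext(file_name)
    counter = 1
    while f"{stem}({counter}){extension}" in taken:
        counter += 1
    return f"{stem}({counter}){extension}"
```

For instance, under these assumptions a FileHandle ID of 1234567 with cache root /data/cache maps to /data/cache/567/1234567.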

    For older types of Synapse Entities, namely Locationables (i.e. Data and Code objects), the cache folder is:

    CACHE_ROOT/[Synapse ID]/[Version Number]/[File Name]

    where:

    •  CACHE_ROOT and [File Name] are as defined above
    • [Synapse ID] is the "syn"-prefixed ID of the object
    • [Version Number] is the version downloaded

    Cache Map Design

    There is a file for each Synapse FileHandle ID that has been downloaded or uploaded.  The file is named .cacheMap and is located in the cache folder (above) at the same level as [File Name].

    The file contains the location and last-modified timestamp of each downloaded or uploaded file.  The data is stored in a JSON map whose keys are file paths and whose values are timestamps, e.g.

    Code Block
    {
     "/path/to/file.txt": "2013-03-14T15:09:26.000Z",
     "/alt/folder/file.txt": "2013-04-06T15:36:41.000Z"
    }

    Note: the timestamp is in ISO8601 format ("%Y-%m-%dT%H:%M:%S.000Z"), in UTC (aka "Zulu") time zone.

    Note: the file separator is "/" regardless of the platform.  This means that 'native' path strings on Windows must be normalized to use "/" before being written into a cache-map file.
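The two notes above can be sketched as follows. The helper names are hypothetical, and truncating to whole seconds is an assumption suggested by the fixed ".000Z" suffix in the format string rather than something the notes state.

```python
import json
from datetime import datetime, timezone
from typing import Dict


def to_cache_timestamp(epoch_seconds: float) -> str:
    """Format a modification time as ISO8601 in UTC, with a fixed .000 millisecond part."""
    moment = datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def normalize_path(native_path: str) -> str:
    """Use "/" as the separator regardless of platform."""
    return native_path.replace("\\", "/")


def render_cache_map(mtimes: Dict[str, float]) -> str:
    """Serialize {native path: epoch seconds} as the JSON stored in a cache-map file."""
    entries = {normalize_path(path): to_cache_timestamp(mtime) for path, mtime in mtimes.items()}
    return json.dumps(entries, indent=1, sort_keys=True)
```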

    File locking

    Problem: Clients may concurrently access files in the same file cache.  Only one client may access a cache map at a time, to prevent multiple processes from overwriting or mangling data.

    A client must "lock" a .cacheMap file before accessing it.  Opening a file with write access does not lock it on all platforms, therefore we use the following convention, which all clients must follow: 

    • To lock a file <filename> a client must successfully create an (empty) folder in the file's parent folder and named <filename>.lock.  (Folder creation is atomic and exclusive across platforms.  This lock folder acts as a mutex object.)  
    • If the client is successful, it has exclusive access for ten (10) seconds from the time the lock folder is created.  
    • When the creation time stamp is older than ten seconds, any client may delete the lock folder and the client which has created the folder is obligated to refrain from accessing <filename> unless it locks the file again.  The time limit prevents stale locks from blocking other clients.  
    • If the client cannot obtain a lock within seventy (70) seconds, it should throw an appropriate exception.
    • The client creating the lock should delete the lock folder when it's done accessing <filename>.
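The locking convention above might be sketched like this. It is an illustration under assumptions, not a reference implementation: the function names and the polling interval are hypothetical, and the lock folder's modification time is used as a stand-in for its creation time, which is not portably available.

```python
import os
import time

LOCK_MAX_AGE_SECONDS = 10   # a lock older than this is stale and may be deleted by anyone
LOCK_TIMEOUT_SECONDS = 70   # give up if no lock can be obtained within this time


def acquire_lock(filename: str,
                 timeout: float = LOCK_TIMEOUT_SECONDS,
                 max_age: float = LOCK_MAX_AGE_SECONDS,
                 poll_interval: float = 0.1) -> str:
    """Create <filename>.lock as a folder and return its path, or raise TimeoutError."""
    lock_dir = filename + ".lock"
    deadline = time.time() + timeout
    while True:
        try:
            os.mkdir(lock_dir)  # atomic and exclusive: fails if the folder exists
            return lock_dir
        except FileExistsError:
            pass
        try:
            if time.time() - os.path.getmtime(lock_dir) > max_age:
                os.rmdir(lock_dir)  # stale lock: remove it and retry immediately
                continue
        except OSError:
            pass  # the lock folder vanished or changed under us; fall through and retry
        if time.time() >= deadline:
            raise TimeoutError("could not lock " + filename)
        time.sleep(poll_interval)


def release_lock(lock_dir: str) -> None:
    """Delete the lock folder; tolerate another client having removed it as stale."""
    try:
        os.rmdir(lock_dir)
    except FileNotFoundError:
        pass
```

A caller holding the lock should finish its read or write of <filename> well within the ten-second window and then call release_lock.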


    File Usage Examples

    In each example, we have a project in which the File will reside:

    Code Block
    project<-Project(name="myproject")
    # 'synStore' will either create the project or retrieve it if it already exists
    project<-synStore(project)
    pid<-propertyValue(project, "id")

    Example 1: Create File entity wrapping local file, save to Synapse, retrieve, and save again

    Code Block
    file <- File(path="~/myproject/genotypedata.csv", name="genotypedata", parentId=pid)
    # 'synStore' will upload the file to Synapse
    # locally we record that the uploaded file is available at ~/myproject/genotypedata.csv
    file <- synStore(file)
    # we can get the ID of the file in Synapse
    fileId <- propertyValue(file, "id")
    # ----- Now assume a new session (perhaps a different user)
    # at first we have only the Synapse file ID
    fileId <- "synXXXXX"
    file <-synGet(fileId)
    # client recognizes that local copy exists, avoiding repeated download
    getFileLocation(file)
    > "~/myproject/genotypedata.csv"
    # now change something, e.g. add an annotation...
    synAnnot(file, "data type")<-"genotype"
    # ... and save.  the client determines that the file is unchanged so does not upload again
    file <-synStore(file)
    # we can also download to a specific location
    fileCopy<-synGet(fileId, downloadLocation="~/scratch/")
    getFileLocation(fileCopy)
    > "~/scratch/genotypedata.csv"
    # we now have two copies on the local file system


    Example 2: Link to File on web, then download

    Code Block
    # we use 'synapseStore=F' to indicate that we only wish to link
    file <- File(path="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE1nnn/GSE1000/matrix/GSExxxx_RAW.tar", synapseStore=F, name="genotypedata", parentId=pid)
    # Synapse stores the metadata, but does not upload the file
    file <- synStore(file)
    # we can get the ID of the file in Synapse
    fileId <- propertyValue(file, "id")
    # synGet downloads the file to a default location
    file <-synGet(fileId)
    getFileLocation(file)
    > "~/.synapseCache/GSExxxx_RAW.tar"
    # now change the meta data and save
    synAnnot(file, "data type")<-"gene expression"
    # synStore does not upload the file
    file<-synStore(file)

    Example 3: Lose session after editing file

    Code Block
    codeFileId <- "synXXXXX"
    codeFile <-synGet(codeFileId, load=F)
    getFileLocation(codeFile)
    > "~/.synapseCache/rScript.R"
    # the file is edited
    # the session is lost
    # a new session begins
    codeFileId <- "synXXXXX"
    codeFile <-synGet(codeFileId, load=F, ifcollision="keep.local")
    # The File object now refers to the edited file
    # synStore detects that the file is changed and uploads it
    codeFile <-synStore(codeFile)

    Example 4: link to file on NFS

    Code Block
    # we use 'synapseStore=F' to indicate that we only wish to link
    file <- File(path="file:///corporatenfs/sharedproject/genotypedata.csv", synapseStore=F, name="genotypedata", parentId=pid)
    # Synapse stores the metadata, but does not upload the file
    file <- synStore(file)
    
    
    # Now assume a new session, perhaps by a different user
    fileId<-"synXXXXX"
    # we use 'downloadFile=F' to indicate that we do not need a new copy on our local disk
    file <-synGet(fileId, downloadFile=F)
    getFileLocation(file)
    > "/corporatenfs/sharedproject/genotypedata.csv"
    # now change the meta data and save
    synAnnot(file, "data type")<-"SNP"
    # since the File was created with "synapseStore=F", synStore does not upload the file
    file<-synStore(file)
    
    


    Examples Provenance

    Provenance can be handled either implicitly by creating it when something is saved/created in Synapse or explicitly by modifying the provenance record of something already in Synapse.  For most use cases implicitly creating the provenance record should be enough.

    Example 1: Creating a provenance record referencing things in Synapse (assuming already logged in and syn is a synapse object in Python)

    Provenance records in Synapse can reference files stored in Synapse by specifying either the Synapse IDs or the entities themselves.
    Let's create a file that was generated using syn445865 and syn1446185 as input and store it in project syn123:

    Code Block (Python)
    myFile = syn.store(File("/path/to/file", parentId="syn123"), used=['syn445865', 'syn1446185'])


    Code Block (R)
    myFile <- synStore(File("/path/to/file", parentId="syn123"), used=list('syn445865', 'syn1446185'))


    Let's store some code that was executed to generate another file, and reference this code as having been executed:

    Code Block (Python)
    myCode = syn.store(File("/path/to/script.py", parentId="syn123"))
    myFile = syn.store(File("/path/to/file", parentId="syn123"), used=['syn445865', 'syn1446185'], executed=myCode)


    Code Block (R)
    myCode <- synStore(File("/path/to/script.R", parentId="syn123"))
    myFile <- synStore(File("/path/to/file", parentId="syn123"), used=list('syn445865', 'syn1446185'), executed=myCode)


    To record what activity was performed, specify the name and description of the activity:

    Code Block (Python)
    myFile = syn.store(File("/path/to/file", parentId="syn123"), used=['syn445865', 'syn1446185'], 
                       activityName="Manual editing of file", 
                       activityDescription="Corrected spelling of variable names")


    Code Block (R)
    myFile <- synStore(File("/path/to/file", parentId="syn123"), used=list('syn445865', 'syn1446185'), 
                       activityName="Manual editing of file", 
                       activityDescription="Corrected spelling of variable names")


    Example 2: Creating a provenance record referencing files stored on the web

    The provenance record can also store references to datasets and information stored at external URLs not in Synapse.  In this case the URL is passed in the 'used' list, alongside any Synapse IDs.

    Code Block (Python)
    myFile = syn.store(File("/path/to/file", parentId="syn123"), used=['syn445865', 'http://www.google.com'], activityName="Updated dbgap ids")


    Code Block (R)
    myFile <- synStore(File("/path/to/file", parentId="syn123"), 
                       used=list('syn445865', 'http://www.google.com'), 
                       activityName="Updated dbgap ids")

    Example Table Interaction

    User creates a new Table:

    Ideally there would be some way of passing a csv/tsv file to create a table:

    Code Block (Python)
    table = syn.createTableFromCSV(Table('path/to/file', parent='syn1231')) 

    but as a first pass it may make more sense to create an interface that mimics the REST API, which we can later build on to create a convenience function for this expanded functionality.

    The construction of a Table consists of creating column models by performing a POST on /column which returns a ColumnModel and then creating the entity by posting a TableEntity to /entity/.  The Table entity should contain a list of column ids obtainable from the list of columnModels.
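The two-step construction described above can be sketched as follows. This is a minimal illustration, not the client's implementation: post is a hypothetical callable standing in for an authenticated HTTP POST that returns the decoded JSON response, and the body field names (id, name, parentId, columnIds) are assumptions made for this sketch.

```python
from typing import Any, Callable, Dict, List

JsonObject = Dict[str, Any]


def create_table(post: Callable[[str, JsonObject], JsonObject],
                 name: str,
                 parent_id: str,
                 columns: List[JsonObject]) -> JsonObject:
    """Create the column models first, then the table entity that lists their ids."""
    # Step 1: POST each column definition to /column; each response is a ColumnModel.
    column_models = [post("/column", column) for column in columns]
    # Step 2: POST a TableEntity whose column ids come from the returned ColumnModels.
    table_entity = {
        "name": name,
        "parentId": parent_id,
        "columnIds": [model["id"] for model in column_models],
    }
    return post("/entity", table_entity)
```

Injecting post keeps the sketch independent of any particular HTTP library and lets the ordering of the calls be checked with a stub.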


    Code Block (Python)
    # Create a local representation of a table
    table = synapseclient.Table(name='foobar', parent='syn123',
                                columns=[{'columnType': 'int', 'name': 'age'},
                                         {'columnType': 'string', 'name': 'gender', 'enumValues': ['m', 'f']}])
    table = syn.store(table)

    We might want to eventually make convenience functions for storing columnModels locally or an easier way of representing them.

    User Requests data from a table using a query

    Code Block (Python)
    # A query always returns a Table object that the user can extract the data from
    table = syn.queryTable('syn12312', 'select *')
    df = table.as_df() # Returns the data as a pandas DataFrame
    array = table.as_matrix() # Returns a 2D numpy array
    lists = table.as_dict() # Returns a dict of lists (this would be a generic python solution that is not dependent on external libraries)
     
    # The user can now make modifications or additions to the table, but storing the changes/additions will require adding a df/array/lists back to the table

    User Adds Data to existing Table

    Code Block (Python)
    # Assume we have a local df extracted from a table above
    df.loc[len(df)] = new_row  # new_row: a list with one value per column (assumes the default integer index)
    df.iloc[1, 1] = 'bar'
     
    table.from_df(df)
    table = syn.store(table)

    Other Table Ideas

    A key feature of a table is that a user can understand what it contains.  I would suggest a function:

    syn.describeTable(id|Table), which outputs the information about the column models.  Also, passing details=true could return information about the size of the table by running a query "select count", and information about the values in columns by doing "select unique(c)" for each string column c and perhaps "select min(c), max(c)" for numeric columns.



    Command Set

    We conceptually divide the client commands into three levels: (1) Common functions, (2) Advanced functions and (3) low-level Web API functions.  The first collection of commands captures the majority of functionality of interest to users.  The second collection rounds out the functionality with less frequently used functions.  The third set comprises simple, low-level wrappers around the Synapse web service interface.  By including this third set, users can access web services in advance of having specialized commands in the analytic clients.


    Command

    comments

    R Syntax

    Python Syntax

    Command Line Syntax

    1 – Common functions





    Create a Synapse file handle in memory, specifying the path to the file in the local file system, the name in Synapse, and the Folder in Synapse.  This step 'stages' a file to be sent to Synapse. Additional parameters (...) are interpreted as properties or annotations, in the manner of synSet(), defined below.  If 'synapseStore' is TRUE then file is uploaded to S3, else only the file location is saved.

    Note:  synapseStore=T is not allowed if "path" is a URL rather than a local file path.

    The specified file doesn't move or get copied.

    File(path, parentId, synapseStore=T, ...)


    example:

    File(path="/path/to/file", parentId="syn101")

    File(path, parentId, synapseStore=True, **kwargs)


    example:

    File('/foo/baz/bar.txt', 'syn123')

    NA

    Create a Synapse file handle in memory which will hold a serialized version of one or more in-memory objects.  Additional parameters (...) are interpreted as properties or annotations, in the manner of synSet(), defined below. If 'synapseStore' is TRUE then file is uploaded to S3, else only the file location is saved.

    The object is not serialized at this time. 

    (We are hoping people will like calling the object a File, even though it's a collection of in-memory objects.)

    File(parentId)


    example:

    file<-File(parentId="syn101")

    file<-addObject(file, obj)


    Will not be implemented in python.

    NA

    Create a Folder or Project in memory. Name and parentId are optional.


    Folder(name=NULL, parentId=NULL, ...)

    Project(name=NULL, ...)

    example:
    Folder(name="foo", parentId="syn101")

    Folder(name="foo", parentId="syn101", **kwargs)

    Project(name="foo", **kwargs)


    Set an entity's attribute (property or annotation) in memory.  Client first checks properties, then goes to annotations; (setting to NULL deletes it in R, using DEL operator in python deletes it)

    TODO:  we want to include files and (for R) in memory objects

    synAnnot(entity, name)<-value

    entity.parentId="syn101"

    synapse update id --parentId syn101

    Gets an entity's attribute value (property or annotation) from the object already in memory.


    synAnnot(entity, name); returns NULL if undefined

    entity.name; throws exception if value is undefined


    Create or update an entity (File, Folder, etc.) in Synapse.  May also specify (1) the list of entities 'used' to generate this one, (2) the list of entities 'executed' to generate this one, (3) the name of the generation activity, (4) the description of the generation activity, (5) whether a name collision in an attempted 'create' should become an 'update', (6) whether to 'force' a new version to be created, and (7) whether the data is restricted (which will put a download 'lock' on the data and contact the Synapse Access and Compliance team for review).

    TODO:  Give some examples.

    synStore(entity, used=NULL, executed=NULL, activityName=NULL, activityDescription=NULL, createOrUpdate=T, forceVersion=T, isRestricted=F)



    synStore(entity, activity=NULL, createOrUpdate=T, forceVersion=T, isRestricted=F)

    synapse.store(entity, used, executed, activityName=None, activityDescription=None, createOrUpdate=True, forceVersion=True, isRestricted=False)


    synapse.store(entity, activity, createOrUpdate=True, forceVersion=True, isRestricted=False)

    synapse create --name NAME --parentid PARENTID --description DESCRIPTION

    --type TYPE

    --file PATH

    --update=T/F

    --forceVersion=T/F


    --annotations={foo=bar, bar=foo}

    Delete an object from Synapse.  In the case of entities, move to the trash can.


    synDelete(id)

    synDelete(object)

    synapse.delete(synid), synapse.delete(entity), synapse.delete(wiki), synapse.delete(evaluation), synapse.delete(activity) ##TODO, synapse.delete(submission)


    Get an entity (file, folder, etc.) from the Synapse server, with its attributes (properties, annotations) and, optionally, with its associated file(s).  ifcollision is one of "keep.both", "keep.local", or "overwrite.local", telling the system what to do if a different file is found at the given local file location.

    'download' and 'load' are ignored for objects other than Files.  If a downloadLocation is not provided a default location is used.  Collisions with existing files are handled according to the 'ifcollision' parameter.  Note, 'downloadLocation' must be a directory.

    synGet(id, version, downloadFile=T, downloadLocation=NULL, ifcollision="keep.both", load=F)

    synapse.get(id, version, downloadFile=True, downloadLocation=None, ifcollision="keep.both")

    synapse get ID -v NUMBER

    Get the downloaded location of the file associated with a File object.

    If synGet was called with downloadFile=FALSE, getFileLocation() returns NULL.

    getFileLocation(file)

    getFileURL(file)

    file.path

    file.url

    TODO

    Open the web browser to the page for this entity.


    onWeb(entityId) / onWeb(entity)

    synapse.onweb(entityId) / synapse.onweb(entity)

    synapse onweb id

    log-in

    If fields are omitted, then values are retrieved from the configuration file or from cached API keys.

    synapseLogin(username = "", password = "", sessionToken = "", apiKey = "", rememberMe = F)

    synapseLogin()

    synapse.login(email=None, password=None, sessionToken=None, apiKey=None, rememberMe=False, silent=False)

    synapse.login()

    synapse login -u USER -p PASSWORD

    log-out

    localOnly=T: delete any local copies of sessionToken or apiKey

    localOnly=F: (1) if client has sessionToken, then call "DELETE /session"; (2) do the localOnly part

    synapseLogout(localOnly=F)

    synapse.logout(local=False, clearCache=False)

    synapse logout

    invalidate API key

    invalidate API key

    invalidateAPIKey()

    invalidateAPIKey()


    2 – Advanced functions





    Execute query

    TODO:  pagination, e.g. the function returns an iterator. Look at current implementation in R client.

    synQuery(queryString)

    synapse.query(queryString)

    synapse query

    Find the Entities having attached file(s) which have the given md5.

    Returns an EntityHeader list.

    synMD5Query(md5)

    synapse.md5Query(md5)

    NA

    Retrieve the wiki for an object (Entity or Evaluation)


    synGetWiki(owner)

    synGetWiki(owner, id)

    synapse.getWiki(owner, subpageId)

    Examples:

    synapse.getWiki(entity)

    synapse.getWiki(evaluation, 2342)


    Retrieve wiki headers of evaluation or entity


    synGetWikiHeaders(owner)

    Synapse.getWikiHeaders(owner)

    where owner is an evaluation or entity


    Wiki construction


    WikiPage(owner, title, markdown, attachments)

    WikiPage(owner, title, markdown, attachments, parentWikiId)

    'attachments' is a list of local file paths

    Wiki(owner, title, markdown, attachmentFileHandleIds, parentWikiId=None)




    synStore(wiki)

    synapse.store(Wiki)




    synGetAnnotations()

    synapse.getAnnotations(entity/entityId)




    synSetAnnotations()

    synapse.setAnnotations(entity/entityId, annotations)




    synGetProperties()

    NA

    NA

    Access properties, throwing exception if property is not defined.


    synSetProperties()

    NA

    NA



    synGetAnnotation()





    synSetAnnotation()



    Access property, throwing exception if property is not defined.


    synGetProperty()

    NA

    NA

    Access property, throwing exception if property is not defined. Setting to NULL deletes.


    synSetProperty()

    NA

    NA

    Create an Activity (provenance object) in memory.


    Activity(name, description, used, executed)

    Activity(name, description, used, executed)

    NA

    Set the list of entities/urls 'used' (not 'executed') by an Activity.


    used(activity)<-referenceList

    activity$used<-referenceList



    Set the list of entities/urls 'executed' (not 'used') by an Activity.


    executed(activity)<-referenceList

    activity$executed<-referenceList



    Get the list of entities/urls 'used' (not 'executed') by an Activity.


    used(activity)

    activity$used



    Get the list of entities/urls 'executed' (not 'used') by an Activity.


    executed(activity)

    activity$executed



    Create or update the Activity in Synapse


    synStore(activity)

    synapse.store(Activity)

    NA

    Get the Activity which generated the given entity.


    synGetActivity(entity) / synGetActivity(entityId)

    synapse.getActivity(entity/entityId)

    NA

    Empty trash can





    Restore from trash can





    Create evaluation object


    Evaluation(name, description, status)

    Evaluation(name, description, status, contentSource)

    NA

    Retrieve an Evaluation.


    synGetEvaluation(evaluationId)



    Submit for evaluation

    teamName is used only when the Evaluation queue is part of a Challenge.  teamName is the (unique) name of a Synapse Team, registered for the Challenge.

    submit(evaluation, entity, submissionName, teamName)

    synapse.submit(evaluation, entity, name=None, teamName=None)

    synapse submitEvaluation

    Returns an iterator of submissions


    synGetSubmissions(evaluationId, myown=F, status, limit, offset)

    Synapse.getSubmissions(evaluation, status=None)
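One way to turn limit/offset paging into an iterator is sketched below. This is an illustration only, not the clients' implementation: `iterate_pages` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the web service call.

```python
def iterate_pages(fetch_page, limit=10):
    """Yield items one at a time, requesting `limit` items per call.

    `fetch_page(limit, offset)` is a hypothetical callable that returns a
    list of at most `limit` items starting at `offset`.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        for item in page:
            yield item
        if len(page) < limit:
            break  # a short (or empty) page means there is nothing left
        offset += limit


# Stand-in for the service: 25 fake submission IDs held in memory.
_FAKE_SUBMISSIONS = ["submission-%d" % i for i in range(25)]
calls = []


def fake_fetch(limit, offset):
    """Record each request and return the matching slice of fake data."""
    calls.append((limit, offset))
    return _FAKE_SUBMISSIONS[offset:offset + limit]


submissions = list(iterate_pages(fake_fetch, limit=10))
print(len(submissions), calls)
```

Because the generator requests the next page only when the caller asks for more items, a caller that stops early does not trigger the remaining requests.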


    Get specific submission


    synGetSubmission(id, downloadFile=T, downloadLocation=NULL, ifcollision="keep.both", load=F)

    Synapse.getSubmission(id, downloadFile=True, downloadLocation=None, ifcollision="keep.both")


    Get status of a submission


    synGetSubmissionStatus(id)

    synGetSubmissionStatus(submission)

    Synapse.getSubmissionStatus(submission)


    Get a user profile (own or other's)

    When retrieving one's own profile, all fields are returned.  When retrieving another user's profile, only public fields are returned.

    synGetUserProfile()

    synGetUserProfile(principalId)



    3  Table functions



    3.1  Basic Table functions



    Create a Column definition in memory.

    name and type are required.  default, values and maxSize are optional.

    TableColumn(name, type, default, values, maxSize)

    new table.Column(name, type, default, values, maxSize)

    Store a Column definition in Synapse.

    tableColumn<-synStore(tableColumn)

    synapse.store(tableColumn)

    Create a schema in memory

    'columns' are TableColumn objects or IDs.

    TODO:  Can 'parentId' be any entity or just a project?  If the latter, change from 'parentId', to 'projectId'.

    schema<-TableSchema(name, parentId, columns,  ...)


    schema=new table.Schema(name, parentId, columns, **kwargs)
    TODO:  Add commands to add/remove columns from a table.



    Store the schema to Synapse.

    schema<-synStore(schema)

    schema = synapse.store(schema)


    Retrieve the schema.

    schema<-synGet(id)

    schema=synapse.get(id)

    Delete the table.

    synDelete(schema)

    synDelete(id)

    synapse.delete(schema)

    synapse.delete(id)


    Create a set of rows in memory.

    values may be a data frame or a path to a CSV file. schema is a TableSchema or its ID.

    Returns a Table object, suitable for upload to Synapse.

    t<-Table(tableSchema, values)


    t = table.values(schema, values)


    Store table content in Synapse.

    Stores Table in Synapse.

    If 'retrieveData' is true then return a TableDataFrame, else return the number of uploaded rows.

    If verbose=TRUE (the default), shows a progress bar.

    rs<-synStore(t, retrieveData=FALSE, verbose=TRUE)

    rs = synapse.store(t, retrieveData=False, verbose=True)

    Delete rows from a table.

    table is a TableDataFrame.  The rows indicated by the row labels are deleted.

    synDeleteRows(table)

    synapse.deleteRows(table)

    Query a Table. 

    Result is saved in a local file.  If loadResult is true, it is loaded into memory as well.

    If "verbose" is TRUE then show progress.

    downloadLocation is an optional specification of where the downloaded file should go.  If omitted, the file will be written into the file cache.

    User may specify pagination.


    Note:  data frames returned have row labels of the form "<row>-<version>", e.g. "101-2".  These row labels are required when submitting updates to Synapse.

    queryResult<-synTableQuery(sqlString, loadResult=TRUE, verbose=TRUE, downloadLocation)

    rs = synapse.tableQuery(sqlString, loadResult=True, verbose=True, downloadLocation=None)

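The row labels described in the note above pack a row ID and a row version into one string. A minimal sketch of splitting and rebuilding such labels follows. The helper names are hypothetical, and the "-" separator is taken from the note; the download example later in this document writes the label as "42_2", so the separator is left as a parameter.

```python
def parse_row_label(label, separator="-"):
    """Split a "<row><separator><version>" label into integers (row_id, version)."""
    row_text, found, version_text = label.partition(separator)
    if not found or not row_text.isdecimal() or not version_text.isdecimal():
        raise ValueError("expected '<row>%s<version>', got %r" % (separator, label))
    return int(row_text), int(version_text)


def format_row_label(row_id, version, separator="-"):
    """Build the label for a given row ID and version."""
    return "%d%s%d" % (row_id, separator, version)


row_id, version = parse_row_label("101-2")
print(row_id, version)
print(format_row_label(row_id, version))
```

Keeping the labels intact when a data frame is edited locally is what allows the rows to be matched up again on update.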
    3.2 Table convenience functions:



    Create table columns in memory from data frame.

    columns<-as.TableColumns(dataframe)

    columns = table.asColumns(pandasFrame)


    columns<-as.TableColumns(filePath)

    columns = table.asColumns(filePath)


    Download the file for a given cell (row, column) in a table.

    For R client, 'table' can be an ID or a Table object. 

    The parameters 'downloadLocation', 'ifcollision' and 'load' are defined as in 'synGet' and 'synGetSubmission'.

    filePath<-synDownloadTableFile(table, rowIdAndVersion, columnName, downloadLocation=NULL, ifcollision="keep.both")

    example:

    filePath<-synDownloadTableFile("syn12345", "42_2", "genotype")

    filePath=table.downloadFile(rowIdAndVersion, columnName, downloadLocation=None, ifcollision="keep.both")

    4 – Web API Level functions





    Execute GET request

    See details below.

    synRestGET(uri, endpoint)

    synapse.restGET(uri, endpoint=None)*


    Execute POST request

    See details below.

    synRestPOST(uri, body, endpoint)

    synapse.restPOST(uri, body, endpoint=None)*


    Execute PUT request

    See details below.

    synRestPUT(uri, body, endpoint)

    synapse.restPUT(uri, body, endpoint=None)*


    Execute DELETE request

    See details below.

    synRestDELETE(uri, endpoint)

    synapse.restDELETE(uri, endpoint=None)*


    Get the current set of web service endpoints.


    synGetEndpoints()



    Set the web service endpoints.

    If no arguments are passed, then reset to the default endpoints.

    synSetEndpoints(repo, auth, file, portal)

    synSetEndpoints()
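The "reset when called with no arguments" behaviour can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical Python stand-ins, the default URLs are copied from the example configuration file in this document, and leaving unspecified endpoints unchanged on a partial call is an assumption rather than documented behaviour.

```python
# Assumed defaults, copied from the example configuration in this document.
DEFAULT_ENDPOINTS = {
    "repo": "https://repo-prod.prod.sagebase.org/repo/v1",
    "auth": "https://auth-prod.prod.sagebase.org/auth/v1",
    "file": "https://file-prod.prod.sagebase.org/file/v1",
    "portal": "https://synapse.org/",
}

_current_endpoints = dict(DEFAULT_ENDPOINTS)


def get_endpoints():
    """Return a copy of the endpoints currently in effect."""
    return dict(_current_endpoints)


def set_endpoints(repo=None, auth=None, file=None, portal=None):
    """Override the given endpoints; with no arguments, restore the defaults."""
    given = {
        name: value
        for name, value in (("repo", repo), ("auth", auth),
                            ("file", file), ("portal", portal))
        if value is not None
    }
    if given:
        _current_endpoints.update(given)
    else:
        _current_endpoints.clear()
        _current_endpoints.update(DEFAULT_ENDPOINTS)


set_endpoints(repo="https://repo.example.org/repo/v1")  # hypothetical URL
overridden = get_endpoints()
set_endpoints()  # no arguments: back to the defaults
restored = get_endpoints()
```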








    *The endpoint defaults to repoEndpoint.  It would also be useful to be able to pass arbitrary named arguments through to the underlying HTTP library.  In Python, for example, the stream and file parameters could usefully be passed along to the filehandle requests for GET and PUT.
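The footnote's two ideas, a default endpoint and pass-through of extra named arguments, can be sketched as below. This is not the client's code: `rest_get` and `recording_transport` are hypothetical, and the transport stands in for whatever HTTP library call is actually used.

```python
# Default taken from the repoEndpoint value shown elsewhere in this document.
REPO_ENDPOINT = "https://repo-prod.prod.sagebase.org/repo/v1"


def rest_get(uri, transport, endpoint=None, **kwargs):
    """Build the request URL and forward extra keyword arguments untouched.

    `transport` is a hypothetical callable taking (url, **kwargs); in a real
    client it would be the HTTP library's GET function.
    """
    base = endpoint if endpoint is not None else REPO_ENDPOINT
    url = base.rstrip("/") + uri
    return transport(url, **kwargs)


def recording_transport(url, **kwargs):
    """Stand-in transport that reports what it was asked to do."""
    return {"url": url, "kwargs": kwargs}


result = rest_get("/entity/syn123456", recording_transport, stream=True)
print(result)
```

Forwarding `**kwargs` unchanged keeps the wrapper thin: options the wrapper does not know about still reach the HTTP layer.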

    Endpoints

    At the time of this writing, there are three endpoints for web service calls in our production system:

    ...

    These are used to call the web APIs linked below.

    Web APIs

    The URIs, request bodies and request methods are defined by the Synapse Web APIs.  The URIs omit the endpoints given above, e.g. to retrieve entity metadata the endpoint would be "https://repo-prod.prod.sagebase.org/repo/v1" while the URI might be "/entity/syn123456".  The web APIs define request and response bodies in terms of JSON objects.  In the analytic clients these are expressed as named lists or nested named lists, e.g. in R the JSON object {"foo":"bar", "bas":"bah"} is passed in as list(foo="bar", bas="bah").
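For comparison with the R named-list example, here is a short sketch of the same mapping in Python, assuming the Python client represents JSON objects as dictionaries (nested dictionaries for nested objects). Only the standard json module is used; the nested `outer`/`inner` keys are made up for illustration.

```python
import json

# The JSON object from the text, written as a Python dictionary.
body = {"foo": "bar", "bas": "bah"}

# Serialising for a request body, and parsing a response body back.
encoded = json.dumps(body, sort_keys=True)
decoded = json.loads('{"foo":"bar", "bas":"bah"}')

# Nested JSON objects become nested dictionaries.
nested = json.loads('{"outer": {"inner": 1}}')

print(encoded)
print(decoded == body)
print(nested["outer"]["inner"])
```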

    The Web APIs are defined here:

    Synapse REST APIs

     

    Common Configuration File

    This is a properties file in a standard place that is interpreted upon client initialization.  The location should be private for a user.

    The format will be that of an .ini file (http://en.wikipedia.org/wiki/INI_file).  Although the format is somewhat 'dated', there is a Python parser available:

    http://docs.python.org/2/library/configparser.html

    and an R parsing algorithm has been suggested:

    ...

    Things to specify in the common config file:


    Common Configuration File

    Upon client initialization, the client searches for a configuration file in a standard place.  Specifically, it looks for an INI-formatted '~/.synapseConfig' file.  Parsers are available for both R and Python.

    The following can be specified in the configuration file:

    • Username, password, session token, or API key

    • File cache location (should be private to the user)

    • Endpoints for each of the Synapse services


    Example

    [authentication]
    username = example@user.com
    password = samplePassword
    sessionToken = 1234567890asdfghjkl
    apikey = Some+API+key+retrieved+from+either+the+web+portal+or+via+a+REST+GET+call+to+/secretKey==

    [cache]
    location = ~/.synapseCache

    [endpoints]
    repoEndpoint = https://repo-prod.prod.sagebase.org/repo/v1
    authEndpoint = https://auth-prod.prod.sagebase.org/auth/v1
    fileHandleEndpoint = https://file-prod.prod.sagebase.org/file/v1
    portalEndpoint = https://synapse.org/
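A file in this format can be read with Python's standard configparser module. The sketch below parses an abbreviated copy of the example from a string so that it is self-contained; a client would read '~/.synapseConfig' instead. Disabling interpolation and preserving option-name case are choices made for this sketch, not documented client behaviour.

```python
import configparser
import os

# Abbreviated copy of the example configuration shown above.
EXAMPLE_CONFIG = """
[authentication]
username = example@user.com
password = samplePassword

[cache]
location = ~/.synapseCache

[endpoints]
repoEndpoint = https://repo-prod.prod.sagebase.org/repo/v1
"""

# interpolation=None keeps any '%' in a password or key from being interpreted.
parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep option names case-sensitive (e.g. repoEndpoint)
parser.read_string(EXAMPLE_CONFIG)

username = parser.get("authentication", "username")
raw_cache_location = parser.get("cache", "location")
cache_location = os.path.expanduser(raw_cache_location)  # resolve the leading ~
repo_endpoint = parser.get("endpoints", "repoEndpoint")

print(username, cache_location, repo_endpoint)
```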


    Appendix:  Current implementation of the file cache in the R Client:

    • files are cached (metadata used to be cached in entity.json)

    • cache is mix of read/write

    • each entity version has a location within the cache that is based on its URI (e.g. .synapseCache/proddata.sagebase.org/<entityId>/<locationId>/version/<version>)

      • files.json specifies what resides within the archive
      • <fileName> file, which the R Client currently assumes to be a zip (this is immutable by convention until storeEntity is called)  (TODO:  What happens when it is not a zip archive?)
      • <fileName>_unpacked directory within which all unzipped content lives
        • this subdirectory is writable (by convention)
        • re-stores file if not an archive (both as <fileName> and <fileName>_unpacked/<fileName>)
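The directory layout described in the list above can be sketched as path arithmetic. The function names and the sample IDs are hypothetical; the layout itself follows the pattern given in the appendix.

```python
from pathlib import PurePosixPath


def cache_dir(cache_root, host, entity_id, location_id, version):
    """Return <root>/<host>/<entityId>/<locationId>/version/<version>."""
    return (PurePosixPath(cache_root) / host / str(entity_id)
            / str(location_id) / "version" / str(version))


def unpacked_dir(archive_path):
    """Return the sibling "<fileName>_unpacked" directory for an archive."""
    return archive_path.with_name(archive_path.name + "_unpacked")


# Sample values are made up for illustration.
location = cache_dir(".synapseCache", "proddata.sagebase.org",
                     "syn123456", "1001", 3)
archive = location / "data.zip"
unpacked = unpacked_dir(archive)

print(location)
print(unpacked)
```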


    ...