...
- Is dataset "download availability" = (release date != null)?
I think so... not exactly sure of the question here.
- Re "Release Notes: 3", what does it mean to "release"? Can a DS be released more than
once? Does it mean another sample has been added to the data set, that the DS has
been updated, or something else?
Most commonly it probably means a new layer has been added (especially a QC layer); new samples are also possible. Any change at all to the dataset needs to be versioned: a goal is to be able to reproduce analyses, so we have to know exactly what data was available when a particular analysis was run, even if the data set changes later.
Practically, I don't see a large number of versions of a dataset occurring. This will be an infrequent event and many datasets may only ever have a v1 release, but I think we still need to code for it.
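To make the versioning requirement concrete, here is a minimal sketch of one way it could be modeled. This is an assumption for discussion, not a decided design: the class names (DatasetVersion, Dataset, AnalysisRun), the field names, and the sample dates are all hypothetical. The idea is that each release is an immutable record, and an analysis stores the version number it ran against, so later releases cannot change what it saw.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable release of a data set (hypothetical structure)."""
    number: int
    release_date: date
    layers: Tuple[str, ...]
    note: str = ""


@dataclass
class Dataset:
    """A data set plus its ordered release history (hypothetical structure)."""
    name: str
    versions: List[DatasetVersion] = field(default_factory=list)

    def release(self, layers, release_date, note=""):
        # Every change, however small, becomes a new numbered version.
        version = DatasetVersion(len(self.versions) + 1, release_date,
                                 tuple(layers), note)
        self.versions.append(version)
        return version

    def get(self, number):
        # Version numbers start at 1.
        return self.versions[number - 1]


@dataclass(frozen=True)
class AnalysisRun:
    """Records exactly which data set version an analysis used."""
    name: str
    dataset_name: str
    dataset_version: int


# Illustrative data only; the names and dates are made up.
ds = Dataset("ExampleDataset")
v1 = ds.release(["expression"], date(2011, 1, 10), "initial release")
run = AnalysisRun("correlation network", ds.name, v1.number)

# A QC layer arrives later and produces v2; the earlier run still points at v1.
v2 = ds.release(["expression", "qc"], date(2011, 3, 2), "added QC layer")
pinned = ds.get(run.dataset_version)
```

Under this sketch, a dataset that never changes simply has a single v1 entry, so the infrequent case costs almost nothing while still being handled.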
- What is a "contributor" -- Someone who uploads data? someone who analyses the data?
Is it something more restrictive, like the PI of the experiment who generated the data?
What should be the granularity of 'contribution' -- by the sample, the layer, the
data set, some sort of ds revision, ...?
The contributor is the person who provided us the data, most likely the PI of the lab that generated it. It's the person we thank profusely for their contribution and encourage users of the data to cite when they publish work that uses it. In practice, the data is sent to us by some grad student or postdoc in the contributor's lab.
The examples I've seen have one contributor per dataset, though I guess multiple contributors could be possible.
- In the DatasetMyers screen, what does "Modified" mean? Is it the latest date
something changed in the project? If so, what are the things whose modifications
need to be tracked? Do they also need to be versioned?
Something changed in the curated data set layer. We're not in a project context here.
Open Questions:
- are the attributes for a dataset well defined (species, tissue type, ...) or should the list be open ended?
- what does "suggest a project" mean? ("Send Stephen an email"?)
...
- are there any objects that are to be 'versioned' besides data sets, layers, and scripts/algorithms?
- If the start data for a workstream changes late in a workstream's progress, is the change reflected in an update to the current workstream (like changes propagating through a xls sheet) or rather in a new workstream (i.e. the workstream is tied to a particular revision of each input data layer)?
- Are workstreams 'versioned'? If so, what constitutes a revision: a change in algorithm, a change in data, both, ...?
- can you Follow something that's not versioned? What events are there besides revisions?
- can document versioning be delegated to the document collaboration system (Google Apps)?
- how is 'last modified' defined for a workstream?
- In the workstream example, is "Created" really defined for an analysis (e.g. "Correlation Network Analysis") or just for the analysis output ("Network")?
- Is every data set and layer in a project in some workstream, or can a project have data sets and layers that aren't in any workstream?
(My guess is the latter since a project can have new, unused data, and perhaps 'scratch' analyses.)
...
- What are the valid "statuses" for an algorithm, besides "Unpublished"?
- What can be published besides data sets, data layers, and scripts/algorithms?
- what does it mean to "Remove" Network1 or GeneList1?
...