...
- Is dataset "download availability" = (release date != null)?
I think so... not exactly sure of the question here.
- Re "Release Notes: 3", what does it mean to "release", can a DS be released more than
once? Does it mean another sample has been added to the data set, that the DS has
been updated, or something else?
Probably most commonly it means a new layer was added (especially a QC layer); new samples are also possible. Any change at all to the dataset needs to be versioned: since a goal is to be able to reproduce analyses, we have to know exactly what data was available when a particular analysis was run, even if the data set changes later.
Practically, I don't see a large number of versions of a dataset occurring. This will be an infrequent event and many datasets may only have a v1 release, but I think we still need to code for it.
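
A minimal sketch of what "versioning every change" could look like, assuming each release is stored as an immutable snapshot. The names here (DatasetVersion, Dataset, add_version, version_as_of, is_downloadable) are hypothetical and not part of any existing design; the is_downloadable property just encodes the tentative "release date != null" reading from the first question.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    """Immutable snapshot of a dataset at one release (hypothetical model)."""
    number: int
    layers: Tuple[str, ...]
    release_date: Optional[date] = None  # None means not yet released
    notes: str = ""


@dataclass
class Dataset:
    name: str
    versions: List[DatasetVersion] = field(default_factory=list)

    def add_version(self, layers, release_date=None, notes=""):
        """Record any change as a new version; earlier versions are never edited."""
        version = DatasetVersion(
            number=len(self.versions) + 1,
            layers=tuple(layers),
            release_date=release_date,
            notes=notes,
        )
        self.versions.append(version)
        return version

    def version_as_of(self, when):
        """Return the newest version released on or before `when`, or None."""
        released = [
            v for v in self.versions
            if v.release_date is not None and v.release_date <= when
        ]
        return max(released, key=lambda v: v.number, default=None)

    @property
    def is_downloadable(self):
        """Assumed rule: downloadable once at least one version has a release date."""
        return any(v.release_date is not None for v in self.versions)
```

With this shape, an analysis only has to store the dataset name and version number (or its run date) to recover exactly what data it saw, even for datasets that never get past v1.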
- What is a "contributor" -- Someone who uploads data? someone who analyses the data?
Is it something more restrictive, like the PI of the experiment who generated the data?
What should be the granularity of 'contribution' -- by the sample, the layer, the
data set, some sort of ds revision, ...?
The contributor is the person who provided us the data, most likely the PI of the lab that generated it. It's the person we thank profusely for the contribution, and the person we encourage users of the data to cite when they publish work that uses the data. In practice, the data is sent to us by some grad student or postdoc in the contributor's lab.
The examples I've seen have one contributor per dataset, though I guess multiple contributors could be possible.
- In the DatasetMyers screen, what does "Modifed" mean? Is it the latest date
something changed in the project? If so, what are the things whose modifications
need to be tracked? Do they also need to be versioned?
Something changed in the curated data set layer. We're not in a project context here.
Open Questions:
- are data-sets associated with projects? if so, is the association optional?
Can a ds be associated with multiple projects?
(Note the "DatasetMyers" screen says "Projects Using this Dataset" suggesting
there's a "uses" relationship bet. projects and ds's.)
- what does "suggest a dataset" mean?
- do species and tissuetype go in the dataset or in the layer?
same Q for StudySize
that is, is it correct to say that a dataset specifies the
subjects, tissue types, etc., and the layers represent
the assays (GE, GT, sequencing)? Or can different
layers have different subjects, tissue types, phenotypes
and numbers of samples?
- may be multiple diseases, species, tissue types, platforms
- what does it mean for a dataset to be "posted"?
Aren't different layers posted at different times?
- Is the dataset "description" (seen in the Datasets screen) the same as the
dataset "Overview" (seen in the DatasetMyers screen)?
- What does "Data type: Clinical phenotypes" mean as associated with a data *set*?
- should 'posted' and 'curated' time stamps be associated w/ Datasets or
with dataset layers?
- Is dataset "download availability" = (release date != null)?
- Re "Release Notes: 3", what does it mean to "release", can a DS be released more than
once? Does it mean another sample has been added to the data set, that the DS has
been updated, or something else?
- What is a "contributor" -- Someone who uploads data? someone who analyses the data?
Is it something more restrictive, like the PI of the experiment who generated the data?
What should be the granularity of 'contribution' -- by the sample, the layer, the
data set, some sort of ds revision, ...?
- In the DatasetMyers screen, what does "Modifed" mean? Is it the latest date
something changed in the project? If so, what are the things whose modifications
need to be tracked? Do they also need to be versioned?
- what does "suggest a project" mean?
- Is it OK that a project doesn't have the following, but rather inherits it from its data sets?
diseases / study areas
# of followers
last activity
...
- Networks page: The "Project" of a network could be the Project owning the workstream whose step
created the network or, if the network was not created as a workstream step, then a comma-delimited
list of all Projects using the dataset which is the parent of the network.
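
The Networks page rule above could be sketched roughly as follows. This assumes the workstream step in question is the one that created the network, and the function and parameter names (network_project_label, workstream_project, dataset_projects) are made up for illustration only.

```python
from typing import Iterable, Optional


def network_project_label(
    workstream_project: Optional[str],
    dataset_projects: Iterable[str],
) -> str:
    """Return the text to show in the "Project" column for a network.

    workstream_project: name of the Project owning the workstream whose step
        created the network, or None if no workstream step created it.
    dataset_projects: names of all Projects using the network's parent dataset.
    """
    if workstream_project is not None:
        # The network came out of a workstream step: show the owning Project.
        return workstream_project
    # Otherwise fall back to every Project that uses the parent dataset.
    return ", ".join(dataset_projects)
```

Note that the fallback yields an empty string when no Project uses the parent dataset; whether that case can occur depends on the open question of whether the dataset/project association is optional.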
- Should the Platform have "Scripts" as a generalization of "algorithm", or are all scripts that can process
data considered algorithms?
- What objects (if any, so far) need versioning?
- What are the types of access, e.g. is there 'read-only' access to a resource for some groups and read-write for others?
is there a separate permission to 'publish'?
- What are the possible statuses for a dataset?