Dear Colleagues,
Please see my specific responses inline below. In addition, I recommend
that the GLIS work with the new Planteome <http://planteome.org/> project,
which will be developing a number of ontologies relevant to PGRFAs.
Ramona
------------------------------------------------------
Ramona L. Walls, Ph.D.
Scientific Analyst, The iPlant Collaborative, University of Arizona
Research Associate, Bio5 Institute, University of Arizona
Laboratory Research Associate, New York Botanical Garden
On Tue, Mar 10, 2015 at 12:43 PM, Marsella Marco wrote:
> Dear colleagues,
>
> I would like to add a few comments, hoping to further clarify the points
> in Francisco’s email that we think should be addressed when discussing the
> metadata structure to associate to the DOIs assigned by GLIS.
>
> a) accurate identification of the PGRFA being assigned a DOI is the first
> obvious need we have. A good metadata description will allow GLIS to
> perform automatic checks and flag, as potential duplicates, PGRFA that have
> the same values in all or some of the metadata fields. Which fields these
> are will be one of the discussion topics
>
A controlled vocabulary is essential here, not only for disambiguating
genetic resources, but for clarifying what types of entities should receive
a DOI as a PGRFA.
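
As a rough illustration of the automatic duplicate check Marco describes in point a, here is a minimal sketch. The field names (holder, accession_number, genus, species) and the sample records are hypothetical, not an agreed GLIS schema, and the normalization only stands in for what a real controlled vocabulary would provide.

```python
from collections import defaultdict

# Hypothetical field names, loosely modeled on passport-style descriptors.
KEY_FIELDS = ("holder", "accession_number", "genus", "species")


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences do not hide a match."""
    return " ".join(str(value).lower().split())


def flag_potential_duplicates(records, fields=KEY_FIELDS):
    """Return lists of record ids whose values agree on every field in `fields`."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalize(record.get(field, "")) for field in fields)
        groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Made-up sample records for illustration only.
records = [
    {"id": "rec-1", "holder": "GB01", "accession_number": "ACC 123",
     "genus": "Oryza", "species": "sativa"},
    {"id": "rec-2", "holder": "gb01", "accession_number": "acc  123",
     "genus": "Oryza", "species": "sativa"},
    {"id": "rec-3", "holder": "GB02", "accession_number": "ACC 123",
     "genus": "Oryza", "species": "sativa"},
]

all_fields_match = flag_potential_duplicates(records)
some_fields_match = flag_potential_duplicates(
    records, fields=("accession_number", "genus", "species")
)
```

Matching on all four fields flags only the first two records, while matching on a subset also pulls in the third, which is why the choice of fields matters so much.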
>
> b) a good metadata description will allow users to query GLIS using any of
> the metadata fields (e.g. the Accession Number or the genus/species) and
> receive a list of one or more PGRFA together with their DOIs
>
> c) as already mentioned during W1 and W2, PGRFA are not standing still,
> they change and interact over time. We need to find a way to model such
> changes and interactions
>
A sound semantic model (ontology) will be crucial for this. The Plant
Ontology and Gene Ontology can cover development stages and biological
processes. The Population and Community Ontology is under development and
could also be useful. However, existing ontologies in their present form
are unlikely to include all terms needed for this effort, and the PGRFA
community will need to contribute to ontology development.
>
> d) allowing users to associate new targets (e.g. URLs for websites or DOIs
> for publications) to existing PGRFA is considered an important feature of
> GLIS. This way, the landing page of any PGRFA registered in GLIS will list
> all destinations where additional information on the PGRFA can be located.
> Such destinations can be further specified through some simple metadata
> attributes to indicate, for instance, that they provide details on C&E
> data, offer DNA sequencing information, point to publications and so on.
> This applies to responses to client applications as well
>
Metadata attributes are essentially properties that link data, but
properties on their own carry little semantic meaning. I support the use of
metadata properties (e.g., Darwin Core), but without an ontology to clarify
their semantics, data interoperability will be limited.
>
> e) one of the design principles of GLIS is the preservation of existing
> identifiers. We do not want organisations to abandon their own identifiers,
> be they Accession Numbers, DarwinCore triplets, LSIDs, UUIDs, ARKs or
> others. Therefore, we need a way to allocate such existing identifiers in the
> metadata description
>
DOIs allow for simply linking "related resources", but it would be better
to have more explicit ontology classes (e.g., as subclasses of CRID
(central registry identifier) in the Ontology for Biomedical
Investigations) to define how different existing identifiers relate to one
another.
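
To make the contrast with a generic "related resource" link concrete, here is a minimal sketch in which each existing identifier carries an explicit type. The type labels, predicate names, and the DOI are all hypothetical placeholders; they are not terms taken from OBI or any other ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExistingIdentifier:
    kind: str         # hypothetical label standing in for an ontology class
    value: str
    assigned_by: str  # organisation that issued the identifier


def as_statements(doi, identifiers):
    """Return (subject, predicate, object) tuples that type each identifier explicitly."""
    statements = []
    for ident in identifiers:
        node = f"{ident.kind}:{ident.value}"
        statements.append((node, "is_a", ident.kind))
        statements.append((node, "assigned_by", ident.assigned_by))
        statements.append((node, "denotes_same_material_as", doi))
    return statements


# Made-up identifiers and DOI for illustration only.
identifiers = [
    ExistingIdentifier("AccessionNumber", "ACC 123", "Genebank A"),
    ExistingIdentifier("UUID", "123e4567-e89b-12d3-a456-426614174000", "Genebank A"),
]
statements = as_statements("10.0000/example-1", identifiers)
```

With typed identifiers, a client can ask for "the accession number of this PGRFA" rather than sorting through an untyped list of related resources.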
>
> f) We would like to identify a metadata description that can be easily
> converted into different formats to make it accessible to the widest range
> of client applications
>
The description, or list of metadata elements, is independent of the format
and therefore of any conversion. RDF triples as a base format have the
advantage of being lossless, but they don't always scale well. DOI itself
specifies XML as a standard [1], [2], but if GLIS sets up its own Registration
Agency for DOIs, it would use its own base format, which can then be converted
to XML for interoperability with other registries.
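
A minimal sketch of why the element list is independent of the format: the same record is written out as subject-predicate-object triples and as XML, then read back from the XML unchanged. The element names and the DOI are hypothetical; this is not the DataCite or DOI schema.

```python
import xml.etree.ElementTree as ET

# Made-up record; the keys are hypothetical metadata elements.
record = {
    "doi": "10.0000/example-1",
    "genus": "Oryza",
    "species": "sativa",
    "accession_number": "ACC 123",
}


def to_triples(rec, subject_field="doi"):
    """Express each metadata element as a (subject, predicate, object) triple."""
    subject = rec[subject_field]
    return [(subject, key, value) for key, value in rec.items() if key != subject_field]


def to_xml(rec, subject_field="doi"):
    """Serialize the same elements as a flat XML document."""
    root = ET.Element("resource", {"identifier": rec[subject_field]})
    for key, value in rec.items():
        if key != subject_field:
            ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text, subject_field="doi"):
    """Rebuild the record from the XML produced by to_xml."""
    root = ET.fromstring(xml_text)
    rec = {subject_field: root.get("identifier")}
    for child in root:
        rec[child.tag] = child.text
    return rec


triples = to_triples(record)
round_tripped = from_xml(to_xml(record))
```

The round trip only works this cleanly because the record is flat; nested or repeated elements would need more careful mapping.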
<snip>
On 10 Mar 2015, at 10:41, Lopez, Francisco (AGDT) wrote:
Dear colleagues,
We are now opening discussion for week 3 about the metadata
structure associated to each DOI to build a functional Global Information
System. Again, let us focus the discussion on PGRFA only at this stage. The
purpose of such metadata structure is to assist users and client
applications to:
*a)* identify the PGRFA being registered with each new DOI (This week we
can detail the scope of the identifiers);
*b)* search for PGRFA during a discovery operation;
*c)* model PGRFA interactions through a minimum set of relation operators
(This point was already anticipated during Week 1);
>>DataCite provides a set of relational operators [3], and it may be
necessary to use these in the "official" DOI metadata. However, a richer
set of relations could be had through biological ontologies under
development, such as the Relations Ontology [4].
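
A minimal sketch of recording relations between PGRFA with a small subset of the relationType values from the DataCite kernel [3]. The inverse mapping, the helper function, and the DOIs are my own assumptions for illustration; treating IsIdenticalTo as its own inverse is likewise an assumption.

```python
# Subset of DataCite relationType values, each paired with the value
# assumed here to express the inverse direction.
INVERSE = {
    "IsDerivedFrom": "IsSourceOf",
    "IsSourceOf": "IsDerivedFrom",
    "IsPartOf": "HasPart",
    "HasPart": "IsPartOf",
    "IsIdenticalTo": "IsIdenticalTo",
}


def relate(subject_doi, relation_type, object_doi):
    """Return the stated relation plus its inverse, rejecting unknown relation types."""
    if relation_type not in INVERSE:
        raise ValueError(f"unsupported relation type: {relation_type}")
    return [
        (subject_doi, relation_type, object_doi),
        (object_doi, INVERSE[relation_type], subject_doi),
    ]


# Made-up DOIs: a regenerated sample derived from an original accession.
relations = relate("10.0000/example-2", "IsDerivedFrom", "10.0000/example-1")
```

A relation such as "crossed with" has no counterpart in this list, which is where an ontology of relations would be needed.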
*d)* allow the registration of multiple targets associated to each DOI;
>>If I understand correctly, this seems like a resolution issue, which
should already be well worked out by the DOI foundation.
*e)* allow for association of existing identifiers. These can be local or
global, permanent or not, but are commonly used by the entity owner to
refer to the PGRFA like the “accession number”, for example, as discussed
during the previous weeks.
>>See response to Marco's point e above.
*f)* be expressed in a metadata format that can be (or already is) mapped
to some widely accepted standard such as RDF, DWC, Crop Ontology and so on.
>>These are very different types of standards. DWC is available as RDF, and
CO (being an ontology) should be able to be expressed as RDF.
Conversion among formats is fairly straightforward, however, and it is more
important to focus on the content of the metadata (i.e. what elements are
required or recommended). Content should look to existing repositories like
Biosamples [5], MIxS [6], and DWC. Note that MIxS is not currently well
oriented toward plant material (other than as a source for metagenomes),
but some community discussions are underway to create a plant genomes
extension.
Many of you already mentioned that the Multi-Crop Passport Descriptors V.2
<http://www.bioversityinternational.org/uploads/tx_news/FAO-Bioversity_multi_crop_passport_descriptors_V_2_Final_rev_1526.pdf>
could be a good starting point, but that there is the need for some
extension in terms of relational operators, so we can further elaborate on
this.
>>I think that a new standard for plant material will be necessary (see my
comment above) that combines elements of the Passport and MIxS.
[1] http://www.doi.org/doi_handbook/DOI_Schema_Release_Notes.html
[2] http://schema.datacite.org/meta/kernel-3/
[3] http://schema.datacite.org/meta/kernel-3/doc/DataCite-MetadataKernel_v3.1.pdf
[4] https://code.google.com/p/obo-relations/
[5] http://www.ncbi.nlm.nih.gov/biosample/
[6] http://wiki.gensc.org/index.php?title=MIxS