Dear Colleagues,

Please see my specific responses inline below. In addition, I recommend that the GLIS work with the new Planteome project, which will be developing a number of ontologies relevant to PGRFAs.


Ramona

------------------------------------------------------
Ramona L. Walls, Ph.D.
Scientific Analyst, The iPlant Collaborative, University of Arizona
Research Associate, Bio5 Institute, University of Arizona
Laboratory Research Associate, New York Botanical Garden

On Tue, Mar 10, 2015 at 12:43 PM, Marsella Marco <[log in to unmask]> wrote:
Dear colleagues,

I would like to add a few comments, hoping to further clarify the points in Francisco’s email that we think should be addressed when discussing the metadata structure to associate with the DOIs assigned by GLIS.

a) accurate identification of the PGRFA being assigned a DOI is the first obvious need we have. A good metadata description will allow GLIS to perform automatic checks and flag, as potential duplicates, PGRFA that have the same values in all or some of the metadata fields. Which fields these are will be one of the discussion topics.
 
A controlled vocabulary is essential here, not only for disambiguating genetic resources, but for clarifying what types of entities should receive a DOI as a PGRFA.
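
As a rough illustration of the automatic check described in point a), here is a minimal Python sketch that flags records sharing the same values in a chosen set of fields. The field names, the example DOIs, and the normalisation rule are all assumptions made for this example; which fields GLIS would actually compare is still open.

```python
from collections import defaultdict

# Hypothetical field names, chosen for illustration only; which fields
# GLIS actually compares is still an open discussion topic.
KEY_FIELDS = ("holder", "accession_number", "genus", "species")


def normalise(value):
    """Lower-case and collapse whitespace so trivial differences do not hide a match."""
    return " ".join(str(value).lower().split())


def flag_potential_duplicates(records, key_fields=KEY_FIELDS):
    """Group records that share the same normalised values in all key_fields.

    Returns a list of DOI lists, one per group with more than one member.
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalise(record.get(field, "")) for field in key_fields)
        groups[key].append(record["doi"])
    return [dois for dois in groups.values() if len(dois) > 1]


# Made-up records; the DOIs are placeholders, not real registrations.
records = [
    {"doi": "10.9999/example-1", "holder": "GB001", "accession_number": "ACC 123",
     "genus": "Oryza", "species": "sativa"},
    {"doi": "10.9999/example-2", "holder": "GB001", "accession_number": "acc  123",
     "genus": "Oryza", "species": "sativa"},
    {"doi": "10.9999/example-3", "holder": "GB002", "accession_number": "ACC 123",
     "genus": "Oryza", "species": "sativa"},
]

duplicates = flag_potential_duplicates(records)
```

Passing a shorter key_fields tuple (for example only genus and species) widens the net, which is the "all or some of the metadata fields" trade-off: fewer fields catch more candidates but produce more false positives.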

b) a good metadata description will allow users to query GLIS using any of the metadata fields (e.g. the Accession Number or the genus/species) and receive a list of one or more PGRFA together with their DOIs

c) as already mentioned during W1 and W2, PGRFA are not standing still; they change and interact over time. We need to find a way to model such changes and interactions

A sound semantic model (ontology) will be crucial for this. The Plant Ontology and Gene Ontology can cover development stages and biological processes. The Population and Community Ontology is under development and could also be useful. However, existing ontologies in their present form are unlikely to include all terms needed for this effort, and the PGRFA community will need to contribute to ontology development.


d) allowing users to associate new targets (e.g. URLs for websites or DOIs for publications) with existing PGRFA is considered an important feature of GLIS. This way, the landing page of any PGRFA registered in GLIS will list all destinations where additional information on the PGRFA can be located. Such destinations can be further specified through some simple metadata attributes to indicate, for instance, that they provide details on C&E data, offer DNA sequencing information, point to publications and so on. This applies to responses to client applications as well.

Metadata attributes are essentially properties that link data, but properties on their own carry little semantics. I support the use of metadata properties (e.g., Darwin Core), but without an ontology to clarify their semantics, data interoperability will be limited.

e) one of the design principles of GLIS is the preservation of existing identifiers. We do not want organisations to abandon their own identifiers, be they Accession Numbers, DarwinCore triplets, LSIDs, UUIDs, ARKs or anything else. Therefore, we need a way to accommodate such existing identifiers in the metadata description
 
DOIs allow for simply linking "related resources", but it would be better to have more explicit ontology classes (e.g., as subclasses of CRID (central registry identifier) in the Ontology for Biomedical Investigations) to define how different existing identifiers relate to one another.

f) We would like to identify a metadata description that can be easily converted into different formats to make it accessible to the widest range of client applications
 
The description, or list of metadata elements, is independent of the format and therefore of conversion. RDF triples as a base format have the advantage of being lossless, but they don't always scale well. DOI itself specifies XML as a standard [1], [2], but if GLIS sets up its own Registration Agency for DOIs, it would use its own base format, which can then be converted to XML for interoperability with other registries.
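
To make the point about format independence concrete, here is a minimal Python sketch that holds a description as plain (subject, property, value) triples and serialises the same content to XML using only the standard library. The DOI, the property names, and the XML layout are made up for this example; they are not the DOI or DataCite schema.

```python
import xml.etree.ElementTree as ET

# Format-neutral description: (subject, property, value) triples.
# The DOI and property names are hypothetical placeholders.
triples = [
    ("10.9999/example-1", "genus", "Oryza"),
    ("10.9999/example-1", "species", "sativa"),
    ("10.9999/example-1", "accessionNumber", "ACC 123"),
]


def triples_to_dict(triples):
    """Collect the triples into {subject: {property: [values]}}."""
    resources = {}
    for subject, prop, value in triples:
        resources.setdefault(subject, {}).setdefault(prop, []).append(value)
    return resources


def triples_to_xml(triples):
    """Serialise the same triples using a simple, made-up XML layout."""
    root = ET.Element("resources")
    for subject, props in triples_to_dict(triples).items():
        resource = ET.SubElement(root, "resource", {"identifier": subject})
        for prop, values in props.items():
            for value in values:
                ET.SubElement(resource, prop).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = triples_to_xml(triples)
```

The list of elements stays the same whichever serialisation is chosen; only the last function would change for a different target format.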

<snip>
On 10 Mar 2015, at 10:41, Lopez, Francisco (AGDT) <[log in to unmask]> wrote:

Dear colleagues, 
We are now opening discussion for week 3 about the metadata structure associated with each DOI to build a functional Global Information System. Again, let us focus the discussion on PGRFA only at this stage. The purpose of such a metadata structure is to assist users and client applications to:

a) identify the PGRFA being registered with each new DOI (This week we can detail the scope of the identifiers);

b) search for PGRFA during a discovery operation;

c) model PGRFA interactions through a minimum set of relation operators (This point was already anticipated during Week 1);


>>DataCite provides a set of relational operators [3], and it may be necessary to use these in the "official" DOI metadata. However, a richer set of relations could be had through biological ontologies under development, such as the Relation Ontology [4].
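
>>As a rough illustration of what a minimum set of relation operators could look like in practice, the Python sketch below records relations between PGRFA DOIs using a few DataCite relationType values and follows IsDerivedFrom links back through a lineage. The DOIs are placeholders, and the assumption that each PGRFA has at most one recorded source is a simplification made for this example.

```python
# A small subset of DataCite relationType values, paired with their inverses.
INVERSE = {
    "IsDerivedFrom": "IsSourceOf",
    "IsSourceOf": "IsDerivedFrom",
    "IsPartOf": "HasPart",
    "HasPart": "IsPartOf",
}

# Hypothetical statements: (subject DOI, relation, object DOI).
relations = [
    ("10.9999/example-3", "IsDerivedFrom", "10.9999/example-2"),
    ("10.9999/example-2", "IsDerivedFrom", "10.9999/example-1"),
]


def with_inverses(relations):
    """Return the statements plus the inverse of each one, as a set."""
    result = set(relations)
    for subject, relation, obj in relations:
        result.add((obj, INVERSE[relation], subject))
    return result


def lineage(doi, relations):
    """Follow IsDerivedFrom links from doi back to its earliest recorded source.

    Simplification: assumes each DOI has at most one IsDerivedFrom statement.
    """
    parents = {s: o for s, r, o in relations if r == "IsDerivedFrom"}
    chain = []
    seen = {doi}  # guards against accidental cycles in the data
    while doi in parents and parents[doi] not in seen:
        doi = parents[doi]
        chain.append(doi)
        seen.add(doi)
    return chain


sources = lineage("10.9999/example-3", relations)
```

Richer relations (for example, distinguishing regeneration from crossing) would need terms beyond this fixed list, which is where an ontology of relations becomes useful.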
 
d) allow the registration of multiple targets associated to each DOI;

>>If I understand correctly, this seems like a resolution issue, which should already be well worked out by the DOI foundation.

e) allow for association of existing identifiers. These can be local or global, permanent or not, but are commonly used by the entity owner to refer to the PGRFA like the “accession number”, for example, as discussed during the previous weeks.

>>See response to Marco's point e above.

f) be expressed in a metadata format that can be (or already is) mapped to some widely accepted standard such as RDF, DWC, Crop Ontology and so on.

>>These are very different types of standards. DWC is available as RDF, and CO (being an ontology) should be able to be expressed as RDF.
Conversion among formats is fairly straightforward, however, and it is more important to focus on the content of the metadata (i.e. what elements are required or recommended). Content should look to existing repositories and standards like BioSample [5], MIxS [6], and DWC. Note that MIxS is not currently well oriented toward plant material (other than as a source for metagenomes), but some community discussions are underway to create a plant genomes extension.
 
Many of you already mentioned that the Multi-Crop Passport Descriptors V.2 could be a good starting point, but that there is the need for some extension in terms of relational operators, so we can further elaborate on this.

>>I think that a new standard for plant material will be necessary (see my comment above) that combines elements of the Passport and MIxS.
          

 
[1] http://www.doi.org/doi_handbook/DOI_Schema_Release_Notes.html

[2] http://schema.datacite.org/meta/kernel-3/

[3] http://schema.datacite.org/meta/kernel-3/doc/DataCite-MetadataKernel_v3.1.pdf

[4] https://code.google.com/p/obo-relations/

[5] http://www.ncbi.nlm.nih.gov/biosample/

[6] http://wiki.gensc.org/index.php?title=MIxS

