Dear colleagues, Thank you very much for participating in Week 3. This week’s discussion has been very dynamic, and I would like to thank you for your valuable input and recommendations.
The discussion started with six questions, but your comments focused mostly on (a) identification and disambiguation of PGRFA material, (c) modelling PGRFA interactions through a minimum set of relation operators, and (e) accommodating existing identifiers.
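To make point (c) more concrete, here is a minimal Python sketch of typed links between PGRFA identifiers using a small, closed set of relation operators. The three relation names and the `puid:` strings are hypothetical placeholders, not terms agreed in this consultation.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Hypothetical minimal set of relation operators between PGRFA entities."""
    DERIVED_FROM = "derivedFrom"   # e.g. a breeding line selected from an accession
    DUPLICATE_OF = "duplicateOf"   # e.g. a safety duplicate held by another genebank
    PART_OF = "partOf"             # e.g. a seed lot belonging to an accession


@dataclass(frozen=True)
class Link:
    subject: str        # identifier of the entity being described
    relation: Relation
    target: str         # identifier of the related entity


def related(links, subject, relation):
    """Return the targets linked from `subject` by `relation`."""
    return [link.target for link in links
            if link.subject == subject and link.relation is relation]


# Placeholder identifiers, not real PUIDs.
links = [
    Link("puid:B", Relation.DERIVED_FROM, "puid:A"),
    Link("puid:C", Relation.DUPLICATE_OF, "puid:A"),
]
print(related(links, "puid:B", Relation.DERIVED_FROM))  # ['puid:A']
```

Keeping the operators in a closed enumeration means a client can only assert relations that the system knows how to interpret, which is one way to keep the set "minimum".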
Dear all, I think there is wide agreement that the Darwin Core (DwC) germplasm extension, combined with DOIs, will greatly support the functions we listed this week for the Global Information System.
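The germplasm extension rests on mapping MCPD descriptors onto existing Darwin Core terms and keeping the remaining descriptors in an extension. A minimal Python sketch of that split follows; the four mappings are an assumed, illustrative subset (the authoritative correspondences are in the Germplasm Vocabulary), and the record values are hypothetical.

```python
# Assumed, illustrative subset of an MCPD -> Darwin Core mapping.
MCPD_TO_DWC = {
    "INSTCODE": "dwc:institutionCode",
    "ACCENUMB": "dwc:catalogNumber",
    "GENUS": "dwc:genus",
    "SPECIES": "dwc:specificEpithet",
}


def to_darwin_core(mcpd_record):
    """Rename mapped MCPD fields to Darwin Core terms.

    Fields without a Darwin Core equivalent are returned separately, as
    candidates for the germplasm extension.
    """
    core, extension = {}, {}
    for name, value in mcpd_record.items():
        if name in MCPD_TO_DWC:
            core[MCPD_TO_DWC[name]] = value
        else:
            extension[name] = value
    return core, extension


core, ext = to_darwin_core(
    {"INSTCODE": "ABC001", "ACCENUMB": "A-123", "GENUS": "Hordeum",
     "ACCENAME": "Example line"}
)
print(core)  # {'dwc:institutionCode': 'ABC001', 'dwc:catalogNumber': 'A-123', 'dwc:genus': 'Hordeum'}
print(ext)   # {'ACCENAME': 'Example line'}
```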
Over the last few days the discussion has evolved quickly and focused on a minimum set of fields to be registered with the DOIs, following the principle of minimum effort, particularly on the side of the data curator, as indicated by Axel.
Francisco has encouraged me to share the mapping table between the MCPD [1] and the Darwin Core standard [2] as input to the Week 3 discussion on metadata. When this mapping was developed [3], we started with the Darwin Core and created an extension to include the MCPD descriptors not already covered by established Darwin Core terms. The mapping was later developed into a SKOS vocabulary of terms [4]. An overview of this mapping (Germplasm Vocabulary) was also presented at the ECPGR Information and documentation network meeting last year [5] (see slide number
Reading through the requirements, a few items stand out for me.
I would encourage GLIS to adopt RDF for the metadata from the start, as that will avoid the restrictions often encountered with XML-schema-based models (e.g., ISO 19139, EML). I think the approach promoted by Dublin Core for extensible metadata application profiles [1] is worth investigating. RDF should allow you to make relations explicit in the metadata. The use of RDF should also encourage the reuse of existing vocabularies, thus promoting interoperability, and GLIS can benefit from ongoing work in TDWG and elsewhere, highlighted by Dag,
I second Éamonn's suggestion to work closely with existing standards organizations, especially TDWG, GSC (Genomics Standards Consortium), and the OBO Foundry (http://www.obofoundry.org/). Please do not invent another standard or a new ontology in a silo!
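Reusing an existing vocabulary instead of inventing one can be as simple as describing an accession with Dublin Core terms in RDF. The sketch below writes N-Triples by hand so it needs only the Python standard library; `dcterms:identifier` and `dcterms:relation` are real Dublin Core terms, while the DOIs and the accession identifier are placeholders. A production system would more likely use a dedicated RDF library.

```python
DCTERMS = "http://purl.org/dc/terms/"


def triple(subject, predicate, obj, literal=False):
    """Serialize one RDF statement as an N-Triples line."""
    if literal:
        # Escape backslashes first, then quotes and newlines.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        o = f'"{escaped}"'
    else:
        o = f"<{obj}>"
    return f"<{subject}> <{predicate}> {o} ."


# Placeholder DOIs for illustration only.
accession = "https://doi.org/10.0000/example-accession"
parent = "https://doi.org/10.0000/example-parent"

lines = [
    triple(accession, DCTERMS + "identifier", "ABC001:A-123", literal=True),
    triple(accession, DCTERMS + "relation", parent),
]
print("\n".join(lines))
```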
Dear colleagues, in an attempt to keep the discussion practical, I would like to follow up on Dag’s message below. Setting aside the specific ontology for a moment, can we compile a short list of “kernel” fields that:
- are readily available to users who will be registering PGRFA in GLIS, e.g. genebanks
- can be used to distinguish between two different PGRFA
- convey enough information to the querying user or application
- can be marked as mandatory to register a new PGRFA in GLIS
I agree with the proposals below and would add that INSTCODE:ACCENUMB, but also GENUS/SPECIES, mostly provide an acceptable key to germplasm accessions, and what is of utmost importance is the data already available in each genebank.
The identifiers listed below are fine. Regarding the 'kernel' Marco is referring to, in addition to the descriptors above, I would agree to including: Accession name, Donor institute code, Common crop name, Ancestral data (this could help in tracing back its origin) and Acquisition date.
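A kernel like this could be enforced at registration time. The minimal Python sketch below assumes INSTCODE, ACCENUMB and GENUS are mandatory and treats SPECIES plus the fields suggested above (MCPD ACCENAME, DONORCODE, CROPNAME, ANCEST, ACQDATE) as optional; the mandatory/optional split and the sample values are assumptions for illustration.

```python
MANDATORY = ("INSTCODE", "ACCENUMB", "GENUS")
OPTIONAL = ("SPECIES", "ACCENAME", "DONORCODE", "CROPNAME", "ANCEST", "ACQDATE")


def check_kernel(record):
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = [f"missing mandatory field {f}" for f in MANDATORY
                if not record.get(f)]
    unknown = sorted(set(record) - set(MANDATORY) - set(OPTIONAL))
    problems += [f"unknown field {f}" for f in unknown]
    return problems


ok = {"INSTCODE": "ABC001", "ACCENUMB": "A-123", "GENUS": "Hordeum",
      "ACCENAME": "Example line", "ACQDATE": "20010315"}
print(check_kernel(ok))                       # []
print(check_kernel({"INSTCODE": "ABC001"}))
# ['missing mandatory field ACCENUMB', 'missing mandatory field GENUS']
```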
I’m late to the party and I could not find any exchange in the archives about the maintenance of the PUIDs assigned to germplasm.
Once the PUID of an accession has been generated and assigned by GLIS based on some “kernel fields”, who is responsible for maintaining the generated PUID forever (addressing the “Permanent” in PUID)?
As you know, the “Permanent” part of PUID is the most challenging one, because it involves not only technical issues but also, and much more importantly, organisational ones.
However, I believe your question is somehow misstated as there is no unique party "responsible of maintaining the generated PUID forever”, but rather a group of organisations and systems that, together, work to maintain the association between the PUID and the PGRFA entity valid “forever”. In this view, and I agree, some even define the PUID as such association between the entity and the identifier string.
Some of the terminology used in the discussion, such as "ontology", has a very different meaning for someone from a biological background than in the database context. It is very interesting and educational for me to follow the conversations, but I may not understand all the issues. Some abbreviations used in the conversations I do not understand (and I am too lazy to look them up).
No, Marco, you haven’t scratched the surface of my question.
> as there is no unique party "responsible of maintaining the generated PUID forever”, but rather a group of organisations and systems that, together, work to maintain the association between the PUID and the PGRFA entity valid “forever”. In this view, and I agree, some even define the PUID as such association between the entity and the identifier string.
Using MCPD descriptors is a good starting point. In my opinion, the OTHERNUMB descriptor is a very important one. From the EURISCO experience I know that even in well-organised genebanks accession numbers may change more often than we would like. Taxonomic information also changes from time to time. So any other number that is available (e.g. an internal DB identifier, the sample ID of a donor, etc.) would be a great help.
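OTHERNUMB is only useful for matching if it can be parsed reliably. The minimal Python sketch below assumes the MCPD convention as we understand it (pairs separated by `;`, institute code and identifier separated by `:`, and a leading `:` when the institute is unknown); the sample codes and numbers are hypothetical.

```python
def parse_othernumb(value):
    """Split an MCPD OTHERNUMB string into (INSTCODE, identifier) pairs.

    An unknown institute (leading ':') is returned as None.
    """
    pairs = []
    for item in filter(None, (part.strip() for part in value.split(";"))):
        inst, sep, ident = item.partition(":")
        if not sep:              # no colon at all: treat the item as a bare identifier
            inst, ident = "", item
        pairs.append((inst or None, ident))
    return pairs


print(parse_othernumb("ABC001:A-123;XYZ002:9876;:COLL-55"))
# [('ABC001', 'A-123'), ('XYZ002', '9876'), (None, 'COLL-55')]
```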
The valid points made by Matija are of concern. From our genebank's perspective, I must say that the human resources available to invest in PUIs are extremely limited. Perhaps the best option is still to return to the simple solution of combining the Institute code (INSTCODE) and Accession number (ACCENUMB). For genebank accessions, that combination should serve as a PUI. So far, genebank clients have mostly not reported genebank accession numbers in publications, let alone in cultivar descriptions or pedigrees, if available at all. If we could get people to do that,
If a DOI system is adopted, then there is a contractual obligation for the issuing party to maintain a landing page for the DOI and, if the DOI links to some digital object such as a data set, to maintain access to that digital object. I don't know if there are obligations to maintain physical objects for which DOIs have been registered (i.e. plant material), but I am sure that a registration agency such as EZID can answer that question. The registration agency, for its part, is required to maintain the actual identifier. I don't think anyone is going to
My apologies to all for having just sat on the side and not being more active. I personally appreciated the debate as well as the knowledge and experience mix of all the contributors. I have learned a lot.
If it is not too late, I would like to offer a few considerations.
I am only partly convinced of the usefulness of a new PUI or PUID, which could be a DOI, for PGRFA accessions. As recognized, accessions in the global PGRFA community already have a PUI (MCPD: INSTCODE + ACCENUMB, if the standard is applied as defined "ACCENUMB - This is the
Ramona noted in a previous message that: > Yes, the PGR suppliers will need to obtain a PUID and agree to use it, but I thought GLIS was setting up infrastructure to support that.
Later, Stefano reminds us that > … the more data curation remains with the originator without too much hassle the better.
Thanks for the useful table. Basically, I think that INSTCODE:ACCENUMB is enough to 'identify' material among genebanks. If need be, GENUS might be added to reconfirm the uniqueness of the material, just in case some genebank has assigned the same ACCENUMB to different accessions, but I doubt this is likely.
Unfortunately, there are genebanks that "re-use" the same accession number for different genera.
With best regards, Stephan
> -----Original Message----- > From: Global Information System on PGRFA, on behalf of Alercia, Adriana (Bioversity) > Sent: Friday, March 13, 2015 4:18 PM > Subject: Re: Metadata fields
That is true, and one example of re-using the same number (k-...) is the important VIR collection in St Petersburg, but even there accessions will have unique identifiers.
Axel
-----Original Message----- From: Global Information System on PGRFA, on behalf of Stephan Weise Sent: March-13-15 9:42 AM Subject: Re: Metadata fields
Following Axel’s message, these are accessions with INSTCODE=RUS001 and ACCENUMB=10
https://www.genesys-pgr.org/acn/RUS001/10
The Genesys PGR portal, and probably EURISCO, use the genus to differentiate between these accessions.
As for the mandatory fields: INSTCODE, ACCENUMB and GENUS are a must, everything else must be optional.
Note that in the odd case where an accession is reclassified (or taxonomy changes) it may be assigned a different genus and the connection is lost (hence the PUIDs).
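Both points can be illustrated with a small Python sketch: adding GENUS to INSTCODE:ACCENUMB resolves a re-used accession number, but a taxonomic change then alters the key. The records are hypothetical; the genus change shown (tomato moving from *Lycopersicon* to *Solanum*) is simply a well-known example of reclassification.

```python
from collections import defaultdict


def composite_key(rec, with_genus=True):
    """Build an INSTCODE:ACCENUMB key, optionally extended with GENUS."""
    key = f"{rec['INSTCODE']}:{rec['ACCENUMB']}"
    return f"{key}:{rec['GENUS']}" if with_genus else key


def collisions(records, with_genus):
    """Group records whose keys coincide, i.e. where the key fails to discriminate."""
    groups = defaultdict(list)
    for rec in records:
        groups[composite_key(rec, with_genus)].append(rec)
    return {k: v for k, v in groups.items() if len(v) > 1}


# Hypothetical genebank that re-uses accession number 10 in two genera.
records = [
    {"INSTCODE": "ABC001", "ACCENUMB": "10", "GENUS": "Lycopersicon"},
    {"INSTCODE": "ABC001", "ACCENUMB": "10", "GENUS": "Pisum"},
]
print(list(collisions(records, with_genus=False)))  # ['ABC001:10']
print(collisions(records, with_genus=True))         # {}

# Reclassification silently changes the genus-based key.
renamed = {**records[0], "GENUS": "Solanum"}
print(composite_key(records[0]), composite_key(renamed))
# ABC001:10:Lycopersicon ABC001:10:Solanum
```

This is the trade-off behind an opaque PUID: the key stays stable because it does not depend on metadata that may change.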
So, perhaps the first steps towards some standard numbering are to (1) ask genebanks to implement an essential minimum of the FAO Multi-Crop Passport Descriptors and, where necessary, help them to do so, (2) ask each genebank to propose unique accession identifiers, which may require combining two or more of the essential FAO Multi-Crop Passport Descriptors, and (3) ensure all genebanks/collections have a unique Institute Code registered, for example, in WIEWS? The latter would require FAO to continue to exist and to maintain WIEWS. Perhaps FAO is more permanent than the places that
While I whole-heartedly support the proposal to ask genebanks to register unique institution codes, I think this is drifting a bit off topic. The institution codes and accession numbers are to be added as metadata linked to a PUID, not to be used as unique identifiers themselves. The goal is to provide a single, globally unique, persistent identifier for each object/entity (i.e. PGRFA) that can link to existing and local identifiers such as those used by genebanks.
If you want genebanks on board, it may be good to start with them and look at where they stand regarding documentation. Otherwise we may design something that is fine but not related to the base. The genebank identifiers are not for local use only, as we have always encouraged genebank users to refer to them in publications, albeit with mixed success.
By local, I only meant not globally unique. If an agreement can be reached (e.g., via globally unique institution codes) that allows a combination of institution code and accession number to serve as a PUID that also meets the other requirements (e.g., multiple resolution) specified in week one, I am all for adopting them and not creating new identifiers. However, I find it hard (but not impossible) to see how the adoption of local identifiers as PUIDs is going to meet all of the requirements (e.g., institution codes and accession numbers could be incorporated into DOIs, but that affects opacity).
Matija, that sounds somewhat different than what I understood the PUIDs were going to be, but I am new to this game. Yes, the PGR suppliers will need to obtain a PUID and agree to use it, but I thought GLIS was setting up infrastructure to support that. I'm afraid I don't have a good sense of how it will all proceed operationally (maybe because it hasn't been worked out yet), but I agree that adoption by the material suppliers is the only way PUIDs for PGRFAs will work.
I wanted to stress that the PUID should be treated with the same care as ACCENUMB currently is in gene banks.
> Therefore, the genebanks and other entities (e.g. breeders) with PGR collections are the ones who will need to (1) obtain a PUID and (2) maintain that PUID forever just next to the ACCENUMB and (3) always include the PUID when sharing data.
Dear Marco, I would like to add MLSSTAT to this initial set of fields to be associated with the global identifier, to know whether or not the material is in the Multilateral System.
I also think that DONORCODE and DONORNUMB, as well as OTHERNUMB (e.g. the collecting number in GRIN-Global), could also be very useful for identifying potential duplicates and for helping the Global system establish relationships when material is transferred.
Dear colleagues, We are now opening the discussion for Week 3 about the metadata structure associated with each DOI to build a functional Global Information System. Again, let us focus the discussion on PGRFA only at this stage. The purpose of such a metadata structure is to assist users and client applications to:
I would like to add a few comments, hoping to further clarify the points in Francisco’s email that we think should be addressed when discussing the metadata structure to associate with the DOIs assigned by GLIS.
a) Accurate identification of the PGRFA being assigned a DOI is the first and most obvious need. A good metadata description will allow GLIS to perform automatic checks and flag, as potential duplicates, PGRFA that have the same values in all or some of the metadata fields. Which fields these should be will be one of the discussion topics.
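Such an automatic check can be sketched in a few lines of Python. Which fields to compare is exactly what is still under discussion, so the four MCPD fields below and the sample records are hypothetical choices; the sketch flags pairs of records that agree on at least a threshold number of those fields.

```python
from itertools import combinations

# Hypothetical comparison fields; the actual choice is still open.
COMPARE = ("GENUS", "SPECIES", "DONORCODE", "DONORNUMB")


def potential_duplicates(records, fields=COMPARE, min_matches=None):
    """Return index pairs of records that agree on at least `min_matches` fields.

    By default every compared field must agree. Empty or missing values never
    count as a match, so sparse records are not flagged by accident.
    """
    if min_matches is None:
        min_matches = len(fields)
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        matches = sum(1 for f in fields if a.get(f) and a.get(f) == b.get(f))
        if matches >= min_matches:
            flagged.append((i, j))
    return flagged


records = [  # hypothetical passport data
    {"INSTCODE": "ABC001", "ACCENUMB": "A-1", "GENUS": "Hordeum",
     "SPECIES": "vulgare", "DONORCODE": "XYZ002", "DONORNUMB": "77"},
    {"INSTCODE": "DEF003", "ACCENUMB": "B-9", "GENUS": "Hordeum",
     "SPECIES": "vulgare", "DONORCODE": "XYZ002", "DONORNUMB": "77"},
    {"INSTCODE": "DEF003", "ACCENUMB": "B-10", "GENUS": "Hordeum",
     "SPECIES": "vulgare"},
]
print(potential_duplicates(records))                 # [(0, 1)]
print(potential_duplicates(records, min_matches=2))  # [(0, 1), (0, 2), (1, 2)]
```

The pairwise loop is quadratic; for a global registry one would more likely block on a cheap key (e.g. genus) first, but that is beyond this sketch.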
Please see my specific responses in line below. In addition, I recommend that the GLIS work with the new Planteome <http://planteome.org/> project, which will be developing a number of ontologies relevant to PGRFAs.
Dear Ramona and others, thank you for the interesting comments. I would just like to add a few clarifications.
> DOIs allow for simply linking "related resources", but it would be better to have more explicit ontology classes (e.g., as subclasses of CRID (central registry identifier) in the Ontology for Biomedical Investigations) to define how different existing identifiers relate to one another.
Thanks for the clarification. Sorry I was not more explicit about those points!
Ramona
Sent from Baja Arizona.
> On Mar 12, 2015, at 1:00 AM, Marsella Marco wrote: > > Actually, DOI includes
Thank you for the valuable contributions to Week 2 of the PUI Consultation.
The technical nature of the topics at hand has somewhat limited the amount of detailed feedback, but we have taken note of the wide support for DOIs as the most promising Permanent Unique Identifier for GLIS. This conclusion is derived from the analysis of the scoring tables that we have compiled (attached) and from the general comments received from Elizabeth and Ruth. I would like to thank in particular Dag, Éamonn and Marco for compiling the table and providing not only the scoring, but
I must confess that my technical knowledge is too limited for me to score each solution as an expert, and doing so would require some literature review on my side. I quickly consulted bioinformatics colleagues in Bioversity, but they cannot provide a quick comparative assessment of the three solutions, as they have not used all three.
Find attached some input which I put together with my colleagues, Tim Robertson and Kyle Braak. We focus on DOIs and LSIDs as the GBIF Secretariat does not have practical experience with ARKs. In addition, our comments on DOIs are based on experience with DataCite only.
Dear colleagues, we are now ready to open Week 2 of this electronic consultation on PUIDs for PGRFA. This week will be dedicated to analysing and comparing the three candidate PUIDs identified during the COGIS meeting (ARK, DOI and LSID) against the list of requirements discussed last week.
Dear all, I have been travelling this week. I have now tried to fill in scores and comments in the table. I have intentionally left some options empty where I lack experience or information. Best regards Dag ________________________________________ From: Global Information System on PGRFA, on behalf of Lopez, Francisco (AGDT) Sent: 03 March 2015 11:37 Subject: Week 2 - Analysis of the PUID candidates against the requirements identified in W1
I'm really in line with the others on this. These issues are not just a matter of a purely technical assessment score. The increasing popularity and "universality" of DOIs make them very attractive, in particular if related user communities such as GBIF are using them in a comparable way. However, again I would argue that the biggest issue is the detailed definition of the scope of the identifiers and, ideally, very precise descriptions of how they are to be used. In particular, we need to guard against using the identifiers in a way that tags similar but
I am fully aligned with David on this. In addition, I would go further: it is confusing and dangerous to use identifiers that do not reflect, or clearly link with, the biological state of a specific germplasm sample (inbred, open pollinated, etc.). Form follows function, and it is critical to look at these things from a user’s point of view: what can be misinterpreted, and how can this best be guarded against?
We are following the discussion and taking note of the valuable input you are all providing. We believe that a set of guidelines on how the PUID should be assigned, maintained and used in the context of GLIS will in itself be a valuable contribution to the advancement of the matter and will facilitate acceptance of PUIDs within the community.
As Secretariat of the ITPGRFA, we do not have direct experience with any of the three PUID candidates. However, in preparation for the COGIS meeting in San Diego and since then, I have spent a considerable amount of time reading all the material I could find on the Web on PUIDs in general and the three candidates in particular. The result is the attached table, which I put together from technical specification documents, discussion groups, meeting reports and presentations at various meetings.
Dear colleagues, We would like to thank you for the constructive discussion over the first week. We have taken into account your comments and we attach herewith the updated list of requirements.
Additionally, we would also like to take this opportunity to share with you a quick summary of some of the major topics of discussion:
Thanks for the complete list of requirements! I have included some comments in the document using track changes.
#4 Resolvability: A plain web page with information about the entity is a useful minimum requirement. However, we might perhaps want to mention already in #4 that an additional machine readable service is preferred?
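For DOIs, the same resolver URL can serve both needs through HTTP content negotiation: a plain request leads to the human landing page, while an `Accept` header can ask for structured metadata. The Python sketch below only builds the requests (nothing is sent); the DOI is a placeholder, and whether a PGRFA DOI would return useful machine-readable metadata depends on what its registration agency supports and what was registered.

```python
from urllib.request import Request

RESOLVER = "https://doi.org/"


def resolution_request(doi, machine_readable=False):
    """Build (but do not send) an HTTP request that resolves a DOI.

    Without an Accept header the resolver redirects to the landing page for
    humans; with one, it may return structured metadata instead.
    """
    headers = {}
    if machine_readable:
        # CSL JSON is one of the formats offered via DOI content negotiation.
        headers["Accept"] = "application/vnd.citationstyles.csl+json"
    return Request(RESOLVER + doi, headers=headers)


human = resolution_request("10.0000/example")          # placeholder DOI
machine = resolution_request("10.0000/example", True)
print(human.full_url)                # https://doi.org/10.0000/example
print(machine.get_header("Accept"))  # application/vnd.citationstyles.csl+json
```

Passing the request to `urllib.request.urlopen` would actually contact the resolver.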
David Marshall Information and Computational Sciences The James Hutton Institute Invergowrie Dundee, DD2 5DA Scotland, UK +44(0) 8449 285 428 (Switchboard) +44(0) 1382 568 744 (Direct) +44(0) 8449 285 429 (Fax)
From: Global Information System on PGRFA, on behalf of Dag Endresen Sent: 25 February 2015 09:02 Subject: Re: W1 -Task Force on Permanent Unique Identifiers: Requirements
Hello, I have added a few comments to the discussion in the attached version of the document. Thanks
Eugene Timmermans
Database Administrator, Plant Gene Resource of Canada Agriculture and Agri-Food Canada / Government of Canada / Tel: 306-385-9467 / TTY: 613-773-2600
> -----Original Message----- > From: Global Information System on PGRFA, on behalf of David Marshall > Sent: Wednesday, February 25, 2015 10:34 AM > Subject: Re: W1 -Task Force on Permanent Unique Identifiers: Requirements > > Some more comments added to the document.
Regarding #21 Identification of fragments, I agree with the concern raised by Eugene and Stephan. My understanding of what could be an example of a fragment (of an accession entity) might be seeds of an accession from a given harvest year (I have seen this called a batch). If there is a need to identify such "fragment" things, a clean opaque PUID for them would generally be preferable to appending codes to the accession-level PUID identifier string. Perhaps there are other use cases, but in general I agree with these concerns.
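The preferred pattern, in which a batch gets its own opaque identifier and points to its accession through metadata, can be sketched in Python as follows. A random UUID URN stands in for whatever PUID scheme is chosen (e.g. DOIs); the `Entity` class, the `part_of` field and the sample values are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


def mint_opaque_id():
    """Mint an identifier that says nothing about the entity it names."""
    return f"urn:uuid:{uuid.uuid4()}"


@dataclass
class Entity:
    kind: str                       # e.g. "accession" or "batch"
    metadata: dict
    part_of: Optional[str] = None   # identifier of the parent entity, if any
    puid: str = field(default_factory=mint_opaque_id)


accession = Entity("accession", {"INSTCODE": "ABC001", "ACCENUMB": "A-123"})
# The batch gets its own opaque identifier; the link to the accession lives in
# metadata rather than in a suffix such as "<accession-id>-2014".
batch = Entity("batch", {"harvest_year": 2014}, part_of=accession.puid)

print(batch.part_of == accession.puid)         # True
print(batch.puid.startswith(accession.puid))   # False
```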
Thanks for the comprehensive list. A few comments are in the attachment. Adriana
-----Original Message----- From: Global Information System on PGRFA, on behalf of Stephan Weise Sent: Wednesday, February 25, 2015 6:16 PM Subject: Re: W1 -Task Force on Permanent Unique Identifiers: Requirements
An additional comment.
Stephan
-----Original Message----- From: Alercia, Adriana (Bioversity) Sent: Wednesday, February 25, 2015 7:08 PM Subject: RE: W1 -Task Force on Permanent Unique Identifiers: Requirements
As I see the PUI from the ongoing discussions, this concept will, if accepted in the broad sense, grow exponentially at a rate that dwarfs the snowball effect. If all the things listed in the initial e-mail (genebank accessions, fragments of them, seed lots, selections, gene sequences, individuals, publications...) receive a PUI, this will not help the cause of GLIS but will create a global, and eventually cosmic, catalogue of everything. That is not the purpose of GLIS. I think we need to restrict this and perhaps need to focus more on
#3 Opacity: I would stress the importance of the opacity best practice for PUID name strings. How about reformulating it as: "No information on the entity should be inferrable from the PUID name string alone"? See, e.g., GBIF (2011) [1], page 9.
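A rough way to audit this principle is to check whether any metadata value appears verbatim inside the identifier string. The Python sketch below is only a heuristic (a string check cannot prove opacity), and the DOI suffixes and record values are hypothetical.

```python
def leaked_values(identifier, record, min_length=3):
    """Return metadata values that appear verbatim inside an identifier string.

    A non-empty result suggests the identifier is not opaque. Very short
    values are ignored to avoid trivial coincidental matches.
    """
    text = identifier.lower()
    return sorted(
        str(v) for v in record.values()
        if len(str(v)) >= min_length and str(v).lower() in text
    )


record = {"INSTCODE": "ABC001", "ACCENUMB": "A-123", "GENUS": "Hordeum"}
print(leaked_values("10.0000/ABC001.A-123", record))   # ['A-123', 'ABC001']
print(leaked_values("10.0000/7f3k9q2m", record))       # []
```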
#9 Compatibility: I agree that calling the accession number a local identifier is unfortunate, bad wording. How about "traditional identifier" or perhaps "verbatim identifier"? I fully agree that keeping the accession numbers as the most common identifiers in communication between humans remains the most pragmatic option!
Agree with Axel and Barnabas regarding 'accession numbers'.
- With regard to #1 Uniqueness and #9 Compatibility: Dave's comment about the need to clarify the nature of the entity. "Accession" is a management concept, not a biological one, and should be treated as such. It is an 'entry' in a genebank and constitutes the 'unit of management at the Genebank' (RS Hamilton, 2002), whatever its composition, held in storage for conservation or use. The FAO/Bioversity list of Multi-Crop Passport Descriptors defines the Accession number as: the unique identifier for accessions within a genebank, and is assigned
Dear colleagues, As we are approaching the end of the first week, I would like to comment on the issue of "entity" and "accession number". We may want to clearly indicate that. (B) The Treaty community already dealt with this issue during the development of the Standard Material Transfer Agreement (SMTA), and the document refers more generically, following the text of the Treaty, to "PGRFA", for which the definition is included in paragraph 2 of the SMTA.
Attached is the latest version of the Word doc with my comments inserted. As an overall comment, my greatest concern is the lack of clarity about what will be identified with these PUIDs. The paragraph quoted by Francisco:
"“Plant Genetic Resources for Food and Agriculture” any material of plant origin, including reproductive and vegetative propagating material, containing functional units of heredity of actual or potential value for food and agriculture."
I have been stuck in a hospital bed for the last week, so I have been unable to actively participate in this very stimulating discussion. I hope to be back home tomorrow for week 2.
Anyway, I would just like to clarify our position about the two points raised by Ramona and others.
> -----Original Message----- > From: Barnabas Kapange > Sent: Thursday, February 26, 2015 7:40 AM > To: Stephan Weise > Subject: > > Dear Stephen, > > I am part of the UGIP and I currently have problems with my outgoing mails. I > have made a small contribution to the List but I am not sure if it is reflected. > > If not, then this is what I wrote: > Dear Colleagues, > > I am in agreement with what most other contributors say. I especially > support
While institution code, accession number and genus may be sufficient to disambiguate entities, I think the minimum metadata required should perhaps be more than what is required for disambiguation. The whole system that uses PUIDs is also supposed to serve the purposes of discovery and reuse of information about PGRFAs. For this, more metadata is necessary, such as the type of resource and the location of the resource (a physical location for material resources or a URI for digital ones). Even more metadata is important, but cannot necessarily be required if it is not known. Although only "required" and
Dear All, I have also only used DOIs and LSIDs, not ARKs, and would agree with the majority of the detailed comments made by Éamonn and colleagues. Based on previous experience I would favour DOIs, as they are becoming more and more widely used and as such are likely to prove more permanent than LSIDs and ARKs. I think this is an essential criterion as we look to the future. As well as the more formal use of DOIs, they are also being used in some very creative data citation projects such as Figshare. Figshare was created as a