Hi all -
As you may recall, EDI works on EML data package checks for the EML Congruence Checker (ECC) when time permits, and our protocol is to accumulate checks for semi-annual release in November and May. We recently finalized three checks: two related to date-times (in dataTable attributes) and one for an optional element in the project tree. Below is a description of the checks we plan to release on May 16. Until then, the checks can be tested in the staging environment: https://portal-s.edirepository.org
For more information, contact EDI. Feel free to visit our git repository, below, to find meeting notes and more details of these checks (under practices/community_updates).
best,
ECC Working Group, https://github.com/EDIorg/ECC
Margaret O'Brien, Duane Costa, Sven Bohm, Stevan Earl, Jason Downing, Gastil-Buhl
Overview
Date-time checks
Date-times are complex. As labels for points in time, they have features of other types of measurements and, when correctly parsed, can be used in computations (e.g., to compute duration). Users have requested the ability to query, filter, or plot data values by date and time. But before date-time values can be used effectively, they must be parsed and interpreted. Many programming languages have libraries for date-time parsing, and each may layer on its own interpretation. The simplest solution would have been to accept the dateTime interpretation of a single processing language, but that would not have been consistent with the language-agnostic spirit of EML.
There are two parts to EML date-time checking, and consequently, two checks, which work together.
1. examine dateTimeFormatString: EML uses a formatString in metadata to specify how date-time values will appear in data. For this check, the working group created a list of preferred dateTime formats, which generally reflect ISO-accepted date-times. Code reads this list and creates regular expressions, to which EML dateTime formatStrings can be compared.
2. dateFormatMatches: Only date-times with preferred formatStrings can be checked for congruence, that is, a match between the format string and every individual data value in that column.
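To illustrate how the two checks could work together, here is a minimal Python sketch. It is not the ECC implementation. The three-item preferred list, the token-to-regex mapping, the function names, and the "valid" status label are all assumptions made for the example; only the "warn" status and the overall two-step logic come from the description above.

```python
import re

# Hypothetical subset of preferred formats; the real list is maintained
# by the ECC working group and is longer.
PREFERRED_FORMATS = ["YYYY", "YYYY-MM-DD", "YYYY-MM-DDThh:mm:ss"]

# Assumed mapping from format tokens to regex fragments.
TOKEN_PATTERNS = [
    ("YYYY", r"\d{4}"),
    ("MM", r"(?:0[1-9]|1[0-2])"),
    ("DD", r"(?:0[1-9]|[12]\d|3[01])"),
    ("hh", r"(?:[01]\d|2[0-3])"),
    ("mm", r"[0-5]\d"),
    ("ss", r"[0-5]\d"),
]


def format_to_regex(format_string):
    """Translate a format string into a compiled regular expression."""
    pattern = ""
    i = 0
    while i < len(format_string):
        for token, token_regex in TOKEN_PATTERNS:
            if format_string.startswith(token, i):
                pattern += token_regex
                i += len(token)
                break
        else:
            # Anything that is not a known token is treated as a literal.
            pattern += re.escape(format_string[i])
            i += 1
    return re.compile(pattern)


def check_format_string(format_string):
    """Check 1: is the formatString on the preferred list?"""
    return "valid" if format_string in PREFERRED_FORMATS else "warn"


def check_values(format_string, values):
    """Check 2: does every data value in the column match the format?

    Returns None when the formatString is not preferred, because such
    columns cannot be checked for congruence.
    """
    if format_string not in PREFERRED_FORMATS:
        return None
    regex = format_to_regex(format_string)
    mismatches = [v for v in values if not regex.fullmatch(v)]
    return "warn" if mismatches else "valid"


column = ["2019-05-16", "2018-11-01", "5/16/2019"]
print(check_format_string("YYYY-MM-DD"))
print(check_values("YYYY-MM-DD", column))
```

Note that a regular expression only tests the shape of a value, so a string such as 2019-02-31 would pass this sketch even though it is not a real calendar date.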
Both dateTime checks return a "warn" if their conditions are not met. With this setting, non-passing data packages are still accepted, but the submitter is made aware of the potential problems. Datasets that meet ECC standards will be the most easily reused for synthesis.
Funding element check
Increasingly, funders (e.g., NSF) are asking for datasets to be searchable by a funding code. In EML 2.1.x, the funding element in the project tree holds that information, and it is unstructured. EML 2.2 is planned to add fields for structuring award information. For now, the fundingPresence check simply looks for the element's presence, although this is likely to be revisited after the EML 2.2 release.
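A presence test of this kind can be sketched in a few lines of Python. This is an illustration only, not the ECC code. The function name and the sample document are hypothetical, the award text is a placeholder, and the sketch assumes the element sits at dataset/project/funding in an EML 2.1.x document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EML 2.1.1 document with placeholder content.
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
         packageId="example.1.1" system="example">
  <dataset>
    <title>Example dataset</title>
    <project>
      <title>Example project</title>
      <funding>NSF award 0000000</funding>
    </project>
  </dataset>
</eml:eml>"""


def has_funding_element(eml_text):
    """Return True if a funding element exists in the project tree."""
    root = ET.fromstring(eml_text)
    # Elements below the EML root are not namespace-qualified,
    # so a plain path is enough here.
    return root.find("dataset/project/funding") is not None


print(has_funding_element(SAMPLE_EML))
```

Because the element is unstructured in EML 2.1.x, the sketch does not try to interpret its content; it only reports whether the element is there.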