The biocompute package offers a toolkit to create, validate, and export BioCompute Objects (BCOs). The package follows the tidyverse design principles and works seamlessly with other packages of similar design.
To ensure better reproducibility, the composing and validation functions in this package are versioned. This means the BCO creation and validation can be done with fixed versions of the BioCompute Object specification if needed.
For example, to compose the provenance domain, one could use compose_provenance(), which is an alias to the current stable version of the specification. Alternatively, one could use a versioned function compose_provenance_v1.3.0(). As the specification evolves, functions for new spec versions can be added, and compose_provenance() might point to a newer version in the future, while compose_provenance_v1.3.0() will not change over time.
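The alias pattern described above can be sketched in plain R. The function names below (compose_demo_v1.0.0(), compose_demo_v2.0.0(), compose_demo()) are hypothetical stand-ins and not part of the biocompute package; the sketch only illustrates why a versioned function stays fixed while the alias can move:

```r
# Hypothetical versioned composers; not functions from the biocompute package.
compose_demo_v1.0.0 <- function(name) {
  list(name = name, spec_version = "1.0.0")
}
compose_demo_v2.0.0 <- function(name) {
  list(name = name, spec_version = "2.0.0")
}

# The unversioned alias points at whichever version is considered current.
compose_demo <- compose_demo_v1.0.0
before <- compose_demo("example")

# A later release could re-point the alias; the versioned function is untouched.
compose_demo <- compose_demo_v2.0.0
after <- compose_demo("example")

before$spec_version
after$spec_version
compose_demo_v1.0.0("example")$spec_version
```

Code that must stay reproducible would call the versioned name directly, while exploratory code can rely on the alias.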
The function biocompute::versions() reports the current and all available versions of the BioCompute Object specification supported in this package:
biocompute::versions()
$current
[1] "1.4.2"
$available
[1] "1.4.2"
The package takes structured, native R data structures (vectors or data frames) and turns them into BioCompute Objects. The functions compose_*() and biocompute::compose() are used to compose BioCompute Object domains and the final BioCompute Object.
For example, to compose the provenance domain, we first prepare the data as data frames or vectors with a fixed set of variable names, and feed them into compose_provenance():
name <- "HCV1a ledipasvir resistance SNP detection"
version <- "1.0.0"
review <- data.frame(
  "status" = c("approved", "approved"),
  "reviewer_comment" = c(
    "Approved by [company name] staff. Waiting for approval from FDA Reviewer",
    "The revised BCO looks fine"
  ),
  "date" = c(
    as.POSIXct("2017-11-12T12:30:48", format = "%Y-%m-%dT%H:%M:%S", tz = "EST"),
    as.POSIXct("2017-12-12T12:30:48", format = "%Y-%m-%dT%H:%M:%S", tz = "America/Los_Angeles")
  ),
  "reviewer_name" = c("Jane Doe", "John Doe"),
  "reviewer_affiliation" = c("Seven Bridges Genomics", "U.S. Food and Drug Administration"),
  "reviewer_email" = c("example@sevenbridges.com", "example@fda.gov"),
  "reviewer_contribution" = c("curatedBy", "curatedBy"),
  "reviewer_orcid" = c("https://orcid.org/0000-0000-0000-0000", NA),
  stringsAsFactors = FALSE
)
derived_from <- "https://github.com/biocompute-objects/BCO_Specification/blob/1.2.1-beta/HCV1a.json"
obsolete_after <- as.POSIXct("2018-11-12T12:30:48", format = "%Y-%m-%dT%H:%M:%S", tz = "EST")
embargo <- c(
  "start_time" = as.POSIXct("2017-10-12T12:30:48", format = "%Y-%m-%dT%H:%M:%S", tz = "EST"),
  "end_time" = as.POSIXct("2017-11-12T12:30:48", format = "%Y-%m-%dT%H:%M:%S", tz = "EST")
)
created <- as.POSIXct("2017-01-20T09:40:17", format = "%Y-%m-%dT%H:%M:%S", tz = "EST")
modified <- as.POSIXct("2019-05-10T09:40:17", format = "%Y-%m-%dT%H:%M:%S", tz = "EST")
contributors <- data.frame(
  "name" = c("Jane Doe", "John Doe"),
  "affiliation" = c("Seven Bridges Genomics", "U.S. Food and Drug Administration"),
  "email" = c("example@sevenbridges.com", "example@fda.gov"),
  "contribution" = I(list(c("createdBy", "curatedBy"), c("authoredBy"))),
  "orcid" = c("https://orcid.org/0000-0000-0000-0000", NA),
  stringsAsFactors = FALSE
)
license <- "https://creativecommons.org/licenses/by/4.0/"
compose_provenance(
  name, version, review, derived_from, obsolete_after,
  embargo, created, modified, contributors, license
) %>% convert_json()
{
  "name": "HCV1a ledipasvir resistance SNP detection",
  "version": "1.0.0",
  "review": [
    {
      "status": "approved",
      "reviewer_comment": "Approved by [company name] staff. Waiting for approval from FDA Reviewer",
      "date": 1510507848,
      "reviewer": [
        {
          "reviewer_name": "Jane Doe",
          "reviewer_affiliation": "Seven Bridges Genomics",
          "reviewer_email": "example@sevenbridges.com",
          "reviewer_contribution": "curatedBy",
          "reviewer_orcid": "https://orcid.org/0000-0000-0000-0000"
        }
      ]
    },
    {
      "status": "approved",
      "reviewer_comment": "The revised BCO looks fine",
      "date": 1513110648,
      "reviewer": [
        {
          "reviewer_name": "John Doe",
          "reviewer_affiliation": "U.S. Food and Drug Administration",
          "reviewer_email": "example@fda.gov",
          "reviewer_contribution": "curatedBy",
          "reviewer_orcid": "NA"
        }
      ]
    }
  ],
  "derived_from": "https://github.com/biocompute-objects/BCO_Specification/blob/1.2.1-beta/HCV1a.json",
  "obsolete_after": "2018-11-12T12:30:48-0500",
  "embargo": ["2017-10-12T12:30:48-0500", "2017-11-12T12:30:48-0500"],
  "created": "2017-01-20T09:40:17-0500",
  "modified": "2019-05-10T09:40:17-0500",
  "contributors": [
    {
      "name": "Jane Doe",
      "affiliation": "Seven Bridges Genomics",
      "email": "example@sevenbridges.com",
      "contribution": ["createdBy", "curatedBy"],
      "orcid": "https://orcid.org/0000-0000-0000-0000"
    },
    {
      "name": "John Doe",
      "affiliation": "U.S. Food and Drug Administration",
      "email": "example@fda.gov",
      "contribution": "authoredBy",
      "orcid": "NA"
    }
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
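In the output above, the review dates appear as numbers while the other time stamps are strings with a UTC offset. Both are views of the same kind of POSIXct value, which can be checked in base R. This is a minimal sketch that does not call the package; the format string is an assumption chosen to reproduce the strings shown, and is not necessarily what the package uses internally:

```r
# Same parsing calls as in the provenance example.
review_date <- as.POSIXct("2017-11-12T12:30:48", format = "%Y-%m-%dT%H:%M:%S", tz = "EST")
obsolete_after <- as.POSIXct("2018-11-12T12:30:48", format = "%Y-%m-%dT%H:%M:%S", tz = "EST")

# Seconds since 1970-01-01 UTC: the numeric form seen in the review entries.
epoch <- as.numeric(review_date)

# String form with a numeric UTC offset (assumed format string).
stamp <- format(obsolete_after, "%Y-%m-%dT%H:%M:%S%z")

epoch
stamp
```

Here epoch evaluates to 1510507848, the value shown for the first review entry. Converting date columns to character strings before composing is one way to control which form ends up in the JSON, though whether the package expects that is not stated here.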
After all the domains are composed, use compose_tlf() to compose the top level fields. It takes every domain as input because all of them are used to calculate a SHA-256 checksum. Next, use biocompute::compose() to compose the complete BioCompute Object.
tlf <- compose_tlf(
  compose_provenance(), compose_usability(), compose_extension(),
  compose_description(), compose_execution(), compose_parametric(),
  compose_io(), compose_error()
)
biocompute::compose(
  tlf,
  compose_provenance(), compose_usability(), compose_extension(),
  compose_description(), compose_execution(), compose_parametric(),
  compose_io(), compose_error()
) %>% convert_json()
{
  "spec_version": "https://w3id.org/biocompute/1.4.2/",
  "object_id": "https://biocompute.sbgenomics.com/bco/85823b2e-2f88-42e8-8b78-8ae3af94a6dd",
  "etag": "496f1a5c9693811984714b0647914929c5bc29c8e7a18ad2a0276f82b11a42ce",
  "provenance_domain": {
    "name": [],
    "version": [],
    "review": [],
    "derived_from": [],
    "obsolete_after": [],
    "embargo": [],
    "created": [],
    "modified": [],
    "contributors": [],
    "license": []
  },
  "usability_domain": [],
  "extension_domain": {
    "fhir_extension": [],
    "scm_extension": []
  },
  "description_domain": {
    "keywords": [],
    "xref": [],
    "platform": [
      "Seven Bridges Platform"
    ],
    "pipeline_steps": []
  },
  "execution_domain": {
    "script": [],
    "script_driver": [],
    "software_prerequisites": [],
    "external_data_endpoints": [],
    "environment_variables": []
  },
  "parametric_domain": [],
  "io_domain": {
    "input_subdomain": [],
    "output_subdomain": []
  },
  "error_domain": {
    "empirical_error": [],
    "algorithmic_error": []
  }
}
As we have already seen above, use convert_json() or convert_yaml() to convert the domain objects or BCO objects into the JSON or YAML format.
To make sure that a BioCompute Object has not been tampered with and follows the standard, we can validate it against its checksum or against the BCO JSON schemas. For example:
bco <- tempfile(fileext = ".json")
generate_example("HCV1a") %>%
  convert_json() %>%
  export_json(bco)
bco %>% validate_checksum()
── Loading BioCompute Object ───────────────────────────────────────────────────
── Validating Checksum ─────────────────────────────────────────────────────────
Documented checksum: d18deb41a97a3108e743231af5fec0055dac321b945a569ca58bf33503ec526b
Calculated checksum: c722ea929af19b3b80e899017bde78602b220d0f88a4a131411fe23533c3c999
Documented and calculated checksum did NOT match. Due to the minor differences in JSON serialization, this could be a false positive. Please double check.
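The warning about JSON serialization can be illustrated without the package: a checksum is computed over bytes, so two serializations of the same content give different digests. The sketch below uses base R's tools::md5sum() as a stand-in for SHA-256 (an assumption made only to avoid extra dependencies), and the record is hypothetical:

```r
# Two serializations of the same hypothetical record: compact and pretty-printed.
compact <- '{"name":"demo","version":"1.0.0"}'
pretty <- '{\n  "name": "demo",\n  "version": "1.0.0"\n}'

f1 <- tempfile(fileext = ".json")
f2 <- tempfile(fileext = ".json")
writeLines(compact, f1)
writeLines(pretty, f2)

# MD5 is used here only as a stand-in; the package documents SHA-256.
sums <- unname(tools::md5sum(c(f1, f2)))

# The digests differ even though both files encode the same data.
sums[1] == sums[2]
```

A mismatch reported by validate_checksum() is therefore worth a second look before concluding that the object was modified.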
bco <- tempfile(fileext = ".json")
generate_example("HCV1a") %>%
  convert_json() %>%
  export_json(bco)
bco %>% validate_schema()
── 0: Validating BioCompute Object ─────────────────────────────────────────────
[1] FALSE
attr(,"errors")
                  field           message
1 data.extension_domain is the wrong type
── 1: Validating Provenance Domain ─────────────────────────────────────────────
[1] FALSE
attr(,"errors")
                field                  message
1  data.review.0.date        is the wrong type
2  data.review.1.date        is the wrong type
3 data.obsolete_after must be date-time format
4        data.embargo        is the wrong type
5        data.created must be date-time format
6       data.modified must be date-time format
── 2: Validating Usability Domain ──────────────────────────────────────────────
[1] TRUE
── 3: Validating Description Domain ────────────────────────────────────────────
[1] FALSE
attr(,"errors")
                              field                  message
1                   data.xref.0.ids        is the wrong type
2           data.xref.0.access_time must be date-time format
3                   data.xref.1.ids        is the wrong type
4           data.xref.1.access_time must be date-time format
5           data.xref.2.access_time must be date-time format
6                   data.xref.3.ids        is the wrong type
7           data.xref.3.access_time must be date-time format
8 data.pipeline_steps.0.step_number        is the wrong type
── 4: Validating Execution Domain ──────────────────────────────────────────────
[1] FALSE
attr(,"errors")
                           field           message
1 data.external_data_endpoints.0 is the wrong type
2 data.external_data_endpoints.1 is the wrong type
3 data.external_data_endpoints.2 is the wrong type
── 5: Validating Parametric Domain ─────────────────────────────────────────────
[1] TRUE
── 6: Validating I/O Domain ────────────────────────────────────────────────────
[1] TRUE
── 7: Validating Error Domain ──────────────────────────────────────────────────
[1] TRUE
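Each failed check above prints as a logical value carrying an "errors" attribute that holds a data frame. Whether validate_schema() returns such objects to the caller is not stated here, so the sketch below rebuilds one by hand (a hypothetical result object) to show how a value of that shape could be inspected in base R:

```r
# Hypothetical result, rebuilt by hand to mirror the printed shape above.
result <- structure(
  FALSE,
  errors = data.frame(
    field = c("data.created", "data.modified"),
    message = c("must be date-time format", "must be date-time format"),
    stringsAsFactors = FALSE
  )
)

# Drop the attribute to get a plain TRUE/FALSE for control flow.
passed <- as.logical(result)

# Pull out the error table and count failures per message.
errors <- attr(result, "errors")
n_by_message <- table(errors$message)

passed
errors$field
n_by_message
```

Grouping by message makes recurring problems, such as the date-time format errors above, easy to spot.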
The biocompute package offers a few convenient functions for exporting BioCompute Objects to JSON (export_json()), to a PDF, HTML, or Word document (export_pdf(), export_html(), export_word()), and for exporting (uploading) them to cloud-based platforms (export_sevenbridges()). Check the function documentation for details.