Package: eHDPrep 1.3.3.9000

Ian Overton

eHDPrep: Quality Control and Semantic Enrichment of Datasets

A tool for the preparation and enrichment of health datasets for analysis (Toner et al. (2023) <doi:10.1093/gigascience/giad030>). Provides functionality for assessing data quality and for improving the reliability and machine interpretability of a dataset. 'eHDPrep' also enables semantic enrichment of a dataset where metavariables are discovered from the relationships between input variables determined from user-provided ontologies.

Authors: Tom Toner [aut], Ian Overton [aut, cre]

# Install 'eHDPrep' in R:
install.packages('eHDPrep', repos = c('https://overton-group.r-universe.dev', 'https://cloud.r-project.org'))
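
Once installed, a quick check that the package loads might look like the sketch below. It assumes the install above succeeded and that `example_data` (listed in the help manual) is shipped as a package dataset reachable through `data()`. The variable name `ehdprep_available` is introduced here for illustration only.

```r
# Guarded so the snippet is harmless when eHDPrep is not installed.
ehdprep_available <- requireNamespace("eHDPrep", quietly = TRUE)

if (ehdprep_available) {
  library(eHDPrep)
  # Assumption: example_data is available as a package dataset.
  data("example_data", package = "eHDPrep")
  str(example_data)
} else {
  message("eHDPrep is not installed; run the install.packages() call above first.")
}
```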

Bug tracker: https://github.com/overton-group/ehdprep/issues

data-quality, health-informatics, semantic-enrichment

46 exports, 8 stars, 1.47 score, 89 dependencies, 11 scripts, 276 downloads

Last updated 1 year ago from: 4a2f499bce. Checks: OK: 1, NOTE: 6. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Sep 13 2024
R-4.5-win | NOTE | Sep 13 2024
R-4.5-linux | NOTE | Sep 13 2024
R-4.4-win | NOTE | Sep 13 2024
R-4.4-mac | NOTE | Sep 13 2024
R-4.3-win | NOTE | Sep 13 2024
R-4.3-mac | NOTE | Sep 13 2024

Exports: apply_quality_ctrl, assess_completeness, assess_quality, assume_var_classes, compare_completeness, compare_info_content, compare_info_content_plt, completeness_heatmap, count_compare, edge_tbl_to_graph, encode_as_num_mat, encode_binary_cats, encode_cats, encode_genotypes, encode_ordinals, entropy, export_dataset, extract_freetext, identify_inconsistency, import_dataset, import_var_classes, information_content_contin, information_content_discrete, join_vars_to_ontol, merge_cols, metavariable_agg, metavariable_info, metavariable_variable_descendants, mi_content_discrete, mod_track, node_IC_zhou, nums_to_NA, ordinal_label_levels, plot_completeness, report_var_mods, review_quality_ctrl, row_completeness, semantic_enrichment, skipgram_append, skipgram_freq, skipgram_identify, strings_to_NA, validate_consistency_tbl, variable_completeness, variable_entropy, zero_entropy_variables

Dependencies: base64enc, BH, bit, bit64, bslib, cachem, cellranger, cli, clipr, colorspace, cpp11, crayon, digest, dplyr, evaluate, fansi, farver, fastmap, fastmatch, fontawesome, forcats, fs, generics, ggplot2, glue, gtable, highr, hms, htmltools, igraph, isoband, ISOcodes, jquerylib, jsonlite, kableExtra, knitr, labeling, lattice, lifecycle, magrittr, MASS, Matrix, memoise, mgcv, mime, munsell, nlme, NLP, pheatmap, pillar, pkgconfig, prettyunits, progress, purrr, quanteda, R6, rappdirs, RColorBrewer, Rcpp, readr, readxl, rematch, rlang, rmarkdown, rstudioapi, sass, scales, slam, SnowballC, stopwords, stringi, stringr, svglite, systemfonts, tibble, tidygraph, tidyr, tidyselect, tinytex, tm, tzdb, utf8, vctrs, viridisLite, vroom, withr, xfun, xml2, yaml

'eHDPrep': an 'R' package for Electronic Health Data Quality Control and Semantic Enrichment

Rendered from Introduction_to_eHDPrep.Rmd using knitr::rmarkdown on Sep 13 2024.

Last update: 2023-06-01
Started: 2022-07-26

Readme and manuals

Help Manual

Help page | Topics
Apply quality control measures to a dataset | apply_quality_ctrl
Assess completeness of a dataset | assess_completeness
Assess quality of a dataset | assess_quality
Assume variable classes in data | assume_var_classes
Kable logical data highlighting | cellspec_lgl
Compare Completeness between Datasets | compare_completeness
Information Content Comparison Table | compare_info_content
Information Content Comparison Plot | compare_info_content_plt
Completeness Heatmap | completeness_heatmap
Compare unique values before and after data modification | count_compare
Calculate mutual information of a matrix of discrete values | discrete.mi
Find highly distant value for data frame | distant_neg_val
Convert edge table to tidygraph graph | edge_tbl_to_graph
Convert data frame to numeric matrix | encode_as_num_mat
Encode a categorical vector with binary categories | encode_bin_cat_vec
Encode categorical variables as binary factors | encode_binary_cats
Encode categorical variables using one-hot encoding | encode_cats
Encode a genotype/SNP vector | encode_genotype_vec
Encode genotype/SNP variables in data frame | encode_genotypes
Encode ordinal variables | encode_ordinals
Calculate Entropy of a Vector | entropy
Exact kernel density estimation | exact.kde
Example data for eHDPrep | example_data
Example ontology as an edge table for semantic enrichment | example_edge_tbl
Example mapping file for semantic enrichment | example_mapping_file
Example ontology as a network graph for semantic enrichment | example_ontology
Export data to delimited file | export_dataset
Extract information from free text | extract_freetext
Identify inconsistencies in a dataset | identify_inconsistency
Import data into 'R' | import_dataset
Import corrected variable classes | import_var_classes
Calculate Information Content (Continuous Variable) | information_content_contin
Calculate Information Content (Discrete Variable) | information_content_discrete
Join Mapping Table to Ontology Network Graph | join_vars_to_ontol
Find maximum of vector safely | max_catchNAs
Find mean of vector safely | mean_catchNAs
Merge columns in data frame | merge_cols
Aggregate Data by Metavariable | metavariable_agg
Compute Metavariable Information | metavariable_info
Extract metavariables' descendant variables | metavariable_variable_descendants
Calculate Mutual Information Content | mi_content_discrete
Find minimum of vector safely | min_catchNAs
Data modification tracking | mod_track
Calculate Node Information Content (Zhou et al 2008 method) | node_IC_zhou
Min max normalization | normalize
Replace numeric values in numeric columns with NA | nums_to_NA
One hot encode a vector | onehot_vec
Extract labels and levels of ordinal variables in a dataset | ordinal_label_levels
Plot Completeness of a Dataset | plot_completeness
Find product of vector safely | prod_catchNAs
Track changes to dataset variables | report_var_mods
Review Quality Control | review_quality_ctrl
Calculate Row Completeness in a Data Frame | row_completeness
Semantic enrichment | semantic_enrichment
Append Skipgram Presence Variables to Dataset | skipgram_append
Report Skipgram Frequency | skipgram_freq
Identify Neighbouring Words (Skipgrams) in a free-text vector | skipgram_identify
Replace values in non-numeric columns with NA | strings_to_NA
Sum vector safely for semantic enrichment | sum_catchNAs
Validate internal consistency table | validate_consistency_tbl
Validate mapping table for semantic enrichment | validate_mapping_tbl
Validate ontology network for semantic enrichment | validate_ontol_nw
Calculate Variable Completeness in a Data Frame | variable_completeness
Calculate Entropy of Each Variable in Data Frame | variable_entropy
Variable bandwidth Kernel Density Estimation | variable.bw.kde
Missing dots warning | warn_missing_dots
Identify variables with zero entropy | zero_entropy_variables
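
Several of the topics above (entropy, variable completeness, min max normalization) name standard calculations. The base R sketch below illustrates what such calculations typically involve. The helper names `shannon_entropy_bits`, `pct_complete`, and `min_max` and the `toy` data frame are hypothetical, and this is not eHDPrep's own implementation, whose arguments, units, and handling of missing values may differ.

```r
# Hypothetical helpers illustrating concepts named in the help topics.
# These are NOT eHDPrep's implementations.

# Shannon entropy (in bits) of a discrete vector, ignoring NA values.
shannon_entropy_bits <- function(x) {
  x <- x[!is.na(x)]
  p <- as.numeric(table(x)) / length(x)  # relative frequency of each value
  -sum(p * log2(p))
}

# Percentage of non-missing values in each column of a data frame.
pct_complete <- function(df) {
  colMeans(!is.na(df)) * 100
}

# Min-max normalization of a numeric vector to the range [0, 1].
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Made-up toy data, for illustration only.
toy <- data.frame(
  id    = 1:4,
  stage = c("I", "II", "II", NA),
  age   = c(40, 50, NA, 60)
)

shannon_entropy_bits(c("a", "a", "b", "b"))  # two equally likely values
pct_complete(toy)
min_max(toy$age)
```

With two equally likely values the entropy is 1 bit, and a column with a single distinct value has entropy 0, which is the situation the zero-entropy topic refers to.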