Package: cleanepi 1.1.2.9000

Bubacarr Bah

cleanepi: Clean and Standardize Epidemiological Data

A package for cleaning and standardizing tabular data, tailored specifically for curating epidemiological data. It streamlines the data cleaning tasks typically required when working with epidemiological datasets. It returns the processed data in the same format as the input and generates a comprehensive report detailing the outcome of each cleaning task.

Authors: Karim Mané [aut], Thibaut Jombart [ctb], Abdoelnaser Degoot [aut], Bankolé Ahadzie [aut], Nuredin Mohammed [aut], Bubacarr Bah [aut, cre], Hugo Gruson [ctb, rev], Pratik R. Gupte [rev], James M. Azam [rev], Joshua W. Lambert [rev, ctb], Chris Hartgerink [rev], Andree Valle-Campos [rev, ctb], London School of Hygiene and Tropical Medicine, LSHTM [cph], data.org [fnd]

cleanepi_1.1.2.9000.tar.gz
cleanepi_1.1.2.9000.zip (r-4.7), cleanepi_1.1.2.9000.zip (r-4.6), cleanepi_1.1.2.9000.zip (r-4.5)
cleanepi_1.1.2.9000.tgz (r-4.6-any), cleanepi_1.1.2.9000.tgz (r-4.5-any)
cleanepi_1.1.2.9000.tar.gz (r-4.7-any), cleanepi_1.1.2.9000.tar.gz (r-4.6-any)
cleanepi_1.1.2.9000.tgz (r-4.6-emscripten)
manual.pdf | manual.html
card.svg | card.png
cleanepi/json (API)
NEWS

# Install 'cleanepi' in R:
install.packages('cleanepi', repos = c('https://epiverse-trace.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/epiverse-trace/cleanepi/issues

Pkgdown/docs site: https://epiverse-trace.github.io

data-cleaning, epidemiology, epiverse

7.46 score, 12 stars, 44 scripts, 296 downloads, 21 exports, 40 dependencies

Last updated from: 6fb22d1aa5. Checks: 9 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 171
source / vignettes | OK | 219
linux-release-x86_64 | OK | 166
macos-release-arm64 | OK | 114
macos-oldrel-arm64 | OK | 122
windows-devel | OK | 115
windows-release | OK | 128
windows-oldrel | OK | 108
wasm-release | OK | 123

Exports: %>%, add_to_dictionary, add_to_report, check_date_sequence, check_subject_ids, clean_data, clean_using_dictionary, convert_numeric_to_date, convert_to_numeric, correct_misspelled_values, correct_subject_ids, find_duplicates, get_default_params, print_report, remove_constants, remove_duplicates, replace_missing_values, scan_data, standardize_column_names, standardize_dates, timespan

Dependencies: backports, bit, bit64, checkmate, cli, clipr, cpp11, crayon, dplyr, forcats, generics, glue, hms, janitor, lifecycle, linelist, lubridate, magrittr, matchmaker, numberize, pillar, pkgconfig, prettyunits, progress, purrr, R6, readr, rlang, snakecase, stringi, stringr, tibble, tidyr, tidyselect, timechange, tzdb, utf8, vctrs, vroom, withr

Introduction to cleanepi

Rendered from cleanepi.Rmd using knitr::rmarkdown on May 06 2026.

Last update: 2025-07-15
Started: 2023-03-07

Package Design vignette for {cleanepi}

Rendered from design_principle.Rmd using knitr::rmarkdown on May 06 2026.

Last update: 2025-07-08
Started: 2024-01-20

Readme and manuals

Help Manual

Help page | Topics
Add an element to the data dictionary | add_to_dictionary
Add an element to the report object | add_to_report
Check whether the order in a sequence of date events is chronological | check_date_sequence
Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function issues a warning and the user can call the 'correct_subject_ids' function to correct them. | check_subject_ids
Clean and standardize data | clean_data
Perform dictionary-based cleaning | clean_using_dictionary
Common strings representing missing values | common_na_strings
Convert numeric to date | convert_numeric_to_date
Convert columns into numeric | convert_to_numeric
Correct misspelled values by using approximate string matching techniques to compare them against the expected values. | correct_misspelled_values
Correct the wrong subject IDs based on the user-provided values. | correct_subject_ids
Identify and return duplicated rows in a data frame or linelist. | find_duplicates
Set and return 'clean_data' default parameters | get_default_params
Generate report from data cleaning operations | print_report
Remove constant data, including empty rows, empty columns, and columns with constant values. | remove_constants
Remove duplicates | remove_duplicates
Replace missing values with 'NA' | replace_missing_values
Scan through a data frame and return the proportion of 'missing', 'numeric', 'Date', 'character', 'logical' values. | scan_data
Standardize column names of a data frame or line list | standardize_column_names
Standardize date variables | standardize_dates
Calculate time span between dates | timespan
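
Several of the help topics listed here can be chained into a small cleaning pass. The following is a minimal sketch, not taken from the package documentation: the toy data frame `raw` is invented for illustration, and the `na_strings` argument name is assumed to match the installed version of cleanepi, so check the individual help pages before relying on it.

```r
library(cleanepi)

# Hypothetical toy line list, invented for illustration only.
raw <- data.frame(
  "Case ID"    = c("A1", "A2", "A2", "A3"),
  "Onset Date" = c("2024-01-03", "-99", "-99", "2024-01-09"),
  "Country"    = c("Gambia", "Gambia", "Gambia", "Gambia"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Harmonize the column names (e.g. "Case ID" becomes a snake_case name).
cleaned <- standardize_column_names(raw)

# Treat the placeholder string "-99" as missing (argument name assumed).
cleaned <- replace_missing_values(cleaned, na_strings = "-99")

# Drop rows that are exact duplicates of an earlier row.
cleaned <- remove_duplicates(cleaned)

# Drop empty rows, empty columns, and columns holding a single constant value.
cleaned <- remove_constants(cleaned)

cleaned
```

Each step takes a data frame and returns a data frame, which is why the calls can be applied one after another; `clean_data` and `print_report` are the listed topics for running the steps together and for rendering the accompanying report.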