Package: cleanepi 1.1.2.9000

Bubacarr Bah

cleanepi: Clean and Standardize Epidemiological Data

A package for cleaning and standardizing tabular data, tailored specifically for curating epidemiological data. It streamlines the data cleaning tasks typically required when working with epidemiological datasets. It returns the processed data in the same format as the input and generates a comprehensive report detailing the outcome of each cleaning task.

Authors: Karim Mané [aut], Thibaut Jombart [ctb], Abdoelnaser Degoot [aut], Bankolé Ahadzie [aut], Nuredin Mohammed [aut], Bubacarr Bah [aut, cre], Hugo Gruson [ctb, rev], Pratik R. Gupte [rev], James M. Azam [rev], Joshua W. Lambert [rev, ctb], Chris Hartgerink [rev], Andree Valle-Campos [rev, ctb], London School of Hygiene and Tropical Medicine, LSHTM [cph], data.org [fnd]

cleanepi_1.1.2.9000.tar.gz
cleanepi_1.1.2.9000.zip (r-4.7), cleanepi_1.1.2.9000.zip (r-4.6), cleanepi_1.1.2.9000.zip (r-4.5)
cleanepi_1.1.2.9000.tgz (r-4.6-any), cleanepi_1.1.2.9000.tgz (r-4.5-any)
cleanepi_1.1.2.9000.tar.gz (r-4.7-any), cleanepi_1.1.2.9000.tar.gz (r-4.6-any)
cleanepi_1.1.2.9000.tgz (r-4.6-emscripten)

# Install 'cleanepi' in R:

install.packages('cleanepi', repos = c('https://epiverse-trace.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/epiverse-trace/cleanepi/issues

Pkgdown/docs site: https://epiverse-trace.github.io

Datasets:

common_na_strings - Common strings representing missing values

On CRAN:

Topics: data-cleaning, epidemiology, epiverse

7.45 score | 12 stars | 49 scripts | 413 downloads | 21 exports | 40 dependencies

Last updated from: 6fb22d1aa5. Checks: 9 OK. Indexed: yes.

Target	Result	Time
linux-devel-x86_64	OK	179
source / vignettes	OK	190
linux-release-x86_64	OK	168
macos-release-arm64	OK	99
macos-oldrel-arm64	OK	93
windows-devel	OK	112
windows-release	OK	121
windows-oldrel	OK	119
wasm-release	OK	158

Exports: %>% add_to_dictionary add_to_report check_date_sequence check_subject_ids clean_data clean_using_dictionary convert_numeric_to_date convert_to_numeric correct_misspelled_values correct_subject_ids find_duplicates get_default_params print_report remove_constants remove_duplicates replace_missing_values scan_data standardize_column_names standardize_dates timespan

Dependencies: backports bit bit64 checkmate cli clipr cpp11 crayon dplyr forcats generics glue hms janitor lifecycle linelist lubridate magrittr matchmaker numberize pillar pkgconfig prettyunits progress purrr R6 readr rlang snakecase stringi stringr tibble tidyr tidyselect timechange tzdb utf8 vctrs vroom withr

Introduction to cleanepi

An overview | General data cleaning tasks | Using {cleanepi} functionalities with pipe operators | Printing the report | Specific data cleaning tasks | Remove constant data | Cleaning column names | Replacing missing entries with NA | Standardizing Dates | Standardizing subject IDs | Detecting incorrect, duplicated, and missing subject IDs | Correct wrong subject ids | Checking date sequence | Converting character columns into numeric | Converting numeric values into date | Finding duplicated rows | Removing duplicates | Dictionary based data substituting | Correct misspelled values | Calculating time span in different time scales (“years”, “months”, “weeks”, or “days”)

Rendered from cleanepi.Rmd using knitr::rmarkdown

Last update: 2025-07-15
Started: 2023-03-07

Package Design vignette for {cleanepi}

Rendered from design_principle.Rmd using knitr::rmarkdown

Last update: 2025-07-08
Started: 2024-01-20

Help page	Topics
Add an element to the data dictionary	add_to_dictionary
Add an element to the report object	add_to_report
Checks whether the order in a sequence of date events is chronological.	check_date_sequence
Check whether the subject IDs comply with the expected format. When incorrect IDs are found, the function sends a warning and the user can call the 'correct_subject_ids' function to correct them.	check_subject_ids
Clean and standardize data	clean_data
Perform dictionary-based cleaning	clean_using_dictionary
Common strings representing missing values	common_na_strings
Convert numeric to date	convert_numeric_to_date
Convert columns into numeric	convert_to_numeric
Correct misspelled values by using approximate string matching techniques to compare them against the expected values.	correct_misspelled_values
Correct the wrong subject IDs based on the user-provided values.	correct_subject_ids
Identify and return duplicated rows in a data frame or linelist.	find_duplicates
Set and return 'clean_data' default parameters	get_default_params
Generate report from data cleaning operations	print_report
Remove constant data, including empty rows, empty columns, and columns with constant values.	remove_constants
Remove duplicates	remove_duplicates
Replace missing values with 'NA'	replace_missing_values
Scan through a data frame and return the proportion of 'missing', 'numeric', 'Date', 'character', 'logical' values.	scan_data
Standardize column names of a data frame or line list	standardize_column_names
Standardize date variables	standardize_dates
Calculate time span between dates	timespan
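
Several of the help topics above can be chained with the exported %>% pipe. The following is a minimal sketch, not taken from the package documentation: the toy data frame `raw` and its column names are hypothetical, and the `na_strings` argument name is assumed from the package manual, so check the help pages before relying on it.

```r
library(cleanepi)

# Hypothetical toy line list, invented for illustration only
raw <- data.frame(
  "Case ID"    = c("A1", "A2", "A2", "A3"),
  "Onset Date" = c("2024-01-02", "-99", "-99", "2024-01-05"),
  Country      = c("Gambia", "Gambia", "Gambia", "Gambia"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

cleaned <- raw %>%
  standardize_column_names() %>%              # tidy up the column names
  replace_missing_values(na_strings = "-99") %>%  # treat "-99" as NA (assumed argument name)
  remove_constants() %>%                      # drop empty rows/columns and constant columns
  remove_duplicates()                         # drop duplicated rows

print(cleaned)
```

Each step takes a data frame and returns a data frame, which is what makes the pipe form convenient. According to the package description, the cleaning outcomes are also recorded in a report, which print_report is listed as generating.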
