R package

The gerda R package provides tools to download and work with GERDA datasets directly in R. Current CRAN version: 0.6.0; development version (GitHub): 0.7.1. As of v0.7, the package exposes 46 datasets covering local, state, federal, mayoral, Landrat (county executive), European Parliament, and county (Kreistag) elections, including federal and state results at the constituency (Wahlkreis) level, plus crosswalks and covariates. Federal county-level data goes back to 1953; the other election families extend through 2026.

Python users

A lightweight Python loader is also available: gerda on PyPI (source: hhilbig/gerda-py). It exposes three functions, gerda.load(name), gerda.datasets(), and gerda.party_crosswalk(...), and returns pandas DataFrames (or, optionally, polars DataFrames). The bundled covariate and Census merge helpers are not yet ported; use the R package for those.

pip install gerda

import gerda
df = gerda.load("federal_cty_harm")

Installation

# Install from CRAN
install.packages("gerda")

# Or install development version from GitHub
devtools::install_github("hhilbig/gerda")

Main Functions

Data Loading

gerda_data_list(print_table = TRUE): Lists all available GERDA datasets with descriptions.
- print_table: If TRUE (default), prints a formatted table and invisibly returns a tibble. If FALSE, returns the tibble directly.
load_gerda_web(file_name, verbose = FALSE, file_format = "rds", on_error = "warn", timeout = 300, max_retries = 2, cache = FALSE, refresh = FALSE): Loads a GERDA dataset from the web.
- file_name: Dataset name (see gerda_data_list() for options).
- verbose: Print loading messages (default FALSE).
- file_format: File format to download, "rds" or "csv" (default "rds").
- on_error: How to handle failures (unknown name, download error, corrupt file). "warn" (default) emits a warning and returns NULL; "stop" raises an error, which is the safer choice inside scripts and pipelines. The global default can be changed with options(gerda.on_error = "stop").
- timeout: Download timeout in seconds (default 300). Larger municipality panels can need more on slow connections.
- max_retries: Extra download attempts after the first, with backoff (default 2). GERDA files are served through Git LFS; a download that returns the LFS pointer instead of the data is now caught and retried.
- cache: If TRUE, downloaded datasets are cached on disk and reused on later calls instead of being re-downloaded (default FALSE). refresh = TRUE forces a fresh download.
- Includes fuzzy matching for file names and suggests close matches if an exact match isn’t found.
- Party vote-share columns (cdu, spd, etc.) are fractions of valid votes and do not sum to 1 across named major parties: the remainder sits in smaller-party columns and an other category.
clear_gerda_cache() and gerda_cache_dir(): Clear and locate the on-disk download cache used when cache = TRUE. The cache lives under tools::R_user_dir("gerda", "cache") and is only written when caching is switched on.
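
For scripts and pipelines, the arguments above can be combined so that a failed load never passes silently. The following is a minimal sketch, not part of the package: load_strict() is a hypothetical wrapper, and it assumes the installed version supports the on_error, cache, and refresh arguments documented above (older releases may not).

```r
# Hypothetical helper (not part of gerda): load a dataset strictly, using the
# on-disk cache first and forcing one fresh download if that attempt fails.
load_strict <- function(name) {
  if (!requireNamespace("gerda", quietly = TRUE)) {
    stop("Package 'gerda' is not installed.")
  }

  fetch <- function(refresh) {
    gerda::load_gerda_web(
      name,
      on_error = "stop",  # raise an error instead of returning NULL
      cache    = TRUE,    # reuse a previously downloaded copy if present
      refresh  = refresh  # TRUE bypasses the cached copy
    )
  }

  tryCatch(
    fetch(refresh = FALSE),
    error = function(e) {
      message("First attempt failed (", conditionMessage(e),
              "); forcing a fresh download")
      fetch(refresh = TRUE)  # a second failure propagates to the caller
    }
  )
}

# Usage (requires network access):
# federal <- load_strict("federal_cty_harm")
```

The retry with refresh = TRUE is a design choice aimed at a corrupt cached file; for an unknown dataset name the second attempt simply fails again and the error reaches the caller.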

Covariates (INKAR county-level, 1995–2022)

add_gerda_covariates(election_data): Appends 30 INKAR county-level socioeconomic indicators (demographics, economy, labour market, education, income, healthcare, childcare, housing, transport, public finances) to county- or municipality-level election data. On municipality data, all municipalities in the same Kreis receive identical covariate values. Coverage is strongest for 1998–2021; several newer indicators are only available for more recent years. See gerda_covariates_codebook() for per-variable coverage.
gerda_covariates(): Returns the raw covariate data as a standalone tibble for manual merging.
gerda_covariates_codebook(): Returns the codebook with variable descriptions, original INKAR codes, and missing-data rates.
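
To make the municipality case concrete, here is a toy base-R sketch of a county-level join. It is not the package's implementation: the covariate name unemployment_rate and all values are made up for illustration, and the sketch relies only on the fact that the first five digits of an eight-digit AGS identify the Kreis.

```r
# Toy municipality-level election rows (8-digit AGS); values are illustrative.
muni <- data.frame(
  ags           = c("01001000", "01051001", "01051002"),
  election_year = 2021,
  stringsAsFactors = FALSE
)

# The first five digits of the AGS identify the county (Kreis).
muni$county_code <- substr(muni$ags, 1, 5)

# Toy county-level covariates; unemployment_rate is a hypothetical column.
county_cov <- data.frame(
  county_code       = c("01001", "01051"),
  election_year     = 2021,
  unemployment_rate = c(7.9, 5.4),
  stringsAsFactors  = FALSE
)

# Left join: every municipality inherits the value of its county.
merged <- merge(muni, county_cov,
                by = c("county_code", "election_year"),
                all.x = TRUE)
print(merged)
```

Both municipalities in county 01051 end up with the same unemployment_rate, which is why county-level covariates cannot explain variation between municipalities within a Kreis.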

Census 2022 (Zensus, municipality-level)

add_gerda_census(election_data): Appends 14 Zensus 2022 indicators (population and age structure, migration background, household size, housing) to county- or municipality-level election data. The census is a single 2022 snapshot, so values do not vary across election years; analyses relying on within-unit variation in these variables are not supported. For county-level data, municipality values are aggregated up using population-weighted means for shares and sums for counts. Most indicators have above 95% municipality coverage; avg_household_size_census22 is missing for roughly 12.5% of municipalities because Destatis suppresses small-cell values.
gerda_census(): Returns the raw census data as a standalone tibble.
gerda_census_codebook(): Returns the codebook with variable descriptions and coverage notes.
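
The county-level aggregation rule described above (population-weighted means for shares, sums for counts) can be illustrated on toy data. This is a sketch of the rule only, not the package's code; the column names population and share_65plus and all values are hypothetical.

```r
# Toy municipality-level census rows; names and values are illustrative.
muni <- data.frame(
  county_code  = c("01051", "01051", "01053"),
  population   = c(1000, 3000, 2000),
  share_65plus = c(0.30, 0.20, 0.25),
  stringsAsFactors = FALSE
)

# Aggregate each county: counts are summed, shares are averaged with
# population weights so that larger municipalities count for more.
county <- do.call(rbind, lapply(split(muni, muni$county_code), function(d) {
  data.frame(
    county_code  = d$county_code[1],
    population   = sum(d$population),
    share_65plus = weighted.mean(d$share_65plus, w = d$population),
    stringsAsFactors = FALSE
  )
}))
print(county)
```

For county 01051 the weighted share is (0.30 * 1000 + 0.20 * 3000) / 4000 = 0.225, whereas an unweighted mean would give 0.25.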

Party Mapping

party_crosswalk(party_gerda, destination): Maps GERDA party names to corresponding values from the ParlGov database.
- party_gerda: Character vector of party names.
- destination: Target column name from ParlGov crosswalk.

Usage Examples

library(gerda)

# List available datasets
gerda_data_list()

# Load harmonized municipal election data
municipal <- load_gerda_web("municipal_harm", verbose = TRUE)

# Load federal county data with socioeconomic covariates
# (native pipe |> requires R >= 4.1; %>% would need magrittr or dplyr attached)
federal_county <- load_gerda_web("federal_cty_harm") |>
  add_gerda_covariates()

# View covariate definitions
gerda_covariates_codebook()

# Map party names to ParlGov
party_crosswalk(c("cdu_csu", "spd", "gruene"), "party_name_english")
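
Because party vote-share columns are fractions of valid votes, the named major parties do not sum to 1 on their own. A toy check, with made-up values and a hypothetical other column standing in for all smaller parties combined:

```r
# Toy vote shares (fractions of valid votes); values are illustrative.
shares <- data.frame(
  county_code = c("01001", "01051"),
  cdu    = c(0.28, 0.35),
  spd    = c(0.30, 0.25),
  gruene = c(0.20, 0.12),
  other  = c(0.22, 0.28),  # hypothetical remainder column
  stringsAsFactors = FALSE
)

major <- c("cdu", "spd", "gruene")

# Sum over the named major parties only: strictly below 1.
shares$major_sum <- rowSums(shares[, major])

# Adding the remainder brings each row back to 1 (up to rounding).
shares$total <- shares$major_sum + shares$other
print(shares)
```

In real GERDA data the remainder is spread over several smaller-party columns as well as the other category, so all of them need to be included before checking that a row sums to 1.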

Deprecations

As of v0.6, federal_cty_unharm exposes both the upstream columns (ags, year) and the canonical GERDA county-level names (county_code, election_year). The ags and year aliases will be removed in v0.8. New code should use county_code and election_year, which match the rest of the county-level datasets and work directly with add_gerda_covariates().
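
Code that still refers to ags and year can be made forward-compatible by normalising the key columns once, right after loading. The helper below is a hypothetical sketch, not part of the package, and is shown on a toy data frame:

```r
# Hypothetical helper: ensure the canonical key names are present and the
# legacy aliases (ags, year) are gone, whichever combination the input has.
use_canonical_keys <- function(df) {
  legacy <- c(ags = "county_code", year = "election_year")
  for (old in names(legacy)) {
    new <- legacy[[old]]
    if (new %in% names(df)) {
      # Canonical column already present: drop the legacy alias, if any.
      df[[old]] <- NULL
    } else if (old %in% names(df)) {
      # Only the legacy column exists: rename it.
      names(df)[names(df) == old] <- new
    }
  }
  df
}

# Toy input with legacy names only; values are illustrative.
legacy_df <- data.frame(
  ags  = c("01001", "01051"),
  year = c(2017, 2021),
  cdu  = c(0.31, 0.35),
  stringsAsFactors = FALSE
)
print(names(use_canonical_keys(legacy_df)))
```

Downstream code can then rely on county_code and election_year regardless of whether the aliases are still present in the loaded data.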

Documentation

Feedback

Feedback is welcome. Please email hhilbig@ucdavis.edu or open an issue on GitHub.