A yearly panel of head-to-head match counts for the top 105 ATP players, suitable for Bradley–Terry and stochastic block model analyses. For each season, the data include (i) the number of wins by player \(i\) against player \(j\) and (ii) the number of matches played between \(i\) and \(j\), along with a per-season tibble of player metadata.
Format
A named list of length 23, with elements "2000", "2001", …, "2022".
Each yearly element is a list of length 3:
Y_ij: A numeric matrix \(105 \times 105\). Entry Y_ij[i, j] is the count of matches in which player \(i\) defeated player \(j\) in that calendar year (nonnegative integer; diagonal is zero).
N_ij: A numeric matrix \(105 \times 105\). Entry N_ij[i, j] is the total number of matches between players \(i\) and \(j\) that year (nonnegative integer; symmetric by construction; diagonal is zero).
players_df: A tibble/data frame with 105 rows and 7 columns describing the player index used in the matrices for that year:
  player: Integer player identifier (row/column index used in Y_ij and N_ij).
  worst_rank: Worst (numerically largest) ATP ranking attained by the player during the year.
  median_rank: Median ATP ranking across the player's ranking snapshots in that year.
  last_rank: ATP ranking at the last snapshot available in the year (e.g., year-end ranking).
  age_year: Approximate age (in years) for that season (e.g., at mid-season).
  ht_year: Player height in centimeters (season-level value).
  player_slug: Character identifier (URL-safe or underscored name) for the player.
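The structural properties listed above can be checked mechanically before modelling. The following is a minimal sketch, not part of the package: `toy_year` is a hypothetical three-player stand-in for one yearly element, and `check_year()` is a hypothetical helper. The check `Y_ij <= N_ij` is an assumption that follows from reading N_ij as the binomial denominator for Y_ij.

```r
# Toy stand-in for one yearly element (3 players instead of 105).
# `toy_year` and `check_year()` are hypothetical names for illustration only.
toy_year <- list(
  Y_ij = matrix(c(0, 1, 1,
                  1, 0, 0,
                  0, 0, 0), nrow = 3, byrow = TRUE),
  N_ij = matrix(c(0, 2, 1,
                  2, 0, 0,
                  1, 0, 0), nrow = 3, byrow = TRUE),
  players_df = data.frame(player = 1:3, player_slug = c("a", "b", "c"))
)

check_year <- function(x) {
  n <- nrow(x$players_df)
  stopifnot(
    all(dim(x$Y_ij) == c(n, n)),   # matrices are square, one row per player
    all(dim(x$N_ij) == c(n, n)),
    all(diag(x$Y_ij) == 0),        # zero diagonals
    all(diag(x$N_ij) == 0),
    isSymmetric(unname(x$N_ij)),   # N_ij is symmetric by construction
    all(x$Y_ij >= 0),              # nonnegative counts
    all(x$Y_ij <= x$N_ij)          # assumed: wins cannot exceed matches played
  )
  invisible(TRUE)
}

check_year(toy_year)
```

With the packaged data loaded, the same helper could be applied to each year, for example with `lapply(ATP_2000_2022, check_year)`.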
Source
Aggregated by the package author from public ATP results (e.g., the tennis_atp datasets by Jeff Sackmann) and internal preprocessing. See the package vignette for provenance and cleaning steps.
Details
The player ordering in players_df defines the row/column indexing of
Y_ij and N_ij for the corresponding year. The diagonal entries of both
matrices are zero by definition. In typical usage for Bradley–Terry-type
models, one can treat Y_ij[i, j] as the number of “successes” for
\(i\) vs. \(j\), with the binomial denominator N_ij[i, j].
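One way to act on the binomial reading above is to fit a Bradley–Terry model as a logistic regression with base R's `glm()`. This is a sketch under stated assumptions, not the package's own method: the four-player matrices are made up, the toy `N` is built as `Y + t(Y)` (every toy match has a winner), and player 1 is arbitrarily taken as the reference with ability 0.

```r
# Made-up 4-player example; Y[i, j] = wins of i over j.
Y <- matrix(c(0, 3, 2, 1,
              1, 0, 3, 0,
              1, 2, 0, 3,
              1, 0, 1, 0), nrow = 4, byrow = TRUE)
N <- Y + t(Y)  # toy assumption: every match has exactly one winner

# One row per unordered pair (i < j) that actually played.
pairs  <- which(upper.tri(N) & N > 0, arr.ind = TRUE)
wins   <- Y[pairs]           # successes for player i against player j
losses <- N[pairs] - wins    # remaining matches count as failures

# Design matrix: +1 for player i, -1 for player j, so that
# logit P(i beats j) = ability_i - ability_j.
rows <- seq_len(nrow(pairs))
X <- matrix(0, nrow(pairs), ncol(Y))
X[cbind(rows, pairs[, "row"])] <- 1
X[cbind(rows, pairs[, "col"])] <- -1
X <- X[, -1, drop = FALSE]   # drop player 1: reference ability fixed at 0

fit <- glm(cbind(wins, losses) ~ X - 1, family = binomial())
ability <- setNames(c(0, coef(fit)), paste0("player", seq_len(ncol(Y))))
ability
```

Pairs with no matches are dropped before fitting, so they contribute nothing to the likelihood. On real seasons, players who won or lost all of their matches can produce diverging estimates, which a penalized or Bayesian fit would handle more gracefully.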
Note
The matrices may be sparse for many player pairs. Ensure any model code
guards against divisions by zero when N_ij[i, j] = 0.
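As an illustration of such a guard, the sketch below computes empirical win proportions only for pairs that actually played and leaves the rest as NA. The three-player matrices are made up for the example. In R, `0 / 0` evaluates to NaN rather than raising an error, so an unguarded `Y / N` would silently fill unplayed pairs with NaN.

```r
# Made-up 3-player example: players 1 and 2 met four times, player 3 never played.
Y <- matrix(c(0, 3, 0,
              1, 0, 0,
              0, 0, 0), nrow = 3, byrow = TRUE)
N <- matrix(c(0, 4, 0,
              4, 0, 0,
              0, 0, 0), nrow = 3, byrow = TRUE)

played <- N > 0                                   # logical mask of pairs with data
win_prop <- matrix(NA_real_, nrow(N), ncol(N))    # NA marks "no matches played"
win_prop[played] <- Y[played] / N[played]         # divide only where N > 0

win_prop
```

Using NA instead of NaN makes the missing pairs explicit, and summaries such as `mean(win_prop, na.rm = TRUE)` then skip them by design.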
Examples
if (FALSE) { # \dontrun{
data(ATP_2000_2022)
names(ATP_2000_2022)
year <- "2000"
str(ATP_2000_2022[[year]])
# Player i's total wins against the other listed players that year:
i <- 1
sum(ATP_2000_2022[[year]]$Y_ij[i, ], na.rm = TRUE)
# Total matches between i and j:
j <- 2
ATP_2000_2022[[year]]$N_ij[i, j]
# Inspect player metadata (row order matches the matrix indices):
head(ATP_2000_2022[[year]]$players_df)
} # }