| Title: | High Dimensional Analysis in Linked Spaces |
|---|---|
| Description: | A 'shiny' GUI that performs high dimensional cluster analysis. This tool performs data preparation, clustering and visualisation within a dynamic GUI, with interactive methods that allow the user to change settings without having to leave the GUI. An earlier version of this package was described in Laa and Valencia (2022) <doi:10.1140/epjp/s13360-021-02310-1>. |
| Authors: | Gabriel McCoy [aut, cre] (ORCID: <https://orcid.org/0009-0008-3570-0361>), Ursula Laa [aut] (ORCID: <https://orcid.org/0000-0002-0249-6439>), German Valencia [aut] (ORCID: <https://orcid.org/0000-0001-6600-1290>) |
| Maintainer: | Gabriel McCoy <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-05-21 06:53:31 UTC |
| Source: | https://github.com/gabrielmccoy/pandemonium |
The dataset contains daily counts of bikes rented with corresponding weather and seasonal information. The data is provided by Hadi Fanaee-T and available from https://doi.org/10.24432/C5W894. Additionally, model information from a single hidden layer neural network with eight nodes in the hidden layer has been added: the values of the activations for all observations (variables A1 to A8 in the cluster space) and the model prediction (pred) and residual (res) in the other variables.
Bikes
A list of 4 data frames:
- 731 obs. of 18 variables containing the entire bikes data set
- space1: 731 obs. of 8 variables (cluster space)
- space2: 731 obs. of 6 variables (linked space, predictors used in the model)
- other: 731 obs. of 4 variables (other variables, including observed and predicted counts)
Can be used as the getScore input in pandemonium. Returns chi-squared values as the score and sigma bins as the bins.
chi2Score(cluster, covinv, exp, ...)
| Argument | Description |
|---|---|
| cluster | data frame with the variables in space 1 |
| covinv | inverse covariance matrix from space 1 |
| exp | reference point from space 1 |
| ... | other arguments expected by the getScore interface |
named list containing scores for use in pandemonium
chi2Score(Bikes$space1, solve(cov(Bikes$space1)), data.frame(value = colMeans(Bikes$space1)))
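Under the usual definition, the chi-squared value of an observation x relative to the reference exp is (x - exp)' covinv (x - exp), i.e. its squared Mahalanobis distance. The base-R sketch below computes this for every row. It assumes that definition, leaves out the sigma binning, and uses a hypothetical helper name (`chi2_sketch`) and simulated toy data; it is not the package's own implementation.

```r
# Hypothetical helper, not part of pandemonium: chi-squared value of each row
# of `cluster` relative to the reference point `exp_value`.
chi2_sketch <- function(cluster, covinv, exp_value) {
  # mahalanobis() returns (x - center)' S^-1 (x - center) for every row;
  # inverted = TRUE because covinv is already the inverse covariance matrix.
  stats::mahalanobis(as.matrix(cluster), center = exp_value,
                     cov = covinv, inverted = TRUE)
}

# Simulated toy data standing in for a cluster space.
set.seed(1)
toy <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
covinv <- solve(cov(toy))
chi2 <- chi2_sketch(toy, covinv, colMeans(toy))

# The same quadratic form written out with explicit matrix algebra.
centred <- sweep(as.matrix(toy), 2, colMeans(toy))
chi2_manual <- rowSums((centred %*% covinv) * centred)
```

With the sample mean as reference and the sample covariance, the chi-squared values sum to (n - 1) times the number of variables, which makes a quick sanity check.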
The returned tibble contains the id of the cluster benchmark, the cluster radius and diameter, and group number for each cluster.
getBenchmarkInformation(dmat, groups)
| Argument | Description |
|---|---|
| dmat | distance matrix |
| groups | groups resulting from clustering |
data frame with cluster information
dists <- getDists(Bikes$space1, "euclidean")
fit <- stats::hclust(dists, "ward.D2")
groups <- stats::cutree(fit, k = 4)
getBenchmarkInformation(as.matrix(dists), groups)
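The benchmark, radius and diameter are not defined on this page. The sketch below assumes that the benchmark is the cluster member whose largest distance to the other members is smallest, that the radius is that largest distance, and that the diameter is the largest distance between any two members. The helper `benchmark_sketch` and its column names are hypothetical; it only mirrors the described output, not the package's code.

```r
# Hypothetical sketch, assuming:
#   benchmark = member with the smallest maximum distance to the other members
#   radius    = that smallest maximum distance
#   diameter  = largest distance between any two members of the cluster
benchmark_sketch <- function(dmat, groups) {
  rows <- lapply(sort(unique(groups)), function(g) {
    idx <- which(groups == g)
    within <- dmat[idx, idx, drop = FALSE]
    max_d <- unname(apply(within, 1, max))  # farthest member from each candidate
    best <- which.min(max_d)
    data.frame(id = idx[best], radius = max_d[best],
               diameter = max(within), group = g)
  })
  do.call(rbind, rows)
}

# Toy example: five points on a line forming two clusters.
x <- c(0, 1, 2, 10, 11)
bm <- benchmark_sketch(as.matrix(stats::dist(x)), groups = c(1, 1, 1, 2, 2))
```

In the first toy cluster the middle point (x = 1) is the benchmark, with radius 1 and diameter 2; in the two-point cluster both members tie and `which.min()` picks the first.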
The returned tibble contains the ids of each pair of clusters, with the distance between their benchmarks (d1) and the minimum (d2) and maximum (d3) distances between any points in the two clusters.
getClusterDists(dmat, groups, benchmarks)
| Argument | Description |
|---|---|
| dmat | distance matrix |
| groups | groups resulting from clustering |
| benchmarks | data frame with benchmark id and group number |
data frame with distance information
dists <- getDists(Bikes$space1, "euclidean")
fit <- stats::hclust(dists, "ward.D2")
groups <- stats::cutree(fit, k = 4)
bm <- getBenchmarkInformation(as.matrix(dists), groups)
getClusterDists(as.matrix(dists), groups, bm)
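The three distances can be sketched in base R directly from a distance matrix, given the benchmark ids. This is a minimal illustration under stated assumptions: the helper `cluster_dists_sketch` and the `group1`/`group2` column names are hypothetical, the benchmarks table is assumed to have `id` and `group` columns, and at least two clusters are assumed.

```r
# Hypothetical sketch: for every pair of clusters, the distance between their
# benchmarks (d1) and the minimum (d2) and maximum (d3) distance between any
# point of one cluster and any point of the other.
cluster_dists_sketch <- function(dmat, groups, benchmarks) {
  gs <- sort(unique(groups))
  pairs <- utils::combn(gs, 2)            # one column per cluster pair
  rows <- lapply(seq_len(ncol(pairs)), function(j) {
    g1 <- pairs[1, j]
    g2 <- pairs[2, j]
    between <- dmat[groups == g1, groups == g2, drop = FALSE]
    b1 <- benchmarks$id[benchmarks$group == g1]
    b2 <- benchmarks$id[benchmarks$group == g2]
    data.frame(group1 = g1, group2 = g2, d1 = dmat[b1, b2],
               d2 = min(between), d3 = max(between))
  })
  do.call(rbind, rows)
}

# Toy example: two clusters on a line, benchmarks at x = 1 and x = 10.
x <- c(0, 1, 2, 10, 11)
groups <- c(1, 1, 1, 2, 2)
bm <- data.frame(id = c(2L, 4L), group = c(1, 2))
cd <- cluster_dists_sketch(as.matrix(stats::dist(x)), groups, bm)
```

For the toy data the benchmarks are 9 apart, the closest pair of points (2 and 10) is 8 apart and the farthest pair (0 and 11) is 11 apart.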
Computes the distances between all points.
getDists(coord, metric, user_dist = NULL)
| Argument | Description |
|---|---|
| coord | matrix with the coordinate representation of all points |
| metric | name of the distance metric to be used in stats::dist, or "user" |
| user_dist | user-defined distances, returned when metric = "user" |
distances between all points
getDists(Bikes$space1[1:5, ], "euclidean")
getDists(Bikes$space1[1:5, ], "maximum")
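The behaviour described above, handing a metric name to stats::dist and passing user-supplied distances straight through, can be sketched as a small dispatcher. The helper name `get_dists_sketch` and the error for a missing `user_dist` are hypothetical additions; the real function may differ in detail.

```r
# Hypothetical sketch of the metric dispatch described above.
get_dists_sketch <- function(coord, metric, user_dist = NULL) {
  if (identical(metric, "user")) {
    if (is.null(user_dist)) stop("metric = \"user\" needs user_dist")
    return(user_dist)  # user-supplied distances are passed through unchanged
  }
  # Any other name is handed to stats::dist, e.g. "euclidean", "maximum",
  # "manhattan", "canberra", "binary" or "minkowski".
  stats::dist(coord, method = metric)
}

pts <- matrix(c(0, 0,
                3, 4,
                6, 8), ncol = 2, byrow = TRUE)
d_euc <- as.matrix(get_dists_sketch(pts, "euclidean"))
d_max <- as.matrix(get_dists_sketch(pts, "maximum"))
```

The "maximum" metric takes the largest coordinate difference, so the first two points are 5 apart in Euclidean distance but only 4 apart in maximum distance.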
An interface to generate a specific graph seen when using the GUI. Settings include metric, linkage, k and plotType; for details, see the vignette on using this function.
makePlots( cluster, settings, cov = NULL, covInv = NULL, exp = NULL, linked = NULL, linked.cov = NULL, linked.covInv, linked.exp = NULL, user_dist = NULL, getCoordsSpace1 = normCoords, getCoordsSpace2 = normCoords, getScore = NULL, results = NULL )
| Argument | Description |
|---|---|
| cluster | data frame of variables in cluster space |
| settings | list specifying parameters usually selected in the app |
| cov | covariance matrix for space 1 |
| covInv | inverse covariance matrix for space 1 |
| exp | reference point in space 1 |
| linked | data frame of variables in linked space |
| linked.cov | covariance matrix for space 2 |
| linked.covInv | inverse covariance matrix for space 2 |
| linked.exp | reference point in space 2 |
| user_dist | user-defined distances |
| getCoordsSpace1 | function to calculate coordinates in cluster space |
| getCoordsSpace2 | function to calculate coordinates in linked space |
| getScore | function to calculate scores and bins |
| results | output of makeResults (optional) |
ggplot, plotly or detourr plot depending on settings$plotType
makePlots(
  cluster = Bikes$space1,
  settings = list(
    plotType = "WC", x = "hum", y = "temp", k = 4,
    metric = "euclidean", linkage = "ward.D2", WCa = 0.5, showalpha = TRUE
  ),
  cov = cov(Bikes$space1),
  linked = Bikes$space2,
  getScore = outsideScore(Bikes$other$res, "Residual")
)
makePlots(
  cluster = Bikes$space1,
  settings = list(
    plotType = "tour", k = 4, metric = "euclidean", linkage = "ward.D2",
    tourspace = "space1", colouring = "clustering", out_dim = 2,
    tour_path = "grand", display = "scatter", radial_start = NULL,
    radial_var = NULL, slice_width = NULL, seed = 2025
  ),
  cov = cov(Bikes$space1),
  linked = Bikes$space2,
  getScore = outsideScore(Bikes$other$res, "Residual")
)
Settings are metric, linkage and k. For details, see the vignette on makePlots.
makeResults( cluster, settings, cov = NULL, covInv = NULL, exp = NULL, linked = NULL, linked.cov = NULL, linked.covInv, linked.exp = NULL, user_dist = NULL, getCoordsSpace1 = normCoords, getCoordsSpace2 = normCoords, getScore = NULL )
| Argument | Description |
|---|---|
| cluster | data frame of variables in cluster space |
| settings | list specifying parameters usually selected in the app |
| cov | covariance matrix for space 1 |
| covInv | inverse covariance matrix for space 1 |
| exp | reference point in space 1 |
| linked | data frame of variables in linked space |
| linked.cov | covariance matrix for space 2 |
| linked.covInv | inverse covariance matrix for space 2 |
| linked.exp | reference point in space 2 |
| user_dist | user-defined distances |
| getCoordsSpace1 | function to calculate coordinates in cluster space |
| getCoordsSpace2 | function to calculate coordinates in linked space |
| getScore | function to calculate scores and bins |
list of results to be passed to makePlots
r <- makeResults(
  cluster = Bikes$space1,
  settings = list(k = 4, metric = "euclidean", linkage = "ward.D2"),
  cov = cov(Bikes$space1),
  linked = Bikes$space2,
  getScore = outsideScore(Bikes$other$res, "Residual")
)
makePlots(
  cluster = Bikes$space1,
  settings = list(plotType = "Obs", x = "hum", y = "temp", obs = "A1"),
  cov = cov(Bikes$space1),
  linked = Bikes$space2,
  getScore = outsideScore(Bikes$other$res, "Residual"),
  results = r
)
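The settings read by makeResults (metric, linkage and k) correspond to the distance, hierarchical clustering and tree-cutting steps used in the getBenchmarkInformation example. Below is a minimal base-R sketch of that clustering core, assuming scale-based coordinates like the normCoords default. The helper name `cluster_sketch` is hypothetical, and makeResults also takes linked-space and score inputs that this sketch ignores.

```r
# Hypothetical sketch of the clustering core driven by metric, linkage and k.
cluster_sketch <- function(cluster, settings) {
  coords <- scale(as.matrix(cluster))                     # center and scale
  d <- stats::dist(coords, method = settings$metric)      # pairwise distances
  fit <- stats::hclust(d, method = settings$linkage)      # hierarchical tree
  stats::cutree(fit, k = settings$k)                      # k group labels
}

# Two well separated toy blobs of three points each.
toy <- data.frame(
  u = c(0, 0.1, 0, 5, 5.1, 5),
  v = c(0, 0, 0.1, 5, 5, 5.1)
)
g <- cluster_sketch(toy, list(metric = "euclidean", linkage = "ward.D2", k = 2))
```

Passing the three settings as a list keeps the sketch aligned with how the settings argument is documented above.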
Uses scale to center and scale each variable.
normCoords(df, ...)
| Argument | Description |
|---|---|
| df | data frame |
| ... | other arguments expected by the getCoords interface |
matrix with coordinate representation of all points
head(normCoords(Bikes$space2))
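Centering and scaling with scale turns each column into z-scores (mean 0, standard deviation 1). A minimal sketch of an equivalent coordinate function, with the hypothetical name `norm_coords_sketch` and toy data, is:

```r
# Hypothetical equivalent of normCoords: center each column on its mean and
# divide by its standard deviation.
norm_coords_sketch <- function(df, ...) scale(as.matrix(df))

toy <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
z <- norm_coords_sketch(toy)
```

Because b is a rescaled copy of a, both columns become identical after scaling, which illustrates why scaled coordinates put variables measured in different units on a common footing before distances are computed.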
Can be used as the getScore input in pandemonium to use score values that are computed externally. Returns the score values as the score, and bins indicating whether each score lies below the first quartile, between the first and third quartiles, or above the third quartile.
outsideScore(scores, scoreName = NULL)
| Argument | Description |
|---|---|
| scores | external scores to be passed to the app |
| scoreName | name for scores |
named list containing scores for use in pandemonium
pandemonium(
  df = Bikes$space1,
  linked = Bikes$space2,
  getScore = outsideScore(Bikes$other$res, "Residual")
)
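The quartile binning can be sketched with stats::quantile. The helper name `outside_bins_sketch`, the bin labels and the choice to count values exactly at a quartile as "between" are assumptions for illustration; the package returns a named list rather than a bare factor.

```r
# Hypothetical sketch of the quartile binning described above.
# Values equal to a quartile are counted as "between" (an assumption).
outside_bins_sketch <- function(scores) {
  q <- stats::quantile(scores, probs = c(0.25, 0.75), names = FALSE)
  bins <- ifelse(scores < q[1], "below",
                 ifelse(scores > q[2], "above", "between"))
  factor(bins, levels = c("below", "between", "above"))
}

b <- outside_bins_sketch(1:8)
table(b)  # below: 2, between: 4, above: 2
```

With the default quantile definition, the quartiles of 1:8 are 2.75 and 6.25, so the two smallest and two largest scores fall outside the interquartile range.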
Opens the GUI to cluster the data points and relate the clusters to the values in linked. Coordinates and distances are computed on the fly, or can be supplied in the function call.
pandemonium( df, cov = NULL, is.inv = FALSE, exp = NULL, linked = NULL, linked.cov = NULL, linked.exp = NULL, group = NULL, label = NULL, user_dist = NULL, dimReduction = list(tSNE = tSNE, umap = umap), getCoords = list(normal = normCoords), getScore = NULL )
| Argument | Description |
|---|---|
| df | data frame of data; assumed to be space 1, but variables can be reassigned in the app |
| cov | covariance matrix (optional) |
| is.inv | logical; whether cov is already an inverse covariance matrix (default FALSE) |
| exp | observable reference value (e.g. experimental measurement) |
| linked | data frame assumed to be space 2, but variables can be reassigned in the app |
| linked.cov | covariance matrix for linked (optional) |
| linked.exp | observable reference value for linked (e.g. experimental measurement) |
| group | grouping assignments |
| label | point labels |
| user_dist | input distance matrix (optional) |
| dimReduction | named list of functions used for dimension reduction |
| getCoords | named list containing functions to calculate coordinates |
| getScore | named list containing functions to calculate scores to be plotted as bins and continuous values |
No return value, called to initiate 'shiny' app
Computes coordinates by comparing the observed values to the reference, using the inverse covariance matrix in the same way as when computing the chi-squared loss.
pullCoords(df, covInv, exp, ...)
| Argument | Description |
|---|---|
| df | data frame |
| covInv | inverse covariance matrix |
| exp | reference values |
| ... | other arguments expected by the getCoords interface |
matrix with coordinate representation of all points
head(pullCoords(Bikes$space2, solve(cov(Bikes$space2)), data.frame(value = colMeans(Bikes$space2))))
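One way to build coordinates that use the inverse covariance matrix as in the chi-squared loss is to factor covInv = t(U) %*% U with a Cholesky decomposition and map each centred observation through U. The squared Euclidean length of each resulting row then equals that observation's chi-squared value. Whether pullCoords uses this particular factorisation is an assumption; the helper `pull_coords_sketch` and the toy data are hypothetical.

```r
# Hypothetical sketch: coordinates whose squared length is the chi-squared value.
pull_coords_sketch <- function(df, covInv, exp_value) {
  centred <- sweep(as.matrix(df), 2, exp_value)  # subtract the reference
  U <- chol(covInv)                              # upper triangular, covInv = t(U) %*% U
  centred %*% t(U)                               # row i is (U %*% centred_i)'
}

set.seed(2)
toy <- data.frame(a = rnorm(40), b = rnorm(40))
covInv <- solve(cov(toy))
coords <- pull_coords_sketch(toy, covInv, colMeans(toy))

# Chi-squared values computed directly, for comparison.
chi2 <- stats::mahalanobis(toy, colMeans(toy), covInv, inverted = TRUE)
```

Euclidean distances in these coordinates therefore behave like Mahalanobis distances in the original variables, and with the sample mean and covariance the coordinates are uncorrelated with unit variance.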
Coordinates are computed by centering on the reference value and scaling by the standard deviation. The standard deviation of the ith variable is obtained from the i,ith entry of the covariance matrix (the variance of that variable).
pullCoordsNoCov(df, cov, exp, ...)
| Argument | Description |
|---|---|
| df | data frame |
| cov | covariance matrix |
| exp | reference values |
| ... | other arguments expected by the getCoords interface |
matrix with coordinate representation of all points
head(pullCoordsNoCov(Bikes$space2, cov(Bikes$space2), data.frame(value = colMeans(Bikes$space2))))
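A minimal sketch of this centering and scaling, assuming the standard deviation is taken as the square root of each diagonal entry and that off-diagonal entries are ignored. The helper name `pull_coords_nocov_sketch` is hypothetical, and the reference is passed as a plain vector rather than the data frame used in the example above.

```r
# Hypothetical sketch: center on the reference and divide by the standard
# deviation, i.e. the square root of the covariance diagonal (the variances).
# Off-diagonal covariance entries are ignored.
pull_coords_nocov_sketch <- function(df, cov_mat, exp_value) {
  sds <- sqrt(diag(as.matrix(cov_mat)))
  centred <- sweep(as.matrix(df), 2, exp_value)
  sweep(centred, 2, sds, "/")
}

toy <- data.frame(a = c(1, 3, 5), b = c(10, 10, 40))
z <- pull_coords_nocov_sketch(toy, cov(toy), colMeans(toy))
```

With the column means as reference and the sample covariance, this reproduces scale(); a different reference (e.g. an experimental measurement) shifts the coordinates accordingly.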
Returns the input data frame. This is used when other coordinate computations fail. In general, scaling of the inputs is recommended before clustering.
rawCoords(df, ...)
| Argument | Description |
|---|---|
| df | data frame |
| ... | other arguments expected by the getCoords interface |
Externally calculated coordinates can be used either through userCoords, or by passing them as input data and using rawCoords as the coordinate function. The two approaches differ in how the input data is treated: because pandemonium displays the input data in many plots, using coordinates as input data makes these plots less meaningful for interpretation. Use userCoords when coordinates are needed to calculate distances but the plots of the clustering space should remain interpretable.
matrix with coordinate representation of all points
head(rawCoords(Bikes$space2))
Computes a non-linear dimension reduction using Rtsne with default parameters.
tSNE(dist, ...)
| Argument | Description |
|---|---|
| dist | a distance matrix |
| ... | other arguments expected by the dimReduction interface |
list containing an n x 2 matrix of reduced dimension data in Y
head(tSNE(getDists(Bikes$space1, "euclidean"))$Y)
Computes a non-linear dimension reduction using uwot with default parameters.
umap(dist, ...)
| Argument | Description |
|---|---|
| dist | a distance matrix |
| ... | other arguments expected by the dimReduction interface |
list containing an n x 2 matrix of reduced dimension data in Y
head(umap(getDists(Bikes$space1, "euclidean"))$Y)
Allows the use of externally calculated coordinates in the app. Can only be used when variables are not reassigned between the two spaces.
userCoords(user_coords)
| Argument | Description |
|---|---|
| user_coords | coordinate matrix with the same dimensions as the space it will be used on |
Externally calculated coordinates can be used either through userCoords, or by passing them as input data and using rawCoords as the coordinate function. The two approaches differ in how the input data is treated: because pandemonium displays the input data in many plots, using coordinates as input data makes these plots less meaningful for interpretation. Use userCoords when coordinates are needed to calculate distances but the plots of the clustering space should remain interpretable.
function that returns the user defined coordinates user_coords
pandemonium(
  df = Bikes$space1,
  linked = Bikes$space2,
  getCoords = list(normalised = normCoords, space2 = userCoords(Bikes$space2))
)
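Since userCoords returns a function, a natural way to write it is as a closure that ignores the data it receives and hands back the stored matrix. The sketch below assumes the (df, ...) calling convention of normCoords and rawCoords, and adds a dimension check reflecting the requirement that the matrix match the size of its space; the helper name `user_coords_sketch` and the check itself are hypothetical.

```r
# Hypothetical sketch of a userCoords-style closure.
user_coords_sketch <- function(user_coords) {
  force(user_coords)  # capture the matrix now, not lazily at first call
  function(df, ...) {
    # Assumed check: the stored coordinates must match the space's size.
    stopifnot(nrow(df) == nrow(user_coords), ncol(df) == ncol(user_coords))
    user_coords
  }
}

m <- matrix(c(0.5, -1, 2, 0, 1, 3), ncol = 2)
get_coords <- user_coords_sketch(m)
out <- get_coords(data.frame(x = 1:3, y = 4:6))
```

The closure explains the restriction noted above: the returned coordinates are fixed, so if variables are reassigned between the two spaces, the stored matrix no longer matches the data it is meant to represent.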
Writes the results to a file for working with them outside the app. Settings used: metric, linkage, k.
writeResults( cluster, cov = NULL, covInv = NULL, exp = NULL, linked, linked.cov = NULL, linked.covInv = NULL, linked.exp = NULL, settings, filename, user_dist = NULL, getCoords.space1 = normCoords, getCoords.space2 = rawCoords )
| Argument | Description |
|---|---|
| cluster | cluster space matrix |
| cov | covariance matrix for the cluster space |
| covInv | inverse covariance matrix for the cluster space |
| exp | observable reference value (e.g. experimental measurement) |
| linked | linked space matrix |
| linked.cov | covariance matrix for the linked space |
| linked.covInv | inverse covariance matrix for the linked space |
| linked.exp | observable reference value for the linked space (e.g. experimental measurement) |
| settings | list specifying parameters usually selected in the app |
| filename | path to write the results file to |
| user_dist | input distance matrix (optional) |
| getCoords.space1 | function to calculate coordinates in the clustering space |
| getCoords.space2 | function to calculate coordinates in the linked space |
No return value, called for writing file
file <- tempfile()
writeResults(
  cluster = Bikes$space1,
  linked = Bikes$space2,
  settings = list(metric = "euclidean", linkage = "ward.D2", k = 4),
  filename = file
)
file.remove(file)