Title: | Generalized Dissimilarity Modeling |
---|---|
Description: | A toolkit with functions to fit, plot, summarize, and apply Generalized Dissimilarity Models. Mokany K, Ware C, Woolley SNC, Ferrier S, Fitzpatrick MC (2022) <doi:10.1111/geb.13459> Ferrier S, Manion G, Elith J, Richardson K (2007) <doi:10.1111/j.1472-4642.2007.00341.x>. |
Authors: | Matt Fitzpatrick [aut, cre] , Karel Mokany [aut] , Glenn Manion [aut], Diego Nieto-Lugilde [aut] , Simon Ferrier [aut] , Roozbeh Valavi [ctb], Matthew Lisk [ctb], Chris Ware [ctb], Skip Woolley [ctb], Tom Harwood [ctb] |
Maintainer: | Matt Fitzpatrick <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.6.0-3 |
Built: | 2024-11-25 15:21:19 UTC |
Source: | https://github.com/fitzlab-al/gdm |
Generalized Dissimilarity Modeling is a statistical technique for modelling variation in biodiversity between pairs of geographical locations or through time. The gdm package provides functions to fit, evaluate, summarize, and plot Generalized Dissimilarity Models and to make predictions (across space and/or through time) and map biological patterns by transforming environmental predictor variables.
The functions in the gdm package provide the tools necessary for fitting GDMs, including functions to prepare biodiversity and environmental data. Major functionality includes:
Formatting various types of biodiversity and environmental data to gdm's site-pair format used in model fitting
Fitting GDMs using geographic and environmental distances between sites
Plotting fitted functions & extracting I-spline values
Estimating predictor importance using matrix permutation and predictor contributions using deviance partitioning
Using cross-validation to evaluate models
Predicting pairwise dissimilarities between sites or times, transforming environmental predictors to biological importance, and mapping these patterns.
To see the preferred citation for the package, type citation("gdm").
GDM fits biological distances to pairwise site geographical and environmental distances. Most users will need to first format their data to gdm's site-pair table format:
formatsitepair | To convert biodiversity and environmental data to site-pair format |
gdm | To fit a GDM |
gdm.crossvalidation | To evaluate a GDM |
gdm.partition.deviance | To assess predictor contributions to deviance explained |
gdm.varImp | To assess model significance and predictor importance |
summary | To summarize a GDM |
predict | To predict biological dissimilarities between sites in space or between time periods |
gdm.transform | To transform each environmental predictor to biological importance |
plot | To plot model fit and I-splines |
isplineExtract | To extract I-spline values to allow for custom plotting |
plotUncertainty | To estimate and plot model sensitivity using bootstrapping |
The gdm development team is Matt Fitzpatrick and Karel Mokany. The R package is based on code originally developed by Glenn Manion under the direction of Simon Ferrier. Where others have contributed to individual functions, credits are provided in function help pages.
The maintainer of the R version of gdm is Matt Fitzpatrick <[email protected]>.
Calculate GDM deviance for observed and predicted dissimilarities. Can be used for assessing cross-validation data. Translated from the C++ function CalcGDMDevianceDouble() in the file NNLS_Double.cpp from the GDM R package.
calculate.gdm.deviance(predDiss, obsDiss)
predDiss |
(float) A vector of predicted dissimilarity values, of same length as obsDiss. |
obsDiss |
(float) A vector of observed dissimilarity values, of same length as predDiss. |
A single value (float) being the deviance.
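As a rough illustration of what such a deviance measures, the base R sketch below computes a binomial-type deviance between observed and predicted dissimilarities. The helper name binomial_deviance_sketch, the example vectors, and the treatment of zero values are assumptions made for illustration; the package's own calculate.gdm.deviance may handle edge cases differently.

```r
# Hypothetical helper (not part of gdm): a binomial-type deviance between
# observed and predicted dissimilarities. Predictions are assumed to lie
# strictly between 0 and 1.
binomial_deviance_sketch <- function(predDiss, obsDiss) {
  stopifnot(length(predDiss) == length(obsDiss))
  # x * log(x / y), with the convention that the term is 0 when x == 0
  xlogxy <- function(x, y) ifelse(x == 0, 0, x * log(x / y))
  2 * sum(xlogxy(obsDiss, predDiss) + xlogxy(1 - obsDiss, 1 - predDiss))
}

# Invented values for three site pairs
obs  <- c(0.20, 0.50, 0.90)
pred <- c(0.25, 0.45, 0.80)

dev_same <- binomial_deviance_sketch(obs, obs)   # perfect predictions
dev_diff <- binomial_deviance_sketch(pred, obs)  # imperfect predictions
```

Under these assumptions the deviance is zero when predictions match the observations exactly and grows as they diverge.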
This function takes input biological data and environmental,
geographic, and other predictor data and builds a site-pair table required
for fitting a Generalized Dissimilarity Model using the gdm
function. NOTE: x-y coordinates of sites MUST be present in either the
biological or the environmental data. Site coordinates ideally should be in a
projected coordinate system (i.e., not longitude-latitude) to ensure proper
calculation of geographic distances.
The input biological data can be in one of the following four formats. Note
that the general term "species" is used, but any classification of biological
entities (e.g. functional types, haplotypes, etc) can be used as long as an
appropriate distance metric is also supplied (see "dist" argument):
site-by-species matrix
x, y, species list
site-by-site biological distance (dissimilarity) matrix
an existing site-pair table (see Details)
Predictor data can be provided in three formats:
a site-by-predictor matrix with a column for each predictor variable
and a row for each site
a terra object SpatRaster, with one raster for each predictor variable
one or more site-by-site distance matrices using the "distPreds" argument (see below).
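To make the first two biological formats concrete, the sketch below builds a toy x, y, species list and converts it to the equivalent site-by-species table using base R only. All values and the column names site, Long, Lat, and species are invented for illustration.

```r
# Toy x, y, species list (the layout used with bioFormat = 2)
sppList <- data.frame(
  site    = c(1, 1, 2, 2, 3),
  Long    = c(10, 10, 20, 20, 30),
  Lat     = c(5, 5, 15, 15, 25),
  species = c("spA", "spB", "spA", "spC", "spB")
)

# Equivalent site-by-species presence-absence table (the layout used with
# bioFormat = 1): one row per site, one column per species
pa <- as.data.frame.matrix(table(sppList$site, sppList$species))
pa[pa > 0] <- 1
siteBySpp <- data.frame(site = as.numeric(rownames(pa)), pa, row.names = NULL)

# A matching site-by-predictor table: one row per site, one column per predictor
envToy <- data.frame(
  site = c(1, 2, 3),
  Long = c(10, 20, 30),
  Lat  = c(5, 15, 25),
  Temp = c(12.1, 15.3, 18.0)
)
```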
formatsitepair(bioData, bioFormat, dist="bray", abundance=FALSE, siteColumn=NULL, XColumn, YColumn, sppColumn=NULL, abundColumn=NULL, sppFilter=0, predData, distPreds=NULL, weightType="equal", custWeights=NULL, sampleSites=1, verbose=FALSE)
bioData |
The input biological (the response variable) data table, in one of the four formats defined above (see Details). |
bioFormat |
An integer code specifying the format of bioData. Acceptable values are 1, 2, 3, or 4 (see Details). |
dist |
Default = "bray". A character code indicating the metric to
quantify pairwise site distances / dissimilarities. Calls the
|
abundance |
Default = FALSE. Indicates whether the biological data are abundance data (TRUE) or presence-absence (0, 1) data (FALSE). |
siteColumn |
The name of the column in either the biological or environmental data table containing a unique site identifier. If a site column is provided in both the biological and environmental data, the site column name must be the same in both tables. |
XColumn |
The name of the column containing x-coordinates of sites. X-coordinates can be provided in either the biological or environmental data tables, but MUST be in at least one of them. If an x-coordinate column is provided in both the biological and environmental data, the column name must be identical. Site coordinates ideally should be in a projected coordinate system (i.e., not longitude-latitude) to ensure proper calculation of geographic distances. Note that if you are using rasters, they must be in the same coordinate system as the site coordinates. |
YColumn |
The name of the column containing y-coordinates of sample sites. Y-coordinates can be provided in either the biological or environmental data tables, but MUST be in at least one of them. If a y-coordinate column is provided in both the biological and environmental data, the column name must be identical. Site coordinates ideally should be in a projected coordinate system (i.e., not longitude-latitude) to ensure proper calculation of geographic distances. Note that if you are using rasters, they must be in the same coordinate system as the site coordinates. |
sppColumn |
Only used if bioFormat = 2 (x, y, species list). The name of the column containing unique name / identifier for each species. |
abundColumn |
If abundance = TRUE, this parameter identifies the column containing the measure of abundance at each site. Only used if bioFormat = 2 (i.e., x, y, species list), though in the case of abundance data, the table would have four columns: x, y, species, abundance. |
sppFilter |
Default = 0. To account for limited sampling effort at some sites, sppFilter removes all sites at which the number of recorded species (i.e., observed species richness) is less than the specified value. For example, if sppFilter = 5, all sites with fewer than 5 recorded species will be removed. |
predData |
The environmental predictor data. Accepts either a site-by-predictor table or a terra object SpatRaster. |
distPreds |
An optional list of distance matrices to be used as predictors in combination with predData. For example, a site-by-site dissimilarity matrix for one biological group (e.g., trees) can be used as a predictor for another group (e.g., ferns). Each distance matrix must have as the first column the names of the sites (therefore the matrix will not be square). The name of the column containing the site names should have the same name as that provided for the siteColumn argument. Site IDs are required here to ensure correct ordering of sites in the construction of the site-pair table. Note that the formatsitepair function will not accept distance matrices as the only predictors (i.e., at least one additional, non-distPreds predictor variable is required). If you wish to fit GDM using only distance matrices provided using distPreds, provide one fake predictor (e.g., with all sites having the same value), plus site and coordinate columns. The s1 and s2 columns for this fake variable can then be removed by hand before fitting the GDM. |
weightType |
Default = "equal". Defines the weighting for sites. Can be either: (1) "equal" (weights for all sites set = 1), (2) "richness" (each site weighted according to number of species recorded), or (3) "custom" (user defined). If weightType="custom", the user must provide a vector of site weights equal to the number of rows in the full site-pair table (i.e., before species filtering (sppFilter argument) or sub-sampling is taken into account (sampleSites argument)). |
custWeights |
A two-column matrix or data frame of user-defined site weights. The first column should be the site name and should be named the same as that provided for the siteColumn argument. The second column should be numeric weight values and should be named "weights". The weight values represent the importance of each site in model fitting, and each value in the output site-pair table is the average of the weights of the two sites in that site pair. Required when weightType = "custom". Ignored otherwise. |
sampleSites |
Default = 1. A number between 0-1 indicating the fraction of sites to be used to construct the site-pair table. This argument can be used to reduce the number of sites to overcome possible memory limitations when fitting models with very large numbers of sites. |
verbose |
Default = FALSE. If TRUE, summary of information regarding dimensions of the site-pair table will be printed that can be useful for diagnostics. |
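The custWeights layout described above can be sketched in base R as follows. The site IDs and weight values are invented, the site ID column is assumed to be called "site", and the pair-level averaging is reproduced by hand only to show the arithmetic, not how the package computes it internally.

```r
# Hypothetical site weights: first column is the site ID, second is "weights"
custWeights <- data.frame(
  site    = c(1, 2, 3),
  weights = c(1.0, 0.5, 2.0)
)

# Each site pair receives the average of the weights of its two sites
pairs <- t(combn(custWeights$site, 2))   # rows: (1,2), (1,3), (2,3)
pairWeights <- data.frame(
  s1 = pairs[, 1],
  s2 = pairs[, 2],
  weights = (custWeights$weights[match(pairs[, 1], custWeights$site)] +
             custWeights$weights[match(pairs[, 2], custWeights$site)]) / 2
)
```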
bioData and bioFormat: The function accepts biological data in the following formats:
bioData = site-by-species matrix; bioFormat = 1: assumes that the response data are provided with a site ID column (specified by siteColumn) and, optionally, two columns for the x & y coordinates of the sites. All remaining columns contain the biological data, with a column for each biological entity (most commonly species). In the case that a raster stack (a terra object SpatRaster) is provided for the environmental data (predData), x-y coordinates MUST be provided in bioData to allow extraction of the environmental data at site locations. The x-y coordinates will be intersected with the raster stack and, if the number of unique cells intersected by the points is less than the number of unique site IDs (i.e. multiple sites fall within a single cell), the function will use the raster cell as the site ID and aggregate sites accordingly. Therefore, model fitting will be sensitive to raster cell size. If the environmental data are in tabular format, they should have the same number of sites (i.e., same number of rows) as bioData. The x-y coordinate and site ID columns must have the same names in bioData and predData.
bioData = x, y, species list (optionally a fourth column with abundance can be provided); bioFormat = 2: assumes a table of 3 or 4 columns, the first two being the x & y coordinates of species records, the third (sppColumn) being the name / identifier of the species observed at that location, and optionally a fourth column indicating a measure of abundance. If an abundance column is not provided, presence-only data are assumed. In the case that a raster stack (a terra object SpatRaster) is provided for the environmental data (predData), the x-y coordinates will be intersected with the raster stack and, if the number of unique cells intersected by the points is less than the number of unique site IDs (i.e. multiple sites fall within a single cell), the function will use the raster cell as the site ID and aggregate sites accordingly. Therefore, model fitting will be sensitive to raster cell size.
bioData = site-site distance (dissimilarity) matrix; bioFormat = 3: this option allows the use of an existing site-site distance (dissimilarity) matrix, such as a genetic distance matrix calculated outside of the gdm package. Only the lower triangle of the matrix is required to create the site-pair table, and the function automatically removes the upper triangle if present. The code checks and aligns the order of sites in the distance matrix and the predictor data to ensure they match. To do so, (1) a site column is required in both the distance matrix and the predictor data and (2) site IDs must be numeric. This is the only bioFormat in which the environmental data MAY NOT be a raster stack.
bioData = site-pair table; bioFormat = 4: with an already created site-pair table, this option allows the user to add one or more distance matrices (see distPreds above) to the existing site-pair table and/or sub-sample the site-pair table (see sampleSites above). If the site-pair table was not created using the formatsitepair function, the user will need to ensure the order of the sites matches that in other tables being provided to the function.
NOTES: (1) The function assumes that the x-y coordinates and the raster stack (if used) are in the same coordinate system. No checking is performed to confirm this is the case. (2) The function assumes that the association between the provided site and x-y coordinate columns is singular and unique. Therefore, the function will fail if a given site has more than one set of coordinates associated with it, or if multiple sites are given exactly the same coordinates.
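Because the function fails when the site-to-coordinate association is not one-to-one, it can help to check this before calling it. The helper check_site_coords below is hypothetical (it is not part of gdm), and the toy records are invented; it is only a sketch of one way to find offending sites and coordinates in base R.

```r
# Hypothetical pre-check (not part of gdm): every site ID should map to exactly
# one coordinate pair, and every coordinate pair to exactly one site ID
check_site_coords <- function(df, siteColumn, XColumn, YColumn) {
  u <- unique(df[, c(siteColumn, XColumn, YColumn)])
  coordKey <- paste(u[[XColumn]], u[[YColumn]], sep = "_")
  list(
    sitesWithSeveralCoords = unique(u[[siteColumn]][duplicated(u[[siteColumn]])]),
    coordsSharedBySites    = unique(coordKey[duplicated(coordKey)])
  )
}

# Toy records: site 1 has two coordinate sets; sites 3 and 4 share coordinates
recs <- data.frame(
  site = c(1, 1, 2, 3, 4),
  Long = c(10, 11, 20, 30, 30),
  Lat  = c(5, 5, 15, 25, 25)
)
problems <- check_site_coords(recs, "site", "Long", "Lat")
```

Both elements of the returned list should be empty before the data are passed on.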
A formatted site-pair table containing the response (biological distance or dissimilarity), predictors, and weights as required for fitting Generalized Dissimilarity Models.
## tabular data
# start with the southwest data table
head(southwest)
sppData <- southwest[c(1,2,13,14)]
envTab <- southwest[c(2:ncol(southwest))]

#########table type 1
## site-species table without coordinates
testData1a <- reshape2::dcast(sppData, site~species)
##site-species table with coordinates
coords <- unique(sppData[, 2:ncol(sppData)])
testData1b <- merge(testData1a, coords, by="site")
## site-species, table-table
exFormat1a <- formatsitepair(testData1a, 1, siteColumn="site", XColumn="Long",
                             YColumn="Lat", predData=envTab)

# next, let's try environmental raster data
## not run
# rastFile <- system.file("./extdata/swBioclims.grd", package="gdm")
# envRast <- terra::rast(rastFile)

## site-species, table-raster
## not run
# exFormat1b <- formatsitepair(testData1b, 1, siteColumn="site", XColumn="Long",
#                              YColumn="Lat", predData=envRast)

#########table type 2
## site xy spp list, table-table
exFormat2a <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                             sppColumn="species", siteColumn="site", predData=envTab)

## site xy spp list, table-raster
## not run
# exFormat2b <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
#                              sppColumn="species", siteColumn="site", predData=envRast)

#########table type 3
## It is possible to format a site-pair table by starting
# with a pre-calculated matrix of biological distances
dim(gdmDissim) # pairwise distance matrix + 1 column for site IDs
gdmDissim[1:5, 1:5]
# now we can format the table:
exFormat3 <- formatsitepair(gdmDissim, 3, XColumn="Long", YColumn="Lat",
                            predData=envTab, siteColumn="site")

#########table type 4
## adds a predictor matrix to an existing site-pair table, in this case,
## predData needs to be provided, but is not actually used
exFormat4 <- formatsitepair(exFormat2a, 4, predData=envTab, siteColumn="site",
                            distPreds=list(as.matrix(gdmDissim)))
The gdm function is used to fit a generalized dissimilarity model to tabular
site-pair data formatted as follows using the formatsitepair
function: distance, weights, s1.xCoord, s1.yCoord, s2.xCoord, s2.yCoord,
s1.Pred1, s1.Pred2, ..., s1.PredN, s2.Pred1, s2.Pred2, ..., s2.PredN. The
distance column contains the response variable, which must be a ratio-based
dissimilarity (distance) measure between Site 1 and Site 2. The weights column
defines any weighting to be applied during fitting of the model. If equal
weighting is required, then all entries in this column should be set to 1.0
(default). The third and fourth columns, s1.xCoord and s1.yCoord, represent
the spatial coordinates of the first site in the site pair (s1). The fifth
and sixth columns, s2.xCoord and s2.yCoord, represent the coordinates of the
second site (s2). Note that the first six columns are REQUIRED, even if you
do not intend to use geographic distance as a predictor (in which case these
columns can be loaded with dummy data if the actual coordinates are
unknown - though that would be weird, no?). The next N*2 columns contain values
for N predictors for Site 1, followed by values for the same N predictors for
Site 2.
The following is an example of a GDM input table header with three
environmental predictors (Temp, Rain, Bedrock):
distance, weights, s1.xCoord, s1.yCoord, s2.xCoord, s2.yCoord, s1.Temp, s1.Rain, s1.Bedrock, s2.Temp, s2.Rain, s2.Bedrock
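For readers who want to see this layout as an actual table, the base R sketch below assembles a toy site-pair table with the same column order by hand. The site values and dissimilarities are invented; in practice formatsitepair builds this table from real data.

```r
# Toy site table; all values are invented for illustration
sites <- data.frame(
  xCoord  = c(0, 3, 6),
  yCoord  = c(0, 4, 8),
  Temp    = c(12.1, 15.3, 18.0),
  Rain    = c(800, 650, 400),
  Bedrock = c(1, 2, 2)
)
# Invented pairwise dissimilarities for site pairs (1,2), (1,3), (2,3)
dissim <- c(0.35, 0.80, 0.50)

idx <- t(combn(nrow(sites), 2))          # one row per site pair
s1 <- sites[idx[, 1], ]
s2 <- sites[idx[, 2], ]
names(s1) <- paste0("s1.", names(sites))
names(s2) <- paste0("s2.", names(sites))

# Column order: distance, weights, s1 coords, s2 coords, s1 preds, s2 preds
sitePairToy <- data.frame(
  distance = dissim,
  weights  = 1,
  s1[, 1:2], s2[, 1:2],
  s1[, -(1:2)], s2[, -(1:2)]
)
rownames(sitePairToy) <- NULL
```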
gdm(data, geo=FALSE, splines=NULL, knots=NULL)
data |
A data frame containing the site pairs to be used to fit the GDM
(obtained using the |
geo |
Set to TRUE if geographic distance between sites is to be included as a model term. Set to FALSE if geographic distance is to be omitted from the model. Default is FALSE. |
splines |
An optional vector of the number of I-spline basis functions to be used for each predictor in fitting the model. If supplied, it must have the same length as the number of predictors (including geographic distance if geo is TRUE). If this vector is not provided (splines=NULL), then a default of 3 basis functions is used for all predictors. |
knots |
An optional vector of knots in units of the predictor variables to be used in the fitting process. If knots are supplied and splines=NULL, then the knots argument must have the same length as the number of predictors * n, where n is the number of knots (default=3). If both knots and the number of splines are supplied, then the length of the knots argument must be the same as the sum of the values in the splines vector. Note that, with the default three I-spline basis functions, the default knots are placed at the 0% (minimum), 50% (median), and 100% (maximum) quantiles of each predictor. |
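The default knot placement and the length rule for the knots vector can be illustrated in base R. The predictor values below are invented, and the assumption that knots are concatenated predictor by predictor is mine; a geographic distance term (geo=TRUE) is not shown.

```r
# Invented predictor values for illustration
temp <- c(8.2, 10.5, 11.0, 13.4, 15.9, 17.3, 21.0)
rain <- c(300, 420, 510, 640, 700, 880, 950)

# Knots at the 0%, 50%, and 100% quantiles of each predictor
knotsTemp <- unname(quantile(temp, probs = c(0, 0.5, 1)))
knotsRain <- unname(quantile(rain, probs = c(0, 0.5, 1)))

# Two predictors with three basis functions each: the knots vector must have
# length sum(splinesVec) = 6 (ordering by predictor is an assumption)
splinesVec <- c(3, 3)
knotsVec   <- c(knotsTemp, knotsRain)
```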
gdm returns a gdm model object. The function
summary.gdm
can be used to obtain or print a synopsis of the
results. A gdm model object is a list containing at least the following
components:
The name of the table used as the data argument to the model.
Whether geographic distance was used as a predictor in the model.
The deviance of the fitted GDM model.
The deviance of the null model.
The percentage of null deviance explained by the fitted GDM model.
The fitted value for the intercept term in the model.
A list of the names of the predictors that were used to fit the model, in order of the amount of turnover associated with each predictor (based on the sum of the I-spline coefficients).
A list of the coefficients for each spline for each of the predictors considered in model fitting.
A vector of the knots derived from the x data (or user defined), for each predictor.
A vector of the number of I-spline basis functions used for each predictor.
The date and time of model creation.
The observed response for each site pair (from data column 1).
The predicted response for each site pair, from the fitted model (after applying the link function).
The linear predictor (ecological distance) for each site pair, from the fitted model (before applying the link function).
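The relationship between the last two components can be sketched numerically. The snippet assumes the standard GDM link described by Ferrier et al. (2007), in which predicted dissimilarity is 1 - exp(-ecological distance), and computes the percentage of deviance explained from invented deviance values; the variable names are illustrative and are not the names of the components in the model object.

```r
# Invented linear predictor (ecological distance) values for three site pairs
ecological <- c(0, 0.5, 2.0)

# Assumed GDM link: predicted dissimilarity rises from 0 and saturates below 1
predicted <- 1 - exp(-ecological)

# Percentage of null deviance explained, from invented deviance values
nullDeviance <- 250
gdmDeviance  <- 150
explained <- 100 * (nullDeviance - gdmDeviance) / nullDeviance
```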
Ferrier S, Manion G, Elith J, Richardson, K (2007) Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity & Distributions 13, 252-264.
formatsitepair, summary.gdm,
plot.gdm, predict.gdm, gdm.transform
##fit table environmental data
# format site-pair table using the southwest data table
head(southwest)
sppData <- southwest[c(1,2,13,14)]
envTab <- southwest[c(2:ncol(southwest))]
sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                              sppColumn="species", siteColumn="site", predData=envTab)

##fit table GDM
gdmTabMod <- gdm(sitePairTab, geo=TRUE)
summary(gdmTabMod)

##fit raster environmental data
##sets up site-pair table
rastFile <- system.file("./extdata/swBioclims.grd", package="gdm")
envRast <- terra::rast(rastFile)

##environmental raster data
sitePairRast <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                               sppColumn="species", siteColumn="site", predData=envRast)
##sometimes raster data returns NA in the site-pair table, these rows will
##have to be removed before fitting gdm
sitePairRast <- na.omit(sitePairRast)

##fit raster GDM
gdmRastMod <- gdm(sitePairRast, geo=TRUE)
summary(gdmRastMod)
Undertake a cross-validation assessment of a GDM fit using all the predictors included in the formatted GDM input site-pair table (spTable). The cross-validation is run using a specified proportion (train.proportion) of the randomly selected sites included in spTable to train the model, with the remaining sites being used to test the performance of the model predictions. The test is repeated a specified number of times (n.crossvalid.tests), with a unique random sample taken each time. Outputs are a number of cross-validation test metrics.
gdm.crossvalidation(spTable, train.proportion=0.9, n.crossvalid.tests=1, geo=FALSE, splines=NULL, knots=NULL)
spTable |
(dataframe) A dataframe holding the GDM input table for model fitting. |
train.proportion |
(float) The proportion of sites in 'spTable' to use in training the GDM, with the remaining proportion used to test the model. (default = 0.9) |
n.crossvalid.tests |
(integer) The number of cross-validation sets to use in testing the GDM. (default = 1) |
geo |
(boolean) Geographic distance to be used in model fitting (default = FALSE). |
splines |
(vector) An optional vector of the number of I-spline basis functions to be used for each predictor in fitting the model. |
knots |
(vector) An optional vector of knots in units of the predictor variables to be used in the fitting process. |
List, providing cross-validation statistics. These are metrics that describe how well the model fit using the site-pair training table predicts the dissimilarities in the site-pair testing table. Metrics provided include: 'Train.Deviance.Explained' (the deviance explained for the training data); 'Test.Deviance.Explained' (the deviance explained for the test data); 'Mean.Error'; 'Mean.Absolute.Error'; 'Root.Mean.Square.Error'; 'Obs.Pred.Correlation' (Pearson's correlation coefficient between observed and predicted values); 'Equalized.RMSE' (the average root mean square error across bands of observed dissimilarities (0.05 dissimilarity units)); 'Error.by.Observed.Value' (the average root mean square error and number of observations within bands of observed dissimilarities (0.05 dissimilarity units)).
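The error metrics listed above can be approximated in base R as follows. The observed and predicted values are invented, the sign convention for the mean error (predicted minus observed) is an assumption, and the banding is a plain reading of the description rather than the package's own code.

```r
# Invented observed and predicted dissimilarities for a test set
obs  <- c(0.11, 0.13, 0.41, 0.43, 0.91, 0.93)
pred <- c(0.15, 0.10, 0.35, 0.50, 0.80, 0.90)

err <- pred - obs                        # sign convention is an assumption
meanError    <- mean(err)
meanAbsError <- mean(abs(err))
rmse         <- sqrt(mean(err^2))
obsPredCor   <- cor(obs, pred, method = "pearson")

# RMSE within 0.05-wide bands of observed dissimilarity, then averaged over
# the bands that contain at least one observation
band <- cut(obs, breaks = seq(0, 1, by = 0.05), include.lowest = TRUE)
rmseByBand <- tapply(err^2, band, function(e) sqrt(mean(e)))
nByBand    <- table(band)
equalizedRmse <- mean(rmseByBand, na.rm = TRUE)
```

Averaging within bands first gives each range of observed dissimilarity equal influence, so a model is not rewarded simply for fitting the most common dissimilarity values well.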
Partitions deviance explained from GDM into different user specified components - most typically environment versus space.
gdm.partition.deviance(sitePairTable, varSets=list(), partSpace=TRUE)
sitePairTable |
A correctly formatted site-pair table from
|
varSets |
A list in which each element is a vector of variable names across which deviance partitioning is to be performed, excluding geographic distance (which is set by the partSpace argument). Variable names must match those used to build the site-pair table. See example. |
partSpace |
Whether or not to perform the partitioning using geographic space. Default=TRUE. |
A dataframe summarizing deviance partitioning results.
Matt Fitzpatrick and Karel Mokany
# set up site-pair table using the southwest data set
sppData <- southwest[c(1,2,13,14)]
envTab <- southwest[c(2:ncol(southwest))]
sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                              sppColumn="species", siteColumn="site", predData=envTab)

# EXAMPLE - Partition two groups of variables
# Make list of variable sets for partitioning
varSet <- vector("list", 2)
# now, name the variable groups for partitioning
# note you do not need to add "space" as this is only needed
# for environmental variables
# two groups (soils & climate)
names(varSet) <- c("soil", "climate")

# lastly, add variable names for
varSet$soil <- c("awcA", "phTotal", "sandA", "shcA", "solumDepth")
varSet$climate <- c("bio5", "bio6", "bio15", "bio18", "bio19")
varSet

# run the function to partition soils, climate, and space (partSpace=TRUE)
scgPart <- gdm.partition.deviance(sitePairTab, varSet, partSpace=TRUE)

# EXAMPLE - Partition three groups of variables
# Make list of variable sets for partitioning
varSet <- vector("list", 3)
names(varSet) <- c("soil", "temp", "precip")
varSet$soil <- c("awcA", "phTotal", "sandA", "shcA", "solumDepth")
varSet$temp <- c("bio5", "bio6")
varSet$precip <- c("bio15", "bio18", "bio19")

# partition soils, temperature, and precip
# note we can't also partition space given the function's limit to a
# maximum of three variable sets, so we set partSpace=FALSE
scPart <- gdm.partition.deviance(sitePairTab, varSet, partSpace=FALSE)
This function transforms geographic and environmental predictors using (1) the
fitted functions from a model object returned from gdm
and (2) a
data frame or raster object containing predictor data for a set of sites.
gdm.transform(model, data, filename = "", ...)
model |
A gdm model object resulting from a call to |
data |
Either (i) a data frame containing values for each predictor variable in the model, formatted as follows: X, Y, var1, var2, var3, ..., varN or (ii) a terra object SpatRaster with one layer per predictor variable used in the model, excluding X and Y (rasters for x- and y-coordinates are built automatically from the input rasters if the model was fit with geo=TRUE). The order of the columns (data frame) or raster layers (SpatRaster) MUST be the same as the order of the predictors in the site-pair table used in model fitting. There is currently no checking to ensure that the order of the variables to be transformed are the same as those in the site-pair table used in model fitting. If geographic distance was not used as a predictor in model fitting, the x- and y-columns need to be removed from the data to be transformed. Output is provided in the same format as the input data. |
filename |
character. Output filename for rasters. When provided the raster layers are written to file directly. |
... |
additional arguments to pass to terra |
gdm.transform returns either a data frame with the same number of rows as the input data frame or a SpatRaster, depending on the format of the input data. If the model uses geographic distance as a predictor the output object will contain columns or layers for the transformed X and Y values for each site. The transformed environmental data will be in the remaining columns or layers.
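Since the order of predictors is not checked, a small guard before calling gdm.transform may be worthwhile. The helper predictor_order below is hypothetical (not part of gdm) and relies only on the s1. column naming of the site-pair table described earlier; the toy tables are invented.

```r
# Hypothetical helper (not part of gdm): recover predictor names, in order,
# from the "s1." columns of a site-pair table, skipping the coordinates
predictor_order <- function(sitePairTable) {
  s1Cols <- grep("^s1\\.", names(sitePairTable), value = TRUE)
  setdiff(sub("^s1\\.", "", s1Cols), c("xCoord", "yCoord"))
}

# Toy site-pair table and a toy data frame to be transformed
sitePairToy <- data.frame(
  distance = 0.4, weights = 1,
  s1.xCoord = 0, s1.yCoord = 0, s2.xCoord = 3, s2.yCoord = 4,
  s1.Temp = 12, s1.Rain = 800, s2.Temp = 15, s2.Rain = 650
)
newData <- data.frame(X = c(0, 3), Y = c(0, 4),
                      Rain = c(800, 650), Temp = c(12, 15))

expected <- predictor_order(sitePairToy)
orderOK  <- identical(names(newData)[-(1:2)], expected)

# Reorder the predictor columns to match the site-pair table if needed
newDataFixed <- newData[, c("X", "Y", expected)]
```

Here the predictors in newData are in the wrong order, so orderOK is FALSE and the reordered copy is the one that would be safe to transform.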
Ferrier S, Manion G, Elith J, Richardson, K (2007) Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity & Distributions 13, 252-264.
Fitzpatrick MC, Keller SR (2015) Ecological genomics meets community-level modeling of biodiversity: Mapping the genomic landscape of current and future environmental adaptation. Ecology Letters 18: 1-16
    # start with the southwest data set
    # grab the columns with xy, site ID, and species data
    sppTab <- southwest[, c("species", "site", "Lat", "Long")]

    ## fit gdm using rasters
    rastFile <- system.file("./extdata/swBioclims.grd", package="gdm")
    envRast <- terra::rast(rastFile)
    sitePairRast <- formatsitepair(sppTab, 2, XColumn="Long", YColumn="Lat",
                                   sppColumn="species", siteColumn="site",
                                   predData=envRast)

    ## remove NA values
    sitePairRast <- na.omit(sitePairRast)

    ## fit raster GDM
    gdmRastMod <- gdm(sitePairRast, geo=TRUE)

    ## raster input, raster output
    transRasts <- gdm.transform(gdmRastMod, envRast)

    # map biological patterns; increase maxcell if using large rasters
    pcaSamp <- terra::prcomp(transRasts, maxcell = 1e4)

    # note the use of the 'index' argument
    pcaRast <- terra::predict(transRasts, pcaSamp, index=1:3)

    # stretch the PCA rasters to make full use of the colour spectrum
    pcaRast <- terra::stretch(pcaRast)
    terra::plotRGB(pcaRast, r=1, g=2, b=3)
This function uses matrix permutation to perform model and predictor significance testing and to estimate predictor importance in a generalized dissimilarity model. The function can be run in parallel on multicore machines to reduce computation time.
    gdm.varImp(spTable, geo, splines = NULL, knots = NULL, predSelect = FALSE,
               nPerm = 50, pValue = 0.05, parallel = FALSE, cores = 2,
               sampleSites = 1, sampleSitePairs = 1, outFile = NULL)
spTable |
A site-pair table, same as used to fit a gdm. |
geo |
Similar to the geo argument of gdm. |
splines |
Same as the splines argument of gdm. |
knots |
Same as the knots argument of gdm. |
predSelect |
Set to TRUE to perform predictor selection using matrix permutation and backward elimination. Default is FALSE. When predSelect = FALSE results will be returned only for a model fit with all predictors. |
nPerm |
Number of permutations to use to estimate p-values. Default is 50. |
pValue |
The p-value to use for predictor selection / elimination. Default is 0.05. |
parallel |
Whether or not to run the matrix permutations and model fitting in parallel. Parallel processing is highly recommended when either (i) the nPerm argument is large (>100) or (ii) a large number of site-pairs (and / or variables) are being used in model fitting (note that computational demand can be reduced using subsampling; see the sampleSites and sampleSitePairs arguments). The default is FALSE. |
cores |
When the parallel argument is set to TRUE, the number of cores to be registered for parallel processing. Must be <= the number of cores in the machine running the function. There is no benefit to setting the number of cores greater than the number of predictors in the model. |
sampleSites |
The fraction (0-1, though a value of 0 would be silly, wouldn't it?) of sites to retain from the full site-pair table. If less than 1, this argument will completely remove a fraction of sites such that they are not used in the permutation routines. |
sampleSitePairs |
The fraction (0-1) of site-pairs (i.e., rows) to retain from the full site-pair table - in other words, all sites will be used in the permutation routines (assuming sampleSites = 1), but not all site-pair combinations. In the case where both the sampleSites and the sampleSitePairs argument have values less than 1, sites first will be removed using the sampleSites argument, followed by removal of site-pairs using the sampleSitePairs argument. Note that the number of site-pairs removed is based on the fraction of the resulting site-pair table after sites have been removed, not on the size of the full site-pair table. |
outFile |
An optional character string to write the object returned by the function to disk as an .RData object (".RData" is not required as part of the file name). The .RData object will contain a single list with the name of "outObject". The default is NULL, meaning that no file will be written. |
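The two subsampling arguments act in sequence, so the final number of site-pairs depends on both. The arithmetic can be sketched as below for a table built from 94 sites. The helper n_pairs is hypothetical, and the use of round() is an assumption, since the rounding rule applied inside the package is not stated here.

```r
# Number of unique site-pairs among n sites
n_pairs <- function(n) n * (n - 1) / 2

nSites          <- 94
sampleSites     <- 0.5
sampleSitePairs <- 0.4

# Stage 1: remove whole sites (rounding rule is assumed)
sitesKept       <- round(nSites * sampleSites)
pairsAfterSites <- n_pairs(sitesKept)

# Stage 2: keep a fraction of the rows that remain after stage 1
pairsKept <- round(pairsAfterSites * sampleSitePairs)

n_pairs(nSites)   # 4371 site-pairs in the full table
pairsAfterSites   # 1081 site-pairs among the 47 retained sites
pairsKept         # 432, i.e. 40% of 1081 rather than 40% of 4371
```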
To test model significance, first a model is fit using all predictors and un-permuted environmental data. Any predictor whose I-spline coefficients sum to zero is preemptively removed. Next, the environmental data are permuted nPerm times (by randomizing the order of the rows) and a GDM is fit to each permuted table. Model significance is determined by comparing the deviance explained by the GDM fit to the un-permuted table to the distribution of deviance explained values from the GDMs fit to the nPerm permuted tables. To assess predictor significance, this process is repeated for each predictor individually (i.e., only the data for the predictor being tested are permuted rather than the entire environmental table). Predictor importance is quantified as the percent change in deviance explained between a model fit with and without that predictor permuted. If predSelect=TRUE, this process continues by permuting the site-pair table nPerm times, but removing one predictor at a time and reassessing predictor importance and significance. At each step, the least important predictor is dropped (backward elimination) and the process continues until all non-significant predictors are removed, with the significance level set by the user via the pValue argument.
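The comparison at the heart of this procedure can be illustrated in base R with invented numbers standing in for GDM fits. The helper names perm_p_value and pct_importance are hypothetical. The p-value convention used here (the share of permuted fits that explain at least as much deviance as the un-permuted fit) is one common choice and is an assumption, not a description of the package internals.

```r
# Hypothetical helpers; the numbers below are invented stand-ins for GDM fits.

# Share of permuted fits that explain at least as much deviance as the
# un-permuted fit (assumed convention, not taken from gdm).
perm_p_value <- function(observed, permuted) {
  mean(permuted >= observed)
}

# Percent decrease in deviance explained when one predictor is permuted.
pct_importance <- function(full, permuted) {
  100 * (full - mean(permuted)) / full
}

fullDevExpl <- 40                      # % deviance explained, full model
permAll     <- c(5, 8, 3, 6, 41)       # all predictors permuted, nPerm = 5
permOneVar  <- c(30, 32, 28, 31, 29)   # only one predictor permuted

perm_p_value(fullDevExpl, permAll)       # 0.2
pct_importance(fullDevExpl, permOneVar)  # 25
```

With only five permutations the smallest attainable non-zero p-value is 0.2, which is one reason a larger nPerm is needed for a test at the 0.05 level.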
A list of four tables. The first table summarizes full model deviance, percent deviance explained by the full model, the p-value of the full model, and the number of permutations used to calculate the statistics for each fitted model (i.e., the full model and each model with predictors removed in succession during the backward elimination procedure if predSelect=T). The remaining three tables summarize (1) predictor importance, (2) predictor significance, and (3) the number of permutations used to calculate the statistics for that model, which is provided because some GDMs may fail to converge for some permutations / predictor combinations and you might want to know how many permutations were used when calculating statistics. Or maybe you don't, you decide.
Predictor importance is measured as the percent decrease in deviance explained between the full model and a model fit with that predictor permuted. Significance is estimated using the bootstrapped p-value when the predictor has been permuted. In most cases, the number of permutations will equal the nPerm argument. However, the value may be smaller if any of the models fit to the permuted tables fail to converge.
If predSelect=FALSE, the tables will have values only in the first column.
Matt Fitzpatrick and Karel Mokany
Ferrier S, Manion G, Elith J, Richardson, K (2007) Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity & Distributions 13, 252-264.
Fitzpatrick, MC, Sanders NJ, Ferrier S, Longino JT, Weiser MD, and RR Dunn. 2011. Forecasting the Future of Biodiversity: a Test of Single- and Multi-Species Models for Ants in North America. Ecography 34: 836-47.
    ## fit table environmental data
    ## set up site-pair table using the southwest data set
    sppData <- southwest[c(1,2,13,14)]
    envTab <- southwest[c(2:ncol(southwest))]
    sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                                  sppColumn="species", siteColumn="site",
                                  predData=envTab)

    ## not run
    # modTest <- gdm.varImp(sitePairTab, geo=TRUE, nPerm=50, parallel=TRUE,
    #                       cores=10, predSelect=TRUE)
    # barplot(sort(modTest$`Predictor Importance`[,1], decreasing=TRUE))
Pairwise Bray-Curtis dissimilarity calculated using the species occurrence data from the southwest data set.
    gdmDissim
A data frame with 94 rows and 95 columns (the extra column holds site IDs).
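For readers unfamiliar with the measure, the following base-R sketch computes Bray-Curtis dissimilarity on a toy presence-absence matrix and arranges the result with one ID column plus a square matrix. The toy data and the helper bray_curtis are invented for illustration, and placing the site ID in the first column is an assumption about the layout.

```r
# Bray-Curtis dissimilarity between two sites: sum(|x - y|) / sum(x + y)
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Toy presence-absence matrix: rows are sites, columns are species
comm <- rbind(siteA = c(1, 1, 0, 1),
              siteB = c(1, 0, 1, 1),
              siteC = c(0, 0, 1, 0))

n <- nrow(comm)
dissim <- matrix(0, n, n, dimnames = list(rownames(comm), rownames(comm)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    dissim[i, j] <- bray_curtis(comm[i, ], comm[j, ])
  }
}

# One site ID column followed by the square dissimilarity matrix,
# giving n rows and n + 1 columns
dissimTab <- data.frame(site = rownames(comm), dissim, row.names = NULL)
dissimTab
```

Here siteA and siteB share two of their three species each, giving a dissimilarity of 1/3, while siteA and siteC share none, giving 1.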
Extracts the I-spline values from a gdm object. There is one I-spline for each predictor that has at least one non-zero coefficient in the fitted model.
    isplineExtract(model)
model |
A gdm object from gdm. |
A list with two items. The first item contains the x-values (actual values of the predictors) of the I-splines and the second item contains the y-values (partial ecological distances) of the fitted I-splines.
Ferrier S, Manion G, Elith J, Richardson, K (2007) Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity & Distributions 13, 252-264.
Fitzpatrick MC, Sanders NJ, Normand S, Svenning J-C, Ferrier S, Gove AD, Dunn RR (2013). Environmental and historical imprints on beta diversity: insights from variation in rates of species turnover along gradients. Proceedings of the Royal Society: Series B 280, art. 1768
    ## set up site-pair table using the southwest data set
    sppData <- southwest[, c(1,2,14,13)]
    envTab <- southwest[, c(2:ncol(southwest))]
    sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                                  sppColumn="species", siteColumn="site",
                                  predData=envTab)

    ## create GDM
    gdmMod <- gdm(sitePairTab, geo=TRUE)

    ## extract splines
    exSplines <- isplineExtract(gdmMod)

    ## plot spline(s)
    # spline for winter precip (bio19)
    plot(exSplines[[1]][,"bio19"], exSplines[[2]][,"bio19"], type="l", lwd=3,
         xlab="Winter precipitation (mm)", ylab="Partial Ecological Distance")
plot is used to plot the I-splines and fit of a generalized dissimilarity model created using the gdm function.
    ## S3 method for class 'gdm'
    plot(x, plot.layout = c(2, 2), plot.color = "blue", plot.linewidth = 2,
         include.rug = FALSE, rug.sitepair = NULL, ...)
x |
A gdm model object returned from gdm. |
plot.layout |
This argument specifies the row and column layout for the plots, including: (1) a single page plot of observed response data against the raw linear predictor (ecological distance) from the model, and (2) a single page plot of the observed response against the predicted response from the model, i.e. after applying the link function, 1.0 - exp(-y), to the linear predictor, and (3) the I-splines fitted to the individual predictors. Default is 2 rows by 2 columns. To produce one predictor plot per page set plot.layout to c(1,1). The first two model plots are always produced on a single page each and therefore the layout parameter affects only the layout of the I-spline plots for those predictors that featured in the model fitting process (i.e., predictors with all-zero I-spline coefficients are not plotted). |
plot.color |
Color of the data points that are plotted for the overall plots. |
plot.linewidth |
The line width for the regression line over-plotted in the two overall plots to optimize the display of the line over the data points. |
include.rug |
Whether or not to include a rug plot of the predictor values used to fit the gdm in the I-spline plots. When set to TRUE, a site-pair table must be supplied for the rug.sitepair argument. Default is FALSE. |
rug.sitepair |
A site-pair table used to add a rug plot of the predictor values used to fit the gdm in the I-spline plots. This should be the same site-pair table used to fit the gdm model being plotted. The function does not check whether the supplied site-pair table matches that used in model fitting. |
... |
Ignored. |
plot returns NULL. Use summary.gdm to obtain a synopsis of the model object.
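The link function named in the plot.layout description, 1.0 - exp(-y), maps the unbounded linear predictor (ecological distance) onto the 0-1 dissimilarity scale. A minimal base-R sketch with made-up ecological distances follows; the name gdm_link is hypothetical.

```r
# Link function from the description above:
# ecological distance -> predicted dissimilarity
gdm_link <- function(ecoDist) 1 - exp(-ecoDist)

ecoDist <- c(0, 0.5, 1, 2, 5)   # made-up ecological distances
predDissim <- gdm_link(ecoDist)
round(predDissim, 3)
# 0.000 0.393 0.632 0.865 0.993
```

The curve flattens as ecological distance grows, which is why the plot against the raw linear predictor is curved while the plot against the predicted response is expected to fall along a straight line.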
Ferrier S, Manion G, Elith J, Richardson, K (2007) Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity & Distributions 13:252-264.
    ## set up site-pair table using the southwest data set
    sppData <- southwest[c(1,2,13,14)]
    envTab <- southwest[c(2:ncol(southwest))]
    sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                                  sppColumn="species", siteColumn="site",
                                  predData=envTab)

    ## create GDM
    gdmMod <- gdm(sitePairTab, geo=TRUE)

    ## plot GDM
    plot(gdmMod, plot.layout=c(3,3))
This function estimates uncertainty in the fitted I-splines by fitting many GDMs using a subsample of the data. The function can run in parallel on multicore machines to reduce computation time (recommended for a large number of iterations). I-spline plots with error bands (+/- one standard deviation) are produced showing (1) the variance of I-spline coefficients and (2) a rug plot indicating how sites used in model fitting are distributed along each gradient. The function result can optionally be saved to disk as a csv for custom plotting, etc. The output table has six columns per predictor: three each for the x and y values, containing the lower bound, full model, and upper bound.
    plotUncertainty(spTable, sampleSites, bsIters, geo=FALSE, splines=NULL,
                    knots=NULL, splineCol="blue", errCol="grey80",
                    plot.linewidth=2.0, plot.layout=c(2,2), parallel=FALSE,
                    cores=2, save=FALSE, fileName="gdm.plotUncertainy.csv")
spTable |
A site-pair table, same as used to fit a gdm. |
sampleSites |
The fraction (0-1) of sites to retain from the full site-pair table when subsampling. |
bsIters |
The number of bootstrap iterations to perform. |
geo |
Same as the geo argument of gdm. |
splines |
Same as the splines argument of gdm. |
knots |
Same as the knots argument of gdm. |
splineCol |
The color of the plotted mean spline. The default is "blue". |
errCol |
The color of shading for the error bands (+/- one standard deviation around the mean line). The default is "grey80". |
plot.linewidth |
The line width of the plotted mean spline line. The default is 2. |
plot.layout |
Same as the plot.layout argument of plot.gdm. |
parallel |
Perform the uncertainty assessment using multiple cores? Default = FALSE. |
cores |
When the parallel argument is set to TRUE, the number of cores to be registered for the foreach loop. Must be <= the number of cores in the machine running the function. |
save |
Save the function result (e.g., for custom plotting)? Default=FALSE. |
fileName |
Name of the csv file to save the data frame that contains the function result. Default = gdm.plotUncertainy.csv. Ignored if save=FALSE. |
plotUncertainty returns NULL. Saves a csv to disk if save=TRUE.
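For custom plotting, the error bands can be thought of as a per-point mean and standard deviation across bootstrap fits. The base-R sketch below uses simulated curves in place of real I-splines, so every value in it is invented. Centering the band on the bootstrap mean is an assumption made for illustration.

```r
set.seed(1)  # reproducible toy data

# Rows = points along a predictor gradient, columns = bootstrap iterations.
# Each column stands in for the I-spline y-values from one subsampled fit.
xVals   <- seq(0, 10, length.out = 20)
bsIters <- 50
ySplines <- sapply(seq_len(bsIters), function(i) {
  cumsum(abs(rnorm(length(xVals), mean = 0.05, sd = 0.02)))
})

# Mean curve and +/- one standard deviation at each x value
yMean <- rowMeans(ySplines)
ySd   <- apply(ySplines, 1, sd)
bands <- data.frame(x = xVals, lower = yMean - ySd,
                    mean = yMean, upper = yMean + ySd)

# Shaded band with the mean line on top
plot(bands$x, bands$mean, type = "n",
     ylim = range(bands$lower, bands$upper),
     xlab = "Predictor value", ylab = "Partial ecological distance")
polygon(c(bands$x, rev(bands$x)), c(bands$lower, rev(bands$upper)),
        col = "grey80", border = NA)
lines(bands$x, bands$mean, col = "blue", lwd = 2)
```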
Shryock, D. F., C. A. Havrilla, L. A. DeFalco, T. C. Esque, N. A. Custer, and T. E. Wood. 2015. Landscape genomics of Sphaeralcea ambigua in the Mojave Desert: a multivariate, spatially-explicit approach to guide ecological restoration. Conservation Genetics 16:1303-1317.
plot.gdm, formatsitepair, subsample.sitepair
    ## set up site-pair table using the southwest data set
    sppData <- southwest[c(1,2,13,14)]
    envTab <- southwest[c(2:ncol(southwest))]
    sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                                  sppColumn="species", siteColumn="site",
                                  predData=envTab)

    ## plot GDM uncertainty using one core
    # not run
    # plotUncertainty(sitePairTab, sampleSites=0.70, bsIters=5, geo=TRUE,
    #                 plot.layout=c(3,3))

    ## plot GDM uncertainty in parallel
    # not run
    # plotUncertainty(sitePairTab, sampleSites=0.70, bsIters=50, geo=TRUE,
    #                 plot.layout=c(3,3), parallel=TRUE, cores=10)
This function predicts biological distances between sites or times using a model object returned from gdm. Predictions between site pairs require a data frame containing the values of predictors for pairs of locations, formatted as follows: distance, weights, s1.X, s1.Y, s2.X, s2.Y, s1.Pred1, s1.Pred2, ..., s1.PredN, s2.Pred1, s2.Pred2, ..., s2.PredN. Predictions of biological change through time require two raster stacks or bricks for environmental conditions at two time periods, each with a layer for each environmental predictor in the fitted model.
    ## S3 method for class 'gdm'
    predict(object, data, time=FALSE, predRasts=NULL, filename="", ...)
object |
A gdm model object resulting from a call to gdm. |
data |
Either a data frame containing the values of predictors for pairs of sites, in the same format and structure as used to fit the model using gdm, or a raster stack if time=TRUE. For a data frame, the first two columns (distance and weights) are required by the function but are not used in the prediction and can therefore be filled with dummy data (e.g., all zeros). If geo is TRUE, the s1.X, s1.Y and s2.X, s2.Y columns are used to calculate the geographical distance between each pair of sites for inclusion of the geographic predictor term in the model. If geo is FALSE, the s1.X, s1.Y, s2.X, and s2.Y columns are ignored; however, these columns are still REQUIRED and can be filled with dummy data (e.g., all zeros). The remaining columns hold the N predictors for site 1, followed by the N predictors for site 2. The order of the columns must match that in the site-pair table used to fit the model. A raster stack should be provided only when time=TRUE and should contain one layer for each environmental predictor, in the same order as the columns in the site-pair table used to fit the model. |
time |
TRUE/FALSE: Is the model prediction for biological change through time? |
predRasts |
A raster stack characterizing environmental conditions for a different time in the past or future, with the same extent, resolution, and layer order as the data object. Required only if time=T. |
filename |
character. Output filename for rasters. When provided the raster layers are written to file directly. |
... |
additional arguments to pass to terra |
predict returns either a response vector with the same length as the number of rows in the input data frame or a raster depicting change through time across the study region.
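When predictions are wanted for site pairs that were not in the fitting table, the input data frame has to be assembled by hand in the layout described for the data argument. The base-R sketch below builds one such row with dummy distance and weights columns. The helper make_pair, the toy values, and the coordinate column names (s1.xCoord and so on, which the text above abbreviates as s1.X) are assumptions for illustration.

```r
# Predictor values at two sites (made-up numbers)
sites <- data.frame(Long = c(115.1, 116.3), Lat = c(-30.2, -31.0),
                    bio5 = c(30, 28), bio19 = c(200, 350))

# Hypothetical helper: one site-pair row built from two rows of `sites`
make_pair <- function(s1, s2, predNames) {
  out <- data.frame(distance = 0, weights = 0,   # dummy, unused in prediction
                    s1.xCoord = s1$Long, s1.yCoord = s1$Lat,
                    s2.xCoord = s2$Long, s2.yCoord = s2$Lat)
  # All site 1 predictors first, then all site 2 predictors, same order
  for (p in predNames) out[[paste0("s1.", p)]] <- s1[[p]]
  for (p in predNames) out[[paste0("s2.", p)]] <- s2[[p]]
  out
}

predTab <- make_pair(sites[1, ], sites[2, ], c("bio5", "bio19"))
names(predTab)
```

The predNames vector should list the predictors in the same order as the site-pair table used to fit the model, since the column order is what the prediction relies on.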
    ## set up site-pair table using the southwest data set
    sppData <- southwest[, c(1,2,14,13)]
    envTab <- southwest[, c(2:ncol(southwest))]

    # remove soils (no rasters for these)
    envTab <- envTab[,-c(2:6)]
    sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                                  sppColumn="species", siteColumn="site",
                                  predData=envTab)

    # create GDM
    gdmMod <- gdm(sitePairTab, geo=TRUE)

    ## predict GDM
    predDiss <- predict(gdmMod, sitePairTab)

    ## time example
    rastFile <- system.file("./extdata/swBioclims.grd", package="gdm")
    envRast <- terra::rast(rastFile)

    ## make some fake climate change data
    futRasts <- envRast
    ## reduce winter precipitation by 25%
    futRasts[[3]] <- futRasts[[3]]*0.75

    timePred <- predict(gdmMod, envRast, time=TRUE, predRasts=futRasts)
    terra::plot(timePred)
A data set containing species occurrence and associated environmental data at 94 sites in southwestern Australia.
    southwest
A data frame with 29364 rows and 14 variables:
species | species name |
site | site name |
awcA | plant-available water capacity in soil horizon A |
phTotal | soil pH |
sandA | percent sand content in soil horizon A |
shcA | saturated hydraulic conductivity in soil horizon A |
solumDepth | soil depth to unweathered parent material |
bio5 | maximum temperature of the warmest month |
bio6 | minimum temperature of the coldest month |
bio15 | precipitation seasonality |
bio18 | precipitation of warmest quarter |
bio19 | precipitation of coldest quarter |
Lat | latitude |
Long | longitude |
Randomly selects a number of sites from a given site-pair table and removes them from the site-pair table. It will remove all instances of the sites randomly selected to be removed in both s1 and s2 positions.
    subsample.sitepair(spTable, sampleSites)
spTable |
A site-pair table, same as used to fit a gdm. |
sampleSites |
The fraction (0-1, though a value of 0 would be silly, wouldn't it?) of sites to retain from the full site-pair table. If less than 1, this argument will completely remove a fraction of sites such that they are not used in the permutation routines. |
A site-pair table, such as one created by formatsitepair, ideally smaller than the one given. In the very rare case where the function determines not to remove any sites, or when the sampleSites argument is 1, the function returns the full site-pair table.
This function removes sites, not just site-pairs (rows), from the site-pair table. It is called from several other functions within the gdm package, including plotUncertainty and gdm.varImp, for the purposes of subsampling the sites in the site-pair table.
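The difference between removing sites and removing rows can be seen in a small base-R sketch. This is a hypothetical re-implementation of the idea, not the package's code. It assumes that sites are identified by their coordinate pairs, that the coordinate columns are named s1.xCoord, s1.yCoord, s2.xCoord, and s2.yCoord, and that the number of retained sites is rounded to the nearest integer.

```r
set.seed(42)

# Toy site-pair table for 5 sites identified by their coordinates
xy  <- data.frame(x = 1:5, y = 11:15)
idx <- t(combn(nrow(xy), 2))     # all unique pairs of site indices
toyPairs <- data.frame(distance = runif(nrow(idx)), weights = 1,
                       s1.xCoord = xy$x[idx[, 1]], s1.yCoord = xy$y[idx[, 1]],
                       s2.xCoord = xy$x[idx[, 2]], s2.yCoord = xy$y[idx[, 2]])

# Hypothetical sketch: keep a fraction of sites, then keep only the rows
# in which both the s1 and the s2 site were retained
drop_sites <- function(spTable, sampleSites) {
  id1 <- paste(spTable$s1.xCoord, spTable$s1.yCoord)
  id2 <- paste(spTable$s2.xCoord, spTable$s2.yCoord)
  sites <- unique(c(id1, id2))
  keep <- sample(sites, size = round(length(sites) * sampleSites))
  spTable[id1 %in% keep & id2 %in% keep, ]
}

subTab <- drop_sites(toyPairs, sampleSites = 0.6)
nrow(toyPairs)  # 10 pairs from 5 sites
nrow(subTab)    # 3 pairs from the 3 retained sites
```

Retaining 60% of the sites leaves only 30% of the rows here, because every pair involving a removed site disappears.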
    ## set up site-pair table using the southwest data set
    sppData <- southwest[c(1,2,13,14)]
    envTab <- southwest[c(2:ncol(southwest))]
    sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                                  sppColumn="species", siteColumn="site",
                                  predData=envTab)

    subsample.sitepair(sitePairTab, sampleSites=0.7)
This function summarizes the gdm model object returned from gdm.
    ## S3 method for class 'gdm'
    summary(object, ...)
object |
A gdm model object resulting from a call to gdm. |
... |
Ignored. |
summary prints its output to the R Console window and returns no value.
    ## set up site-pair table using the southwest data set
    sppData <- southwest[, c(1,2,14,13)]
    envTab <- southwest[, c(2:ncol(southwest))]
    sitePairTab <- formatsitepair(sppData, 2, XColumn="Long", YColumn="Lat",
                                  sppColumn="species", siteColumn="site",
                                  predData=envTab)

    ## create GDM
    gdmMod <- gdm(sitePairTab, geo=TRUE)

    ## summary of GDM
    summary(gdmMod)