
Streamlined Analyses of GBIF Species Distribution data with Bioclimatic Variables from WorldClim

About GBIF Bioclim R Toolkit
Please Visit the GitHub Page to Download Latest R Script
GBIF Bioclim R Toolkit is an R script with a template for sourcing species distribution data from the GBIF database, along with bioclimatic variables and elevation from the WorldClim dataset. The script also provides several functions for extracting bioclimatic and elevation data from the WorldClim rasters, plotting geo-located species data on rasters, and plotting the distributions of bioclimatic variables across different geographic groupings of the samples (e.g. by lon/lat/elevation/grid).
We are working on additional functions to facilitate statistical analyses of distributions, as well as more complex methods for grouping species' geographic distributions (e.g. by slope steepness gradients or by presence in valleys, ridges, or plateaus). Check back for more updates!
Latest Updates
Updates (as of August 24, 2024):
- Release date:
- First part of R script contains template set up to source GBIF/WorldClim data
- Functions:
  - GBIFilter_country()
  - GBIFilter_world()
  - calculate_extent()
  - raster_extract()
  - add_geo_labels()
  - add_grid_labels()
  - plot_geolabels()
  - plot_bioclim()
  - plot_bioclim2()
- Function Descriptions
- GBIFilter_world(GBIF_dataframe, Inaturalist = T/F)
  - Filters GBIF species data to geo-located records (i.e. keeps only samples with lon/lat data) and includes or excludes iNaturalist observations. Excluding iNaturalist observations usually limits sampling to herbarium or museum collections.
  - Example to get samples without iNaturalist observations:
    - GBIFilter_world(gbifdf = GBIF_dataframe, inat = F)
- GBIFilter_country(GBIF_dataframe, Inaturalist = T/F, CountryCode)
  - Filters geo-located GBIF species data and includes or excludes iNaturalist observations. This version of the function keeps only samples from a specific country, identified by its country code.
  - Example to get US samples without iNaturalist observations:
    - GBIFilter_country(gbifdf = GBIF_dataframe, inat = F, ccode = 'US')
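The filtering logic of the two GBIFilter functions can be sketched in base R as follows. This is only an illustration under assumptions, not the toolkit's implementation: `filter_gbif_sketch` is a hypothetical helper, and the sketch assumes that iNaturalist records are marked by the value "iNaturalist" in an `institutionCode` column and that countries are stored in a `countryCode` column. The demo records are made up.

```r
# Hypothetical helper, not part of GBIFBC.R: illustrates the filtering idea only.
filter_gbif_sketch <- function(gbifdf, inat = TRUE, ccode = NULL) {
  # Keep only geo-located records (both coordinates present).
  keep <- !is.na(gbifdf$decimalLongitude) & !is.na(gbifdf$decimalLatitude)
  # Optionally drop iNaturalist records (assumed to be flagged in institutionCode).
  if (!inat) {
    is_inat <- !is.na(gbifdf$institutionCode) &
      gbifdf$institutionCode == "iNaturalist"
    keep <- keep & !is_inat
  }
  # Optionally restrict the records to a single country code.
  if (!is.null(ccode)) {
    keep <- keep & !is.na(gbifdf$countryCode) & gbifdf$countryCode == ccode
  }
  gbifdf[keep, , drop = FALSE]
}

# Made-up records for illustration.
demo <- data.frame(
  decimalLongitude = c(-84.5, NA, -110.9, -75.2),
  decimalLatitude  = c(39.1, 40.0, 32.2, 45.4),
  countryCode      = c("US", "US", "US", "CA"),
  institutionCode  = c("MO", "NY", "iNaturalist", "CAN"),
  stringsAsFactors = FALSE
)

world_no_inat <- filter_gbif_sketch(demo, inat = FALSE)
us_no_inat    <- filter_gbif_sketch(demo, inat = FALSE, ccode = "US")
```

Here `world_no_inat` drops the record with a missing longitude and the iNaturalist record, and `us_no_inat` additionally drops the Canadian record.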
- calculate_extent(dataframe, longitude, latitude, buffer)
  - Outputs a list with the longitudinal and latitudinal extent of an input dataframe of geo-located samples, using the designated longitude and latitude columns. The buffer option lets the user designate a numeric buffer that expands the extent beyond the limits of the samples.
  - Example to get the lon/lat extent of geo-located GBIF data with a buffer of 5 decimal degrees of lon/lat:
    - calculate_extent(data = GBIF_dataframe, lon_column = 'decimalLongitude', lat_column = 'decimalLatitude', buffer = 5)
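As a rough illustration of what an extent with a buffer means, the base R sketch below takes the range of each coordinate column and widens it by the buffer on both sides. `extent_sketch` is a hypothetical name and the structure of the returned list is an assumption; the toolkit's own calculate_extent() may format its output differently or clamp values to valid lon/lat limits, which this sketch does not do.

```r
# Hypothetical helper, not part of GBIFBC.R.
extent_sketch <- function(data, lon_column, lat_column, buffer = 0) {
  lon <- range(data[[lon_column]], na.rm = TRUE)
  lat <- range(data[[lat_column]], na.rm = TRUE)
  # Widen each range by the buffer on both sides (no clamping to +/-180 or +/-90).
  list(
    lon = c(lon[1] - buffer, lon[2] + buffer),
    lat = c(lat[1] - buffer, lat[2] + buffer)
  )
}

# Made-up coordinates for illustration.
demo <- data.frame(
  decimalLongitude = c(-90, -80, -85),
  decimalLatitude  = c(30, 42, 35)
)

ext <- extent_sketch(demo, "decimalLongitude", "decimalLatitude", buffer = 5)
# ext$lon is c(-95, -75) and ext$lat is c(25, 47)
```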
- raster_extract(lonlat_GBIF_dataframe, rasterbrick)
  - Extracts raster values from a RasterBrick object (the 'bioclimd' object if you use our R script to set up the bioclimatic rasters from WorldClim) at the longitude and latitude of each sample in a GBIF dataframe. Use the filtered GBIF dataframe output by GBIFilter_country() or GBIFilter_world() as the input dataframe. Lon/lat dataframes that do not come from GBIF must store the longitude and latitude of geo-located samples in columns named decimalLongitude and decimalLatitude, respectively.
  - Example to extract WorldClim raster values (from the bioclimd RasterBrick generated by our template) for species distribution data from GBIF:
    - raster_extract(GBIF_dataframe, bioclimd)
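For readers unfamiliar with point extraction from a RasterBrick, the sketch below shows the general idea with the raster package, assuming it is installed. It is not the toolkit's raster_extract() code: the two tiny layers, their names (`bio_demo`, `elev_demo`) and the sample points are made up, and real WorldClim layers would take their place.

```r
library(raster)  # assumed to be installed; provides RasterBrick objects

# Two tiny 2 x 2 layers standing in for WorldClim rasters (made-up values).
r1 <- raster(nrows = 2, ncols = 2, xmn = 0, xmx = 2, ymn = 0, ymx = 2)
values(r1) <- 1:4
r2 <- r1
values(r2) <- c(10, 20, 30, 40)
demo_brick <- brick(stack(r1, r2))
names(demo_brick) <- c("bio_demo", "elev_demo")

# Sample points with the column names that GBIF uses.
pts <- data.frame(
  decimalLongitude = c(0.5, 1.5),
  decimalLatitude  = c(1.5, 0.5)
)

# Coordinates must be passed in (longitude, latitude) order.
xy <- as.matrix(pts[, c("decimalLongitude", "decimalLatitude")])
vals <- raster::extract(demo_brick, xy)

# One row per sample, one extra column per raster layer.
extracted <- cbind(pts, vals)
```

Binding the extracted values back onto the sample dataframe gives a table of the kind described above, with one column per raster layer.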
- add_geo_label(GBIF_bioclim_dataframe, group_column, interval = 10, geolabel = 'geolabel')
  - Generates a group label column and adds it to the input dataframe, grouping samples by intervals of a numeric column in that dataframe. The input dataframe is intended to hold the GBIF sample distribution data together with the bioclim and elevation values extracted from a RasterBrick for each sample (e.g. the output dataframe generated by the raster_extract() function), so that both variable distributions and group distributions can be plotted using the plotgeolabels() and plot_bioclim() functions. We generally use this to generate sample sub-groups based on longitude, latitude, or elevation. Users designate the interval width used to group samples; the first group begins at the lowest value and the last group is cut off at the maximum value of the variable.
  - Example to make group labels based on latitude in intervals of 5 decimal latitude units, with the default geolabel header:
    - add_geo_labels(GBIF_bioclim_data, 'decimalLatitude', 5)
  - Example to make group labels based on longitude in intervals of 10 decimal longitude units, with the default geolabel header:
    - add_geo_labels(GBIF_bioclim_data, 'decimalLongitude', 10)
  - Example to make group labels based on elevation in intervals of 300 m, derived from the WorldClim 2.5 arc-minute resolution elevation raster, with a group label header of 'elvgroup':
    - add_geo_labels(GBIF_bioclim_data, 'wc2.1_2.5m_elev', 300, geolabel = 'elvgroup')
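The interval grouping described above can be sketched in base R as follows. This is an assumption-laden illustration rather than the toolkit's code: `geo_label_sketch` is a hypothetical name, and the "start to end" label format is invented for the example. The first interval starts at the lowest value and the last interval is truncated at the maximum value.

```r
# Hypothetical helper, not part of GBIFBC.R.
geo_label_sketch <- function(df, group_column, interval = 10, geolabel = "geolabel") {
  x  <- df[[group_column]]
  lo <- min(x, na.rm = TRUE)
  hi <- max(x, na.rm = TRUE)
  # Number of intervals needed to cover the range (at least one).
  n_bins <- max(ceiling((hi - lo) / interval), 1)
  # Zero-based interval index; the maximum value is kept in the last interval.
  idx <- pmin(floor((x - lo) / interval), n_bins - 1)
  start <- lo + idx * interval
  end   <- pmin(start + interval, hi)  # cut the last interval off at the maximum
  df[[geolabel]] <- paste0(start, " to ", end)
  df
}

# Made-up latitudes for illustration.
demo <- data.frame(decimalLatitude = c(30, 33, 36, 42))
labeled <- geo_label_sketch(demo, "decimalLatitude", interval = 5)
# labeled$geolabel: "30 to 35" "30 to 35" "35 to 40" "40 to 42"
```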
- add_grid_label(GBIF_bioclim_dataframe, lon_column, lat_column, grid_size)
  - Generates a group label column and adds it to the input dataframe, grouping samples by a grid applied across the extent of the sample distribution. The lon/lat columns of the input dataframe must be designated in the function call so that the function can determine the extent of the sample distribution and assign samples to grid groups. Users designate the grid size, which is the side length of each grid-square in longitudinal and latitudinal units. The first grid-square is generated at the top right of the extent and grid-squares are added in a left to right manner, continuing on the row below once the maximum longitudinal extent has been reached. Grid-squares at the bottom of the extent are also cut off based on the maximum latitudinal value of the extent.
  - Example to make group labels based on a grid with a grid-square size of 5, with the default geolabel header:
    - add_grid_label(GBIF_bioclim_data, lon_column = 'decimalLongitude', lat_column = 'decimalLatitude', grid_size = 5)
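One way to picture grid grouping is the base R sketch below. It is not the toolkit's implementation: `grid_label_sketch` is a hypothetical name, the "grid_N" labels are invented, and the numbering order is an assumption (squares are numbered row by row from west to east, starting at the northern edge of the extent), which is only one possible reading of the description.

```r
# Hypothetical helper, not part of GBIFBC.R.
grid_label_sketch <- function(df, lon_column, lat_column, grid_size,
                              geolabel = "geolabel") {
  lon <- df[[lon_column]]
  lat <- df[[lat_column]]
  lon_min <- min(lon, na.rm = TRUE)
  lat_max <- max(lat, na.rm = TRUE)
  # Number of grid columns and rows needed to cover the extent (at least one).
  n_col <- max(ceiling((max(lon, na.rm = TRUE) - lon_min) / grid_size), 1)
  n_row <- max(ceiling((lat_max - min(lat, na.rm = TRUE)) / grid_size), 1)
  # Zero-based column index (west to east) and row index (north to south).
  col <- pmin(floor((lon - lon_min) / grid_size), n_col - 1)
  row <- pmin(floor((lat_max - lat) / grid_size), n_row - 1)
  # Assumed numbering: row by row, starting in the north-west corner.
  df[[geolabel]] <- paste0("grid_", row * n_col + col + 1)
  df
}

# Made-up coordinates for illustration.
demo <- data.frame(
  decimalLongitude = c(-100, -93, -88, -99),
  decimalLatitude  = c(40, 39, 33, 31)
)
labeled <- grid_label_sketch(demo, "decimalLongitude", "decimalLatitude", grid_size = 5)
# labeled$geolabel: "grid_1" "grid_2" "grid_6" "grid_4"
```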
- plotgeolabels(labeled_GBIF_bioclim_dataframe, rasterbrick, rasterbrick_layer, geolabel_column, plot_title, x_label, y_label)
  - Plots sample points, color-coded by group label, on top of a specific layer of a RasterBrick.
  - Example to plot samples in the GBIF_Bioclim dataframe, colored by 'geolabel' group, on top of the 'wc2.1_2.5m_elev' layer of the bioclimd RasterBrick object:
    - plotgeolabels(GBIF_Bioclim, bioclimd, 'wc2.1_2.5m_elev', geolabel = 'geolabel', plot_title = "Plot Title", x_label = 'Longitude', y_label = 'Latitude')
- plot_bioclim(labeled_GBIF_bioclim_dataframe, plot_variable, geolabel_column, plot_type, plot_title, x_label, y_label, legend_title)
  - Plots the distribution of a bioclim variable from a processed GBIF dataframe, grouped by a group label column (e.g. as generated by the add_geo_label() or add_grid_label() functions). By default the function plots the mean and standard deviation (SD) on top of each distribution peak, which is useful when group distributions are well separated with little overlap between peaks. If the peaks overlap, use plot_bioclim2() with the same parameters to show the mean and SD in the group legend instead, which makes the data easier to read.
  - Example to plot distributions of BIO1 (i.e. 'wc2.1_2.5m_bio_1') by 'geolabel' group from the GBIF_Bioclim dataframe:
    - plot_bioclim(GBIF_Bioclim, 'wc2.1_2.5m_bio_1', geolabel = 'geolabel', plot_title = "BIO1 distributions based on geolabel groups", x_label = 'Longitude', y_label = 'Latitude', legend_title = 'Geolabel Groups')
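The per-group mean and SD that these plots annotate can be computed with a few lines of base R. The sketch below is an illustration only, not the toolkit's plotting code: the data are made up, and the legend label format is an assumption about how a legend in the style of plot_bioclim2() might read.

```r
# Made-up BIO1 values for two latitude groups.
demo <- data.frame(
  wc2.1_2.5m_bio_1 = c(10, 12, 14, 20, 22, 24),
  geolabel = c("30 to 35", "30 to 35", "30 to 35",
               "35 to 40", "35 to 40", "35 to 40"),
  stringsAsFactors = FALSE
)

# Mean and standard deviation of the variable within each group.
means <- tapply(demo$wc2.1_2.5m_bio_1, demo$geolabel, mean)
sds   <- tapply(demo$wc2.1_2.5m_bio_1, demo$geolabel, sd)

group_stats <- data.frame(
  geolabel = names(means),
  mean     = as.numeric(means),
  sd       = as.numeric(sds)
)

# Hypothetical legend labels combining the group name with its mean and SD.
legend_labels <- sprintf("%s (mean %.1f, SD %.1f)",
                         group_stats$geolabel, group_stats$mean, group_stats$sd)
```

With these values, each group has an SD of 2 and the group means are 12 and 22.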
Tutorials
In this tutorial we go through the basics of using the GBIFBC.R script to source GBIF collection data and WorldClim rasters. Using the provided GBIFBC.R functions, we assess environmental factors for collections across longitudinal groups of the species' range of the North American tree Asimina triloba, also known as pawpaw.

In this tutorial we use GBIFBC to explore environmental variation across latitude and elevation in the Saguaro cactus (Carnegiea gigantea). Using GBIFBC.R functions we visualize variation across latitude and elevation groups and then use linear models with random effects to test if populations at the extreme ends of their range are experiencing novel climatic conditions compared to their geographic origins.
