Data File(s)	Type	Description
885506_v2_global_ba_grid.csv (175.47 MB)	Comma Separated Values (.csv)	Primary data file for dataset ID 885506, version 2. See the "Parameters" section of the metadata for column descriptions and units. Missing values are reported as NaN.

Supplemental File(s)	Type	Description
Global_and_basin_averages.csv (34.65 KB)	Comma Separated Values (.csv)	Average depth profiles of [Ba] and barite saturation state. (Supplemental file for dataset ID 885506, version 2.) Depth profiles of the mean, median, and standard deviation of [Ba] (nmol/kg) and barite saturation state (unitless) for the whole ocean and the major ocean basins (Arctic, Atlantic, Indian, Pacific, and Southern Oceans). All profiles are provided on the World Ocean Atlas 2018 depth spacing, and the number of profiles in each bin is shown.
ML_model_3080.txt (285.35 MB)	Plain Text	An alternative version of the primary data file for dataset ID 885506, version 2, provided to facilitate use with ODV. This file was exported from ODV and contains metadata at the top of the file before the data begin. The data themselves are identical to those in file "885506_v2_global_ba_grid.csv", except for a few additional columns (Cruise, Type, and date/time) that were removed from the .csv because they are not needed. Missing values are reported as NaN.
Experiment_list.csv (208.65 KB)	Comma Separated Values (.csv)	List of trained Gaussian Process Regression models and their respective skill metrics. (Supplemental file for dataset ID 885506, version 2.) List showing the 4,095 machine learning models trained and tested in this study. Each model uses a unique combination of the 12 features tested: longitude, latitude, bathymetry, depth, temperature, salinity, oxygen, phosphate, nitrate, silicate, mixed-layer depth, and chlorophyll a. If a feature was used in a given model, it is denoted by ‘1’; otherwise it is ‘0’. Column 1 lists the model number; model 3080 (the model used for global simulations) is shown first, model 3112 second, and the others are listed in a random order. Columns 2–13 show the features included in that model. The final four columns show the model skill in terms of Mean Absolute Error (MAE; nmol/kg) and Mean Absolute Percentage Error (MAPE; %) for the training data and then for the testing data.
Testing_data.csv (152.11 KB)	Comma Separated Values (.csv)	Data used to test GPR ML models. (Supplemental file for dataset ID 885506, version 1 and version 2.) File containing the Indian Ocean data used to test the ML models. The file contains 16 columns. Columns 1–4 indicate the cruise ID, the station, and the coordinates from which the in situ barium data were obtained. Column 5 is our best estimate of the bathymetry. Column 6 is the depth in the water column from which the sample was collected. Columns 7–12 contain interpolated World Ocean Atlas data for physical (temperature, salinity) and biogeochemical parameters (nutrients, oxygen). Columns 13 and 14 show the interpolated mixed-layer depth and sea-surface chlorophyll a used in model testing. Column 15 contains the observed in situ dissolved barium from the respective data source, which is listed in column 16.
Testing_results.csv (32.18 MB)	Comma Separated Values (.csv)	Model testing results. (Supplemental file for dataset ID 885506, version 1 and version 2.) File containing the results from ML model testing in the Indian Ocean. Columns 1–5 provide a unique identifier that corresponds to the samples listed in ‘Testing_data.csv’. Column 6 shows observed [Ba]. Columns 7–4,102 show model-predicted [Ba] (in nmol/kg) for each trained model shown in ‘Experiment_list.csv’.
Training_data.csv (641.82 KB)	Comma Separated Values (.csv)	Data used to train GPR ML models. (Supplemental file for dataset ID 885506, version 1 and version 2.) File containing the data used to train the ML models. The file contains 16 columns. Columns 1–4 indicate the GEOTRACES cruise ID, the station, and the coordinates from which the in situ data were obtained. Column 5 is our best estimate of the bathymetry. Column 6 is the depth in the water column from which the sample was collected. Columns 7–12 contain in situ data for physical (temperature, salinity) and biogeochemical parameters (nutrients, oxygen). Columns 13 and 14 show the interpolated mixed-layer depth and sea-surface chlorophyll a used in model training. Column 15 contains the observed in situ dissolved barium. Column 16 shows a citation to the data source and/or originator (if unpublished).
Model_3080_code.zip (140.06 MB)	ZIP Archive (ZIP)	This archive contains the following three files: (1) predictBa.m; (2) trainedModel_Exp3080.mat; (3) exampleData.xlsx. "predictBa.m" is a script that allows users to predict [Ba] in seawater based on input data for seven predictors: depth, temperature, salinity, dioxygen, phosphate, nitrate, and silicate. Predictions of [Ba] are made using "trainedModel_Exp3080.mat", a Gaussian Process Regression machine learning model that was trained to simulate [Ba] based on these seven inputs. Instructions on how to use the model are provided in the comments to predictBa.m, and example input data are provided in "exampleData.xlsx". The code was written in MATLAB and should work on all versions beyond R2018a. All settings, configurations, and the training process are described in a companion study by Mete et al. (2023).
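
The two skill metrics reported in Experiment_list.csv can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of MAE and MAPE; whether the study computed them in exactly this form is an assumption, and the input values below are made up for illustration rather than taken from the dataset.

```python
def mean_absolute_error(observed, predicted):
    """MAE, in the units of the inputs (nmol/kg for [Ba])."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_percentage_error(observed, predicted):
    """MAPE in percent; assumes no observed value is zero."""
    total = sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted))
    return 100 * total / len(observed)


# Made-up values for illustration only (not from the dataset).
observed = [40.0, 50.0, 100.0]
predicted = [42.0, 49.0, 95.0]

mae = mean_absolute_error(observed, predicted)
mape = mean_absolute_percentage_error(observed, predicted)
print(f"MAE = {mae:.2f} nmol/kg, MAPE = {mape:.1f} %")
```

MAE keeps the units of [Ba], whereas MAPE normalizes each error by the observed value, so the two metrics can rank models differently when [Ba] spans a wide range.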

The data are output from a machine learning model that was trained using GEOTRACES dissolved barium ([Ba]) data. Full protocols for sample collection and analysis are provided in the GEOTRACES Cookbook and 2021 Intermediate Data Product (see References), respectively.

Full methods are provided in a companion study published in Earth System Science Data (Mete et al., 2023). A summary of methods is provided below.

The features used to predict [Ba] and their associated data sources are summarized in Table 1 of Mete et al. (2023). The first three features (latitude, longitude, depth) record geospatial information that defines the location of an observation in three-dimensional space. Features 4–9 encode physical (temperature, salinity) and chemical (oxygen, nutrients) information that is routinely measured alongside [Ba]. These data were generally available for the same bottle as the [Ba] measurements; when that was not the case, nutrient data were taken from the corresponding location during a separate cast, or, in the case of oxygen, from linearly interpolated sensor data. Features 10–12 are independent of depth, meaning that all samples within a given vertical profile share the same value for mixed-layer depth, sea-surface chlorophyll a, and bathymetry.

Table 2 of Mete et al. (2023) identifies all sources of dissolved [Ba] data ingested into the master record. The data ingestion process resulted in a master record containing 5,502 observations of [Ba], each with a corresponding value for all 12 of the features of interest described above. The record was then split into two partitions in a roughly 80/20 (Pareto) ratio: the first was used for ML model training (4,345 observations; 79 % of the data) and the second for model testing (1,157 observations; 21 %).
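
As a quick arithmetic check of the partition sizes quoted above, the following Python lines recompute the percentages from the stated counts. Only the three counts come from the text; the variable names are illustrative.

```python
# Observation counts as stated in the text.
n_total = 5502
n_train = 4345
n_test = 1157

# The two partitions should account for every observation in the master record.
assert n_train + n_test == n_total

train_pct = 100 * n_train / n_total
test_pct = 100 * n_test / n_total
print(f"training: {train_pct:.0f} %, testing: {test_pct:.0f} %")
```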

We opted for supervised ML using a Gaussian Process Regression learner, implemented in MATLAB. The training partition of the master record was used to train 4,095 different machine learning models with the goal of finding a model that could accurately simulate the global distribution of [Ba]. Each model uses a unique combination of the 12 features, and our testing followed a factorial design whereby each feature was either enabled or disabled (2^12 − 1 = 4,095 non-empty combinations). In the second stage of cross-validation, trained models were used to predict [Ba] for the withheld data from the Indian Ocean. The accuracy of the models was assessed by comparing ML model predictions against observed [Ba]. We then winnowed the list of models from 4,095 to a single, highly accurate model (#3080), which we used to simulate Ba* and the saturation state of seawater with respect to barite on a global basis.
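
The factorial design can be sketched in a few lines of Python: with 12 features that are each either on or off, there are 2^12 = 4,096 combinations, and dropping the one with no features enabled leaves 4,095. The enumeration order below is simply the order `itertools.product` produces; it is not claimed to match the model numbering in Experiment_list.csv, and the feature labels are informal spellings of the names given in the file descriptions.

```python
from itertools import product

# The 12 candidate features named in the dataset description.
FEATURES = [
    "longitude", "latitude", "bathymetry", "depth",
    "temperature", "salinity", "oxygen", "phosphate",
    "nitrate", "silicate", "mixed_layer_depth", "chlorophyll_a",
]


def feature_combinations(features):
    """Yield every non-empty on/off combination as a tuple of 0/1 flags."""
    for flags in product((0, 1), repeat=len(features)):
        if any(flags):  # skip the all-zero case: a model needs at least one input
            yield flags


combos = list(feature_combinations(FEATURES))
print(len(combos))

# Feature names enabled in the last combination generated (all flags set to 1).
enabled = [name for name, flag in zip(FEATURES, combos[-1]) if flag]
print(enabled)
```

Each tuple of flags has the same 0/1 form as the feature columns described for Experiment_list.csv, which is why the count of non-empty tuples equals the number of models trained.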

Refer to Mete et al. (2023) for complete methodology, results, and discussion.

The data provided here include the resulting global grid of dissolved [Ba], Ba*, and barite saturation state as well as Supplemental Files used in testing and training of the model.
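
Because missing values in the primary data file are reported as NaN, downstream code needs to skip or mask them. The Python sketch below shows one way to do so with the standard library only. The column names and sample rows are hypothetical stand-ins; the actual column names are listed in the "Parameters" section of the metadata.

```python
import csv
import io
import math

# Tiny stand-in for the grid file; column names and values are hypothetical.
sample = io.StringIO(
    "latitude,longitude,depth,ba\n"
    "-10.5,60.5,0,35.2\n"
    "-10.5,60.5,5,NaN\n"
    "-10.5,60.5,10,36.1\n"
)


def read_valid_rows(handle, value_column):
    """Return rows as dicts of floats, skipping rows where value_column is NaN."""
    rows = []
    for record in csv.DictReader(handle):
        # float("NaN") parses to a floating-point NaN, so no special casing is needed.
        parsed = {key: float(text) for key, text in record.items()}
        if not math.isnan(parsed[value_column]):
            rows.append(parsed)
    return rows


rows = read_valid_rows(sample, "ba")
print(len(rows))
```

For the real file, the same function could be given a handle opened with `open(path, newline="")` instead of the in-memory sample, with `value_column` set to the actual name of the [Ba] column.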

The code used to run the model is also provided here in the Supplemental File "Model_3080_code.zip". "predictBa.m" is a script that allows users to predict [Ba] in seawater based on input data for seven predictors: depth, temperature, salinity, dioxygen, phosphate, nitrate, and silicate. Predictions of [Ba] are made using "trainedModel_Exp3080.mat", a Gaussian Process Regression machine learning model that was trained to simulate [Ba] based on these seven inputs. Instructions on how to use the model are provided in the comments to predictBa.m, and example input data are provided in "exampleData.xlsx". The code was written in MATLAB and should work on all versions beyond R2018a. All settings, configurations, and the training process are described in a companion study by Mete et al. (2023).

Related Datasets

No Related Datasets

Related Publications

Results

Mete, Ö. Z., Subhas, A. V., Kim, H. H., Dunlea, A. G., Whitmore, L. M., Shiller, A. M., Gilbert, M., Leavitt, W. D., & Horner, T. J. (2023). Barium in seawater: dissolved distribution, relationship to silicon, and barite saturation state determined using machine learning. Earth System Science Data, 15(9), 4023–4045. https://doi.org/10.5194/essd-15-4023-2023

Methods

Cutter, G., Casciotti, K., Croot, P., Geibert, W., Heimbürger, L.-E., Lohan, M., Planquette, H., & van de Flierdt, T. (2017). Sampling and Sample-handling Protocols for GEOTRACES Cruises. Version 3, August 2017. GEOTRACES International Project Office, Toulouse, France. 139 pp. & Appendices. http://dx.doi.org/10.25607/OBP-2

References

GEOTRACES Intermediate Data Product Group. (2021). The GEOTRACES Intermediate Data Product 2021 (IDP2021). (Version 1) [Data set]. NERC EDS British Oceanographic Data Centre NOC. https://doi.org/10.5285/CF2D9BA9-D51D-3B7C-E053-8486ABC0F5FD

Dataset: A spatially and vertically resolved global grid of dissolved barium concentrations in seawater determined using Gaussian Process Regression machine learning

Principal Investigator: Tristan J. Horner (Woods Hole Oceanographic Institution)

Student: Oyku Z. Mete (Woods Hole Oceanographic Institution)

BCO-DMO Data Manager: Shannon Rauch (Woods Hole Oceanographic Institution)

Project: The Speed, Signature, and Significance of Barium Transformations in Seawater (The Three S's)
