File(s) | Type | Description |
---|---|---|
885506_v2_global_ba_grid.csv (175.47 MB) | Comma Separated Values (.csv) | Primary data file for dataset ID 885506, version 2. See the "Parameters" section of the metadata for column descriptions and units. Missing values are reported as NaN. |

Supplemental File(s) | Type | Description |
---|---|---|
Training_data.csv (641.82 KB) | Comma Separated Values (.csv) | Data used to train the Gaussian Process Regression (GPR) machine learning (ML) models. (Supplemental file for dataset ID 885506, versions 1 and 2.) The file contains 16 columns. Columns 1–4 give the GEOTRACES cruise ID, the station, and the coordinates at which the in situ data were obtained. Column 5 is our best estimate of the bathymetry. Column 6 is the depth in the water column from which the sample was collected. Columns 7–12 contain in situ data for physical (temperature, salinity) and biogeochemical (nutrients, oxygen) parameters. Columns 13 and 14 give the interpolated mixed-layer depth and sea-surface chlorophyll a used in model training. Column 15 contains the observed in situ dissolved barium. Column 16 gives a citation to the data source and/or originator (if unpublished). |
Testing_results.csv (32.18 MB) | Comma Separated Values (.csv) | Model testing results. (Supplemental file for dataset ID 885506, versions 1 and 2.) Results of ML model testing in the Indian Ocean. Columns 1–5 provide a unique identifier corresponding to the samples listed in ‘Testing_data.csv’. Column 6 gives the observed [Ba]. Columns 7–4,102 give the model-predicted [Ba] (nmol/kg) for each trained model listed in ‘Experiment_list.csv’. |
Testing_data.csv (152.11 KB) | Comma Separated Values (.csv) | Data used to test the GPR ML models. (Supplemental file for dataset ID 885506, versions 1 and 2.) Indian Ocean data used to test the ML models. The file contains 16 columns. Columns 1–4 give the cruise ID, the station, and the coordinates at which the in situ barium data were obtained. Column 5 is our best estimate of the bathymetry. Column 6 is the depth in the water column from which the sample was collected. Columns 7–12 contain interpolated World Ocean Atlas data for physical (temperature, salinity) and biogeochemical (nutrients, oxygen) parameters. Columns 13 and 14 give the interpolated mixed-layer depth and sea-surface chlorophyll a used in model testing. Column 15 contains the observed in situ dissolved barium from the data source listed in column 16. |
Model_3080_code.zip (140.06 MB) | ZIP Archive (ZIP) | This archive contains three files: (1) predictBa.m; (2) trainedModel_Exp3080.mat; (3) exampleData.xlsx. "predictBa.m" is code that lets users predict [Ba] in seawater from input data for seven predictors: depth, temperature, salinity, dioxygen, phosphate, nitrate, and silicate. Predictions are made using "trainedModel_Exp3080.mat", a Gaussian Process Regression machine learning model trained to simulate [Ba] from these seven inputs. Instructions for using the model are given in the comments of predictBa.m, and example input data are provided in "exampleData.xlsx". The code was written in MATLAB and should work in all versions later than R2018a. All settings, configurations, and the training process are described in a companion study by Mete et al. (2023). |
ML_model_3080.txt (285.35 MB) | Plain Text | An alternative version of the primary data file for dataset ID 885506, version 2, provided to facilitate use with Ocean Data View (ODV). This file was exported from ODV and contains metadata at the top, before the data begin. The data are identical to those in "885506_v2_global_ba_grid.csv", except that a few additional columns present here (Cruise, Type, and date/time) were removed from the .csv because they are not needed. Missing values are reported as NaN. |
Global_and_basin_averages.csv (34.65 KB) | Comma Separated Values (.csv) | Average depth profiles of [Ba] and barite saturation state. (Supplemental file for dataset ID 885506, version 2.) Depth profiles of the mean, median, and standard deviation of [Ba] (nmol/kg) and barite saturation state (unitless) for the whole ocean and the major ocean basins (Arctic, Atlantic, Indian, Pacific, and Southern Oceans). All profiles are given on the World Ocean Atlas 2018 depth levels, and the number of profiles in each depth bin is reported. |
Experiment_list.csv (208.65 KB) | Comma Separated Values (.csv) | List of trained Gaussian Process Regression models and their skill metrics. (Supplemental file for dataset ID 885506, version 2.) Lists the 4,095 ML models trained and tested in this study. Each model uses a unique combination of the 12 features tested: longitude, latitude, bathymetry, depth, temperature, salinity, oxygen, phosphate, nitrate, silicate, mixed-layer depth, and chlorophyll a. A ‘1’ indicates that a feature was used in that model, and a ‘0’ that it was not. Column 1 gives the model number; model 3080 (the model used for global simulations) is listed first, model 3112 second, and the others in random order. Columns 2–13 indicate the features included in each model. The final four columns give model skill as Mean Absolute Error (MAE; nmol/kg) and Mean Absolute Percentage Error (MAPE; %), first for the training data and then for the testing data. |
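Training_data.csv and Testing_data.csv share the same 16-column layout, so one reader can serve both. The sketch below indexes columns by position only; the short column names are hypothetical labels of our own (the files' real header text may differ), the order of latitude/longitude and of the six parameters in columns 7–12 is assumed, and a single header row is assumed. The sample rows are made-up values.

```python
import csv
import io
import math

# Hypothetical labels for the 16 positional columns; the order within
# columns 1-4 and 7-12 is an assumption, not taken from the files.
COLUMNS = [
    "cruise", "station", "lat", "lon",                 # columns 1-4
    "bathymetry",                                      # column 5
    "depth",                                           # column 6
    "temperature", "salinity", "oxygen",               # columns 7-12
    "phosphate", "nitrate", "silicate",
    "mld", "chl_a",                                    # columns 13-14
    "ba",                                              # column 15: observed dissolved Ba
    "source",                                          # column 16: citation / data source
]
TEXT_COLUMNS = {"cruise", "station", "source"}


def read_ba_table(f):
    """Read a 16-column training/testing file into a list of dicts.

    Assumes one header row. Numeric cells become floats; empty cells and
    'NaN' become float('nan').
    """
    reader = csv.reader(f)
    next(reader)  # skip the header row
    rows = []
    for record in reader:
        if len(record) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(record)}")
        row = {}
        for name, value in zip(COLUMNS, record):
            if name in TEXT_COLUMNS:
                row[name] = value
            else:
                row[name] = float(value) if value.strip() else math.nan
        rows.append(row)
    return rows


# Made-up example input (not real data).
sample = io.StringIO(
    "h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11,h12,h13,h14,h15,h16\n"
    "CRUISE_X,1,30.0,-60.0,4500,100,20.1,36.5,200,0.1,1.2,2.0,50,0.2,45.3,hypothetical source\n"
    "CRUISE_X,2,31.0,-61.0,4000,500,NaN,35.0,180,1.0,15.0,20.0,40,0.1,60.0,hypothetical source\n"
)
rows = read_ba_table(sample)
print(rows[0]["ba"])  # 45.3
```

Using the `csv` module rather than splitting on commas matters here because a free-text citation in column 16 may itself contain quoted commas.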
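Because column 6 of Testing_results.csv holds the observed [Ba] and each later column holds one model's predictions, per-model skill scores like those in Experiment_list.csv can in principle be recomputed. The sketch below uses the textbook definitions of MAE and MAPE and skips NaN pairs; how the study actually handled missing or zero observations is described in Mete et al. (2023) and may differ, and the function names are our own.

```python
import math


def mae_mape(observed, predicted):
    """Return (MAE, MAPE in %) over pairs where both values are not NaN.

    MAE is in the units of the inputs (nmol/kg for [Ba]). MAPE divides by
    the observed value, so observations equal to 0 are left out of it.
    """
    abs_err, pct_err = [], []
    for obs, pred in zip(observed, predicted):
        if math.isnan(obs) or math.isnan(pred):
            continue  # missing values are reported as NaN
        abs_err.append(abs(pred - obs))
        if obs != 0:
            pct_err.append(100.0 * abs(pred - obs) / abs(obs))
    if not abs_err:
        raise ValueError("no valid observation/prediction pairs")
    mae = sum(abs_err) / len(abs_err)
    mape = sum(pct_err) / len(pct_err) if pct_err else math.nan
    return mae, mape


def score_models(rows, first_model_col=6):
    """Score every prediction column of a Testing_results-style table.

    Each row is a list: indices 0-4 (columns 1-5, identifiers) are ignored,
    index 5 (column 6) is observed [Ba], and indices >= first_model_col
    (column 7 onward) are model predictions.
    """
    observed = [r[5] for r in rows]
    return [
        mae_mape(observed, [r[j] for r in rows])
        for j in range(first_model_col, len(rows[0]))
    ]


# Made-up table: 5 identifier cells, observed [Ba], then two models.
table = [
    ["a", 1, 0, 0, 0, 40.0, 44.0, 40.0],
    ["a", 2, 0, 0, 0, 50.0, 45.0, 50.0],
    ["a", 3, 0, 0, 0, 100.0, 100.0, 110.0],
]
scores = score_models(table)
```

For the first made-up model the absolute errors are 4, 5 and 0 nmol/kg, giving an MAE of 3 nmol/kg and a MAPE of (10 + 10 + 0)/3 ≈ 6.67 %.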
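predictBa.m obtains [Ba] from a trained Gaussian Process Regression model. For readers new to GPR, the NumPy sketch below shows the core computation: the posterior mean is the training average plus a kernel-weighted combination of the training targets. It is a generic illustration with a squared-exponential kernel, a constant mean set to the training average, and made-up hyperparameters. It is not the model stored in trainedModel_Exp3080.mat, whose kernel, hyperparameters and preprocessing are documented in Mete et al. (2023).

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale, signal_var):
    """Squared-exponential (RBF) kernel between the rows of a and b."""
    # Pairwise squared distances between rows, scaled by the length scale.
    d2 = np.sum(((a[:, None, :] - b[None, :, :]) / length_scale) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * d2)


def gp_predict(x_train, y_train, x_new, length_scale=1.0, signal_var=1.0,
               noise_var=1e-6):
    """Posterior mean and latent variance of a GP with constant mean.

    The constant mean is taken as the training average (an assumption of
    this sketch). Each row of x_* would hold one sample's predictors.
    """
    y_mean = y_train.mean()
    k_tt = sq_exp_kernel(x_train, x_train, length_scale, signal_var)
    k_tt += noise_var * np.eye(len(x_train))
    k_nt = sq_exp_kernel(x_new, x_train, length_scale, signal_var)
    # Cholesky factor gives a numerically stable solve of K^-1 (y - mean).
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = y_mean + k_nt @ alpha
    v = np.linalg.solve(chol, k_nt.T)
    var = signal_var - np.sum(v ** 2, axis=0)
    return mean, var


# Made-up 1-feature data standing in for the seven real predictors.
x_train = np.linspace(0.0, 1.0, 5)[:, None]
y_train = 40.0 + 30.0 * x_train[:, 0]
mean_fit, var_fit = gp_predict(x_train, y_train, x_train, length_scale=0.2)
mean_far, var_far = gp_predict(x_train, y_train, np.array([[5.0]]), length_scale=0.2)
```

At the training points the prediction nearly reproduces the targets with very small variance; far from any training data it falls back to the training average (55 here) with the full prior variance. This is one reason GPR predictions for predictor combinations unlike those in the training set should be treated with caution.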
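ML_model_3080.txt carries ODV metadata above the data rows. The reader below assumes ODV's spreadsheet convention that metadata and comment lines begin with `//`, that the first remaining line is the column header, and that fields are tab-separated; all three should be checked against the actual file. The function names and the sample content are hypothetical.

```python
import csv
import io
import math


def read_odv_spreadsheet(f, delimiter="\t"):
    """Return (header, rows) from an ODV-style text export.

    Assumptions: metadata/comment lines start with '//', blank lines are
    skipped, and the first remaining line holds the column names.
    """
    lines = (line for line in f if not line.startswith("//") and line.strip())
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    rows = [dict(zip(header, record)) for record in reader]
    return header, rows


def to_float(value):
    """Convert a cell to float; empty cells and 'NaN' become math.nan."""
    value = value.strip()
    return float(value) if value else math.nan


# Made-up file content with two metadata lines and two data rows.
sample = io.StringIO(
    "//<Creator>example</Creator>\n"
    "//another metadata line\n"
    "Cruise\tType\tLongitude\tBa\n"
    "C1\tB\t-30.5\t45.2\n"
    "C1\tB\t-30.5\tNaN\n"
)
header, rows = read_odv_spreadsheet(sample)
ba = [to_float(r["Ba"]) for r in rows]
print(header)  # ['Cruise', 'Type', 'Longitude', 'Ba']
```

For analysis outside ODV, the .csv primary file is simpler to use, since it lacks both the metadata block and the Cruise, Type and date/time columns.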
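Global_and_basin_averages.csv summarizes many profiles at each depth level as a mean, median, standard deviation and profile count. A stdlib sketch of that per-level summary follows, for profiles already placed on common depth levels, with NaN values excluded. Two assumptions are ours: the WOA18 standard levels (102 levels from 0 to 5500 m: every 5 m to 100 m, 25 m to 500 m, 50 m to 2000 m, then 100 m), and the use of the sample (n - 1) standard deviation, which the description does not specify.

```python
import math
import statistics


def woa18_standard_depths():
    """WOA18 standard depth levels in metres (102 levels, 0-5500 m)."""
    return (list(range(0, 100, 5)) + list(range(100, 500, 25))
            + list(range(500, 2000, 50)) + list(range(2000, 5501, 100)))


def level_stats(profiles):
    """Per-level (mean, median, std, count) across profiles.

    `profiles` is a list of equal-length lists of values on common depth
    levels; NaN entries are excluded level by level. The sample standard
    deviation is an assumption (statistics.pstdev gives the population one).
    """
    out = []
    for k in range(len(profiles[0])):
        vals = [p[k] for p in profiles if not math.isnan(p[k])]
        if not vals:
            out.append((math.nan, math.nan, math.nan, 0))
            continue
        std = statistics.stdev(vals) if len(vals) > 1 else math.nan
        out.append((statistics.mean(vals), statistics.median(vals), std, len(vals)))
    return out


nan = math.nan
# Three made-up [Ba] profiles on three levels.
stats = level_stats([[40.0, 60.0, nan], [44.0, 70.0, 90.0], [42.0, 80.0, nan]])
```

Reporting the count alongside each statistic, as the file does, shows where a mean rests on only a few profiles, typically in the deepest bins.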
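The count of 4,095 models in Experiment_list.csv equals the number of non-empty subsets of the 12 candidate features (2^12 - 1), and the 0/1 flags in columns 2–13 record one subset per model. The sketch below enumerates those subsets and decodes a flag row into feature names. It assumes that columns 2–13 follow the feature order given in the description, and it does not reproduce the study's model numbering (for example, which subset received number 3080).

```python
from itertools import combinations

# Feature order as listed in the Experiment_list.csv description; that the
# flag columns 2-13 use this order is an assumption.
FEATURES = [
    "longitude", "latitude", "bathymetry", "depth", "temperature",
    "salinity", "oxygen", "phosphate", "nitrate", "silicate",
    "mixed-layer depth", "chlorophyll a",
]


def all_feature_subsets(features=FEATURES):
    """Every non-empty combination of features, one per possible model."""
    return [
        subset
        for r in range(1, len(features) + 1)
        for subset in combinations(features, r)
    ]


def decode_flags(flags, features=FEATURES):
    """Map a row of 0/1 flags (as ints or CSV strings) to feature names."""
    if len(flags) != len(features):
        raise ValueError("expected one flag per feature")
    return [name for name, flag in zip(features, flags) if int(flag) == 1]


subsets = all_feature_subsets()
print(len(subsets))  # 4095

# Flags for the seven predictors taken by predictBa.m, in the order above.
flags = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(decode_flags(flags))
```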