Data Curation

Systematic organization, enhancement, and preservation of research data to maximize its value, accessibility, and long-term utility across scientific disciplines

This section presents two practical applications of data curation, demonstrating how to transform raw data into reliable, analysis-ready information. Through these cases, we show how specialized statistical techniques solve common problems of data integrity and quality, establishing a solid foundation for any subsequent modeling or interpretation.

Regression on Order Statistics for Geochemical Data

This example addresses censored data in geochemistry: nickel (Ni) concentrations reported below the detection limit are replaced with estimates from Regression on Order Statistics (ROS). ROS fits a distribution (here lognormal) to the detected values and uses the fitted line to impute the censored ones, so that the element's full distribution can be estimated for valid statistical analysis.
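The idea can be sketched in a few lines of Python. This is a minimal illustration, not the workflow used to produce the figures: it assumes a single detection limit, Blom plotting positions, and a hypothetical function name `lognormal_ros`; the sample values are invented for demonstration only.

```python
import math
from statistics import NormalDist


def lognormal_ros(values, censored):
    """Impute censored values with a simplified lognormal ROS.

    values   -- reported concentrations (entries flagged as censored are ignored)
    censored -- parallel list of booleans, True where the value is below the limit
    Assumes one detection limit, so censored samples occupy the lowest ranks.
    Returns the imputed values followed by the sorted detected values.
    """
    n = len(values)
    detected = sorted(v for v, c in zip(values, censored) if not c)
    k = n - len(detected)  # number of censored samples
    if len(detected) < 2:
        raise ValueError("need at least two detected values to fit a line")

    # Normal quantiles of Blom plotting positions for ranks 1..n (an assumption;
    # other plotting-position formulas are also used in practice).
    z = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    z_cens, z_det = z[:k], z[k:]

    # Ordinary least squares of log(concentration) on the normal quantiles,
    # using detected values only.
    y = [math.log(v) for v in detected]
    z_mean = sum(z_det) / len(z_det)
    y_mean = sum(y) / len(y)
    sxx = sum((a - z_mean) ** 2 for a in z_det)
    slope = sum((a - z_mean) * (b - y_mean) for a, b in zip(z_det, y)) / sxx
    intercept = y_mean - slope * z_mean

    # Predict the censored samples from the fitted line and back-transform.
    imputed = [math.exp(intercept + slope * a) for a in z_cens]
    return imputed + detected


# Illustrative values only (not from the dataset shown here); limit = 5 ppm.
ni = [5.0, 5.0, 5.0, 6.2, 8.1, 11.0, 15.3, 22.4, 40.0]
flags = [True, True, True, False, False, False, False, False, False]
completed = lognormal_ros(ni, flags)
```

Fitting the line on detected values only is what distinguishes ROS from simple substitution (for example, replacing every censored value with half the detection limit), which distorts the lower tail of the distribution.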

Ni (ppm) original data Q-Q plot
Ni (ppm) Q-Q plot for corrected data with lognormal ROS

Magnetic Kriging to Fill Gaps in Geomagnetic Time Series

This example focuses on building a continuous dataset from incomplete records. It uses magnetic kriging, a geostatistical interpolation technique, to fill gaps in time-series measurements of the Earth's magnetic field recorded by a network of stations, producing a spatially coherent data grid for geomagnetic studies.
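To make the interpolation step concrete, here is a minimal sketch of ordinary kriging applied to a single time series with gaps. It is a simplification of the case described above: it works along one dimension (time) rather than across a station network, and the exponential covariance model, the parameters `corr_range` and `sill`, and the function names `solve` and `krige_gaps` are all assumptions made for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def krige_gaps(times, values, corr_range=3.0, sill=1.0):
    """Fill None entries in `values` by ordinary kriging along the time axis.

    Uses an assumed exponential covariance C(h) = sill * exp(-|h| / corr_range).
    Observation times must be distinct.
    """
    def cov(h):
        return sill * math.exp(-abs(h) / corr_range)

    obs = [(t, v) for t, v in zip(times, values) if v is not None]
    m = len(obs)

    # Ordinary kriging system: covariances between observations, bordered by
    # ones so that the weights are constrained to sum to 1.
    lhs = [[cov(ti - tj) for tj, _ in obs] + [1.0] for ti, _ in obs]
    lhs.append([1.0] * m + [0.0])

    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)  # observed samples are kept unchanged
            continue
        rhs = [cov(t - ti) for ti, _ in obs] + [1.0]
        weights = solve(lhs, rhs)[:m]  # drop the Lagrange multiplier
        filled.append(sum(w * vi for w, (_, vi) in zip(weights, obs)))
    return filled


# Illustrative values only: a short series with a two-sample gap.
t = [0, 1, 2, 3, 4, 5]
b = [48010.0, 48012.5, None, None, 48011.0, 48009.5]
b_filled = krige_gaps(t, b)
```

In a real application the covariance (or variogram) model would be fitted to the data rather than fixed in advance, and the distance between samples would combine station location with time.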

Measured magnetic data with gaps
Data remediated using magnetic kriging interpolation