This article provides a comprehensive guide for researchers and scientists on managing the entire lifecycle of biologging data. It covers foundational principles and the critical importance of data preservation; explores modern archiving methods and platforms such as the Biologging intelligent Platform (BiP) and Movebank; addresses common challenges in data standardization and in machine-learning validation, including preventing overfitting; and presents best practices for ensuring data quality and interoperability. The content synthesizes recent (2025) research and community standards to equip professionals with the strategies needed to maximize the value of biologging data for ecological discovery, conservation, and biomedical applications.
Biologging, the use of animal-borne electronic tags to document movements, behaviour, physiology, and environments of wildlife, has experienced rapid growth [1]. This growth has led to an unprecedented accumulation of complex data, creating a critical need for systematic archiving. Biologging data archiving is not merely the storage of data files; it is the process of preserving sensor-derived data and its associated metadata in a standardized, accessible, and reusable format for long-term use. This practice ensures that these unique datasets, which capture animal life on Earth, are safeguarded against loss and remain available for future scientific discovery, policy-making, and conservation efforts [2] [1].
The value of these data extends far beyond their initial research purpose. They form dynamic digital archives of natural history, capturing vital information about animals in the context of their changing environments [1]. As the field expands, a strategic approach to archiving becomes essential to mitigate biodiversity threats and maximize the return on investment from costly and logistically challenging biologging studies.
Effective archiving transforms biologging data from a static record into a powerful, reusable resource with multi-faceted strategic importance.
Archived biologging data provides objective, evidence-based insights crucial for conservation.
Biologging studies represent a significant investment of resources and often capture irreplaceable information about animal life in a specific time and place. Archiving ensures this "digital natural history" is preserved for future generations, protecting it from data loss due to hardware failure, format obsolescence, or simply the passage of time [2] [1]. This aligns with the view of biodiversity itself as a data system, where each species holds irreplaceable information refined over millennia [4].
By making data Findable, Accessible, Interoperable, and Reusable (FAIR), archiving platforms maximize the scientific and societal return on the substantial financial and human investment required for biologging studies [2] [1]. It prevents redundant research and allows secondary users to extract new value from existing data without the cost of new field deployments.
Table 1: Strategic Value of Biologging Data Archiving
| Strategic Area | Key Benefits | Example |
|---|---|---|
| Scientific Research | Enables large-scale meta-analyses; Fosters cross-disciplinary discovery; Improves model accuracy. | Using seal-collected data to map Antarctic Circumpolar Current fronts [2]. |
| Conservation & Policy | Informs design of protected areas; Quantifies anthropogenic threats; Tracks climate change impacts. | Mitigating fisheries bycatch using data on animal movement in shipping lanes [3]. |
| Data Stewardship | Preserves irreplaceable data for future generations; Prevents data loss; Ensures research legacy. | Creating a "dynamic archive of animal life on Earth" for long-term use [1]. |
| Economic Efficiency | Maximizes ROI from expensive deployments; Prevents redundant studies; Enables secondary data use. | Shared datasets allowing new research questions without new field work [2]. |
A robust biologging archive is built on interconnected pillars of data, metadata, and community standards.
The primary data from biologging devices can include location, depth, acceleration, water temperature, and more. A major challenge is the heterogeneity of data formats across sensor types and manufacturers. Strategic archiving requires data standardization, which involves converting data into consistent formats using common column names, standardized date-time formats (e.g., ISO 8601), and uniform file structures [2]. This step is critical for enabling data integration and automated analysis across datasets.
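As a concrete illustration of the date-time standardization described above, a minimal normalization step might coerce heterogeneous vendor timestamps into ISO 8601. This sketch assumes a few hypothetical manufacturer formats; real tag exports vary by device and firmware.

```python
from datetime import datetime, timezone

# Hypothetical vendor timestamp formats; real tag exports vary by
# manufacturer. Note that ambiguous dates (e.g. "01/02/2024") will match
# the first format that parses, so format order matters.
VENDOR_FORMATS = [
    "%d/%m/%Y %H:%M:%S",   # e.g. "21/03/2024 14:05:00"
    "%Y-%m-%d %H:%M:%S",   # e.g. "2024-03-21 14:05:00"
    "%m/%d/%Y %I:%M %p",   # e.g. "03/21/2024 02:05 PM"
]

def to_iso8601(raw: str) -> str:
    """Normalize a vendor timestamp string to ISO 8601 in UTC."""
    for fmt in VENDOR_FORMATS:
        try:
            dt = datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc)
            return dt.isoformat().replace("+00:00", "Z")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized timestamp format: {raw!r}")
```

Applying this during ingestion means every downstream tool can parse one timestamp format instead of one per manufacturer.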
Sensor data alone are meaningless without context. Metadata—data about the data—provides this essential context and includes three key categories: animal traits (e.g., sex, body size, breeding history), instrument details (e.g., device type, manufacturer, sensor specifications), and deployment information (who, when, where, and how the device was deployed) [2].
To ensure interoperability, metadata should conform to international standards and vocabularies, such as the Integrated Taxonomic Information System (ITIS) and Climate and Forecast (CF) Metadata Conventions [2].
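As an illustration of what conformant metadata can look like, the sketch below structures dataset-level (ACDD-style) and variable-level (CF-style) attributes as a plain dictionary. The field values are hypothetical, and the exact required attributes depend on the target archive.

```python
# Illustrative file-level (ACDD-style) and variable-level (CF-style)
# attributes; actual required fields depend on the target archive.
dataset_metadata = {
    "global_attributes": {          # ACDD-style discovery metadata
        "title": "Southern elephant seal CTD profiles (example)",
        "creator_name": "Example Research Group",
        "time_coverage_start": "2024-03-21T00:00:00Z",
        "license": "CC-BY-4.0",
    },
    "variables": {
        "temperature": {            # CF-style variable attributes
            "standard_name": "sea_water_temperature",
            "units": "degree_C",
        },
        "depth": {
            "standard_name": "depth",
            "units": "m",
            "positive": "down",     # CF convention for vertical axes
        },
    },
}
```

Using controlled vocabularies like CF `standard_name` values (rather than free-text labels) is what lets downstream tools match variables across datasets automatically.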
An archiving platform is more than a storage drive. It should provide data validation and curation, controlled access management, secure long-term preservation, and integration with the wider research ecosystem.
Diagram 1: Biologging data archiving workflow
The following protocols provide a roadmap for researchers to prepare and submit data to an archive, and for archives to manage that data effectively.
This protocol outlines the steps a researcher should take from data retrieval to successful archive submission.
1. Pre-Deployment Planning:
2. Data Retrieval and Initial Backup:
3. Data Standardization and Quality Control:
4. Metadata Compilation:
5. Submission and Access Level Setting:
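Step 3 above (standardization and quality control) can be sketched as a simple filter that drops duplicate records and biologically implausible position jumps. The speed threshold is study-specific and must be tuned per species; everything below is an illustrative assumption, not a prescribed QC rule.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def qc_filter(fixes, max_speed_kmh):
    """Drop exact duplicates and fixes implying impossible travel speed.

    `fixes` is a time-ordered list of (timestamp_s, lat, lon) tuples.
    """
    cleaned = []
    for fix in fixes:
        if cleaned:
            t0, lat0, lon0 = cleaned[-1]
            t1, lat1, lon1 = fix
            if (t1, lat1, lon1) == (t0, lat0, lon0):
                continue  # exact duplicate record
            dt_h = (t1 - t0) / 3600.0
            if dt_h > 0 and haversine_km(lat0, lon0, lat1, lon1) / dt_h > max_speed_kmh:
                continue  # implausible jump, likely a location error
        cleaned.append(fix)
    return cleaned
```

For a marine mammal, `max_speed_kmh` might be set near the species' burst swimming speed; retaining the rejected fixes in a separate file preserves the raw record for auditing.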
This protocol describes the responsibilities of the data archive in managing and preserving submitted data.
1. Data Ingestion and Validation:
2. Curation and Annotation:
3. Secure Archiving and Preservation:
4. Distribution and Access Management:
5. Integration and Sustainability:
Table 2: Repository Types for Biologging Data
| Repository Type | Focus | Example Platforms | Key Characteristics |
|---|---|---|---|
| Domain-Specific | Biologging and animal movement data. | Movebank, Biologging intelligent Platform (BiP) | High level of community engagement; data curation specific to biologging; supports complex sensor data types [2] [5]. |
| Generalist | Data from any scientific discipline. | Zenodo, Dryad | Accepts data regardless of type or discipline; less specialized curation but broad visibility [5]. |
| Institutional | Data output from a specific institution. | University Data Repositories | Serves the institution's staff; preservation may be tied to the institution's lifespan [5]. |
| Project-Specific | Data from a specific large-scale project or collaboration. | AniBOS (Animal Borne Ocean Sensors) | Focused on a project's goals; enables data sharing within a defined scope and for reuse [2] [5]. |
Successful biologging research and data archiving rely on a suite of "research reagents"—both physical and digital.
Table 3: Essential Materials and Tools for Biologging Research
| Category | Item | Function |
|---|---|---|
| Hardware & Field Equipment | Satellite Relay Data Loggers (SRDLs) | Transmits compressed data (e.g., dive profiles, temperature) via satellite, eliminating need for recapture [2]. |
| | GPS & Argos Transmitters | Provides animal location data. |
| | Bio-Logging Tags (Accelerometers, Depth Sensors) | Records fine-scale behaviour (e.g., flipper strokes, dive profiles) and environmental data [2] [3]. |
| Software & Digital Tools | R / Python with Movement Ecology Packages | For processing, visualizing, and analyzing complex biologging data. |
| | Online Analytical Processing (OLAP) Tools | Integrated in platforms like BiP to calculate environmental (e.g., surface currents) or behavioural parameters from raw data [2]. |
| Data Standards & Vocabularies | Integrated Taxonomic Information System (ITIS) | Provides standardized taxonomic names, ensuring consistency in species identification across datasets [2]. |
| | Climate and Forecast (CF) Metadata Conventions | A standard for encoding metadata for earth science data, promoting interoperability [2]. |
| Data Platforms & Infrastructure | Biologging intelligent Platform (BiP) | An integrated platform for sharing, visualizing, and analyzing standardized biologging data [2]. |
| | Movebank | A global data repository for animal tracking data, supporting a wide range of taxa and sensor types [2]. |
Diagram 2: Biologging data archiving ecosystem
Biologging data archiving is a strategic imperative, not an administrative afterthought. It is the foundation for a sustainable and impactful biologging research ecosystem. By defining and implementing robust archiving practices—centered on standardization, rich metadata, and dedicated platform infrastructure—the scientific community can fully leverage the immense value of these unique datasets. This approach ensures that biologging data continues to drive scientific discovery, inform effective conservation policies, and preserve a digital legacy of life on Earth for generations to come. The vision of a globally integrated network, such as the Internet of Animals (IoA), where data flows seamlessly from animals to researchers to decision-makers, depends entirely on the foundations of a strong, collaborative archiving culture [3] [1].
The expanding field of biologging, which involves attaching data recorders to animals to monitor their behavior, physiology, and environment, is generating unprecedented amounts of critical data [2]. This data is invaluable not only for understanding animal ecology but also for secondary applications in oceanography, meteorology, and environmental science [3]. However, the full potential of this data can only be realized through systematic archiving and sharing approaches that ensure data accessibility, standardization, and interoperability. Effective data sharing practices are fundamental to collaborative research and biological conservation, enabling the mapping of animal distributions and movements essential for informed conservation strategies [2]. This application note outlines standardized protocols and platforms for biologging data management, facilitating broader collaboration and data reuse across scientific disciplines.
The following tables summarize key quantitative aspects and metadata requirements for effective biologging data sharing, enabling easy comparison of platform capabilities and data structuring principles.
Table 1: Key Platform Capabilities for Biologging Data Management
| Platform Name | Primary Function | Data Types Supported | Metadata Standards | Access Protocol |
|---|---|---|---|---|
| Biologging intelligent Platform (BiP) | Data storage, standardization, analysis, and sharing [2] | Sensor data (location, depth, temperature, acceleration), animal traits, deployment info [2] | ITIS, CF, ACDD, ISO [2] | CC BY 4.0 for open data; request required for private data [2] |
| protocols.io | Protocol sharing and peer review [6] | Step-by-step experimental and analytical methods [6] | Integrated submission systems (e.g., Nature journals) [6] | Free-to-read, open access CC-BY; private during peer review [6] |
| Movebank | Biologging data management [2] | Primarily location data (7.5 billion points as of 2025) [2] | Not specified in cited sources | Not specified in cited sources |
Table 2: Essential Metadata for Biologging Data Reusability
| Metadata Category | Specific Elements | Standardization Importance |
|---|---|---|
| Animal Traits | Sex, body size, breeding history [2] | Enables analysis of individual differences on movement and behavior [2] |
| Instrument Details | Device type, manufacturer, sensor specifications [2] | Ensures proper interpretation of data quality and limitations [2] |
| Deployment Information | Who, when, where, and how deployed [2] | Provides critical context for data interpretation and reuse [2] |
| Data Collection Parameters | Sampling rates, calibration information, data formats [2] | Facilitates data integration and comparative analyses [2] |
Background: Inconsistent data formats—such as different column names for the same sensor data, variations in date-time formats, and differing file types—create significant barriers to collaborative research and secondary use of biologging data [2]. The BiP platform addresses this challenge through a standardized submission process.
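The column-name problem described above can be addressed with an alias table that maps each known vendor header to one standard vocabulary. The vendor names below are illustrative assumptions, not BiP's actual mapping.

```python
# Hypothetical mapping from manufacturer-specific column names to a
# single standard vocabulary; the left-hand names are illustrative only.
COLUMN_ALIASES = {
    "Lat": "latitude", "LAT(deg)": "latitude", "lat_dd": "latitude",
    "Lon": "longitude", "LONG(deg)": "longitude",
    "Temp": "temperature_c", "water_temp_C": "temperature_c",
    "DateTime": "timestamp_utc", "GMT_time": "timestamp_utc",
}

def standardize_header(header):
    """Rename known vendor columns; pass unknown columns through unchanged."""
    return [COLUMN_ALIASES.get(col, col) for col in header]
```

Keeping unknown columns untouched (rather than dropping them) makes the step safe to run on files from new manufacturers, with unmapped names flagged for later review.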
Materials:
Procedure:
Metadata Preparation:
Data Standardization:
Data Upload:
Quality Control:
Publication:
Troubleshooting:
Background: Clear, detailed methodologies are essential for research reproducibility. Integrating protocol sharing into the publication process enhances transparency and enables method improvement [6].
Materials:
Procedure:
Platform Integration:
Version Management:
Publication and Linking:
Troubleshooting:
Table 3: Essential Tools for Biologging Data Management and Analysis
| Tool/Platform | Primary Function | Key Features | Application Context |
|---|---|---|---|
| Biologging intelligent Platform (BiP) | Integrated data storage, standardization, and analysis [2] | OLAP tools for environmental parameter calculation, standardized metadata, CC BY 4.0 licensing [2] | Cross-disciplinary research, environmental monitoring, collaborative studies [2] |
| protocols.io | Protocol development, sharing, and peer review [6] | Version history, collaborative editing, integration with journal submission systems [6] | Method documentation, research reproducibility, protocol peer review [6] |
| Movebank | Biologging data management [2] | Large-scale data storage (7.5+ billion location points), multi-taxa support [2] | Animal movement studies, migration analysis, distribution mapping [2] |
| AniBOS (Animal Borne Ocean Sensors) | Global ocean observation using animal-borne sensors [2] | Physical environmental data collection, integration with Argo float data [2] | Oceanographic research, climate studies, polar region monitoring [2] |
| Satellite Relay Data Loggers (SRDL) | Remote data collection and transmission from marine animals [2] | Data compression, satellite transmission, year+ operation [2] | Marine animal tracking, polar region oceanography, inaccessible area monitoring [2] |
Biologging research, which uses animal-borne electronic tags to collect data on movement, behavior, physiology, and environment, generates complex datasets that intersect with evolving data protection regulations. For researchers, scientists, and drug development professionals, navigating the compliance landscape is essential for both scientific integrity and legal adherence. The International Bio-Logging Society emphasizes that proper data management maximizes scientific insight and conservation benefits while minimizing impacts on study subjects [7]. Furthermore, funding agencies like the National Institutes of Health (NIH) now require researchers to plan for data management and sharing, submitting formal Data Management and Sharing Plans (DMSPs) with grant applications [8]. This document provides application notes and protocols to align biologging data archiving and sharing with current regulatory requirements, including FAIR principles (Findable, Accessible, Interoperable, Reusable) and specific data retention mandates emerging across U.S. states and federal agencies [9].
The regulatory environment for data protection is rapidly evolving, with significant implications for biologging data that may involve human researchers, location data, or sensitive species information. Several key laws and principles govern this space:
Recent state laws have introduced specific requirements for data protection assessments and documentation:
Table 1: Selected 2025 U.S. State Data Protection Assessment Requirements
| State/Jurisdiction | Requirement | Effective Date | Key Provisions |
|---|---|---|---|
| Delaware | Data Protection Assessment | July 1, 2025 | Required for processing activities presenting heightened risk of harm, including sensitive data processing, targeted advertising, and profiling [12]. |
| Colorado | Biometric Identifiers Policy | July 1, 2025 | Written policy required including retention schedules, data incident response, and data deletion procedures [12]. |
| Department of Justice | Bulk Sensitive Data Rule | July 8, 2025 | Restrictions on transfer of sensitive data to certain countries; requires demonstration of "good faith efforts" to comply [12]. |
Objective: To create standardized biologging datasets that comply with interoperability requirements and facilitate regulatory compliance.
Materials:
Procedure:
Apply standardization formats:
Implement quality control checks:
Upload to standardized platform:
Document the standardization process for compliance reporting, noting any data transformations, quality issues, or exceptions to standard protocols.
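One lightweight way to document transformations for compliance reporting is an append-only provenance log kept beside the dataset. This sketch is our own illustration (the class name and fields are assumptions, not a BiP or Movebank API); it records each processing step with a UTC timestamp.

```python
import json
from datetime import datetime, timezone

class ProvenanceLog:
    """Append-only record of transformations applied to a dataset,
    stored alongside the data for compliance reporting."""

    def __init__(self, dataset_id):
        self.dataset_id = dataset_id
        self.entries = []

    def record(self, step, detail):
        # Each entry captures what was done, to which dataset, and when.
        self.entries.append({
            "dataset": self.dataset_id,
            "step": step,
            "detail": detail,
            "logged_at": datetime.now(timezone.utc).isoformat(),
        })

    def to_json(self):
        return json.dumps(self.entries, indent=2)

# Hypothetical usage for an example deployment:
log = ProvenanceLog("example-deployment-001")
log.record("rename_columns", "mapped vendor headers to standard vocabulary")
log.record("speed_filter", "removed 12 fixes exceeding 50 km/h threshold")
```

Serializing the log as JSON makes it both human-readable for auditors and machine-readable for automated compliance checks.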
Objective: To conduct and document formal Data Protection Assessments (DPA) as required by state privacy laws for processing activities that present heightened risks.
Materials:
Procedure:
Determine assessment requirements:
Conduct risk assessment:
Identify mitigation measures:
Document the assessment:
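The "determine assessment requirements" step above can be sketched as a simple trigger check. The trigger list below paraphrases the heightened-risk categories cited in Table 1 and is purely illustrative, not legal advice.

```python
# Illustrative screening helper; the trigger set paraphrases the
# heightened-risk categories referenced in the state laws above and is
# an assumption, not an authoritative legal checklist.
HEIGHTENED_RISK_TRIGGERS = {
    "sensitive_data",         # e.g. precise location data about people
    "targeted_advertising",
    "profiling",
    "biometric_identifiers",
}

def assessment_required(processing_activities):
    """Return the sorted subset of activities that trigger a formal DPA."""
    return sorted(set(processing_activities) & HEIGHTENED_RISK_TRIGGERS)
```

An empty result means no formal assessment is triggered by this screen, though institutional policy may still require documentation of that determination.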
The following diagram illustrates the integrated workflow for managing biologging data in compliance with regulatory requirements:
Compliant Biologging Data Workflow: This diagram outlines the pathway from raw data collection to compliant archiving and sharing, showing key stages where regulatory requirements influence data management decisions.
Table 2: Essential Tools and Platforms for Biologging Data Compliance
| Tool/Platform | Function | Compliance Application |
|---|---|---|
| Biologging intelligent Platform (BiP) | Integrated platform for sharing, visualizing, and analyzing biologging data [2]. | Standardizes data formats and metadata following international conventions; enables controlled data sharing with permission workflows. |
| Movebank | Global repository for animal tracking data with over 7.5 billion location points [10]. | Supports data preservation through public archiving; facilitates FAIR compliance through standardized data protocols. |
| movepub (R package) | Software tool to prepare Movebank data for publication [13]. | Automates data standardization and documentation processes to meet reproducibility requirements. |
| ETN R package | Access data from the European Tracking Network [13]. | Enables interoperable data reuse across research networks while maintaining data integrity. |
| ESS-DIVE Repository | Department of Energy repository for environmental research data [9]. | Provides long-term stewardship for modeling data; implements FAIR principles for terrestrial and biologging data. |
| Data Protection Assessment Templates | Structured frameworks for evaluating data processing risks [12]. | Documents compliance with state privacy laws (DE, CO); demonstrates due diligence for sensitive data processing. |
Navigating data retention and regulatory requirements in biologging research requires both technical solutions and institutional commitment. By implementing the standardized protocols, visualization workflows, and toolkits outlined in these application notes, researchers can build compliance into their data management lifecycle rather than treating it as an afterthought. The International Bio-Logging Society's Data Standardisation Working Group continues to develop community standards that both advance scientific goals and address regulatory requirements [13]. As data protection laws continue to evolve, establishing these foundational practices will position biologging researchers to adapt efficiently to new requirements while maximizing the scientific and conservation value of their data.
Bio-logging involves the use of animal-borne electronic tags, or "bio-loggers," to record data about an animal's movements, behavior, physiology, and the environment it experiences [14]. The rapid growth of this field is generating unprecedented volumes of data, offering profound opportunities for ecological research, biodiversity conservation, and environmental monitoring [14] [2] [15]. This application note establishes a standardized protocol for managing the complete lifecycle of bio-logging data, treating these data streams as dynamic digital archives of animal life on Earth [14] [16]. Adhering to this lifecycle is critical for ensuring the long-term scientific value, accessibility, and ethical reuse of these complex datasets.
The bio-logging data lifecycle encompasses a series of interconnected stages, from strategic planning and data collection to final archiving and reuse. The following workflow diagram visualizes this comprehensive process and the logical relationships between its key stages.
Effective data management begins before any device is deployed. This stage focuses on defining research objectives and selecting appropriate methodologies to ensure the collection of high-quality, ethically-sound data.
Objective: To define research goals, obtain necessary permits, and select appropriate bio-logging devices and attachment methods that minimize impact on animal welfare [14].
Materials:
Methodology:
Table 1: Key tools and platforms for bio-logging data collection and management.
| Item Name | Type/Model Examples | Function & Application |
|---|---|---|
| Satellite Relay Data Loggers (SRDL) | SMRU Instrumentation SRDLs | Transmit compressed data (e.g., dive profiles, temperature) via satellite; ideal for marine mammals in remote regions [2]. |
| GPS & Accelerometer Tags | Various manufacturers (e.g., TechnoSmart, Ornitela) | Record high-resolution location and tri-axial acceleration data for reconstructing movement paths and classifying behavior [15]. |
| Archival Tags (Data Loggers) | Lotek LAT, Star-Oddi DST | Store data internally for later retrieval; used when high-resolution data is needed and animal recapture is feasible [2]. |
| Bio-Logging Data Platforms | Movebank, Biologging intelligent Platform (BiP), WRAM | Web-based infrastructures for managing, storing, standardizing, and sharing bio-logging data during and after collection [14] [2]. |
| Time-Series Analysis Software | R Statistical Environment (packages: nlme, lme4, glmmTMB) | Statistical toolkits for modeling autocorrelated physiological and movement data from bio-loggers [15]. |
Raw data from bio-logging devices are often not analysis-ready. This stage involves transforming raw data into a structured, standardized, and annotated format.
Objective: To clean, calibrate, and annotate raw sensor data with comprehensive metadata, creating a standardized dataset ready for ecological analysis or integration with larger data collections [2].
Materials:
Methodology:
This stage involves applying statistical and computational models to extract biological and environmental insights from the standardized data.
Objective: To analyze bio-logging time-series data (e.g., heart rate, depth, acceleration) using appropriate statistical models that account for temporal autocorrelation and complex data structures [15].
Materials:
Methodology:
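Before fitting models such as those in nlme or glmmTMB, it helps to quantify how strongly a series is autocorrelated, since positive autocorrelation inflates the apparent sample size. A minimal pure-Python sketch computes the lag-1 autocorrelation and the approximate effective sample size under an AR(1) assumption:

```python
def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a time series (list of floats)."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def effective_sample_size(n, rho):
    """Approximate ESS for an AR(1) process: n * (1 - rho) / (1 + rho)."""
    return n * (1 - rho) / (1 + rho)
```

For a smoothly varying signal like heart rate sampled every second, rho can approach 1, so a series of thousands of points may carry the statistical information of only a few dozen independent observations, which is precisely why the mixed-model tools above are needed.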
Effective visualization is key to interpreting and communicating results. The following table summarizes best practices for colorizing biological data visualizations.
Table 2: Color application rules for biological data visualization based on data type [17] [18].
| Data Type (Measurement Scale) | Description & Examples | Recommended Color Scheme | Rationale & Tips |
|---|---|---|---|
| Nominal/Categorical | Categories without order (e.g., species, behavior type). | Qualitative (distinct hues). | Use distinct colors with easily named hues (e.g., Red, Blue, Green). Avoid fine hue differentiation. |
| Ordinal | Categories with order but unknown intervals (e.g., low, medium, high). | Sequential (light to dark). | Lightness (Luminance) should increase or decrease with the order. Use a single hue or small set of adjacent hues. |
| Interval/Ratio (Quantitative) | Numerical values with meaningful intervals (e.g., temperature, depth, speed). | Sequential or Diverging. | Sequential: For data from low to high. Diverging: To highlight deviation from a neutral mid-point (e.g., Purple-Green scheme) [18]. |
| Guidelines for All Types | | | Check for Deficiencies: Use tools (e.g., Coblis) to simulate Protanopia, Deuteranopia, Tritanopia. Ensure Contrast: Text and symbols must have high contrast against background colors. Use Perceptually Uniform Spaces: Prefer HCL or CIE L*a*b* over RGB [17]. |
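The rules in Table 2 can be encoded as a small lookup for a plotting pipeline. The palette names below are common matplotlib colormaps chosen as plausible examples of each scheme type; they are assumptions, not prescriptions from the cited guidelines.

```python
# Illustrative encoding of Table 2's color rules. The colormap names
# (tab10, Blues, viridis, PRGn) are common matplotlib palettes and are
# assumptions, not mandated by the cited sources.
SCHEME_BY_SCALE = {
    "nominal": ("qualitative", "tab10"),
    "ordinal": ("sequential", "Blues"),
    "quantitative": ("sequential", "viridis"),
    "quantitative_diverging": ("diverging", "PRGn"),  # purple-green
}

def recommend_scheme(scale, diverging=False):
    """Return (scheme_type, palette_name) for a measurement scale."""
    key = scale
    if scale == "quantitative" and diverging:
        key = "quantitative_diverging"
    return SCHEME_BY_SCALE[key]
```

Centralizing the choice in one function makes it easy to enforce the table's rules (and a color-deficiency check) across every figure in a project.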
Preserving data in a public, trusted archive ensures its long-term value and supports scientific reproducibility and collaboration.
Objective: To deposit curated data and metadata into a suitable long-term repository, making it Findable, Accessible, Interoperable, and Reusable (FAIR) while respecting the CARE principles for Indigenous data governance [14] [19].
Materials:
Methodology:
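A deposit's dataset-level metadata might be assembled as below. The record loosely follows DataCite-style fields commonly used for DOI-minting repositories, and every value, including the DOI, is a placeholder.

```python
# Illustrative deposit record loosely modeled on DataCite-style fields;
# the exact schema is dictated by the chosen repository, and all values
# here (including the DOI) are placeholders.
deposit = {
    "title": "GPS tracks of example seabird colony, 2024",
    "creators": [{"name": "Example, A."}],
    "publisher": "Example Data Repository",
    "publicationYear": 2024,
    "resourceType": "Dataset",
    "rights": "CC-BY-4.0",
    "relatedIdentifiers": [
        # Links the dataset to the paper it supports, a key FAIR practice.
        {"relationType": "IsSupplementTo", "identifier": "doi:10.xxxx/example"}
    ],
}
```

Explicitly recording the license and the related publication is what makes the deposit citable and legally reusable by secondary users.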
The future of bio-logging relies on integrated data collections that function as dynamic archives. Platforms like Movebank and BiP are leading this effort by providing standardized tools for the entire data lifecycle [14] [2]. BiP's unique integration of Online Analytical Processing (OLAP) tools allows users to calculate environmental parameters, such as surface currents and ocean winds, directly from animal movement data, showcasing the potential for cross-disciplinary secondary use [2].
A critical next step is the establishment of global governance, such as the community-led coordinating body proposed by the International Bio-logging Society, to oversee data standards and ensure the sustainable preservation of these invaluable digital archives of animal life [14] [13]. Widespread adoption of these protocols by researchers, coupled with support from funders, publishers, and data repositories, is essential to fully realize the potential of bio-logging data for addressing fundamental ecological questions and mitigating biodiversity threats.
Biologging, the practice of equipping animals with electronic data loggers, has transcended its origins in behavioral ecology to become a critical source of high-resolution environmental data. This article outlines standardized protocols for repurposing these animal-borne sensor data for applications in oceanography, meteorology, and biomedicine. Framed within the urgent need for robust archiving and sharing approaches, we detail methodologies for data collection, processing, and integration, providing a framework for maximizing the scientific return from biologging investments.
Initially developed to monitor the behavior and physiology of wild animals in their natural habitats, biologging is a Lagrangian observation method that moves with the animal, providing a unique mobile platform for data collection [3]. The field has evolved from basic tracking to the deployment of sophisticated suites of sensors that measure a host of environmental parameters. The resulting data, when properly archived and shared, present a transformative opportunity for secondary use in disparate scientific disciplines. These secondary uses are contingent upon the establishment of FAIR (Findable, Accessible, Interoperable, and Reusable) data principles and community-wide adoption of standardized protocols [10]. This document provides application notes and experimental protocols for leveraging biologging data, emphasizing its role in a broader data-sharing ecosystem essential for addressing global scientific challenges.
The secondary use of biologging data follows a structured pipeline, from sensor deployment on animals to the final integration of validated data into cross-disciplinary models.
Objective: To collect in-situ oceanographic data (e.g., temperature, salinity) from marine animals to complement traditional ocean observation systems like Argo floats.
Step 1: Sensor Selection and Calibration
Step 2: Animal Deployment
Step 3: Data Transmission and Processing
Step 4: Data Integration
Objective: To derive estimates of ocean surface winds, currents, and waves by analyzing the movement patterns of soaring seabirds.
Step 1: High-Frequency Movement Data Collection
Step 2: Movement Trajectory Analysis
Step 3: Environmental Parameter Estimation
Step 4: Data Validation and Sharing
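A building block of Step 2's trajectory analysis is estimating the bird's ground-velocity vector from consecutive GPS fixes; subtracting an independent airspeed estimate then yields a wind proxy. This sketch uses a local equirectangular approximation, which is adequate for short segments away from the poles.

```python
import math

EARTH_R = 6371000.0  # mean Earth radius in metres

def ground_velocity(fix1, fix2):
    """East/north ground-velocity components (m/s) between two GPS fixes.

    Each fix is (timestamp_s, lat_deg, lon_deg). Uses a local
    equirectangular approximation, adequate for short track segments.
    """
    t1, lat1, lon1 = fix1
    t2, lat2, lon2 = fix2
    dt = t2 - t1
    lat_mid = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(lat_mid) * EARTH_R  # eastward, m
    dy = math.radians(lat2 - lat1) * EARTH_R                      # northward, m
    return dx / dt, dy / dt
```

For fixes 100 s apart, a longitude change of 0.001° at the equator corresponds to roughly 1.1 m/s of eastward ground speed; aggregating such vectors over many soaring cycles is the basis of the wind-estimation methods described above.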
Objective: To integrate animal tracking and trait data for ecological forecasting and to develop models with potential translational applications in biomedicine.
Step 1: Multi-Source Data Aggregation
Step 2: Data Curation and Standardization
Step 3: Integrated Analysis for Hypothesis Testing
Step 4: Application in Human Health
The value of biologging data for secondary use is fully realized only through a robust and standardized data management pipeline.
The following diagram illustrates the integrated pipeline from data collection to secondary use, highlighting the critical role of archiving and standardization.
Table 1: Key data types collected via biologging and their relevance to secondary research disciplines.
| Data Type | Sensor(s) | Primary Use | Secondary Use & Discipline |
|---|---|---|---|
| Depth / Pressure | Pressure Sensor | Diving behavior | Oceanographic profile data (Oceanography) |
| Water Temperature | Thermistor | Thermal niche use | Sea surface & subsurface temperature maps (Oceanography) |
| Salinity | Conductivity Cell | Habitat preference | Ocean salinity models (Oceanography) |
| High-Res Location | GPS | Home range, migration | Derivation of surface currents & winds (Meteorology) |
| Acceleration | Accelerometer | Behavior identification, energetics | Model validation; Biomechanical studies (Biomedicine) |
| Body Temperature | Thermistor | Physiology, health | Fever response models (Biomedicine) |
Table 2: Platforms for archiving and sharing biologging data to enable secondary use.
| Platform Name | Primary Focus | Key Feature for Secondary Use | Data License |
|---|---|---|---|
| Biologging intelligent Platform (BiP) | Integrated, multi-sensor data | Online Analytical Processing (OLAP) for environmental parameter estimation [2] | CC BY 4.0 |
| Movebank | Animal tracking | Large-scale data aggregation; Linkage to environmental layers [10] | Varies by owner |
| AniBOS (Animal Borne Ocean Sensors) | Oceanographic data | Standardizes & delivers biologging data into the Global Ocean Observing System [2] | Follows GOOS policy |
Table 3: Essential tools, platforms, and reagents for biologging data research and secondary use.
| Item / Solution | Type | Function in Research |
|---|---|---|
| Satellite Relay Data Logger (SRDL) | Hardware | Transmits compressed sensor data (depth, temperature, etc.) via satellite, enabling long-term, remote data collection [2]. |
| Biologging intelligent Platform (BiP) | Data Platform | Stores, standardizes, and analyzes biologging data with integrated OLAP tools to estimate environmental parameters [2]. |
| Movebank | Data Platform | A global repository for animal tracking data that facilitates data management, sharing, and analysis [10] [21]. |
| vol2bird Algorithm | Software Algorithm | Processes weather radar data to extract biological signals, generating vertical profiles of bird density, speed, and direction [23]. |
| FAIR Guiding Principles | Data Framework | Ensures data are Findable, Accessible, Interoperable, and Reusable, which is critical for data preservation and secondary use [10]. |
| Integrated Bio-logging Framework (IBF) | Methodological Framework | Aids researchers in designing biologging studies by optimizing the links between biological questions, sensors, data, and analysis [21]. |
Biologging data are a powerful and growing resource for secondary research, turning animals into intelligent, mobile sensors of our changing planet. The full potential of this data to revolutionize oceanography, meteorology, and biomedicine will only be unlocked through a concerted, global effort to prioritize standardized archiving and open, ethical data sharing. The protocols and platforms outlined here provide a roadmap for researchers to contribute to and draw from this invaluable, expanding horizon of scientific data.
The expansion of bio-logging technologies has generated unprecedented volumes of data on animal movement, behavior, and physiology. Effective management and sharing of these complex datasets present significant challenges for researchers, requiring robust archival solutions that ensure data persistence, accessibility, and interoperability. This application note provides a comparative analysis of three major platforms—GBIF (Global Biodiversity Information Facility), Movebank, and the Movebank Data Repository—to guide researchers in selecting appropriate archives for their specific data types and research objectives. Framed within a broader thesis on archiving approaches for bio-logging research, we detail specific workflows for mobilizing data between these platforms to maximize scientific value and compliance with evolving data policies.
GBIF serves as a global data infrastructure for biodiversity occurrence data, integrating species observations from specimens, human observations, and increasingly, machine-generated sources [24]. Its core mission centers on providing free and open access to biodiversity data to support research, policy, and conservation [25]. In contrast, Movebank is a specialized platform for managing, visualizing, and sharing animal tracking and other bio-logging sensor data, built on a flexible data model designed to accommodate diverse tracking technologies and taxa [26]. The Movebank Data Repository (MDR) is a public archive integrated with Movebank that provides long-term preservation and formal publication of curated tracking datasets with persistent identifiers [27].
Table 1: Core Characteristics of Major Bio-Logging Data Archives
| Feature | GBIF | Movebank | Movebank Data Repository |
|---|---|---|---|
| Primary Scope | Global biodiversity species occurrence data [28] | Animal tracking & bio-logging sensor data [26] | Published, curated animal movement & bio-logging data [27] |
| Core Data Model | Darwin Core (DwC) [28] | Movebank Data Model [26] | Movebank Data Model + DwC for integration [29] |
| Data Publication | Dataset publication via IPT, DOI assignment [25] | User-managed studies, controlled sharing options | Formal deposition, curation, and DOI assignment [27] |
| Licensing | Creative Commons (CC0, BY, BY-NC) | Various, user-controlled | Creative Commons (CC0, BY, BY-NC) [27] |
| Key Strength | Unified search for biodiversity data, extensive use in policy & research [25] | Specialist tools for complex movement data, active research collaboration | Data preservation, formal citation, fulfilling journal/funder mandates [27] |
Table 2: Quantitative Platform Metrics (Based on Available Data)
| Metric | GBIF | Movebank | Movebank Data Repository |
|---|---|---|---|
| Data Volume | ~3 billion occurrence records [25] | >4 billion location records, >7,500 studies [29] | Subset of Movebank studies submitted for publication |
| Taxonomic Coverage | All taxa (specimens, observations) [25] | >1,252 taxa [29] | Varies, primarily animal tracking data |
| Publisher Workflow | Integrated Publishing Toolkit (IPT) [25] | Movebank website & API | Submission via Movebank, followed by curation [27] |
A critical development in bio-logging data management is the creation of workflows to mobilize data from research platforms like Movebank to global infrastructures like GBIF. This enables movement data to function as species occurrence records, significantly broadening their discoverability and utility for biodiversity modeling, conservation assessments, and policy [30]. The MOVE2GBIF project established a foundational, open-source workflow for this purpose, using an R package (movepub) to transform data formatted in the Movebank model into the Darwin Core standard required by GBIF [29] [31].
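The essence of this transformation can be sketched without the movepub package itself; the snippet below is an illustrative Python analogue (the column names and the occurrenceID scheme are assumptions, not the movepub implementation):

```python
import csv
import io

# Illustrative Movebank-style column -> Darwin Core term mapping
# (assumed names, not the authoritative movepub mapping).
MOVEBANK_TO_DWC = {
    "individual-local-identifier": "organismID",
    "timestamp": "eventDate",
    "location-lat": "decimalLatitude",
    "location-long": "decimalLongitude",
}

def movebank_to_dwc(movebank_csv: str, dataset_id: str) -> list[dict]:
    """Turn Movebank-style rows into Darwin Core occurrence records."""
    records = []
    for i, row in enumerate(csv.DictReader(io.StringIO(movebank_csv))):
        rec = {dwc: row[mb] for mb, dwc in MOVEBANK_TO_DWC.items()}
        # Each location fix becomes one occurrence with a unique identifier.
        rec["occurrenceID"] = f"{dataset_id}:occ:{i}"
        rec["basisOfRecord"] = "MachineObservation"
        records.append(rec)
    return records

raw = (
    "individual-local-identifier,timestamp,location-lat,location-long\n"
    "gull_01,2024-05-01T06:00:00Z,53.1017,4.8890\n"
    "gull_01,2024-05-01T06:30:00Z,53.1120,4.9012\n"
)
occurrences = movebank_to_dwc(raw, "demo-dataset")
print(occurrences[0]["occurrenceID"], occurrences[0]["decimalLatitude"])
```

In practice the movepub package also generates the EML metadata required for GBIF publication; this sketch covers only the record-level term mapping.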
Key technical considerations for this data transformation include:
- Source field identifiers (e.g., animal_id, tag_id) must be mapped to appropriate Darwin Core terms (e.g., organismID, occurrenceID) [29].

The diagram below illustrates the primary data flow models for publishing bio-logging data to GBIF, as demonstrated by the MOVE2GBIF project and analogous efforts for camera trap data.
Data Publishing Workflow Models. This diagram outlines multiple pathways for publishing bio-logging data from primary research platforms to GBIF, including direct archival and transformed publication.
This protocol is adapted from the established MOVE2GBIF workflow [29] [30]. It allows researchers to make their tracking data discoverable alongside other biodiversity records while maintaining a primary, rich dataset in a specialist repository.
I. Pre-publication Data Preparation on Movebank
II. Data Transformation to Darwin Core
1. Install the movepub R package from its GitHub repository (github.com/inbo/movepub) [29].
2. Map animal_id to organismID.
3. Map timestamp and location_lat/location_long to eventDate and decimalLatitude/decimalLongitude.

III. Publication and Registration
The following table details key technologies, standards, and software solutions essential for conducting bio-logging research and effectively archiving the resulting data.
Table 3: Key Research Reagents and Solutions for Bio-Logging Data Management
| Item Name | Function/Application | Specific Examples / Properties |
|---|---|---|
| GPS Loggers | Records fine-scale location data over time. | University of Amsterdam Bird Tracking System (UvA-BITS) used in oystercatcher studies [30]. |
| Movebank Data Model | Standardized vocabulary to describe and structure animal tracking data from diverse sources. | Defines core concepts: Animal, Tag, Deployment, Study, Event, Location [26]. |
| Darwin Core (DwC) | Standardized vocabulary for sharing biodiversity data, essential for GBIF integration. | Includes terms: organismID, eventDate, decimalLatitude, decimalLongitude [28]. |
| movepub R Package | Open-source software to automate transformation of Movebank data to Darwin Core. | Developed in MOVE2GBIF project; prepares data and EML metadata for GBIF publication [29]. |
| Integrated Publishing Toolkit (IPT) | Software application for publishing and registering biodiversity datasets to GBIF. | Hosted by institutions; used to upload Darwin Core Archives and metadata for GBIF harvesting [25]. |
| Camtrap DP | Data standard for camera trap data, enabling sharing through platforms like GBIF. | Used to transform data from platforms like Agouti into Darwin Core [25]. |
| Creative Commons Licenses | Legal tools to grant public permission to share and use data and creative work. | CC0 (public domain), CC BY (attribution), CC BY-NC (attribution, non-commercial) [27]. |
GBIF, Movebank, and the Movebank Data Repository are complementary pillars in the bio-logging data ecosystem. Movebank excels as an active research platform for complex data management, while its Data Repository provides critical preservation and formal citation services. GBIF serves as a powerful engine for data discovery and reuse across the broader biodiversity community. The emerging workflows that connect these platforms, such as the MOVE2GBIF model, represent a significant advancement in open science. They empower researchers to leverage the strengths of each platform, ensuring that valuable bio-logging data contributes fully to scientific discovery, conservation policy, and the global effort to understand and protect biodiversity.
The burgeoning field of biologging, which uses animal-borne sensors to collect data on movement, behavior, physiology, and the environment, faces a critical juncture. The volume and complexity of data collected have grown exponentially, creating both unprecedented research opportunities and significant data management challenges. Without systematic approaches to data organization and sharing, much of this valuable data becomes rapidly lost to science, undermining research reproducibility, hindering collaborative synthesis, and limiting the potential for secondary applications in fields such as oceanography, meteorology, and conservation science [14]. The power of standardization lies in its ability to transform disparate, idiosyncratic datasets into interoperable, reusable resources that can fuel discovery across disciplinary boundaries.
Adopting established formats like Darwin Core and community-defined protocols addresses these challenges by providing a common framework for data description, sharing, and archiving. Darwin Core, a widely adopted standard for biodiversity data, offers a stable, straightforward, and flexible framework for compiling data from varied sources [32] [33]. Concurrently, the biologging community is developing and refining its own specialized standards to address the unique complexities of sensor data, animal metadata, and deployment information [13] [2]. This article provides detailed application notes and protocols for implementing these standards, framed within a broader thesis on effective archiving and sharing approaches for biologging data research, to empower researchers, scientists, and data managers to harness the full potential of their data.
Darwin Core (DwC) is a standard maintained by Biodiversity Information Standards (TDWG). It consists primarily of a glossary of terms intended to facilitate the sharing of information about biological diversity. These terms provide identifiers, labels, and definitions and are primarily based on taxa, their occurrence in nature as documented by observations, specimens, and related information [32]. Its primary strength is in enabling interoperability across the broader biodiversity informatics domain.
For biologging data, Darwin Core provides the essential framework for describing the "who" and "where" components—the taxonomic identity of the tracked animal and the spatiotemporal context of its occurrences. Key term classes include:
- Taxon (e.g., dwc:taxonID, dwc:scientificName)
- Occurrence (e.g., dwc:occurrenceID, dwc:recordedBy)
- Event and Location (e.g., dwc:eventID, dwc:eventDate, dwc:decimalLatitude, dwc:decimalLongitude) [32] [34]

The majority of datasets shared through global infrastructures like the Global Biodiversity Information Facility (GBIF) are published using the Darwin Core Archive (DwC-A) format, which packages data and metadata into a standardized, easily shareable folder structure [33].
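As a concrete illustration of these term classes, a single tracking fix might be recorded as follows (all values are invented placeholders):

```python
# One biologging location fix expressed with Darwin Core terms, grouped by
# term class. All values are invented placeholders for illustration.
occurrence = {
    # Taxon class
    "dwc:scientificName": "Larus fuscus",
    # Occurrence class
    "dwc:occurrenceID": "urn:example:occ:0001",
    "dwc:basisOfRecord": "MachineObservation",
    # Event and Location classes
    "dwc:eventDate": "2024-05-01T06:00:00Z",
    "dwc:decimalLatitude": 53.1017,
    "dwc:decimalLongitude": 4.8890,
}
print(len(occurrence), "Darwin Core terms")
```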
While Darwin Core covers basic biodiversity concepts, the specific nature of biologging data requires more specialized descriptive protocols. A community-driven standardization framework has been proposed to advance ecological research and conservation by making bio-logging data Findable, Accessible, Interoperable, and Reusable (FAIR) [13] [14]. This framework encompasses standardized sensor data formats, animal metadata, and deployment information.
Platforms like the Biologging intelligent Platform (BiP) implement this vision by storing sensor data alongside detailed, standardized metadata. BiP's metadata schema conforms to international standards like the Integrated Taxonomic Information System (ITIS), Climate and Forecast Metadata Conventions (CF), and ISO standards, ensuring broad interoperability beyond biology [2].
Table 1: Core Standards for Biologging Data Archiving
| Standard Name | Primary Scope | Key Applicability to Biologging | Governing Body/Platform |
|---|---|---|---|
| Darwin Core (DwC) [32] | Biodiversity data exchange | Describing taxonomic identity (dwc:taxonID), occurrence events (dwc:occurrenceID), and basic spatiotemporal parameters (dwc:decimalLatitude, dwc:eventDate). | Biodiversity Information Standards (TDWG) |
| Darwin Core Archive (DwC-A) [33] | Data packaging and publication | A ready-to-publish package containing core data, extension data, and metadata (EML), ideal for sharing occurrence data derived from tracking. | GBIF / TDWG |
| Biologging Standardization Framework [14] | Sensor data and metadata | A comprehensive framework for standardizing sensor data formats, animal metadata, and deployment information to enable data integration. | International Bio-logging Society |
| BiP Metadata Schema [2] | Multi-disciplinary data interoperability | Standardizing metadata for animal traits, instrument details, and deployment context using ITIS, CF, and ISO conventions. | Biologging intelligent Platform (BiP) |
This protocol outlines a step-by-step process for preparing a biologging dataset for public archiving, integrating both Darwin Core and community-specific standards. The workflow ensures data is structured, documented, and packaged for maximum usability and long-term preservation.
Objective: To transform raw, researcher-collected biologging data into a standardized, archived, and FAIR-compliant dataset.
Step 1: Data Audit and Reorganization
- Use a shallow, predictable directory structure with descriptive file names: instead of Project/Data/2024/Site_A/Final/Processed/seal453_tracks_final_v2.csv, use data/raw_tracks/seal_453.csv.

Step 2: Sensor Data Standardization
- Standardize column names across files (e.g., use latitude and longitude consistently).
- Convert all timestamps to ISO 8601 format in UTC (e.g., 2024-11-21T14:30:00Z).

Step 3: Mapping to Darwin Core Terms
- occurrenceID: A unique identifier for each location fix.
- scientificName: The full scientific name of the tracked animal.
- eventID: An identifier linking to the tracking event.
- eventDate: The timestamp of the location fix.
- decimalLatitude & decimalLongitude: The geographic coordinates.
- basisOfRecord: MachineObservation.

Table 2: Example Mapping of Sensor Data to Darwin Core
| Original Sensor Data Column | Standardized Name | Darwin Core Term | Example Value |
|---|---|---|---|
| animal_id | tagLocalIdentifier | (Extension) | Seal_453 |
| lat | decimalLatitude | dwc:decimalLatitude | -67.12345 |
| lon | decimalLongitude | dwc:decimalLongitude | 142.67890 |
| timestamp_utc | eventDate | dwc:eventDate | 2024-02-01T03:45:12Z |
| LCC4.2 | scientificName | dwc:scientificName | Leptonychotes weddellii |
| - | occurrenceID | dwc:occurrenceID | https://ipt.biodiversity.org/occurrence/12345 |
| - | basisOfRecord | dwc:basisOfRecord | MachineObservation |
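The column-level mapping in Table 2 can be sketched programmatically; the helper below is illustrative only (resolving the species from a tag code and the identifier scheme are assumptions):

```python
# Rename raw sensor columns to Darwin Core terms, following Table 2.
COLUMN_MAP = {
    "lat": "dwc:decimalLatitude",
    "lon": "dwc:decimalLongitude",
    "timestamp_utc": "dwc:eventDate",
}

def to_darwin_core(row: dict, occurrence_id: str, scientific_name: str) -> dict:
    # animal_id is intentionally not mapped to a core term: per Table 2 it
    # belongs in an extension (tagLocalIdentifier).
    dwc = {COLUMN_MAP[k]: v for k, v in row.items() if k in COLUMN_MAP}
    dwc["dwc:scientificName"] = scientific_name  # e.g. resolved from tag code LCC4.2
    dwc["dwc:occurrenceID"] = occurrence_id
    dwc["dwc:basisOfRecord"] = "MachineObservation"
    return dwc

fix = {"animal_id": "Seal_453", "lat": -67.12345, "lon": 142.67890,
       "timestamp_utc": "2024-02-01T03:45:12Z"}
record = to_darwin_core(fix, "occ-12345", "Leptonychotes weddellii")
print(record["dwc:decimalLatitude"], record["dwc:eventDate"])
```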
Step 4: Compile Extended Metadata
- Animal metadata: individualID, taxonID, sex, lifeStage, and bodyLength.
- Instrument metadata: deviceID, deviceType, sensorType, and accuracy.
- Deployment metadata: deploymentID, deploymentDateTime, retrievalDateTime, attachmentMethod, and locationOfDeployment.

Step 5: Create Human-Readable Documentation
- Write a README.txt file and a data dictionary.
- The README should describe the project, the experimental design, file structure, and any quality flags or caveats.

Step 6: Package as a Darwin Core Archive
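A minimal sketch of this packaging step, assuming the occurrence table and metadata documents have already been prepared (a production archive requires a complete meta.xml column-mapping descriptor; the placeholder contents here are illustrative):

```python
import zipfile

# Build a minimal Darwin Core Archive: a zip containing the core occurrence
# table, a meta.xml descriptor, and EML metadata, following DwC-A file-naming
# convention. The XML contents below are placeholders, not valid descriptors.
def build_dwca(path: str, occurrence_txt: str, meta_xml: str, eml_xml: str) -> None:
    with zipfile.ZipFile(path, "w") as z:
        z.writestr("occurrence.txt", occurrence_txt)
        z.writestr("meta.xml", meta_xml)
        z.writestr("eml.xml", eml_xml)

build_dwca(
    "archive.zip",
    "occurrenceID\teventDate\tdecimalLatitude\tdecimalLongitude\n"
    "occ-1\t2024-02-01T03:45:12Z\t-67.12345\t142.67890\n",
    "<archive><!-- column-to-term mapping goes here --></archive>",
    "<eml><!-- dataset metadata goes here --></eml>",
)
print(sorted(zipfile.ZipFile("archive.zip").namelist()))
```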
Step 7: Deposit in a Public Repository
Successful standardization and archiving rely on a suite of digital "reagents" and platforms. The following table details key resources that constitute the modern biologging data scientist's toolkit.
Table 3: Essential Toolkit for Biologging Data Standardization and Archiving
| Tool/Resource Name | Type | Function & Application |
|---|---|---|
| Darwin Core Terms Guide [34] | Reference Guide | Provides the definitive list and definitions of all Darwin Core terms, essential for correct metadata mapping. |
| Biologging intelligent Platform (BiP) [2] | Data Platform | An integrated platform for uploading, standardizing, visualizing, and sharing biologging data with integrated metadata. |
| Movebank [14] | Data Repository & Management Tool | A free global repository for animal tracking data that supports a robust data model and facilitates data management pre- and post-publication. |
| EML (Ecological Metadata Language) [33] | Metadata Standard | An XML-based standard for describing ecological datasets in a modular fashion; the required metadata component for a Darwin Core Archive. |
| bandbox [35] | Software Tool | A Python package that assesses data directory organization by flagging issues like redundant directories or invalid characters in filenames before archival. |
| ETN R Package [13] | Software Tool | An R package designed to access and process data from the European Tracking Network, exemplifying how standardized data enables powerful analytical tools. |
| AniBOS (Animal Borne Ocean Sensors) [14] | Community Initiative | A global project integrating animal-borne sensor data into the Global Ocean Observing System, demonstrating cross-disciplinary data reuse. |
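The kind of pre-archival filename check performed by tools like bandbox can be approximated in a few lines; this is an independent sketch, not the bandbox API, and the character whitelist is an assumption:

```python
import re

# Flag filenames likely to cause problems in long-term archives: spaces,
# parentheses, or anything outside a conservative ASCII whitelist.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")

def flag_unsafe_names(filenames: list[str]) -> list[str]:
    """Return the subset of filenames that fail the whitelist check."""
    return [name for name in filenames if not SAFE_NAME.match(name)]

names = ["seal_453.csv", "final version (2).csv", "tracks~backup.csv", "README.txt"]
flagged = flag_unsafe_names(names)
print(flagged)
```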
Harnessing the power of standardization is not merely a technical exercise but a fundamental shift towards a more collaborative, open, and cumulative science. By implementing formats like Darwin Core and community-defined protocols, biologging data transitions from being a static, private output of a single study to becoming a dynamic, living component of a global digital natural history archive [14]. This transformation enables researchers to address questions at previously impossible scales, from global patterns of animal movement in response to climate change to the development of more robust computational models across ecology, oceanography, and conservation.
The pathway forward requires continued community engagement. Researchers are encouraged to participate in standards bodies like the International Bio-logging Society's Data Standardisation Working Group, demand and use standardized data from platforms like Movebank and BiP, and advocate for resources and policies that support long-term data stewardship [13] [14]. Through these concerted efforts, the biologging community can ensure that the immense value locked in its data is fully realized, both for today's science and for the legacy of future generations.
Biologging involves attaching data recorders to animals to monitor their behavior, physiology, and surrounding environment in the wild [36]. The Biologging intelligent Platform (BiP) is an integrated platform designed to address the critical social and academic mission of preserving these valuable datasets for future generations [36]. As a standardized repository, BiP facilitates collaborative research and biological conservation by enabling researchers to share, visualize, and analyze biologging data according to internationally recognized standards for sensor data and metadata storage [36]. This guide provides comprehensive protocols for preparing and uploading data to BiP, framed within the broader context of enhancing data sustainability and interoperability in biologging research.
Prior to uploading data to BiP, researchers should assemble the essential animal, instrument, and deployment information detailed in the following sections.
BiP requires metadata to be structured according to international standards including the Integrated Taxonomic Information System (ITIS), Climate and Forecast Metadata Conventions (CF), Attribute Conventions for Data Discovery (ACDD), and International Organization for Standardization (ISO) [36]. The platform supports three primary metadata categories:
Table 1: Required Animal Metadata
| Metadata Category | Specific Elements | Format Standards |
|---|---|---|
| Individual Traits | Sex, body size, life history stage, breeding status | Controlled vocabularies |
| Taxonomic Information | Species name, taxonomic classification | Integrated Taxonomic Information System (ITIS) |
| Biological Measurements | Weight, length, health indicators | Numeric values with standardized units |
Table 2: Required Instrument Metadata
| Metadata Category | Specific Elements | Format Standards |
|---|---|---|
| Device Specifications | Manufacturer, model, sensor types, firmware version | Structured text fields |
| Calibration Data | Calibration dates, methods, reference values | ISO standard formats |
| Technical Parameters | Sampling rates, resolution, accuracy | Numeric values with standardized units |
Table 3: Required Deployment Metadata
| Metadata Category | Specific Elements | Format Standards |
|---|---|---|
| Deployment Context | Researcher, institution, project name | ACDD conventions |
| Temporal Information | Deployment date/time, retrieval date/time | ISO 8601 format |
| Geographical Context | Deployment location, habitat type | Decimal degrees, CF conventions |
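The three metadata categories above map naturally onto simple record types. The sketch below is illustrative only; the field names are examples, not BiP's actual schema:

```python
from dataclasses import dataclass

# Illustrative record types for the three BiP metadata categories.
# Field names are examples, not BiP's actual schema.
@dataclass
class AnimalMetadata:
    species: str          # ITIS-resolvable scientific name
    sex: str
    body_mass_kg: float

@dataclass
class InstrumentMetadata:
    manufacturer: str
    model: str
    sampling_rate_hz: float

@dataclass
class DeploymentMetadata:
    deployment_start: str  # ISO 8601, UTC
    latitude: float        # decimal degrees
    longitude: float
    animal: AnimalMetadata
    instrument: InstrumentMetadata

dep = DeploymentMetadata(
    deployment_start="2024-11-21T14:30:00Z",
    latitude=-66.55, longitude=140.00,
    animal=AnimalMetadata("Leptonychotes weddellii", "F", 420.0),
    instrument=InstrumentMetadata("ExampleCorp", "CTD-SRDL", 1.0),
)
print(dep.animal.species)
```

Bundling the three categories under a single deployment record mirrors the platform's structure, in which one deployment links one animal to one instrument.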
BiP handles diverse sensor data types while promoting standardization for improved interoperability:
Data Column Standardization:
Supported Sensor Parameters:
File Format Considerations:
Figure 1: Data preparation workflow for Biologging intelligent Platform (BiP) showing sequential steps from metadata collection to upload readiness.
Initiate Upload Session:
Metadata Entry:
File Upload:
Data Standardization:
After upload, implement these verification protocols:
Visualization Review:
Metadata Validation:
Data-Metadata Alignment:
BiP provides flexible data sharing options to accommodate different research needs:
Table 4: Data Access Levels in BiP
| Access Level | Visibility | Download Permissions | Use Cases |
|---|---|---|---|
| Open Data | Public | Free download under CC BY 4.0 | Published research, collaborative projects |
| Private Data | Restricted | Owner permission required | Ongoing studies, sensitive locations |
| Embargoed Data | Metadata only | Available after embargo period | Planned publications, thesis research |
License Conditions:
Access Requests:
BiP's unique OLAP capabilities enable derived data products:
Environmental Parameter Calculation:
Behavioral Parameter Estimation:
Algorithm Integration:
DOI Integration:
Multi-Repository Storage:
Table 5: Key Research Reagents and Materials for Biologging Studies
| Reagent/Material | Function | Application Context |
|---|---|---|
| Satellite Relay Data Loggers (SRDL) | Transmit compressed data via satellite | Long-term marine mammal studies in remote regions |
| Animal-Borne Cameras | Visual documentation of behavior and environment | Fine-scale foraging ecology and species interactions |
| Acceleration Data Loggers | Monitor fine-scale movements and behaviors | Classification of foraging attempts, grooming, resting |
| Depth-Temperature Recorders | Profile dive behavior and thermal environment | Oceanographic data collection in ice-covered regions |
| Heart Rate Monitors | Estimate energy expenditure and physiological stress | Flight energy calculations in seabirds, swimming costs |
| Geolocation Sensors | Track position using light-based algorithms | Migration mapping for small-bodied species |
The Biologging intelligent Platform represents a significant advancement in ecological data archiving by providing standardized protocols for data preservation, sharing, and reuse. By adhering to the preparation and upload procedures outlined in this guide, researchers contribute to a sustainable future for biologging data that transcends individual research projects and disciplinary boundaries. The platform's commitment to international standards, coupled with its advanced analytical capabilities and flexible sharing models, addresses the critical need for interoperable data frameworks in movement ecology and environmental monitoring. As biologging continues to expand across taxonomic groups and research questions, BiP offers a robust infrastructure for preserving these valuable datasets for future scientific discovery and conservation applications.
In the expanding field of biologging, where animal-borne electronic tags generate vast datasets on wildlife movements, behavior, and physiology, comprehensive metadata provides the essential context that transforms raw sensor readings into meaningful, reusable scientific data. The rapid growth of this discipline offers unprecedented opportunities for collaborative research and biological conservation but also presents significant challenges in data integration, sharing, and preservation [10]. Establishing robust archiving and sharing approaches is no longer a secondary concern but a foundational requirement for advancing biologging science. This application note delineates the critical metadata standards and protocols necessary for creating dynamic, interoperable archives of biologging data, ensuring their utility for future scientific discovery across multiple disciplines, from ecology to oceanography [2].
A structured metadata framework is indispensable for ensuring data interoperability, facilitating discovery, and enabling accurate interpretation. The following tables summarize the essential metadata elements, categorized by domain, required for effective biologging data archiving.
Table 1: Essential Animal-Based Metadata This table details the fundamental metadata related to the study subject, which is crucial for interpreting behavioral and physiological data in a biological context.
| Metadata Field | Description | Standard/Format Recommendation |
|---|---|---|
| Species | Scientific name of the animal. | Integrated Taxonomic Information System (ITIS) [2] |
| Common Name | Common name of the animal. | Automatically populated from ITIS selection [2] |
| Sex | Biological sex of the individual. | Controlled vocabulary (e.g., M, F, Unknown) |
| Life Stage | Age class or life stage. | Controlled vocabulary (e.g., Adult, Subadult, Juvenile) |
| Body Size | Morphometric measurements (e.g., weight, length). | Numerical value with standardized unit (e.g., kg, cm) |
| Breeding Status | Reproductive condition at time of deployment. | Controlled vocabulary (e.g., Breeding, Non-breeding) |
Table 2: Essential Instrument-Based Metadata This table outlines the technical metadata describing the data-logging device, which is vital for understanding sensor capabilities, accuracy, and limitations.
| Metadata Field | Description | Standard/Format Recommendation |
|---|---|---|
| Device Type | Category of the instrument (e.g., GPS logger, SRDL). | Controlled vocabulary [2] |
| Device ID | Unique manufacturer serial number. | Alphanumeric string |
| Manufacturer | Name of the device manufacturer. | Free text |
| Sensors | List of sensors integrated (e.g., depth, temperature, accelerometer). | Controlled vocabulary [2] |
| Calibration Dates | Dates when sensors were calibrated. | ISO 8601 (YYYY-MM-DD) |
| Sampling Rate | Frequency at which data is recorded. | Numerical value with unit (e.g., Hz) |
Table 3: Essential Deployment-Based Metadata This table describes the context of how the instrument was attached to the animal, providing necessary information for analyzing data onset and assessing potential impacts on the animal.
| Metadata Field | Description | Standard/Format Recommendation |
|---|---|---|
| Deployment ID | A unique identifier for the deployment event. | Alphanumeric string |
| Attachment Method | Technique used to affix the device (e.g., harness, glue). | Controlled vocabulary |
| Deployment DateTime | The precise date and time the device was attached. | ISO 8601 (YYYY-MM-DDThh:mm:ssZ) |
| Deployment Location | Geographic coordinates of the deployment site. | Decimal degrees (Lat, Lon) |
| Recapture DateTime | The date and time the device was recovered (if applicable). | ISO 8601 (YYYY-MM-DDThh:mm:ssZ) |
| Investigator | Name of the researcher who performed the deployment. | Free text |
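The ISO 8601 and decimal-degree conventions in Table 3 can be enforced programmatically; a minimal sketch (helper names are illustrative):

```python
from datetime import datetime, timezone

# Format a deployment timestamp per ISO 8601 (YYYY-MM-DDThh:mm:ssZ).
def iso8601_utc(dt: datetime) -> str:
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# Convert degrees-and-decimal-minutes coordinates to signed decimal degrees.
def to_decimal_degrees(degrees: int, minutes: float, hemisphere: str) -> float:
    sign = -1 if hemisphere in ("S", "W") else 1
    return sign * (abs(degrees) + minutes / 60.0)

ts = iso8601_utc(datetime(2024, 11, 21, 14, 30, tzinfo=timezone.utc))
lat = to_decimal_degrees(66, 33.0, "S")
print(ts, round(lat, 4))
```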
This protocol provides a step-by-step methodology for researchers to prepare and upload biologging data to a centralized, standards-compliant platform like the Biologging intelligent Platform (BiP) [2].
1. Pre-Deployment Registration:
   - Action: Prior to field deployment, register a new deployment event within the platform (e.g., BiP).
   - Methodology: Populate the online form with the planned deployment metadata, drawing from the fields defined in Tables 1-3. Utilize pull-down menus for controlled vocabularies to ensure consistency and minimize entry errors [2].
2. Sensor Data Collection and Export:
   - Action: Retrieve data from the biologging instrument after recovery or via remote transmission.
   - Methodology: Use the manufacturer's software to export the raw sensor data. Preserve the original data format. Common parameters include timestamp, latitude, longitude, depth, temperature, and acceleration [2].
3. Data Format Standardization:
   - Action: Convert the raw sensor data into a standardized format.
   - Methodology: Leverage the platform's (e.g., BiP) integrated tools to map raw data columns (e.g., "Lat," "Latitude") to standardized column names. Standardize date-time formats to ISO 8601 and ensure numerical values have consistent units [2].
4. Metadata Association and Validation:
   - Action: Link the standardized sensor data file with the complete deployment, animal, and instrument metadata.
   - Methodology: The platform will present the pre-registered metadata for confirmation and completion. Finalize all entries. The system should run an automated validation check to ensure all required fields are populated and conform to the agreed standards [2].
5. Licensing and Sharing Policy Selection:
   - Action: Define the terms of use for the dataset.
   - Methodology: Select a license, such as CC BY 4.0, which permits sharing and adaptation with appropriate attribution. Choose the data visibility (e.g., fully open, available on request) as required by the research funders and ethical permits [2].
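Steps 3 and 4 lend themselves to automation. The sketch below approximates them; the column-synonym table and required-field list are assumptions, not BiP's actual validation rules:

```python
# Normalize raw column names to standard ones and verify required metadata
# fields, approximating protocol steps 3 and 4. The synonym table and the
# required-field set are illustrative, not BiP's actual rules.
COLUMN_SYNONYMS = {
    "lat": "latitude", "Lat": "latitude", "Latitude": "latitude",
    "lon": "longitude", "Long": "longitude", "Longitude": "longitude",
}

REQUIRED_METADATA = {"species", "deployment_start", "attachment_method"}

def normalize_columns(columns: list[str]) -> list[str]:
    """Replace known synonyms with standardized column names."""
    return [COLUMN_SYNONYMS.get(c, c) for c in columns]

def missing_fields(metadata: dict) -> set[str]:
    """Return required metadata fields that are absent from the record."""
    return REQUIRED_METADATA - set(metadata)

cols = normalize_columns(["timestamp", "Lat", "Longitude", "depth"])
gaps = missing_fields({"species": "Larus fuscus",
                       "deployment_start": "2024-05-01T06:00:00Z"})
print(cols, gaps)
```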
The following diagram illustrates the logical flow and decision points within the data standardization and archiving protocol.
Table 4: Key Platforms and Databases for Biologging Data Management This table lists essential resources for storing, visualizing, and analyzing biologging data, forming the core infrastructure for the field.
| Resource Name | Type | Primary Function | Key Feature |
|---|---|---|---|
| Biologging intelligent Platform (BiP) | Integrated Data Platform | Standardized storage, visualization, and analysis of sensor data and metadata [2]. | Online Analytical Processing (OLAP) tools for estimating environmental parameters [2]. |
| Movebank | Web-Based Database | Management, sharing, and analysis of animal tracking data [2]. | Largest database, containing billions of location points across numerous taxa [2]. |
| AniBOS (Animal Borne Ocean Sensors) | Observation Network | Establishing a global ocean observation system using animal-borne sensors [2]. | Focuses on gathering physical environmental data (e.g., temperature, salinity) for oceanography [2]. |
| Seabird Tracking Database | Taxonomic-Specific Database | Hosting and managing tracking data for seabirds [10]. | Enables meta-analyses of seabird behavior and distribution for conservation [2]. |
| OTN (Ocean Tracking Network) | Acoustic Tracking Network | Global data repository for aquatic animal acoustic telemetry data [10]. | Coordinates a network of acoustic receivers to track animal movements over long distances. |
| International Bio-Logging Society | Coordinating Body | Community-led organization promoting best practices and standardization [10]. | Launches working groups (e.g., Data Standardisation WG) to develop community standards [10]. |
Within modern biologging research, a critical challenge lies in transforming vast archives of raw animal-borne sensor data into accessible, quantitative environmental knowledge. The Biologging intelligent Platform (BiP), a platform designed for sharing and analyzing biologging data, addresses this through its integrated Online Analytical Processing (OLAP) tools [36]. These tools are instrumental for a research thesis focused on novel archiving and sharing approaches, as they enable the secondary use of animal behavior data to estimate physical environmental parameters [36]. This application note details the protocols for using OLAP to calculate key environmental variables such as surface currents, ocean winds, and waves from biologging datasets, thereby illustrating how shared data can create value across disciplines like oceanography and meteorology [36].
Biologging data, when processed through OLAP, provides critical environmental measurements that complement traditional observation systems like Argo floats and meteorological satellites [36]. The following parameters can be derived, offering high temporal resolution and coverage in regions difficult for conventional methods to access.
Table 1: Core Environmental Parameters Derivable via OLAP Analysis
| Environmental Parameter | Data Source (Animal Subjects) | Typical Sensor Data Required | Complement to Traditional Observation |
|---|---|---|---|
| Surface Currents | Seabirds, Marine Reptiles [36] | Horizontal position (GPS), timestamps [36] | Provides data in shallow waters and ice-covered regions unsuitable for Argo floats [36]. |
| Ocean Winds & Waves | Seabirds [36] | Flight dynamics, acceleration [36] | Offers higher temporal resolution data at the ocean-atmosphere boundary [36]. |
| Water Temperature Profiles | Seals, Sea Turtles, Sharks [36] | Depth, temperature [36] | Collects data in polar and eastern Pacific regions with high density, filling spatial gaps [36]. |
| Salinity Profiles | Phocid Seals (e.g., Elephant Seals) [36] | Conductivity, temperature, depth (CTD) [36] | Data volume in the Antarctic is comparable to that from Argo floats [36]. |
Table 2: Comparison of Observation Systems
| Observation System | Spatial Coverage | Temporal Resolution | Key Limitations |
|---|---|---|---|
| Meteorological Satellites | Large areas [36] | Limited frequency [36] | Cannot penetrate saltwater; only monitors surface conditions [36]. |
| Argo Floats | Global oceans (deep waters) [36] | ~10 days per profile [36] | Unsuitable for shallow waters; limited sub-surface temporal resolution [36]. |
| Animal-Borne Sensors (via OLAP) | Polar, Temperate, Tropical regions [36] | High (e.g., continuous profiles during dives) [36] | Coverage depends on animal movement and distribution [36]. |
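The published OLAP algorithms for current estimation are considerably more sophisticated [36], but the underlying idea can be illustrated with a minimal sketch: during periods when an animal drifts passively at the surface, the displacement between consecutive GPS fixes approximates the surface current. The function name, the fix format, and the passive-drift assumption are ours, not BiP's.

```python
import math

def drift_velocity(fix1, fix2):
    """Approximate surface current (m/s, east/north components) from two
    GPS fixes of a passively drifting animal.

    Each fix is (lat_deg, lon_deg, unix_time_s). Assumes the animal's own
    propulsion is negligible between fixes -- a simplification; published
    OLAP algorithms separate animal motion from water motion.
    """
    lat1, lon1, t1 = fix1
    lat2, lon2, t2 = fix2
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("fixes must be in chronological order")
    mean_lat = math.radians((lat1 + lat2) / 2)
    meters_per_deg_lat = 111_320.0                       # approximate
    meters_per_deg_lon = 111_320.0 * math.cos(mean_lat)  # shrinks with latitude
    v_east = (lon2 - lon1) * meters_per_deg_lon / dt
    v_north = (lat2 - lat1) * meters_per_deg_lat / dt
    return v_east, v_north

# A fix 0.01 degrees further north after one hour: roughly 0.31 m/s northward
v_e, v_n = drift_velocity((45.0, -30.0, 0), (45.01, -30.0, 3600))
```

In practice such estimates are aggregated over many animals and filtered for drift-only segments before being compared with Argo or satellite products.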
This section provides a detailed, step-by-step methodology for researchers to estimate environmental parameters from biologging data using the OLAP tools within the BiP platform. The workflow encompasses data preparation, upload, processing, and analysis.
Objective: To format raw biologging data and its associated metadata according to international standards to ensure compatibility with the BiP platform and OLAP tools.
Materials:
Procedure:
Convert all timestamps to a standardized format (e.g., YYYY-MM-DD HH:MM:SS).

Objective: To upload the standardized dataset to the BiP platform and execute OLAP tools to estimate environmental parameters.
Materials:
Access to the BiP platform (https://www.bip-earth.com).

Procedure:
The following diagram illustrates the complete experimental protocol from data collection to analysis, providing a logical overview of the process.
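The timestamp-formatting step in the data-preparation protocol can be scripted. The sketch below normalizes mixed timestamp strings to `YYYY-MM-DD HH:MM:SS`; the list of input formats is an assumed example, not a BiP specification, and should be extended to match your devices.

```python
from datetime import datetime

# Candidate input formats commonly produced by logger firmware and
# spreadsheet exports (an assumed list -- extend for your own devices).
INPUT_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%d/%m/%Y %H:%M:%S",
    "%Y%m%dT%H%M%S",
    "%d-%b-%Y %H:%M",
]

def normalize_timestamp(raw: str) -> str:
    """Return the timestamp as 'YYYY-MM-DD HH:MM:SS' or raise ValueError."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {raw!r}")

print(normalize_timestamp("25/12/2024 06:30:00"))  # 2024-12-25 06:30:00
```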
Successful execution of these protocols requires a suite of specialized materials and digital resources. The following table details the key "research reagent solutions" essential for biologging research and environmental data analysis.
Table 3: Essential Research Reagents and Materials for Biologging Analysis
| Item Name | Function/Application | Specifications & Examples |
|---|---|---|
| Animal-Borne Data Loggers | Primary data collection units attached to animals to record movement, behavior, and environmental data. | Includes Satellite Relay Data Loggers (SRDLs) for marine mammals [36], and devices for birds, reptiles, and fish. Measures depth, temperature, acceleration, etc. |
| Biologging intelligent Platform (BiP) | The core platform for data archiving, standardization, and analysis. Hosts the OLAP tools for environmental parameter estimation. | Web-based platform (https://www.bip-earth.com). Supports FAIR data principles, CC BY 4.0 licensing for open data, and stores sensor data with standardized metadata [36]. |
| OLAP (Online Analytical Processing) Tools | Integrated algorithms within BiP that calculate environmental parameters from animal movement and sensor data. | Algorithms published in peer-reviewed studies are integrated to estimate surface currents, ocean winds, and waves [36]. |
| Metadata Standards | Defined vocabularies and formats to ensure data interoperability and reuse across disciplines. | Conforms to international standards: Integrated Taxonomic Information System (ITIS), Climate and Forecast (CF), and ISO conventions [36]. |
| Data Visualization & Statistical Tools | Software for exploring, analyzing, and communicating biological and environmental data findings. | R (with ggplot2 package) and Python (with Matplotlib, Seaborn) for creating publication-quality plots [37] [38]. Tools like GraphPad Prism for biostatistics [38]. |
Biologging, the practice of attaching data recorders to animals to monitor their behavior, physiology, and surrounding environment, has transformed ecological research and conservation [2]. This methodology provides unprecedented, real-time insights into animal lives in wild settings, offering data that is critical for understanding ecological dynamics and informing conservation strategies [39]. However, the deployment of these technologies has not been uniform across global ecosystems. Recent research led by the University of California, Berkeley, reveals substantial global biases and gaps in biologging data collection, with the majority of data originating from remote or suburban regions in Europe and the United States [39]. This disparity leaves critical knowledge gaps, particularly for highly urbanized areas and regions in the Global South that are experiencing rapid environmental change. These biases hinder the development of effective, global biodiversity conservation strategies. This document outlines standardized protocols and archival approaches to mitigate these disparities and promote a more equitable, comprehensive global biologging data infrastructure.
The following table summarizes the primary quantitative findings on global biologging data collection biases, as identified in a 2025 analysis of existing tracking data [39].
Table 1: Documented Biases in Global Biologging Data Collection
| Bias Dimension | Overrepresented Regions/Contexts | Underrepresented Regions/Contexts |
|---|---|---|
| Geographical | Remote and suburban regions in Europe and the United States [39] | Highly urbanized areas globally; vulnerable regions across the Global South [39] |
| Environmental Context | Pristine or protected national park areas [39] | Human-dominated landscapes, areas undergoing rapid environmental change [39] |
| Oceanographic Data | Antarctic, Arctic, and the eastern Pacific Ocean (data primarily from pinnipeds) [2] | Temperate and tropical regions (despite use of turtles, sharks, and fish) [2] |
These biases mean that our understanding of animal behavior and ecology is spatially and contextually limited. As noted by researchers, scientists often prioritize tracking animals in remote national parks, leaving a critical gap in knowledge about how animals live and cope in cities and other human-modified landscapes [39]. This information is vital for designing effective conservation strategies for a planet increasingly shaped by human activity.
To address the documented gaps and biases, a standardized, proactive methodology for biologging studies is essential. The protocol below provides a framework for planning and executing a biologging deployment with an emphasis on equitable data collection and sharing.
Diagram 1: Standardized workflow for equitable biologging studies.
Study Design & Ethical Review: Formulate a clear biological question. Crucially, apply the 5R principle (Replace, Reduce, Refine, Responsibility, and Reuse) to enhance animal welfare and data quality from the outset [40]. Submit the study plan for approval by an institutional animal ethics committee.
Technology Selection & Calibration: Choose biologging devices (e.g., GPS loggers, satellite relay data loggers - SRDLs, accelerometers, proximity loggers like the Encounternet system) based on the scientific question, target species, and habitat [2] [41]. Calibrate sensors, such as those measuring Received Signal Strength Indication (RSSI) in proximity loggers, to ensure accurate distance estimation during encounters [41].
Field Deployment: Deploy tags on animals using species-appropriate attachment methods (e.g., weak-link harnesses for birds) [41]. Actively prioritize deployments in underrepresented regions, including the Global South and highly urbanized environments, to directly counter existing geographical biases [39].
Data Acquisition & Processing: Collect raw data from the devices or via basestations. For proximity loggers, this results in "encounter logs" containing IDs, timestamps, and RSSI values [41]. Process the data to filter records by signal strength, amalgamate temporally clustered logs, and examine tag reciprocity to ensure data quality.
Data Standardization & Archiving: Standardize the processed sensor data and associated metadata according to international conventions (e.g., Integrated Taxonomic Information System (ITIS), Climate and Forecast (CF) Metadata Conventions) [2] [13]. This step is critical for making data interoperable and reusable across different disciplines.
Data Analysis & Sharing: Conduct biological and ecological analyses. To maximize impact and collaboration, share the standardized dataset via public platforms like the Biologging intelligent Platform (BiP) or Movebank, ideally under a permissive license such as CC BY 4.0 [2].
Table 2: Key Research Reagent Solutions for Biologging Studies
| Item Name | Function / Application | Key Features / Standards |
|---|---|---|
| Satellite Relay Data Logger (SRDL) | Transmits compressed data (e.g., dive profiles, temperature) via satellite; used for long-term tracking of marine mammals in remote areas [2]. | Enables oceanographic data collection in ice-covered regions inaccessible to ships and Argo floats [2]. |
| Encounternet Proximity Logger | A digital proximity-logging system for direct mapping of animal social encounters by recording reciprocated contacts between tagged individuals [41]. | Logs raw signal-strength (RSSI) data for distance estimation; supports tag-to-tag communication over distances >10m [41]. |
| Biologging intelligent Platform (BiP) | An integrated online platform for sharing, visualizing, and analyzing standardized biologging data [2]. | Adheres to international metadata standards; includes Online Analytical Processing (OLAP) tools to calculate environmental parameters [2]. |
| Movebank Database | A web-based platform for managing, sharing, and analyzing animal tracking and other biologging data [13]. | One of the largest biologging databases; supports collaboration and data reuse across the research community [2]. |
| Animal Borne Ocean Sensors (AniBOS) | A global observation network that uses animal-borne sensors to gather physical oceanographic data [2]. | Complements traditional ocean observation systems like Argo floats and satellites, especially in shallow waters [2]. |
A robust and standardized archival process is the cornerstone of overcoming data gaps and fostering collaborative, interdisciplinary science. The workflow for this process is outlined below.
Diagram 2: Data archiving and sharing workflow.
Addressing the profound global biases in biologging data is not merely a technical challenge but an ethical and strategic imperative for conservation. By adopting the standardized protocols, equitable deployment strategies, and robust data archiving frameworks outlined in these application notes, the research community can transform biologging from a patchwork of disparate studies into a truly global, collaborative, and actionable system for understanding and protecting biodiversity.
The exponential growth of bio-logging research—the use of animal-borne electronic tags—has generated unprecedented volumes of data on animal movement, behavior, physiology, and environments [10]. These datasets constitute invaluable dynamic archives of animal life on Earth, with tremendous potential for addressing biodiversity threats and expanding digital natural history collections [10] [42]. However, this opportunity is tempered by a significant challenge: extreme heterogeneity in data formats, column headers, and structural schemas across independent research initiatives. This heterogeneity creates substantial bottlenecks in data integration, analysis, and reuse, hindering the transformative potential of bio-logging science. Data harmonization—the practice of reconciling disparate data types, levels, and sources into compatible and comparable formats—emerges as a critical methodology for unlocking the full value of these ecological archives [43]. This article establishes comprehensive protocols for standardizing diverse data formats and column headers, framed within the imperative to create accessible, preserved bio-logging data collections for the global research community.
Data harmonization resolves heterogeneity across three primary dimensions, each presenting distinct challenges for bio-logging data integration.
Syntax refers to the technical format of data files (e.g., .csv, JSON, HDF5, SQL databases) [43]. Bio-logging data arrives in myriad formats from different sensor systems and proprietary software, requiring initial conversion to workable formats before harmonization can proceed.
Structure concerns how variables relate within datasets, ranging from highly organized tables to unstructured data streams [43]. Bio-logging data manifests structural variance in:
Semantics involves the intended meaning of terms and variables [43]. This presents particularly subtle challenges in bio-logging, where identical terminology may measure different concepts (e.g., "migration duration" defined differently across studies) or different terms may describe identical concepts (e.g., "location quality" versus "positional precision").
Table 1: Dimensions of Data Heterogeneity in Bio-Logging Research
| Dimension | Definition | Bio-Logging Examples | Impact on Analysis |
|---|---|---|---|
| Syntax | Technical file format and encoding | CSV, JSON, HDF5, proprietary binary formats | Prevents immediate data loading and combination |
| Structure | Conceptual schema and variable relationships | Event data vs. panel data formats; multi-index headers | Requires structural transformation before analysis |
| Semantics | Intended meaning of terms and variables | Differing operational definitions of "foraging behavior" | Leads to erroneous comparisons and conclusions |
The following protocol provides a systematic methodology for transforming heterogeneous column headers into a standardized schema suitable for bio-logging data integration.
Table 2: Essential Computational Tools for Header Standardization
| Tool/Resource | Function | Application Context |
|---|---|---|
| Pandas Library | Python data manipulation toolkit | Primary engine for data transformation and header operations |
| Pandas String Methods | Vectorized string operations (`.str.upper()`, `.str.replace()`) | Case normalization and character replacement in headers |
| Regular Expressions | Pattern matching for complex string operations | Identifying and transforming patterned header elements |
| Custom Mapping Functions | User-defined transformations (e.g., lambda functions) | Applying complex transformation logic to header sets |
| Semantic Type Libraries | Pre-defined value dictionaries (e.g., species taxonomies) | Standardizing content within columns after header stabilization |
1. Convert Headers to String Type
2. Case Normalization
3. Remove Extraneous Whitespace
4. Character Standardization
5. Structural Flattening
6. Semantic Mapping
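The six steps above can be sketched as a single standard-library function (in a pandas workflow the result would be assigned back via `df.columns`). The semantic map below is a made-up fragment for illustration; a real project would draw on a community vocabulary such as CF standard names.

```python
import re

# Assumed target schema fragment for illustration only.
SEMANTIC_MAP = {
    "lat": "latitude",
    "lon": "longitude",
    "temp": "temperature_c",
}

def standardize_header(name) -> str:
    """Apply steps 1-6: string coercion, case normalization, whitespace
    trimming, character standardization, flattening, semantic mapping."""
    if isinstance(name, tuple):              # step 5: flatten multi-index headers
        name = "_".join(str(part) for part in name if str(part))
    name = str(name)                         # step 1: coerce to string
    name = name.strip()                      # step 3: trim whitespace
    name = name.lower()                      # step 2: normalize case
    name = re.sub(r"[\s\-./]+", "_", name)   # step 4: unify separators
    name = re.sub(r"[^a-z0-9_]", "", name)   #         drop stray characters
    name = re.sub(r"_+", "_", name).strip("_")
    return SEMANTIC_MAP.get(name, name)      # step 6: map to common vocabulary

headers = [" Lat ", "LON", ("dive", "Max Depth (m)"), "Temp"]
print([standardize_header(h) for h in headers])
```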
The following workflow diagram illustrates the complete header standardization process:
Beyond header formatting, standardizing values within columns is essential for meaningful data integration in bio-logging collections.
1. Algorithm Selection
2. Threshold Configuration
3. Dictionary-Based Validation
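These three steps can be prototyped with the standard library's `difflib` (a simple similarity-ratio matcher; production pipelines often prefer dedicated edit-distance libraries). The species dictionary and threshold below are illustrative assumptions.

```python
import difflib

# Illustrative controlled vocabulary (step 3: dictionary-based validation)
VALID_SPECIES = ["Mirounga leonina", "Chelonia mydas", "Diomedea exulans"]

def match_species(raw: str, threshold: float = 0.8):
    """Map a messy species string to the controlled vocabulary, or return
    None when no candidate clears the similarity threshold (step 2)."""
    matches = difflib.get_close_matches(raw.strip(), VALID_SPECIES,
                                        n=1, cutoff=threshold)
    return matches[0] if matches else None

print(match_species("Mirounga leonine"))   # close typo: matched to vocabulary
print(match_species("Unknown sp."))        # below threshold: returns None
```

Threshold configuration is a trade-off: a higher cutoff rejects genuine typos, a lower one risks false merges, so candidate matches near the threshold should be reviewed manually.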
The ECHO-wide Cohort approach demonstrates successful large-scale semantic harmonization through Common Data Models (CDMs) and harmonization protocols [47]. Its phases, adapted here for biologging data, are summarized in Table 3.
Table 3: Semantic Harmonization Framework for Bio-Logging Data
| Harmonization Phase | Process | Output |
|---|---|---|
| Concept Definition | Define core constructs with domain experts | Standardized ontology for bio-logging phenomena |
| Measure Inventory | Catalog existing measures for each construct across datasets | Crosswalk between measure-specific and common variables |
| Transformation Specification | Develop algorithms to map specific measures to common format | Processing scripts and validation checks |
| Validation | Assess harmonized data for conceptual equivalence | Quality metrics and harmonization assessment report |
Successful large-scale harmonization requires coordinated community effort and shared infrastructure, including common repositories, agreed vocabularies, and sustained curation support [10].
The complete data harmonization pipeline integrates both technical and conceptual processes, from syntax conversion through structural transformation to semantic mapping.
Table 4: Metrics for Evaluating Harmonization Success in Bio-Logging Data
| Evaluation Dimension | Pre-Harmonization State | Post-Harmonization Target | Measurement Approach |
|---|---|---|---|
| Syntax Compatibility | Multiple proprietary formats | ≤ 3 standardized, open formats | Percentage of data in standard formats |
| Header Consistency | Heterogeneous case, spacing, separators | 100% consistent formatting | Automated header validation checks |
| Semantic Interoperability | Variable operational definitions | Common ontology with mapping | Cross-dataset comparability index |
| Processing Efficiency | Manual, one-off transformations | Automated, reproducible workflows | Reduction in data preparation time |
| Repository Compliance | Dataset-specific schemas | Common Data Model adherence | Validation against CDM specification |
Harmonizing diverse data formats and column headers transcends technical exercise to become foundational for realizing the potential of bio-logging data as dynamic archives of animal life [10]. The protocols outlined here provide a rigorous methodology for standardizing syntax, structure, and semantics—enabling integration across disparate datasets. When embedded within broader community initiatives for data standardization, preservation, and access [10] [42], these approaches support the transformation of fragmented individual datasets into unified resources. Through committed adoption of harmonization practices, the bio-logging research community can ensure these vital digital archives continue to illuminate wildlife biology and inform conservation strategies for years to come.
Overfitting represents a fundamental challenge in developing robust behavioral classification models, particularly within the domain of biologging data research. It occurs when a machine learning model fits its training data too closely, learning both the underlying patterns and the irrelevant noise or random fluctuations [48] [49]. In the context of behavioral classification, an overfitted model essentially "memorizes" the specific examples in its training dataset rather than learning the generalizable features that distinguish behavioral states [49] [50]. This results in a model that performs exceptionally well on its training data but fails to generalize to new, unseen data—a critical flaw for models intended for real-world scientific application.
The epidemic nature of overfitting in behavioral classification is particularly concerning. A systematic review of animal accelerometer-based behavior classification literature revealed that 79% of examined studies (94 papers) did not adequately validate their models to robustly identify potential overfitting [51]. This prevalence underscores the need for heightened awareness and improved methodological rigor throughout the research community. When overfitted models are deployed or shared without detection, they produce misleading results that can misdirect scientific conclusions and waste valuable research resources.
The consequences of overfitting are especially profound within biologging research, where models are increasingly used to draw ecological inferences and inform conservation strategies. An overfitted behavioral classifier may appear highly accurate during development but will perform poorly when applied to data from new individuals, different populations, or varying environmental conditions [51] [52]. This limitation fundamentally undermines the scientific value of shared biologging datasets and hampers reproducibility across studies.
The most straightforward method for detecting overfitting involves comparing model performance between training and validation datasets:
Table 1: Performance Indicators of Model Fit Status
| Model Status | Training Performance | Validation Performance | Performance Gap |
|---|---|---|---|
| Overfitting | High accuracy (e.g., >95%) [53] | Significantly lower accuracy (e.g., <70%) [53] | Large gap (>20 percentage points) |
| Underfitting | Low accuracy [49] | Similarly low accuracy [49] | Minimal gap |
| Well-fit | High accuracy | Similarly high accuracy | Small gap (<10 percentage points) |
To implement this diagnostic approach, reserve a validation set before training, evaluate the final model on both the training and validation sets, and compare the resulting performance gap against the indicators in Table 1.
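This comparison is easy to automate. The sketch below encodes the gap thresholds from Table 1; the 0.70 cutoff for "low" accuracy and the function name are our assumptions, and real projects should tune these to their task.

```python
def diagnose_fit(train_acc: float, val_acc: float) -> str:
    """Classify model fit status from training/validation accuracy (0-1),
    using the gap thresholds in Table 1."""
    gap = train_acc - val_acc
    if gap > 0.20:
        return "overfitting"        # large gap: model memorized training data
    if train_acc < 0.70 and val_acc < 0.70:
        return "underfitting"       # both low, minimal gap
    if gap < 0.10:
        return "well-fit"           # similarly high accuracy
    return "inconclusive"           # moderate gap: investigate further

print(diagnose_fit(0.97, 0.65))  # overfitting
print(diagnose_fit(0.92, 0.89))  # well-fit
```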
K-fold cross-validation provides a more robust approach for detecting overfitting than a single train-validation split [48] [55]:
Table 2: K-fold Cross-Validation Implementation
| Step | Procedure | Purpose |
|---|---|---|
| 1 | Randomly shuffle dataset and split into k equally sized folds (typically k=5 or k=10) | Ensure random representation across folds |
| 2 | Iteratively use k-1 folds for training and the remaining fold for validation | Maximize data usage for both training and validation |
| 3 | Repeat process until each fold has served as validation once | Eliminate bias from single data split |
| 4 | Calculate mean performance across all folds | Obtain stable estimate of generalization performance |
| 5 | Compare training and validation performance for each fold | Identify consistency of performance gaps |
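Steps 1 through 5 can be sketched without external libraries; in practice scikit-learn's `KFold` performs the same index bookkeeping. The metric here is a placeholder so the sketch stays self-contained.

```python
import random

def kfold_indices(n_samples: int, k: int = 5, seed: int = 42):
    """Yield (train_idx, val_idx) pairs: shuffle, split into k folds,
    and let each fold serve as the validation set once (steps 1-3)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

# Steps 4-5: average performance across folds and inspect per-fold gaps.
scores = []
for train, val in kfold_indices(100, k=5):
    assert not set(train) & set(val)       # folds are disjoint
    scores.append(len(val) / 100)          # placeholder for a real metric
mean_score = sum(scores) / len(scores)
```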
The following workflow represents the complete model development and validation process for behavioral classification:
For behavioral classification tasks with temporal dependencies (e.g., accelerometry time series), special consideration is needed during data splitting. Standard random splitting may create data leakage through temporal autocorrelation [51]. Instead, implement blocked splitting strategies, such as leave-one-individual-out cross-validation (all records from a given animal assigned to a single partition) or chronological splits that keep temporally contiguous segments together.
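A leave-one-individual-out split, one common blocked strategy, keeps every record from an animal in the same partition, so no individual leaks information between training and testing. The record layout below is illustrative.

```python
from collections import defaultdict

def leave_one_individual_out(records):
    """records: iterable of (individual_id, sample) tuples.
    Yield (held_out_id, train, test) where the test set holds every record
    from exactly one individual -- no individual spans both partitions."""
    by_id = defaultdict(list)
    for ind, sample in records:
        by_id[ind].append((ind, sample))
    for held_out in by_id:
        test = by_id[held_out]
        train = [r for ind in by_id if ind != held_out for r in by_id[ind]]
        yield held_out, train, test

records = [("seal_A", 1), ("seal_A", 2), ("seal_B", 3), ("seal_C", 4)]
splits = list(leave_one_individual_out(records))  # one split per individual
```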
Learning curves provide powerful visual diagnostic tools for detecting overfitting:
Materials Required: a labeled training dataset, a model-training environment, and plotting software (e.g., R with ggplot2 or Python with Matplotlib [37] [38]).
Procedure: train the model on progressively larger subsets of the training data, recording performance on both the training subset and a fixed validation set at each size, then plot both curves against training-set size.
Interpretation: a persistent gap between high training performance and low validation performance indicates overfitting; curves that converge at low performance indicate underfitting; curves that converge at high performance indicate a well-fit model.
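The learning-curve procedure can be demonstrated end to end on synthetic data. The nearest-centroid "classifier" and the two simulated behavior classes below are toy stand-ins, assumed for illustration only; a real study would substitute its own model and annotated accelerometry features.

```python
import random

def train_centroid(data):
    """Toy classifier: store the per-class mean of a 1-D feature."""
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def accuracy(model, data):
    correct = 0
    for x, y in data:
        pred = min(model, key=lambda c: abs(model[c] - x))  # nearest centroid
        correct += pred == y
    return correct / len(data)

rng = random.Random(0)
# Two synthetic "behaviors" separated along one feature axis
data = [(rng.gauss(0, 1), "rest") for _ in range(200)] + \
       [(rng.gauss(3, 1), "forage") for _ in range(200)]
rng.shuffle(data)
train_pool, val = data[:300], data[300:]

# Record (n, train_accuracy, validation_accuracy) at increasing sizes
curve = []
for n in (10, 50, 150, 300):
    model = train_centroid(train_pool[:n])
    curve.append((n, accuracy(model, train_pool[:n]), accuracy(model, val)))
```

Plotting `curve` gives the two lines described in the interpretation step: the train-validation gap typically shrinks as the training subset grows.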
This protocol evaluates how model complexity affects generalization:
Materials Required: a labeled training dataset with a held-out validation set, and a model family with a tunable complexity hyperparameter (e.g., tree depth, number of features, or network size).
Procedure: train a sequence of models at increasing complexity settings, record training and validation performance at each setting, and plot both against complexity.
Interpretation: validation performance that peaks and then declines while training performance continues to rise marks the onset of overfitting; select the complexity at or just before the validation peak.
Table 3: Overfitting Prevention Techniques and Applications
| Technique | Mechanism | Implementation Examples |
|---|---|---|
| Regularization | Adds penalty for complexity to loss function [56] | L1 (Lasso), L2 (Ridge), ElasticNet [56] [53] |
| Early Stopping | Halts training when validation performance stops improving [48] [50] | Monitor validation loss; stop when no improvement for N epochs |
| Data Augmentation | Artificially increases dataset size and diversity [48] [54] | For accelerometry: add noise, time-warping, rotation [55] |
| Ensemble Methods | Combines multiple models to reduce variance [48] | Random Forests, Gradient Boosting Machines (XGBoost) [55] |
| Dimensionality Reduction | Reduces feature space to most informative variables [56] | PCA, feature selection based on importance scores |
| Dropout | Randomly disables neurons during training (for neural networks) [50] [55] | Typically disable 20-50% of neurons per layer |
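Of the techniques in Table 3, early stopping is the simplest to state precisely: stop once validation loss has failed to improve for a set number of epochs and restore the best weights. The sketch below implements this patience rule (deep-learning frameworks expose the same logic as a callback); the function name is ours.

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch index whose weights should be kept: the last
    epoch that improved validation loss before `patience` consecutive
    non-improving epochs were observed."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch      # stop; restore weights from this epoch
    return best_epoch                  # training ended without triggering

# Validation loss improves, then plateaus and rises: keep epoch 3
losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.60]
print(early_stopping(losses))  # 3
```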
The quality and quantity of training data fundamentally influence overfitting risk:
Increasing Dataset Size and Diversity: collect data from more individuals, populations, and environmental contexts so the training set spans the variation the model will encounter after deployment [51] [52].
Data Augmentation for Behavioral Data: for accelerometry, apply label-preserving transformations such as added sensor noise, time-warping, and axis rotation to increase the effective size and diversity of the training set [55].
The following diagram illustrates the relationship between model complexity, dataset size, and overfitting risk:
The overfitting epidemic directly impacts practices for archiving and sharing biologging data. When models overfit to specific datasets, they fail to generalize across studies, diminishing the value of shared data resources.
Comprehensive Metadata Documentation: record the individuals, species, devices, sampling regimes, and environmental contexts represented in an archived dataset, so future users can judge whether models trained on it will transfer to their setting.
Stratified Data Archives: structure archives so data can be partitioned by individual, population, and time period, enabling downstream users to construct leakage-free training and validation splits.
When sharing pre-trained behavioral classifiers:
Required Documentation: describe the training data (species, individuals, sensors, sampling rates), the validation scheme used, and performance on data from individuals not seen during training.
Standardized Reporting: adopt reporting checklists that explicitly address overfitting risks, including the train-validation split strategy and per-individual performance variation.
Table 4: Key Research Reagents and Solutions for Behavioral Classification
| Resource Category | Specific Tools/Approaches | Function in Overfitting Prevention |
|---|---|---|
| Regularization Implementations | L1/L2 in scikit-learn, Dropout in TensorFlow/PyTorch [50] [55] | Explicitly penalize model complexity to improve generalization |
| Cross-Validation Frameworks | Scikit-learn KFold, StratifiedKFold, GroupKFold [51] | Robust performance estimation and hyperparameter tuning |
| Automated ML Platforms | Amazon SageMaker, Azure Automated ML [54] [53] | Built-in overfitting detection and regularization |
| Model Interpretation Tools | SHAP, LIME [55] | Identify feature reliance patterns suggestive of overfitting |
| Data Augmentation Libraries | Albumentations, SciPy signal processing | Increase effective dataset size and diversity |
| Ensemble Methods | XGBoost, Random Forests, Stacking ensembles [55] | Reduce variance through model averaging |
The overfitting epidemic in behavioral classification represents a critical challenge that demands systematic attention throughout the research pipeline. From experimental design through data sharing, each stage offers opportunities to detect and prevent overfitting. The protocols and frameworks presented here provide concrete strategies for developing more robust, generalizable behavioral classifiers.
For the biologging research community, addressing overfitting is particularly essential for realizing the full potential of data archiving and sharing initiatives. Only when models can generalize across datasets can we build a cumulative science of animal behavior. By adopting these practices, researchers can contribute to more reproducible, reliable behavioral classification that advances our understanding of animal movement and ecology.
Within biologging research, which involves collecting detailed data from animal-borne sensors on movement, physiology, and environmental parameters, the challenge of ensuring data longevity is paramount [2]. The massive volumes of complex sensor data generated are not only critical for behavioral ecology but are also increasingly valuable for secondary applications in oceanography and meteorology [2] [3]. Effective archiving and sharing of these datasets are essential. This document outlines application notes and protocols for migrating and refreshing outdated storage media, providing a structured approach to safeguard biologging data against technological obsolescence and physical degradation.
A strategic approach to data longevity begins with selecting appropriate storage media. The table below compares key characteristics of traditional and emerging options relevant to biologging data archiving.
Table 1: Comparative Analysis of Long-Term Archival Storage Media
| Storage Medium | Estimated Lifespan | Capacity/Density | Key Advantages | Key Challenges / Refresh Triggers |
|---|---|---|---|---|
| Magnetic Tape (LTO) | 15 - 30 years [57] | High (Terabytes per cartridge) | Proven, cost-effective for large volumes [57]. | Technology obsolescence (drive compatibility); requires controlled environment. |
| Hard Disk Drives (HDD) | 3 - 5 years (active use) | High | Fast read/write speeds; suitable for active archives. | Mechanical failure; high power consumption for always-on systems. |
| Optical Discs (Archival Grade) | 50 - 100+ years (theoretical) | Low to Moderate | Passive, durable media; immune to electromagnetic effects. | Susceptible to physical scratches, UV light; low capacity. |
| Synthetic DNA Data Storage | Thousands of years [58] [59] | Extremely High (theoretical) | Unparalleled density and longevity; ultrastable passive medium [58] [59]. | Very high write cost and latency; early-stage technology; specialized retrieval [58]. |
This protocol details the process for verifying data on existing storage and migrating it to new media, a cornerstone of a robust data refresh cycle.
3.1.1. Materials and Reagents
Checksum utilities (e.g., md5deep, sha256sum).

3.1.2. Procedure
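The core of the procedure, generating checksums before migration and verifying them afterwards, can be scripted with Python's standard library (equivalent in spirit to `sha256sum`). The function names are ours.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_migration(source_dir: Path, target_dir: Path) -> list:
    """Compare checksums file-by-file after copying an archive to new
    media; return the relative paths that fail verification."""
    failures = []
    for src in source_dir.rglob("*"):
        if src.is_file():
            dst = target_dir / src.relative_to(source_dir)
            if not dst.exists() or sha256_of(src) != sha256_of(dst):
                failures.append(str(src.relative_to(source_dir)))
    return failures
```

An empty list from `verify_migration` confirms a bit-perfect copy; any listed path should be re-copied and re-verified before the old media is retired.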
DNA storage is an emerging platform for "cold" archives, offering exceptional longevity. This protocol describes the workflow from digital file to synthetic DNA and back [58] [59].
3.2.1. Research Reagent Solutions
Table 2: Essential Reagents for DNA Data Storage Protocols
| Item | Function / Description |
|---|---|
| Oligonucleotide Pool | Short, synthetic DNA strands encoding the digital data, typically purchased from commercial synthesis providers. |
| DNA Stabilization Matrix (e.g., Silica) | Protects DNA molecules from hydrolysis and other environmental damage, enabling room-temperature storage for centuries [58]. |
| Polymerase Chain Reaction (PCR) Reagents | Enzymes and nucleotides for targeted amplification of specific DNA sequences, enabling large-scale random access and retrieval [58]. |
| Next-Generation Sequencing (NGS) Kit | Reagents for reading the nucleotide sequence of the retrieved DNA strands to reconstruct the original digital data. |
3.2.2. Procedure
Part A: Encoding and Writing (Digital-to-Biological)
Part B: Retrieval and Decoding (Biological-to-Digital)
Diagram 1: DNA data storage and retrieval workflow.
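The encode/decode round trip in Parts A and B can be illustrated with a toy two-bits-per-base mapping. This is a deliberate simplification: real DNA storage codes add error correction, avoid homopolymer runs, and balance GC content [58].

```python
# Toy 2-bit-per-base scheme (illustration only; see caveats above)
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {b: k for k, b in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Part A: map a byte stream to a nucleotide sequence."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i+2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    """Part B: reconstruct the original bytes from a sequenced strand."""
    bits = "".join(BASE_TO_BITS[b] for b in strand)
    return bytes(int(bits[i:i+8], 2) for i in range(0, len(bits), 8))

strand = encode(b"dive")
assert decode(strand) == b"dive"       # lossless round trip
```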
Successful long-term preservation of biologging data relies on a combination of technical infrastructure, standardized practices, and strategic planning.
Table 3: Essential Tools and Practices for Biologging Data Archiving
| Tool / Practice | Function in Ensuring Data Longevity |
|---|---|
| Data Management Plan (DMP) | A formal document outlining the lifecycle management of data, including storage formats, refresh schedules, metadata standards, and sharing policies [57]. |
| Common Data Elements (CDEs) & Ontologies | Standardized terms and definitions (e.g., from ITIS, CF/ACDD conventions) that ensure data consistency and interpretability over time and across research groups [2] [57]. |
| Standardized Metadata | Detailed information about animal traits, instrument details, and deployment context, stored in internationally recognized formats to facilitate future understanding and reuse [2]. |
| Trusted Data Repositories | Platforms like Movebank or the Biologging intelligent Platform (BiP) that provide structured environments for storing, sharing, and preserving biologging data with standardized formats [2]. |
| Checksum Verification Tools | Software utilities that generate and verify digital fingerprints of files, critical for ensuring data integrity during migration and refresh cycles. |
Integrating these protocols requires a forward-looking strategy. For traditional media, establish a regular migration schedule based on media lifespan (e.g., every 3-5 years for HDDs) [57]. For emerging technologies, DNA storage is projected to evolve from prototypes to a practical supplementary archive for "cold" biologging data, such as raw sequencing archives and de-identified records, between 2025 and 2030 [58]. A hybrid approach is recommended: use disk or tape for active projects and frequent access, while planning for DNA or other advanced media for final, irreplaceable dataset archiving.
Diagram 2: A tiered data archiving strategy.
The expansion of biologging research generates vast datasets detailing animal movement, behavior, and physiology. Effectively archiving and sharing this data is crucial for ecological discovery and conservation, yet it presents a fundamental challenge: balancing the imperative for open data sharing with the ethical and practical need to protect sensitive information. This Application Note provides structured protocols for implementing managed access frameworks that support collaborative biologging science while safeguarding data integrity, individual animal welfare, and stakeholder interests. Adopting these approaches ensures data flow from acquisition to repository is efficient, standardized, and secure, enabling data to function as a living archive of animal life on Earth [13].
Establishing a clear data classification system is the foundational step in managing access. This involves categorizing datasets based on their sensitivity and potential risks, which in turn dictates the appropriate level of access control.
Table 1: Data Sensitivity Tiers and Corresponding Access Protocols
| Sensitivity Tier | Data Description | Primary Risks | Recommended Access Protocol |
|---|---|---|---|
| Public | Processed, non-sensitive movement paths; summary metrics; species occurrence data. | Misinterpretation; lack of attribution. | Open Access (e.g., CC BY 4.0 license); immediate download upon registration [2]. |
| Protected | High-resolution movement paths; behavioral data; locations of sensitive species (e.g., endangered). | Ecological disturbance; poaching; harassment. | Embargoed Access; requires formal data request and justification; possible time-bound embargo [13]. |
| Restricted | Precise locations of nesting sites, dens, or breeding colonies; data on species of high conservation concern. | Population-level harm; habitat disruption. | Managed Access; requires specific data use agreement; project-specific restrictions; may involve data owner coordination [2]. |
| Private | Raw, unprocessed sensor data; data subject to ongoing analysis for thesis or publication. | Pre-publication scooping; invalid conclusions from unvetted data. | Private/Closed; metadata may be public but data is accessible only to the owner and designated collaborators [2]. |
Modern data platforms provide the technical infrastructure to enforce the data sensitivity tiers described above. These platforms facilitate standardization, storage, and granular access control.
The Biologging intelligent Platform (BiP) is an integrated system designed to store standardized sensor data alongside rich metadata. A key feature of BiP is its flexible access model: datasets can be registered as open or private, and the platform manages user requests for access to non-public data [2].
Similar capabilities are embedded within platforms like Movebank, which manages billions of animal location records. The technical protocol for setting access rights on such platforms typically involves a dropdown menu or checkbox interface during the upload or dataset management phase, allowing the owner to select from visibility states such as "Public," "Embargoed," or "Private."
Interoperability and effective data sharing rely on community-approved standards. Adhering to frameworks like those proposed by the International Bio-logging Society's Data Standardisation Working Group is essential [13]. Standardized metadata ensures that data, whether public or private, can be discovered, understood, and reused.
Table 2: Essential Metadata Classes for Managed Access Biologging Datasets
| Metadata Class | Key Elements | Standards & Formats | Function in Access Management |
|---|---|---|---|
| Animal Traits | Species, sex, body mass, life history stage. | ITIS (Integrated Taxonomic Information System) | Enables filtering and responsible use; e.g., hiding data for sensitive demographic groups. |
| Instrument & Deployment | Device type, manufacturer, attachment method, deployment location/date. | Custom, but aligned with Climate and Forecast (CF) conventions. | Critical for data quality assessment and interpreting sensor limitations. |
| Data Collection | Sensor parameters, sampling frequency, calibration information. | Attribute Convention for Data Discovery (ACDD) | Allows for correct data fusion and analysis across different studies and devices. |
| Project & Access | Principal investigator, funding source, data license, embargo period. | ISO (International Organization for Standardization) | Directly encodes the terms of use and access restrictions for the dataset. |
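A dataset-level record covering the four metadata classes in Table 2 might be drafted as a plain dictionary before conversion to a platform-specific format. All field names and values here are illustrative, not an official BiP or Movebank schema:

```python
# Illustrative metadata record covering the four classes in Table 2.
metadata = {
    "animal_traits": {
        "species_itis_tsn": 174371,          # ITIS taxonomic serial number (example value)
        "sex": "female",
        "body_mass_kg": 3.2,
        "life_history_stage": "breeding adult",
    },
    "instrument_deployment": {
        "device_type": "GPS-accelerometer logger",
        "manufacturer": "ExampleTags Ltd.",  # hypothetical
        "attachment_method": "harness",
        "deployment_date": "2025-05-01",
    },
    "data_collection": {
        "sensors": ["gps", "tri-axial accelerometer"],
        "sampling_frequency_hz": 25,
        "calibration": "bench-calibrated 2025-04-20",
    },
    "project_access": {
        "principal_investigator": "J. Doe",  # placeholder
        "license": "CC BY 4.0",
        "embargo_until": "2026-05-01",
    },
}

required = {"animal_traits", "instrument_deployment",
            "data_collection", "project_access"}
assert required.issubset(metadata), "all four metadata classes must be present"
print("metadata classes complete")
```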
The movepub software package, for instance, provides a practical tool for preparing Movebank data for publication, ensuring these metadata standards are met before a dataset is shared publicly or with restricted groups [13].
The following protocol outlines the steps for a research group to process a raw biologging dataset and publish it under a managed access model on a platform like BiP or Movebank.
Objective: To transform a raw device output into a standardized, archived dataset with appropriate access controls. Primary Output: A discoverable dataset with rich metadata, where data access is tiered according to its sensitivity.
1. Data Download and Initialization: Create a project directory with the subfolders /Raw_Data, /Processing_Scripts, /Metadata, and /Outputs. Place the raw device output in the /Raw_Data folder, which should have restricted access (e.g., only accessible to the core research team).
2. Data Cleaning and Standardization: Clean and standardize the raw data, saving the processed files to the /Outputs directory.
3. Metadata Compilation: Compile the animal, instrument, deployment, and project metadata and store the records in the /Metadata folder.
4. Sensitivity Assessment and Access Tier Assignment: Evaluate each data stream against the sensitivity tiers in Table 1 and assign the corresponding access protocol.
5. Platform Upload and Access Configuration: Upload the standardized files from the /Outputs directory and configure the platform's access settings to match the assigned tier.
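Step 1 of the protocol (a project layout with a restricted /Raw_Data folder) can be sketched in Python; the 0o700 permission mode is one POSIX-specific way to limit a folder to the owning account:

```python
import os
import stat

def init_project(root: str) -> None:
    """Create the protocol's folder layout; restrict Raw_Data to the owner."""
    for sub in ("Raw_Data", "Processing_Scripts", "Metadata", "Outputs"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)
    # Owner-only read/write/execute on the raw-data folder (POSIX 0o700).
    os.chmod(os.path.join(root, "Raw_Data"), stat.S_IRWXU)

init_project("biologging_project")
print(sorted(os.listdir("biologging_project")))
# -> ['Metadata', 'Outputs', 'Processing_Scripts', 'Raw_Data']
```

In a real deployment the restriction would more likely be enforced through group ownership or platform-level permissions than a single-owner mode, but the principle is the same: raw data gets the narrowest access by default.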
Successful implementation of managed access frameworks relies on a combination of software tools, data standards, and policy documents.
Table 3: Key Research Reagent Solutions for Managed Data Access
| Tool/Reagent | Type | Primary Function | Access Management Relevance |
|---|---|---|---|
| Biologging intelligent Platform (BiP) | Software Platform | Integrated platform for storing, sharing, visualizing, and analyzing biologging data. | Provides built-in functionality for setting datasets as open or private, and manages user requests for access [2]. |
| Movebank | Data Repository | A global repository for animal tracking and other biologging data. | Offers granular, study-level permissions for defining user roles (viewer, downloader, editor) and setting data visibility [13]. |
| ETN R Package | Software Tool | An R package to access data from the European Tracking Network. | Demonstrates how standardized API access can be programmed for authorized users, a model for managed data retrieval [13]. |
| movepub | Software Tool | An R package to prepare Movebank data for publication. | Ensures data and metadata meet quality and standardization requirements before being shared under any access level [13]. |
| CC BY 4.0 License | Legal Document | A creative commons license requiring attribution. | The standard license for "Public" tier data, permitting reuse while ensuring attribution [2]. |
| Data Use Agreement (DUA) | Legal Document | A formal contract outlining the terms and conditions for using a dataset. | The primary mechanism for governing access to "Restricted" and some "Protected" tier data, enforcing responsible use [2]. |
Balancing data openness and protection is not a one-size-fits-all endeavor but a dynamic process requiring careful judgment. By implementing the tiered sensitivity framework, utilizing modern data platforms, and adhering to community standards outlined in this protocol, researchers can navigate this complex landscape. This approach maximizes the collaborative and conservation potential of biologging data while rigorously upholding responsibilities to protect sensitive biological information and respect the contributions of data creators.
The application of supervised machine learning (ML) to biologging data, particularly animal-borne accelerometry, has revolutionized our ability to decipher fine-scale animal behaviors at unprecedented scales [61] [51]. However, this powerful approach brings forth significant challenges in model reliability and generalizability. Within the broader context of archiving and sharing biologging data, rigorous validation is not merely a technical step but a fundamental requirement for ensuring that shared data and models are robust, reproducible, and truly useful for the scientific community.
A core challenge is overfitting, where a model over-adapts to the training data, memorizing specific instances rather than learning the underlying generalizable patterns [51]. Such models may appear highly accurate during training but perform poorly on new, unseen data, directly undermining the value of shared models and datasets. A systematic review of 119 studies using accelerometer-based supervised ML revealed that 79% did not employ adequate validation techniques to robustly identify potential overfitting [61] [51]. This highlights a critical gap in the current research practices and underscores the urgent need for standardized protocols to ensure the quality and reliability of biologging research.
In machine learning, overfitting occurs when a model's complexity approaches or surpasses the complexity of the data itself [51]. This causes the model to capture not only the underlying signal but also the noise and specific nuances of the training dataset. The tell-tale sign of an overfit model is a significant drop in performance between the training set and an independent test set, indicating low generalizability to new data [51].
Data leakage arises when the evaluation set is not kept fully independent from the training process, allowing information from the test set to inadvertently influence the model during training [51]. This compromises the validity of the performance evaluation because the model is assessed on data that is more similar to the training data than truly unseen data would be. Consequently, data leakage masks the effects of overfitting and leads to a significant overestimation of a model's real-world performance [51].
Table 1: Common Pitfalls in Machine Learning Validation for Biologging
| Pitfall | Description | Consequence |
|---|---|---|
| Non-independent Test Set | Test data is not properly isolated during training and/or feature selection. | Masks overfitting, inflates performance estimates. |
| Non-representative Data Splitting | Training and test sets do not represent the overall data distribution (e.g., splitting data from a single individual). | Poor generalizability to new individuals or conditions. |
| Incorrect Hyperparameter Tuning | Hyperparameters are tuned directly on the test set rather than a dedicated validation set. | Optimistic bias, as the test set is no longer independent. |
| Inappropriate Performance Metrics | Reliance on metrics that do not reflect the class imbalance or the biological question. | Misleading interpretation of model utility. |
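One frequent route into the "non-independent test set" pitfall is fitting preprocessing steps (e.g., feature scaling) on the full dataset before splitting. A minimal leakage-safe sketch, assuming a Python/scikit-learn environment and using synthetic data in place of real accelerometer features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for accelerometer-derived features and behavior labels.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Split FIRST; the test set never touches any fitting step.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# The pipeline refits the scaler inside each CV training fold,
# so no test-fold statistics leak into preprocessing.
model = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0))
cv_scores = cross_val_score(model, X_tr, y_tr, cv=5)

model.fit(X_tr, y_tr)
print(f"CV accuracy: {cv_scores.mean():.2f}, "
      f"held-out accuracy: {model.score(X_te, y_te):.2f}")
```

Wrapping the scaler and classifier in one Pipeline object is the key design choice: anything fitted on data is then automatically refitted per fold, which is exactly what keeps the evaluation independent.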
To address these challenges, we propose the following standardized workflow for the rigorous validation of supervised ML models in biologging. This protocol is designed to be applicable to a wide range of biologging data, including accelerometry, and aligns with efforts to promote data standardization and sharing [13].
The following diagram outlines the core workflow for partitioning data to ensure a robust validation process.
This section provides a step-by-step methodology for implementing the validation workflow described above.
Protocol 1: Nested Cross-Validation with a Holdout Set
1. Objective: To train a supervised ML model for behavior classification from biologging data (e.g., accelerometry) and obtain a robust, unbiased estimate of its performance on unseen data.
2. Experimental Principles: The protocol is designed to prevent overfitting and data leakage by strictly separating data used for training, model selection, and final evaluation. It combines the strengths of cross-validation for reliable model development and a holdout set for final performance assessment.
3. Reagents and Materials: A labeled biologging dataset (e.g., accelerometer signals paired with ground-truthed behavior annotations), a computational framework such as Python's scikit-learn or R's caret, and hyperparameter tuning libraries.
4. Procedure: Set aside a holdout set before any model development. On the remaining data, perform nested cross-validation: tune hyperparameters only within the inner folds and estimate generalization performance with the outer folds. Refit the selected model on the full development set, then evaluate it exactly once on the holdout set.
5. Data Analysis: Report holdout performance using a full metric suite (accuracy, precision, recall, F1-score, Cohen's kappa), and compare training and holdout scores; a large gap indicates residual overfitting.
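The partitioning logic of Protocol 1 can be sketched with scikit-learn: a holdout set is isolated first, an inner cross-validation (GridSearchCV) tunes hyperparameters, an outer cross-validation estimates generalization, and the holdout set is scored exactly once. The dataset and parameter grid below are illustrative placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import (GridSearchCV, cross_val_score,
                                     train_test_split)
from sklearn.svm import SVC

# Synthetic stand-in for a labeled biologging dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Step 1: isolate a final holdout set, untouched until the very end.
X_dev, X_hold, y_dev, y_hold = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Step 2: inner CV tunes hyperparameters; outer CV scores the tuned model.
inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)
outer_scores = cross_val_score(inner, X_dev, y_dev, cv=5)

# Step 3: refit on all development data, then evaluate once on the holdout.
inner.fit(X_dev, y_dev)
holdout_acc = inner.score(X_hold, y_hold)
print(f"nested-CV accuracy: {outer_scores.mean():.2f}, "
      f"holdout accuracy: {holdout_acc:.2f}")
```

Because hyperparameter selection happens only inside the inner folds, the outer scores are unbiased by tuning, and the single holdout evaluation gives the final, report-ready performance estimate.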
The following table details key solutions and materials required for implementing rigorous ML validation in biologging research.
Table 2: Essential Research Reagents and Computational Tools
| Item Name | Function/Description | Example/Note |
|---|---|---|
| Labeled Accelerometer Data | The fundamental reagent for supervised learning. Consists of raw acceleration signals paired with ground-truthed behavior labels. | Labels obtained via direct observation, video synchronisation, or captive surrogate training [51]. |
| Data Standardization Tools | Software and protocols to format, annotate, and share data consistently, enabling reuse and collaboration. | Tools from the Data Standardisation Working Group [13]; ETN package for tracking data [13]. |
| Computational Framework | The programming environment for implementing ML algorithms and validation routines. | Python's scikit-learn, R's caret or tidymodels. |
| Hyperparameter Tuning Libraries | Tools that automate the search for optimal model parameters. | Scikit-learn's GridSearchCV or RandomizedSearchCV. |
| Performance Metric Suite | A set of functions to quantitatively evaluate model performance from different angles. | Includes accuracy, precision, recall, F1-score, and Cohen's kappa. |
| Data Reuse Information (DRI) Tag | A machine-readable metadata tag for public data, indicating the creator's preference for contact before reuse [62]. | Associated with an ORCID, the DRI tag facilitates equitable data reuse and collaboration [62]. |
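The performance metric suite listed in Table 2 can be exercised directly with scikit-learn's metrics module; the behavior labels below are a toy illustration, not real data:

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score)

# Toy ground-truth vs. predicted behavior labels (illustrative only).
y_true = ["rest", "forage", "forage", "fly", "rest", "fly", "forage", "rest"]
y_pred = ["rest", "forage", "fly",    "fly", "rest", "fly", "forage", "forage"]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1       :", f1_score(y_true, y_pred, average="macro"))
print("kappa    :", cohen_kappa_score(y_true, y_pred))
```

Macro averaging weights each behavior class equally, which matters for biologging datasets where rare but biologically important behaviors are heavily outnumbered by resting or traveling.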
The push for rigorous ML validation must be integrated into the broader movement for standardized data archiving and sharing in biologging. Inadequate validation not only produces unreliable models but also degrades the value of shared datasets, as subsequent users cannot trust the models built upon them.
The proposed Data Reuse Information (DRI) tag is a key innovation in this space [62]. By associating a machine-readable tag with public sequence data (and potentially other data types), it clarifies the conditions for reuse and provides a mechanism for data consumers to contact creators. This fosters a collaborative environment where rigorous validation is the norm, as it builds trust between data creators and consumers. Adopting such standards, alongside the FAIR data principles, ensures that biologging data collections function as living archives of animal life on Earth [13] that support reproducible and impactful science.
The guidelines and protocols outlined here provide a concrete path toward this goal, ensuring that the machine learning models powering the next generation of biological discovery are as robust and reliable as the data they analyze.
The FAIR Guiding Principles—Findable, Accessible, Interoperable, and Reusable—were established in 2016 to provide a framework for enhancing the reusability of scientific data holdings and improving the capacity of computational systems to automatically find and use data [63]. In the specific context of biologging research, which generates complex multi-modal data through animal-attached tags and sensors, implementing FAIR principles addresses the critical challenge of ensuring that valuable tracking, physiological, and environmental datasets remain discoverable, interpretable, and useful beyond their initial collection purpose [64]. The FAIR principles emphasize machine-actionability—the capacity of computational systems to find, access, interoperate, and reuse data with minimal human intervention—which is particularly important given the increasing volume, complexity, and creation speed of biologging data [65].
Effective data management is crucial for scientific integrity and reproducibility, especially in biologging research where studies often involve intricate protocols, extensive metadata, and large datasets [64]. Biologging data reuse is essential for various purposes including repurposing, meta-analyses, longitudinal studies, predictive modeling, and training machine learning algorithms [64]. Proper FAIR data management supports these goals by embedding metadata, provenance, and context to help research teams track how data was collected, processed, and interpreted [63].
The FAIR principles are defined by four interdependent components, each with specific requirements for implementation:
Findable: Data should be easy for both humans and computers to discover. This requires assigning globally unique and persistent identifiers (such as DOIs) to all datasets and ensuring they are indexed with rich, machine-actionable metadata [65] [63]. The first step in (re)using data is finding them, making findability an essential foundation for all other FAIR principles.
Accessible: Data must be retrievable by users through standardized communication protocols, even when behind authentication and authorization layers [65] [63]. The accessible principle emphasizes that once a user finds the required data, they should know how to access them, potentially including authentication and authorization procedures.
Interoperable: Data needs to be machine-readable and compatible with other systems and formats beyond the initial experimental environment [65] [63]. This requires describing data using standardized vocabularies and ontologies, and storing them in formats that can be seamlessly combined with other datasets, which is particularly important for integrating diverse biologging data types.
Reusable: Data must be easily replicated and studied in new contexts [65] [63]. This necessitates clarity on licensing and usage rights, robust documentation of data provenance and quality, and annotation with rich, well-described metadata. Reusability represents the ultimate goal of FAIR, optimizing the potential for data to be repurposed in different settings.
A critical distinction exists between FAIR data and open data. FAIR data is focused on making data findable, accessible, interoperable, and reusable, not necessarily publicly available [63]. FAIR principles aim to ensure data is well-structured, richly described, and machine-actionable to maximize utility in complex research environments.
In contrast, open data is made freely available for anyone to access, use, modify, and share without restrictions [63]. While open data serves the public good through free accessibility, it may not always be compatible with privacy rules, intellectual property protection, and other governance restrictions—particularly relevant for biologging data involving endangered species or sensitive locations.
Table 1: Comparison of FAIR Data and Open Data
| Aspect | FAIR Data | Open Data |
|---|---|---|
| Primary Focus | Data structure, metadata, and machine-actionability | Unrestricted public access |
| Access Restrictions | Can include authentication and authorization | Generally none |
| Main User | Computational systems and researchers | Anyone |
| Metadata Requirements | Rich, structured, and standardized | Variable |
| Compatibility with Privacy Rules | Yes, through access controls | Limited |
Systematic assessment of FAIR implementation requires structured evaluation tools. Research has demonstrated the development of an 11-item questionnaire with strong internal consistency (Cronbach's α = 0.85-0.87) to evaluate the FAIRness of research data [66]. This tool groups questions according to the four FAIR attributes, providing a quantitative means to assess compliance levels.
The assessment framework can be implemented through a structured workflow that guides researchers through the evaluation process for each FAIR component, identifies gaps, implements improvements, and verifies compliance through iterative refinement.
Implementation of FAIR principles can be measured using standardized metrics that assess compliance across the four FAIR components. The following table summarizes key metrics and assessment methods for evaluating FAIR implementation in biologging data:
Table 2: FAIR Implementation Metrics and Assessment Methods
| FAIR Principle | Assessment Metric | Measurement Method | Target Compliance |
|---|---|---|---|
| Findable | Persistent identifiers | Check for DOI or other persistent ID | 100% of datasets |
| Findable | Rich metadata | Metadata completeness score | >80% of required fields |
| Findable | Searchable indexing | Repository indexing verification | Fully indexed |
| Accessible | Standardized retrieval | Protocol compliance check | HTTPS/API available |
| Accessible | Authentication clarity | Access procedure documentation | Clear access pathway |
| Accessible | Metadata persistence | Metadata remain accessible even if data are withdrawn | Always available |
| Interoperable | Vocabulary standards | Ontology usage audit | Standard terms >90% |
| Interoperable | Format compatibility | Machine-readability test | Fully machine-readable |
| Interoperable | Qualified references | Related resource links | All references valid |
| Reusable | Usage licenses | License presence and clarity | Clear license present |
| Reusable | Provenance documentation | Provenance completeness | Full workflow documented |
| Reusable | Community standards | Domain standard adherence | Full compliance |
Research indicates that implementing such assessment frameworks reveals significant variability in FAIR compliance across different data types, with metadata richness and vocabulary standardization often presenting the greatest challenges [66]. The same study found that structured assessment tools demonstrated strong internal consistency across all FAIR domains (Cronbach's α: Findable=0.85, Accessible=0.87, Interoperable=0.86, Reusable=0.85), supporting their reliability for evaluating FAIR implementation [66].
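The "metadata completeness score" metric in Table 2 (>80% of required fields) can be operationalized with a small checker; the required-field list below is illustrative, not a mandated schema:

```python
REQUIRED_FIELDS = [  # illustrative required metadata fields
    "title", "creator", "doi", "species", "license",
    "deployment_date", "sensor_type", "sampling_frequency",
    "spatial_coverage", "temporal_coverage",
]

def completeness(record: dict) -> float:
    """Fraction of required fields present and non-empty."""
    filled = sum(1 for f in REQUIRED_FIELDS if record.get(f))
    return filled / len(REQUIRED_FIELDS)

record = {
    "title": "Seabird GPS tracks, 2025 field season",
    "creator": "J. Doe",                # placeholder
    "doi": "10.0000/example",           # placeholder DOI
    "species": "example species",
    "license": "CC BY 4.0",
    "deployment_date": "2025-05-01",
    "sensor_type": "gps",
    "sampling_frequency": "1 Hz",
    "spatial_coverage": "",             # missing: empty counts as absent
    "temporal_coverage": "2025-05/2025-08",
}

score = completeness(record)
print(f"completeness: {score:.0%}, meets >80% target: {score > 0.8}")
```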
Purpose: To establish a comprehensive Data Management Plan (DMP) that ensures biologging data compliance with FAIR principles throughout the research lifecycle.
Materials and Reagents:
Procedure:
1. Pre-Data Collection Planning
2. Metadata Specification
3. Data Collection and Documentation
4. Quality Control and Validation
5. Data Publication and Sharing
Purpose: To transform existing biologging data collections into FAIR-compliant resources, maximizing their potential for reuse and integration.
Duration: 4-8 weeks depending on data volume and complexity
Procedure:
1. Data Inventory and Assessment
2. Metadata Enhancement
3. Format Standardization
4. Identifier Assignment and Repository Deposition
Troubleshooting:
Successful implementation of FAIR principles requires both technical infrastructure and methodological tools. The following table catalogs essential "research reagent solutions" for biologging researchers implementing FAIR data practices:
Table 3: Essential Research Reagent Solutions for FAIR Data Management
| Solution Category | Specific Tools/Resources | Function in FAIR Implementation | Application Notes |
|---|---|---|---|
| Metadata Standards | REMBI [67], DwC, Darwin Core | Provide structured frameworks for metadata annotation | REMBI specifically designed for biological images; essential for interoperability |
| Persistent Identifiers | DOI, UUID, ARK | Create permanent, unique references to datasets | DOI most widely recognized; required by many repositories and publishers |
| Data Repositories | BioImage Archive [67], EMPIAR, Zenodo [66] | Provide preservation, access control, and indexing | Domain-specific repositories often provide enhanced curation and standardization |
| Ontology Services | OBO Foundry, EDAM, EBI Ontology Lookup Service | Standardize terminology and enable semantic interoperability | Critical for cross-dataset integration and machine-actionability |
| Data Management Tools | Electronic Lab Notebooks, REDCap [66], DataSifter [68] | Support documentation, organization, and privacy-preserving sharing | REDCap provides secure data collection; DataSifter enables privacy protection |
| Synthetic Data Generators | DataSifter, Synthetic Data Vault (SDV) [68] | Create "digital twin" datasets for sharing while minimizing re-identification risk | DataSifter shows particular strength with longitudinal data [68] |
| Assessment Tools | F-UJI, ARDC FAIR Self-Assessment Tool [66] | Evaluate FAIR compliance and identify improvement areas | Essential for benchmarking and continuous improvement |
Implementing FAIR principles requires integration throughout the entire data lifecycle. The following workflow diagram illustrates a comprehensive data management process tailored to biologging research, from project initiation through to data sharing and reuse:
Implementing FAIR principles in biologging research presents several significant challenges:
Fragmented Data Systems and Formats: Biologging research often involves multiple instruments, platforms, and measurement systems generating data in disparate formats [63]. This fragmentation creates substantial interoperability challenges when attempting to integrate datasets for analysis.
Lack of Standardized Metadata or Ontologies: Different research teams frequently use different vocabularies for the same concepts, creating semantic mismatches and ontology gaps that hinder data integration [63]. Without consistent terminology, automated systems cannot properly interpret or combine datasets.
High Cost and Time Investment: Transforming legacy data into FAIR-compliant formats requires substantial resources [63]. The curation process involves both technical expertise and domain knowledge, creating resource constraints particularly for smaller research groups.
Cultural Resistance and Awareness Gaps: Research teams may lack awareness of FAIR principles or perceive them as burdensome administrative requirements rather than scientific enablers [66] [63]. Traditional academic reward systems often prioritize publication over data sharing, reducing motivation for FAIR implementation.
Addressing these challenges requires systematic approaches:
Adopt Interoperable Standards and Platforms: Implement standardized protocols, metadata frameworks, and data formats to ensure interoperability across systems [69]. Leverage common platforms that facilitate data analysis, visualization, and integration.
Develop Comprehensive Data Management Plans: Create detailed Data Management Plans (DMPs) that describe systems used, data flow, management roles and responsibilities, plus methods for back-ups, storage and archiving while ensuring anonymization and privacy [66].
Utilize Privacy-Preserving Techniques: For sensitive biologging data, employ techniques like statistical obfuscation and synthetic data generation. DataSifter has demonstrated strong privacy protection (0.83 privacy score) while preserving key statistical signals (83.1% confidence interval overlap in regression models) [68].
Implement Incremental FAIRification: Rather than attempting complete FAIR compliance immediately, prioritize high-value datasets and implement improvements progressively. This iterative approach builds experience while delivering tangible benefits.
The implementation of FAIR principles represents a fundamental shift in how biologging research data is managed, shared, and utilized. By making data Findable, Accessible, Interoperable, and Reusable, researchers can accelerate scientific discovery, enhance research reproducibility, and maximize the value of increasingly complex and costly biologging datasets. The protocols and frameworks presented in this article provide practical pathways for researchers to implement FAIR principles in their specific contexts.
As the volume and complexity of biologging data continue to grow, FAIR implementation will become increasingly essential for extracting maximum scientific insight from research investments. By adopting these practices, biologging researchers can contribute to a more open, collaborative, and efficient research ecosystem that benefits the entire scientific community and ultimately enhances our understanding of biological systems.
The expansion of biologging—the use of animal-borne data loggers—has generated immense volumes of data on animal movement, behavior, physiology, and the surrounding environment. Effectively archiving and sharing this data is critical for advancing ecological research, informing conservation policy, and enabling cross-disciplinary science. This application note examines the capabilities of modern biologging data platforms, with a focus on the types of data managed, the analytical functions provided, and the flexibility of data sharing protocols. We present structured comparisons of platform attributes, detailed protocols for data standardization and benchmarking, and visual workflows to guide researchers in selecting and utilizing these platforms. The findings underscore that adherence to standardized data and metadata formats, coupled with integrated analytics and flexible sharing options, is paramount for transforming dispersed biologging data into a cohesive, living archive of life on Earth [10] [13].
Biologging data offers unparalleled insights into animal life, serving fields from ecology to oceanography. However, the heterogeneity of data formats, sensor types, and collection protocols poses a significant challenge to its integration and reuse. The vision for the future is to establish biologging data collections as dynamic, living archives [10]. Realizing this vision depends on robust benchmarking of the platforms that store, process, and disseminate this data. Key capabilities to assess include the diversity of data types a platform can ingest, the analytical tools it provides for data processing and environmental parameter estimation, and the flexibility it offers for data sharing and access control [2] [3]. This document provides a detailed examination of these capabilities, offering application notes and protocols for researchers engaged in biologging data management.
A critical function of any biologging platform is its ability to handle a wide array of data and metadata types in a standardized manner. This ensures interoperability and facilitates secondary use across disciplines.
Biologging platforms manage two primary classes of information: raw sensor data and contextual metadata. The sensor data constitutes the core measurements, while metadata provides the essential context that makes the sensor data interpretable and reusable.
Table 1: Common Data Types in Biologging Platforms
| Data Category | Specific Data Types | Description |
|---|---|---|
| Spatial Data | Latitude, Longitude, Altitude/Depth | Horizontal position and vertical dimension data [2]. |
| Movement Data | Speed, Acceleration (3-axis), Angular Velocity | Kinematic measurements of animal movement and behavior [2]. |
| Environmental Data | Water Temperature, Salinity, Atmospheric Pressure, Light Intensity | Parameters characterizing the animal's physical environment [2] [3]. |
| Physiological Data | Body Temperature, Heart Rate | Metrics reflecting the internal state and physiology of the animal [2]. |
Table 2: Essential Metadata for Biologging Data
| Metadata Category | Examples | Purpose |
|---|---|---|
| Animal Metadata | Species, Sex, Body Size, Breeding History | Provides biological context for interpreting sensor data [2]. |
| Instrument Metadata | Device Type, Manufacturer, Sensor Specifications | Details the data source and its technical parameters [2]. |
| Deployment Metadata | Deployment Date/Location, Recapture Date, Researcher Contact | Documents the experimental context and provenance [2]. |
Platforms like the Biologging intelligent Platform (BiP) address the challenge of format inconsistency by enforcing international standards for both data and metadata, such as the Integrated Taxonomic Information System (ITIS) and Climate and Forecast (CF) Metadata Conventions [2]. This standardization is a foundational step for data integration and preservation.
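In practice, this standardization amounts to attaching CF-style attributes to each measured variable and ACDD-style global attributes to the dataset. The sketch below illustrates the idea with plain dictionaries; a platform's own documentation defines the exact required attribute set:

```python
# ACDD-style global attributes (dataset level) -- illustrative subset.
global_attrs = {
    "title": "Seabird dive-depth records, 2025 field season",
    "creator_name": "J. Doe",            # placeholder
    "license": "CC BY 4.0",
    "time_coverage_start": "2025-05-01T00:00:00Z",
    "time_coverage_end": "2025-08-31T23:59:59Z",
}

# CF-style variable attributes: a standard_name and units make each
# measurement machine-interpretable across datasets.
variable_attrs = {
    "temp": {"standard_name": "sea_water_temperature",
             "units": "degree_Celsius"},
    "depth": {"standard_name": "depth", "units": "m",
              "positive": "down"},  # CF convention for vertical axes
}

for name, attrs in variable_attrs.items():
    assert "units" in attrs, f"{name}: measured variables need CF units"
print("variables pass basic CF attribute check")
```

These dictionaries map directly onto the global and per-variable attributes of a NetCDF file, which is the usual carrier format for CF/ACDD-compliant data.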
The value of biologging data is maximized when it can be shared and accessed by a broad community, while respecting the needs and rights of data owners.
Platforms implement a range of sharing modes, from fully open datasets to private datasets whose metadata remain publicly discoverable while the underlying data are released only at the owner's discretion or upon an approved request [2].
This flexible framework supports the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, ensuring data can be both protected and widely utilized [70].
Beyond data storage, modern platforms integrate analytical tools that extract meaningful biological and environmental information from raw sensor data.
A key feature of advanced platforms like BiP is the inclusion of OLAP tools. These tools calculate higher-order environmental and behavioral parameters by applying published algorithms to the raw sensor data [2], for instance converting animal-borne temperature and depth records into estimates of oceanographic conditions along the animal's track.
This integrated analysis transforms raw telemetry data into actionable knowledge for fields like conservation biology and physical oceanography.
Benchmarking is the conceptual framework for evaluating the performance of computational methods against a defined ground truth [70]. In the context of biologging, this can involve comparing different analytical algorithms for classifying animal behavior or estimating environmental variables.
The process requires a gold standard (ground truth) dataset, an agreed set of performance metrics (see Table 3), and a reproducible workflow for executing each method under comparison [70].
Table 3: Common Performance Metrics for Benchmarking
| Metric | Calculation | Interpretation |
|---|---|---|
| Precision | True Positives / (True Positives + False Positives) | The proportion of identified items that are correct. |
| Recall | True Positives / (True Positives + False Negatives) | The proportion of true items that were successfully identified. |
| F1 Score | 2 * (Precision * Recall) / (Precision + Recall) | The harmonic mean of precision and recall. |
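The formulas in Table 3 translate directly into code. A minimal sketch, using invented confusion counts for illustration:

```python
def precision(tp: int, fp: int) -> float:
    """Proportion of identified items that are correct."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Proportion of true items that were successfully identified."""
    return tp / (tp + fn)

def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Example: a behavior classifier flags 80 true dives, raises 20 false
# alarms, and misses 20 real dives (hypothetical counts).
tp, fp, fn = 80, 20, 20
print(f"precision={precision(tp, fp):.2f}, "
      f"recall={recall(tp, fn):.2f}, F1={f1(tp, fp, fn):.2f}")
# -> precision=0.80, recall=0.80, F1=0.80
```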
This section provides detailed methodologies for key procedures in biologging data management, from initial standardization to performance benchmarking.
Objective: To prepare raw biologging data and associated metadata for upload to a standardized platform, ensuring interoperability and reuse.
Materials:
Procedure:
Objective: To objectively evaluate the performance of a computational tool (e.g., a behavior classifier) against a gold standard dataset.
Materials:
Procedure:
The following diagrams, generated using Graphviz, illustrate the logical relationships and data flows in the key processes described.
Diagram 1: This workflow outlines the journey of biologging data from its raw state to becoming part of a living archive. Raw data and metadata are ingested by a standardized platform, which enables both analytical processing (OLAP) and flexible data sharing, ultimately contributing to a preserved data collection.
Diagram 2: This flowchart details the formal process of benchmarking an analytical tool. It begins with a clear definition of the benchmark and uses gold standard data within a controlled workflow execution to generate performance results, which are then compiled into a report.
Successful biologging data management and analysis relies on a suite of key resources, platforms, and reagents.
Table 4: Essential Research Reagents and Solutions for Biologging Data Science
| Item Name | Function | Example/Reference |
|---|---|---|
| BiP (Biologging intelligent Platform) | A standardized platform for data storage, sharing, visualization, and analysis (OLAP). | [2] |
| Movebank | A global database for animal tracking data, supporting management and analysis of massive datasets. | [10] [13] |
| International Bio-logging Society Standards | Community-developed frameworks for data standardization, ensuring interoperability. | [13] |
| Gold Standard Datasets | Reference data (e.g., from GIB Consortium, mock communities) for validating and benchmarking tools. | [71] [72] |
| Workflow Management Systems | Software for creating reproducible and scalable data analysis pipelines (e.g., Nextflow, Snakemake). | [70] |
| Containerization (Docker/Singularity) | Technology for packaging software and dependencies to guarantee reproducible computational environments. | [70] |
Biologging, the use of animal-borne electronic tags to document movements, behavior, physiology, and environments, has rapidly expanded over the past six decades [73]. This growth presents a critical opportunity to build digital archives of animal life but also introduces significant challenges concerning animal welfare and data quality. Device impacts can alter animal physiology, behavior, and demography—the very metrics researchers aim to measure [73]. The establishment of minimum reporting standards emerges as a low-cost, high-impact strategy to address these challenges, promoting both ethical research practices and the collection of high-quality, reusable data [73].
Framed within a broader thesis on archiving and sharing approaches, these standards are foundational for creating interoperable data collections. They ensure that biologging data can fulfill its potential in ecological discovery, conservation, and contributing to fields like oceanography and meteorology [2] [10]. This document outlines application notes and protocols to implement these standards effectively.
The development of reporting standards is informed by empirical studies on how biologging devices affect studied subjects. The following table summarizes key quantitative findings and impact relationships identified from a comprehensive review of the literature.
Table 1: Quantified Impacts of Biologging Devices and Related Findings
| Subject of Measurement | Key Quantitative Finding or Relationship | Source / Context |
|---|---|---|
| General Device Impacts | Review of 175 biologging impact studies over 25 years reveals broad, multispecies connections between instrument characteristics and animal physiology, behavior, and/or demography. | [73] |
| Data Archiving Quality (Completeness) | 56.4% of 362 assessed open datasets in ecology and evolution were complete (mean score of 3.4/5). | [74] |
| Data Archiving Quality (Reusability) | 45.9% of the same 362 datasets were reusable (mean score of 3.1/5). | [74] |
| Temporal Trend in Reusability | Datasets associated with more recent studies were slightly more reusable than older studies, though improvements are slow. | [74] |
| Data Volume in Movebank | As of January 2025, Movebank, a primary biologging database, contained 7.5 billion location points and 7.4 billion other sensor records across 1,478 taxa. | [2] |
Informed by the documented impacts on welfare and data quality, a minimum reporting standard was distilled into eight best practices [73]. The core of this standard can be implemented as a machine-readable checklist for researchers to include with manuscripts and data submissions.
Table 2: Minimum Reporting Standard Checklist for Biologging Studies
| Category | Reporting Field | Required? | Description & Examples | Alignment with Broader Standards |
|---|---|---|---|---|
| Animal Morphology | Species, Sex, Age, Body Mass | Required | Key traits influencing device impact; use controlled vocabularies (e.g., ITIS). | Integrated Taxonomic Information System (ITIS) [2] [75]. |
| Device Properties | Mass, Dimensions, Attachment Method | Required | Device-to-body mass ratio, attachment type (harness, collar, glue), dimensions. | Informs animal welfare assessment [73]. |
| Deployment Details | Deployment Date, Location, Duration | Required | Context for animal disturbance and data interpretation. | Darwin Core for location data [76] [77]. |
| Data Collection Parameters | Sampled Sensors, Resolution | Required | Sensors used (GPS, accelerometer), sampling frequency, duty cycling. | Movebank Vocabulary [75]. |
| Animal Welfare Assessment | Post-Release Behavior, Device Effects | Recommended | Qualitative/quantitative notes on behavior post-handling, any visible effects. | Promotes ethical review and refinement [73]. |
| Metadata for Reuse | Data Dictionary, License | Required | Explanation of column headers, abbreviations, units; license for reuse (e.g., CC BY). | FAIR Principles [77] [74]. |
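Because the standard is intended to be machine-readable, the Table 2 checklist can be encoded as a simple schema and validated automatically at submission time. The field names and validator below are illustrative assumptions, not an official schema:

```python
# Hypothetical machine-readable rendering of the Table 2 checklist.
REPORTING_STANDARD = {
    "animal_morphology": {"fields": ["species", "sex", "age", "body_mass_g"], "required": True},
    "device_properties": {"fields": ["device_mass_g", "dimensions_mm", "attachment_method"], "required": True},
    "deployment_details": {"fields": ["deployment_date", "location", "duration_days"], "required": True},
    "data_collection": {"fields": ["sensors", "sampling_hz"], "required": True},
    "welfare_assessment": {"fields": ["post_release_behavior", "device_effects"], "required": False},
    "reuse_metadata": {"fields": ["data_dictionary", "license"], "required": True},
}

def missing_required_fields(submission: dict) -> list:
    """Return required fields absent from a study's metadata submission."""
    missing = []
    for category, spec in REPORTING_STANDARD.items():
        if not spec["required"]:
            continue  # 'Recommended' categories are not enforced
        for field in spec["fields"]:
            if field not in submission.get(category, {}):
                missing.append(f"{category}.{field}")
    return missing

submission = {
    "animal_morphology": {"species": "Phoebastria immutabilis", "sex": "F",
                          "age": "adult", "body_mass_g": 3050},
    "device_properties": {"device_mass_g": 22, "dimensions_mm": "58x28x10",
                          "attachment_method": "tape"},
    "deployment_details": {"deployment_date": "2024-12-01",
                           "location": "Midway Atoll", "duration_days": 14},
    "data_collection": {"sensors": ["GPS"], "sampling_hz": 1},
    "reuse_metadata": {"license": "CC BY 4.0"},
}
gaps = missing_required_fields(submission)  # flags the absent data dictionary
```

A check like this could run in a repository's ingest pipeline, rejecting or flagging submissions before curation effort is spent on them.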
This protocol provides a detailed methodology for implementing the minimum reporting standard throughout a biologging study's lifecycle, from pre-deployment planning to data archiving.
Table 3: Key Resources for Biologging Standards and Data Management
| Tool / Resource Name | Type | Function & Key Features | Access / Reference |
|---|---|---|---|
| Movebank | Data Repository | Global platform for managing, sharing, and analyzing animal tracking data. Harmonizes data to a shared vocabulary. | [2] [10] [75] |
| Biologging intelligent Platform (BiP) | Data Repository | Integrated platform for sharing, visualizing, and analyzing standardized biologging data and metadata. Includes Online Analytical Processing (OLAP) tools. | [2] |
| Minimum Reporting Standard Checklist | Reporting Tool | Machine-readable checklist to standardize reporting of device, deployment, and animal details to promote welfare and data quality. | [73] |
| Wildlife Disease Data Standard | Data Standard | Example of a parallel minimum data standard with 40 data fields, demonstrating application of the "tidy data" principle for disaggregated data. | [76] [77] |
| International Bio-Logging Society (IBLS) Data Standardisation WG | Community Body | Community-led group coordinating the development and adoption of data standards and protocols for the biologging community. | [13] [10] |
| ETN / movepub / etn R package | Software Tools | Example software packages for preparing, accessing, and working with standardized biologging data, particularly from Movebank and the European Tracking Network. | [13] |
The field of biologging, which involves attaching data recorders to animals to monitor their movements, behavior, and physiology, generates vast amounts of complex data. The transition from isolated, proprietary data formats to internationally standardized data protocols has been a critical evolution, transforming this data from a specialized biological resource into a powerful, cross-disciplinary asset. This case study examines how standardized biologging data, facilitated by platforms like Movebank and the Biologging intelligent Platform (BiP), now directly fuels advanced ecological research, oceanographic and meteorological science, and evidence-based conservation policy. By establishing common data formats and rich, structured metadata, researchers can now integrate diverse datasets for large-scale meta-analyses, while also providing policymakers with clear, actionable insights derived from robust, reproducible data.
Before widespread standardization, biologging data was characterized by heterogeneity. Inconsistencies in data formats, such as different column names for the same sensor data (e.g., "Latitude" vs. "lat"), variations in date-time formats, differing file types, and disparate numbers of header lines, created significant barriers to collaborative research and secondary data use [2]. These discrepancies often varied by sensor manufacturer, device type, or even software version, making integration and large-scale analysis a manual and error-prone process.
Recognizing these limitations, the international biologging community, spearheaded by the International Bio-logging Society's Data Standardisation Working Group, initiated a concerted effort to develop and promote common protocols [13]. The working group's objective is to "progress standardisation of data protocols used within the bio-logging community, with a view to making databases interoperable" [13]. This has resulted in proposed frameworks and demonstrated standards, such as those outlined by Sequeira et al. (2021), which provide a foundation for storing diverse data types along with associated metadata in a consistent manner [13] [2].
The theoretical framework for standardization is implemented through specific platforms and protocols that serve as the infrastructure for cross-disciplinary data sharing.
| Platform Name | Primary Function | Key Standardization Features | Data Access Policy |
|---|---|---|---|
| Biologging intelligent Platform (BiP) [2] | Integrated platform for sharing, visualizing, and analyzing biologging data. | Adheres to international standards for sensor data and metadata (e.g., ITIS, CF, ACDD, ISO); integrated Online Analytical Processing (OLAP) tools. | CC BY 4.0 license for open data; private data requires owner permission; metadata and route maps are publicly viewable. |
| Movebank [2] | Large-scale database for animal tracking and sensor data. | Manages over 7.5 billion location points with standardized taxon classifications; supports tools like movepub for data publication preparation [13]. | Data visibility and access controlled by data owner; used for large-scale collaborative studies and distribution mapping. |
| European Tracking Network (ETN) [13] | Centralized access to data from the European aquatic animal tracking network. | Provides standardized data access via the etn R package, ensuring consistent data retrieval and formatting for users. | Data access typically requires registration and adherence to network data policies. |
For data to be truly interoperable, it must be accompanied by rich, structured metadata. The following table details the core metadata classes required for a standardized biologging data submission, as implemented by platforms like BiP [2].
| Metadata Class | Description | Example Fields | Relevant Standard |
|---|---|---|---|
| Animal Metadata | Records traits of the individual animal studied. | Species (Scientific Name, Common Name), Sex, Body Mass, Life Stage, Breeding Status. | Integrated Taxonomic Information System (ITIS) |
| Instrument Metadata | Describes the data-logging device used. | Device Manufacturer, Model, Serial Number, Sensor Types (e.g., GPS, accelerometer, depth). | Climate and Forecast (CF) Metadata Conventions |
| Deployment Metadata | Documents the context of the device attachment. | Deployment DateTime, Location, Retrieval DateTime, Attachment Method, Data Processing Steps. | Attribute Conventions for Data Discovery (ACDD) |
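A submission bundling the three metadata classes can be expressed as a small structured record. The sketch below uses Python dataclasses; the field names are illustrative and not the exact BiP schema:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AnimalMetadata:
    scientific_name: str       # controlled vocabulary, e.g., ITIS
    sex: str
    body_mass_kg: float

@dataclass
class InstrumentMetadata:
    manufacturer: str
    model: str
    sensors: tuple             # e.g., ("CTD", "GPS")

@dataclass
class DeploymentMetadata:
    deploy_datetime_utc: str   # ISO 8601
    location: str
    attachment_method: str

record = {
    "animal": asdict(AnimalMetadata("Mirounga leonina", "F", 450.0)),
    "instrument": asdict(InstrumentMetadata("SMRU", "SRDL", ("CTD", "GPS"))),
    "deployment": asdict(DeploymentMetadata("2024-10-02T08:30:00Z",
                                            "Kerguelen Islands", "glue")),
}
payload = json.dumps(record, indent=2)  # serialized alongside the sensor files
```

Typed records like these make it straightforward to validate a submission before upload and to emit the JSON (or CSV) layout a given platform expects.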
Objective: To collect in-situ physical oceanographic data (e.g., temperature, salinity) from marine animals to complement traditional ocean-observation systems [2].
Device Selection and Configuration:
Animal Deployment:
Data Transmission and Archiving:
Data Processing and Quality Control:
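As a concrete illustration of the quality-control step, animal-borne temperature profiles are commonly screened with range and spike tests before archiving. The thresholds and flag values below are assumptions for the sketch (flag 1 = good, 4 = bad, loosely following Argo-style conventions), not an official QC specification:

```python
def qc_temperature(profile, t_min=-2.5, t_max=35.0, max_spike=2.0):
    """Flag (depth_m, temp_c) samples: 1 = good, 4 = bad.

    Applies a gross range check and, for interior points, a spike test
    against the mean of the two neighbouring samples.
    """
    flags = []
    for i, (_, temp) in enumerate(profile):
        bad = not (t_min <= temp <= t_max)
        if 0 < i < len(profile) - 1:
            neighbour_mean = (profile[i - 1][1] + profile[i + 1][1]) / 2
            bad = bad or abs(temp - neighbour_mean) > max_spike
        flags.append(4 if bad else 1)
    return flags

# A mock profile with one anomalous reading at 15 m
profile = [(5, 12.1), (10, 11.8), (15, 14.5), (20, 11.2), (25, 10.9)]
flags = qc_temperature(profile)  # the 14.5 °C sample is flagged as a spike
```

Retaining flags rather than deleting samples preserves the raw record, so downstream users can apply their own, possibly stricter, QC.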
Objective: To synthesize biologging data from multiple studies and species to identify critical habitats, migration corridors, and anthropogenic threat areas to inform marine spatial planning and policy [2].
Research Question and Data Discovery:
Data Access and Integration: A shared column vocabulary (e.g., `individual_id`, `timestamp`, `location_lat`, `location_long`) allows for seamless merging of datasets across studies.
Spatio-Temporal Analysis:
Policy Reporting and Visualization:
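Once datasets share a column vocabulary, pooling and spatial summarization become one-liners. The sketch below (invented mock studies, 1°×1° binning) counts fixes per grid cell to highlight high-use areas of the kind used in marine spatial planning:

```python
import pandas as pd

# Mock standardized tracks from two hypothetical studies
study_a = pd.DataFrame({
    "individual_id": ["a1", "a1", "a2"],
    "location_lat":  [-42.3, -42.7, -41.9],
    "location_long": [145.1, 145.4, 146.2],
})
study_b = pd.DataFrame({
    "individual_id": ["b1", "b1"],
    "location_lat":  [-42.5, -43.1],
    "location_long": [145.2, 147.0],
})

# Identical schemas make pooling trivial
pooled = pd.concat([study_a, study_b], ignore_index=True)

# Assign each fix to a 1-degree grid cell and count fixes per cell
pooled["cell"] = list(zip(pooled["location_lat"].floordiv(1).astype(int),
                          pooled["location_long"].floordiv(1).astype(int)))
usage = pooled.groupby("cell").size().sort_values(ascending=False)
```

Real analyses would normalize by tracking effort and individual counts before drawing habitat-use conclusions, but the merge step itself is this simple once standardization is in place.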
The following table lists key hardware and software "reagents" essential for conducting modern, standardized biologging research.
| Item Name / Category | Function / Application | Key Specifications |
|---|---|---|
| Satellite Relay Data Logger (SRDL) [2] | Transmits compressed data (dive profiles, temperature) via satellite; enables long-term, remote data collection without device retrieval. | Sensors: Depth, Temperature, Salinity; Communication: Argos/Iridium; Deployment Duration: >1 year. |
| ORI400-PD3GT (Little Leonardo) [79] | High-resolution data logging for marine species. Records data for later retrieval. | Sensors: Swimming Speed, Depth, Temperature, 3-Axis Acceleration; Memory: Internal Storage. |
| LoggLaw G2 (Biologging Solutions) [79] | Terrestrial/freshwater animal tracking with direct communication via cellular networks. | Sensors: GPS; Communication: LTE-M; Power: Rechargeable battery. |
| DEBUT FLEX II (Druid tech) [79] | Flexible tracking device for a variety of mid-sized terrestrial animals. | Sensors: GPS, Temperature, Illuminance, ODBA; Communication: 4G. |
| movepub R Package [13] | Software tool to prepare and standardize tracking data from Movebank for publication and archiving. | Function: Data cleaning, format standardization, metadata generation. |
| etn R Package [13] | Provides programmatic, standardized access to data from the European Tracking Network (ETN). | Function: Data extraction, integration, and analysis within the R environment. |
The following diagram illustrates the workflow from raw data collection to cross-disciplinary application, enabled by standardization.
The implementation of these standardized protocols has tangible impacts. Biologging data from seals and turtles now provides ocean temperature data comparable in volume to Argo floats in specific regions like the Antarctic and Arctic, filling critical observational gaps [2]. Initiatives like the AniBOS project formalize this contribution as part of the Global Ocean Observing System. In conservation, the ability to integrate dozens of datasets has enabled comprehensive mapping of species distributions for identifying Marine Protected Areas [2]. The future of the field depends on sustaining this momentum through continued community coordination, development of sustainable funding models for long-term data curation, and the creation of robust incentive structures that reward researchers for publishing high-quality, standardized data [13].
Effective archiving and sharing are no longer optional but fundamental to unlocking the full potential of biologging data. By adopting standardized formats, utilizing dedicated platforms like BiP and Movebank, and implementing rigorous validation protocols, researchers can transform isolated datasets into a cohesive, global resource. These practices directly address current challenges, from model overfitting to geographic data biases, thereby enhancing the reliability and scope of research findings. The future of biologging lies in its integration with global biodiversity monitoring frameworks and its expanding utility in fields like drug discovery, where animal-borne environmental data can provide unique insights. Embracing these collaborative and robust data management strategies is paramount for accelerating scientific discovery, informing effective conservation policies, and fostering a truly open scientific ecosystem.