SportCLIP

 

Description


 

SportCLIP is a multi-sport benchmark expressly built to evaluate text-guided video-summarization methods. It extends our earlier MATDAT work beyond a single discipline and provides both the dataset and the full Python implementation of the CLIP-based framework introduced in our paper “Text-Guided Sports Highlights: A CLIP-Based Framework for Automatic Video Summarization.”

Key characteristics

  • Sports covered (4): diving, long jump, pole vault and tumbling.
  • Source material: user-generated recordings captured with static consumer-grade cameras.
  • Duration & scale:
    • Diving ≈ 5 min
    • Long-jump ≈ 2 ½ min
    • Pole-vault ≈ 2 ½ min
    • Tumbling ≈ 10 min
  • Varied highlight density: each sport exhibits different ratios of highlight to non-highlight frames, challenging cross-domain generalisation.

  

Ground-truth Description


Every video is accompanied by a comma-separated ground-truth file (*.csv).

[Figure: example ground-truth CSV structure]

Highlights (HL) mark frames in which an athlete is actively performing (each highlight lasts a minimum of 30 frames, ≈ 1 s). Non-highlights (NHL) span all other intervals. An uncertainty (UN) tag is placed around the boundaries of every highlight, mirroring our MATDAT protocol.
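As a rough illustration of how such annotations can be consumed, the sketch below parses a hypothetical event-level CSV. The column names (start_frame, end_frame, label) are assumptions for the example, not the official SportCLIP layout; refer to the downloaded files for the actual structure.

```python
import csv
import io

def read_gt(csv_text):
    """Parse a hypothetical ground-truth CSV with columns
    start_frame,end_frame,label, where label is HL, NHL or UN.
    The column layout is an assumption for illustration only."""
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        start, end = int(row["start_frame"]), int(row["end_frame"])
        label = row["label"].strip().upper()
        # Enforce the protocol's minimum highlight length (30 frames ~ 1 s).
        if label == "HL" and end - start + 1 < 30:
            raise ValueError(f"HL event shorter than 30 frames: {row}")
        events.append((start, end, label))
    return events

sample = "start_frame,end_frame,label\n0,99,NHL\n100,160,HL\n"
events = read_gt(sample)
```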

 

Code & pretrained model


Full end-to-end pipeline:

  • extractor.py - decodes the raw video, samples frames at any FPS and runs CLIP ViT-B/32 (CUDA-accelerated or CPU) to store one embedding per frame.
  • multi_sentences.py - auto-generates and scores highlight / non-highlight sentence pairs, then filters them with separation, dynamic-range and AUC criteria to keep only the most discriminative prompts.
  • summarize.py - fuses the kept scores with rolling averages, applies a morphological closing, duration & area filters, evaluates the detections and writes an MP4 highlight reel (highlight.mp4) plus all diagnostic plots.
  • Supporting helpers in utils.py (KDE curves, event detection, frame-level / event-level metrics, pretty console colours).
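The post-processing stage in summarize.py (rolling average, morphological closing, duration filtering) can be sketched in pure Python as follows. Window sizes and thresholds here are illustrative defaults, not the values used in the paper:

```python
def rolling_mean(scores, window=15):
    """Smooth per-frame scores with a centred rolling average."""
    half = window // 2
    n = len(scores)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out

def binary_close(mask, gap=5):
    """Morphological closing on a binary mask: fill interior gaps
    shorter than `gap` frames between detected segments."""
    mask = list(mask)
    i = 0
    while i < len(mask):
        if not mask[i]:
            j = i
            while j < len(mask) and not mask[j]:
                j += 1
            if 0 < i and j < len(mask) and j - i < gap:
                for k in range(i, j):
                    mask[k] = True
            i = j
        else:
            i += 1
    return mask

def duration_filter(mask, min_len=30):
    """Drop detected segments shorter than `min_len` frames (~1 s at 30 fps)."""
    mask = list(mask)
    i = 0
    while i < len(mask):
        if mask[i]:
            j = i
            while j < len(mask) and mask[j]:
                j += 1
            if j - i < min_len:
                for k in range(i, j):
                    mask[k] = False
            i = j
        else:
            i += 1
    return mask
```

A typical chain would threshold the smoothed scores into a binary mask, close small gaps, then discard detections that are too short to be a real highlight.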

Config-driven experimentation:

  • A single Config class lets you point to a new video, swap sports-specific prompts or tweak any hyper-parameter without touching the core code.
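As a minimal sketch of this idea (the field names below are assumptions; the actual Config class in the repository may differ), a dataclass keeps every experiment knob in one place:

```python
from dataclasses import dataclass, field

@dataclass
class Config:
    """Illustrative single-point-of-change configuration.
    Field names are assumptions, not the repository's exact API."""
    video_path: str = "diving.mp4"
    fps: float = 30.0
    clip_model: str = "ViT-B/32"
    highlight_prompts: list = field(default_factory=lambda: ["an athlete diving"])
    non_highlight_prompts: list = field(default_factory=lambda: ["an empty pool"])
    smoothing_window: int = 15
    min_event_frames: int = 30
```

Switching sports then only means overriding a few fields, e.g. `Config(video_path="long_jump.mp4", highlight_prompts=["an athlete long jumping"])`, while the pipeline code stays untouched.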

 

Download


Dataset (videos + GT)

Source code (GitHub)

 

 

Citation


If you use the SportCLIP dataset or the accompanying code, please cite:

M. de Rodrigo, C. Cuevas and N. García, “Text-Guided Sports Highlights: A CLIP-Based Framework for Automatic Video Summarization,” under review.

For further information, or if you discover any issue with the dataset, feel free to contact Marcos de Rodrigo.