SportCLIP

 

Description


 

SportCLIP is a multi-sport benchmark expressly built to evaluate text-guided video-summarization methods. It extends our earlier MATDAT work beyond a single discipline and provides both the dataset and the full Python implementation of the CLIP-based framework introduced in our paper “Text-Guided Sports Highlights: A CLIP-Based Framework for Automatic Video Summarization.”

Key characteristics

  • Sports covered (4): diving, long jump, pole vault and tumbling.
  • Source material: user-generated recordings captured with static consumer-grade cameras.
  • Duration & scale:
    • Diving ≈ 5 min
    • Long-jump ≈ 2 ½ min
    • Pole-vault ≈ 2 ½ min
    • Tumbling ≈ 10 min
  • Varied highlight density: each sport exhibits different ratios of highlight to non-highlight frames, challenging cross-domain generalisation.

  

Ground-truth Description


Every video is accompanied by a comma-separated ground-truth file (*.csv).

[Figure: example ground-truth CSV structure]

Highlights (HL) mark frames in which an athlete is actively performing (each highlight lasts a minimum of 30 frames, ≈ 1 s). Non-highlights (NHL) span all other intervals. An uncertainty (UN) tag is placed around the boundaries of every highlight, mirroring our MATDAT protocol.
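As a rough illustration of how such annotations can be consumed, the sketch below parses a hypothetical event-level CSV. The column names (start_frame, end_frame, label) are assumptions for the example, not the official SportCLIP layout; refer to the downloaded files for the actual structure.

```python
import csv
import io

def read_gt(csv_text):
    """Parse a hypothetical ground-truth CSV with columns
    start_frame,end_frame,label, where label is HL, NHL or UN.
    The column layout is an assumption for illustration only."""
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        start, end = int(row["start_frame"]), int(row["end_frame"])
        label = row["label"].strip().upper()
        # Enforce the protocol's minimum highlight length (30 frames ~ 1 s).
        if label == "HL" and end - start + 1 < 30:
            raise ValueError(f"HL event shorter than 30 frames: {row}")
        events.append((start, end, label))
    return events

sample = "start_frame,end_frame,label\n0,99,NHL\n100,160,HL\n"
events = read_gt(sample)
```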

 

Code & pretrained model


Full end-to-end pipeline:

  • extractor.py - decodes the raw video, samples frames at any FPS and runs CLIP ViT-B/32 (CUDA-accelerated or CPU) to store one embedding per frame.
  • multi_sentences.py - auto-generates and scores highlight / non-highlight sentence pairs, then filters them with separation, dynamic-range and AUC criteria to keep only the most discriminative prompts.
  • summarize.py - fuses the kept scores with rolling averages, applies a morphological closing, duration & area filters, evaluates the detections and writes an MP4 highlight reel (highlight.mp4) plus all diagnostic plots.
  • Supporting helpers in utils.py (KDE curves, event detection, frame-level / event-level metrics, pretty console colours).
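The post-processing stage in summarize.py (rolling average, morphological closing, duration filtering) can be sketched in pure Python as follows. Window sizes and thresholds here are illustrative defaults, not the values used in the paper:

```python
def rolling_mean(scores, window=15):
    """Smooth per-frame scores with a centred rolling average."""
    half = window // 2
    n = len(scores)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out

def binary_close(mask, gap=5):
    """Morphological closing on a binary mask: fill interior gaps
    shorter than `gap` frames between detected segments."""
    mask = list(mask)
    i = 0
    while i < len(mask):
        if not mask[i]:
            j = i
            while j < len(mask) and not mask[j]:
                j += 1
            if 0 < i and j < len(mask) and j - i < gap:
                for k in range(i, j):
                    mask[k] = True
            i = j
        else:
            i += 1
    return mask

def duration_filter(mask, min_len=30):
    """Drop detected segments shorter than `min_len` frames (~1 s at 30 fps)."""
    mask = list(mask)
    i = 0
    while i < len(mask):
        if mask[i]:
            j = i
            while j < len(mask) and mask[j]:
                j += 1
            if j - i < min_len:
                for k in range(i, j):
                    mask[k] = False
            i = j
        else:
            i += 1
    return mask
```

A typical chain would threshold the smoothed scores into a binary mask, close small gaps, then discard detections that are too short to be a real highlight.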

Config-driven experimentation:

  • A single Config class lets you point to a new video, swap sports-specific prompts or tweak any hyper-parameter without touching the core code.
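As a minimal sketch of this idea (the field names below are assumptions; the actual Config class in the repository may differ), a dataclass keeps every experiment knob in one place:

```python
from dataclasses import dataclass, field

@dataclass
class Config:
    """Illustrative single-point-of-change configuration.
    Field names are assumptions, not the repository's exact API."""
    video_path: str = "diving.mp4"
    fps: float = 30.0
    clip_model: str = "ViT-B/32"
    highlight_prompts: list = field(default_factory=lambda: ["an athlete diving"])
    non_highlight_prompts: list = field(default_factory=lambda: ["an empty pool"])
    smoothing_window: int = 15
    min_event_frames: int = 30
```

Switching sports then only means overriding a few fields, e.g. `Config(video_path="long_jump.mp4", highlight_prompts=["an athlete long jumping"])`, while the pipeline code stays untouched.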

 

Download


Dataset (videos + GT)

Source code (GitHub)

 

 

Citation


If you use the SportCLIP dataset or the accompanying code, please cite:

M. de Rodrigo, C. Cuevas and N. García, “Text-Guided Sports Highlights: A CLIP-Based Framework for Automatic Video Summarization,” under review.

For further information, or if you discover any issue with the dataset, feel free to contact Marcos de Rodrigo.