Grupo de Tratamiento de Imágenes

CVPR 2025

From June 11 to 15, the Grupo de Tratamiento de Imágenes (GTI) participated in the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR 2025), the world’s leading conference in computer vision and artificial intelligence, held this year in Nashville, Tennessee.

Carlos Roberto del Blanco and Marcos Rodrigo presented the paper “ViMoCLIP: Augmenting Static CLIP Representations with Video Motion Cues for Animal Action Recognition”. The work proposes an extension of the CLIP model that integrates motion information, significantly improving its ability to recognize animal actions in video. This is an essential task in fields such as computational biology and smart aquaculture.

The system introduces a student–teacher learning framework in which:

- CLIP extracts static visual representations,
- an additional model learns temporal features through optical flow,
- and a temporal Transformer merges both modalities to enable dynamic scene understanding.
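As a rough illustration of the last step, the sketch below fuses per-frame static and motion features with a single self-attention layer over time. It is not the authors' implementation: the random feature arrays stand in for CLIP embeddings and optical-flow features, and the function names, the dimensions, the concatenation of the two modalities, the single attention head, the random projection weights, and the mean pooling are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over the time axis; tokens has shape (T, D)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (T, T) frame-to-frame affinities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


def fuse_static_and_motion(static_feats, motion_feats, rng):
    """Concatenate both modalities per frame, attend over time, then mean-pool.

    static_feats: (T, Ds) placeholder for per-frame CLIP embeddings.
    motion_feats: (T, Dm) placeholder for optical-flow features.
    Returns a video-level vector of shape (Ds + Dm,) and the attention map.
    """
    tokens = np.concatenate([static_feats, motion_feats], axis=-1)
    d = tokens.shape[-1]
    # Untrained random projections; a real model would learn these weights.
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    attended, weights = temporal_self_attention(tokens, w_q, w_k, w_v)
    return attended.mean(axis=0), weights


rng = np.random.default_rng(0)
T, Ds, Dm = 8, 16, 4                          # hypothetical toy sizes
static_feats = rng.standard_normal((T, Ds))   # stand-in for CLIP features
motion_feats = rng.standard_normal((T, Dm))   # stand-in for flow features
video_embedding, attn = fuse_static_and_motion(static_feats, motion_feats, rng)
print(video_embedding.shape, attn.shape)
```

Concatenating per frame keeps the example short. Other fusion schemes, such as cross-attention between the two streams, would fit the same description equally well.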

When applied to the Animal Kingdom dataset, the model outperforms previous CLIP-based approaches, showing greater accuracy in identifying complex behaviors such as stalking, jumping, or flying.

By combining computer vision with motion analysis, this technology enables the development of continuous monitoring systems for animal behavior in aquatic environments. Such systems can automatically detect feeding patterns and signs of stress or illness, with a direct impact on animal welfare and production efficiency.

Full implementation available: https://github.com/MarcosRodrigoT/VIMO-CLIP
