Grupo de Tratamiento de Imágenes

VIA MADRID

Our colleague César Díaz participated as a speaker at the VIA Madrid Summer School, an initiative on computer vision and artificial intelligence for master's and PhD students.

In his talk, "Smarter pixel squeezing: reshaping image and video compression with AI", he explained how artificial intelligence is revolutionizing image and video compression through the combination of autoencoders and generative models. He also addressed the challenges this technology currently faces, such as high computational demands, the lack of universal standards, and the appearance of undesired artifacts, as well as research directions aimed at overcoming these limitations.

In addition, Enmin Zhong and Marcos de Rodrigo, researchers from the group, presented the talk "Teaching Foundation Models to See Movement", in which they showed how to overcome a limitation of foundation models like CLIP, which are originally trained only on static images. Using their work ViMoCLIP as an example, they described a teacher–student distillation strategy that efficiently integrates motion information (optical flow). The resulting model achieves an improvement of approximately 2.5 points in fine-grained recognition tasks without relying on text prompts or heavy 3D backbones, demonstrating the potential of multimodality (RGB + flow) to enhance temporal understanding at low computational cost.

The summer school was organized by CSIC, Universidad Autónoma de Madrid (UAM), Universidad Carlos III de Madrid (UC3M), and Universidad Politécnica de Madrid (UPM), within the framework of the IDEALCV-CM project, funded by the Comunidad de Madrid. This project aims to advance the development of deep learning-based computer vision systems, improving their accuracy, robustness, efficiency, and explainability.
