About Me

I currently hold a postdoctoral researcher position in the Computer Vision Lab (CVL) at ETH Zürich, working with Prof. Luc Van Gool and Dr. Radu Timofte. Before joining CVL, I spent five delightful years in the Biomedical Computer Vision (BCV) group, working closely with Prof. Pablo Arbeláez at the University of Los Andes (Colombia).

My research interests lie in Machine Learning and Artificial Intelligence, specifically Computer Vision and Pattern Recognition. In particular, I am interested in developing techniques based on Generative Adversarial Networks (GANs) for image-to-image translation problems such as image editing, semantic manipulation, and style-guided transformation.

During my PhD, I developed a strong connection with the world of Facial Expression Recognition and Affective Computing.

I received the Google Research Awards for Latin America, recently renamed the Latin America Research Awards (LARA), three consecutive times (2015, 2016, 2017) for my project “Learning Dynamic Action Units for Three-dimensional Facial Expression Recognition”, under the guidance of Prof. Pablo Arbeláez.

Press coverage:

Andrés Romero (left) and Pablo Arbeláez (right) were interviewed by RCN, one of the most influential Colombian news outlets. Follow the link to see the video.
Andrés Romero explaining the importance of automated facial expression recognition and the method developed to detect facial expressions.

Older press coverage:

Non-academic interests:

I am very much into photography, and I am on my way to becoming an expert photographer. To date, I own a Sony α7III and I love to shoot with an 18-35mm lens.
On a very different note, I am a backpack traveller who tries to explore as many places as possible, and a gym enthusiast with a strict routine who tries to keep life as healthy as possible.


    • Abstract: Cross-domain mapping has been a very active topic in recent years. Given one image, its main purpose is to translate it to the desired target domain, or multiple domains in the case of multiple labels. This problem is highly challenging for three main reasons: (i) unpaired datasets, (ii) multiple attributes, and (iii) the multimodality (e.g. style) associated with the translation. Most existing state-of-the-art methods address only two of these challenges, i.e., either (i) and (ii) or (i) and (iii). In this work, we propose a joint framework (i, ii, iii) of diversity and multi-mapping image-to-image translations, using a single generator to conditionally produce countless and unique fake images that hold the underlying characteristics of the source image. Our system does not use style regularization; instead, it uses an embedding representation, which we call domain embedding, for both domain and style. Extensive experiments over different datasets demonstrate the effectiveness of our proposed approach in comparison with the state-of-the-art in both multi-label and multimodal problems. Additionally, our method is able to generalize under different scenarios: continuous style interpolation, continuous label interpolation, and fine-grained mapping.

    • Abstract: We propose a novel convolutional neural network architecture to address the fine-grained recognition problem of multi-view dynamic facial action unit detection. We leverage recent gains in large-scale object recognition by formulating the task of predicting the presence or absence of a specific action unit in a still image of a human face as holistic classification. We then explore the design space of our approach by considering both shared and independent representations for separate action units, and also different CNN architectures for combining color and motion information. We then move to the novel setup of the FERA 2017 Challenge, in which we propose a multi-view extension of our approach that operates by first predicting the viewpoint from which the video was taken, and then evaluating an ensemble of action unit detectors that were trained for that specific viewpoint. Our approach is holistic, efficient, and modular, since new action units can be easily included in the overall system. Our approach significantly outperforms the baseline of the FERA 2017 Challenge, which was the previous state-of-the-art in multi-view dynamic action unit detection, with an absolute improvement of 14%.


September 1

Join CVL @ ETH Zürich

Postdoctoral Researcher
August 11

PhD Dissertation

July 17

PhD Internship

Period of research in the Computer Vision Lab at ETH Zürich.  
August 4

Google Research Award

Among the recipients of the Latin America Research Awards 2017.
July 25

Join PhD program

Doctoral studies in Engineering at the University of Los Andes.
July 15

Google Research Award

Among the recipients of the Google Research Award for Latin America 2016.
June 10

MSc Degree

MSc in Biomedical Engineering
August 5

Google Research Award

Among the recipients of the Google Research Award for Latin America 2015.
January 16

Join MSc program

Graduate student in the Biomedical Engineering department at the University of Los Andes.
March 15

BS Degree

Graduated as a Bioengineer from the University of Antioquia.