Resume
Miguel Pérez Pasalodos
Experience
I worked in the Data team. We helped the company leverage, through distributed computing, the data generated by users and other departments. We also managed a self-hosted YARN cluster (running both Hadoop and Spark jobs) of about 60 hosts.
I led the keyword-management pipeline and other related batch data pipelines. A keyword is simply a set of tokens related to content. The goal of the pipeline was to manage all the keywords and, thus, the visibility of all the content to search engines. The total number of keywords ran into the hundreds of millions, and the pipeline consisted of several phases:
- Check whether new keywords could be generated
- Simulate the number of results for each keyword (keywords without a minimum amount of content are useless)
- Categorize and contextualize the tokens (what does the keyword really mean?)
- Relate keywords to one another to generate internal linking (by hierarchy, clustering, etc.)
- Decide which keywords were worth indexing and build a Solr index with them
The pipeline was implemented as a hybrid Hadoop-Spark batch job. It was challenging in many ways: performance issues, lack of context (the same token could mean many different things), search engine performance, support for different languages, etc.
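To give a flavour of what one of these phases looked like, below is a minimal PySpark sketch of the "simulate the number of results" step. The input paths, column names and the minimum-results threshold are illustrative assumptions, not the actual code or values used in the real pipeline.

```python
# Minimal sketch (PySpark) of the "simulate the number of results" phase.
# Paths, schemas and MIN_RESULTS are illustrative assumptions only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

MIN_RESULTS = 5  # hypothetical minimum number of matching ads per keyword

spark = SparkSession.builder.appName("keyword-results-simulation").getOrCreate()

# Candidate keywords and the ads they could match
keywords = spark.read.parquet("hdfs:///data/keywords")           # keyword_id, tokens
matches = spark.read.parquet("hdfs:///data/keyword_ad_matches")  # keyword_id, ad_id

# Count how many distinct ads each keyword would return
results_per_keyword = (
    matches.groupBy("keyword_id")
           .agg(F.countDistinct("ad_id").alias("num_results"))
)

# Keep only the keywords with enough content to be worth indexing
useful_keywords = (
    keywords.join(results_per_keyword, "keyword_id", "left")
            .na.fill({"num_results": 0})
            .filter(F.col("num_results") >= MIN_RESULTS)
)

useful_keywords.write.mode("overwrite").parquet("hdfs:///data/keywords_with_results")
```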
Other projects I worked on:
- Ads categorization, deduplication, sorting and automatic expiration. Kafka was used to enqueue the downloaded ads. A Hadoop ecosystem (YARN, HDFS and MapReduce) was used to consume, process and analyse them. Finally, Solr indices were built with all the processed information and deployed to production.
- Stats processing. We used Kafka to enqueue impressions, clicks, e-mail openings and conversions from the site. Then, different Hadoop ETL pipelines processed the queues and extracted useful information for the company. Finally, the data was persisted to Hive, Impala or MySQL so it could be consumed more easily (a simplified sketch of this kind of job is shown below).
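As an illustration of the stats-processing item above, here is a minimal PySpark sketch of a daily aggregation ETL step. The event format, paths and table name are illustrative assumptions; the real pipelines covered many more event types and outputs.

```python
# Minimal sketch (PySpark) of a stats ETL step: aggregate raw impression/click
# events into daily counts and persist them as a Hive table.
# The event format, paths and table name are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("stats-daily-aggregation")
    .enableHiveSupport()
    .getOrCreate()
)

# Raw events previously consumed from Kafka and landed on HDFS as JSON lines,
# e.g. {"event_type": "click", "ad_id": "123", "ts": "2015-03-01 12:34:56"}
events = spark.read.json("hdfs:///data/raw_events/")

daily_stats = (
    events.withColumn("day", F.to_date("ts"))
          .groupBy("day", "event_type")
          .agg(F.count("*").alias("num_events"))
)

# Persist so the rest of the company can query it from Hive or Impala
daily_stats.write.mode("overwrite").saveAsTable("stats.daily_events")
```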
I developed several experimental web projects that were expected to become an important part of the company in the future. The most important one was the "Publish Your Ad" project, where users could post their own ads directly on Trovit (which had been a pure aggregator before that).
Some of the technologies I used were:
- PHP (Composer)
- JavaScript (jQuery, Backbone, RequireJS, Zepto)
- MySQL
- Amazon S3
I worked in the TSC (Signal Theory and Communications) department. I started out helping to manage the department's data center (servers and network). Later on, I developed both front-end and back-end web tools used to improve the management of the department.
Some of the technologies I used were:
- Apache 2
- PHP (Symfony 2 framework)
- SQL (MySQL)
- JavaScript (jQuery framework)
- LDAP
- Bash
Education
Major in Computing. I was trained to assess the difficulty of computing problems, to identify the most suitable machines, languages and programming paradigms, and to design and implement the best IT solution.
I successfully completed several advanced computing modules, including Theory of Computation, Machine Learning, Advanced Algorithms and Distributed Intelligent Systems.
Certifications
Languages
| Language | Reading | Listening | Writing | Speaking |
| --- | --- | --- | --- | --- |
| Spanish | Native | Native | Native | Native |
| Catalan | Native | Native | Native | Native |
| English | C1 Advanced | C1 Advanced | B2 Upper-Intermediate | B2 Upper-Intermediate |

Levels according to the Common European Framework of Reference for Languages (CEFR).