Resume

I'm Miguel, a computer scientist from Barcelona. I'm passionate about data, functional programming (mainly Scala), distributed systems, algorithms and mountains. I'm pragmatic and demanding about the things I build, but I also love to experiment with new languages and technologies.

Experience

Jul 2019 - Present
Data Engineer
PubNative

Jul 2015 - Dec 2017
Data Engineer
Trovit

I worked in the Data team. We used distributed computing to help the company leverage the data generated by users and by other departments. We also managed a self-hosted YARN cluster of about 60 hosts, running both Hadoop and Spark jobs.

I led the keyword management pipeline and other related batch data pipelines. A keyword is just a set of tokens related to content. The goal of the pipeline was to manage all the keywords and, through them, the visibility of all the content to search engines. There were hundreds of millions of keywords in total, and the pipeline consisted of several phases:

  1. Check if new keywords could be generated
  2. Simulate the number of results for each keyword (keywords without a minimum amount of content are useless)
  3. Categorize and contextualize the tokens (what does the keyword really mean?)
  4. Relate keywords to each other to generate links (by hierarchy, clustering...)
  5. Check which keywords are worth indexing and generate a Solr index with them

The pipeline was implemented as a hybrid Hadoop-Spark batch job. It was challenging in many ways: performance issues, lack of context (the same token could mean many different things), search engine performance, handling different languages, etc.

Other projects I worked on:

  • Ads categorization, deduplication, sorting and automatic expiration. Kafka was used to enqueue the downloaded ads. A Hadoop ecosystem (YARN, HDFS and MapReduce) was used to consume, process and analyse them. Finally, Solr indices were built with all the processed information and deployed to production.
  • Stats processing. We used Kafka to enqueue impressions, clicks, e-mail openings and conversions from the site. Then, different Hadoop ETL pipelines processed the queues and extracted useful information for the company. Finally, the data was persisted to Hive, Impala or MySQL so that it could be consumed more easily.

Jul 2014 - Jul 2015
Full Stack Web Developer
Trovit

I developed several experimental web projects that were expected to become an important part of the company. The most important one was the "Publish Your Ad" project, where users could post their own ads directly on Trovit (which had been a pure aggregator until then).

Some of the technologies I used were:

  • PHP (Composer)
  • JavaScript (jQuery, Backbone.js, RequireJS, Zepto)
  • MySQL
  • Amazon S3

Mar 2012 - Aug 2013
Internship as Web Developer and Systems Administrator
Polytechnic University of Catalonia (UPC)

I worked in the TSC (Signal Theory and Communications) department. I started by helping to manage the department's data center (servers and network). Later on, I developed both front-end and back-end web tools, which were used to improve the management of the department.

Some of the technologies I used were:

  • Apache 2
  • PHP (Symfony 2 framework)
  • SQL (MySQL)
  • JavaScript (jQuery library)
  • LDAP
  • Bash

Education

Sep 2009 - Apr 2014
Bachelor’s degree in Informatics Engineering
Barcelona School of Informatics (FIB), Universitat Politècnica de Catalunya (UPC).

Major in Computing. I was trained to assess the difficulty of computing problems, to identify the most suitable machines, languages and programming paradigms, and to design and implement the best IT solution.

I successfully completed several advanced computing modules, including Theory of Computation, Machine Learning, Advanced Algorithms and Distributed Intelligent Systems.

Certifications

Aug 2015
Verified Certificate for Scalable Machine Learning
edX

I learned how machine learning algorithms can be adapted and used on large clusters of commodity machines. In particular, I used Apache Spark to solve different machine learning problems.

Languages

Spanish: Native
Catalan: Native
English:

  • Reading: C1 (Advanced)
  • Listening: C1 (Advanced)
  • Writing: B2 (Upper-Intermediate)
  • Speaking: B2 (Upper-Intermediate)

Common European Framework of Reference for Languages (CEFR)

Publications

Oct 2015
Using Multi-Agent Systems to mediate in an assistive social network for elder population
Co-authors: Cristian Barrué, Ulises Cortés, Atia Cortés and Jonatan Moreno.