Resume
Miguel Pérez Pasalodos
Experience
I worked in the Data team. We helped the company leverage, through distributed computing, the data generated by users and other departments. We also managed a self-hosted YARN cluster (running both Hadoop and Spark jobs) of about 60 hosts.
I led the keyword-management pipeline and other related batch data pipelines. A keyword is simply a set of tokens related to content. The goal of the pipeline was to manage all the keywords and, thus, the visibility of all the content to search engines. The total number of keywords ran into the hundreds of millions, and the pipeline consisted of several phases:
- Check whether new keywords could be generated
- Simulate the number of results for each keyword (keywords without a minimum amount of content are useless)
- Categorize and contextualize the tokens (what does the keyword really mean?)
- Relate keywords to one another to generate internal linking (by hierarchy, clustering, etc.)
- Decide which keywords were worth indexing and build a Solr index with them
The pipeline was implemented as a hybrid Hadoop-Spark batch job. It was challenging in many ways: performance issues, lack of context (the same token could mean many different things), search engine performance, support for different languages, etc.
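To give a flavour of what one of these phases looked like, below is a minimal PySpark sketch of the "simulate the number of results" step. The input paths, column names and the minimum-results threshold are illustrative assumptions, not the actual code or values used in the real pipeline.

```python
# Minimal sketch (PySpark) of the "simulate the number of results" phase.
# Paths, schemas and MIN_RESULTS are illustrative assumptions only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

MIN_RESULTS = 5  # hypothetical minimum number of matching ads per keyword

spark = SparkSession.builder.appName("keyword-results-simulation").getOrCreate()

# Candidate keywords and the ads they could match
keywords = spark.read.parquet("hdfs:///data/keywords")           # keyword_id, tokens
matches = spark.read.parquet("hdfs:///data/keyword_ad_matches")  # keyword_id, ad_id

# Count how many distinct ads each keyword would return
results_per_keyword = (
    matches.groupBy("keyword_id")
           .agg(F.countDistinct("ad_id").alias("num_results"))
)

# Keep only the keywords with enough content to be worth indexing
useful_keywords = (
    keywords.join(results_per_keyword, "keyword_id", "left")
            .na.fill({"num_results": 0})
            .filter(F.col("num_results") >= MIN_RESULTS)
)

useful_keywords.write.mode("overwrite").parquet("hdfs:///data/keywords_with_results")
```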
Other projects I worked on:
- Ads categorization, deduplication, sorting and automatic expiration. Kafka was used to enqueue the downloaded ads. A Hadoop ecosystem (YARN, HDFS and MapReduce) was used to consume, process and analyse them. Finally, Solr indices were built with all the processed information and deployed to production.
- Stats processing. We used Kafka to enqueue impressions, clicks, e-mail openings and conversions from the site. Then, different Hadoop ETL pipelines processed the queues and extracted useful information for the company. Finally, the data was persisted to Hive, Impala or MySQL so it could be consumed more easily (a simplified sketch of this kind of job is shown below).
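As an illustration of the stats-processing item above, here is a minimal PySpark sketch of a daily aggregation ETL step. The event format, paths and table name are illustrative assumptions; the real pipelines covered many more event types and outputs.

```python
# Minimal sketch (PySpark) of a stats ETL step: aggregate raw impression/click
# events into daily counts and persist them as a Hive table.
# The event format, paths and table name are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("stats-daily-aggregation")
    .enableHiveSupport()
    .getOrCreate()
)

# Raw events previously consumed from Kafka and landed on HDFS as JSON lines,
# e.g. {"event_type": "click", "ad_id": "123", "ts": "2015-03-01 12:34:56"}
events = spark.read.json("hdfs:///data/raw_events/")

daily_stats = (
    events.withColumn("day", F.to_date("ts"))
          .groupBy("day", "event_type")
          .agg(F.count("*").alias("num_events"))
)

# Persist so the rest of the company can query it from Hive or Impala
daily_stats.write.mode("overwrite").saveAsTable("stats.daily_events")
```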
I developed several experimental web projects that were expected to become an important part of the company in the future. The most important one was the "Publish Your Ad" project, where users could post their own ads directly on Trovit (which had been a pure aggregator before that).
Some of the technologies I used were:
- PHP (Composer)
- JavaScript (jQuery, Backbone, RequireJS, Zepto)
- MySQL
- Amazon S3
I worked in the TSC (Signal Theory and Communications) department. I started out helping to manage the department's data center (servers and network). Later on, I developed both front-end and back-end web tools used to improve the management of the department.
Some of the technologies I used were:
- Apache 2
- PHP (Symfony 2 framework)
- SQL (MySQL)
- JavaScript (jQuery framework)
- LDAP
- Bash
Education
Major in Computing. I was trained to assess the difficulty of computing problems, to identify the most suitable machines, languages and programming paradigms, and to design and implement the best IT solution.
I successfully completed several advanced computing modules, including Theory of Computation, Machine Learning, Advanced Algorithms and Distributed Intelligent Systems.
Certifications
Languages
| Language | Reading | Listening | Writing | Speaking |
| --- | --- | --- | --- | --- |
| Spanish | Native | Native | Native | Native |
| Catalan | Native | Native | Native | Native |
| English | C1 Advanced | C1 Advanced | B2 Upper-Intermediate | B2 Upper-Intermediate |

Levels according to the Common European Framework of Reference for Languages (CEFR).