> Hi, I'm Marcelo Prates!
My name is Marcelo de Oliveira Rosa Prates, I’m a 33yo software developer and artist based in Porto Alegre, Brazil.

About
Marcelo de Oliveira Rosa Prates · Software developer, data scientist & generative artist in Porto Alegre, Brazil
Education

I hold a PhD in Computer Science with a major in Machine Learning from UFRGS (Aug 2015 – Aug 2019). During my doctorate, I pursued research in Geometric Deep Learning / Graph Neural Networks and the Ethics of Artificial Intelligence, particularly Machine Bias.
My Erdős number is 3, through Moshe Vardi.
Career
I currently work as a Data Scientist and Machine Learning Engineer at Dataside, with previous experience as an AI researcher at Samsung R&D Institute Brazil, where I led an innovative project for VO2Max estimation on wearable devices that now ships worldwide with the Samsung Galaxy Watch line. I also provide consultancy for companies interested in building solutions based on Machine Learning, Computer Vision or Large Language Models.
- Built LLM-powered RAG platforms (law and analytics) with multi-format ingestion and automated workflows, significantly reducing manual workload and cost (Dataside).
- Led production computer vision for 360° construction monitoring (semantic segmentation, analytics dashboard), improving project tracking and decision-making (ConstructIN).
- Drove ROI lifts in digital marketing by redesigning forecasting and real-time bid optimization systems, with robust monitoring and MLOps (Condati).
- Designed advanced forecasting with conformal prediction and quantile regression to control risk of under/overestimation (Dataside).
Art & Creative Coding
I have also been experimenting with generative art / creative coding (art created through programming) as a hobby since 2015. You can see some of my sketches in the Generative Art tab.
Interests
My main artistic interests include the intersection of art and exact sciences, the nature of the artistic process in the context of generative art and the relationship of generative art with different types of media (2D printing, 3D printing and pen plotting; projections, interactive “sketches”). Other topics of interest are: physical and biological simulations, mathematical art, complex systems, signed distance functions, cartography, pen-plotters, fractals and abstract procedural art.
Open Source
I’m the creator of the open-source Python package prettymaps (now boasting more than 10k stars on GitHub!) which allows anyone to create highly stylized maps from public OpenStreetMap data for free. Visit the repository.
Selected Projects
prettymaps
Generate beautiful maps from OpenStreetMap data with Python and matplotlib.
easyshader
A Python DSL for 3D art using signed distance functions and raymarching, with mesh export and AR experiments.
Cosmos
An edited, modernized LaTeX edition of Humboldt’s “Cosmos” (Vol. 1) from OCR, with figures and improved accessibility; PDF available.
Streamlines
Generative art from vector fields: ODE-integrated streamlines with stylization (median blur + SLIC) and palette colors; includes Blender/plotter and TSP animations.
Open Source
prettymaps
Generate beautiful maps from OpenStreetMap data with Python and matplotlib.
Turmites
An interactive tool written in Processing to draw and generate Turmites (Turing Machines on 2D tapes).
easyshader
A Python DSL for 3D art using signed distance functions and raymarching, with mesh export and AR experiments.
Cosmos
An edited, modernized LaTeX edition of Humboldt’s “Cosmos” (Vol. 1) from OCR, with figures and improved accessibility; PDF available.
TSP-Animation
CLI tool for generating smooth transition animations from unordered collections of images.
audio-gravity
Audio Gravity is a Processing sketch implementing a gravitational particle system that reacts in real time to a user-supplied song.
Selected Papers
Assessing gender bias in machine translation: a case study with Google Translate
Neural Computing & Applications (Print) • 2018
Graph Neural Networks Meet Neural-Symbolic Computing: A Survey and Perspective
International Joint Conference on Artificial Intelligence • 2020
Learning to Solve NP-Complete Problems - A Graph Neural Network for the Decision TSP
AAAI Conference on Artificial Intelligence • 2018
Graph Colouring Meets Deep Learning: Effective Graph Neural Network Models for Combinatorial Problems
IEEE International Conference on Tools with Artificial Intelligence • 2019
On Quantifying and Understanding the Role of Ethics in AI Research: A Historical Account of Flagship Conferences and Journals
Global Conference on Artificial Intelligence • 2018
Multitask Learning on Graph Neural Networks - Learning Multiple Graph Centrality Measures with a Unified Network
International Conference on Artificial Neural Networks • 2018
More on Semantic Scholar.
Resume
Skills (auto-converted from LaTeX)
- Core ML: Uncertainty Quantification, Causal Inference, Bayesian Machine Learning, Graph Neural Networks, Computer Vision, ML for Healthcare, ML for FinTech, Geospatial Analytics
- Generative AI: LLMs & RAG Systems, LangChain Ecosystem, Vector Databases, Tool Usage & Agentic AI, LangGraph & State Machines, Context Engineering, Hybrid Search & Metadata Filtering, DSPy & Prompt Optimization
- Deployment: MLOps & Production ML, MLflow & Experiment Tracking, FastAPI & API Design, CI/CD Pipelines, Docker & Containerization, Data Projects Management, System Design, Technical Leadership
- Infrastructure: Azure Cloud Platform, Databricks & Big Data, SQL & Database Design, Julia, System Programming (C/C++)
- Languages: Portuguese (native speaker), English, Spanish
Experience (auto-converted from LaTeX)
Senior Data Scientist → Lead Data Scientist — Dataside (Oct 2023 — Now) — São José dos Campos, Brazil (Remote)
- Initially joined as Data Scientist; promoted to Lead Data Scientist from mid 2024 onwards.
- Mentored junior data scientists in ML, generative AI, and MLOps; led client engagement and solution design for successful project acquisition.
- Built Agentic RAG platforms for law and analytics with multi-format ingestion (PDF / Excel / Image / Text), and automated workflows (translation, summarization, legal analytics, NL database queries), reducing manual workload and costs. These RAG solutions combined hybrid search (embeddings + BM25) and metadata filtering with self-querying retrievers.
- As solution architect, designed and partially implemented an automated candidate resume evaluation system leveraging embeddings and reranking to match candidates to job descriptions. Integrated agentic workflows to auto-generate interview questions and dynamically rank candidates as evaluations progressed. The system continuously updated rankings and interview questions to address gaps identified in prior candidates, ensuring targeted assessments. Additionally, optimized an existing version for lower latency and reduced token usage.
- Developed advanced sales forecasting models with conformal prediction and quantile regression to minimize underestimation risk and control overestimation.
- Created hybrid classification systems (TF-IDF + LLM embeddings) for product categorization with calibrated probability rejection, improving accuracy and reducing revision time.
- Used the DSPy framework to build few-shot classifiers and RAG solutions from client databases, optimizing few-shot example selection for prompting via Bayesian search.
- Delivered agentic RAG solutions for structured data extraction from unstructured documents (publishers, law firms) using multimodal ingestion and table detection, achieving high accuracy. Leveraged hybrid search (embeddings + BM25 or TF-IDF), metadata filtering via self-querying retrievers, parent document retrieval, and structured outputs with Pydantic validation for LLM-based metadata enrichment.
- Built computer vision systems for health and food sectors: (1) liquid volume estimation from photos; (2) food tray detection/classification for automated consumption tracking.
- Additional: custom chatbots, knowledge extraction, clustering, and outlier detection systems.
- Tech stack:
  - Languages: Python, Julia, JavaScript, C#, SQL, Bash
  - AI/ML: PyTorch, Lightning, MLflow, Scikit-Learn, Optuna, PyCaret
  - LLM/NLP: Azure OpenAI, LangChain, RAG, HuggingFace, CrewAI, PydanticAI, LangGraph
  - Data: Databricks, Pandas, Polars, NumPy, Dask, PySpark, Pinecone, Weaviate, Chroma, PostgreSQL, Redis, DSPy
  - MLOps: Azure, Docker, CI/CD, GitHub Actions
  - APIs: FastAPI, Flask
  - CV: OpenCV, Open3D, Scikit-Image, Shapely
Large Language Models Consultant — Vortigo (Nov 2023 — Jun 2024) — Porto Alegre, Brazil (Remote)
- As LLM consultant, I collaborated with a Brazilian tech company on the design and implementation of assistant chatbots informed by proprietary source code and spreadsheet knowledge bases, leveraging OpenAI's paid API and pretrained Large Language Models such as GPT-3.5 and GPT-4.
- Tech stack:
  - Languages & Core: Python
  - AI & ML: PyTorch, PyTorch Lightning, Pandas, Scikit-Learn, NumPy, Jupyter Notebooks, AWS Sagemaker
  - LLM & NLP: OpenAI API, HuggingFace, LangChain, BERTopic
Sabbatical Period — To focus on my generative art projects (Apr 2023 — Oct 2023) — Porto Alegre, Brazil
- I took a short sabbatical period to focus on my generative art projects and to maintain and improve existing Python packages I had built to help me in my artistic process, including prettymaps.
Machine Learning / Computer Vision Consultant — ConstructIN (Mar 2022 — Apr 2023) — Porto Alegre, Brazil (Remote)
- As a senior ML & Computer Vision consultant, I spearheaded the design and implementation of advanced computer vision solutions for automated construction site monitoring using 360-degree photography, leveraging state-of-the-art deep learning architectures.
- Led a team of 3 data scientists in developing and deploying 4 production-ready computer vision applications, including a comprehensive analytics dashboard that enabled real-time construction progress monitoring through custom semantic segmentation models. The solution significantly improved project tracking efficiency and decision-making capabilities for clients.
- Tech stack:
  - Languages & Core: Python
  - AI & ML: TensorFlow, Keras, PyTorch, PyTorch Lightning, Pandas, Scikit-Learn, SciPy, NumPy, Matplotlib
  - Computer Vision: Open3D, OpenCV, Scikit-Image
  - MLOps & Infrastructure: AWS, Docker
  - APIs & Services: Flask, Django
Generative Art Teacher — Responsive Cities (Nov 2022) — Porto Alegre, Brazil
- Taught a course on generative art history & principles and on useful tools and libraries for creative coding
Senior Data Scientist — Condati (Nov 2021 — Dec 2023) — Menlo Park, California (Remote)
- As Senior Data Scientist, I led the redesign and optimization of ML solutions for digital marketing campaign bid strategies, achieving significant ROI improvements and meeting client KPIs through:
  - Implementation of advanced forecasting models and automated bidding systems
  - Development of robust monitoring and validation frameworks
  - Design of novel optimization algorithms for real-time bid adjustments
  - End-to-end MLOps pipeline implementation for model deployment and monitoring
- Successfully diagnosed and resolved critical performance issues, bringing model effectiveness back to target levels within one year.
- Tech stack:
  - Languages & Core: Python, Julia
  - AI & ML: PyTorch, TensorFlow.jl, Torch.jl, MLJ.jl, Flux.jl, Pandas, SciPy, NumPy
  - Data Engineering: MySQL
  - MLOps & Infrastructure: AWS Sagemaker
Senior AI Researcher & Project Lead - ML for Health — Samsung Research Brazil (Mar 2020 — Nov 2021) — Campinas, Brazil (Remote)
- Led the development of ML-powered health monitoring solutions for Samsung wearables, resulting in global implementation in the Galaxy Watch line.
- Designed robust data collection protocols and ML architectures for physiological signal analysis and health metric estimation.
- Developed memory-optimized, real-time health monitoring algorithms for resource-constrained wearable devices, ensuring high accuracy and efficiency.
- Deployed production models on Samsung Tizen OS, using custom Python-to-C transpilers and ONNX for efficient inference.
- Presented project outcomes directly to Samsung HQ, leading to worldwide adoption and impact.
- Led and mentored cross-functional teams of researchers and engineers, driving innovation in wearable health technology while meeting strict performance and resource constraints.
- Tech stack:
  - Languages & Core: Python, Julia, C/C++
  - AI & ML: TensorFlow, Keras, PyTorch, PyTorch Lightning, MLJ.jl, Flux.jl, Pandas, Scikit-Learn, SciPy, NumPy, Matplotlib
  - MLOps & Infrastructure: AWS Sagemaker
  - Deployment: C, ONNX, Custom Python-to-C transpilers, Samsung Tizen OS
Mid-level Data Scientist — Poatek IT Consulting (Jun 2019 — Mar 2020) — Porto Alegre, Brazil
- As a Data Scientist, I led multiple high-impact projects across different domains, delivering innovative solutions through:
  - Development of exact and heuristic algorithms for complex vehicle routing optimization
  - Implementation of computer vision and NLP pipelines for automated document processing and data extraction
  - Design of advanced NLP solutions for Named Entity Recognition and sentiment analysis
  - Creation of sophisticated credit risk modeling systems
  - Development of geospatial data analysis and visualization frameworks
- Successfully integrated various ML/DL technologies including CNNs, ensemble methods, and pre-trained language models to enhance solution performance.
- Tech stack:
  - Languages & Core: Python, Julia, C/C++
  - AI & ML: TensorFlow, Keras, PyTorch, PyTorch Lightning, Pandas, GeoPandas, Scikit-Learn, SciPy, NumPy, Matplotlib
  - Computer Vision: OpenCV, Scikit-Image
  - MLOps & Infrastructure: Docker
  - APIs & Services: Flask, Django
  - Optimization: JuMP, Google OR-Tools