Data Science Portfolio

I am a multi-disciplinary engineer and data scientist working at the intersection of cloud hardware, software, and supply chain systems.

I have hands-on experience spanning design conceptualization to datacenter operations. I especially enjoy building data and artificial intelligence systems that deliver tangible solutions to engineering and business challenges alike.

Analytics Skills: Data Mining, Statistical Analysis, ML/DL, DOE
Platforms: Azure Services, MLflow, Spark, Git/GitHub, Power BI, JMP
Languages: Python, SQL, Julia, KQL, R
Libraries: Pandas, PyTorch, scikit-learn, NumPy, Statsmodels, SciPy, Plotly
Extracurricular Roles: UCF Data Science External Advisory Board

Professional Projects

Gray Artificial Intelligence, Inc.

Statistical and Deep Learning Methods for Server Telemetry Analysis

Focus: Deep Learning, Time Series, Anomaly Detection, Server Validation
Platforms: Git/GitHub, MLflow, Azure (SQL Server, ML Studio, VMs)

Completed a master’s capstone project in collaboration with GrayAI, a startup applying deep learning and AI to optimize datacenter and manufacturing operations through server validation, power distribution, and preventative maintenance
Led a 4-member university team analyzing server memory hardware telemetry to model patterns, identify deviations from baseline operational behavior, and optimize validation testing
Built a PyTorch temporal convolutional network (TCN) autoencoder to analyze multivariate time series data and quantify anomalous server behavior over windows as short as 100 milliseconds, reducing reconstruction loss from 28% to 0.25%
Delivered a GitHub repository of scripts, notebooks, and reports to the client, along with an MLflow setup for logging experiments, model parameters, and artifacts
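The TCN autoencoder itself is not reproduced here. As a hedged illustration of the downstream step, the sketch below scores telemetry windows by reconstruction error and flags those whose error exceeds a threshold fitted on known-good baseline windows. The 3-sigma rule, the function names, and the toy error values are assumptions for illustration, not the project's actual method.

```python
import statistics


def reconstruction_error(window: list[float], reconstructed: list[float]) -> float:
    """Mean squared error between an input window and its reconstruction."""
    return statistics.fmean((a - b) ** 2 for a, b in zip(window, reconstructed))


def fit_threshold(baseline_errors: list[float], k: float = 3.0) -> float:
    """Assumed rule: mean + k * sample std of errors on known-good windows."""
    mu = statistics.fmean(baseline_errors)
    sigma = statistics.stdev(baseline_errors)
    return mu + k * sigma


def flag_anomalies(errors: list[float], threshold: float) -> list[int]:
    """Return indices of windows whose reconstruction error exceeds the threshold."""
    return [i for i, e in enumerate(errors) if e > threshold]


if __name__ == "__main__":
    # Toy per-window errors (hypothetical, not project data).
    baseline = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010]
    threshold = fit_threshold(baseline)
    print(flag_anomalies([0.011, 0.012, 0.050, 0.010], threshold))  # [2]
```

A fixed sigma multiple is only one option; percentile thresholds on the baseline error distribution are a common alternative when errors are heavily skewed.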

Project Video Summary

Microsoft Azure Hardware & Systems Infrastructure

(1) Server Chassis Production Qualification

Focus: Design of Experiments, Statistical Process Control, Material Testing
Platforms: JMP, Quantum XL

Applied statistical process control methods to trace subpar server chassis yield in supplier production to the use of corrosion-prone sheet metals
Assembled a team of subject matter experts from reliability engineering, quality engineering, and sourcing to align on a validation plan that balanced testing rigor with downstream customer commitments
Created a full-factorial sheet metal Design of Experiments (DOE) with temperature and humidity as factors to determine minimum material quality requirements
Contracted a third-party laboratory to conduct sheet metal testing and validate results, ultimately unblocking supplier production and generating $8 million in downstream savings
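A full-factorial design runs every combination of factor levels, so the run count is the product of the level counts times the number of replicates. The sketch below enumerates such a run matrix with `itertools.product`; the temperature and humidity levels, the replicate count, and the function name are hypothetical placeholders, since the levels used in the actual study are not stated here.

```python
from itertools import product


def full_factorial(factors: dict[str, list], replicates: int = 1) -> list[dict]:
    """Enumerate every combination of factor levels (a full-factorial design).

    Runs are returned in standard order (first factor varies slowest), each
    repeated `replicates` times; in practice run order is usually randomized.
    """
    names = list(factors)
    runs = []
    for levels in product(*(factors[name] for name in names)):
        run = dict(zip(names, levels))
        runs.extend(dict(run) for _ in range(replicates))
    return runs


# Hypothetical levels for illustration only.
factors = {
    "temperature_C": [25, 40, 60],
    "relative_humidity_pct": [50, 85],
}

design = full_factorial(factors, replicates=2)
print(len(design))  # 12  (3 levels * 2 levels * 2 replicates)
print(design[0])    # {'temperature_C': 25, 'relative_humidity_pct': 50}
```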

(2) Hardware Program Assignment Optimization

Focus: Data Governance, Data Mining, Regression Modeling, Visualization
Platforms: Azure (SQL Server, Data Studio, ML Studio, DevOps), Power BI

Managed end-to-end development of a Power BI tool to forecast and balance the workload of 400+ programs across 7 engineering teams, saving 2,400 labor hours annually
Defined process standards and implementation logic that gave data engineering collaborators clear requirements to execute against
Used SQL and Python to extract, clean, and model program development timelines, improving schedule forecast accuracy by 32% and enabling further workload balancing
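Forecast accuracy gains like the one above are often reported as a drop in an error metric such as mean absolute percentage error (MAPE). The sketch below fits an ordinary least-squares line predicting program duration from a single hypothetical feature (planned milestone count) and compares its MAPE with a naive mean-duration baseline. The feature, the toy numbers, and the choice of MAPE are assumptions; the model and metric behind the 32% figure are not specified here. Errors are computed in-sample for brevity; a real evaluation would score held-out programs.

```python
import statistics


def fit_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100 * statistics.fmean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Toy data (hypothetical): planned milestones vs. actual duration in weeks.
milestones = [4, 6, 8, 10, 12]
duration_wk = [10, 14, 19, 22, 27]

intercept, slope = fit_ols(milestones, duration_wk)
model_pred = [intercept + slope * m for m in milestones]
naive_pred = [statistics.fmean(duration_wk)] * len(duration_wk)

print(f"naive MAPE: {mape(duration_wk, naive_pred):.1f}%")
print(f"OLS MAPE:   {mape(duration_wk, model_pred):.1f}%")
```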

(3) NVIDIA H100 Repair Ticket Prioritization Hackathon

Focus: Regression / Ensemble Modeling, MLP Neural Networks
Platforms: Azure (Kusto, SQL Server, ML Studio)

Analyzed fault code and repair action data from 20,000+ datacenter repair tickets for NVIDIA H100 nodes
Created a prediction model to estimate mean time to repair (MTTR) and the repair duration of a given node from its fault codes
Used the model to prioritize the datacenter technician repair tickets with the greatest impact on node uptime
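One simple way to turn per-ticket repair-time predictions into a work order is sketched below. As a stand-in for the regression, ensemble, and MLP models named above, it predicts each ticket's duration as the historical mean for its fault code, then orders open tickets shortest-predicted-repair first. On a single technician queue, shortest-processing-time (SPT) ordering minimizes the sum of completion times, which here is the total time the affected nodes wait to come back online. The fault codes, hours, and function names are hypothetical, and which ordering rule the hackathon model actually fed is not stated here.

```python
from collections import defaultdict
from statistics import fmean


def mean_repair_hours(history: list[tuple[str, float]]) -> dict[str, float]:
    """Baseline model: average historical repair duration per fault code."""
    by_code: dict[str, list[float]] = defaultdict(list)
    for code, hours in history:
        by_code[code].append(hours)
    return {code: fmean(hours) for code, hours in by_code.items()}


def prioritize(open_tickets: list[tuple[str, str]], model: dict[str, float],
               default_hours: float) -> list[str]:
    """Order ticket IDs shortest-predicted-repair first (SPT rule).

    Fault codes missing from the model fall back to `default_hours`.
    """
    ranked = sorted(open_tickets, key=lambda t: model.get(t[1], default_hours))
    return [ticket_id for ticket_id, _ in ranked]


def total_downtime(durations: list[float]) -> float:
    """Sum of completion times when tickets are worked one after another."""
    elapsed, total = 0.0, 0.0
    for d in durations:
        elapsed += d
        total += elapsed
    return total


# Hypothetical history of (fault_code, repair_hours) and open (ticket_id, fault_code).
history = [("FAULT_A", 6.0), ("FAULT_A", 8.0), ("FAULT_C", 2.0),
           ("FAULT_B", 1.0), ("FAULT_B", 1.5), ("FAULT_C", 3.0)]
open_tickets = [("T1", "FAULT_A"), ("T2", "FAULT_B"), ("T3", "FAULT_C")]

model = mean_repair_hours(history)
order = prioritize(open_tickets, model, default_hours=fmean(h for _, h in history))
print(order)  # ['T2', 'T3', 'T1']
print(total_downtime([model[code] for _, code in open_tickets]))          # 26.0 (arrival order)
print(total_downtime(sorted(model[code] for _, code in open_tickets)))    # 15.75 (SPT order)
```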

(4) Rack Assembly Inspection Process Kaizen

Focus: Six Sigma, DMAIC, Data Analysis, Cycle Time Reduction
Platforms: JMP, Azure (SQL Server, Data Studio, ML Studio, DevOps)

Led consolidation and analysis of rack assembly inspection cycle time data, identifying bottlenecks and redundant review processes
Collaborated with the broader team to drive lean process improvements and streamline stakeholder communication, resulting in a 25% reduction in inspection cycle time
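A first pass at locating bottlenecks in consolidated cycle time data is to rank inspection steps by typical duration and by their share of the summed step times. The sketch below uses medians so a few outlier racks do not dominate, and also shows how a percent reduction such as the 25% figure above is computed. The step names, hour values, and function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def rank_bottlenecks(records: list[tuple[str, float]]) -> list[tuple[str, float, float]]:
    """Rank steps by median duration, slowest first.

    Returns (step, median_hours, share_of_summed_step_medians).
    """
    by_step: dict[str, list[float]] = defaultdict(list)
    for step, hours in records:
        by_step[step].append(hours)
    medians = {step: median(h) for step, h in by_step.items()}
    total = sum(medians.values())
    ranked = sorted(medians.items(), key=lambda kv: kv[1], reverse=True)
    return [(step, m, m / total) for step, m in ranked]


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100 * (before - after) / before


# Hypothetical (step, hours) observations across several racks.
records = [
    ("visual_check", 1.0), ("visual_check", 1.2), ("visual_check", 0.9),
    ("doc_review", 3.5), ("doc_review", 4.0), ("doc_review", 9.0),
    ("cable_audit", 2.0), ("cable_audit", 2.5), ("cable_audit", 2.2),
]
for step, m, share in rank_bottlenecks(records):
    print(f"{step:<13} median={m:.1f} h  share={share:.0%}")

print(percent_reduction(before=8.0, after=6.0))  # 25.0
```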

Personal Projects

(1) DJ Music Collection Segmentation

Focus: Dimensionality Reduction, Clustering, Classification, Metadata
Platforms: Spotify API, Jupyter Notebooks

Consolidated metadata for a personal collection of 3,000 tracks, combining existing features (e.g., tempo, key, runtime) with attributes queried from the Spotify API (e.g., instrumentalness, danceability, energy).
Implemented non-linear classification models (KNN, ensemble methods, multilayer perceptron) to assign genres to tracks missing one, improving accuracy by 31% and recall by 48% relative to linear methods.
Clustered tracks via K-Means and K-Medoids algorithms combined with Principal Component Analysis (PCA) of existing attributes.
Used the resulting track groupings to build new setlists and mixes for DJ performances.
Project Repository
Medium Article

K-Medoids Track Clustering

Track Clustering Visualization
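K-Medoids differs from K-Means in that each cluster center is an actual member of the data, so every cluster is anchored by a real track rather than an averaged point. The sketch below is a minimal alternating (Voronoi-iteration) K-Medoids in plain Python with Euclidean distance. It assumes features are already scaled to comparable ranges and omits the PCA step for brevity; the two-feature toy tracks and function names are hypothetical, and the project's own implementation may differ (for example, a PAM-style swap search or a library version).

```python
import math
import random

Point = tuple[float, ...]


def _assign(points: list[Point], medoids: list[int]) -> list[int]:
    """Label each point with the index (0..k-1) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: math.dist(p, points[medoids[j]]))
            for p in points]


def k_medoids(points: list[Point], k: int, max_iter: int = 100,
              seed: int = 0) -> tuple[list[int], list[int]]:
    """Alternating K-Medoids; returns (medoid_indices, labels)."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        labels = _assign(points, medoids)
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # keep the old medoid if its cluster emptied out
                new_medoids.append(medoids[j])
                continue
            # Member with the smallest total distance to the rest of its cluster.
            new_medoids.append(min(members, key=lambda c: sum(
                math.dist(points[c], points[i]) for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, _assign(points, medoids)


# Hypothetical scaled (tempo, energy) features for six tracks.
tracks = [(0.10, 0.20), (0.12, 0.25), (0.08, 0.18),
          (0.90, 0.85), (0.88, 0.92), (0.95, 0.80)]
medoids, labels = k_medoids(tracks, k=2)
print(medoids, labels)
```

Because medoids are real tracks, each cluster comes with a concrete representative that can be auditioned, which is one practical reason to prefer K-Medoids over K-Means when the clusters feed setlist building.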