Available for PhD & freelance engagements

Muhammad Ahsan Shakeel

AI Researcher & Machine Learning Engineer.

MSc AI at LUMS researching inference-efficient LLM reasoning. I ship production AI systems — agentic pipelines, RAG, MLOps — and publish peer-reviewed work on the side.

Lahore, Pakistan · LUMS · MSc AI '26 · IEEE PIMRC 2026

Current research

Failure-Mode-Guided Directional Feedback for LLM reasoning

About

Research-grade rigour, production-grade delivery.

I'm an MSc Artificial Intelligence candidate at the Lahore University of Management Sciences, working with Dr. Naveed-ul-Hassan on inference-efficient LLM reasoning. My thesis introduces FMGDF — a three-pass agentic pipeline that uses failure-mode taxonomies to maximise accuracy under tight call-count and token budgets.

Alongside research I teach 300+ undergraduates at the University of Lahore and ship LLM systems for industry — RAG, agentic workflows, MLOps pipelines on AWS Bedrock and Kubernetes. I care about the boring parts: reproducibility, evaluation harnesses, cost per call, and graceful failure.

"Most LLM gains come from disciplined feedback loops, not bigger models."

For PhD committees

Research & publications

Focused on LLM reasoning, agentic systems, and the physics of learning under constraint. Actively seeking PhD positions starting Fall 2026.

Manuscript in preparation · 2026

Directional Feedback Guided by Failure Modes for Inference-Efficient LLM Reasoning in Telecom Mathematics

M. A. Shakeel, N. U. Hassan

IEEE PIMRC · 2026

Intent-Driven LLM-Based Two-Level Control in AI-Native 6G Networks

Z. Hussain*, M. A. Shakeel*, N. U. Hassan, H. Chen, C. Yuen

Submitted, IEEE Proceedings · 2025

Noise Resilience of Quantum Support Vector Machines: Feature Map Architectures for NISQ Deployment

M. A. Shakeel et al.

Research experience

Sept 2024 — Present

LUMS, EE Department

Graduate Research Assistant — LLM Reasoning & Inference Efficiency

  • Designed FMGDF, a three-pass agentic inference pipeline (critic LLM, failure-mode taxonomy, diversity prefix) maximising accuracy under joint call-count and token budgets.
  • Benchmarked AWS Bedrock (Claude 3 Sonnet, Llama 3.1-70B) against self-hosted OSS models behind a unified Python interface — model swaps via config change only.
  • Tracked experiments with MLflow and built an automated evaluation harness — cut manual reporting time by 70%.
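
The budget-guarded, multi-pass idea in the first bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the `Budget` class, the `FAILURE_HINTS` taxonomy entries, the whitespace token count, and the `solver` / `critic` callables are all hypothetical stand-ins, and the diversity prefix is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical taxonomy: failure-mode label -> directional hint for the retry.
FAILURE_HINTS = {
    "arithmetic_slip": "Recompute each numeric step explicitly.",
    "unit_mismatch": "Convert all quantities to consistent units first.",
}


@dataclass
class Budget:
    """Joint cap on the number of model calls and on total tokens."""
    max_calls: int
    max_tokens: int
    calls: int = 0
    tokens: int = 0

    def can_afford(self, est_tokens: int) -> bool:
        return (self.calls < self.max_calls
                and self.tokens + est_tokens <= self.max_tokens)

    def charge(self, used_tokens: int) -> None:
        self.calls += 1
        self.tokens += used_tokens


def count_tokens(text: str) -> int:
    # Crude whitespace proxy; a real tokenizer would replace this.
    return len(text.split())


def call(fn: Callable[[str], str], prompt: str, budget: Budget) -> Optional[str]:
    """Run one model call if the budget allows it, else return None."""
    if not budget.can_afford(count_tokens(prompt)):
        return None
    out = fn(prompt)
    budget.charge(count_tokens(prompt) + count_tokens(out))
    return out


def answer_with_feedback(question: str,
                         solver: Callable[[str], str],
                         critic: Callable[[str], str],
                         budget: Budget) -> Optional[str]:
    # Pass 1: initial attempt.
    draft = call(solver, question, budget)
    if draft is None:
        return None
    # Pass 2: critic names a failure mode (or anything outside the taxonomy).
    label = call(critic, f"Question: {question}\nAnswer: {draft}", budget)
    hint = FAILURE_HINTS.get((label or "").strip())
    if hint is None:
        return draft  # no known failure mode, or the budget ran out
    # Pass 3: retry with the hint; fall back to the draft if unaffordable.
    revised = call(solver, f"{question}\nHint: {hint}", budget)
    return revised if revised is not None else draft
```

With stub callables, `answer_with_feedback` spends at most three calls per question and falls back to the first draft whenever the budget is exhausted.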

2025

LUMS, EE Department (PIMRC 2026)

Research Contributor — LLM-Driven 6G Network Control

  • Co-designed intent-driven two-level control with an LLM upper layer refining Lyapunov parameters via structured feedback.
  • Validated across GPT-4.5, GPT-OSS-120B, and Llama-3.3-70B with full integration tests on every API contract.

2024 — 2025

LUMS, Physics Department

Research Contributor — Quantum Machine Learning

  • Ran 52 controlled experiments across four QSVM feature-map architectures and three noise types.
  • Amplitude-inspired encoding sustained 100% accuracy up to noise level 0.10 — a 10× gain over the next-best approach.

For founders & teams

Freelance services

Engagements range from rapid LLM prototypes to production MLOps overhauls. Available for contract, retainer, or advisory.

01

LLM Agents & RAG Systems

Multi-turn agentic workflows with LangChain, Bedrock Agents, FAISS / Chroma vector stores, tool-use, and Knowledge Bases.

  • Production agent pipelines
  • Vector store + ingestion
  • Tool & function-calling design
02

MLOps & Deployment

From notebook to production: Docker, Kubernetes, GitHub Actions CI/CD, MLflow tracking, DVC versioning, AWS deployment.

  • Reproducible training pipelines
  • CI/CD for models & prompts
  • Monitoring & rollback strategy
03

Fine-Tuning & Model Evaluation

PyTorch fine-tuning, structured A/B experiments, bias audits, calibration analysis, and responsible-AI reporting.

  • Domain-adapted models
  • Evaluation harness
  • Compliance documentation
04

FastAPI Microservices

Async REST endpoints, OpenAPI schemas, and provider-agnostic backends so model swaps require only a config change.

  • Production API service
  • OpenAPI contract
  • Container + deployment
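
A config-only model swap could look roughly like the sketch below. It is framework-independent (the FastAPI layer is omitted), and every name in it is hypothetical: `TextBackend`, `EchoBackend`, `UpperBackend`, `REGISTRY`, and `load_backend` are illustrative stand-ins, where real backends would wrap a hosted API or a self-hosted model.

```python
import json
from typing import Callable, Dict, Protocol


class TextBackend(Protocol):
    """Anything that turns a prompt into text can act as a backend."""
    def generate(self, prompt: str) -> str: ...


class EchoBackend:
    """Hypothetical stand-in for a hosted-API client."""
    def __init__(self, prefix: str = "echo") -> None:
        self.prefix = prefix

    def generate(self, prompt: str) -> str:
        return f"{self.prefix}: {prompt}"


class UpperBackend:
    """Hypothetical stand-in for a self-hosted model."""
    def generate(self, prompt: str) -> str:
        return prompt.upper()


# The service code only ever sees this registry and the config text.
REGISTRY: Dict[str, Callable[..., TextBackend]] = {
    "echo": EchoBackend,
    "upper": UpperBackend,
}


def load_backend(config_text: str) -> TextBackend:
    """Build the backend named in a JSON config string."""
    cfg = json.loads(config_text)
    name = cfg["backend"]
    if name not in REGISTRY:
        raise ValueError(
            f"unknown backend {name!r}; expected one of {sorted(REGISTRY)}"
        )
    return REGISTRY[name](**cfg.get("options", {}))
```

Switching from `{"backend": "echo"}` to `{"backend": "upper"}` changes the model without touching the calling code, which is the property the endpoint layer relies on.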
05

Computer Vision & Classical ML

Image classification, tabular ML (XGBoost, ensembles), imbalanced-data techniques, ONNX export, REST serving.

  • Trained & exported model
  • Inference endpoint
  • Validation report
06

Data Pipelines & ETL

Airflow DAGs, Spark transformations, Hadoop/Hive batch processing, Dask for distributed workloads.

  • Scheduled DAGs
  • Distributed ETL
  • Data quality checks
07

Research Collaboration

Co-author on ML / LLM / quantum-ML papers — problem framing, experimental design, ablations, LaTeX manuscript prep, and rebuttal support for IEEE / NeurIPS-tier venues.

  • Co-authored manuscript
  • Reproducible experiment repo
  • Reviewer-response pack
08

Research Assistant (Remote)

Literature reviews, dataset curation, baseline reproductions, and end-to-end experiment running for PhD students, labs, and independent researchers.

  • Annotated lit review
  • Curated dataset + scripts
  • Weekly progress reports
09

Virtual Assistant — Technical

Long-term remote support for founders and academics: inbox triage, calendar and travel, manuscript formatting, slide decks, data entry, and light automation in Python / Zapier.

  • Inbox & calendar SOPs
  • Polished decks & docs
  • Automation scripts

Selected work

Projects worth talking about

LLM Agent Orchestration

Multi-agent workflow with LangChain tool use, REST calls for real-time data, deployed as an AWS microservice.

LangChain · AWS · Agents

Solar Inverter Control Interface

Designed and built a touch-friendly monitoring & control UI for a solar inverter — real-time telemetry (V, I, kWh, battery SOC), fault alerts, and Modbus/MQTT backend integration.

Embedded UI · MQTT · Modbus

Naruto Jutsu Recognition (CV)

Real-time hand-sign / mudra classifier using MediaPipe landmarks + a lightweight CNN over OpenCV — maps sequences to jutsu and overlays anime-style visual effects on the webcam feed.

OpenCV · MediaPipe · PyTorch

Melanoma Detection

Binary classifier on HAM10000 with Dask distributed computing. 92.81% accuracy. ONNX export + REST.

PyTorch · Dask · ONNX

Real Estate Voice Agent

End-to-end agentic assistant with RAG over FAISS and Chroma for property search and legal Q&A.

RAG · FAISS · FastAPI

Bankruptcy Prediction

LR, RF, XGBoost, GBM compared with SMOTE on imbalanced finance data. Full lifecycle in MLflow.

XGBoost · MLflow · SMOTE

Big Data Pipeline

ETL on HDFS + Hive across a multi-node Hadoop cluster, extended with Apache Spark for analytics.

Hadoop · Hive · Spark

Trajectory

Experience

Jun 2024 — Present

University of Lahore

Junior Lecturer, CS & IT

  • Teach Programming Fundamentals and OOP to 300+ undergraduates per semester.
  • Designed a hands-on lab on fine-tuning small LMs and deploying via FastAPI to cloud.
  • Introduced an applied AI-ethics module on bias auditing and responsible generative AI.

Jul 2024 — Feb 2026

Phoenux Design (Remote)

AI & Web Projects Associate

  • Built agentic workflows on LangChain + Bedrock; containerised with Docker, orchestrated on Kubernetes — cut p95 latency by 35%.
  • Implemented DVC + MLflow so every deployed model was reproducible from a single config.
  • Set up GitHub Actions CI/CD, cutting deployment effort from hours to under 10 minutes.

2022 — 2024

Bugsfree Solutions · Phoenux Design

AI Research Intern / Web Designer

  • Fine-tuned a PyTorch text classifier, served it via FastAPI, and integrated it into a client SaaS product.
  • End-to-end feature engineering on 500k+ rows — F1 improved 18% via SMOTE + RFECV.
  • Airflow DAGs for nightly refresh and automated retraining; DVC for dataset versioning.

Toolkit

Technical stack

Generative AI & Agents

LLM fine-tuning · RAG · LangChain · AWS Bedrock · Bedrock Agents · FAISS · Chroma

ML & AI Frameworks

PyTorch · TensorFlow · Scikit-learn · XGBoost · Qiskit · CNNs · Transformers · ONNX

MLOps & Deployment

MLflow · DVC · Airflow · FastAPI · Docker · Kubernetes · GitHub Actions · AWS EC2 / ELB / SageMaker

Data & Distributed

Pandas · NumPy · Dask · Hadoop / HDFS · Hive · Apache Spark · ETL

Programming

Python · Java · C++ · SQL · Dart · LaTeX

Get in touch

Let's talk — research or build.

Whether you're recruiting for a PhD cohort or need an AI engineer for a shipping deadline, I read every message.

Messages are delivered straight to my inbox. Replies usually within 24 hours.