Data Scientist
& AI Engineer
I'm Simarjit — a 24-year-old Data Scientist and AI/ML Engineer currently pursuing my MSc in Data Science, AI & Digital Business at GISMA University in Potsdam, Germany.
I build end-to-end ML pipelines, GenAI applications, and data products — from raw data ingestion through to deployed, production-ready systems. My work sits at the intersection of machine learning, LLMs, and business intelligence.
Before moving to Germany, I spent 2+ years at AISECT as a Data Analyst and Python Instructor, where I trained 50+ students in Python and ML. Full story in Experience below.
50+
Students Trained
2+ yrs
Industry Experience
Dual-mode book discovery across a corpus of 10,000+ titles — natural language search via a custom RAG pipeline with semantic embeddings, plus OCR-based cover recognition from photo input.
EDA and hypothesis testing on a 4-table e-commerce dataset (10K customers, 45K transactions, 70K sessions) — 10 statistical tests (t-test, chi-squared, Pearson r), Cohen's d effect sizes, and business insights on discount impact, acquisition channels, and revenue trends.
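To illustrate the kind of effect-size reporting described above, here is a minimal sketch of Cohen's d alongside a Welch t statistic, using only the standard library. The two samples and the names `discounted` and `full_price` are hypothetical and are not taken from the project's dataset.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def welch_t(a, b):
    """Welch's t statistic, which does not assume equal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical order values (illustrative only, not project data).
discounted = [52.0, 61.5, 48.0, 70.2, 66.3, 58.9]
full_price = [45.1, 50.3, 47.8, 55.0, 49.6, 51.2]

d = cohens_d(discounted, full_price)
t = welch_t(discounted, full_price)
print(f"Cohen's d = {d:.2f}, Welch t = {t:.2f}")
```

Reporting d next to the test statistic separates "is the difference detectable" from "is the difference large enough to matter", which is the point of pairing the two.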
Three-model NLP comparison on WELFake — 72,134 news articles across 4 benchmark sources. TF-IDF feature extraction with Logistic Regression (96.22%), Random Forest (95.09%), and ANN (96.12%). Live prediction demo flags fabricated content at 99.97% confidence.
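For readers unfamiliar with the feature-extraction step, the sketch below shows a bare-bones TF-IDF weighting in plain Python. It is a simplified illustration, not the project's code: it uses unsmoothed `log(N / df)` weights and whitespace tokenisation, whereas library implementations typically add smoothing and normalisation. The example headlines are made up.

```python
from collections import Counter
from math import log


def tfidf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append(
            {term: (count / len(tokens)) * log(n_docs / df[term])
             for term, count in counts.items()}
        )
    return vectors


# Made-up headlines, for illustration only.
docs = [
    "markets rally on earnings",
    "aliens endorse markets",
    "markets close higher",
]
vectors = tfidf(docs)
print(vectors[1])
```

A term that appears in every document ("markets") receives zero weight, while a term unique to one document ("aliens") is weighted up. A classifier such as Logistic Regression then learns from these weights.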
Weighted least squares regression on UK Living Costs & Food Survey 2013 (~4,000 households) — models household expenditure against occupational class, tenure type, and household composition using survey weights and HC3 robust standard errors. Validated across 9 robustness appendices.
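As a rough illustration of how survey weights enter the fit, here is a minimal closed-form weighted least squares sketch for a single numeric predictor. It is a deliberate simplification: the project's model uses several categorical predictors and HC3 robust standard errors, neither of which is shown here, and the data below are hypothetical.

```python
def wls_fit(x, y, w):
    """Fit y = intercept + slope * x by minimising sum(w_i * residual_i ** 2)."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical households: size, weekly expenditure, survey weight.
household_size = [1, 2, 3, 4, 5]
expenditure = [210.0, 340.0, 455.0, 520.0, 640.0]
survey_weight = [1.8, 1.2, 1.0, 0.9, 0.6]

intercept, slope = wls_fit(household_size, expenditure, survey_weight)
print(f"intercept = {intercept:.1f}, slope = {slope:.1f}")
```

Households with larger weights pull the fitted line toward themselves, which is how the estimate is made to reflect the population rather than the sample.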
Time-series analysis of 5 years of daily ADANIPORTS equity data — built ARIMA and LSTM models, compared forecast accuracy (RMSE), and surfaced 3 actionable market trend signals.
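The RMSE comparison mentioned above can be sketched as follows. The forecasts here are invented placeholders named `model_a` and `model_b`; they are not ARIMA or LSTM output from the project.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


# Placeholder closing prices and forecasts (illustrative only).
actual = [100.0, 102.0, 101.0, 105.0]
model_a = [99.0, 103.0, 100.0, 104.0]
model_b = [100.0, 100.0, 101.0, 109.0]

scores = {"model_a": rmse(actual, model_a), "model_b": rmse(actual, model_b)}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Because RMSE squares the errors, a model with one large miss (here `model_b`) scores worse than one with several small misses, which is usually the desired behaviour for price forecasts.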
End-to-end NLP classification pipeline across 15+ threat categories — cut analyst triage time by 40% (~3h/day per analyst). Deployed as a production REST API with FastAPI.
Interactive policy dashboard across 27 EU member states — visualising €110B+ in annual fossil fuel subsidies with country-level drill-down. Built for policy researchers and journalists.
Custom CNN trained on 54,000 images across 38 classes — achieved 98% test accuracy, then pruned and quantised for mobile deployment at <10MB model size.
Real-time attendance system recognising 50+ registered faces simultaneously from a live camera feed — eliminated 100% of manual roll-call entry for a 200-student cohort.
01
July 2023 – September 2024
AISECT — Chhattisgarh, India
Junior Data Analyst & Python Instructor
- Delivered 200+ hours of Python training (Pandas, NumPy, SQL, data visualization) to 50+ students
- Built Python scripts and Jupyter Notebooks for data cleaning, EDA, and predictive modelling across 5+ ML pilot projects
- Designed analytics curriculum covering ETL pipelines, feature engineering, statistical analysis, and SQL
- Built Excel and Google Sheets dashboards to track student performance, generating insights that improved pass rates
- Managed Git-based curriculum workflows for 50+ students, improving version control and reproducibility
02
July 2022 – December 2022
AISECT — Chhattisgarh, India
Python Intern
- Assisted in Python lab sessions: debugging scripts and mentoring students
- Built automated grading scripts and reporting pipelines — saved instructors 10+ hours/week
- Contributed Python examples, data analysis case studies, and visualization workflows
September 2024 – Present
In Progress
MSc Data Science, AI & Digital Business
GISMA University of Applied Sciences
Potsdam, Germany
July 2020 – June 2023
Graduated
Bachelor of Computer Applications
Kalinga University
Raipur, India
