01
About Me · The Person Behind The Work

Data Scientist
& AI Engineer

I'm Simarjit — a 24-year-old Data Scientist and AI/ML Engineer currently pursuing my MSc in Data Science, AI & Digital Business at GISMA University in Potsdam, Germany.

I build end-to-end ML pipelines, GenAI applications, and data products — from raw data ingestion through to deployed, production-ready systems. My work sits at the intersection of machine learning, LLMs, and business intelligence.

Before Berlin, I spent 2+ years at AISECT as a Data Analyst and Python Instructor, where I trained 50+ students in Python and ML. The full story is in Experience below.

50+

Students Trained

9

Projects Shipped

2+ yrs

Industry Experience

0

Certifications

Tools & Stack

Python · PyTorch · TensorFlow · HuggingFace · scikit-learn · SciPy · Statsmodels · OpenCV · FastAPI · LangChain · Docker · AWS EC2 · SQL · Power BI · Plotly · Pandas · NumPy · Seaborn

Quick Facts

Based In: Berlin, Germany
University: GISMA, Potsdam
Degree: MSc Data Science & AI
Available: Full-Time Roles
02
Selected Works · 9 Projects
BookVault
01 · 2026

Dual-mode book discovery across a corpus of 10,000+ titles — natural language search via a custom RAG pipeline with semantic embeddings, plus OCR-based cover recognition from photo input.

Python · RAG · Docker · OCR
View on GitHub ↗
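
The retrieval half of a RAG pipeline usually comes down to ranking stored embeddings by similarity to a query embedding. The following is a minimal sketch of that idea, not the BookVault implementation: the three-dimensional vectors, the placeholder titles, and the `top_k` helper are all hypothetical, and a real system would get its embeddings from a model and likely use a vector index.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, index, k=2):
    """Return the k titles whose embeddings are most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:k]]


# Hypothetical toy embeddings; real ones would have hundreds of dimensions.
toy_index = {
    "sci_fi_title_a": [0.9, 0.1, 0.0],
    "sci_fi_title_b": [0.8, 0.3, 0.1],
    "romance_title_c": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.2, 0.0], toy_index, k=2)
print(results)
```

The retrieved titles would then be passed to the language model as context for answering the natural language query.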
E-Commerce Customer Behaviour
02 · 2026

EDA and hypothesis testing on a 4-table e-commerce dataset (10K customers, 45K transactions, 70K sessions) — 10 statistical tests (t-test, chi-squared, Pearson r), Cohen's d effect sizes, and business insights on discount impact, acquisition channels, and revenue trends.

Python · Pandas · SciPy · Plotly · SQL · Seaborn
View on GitHub ↗
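
Cohen's d reports how large a difference between two groups is, in units of their pooled standard deviation, which complements the p-value from a t-test. Below is a small sketch of the calculation using only the standard library. The order values are made up for illustration and are not taken from the project dataset.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical order values for two customer groups.
with_discount = [52.0, 61.0, 58.0, 49.0, 65.0]
without_discount = [45.0, 50.0, 47.0, 44.0, 54.0]
d = cohens_d(with_discount, without_discount)
print(f"Cohen's d = {d:.2f}")
```

By the usual rule of thumb, values around 0.2, 0.5, and 0.8 are read as small, medium, and large effects.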
Fake News Detection
03 · 2026

Three-model NLP comparison on WELFake — 72,134 news articles across 4 benchmark sources. TF-IDF feature extraction with Logistic Regression (96.22%), Random Forest (95.09%), and ANN (96.12%). Live prediction demo flags fabricated content at 99.97% confidence.

Python · TF-IDF · scikit-learn · TensorFlow · NLP
View on GitHub ↗
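
TF-IDF weights a term by how often it appears in one document and discounts it by how many documents contain it, so common words contribute less to a classifier than distinctive ones. The sketch below is a simplified standard-library version for illustration only. It is not the project code, the headlines are invented, and a library vectoriser would also handle tokenisation and normalisation more carefully.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    Weight = relative term frequency * smoothed inverse document frequency.
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Hypothetical headlines, not samples from WELFake.
docs = [
    "breaking shocking claim about vaccine",
    "council publishes report about budget",
    "shocking secret about celebrity",
]
vectors = tfidf_vectors(docs)
print(vectors[0])
```

In the first headline, "about" occurs in every document and therefore gets a lower weight than "vaccine", which occurs in only one.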
LCF Household Expenditure
04 · 2026

Weighted least squares regression on UK Living Costs & Food Survey 2013 (~4,000 households) — models household expenditure against occupational class, tenure type, and household composition using survey weights and HC3 robust standard errors. Validated across 9 robustness appendices.

Python · WLS · Statsmodels · SciPy · Pandas
View on GitHub ↗
Adani Ports Stock Prediction
05 · 2025

Time-series analysis of 5 years of daily ADANIPORTS equity data — built ARIMA and LSTM models, compared forecast accuracy (RMSE), and surfaced 3 actionable market trend signals.

Python · Pandas · Time-series
View on GitHub ↗
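
Comparing forecasts by RMSE means scoring each model's predictions against the same held-out prices and preferring the lower error. Here is a minimal sketch of that comparison. The price series and the two forecasts are invented numbers, and `model_a` and `model_b` are placeholder names rather than the project's ARIMA and LSTM outputs.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical held-out closing prices and two competing forecasts.
actual = [100.0, 102.0, 101.0, 105.0]
forecast_a = [101.0, 101.0, 102.0, 104.0]
forecast_b = [98.0, 104.0, 99.0, 107.0]

scores = {
    "model_a": rmse(actual, forecast_a),
    "model_b": rmse(actual, forecast_b),
}
best = min(scores, key=scores.get)
print(scores, best)
```

Because RMSE squares errors before averaging, it penalises a few large misses more heavily than many small ones, which matters for volatile price series.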
Cyber Threat Detection
06 · 2025

End-to-end NLP classification pipeline across 15+ threat categories — cut analyst triage time by 40% (~3h/day per analyst). Deployed as a production REST API with FastAPI.

PyTorch · NLP · FastAPI
View on GitHub ↗
EU Fossil Fuel Subsidies
07 · 2025

Interactive policy dashboard across 27 EU member states — visualising €110B+ in annual fossil fuel subsidies with country-level drill-down. Built for policy researchers and journalists.

Python · Plotly · Pandas
View on GitHub ↗
Tomato Disease Detection
08 · 2025

Custom CNN trained on 54,000 images across 38 classes — achieved 98% test accuracy, then pruned and quantised for mobile deployment at <10MB model size.

TensorFlow · OpenCV · CNN
View on GitHub ↗
Face Recognition Attendance
09 · 2023

Real-time attendance system recognising 50+ registered faces simultaneously from a live camera feed — eliminated 100% of manual roll-call entry for a 200-student cohort.

OpenCV · face_recognition · Python
View on GitHub ↗
03
Experience · 2 Roles

01

July 2023 – September 2024

AISECT · Chhattisgarh, India

Junior Data Analyst & Python Instructor

  • Delivered 200+ hours of Python training (Pandas, NumPy, SQL, data visualization) to 50+ students
  • Built Python scripts and Jupyter Notebooks for data cleaning, EDA, and predictive modelling across 5+ ML pilot projects
  • Designed analytics curriculum covering ETL pipelines, feature engineering, statistical analysis, and SQL
  • Built Excel and Google Sheets dashboards to track student performance, generating insights that improved pass rates
  • Managed Git-based curriculum workflows for 50+ students, improving version control and reproducibility

02

July 2022 – December 2022

AISECT · Chhattisgarh, India

Python Intern

  • Assisted in Python lab sessions: debugging scripts and mentoring students
  • Built automated grading scripts and reporting pipelines — saved instructors 10+ hours/week
  • Contributed Python examples, data analysis case studies, and visualization workflows
04
Education · Academic Background

September 2024 – Present

In Progress

MSc Data Science, AI & Digital Business

GISMA University of Applied Sciences

Potsdam, Germany

July 2020 – June 2023

Graduated

Bachelor of Computer Applications

Kalinga University

Raipur, India

05
Skills & Certifications · 7 Categories

Technical Stack

Programming
Python · SQL · R · C++ · Google BigQuery · DAX
AI & Machine Learning
Pandas · NumPy · Scikit-learn · SciPy · Statsmodels · TensorFlow · Keras · PyTorch · HuggingFace Transformers · OpenCV · XGBoost · NLP · TF-IDF · Text Classification · ANN · CNN · RNN
LLMs & Generative AI
OpenAI API · LangChain · LlamaIndex · RAG Architecture · Streamlit · FastAPI
Cloud & DevOps
AWS EC2 · Google Cloud · Docker · Linux · Git · CI/CD · GitHub Actions · Cloudflare · Nginx
Data Engineering
ETL pipeline design · Feature engineering · EDA · Statistical Hypothesis Testing · Time-series analysis · Data validation · Hyperparameter tuning · Cross-validation
Business Intelligence
Power BI · Tableau · Matplotlib · Seaborn · Plotly · Excel · Google Sheets
Databases
MySQL · PostgreSQL · SQLite · MongoDB · BigQuery
06
Get In Touch · Open To Opportunities
Simarjit Singh

Data Scientist · Berlin

GitHub · LinkedIn · Xing
Available · Berlin