Data Engineer · Denver, CO

Sai Praneeth Vella

ETL & ELT Pipelines | Python & SQL | Spark · Kafka · dbt | AWS · GCP · Azure

Data Engineer with 2+ years of software engineering experience building scalable data pipelines, cloud-based data infrastructure, and real-time streaming systems. M.S. in Information & Communications Technology — University of Denver (Aug 2025).

2+
Years Experience
7–10
OTT Platforms Integrated
3
Cloud Platforms
50K+
Test Records Processed

Technical Skills

Programming & Query
Python · PySpark · Pandas · NumPy · SQL · JavaScript (ES6+)
Data Engineering
Apache Spark · Apache Kafka · dbt · ETL / ELT · Batch Processing · Real-Time Streaming
Cloud Platforms
AWS S3 · AWS Glue · AWS EMR · GCP BigQuery · GCP Dataflow · Azure Data Factory · Azure Synapse
Databases & Warehouses
Snowflake · Amazon Redshift · PostgreSQL · MySQL · MongoDB · BigQuery
Orchestration & DevOps
Apache Airflow · Docker · Git · CI/CD · GitHub · Agile / Scrum
Front-End & APIs
React.js · Next.js · HTML5 · CSS3 · REST APIs · Firebase · PubNub

Experience

YuppTV India Pvt. Ltd.
Software Engineer — Videograph.ai
Oct 2021 – Aug 2023
Videograph.ai — AI-powered video intelligence SaaS platform processing large-scale media metadata and streaming analytics for major Indian OTT broadcasters. · Hyderabad, India
  • Architected and deployed real-time data ingestion pipelines using Google Firebase and PubNub event streaming — powering analytics for 1K–10K active viewers and supporting data-driven decisions.
  • Drove successful data pipeline integrations with 7–10 major Indian OTT platforms, expanding Videograph.ai's client footprint and contributing to enterprise revenue growth.
  • Built end-to-end analytics dashboards in React.js / Next.js surfacing critical KPIs — buffering rates, viewer engagement, drop-off metrics — enabling business teams to identify monetization opportunities and reduce churn.
  • Optimized API response handling and implemented front-end caching strategies, improving platform responsiveness and directly enhancing user retention for client-facing analytics tools.
  • Standardized data integration documentation and API contracts across all client integrations, reducing partner onboarding time and cutting engineering support overhead.
  • Mentored 3–5 interns and junior engineers on data integration patterns and pipeline best practices, increasing sprint delivery capacity without additional headcount.
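The dashboard KPIs above (buffering rates, drop-off metrics) boil down to per-session aggregation over a player event stream. A minimal sketch of that aggregation, with an illustrative event shape (the field names `session_id`, `event`, and `duration_ms` are assumptions, not the production schema):

```python
from collections import defaultdict

def compute_playback_kpis(events):
    """Aggregate per-session playback KPIs from a stream of player events.

    Illustrative event shape: each event is a dict with a 'session_id',
    an 'event' type ('play', 'buffer', 'drop_off'), and a 'duration_ms'
    on buffer events.
    """
    sessions = defaultdict(lambda: {"plays": 0, "buffer_ms": 0, "dropped": False})
    for e in events:
        s = sessions[e["session_id"]]
        if e["event"] == "play":
            s["plays"] += 1
        elif e["event"] == "buffer":
            s["buffer_ms"] += e.get("duration_ms", 0)
        elif e["event"] == "drop_off":
            s["dropped"] = True

    total = len(sessions)
    return {
        "sessions": total,
        # share of sessions that buffered at least once
        "buffering_rate": sum(1 for s in sessions.values() if s["buffer_ms"] > 0) / total if total else 0.0,
        # share of sessions that ended in a drop-off
        "drop_off_rate": sum(1 for s in sessions.values() if s["dropped"]) / total if total else 0.0,
    }
```

In a real-time setup the same reducer would run over events arriving from the PubNub subscription rather than an in-memory list.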

Projects

PROJECT_01
Real-Time E-Commerce Data Pipeline & Analytics Platform
End-to-end streaming pipeline built locally with Docker — ingests simulated e-commerce events through Kafka, transforms with PySpark, and serves analytics via Snowflake + dbt.
Python · Apache Kafka · PySpark · Snowflake · dbt · AWS S3 · AWS EMR · Airflow
  • Kafka ingestion pipeline consuming simulated e-commerce events — validated end-to-end in local Docker environment
  • PySpark jobs benchmarked against Pandas/SQL — measurably faster on 50K–100K record datasets
  • Snowflake star schema with dbt — sub-second query performance on loaded test datasets
  • Airflow DAGs with retry logic and alerting — full pipeline runs successfully end-to-end locally
  • dbt schema + singular tests catching nulls, duplicate keys, and referential integrity issues upstream
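A minimal sketch of the simulated-event side of this pipeline: a generator producing e-commerce events and a validation pass that mirrors the downstream dbt `not_null`/`unique` checks. The event schema here is illustrative, not the project's exact contract; in the running pipeline the clean events would be serialized with `json.dumps` and published to the Kafka topic:

```python
import random
import uuid
from datetime import datetime, timezone

EVENT_TYPES = ["page_view", "add_to_cart", "purchase"]

def make_event(user_id: str) -> dict:
    """Generate one simulated e-commerce event (illustrative schema)."""
    return {
        "event_id": str(uuid.uuid4()),
        "user_id": user_id,
        "event_type": random.choice(EVENT_TYPES),
        "ts": datetime.now(timezone.utc).isoformat(),
    }

def validate_batch(events: list[dict]) -> list[dict]:
    """Drop events with null required fields or duplicate event_ids,
    mirroring the not_null/unique tests dbt applies downstream."""
    seen, clean = set(), []
    for e in events:
        # reject events missing any required field
        if not all(e.get(k) for k in ("event_id", "user_id", "event_type", "ts")):
            continue
        # reject duplicate primary keys
        if e["event_id"] in seen:
            continue
        seen.add(e["event_id"])
        clean.append(e)
    return clean
```

Catching these violations before the Kafka produce step keeps the dbt tests in the warehouse as a safety net rather than the first line of defense.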
PROJECT_02
Multi-Cloud Patient Health Data Warehouse & ETL Framework
ETL framework built with synthetic patient data (Faker) — consolidates multi-source records into a Redshift data warehouse with a fully documented multi-cloud orchestration architecture.
Python · PostgreSQL · MongoDB · Amazon Redshift · GCP Dataflow · Azure Data Factory · dbt
  • Python ETL consolidating 4 source schemas (PostgreSQL, MongoDB, REST APIs) into a single canonical model
  • Redshift star schema — validated query performance across fact & dimension tables on synthetic datasets
  • Multi-cloud blueprint: GCP Dataflow + Azure Data Factory — fully documented with pipeline diagrams
  • Field-level PII masking (name, DOB, SSN) using AWS KMS-aligned encryption patterns
  • GCP Pub/Sub alerting module — tested with simulated failure injection scenarios
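A sketch of the field-level PII masking step, using stdlib HMAC-SHA256 as a stand-in: a keyed hash yields a deterministic, irreversible token, so the same patient still joins across fact and dimension tables without exposing raw values. In the actual framework the key would be managed by a service such as AWS KMS; here it is passed in directly, and the field names are illustrative:

```python
import hashlib
import hmac

PII_FIELDS = {"name", "dob", "ssn"}  # illustrative field names

def mask_record(record: dict, key: bytes) -> dict:
    """Return a copy of `record` with PII fields replaced by keyed hashes.

    HMAC-SHA256 with a secret key is deterministic (same input, same
    token) but not reversible without the key, preserving joinability.
    """
    masked = dict(record)
    for field in PII_FIELDS & record.keys():
        token = hmac.new(key, str(record[field]).encode(), hashlib.sha256).hexdigest()
        masked[field] = token[:16]  # truncated token for readability
    return masked
```

Because masking happens at extract time, downstream Redshift tables never see raw identifiers, and rotating the key (via KMS in production) invalidates all prior tokens at once.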

Education

Master of Science — Information & Communications Technology
University of Denver (DU) · Denver, CO
August 2025 · Concentration: Software Design & Programming
Coursework: Database Systems, Cloud Computing, Big Data Analytics, Algorithms & Data Structures, Software Architecture, Data Modeling
Bachelor of Science — Electronics & Communication Technology
Vidya Jyothi Institute of Technology (VJIT) · Hyderabad, India
Aug 2017 – Aug 2021
Capstone: Smart Healthcare System — IoT-based patient vitals monitoring with real-time web dashboard. 🏆 2nd Place, Project Expo.

Contact

Let's build something great.

Open to mid-level Data Engineer roles. I bring hands-on project experience with Spark, Kafka, dbt, Snowflake, Redshift, and multi-cloud pipeline architecture across AWS, GCP, and Azure — backed by Google, AWS, and dbt certifications.