Data Engineer · Denver, CO

Sai Praneeth Vella

ETL & ELT Pipelines | Python & SQL | Spark · Kafka · dbt | AWS · GCP · Azure

Data Engineer with 2+ years of software engineering experience building scalable data pipelines, cloud-based data infrastructure, and real-time streaming systems. M.S. in Information & Communications Technology — University of Denver (Aug 2025).

2+
Years Experience
7–10
OTT Platforms Integrated
3
Cloud Platforms
50K+
Test Records Processed

Technical Skills

Programming & Query
Python · PySpark · Pandas · NumPy · SQL · JavaScript (ES6+)
Data Engineering
Apache Spark · Apache Kafka · dbt · ETL / ELT · Batch Processing · Real-Time Streaming
Cloud Platforms
AWS S3 · AWS Glue · AWS EMR · GCP BigQuery · GCP Dataflow · Azure Data Factory · Azure Synapse
Databases & Warehouses
Snowflake · Amazon Redshift · PostgreSQL · MySQL · MongoDB · BigQuery
Orchestration & DevOps
Apache Airflow · Docker · Git · CI/CD · GitHub · Agile / Scrum
Front-End & APIs
React.js · Next.js · HTML5 · CSS3 · REST APIs · Firebase · PubNub

Experience

YuppTV India Pvt. Ltd.
Software Engineer — Videograph.ai
Oct 2021 – Aug 2023
Videograph.ai — AI-powered video intelligence SaaS platform processing large-scale media metadata and streaming analytics for major Indian OTT broadcasters. · Hyderabad, India
  • Architected and deployed real-time data ingestion pipelines using Google Firebase and PubNub event streaming — powering analytics for 1K–10K active viewers and supporting data-driven decisions.
  • Drove successful data pipeline integrations with 7–10 major Indian OTT platforms, expanding Videograph.ai's client footprint and contributing to enterprise revenue growth.
  • Built end-to-end analytics dashboards in React.js / Next.js surfacing critical KPIs — buffering rates, viewer engagement, drop-off metrics — enabling business teams to identify monetization opportunities and reduce churn.
  • Optimized API response handling and implemented front-end caching strategies, improving platform responsiveness and directly enhancing user retention for client-facing analytics tools.
  • Standardized data integration documentation and API contracts across all client integrations, reducing partner onboarding time and cutting engineering support overhead.
  • Mentored 3–5 interns and junior engineers on data integration patterns and pipeline best practices, increasing sprint delivery capacity without additional headcount.
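The dashboard KPIs above (buffering rates, drop-off metrics) boil down to per-session aggregation over a player event stream. A minimal sketch of that aggregation, with an illustrative event shape (the field names `session_id`, `event`, and `duration_ms` are assumptions, not the production schema):

```python
from collections import defaultdict

def compute_playback_kpis(events):
    """Aggregate per-session playback KPIs from a stream of player events.

    Illustrative event shape: each event is a dict with a 'session_id',
    an 'event' type ('play', 'buffer', 'drop_off'), and a 'duration_ms'
    on buffer events.
    """
    sessions = defaultdict(lambda: {"plays": 0, "buffer_ms": 0, "dropped": False})
    for e in events:
        s = sessions[e["session_id"]]
        if e["event"] == "play":
            s["plays"] += 1
        elif e["event"] == "buffer":
            s["buffer_ms"] += e.get("duration_ms", 0)
        elif e["event"] == "drop_off":
            s["dropped"] = True

    total = len(sessions)
    return {
        "sessions": total,
        # share of sessions that buffered at least once
        "buffering_rate": sum(1 for s in sessions.values() if s["buffer_ms"] > 0) / total if total else 0.0,
        # share of sessions that ended in a drop-off
        "drop_off_rate": sum(1 for s in sessions.values() if s["dropped"]) / total if total else 0.0,
    }
```

In a real-time setup the same reducer would run over events arriving from the PubNub subscription rather than an in-memory list.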

Projects

PROJECT_01
Real-Time E-Commerce Data Pipeline & Analytics Platform
End-to-end streaming pipeline built locally with Docker — ingests simulated e-commerce events through Kafka, transforms with PySpark, and serves analytics via Snowflake + dbt.
Python · Apache Kafka · PySpark · Snowflake · dbt · AWS S3 · AWS EMR · Airflow
  • Kafka ingestion pipeline consuming simulated e-commerce events — validated end-to-end in local Docker environment
  • PySpark jobs benchmarked against Pandas/SQL — measurably faster on 50K–100K record datasets
  • Snowflake star schema with dbt — sub-second query performance on loaded test datasets
  • Airflow DAGs with retry logic and alerting — full pipeline runs successfully end-to-end locally
  • dbt schema + singular tests catching nulls, duplicate keys, and referential integrity issues upstream
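A minimal sketch of the simulated-event side of this pipeline: a generator producing e-commerce events and a validation pass that mirrors the downstream dbt `not_null`/`unique` checks. The event schema here is illustrative, not the project's exact contract; in the running pipeline the clean events would be serialized with `json.dumps` and published to the Kafka topic:

```python
import random
import uuid
from datetime import datetime, timezone

EVENT_TYPES = ["page_view", "add_to_cart", "purchase"]

def make_event(user_id: str) -> dict:
    """Generate one simulated e-commerce event (illustrative schema)."""
    return {
        "event_id": str(uuid.uuid4()),
        "user_id": user_id,
        "event_type": random.choice(EVENT_TYPES),
        "ts": datetime.now(timezone.utc).isoformat(),
    }

def validate_batch(events: list[dict]) -> list[dict]:
    """Drop events with null required fields or duplicate event_ids,
    mirroring the not_null/unique tests dbt applies downstream."""
    seen, clean = set(), []
    for e in events:
        # reject events missing any required field
        if not all(e.get(k) for k in ("event_id", "user_id", "event_type", "ts")):
            continue
        # reject duplicate primary keys
        if e["event_id"] in seen:
            continue
        seen.add(e["event_id"])
        clean.append(e)
    return clean
```

Catching these violations before the Kafka produce step keeps the dbt tests in the warehouse as a safety net rather than the first line of defense.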
PROJECT_02
Multi-Cloud Patient Health Data Warehouse & ETL Framework
ETL framework built with synthetic patient data (Faker) — consolidates multi-source records into a Redshift data warehouse with a fully documented multi-cloud orchestration architecture.
Python · PostgreSQL · MongoDB · Amazon Redshift · GCP Dataflow · Azure Data Factory · dbt
  • Python ETL consolidating 4 source schemas (PostgreSQL, MongoDB, REST APIs) into a single canonical model
  • Redshift star schema — validated query performance across fact & dimension tables on synthetic datasets
  • Multi-cloud blueprint: GCP Dataflow + Azure Data Factory — fully documented with pipeline diagrams
  • Field-level PII masking (name, DOB, SSN) using AWS KMS-aligned encryption patterns
  • GCP Pub/Sub alerting module — tested with simulated failure injection scenarios
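A sketch of the field-level PII masking step, using stdlib HMAC-SHA256 as a stand-in: a keyed hash yields a deterministic, irreversible token, so the same patient still joins across fact and dimension tables without exposing raw values. In the actual framework the key would be managed by a service such as AWS KMS; here it is passed in directly, and the field names are illustrative:

```python
import hashlib
import hmac

PII_FIELDS = {"name", "dob", "ssn"}  # illustrative field names

def mask_record(record: dict, key: bytes) -> dict:
    """Return a copy of `record` with PII fields replaced by keyed hashes.

    HMAC-SHA256 with a secret key is deterministic (same input, same
    token) but not reversible without the key, preserving joinability.
    """
    masked = dict(record)
    for field in PII_FIELDS & record.keys():
        token = hmac.new(key, str(record[field]).encode(), hashlib.sha256).hexdigest()
        masked[field] = token[:16]  # truncated token for readability
    return masked
```

Because masking happens at extract time, downstream Redshift tables never see raw identifiers, and rotating the key (via KMS in production) invalidates all prior tokens at once.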

Education

Master of Science — Information & Communications Technology
University of Denver (DU) · Denver, CO
August 2025 · Concentration: Software Design & Programming
Coursework: Database Systems, Cloud Computing, Big Data Analytics, Algorithms & Data Structures, Software Architecture, Data Modeling
Bachelor of Science — Electronics & Communication Technology
Vidya Jyothi Institute of Technology (VJIT) · Hyderabad, India
Aug 2017 – Aug 2021
Capstone: Smart Healthcare System — IoT-based patient vitals monitoring with real-time web dashboard. 🏆 2nd Place, Project Expo.

Contact

Let's build something great.

Open to mid-level Data Engineer roles. I bring hands-on project experience with Spark, Kafka, dbt, Snowflake, Redshift, and multi-cloud pipeline architecture across AWS, GCP, and Azure — backed by Google, AWS, and dbt certifications.