GCP Serverless Bioinformatics Pipeline
Featured · Serverless Bioinformatics Pipeline · Cloud Architecture · 2024
Overview
A production-ready serverless bioinformatics pipeline on Google Cloud Platform that automatically processes FASTQ sequencing files, performs quality control analysis, and provides real-time visualizations. The system uses event-driven architecture to trigger automated QC processing when files are uploaded, stores metrics in BigQuery for analytics, and serves an interactive React dashboard for researchers to monitor data quality.
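As a rough illustration of the QC step described above, the sketch below computes a few per-file metrics (read count, GC content, mean Phred quality) from a FASTQ stream. It uses only the standard library rather than Biopython, assumes four-line records with Phred+33 quality encoding, and the function and field names (`read_fastq`, `compute_qc_metrics`, `gc_content_pct`, and so on) are hypothetical, not taken from the project.

```python
from io import StringIO
from typing import Dict, Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality_string) pairs from 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError("malformed FASTQ record")
        yield sequence, quality


def compute_qc_metrics(handle: TextIO, phred_offset: int = 33) -> Dict[str, float]:
    """Aggregate simple QC metrics over every read in the file."""
    reads = bases = gc = quality_sum = 0
    for sequence, quality in read_fastq(handle):
        reads += 1
        bases += len(sequence)
        upper = sequence.upper()
        gc += upper.count("G") + upper.count("C")
        # Assumption: qualities are ASCII-encoded with the given Phred offset.
        quality_sum += sum(ord(ch) - phred_offset for ch in quality)
    if bases == 0:
        return {"read_count": reads, "total_bases": 0,
                "gc_content_pct": 0.0, "mean_quality": 0.0}
    return {
        "read_count": reads,
        "total_bases": bases,
        "gc_content_pct": round(100 * gc / bases, 2),
        "mean_quality": round(quality_sum / bases, 2),
    }


if __name__ == "__main__":
    sample = "@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\n!!!!\n"
    print(compute_qc_metrics(StringIO(sample)))
```

Streaming one record at a time keeps memory use flat regardless of file size, which matters when the function runs inside a memory-limited serverless worker.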

Key Features
- Event-driven serverless architecture with Cloud Functions that automatically trigger QC analysis when FASTQ files are uploaded to Cloud Storage, eliminating manual processing steps.
- Interactive React dashboard with real-time metrics visualization, including quality trends, GC content distribution, and comprehensive file tables with sortable columns and search functionality.
- Scalable data pipeline using BigQuery for storing and querying QC metrics, enabling efficient analysis of large-scale sequencing datasets with standard SQL queries.
- Infrastructure as Code with Terraform for reproducible deployments, including GCS buckets, BigQuery datasets, Cloud Functions, and Cloud Run services with proper IAM configurations.
- FastAPI backend providing RESTful endpoints for metrics retrieval and signed URL generation for secure file uploads, with comprehensive error handling and CORS support.
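To make the upload-to-BigQuery flow in the list above more concrete, here is a minimal sketch of the logic a storage-triggered function might run: ignore non-FASTQ objects, then shape one metrics row for the uploaded file. The event keys `bucket`, `name`, and `size` follow the Cloud Storage object metadata format; everything else (`is_fastq`, `build_metrics_row`, the row field names, the accepted suffixes) is an assumption made for illustration, not the project's actual schema.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Assumption: these are the file extensions the pipeline treats as FASTQ.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def is_fastq(object_name: str) -> bool:
    """Return True when the uploaded object looks like a FASTQ file."""
    return object_name.lower().endswith(FASTQ_SUFFIXES)


def build_metrics_row(
    event: Dict[str, Any],
    metrics: Dict[str, Any],
    processed_at: Optional[datetime] = None,
) -> Optional[Dict[str, Any]]:
    """Combine upload metadata and QC metrics into one JSON-serializable row.

    Returns None for objects that are not FASTQ files so the caller can
    exit early without doing any work.
    """
    name = event.get("name", "")
    if not is_fastq(name):
        return None
    processed_at = processed_at or datetime.now(timezone.utc)
    return {
        "bucket": event["bucket"],
        "file_name": name,
        # Cloud Storage reports object size as a string.
        "size_bytes": int(event.get("size", 0)),
        "processed_at": processed_at.isoformat(),
        **metrics,
    }


if __name__ == "__main__":
    event = {"bucket": "example-bucket", "name": "run1/sample.fastq.gz", "size": "1024"}
    print(build_metrics_row(event, {"read_count": 2, "gc_content_pct": 50.0}))
```

In a deployed function, a row like this could be passed to the BigQuery client's `insert_rows_json` method; that call is left out here so the sketch runs without cloud credentials.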
Technologies
Frontend
React, TypeScript, Vite, TailwindCSS, Recharts, React Query
Backend
Python, FastAPI, Biopython
Cloud Services
Google Cloud Functions, Cloud Storage, BigQuery, Cloud Run
Infrastructure
Terraform, Docker, GitHub Actions