
GCP Serverless Bioinformatics Pipeline


Serverless Bioinformatics Pipeline · Cloud Architecture · 2024

Overview

A production-ready serverless bioinformatics pipeline on Google Cloud Platform that automatically processes FASTQ sequencing files, performs quality control analysis, and provides real-time visualizations. The system uses event-driven architecture to trigger automated QC processing when files are uploaded, stores metrics in BigQuery for analytics, and serves an interactive React dashboard for researchers to monitor data quality.
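As a rough illustration of the QC step described above, here is a minimal sketch that computes a few per-file metrics from a FASTQ stream. It is not the project's actual implementation: the project lists Biopython, while this sketch uses only the Python standard library, assumes four-line records with Phred+33 quality encoding, and the function name `fastq_qc_metrics` and the metric names are hypothetical.

```python
import io
from typing import Dict, TextIO


def fastq_qc_metrics(handle: TextIO) -> Dict[str, float]:
    """Compute simple QC metrics from a 4-line-per-record FASTQ stream.

    Assumes Phred+33 quality encoding (an assumption, not confirmed
    by the project description).
    """
    reads = 0
    bases = 0
    gc = 0
    qual_sum = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of stream
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, content ignored
        qual = handle.readline().strip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near read {reads + 1}")
        reads += 1
        bases += len(seq)
        gc += sum(1 for base in seq.upper() if base in "GC")
        qual_sum += sum(ord(ch) - 33 for ch in qual)  # Phred+33 offset
    return {
        "read_count": reads,
        "total_bases": bases,
        "gc_percent": 100.0 * gc / bases if bases else 0.0,
        "mean_quality": qual_sum / bases if bases else 0.0,
    }


if __name__ == "__main__":
    sample = "@r1\nGGCC\n+\nIIII\n@r2\nAATT\n+\n!!!!\n"
    print(fastq_qc_metrics(io.StringIO(sample)))
```

For gzip-compressed input, the same function could be fed a handle from `gzip.open(path, "rt")`.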

[Screenshot: GCP Serverless Bioinformatics Pipeline dashboard]

Key Features

  • Event-driven serverless architecture with Cloud Functions that automatically trigger QC analysis when FASTQ files are uploaded to Cloud Storage, eliminating manual processing steps.
  • Interactive React dashboard with real-time metrics visualization including quality trends, GC content distribution, and comprehensive file tables with sortable columns and search functionality.
  • Scalable data pipeline using BigQuery for storing and querying QC metrics, enabling efficient analysis of large-scale sequencing datasets with standard SQL queries.
  • Infrastructure as Code with Terraform for reproducible deployments, including GCS buckets, BigQuery datasets, Cloud Functions, and Cloud Run services with proper IAM configurations.
  • FastAPI backend providing RESTful endpoints for metrics retrieval and signed URL generation for secure file uploads, with comprehensive error handling and CORS support.
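The event-driven flow in the first feature can be sketched roughly as follows. This is a hedged outline, not the project's code: it assumes the upload event carries the `bucket` and `name` fields of a Cloud Storage object, and the helper names (`is_fastq`, `handle_upload`), the accepted file suffixes, and the row fields are all hypothetical. The metric computation and the BigQuery write are passed in as callables so the routing logic can be exercised without any cloud dependencies.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional

# Hypothetical set of suffixes treated as FASTQ uploads.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def is_fastq(object_name: str) -> bool:
    """Return True if the object name looks like a FASTQ file."""
    return object_name.lower().endswith(FASTQ_SUFFIXES)


def handle_upload(
    event: Dict[str, Any],
    compute_metrics: Callable[[str, str], Dict[str, Any]],
    insert_row: Callable[[Dict[str, Any]], None],
) -> Optional[Dict[str, Any]]:
    """Route a storage upload event to QC and persist the resulting row.

    `compute_metrics(bucket, name)` stands in for downloading the object
    and running QC; `insert_row(row)` stands in for the BigQuery write.
    Returns the stored row, or None when the object is not a FASTQ file.
    """
    bucket = event["bucket"]
    name = event["name"]
    if not is_fastq(name):
        return None  # ignore unrelated uploads
    metrics = compute_metrics(bucket, name)
    row = {
        "bucket": bucket,
        "file_name": name,
        "processed_at": datetime.now(timezone.utc).isoformat(),
        **metrics,
    }
    insert_row(row)
    return row


if __name__ == "__main__":
    stored = []
    handle_upload(
        {"bucket": "example-bucket", "name": "run1/sample.fastq.gz"},
        lambda b, n: {"read_count": 2, "gc_percent": 50.0},
        stored.append,
    )
    print(stored)
```

In a deployed function, the two callables would presumably wrap the Cloud Storage and BigQuery clients; keeping them injectable makes the handler straightforward to unit test.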

Technologies

Frontend

React, TypeScript, Vite, TailwindCSS, Recharts, React Query

Backend

Python, FastAPI, Biopython

Cloud Services

Google Cloud Functions, Cloud Storage, BigQuery, Cloud Run

Infrastructure

Terraform, Docker, GitHub Actions
