
AI Scribe

AI Scribe is an AI-powered notes management tool, built as a technical interview project, that transcribes and summarizes patient notes (Express, S3, Vite, OpenAI, Multer).


Installation

npm install

Quickstart

# 1) Install deps (root)
npm install

# 2) Ensure env vars
cp .env.example .env    # edit values as needed

# 3) Start DB + migrate + seed (Docker)
npm run dev:with-db     # boots DB, migrates, seeds, and runs FE/BE

# Alternative: manual steps
# npm run env:all && npm run db:up && npm run db:create && npm run db:migrate && npm run db:seed
# npm run dev

Environment Variables

Create .env at the repo root. Example (secrets redacted):

# App
PORT=8080

# Postgres (docker compose expects these)
POSTGRES_USER=user_db
POSTGRES_PASSWORD=password_db
POSTGRES_DB=challenge_db
POSTGRES_HOST=localhost
POSTGRES_PORT=5432

# OpenAI
OPENAI_API_KEY=sk-...
OPENAI_TRANSCRIBE_MODEL=whisper-1
OPENAI_SUMMARY_MODEL=gpt-3.5-turbo-0125

# AWS S3
AWS_REGION=us-east-x
AWS_S3_BUCKET=challenge-bucket
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...

# Frontend (Vite)
VITE_API_BASE_URL=http://<be-host>:<be-port>/api/v1

API Reference

  • POST /api/v1/notes/text

    • Body JSON: { patientId: uuid, title: string, text: string }
    • 201 { noteId }
  • POST /api/v1/uploads/audio/create-note

    • multipart/form-data: patientId, title, audio(file)
    • 201 { noteId, transcript, summary, bucket, key }
  • GET /api/v1/notes

    • 200 [ { id, created_at, input_type, status, title, first_name, last_name, preview } ]
  • GET /api/v1/notes/:id

    • 200 { note, patient, raw: [...], processed: [...] }
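
As a quick usage sketch of the first endpoint (the base URL and patientId are illustrative placeholders, not values from the repo):

// Create a text note via POST /api/v1/notes/text.
const API_BASE = "http://localhost:8080/api/v1";

async function createTextNote(): Promise<string> {
  const res = await fetch(`${API_BASE}/notes/text`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      patientId: "00000000-0000-0000-0000-000000000000", // uuid of a seeded patient
      title: "Follow-up visit",
      text: "Patient reports feeling better...",
    }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const { noteId } = await res.json(); // 201 { noteId }
  return noteId;
}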

Diagrams

  • Database ER (image; the Mermaid source appears in the DB section below)
  • Backend Flow (image)
  • Frontend-Backend Flow (image)

Project Notes

Key notes

  • The backend is a Node.js application that uses the OpenAI API to transcribe audio and summarize notes.
  • The frontend is a React application that consumes the backend API.
  • Both backend and frontend are dockerized for local running and deployed to Vercel.
  • We use Postgres as the database.
  • We store audio files in an S3 bucket.

Decision making process (backend)

  • I started by addressing the architecture. I usually start with the database, then the backend, then the frontend, and I try to keep the architecture as simple as possible (depending on requirements, deadlines, etc.). Since the challenge requires a single git repository, I decided to use a monorepo.
  • Once I had decided to dockerize, I had to make a decision regarding audio file storage. I decided to use an S3 bucket in order to harvest the bonus points!
  • Next, I had to make DB decisions:
    • Do I want to use indexes, or is that overkill?
    • How do I structure the tables and relationships?
    • Which note statuses do I want to handle? ('pending', 'processed', 'failed')
  • Once architecture/infra and the DB were defined: since we're using Node.js, we need to handle DB creation and migrations ourselves. I decided to use a simple SQL file for that.

Database commands (Docker-first)

We run migrations and seeds using the Postgres container's built-in psql. This avoids installing psql on the host and guarantees the commands run against the containerized DB with the right credentials.

  • Why not the Node npm scripts? Those rely on host psql and host environment resolution, which often fails (missing psql, wrong PATH, unexported env). Docker-based commands are portable and consistent.

Commands (run from repo root):

# Using npm scripts
npm run db:up
npm run db:create
npm run db:migrate
npm run db:seed

# Reset DB
npm run db:reset

# Using docker compose directly
# Quick check: list patients (docker compose "exec")
docker compose exec -T postgres-ai-scribe \
  sh -lc 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c "SELECT id, first_name, last_name, dob FROM patient ORDER BY created_at DESC LIMIT 5;"'

DB

Here is the DB schema. After reviewing it, I decided we needed to track:

  • Patient creation and updates
  • Note creation and updates
  • Raw note creation
  • Processed note creation
  • User activity in general

The schema is self-explanatory:
erDiagram
  patient ||--o{ note : "has many"
  note ||--o{ raw_note : "has many"
  note ||--o{ processed_note : "has many"
  patient ||--o{ user_activity : "generates"
  note ||--o{ user_activity : "context for"

  patient {
    uuid id PK
    text first_name
    text last_name
    date dob
    text medical_record_number UK
    text email
    text phone
    timestamptz created_at
    timestamptz updated_at
  }

  note {
    uuid id PK
    uuid patient_id FK
    note_input_type input_type
    note_status status
    text title
    text audio_s3_bucket
    text audio_s3_key
    integer audio_duration_seconds
    text source_language
    text ai_model_transcribe
    text ai_model_summary
    timestamptz created_at
    timestamptz updated_at
  }

  raw_note {
    uuid id PK
    uuid note_id FK
    raw_kind kind
    text text_content
    integer token_count
    timestamptz created_at
  }

  processed_note {
    uuid id PK
    uuid note_id FK
    processed_format format
    text text_content
    jsonb json_content
    integer token_count
    timestamptz created_at
  }

  user_activity {
    uuid id PK
    text action
    uuid note_id FK
    uuid patient_id FK
    jsonb details
    timestamptz created_at
  }

Services

  • ai.ts: AI services, transcribe and summarize
  • s3.ts: S3 services, create presigned URLs
  • db.ts: DB services, query

OK, for this section I decided to use the OpenAI and S3 services; we also have a db.ts file for DB queries.
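
A minimal sketch of what ai.ts covers, assuming the official openai Node SDK; the helper shapes are illustrative, not the exact implementation:

import OpenAI from "openai";
import fs from "fs";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Transcribe an audio file with the configured transcription model.
export async function transcribe(audioPath: string): Promise<string> {
  const result = await openai.audio.transcriptions.create({
    file: fs.createReadStream(audioPath),
    model: process.env.OPENAI_TRANSCRIBE_MODEL ?? "whisper-1",
  });
  return result.text;
}

// Summarize a transcript with the configured chat model.
export async function summarize(transcript: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: process.env.OPENAI_SUMMARY_MODEL ?? "gpt-3.5-turbo-0125",
    messages: [
      { role: "system", content: "Summarize the following clinical note." },
      { role: "user", content: transcript },
    ],
  });
  return completion.choices[0].message.content ?? "";
}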

For AWS S3 I also decided to handle the incoming files with Multer instead of going through the file system with the AWS SDK's helpers. Multer is Express middleware that handles multipart file uploads; it's flexible, can be paired with any storage backend, and I'm a bit familiar with it.
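
A minimal sketch of the upload path, assuming Multer's memory storage and the AWS SDK v3 (the route and field names mirror the API reference; the rest is illustrative):

import express from "express";
import multer from "multer";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { randomUUID } from "crypto";

const app = express();
const upload = multer({ storage: multer.memoryStorage() }); // file stays in memory, no temp files on disk
const s3 = new S3Client({ region: process.env.AWS_REGION });

app.post("/api/v1/uploads/audio/create-note", upload.single("audio"), async (req, res) => {
  if (!req.file) return res.status(400).json({ error: "audio file is required" });
  const key = `audio/${randomUUID()}-${req.file.originalname}`;
  await s3.send(new PutObjectCommand({
    Bucket: process.env.AWS_S3_BUCKET,
    Key: key,
    Body: req.file.buffer,        // Multer's memory storage exposes the raw bytes
    ContentType: req.file.mimetype,
  }));
  // The real route then transcribes/summarizes and returns
  // { noteId, transcript, summary, bucket, key } as documented above.
  res.status(201).json({ bucket: process.env.AWS_S3_BUCKET, key });
});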

Backend

for the backend we have the following structure:

backend/
├── config.ts
├── db.ts
├── routes/
│   ├── notes.ts
│   ├── uploads.ts
│   └── patients.ts
├── services/
│   ├── ai.ts
│   └── s3.ts
├── types.ts
└── index.ts

I decided to go simple with the structure since the app is pretty straightforward and I didn't want to overcomplicate it; I think this folder structure is self-explanatory and able to scale.

backend testing

After finishing the backend v1, I decided to test it with curl and jq. After completing the MVP of the challenge we can implement Jest for testing. In fact, it would have been a good idea to do test-driven development from the start. But a finished feature is better than a perfect one, isn't it?

  dan@Andricks-MacBook-Air ai-scribe % curl -s http://localhost:8080/health | jq .
{
  "status": "ok"
}
dan@Andricks-MacBook-Air ai-scribe % curl -s http://localhost:8080/api/v1/patients | jq .
[
  {
    "id": "482ba8a3-396f-4cbe-ba14-beb2ff745701",
    "first_name": "Dantes",
    "last_name": "dev",
    "dob": "2001-11-17T06:00:00.000Z",
    "medical_record_number": "MRN-123456"
  },
  {
    "id": "7d73e31d-ca28-481b-89b6-4872f39e4ff0",
    "first_name": "John",
    "last_name": "Doe",
    "dob": "1990-11-21T06:00:00.000Z",
    "medical_record_number": "MRN-123457"
  },
  ...
]
dan@Andricks-MacBook-Air ai-scribe % curl -s http://localhost:8080/api/v1/notes | jq .   
[]
dan@Andricks-MacBook-Air ai-scribe % curl -I https://scribe-ai-challenge.s3.amazonaws.com
HTTP/1.1 403 Forbidden
x-amz-bucket-region: us-east-2
...
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Wed, 29 Oct 2025 23:03:33 GMT
Server: AmazonS3
...
dan@Andricks-MacBook-Air ai-scribe % set -a; source .env; set +a 
dan@Andricks-MacBook-Air ai-scribe % curl -sS -X POST https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, is my OpenAI setup working?"}
    ]
  }' | jq .
{
  "id": "chatcmpl-CW9Yo4MDUHqrTJcxvG3Amm6JmgoGU",
  "object": "chat.completion",
  "created": 1761779282,
  "model": "gpt-3.5-turbo-0125",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm here to help you. I'm an AI assistant, not directly affiliated with OpenAI. However, if you need assistance with 
        ...
}
dan@Andricks-MacBook-Air ai-scribe % 

As expected:

  • the health check is OK
  • the seed patients are there
  • the notes are empty for the seed patients
  • the S3 bucket is up but not publicly accessible (it's private)
  • the OpenAI API is working

Frontend

For the frontend we'll do a SPA using React and Vite. I feel more comfortable with Next.js, but I think this is a simple enough app to use Vite.

Decision making process (frontend)

First of all we decided on the framework: as mentioned above, Vite + React. Secondly, we need to think about the UI, and for this we need to consider:

  • Who's the target user?
  • What's the target device?
  • How might the user be feeling?

After considering this, I came to the conclusion that the color palette should be blue/green, since it inspires calm and peace. And if this were a prod-ready product, it would probably be used on mobile, so we take a mobile-first approach.

For fetching data we have options; we will use TanStack Query (a sketch follows). As for components, I'd usually go with shadcn/ui since it's a solid library with a lot of components and it's easy to use, but I feel like writing some components myself: it helps us customize the UI to our needs, and considering it's a simple SPA we don't need a full-blown UI library, nor too many components.
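
A minimal sketch of the fetching layer, assuming @tanstack/react-query; the hook name is illustrative, and the endpoint and env var come from the sections above:

import { useQuery } from "@tanstack/react-query";

const API_BASE = import.meta.env.VITE_API_BASE_URL;

// Load the notes list from the documented GET /api/v1/notes endpoint.
export function useNotes() {
  return useQuery({
    queryKey: ["notes"],
    queryFn: async () => {
      const res = await fetch(`${API_BASE}/notes`);
      if (!res.ok) throw new Error(`Failed to load notes: ${res.status}`);
      return res.json();
    },
  });
}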

For validation we will use zod, same as the backend.
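
For example, the text-note payload from the API reference maps directly to a schema; a sketch (the real schema may differ):

import { z } from "zod";

// Mirrors the documented POST /api/v1/notes/text body.
export const textNoteSchema = z.object({
  patientId: z.string().uuid(),
  title: z.string().min(1),
  text: z.string().min(1),
});

export type TextNoteInput = z.infer<typeof textNoteSchema>;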

I don't think we need Redux or Zustand here, since it's a SPA and we don't have complex state-management needs.

Frontend Approach

As said before, we implemented a SPA, so it's all component-based. We implement URL state using react-router-dom; a bit of an overkill for this simple app, but it's good practice to use URL state so we can share the URL with others and they can access the same state.
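
A minimal routing sketch (the route paths and page components are illustrative, not the repo's actual ones):

import { createBrowserRouter, RouterProvider } from "react-router-dom";
import { NotesList, NoteDetail } from "./pages"; // hypothetical page components

// URL-addressable state: each view lives at its own route, so a link
// like /notes/123 drops anyone straight into the same note.
const router = createBrowserRouter([
  { path: "/", element: <NotesList /> },
  { path: "/notes/:id", element: <NoteDetail /> },
]);

export function App() {
  return <RouterProvider router={router} />;
}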

useMediaRecorder

This was a bit tricky, and it was not exactly asked for in the challenge, but I thought it would be a cool feature to implement.

The approach was the following (a sketch of the hook follows the list):

  • use the media recorder api to record audio
  • use the media stream to create a blob
  • use the blob to create a file
  • use the file to create a presigned url
  • use the presigned url to upload the file to s3
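
A condensed sketch of the recording hook, covering the record, blob, and file steps (the presigned-URL upload is sketched in the next section); this is simplified, and the real hook also handles permissions and errors:

import { useRef, useState } from "react";

export function useMediaRecorder() {
  const recorderRef = useRef<MediaRecorder | null>(null);
  const chunksRef = useRef<Blob[]>([]);
  const [file, setFile] = useState<File | null>(null);

  async function start() {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const recorder = new MediaRecorder(stream);
    recorder.ondataavailable = (e) => chunksRef.current.push(e.data);
    recorder.onstop = () => {
      // Turn the recorded chunks into a Blob, then a File ready for upload.
      const blob = new Blob(chunksRef.current, { type: "audio/webm" });
      setFile(new File([blob], `recording-${Date.now()}.webm`, { type: blob.type }));
      chunksRef.current = [];
      stream.getTracks().forEach((t) => t.stop()); // release the microphone
    };
    recorder.start();
    recorderRef.current = recorder;
  }

  const stop = () => recorderRef.current?.stop();

  return { start, stop, file };
}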

S3 and OpenAI integration

The flow (a presign sketch follows the list):

  • create a presigned URL to upload a file to S3
  • upload the file to S3
  • create a presigned URL to download the file from S3
  • use the presigned URL to download the file from S3
  • use the file to transcribe
  • use the transcript to summarize
  • use the transcript and summary to create a note
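
A minimal sketch of the presign helpers in s3.ts, assuming @aws-sdk/s3-request-presigner (the expiry is illustrative):

import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({ region: process.env.AWS_REGION });
const Bucket = process.env.AWS_S3_BUCKET;

// Presigned URL the client can PUT the audio file to.
export function presignUpload(key: string): Promise<string> {
  return getSignedUrl(s3, new PutObjectCommand({ Bucket, Key: key }), { expiresIn: 900 });
}

// Presigned URL for downloading the stored file (e.g. to feed the transcriber).
export function presignDownload(key: string): Promise<string> {
  return getSignedUrl(s3, new GetObjectCommand({ Bucket, Key: key }), { expiresIn: 900 });
}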

Rate Limiting

By this point we already had an MVP, but I thought about adding rate limiting to the API since it's good practice, all the more so considering we depend on third-party services (S3 and OpenAI) that we don't want to hammer.
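
A minimal sketch, assuming the express-rate-limit middleware (the limits are illustrative):

import rateLimit from "express-rate-limit";

// Cap each IP at 100 requests per 15-minute window before the request
// can ever reach S3 or OpenAI; tune the numbers for real traffic.
export const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 100,
  standardHeaders: true, // send RateLimit-* headers
  legacyHeaders: false,
});

// app.use("/api/v1", apiLimiter);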

token counting

As we reach the finale of this challenge, I noticed I was not populating the token count for the raw note and processed note, nor counting the tokens for the summary.

For this we use tiktoken from OpenAI.
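
A minimal sketch, assuming the WASM tiktoken npm package:

import { encoding_for_model } from "tiktoken";

// Count how many tokens a given text costs for the summary model.
export function countTokens(text: string): number {
  const enc = encoding_for_model("gpt-3.5-turbo");
  try {
    return enc.encode(text).length;
  } finally {
    enc.free(); // tiktoken allocates WASM memory that must be released
  }
}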

The tradeoffs I did take

OK, so we finished the challenge, but of course there's always room for improvement. I did take some tradeoffs to make the implementation easier and faster, and I'll try to compensate by explaining them:

  • No testing: it was not part of the scope and I was indeed negligent here, though I did prove the endpoints before doing the frontend connection.
  • Max-error handling: in both frontend and backend I did not implement exhaustive error handling. What do I mean by this?

Well, in the backend there are the many generic status codes that we use on a day-to-day basis, but I did not take unsupported media types (415) into consideration, nor did I build a centralized error handler (a sketch of one appears after this list). In the frontend, even though I implemented zod, the form is still prone to failure on edge cases. I'll get away with it because there is no QA team, the time was limited, and this is not a prod environment.

  • Pagination: self-explanatory, not enough data to worry about it in this challenge

  • No Redis or caching: self-explanatory, not enough data to worry about it in this challenge x2

  • No toaster customization
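
For reference, the centralized handler I'd add looks roughly like this (a sketch of future work, not code that exists in the repo; the error class is hypothetical):

import type { Request, Response, NextFunction } from "express";

// Hypothetical domain error for unsupported upload formats.
export class UnsupportedMediaError extends Error {}

// Centralized Express error handler: map known cases to status codes,
// fall back to 500, and never leak internals to the client.
export function errorHandler(err: unknown, _req: Request, res: Response, _next: NextFunction) {
  if (err instanceof UnsupportedMediaError) {
    return res.status(415).json({ error: "unsupported media type" });
  }
  console.error(err);
  res.status(500).json({ error: "internal server error" });
}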

Every day as a dev you will have to make tradeoffs; this is my final implementation for this challenge.
