Voice Cursor - AI Tinkerers & Google Cloud: Agents Hackathon Toronto

Voice Cursor

The team consists of GenAI consultants and data scientists from AI Talentflow/CGI, Tangerine, and CloudCosmos, with experience in LLMs, RAG/LangChain, PyTorch/TensorFlow, and AWS. Background: Punjabi Univ. and Lambton grads; Kaggle top 0.2%.

3 members
  1. Voice → Text: The user speaks; audio is streamed to Google Speech-to-Text (ASR), which returns a transcript. (Note: this is Speech-to-Text, not Text-to-Speech.)
  2. PII scrubbing: The transcript is sent to the Google Cloud DLP de-identification API, which masks or removes PII and returns de-identified text (optionally with redaction metadata).
  3. Safety classification: The de-identified text goes to Meta Llama Guard 4-12B, which evaluates the content against 13 offense categories and returns an allowed/blocked decision, a category, a severity, and a rationale.
  4. Policy gate

If the request is blocked or flagged as concerning, the system tags it with the category, logs it to the Safety Ledger, and applies mitigations or refusals.

If the request is allowed, it proceeds to orchestration.
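The policy gate above can be sketched roughly as follows. This is a minimal illustration and not the team's implementation: `SafetyVerdict`, `SafetyLedger`, `policy_gate`, and `CONCERNING_SEVERITIES` are hypothetical names, the verdict fields simply mirror those listed in the safety classification step, and treating a "high" severity as "concerning" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: which severities count as "concerning" for an otherwise allowed request.
CONCERNING_SEVERITIES = {"high"}


@dataclass
class SafetyVerdict:
    """Hypothetical shape of the classifier result (allowed/blocked, category, severity, rationale)."""
    allowed: bool
    category: Optional[str] = None
    severity: Optional[str] = None
    rationale: str = ""


@dataclass
class SafetyLedger:
    """In-memory stand-in for the Safety Ledger, assumed here to be an append-only log."""
    entries: list = field(default_factory=list)

    def log(self, text: str, verdict: SafetyVerdict) -> None:
        self.entries.append({
            "text": text,
            "category": verdict.category,
            "severity": verdict.severity,
            "rationale": verdict.rationale,
        })


def policy_gate(text: str, verdict: SafetyVerdict, ledger: SafetyLedger) -> dict:
    """Log and stop blocked or concerning requests; pass everything else to orchestration."""
    concerning = verdict.allowed and verdict.severity in CONCERNING_SEVERITIES
    if not verdict.allowed or concerning:
        ledger.log(text, verdict)
        status = "mitigated" if concerning else "refused"
        return {"status": status, "category": verdict.category}
    return {"status": "proceed", "text": text}
```

In this sketch the caller inspects `status`: only `"proceed"` results would be handed to the orchestrator, while `"refused"` and `"mitigated"` results carry the category tag for the user-facing response.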

  5. Orchestration & routing: An Orchestrator/Router examines intent and safety labels, then fans out work to the Voice IDE agents as needed.

  6. Voice IDE agents run

coder-agent: drafts/refactors code, scaffolds tests.

reasoning-agent: plans steps, decomposes tasks.

security-agent: scans for secrets/vulns, enforces guardrails.

speech-agent: optimizes voice UX (barge-in, brevity).

validator-agent: runs lint, compile, and sanity checks; verifies spec adherence.

Agents may call Gemini 2.0 tools during their work.
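One way to picture the routing and fan-out to these agents is the sketch below. It is illustrative only: the agent bodies are stubs, the `ROUTES` table and its intent names are invented for the example, and running the selected agents concurrently with `asyncio.gather` is an assumption (the write-up does not say whether agents run in parallel or in sequence).

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Agent = Callable[[str], Awaitable[str]]


# Stub agents: real ones would call a model or tools; these only echo the task.
async def coder_agent(task: str) -> str:
    return f"[coder] draft for: {task}"


async def security_agent(task: str) -> str:
    return f"[security] scan for: {task}"


async def validator_agent(task: str) -> str:
    return f"[validator] checks for: {task}"


AGENTS: Dict[str, Agent] = {
    "coder-agent": coder_agent,
    "security-agent": security_agent,
    "validator-agent": validator_agent,
}

# Hypothetical intent -> agent mapping; the actual routing rules are not described.
ROUTES: Dict[str, List[str]] = {
    "write_code": ["coder-agent", "security-agent", "validator-agent"],
    "review_code": ["security-agent", "validator-agent"],
}


async def route(intent: str, task: str) -> Dict[str, str]:
    """Fan the task out to every agent registered for the intent and collect outputs by name."""
    names = ROUTES.get(intent, [])
    results = await asyncio.gather(*(AGENTS[name](task) for name in names))
    return dict(zip(names, results))
```

A call such as `asyncio.run(route("review_code", "check auth.py"))` would return a dictionary keyed by agent name; an unknown intent yields an empty dictionary in this sketch.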

  7. Context capture (temporary): Agent outputs and conversation turns are appended to Markdown files (a temporary context store) to ground follow-ups. (This layer is designed to be swapped to CosmosDB later.)
  8. Generation: The Orchestrator builds a prompt from the current user query plus retrieved context snippets and calls Gemini 2.0 Flash to generate the final draft (code/answer).
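The context capture and prompt-building steps could look something like the sketch below. The class and function names are hypothetical, "retrieval" is simplified to "the most recent n turns" (the write-up does not describe the retrieval strategy), and the model call itself is left out.

```python
from pathlib import Path
from typing import List


class MarkdownContextStore:
    """Appends conversation turns to a Markdown file; a stand-in for the temporary store."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def append(self, role: str, text: str) -> None:
        # Each turn becomes a level-2 section headed by the speaker or agent name.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(f"## {role}\n\n{text.strip()}\n\n")

    def recent(self, n: int) -> List[str]:
        """Return the last n turns. Naive: text that itself contains '## ' would be split."""
        if n <= 0 or not self.path.exists():
            return []
        content = self.path.read_text(encoding="utf-8")
        sections = [s.strip() for s in content.split("## ") if s.strip()]
        return sections[-n:]


def build_prompt(query: str, snippets: List[str]) -> str:
    """Combine retrieved snippets and the current query into a single prompt string."""
    context = "\n---\n".join(snippets) if snippets else "(no prior context)"
    return f"Context:\n{context}\n\nUser request:\n{query}\n"
```

Keeping the store behind two small methods (`append`, `recent`) is one way to make the later swap to a database-backed store a local change.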

  9. Human validation gate: Any action that changes or creates files (e.g., committing code, modifying repos) is blocked until a human approver reviews the Gemini/agent proposal and clicks Approve. Only then does the system execute the change/output.
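A human validation gate of this kind can be sketched as a queue of proposed actions that refuse to run until approved. `ApprovalGate` and `PendingAction` are hypothetical names, and the in-memory dictionary stands in for whatever review UI and storage the actual system uses.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PendingAction:
    description: str
    execute: Callable[[], str]  # the file-changing operation, deferred until approval
    approved: bool = False


class ApprovalGate:
    """Holds proposed actions and runs them only after explicit human approval."""

    def __init__(self) -> None:
        self._pending: Dict[str, PendingAction] = {}

    def propose(self, description: str, execute: Callable[[], str]) -> str:
        action_id = uuid.uuid4().hex
        self._pending[action_id] = PendingAction(description, execute)
        return action_id

    def approve(self, action_id: str) -> None:
        # In a real system this would be triggered by the reviewer clicking Approve.
        self._pending[action_id].approved = True

    def run(self, action_id: str) -> str:
        action = self._pending[action_id]
        if not action.approved:
            raise PermissionError("action has not been approved by a human reviewer")
        del self._pending[action_id]  # each approval is single-use
        return action.execute()
```

Deferring the side effect inside a callable means nothing touches the file system before `approve` has been called for that specific action.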

  10. Response delivery: Approved output is returned to the user (text/markdown/code). If voice reply is enabled, the output is also sent to text-to-speech (TTS) for playback.

  11. Observability (end-to-end): Throughout every stage, the platform records metrics, logs, and traces: request counts, latency, token usage, and estimated cost, plus safety category stats, all feeding dashboards and alerts.
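As a rough illustration of the observability signals listed above, the following sketch collects request counts, per-stage latency, token usage, estimated cost, and safety category counts in memory. The `Metrics` class is hypothetical, and `ASSUMED_USD_PER_1K_TOKENS` is a placeholder value, not an actual model price.

```python
import time
from collections import Counter
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional

# Placeholder rate for the example only; not a real price.
ASSUMED_USD_PER_1K_TOKENS = 0.001


@dataclass
class Metrics:
    request_count: int = 0
    total_tokens: int = 0
    latencies_s: Dict[str, List[float]] = field(default_factory=dict)
    safety_categories: Counter = field(default_factory=Counter)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time a pipeline stage and record its latency, even if the stage raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.latencies_s.setdefault(name, []).append(time.perf_counter() - start)

    def record_request(self, tokens: int, safety_category: Optional[str] = None) -> None:
        self.request_count += 1
        self.total_tokens += tokens
        if safety_category is not None:
            self.safety_categories[safety_category] += 1

    @property
    def estimated_cost_usd(self) -> float:
        return self.total_tokens / 1000 * ASSUMED_USD_PER_1K_TOKENS
```

Each stage would be wrapped as `with metrics.stage("asr"): ...`; a production setup would more likely export these values to a metrics backend than keep them in process memory.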


