About us
About TorontoAI
TorontoAI is a vibrant, inclusive community of engineers, builders, founders, and curious minds passionate about making AI infrastructure more accessible, human-centered, and scalable.
We host bi-weekly in-person socials, tech meetups, and hands-on webinars to connect people across disciplines — from DevOps to Data Science, from students to senior architects. Whether you're deploying LLMs in production or just exploring what Databricks does, you're welcome here.
🤝 We’re Building More Than a Meetup
In a world dominated by virtual everything, we believe in real, human-to-human connection.
TorontoAI is a space to:
- Share ideas over coffee
- Spark collaborations face-to-face
- Meet people who understand your stack and your journey
- Build your network beyond LinkedIn likes
💬 What We Talk About:
- Scalable AI & LLM infrastructure (Kubernetes, GPUs, vLLM, Ollama, LangChain)
- Databricks, Snowflake, Fivetran, dbt — building the modern data stack
- MLOps, LLMOps, DevOps — the operational glue of AI systems
- Real-world engineering stories, founder spotlights, and tool breakdowns
🌈 Who We Welcome:
- DevOps, SREs & Platform Engineers moving into data/AI
- Data Engineers, Analysts & ML practitioners
- Founders, freelancers, and technologists in transition
- Students and early-career professionals seeking real-world exposure
We’re committed to creating a welcoming, diverse, and equity-focused space where all voices matter — no gatekeeping, no rockstars, just good humans building cool stuff.
📍 Based in Toronto, open to the world
📅 Join an event — and be part of something human, helpful, and hands-on.
Upcoming events (3)

Becoming an AI Cloud Engineer — Free Live Career Session (2026)
Online
What "AI Cloud Engineer" actually means as a job in 2026. AWS, Kubernetes, Terraform, CI/CD, plus LLMOps, RAG, vector DBs, GPU infra, AgentOps. Live demo (git push → EKS cluster → running container), Q&A, and a recording for everyone who registers.
What I'll cover in 1 hour:
- Why "DevOps Engineer" job postings are being quietly renamed — and to what
- The foundation that didn't change: AWS, Kubernetes, Terraform, CI/CD, SRE
- The new layer: LLMOps, RAG, vector DBs, GPU infra, AgentOps
- Live demo: git push → EKS cluster → running container
- Open Q&A
Speaker:
Chandan Kumar — Founder of beCloudReady. Toronto-based DevOps engineer and educator. Trained 500+ engineers now working at startups and enterprises across Canada and the US.
Register: https://becloudready.com/webinar/ai-cloud-engineer
34 attendees
Build Your First AWS Data Lake in 60 Minutes — Live, with Real Code
Online
A free, hands-on community session for data engineers and folks breaking into data.
If you've read the AWS docs but never actually built the pipeline end-to-end, or you've shipped pieces of it but never seen how they all fit together, this session is for you. We'll go from an empty AWS account to a working data lake in 90 minutes. Live build, real code, real data.
### What we'll build together
The pipeline underneath every "data lake on AWS" project. Once you've seen it, you see it everywhere:
```
S3 raw CSV → Glue Crawler → Glue Catalog → Glue ETL Job
→ S3 curated Parquet → Glue Crawler → Athena
```
Concretely, you'll watch:
- A raw CSV (Kaggle Crude Oil historical data, ~6,400 rows) land in S3
- A Glue Crawler infer the schema and register a table in the Glue Catalog
- An Athena query against that CSV — count rows, filter, aggregate, with no servers to manage
- A PySpark Glue ETL job transform the CSV into partitioned Parquet (columnar, compressed, ~10× cheaper to scan)
- A second crawler register the Parquet table
- The same query, run again — and a side-by-side comparison of "data scanned" between CSV and Parquet
That last comparison is the punchline. Parquet vs CSV is the difference between a $5 query and a $0.50 query at scale. Seeing it in the Athena UI lands differently than reading about it.
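The arithmetic behind that comparison can be sketched in a few lines. This is a rough estimate, not the lab's code: it assumes Athena bills a flat $5 per TB scanned (check current pricing for your region), treats a TB as 1024^4 bytes, and uses a hypothetical 1 TB CSV table with the ~10× smaller Parquet scan quoted above.

```python
# Assumption: Athena charges a flat rate per TB of data scanned.
# $5/TB is used here for illustration; verify against current pricing.
PRICE_PER_TB_USD = 5.0
BYTES_PER_TB = 1024 ** 4  # assumed binary terabyte


def query_cost_usd(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the cost of one query from the number of bytes it scanned."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Hypothetical table: 1 TB scanned as CSV, roughly a tenth of that as Parquet.
csv_bytes = 1 * BYTES_PER_TB
parquet_bytes = csv_bytes // 10

csv_cost = query_cost_usd(csv_bytes)
parquet_cost = query_cost_usd(parquet_bytes)

print(f"CSV scan:     ${csv_cost:.2f}")
print(f"Parquet scan: ${parquet_cost:.2f}")
```

The "data scanned" figure Athena reports for each query is the number to plug into `bytes_scanned`.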
### What you'll leave with
- The full lab open-sourced — Terraform for the IAM + sandbox + Glue + Athena setup, the PySpark ETL script, and a step-by-step walkthrough you can re-run on your own AWS account.
- A working mental model of the shape of every data lake project: land raw → catalog → transform → catalog again → query. The dataset is just the variable.
- Practical IAM patterns most tutorials skip — region-locking, prefix-scoped S3 access, Glue role policies that don't accidentally grant the world. The kind of thing that actually shows up in production reviews.
- A take-home assignment: run the same pipeline against a Kaggle dataset of your choice. Bring it to the next session for feedback.
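To make the IAM item in the list above concrete, here is a minimal sketch of a policy document that combines prefix-scoped S3 access with a region lock. It is an illustration, not the lab's Terraform: the function name, bucket, prefix, and region are hypothetical, and the choice of which services to region-lock is an assumption. The condition keys `s3:prefix` and `aws:RequestedRegion` are standard IAM keys.

```python
import json


def build_scoped_policy(bucket: str, prefix: str, region: str) -> dict:
    """Return an IAM policy document limited to one S3 prefix and one region.

    Hypothetical helper for illustration only.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is allowed only for keys under the prefix.
                "Sid": "ListOnlyThisPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object reads and writes are limited to the same prefix.
                "Sid": "ReadWriteObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Region lock: deny Glue and Athena calls made to any other region.
                "Sid": "DenyOutsideRegion",
                "Effect": "Deny",
                "Action": ["glue:*", "athena:*"],
                "Resource": "*",
                "Condition": {"StringNotEquals": {"aws:RequestedRegion": region}},
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder names; substitute your own bucket, prefix, and region.
    policy = build_scoped_policy("example-lab-bucket", "curated", "ca-central-1")
    print(json.dumps(policy, indent=2))
```

The explicit `Deny` matters because it wins over any broader `Allow` attached elsewhere to the same role.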
### Who this is for
- Data engineers who've shipped pieces of a data lake but want to see the whole pipeline end-to-end
- Career changers moving into data engineering — this is a portfolio-grade project you can talk about in interviews
- Bootcamp grads and self-taught engineers who can write SQL and Python but haven't seen how Glue, Athena, and S3 actually fit together
- Backend engineers picking up data work and wanting a fast on-ramp to the AWS data stack
- Anyone whose manager said "we should look at building a data lake" and now it's on your plate
If you've never opened the AWS console, you'll still follow along — we explain every click. If you've been doing this for years, you'll probably still pick up the IAM scoping pattern.
### Format
- Live on Microsoft Teams — questions in chat, full screen-share, no slides
- 90 minutes — same length as the lab itself
- Recording shared with everyone who registers
- Open Q&A throughout, not just at the end
### About your host
Chandan Kumar — founder of beCloudReady and organizer of TorontoAI, a 10K+ member community of AI and data builders. Twenty-plus years across software, cloud, and data engineering. Has trained and placed 500+ engineers across Canada and the US. Maintainer of open-source labs and the db-agent project (presented at AAAI-25).
34 attendees
Genie, Agent Bricks, or Build Your Own on Databricks Lakebase
Online
For Data Engineering leaders deciding whether to adopt Genie, build a custom text-to-SQL stack, or wire something in between.
90 minutes. Live build, not slides. Real workspace, real data, a real LLM call across an HTTP boundary you control.
What you will see
I'll go from an empty Databricks workspace to a working text-to-SQL agent that:
- Joins live OLTP rows in Lakebase (managed Postgres) with pre-aggregated gold tables in Unity Catalog Delta — through Lakehouse Federation, in a single query.
- Generates SQL via a pluggable LLM endpoint — Databricks Model Serving, OpenAI-compatible APIs, or a self-hosted vLLM on a neo-cloud GPU — switched with one environment variable.
- Validates every SQL string before execution with a SELECT-only safety guardrail that catches the Databricks-specific destructive ops generic validators miss (`OPTIMIZE`, `VACUUM`, `ZORDER`, `COPY`).
- Is auditable end-to-end: one question = one LLM call, one SQL statement, one execution. No autonomous loops, no surprise bills.
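A SELECT-only guardrail of the kind described above might look roughly like this. It is a sketch under stated assumptions, not the db-agent validator: the function name and keyword list are hypothetical, it uses regular expressions rather than a real SQL parser, and it fails closed (a column literally named `update` would be rejected).

```python
import re

# Keywords treated as unsafe: generic write/DDL verbs plus the
# Databricks-specific operations named above (OPTIMIZE, VACUUM, ZORDER, COPY).
BLOCKED = {
    "INSERT", "UPDATE", "DELETE", "MERGE", "DROP", "ALTER", "CREATE",
    "TRUNCATE", "GRANT", "REVOKE", "OPTIMIZE", "VACUUM", "ZORDER", "COPY",
}

# One left-to-right pass over string literals and comments, so a quote inside
# a comment (or a comment marker inside a string) cannot hide a statement.
_LITERAL_OR_COMMENT = re.compile(
    r"""'(?:[^'\\]|\\.|'')*'|"(?:[^"\\]|\\.)*"|--[^\n]*|/\*.*?\*/""",
    re.S,
)


def _blank(match: re.Match) -> str:
    # Strings collapse to an empty literal, comments to a single space.
    return "''" if match.group(0)[0] in "'\"" else " "


def is_safe_select(sql: str) -> bool:
    """Return True only for a single SELECT/WITH statement with no blocked keyword."""
    cleaned = _LITERAL_OR_COMMENT.sub(_blank, sql).strip().rstrip(";").strip()
    if not cleaned or ";" in cleaned:
        return False  # empty input or more than one statement
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", cleaned.upper())
    if not tokens or tokens[0] not in {"SELECT", "WITH"}:
        return False
    return not any(token in BLOCKED for token in tokens)
```

For example, `is_safe_select("OPTIMIZE sales ZORDER BY (id)")` is rejected, while a plain `SELECT` whose string literal happens to contain the word `DROP` passes, because literals are blanked out before the keyword check.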
What you will leave with
- A decision framework for db-agent vs Genie vs Agent Bricks for your specific use case — including when not to build.
- The companion open-source db-agent repo (presented at AAAI-25, ships a Databricks Apps deployment variant) and a quick-lab repo with a step-by-step build.
- A reference architecture diagram and the actual code — pipeline orchestrator is ~60 lines of Python, safety validator is ~30.
- Specific gotchas that cost me a half-day each: federation database options, Lakebase token rotation, Streamlit/Apps reverse-proxy traps, context-window blowouts on real catalogs.
Who is this for
- Heads of Data, Data Engineering Managers, Staff and Principal Data Engineers.
- Teams already on Databricks (or evaluating) who are being asked: "Can we put an AI agent on top of this?"
- Anyone making a build-vs-buy call between Genie, Agent Bricks, and a custom text-to-SQL stack — and wants to make it with their eyes open.
Demo of the reference architecture explained here: https://becloudready.com/blog/text-to-sql-databricks-lakebase-db-agent
This is a technical session. We'll read code. Bring your senior engineers.
Agenda
- The architecture in one slide (5 min)
- Lakebase + Unity Catalog + Lakehouse Federation — why both data planes, and what breaks (15 min)
- The agent pipeline — schema → prompt → LLM → validate → execute (20 min)
- The SQL safety guardrail — what generic SELECT-only validators miss on Databricks (10 min)
- The pluggable LLM layer — live swap from a hosted API to a self-hosted vLLM on a neo-cloud GPU (15 min)
- db-agent vs Genie vs Agent Bricks — when to use which, and why (10 min)
- Q&A (15 min)
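The agent pipeline and pluggable LLM layer from the agenda can be sketched as one linear function. Everything named here is an assumption for illustration: the `LLM_BACKEND` variable, the backend keys, and the stub functions are hypothetical, the backends return a canned query instead of calling a real endpoint, and `validate` is a placeholder for a proper guardrail.

```python
import os
from typing import Callable, Dict, List, Tuple


def _stub_backend(label: str) -> Callable[[str], str]:
    """Build a fake LLM backend that returns a canned query (runs offline)."""
    def generate(prompt: str) -> str:
        return "SELECT region, SUM(amount) FROM gold.sales GROUP BY region"
    generate.__name__ = f"generate_{label}"
    return generate


# Hypothetical registry. Real entries would call Databricks Model Serving,
# an OpenAI-compatible API, or a self-hosted vLLM server.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "databricks": _stub_backend("databricks"),
    "openai": _stub_backend("openai"),
    "vllm": _stub_backend("vllm"),
}


def pick_backend() -> Callable[[str], str]:
    """Select the backend from one environment variable (assumed name)."""
    name = os.environ.get("LLM_BACKEND", "databricks")
    if name not in BACKENDS:
        raise ValueError(f"unknown LLM_BACKEND: {name!r}")
    return BACKENDS[name]


def build_prompt(schema: List[str], question: str) -> str:
    """Combine table descriptions and the user question into one prompt."""
    return "Tables:\n" + "\n".join(schema) + f"\nQuestion: {question}\nSQL:"


def validate(sql: str) -> bool:
    """Placeholder check; a real guardrail would be much stricter."""
    return sql.lstrip().upper().startswith("SELECT")


def answer(question: str, schema: List[str],
           run_sql: Callable[[str], list]) -> Tuple[str, list]:
    """schema -> prompt -> LLM -> validate -> execute, with no loops."""
    prompt = build_prompt(schema, question)
    sql = pick_backend()(prompt)   # exactly one LLM call
    if not validate(sql):
        raise ValueError("generated SQL failed validation")
    return sql, run_sql(sql)       # exactly one execution
```

Because `answer` has no retry or tool loop, each question maps to one prompt, one SQL string, and one `run_sql` call, which is what makes the run easy to audit.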
About the Speaker
Chandan Kumar, founder of beCloudReady, organizer of the TorontoAI community (10K+ members), and a Databricks Partner. Maintainer of the open-source db-agent text-to-SQL agent, presented at AAAI-25. Runs the Databricks Lakehouse Bootcamp and works with engineering teams on getting AI agents into production against real data.
24 attendees
Past events (277)


