Bay Area Spark Meetup

San Francisco, CA, US

11,487 members · Public group

Organized by Matei Zaharia and 7 others

What we’re about

This is a meetup for Bay Area users of Apache Spark (http://spark.apache.org), a unified analytics engine for large-scale data processing. We rotate hosting among locations in San Francisco, the Peninsula, and the South Bay.

We also discuss Spark-related ecosystem projects, including Spark SQL, MLlib, GraphX, and Structured Streaming. Sessions include introductions to Spark features, tutorials, case studies from users and community contributors, best practices for deployment and tuning, and updates on future development and releases.

Upcoming events (1)

Wed, May 14, 2025, 4:30 PM UTC
TransformWithState for Structured Streaming in Upcoming Apache Spark™ 4.0
Please join us for this important webinar to discuss the Structured Streaming features in the upcoming Apache Spark™ 4.0.0 release.

This session will cover the new transformWithState operator API, part of the upcoming Apache Spark™ 4.0 release. This operator supports numerous features to help unlock the next generation of mission-critical operational workloads running on Apache Spark.

We’ll cover the merits of this new operator API to:

Define custom processing logic using an object-oriented paradigm

Perform flexible state modeling

Run event-driven programs with fine-grained timers

Leverage other features: TTL, operator-chaining, state schema evolution, etc.

Please join us to learn more about the Structured Streaming features in the upcoming Apache Spark™ 4.0.0 release 🤝

📅 Date: May 14, 2025
⏰ Time: 9:30 AM - 10:30 AM PDT
📍 Location: online

RSVP HERE 👉 https://lu.ma/euu8qytl

Agenda:

Talk 1: TransformWithState in Structured Streaming for upcoming Apache Spark™ 4.0
Abstract: In this session, we will cover the brand new transformWithState operator API, which will be launched in the upcoming Apache Spark™ 4.0 release. This operator supports numerous features to help unlock the next generation of mission-critical operational workloads running on Apache Spark.

Users can define custom processing logic using an object-oriented paradigm, perform flexible state modeling, and run event-driven programs with the help of fine-grained timers. They can also leverage other features, such as automatic eviction using TTL, chaining multiple operators, and flexible state schema evolution to improve the cost and efficiency of their streaming pipelines.

Through native integrations with state data source readers and Spark Connect, users will also benefit from better usability and observability while running on top of the familiar DataFrame API.

Join us to see what you can build with this new, exciting API in Apache Spark™ Structured Streaming!

Bio: Anish Shrigondekar is a Staff Software Engineer at Databricks working on the core streaming team. Prior to joining Databricks, Anish worked in other groups at Splunk Inc. and VMware, Inc. His areas of interest include operating systems, databases, storage, and filesystems.
23 attendees

Past events (119)

Tue, Apr 22, 2025, 4:30 PM UTC
Bay Area Apache Spark™ Meetup: Upcoming Apache Spark 4.0.0 Release
This event has passed
42 attendees
