
What we’re about
This is a meetup for Bay Area users of Apache Spark (http://spark.apache.org), a unified analytics engine for large-scale data processing. We rotate hosting meetups among locations in San Francisco, Peninsula, and South Bay.
We discuss other Spark-related ecosystem projects, including Spark SQL, MLlib, GraphX, and Structured Streaming. Additionally, we include introductions to the various Spark features, tutorials, case studies from users, community contributors, best practices for deployment and tuning, and updates on future development and releases.
Upcoming events (1)
TransformWithState for Structured Streaming in Upcoming Apache Spark™ 4.0
Please join us for this important webinar to discuss the Structured Streaming features in the upcoming Apache Spark™ 4.0.0 release.
This session will cover the new transformWithState operator API, part of the upcoming Apache Spark™ 4.0 release. This operator supports numerous features to help unlock the next generation of mission-critical operational workloads running on Apache Spark.
We’ll cover the merits of this new operator API to:
- Define custom processing logic using an object-oriented paradigm
- Perform flexible state modeling
- Run event-driven programs with fine-grained timers
- Leverage other features: TTL, operator-chaining, state schema evolution, etc.
Please join us and learn more about the Structured Streaming feature in the upcoming Apache Spark™ 4.0.0 🤝
📅 Date: May 14, 2025
⏰ Time: 9:30 AM - 10:30 AM PT
📍 Location: Online
RSVP HERE 👉 https://lu.ma/euu8qytl
Agenda:
Talk 1: TransformWithState in Structured Streaming for upcoming Apache Spark™ 4.0
Abstract: In this session, we will cover the brand new transformWithState operator API, which will be launched in the upcoming Apache Spark™ 4.0 release. This operator supports numerous features to help unlock the next generation of mission-critical operational workloads running on Apache Spark.
Users can define custom processing logic using an object-oriented paradigm, perform flexible state modeling, and run event-driven programs with the help of fine-grained timers. They can also leverage other features, such as automatic eviction using TTL, chaining multiple operators, and flexible state schema evolution to improve the cost and efficiency of their streaming pipelines.
Through native integrations with state data source readers and Spark Connect, users will also benefit from better usability and observability while running on top of the familiar DataFrame API.
Join us to see what you can build with this new, exciting API in Apache Spark™ Structured Streaming!
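For readers new to the ideas in the abstract, here is a minimal plain-Python sketch of three of them: per-key state held in an object, TTL-based eviction, and per-key timers. It does not use Spark and is not the transformWithState API; the names `KeyState`, `CountingProcessor`, `handle_input`, and `fire_timers` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class KeyState:
    """Hypothetical per-key state: a count plus TTL and timer bookkeeping."""
    count: int = 0
    last_update_ms: int = 0
    timer_ms: Optional[int] = None  # at most one pending timer per key


class CountingProcessor:
    """Illustrative only (not a Spark class): counts events per key."""

    def __init__(self, ttl_ms: int, timer_delay_ms: int) -> None:
        self.ttl_ms = ttl_ms
        self.timer_delay_ms = timer_delay_ms
        self.states: Dict[str, KeyState] = {}

    def handle_input(self, key: str, now_ms: int) -> Tuple[str, int]:
        state = self.states.get(key)
        # TTL: state idle for longer than ttl_ms is treated as evicted.
        if state is not None and now_ms - state.last_update_ms > self.ttl_ms:
            state = None
        if state is None:
            state = KeyState()
            self.states[key] = state
        state.count += 1
        state.last_update_ms = now_ms
        # Register a timer only if this key has none pending.
        if state.timer_ms is None:
            state.timer_ms = now_ms + self.timer_delay_ms
        return (key, state.count)

    def fire_timers(self, now_ms: int) -> List[Tuple[str, int]]:
        """Emit (key, count) for every timer that is due, then clear it."""
        fired = []
        for key, state in self.states.items():
            if state.timer_ms is not None and state.timer_ms <= now_ms:
                fired.append((key, state.count))
                state.timer_ms = None
        return fired


proc = CountingProcessor(ttl_ms=1000, timer_delay_ms=500)
proc.handle_input("a", 0)
proc.handle_input("a", 100)
print(proc.fire_timers(600))         # the timer registered at t=0 is due
print(proc.handle_input("a", 2000))  # idle past the TTL, so the count restarts
```

In the sketch the processor owns all state in a local dictionary; in a real streaming engine the state would live in a managed, fault-tolerant state store, which is the part the talk covers.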
Bio: Anish Shrigondekar is a Staff Software Engineer at Databricks working on the core streaming team. Before joining Databricks, Anish worked in other groups at Splunk Inc. and VMware, Inc. His areas of interest include operating systems, databases, storage, and filesystems.