SF Big Analytics

San Francisco, CA

16,085 members · Public group

Organized by Chester Chen and 6 others


What we’re about

SF Big Analytics YouTube Channel

The SF Big Analytics meetup focuses on all aspects of big data analytics, from data ETL, feature generation, and AI/machine learning theory, algorithms, and implementation to the technologies and infrastructure associated with big data analytics. Topics include AI/machine learning (algorithms and ML infrastructure), data processing and monitoring, data infrastructure, data visualization, and the data science lifecycle. The meetup covers the full range of big data analytics topics and data mining pipelines.

We try to provide high-quality talks at each meetup. Here are some of the policies for talks that we have followed over the last few years:

-- Technically focused

-- No marketing

-- No product promotion (unless it is an open-source project)

-- No high-level business talks (unless they are from highly respected leaders)

Upcoming events (3)


Wed, May 1, 2024, 5:30 PM PDT
AI meetup (GitHub SF): AI, LLMs, and ML
GitHub, San Francisco, CA
** Important RSVP HERE (Due to building security, you must pre-register at the link for admission)

Description:
Welcome to the monthly in-person AI meetup in San Francisco. Join us for deep dive tech talks on AI, GenAI, LLMs and machine learning, food/drink, networking with speakers and fellow developers.

Tech Talk: Deploying LLMs and SLMs locally on consumer hardware
Speaker: Cooper Lindsey (Knap)
Abstract: As companies and consumers become more aware of the high token and data security costs of cloud-based LLMs, many are turning to locally deployed LLMs (and Small Language Models, or SLMs) for specific tasks. I will talk through our efforts to vectorize data locally on M-Series Macs, which LLMs work best, and how to configure RAG in this context for local LLMs.

Tech Talk: How AI can save email
Speaker: Michael Vandi (Addy AI), Spatika Ganesh (Addy AI)
Abstract: In this talk, we will discuss how hyper-personalized LLMs, fine-tuned for specific organizational workflows, have the potential to automate the bulk of email tasks, resulting in high productivity and efficiency gains. We also showcase how Addy AI enhances the user experience by offering multiple trained assistants, tailored by the user, to assist with diverse tasks encountered throughout the day. Additionally, we will cover how implementing RAG lets us give users more control over their AI assistants’ responses.

Tech Talk: Customizable RAG workflows with your own Data
Speaker: Christy Bergman (Zilliz)
Abstract: You’ve heard good data matters in Machine Learning, but does it matter for Generative AI applications? Corporate data often differs significantly from the general internet data used to train most foundation models. Join me for a demo on building a customizable RAG (Retrieval Augmented Generation) stack using Milvus vector database, LangChain, Ollama with Llama 3, Ragas RAG eval, and optional Zilliz cloud, Anthropic, OpenAI.

Speakers/Topics:
Stay tuned as we are updating speakers and schedules.
If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics

Sponsors:
We are actively seeking sponsors to support our community, whether by offering venue space, providing food/drink, or cash sponsorship. Sponsors will not only speak at the meetups and receive prominent recognition, but also gain exposure to our extensive membership base of 30,000+ AI developers in San Francisco and 350K+ globally.

Community on Slack/Discord

Event chat: chat and connect with speakers and attendees

Sharing blogs, events, job openings, projects collaborations

Join Slack/Discord (the link is at the bottom of the page)
Tue, May 7, 2024, 5:30 PM PDT
Real-time Data Streaming and Processing (San Francisco)
28 2nd St, San Francisco, CA
** RSVP Here. (Due to room capacity, you must pre-register at the link for admission)

Description:
Welcome to the AI meetup in San Francisco, in collaboration with Lenses and Imply. Join us for deep dive tech talks on AI, GenAI, LLMs, ML, and real-time data streaming and processing, with food/drink and networking with speakers and fellow developers.

Agenda:
- 5:30pm~6:00pm: Check-in, food, and networking
- 6:10pm~8:00pm: Tech talks and Q&A
- 8:00pm~8:30pm: Open discussion, mixer, and closing

Tech Talk: Simplifying Kafka and Troubleshooting data
Speaker: Guillaume Ayme (Lenses.io)
Abstract: In this session, we'll present a real-life example of how we implemented a solution that gives developers complete control over their event-driven application (EDA), resulting in fast delivery of new streaming services and quick troubleshooting of any issues that arise. From fine-tuning data pipelines, troubleshooting data issues, offsetting topics, backing up to S3, and viewing your topic topology to harnessing the power of real-time data, developers and engineers will gain insights into how Lenses.io optimizes data workflows and enhances operational efficiency. This presentation will offer technical knowledge, hands-on demonstrations, and networking opportunities.

Tech Talk: Streaming data meets real-time analytics
Speaker: Darin Briskman (Imply)
Abstract: You’re generating events. Lots of events. You’re using data streaming to reliably move those events between applications. How do you use this streaming data to your best advantage, gaining insights and driving automated decision making?
Imply is the database solution for real-time analytics from the creators of Apache Druid®. With high-concurrency subsecond queries of data of any size, combining true stream ingestion with historical batch data, Imply enables fast queries with aggregations and roll-ups while maintaining access to granular data.
In this tech talk, we’ll take a look at the uses of real-time analytics and how it can be used with streaming data.

Speakers/Topics:
Stay tuned as we are updating speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics

Venue:
355 Bryant St #108, San Francisco, CA 94107

Fri, May 31, 2024, 4:00 PM UTC
Evaluation of LLMs and Scale GNN training
We have invited two guests to cover two different topics. Please register with our event partner AICamp to obtain the Zoom link:

https://www.aicamp.ai/event/eventdetails/W2024053109

Agenda:
9:00 am -- 9:05 am PDT members join online
9:05 am -- 9:45 am PDT Talk 1
9:45 am -- 10:15 am PDT Talk 2
10:30 am PDT event closed

Talk 1: Training GNNs at Internet Scale using cuGraph and WholeGraph
We present our approach to managing 70 TB graph datasets and training GraphSAGE across 1024 GPUs. One key feature of our approach is the separation of the graph sampling and GNN training phases, giving the user the flexibility to scale each independently of the other. WholeGraph provides a distributed feature store that leverages GPU memory and caching to provide high-performance data loading. According to our profiling, data loading and sampling are the two largest bottlenecks in GNN training.

Speaker: Joe Eaton (NVIDIA)
Joe Eaton is a Distinguished System Engineer for Data and Graph Analytics at NVIDIA and currently leads the company's strategy for Graph Neural Networks. Joe leads teams for code optimization, graph analysis, and framework development and optimization, and also interacts with prospective customers in industry.
Key areas of interest are financial services, retail recommenders, and molecular generation for drug discovery.

Talk 2: Practical Evaluation of LLMs and LLM Systems: Ensuring Relevance and Effectiveness in Real-World Applications

In-Depth Analysis of LLM Evaluation Methods: Gain insights into various methods used to evaluate LLM models, understanding their strengths and weaknesses.
End-to-End Evaluation Techniques: Explore how LLM-augmented systems are assessed from a holistic perspective, ensuring comprehensive evaluation.
Pragmatic Approach to System Deployment: Learn practical strategies for applying evaluation techniques to real-world systems, ensuring seamless deployment and functionality.
Focused Overview on Critical LLM Aspects: Get an overview of essential evaluation techniques for assessing crucial elements of modern LLM systems, enhancing understanding and applicability.
Simplifying the Evaluation Process: Understand how to streamline the evaluation process, making the work of LLM scientists more efficient and productive.

Speaker: Andrei Lopatenko

Andrei Lopatenko is an accomplished technology expert with over 18 years of experience in the industry. He earned a PhD in Computer Science from the University of Manchester and has been a key player in various high-profile AI projects at leading companies such as Google, Apple, Walmart, eBay, and Zillow. His notable work includes developing essential components of Google's search engine, initiating Apple Maps Search, and leading significant AI and search initiatives at Apple, Walmart, eBay, and Zillow. Additionally, Andrei co-founded a conversational AI startup that was acquired by Facebook/Meta.

