Texera: Cloud-Based Collaborative Data Science and AI/ML Using Workflows


Details
Please join us for the Orange County ACM Chapter's bi-monthly evening program series.
Agenda
6:30 PM Doors Open & Networking
7:00 PM Announcements and Presentation
8:30 PM Meeting Adjourned
Event Details
Since 2016, our team at UC Irvine has been developing the open-source Texera system (texera.io), with the goal of building a cloud-based platform for collaborative data science, AI, and ML. It allows users with various backgrounds, including those with limited coding skills, domain scientists, and ML experts, to conduct AI-centric data science with a collaboration experience similar to Google Docs.
After eight years of development, the system has a rich set of features, such as shared editing, shared execution, version control, commenting, debugging, user-defined functions in multiple languages (e.g., Python, R, Java), and support for state-of-the-art AI/ML techniques. Its parallel backend engine enables scalable computation on large data sets using computing clusters. For example, it allows bioinformaticians to elastically request resources from AWS to form a cluster for computationally intensive jobs. It also supports community-based sharing of resources, including datasets and workflows.
In this talk, we will give an overview of the system, discuss the research challenges we encountered during development and our solutions, and describe the software architecture that lets it achieve high usability, scalability, and reliability. We will also show several domains that have adopted the system, including education and scientific communities.
Speakers
Prof. Chen Li is a professor in the Department of Computer Science at UC Irvine. He received his Ph.D. in Computer Science from Stanford University, and his M.S. and B.S. in Computer Science from Tsinghua University, China. His research interests are in the fields of data management, data science, AI/ML, databases, data-intensive computing, search, and visualization. He was a co-founder and CTO of a startup to commercialize his research. His recognitions include an NSF CAREER award and test-of-time awards. He is an ACM Distinguished Member and an IEEE Fellow.
Jiadong Bai is a second-year Ph.D. student in Computer Science at UC Irvine, advised by Prof. Chen Li. His research focuses on building intelligent systems for big data management, with interests in integrating AI into different aspects of data science. In the Texera project, he is a lead designer of its cloud infrastructure and of its storage layer, which is built with S3, LakeFS, and Apache Iceberg.
Co-sponsors
This event is co-sponsored by the IEEE Orange County Computer Society, SIGAI OC, and the IEEE OC Solid-State Circuits Society (SSCS).
Venue & Parking
This meeting will be held at Knobbe Martens' Irvine offices. Parking validation will be provided to attendees.
