Wed, Jan 21 · 6:30 PM CET
Welcome to the PyData Berlin January meetup!
We would like to welcome you all starting from 18:30. There will be food and drinks. The talks begin around 19:15 and the doors will close at 18:45. Make sure to arrive on time!
Please register with your first and last name, as the venue's entry policy requires it. If you cannot attend, please cancel your spot so that others can join, as space is limited.
Host:
Spiced Academy is excited to welcome you to this month's edition of PyData.
**************************************************************************
The Lineup for the evening
Talk 1: From Grape Stomping to Gorilla Trekking: AI-Driven Discovery of Novel Interests at GetYourGuide
Abstract: UX research shows that discovering new, unexpected and unique things to do in a destination is among the most important needs for travelers. However, surfacing these “hidden gems” at scale is a major challenge for any dynamic marketplace. At GetYourGuide, we tackled this by combining large language models, semantic vector embeddings, and a robust human-in-the-loop process to automatically identify and categorize novel customer interests. Join me as I dive into how we created this AI-driven solution and how we achieved scalable, granular and relevant categorization of our activities.
Description: How do you help a traveler find something they never knew they wanted? In this talk, I will explore the exciting journey of developing an AI-driven solution to identify novel customer interests at GetYourGuide. With thousands of new activities onboarded weekly, manual categorization became impractical, prompting the need for automation.
This talk will walk you through our end-to-end solution:
The problem: The challenges travelers face in discovering relevant and unique experiences, the organizational complexity this creates on our platform, and its business impact
The AI journey: From raw LLM-based extraction to vector embeddings, clustering, all the way to a robust human-in-the-loop pipeline. I’ll share how we iterated from noisy prototypes to an impactful system, launching 300+ new interests and driving significant engagement and net revenue uplifts
What we learned: The realities of working with LLMs in production-like settings, more specifically hallucinations and issues with relevance, granularity and semantic deduplication. We found that the line between useful innovation and unwanted noise can be surprisingly fine when using LLMs. I’ll share practical examples of these challenges and how we’re iterating to address them.
Impact and future directions: How we achieved a 10x increase in manual acceptance of AI-suggested interests in just two iterations, and what’s next for AI-driven discovery at GetYourGuide
Bio: Morena Bastiaansen holds a master’s degree in Econometrics from the University of Amsterdam and currently works as a Data Scientist at GetYourGuide in Berlin. She combines her passions for machine learning, language, and travel to build impactful systems for travelers worldwide, with a current focus on personalization in email marketing. Morena is also actively involved in the local tech and ML community.
Talk 2: Deep dive into data streaming security
Abstract: Data streaming is powering everything from fraud detection and real-time analytics to patient monitoring and order fulfillment. But as the role of streaming grows, so does the risk - because many streaming platforms, like Apache Kafka, aren't secure by default. In this talk, we’ll take a practical look at data streaming security through the lens of Kafka, one of the most widely adopted streaming platforms in the world. We'll walk through what can go wrong - real examples such as 14M patient-doctor messages, or years of real-time delivery information, being exposed to the internet - and what it takes to do it right. We'll cover the key pillars of securing a streaming system: encryption in transit and at rest, access control, monitoring, and key management. Along the way, we’ll look at trade-offs such as disk encryption vs. end-to-end encryption, what is behind field-level and envelope encryption, and the realities of using customer-managed keys in regulated industries. We'll also explore how streaming security has evolved, how real vulnerabilities (like CVE-2019-12399) highlight the need for patching and monitoring, and what successful multi-layered security looks like in production - from financial institutions to healthcare platforms. If you're building, running, or scaling streaming systems, this talk will help you see the security blind spots and give you concrete steps to protect the data flowing through your pipelines.
Bio: Olena Kutsenko is a Staff Developer Advocate at Confluent and a recognized expert in data streaming and analytics. With two decades of experience in software engineering, she has built mission-critical applications, led high-performing teams, and driven large-scale technology adoption at industry leaders like Nokia, HERE Technologies, AWS, and Aiven.
A passionate advocate for real-time data processing and AI-driven applications, Olena empowers developers and organizations to use the power of streaming data. She is an AWS Community Builder, a dedicated mentor, and a volunteer instructor at a nonprofit tech school, helping to shape the next generation of engineers.
As an international speaker and thought leader, Olena regularly presents at top global conferences, sharing deep technical insights and hands-on expertise. Whether through her talks, workshops, or content, she is committed to making complex technologies accessible and inspiring innovation in the developer community.
Lightning talks
There will be slots for 2-3 lightning talks (3-5 minutes each) between the two main talks.
Kindly let us know if you would like to present something :)
***
NumFOCUS Code of Conduct
THE SHORT VERSION
Be kind to others. Do not insult or put down others. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes are not appropriate for NumFOCUS.
All communication should be appropriate for a professional audience including people of many different backgrounds. Sexual language and imagery are not appropriate.
NumFOCUS is dedicated to providing a harassment-free community for everyone, regardless of gender, sexual orientation, gender identity and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of community members in any form.
Thank you for helping make this a welcoming, friendly community for all.
If you haven't yet, please read the detailed version here: https://numfocus.org/code-of-conduct
***