Charting a Course to Navigate the Waters of a Cybersecurity Data Lake

December 6, 2024 • Presentation

By
Rosalie Bakken (Mayo Clinic)

Rosalie Bakken of the Mayo Clinic presented this session at FloCon 2024.

Publisher

Software Engineering Institute

Topic or Tag

FloCon

Abstract

The potential utility of a cybersecurity data lake is as expansive as the volumes and types of data available, yet its true value can be difficult to grasp and realize. Where should we start with using the lake? How much data should a given query include? How far should we pursue any given hypothesis and set of queries? When should we veer off and start a new investigational path? Should data scientists work collaboratively on the same hypothesis, or should they divide and conquer? How do teams stay aligned, and how should differing results from alternative approaches to the same problem be reconciled? What does it all mean in terms of insights? In short, how can we ensure the data lake has practical use that justifies its cost?

Background: Any widget is the sum of its parts, and the same idea applies to charting a course through a data lake. Without first starting with the appropriate map, any navigator is destined to run aground. As a first step, we must understand where we want to go (obtain the map) and how to get there (chart the course), recognize that alternate paths exist that may divert our attention (shiny objects), and correct course as needed. Without this guidance, analytical efforts may devolve into endlessly cyclical pursuits with substandard results.

Approach: The potential benefits of a data lake are extensive, but without a rigorous, well-defined process guiding its use, results can easily run amok, creating confusion and misinterpretation that limit its ability to deliver value. This discussion will delve into the repeatable processes we've used to generate insights from analytical discoveries made with an enterprise-scale cybersecurity data lake. It highlights six methodological steps for establishing and maintaining focus on prioritized use cases, clearly defining the problem and what success looks like, avoiding bias through collaborative development of logic, and handing off results thoughtfully and in context. These steps keep us open to discoveries that challenge our presumptions while holding true to the original analytic goal in the face of myriad distractions: "interesting data discoveries" that threaten to upend carefully laid plans and timelines.

Attendees Will Learn: Participants will be introduced to a methodology for defining a repeatable logical flow that supports cybersecurity analytic efforts. They will learn how to develop a workflow map, including describing the data before jumping into the analytical phases, which enables greater efficiency and more accurate results. Participants will also learn how to use the guardrails of the logical workflow to tune out noise and stay focused on proving or disproving the primary hypothesis, while still capturing adjacent hypotheses on a backlog.
