Details
Unstructured data has emerged as a central focus in the GenAI era, offering remarkable opportunities to boost growth and secure a competitive edge. However, this data also presents significant challenges for Chief Data Officers (CDOs) tasked with its effective governance. Traditional tools prove insufficient for managing unstructured data, necessitating new, unique capabilities to leverage it in GenAI applications.
Key questions include:
- How can we discover, classify, and catalog unstructured data?
- Which users have access to sensitive data within unstructured datasets?
- What measures can prevent the exposure of sensitive information?
- Is the data relevant, current, and free from duplicates?
- Which policies and regulations are applicable to unstructured data?
- How do we trace the provenance of data from unstructured data systems to GenAI models and endpoints?
Join Jack Berkowitz (Chief Data Officer, Securiti) and Ankur Gupta (Director of Product Marketing, Securiti) as they explore how CDOs can adopt a comprehensive approach to unstructured data management, safely unlocking its potential and effectively operationalizing GenAI.
Key Takeaways:
- The rise of unstructured data in the GenAI era
- Challenges faced by CDOs in governing unstructured data
- New governance capabilities required to harness unstructured data for GenAI applications
Post-event summary
The webinar titled “Navigating Unstructured Data: A CDO’s Roadmap to GenAI Success,” hosted by EDM Council and Securiti, focused on the challenges and strategies for managing unstructured data in the context of generative AI. Panelists included industry experts:
- Jack Berkowitz, CDO, Securiti
- Ankur Gupta, Director of Product Marketing, Securiti
- Moderator: Jim Halcomb, Head of Product Management, EDM Council
The speakers emphasized the importance of discovering, classifying, and governing unstructured data. They highlighted that unstructured data, which includes files, emails, and transcripts, makes up approximately 90% of organizational data and poses significant challenges for data management and AI implementation.
The speakers discussed various advanced capabilities required for handling unstructured data, such as natural language processing and machine learning. They stressed the need for effective data discovery, cataloging, classification, and curation to ensure the data’s relevance and accuracy. Jack noted, “Our job is to keep data flowing. What we find with the data command graph is it allows people to understand it in context so that they have the confidence to keep data moving.” This quote underscores the critical balance between data accessibility and security.
The webinar also addressed the risks of data leakage and the importance of maintaining data entitlements to prevent unauthorized access. The speakers introduced retrieval-augmented generation (RAG), in which content retrieved from unstructured data is supplied to an AI model as context so that its responses are more relevant. They highlighted the role of a data command graph in ensuring proper data lineage, security, and governance.
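To make the link between entitlements and RAG concrete, here is a minimal Python sketch. It is not taken from the webinar or from any Securiti product: the `Document` class, the `allowed_groups` labels, the sample documents, and the word-overlap scoring are all illustrative assumptions. It shows one possible pattern, which is to drop documents a user is not entitled to read before ranking them, so that restricted content never reaches the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical record: the text plus the groups entitled to read it.
    doc_id: str
    text: str
    allowed_groups: frozenset


def tokenize(text):
    """Lowercase a string and split it into a set of words without punctuation."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query, user_groups, documents, top_k=2):
    """Return up to top_k entitled documents, ranked by word overlap with the query."""
    query_terms = tokenize(query)
    # Entitlement check comes first: documents the user cannot read are never scored.
    visible = [d for d in documents if d.allowed_groups & set(user_groups)]
    scored = [(len(query_terms & tokenize(d.text)), d) for d in visible]
    # Keep only documents sharing at least one word with the query, best match first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [d for _, d in ranked[:top_k]]


def build_prompt(query, context_docs):
    """Assemble the text that would be sent to a model (the model call is omitted)."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Made-up sample data for illustration only.
DOCS = [
    Document("hr-001", "Salary bands for engineering staff in 2024", frozenset({"hr"})),
    Document("kb-101", "How to request engineering laptop replacement",
             frozenset({"hr", "engineering"})),
    Document("kb-102", "Office holiday schedule", frozenset({"hr", "engineering"})),
]

if __name__ == "__main__":
    question = "engineering salary bands"
    for groups in ({"engineering"}, {"hr"}):
        docs = retrieve(question, groups, DOCS)
        print(sorted(groups), [d.doc_id for d in docs])
```

In this sketch the same question returns different context depending on who asks it: a user in the `engineering` group never sees the HR-only document, while a user in `hr` does. A real system would likely use embedding-based search and source its entitlements from the underlying data systems rather than a hard-coded field.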
In conclusion, the webinar provided valuable insights into the complexities of managing unstructured data and the steps needed to leverage it effectively in AI applications. The speakers emphasized that collaboration between privacy, security, and data teams is essential for building a robust unstructured data governance program.