Hybrid Cloud: Cataloging Objects across Storage Systems (NFX301)

Title

AWS re:Invent 2022 - Hybrid cloud: Cataloging objects across storage systems (NFX301)

Summary

  • Denise Mace introduced the session and the speakers, Priyesh and Kishore, who are experts in object storage and catalog service infrastructure.
  • Kishore discussed the Netflix Studio ecosystem, focusing on the evolution of workflows, data movement, and storage systems.
  • Netflix uses Amazon S3 for persistent storage of media assets and has since adopted Amazon S3 Glacier Flexible Retrieval ("Glacier Flex") for cost-effective data lifecycle management.
  • The architecture has shifted from centralized to decentralized, using AWS Local Zones to keep storage close to the workflows that need it and to improve performance.
  • A catalog service was developed to manage data across different storage systems, formats, and protocols transparently.
  • Priyesh detailed the catalog service's design, requirements, and functionalities, including namespace management, access management, and support for heterogeneous storage systems.
  • The catalog service describes where data lives in terms of storage classes and locations. A core API layer handles catalog operations, and a storage engine layer beneath it contains the logic specific to each storage system.
  • The catalog stays current as the underlying storage systems change, and it exposes APIs for lifecycle management, querying, and administration.
  • The presentation concluded with a discussion on the catalog service's role in the Netflix Studio ecosystem, its current capabilities, and future work, followed by a Q&A session.
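
The layering described above (a core API layer over per-system storage engines, with data addressed by storage class and location) can be sketched roughly as follows. This is a minimal illustration, not Netflix's implementation: every class, method, and field name here (`Catalog`, `StorageEngine`, `Replica`, `InMemoryEngine`, and the example storage class and location strings) is a hypothetical stand-in, and the in-memory engine replaces any real storage backend so the sketch runs on its own.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Replica:
    """One physical copy of an object (all field names are illustrative)."""
    storage_class: str  # e.g. "object-store" or "archive" (assumed labels)
    location: str       # e.g. a region or Local Zone identifier (assumed)
    physical_key: str   # key understood by the engine that holds the copy


class StorageEngine(ABC):
    """Storage engine layer: one implementation per storage system."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryEngine(StorageEngine):
    """Stand-in backend so the sketch needs no external services."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class Catalog:
    """Core API layer: maps logical names to replicas and delegates I/O."""

    def __init__(self) -> None:
        self._engines: dict[tuple[str, str], StorageEngine] = {}
        self._entries: dict[str, list[Replica]] = {}

    def register_engine(self, storage_class: str, location: str,
                        engine: StorageEngine) -> None:
        self._engines[(storage_class, location)] = engine

    def put(self, logical_name: str, data: bytes,
            storage_class: str, location: str) -> Replica:
        # Write through the matching engine, then record the new replica.
        engine = self._engines[(storage_class, location)]
        physical_key = f"{storage_class}/{location}/{logical_name}"
        engine.put(physical_key, data)
        replica = Replica(storage_class, location, physical_key)
        self._entries.setdefault(logical_name, []).append(replica)
        return replica

    def locate(self, logical_name: str) -> list[Replica]:
        """Return every known replica of a logical object."""
        return list(self._entries.get(logical_name, []))

    def get(self, logical_name: str,
            preferred_location: Optional[str] = None) -> bytes:
        # Prefer a replica in the caller's location, else the first one known.
        replicas = self._entries[logical_name]
        chosen = next(
            (r for r in replicas if r.location == preferred_location),
            replicas[0],
        )
        engine = self._engines[(chosen.storage_class, chosen.location)]
        return engine.get(chosen.physical_key)
```

A caller only ever passes a logical name such as `"show/ep1/frame.exr"`; which engine serves the bytes is decided inside the catalog, which is the sense in which the service is described as transparent to applications.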

Insights

  • The evolution of Netflix's storage architecture from centralized to decentralized reflects a broader industry trend towards hybrid cloud solutions that optimize for performance, cost, and scalability.
  • The catalog service developed by Netflix manages data across disparate storage systems so that applications can access objects without needing to know where they physically reside.
  • The use of Amazon S3 Glacier Flexible Retrieval ("Glacier Flex") indicates a deliberate approach to data lifecycle management, balancing the need for immediate access against the cost savings of archival storage.
  • The session highlighted the importance of a global registry for objects in a hybrid cloud environment, which is crucial for managing data at scale and across multiple storage systems.
  • The design of the catalog service, which abstracts storage engines behind a common API and stays aware of changes in the underlying systems, is a pattern that other organizations facing similar data management challenges could adopt.
  • The future work mentioned, including better lifecycle management, smart clients, and encryption key management, suggests ongoing improvements and innovations in cloud storage and data management.
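
One way to picture the "dynamic awareness of changes" mentioned above is a registry that is updated from change events emitted by the storage systems. The sketch below is an assumption made for illustration only; the summary does not say how the service receives or applies changes. The event shape, the three event kinds, and the `apply_event` helper are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StorageEvent:
    """A change reported by a storage system (hypothetical shape)."""
    kind: str           # "created", "transitioned", or "deleted" (assumed)
    logical_name: str
    location: str
    storage_class: str


def apply_event(registry: dict[str, dict[str, str]],
                event: StorageEvent) -> None:
    """Update a registry of logical name -> {location: storage_class}."""
    replicas = registry.setdefault(event.logical_name, {})
    if event.kind in ("created", "transitioned"):
        # A new copy, or an existing copy moved to another storage class.
        replicas[event.location] = event.storage_class
    elif event.kind == "deleted":
        replicas.pop(event.location, None)
    else:
        raise ValueError(f"unknown event kind: {event.kind}")
    if not replicas:
        # No copies remain anywhere, so drop the object from the registry.
        del registry[event.logical_name]


registry: dict[str, dict[str, str]] = {}
events = [
    StorageEvent("created", "asset-1", "region-a", "standard"),
    StorageEvent("transitioned", "asset-1", "region-a", "archive"),
    StorageEvent("created", "asset-1", "local-zone-b", "standard"),
    StorageEvent("deleted", "asset-1", "region-a", "archive"),
]
for event in events:
    apply_event(registry, event)
```

After these four events the registry holds a single entry for `asset-1`, with one copy in `local-zone-b`. Keeping the registry as plain data keyed by logical name is what would let it act as the global lookup point for objects regardless of which system stores them.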