Title
AWS re:Invent 2023 - Improve resilience of SAP workloads with AWS Support (SUP312)
Summary
- 3M collaborated with AWS Support to improve the resilience of its SAP workloads.
- Kim Otto from 3M and AWS technical account managers Manik Chopra and Vijay Sitaram shared their experiences.
- AWS Support ran engagements such as tabletop exercises, fault testing, and runbook reviews to strengthen 3M's disaster recovery (DR) exercises.
- 3M used AWS Resilience Hub, AWS Fault Injection Service (formerly Fault Injection Simulator), and Amazon CloudWatch Application Insights to modernize how it manages SAP resilience.
- The session emphasized high availability, continuity of operations, and continuous resilience.
- Recovery Point Objective (RPO, the maximum acceptable data loss measured in time) and Recovery Time Objective (RTO, the maximum acceptable time to restore service) are critical metrics for system resilience.
- The failure categories discussed included code deployments, core infrastructure, data corruption, dependencies, and regional outages.
- 3M used AWS services such as AWS Health, AWS Trusted Advisor, and Amazon CloudWatch.
- Key support engagements included a Well-Architected review using the SAP Lens, incident response assessment, building and reviewing runbooks, managing and operating resilience, testing and validating recovery, and monitoring and observing availability.
- 3M's SAP architecture spans two AWS Regions for production and non-production workloads; a key focus was identifying single points of failure.
- Trusted Advisor Priority and Resilience Hub were used to assess and improve the resilience of 3M's SAP systems.
- AWS Fault Injection Service (FIS) was used to simulate and test various failure scenarios.
- Amazon CloudWatch Application Insights for SAP provided monitoring of both infrastructure and application metrics.
- Kim Otto described 3M's journey with AWS, focusing on reducing Mean Time to Recovery (MTTR), modernizing operations, and building a culture of resilience testing.
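RPO and RTO targets are what a Resilience Hub resiliency policy records for each disruption type (software, hardware, Availability Zone, Region), and a Resilience Hub assessment checks an application against them. The following is a minimal sketch, assuming Python with boto3's `resiliencehub` client; the helper names (`Target`, `build_policy`, `meets_targets`, `create_policy`), the `MissionCritical` tier choice, and any target values are hypothetical illustrations, not 3M's configuration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Disruption types a Resilience Hub resiliency policy can hold targets for.
DISRUPTION_TYPES = ("Software", "Hardware", "AZ", "Region")


@dataclass(frozen=True)
class Target:
    """Hypothetical helper: recovery targets for one disruption type, in seconds."""
    rto_secs: int  # maximum tolerable downtime
    rpo_secs: int  # maximum tolerable data loss, measured as time


def build_policy(targets: dict[str, Target]) -> dict:
    """Convert targets into the `policy` mapping passed to create_resiliency_policy."""
    unknown = set(targets) - set(DISRUPTION_TYPES)
    if unknown:
        raise ValueError(f"unknown disruption types: {sorted(unknown)}")
    return {
        name: {"rtoInSecs": t.rto_secs, "rpoInSecs": t.rpo_secs}
        for name, t in targets.items()
    }


def meets_targets(target: Target, observed_rto_secs: int, observed_rpo_secs: int) -> bool:
    """True if a recovery measured during a DR test stays within both targets."""
    return observed_rto_secs <= target.rto_secs and observed_rpo_secs <= target.rpo_secs


def create_policy(client, name: str, targets: dict[str, Target]) -> str:
    """Register the policy; `client` is assumed to be boto3.client("resiliencehub")."""
    response = client.create_resiliency_policy(
        policyName=name,
        tier="MissionCritical",  # assumption: SAP production treated as mission critical
        policy=build_policy(targets),
    )
    return response["policy"]["policyArn"]
```

For example, `Target(3600, 900)` describes a one-hour RTO and a 15-minute RPO, and `meets_targets` can compare those numbers with what a DR exercise actually measured. Passing the client in, rather than creating it inside the helper, keeps the policy logic testable without AWS credentials.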
 
Insights
- The collaboration between 3M and AWS shows the value of AWS Support in improving the resilience of critical workloads.
- AWS Resilience Hub and AWS Fault Injection Service are powerful tools for assessing and testing system resilience, allowing teams to identify and mitigate potential failure points proactively.
- Monitoring and observability services such as CloudWatch Application Insights are crucial for maintaining system health and identifying issues quickly.
- The integration of Trusted Advisor and Resilience Hub provides a comprehensive view of system resilience and actionable recommendations for improvement.
- The session highlighted continuous resilience practices, including regular DR testing and updating runbooks for cloud environments.
- The focus on RPO and RTO underscores the need for businesses to align their technical resilience strategies with their business continuity requirements.
- 3M's journey is a case study for other organizations looking to improve system resilience on AWS, particularly for complex, mission-critical applications such as SAP.
- Cross-team communication and collaboration are key to successful resilience management, because resilience spans both infrastructure and application teams.
- The session also stressed that cloud services keep evolving, so staying current with new features and best practices is necessary to maintain a resilient infrastructure.
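The fault testing with Fault Injection Service described above usually starts from an FIS experiment template. The following is a minimal sketch, assuming Python with boto3's `fis` client: `aws:ec2:stop-instances` and its `startInstancesAfterDuration` parameter are real FIS constructs, but the tag `sap-role=app-server`, the target and action names, the 10-minute restart delay, and the role and alarm ARNs are hypothetical placeholders, not 3M's setup.

```python
import uuid
from typing import Optional


def build_stop_app_server_template(role_arn: str, alarm_arn: Optional[str] = None) -> dict:
    """Build kwargs for fis.create_experiment_template that stop one tagged EC2 instance."""
    # Halt the experiment automatically if the (optional) CloudWatch alarm fires.
    stop_conditions = (
        [{"source": "aws:cloudwatch:alarm", "value": alarm_arn}]
        if alarm_arn
        else [{"source": "none"}]
    )
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token for the create call
        "description": "Stop one SAP application server, restart after 10 minutes",
        "roleArn": role_arn,  # IAM role FIS assumes to act on the targets
        "stopConditions": stop_conditions,
        "targets": {
            "sapAppServers": {
                "resourceType": "aws:ec2:instance",
                # Hypothetical tag; use whatever scheme identifies your SAP hosts.
                "resourceTags": {"sap-role": "app-server"},
                "selectionMode": "COUNT(1)",  # one matching instance, chosen at random
            }
        },
        "actions": {
            "stopAppServer": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": "PT10M"},
                "targets": {"Instances": "sapAppServers"},
            }
        },
        "tags": {"Name": "sap-app-server-stop"},
    }


def run_experiment(client, role_arn: str, alarm_arn: Optional[str] = None) -> str:
    """Create the template and start it; `client` is assumed to be boto3.client("fis")."""
    template = client.create_experiment_template(
        **build_stop_app_server_template(role_arn, alarm_arn)
    )
    experiment = client.start_experiment(
        clientToken=str(uuid.uuid4()),
        experimentTemplateId=template["experimentTemplate"]["id"],
    )
    return experiment["experiment"]["id"]
```

Tying `stopConditions` to a CloudWatch alarm lets the experiment stop itself if the SAP system degrades beyond what the test intended, which is the kind of safety net that makes repeated resilience testing practical.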