Title
AWS re:Invent 2023 - How not to practice observability (DOP404)
Summary
- Anand from ManageEngine, a division of Zoho Corporation, discusses common pitfalls in implementing observability.
 - Observability is proactive and relies on historical data, unlike reactive monitoring.
 - Observability quality improves with the right data sampling, not simply with more data.
 - Misconceptions about observability can lead to issues like overprovisioning and missing critical spikes in metrics.
 - Dashboards should be created deliberately, to avoid technical debt, and should address the issues teams refer to most often.
 - Assumptions can lead to incomplete observability that fails to capture every layer of an application.
 - Misconfigurations in alerting can lead to alert fatigue and unnecessary costs.
 - DevOps teams should avoid centralizing configurations and instead tailor observability practices to specific applications.
 - Data hoarding and access restrictions can hinder effective observability.
 - Platform engineering is emerging to address data unification challenges.
 - Observability systems need failover mechanisms and should not cause system crashes.
 - Knowledge transfer between shifts is crucial to avoid reinventing the wheel.
 - Adopting new tools requires internal changes and should not be done just for the sake of using new technology.
 - ManageEngine offers tools for observability and invites attendees to visit their booth for more insights.
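
The point above about sampling and missed spikes can be illustrated with a minimal sketch. The data is made up for illustration (a per-second CPU series with one short spike), and the helper names `point_sample` and `window_aggregate` are hypothetical, not taken from the talk or from any ManageEngine tool. It compares three ways of reducing the same series to one value per minute.

```python
# Hypothetical per-second CPU utilisation (%): steady 20% with a 5-second spike to 100%.
series = [20.0] * 300
for t in range(130, 135):
    series[t] = 100.0


def point_sample(values, every):
    """Keep one raw reading every `every` seconds (coarse sampling)."""
    return values[::every]


def window_aggregate(values, size, fn):
    """Collapse each `size`-second window into a single value using `fn`."""
    return [fn(values[i:i + size]) for i in range(0, len(values), size)]


def mean(xs):
    return sum(xs) / len(xs)


sampled = point_sample(series, 60)             # one reading per minute
averaged = window_aggregate(series, 60, mean)  # per-minute average
peaks = window_aggregate(series, 60, max)      # per-minute maximum

print("highest point sample:  ", max(sampled))
print("highest minute average:", round(max(averaged), 2))
print("highest minute peak:   ", max(peaks))
```

In this toy series, the once-a-minute sample never lands on the spike and the per-minute average dilutes it to roughly 27%, while the per-minute maximum keeps it at 100%. Which reduction is appropriate depends on the metric, so this is only one way to reason about the trade-off.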
 
Insights
- Observability is a complex field that requires a balance between proactive data analysis and avoiding information overload.
 - The right sampling rate is crucial for accurate observability, as both under-sampling and over-sampling can lead to misinterpretation of system health.
 - Dashboard creation is a skill gap in many organizations, and dashboards should be created with a clear purpose and regular usage in mind.
 - Assuming that the whole system is healthy because each individual part looks healthy risks missing systemic issues.
 - Alerting configurations should be optimized to reduce noise and prevent alert fatigue among engineers.
 - Decentralizing observability configurations can empower teams to tailor observability to their specific needs, avoiding a one-size-fits-all approach.
 - Data accessibility and cross-team observability are essential for quick incident resolution.
 - Platform engineering is becoming important for managing data across various tools and ensuring a unified view of observability data.
 - Observability systems themselves need to be robust and not contribute to system instability.
 - When adopting new tools, it's important to consider the people and processes involved, not just the capabilities of the tool itself.
 - ManageEngine's experience with observability across a wide range of products and customers gives the company broad practical knowledge of the field.
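
The insight about alert noise can be sketched in code. The summary does not say which technique the speaker recommends, so the approach below is an assumption: it fires only after several consecutive threshold breaches and stays silent while the condition persists. The function name `fire_alerts`, the threshold, and the readings are all hypothetical.

```python
def fire_alerts(readings, threshold, consecutive):
    """Return the indices at which an alert would fire.

    An alert fires once when `consecutive` readings in a row exceed
    `threshold`, and cannot fire again until a reading drops back to
    or below the threshold.
    """
    fired = []
    streak = 0
    active = False
    for i, value in enumerate(readings):
        if value > threshold:
            streak += 1
            if streak >= consecutive and not active:
                fired.append(i)
                active = True
        else:
            streak = 0
            active = False
    return fired


# Hypothetical metric readings with two brief blips and one sustained breach.
readings = [50, 95, 50, 92, 93, 94, 96, 97, 60, 91]

# Naive rule: notify on every reading above the threshold.
naive = [i for i, value in enumerate(readings) if value > 90]
# Damped rule: require three consecutive breaches, then stay quiet.
damped = fire_alerts(readings, threshold=90, consecutive=3)

print("naive notifications: ", len(naive))
print("damped notifications:", len(damped), "at index", damped)
```

With these readings the naive rule produces seven notifications and the damped rule produces one, for the sustained breach only. The trade-off is a delay before a genuine incident is reported, so the `consecutive` value would need tuning per application.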