
Client Profile
- Organization: Socotec Group
- Industry: Testing, Inspection & Certification (TIC)
- Annual Revenue: €1.2 Billion (2024)
- Employees: 12,000+ professionals worldwide
- Global Presence: Operations in 25+ countries across 5 continents
- Digital Footprint: 75+ microservices supporting 200+ business applications
- Tech Stack:
  - Kubernetes (EKS)
  - Istio Service Mesh
  - Envoy Proxies
Executive Summary:
Socotec Group, a global leader in risk management and technical compliance services, faced persistent reliability issues within their mission-critical microservices architecture. Random production failures affecting up to 70% of service calls were severely impacting business operations and customer experience. After internal teams had struggled for months to resolve the issue, Stackgenie was engaged to diagnose and fix what proved to be a complex Istio service mesh configuration problem. Through advanced troubleshooting and custom Envoy filter development, Stackgenie not only eliminated the failures but also strengthened Socotec’s internal expertise, resulting in 99.99% service reliability.
The Challenge: Unpredictable Production Failures
Background
Socotec’s digital transformation initiative included migrating legacy systems to a cloud-native microservices architecture across their global operations. Their platform processed:
- 350,000+ daily API transactions
- 15,000+ internal users across 25 countries
- 50,000+ external client interactions daily
- Critical compliance data for infrastructure safety assessments
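For a sense of scale, and assuming the 99.99% reliability figure is measured per API transaction against the 350,000 daily transactions listed above (the case study does not state how it is measured), the implied daily failure allowance works out as:

```latex
N_{\text{failed/day}} \le 350{,}000 \times (1 - 0.9999) = 350{,}000 \times 10^{-4} = 35
```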
The Technical Problem
After two years of successful Istio implementation, Socotec began experiencing intermittent but severe service disruptions characterized by:
- Error Pattern: Istio sidecar proxy errors showing "503 UR: upstream_reset_before_response_started{remote_reset}"
- Failure Impact: 50–70% of service calls failing within affected pods
- Occurrence Pattern: Unpredictable timing, sometimes appearing within 20 minutes of deployment, other times only after months of stability
- Business Impact: Service disruptions affecting customer-facing applications, compliance reporting systems, and internal operations tools
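To turn the Failure Impact figure into something measurable per pod, the sketch below counts how many of each pod's logged calls end in the 503 UR pattern. It is a minimal illustration, not Stackgenie's tooling. It assumes sidecar access logging is enabled with Istio's default TEXT format (for example, lines collected with `kubectl logs <pod> -c istio-proxy`); the helper names `parse_entry`, `is_remote_reset_503` and `failure_ratio_by_pod` and the sample pod name are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Assumption: entries use Istio's default TEXT access-log format, where the
# quoted request line is followed by %RESPONSE_CODE%, %RESPONSE_FLAGS% and
# %RESPONSE_CODE_DETAILS%. A custom log format would need a different pattern.
_ENTRY = re.compile(
    r'"[A-Z]+ \S+ \S+" (?P<code>\d{3}) (?P<flags>\S+) (?P<details>\S+)'
)


def parse_entry(line: str) -> Optional[Tuple[int, List[str], str]]:
    """Return (status code, response flags, code details), or None if not an HTTP entry."""
    m = _ENTRY.search(line)
    if m is None:
        return None
    # Multiple response flags are comma-separated; "-" means no flags were set.
    flags = [] if m.group("flags") == "-" else m.group("flags").split(",")
    return int(m.group("code")), flags, m.group("details")


def is_remote_reset_503(line: str) -> bool:
    """True for a 503 carrying Envoy's UR (upstream remote reset) flag."""
    parsed = parse_entry(line)
    return parsed is not None and parsed[0] == 503 and "UR" in parsed[1]


def failure_ratio_by_pod(entries: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each pod name to the share of its logged HTTP calls that were 503 UR."""
    total: Counter = Counter()
    failed: Counter = Counter()
    for pod, line in entries:
        parsed = parse_entry(line)
        if parsed is None:
            continue  # skip non-HTTP lines (startup messages, TCP entries, ...)
        total[pod] += 1
        code, flags, _details = parsed
        if code == 503 and "UR" in flags:
            failed[pod] += 1
    return {pod: failed[pod] / total[pod] for pod in total}


if __name__ == "__main__":
    # Hypothetical log lines from one sidecar, trimmed after a few fields.
    sample = [
        ("orders-7d9f", '[2024-05-01T10:00:00.000Z] "GET /orders HTTP/1.1" 503 UR '
                        'upstream_reset_before_response_started{remote_reset} - "-" 0 95 3 -'),
        ("orders-7d9f", '[2024-05-01T10:00:01.000Z] "GET /orders HTTP/1.1" 200 - '
                        'via_upstream - "-" 0 12 4 3'),
    ]
    print(failure_ratio_by_pod(sample))  # {'orders-7d9f': 0.5}
```

Matching on the status code plus the UR flag, rather than on the details string alone, keeps the count stable if the details text differs slightly between Envoy versions; a pod whose ratio jumps toward the 50–70% range would stand out immediately against healthy pods near zero.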
Previous Resolution Attempts
Socotec’s internal team had attempted multiple remediation strategies:
- Istio Version Upgrades: Standard upgrades (from 1.12 to 1.14)
- Resource Allocation Increases: Pod CPU/memory increased by 50%
- Timeout Configuration Adjustments: Default timeout values increased by 200%
- Pod Affinity Rules: Modified to optimize workload distribution
- Load Balancing Algorithm Changes: Implemented alternative strategies
Despite these efforts, the errors persisted unpredictably, creating significant operational uncertainty and threatening Socotec’s service level agreements with major clients in critical infrastructure sectors.

Let's start with a free platform checkup!
Contact Stackgenie today to discuss how we can help you transform your development process.