Real-Time OEE Monitoring: The Architecture That Actually Works on the Shop Floor
Overall Equipment Effectiveness is the single most quoted, most misunderstood number in modern manufacturing. Half the OEE dashboards we see in Indian plants are mathematically wrong. The other half are mathematically right but practically meaningless because the inputs are gamed at source.
A working OEE system is an architecture problem, not a UI problem. This article is the reference architecture we deploy when we own the OEE stack end-to-end.
The Five Layers
- Acquisition: PLC / SCADA tags, energy meters, vision sensors, operator panels. Raw signals only.
- Edge: a local node per line that normalises signals into canonical events and buffers up to 72 hours of them during a WAN outage.
- Stream: a message broker (Kafka in larger plants, an MQTT broker in smaller ones) that durably stores every event.
- Compute: stateless services that compute availability, performance and quality on rolling windows.
- Surface: dashboards, ANDON boards, mobile push and ERP postings.
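The compute layer's job can be sketched as a pure function over one window's aggregates. This is a minimal Python sketch assuming the standard decomposition OEE = availability × performance × quality. The `WindowTotals` shape and its field names are hypothetical, used for illustration only; they are not our edge agent's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowTotals:
    """Aggregates for one machine over one window (hypothetical shape)."""
    planned_production_s: float  # scheduled time minus *planned* downtime
    unplanned_downtime_s: float  # every unplanned stop, micro-stops included
    ideal_cycle_time_s: float    # demonstrated best, not nameplate
    total_count: int             # all units produced, good or bad
    good_count: int              # units that passed quality first time


def compute_oee(w: WindowTotals) -> dict[str, float]:
    """Stateless: the same window always yields the same numbers."""
    if w.planned_production_s <= 0:
        raise ValueError("window has no planned production time")
    run_time_s = w.planned_production_s - w.unplanned_downtime_s
    availability = run_time_s / w.planned_production_s
    # Performance compares ideal time for the units made against actual run time.
    performance = (
        (w.ideal_cycle_time_s * w.total_count) / run_time_s if run_time_s > 0 else 0.0
    )
    quality = w.good_count / w.total_count if w.total_count > 0 else 0.0
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }
```

With 27,000 s of planned production time, 2,700 s of unplanned stops, a 1.0 s ideal cycle, 21,870 units and 19,683 good, each factor comes out at 0.90 and OEE at 0.729. Because the function holds no state, any replica of the service can recompute any window straight from the stream.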
Where OEE Calculations Go Wrong
- Planned downtime is misclassified as availability loss. This inflates losses and masks the real bottlenecks.
- Ideal cycle time is set to the nameplate rating. It should be the demonstrated best cycle time, not the vendor brochure figure.
- Quality is measured at end-of-line only. This misses rework hidden in WIP.
- Micro-stops below the detection threshold are ignored. On high-velocity lines, they are the entire problem.
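Three of these fixes are mechanical enough to sketch. Below, planned stops shrink the planned production time instead of counting as losses, every unplanned stop is kept regardless of length, and the ideal cycle time is derived from observed cycles. The `Stop` shape and the 5th-percentile choice are assumptions for illustration. A low percentile rather than the raw minimum stops one sensor glitch from setting the standard.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    duration_s: float
    planned: bool  # from the production schedule, e.g. changeover or break


def demonstrated_best_cycle(cycle_times_s: list[float], pct: int = 5) -> float:
    """Ideal cycle time as a low percentile of observed cycles (5th assumed)."""
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    # "inclusive" interpolates within the observed range and never extrapolates.
    return statistics.quantiles(cycle_times_s, n=100, method="inclusive")[pct - 1]


def split_downtime(stops: list[Stop], shift_s: float) -> tuple[float, float]:
    """Return (planned_production_s, unplanned_downtime_s).

    Planned stops reduce the denominator rather than counting as losses,
    and every unplanned stop is summed, however short: no micro-stop cutoff.
    """
    planned = sum(s.duration_s for s in stops if s.planned)
    unplanned = sum(s.duration_s for s in stops if not s.planned)
    return shift_s - planned, unplanned
```

A single 0.2 s glitch among fifty 2.0 s cycles leaves the 5th percentile at 2.0 s, where `min()` would have reported 0.2 s.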
Canonical Event Schema
Every event flowing into the OEE engine carries the same five fields: machine ID, timestamp (UTC, millisecond precision), event type, value and source. Downtime reasons, scrap codes and quality categories are attached downstream as enriched metadata. Keep the event shape ruthlessly clean: every "just one more field" is a cost you carry for the next decade.
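A minimal sketch of that five-field shape as a frozen Python dataclass. The `EventType` members, the example source names and the choice to convert (rather than reject) timezone-aware non-UTC timestamps are assumptions. There is deliberately no field for downtime reason or scrap code; those belong to enrichment downstream.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class EventType(str, Enum):
    # Hypothetical taxonomy; the set of event types is plant-specific.
    STATE_CHANGE = "state_change"
    PART_COUNT = "part_count"
    SCRAP_COUNT = "scrap_count"


@dataclass(frozen=True)
class CanonicalEvent:
    machine_id: str
    timestamp: datetime   # normalised to UTC, millisecond precision
    event_type: EventType
    value: float
    source: str           # e.g. "opcua", "mqtt", "operator_panel" (illustrative)

    def __post_init__(self) -> None:
        ts = self.timestamp
        if ts.tzinfo is None:
            raise ValueError("naive timestamp: attach a timezone at the edge")
        # Convert to UTC, then truncate to whole milliseconds.
        ts = ts.astimezone(timezone.utc)
        ts = ts.replace(microsecond=(ts.microsecond // 1000) * 1000)
        # Frozen dataclasses need object.__setattr__ to normalise a field.
        object.__setattr__(self, "timestamp", ts)
```

An IST reading of 14:30:00.123456 becomes 09:00:00.123 UTC, so events from different sources sort and join on one clock.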
Real-Time vs Eventually Consistent
A live-dashboard latency target of 200 ms is fine for the production manager. ERP confirmations do not need to be sub-second; they need to be accurate. Decouple the two paths: live OEE runs on the stream, while ERP posting runs on a settled view that is triggered by confirmed material events.
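A sketch of the decoupling, assuming hypothetical event kinds `good`, `scrap` and `material_confirmed`. In a real deployment the two consumers would be separate consumer groups on the broker; here they are two objects fed the same events. The live path updates on every event, while the ERP path posts each order once, and only after its material confirmation arrives.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    order_id: str
    kind: str     # hypothetical kinds: "good", "scrap", "material_confirmed"
    qty: int = 0


class LiveDashboard:
    """Hot path: refreshes on every event and accepts provisional numbers."""

    def __init__(self) -> None:
        self.good = defaultdict(int)
        self.total = defaultdict(int)

    def on_event(self, e: Event) -> None:
        if e.kind in ("good", "scrap"):
            self.total[e.order_id] += e.qty
            if e.kind == "good":
                self.good[e.order_id] += e.qty

    def quality(self, order_id: str) -> float:
        total = self.total[order_id]
        return self.good[order_id] / total if total else 0.0


class ErpPoster:
    """Settled path: posts an order only once its material is confirmed."""

    def __init__(self) -> None:
        self.counts = defaultdict(lambda: {"good": 0, "scrap": 0})
        self.outbox: list[tuple[str, int, int]] = []
        self._posted: set[str] = set()

    def on_event(self, e: Event) -> None:
        if e.kind in ("good", "scrap"):
            self.counts[e.order_id][e.kind] += e.qty
        elif e.kind == "material_confirmed" and e.order_id not in self._posted:
            c = self.counts[e.order_id]
            self.outbox.append((e.order_id, c["good"], c["scrap"]))
            self._posted.add(e.order_id)  # a replayed confirmation is a no-op
```

Replaying a confirmation, a common side effect of at-least-once delivery, does not create a second ERP posting.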
Operator-Sourced Downtime Reasons
No PLC will ever tell you that the line is down because the trolley operator went for chai. Auto-detect the stop — then prompt the operator on a tablet with a short, hierarchical list (Mechanical → Bearing failure → Conveyor drive) plus a free-text field. Force a reason within a configurable grace window or the system promotes the event to "uncategorised — escalate to shift supervisor".
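The reason prompt and grace window can be sketched as a validated reason tree plus a periodic sweep. Only the Mechanical → Bearing failure → Conveyor drive path comes from the text; the other tree entries, the `Stop` record and the `sweep` function are hypothetical. A stop that still has no valid reason after the grace window is flagged for the shift supervisor.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical three-level tree: category -> cause -> detail.
REASON_TREE: dict[str, dict[str, list[str]]] = {
    "Mechanical": {"Bearing failure": ["Conveyor drive", "Spindle"]},
    "Material": {"Shortage": ["Trolley not at station"]},
}


@dataclass
class Stop:
    machine_id: str
    started_at: datetime
    reason: Optional[tuple[str, str, str]] = None
    free_text: str = ""
    escalated: bool = False  # True means "uncategorised", routed to the supervisor


def record_reason(stop: Stop, path: tuple[str, str, str], free_text: str = "") -> None:
    """Accept only paths that exist in the tree; free text rides alongside."""
    category, cause, detail = path
    if detail not in REASON_TREE.get(category, {}).get(cause, []):
        raise ValueError(f"unknown reason path: {' / '.join(path)}")
    stop.reason = path
    stop.free_text = free_text


def sweep(stops: list[Stop], now: datetime, grace: timedelta) -> list[Stop]:
    """Flag stops still unexplained after the grace window; return the new ones."""
    newly_escalated = []
    for s in stops:
        if s.reason is None and not s.escalated and now - s.started_at >= grace:
            s.escalated = True
            newly_escalated.append(s)
    return newly_escalated
```

Validating against the tree at entry time is what keeps the downstream Pareto clean; the free-text field captures what the tree does not yet cover.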
The "Andon Without Andon Boards" Pattern
Physical ANDON boards still matter, but in 2026 your real ANDON is the shift supervisor’s phone. Push the same event taxonomy to a mobile app with SLA-driven escalation: 5 minutes of silence → shift lead, 10 minutes → plant head, 20 minutes → operations director. Audit every escalation as part of the weekly review.
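The escalation ladder reduces to a small lookup over how long an event has gone unacknowledged. The thresholds below come from the text; the role identifiers and the `AuditEntry` record are placeholders for whatever routing and audit store a plant actually uses.

```python
from dataclasses import dataclass
from datetime import timedelta

# Thresholds from the text; role names are placeholder routing keys.
ESCALATION_LADDER: list[tuple[timedelta, str]] = [
    (timedelta(minutes=5), "shift_lead"),
    (timedelta(minutes=10), "plant_head"),
    (timedelta(minutes=20), "operations_director"),
]


@dataclass(frozen=True)
class AuditEntry:
    event_id: str
    role: str
    silence: timedelta


def escalate(event_id: str, silence: timedelta,
             notified: set[str], audit: list[AuditEntry]) -> list[str]:
    """Return the roles to page now and record each page for the weekly review."""
    due = [role for threshold, role in ESCALATION_LADDER
           if silence >= threshold and role not in notified]
    for role in due:
        notified.add(role)  # never page the same role twice for one event
        audit.append(AuditEntry(event_id, role, silence))
    return due
```

If the check runs late, say at 25 minutes with nobody yet paged, all three roles are returned at once; that catch-up behaviour is a deliberate choice in this sketch.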
Reference Stack
For a 200-machine plant, the stack we standardise on:
- Connectivity: OPC-UA from machines, MQTT from cheaper sensors, REST for legacy bolt-ons.
- Edge: a small industrial PC per line running our edge agent.
- Broker: Kafka, three brokers in a high-availability configuration on a small on-prem cluster.
- Time-series store: TimescaleDB. Relational store: PostgreSQL.
- Compute: Python services for analytics, Go services for the hot path.
- Surface: React dashboards, mobile push via FCM, ERP postings via IDoc/BAPI.
Practitioner note
Real-time OEE is achievable. Real-time OEE that operators *trust* is an organisational achievement — it requires data discipline, escalation discipline and an honest baseline reset every quarter.
Frequently asked
What is a realistic OEE benchmark?
World-class is 85%. Indian discrete manufacturing averages 45–60%. Process industries trend higher (55–75%). If your reported OEE is above 80% but your bottlenecks are obvious, your data is wrong, not your plant.
Should OEE be calculated per shift or per order?
Both. Per-shift OEE is the management KPI. Per-order OEE is the engineering KPI. Storing events at the lowest granularity lets you recompute either on demand.
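A sketch of recomputing from raw events, assuming a hypothetical `CountEvent` already enriched with shift and order. It shows quality only; availability and performance re-aggregate the same way from stop and cycle events. The point is to sum raw counts per group and divide once, rather than averaging per-shift percentages.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CountEvent:
    shift: str     # hypothetical: assigned from the shift calendar during enrichment
    order_id: str
    good: int
    scrap: int


def quality_by(events: list[CountEvent], key: str) -> dict[str, float]:
    """Re-aggregate raw events along any dimension, e.g. "shift" or "order_id"."""
    good: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for e in events:
        k = getattr(e, key)
        good[k] += e.good
        total[k] += e.good + e.scrap
    # Divide once per group; skip empty groups rather than reporting 0%.
    return {k: good[k] / total[k] for k in total if total[k]}
```

Order PO-2 below spans two shifts, so its quality (95 of 100) is not recoverable from either shift's percentage alone, only from the stored events.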
Amey Kadle
Founder & CEO, Ajinkya Technologies. 20+ years of building MES, ERP and AI systems for India’s most demanding manufacturing plants.