Test Case 23

Verification of the reliability of a redundant system or algorithm (e.g. failover)

Identification

ID

23

Author

Tesfaye Amare Zerihun

Version

1

Project

ERIGrid 2.0

Date

10/05/2020

Test Case Definition

Name of the Test Case

Verification of the reliability of a redundant system or algorithm (e.g. failover)

Narrative

The aim of this test case is to assess and verify the reliability of failover systems in smart grids that rely on redundancy. Redundancy is mainly used for critical smart grid applications such as protection in substations. In general, redundancy can be applied to control systems (such as the SCADA controller), to communication networks or network controllers (such as an SDN controller), or to sensors and actuators (CT/PTs, merging units, breakers). Specifically, this test case looks into the reliability of failover systems with redundancy in the communication network.

The ICT support system or communication network may fail for various reasons, e.g. component (hardware) failures, environmental failures (such as a weather disruption that takes out multiple components), a power outage, overloading of the network (capacity shortage), or a cyber-attack. A failover system tries to quickly detect failures and switch to backup systems (a smooth transition in the case of active standby systems), or it tries to recover the system after a performance glitch (the case of passive standby systems). Unlike active backup systems, passive redundant systems can incur a short down time until the backup system is powered on and takes over. The test case investigates the reliability of active or passive failover systems to verify whether they can meet the requirements (delay, packet loss, or system down time) specified by the smart grid application considered or set by standards and protocols such as IEC 61850.

Function(s) under Investigation (FuI)
  • Exchange of data (measurement and control commands) through the communication network
  • Control functions on the local controllers / IEDs
  • Protection functions on the local controllers / IEDs
Object under Investigation (OuI)

Communication Network (network devices, switches, network links), control devices or IEDs

Domain under Investigation (DuI)
  • Information & Communication System
  • Control system
  • Power system
Purpose of Investigation (PoI)
  • PoI#1: Characterization of the performance of fail-over systems with a varying degree of redundancy.
    • PoI#1.1: Characterization of the performance of fail-over systems with redundancy in the communication network.
    • PoI#1.2: Characterization of the performance of fail-over systems with redundancy in the controller/IED
  • PoI#2: Characterization of the redundant communication/control system for its vulnerability towards cyber attacks
  • PoI#3: Verification of the system-level performance against standard requirements (whether it complies with the minimum expected KPIs, e.g. down time, packet loss, or delay)
System under Test (SuT)
  • Communication network (Switches, routers, network links)
  • Sensors (voltage sensors, CT/PT basic control), Actuators (breakers, intelligent switches)
  • SCADA controller (e.g. FLISR, voltage control, monitoring functions) or local controllers/IEDs
  • Power transmission system (substations, transmission lines)
Functions under Test (FuT)
  • Capability of the communication network/devices to facilitate data exchange between controllers/IEDs, sensors, and actuators (breakers, disconnectors)
  • Control and Protection functions
  • Monitoring capability of sensors (devices such as Merging units, PMUs)
  • Actuator (breaker, disconnector) functions
Test criteria (TCR)
  • Run the system without introducing faults, and measure the communication network's performance during the normal operating condition…
  • Introduce a fault and measure the degradation of the communication network's performance right after the fault occurs or is injected.
  • Calculate and obtain the overall system performance (availability, reliability).
Target Metrics (TM)
  1. End to end delay, packet loss
    • During a normal operating condition
    • After introducing a fault
  2. Up time, down time, availability and reliability
    • After introducing a fault
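As a minimal sketch of how the target metrics above could be computed from logged measurements, the following Python example derives average and maximum end-to-end delay, the packet loss ratio, and availability. The `Packet` record, the function names, and the log format are hypothetical and not part of the test case. Availability is taken here as up time divided by total observed time, which is a common definition but an assumption with respect to this document. Reliability is not computed, since it requires a failure model that the test case does not specify.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class Packet:
    """Hypothetical log record for one packet (timestamps in seconds)."""
    sent_s: float
    received_s: Optional[float]  # None means the packet was lost


def delay_and_loss(
    packets: List[Packet],
) -> Tuple[Optional[float], Optional[float], float]:
    """Return (average delay in ms, maximum delay in ms, packet loss ratio)."""
    if not packets:
        raise ValueError("at least one packet is required")
    delays_ms = [
        (p.received_s - p.sent_s) * 1000.0
        for p in packets
        if p.received_s is not None
    ]
    loss = 1.0 - len(delays_ms) / len(packets)
    if not delays_ms:
        # Every packet was lost, so no delay can be reported.
        return None, None, loss
    return mean(delays_ms), max(delays_ms), loss


def availability(up_time_s: float, down_time_s: float) -> float:
    """Assumed definition: up time divided by total observed time."""
    total = up_time_s + down_time_s
    if total <= 0:
        raise ValueError("total observation time must be positive")
    return up_time_s / total


if __name__ == "__main__":
    # Illustrative values only, not measured data.
    log = [Packet(0.0, 0.002), Packet(1.0, 1.004), Packet(2.0, None)]
    print(delay_and_loss(log))
    print(availability(up_time_s=3590.0, down_time_s=10.0))
```

The same functions can be applied once to the log of the fault-free run and once to the log recorded after fault injection, so that the two sets of values can be compared.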
Variability Attributes (VA)
  • Redundancy type and degree (active-active, active-standby, active-inactive)
  • Number of simultaneous (component) failures
  • Communication network traffic situation (background traffic)
  • Communication topology
  • Failure type
    • Cascading (from PS to CS)
    • Hardware
    • Software
Quality Attributes (QA)

Pass: End-to-end delay (average, maximum) and packet loss are within the maximum limits or threshold values set by the specification of the standard communication protocol or by the smart grid application (such as protection) considered; or the availability and reliability measures are within the limits of the specification requirements set by the system administrator.

  • Packet loss > T, where T is a threshold value set according to the application (protection) type considered.
  • Packet delay larger than D ms, where D is the delay tolerance, which depends on the application (protection) type considered. The delay tolerance ranges from 1 ms (for busbar protection), through 4 to 8 ms (for other types of protection schemes), up to 800 ms for IED-to-SCADA communication.

Fail: If the measured metrics (delay, packet loss, availability, or reliability) exceed the threshold values.
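The delay and packet loss criteria can be sketched as a simple threshold check. The example below is an illustration under stated assumptions, not a prescribed procedure: the function name `verdict` and the loss threshold value are hypothetical, and a measurement exactly equal to a limit is treated as a pass because the criteria only speak of exceeding the thresholds. The 4 ms delay tolerance is taken from the range quoted above for protection schemes.

```python
def verdict(
    avg_delay_ms: float,
    max_delay_ms: float,
    packet_loss: float,
    delay_tolerance_ms: float,
    loss_threshold: float,
) -> str:
    """Return "Pass" if delay and packet loss stay within their limits."""
    within_limits = (
        avg_delay_ms <= delay_tolerance_ms      # average delay within D
        and max_delay_ms <= delay_tolerance_ms  # maximum delay within D
        and packet_loss <= loss_threshold       # packet loss within T
    )
    return "Pass" if within_limits else "Fail"


if __name__ == "__main__":
    D_MS = 4.0  # delay tolerance D, from the 4 to 8 ms range in the text
    T = 0.01    # packet loss threshold T: hypothetical value

    # Illustrative measurements only, not real results.
    print(verdict(3.0, 3.9, 0.0, D_MS, T))
    print(verdict(3.0, 6.5, 0.0, D_MS, T))
```

Availability and reliability limits could be checked the same way once the system administrator has set their required values.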

Qualification Strategy


Test Specification TC23.1

Verification of the reliability of Substation Automation Systems (SASs) with redundancy in the communication network