How to ensure your automation system can be recovered
Take backups seriously to avoid disaster
When I was an engineer, one position I held had me responsible for making sure county emergency radio communications systems and their ancillary equipment had better than 99.999% uptime (we actually shot for six nines—not five).
That meant architectures with mirrored power supplies, hot switchover power systems, redundant radio/microwave circuits, and huge battery backup systems. When someone’s life is dependent upon communications systems working, backups are a key part of system design and maintenance. Backups not only can save lives, but they also can save your business from a potentially catastrophic outage.
If you’re a plant or production manager, the last thing you need is a 3 a.m. call from your line supervisor saying, “Process and packaging line 2 is down because we had a 15-second power outage.” Or maybe worse, you’ve contracted a ransomware virus that has shut down your edge server and data historian and migrated to all Windows-based hosts on your network.
If you haven’t become serious about backups, it’s time to do so now—before a key system goes down and puts you out of business for a few days (or longer) while you try to recover. It’s time to do some risk assessments and find out what systems you can’t afford to lose. Ask yourself: What does my downtime cost?
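To put numbers on that question, here is a minimal back-of-the-envelope sketch. It converts an availability target (such as the five or six nines mentioned earlier) into allowed downtime per year, and multiplies an outage length by an hourly cost. The function names and any dollar figure you plug in are illustrative assumptions, not data from any plant.

```python
# Rough availability arithmetic. Function names are illustrative, and any
# cost figure you pass in is your own estimate, not a number from this article.
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap years


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Seconds per year a system may be down at a given availability."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


def downtime_cost(hours_down: float, cost_per_hour: float) -> float:
    """Simple linear estimate: hours of outage times cost per hour."""
    return hours_down * cost_per_hour


for label, pct in [("five nines", 99.999), ("six nines", 99.9999)]:
    print(f"{label}: {allowed_downtime_seconds(pct):.1f} seconds per year")
```

Five nines works out to a little over five minutes of downtime per year, and six nines to roughly half a minute. Most plants will not need targets that strict, but running the same arithmetic with your own hourly cost shows quickly which systems deserve the most attention.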
While you may already do some backups, if the ones you have are out of date or don’t address equipment that you had forgotten existed, getting back up and running could be challenging—as though you didn’t back up at all.
I spoke with system integrators, backup system suppliers and makers of PLCs and automation controllers to formulate some basic steps you can take to help survive a downed system, whether due to viruses and cyber hacks, power spikes/brownouts, or just plain controller failure. We’ll look primarily at protecting controls, but also offer some tips for keeping servers and computers up and running as well.
What should be backed up?
At the outset, we need to be concerned about two kinds of backups: data (process and controller configuration data) and power system backups. The two can be intertwined. If you lose power, plant equipment goes down, unless you have onsite generators. In losing power, you may lose vital process data too, unless your microprocessor-based control systems are backed up with power temporarily to allow a graceful shutdown. If your power system suffers from voltage spikes and surges, you could lose controller configuration data and maybe even controller hardware.
Though a rare event today, controller hardware does fail, and you should be prepared. Should, for example, a PLC fail, how much downtime can you afford?
“On systems where it is critical for a line to maintain a very low failure rate, we have installed redundant controls systems,” says Chad Clemens, controls engineer for CHL Systems, an independent system integrator (SI) located in Souderton, Pa. “This setup has included two PLCs that share common I/O. That way, if one of the PLCs fails, the second PLC can take over the process and run the system.”
Maybe you don’t need the hot switchover, but what does your spare parts inventory look like? Robert Glaser, CEO of AUVESY Inc., provider of versiondog backup software, says that processors need to consider the hardware basics and ask: Is our list of equipment (e.g., maintained in an Excel spreadsheet) complete at all? Do we have spare equipment for in-use hardware? Do we need every spare piece of equipment? Do we have firmware versions that are outdated (e.g., in terms of security)? Versiondog answers these questions by monitoring every single asset (unknown and known).
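Even without a commercial tool, the hardware questions above can be asked of a simple equipment list. The following is a minimal sketch, assuming a hand-maintained inventory; the `Asset` fields, device names and version numbers are all hypothetical, and this is not how versiondog works internally.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    firmware: tuple          # installed firmware version, e.g. (1, 4)
    minimum_firmware: tuple  # oldest version you still consider acceptable
    spares_on_hand: int


def audit(assets):
    """Return a list of findings: missing spares and outdated firmware."""
    findings = []
    for asset in assets:
        if asset.spares_on_hand == 0:
            findings.append(f"{asset.name}: no spare on hand")
        if asset.firmware < asset.minimum_firmware:
            findings.append(f"{asset.name}: firmware outdated")
    return findings


# Hypothetical inventory entries for illustration only.
inventory = [
    Asset("Line 2 PLC", (1, 4), (2, 0), 0),
    Asset("Packaging HMI", (3, 2), (3, 0), 1),
]

for finding in audit(inventory):
    print(finding)
```

A check like this only covers equipment you already know about. Whether the list is complete is a separate question that needs a walk-through of the plant or a network scan.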
SIs can help users decide what equipment to back up and which strategies to implement. “We can provide strategies and processes for safely backing up critical configurations and data if the client does not have that capability or resource in-house,” says Steve Pflantz, CRB associate/senior automation engineer. “We can review the system to understand what risks are present due to power outages, component failures, cyberattack, etc. and develop a plan to provide safe and reliable protection of data.”
Besides PLCs, what else should be backed up? “Every asset,” says Glaser. Whether those assets are PLCs, SCADA, HMI, robots, routers, switches, scanners, field devices, industrial PCs (Windows or Linux) or office PCs, they can be covered by versiondog.
“Yes, a separate backup/recovery solution for each system is recommended,” says Michelle Meyer, MDT Software marketing director. MDT makes a change management system (CMS) called AutoSave for backing up systems. It helps maintain version control of systems. Many of these devices have simple configuration parameters, but it is helpful to store them in a central location.
Many processors today have video surveillance and monitoring systems. What’s necessary to back them up? Techniques vary widely, and the manufacturer’s recommendations should be reviewed, says Meyer.
Typically, video surveillance monitoring and recording systems are protected separately, says Alexander Feokhary, project manager for Paragon Software Group. The system itself needs a fresh backup only after its settings change, which is relatively rare. For the footage, these systems usually rely on their own archiving functions rather than on backup, depending on the requirements for storing video monitoring material.
What about boilers and HVAC systems? Since boilers and HVAC systems are becoming an integral part of the process in many applications, it doesn’t hurt to have backup plans in place for these ancillary systems as well, says CHL Systems’ Clemens. If data is being passed back to a central location and stored on a server, then why not capture an HVAC system’s data in a centralized backup of the system with other backups?
“Do a thorough risk assessment and really look at your facility,” says Eric Reiner, Beckhoff Automation Industrial PC (IPC) product manager. Is there a possibility that any of these ancillary systems could lead to a significant danger or problem? While a breach of an HVAC system may not cause data loss and affect traceability, loss of operation of a system that is critical to clean, filtered air flow or a strict range of temperatures could certainly affect food safety or shelf life.
Do you have a backup plan for your coolers and freezers? How long can you go without power? Controls, of course, are important, but what about standby generators? If your system has an automatic switchover from mains power to backup generators, when is the last time you tested it? Do you have an automated test cycle where generators are started once a week or month? Have you recently checked their oil and fuel levels?
Strategies for spikes and power outages
Some large food companies are fortunate enough to have more than one utility supplying electrical power. That is rarely an option for smaller processors in areas served by a single utility. In that situation, I’ve seen food plants take more than one feed from the same supplier, just in case a vehicular accident or fallen tree shuts down power at a particular feed site.
In the event of incoming mains power loss, while you won’t be able to keep your production/packaging lines running (unless you have on-demand, on-site power generation), powering down controllers gracefully—especially if PC-based—is important to preventing data loss. And the solution is what you may already use at home—an uninterruptible power supply (UPS). A UPS not only provides for a graceful shutdown of sensitive equipment, but also can protect it from voltage spikes and sags (brownouts) as well.
Maintaining and filtering power via a suitable UPS is the obvious approach, says Terry Wright, Solid State Disks business development manager. The UPS can also trigger a backup protocol so that critical data is stored remotely before the system ultimately fails, in case the outage outlasts the UPS.
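The decision logic behind such a trigger can be sketched in a few lines. This is a minimal illustration, assuming the UPS reports whether it is on battery and how many seconds of runtime remain; the function name, the returned action labels and the default time budgets are hypothetical, and a real installation would read these values through the UPS vendor's own interface.

```python
def ups_action(on_battery: bool, runtime_left_s: float,
               backup_time_s: float = 120, shutdown_time_s: float = 60) -> str:
    """Choose an action from UPS status.

    backup_time_s and shutdown_time_s are assumed budgets: how long the
    remote backup and the graceful shutdown are expected to take.
    """
    if not on_battery:
        return "run"                    # mains power is present
    if runtime_left_s <= shutdown_time_s:
        return "shutdown"               # no time left for a backup
    if runtime_left_s <= backup_time_s + shutdown_time_s:
        return "backup_then_shutdown"   # just enough time for both
    return "wait"                       # ride through and keep monitoring


print(ups_action(on_battery=True, runtime_left_s=150))
```

The point of the thresholds is that the backup starts while there is still enough battery left to finish both the copy and the shutdown, rather than when the UPS is already nearly empty.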
Besides installing a UPS, Charlie Norz, WAGO product manager of I/O Systems, suggests two other ways to protect systems. For a critical system, add in surge protection on PCs, IPCs, controllers and HMIs. In addition, systems should be programmed to store critical data in memory locations where the data can easily be retrieved in the event of a power loss.
“We will see a lot of end users utilizing an SD card with their HMIs to store the process data, or they have the machine connected to the plant network, allowing them to back up the process data on their plant servers,” says CHL Systems’ Clemens. It is also possible to store the HMI program on the SD card and even have a button on the HMI to save changes if the operators update process variables (PVs) throughout the day. Then those adjusted variables on the HMI are not lost due to a power outage. With appropriate PLC/HMI code and setup, those process variables can be reloaded upon power restoration.
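The save-and-reload idea can be illustrated outside any particular HMI package. The sketch below assumes the adjusted values are kept in a small JSON file (for example, on an SD card); the file format, function names, variable names and default values are all hypothetical and do not represent any vendor's HMI software.

```python
import json
from pathlib import Path

# Hypothetical factory defaults used when no saved values exist.
DEFAULTS = {"seal_temp_c": 150.0, "target_weight_g": 500.0}


def save_pvs(pvs: dict, path: Path) -> None:
    """Called by the 'save' button: write the operator's adjusted values."""
    path.write_text(json.dumps(pvs))


def load_pvs(path: Path) -> dict:
    """Called on power-up: reload saved values, falling back to defaults."""
    try:
        saved = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(DEFAULTS)  # file missing or damaged
    if not isinstance(saved, dict):
        return dict(DEFAULTS)  # unexpected content
    return {**DEFAULTS, **saved}  # saved values override defaults
```

Falling back to defaults when the file is missing or unreadable means a damaged card leads to a known starting point rather than a failed startup.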
A plant should have a disaster recovery procedure that includes instructions on how to reset automation systems once power is restored—or in cases where a surge has caused hardware failure, replace the equipment, says Gary Gillespie, MDT Software vice president. If the only good copy of the program logic was in that equipment, the program data could be lost. If a CMS is used, all the program logic will be saved on a separate server so that once the equipment is back online, the latest program is simply downloaded to the device and operations resumed—no matter the brand of automation equipment or PC-based applications.
PLCs and PACs offer some built-in protection
PLCs and PACs are typically better designed to handle power brownouts and spikes than their PC counterparts. For example, Mitsubishi PLCs and HMIs are built to continue operating through a momentary power failure of up to 20 ms, says Lee Cheung, product marketing engineer. And there is no risk to data even when the PLC or HMI does lose power, as they are designed for sudden power cutoff. IPCs, however, must follow a shutdown procedure, and therefore need a UPS to prevent sudden loss of power.
PLCs have normally been designed to manage PV memory by having batteries retain PV values (e.g., the last known value of sealing temperature or the weight of a package) during power interruptions or planned shutdowns, says Sriram Ramadurai, Omron Automation America marketing manager (controllers & components). Since manufacturers want to reduce maintenance, newer PLCs are being designed without the need for an external battery—thanks to their large capacitors that can retain voltage to hold data in memory for a certain period of time.
However, Omron’s controllers, such as the NX1 and NX1P, have battery compartments for instances in which PV retention is mandatory for customers. The NX1 also has a built-in SQL client that can securely connect to databases. If the connection is lost due to a power loss event at the database end, the NX1 can spool data up to a certain storage limit. Having a UPS is definitely recommended, as it helps not only to prevent data loss but also to let an operator shut equipment down properly instead of having it power down abruptly.
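Spooling is a general technique, and a small sketch shows the idea: hold records in a bounded buffer while the database is unreachable, then send them in order once it returns. The class below is an illustration only, not Omron's implementation. The `send` callback and the choice to drop the oldest records when the buffer is full are assumptions.

```python
from collections import deque


class Spooler:
    """Buffer records while the database is down, up to a fixed limit."""

    def __init__(self, limit: int):
        # Assumption: when the limit is reached, the oldest record is dropped.
        self.buffer = deque(maxlen=limit)

    def record(self, row, send) -> None:
        """Queue a row, then try to deliver everything that is waiting.

        send(row) is a hypothetical callback that raises ConnectionError
        when the database cannot be reached.
        """
        self.buffer.append(row)
        self.flush(send)

    def flush(self, send) -> None:
        while self.buffer:
            try:
                send(self.buffer[0])
            except ConnectionError:
                return  # keep the rows for the next attempt
            self.buffer.popleft()  # remove only after a successful send
```

Removing a row only after it has been sent successfully means a dropped connection in the middle of a flush does not lose the record that was in flight.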
Knowing that edge devices are installed in challenging locations, Opto 22 specifically designed the groov EPIC (edge programmable industrial controller) with multiple guards against data loss, says Josh Eastburn, Opto 22 director of technical marketing. At the software level, groov EPIC uses a transactional file system designed specifically to protect data in devices where power loss may occur. All file system modifications are made in unused disk space until the current write is complete and a transaction point recorded. At the hardware and firmware levels, the controller’s solid-state drive (SSD) is capable of executing a safe power-down in the event of power loss. Together these features eliminate the possibility of data corruption due to power failure.
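The same write-elsewhere-then-commit principle can be applied at the application level on an ordinary PC. The sketch below is an analogy, not the groov EPIC file system: it writes new data to a temporary file, forces it to disk, and only then swaps it into place. The function name is an illustrative choice.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at path so readers see the old or new version, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory (same file system).
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to the storage device
        # The swap is the commit point: before it, the old file is untouched.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # clean up the incomplete temporary file
        raise
```

If power fails before the swap, the original file is still intact and at worst a stray temporary file is left behind.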
While we’ve seen approaches to enable a PLC to shut down gracefully upon power loss through software, internal hold-up capacitors and conventional UPSs, another approach taken by Beckhoff is to include battery-backed or capacitive UPSs for mounting on a DIN rail. Thinking one step ahead, Beckhoff also made it easy to change out NiMH batteries in the UPS on the fly, because batteries do fail after some time, says Reiner.
Speaking of battery failure, batteries last a finite period of time in UPS service. In some cases, you may find it not much more expensive to replace an entire UPS, rather than just the battery—unless you can find a good price for the battery. Another option is to purchase a UPS with an external battery, so you can extend runtime when the power shuts down.
“We have found UPSs to be helpful for power surges and spikes … PLCs and PACs typically have batteries to protect the program and data, but of course all batteries eventually die,” says Matt Hess, founder/president of PLC Paramedics. “EEPROMs have also been frequently used to store programs, but in my opinion, by far the best protection is to have a well-documented program backup file … It is not only your best defense against program loss, but is also the key to effective troubleshooting when you have machine or system issues.”
Being ready for cyberattacks
Obviously, this section doesn’t pretend to cover all there is to know about cybersecurity, and you can find more on the subject in the February 2019 FE cover story, “Cybersecurity helps manufacturers create more secure, resilient networks.” However, let’s look at a few recovery options.
The first step to a successful data recovery plan is to practice defense-in-depth security measures to prevent penetration by a cyberattack, says Mitsubishi’s Cheung. From a PLC level, security functions to prevent unauthorized read/write access should be implemented with file passwords, communication connection passwords and IP address filtering.
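IP address filtering is normally configured in the PLC or its engineering software, but the underlying logic is simple enough to show. Here is a minimal sketch using Python's standard `ipaddress` module. The allowed ranges are made-up example addresses, and `is_allowed` is a hypothetical helper, not part of any vendor's tooling.

```python
import ipaddress

# Hypothetical allowlist: one engineering subnet and one single host.
ALLOWED = [
    ipaddress.ip_network("192.168.10.0/28"),  # addresses .0 through .15
    ipaddress.ip_network("10.0.5.20/32"),     # exactly one workstation
]


def is_allowed(client_ip: str) -> bool:
    """True if the connecting address falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED)


print(is_allowed("192.168.10.7"))
print(is_allowed("192.168.10.16"))
```

Anything not explicitly listed is refused, which is the safer default for a controller that only a handful of stations should ever talk to.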
Generally speaking, at the machine level, PLCs from suppliers like Siemens and Rockwell are usually somewhat safer than Windows-based controllers, says CHL Systems’ Clemens. But as we’ve seen with Stuxnet, there are no guarantees—especially when the stakes are high. However, Clemens recommends a good prophylactic to ward off potential problems—the layer two firewall device, which is available for many PLCs and fits right into their I/O slots. These hardware firewalls are used locally and are an efficient way to protect specific controllers.
Backup can be done in a variety of ways, such as creating dedicated PCs for backup data storage, but Mitsubishi offers an incredibly easy method of backing up PLC programs to the HMI, says Cheung. Backup to the HMI is a great solution because the HMI is always connected to the PLC, allowing for quick backup and recovery. The HMI is also much more difficult to compromise than a PC, and therefore a safe storage for backup. Backups can be automatically triggered by the PLC program or at a predefined schedule.
As with Mitsubishi, Omron also provides its own backup system within Omron Sysmac Studio software. That said, the main concern is where to store the backup program, says Johnston Hall, product engineer. You can use a laptop as long as it doesn’t catch a virus. You can use a central storage system, but it can suffer the same problem. If you keep each version of the backup on its own USB stick, then there’s less chance of the backup being compromised.
USB sticks are easily lost and difficult to catalog, which may be a reason to use a well-protected single storage point or server instead. That server must itself be protected, however, says CRB’s Pflantz.
“Creating a backup routinely to make sure it is current is a big step that, surprisingly, is not done as well as it should be,” he says. Once you have a backup, consider virus or ransomware situations and isolate backup files so they are protected from propagation of a virus. “An ‘air gap,’ or physically separated drive system, will serve that function and allow you to reinstall everything once you get things cleaned up,” adds Pflantz.
An integrated approach to cybersecurity is necessary for comprehensive protection, and it must include the management of automated device programs and their changes, says Gillespie of MDT. There are many products that claim to protect a facility from an attack, and while many are useful, none can fully protect control logic from being changed inadvertently or maliciously. Firewalls should be used as they can stop many malware attacks, but not all of them. Plant floor isolation may not protect against that single USB stick containing corrupted software. Network monitoring can detect a threat but cannot identify if a change was made or reverse that change.
“A sound approach to securing program device data against security threats includes three key areas: preparation, detection and recovery,” says Gillespie. Preparation involves storing a copy of each program revision in a central repository so that program intellectual property is secured. Detection involves identifying if the program data on file does not match the program running in the device. If a mismatch is detected, the differences must be identified, and the appropriate people must be notified. For recovery from a harmful change, immediate access to a central repository of all program revisions enables the plant to quickly restore the latest approved program.
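The three areas can be sketched with nothing more than a hash comparison. The code below is a simplified illustration, not how AutoSave or any other CMS is implemented: the `Repository` class, its method names and the device name in the usage note are hypothetical, and a real system would also record who changed what and notify the right people.

```python
import hashlib


def fingerprint(program: bytes) -> str:
    """SHA-256 digest of a program image, used to compare copies."""
    return hashlib.sha256(program).hexdigest()


class Repository:
    """Central store of approved program revisions, keyed by device name."""

    def __init__(self):
        self.revisions = {}

    def store(self, device: str, program: bytes) -> None:
        """Preparation: keep every approved revision."""
        self.revisions.setdefault(device, []).append(program)

    def latest(self, device: str) -> bytes:
        """Recovery: the most recent approved program to download again."""
        return self.revisions[device][-1]

    def matches(self, device: str, running: bytes) -> bool:
        """Detection: does the running program equal the latest approved one?"""
        return fingerprint(running) == fingerprint(self.latest(device))
```

In use, a scheduled job would upload the program from each device, call `matches`, and raise an alert on a mismatch. Restoring is then a matter of downloading `latest(device)` back to the controller.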
The case for backing up oft-forgotten equipment
With rapidly expanding wired and wireless plant networks today, it’s easy to lose track of the location of firewalls, routers and switches—especially if one was needed quickly to extend network coverage in a new location. CMS software, such as AUVESY’s versiondog and MDT’s AutoSave, not only can help with the documentation of these far-flung network devices, but can also back up their configuration files, which are often forgotten until they’ve been corrupted either through cyber hacking or unintentional reconfiguration errors.
“All the components in your automation system, including switches and routers, should have the same backup strategy as your PLC or SCADA backups,” says Pflantz. “If it is a configuration that can be saved, save it. That includes device configurations for instruments, VFDs, etc.”
“Any device that has software, parameters, or other configuration data is susceptible to data loss … The manufacturer has to determine if losing the data is critical enough to warrant backups,” says PLC Paramedics’ Hess. In most cases it is. Being proactive and mitigating the risk upfront is the key to success.
Why back up these systems with a CMS? Glaser points out four good reasons:
- Have a backup in case of disaster recovery
- Monitor any unauthorized change
- Monitor cyberattacks (e.g., via a honeypot)
- Scan the network for unknown switches/routers
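The last reason, finding unknown switches and routers, comes down to comparing what a scan discovers with what the inventory says should be there. The sketch below assumes both are available as lists of MAC addresses; the addresses and function names are hypothetical, and the scan itself is left to whatever discovery tool you use.

```python
def normalize(mac: str) -> str:
    """Put MAC addresses into one form so the same device always compares equal."""
    return mac.strip().lower().replace("-", ":")


def unknown_devices(discovered, inventory):
    """Return discovered addresses that are not in the documented inventory."""
    known = {normalize(m) for m in inventory}
    return sorted({normalize(m) for m in discovered} - known)


# Hypothetical scan result and inventory for illustration only.
scan = ["00:1A:2B:3C:4D:5E", "00-1a-2b-3c-4d-5f"]
documented = ["00:1a:2b:3c:4d:5e"]
print(unknown_devices(scan, documented))
```

Anything the comparison turns up is either a forgotten device that needs documenting and backing up, or something that should not be on the network at all.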
But be sure these switches and routers also have battery backup if they serve critical equipment. CHL Systems’ Clemens explains, “If a [control] system has backup power from a UPS, but the switch is not being powered, it is impossible for the [system’s] processor to communicate with anything. It would be as if you pulled the plug on the entire system because it would not be able to communicate with any devices, and possibly even the supervisory control if that is run through the switch.”
When working with firewalls and switches, WAGO’s Norz makes three recommendations for applying them. First, separate the plant floor into firewalled zones. Second, use PLCs with built-in firewalls to reduce costs while helping increase network security. Finally, in brownfield applications, use managed switches for network security features.
Some final thoughts
As we’ve seen, some PLCs and PACs are capable of backing themselves up, and most automation suppliers support backups of their controllers, whether PLCs, motor drives, robots or other automation controllers. For facilities that rely mostly on a single automation supplier, backups are handled by that supplier. Where a heterogeneous mix of equipment is involved, a CMS may be a good solution. In this case, if you’re not sure which path to take, your local SI or engineering house can certainly help out.
“Your automation system is critical to your operation, so give it the attention it deserves,” says Pflantz. “Protect it, back it up and make sure you can recover from potential disasters. Automation is a design discipline, not a trade.”
The key thing to keep in mind is to be proactive about backups, says Clemens. “These systems and plans cannot be implemented after a major event occurs and be expected to save data from that event. At that point, it just becomes an exercise in recovering as much as possible and getting back on line quickly.”
For more information:
Mitsubishi Electric Automation
Omron Automation Americas
Paragon Software Group
Solid State Disks Ltd