
April 06, 2020

SSD devices head for certain failures

A solid-state storage device is not usually a component of HP 3000 configurations. Now that MPE servers are being virtualized, though, those drives that do not move but still store can end up inside the host hardware. Some of them are heading for certain failure, and HP Enterprise is warning customers.

The problem is surfacing in HP storage units, but it is not limited to HP-brand gear. The failures trace back to drives built by SanDisk. One fix lies in HP Enterprise firmware updates.

Some HP Enterprise disk drives face a failure date of October 2020 unless administrators apply a crucial firmware patch. Notices from HP Enterprise warn owners of these disks that failures will begin no earlier than October. Other solid-state drives (SSDs) are already in danger of dying.

Some SanDisk SSDs that have run continuously since late 2015 already rolled past their failure date last fall. The flaw is being called a data death bug.

For some drives, HPD7 firmware is a critical fix. HPE says Western Digital told the vendor about failures in certain Serial Attached SCSI (SAS) SSD models inside HPE server and storage products. Some of these SAS SSDs can also connect externally to HPE's Itanium servers running VMS.

The drives can be inside HPE's ProLiant, Synergy, and Apollo 4200 servers, and some of these units could serve as hardware hosts for virtualized 3000 systems. The SSD problem also exists in HPE's Synergy Storage Modules, D3000 Storage Enclosure, and StoreEasy 1000 Storage. If the affected disks run a firmware version prior to HPD7, they will fail at 40,000 hours of operation (4 years, 206 days, 16 hours). Another, even larger group of HP devices will fail after 32,768 hours of power-on time (3 years, 270 days, 8 hours) without a firmware update.
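As a check on those figures, the short Python sketch below converts each threshold into years, days, and hours. It assumes 365-day years and ignores leap days, which is how the quoted breakdowns work out. It also confirms that 32,768 is exactly 2^15, a power of two. That pattern is typical of a fixed-width counter running out, but the sketch does not establish the firmware's actual root cause.

```python
# Break a power-on-hours threshold into years, days and hours.
# Assumption: 365-day years, no leap days, matching the quoted breakdowns.
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def breakdown(hours: int) -> tuple[int, int, int]:
    """Return (years, days, hours) for a count of power-on hours."""
    days, rem_hours = divmod(hours, HOURS_PER_DAY)
    years, rem_days = divmod(days, DAYS_PER_YEAR)
    return years, rem_days, rem_hours


for threshold in (40_000, 32_768):
    y, d, h = breakdown(threshold)
    print(f"{threshold:,} hours = {y} years, {d} days, {h} hours")

# 32,768 is exactly 2**15, one more than the largest signed 16-bit value.
print(32_768 == 2**15, 2**15 - 1)
```

Running it prints 4 years, 206 days, 16 hours for the 40,000-hour drives and 3 years, 270 days, 8 hours for the 32,768-hour drives.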

For the 32,768-hour drives, the numbers mean failures might have started as early as September 2019. The first affected drives shipped in late 2015, and HP estimates the earliest failure date based on when it first shipped the drives. Another batch of HP drives, shipped in 2017, is also at risk. These are the drives facing an October 2020 failure date without a firmware update.
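The arithmetic behind such estimates is simple: a drive that runs nonstop reaches its threshold exactly that many hours after power-on. Here is a minimal Python sketch. The December 1, 2015 start date is a hypothetical example, not an HPE ship date, and real drives that were powered down for a while will reach the limit later.

```python
from datetime import datetime, timedelta

# Failure thresholds from the HPE bulletins, in power-on hours.
THRESHOLDS = {"32,768-hour group": 32_768, "40,000-hour group": 40_000}


def projected_failure(power_on_start: datetime, threshold_hours: int) -> datetime:
    """Date a drive reaches threshold_hours if it has run nonstop since power_on_start."""
    return power_on_start + timedelta(hours=threshold_hours)


# Hypothetical example: a drive first powered on December 1, 2015 and never shut down.
start = datetime(2015, 12, 1)
for label, hours in THRESHOLDS.items():
    print(label, projected_failure(start, hours).strftime("%Y-%m-%d %H:%M"))
```

For that start date, the 32,768-hour mark falls on August 27, 2019, close to the September 2019 estimate. The 40,000-hour mark arrives on June 23, 2020.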

Beyond HP gear

According to a report on the website The Register, the devices are SanDisk units from Western Digital. Dell has issued a similar support warning for its enterprise customers and lists these SanDisk model numbers:

LT0200MO
LT0400MO
LT0800MO
LT1600MO
LT0200WM
LT0400WM
LT0800WM
LT0800RO
LT1600RO

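A single SSD failure takes down a logical drive that has no fault tolerance, such as RAID 0. Fault-tolerant RAID modes fail too "if more SSDs fail than are supported by the fault tolerance of the RAID mode on the logical drive. Example: RAID 5 logical drive with two failed SSDs." Because drives installed together accumulate power-on hours together, several drives in one array can reach the limit at nearly the same time.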
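The bulletin's rule can be modeled in a few lines. This Python sketch rests on simplifying assumptions: it uses the standard failure-tolerance counts for common RAID levels and treats RAID 1 as a two-drive mirror. It leaves out nested levels such as RAID 10, where the outcome depends on which drives fail. The table and function names are hypothetical and do not come from any HPE or Dell tool.

```python
# Simplified model: how many member SSDs a logical drive can lose and keep its data.
# Assumptions: RAID 1 is a two-drive mirror; nested levels (RAID 10, 50, 60) are
# omitted because their tolerance depends on which specific drives fail.
FAULT_TOLERANCE = {"RAID 0": 0, "RAID 1": 1, "RAID 5": 1, "RAID 6": 2}


def logical_drive_survives(raid_mode: str, failed_ssds: int) -> bool:
    """True if the logical drive keeps its data after failed_ssds drive failures."""
    return failed_ssds <= FAULT_TOLERANCE[raid_mode]


print(logical_drive_survives("RAID 0", 1))  # False: no fault tolerance
print(logical_drive_survives("RAID 5", 1))  # True
print(logical_drive_survives("RAID 5", 2))  # False: the bulletin's example
print(logical_drive_survives("RAID 6", 2))  # True
```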

Adding to the complexity, the firmware that fixes the issue comes in two versions. HPD7 repairs the 40,000-hour drives. HPD8 repairs the bigger list of 32,768-hour drives. Leaving HPD7 firmware on drives from that larger list, whose death date may arrive very soon this year, will ensure the failures.
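Because the two drive groups need different minimum firmware, a check has to know which bulletin a drive falls under. The Python sketch below encodes that rule. It assumes firmware strings look like "HPDn" and that any later HPD version still carries the fix. It does not identify which models belong to which group; the group is passed in by hand as its hour threshold.

```python
import re

# Minimum fixing firmware per bulletin group, keyed by failure threshold in hours:
# HPD7 for the 40,000-hour drives, HPD8 for the 32,768-hour drives.
REQUIRED_FIRMWARE = {40_000: 7, 32_768: 8}


def firmware_level(version: str) -> int:
    """Extract n from an 'HPDn' firmware string, e.g. 'HPD7' -> 7."""
    match = re.fullmatch(r"HPD(\d+)", version.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized firmware string: {version!r}")
    return int(match.group(1))


def needs_update(threshold_hours: int, version: str) -> bool:
    """True if a drive in the given group runs firmware below the fixing version.

    Assumption: firmware versions later than the fix also include it.
    """
    return firmware_level(version) < REQUIRED_FIRMWARE[threshold_hours]


print(needs_update(32_768, "HPD7"))  # True: HPD7 does not fix the 32,768-hour group
print(needs_update(40_000, "HPD7"))  # False
```

The first call captures the trap described above. A 32,768-hour drive already running HPD7 still needs HPD8.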

Full details from HP's bulletins for the 40,000-hour and the 32,768-hour drives are on the HPE website. The bulletins also explain how to use HP's Smart Storage Administrator to check a drive's power-on hours, and they include scripts for VMware, Unix, and Linux. These scripts "perform an SSD drive firmware check for the 32,768 power-on-hours failure issue on certain HPE SAS SSD drives."
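Once the power-on hours for each drive are known, the remaining margin is just the threshold minus the reading. The Python sketch below ranks drives by that margin so the closest calls get patched first. The bay labels and readings are hypothetical values entered by hand. The sketch does not parse Smart Storage Administrator output or reproduce HPE's scripts.

```python
from dataclasses import dataclass


@dataclass
class DriveReading:
    bay: str              # hypothetical slot label
    power_on_hours: int   # as reported by the storage management tool
    threshold_hours: int  # 32_768 or 40_000, from the matching bulletin

    def hours_remaining(self) -> int:
        """Power-on hours left before the threshold, or 0 if already reached."""
        return max(self.threshold_hours - self.power_on_hours, 0)


def most_urgent_first(readings: list[DriveReading]) -> list[DriveReading]:
    """Order drives so those closest to their failure threshold come first."""
    return sorted(readings, key=lambda r: r.hours_remaining())


# Hypothetical readings for three drives.
drives = [
    DriveReading("bay 1", 20_000, 40_000),
    DriveReading("bay 2", 31_000, 32_768),
    DriveReading("bay 3", 32_000, 32_768),
]
for r in most_urgent_first(drives):
    print(r.bay, r.hours_remaining(), "hours left")
```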

A list of 20 HPE disk units falls under the 32,768-hour deadline. Four other HPE devices are in the separate 40,000-hour support bulletin.

11:51 AM in Homesteading, Migration, News Outta HP, Newsmakers



Comments

These days people like to point at Windows 95 as the granddaddy of all timer-tick-counter-overflow programming errors but I seem to remember a story from the SMUGBOOK about early MPE having this sort of problem, first noticed when someone got a 3000 to stay up for 24 days.

So it is kind of hilarious that HP(E) is having this sort of problem again.

And, just because it's solid-state disk doesn't mean it doesn't need backing up.

Posted by: Frank McConnell | Apr 6, 2020 5:09:50 PM
