Fault description
1. One host node in the virtualization cluster keeps reporting the error "Lost access to volume XX due to connectivity problems, restoration attempts are in progress",
and shortly afterwards reports "Successful restoration of access to volume XX after connectivity problems";
2. On the V7000, the number of links to the volume mapped to this node drops from 4 to 3, and the V7000 raises a volume degraded alarm.
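To gauge how often access is flapping, the two messages quoted above can be counted per volume. The following is a minimal sketch, not a vendor tool: it assumes the host log has already been exported as plain text lines, and only the two message fragments come from the fault description. The function name `count_flaps` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Only these two message fragments come from the fault description;
# the rest of the log line layout is an assumption.
LOST = re.compile(r"Lost access to volume (.+?) due to connectivity problems")
RESTORED = re.compile(
    r"Successful restoration of access to volume (.+?) after connectivity problems"
)

def count_flaps(lines):
    """Return {volume: (lost_count, restored_count)} for the given log lines."""
    lost, restored = Counter(), Counter()
    for line in lines:
        match = LOST.search(line)
        if match:
            lost[match.group(1)] += 1
            continue
        match = RESTORED.search(line)
        if match:
            restored[match.group(1)] += 1
    volumes = sorted(set(lost) | set(restored))
    return {v: (lost[v], restored[v]) for v in volumes}

# Hypothetical sample lines built from the quoted messages.
sample = [
    "Lost access to volume XX due to connectivity problems, restoration attempts are in progress",
    "Successful restoration of access to volume XX after connectivity problems",
    "Lost access to volume XX due to connectivity problems, restoration attempts are in progress",
]
flaps = count_flaps(sample)
print(flaps)
```

A volume whose lost count keeps climbing while the restored count follows closely behind matches the intermittent pattern described here, as opposed to a hard outage.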
Troubleshooting approach
1. In the initial inspection, in the rush to solve the problem as quickly as possible, all effort was concentrated on the failure point itself;
the individual symptoms were never tied together into one line of reasoning, which led to confusion;
2. Because this case spans three layers (host, fiber optic switch, storage), the first step should be to sort out the equipment involved
and consider the problem as a whole;
3. As the saying goes, "sharpening the axe does not delay the cutting of firewood": a clear system topology map is essential for solving the problem,
because it helps us locate the point of failure;
4. The topology map visualizes the associations between devices, so we can compare the topology before and after the failure
in order to track down the failed link.
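The before/after comparison in point 4 amounts to a set difference over links. A minimal sketch, assuming each link is recorded as a (host port, storage port) pair; the port labels below are placeholders, and `diff_links` is a hypothetical helper name:

```python
def diff_links(before, after):
    """Compare two link inventories and report what disappeared or appeared."""
    before, after = set(before), set(after)
    return {
        "lost": sorted(before - after),
        "new": sorted(after - before),
    }

# Placeholder inventories: four links before the failure, three after.
before = [
    ("host-port-1", "storage-port-A"),
    ("host-port-1", "storage-port-B"),
    ("host-port-2", "storage-port-A"),
    ("host-port-2", "storage-port-B"),
]
after = [
    ("host-port-1", "storage-port-A"),
    ("host-port-2", "storage-port-A"),
    ("host-port-2", "storage-port-B"),
]
result = diff_links(before, after)
print(result["lost"])
```

Keeping the inventory as plain pairs makes it easy to take the "before" snapshot from documentation and the "after" snapshot from whatever the storage or switch currently reports.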
Failure Analysis
1. List the relevant devices in the faulty SAN;
2. Record the host, fiber optic switch, and storage ports to form a SAN port mapping table;
3. Draw a SAN topology map from the mapping table;
4. Analyze according to the topology map:
a. The host's 4 optical ports are connected to fiber optic switches;
b. Port 08 of the fiber optic switch carries two links, 10:00:00:90:fa:a8:ee:08 → 50:05:07:68:0b:21:bb:f8 and 10:00:00:90:fa:a8:ee:08 → 50:05:07:68:0b:21:bb:f9;
the V7000 alarm shows that the F9 link is the one that was lost;
c. According to the topology map, if the physical link of port 08 were interrupted, the links from fiber optic switch 1 to storage 2 would be lost as well;
those links are not lost, which indicates that the physical link is not interrupted.
However, the F9 link is lost on the storage side, so there is indeed a problem;
d. Further comparison shows that on the other host nodes both storage-side links, 50:05:07:68:0b:21:bb:f8 and 50:05:07:68:0b:21:bb:f9, are normal, so the point of failure
lies on the host side at 10:00:00:90:fa:a8:ee:08: the physical connection is normal, but one link is lost;
e. With the physical connection normal and one link lost, the suspicion falls on the fiber optic cable from the host to the fiber
optic switch, or on the host's port 08.
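The elimination reasoning in steps b to e can be sketched as a small function. This is an illustration under assumptions, not the procedure the engineers actually ran: links are (host port, storage port) pairs, the two storage WWPNs and the host WWPN are the ones quoted above, and `OTHER_HOST` is a placeholder for a port on one of the other host nodes.

```python
HOST = "10:00:00:90:fa:a8:ee:08"
F8 = "50:05:07:68:0b:21:bb:f8"
F9 = "50:05:07:68:0b:21:bb:f9"
OTHER_HOST = "other-host-port"  # placeholder, not a real WWPN

def classify_lost_link(lost, healthy):
    """Narrow down where a lost link is failing, given the links still up."""
    initiator, target = lost
    healthy = set(healthy)
    # Step d: is the same storage port still reachable from other hosts?
    target_ok_elsewhere = any(
        t == target and i != initiator for i, t in healthy
    )
    # Step c: does the same host port still carry another working link?
    physical_link_up = any(
        i == initiator and t != target for i, t in healthy
    )
    suspect = "host side" if target_ok_elsewhere else "storage side or unknown"
    return {"suspect": suspect, "physical_link_up": physical_link_up}

healthy = [
    (HOST, F8),        # the F8 link through port 08 is still up
    (OTHER_HOST, F8),  # other nodes see both storage ports
    (OTHER_HOST, F9),
]
verdict = classify_lost_link((HOST, F9), healthy)
print(verdict)
```

A verdict of "host side" with the physical link still up corresponds to step e: the cable between host and switch, or the host port itself, is degraded rather than disconnected.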
Fault resolution
1. The value collected on port 08 of the fiber optic switch is too high, indicating optical signal attenuation;
2. After the fiber optic cable on that port was replaced, the fault was resolved.
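As a rough illustration of what an attenuation check might look like, the sketch below computes link loss from transmit and receive optical power. It assumes the collected value is a measured attenuation, which the case does not state explicitly; all power readings and the loss budget are made-up numbers, and the function names are hypothetical.

```python
import math

def mw_to_dbm(power_mw):
    """Convert optical power from milliwatts to dBm."""
    return 10 * math.log10(power_mw)

def link_loss_db(tx_dbm, rx_dbm):
    """Loss in dB is the difference between transmitted and received power."""
    return tx_dbm - rx_dbm

def exceeds_budget(tx_dbm, rx_dbm, budget_db):
    """True when the measured loss is larger than the allowed budget."""
    return link_loss_db(tx_dbm, rx_dbm) > budget_db

# Hypothetical readings: -2.5 dBm transmitted, -9.0 dBm received,
# checked against an assumed 4.0 dB budget.
loss = link_loss_db(-2.5, -9.0)
too_high = exceeds_budget(-2.5, -9.0, 4.0)
print(loss, too_high)
```

The actual acceptable loss depends on the optics and cable type in use, so the budget should come from the transceiver's datasheet rather than from this example.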
Summary of experience
1. The failure was caused by an aged or crushed fiber optic cable, which attenuated the optical signal and in turn caused transmission problems;
2. A single problem may raise alarms at more than one point. Consider it from the perspective of the overall system environment,
untangle the relationships between the fault points step by step, and ultimately find the real cause of the problem.