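[VPLEX] Manually Restoring I/O from the Peer Cluster When the Metro Winner Goes Offline
Note: If you are not a qualified specialist, do not perform this procedure yourself; contact the DellEMC customer service team.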
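Overview:
When a VPLEX Metro has no Witness configured because no third site is available, and the winner cluster goes down and cannot be recovered promptly, services must be brought up manually from the peer cluster to restore I/O.
Scenario:
Assume the customer's workloads run on VPLEX cluster-1, the consistency group of the distributed volume has its winner set to cluster-1, and the application servers are served storage through the front-end ports of VPLEX cluster-1.
VPLEX cluster-1, together with its attached hosts and storage, is now down and cannot be recovered for the time being, so service must be restored from VPLEX cluster-2 with its back-end storage and front-end servers.
VPLEX-side procedure:
Log in to the CLI management interface of VPLEX cluster-2 (user name/password: service/P@ssw0rd), then switch to vplexcli mode.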
service@vplex02:~> vplexcli
Trying ::1...
Connected to localhost.
Escape character is '^]'.
Change to the context of the consistency group to be recovered and run ll to view its status.
VPlexcli:/> cd /clusters/cluster-2/consistency-groups/UNITY500_25T
VPlexcli:/clusters/cluster-2/consistency-groups/UNITY500_25T> ll
Attributes:
Name                  Value
--------------------  ---------------------------------------------------------
active-clusters       []
cache-mode            synchronous
detach-rule           winner cluster-1 after 5s
operational-status    [(cluster-1, { summary:: degraded, details:: [member-volumes-unreachable] }), (cluster-2, { summary:: suspended, details:: [cluster-departure] })]
passive-clusters      []
read-only             false
recoverpoint-enabled  false
storage-at-clusters   [cluster-1, cluster-2]
virtual-volumes       [UNITY500_25T]
visibility            [cluster-1, cluster-2]

Contexts:
Name          Description
------------  -----------
advanced      -
recoverpoint  -
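The operational-status attribute shows that the volumes on cluster-1 are unreachable and that cluster-2 has suspended I/O because of the cluster departure. To restore I/O from cluster-2, manually designate cluster-2 as the temporary winner.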
VPlexcli:/clusters/cluster-2/consistency-groups/UNITY500_25T> choose-winner cluster-2
WARNING: This can cause data divergence and lead to data loss. Ensure the other cluster is not serving I/O for this consistency group before continuing. Continue? (Yes/No) yes
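VPLEX cluster-2 has now resumed I/O and the front-end servers attached to cluster-2 can access the storage. Host-side operations can now be carried out to restore service.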
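The status check that precedes choose-winner can be sketched in Python. This is only an illustration and not part of any VPLEX API: the function names parse_operational_status and needs_manual_winner are hypothetical, and the input is assumed to be the operational-status value copied as text from the ll output shown above.

```python
import re

# One "(cluster, { summary:: ..., details:: [...] })" entry of the attribute value.
# The pattern is an assumption based on the ll transcripts in this article.
_ENTRY = re.compile(
    r"\(\s*([\w-]+)\s*,\s*\{\s*summary::\s*([\w-]+)\s*,"
    r"\s*details::\s*\[([^\]]*)\]\s*\}\s*\)"
)


def parse_operational_status(value):
    """Turn the operational-status text into {cluster: (summary, [details])}."""
    status = {}
    for cluster, summary, details in _ENTRY.findall(value):
        items = [d.strip() for d in details.split(",") if d.strip()]
        status[cluster] = (summary, items)
    return status


def needs_manual_winner(status, surviving_cluster):
    """True if the surviving cluster suspended I/O because its peer departed."""
    summary, details = status.get(surviving_cluster, ("", []))
    return summary == "suspended" and "cluster-departure" in details


if __name__ == "__main__":
    value = (
        "[(cluster-1, { summary:: degraded,details:: [member-volumes-unreachable] }), "
        "(cluster-2, { summary:: suspended, details:: [cluster-departure] })]"
    )
    print(needs_manual_winner(parse_operational_status(value), "cluster-2"))
```

A check like this only confirms what the transcript already shows; the decision to run choose-winner still depends on verifying that the other cluster is not serving I/O, as the CLI warning states.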
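Host-side procedure:
1) When the hosts at both Metro sites run Oracle RAC: once VPLEX has restored I/O, simply start the services.
2) When both Metro sites run VMware virtualization with HA enabled and the hosts at both sites belong to the same cluster: once VPLEX has restored I/O, simply restart the virtual machines.
3) When the two Metro sites use standalone servers: once VPLEX has restored I/O, rescan the storage on the servers and then start the services.
4) When the two Metro sites are two separate VMware clusters: once VPLEX has restored I/O, shut down the virtual machines in the original cluster and remove them from the inventory. After the peer VMware cluster has rescanned the storage, locate each virtual machine's .vmx file in the datastore, right-click it and add it to the inventory; the virtual machine can then be powered on.
Resolving the Conflicting Detach After Cluster-1 Recovers
After VPLEX cluster-1 recovers from the disaster, the two VPLEX clusters reconnect. The consistency group status is then as follows: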
VPlexcli:/clusters/cluster-1/consistency-groups/UNITY500_500G> ll
Attributes:
Name                  Value
--------------------  ---------------------------------------------------------
active-clusters       []
cache-mode            synchronous
detach-rule           winner cluster-1 after 5s
operational-status    [(cluster-1, { summary:: degraded, details:: [requires-resolve-conflicting-detach] }), (cluster-2, { summary:: degraded, details:: [requires-resolve-conflicting-detach] })]
passive-clusters      []
read-only             false
recoverpoint-enabled  false
storage-at-clusters   [cluster-1, cluster-2]
virtual-volumes       [UNITY500_25T]
visibility            [cluster-1, cluster-2]

Contexts:
Name          Description
------------  -----------
advanced      -
recoverpoint  -
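The operational-status attribute shows that both cluster-1 and cluster-2 require the conflicting detach to be resolved.
Use the "resolve-conflicting-detach" command to resolve the conflicting detach: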
VPlexcli:/clusters/cluster-1/consistency-groups/UNITY500_25T>resolve-conflicting-detach -c cluster-2
This will cause I/O to suspend at clusters in conflict with cluster cluster-2, allowing you to stop applications at those clusters. Continue? (Yes/No) yes
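Note: This command acts on both clusters and resynchronizes the data images. The argument after -c is the cluster whose data is kept as the source. Because cluster-2 has been running production workloads, its data must be kept, so the argument is cluster-2.
Check the consistency group status again; both VPLEX clusters have returned to normal.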
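The rule for choosing the -c argument can be written down as a small Python sketch. The helper names pick_resolve_source and resolve_command and the served_io mapping are hypothetical; the only CLI text used is the resolve-conflicting-detach -c form shown in the transcript above. The sketch assumes the operator already knows which cluster served I/O during the outage.

```python
def pick_resolve_source(served_io):
    """Return the single cluster that served I/O while the clusters were split.

    served_io maps a cluster name to True if that cluster served host I/O
    during the outage (a hypothetical input gathered by the operator).
    """
    writers = sorted(c for c, wrote in served_io.items() if wrote)
    if len(writers) != 1:
        # Zero or several writers means the source cannot be chosen mechanically.
        raise ValueError(f"expected exactly one cluster serving I/O, got {writers}")
    return writers[0]


def resolve_command(served_io):
    """Build the command line to type in the consistency group context."""
    return f"resolve-conflicting-detach -c {pick_resolve_source(served_io)}"


if __name__ == "__main__":
    # In this article's scenario only cluster-2 served I/O after choose-winner.
    print(resolve_command({"cluster-1": False, "cluster-2": True}))
```

Raising an error when both clusters (or neither) served I/O is a deliberate choice in this sketch: in that situation the data to keep has to be decided by a person, not by a script.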
VPlexcli:/clusters/cluster-1/consistency-groups/UNITY500_500G> ll
Attributes:
Name                  Value
--------------------  ---------------------------------------------------------
active-clusters       []
cache-mode            synchronous
detach-rule           winner cluster-2 after 5s
operational-status    [(cluster-1, { summary:: ok, details:: [] }), (cluster-2, { summary:: ok, details:: [] })]
passive-clusters      []
read-only             false
recoverpoint-enabled  false
storage-at-clusters   [cluster-1, cluster-2]
virtual-volumes       [UNITY500_500G]
visibility            [cluster-1, cluster-2]

Contexts:
Name          Description
------------  -----------
advanced      -
recoverpoint  -
At this point, data can be migrated back online to the desired site at the host layer.
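The final status check can also be sketched in Python, assuming the ll attribute listing has been saved as plain text in the "name value" layout shown in this article. The function names parse_attributes and group_is_healthy are hypothetical, and the sketch treats the group as healthy only when every cluster's summary is ok.

```python
import re


def parse_attributes(listing):
    """Map each attribute name in an ll listing to its value text."""
    attrs = {}
    for line in listing.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        # Skip section titles (one token) and the dashed separator rows.
        if len(parts) == 2 and not parts[0].startswith("-"):
            attrs[parts[0]] = parts[1].strip()
    return attrs


def group_is_healthy(listing):
    """True if operational-status reports summary ok for every cluster."""
    status = parse_attributes(listing).get("operational-status", "")
    summaries = re.findall(r"summary::\s*([\w-]+)", status)
    return bool(summaries) and all(s == "ok" for s in summaries)


if __name__ == "__main__":
    listing = """Attributes:
Name Value
detach-rule winner cluster-2 after 5s
operational-status [(cluster-1,{ summary:: ok, details:: [] }), (cluster-2,{summary:: ok, details:: [] })]
"""
    print(group_is_healthy(listing))
```

The header row "Name Value" is parsed as an ordinary entry here; that is harmless for this check because only the operational-status key is read.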