Migrating HBase Data to MySQL
-
HBase Environments
# 发票通 HBase environment (user: hadoop)

| Hostname | IP | Hadoop Processes |
| --- | --- | --- |
| sz-dzfp-data-d2mng01 | 10.10.24.101 | NameNode (hadoop), DFSZKFailoverController (hadoop) |
| sz-dzfp-data-d2mng02 | 10.10.24.102 | NameNode (hadoop), DFSZKFailoverController (hadoop) |
| sz-dzfp-data-d2mng03 | 10.10.24.103 | QuorumPeerMain (zookeeper), JournalNode (hadoop), HMaster (hbase) |
| sz-dzfp-data-d2mng04 | 10.10.24.104 | QuorumPeerMain (zookeeper), JournalNode (hadoop), HMaster (hbase) |
| sz-dzfp-data-d2mng05 | 10.10.24.105 | QuorumPeerMain (zookeeper), JournalNode (hadoop), HMaster (hbase) |
| sz-dzfp-data-d2store01 | 10.10.24.111 | DataNode (hadoop), HRegionServer (hbase) |
| sz-dzfp-data-d2store02 | 10.10.24.112 | DataNode (hadoop), HRegionServer (hbase) |
| sz-dzfp-data-d2store03 | 10.10.24.113 | DataNode (hadoop), HRegionServer (hbase) |
| sz-dzfp-data-d2store04 | 10.10.24.114 | DataNode (hadoop), HRegionServer (hbase) |
| ……. | ……. | ………… |
| sz-dzfp-data-d2store20 | 10.10.24.130 | DataNode (hadoop), HRegionServer (hbase) |
# Original 沙河国信 HBase environment

| Hostname | IP | Hadoop Processes |
| --- | --- | --- |
| master1 | 10.10.45.14 | NameNode, QuorumPeerMain, DFSZKFailoverController, HMaster, master |
| master2 | 10.10.45.15 | NameNode, QuorumPeerMain, DFSZKFailoverController, HMaster, master |
| master3 | 10.10.45.16 | QuorumPeerMain, DFSZKFailoverController, HMaster, master |
| hadoop1 | 10.10.45.17 | DataNode, HRegionServer, JournalNode, work |
| hadoop2 | 10.10.45.18 | DataNode, HRegionServer, JournalNode, work |
| hadoop3 | 10.10.45.19 | DataNode, HRegionServer, work |
# Current 国信 HBase environment (user: hadoop)

| Hostname | IP | Hadoop Processes |
| --- | --- | --- |
| gx-dzfp-data-mng01 | 10.10.73.1 | NameNode, QuorumPeerMain, DFSZKFailoverController, HMaster, master |
| gx-dzfp-data-mng02 | 10.10.73.2 | NameNode, QuorumPeerMain, DFSZKFailoverController, HMaster, master |
| gx-dzfp-data-mng03 | 10.10.73.3 | QuorumPeerMain, DFSZKFailoverController, HMaster, master |
| gx-dzfp-data-store01 | 10.10.73.4 | DataNode, HRegionServer, JournalNode, work |
| gx-dzfp-data-store02 | 10.10.73.5 | DataNode, HRegionServer, JournalNode, work |
| gx-dzfp-data-store03 | 10.10.73.6 | DataNode, HRegionServer, work |
| gx-dzfp-data-store04 | 10.10.73.7 | DataNode, HRegionServer, work |
| gx-dzfp-data-store05 | 10.10.73.8 | DataNode, HRegionServer, work |
| gx-dzfp-data-store06 | 10.10.73.9 | DataNode, HRegionServer, work |
| gx-dzfp-data-store07 | 10.10.73.10 | DataNode, HRegionServer, work |
| gx-dzfp-data-store08 | 10.10.73.11 | DataNode, HRegionServer, work |
-
HBase Data Migration Steps
1. Tables to be migrated
The following 发票通 HBase tables need to be migrated to 国信:

| No. | Table Name | Size (as of 2018-07-31) | Description |
| --- | --- | --- | --- |
| 1 | EINVOICE_INFO | 20.8 TB | Invoice PDF table |
| 2 | INVOICE_XML | 394.5 G | Structured invoice data table |
| 3 | KPF_INDEX | 259.3 G | Issuer index table |
| 4 | SPF_INDEX | 9.5 G | Recipient index table |
| 5 | EMAIL_INDEX | 8.9 G | Email address index table |
| 6 | FPT_INDEX | 4.6 G | Account index table |
| 7 | MOBILE_INDEX | 50.6 G | Mobile number index table |
2. Overall migration steps
# Overall steps for migrating the HBase database
① Migrate the full 发票通 data to the 国信 production environment
② Configure incremental replication from 发票通 HBase to the 国信 HBase cluster
③ Synchronize the incremental HBase data generated on 发票通 during the migration
④ Import all table data from the current 国信 HBase cluster into the new HBase cluster
-
Migrate the full 发票通 data to the 国信 production environment (HBase snapshot)
Migrate the tables in the following order:
①EINVOICE_INFO
②INVOICE_XML
③KPF_INDEX
④MOBILE_INDEX
⑤SPF_INDEX
⑥EMAIL_INDEX
⑦FPT_INDEX
The full migration of a single table uses an HBase snapshot: take a snapshot, export it to the other cluster, then restore the table there. See hbase snapshot.txt for the detailed migration steps.
The detailed steps are as follows (run as the hadoop user):
-
Add the 国信 HBase cluster machines to the /etc/hosts file on every machine in the 发票通 HBase cluster.
vim /etc/hosts
10.10.73.1 gx-dzfp-data-mng01
10.10.73.2 gx-dzfp-data-mng02
10.10.73.3 gx-dzfp-data-mng03
10.10.73.4 gx-dzfp-data-store01
10.10.73.5 gx-dzfp-data-store02
10.10.73.6 gx-dzfp-data-store03
10.10.73.7 gx-dzfp-data-store04
10.10.73.8 gx-dzfp-data-store05
10.10.73.9 gx-dzfp-data-store06
10.10.73.10 gx-dzfp-data-store07
10.10.73.11 gx-dzfp-data-store08
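A quick sanity check (a suggested addition, not in the original steps): confirm the new hostnames resolve on each machine before starting any cross-cluster jobs.
# Verify that every 国信 hostname resolves via /etc/hosts
for h in gx-dzfp-data-mng0{1..3} gx-dzfp-data-store0{1..8}; do getent hosts $h || echo "unresolved: $h"; done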
-
Add the following configuration to the hbase-site.xml file on the 发票通 HBase cluster and restart the HBase services:
Note: the replication settings from section 2.2 can be configured at the same time.
# On the first HBase node (10.10.24.103)
# su - hadoop
# Back up the configuration file
# cp -a /opt/server/hbase-1.1.2/conf/hbase-site.xml /tmp/hbase-site.xml.bak
# vim /opt/server/hbase-1.1.2/conf/hbase-site.xml
<property>
<name>hbase.snapshot.enabled</name>
<value>true</value>
</property>
# Copy to the other nodes (10.10.24.104~105, 111~130)
# for i in {104..105};do scp /opt/server/hbase-1.1.2/conf/hbase-site.xml 10.10.24.$i:/opt/server/hbase-1.1.2/conf/hbase-site.xml; done
# for i in {111..130};do scp /opt/server/hbase-1.1.2/conf/hbase-site.xml 10.10.24.$i:/opt/server/hbase-1.1.2/conf/hbase-site.xml; done
# Restart the regionserver service on the HBase regionserver nodes (10.10.24.111~10.10.24.130)
# hbase-daemon.sh stop regionserver
# hbase-daemon.sh start regionserver
# Restart the HMaster service on the HBase Master nodes
# hbase-daemon.sh stop master
# hbase-daemon.sh start master
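After the restart, a simple health check (a suggested addition, not in the original steps) confirms the regionservers have rejoined the cluster:
# Expect all 20 regionservers to be listed as live
echo "status 'simple'" | hbase shell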
-
发票通 Hadoop YARN service configuration
# Configure on the first Hadoop NameNode node
# vim /opt/server/hadoop-2.5.2/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>nn1,nn2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.nn1</name>
<value>sz-dzfp-data-d2mng01</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.nn2</name>
<value>sz-dzfp-data-d2mng02</value>
</property>
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>nn1</value> <!-- set this to nn2 on the second node -->
</property>
<property>
<name>yarn.resourcemanager.address.nn1</name>
<value>${yarn.resourcemanager.hostname.nn1}:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.nn1</name>
<value>${yarn.resourcemanager.hostname.nn1}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.nn1</name>
<value>${yarn.resourcemanager.hostname.nn1}:8089</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.nn1</name>
<value>${yarn.resourcemanager.hostname.nn1}:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.nn1</name>
<value>${yarn.resourcemanager.hostname.nn1}:8025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.nn1</name>
<value>${yarn.resourcemanager.hostname.nn1}:8041</value>
</property>
<property>
<name>yarn.resourcemanager.address.nn2</name>
<value>${yarn.resourcemanager.hostname.nn2}:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.nn2</name>
<value>${yarn.resourcemanager.hostname.nn2}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.nn2</name>
<value>${yarn.resourcemanager.hostname.nn2}:8089</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.nn2</name>
<value>${yarn.resourcemanager.hostname.nn2}:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.nn2</name>
<value>${yarn.resourcemanager.hostname.nn2}:8025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.nn2</name>
<value>${yarn.resourcemanager.hostname.nn2}:8041</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/opt/data/hadoop/yarn</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/opt/var/logs/hadoop</value>
</property>
<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
<property>
<name>yarn.resourcemanager.zk-state-store.address</name>
<value>sz-dzfp-data-d2mng03:2181,sz-dzfp-data-d2mng04:2181,sz-dzfp-data-d2mng05:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>sz-dzfp-data-d2mng03:2181,sz-dzfp-data-d2mng04:2181,sz-dzfp-data-d2mng05:2181</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx4096m</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>
</configuration>
# vim /opt/server/hadoop-2.5.2/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
# Copy to the other nodes (10.10.24.102~105, 111~130)
# for i in {102..105};do scp /opt/server/hadoop-2.5.2/etc/hadoop/{yarn-site.xml,mapred-site.xml} 10.10.24.$i:/opt/server/hadoop-2.5.2/etc/hadoop/; done
# for i in {111..130};do scp /opt/server/hadoop-2.5.2/etc/hadoop/{yarn-site.xml,mapred-site.xml} 10.10.24.$i:/opt/server/hadoop-2.5.2/etc/hadoop/; done
# Start the YARN service on the master
# start-yarn.sh
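A suggested verification (not in the original steps): with ResourceManager HA enabled, check that exactly one RM is active before submitting the export jobs.
# Query each ResourceManager's HA state; expect one "active" and one "standby"
# yarn rmadmin -getServiceState nn1
# yarn rmadmin -getServiceState nn2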
-
Migrating the full data via HBase snapshot
Snapshot management commands: clone_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot
Take a snapshot:
hbase> snapshot 'REP','REPSnapshot'
List all current snapshots:
hbase> list_snapshots
Delete a snapshot:
hbase> delete_snapshot 'REPSnapshot'
Clone a new table from a snapshot:
hbase> clone_snapshot 'REPSnapshot', 'REP'
Restore a table from a snapshot:
hbase> disable 'REP'
hbase> restore_snapshot 'REPSnapshot'
hbase> enable 'REP'
# hbase shell
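All of the per-table migrations below follow the same pattern. The cross-cluster copy uses the ExportSnapshot MapReduce job, run from the OS shell rather than the hbase shell; -mappers sets the copy parallelism and -bandwidth caps throughput in MB/s. The general form, as used below, is:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot <snapshot_name> -copy-to hdfs://gx-dzfp-data-mng01:8020/hbase -mappers 64 -bandwidth 500 -overwrite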
-
EINVOICE_INFO 20.8 TB
Take a snapshot (发票通):
hbase> snapshot 'EINVOICE_INFO','EINVOICE_INFO_SNAPSHOT_20180813'
List all current snapshots (发票通):
hbase> list_snapshots
Connect to the active Hadoop master node (发票通):
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot EINVOICE_INFO_SNAPSHOT_20180813 -copy-to hdfs://gx-dzfp-data-mng01:8020/hbase -mappers 64 -bandwidth 1000 -overwrite
Clone a new table from the snapshot (国信):
hbase> clone_snapshot 'EINVOICE_INFO_SNAPSHOT_20180813', 'EINVOICE_INFO'
-
INVOICE_XML 394.5 G
Take a snapshot (发票通):
hbase> snapshot 'INVOICE_XML','INVOICE_XML_SNAPSHOT_20180813'
List all current snapshots (发票通):
hbase> list_snapshots
Connect to the active Hadoop master node (发票通):
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot INVOICE_XML_SNAPSHOT_20180813 -copy-to hdfs://gx-dzfp-data-mng01:8020/hbase -mappers 64 -bandwidth 500 -overwrite
Clone a new table from the snapshot (国信):
hbase> clone_snapshot 'INVOICE_XML_SNAPSHOT_20180813', 'INVOICE_XML'
-
KPF_INDEX 259.3 G
Take a snapshot (发票通):
hbase> snapshot 'KPF_INDEX','KPF_INDEX_SNAPSHOT_20180813'
List all current snapshots (发票通):
hbase> list_snapshots
Connect to the active Hadoop master node (发票通):
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot KPF_INDEX_SNAPSHOT_20180813 -copy-to hdfs://gx-dzfp-data-mng01:8020/hbase -mappers 64 -bandwidth 500 -overwrite
Clone a new table from the snapshot (国信):
hbase> clone_snapshot 'KPF_INDEX_SNAPSHOT_20180813', 'KPF_INDEX'
-
MOBILE_INDEX 50.6 G
Take a snapshot (发票通):
hbase> snapshot 'MOBILE_INDEX','MOBILE_INDEX_SNAPSHOT_20180813'
List all current snapshots (发票通):
hbase> list_snapshots
Connect to the active Hadoop master node (发票通):
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MOBILE_INDEX_SNAPSHOT_20180813 -copy-to hdfs://gx-dzfp-data-mng01:8020/hbase -mappers 64 -bandwidth 500 -overwrite
Clone a new table from the snapshot (国信):
hbase> clone_snapshot 'MOBILE_INDEX_SNAPSHOT_20180813', 'MOBILE_INDEX'
-
SPF_INDEX 9.5 G
Take a snapshot (发票通):
hbase> snapshot 'SPF_INDEX','SPF_INDEX_SNAPSHOT_20180813'
List all current snapshots (发票通):
hbase> list_snapshots
Connect to the active Hadoop master node (发票通):
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot SPF_INDEX_SNAPSHOT_20180813 -copy-to hdfs://gx-dzfp-data-mng01:8020/hbase -mappers 64 -bandwidth 500 -overwrite
Clone a new table from the snapshot (国信):
hbase> clone_snapshot 'SPF_INDEX_SNAPSHOT_20180813', 'SPF_INDEX'
-
EMAIL_INDEX 8.9 G
Take a snapshot (发票通):
hbase> snapshot 'EMAIL_INDEX','EMAIL_INDEX_SNAPSHOT_20180813'
List all current snapshots (发票通):
hbase> list_snapshots
Connect to the active Hadoop master node (发票通):
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot EMAIL_INDEX_SNAPSHOT_20180813 -copy-to hdfs://gx-dzfp-data-mng01:8020/hbase -mappers 64 -bandwidth 500 -overwrite
Clone a new table from the snapshot (国信):
hbase> clone_snapshot 'EMAIL_INDEX_SNAPSHOT_20180813', 'EMAIL_INDEX'
-
FPT_INDEX 4.6 G
Take a snapshot (发票通):
hbase> snapshot 'FPT_INDEX','FPT_INDEX_SNAPSHOT_20180813'
List all current snapshots (发票通):
hbase> list_snapshots
Connect to the active Hadoop master node (发票通):
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FPT_INDEX_SNAPSHOT_20180813 -copy-to hdfs://gx-dzfp-data-mng01:8020/hbase -mappers 64 -bandwidth 500 -overwrite
Clone a new table from the snapshot (国信):
hbase> clone_snapshot 'FPT_INDEX_SNAPSHOT_20180813', 'FPT_INDEX'
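A suggested verification (not part of the original runbook): after each clone completes, run the RowCounter job against the table on both clusters and compare the ROWS counter in the job output; the counts should match for a quiesced table.
# Run on both clusters and compare the resulting ROWS counter
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'EINVOICE_INFO'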
-
Configure incremental replication from 发票通 HBase to the 国信 HBase cluster (HBase replication)
Incremental HBase data synchronization uses HBase's built-in replication feature. All seven tables listed below must be included when configuring replication; the detailed configuration steps are in hbase replication.txt.
①EINVOICE_INFO
②INVOICE_XML
③KPF_INDEX
④MOBILE_INDEX
⑤SPF_INDEX
⑥EMAIL_INDEX
⑦FPT_INDEX
The detailed operations are as follows:
-
Add the 国信 HBase cluster machines to the /etc/hosts file on every machine in the 发票通 HBase cluster.
vim /etc/hosts
10.10.73.1 gx-dzfp-data-mng01
10.10.73.2 gx-dzfp-data-mng02
10.10.73.3 gx-dzfp-data-mng03
10.10.73.4 gx-dzfp-data-store01
10.10.73.5 gx-dzfp-data-store02
10.10.73.6 gx-dzfp-data-store03
10.10.73.7 gx-dzfp-data-store04
10.10.73.8 gx-dzfp-data-store05
10.10.73.9 gx-dzfp-data-store06
10.10.73.10 gx-dzfp-data-store07
10.10.73.11 gx-dzfp-data-store08
-
Add the following configuration to the hbase-site.xml file on the 发票通 HBase cluster and restart the HBase services:
# On the first HBase node (10.10.24.103)
# su - hadoop
# Back up the configuration file
# cp -a /opt/server/hbase-1.1.2/conf/hbase-site.xml /tmp/hbase-site.xml.bak
# vim /opt/server/hbase-1.1.2/conf/hbase-site.xml
<property>
<name>hbase.replication</name>
<value>true</value>
</property>
# Copy to the other nodes (10.10.24.104~105, 111~130)
# for i in {104..105};do scp /opt/server/hbase-1.1.2/conf/hbase-site.xml 10.10.24.$i:/opt/server/hbase-1.1.2/conf/hbase-site.xml; done
# for i in {111..130};do scp /opt/server/hbase-1.1.2/conf/hbase-site.xml 10.10.24.$i:/opt/server/hbase-1.1.2/conf/hbase-site.xml; done
# Restart the regionserver service on the HBase regionserver nodes (10.10.24.111~10.10.24.130)
# hbase-daemon.sh stop regionserver
# hbase-daemon.sh start regionserver
# Restart the HMaster service on the HBase Master nodes
# hbase-daemon.sh stop master
# hbase-daemon.sh start master
-
Real-time synchronization via HBase Replication (run on 发票通)
Note: setting REPLICATION_SCOPE to 1 marks the column family for replication; if the table already exists, use alter to modify the table attribute, as in the commands below.
-
EINVOICE_INFO
alter 'EINVOICE_INFO', {NAME=>'DATA',REPLICATION_SCOPE=>1}
add_peer '1',"gx-dzfp-data-mng01,gx-dzfp-data-mng02,gx-dzfp-data-mng03:2181:/hbase","EINVOICE_INFO"
-
INVOICE_XML
alter 'INVOICE_XML', {NAME=>'DATA',REPLICATION_SCOPE=>1}
add_peer '2',"gx-dzfp-data-mng01,gx-dzfp-data-mng02,gx-dzfp-data-mng03:2181:/hbase","INVOICE_XML"
-
KPF_INDEX
alter 'KPF_INDEX', {NAME=>'DATA',REPLICATION_SCOPE=>1}
add_peer '3',"gx-dzfp-data-mng01,gx-dzfp-data-mng02,gx-dzfp-data-mng03:2181:/hbase","KPF_INDEX"
-
MOBILE_INDEX
alter 'MOBILE_INDEX', {NAME=>'DATA',REPLICATION_SCOPE=>1}
add_peer '4',"gx-dzfp-data-mng01,gx-dzfp-data-mng02,gx-dzfp-data-mng03:2181:/hbase","MOBILE_INDEX"
-
SPF_INDEX
alter 'SPF_INDEX', {NAME=>'DATA',REPLICATION_SCOPE=>1}
add_peer '5',"gx-dzfp-data-mng01,gx-dzfp-data-mng02,gx-dzfp-data-mng03:2181:/hbase","SPF_INDEX"
-
EMAIL_INDEX
alter 'EMAIL_INDEX', {NAME=>'DATA',REPLICATION_SCOPE=>1}
add_peer '6',"gx-dzfp-data-mng01,gx-dzfp-data-mng02,gx-dzfp-data-mng03:2181:/hbase","EMAIL_INDEX"
-
FPT_INDEX
alter 'FPT_INDEX', {NAME=>'DATA',REPLICATION_SCOPE=>1}
add_peer '7',"gx-dzfp-data-mng01,gx-dzfp-data-mng02,gx-dzfp-data-mng03:2181:/hbase","FPT_INDEX"
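A suggested check after adding the peers (not in the original steps): confirm the peers are registered and replication is flowing, from the hbase shell.
hbase> list_peers
hbase> status 'replication'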
-
Synchronize the incremental HBase data generated during the 发票通 migration (program not yet provided by R&D)
This portion of the data will be synchronized by a dedicated program. The program has already been developed and tested and can be used directly; it will be provided to operations before the migration.
-
Import all table data from the current 国信 HBase cluster into the new HBase cluster (run during the environment cutover)
This part of the data is synchronized with HBase's export/import mechanism: export the data from the current 国信 production cluster with Export, then load it into the new cluster with Import. Stop the 国信 production services before exporting; after the import finishes, point the application's HBase configuration at the new cluster and restart the services.
The Export syntax is:
Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
For example: hbase org.apache.hadoop.hbase.mapreduce.Export INVOICE_XML /tmp/table/INVOICE_XML
The Import syntax is:
Import [options] <tablename> <inputdir>
For example: hbase org.apache.hadoop.hbase.mapreduce.Import INVOICE_XML /tmp/table/INVOICE_XML/
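The optional Export arguments allow bounded exports. As an illustrative example (not a step in this runbook; the timestamps are hypothetical epoch milliseconds), exporting one version of each cell written inside a time window looks like:
# Export 1 version of cells written between the two epoch-millisecond timestamps
hbase org.apache.hadoop.hbase.mapreduce.Export INVOICE_XML /tmp/table/INVOICE_XML_DELTA 1 1533052800000 1534262400000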
The detailed operations are as follows:
-
EINVOICE_INFO
# Run on the original 国信 HBase cluster
hbase org.apache.hadoop.hbase.mapreduce.Export EINVOICE_INFO /tmp/table/EINVOICE_INFO
hadoop fs -get /tmp/table/EINVOICE_INFO /tmp/table/
# Copy from the original 国信 cluster to the 国信 platform
scp -r /tmp/table gx-dzfp-data-mng01:/tmp/
# Run on the 国信 platform HBase cluster
hadoop fs -put /tmp/table/EINVOICE_INFO /tmp/table/EINVOICE_INFO
hbase org.apache.hadoop.hbase.mapreduce.Import EINVOICE_INFO /tmp/table/EINVOICE_INFO/
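One caveat worth adding (based on how HBase Import works, not stated in the original text): Import does not create the target table, so it must already exist on the new cluster with matching column families before the Import job runs. A hypothetical pre-step, assuming the 'DATA' column family used elsewhere in this document:
# Create the target table on the new cluster first; adjust the family to match the source schema
echo "create 'EINVOICE_INFO', 'DATA'" | hbase shell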
# On the original invoice cluster:
cat hbaseTable
AD_HOMEPAGE
APP_BIZ_TRIP_ITEM
BXDJ_INDEX
BXDJ_UPLOAD_PDF
CLOUD_INVOICE_XML
COPY_CONTENT
COPY_THEME
CZPJ
EINVOICE_INFO
EMAIL_INDEX
ENT_UPLOAD_RES
FFS_BX_INDEX
FM_FL_MEMBER
FPT_DWZ
FPT_INDEX
FPT_USER_HEAD
FP_BX_INDEX
FP_ZFB_INDEX
GROUP_INFO
INVOICE
INVOICE_ACCOUNT
INVOICE_IDX_SY
INVOICE_IDX_XY
INVOICE_PDFIMG
INVOICE_SEARCH_RS
INVOICE_XML
KPF_INDEX
KPF_STAT
LSH_INDEX
MEMBER_CONTRACT_DZD
MEMBER_CONTRACT_QYZT
MEMBER_ENT_APPLY
MEMBER_ENT_CONTRACT
MEMBER_ENT_KPJL_AD
MEMBER_ENT_KPJL_COO_EWM
MEMBER_ENT_KPJL_PL
MEMBER_ENT_ORDER
MEMBER_ENT_SHBG
MEMBER_INFO_ACTIVITY
MEMBER_INFO_GIFT
MEMBER_INFO_PRODUCT
MOBILE_INDEX
PREPOSEFILE
QZFW_FILEBATCH
SCAN_CK_RECORD
SCAN_MKRESULT_RECORD
SCAN_MK_RECORD
SPF_INDEX
SYSTEM_DICT_VALUE
USER_CA
USER_HEAD
VAT_FP_IMAGE
VAT_FP_LIST
WX_AD
WX_CARD_MSG
WX_CELL
WX_MEMBER_ENT_AD
WX_MEMBER_ENT_CELL
WX_MEMBER_ENT_EXT
WX_MENU_DATA
WX_REPLY_KEYWORD
YYPT_VERSION_LOG
# cat copy_hbase.sh
#!/bin/bash
# Export every table listed in ./hbaseTable to HDFS, then pull it to the local filesystem
for i in $(cat ./hbaseTable);do
hbase org.apache.hadoop.hbase.mapreduce.Export $i /tmp/table/$i
hadoop fs -get /tmp/table/$i /tmp/table/
done
# Copy from the original 国信 cluster to the 国信 platform
scp -r /tmp/table gx-dzfp-data-mng01:/tmp/
# New 国信 HBase cluster
# cat hbaseTable
# cat copy_hbase.sh
#!/bin/bash
# Push each table's export back into HDFS, then load it into HBase with Import
for i in $(cat ./hbaseTable);do
hadoop fs -put /tmp/table/$i /tmp/table/$i
hbase org.apache.hadoop.hbase.mapreduce.Import $i /tmp/table/$i/
done
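A suggested spot check after the batch import (not part of the original scripts): run RowCounter over each table on the new cluster and log the ROWS counters for comparison with the source cluster.
# cat rowcount_check.sh  -- hypothetical helper, assuming the same ./hbaseTable list
#!/bin/bash
for i in $(cat ./hbaseTable);do
hbase org.apache.hadoop.hbase.mapreduce.RowCounter $i 2>&1 | grep "ROWS" >> /tmp/rowcount_new.log
done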