Hadoop: Two Ways to Implement a Join with MR Programming

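Preface: Implementing a join in MR in different ways is a good way to become more familiar with how a join is carried out and with the trade-offs of each approach. Remember: avoid hand-writing MR in production. This article is for study and reference only.
1. Requirements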
There are two tables: product information data and user page-click log data, shown below:
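# Product information data: product_info.txt
# c1=product ID (id), c2=product name (name), c3=price (price), c4=country of manufacture (country)
p0001,Huawei,8000,China
p0002,Xiaomi,3000,China
p0003,Apple,1500,USA
p0004,Samsung,10000,Korea

# User page-click log data: page_click_log.txt
# c1=user ID (id), c2=product ID (prod_id), c3=click time (click_time), c4=area where the action occurred (area)
u0001,p0001,20190301040123,Central China
u0002,p0002,20190302040124,North China
u0003,p0003,20190303040124,South China
u0004,p0004,20190304040124,South China

Because the click log is far too large, the data is stored on HDFS, so MR is used to implement the following logic:

select b.id, b.name, b.price, b.country, a.id, a.click_time, a.area
from page_click_log a
join product_info b on a.prod_id = b.id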
2. Map-side Join
2.1 Approach
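The small table can be distributed to every map node. Each mapper then joins it with the portion of the large table it reads locally and writes out the final result, so no reduce phase is needed for the join.
Pros and cons: the join runs with much higher parallelism and is fast. The drawback is that the small table has to fit in the memory of each map task.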
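The idea above can be sketched in plain Java, without any Hadoop classes. This is only an illustration under stated assumptions: the small table is loaded into a HashMap, standing in for what each mapper would hold in memory, and each click-log line is joined by looking up its prod_id. The class and method names (MapSideJoinSketch, loadProducts, joinLine) are hypothetical, and the sample rows are the ones from section 1 with names written in English.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSideJoinSketch {

    // Index the small table in memory: product id -> all fields of that product line.
    public static Map<String, String[]> loadProducts(List<String> productLines) {
        Map<String, String[]> products = new HashMap<>();
        for (String line : productLines) {
            String[] fields = line.split(",");
            if (fields.length == 4) {          // skip malformed lines
                products.put(fields[0], fields);
            }
        }
        return products;
    }

    // Join one click-log line against the index.
    // Returns null when the line is malformed or no product matches (inner join).
    public static String joinLine(String clickLine, Map<String, String[]> products) {
        String[] click = clickLine.split(",");
        if (click.length != 4) {
            return null;
        }
        String[] product = products.get(click[1]);   // click[1] is prod_id
        if (product == null) {
            return null;
        }
        // Column order follows the SELECT list:
        // b.id, b.name, b.price, b.country, a.id, a.click_time, a.area
        return String.join(",", product[0], product[1], product[2], product[3],
                click[0], click[2], click[3]);
    }

    public static void main(String[] args) {
        Map<String, String[]> products = loadProducts(Arrays.asList(
                "p0001,Huawei,8000,China",
                "p0002,Xiaomi,3000,China"));
        List<String> clicks = Arrays.asList(
                "u0001,p0001,20190301040123,Central China",
                "u0002,p0002,20190302040124,North China");
        for (String click : clicks) {
            String joined = joinLine(click, products);
            if (joined != null) {
                System.out.println(joined);
            }
        }
    }
}
```

In a real job the lookup table would presumably be built once per map task before any records are processed, and the per-line join would run inside the map method. The sketch keeps the two steps separate for that reason.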
2.2 Implementation
Data wrapper class Info.java
package com.wsk.bigdata.pojo;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class Info implements Writable {

    /** Unique product id */
    private String pId;
    /** Product name */
    private String pName;
    /** Product price */
    private float price;
    /** Region where the product is made */
    private String produceArea;
    /** User id */
    private String uId;
    /** User click timestamp: yyyyMMddHHmmss */
    private String dateStr;
    /** Region where the user click occurred */
    private String clickArea;
    /**
     * flag=0 means this object wraps user click-log data;
     * flag=1 means this object wraps product information
     */
    private String flag;

    public Info() {
    }

    public Info(String pId, String pName, float price, String produceArea,
                String uId, String dateStr, String clickArea, String flag) {
        this.pId = pId;
        this.pName = pName;
        this.price = price;
        this.produceArea = produceArea;
        this.uId = uId;
        this.dateStr = dateStr;
        this.clickArea = clickArea;
        this.flag = flag;
    }

    public String getpId() {
        return pId;
    }

    public void setpId(String pId) {
        this.pId = pId;
    }

    public String getpName() {
        return pName;
    }

    public void setpName(String pName) {
        this.pName = pName;
    }

    public float getPrice() {
        return price;
    }

    public void setPrice(float price) {
        this.price = price;
    }

    public String getProduceArea() {
        return produceArea;
    }

    public void setProduceArea(String produceArea) {
        this.produceArea = produceArea;
    }

    public String getuId() {
        return uId;
    }

    public void setuId(String uId) {
        this.uId = uId;
    }

    public String getDateStr() {
        return dateStr;
    }

    public void setDateStr(String dateStr) {
        this.dateStr = dateStr;
    }

    public String getClickArea() {
        return clickArea;
    }

    public void setClickArea(String clickArea) {
        this.clickArea = clickArea;
    }

    public String getFlag() {
        return flag;
    }

    public void setFlag(String flag) {
        this.flag = flag;
    }

    @Override
    public String toString() {
        return "pid=" + this.pId + ",pName=" + this.pName + ",price=" + this.price
                + ",produceArea=" + this.produceArea
                + ",uId=" + this.uId + ",clickDate=" + this.dateStr + ",clickArea=" + this.clickArea;
    }

    // Serialization: the field order here must match readFields exactly.
    // writeUTF throws NullPointerException for null, so every String field
    // must be set before writing (use "" for values that do not apply).
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(this.pId);
        out.writeUTF(this.pName);
        out.writeFloat(this.price);
        out.writeUTF(this.produceArea);
        out.writeUTF(this.uId);
        out.writeUTF(this.dateStr);
        out.writeUTF(this.clickArea);
        out.writeUTF(this.flag);
    }

    // Deserialization: reads the fields in the same order write emitted them.
    @Override
    public void readFields(DataInput in) throws IOException {
        this.pId = in.readUTF();
        this.pName = in.readUTF();
        this.price = in.readFloat();
        this.produceArea = in.readUTF();
        this.uId = in.readUTF();
        this.dateStr = in.readUTF();
        this.clickArea = in.readUTF();
        this.flag = in.readUTF();
    }
}
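The write and readFields methods of Info only work as a pair if they handle the fields in the same order. The following is a minimal sketch of that round trip using only java.io, so it runs without a Hadoop dependency. MiniInfo is a hypothetical cut-down stand-in for Info with three of its fields, and WritableRoundTripSketch and roundTrip are illustrative names, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableRoundTripSketch {

    // Hypothetical reduced version of Info: same write/readFields pattern, fewer fields.
    public static class MiniInfo {
        public String pId = "";    // "" instead of null, because writeUTF rejects null
        public float price;
        public String flag = "";

        public void write(DataOutput out) throws IOException {
            out.writeUTF(pId);
            out.writeFloat(price);
            out.writeUTF(flag);
        }

        // Must read in exactly the order write emitted the fields.
        public void readFields(DataInput in) throws IOException {
            pId = in.readUTF();
            price = in.readFloat();
            flag = in.readUTF();
        }
    }

    // Serialize the object to a byte array, then read it back into a fresh instance.
    public static MiniInfo roundTrip(MiniInfo original) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            original.write(out);
        }
        MiniInfo copy = new MiniInfo();
        try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            copy.readFields(in);
        }
        return copy;
    }

    public static void main(String[] args) throws IOException {
        MiniInfo product = new MiniInfo();
        product.pId = "p0001";
        product.price = 8000f;
        product.flag = "1";        // 1 = product information, as in Info
        MiniInfo copy = roundTrip(product);
        System.out.println(copy.pId + "," + copy.price + "," + copy.flag);
    }
}
```

Swapping two lines in readFields, for example reading the float before the first string, would make the copy come back corrupted or fail with an exception, which is why the two methods are best edited together.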