|
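Java Self-Study Network (www.javazx.com) - Java forum, Java e-book recommendation: 《大数据猩球:海量数据处理实践指南》 (Big Data for Chimps: A Guide to Massive-Scale Data Processing in Practice)

Why this Java e-book is recommended: Using the metaphor of chimpanzees and elephants and a baseball statistics dataset, the book shows how to process large-scale data with tools such as Apache Hadoop and Pig. By working with real data and solving real-world problems, the authors also distill a set of practical analytic patterns, presented through worked examples, that give creative analysts the most powerful and valuable methods. The book is especially suited to readers who need a big data toolkit for solving practical problems.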
Authors: Philip Kromer and Russell Jurney (United States)
Publisher: Publishing House of Electronics Industry (电子工业出版社)
Publication date: August 2016

Java e-book table of contents:

Preface
Part I. Introduction: Theory and Tools
Chapter 1. Hadoop Basics
Chimpanzee and Elephant Start a Business
Map-Only Jobs: Process Records Individually
Pig Latin Map-Only Job
Setting Up a Docker Hadoop Cluster
Run the Job
Summary
Chapter 2. MapReduce
Chimpanzee and Elephant Save Christmas
Trouble in Toyland
Chimpanzees Turn Letters into Labeled Toy Forms
Pygmy Elephants Carry Each Toy Form to the Appropriate Workbench
Example: Reindeer Games
UFO Data
Group the UFO Sightings by Reporting Delay
Mapper
Reducer
Data Visualization
Reindeer Summary
Hadoop Versus Traditional Databases
The MapReduce Haiku
Map Phase, in Brief
Group-Sort Phase, in Brief
Reduce Phase, in Brief
Summary
Chapter 3. A Quick Look at the Baseball Dataset
The Data
Acronyms and Terminology
The Rules and Goals
Performance Metrics
Summary
Chapter 4. Introduction to Pig
Pig Helps Hadoop Work with Tables, Not Records
Wikipedia Visitor Counts
Fundamental Data Operations
Control Operations
Pipeline Operations
Structural Operations
LOAD Locates and Describes Your Data
Simple Types
Complex Type 1, Tuples: Fixed-Length Sequences of Typed Fields
Complex Type 2, Bags: Unbounded Collections of Tuples
Defining the Schema of a Transformed Record
STORE Writes Data to Disk
Helper Commands
DESCRIBE
DUMP
SAMPLE
ILLUSTRATE
EXPLAIN
Pig Functions
Piggybank
Apache DataFu
Summary
Part II. Tactics: Analytic Patterns
Chapter 5. Map-Only Operations
Pattern in Use
Eliminating Data
Selecting Records That Satisfy a Condition: FILTER and Related Operations
Selecting Records That Satisfy Multiple Conditions
Selecting or Rejecting Records with Null Values
Selecting Records That Match a Regular Expression (MATCHES)
Baidu Netdisk (Baidu Cloud) download link, search for the download address: |
|