ForceInjection/Big-Data-Theory-and-Practice


Big-Data-Theory-and-Practice

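Learning materials repository for the course "Big Data Theory and Practice"

1. Introduction

This repository collects the learning materials for the course "Big Data Theory and Practice": lecture notes, classic papers, reference books, environment setup guides, and hands-on exercises. It aims to give learners a systematic grounding in big data theory together with practical skills.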


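2. Reference Books

  • Designing Data-Intensive Applications (Chinese edition)
  • Hadoop: The Definitive Guide, 4th Edition (Chinese edition)
  • Design and Implementation of the Big Data Processing Framework Apache Spark (《大数据处理框架 Apache Spark 设计与实现》)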

3. Course

The course chapters are listed below:


4. Reference Papers

A collection of classic papers in the big data field.

| Year | Technology / System | Paper Title | Technical Area |
| --- | --- | --- | --- |
| 2003 | GFS | The Google File System | Distributed file system |
| 2004 | MapReduce | MapReduce: Simplified Data Processing on Large Clusters | Distributed computing framework |
| 2006 | Bigtable | Bigtable: A Distributed Storage System for Structured Data | Distributed database |
| 2006 | Chubby | The Chubby lock service for loosely-coupled distributed systems | Distributed lock service |
| 2007 | Thrift | Thrift: Scalable cross-language services implementation | RPC framework |
| 2009 | Hive | Hive: A warehousing solution over a map-reduce framework | Data warehouse |
| 2010 | Dremel | Dremel: Interactive analysis of web-scale datasets | Interactive query engine |
| 2010 | Spark | Spark: Cluster computing with working sets | In-memory computing framework |
| 2010 | S4 | S4: Distributed stream computing platform | Stream computing platform |
| 2011 | Megastore | Megastore: Providing scalable, highly available storage for interactive services | Distributed storage |
| 2011 | Kafka | Kafka: A distributed messaging system for log processing | Message queue system |
| 2012 | Spanner | Spanner: Google's globally distributed database | Globally distributed database |
| 2014 | Storm | Storm@Twitter | Real-time stream processing |
| 2014 | Raft | In search of an understandable consensus algorithm | Distributed consensus algorithm |
| 2015 | Dataflow | The dataflow model: A practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing | Stream processing model |
| 2018 | PolarFS | PolarFS: an ultra-low latency and failure resilient distributed file system for shared storage cloud database | Cloud-native file system |
| 2020 | Delta Lake | Delta Lake: high-performance ACID table storage over cloud object stores | Data lake storage |
| 2021 | Lakehouse | Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics | Lakehouse architecture |
| 2023 | HTAP survey | HTAP 数据库关键技术综述 (A Survey of Key Technologies for HTAP Databases) | Hybrid transactional/analytical processing |
| 2024 | Cloud-native database survey | 云原生数据库综述 (A Survey of Cloud-Native Databases) | Cloud-native databases |
| 2024 | Iceberg | Apache Iceberg: The Definitive Guide | Table format standard |

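5. Environment Setup

Hadoop environment configuration guides and deployment scripts.

5.1 Single-node cluster deployment (development and test environment)

5.2 Multi-node cluster deployment (assignment environment)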

