A First Look at leveldb

February 26, 2012

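NoSQL systems have been springing up like mushrooms after rain lately, and I recently heard that HyperDex is all the rage. So as not to fall behind, and having some free time today, I played around with leveldb. I picked leveldb for three reasons: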

0. It's made by Google;

1. One of its authors is Jeff Dean;

2. I can't afford to run HBase;

So leveldb made a good subject to experiment with.

Now to the main topic, in FAQ form. Plenty of people have covered this already, so I've borrowed from them:
Q: What is leveldb?
A:LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
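In other words, leveldb is an efficient key-value storage library implemented by Google, and, importantly, it is open source. Reportedly, v1.2 can already handle data on the order of a billion entries.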

Q: What are leveldb's main features?
A: The Features list, copied from the official site:
0.Keys and values are arbitrary byte arrays.
1.Data is stored sorted by key.
2.Callers can provide a custom comparison function to override the sort order.
3.The basic operations are Put(key,value), Get(key), Delete(key).
4.Multiple changes can be made in one atomic batch.
5.Users can create a transient snapshot to get a consistent view of data.
6.Forward and backward iteration is supported over the data.
7.Data is automatically compressed using the Snappy compression library.
8.External activity (file system operations etc.) is relayed through a virtual interface so users can customize the operating system interactions.
9.Detailed documentation about how to use the library is included with the source code.
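To make features 0 through 6 a little more concrete, here is a toy in-memory analogue in Java built on java.util.TreeMap. It is only an illustration under stated assumptions: ToyKv and its methods are hypothetical names, Strings stand in for byte arrays, there is no persistence or compression, and this is not leveldb's C++ interface. Copy-on-write is used here only because it is the simplest way to get atomic batches and cheap snapshots. leveldb itself relies on sequence numbers and a write-ahead log instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory analogue of an ordered key-value store; NOT leveldb's API.
// Strings stand in for leveldb's arbitrary byte-array keys and values.
class ToyKv {
    // Copy-on-write: each write builds a new map and swaps the reference, so a map
    // handed out by snapshot() is never modified afterwards. Assumes a single writer.
    private volatile TreeMap<String, String> data;

    // A custom comparator overrides the default key order.
    ToyKv(Comparator<String> cmp) {
        this.data = new TreeMap<>(cmp);
    }

    // A group of puts/deletes that write() applies all-or-nothing.
    static class Batch {
        private final List<String[]> ops = new ArrayList<>(); // {key, value}; null value = delete

        Batch put(String key, String value) { ops.add(new String[] {key, value}); return this; }
        Batch delete(String key) { ops.add(new String[] {key, null}); return this; }
    }

    void write(Batch batch) {
        TreeMap<String, String> next = new TreeMap<>(data); // the copy keeps the comparator
        for (String[] op : batch.ops) {
            if (op[1] == null) next.remove(op[0]);
            else next.put(op[0], op[1]);
        }
        data = next; // one reference swap: readers see the whole batch or none of it
    }

    void put(String key, String value) { write(new Batch().put(key, value)); }
    void delete(String key) { write(new Batch().delete(key)); }
    String get(String key) { return data.get(key); } // null when the key is absent

    // Read-only, consistent view of the data as of this call.
    NavigableMap<String, String> snapshot() {
        return Collections.unmodifiableNavigableMap(data);
    }

    // Keys in sort order (forward) or reverse sort order (backward).
    List<String> keys(boolean reverse) {
        TreeMap<String, String> view = data;
        return new ArrayList<>(reverse ? view.descendingKeySet() : view.navigableKeySet());
    }
}
```

For example, `new ToyKv(Comparator.reverseOrder())` keeps keys in descending order. A map returned by `snapshot()` keeps showing the old contents after later writes.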

Q: How does a beginner check out and build the code?
A:
Check out the source with either git or svn:
git clone https://code.google.com/p/leveldb/
svn co http://leveldb.googlecode.com/svn/trunk/ leveldb

There are many ways to build it. Here is my environment:

Linux **** 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 21:21:01 UTC 2011 i686 GNU/Linux
MemTotal        : 1026484 kB
model name	: Intel(R) Core(TM)2 Duo CPU     P9400  @ 2.40GHz
stepping	: 10
cpu MHz		: 2298.463
cache size	: 6144 KB
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)

You know the rest: just run make ^-^

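Q: Benchmark numbers?
A: Resources were limited, so I ran the tests in a virtual machine with the system described above. To keep the number of files down, I raised the sst file target size from the default 2 MB to 32 MB (version_set.cc:23: static const int kTargetFileSize = 32 * 1048576;). Here are two sets of results: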

$./db_bench --num=100000 --write_buffer_size=$((256*1024*1024))
LevelDB:    version 1.2
Date:       Sun Feb 26 16:53:11 2012
CPU:        1 * Intel(R) Core(TM)2 Duo CPU     P9400  @ 2.40GHz
CPUCache:   6144 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    100000
RawSize:    11.1 MB (estimated)
FileSize:   6.3 MB (estimated)
WARNING: Snappy compression is not enabled
------------------------------------------------
fillseq      :      14.581 micros/op;    7.6 MB/s
fillsync     :   91140.200 micros/op;    0.0 MB/s (100 ops)
fillrandom   :      24.181 micros/op;    4.6 MB/s
overwrite    :      32.435 micros/op;    3.4 MB/s
readrandom   :      25.788 micros/op;
readrandom   :      19.767 micros/op;
readseq      :       6.858 micros/op;   16.1 MB/s
readreverse  :      54.625 micros/op;    2.0 MB/s
compact      : 2306046.000 micros/op;
readrandom   :      54.663 micros/op;
readseq      :       2.142 micros/op;   51.6 MB/s
readreverse  :       8.731 micros/op;   12.7 MB/s
fill100K     :    5607.600 micros/op;   17.0 MB/s (100 ops)
crc32c       :      21.006 micros/op;  186.0 MB/s (4K per op)
snappycomp   :  204255.000 micros/op; (snappy failure)
snappyuncomp :  175050.000 micros/op; (snappy failure)
acquireload  :       4.271 micros/op; (each op is 1000 loads)

$./db_bench --num=1000000 --write_buffer_size=$((256*1024*1024))
LevelDB:    version 1.2
Date:       Sat Feb 25 17:26:31 2012
CPU:        1 * Intel(R) Core(TM)2 Duo CPU     P9400  @ 2.40GHz
CPUCache:   6144 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
WARNING: Snappy compression is not enabled
------------------------------------------------
fillseq      :      10.199 micros/op;   10.8 MB/s
fillsync     :   87355.685 micros/op;    0.0 MB/s (1000 ops)
fillrandom   :      16.159 micros/op;    6.8 MB/s
overwrite    :      23.154 micros/op;    4.8 MB/s
readrandom   :      32.558 micros/op;
readrandom   :      35.783 micros/op;
readseq      :       4.380 micros/op;   25.3 MB/s
readreverse  :       6.911 micros/op;   16.0 MB/s
compact      : 7270752.000 micros/op;
readrandom   :      28.186 micros/op;
readseq      :       0.837 micros/op;  132.2 MB/s
readreverse  :       2.349 micros/op;   47.1 MB/s
fill100K     :    2924.748 micros/op;   32.6 MB/s (1000 ops)
crc32c       :      10.733 micros/op;  363.9 MB/s (4K per op)
snappycomp   :   34784.000 micros/op; (snappy failure)
snappyuncomp :   46721.000 micros/op; (snappy failure)
acquireload  :       2.106 micros/op; (each op is 1000 loads)
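The derived numbers in these reports can be checked by hand. The small Java sketch below assumes that each entry counts as key bytes plus value bytes: 16 + 100 = 116 bytes raw, and 16 + 50 = 66 bytes using the 50-byte compressed value from the header. It also assumes that 1 MB means 1048576 bytes. BenchMath and its methods are hypothetical helpers, not part of db_bench.

```java
// Hypothetical helper that reproduces db_bench's derived figures by hand.
// Assumptions: an entry counts as key bytes + value bytes, and 1 MB = 1048576 bytes.
class BenchMath {
    static final double MB = 1048576.0;

    // Operations per second implied by a per-operation latency in microseconds.
    static double opsPerSec(double microsPerOp) {
        return 1e6 / microsPerOp;
    }

    // MB/s, given how many bytes each operation touches.
    static double mbPerSec(long bytesPerOp, double microsPerOp) {
        return bytesPerOp * opsPerSec(microsPerOp) / MB;
    }

    // Estimated data size in MB for `entries` entries of `bytesPerEntry` bytes each.
    static double sizeMb(long entries, long bytesPerEntry) {
        return entries * bytesPerEntry / MB;
    }

    // Round to one decimal place, matching how the report prints these numbers.
    static double round1(double x) {
        return Math.round(x * 10) / 10.0;
    }

    public static void main(String[] args) {
        long raw = 16 + 100;   // 16-byte key + uncompressed 100-byte value
        long packed = 16 + 50; // 16-byte key + value after the assumed 2:1 compression
        System.out.println("RawSize  (1M entries): " + round1(sizeMb(1_000_000, raw)) + " MB");
        System.out.println("FileSize (1M entries): " + round1(sizeMb(1_000_000, packed)) + " MB");
        System.out.println("fillseq    10.199 us/op: " + round1(mbPerSec(raw, 10.199)) + " MB/s");
        System.out.println("readseq     0.837 us/op: " + round1(mbPerSec(raw, 0.837)) + " MB/s");
        System.out.println("fillrandom 16.159 us/op: " + Math.round(opsPerSec(16.159)) + " ops/s");
    }
}
```

Under those assumptions, running main prints 110.6 MB and 62.9 MB, which are the RawSize and FileSize of the 1,000,000-entry run. It also prints 10.8 MB/s for fillseq, 132.2 MB/s for the second readseq, and 61885 ops/s for fillrandom. Using 100016 bytes per op (a 100,000-byte value plus the key), the same formula gives the 32.6 MB/s of fill100K. It also explains why fillsync shows 0.0 MB/s: at about 87 ms per op, it moves only about 1.3 KB/s.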

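Summary: leveldb is a lightweight, local key-value storage library. In these runs, random reads were only average (roughly 20 to 55 micros/op), while random writes were good (roughly 16 to 24 micros/op). Keep in mind that the tests ran in a VM with Snappy compression disabled. Next I'll dig into how it is implemented.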

For more benchmark results, see http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html

Recommended HBase Blog Posts

January 10, 2012

While reading up on HBase I came across several good blog posts. Here they are:

Basics
[Introduction to HBase - Taobao Data Platform & Products Dept. - 泽远]http://www.tbdata.org/archives/1509
[HBase Technology Overview - eTao - 莫问]http://www.searchtb.com/2011/01/understanding-hbase.html
[Introduction to HFile - Taobao Data Platform & Products Dept. - 玄侯]http://www.tbdata.org/archives/1551

Advanced
[HBase Storage Architecture - Trend Micro - SPN R&D Team]http://www.spnguru.com/tag/hbase-%E5%AD%98%E5%82%A8%E6%9E%B6%E6%9E%84
[How the HBase Client Routes to the Right RegionServer - Trend Micro - SPN R&D Team]http://www.spnguru.com/2010/07/hbase%E4%B8%AD%E7%9A%84client%E5%A6%82%E4%BD%95%E8%B7%AF%E7%94%B1%E5%88%B0%E6%AD%A3%E7%A1%AE%E7%9A%84regionserver/

Key Data Structures in HMaster

December 25, 2011

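1. RegionState: region states and what they mean

(1) OFFLINE: offline

(2) PENDING_OPEN: an open command has been sent but not yet executed

(3) OPENING: the open command is being executed but has not finished

(4) OPEN: the open command has finished and the region's meta information has been updated

(5) PENDING_CLOSE: a close command has been sent but not yet executed

(6) CLOSING: the close command is being executed but has not finished

(7) CLOSED: the close command has finished and the region's meta information has been updated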
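The seven states can be pictured as a small state machine. The Java enum below is a hypothetical simplification, not HBase's own RegionState class. It assumes only the happy-path cycle implied by the descriptions above, in which OFFLINE leads through PENDING_OPEN, OPENING and OPEN, then PENDING_CLOSE, CLOSING and CLOSED, and back to OFFLINE. The real master also has to cope with failures and timeouts.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical simplification of the region states listed above; not HBase's RegionState class.
enum RegionStateSketch {
    OFFLINE, PENDING_OPEN, OPENING, OPEN, PENDING_CLOSE, CLOSING, CLOSED;

    // Assumed happy-path transitions only; failures and timeouts are not modeled.
    private static final Map<RegionStateSketch, Set<RegionStateSketch>> NEXT =
            new EnumMap<>(RegionStateSketch.class);

    static {
        NEXT.put(OFFLINE, EnumSet.of(PENDING_OPEN));  // master sends an open command
        NEXT.put(PENDING_OPEN, EnumSet.of(OPENING));  // the open command starts executing
        NEXT.put(OPENING, EnumSet.of(OPEN));          // open finished, meta updated
        NEXT.put(OPEN, EnumSet.of(PENDING_CLOSE));    // master sends a close command
        NEXT.put(PENDING_CLOSE, EnumSet.of(CLOSING)); // the close command starts executing
        NEXT.put(CLOSING, EnumSet.of(CLOSED));        // close finished, meta updated
        NEXT.put(CLOSED, EnumSet.of(OFFLINE));        // ready to be assigned again
    }

    // True if this sketch allows moving from the current state to `target`.
    boolean canMoveTo(RegionStateSketch target) {
        return NEXT.get(this).contains(target);
    }
}
```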

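2. TimeoutMonitor: the RegionState timeout monitor (a thread)

Purpose: watches region transition operations for timeouts

Defaults: 180 s timeout, checked every 10 s

TimeoutMonitor periodically checks whether any region has stayed in its state too long:

(1) If a state has timed out, it re-runs the assign or unassign operation that fits the state that timed out;

(2) If nothing has timed out, it adjusts the check period and keeps checking.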
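One pass of the timeout check might look like the sketch below. Everything here is an assumption made for illustration. The class and method names are hypothetical, region states are plain strings, and the pairing of states with actions is inferred from the description rather than taken from HBase's source: re-assign for PENDING_OPEN/OPENING, re-unassign for PENDING_CLOSE/CLOSING. Only the 180 s and 10 s defaults come from the text.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of one TimeoutMonitor pass; names and structure are assumptions.
class TimeoutMonitorSketch {
    static final long TIMEOUT_MS = 180_000; // 180 s timeout (default quoted above)
    static final long PERIOD_MS = 10_000;   // 10 s between passes (default quoted above)

    // A region's current state and the time (ms) at which it entered that state.
    static final class Stamped {
        final String state;
        final long stampMs;

        Stamped(String state, long stampMs) {
            this.state = state;
            this.stampMs = stampMs;
        }
    }

    // One pass: returns region name -> action for every region whose state has timed out.
    static Map<String, String> check(Map<String, Stamped> inTransition, long nowMs) {
        Map<String, String> actions = new TreeMap<>();
        for (Map.Entry<String, Stamped> e : inTransition.entrySet()) {
            Stamped r = e.getValue();
            if (nowMs - r.stampMs < TIMEOUT_MS) continue; // not timed out: look again next period
            switch (r.state) {
                case "PENDING_OPEN":
                case "OPENING":
                    actions.put(e.getKey(), "reassign"); // assumed: retry the open
                    break;
                case "PENDING_CLOSE":
                case "CLOSING":
                    actions.put(e.getKey(), "unassign"); // assumed: retry the close
                    break;
                default:
                    break; // other states are left alone in this sketch
            }
        }
        return actions;
    }
}
```

A real monitor would call check once per period, for example from a ScheduledExecutorService using scheduleAtFixedRate(task, 10, 10, TimeUnit.SECONDS).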

3. AssignmentManager: HMaster's region assignment unit. Its key data structures:

(1) servers: TreeMap<HServerInfo, List<HRegionInfo>>, maps each server to the regions assigned to it;

(2) regions: TreeMap<HRegionInfo, HServerInfo>, maps each region to the server it is assigned to;

(3) regionsInTransition: ConcurrentSkipListMap<String, RegionState>, maps the name of each region in transition to its state;

(4) regionPlans: TreeMap<String, RegionPlan>, maps a region name to its RegionPlan;
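Because servers and regions describe the same assignments from two directions, every update has to touch both. The sketch below shows one way to keep them consistent. It is hypothetical: Strings stand in for HServerInfo, HRegionInfo, RegionState and RegionPlan, the method names are illustrative, and there is no locking. The point is only the bookkeeping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch of AssignmentManager-style bookkeeping; Strings replace the HBase types.
class AssignmentSketch {
    final TreeMap<String, List<String>> servers = new TreeMap<>(); // server -> regions on it
    final TreeMap<String, String> regions = new TreeMap<>();       // region -> server
    final ConcurrentSkipListMap<String, String> regionsInTransition =
            new ConcurrentSkipListMap<>();                         // region -> state
    final TreeMap<String, String> regionPlans = new TreeMap<>();   // region -> planned destination

    // A region starts moving: remember its state and where it is meant to go.
    void startTransition(String region, String state, String destination) {
        regionsInTransition.put(region, state);
        regionPlans.put(region, destination);
    }

    // The region is now open on `server`: update both directions, clear transition data.
    void regionOnline(String region, String server) {
        regionOffline(region); // drop any previous location first
        regions.put(region, server);
        servers.computeIfAbsent(server, s -> new ArrayList<>()).add(region);
        regionsInTransition.remove(region);
        regionPlans.remove(region);
    }

    // Forget where the region was, keeping the two maps mirror images of each other.
    void regionOffline(String region) {
        String old = regions.remove(region);
        if (old != null) {
            List<String> onServer = servers.get(old);
            onServer.remove(region);
            if (onServer.isEmpty()) servers.remove(old);
        }
    }
}
```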

4. ServerManager: manages RegionServers, including their startups, shutdowns and deaths.

Its key data structures:

(1) onlineServers: ConcurrentHashMap<String, HServerInfo>, the online region servers;

(2) serverConnections: HashMap<String, HRegionInterface>, maps a region server name to its RPC connection;

(3) deadservers: HashSet<String>, the dead region servers;

Main processing logic, i.e. how region servers and the master interact:

(1) void regionServerStartup(final HServerInfo serverInfo, long serverCurrentTime): a new region server comes online;

(2) HMsg [] regionServerReport(final HServerInfo serverInfo, final HMsg [] msgs, final HRegionInfo[] mostLoadedRegions): the heartbeat report a region server sends to the master;
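The sketch below shows how startup, heartbeat and death could move a server between onlineServers and deadservers. It is hypothetical, not HBase's code. Server names are Strings, and a last-report timestamp stands in for HServerInfo. serverConnections and the HMsg[] replies are omitted. How the master detects a death is out of scope: expireServer is simply called once that decision has been made. Rejecting a server that is already on the dead list is an assumption of this sketch.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of ServerManager-style bookkeeping; not HBase's implementation.
class ServerManagerSketch {
    final ConcurrentHashMap<String, Long> onlineServers = new ConcurrentHashMap<>(); // name -> last report (ms)
    final Set<String> deadServers = new HashSet<>();

    // A new region server comes online (cf. regionServerStartup).
    void regionServerStartup(String server, long nowMs) {
        // Assumption: a server already declared dead may not simply rejoin.
        if (deadServers.contains(server)) throw new IllegalStateException(server + " is dead");
        onlineServers.put(server, nowMs);
    }

    // Heartbeat from a region server (cf. regionServerReport): refresh its timestamp.
    void regionServerReport(String server, long nowMs) {
        if (!onlineServers.containsKey(server)) {
            throw new IllegalStateException(server + " is not online");
        }
        onlineServers.put(server, nowMs);
    }

    // Called once the master has decided the server is dead.
    void expireServer(String server) {
        onlineServers.remove(server);
        deadServers.add(server);
    }
}
```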

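5. HBaseServer: provides the master's RPC service

(1) Starts a Listener thread that listens for client requests;

(2) Starts a Responder thread that writes the contents of the response queue back to each client's connection;

(3) Starts N Handler threads (N defaults to 10) that process the request queue and put their results on the response queue;

(4) Starts M Handler threads (M defaults to 0) that handle priority requests;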
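The thread layout above is a classic producer/consumer pipeline built around two queues. The socket-free Java sketch below shows that shape. It is an illustration only, and RpcServerSketch and its methods are hypothetical. The Listener role is reduced to submit(), the Responder role is reduced to drainResponses(), which just collects results instead of writing them to connections, and the optional priority handlers (M, default 0) are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical, socket-free sketch of the listener / handler / responder queue layout.
class RpcServerSketch {
    private final BlockingQueue<String> callQueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();
    private final List<Thread> handlers = new ArrayList<>();

    // Start `handlerCount` handler threads (N in the text) that run `service` on each call.
    RpcServerSketch(int handlerCount, Function<String, String> service) {
        for (int i = 0; i < handlerCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String call = callQueue.take();         // block until a request arrives
                        responseQueue.add(service.apply(call)); // hand the result to the responder
                    }
                } catch (InterruptedException e) {
                    // stop() interrupts the handlers; just exit the loop
                }
            }, "handler-" + i);
            t.setDaemon(true);
            handlers.add(t);
            t.start();
        }
    }

    // Listener role: accept a request and queue it for the handlers.
    void submit(String call) {
        callQueue.add(call); // unbounded queue, so this never blocks
    }

    // Responder role: collect `n` responses (a real server writes them to client connections).
    List<String> drainResponses(int n) {
        List<String> out = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) out.add(responseQueue.take());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining responses", e);
        }
        return out;
    }

    void stop() {
        for (Thread t : handlers) t.interrupt();
    }
}
```

With N = 10 handlers, responses can come back in a different order from the requests. That is why a real server has to tag each response with the call and connection it belongs to.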

Book list

December 8, 2011

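《暗时间》 Dark Time (reading)

《完美商店》 The Perfect Store

《百年孤独》 One Hundred Years of Solitude

《启示录》

《失控》 Out of Control (reading)

《美国种族简史》 Ethnic America: A History (reading)

《青春》 (read)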

—————————————–

《HBase: The Definitive Guide》 (reading)

《深入理解java虚拟机》 Understanding the Java Virtual Machine in Depth (read)

Mahout, ZooKeeper

《Java编程思想》 Thinking in Java

《项目管理之美》 Making Things Happen