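Lucene was originally developed by the well-known Doug Cutting and open-sourced in 2000. It remains the go-to choice for open-source full-text search.
Although Lucene has been in development for more than a decade, it is still actively maintained to keep pace with growing data-analysis demands. The 6.0 release, the latest at the time of writing, introduced block k-d trees, which substantially improve search performance for numeric and geo-location fields.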
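# File Structure
|--- luceneInsert: data insertion module
|--- luceneInsertV2.0: data insertion module
|--- luceneQuery: performance testing module

v1.0 performance analysis: 30,000 records/second on CentOS 7.4 with 4 cores and 8 GB RAM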
# Lucene detail: index file formats
According to the "Summary of File Extensions" in the Lucene documentation, a Lucene 6.0 index contains the following files:

| Name | Extension | Brief Description |
|---|---|---|
| Segments File | segments_N | Stores information about a commit point |
| Lock File | write.lock | The Write lock prevents multiple IndexWriters from writing to the same file |
| Segment Info | .si | Stores metadata about a segment |
| Compound File | .cfs, .cfe | An optional "virtual" file consisting of all the other index files for systems that frequently run out of file handles |
| Fields | .fnm | Stores information about the fields |
| Field Index | .fdx | Contains pointers to field data |
| Field Data | .fdt | The stored fields for documents |
| Term Dictionary | .tim | The term dictionary, stores term info |
| Term Index | .tip | The index into the Term Dictionary |
| Frequencies | .doc | Contains the list of docs which contain each term along with frequency |
| Positions | .pos | Stores position information about where a term occurs in the index |
| Payloads | .pay | Stores additional per-position metadata information such as character offsets and user payloads |
| Norms | .nvd, .nvm | Encodes length and boost factors for docs and fields |
| Per-Document Values | .dvd, .dvm | Encodes additional scoring factors or other per-document information |
| Term Vector Index | .tvx | Stores offset into the document data file |
| Term Vector Documents | .tvd | Contains information about each document that has term vectors |
| Term Vector Fields | .tvf | The field level info about term vectors |
| Live Documents | .liv | Info about what files are live |
| Point values | .dii, .dim | Holds indexed points, if any |
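
When inspecting an index directory, it can help to map each file name back to the component it belongs to. The following is a minimal sketch using only the Java standard library, not part of this repository or of Lucene itself: the class name `IndexFileKinds` and the method `describe` are hypothetical, and the sample file names in `main` are illustrative assumptions rather than output from a real index. The extension-to-name mapping is copied from the list above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: maps Lucene 6.0 index file names to component names. */
final class IndexFileKinds {
    // Extension -> component name, copied from the file extension list above.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("si", "Segment Info");
        BY_EXTENSION.put("cfs", "Compound File");
        BY_EXTENSION.put("cfe", "Compound File");
        BY_EXTENSION.put("fnm", "Fields");
        BY_EXTENSION.put("fdx", "Field Index");
        BY_EXTENSION.put("fdt", "Field Data");
        BY_EXTENSION.put("tim", "Term Dictionary");
        BY_EXTENSION.put("tip", "Term Index");
        BY_EXTENSION.put("doc", "Frequencies");
        BY_EXTENSION.put("pos", "Positions");
        BY_EXTENSION.put("pay", "Payloads");
        BY_EXTENSION.put("nvd", "Norms");
        BY_EXTENSION.put("nvm", "Norms");
        BY_EXTENSION.put("dvd", "Per-Document Values");
        BY_EXTENSION.put("dvm", "Per-Document Values");
        BY_EXTENSION.put("tvx", "Term Vector Index");
        BY_EXTENSION.put("tvd", "Term Vector Documents");
        BY_EXTENSION.put("tvf", "Term Vector Fields");
        BY_EXTENSION.put("liv", "Live Documents");
        BY_EXTENSION.put("dii", "Point values");
        BY_EXTENSION.put("dim", "Point values");
    }

    /** Returns the component name for a file name, or "Unknown" if it is not recognized. */
    static String describe(String fileName) {
        // The lock file and commit points are identified by name, not by extension.
        if (fileName.equals("write.lock")) {
            return "Lock File";
        }
        if (fileName.startsWith("segments_")) {
            return "Segments File";
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return "Unknown";
        }
        String extension = fileName.substring(dot + 1);
        return BY_EXTENSION.getOrDefault(extension, "Unknown");
    }

    public static void main(String[] args) {
        // Illustrative file names only; real names depend on the segment and codec.
        String[] samples = {"segments_1", "write.lock", "_0.si", "_0.cfs", "_0.dim"};
        for (String name : samples) {
            System.out.println(name + " -> " + describe(name));
        }
    }
}
```

Looking up by the last extension keeps the sketch independent of the segment prefix in the file name; anything it does not recognize is reported as "Unknown" rather than guessed.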