Elasticsearch ODM

Like Mongoose but for Elasticsearch. Define models, perform CRUD operations, and build advanced search queries. Most commands and functionality that exist in Mongoose exist in this library. All asynchronous functions use Bluebird Promises instead of callbacks.

This is currently the only ODM/ORM library that exists for Elasticsearch on Node.js. Waterline has a plugin for Elasticsearch, but it is incomplete and doesn't fully harness its searching power. Loopback has a storage plugin, but it also doesn't focus on important parts of Elasticsearch, such as mappings and efficient queries. This library automatically handles merging and updating Elasticsearch mappings based on your schema definition.

Installation

If you already have the official elasticsearch npm package installed, you can remove it. The underlying client is still available through the .client property of this library if you need it.

$ npm install elasticsearch-odm

Features

  • Easy to use API that mimics Mongoose, but cuts out the extras.
  • Models, Schemas and Elasticsearch specific type mapping.
  • Add Elasticsearch specific type options to your Schema, like boost, analyzer or score.
  • Utilizes bulk and scroll features from Elasticsearch when needed.
  • Easy search queries without generating your own DSL.
  • Seamlessly handles updating your Elasticsearch mappings based on your model's Schema.

Quick Start

You'll find the API is intuitive if you've used Mongoose or Waterline.

Example (no schema):

var elasticsearch = require('elasticsearch-odm');
var Car = elasticsearch.model('Car');
var car = new Car({
  type: 'Ford', color: 'Black'
});
elasticsearch.connect('my-index').then(function(){
  // be sure to call connect before bootstrapping your app.
  car.save().then(function(document){
    console.log(document);
  });
});

Example (using a schema):

var elasticsearch = require('elasticsearch-odm');
var carSchema = new elasticsearch.Schema({
  type: String,
  color: {type: String, required: true}
});
var Car = elasticsearch.model('Car', carSchema);

API Reference

Core

Core methods can be called directly on the Elasticsearch ODM instance. These include methods to configure, connect, and get information from your Elasticsearch database. Most methods act upon the official Elasticsearch client.

.connect(String/Object options) -> Promise

Returns a promise that is resolved when the connection is complete. Can be passed a single index name, or a full configuration object. The default host is localhost:9200 when no host is provided, or when just an index name is used. This method should be called at the start of your application.

If the index name does not exist, it is automatically created for you.

You can also add any of the Elasticsearch specific options, like SSL configs.

Example:

// when bootstrapping your application
var fs = require('fs'); // needed for the ssl.ca option below
var elasticsearch = require('elasticsearch-odm');

elasticsearch.connect({
  host: 'localhost:9200',
  index: 'my-index',
  logging: false, // true by default when NODE_ENV=development
  syncMapping: false, // see 'sync mapping' in Schemas documentation
  ssl: {
    ca: fs.readFileSync('./cacert.pem'),
    rejectUnauthorized: true
  }
});
// OR
elasticsearch.connect('my-index'); // default host localhost:9200

new Schema(Object options) -> Schema

Returns a new schema definition to be used for models.

.model(String modelName, Optional/Schema schema) -> Model

Creates and returns a new Model, like calling Mongoose.model(). Takes a type name, which in MongoDB is also known as the collection name. This is a global function that adds the model to the Elasticsearch ODM instance.

.client -> Elasticsearch

The raw instance of the underlying Elasticsearch client. Not usually needed, but it's there if you need it, for example to run queries that aren't provided by this library.

.stats()

Returns a promise that is resolved with index stats for the current Elasticsearch connections.

.removeIndex(String index)

Takes an index name, and completely destroys the index. Resolves the promise when it's complete.

.createIndex(String index, Object mappings)

Takes an index name, and a json string or object representing your mapping. Resolves the promise when it's complete.

Document

Like Mongoose, instances of models are considered documents, and are returned from calls like find() & create(). Documents include the following functions to make working with them easier.

.save() -> Document

Saves or updates the document. If it doesn't exist it is created. Like Mongoose, Elasticsearch's internal '_id' is copied to 'id' for you. If you'd like to force a custom id, you can set the id property before calling save(). Every document gets a createdOn and updatedOn property set to an ISO-8601 formatted timestamp.

.remove()

Removes the document and destroys the current document instance. No value is resolved, and missing documents are ignored.

.update(Object data) -> Document

Partially updates the document. Data passed will be merged with the document, and the updated version will be returned. This also sets the current model instance with the new document.

.set(Object data) -> Document

Completely overwrites the document with the data passed, and returns the new document. This also sets the current model instance with the new document.

Will remove any fields in the document that aren't passed.
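
The difference between .update() and .set() is easiest to see on plain objects. The sketch below is not the library's code: mergeUpdate and overwrite are hypothetical helpers that mimic the behaviour described above without touching Elasticsearch. A shallow merge is assumed, as is the id surviving an overwrite.

```javascript
// Hypothetical helpers that mimic the described semantics on plain objects.
// A shallow merge is assumed here; the library may merge differently.

// .update(data): fields in `data` are merged into the existing document.
function mergeUpdate(doc, data) {
  return Object.assign({}, doc, data);
}

// .set(data): the document is replaced; fields not passed are dropped.
// Assumed: the id is kept so the result still refers to the same document.
function overwrite(doc, data) {
  return Object.assign({ id: doc.id }, data);
}

var car = { id: 'abc123', type: 'Ford', color: 'Black' };

console.log(mergeUpdate(car, { color: 'Red' }));
// { id: 'abc123', type: 'Ford', color: 'Red' }

console.log(overwrite(car, { color: 'Red' }));
// { id: 'abc123', color: 'Red' }
```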

.toObject()

Like Mongoose, strips all non-document properties from the instance and returns a raw object.

Model

Model definitions returned from .model() in core include several static functions to help query and manage documents. Most functions are similar to Mongoose, but due to the differences in Elasticsearch, querying includes some extra advanced features.

.count() -> Object

Object returned includes a 'count' property with the number of documents for this Model (also known as _type in Elasticsearch). See Elasticsearch count.

.create(Object data) -> Document

A helper function. Similar to calling new Model(data).save(). Takes an object, and returns the new document.

.update(String id, Object data) -> Document

A helper function. Similar to calling new Model().update(data). Takes an id and a partial object to update the document with.

.remove(String id)

Removes the document by its id. No value is resolved, and missing documents are ignored.

.removeByIds(Array ids)

Helper function, see remove. Takes an array of ids.

.set(String id, Object data) -> Document

Completely overwrites the document matching the id with the data passed, and returns the new document.

Will remove any fields in the document that aren't passed.

.find(Object/String match, Object queryOptions) -> Document

There are four ways to call .find() and its siblings. You can mix and match styles.

  • Passing only a match object like .find({name:'Joe'})
  • Passing only a string to match against all document fields .find('some string')
  • Passing Query Options (match can be set to null/empty) .find({}, {must: {active: true}, sort: 'createdOn'})
  • Use chaining options (alias for QueryOptions) .find({}).must({active: true}).sort('createdOn').then(..)

Unlike Mongoose, finding exact matches requires the fields in your mapping to be set to 'not_analyzed'. By default {index: 'not_analyzed'} is added to all string fields in your Schema unless you override it. Depending on the analyzer in your mapping, find queries like must, not, and match may not find any results.

match => Optional. An alias for the 'must' Query Option. Like Mongoose this matches name/value in documents. Also, instead of an object, just a string can be passed which will match against all document fields using the power of an Elasticsearch QueryStringQuery.

queryOptions => Optional (can also use chaining instead). An object with Query Options. Here you can specify paging, filtering, sorting and other advanced options. See Query Options for more details. You can set the first argument to null and only use filters from the query options if you want.

returns => Found documents, or null if nothing was found.

Example:

var Car = elasticsearch.model('Car');

// Simple query.
Car.find({color: 'blue'}).then(function(results){
  console.log(results);
});

// Nested query (for nested documents/properties).
Car.find({'location.city': 'New York'})

// Find all by passing null or empty object to first argument
Car.find(null, {sort: 'createdOn'})

// Search all fields using a QueryStringQuery.
Car.find('some text')

// Chained query without using Query Options.
// Instead of Mongoose .exec(), we call .then()
Car.find()
.must({color: 'blue'})
.exists('owner')
.sort('createdOn')
.then(...)

.findById(String id, Object queryOptions) -> Document

Finds a document by id. The queryOptions argument is optional. For example, its 'fields' option specifies the fields of the document you'd like to include.

.findByIds(Array ids, Object queryOptions) -> Document

Same as .findById() but for multiple documents.

.findOne(Object/String match, Object queryOptions) -> Document

Same arguments as .find(). Returns the first matching document.

.findAndRemove(Object/String match, Object queryOptions) -> 'Object'

Same arguments as .find(). Removes all matching documents and returns their raw objects.

.findOneAndRemove(Object/String match, Object queryOptions) -> 'Object'

Same arguments as .findAndRemove(). Removes the first found document.

.makeInstance(Object data) -> Document

Helper function. Takes a raw object and creates a document instance out of it. The object would need at least an id property. The document returned can be used normally as if it were returned from other calls like .find().

.toMapping()

Returns a complete Elasticsearch mapping for this model based on its schema. If no schema was used, it returns nothing. Used internally, but it's there if you'd like it.

Query Options

The query options object includes several options that are normally included in Mongoose chained queries, like sort and paging (skip/limit), and also some advanced features from Elasticsearch. The Elasticsearch Query and Filter DSL is generated using best practices.

page & per_page

Type: Integer

For most use cases, paging is better suited than skip/limit, so this library includes this instead. Page 0 and page 1 are the same thing, so either can be used. If only one of page or per_page is set, the other falls back to its default: page defaults to the first page, and per_page defaults to 10.

Including page or per_page will result in the response being wrapped in a metadata object like the following. You can call toJSON and toObject on this response and it will call that method on all document instances under the hits property.

// A paged response that is returned when page or per_page is set.
{
  total: 0, // total documents found for the query.
  hits: [], // a collection of document instances.
  page: 0, // current page requested.
  pages: 0 // total number of pages.
}
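
As a rough illustration of the paging rules above, the sketch below turns page and per_page into the from/size pair that an Elasticsearch search request uses, and computes a pages total. toFromSize and countPages are hypothetical helpers, not part of this library, and rounding the page count up is an assumption.

```javascript
var DEFAULT_PER_PAGE = 10; // default stated above

// Page 0 and page 1 both mean "the first page".
function toFromSize(page, perPage) {
  var size = perPage || DEFAULT_PER_PAGE;
  var pageIndex = Math.max((page || 1) - 1, 0);
  return { from: pageIndex * size, size: size };
}

// Assumed: a partially filled last page still counts as a page.
function countPages(total, perPage) {
  return Math.ceil(total / (perPage || DEFAULT_PER_PAGE));
}

console.log(toFromSize(0));      // { from: 0, size: 10 }
console.log(toFromSize(1));      // { from: 0, size: 10 }
console.log(toFromSize(3, 25));  // { from: 50, size: 25 }
console.log(countPages(95, 10)); // 10
```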

fields

Type: Array or String

A list of fields to include in the documents returned. For example, you could pass 'id' to only return the matching document ids. See Elasticsearch Fields.

// Query Options.
{
  fields: ['name', 'age']
}

// Chained Query.
.find()
.fields(['name', 'age'])
.then(...)

sort

Type: Array or String

A list of fields to sort on. If multiple fields are passed then they are executed in order. Adding a '-' sign to the start of the field name makes it sort descending. Default is ascending. See Elasticsearch Sort.

Example:

// Query Options.
{
  sort: ['name', 'createdOn']
}

// Chained Query.
.find()
.sort(['name', 'createdOn'])
.then(...)
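
The '-' prefix maps naturally onto the sort clause of an Elasticsearch search body. The sketch below is an assumption about the shape of the resulting DSL, not code from this library; toSortClause is a hypothetical helper.

```javascript
// Hypothetical helper: turns 'name' / '-createdOn' style field names into
// an Elasticsearch sort clause. Accepts a string or an array of strings.
function toSortClause(sort) {
  var fields = Array.isArray(sort) ? sort : [sort];
  return fields.map(function (field) {
    var descending = field.charAt(0) === '-';
    var name = descending ? field.slice(1) : field;
    var clause = {};
    clause[name] = { order: descending ? 'desc' : 'asc' };
    return clause;
  });
}

console.log(JSON.stringify(toSortClause(['name', '-createdOn'])));
// [{"name":{"order":"asc"}},{"createdOn":{"order":"desc"}}]
```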

q

Type: String

A string used to search all document fields with an Elasticsearch QueryStringQuery. This can be expensive, so use it sparingly.

Example:

// Query Options.
{
  q: 'Red dog run'
}

// Chained Query.
.find('Red dog run')
.then(...)
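
For reference, a QueryStringQuery in a raw Elasticsearch search body looks roughly like the output of the sketch below. toQueryString is a hypothetical helper, and the exact body this library sends may differ.

```javascript
// Hypothetical helper: wraps free text in a query_string query, which
// searches across fields rather than a single named one.
function toQueryString(text) {
  return { query: { query_string: { query: text } } };
}

console.log(JSON.stringify(toQueryString('Red dog run')));
// {"query":{"query_string":{"query":"Red dog run"}}}
```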

must

Type: Object

Key value pairs to match documents against. Essentially it's the same as the first argument passed to Mongoose .find(). This is also an alias to the first argument passed to .find() in this library. This is a 'must' Bool Filter.

Elasticsearch's internal tokenizers are used, and fields are analyzed.

You can query nested fields using dot notation.

Example:

// Query Options.
{
  must: {
    name: 'Jim',
    'location.country': 'Canada'
  }
}

// Chained Query.
.find()
.must({name: 'Jim', 'location.country': 'Canada'})
.then(...)

not

Type: Object

The same as must, but matches documents where the key value pairs DON'T match. This is a 'must_not' Bool Filter query.

You can query nested fields using dot notation.

Example:

// Query Options.
{
  not: {
    name: 'Jim',
    'location.country': 'Canada'
  }
}

// Chained Query.
.find()
.not({name: 'Jim', 'location.country': 'Canada'})
.then(...)

missing

Type: Array or String

A single field name, or array of field names. Matches documents where these field names are missing. A field is considered missing when it is null, empty, or does not exist. See MissingFilter (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-missing-filter.html).

Example:

// Query Options.
{
  missing: ['description', 'name']
}

// Chained Query.
.find()
.missing(['description', 'name'])
.then(...)

exists

Type: Array or String

A single field name, or array of field names. Matches documents where these field names exist. The opposite of missing.

Example:

// Query Options.
{
  exists: ['description', 'name']
}

// Chained Query.
.find()
.exists(['description', 'name'])
.then(...)
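
Taken together, must, not, missing and exists describe a Bool Filter. The sketch below shows roughly what such a filter looks like in the older Elasticsearch filter DSL (the one that still has a missing filter). It is an assumption about the general shape, not the exact DSL this library generates; toBoolFilter is a hypothetical helper, and term filters are used for the key value pairs to keep it short.

```javascript
// Hypothetical helper: builds a bool filter from a Query Options object.
function toBoolFilter(options) {
  // {name: 'Jim'} -> [{term: {name: 'Jim'}}]
  function termFilters(pairs) {
    return Object.keys(pairs || {}).map(function (key) {
      var term = {};
      term[key] = pairs[key];
      return { term: term };
    });
  }

  // 'name' or ['name', 'age'] -> [{exists: {field: 'name'}}, ...]
  function fieldFilters(kind, fields) {
    return [].concat(fields || []).map(function (field) {
      var filter = {};
      filter[kind] = { field: field };
      return filter;
    });
  }

  return {
    bool: {
      must: termFilters(options.must)
        .concat(fieldFilters('exists', options.exists))
        .concat(fieldFilters('missing', options.missing)),
      must_not: termFilters(options.not)
    }
  };
}

var filter = toBoolFilter({
  must: { name: 'Jim' },
  not: { 'location.country': 'Canada' },
  exists: 'description'
});

console.log(JSON.stringify(filter, null, 2));
```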

Schemas

Models don't require schemas, but it's best to use them - especially if you'll be making search queries. Elasticsearch-odm will generate and update Elasticsearch with the proper mappings based on your schema definition. The schemas are similar to Mongoose, but several new field types have been added which Elasticsearch supports. These are: float, double, long, short, byte, binary, geo_point. Generally for numbers, only the Number type is needed (which converts to Elasticsearch integer). You can read more about these types in the Elasticsearch documentation.

NOTE

  • Types can be defined in several ways. The regular Mongoose types exist, or you can use the actual type names Elasticsearch uses.
  • You can also add any of the field options you see for Elasticsearch Core Types
  • String types will default to "index": "not_analyzed". See Custom Field Mappings. This is so the .find() call acts like it does in Mongoose by only finding exact matches. However, this prevents full text search on this field. Simply set {"index":"analyzed"} if you'd like full text search instead.

Example:

// Before saving a document with this schema, your Elasticsearch
// mappings will automatically be updated.

// Note the various ways you can define a schema field type.
var carSchema = new elasticsearch.Schema({
  // native type without options
  available: Boolean,
  // Elasticsearch type without options
  safetyRating: 'float',
  // native array type
  parts: [String],
  // Elasticsearch array type
  oldPrices: {type: ['double']},
  // with options
  color: {type: String, required: true},
  // a field named 'type' must be defined like the following.
  type: {type: String},
  // nested document
  owner: {
    name: String,
    age: Number,
    // force a required field
    location: {type: 'geo_point', required: true}
  },
  // nested document array
  inspections: [{
    date: Date,
    grade: Number
  }],
  // Enable full-text search of this field.
  // NOTE: with 'analyzed' fields, it's better to use the 'q' parameter in
  // queryOptions during searches instead of must/not or match
  description: {type: String, index: 'analyzed'},

  // Ignore_malformed is an Elasticsearch Core Type field option for numbers
  price: {type: 'double', ignore_malformed: true}
});
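
To make the type rules above concrete, here is a small sketch of how a single schema field might translate into an Elasticsearch field mapping. It only covers what this section states: Number becomes integer, strings default to not_analyzed, Elasticsearch type names pass through, and extra field options are copied. toFieldMapping is a hypothetical helper rather than the library's implementation, the boolean and date mappings are assumptions, and nested documents and arrays are left out.

```javascript
// Hypothetical helper, not the library's implementation.
function toFieldMapping(definition) {
  // Accept both `String` and `{type: String, ...options}` styles.
  var isOptionsObject = definition !== null &&
    typeof definition === 'object' && 'type' in definition;
  var options = isOptionsObject ? definition : { type: definition };
  var mapping = {};

  if (options.type === String || options.type === 'string') {
    mapping.type = 'string';
    mapping.index = 'not_analyzed'; // default described above
  } else if (options.type === Number) {
    mapping.type = 'integer';
  } else if (options.type === Boolean) {
    mapping.type = 'boolean'; // assumed
  } else if (options.type === Date) {
    mapping.type = 'date'; // assumed
  } else {
    mapping.type = options.type; // e.g. 'float', 'double', 'geo_point'
  }

  // Copy Elasticsearch field options. Assumed: `required` is a
  // schema-only flag and does not belong in the mapping.
  Object.keys(options).forEach(function (key) {
    if (key !== 'type' && key !== 'required') mapping[key] = options[key];
  });
  return mapping;
}

console.log(toFieldMapping(String));
// { type: 'string', index: 'not_analyzed' }
console.log(toFieldMapping({ type: String, index: 'analyzed' }));
// { type: 'string', index: 'analyzed' }
console.log(toFieldMapping({ type: 'double', ignore_malformed: true }));
// { type: 'double', ignore_malformed: true }
console.log(toFieldMapping(Number));
// { type: 'integer' }
```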

Hooks and Middleware

Schemas include pre and post hooks that function similar to Mongoose. Currently, there are pre/post hooks for 'save' and 'remove'.

Pre Hooks

Same conventions as Mongoose. The function takes a done() callback that must be called when your function is finished. this is scoped to the current document. Passing an Error to done() will cancel the current operation. For example, in a pre 'save' hook, passing an error to done() will cause the document not to be saved and will return your error to the save() caller's rejection handler.

var schema = new elasticsearch.Schema(...);
schema.pre('save', function(done){
  console.log(this); // this = the current document
  done(); // OR done(new Error('bad document'));
});
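
A pre 'save' hook is a natural place for validation. The sketch below defines the hook as a standalone function so it can be exercised without a connection. requireOwnerName and the owner.name rule are made up for illustration; registering the hook is the same schema.pre('save', ...) call shown above.

```javascript
// Hypothetical validation hook. `this` is the document being saved.
function requireOwnerName(done) {
  if (!this.owner || !this.owner.name) {
    // Passing an Error cancels the save and rejects the save() promise.
    return done(new Error('owner.name is required'));
  }
  done();
}

// Registration, as in the example above:
// schema.pre('save', requireOwnerName);

// The hook can be tried out directly with a plain object as `this`.
requireOwnerName.call({ owner: { name: 'Jim' } }, function (err) {
  console.log(err); // undefined
});
requireOwnerName.call({ color: 'Black' }, function (err) {
  console.log(err.message); // owner.name is required
});
```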

Post Hooks

Same conventions as Mongoose. Does not have a done() callback. Executed after the hooked method. The first argument is the current document, which may or may not be a document instance (e.g. post remove only receives the raw object, as the document no longer exists).

var schema = new elasticsearch.Schema(...);
schema.post('remove', function(document){
  console.log(document);
});

Static and Instance Methods

Add methods to your schema with the same convention as Mongoose.

// Instance method.
var schema = new elasticsearch.Schema(...);

schema.methods.getFullName = function(){
  return this.firstName + ' ' + this.lastName;
};

// Static method.
schema.statics.findByColor = function(color){
  return this.find({color: color});
};

Sync Mapping

By default, an attempt will be made on connection to convert your schema definitions into Elasticsearch mappings, and send a PUT mapping request to sync them. This can cause major issues if your schemas' mappings have conflicting types.

If you'd like to disable sync mapping, or if your node has mappings already configured, you can do it like so.

elasticsearch.connect({
  host: 'localhost:9200',
  index: 'my-index',
  syncMapping: false
});

CHANGELOG

See here.

CONTRIBUTING

This is a library Elasticsearch desperately needed for Node.js. Currently the official npm elasticsearch client has about 23,000 downloads per week, and many of those users would benefit from this library instead. Pull requests are welcome. There are Mocha and benchmark tests in the root directory.

TODO

  • Browser build.
  • Add support for querying nested document arrays with dot notation syntax.
  • Add scrolling
  • Add a wrapper to enable streaming of document results.
  • Add snapshots/backups
  • Allow methods to call Elasticsearch facets.
  • Performance tweak application, fix garbage collection issues, and do benchmark tests.
  • Integrate npm 'friendly' for use with expanding/collapsing parent/child documents.
  • Use source filtering instead of fields.