Unofficial Node.js client for Apache Pinot. Uses undici to make HTTP requests to Pinot brokers.
- Fast HTTP queries using undici
- Simple interface for plugging in other HTTP client libraries
- Built-in sql template tag and safe escaping of values
- Support of raw and join for complex queries
- TypeScript support
- Support of Apache Pinot multi-stage engine
- Compatible with the Prettier SQL formatter and VS Code SQL syntax highlighters
NPM:
npm install pinot-noir
PNPM:
pnpm add pinot-noir
First, create a transport. This library currently ships with a single built-in transport, but you can supply your own implementation of the IPinotBrokerTransport interface.
The default is an HTTP JSON transport based on the undici HTTP client. It requires the URL of your broker and an API token.
import { PinotBrokerJSONTransport } from 'pinot-noir';
const pinotTransport = new PinotBrokerJSONTransport({
  brokerUrl: 'https://broker.pinot.my-cluster.example.startree.cloud', // replace with your broker url
  token: '<your-token>', // for docker-based demo pinot leave blank
});
Other options are described in the API docs.
The broker client is a wrapper class that uses the provided transport to query Pinot, handle the responses, and so on. In tests, you can easily supply the broker client with a mock transport.
import { PinotBrokerClient } from 'pinot-noir';
// ... init transport ...
const pinotClient = new PinotBrokerClient({ transport: pinotTransport });
To build SQL queries, this library provides the sql template tag, a modified version of the sql-template-tag library adapted to Apache Pinot syntax.
import { sql } from 'pinot-noir';
// ... setup transport and client ...
interface IResult {
  hits: number;
}
const year = 2010;
const query = sql`
select sum(hits) as hits
from baseballStats
where yearID > ${year}`;
const result = await pinotClient.select<IResult>(query, { timeoutMs: 1000 });
console.log('== Query ==');
console.log(result.sql);
console.log('');
console.log('== Results ==');
console.table(result.rows);
console.log('== Stats ==');
console.log(result.stats);
Query options such as timeoutMs are also applied to the HTTP request timeout, so a request does not run longer than you expect the query itself to run.
By default, Pinot transport requests are made through an HTTP connection pool with a maximum concurrency. If the request rate exceeds the maximum concurrency, requests are queued. By default the queue is unbounded, but it is strongly recommended to set a limit to prevent high memory consumption.
import { PinotBrokerJSONTransport } from 'pinot-noir';
const pinotTransport = new PinotBrokerJSONTransport({
  ...
  maxQueueSize: 1024,
});
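The queueing behaviour described above can be pictured with a small self-contained sketch. This is not how pinot-noir or undici implement their pool; `BoundedLimiter`, `maxConcurrency`, and the `queue is full` error are hypothetical names, used only to illustrate how a concurrency cap combined with a bounded queue keeps memory use predictable.

```typescript
// Illustrative only: a concurrency limiter with a bounded wait queue.
// All names here are hypothetical and not part of pinot-noir.
type Task<T> = () => Promise<T>;

class BoundedLimiter {
  private running = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly maxConcurrency: number;
  private readonly maxQueueSize: number;

  constructor(maxConcurrency: number, maxQueueSize: number = Number.POSITIVE_INFINITY) {
    this.maxConcurrency = maxConcurrency;
    this.maxQueueSize = maxQueueSize;
  }

  // Number of tasks currently executing.
  get active(): number {
    return this.running;
  }

  // Number of tasks waiting for a free slot.
  get queued(): number {
    return this.waiting.length;
  }

  run<T>(task: Task<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        this.running++;
        // Wrapping in a new Promise also captures synchronous throws from task().
        new Promise<T>((settleTask) => settleTask(task())).then(
          (value) => { this.release(); resolve(value); },
          (error) => { this.release(); reject(error); },
        );
      };

      if (this.running < this.maxConcurrency) {
        start(); // a slot is free: run immediately
      } else if (this.waiting.length < this.maxQueueSize) {
        this.waiting.push(start); // no free slot: wait in the queue
      } else {
        reject(new Error('queue is full')); // bounded queue: fail fast instead of growing
      }
    });
  }

  // Frees a slot and starts the next waiting task, if any.
  private release(): void {
    this.running--;
    const next = this.waiting.shift();
    if (next) next();
  }
}
```

With an unbounded queue (the default second argument in this sketch), every excess request stays in memory until a slot frees up, which is why setting a limit is the safer choice under sustained load.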
Since Pinot can hold different kinds of data, some requests may not tolerate queuing. To address this, you can specify a queue tolerance per query: the fraction of the maximum queue size that the request can tolerate.
For example, if maxQueueSize is 1000 and queueTolerance is 0.1, the query is performed only if the current queue is shorter than 1000 * 0.1 = 100. Otherwise, a QUEUE_TOLERANCE_LIMIT error is thrown.
For real-time data you can set queueTolerance to 0, so that requests are discarded whenever there is a queue.
import {
  PinotBrokerJSONTransport,
  PinotBrokerClient,
  PinotError,
  EPinotErrorType,
  EBrokerTransportErrorCode,
  sql,
} from 'pinot-noir';
// When the queue tolerance is exceeded, the thrown PinotError has:
//   type: EPinotErrorType.TRANSPORT,
//   code: EBrokerTransportErrorCode.QUEUE_TOLERANCE_LIMIT,
const pinotTransport = new PinotBrokerJSONTransport({
  ...
  maxQueueSize: 1000,
});
const pinotClient = new PinotBrokerClient({ transport: pinotTransport });
const year = 2010;
const query = sql`
select sum(hits) as hits, sum(homeRuns) as homeRuns, sum(numberOfGames) as gamesCount
from baseballStats
where yearID > ${year}`;
try {
  const result = await pinotClient.select<IResult>(query, { timeoutMs: 1000, queueTolerance: 0.1 });
  // Do something with result
} catch (err) {
  if (err instanceof PinotError) {
    if (err.type === EPinotErrorType.TRANSPORT && err.code === EBrokerTransportErrorCode.QUEUE_TOLERANCE_LIMIT) {
      // Request skipped due to queue size
    }
  }
  throw err;
}
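The tolerance rule described above comes down to a single comparison. The sketch below is a hypothetical helper, not the library's internal code; `isWithinQueueTolerance` and its parameter names are assumptions. It treats the limit as inclusive so that a tolerance of 0 still admits a request when the queue is empty; the library's exact handling of the boundary may differ.

```typescript
// Illustrative only: decides whether a request may join the queue.
// Hypothetical helper; not part of pinot-noir.
function isWithinQueueTolerance(
  currentQueueSize: number,
  maxQueueSize: number,
  queueTolerance: number,
): boolean {
  // queueTolerance is a fraction (0..1) of the maximum queue size.
  const limit = maxQueueSize * queueTolerance;
  return currentQueueSize <= limit;
}

console.log(isWithinQueueTolerance(50, 1000, 0.1)); // true: 50 is under the limit of 100
console.log(isWithinQueueTolerance(250, 1000, 0.1)); // false: 250 exceeds the limit of 100
console.log(isWithinQueueTolerance(0, 1000, 0)); // true: no queue at all
console.log(isWithinQueueTolerance(1, 1000, 0)); // false: any queue is too much
```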
SqlUtils.stringifyQuery compiles your query into a single string. This is useful for logging, since it lets you see the resulting SQL with all variables substituted.
import { sql, SqlUtils } from 'pinot-noir';
const year = 2010;
const query = sql`
select sum(hits) as hits
from baseballStats
where yearID > ${year}`;
const parameters = {
  timeoutMs: 10000,
};
SqlUtils.stringifyQuery(query, parameters);
// output
// SET timeoutMs = 10000;
// select sum(hits) as hits
// from baseballStats
// where yearID > 2010
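To make the idea behind the output above more concrete, here is a minimal self-contained sketch of how a tagged template can keep SQL text and values apart and then render them into one string. It is not the library's implementation: `sketchSql`, `sketchStringify`, and `escapeSketchValue` are hypothetical names, the `playerName` filter is only an illustration, and the escaping shown (doubling single quotes inside string literals) is the common SQL convention, assumed here rather than taken from pinot-noir.

```typescript
// Illustrative only: a tiny tagged template plus stringifier.
// sketchSql, sketchStringify and escapeSketchValue are hypothetical names.
interface SketchQuery {
  strings: readonly string[];
  values: readonly unknown[];
}

// The tag keeps literal SQL and interpolated values apart until stringification.
function sketchSql(strings: TemplateStringsArray, ...values: unknown[]): SketchQuery {
  return { strings, values };
}

function escapeSketchValue(value: unknown): string {
  if (typeof value === 'number' && isFinite(value)) return String(value);
  if (typeof value === 'boolean') return value ? 'true' : 'false';
  // Strings become quoted literals; embedded single quotes are doubled.
  if (typeof value === 'string') return `'${value.replace(/'/g, "''")}'`;
  throw new TypeError(`Unsupported value type: ${typeof value}`);
}

function sketchStringify(
  query: SketchQuery,
  options: Record<string, string | number | boolean> = {},
): string {
  // Each option becomes a "SET key = value;" line placed before the query.
  const setLines = Object.keys(options).map(
    (key) => `SET ${key} = ${escapeSketchValue(options[key])};`,
  );
  // Interleave the literal parts with the escaped values.
  const body = query.strings
    .map((part, i) => (i < query.values.length ? part + escapeSketchValue(query.values[i]) : part))
    .join('');
  return [...setLines, body.trim()].join('\n');
}

const demoYear = 2010;
const demoPlayer = "Paul O'Neill";
const demoQuery = sketchSql`
select sum(hits) as hits
from baseballStats
where yearID > ${demoYear} and playerName = ${demoPlayer}`;

console.log(sketchStringify(demoQuery, { timeoutMs: 10000 }));
// SET timeoutMs = 10000;
// select sum(hits) as hits
// from baseballStats
// where yearID > 2010 and playerName = 'Paul O''Neill'
```

Because values never get concatenated into the SQL text before escaping, a quote inside a value cannot terminate the string literal early.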
Follow the Pinot quick start guide and set up a cluster locally with the demo baseball dataset.
Quick copy-paste:
docker run \
  -p 2123:2123 \
  -p 9000:9000 \
  -p 8000:8000 \
  -p 7050:7050 \
  -p 6000:6000 \
  apachepinot/pinot:latest QuickStart \
  -type batch
Verify that Pinot is running and explore the dataset via the following guide.
Connect the client to your broker:
import { PinotBrokerClient, PinotBrokerJSONTransport, sql } from 'pinot-noir';
const pinotTransport = new PinotBrokerJSONTransport({
  brokerUrl: 'http://127.0.0.1:8000', // replace with your broker url if needed
  token: '', // localhost doesn't require any auth
  connections: 32,
});
const pinotClient = new PinotBrokerClient({ transport: pinotTransport });
interface IResult {
  hits: number;
  homeRuns: number;
  gamesCount: number;
}
(async () => {
  const year = 2010;
  const query = sql`
select sum(hits) as hits, sum(homeRuns) as homeRuns, sum(numberOfGames) as gamesCount
from baseballStats
where yearID > ${year}`;
  const result = await pinotClient.select<IResult>(query, { timeoutMs: 1000 });
  console.log('== Query ==');
  console.log(result.sql);
  console.log('');
  console.log('== Results ==');
  console.table(result.rows);
  console.log('== Stats ==');
  console.log(result.stats);
})();
See results
== Query ==
select sum(hits) as hits, sum(homeRuns) as homeRuns, sum(numberOfGames) as gamesCount
from baseballStats
where yearID > 2010
== Results ==
┌─────────┬────────┬──────────┬────────────┐
│ (index) │ hits │ homeRuns │ gamesCount │
├─────────┼────────┼──────────┼────────────┤
│ 0 │ 126422 │ 14147 │ 198156 │
└─────────┴────────┴──────────┴────────────┘
== Stats ==
{
  traceInfo: {},
  segments: { matched: 1, processed: 1, queried: 1 },
  server: { queries: undefined, responded: 1 },
  docs: { scanned: 3935, returned: 1, total: 97889 },
  totalTimeMs: 6,
  minConsumingFreshnessTimeMs: 0,
  numConsumingSegmentsQueried: 0,
  numEntriesScannedPostFilter: 11805,
  numGroupsLimitReached: false
}
- pinot-client-node - another good Apache Pinot client and an inspiration for this library, which in addition has a Pinot controller client.