Ingestly is a simple tool for ingesting beacons into Google BigQuery. Digital marketers and front-end developers often want to measure users' activities on their services without limitations or sampling, in real time, with ownership of the data, and at a reasonable cost. There is a huge variety of web analytics tools on the market, but they tend to be expensive, have a large footprint, offer little flexibility and a fixed UI, and force you to use SDKs built on legacy technologies like `document.write`.

Ingestly focuses on data ingestion from the front-end into Google BigQuery by leveraging Fastly's features. Ingestly can also be implemented seamlessly into your existing website within the same Fastly service, so you own your analytics solution and ITP is not a concern.
Ingestly provides:
- Completely serverless. Fastly and Google manage the entire infrastructure for Ingestly; no maintenance resources are required.
- Near real-time data in Google BigQuery. The latest data is available within seconds of a user's activity.
- The fastest possible response time for beacons. The endpoint is served by Fastly's global edge nodes with no backend; the response is an HTTP 204 and the SDK sends requests asynchronously.
- Direct ingestion into Google BigQuery. There are no complicated integrations to configure and no batch export/import.
- Easy to start. If you already have trial accounts on Fastly and GCP, you can start using Ingestly within 2 minutes for free.
- Friendly to WebKit's ITP. The endpoint issues first-party cookies with the Secure and httpOnly flags.
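On the wire, a beacon is just an asynchronous GET request answered with HTTP 204 at the edge. A minimal sketch in JavaScript (the path segment matches the response condition used later in this guide, while the query parameters and API key are illustrative assumptions; the Ingestly Client JavaScript handles all of this for you):

```javascript
// Fire-and-forget beacon sketch. The API key and query parameters are
// illustrative assumptions; use the Ingestly Client JavaScript in practice.
const API_KEY = '2ee204330a7b2701a6bf413473fcc486'; // sample key from the dictionary setup below
const params = new URLSearchParams({ action: 'view', category: 'page' });

fetch(`https://example.com/ingestly-ingest/${API_KEY}/?${params}`, {
  credentials: 'include', // send the first-party ingestlyId / ingestlySes cookies
  keepalive: true,        // let the request survive page unloads
}).catch(() => { /* beacons are fire-and-forget */ });
```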
You can use either BigQuery or Elasticsearch, or both, as the database for logging; Fastly supports multiple log streaming endpoints in the same configuration. BigQuery supports SQL and gives faster queries over massive logs, while Elasticsearch supports a highly flexible, schema-less data structure. If you are going to use custom data (`*_attr` variables) frequently, or you wish to utilize Kibana's great visualization features, Elasticsearch is the better choice. If you expect a huge number of records from a giant website, or you wish to use Data Studio, BigQuery gives you better performance at a reasonable cost.
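For instance, once logs are flowing into BigQuery you can analyze them with the official Node.js client. A sketch, assuming the example dataset and table created below (`Ingestly.access_log`) and a placeholder project ID; filtering on the `timestamp` partition column keeps the scanned data small:

```javascript
// Count events per action over the last day.
// Sketch only: the project ID is a placeholder; dataset/table/columns follow the schema set up below.
const { BigQuery } = require('@google-cloud/bigquery');

async function main() {
  const bigquery = new BigQuery({ projectId: 'your-gcp-project' });
  const [rows] = await bigquery.query({
    query: `
      SELECT action, COUNT(*) AS events
      FROM \`your-gcp-project.Ingestly.access_log\`
      WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
      GROUP BY action
      ORDER BY events DESC`,
  });
  rows.forEach((r) => console.log(`${r.action}: ${r.events}`));
}

main().catch(console.error);
```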
To get started, you will need:

- A Google Cloud Platform account and a project used for Ingestly.
- A Fastly account and a service used for Ingestly.
- Note that the endpoint may use cookies named `ingestlyId`, `ingestlySes` and `ingestlyConsent` under the domain name you specify.

You can create a new GCP project and Fastly service for Ingestly, or use existing ones.
On the GCP side, first create a service account:

- Go to the GCP console, then open `IAM & admin` > `Service accounts`.
- Create a service account named something like `ingestly` and grant it the `BigQuery` > `BigQuery Data Owner` permission.
- Create a key and download it in JSON format.
- Open the JSON file you just downloaded and note `private_key` and `client_email` (a quick way to extract them is sketched below).
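These two values go into Fastly's BigQuery logging endpoint later. A quick way to print them, assuming the key file was saved as `ingestly-key.json` (the filename is an assumption):

```javascript
// Print the two credential fields needed by Fastly's BigQuery logging endpoint.
// Sketch only: 'ingestly-key.json' is an assumed filename for the downloaded key.
const { client_email, private_key } = require('./ingestly-key.json');
console.log('Email:', client_email);
console.log('Secret key:', private_key);
```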
Next, create a dataset and a table on BigQuery (a programmatic equivalent is sketched after this list):

- Go to the GCP console, then open `BigQuery`.
- Create a dataset named something like `Ingestly` if you don't have one yet.
- Create a table with your preferred table name, such as `access_log`, then enable `Edit as text` in the Schema section (note your table name).
- Open the `BigQuery/table_schema` file in this repository, copy its content and paste it into the schema text box of the table creation modal.
- In the `Partition and cluster settings` section, select the `timestamp` column for partitioning.
- Specify `action,category` in the `Clustering order (optional)` field.
- Finish creating the table.
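If you prefer scripting to the console UI, the same table can be created with the BigQuery Node.js client. A sketch: the schema below is abbreviated to the three columns this guide mentions, so paste the full `BigQuery/table_schema` content in practice.

```javascript
// Create the partitioned, clustered Ingestly table programmatically.
// Sketch only: replace the schema stub with the full BigQuery/table_schema content.
const { BigQuery } = require('@google-cloud/bigquery');

async function createIngestlyTable() {
  const bigquery = new BigQuery({ projectId: 'your-gcp-project' });
  await bigquery.dataset('Ingestly').createTable('access_log', {
    schema: [
      { name: 'timestamp', type: 'TIMESTAMP' },
      { name: 'action', type: 'STRING' },
      { name: 'category', type: 'STRING' },
      // ...the remaining columns from BigQuery/table_schema
    ],
    timePartitioning: { type: 'DAY', field: 'timestamp' }, // partition by timestamp
    clustering: { fields: ['action', 'category'] },        // clustering order
  });
}

createIngestlyTable().catch(console.error);
```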
If you use Elasticsearch, create a role and a user via the Kibana UI (an API-based equivalent is sketched after this list):

- Open the Kibana UI.
- Go to `Management > Security > Roles`.
- Click the `Create role` button at the top right.
- Name the role `Ingestly`.
- Manually type `ingestly-#{%F}` into the `Index` field. (The index name is generated dynamically by strftime; in this case indices are daily, with a YYYY-MM-DD formatted date.)
- Select `create_index`, `create`, `index`, `read`, `write` and `monitor` in the `Privileges` field, then save.
- Go to `Management > Security > Users`.
- Click the `Create user` button at the top right.
- Name the user `Ingestly` and fill in each field as you like.
- Select `Ingestly` from the role list, then save.
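The same role and user can be created through Elasticsearch's security API instead of the Kibana UI. A minimal sketch with the official Node.js client (7.x API shown; the cluster URL, admin credentials and new password are placeholders, and `ingestly-*` stands in for the daily index pattern):

```javascript
// Create the Ingestly role and user via the Elasticsearch security API (7.x client).
// Sketch only: node URL, admin credentials and the new user's password are placeholders.
const { Client } = require('@elastic/elasticsearch');

const client = new Client({
  node: 'https://your-es-cluster:9243',
  auth: { username: 'elastic', password: 'admin-password' },
});

async function setupSecurity() {
  await client.security.putRole({
    name: 'Ingestly',
    body: {
      indices: [{
        names: ['ingestly-*'], // matches the daily ingestly-YYYY-MM-DD indices
        privileges: ['create_index', 'create', 'index', 'read', 'write', 'monitor'],
      }],
    },
  });
  await client.security.putUser({
    username: 'Ingestly',
    body: { password: 'choose-a-strong-password', roles: ['Ingestly'] },
  });
}

setupSecurity().catch(console.error);
```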
Then register the index template (or do it programmatically, as sketched below):

- Go to `Dev Tools` in Kibana.
- Type `PUT _template/ingestly` on the first line of the Dev Tools console.
- Open the `Elasticsearch/mapping_template.json` file in this repository and copy & paste its content starting on the second line of the Dev Tools console.
- Click the triangle icon on the first line to execute the command.

If you see a `Custom Analyzer` related error message when you execute the above, choose one of the following options:

A. Add natural language analysis plugins to Elasticsearch; `analysis-kuromoji` and `analysis-icu` are recommended.
B. Remove the `analysis` section (from line 22 to line 40) of `Elasticsearch/mapping_template.json` to deactivate the analyzer.
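A sketch of the same template registration with the Node.js client (7.x API; the node URL and credentials are placeholders):

```javascript
// Register the Ingestly index template without the Kibana Dev Tools console.
// Sketch only: the node URL and credentials are placeholders; the body is the repository file.
const { Client } = require('@elastic/elasticsearch');
const template = require('./Elasticsearch/mapping_template.json');

const client = new Client({
  node: 'https://your-es-cluster:9243',
  auth: { username: 'elastic', password: 'admin-password' },
});

client.indices
  .putTemplate({ name: 'ingestly', body: template }) // equivalent to PUT _template/ingestly
  .then(() => console.log('template registered'))
  .catch(console.error);
```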
Finally, create an index pattern in Kibana:

- Go to `Management > Kibana > Index Patterns`.
- Click the `Create index pattern` button at the top right.
- Fill `ingestly` into the `Index pattern` field, then click `Next step`.
- Select `timestamp` from the `Time Filter field name` pulldown, then click `Create index pattern`.
On the Fastly side, first create two edge dictionaries (an API-based sketch follows the table below):

- Open `Dictionaries` under the `Data` menu on the CONFIGURE page of your service.
- Create a dictionary named `ingestly_apikeys` by clicking the `Create a dictionary` button.
- Add an item with `key` set to `2ee204330a7b2701a6bf413473fcc486` and `value` set to `true` via the `Add item` link for `ingestly_apikeys`.
- In the same way, create a dictionary named `ingestly_metadata` by clicking the `Create a dictionary` button.
- Add the following two items to the `ingestly_metadata` dictionary:

| key | value | description |
|---|---|---|
| `cookie_domain` | `example.com` | The domain name of cookies set by the endpoint. |
| `cookie_lifetime` | `31536000` | The lifetime, in seconds, of cookies set by the endpoint. |
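If you manage Fastly through its API rather than the web UI, the dictionaries can be created programmatically as well. A sketch under the assumption of an unlocked draft service version; the service ID, version number and API token are placeholders:

```javascript
// Create the ingestly_apikeys dictionary and its sample item via the Fastly API.
// Sketch only: service ID, version and token are placeholders, and the version
// must be a draft (unlocked) version of the service.
const FASTLY_KEY = 'your-api-token';
const SERVICE = 'your-service-id';
const VERSION = 1; // a draft service version

const api = (path, params) =>
  fetch(`https://api.fastly.com${path}`, {
    method: 'POST',
    headers: { 'Fastly-Key': FASTLY_KEY, 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams(params),
  }).then((r) => r.json());

(async () => {
  const dict = await api(`/service/${SERVICE}/version/${VERSION}/dictionary`, { name: 'ingestly_apikeys' });
  await api(`/service/${SERVICE}/dictionary/${dict.id}/item`, {
    item_key: '2ee204330a7b2701a6bf413473fcc486',
    item_value: 'true',
  });
})().catch(console.error);
```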
Then upload the custom VCL:

- Open `Custom VCL` on the CONFIGURE page.
- Click the `Upload a VCL file` button, set a preferred name like `Ingestly`, select `ingestly.vcl` and upload the file.
To stream logs into BigQuery, create a logging endpoint:

- Open `Logging` on the CONFIGURE page.
- Click the `CREATE ENDPOINT` button and select `Google BigQuery`.
- Open the `attach a condition.` link near the highlighted `CONDITION` and select `CREATE A NEW RESPONSE CONDITION`.
- Enter a name like `Data Ingestion` and set `(resp.status == 204 && req.url ~ "^/ingestly-ingest/(.*?)/\?.*" || resp.status == 200 && req.url ~ "^/ingestly-sync/(.*?)/\?.*")` in the `Apply if…` field.
- Fill in the fields:
  - `Name`: anything you want.
  - `Log format`: copy and paste the content of the `BigQuery/log_format` file in this repository.
  - `Email`: the value of the `client_email` field in the GCP credential JSON file.
  - `Secret key`: the value of the `private_key` field in the GCP credential JSON file.
  - `Project ID`: your GCP project ID.
  - `Dataset`: the dataset name you created for Ingestly (e.g. `Ingestly`).
  - `Table`: the table name you created for Ingestly (e.g. `logs`).
  - `Template`: this field can be empty, but you can configure time-sliced tables by entering something like `%Y%m%d`.
- Click `CREATE` to finish the setup process.
To stream logs into Elasticsearch, create a logging endpoint:

- Open `Logging` on the CONFIGURE page.
- Click the `CREATE ENDPOINT` button and select `Elasticsearch`.
- Open the `attach a condition.` link near the highlighted `CONDITION` and select `CREATE A NEW RESPONSE CONDITION`.
- Enter a name like `Data Ingestion` and set `(resp.status == 204 && req.url ~ "^/ingestly-ingest/(.*?)/\?.*" || resp.status == 200 && req.url ~ "^/ingestly-sync/(.*?)/\?.*")` in the `Apply if…` field.
- Fill in the fields:
  - `Name`: anything you want.
  - `Log format`: copy and paste the content of the `Elasticsearch/log_format` file in this repository.
  - `URL`: the endpoint URL of your Elasticsearch cluster.
  - `Index`: the index name for Elasticsearch. Set `ingestly`.
  - `BasicAuth user`: the username for Elasticsearch authentication. Set `Ingestly`.
  - `BasicAuth password`: the password for the `Ingestly` user on your Elasticsearch cluster.
- Click `CREATE` to finish the setup process.
To archive logs into Amazon S3, create a logging endpoint:

- Open `Logging` on the CONFIGURE page.
- Click the `CREATE ENDPOINT` button and select `Amazon S3`.
- Open the `attach a condition.` link near the highlighted `CONDITION` and select `CREATE A NEW RESPONSE CONDITION`.
- Enter a name like `Data Ingestion` and set `(resp.status == 204 && req.url ~ "^/ingestly-ingest/(.*?)/\?.*" || resp.status == 200 && req.url ~ "^/ingestly-sync/(.*?)/\?.*")` in the `Apply if…` field.
- Fill in the fields:
  - `Name`: anything you want.
  - `Log format`: copy and paste the content of the `S3/log_format` file in this repository. You can specify not only CSV but also JSON format here (`{ ... }` form).
  - `Timestamp format`: not necessary.
  - `Bucket name`: the name of the bucket in which to store the logs.
  - `Access key`: an access key of the service account that can write into the bucket above.
  - `Secret key`: a secret key of the service account that can write into the bucket above.
  - `Period`: the log rotation interval in seconds, e.g. 600 means 10 minutes.
- Under Advanced options:
  - `Path`: the path within the bucket for placing files. You may use dynamic variables in strftime format. To use Athena's partitioning feature by date, the path must include a `/date=%Y-%m-%d/` segment.
  - `Domain`: the endpoint domain of your S3 bucket's region (outside of the US Standard region), e.g. `s3.ap-northeast-1.amazonaws.com` for Tokyo.
  - `Select a log line format`: leave blank; otherwise the JSON format will be corrupted.
  - `Gzip level`: 9, the best compression, to save storage.
- Click `CREATE` to finish the setup process.
Now you are ready to receive beacons. You can install the Ingestly Client JavaScript on your website.