= Select the right data ingestion solution for Teradata Vantage
:experimental:
:page-author: Krutik Pathak
:page-email: [email protected]

== Overview

This article outlines common data ingestion use cases, lists the available solutions, and recommends the optimal solution for each use case.

=== High volume ingestion, including streaming
Available solutions:

* Use link:https://docs.teradata.com/r/Teradata-Parallel-Transporter-Application-Programming-Interface-Programmer-Guide-17.20[Teradata Parallel Transporter API,window="_blank"]
* Stream data to object storage and then ingest using link:https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-VantageTM-Native-Object-Store-Getting-Started-Guide-17.20/Welcome-to-Native-Object-Store[Teradata Native Object Store (NOS), window="_blank"].
* Use the https://docs.teradata.com/r/Teradata-Parallel-Transporter-User-Guide/June-2022/Introduction-to-Teradata-PT[Teradata Parallel Transporter (TPT),window="_blank"] command line utility.
* Use Teradata database drivers such as JDBC (Java), teradatasql (Python), the Node.js driver, ODBC, or the .NET Data Provider.

The Teradata Parallel Transporter API is usually the most performant solution, offering high throughput and minimal latency. Use it if you need to ingest tens of thousands of rows per second and are comfortable working in C.

Use the Teradata database drivers when the ingestion rate is in the thousands of events per second. Consider using the FastLoad protocol, which is available in the most popular drivers, e.g., JDBC and the Python driver (teradatasql).
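
As an illustration, a driver-based loader can batch rows and request the FastLoad protocol through the teradatasql driver's escape function. This is a minimal sketch: the host, credentials, table, and helper names below are placeholders, not part of the driver API.

```python
# Sketch of driver-based ingestion with teradatasql; connection details,
# the table name, and the helper names are illustrative placeholders.

def fastload_insert_sql(table, num_cols):
    """Build a parameterized INSERT prefixed with the driver's
    {fn teradata_try_fastload} escape, which asks teradatasql to use
    the FastLoad protocol for a qualifying executemany() batch."""
    placeholders = ", ".join(["?"] * num_cols)
    return "{fn teradata_try_fastload}INSERT INTO %s VALUES (%s)" % (table, placeholders)

def load_rows(rows, table="events", batch_size=10_000):
    """Insert rows in large batches; bigger batches amortize round
    trips and let the driver apply FastLoad."""
    import teradatasql  # requires the teradatasql package and a reachable system
    sql = fastload_insert_sql(table, len(rows[0]))
    con = teradatasql.connect(host="vantage.example.com", user="dbc", password="dbc")
    try:
        cur = con.cursor()
        for i in range(0, len(rows), batch_size):
            cur.executemany(sql, rows[i:i + batch_size])
    finally:
        con.close()
```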

If your use case can tolerate higher latency, a good option is to stream events to object storage and then read the data using NOS. This approach usually requires the least effort.
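
For example, events landed in a bucket can be queried in place with `READ_NOS` and copied into a permanent table. This is a sketch only: the bucket path, table, and payload attributes are placeholders, and a public bucket is assumed (authenticated access additionally needs an `AUTHORIZATION` clause).

```sql
-- Illustrative sketch: bucket path, table, and attributes are placeholders.
INSERT INTO events_history
SELECT CAST(d.payload.event_ts AS TIMESTAMP(0)),
       CAST(d.payload.event_type AS VARCHAR(64))
FROM READ_NOS (
  USING LOCATION ('/s3/my-events-bucket.s3.amazonaws.com/events/')
) AS d;
```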

=== Ingest data from object storage

Available solutions:

* link:https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-VantageTM-Native-Object-Store-Getting-Started-Guide-17.20/Welcome-to-Native-Object-Store[Teradata Native Object Store (NOS), window="_blank"]
* https://docs.teradata.com/r/Teradata-Parallel-Transporter-User-Guide/June-2022/Introduction-to-Teradata-PT[Teradata Parallel Transporter (TPT),window="_blank"]

Teradata NOS is the recommended option for ingesting data from files stored in object storage, since NOS can leverage all Teradata nodes to perform the ingestion. Teradata Parallel Transporter (TPT) runs on the client side; use it when there is no connectivity from NOS to the object store.
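
As a sketch, a foreign table makes the object store location queryable by the whole system, and an `INSERT ... SELECT` performs the actual ingestion. All names and the location below are illustrative; authenticated buckets additionally need an `EXTERNAL SECURITY` clause.

```sql
-- Foreign table over a (public) bucket; names are illustrative.
CREATE FOREIGN TABLE sales_ext
USING ( LOCATION ('/s3/my-bucket.s3.amazonaws.com/sales/') );

INSERT INTO sales
SELECT CAST(payload.sale_id AS INTEGER),
       CAST(payload.amount AS DECIMAL(10,2))
FROM sales_ext;
```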

=== Ingest data from local files
Available solutions:

* link:https://docs.teradata.com/r/Teradata-Parallel-Transporter-User-Guide/June-2022/Introduction-to-Teradata-PT[Teradata Parallel Transporter (TPT),window="_blank"]
* link:https://docs.teradata.com/r/Enterprise_IntelliFlex_Lake_VMware/Basic-Teradata-Query-Reference-17.20/Introduction-to-BTEQ[BTEQ,window="_blank"]

TPT is the recommended option for loading data from local files. It is optimized for scalability and parallelism and thus offers the best throughput of all available options. BTEQ can be used when an ingestion process requires scripting. It also makes sense to continue using BTEQ if all your other ingestion pipelines run in BTEQ.
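
A minimal TPT job built from the predefined template operators might look like the following sketch; the actual source file, target table, and logon details would be supplied as job variables (see the TPT documentation for the template variable names).

```
/* Minimal sketch using TPT's predefined template operators.       */
/* $FILE_READER reads the local file, $LOAD bulk-loads the target. */
DEFINE JOB load_local_file
(
  APPLY $INSERT TO OPERATOR ($LOAD)
  SELECT * FROM OPERATOR ($FILE_READER);
);
```

Such a job would typically be launched with `tbuild -f load_local_file.tpt -v jobvars.txt`, where the job variables file provides the file name, target table, and credentials.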

=== Ingest data from SaaS applications
Available solutions:

* Multiple third-party tools such as link:https://airbyte.com/[Airbyte,window="_blank"], link:https://precog.com/[Precog,window="_blank"], link:https://nexla.com/[Nexla,window="_blank"], and link:https://fivetran.com/[Fivetran,window="_blank"]
* Export from SaaS apps to local files and then ingest using https://docs.teradata.com/r/Teradata-Parallel-Transporter-User-Guide/June-2022/Introduction-to-Teradata-PT[Teradata Parallel Transporter (TPT),window="_blank"]
* Export from SaaS apps to object storage and then ingest using link:https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-VantageTM-Native-Object-Store-Getting-Started-Guide-17.20/Welcome-to-Native-Object-Store[Teradata Native Object Store (NOS), window="_blank"].

Third-party tools are usually the better option for moving data from SaaS applications to Teradata Vantage. They offer broad support for data sources and eliminate the need to manage intermediate steps such as exporting and storing the exported datasets.

=== Use data stored in other databases for unified query processing
Available solutions:

* link:https://docs.teradata.com/r/Teradata-QueryGridTM-Installation-and-User-Guide/October-2020/Teradata-QueryGrid-Overview[Teradata QueryGrid,window="_blank"]
* Export from other databases to local files and then ingest using https://docs.teradata.com/r/Teradata-Parallel-Transporter-User-Guide/June-2022/Introduction-to-Teradata-PT[Teradata Parallel Transporter (TPT),window="_blank"]
* Export from other databases to object storage and then ingest using link:https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-VantageTM-Native-Object-Store-Getting-Started-Guide-17.20/Welcome-to-Native-Object-Store[Teradata Native Object Store (NOS), window="_blank"].

QueryGrid is the recommended option for moving limited quantities of data between systems and platforms, including between Vantage instances, Apache Spark, Oracle, Presto, and others. It is especially well suited to situations where the data to be synchronized is described by complex conditions that can be expressed in SQL.
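
For instance, with a QueryGrid foreign server in place (here a hypothetical `oracle_fs`), the rows to synchronize can be selected remotely and inserted locally in a single statement; the table and column names are placeholders.

```sql
-- Hypothetical QueryGrid foreign server "oracle_fs"; names are illustrative.
INSERT INTO orders_local
SELECT order_id, customer_id, amount
FROM orders@oracle_fs
WHERE order_date >= DATE '2023-01-01';
```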

== Summary
In this article, we explored various data ingestion use cases, listed the available solutions for each, and identified the recommended option for each scenario.

== Further Reading
