| title | sidebarTitle |
|---|---|
| Unstructured | Overview |
Unstructured provides a platform and tools to ingest and process unstructured documents for Retrieval Augmented Generation (RAG) and model fine-tuning.
This 40-second video demonstrates a simple use case that Unstructured helps solve:
<iframe width="560" height="315" src="https://www.youtube.com/embed/E-tupjji22U" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen ></iframe>

Unstructured Platform - No-code UI. Production-ready. Pay as you go.
To start using the Unstructured Platform right away, skip ahead to the quickstart.
<iframe width="560" height="315" src="https://www.youtube.com/embed/IVKcQDZa9Zc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen ></iframe>

Unstructured Serverless API services - Use scripts or code. Production-ready. Pay as you go. (There is also a non-production, free edition with limits.)
To start using Unstructured Serverless API services right away, skip ahead to the quickstart.
Learn more about these products:
Unstructured Platform: a no-code user interface and pay-as-you-go platform for getting all of your data RAG-ready.
Data is processed on Unstructured-hosted compute resources.
[Try the quickstart](#quickstart-unstructured-platform).
[Learn more](/platform/overview).
[Read the announcement](https://unstructured.io/blog/introducing-unstructured-platform-the-enterprise-etl-platform-for-the-genai-tech-stackintroducing-unstructured-platform-beta-the-enterprise-etl-platform-for-the-genai-tech-stack).
Unstructured Serverless API services: use scripts or code to call the Unstructured Ingest CLI or the Ingest Python library to get all of your data RAG-ready.
Unstructured Serverless API services are available in a [Serverless](api-reference/api-services/saas-api-development-guide) pay-as-you-go edition and a [Free](/api-reference/api-services/free-api_) [limited](/api-reference/api-services/free-api#free-unstructured-api-limitations) edition, both of which process data on Unstructured-hosted compute resources.
If you need to use compute resources that you host instead, there are also [Azure](/api-reference/api-services/azure) pay-as-you-go and [AWS](/api-reference/api-services/aws) pay-as-you-go editions; these editions process data by using the Unstructured API installed on compute resources hosted in your own Azure or AWS account.
[Try the quickstart](#quickstart-unstructured-serverless-api).
[Learn more](/api-reference/api-services/overview).
[Read the launch announcement](https://unstructured.io/blog/introducing-unstructured-serverless-api).
import SupportedFileTypes from '/snippets/general-shared-text/supported-file-types.mdx';
If you want to use your local machine for your source (input) files, or as the destination (output) location where Unstructured delivers the processed data, you cannot use this quickstart. You must run code on your local machine instead: skip ahead to Quickstart: Unstructured Serverless API, later in this article.
import SharedPlatform from '/snippets/quickstarts/platform.mdx';
Learn more about the Unstructured Platform.
import LocalToLocalPythonIngestLibrary from '/snippets/ingestion/local-to-local.v2.py.mdx';
import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-dependencies.mdx';
This quickstart uses your local machine, with the Unstructured Ingest Python library installed. It preprocesses source (input) files on your local machine, and it uses the Unstructured Serverless API to deliver the processed data to a destination (output) location, also on your local machine. Data is processed on Unstructured-hosted compute resources.
The requirements are as follows.
- Python installed on your local machine.
- Compatible files on your local machine to be processed. See the list of supported file types. If you do not have any files available, you can download some from the example-docs folder in the Unstructured repo on GitHub.
<Note>
If you sign up through the [For Developers](https://unstructured.io/developers) page, your Unstructured account runs within the context of the Unstructured Platform on
Unstructured's own hosted cloud resources. After your first 14 days of usage, or once you process more than 1000 pages in a day,
whichever comes first, your account is billed at Unstructured's standard service usage rates. You can
start a prepaid subscription at any time in exchange for usage rate discounts. To switch your account from a pay-as-you-go plan to a prepaid subscription,
contact Unstructured Sales at [[email protected]](mailto:[email protected]).
If you would rather run the Unstructured Platform within the context of your own virtual private cloud (VPC),
or you want to make a long-term billing commitment in exchange for deeply discounted service usage rates,
stop here and sign up through the [For Enterprise](https://unstructured.io/enterprise) page instead.
</Note>
</Step>
<Step title="Sign in">
![Sign in to your Unstructured account](/img/platform/Signin.png)
1. After you have signed up through the [For Developers](https://unstructured.io/developers) page, the Unstructured Platform sign-in page appears.
<Note>
If you signed up through the [For Enterprise](https://unstructured.io/enterprise) page instead, your sign-in process will
be different. For enterprise sign-in guidance, contact Unstructured Sales at [[email protected]](mailto:[email protected]).
</Note>
2. Click **Google** or **GitHub** to sign in with the Google or GitHub account that you signed up with through the **For Developers** page.
Or, enter the email address that you signed up with, and then click **Sign In**.
3. If you entered your email address, check your email inbox for a message from Unstructured. In that email, click the **Sign In** link.
4. The first time you sign in, read the terms and conditions, and then click **Accept**.
</Step>
<Step title="Get your API key and API URL">
![Unstructured account settings](/img/platform/AccountSettings.png)
![Unstructured API key and API URL](/img/platform/APIKeyURL.png)
1. At the bottom of the sidebar, click your user icon, and then click **Account Settings**.
2. On the account settings sidebar, click **API Keys**, if it is not already selected.
3. To get your API key, click the copy icon in the **Actions** column for your API key, and then click **Key Only**. Store your copied API key in a secure location. Do not share it with others.
4. To get your API URL, click the copy icon beside the URL shown for **API URL**. Store your copied API URL in a secure location. Do not share it with others.
</Step>
<Step title="Set environment variables">
1. Set an environment variable named `UNSTRUCTURED_API_KEY` to the value of your Unstructured API key.
2. Set another environment variable named `UNSTRUCTURED_API_URL` to the value of your Unstructured API URL.
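
To confirm that both variables are visible to Python before running the quickstart code, a minimal sketch such as the following may help. The helper name `missing_env_vars` is hypothetical and is not part of any Unstructured library; only the two variable names come from the steps above.

```python
import os

# The two variable names come from this step; everything else is illustrative.
REQUIRED_VARS = ("UNSTRUCTURED_API_KEY", "UNSTRUCTURED_API_URL")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Report the result without printing the secret values themselves.
missing = missing_env_vars()
print("Missing variables:", ", ".join(missing) if missing else "none")
```

Run this in the same terminal session that will run the quickstart code, because environment variables set in one session are typically not visible in another.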
</Step>
<Step title="Install the Ingest Python library">
Run the following command:
```bash
pip install unstructured-ingest
```
<AdditionalIngestDependencies />
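
To check that the package was installed into the Python environment you intend to use, one option is to query the installed distribution metadata from the standard library. This is a sketch only; the helper name `installed_version` is hypothetical, and the distribution name is taken from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="unstructured-ingest"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
print(found if found else "unstructured-ingest is not installed in this environment")
```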
</Step>
<Step title="Run the code">
Run the following code, replacing:
- `<path/to/input>` with the source (input) path to the directory on your local machine that contains the compatible files for Unstructured to process on its hosted compute resources.
- `<path/to/output>` with the destination (output) path to the directory on your local machine that will contain the processed data that Unstructured returns from its hosted compute resources.
<LocalToLocalPythonIngestLibrary />
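
A quick pre-flight check can catch path mistakes before a run is started. The sketch below is illustrative and independent of the Ingest library: the helper name `check_paths` is hypothetical, it looks only at files directly inside the input directory, and it creates the output directory if it does not exist yet, on the assumption that doing so is harmless for your setup.

```python
from pathlib import Path


def check_paths(input_dir, output_dir):
    """Return the files directly inside input_dir; create output_dir if needed."""
    source = Path(input_dir)
    if not source.is_dir():
        raise NotADirectoryError(f"Input directory not found: {source}")
    # Only top-level files are listed; subdirectories are not traversed here.
    files = sorted(p for p in source.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"No files to process in: {source}")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return files


# Example call, using the same placeholders as the quickstart code:
# for path in check_paths("<path/to/input>", "<path/to/output>"):
#     print(path.name)
```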
</Step>
<Step title="View the processed data">
Go to your destination location to view the processed data.
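
For a quick summary of what was produced, the following sketch counts elements by type across the output files. It assumes, without confirmation from this article, that each output file is a JSON file containing a list of element objects, each with a `type` field; files that do not match that shape are skipped. The helper name `summarize_output` is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_output(output_dir):
    """Count elements by their "type" field across all JSON files in output_dir."""
    counts = Counter()
    for path in sorted(Path(output_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            elements = json.load(handle)
        if not isinstance(elements, list):
            continue  # Not the assumed list-of-elements shape; skip it.
        for element in elements:
            if isinstance(element, dict):
                counts[element.get("type", "unknown")] += 1
    return counts


# Example call, using the same placeholder as the quickstart code:
# for element_type, count in summarize_output("<path/to/output>").most_common():
#     print(f"{element_type}: {count}")
```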
</Step>
Learn more about the Unstructured Serverless API.
If you can't find the information you're looking for in the documentation, or if you need help, get in touch with our Support team at [email protected], or join our Slack where our team and community can help you.