API
Ingest Data
Gathering data from your data sources
The ingestData endpoint is part of the API and is designed to send data for ingestion. It returns a task ID once an ingestion has started. You can use this task_id to check the ingestion status on the ingestion status endpoint.
POST https://api.mendable.ai/v1/ingestData
Request
Here is an example request using cURL. The api_key must be a server-side API key, which you can create in the Mendable dashboard.
curl -X POST https://api.mendable.ai/v1/ingestData \
-H "Content-Type: application/json" \
-d '{
"api_key": "YOUR_API_KEY",
"url": "URL_TO_INGEST",
"type": "INGESTION_TYPE",
"include_paths": ["PATH_TO_INCLUDE"],
"exclude_paths": ["PATH_TO_EXCLUDE"]
}'
Supported Ingestion Types
Website Crawler
The website crawler is designed to crawl your website and ingest all the pages. The crawler will follow all the links on the website and ingest all the pages it finds. You can also specify paths to include or exclude during the crawl.
# type: website-crawler
{
"type": "website-crawler",
"url": "https://docs.mendable.ai",
"api_key": "YOUR_API_KEY",
"include_paths": ["/blog/*", "/usecases/*"],
"exclude_paths": ["/app?*"]
}
Docusaurus
The Docusaurus ingestion type is designed to crawl your Docusaurus website and ingest all the pages. The crawler will follow all the links on the website via the sitemap and ingest all the pages it finds.
# type: docusaurus
{
"type": "docusaurus",
"url": "https://docs.mendable.ai",
"api_key": "YOUR_API_KEY"
}
GitHub
The GitHub ingestion type is designed to ingest all the documentation pages from a GitHub repository. Customization via the API is very limited right now, and ingestion always defaults to your main branch. If you need further customization when ingesting a GitHub repository, try our self-serve dashboard option.
# type: github
{
"type": "github",
"url": "https://github.com/nickscamara/nickscamara",
"api_key": "YOUR_API_KEY"
}
YouTube
The YouTube ingestion type is designed to ingest a YouTube video via its transcript.
# type: youtube
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=123456789",
"api_key": "YOUR_API_KEY"
}
Single Website URL
You can ingest a single website URL by setting the ingestion type to url. For now, this method won't work if the website needs JavaScript enabled to render its content. We will be updating it soon.
# type: url
{
"type": "url",
"url": "https://docs.mendable.ai/installation",
"api_key": "YOUR_API_KEY"
}
Sitemap
The Sitemap ingestion type is designed to ingest all the pages from a sitemap.
# type: sitemap
{
"type": "sitemap",
"url": "https://docs.mendable.ai/sitemap.xml",
"api_key": "YOUR_API_KEY"
}
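The ingestion types above differ mainly in the kind of URL they expect. As a rough illustration, the sketch below picks a type value from the shape of a URL. The guessIngestionType helper and its heuristics are assumptions made for this example and are not part of the Mendable API. In particular, it cannot tell a Docusaurus site from any other website, so the docusaurus type still has to be chosen explicitly.

```javascript
// Hypothetical helper: chooses an ingestion type from the shape of a URL.
// The type strings come from the list above; the heuristics are assumptions.
function guessIngestionType(rawUrl) {
  const { hostname, pathname } = new URL(rawUrl);
  const host = hostname.replace(/^www\./, "");

  if (host === "github.com") return "github";
  if (host === "youtube.com" || host === "youtu.be") return "youtube";
  if (pathname.endsWith(".xml")) return "sitemap";
  // A bare origin is treated as a site to crawl, a deeper path as a single page.
  return pathname === "/" || pathname === "" ? "website-crawler" : "url";
}
```

Treating any deeper path as a single page is a conservative default; pass "website-crawler" explicitly when a whole subsection should be crawled.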
Example Usage
Request
Here is an example request using cURL:
curl -X POST https://api.mendable.ai/v1/ingestData \
-H "Content-Type: application/json" \
-d '{
"api_key": "YOUR_API_KEY",
"url": "URL_TO_INGEST",
"type": "INGESTION_TYPE",
"include_paths": ["PATH_TO_INCLUDE"],
"exclude_paths": ["PATH_TO_EXCLUDE"]
}'
Or using JavaScript:
const url = 'https://api.mendable.ai/v1/ingestData'
const data = {
api_key: 'YOUR_API_KEY',
url: 'URL_TO_INGEST',
type: 'INGESTION_TYPE',
include_paths: ['PATH_TO_INCLUDE'],
exclude_paths: ['PATH_TO_EXCLUDE']
}
fetch(url, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(data),
})
.then((response) => response.json())
.then((data) => console.log(data))
.catch((error) => console.error('Error:', error))
Response
Here is an example response:
{
"task_id": 1234567890
}
Request Parameters
Field | Type | Required | Description |
---|---|---|---|
api_key | string | Yes | Your Mendable API key |
url | string | Yes | URL for data ingestion |
type | string | No | Type of ingestion, defaults to "website-crawler". Available types are shown above. |
include_paths | array | No | Paths to include during the crawl. Only applicable for "website-crawler" type. |
exclude_paths | array | No | Paths to exclude during the crawl. Only applicable for "website-crawler" type. |
The task_id is returned as an integer. You can use this to check the status of the ingestion task.
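The rules in the table can be checked on the client before a request is sent. The following is a minimal sketch, assuming a hypothetical buildIngestPayload helper (not part of the Mendable API). It applies the documented default type and refuses path filters for any type other than website-crawler. Rejecting them is a choice made for this example; the documentation only says the filters are not applicable to other types.

```javascript
// Types listed under "Supported Ingestion Types".
const INGESTION_TYPES = ["website-crawler", "docusaurus", "github", "youtube", "url", "sitemap"];

// Hypothetical helper: validates fields client-side and returns the request body.
function buildIngestPayload({ apiKey, url, type = "website-crawler", includePaths, excludePaths }) {
  if (!apiKey) throw new Error("api_key is required");
  if (!url) throw new Error("url is required");
  if (!INGESTION_TYPES.includes(type)) throw new Error(`Unsupported type: ${type}`);

  const hasPaths = Boolean(includePaths?.length || excludePaths?.length);
  if (hasPaths && type !== "website-crawler") {
    throw new Error("include_paths and exclude_paths apply only to website-crawler");
  }

  // Optional arrays are added only when they contain at least one path.
  const payload = { api_key: apiKey, url, type };
  if (includePaths?.length) payload.include_paths = includePaths;
  if (excludePaths?.length) payload.exclude_paths = excludePaths;
  return payload;
}
```

The returned object can be passed to JSON.stringify and used as the body of the fetch call shown earlier.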
Ingesting Raw Documents
We also support ingesting raw documents. This is useful if you want to ingest a document that you have already scraped or if you want to ingest a document that is not publicly available.
POST https://api.mendable.ai/v1/ingestDocuments
Example Usage
Request
Here is an example request using cURL:
curl -X POST https://api.mendable.ai/v1/ingestDocuments \
-H "Content-Type: application/json" \
-d '{
"api_key": "SERVER_SIDE_API_KEY",
"documents": [
{
"content": "YOUR_CONTENT_1",
"source": "yoursource.com",
"metadata": {
"version": 10,
"author": "John Doe"
},
"options": {
"summarize": true,
"summarize_max_chars": 500
}
},
{
"content": "YOUR_CONTENT_2",
"source": "yoursource2.com"
}
]
}'
Warning: The maximum number of documents is 500. There is also a limit of 2 MB of documents per request, which is around 2,000,000 characters.
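Larger collections therefore have to be split across several requests. The sketch below shows one way to do that, assuming a hypothetical batchDocuments helper. It also assumes that only the length of each document's content counts toward the size limit. JavaScript string length counts UTF-16 code units rather than bytes, so treat the character limit as an approximation and lower it for text with many non-ASCII characters.

```javascript
const MAX_DOCS_PER_REQUEST = 500;      // from the warning above
const MAX_CHARS_PER_REQUEST = 2000000; // approximation of the 2 MB limit

// Hypothetical helper: splits documents into batches that stay under both limits.
// Assumption: only the length of `content` is counted toward the size limit.
function batchDocuments(documents, maxDocs = MAX_DOCS_PER_REQUEST, maxChars = MAX_CHARS_PER_REQUEST) {
  const batches = [];
  let current = [];
  let chars = 0;

  for (const doc of documents) {
    const size = doc.content.length;
    if (size > maxChars) {
      throw new Error(`Document from ${doc.source} exceeds the per-request size limit`);
    }
    // Start a new batch when adding this document would break either limit.
    if (current.length >= maxDocs || chars + size > maxChars) {
      batches.push(current);
      current = [];
      chars = 0;
    }
    current.push(doc);
    chars += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each batch can then be sent as the documents array of its own ingestDocuments request.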
Metadata and Options
Metadata and options are optional parameters that can be included in the request.
Metadata is a key-value pair that can be used to add additional information about the document. For example, you can include the version of the document or the author's name. This information can be used later for filtering purposes.
Options is another key-value pair that can be used to specify how the document should be processed. For example, you can specify whether the document should be summarized and the maximum number of characters that should be included in the summary.
summarize is a boolean that specifies whether the document should be summarized. The default value is false.
summarize_max_chars is an integer that specifies the maximum number of characters that should be included in the summary. This is not guaranteed: the AI will attempt to follow this limit, but the result may not be exact.
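As an illustration of how the optional fields fit together, here is a minimal sketch that builds the same request body as the cURL example in JavaScript. The buildDocument helper and its argument names are assumptions made for this example. The field names it writes (content, source, metadata, options, summarize, summarize_max_chars) are the ones described above.

```javascript
// Hypothetical helper: builds one entry of the `documents` array.
// Optional keys are left out entirely when they are not provided.
function buildDocument(content, source, { metadata, summarize, summarizeMaxChars } = {}) {
  const doc = { content, source };
  if (metadata !== undefined) doc.metadata = metadata;
  if (summarize !== undefined) {
    doc.options = { summarize: Boolean(summarize) };
    // The character limit only makes sense when summarization is on.
    if (summarize && Number.isInteger(summarizeMaxChars)) {
      doc.options.summarize_max_chars = summarizeMaxChars;
    }
  }
  return doc;
}

// Request body in the same shape as the cURL example.
const body = {
  api_key: "SERVER_SIDE_API_KEY",
  documents: [
    buildDocument("YOUR_CONTENT_1", "yoursource.com", {
      metadata: { version: 10, author: "John Doe" },
      summarize: true,
      summarizeMaxChars: 500,
    }),
    buildDocument("YOUR_CONTENT_2", "yoursource2.com"),
  ],
};
```

Because JSON.stringify(body) always produces valid JSON, building the body this way avoids hand-editing mistakes such as trailing commas.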
Check Ingestion Status
The ingestionStatus endpoint is part of the API and is designed to check the status of an ongoing ingestion task. It returns the status, metadata, and progress of the ingestion task.
POST https://api.mendable.ai/v1/ingestionStatus
Example Usage
Request
Here is an example request using cURL:
curl -X POST https://api.mendable.ai/v1/ingestionStatus \
-H "Content-Type: application/json" \
-d '{
"task_id": "YOUR_TASK_ID"
}'
Or using JavaScript:
const url = "https://api.mendable.ai/v1/ingestionStatus";
const data = {
task_id: "YOUR_TASK_ID",
};
fetch(url, {
method: "POST",
headers: {
"Content-Type": "application/json",
},
body: JSON.stringify(data),
})
.then((response) => response.json())
.then((data) => console.log(data))
.catch((error) => console.error("Error:", error));
Response
Here is an example response:
{
"status": "pending",
"current": 17,
"current_step": "SCRAPING",
"metadata": "PENDING",
"total": 400
}
When the ingestion succeeds, the response looks like this:
{
"result": {
"error": "",
"project_id": 2453,
"success": true
},
"status": "completed"
}
The response provides information about the ongoing task such as the current step, status, and total number of steps.
Request Parameters
Field | Type | Required | Description |
---|---|---|---|
task_id | string | Yes | The ID of the task for which the status needs to be fetched |
Pending Response Parameters
Field | Type | Description |
---|---|---|
current | integer | The number of steps that have been completed in the ingestion task. |
current_step | string | The current step of the ingestion process, such as "SCRAPING". |
metadata | string | The status of metadata, typically "PENDING" until task completion. |
status | string | The overall status of the ingestion task, typically "pending", "running", or "completed". |
total | integer | The total number of steps in the ingestion task. |
Completed Response Parameters
Field | Type | Description |
---|---|---|
result | object | The result of the ingestion task. |
result.error | string | The error message, if any. |
result.project_id | integer | The ID of the project that was created. |
result.success | boolean | Whether the ingestion task was successful. |
status | string | The overall status of the ingestion task, typically "pending" or "completed". |
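Because the pending and completed responses have different shapes, client code usually needs to branch on status before reading other fields. The following sketch reduces either shape to a small summary object. The summarizeStatus helper and the shape of the object it returns are assumptions made for this example.

```javascript
// Hypothetical helper: reduces a status response to a small summary object.
function summarizeStatus(response) {
  if (response.status === "completed") {
    const { success, error, project_id: projectId } = response.result ?? {};
    return {
      done: true,
      success: success === true,
      error: error || null, // an empty error string is treated as "no error"
      projectId: projectId ?? null,
    };
  }
  const { current = 0, total = 0, current_step: step = null } = response;
  // Guard against division by zero before the total is known.
  const percent = total > 0 ? Math.round((current / total) * 100) : 0;
  return { done: false, step, percent };
}
```

For the pending example above (current 17, total 400), the summary reports the SCRAPING step at 4 percent.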
The task_id is a unique identifier for each ingestion task. This ID is used to track the progress of the ingestion. The response includes the current step of the task (current_step), its status (status), current progress (current), and the total number of steps (total).
The status field indicates whether the task is pending, in progress, or completed.
The current and total fields represent the number of steps completed and the total number of steps in the task, respectively.
If the task is in the SCRAPING step, the data is currently being scraped from the provided URL. If the task is in the EMBEDDING step, the data is currently being embedded.
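Since ingestion runs asynchronously, a client typically polls this endpoint until the task completes. The sketch below is one way to do that, under stated assumptions. The waitForIngestion and fetchIngestionStatus names, the 5-second interval, and the attempt cap are all hypothetical. It also assumes "completed" is the only terminal status, because no failure status is documented here, so the cap is what stops the loop if a task never completes.

```javascript
// Hypothetical helper: polls until the task reports "completed" or attempts run out.
// `fetchStatus` is injected so the loop can be exercised without network access.
async function waitForIngestion(taskId, fetchStatus, { intervalMs = 5000, maxAttempts = 120 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetchStatus(taskId);
    if (response.status === "completed") return response.result;
    // Wait before asking again so the API is not hammered.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Task ${taskId} did not complete after ${maxAttempts} checks`);
}

// Status fetcher that mirrors the fetch example above.
async function fetchIngestionStatus(taskId) {
  const response = await fetch("https://api.mendable.ai/v1/ingestionStatus", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ task_id: taskId }),
  });
  return response.json();
}
```

A call such as waitForIngestion("YOUR_TASK_ID", fetchIngestionStatus) resolves with the result object, whose success and error fields should still be checked by the caller.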
Keeping your data updated (Auto Sync)
Mendable now offers reingestion for all users through the dashboard. To activate it, go to the Manage Indexes page. After you have ingested from a supported data source (Website/Docs, GitHub, Notion, Zendesk + others), an Auto Sync option will appear on the Manage Indexes page. You can then activate it, and your data will be synced automatically every 24 hours.
If you previously had a CRON job that was manually set up, it won't show up in the Auto Sync tab just yet.