# Data Sources

## Supported Data Sources

Alhena AI can extract and index data from the following sources.

***

### 1. [**Websites & Web Pages**](/docs/ai-configuration/data-sources/websites.md)

* **General Websites** – HTML pages, blogs, wikis, etc.
* **Sitemap XML** – Crawl all URLs listed in an XML sitemap.
* [**Confluence Pages**](/docs/ai-configuration/data-sources/confluence-pages.md) – Crawl pages from Atlassian Confluence.
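
For context on the **Sitemap XML** option: under the sitemaps.org protocol, a sitemap is an XML file whose `<url><loc>` entries name the pages of a site. The sketch below is not Alhena AI's crawler. It only illustrates, under that assumption, how the page URLs can be read from such a file. The function name `urls_from_sitemap` and the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> List[str]:
    """Return every <loc> URL listed in a sitemap <urlset> document."""
    root = ET.fromstring(xml_text.strip())
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text  # skip empty <loc> elements
    ]


# Hypothetical sitemap content used only for illustration.
SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/getting-started</loc></url>
</urlset>
"""

if __name__ == "__main__":
    for url in urls_from_sitemap(SAMPLE):
        print(url)
```

Each URL listed this way corresponds to one page that a sitemap-based crawl would visit.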

***

### 2. [**Google Drive**](/docs/ai-configuration/data-sources/google-drive.md)

* **Google Docs**
* **Google Sheets** (spreadsheets)
* **Google Slides** (presentations)
* **Google Drive Folders** – Recursive crawling supported.
* **Google Drive Files** – Generic support for any file inside Drive.

***

### 3. **Document Files**

* [**PDF**](/docs/ai-configuration/data-sources/pdf-crawling.md) (`.pdf`)
* **Word Documents** (`.doc`, `.docx`)
* **Excel Spreadsheets** (`.xls`, `.xlsx`)
* **PowerPoint Presentations** (`.ppt`, `.pptx`)
* [**CSV / TSV**](/docs/ai-configuration/data-sources/csv-excel-and-google-sheets-ingestion.md) (`.csv`, `.tsv`)
* **Plain Text** (`.txt`), **Markdown** (`.md`), **RST** (`.rst`)
* **Rich Text Format** (`.rtf`)
* **OpenDocument Files** (`.odt`, `.ods`, `.odp`)
* **Apple iWork** (`.pages`, `.numbers`, `.key`)
* **Email Files** (`.eml`, `.msg`)
* **EPUB** (`.epub`), **Org-mode** (`.org`)
* **Config/Data Files** (`.ini`, `.yaml`, `.toml`, `.xml`, `.json`)
* **Images** – `.jpg`, `.jpeg`, `.png`, `.webp`, etc.

***

### 4. **Video & Media**

* [**YouTube Videos**](/docs/ai-configuration/data-sources/youtube-videos.md) – Transcripts and content from videos.
* **Other Video Files** – `.mp4`, `.avi`, `.mkv`, `.mov`, and more.

***

### 5. **Connected Workspaces & Communication Platforms**

* [**Notion Pages**](/docs/ai-configuration/data-sources/notion-pages.md) – Connect your Notion workspace via OAuth.
* [**Discord Messages**](/docs/ai-configuration/data-sources/discord-messages.md) – Connect your Discord server.
* [**Slack Messages**](/docs/ai-configuration/data-sources/slack-messages.md) – Connect your Slack workspace.
* [**Twitter Pages**](/docs/ai-configuration/data-sources/twitter-pages.md) – Account pages and posts.

***

### 6. **Helpdesk & Ticketing Systems**

* [**Helpdesk Ticket Import**](/docs/ai-configuration/data-sources/helpdesk-ticket-import.md) – Zendesk, Freshdesk, Freshchat, Gorgias, and Salesforce Service Cloud.
* [**Zendesk Help Center Articles**](/docs/ai-configuration/data-sources/zendesk-help-center-articles.md)
* [**Freshdesk Knowledge Base Articles**](/docs/ai-configuration/data-sources/freshdesk-knowledge-base-articles.md)

***

### 7. **Ecommerce Platforms**

* [**Shopify Products**](/docs/ai-configuration/data-sources/shopify-api.md)
* [**WooCommerce Products**](/docs/ai-configuration/data-sources/woocommerce-api.md)
* **Salesforce Commerce Cloud**
* **Magento**
* **Generic Product Pages** – Custom HTML product extraction.

***

### 8. [**GitHub**](/docs/ai-configuration/data-sources/github.md)

* **Code Repositories** – Source code, documentation, config files.
* **Issues**
* **Discussions**

***

### 9. **Custom / Other Sources**

* [**Upload Documents**](/docs/ai-configuration/data-sources/upload-documents.md) – Supports any of the formats listed above.
* [**Custom Data Sources**](/docs/ai-configuration/data-sources/custom-data-sources.md) – Extensible scraper support.

***

### 10. **Manual Knowledge Entry**

* [**FAQs**](/docs/ai-configuration/tuning/adding-faqs.md) – Add question-answer pairs directly to your knowledge base.

