Data Sources

Alhena AI supports many types of data sources for a bot's knowledge base.

Supported Data Sources

Alhena AI can extract and index data from the following sources.


1. Websites & Wikis

  • General Websites – HTML pages, blogs, wikis, etc.

  • Sitemap XML – Crawl all URLs listed in an XML sitemap.

  • Confluence Pages – Crawl pages from Atlassian Confluence.
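
To illustrate what crawling the URLs listed in an XML sitemap involves, here is a minimal sketch that extracts page URLs from a sitemap document. It is not Alhena AI's implementation; the function name `sitemap_urls` and the sample sitemap are assumptions made for illustration. Only the namespace URI comes from the public sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL found in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sitemap used only for this example.
EXAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs</loc></url>
</urlset>"""

print(sitemap_urls(EXAMPLE))
```

A crawler would then fetch each returned URL in turn. Sitemap index files, which point to other sitemaps, would need one extra level of the same lookup.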


2. Google Workspace

  • Google Docs

  • Google Sheets (spreadsheets)

  • Google Slides (presentations)

  • Google Drive Folders – Recursive crawling supported.

  • Google Drive Files – Generic support for any file inside Drive.


3. Document Files

  • PDF (.pdf)

  • Word Documents (.doc, .docx)

  • Excel Spreadsheets (.xls, .xlsx)

  • PowerPoint Presentations (.ppt, .pptx)

  • CSV / TSV (.csv, .tsv)

  • Plain Text (.txt), Markdown (.md), RST (.rst)

  • Rich Text Format (.rtf)

  • OpenDocument Files (.odt, .ods, .odp)

  • Apple iWork (.pages, .numbers, .key)

  • Email Files (.eml, .msg)

  • EPUB (.epub), Org-mode (.org)

  • Config/Data Files (.ini, .yaml, .toml, .xml, .json)

  • Images (.jpg, .jpeg, .png, .webp, etc.)
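
Before uploading a batch of files, it can help to check each one against the extensions listed above. The sketch below is one way to do that locally. The names `DOCUMENT_EXTENSIONS` and `is_listed_document` are hypothetical, and the set covers only the extensions named in this list. Because the image entry ends in "etc.", more image formats may be accepted than the set contains.

```python
from pathlib import Path

# Extensions copied from the Document Files list above (not exhaustive for images).
DOCUMENT_EXTENSIONS = frozenset({
    ".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
    ".csv", ".tsv", ".txt", ".md", ".rst", ".rtf",
    ".odt", ".ods", ".odp", ".pages", ".numbers", ".key",
    ".eml", ".msg", ".epub", ".org",
    ".ini", ".yaml", ".toml", ".xml", ".json",
    ".jpg", ".jpeg", ".png", ".webp",
})


def is_listed_document(filename: str) -> bool:
    """Return True if the file's extension appears in the list above (case-insensitive)."""
    return Path(filename).suffix.lower() in DOCUMENT_EXTENSIONS


for name in ("Report.PDF", "notes.md", "archive.zip"):
    print(name, is_listed_document(name))
```

The check looks only at the final suffix, so a name such as `backup.tar.gz` is judged by `.gz` alone.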


4. Video & Media

  • YouTube Videos – Transcripts and content from videos.

  • Other Video Files (.mp4, .avi, .mkv, .mov, and more)


5. Connected Workspaces & Communication Platforms


6. Helpdesk & Ticketing Systems


7. Ecommerce Platforms


8. Code Hosting Platforms

  • Code Repositories – Source code, documentation, config files.

  • Issues

  • Discussions


9. Custom / Other Sources


10. Manual Knowledge Entry

  • FAQs – Add question-answer pairs directly to your knowledge base.
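
If question-answer pairs are collected outside the product first, a small cleanup pass can catch blanks and duplicates before they are entered. The following is a sketch under stated assumptions: `FaqEntry` and `clean_faqs` are hypothetical names, and nothing here reflects the format Alhena AI itself uses to store FAQs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaqEntry:
    """One question-answer pair (hypothetical structure)."""
    question: str
    answer: str


def clean_faqs(pairs: list[tuple[str, str]]) -> list[FaqEntry]:
    """Trim whitespace, drop incomplete pairs, keep the first answer per question."""
    seen: set[str] = set()
    entries: list[FaqEntry] = []
    for question, answer in pairs:
        q, a = question.strip(), answer.strip()
        key = q.casefold()  # compare questions case-insensitively
        if not q or not a or key in seen:
            continue
        seen.add(key)
        entries.append(FaqEntry(q, a))
    return entries


# Made-up sample data for illustration.
raw = [
    (" What is your refund policy? ", "Refunds are available within 30 days."),
    ("what is your refund policy?", "Duplicate that should be dropped."),
    ("", "Answer without a question."),
]
print(clean_faqs(raw))
```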
