Websites

Last updated 14 days ago

Alhena AI supports crawling and indexing content from publicly accessible websites to answer queries. This includes:

  • Landing pages

  • Sitemaps

  • Product pages

  • Help articles

  • Notion docs

  • Support articles

  • Developer docs

  • Zendesk support articles

  • Links to CSV files hosted on a public cloud

For each website link, there are two different modes of crawling:

Crawl multiple pages: In a multi-page crawl, Alhena AI discovers child pages and keeps crawling as long as each child page has the same root path as the parent URL. Up to 5,000 pages are crawled per URL. If you need to crawl more than 5,000 pages or have other specific requirements, message us or contact our human customer support. For sitemaps, choose the multi-page crawl, because it also crawls the child pages.

Crawl single page: In a single-page crawl, only the page at the given URL is crawled.
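
To make the difference between the two modes concrete, here is a minimal Python sketch of how the scope of a crawl could be decided. It is an illustration only, not Alhena AI's actual crawler. The names `in_scope` and `plan_crawl` are hypothetical, and so is the `links` map, which stands in for real page fetching. The sketch also assumes that "same root path" means the same host plus a path that starts with the parent URL's path, and it does not model how sitemaps are expanded.

```python
from collections import deque
from urllib.parse import urlparse

MAX_PAGES = 5000  # per-URL page limit stated in the text above


def in_scope(parent_url: str, child_url: str) -> bool:
    """Assumed rule: same host, and the child path starts with the parent path."""
    parent, child = urlparse(parent_url), urlparse(child_url)
    if parent.netloc != child.netloc:
        return False
    # Normalize with a trailing slash so "/docs" does not match "/docs-old".
    root = parent.path.rstrip("/") + "/"
    child_path = child.path.rstrip("/") + "/"
    return child_path.startswith(root)


def plan_crawl(start_url, links, multi_page=True, max_pages=MAX_PAGES):
    """Return the URLs that would be crawled, in breadth-first order.

    `links` is a hypothetical in-memory map of page URL -> outgoing links;
    a real crawler would fetch and parse each page instead.
    """
    if not multi_page:
        # Single-page mode: only the given URL is crawled.
        return [start_url]
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for child in links.get(url, []):
            # Follow only unseen child pages under the parent's root path.
            if child not in seen and in_scope(start_url, child):
                seen.add(child)
                queue.append(child)
    return order
```

Under these assumptions, starting a multi-page crawl at `https://example.com/docs` would follow `https://example.com/docs/setup` but skip `https://example.com/blog`, while a single-page crawl would return only the start URL.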

Alhena AI Website / URL Crawling options