Confluence Pages

Alhena AI supports data ingestion from Confluence pages.

To start, provide a Confluence URL to Alhena AI. A dialog box then prompts you to connect to the Confluence app.

The connection uses an OAuth flow that securely grants the Alhena AI app access to the specific Confluence pages. Once connected, Alhena AI can fetch data from those pages.

Alhena AI prompting to connect to Confluence for crawling

Scraping an Entire Space

To crawl all pages in a Confluence space:

  1. URL to add: The space overview URL, e.g.:

    • https://your-domain.atlassian.net/wiki/spaces/SPACEKEY/overview

    • https://your-domain.atlassian.net/wiki/spaces/SPACEKEY (also works)

  2. Mode: Select "Multiple Pages"

  3. What happens:

    • Alhena automatically discovers all pages in the space

    • Each page is fetched and added to your knowledge base

  4. Blacklisting: If certain pages should be excluded, you can add their full URLs to the blacklist to skip them during crawling.
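The blacklist step can be pictured as a simple filter over the discovered page URLs. The following is a minimal sketch, not Alhena's actual implementation: the helper names `normalize` and `filter_blacklisted` are hypothetical, and the matching rule (case-insensitive host, trailing slash, query string, and fragment ignored) is an assumption made for illustration.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to scheme://host/path so equivalent spellings compare equal.

    Assumption for this sketch: query strings, fragments, and a trailing
    slash do not distinguish pages.
    """
    parts = urlsplit(url.strip())
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"


def filter_blacklisted(discovered: list[str], blacklist: list[str]) -> list[str]:
    """Return the discovered URLs that are not on the blacklist, in order."""
    blocked = {normalize(u) for u in blacklist}
    return [u for u in discovered if normalize(u) not in blocked]


if __name__ == "__main__":
    base = "https://your-domain.atlassian.net/wiki/spaces/SPACEKEY/pages"
    discovered = [f"{base}/1/Intro", f"{base}/2/Internal+Notes", f"{base}/3/FAQ"]
    blacklist = [f"{base}/2/Internal+Notes/"]  # trailing slash is tolerated
    for url in filter_blacklisted(discovered, blacklist):
        print(url)
```

Because the doc asks for full URLs, the sketch compares whole URLs rather than prefixes, so blacklisting one page does not exclude any other page.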

Scraping a Single Page

To crawl one specific Confluence page:

  1. URL to add: The page URL, e.g.:

    • https://your-domain.atlassian.net/wiki/spaces/SPACEKEY/pages/12345/Page+Title

  2. Mode: Either "Single Page" or "Multiple Pages"; the choice has no effect here. A page URL always scrapes only that one page (no sub-page discovery).

  3. What happens:

    • Alhena fetches the page content and adds it to your knowledge base
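To check in advance whether a URL will be treated as a whole space or as a single page, the URL shapes shown in this article can be told apart with two patterns. This is a rough sketch under stated assumptions: `classify_confluence_url` is a hypothetical helper, and the patterns cover only the three URL forms listed here, not every URL format Confluence can produce.

```python
import re
from urllib.parse import urlsplit

# Patterns derived from the example URLs in this article (assumption: other
# Confluence URL formats exist and are reported as "unknown" here).
PAGE_RE = re.compile(
    r"^/wiki/spaces/(?P<space>[^/]+)/pages/(?P<page_id>\d+)(?:/[^/]*)?/?$"
)
SPACE_RE = re.compile(r"^/wiki/spaces/(?P<space>[^/]+)(?:/overview)?/?$")


def classify_confluence_url(url: str) -> dict:
    """Classify a URL as a space URL, a page URL, or unknown."""
    path = urlsplit(url).path
    match = PAGE_RE.match(path)
    if match:
        return {"kind": "page", "space": match["space"], "page_id": match["page_id"]}
    match = SPACE_RE.match(path)
    if match:
        return {"kind": "space", "space": match["space"]}
    return {"kind": "unknown"}


if __name__ == "__main__":
    host = "https://your-domain.atlassian.net"
    for url in (
        f"{host}/wiki/spaces/SPACEKEY/overview",
        f"{host}/wiki/spaces/SPACEKEY",
        f"{host}/wiki/spaces/SPACEKEY/pages/12345/Page+Title",
    ):
        print(url, classify_confluence_url(url))
```

A result of `"page"` corresponds to the single-page behavior described above; `"space"` corresponds to full-space discovery when "Multiple Pages" is selected.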

Key Notes

  • No sub-page discovery: If you add a page URL, only that page is scraped; child/sub-pages are not automatically discovered. To get all pages in a space, use the space URL with "Multiple Pages" mode.

  • Re-crawling: Subsequent crawls of the same space will skip pages that have already been imported.

  • Overview pages: Space overview URLs (.../overview) are supported. The system resolves them to the space homepage.
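The re-crawl behavior noted above amounts to a set difference between the pages discovered in a space and the pages already imported. The sketch below illustrates the idea only: `pages_to_fetch` is a hypothetical function, and keying pages by their numeric page ID is an assumption, since this article does not say how Alhena identifies an already-imported page.

```python
def pages_to_fetch(discovered_ids: list[str], imported_ids: set[str]) -> list[str]:
    """Return page IDs that still need fetching, preserving discovery order.

    Pages already imported are skipped; duplicates within the same crawl
    are fetched only once.
    """
    seen = set(imported_ids)  # copy so the caller's set is not mutated
    pending = []
    for page_id in discovered_ids:
        if page_id not in seen:
            seen.add(page_id)
            pending.append(page_id)
    return pending


if __name__ == "__main__":
    already_imported = {"12345"}
    discovered = ["12345", "12346", "12347", "12346"]
    print(pages_to_fetch(discovered, already_imported))
```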
