Indexing.Datasources

Overview

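Available Operations

* add - Add or update a custom datasource and its schema.
* retrieve_config - Fetches the datasource config for the specified custom datasource.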

add

Add or update a custom datasource and its schema.

Example Usage

from glean.api_client import Glean, models
import os


with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:

    glean.indexing.datasources.add(
        name="<value>",
        # Placeholder value: per the parameter table below, UNCATEGORIZED is
        # not accepted, so substitute the category that fits your datasource.
        datasource_category=models.DatasourceCategory.UNCATEGORIZED,
        url_regex="https://example-company.datasource.com/.*",
        quicklinks=[
            {
                "icon_config": {
                    "color": "#343CED",
                    "key": "person_icon",
                    "icon_type": models.IconType.GLYPH,
                    "name": "user",
                },
            },
        ],
        trust_url_regex_for_view_activity=True,
        strip_fragment_in_canonical_url=True,
        is_entity_datasource=False,
        is_test_datasource=False,
    )

    # Use the SDK ...

Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| `name` | `str` | ✔️ | Unique identifier of the datasource instance to which this config applies. | |
| `display_name` | `Optional[str]` | | The user-friendly instance label to display. If omitted, falls back to the title-cased `name`. | |
| `datasource_category` | `Optional[models.DatasourceCategory]` | | The type of this datasource. It is an important signal for relevance: it must be specified and cannot be `UNCATEGORIZED`. Refer to the Glean documentation on datasource categories for details. | |
| `url_regex` | `Optional[str]` | | Regular expression that matches URLs of documents of the datasource instance. The behavior for multiple matches is non-deterministic. Note: `urlRegex` is required for non-entity datasources, but not for datasources used to push custom entities (i.e. datasources where `isEntityDatasource` is true). Make the regex as specific as possible to this datasource instance. | `https://example-company.datasource.com/.*` |
| `icon_url` | `Optional[str]` | | The URL of an image to be displayed as an icon for this datasource instance. Must have a transparency mask. SVGs are recommended over PNGs. Public, scio-authenticated, and Base64-encoded data URLs are all valid (but not third-party-authenticated URLs). | |
| `object_definitions` | `List[models.ObjectDefinition]` | | The list of top-level `objectTypes` for the datasource. | |
| `suggestion_text` | `Optional[str]` | | Example text for what to search for in this datasource. | |
| `home_url` | `Optional[str]` | | The URL of the landing page for this datasource instance. Should point to the most useful page for users, not the company marketing page. | |
| `crawler_seed_urls` | `List[str]` | | Applies only to `WEB_CRAWL` and `BROWSER_CRAWL` datasources. Defines the seed URLs for crawling. | |
| `icon_dark_url` | `Optional[str]` | | The URL of an image to be displayed as an icon for this datasource instance in dark mode. Must have a transparency mask. SVGs are recommended over PNGs. Public, scio-authenticated, and Base64-encoded data URLs are all valid (but not third-party-authenticated URLs). | |
| `hide_built_in_facets` | `List[models.HideBuiltInFacet]` | | List of built-in facet types that should be hidden for the datasource. | |
| `canonicalizing_url_regex` | `List[models.CanonicalizingRegexType]` | | A list of regular expressions to apply to an arbitrary URL to transform it into a canonical URL for this datasource instance. Regexes are applied in the order specified in this list. | |
| `canonicalizing_title_regex` | `List[models.CanonicalizingRegexType]` | | A list of regular expressions to apply to an arbitrary title to transform it into the title displayed in the search results. | |
| `redlist_title_regex` | `Optional[str]` | | A regex that identifies titles that should not be indexed. | |
| `connector_type` | `Optional[models.CustomDatasourceConfigConnectorType]` | | N/A | |
| `quicklinks` | `List[models.Quicklink]` | | List of actions for this datasource instance that show up in autocomplete and the app card, e.g. "Create new issue" for Jira. | |
| `render_config_preset` | `Optional[str]` | | The name of a render config to use for displaying results from this datasource. Any well-known datasource name may be used to render results the same way as that source, e.g. `web` or `gdrive`. Refer to the Glean documentation on render configs for details. | |
| `aliases` | `List[str]` | | Aliases that can be used as app operator-values. | |
| `is_on_prem` | `Optional[bool]` | | Whether this datasource is hosted on-premises. | |
| `trust_url_regex_for_view_activity` | `Optional[bool]` | | True if browser activity is able to report the correct URL for VIEW events. Set this to true if the URLs reported by Chrome are constant throughout each page load. Set this to false if the page has JavaScript that modifies the URL during or after the load. | |
| `include_utm_source` | `Optional[bool]` | | If true, a `utm_source` query param is added to outbound links to this datasource within Glean. | |
| `strip_fragment_in_canonical_url` | `Optional[bool]` | | If true, the fragment part of the URL is stripped when converting to a canonical URL. | |
| `identity_datasource_name` | `Optional[str]` | | If the datasource uses another datasource for identity info, the name of that datasource. The identity datasource must already exist, and the datasource with identity info should have its visibility enabled for search results. | |
| `product_access_group` | `Optional[str]` | | If the datasource uses a specific product access group, the name of that group. | |
| `is_user_referenced_by_email` | `Optional[bool]` | | Whether email is used to reference users in document ACLs and in group memberships. | |
| `is_entity_datasource` | `Optional[bool]` | | True if this datasource is used to push custom entities. | |
| `is_test_datasource` | `Optional[bool]` | | True if this datasource is used for testing purposes only. Documents from such a datasource do not affect search rankings. | |
| `retries` | `Optional[utils.RetryConfig]` | | Configuration to override the default retry behavior of the client. | |
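
Because the behavior for a URL matched by more than one `url_regex` is non-deterministic, it can help to check candidate patterns against sample URLs before registering them. The following is a minimal sketch using only the standard library, not part of the SDK. The function `matching_instances`, the instance names, and the patterns are hypothetical, and full-string matching is an assumption, since the parameter description does not say how the pattern is anchored.

```python
import re


def matching_instances(url: str, url_regexes: dict[str, str]) -> list[str]:
    """Return the names of datasource instances whose url_regex matches url.

    Assumption: a pattern must match the whole URL (re.fullmatch). The
    parameter description does not state how the pattern is anchored.
    """
    return sorted(
        name
        for name, pattern in url_regexes.items()
        if re.fullmatch(pattern, url)
    )


# Hypothetical instance names and patterns, for illustration only.
url_regexes = {
    "wiki": r"https://example-company\.datasource\.com/wiki/.*",
    "everything": r"https://example-company\.datasource\.com/.*",
}

hits = matching_instances(
    "https://example-company.datasource.com/wiki/onboarding", url_regexes
)
if len(hits) > 1:
    # More than one instance claims this URL, so the patterns overlap.
    print(f"Ambiguous URL, matched by: {hits}")
```

Here the broad `everything` pattern overlaps with `wiki`, which is the situation the "as specific as possible" advice is meant to avoid.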

Errors

| Error Type | Status Code | Content Type |
| --- | --- | --- |
| `errors.GleanError` | 4XX, 5XX | \*/\* |
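
Since any 4XX or 5XX response surfaces as an exception, callers may want to catch it rather than let it propagate. The sketch below is one possible wrapper, not part of the SDK: `call_safely` is a hypothetical helper that takes the operation and the error class as arguments, so it makes no assumption about how the SDK is imported. Reading `status_code` from the exception is also an assumption, which is why it is accessed defensively.

```python
from typing import Any, Callable, Optional, Tuple, Type


def call_safely(
    operation: Callable[..., Any],
    error_type: Type[Exception],
    **kwargs: Any,
) -> Tuple[Optional[Any], Optional[Exception]]:
    """Run an SDK operation and return (result, error) instead of raising."""
    try:
        return operation(**kwargs), None
    except error_type as exc:
        # Assumption: the error class may expose a status_code attribute.
        # getattr keeps the helper working if it does not.
        status = getattr(exc, "status_code", "unknown")
        print(f"Request failed (status {status}): {exc}")
        return None, exc
```

With a real client, this would presumably be called as `call_safely(glean.indexing.datasources.add, errors.GleanError, name="<value>")`, assuming `errors` is importable from `glean.api_client` alongside `Glean`.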

retrieve_config

Fetches the datasource config for the specified custom datasource.

Example Usage

from glean.api_client import Glean
import os


with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:

    res = glean.indexing.datasources.retrieve_config(datasource="<value>")

    # Handle response
    print(res)

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `datasource` | `str` | ✔️ | Datasource name for which config is needed. |
| `retries` | `Optional[utils.RetryConfig]` | | Configuration to override the default retry behavior of the client. |

Response

models.CustomDatasourceConfig
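
A retrieved config can be checked against the rules stated in the `add` parameter descriptions. The sketch below is illustrative only: `audit_config` is a hypothetical helper, and it assumes the response object exposes attributes named like the `add` parameters (`url_regex`, `datasource_category`, and so on), which this page does not confirm. A `SimpleNamespace` with made-up values stands in for the real response.

```python
from types import SimpleNamespace
from typing import Any, List


def audit_config(config: Any) -> List[str]:
    """Return warnings for a datasource config, based on the add parameter rules."""
    warnings = []

    # url_regex is required unless the datasource pushes custom entities.
    is_entity = bool(getattr(config, "is_entity_datasource", False))
    if not is_entity and not getattr(config, "url_regex", None):
        warnings.append("url_regex is required for non-entity datasources")

    # Accept either an enum member (with .value) or a plain string.
    category = getattr(config, "datasource_category", None)
    if category is None or str(getattr(category, "value", category)) == "UNCATEGORIZED":
        warnings.append("datasource_category must be set and cannot be UNCATEGORIZED")

    if getattr(config, "is_test_datasource", False):
        warnings.append("test datasource: documents do not affect search rankings")

    return warnings


# Stand-in for the object returned by retrieve_config (hypothetical values).
config = SimpleNamespace(
    name="wiki",
    url_regex=None,
    datasource_category="UNCATEGORIZED",
    is_entity_datasource=False,
    is_test_datasource=True,
)
for warning in audit_config(config):
    print(warning)
```

In practice, `config` would be the `res` value from the example above; attributes that are absent are treated as unset rather than raising.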

Errors

| Error Type | Status Code | Content Type |
| --- | --- | --- |
| `errors.GleanError` | 4XX, 5XX | \*/\* |