Merged
2 changes: 1 addition & 1 deletion docs/concepts/fs/feature_group/external_fg.md
@@ -4,6 +4,6 @@ An external feature group doesn't allow for offline data ingestion or modificati
You can also perform SQL operations, including projections, aggregations, and so on.
The SQL query is executed on-demand when HSFS retrieves data from the external Feature Group, for example, when creating training data using features in the external table.

In the image below, we can see that HSFS currently supports a large number of data sources, including any JDBC-enabled source, Snowflake, Data Lake, Redshift, BigQuery, S3, ADLS, GCS, RDS, and Kafka
In the image below, we can see that HSFS currently supports a large number of data sources, including any JDBC-enabled source, Snowflake, Data Lake, Redshift, BigQuery, S3, ADLS, GCS, SQL, and Kafka.

<img src="../../../../assets/images/concepts/fs/fg-connector-api.svg">
2 changes: 1 addition & 1 deletion docs/index.md
@@ -82,7 +82,7 @@ hide:
<div class="db_frame-top"></div>
<div class="db_frame-mid"></div>
</div>
<div class="name_item db"><a href="./user_guides/fs/data_source/creation/rds/">RDS</a></div>
<div class="name_item db"><a href="./user_guides/fs/data_source/creation/sql/">SQL</a></div>
</div>
</div>
</div>
64 changes: 0 additions & 64 deletions docs/user_guides/fs/data_source/creation/rds.md

This file was deleted.

64 changes: 64 additions & 0 deletions docs/user_guides/fs/data_source/creation/sql.md
@@ -0,0 +1,64 @@
# How-To set up an SQL Data Source

## Introduction

The SQL Data Source connects Hopsworks to a relational database such as MySQL or PostgreSQL.
Using this connector, you can query and update data in your relational database from Hopsworks.

In this guide, you will configure a Data Source in Hopsworks to securely store the authentication information needed to set up a connection to your database instance.
When you're finished, you'll be able to query your SQL database using HSFS APIs.

!!! note
Currently, it is only possible to create data sources in the Hopsworks UI.
You cannot create a data source programmatically.

## Prerequisites

Before you begin, ensure you have the following information from your database instance:

- **Host:** The endpoint for your database instance.

Example from AWS:
1. Go to the AWS Console → `Aurora and RDS`
2. Click on your DB instance.
3. Under `Connectivity & security`, you'll find the endpoint, e.g.:
`mydb.abcdefg1234.us-west-2.rds.amazonaws.com`

- **Database:** The name of the database to connect to.

- **Port:** The port to connect to (e.g. 3306 for MySQL, 5432 for PostgreSQL).

- **Username and Password:** A username and password with the necessary permissions to access the required tables.
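
As a rough illustration of how the host, database, and port values above fit together, the following Python sketch assembles a JDBC-style connection URL and falls back to the default ports listed above. The helper `build_jdbc_url` and the database name `features` are hypothetical and are not part of the Hopsworks or HSFS API; whether Hopsworks builds the same string internally is an assumption.

```python
# Hypothetical helper for illustration only; not part of Hopsworks or HSFS.
DEFAULT_PORTS = {"mysql": 3306, "postgresql": 5432}


def build_jdbc_url(db_type, host, database, port=None):
    """Combine the prerequisite values into a JDBC-style URL."""
    db_type = db_type.lower()
    if db_type not in DEFAULT_PORTS:
        raise ValueError(f"unsupported database type: {db_type}")
    if not host or not database:
        raise ValueError("host and database are required")
    # Use the conventional default port when none is given.
    port = DEFAULT_PORTS[db_type] if port is None else int(port)
    return f"jdbc:{db_type}://{host}:{port}/{database}"


# Example endpoint taken from the AWS example above; "features" is a made-up database name.
url = build_jdbc_url(
    "postgresql", "mydb.abcdefg1234.us-west-2.rds.amazonaws.com", "features"
)
print(url)
```

Checking the values this way before entering them in the UI can catch a missing host or a mistyped port early.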

## Creation in the UI

### Step 1: Set up a new Data Source

Head to the Data Source View on Hopsworks (1) and set up a new data source (2).

<figure markdown>
![Data Source Creation](../../../../assets/images/guides/fs/data_source/data_source_overview.png)
<figcaption>The Data Source View in the User Interface</figcaption>
</figure>

### Step 2: Enter SQL Settings

Enter the details for your database.
Start by giving the connector a **name** and an optional **description**.

1. Select "SQL" as the storage.
2. Select the database type (e.g. MySQL or PostgreSQL).
3. Enter the host endpoint.
4. Enter the database name.
5. Specify the port.
6. Provide the username and password.
7. Click on "Save Credentials".

<figure markdown>
![SQL Connector Creation](../../../../assets/images/guides/fs/data_source/sql_creation.png)
<figcaption>SQL Connector Creation Form</figcaption>
</figure>

## Next Steps

Move on to the [usage guide for data sources](../usage.md) to see how you can use your newly created SQL connector.
2 changes: 1 addition & 1 deletion docs/user_guides/fs/data_source/index.md
@@ -43,7 +43,7 @@ For AWS the following storage systems are supported:

1. [S3](creation/s3.md): Read data from a variety of file based storage in S3 such as parquet or CSV.
2. [Redshift](creation/redshift.md): Query Redshift databases and tables using SQL.
3. [RDS](creation/rds.md): Query Amazon RDS (Relational Database Service) using SQL.
3. [SQL](creation/sql.md): Query relational databases such as MySQL or PostgreSQL, for example on Amazon RDS (Relational Database Service), using SQL.

## Azure

22 changes: 19 additions & 3 deletions docs/user_guides/fs/feature_group/create_external.md
@@ -158,15 +158,31 @@ To create a feature group, proceed by clicking `Next: Select Tables` once all of
</figure>
</p>

In the UI you can either select one or more tables or define a custom SQL query.

### Option A: Select tables

The database navigation structure depends on your specific data source.
You'll navigate through the appropriate hierarchy for your platform—such as Database → Schema → Table for Snowflake, or Project → Dataset → Table for BigQuery.

In the UI you can select one or more tables, for each selected table, you must designate one or more columns as primary keys before proceeding.
You can also optionally select a single column as a timestamp for the row (supported types are timestamp, date and bigint), edit names and data types of individual columns you want to include.
Select one or more tables. For each selected table, you must designate one or more columns as primary keys before proceeding.
You can also optionally select a single column as a timestamp for the row (supported types are timestamp, date and bigint), and edit names and data types of individual columns you want to include.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/fs/data_source/configure_feature_group_table.png" style="border: 10px solid #f5f5f5" alt="Select Table in Data Sources and specify features">
</figure>
</p>

### Option B: Define a SQL query

Instead of selecting a table, you can write a custom SQL query to define the feature group.
This is useful when you need to join multiple tables or apply transformations at read time.
As with the table option, you must designate one or more columns as primary keys and optionally select a timestamp column.
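
To make the kind of query described here concrete, the sketch below joins two tables and aggregates at read time, then checks that the intended primary key is unique in the result. It uses Python's built-in `sqlite3` module as a stand-in database; the `customers` and `orders` tables, their columns, and the query itself are hypothetical and do not come from Hopsworks.

```python
import sqlite3

# In-memory SQLite database standing in for the external SQL source.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL,
        created_at TEXT
    );
    INSERT INTO customers VALUES (1, 'SE'), (2, 'DE');
    INSERT INTO orders VALUES
        (10, 1, 20.0, '2024-01-01'),
        (11, 1, 5.0, '2024-01-03'),
        (12, 2, 7.5, '2024-01-02');
    """
)

# A query of the kind you might paste into the UI: one row per customer,
# with customer_id as the primary key and last_order_at as the timestamp.
FEATURE_QUERY = """
SELECT c.customer_id,
       c.country,
       SUM(o.amount)     AS total_amount,
       MAX(o.created_at) AS last_order_at
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.country
ORDER BY c.customer_id
"""

rows = conn.execute(FEATURE_QUERY).fetchall()

# The column chosen as primary key should identify each row uniquely.
keys = [row[0] for row in rows]
primary_key_is_unique = len(keys) == len(set(keys))
print(rows, primary_key_is_unique)
```

Grouping by the key column is what guarantees one row per key here; a query that returns duplicate key values would not be a good fit for the primary key you designate.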

<p align="center">
<figure>
<img src="../../../../assets/images/guides/fs/data_source/configure_feature_group.png" style="border: 10px solid #f5f5f5" alt="Select Table in Data Sources and specify features">
<img src="../../../../assets/images/guides/fs/data_source/configure_feature_group_query.png" style="border: 10px solid #f5f5f5" alt="Define a SQL query in Data Sources and specify features">
</figure>
</p>

2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -70,7 +70,7 @@ nav:
- ADLS: user_guides/fs/data_source/creation/adls.md
- BigQuery: user_guides/fs/data_source/creation/bigquery.md
- GCS: user_guides/fs/data_source/creation/gcs.md
- RDS: user_guides/fs/data_source/creation/rds.md
- SQL: user_guides/fs/data_source/creation/sql.md
- Usage: user_guides/fs/data_source/usage.md
- Feature Group:
- user_guides/fs/feature_group/index.md