Conversation
Pull Request Overview
This PR adds two new source types—DataFrameSource and ListSource—to enable streaming data from pandas DataFrames and Python lists with customizable tagging.
- Introduces DataFrameSource for iterating over DataFrame rows as tagged packets
- Introduces ListSource for iterating over list elements as tagged packets
- Updates src/orcapod/core/sources.py with new imports and class definitions
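For reference, a minimal usage sketch of the two sources. The constructor arguments and the iteration protocol shown here are assumptions based on this review (tagged packets keyed by row_index / element_index), not the confirmed API:

import pandas as pd
from orcapod.core.sources import DataFrameSource, ListSource

# Assumed usage; the real signatures may differ.
df = pd.DataFrame({"name": ["alice", "bob"], "value": [1, 2]})
df_source = DataFrameSource(df)            # default tag assumed to be {"row_index": <idx>}
list_source = ListSource(["x", "y", "z"])  # default tag assumed to be {"element_index": <idx>}

# Assuming the sources yield (tag, packet) pairs when iterated:
for tag, packet in df_source:
    print(tag, packet)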
Comments suppressed due to low confidence (3)
src/orcapod/core/sources.py:336
- The docstring refers to 'expected_keys' but the parameter is named 'expected_tag_keys'. Update the doc to match the code.
If expected_keys are provided, they will be used instead of the default keys.
src/orcapod/core/sources.py:494
- The docstring refers to 'expected_keys' but the parameter is named 'expected_tag_keys'. Update the doc to match the code.
If expected_keys are provided, they will be used instead of the default keys.
src/orcapod/core/sources.py:207
- New classes DataFrameSource and ListSource have been introduced without accompanying unit tests. Please add tests to cover the core functionality and edge cases.
class DataFrameSource(Source):
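A hedged sketch of the kind of coverage being requested follows. It assumes the sources are directly iterable, yield (tag, packet) pairs, use default tag keys row_index and element_index, and that DataFrameSource accepts a columns argument; none of this is confirmed by the diff shown here.

import pandas as pd
import pytest

from orcapod.core.sources import DataFrameSource, ListSource

def test_dataframe_source_yields_one_tagged_packet_per_row():
    df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
    results = list(DataFrameSource(df))
    assert len(results) == 2
    tag, packet = results[0]
    assert tag == {"row_index": 0}          # assumed default tag key
    assert packet == {"a": 1, "b": "x"}

def test_dataframe_source_rejects_missing_columns():
    df = pd.DataFrame({"a": [1]})
    with pytest.raises(ValueError):
        DataFrameSource(df, columns=["not_a_column"])  # assumed 'columns' parameter

def test_list_source_yields_one_tagged_packet_per_element():
    results = list(ListSource(["x", "y"]))
    assert len(results) == 2
    tag, _packet = results[0]
    assert tag == {"element_index": 0}      # assumed default tag key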
from typing import Any, Literal

import pandas as pd
import polars as pl
The import 'polars as pl' is not used in this file. Consider removing it to keep imports clean.
for idx, row in self.dataframe.iterrows():
    tag = self.tag_function(row, idx)
    packet = {col: row[col] for col in self.columns}
Using DataFrame.iterrows() can be slow for large data sets. Consider iterating with itertuples or another vectorized approach for better performance.
Suggested change:
-    for idx, row in self.dataframe.iterrows():
-        tag = self.tag_function(row, idx)
-        packet = {col: row[col] for col in self.columns}
+    for row in self.dataframe.itertuples(index=True, name=None):
+        idx, *values = row
+        row_series = pd.Series(values, index=self.dataframe.columns)
+        tag = self.tag_function(row_series, idx)
+        packet = {col: row_series[col] for col in self.columns}
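Another option, sketched below, skips the per-row Series reconstruction entirely; it assumes tag_function can accept a plain dict instead of a pandas Series, which the current signature may not allow:

# Assumes tag_function tolerates a mapping rather than a pandas Series.
for idx, record in zip(self.dataframe.index, self.dataframe.to_dict(orient="records")):
    tag = self.tag_function(record, idx)
    packet = {col: record[col] for col in self.columns}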
if missing_columns:
    raise ValueError(f"Columns not found in DataFrame: {missing_columns}")

if tag_function is None:
[nitpick] Similar tag_function initialization logic appears in both DataFrameSource and ListSource—consider abstracting this into a shared helper or the base class to reduce duplication.
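One possible shape for that refactor, assuming the two defaults differ only in the tag key they emit ("row_index" vs "element_index"); the helper name here is hypothetical:

def _default_index_tagger(tag_key: str):
    """Build a tag_function that tags each item with its position under tag_key."""
    def tagger(item, idx):
        return {tag_key: idx}
    return tagger

# Hypothetical use in the two constructors:
#   self.tag_function = tag_function or _default_index_tagger("row_index")      # DataFrameSource
#   self.tag_function = tag_function or _default_index_tagger("element_index")  # ListSource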
I manually rebased the brian branch from the dev branch and added this code back in.
I addressed your previous comments about specifying expected_tag_keys as row_index for the DataFrameSource and element_index for the ListSource. There was one comment you made that I didn't know how to interpret, left on the line specifying the tag_function in the __init__ of the DataFrameSource class:
"Would you mind adding a feature where if a list of strings are passed, they are interpreted as columns whose values should be used as the tags?"
I'm assuming you mean the tags should have the same keys as the column names passed in? Just double checking.
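If that reading is right, one way to support it, sketched under the assumption that tag_function is allowed to also be a list of column names:

if isinstance(tag_function, (list, tuple)):
    tag_columns = list(tag_function)

    def tag_function(row, idx, _cols=tag_columns):
        # Tag keys mirror the given column names; values come from the row itself.
        return {col: row[col] for col in _cols}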