Commit 29d1538

Release v0.8.10
- Updated really old README
- Fixed UMAP fitting
- Fixed SVD fitting
- Adds proteinmpnn and boltzgen
- Adds experimental methods on Protein
1 parent 237b0be commit 29d1538

16 files changed (+856, -41 lines changed)

.gitattributes

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
# GitHub syntax highlighting
2+
pixi.lock linguist-language=YAML linguist-generated=true

README.md

Lines changed: 15 additions & 18 deletions
@@ -1,5 +1,5 @@
11
[![PyPI version](https://badge.fury.io/py/openprotein-python.svg)](https://pypi.org/project/openprotein-python/)
2-
[![Coverage](https://dev.docs.openprotein.ai/api-python/_images/coverage.svg)](https://pypi.org/project/openprotein-python/)
2+
[![Coverage](https://docs.openprotein.ai/_static/coverage.svg)](https://pypi.org/project/openprotein-python/)
33
[![Conda version](https://anaconda.org/openprotein/openprotein-python/badges/version.svg)](https://anaconda.org/openprotein/openprotein-python)
44

55

@@ -10,19 +10,19 @@ The OpenProtein.AI Python Interface provides a user-friendly library to interact
1010

1111
# Table of Contents
1212

13-
| | Workflow | Description |
14-
|---|----------------------------------------------------|------------------------------------------------------|
15-
| 0 | [`Quick start`](#Quick-start) | Quick start guide |
16-
| 1 | [`Installation`](https://docs.openprotein.ai/api-python/installation.html) | Install guide for pip and conda. |
17-
| 2 | [`Session management`](https://docs.openprotein.ai/api-python/overview.html) | An overview of the OpenProtein Python Client & the asynchronous jobs system. |
18-
| 3 | [`Asssay-based Sequence Learning`](https://docs.openprotein.ai/api-python/core_workflow.html) | Covers core tasks such as data upload, model training & prediction, and sequence design. |
19-
| 4 | [`De Novo prediction & generative models (PoET)`](https://docs.openprotein.ai/api-python/poet_workflow.html) | Covers PoET, a protein LLM for *de novo* scoring, as well as sequence generation. |
20-
| 5 | [`Protein Language Models & Embeddings`](https://docs.openprotein.ai/api-python/embedding_workflow.html) | Covers methods for creating sequence embeddings with proprietary & open-source models. |
13+
| | Workflow | Description |
14+
|---|--------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
15+
| 0 | [`Quick start`](#Quick-start) | Quick start guide |
16+
| 1 | [`Installation`](https://docs.openprotein.ai/python-api/installation.html) | Install guide for pip and conda. |
17+
| 2 | [`Session management`](https://docs.openprotein.ai/python-api/index.html) | An overview of the OpenProtein Python Client & the asynchronous jobs system. |
18+
| 3 | [`Property-Regression-Models`](https://docs.openprotein.ai/python-api/property-regression-models/index.html) | Covers core tasks such as data upload, model training & prediction, and sequence design. |
19+
| 4 | [`De Novo prediction & generative models (PoET)`](https://docs.openprotein.ai/python-api/poet/index.html) | Covers PoET, a protein LLM for *de novo* scoring, as well as sequence generation. |
20+
| 5 | [`Foundational models`](https://docs.openprotein.ai/python-api/foundation-models/index.html) | Covers methods for creating sequence embeddings with proprietary & open-source models. |
2121

2222

2323
# Quick-start
2424

25-
Get started with our quickstart README! You can peruse the [official documentation](https://docs.openprotein.ai/api-python/) for more details!
25+
Get started with our quickstart README! You can peruse the [official documentation](https://docs.openprotein.ai/python-api/) for more details!
2626
## Installation
2727

2828
To install the python interface using pip, run the following command:
@@ -37,16 +37,13 @@ conda install -c openprotein openprotein-python
3737

3838
### Requirements
3939

40-
- Python 3.8 or higher.
41-
- pydantic version 1.0 or newer.
42-
- requests version 2.0 or newer.
43-
- tqdm version 4.0 or newer.
44-
- pandas version 1.0 or newer.
40+
- Python 3.10 or higher.
41+
- Other dependencies as in `pyproject.toml`
4542

4643
# Getting started
4744

4845

49-
Read on below for the quick-start guide, or see the [docs](https://docs.openprotein.ai/api-python/) for more information!
46+
Read on below for the quick-start guide, or see the [docs](https://docs.openprotein.ai/python-api/) for more information!
5047

5148
To begin, create a session using your login credentials.
5249
```
@@ -57,10 +54,10 @@ session = openprotein.connect(USERNAME, PASSWORD)
5754
```
5855
## Job Status
5956

60-
The interface offers `AsyncJobFuture` objects for asynchronous calls, allowing tracking of job status and result retrieval when ready. Given a future, you can check its status and retrieve results.
57+
The interface offers `Future` objects for asynchronous calls, allowing tracking of job status and result retrieval when ready. Given a future, you can check its status and retrieve results.
6158

6259
### Checking Job Status
63-
Check the status of an `AsyncJobFuture` using the following methods:
60+
Check the status of a `Future` using the following methods:
6461
```
6562
future.refresh() # call the backend to update the job status
6663
future.done() # returns True if the job is done, meaning the status could be SUCCESS, FAILED, or CANCELLED

openprotein/embeddings/future.py

Lines changed: 4 additions & 1 deletion
@@ -114,7 +114,10 @@ def sequences(self) -> list[bytes] | list[str]:
114114
return self._sequences
115115

116116
def stream(self) -> Generator:
117-
if self.job_type == JobType.poet_generate:
117+
if (
118+
self.job_type == JobType.poet_generate
119+
or self.job_type == JobType.embeddings_generate
120+
):
118121
stream = api.request_get_generate_result(
119122
session=self.session, job_id=self.id
120123
)
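
The widened condition in `stream()` can also be read as a membership test. The sketch below is illustrative only: `JobType` here is a stand-in enum, the value of `poet_generate` is an assumed placeholder (this diff does not show it), and `STREAMING_JOB_TYPES` and `uses_generate_stream` are hypothetical names that do not exist in the library.

```python
from enum import Enum


class JobType(str, Enum):
    """Stand-in for openprotein's JobType, reduced to the members used here."""

    poet_generate = "/poet/generate"  # assumed value; not shown in this diff
    embeddings_generate = "/embeddings/generate"  # value taken from jobs/schemas.py
    embeddings_embed_reduced = "/embeddings/embed_reduced"  # value taken from jobs/schemas.py


# Hypothetical constant: job types whose results come from the generate endpoint.
STREAMING_JOB_TYPES = frozenset(
    {JobType.poet_generate, JobType.embeddings_generate}
)


def uses_generate_stream(job_type: JobType) -> bool:
    """Return True when the job type matches the condition added in stream()."""
    return job_type in STREAMING_JOB_TYPES


if __name__ == "__main__":
    for job_type in JobType:
        print(job_type.name, uses_generate_stream(job_type))
```

A set keeps the check to one line if more generate-style job types are added later; the `or` chain in the commit behaves the same for the two types listed.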

openprotein/embeddings/models.py

Lines changed: 3 additions & 9 deletions
@@ -327,19 +327,13 @@ def fit_umap(
327327
"Expected either assay or sequences to fit UMAP on!"
328328
)
329329
# get assay_id
330-
assay_id = (
331-
assay.assay_id
332-
if isinstance(assay, AssayMetadata)
333-
else assay.id if isinstance(assay, AssayDataset) else assay
334-
)
335-
model_id = self.id
336330
return umap_api.fit_umap(
337-
model_id=model_id,
331+
model=self,
332+
reduction=reduction,
338333
feature_type=FeatureType.PLM,
339334
sequences=sequences,
340-
assay_id=assay_id,
335+
assay=assay,
341336
n_components=n_components,
342-
reduction=reduction,
343337
**kwargs,
344338
)
345339

openprotein/embeddings/poet.py

Lines changed: 1 addition & 1 deletion
@@ -369,7 +369,7 @@ def fit_umap(
369369
sequences: list[bytes] | list[str] | None = None,
370370
assay: AssayDataset | None = None,
371371
n_components: int = 2,
372-
reduction: ReductionType | None = ReductionType.MEAN,
372+
reduction: ReductionType = ReductionType.MEAN,
373373
**kwargs,
374374
) -> "UMAPModel":
375375
"""

openprotein/embeddings/schemas.py

Lines changed: 5 additions & 3 deletions
@@ -37,8 +37,8 @@ def __getitem__(self, i):
3737

3838
class EmbeddingsJob(Job, BatchJob):
3939

40-
job_type: Literal[JobType.embeddings_embed, JobType.embeddings_embed_reduced] = Field(
41-
default=JobType.embeddings_embed
40+
job_type: Literal[JobType.embeddings_embed, JobType.embeddings_embed_reduced] = (
41+
Field(default=JobType.embeddings_embed)
4242
)
4343

4444

@@ -75,4 +75,6 @@ class ScoreSingleSiteJob(Job, BatchJob):
7575

7676
class GenerateJob(Job, BatchJob):
7777

78-
job_type: Literal[JobType.poet_generate] = Field(default=JobType.poet_generate)
78+
job_type: Literal[JobType.poet_generate, JobType.embeddings_generate] = Field(
79+
default=JobType.poet_generate
80+
)
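
To see which raw `job_type` strings the widened `Literal` on `GenerateJob` is meant to admit, the rough standard-library sketch below may help. It only approximates the intent: pydantic's actual validation rules may differ, `JobType` is a stand-in, the `poet_generate` value is an assumed placeholder, and `GenerateJobType` and `is_generate_job_type` are hypothetical names.

```python
from enum import Enum
from typing import Literal, get_args


class JobType(str, Enum):
    """Stand-in for openprotein's JobType, reduced to the members used here."""

    poet_generate = "/poet/generate"  # assumed value; not shown in this diff
    embeddings_generate = "/embeddings/generate"
    embeddings_embed_reduced = "/embeddings/embed_reduced"


# Same shape as the new annotation on GenerateJob.job_type.
GenerateJobType = Literal[JobType.poet_generate, JobType.embeddings_generate]


def is_generate_job_type(raw: str) -> bool:
    """Check whether a raw job_type string maps to a member of the Literal."""
    try:
        member = JobType(raw)  # look the enum member up by its string value
    except ValueError:
        return False  # not a known job type at all
    return member in get_args(GenerateJobType)


if __name__ == "__main__":
    print(is_generate_job_type("/embeddings/generate"))
    print(is_generate_job_type("/embeddings/embed_reduced"))
```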

openprotein/jobs/schemas.py

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ class JobType(str, Enum):
4646
embeddings_attn = "/embeddings/attn"
4747
embeddings_logits = "/embeddings/logits"
4848
embeddings_embed_reduced = "/embeddings/embed_reduced"
49+
embeddings_generate = "/embeddings/generate"
4950

5051
svd_fit = "/svd/fit"
5152
svd_embed = "/svd/embed"

openprotein/models/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -1,4 +1,6 @@
11
"""Make the ModelsAPI class available on the package."""
22

3+
from .foundation.boltzgen import BoltzGenFuture, BoltzGenJob
4+
from .foundation.proteinmpnn import ProteinMPNNModel
35
from .foundation.rfdiffusion import RFdiffusionFuture, RFdiffusionJob
46
from .models import ModelsAPI
Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
1+
"""BoltzGen model for protein structure and sequence design."""
2+
3+
from typing import Any, BinaryIO, Literal
4+
5+
from pydantic import BaseModel, Field
6+
7+
from openprotein.base import APISession
8+
from openprotein.common import ModelMetadata
9+
from openprotein.common.model_metadata import ModelDescription
10+
from openprotein.jobs import Future, Job
11+
from openprotein.models.base import ProteinModel
12+
from openprotein.protein import Protein
13+
14+
15+
class BoltzGenRequest(BaseModel):
16+
"Specification for a BoltzGen request."
17+
18+
n: int = 1
19+
# protein: Protein
20+
structure_text: str | None = None
21+
design_spec: dict[str, Any]
22+
diffusion_batch_size: int | None = None
23+
step_scale: float | None = None
24+
noise_scale: float | None = None
25+
26+
27+
class BoltzGenJob(Job):
28+
"""Job schema for a BoltzGen request."""
29+
30+
job_type: Literal["/models/boltzgen"]
31+
32+
33+
class BoltzGenFuture(Future):
34+
"""Future for handling the results of a BoltzGen job."""
35+
36+
job: BoltzGenJob
37+
38+
def get_pdb(self, replicate: int = 0) -> str:
39+
"""
40+
Retrieve the PDB file for a specific design.
41+
42+
Args:
43+
replicate (int): The 0-based index of the design replicate to retrieve.
44+
45+
Returns:
46+
str: The content of the PDB file as a string.
47+
"""
48+
return _boltzgen_api_result_get(
49+
session=self.session, job_id=self.id, replicate=replicate
50+
)
51+
52+
def get(self, replicate: int = 0):
53+
"""Default result accessor; returns the PDB for the given replicate (the first by default)."""
54+
# TODO handle different design index
55+
return self.get_pdb(replicate=replicate)
56+
57+
58+
def _boltzgen_api_post(
59+
session: APISession, request: BoltzGenRequest, **kwargs
60+
) -> BoltzGenJob:
61+
"""
62+
POST a request for BoltzGen design.
63+
64+
Returns a Job object that can be used to retrieve results later.
65+
"""
66+
endpoint = "v1/design/models/boltzgen"
67+
body = request.model_dump(exclude_none=True)
68+
body.update(kwargs)
69+
response = session.post(endpoint, json=body)
70+
return BoltzGenJob.model_validate(response.json())
71+
72+
73+
def _boltzgen_api_get_metadata(session: APISession) -> ModelMetadata:
74+
"""
75+
GET the metadata for the BoltzGen model.
76+
77+
Returns the response parsed into a ModelMetadata object.
78+
"""
79+
endpoint = "v1/design/models/boltzgen"
80+
response = session.get(endpoint)
81+
return ModelMetadata.model_validate(response.json())
82+
83+
84+
def _boltzgen_api_result_get(
85+
session: APISession, job_id: str, replicate: int = 0
86+
) -> str:
87+
"""
88+
GET the result of a BoltzGen design job.
89+
90+
Returns the response body as text for the requested replicate.
91+
"""
92+
endpoint = f"v1/design/{job_id}/results"
93+
response = session.get(endpoint, params={"replicate": replicate})
94+
return response.text
95+
96+
97+
class BoltzGenModel(ProteinModel):
98+
"""
99+
BoltzGen model for generating de novo protein structures.
100+
101+
This model supports functionalities like unconditional design, scaffolding,
102+
and binder design.
103+
"""
104+
105+
model_id: str = "boltzgen"
106+
107+
def __init__(self, session: APISession, model_id: str = "boltzgen"):
108+
# The model_id from the API might be more specific, e.g., "boltzgen-v1.1"
109+
super().__init__(session, model_id)
110+
111+
def get_metadata(self) -> ModelMetadata:
112+
return ModelMetadata(
113+
model_id="boltzgen",
114+
description=ModelDescription(summary="BoltzGen"),
115+
dimension=0,
116+
output_types=["pdb"],
117+
input_tokens=[],
118+
token_descriptions=[[]],
119+
)
120+
121+
def generate(
122+
self,
123+
design_spec: dict[str, Any],
124+
structure_file: str | bytes | BinaryIO | None = None,
125+
n: int = 1,
126+
diffusion_batch_size: int | None = None,
127+
step_scale: float | None = None,
128+
noise_scale: float | None = None,
129+
**kwargs,
130+
) -> BoltzGenFuture:
131+
"""
132+
Run a protein structure generation job using BoltzGen.
133+
134+
Parameters
135+
----------
136+
design_spec : dict[str, Any]
137+
The BoltzGen design specification to run. This is the Python representation
138+
of the BoltzGen yaml request specification.
139+
structure_file : str | bytes | BinaryIO, optional
140+
An input PDB file (as a string, bytes, or binary file-like object) used for inpainting or other
141+
guided design tasks where parts of an existing structure are provided.
142+
n : int, optional
143+
The number of unique design trajectories to run (default is 1).
144+
diffusion_batch_size : int, optional
145+
The batch size for diffusion sampling. Controls how many samples are
146+
processed in parallel during the diffusion process.
147+
step_scale : float, optional
148+
Scaling factor for the number of diffusion steps. Higher values may
149+
improve quality at the cost of longer generation time.
150+
noise_scale : float, optional
151+
Scaling factor for the noise schedule during diffusion. Controls the
152+
amount of noise added at each step of the reverse diffusion process.
153+
154+
Other Parameters
155+
----------------
156+
**kwargs : dict
157+
Additional keyword args that are passed directly to the boltzgen
158+
inference script. Overwrites any preceding options.
159+
160+
Returns
161+
-------
162+
BoltzGenFuture
163+
A future object that can be used to retrieve the results of the design
164+
job upon completion.
165+
"""
166+
request = BoltzGenRequest(
167+
n=n,
168+
design_spec=design_spec,
169+
diffusion_batch_size=diffusion_batch_size,
170+
step_scale=step_scale,
171+
noise_scale=noise_scale,
172+
)
173+
if structure_file is not None:
174+
if isinstance(structure_file, bytes):
175+
structure_text = structure_file.decode()
176+
elif isinstance(structure_file, str):
177+
structure_text = structure_file
178+
else:
179+
structure_text = structure_file.read().decode()
180+
request.structure_text = structure_text
181+
182+
# Submit the job via the private API function
183+
job = _boltzgen_api_post(
184+
session=self.session,
185+
request=request,
186+
**kwargs,
187+
)
188+
189+
# Return the future object
190+
return BoltzGenFuture(session=self.session, job=job)
191+
192+
predict = generate
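
The request handling in `generate()` and `_boltzgen_api_post()` comes down to two steps: normalizing `structure_file` to text, then building a body that drops unset fields and lets `**kwargs` override the rest. The sketch below mimics those steps with plain dicts so it runs without the library. `to_structure_text` and `build_body` are hypothetical helper names, the empty `design_spec` and the PDB text are placeholders, and the `None` filtering only approximates `model_dump(exclude_none=True)` at the top level.

```python
import io
from typing import Any, BinaryIO, Union


def to_structure_text(structure_file: Union[str, bytes, BinaryIO]) -> str:
    """Normalize the accepted input kinds to text, following generate()."""
    if isinstance(structure_file, bytes):
        return structure_file.decode()
    if isinstance(structure_file, str):
        # A str is used as the structure content itself; it is not opened as a path.
        return structure_file
    # Anything else is treated as a binary file-like object.
    return structure_file.read().decode()


def build_body(fields: dict, **overrides: Any) -> dict:
    """Drop top-level None fields, then apply keyword overrides last."""
    body = {key: value for key, value in fields.items() if value is not None}
    body.update(overrides)  # overrides replace any field set earlier
    return body


pdb_text = "HEADER    PLACEHOLDER STRUCTURE\n"  # placeholder, not a real structure
fields = {
    "n": 2,
    "design_spec": {},  # placeholder; a real spec mirrors the BoltzGen YAML
    "structure_text": to_structure_text(io.BytesIO(pdb_text.encode())),
    "diffusion_batch_size": None,
    "step_scale": None,
    "noise_scale": None,
}
body = build_body(fields, n=5)

if __name__ == "__main__":
    print(body)
```

Because the overrides are applied last, a keyword such as `n=5` replaces the value set through the named parameter, which matches the docstring's note that `**kwargs` overwrite preceding options.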
