
Commit a29dc88

01 examples
Signed-off-by: Max Pumperla <[email protected]>
1 parent 9ad2df0 commit a29dc88

37 files changed: +7,057 additions, −0 deletions

_toc.yml

Lines changed: 24 additions & 0 deletions
@@ -64,6 +64,30 @@ parts:
   - file: courses/00_Developer_Intro_to_Ray/output/05_Intro_Ray_Serve_PyTorch_04.ipynb
   - file: courses/00_Developer_Intro_to_Ray/output/05_Intro_Ray_Serve_PyTorch_05.ipynb
   - file: courses/00_Developer_Intro_to_Ray/output/README_01.ipynb
+- caption: 01 Examples
+  chapters:
+  - file: courses/01_examples/output/01_Ray_Data_batch_inference_01.ipynb
+  - file: courses/01_examples/output/01_Ray_Data_batch_inference_02.ipynb
+  - file: courses/01_examples/output/01_Ray_Data_batch_inference_03.ipynb
+  - file: courses/01_examples/output/01_Ray_Data_batch_inference_04.ipynb
+  - file: courses/01_examples/output/01_Ray_Data_batch_inference_05.ipynb
+  - file: courses/01_examples/output/01_Ray_Data_batch_inference_06.ipynb
+  - file: courses/01_examples/output/02_Ray_Data_data_processing_01.ipynb
+  - file: courses/01_examples/output/02_Ray_Data_data_processing_02.ipynb
+  - file: courses/01_examples/output/02_Ray_Data_data_processing_03.ipynb
+  - file: courses/01_examples/output/02_Ray_Data_data_processing_04.ipynb
+  - file: courses/01_examples/output/02_Ray_Data_data_processing_05.ipynb
+  - file: courses/01_examples/output/02_Ray_Data_data_processing_06.ipynb
+  - file: courses/01_examples/output/03_Ray_Serve_online_serving_01.ipynb
+  - file: courses/01_examples/output/03_Ray_Serve_online_serving_02.ipynb
+  - file: courses/01_examples/output/03_Ray_Serve_online_serving_03.ipynb
+  - file: courses/01_examples/output/03_Ray_Serve_online_serving_04.ipynb
+  - file: courses/01_examples/output/04_Ray_Train_distributed_training_01.ipynb
+  - file: courses/01_examples/output/04_Ray_Train_distributed_training_02.ipynb
+  - file: courses/01_examples/output/04_Ray_Train_distributed_training_03.ipynb
+  - file: courses/01_examples/output/04_Ray_Train_distributed_training_04.ipynb
+  - file: courses/01_examples/output/04_Ray_Train_distributed_training_05.ipynb
+  - file: courses/01_examples/output/04_Ray_Train_distributed_training_06.ipynb
 - caption: Anyscale 101
   chapters:
   - file: courses/anyscale_101/output/101_anyscale_intro_jobs_01.ipynb

courses/01_examples/01_Ray_Data_batch_inference.ipynb

Lines changed: 1141 additions & 0 deletions
Large diffs are not rendered by default.

courses/01_examples/02_Ray_Data_data_processing.ipynb

Lines changed: 835 additions & 0 deletions
Large diffs are not rendered by default.
courses/01_examples/03_Ray_Serve_online_serving.ipynb

Lines changed: 368 additions & 0 deletions
@@ -0,0 +1,368 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Online Model Serving with Ray Serve\n",
    "© 2025, Anyscale. All Rights Reserved"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "💻 **Launch Locally**: You can run this notebook locally.\n",
    "\n",
    "🚀 **Launch on Cloud**: Consider running this notebook on a Ray cluster (click [here](http://console.anyscale.com/register) to easily start a Ray cluster on Anyscale)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Model serving is the process of deploying machine learning models to production so that they can be accessed and\n",
    "used by applications or users. It involves creating an API or interface that allows users to send requests to the model\n",
    "and receive predictions in response. There are several libraries and frameworks available for model serving,\n",
    "each with its own features and capabilities. In this notebook, we showcase Ray Serve and FastAPI to deploy a sentiment analysis\n",
    "machine learning (ML) model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What is Ray Serve?\n",
    "Ray Serve is a scalable model serving library that allows you to deploy and manage machine learning models in production.\n",
    "With Ray Serve, you can easily create a scalable and distributed serving architecture that\n",
    "can handle high traffic and large workloads. It is built on top of Ray, a distributed computing framework\n",
    "that allows you to run Python code in parallel across multiple machines.\n",
    "Ray Serve provides a simple API for deploying and managing models, as well as features like autoscaling,\n",
    "load balancing, and versioning.\n",
    "\n",
    "Ray Serve is designed to be easy to use and to integrate with existing machine learning workflows.\n",
    "It supports a wide range of machine learning frameworks, including TensorFlow, PyTorch, and Scikit-learn.\n",
    "Ray Serve also provides a simple way to deploy models as REST APIs, using FastAPI,\n",
    "making it easy to integrate with web applications and other services.\n",
    "\n",
    "More information: https://docs.ray.io/en/latest/serve/index.html\n",
    "\n",
    "### Why not use just FastAPI or Flask?\n",
    "We could have simply used FastAPI or Flask to create a REST API for the model,\n",
    "but Ray Serve provides additional features like autoscaling and load balancing that\n",
    "make it a better choice for production deployments. Ray Serve also allows you to easily\n",
    "deploy multiple models and manage their versions, which can be useful in a production environment\n",
    "where you may need to deploy multiple models or update existing ones."
   ]
  },
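  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For contrast, the next cell is a minimal sketch of a plain FastAPI-only service for the same model. It is illustrative only and not part of the original example: the `plain_app` name is hypothetical, and the service would be run separately (e.g. with `uvicorn`), giving a single process with no replicas, autoscaling, or load balancing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal FastAPI-only sketch (assumption: would be served separately with\n",
    "# `uvicorn module_name:plain_app`, not deployed inside this notebook).\n",
    "from fastapi import FastAPI\n",
    "from transformers import pipeline\n",
    "\n",
    "plain_app = FastAPI()\n",
    "# One model instance in one process: no replicas or load balancing.\n",
    "plain_model = pipeline(\"sentiment-analysis\",\n",
    "                       model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
    "\n",
    "@plain_app.get(\"/predict\")\n",
    "def predict(text: str):\n",
    "    return {\"text\": text, \"sentiment\": plain_model(text)}"
   ]
  },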
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Outline\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<ul>\n",
    "  <li>Architecture</li>\n",
    "  <li>Import libraries</li>\n",
    "  <li>FastAPI service to accept HTTP requests and scaling with Ray Serve</li>\n",
    "  <li>Simulate a client: send test requests</li>\n",
    "  <li>Shutdown the Serve app and the Ray cluster</li>\n",
    "</ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Architecture"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Architecture Diagram](https://lz-public-demo.s3.us-east-1.amazonaws.com/anyscale101/01_examples/03_Ray_Serve_architecture.svg?sanitize=true)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import libraries\n",
    "In addition to ray and serve, we also import FastAPI to create the web service and Hugging Face transformers to download the ML model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import Ray Serve and FastAPI libraries\n",
    "import ray\n",
    "from ray import serve\n",
    "from fastapi import FastAPI\n",
    "\n",
    "# Library for pre-trained models\n",
    "from transformers import pipeline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## FastAPI web service and model deployment\n",
    "FastAPI is used to create a web service `app` that accepts HTTP requests.\n",
    "\n",
    "The `MySentimentModel` class loads the ML model and defines a *predict* function for online inference. The *@serve.deployment* decorator defines the Ray Serve deployment.\n",
    "\n",
    "*@app.get()* is used to create a GET '/predict' route. Similarly, *@app.post()* can be used for POST requests. See https://docs.ray.io/en/latest/serve/http-guide.html for more details.\n",
    "\n",
    "In this example, the *application_logic()* function defines a sample transformation or business logic that can be applied before sending the input to the ML model for inference. See the inline comments for further explanation.\n",
    "\n",
    "### Scaling the deployment\n",
    "The *num_replicas* parameter sets the number of instances of the deployment, and Ray Serve automatically load-balances requests across them. There are more options, such as setting the *accelerator_type* to a GPU and even using fractional GPUs. See the configuration options here: https://docs.ray.io/en/latest/serve/configure-serve-deployment.html ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define a simple FastAPI app\n",
    "app = FastAPI()\n",
    "\n",
    "# Define a Ray Serve deployment.\n",
    "# This decorator registers the class as a Ray Serve deployment.\n",
    "@serve.deployment(num_replicas=2)  # num_replicas specifies the number of replicas for load balancing\n",
    "@serve.ingress(app)  # This decorator allows the FastAPI app to be served by Ray Serve\n",
    "class MySentimentModel:\n",
    "    def __init__(self):\n",
    "        # Load a pre-trained sentiment analysis model\n",
    "        self.model = pipeline(\"sentiment-analysis\",\n",
    "                              model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
    "\n",
    "    def application_logic(self, text):\n",
    "        \"\"\"Apply any necessary application or transformation logic to the input text.\"\"\"\n",
    "        # Simple application logic: truncate text if it exceeds a certain length\n",
    "        if len(text) > 50:\n",
    "            return text[:50].lower()  # Truncate and convert to lowercase\n",
    "        else:\n",
    "            return text.lower()\n",
    "\n",
    "    @app.get(\"/predict\")  # Define an endpoint for predictions\n",
    "    def predict(self, text: str):\n",
    "        \"\"\"Predict sentiment for the given text.\"\"\"\n",
    "        # Apply any necessary application logic to the input text\n",
    "        text = self.application_logic(text)\n",
    "\n",
    "        # Use the model to make a prediction\n",
    "        result = self.model(text)\n",
    "        return {\"text\": text, \"sentiment\": result}"
   ]
  },
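  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following optional cell is a minimal sketch of the autoscaling alternative mentioned above: instead of a fixed *num_replicas*, a deployment can scale between a minimum and maximum number of replicas based on traffic. The class and app names are hypothetical, and the exact *autoscaling_config* keys should be checked against the configuration guide linked above. The cell only defines the deployment; it is not deployed below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional sketch: an autoscaling variant of the same deployment.\n",
    "# min_replicas/max_replicas are assumptions based on the Ray Serve docs;\n",
    "# this cell only defines the deployment and does not run it.\n",
    "autoscaling_app = FastAPI()\n",
    "\n",
    "@serve.deployment(autoscaling_config={\"min_replicas\": 1, \"max_replicas\": 4})\n",
    "@serve.ingress(autoscaling_app)\n",
    "class MyAutoscalingSentimentModel:\n",
    "    def __init__(self):\n",
    "        self.model = pipeline(\"sentiment-analysis\",\n",
    "                              model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
    "\n",
    "    @autoscaling_app.get(\"/predict\")\n",
    "    def predict(self, text: str):\n",
    "        return {\"text\": text, \"sentiment\": self.model(text)}"
   ]
  },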
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Deploy the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-07-11 09:16:58,624\tINFO worker.py:1908 -- Started a local Ray instance. View the dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8265 \u001b[39m\u001b[22m\n",
      "\u001b[36m(ProxyActor pid=56603)\u001b[0m INFO 2025-07-11 09:16:59,927 proxy 127.0.0.1 -- Proxy starting on node c2986ec5ee9361515a52f3c506843d5d247ffc6732250203b8a47f21 (HTTP port: 8000).\n",
      "\u001b[36m(ProxyActor pid=56603)\u001b[0m INFO 2025-07-11 09:16:59,946 proxy 127.0.0.1 -- Got updated endpoints: {}.\n",
      "INFO 2025-07-11 09:16:59,975 serve 18071 -- Started Serve in namespace \"serve\".\n",
      "\u001b[36m(ServeController pid=56591)\u001b[0m INFO 2025-07-11 09:17:00,075 controller 56591 -- Deploying new version of Deployment(name='MySentimentModel', app='default') (initial target replicas: 2).\n",
      "\u001b[36m(ServeController pid=56591)\u001b[0m INFO 2025-07-11 09:17:00,178 controller 56591 -- Adding 2 replicas to Deployment(name='MySentimentModel', app='default').\n",
      "\u001b[36m(ProxyActor pid=56603)\u001b[0m INFO 2025-07-11 09:17:00,076 proxy 127.0.0.1 -- Got updated endpoints: {Deployment(name='MySentimentModel', app='default'): EndpointInfo(route='/', app_is_cross_language=False)}.\n",
      "\u001b[36m(ProxyActor pid=56603)\u001b[0m INFO 2025-07-11 09:17:00,080 proxy 127.0.0.1 -- Started <ray.serve._private.router.SharedRouterLongPollClient object at 0x169597d10>.\n",
      "\u001b[36m(ServeReplica:default:MySentimentModel pid=56595)\u001b[0m Device set to use mps:0\n",
      "INFO 2025-07-11 09:17:05,114 serve 18071 -- Application 'default' is ready at http://127.0.0.1:8000/.\n",
      "INFO 2025-07-11 09:17:05,120 serve 18071 -- Started <ray.serve._private.router.SharedRouterLongPollClient object at 0x337673dd0>.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "DeploymentHandle(deployment='MySentimentModel')"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[36m(ServeReplica:default:MySentimentModel pid=56599)\u001b[0m INFO 2025-07-11 09:20:39,996 default_MySentimentModel kcd1ofsq 1d920c68-1c59-4e2e-a64c-04f87d31bab7 -- GET /predict 200 128.0ms\n",
      "\u001b[36m(ServeReplica:default:MySentimentModel pid=56595)\u001b[0m INFO 2025-07-11 09:21:11,350 default_MySentimentModel t4o4t2pu dda6dadc-f201-421d-9bbc-638d92b9f065 -- GET /predict 200 543.2ms\n",
      "\u001b[36m(ServeReplica:default:MySentimentModel pid=56599)\u001b[0m INFO 2025-07-11 09:21:15,969 default_MySentimentModel kcd1ofsq e3611603-54d1-491a-8b05-1f693f413243 -- GET /predict 200 36.5ms\n",
      "\u001b[36m(ServeReplica:default:MySentimentModel pid=56599)\u001b[0m INFO 2025-07-11 09:22:16,285 default_MySentimentModel kcd1ofsq 5be0e806-0bd5-4992-b15d-c59fd52e6195 -- GET /predict 200 422.7ms\n",
      "\u001b[36m(ServeReplica:default:MySentimentModel pid=56599)\u001b[0m INFO 2025-07-11 09:22:58,953 default_MySentimentModel kcd1ofsq 8890c548-abf6-49b2-b923-b525fcc403da -- GET /predict 200 76.5ms\n",
      "\u001b[36m(ServeController pid=56591)\u001b[0m INFO 2025-07-11 09:23:20,248 controller 56591 -- Removing 2 replicas from Deployment(name='MySentimentModel', app='default').\n",
      "\u001b[36m(ServeController pid=56591)\u001b[0m INFO 2025-07-11 09:23:22,283 controller 56591 -- Replica(id='t4o4t2pu', deployment='MySentimentModel', app='default') is stopped.\n",
      "\u001b[36m(ServeController pid=56591)\u001b[0m INFO 2025-07-11 09:23:22,283 controller 56591 -- Replica(id='kcd1ofsq', deployment='MySentimentModel', app='default') is stopped.\n"
     ]
    }
   ],
   "source": [
    "serve.run(MySentimentModel.bind())  # Build the app from the deployment and run it with Ray Serve\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simulate a client: send test requests\n",
    "\n",
    "We use the *requests* library to send HTTP requests to the deployed model.\n",
    "\n",
    "Note: if you encounter errors about Serve not being able to start, it is most likely because a previous Serve instance was not shut down properly. Restart the notebook, or see the end of this notebook for how to gracefully shut down Ray Serve and the Ray cluster."
   ]
  },
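  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the defensive pattern behind that note (an assumption, not part of the original example): calling *serve.shutdown()* before deploying again clears any Serve app left over from a previous run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional sketch (assumption): clear any leftover Serve app before\n",
    "# deploying again, e.g. when re-running the deployment cell above.\n",
    "# serve.shutdown()\n",
    "# serve.run(MySentimentModel.bind())"
   ]
  },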
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests  # used to send HTTP requests to the deployed model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'text': 'i love ray serve!', 'sentiment': [{'label': 'POSITIVE', 'score': 0.9998507499694824}]}\n"
     ]
    }
   ],
   "source": [
    "# Query the deployed model\n",
    "response = requests.get(\"http://localhost:8000/predict\", params={\"text\": \"I love Ray Serve!\"})\n",
    "print(response.json())  # Should print the sentiment analysis result\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'text': 'mars landscape is a tough place to live, but it ha', 'sentiment': [{'label': 'NEGATIVE', 'score': 0.9590334296226501}]}\n"
     ]
    }
   ],
   "source": [
    "text = \"Mars landscape is a tough place to live, but it has its own beauty. Venus is even tougher, with its extreme heat and pressure. \"\n",
    "response = requests.get(\"http://localhost:8000/predict\", params={\"text\": text})\n",
    "print(response.json())  # prints the result; note the text is truncated to 50 characters by the application logic"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'text': 'national parks are a great way to experience natur', 'sentiment': [{'label': 'POSITIVE', 'score': 0.9996353387832642}]}\n"
     ]
    }
   ],
   "source": [
    "text = \"National Parks are a great way to experience nature and wildlife.\"\n",
    "response = requests.get(\"http://localhost:8000/predict\", params={\"text\": text})\n",
    "print(response.json())  # prints the result; note the text is truncated to 50 characters by the application logic"
   ]
  },
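  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the two replicas sharing work, the next cell is a small sketch (not part of the original example; the sample texts are made up) that sends a handful of requests in a loop. With *num_replicas=2*, the Serve proxy spreads these across both replicas, as visible in the replica pids in the log output above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional sketch: send several requests; the proxy load-balances them\n",
    "# across the two replicas started above.\n",
    "texts = [\n",
    "    \"Ray makes distributed computing simple.\",\n",
    "    \"This queue is way too long.\",\n",
    "    \"The documentation was very helpful.\",\n",
    "]\n",
    "for t in texts:\n",
    "    r = requests.get(\"http://localhost:8000/predict\", params={\"text\": t})\n",
    "    print(r.json())"
   ]
  },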
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Shutdown the Ray Serve instances and the Ray cluster"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Stop Ray Serve\n",
    "serve.shutdown()  # Shut down Ray Serve when done; the Ray cluster will still be running"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "ray.shutdown()  # Shut down the Ray cluster"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summary\n",
    "In this notebook, we deployed a sentiment analysis model from Hugging Face using Ray Serve and FastAPI. Using *num_replicas*, we scaled the number of instances of the model. There are many more options, such as autoscaling to increase the number of replicas when traffic is high and to scale down to zero when there is no traffic."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
