Deploy Open Source Models on Paperspace with OpenAI API Compatibility
Here’s a straightforward method to set up your own OpenAI-compatible API endpoint on Paperspace Gradient. This will allow you to deploy open source models and integrate them with other OpenAI tools.
Paperspace is a cloud-based machine learning platform that offers GPU-powered virtual machines and a Kubernetes-based container service. Paperspace Deployments are containers-as-a-service that allow you to run container images and serve machine learning models using a high-performance, low-latency service with a RESTful API.
1. Set Up Your Paperspace Account:
   - Navigate to Gradient > Deployments > Create.
2. Select Your Hardware:
   - Choose a GPU for your deployment, such as the P4000 at $0.51/hr. Remember, you can start and stop the deployment as needed to manage costs.
3. Configure Your Docker Image:
   - Use the Docker image `ollama/ollama:latest`. For more details on this image, see the Ollama GitHub page.
4. Set the Ports:
   - Specify port 11434, Ollama's default listening port, for your deployment.
5. Deploy and Access:
   - Upon deployment, you'll receive an HTTPS endpoint.
   - Pull a model, for example `llama3`, using the following command:

     curl https://<yourendpoint>.paperspacegradient.com/api/pull -d '{"name": "llama3"}'
You’re done! You now have an OpenAI-compatible API endpoint available at your Gradient URL: Ollama serves its OpenAI-compatible routes under the /v1 path, so it will work with OpenAI-compatible tools and SDKs.
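As a quick sketch of how a client would talk to the new endpoint, here is a minimal Python example using only the standard library. The hostname is a placeholder for your actual Gradient URL, and the request shape follows Ollama's OpenAI-compatible chat completions route under /v1:

```python
import json
from urllib import request

# Placeholder -- substitute your actual Gradient deployment URL.
ENDPOINT = "https://example.paperspacegradient.com"

def build_chat_request(endpoint: str, model: str, prompt: str) -> request.Request:
    """Build an OpenAI-style chat completion request for an Ollama server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return request.Request(
        url=endpoint + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(ENDPOINT, "llama3", "Hello!")
# Uncomment to actually send the request once the deployment is live:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same payload works with the official OpenAI client libraries if you point their base URL at `https://<yourendpoint>.paperspacegradient.com/v1`.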
Paperspace Deployment Configuration JSON (note that this example spec uses an RTX4000 rather than the P4000 chosen above):

    {
      "apiVersion": "v1",
      "image": "ollama/ollama:latest",
      "name": "ollama",
      "enabled": false,
      "resources": {
        "machineType": "RTX4000",
        "replicas": 1,
        "ports": [
          11434
        ]
      }
    }
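Before submitting a spec like the one above, it can be worth sanity-checking the handful of fields this tutorial relies on. The sketch below validates only those fields, not the full Paperspace schema:

```python
import json

# The deployment spec from above, embedded as a string for illustration.
SPEC = """
{
  "apiVersion": "v1",
  "image": "ollama/ollama:latest",
  "name": "ollama",
  "enabled": false,
  "resources": {
    "machineType": "RTX4000",
    "replicas": 1,
    "ports": [11434]
  }
}
"""

def validate_spec(text: str) -> dict:
    """Parse the spec and check the fields this deployment depends on."""
    spec = json.loads(text)
    assert spec["image"].startswith("ollama/"), "expected an Ollama image"
    assert 11434 in spec["resources"]["ports"], "Ollama's port 11434 must be exposed"
    assert spec["resources"]["replicas"] >= 1, "need at least one replica"
    return spec

spec = validate_spec(SPEC)
```

Note that `"enabled": false` leaves the deployment created but not running, so you can flip it on once you are ready to incur GPU charges.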