Build Your First Zero to Hero Application from Scratch

1. Introduction#

Building an AI application from scratch, particularly one leveraging large language models, can feel overwhelming. In this post, we’ll walk through a detailed, end-to-end workflow for developing a practical “Zero to Hero” application. Rather than a simple overview, we’ll dive deep into the process with extended hands-on examples, training logs, and code snippets, so you have all the information you need to build and deploy a working prototype.

1.1 Outline of This Guide#

We’ll cover:

  • Hardware and software setup
  • How to choose and fine-tune a pre-trained model (like DistilBERT or GPT-like models)
  • Building an API with FastAPI
  • Debugging and optimizing through logs
  • Packaging and deploying your solution with Docker

Throughout the guide, we’ll reference Python scripts and command-line output so you can see what the training and inference processes look like in practice.


2. Why Create Your Own “Zero to Hero” Application?#

2.1 Hands-On Learning#

Nothing accelerates your understanding more than direct experimentation. By training your own model, you’ll see firsthand how hyperparameters, data cleaning, and code organization affect performance.

2.2 Rapid Iteration#

Demos and basic tutorials are a good start, but building a real application forces you to iterate: you refine your data pipeline based on actual feedback and see the results of each change immediately.

2.3 Full Ownership and Customization#

Even if you rely on open-source libraries, you maintain control of your application’s entire stack. That means you can tailor the solution to specific requirements—like domain-specific text classification, advanced prompt engineering, or custom deployment scenarios.


3. Setting Up the Foundation#

3.1 Environment Setup#

3.1.1 Hardware Considerations#

  • Local GPU: NVIDIA GPUs with sufficient VRAM (8GB or more) make local experimentation smoother.
  • Cloud Providers: If local hardware is lacking, AWS EC2 (GPU instances), Azure ML, and Google Cloud (Compute Engine with GPU) are great options. Look for machine images pre-installed with CUDA and popular deep learning frameworks.

3.1.2 Software Stack#

  • Python 3.8 or above (type hints and better async support).
  • PyTorch or TensorFlow (we’ll focus on PyTorch in our examples).
  • Hugging Face Transformers (for easy model loading and fine-tuning).
  • FastAPI or Flask (we’ll choose FastAPI for a more modern async approach).
  • Docker (for containerization, if you plan to deploy at scale).

Below is one approach to organizing your files. Feel free to adapt as your application grows:

my_zero_to_hero_app/
├── data/
│   ├── raw/
│   └── processed/
├── models/
│   ├── checkpoints/
│   └── final/
├── scripts/
│   ├── train.py
│   ├── predict.py
│   └── helpers.py
├── app/
│   ├── main.py
│   └── config.py
├── tests/
│   └── test_app.py
├── requirements.txt
└── Dockerfile

4. Selecting Your Model and Task#

4.1 Task Selection#

Popular tasks for language models include:

  • Sentiment Analysis
  • Named Entity Recognition (NER)
  • Text Summarization
  • Question Answering

For illustration, let’s pick a straightforward task: sentiment analysis. This is common, easy to prototype, and highly versatile.

4.2 Model Preference#

  • Start Small: We’ll look at a DistilBERT-based model first for minimal resource requirements (a quick sanity-check sketch follows this list).
  • Scale Later: If performance or accuracy is lacking, you can move up to BERT-base, GPT-3.5, GPT-4, or specialized large language models.
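
Before committing to fine-tuning, it can help to sanity-check the task with an off-the-shelf checkpoint. The sketch below uses the Transformers pipeline API with a publicly available DistilBERT sentiment checkpoint; treat it as a quick baseline rather than part of the final application.

from transformers import pipeline

# Off-the-shelf sentiment model for a quick baseline (no fine-tuning yet)
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("I loved this product, would absolutely buy again!"))
# Output along the lines of: [{'label': 'POSITIVE', 'score': 0.99}]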

5. Data Preparation#

Whatever the task, you need relevant data. For sentiment analysis:

  1. Collect text samples with sentiment labels (e.g., positive, neutral, negative).
  2. Clean them (remove duplicates, unwanted symbols).
  3. Split into training, validation, and test sets (e.g., 80% training, 10% validation, 10% test); a preprocessing sketch follows the directory layout below.

Below is a sample dataset structure:

data/
├── raw/
│   └── sentiment_dataset_raw.csv
└── processed/
    ├── train.csv
    ├── val.csv
    └── test.csv
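
To make steps 2 and 3 above concrete, here is a minimal preprocessing sketch using pandas and scikit-learn. The file name scripts/prepare_data.py and the text/label column names are assumptions about your raw CSV; adjust them to your data.

scripts/prepare_data.py
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumes the raw CSV has 'text' and 'label' columns
df = pd.read_csv("data/raw/sentiment_dataset_raw.csv")

# Basic cleaning: drop empty rows and duplicates, trim stray whitespace
df = df.dropna(subset=["text", "label"]).drop_duplicates(subset="text")
df["text"] = df["text"].str.strip()

# 80/10/10 split, stratified on the label to preserve class balance
train_df, temp_df = train_test_split(df, test_size=0.2, stratify=df["label"], random_state=42)
val_df, test_df = train_test_split(temp_df, test_size=0.5, stratify=temp_df["label"], random_state=42)

train_df.to_csv("data/processed/train.csv", index=False)
val_df.to_csv("data/processed/val.csv", index=False)
test_df.to_csv("data/processed/test.csv", index=False)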

6. Fine-Tuning the Model#

6.1 Installing Required Packages#

Make sure your environment is set up with the correct libraries. For PyTorch and Transformers:

Terminal window
pip install torch==2.0.0
pip install transformers==4.30.0
pip install datasets
pip install fastapi uvicorn

6.2 Training Script (train.py)#

Below is an illustrative script using Hugging Face Transformers. It covers data loading, model initialization, and training via the Trainer API. Sample logs follow so you can see what typical output looks like.

scripts/train.py
import argparse

import numpy as np
from datasets import load_dataset
from transformers import (
    DistilBertForSequenceClassification,
    DistilBertTokenizerFast,
    Trainer,
    TrainingArguments,
)


def parse_args():
    parser = argparse.ArgumentParser(description="Fine-tune DistilBERT for sentiment analysis.")
    parser.add_argument("--train_file", type=str, required=True, help="Path to training CSV.")
    parser.add_argument("--val_file", type=str, required=True, help="Path to validation CSV.")
    parser.add_argument("--epochs", type=int, default=3, help="Number of training epochs.")
    parser.add_argument("--batch_size", type=int, default=16, help="Batch size.")
    parser.add_argument("--lr", type=float, default=2e-5, help="Learning rate.")
    parser.add_argument("--output_dir", type=str, default="models/checkpoints", help="Directory to save model checkpoints.")
    return parser.parse_args()


def compute_metrics(eval_pred):
    # Report accuracy alongside eval_loss so it appears in the evaluation logs
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": (predictions == labels).mean()}


def main():
    args = parse_args()

    # Load dataset using the Hugging Face 'datasets' library
    dataset = load_dataset("csv", data_files={"train": args.train_file, "validation": args.val_file})

    # Load the DistilBERT tokenizer
    tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")

    def tokenize_function(example):
        return tokenizer(example["text"], padding="max_length", truncation=True, max_length=128)

    # Tokenize the dataset
    tokenized_datasets = dataset.map(tokenize_function, batched=True)

    # Rename the label column to 'labels' for the HF Trainer
    tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
    tokenized_datasets.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

    train_dataset = tokenized_datasets["train"]
    val_dataset = tokenized_datasets["validation"]

    # Load the model
    model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

    training_args = TrainingArguments(
        output_dir=args.output_dir,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=args.epochs,
        per_device_train_batch_size=args.batch_size,
        per_device_eval_batch_size=args.batch_size,
        learning_rate=args.lr,
        logging_steps=10,
        logging_dir=f"{args.output_dir}/logs",
        load_best_model_at_end=True,
        metric_for_best_model="accuracy",  # pick the checkpoint with the best validation accuracy
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        tokenizer=tokenizer,  # saved with the model so predict.py can load it from the same path
        compute_metrics=compute_metrics,
    )

    trainer.train()
    trainer.save_model(args.output_dir)


if __name__ == "__main__":
    main()

6.3 Sample Training Logs#

When you run the script, you might see logs similar to the following:

$ python scripts/train.py --train_file data/processed/train.csv --val_file data/processed/val.csv --epochs 3 --batch_size 16
***** Running training *****
Num examples = 16000
Num Epochs = 3
Instantaneous batch size per device = 16
...
Steps: 10 | Loss: 0.5612 | Learning Rate: 1.999e-05
Steps: 20 | Loss: 0.4425 | Learning Rate: 1.998e-05
Steps: 30 | Loss: 0.3894 | Learning Rate: 1.997e-05
...
Epoch 1: eval_loss=0.3204, eval_accuracy=0.8743
Saving model checkpoint to models/checkpoints/checkpoint-1
...
Epoch 2: eval_loss=0.2855, eval_accuracy=0.8932
Saving model checkpoint to models/checkpoints/checkpoint-2
...
Epoch 3: eval_loss=0.2711, eval_accuracy=0.9025
Saving model checkpoint to models/checkpoints/checkpoint-3
Loading best model from models/checkpoints/checkpoint-3 (score: eval_accuracy=0.9025).
Saving final model to models/checkpoints

Logs like these help track your progress, enabling you to experiment with hyperparameters and see how each experiment performs over time.


7. Building the Application#

7.1 Inference Script (predict.py)#

Once you have a fine-tuned model, create a script to load it and run inference on new text samples.

scripts/predict.py
import torch
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

# Load your final model
MODEL_PATH = "models/checkpoints"
tokenizer = DistilBertTokenizerFast.from_pretrained(MODEL_PATH)
model = DistilBertForSequenceClassification.from_pretrained(MODEL_PATH)
model.eval()  # inference mode: disables dropout


def predict_sentiment(text: str):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    predicted_class_id = logits.argmax().item()
    # 0 for negative, 1 for positive in this example
    return "positive" if predicted_class_id == 1 else "negative"


if __name__ == "__main__":
    sample_texts = [
        "I loved this product, would absolutely buy again!",
        "This was terrible, I want my money back.",
    ]
    for txt in sample_texts:
        sentiment = predict_sentiment(txt)
        print(f"Text: {txt} -> Sentiment: {sentiment}")

7.2 Web Framework Setup (main.py + FastAPI)#

Now, let’s expose your inference function via FastAPI so end-users or other applications can call your model with HTTP requests.

app/main.py
from fastapi import FastAPI
from pydantic import BaseModel

from scripts.predict import predict_sentiment

app = FastAPI()


class TextPayload(BaseModel):
    text: str


@app.post("/predict")
def predict(payload: TextPayload):
    sentiment_label = predict_sentiment(payload.text)
    return {"sentiment": sentiment_label}


@app.get("/")
def root():
    return {"message": "Welcome to the Zero to Hero Sentiment Analysis API!"}

Running the API:#

uvicorn app.main:app --host 0.0.0.0 --port 8000

Sample logs upon starting the server:

INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

You can now send a POST request to http://localhost:8000/predict with a JSON body {"text": "Your input text"} and receive a sentiment label in response.

Sample cURL request:

Terminal window
curl -X POST -H "Content-Type: application/json" \
-d '{"text": "This new movie trailer is brilliant!"}' \
http://localhost:8000/predict
# Expected Response:
# {"sentiment":"positive"}

8. Testing and Validation#

8.1 Functional Tests#

A small test might look like this:

tests/test_app.py
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


def test_root():
    response = client.get("/")
    assert response.status_code == 200
    assert response.json() == {"message": "Welcome to the Zero to Hero Sentiment Analysis API!"}


def test_predict_endpoint():
    response = client.post("/predict", json={"text": "I love this!"})
    assert response.status_code == 200
    assert "sentiment" in response.json()

Running tests:

pytest tests/

8.2 Load Tests#

For concurrency and performance testing, try a tool like Locust or Apache JMeter. Example using Locust:

Terminal window
locust -f locustfile.py

Where locustfile.py might look like:

from locust import HttpUser, between, task


class APILoadTest(HttpUser):
    wait_time = between(1, 5)

    @task
    def predict_sentiment(self):
        self.client.post("/predict", json={"text": "The user experience is amazing so far."})

9. Deployment Approaches#

9.1 Containerization with Docker#

Creating a Dockerfile ensures consistency across different environments:

# Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
# Expose FastAPI port
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Then, build and run your container:

Terminal window
docker build -t zero-to-hero-app .
docker run -p 8000:8000 zero-to-hero-app

9.2 Cloud Deployment#

  • AWS ECS or EKS: Push your Docker image to ECR, then orchestrate with ECS or run on Kubernetes via EKS.
  • Serverless: For smaller applications with sporadic traffic, consider AWS Lambda or Google Cloud Functions (though large model cold-starts can pose a challenge).

10. Common Obstacles and Troubleshooting#

  1. Insufficient Compute: Fine-tuning large models can be resource-intensive. Use smaller or distilled models, or experiment with parameter-efficient methods like LoRA or adapters if you’re limited by hardware or budget.
  2. Latency Issues: Real-time inference might be slow if the model is large. Solutions include model quantization (FP16, INT8), CPU/GPU optimization, or caching strategies; a quantization sketch follows this list.
  3. Dataset Shift / Model Drift: Real-world data may differ from your training distribution over time. Set up monitoring pipelines to detect changes in data patterns and re-fine-tune your model when needed.
  4. Version Conflicts: Always pin your library versions in requirements.txt or environment.yml to avoid unexpected breaks when libraries update.
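
As a concrete example for the latency point above, PyTorch’s dynamic quantization can convert the fine-tuned model’s linear layers to INT8 at load time. This is a minimal sketch, assuming the checkpoint saved in Section 6; measure both latency and accuracy before adopting it, since quantization usually trades a little of each.

import torch
from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained("models/checkpoints")
model.eval()

# Convert nn.Linear weights to INT8; activations are quantized on the fly at inference time
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# quantized_model is then used exactly like the original model in predict_sentiment()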

11. Extended Demo: Multi-lingual Twist#

If you want to enhance your “Zero to Hero” app, consider making it multi-lingual:

  • Use XLM-RoBERTa or M-BERT models which are pre-trained on multiple languages.
  • Collect multilingual training data for sentiment analysis.
  • During inference, detect the language automatically (e.g., using langdetect) and route to the appropriate model or multilingual model.

Sample Code Snippet#

scripts/predict_multilingual.py
import torch
from langdetect import detect
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification

MODEL_PATH = "models/multilingual_checkpoint"
tokenizer = XLMRobertaTokenizer.from_pretrained(MODEL_PATH)
model = XLMRobertaForSequenceClassification.from_pretrained(MODEL_PATH)
model.eval()


def predict_sentiment_multilingual(text: str):
    # Detect the input language
    lang = detect(text)
    # Tokenize and run inference
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_class_id = outputs.logits.argmax().item()
    label = "positive" if predicted_class_id == 1 else "negative"
    return lang, label

12. Conclusion and Next Steps#

Congratulations! You’ve built a fully functional “Zero to Hero” sentiment analysis application from the ground up:

  • Organized your project files.
  • Prepared data and fine-tuned a DistilBERT model.
  • Exposed the model via a FastAPI endpoint.
  • Containerized your solution using Docker.
  • Explored advanced scenarios like multi-lingual setup.

12.1 Possible Directions#

  1. Add a Front-End
    Create a React, Vue, or Angular front-end to provide a user-friendly interface around the API.
  2. Automate CI/CD
    Use GitHub Actions or GitLab CI to automate testing, building, and deploying new versions of your application.
  3. Experiment with Larger Models
    If resources allow, try BERT-large, GPT-3.5, or GPT-4 to see how performance changes in terms of accuracy and latency.
  4. Expand to Other Use Cases
    Move into domains like summarization, Q&A, or named entity recognition to scale up your AI capabilities.