AI Currency Recognition Project using ESP32-CAM
This project detects currency notes in real time using an ESP32-CAM and a YOLO-based Python server.
The ESP32-CAM captures images and sends them to the server over Wi-Fi. The server processes the image and returns the detected currency value, which is displayed on an OLED screen.
The system splits tasks between the device (image capture) and the server (AI processing), enabling fast performance on low-power hardware.

Components Required
- ESP32-CAM
- SSD1306 OLED Display (128×64, I2C)
- CH340G USB to Serial Module
- Connecting wires
- Breadboard
- Micro-USB cable
- 2.4 GHz Wi-Fi network (ESP32-CAM does not support 5 GHz)
- PC or laptop (to run the server)
About Components
ESP32-CAM

The ESP32-CAM is used to capture images of currency notes and send them wirelessly.
Features
- OV2640 camera
- Built-in Wi-Fi
- PSRAM support
- JPEG compression
- Compact and low-cost design
SSD1306 OLED Display

Displays the detected currency value in real time.
Features
- Resolution: 128×64
- I2C communication
- High contrast display
- Low power consumption
What is YOLO
YOLO (You Only Look Once) is an object-detection model that locates and labels the objects in an image in a single pass.
In this project, it analyzes images of currency notes and determines their value (₹20, ₹100, ₹500, etc.).
How it works
- Looks at the image
- Detects the object
- Outputs the label
Example
If shown a ₹100 note, YOLO identifies it as “₹100”.
Why YOLO is used
- Fast
- Works in real time
- Good accuracy
YOLO Model
YOLO is trained on a custom dataset of currency notes.
Features
- Detects different currency values (₹10, ₹20, ₹50, etc.)
- Real-time inference
- High accuracy with proper training
- Lightweight versions available
Why YOLO is Trained
A YOLO model only recognizes the object classes it was trained on, so it does not recognize currency notes by default. It must be trained on images of the notes it should identify.
During training:
- Learns from many images of different notes
- Identifies patterns such as numbers, colors, and design
- Learns to distinguish between note values
After training:
- Analyzes new images
- Accurately identifies the currency value
Training the YOLO Model (Google Colab)
The model was trained using Google Colab with a custom dataset.
The Colab notebook includes dataset setup, path correction, training, and model export.
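
As a rough illustration of the training step, the sketch below uses the Ultralytics Python API. The base weights (`yolov8n.pt`), the dataset file name (`data.yaml`), and the epoch count are assumptions made for this example and are not taken from the project's notebook; only the 320 px image size comes from the server settings described later.

```python
def training_args(data_yaml: str = "data.yaml") -> dict:
    """Collect the keyword arguments passed to Ultralytics' train() call."""
    return {
        "data": data_yaml,  # hypothetical dataset description file
        "epochs": 50,       # hypothetical value; tune for your dataset
        "imgsz": 320,       # matches the server's inference size
    }


def train(base_weights: str = "yolov8n.pt", data_yaml: str = "data.yaml"):
    """Fine-tune a pretrained YOLO model on the currency dataset."""
    # Imported lazily so the helper above works without Ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(base_weights)  # assumed starting checkpoint
    return model.train(**training_args(data_yaml))
```

Calling `train()` in a Colab cell would start a run; with default settings Ultralytics saves the best checkpoint as `best.pt` in the run's `weights` folder, which is the file the server loads.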
System Architecture

- ESP32 captures image of currency note
- Flash LED ensures proper illumination
- Image sent via HTTP POST
- Python server decodes image
- YOLO detects denomination
- Highest-confidence result selected
- Result displayed on OLED
- Endpoint: ESP32 sends image to http://<server-ip>:5000/detect
- Format: Images are sent as JPEG byte stream via HTTP POST
- Network Requirement: ESP32 and server must be on the same Wi-Fi network
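
The upload step can be reproduced from a PC, which is handy for testing the server without the ESP32-CAM. The sketch below builds the same kind of request with Python's standard library. The `Content-Type` header and the assumption that the server replies with the label as plain text are guesses for illustration; the function names are hypothetical.

```python
import urllib.request


def build_detect_request(server_ip: str, jpeg_bytes: bytes) -> urllib.request.Request:
    """Build an HTTP POST carrying raw JPEG bytes to the /detect endpoint."""
    url = f"http://{server_ip}:5000/detect"
    return urllib.request.Request(
        url,
        data=jpeg_bytes,                         # raw JPEG byte stream as the body
        headers={"Content-Type": "image/jpeg"},  # assumed header
        method="POST",
    )


def send_frame(server_ip: str, jpeg_path: str, timeout: float = 5.0) -> str:
    """Send one JPEG file and return the server's reply as text (assumed format)."""
    with open(jpeg_path, "rb") as f:
        request = build_detect_request(server_ip, f.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

For example, `send_frame("192.168.1.10", "note.jpg")` would post a saved photo to a server at that (hypothetical) address on the same network.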
Why it is Fast
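- It uses small, JPEG-compressed images, so each upload is only a few kilobytes
- Small frames transfer quickly over the local Wi-Fi network
- YOLO inference runs on the PC, which is far faster than the ESP32 could manage on its own
- The result appears in about 1–2 seconds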
Circuit Connections


(Circuit and Schematic Diagram)

(Circuit to upload code)
OLED → ESP32-CAM
- SDA → GPIO 13
- SCL → GPIO 14
- VCC → 3.3V / 5V
- GND → GND
Flash LED
- Controlled via onboard GPIO (LED_GPIO_NUM)
Code Explanation
ESP32-CAM Code

Stores the Wi-Fi credentials and the server endpoint. Update ssid, password, and the server IP with your own network details.

Initializes the 128×64 OLED display and defines a helper function that shows the detected currency label, optionally with FPS information.

Flushes stale frames, captures a fresh JPEG, sends it to the server via HTTP POST, and updates the OLED only when the detection result changes.

Checks the Wi-Fi status on every loop iteration. If the connection has dropped, it attempts to reconnect and resets the HTTP state to avoid reusing stale connections.
Python Code

Configures the YOLO model path, inference size (320 px), confidence threshold (0.8), and server host and port. Limits the number of inference threads to the available CPU cores.
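
A minimal sketch of such a configuration block is shown here. The values for model path, inference size, confidence threshold, and port come from this document; the class name, field names, and the `0.0.0.0` host are assumptions, and `torch.set_num_threads` is one plausible way to apply the thread limit rather than necessarily what the project does.

```python
import os
from dataclasses import dataclass, field


def available_cores() -> int:
    """Number of CPU cores, falling back to 1 if it cannot be determined."""
    return os.cpu_count() or 1


@dataclass(frozen=True)
class ServerConfig:
    model_path: str = "best.pt"    # trained weights next to the server script
    imgsz: int = 320               # inference size in pixels
    conf_threshold: float = 0.8    # minimum confidence to accept a detection
    host: str = "0.0.0.0"          # assumed: listen on all interfaces
    port: int = 5000
    threads: int = field(default_factory=available_cores)


def apply_thread_limit(cfg: ServerConfig) -> None:
    """Cap PyTorch's CPU threads (assumed mechanism for the thread limit)."""
    import torch  # imported lazily; only needed on the inference machine

    torch.set_num_threads(cfg.threads)
```

Keeping the settings in one frozen object makes it easy to report them later, for example from a status endpoint.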

Loads the YOLO model, fuses the convolution and batch-normalization layers for speed, then runs 3 dummy frames so that one-time setup (such as JIT kernel compilation) finishes before real requests arrive.
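
The load, fuse, and warm-up sequence might look roughly like the sketch below, written against the Ultralytics API. The function names and the all-black dummy frame are assumptions; the project's actual warm-up input is not shown in this document.

```python
def warm_up(predict, frame, runs: int = 3) -> int:
    """Call predict(frame) `runs` times; return how many calls were made."""
    for _ in range(runs):
        predict(frame)
    return runs


def load_model(model_path: str = "best.pt", imgsz: int = 320):
    """Load the YOLO weights, fuse layers, and run warm-up inferences."""
    # Lazy imports so warm_up() can be exercised without these packages.
    import numpy as np
    from ultralytics import YOLO

    model = YOLO(model_path)
    model.fuse()  # merge Conv + BatchNorm layers for faster inference

    # Assumed dummy input: a black BGR frame at the inference size.
    dummy = np.zeros((imgsz, imgsz, 3), dtype=np.uint8)
    warm_up(lambda f: model.predict(f, imgsz=imgsz, verbose=False), dummy)
    return model
```

Because `warm_up` accepts any callable, it can be checked with a stand-in function before the real model is available.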

Receives the raw JPEG bytes, decodes them using OpenCV, runs YOLO inference inside a lock (so that concurrent requests do not thrash the CPU), picks the highest-confidence detection, and returns its label.
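
The request-handling logic described above can be sketched as follows. The `infer` callable stands in for the YOLO call and is expected to return `(label, confidence)` pairs; that shape, the `"none"` and `"bad image"` replies, and all function names are assumptions for illustration.

```python
import threading

INFER_LOCK = threading.Lock()  # allows only one inference at a time


def decode_jpeg(raw: bytes):
    """Decode JPEG bytes to a BGR image; None when the bytes are not a valid image."""
    import cv2          # lazy imports: only needed on the server machine
    import numpy as np

    buf = np.frombuffer(raw, dtype=np.uint8)
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)


def pick_best(detections, min_conf: float = 0.8, default: str = "none") -> str:
    """Return the label of the highest-confidence detection above min_conf."""
    kept = [d for d in detections if d[1] >= min_conf]
    if not kept:
        return default  # hypothetical reply when nothing is confident enough
    return max(kept, key=lambda d: d[1])[0]


def handle_frame(raw: bytes, infer, decode=decode_jpeg) -> str:
    """Decode one frame, run inference under the lock, and return a label."""
    frame = decode(raw)
    if frame is None:
        return "bad image"  # hypothetical error reply
    with INFER_LOCK:
        detections = infer(frame)
    return pick_best(detections)
```

In a Flask route, `handle_frame(request.get_data(), infer)` would be the body of the `/detect` handler, with `infer` wrapping the model's predict call.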

Uses a thread-safe deque to store the last 30 frame timestamps, calculates rolling FPS, and supplies real-time performance metrics to the ESP32.
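
One way to implement this rolling FPS counter is sketched below, using a bounded `deque` guarded by a lock. The class and method names are hypothetical, and the optional `now` argument exists only to make the example easy to test.

```python
import threading
import time
from collections import deque


class FpsMeter:
    """Rolling FPS over the most recent `window` frame timestamps."""

    def __init__(self, window: int = 30):
        self._stamps = deque(maxlen=window)  # old timestamps drop off automatically
        self._lock = threading.Lock()

    def tick(self, now=None) -> None:
        """Record that a frame was processed."""
        stamp = time.monotonic() if now is None else now
        with self._lock:
            self._stamps.append(stamp)

    def fps(self) -> float:
        """Frames per second across the stored window (0.0 if too few samples)."""
        with self._lock:
            if len(self._stamps) < 2:
                return 0.0
            elapsed = self._stamps[-1] - self._stamps[0]
            if elapsed <= 0:
                return 0.0
            # N timestamps span N - 1 frame intervals.
            return (len(self._stamps) - 1) / elapsed
```

The server would call `tick()` once per processed frame and include `fps()` in its reply or status output.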

Provides a /stats endpoint to verify server health and view current FPS, model info, thread count, and inference settings.
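
A status endpoint of this kind could be wired up in Flask as sketched below. The JSON field names are assumptions, since the actual response format is not shown in this document, and `get_fps` is a hypothetical callable that returns the current rolling FPS.

```python
import os


def stats_payload(fps: float, model_path: str = "best.pt",
                  imgsz: int = 320, conf: float = 0.8) -> dict:
    """Assemble the status report (field names are illustrative)."""
    return {
        "status": "ok",
        "fps": round(fps, 2),
        "model": model_path,
        "threads": os.cpu_count() or 1,
        "imgsz": imgsz,
        "conf": conf,
    }


def create_app(get_fps):
    """Build a Flask app exposing GET /stats."""
    from flask import Flask, jsonify  # lazy import; Flask only needed to serve

    app = Flask(__name__)

    @app.route("/stats")
    def stats():
        return jsonify(stats_payload(get_fps()))

    return app
```

With this sketch, `create_app(lambda: 0.0).run(host="0.0.0.0", port=5000)` would serve the report at the address used in the setup steps.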
Performance and Reliability
- The ESP32 uses QQVGA resolution (160×120) and JPEG quality 12 to keep payloads small while preserving enough detail.
- Persistent HTTP connections and Wi-Fi power tuning reduce latency.
- The Python server warms up the model ahead of time so the first request is faster.
- The detection loop avoids extra delays and only redraws the OLED when the result changes.
Why this Architecture Works
- The ESP32 handles capture, connectivity, and display.
- Heavy AI inference runs on a dedicated Python server.
- This split keeps firmware lightweight and the overall system more reliable.
- The OLED display gives immediate feedback for the detected currency label and processing speed.
Speed and Performance
- Uses small QQVGA JPEG images to keep each upload short
- Returns a result in about 1–2 seconds
- Runs continuously without pausing between detections
Python Server Setup (PC)
- Install dependencies: pip install ultralytics flask opencv-python numpy torch
- Place `best.pt` in the same folder as the server script.
- Run the server: python server.py
- Verify server: http://localhost:5000/stats
Real-Life Applications
- Currency Identification System: Helps users quickly identify denominations without manual inspection.
- Assistive Tool for the Visually Impaired: Provides instant feedback on currency value.
- Smart Vending Machines: Automatically verify inserted notes.
- Retail Automation: Detect and validate cash transactions.
- Educational AI Projects: Demonstrates real-world computer vision deployment.
Result
The system identifies currency denominations in near real time using the ESP32-CAM and an optimized YOLO server. Images are captured, transmitted, processed, and classified, and the result appears on the OLED within about 1–2 seconds.
The optimized backend provides stable inference, reduced latency, and consistent performance, which makes the system suitable for practical currency detection applications.