Machine learning, computer vision, building powerful APIs, and creating beautiful user interfaces are interesting areas that are witnessing many innovations. The first two require extensive math and science, while API and UI development focuses on algorithmic thinking and the design of flexible architectures.
They are very different, so deciding what you want to learn next can be a challenge. The purpose of this article is to demonstrate how all four can be used in creating an image processing application.
The application we are going to build is a simple digit recognizer: you draw a digit, and the machine predicts it. Simplicity is essential because it lets us see the big picture rather than get lost in details.
For the sake of simplicity, we will use the most popular and easy-to-learn technologies. The machine learning part will use Python for the back-end application. As for the interactive part of the application, we will use a JavaScript library that needs no introduction: React.
Machine learning to guess numbers
The central part of our application is the algorithm that guesses the drawn number. Machine learning will be the tool we use to obtain a good-quality hypothesis. This kind of basic artificial intelligence allows a system to learn automatically from a certain amount of data. In broader terms, machine learning is the process of finding a pattern or set of patterns in data that can be relied on to guess an outcome.
Our image recognition process consists of three steps:
- Get pictures with the numbers drawn for training
- Train the system to guess the numbers through training data
- Test the system with new/unknown data
Environment
We will need a virtual environment to work with machine learning in Python. This approach is handy because it isolates all the necessary Python packages for the project, so you don't have to worry about conflicts with system-wide packages.
Let’s install it with the following terminal commands:
python3 -m venv virtualenv
source virtualenv/bin/activate
Training model
Before we start writing the code, we need to choose a suitable "teacher" for our machine. Usually, data science professionals try different models before choosing the best one. We will skip very advanced models that require a lot of skill and continue with the k-nearest neighbors algorithm.
It is an algorithm that obtains some data samples and arranges them on an ordered plane according to a given set of characteristics. To better understand, let’s review the following image:
Image Processing
Image processing is a method of performing certain operations on an image to enhance it or extract useful information from it. In our case, we need to convert the image drawn by the user into the format expected by the machine learning model.
Let’s import some helpers to achieve that goal:
import numpy as np
from skimage import exposure
import base64
from PIL import Image, ImageOps, ImageChops
from io import BytesIO
We can split the transition into six distinct parts:
Replace a transparent background with a color
def replace_transparent_background(image):
    image_arr = np.array(image)

    if len(image_arr.shape) == 2:
        return image

    alpha1 = 0
    r2, g2, b2, alpha2 = 255, 255, 255, 255

    red, green, blue, alpha = image_arr[:, :, 0], image_arr[:, :, 1], image_arr[:, :, 2], image_arr[:, :, 3]
    mask = (alpha == alpha1)
    image_arr[:, :, :4][mask] = [r2, g2, b2, alpha2]

    return Image.fromarray(image_arr)
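Only the first of the six parts is shown above. As an illustration of where the pipeline starts, here is a hedged sketch of decoding the data URL sent by the front-end canvas into a PIL image; the function name is an assumption for illustration, not code from the original app:

```python
import base64
from io import BytesIO
from PIL import Image

def data_url_to_image(data_url):
    # A browser canvas exports its content as "data:image/png;base64,<payload>";
    # strip the header and decode the base64 payload into a PIL image.
    _, encoded = data_url.split(",", 1)
    return Image.open(BytesIO(base64.b64decode(encoded)))
```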
To detect the class of the green dot, we check the classes of its k nearest neighbors, where k is an argument of the algorithm. Given the image above, if k equals 1, 2, 3, or 4, the prediction will be a black triangle, because most of the green dot's k nearest neighbors are black triangles. If we increase k to 5, then the majority of the nearest objects are blue squares, so the prediction becomes a blue square.
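To make the effect of k concrete, here is a small sketch with made-up 2D points standing in for the triangles and squares, showing how the prediction can flip as k grows:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two "black triangle" points (class 0) right next to the query point,
# surrounded by three "blue square" points (class 1) slightly farther away.
X = np.array([[0.0, 0.0], [0.0, 1.0],                 # class 0
              [1.0, 0.0], [1.0, 1.0], [1.0, 0.5]])    # class 1
y = np.array([0, 0, 1, 1, 1])

green_dot = [[0.0, 0.5]]  # the point we want to classify

for k in (1, 3, 5):
    model = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, model.predict(green_dot)[0])  # k=1 and k=3 predict 0; k=5 predicts 1
```

With k=1 or k=3, the majority of the nearest neighbors belong to class 0; once k grows to 5, the three class-1 points outvote them.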
There are several dependencies required to create our machine learning model:
- sklearn.neighbors.KNeighborsClassifier is the classifier we will use.
- sklearn.model_selection.train_test_split is the function that splits the data into a training set and a set used to verify the model's correctness.
- sklearn.model_selection.cross_val_score is the function that scores the model's correctness; the higher the value, the better.
- sklearn.metrics.classification_report is the function that displays a statistical report of the model's predictions.
- sklearn.datasets is the package used to obtain the training data (images of digits).
- NumPy is widely used in scientific computing because it provides a productive and convenient way to manipulate multidimensional data structures in Python.
- matplotlib.pyplot is the package used to visualize the data.
Let’s start by installing and importing all of them:
pip install scikit-learn numpy matplotlib scipy
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, cross_val_score
import numpy as np
import matplotlib.pyplot as plt
Now we need to load the handwritten digits dataset.
Note that load_digits gives us scikit-learn's built-in digits dataset (small 8×8 images in the style of MNIST), a classic set of handwritten digits used by thousands of machine learning beginners.
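Loading the data is a one-liner with load_digits; each image comes flattened into 64 features (8×8 pixels), which is exactly the shape the classifier expects:

```python
from sklearn.datasets import load_digits

digits = load_digits()

print(digits.data.shape)    # (1797, 64): 1797 images flattened to 64 pixel features
print(digits.target.shape)  # (1797,): the digit label (0-9) for each image
```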
Once the data is retrieved and ready, we can move on to the next step of dividing the data into two parts: training and testing.
We will use 75% of the data to train our model to guess numbers and we will use the rest of the data to test the correctness of the model:
(X_train, X_test, y_train, y_test) = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=42
)
The data is now prepared and we are ready to use it. We will try to find the best parameter k for our model so that its predictions are as accurate as possible. We cannot simply pick a value of k at this stage; we have to evaluate the model with several different values of k.
Let’s see why it is essential to consider a range of k values and how this improves the accuracy of our model:
ks = np.arange(2, 10)
scores = []

for k in ks:
    model = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(model, X_train, y_train, cv=5)
    scores.append(score.mean())

plt.plot(scores, ks)
plt.xlabel('accuracy')
plt.ylabel('k')
plt.show()
Executing this code will show you the following graph that describes the accuracy of the algorithm with different k values.
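Once the graph suggests a good k, the remaining step is to retrain the classifier with that value and inspect its quality with classification_report. A sketch, assuming k=3 turned out to be a reasonable choice (pick whichever k scored best on your graph):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=42
)

# k=3 is an assumption here; substitute the best k from the evaluation above.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Per-digit precision, recall, and f1-score on the held-out test set.
print(classification_report(y_test, model.predict(X_test)))
```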
Using Flask to build an API
The core of the application, the algorithm that predicts digits from images, is now ready. Next, we need to wrap the algorithm in an API layer to make it available for use. Let's use the popular Flask web framework to do this neatly and concisely.
We will start by installing Flask and the dependencies related to image processing in the virtual environment:
pip install Flask Pillow scikit-image
When the installation is complete, we move on to creating the application entry point file:
touch app.py
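The contents of app.py are not shown in the text, so here is a minimal sketch of what the entry point could look like. The route name and JSON shape are assumptions for illustration, and the model call is left as a placeholder:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/predict", methods=["POST"])
def predict():
    # The front-end sends the canvas content as a base64 data URL.
    data_url = request.json.get("image", "")
    if not data_url:
        return jsonify({"error": "no image provided"}), 400

    # In the real app: decode the data URL, run the image-processing
    # pipeline, then call the trained classifier. A constant stands in here.
    prediction = 0  # placeholder for model.predict(...)[0]
    return jsonify({"prediction": int(prediction)})

if __name__ == "__main__":
    app.run(debug=True)
```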
Creating a drawing panel through React
To quickly start the front application, we will use CRA boilerplate:
npx create-react-app frontend
cd frontend
After scaffolding the project, we also need a dependency for drawing digits. The react-sketch package fits our needs perfectly:
npm install react-sketch
The application has a single component. We can divide this component into two parts: logic and view.
The view part is responsible for rendering the drawing panel and the Submit and Reset buttons. After an interaction, it should also render a prediction or an error. The logic part has two tasks: sending the image and clearing the sketch.
Whenever a user clicks Submit, the component will extract the image from the sketch component and call the makePrediction function of the API module. If the back-end request succeeds, we set the prediction state variable. Otherwise, we update the error state.
When a user clicks Reset, the sketch will be deleted:
import React, { useRef, useState } from "react";
import { makePrediction } from "./api";
const App = () => {
const sketchRef = useRef(null);
const [error, setError] = useState();
const [prediction, setPrediction] = useState();
const handleSubmit = () => {
const image = sketchRef.current.toDataURL();
setPrediction(undefined);
setError(undefined);
makePrediction(image).then(setPrediction).catch(setError);
};
const handleClear = (e) => sketchRef.current.clear();
return null
}
The logic is sufficient. Now we can add the visual interface to it:
import React, { useRef, useState } from "react";
import { SketchField, Tools } from "react-sketch";
import { makePrediction } from "./api";
import logo from "./logo.svg";
import "./App.css";

const pixels = (count) => `${count}px`;
const percents = (count) => `${count}%`;

const MAIN_CONTAINER_WIDTH_PX = 200;
const MAIN_CONTAINER_HEIGHT = 100;
const MAIN_CONTAINER_STYLE = {
  width: pixels(MAIN_CONTAINER_WIDTH_PX),
  height: percents(MAIN_CONTAINER_HEIGHT),
  margin: "0 auto",
};
const SKETCH_CONTAINER_STYLE = {
  border: "1px solid black",
  width: pixels(MAIN_CONTAINER_WIDTH_PX - 2),
  height: pixels(MAIN_CONTAINER_WIDTH_PX - 2),
  backgroundColor: "white",
};
const App = () => {
const sketchRef = useRef(null);
const [error, setError] = useState();
const [prediction, setPrediction] = useState();
const handleSubmit = () => {
const image = sketchRef.current.toDataURL();
setPrediction(undefined);
setError(undefined);
makePrediction(image).then(setPrediction).catch(setError);
};
const handleClear = (e) => sketchRef.current.clear();
  return (
    <div className="App" style={MAIN_CONTAINER_STYLE}>
      <div>
        <header className="App-header">
          <img src={logo} className="App-logo" alt="logo" />
          <h1 className="App-title">Draw a digit</h1>
        </header>
        <div style={SKETCH_CONTAINER_STYLE}>
          <SketchField
            ref={sketchRef}
            width="100%"
            height="100%"
            tool={Tools.Pencil}
            imageFormat="jpg"
            lineColor="#111"
            lineWidth={10}
          />
        </div>
        {prediction && <h3>Predicted value is: {prediction}</h3>}
        <button onClick={handleClear}>Clear</button>
        <button onClick={handleSubmit}>Guess the number</button>
        {error && <p style={{ color: "red" }}>Something went wrong</p>}
      </div>
    </div>
  );
};
export default App;
The component is ready. Test it by running the command below and then visiting localhost:3000:
npm run start
Conclusion
The quality of this classifier is not perfect and I do not claim it to be. The difference between the data I used for training and the data coming from the UI is huge. Despite this, we created a functional application from scratch in less than 30 minutes.
In this process, we have perfected our skills in four areas:
- Machine learning
- Back-end development
- Image processing
- Front-end development
There is no shortage of potential use cases for software capable of recognizing handwritten figures, from educational and administrative software to postal and financial services.
Therefore, I hope this article motivates you to improve your machine learning, image processing, and front-end and back-end development skills, and to use them to build wonderful applications.