Dariusz Jacek Jakóbczak*
Institute of Computer Science and Management, Faculty of Electronics and Computer Science, Koszalin University of Technology, Poland
*Corresponding author:Dariusz Jacek Jakóbczak, Institute of Computer Science and Management, Faculty of Electronics and Computer Science, Koszalin University of Technology, Poland
Submission: April 04, 2024;Published: May 06, 2024
ISSN 2640-9739
Volume3 Issue1
Proposed method is dealing with multi-dimensional data modeling, extrapolation and interpolation using the set of high-dimensional feature vectors. Identification of handwriting, signature, faces or fingerprints need data modeling and each model of the pattern is built by a choice of characteristic key points and multi-dimensional modeling functions. Novel modeling via nodes combination and parameter γ as N-dimensional function enables data parameterization and interpolation for feature vectors. Multidimensional data is modeled and interpolated via different functions for each feature: polynomial, sine, cosine, tangent, cotangent, logarithm, exponent, arc sin, arc cos, arc tan, arc cot or power function.
High-dimensional data interpolation in handwriting identification [1] is not only a pure mathematical problem but important task in pattern recognition and artificial intelligence such as: biometric recognition, personalized handwriting recognition [2-4], automatic forensic document examination [5,6], classification of ancient manuscripts [7]. Also, writer recognition [8] in monolingual handwritten texts is an extensive area of study and the methods independent from the language are well-seen [9-14]. Proposed method represents languageindependent and text-independent [15,16] approach because it identifies the author via a set of letters or symbols from the sample. Writer recognition methods in the recent years are going to various directions [17-19]: Texture analysis with Gabor filters and extracting features, using Hidden Markov Model [20] or Gaussian Mixture Model [21]. The paper wants to approach a problem of curve interpolation and shape modeling by characteristic points in handwriting identification [22]. Proposed method relies on nodes combination and functional modeling of curve points situated between the basic set of key points. Nowadays methods apply mainly polynomial functions, for example Bernstein polynomials in Bezier curves, splines [23] and NURBS. But Bezier curves don’t represent the interpolation method and cannot be used for example in signature and handwriting modeling with characteristic points (nodes). Numerical methods [24-26] for data interpolation are based on polynomial or trigonometric functions, for example Lagrange, Newton, Aitken and Hermite methods. These methods have some weak sides and are not sufficient for curve interpolation in the situations when the curve cannot be built by polynomials or trigonometric functions [27].
Multidimensional modeling of feature vectors
Proposed method has advancements over existing methods because it is computing (interpolating) unknown (unclear, noised or destroyed) values of features between two successive nodes (N-dimensional vectors of features) using hybridization of mathematical analysis and numerical methods, Calculated values (unknown or noised features such as coordinates, colors, textures or any coefficients of pixels, voxels and doxels or image parameters) are interpolated and parameterized for real number α_{i}∈[0;1] (i=1,2,…N-1) between two successive values of feature. This method uses the combinations of nodes (N-dimensional feature vectors) p_{1}=(x_{1},y_{1},…,z_{1}), p_{2}=(x_{2},y_{2},…,z_{2}),…, p_{n}=(x_{n}, y_{n},…z_{n}) as h(p_{1},p_{2},…,p_{m}) and m=1,2,…n to interpolate unknown value of feature (for example y) for the rest of coordinates:
Then N-1 features c_{}1,…, c_{N-1} are parameterized by α_{}1,…, α_{N-1} between two nodes and the last feature (for example y) is interpolated via formula (1). Of course, there can be calculated x(c) or z(c) using (1). Two examples of h (when N=2) computed for MHR method [28] with good features because of orthogonal rows and columns at Hurwitz-Radon family of matrices:
The simplest nodes combination is
and then there is a formula of interpolation:
Formula (1) gives the infinite number of calculations for unknown feature determined by choice of F and h. Nodes combination is the individual feature of each modeled data. Coefficient γ=F(α) and nodes combination h are key factors in data interpolation and object modeling.>
N-dimensional functions in modeling
Unknown values of features, settled between the nodes, are computed using (1). Key question is dealing with coefficient γ. The simplest way of calculation means h=0 and γ_{i}=α_{i}. Then proposed method represents a linear interpolation. Each interpolation requires specific values of α_{i} and γ in (1) depends on parameters α_{i}∈[0;1]:
and F is strictly monotonic for each αi separately. Coefficient γi are calculated using appropriate function and choice of function is connected with initial requirements and data specifications. Different values of coefficients γ_{i} are connected with applied functions F_{i}(α_{i}). These functions γ_{i}=F_{i}(α_{i}) represent the examples of modeling functions for α_{i}∈[0;1] and real number s>0, i=1,2,…N-1. Each function is applied for different modelling:
or any strictly monotonic function between points (0;0) and (1;1). For example, interpolations of function y=2x for N=2, h=0 and γ=αs with s=0.8 (Figure 1) is much better than linear interpolation. Functions γi are strictly monotonic for each variable α_{i}∈[0;1] as γ=F(α) is N-dimensional modeling function, for example:
and every monotonic combination of γi such as
Figure 1:Two-dimensional modeling of function y=2^{x} with seven nodes and h=0, γ=α^{0.8}.
For example, when N=3 there is a bilinear interpolation:
or a bi-quadratic interpolation:
or a bi-cubic interpolation:
or others modeling functions γ. Choice of functions γ_{i} and value s depends on the specifications of feature vectors and individual requirements. What is very important: two data sets (for example a handwritten letter or signature) may have the same set of nodes (feature vectors: pixel coordinates, pressure, speed, angles) but different h or γ results in different interpolations (Figures 2-4). Here are three examples of reconstruction (Figures 2-4) for N=2 and four nodes: (-1.5;-1), (1.25;3.15), (4.4;6.8) and (8;7). Formula of the curve is not given. Algorithm of proposed retrieval, interpolation and modeling consists of five steps: first choice of nodes p_{i} (feature vectors), then choice of nodes combination h(p_{1},p_{2},…,p_{m}), choice of modeling function γ=F(α), determining values of α_{i}∈[0;1] and finally the computations (1). So, there are different data reconstructions with different modeling functions. As it can be observed, there is one extremum between two nodes for modeling with h≠0 (Figures 3&4). Comparing with polynomial or spline interpolations, there is one very important question: how to avoid extremum between each pair of nodes and how to minimize interpolation error? Generally current methods do not answer this key question. Nowadays methods of interpolations rely mainly on polynomials, especially on cubic splines. It means that there are interpolation polynomials W(x) of degree 3 for every range of two successive interpolation nodes (x_{i},y_{i}) and (x_{i}+1, y_{i}+1). This method of cubic splines is C^{2} class-this fact is very important in many applications of cubic interpolation. But second important feature of this method is interpolation error for function f(x):
Figure 2:2D modeling for γ=α^{2} and h=0.
Figure 3:2D reconstruction for γ=sin(α^{2}·π/2) and h in [22].
Figure 4:2D interpolation for γ=tan(α^{2}·π/4) and h=(x_{2}/x_{1}) + (y_{2}/y_{1}).
So, interpolation error depends on second derivative in the range of nodes [a;b] and this value cannot be estimated in general. Cubic spline can have extremum and may differ from interpolated function f(x) very much. Also, interpolation polynomial Wn(x) of degree n (Lagrange or Newton) for n+1 nodes (x_{0}, y_{0}), (x_{1}, y_{1}) … (x_{n},y_{n}) is connected with unpredictable error in general with calculations of derivative rank n+1:
Proposed method with h=0 and α∈[0;1] represents formulas as convex combinations of nodes’ coordinates:
and interpolation error in general between two nodes looks as follows:
Proposed method is dealing with such significant features:
A. No extremum between two nodes.
B. Interpolation error does not depend on the value of derivative
in the nodes or outside the nodes (even if derivative does not
exist).
C. Interpolated function can be smooth in the nodes (class C^{1}).
D. Reconstruction of the function that much differs from the
shape of polynomial, and not only function but any curve, also
closed.
E. Extrapolation is calculated with the same formulas for α∉[0;1].
F. The idea of linear interpolation is applied for other modeling
functions, not only γ=α^{1}.
G. Convexity between the nodes is fixed using two modeling
functions:
γ_{k}=α_{s} or γ_{k}=sin(α^{s}·π/2) with real parameter s>0.
These two kinds of modeling functions are the simplest
function, chosen via many calculations as follows:
a) γ_{k}=α^{s} if convexity is not changing between the nodes (x_{k}, y_{k})
and (x_{k+1}, y_{k+1});
b) γk=sin(αs·π/2) if convexity is changing between the nodes (x_{k},
y_{k}) and (x_{k+1}, y_{k+1}).
Theorem if
A. There are given nodes of continuous function y=f(x): (x_{0}, y_{0}),
(x_{1}, y_{1}) … (x_{n}, y_{n}), n≥2;
B. There are formulas to calculate values between the nodes:
C. Three successive nodes are monotonic, for example let’s assume:
Then there is the method of 2D curve interpolation and
extrapolation such as:
T.1: There is no extremum between two successive nodesinterpolated
function is monotonic in the range of two nodes.
T.2: Interpolated curve is class C^{0} (continuous) or C^{1} (continuous
and smooth).
T.3: Interpolation error does not depend on the value of
derivative in the nodes or outside the nodes (even if derivative does
not exist).
T.4: Convexity between two nodes (x_{k}, y_{k}) and (x_{k+1}, y_{k+1}) is fixed
using modeling functions γ_{k}=α^{s} (if convexity is not changing) or
γ_{k}=sin(α^{s}·π/2) (if convexity is changing).
T.5: Extrapolation is calculated with the same formulas for
α∈[0;1].
Proof
T.1: Convex combination to calculate x(α) and y(α) between two nodes with strictly monotonic function γ_{k} gives us monotonic interpolation of the curve with no extremum between two nodes.
T.2: Interpolated curve is class C0 (continuous) just from definition of x(α) and y(α). Also, smooth interpolation between nodes is achieved with the same. Only smooth function in the inner nodes must be proved. Here is shown how to achieve smooth function in the inner nodes-let’s assume then y_{k}≠y_{k+1} for each k. If y_{k}=y_{k+1} for any k, then according to T.1 there must be the simplest linear interpolation between nodes (x_{k}, y_{k}) and (x_{k+1}, y_{k+1}) and interpolated curve is not smooth in nodes (x_{k}, y_{k}) and (x_{k+1}, y_{k+1}). For first three monotonic nodes (x_{0}, y_{0}), (x_{1}, y_{1}) and (x_{2}, y_{2}) there are calculations to fix parameter s for modeling function γ_{1} between nodes (x_{0}, y_{0}) and (x_{2}, y_{2}) interpolating node (x_{1}, y_{1}) inside:
If convexity is not changing between (x_{0}, y_{0}) and (x_{2}, y_{2}), then γ_{1}=α^{s} and
A1 (beginning of the loop in algorithm for k=2,3…n-1): Having modeling function γ1 between nodes (x_{0}, y_{0}) and (x_{2}, y_{2}), it is possible for any α*→0 calculate
Then left difference quotient c is computed in the node (x_{2}, y_{2}):
Of course, if value of derivative in (x_{2}, y_{2}) is known, c=f’(x_{2})≠0. Then parameter u is fixed to obtain left (c) and right difference quotient equal in (x_{2}, y_{2})-it means smooth in this node. If y_{3} preserves the same monotonicity like y_{2} and y1 (y_{1}>y_{2}>y_{3} or y_{1} < y_{2}< y_{3}) then
If y_{3} does not preserve the same monotonicity like y_{2} and y_{1} then (because of different sign of left and right difference quotient0029
And as it was: if convexity is not changing between (x_{2}, y_{2}) and (x_{3}, y_{3}), then γ_{2}=α^{s} and
If convexity is changing between (x_{2}, y_{2}) and (x_{3}, y_{3}), then γ_{2} = sin(α^{s}·π/2) and
So smooth interpolation function in the node (x_{2}, y_{2}) is achieved. And smooth interpolation for next range of nodes (x_{3}, y_{3}) and (x_{4}, y_{4}) is starting like loop A1 for k=3. And so on till last range of nodes (xn- 1, yn-1) and (x_{n}, y_{n}) for k=n-1 in A1.
T.3: According to T.1 -interpolation error between two nodes for each k is equal:
T.4: These modeling functions are the simplest functions to achieve convexity changing or not.
T.5: Extrapolation left of first node (x_{0}, y_{0}) is done with modeling function γ_{1} and α>1. Extrapolation right of last node (x_{n}, y_{n}) is done with modeling function γ_{n-1} and α<0.
Then modeling function γ_{n-1} must have domain with α<0. If not, there is possibility to define:
This theorem describes main features of proposed method.
The author’s method enables interpolation and modeling of high-dimensional data using features’ combinations and different coefficients γ: polynomial, sinusoidal, co sinusoidal, tangent, cotangent, logarithmic, exponential, arc sin, arc cos, arc tan, arc cot or power function. Functions for γ calculations are chosen individually at each data modeling and it is treated as N-dimensional function: γ depends on initial requirements and features’ specifications. Novel method leads to data interpolation as handwriting or signature identification and image retrieval via discrete set of feature vectors in N-dimensional feature space. So, this method makes possible the combination of two important problems: interpolation and modeling in a matter of image retrieval or writer identification. Main features of the method are: this interpolation develops a linear interpolation in multidimensional feature spaces into other functions as N-dimensional functions; nodes combination and coefficients γ are crucial in the process of data parameterization and interpolation: they are computed individually for a single feature; modeling of closed curves.
© 2024 Dariusz Jacek Jakóbczak. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and build upon your work non-commercially.