Crimson Publishers


COJ Technical & Scientific Research

A Comprehensive Derivation of the Expectance and its Variance in Sampling without Replacement from a Particulate Lot having a Numerical Attribute

Timiș AL¹, Pencea I¹*, Karas Z² and Niculescu F¹

¹Doctoral School of the Materials Science and Engineering Faculty, National Scientific and Technological POLITEHNICA University of Bucharest, Romania

²Destro Kladno SRO, Sykorice, Czech Republic

*Corresponding author: Pencea I, Doctoral School of the Materials Science and Engineering Faculty, National Scientific and Technological POLITEHNICA University of Bucharest, Splaiul Independenţei 313, 060042 Bucharest, Romania

Submission: July 11, 2024; Published: August 09, 2024

DOI: 10.31031/COJTS.2024.05.000604

Volume 5, Issue 1

Abstract

Reliable data analysis when screening for an attribute at the scale of a granulated lot is critical for the risk of the decisions regarding an intended application. Practice shows that Sampling Without Replacement (SWR) is routinely used in surveying for mineral resources, in the conformity assessment of particulate commodities, etc. Errors in estimating the expectance and its variance at lot scale may make the difference between profit and loss, or between the conforming and nonconforming status of a batch of a commodity. The literature does not clearly state the grounds for the derivations of the expectance and of its variance in the case of SWR. In particular, the variance is derived from an unproven probabilistic correlation among the sampled items. Therefore, this paper proposes a new approach for deriving the mathematical expressions of the expectance and its variance based on irrefutable probabilistic hypotheses, namely the law of large numbers and repeated SWR trials. The derivations presented in the paper help optimize sampling costs by properly choosing the sample size for a given population size. The findings can be applied in many practical fields that rely on SWR, such as conformity assessment, environmental factor monitoring, screening for mineral resources and sampling of waste dumps to be classified as secondary resources.

Keywords: Sampling without replacement; Expectance; Variance; Particulate lot; Numerical attribute

Introduction

The characteristics of a particulate lot (a dump, a batch of packed food or pharmaceuticals, a warehouse, etc.) consisting of discrete items are always estimated from a suite of samples collected from that lot. Such a lot is conceptually modelled as a population of particles/items, each of which may or may not have the targeted property, such as a chemical concentration or a contaminant. In this context, sampling from a population is the random selection of a set, i.e. a “sample”, of items/particles. The sample is representative if the proportion of particles of interest in it is similar to that in the lot/population. Therefore, a sample must be as representative of the lot as possible. This paper addresses numerical attributes, such as the concentration of an analyte, grain size or item weight, that are estimated at lot scale by Sampling Without Replacement (SWR). When the lot is heterogeneous, sampling is the main source of “error” (uncertainty) in establishing the average value and its confidence interval at batch level, i.e. the expectance and variance of the targeted measurand [1-4]. The accuracy of the expectance and of its variance is critical for the conformity assessment of a commodity lot or for the recovery of a substance from a secondary resource, since they are the main statistical parameters underpinning a decision. The statistical analysis of data from the sampling of particulate materials is based on probability mass distributions such as the Poisson, double Poisson or binomial distributions [5,6]. The choice of a probability mass distribution should be based on scientific considerations and experimental evidence, which are frequently missing or incomplete. Notably, the scientific considerations underlying the estimation of the expectance and variance play a decisive role in the reliability of the respective statistics.
The use of unfounded hypotheses to prove the mathematical expressions of statistics at lot scale is critical, as it can lead to significant bias and uncertainty. Even if the theoretical derivation of a statistical parameter leads to an exact formula, it is not sound science if it rests on unproven hypotheses. For scientific rigour and reliable practical application, the theoretical derivation of a statistical parameter must be complete. In this regard, this article re-derives the expressions for the expectance of a numerical attribute assigned to a lot, and for its variance, from the results obtained by testing the sampled items. In this work, the values measured on the sampled items are assumed to be affected by negligible uncertainty. The problem of establishing the average value of the lot and its confidence interval when each measurement is affected by significant uncertainty is beyond the scope of this work, although it is not insurmountable. The novelty lies in eliminating the hypotheses regarding the correlations between the measurements carried out at the different stages of sampling, as used in reference [7]. The expectance and its variance are estimated from probabilistic considerations about drawing a specific item or a specific pair of items in a sampling session. It is assumed that a large number of sampling trials can be performed on a population/lot, which is always possible if the lot can be cloned and the cost of repeated sampling is feasible.

Methods

Theoretical background

If a sample of size n is taken from a population of N elements, the attribute values of the individual items in the sample are $x_1, x_2, \dots, x_n$. A statistic of the sample is the arithmetic mean, denoted $\bar{x}$:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (1)$$

If a series of repeated samplings with replacement is carried out under reproducible conditions, the arithmetic mean $\bar{x}$ is expected to have the expectation [7]:

$$E(\bar{x}) = \mu \qquad (2)$$

where μ is the expectance of the attribute value at lot scale. The expectance of the sample mean agrees with the expectation of the population; thus, the arithmetic mean is an unbiased estimator of the expectance. In sampling without replacement, the composition of the population changes with each drawn item. “The probability of obtaining the realization xi is influenced by all the elements already removed. The realizations xi are therefore no longer independent” [7]. It is also stated that “This finding does not alter the expectation of the sample mean” [7], i.e. equation (2) remains valid for SWR. This statement is not demonstrated in [7], but it is assumed in the derivations regarding sampling without replacement. The derivation of the variance for SWR must take into account that the realization of $x_i$ is not independent of the realization of $x_j$. This issue is approached based on equation (4.2.2) in [7], i.e.:

$$\sigma^2(\bar{x}) = E(\bar{x}^2) - \mu^2 = \frac{1}{n^2}\,E\!\left[\left(\sum_{i=1}^{n} x_i\right)^2\right] - \mu^2$$

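When the square of the sum in equation (4.2.2) of [7] is expanded with the aid of the binomial theorem (equation (4.2.3) in [7]), one obtains:

$$\sigma^2(\bar{x}) = \frac{1}{n^2}\left[\sum_{i=1}^{n} E(x_i^2) + \sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n} E(x_i x_j)\right] - \mu^2$$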

In [7] it is stipulated that the expectations $E(x_i^2)$ and $E(x_i x_j)$, $i \neq j$, are also expectations of the population and can be derived according to equations (2.9.2) and (2.9.7) of [7], respectively:

$$E(X^2) = \sigma^2 + \mu^2, \qquad E(XY) = E(X)\,E(Y) + \mathrm{Cov}(X, Y)$$

where X and Y are two random variables and Cov(X, Y) is their covariance; see equation (2.7.9) in [7].

Equation (2.9.7) holds for a compound variable Z = X·Y when the joint density of the pair (X, Y), say f(x, y), is known. In SWR there is no known distribution of the pairs $(x_i, x_j)$, $i \neq j$, $i, j = 1, \dots, n$. Therefore, the covariance statistic is inapplicable to sampling without replacement. Also, equation (4.2.9) in [8] states:

Derivation of the expectance of a numerical attribute in case of sampling without replacement

When n items are drawn from a population of size N, the total number of samples that differ from one another in composition is $C_N^n$:

$$C_N^n = \frac{N!}{n!\,(N-n)!}$$

where $C_N^n$ is the number of combinations of N items taken n at a time and $N! = 1 \cdot 2 \cdots (N-1) \cdot N$.

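The number of samples that contain a given item $x_j$, $j = 1, \dots, N$, is $C_{N-1}^{n-1}$. Hence, the probability of drawing the item $x_j$, denoted $p_j$, is the same for every item and in every one of a series of Q trials:

$$p_j = \frac{C_{N-1}^{n-1}}{C_N^n} = \frac{n}{N}$$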
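As a quick sanity check of the probability of drawing a given item, $C_{N-1}^{n-1}/C_N^n$, the short Python sketch below evaluates it with exact fractions and compares it with n/N. The function name `prob_item_drawn` and the sizes N = 10, n = 4 are illustrative choices, not part of the original derivation.

```python
from fractions import Fraction
from math import comb


def prob_item_drawn(N: int, n: int) -> Fraction:
    """Probability that a given item x_j appears in an SWR sample of size n
    drawn from N items: favourable samples C(N-1, n-1) over all samples C(N, n)."""
    return Fraction(comb(N - 1, n - 1), comb(N, n))


# Illustrative lot and sample sizes (not taken from the paper)
N, n = 10, 4
print(comb(N, n))             # number of distinct samples, C_N^n
print(prob_item_drawn(N, n))  # exact probability p_j
```

With these values it prints 210 (the number of distinct samples) and 2/5, which equals n/N.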

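According to the law of large numbers [8], a fundamental theorem of probability theory, the value $x_j$ appears approximately nQ/N times in a series of Q trials, for each j = 1, …, N, provided Q is large enough. Performing Q trials yields a set of means:

$$\bar{x}_1, \bar{x}_2, \dots, \bar{x}_Q, \qquad \bar{x}_q = \frac{1}{n}\sum_{i=1}^{n} x_{q,i}, \quad q = 1, \dots, Q$$

where $x_{q,i}$ is the value of the i-th item drawn in trial q.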

For Q large enough, the expected number of occurrences of $x_1$ in the Q samplings is nQ/N.

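The same holds for each $x_j$, j = 1, …, N. Thus, the mathematical/statistical expectance of $\bar{x}$ is:

$$E(\bar{x}) = \lim_{Q\to\infty}\frac{1}{Q}\sum_{q=1}^{Q}\bar{x}_q = \lim_{Q\to\infty}\frac{1}{nQ}\sum_{j=1}^{N}\frac{nQ}{N}\,x_j = \frac{1}{N}\sum_{j=1}^{N} x_j \qquad (8)$$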

Equation (8) proves that $\bar{x}$ is an unbiased estimator of the mean μ of the lot, i.e.:

$$E(\bar{x}) = \mu = \frac{1}{N}\sum_{j=1}^{N} x_j \qquad (9)$$
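The repeated-trials argument behind equation (8) can be mimicked numerically. The sketch below assumes a small hypothetical lot of N = 7 attribute values and uses Python's `random.Random.sample` to draw n distinct items in each of Q trials; it counts how often each item is drawn and averages the Q sample means. As Q grows, the counts should approach nQ/N and the average of the means should approach μ. The seed, lot values and the function name `repeated_swr_trials` are illustrative assumptions.

```python
import random
from collections import Counter
from statistics import fmean


def repeated_swr_trials(lot, n, Q, seed=2024):
    """Run Q independent SWR trials on `lot` (a list of attribute values).
    Returns per-item draw counts and the list of the Q sample means."""
    rng = random.Random(seed)
    counts = Counter()
    means = []
    for _ in range(Q):
        idx = rng.sample(range(len(lot)), n)  # n distinct item indices, no replacement
        counts.update(idx)                    # record which items were drawn
        means.append(fmean(lot[i] for i in idx))
    return counts, means


lot = [3, 7, 7, 10, 12, 15, 21]  # hypothetical lot, N = 7
n, Q = 3, 100_000
counts, means = repeated_swr_trials(lot, n, Q)

print("expected occurrences n*Q/N:", n * Q / len(lot))
print("observed occurrences:", [counts[j] for j in range(len(lot))])
print("mean of the Q means:", fmean(means), " lot mean mu:", fmean(lot))
```

Items are tracked by index, so the two items with value 7 are counted as distinct items of the lot.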

Derivation of the variance assigned to the expectance in case of sampling without replacement

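The estimation of the variance, denoted $\sigma^2(\bar{x})$, starts with the estimation of the expectance of $\bar{x}^2$, denoted $E(\bar{x}^2)$, i.e.:

$$E(\bar{x}^2) = \frac{1}{n^2}\,E\!\left[\left(\sum_{i=1}^{n} x_i\right)^2\right] = \frac{1}{n^2}\,E\!\left(\sum_{i=1}^{n} x_i^2 + 2\sum_{1 \le i < l \le n} x_i x_l\right) \qquad (10)$$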

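The expectance of the sum is estimated based on a large number Q of sampling trials. Thus, the term $x_j^2$ has the same occurrence as $x_j$, i.e. nQ/N, while the term $x_j x_k$ ($j \neq k$) occurs with the probability $p_{jk}$, equal to the number of favourable cases $C_{N-2}^{n-2}$ divided by the number of possible cases $C_N^n$:

$$p_{jk} = \frac{C_{N-2}^{n-2}}{C_N^n} = \frac{n(n-1)}{N(N-1)} \qquad (11)$$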
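The pair probability $p_{jk} = C_{N-2}^{n-2}/C_N^n$ can be checked in the same spirit, both against the simplified form n(n−1)/(N(N−1)) and by brute-force listing of every distinct sample. This is a small illustrative sketch; `prob_pair_drawn` and `prob_pair_by_enumeration` are hypothetical helper names, and the formula only makes sense for n ≥ 2.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def prob_pair_drawn(N: int, n: int) -> Fraction:
    """C(N-2, n-2) / C(N, n): probability that two given items j != k are both
    in an SWR sample of size n (requires n >= 2)."""
    return Fraction(comb(N - 2, n - 2), comb(N, n))


def prob_pair_by_enumeration(N: int, n: int, j: int = 0, k: int = 1) -> Fraction:
    """Same probability obtained by listing every distinct sample of size n."""
    samples = list(combinations(range(N), n))
    hits = sum(1 for s in samples if j in s and k in s)
    return Fraction(hits, len(samples))


N, n = 9, 4  # illustrative sizes
print(prob_pair_drawn(N, n), prob_pair_by_enumeration(N, n))  # 1/6 1/6
```

For N = 9 and n = 4 both routes give 1/6, which equals 4·3/(9·8).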

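The expectance of the sum can be estimated as:

$$E\!\left[\left(\sum_{i=1}^{n} x_i\right)^2\right] = \frac{n}{N}\sum_{j=1}^{N} x_j^2 + \frac{2n(n-1)}{N(N-1)}\sum_{1 \le j < k \le N} x_j x_k \qquad (12)$$

Dividing by $n^2$, the expectance of $\bar{x}^2$ becomes:

$$E(\bar{x}^2) = \frac{1}{nN}\sum_{j=1}^{N} x_j^2 + \frac{2(n-1)}{nN(N-1)}\sum_{1 \le j < k \le N} x_j x_k \qquad (13)$$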

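The double sum in equation (13) can be written as:

$$\sum_{1 \le j < k \le N} x_j x_k = \frac{1}{2}\left[\left(\sum_{j=1}^{N} x_j\right)^2 - \sum_{j=1}^{N} x_j^2\right] = \frac{1}{2}\left(N^2\mu^2 - \sum_{j=1}^{N} x_j^2\right) \qquad (14)$$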

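Equation (13) can then be rewritten as:

$$E(\bar{x}^2) = \frac{N-n}{nN(N-1)}\sum_{j=1}^{N} x_j^2 + \frac{N(n-1)}{n(N-1)}\,\mu^2 \qquad (15)$$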

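Equation (15) shows that the mathematical expression of the expectance $E(\bar{x}^2)$ differs from that given in [8]. Writing $\sum_{j=1}^{N} x_j^2 = N(\sigma^2 + \mu^2)$, where $\sigma^2 = \frac{1}{N}\sum_{j=1}^{N}(x_j - \mu)^2$ is the variance of the attribute at lot scale, the expectance $E(\bar{x}^2)$ and the variance of the expectance are calculated as:

$$E(\bar{x}^2) = \mu^2 + \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} \qquad (16)$$

$$\sigma^2(\bar{x}) = E(\bar{x}^2) - \left[E(\bar{x})\right]^2 = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} \qquad (17)$$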
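Because every one of the $C_N^n$ distinct samples is equally likely in SWR, the expectations in this derivation can also be evaluated exactly by enumerating all samples of a small lot. The sketch below does this with exact rational arithmetic for a hypothetical lot of seven values and compares $E(\bar{x}^2) - [E(\bar{x})]^2$ with the closed form $(\sigma^2/n)(N-n)/(N-1)$, taking the lot variance with divisor N. Enumeration is only feasible for small N; it is meant as a check of the algebra, not as a practical estimator, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def sample_mean_moments(lot, n):
    """Exact E(x_bar) and E(x_bar^2) over all C(N, n) equally likely SWR samples.
    Items are distinguished by position, so repeated values count as distinct items."""
    means = [Fraction(sum(s), n) for s in combinations(lot, n)]
    m1 = sum(means) / len(means)
    m2 = sum(m * m for m in means) / len(means)
    return m1, m2


def variance_closed_form(lot, n):
    """(sigma^2 / n) * (N - n) / (N - 1), with sigma^2 the lot variance (divisor N)."""
    N = len(lot)
    mu = Fraction(sum(lot), N)
    sigma2 = sum((x - mu) ** 2 for x in lot) / N
    return sigma2 / n * Fraction(N - n, N - 1)


lot = [3, 7, 7, 10, 12, 15, 21]  # hypothetical attribute values, N = 7
for n in range(1, len(lot) + 1):
    m1, m2 = sample_mean_moments(lot, n)
    # columns: n, E(x_bar), enumerated variance, closed-form variance
    print(n, m1, m2 - m1 ** 2, variance_closed_form(lot, n))
```

For every n the second column equals the lot mean 75/7, and the last two columns coincide, dropping to 0 at n = N.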

Discussion

For a better understanding of the dependence of the variance on the population size (N) and the sample size (n), equation (17) can be rewritten as:

$$\sigma^2(\bar{x}) = \frac{\sigma^2}{N-1}\left(\frac{N}{n} - 1\right) \qquad (18)$$

Thus, the variance is the product of two factors [9]: $\sigma^2/(N-1)$, which is a characteristic of the population, and $(N/n - 1)$, which depends on both the population size and the sample size. Hence, for a given population/lot, increasing the reliability of the expectance depends only on the sample size n, since $\sigma^2/(N-1)$ is constant. Increasing the reliability of the expectance means decreasing its variance, which implies increasing the sample size n. According to equation (18), n could in principle be increased up to n = N, in which case $\sigma^2(\bar{x}) = 0$. However, when N > 1000, increasing n beyond a certain value does not bring a significant decrease in the variance, as shown in Figure 1a.
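Equation (18) is straightforward to evaluate. The following sketch, with the hypothetical function name `variance_eq18`, tabulates $\sigma^2(\bar{x})$ for σ = 4 (the value used in Figure 1) and a few lot and sample sizes; it illustrates that the variance vanishes at n = N and that, for large N, it approaches $\sigma^2/n$.

```python
def variance_eq18(sigma: float, N: int, n: int) -> float:
    """Equation (18): sigma^2 / (N - 1) * (N / n - 1)."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return sigma**2 / (N - 1) * (N / n - 1)


sigma = 4.0  # value used in Figure 1
for N in (100, 1_000, 10_000):
    values = {n: round(variance_eq18(sigma, N, n), 4) for n in (10, 40, 100)}
    print(f"N={N}:", values)
```

For N = 100 the entry at n = 100 is 0, while for N = 10000 the values are already close to σ²/n (1.6, 0.4 and 0.16).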

Figure 1a: The dependence of the variance on n for given N and σ = 4:
a) overall view for n ranging from 1 to 100.


Figure 1b shows that, when the population size exceeds 1000 items, a sample of 40 items provides a relative variance $\sigma^2(\bar{x})/\sigma^2 \times 100\%$ below 5%, while a sample of 100 items provides a relative variance of approximately 1%. Obviously, a relative variance of 1% is preferable to one of 5%, but extending the sample size from 40 to 100 means testing more than twice as many items, and hence more than twice the cost. Figures 1a and 1b also show that the variance behaves similarly for all lot sizes above 1000 items.
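The percentages quoted above can be checked directly against equation (18). The sketch below prints, for a few lot sizes, both the variance ratio $\sigma^2(\bar{x})/\sigma^2 = (N-n)/(n(N-1))$ and the standard-deviation ratio $\sigma(\bar{x})/\sigma$; the function name `variance_ratio` and the chosen lot sizes are illustrative.

```python
from math import sqrt


def variance_ratio(N: int, n: int) -> float:
    """sigma^2(x_bar) / sigma^2 = (N - n) / (n (N - 1)), from equation (18)."""
    return (N - n) / (n * (N - 1))


for N in (1_000, 10_000, 100_000):
    for n in (40, 100):
        r = variance_ratio(N, n)
        print(f"N={N:>7} n={n:>3}  variance ratio {100 * r:5.2f}%  "
              f"std-dev ratio {100 * sqrt(r):5.2f}%")
```

For N = 1000 the variance ratio is about 2.4% for n = 40 and about 0.9% for n = 100, consistent with the "below 5%" and "approximately 1%" figures, whereas the standard-deviation ratio is about 15.5% and 9.5%. The quoted percentages therefore correspond to the ratio of variances.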

Figure 1b: The dependence of the variance on n for given N and σ = 4:
b) detailed view for n ranging from 1 to 50.


Conclusion

The paper presents a new approach for deriving the mathematical expressions of the expectance and its variance in SWR based on irrefutable probabilistic hypotheses, i.e. the law of large numbers and repeated trials. The main novelty consists in avoiding the use of the probabilistic correlation among the sampled items, which is an unproven hypothesis, as the core of the variance derivation. The derivations presented clarify the meaning of the statistics in SWR from discrete finite lots and give readers a deeper understanding of SWR statistics. The paper shows that the arithmetic average of the numerical values observed on a suite of n drawn items is an unbiased estimator of the expectance, while the standard deviation of the attribute at lot scale can be related to the variance of the expectance through the sizes of the lot and of the sample. The variance is sensitive to the sample size when the population is small (10 to 100 items), and this sensitivity diminishes for larger populations (N > 1000). Thus, for N > 1000, increasing the sample size beyond 50 yields only a small decrease in the variance of the expectance. The findings of the paper can be applied in many practical fields based on SWR, such as conformity assessment, environmental factor monitoring, screening for mineral resources and sampling of waste dumps to be classified as secondary resources.

Acknowledgement

This work received no financial support from individual grants.

Conflict of Interest

The authors declare that there is no financial interest or any conflict of interest regarding this paper.

References

  1. Pitard FF (1993) Pierre Gy’s Sampling Theory and Sampling Practice. Boca Raton.
  2. Minkkinen PO, Esbensen KH (2019) Sampling of particulate materials with significant spatial heterogeneity: Theoretical modification of grouping and segregation factors involved with correct sampling errors: Fundamental sampling error and grouping and segregation error. Anal Chim Acta 1049: 47-64.
  3. Hennebert P, Beggio G (2021) Sampling and sub-sampling of granular waste: Size of a representative sample in terms of number of particles. Detritus 17: 30-41.
  4. Pencea I, Turcu RN, Popescu AAC, Timiș AL, Priceputu A, et al. (2023) An improved balanced replicated sampling design for preliminary screening of the tailing’s ponds aiming at zero-waste valorization. A Romanian case study. J Environ Manage 331: 117260.
  5. Sathakathulla AA, Murthy BN (2013) Single, double and multiple sampling plans: Poisson distribution. Int J Eng Res 4(2): 1-35.
  6. Pitard FF (2010) Theoretical, practical, and economic difficulties in sampling for trace constituents. J South Afr Inst Min Metall 110: 313-321.
  7. Sommer K (1986) Sampling of powders and bulk materials.
  8. Law of large numbers.
  9. Beggio G, Hennebert P (2022) A novel method to calculate the size of representative waste samples based on particles size. Detritus 18: 3-11.

© 2024 Pencea I. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and building upon the work non-commercially.
