Problem:
Suppose that German tanks are assigned distinct serial numbers
$1, 2, \ldots, N$. You observe 6 tanks with serial numbers$38, 17, 59, 42, 97, 120$.Under a frequentist approach, what is the best guess for
$N$?
If TL;DR, the solution is 139 and turns out there is a much easier way to solve this problem which is by using the minimum-variance unbiased estimator (which I only just read about).
I like to start by solving a much simpler problem where we know $N$ and then we can generalise to the case where we don’t know $N$.
Simple Case
Let $N = 5$ and we sample two tanks with serial numbers 1 and 2 where the population is:
$$\text{all possible tanks} = \{1, 2, 3, 4, 5\}$$
Then we have $X_2$ representing the maximum serial number of the two tanks observed and then we want to find $P(X_2 = k)$ for all possible values of $k$.
Since we need two tanks to be sampled, the minium value of $k$ must be 2 since we can’t have a maximum serial number of 1 for two tanks and of course the maximum value of $k = N$ which in this case is 5.
For $N = 5$ there are 10 possible outcomes given by the number of ways to choose 2 tanks from 5. Now if we start with $k = 2$ we have only one possible outcome where the maximum serial number is 2:
$$P(X_2 = 2) = \frac{1}{10}$$
Then for $k = 3$ we have two possible outcomes where the maximum serial number is 3:
$$P(X_2 = 3) = \frac{2}{10}$$
Because of this pattern, we can see that:
$$P(X_2 = k) = \frac{k - 1}{10}$$
What this pattern is telling us is that when the maximum serial number is $k$ we have $k - 1$ possible outcomes which are favourable.
If we want to generalise $n$ tanks from 1 to $N$ when the maximum is $k$ then we need to make sure to include tank $k$ and then choose the remaining $n - 1$ tanks from 1 to $k - 1$.
The number of ways to do this is going to be:
$$\binom{k-1}{n-1}$$
This means that the generic formula for any $k$ and $n$ is:
$$P(X_n = k) = \frac{\binom{k-1}{n-1}}{\binom{N}{n}}$$
where the denominator is the number of ways to choose $n$ tanks from $N$ (remembering that $n$ is the sample size).
Solution
Now instead of $N = 5$ we don’t know $N$ and instead of $n = 2$ we have $n = 6$ and instead of the serial numbers $1, 2$ we have $38, 17, 59, 42, 97, 120$.
The maximum of our observed sample is 120, so we want to find $N$ such that the expected value of $X_6$ equals 120.
To do this, we can use the generic formula we found earlier:
$$P(X_6 = k) = \frac{\binom{k-1}{5}}{\binom{N}{6}}$$
Then find the expected value of $X_6$ by summing the product of each possible value of $k$ and its probability:
$$E[X_6] = \sum_{k=6}^{N} k \cdot P(X_6 = k)$$
such that the solution is:
$$E[X_6] = 120$$
We can solve this numerically by running an experiment from $n$ to $1000$ and seeing when the expected value of $X_6$ is close to 120.
import math
g = lambda N, n, k: math.comb(k - 1, n - 1) / math.comb(N, n)
E = lambda N, n: sum(k * g(N, n, k) for k in range(n, N + 1))
n = 6
k = 120
for N in range(n, 1000):
e = E(N, n)
if abs(e - k) < 0.01:
print("N: ", N)
print("Expected:", e)
break
N: 139
Expected: 120.0
We found that the best guess for $N$ is 139.