# Returning a random line from a file given as input

Problem Statement:

Given a file as input, you have to return back a string representing the contents of one of the lines present in the file. The algorithm should be designed in a way such that probability of each line getting returned is same i.e. if the file has $n$ lines, probability of each line getting returned as output would be $\frac{1}{n}$.

Constraints:

• The file should be scanned only once.
• There should be only constant extra space used.

Solution:

String getRandomLine(File file){
String output = null;
int linecount = 0;
Random r = new Random();
linecount++;
int rand = r.nextInt(linecount);
if(rand+1 == linecount){
}
}
}
return output;

Are you surprised that this simple method works? 😀

Explanation

The proof of correctness can be seen through mathematical induction on the loop invariant.

Statement is that after seeing n lines, the probability of each line as output is $\frac{1}{n}$

Base case: linecount=1, then first line is stored as output as rand is 1 in first iteration. So, 1/1 = 1

Hypothesis: Assume, that it works correctly, for linecount=k, such that probability of k lines seen so far is $\frac{1}{k}$.

Induction step: we read k+1 th line and we get a random number between 0 to k. Probability of each random number being generated is $\frac{1}{k+1}$. So, the probability of current line (if random number generated is k) being stored as output becomes $\frac{1}{k+1}$. Let's see what happens to probability of each of the previous k lines. The probability of getting a number in the range 0 to k-1 while generating a random number in this iteration is  $\frac{k}{k+1}$, and the probability of each of the previous k lines was $\frac{1}{k}$ at end of last iteration, so now multiplying these two probabilities, the probability of each of those k previous lines becomes $\frac{1}{k+1}$.

An exercise

In this solution, we are generating random numbers in the range 0 to k each time. Suppose, we wanted to cut down the time taken to generate random number in each iteration. So, we generated a random decimal number outside this loop in the range [0,1) and inside the loop, we multiplied that by linecount to get an integer in the range [0, linecount). Will this still give each line in file an equal chance of being the output string?