**Problem Statement:**

Given a file as input, you have to return back a string representing the contents of one of the lines present in the file. The algorithm should be designed in a way such that probability of each line getting returned is same i.e. if the file has lines, probability of each line getting returned as output would be .

**Constraints:**

- The file should be scanned only once.
- There should be only constant extra space used.

**Solution:**

String getRandomLine(File file){ String output = null; int linecount = 0; Random r = new Random(); BufferredReader reader = new BufferedReader(new FileReader(file)); String currentLineRead; while( (currentLineRead=reader.readLine())!=null ){ linecount++; int rand = r.nextInt(linecount); if(rand+1 == linecount){ output = currentLineRead; } } } return output;

Are you surprised that this simple method works? 😀

**Explanation**

The proof of correctness can be seen through mathematical induction on the loop invariant.

Statement is that after seeing n lines, the probability of each line as output is

Base case: linecount=1, then first line is stored as output as rand is 1 in first iteration. So, 1/1 = 1

Hypothesis: Assume, that it works correctly, for linecount=k, such that probability of k lines seen so far is .

Induction step: we read k+1 th line and we get a random number between 0 to k. Probability of each random number being generated is . So, the probability of current line (if random number generated is k) being stored as output becomes . Let’s see what happens to probability of each of the previous k lines. The probability of getting a number in the range 0 to k-1 while generating a random number in this iteration is , and the probability of each of the previous k lines was at end of last iteration, so now multiplying these two probabilities, the probability of each of those k previous lines becomes .

**An exercise**

In this solution, we are generating random numbers in the range 0 to k each time. Suppose, we wanted to cut down the time taken to generate random number in each iteration. So, we generated a random decimal number outside this loop in the range [0,1) and inside the loop, we multiplied that by linecount to get an integer in the range [0, linecount). Will this still give each line in file an equal chance of being the output string?

Isn't this an interview question? I was asked a similar one once.

yes it is.

I am not sure about the last part I asked under the exercise section.