ok, so I did not understand your question then.. (and i still do not understand

Just to clarrify the problem so that other people can understand it too.

The key here is the memory constraint. If there's no memory constraint, then, you can allocate an array of size n (call it B), initialize it to all zeros. Pass through the original data, update the array B along the way. When an element of B is greater than 1, then you have found the repeated data. With the memory constraints, you cannot allocate this array B, well, you can, but with a much smaller size c*log(n), and somehow stuff all the information into this smaller array and still be able to differentiate the repeated data. I am thinking of bracketing with a log scale, (which is a method many people have used), but haven't found a good way yet. This is a good problem.