{"id":1631,"date":"2021-07-20T22:32:13","date_gmt":"2021-07-21T05:32:13","guid":{"rendered":"http:\/\/66.94.114.210\/?p=1631"},"modified":"2022-01-06T18:47:44","modified_gmt":"2022-01-07T02:47:44","slug":"reservoir-sampling","status":"publish","type":"post","link":"https:\/\/nanzhou.cc\/index.php\/2021\/07\/20\/reservoir-sampling\/","title":{"rendered":"Reservoir Sampling"},"content":{"rendered":"\n<p>Given an infinite stream (or a finite sequence of size n), choose k numbers randomly.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"cpp\" class=\"language-cpp\">\n\/\/ n choose k\n\/\/ keep a reservoir of size k\n\/\/ Choose 1, 2, 3, ..., k first and put them into the reservoir.\n\n\/\/ For k+i, pick it with a probability of k\/(k+i), and randomly replace a number in the reservoir.\n\/\/ Another easier way of calculating the probability is P(X was in the reservoir last time) * P(X is not replaced this time) = P(X was in the reservoir last time) * (1- P(X is replaced this time)) = k\/(k+i-1) * (1 - k\/(k+i) * 1\/k) = k\/(k+i).\n\/\/ So for any element X, it was kept when i reaches n:\n\/\/ 1. if X is the first k elements: P(X picked) * P(X not replaced in k+1) * P(not replaced in k+2) * ... = 1 * (1 - k\/k+1 * 1\/k) * (1 - k+1\/k+2 * 1\/k+1) = 1 * k\/k+1 * k+1\/k+2 * ... = k\/n\n\/\/ 2. else: P(X picked) * p(not replaced) = k\/k+i * (1 - k+i\/k+i+1 * 1\/k+i) * ... = k\/n \n\n\nvector&lt;int> ReservoirSampling(vector&lt;int> v, int n, int k) {\n  assert(v.size() == n &amp;&amp; k &lt;= n);\n  \/\/ init: fill the first k elems into reservoir\n  vector&lt;int> reservoirArray(v.begin(), v.begin() + k);\n  int i = 0;\n  int j = 0;\n  \/\/ start from the (k+1)th element to replace\n  for (i = k; i &lt; n; ++i) {\n    j = rand() % (i + 1);  \/\/ inclusive range [0, i]\n    if (j &lt; k) {\n      reservoirArray[j] = v[i];\n    }\n  }\n  return reservoirArray;\n}<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Given an infinite stream (or a finite sequence of size n), choose k numbers randomly.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[48,68],"tags":[],"class_list":["post-1631","post","type-post","status-publish","format-standard","hentry","category-alg","category-random"],"_links":{"self":[{"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/posts\/1631","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/comments?post=1631"}],"version-history":[{"count":2,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/posts\/1631\/revisions"}],"predecessor-version":[{"id":1660,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/posts\/1631\/revisions\/1660"}],"wp:attachment":[{"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/media?parent=1631"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/categories?post=1631"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/tags?post=1631"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}