Merge pull request #6 from lacop/dcj
Add Distributed Code Jam
pabloheiber authored Jul 12, 2023
2 parents 24ef947 + 0428413 commit 93eaf00
Showing 103 changed files with 9,124 additions and 0 deletions.
24 changes: 24 additions & 0 deletions distributed_codejam/2015_finals/analysis_intro.html
@@ -0,0 +1,24 @@
<p><b>bmerry</b> is the first ever Distributed Code Jam Champion!</p>

<p>The first ever Distributed Code Jam finals are over, and they were a pretty exciting event! With ten participants fighting for the title, we knew the race was going to be pretty intense. The contest was opened by bmerry submitting a solution to the small input for Necklace, followed closely by MiSawa submitting a solution to the small input for Kolakoski, both before the half-hour mark. At this point, the solutions started pouring in - after an hour, all but one contestant had at least one small input to their name, and half of them also had a submission to the large input of Necklace.</p>

<p>The submissions to Necklace large and Kolakoski small continued streaming in. The next breakthrough came at one hour and 40 minutes, when Shik took the lead with a submission to Kolakoski large. He didn't hold it for long, though, as just four minutes later bmerry submitted a solution to Shipping small, followed almost immediately by a submission to Shipping large, netting him a huge advantage over the rest of the contestants.</p>

<p>The other contestants had no way of knowing that while bmerry's small submission was fully correct, the large submission was just a bluff - and a very successful one, as it seems to have focused others on the Shipping problem, instead of the significantly easier Rocks. The solutions to the two easier problems continued streaming in (in particular, bmerry strengthened his lead by submitting a solution to Kolakoski large a bit before three hours were up), as did incorrect submissions to Shipping small.</p>

<p>The only two contestants to attack the Rocks problem were MiSawa, who made a small submission three and a half hours into the contest but did not attempt the large (worth 53 points), and bmerry, who submitted the small around two and a half hours in and tried a slightly too slow submission for the large just before the end of the contest. Meanwhile, most contestants were attacking Shipping, with Marcin.Smulewicz being the only successful one - his correct submission for the small input earned him, in the end, second place behind bmerry. Shik came in third, with correct large submissions for Kolakoski and Necklace, followed by MiSawa (who had Necklace large, Kolakoski small and Rocks small) and ZbanIlya (with Necklace large and Kolakoski small).</p>

<p>Congratulations to the medalists and all the finalists, and hope to see you next year!</p>

<hr/>
Cast<br/>

Problem B. Kolakoski. Written by David Spies and Onufry Wojtaszczyk, prepared by David Spies and Onufry Wojtaszczyk.<br/>

Problem C. Necklace. Written by John Dethridge, Chieu Nguyen and Onufry Wojtaszczyk, prepared by Joachim Bartosik, Tomek Kulczyński and Onufry Wojtaszczyk.<br/>

Problem D. Rocks. Written by Onufry Wojtaszczyk and Chieu Nguyen, prepared by Onufry Wojtaszczyk.<br/>

Problem E. Shipping. Written by John Dethridge and Chieu Nguyen, prepared by Onufry Wojtaszczyk.<br/>

Platform development and support: Onufry Wojtaszczyk, Andi Purice, Maciek Klimek, Jarek Przybyłowicz, Joachim Bartosik, Bartek Janiak, David Spies, Neo Liu and many others.<br/>
11 changes: 11 additions & 0 deletions distributed_codejam/2015_finals/kolakoski/analysis.html
@@ -0,0 +1,11 @@
<p>There are three, very different, potential solutions to this problem, and we will go over all of them.</p>

<p>Let's begin with the naive, single-node solution (which doesn't run in time even for the small input). We can construct the Kolakoski sequence as we go, in an array, keeping two pointers into the array: the last constructed element, and the element that describes the group containing the last constructed element. For instance, if we have constructed the sequence 1,2,2,1,1,2,1,2,2 so far, then the last constructed element is the 9th (1-indexed), which is a 2, and the element that describes it is the 6th (which is also a 2, and describes the "2,2" run at places 8 and 9). So, we can construct the whole sequence up to GetIndex() this way, and then multiply each element by the appropriate GetMultiplier(). Since the input can go up to 3 billion, and we have just 700MB of memory, we need to store the sequence efficiently, using one bit per value. One thing we should remember from this is that to calculate the next elements of the sequence, all we need is the elements that describe them and the value of the last element before them.</p>
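
<p>The two-pointer construction can be sketched as follows. This is only an illustration in Python: the names <code>kolakoski_prefix</code> and <code>weighted_sum</code> are made up for this sketch, the <code>multiplier</code> argument stands in for GetMultiplier, and a plain list is used instead of the bit-packed storage a real solution would need.</p>

```python
def kolakoski_prefix(n):
    """Return the first n terms of the Kolakoski sequence (1, 2, 2, 1, 1, ...)."""
    seq = [1, 2, 2]  # seed: the first two runs, "1" and "2,2"
    read = 2         # index of the element that describes the next run
    while len(seq) < n:
        symbol = 3 - seq[-1]                  # runs alternate between 1 and 2
        seq.extend([symbol] * seq[read])      # seq[read] is the length of the run
        read += 1
    return seq[:n]


def weighted_sum(n, multiplier):
    """Dot product of the first n terms with multiplier(i) (stand-in for GetMultiplier)."""
    return sum(multiplier(i) * a for i, a in enumerate(kolakoski_prefix(n)))
```

<p>With the coefficients from the problem statement (1, 3, 1, 5, 2, 2) and n = 6, <code>weighted_sum</code> reproduces the sample value of 20.</p>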

<p>One way to speed it up is to use a bit-calculation trick of some sort, to calculate more than one value of the sequence in a single operation. For instance, we can precalculate, for each possible sequence of 20 elements and each possible previous element (2<sup>21</sup> choices in all), what they will describe (as a bitmask), and use this to consume 20 elements of the describing sequence at a time. This will speed up the calculation of the sequence roughly 10x, so we will be able to compute the Kolakoski sequence up to 3 billion on one node. We still need to call GetMultiplier 3 billion times, which is too many to do on a single node, but this can be trivially distributed: we have each of the 100 nodes calculate the whole sequence, and then each node does the dot product for only its own shard of the sequence.</p>
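
<p>A rough sketch of the lookup-table idea, under simplifying assumptions: it uses Python lists and tuples rather than bitmasks, and a small chunk size <code>k</code> instead of 20, so it shows the structure of the trick rather than its speed. The names <code>build_table</code> and <code>kolakoski_chunked</code> are hypothetical.</p>

```python
from itertools import product


def build_table(k):
    """Map (previous element, k describing elements) to the elements they expand to."""
    table = {}
    for prev in (1, 2):
        for chunk in product((1, 2), repeat=k):   # 2 * 2**k entries in total
            out, symbol = [], prev
            for run in chunk:
                symbol = 3 - symbol               # each run flips the symbol
                out.extend([symbol] * run)
            table[prev, chunk] = out
    return table


def kolakoski_chunked(n, k=4):
    """First n terms, consuming k describing elements per table lookup."""
    table = build_table(k)
    seq, read = [1, 2, 2], 2
    while len(seq) < n:
        if read + k <= len(seq):
            # k describing elements are available: expand them in one lookup
            seq.extend(table[seq[-1], tuple(seq[read:read + k])])
            read += k
        else:
            # near the start there are too few elements, so expand one at a time
            seq.extend([3 - seq[-1]] * seq[read])
            read += 1
    return seq[:n]
```

<p>The table has 2<sup>k+1</sup> entries, which matches the 2<sup>21</sup> figure above for k = 20.</p>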

<p>A different approach is to try to shard the "expanding". Imagine we have a part of the sequence in hand in our node. This describes a later part of the sequence. This later part describes in turn an even later part, and so on. If we have each node calculate, say, the first 10<sup>7</sup> elements of the sequence, then take the suffix of those 10<sup>7</sup> elements that describes later elements, and shard this suffix into 100 parts, we can have each node do its own expanding. There are two pieces of information we need to effectively expand such a part of the sequence: what the first element of the expanded sequence should be (1 or 2), and what the index of the first element in the whole sequence is (so we know what to multiply by).</p>

<p>One way to get this data is to precalculate and hardcode. We can write code on our own machine that calculates these numbers for each node and for each expansion in a few minutes, and hardcode the values into our solution. Having those, we can easily expand the sequence on each node as needed. Other similar hardcoding-based approaches are also possible, based on the intuition that only <i>log(N)</i> state is needed to expand the sequence up to the <i>N</i>th element.</p>

<p>The last possible approach is the most "distributed" one. Note that we can expand a sequence without knowing the two bits of information - we will just expand it into a sequence of digits, but we will not know which digit is a "1" and which one is a "2". So, we can have all the nodes do a single expansion in parallel. Once they're done, we can do a message-passing phase, where, starting from the first node, each node receives information about what the first element in its sequence and its index are, in constant time calculates the first element and the index of the next node's sequence, and passes that along. After that is done, each node can calculate the dot product and do another expansion (without knowing the first element and offset) in parallel.</p>
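
<p>A minimal sketch of one expansion round, assuming each node holds its describing chunk as a list of 1s and 2s. The function names are hypothetical, and the message passing itself is left out. Placeholder digits 0 and 1 mean "same as the first element" and "the other one".</p>

```python
def expand_relative(describing):
    """Expand runs without knowing the real first symbol (0 = first symbol, 1 = other)."""
    out, flip = [], 0
    for run in describing:
        out.extend([flip] * run)
        flip ^= 1
    return out


def resolve(relative, first_symbol):
    """Turn placeholders into real 1s and 2s once the first symbol is known."""
    return [first_symbol if r == 0 else 3 - first_symbol for r in relative]


def next_boundary(describing, first_symbol, first_index):
    """First symbol and global index of the NEXT node's expansion.

    Constant time if len(describing) and sum(describing) were computed
    during the parallel phase; they are recomputed here for brevity.
    """
    flips_back = len(describing) % 2 == 0   # an even number of runs ends on the other symbol
    next_symbol = first_symbol if flips_back else 3 - first_symbol
    return next_symbol, first_index + sum(describing)
```

<p>For example, the describing chunk 1,1,2 (elements 3 to 5, 0-indexed) expands to elements 5 to 8, which are 2,1,2,2, and the next node's expansion then starts with a 1 at index 9.</p>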
61 changes: 61 additions & 0 deletions distributed_codejam/2015_finals/kolakoski/statement.html
@@ -0,0 +1,61 @@
<h3>Problem</h3>
<p>
The Kolakoski sequence is defined as follows, where A(i) is the i-th term in the sequence:
<ul>
<li>A(0) = 1</li>
<li>A(1) = 2</li>
<li>The sequence is composed entirely of alternating runs of 1's and 2's</li>
<li>A(i) is the length of the i-th run.</li>
</ul>
This completely and uniquely defines the sequence.
</p>
<p>
The first twenty terms of the sequence are as follows, where the lines mark the alternating runs of 1's and 2's:<br/><br/>
<code>1 2 2 1 1 2 1 2 2 1 2 2 1 1 2 1 1 2 2 1</code><br/>
<code>_ ___ ___ _ _ ___ _ ___ ___ _ ___ ___ _</code><br/>
<code>1&nbsp;&nbsp;2&nbsp;&nbsp;&nbsp;2&nbsp;&nbsp;1&nbsp;1&nbsp;&nbsp;2&nbsp;&nbsp;1&nbsp;&nbsp;2&nbsp;&nbsp;&nbsp;2&nbsp;&nbsp;1&nbsp;&nbsp;2&nbsp;&nbsp;&nbsp;2&nbsp;&nbsp;1</code><br/><br/>
By collecting the lengths of each run, we obtain the same sequence again.
</p>
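
<p>The self-describing property can be checked with a short sketch (Python is used only for illustration; <code>run_lengths</code> is a hypothetical helper, not part of the "kolakoski" library):</p>

```python
from itertools import groupby


def run_lengths(terms):
    """Lengths of the maximal runs of equal values, in order."""
    return [len(list(group)) for _, group in groupby(terms)]


first_twenty = [1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1]
lengths = run_lengths(first_twenty)
```

<p>Here <code>lengths</code> has 13 entries, and they coincide with the first 13 terms of the sequence.</p>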
<p>
You become mystified contemplating the elegance of the Kolakoski sequence and after staring at its 1's and 2's for far too long, you begin to wonder if maybe you should spice it up a little and introduce some more numerical variety to the terms.
</p>

<p>So you decide to assign an arbitrary coefficient to each index in a manner such as the following:<br/><br/>
C(0)=1<br/>
C(1)=3<br/>
C(2)=1<br/>
C(3)=5<br/>
C(4)=2<br/>
C(5)=2<br/>
<br/>
By multiplying the first 6 terms each by their coefficient and summing, we get<br/><br/>
1*1 + 3*2 + 1*2 + 5*1 + 2*1 + 2*2 = 20.
</p>

<p>
Given a mapping from index to coefficient, find the dot product of the first <b>N</b> terms of the Kolakoski sequence and their respective coefficients.
</p>

<h3>Input</h3>
The library "kolakoski" will contain two functions:
<ul><li>GetIndex() which returns <b>N</b>, the number of terms we wish to sum; and</li>
<li>GetMultiplier(i) which takes an index i and returns the coefficient (a number from 0 to 50) for that index.</li></ul>
A single call to GetMultiplier will take approximately 0.005 microseconds.

<h3>Output</h3>
Output one number: the weighted sum of the elements of the Kolakoski sequence.

<h3>Limits</h3>
Each node will have access to 700MB of RAM.<br/>
Your solution will run on 100 nodes in both inputs.<br/>

<h3>Small input</h3>
GetMultiplier(i) will always return 1, for all the inputs.<br/>
1 &le; GetIndex() &le; 10<sup>9</sup><br/>
Each node will have a time limit of 10 seconds.<br/>


<h3>Large input</h3>
1 &le; GetMultiplier(i) &le; 50 for all i<br/>
1 &le; GetIndex() &le; 3 &times; 10<sup>9</sup><br/>
Each node will have a time limit of 12 seconds.<br/>
7 changes: 7 additions & 0 deletions distributed_codejam/2015_finals/necklace/analysis.html
@@ -0,0 +1,7 @@
<p>This problem has the simplest sharding model of all the finals problems. We simply assign a piece of the necklace to each node, and calculate a sub-answer for this piece of the necklace in each node, then merge them together.</p>

<p>The sub-problem we solve for each piece of the necklace is: for each position in the message string, calculate the longest substring of the message beginning at this position that is a subsequence of our part of the necklace. This is O(|message|) information we need to ship out of each node to some master. Once we have this information from each node, calculating the final answer is easy - for each possible starting position in the message string, we check how long a substring beginning at this position is a subsequence of the whole necklace: first cover as much as possible by the first node, then (starting from where the first node finished) cover as much as possible by the second node, and so on.</p>
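
<p>The merge step on the master might look like the sketch below. It assumes the per-node answers have already been received as lists, in necklace order, where <code>reach[p]</code> is the length of the longest substring of the message starting at <code>p</code> that fits in that node's piece. The function name and argument layout are assumptions of this sketch.</p>

```python
def merge_node_answers(per_node_reach, message_length):
    """Longest substring of the message that is a subsequence of the whole necklace."""
    best = 0
    for start in range(message_length):
        pos = start
        for reach in per_node_reach:      # nodes in necklace order
            if pos == message_length:     # whole suffix already covered
                break
            pos += reach[pos]             # greedily cover as much as this node allows
        best = max(best, pos - start)
    return best
```

<p>This takes O(|message| &times; NumberOfNodes()) time on the master, which is small compared to the per-node work.</p>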

<p>How do we solve the sub-problems on a single node? For the small input, we can do a DP, where the state DP[position in necklace][position in message] is "how much of the message have we already covered up to this point". The runtime is O(|message| |necklace| / NumberOfNodes()), with a pretty trivial extension rule. This is enough to solve the small input. </p>
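
<p>One way to write this DP is sketched below, keeping only the current row (indexed by starting position in the message) instead of the full table. The name <code>piece_reach</code> is hypothetical, and <code>piece</code> and <code>message</code> can be any sequences of comparable elements. Strings are used in examples for readability, while the real input consists of integers.</p>

```python
def piece_reach(piece, message):
    """covered[s] = how many message elements starting at s this piece can cover."""
    m = len(message)
    covered = [0] * m                     # DP row before any bead has been seen
    for bead in piece:                    # advance one necklace position at a time
        for s in range(m):
            nxt = s + covered[s]          # next message element still to be matched
            if nxt < m and message[nxt] == bead:
                covered[s] += 1           # extension rule: this bead matches it
    return covered
```

<p>For the piece "ab" and the message "xbca", this gives 0, 1, 0, 1 for the four starting positions.</p>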

<p>To solve the large input, we can't afford to touch all |necklace| positions for each character in the message. One way to avoid that is to reorganize the necklace - for each character, store the ordered list of positions on which the character appears. This takes O(|necklace| / NumberOfNodes()) to build. Then, for each starting position in the message, we can greedily append characters. If the last character we appended was at position, say, X, and the next character to append is some c, then we can binary search for the first occurrence of c after X. Doing all these binary searches will run in O(|message|<sup>2</sup> log |necklace|) time, which will be fast enough.</p>
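
<p>A sketch of the position-list approach for one node's piece, using Python's standard <code>bisect</code> module for the binary search. The function names are made up for this illustration.</p>

```python
from bisect import bisect_right
from collections import defaultdict


def build_positions(piece):
    """For each character, the increasing list of positions where it occurs."""
    positions = defaultdict(list)
    for index, bead in enumerate(piece):
        positions[bead].append(index)
    return positions


def piece_reach_fast(piece, message):
    """reach[s] = how many message elements starting at s this piece can cover."""
    positions = build_positions(piece)
    m = len(message)
    reach = []
    for start in range(m):
        last, j = -1, start               # last = position of the last matched bead
        while j < m:
            occurrences = positions.get(message[j])
            if not occurrences:           # character never appears in this piece
                break
            k = bisect_right(occurrences, last)   # first occurrence after `last`
            if k == len(occurrences):
                break
            last = occurrences[k]
            j += 1
        reach.append(j - start)
    return reach
```

<p>The result has the same meaning as the output of the small-input DP, so it can be shipped to the master and merged in the same way.</p>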
45 changes: 45 additions & 0 deletions distributed_codejam/2015_finals/necklace/statement.html
@@ -0,0 +1,45 @@
<h3>Problem</h3>
<p>
You've come up with the coolest idea ever for a new fashion trend: customizable necklaces made out of strings with beads that display letters and other characters! The beads appear only on the front of the necklace and read only in one direction, so the string of characters is neither circular nor reversible. By itself, this is not really a new idea. The awesome new feature you have in mind is to add a button that lights up some of the beads so that they display a secret message consisting of characters that form a subsequence of the main string of characters. This will have so many applications... just think of the possibilities! And it's so shiny! People are going to love it! Everyone will want their own!
</p>

<p>
So you announce this product, allowing people to place orders for necklaces by specifying the string of characters to be displayed on the necklace as well as the secret message to be lit up when they press the button. The orders come pouring in! Your idea is even more popular than you expected! How exciting!
</p>

<p>
Unfortunately, after examining a few orders, you realize that you forgot to check the crucial constraint that the secret message has to be a subsequence of the main necklace string. Without that, the secret message can't always be lit up entirely.
</p>

<p>
You don't want to disappoint your customers by just telling them that it is impossible to light up their secret messages in the chosen necklace strings. So you decide to offer them an alternative message by finding a substring of their secret message that forms a subsequence of their necklace string, in case they would be satisfied with this shorter version. You want to maximize the length of such a substring.
</p>

<p>
Given a necklace string <b>N</b> and a secret message string <b>M</b>, find the maximum length of a substring of <b>M</b> that is also a subsequence of <b>N</b>.
</p>


<h3>Input</h3>
The input library is called "necklace"; see the sample inputs below for examples in your language. It defines four methods:
<ul><li>GetNecklaceLength(), which returns the length of the necklace string</li>
<li>GetNecklaceElement(i), which returns the i-th (0-indexed) element of necklace string</li>
<li>GetMessageLength(), which returns the length of the secret message</li>
<li>GetMessageElement(i), which returns the i-th (0-indexed) element of the secret message.</li></ul>

A single call of GetNecklaceElement or GetMessageElement will take up to 0.02 microseconds.

<h3>Output</h3>
Output one integer - the maximum length of a substring of Message that is also a subsequence of Necklace.

<h3>Limits</h3>
0 &le; GetNecklaceElement(i), GetMessageElement(i) &le; 10,000<br/>
1 &le; GetNecklaceLength() &le; 10<sup>9</sup><br/>
Each node will have access to 256MB of RAM and a time limit of 5 seconds.<br/>
Your solution will run on 100 nodes (both for the small and the large input).<br/>

<h3>Small input</h3>
1 &le; GetMessageLength() &le; 100<br/>

<h3>Large input</h3>
1 &le; GetMessageLength() &le; 3000<br/>