SHAttered: SHA-1 collisions

I was reading this post about Google generating the first SHA-1 collision. At first I didn’t think there was anything surprising in it. We’ve known that hashing algorithms collide since they were invented; that alone isn’t news. However, the post goes on to say things like

You could alter the contents of, say, a contract, and make its hash match that of the original. Now you can trick someone into thinking the tampered copy is the original. The hashes are completely the same.

Well… that’s not so easy to do, I thought. Just because you can generate a collision between two different pieces of data, it doesn’t follow that those two pieces resemble each other. One could theoretically find that a hash of the Mona Lisa and a hash of Starry Night are the same, but if you’re trying to convince someone that the Mona Lisa is Starry Night you have a long way to go. What you really need is a way to tamper with the original data, in this example the Mona Lisa, and vary it subtly, say with a little darkening around the eyes, so that on close visual inspection the two look identical, while the hashes of the two blobs of data, the Mona Lisa and its darkened sibling, come out the same. So I was sceptical when I initially read the post, but it turns out they’ve managed to come up with a technique that can alter data, at least in some specific cases, so that the copy both looks like the original and hashes to the same value. That is pretty amazing.
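
The shattered.io site publishes the two colliding PDFs. Assuming they have been downloaded locally, a few lines of Elixir (a sketch; the file names are as published on that site) demonstrate different bytes producing an identical SHA-1 digest:

    # shattered-1.pdf and shattered-2.pdf are the colliding files from
    # https://shattered.io; both iterations print the same digest.
    for f <- ["shattered-1.pdf", "shattered-2.pdf"] do
      digest = :crypto.hash(:sha, File.read!(f))
      IO.puts(Base.encode16(digest, case: :lower))
    end

Running the same loop with :sha256 in place of :sha prints two different digests, which is why the fix is to move off SHA-1.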

Thoughts on Joe Armstrong’s ‘the mess we’re in’ speech

As much as I like Joe Armstrong and respect him for the work he has done on Erlang, I think some of his thinking in the speech here is misleading. Fundamentally he seems to equate determinism with understanding. The example he uses is the world’s first program, which was a few lines long and easily understandable by a human. But I can write a short program to simulate a coin toss and, by the same reasoning, claim that nobody understands it because nobody can deterministically predict its outcome.
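
For instance, a sketch of such a program in Elixir; everyone understands these three lines, yet no one can say what they will return:

    defmodule Coin do
      # Fully understood, yet nondeterministic: no one can predict the result.
      def toss, do: Enum.random([:heads, :tails])
    end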

His spiel about 2-phase commit is also misleading. He seems to think there is something wrong with it for the following reason: the sender sends a message to the receiver; the receiver gets the message, performs the action, and sends a message back confirming receipt. But then how does the receiver know that the sender knows that the receiver received the response that was sent in response to the initial message? What a mouthful… In any case the question is irrelevant, because the receiver doesn’t need that information: it has already done its part, and it is the sender’s job to retry or time out if the confirmation never arrives.

At the crack of dawn

I hope I can keep this up. I’ve been following Jocko Willink on Twitter and reading his book Extreme Ownership. Jocko has an insane, by anyone’s standards, start to the day. He wakes up at 4:30 and goes to his basement gym and works out. He only goes to bed around 23:00, which means he gets just five and a half hours of sleep. How the hell do you get through the day on so little sleep? Surely you will crash at 17:00?

When I first heard about his routine I thought: not for me, I’ll never be able to do it. It made more sense to go to the gym at a sensible time after work and do your exercise then. And this is what I did. I’d go to the gym for a run between 19:00 and 21:00, well after the work day ended. Several years ago I had an even more disciplined routine of weights and running, but I lost the rhythm and struggled to get back into it.

After constantly seeing Jocko’s snaps at 4:30 on Twitter, I think the message subliminally affected me and I found myself thinking before bed: go to the gym in the morning. Note that I say ‘I found myself thinking’ and not ‘I decided’. One day last week, I did. I’d damaged my foot running, so I found a new gym with a pool and had a new weights program done. Then I woke up one day at 7:30 and went to the gym. The next morning was a little earlier, at 7:00. Then at 6:30. And I’ve done this 6 times since, two days on, one day off. I didn’t go straight into 6:30 wakeups; I broke it down from 7:30, to 7:00, to 6:30. I probably won’t do 4:30 wakeups since the gym doesn’t open till 6 and I like the swim – it’s a little luxury.

What I’ve noticed so far:

  • I don’t have a mid-afternoon slump anymore. Normally about 15:00 or 16:00 I’d get tired at work, but that isn’t happening anymore.
  • Last night I woke up at 3:30 and lay in bed trying to sleep till 6:00. I still went to the gym on 4 hours of sleep and felt fine during the day. Of course I’d rather get a full night of quality sleep, but I have severe sleep apnea: I don’t fall asleep easily, the quality of my sleep is bad, and I tend to wake during the night and struggle to get back to sleep. I’m not going to let that stop me, though.
  • When forming a new habit it helps to mentally prepare by repeating to yourself what you want to do. When your last thought before going to bed is I will go to the gym when I wake up, it really is that easy to do. It’s a form of mental programming, of programming your subconscious mind. Maybe there’s some hard research out there that backs this up, but whenever I’ve needed to form a new habit, the first thing I do is mentally repeat to myself, during the weeks prior, what I want to do. That way your subconscious mind takes control and tells you what to do.

How to kill jobs in Elixir

Sometimes a previous run of a process fails but doesn’t exit cleanly, especially in the middle of the edit, compile, debug cycle. For example, I have a program that registers an agent under its module name:

    # tokens is bound in the surrounding code; the :name option registers
    # the agent under the module’s own name
    {:ok, pid} = Agent.start_link(fn -> {[], tokens} end, name: __MODULE__)

    If the calling program exits and I restart it I get this error:

    ** (MatchError) no match of right hand side value: {:error, {:already_started, #PID<0.233.0>}}

    To remedy this you need to kill the process that has already started. Process manipulation functions are in the Process module.

    iex(131)> h Process.exit/2
    
                                 def exit(pid, reason)
    
    Sends an exit signal with the given reason to the pid.
    
    The following behaviour applies if reason is any term except :normal or :kill:

      1. If pid is not trapping exits, pid will exit with the given reason.
      2. If pid is trapping exits, the exit signal is transformed into a
         message {:EXIT, from, reason} and delivered to the message queue of pid.

    If reason is the atom :normal, pid will not exit (unless it is the
    calling process's pid, in which case it will exit with the reason
    :normal). If it is trapping exits, the exit signal is transformed into a
    message {:EXIT, from, :normal} and delivered to its message queue.

    If reason is the atom :kill, that is if exit(pid, :kill) is called,
    an untrappable exit signal is sent to pid which will unconditionally exit
    with exit reason :killed.
    
    Inlined by the compiler.
    
    Examples
    
    ┃ Process.exit(pid, :kill)

The next question is how to create the pid. IEx provides a pid helper that builds one from the three numbers in the error message.

    iex(132)> Process.exit(pid(0,233,0), :kill)
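
Alternatively, since the agent was registered under the module name, you can look the pid up rather than type the numbers in. A small sketch (MyAgent stands for whatever module name was used):

    # Look up the registered process by name and kill it.
    case Process.whereis(MyAgent) do
      nil -> :ok                        # nothing registered, nothing to kill
      pid -> Process.exit(pid, :kill)   # untrappable, exits with :killed
    end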

Note that a similar approach applies in Erlang.

Did you know Apples contain cyanide?

Did you know apples contain cyanide? Actually, it’s only half true. They contain a cyanide compound in their seeds that only becomes toxic when you eat a stupid number of them, or when you have to pay for an Apple MacBook 2016. OMG! WTF!!!

Why Apple? Maybe it’s the fancy LCD touch bar, which I’m sure the folk at Apple agonized over in 3-hour daily meetings every day of every month for a year, while sipping their water bottles and presenting keynotes covering topics like: How big should the icons be? How many should we allow? Should a finger obscure an icon, or should some of its edges protrude? What’s the average finger width? Is that male or female? Child or adult? Human or animal? Do animals have fingers, and shouldn’t we be using the term paws instead? No doubt the design team at Apple are paid handsomely for their intellectual prowess and their ability to provide insight into these deep questions. And, no doubt, that’s why the product costs so much.

This is a genuine problem for people who need a Unix-like operating system supported by a big vendor: Apple is the only game in town. If you want a Unix-like OS without Apple, be prepared to support it yourself. Buying a Windows laptop isn’t an option for me unless I want to run my Linux development machine in a virtual machine, in addition to the 3 or 4 other virtual machines I need to run to test the code for the clusters I work on.

What about buying a Windows laptop and installing Linux over it? I instant-messaged Dell on their customer support website and they basically said installing Linux would void the warranty, and I wasn’t keen on that. I had a look on the net and System76 popped up. They build laptops with Ubuntu installed, but they ship from the USA, and I wasn’t keen on shipping that far, or shipping back there if I needed to use the warranty. I wanted something local. Then I saw Metabox, a small independent laptop manufacturer in Perth. Their laptops have a no-operating-system option, which meant I could install Linux myself. The price for a laptop with the same specs as an Apple Mac was $2500 less. So for about $1500 I got:

    • 525GB SSD
    • Nvidia 960M
    • 16 GB RAM
    • 15″ 1920×1080 display
    • Intel i7-6700HQ
    • No operating system

I figured this would suit me, so I took a chance on a small operator and bought it. Physically I like the laptop. It’s chunky. It feels sturdy. It’s not trying to be sleek or a Macbook clone, since it’s a gaming laptop (no idea why people want to game on a laptop – it doesn’t make sense to me; maybe they travel a lot). The keyboard is great to type on.

The next trick was to get Linux installed on it. I’ve become used to using i3 as a window manager in Linux; it’s very good for those who don’t like to use the mouse, similar to ratpoison in that sense. Manjaro has a community edition that installs i3 out of the box, and being an Arch user I find Manjaro familiar, since it’s an Arch derivative. So I downloaded the latest Manjaro i3 edition, popped it onto a USB stick, booted the laptop into the installer, and installed. Installation was straightforward, but on reboot the login display manager didn’t load and I was left with a console login. I suspected it was something to do with the Nvidia and Intel graphics cards conflicting. I recalled a similar problem with an older laptop that had an Intel i915 and an Nvidia GPU built in, and went back to the thread I’d read about it. This time I checked the BIOS for an option to disable the hybrid GPU and force it to use Nvidia. In the BIOS the options show as “MSHYBRID” or “DISCRETE”; I changed it to “DISCRETE”. Then I reinstalled Manjaro using the non-free drivers (the installer lets you choose between free and non-free; I was surprised by this, but it makes life easier when dealing with Nvidia, since I’d rather use the Nvidia drivers over the reverse-engineered Nouveau stuff). The only shortcoming of this solution is that I’m stuck on the Nvidia GPU, which means battery life might be compromised. I’m not sure if it works with an external display either, since I haven’t tested that yet.

So for $2500 less I get a Unix-like laptop. I don’t think Apple can justify their laptop being worth $2500 more. It’s unclear what I’d be paying for. Better support?

    Written in Asciidoc using Vim.

Median Tracking

Median tracking is the process of tracking the median of an array as elements are inserted into it. Remember that the median is not the average value but the value in the middle: for example, in the array [1,2,3] the median element, assuming 0-based indexing, is the item at index 1, which is 2. We already know that a heap is a good data structure for tracking the minimum or maximum of a set of values: push elements onto the heap as they are read and, if the heap is a max heap, the largest element will be at the top; if it’s a min heap, the smallest will be at the top, ready to be popped whenever we want. We can exploit this with two heaps to track the median. One heap is a max heap containing the lower half of the numbers, and the other is a min heap containing the upper half. The top of the max heap is then the largest of the low values, the top of the min heap is the smallest of the high values, and the median is one of those two. If the heap sizes are uneven, pick the top of the larger heap as the median. If the heap sizes are even, pick one heap and always use its top.

The full C++ source code is here.
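
As a rough illustration of the bookkeeping involved, here is a minimal sketch of the two-heap idea (in Elixir rather than the C++ of the full source; the standard library has no heap, so a tiny pairing heap stands in, and the module names are mine):

    defmodule Heap do
      # A pairing heap: nil (empty) or {root, list_of_subheaps}.
      # cmp decides which element wins the root: &>=/2 gives a max heap,
      # &<=/2 a min heap.
      def push(heap, x, cmp), do: meld(heap, {x, []}, cmp)
      def peek(nil), do: nil
      def peek({root, _}), do: root
      def pop({root, subs}, cmp), do: {root, merge_pairs(subs, cmp)}

      defp meld(nil, h, _cmp), do: h
      defp meld(h, nil, _cmp), do: h
      defp meld({a, as} = ha, {b, bs} = hb, cmp) do
        if cmp.(a, b), do: {a, [hb | as]}, else: {b, [ha | bs]}
      end

      defp merge_pairs([], _cmp), do: nil
      defp merge_pairs([h], _cmp), do: h
      defp merge_pairs([h1, h2 | rest], cmp),
        do: meld(meld(h1, h2, cmp), merge_pairs(rest, cmp), cmp)
    end

    defmodule Median do
      # State: {max heap of low half, its size, min heap of high half, its size}
      def new, do: {nil, 0, nil, 0}

      def insert({lo, nl, hi, nh}, x) do
        if nl == 0 or x <= Heap.peek(lo) do
          rebalance({Heap.push(lo, x, &>=/2), nl + 1, hi, nh})
        else
          rebalance({lo, nl, Heap.push(hi, x, &<=/2), nh + 1})
        end
      end

      # Keep the two sizes within one of each other.
      defp rebalance({lo, nl, hi, nh}) when nl > nh + 1 do
        {x, lo} = Heap.pop(lo, &>=/2)
        {lo, nl - 1, Heap.push(hi, x, &<=/2), nh + 1}
      end
      defp rebalance({lo, nl, hi, nh}) when nh > nl + 1 do
        {x, hi} = Heap.pop(hi, &<=/2)
        {Heap.push(lo, x, &>=/2), nl + 1, hi, nh - 1}
      end
      defp rebalance(state), do: state

      # The median is the top of the larger heap; on a tie, use the low heap.
      def median({lo, nl, _hi, nh}) when nl >= nh, do: Heap.peek(lo)
      def median({_lo, _nl, hi, _nh}), do: Heap.peek(hi)
    end

For example, inserting [5, 2, 8, 1] one at a time:

    iex> Enum.reduce([5, 2, 8, 1], Median.new(), &Median.insert(&2, &1)) |> Median.median()
    2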

    Additionally it would be trivial to extend this idea to track the kth smallest number by limiting the size of the max heap to be k.

Karger’s Min Cut

A minimum cut of a graph is the smallest set of edges that, when removed, disconnects the graph. Karger’s min cut algorithm takes a graph and finds such a set. It’s a randomised algorithm: at each stage it picks an edge uniformly at random, removes it, and collapses one endpoint of the edge onto the other. The edges attached to the collapsed vertex become attached to the surviving vertex. The process repeats, choosing another edge at random and collapsing its endpoints, until only two vertices remain. The process is illustrated in the following sequence of diagrams.

    Initial Graph

Here the edge {1,3} is randomly selected and we decide to collapse vertex 3 onto vertex 1.

    Collapse edge 1 3

This results in the following graph, where vertex 3 has disappeared into vertex 1 and its edges are now effectively attached to vertex 1. Next we select one of the {1,2} edges and collapse vertex 2 onto vertex 1.

    Collapse edge 2 1

This results in the following graph, with a loop from vertex 1 to itself; this was the other {2,1} edge in the previous graph. Self-loops like this are removed. Once we get down to 2 vertices we stop, so this is the final graph. The number of edges remaining is 2, so the algorithm reports a cut of size 2.

    Final

Note that this isn’t the only possible outcome. Because the algorithm picks edges at random, it could have produced a cut of size 3 instead. You therefore have to run the algorithm multiple times to increase the probability of finding the minimum cut.

My first attempt at this was in C++. I modelled the adjacency list of vertices using a vector in which each element contained a vector of the edges connected to that vertex. Each edge contained two endpoints, which were pointers to vertices, and the adjacency list of edges was a vector of these edges. This first attempt didn’t work. I got my head in a spin trying to reason about pointers to vertices to edges that contained pointers to vertices, and the whole thing never worked properly. I was surprised that I couldn’t get it right, and frustrated since I’d spent a few days on it. Eventually I bit the bullet and started from scratch, this time choosing a simpler representation of the graph: still an adjacency list, but using vertex labels instead of pointers. For example, the vertex list for the graph above is [1 => [4,3,2], 2 => [1,3], 3 => [2,4,1], 4 => [1,3]]. The corresponding edge list is [{1,2},{2,3},{3,4},{4,1},{1,3}]. Then, with pen and paper, I stepped through what would happen when an edge was removed and its endpoints collapsed. That let me observe the changes to the lists themselves and come up with a different formulation of the collapse step, one not based on the pictorial representation above, which I coded in C++. The steps are:

  • Pick a random edge; call its endpoints i and j, and let j be the vertex that is collapsed.
  • Remove edge {i,j} from the list of edges.
  • In the list of edges, replace all occurrences of j with i.
  • Remove self-loops from the edge list.
  • Find the vertices connected to vertex j, excluding i.
  • Append those vertices to vertex i’s list.
  • Remove vertex j from the list of vertices.
  • For each vertex, replace all occurrences of j with i in that vertex’s list of vertices.

I reimplemented the algorithm in Elixir but decided not to publish the source code, since this is an assignment question from the Stanford Algorithm Design course on Coursera.

    Written in Asciidoc using Vim.

No Debugging Programming in Elixir

At last I have found a language that I do not need to debug. I can just write the code, run it on the input data, and it gives me the correct result. The language is Elixir (elixir-lang.org). This is the Holy Grail!

So this is what happened. I’m studying algorithms, doing a course, and one of the assignments was to write an inversion counter. So what’s an inversion? It’s a pair of numbers in a list that are out of order with respect to each other: an element that appears earlier is larger than one that appears later.

    Example of no inversions

    [1,2,3,4]

Example of one inversion. Here 3 is out of order with respect to 2.

    [1,3,2,4]

Here every pair is out of order, so there are 6 inversions.

    [4,3,2,1]

A brute force algorithm would have to check each number against every other number in the list and would therefore be O(n^2), which is inefficient. The trick is to piggyback on mergesort, which runs in O(n log n). This works because inversions can be detected during the merge step: when writing the merged array from the two sorted subarrays, every time an element from the right subarray is emitted while elements remain in the left subarray, each of those remaining left elements forms an inversion with it. At the end of the day, a few subtle modifications to the mergesort algorithm are all that’s needed. And that’s what I did in Elixir. I stumbled along and my tired fingers and sore eyes cranked out some Elixir. The biggest surprise was that it worked (ignoring compilation errors) right from the start. I didn’t have an input case that produced the wrong answer for the assignment. I didn’t have to go back and rewrite any of the pattern matches. I didn’t have to fix the recursion, because I’d put the calls in the right place.

The problem with my first version was that it was slow: on an input of 100,000 numbers it took 30 seconds. It contained a hidden operation that made the summarize function O(n^2). The hidden operation was the ++ that appends each element to the end of the accumulated list. To do that, it has to walk to the end of the list, an O(n) operation, turning what should be an O(n) pass into O(n^2). The fix is to prepend each element to the head of the list and reverse the list once at the end, as in the sketch below.
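
A minimal sketch of the fixed version (my reconstruction of the idea, not the original listing; the slow variant differs only in building the merged list with acc ++ [l] instead of [l | acc]):

    defmodule Inversions do
      # Returns {sorted_list, number_of_inversions}.
      def count([]), do: {[], 0}
      def count([x]), do: {[x], 0}
      def count(list) do
        {left, right} = Enum.split(list, div(length(list), 2))
        {ls, lc} = count(left)
        {rs, rc} = count(right)
        {merged, mc} = merge(ls, rs, length(ls), [], 0)
        {merged, lc + rc + mc}
      end

      # The accumulator is built by prepending (O(1)) and reversed once at
      # the end. Appending with acc ++ [l] here is the hidden O(n) step
      # that makes the whole merge O(n^2).
      defp merge([], right, _nl, acc, inv), do: {Enum.reverse(acc, right), inv}
      defp merge(left, [], _nl, acc, inv), do: {Enum.reverse(acc, left), inv}
      defp merge([l | ls], [r | _] = right, nl, acc, inv) when l <= r,
        do: merge(ls, right, nl - 1, [l | acc], inv)
      # An element emitted from the right while nl elements remain on the
      # left forms nl new inversions.
      defp merge(left, [r | rs], nl, acc, inv),
        do: merge(left, rs, nl, [r | acc], inv + nl)
    end

    iex> Inversions.count([1, 3, 2, 4])
    {[1, 2, 3, 4], 1}
    iex> Inversions.count([4, 3, 2, 1])
    {[1, 2, 3, 4], 6}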

Applying algorithms intuitively

Algorithms need to be internalized so that you can produce a solution without conscious effort, much as you produce words without explicitly thinking about them. Thoughts are created in the subconscious; the conscious mind only becomes aware of them and has no direct control over them. What you do control is what you program into your subconscious through learning, repetition, and active problem solving. I believe this can be applied to different problem domains, so that after training for a period of time one can intuitively, at a subconscious level, see a real-world problem and map it onto a set of algorithms that solve it.