Distributed Systems – Massive Distribution for Performance
September 10, 2009
Definition of a Distributed System
A distributed system is one in which independent, self-sufficient – often autonomous or heterogeneous – spatially-separated components must use a common interconnect to exchange information and coordinate actions, allowing the whole to appear to the user as a single coherent system.
Knowledge is POWER!
Think for a moment: what could you do with one computer? Yeah, you're right – sod all really, especially if it's Windows 😛 . Now, what about 100 computers? That's right, now we're getting somewhere! Now let's look at something better: an entire data centre comprising thousands of computers! Well, put simply – holy monkeys, that's a lot of computers. It's… it's like being in heaven 😀 !
Parallel vs. Distributed
Do you remember which parallelization scheme a distributed system falls into? If it helps: a distributed system is MIMD (Multiple Instruction, Multiple Data) parallel computing where the interconnect is on a network scale.
A Parallelization Example
Say you were given a list ‘a’ containing 60 items, along with some integer ‘b’.
Now, you and two other people are asked to produce a different list, ‘x’, which is identical to list ‘a’ except that every item has been incremented by 1 and the first occurrence of the value ‘b+3’ has been removed. How would you do it?
Let’s say it is possible to partition the work so that all three people can work in parallel, finishing the job much more quickly. This idea is commonly known as the divide-and-conquer strategy, and a parallelization pattern called master-and-workers applies here.
Master-and-workers is when you split the workload evenly between a number of processes that work in parallel. Using the example above, we have 3 people and a list of 60 elements, so each person gets a 20-element slice of ‘a’:
- a1 = a[0:20]
- a2 = a[20:40]
- a3 = a[40:60]
Each worker produces a new list – x1, x2 and x3 – in which every element has been incremented by 1.
Now what we do is take all three new ‘x’ lists and concatenate them to get the one list ‘x’: every element of ‘a’, incremented by 1.
Finally, all we need to do is remove the first item in the list that is equal to ‘b+3’.
This process is called Split, Spawn and Merge: first we split the list, then we spawn workers that each return a new partial list, then we merge the partial lists together to obtain the one new list ‘x’.
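The three steps above can be sketched in Python. This is a minimal illustration using threads (the function names are my own, hypothetical choices, not part of any particular framework):

```python
from concurrent.futures import ThreadPoolExecutor

def increment_chunk(chunk):
    # the "spawned" work: increment every element of one partition
    return [item + 1 for item in chunk]

def split_spawn_merge(a, b, workers=3):
    size = len(a) // workers
    # Split: partition 'a' into equal-sized chunks
    chunks = [a[i * size:(i + 1) * size] for i in range(workers)]
    # Spawn: process each chunk in parallel
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(increment_chunk, chunks))
    # Merge: concatenate the partial results into one list 'x'
    x = [item for part in parts for item in part]
    # Finally, remove the first occurrence of b + 3
    if b + 3 in x:
        x.remove(b + 3)
    return x

a = list(range(60))
print(split_spawn_merge(a, b=4))  # 1..60 with the first 7 removed
```

Note that each worker only touches its own chunk, so no locking is needed; the only coordination point is the merge at the end.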
The map-reduce model – would you believe it – comprises the map and reduce second-order (higher-order) functions.
In its simplest form, map takes a unary function ‘f’ and a collection [c1, …, cn], and then returns the collection:
- [f(c1), …, f(cn)]
As you can see, map will apply the given function to each element of the collection. Here’s an example:
- Say we have a collection ‘A’:
>>> A = [1, 2, 3, 4, 5]
- Then we have a function, say incr(x), which returns the value ‘x+1’.
- Now we declare a new collection ‘B’ = map(incr, A):
>>> B = map(incr, A)
- Note how the first argument is the function incr, and the second is the collection ‘A’.
- Once this is done, map applies the function to each element of ‘A’. And when we print out the new collection ‘B’ we get:
[2, 3, 4, 5, 6]
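A note if you try this yourself: in Python 3 (unlike the Python 2 of this post's era), map returns a lazy iterator rather than a list, so you need to wrap it in list() to see the elements:

```python
def incr(x):
    # unary function: add one to its argument
    return x + 1

A = [1, 2, 3, 4, 5]
B = list(map(incr, A))  # list() forces the lazy map object in Python 3
print(B)  # [2, 3, 4, 5, 6]
```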
In its simplest form, reduce takes a binary, associative function Θ (with an optional initial value ‘i’) and a collection [c1, …, cn], and then returns a single new value:
i Θ c1 Θ … Θ cn
So reduce will apply the function between the elements of the given collection, folding them into one value. Let’s have an example:
- Let’s use the collection ‘B’ from the map above:
>>> B = map(incr, A)
- Then we define a binary, associative function that takes two numbers as arguments.
- Now we apply the function to the collection ‘B’ using reduce.
- When printed, we obtain the value 184,320.
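The post doesn’t show the function definition or the reduce call itself, so here is a minimal sketch assuming the binary, associative function is plain multiplication (an assumption on my part, which is why the value below differs from the 184,320 quoted above):

```python
from functools import reduce  # reduce lives in functools in Python 3

def mult(x, y):
    # hypothetical binary, associative function; the original isn't shown
    return x * y

B = [2, 3, 4, 5, 6]     # the result of map(incr, A) from above
print(reduce(mult, B))  # 2*3*4*5*6 = 720
```

An optional initial value ‘i’ is passed as a third argument, e.g. reduce(mult, B, i).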
Map-Reduce Programming Model
Users implement two functions, known as the mapper and the reducer, with the following interfaces:
- Map (in_key, in_value) ->
(out_key, intermediate_value) list
- Reduce (out_key, intermediate_value list) ->
out_value list
The map-reduce infrastructure takes care of splitting, spawning and merging, as well as fault tolerance and load balancing.
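As a concrete sketch of these two interfaces, here is the classic word-count example in Python. The tiny in-memory "infrastructure" below is my own toy stand-in, not a real map-reduce framework:

```python
from collections import defaultdict

def mapper(in_key, in_value):
    # in_key: document name, in_value: document text;
    # emits a list of (out_key, intermediate_value) pairs
    return [(word, 1) for word in in_value.split()]

def reducer(out_key, intermediate_values):
    # folds all intermediate values for one key into an out_value list
    return [sum(intermediate_values)]

def run_map_reduce(documents):
    # toy single-process infrastructure: a real one would also
    # distribute work, load-balance and handle failures
    groups = defaultdict(list)
    for key, value in documents.items():
        for out_key, inter in mapper(key, value):
            groups[out_key].append(inter)  # the "shuffle" step
    return {k: reducer(k, vs) for k, vs in groups.items()}

docs = {"d1": "the quick fox", "d2": "the lazy dog"}
print(run_map_reduce(docs))
```

Notice the shape: the user only writes mapper and reducer, and everything between them (grouping intermediate values by key) is the framework's job.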