More Advanced Starfish Feature
I promised you in Dynamically Add Methods to Classes Through their Objects in Ruby that there was a good use for that idea coming up. The time has come to show you how to use it.
server do |map_reduce|
  map_reduce.type = File
  map_reduce.input = "some_file_name.txt"
  # Helper method that clients call as server.process
  map_reduce.process = lambda do |text|
    do_something_on_the_server(text)
  end
  # Called once the whole collection has been processed
  map_reduce.finished = lambda do
    do_something_when_the_collection_is_totally_processed()
  end
end

client do |line|
  # $1 refers to the capture group, so the pattern needs one
  if line =~ /(some_regular_expression)/
    server.process($1)
  end
end
Notice how I am dynamically adding methods to map_reduce in the server declaration. I define the process and finished methods. The process method is called from the client via server.process and the finished method is called when the collection has been fully processed.
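If you are curious how assigning a lambda can end up as a callable method, here is a minimal sketch of one way to do it in plain Ruby. It is not Starfish's actual source: the DynamicServer class is a hypothetical stand-in for map_reduce, and using method_missing to intercept the assignment is an assumption about how such an object could work.

```ruby
# Hypothetical stand-in for the map_reduce object; not Starfish's actual source.
class DynamicServer
  # Intercept assignments such as `server.process = lambda { ... }` and
  # turn the lambda into a singleton method with the same name.
  def method_missing(name, *args, &block)
    if name.to_s.end_with?("=") && args.first.respond_to?(:call)
      define_singleton_method(name.to_s.chomp("="), &args.first)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?("=") || super
  end
end

server = DynamicServer.new
server.process = lambda { |text| text.upcase }
server.finished = lambda { "done" }

server.process("abc")  # calls the lambda assigned above
server.finished        # calls the second lambda
```

Each assignment defines a method on that one object only, so other instances of the class are unaffected.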
Astute readers will notice that being able to dynamically add server-side helper methods gives you a non-distributed version of reduce (from MapReduce), which is good enough for many real-world situations. Enjoy!

7 Comments:
This was asked in a previous post but never answered, so I'm going to ask it again.
In the example above, it appears to have a file as input to process.
Let's say that file contains a listing of unique e-mail addresses. The e-mail addresses are to persons that you want to send an e-mail message to.
How does Starfish prevent multiple processes from sending multiple e-mail messages to the same unique e-mail address found in the input text file?
In other words, how does Starfish prevent an e-mail address found in the text input file from receiving more than one e-mail message?
1:21 PM, August 23, 2006
It uses a queue system, so two clients can never grab the same line from a file because once grabbed, that line is not accessible to any other clients. It also has a simple mutex to prevent two calls at the exact same time.
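The idea can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions, not Starfish's implementation: it uses threads inside one process where Starfish uses separate clients, and the e-mail addresses, the worker count, and the handled array are made up for the example.

```ruby
require "thread"

# Made-up input standing in for the lines of the file.
lines = ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]

queue = Queue.new
lines.each { |line| queue << line }

handled = []        # records which worker took which line
lock = Mutex.new    # guards the shared array

workers = 3.times.map do |id|
  Thread.new do
    loop do
      line = begin
        queue.pop(true)   # non-blocking pop; raises ThreadError when the queue is empty
      rescue ThreadError
        break             # nothing left to grab, so this worker stops
      end
      lock.synchronize { handled << [id, line] }
    end
  end
end
workers.each(&:join)

# Every line was taken by exactly one worker.
handled.map(&:last).sort == lines.sort
```

Because popping removes the line from the queue, a second worker can never receive it, whichever worker gets there first.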
1:25 PM, August 23, 2006
Lucas:
Does Starfish wait for the results of each line to process before dispatching the next "line" to be processed?
Or does it read in all the lines, queue them up, and then dispatch each line to be processed? This is with the understanding that some "lines" might take less time to process than others, so the results are returned to the dispatcher out of order. If this is the case, would you mind explaining how Starfish handles it, since sometimes you might need to know the results of line 1 before line 2?
1:38 PM, August 23, 2006
Jeff:
How is Starfish different than using Mongrel with regards to HTTP requests?
1:40 PM, August 23, 2006
Starfish reads in all the lines, queues them up and then dispatches each line to be processed. In the case where you might need to know the results of line 1 before line 2, you would have to add a server helper method for that logic, since it is very possible line 200 might be processed before line 199.
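As a rough sketch of what such a helper might look like, the class below holds back results that arrive early and releases them in line order. The OrderedCollector class and its method names are hypothetical and not part of Starfish, and it assumes each result arrives tagged with its line number.

```ruby
# Hypothetical helper; the class and method names are not part of Starfish.
class OrderedCollector
  attr_reader :released

  def initialize
    @pending = {}      # results that arrived early, keyed by line number
    @next_index = 0    # the line number we are waiting for
    @released = []     # results handed on in line order
  end

  # Record the result for a line, then release every result that is now contiguous.
  def add(index, result)
    @pending[index] = result
    while @pending.key?(@next_index)
      @released << @pending.delete(@next_index)
      @next_index += 1
    end
  end
end

collector = OrderedCollector.new
collector.add(1, "second")   # arrives early, so it is held back
collector.add(0, "first")    # releases lines 0 and 1
collector.add(2, "third")
collector.released           # results in line order
```

If several clients can call the helper at once, the body of add would also need to be wrapped in a mutex.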
1:51 PM, August 23, 2006
What's so confusing about the reddit complaints is that they don't really make sense given the context. Ruby is all about using component libraries to speed up individual tasks through C. Ruby has a fantastic interface for this.
Saying, "Abandon Ruby, use C for numerical computation" is basically saying, "Use Ruby and a good library like Narray or Ruby/GSL, but write 4x the code."
2:32 PM, August 25, 2006
What do you think about BackgrounDRb compared with Starfish?
1:10 AM, March 28, 2007