[Campaign-news] Smoke Test

Kai Kohlhoff kohlhoff at stanford.edu
Fri Nov 6 10:14:30 PST 2009


Hi Bill and Marc,

Sorry for my late reply.  I have not been well myself the last few days.

First, I wanted to ask you if you are fine with me putting your names on a 
poster that I would like to present at BCatS tomorrow.  The title is "Using 
graphics processors for the clustering of biological data sets", and it will 
present the general idea of our clustering library and my first speed 
measurements.  I am sorry for the very late notice.  I am not even sure I 
will be well enough to actually join the conference and present it (or if I 
will be able to finish the poster on time).

I talked to Tianyun about the FEATURE data set.  As things stand, we can 
only use that data for speed comparisons, as the actual biophysical 
meaning of the derived clusters is still unclear, i.e. we won't be able to 
tell what is a 'good cluster'.  My tests so far have been with randomly 
generated data.  This of course is not ideal for clustering, in which one 
assumes the presence of some kind of pattern or structure.  It is sufficient 
to check for correctness between
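
To make the correctness check on random data concrete, here is a minimal 
sketch of a reproducible test.  It assumes a plain CPU version in Python 
with Euclidean distance (not the GPU kernels or the protein rmsd kernel), 
and all names in it (make_points, k_centers, canonical) are hypothetical 
and not taken from our library.  Fixing the data seed, the first center, 
and the tie-breaking rule makes the result repeatable, and comparing 
clusters as sets ignores the order in which they come out.

```python
import math
import random


def make_points(n, dim, seed):
    """Generate n random points; a fixed seed makes the data reproducible."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]


def k_centers(points, k, first_center=0):
    """Greedy farthest-first k-centers with deterministic tie-breaking.

    Illustrative only: ties are always resolved in favour of the lowest index.
    """
    centers = [first_center]
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, points[first_center]) for p in points]
    assign = [0] * len(points)
    while len(centers) < k:
        # max() returns the first maximal element, so the lowest index wins ties.
        far = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(far)
        for i, p in enumerate(points):
            d = math.dist(p, points[far])
            # Strict '<' keeps the earlier center when distances are equal.
            if d < nearest[i]:
                nearest[i] = d
                assign[i] = len(centers) - 1
    return centers, assign


def canonical(assign):
    """Order-independent view of a clustering: a set of frozen member sets."""
    groups = {}
    for i, c in enumerate(assign):
        groups.setdefault(c, set()).add(i)
    return {frozenset(members) for members in groups.values()}
```

Two implementations could then be compared with 
canonical(assign_a) == canonical(assign_b), which holds when the clusters 
are the same and only their sequence differs.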

More promising should be the use of protein structures taken from molecular 
dynamics simulations as we might have ways of telling how those are related. 
The question is whether we actually want to care about 'meaningful' clusters 
in our test set (this might be more important once we start coming up with 
completely new clustering algorithms), or leave it to the user to decide 
which algorithm to use and how to set the parameters.

Maybe we could best discuss this when we meet next week.  Will it be at 
Stanford this time?  For me, Tuesday or Friday are generally the best days, 
but I could fit in a meeting on other days as well.

- Kai





----- Original Message ----- 
From: <whsu at sfsu.edu>
To: "Kai Kohlhoff" <kohlhoff at stanford.edu>
Cc: <campaign-news at simtk.org>
Sent: Thursday, November 05, 2009 11:17 AM
Subject: Re: [Campaign-news] Smoke Test


Hi Kai,

Sure, let's meet next week to firm up our current conversations about
setting up the repository, and test data sets. Marc is not feeling
well this week, and we've been busy battling some problems on another
GPU-related project of ours, so things haven't been moving on that
front.

I can't do Monday the 9th, but the rest of the week looks ok.

Bill

Quoting Kai Kohlhoff <kohlhoff at stanford.edu>:

> Hi Bill and Marc,
>
> I have a small FEATURE data set that we can use for testing, but I will
> also look for something bigger. For k-centers I just completed a
> protein rmsd kernel.
>
> K-centers is currently deterministic for a given seed, but this comes
> at a performance cost. This can be easily overcome, though, because
> clusters are always the same. Only their sequence changes.
>
> I will be back in the office on Wednesday and will get back to you then.
>
> Would it make sense to have another meeting the week after next?
>
> - Kai
>
> Sent from my kaiPhone.  Apologies for brevity or unusual tone.
>
> On Oct 30, 2009, at 14:39, whsu at sfsu.edu wrote:
>
>> I assume Kai probably has a small test data set that he uses? k-centers 
>> can probably be deterministic if we fix the initial seed cluster, and 
>> have a tie-breaking mechanism for data points that are equi-distant from 
>> two clusters (there might be something in Kai's code already). I need 
>> to spend a bit more time with the k-means code...
>>
>> Bill
>>
>> Quoting Marc Sosnick <marcsosnick at mac.com>:
>>
>>> Kai, Bill:
>>>
>>> Before I post to the software repository, I want to have the initial
>>> reorganization of the system completed.  To do this I will be rewriting
>>> all the scripts in PERL and normalizing the deployment process.
>>>
>>> To be in compliance with standard SE practices, we should have some
>>> sort of smoke test against which we can test so that we don't post
>>> non-working code to the library.  I was wondering if either of you had
>>> a good idea as to what that smoke test could entail?  Perhaps a known
>>> dataset and algorithm against which we could test?  Of course, it would
>>> mean that whatever we choose, the result must be deterministic.
>>>
>>> I look forward to your ideas...
>>>
>>> Thanks!
>>>
>>> Marc
>>> _______________________________________________
>>> Campaign-news mailing list
>>> Campaign-news at simtk.org
>>> https://simtk.org/mailman/listinfo/campaign-news





