[Campaign-news] Thank You!

Wed Feb 3 22:07:51 PST 2010

Hi Bill,

K-means beat k-centers, meaning I finished the former before the latter. It is now in SVN, and it would be great if you could try the profiler on it (compile the dataIO and timing modules first, since it uses their object files). K-centers will take one more day.

Thanks,
Kai

On Feb 1, 2010, at 7:00 PM, whsu at sfsu.edu wrote:

> Hi Kai (and Marc),
> 
> Sorry about spacing out on getting back to you. I'm busy with a gig on
> the 19th; we should probably meet before the big meeting on the 22nd.
> We've been meeting on Fridays, but I have another meeting on the 12th
> in the afternoon as well. Would, say, Wed the 10th work?
> 
> I'm getting stuck on the profiling; I keep getting a cryptic error message
> with the kcenters code, while the profiler works with some of the other
> examples that I've tried. Perhaps it's because of the input redirection;
> I'll try tweaking the code a bit.
> 
> Bill
> 
> Quoting Kai Kohlhoff <kohlhoff at stanford.edu>:
> 
>> Hi Marc,
>> 
>> Yes, it was a good evening; you guys are a pleasant crowd!
>> 
>> I agree with your next steps, and I will make sure I stick to the format that you have already created. I have been trying to simplify the code that we have and am really eager to put it into the repository. There is still another project I have to work on until tomorrow, but then I'll get to it.
>> 
>> I was thinking that we should pull the distance kernels out of the current clustering code. For proper modularity, they should be called separately in each iteration, and the resulting distance matrix should be passed to the clustering kernel. The I/O could also be put into separate subroutines. Ultimately, it might be useful if a user could simply write C/C++ code with the GPU functionality hidden.
>> 
>> Something like:
>> 
>> 
>> #include "campaign.h"
>> 
>> campaign.checkPlatform();                    // check which GPU, if any, is present
>> data = campaign.readData(file, format);      // read the data
>> data = campaign.preprocess(data, method);    // preprocess the data with the selected method
>> clusters.init(data);                         // extract number of data points and dimensionality; copy data to GPU
>> for (int i = 0; i < N; ++i)                  // N iterations; data stays on the GPU between kernel calls (alternatively, use a convergence criterion)
>> {
>> 	distance = campaign.calcDists(data, clusters, metricType);             // metricType: e.g. "manhattan", "euclidean"
>> 	clusters = campaign.iterate(data, clusters, distance, algorithmType);  // one iteration of algorithmType: e.g. "kcenters", "kmeans", "birch"
>> }
>> campaign.printResults(clusters, format);     // output clustering results
>> 
>> 
>> would be great to have. If you like the idea, maybe we should start thinking about how we could get there. I am not sure this could make it into our '0.5' version that Russ mentioned, but we could talk about this.
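The separation proposed above (distance computation and clustering step as separate calls, with only the distance matrix passed between them) can be illustrated with a small CPU-only C++ sketch. The names calcDists, kmeansIterate, cluster and Metric, and the nested-vector data layout, are hypothetical stand-ins for the campaign API in the pseudocode, not the project's actual interface; k-means is used only as one example of an algorithmType, and no GPU code is involved. The sketch assumes k >= 1 centers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU-only sketch: distance computation and clustering step are
// separate functions, and the distance matrix is all that passes between them.
using Point  = std::vector<double>;
using Matrix = std::vector<std::vector<double>>;  // Matrix[i][c] = dist(point i, center c)

enum class Metric { Euclidean, Manhattan };

// Stand-in for calcDists: fill the n x k distance matrix for the chosen metric.
Matrix calcDists(const std::vector<Point>& data, const std::vector<Point>& centers,
                 Metric metric) {
    Matrix dist(data.size(), std::vector<double>(centers.size(), 0.0));
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (std::size_t c = 0; c < centers.size(); ++c) {
            assert(data[i].size() == centers[c].size());
            double acc = 0.0;
            for (std::size_t j = 0; j < data[i].size(); ++j) {
                double diff = data[i][j] - centers[c][j];
                acc += (metric == Metric::Euclidean) ? diff * diff : std::fabs(diff);
            }
            dist[i][c] = (metric == Metric::Euclidean) ? std::sqrt(acc) : acc;
        }
    }
    return dist;
}

// Stand-in for iterate with algorithmType = "kmeans": assign each point to its
// nearest center using only the precomputed distances, then move each center
// to the mean of its members. Returns true if any assignment changed.
bool kmeansIterate(const std::vector<Point>& data, std::vector<Point>& centers,
                   const Matrix& dist, std::vector<int>& assignment) {
    bool changed = false;
    for (std::size_t i = 0; i < data.size(); ++i) {
        int best = 0;
        for (std::size_t c = 1; c < centers.size(); ++c)
            if (dist[i][c] < dist[i][best]) best = static_cast<int>(c);
        if (assignment[i] != best) { assignment[i] = best; changed = true; }
    }
    std::vector<Point> sums(centers.size(), Point(centers[0].size(), 0.0));
    std::vector<std::size_t> counts(centers.size(), 0);
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (std::size_t j = 0; j < data[i].size(); ++j) sums[assignment[i]][j] += data[i][j];
        ++counts[assignment[i]];
    }
    for (std::size_t c = 0; c < centers.size(); ++c)
        if (counts[c] > 0)  // an empty cluster keeps its old center
            for (std::size_t j = 0; j < centers[c].size(); ++j)
                centers[c][j] = sums[c][j] / counts[c];
    return changed;
}

// Driver loop: at most maxIter iterations, stopping early once no point
// changes cluster (a simple convergence criterion).
std::vector<int> cluster(const std::vector<Point>& data, std::vector<Point>& centers,
                         Metric metric, int maxIter) {
    std::vector<int> assignment(data.size(), -1);
    for (int it = 0; it < maxIter; ++it) {
        Matrix dist = calcDists(data, centers, metric);
        if (!kmeansIterate(data, centers, dist, assignment)) break;
    }
    return assignment;
}
```

Because the clustering step only sees the n x k distance matrix, swapping the metric (or, on the GPU side, the distance kernel) does not touch the clustering code. The cost is materializing that matrix in every iteration, which is the kind of trade-off the profiling work would need to measure.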
>> 
>> It makes sense to have something out ASAP. It will be fun to increase the speed of our clustering code in subsequent iterations, but we should start getting people to use it. I'll try to deposit the modules that I have in the repository at the end of the week.
>> 
>> Bill, I am looking forward to hearing about your profiling work during our next meeting. Your findings will surely help me write more efficient code right from the outset.
>> 
>> When should we have our next meeting? Given that it has been a while since the last one, I suggest we meet no more than three weeks from now. How does February 19 sound to you?
>> 
>> Cheers,
>> Kai
>> 
>> 
>> 
>> 
>> On Jan 26, 2010, at 10:21 AM, Marc Sosnick wrote:
>> 
>>> Kai:
>>> 
>>> It was great seeing you last night. Thanks for helping me round out the presentation at the meeting. Sorry we didn't have more time to talk about our next steps during dinner, but it was quite convivial!
>>> 
>>> As we discussed, I have some ideas about what my next steps should be, and I just want to get your and Bill's agreement before I start. In priority order:
>>> 
>>> 1) Now that we have a smoke test to test against, take the current code and refactor each clustering method into a proper C++ class, with a .cpp and a .cu file. This would also include scrubbing the current code of comments and optimizing the code (not including memory handling) as if we were presenting it to the outside world. This would significantly help us work toward our first release, as per Russ' comments last night.
>>> 2) Take any new clustering algorithms you have and put them into the format we've created so far, as in (1).
>>> 3) Optimize memory handling and data structures. This would be done in tandem with Bill's profiling work.
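Item (1) can be pictured as a common C++ interface that every clustering method implements. The sketch below is host-only and entirely hypothetical: the ClusteringMethod and KCenters class names, the run() signature, and the makeMethod() factory are illustrative assumptions, not code from the repository, and the .cpp/.cu split is only indicated in comments. The k-centers example uses the classic greedy farthest-first heuristic, which may not be the variant actually being implemented.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

using Point = std::vector<double>;

// Hypothetical common interface: one class per clustering method. In the
// proposed layout, the class and its host-side logic would live in a .cpp
// file and the GPU kernels in a matching .cu file; this sketch is host-only.
class ClusteringMethod {
public:
    virtual ~ClusteringMethod() = default;
    virtual std::string name() const = 0;
    // Returns one cluster index per input point.
    virtual std::vector<int> run(const std::vector<Point>& data, int k) = 0;
};

// Example method: greedy farthest-first k-centers, chosen only for illustration.
class KCenters : public ClusteringMethod {
public:
    std::string name() const override { return "kcenters"; }

    std::vector<int> run(const std::vector<Point>& data, int k) override {
        assert(k >= 1 && static_cast<std::size_t>(k) <= data.size());
        centers_.clear();
        centers_.push_back(0);                        // first center: point 0
        std::vector<double> nearest(data.size());     // distance to the closest center so far
        std::vector<int> assignment(data.size(), 0);
        for (std::size_t i = 0; i < data.size(); ++i) nearest[i] = dist(data[i], data[0]);
        while (centers_.size() < static_cast<std::size_t>(k)) {
            std::size_t far = 0;                      // point farthest from all current centers
            for (std::size_t i = 1; i < data.size(); ++i)
                if (nearest[i] > nearest[far]) far = i;
            int c = static_cast<int>(centers_.size());
            centers_.push_back(far);
            for (std::size_t i = 0; i < data.size(); ++i) {
                double d = dist(data[i], data[far]);
                if (d < nearest[i]) { nearest[i] = d; assignment[i] = c; }
            }
        }
        return assignment;
    }

    // Indices (into the input data) of the chosen centers.
    const std::vector<std::size_t>& centers() const { return centers_; }

private:
    std::vector<std::size_t> centers_;

    static double dist(const Point& a, const Point& b) {  // Euclidean distance
        double acc = 0.0;
        for (std::size_t j = 0; j < a.size(); ++j) acc += (a[j] - b[j]) * (a[j] - b[j]);
        return std::sqrt(acc);
    }
};

// Hypothetical factory keyed by an algorithmType-style string.
std::unique_ptr<ClusteringMethod> makeMethod(const std::string& type) {
    if (type == "kcenters") return std::make_unique<KCenters>();
    return nullptr;  // "kmeans", "birch", ... would be registered here
}
```

With this shape, a driver can select a method from a string such as "kcenters", in the spirit of the algorithmType argument in the pseudocode earlier in this thread, while each class keeps its own device code behind the same interface.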
>>> 
>>> Let me know about those algorithms you have. Don't worry about putting anything in the repository; we can always reorganize it as we see fit, so just go ahead. The best way would probably be to create a subdirectory off trunk/dev, put your work in there, and run svn add directory_name from the parent directory of directory_name.
>>> 
>>> Again, many thanks!
>>> 
>>> Marc
>>> _______________________________________________
>>> Campaign-news mailing list
>>> Campaign-news at simtk.org
>>> https://simtk.org/mailman/listinfo/campaign-news
>> 
>> -----------------------------------------------------
>> Kai Kohlhoff, PhD
>> Stanford University
>> School of Medicine, Bioengineering
>> Stanford, CA 94305-5448, USA
>> T: ++1 (650) 724 1575
>> E: kohlhoff at stanford.edu
>> 
>> 
> 
