[Campaign-news] Memory limits
whsu at sfsu.edu
Mon Feb 1 19:04:32 PST 2010
I think it's reasonable to just start with approach 1). In general, we might
think about some kind of locality-sensitive approach to clustering. It's
fairly well-studied territory in high-performance computing, and there should
also be prior work specific to data clustering.
Bill
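As a rough illustration of where approach 2) in Kai's message could lead for k-means, the sketch below touches only a fixed-size chunk of points at a time and accumulates per-cluster partial sums, so one centroid update costs one sweep over the chunks regardless of data set size. It is a host-only C++ sketch under stated assumptions: the chunk loop stands in for host-to-device copies, `chunkedCentroidUpdate` is a hypothetical name, and chunked streaming is one standard out-of-core technique, not necessarily the locality-sensitive approach suggested above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch: one k-means centroid update computed chunk by chunk.
// data: n points x dim (row-major); centroids: k x dim (row-major).
// Only points [start, end) of the current chunk are touched at a time; in a
// GPU version, that chunk is what would be copied into global memory.
std::vector<float> chunkedCentroidUpdate(const std::vector<float>& data,
                                         const std::vector<float>& centroids,
                                         std::size_t dim, std::size_t chunkPoints) {
    assert(dim > 0 && centroids.size() >= dim);
    assert(data.size() % dim == 0 && centroids.size() % dim == 0);
    chunkPoints = std::max<std::size_t>(chunkPoints, 1);  // avoid an endless loop
    const std::size_t n = data.size() / dim;
    const std::size_t k = centroids.size() / dim;
    std::vector<double> sums(k * dim, 0.0);  // per-cluster coordinate sums
    std::vector<std::size_t> counts(k, 0);   // per-cluster member counts

    std::size_t start = 0;
    while (start < n) {
        const std::size_t end = (n - start > chunkPoints) ? start + chunkPoints : n;
        for (std::size_t p = start; p < end; ++p) {
            // Nearest centroid by squared Euclidean distance.
            std::size_t best = 0;
            double bestDist = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                double d = 0.0;
                for (std::size_t j = 0; j < dim; ++j) {
                    const double diff = double(data[p * dim + j]) - centroids[c * dim + j];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = c; }
            }
            // Only these running totals need to survive from chunk to chunk.
            for (std::size_t j = 0; j < dim; ++j) sums[best * dim + j] += data[p * dim + j];
            ++counts[best];
        }
        start = end;  // move on to the next chunk
    }

    // New centroid = mean of its members; empty clusters keep their old position.
    std::vector<float> updated(centroids);
    for (std::size_t c = 0; c < k; ++c) {
        if (counts[c] == 0) continue;
        for (std::size_t j = 0; j < dim; ++j)
            updated[c * dim + j] = static_cast<float>(sums[c * dim + j] / counts[c]);
    }
    return updated;
}
```

Because the partial sums are accumulated in the same point order whatever the chunk size, the result does not depend on `chunkPoints`; the chunk size only bounds how much data must be resident at once.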
Quoting Kai Kohlhoff <kohlhoff at stanford.edu>:
> I had another thought. Since memory is limited, there is only so
> much data that we can have available at once. Most clustering
> algorithms (at least the ones we implement) require several passes
> over the data, so once the data no longer fit in device memory, we
> would have to transfer data between host and device memory on every
> pass, which would be inefficient. Do we:
>
> 1) restrict the amount of data that a given algorithm can handle
> based on the size of GPU global memory (and if so, how? abort with
> an error message if the data are too large?)
> or
> 2) go through the pain of finding efficient variants of the
> clustering algorithms that minimize memory transfers
>
> I would think that 1) is the better solution for now, but should we
> later do 2), or leave it to others to contribute their own
> algorithms for larger data sets? Let me know what you think.
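A minimal sketch of approach 1), assuming the only large device buffers are the data matrix (n x dim), the centroids (k x dim), and a point-to-centroid distance matrix (n x k), all stored as 32-bit floats. The function names are hypothetical, and the free-memory figure is taken as a plain parameter so that the check can run on the host.

```cpp
#include <cstddef>
#include <cstdio>
#include <limits>

// Overflow-checked helpers: return false instead of wrapping around.
bool checkedMul(std::size_t a, std::size_t b, std::size_t& out) {
    if (a != 0 && b > std::numeric_limits<std::size_t>::max() / a) return false;
    out = a * b;
    return true;
}

bool checkedAdd(std::size_t a, std::size_t b, std::size_t& out) {
    if (b > std::numeric_limits<std::size_t>::max() - a) return false;
    out = a + b;
    return true;
}

// Hypothetical footprint estimate: data (n x dim) + centroids (k x dim)
// + distance matrix (n x k), all stored as float. Sets requiredBytes and
// returns true if that total fits in freeBytes.
bool fitsInDeviceMemory(std::size_t n, std::size_t dim, std::size_t k,
                        std::size_t freeBytes, std::size_t& requiredBytes) {
    std::size_t dataElems = 0, centroidElems = 0, distElems = 0, partial = 0, elems = 0;
    if (!checkedMul(n, dim, dataElems) || !checkedMul(k, dim, centroidElems) ||
        !checkedMul(n, k, distElems) || !checkedAdd(dataElems, centroidElems, partial) ||
        !checkedAdd(partial, distElems, elems) ||
        !checkedMul(elems, sizeof(float), requiredBytes)) {
        requiredBytes = std::numeric_limits<std::size_t>::max();  // not representable
        return false;
    }
    return requiredBytes <= freeBytes;
}

// Approach 1): refuse data sets that do not fit.
// Returns 0 if the run may proceed, or 1 after printing an error message.
int checkDataSize(std::size_t n, std::size_t dim, std::size_t k, std::size_t freeBytes) {
    std::size_t required = 0;
    if (fitsInDeviceMemory(n, dim, k, freeBytes, required)) return 0;
    std::fprintf(stderr,
                 "error: data set needs %zu bytes of GPU global memory, "
                 "but only %zu bytes are available\n", required, freeBytes);
    return 1;
}
```

In a CUDA build, `freeBytes` could come from `cudaMemGetInfo(&freeBytes, &totalBytes)`, ideally minus some headroom for other allocations; the caller would then abort whenever `checkDataSize` returns nonzero.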
>
> Thanks,
> Kai
>
> On Jan 27, 2010, at 1:47 PM, Kai Kohlhoff wrote:
>
>> Hi Marc,
>>
>> Yes, it was a good evening; you guys are a pleasant crowd!
>>
>> I agree with your next steps, and I will make sure that I stick to
>> the format that you have already created. I have been trying to
>> simplify the code that we have and am really eager to put it into
>> the repository. There is still another project that I have to work
>> on until tomorrow, but then I'll get to it.
>>
>> I was thinking that we should pull the distance kernels out of the
>> current clustering code. For proper modularity, they should be
>> called separately in each iteration, with the resulting distance
>> matrix passed to the clustering kernel. The I/O could also be moved
>> into separate subroutines. Ultimately, it would be useful if a user
>> could simply write C/C++ code with the GPU functionality hidden.
>>
>> Something like:
>>
>>
>> #include "campaign.h"
>>
>> campaign.checkPlatform();                  // check which GPU, if any, is present
>> data = campaign.readData(file, format);    // read data
>> data = campaign.preprocess(data, method);  // preprocess data with a selected method
>> clusters.init(data);                       // extract number of data points and
>>                                            // dimensionality; copy data to GPU
>> // N iterations; the data stay on the GPU between kernel calls.
>> // Alternatively, loop until a convergence criterion is met.
>> for (int i = 0; i < N; ++i)
>> {
>>     // metricType, e.g. "manhattan", "euclidean"
>>     distance = campaign.calcDists(data, clusters, metricType);
>>     // one iteration of algorithmType, e.g. "kcenters", "kmeans", "birch"
>>     clusters = campaign.iterate(data, clusters, distance, algorithmType);
>> }
>> campaign.printResults(clusters, format);   // output clustering results
>>
>>
>> would be great to have. If you like the idea, maybe we should
>> start thinking about how we could get there. I am not sure this
>> could make it into our '0.5' version that Russ mentioned, but we
>> could talk about this.
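A host-only C++ sketch of the modular split proposed above: one function computes a point-to-centroid distance matrix for a chosen metric, and a separate k-means iteration consumes that matrix. The `Metric` enum, function names, and row-major layout are illustrative assumptions, not the `campaign.h` interface; on the GPU, each function would correspond to a separate kernel launch with the data left in device memory. The centroid update is the standard k-means mean, so pairing it with the Manhattan metric is only a heuristic (k-medians would use coordinate-wise medians).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative metric selector, e.g. "euclidean" or "manhattan".
enum class Metric { Euclidean, Manhattan };

// Step 1: distance matrix D (n x k, row-major), D[p * k + c] = dist(point p, centroid c).
std::vector<float> calcDists(const std::vector<float>& data,
                             const std::vector<float>& centroids,
                             std::size_t dim, Metric metric) {
    assert(dim > 0);
    const std::size_t n = data.size() / dim, k = centroids.size() / dim;
    std::vector<float> dist(n * k);
    for (std::size_t p = 0; p < n; ++p) {
        for (std::size_t c = 0; c < k; ++c) {
            float acc = 0.0f;
            for (std::size_t j = 0; j < dim; ++j) {
                const float diff = data[p * dim + j] - centroids[c * dim + j];
                acc += (metric == Metric::Euclidean) ? diff * diff : std::fabs(diff);
            }
            dist[p * k + c] = (metric == Metric::Euclidean) ? std::sqrt(acc) : acc;
        }
    }
    return dist;
}

// Step 2: one k-means iteration that reads only the precomputed matrix for
// the assignment step, then moves each centroid to the mean of its members.
std::vector<float> iterateKMeans(const std::vector<float>& data,
                                 const std::vector<float>& centroids,
                                 const std::vector<float>& dist, std::size_t dim,
                                 std::vector<std::size_t>& assignment) {
    assert(dim > 0 && centroids.size() >= dim);
    const std::size_t n = data.size() / dim, k = centroids.size() / dim;
    assert(dist.size() == n * k);
    assignment.assign(n, 0);
    std::vector<float> sums(k * dim, 0.0f);
    std::vector<std::size_t> counts(k, 0);
    for (std::size_t p = 0; p < n; ++p) {
        std::size_t best = 0;
        for (std::size_t c = 1; c < k; ++c)
            if (dist[p * k + c] < dist[p * k + best]) best = c;
        assignment[p] = best;
        ++counts[best];
        for (std::size_t j = 0; j < dim; ++j) sums[best * dim + j] += data[p * dim + j];
    }
    std::vector<float> updated(centroids);  // empty clusters keep their old centroid
    for (std::size_t c = 0; c < k; ++c)
        if (counts[c] > 0)
            for (std::size_t j = 0; j < dim; ++j)
                updated[c * dim + j] = sums[c * dim + j] / static_cast<float>(counts[c]);
    return updated;
}
```

A driver loop would alternate the two calls, e.g. `dist = calcDists(data, cents, dim, Metric::Euclidean); cents = iterateKMeans(data, cents, dist, dim, labels);`. Swapping in another algorithm (k-centers, say) would replace only step 2, which is the modularity argument made above.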
>>
>> It makes sense to have something out ASAP. It will be fun to
>> speed up our clustering code in subsequent iterations, but we
>> should start getting people to use it. I'll try to check in the
>> modules that I have at the end of the week.
>>
>> Bill, I am looking forward to hearing about your profiling work
>> during our next meeting. Your findings will surely help me write
>> more efficient code right from the outset.
>>
>> When should we have our next meeting? Given that it has been a
>> while since the last one, I suggest holding it no more than three
>> weeks from now. How does February 19 sound to you?
>>
>> Cheers,
>> Kai
>>
>> On Jan 26, 2010, at 10:21 AM, Marc Sosnick wrote:
>>
>>> Kai:
>>>
>>> It was great seeing you last night. Thanks for helping me round
>>> out the presentation at the meeting. Sorry we didn't have more
>>> time to talk about our next steps during dinner, but it was
>>> quite convivial!
>>>
>>> As we discussed, I have some ideas about what my next steps should
>>> be, and I just want to get your and Bill's agreement before I
>>> start. These are in priority order:
>>>
>>> 1) Now that we have a smoke test to test against, take the
>>> current code and refactor each clustering method into a proper C++
>>> class, with a .cpp and a .cu file. This would also include
>>> scrubbing the current code of comments and optimizing it (not
>>> including optimizing memory handling) as if we were presenting it
>>> to the outside world. This would significantly help us work
>>> toward our first release, as per Russ's comments last night.
>>> 2) Take any new clustering algorithms you have and put them into
>>> the format we have created so far, as in (1).
>>> 3) Optimize memory handling and data structures. This would be
>>> done in tandem with Bill's profiling work.
>>>
>>> Let me know about those algorithms you have. Don't hesitate to put
>>> anything in the repository; we can always reorganize it as we see
>>> fit, so just go ahead. The best way would probably be to create a
>>> subdirectory off trunk/dev, put your work in there, and run
>>> svn add directory_name from the parent directory of directory_name.
>>>
>>> Again, many thanks!
>>>
>>> Marc
>>> _______________________________________________
>>> Campaign-news mailing list
>>> Campaign-news at simtk.org
>>> https://simtk.org/mailman/listinfo/campaign-news
>>
>> -----------------------------------------------------
>> Kai Kohlhoff, PhD
>> Stanford University
>> School of Medicine, Bioengineering
>> Stanford, CA 94305-5448, USA
>> T: ++1 (650) 724 1575
>> E: kohlhoff at stanford.edu
>>