GSoC in Shogun – week 3

During this week, I focused on removing the global random variable. Basically speaking, I removed the global random(sg_rand), then introduced a variable(CRandom* m_rng) in SGObject and which can be used everywhere.

Benchmark for stander random and CRandom

The original idea of Wiking and me was to replace CRandom with the c++11 random framework. So we did some benchmark experiments about the performance of the standard random generator and CRandom, and the results we got were very interesting: we found, a huge difference between the gcc and clang implementations (left and right, respectively).

we found, as wiking said, there are some crazy difference between the clang and gcc’s implementation(the left one is the result obtained from gcc and the right one from clang).

Screen Shot 2017-06-27 at 10.17.46 PMScreen Shot 2017-06-27 at 10.18.04 PM

From the result you can see gcc has better random implementation than clang, and CRandom is better than the C++11 random generator on clang(\o/)

Remove sg_rand

Even though we know that the C++11 random framework could improve our performance on random, if we want to build it into Shogun, we need to start the refactor with small steps, because random is used so much. In order to avoid broken unit tests, we need to get rid of sg_rand and use a variable in SGObject to instead of it. I haven’t done this job yet, because even though it only needs adding a couple of lines to make things work, there are over 100 files involved into this issue so I must test each thing carefully first. Hopefully I can commit a pr within this week, and the progress about this issue can be found in get_rid_of_sg_rand branch.

Some other issues

In the last week, I did some GMM refactoring by SGVector and linalg lib, and fixed the broken GMM notebook. When working on GMM and Gaussian, I found that their max likelihood computation are actually duplicated. And as Heikos commented on the pr, we are going to further re-design the GMM and CMixtureModel. Another issue wasto add a Gaussian checkerboard fixture and use it to generate BinaryLabel and MultipleClassLabel data for unit tests. I created that pr a couple of weeks ago and I was struggling with the DynamicArray issue then. Most things I do is to copy and paste the data generator function in SVMOcas_unittest and make it global. Some loop like:

for (index_t i = 0, j = 0; i < data.num_cols; ++i)
    if (i % 2 == 0)
        train_idx[j] = i;
        test_idx[j++] = i;

Which is very hard to read and maintain. Actually, I also spend a lot of time to figure out what exactly it is supposed to do. Therefore, I will refactor it ASAP.

PR in progress:

PR been merged: refactor)


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s