essay on programming languages, computer science, information techonlogies and all.

Wednesday, June 26, 2019

Integer factorization and combination in C#

 
     F
R = --- 
     D 
When transmitting data in a certain rate ( R ), clock rate ( F ) has to be divided ( D ). And sometimes, it is desirable to have the rate as integer. Usually, F is given from the transmitting device. And we want to know every possible R.

This asks R to be integer and asks D to divide F without fraction. Then F has to be integer multiple of D. First F has to be factored into prime numbers. Then all the permutation of prime numbers has to be generated. Then each permuation can be D ( or R in here).

Prime numbers of clock can be calculated on the fly but not the point in this article. Assume prime numbers are already known like { 2, 3, 5, 7, 11, ... }. Then factors of R can be enumerated by continuously divide with each primes like below function.

IEnumerable< int> Factors(int n, IEnumerable< int> primes)
{
  foreach (int p in primes) {
    if (p * p > n) break;

    while (n % p == 0) {
      yield return p;
      n /= p;
    }
  }

  if (n > 1) yield return n;
}


Then each factors can be permuted and enumerated so that each dividible product can be calculated.

IEnumerable< int[]> Combination(Tuple< int, int>[] groups )
{
  var product = new int[groups.Count()];
  foreach (var p in RecursivelyCombine(product, groups, 0)) {
    yield return p;
  }
}

IEnumerable< int[]> RecursivelyCombine(int[] products, Tuple< int, int>[] factors, int index)
{
  var factor = factors[index];
  for (int i = 0; i <= factor.Item2; ++i) {
    products[index] = i;
    if (index == factors.Length - 1) {
      yield return products;
    }
    else {
      foreach (var p in EnumerateProduct(products, factors, index + 1)) {
        yield return p;
      }
    }
  }
  yield break;
}


Then here is function that enumerate all possible rates with a given clock F.

IEnumerable< int> Rates(int F) // F = 50
{
  auto factors = Factors( 500, new int [] {  2, 3, 5, 7 } );
  // factors = { 2, 2, 5, 5, 5 }
    
  var groups = factors.GroupBy(f => f).Select(g => new Tuple(g.Key, g.Count())).ToArray();
  // groups = { {2,2}, {5,3} }  <- 2^2 x 5^3

  foreach (int[] pm in Combination(groups)) { // {0,0}, {0,1}, {0,2}, {0,3}, {1,0}, .... , {2,3}

    // pm can be { 1, 2 } which means D = 2^1 x 5^2
    int D = 1;
    for (int i = 0; i < groups.Length; ++i) {
      D *= (int)Math.Pow(groups[i].Item1, pm[i]);
    }

    yield return D;
  }
}

Monday, June 17, 2019

Criticism vs Comparison

"Criticism is constructive, comparison is abuse" from The Rehearsal by Eleanor Catton.


When you have no other measure on the field, you just need to look into the object directly. But if there is any, you then stop looking at it and start to looking for difference only.

I happend to work on a project which rewrites legacy C++ code to C# component based code. And got a lot of feedback of comparison with old software. It goes like this.

'Hey, this new one does not generate output same as old one.'
'But does the new output cause an issue ?'
'No, but it is not what user have seen.'
'What should be seen ?'
'Don't ask me. Just make it same as before'.


Legacy code has its own reason of being put in such a way. And when it unearthed, it is not easy to see why it was shaped in such a way.

Not to burden the follower, when writing code, code should be clean and concise. And should show intention and possibly reasoning behind the code.

Though no matter how crystal-clear the code, better way is to leave a document. And when referred after 10 years, the document should tell what the software should do not what it does.

Monday, June 3, 2019

Bayesian Inference on program crash

One company consulting have been in trouble with abnormal program crash. The log says that it crashed just before accessing I/O board. And some SW engineers believed that the culprit is the I/O board and asked for replacement.

But it turned out that there are other steps before I/O which don't leave any log and caused the crash but left no trace. But when people doesn't see the IO log , they conclude that IO crashed the program so no log left on it. It is a classic case of 'Correleation is not the causation'

Thought about how to find out falut in systematic way and read about Bayesian inference and tried in here. Refer Beysian Inference at wiki

P( Bad IO | Crash ) = P( Crash | Bad IO ) * P( Bad IO ) / P( Crash )
P( Bad IO | Crash ) : The probability of Bad IO when Crash happens
P( Crash | Bad IO ) : The probability of Crash when IO goes nut
P( Bad IO ) : The probability of IO goes nut
P( Crash ) : The probability of crash


Say, there are 100 running of the program, 1 crash happens. And during the 100 running, IO probably accessed 1000 times and goes nut 1 time. If IO goes nut, it will definitely crash. Then,
P( Bad IO | Crash ) = 1 * 0.001 / 0.01 = 0.1
It is telling that the P( Bad IO ) is too small to tell that it is the culprit. One way to increase probability is to change hypothesis like 'Bad IO Writing'. IO writing may happen 500 times during 100 running and can go nut. Then P( Bad IO Writing ) can be 0.002 ( = 1 / 500 ) and the 'Bad IO writing' probability or confidence can be 0.2.

To increase confidence on a hypothesis, the hypothesis has to be very specific or the probability of the hypothesis - P ( Bad IO ) in here - is too small to be meaniningful. The process of refining hypothesis is the trouble shooting, I guess. More specific, more chance of having valid cause.

In the end, it is a keen eye to find out the bug - unsafe reentrant code in multi-threaded environment. This Beysian Inference is not strightforward to quantize - hard to tell what is probabilty of some hypothesis.