Friday, March 29, 2019

outlook calendar events sorted by time

The trick is how the property is referenced: $orderby needs the
{parent}/{child} format, i.e. start/dateTime rather than DateTime.
These will work:

https://graph.microsoft.com/v1.0/me/events?$orderby=start/dateTime
https://graph.microsoft.com/v1.0/me/events?$orderby=start/dateTime desc

https://stackoverflow.com/questions/47331090/sort-events-by-start-date
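
If you're calling this from Python, a minimal sketch with the requests
library might look like this (ACCESS_TOKEN is a placeholder for a token
from whatever OAuth flow you use):

import requests

ACCESS_TOKEN = "..."  # placeholder: acquire via your OAuth flow

resp = requests.get(
    "https://graph.microsoft.com/v1.0/me/events",
    headers={"Authorization": "Bearer " + ACCESS_TOKEN},
    params={"$orderby": "start/dateTime desc"},  # requests URL-encodes this
)
resp.raise_for_status()
for event in resp.json()["value"]:
    print(event["start"]["dateTime"], event["subject"])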

amazon linux (centos) clean up old kernels in /usr/src

package-cleanup --oldkernels --count=1

https://unix.stackexchange.com/questions/233597/how-do-i-safely-delete-old-kernel-versions-in-centos-7
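
For context, package-cleanup ships with yum-utils, so the full sequence
on a fresh box is roughly this (a sketch; list the installed kernels
first so you can see what would be removed):

sudo yum install -y yum-utils                  # provides package-cleanup
rpm -q kernel                                  # list installed kernels
sudo package-cleanup --oldkernels --count=1    # keep only the newest one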

Thursday, March 28, 2019

Metrics DS

RMSE - Root mean squared error
MSE - Mean squared error

MSE and RMSE have the same minimizers, but their gradients differ by a
scaling factor, so gradient-based methods can behave differently
(e.g. they may need different learning rates).
MAE - Mean absolute error

The constant prediction that minimizes MAE is the median of the
targets - hence MAE handles outliers better.
The constant that minimizes MSE is the mean.

MAPE and MSPE are weighted versions of the above (P is for
percentage). They penalize errors relative to the target's absolute
value. E.g. MSE and MAE treat an error of 1 identically for
predictions 9/10 and 999/1000, but MAPE and MSPE don't: they penalize
9/10 more, since its relative error is higher.

Another one is RMSLE - RMSE computed in log space. It also penalizes relative errors, but unlike MAPE and MSPE it is not biased towards smaller targets.
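
A quick numpy sketch of the 9/10 vs 999/1000 example above, using the
standard formulas for each metric:

import numpy as np

y = np.array([10.0, 1000.0])   # targets
p = np.array([9.0, 999.0])     # predictions: both off by exactly 1

mse  = np.mean((y - p) ** 2)          # 1.0
rmse = np.sqrt(mse)                   # 1.0
mae  = np.mean(np.abs(y - p))         # 1.0 - absolute errors weighted equally

mape = np.mean(np.abs((y - p) / y))   # 0.0505 - dominated by the 9/10 row
mspe = np.mean(((y - p) / y) ** 2)    # ~0.005 - same story, squared

# RMSLE is just RMSE in log space; log1p also handles zero targets
rmsle = np.sqrt(np.mean((np.log1p(y) - np.log1p(p)) ** 2))

print(mse, rmse, mae, mape, mspe, rmsle)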

Sunday, March 24, 2019

cross validation strategies

leave one out cross validation - used when very few samples are
available (equivalent to k fold with k = number of samples).
k fold - the regular train/validate split (also called hold-out)
repeated k times, so each sample appears in validation exactly once.

stratified cross validation

Advantages:
While validating, split the data in such a way that all classes are
represented in both train/validation sets.

Good for small and unbalanced datasets.

Disadvantages:
1. Even unbiased or balanced algorithms cannot learn, or be tested on,
a class that isn't represented at all in a fold; and a fold containing
only a single example of a class doesn't allow generalization to be
learned or evaluated either.

2. Supervised stratification also compromises the technical purity of
the evaluation: the labels of the test data shouldn't affect training,
yet stratification uses them when selecting the training instances.

tl;dr
Stratification is recommended when very few samples are available. For
large datasets the law of large numbers kicks in, i.e. the
train/validation splits are themselves huge and representative of the
actual data distribution.
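
A minimal scikit-learn sketch showing the difference on a tiny
unbalanced dataset (the 8/2 class split is made up for illustration):

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(10).reshape(-1, 1)
y = np.array([0] * 8 + [1] * 2)   # unbalanced: 8 negatives, 2 positives

for name, cv in [("kfold", KFold(n_splits=2, shuffle=True, random_state=0)),
                 ("stratified", StratifiedKFold(n_splits=2, shuffle=True,
                                                random_state=0))]:
    for tr, va in cv.split(X, y):
        print(name, "validation class counts:", np.bincount(y[va], minlength=2))

# plain k-fold can put both positives into one fold (leaving the other
# with none); stratified k-fold keeps one positive in each fold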

Sunday, March 3, 2019

Cartesian product code in python and php

---php----
$a[0] = ['please', 'could you', ''];
$a[1] = ['create', 'make', ''];
$a[2] = ['a', 'an', ''];
$a[3] = ['quick', 'short', ''];

$combined = array();
for ($i = 0; $i < count($a); ++$i) {
    $combined = combine($combined, $a[$i]);
}

print_r($combined);

// pairwise cartesian concatenation; an empty array acts as identity
function combine($a, $b) {
    if (!$a) return $b;
    if (!$b) return $a;
    $out = array();
    for ($i = 0; $i < count($a); ++$i) {
        for ($j = 0; $j < count($b); ++$j) {
            $out[] = trim($a[$i] . ' ' . $b[$j]);
        }
    }
    return $out;
}

---python 1---
# same idea in Python: fold combine() across the word lists
def combine(a, b):
    if not a:
        return b
    if not b:
        return a
    out = []
    for ai in a:
        for bi in b:
            out.append((ai + ' ' + bi).strip())
    return out

arr = []
arr.append(['please', 'cortana please', ''])
arr.append(['create', 'make', ''])
arr.append(['a', 'an', ''])
arr.append(['quick', 'short', ''])
arr.append(['note'])

combined = []
for ci in arr:
    combined = combine(combined, ci)

print(len(combined))
for e in combined:
    print(e)
--------------------------

---python 2---
# recursive version: depth i picks one word from all1[i], backtracking
def find(all1, ans, i):
    if i == len(all1):
        print(ans)
        return
    j = 0
    while j < len(all1[i]):
        ans.append(all1[i][j])
        find(all1, ans, i + 1)
        ans.pop()
        j += 1


all1 = []
all1.append(["hi", "hey"])
all1.append(["there", "here"])
all1.append(["cute", "handsome"])

ans = []
find(all1, ans, 0)
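
For reference, Python's standard library already does this:
itertools.product yields the same combinations without the hand-rolled
loops (same word lists as in python 1 above):

from itertools import product

arr = [['please', 'cortana please', ''],
       ['create', 'make', ''],
       ['a', 'an', ''],
       ['quick', 'short', ''],
       ['note']]

# join only the non-empty words, mirroring the trim/strip above
combined = [' '.join(w for w in c if w) for c in product(*arr)]
print(len(combined))   # 3 * 3 * 3 * 3 * 1 = 81
for e in combined:
    print(e)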
