Sunday, December 9, 2018

R syntax

Vectors:
1. All elements are of the same type. Elements must be atomic, i.e. they can't be broken down further.
2. c() creates a vector.
3. If you try to create a vector of vectors using c(), the vectors are flattened into a single vector.
4. It is easy to add or multiply a single value across the entire vector. Functions such as sin() and log() also apply element by element.
5. Aggregating - sum(), prod(), mean().
6. Similarly, operations between vectors of the same length work element by element.
7. numeric(6) creates a vector of length 6 with every element initialised to 0.
8. Operations on vectors of different lengths use recycling: the shorter vector is reused as many times as needed, e.g. c(1, 2, 3, 4, 5, 6) + c(0, 1) gives (1, 3, 3, 5, 5, 7).
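
A minimal sketch of the vector points above. The variable names (nested, v, zeros and so on) are made up for illustration.

```r
# c() flattens nested vectors into a single vector
nested <- c(c(1, 2), c(3, 4))
length(nested)             # 4

# Arithmetic and math functions are applied element by element
v <- c(1, 2, 3, 4, 5, 6)
doubled <- v * 2           # 2 4 6 8 10 12
logs <- log(v)

# Aggregates collapse a vector to a single value
total <- sum(v)            # 21
prod_all <- prod(v)        # 720
avg <- mean(v)             # 3.5

# numeric(6) creates six zeros
zeros <- numeric(6)

# Recycling: the shorter vector c(0, 1) is reused three times
recycled <- v + c(0, 1)    # 1 3 3 5 5 7
```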

Generating sequences:
1:10  will give 1 to 10
10:1 will give 10 to 1
2*1:5 will give 2,4,6,8,10

rep() repeats a vector or its elements, which helps generate more complicated sequences.
seq() is also useful here.
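
A short sketch of rep() and seq(), using made-up values and variable names:

```r
# rep() repeats the whole vector, or each element in turn
whole <- rep(c(1, 2), times = 3)     # 1 2 1 2 1 2
each_elem <- rep(c(1, 2), each = 3)  # 1 1 1 2 2 2

# seq() builds a sequence from a start, an end, and a step or a length
by_step <- seq(from = 0, to = 10, by = 2.5)         # 0.0 2.5 5.0 7.5 10.0
by_length <- seq(from = 1, to = 2, length.out = 5)  # 1.00 1.25 1.50 1.75 2.00
```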

Using a condition on a vector checks each element, e.g. numberSeq == 2 outputs a vector of logicals.
Similarly, two vectors can be compared element by element.
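
A small sketch of element-wise comparison. The values in numberSeq and otherSeq are made up for illustration.

```r
numberSeq <- c(1, 2, 3, 2)
otherSeq  <- c(1, 5, 3, 0)

# Comparing against a single value tests every element
isTwo <- numberSeq == 2           # FALSE TRUE FALSE TRUE

# Comparing two vectors of the same length works pairwise
same   <- numberSeq == otherSeq   # TRUE FALSE TRUE FALSE
bigger <- numberSeq > otherSeq    # FALSE FALSE FALSE TRUE

# TRUE counts as 1, so sum() gives the number of matches
howManyTwos <- sum(isTwo)         # 2
```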
----------
Applying nchar() to str_vec,
where str_vec <- c('a', 'bc'),
will give (1, 2).
----------
Generate a complex sequence using recycling:
A1,B2,C3,D4,E1,F2,G3,H4

simpleSequence <- 1:4
stringSequence <- c("A","B","C","D","E","F","G","H")
out <- paste(stringSequence, simpleSequence, sep="")
paste() combines any number of values into a string.
Notice that paste() has taken two different sequences, one numeric and one character, and recycled the shorter one.
------
[] is the indexing operator; it is used to select elements of a vector.
-----
Indexing in R starts from 1, not 0.
stringSequence[-6] will give you all elements except the 6th.
> source('C:/projects/R/a.r', echo=TRUE)

> mySeq <- 3*1:5

> print(mySeq)
[1]  3  6  9 12 15

> print(mySeq[2:4])
[1]  6  9 12

> print(mySeq[-3])
[1]  3  6 12 15

> print(mySeq[rep(c(1,3), times=5)])
 [1] 3 9 3 9 3 9 3 9 3 9
-------------------------
> print(mySeq[c(-1,-3)])
[1]  6 12 15
-------------------
> print(mySeq[c(TRUE, FALSE)])
[1]  3  9 15
-----------
Let's invert the sign of the elements that are not equal to 9:
> mySeq[mySeq != 9] <- -mySeq[mySeq != 9]

> print(mySeq)
[1]  -3  -6   9 -12 -15
-----------------
When you use a logical vector for indexing and it is not the same length as the original vector, it is recycled.
-----------
Each element in a vector can be given a name (here only the first three elements get one; the remaining names are NA):
> names(mySeq) <- c("A","B","C")

> print(mySeq[c("A","C")])
 A  C
-3  9
---------------
Arrays:
Arrays are like vectors in that all elements have the same type.
They have dimensions.
An array is a vector with an additional attribute, dim.
By assigning dimensions to a vector, you can turn it into an array.
mySeq <- 3*1:6
myArray <- mySeq
dim(myArray) <- c(2,3)
> print(myArray)
     [,1] [,2] [,3]
[1,]    3    9   15
[2,]    6   12   18

The product of all dimensions must equal the number of elements in the array.
When you assign dimensions to a vector, the elements fill the array column by column, as in the output above.

The array() function creates an array directly:
anotherArray <- array(c(1:12), dim=c(3,2,2))
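
A short sketch of how the elements of anotherArray end up arranged. The name first_slice is made up for illustration.

```r
anotherArray <- array(1:12, dim = c(3, 2, 2))

dims <- dim(anotherArray)            # 3 2 2

# Elements fill the first dimension fastest, then the second, then the third
a <- anotherArray[2, 1, 1]           # 2
b <- anotherArray[1, 2, 1]           # 4
c_last <- anotherArray[3, 2, 2]      # 12

# Leaving an index empty selects everything along that dimension
first_slice <- anotherArray[, , 1]   # 3x2 matrix with columns 1:3 and 4:6
```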
------------
Solving a set of linear equations using matrices.
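
The notes don't say which equations were solved, so this is only a sketch with a made-up system (2x + y = 5 and x + 3y = 10), using base R's solve():

```r
# Coefficient matrix, one row per equation (byrow = TRUE keeps it readable)
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)

# Right-hand side of the equations
b <- c(5, 10)

# solve(A, b) returns the vector that satisfies A %*% solution = b
solution <- solve(A, b)    # x = 1, y = 3

# Check: multiplying back should reproduce b
check <- A %*% solution
```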
--------
Factors:
Factors help answer questions like: what are the top-selling product categories? What are the sales in each city?
City and Category are categorical variables. They take a limited set of values.
A factor vector is for handling categorical variables.
The factor() function creates one.
Internally, a factor maps each level (value) to an integer.
It's like an ENUM.
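
A minimal sketch of a factor. The city names are made up for illustration.

```r
cities <- c("Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune")
cityFactor <- factor(cities)

# Levels are the distinct values, sorted
lv <- levels(cityFactor)         # "Delhi" "Mumbai" "Pune"
n  <- nlevels(cityFactor)        # 3

# Each element is stored as the integer position of its level
codes <- as.integer(cityFactor)  # 3 1 3 2 1 3
```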
-------------
The tapply() and table() functions aggregate data, e.g. sum and group by.
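
A sketch of both functions on made-up sales data (the city and sales vectors are assumptions, not data from these notes):

```r
city  <- factor(c("Pune", "Delhi", "Pune", "Mumbai", "Delhi"))
sales <- c(100, 250, 50, 300, 150)

# tapply(values, groups, function): like SUM(sales) ... GROUP BY city
salesByCity <- tapply(sales, city, sum)   # Delhi 400, Mumbai 300, Pune 150

# table() counts how often each level occurs: like COUNT(*) ... GROUP BY city
ordersByCity <- table(city)               # Delhi 2, Mumbai 1, Pune 2
```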
----------------
Lists and Data Frames:
---------
A list can have any kind of elements.
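
A small sketch of a list holding mixed types. The person list and its fields are made up for illustration.

```r
person <- list(name = "Asha", scores = c(90, 85), active = TRUE)

# Elements are accessed by name with $ or [[ ]]
who <- person$name                 # "Asha"
second <- person[["scores"]][2]    # 85
n <- length(person)                # 3

# Each element keeps its own type
types <- sapply(person, class)     # "character" "numeric" "logical"
```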
---------
A DataFrame is like a SQL table, with rows and named columns; you can perform aggregations on it.
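
A sketch of a data frame with a filter and a grouped sum. The orders table is made up, and aggregate() is just one of several ways to do this in base R.

```r
orders <- data.frame(
  city   = c("Pune", "Delhi", "Pune", "Mumbai"),
  amount = c(100, 250, 50, 300)
)

rows <- nrow(orders)                   # 4

# Row filter: like WHERE amount > 75
big <- orders[orders$amount > 75, ]

# Grouped sum: like SELECT city, SUM(amount) ... GROUP BY city
totals <- aggregate(amount ~ city, data = orders, FUN = sum)
puneTotal <- totals$amount[totals$city == "Pune"]   # 150
```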
-----------
Regression:
Predict the value of one variable using other variables.
Example:
CAPM - Capital Asset Pricing Model - find the beta of Google against the NASDAQ.
Multiple linear regression - multiple independent (predictor) variables.
Summary: read up more on linear regression. How do you determine the efficacy/robustness of the model? What are the t-statistic, F-statistic, R-squared, adjusted R-squared, etc.?
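
A rough sketch of estimating a beta with lm(). The returns below are invented numbers, not real Google or NASDAQ data, and the variable names (market, stock, fit) are assumptions for illustration.

```r
# Made-up periodic returns for the market index and the stock
market <- c(0.010, -0.020, 0.015, 0.030, -0.010)
noise  <- c(0.001, -0.001, 0.0005, -0.0005, 0)
stock  <- 0.002 + 1.2 * market + noise   # built so the true slope is 1.2

# Simple linear regression: stock return explained by market return
fit <- lm(stock ~ market)

# The slope on market is the beta estimate (close to 1.2 here)
beta <- coef(fit)[["market"]]

# summary() reports t-statistics, the F-statistic, R-squared and adjusted R-squared
stats <- summary(fit)
rsq <- stats$r.squared

# Multiple regression would add more predictors, e.g. lm(y ~ x1 + x2)
```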





Monday, December 3, 2018

git show name and status for a commit

git show --name-status HEAD

R syntax

1. For assignment, the safest operators to use are <- and ->.

Printing:
2. Using a variable name on its own (without assigning) prints it.
3. print() prints a single expression; show() can print plots, graphs, tables, etc.
4. cat() can print multiple results.
5. paste() returns the resulting string, so it can be stored, unlike cat(), which prints it.
6. The result of paste() can be printed with message().
7. message() can also be used like print(), but it adds a newline.
8. message() doesn't print the index prefix (e.g. [1]) that print() shows.
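
A quick sketch comparing the printing functions. The variable names x and label are made up for illustration.

```r
x <- 42

print(x)                    # [1] 42
cat("x is", x, "\n")        # x is 42 (no index prefix; newline must be added by hand)

# paste() returns the string instead of printing it
label <- paste("x is", x)   # "x is 42"

# message() writes the string to stderr and appends a newline
message(label)
```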

Data types:
1. Numeric - all types of numbers
2. class() function tells the datatype
3. is.numeric(), is.integer() return TRUE/FALSE. The general pattern is is.<datatypename>().
4. Append L to a number to make it an integer, e.g. 4L.
5. Typecasting: val <- as.integer(3+5)
6. An integer is also a numeric. So numeric is like a superclass of integer.
7. Double is the default storage type for numbers; as.double() and as.numeric() are synonyms.
8. Character is the datatype for strings. nchar() is like strlen().
9. For dates: Date
10. For timestamps: POSIXct
11. Logical - TRUE/FALSE
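
A short sketch checking the data types listed above. The variable names are made up for illustration.

```r
numClass <- class(4)           # "numeric"
intClass <- class(4L)          # "integer"
numType  <- typeof(4)          # "double"

intIsNumeric <- is.numeric(4L) # TRUE: an integer is also numeric
dblIsInteger <- is.integer(4)  # FALSE: 4 without L is a double

val <- as.integer(3 + 5)       # 8L

len <- nchar("hello")          # 5

dateClass <- class(Sys.Date()) # "Date"
timeClass <- class(Sys.time()) # "POSIXct" "POSIXt"
lglClass  <- class(TRUE)       # "logical"
```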

Data structures:
1. Vector (default) / Array / DataFrame / Matrix / List
2. A list can have different kinds of elements, unlike a vector.
3. A vector can have only simple datatypes.
4. Array - only elements of the same type. Arrays have dimensions.
5. Matrix - a 2D array, with functions of its own for math operations.
6. DataFrame - like a SQL table.
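
A minimal sketch of a few matrix-specific operations, using a made-up 2x2 matrix m:

```r
m <- matrix(1:4, nrow = 2)   # filled by column: rows are (1, 3) and (2, 4)

isMat <- is.matrix(m)        # TRUE
tm <- t(m)                   # transpose: rows are (1, 2) and (3, 4)

# %*% is matrix multiplication; * is element-wise multiplication
matProd <- m %*% m           # rows are (7, 15) and (10, 22)
elemProd <- m * m            # rows are (1, 9) and (4, 16)
```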