Sunday, December 9, 2018

R syntax

Vectors:
1. All elements are of the same type. Elements are atomic values that can't be broken down further; if you mix types, c() coerces them all to the most general type (e.g. c(1, "a") gives "1" "a").
2. Use c() to create a vector.
3. If you try to create a vector of vectors using c(), the vectors are flattened into a single vector: c(c(1, 2), c(3, 4)) gives 1 2 3 4.
4. Arithmetic with a single number applies to every element (v * 2, v + 1), and so do functions such as sin() and log().
5. Aggregating: sum(), prod(), mean(), etc.
6. Operations between two vectors of the same length work element by element.
7. numeric(6) creates a numeric vector of length 6 with every element initialized to 0.
8. Operations on vectors of different lengths use recycling: the shorter vector is repeated until it matches the longer one, e.g. c(1, 2, 3, 4, 5, 6) + c(0, 1) gives (1, 3, 3, 5, 5, 7). R warns if the longer length is not a multiple of the shorter one.
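Here is a minimal sketch of the points in the list. The variable names are illustrative only:

```r
v <- c(1, 2, 3)
mixed    <- c(1, "a", TRUE)          # mixed types are coerced to character: "1" "a" "TRUE"
flat     <- c(c(1, 2), c(3, 4))      # nested c() calls flatten into one vector: 1 2 3 4
scaled   <- v * 2                    # the scalar is applied to every element: 2 4 6
logs     <- log(v)                   # functions like log() also act element-wise
total    <- sum(v)                   # 6
avg      <- mean(v)                  # 2
product  <- prod(v)                  # 6 (note: prod(), not product())
zeros    <- numeric(6)               # 0 0 0 0 0 0
recycled <- c(1, 2, 3, 4, 5, 6) + c(0, 1)   # c(0, 1) is reused three times
print(recycled)
```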

Generating sequences:
1:10  will give 1 to 10
10:1 will give 10 to 1
2*1:5 will give 2,4,6,8,10

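rep() (replicate) repeats values to generate more complicated sequences.
seq() is also useful here: it generates regular sequences with a given step (by) or a given length (length.out).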
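Here is a short sketch of rep() and seq() with their common arguments. The last two lines show why 2*1:5 works: ":" is evaluated before arithmetic operators.

```r
byTimes <- rep(c(1, 2), times = 3)       # whole vector repeated: 1 2 1 2 1 2
byEach  <- rep(c(1, 2), each = 3)        # each element repeated: 1 1 1 2 2 2
steps   <- seq(0, 1, by = 0.25)          # fixed step: 0.00 0.25 0.50 0.75 1.00
evenly  <- seq(10, 50, length.out = 5)   # fixed length: 10 20 30 40 50
evens   <- 2 * 1:5                       # ':' binds tighter than '*': 2 * (1:5)
n <- 5
pitfall <- 1:n - 1                       # same rule: (1:5) - 1 gives 0 1 2 3 4, not 1:4
```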

Conditions on vectors are checked element by element: e.g. numberSeq == 2 outputs a vector of logicals.
Two vectors can be compared the same way.
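Here is a small sketch with made-up values. It also shows two handy follow-ups on a logical vector: sum(), which counts the TRUEs, and which(), which returns their positions.

```r
numberSeq <- c(1, 2, 3, 2, 5)            # hypothetical values
isTwo     <- numberSeq == 2              # FALSE TRUE FALSE TRUE FALSE
countTwos <- sum(isTwo)                  # TRUE counts as 1, so this is 2
positions <- which(isTwo)                # indices of the TRUE elements: 2 4

a <- c(1, 5, 3)
b <- c(2, 5, 1)
greater <- a > b                         # compared pairwise: FALSE FALSE TRUE
```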
----------
Applying nchar() to str_vec,
where str_vec <- c('a', 'bc'),
gives (1, 2): nchar() is also applied element by element.
----------
Generate a complex sequence using recycling:
A1,B2,C3,D4,E1,F2,G3,H4

simpleSequence <- 1:4
stringSequence <- c("A","B","C","D","E","F","G","H")
out <- paste(stringSequence, simpleSequence, sep="")
paste() converts its arguments to strings and concatenates them element by element, recycling the shorter ones.
Notice that paste() has taken 2 sequences of different types, numeric and character.
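Here is a related sketch. paste0() is shorthand for paste(..., sep=""), and the collapse argument joins the resulting vector into a single string.

```r
simpleSequence <- 1:4
stringSequence <- c("A", "B", "C", "D", "E", "F", "G", "H")
out    <- paste0(stringSequence, simpleSequence)  # same result as paste(..., sep="")
joined <- paste(out, collapse = ",")              # one string: "A1,B2,C3,D4,E1,F2,G3,H4"
spaced <- paste("x", 1:3)                         # default sep is " ": "x 1" "x 2" "x 3"
print(out)
```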
------
[] is the indexing operator, used to select elements of a vector.
-----
Indexing in R starts from 1, not 0.
stringSequence[-6] will give you all elements except the 6th.
> mySeq <- 3*1:5

> print(mySeq)
[1]  3  6  9 12 15

> print(mySeq[2:4])
[1]  6  9 12

> print(mySeq[-3])
[1]  3  6 12 15

> print(mySeq[rep(c(1,3), times=5)])
 [1] 3 9 3 9 3 9 3 9 3 9
-------------------------
> print(mySeq[c(-1,-3)])
[1]  6 12 15
-------------------
> print(mySeq[c(TRUE, FALSE)])
[1]  3  9 15
-----------
Let's invert the sign of the elements that are not equal to 9:
> mySeq[mySeq != 9] <- -mySeq[mySeq != 9]

> print(mySeq)
[1]  -3  -6   9 -12 -15
-----------------
When you index with a logical vector that is shorter than the original vector, it is recycled.
-----------
Each element in a vector can be given a name:
> names(mySeq) <- c("A","B","C")

> print(mySeq[c("A","C")])
 A  C
-3  9
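In the example above mySeq has 5 elements but only 3 names were assigned. As a result, R pads the names of elements 4 and 5 with NA. Here is a sketch of that, plus naming elements directly in c():

```r
mySeq <- c(-3, -6, 9, -12, -15)
names(mySeq) <- c("A", "B", "C")   # shorter than mySeq: the last two names become NA
print(names(mySeq))                # "A" "B" "C" NA NA
picked <- mySeq[c("A", "C")]       # select by name: A = -3, C = 9

named <- c(A = 1, B = 2, C = 3)    # names can also be given when the vector is created
valueB <- named[["B"]]             # [[ ]] returns the bare value 2, without its name
```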
---------------
Arrays:
Like vectors, arrays hold elements of the same type.
They also have dimensions.
An array is a vector with an additional attribute, dim (its dimensions).
By assigning dimensions to a vector, you can turn it into an array.
> mySeq <- 3*1:6
> myArray <- mySeq
> dim(myArray) <- c(2,3)
> print(myArray)
     [,1] [,2] [,3]
[1,]    3    9   15
[2,]    6   12   18

The product of all dimensions must equal the number of elements in the array.
When you assign dimensions to a vector, the elements are filled in column by column (column-major order), as the output above shows.
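Here is a sketch of both rules. Elements are read back with [row, column]. A dim whose product doesn't match the vector's length raises an error, which is caught here with tryCatch() so the script keeps running.

```r
mySeq <- 3 * 1:6
myArray <- mySeq
dim(myArray) <- c(2, 3)          # 2 x 3 = 6 elements, filled column by column
cell <- myArray[2, 3]            # row 2, column 3: 18
row1 <- myArray[1, ]             # whole first row: 3 9 15
same <- matrix(mySeq, nrow = 2)  # matrix() builds the same 2 x 3 layout

bad <- 3 * 1:5                   # only 5 elements
err <- tryCatch({
  dim(bad) <- c(2, 3)            # product 6 != length 5, so R raises an error
  NULL
}, error = function(e) conditionMessage(e))
print(err)
```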

The array() function builds an array directly from the data and the dimensions:
anotherArray <- array(1:12, dim=c(3,2,2))
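Here is a sketch of reading that 3-D array back. There is one index per dimension, and leaving an index empty selects everything along that dimension.

```r
anotherArray <- array(1:12, dim = c(3, 2, 2))  # 3 rows, 2 columns, 2 "layers"
dims   <- dim(anotherArray)                     # 3 2 2
last   <- anotherArray[3, 2, 2]                 # 12, the final element
first2 <- anotherArray[1, 1, 2]                 # 7, the first element of layer 2
layer1 <- anotherArray[, , 1]                   # layer 1 as a 3 x 2 matrix holding 1..6
print(layer1)
```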
------------
Solving a set of linear equations using matrices.
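Here is a minimal sketch using base R's solve(A, b), which returns the x satisfying A %*% x = b. The two equations are a made-up example: 2x + y = 5 and x + 3y = 10.

```r
# Hypothetical system:  2x +  y =  5
#                        x + 3y = 10
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)  # coefficient matrix, one equation per row
b <- c(5, 10)                                 # right-hand side
x <- solve(A, b)                              # solution vector: x = 1, y = 3
print(x)
check <- A %*% x                              # multiplying back should reproduce b
```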
--------
Factors:
Factors help answer questions like "What are the top-selling product categories?" or "What are the sales in each city?"
City and Category are categorical variables: they take a limited set of values.
A factor vector is for handling categorical variables.
Create one with the factor() function.
Internally, a factor maps each level (distinct value) to an integer.
It's like an enum.
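Here is a small sketch with hypothetical city data. By default the levels are the sorted distinct values, and each element is stored as the integer code of its level.

```r
city <- factor(c("Pune", "Delhi", "Pune", "Mumbai"))  # hypothetical data
lv    <- levels(city)       # "Delhi" "Mumbai" "Pune" (sorted by default)
codes <- as.integer(city)   # 3 1 3 2: the integer behind each value
n     <- nlevels(city)      # 3 distinct categories
print(city)
```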
-------------
tapply() and table() are for aggregating data: tapply() applies a function such as sum() to each group (like SQL GROUP BY), and table() counts how often each level occurs.
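Here is a sketch of both functions on made-up sales figures. The groups come out in level order (Delhi, Mumbai, Pune).

```r
city  <- factor(c("Pune", "Delhi", "Pune", "Mumbai"))  # hypothetical data
sales <- c(100, 200, 50, 300)
salesByCity <- tapply(sales, city, sum)  # like SELECT SUM(sales) ... GROUP BY city
countByCity <- table(city)               # number of sales per city
print(salesByCity)                       # Delhi 200, Mumbai 300, Pune 150
print(countByCity)                       # Delhi 1, Mumbai 1, Pune 2
```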
----------------
Lists and Data Frames:
---------
A list can hold elements of any type, including vectors and other lists.
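Here is a minimal sketch of a list with hypothetical field names and values. It mixes a string, a number and a character vector.

```r
item <- list(name = "Laptop", price = 55000, tags = c("electronics", "office"))
nm    <- item$name           # "Laptop": $ extracts an element by name
price <- item[["price"]]     # 55000: [[ ]] does the same
nTags <- length(item$tags)   # 2: an element can itself be a vector
str(item)                    # compact overview of the list's structure
```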
---------
A data frame is like a SQL table: rows and named columns, on which you can perform aggregations.
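Here is a sketch of the SQL analogy on a small made-up data frame. aggregate() plays the role of GROUP BY, and logical row indexing plays the role of WHERE.

```r
sales <- data.frame(                      # hypothetical data
  city     = c("Pune", "Delhi", "Pune", "Mumbai"),
  category = c("Books", "Toys", "Toys", "Books"),
  amount   = c(100, 200, 50, 300)
)
byCity   <- aggregate(amount ~ city, data = sales, FUN = sum)  # SUM(amount) GROUP BY city
bigSales <- sales[sales$amount > 100, ]                        # WHERE amount > 100
print(byCity)
```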
-----------
Regression:
Predict the value of one variable using other variables.
Example:
CAPM (Capital Asset Pricing Model): find the beta of Google against the NASDAQ.
Multiple linear regression: one dependent variable predicted from multiple independent (explanatory) variables.
Summary: read up more on linear regression. How do you determine the efficacy/robustness of the model? What are the t-statistic, F-statistic, R-squared, adjusted R-squared, etc.?
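Here is a minimal sketch of the CAPM idea with base R's lm(). The returns below are invented for illustration and are NOT real Google or NASDAQ data. The slope on the market return is the beta estimate. summary() on the fitted model reports the t-statistics, R-squared, adjusted R-squared and F-statistic mentioned above.

```r
# Hypothetical returns, made up for illustration only
market <- c(0.01, -0.02, 0.015, 0.03, -0.01)
stock  <- 0.001 + 1.5 * market + c(0.002, -0.001, 0, -0.002, 0.001)

fit  <- lm(stock ~ market)          # stock = alpha + beta * market + error
beta <- coef(fit)[["market"]]       # estimated beta (about 1.48 for this data)
s    <- summary(fit)                # coefficient t values, R-squared, F-statistic
print(s)
rsq  <- s$r.squared
# Multiple regression just adds predictors on the right: lm(y ~ x1 + x2, data = df)
```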




