Learn Basic R programming in one day

Satya, R Programming

R is a programming language for statistical computing and business analytics. It ships with functions for statistical operations ranging from the basic (mean, median) to the complex (linear regression), which makes business analytics simpler. R is open source and is becoming popular in data science and artificial intelligence.

R Software Installation

To install R on Ubuntu, run the commands below.

sudo apt-get update
sudo apt-get install r-base

Installing R on Windows

Download the R installer from https://cran.r-project.org/ and follow the installation instructions provided on that site.

To start the R console on Windows, double-click the R icon.

Image: R console

To get help in R, type help() in the console. To get help on a specific R function, pass its name to help() as shown below.

This gets help on the ls() function, which lists all the R objects/variables in the workspace.

help(ls)

 

This gets help on the list.files() function, which lists all the files and directories in the current R working directory.

help(list.files)

To quit the R console, type q() or quit().

Mathematical calculation using R

R is a powerful calculator. You can perform a lot of basic maths calculations with it.

To add two numbers, type them with an addition sign in the console and hit enter to get the result.

> 35 + 34
[1] 69

To perform multiplication:

> 6 * 12
[1] 72

Mathematical operations follow the BEDMAS order (brackets, exponents, division and multiplication, addition and subtraction).

> (23 * 10) - 24/6 + 2
[1] 228

Basic Mathematical operations

Image: Maths operation
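The operator table here was an image and is not reproduced. As a rough sketch, these are the standard arithmetic operators of base R; the numbers 7 and 2 and the variable names are only illustrations.

```r
# Arithmetic operators in base R (values chosen only for illustration)
add_res = 7 + 2     # addition
sub_res = 7 - 2     # subtraction
mul_res = 7 * 2     # multiplication
div_res = 7 / 2     # division
pow_res = 7 ^ 2     # exponent
rem_res = 7 %% 2    # modulus (remainder)
quo_res = 7 %/% 2   # integer division
print(c(add_res, sub_res, mul_res, div_res, pow_res, rem_res, quo_res))
```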


Relational Operations

Image: R programming relational operation
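Relational operators compare two values and return TRUE or FALSE. A minimal sketch with two made-up variables, a and b:

```r
# Relational operators compare values and return a logical result
a = 5
b = 8
print(a < b)    # less than
print(a > b)    # greater than
print(a <= b)   # less than or equal to
print(a >= b)   # greater than or equal to
print(a == b)   # equal to
print(a != b)   # not equal to
```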

 

Logical Operations

Image: R programming logical operation
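Logical operators combine TRUE/FALSE values. The sketch below uses two illustrative variables, p1 and p2, and shows that the operators also work element by element on vectors.

```r
# Logical operators in base R
p1 = TRUE
p2 = FALSE
print(p1 & p2)      # AND
print(p1 | p2)      # OR
print(!p1)          # NOT
print(xor(p1, p2))  # exclusive OR

# On vectors, & and | are applied element by element
both = c(TRUE, FALSE) & c(TRUE, TRUE)
print(both)
```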

 

Variable Declaration

Like any other programming language, R stores data in variables. Let's see how we can do this in R.

You can assign data to a variable using the assignment operator ‘=’, and to retrieve the data just type the variable name in the console and hit enter.

> RateOfInterest=8.5
> RateOfInterest
[1] 8.5

> Deposit=50000
> Deposit
[1] 50000


> TotalAmount=Deposit + RateOfInterest/100 * Deposit
> TotalAmount
[1] 54250
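R code is more commonly written with the arrow operator '<-' for assignment; at the top level it behaves the same as '='. Here is a small sketch that repeats the interest calculation with '<-' (the lower-case variable names are only illustrative).

```r
# The same calculation written with the arrow assignment operator
deposit <- 50000
rate_of_interest <- 8.5
total_amount <- deposit + rate_of_interest / 100 * deposit
print(total_amount)
```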

R Inbuilt Functions

R has many inbuilt functions which are very useful for analysis. Let's see a few of them.


> sqrt(49)
[1] 7

> sum(44, 55)
[1] 99
 
> abs(-37)
[1] 37

> round(3.1415, digits=2)
[1] 3.14

Vectors

A vector is a collection of data elements of the same type, such as strings or numbers. Let's look at an example of a vector of numbers. Here I have used the concatenate function c(). It combines all the temperature readings taken throughout the day and assigns them to a variable.

> TemperatureInDay=c(15,18,21,23,24,27,24,21,17,16)
> TemperatureInDay
 [1] 15 18 21 23 24 27 24 21 17 16

To get all the elements of a vector, just type the vector name in the console and hit enter (as shown above).

 

There are several other ways to retrieve elements from a vector.

Get the third element.

> TemperatureInDay[3]
[1] 21

Retrieve all the elements from the 4th to the 9th.

> TemperatureInDay[4:9]
[1] 23 24 27 24 21 17

Get the 2nd, 5th and 7th elements.

> positions = c(2,5,7)

> TemperatureInDay[positions]
[1] 18 24 24

Retrieve all elements except 7th element.
> TemperatureInDay[-7]
[1] 15 18 21 23 24 27 21 17 16

Alter the vector element at a specific position.
> TemperatureInDay
 [1] 15 18 21 23 24 27 24 21 17 16
> TemperatureInDay[3]=22
> TemperatureInDay
 [1] 15 18 22 23 24 27 24 21 17 16

You can see above that 3rd element changed from 21 to 22.

Get the length of vector.

> length(TemperatureInDay)
[1] 10

A vector of numbers can be created with three different functions: c(), rep() and seq(). c() is the concatenation function, which we have already discussed.

 

Repetition, rep()

Using rep() you can repeat particular numbers. For example, create a vector of 2’s repeated ten times.

> t=rep(2,10) 
> t 
[1] 2 2 2 2 2 2 2 2 2 2

Vector of numbers (3,4) repeated 10 times.

> b=rep(c(3,4),10)  
> b 
[1] 3 4 3 4 3 4 3 4 3 4 3 4 3 4 3 4 3 4 3 4 

Sequence, seq()

Create a sequence of numbers from 1 to 10 in increments of 0.75.

> seq(1,10,by=0.75)
[1] 1.00 1.75 2.50 3.25 4.00 4.75 5.50 6.25 7.00 7.75 8.50 9.25
[13] 10.00
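Both functions accept further arguments. As a brief sketch, the example below uses the each argument of rep() and the length.out argument of seq(), both part of base R; the values are only illustrative.

```r
# Repeat each element three times instead of repeating the whole vector
r = rep(c(3, 4), each = 3)
print(r)

# Five evenly spaced numbers from 0 to 1
s = seq(0, 1, length.out = 5)
print(s)

# The colon operator is a shortcut for a sequence with a step of 1
print(1:5)
```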

 

Multiplying a vector by a scalar

In the following example, each element of the vector is multiplied by 10.

> TemperatureInDay * 10
[1] 150 180 210 230 240 270 240 210 170 160

Addition of scalar with vector

You can see that the scalar value 150 is added to each element of the vector.

> TemperatureInDay + 150

[1] 165 168 171 173 174 177 174 171 167 166

Likewise, we can perform any other operation between a scalar and a vector (division, subtraction, modulus, etc.).
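As a quick sketch of those other operations, using the same ten temperature readings (the scalars 5 and 2 are arbitrary):

```r
TemperatureInDay = c(15, 18, 21, 23, 24, 27, 24, 21, 17, 16)

minus_five = TemperatureInDay - 5    # subtract 5 from every reading
halved     = TemperatureInDay / 2    # divide every reading by 2
remainder  = TemperatureInDay %% 2   # remainder after dividing by 2

print(minus_five)
print(halved)
print(remainder)
```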

Operation between Vectors

We can also perform calculations between vectors. In the example below, the earning.per.month vector is divided by days.in.months, element by element, to get the earning per day.


> earning.per.month
[1] 30000 33450 3333445 902300 44483838 494949 49494949 250000 4230000 123099484 1093838 29393739
> days.in.months
[1] 31 28 31 30 31 30 31 31 30 31 30 31
> earning.per.month/days.in.months
[1] 967.7419 1194.6429 107530.4839 30076.6667 1434962.5161 16498.3000 1596611.2581 8064.5161 141000.0000 3970951.0968 36461.2667 948185.1290

 

String Vector

You can also store a string in a vector. In the example below, "some thing" is stored in a vector. The nchar() function reports ten characters (note: the space is also counted).

> vector_str="some thing"
> vector_str
[1] "some thing"

> nchar(vector_str)
[1] 10

> vector_var=c("one","two","three","four")
> vector_var
[1] "one" "two" "three" "four" 

You can also store logical values in a vector. See the example below.


> is_it_true=c(TRUE,FALSE,TRUE,TRUE)
> is_it_true
[1] TRUE FALSE TRUE TRUE
 

Empty vector

The vector() function creates an "empty" vector. It accepts two arguments: the first defines the type of value it stores and the second defines the length. The vector is not truly empty: each element holds a default value (0 for numeric, an empty string for character, FALSE for logical).

The code below creates the numeric vector vec_num. The second argument is 5, so the length is 5 and it displays five zeros.

> vec_num=vector("numeric",5)
> vec_num
[1] 0 0 0 0 0

The code below creates the character vector vec_str. The second argument is 7, so the length is 7 and it displays seven empty strings.

> vec_str=vector("character",7)
> vec_str
[1] "" "" "" "" "" "" ""

 

The code below creates the logical vector vec_logi. The second argument is 4, so the length is 4 and it displays four FALSE values.

> vec_logi=vector("logical",4)
> vec_logi
[1] FALSE FALSE FALSE FALSE
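Base R also offers the shortcut constructors numeric(), character() and logical(). The sketch below uses them, fills one slot of a pre-sized vector, and shows a vector that really has no elements; the variable names are only illustrative.

```r
nums  = numeric(5)     # same result as vector("numeric", 5)
strs  = character(2)   # two empty strings
flags = logical(3)     # three FALSE values

nums[2] = 42           # fill one slot of the pre-sized vector
print(nums)

truly_empty = numeric(0)   # a vector with no elements at all
print(length(truly_empty))
```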

 

Naming vector

We can also name vector elements using the names() function.

> num
[1] 22 45 55 34

We can name each element of the num vector using names().

> names(num)=c("n1","n2","n3","n4")

Now if you type num, you will see the names (n1 to n4) associated with each number.

> num
n1 n2 n3 n4 
22 45 55 34
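Once a vector has names, its elements can be retrieved by name as well as by position. A short sketch using the same values:

```r
num = c(22, 45, 55, 34)
names(num) = c("n1", "n2", "n3", "n4")

print(num["n3"])             # one element, selected by name
print(num[c("n1", "n4")])    # several elements, selected by name

third = unname(num["n3"])    # drop the name and keep only the value
print(third)
```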

 

Matrices

A matrix is a variable which stores data in rows and columns. It is a two-dimensional data object whose items all have the same datatype. It can have ‘m’ rows and ‘n’ columns.

To create a matrix we use the matrix() function. Its first argument is the data as a vector, and the nrow and ncol arguments define the number of rows and columns. Here the numbers are passed in using the concatenate function, and the matrix is filled column by column.

> matx = matrix(c(33,44,55,66,22,32,543,3,4,5,6,7), nrow=3,ncol=4)
> matx
     [,1]  [,2]  [,3]  [,4]
[1,]  33    66    543     5
[2,]  44    22      3     6
[3,]  55    32      4     7
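To fill the matrix row by row instead, matrix() takes a byrow argument. A minimal sketch with the same twelve numbers (the name matx_byrow is only illustrative):

```r
# Same data, but filled across each row before moving to the next row
matx_byrow = matrix(c(33, 44, 55, 66, 22, 32, 543, 3, 4, 5, 6, 7),
                    nrow = 3, ncol = 4, byrow = TRUE)
print(matx_byrow)
```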

Indexing in Matrices

You can retrieve data from a particular row and column using indexes.

To get the number in the first row and third column, index the matx variable with [1,3]. Here 1 is the first row and 3 is the third column. This returns 543.

> matx[1,3]
[1] 543

Get rows one to three and columns two to four.

> matx[1:3,2:4]
     [,1]  [,2] [,3]
[1,]   66   543   5
[2,]   22     3   6
[3,]   32     4   7

In the preceding example you can see that matx[1:3,2:4] does not display the 1st column.

The next example displays all rows of the third column.

> matx[,3]
[1] 543 3 4

The next one displays all columns of the 3rd row.

> matx[3,]
[1] 55 32 4 7

The dim() function gives the dimensions of a matrix. In other words, it gives the number of rows and the number of columns.

> dim(matx)
[1] 3 4

rbind and cbind

You can combine vectors into a matrix row-wise or column-wise using rbind() or cbind() respectively. Let's see some examples.

> num1=c(1,4,2,3)
> num2=c(55,2,3,1)

Below is an example that combines the num1 and num2 vectors column-wise into a matrix using cbind().

> mat2=cbind(num1,num2)
> mat2
     num1 num2
[1,]    1   55
[2,]    4    2
[3,]    2    3
[4,]    3    1

Below is an example that combines the num1 and num2 vectors row-wise into a matrix using rbind().

> mat3=rbind(num1,num2)
> mat3
      [,1] [,2] [,3] [,4]
num1     1    4    2    3
num2    55    2    3    1

 

apply

You can run a calculation over a matrix row-wise or column-wise using the apply() function, with functions such as sum, mean or max.

> matx
     [,1] [,2] [,3] [,4]
[1,]   33   66   543   5
[2,]   44   22     3   6
[3,]   55   32     4   7

The following code calculates the sum of each row. Here 1 means row-wise calculation.


> apply(matx,1,sum)
[1] 647 75 98

The following code calculates the mean of each column. Here 2 signifies column-wise calculation.

> apply(matx,2,mean)
[1]  44.0000  40.0000 183.3333   6.0000
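For sums and means, base R also provides the dedicated functions rowSums() and colMeans(), which give the same results as the apply() calls above. A short sketch on the same matrix, with a row-wise max added to show apply() with another function:

```r
matx = matrix(c(33, 44, 55, 66, 22, 32, 543, 3, 4, 5, 6, 7), nrow = 3, ncol = 4)

row_totals = rowSums(matx)          # same as apply(matx, 1, sum)
col_means  = colMeans(matx)         # same as apply(matx, 2, mean)
row_max    = apply(matx, 1, max)    # largest value in each row

print(row_totals)
print(col_means)
print(row_max)
```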

You can use a logical operation as a vector index to pull out the elements for which it is TRUE. Let's see an example. I have 10 temperature readings taken in the first 10 hours. To find the hours whose temperature is less than 20, I will use the following code.

Ten temperature readings:


> TemperatureInDay
 [1] 15 18 21 23 24 27 24 21 17 16

The logical operation below shows TRUE for each temperature less than 20.

> TemperatureInDay < 20
 [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE


> first.ten.hours=c(1,2,3,4,5,6,7,8,9,10)
> first.ten.hours[TemperatureInDay < 20]
[1]  1  2  9 10
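The same positions can be obtained directly with which(), and the logical vector can also be used to pull out or count the readings themselves. A small sketch (the name cool_hours is only illustrative):

```r
TemperatureInDay = c(15, 18, 21, 23, 24, 27, 24, 21, 17, 16)

cool_hours = which(TemperatureInDay < 20)             # positions where the test is TRUE
cool_temps = TemperatureInDay[TemperatureInDay < 20]  # the readings themselves
cool_count = sum(TemperatureInDay < 20)               # TRUE counts as 1 when summed

print(cool_hours)
print(cool_temps)
print(cool_count)
```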

Work space management

R has several commands for workspace management. Let's go through them.

List all the variables, vectors, matrices, etc. in the workspace.

> ls()
[1] "first.ten.hours" "is_it_true" "mat2" "mat3" "matx" "num" "num_col" "num1" 
[9] "num2" "TemperatureInDay" "vec_1" "vec_logi" "vec_num" "vec_str" "vector_str" "vector_var" 

 

Remove a variable from the workspace.

> rm(vec_1)

 

Get the current working directory.

> getwd()
[1] "C:/Users/satya/Documents"

 

Set working directory.

> setwd("C:/Users/satya/Documents/MyR-WorkSpace")
> getwd()
[1] "C:/Users/satya/Documents/MyR-WorkSpace"

 

List all the files in the working directory.

> list.files()
[1] "myreport1.txt" "myreport2.txt"

DATA FRAME

A data frame, like a matrix, is a two-dimensional data object that stores data in rows and columns. Unlike a matrix, it can store mixed data: one column can hold numbers and another can hold characters. It is similar to the table in a CSV file. Let's see how to create a data frame from data stored in a CSV file.

> datf=read.csv("sales-profit.csv")

Retrieving data from data frame.

Get first row and second column.

> datf[1,2]
[1] CA-2017-152156

 

Get the second row with all columns.

> datf[2, ]

Get the data from the Sales column.

> datf$Sales

 

Display names/header of a data frame.

> names(datf)
[1] "Row.ID" "Order.ID" "Order.Date" "Ship.Date" "Ship.Mode" "Customer.ID" "Customer.Name" "Segment" "Country" "City" 
[11] "State" "Postal.Code" "Region" "Product.ID" "Category" "Profit" "Product.Name" "Sales" 

Head

Retrieve the top six (default) rows of a data frame.

> head(datf)

Display the top 10 rows of a data frame.

> head(datf,n=10)

You can also use head(datf,10).

Tail

Display the last six (default) rows of a data frame.

> tail(datf)

Display the last four rows of a data frame.

> tail(datf,n=4)
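If you do not have the CSV file at hand, a data frame can also be built directly with data.frame(). The sketch below is only an illustration: the Category labels are made up, and the name datf_small is hypothetical.

```r
# A small data frame with one character column and two numeric columns
datf_small = data.frame(
  Category = c("A", "B", "C"),          # made-up labels
  Sales    = c(261.96, 731.94, 14.62),
  Quantity = c(2, 3, 2)
)

print(datf_small[1, 2])        # first row, second column
print(datf_small$Sales)        # the whole Sales column
print(names(datf_small))       # column names
print(head(datf_small, n = 2)) # first two rows
```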

 

LISTS

In R, a list is a collection of objects. It can hold vectors, matrices, simple variables, etc. Below I have created a list (lst) from a numeric variable, a vector, a data frame and a matrix.

> lst=list(var,vector_var, datf, matx)

The class() function tells you the type of a variable.

>class(lst)
[1] "list"

> class(lst[1])
[1] "list"

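Indexing a list with a single square bracket, as in lst[1], returns a sub-list, which is why its class is "list". To check the type of the element itself, use class() with double square brackets [[]]. See the example below. (In R 4.0 and later, class() of a matrix returns both "matrix" and "array".)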


> class(lst[[1]])
[1] "numeric"
> class(lst[[2]])
[1] "character"
> class(lst[[3]])
[1] "data.frame"
> class(lst[[4]])
[1] "matrix"
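List elements can also be given names, which allows access with $ as well as with brackets. A minimal sketch; the list lst2 and its element names are made up for illustration.

```r
# A named list holding elements of different types
lst2 = list(rate = 8.5, words = c("one", "two"), flags = c(TRUE, FALSE))

print(class(lst2["rate"]))     # single bracket returns a list
print(class(lst2[["rate"]]))   # double bracket returns the element itself
print(lst2$words)              # $ is a shortcut for [[ ]] with a name
print(names(lst2))
```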

Saving work space variables

We can save workspace variables in a file using the save() and save.image() functions.

You can save all the workspace content in a file using save.image().

> save.image(file="allfile.RData")

To save selected variables, use the save() function. For example, to save only a few of the variables available in the current workspace, I will use save().

> ls()
[1] "datf" "is_it_true" "lst" "matx" "var" "vec_1" "vec_logi" "vec_num" "vec_str" "vector_str" "vector_var"

> few_var=c("matx", "var", "vec_str")

I have saved three variables, matx, var and vec_str, in the selected.RData file.

> save(file="selected.RData", list=few_var)

I can load all the variables from the saved file using the load() function.

> load("selected.RData")
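To see the round trip in one place, here is a sketch that saves two variables, removes them and loads them back. It writes to a temporary file from tempfile() so that nothing in your working directory is touched; the values are only illustrative.

```r
tmp_file = tempfile(fileext = ".RData")   # temporary path, for illustration only

matx = matrix(1:6, nrow = 2)
vec_str = c("a", "b")

save(list = c("matx", "vec_str"), file = tmp_file)
rm(matx, vec_str)            # the variables are gone from the workspace

loaded = load(tmp_file)      # load() returns the names of the restored objects
print(loaded)
print(exists("matx"))
```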

Reading and writing file

Reading text files.

>data=read.table(file="data.txt", header=TRUE)
> data
V1 V2 V3 V4
1 33 66 543 5
2 44 22 3 6
3 55 32 4 7

Writing data to a text file.

> write.table(matx,"matrix.txt")
> write.table(matx,"matrix2.txt", row.names=FALSE)

Reading & writing CSV file

We have already seen how to load a CSV file using the read.csv() function. Let's see it again.

> sales=read.csv(file="sales-profit.csv", header=TRUE)

We can write the data in a variable to a CSV file using the write.csv() function.

> write.csv(matx,"matrix2.csv")

> write.csv(datf,"matrix2.csv", row.names=FALSE)
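As a sketch of a full write-then-read cycle, the example below writes a small matrix to a temporary CSV file and reads it back. The file path comes from tempfile() and the matrix values are only illustrative.

```r
csv_file = tempfile(fileext = ".csv")   # temporary path, for illustration only

small_matx = matrix(c(33, 44, 55, 66, 22, 32), nrow = 3, ncol = 2)
write.csv(small_matx, csv_file, row.names = FALSE)

back = read.csv(csv_file, header = TRUE)   # comes back as a data frame
print(back)
print(class(back))
```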

R programming Loops

While Loop

The while loop is very similar to that of most programming languages. Execution iterates through the code statements until the while condition becomes false. You can type the while loop into the R console and execute it as written below.


> x=10
> while(x>1){
+    x=x-1
+    print(x)
+ }

After typing it into the console, just hit enter to get the output.

[1] 9
[1] 8
[1] 7
[1] 6
[1] 5
[1] 4
[1] 3
[1] 2
[1] 1

You can store R code in a script file and give it a suitable name with the extension ‘.r’ or ‘.R’, for example “myScript.R”.

On Unix, you can run the R script as shown below.

$ Rscript  myScript.R

On Windows, right-click the script file, choose to open it with another program, browse to the directory where R is installed, and select bin\Rscript.exe to run it. Alternatively, open cmd and run the script as shown below.

C:\Program Files\R\R-3.3.1\bin>Rscript.exe  "C:\Users\satya\Documents\MyR-WorkSpace\myScript.R"

 

FOR Loop

> num=seq(1,10)
>
> for( i in num ){
+    print(i)
+ }

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
>
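A for loop is often used to walk over the positions of a vector. The sketch below adds up the ten temperature readings one by one using seq_along(), which generates the positions 1 to length(x); the variable total is only illustrative.

```r
TemperatureInDay = c(15, 18, 21, 23, 24, 27, 24, 21, 17, 16)

total = 0
for (i in seq_along(TemperatureInDay)) {
  total = total + TemperatureInDay[i]   # add the i-th reading
}

print(total)
print(total / length(TemperatureInDay))  # average reading
```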

 

Conditional Statement

Conditional statements are a very important feature of any programming language. In R, they are very similar to other programming languages. If the condition is true, the statements inside the curly braces immediately after the condition are executed; otherwise the else part is executed.

> i=44
> x=34
> if(i > x){
+   print("i is greater than x")
+ }else{
+   print("i is not greater than x")
+ }
[1] "i is greater than x"
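More than two branches can be chained with else if, and ifelse() applies a condition to every element of a vector. A small sketch; the thresholds 18 and 24 and the labels are arbitrary.

```r
temp = 21
if (temp < 18) {
  label = "cool"
} else if (temp < 24) {
  label = "mild"
} else {
  label = "warm"
}
print(label)

# ifelse() tests each element of a vector in one call
labels = ifelse(c(15, 21, 27) < 20, "below 20", "20 or above")
print(labels)
```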

R Function

In R, we use function() to create a function/method. Let's see the example below. Here I have created the sum_of_2 function, which accepts two arguments.


> sum_of_2 = function(x,y){
+   result=x+y
+   return(result)
+ } 

If you call this function (sum_of_2) with two numbers as arguments, it returns the desired result (the sum of the two numbers).

> sum_of_2(10,31)
[1] 41
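Function arguments can have default values, which are used when the caller leaves them out. As a sketch, the hypothetical function total_amount() below wraps the earlier interest calculation and defaults the rate to 8.5.

```r
# deposit is required; rate falls back to 8.5 when not supplied
total_amount = function(deposit, rate = 8.5) {
  result = deposit + rate / 100 * deposit
  return(result)
}

with_default = total_amount(50000)             # uses rate = 8.5
with_custom  = total_amount(50000, rate = 10)  # overrides the default
print(with_default)
print(with_custom)
```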

PLOTTING GRAPH

Now let's see how to plot a simple graph in R. To plot a graph with X and Y axes, I will use the plot() function.

Scatterplot

> x=c(1,2,3,5,6,7)
> y=c(1,3,4,5,7,9)
> plot(x,y)

 

Image: R graph plot

 

Instead of small circles we can use a line to plot the graph. To do that, I pass type="l" as an argument to plot().

> plot(x,y,type="l")

Image: R plot with line


 

Let's see more options for plotting the graph.

> plot(x,y, main="This is my Graph",xlab="x numbers",ylab="y numbers",col="blue",type="l")
> points(6,1,pch=2,col="yellow")
> points(6,5,pch=2,col="yellow")


 

You can save a graph in a PDF file using the pdf() function. See the example below.

> pdf("myGraph.pdf")
> plot(x,y)
> graphics.off()

After plotting the graph you have to close the graphics device using the graphics.off() function. The PDF file will be saved in your current working directory (i.e. getwd()).

Similarly, you can save a plot in an image file using the bmp(), png(), and tiff() functions.
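Note that graphics.off() closes every open graphics device, whereas dev.off() closes only the current one. The sketch below saves a line plot with dev.off(); it writes to a temporary directory via tempdir(), and the file name is only an example.

```r
x = c(1, 2, 3, 5, 6, 7)
y = c(1, 3, 4, 5, 7, 9)

pdf_file = file.path(tempdir(), "myLineGraph.pdf")  # example path in a temp folder

pdf(pdf_file)                        # open a PDF device
plot(x, y, type = "l", main = "This is my Graph")
invisible(dev.off())                 # close only this device so the file is written

print(file.exists(pdf_file))
```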

3D PLOT

By default, 3D plotting packages are not loaded. To load one, use the library() function. If you call a function from a package that is not loaded, you get the error below.

> scatterplot3d()
Error: could not find function "scatterplot3d"

To avoid this, load the scatterplot3d package. If the package is not installed yet, install it first with install.packages("scatterplot3d").

> library(scatterplot3d)

To display a 3D scatterplot, I will create the sales data frame from the CSV file below and then use it with the scatterplot3d() function.

Sales      Quantity   Discount   Profit
261.96     2          0          41.9136
731.94     3          0          219.582
14.62      2          0          6.8714
957.5775   5          0.45       -383.031
22.368     2          0.2        2.5164
48.86      7          0          14.1694
7.28       4          0          1.9656
907.152    6          0.2        90.7152

(Note: only 8 rows are shown above; the actual CSV file has many thousands of rows.)

I have created the sales data frame.

> sales=read.csv(file="sales-profit.csv", header=TRUE)

Now, to create the 3D scatterplot, I will use the statement below.

> scatterplot3d(sales$Quantity,sales$Sales,sales$Profit)

Image: 3D scatter plot

 
