bmb splus notes for week 1 Feb, 3 Feb

Splus
-----
Object Oriented Statistical Programming Language
- apply procedures to objects
- flexible/extensible
- interpreted, so can be a bit slow (avoid loops at all costs)

Main Data Object types
----------------------
Data frame
- used to store related data
- matrix-like
- usually rows are observations and columns are measurements of various variables
- can mix different scalar types together, i.e. numeric, string and logical columns may sit side by side (each column holds one type)
- usually created by reading in data with the read.table() command

Matrix
- stores information all of the same type in a 2-dimensional array
- cannot mix types in the same matrix, i.e. must be all numeric or all logical, etc.
- matrix algebra is supported
- created with the matrix() command

Vector
- one-dimensional version of the matrix

List
- collection of items of different types that are stored together because they are somewhat related, e.g. a student record

Scalar
- a plain old number, character string or single logical value

Indexing
--------
We can index information stored in these data structures in a number of different ways. Assume we have the following datafile, which we will read into Splus:

    name age height weight gender
    Fred 22  185    80     M
    Bob  35  172    78     M
    Jane 27  167    63     F
    Mary 19  163    57     F

This can be read in using the following command
    > data <- read.table("data",header=T)
The first parameter is the name of the datafile; the second parameter specifies that the first row of the file contains the column names.

To index a single cell we can use data[i,j] to return the jth element of the ith row. For example
    > data[3,4]
returns
    63
You can also index an entire row with data[i,] to get the ith row, so
    > data[2,]
would return
    Bob 35 172 78 M
or index an entire column: data[,j] would return the jth column of data.
So
    > data[,4]
would return the weight column
    80 78 63 57
We can also refer to columns by name, so data$gender refers to the column
    M M F F

Even more powerful, we may use logical indexing. So for example
    > data[data$age > 25,]
would return
    name age ....
    Bob  35  ....
    Jane 27  ....
or
    > data[data$gender == "M",]
would return
    name .....
    Fred .....
    Bob  .....
Note that the test for equality is == and that strings are case sensitive, so "M" must match the data exactly.

The [i,j] and logical indexing methods above work equally well with matrices (the $ form applies to data frames and lists, not matrices). We can create a matrix with the call
    > matrix(data, nrow, ncol)

Vectors have only 1 dimension, so we may use only one index. One simple way to create a vector is to use the concatenate function c(). For example
    > avec <- c(3.1,8.0,5.8,-4.2)
    > avec[1]
returns 3.1.
The colon (:) is used as the range operator in the format lowerlimit:upperlimit and will in fact return a vector containing the ordered elements (lowerlimit, lowerlimit+1, ..... ,upperlimit).
    > avec[2:3]
would thus return
    8.0 5.8
or you could use
    > avec[c(1,4)]
to get
    3.1 -4.2
Logical indexing works here too. Note that you can also use the range operator (or any other suitable vector) to index both data frames and matrices.

To create a list that is, say, a student record
    > alist <- list(name="john", major="Statistics",age=23,courses=c(200,248))
Indexing works slightly differently from above. Name indexing works as might be expected, so
    > alist$name
would return
    john
or we could use
    > alist[[1]]
to do the same thing.
Note that the last item of our list is a vector.
This introduces slightly more complexity to the situation:
    alist$courses or alist[[4]]
would return
    200 248
To index, say, the first element of the vector we would use
    alist$courses[1] or alist[[4]][1]
to get
    200

Some useful functions for dealing with data objects
---------------------------------------------------
dim()            - get dimensions
names()          - column names
row.names()      - row names
length()         - length of a vector
cbind(), rbind() - join data frames or matrices together by columns or rows
nrow(), ncol()   - number of rows or columns

Operators
---------
+,-,*,/      arithmetic
<-           assignment operator
<,>,<=,>=    logical comparison
==           equal to
!=           not equal
!            logical not
&            logical and
|            logical or

Special data items
------------------
F = 0 (false)
T = 1 (true)
NA missing data marker

Matrix operations
-----------------
t(X)        transpose matrix X
solve(X)    invert a square matrix X
solve(A,B)  solve linear system AX = B for X
%*%         matrix multiplication
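
As a rough sketch of how the matrix operations listed above fit together, the snippet below sets up a small linear system and checks the answer. The values of A and B and the names At, Ainv, check and ident are made up for illustration and are not from the notes. It should run unchanged in Splus or R.

```r
# Illustrative 2x2 system (values are made up for this sketch).
# matrix() fills column by column, so A is
#   2 1
#   1 3
A <- matrix(c(2, 1, 1, 3), 2, 2)
B <- c(1, 2)

At   <- t(A)          # transpose (A is symmetric here, so At equals A)
Ainv <- solve(A)      # inverse of the square matrix A
X    <- solve(A, B)   # solves A %*% X = B, giving X = (0.2, 0.6)

check <- A %*% X      # matrix product; reproduces B as a 2x1 matrix
ident <- Ainv %*% A   # the 2x2 identity, up to rounding error
```

When only the solution of the system is needed, calling solve(A,B) directly is usually preferable to forming solve(A) %*% B, since it avoids computing the full inverse.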