OP Blog 5: Review

In this blog, I will give my full review of and opinion on the DataCamp introductory R course, the edX platform and learning online in general. As far as I’ve followed it, the course does seem to provide the necessary basics. The explanations are clear, and there is a person actually talking in the video instead of just a voice, which makes it more personal. The video also provides clear examples, which the lecturer works through. Next to the video there is a transcript of what is said. Below the video and transcript are options to download a handout with the slides, a text file with the content, and even the video itself. All in all, it is very user-friendly and even anticipates users who are blind or deaf.

The exercises are sometimes challenging, but the hint feature and the introduction that comes with each exercise help a lot. You can also skip exercises and return to them later, which makes it less stressful. However, it was sometimes difficult to be sure whether you had finished the exercises or not, because the exercises link out to the actual DataCamp website. You have to know that you should close that page once the exercises are done and return to the edX platform in order to continue.

Another thing I have to point out is that the exercises try to simulate the R environment, but anyone who has actually downloaded and opened R in RStudio knows that there are four panes to work with instead of the two shown here. Also, when you work from a script, it doesn’t suffice to just hit “Enter” to run your code. While you do get a full summary of how to construct code in R, a module that explains in depth how the R environment itself works might come in handy. Of course, this could already be included in the intermediate course, but I do think it has a place here, as I can imagine people would be eager to use the experience gained in this course to test out “the real deal”, so to speak.

As for my experience with learning online, I do think that for subjects such as these, closely linked to computers and data, it provides a fitting platform. I did have some problems with asking questions, as I am quite shy and wary of talking with people I do not know on the internet. Other sites that provided their own written reviews and feedback on parts of this course have proven helpful in this regard, however. I haven’t tried to follow a course on a more humanities-based subject, but in light of my studies in Digital Humanities it would be an interesting project to compare the two and see if they are both effective on an educational level. I would encourage everyone to at least take a look at the edX platform, as it offers a multitude of interesting courses, and for a small fee also provides you with a valid certificate when you pass a course.

OP Blog 4: Matrices

This part of the course is about matrices in R. The matrix is closely related to the vector. Where a vector is a one-dimensional sequence of data elements, a matrix is a similar collection of data elements, but this time arranged into a fixed number of rows and columns. Because of this, it is called two-dimensional. As with the vector, the matrix can contain only one atomic vector type. To build a matrix, you use the matrix() function. Most importantly, it needs a vector containing the values you want to place in the matrix, and at least one matrix dimension, as shown in the first image.
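To make the matrix() call concrete, here is a minimal sketch. It is not taken from the course material, and the variable names `m` and `m_by_row` are made up for illustration.

```r
# Build a matrix from the values 1 to 6, giving only the number of rows.
# R works out the number of columns and fills the matrix column by column.
m <- matrix(1:6, nrow = 2)
print(m)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE fills the matrix row by row instead.
m_by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
print(m_by_row)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```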

Apart from the `matrix()` function, there’s another way to create matrices that is more intuitive in some cases: pasting vectors together using the `cbind()` and `rbind()` functions. Have a look at the second image to see how this is done. These functions come in handy because they’re often easier to use than the `matrix()` function. They can also be used to paste another row or another column onto an already existing matrix. Next up is naming the matrix. In the case of vectors, the names() function is used, but in the case of matrices, you can assign names to both columns and rows. R provides the rownames() and colnames() functions for this. Their use is pretty straightforward, and can be seen in the third picture below.
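As a small sketch of pasting vectors together and naming a matrix (the vectors `first` and `second` and the matrix `scores` are hypothetical examples, not the ones used in the course):

```r
first <- c(1, 2, 3)
second <- c(4, 5, 6)

# cbind() pastes the vectors together as columns (3 rows, 2 columns).
by_col <- cbind(first, second)

# rbind() pastes the vectors together as rows (2 rows, 3 columns).
by_row <- rbind(first, second)

# rbind() can also add a row to an already existing matrix.
by_row <- rbind(by_row, c(7, 8, 9))

# Name the rows and columns of a matrix.
scores <- matrix(1:4, nrow = 2)
rownames(scores) <- c("row1", "row2")
colnames(scores) <- c("col1", "col2")
print(scores)
#      col1 col2
# row1    1    3
# row2    2    4
```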

As matrices are just an extension of vectors, they can also contain only a single atomic vector type. If you try to store different types in a matrix, coercion automatically takes place, just as with vectors. And again drawing parallels with vectors, subsetting with square brackets is also possible in matrices. The two-dimensional nature of matrices complicates this, in that you have to specify both the row and the column to get one specific element. The first index refers to the row, the second to the column. See the fourth image for an illustration of this.
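A minimal sketch of both points, coercion and selecting a single element (the matrices `mixed` and `m` are made-up examples):

```r
# Mixing numbers, text and logicals: everything is coerced to character.
mixed <- matrix(c(1, "a", TRUE, 4), nrow = 2)
class(mixed[1, 1])  # "character"

# Select one element: the first index is the row, the second the column.
m <- matrix(1:6, nrow = 2)
m[1, 2]  # 3
```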

When you select a single row of a matrix, you get a vector as a result, as a matrix is no longer necessary to store this information. To select an entire row or column, you leave the other index empty. For example, [3, ] returns the whole third row, and [ ,3] returns the whole third column. You can also select several rows and columns at once by putting a vector of indices in each position of the square brackets. (See the fifth image).
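The bracket notation with one index left empty can be sketched as follows. This is my own small example on a hypothetical 3 by 4 matrix `m`, not one from the course.

```r
# A 3 x 4 matrix, filled column by column:
#   1  4  7 10
#   2  5  8 11
#   3  6  9 12
m <- matrix(1:12, nrow = 3)

m[3, ]              # the whole third row:    3 6 9 12
m[, 3]              # the whole third column: 7 8 9
is.matrix(m[3, ])   # FALSE, a single row comes back as a vector

# Rows 1 and 2 of columns 2 and 3, which is again a (2 x 2) matrix.
m[c(1, 2), c(2, 3)]
```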

For arithmetic with matrices, a few functions are useful to learn, for example `colSums()` and `rowSums()`. The `colSums()` function takes the sum of each column and stores the results in a vector. Apart from these matrix-specific math functions, you can also do standard arithmetic with the same operators that apply to vectors. This works because R performs recycling: when the value you subtract has fewer elements than the object it is subtracted from, R repeats that value across the remaining elements.
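A short sketch of the sum functions and of recycling, using a made-up matrix `m` and made-up vectors:

```r
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

colSums(m)  # 3 7 11
rowSums(m)  # 9 12

# Recycling with a single number: 1 is subtracted from every element.
m - 1

# Recycling with a shorter vector: c(1, 2) is repeated as 1, 2, 1, 2.
c(10, 20, 30, 40) - c(1, 2)  # 9 18 29 38
```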

When you combine a matrix with a vector, R notices that their dimensions don’t match. The vector is therefore extended to a matrix of the same size, filled with the vector elements column by column. It is easy to take this for granted, however, so you should always be aware of how it actually happens. Multiplication with `*` is done element-wise and follows the same principle: multiplying one matrix by another of the same size results in another matrix of that size.
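To see the column-by-column filling in action, here is a minimal sketch with a hypothetical matrix `m`:

```r
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# c(10, 20) is recycled down each column: 10 goes to row 1, 20 to row 2.
shifted <- m + c(10, 20)
print(shifted)
#      [,1] [,2] [,3]
# [1,]   11   13   15
# [2,]   22   24   26

# The * operator multiplies element by element.
squared <- m * m
print(squared)
#      [,1] [,2] [,3]
# [1,]    1    9   25
# [2,]    4   16   36
```

Note that `*` is not the matrix product from linear algebra; R has a separate `%*%` operator for that.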

In the end, it is clear that both vectors and matrices are very similar: they simply are data structures that can store elements of the same type. The vector does this in a one-dimensional sequence, while the matrix uses a two-dimensional grid structure. Both of them perform coercion when you want to store elements of different types and both of them perform recycling when necessary. Similarly, vector and matrix arithmetic are straightforward: all calculations are performed element-wise. Take a look at the pictures below to view some examples.

I have fewer problems with the exercises now, as I’ve started to look back at the videos more and more and to find other examples on the internet. Stay tuned for my last blogpost to read my opinion on this R adventure.

OP Blog 3: Vectors

In this module on R, vectors are introduced. A vector is a sequence of elements of the same data type. It is created using the c() function. You can store a vector in a variable, for example: `numbers <- c(12, 13, 14)`. You can even use the names() function to name the elements in your vector (see the first image for examples of this). To keep track of it all, the str() function is useful for viewing the structure and the attributes of each vector and variable. Also interesting is that, when you really think about it, every variable with only one element is also a vector containing one element. Checking the length of a vector is easily done with the length([name of vector]) function. It is important to keep in mind that a vector can only hold elements of the same type, in contrast to lists, which can hold different data types.
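The functions mentioned above can be sketched together as follows. The element names and the variable `single` are made up for illustration.

```r
# Create a vector and store it in a variable.
numbers <- c(12, 13, 14)

# Give each element a name.
names(numbers) <- c("first", "second", "third")

# Inspect the structure and attributes of the vector.
str(numbers)

# Number of elements in the vector.
length(numbers)    # 3

# A variable holding a single value is a vector of length one.
single <- 5
is.vector(single)  # TRUE
length(single)     # 1
```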

The next section in this module talks about making calculations with vectors containing more than one element. Simple calculations such as multiplication, subtraction,… can easily be done by combining the operator with the name of the vector. The operation is then applied to every element in the vector. You can also make calculations using two vectors of the same length, as is shown in the second image. The sum() function calculates the total of all the elements in the vector. You can also compare vectors using operators such as “<”, “>” and “==” (a single “=” is an assignment, not a comparison). The result is a “TRUE” or “FALSE” for each pair of elements.
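A minimal sketch of these calculations, with two hypothetical vectors `earnings` and `expenses` of the same length:

```r
earnings <- c(50, 100, 30)
expenses <- c(30, 40, 80)

# The operation is applied to every element.
earnings * 2          # 100 200 60

# Two vectors of the same length are combined element by element.
earnings - expenses   # 20 60 -50

# Total of all elements.
sum(earnings)         # 180

# Comparisons also work element by element and return logicals.
earnings > expenses   # TRUE TRUE FALSE
earnings == expenses  # FALSE FALSE FALSE
```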

The last section of this module talks about vector subsetting. As the name reveals, it basically comes down to selecting parts of your vector to end up with a new vector, which is a subset of the original vector. To get one of the elements in a vector, you simply use the name of the vector, followed by the number of its position in square brackets. For example: vector[1] gets the first element. If the element has a name, the name is kept. You can also use the name instead of the index to get the element. To select several elements and put them in another vector, you can use a vector of indices. For example: `b <- a[c(1,5)]` takes the first and the fifth element of vector “a” and stores them in vector “b”, while `a[1:5]` would take the first five elements. The same can also be done with named vectors. A negative index leaves an element out: a[-1] returns the vector without its first element.
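Here is a small sketch of the different ways to subset, using a hypothetical named vector `a`:

```r
a <- c(first = 10, second = 20, third = 30, fourth = 40, fifth = 50)

a[1]            # first element, its name is kept
a["second"]     # select by name instead of position

b <- a[c(1, 5)] # the first and the fifth element: 10 and 50
a[1:5]          # the first five elements

a[-1]           # everything except the first element
```

Note that a[-1] does not change `a` itself; it returns a new, shorter vector.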

The exercises in this module were a lot tougher. The learning curve is quite steep sometimes, and as I am not the most logical thinker, I did have some difficulties with the last exercises. I am still enjoying it, but find that some additional hints might not go amiss.

OP Blog 2: The Basics

As always with anything new to learn, it is best to start with the basics. In this module, R is given some background and the basic tools of R are discussed. R was developed by Ross Ihaka and Robert Gentleman at the University of Auckland in the nineties. It is considered an open source implementation of the S language, which John Chambers and colleagues began developing at Bell Laboratories in the seventies. An important property of R is that it is highly extensible. Because of this, and more importantly because R is open source, it was the vehicle that brought the power of S to a larger community.

The advantages of R are as follows. Firstly, it’s open source, so it’s free to use, a bonus in any research project. Next, R’s graphical capabilities make it easy to build publication-quality plots. In contrast to many other statistical software packages, R uses a command line interface, which means that you have to actually code things in your console and in scripts. This seemed like a turn-off to me at first, but the module states that it makes your work reproducible and easy to share with others. Moreover, it seems to be fairly easy to create R packages, extensions of R aimed at solving particular problems.

Next up is an introduction of the basic tools and components R provides. One of the most important components of R, and where most of the action happens, is the R console. It’s a place where all R commands are executed. You simply type something at the prompt in the console, hit Enter, and R interprets and executes your command. Another one of the most important concepts in R is the variable. A variable allows you to store a value or an object in R. You can then later use this variable’s name to easily access the value or the object that is stored within this variable. Using commands such as “r

By assigning variables in the R console, you create an R workspace. You can inspect the objects in the workspace with the ls() function, which shows a list of all the variables you have created in the R session. Printing a non-existing variable naturally results in an error. Variables are also useful for the reproducibility R is lauded for. By simply changing the values of some variables, you can keep using a script again and again to get different results. Sometimes comments are useful to clarify your code. These are always preceded by a “#” sign. To clean up your workspace, the rm() function is used.
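A minimal sketch of working with variables and the workspace. The variable names `height`, `width` and `my_area` are made up for illustration.

```r
# Assign values to variables; this line itself is a comment.
height <- 2
width <- 4

# Use the variables in a calculation and store the result.
my_area <- height * width
my_area            # 8

# List the variables in the workspace.
ls()

# Remove a variable to clean up the workspace.
rm(my_area)
exists("my_area")  # FALSE, printing my_area would now give an error
```

Changing `height` or `width` and running the same lines again gives a different result without touching the rest of the script.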

Next up, the function class() is introduced, which shows what type a variable is (numeric, logical,…). For example, “TRUE” is a logical value. Other types, such as integers and character strings, are also introduced here, together with the concept of “coercion”, in which you convert a value from one type to another. This, however, only works in some cases: a character string such as “4.5” can be converted to a numeric, but a string such as “hello” cannot, and R returns NA with a warning instead.
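The types and the coercion behaviour can be sketched as follows. These are my own small examples, and suppressWarnings() is only used here to keep the output quiet.

```r
class(TRUE)      # "logical"
class(2)         # "numeric"
class(2L)        # "integer"
class("hello")   # "character"

# Coercion that works.
as.numeric("4.5")   # 4.5
as.character(4)     # "4"
as.numeric(TRUE)    # 1

# Coercion that does not work: the result is NA (with a warning).
not_a_number <- suppressWarnings(as.numeric("hello"))
is.na(not_a_number)  # TRUE
```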

Below you can find some images of the code I wrote and added to in the labs/exercise session. For now, I find this a refreshing change of pace compared to classroom learning. It is sometimes a bit difficult to keep up, but you can always go back to the videos, which also come with a handout of the material and even have the instructor’s speech written out next to the video.

OP Blog 1: What is R?

For my course on Online Publishing, I was asked to write a blog on learning a digital skill or tool via online tutorials. I chose to follow an online course on the edX platform that would provide me with some basic knowledge of R/RStudio. This online course is provided by DataCamp, an online data science school, which together with Microsoft has created a great introduction-to-R course.

You might wonder, what is R, and why learn it?
R is the lingua franca of data science, used by millions of data experts around the globe to map marketing trends, model financial data, analyse research data and so much more. It is literally used by professionals in every industry for both small and big data applications. And the best thing is, R is free. R is an open source statistical programming language supported by a great community that develops new tools every day. Check out the website for more information: https://www.rstudio.com/products/rstudio/

The course states that the best way to learn R is by doing. And that really seems to be the focus of this course: using an interactive learning interface, each module starts with a short video lesson and then moves into the interactive coding environment, with fun challenges on how to use R for statistics, visualizations, and much more.

In this blog series, I will be moving through a few of the modules of this course, learning R, providing information that I learn and reviewing the modules on their efficiency as I go. I hope this might prove useful for any aspiring or beginning R-using researchers, or other people that are interested.