What is the best way to learn a skill? An attempt to pick up R
You can’t teach an old dog new tricks.
For the ambitious group of us who still believe that we will study a new language, pick up a new musical instrument, or learn how to surf, there are many reminders that these prospects are quickly receding beyond reach as we age. It seems to be an everyday occurrence that I struggle to recall a detail about a literary plot, a musical lyric, or a movie cast (of course Google, RapGenius, and IMDb are here to save the day). There are proposals for ways to prevent mental decline: crossword puzzles keep the mind sharp, ginkgo biloba helps concentration, and abstinence from alcohol improves memory. A $1.3 billion industry has sprung up for brain training games to help aging Americans fight forgetfulness and promote continued perspicacity.
I remain positive that my mind will continue to develop strongly through middle age, and that an old dog can in fact learn new tricks. For me the question isn’t “can I?”, but rather “how should I?” Over a decade of schooling has taught me that hitting the books to work through problem sets is the proper way to learn a subject. But is it the most effective? I hardly remember anything from my 19th Century Romantic Poets college class, except for the wavy-haired cute girl who sat at the front of the room. On the other hand, the poetic verses of the Notorious B.I.G.’s Juicy are etched into my brain from late nights at bars and lounges.
I’ve recently taken an increased interest in the nerdy field of data analysis. The captivating and popularized stories of smart people like the statisticians in Moneyball and the economists in Freakonomics, who use their numerical skills to draw practical and counter-intuitive insights from large sets of information, have appeal beyond snoozy textbooks. Hollywood has even gotten in on the act. I’m no mathematician, but it sounds fun and useful to be able to look at numbers and make evidence-based statements about traffic, politics, sports, anything really. So I gave myself a side project to learn the basics of R, a software environment and programming language that is a lingua franca for data manipulation. I don’t expect to become fluent in R, and it won’t be listed on my CV, but the exercise is a learning challenge that tests my old dog chops while immersing me in a new area of interest. How should I begin?
First Attempt: Read
I went to my local Barnes & Noble and found the Programming bookshelf sandwiched between the Sales (Little Red Book of Selling, Secrets of Closing the Sale) and the Networks/Databases (SQL Certified Expert Exam Guide, Using Filemaker Bento) sections. I expected to see a pony-tailed, thick-rimmed man in baggy jeans scouring the shelf, but it was just me there. At eye level were books about C#, .NET, Java, Python, and other coding topics. This really was shaping up to be the classic instance of learning a new language. By my feet were a few books about R, including the simply titled Learning R, a 400-page manual that described itself as a step-by-step function guide to data analysis. I appreciated the “step-by-step” part and put the book under my arm as I turned to look for the seated reading area in the store.
The book was chock full of informative and readable instruction. It started off with a general overview of the statistical computing environment and went on to the details about the language including types of data structures like lists and strings, coding techniques such as looping and transforming, and also ways to visualize the data such as plots and graphs. After reading (well OK, skimming) the book I definitely had a better understanding of what R is, but I still couldn’t DO anything. When it comes to languages, books are good for introductions, motivation, and reference, but for the skills to stick I needed a more interactive approach.
Second Attempt: Video Tutorial
“How to program in R” returns about 2,390,000 results on YouTube. That’s even more options than “How to tie a bow tie” (about 165,000 results), a repeated search that I perform every time the situation calls for it. Where do I even begin?
I avoided YouTube altogether and chose a more focused source for online learning: lynda.com. The site provides training videos for many subjects, especially those related to technology and business. It also helped that access was free through the Public Library. A quick search helped me find exactly what I was looking for: R Training Tutorials.
The tutorials were much more hands-on than the book and I switched between watching the video and doing the routines on my own. Being immersed in the software development environment and actually clicking on menus and typing in operations definitely gave my cognition something to latch onto. The drawback was that the experience quickly became tedious and boring. The monotonous tone of the narrator and the basic tasks reminded me why I often fell asleep during college classes. The method was more practical, but the motivation was lacking.
Third Attempt: The Competition
Kaggle adds an interesting spin to the learning process. The website is the self-proclaimed “Home of Data Science” and acts as a marketplace between companies with data-related problems and a community of data scientists (smart people from various backgrounds) who get paid to solve such problems. The site hosts competitions that any data-minded dork with spare time can enter. Sounds a bit heavy, huh? Well, they also have a few example competitions that help newbies pick up the concepts of data science and the field’s tools, including R.
Its most popular sample competition is Titanic: Machine Learning from Disaster. The premise is actually really neat: the goal is to predict who aboard the Titanic (yes, the same ship that took Jack’s/Leo’s life in the late nineties) survived. The competition provides some basic information about the passengers – name, gender, age, ticket class, survival status, etc. – but only tells you the survival status of half of them. Using analysis and tools such as R, you have to provide a response with the list of remaining passengers and a prediction of whether each one survived.
For example, let’s say you want to test part of the hypothesis “women and children survived”. A simple look at the presented data (for the first half of passengers) shows that about 74% of women survived, but only 18% of men did. Therefore a simple response to the competition would be to guess that all of the remaining women survived and the men died. Immediately I had a deluge of ideas and hypotheses I wanted to test. Were richer people with expensive cabins on top of the ship more likely to survive than those in cheaper steerage? If people have the same last name (assumed relatives), did they share the same fate? You can see how adding a mission or a goal fuels the desire to experiment more.
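To give a flavor of what that gender check looks like in practice, here is a rough sketch in R. It is not the official Kaggle guide's code, and the two little tables in it are invented for illustration rather than taken from the real passenger list; the column names (PassengerId, Sex, Survived) are my assumption about how such a file would be laid out.

```r
# Toy stand-in for the half of the passengers whose fate is known.
# These six rows are made up for illustration; they are not real Titanic records.
train <- data.frame(
  Sex      = c("female", "female", "female", "male", "male", "male"),
  Survived = c(1, 1, 0, 0, 0, 1)
)

# Share of survivors within each gender (each row of the table sums to 1).
rates <- prop.table(table(train$Sex, train$Survived), margin = 1)
print(rates)

# Toy stand-in for the passengers whose fate we have to guess (also made up).
test <- data.frame(
  PassengerId = c(101, 102, 103),
  Sex         = c("male", "female", "male")
)

# The gender rule: predict 1 (survived) for women and 0 (died) for men.
test$Survived <- ifelse(test$Sex == "female", 1, 0)
print(test[, c("PassengerId", "Survived")])
```

Swapping the toy tables for the real competition files would be a matter of reading them in with read.csv, and trying a different hypothesis (ticket class, say) mostly means changing the column passed to table and the condition inside ifelse.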
How does all this relate back to R? Well, the tool is well-suited for testing these hypotheses by plugging in data (the passenger information provided) and applying rules to it (a person likely survives if she is female). There is even a step-by-step guide to help get started with the Gender test use case. And I think that is the key to successfully learning a new trick, even in old age: find an articulate source of instruction combined with a motivating, interest-building scenario to practice the skill.