Introduction and Setup

How to use this course

This course is an adaptation of Doran’s Lab training materials and is designed to provide a concise foundation in some of the most important tools and concepts for data analysis in Python. We think the best way to learn is by applying concepts and working through examples, so each chapter focuses on a few main concepts and skills, instead of trying to cover every command you will ever need. We suggest working through the course like so:

  • Read a chapter for the main ideas, skipping details if you want.
  • Go to the tasks at the end of the chapter, and try to do them.
  • When you get stuck on the tasks, try any one of the following:
    • Go back to the chapter to read in more detail.
    • Compare what you want to do for the task with the sample code for the chapter. Can you copy-paste some of the sample code and use it to help with the task?
    • Look up commands you don’t understand on Google and read their documentation or related questions on Stack Exchange. (Reading this material can be hard at first, but don’t worry: it’s a skill you will get better at.)
  • Move on to the next chapter when you want!

Expected background & supplemental materials

This course is designed for people with some familiarity with basic Python and a thirst to learn more about League of Legends data analysis. If you have experience with the following concepts in Python, you should be good to go:

  • Variables
  • Basic string manipulation
  • Lists, dictionaries
  • Conditional statements (if/else), for loops
  • Functions

Setup

We recommend getting Python from the Anaconda distribution, which includes everything you’ll need to get started. You can download it here.

Make sure to get the version for Python 3 (not Python 2). Installing Anaconda will also install Spyder, the development environment you will use to write and execute code and view the preliminary results of your analyses.

The course comes with data and sample code, which you can download all at once using this link. We include a bit of code to get you started with loading the data into memory. To get started, open Spyder and use it to open the first script in the folder called Example-code. Press F5 or hit the green triangle “play” button to run the script and confirm you are able to load the data.
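If you want to confirm that Spyder is running Python 3 before you open the course scripts, a check like the one below may help. It is a minimal sketch that uses only the standard library and is not part of the course's sample code. You can paste it into the Spyder console or run it as its own script.

```python
import sys

# Major version of the interpreter that is running this code.
# The course assumes Python 3.
major_version = sys.version_info.major

if major_version >= 3:
    print("Python", sys.version.split()[0], "- ready for the course")
else:
    print("This looks like Python 2; install the Python 3 version of Anaconda")
```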

Python libraries

Throughout the course, we will rely on three popular Python libraries (a library is a collection of code you can use):

  • NumPy is an optimized library for working with data stored in multi-dimensional arrays and for applying operations across these arrays. The core object of NumPy is the NumPy array, which can be used to store a vector, a matrix, or a higher-dimensional collection of numbers or other values.
  • pandas builds on NumPy to provide convenient ways of working with tables of data. Its core object is the pandas DataFrame. This course is mostly focused on manipulations of pandas DataFrames.
  • Matplotlib is a data visualization library. In chapter 4, we will cover some of its commonly used plotting commands.
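To make the first two objects concrete, here is a small sketch that builds a vector, a matrix, and a tiny table. The column names and numbers are invented for illustration and do not come from the course data. The import lines at the top are explained in the next paragraph.

```python
import numpy as np
import pandas as pd

# A 1-D array (a vector) and a 2-D array (a matrix).
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(vector.shape)  # (3,)
print(matrix.shape)  # (2, 3)

# A DataFrame: a table with named columns.
# These values are made up for illustration only.
games = pd.DataFrame({
    "champion": ["Ahri", "Garen", "Jinx"],
    "kills": [7, 3, 11],
})
print(games["kills"].mean())  # 7.0
```

The shape of an array tells you how many values it holds along each dimension, while a DataFrame lets you refer to a column by name.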

These libraries are part of the SciPy ecosystem, a collection of open-source software for scientific computing. They come preinstalled with the Anaconda distribution, so you don’t need to worry about downloading and installing them. To use them, you just need to include an import statement at the beginning of your code, like so:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

The as syntax allows you to give an abbreviated name to the library. This allows you to write np.sum(my_data) instead of numpy.sum(my_data), for example.
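As a quick illustration, the sketch below imports the library under both names and shows that the two calls give the same result. The list my_data holds hypothetical values chosen only for this example.

```python
import numpy
import numpy as np

my_data = [4, 8, 15]  # hypothetical values for illustration

# Both names point to the same library, so the two calls are equivalent.
total_short = np.sum(my_data)
total_long = numpy.sum(my_data)

print(total_short, total_long)  # 27 27
print(np is numpy)              # True
```

In practice you would only write the second import line. The abbreviations np, pd, and plt are widely used conventions, so following them makes your code easier to compare with examples you find online.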

Chapter 1: DataFrames