Bias vs. Variance Tradeoff, Cross-Validation, and Overfitting in Prediction (Part 2)

This is the second part of the two-part video series that discusses the bias vs. variance tradeoff, overfitting, and basic cross-validation.

The R source code used in this video can be found here.

Note: You will want to play this video at 1080p HD and full screen in order to be able to see what is going on.

Bias vs. Variance Tradeoff, Cross-Validation, and Overfitting in Prediction (Part 1)

This video discusses the bias vs. variance tradeoff, overfitting, and basic cross validation. There are two parts. Part 1 (below) discusses the ideas. Part 2 shows an example using regression trees.
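As a rough illustration of the ideas (not the R code from the video, which uses regression trees), here is a minimal pure-Python sketch of k-fold cross-validation. The data, the two toy models, and all function names are assumptions made up for this example. The very flexible nearest-neighbor model reproduces the training data exactly, so its training error is zero, while its cross-validated error on held-out points is not. That gap is what overfitting looks like.

```python
import random


def k_fold_indices(n, k):
    """Split the indices 0..n-1 into k folds (every k-th index per fold)."""
    return [list(range(i, n, k)) for i in range(k)]


def mean_predictor(train_x, train_y):
    """Inflexible model: always predict the mean of the training y values."""
    m = sum(train_y) / float(len(train_y))
    return lambda x: m


def nearest_neighbor_predictor(train_x, train_y):
    """Very flexible model: copy the y value of the closest training x."""
    def predict(x):
        best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        return train_y[best]
    return predict


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the points (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / float(len(xs))


def cross_validated_mse(fit, xs, ys, k=5):
    """Average held-out MSE over k folds."""
    errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_x = [xs[i] for i in range(len(xs)) if i not in held]
        train_y = [ys[i] for i in range(len(xs)) if i not in held]
        test_x = [xs[i] for i in held_out]
        test_y = [ys[i] for i in held_out]
        model = fit(train_x, train_y)
        errors.append(mse(model, test_x, test_y))
    return sum(errors) / float(len(errors))


# Hypothetical data: a straight line plus random noise.
random.seed(0)
xs = [i / 10.0 for i in range(50)]
ys = [2.0 * x + random.gauss(0, 1) for x in xs]

for name, fit in [("mean", mean_predictor), ("1-NN", nearest_neighbor_predictor)]:
    train_err = mse(fit(xs, ys), xs, ys)
    cv_err = cross_validated_mse(fit, xs, ys)
    print("%s: training MSE = %.3f, cross-validated MSE = %.3f"
          % (name, train_err, cv_err))
```

Comparing the two columns for each model is the point of the exercise: training error alone would always favor the most flexible model.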

A pdf file of the slides used in this video can be obtained here.

How to Update R and Rstudio on Windows

Updated 11/28/2016.

In this post, I provide instructions for updating R and Rstudio on Windows.

Specifically, these instructions are intended to apply to updating R and Rstudio running on a Windows 2012 server instance at Amazon Web Services. I am assuming that the set up and provisioning of this instance are as described in my series of videos AWS Windows Instance Set Up. Such an instance is running Windows 8.

These instructions, however, should work for other Windows set-ups as well. For example, they work on my Windows notebook computer running Windows 7 as well.

The sources of information I used to figure this out are as follows:

Here are the step-by-step instructions:

  1. Shut down R and Rstudio.
  2. It is best to turn off your antivirus software. The update will probably work if you do not, but you may have to re-do things with your antivirus software disabled if it quarantines any files involved in the installation.
  3. Next run the R GUI. Do not run Rstudio. You will probably find a “shortcut” to the R GUI on your desktop. If so, you can click on that. If you do not see a link on your desktop, you will need to find the program.
  4. If you are running the R GUI, it will say RGui in the upper-left corner of the title bar of the outer window:
  • In Windows 8, you can find the program by pressing the Windows key (so the Start Page appears) and then typing “R” (just the R, not the quotes). There should be links to the R GUI among the search results. If you do not see it there, then you can right click on the Start Page and then click All Apps in the bar that appears at the bottom of the screen.
  • In Windows 7, you can find the program by pressing the Start Key and then clicking on “All Programs.” Next look for the “R” folder and a link to the RGui should be in that folder.
  5. Once you have found the R GUI, run it.
  6. As of updating R on 11/28/2016, the command discussed in this step failed and was unnecessary. So go to the next step. However, if you get an error message about not being able to open the URL and about the connection changing from a secured to an unsecured connection, you might try the following command: In the R Console at the > prompt type:

setInternet2(TRUE)

  7. Next type:

if(!require(installr)) { install.packages("installr"); require(installr) }

Note that the require() function returns FALSE if the package is not available. So the above if() statement installs “installr” only if it is not already available and then loads it. The second require() call returns TRUE (invisibly) once the package is installed.

  8. Then type:

updateR()

Note: On 4/8/2016 when upgrading from R version 3.2.1 to 3.2.4, updateR() failed with an error message that begins with “Error in strsplit(version_with_dots,” and continued with more opaque information. The problem is that the installr package version that you already have installed is too old to install the current version of R. The fix is to remove the installr package and reinstall it. You can remove the installr package with the command
remove.packages("installr")
Next stop the RGui with the q() command and then restart it (this is one way to unload the old version package installr so that a new version can be installed). Follow the instructions above to re-start the RGui. Then you can begin again at step 7.

  9. You will then need to answer the questions that are a part of the installation process.

I recommend: (1) Not reading the News; (2) Copying the installed packages; (3) updating the installed packages; (4) copying the Rprofile.

  10. Once the update is complete, exit the RGui by using q().
  11. Run Rstudio.
  12. Go to the Help menu and then Check for Updates. If an update is available, you will be taken to the Rstudio download page.
  13. If you need the update, download it, and run the installation file. I think this is straightforward so I won’t give more detailed instructions here.
  14. The last thing you need to do is update your already installed R packages. In Rstudio, go to the menu Tools and then click on “Check for package updates…” Again this process is straightforward.
  15. Once you are finished with the installation and updates, turn your antivirus software back on!

The above should take care of updating both R and Rstudio. If you run into problems or have other questions, please comment below. Based on the comments, I will correct any errors or ambiguities in the above instructions.

Python With Spyder 14: If Statements

This is the 14th in a series of videos providing a tutorial on Python 2.7 using Anaconda Python and the Spyder IDE. Click here to go to a “home page” for the video series.

This video introduces the “if” statement in Python. “If” statements are, of course, extremely important in programming languages as they are the most direct way to implement conditional logic. And it is the ability of programs to make “decisions” based on such conditional logic that allows programs to perform non-trivial tasks.
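For readers who want to see the shape of the statement before watching, here is a minimal sketch of an “if” with two “elif” branches and an “else”. The function name and the temperature thresholds are made up for illustration and are not taken from the video.

```python
def describe_temperature(degrees_c):
    """Return a label for a temperature; the thresholds are illustrative only."""
    if degrees_c < 0:
        label = "freezing"
    elif degrees_c < 15:      # checked only if the first test was False
        label = "cool"
    elif degrees_c < 25:      # more than one "elif" is allowed
        label = "mild"
    else:                     # runs when none of the tests above were True
        label = "hot"
    return label


print(describe_temperature(-5))
print(describe_temperature(30))
```

Only the first branch whose test is true runs, so the order of the tests matters.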

The source code used in this video can be found here. You can right click on the link and use “Save As” to save the file.

Note: The source code files are plain text files with a “.txt” extension. You will probably want to change the extensions to “.py” after you download them. If you do so, please be aware that if you have Python installed, the file will become executable, so that it will run if you click on it (accidentally or otherwise).

The video is about 27 minutes long.


Video Index: If Statements

Click on the topics below to jump to that location in the video.

Time Topic
00:00 Title slide
00:06 Introduction
00:48 Comments between three double quotes (“””)
00:58 The syntax of the “if” statement in Python
02:00 The “elif” statement
02:42 The “else” statement
03:18 More than one “elif” statement allowed
03:42 Good coding practices focus on readable code.
03:59 Comparison to the if() function in Excel
05:16 The examples of “if” statements in this video build on the GetItems() function from the last video.
05:40 List data for the examples: Names of the highest paid CEOs in 2014.
06:15 Define the function GetItems() in the console
06:45 Example of running GetItems()
07:16 Reminder: Python list indices start at 0
08:09 Review of how the GetItems() function works.
08:59 Need to check for valid function arguments.
10:04 Need to check that the first function argument is a list.
10:20 Use of the “if” statement to check the argument
10:22 Use of the isinstance() built-in function to check an object’s type.
10:47 Logical “not”
10:55 isinstance() syntax
11:21 Use print statement to print an error message.
11:40 Return an empty list.
11:52 Check that second argument to GetItems() is a list.
12:36 Need to check for valid indices in IndexList argument.
13:15 Need to first check that index is an integer.
13:30 Use “if” statement and isinstance() to check that index is an integer.
13:50 Use “else” statement to print a warning.
14:04 Append the value of None to the returned list.
14:37 Check that the index is valid for the GetItems() argument InputList.
14:46 The usual range of indices for a list is 0 to n-1 (n = length of the list).
15:00 Negative indices are possible.
15:18 Use len() built-in function to save the length of InputList.
15:48 Use “if” statement to check the index value.
16:26 Use “else” statement to print a warning and return None.
17:00 Change the name of the function GetItems() to GetListItems()
17:33 Spyder editor warns of a syntax error.
18:19 Run code cell with Ctrl-Enter to define the function GetListItems() and the test case.
18:46 Test list checking for the second argument by using a string.
19:22 Test list checking for the first argument using a dictionary.
20:39 Test checking for valid index values with an out-of-range positive index
21:21 Test negative indices with a valid example.
21:41 Test negative index that is out of range.
22:10 Test empty list for the first argument.
23:03 Test with single integer as the second argument.
24:17 Modify the function to handle a single integer index value
26:06 Re-run the function definition in the console
26:17 Re-run the test case with a single integer index.
26:26 Summary and conclusion
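The checks listed in the index can be sketched roughly as follows. This is a reconstruction from the topic list, not the source file from the video: the exact messages, the choice to append None for an out-of-range index, and the placeholder names in the test list are all assumptions. It is written to run on both Python 2.7 and Python 3.

```python
def GetListItems(InputList, IndexList):
    """Return the items of InputList at the positions given in IndexList."""
    # The first argument must be a list.
    if not isinstance(InputList, list):
        print("Error: the first argument must be a list.")
        return []

    # Accept a single integer index by wrapping it in a list.
    if isinstance(IndexList, int):
        IndexList = [IndexList]
    if not isinstance(IndexList, list):
        print("Error: the second argument must be a list or an integer.")
        return []

    n = len(InputList)
    Output = []
    for Index in IndexList:
        if isinstance(Index, int):
            # Valid indices run from -n to n-1 (negative indices count from the end).
            if -n <= Index < n:
                Output.append(InputList[Index])
            else:
                print("Warning: index %d is out of range." % Index)
                Output.append(None)
        else:
            print("Warning: index %r is not an integer." % (Index,))
            Output.append(None)
    return Output


# Placeholder names, not the CEO data used in the video.
Names = ["Alice", "Bob", "Carol", "Dave"]
print(GetListItems(Names, [0, 2, -1]))
print(GetListItems(Names, [10, "x"]))
print(GetListItems(Names, 1))
```

Returning None in place of a bad index keeps the output the same length as IndexList, which makes it easy to see which request failed.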

Python With Spyder 13: For Loops

This is the 13th in a series of videos providing a tutorial on Python 2.7 using Anaconda Python and the Spyder IDE. Click here to go to a “home page” for the video series.

This video introduces looping or iteration in Python using the “for” statement. Looping, of course, is one of the most important things in programming languages as many problems require iteration over an index or a set.

The source code used in this video can be found here. You can right click on the link and use “Save As” to save the file.

Note: The source code files are plain text files with a “.txt” extension. You will probably want to change the extensions to “.py” after you download them. If you do so, please be aware that if you have Python installed, the file will become executable, so that it will run if you click on it (accidentally or otherwise).

The video is about 34 minutes long.


Video Index: For Loops

Click on the topics below to jump to that location in the video.

Time Topic
00:00 Title Slide
00:06 Introduction
00:23 The idea of program flow control
01:12 The syntax of the Python “for” statement
01:33 Iterables
01:58 The colon (“:”) in the for statement signals a following block of code
02:11 The end of the block of code
02:21 What the “for” statement does.
02:54 Comparison of the Python “for” to other computer languages
04:18 Data example: The names of the 200 highest paid CEOs in 2014
04:35 CEO name list defined in a code cell
04:44 Execution of the code in a cell
05:02 Length of a list using built-in function len()
05:18 List of CEO first names defined
05:45 List of CEO last names defined
06:32 First example: get an arbitrary set of list items — the GetItems() function
07:10 How the GetItems() example function works.
08:29 Create the GetItems() function with “def”
09:11 Define an empty list for output
09:25 The “for” statement in GetItems()
09:47 The “for” statement ends with a colon (“:”)
10:15 Summary of the GetItems() code.
10:43 Return the output from GetItems()
11:00 No error checking done! A bad practice!
11:42 Run the code cell with Ctrl-Enter to define the GetItems() function
11:58 Test the GetItems() function.
12:34 Summary and review of the GetItems() function.
13:10 Final example in this video will iterate over an index.
13:23 Iterating over an index is consistent with earlier computer languages.
13:53 Final example will compute a correlation.
14:03 Example will correlate the lengths of the CEO first and last names.
14:22 Review formulas for the sample correlation.
14:45 The formula for the sample mean.
14:54 The summation notation from mathematics.
15:25 The formula for the sample standard deviation.
16:21 The formula for the sample covariance.
17:18 The formula for the sample correlation.
18:19 Define the function mean() to compute the sample mean.
18:52 The “for” loop in the mean() function.
19:18 The “+=” notation.
20:12 The “/=” notation.
20:53 Define the function sd() to compute the sample standard deviation.
21:09 Call the mean() function from the sd() function.
21:23 The “for” loop in the sd() function.
22:39 An index was not used in the mean() or sd() functions.
22:57 Stepping through X and Y at the same time using an index.
23:33 The range() function
25:30 Defining the cor() function to compute the sample correlation.
26:53 The “for” loop in the cor() function using the range() function and an index.
27:45 Using the index to retrieve values from X and Y (subscripts for X and Y).
28:33 Summary: Why the index approach is natural for two lists X and Y.
31:05 Compute the length of a string with the len() function.
31:20 Computing the length of the first names and last names using the map() function.
32:30 Computing the correlation between the first name length and the last name length using cor(), map(), and len().
33:28 Conclusion
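The correlation example outlined in the index might look something like the sketch below. The function names mean(), sd(), and cor() come from the index; the bodies are a reconstruction using the standard sample formulas (dividing by n-1), and the name lists are placeholders rather than the CEO data from the video. The list() wrapper around map() is only there so the same code runs on Python 3, where map() no longer returns a list.

```python
import math


def mean(X):
    """Sample mean, iterating directly over the items (no index needed)."""
    total = 0.0
    for x in X:
        total += x          # same as total = total + x
    total /= len(X)         # same as total = total / len(X)
    return total


def sd(X):
    """Sample standard deviation with the n-1 divisor."""
    m = mean(X)
    ss = 0.0
    for x in X:
        ss += (x - m) ** 2
    return math.sqrt(ss / (len(X) - 1))


def cor(X, Y):
    """Sample correlation; an index steps through X and Y together."""
    mx, my = mean(X), mean(Y)
    cov = 0.0
    for i in range(len(X)):
        cov += (X[i] - mx) * (Y[i] - my)
    cov /= (len(X) - 1)
    return cov / (sd(X) * sd(Y))


# Placeholder names, not the CEO data used in the video.
FirstNames = ["Ann", "Roberto", "Li", "Catherine"]
LastNames = ["Lee", "Martinez", "Wu", "Richardson"]

FirstLengths = list(map(len, FirstNames))
LastLengths = list(map(len, LastNames))
print(cor(FirstLengths, LastLengths))
```

The index-based loop in cor() is what lets the i-th first-name length be paired with the i-th last-name length.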

Python With Spyder 12: Dictionaries

This is the 12th in a series of videos providing a tutorial on Python 2.7 using Anaconda Python and the Spyder IDE. Click here to go to a “home page” for the video series.

This video introduces dictionaries in Python. Dictionaries are Python’s implementation of the idea of key-value pairs.
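As a quick taste of key-value pairs, here is a minimal sketch. The product names and prices are made-up data, not the example from the video.

```python
# Hypothetical data: product names as keys, prices as values.
Prices = {"milk": 2.49, "bread": 1.99}

print(Prices["milk"])       # look up the value stored under a key
Prices["milk"] = 2.59       # assigning to an existing key replaces its value
Prices["eggs"] = 3.25       # assigning to a new key adds a key-value pair

print("eggs" in Prices)     # True: "in" tests whether a key is present
print("butter" in Prices)   # False
```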

The source code used in this video can be found here. You can right click on the link and use “Save As” to save the file.

Note: The source code files are plain text files with a “.txt” extension. You will probably want to change the extensions to “.py” after you download them. If you do so, please be aware that if you have Python installed, the file will become executable, so that it will run if you click on it (accidentally or otherwise).

The video is about 31 minutes long.


Next Video: For Loops

Video Index: Dictionaries

Click on the topics below to jump to that location in the video.

Time Topic
00:00 Title slide
00:06 Introduction
00:44 Creating a dictionary using curly brackets {}.
01:01 Key-value pairs separated by the colon (:).
02:06 Keys and values do not have to be strings.
02:20 Keys and values can be any Python object.
02:27 Keys must be unique.
02:40 Using a duplicate key overwrites earlier value.
03:05 Run the Python code to create the dictionary.
02:12 Cells of code in the editor (review)
03:38 Run the code in a cell using Ctrl-Enter.
03:47 Make the Python console larger.
04:06 Print out the dictionary in the Python console.
04:37 Reference the value corresponding to a key.
05:04 Execute a line of code using F9.
05:22 Assign a new value corresponding to a key in the dictionary.
06:36 Assigning a value to a new key not already in the dictionary creates a new key-value pair.
07:31 Dictionaries in Python do not have an order.
07:51 Cannot reference dictionary items using an index.
08:31 Dictionaries cannot be sorted.
09:04 Second way to create a dictionary starting with an empty dictionary and adding key-value pairs.
09:40 Create an empty dictionary using curly brackets {}.
09:53 Load key-value pairs into the dictionary by assigning values to new keys.
10:30 Virtually everything in Python is an object.
10:43 Dictionaries are objects.
10:52 The class constructor for dictionaries in Python is dict()
10:57 Create an empty dictionary using dict().
11:17 As objects, dictionaries have methods.
11:31 Two important dictionary methods are keys() and values()
11:47 Example using the keys() method.
12:15 The type returned by the keys() method is a list.
12:24 Example using the values() method.
12:44 The keys and values returned by the keys() and values() methods are in the same order.
13:36 The pop() method.
13:47 pop() method example.
14:38 The popitem() method.
14:59 popitem() method example.
16:06 The type of the key-value pair returned by the popitem() method is a tuple.
16:33 Test for a key in the dictionary using the “in” keyword.
16:39 Clear the console window using Ctrl-L.
16:57 Example of using “in” for a key that is in the dictionary.
17:37 Example of using “in” for a key that is not in the dictionary.
18:16 Using the dir() built-in function to list out all the attributes and methods of a dictionary.
18:35 Attributes and methods beginning with a double underscore (__) are intended to be private.
19:05 Getting information about dictionary methods using internet search.
19:24 Python.org is the “official” web site for Python.
19:38 Finding dictionary (mapping types) documentation in the “Built-in Types” documentation.
20:09 Cannot create copies of dictionaries by assigning the dictionary name.
20:17 Dictionary name assignment example.
21:24 Example of creating a copy of a dictionary using dict().
22:39 Use caution when using assignment with dictionary names.
23:22 More complicated dictionary example using objects as values.
24:11 Example builds on code from the last video: Python With Spyder 11: Inheritance.
24:16 Review objects defined in PythonWithSpyder11.py
24:56 Execute object definition code from PythonWithSpyder11.py
26:00 Create a dictionary with objects as values.
27:49 Print the dictionary (from the example with objects as values).
28:28 Accessing an attribute of an object in the dictionary.
29:00 “AttributeError” when accessing an attribute not defined for an object in the dictionary.
29:26 Second example of accessing attributes of objects in the dictionary.
30:04 Summary of the dictionary example using objects.
30:44 Conclusion
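The caution about assignment versus copying (around 20:09 to 22:39 in the index) can be illustrated with a short sketch. The variable names and data are made up for this example. Note that dict() makes a shallow copy: the new dictionary is separate, but any mutable objects stored as values are still shared.

```python
Original = {"a": 1, "b": 2}

Alias = Original          # a second name for the SAME dictionary object
Copy = dict(Original)     # a new dictionary holding the same key-value pairs

Original["c"] = 3         # change the dictionary through its first name

print("c" in Alias)       # True: the alias sees the change
print("c" in Copy)        # False: the copy is unaffected

Removed = Original.pop("a")   # pop() removes a key and returns its value
print(Removed)                # 1
print("a" in Alias)           # False: still the same object as Original
```

The tutorial targets Python 2.7, where keys() returns a list and dictionaries have no defined order. In Python 3, keys() returns a view object instead, and from Python 3.7 on dictionaries preserve insertion order.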


Recording Video Directly from an IPEVO Ziggi-HD Document Camera Using the VLC Video Player

I just bought an IPEVO Ziggi-HD document camera.

In any case, recording directly from this document camera is NOT straightforward, in part because of the lack of documentation, and also a bug in the current version of the VLC player.

Here are the instructions. They were quickly put together, mostly for my own purposes, so they are a bit terse. I may make a video of this at some point, in which case I will edit this message. Note: My VLC player is up-to-date (version 2.2.1) and I did all of this on a Windows 7 notebook computer.

  1. Plug in the IPEVO Ziggi document camera to a USB port on your computer.
  2. Start the VLC player.
  3. Go to Media menu and then the Open Capture Device menu item. A dialog will appear.
  4. On this dialog, Capture mode should be set to Direct Show.
  5. Video device name should be IPEVO Ziggi+HD.
  6. Audio device name: I use

Spyder Broken in Anaconda Python (August 2015)

I regularly update my installation of Anaconda Python by running the Anaconda Python console and typing:

conda update anaconda

at the C:\Anaconda> prompt.

When I tried to run the Spyder IDE today, it died with the message:

…./anaconda/lib/python2.7/site-packages/IPython/qt.py:13: ShimWarning: The `IPython.qt` package has been deprecated. You should import from qtconsole instead.
“You should import from qtconsole instead.”, ShimWarning)

Apparently, according to the discussion here, the problem is a compatibility issue with the new version 4.0 of IPython. The solution is to downgrade the IPython console to version 3.2.1:

Solution

Here is how you “downgrade” the iPython console to the needed version 3.2.1. At the Anaconda command prompt, enter

conda install ipython=3.2.1

Thus, with the install command of conda, you can control the version by using the “=” suffix as above.

This solved the problem for me! (Wasted quite a lot of my time, however.)

I imagine that this problem will be corrected fairly quickly. By the way, I am using Python 2.7. I don’t know if the problem also occurs in Python 3.

Python With Spyder 11: Objects Part 4 — Inheritance

This is the 11th in a series of videos providing a tutorial on Python 2.7 using Anaconda Python and the Spyder IDE.

This video is the fourth video introducing objects in Python. This video discusses the idea of inheritance.

The source code used in this video can be found here. You can right click on the link and use “Save As” to save the file.

Note: The source code files are plain text files with a “.txt” extension. You will probably want to change the extensions to “.py” after you download them. If you do so, please be aware that if you have Python installed, the file will become executable, so that it will run if you click on it (accidentally or otherwise).

The video is about 26 minutes long.


Next Video: Dictionaries

Video Index: Objects Part 4 — Inheritance

Click on the topics below to jump to that location in the video.

Time Topic
00:00 Title Slide
00:26 Review example: supermarket products
03:12 Define class for milk products
03:23 A class with inheritance
03:56 Non-dynamic attributes for every instance of a class
04:16 Dynamically creating attributes using __init__()
04:41 Arguments of __init__() begin with self and include arguments of parent class
05:10 Spyder “trick:” split the editor window
06:03 Default values for arguments and the value None
08:07 Initialization derived from the parent class
09:01 Initialization unique to the child class
10:05 Using parent class methods
10:59 How inheritance works and how names are resolved.
12:08 Definition of a second child class inheriting from the Product class
14:48 Summary of inheritance so far.
15:18 Executing the class (object) definitions
15:34 Spyder “trick:” removing the editor window split
15:53 Cells in the Spyder editor: The code for each class is in a cell
16:03 Execute the code in a cell using Ctrl-Enter
17:10 Create two instances (milk and laundry detergent)
19:58 Skipping arguments and using defaults: subsequent variables names must be specified
21:31 Spyder: Move console to make it big
21:40 Reference values in object attributes
22:25 Reference object methods
22:50 Error referencing a method
23:10 Using dir() to see an object’s attributes and methods
25:20 Summary of inheritance in Python
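A minimal sketch of the pattern covered in the index is shown below. Only the class name Product and the milk and laundry detergent instances are taken from the topic list. The other class names, attributes, methods, and values are hypothetical and are not the code from the video.

```python
class Product(object):
    """Parent class; the attribute names here are illustrative."""
    Store = "Example Market"      # class attribute shared by every instance

    def __init__(self, Name, Price=None):
        # Attributes created per instance when the object is built.
        self.Name = Name
        self.Price = Price

    def Describe(self):
        return "%s at %s" % (self.Name, self.Price)


class Milk(Product):
    """Child class: parent arguments first, then its own."""
    def __init__(self, Name, Price=None, FatPercent=None):
        Product.__init__(self, Name, Price)   # initialization from the parent
        self.FatPercent = FatPercent          # initialization unique to the child


class Detergent(Product):
    """A second child class inheriting from Product."""
    def __init__(self, Name, Price=None, Loads=None):
        Product.__init__(self, Name, Price)
        self.Loads = Loads


WholeMilk = Milk("Whole milk", 2.49, FatPercent=3.25)
# Price is skipped here, so the later argument must be passed by name.
Laundry = Detergent("Laundry detergent", Loads=32)

print(WholeMilk.Describe())   # method found on the parent class
print(Laundry.Price)          # None, the default value
```

When a name such as Describe is not found on the instance or its own class, Python looks it up on the parent class, which is why both child classes can use it without redefining it.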


Python With Spyder 10: Objects Part 3 – Private Data and Encapsulation

This is the 10th in a series of videos providing a tutorial on Python 2.7 using Anaconda Python and the Spyder IDE. Click here to go to a “home page” for the video series.

This video is the third of four videos introducing objects in Python. This video discusses private data and encapsulation.

The source code used in this video can be found here. You can right click on the link and use “Save As” to save the file.

Note: The source code files are plain text files with a “.txt” extension. You will probably want to change the extensions to “.py” after you download them. If you do so, please be aware that if you have Python installed, the file will become executable, so that it will run if you click on it (accidentally or otherwise).

The video is about 20 minutes long.


Next Video: Inheritance

Video Index: Private Data and Encapsulation

Click on the topics below to jump to that location in the video.

Time Topic
00:05 Introduction
00:41 Review of the example
01:22 Phone number example of private data
02:10 Basic idea of encapsulation
02:42 Begin phone number method example
02:58 Phone number formats
03:42 Private attribute __PhoneNumber
04:53 SetPhone() and GetPhone() methods
05:42 All methods get self as first argument
07:14 “if” statement to test if the phone number is a character string
08:18 Use of the “or” keyword in the example
09:37 Use of the “in” keyword in the example
10:16 Use of the “and” keyword in the example
11:05 The line continuation character (“\”)
11:23 Use of the “not” keyword in the example
11:45 The return statement in the SetPhone() method
11:59 Set or create the private attribute __PhoneNumber
13:46 Using a “cell” in the editor and Ctrl-Enter to execute a block of code
14:33 Create two instances
14:57 Private data cannot be directly accessed
15:15 Use the SetPhone() method to set the phone number
16:03 Private attribute __PhoneNumber still cannot be directly accessed
16:17 Read the phone number with the GetPhone() method
16:45 Test the SetPhone() methods with incorrectly formatted phone numbers
18:04 Summary of private data in Python objects
18:53 Summary of Encapsulation
19:12 Private methods (functions)
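The phone number example can be sketched along these lines. The attribute __PhoneNumber and the methods SetPhone() and GetPhone() are named in the index; the class name Customer, the accepted formats (ten digits, optionally separated by dashes or spaces), and the validation logic are simplifying assumptions rather than the code from the video.

```python
class Customer(object):
    def __init__(self, Name):
        self.Name = Name
        self.__PhoneNumber = None     # double underscore marks it as private

    def SetPhone(self, Phone):
        """Store Phone if it looks valid; return True on success, else False."""
        if not isinstance(Phone, str):
            return False
        Digits = ""
        for Ch in Phone:
            if Ch in "0123456789":
                Digits += Ch
            elif not (Ch == "-" or Ch == " "):
                return False          # unexpected character
        if len(Digits) == 10:
            self.__PhoneNumber = Digits
            return True
        return False

    def GetPhone(self):
        return self.__PhoneNumber


Pat = Customer("Pat")
print(Pat.SetPhone("555-123-4567"))   # True
print(Pat.GetPhone())                 # 5551234567
print(Pat.SetPhone("555-1234"))       # False: too few digits, value unchanged

try:
    Pat.__PhoneNumber                 # direct access from outside the class
except AttributeError:
    print("AttributeError: private attribute is not directly accessible")
```

Python implements this kind of privacy by name mangling: inside the class, __PhoneNumber is stored under the name _Customer__PhoneNumber. That discourages direct access but does not strictly prevent it, so going through SetPhone() and GetPhone() is a convention as much as a guarantee.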
