With the R Users DC Meetup broadening its topic base to include other statistical programming tools, it seemed only reasonable to write a meta post highlighting some of the best Python tutorials and resources available for data science and statistics. Not knowing what you don’t know is often the hardest part of picking up a new skill, so hopefully these resources will help make learning Python a little easier. Prepare yourself for code indentation heaven. Python is such an incredible language because it can do practically anything, from high-performance scientific computing to web development with frameworks such as Django or Flask. Python is heavily used at Google, so the language must be doing something right. And, similar to R, Python has a fantastic community around it and, luckily for you, this community can write. Don’t just take my word for it; watch the following video to fully understand.
Distributions
Python is available for free from http://www./ and there are two popular versions, 2.7 or 3.x. Which should you choose? I would go with either whatever is currently installed on your system or 2.7. For a better discussion, check out this site. Commercial distributions that bundle and test various useful packages are also available, such as the Enthought Python Distribution. In Enthought’s words: “This distribution provides a comprehensive, cross-platform environment for scientific computing with the Python programming language. A single-click installer allows immediate access to over 100 libraries and tools. Our open source initiatives include SciPy, NumPy, and the Enthought Tool Suite.”
Python Developer Tools
Getting started with a new programming language often requires getting started with a new tool to use the language, unless you are a hardcore vi, Vim, or Emacs person. Python is no exception and there are a great number of editors and full-blown IDEs to try out:
Sublime Text 2 - If you have never used it, you should try this editor. “Sublime Text is a sophisticated text editor for code, markup and prose. You’ll love the slick user interface, extraordinary features and amazing performance.”
IPython provides a rich architecture for interactive computing.
NINJA-IDE (free) (from the recursive acronym: “Ninja-IDE Is Not Just Another IDE”) “is a cross-platform integrated development environment (IDE). NINJA-IDE runs on Linux/X11, Mac OS X and Windows desktop operating systems, and allows developers to create applications for several purposes using all the tools and utilities of NINJA-IDE, making the task of writing software easier and more enjoyable.”
PyCharm by JetBrains (not free) – the folks at JetBrains make great tools and PyCharm is no exception.
Learning Python
Package Management and Installation
Once you know a bit about packages, you will start installing them. There is no better way to get this done than with either the EasyInstall or pip package managers. pip is the recommended choice, as it is newer and seems to have broader support. For Windows users, it is sometimes helpful to use the pre-built binaries maintained here: http://www.lfd./~gohlke/pythonlibs/ You will notice that not all packages have been ported to 3.x. This is true of many popular libraries, and it is why 2.6 or 2.7 is recommended.
Virtualenv – learn it early and use it
Package management can be a pain point when working across systems or when deploying larger applications in production environments. For this reason it is HIGHLY RECOMMENDED that you get comfortable with the wonderful virtualenv package. Here is a good intro to virtualenv for Ubuntu (for the Windows users… well, just go install Ubuntu). The basic idea is that each of your projects gets a self-contained Python environment which can be shipped to a new machine and carry its Gordian knot of dependencies with it.
Python Koans – the zen of python
This project is great for those who want to dive right in. It is based on a Ruby project which presents the language as a series of failing unit tests. You must edit the source until each unit test passes. It is wonderful, and it doubles as an introduction to TDD (Test-Driven Development) while you learn Python. https://github.com/gregmalcolm/python_koans/wiki
Yes, here is an entire book on Python, free online, or you can upgrade for even more content and videos. And yes, the book is pretty good.
Python’s Execution Model
Python for Numerical and Scientific Computing
NumPy, SciPy, and matplotlib form the basis for scientific computing in Python.
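As a small taste of what NumPy offers, the sketch below converts units across a whole array without a loop, computes summary statistics, and solves a small linear system. It assumes NumPy is installed; the data values and variable names are made up for illustration.

```python
import numpy as np

# Illustrative sample data (invented for this sketch).
heights_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])

# Vectorized arithmetic: the division applies to every element at once.
heights_m = heights_cm / 100.0

mean_height = heights_m.mean()
# ddof=1 gives the sample standard deviation, which is what R's sd() computes.
std_height = heights_m.std(ddof=1)

# Solve the 2x2 linear system A x = b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

if __name__ == "__main__":
    print("mean height (m):", mean_height)
    print("sample std (m):", std_height)
    print("solution of A x = b:", x)
```

R users will find the array semantics familiar. One difference worth noting is that NumPy's `std` defaults to the population formula (`ddof=0`), so `ddof=1` is needed to match R.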
Python for Data
Pandas is really the Python approximation to R, although most would argue that it isn’t yet as full-featured as R. Or, in the words of the website, “pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.”
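For readers coming from R, the following sketch shows how a pandas DataFrame plays the role of a data frame, with a row filter and a grouped mean. It assumes pandas is installed; the column names and values are invented for illustration.

```python
import pandas as pd

# A small illustrative table (values invented for this sketch).
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Boolean-mask filtering, similar in spirit to df[df$value > 2.5, ] in R.
large = df[df["value"] > 2.5]

# Grouped mean, roughly what aggregate() or tapply() would give in R.
group_means = df.groupby("group")["value"].mean()

if __name__ == "__main__":
    print(large)
    print(group_means)
```

The result of the grouped mean is a pandas Series indexed by group label, so individual results can be looked up by name, for example `group_means["a"]`.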
Sean Murphy, Senior Scientist and Data Science Consultant at JHU
Sean Patrick Murphy, with degrees in math, electrical engineering, and biomedical engineering and an MBA from Oxford, has served as a senior scientist at Johns Hopkins University for over a decade, advises several startups, and provides learning analytics consulting for EverFi. Previously, he served as the Chief Data Scientist at a Series A funded health care analytics firm, and the Director of Research at a boutique graduate educational company. He has also cofounded a big data startup and Data Community DC, a 2,000 member organization of data professionals. Find him on LinkedIn, Twitter, and Google+.