The Lede Program: Courses
The Lede Program offers an intensive summer program in data and computation, or a comprehensive two-semester program for students interested in pursuing more advanced work. Each semester consists of four 3-credit courses.
In Summer 2016, Lede students will take the following four courses.
Foundations of Computing
During this introduction to the ins and outs of the Python programming language, students build a foundation upon which their later, more coding-intensive classes will depend. Students clean, parse and process dirty, real-world data sets while recreating modern journalistic projects. The course also touches on basic visualization and mapping, and on how to use public resources such as Google and StackOverflow to build self-reliance.
Focus: Familiarize yourself with the data-driven landscape
Topics & tools include: Python, basic statistical analysis, OpenRefine, CartoDB, pandas, HTML, CSVs, algorithmic story generation, narrative workflow, csvkit, git/GitHub, StackOverflow, data cleaning, command line tools, and more
Data and Databases
Students will become familiar with a variety of data formats and methods for storing, accessing and processing information. Topics covered include comma-separated documents, interaction with web site APIs and JSON, raw-text document dumps, regular expressions, text mining, SQL databases, and more. Students will also tackle less accessible data by building web scrapers and converting difficult-to-use PDFs into usable information.
Focus: Finding and working with data
Topics & tools include: SQL, APIs, CSVs, regular expressions, text mining, PDF processing, pandas, Python, HTML, BeautifulSoup, IPython Notebooks, and more
Machine learning and data science are integral to processing and understanding large data sets. Whether you're clustering schools or crime data, analyzing relationships between people or businesses, or searching for a needle in a haystack of documents, algorithms can help. Through supervised and unsupervised learning, students will generate leads, create insights, and figure out how to best focus their efforts with large data sets. Students will also develop a critical eye toward applications of algorithms, uncovering the pitfalls and biases to look for in their own and others' work.
Focus: Analyzing your data
Topics & tools include: linear regression, clustering, text mining, natural language processing, decision trees, machine learning, scikit-learn, Python, and more
Data Analysis Studio
In this project-driven course, students refine their creative workflow on personal work, from obtaining and cleaning data to final presentation. Data is explored not only as the basis for visualization, but also as a lead-generating foundation, requiring further investigative or research-oriented work. Regular critiques from instructors and visiting professionals are a critical piece of the course.
Focus: Applying your skillset
Topics & tools include: Tableau, web scraping, mapping, CartoDB, GIS/QGIS, data cleaning, documentation, and more
Lede 24 students continue on in the Fall to choose from a selection of courses around Columbia University. Coursework can be divided into two tracks. The first prepares them to apply for computer science or other computational graduate degrees, including Columbia's dual Master's in Journalism and Computer Science. The second track gives them enhanced practice in data analysis, using the contexts of the social sciences, digital humanities and data-driven storytelling.
Students interested in advanced computational work or in applying to Columbia's dual-degree program may take:
Essential Data Structures (W3136): A coding course for non-computer science majors who have at least one semester of experience. Basic elements of programming in C and C++ are covered, along with trees, graphs, generic programming and hash tables.
Applied Linear Regression (W4150 or W1211): Develops critical thinking and data analysis skills for regression analysis in science and policy settings. Simple and multiple linear regression, non-linear and logistic models, random-effects models, penalized regression methods. Implementation in a statistical package. Emphasis on real-world examples and on planning, proposing, implementing, and reporting.
Discrete Math (W3203): Topics include logic and formal proofs, sequences and summation, mathematical induction, binomial coefficients, elements of finite probability, recurrence relations, equivalence relations and partial orderings and topics in graph theory.
Computational Linear Algebra (W3251): Topics include computational linear algebra, linear system solutions, sparse linear systems, least squares, eigenvalue problems, and numerical solution of other multivariate problems, as time permits.
Data Analysis Track
Students interested in advancing their skills in the context of applied digital humanities, social sciences, and data-driven storytelling choose from a selection of courses across the university, which may include:
Data Analysis for Social Sciences (QMSS G4015): Explore specific statistical tools used in social sciences research using the R programming environment. Topics covered include multiple regression analysis, statistical data structures, naive Bayesian classifiers, and much more.
Big Data with Python (Envsci BC3050): Use the Python programming language to analyze and visualize large environmental and earth systems data sets in ways that Excel is not equipped to do. This includes both time series and spatial analyses, with programming occurring interactively during class and assignments designed to strengthen methods and results.
Digital Activism (SIPA U6203): Explore how people around the world use digital media to promote political and social causes. Through news reports, case studies and research, we’ll see how the Internet has - and has not - changed activism and organizing. Students learn practical computer skills key to sensitive work, including threat awareness, encrypted communication, and anti-censorship tools. In three web scraping sessions, students build a new tool to collect and organize data.