Spreadsheet on Undergraduate Class Sizes at Columbia

To accompany An Investigation of the Facts Behind Columbia’s U.S. News Ranking

Michael Thaddeus
Professor of Mathematics
mt324@columbia.edu
February 2022

The spreadsheet presented here tabulates enrollment data for classes at Columbia University that are likely to have enrolled at least one undergraduate. It includes classes offered in Fall 2019, Fall 2020, and Fall 2021 in subjects covered by the Faculty of Arts & Sciences and the School of Engineering and Applied Science. Classes intended for undergraduates in a few other subjects (Architecture, Business, Physical Education, and Sustainable Development) are included as well. The purpose is to estimate the Undergraduate Class Size figures that would appear in section I-3 of Columbia’s Common Data Set, if it issued one.

Download the spreadsheet here in Excel format (.xlsx, 1.4 MB).

SOME NOTES ON METHODOLOGY

Course section data, including the number, point value, and enrollment of each course section, were gathered online. For Fall 2021 classes, the data were gathered directly from the Columbia University Directory of Classes; enrollment figures were current as of late October, and Directory information remains online as of this writing. For Fall 2019 and Fall 2020 classes, enrollments are end-of-semester figures and were drawn from archived copies of Directory of Classes pages on the Internet Archive.

The courses for which data were gathered included all those in subjects covered by departments in the Faculty of Arts & Sciences (including the School of the Arts, but excepting the School of Professional Studies) and in the School of Engineering and Applied Science, as well as courses in the Core Curriculum. These comprise the vast majority of courses taken by undergraduates at Columbia. Similar information was also gathered on courses offered by other divisions of the University, namely those in Architecture, Business, Physical Education, and Sustainable Development, some of which are intended at least partly for undergraduates.

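This information was assembled in spreadsheet form. Following the guidelines of the Common Data Set, several types of courses were excluded from consideration; the specific exclusion criteria are encoded in the Automatic and Manual fields described under FORMAT OF THE SPREADSHEET.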

Also excluded were courses offered by Barnard College, which is an independent institution with separate government reporting and a separate U.S. News ranking from Columbia, although it is closely affiliated with Columbia and lists courses in Columbia’s Directory of Classes.

The Common Data Set’s guidelines also stipulate that calculations of undergraduate class size should include only course sections in which “at least one degree-seeking undergraduate student is enrolled for credit.” It is not possible to determine from publicly available data exactly which courses satisfy this condition. Nevertheless, we may confidently assume the following (as stated in the text of the main article).

(1) All but a negligible number of classes offered by Arts & Sciences and by Engineering with numbers in the 1000–4000 ranges have at least one undergraduate enrolled. Courses numbered in the ranges 1000–3000 are described by the University as being “undergraduate courses,” while those in the 4000 range are “geared toward undergraduate students” or “geared toward both undergraduate and graduate students.” (A partial exception is 4000-level courses in Engineering. Unlike Arts & Sciences, Engineering does not offer master’s courses at the 5000 level. Rather, its 4000-level courses are a mix of courses intended for undergraduates, for master’s students, and for both. When such courses are open solely to master’s students, they are often so labeled in the Engineering School Bulletin or the Directory of Classes; these courses have been excluded from consideration.) Consequently, courses at the 1000–4000 levels offered under Arts & Sciences or Engineering were included in the count of undergraduate class sizes, unless they were excluded for one of the reasons already stated.

(2) Only a negligible number of classes offered by Columbia’s professional schools enroll undergraduates (excepting certain classes in Architecture, Business, and Sustainable Development mentioned above). Columbia Law School does not allow undergraduates in any of its courses. Columbia Business School allows undergraduate seniors, but not in its core courses (and its electives, which are open to undergraduates, almost invariably have more than 20 students). Columbia Journalism School allows cross-registration in only a handful of courses, typically five to seven per semester. The School of Architecture, Planning and Preservation does likewise. Consequently, courses offered by Columbia’s professional schools were excluded from the count of undergraduate class sizes, except for certain courses in Architecture, Business, and Sustainable Development.

(3) Some, but not all, of the graduate courses offered by Arts & Sciences and by Engineering have undergraduates enrolled. It is common for undergraduates to take graduate courses, and these courses naturally tend to be much smaller than undergraduate courses. Because graduate courses in Arts & Sciences and in Engineering are few in number compared to undergraduate courses, however, the overall effect of these courses on undergraduate class size, while uncertain, will not prevent a fairly precise estimate from being made. All such courses (including those in the School of the Arts) were therefore kept under consideration, except for courses at the 9000 level, which are almost invariably courses of reading, research, or independent study, and which were therefore excluded.

These considerations are reflected in the format of the spreadsheet. In the records under the Courses tab are listed all courses offered by Arts & Sciences (including the Core Curriculum), Engineering, and Athletics, as well as courses offered by the schools of architecture, business, and international affairs in subjects that are sometimes open to undergraduates.

FORMAT OF THE SPREADSHEET

“Courses” Tab: This tab presents the raw data in 17,283 records, each corresponding to a class (that is, a course section) offered in Fall 2019, Fall 2020, or Fall 2021. The fields in each record are as follows.
  1. Subject: The four-letter subject code for the course. A few subject codes include underscores as well as letters. A key to subject codes may be found here.
  2. Division: The unit of Engineering, Arts & Sciences, or Athletics offering the course. Possible values are Engineering, Arts, Core (= Core Curriculum), Humanities, SocSci (= Social Sciences), NatSci (= Natural Sciences), or PhysEd (= Physical Education).
  3. Prefix: One-letter prefix used for internal purposes by the Directory of Classes, e.g. X = Barnard.
  4. Course #: The four-digit number of the course, by which it is commonly known.
  5. Year: Year in which the course section was offered (all classes under consideration were in the Fall semester).
  6. Section #: The section number, which is 1 for a single-section course, and 1 or more for a multi-section course. It may carry a prefix such as H for a hybrid section or V for a video section (see footnote 1).
  7. Call #: A five-digit unique identifier used for internal purposes by the Directory of Classes.
  8. Points: The number of credit points carried by the course.
  9. Enrollment: The number of students enrolled in the course section.
  10. Automatic: A parameter, assigned automatically using an Excel function based on columns H and I, that equals 1 if a course bears no credit, 2 if the enrollment is one student or no students, and 0 otherwise. Classes assigned any value other than 0 are to be excluded from consideration.
  11. Manual: A parameter, assigned manually, that equals:
    1 if the Engineering Bulletin clearly indicates that the course is not open to undergraduates;
    2 if the course is offered by NYU (in a few rare cases);
    4 if the Notes in the Directory of Classes clearly indicate that the course is not open to undergraduates;
    5 if the course is at 9000 level;
    6 if the course is at a level in the professional schools not normally enrolling undergraduates (4000-level in SIPA with a few exceptions; 5000-level in business; 4000-level in architecture);
    7 if the course is offered by Barnard (see footnote 2);
    8 if the course is a point-bearing recitation or laboratory associated with a lecture course, or a course of individual music lessons;
    9 if the course is an internship or a reading or research course, including all courses numbered 3997, 3998, 3999, 4997, 4998, 4999, and 6999 (except CSER 3999 and ANTH 3999);
    0 if none of the above criteria apply. Classes assigned any value other than 0 are to be excluded from consideration.
  12. Overall: A parameter that equals 0 if Automatic = 0 and Manual = 0, and 1 otherwise. Consequently, the classes with 0 in the Overall column are those which are included in the count of undergraduate class sizes. This is the case with 7,242 of the 17,283 records in the spreadsheet. Of these, 4,737 were offered in Fall 2019 or Fall 2020 and are therefore pertinent to the 2022 U.S. News ranking figure.
  13. Notes: Reproduces the Notes field of the Directory of Classes.
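
The logic of the Automatic and Overall fields can be sketched in Python. This is an illustrative reading of the field descriptions above, not the spreadsheet's actual Excel formula. The function names are hypothetical, and the order in which the two Automatic conditions are tested (no credit first, then low enrollment) is an assumption, since the description does not say which takes precedence.

```python
def automatic_flag(points: float, enrollment: int) -> int:
    """Hypothetical equivalent of the Automatic field (field 10)."""
    if points == 0:
        return 1  # the course bears no credit
    if enrollment <= 1:
        return 2  # one student or no students enrolled
    return 0      # not excluded automatically


def overall_flag(automatic: int, manual: int) -> int:
    """Hypothetical equivalent of the Overall field (field 12)."""
    return 0 if automatic == 0 and manual == 0 else 1


# A 3-point class with 15 students and no manual exclusion is counted.
print(overall_flag(automatic_flag(3, 15), manual=0))  # 0
# The same class with Manual = 7 (offered by Barnard) is excluded.
print(overall_flag(automatic_flag(3, 15), manual=7))  # 1
```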

Subsequent Tabs: The remaining five tabs on the spreadsheet are labeled with one or more years. Each of these tabs displays a table whose entries are numbers of classes, sorted by size, level, and division, and given in the year or years stated. Since the 2022 U.S. News National Universities ranking considered classes from Fall 2019 and Fall 2020, the relevant tab for U.S. News is the one entitled “2019 + 2020.”

Each of the five tables has the same format. Each row specifies a division (Arts, Core, Engin, Hum, NatSci, PhysEd, SocSci, or Total, the sum over all divisions) and a level (1000, 2000, 3000, 4000, Ugrad, 5000, 6000, 8000, Grad). Here Ugrad denotes 1000- to 4000-level courses, which are chiefly for undergraduates, and Grad denotes 5000- to 8000-level courses, which are chiefly for graduates. As mentioned before, 9000-level courses are almost entirely reading and research courses; they have been excluded from consideration. Likewise, 0000-level courses are invariably non-credit courses and have been excluded. There are no 7000-level courses.
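
As a rough sketch, the grouping of levels into Ugrad and Grad described above could be written as follows. The function name is hypothetical, and the code assumes the course number is available as a plain integer.

```python
from typing import Optional


def level_group(course_number: int) -> Optional[str]:
    """Map a four-digit course number to the Ugrad or Grad grouping.

    Returns None for 0000-level and 9000-level courses, which are
    excluded from consideration.
    """
    level = (course_number // 1000) * 1000
    if 1000 <= level <= 4000:
        return "Ugrad"
    if 5000 <= level <= 8000:
        return "Grad"
    return None


print(level_group(3105))  # Ugrad
print(level_group(6200))  # Grad
print(level_group(9001))  # None
```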

Each column specifies a range of class sizes, as called for by the Common Data Set: 2–9, 10–19, 20–29, 30–39, 40–49, 50–99, 100+, and Total, the sum over all class sizes. The entries in these columns are the number of classes satisfying all the relevant conditions. For instance, in the 2019 tab, the entry in row Hum3000 and column 10–19 is 82, meaning that in Fall 2019, the number of 3000-level Humanities classes having between 10 and 19 students (and not excluded for any of the reasons stated above, such as being a discussion section, a Barnard course, etc.) was exactly 82.

Two additional columns display the percentage of classes in the Total column enrolling under 20 students and the percentage of classes in the Total column enrolling 100 students or more. These figures are rounded to one decimal place.
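
The binning and the two percentage columns can be illustrated with a short Python sketch. It assumes a plain list of enrollments for classes that have already passed the exclusion tests; the function names and the sample enrollments are made up for illustration and are not taken from the spreadsheet.

```python
from bisect import bisect_right
from collections import Counter

# Lower bound of each Common Data Set size range, with matching labels.
BIN_LOWER_BOUNDS = [2, 10, 20, 30, 40, 50, 100]
BIN_LABELS = ["2–9", "10–19", "20–29", "30–39", "40–49", "50–99", "100+"]


def size_bin(enrollment: int) -> str:
    """Return the size-range label for a class with this enrollment."""
    if enrollment < 2:
        raise ValueError("classes with fewer than 2 students are excluded")
    return BIN_LABELS[bisect_right(BIN_LOWER_BOUNDS, enrollment) - 1]


def summarize(enrollments):
    """Count classes per size range and compute the two percentages."""
    counts = Counter(size_bin(n) for n in enrollments)
    total = sum(counts.values())
    under_20 = counts["2–9"] + counts["10–19"]
    pct_under_20 = round(100 * under_20 / total, 1)
    pct_100_plus = round(100 * counts["100+"] / total, 1)
    return counts, pct_under_20, pct_100_plus


# Made-up enrollments, for illustration only.
counts, pct_small, pct_large = summarize([5, 12, 19, 20, 45, 99, 100, 250])
print(pct_small, pct_large)  # 37.5 25.0
```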

The figures of relevance to the main article, namely 62.7%, 12.4%, 66.9%, and 10.6%, appear in the tab entitled “2019 + 2020” as the last two entries in the two rows TotalUgrad and GrandTotal.


Footnotes

  1. It is apparent from scrutinizing the data that many H and V sections were not truly distinct sections; rather, they were taught simultaneously with in-person sections. Nevertheless, these sections were left as is, because it was difficult to decide when H and V sections were truly distinct. Since there were several hundred H and V sections, mostly with very few students, overall class sizes would become slightly larger if these sections were systematically combined with the in-person sections that they appear to have accompanied. This might cause the proportion of classes with under 20 students to fall by, perhaps, another percentage point or so.
  2. Unlike Columbia, Barnard issues a Common Data Set. This allows us to make an interesting check on our methods by counting Barnard courses. The class size counts in Barnard’s Common Data Set may be compared with counts from the spreadsheet of credit-bearing classes that enroll at least 2 students and carry a Barnard prefix. As the tables below illustrate, this leads to a fairly complete enumeration of Barnard class sizes. It is not perfect, however: about one-sixth of classes are missed, including some in all size ranges. The main reason for this appears to be that some courses, which strictly speaking are Barnard courses, have Columbia prefixes. In some departments (like my own, Mathematics), courses taught by Barnard faculty are listed alongside Columbia courses, carry Columbia prefixes, and enroll substantial numbers of Columbia students. Because they are listed as Columbia courses, it is appropriate to treat these courses as Columbia courses, but, in any case, the total number of such courses is too small to have a significant effect on the Columbia estimate. (Indeed, even if all Barnard courses were included in the Columbia count, the change would still be fairly modest.) If we calculate the percentage of Barnard classes with under 20 students using our spreadsheet figures, we get 73.6%, in exact agreement with the figure reported by Barnard to U.S. News. If we calculate the percentage of Barnard classes with 50 students or more in the same fashion, we get 6.5%, somewhat less than the 8.2% figure reported by Barnard to U.S. News.

Fall 2019 Barnard class sizes from Common Data Set
2–9   10–19   20–29   30–39   40–49   50–99   100+   Total
170   292     46      23      17      26      10     584

Fall 2020 Barnard class sizes from Common Data Set
2–9   10–19   20–29   30–39   40–49   50–99   100+   Total
114   254     46      29      16      33      8      500

Fall 2019 Barnard class sizes from spreadsheet
2–9   10–19   20–29   30–39   40–49   50–99   100+   Total
111   225     43      21      18      21      7      446

Fall 2020 Barnard class sizes from spreadsheet
2–9   10–19   20–29   30–39   40–49   50–99   100+   Total
104   225     65      20      13      26      5      458
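
The percentages quoted in footnote 2 can be checked against the tables above with a few lines of Python. The sketch assumes that the footnote's figures pool Fall 2019 and Fall 2020; the footnote does not say so explicitly, but pooling the two years reproduces the stated values.

```python
# Class counts copied from the tables above.
# Columns: 2–9, 10–19, 20–29, 30–39, 40–49, 50–99, 100+
sheet_2019 = [111, 225, 43, 21, 18, 21, 7]
sheet_2020 = [104, 225, 65, 20, 13, 26, 5]
cds_2019 = [170, 292, 46, 23, 17, 26, 10]
cds_2020 = [114, 254, 46, 29, 16, 33, 8]

# Pool the two years of spreadsheet counts (an assumption, see above).
combined = [a + b for a, b in zip(sheet_2019, sheet_2020)]
total = sum(combined)           # 904 classes
under_20 = sum(combined[:2])    # 665 classes in the 2–9 and 10–19 ranges
fifty_plus = sum(combined[5:])  # 59 classes in the 50–99 and 100+ ranges

print(round(100 * under_20 / total, 1))    # 73.6
print(round(100 * fifty_plus / total, 1))  # 6.5

# Share of Common Data Set classes missing from the spreadsheet count.
cds_total = sum(cds_2019) + sum(cds_2020)  # 1084 classes
print(round(1 - total / cds_total, 3))     # 0.166, about one-sixth
```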