
class dataframe(data=None, index=None, columns=None, dtype=None, copy=None, dtypes=None, nrows=None, **kwargs)[source]#

Bases: DataFrame

An extension of the pandas DataFrame with additional convenience methods for accessing rows and columns and performing other operations, such as adding rows.

  • data (dict/array/dataframe) – the data to use; passed to pd.DataFrame()

  • index (array) – the index to use; passed to pd.DataFrame()

  • columns (list) – column labels (if a dict is supplied, the value sets the dtype)

  • dtype (type) – a dtype for the whole dataframe; passed to pd.DataFrame()

  • dtypes (list/dict) – alternatively, list of data types to set each column to

  • nrows (int) – the number of rows to preallocate (default 0)

  • kwargs (dict) – if provided, treat these as data columns

Hint: Run the example below line by line to get a sense of how the dataframe changes.


df = sc.dataframe(cols=['x','y'], data=[[1238,2],[384,5],[666,7]]) # Create data frame
df['x'] # Print out a column
df[0] # Print out a row
df['x',0] # Print out an element
df[0,:] = [123,6]; print(df) # Set values for a whole row
df['y'] = [8,5,0]; print(df) # Set values for a whole column
df['z'] = [14,14,14]; print(df) # Add new column
df.rmcol('z'); print(df) # Remove a column
df.addcol('z', [14,14,14]); print(df) # Alternate way to add new column
df.poprow(1); print(df) # Remove a row
df.append([555,2,14]); print(df) # Append a new row
df.insertrow(1,[556,2,14]); print(df) # Insert a new row
df.sort(); print(df) # Sort by the first column
df.sort('y'); print(df) # Sort by the second column
df.findrow(123) # Return the row starting with value 123
df.rmrow(); print(df) # Remove last row
df.rmrow(555); print(df) # Remove the row starting with element '555'

# Direct setting of data
df = sc.dataframe(a=[1,2,3], b=[4,5,6])

The dataframe can be used for both numeric and non-numeric data.

New in version 2.0.0: subclass pandas DataFrame
New in version 3.0.0: “dtypes” argument; handling of item setting
New in version 3.1.0: use pandas’ equality operator by default (unless an exception is raised); new “equal” method; “cat” can be an instance method now



Attributes:

  • T – The transpose of the DataFrame.

  • at – Access a single value for a row/column label pair.

  • attrs – Dictionary of global attributes of this dataset.

  • axes – Return a list representing the axes of the DataFrame.

  • cols – Get columns as a list

  • columns – The column labels of the DataFrame.

  • dtypes – Return the dtypes in the DataFrame.

  • empty – Indicator whether Series/DataFrame is empty.

  • flags – Get the properties associated with this pandas object.

  • iat – Access a single value for a row/column pair by integer position.

  • iloc – Purely integer-location based indexing for selection by position.

  • index – The index (row labels) of the DataFrame.

  • loc – Access a group of rows and columns by label(s) or a boolean array.

  • ncols – Get the number of columns in the dataframe

  • ndim – Return an int representing the number of axes / array dimensions.

  • nrows – Get the number of rows in the dataframe

  • shape – Return a tuple representing the dimensionality of the DataFrame.

  • size – Return an int representing the number of elements in this object.

  • style – Returns a Styler object.

  • values – Return a Numpy representation of the DataFrame.


property cols#

Get columns as a list


Set dtypes in-place (see df.astype() for the user-facing version)

New in version 3.0.0.

col_index(col=None, *args, die=True)[source]#

Get the index of the column named col.

Similar to df.columns.get_loc(col), and opposite of df.col_name.

  • col (str/list) – the column(s) to get the index of (return 0 if None)

  • args (list) – additional column(s) to get the index of

  • die (bool) – whether to raise an exception if the column could not be found (else, return None)


df = sc.dataframe(dict(a=[1,2,3], b=[4,5,6], c=[7,8,9]))
df.col_index('b') # Returns 1
df.col_index(1) # Returns 1
df.col_index('a', 'c') # Returns [0, 2]

New in version 3.0.0: renamed from “_sanitizecols”; multiple arguments

col_name(col=None, *args, die=True)[source]#

Get the name of the column(s) with index col.

Similar to df.columns[col], and opposite of df.col_index.

Note: This method always looks for named columns first. If col is the name of a column, it will return col rather than columns[col]. See the example below for more information.

  • col (int/list) – the column(s) to get the name of (return 0 if None)

  • args (list) – additional column(s) to get the name of

  • die (bool) – whether to raise an exception if the column could not be found (else, return None)


df = sc.dataframe(dict(a=[1,2,3], b=[4,5,6], c=[7,8,9]))
df.col_name(1) # Returns 'b'
df.col_name('b') # Returns 'b'
df.col_name(0, 2) # Returns ['a', 'c']

New in version 3.0.0.
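The two lookups above are described as similar to df.columns.get_loc(col) and df.columns[col]. The following is a minimal sketch of that round trip using plain pandas only; it does not call Sciris, and the Sciris methods additionally accept multiple arguments and a die flag.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6], c=[7, 8, 9]))

# Label -> integer position (the direction col_index() covers)
idx_b = df.columns.get_loc('b')
idx_ac = [df.columns.get_loc(c) for c in ('a', 'c')]

# Integer position -> label (the direction col_name() covers)
name_1 = df.columns[1]
names_02 = list(df.columns[[0, 2]])

print(idx_b, idx_ac, name_1, names_02)
```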


Alias to pandas __getitem__ method; rarely used

set(key, value=None)[source]#

Alias to pandas __setitem__ method; rarely used

flexget(cols=None, rows=None, asarray=False, cast=True, default=None)[source]#

More complicated way of getting data from a dataframe. While getting directly by key usually returns the array data directly, this usually returns another dataframe.

  • cols (str/list) – the column(s) to get

  • rows (int/list) – the row(s) to get

  • asarray (bool) – whether to return an array (otherwise, return a dataframe)

  • cast (bool) – attempt to cast to an all-numeric array

  • default (any) – the value to return if the column(s)/row(s) can’t be found


df = sc.dataframe(cols=['x','y','z'],data=[[1238,2,-1],[384,5,-2],[666,7,-3]]) # Create data frame
df.flexget(cols=['x','z'], rows=[0,2])

classmethod equal(*args, equal_nan=True)[source]#

Class method that returns True or False using a more robust equality check: same type, size, columns, and values. See df.equals() for the equivalent instance method.


df1 = sc.dataframe(a=[1, 2, np.nan])
df2 = sc.dataframe(a=[1, 2, 4])

sc.dataframe.equal(df1, df1) # Returns True
sc.dataframe.equal(df1, df1, equal_nan=False) # Returns False
sc.dataframe.equal(df1, df2) # Returns False
sc.dataframe.equal(df1, df1, df2) # Also returns False

New in version 3.1.0.
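The equal_nan flag matters because NaN never compares equal to itself element-wise. As a rough illustration in plain pandas (not the Sciris implementation), compare the built-in DataFrame.equals(), which treats NaNs in the same location as equal, with a naive element-wise comparison:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(dict(a=[1, 2, np.nan]))
df2 = pd.DataFrame(dict(a=[1, 2, 4]))

# pandas' equals() treats NaNs in the same position as equal
same_with_nan = df1.equals(df1)

# Element-wise ==: NaN != NaN, so the last element compares False
elementwise = bool((df1 == df1).all().all())

# Different values are unequal either way
differ = df1.equals(df2)

print(same_with_nan, elementwise, differ)
```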

equals(other, *args, equal_nan=True)[source]#

Try the default equals(), but fall back on the more robust sc.dataframe.equal() if that fails.

New in version 3.1.0.

disp(nrows=None, ncols=None, width=999, precision=4, options=None, **kwargs)[source]#

Flexible display of a dataframe, showing all rows/columns by default.

  • nrows (int) – maximum number of rows to show (default: all)

  • ncols (int) – maximum number of columns to show (default: all)

  • width (int) – maximum screen width (default: 999)

  • precision (int) – number of decimal places to show (default: 4)

  • options (dict) – an optional dictionary of additional options, passed to pd.option_context()

  • kwargs (dict) – also passed to pd.option_context(), with ‘display.’ prepended if needed


df = sc.dataframe(data=np.random.rand(100,10))
df.disp(precision=1, ncols=5, colheader_justify='left')

New in version 2.0.1.
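Since the options are passed to pd.option_context(), the effect can be sketched with pandas alone. The option names below are standard pandas display options; how disp() maps its arguments onto them is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100, 10))

# Temporarily show all rows, at most 5 columns, and 1 decimal place
with pd.option_context('display.max_rows', None,
                       'display.max_columns', 5,
                       'display.width', 999,
                       'display.precision', 1):
    full_text = repr(df)

# Outside the context manager, the default (truncated) display applies again
default_text = repr(df)

print(len(full_text.splitlines()), len(default_text.splitlines()))
```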

replacedata(newdata=None, newdf=None, reset_index=True, inplace=True)[source]#

Replace data in the dataframe with other data; usually not used directly by the user, but used as part of e.g. df.concat().

  • newdata (array) – replace the dataframe’s data with these data

  • newdf (dataframe) – substitute the current dataframe with this one

  • reset_index (bool) – update the index

  • inplace (bool) – whether to modify in-place

New in version 3.0.0: improved dtype handling

appendrow(row, reset_index=True, inplace=True)[source]#

Add row(s) to the end of the dataframe.

See also df.concat() and df.insertrow(). Similar to the pandas operation df.iloc[-1] = ..., but faster and provides additional type checking.

  • row (array) – the row(s) to append

  • reset_index (bool) – update the index

  • inplace (bool) – whether to modify in-place

Note: “appendrow” and “concat” are equivalent, except appendrow() defaults to modifying in-place and “concat” defaults to returning a new dataframe.

Warning: modifying dataframes in-place is quite inefficient. For highest performance, construct the data in large chunks and then add to the dataframe all at once, rather than adding row by row.


import sciris as sc
import numpy as np

df = sc.dataframe(dict(
    a = ['foo','bar'],
    b = [1,2],
    c = np.random.rand(2)
))
df.appendrow(['cat', 3, 0.3])           # Append a list
df.appendrow(dict(a='dog', b=4, c=0.7)) # Append a dict

New in version 3.0.0: renamed “value” to “row”; improved performance
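The warning above about adding rows one at a time can be illustrated in plain pandas. This sketch (which does not use Sciris) builds the same result twice: once with one concatenation per row, and once by collecting the rows first and concatenating a single time, which is the pattern the warning recommends.

```python
import pandas as pd

df = pd.DataFrame(dict(a=['foo', 'bar'], b=[1, 2]))
new_rows = [dict(a='cat', b=3), dict(a='dog', b=4)]

# Slow pattern: each iteration copies the whole dataframe
slow = df.copy()
for row in new_rows:
    slow = pd.concat([slow, pd.DataFrame([row])], ignore_index=True)

# Preferred pattern: build the chunk once, then concatenate once
fast = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)

print(fast)
```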

append(row, reset_index=True, inplace=True)[source]#

Alias to appendrow().

Note: pd.DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; see pandas-dev/pandas#35407 for details. Since this method is implemented using pd.concat(), it does not suffer from the performance problems that append did.

New in version 3.0.0.

insertrow(index=0, value=None, reset_index=True, inplace=True, **kwargs)[source]#

Insert row(s) at the specified location. See also df.concat() and df.appendrow().

  • index (int) – index at which to insert new row(s)

  • value (array) – the row(s) to insert

  • reset_index (bool) – update the index

  • inplace (bool) – whether to modify in-place

  • kwargs (dict) – passed to df.concat()

Warning: modifying dataframes in-place is quite inefficient. For highest performance, construct the data in large chunks and then add to the dataframe all at once, rather than adding row by row.


import sciris as sc
import numpy as np

df = sc.dataframe(dict(
    a = ['foo','cat'],
    b = [1,3],
    c = np.random.rand(2)
))
df.insertrow(1, ['bar', 2, 0.2])           # Insert a list
df.insertrow(0, dict(a='rat', b=0, c=0.7)) # Insert a dict

New in version 3.0.0: renamed “row” to “index”
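One way to picture a positional insert is as a concatenation of three pieces: the rows before the position, the new row(s), and the rows after. The plain-pandas sketch below illustrates that idea; it is not necessarily how insertrow() is implemented.

```python
import pandas as pd

df = pd.DataFrame(dict(a=['foo', 'cat'], b=[1, 3]))
new = pd.DataFrame([dict(a='bar', b=2)])
pos = 1  # Position at which to insert the new row

# Rows before pos, then the new row, then the remaining rows
out = pd.concat([df.iloc[:pos], new, df.iloc[pos:]], ignore_index=True)

print(out)
```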

concat(data, *args, columns=None, reset_index=True, inplace=False, dfargs=None, **kwargs)[source]#

Concatenate additional data onto the current dataframe.

Similar to df.appendrow() and df.insertrow(); see also sc.dataframe.cat() for the equivalent class method.

  • data (dataframe/array) – the data to concatenate

  • *args (dataframe/array) – additional data to concatenate

  • columns (list) – if supplied, columns to go with the data

  • reset_index (bool) – update the index

  • inplace (bool) – whether to append in place

  • dfargs (dict) – arguments passed to construct each dataframe

  • **kwargs (dict) – passed to pd.concat()


arr1 = np.random.rand(6,3)
df2 = sc.dataframe(np.random.rand(4,3))
df3 = df2.concat(arr1)

New in version 2.0.2: “inplace” defaults to False
New in version 3.0.0: improved type handling

classmethod cat(data, *args, dfargs=None, **kwargs)[source]#

Convenience class method for concatenating multiple dataframes. See df.concat() for the equivalent instance method.

  • data (dataframe/array) – the dataframe/data to use as the basis of the new dataframe

  • args (list) – additional dataframes (or object that can be converted to dataframes) to concatenate

  • dfargs (dict) – arguments passed to construct each dataframe

  • kwargs (dict) – passed to df.concat()


arr1 = np.random.rand(6,3)
df2 = pd.DataFrame(np.random.rand(4,3))
df3 = sc.dataframe.cat(arr1, df2)

New in version 2.0.2.

merge(*args, reset_index=True, inplace=False, **kwargs)[source]#

Alias to pd.merge, except with the option to merge in place.

  • reset_index (bool) – update the index

  • inplace (bool) – whether to merge in place

  • **kwargs (dict) – passed to pd.merge()

New in version 3.0.0.


df = sc.dataframe(dict(x=[1,2,3], y=[4,5,6]))
df2 = sc.dataframe(dict(x=[1,2,3], z=[9,8,7]))
df.merge(df2, on='x', inplace=True)
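For reference, the underlying pd.merge call on the same data looks like this in plain pandas. Unlike the in-place variant above, pd.merge always returns a new dataframe.

```python
import pandas as pd

df = pd.DataFrame(dict(x=[1, 2, 3], y=[4, 5, 6]))
df2 = pd.DataFrame(dict(x=[1, 2, 3], z=[9, 8, 7]))

# Join the two dataframes on the shared key column 'x'
merged = pd.merge(df, df2, on='x')

print(merged)
```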
property ncols#

Get the number of columns in the dataframe

property nrows#

Get the number of rows in the dataframe

addcol(key=None, value=None, data=None, inplace=True, **kwargs)[source]#

Add new column(s) to the data frame

See also assign(), which is similar, but returns a new dataframe by default.

  • key (str) – the name of the column

  • value (array) – the values for the column

  • data (dict) – alternatively, specify a dictionary of columns to add

  • inplace (bool) – whether to modify in-place (otherwise, return a new dataframe)

  • kwargs (dict) – additional columns to add

NB: a single argument is interpreted as “data”


df = sc.dataframe(dict(x=[1,2,3], y=[4,5,6]))
new_cols = dict(z=[1,2,3], a=[9,8,7])
df.addcol(new_cols) # A single argument is interpreted as "data"

popcols(col=None, *args, die=True)[source]#

Remove a column or columns from the data frame.

Alias to pop(), except allowing multiple columns to be popped.

  • col (str/list) – the column(s) to be popped

  • args (list) – additional columns to pop

  • die (bool) – whether to raise an exception if a column is not found


df = sc.dataframe(cols=['a','b','c','d'], data=np.random.rand(3,4))
df.popcols('a', 'c') # Pop columns 'a' and 'c'

findind(value=None, col=None, closest=False, die=True)[source]#

Find the row index for a given value and column.

See df.findrow() for the equivalent to return the row itself rather than the index of the row. See df.col_index() for the column equivalent.

  • value (any) – the value to look for (default: return last row index)

  • col (str) – the column to look in (default: first)

  • closest (bool) – if true, return the closest match if an exact match is not found

  • die (bool) – whether to raise an exception if the value is not found (otherwise, return None)


df = sc.dataframe(data=[[2016,0.3],[2017,0.5]], columns=['year','val'])
df.findind(2016) # returns 0
df.findind(0.5, 'val') # returns 1
df.findind(2013) # returns None, or exception if die is True
df.findind(2013, closest=True) # returns 0

New in version 3.0.0: renamed from “_rowindex”
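The closest option can be understood as picking the row whose value has the smallest absolute difference from the target. The helper below, closest_row_index, is a hypothetical name introduced for this sketch and is written with plain pandas and NumPy; it is not the Sciris implementation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[2016, 0.3], [2017, 0.5]], columns=['year', 'val'])

def closest_row_index(frame, value, col):
    """Hypothetical helper: position of the row whose `col` is nearest to `value`."""
    distances = np.abs(frame[col].to_numpy() - value)
    return int(np.argmin(distances))

ind_year = closest_row_index(df, 2013, 'year')  # 2016 is nearest to 2013
ind_val = closest_row_index(df, 0.45, 'val')    # 0.5 is nearest to 0.45

print(ind_year, ind_val)
```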

poprow(row=-1, returnval=True)[source]#

Remove a row from the data frame.

Alias to drop, except drop by position rather than label, and modify in-place. To pop multiple rows, see df.poprows().

  • row (int) – index of the row to pop

  • returnval (bool) – whether to return the row that was popped

To pop a column, see df.pop().

New in version 3.0.0: “key” argument renamed “row”
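The difference between dropping by label and dropping by position shows up when the index is not a simple range. A plain-pandas sketch of the positional version (pandas' drop() itself expects labels, so the position is first translated into a label):

```python
import pandas as pd

df = pd.DataFrame(dict(x=[10, 20, 30]), index=['a', 'b', 'c'])
pos = 1  # Integer position of the row to remove

popped = df.iloc[pos]               # The row being removed
remaining = df.drop(df.index[pos])  # Translate position -> label, then drop

print(popped['x'], list(remaining.index))
```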

poprows(inds=-1, value=None, col=None, reset_index=True, inplace=True, **kwargs)[source]#

Remove multiple rows by index or value

To pop a single row, see df.poprow().

  • inds (list) – the rows to remove

  • value (list) – alternatively, search for these values to remove; see df.findinds() for details

  • col (str) – if removing by value, use this column to find the values

  • reset_index (bool) – update the index

  • inplace (bool) – whether to modify in-place

  • kwargs (dict) – passed to df.findinds


df = sc.dataframe(np.random.rand(10,3))

df = sc.dataframe(dict(x=[0,1,2,3,4], y=[2,3,2,7,8]))
df.poprows(value=2, col='y')

enumrows(cols=None, type='objdict')[source]#

Efficiently enumerate the rows of the dataframe

Similar to df.iterrows(), but up to 30x faster since it uses tuples instead of pd.Series.

  • cols (list) – the list of columns to include in the enumeration (by default, all)

  • type (str/type) – the output type for each row: options are ‘objdict’ (default), tuple (fastest), list (very fast), dict (pretty fast)


df = sc.dataframe(dict(x=[0,1,2,3,4], y=[2,3,2,7,8], z=[5,5,4,3,2]))
for i,row in df.enumrows(): print(i, row.x+row.y) # Typical use case
for i,row in df.enumrows(type=tuple): print(i, row[0]+row[1]) # Fastest
for i,row in df.enumrows(type=dict): print(i, row['x']+row['y']) # Still fast
for i,(x,y) in df.enumrows(cols=['x', 'y'], type=tuple): print(i, x+y) # Even faster
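The speed-up comes from avoiding the construction of one pd.Series per row. The same trade-off exists in plain pandas between iterrows() and itertuples(), sketched here without Sciris; both loops compute the same sums.

```python
import pandas as pd

df = pd.DataFrame(dict(x=[0, 1, 2, 3, 4], y=[2, 3, 2, 7, 8], z=[5, 5, 4, 3, 2]))

# iterrows() builds a pd.Series for every row (convenient but slow)
sums_series = [int(row.x + row.y) for _, row in df.iterrows()]

# itertuples() yields plain tuples; restricting the columns reduces work further
sums_tuples = [int(x + y) for x, y in df[['x', 'y']].itertuples(index=False, name=None)]

print(sums_series, sums_tuples)
```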
replacecol(col=None, old=None, new=None)[source]#

Replace all of one value in a column with a new value


Convert dataframe to a dict of columns, optionally specifying certain rows.


  • row (int/list) – the rows to include

findrow(value=None, col=None, default=None, closest=False, asdict=False, die=False)[source]#

Return a row by searching for a matching value.

See df.findind() for the equivalent to return the index of the row rather than the row itself, and df.findinds() to find multiple row indices.

  • value (any) – the value to look for

  • col (str) – the column to look for this value in

  • default (any) – the value to return if value is not found (overrides die)

  • closest (bool) – whether or not to return the closest row (overrides default and die)

  • asdict (bool) – whether to return results as dict rather than list

  • die (bool) – whether to raise an exception if the value is not found


df = sc.dataframe(cols=['year','val'],data=[[2016,0.3],[2017,0.5], [2018, 0.3]])
df.findrow(2016) # returns array([2016, 0.3], dtype=object)
df.findrow(2013) # returns None, or exception if die is True
df.findrow(2013, closest=True) # returns array([2016, 0.3], dtype=object)
df.findrow(2016, asdict=True) # returns {'year':2016, 'val':0.3}

findinds(value=None, col=None, **kwargs)[source]#

Return the indices of all rows matching the given value in a given column.

  • value (any) – the value to look for

  • col (str) – the column to look in

  • kwargs (dict) – passed to sc.findinds()


df = sc.dataframe(cols=['year','val'],data=[[2016,0.3],[2017,0.5], [2018, 0.3]])
df.findinds(0.3, 'val') # Returns array([0,2])
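A comparable lookup in plain pandas and NumPy, shown as a sketch under the assumption of exact matching (any tolerance options handled by sc.findinds() are not reproduced here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[2016, 0.3], [2017, 0.5], [2018, 0.3]], columns=['year', 'val'])

# Boolean mask of matching rows, converted to integer positions
matches = np.flatnonzero(df['val'].to_numpy() == 0.3)

print(matches)
```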
filterin(inds=None, value=None, col=None, verbose=False, reset_index=True, inplace=False)[source]#

Keep only rows matching a criterion; see also df.filterout()

filterout(inds=None, value=None, col=None, verbose=False, reset_index=True, inplace=False)[source]#

Remove rows matching a criterion (not in place by default; set inplace=True to modify the dataframe); see also df.filterin()
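Filtering rows in or out by value corresponds to applying a boolean mask or its inverse. A plain-pandas sketch of the two complementary operations follows; the variable names are illustrative and the Sciris methods are not called.

```python
import pandas as pd

df = pd.DataFrame(dict(x=[0, 1, 2, 3, 4], y=[2, 3, 2, 7, 8]))

mask = df['y'] == 2  # Rows where column 'y' equals 2

kept = df[mask].reset_index(drop=True)      # Analogous to filtering in
removed = df[~mask].reset_index(drop=True)  # Analogous to filtering out

print(kept)
print(removed)
```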

filtercols(cols=None, *args, keep=True, die=True, reset_index=True, inplace=False)[source]#

Filter columns, keeping only those specified. Note: by default, this is not performed in place.

  • cols (str/list) – the columns to keep (or remove if keep=False)

  • args (list) – additional columns

  • keep (bool) – whether to keep the named columns (else, remove them)

  • die (bool) – whether to raise an exception if a column is not found

  • reset_index (bool) – update the index

  • inplace (bool) – whether to modify in-place


df = sc.dataframe(cols=['a','b','c','d'], data=np.random.rand(3,4))
df2 = df.filtercols('a','b') # Keeps columns 'a' and 'b'
df3 = df.filtercols('a','c', keep=False) # Keeps columns 'b' and 'd'
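The keep flag switches between selecting the named columns and dropping them. In plain pandas these are two different calls, sketched here for comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 4), columns=['a', 'b', 'c', 'd'])

kept = df[['a', 'b']]                    # Keep only the named columns
dropped = df.drop(columns=['a', 'c'])    # Remove the named columns instead

print(list(kept.columns), list(dropped.columns))
```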
sortrows(by=None, reverse=False, returninds=False, reset_index=True, inplace=True, **kwargs)[source]#

Sort the dataframe rows in place by the specified column(s).

Similar to df.sort_values(), except defaults to sorting in place, and optionally returns the indices used for sorting (like np.argsort()).

  • by (str/int) – column to sort by (default, first column)

  • reverse (bool) – whether to reverse the sort order (i.e., ascending=False)

  • returninds (bool) – whether to return the indices used to sort instead of the dataframe

  • reset_index (bool) – update the index

  • inplace (bool) – whether to modify the dataframe in-place

  • kwargs (dict) – passed to df.sort_values()

New in version 3.0.0: “inplace” argument; “col” argument renamed “by”
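The relationship between sorting and the indices used for sorting can be sketched with plain pandas and NumPy. This shows df.sort_values() alongside np.argsort(), the two functions the description refers to; it is not the Sciris implementation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(x=[3, 1, 2], y=['c', 'a', 'b']))

# Indices that would sort column 'x' (what returninds refers to)
inds = np.argsort(df['x'].to_numpy(), kind='stable')

# Sorted copy with a fresh 0..n-1 index (pandas does not sort in place by default)
sorted_df = df.sort_values('x').reset_index(drop=True)

# Applying the indices by position gives the same ordering
via_inds = df.iloc[inds].reset_index(drop=True)

print(inds, list(sorted_df['y']))
```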

sort(by=None, reverse=False, returninds=False, inplace=True, **kwargs)[source]#

Alias to sortrows().

New in version 3.0.0.

sortcols(sortorder=None, reverse=False, inplace=True)[source]#

Like sortrows(), but change column order (usually in place) instead.

  • sortorder (list) – the list of indices to resort the columns by (if none, then alphabetical)

  • reverse (bool) – whether to reverse the order

  • inplace (bool) – whether to modify the dataframe in-place

New in version 3.0.0: Ensure dtypes are preserved; “inplace” argument; “returninds” argument removed


Convert to a plain pandas dataframe

classmethod read_csv(*args, **kwargs)[source]#

Alias to pd.read_csv(), returning a Sciris dataframe

classmethod read_excel(*args, **kwargs)[source]#

Alias to pd.read_excel(), returning a Sciris dataframe