Welcome to pythologist-reader’s documentation!

Modules

class pythologist_reader.CellProjectGeneric(h5path, mode='r')[source]
append_sample(sample)[source]

Append sample to the project

Parameters:sample (CellSampleGeneric) – sample object
cdf

Return the pythologist.CellDataFrame of the project

channel_image_dataframe

dataframe with info about channels and images

frame_iter()[source]

An iterator of CellFrameGeneric

get_image(sample_id, frame_id, image_id)[source]

Get an image by sample frame and image id

Parameters:
  • sample_id (str) – unique sample id
  • frame_id (str) – unique frame id
  • image_id (str) – unique image id
Returns:

2d image array

Return type:

numpy.array

get_sample(sample_id)[source]

Get a sample by its sample_id

Parameters:sample_id (str) – the sample id
id

Returns the (str) UUID4 string

key

Get info about the project

microns_per_pixel

Return or set the (float) microns_per_pixel

project_name

Return or set the (str) project_name

qc(*args, **kwargs)[source]
Returns:QC class to do quality checks
Return type:QC
sample_ids

Return the list of sample_ids

sample_iter()[source]

An iterator of CellSampleGeneric

set_id(name)[source]

Set the project ID

Parameters:name (str) – project_id
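
As a rough illustration of how the members documented above fit together, the sketch below counts frames per sample using only sample_ids, get_sample and frame_ids. The helper name frames_per_sample and the two Fake classes are hypothetical stand-ins introduced here so the example runs without an h5 file; they are not part of pythologist_reader.

```python
def frames_per_sample(project):
    """Map each sample id to its number of frames, using only documented members."""
    counts = {}
    for sample_id in project.sample_ids:
        sample = project.get_sample(sample_id)
        counts[sample_id] = len(sample.frame_ids)
    return counts


# Hypothetical stand-ins so the sketch runs without an h5 file or the package.
class FakeSample:
    def __init__(self, frame_ids):
        self.frame_ids = frame_ids


class FakeProject:
    def __init__(self, samples):
        self._samples = samples

    @property
    def sample_ids(self):
        return list(self._samples)

    def get_sample(self, sample_id):
        return self._samples[sample_id]


project = FakeProject({"s1": FakeSample(["f1", "f2"]), "s2": FakeSample(["f3"])})
print(frames_per_sample(project))  # {'s1': 2, 's2': 1}
```

In principle the same helper could be passed a real project object opened from an h5 file, since it relies on nothing beyond the attributes listed in this class.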
class pythologist_reader.formats.inform.sets.CellProjectInForm(h5path, mode='r')[source]
gates

Get all the gates from the frames / samples in the project

qc(*args, **kwargs)[source]

Returns: QC: QC class to do quality checks

read_path(path, project_name=None, sample_name_index=None, channel_abbreviations=None, verbose=False, require=True, require_score=True, microns_per_pixel=None, **kwargs)[source]

Read in the project folder

Parameters:
  • path (str) – location of the project directory
  • project_name (str) – name of the project
  • sample_name_index (int) – position in the directory chain of the folder name to use as the sample name; -1 is the last directory. If not set, the full path is used
  • channel_abbreviations (dict) – dictionary of shortcuts to translate to simpler channel names
  • verbose (bool) – if true print extra details
  • require (bool) – if true (default), require that channel component image be present
  • require_score (bool) – if true (default), require there be a score file in the data
  • microns_per_pixel (float) – conversion factor
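
The sample_name_index parameter can be pictured as an index into the parts of a directory path. The sketch below mirrors the documented behaviour only; the function name sample_name_from_path and the example paths are hypothetical, and the library's own implementation may differ in detail.

```python
from pathlib import PurePosixPath


def sample_name_from_path(frame_dir, sample_name_index=None):
    """Pick a sample name from the directory chain of `frame_dir`.

    Illustrative only: follows the documented behaviour, not the library's code.
    """
    path = PurePosixPath(frame_dir)
    if sample_name_index is None:
        # No index given: fall back to the full path as the sample name.
        return str(path)
    # Negative indices count from the end, so -1 is the last directory.
    return path.parts[sample_name_index]


print(sample_name_from_path("/data/project/patient1/slideA", -1))  # slideA
print(sample_name_from_path("/data/project/patient1/slideA", -2))  # patient1
print(sample_name_from_path("/data/project/patient1/slideA"))
```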
class pythologist_reader.formats.inform.custom.CellProjectInFormCustomMask(h5path, mode='r')[source]

Read in a project that has a region set by a custom hand-drawn area

Accessed via read_path with the additional parameters

read_path(*args, **kwargs)[source]

Read in the project folder

Parameters:
  • path (str) – location of the project directory
  • project_name (str) – name of the project
  • sample_name_index (int) – position in the directory chain of the folder name to use as the sample name; -1 is the last directory. If not set, the full path is used
  • channel_abbreviations (dict) – dictionary of shortcuts to translate to simpler channel names
  • verbose (bool) – if true print extra details
  • require (bool) – if true (default), require that channel component image be present
  • microns_per_pixel (float) – conversion factor
  • custom_mask_name (str) – the mask name that will end in <maskname>.tif
  • other_mask_name (str) – what you want to call areas not contained in your custom mask
class pythologist_reader.formats.inform.custom.CellProjectInFormLineArea(h5path, mode='r')[source]

Read in a project that has a region set by a custom hand-drawn area, and a margin set by a line

Accessed via read_path with the additional parameters

read_path(*args, **kwargs)[source]

Read in the project folder

Parameters:
  • path (str) – location of the project directory
  • project_name (str) – name of the project
  • sample_name_index (int) – position in the directory chain of the folder name to use as the sample name; -1 is the last directory. If not set, the full path is used
  • channel_abbreviations (dict) – dictionary of shortcuts to translate to simpler channel names
  • verbose (bool) – if true print extra details
  • require (bool) – if true (default), require that channel component image be present
  • require_score (bool) – if true (default), require that score be present
  • microns_per_pixel (float) – conversion factor
  • steps (int) – how many pixels out from the hand drawn line to consider the margin
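
To make the steps parameter concrete, the sketch below grows a drawn line outward by a number of pixels on a small grid. It assumes 8-connected growth (one ring of neighbouring pixels per step), which is an assumption made for illustration; the function name grow_line is hypothetical and the library may define the margin differently.

```python
def grow_line(line_pixels, steps, shape):
    """Return the set of (row, col) pixels within `steps` pixels of the line.

    Assumption: growth is 8-connected, adding one ring of neighbours per step.
    """
    rows, cols = shape
    margin = set(line_pixels)
    frontier = set(line_pixels)
    for _ in range(steps):
        next_frontier = set()
        for r, c in frontier:
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and (nr, nc) not in margin:
                        next_frontier.add((nr, nc))
        margin |= next_frontier
        frontier = next_frontier
    return margin


# A vertical line down column 3 of a 5x7 image, grown by 2 pixels.
line = {(r, 3) for r in range(5)}
margin = grow_line(line, steps=2, shape=(5, 7))
print(sorted({c for _, c in margin}))  # [1, 2, 3, 4, 5]
```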
pythologist_reader.formats.inform.immunoprofile.read_InFormImmunoProfileV1(path, save_FOXP3_intermediate_h5=None, save_PD1_PDL1_intermediate_h5=None, channel_abbreviations={'CD8 (Opal 480)': 'CD8', 'Cytokeratin (Opal 690)': 'CYTOKERATIN', 'Foxp3 (Opal 570)': 'FOXP3', 'PD-1 (Opal 620)': 'PD1', 'PD-L1 (Opal 520)': 'PDL1'}, grow_margin_steps=40, microns_per_pixel=0.496, project_name='ImmunoProfileV1', project_id_is_project_name=True, skip_margin=False, auto_fix_phenotypes=True, verbose=False, tempdir=None)[source]

Read the InForm Exports from ImmunoProfile and merge them into a single CellDataFrame.

This read takes place by reading in the two Exports separately, then doing a QC, then combining them.

If identical segmentation between the exports can be guaranteed upstream, then this method could be modified to read the data into a single intermediate file.

Structure directories as the following input for TEST_READ:

TEST_READ/
├── IP-99-A00001
│   └── INFORM_ANALYSIS
│       ├── FOXP3
│       ├── GIMP
│       └── PD1_PDL1
├── IP-99-A00002
│   └── INFORM_ANALYSIS
│       ├── FOXP3
│       ├── GIMP
│       └── PD1_PDL1
└── IP-99-A00003
    └── INFORM_ANALYSIS
        ├── FOXP3
        ├── GIMP
        └── PD1_PDL1

Or a single sample such as IP-99-A00001:

IP-99-A00001/
└── INFORM_ANALYSIS
    ├── FOXP3
    ├── GIMP
    └── PD1_PDL1
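
Before running a read, it can help to confirm that a folder follows the layout shown above. The sketch below is a hypothetical pre-flight check, not part of pythologist_reader; it treats all three sub-folders as required, which is an assumption, and it builds a throwaway copy of the layout in a temporary directory so it can run anywhere.

```python
import tempfile
from pathlib import Path

EXPECTED_EXPORTS = ("FOXP3", "GIMP", "PD1_PDL1")


def missing_export_dirs(sample_dir):
    """List the expected sub-folders absent from <sample_dir>/INFORM_ANALYSIS."""
    base = Path(sample_dir) / "INFORM_ANALYSIS"
    return [name for name in EXPECTED_EXPORTS if not (base / name).is_dir()]


def check_layout(path):
    """Map each sample directory name to its list of missing sub-folders."""
    root = Path(path)
    if (root / "INFORM_ANALYSIS").is_dir():
        samples = [root]  # `path` is a single sample
    else:
        samples = sorted(p for p in root.iterdir() if p.is_dir())
    return {s.name: missing_export_dirs(s) for s in samples}


# Build a temporary copy of the layout, leaving one folder out on purpose.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "TEST_READ"
    for sample in ("IP-99-A00001", "IP-99-A00002"):
        for export in EXPECTED_EXPORTS:
            (root / sample / "INFORM_ANALYSIS" / export).mkdir(parents=True)
    (root / "IP-99-A00002" / "INFORM_ANALYSIS" / "GIMP").rmdir()
    report = check_layout(root)

print(report)  # {'IP-99-A00001': [], 'IP-99-A00002': ['GIMP']}
```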
Parameters:
  • path (str) – location of the ImmunoProfile sample or folder of samples
  • save_FOXP3_intermediate_h5 (str) – path to save the FOXP3 export images as h5. Keep this one if you want to tie the CellDataFrame to the images.
  • save_PD1_PDL1_intermediate_h5 (str) – path to save the PD1_PDL1 export images as h5. Probably do not save this unless you are trying to debug a failed import.
  • channel_abbreviations (dict) – convert stain names to abbreviations
  • grow_margin_steps (int) – number of pixels to grow the margin
  • microns_per_pixel (float) – conversion factor for pixels to microns
  • project_name (str) – name of the project
  • verbose (bool) – if true print extra details
  • skip_margin (bool) – if false (default), read in the margin line and define a margin according to steps. If true, only read a tumor and stroma.
  • auto_fix_phenotypes (bool) – if true (default), automatically try to fill in any missing phenotypes with zero-values. This most commonly happens when there are no CD8 cells on an image and thus the image is not phenotyped for them.
  • project_id_is_project_name (bool) – if true (default), make the project_id be the same as your project_name. This will make concatenating sample dataframes simpler.
Returns:

Pass, Fail (tuple of CellDataFrames). Pass is the CellDataFrame to use, which is based on the FOXP3 intermediate h5 and has PD1 and PDL1 scoring added to it. Fail should be empty if the Exports were equivalent.
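
A call to this function might look like the sketch below. The wrapper name read_immunoprofile and the helper pixels_to_microns are hypothetical; only the keyword arguments and the two-item return value come from the signature and description above. The import sits inside the wrapper so the snippet can be loaded where pythologist_reader is not installed.

```python
def pixels_to_microns(pixels, microns_per_pixel=0.496):
    """Convert a pixel distance to microns using the documented default factor."""
    return pixels * microns_per_pixel


def read_immunoprofile(path, foxp3_h5=None, verbose=False):
    """Read one sample (or a folder of samples) and return the (passed, failed) pair."""
    # Imported here so the sketch loads even where the package is not installed.
    from pythologist_reader.formats.inform.immunoprofile import (
        read_InFormImmunoProfileV1,
    )

    passed, failed = read_InFormImmunoProfileV1(
        path,
        save_FOXP3_intermediate_h5=foxp3_h5,
        verbose=verbose,
    )
    return passed, failed


# With the defaults, growing the margin by 40 pixels is about 19.84 microns.
print(round(pixels_to_microns(40), 2))
```

Passing a path for foxp3_h5 keeps the intermediate file that ties the resulting CellDataFrame to the images, as noted in the parameter description.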

class pythologist_reader.formats.inform.immunoprofile.CellProjectInFormImmunoProfile(h5path, mode='r')[source]

Read in an ImmunoProfile sample that could have either a Tumor mask alone, or a Tumor mask and a hand-drawn margin. This will read the two projects into a cell project. This will only read in one InForm export at a time.

Accessed via read_path with the additional parameters

read_path(path, export_name=None, project_name=None, channel_abbreviations=None, verbose=False, require=True, require_score=True, microns_per_pixel=None, steps=40, skip_margin=False, **kwargs)[source]

Read in the project folder

Called by read_InFormImmunoProfileV1; see that function for detailed input descriptions.

Parameters:
  • path (str) – location of the project directory
  • export_name (str) – specify the name of the export to read (required)
  • project_name (str) – name of the project
  • channel_abbreviations (dict) – dictionary of shortcuts to translate to simpler channel names
  • verbose (bool) – if true print extra details
  • require (bool) – if true (default), require that channel component image be present
  • require_score (bool) – if true (default), require there be a score file in the data
  • microns_per_pixel (float) – conversion factor
  • steps (int) – number of pixels to grow the margin
  • skip_margin (bool) – if false (default), read in the margin line and define a margin according to steps. If true, only read a tumor and stroma.
class pythologist_reader.formats.inform.immunoprofile.CellSampleInFormImmunoProfile[source]
read_path(path, sample_name=None, channel_abbreviations=None, verbose=False, require=True, require_score=True, steps=76, skip_margin=False)[source]

Read in the project folder

Parameters:
  • path (str) – location of the project directory
  • project_name (str) – name of the project
  • sample_name_index (int) – position in the directory chain of the folder name to use as the sample name; -1 is the last directory. If not set, the full path is used
  • channel_abbreviations (dict) – dictionary of shortcuts to translate to simpler channel names
  • verbose (bool) – if true print extra details
  • require (bool) – if true (default), require that channel component image be present
  • require_score (bool) – if true (default), require that score file be present
  • microns_per_pixel (float) – conversion factor
class pythologist_reader.CellSampleGeneric[source]
cdf

Return the pythologist.CellDataFrame of the sample

frame_ids

Return the list of frame IDs

frame_iter()[source]

An iterator of frames

Returns:CellFrameGeneric
get_frame(frame_id)[source]
Parameters:frame_id (str) – the ID of the frame you want to access
Returns:the cell frame
Return type:CellFrameGeneric
id

Return the UUID4 str

key

Return a pandas.DataFrame of info about the sample

class pythologist_reader.CellFrameGeneric[source]

A generic CellFrameData object

binary_calls()[source]

Return all the binary feature calls (alias)

cdf

Return the pythologist.CellDataFrame of the frame

cell_map()[source]

Return a dataframe of cell IDs and locations

cell_map_image()[source]

Return an image of cells by ID

edge_map()[source]

Return a dataframe of cells by ID, with coordinates only on the edges of the cells

edge_map_image()[source]

Return an image of edges of integers by ID

get_channels(all=False)[source]

Return a dataframe of the Channels

Parameters:all (bool) – default False; if set to True, also include excluded channels (like autofluorescence)
Returns:channel information
Return type:pandas.DataFrame
get_data(table_name)[source]

Get the data table

Parameters:table_name (str) – name of the table to access
get_image(image_id)[source]
Parameters:image_id (str) – get the image by this id
Returns:an image representing a 2d array
Return type:numpy.array
get_raw(feature_label, statistic_label, all=False, channel_abbreviation=True)[source]

Get the raw data

Parameters:
  • feature_label (str) – name of the feature
  • statistic_label (str) – name of the statistic to extract
  • all (bool) – default False; if True, output everything including excluded channels
  • channel_abbreviation (bool) – default True; use the abbreviations if available
Returns:

the dataframe

Return type:

pandas.DataFrame
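
The phrase "use the abbreviations if available" suggests a lookup that falls back to the full channel name when no abbreviation exists. The sketch below illustrates that idea with the default channel_abbreviations dictionary listed for read_InFormImmunoProfileV1; the function name abbreviate and the "DAPI" channel are hypothetical, and this is not the library's own code.

```python
CHANNEL_ABBREVIATIONS = {
    "CD8 (Opal 480)": "CD8",
    "Cytokeratin (Opal 690)": "CYTOKERATIN",
    "Foxp3 (Opal 570)": "FOXP3",
    "PD-1 (Opal 620)": "PD1",
    "PD-L1 (Opal 520)": "PDL1",
}


def abbreviate(channel_names, abbreviations, use_abbreviation=True):
    """Translate channel names, keeping the full name when no shortcut exists."""
    if not use_abbreviation:
        return list(channel_names)
    return [abbreviations.get(name, name) for name in channel_names]


names = ["CD8 (Opal 480)", "PD-L1 (Opal 520)", "DAPI"]
print(abbreviate(names, CHANNEL_ABBREVIATIONS))  # ['CD8', 'PDL1', 'DAPI']
print(abbreviate(names, CHANNEL_ABBREVIATIONS, use_abbreviation=False))
```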

id

Returns the project UUID4

interaction_map()[source]
Returns:return a dataframe of which cells are in contact with one another
Return type:pandas.DataFrame
phenotype_calls()[source]

Return all the binary feature calls

processed_image

Returns (numpy.array) of the processed_image

processed_image_id

Returns (str) id of the frame object

segmentation_info()[source]

Return a dataframe with info about segmentation like cell areas and circumferences

set_data(table_name, table)[source]

Set the data table

Parameters:
  • table_name (str) – the table name
  • table (pd.DataFrame) – the input table
set_interaction_map(touch_distance=1)[source]

Measure the cell-cell contact interactions

Parameters:touch_distance (int) – optional; distance to look away from a cell for another cell (default 1)
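
The idea behind touch_distance can be sketched on a small cell-ID map, where 0 is background and every other integer labels a cell. The example assumes a square search window around each pixel, which is an illustrative assumption; the function name contact_pairs is hypothetical and the library's measurement may differ.

```python
def contact_pairs(cell_map, touch_distance=1):
    """Return sorted pairs of distinct cell IDs lying within touch_distance pixels.

    Assumption: a square window of radius touch_distance around each cell pixel.
    """
    rows, cols = len(cell_map), len(cell_map[0])
    d = touch_distance
    pairs = set()
    for r in range(rows):
        for c in range(cols):
            a = cell_map[r][c]
            if a == 0:
                continue  # background pixel
            for nr in range(max(0, r - d), min(rows, r + d + 1)):
                for nc in range(max(0, c - d), min(cols, c + d + 1)):
                    b = cell_map[nr][nc]
                    if b != 0 and b != a:
                        pairs.add((min(a, b), max(a, b)))
    return sorted(pairs)


cell_map = [
    [1, 1, 0, 0, 3],
    [1, 2, 0, 0, 3],
    [2, 2, 0, 0, 0],
]
print(contact_pairs(cell_map))                    # [(1, 2)]
print(contact_pairs(cell_map, touch_distance=3))  # [(1, 2), (1, 3), (2, 3)]
```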
set_processed_image_id(image_id)[source]
Parameters:image_id (str) – set the id of the frame object
set_regions(regions, use_processed_region=True, unset_label='undefined', verbose=False)[source]

Alter the regions in the frame

Parameters:
  • regions (dict) – a dictionary of mutually exclusive region labels and binary masks. If a region does not cover all of the workable area, it will be the only label and the unused area will get the ‘unset_label’ as a different region
  • use_processed_region (bool) – default True; keep the processed region subtracted
  • unset_label (str) – name of unset regions; default ‘undefined’
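
The labeling rule described for regions and unset_label can be illustrated with plain nested lists. In the sketch below, the function name label_regions and the workable mask are hypothetical stand-ins for the frame's processed area; it only mirrors the documented rule that workable pixels outside every supplied mask receive the unset label.

```python
def label_regions(regions, workable, unset_label="undefined"):
    """Assign one region label per workable pixel.

    regions  -- dict of label -> 2D list of 0/1, assumed mutually exclusive
    workable -- 2D list of 0/1 marking pixels that may receive a label
    """
    rows, cols = len(workable), len(workable[0])
    labels = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not workable[r][c]:
                continue  # outside the workable area: left unlabeled
            hits = [name for name, mask in regions.items() if mask[r][c]]
            if len(hits) > 1:
                raise ValueError(f"regions overlap at {(r, c)}: {hits}")
            labels[r][c] = hits[0] if hits else unset_label
    return labels


workable = [[1, 1, 1, 0],
            [1, 1, 1, 0]]
regions = {
    "Tumor": [[1, 1, 0, 0],
              [1, 0, 0, 0]],
    "Stroma": [[0, 0, 0, 0],
               [0, 1, 0, 0]],
}
labels = label_regions(regions, workable)
for row in labels:
    print(row)
```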
shape

Returns the (tuple) shape of the image (rows,columns)

table_names

Return a list of data table names
