drep package¶

Submodules¶

drep.WorkDirectory module¶

This module provides access to the workDirectory

The directory layout:

workDirectory
./data
...../MASH_files/
...../ANIn_files/
...../gANI_files/
...../Clustering_files/
...../checkM/
........./genomes/
........./checkM_outdir/
...../prodigal/
./figures
./data_tables
...../Bdb.csv  # Sequence locations and filenames
...../Mdb.csv  # Raw results of MASH comparisons
...../Ndb.csv  # Raw results of ANIn comparisons
...../Cdb.csv  # Genomes and cluster designations
...../Chdb.csv # CheckM results for Bdb
...../Sdb.csv  # Scoring information
...../Wdb.csv  # Winning genomes
./dereplicated_genomes
./log
...../logger.log
...../cluster_arguments.json

class drep.WorkDirectory.WorkDirectory(location)¶

Bases: object

Object to interact with the workDirectory

Parameters:	location (str) – location to make the workDirectory

firstLevels = ['data', 'figures', 'data_tables', 'dereplicated_genomes', 'log']¶

get_cluster(name)¶

Get the cluster passed in

Parameters:	name – name of the cluster
Returns:	cluster

get_db(name, return_none=True, forPlotting=False)¶

Get database from self.data_tables

Parameters:	name – name of dataframe
	return_none – if True, return None if the database is not found; otherwise assert False
	forPlotting – if True, don’t do fancy dType loading; it messes with the order of names for dendrograms
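The return_none behaviour described above can be pictured with a small stand-in. This is a minimal sketch, not dRep's implementation: the TableStore class and its contents are hypothetical, plain lists stand in for DataFrames, and the forPlotting option is left out.

```python
class TableStore:
    """Hypothetical stand-in for the data_tables lookup; not dRep's code."""

    def __init__(self, tables):
        # Maps a table name such as "Cdb" to its loaded contents.
        self.data_tables = dict(tables)

    def get_db(self, name, return_none=True):
        if name in self.data_tables:
            return self.data_tables[name]
        if return_none:
            return None
        # Mirrors the documented "otherwise assert False" behaviour.
        assert False, "Could not find {0}".format(name)


store = TableStore({"Cdb": [{"genome": "a.fasta", "cluster": "1_1"}]})
found = store.get_db("Cdb")
missing = store.get_db("Mdb")
```

With return_none left at its default, a caller has to check for None before using the result; passing return_none=False turns a missing table into an immediate AssertionError instead.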

get_dir(dir)¶

Get the location of one of the named directory types

Parameters:	dir – Name of directory to find
Returns:	Location of requested directory
Return type:	string

get_loc(what)¶

Get the location of Things

Parameters:	what – string of what to get the location of
Returns:	location of what
Return type:	string

get_primary_linkage()¶: Get the primary linkage cluster

hasDb(db)¶: If db is in the data_tables, return True

import_arguments(loc)¶: Given the location of the log directory, load it

import_clusters(loc)¶: Given the location of the cluster files, load them

import_data_tables(loc)¶: Given the location of the datatables, load them

load_cached()¶: The wrapper to load everything it has into attributes

make_fileStructure()¶: Make the top level file structure
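As a rough illustration of what making the top-level file structure involves, the sketch below creates the folders named in firstLevels under a temporary location. It is an assumption-laden stand-in: the function make_file_structure and the constant FIRST_LEVELS are hypothetical names, and dRep's own method may differ in detail.

```python
import tempfile
from pathlib import Path

# Same names as the documented firstLevels attribute.
FIRST_LEVELS = ["data", "figures", "data_tables", "dereplicated_genomes", "log"]


def make_file_structure(location):
    """Create the first-level folders under location; return their paths."""
    root = Path(location)
    created = []
    for level in FIRST_LEVELS:
        path = root / level
        # exist_ok keeps the call safe to repeat on an existing directory.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    made = make_file_structure(Path(tmp) / "workDirectory")
    made_names = sorted(p.name for p in made)
    all_exist = all(p.is_dir() for p in made)
```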

store_db(db, name, overwrite=None)¶

Store a dataframe in the workDirectory

Will make a physical copy in the datatables folder

Parameters:	db – pandas dataframe to store
	name – name to store it under (will add .csv automatically)
	overwrite – if True, overwrite if DataFrame with same name already exists
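The naming and overwrite rules can be sketched with the standard library alone. The helper store_table below is hypothetical, uses a list of dicts in place of a pandas DataFrame, and assumes that a refused overwrite raises FileExistsError; the documentation does not say what dRep does in that case.

```python
import csv
import tempfile
from pathlib import Path


def store_table(rows, name, table_dir, overwrite=None):
    """Write rows (a list of dicts) to <table_dir>/<name>.csv."""
    target = Path(table_dir) / (name + ".csv")  # ".csv" is appended automatically
    if target.exists() and not overwrite:
        # Assumption for this sketch: refuse rather than silently replace.
        raise FileExistsError(str(target))
    with target.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return target


with tempfile.TemporaryDirectory() as tmp:
    rows = [{"genome": "a.fasta", "cluster": "1_1"}]
    written = store_table(rows, "Cdb", tmp)
    written_name = written.name
    try:
        store_table(rows, "Cdb", tmp)
        refused = False
    except FileExistsError:
        refused = True
    store_table(rows, "Cdb", tmp, overwrite=True)  # explicit overwrite succeeds
```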

store_special(name, thing)¶

Store special items in the work directory

Parameters:	name – what to store
	thing – actual thing to store

drep.argumentParser module¶

dRep - parse command-line arguments

class drep.argumentParser.SmartFormatter(prog, indent_increment=2, max_help_position=24, width=None)¶: Bases: argparse.ArgumentDefaultsHelpFormatter

drep.argumentParser.VERSION = '3.4.5'¶

drep.argumentParser.parse_args(args)¶

drep.argumentParser.printHelp()¶

drep.argumentParser.version()¶

drep.controller module¶

Controller- takes input from argparse and calls correct modules

class drep.controller.Controller¶

Bases: object

compare_operation(**kwargs)¶

dereplicate_operation(**kwargs)¶

loadDefaultArgs()¶

parseArguments(args)¶: Parse user options and call the correct pipeline

setup_logger(loc)¶: set up logger such that DEBUG goes only to file, rest go to file and console
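The split described for setup_logger (DEBUG to the file only, everything else to both file and console) maps onto two handlers with different levels in the standard logging module. The sketch below is one way to get that behaviour, not necessarily how dRep does it; the logger name "drep_sketch" and the function signature are assumptions.

```python
import logging
import os
import tempfile


def setup_logger(log_path, name="drep_sketch"):
    """DEBUG and above go to the file; INFO and above also go to the console."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let every record reach the handlers
    logger.handlers.clear()         # avoid duplicate handlers on repeat calls

    file_handler = logging.FileHandler(log_path)
    file_handler.setLevel(logging.DEBUG)

    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger


tmp_dir = tempfile.mkdtemp()
log_file = os.path.join(tmp_dir, "logger.log")
log = setup_logger(log_file)
log.debug("written to the file only")
log.info("written to the file and the console")
for handler in log.handlers:
    handler.flush()
with open(log_file) as handle:
    logged_lines = handle.read().splitlines()
```

The logger itself is set to DEBUG so that filtering happens per handler; setting the logger to INFO would drop debug records before the file handler ever saw them.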

drep.controller.version()¶

drep.d_adjust module¶

drep.d_adjust.accounce_changes(newWdb, oriWdb)¶

drep.d_adjust.adjust_cluster_wrapper(wd, **kwargs)¶

drep.d_adjust.cluster_type(cluster)¶

drep.d_adjust.d_adjust_wrapper(wd, **kwargs)¶

drep.d_adjust.remove_cluster_wrapper(wd, **kwargs)¶

drep.d_adjust.remove_primary_cluster(Rcluster, wd, **kwargs)¶

drep.d_adjust.remove_secondary_cluster(Rcluster, wd, **kwargs)¶

drep.d_adjust.test_adjust()¶

drep.d_analyze module¶

d_analyze - a subset of drep

Make plots based on de-replication

drep.d_analyze.calc_dist(x1, y1, x2, y2)¶

Return the distance between two points

Args: self-explanatory

Returns:	distance
Return type:	int
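Assuming the distance meant here is the ordinary Euclidean distance between (x1, y1) and (x2, y2), which the documentation does not state explicitly, a minimal sketch looks like this. Note that it returns a float, whereas the return type above is listed as int.

```python
import math


def calc_dist(x1, y1, x2, y2):
    """Euclidean distance between (x1, y1) and (x2, y2)."""
    return math.hypot(x2 - x1, y2 - y1)


dist = calc_dist(0, 0, 3, 4)  # 3-4-5 right triangle
```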

drep.d_analyze.cluster_test_wrapper(wd, **kwargs)¶: DEPRECATED

drep.d_analyze.d_analyze_wrapper(wd, **kwargs)¶

Controller for the dRep analyze operation

Parameters:	wd – The current workDirectory
	**kwargs – Command line arguments
Keyword Arguments:	plots – List of plots to make [list of ints, 1-6]
Returns:	Makes some plots

drep.d_analyze.fancy_dendrogram(linkage, names, name2color=False, threshold=False, self_thresh=False)¶: Make a fancy dendrogram

drep.d_analyze.gen_color_dictionary(names, name2cluster)¶

Make the dictionary name2color

Parameters:	names – keys in the returned dictionary
	name2cluster – a dictionary mapping each name to its cluster
Returns:	name -> color
Return type:	dict
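The idea of a name -> color mapping driven by cluster membership can be sketched as follows. The palette and the assignment order are hypothetical choices made for illustration; the documentation does not say which colors dRep uses or how it picks them.

```python
# Hypothetical palette; the colors dRep actually uses are not given in the docs.
PALETTE = ["tab:blue", "tab:orange", "tab:green", "tab:red"]


def gen_color_dictionary(names, name2cluster):
    """Return {name: color}, giving every member of a cluster the same color."""
    cluster2color = {}
    name2color = {}
    for name in names:
        cluster = name2cluster[name]
        if cluster not in cluster2color:
            # Assign palette entries in order of first appearance, cycling if needed.
            cluster2color[cluster] = PALETTE[len(cluster2color) % len(PALETTE)]
        name2color[name] = cluster2color[cluster]
    return name2color


names = ["a.fasta", "b.fasta", "c.fasta"]
name2cluster = {"a.fasta": "1_1", "b.fasta": "1_1", "c.fasta": "2_1"}
name2color = gen_color_dictionary(names, name2cluster)
```

A list of colors in the same order as names, as gen_color_list returns, would then be `[name2color[n] for n in names]`.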

drep.d_analyze.gen_color_list(names, name2cluster)¶: Make a list of colors the same length as names, based on their cluster

drep.d_analyze.get_highest_self(db, genomes, min=0.0001)¶: Return the highest ANI value resulting from comparing a genome to itself

drep.d_analyze.mash_dendrogram_from_wd(wd, plot_dir=False)¶

From the wd and kwargs, call plot_MASH_dendrogram

Parameters:	wd – WorkDirectory
	plot_dir (optional) – Location to store figure
Returns:	Shows plot, makes a plot in the plot_dir

drep.d_analyze.normalize(df)¶

Normalize all columns in df to 0-1 except ‘genome’ or ‘location’

Parameters:	df – DataFrame
Returns:	Normalized
Return type:	DataFrame
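One common reading of "normalize to 0-1" is min-max scaling, applied column by column. The sketch below assumes that reading and uses a plain dict of lists in place of a DataFrame so that it runs without pandas; the constant-column handling is an additional assumption, not something the documentation specifies.

```python
SKIP = ("genome", "location")  # columns the docs say are left untouched


def normalize(table):
    """Min-max scale every column except those in SKIP; returns a new dict."""
    result = {}
    for column, values in table.items():
        if column in SKIP:
            result[column] = list(values)
            continue
        low, high = min(values), max(values)
        span = high - low
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
        result[column] = [(v - low) / span if span else 0.0 for v in values]
    return result


table = {"genome": ["a.fasta", "b.fasta", "c.fasta"], "score": [10, 15, 20]}
scaled = normalize(table)
```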

drep.d_analyze.plot_ANIn_vs_ANIn_cov(Ndb)¶