Input/Output in files — pbxplore.io

Functions for reading and writing files in various formats.

Fasta

pbxplore.io.read_fasta(name)[source]

Read a fasta file and return its headers and sequences as two lists.

Parameters:name (str) – Name of file containing sequences in fasta format.
Returns:
  • header_lst (list) – List of headers (str)
  • sequence_lst (list) – List of sequences (str)
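
As an illustration of the behaviour described above, here is a minimal, self-contained sketch of a fasta reader that returns a list of headers and a list of sequences. It is not pbxplore's own code: the function name read_fasta_sketch is hypothetical, and the handling of blank lines and multi-line sequences is an assumption.

```python
import os
import tempfile


def read_fasta_sketch(name):
    """Hypothetical stand-in for a fasta reader; not pbxplore's implementation."""
    header_lst = []
    sequence_lst = []
    chunks = []  # sequence lines of the entry currently being read
    with open(name) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # assumption: blank lines are ignored
            if line.startswith(">"):
                if header_lst:
                    # close the previous entry before starting a new one
                    sequence_lst.append("".join(chunks))
                header_lst.append(line[1:].strip())
                chunks = []
            elif header_lst:
                chunks.append(line)
    if header_lst:
        sequence_lst.append("".join(chunks))
    return header_lst, sequence_lst


# Usage: write a small two-entry file, then read it back.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">seq1\nZZdddfklm\nnopacd\n\n>seq2\nmmmmnop\n")
    path = tmp.name
headers, sequences = read_fasta_sketch(path)
os.remove(path)
```

Both lists have the same length, so entry i is the pair (headers[i], sequences[i]).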
pbxplore.io.read_several_fasta(input_files)[source]

Read several fasta files

Note that each fasta file may contain several sequences.

Parameters:input_files (list) – List of fasta file paths.
Returns:
  • pb_name (list) – List of the headers
  • pb_seq (list) – List of the sequences
pbxplore.io.write_fasta(outfile, sequences, comments)[source]

Write fasta entries (header + sequence) in an open file

Parameters:
  • outfile (file descriptor) – The file descriptor to write in. It must allow writing.
  • sequences (list) – List of sequences (str)
  • comments (list) – List of headers (str)
pbxplore.io.write_fasta_entry(outfile, sequence, comment, width=60)[source]

Write a fasta entry (header + sequence) in an open file

Parameters:
  • outfile (file descriptor) – The file descriptor to write in. It must allow writing.
  • sequence (str) – Sequence to format.
  • comment (str) – Comment to make header of sequence.
  • width (int) – The width of a line. FASTA_WIDTH (60) by default.
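
The two writers above can be pictured with the following self-contained sketch, which wraps a sequence at a given width and writes it after its header. It only illustrates the documented behaviour and is not pbxplore's own code: the names write_fasta_entry_sketch and write_fasta_sketch are hypothetical, and the ">" header prefix follows the usual fasta convention.

```python
import io


def write_fasta_entry_sketch(outfile, sequence, comment, width=60):
    """Hypothetical stand-in: write one header line, then the wrapped sequence."""
    outfile.write(">" + comment + "\n")
    for start in range(0, len(sequence), width):
        outfile.write(sequence[start:start + width] + "\n")


def write_fasta_sketch(outfile, sequences, comments):
    """Hypothetical stand-in: write one entry per (sequence, comment) pair."""
    for sequence, comment in zip(sequences, comments):
        write_fasta_entry_sketch(outfile, sequence, comment)


# Usage: an in-memory buffer stands in for an open, writable file.
buffer = io.StringIO()
write_fasta_entry_sketch(buffer, "a" * 130, "demo", width=60)
lines = buffer.getvalue().splitlines()
line_lengths = [len(line) for line in lines[1:]]
```

Any object with a write method that accepts text, such as a file opened in "w" mode, can be passed as outfile.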

Results of analyses

pbxplore.io.write_count_matrix(pb_count, outfile, first=1)[source]

Write a PB occurrence matrix to a file.

Parameters:
  • pb_count (numpy array) – An occurrence matrix as a 2D numpy array.
  • outfile (file descriptor) – An open file where to write the matrix.
  • first (int) – The residue number of the first position.
pbxplore.io.write_neq(outfile, neq_array, idx_first_residue=1, residue_min=1, residue_max=None)[source]

Write the Neq matrix in an open file

Parameters:
  • outfile (file descriptor) – The file descriptor to write in. It must allow writing.
  • neq_array (numpy array) – A 1D array containing the Neq values.
  • idx_first_residue (int) – The index of the first residue in the array.
  • residue_min (int) – The lower bound of the residue frame.
  • residue_max (int) – The upper bound of the residue frame.
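
One plausible reading of these parameters is sketched below. Several points are assumptions rather than documented behaviour: that the first array element belongs to residue idx_first_residue, that only residues within [residue_min, residue_max] are written, that residue_max=None means "up to the last residue", and the "resid Neq" column layout itself. The function name write_neq_sketch is hypothetical, and a plain list stands in for the numpy array to keep the example dependency-free.

```python
import io


def write_neq_sketch(outfile, neq_values, idx_first_residue=1,
                     residue_min=1, residue_max=None):
    """Hypothetical stand-in; the output layout is assumed, not pbxplore's."""
    last_residue = idx_first_residue + len(neq_values) - 1
    if residue_max is None:
        residue_max = last_residue  # assumption: None means no upper bound
    outfile.write("resid Neq\n")  # hypothetical header line
    for offset, neq in enumerate(neq_values):
        resid = idx_first_residue + offset
        if residue_min <= resid <= residue_max:
            outfile.write("{} {:.2f}\n".format(resid, neq))


# Usage: four values for residues 10 to 13, restricted to residues 11 and 12.
buffer = io.StringIO()
write_neq_sketch(buffer, [1.0, 2.5, 4.25, 1.5],
                 idx_first_residue=10, residue_min=11, residue_max=12)
rows = buffer.getvalue().splitlines()
```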