ebete/MC_HiC

Multi-Contact 4C Project

This directory contains most of the scripts used during my internship at the Hubrecht Institute, Utrecht. This README gives a short description of what each script does.

Execute all alignment configurations in config.csv.

Plot the bowtie2 scoring functions.
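bowtie2 expresses length-dependent settings such as `--score-min` as a function string `T,A,B`, where `T` is `C` (constant), `L` (linear), `S` (square root) or `G` (natural log), and the value for read length x is A + B·g(x). The defaults are `L,-0.6,-0.6` in end-to-end mode and `G,20,8` in local mode. The sketch below only evaluates such strings, leaving out the plotting and bowtie2's optional min/max clamping; it is not the repository's code, and the name `bowtie2_func` is hypothetical.

```python
import math

# g(x) for each bowtie2 function type; the function value is A + B * g(x).
_TRANSFORMS = {
    "C": lambda x: 0.0,   # constant: f(x) = A
    "L": lambda x: x,     # linear:   f(x) = A + B * x
    "S": math.sqrt,       # sqrt:     f(x) = A + B * sqrt(x)
    "G": math.log,        # log:      f(x) = A + B * ln(x)
}


def bowtie2_func(spec: str):
    """Turn a bowtie2 function string such as 'L,-0.6,-0.6' into a callable."""
    parts = spec.split(",")
    transform = _TRANSFORMS[parts[0]]
    a = float(parts[1])
    b = float(parts[2]) if len(parts) > 2 else 0.0
    return lambda x: a + b * transform(x)


# Tabulate the default --score-min functions for a few read lengths.
end_to_end = bowtie2_func("L,-0.6,-0.6")
local = bowtie2_func("G,20,8")
for length in (50, 100, 1000):
    print(length, round(end_to_end(length), 1), round(local(length), 1))
```

Passing a dense range of lengths to any plotting library gives the curves themselves.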

Convert the 2-mer counts generated by Jellyfish to CSV format.
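By default, `jellyfish dump` writes FASTA-style records in which the header line holds the count and the sequence line holds the k-mer. A minimal converter, assuming that default format (not the column format from `dump -c`), could look like this; the function name is hypothetical.

```python
import csv
import io


def jellyfish_dump_to_csv(dump_text: str, out) -> None:
    """Convert FASTA-style `jellyfish dump` output to a kmer,count CSV."""
    writer = csv.writer(out)
    writer.writerow(["kmer", "count"])
    count = None
    for line in dump_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            count = int(line[1:])            # header line: the count
        else:
            writer.writerow([line, count])   # sequence line: the k-mer


buf = io.StringIO()
jellyfish_dump_to_csv(">12\nAC\n>7\nGT\n", buf)
print(buf.getvalue())
```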

Write the alignments that pass the length/matches cutoff to a new SAM file.

Convert the samtools depth output to a range-based list of covered regions.
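`samtools depth` prints one tab-separated `chrom pos depth` row per position (1-based, and by default only positions with non-zero depth). Collapsing consecutive positions into ranges can be sketched as follows; the function name and the `min_depth` parameter are hypothetical.

```python
def depth_to_ranges(lines, min_depth=1):
    """Collapse `samtools depth` rows into covered (chrom, start, end) ranges.

    Coordinates are 1-based and inclusive, like the positions samtools reports.
    """
    ranges = []
    cur = None  # [chrom, start, end] of the range currently being extended
    for line in lines:
        chrom, pos, depth = line.split("\t")
        pos, depth = int(pos), int(depth)
        if depth < min_depth:
            continue
        if cur and cur[0] == chrom and pos == cur[2] + 1:
            cur[2] = pos                   # contiguous position: extend range
        else:
            if cur:
                ranges.append(tuple(cur))
            cur = [chrom, pos, pos]        # gap or new chromosome: new range
    if cur:
        ranges.append(tuple(cur))
    return ranges


rows = ["chr1\t10\t3", "chr1\t11\t5", "chr1\t12\t1", "chr1\t20\t2", "chr2\t5\t4"]
print(depth_to_ranges(rows))  # [('chr1', 10, 12), ('chr1', 20, 20), ('chr2', 5, 5)]
```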

Print statistics, in CSV format, for the alignments that occur in both SAM files.

Nextflow workflow that consumes raw FASTQ files and creates filtered/digested fragments in FASTA format.

Sum the number of nucleotides in all reads that were used in the alignment.

Plot a heatmap from the CSV file generated by fragments_to_matrix.py.

Generate bowtie2, BWA, and LAST databases from a genome.

Demultiplex a SAM file, separating the reads by their source file.

Compute the distance from alignment starts/ends to the closest restriction site.
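With the restriction-site positions of a chromosome in a sorted list, the closest site to any coordinate can be found by binary search instead of a linear scan. A sketch using the standard `bisect` module, assuming a non-empty site list; the function name is hypothetical.

```python
from bisect import bisect_left


def distance_to_nearest_site(pos: int, sites: list[int]) -> int:
    """Absolute distance from `pos` to the closest entry of sorted, non-empty `sites`."""
    i = bisect_left(sites, pos)
    candidates = []
    if i < len(sites):
        candidates.append(sites[i] - pos)      # nearest site at or after pos
    if i > 0:
        candidates.append(pos - sites[i - 1])  # nearest site before pos
    return min(candidates)


sites = [100, 250, 900]
for start, end in [(120, 240), (880, 1000)]:
    print(start, end,
          distance_to_nearest_site(start, sites),
          distance_to_nearest_site(end, sites))
```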

ExtendMap script

Compare the original SAM file with the one created by extendmap.py and print whether ExtendMap improved upon each alignment.

Nextflow pipeline for executing ExtendMap/MergeMap.

Extract all reads that have at least a certain number of alignments (split-reads).
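Split reads show up in a SAM file as several mapped records sharing one QNAME (typically a primary plus supplementary alignments). A sketch that counts mapped records per read name, assuming plain-text SAM input; the function name and threshold are hypothetical.

```python
from collections import Counter


def split_read_names(sam_lines, min_alignments=2):
    """Return the names of reads with at least `min_alignments` mapped records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue                       # skip header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue                       # FLAG 0x4: segment unmapped
        counts[qname] += 1
    return {name for name, n in counts.items() if n >= min_alignments}


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r1\t2048\tchr2\t500\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t300\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(split_read_names(sam))  # {'r1'}
```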

Write the 50 bp tail-ends of reads to a new FASTA file.

Convert a FASTQ to FASTA and perform digestion/filtering.
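In-silico digestion splits each read at the restriction sites and keeps the resulting fragments as separate FASTA records. The sketch below assumes DpnII (`GATC`, cut immediately before the site), a 4-line FASTQ layout, and a hypothetical minimum fragment length; the actual enzyme, cut position, and filters used by the repository may differ.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence) pairs from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()                  # '+' separator line
        handle.readline()                  # quality line (unused here)
        yield header[1:].split()[0], seq


def digest(seq, site="GATC", min_len=20):
    """Cut `seq` right before every occurrence of `site`; drop short pieces."""
    cuts = [0]
    start = seq.find(site, 1)              # a site at position 0 needs no cut
    while start != -1:
        cuts.append(start)
        start = seq.find(site, start + 1)
    cuts.append(len(seq))
    pieces = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    return [p for p in pieces if len(p) >= min_len]


def fastq_to_digested_fasta(fastq, out, site="GATC", min_len=20):
    """Write every kept fragment as '>read_index' followed by its sequence."""
    for name, seq in read_fastq(fastq):
        for i, frag in enumerate(digest(seq, site, min_len)):
            out.write(f">{name}_{i}\n{frag}\n")


fq = "@read1 extra\nAAAAGATCCCCCGATCTT\n+\nIIIIIIIIIIIIIIIIII\n"
buf = io.StringIO()
fastq_to_digested_fasta(io.StringIO(fq), buf, min_len=5)
print(buf.getvalue())
```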

Get the number of fragments per read from the read_map_freq.py CSV and print the counts in matrix format.

Extract the viewpoint region from a reference genome in FASTA format, based on the given primer positions.

Write the coverage of a SAM file to a BED file that can be loaded in IGV.

Split a FASTA file into two FASTA files: one with the aligned reads and one with the unaligned reads.

Write the MAPQ values of the reads to a CSV file.

Print the aligned positions of a single read.

Count the number of homopolymers in a FASTA file and print the counts in CSV format.
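Homopolymers are runs of one repeated base, so `itertools.groupby` over the sequence gives the run lengths directly. A sketch that counts runs per (base, length) and prints them as CSV; the minimum run length of 3 is a hypothetical cut-off, not one taken from the repository.

```python
import csv
import sys
from collections import Counter
from itertools import groupby


def count_homopolymers(seq, min_len=3):
    """Count runs of one repeated base that are at least `min_len` long.

    Keys are (base, run_length) tuples; values are how often that run occurs.
    """
    counts = Counter()
    for base, run in groupby(seq.upper()):
        n = sum(1 for _ in run)
        if n >= min_len:
            counts[(base, n)] += 1
    return counts


def write_csv(counts, out=sys.stdout):
    """Print one base,length,count row per homopolymer type."""
    writer = csv.writer(out)
    writer.writerow(["base", "length", "count"])
    for (base, length), n in sorted(counts.items()):
        writer.writerow([base, length, n])


write_csv(count_homopolymers("AAATTGGGGCAAA"))
```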

Script I used to collect miscellaneous plots.

MergeMap script. Generate new fragments from adjacent unmapped ones: all adjoining unmapped fragments are concatenated into a single, larger fragment.
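The concatenation step can be pictured as grouping a read's ordered fragments by their mapped state and joining each run of unmapped ones. This sketch covers only that step, assuming each read is given as a list of `(sequence, is_mapped)` pairs in read order; it is not the MergeMap implementation, and the function name is hypothetical.

```python
from itertools import groupby


def merge_unmapped(fragments):
    """Concatenate each run of adjacent unmapped fragments into one fragment.

    `fragments` is an ordered list of (sequence, is_mapped) pairs for one read;
    mapped fragments are passed through unchanged.
    """
    merged = []
    for is_mapped, group in groupby(fragments, key=lambda f: f[1]):
        group = list(group)
        if is_mapped:
            merged.extend(group)
        else:
            merged.append(("".join(seq for seq, _ in group), False))
    return merged


read = [("AAA", True), ("CC", False), ("GG", False), ("TTT", True), ("A", False)]
print(merge_unmapped(read))
```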

Nextflow workflow configuration file for pipeline.nf.

Calculate an alignment score for each alignment, normalised by alignment length. Not really used.

Run part of the MC-4C filtering steps. Configured using nextflow.config.

Plot statistics about restriction enzymes in a genome.

Plot the detected interactions in a 2D plot.

Plot the base association rates of different reads compared to the background rates.

Plot the alignment length distributions of unmapped and mapped fragments, superimposed.

Plot the distribution of the alignments' MAPQ values.

Print the unclipped length of all alignments.

Get the alignment positions of the reads and write them to CSV. Read fragments are merged into a single fragment if they are too close together.
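Merging fragments that lie too close together is an interval-merging problem: sort a read's alignments and fuse each one into the previous interval when the gap between them is below a threshold. A sketch assuming `(chrom, start, end)` tuples per read and a hypothetical `max_gap` of 1,000 bp; the CSV output is left out.

```python
def merge_close(alignments, max_gap=1000):
    """Merge (chrom, start, end) alignments of one read lying within max_gap bp."""
    merged = []
    for chrom, start, end in sorted(alignments):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))  # extend interval
        else:
            merged.append((chrom, start, end))                    # new interval
    return merged


aln = [("chr1", 100, 300), ("chr1", 800, 900), ("chr1", 5000, 5100), ("chr2", 10, 50)]
print(merge_close(aln))
```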

Calculate various statistics about the read fragment alignment contiguity (e.g. current and next fragment both mapped i times out of j).

Create a dictionary containing the positions of all restriction sites on a reference genome; the dictionary is pickled to a file.
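A possible shape for that dictionary is chromosome name mapped to a sorted list of site positions, which is also what a binary-search distance lookup needs. The sketch assumes the genome is already loaded as a `{name: sequence}` dict, uses DpnII's `GATC` as the default motif, and reports 0-based positions; all names are hypothetical.

```python
import pickle
import re


def find_sites(genome, site="GATC"):
    """Map chromosome name -> sorted 0-based start positions of `site`.

    The lookahead pattern also reports overlapping occurrences, which matters
    only for motifs that can overlap themselves.
    """
    pattern = re.compile("(?=" + re.escape(site) + ")")
    return {chrom: [m.start() for m in pattern.finditer(seq.upper())]
            for chrom, seq in genome.items()}


def save_sites(sites, path):
    """Pickle the site dictionary so later scripts can load it quickly."""
    with open(path, "wb") as fh:
        pickle.dump(sites, fh)


genome = {"chr1": "aaGATCttGATC", "chr2": "CCCC"}
sites = find_sites(genome)
print(sites)  # {'chr1': [2, 8], 'chr2': []}
```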

Calculate the 2-mer rates of a FASTA file and write them to CSV. Uses Jellyfish.
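The script itself delegates counting to Jellyfish; for small inputs the same rates can be computed in pure Python with `collections.Counter`, which is what this sketch does instead. It counts overlapping 2-mers on the given strand only and skips any containing `N`; both choices are assumptions.

```python
from collections import Counter


def dimer_rates(sequences):
    """Relative frequency of each overlapping 2-mer across all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            kmer = seq[i:i + 2]
            if "N" not in kmer:
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in sorted(counts.items())}


print(dimer_rates(["ACGT", "AA"]))
```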

Create an interaction matrix from a SAM file.
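One way to build such a matrix is to group mapped records by read name, bin their positions, and count every pair of distinct bins that co-occur within a read. The sketch assumes a single chromosome, a fixed bin size, and that each bin is counted once per read (so contacts within one bin are ignored); the helper names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def positions_by_read(sam_lines, chrom):
    """Collect the POS of every mapped record on `chrom`, grouped by QNAME."""
    reads = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        if int(f[1]) & 0x4 or f[2] != chrom:
            continue                                  # unmapped or other chromosome
        reads[f[0]].append(int(f[3]))
    return reads


def interaction_matrix(read_positions, bin_size, n_bins):
    """Symmetric counts of bin pairs that co-occur within the same read."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for positions in read_positions.values():
        bins = sorted({pos // bin_size for pos in positions})  # unique bins per read
        for a, b in combinations(bins, 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t10\t60\t20M\t*\t0\t0\t*\t*",
    "r1\t2048\tchr1\t150\t60\t20M\t*\t0\t0\t*\t*",
    "r1\t2048\tchr1\t260\t60\t20M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t20\t60\t20M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
m = interaction_matrix(positions_by_read(sam, "chr1"), bin_size=100, n_bins=3)
print(m)
```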

Plot estimates of mismatches in k-mers in Oxford Nanopore reads.

Split a FASTA file into n FASTA files, distributing the records randomly but in a balanced fashion.
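A simple way to get a random yet balanced split is to shuffle the records once and then deal them round-robin, so group sizes differ by at most one. The sketch works on any list of records (for example `(name, sequence)` tuples); writing each group to its own FASTA file is left out, and the function name and `seed` parameter are hypothetical.

```python
import random


def split_records(records, n, seed=None):
    """Shuffle records, then deal them round-robin into n near-equal groups."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    groups = [[] for _ in range(n)]
    for i, rec in enumerate(shuffled):
        groups[i % n].append(rec)
    return groups


groups = split_records(range(10), 3, seed=1)
print([len(g) for g in groups])  # [4, 3, 3]
```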

Python file containing helper functions.

Highlight all DpnII sites (GATC) that contain at most a single point mutation.
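Finding sites with at most one point mutation amounts to sliding a 4 bp window along the sequence and keeping windows within Hamming distance 1 of `GATC`. Because `GATC` is its own reverse complement and reverse complementing preserves Hamming distance, scanning one strand is enough. In this sketch "highlighting" is assumed to mean lowercasing the matched bases; the function names are hypothetical.

```python
def near_sites(seq, site="GATC", max_mismatches=1):
    """Return (position, window) for each window within max_mismatches of site."""
    k = len(site)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append((i, window))
    return hits


def highlight(seq, hits, k=4):
    """Lowercase every base covered by a hit; all other bases stay uppercase."""
    chars = list(seq.upper())
    for i, _ in hits:
        for j in range(i, i + k):
            chars[j] = chars[j].lower()
    return "".join(chars)


seq = "AAGATCAAGTTCAA"
hits = near_sites(seq)
print(hits)                  # [(2, 'GATC'), (8, 'GTTC')]
print(highlight(seq, hits))  # AAgatcAAgttcAA
```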