Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: option to not bin and not rescale bedgraph tracks #34

Open
jnmaloof opened this issue Nov 6, 2022 · 4 comments
Open
Labels
enhancement New feature or request

Comments

@jnmaloof
Copy link
Contributor

jnmaloof commented Nov 6, 2022

There are times where binning the data in a bedgraph track may not be the correct thing to do. I hardcoded the changes that I needed to make this plot (that I needed...). Other uses might want this use case as well. If of interest I might be able to work this into a more general solution for a pull request (but probably not any time soon... a lot on my plate).

Anyway, thanks again for this package, it works really well and was pretty easy to modify for my needs.

plotsr

@mnshgl0110
Copy link
Member

Hi Julin. You are correct that binning might not be always optimal. If I recall correctly, the reason I had to that was to improve memory requirements. As BEDGRAPH files can have values for each base, it can take a lot of memory without binning. And for users inexperienced in informatics, it can become a challenge to run it locally.

Of course, it would be cool to have a general solution which can handle large volume of data and be more accurate.

@jnmaloof
Copy link
Contributor Author

jnmaloof commented Nov 9, 2022 via email

@fishercera
Copy link

I actually need this for my bed file of already-binned SNPs. @jnmaloof could you possibly share how you hardcoded not scaling the y axis on bed files?

I really need to be able to show that the max 183-count SNPs in a 10k region on Chr1 is a taller peak than the max 12-count SNPs in a 10k region on Chr2

@mnshgl0110 mnshgl0110 added the enhancement New feature or request label Aug 14, 2023
@jnmaloof
Copy link
Contributor Author

Sadly I didn't comment my code, but I believe the relevant changes were in the readbedgraph function in the file func.py. My modified code for that function is at the bottom of this email. Changes either follow a commented line that has the original code, or in some cases the original code is commented at the end of a line.

My fully changed files is available at https://github.com/MaloofLab/Davis_B_napus_assembly_2023/blob/main/HE_Analysis/plotsr/modified_func.py , but we changed a lot of other things as well (custom colors for translocations based on the chromosome of origin, etc).

If you want a working example you can clone the whole repo and then run the code in https://github.com/MaloofLab/Davis_B_napus_assembly_2023/blob/main/HE_Analysis/Julin_plotsr.Rmd (it is an R file that first creates the plotting files and eventually calls the python plotsr)

    # Read input bedgraph file
    def _readbedgraph(self, chrlengths):
        from collections import deque, defaultdict
        import numpy as np
        from math import ceil

        bw = int(self.bw)
        _chrs = set([c for c in chrlengths[0][1].keys()])
        bincnt = defaultdict(deque)
        skipchrs = []
        curchr = ''
        added_chrs = list()
        with open(self.f, 'r') as fin:
            for line in fin:
                line = line.strip().split()
                try:
                    v = float(line[3])
                except ValueError:
                    if len(line) < 4:
                        self.logger.warning("Incomplete information in bedgraph file at line: {}. Skipping it.".format("\t".join(line)))
                        continue
                if line[0] not in _chrs:
                    if line[0] == '#': continue
                    if line[0] == 'track': continue
                    if line[0] not in skipchrs:
                        self.logger.warning("Chromosome in BEDGRAPH is not present in FASTA or not selected for plotting. Skipping it. BED line: {}".format("\t".join(line)))
                        skipchrs.append(line[0])
                    continue
                if curchr == '':
                    curchr = line[0]
                    #binv = np.zeros(ceil(chrlengths[0][1][curchr]/bw), dtype=float)
                    binv = np.full(ceil(chrlengths[0][1][curchr]/bw), np.nan, dtype=float)
                    s = int(line[1])
                    e = int(line[2])
                    if s//bw == e//bw:
                        binv[s//bw] = v #+= (e-s)*v
                    else:
                        binv[s//bw] = v #+= (bw-(s%bw))*v
                        binv[e//bw] = v #+= (e%bw)*v
                elif curchr == line[0]:
                    s = int(line[1])
                    e = int(line[2])
                    if s//bw == e//bw:
                        binv[s//bw] = v #+= (e-s)*v
                    else:
                        binv[s//bw] = v #+= (bw-(s%bw))*v
                        binv[e//bw] = v #+= (e%bw)*v
                else:
                    if line[0] in added_chrs:
                        self.logger.error("BedGraph file: {} is not sorted. For plotting tracks, sorted BedGraph file is required. Exiting.".format(self.f))
                        sys.exit()
                    bins = np.concatenate((np.arange(0, chrlengths[0][1][curchr], bw), np.array([chrlengths[0][1][curchr]])), axis=0)
                    bins = [(bins[i] + bins[i+1])/2 for i in range(len(bins) - 1)]
                    bincnt[curchr] = deque([(bins[i], binv[i]) for i in range(len(bins))])
                    added_chrs.append(curchr)
                    # Set the new chromosome
                    curchr = line[0]
                    #binv = np.zeros(ceil(chrlengths[0][1][curchr]/bw), dtype=float)
                    binv = np.full(ceil(chrlengths[0][1][curchr]/bw), np.nan, dtype=float)
                    s = int(line[1])
                    e = int(line[2])
                    if s//bw == e//bw:
                        binv[s//bw] = v # += (e-s)*v
                    else:
                        binv[s//bw] = v # += (bw-(s%bw))*v
                        binv[e//bw] = v # += (e%bw)*v
        bins = np.concatenate((np.arange(0, chrlengths[0][1][curchr], bw), np.array([chrlengths[0][1][curchr]])), axis=0)
        bins = [(bins[i] + bins[i+1])/2 for i in range(len(bins) - 1)]
        bincnt[curchr] = deque([(bins[i], binv[i]) for i in range(len(bins))])
        ## Scale count values
        # maxv = 0
        # for k, v in bincnt.items():
        #     for r in v:
        #         if r[1] > maxv:
        #             maxv = r[1]
        # for k, v in bincnt.items():
        #     bincnt[k] = deque([(r[0], r[1]/maxv) for r in v])
        self.bincnt = bincnt
        return
    # END

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants