Feature request: option to not bin and not rescale bedgraph tracks #34

jnmaloof · 2022-11-06T00:57:45Z

There are times when binning the data in a bedgraph track may not be the correct thing to do. I hardcoded the changes that I needed to make the plot I wanted. Other users might want this option as well. If it is of interest, I might be able to work this into a more general solution for a pull request (but probably not any time soon; I have a lot on my plate).

Anyway, thanks again for this package, it works really well and was pretty easy to modify for my needs.

mnshgl0110 · 2022-11-07T10:52:03Z

Hi Julin. You are correct that binning might not always be optimal. If I recall correctly, the reason I had to do that was to reduce memory requirements. As BEDGRAPH files can have a value for each base, they can take a lot of memory without binning. And for users inexperienced in informatics, it can become a challenge to run it locally.

Of course, it would be cool to have a general solution that can handle large volumes of data and be more accurate.
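
To put the memory argument in rough numbers, here is a back-of-the-envelope sketch. It assumes one 8-byte float per stored value; the 30 Mb chromosome length and the 100 kb bin width are made-up illustration values, and `track_bytes` is a hypothetical helper, not part of plotsr.

```python
from math import ceil

BYTES_PER_FLOAT64 = 8  # size of one float64 value


def track_bytes(chr_length, bin_width=1):
    """Bytes needed to hold one float64 per bin for a single chromosome."""
    return ceil(chr_length / bin_width) * BYTES_PER_FLOAT64


chr_length = 30_000_000                     # hypothetical 30 Mb chromosome
per_base = track_bytes(chr_length)          # one value per base, no binning
binned = track_bytes(chr_length, 100_000)   # hypothetical 100 kb bins

print(f"per-base: {per_base / 1e6:.1f} MB, binned: {binned} bytes")
```

This counts raw array storage only. Values held as Python tuples, one per bin, would cost more per entry, so the gap is likely wider in practice.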

jnmaloof · 2022-11-09T05:56:28Z

Thanks, that makes sense. Although in some cases averaging instead of binning might be better. I understand that it is trickier than I originally thought because the bedgraph file may or may not have windows already, and those may or may not correspond to what plotsr is doing. Maybe it would be possible to have an "as-is" option for the bedgraph file with a warning about the memory. Anyway I'll think about it and see if I can come up with something generalizable.
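
One way the averaging idea could look is sketched below: a coverage-weighted mean per bin, so that input windows which do not line up with the plotting bins are still handled. `bin_mean` is a hypothetical function written for this illustration, not plotsr code, and it assumes 0-based half-open `(start, end, value)` intervals for a single chromosome.

```python
from math import ceil

import numpy as np


def bin_mean(intervals, chr_length, bw):
    """Coverage-weighted mean of bedGraph values in fixed-width bins.

    Bins with no covered bases are returned as NaN.
    """
    nbins = ceil(chr_length / bw)
    total = np.zeros(nbins)    # sum of value * overlapping bases
    covered = np.zeros(nbins)  # number of bases with data in each bin
    for s, e, v in intervals:
        e = min(e, chr_length)  # clip intervals that run past the chromosome end
        for b in range(s // bw, ceil(e / bw)):
            overlap = min(e, (b + 1) * bw) - max(s, b * bw)
            total[b] += overlap * v
            covered[b] += overlap
    # 0/0 for uncovered bins is expected; silence the warning and emit NaN
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(covered > 0, total / covered, np.nan)


# Two half-bin windows in bin 0, one window spanning bins 1-2, nothing in bin 3
means = bin_mean([(0, 5, 2.0), (5, 10, 4.0), (10, 30, 1.0)], chr_length=40, bw=10)
```

When the input windows already coincide with the bins, each mean is just the input value, which is effectively the "as-is" behaviour, at the cost of two arrays per chromosome instead of one.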

fishercera · 2023-08-12T18:24:17Z

I actually need this for my bed file of already-binned SNPs. @jnmaloof could you possibly share how you hardcoded not scaling the y axis on bed files?

I really need to be able to show that the maximum of 183 SNPs in a 10k region on Chr1 is a taller peak than the maximum of 12 SNPs in a 10k region on Chr2.
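
The difference being asked for can be illustrated with a small sketch that contrasts scaling each chromosome by its own maximum with scaling all chromosomes by one shared maximum. `scale_tracks` is a hypothetical helper, the non-peak counts are made up, and nothing here asserts which of the two plotsr applies when drawing.

```python
def scale_tracks(counts, per_chromosome=False):
    """Scale binned counts to the range [0, 1].

    counts: dict mapping chromosome name -> list of per-bin counts.
    per_chromosome=True divides each chromosome by its own maximum;
    per_chromosome=False divides every chromosome by the single largest value.
    """
    global_max = max(max(values) for values in counts.values())
    scaled = {}
    for chrom, values in counts.items():
        # "or 1" avoids dividing by zero on an all-zero track
        denom = (max(values) if per_chromosome else global_max) or 1
        scaled[chrom] = [x / denom for x in values]
    return scaled


# Peak counts (183 and 12) come from the comment; the other bins are invented.
counts = {"Chr1": [40, 183, 75], "Chr2": [3, 12, 7]}

shared = scale_tracks(counts)                         # Chr2 peak becomes 12/183
separate = scale_tracks(counts, per_chromosome=True)  # both peaks become 1.0
```

With a shared denominator the Chr2 peak stays much shorter than the Chr1 peak; with per-chromosome scaling the two peaks end up the same height.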

jnmaloof · 2023-08-14T20:52:19Z

Sadly I didn't comment my code, but I believe the relevant changes were in the readbedgraph function in the file func.py. My modified code for that function is at the bottom of this email. Changes either follow a commented line that has the original code, or in some cases the original code is commented at the end of a line.

My fully changed file is available at https://github.com/MaloofLab/Davis_B_napus_assembly_2023/blob/main/HE_Analysis/plotsr/modified_func.py , but we changed a lot of other things as well (custom colors for translocations based on the chromosome of origin, etc).

If you want a working example you can clone the whole repo and then run the code in https://github.com/MaloofLab/Davis_B_napus_assembly_2023/blob/main/HE_Analysis/Julin_plotsr.Rmd (it is an R file that first creates the plotting files and eventually calls the python plotsr)

    # Read input bedgraph file
    def _readbedgraph(self, chrlengths):
        from collections import deque, defaultdict
        import numpy as np
        from math import ceil
        import sys  # used by the sys.exit() call on unsorted input below

        bw = int(self.bw)
        _chrs = set([c for c in chrlengths[0][1].keys()])
        bincnt = defaultdict(deque)
        skipchrs = []
        curchr = ''
        added_chrs = list()
        with open(self.f, 'r') as fin:
            for line in fin:
                line = line.strip().split()
                try:
                    v = float(line[3])
                except (IndexError, ValueError):
                    # IndexError: fewer than 4 fields; ValueError: 4th field is not numeric
                    if len(line) < 4:
                        self.logger.warning("Incomplete information in bedgraph file at line: {}. Skipping it.".format("\t".join(line)))
                        continue
                if line[0] not in _chrs:
                    if line[0] == '#': continue
                    if line[0] == 'track': continue
                    if line[0] not in skipchrs:
                        self.logger.warning("Chromosome in BEDGRAPH is not present in FASTA or not selected for plotting. Skipping it. BED line: {}".format("\t".join(line)))
                        skipchrs.append(line[0])
                    continue
                if curchr == '':
                    curchr = line[0]
                    #binv = np.zeros(ceil(chrlengths[0][1][curchr]/bw), dtype=float)
                    binv = np.full(ceil(chrlengths[0][1][curchr]/bw), np.nan, dtype=float)
                    s = int(line[1])
                    e = int(line[2])
                    if s//bw == e//bw:
                        binv[s//bw] = v #+= (e-s)*v
                    else:
                        binv[s//bw] = v #+= (bw-(s%bw))*v
                        binv[e//bw] = v #+= (e%bw)*v
                elif curchr == line[0]:
                    s = int(line[1])
                    e = int(line[2])
                    if s//bw == e//bw:
                        binv[s//bw] = v #+= (e-s)*v
                    else:
                        binv[s//bw] = v #+= (bw-(s%bw))*v
                        binv[e//bw] = v #+= (e%bw)*v
                else:
                    if line[0] in added_chrs:
                        self.logger.error("BedGraph file: {} is not sorted. For plotting tracks, sorted BedGraph file is required. Exiting.".format(self.f))
                        sys.exit()
                    bins = np.concatenate((np.arange(0, chrlengths[0][1][curchr], bw), np.array([chrlengths[0][1][curchr]])), axis=0)
                    bins = [(bins[i] + bins[i+1])/2 for i in range(len(bins) - 1)]
                    bincnt[curchr] = deque([(bins[i], binv[i]) for i in range(len(bins))])
                    added_chrs.append(curchr)
                    # Set the new chromosome
                    curchr = line[0]
                    #binv = np.zeros(ceil(chrlengths[0][1][curchr]/bw), dtype=float)
                    binv = np.full(ceil(chrlengths[0][1][curchr]/bw), np.nan, dtype=float)
                    s = int(line[1])
                    e = int(line[2])
                    if s//bw == e//bw:
                        binv[s//bw] = v # += (e-s)*v
                    else:
                        binv[s//bw] = v # += (bw-(s%bw))*v
                        binv[e//bw] = v # += (e%bw)*v
        bins = np.concatenate((np.arange(0, chrlengths[0][1][curchr], bw), np.array([chrlengths[0][1][curchr]])), axis=0)
        bins = [(bins[i] + bins[i+1])/2 for i in range(len(bins) - 1)]
        bincnt[curchr] = deque([(bins[i], binv[i]) for i in range(len(bins))])
        ## Scale count values
        # maxv = 0
        # for k, v in bincnt.items():
        #     for r in v:
        #         if r[1] > maxv:
        #             maxv = r[1]
        # for k, v in bincnt.items():
        #     bincnt[k] = deque([(r[0], r[1]/maxv) for r in v])
        self.bincnt = bincnt
        return
    # END
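
To make the effect of the edit easier to see, here is a stripped-down, single-chromosome sketch of the bin-filling step. It is not the plotsr function itself: `fill_bins` and its `as_is` flag are hypothetical names, the index clamp is an addition of this sketch, and the input is assumed to be sorted `(start, end, value)` tuples.

```python
from math import ceil

import numpy as np


def fill_bins(intervals, chr_length, bw, as_is=True):
    """Simplified version of the bin-filling logic for one chromosome.

    as_is=True  : store the bedGraph value directly (the modified lines).
    as_is=False : accumulate value * overlapping bases (the commented-out lines).
    """
    nbins = ceil(chr_length / bw)
    # NaN marks "no data" in as-is mode; sums start from zero otherwise
    binv = np.full(nbins, np.nan) if as_is else np.zeros(nbins)
    for s, e, v in intervals:
        first = s // bw
        last = min(e // bw, nbins - 1)  # clamp added in this sketch only
        if as_is:
            binv[first] = v
            binv[last] = v
        elif first == last:
            binv[first] += (e - s) * v
        else:
            binv[first] += (bw - s % bw) * v
            binv[last] += (e % bw) * v
    return binv


windows = [(0, 8, 5.0), (12, 18, 7.0)]
raw = fill_bins(windows, chr_length=30, bw=10)                  # values kept as given
summed = fill_bins(windows, chr_length=30, bw=10, as_is=False)  # value * bases per bin
```

In as-is mode a later interval overwrites whatever an earlier one wrote to the same bin, so this only reproduces the input faithfully when there is at most one input window per bin. As in the quoted function, an interval spanning more than two bins touches only its first and last bin.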

mnshgl0110 added the enhancement New feature or request label Aug 14, 2023

KewinOgink mentioned this issue Sep 1, 2023

How is track height determined? #67

Closed

mnshgl0110 pushed a commit that referenced this issue May 22, 2024

Added a new base.cfg option norm to adjust the normalisation of track values (#67 #34)

e80c52f
