barbos
Contributor
It's python, reallocation is a last thing you should worry about.Test mine
I added "extending". Does not really improve that much. Cumulative skips only small bit larger than before, but added code makes single n-pass a bit longer.
Not really worth it. "Better" is an enemy of "good"
Code:#!/usr/bin/python3 def count(arr, r0): i=r0 for i in range(r0, len(arr)): if arr[i] == 0: break return i-r0 def scan_array(arr,k, sum): best=sum solutions=[0] for i in range(k,len(arr)): sum+=(arr[i]-arr[i-k]) if sum>best: best=sum solutions=[i-k+1] elif sum == best: solutions.append(i-k+1) ad2=0 bs=solutions[0] for sol in solutions: n=count(arr,sol+k) if n>ad2: ad2=n bs=sol return bs, best, ad2 def scan_2k(s2,k): arr=[int(x) for x in s2] sum=0 for i in range(k): sum+=arr[i] k2=min(2*k,len(arr)) best_density=0. ones=0 add=0 skipped=0 scanned=0 pos=0 bl=k for i in range(k,k2): if add: sum+=arr[i] skipped+=1 add-=1 continue ones=ones+1 b=float(ones)/float(i) if b > best_density: scanned+=1 p, ones, add = scan_array(arr,i,sum) ones+=add b=float(ones)/float(i+add) if(b>best_density): best_density=b pos=p bl=i+add add+=1 else: skipped+=1 sum+=arr[i] frm="k={} Best={} pos={} len={} skipped={}/{}" print(frm.format(k,best_density, pos,bl,skipped,skipped+scanned)) print("substring", s2[pos:(pos+bl)], "\n") str2="1011110011010110000010110001111000111010111101001010100100011101010111010101001010000011010000100001100011101000100111010111111111100000" for i in range(1,50): scan_2k(str2, i) scan_2k("011101011111111110", 13) scan_2k("00000000000000000000000", 13)
when doing as many passes as anything that does this, there are two concerns, one is reallocation, wich absolutely must be avoided.
Yours does not work yet, and does not really have comments/documentation.also, yours is written in python with no comments or documentation, duck typing, indexing into bloated types and no inlining, and no reusability, and use of floating point arithmetic.
Again, you are complaining about python being python.
I used it to demonstrate that algorithm is working and efficient (and it is) and because it was used before me, not because I wanted to crash real world speed records.
OK, I did not ask you about that.extend isn't correct because it directly lowers cycle count, its correct because it retains the integrity of the numerators and denominators while operating full segments and not testing every segment.
it also prevents rounding errors at high N and low K.
python is for big, high level things.
Rounding errors for substrings k=10^15 characters long. That's 1000 terabytes of characters. or 1000 terabits of bits
Your 64bit integer arithmetic is limited to 4GB strings. So I think I am better than you in this regard