• Welcome to the new Internet Infidels Discussion Board, formerly Talk Freethought.

Programming puzzle: highest proportion of 1s in a substring

I would say your best bet is to jog your string and find the index of every 0. Then map their distances. You aren't doing an exhaustive search on every count, just looking for the high water mark on distance.

You have to read each value once, at a minimum. Why not just read each value .. exactly once?

For int pos = 0; pos < strlen; pos++{
if c == 0, {
diff = cur-pos
If diff > min && diff > hwm{
Hwm = diff
Longestpos = pos}
Pos = cur}}

Oh, I'm wrong, you can also skip work by jumping FORWARD by HWM at each 0 and then searching backwards for a breaking 0 from that forward point, possibly skipping HWM search bits

So, "11100011110001" could be searched by "111" found, skip forward at 0 by 3, see a 1, if you don't count back to the 0, restart from the first 0 before the 1 (since it cannot contain enough 1s to generate a 3 string, our HWM), and then count forward by 3 again, until we have either beaten our HWM or stepped forward again from a 0.

I'll post a general solver with short circuit look-ahead after work
 
Forgive the capitalized bits .. phone is being Derp
Code:
int first = 0
While(str[first] != '0')
 First++;
Int hwm = first>=(k-1)?first:0;
Int winner = first>=(k-1)?0:-1;
Int jump = k;
Int j;

For(int I = first; I < str.len; I+= jump)
{
  If (hwm != Jump && hwm > k)
    Jump = hwm;
  If (I + jump > len)
    Break;
  //Note, empty loop. This is to calculate j!
  For( j = 0; j < jump && str[I + jump - j]!='0' j++);
  If (j == jump)
  {
    While(len < (I+ jump +  1)&&str[I+jump+1] != '0')
      Jump++;
    //Can test >= for last instance, else > for first instance)
    If (jump > HWM)
    {
      hwm = jump;
      //I = '0', I+1 is our first nonzero!
      Winner = I+1;
    }
    
  }
  Else
    Jump -= j;
}

Or something like this. I haven't tested it.

There is probably a do-while formulation from "first 1", possibly saving some lines of code, but the idea is to jump past impossible segments that cannot contain a new winner.
 
Also, it's obvious that solution must start and end with 1s. This means that you don't need to scan every length=[k,.....,2k-1]
You can find max for some n and then try to extend it with 1s to some length=m>n and if you are successful you can skip these lengths in the search and continue with m+1. Proof is the same as for [k,.....,2k-1]
 
Also, it's obvious that solution must start and end with 1s. This means that you don't need to scan every length=[k,.....,2k-1]
You can find max for some n and then try to extend it with 1s to some length=m>n and if you are successful you can skip these lengths in the search and continue with m+1. Proof is the same as for [k,.....,2k-1]
The solution doesn't necessarily have to start and end with a 1. Consider sequence 001100 and k=4.
 
Last edited:
Also, it's obvious that solution must start and end with 1s. This means that you don't need to scan every length=[k,.....,2k-1]
You can find max for some n and then try to extend it with 1s to some length=m>n and if you are successful you can skip these lengths in the search and continue with m+1. Proof is the same as for [k,.....,2k-1]
The solution doesn't necessarily have to start and end with a 1. Consider sequence 001100 and k=4.
It's a degenerate case which does not change the algorithm:
1. Find solution for length=n,
2. expand it to length=m (if it's possible)
3. skip scanning lengths which are m or shorter.
 
Also, it's obvious that solution must start and end with 1s. This means that you don't need to scan every length=[k,.....,2k-1]
You can find max for some n and then try to extend it with 1s to some length=m>n and if you are successful you can skip these lengths in the search and continue with m+1. Proof is the same as for [k,.....,2k-1]
The solution doesn't necessarily have to start and end with a 1. Consider sequence 001100 and k=4.
It's a degenerate case which does not change the algorithm:
1. Find solution for length=n,
2. expand it to length=m (if it's possible)
3. skip scanning lengths which are m or shorter.

Ok, but which one of the solutions of size n do you try to expand? If it's always the last one, it seems rather arbitrary for long inputs. I suppose you could try to expand all of them as you find them and then pick the length of the longest expanded string as m.
 
It's a degenerate case which does not change the algorithm:
1. Find solution for length=n,
2. expand it to length=m (if it's possible)
3. skip scanning lengths which are m or shorter.

Ok, but which one of the solutions of size n do you try to expand? If it's always the last one, it seems rather arbitrary for long inputs. I suppose you could try to expand all of them as you find them and then pick the length of the longest expanded string as m.
Yes, all solutions for n should be tried for expansion. The longest one would obviously have highest density and will allow skipping more scans.
 
I have even better idea for skipping.
Have ratio for the current n-scan and compare to current maximum density record and see if it is even possible to exceed current max density in the next n+1, n+2,.... scan. If it is not possible then one can safely skip it.
It has zero cost over my current code.
 
Here is a code with skipping
Code:
#!/usr/bin/python3
def scan_array(arr,k, sum):
  pos=k-1
  best=sum
  for i in range(k,len(arr)):
    sum+=(arr[i]-arr[i-k])
    if sum>best:
      pos=i
      best=sum
  return pos-k+1, best

def scan_2k(str,k):
  arr=[int(x) for x in str]
  sum=0
  for i in range(k):
    sum+=arr[i]
  k2=min(2*k,len(arr))
  best_density=0.
  ones=0
  for i in range(k,k2):
    ones=ones+1
    b=float(ones)/float(i)
    if b > best_density:
      p, ones=scan_array(arr,i,sum)
      b=float(ones)/float(i)
      if(b>best_density):
        best_density=b
        pos=p
        bl=i
    else:
      print("skipping:", i)
    sum+=arr[i]
  frm="k={} Best={} pos={}"
  print(frm.format(k,best_density, pos))
  print("substring", str[pos:(pos+bl)])

str2="0110111011101111100011"
for i in range(3,10):
  scan_2k(str2, i)

If it reaches density=1 it skips the rest automatically.
 
Last edited:
Here is a code with skipping
Code:
#!/usr/bin/python3
def scan_array(arr,k, sum):
  pos=k-1
  best=sum
  for i in range(k,len(arr)):
    sum+=(arr[i]-arr[i-k])
    if sum>best:
      pos=i
      best=sum
  return pos-k+1, best

def scan_2k(str,k):
  arr=[int(x) for x in str]
  sum=0
  for i in range(k):
    sum+=arr[i]
  k2=min(2*k,len(arr))
  best_density=0.
  ones=0
  for i in range(k,k2):
    ones=ones+1
    b=float(ones)/float(i)
    if b > best_density:
      p, ones=scan_array(arr,i,sum)
      b=float(ones)/float(i)
      if(b>best_density):
        best_density=b
        pos=p
        bl=i
    else:
      print("skipping:", i)
    sum+=arr[i]
  frm="k={} Best={} pos={}"
  print(frm.format(k,best_density, pos))
  print("substring", str[pos:(pos+bl)])

str="0110111011101111100011"
for i in range(3,10):
  scan_2k(str, i)

If it reaches density=1 it skips the rest automatically.

Overwriting `str`? cheeky :)
 
Forgive the capitalized bits .. phone is being Derp
Code:
int first = 0
While(str[first] != '0')
 First++;
Int hwm = first>=(k-1)?first:0;
Int winner = first>=(k-1)?0:-1;
Int jump = k;
Int j;

For(int I = first; I < str.len; I+= jump)
{
  If (hwm != Jump && hwm > k)
    Jump = hwm;
  If (I + jump > len)
    Break;
  //Note, empty loop. This is to calculate j!
  For( j = 0; j < jump && str[I + jump - j]!='0' j++);
  If (j == jump)
  {
    While(len < (I+ jump +  1)&&str[I+jump+1] != '0')
      Jump++;
    //Can test >= for last instance, else > for first instance)
    If (jump > HWM)
    {
      hwm = jump;
      //I = '0', I+1 is our first nonzero!
      Winner = I+1;
    }
    
  }
  Else
    Jump -= j;
}

Or something like this. I haven't tested it.

There is probably a do-while formulation from "first 1", possibly saving some lines of code, but the idea is to jump past impossible segments that cannot contain a new winner.

Guess why I have 0% confidence in your "solution". Even without actually trying to understand it.
 
Forgive the capitalized bits .. phone is being Derp
Code:
int first = 0
While(str[first] != '0')
 First++;
Int hwm = first>=(k-1)?first:0;
Int winner = first>=(k-1)?0:-1;
Int jump = k;
Int j;

For(int I = first; I < str.len; I+= jump)
{
  If (hwm != Jump && hwm > k)
    Jump = hwm;
  If (I + jump > len)
    Break;
  //Note, empty loop. This is to calculate j!
  For( j = 0; j < jump && str[I + jump - j]!='0' j++);
  If (j == jump)
  {
    While(len < (I+ jump +  1)&&str[I+jump+1] != '0')
      Jump++;
    //Can test >= for last instance, else > for first instance)
    If (jump > HWM)
    {
      hwm = jump;
      //I = '0', I+1 is our first nonzero!
      Winner = I+1;
    }
    
  }
  Else
    Jump -= j;
}

Or something like this. I haven't tested it.

There is probably a do-while formulation from "first 1", possibly saving some lines of code, but the idea is to jump past impossible segments that cannot contain a new winner.

Guess why I have 0% confidence in your "solution". Even without actually trying to understand it.

Because you commonly disregard the ideas of others without trying to understand it, In my experience. You didn't have to be an ass.

At any rate, the solution works actually works, now.

returns 8, with a little tweaking for typos that I expected to exist.

Code:
//find longest substring of "1" in a string containing only 1 and 0.
int fn(char* str, int len, int k, char** substr)
{
    int first = 0;
    int found = 0;
    int hwm = 0;
    int winner = -1;
    int jump = k;
    int j;

    //start the process at the head of the string. Why not?
    for(int i = -1; i < len; i+= jump)
    {
        if (i + jump > len)
            break;
           
        jump = hwm > k ? hwm : k;

        //set jump if necessary to the larger size. For next time
        for (j = 0; j < jump && str[i + (jump - j)] != '0'; j++);
       
        if (j + found == jump)
            j += found;

        if (j == jump)
        {
            found = 0;
       
            while (len > (i + jump + 1) && str[i + jump +1]!= '0')
                jump++;
       
            if (jump > hwm)
            {
                hwm = jump;
                *substr = &(str[i+1]);
            }
        }
        else{
            jump -= j;
            found = j;
        }
    }
    return hwm;
}

int main(int argc, char *argv[])
{
    //               123456789012345678901234567890123456
    char string[] = "010100111101010100000111111110101110";
    char* result;
    int hwm = fn(string, 36, 4, &result);
    printf("%d, %s", hwm, result);
   
}

I'll take a look at how it's missing the count: I missed a +1
 
Forgive the capitalized bits .. phone is being Derp
Code:
int first = 0
While(str[first] != '0')
 First++;
Int hwm = first>=(k-1)?first:0;
Int winner = first>=(k-1)?0:-1;
Int jump = k;
Int j;

For(int I = first; I < str.len; I+= jump)
{
  If (hwm != Jump && hwm > k)
    Jump = hwm;
  If (I + jump > len)
    Break;
  //Note, empty loop. This is to calculate j!
  For( j = 0; j < jump && str[I + jump - j]!='0' j++);
  If (j == jump)
  {
    While(len < (I+ jump +  1)&&str[I+jump+1] != '0')
      Jump++;
    //Can test >= for last instance, else > for first instance)
    If (jump > HWM)
    {
      hwm = jump;
      //I = '0', I+1 is our first nonzero!
      Winner = I+1;
    }
    
  }
  Else
    Jump -= j;
}

Or something like this. I haven't tested it.

There is probably a do-while formulation from "first 1", possibly saving some lines of code, but the idea is to jump past impossible segments that cannot contain a new winner.

Guess why I have 0% confidence in your "solution". Even without actually trying to understand it.

Because you commonly disregard the ideas of others without trying to understand it, In my experience. You didn't have to be an ass.

At any rate, the solution works actually works, now.

returns 8, with a little tweaking for typos that I expected to exist.

Code:
//find longest substring of "1" in a string containing only 1 and 0.
int fn(char* str, int len, int k, char** substr)
{
    int first = 0;
    int found = 0;
    int hwm = 0;
    int winner = -1;
    int jump = k;
    int j;

    //start the process at the head of the string. Why not?
    for(int i = -1; i < len; i+= jump)
    {
        if (i + jump > len)
            break;
           
        jump = hwm > k ? hwm : k;

        //set jump if necessary to the larger size. For next time
        for (j = 0; j < jump && str[i + (jump - j)] != '0'; j++);
       
        if (j + found == jump)
            j += found;

        if (j == jump)
        {
            found = 0;
       
            while (len > (i + jump + 1) && str[i + jump +1]!= '0')
                jump++;
       
            if (jump > hwm)
            {
                hwm = jump;
                *substr = &(str[i+1]);
            }
        }
        else{
            jump -= j;
            found = j;
        }
    }
    return hwm;
}

int main(int argc, char *argv[])
{
    //               123456789012345678901234567890123456
    char string[] = "010100111101010100000111111110101110";
    char* result;
    int hwm = fn(string, 36, 4, &result);
    printf("%d, %s", hwm, result);
   
}

I'll take a look at how it's missing the count: I missed a +1

Your solution may work, but it seems to solve a different problem. Also, result is not null terminated so you can't print it out with %s.
 
Because you commonly disregard the ideas of others without trying to understand it, In my experience. You didn't have to be an ass.

At any rate, the solution works actually works, now.

returns 8, with a little tweaking for typos that I expected to exist.

Code:
//find longest substring of "1" in a string containing only 1 and 0.
int fn(char* str, int len, int k, char** substr)
{
    int first = 0;
    int found = 0;
    int hwm = 0;
    int winner = -1;
    int jump = k;
    int j;

    //start the process at the head of the string. Why not?
    for(int i = -1; i < len; i+= jump)
    {
        if (i + jump > len)
            break;
           
        jump = hwm > k ? hwm : k;

        //set jump if necessary to the larger size. For next time
        for (j = 0; j < jump && str[i + (jump - j)] != '0'; j++);
       
        if (j + found == jump)
            j += found;

        if (j == jump)
        {
            found = 0;
       
            while (len > (i + jump + 1) && str[i + jump +1]!= '0')
                jump++;
       
            if (jump > hwm)
            {
                hwm = jump;
                *substr = &(str[i+1]);
            }
        }
        else{
            jump -= j;
            found = j;
        }
    }
    return hwm;
}

int main(int argc, char *argv[])
{
    //               123456789012345678901234567890123456
    char string[] = "010100111101010100000111111110101110";
    char* result;
    int hwm = fn(string, 36, 4, &result);
    printf("%d, %s", hwm, result);
   
}

I'll take a look at how it's missing the count: I missed a +1

Your solution may work, but it seems to solve a different problem.

Ah, I see now. I read the problem wrong. Most dense substring of any length, not the longest substring of solid, minimal length k.
 
Because you commonly disregard the ideas of others without trying to understand it, In my experience. You didn't have to be an ass.

At any rate, the solution works actually works, now.

returns 8, with a little tweaking for typos that I expected to exist.

Code:
//find longest substring of "1" in a string containing only 1 and 0.
int fn(char* str, int len, int k, char** substr)
{
    int first = 0;
    int found = 0;
    int hwm = 0;
    int winner = -1;
    int jump = k;
    int j;

    //start the process at the head of the string. Why not?
    for(int i = -1; i < len; i+= jump)
    {
        if (i + jump > len)
            break;
           
        jump = hwm > k ? hwm : k;

        //set jump if necessary to the larger size. For next time
        for (j = 0; j < jump && str[i + (jump - j)] != '0'; j++);
       
        if (j + found == jump)
            j += found;

        if (j == jump)
        {
            found = 0;
       
            while (len > (i + jump + 1) && str[i + jump +1]!= '0')
                jump++;
       
            if (jump > hwm)
            {
                hwm = jump;
                *substr = &(str[i+1]);
            }
        }
        else{
            jump -= j;
            found = j;
        }
    }
    return hwm;
}

int main(int argc, char *argv[])
{
    //               123456789012345678901234567890123456
    char string[] = "010100111101010100000111111110101110";
    char* result;
    int hwm = fn(string, 36, 4, &result);
    printf("%d, %s", hwm, result);
   
}

I'll take a look at how it's missing the count: I missed a +1

Your solution may work, but it seems to solve a different problem. Also, result is not null terminated so you can't print it out with %s.

I wasn't looking to snprintf or any of that, I just wanted to see the remainder of the string. I could have done the strtok thing and nulled the character, I guess? I just wanted the index properly calculated.

Interesting to me is the fact that for the case where there are K or more consecutive 1's my solution does solve the problem. I am almost certain both N! And potentially even a linear solution is possible, though. The N! Solution would be (find the maximally dense string of at least K size for each point going forward from index i++), with some early termination possible.

There's a fair bit of work necessary to prove out a linear solution though for a non-consecutive best K+

I'll also note that using (ad>bc) can free the function entirely of division and floating point math in general.
 
Last edited:
Because you commonly disregard the ideas of others without trying to understand it, In my experience. You didn't have to be an ass.

At any rate, the solution works actually works, now.

returns 8, with a little tweaking for typos that I expected to exist.

Code:
//find longest substring of "1" in a string containing only 1 and 0.
int fn(char* str, int len, int k, char** substr)
{
    int first = 0;
    int found = 0;
    int hwm = 0;
    int winner = -1;
    int jump = k;
    int j;

    //start the process at the head of the string. Why not?
    for(int i = -1; i < len; i+= jump)
    {
        if (i + jump > len)
            break;
           
        jump = hwm > k ? hwm : k;

        //set jump if necessary to the larger size. For next time
        for (j = 0; j < jump && str[i + (jump - j)] != '0'; j++);
       
        if (j + found == jump)
            j += found;

        if (j == jump)
        {
            found = 0;
       
            while (len > (i + jump + 1) && str[i + jump +1]!= '0')
                jump++;
       
            if (jump > hwm)
            {
                hwm = jump;
                *substr = &(str[i+1]);
            }
        }
        else{
            jump -= j;
            found = j;
        }
    }
    return hwm;
}

int main(int argc, char *argv[])
{
    //               123456789012345678901234567890123456
    char string[] = "010100111101010100000111111110101110";
    char* result;
    int hwm = fn(string, 36, 4, &result);
    printf("%d, %s", hwm, result);
   
}

I'll take a look at how it's missing the count: I missed a +1

Your solution may work, but it seems to solve a different problem. Also, result is not null terminated so you can't print it out with %s.

I wasn't looking to snprintf or any of that, I just wanted to see the remainder of the string. I could have done the strtok thing and nulled the character, I guess? I just wanted the index properly calculated.
Code:
printf("%d, %.*s", hwm, hwm, result);
 
I wasn't looking to snprintf or any of that, I just wanted to see the remainder of the string. I could have done the strtok thing and nulled the character, I guess? I just wanted the index properly calculated.
Code:
printf("%d, %.*s", hwm, hwm, result);

Interesting. I've been writing c++ for 20 years, and this is my first time seeing a length operator for printf. Learn something new every day I guess. For my own defense, I've never needed it, and it's hard to read. That said, I won't be able to start milling out a roll at the non-consecutive case until tomorrow night or maybe lunch tomorrow.
 
Back
Top Bottom