Programming puzzle: highest proportion of 1s in a substring

barbos · Dec 9, 2020

Jarhyn said:
barbos said:

Guess why I have 0% confidence in your "solution". Even without actually trying to understand it.

Click to expand...

Because you commonly disregard the ideas of others without trying to understand it, In my experience. You didn't have to be an ass.

Original problem (as you realized by now) is different from what you thought now and sane solution MUST involve division (multiplications if you are insane) densities. So I quickly scanned your code for them and found none, hence my conclusion.

Jarhyn · Dec 9, 2020

barbos said:
Jarhyn said:

barbos said:

Guess why I have 0% confidence in your "solution". Even without actually trying to understand it.

Click to expand...

Because you commonly disregard the ideas of others without trying to understand it, In my experience. You didn't have to be an ass.

Click to expand...

Original problem (as you realized by now) is different from what you thought now and sane solution MUST involve division (multiplications if you are insane) densities. So I quickly scanned your code for them and found none, hence my conclusion.

The most sane solution will use a comparator and a retained A/B against current C/D because of the cost of a flop. That said, the solution only requires division for the case where there is no string of size K. This is probably a good first pass filter to the problem regardless: it solves this case in linear time.

That said, I really do think there is a linear solution here. I'm thinking something like shrinking the pool by each end, and calculating whether lopping off regions of contiguous ones on one end or the other would decrease the density, and if it does, don't do that? Just something that's been stewing hard. like, my brain really expects this to be O=N.

barbos · Dec 9, 2020

Jarhyn said:
barbos said:

Original problem (as you realized by now) is different from what you thought now and sane solution MUST involve division (multiplications if you are insane) densities. So I quickly scanned your code for them and found none, hence my conclusion.

Click to expand...

The most sane solution will use a comparator and a retained A/B against current C/D because of the cost of a flop. That said, the solution only requires division for the case where there is no string of size K. This is probably a good first pass filter to the problem regardless: it solves this case in linear time.

That said, I really do think there is a linear solution here. I'm thinking something like shrinking the pool by each end, and calculating whether lopping off regions of contiguous ones on one end or the other would decrease the density, and if it does, don't do that? Just something that's been stewing hard. like, my brain really expects this to be O=N.

I am not sure what are you talking about, but go ahead and post solution without multiplication and division

My solution is O(N) already, technically.

barbos · Dec 9, 2020

Jayjay said:
Code:

double accuracy = max_hits / static_cast<double>(len);

Is there a reason to use static_cast instead of (double) ?

Jayjay · Dec 9, 2020

barbos said:
Jayjay said:

Code:

double accuracy = max_hits / static_cast<double>(len);

Click to expand...

Is there a reason to use static_cast instead of (double) ?

No reason except C++ style versus C.

barbos · Dec 10, 2020

Jayjay said:
barbos said:

Jayjay said:

Code:

double accuracy = max_hits / static_cast<double>(len);

Click to expand...

Is there a reason to use static_cast instead of (double) ?

Click to expand...

No reason except C++ style versus C.

Then I suggest changing to:

Code:

auto count_hits(const string& s) -> int
{
    int n{};
    for(char c : s)
        if(c == '1')
            ++n;
    return n;
}

As well

Need to get used to the new style of C++

Jarhyn · Dec 12, 2020

Ok, a complete solution now that I've had a minute to sit down and look at it.

I've done some things I've not done before, using the newer less-vexing-parse (thanks Jayjay!).

I had originally set out to write in hard C Because, well, I'm an embedded programmer... But with FreeRTOS and other embedded c++, and larger memory, I'm not sure that's necessary.

Instead I made it in c++, as a class. Some of the math here was really interesting. It's the first time I've had to do boolean term reduction in almost a decade.

This solution features TWO branches. My original "trivial density" solution but cleaned up, and then a second "hard density" solution for when the trivial case falls through.

I went with using the stack to manage the objects created in main similar to how some mutex locks are operated in QC.

In the interest of making readable code, I went with going by a linear time expand/contract model that will "worm" its way along the string and expand or retract the window.

main.cpp:

Code:

#include <iostream>
#include <cstring>
#include "kplus.h"


using namespace kplus;

int main(int argc, char *argv[])
{
	//               123456789012345678901234567890123456
	char string[] = "010100111101010100000111111110101110";
	char string2[] = "01010010101110110010101011110110101101011010010111101011010110101000100100000111101001101101001";
	const char* result;
	int hwm;

	DensestKPlus d1(string, strlen(string), 4);
	hwm = d1.GetAnswer(&result);
	printf("%d, %.*s\r\n", hwm, hwm, result);
	DensestKPlus d2(string2, strlen(string2), 5);
	hwm = d2.GetAnswer(&result);
	printf("%d, %.*s\r\n", hwm, hwm, result);
	while (1);
}

kplus.cpp:

Code:

#include "kplus.h"

namespace kplus
{
	//find longest substring of "1" in a string containing only 1 and 0.

	DensestKPlus::DensestKPlus(const char *const str, const int len, const int k)
		:	_str{ str }
		,	_len{ len }
		,	_k{ k } 
	{
		SetStr(str);
	} //END DensestKPlus::DensestKPlus(const char *const str, const int len, const int k) 

	void DensestKPlus::SetStr(const char * const str)
	{
		a = b = c = e = i =
			0;
		d = f =
			_k;
		_str = str;
	}

	int DensestKPlus::GetAnswer(const char** dest)
	{
		d = _k;
		f = _k;
		
		//don't repeat work.
		if (!c && !TrivialDensity() && d != b)
			HardDensity();

		*dest = answer;
		return d;
	} //END GetAnswer(const char** dest)

	bool DensestKPlus::TrivialDensity()
	{
		int found = 0; //temporary found counter to prevent doing work.

		//start the process at the head of the string. Why not?
		for (i = -1; i < _len; i += found ? (d - found) : d)
		{
			if (i + d > _len)
				break;

			//probe backward from the jump for a 1. we want to complete
			for (b = 0; b < (d - found) && Val(i + (d - b)); b++);

			//we have won, so far.
			if (b + found == d)
			{
				//raise while winning.
				for (; _len >(i + d + 1) && Val(i + d + 1); d++);
				answer = &(_str[i + 1]);
				c = d;
			}
			else
				found = b;
		}
		return c;
	}//END DensestKPlus::TrivialDensity()

	//nontrivial case. This is a slightly harder ask.
	void DensestKPlus::HardDensity()
	{
		//another special case; this only works when we do the first pass.
		//a knut is a hand that can't be beat ex:1110111 where k=4
		const int c_knut{ 2 * _k - 1 };
		const int d_knut{ 2 * _k - 2 };
		bool extend = true;

		//reset our pertienent variables; D and C are not impacted by trivial if fail
		i = 0;
		b = 0;
		
		do {
			if (extend)
				Extend();
			else
				Retract();
			
			//retain if is winner
			if (IsBetter())
			{
				LogLocalHWM();
				if (IsWinner())
					SaveWinner();
			}
			else
				if (extend = !extend) //flip extend; retraction resets our "local" since it's a score per i;
					ClearLocal();
				else				  //roll back a bad extension
					RestoreLocal();

		} while (i < _len && (b + i < _len || b > _k) && (c != c_knut || d != d_knut));
	} //END DensestKPlus::HardDensity()

}//END namespace kplus

kplus.h (helper functions, class definition, mechanical protection, inlines)

Code:

#ifndef KPLUS_H
#define KPLUS_H

#include <iostream>

namespace kplus
{
	class DensestKPlus
	{
	public:
		DensestKPlus(const char *const str, const int len, const int k);
		int GetAnswer(const char** answer);
		void SetStr(const char* const str);

	private:
		const char * _str;
		const int _len;
		const int _k;

		const char* answer{ NULL };

		//these may seem strange or meaningless, but they are not.
		//A/B > C/D when AD > CB.

		int a{ 0 };
		int b{ 0 };
		int c{ 0 };
		int d{ 0 };

		//E and F here for 
		int e{ 0 };
		int f{ 0 };

		int i{ 0 };

		int iSaved{ 0 };
		char lp{ 0 };

		bool TrivialDensity();
		void HardDensity();

		inline unsigned char Val(int x) { return (_str[x] - '0'); }
		inline void LogLocalHWM()
		{
			e = a;
			f = b;
			iSaved = i;
		}

		inline void SaveWinner()
		{
			c = a;
			d = b;
			answer = &_str[i];
		}

		inline void RestoreLocal()
		{
			a = e;
			b = f;
			iSaved = i;
		};

		inline void ClearLocal()
		{
			e = 0;
			f = _k;
		}

		inline bool IsWinner()
		{
			if ((b >= _k && a*d > b*c) || (b >= d && a*d == b * c))
				return true;
			return false;
		}

		inline bool IsBetter()
		{
			if ((b >= _k && a*f > b*e) || (b > f && a * f == b * e))
				return true;
			return false;
		}

		inline void Extend()
		{
			bool pad{ false };
			bool re{ false };
			bool fe{ false };

			//if we have less density at the front, we won't lose
			//any by moving it to the end instead... 
			for (; !Val(i) && (i + b< _len) && (b - i >= _k - 1); i++, b--);

			//advance up to K if less than, and not final
			for (; (i + b < _len) && (b < _k ); b++)
			{
				pad = true;
				if (Val(i + b)) a++;
				lp = Val(i + b);
			}

			//until we either padded and didn't get a 1, or we had a raising edge and didn't get a 1.			
			for (;(i + b < _len) && (Val(i+b) || (!re && !pad)); b++)
			{
				if (Val(i + b))
				{
					a++;
					re |= !lp;
				}
				lp = Val(i + b);
			}
		}

		inline void Retract()
		{
			bool re = false;
			bool fe = false;

			//extend didn't work out; move forward.
			for(;!(fe && Val(i)) && (i+_k < _len) && b > 0; i++, b--)
				if (Val(i))
					a--;
				else
					fe = true;
		}
	};
}
#endif /*KPLUS_H*/

barbos · Dec 12, 2020

This is some insane (in a bad way) coding.
here is a sane and straightforward piece of code which finds longest substring of 1s:

Code:

const char* find_char(const char* st, char what)
{
  while(*st)
  {
    if(*st == what) return st;
    st++;
  }
  return st;
}
int  longest_ones(const char* start)
{
  int best=0;
  const char* res=start;
  while(*start)
  {
    start=find_char(start,'1');
    const char* end=find_char(start,'0');
    int len=end-start;
    if(len>best)
    {
      best=len;
      res=start;
    }
    start=end;
  }
  printf("Longest substring of 1s: %.*s\n",best,res);
  return best;
}

Jarhyn · Dec 12, 2020

barbos said:

This is some insane (in a bad way) coding.
here is a sane and straightforward piece of code which finds longest substring of 1s:

Code:

const char* find_char(const char* st, char what)
{
  while(*st)
  {
    if(*st == what) return st;
    st++;
  }
  return st;
}
int  longest_ones(const char* start)
{
  int best=0;
  const char* res=start;
  while(*start)
  {
    start=find_char(start,'1');
    const char* end=find_char(start,'0');
    int len=end-start;
    if(len>best)
    {
      best=len;
      res=start;
    }
    start=end;
  }
  printf("Longest substring of 1s: %.*s\n",best,res);
  return best;
}

ni, its code that can skip very quickly through the string. there are some other changes to that end because i wanted O(N/k) best case at O(N) worst at N = K.

it is insane in the right way for an operation that must be repeated very fucking quickly, as a linear density function must. I'm making a gamble of being able to end it quickly, and then taking a method that I spelled out for you functionally, with plenty of comments.

at any rate, I realized some of my const-ness was unnecessarily bottlek-necking the entry rate, and then my husband dragged me to lunch, and I lost the window ro edit it.

the class, as revised, is written to take in a fixed length string and then repeat the operation for K and Len, to make a buffer scan of specific width and operate it ridiculously fast.

the hard case, well, thats functional.

I very much like that you hate it, because yiu don't understand the context of why its important.

TL;DR: I give the user a very simple to operate class, you throw strings at an instance, and it does job FAST.

Jayjay · Dec 12, 2020

Jarhyn, I'm not sure your algorithm even works. The code is too messy for me to check why, but I am getting wrong results with some random strings. Here is one:

Code:

101111001101011000001011000111100011101011110100101010010001110101011101010100101000001101000010000110001110100010011101011111111110
N = 132, k = 13

Jarhyn · Dec 12, 2020

Jayjay said:
Jarhyn, I'm not sure your algorithm even works. The code is too messy for me to check why, but I am getting wrong results with some random strings.

what code are you using to generate strings? i would like to use it, largely because i like to watch the code execute sometimes. it makes a pretty dance.

i have newer code but I'm not at my computer at the moment, and am quite stoned.

I'll test it against the random strings, and then see what it does and how it falls down. at any rate I'll look more in the morning. i'm stoned, i'm married, i'm entirely unattached except for my own "insane" purposes, and im not looking for a job with this problem.

barbos · Dec 12, 2020

Oh, I see, your code is faster than scanning every char in the string. OK. You should have clearly stated that, that would have avoided accusation of being an Indian.

Jayjay · Dec 12, 2020

Jarhyn said:
Jayjay said:

Jarhyn, I'm not sure your algorithm even works. The code is too messy for me to check why, but I am getting wrong results with some random strings.

Click to expand...

what code are you using to generate strings? i would like to use it, largely because i like to watch the code execute sometimes. it makes a pretty dance.

Code:

    const int N = 132;
    
    // generate a random string of length N
    string s;
    s.reserve(N);
    for(int i = 0; i < N; ++i)
        s += rand() % 2 + '0';
    
    int k = max(4, N/10);

Of course that could be more flexible, for example the density of 1s in a string could be a variable. I am deliberately not using a random seed (yet), so that the results are predictable.

Jarhyn · Dec 12, 2020

hmm, thanks! I'll dive into this again. as i said it is a really fun problem.

for the record, the system i work with normally is a magnetic sensor. the math inside that thing is gnarly. it took a team of 10 people writing it for 5 years, and it had to fit on a PIC33. oh, and it has to control an entire hardware platform in addition to the a analog signalling.

oh, and i am not going to bother reposting the code till tomorrow, and i've had a chance ro watch this i would like to compare const to non-const machines for timing, branch prediction as the consideration.

Jarhyn · Dec 13, 2020

I lied. I did the work I wanted to do. The error was a snafu in my strip-pad behavior before the actual extension. In main.cpp you will see a continue if < 4; that was just the first one in the progression I used that expressed the bug.

There are still things that can make this faster. I mention in some of my comments, particularly concerning the fact that at some point, the gamble of trivial density becomes bad, at high len with moderate k; at some point it will no longer be worth the N/K cycles to take a ~0:infinity chance. hard density needs to be capable of detecting c == d, though and then devolving into a trivial density solution.

main.cpp

Code:

#include <iostream>
#include <cstring>
#include <random>
#include "kplus.h"

using namespace kplus;

#define len (20)
#define max(a,b) ((a>b)?(a):(b))

int main(int argc, char *argv[])
{
	//len+1 for null terminator.
	char str[len+1];
	const char* result;
	
	int hwm{ 0 }, 
		result_len{ 0 },
		k{ 0 },
		i{ 0 },
		loops{ 0 };

	//this probably needs a constructor, though placement-new maybe?
	DensestKPlus dkp(NULL, len, 0);
	
	while (1)
	{
		for (i = 0; i < len; i++)
			str[i] = rand() % 2 + '0';
		str[i] = '\0';
		
		k = max(4, len / 10);
		
		printf("i = %d; k = %d, %.*s\r\n", loops++, k, len, str);
		
		//if (loops < 5) continue;
		
		dkp.SetStr(str, k);
		result = dkp.GetAnswer(&hwm, &result_len);
		printf("%d, %.*s\r\n", hwm, result_len, result);
	}
}

kplus.cpp

Code:

#include "kplus.h"

namespace kplus
{
	//find longest substring of "1" in a string containing only 1 and 0.

	/*
	*	DensestKPlus - A machine to locate the densest
	*					substring of size k or greater.
	*
	*	@param str - the new string target.
	*
	*	@param len - the length of the string
	*
	*	@param k - the new k
	*/
	DensestKPlus::DensestKPlus(const char * str, const int len, int k)
		:	_str{ str }
		,	_len{ len }
	{
		SetStr(str,k);
	} //END DensestKPlus::DensestKPlus(const char *const str, const int len, const int k) 

	/*
	*	GetAnswer - this is what you really want to see:
	*				The top level abstraction of all the rest.
	*				TrivialDensity is used to gamble on a substring
	*				of size K with density = 1.
	*				
	*				The Hard Density function, if the trivial fails.
	*				there is going to be a probabilistic zero on the
	*				"worth" graph of taking this gamble. This could
	*				be improved on by adding this constant to the
	*				test case here, and having HardDensity devolve
	*				to trivial.
	*				
	*	@param c - total 1's found per string;
	*
	*	@param d - the width of the segment.
	*
	*	@return - the answer for s.
	*/
	const char* DensestKPlus::GetAnswer(int * c, int * d)
	{
		if (!answer && !TrivialDensity())
			HardDensity();

		*c = this->c;
		*d = this->d;

		return answer;
	} //END GetAnswer(const char** dest)
	  
	/*
	*	@param str - the new string target.
	*/
	void DensestKPlus::SetStr(const char * str)
	{
		SetStr(str, _k);
	}

	/*
	*	SetStr - sets the string data
	*
	*	@param str - the new string target.
	*
	*	@param k - the new k
	*/
	void DensestKPlus::SetStr(const char * str, int k)
	{
		//our mathematical integers, and index.
		a = b = c = e = i =
			0;
		
		//our minimal denominators.
		d = f = _k = 
			k;
		
		//the string itself.
		_str = 
			str;

		answer =
			NULL;
	} //END SetStr(const char * const str, int k)
}//END namespace kplus

kplus.h

Code:

#ifndef KPLUS_H
#define KPLUS_H

#include <iostream>

namespace kplus
{
	class DensestKPlus
	{
	public:
		DensestKPlus(const char *const str, const int len, int k);

		const char* GetAnswer(int * c, int * d);
		void		SetStr(const char* const str);
		void		SetStr(const char* const str, int k);

	private:
		const char	* _str; //the string to search
		const int	_len;	//the length of the string
		int			_k;		//the length of the min substring.

		const char* answer{ NULL };

		//these may seem strange or meaningless, but they are not.
		//A/B > C/D when AD > CB.

		int a{ 0 };
		int b{ 0 };
		int c{ 0 };
		int d{ 0 };

		//E and F here for local
		int e{ 0 };
		int f{ 0 };

		//the index at head of search substr
		int i{ 0 };

		char lp{ 0 };

		/************************
		BEGIN  INLINE DEFINITIONS
		************************/

		// TrivialDensity - finds longest substr of len >= k at density = 1
		inline bool TrivialDensity()
		{
			int found = 0; //temporary found counter to prevent doing work.

			//start the process at the head of the string. Why not?
			for (i = -1; i < _len; i += found ? (d - found) : d)
			{
				//if our next jump is past the end, end it, we are done.
				if (i + d > _len)
					break;

				//probe backward from the jump for a 1. we want to complete
				for (b = 0; b < (d - found) && Val(i + (d - b)); b++);

				//we have won, so far.
				if (b + found == d)
				{
					//raise while winning.
					for (; _len > (i + d + 1) && Val(i + d + 1); d++);
					answer = &(_str[i + 1]);
					c = d;
				}
				else
					found = b;
			}
			return c;
		}//END DensestKPlus::TrivialDensity()

		//	HardDensity-
		//	find density of longest substring where trivial case exhausted
		//	once we pass the value threshold for trivial, perhaps have this
		//	to devolve back to trivial on first trivial?
		inline void HardDensity()
		{
			//another special case; this only works when we do the first pass.
			//a knut is a hand that can't be beat ex:1110111 where k=4
			const int c_knut{ 2 * _k - 1 };
			const int d_knut{ 2 * _k - 2 };
			bool extend = true;

			//reset our pertienent variables; D and C are not impacted by trivial if fail
			i = 0;
			b = 0;

			do {
				if (extend)
					Extend();
				else
					Retract();

				//retain if is winner
				if (IsBetter())
				{
					LogLocalHWM();
					if (IsWinner())
						SaveWinner();
				}
				else
					if (extend = !extend) //flip extend; retraction resets our "local" since it's a score per i;
						ClearLocal();
					else				  //roll back a bad extension
						RestoreLocal();

			} while (i < _len && (b + i < _len || b > _k) && (c != c_knut || d != d_knut));
		} //END DensestKPlus::HardDensity()

		//member management
		inline unsigned char Val(int x) { return (_str[x] - '0'); }
		inline void LogLocalHWM()
		{
			e = a;
			f = b;
		}

		inline void SaveWinner()
		{
			c = a;
			d = b;
			answer = &_str[i];
		}

		inline void RestoreLocal()
		{
			a = e;
			b = f;
		};

		inline void ClearLocal()
		{
			e = 0;
			f = _k;
		}

		//This is how we avoid flops
		inline bool IsWinner()
		{
			if ((b >= _k && a*d > b*c) || (b >= d && a*d == b * c))
				return true;
			return false;
		}

		//also to avoid flops
		inline bool IsBetter()
		{
			if ((b >= _k && a*f > b*e) || (b > f && a * f == b * e))
				return true;
			return false;
		}

		//extend the tail
		inline void Extend()
		{
			bool pad{ false };
			bool re{ false };
			bool fe{ false };

			//if we have less density at the front, we won't lose
			//any by moving it to the end instead... 
			for (; !Val(i) && (i + b < _len) && (b > 1 ); i++, b--);

			//advance up to K if less than, and not final
			for (; (i + b < _len) && (b < _k); b++)
			{
				pad = true;
				if (Val(i + b)) a++;
				lp = Val(i + b);
			}

			//until we either padded and didn't get a 1, or we had a raising edge and didn't get a 1.			
			for (; (i + b < _len) && (Val(i + b) || (!re && !pad)); b++)
			{
				if (Val(i + b))
				{
					a++;
					re |= !lp;
				}
				lp = Val(i + b);
			}
		}

		//retract the head towards tail.
		inline void Retract()
		{
			bool re = false;
			bool fe = false;

			//extend didn't work out; move forward.
			for (; !(fe && Val(i)) && (i + _k < _len) && b > 0; i++, b--)
				if (Val(i))
					a--;
				else
					fe = true;
		}
	};
}
#endif /*KPLUS_H*/

Jayjay · Dec 13, 2020

Still doesn't seem quite right. Try it with string "011101011111111110" and k=13. Your algo gives the answer "1011111111110" (last 13 characters) when it should be "1110101111111111".

barbos · Dec 13, 2020

Test mine

I added "extending". Does not really improve that much. Cumulative skips only small bit larger than before, but added code makes single n-pass a bit longer.
Not really worth it. "Better" is an enemy of "good"

Code:

#!/usr/bin/python3

def count(arr, r0):
  i=r0
  for i in range(r0, len(arr)):
    if arr[i] == 0:
      break
  return i-r0
def scan_array(arr,k, sum):
  best=sum
  solutions=[0]
  for i in range(k,len(arr)):
    sum+=(arr[i]-arr[i-k])
    if sum>best:
      best=sum
      solutions=[i-k+1]
    elif sum == best:
      solutions.append(i-k+1)
  ad2=0
  bs=solutions[0]
  for sol in solutions:
    n=count(arr,sol+k)
    if n>ad2:
      ad2=n
      bs=sol
  return bs, best, ad2

def scan_2k(s2,k):
  arr=[int(x) for x in s2]
  sum=0
  for i in range(k):
    sum+=arr[i]
  k2=min(2*k,len(arr))
  best_density=0.
  ones=0
  add=0
  skipped=0
  scanned=0
  pos=0
  bl=k
  for i in range(k,k2):
    if add:
      sum+=arr[i]
      skipped+=1
      add-=1
      continue
    ones=ones+1
    b=float(ones)/float(i)
    if b > best_density:
      scanned+=1
      p, ones, add = scan_array(arr,i,sum)
      ones+=add
      b=float(ones)/float(i+add)
      if(b>best_density):
        best_density=b
        pos=p
        bl=i+add
      add+=1
    else:
      skipped+=1
    sum+=arr[i]
  frm="k={} Best={} pos={} len={} skipped={}/{}"
  print(frm.format(k,best_density, pos,bl,skipped,skipped+scanned))
  print("substring", s2[pos:(pos+bl)], "\n")
str2="1011110011010110000010110001111000111010111101001010100100011101010111010101001010000011010000100001100011101000100111010111111111100000"
for i in range(1,50):
  scan_2k(str2, i)

scan_2k("011101011111111110", 13)
scan_2k("00000000000000000000000", 13)

barbos · Dec 13, 2020

Jayjay said:
Still doesn't seem quite right. Try it with string "011101011111111110" and k=13. Your algo gives the answer "1011111111110" (last 13 characters) when it should be "1110101111111111".

mine says:

k=13 Best=0.875 pos=1 len=16 skipped=4/5
substring 1110101111111111

Jarhyn · Dec 13, 2020

barbos said:

Test mine

I added "extending". Does not really improve that much. Cumulative skips only small bit larger than before, but added code makes single n-pass a bit longer.
Not really worth it. "Better" is an enemy of "good"

Code:

#!/usr/bin/python3

def count(arr, r0):
  i=r0
  for i in range(r0, len(arr)):
    if arr[i] == 0:
      break
  return i-r0
def scan_array(arr,k, sum):
  best=sum
  solutions=[0]
  for i in range(k,len(arr)):
    sum+=(arr[i]-arr[i-k])
    if sum>best:
      best=sum
      solutions=[i-k+1]
    elif sum == best:
      solutions.append(i-k+1)
  ad2=0
  bs=solutions[0]
  for sol in solutions:
    n=count(arr,sol+k)
    if n>ad2:
      ad2=n
      bs=sol
  return bs, best, ad2

def scan_2k(s2,k):
  arr=[int(x) for x in s2]
  sum=0
  for i in range(k):
    sum+=arr[i]
  k2=min(2*k,len(arr))
  best_density=0.
  ones=0
  add=0
  skipped=0
  scanned=0
  pos=0
  bl=k
  for i in range(k,k2):
    if add:
      sum+=arr[i]
      skipped+=1
      add-=1
      continue
    ones=ones+1
    b=float(ones)/float(i)
    if b > best_density:
      scanned+=1
      p, ones, add = scan_array(arr,i,sum)
      ones+=add
      b=float(ones)/float(i+add)
      if(b>best_density):
        best_density=b
        pos=p
        bl=i+add
      add+=1
    else:
      skipped+=1
    sum+=arr[i]
  frm="k={} Best={} pos={} len={} skipped={}/{}"
  print(frm.format(k,best_density, pos,bl,skipped,skipped+scanned))
  print("substring", s2[pos:(pos+bl)], "\n")
str2="1011110011010110000010110001111000111010111101001010100100011101010111010101001010000011010000100001100011101000100111010111111111100000"
for i in range(1,50):
  scan_2k(str2, i)

scan_2k("011101011111111110", 13)
scan_2k("00000000000000000000000", 13)

when doing as many passes as anything that does this, there are two concerns, one is reallocation, wich absolutely must be avoided.

also, yours is written in python with no comments or documentation, duck typing, indexing into bloated types and no inlining, and no reusability, and use of floating point arithmetic.

extend isn't correct because it directly lowers cycle count, its correct because it retains the integrity of the numerators and denominators while operating full segments and not testing every segment.

it also prevents rounding errors at high N and low K.

python is for big, high level things.

Jarhyn · Dec 13, 2020

Jayjay said:
Still doesn't seem quite right. Try it with string "011101011111111110" and k=13. Your algo gives the answer "1011111111110" (last 13 characters) when it should be "1110101111111111".

also, what is your i?

Programming puzzle: highest proportion of 1s in a substring

Contributor

Wizard

Contributor

Contributor

Contributor

Contributor

Wizard

Contributor

Wizard

Contributor

Wizard

Contributor

Contributor

Wizard

Wizard

Contributor

Contributor

Contributor

Wizard

Wizard