
Algorithms

I am not set up to do it right now. It would be easy to create a random array of values, create an array of random indexes, and compare the average search time (the number of trials to find a value) against a linear search of the data array.

In the linear search you'll find it on average in n/2 tries.

In the random case you'll find it on average in n/2 unique tries. Duplicate tries have no chance of finding it, and you have the overhead of the random-number call added to the code.

It should be obvious that the latter is on average slower.
 
For data with length = n, searching unsorted data will have a search time O(n). Sorting it will take time O(n*log(n)), but searching can go down to O(log(n)) by doing binary search. One can go even further with a hash table. One creates a table with size h, and one maps each data value into one of the h table indices. This works O(1) on average for n < h, but for n > h, one gets lots of "hash collisions", and one gets O(n/h) for unsorted data in each hash bin, and O(log(n/h)) for sorted data.
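
As a rough sketch of the binary-search step (untested, plain ints, assuming the array is already sorted):

Code:
#include <vector>

// Returns the index of key in the sorted vector v, or -1 if it is absent.
// Each step halves the remaining range, hence O(log(n)) comparisons.
int binary_search_index(const std::vector<int> &v, int key)
{
    int lo = 0, hi = (int)v.size() - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (v[mid] == key) return mid;
        else if (v[mid] < key) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}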


Hash searching only works for known data, data values that are among those used to create the hash table. But I once created a hash search that checked to see whether the input value was in the hash table, and if it wasn't, then it did a full-scale search and updated the hash table with it.
 
Hashing is essentially a coarse presort. Hence the term 'hash tag'. To add to the list, the new entry is run through an algorithm to generate a tag, and the new item is added to the bucket for that hash tag.

To search, run the search item through the same algorithm and then search the bucket for that hash tag.

You could have a hash tag based on the first two letters of a name. All names with those two letters in any order are placed in a file or an array in memory.
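
A minimal sketch of that kind of bucket scheme, keyed here by the first two letters (assuming lowercase ASCII names at least two characters long; the layout is just for illustration):

Code:
#include <string>
#include <vector>

// Coarse "presort" into 26*26 buckets keyed by a name's first two letters.
struct NameBuckets
{
    std::vector<std::vector<std::string>> buckets;
    NameBuckets() : buckets(26 * 26) {}

    static int tag(const std::string &name)
    {
        return (name[0] - 'a') * 26 + (name[1] - 'a');
    }
    void add(const std::string &name) { buckets[tag(name)].push_back(name); }

    // To search, compute the same tag and scan only that one bucket.
    bool contains(const std::string &name) const
    {
        for (const std::string &s : buckets[tag(name)])
            if (s == name) return true;
        return false;
    }
};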
 
I'll now get into sorting algorithms. Several of them have been devised.

Bubble sort is a simple and dumb one. Compare neighbors going from one end to the other, swapping them if they are out of order. Repeat until sorted. Time: O(n^2)
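
An untested sketch on an int array:

Code:
#include <vector>
#include <utility>

// Bubble sort: sweep the array, swapping out-of-order neighbors,
// and repeat until a sweep makes no swaps.  O(n^2) comparisons in general.
void bubble_sort(std::vector<int> &a)
{
    bool swapped = true;
    while (swapped)
    {
        swapped = false;
        for (size_t i = 1; i < a.size(); ++i)
            if (a[i - 1] > a[i]) { std::swap(a[i - 1], a[i]); swapped = true; }
    }
}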

Insertion sort builds up a sorted prefix of the array: each new member is taken in turn and inserted into its proper place among the already-sorted members. Time: O(n^2)

Selection sort repeatedly looks for the minimum (or maximum) of the still-unsorted members and exchanges it with the member at the beginning (or end) of that part, growing the sorted part by one each time. Time: O(n^2)
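
Untested sketches of both, again on an int array:

Code:
#include <vector>
#include <utility>

// Insertion sort: grow a sorted prefix by inserting each new member
// into its place within it.  O(n^2) in general, fast on nearly-sorted data.
void insertion_sort(std::vector<int> &a)
{
    for (size_t i = 1; i < a.size(); ++i)
    {
        int x = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > x) { a[j] = a[j - 1]; --j; }
        a[j] = x;
    }
}

// Selection sort: find the minimum of the unsorted part and swap it
// onto the end of the sorted prefix.  Always O(n^2) comparisons.
void selection_sort(std::vector<int> &a)
{
    for (size_t i = 0; i + 1 < a.size(); ++i)
    {
        size_t imin = i;
        for (size_t j = i + 1; j < a.size(); ++j)
            if (a[j] < a[imin]) imin = j;
        std::swap(a[i], a[imin]);
    }
}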

Shell sort, named after its inventor, Donald Shell, works like insertion sort, but done over members a fixed gap apart. The gaps start out around half the size of the list, then get smaller and smaller until they become 1. The long-distance comparisons are there because some list members may have to be moved a long way in the sorting. Time: around O(n^(3/2)), depending on the gap sequence.

Mergesort involves dividing the list into two halves, sorting them, then merging the sorted half-lists. This kind of sort can be done recursively. Time: O(n*log(n))
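
An untested sketch, leaning on std::inplace_merge for the merge step:

Code:
#include <vector>
#include <algorithm>

// Mergesort: sort each half recursively, then merge the two sorted halves.
// O(n*log(n)) comparisons.  Call as merge_sort(a, 0, a.size()).
void merge_sort(std::vector<int> &a, size_t lo, size_t hi)   // sorts a[lo, hi)
{
    if (hi - lo < 2) return;
    size_t mid = lo + (hi - lo) / 2;
    merge_sort(a, lo, mid);
    merge_sort(a, mid, hi);
    std::inplace_merge(a.begin() + lo, a.begin() + mid, a.begin() + hi);
}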

Heapsort involves building an in-place heap and then repeatedly removing its top member to produce the sorted order. Time: O(n*log(n))

Quicksort involves selecting one member as a "pivot", and then moving all members less than it to one side of it, and all members greater than it to the other side. This kind of sort can be done recursively. Time: O(n*log(n))

Quicksort3 is a modification of Quicksort that sorts members into three regions: less than the pivot, equal to the pivot, and greater than the pivot. It is sometimes called the Dutch flag algorithm from that nation having a flag divided into red, white, and blue horizontal bars, from top to bottom. Time: O(n*log(n))
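
An untested sketch of the three-way version (the pivot is taken from the middle here; only the outer regions recurse):

Code:
#include <vector>
#include <utility>

// Quicksort with a three-way ("Dutch national flag") partition:
// members < pivot, == pivot, > pivot.  Call as quicksort3(a, 0, (int)a.size() - 1).
void quicksort3(std::vector<int> &a, int lo, int hi)   // sorts a[lo, hi]
{
    if (lo >= hi) return;
    int pivot = a[lo + (hi - lo) / 2];
    int lt = lo, i = lo, gt = hi;
    while (i <= gt)
    {
        if (a[i] < pivot)      std::swap(a[lt++], a[i++]);
        else if (a[i] > pivot) std::swap(a[i], a[gt--]);
        else                   ++i;
    }
    quicksort3(a, lo, lt - 1);   // region strictly below the pivot value
    quicksort3(a, gt + 1, hi);   // region strictly above the pivot value
}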
 
Of these, Quicksort gives the best performance on average, though its worst case is O(n^2).


Related to sorting are algorithms for calculating convex hulls, Delaunay tessellations, and Voronoi diagrams of points in n-D flat space. A Delaunay tessellation is a division of part of the space into n-D simplexes, each with its (n+1) vertices at the given points and with a circumscribed (hyper)sphere that contains no other points. A convex hull is the boundary of a Delaunay tessellation, made of (n-1)-D simplexes. A Voronoi diagram is the dual of a Delaunay tessellation: it breaks the space into regions, each consisting of the locations closest to one of the points.

In particular, sorting is a 1D version of a Delaunay tessellation, since it involves finding which points are neighbors. In 2D, a Delaunay tessellation is called a Delaunay triangulation. A 1D convex hull is the minimum and maximum points, and a 2D convex hull is a polygon.

An n-D simplex is a set of (n+1) vertices all connected to each other. 0D: point, 1D: line segment, 2D: triangle, 3D: tetrahedron, ...
 
There are several algorithms for doing Delaunay triangulations / tessellations, and they often calculate convex hulls as they go.

A simple one that I've coded is to create a preliminary triangulation and then do edge flipping to improve it. This means taking two triangles with a shared edge and making that edge run between the remaining two vertices instead. That is done whenever one of those two vertices lies inside the other triangle's circumscribed circle (circumcircle), that is, whenever the empty-circumcircle condition is violated.
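
The circumcircle test itself can be written as the sign of a determinant. A sketch with plain doubles (robust implementations use exact or adaptive arithmetic):

Code:
// Incircle predicate for the edge-flip test.  For a counterclockwise triangle
// (a, b, c), the result is > 0 exactly when point d lies inside its circumcircle.
struct Pt { double x, y; };

double incircle(Pt a, Pt b, Pt c, Pt d)
{
    double ax = a.x - d.x, ay = a.y - d.y;
    double bx = b.x - d.x, by = b.y - d.y;
    double cx = c.x - d.x, cy = c.y - d.y;
    double a2 = ax * ax + ay * ay;
    double b2 = bx * bx + by * by;
    double c2 = cx * cx + cy * cy;
    return ax * (by * c2 - b2 * cy)
         - ay * (bx * c2 - b2 * cx)
         + a2 * (bx * cy - by * cx);
}
// The edge shared by triangles (a, b, c) and (a, b, d) is flipped to run
// between c and d when incircle(a, b, c, d) > 0.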

Descriptions of this algorithm usually prescribe doing the flipping after adding each point, but I prefer to do it after adding all the points. The first three of them make the first triangle, with the convex hull being its perimeter. For the remaining points, if a point falls inside an existing triangle, then that triangle is split into three triangles, with that point a vertex shared by them. Otherwise, triangles are added from the hull to the new point, when they do not overlap existing triangles, and the hull is moved outward to contain the new point.

A variation is the radial sweephull algorithm, where one adds points in radially sorted order from some starting point. This guarantees that each point added will be outside the existing triangulation.

Another one is divide-and-conquer. This one divides the points into two subsets using some dividing line, finding triangulations for them, and then stitching those triangulations together across that line. One then improves the overall triangulation with edge flipping. This algorithm has obvious analogies with mergesort.
 
For data with length = n, searching unsorted data will have a search time O(n). Sorting it will take time O(n*log(n)), but searching can go down to O(log(n)) by doing binary search. One can go even further with a hash table. One creates a table with size h, and one maps each data value into one of the h table indices. This works O(1) on average for n < h, but for n > h, one gets lots of "hash collisions", and one gets O(n/h) for unsorted data in each hash bin, and O(log(n/h)) for sorted data.


Hash searching only works for known data, data values that are among those used to create the hash table. But I once created a hash search that checked to see whether the input value was in the hash table, and if it wasn't, then it did a full-scale search and updated the hash table with it.

Searching is n/2 because on average you'll only have to look at half of it before finding your value.

And hash searching can also work when you're going to do enough searches that it's worthwhile to scan the entire space and build hashes. Or save the hashes from when you did look at the whole thing. There's also the quasi-hash I wrote once: I hashed the data but then folded the hash down to 32 bits, and those values were saved. I loaded the entire set of 32-bit values at the start. While 32 bits certainly wasn't enough to identify the record I wanted, it was enough to reject the vast majority of records. I only needed to check the real data (which was on disk) one in 4 billion times.
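
Something along those lines, as an untested sketch (the fold and the lookup layout here are assumptions, not the original code):

Code:
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// 32-bit "quasi-hash" prefilter: keep one folded hash per on-disk record in
// memory.  A mismatch rejects the record without touching the disk; a match
// still has to be confirmed against the real record.
std::uint32_t fold32(const std::string &key)
{
    std::uint64_t h = std::hash<std::string>{}(key);
    return (std::uint32_t)(h ^ (h >> 32));
}

// fingerprints[i] holds fold32 of record i's key, loaded once at startup.
bool worth_reading_from_disk(const std::vector<std::uint32_t> &fingerprints,
                             std::size_t record, const std::string &key)
{
    return fingerprints[record] == fold32(key);
}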
 
For data with length = n, searching unsorted data will have a search time O(n). Sorting it will take time O(n*log(n)), but searching can go down to O(log(n)) by doing binary search.
You can generally sort in linear time, O(n), using radix sort. (There are sorting orders it doesn't work on, but it works on numeric and lexical orders, which are far and away the most commonly needed ones.)

Of these, Quicksort gives the best performance on average, though its worst case is O(n^2).
For "best performance", the devil is always in the details. Going strictly by number of comparisons, Mergesort beats Quicksort by a small constant factor. But for a lot of applications on a lot of machines, Quicksort will make up the difference with better memory locality and fewer cache misses. (Also, there are implementation tricks you can do in Quicksort that make its O(n^2) worst case vanishingly unlikely.)
 
If one only wants a convex hull, then one can use several algorithms, especially in 2D.

A simple one is Jarvis's march, or gift wrapping. One starts with an extreme point, such as the farthest point in some direction, together with a reference direction that makes it extreme. One then finds the angles of the other points relative to that direction and selects the point with the smallest angle on a chosen side. One adds that point to the hull, takes the direction toward it as the new reference direction, and repeats until one returns to the first point. Time: O(n*h) for n points and h hull points.
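
An untested sketch, using a cross-product orientation test instead of explicit angles, and assuming the points are in general position:

Code:
#include <vector>

struct Point { double x, y; };

// Cross product of (b - a) x (c - a): negative means c is to the right of a->b.
double cross(const Point &a, const Point &b, const Point &c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Jarvis march / gift wrapping.  Start from the leftmost point and repeatedly
// pick the next point so that no input point lies to the right of the new hull
// edge.  The hull comes out in counterclockwise order; O(n*h) overall.
std::vector<Point> gift_wrap(const std::vector<Point> &pts)
{
    size_t start = 0;
    for (size_t i = 1; i < pts.size(); ++i)
        if (pts[i].x < pts[start].x) start = i;

    std::vector<Point> hull;
    size_t p = start;
    do {
        hull.push_back(pts[p]);
        size_t q = (p + 1) % pts.size();
        for (size_t r = 0; r < pts.size(); ++r)
            if (cross(pts[p], pts[q], pts[r]) < 0)   // r is to the right of p->q, so take it
                q = r;
        p = q;
    } while (p != start);
    return hull;
}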

Graham's scan is like gift wrapping, but works with points sorted in order of their angles relative to the first point and its vector. One searches for each new hull point in the direction of increasing angle. Time: O(n*log(n))

Quickhull is like quicksort. One starts with the two most separated points along one direction, connects them with a line, then finds the point that is the farthest from that line. All three points connected form a triangle, the initial hull. Drop the points inside the triangle from further consideration. For each edge, look for the farthest outside point, and add that point to the hull. The edge with that point forms a triangle, and drop all points inside of it. Repeat with each edge until no more points can be added to the hull. Time: O(n*log(n)) on average, O(n^2) in the worst case, like quicksort.

Divide and conquer. Like for Delaunay triangulation, except that one calculates a hull for each part. One then connects the two hulls to make the overall one.

Andrew's monotone chain, like Graham's scan, uses sorted vertices, but sorted along some linear direction, like a coordinate axis. One first finds the points with minimum and maximum position along that direction, then constructs a half-hull on each side of the line between the two points. Combining the two half-hulls gives the overall one.
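
An untested sketch of the monotone chain (sorting by x, breaking ties by y):

Code:
#include <vector>
#include <algorithm>

struct Pt2 { double x, y; };

// > 0 when the turn o -> a -> b is counterclockwise.
double turn(const Pt2 &o, const Pt2 &a, const Pt2 &b)
{
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Build the lower half-hull left to right, the upper half-hull right to left,
// and join them.  O(n*log(n)) because of the sort.
std::vector<Pt2> monotone_chain(std::vector<Pt2> pts)
{
    std::sort(pts.begin(), pts.end(), [](const Pt2 &a, const Pt2 &b)
              { return a.x < b.x || (a.x == b.x && a.y < b.y); });
    if (pts.size() < 3) return pts;

    std::vector<Pt2> hull(2 * pts.size());
    size_t k = 0;
    for (size_t i = 0; i < pts.size(); ++i)                 // lower half-hull
    {
        while (k >= 2 && turn(hull[k - 2], hull[k - 1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    for (size_t i = pts.size() - 1, t = k + 1; i-- > 0; )   // upper half-hull
    {
        while (k >= t && turn(hull[k - 2], hull[k - 1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    hull.resize(k - 1);     // the last point repeats the first; drop it
    return hull;
}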

The Kirkpatrick–Seidel algorithm, "the ultimate planar convex hull algorithm", is something like divide and conquer, but I have not been able to interpret it any further.

Chan's algorithm consists of dividing up the points into several subsets, finding the convex hull from each subset, and then finding the overall convex hull by treating these hulls' points as input points.


There is a nice trick for reducing the number of points that one needs to look at. In 2D, it's to remove the points inside of the quad (max x) - (max y) - (min x) - (min y). One can do similar things in more dimensions, like remove a similar sort of irregular octahedron in 3D.

Some of these algorithms can be generalized to more dimensions, like gift wrapping.
 
For data with length = n, searching unsorted data will have a search time O(n). Sorting it will take time O(n*log(n)), but searching can go down to O(log(n)) by doing binary search.
You can generally sort in linear time, O(n), using radix sort. (There are sorting orders it doesn't work on, but it works on numeric and lexical orders, which are far and away the most commonly needed ones.)
For unsigned integers: repeat over all the digits, starting with the ones digit and working upwards:

Take the numbers and put them into two bins, the 0's and the 1's, according to the value of that binary digit (bit). Do not do any other rearrangement; each pass must keep the existing order (it must be stable). Then concatenate the 0's bin and the 1's bin and put them back into the original array.

For signed integers, do sorting by sign after sorting by digit. For twos-complement, one does not have to reverse the order of the negative numbers, but for sign-magnitude, one does. IEEE 754 floating-point numbers may be sortable by treating them as integers with the same content of bits, though they use sign-magnitude instead of twos-complement.
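
An untested sketch of the binary (one bit per pass) version for unsigned 32-bit integers:

Code:
#include <cstdint>
#include <vector>

// Binary LSD radix sort: for each bit from lowest to highest, stably split
// the numbers into a 0-bin and a 1-bin and concatenate.  32 passes of O(n) work.
void radix_sort(std::vector<std::uint32_t> &a)
{
    std::vector<std::uint32_t> zeros, ones;
    for (int bit = 0; bit < 32; ++bit)
    {
        zeros.clear();
        ones.clear();
        for (std::uint32_t x : a)
            ((x >> bit) & 1u ? ones : zeros).push_back(x);
        a.clear();
        a.insert(a.end(), zeros.begin(), zeros.end());
        a.insert(a.end(), ones.begin(), ones.end());
    }
}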
 
For a Delaunay triangulation, a sweepline can be linear as well as radial. It works by guaranteeing that all points added will be outside the existing triangulation.

Flipping works well in 2D, but it is poorly defined in more dimensions. Two triangles with a shared edge may be joined into a quad and then split into two other triangles. But the 3D case is more difficult. Two tetrahedra with a shared face may be joined to make a bipyramid, but there is no straightforward way of splitting it into a different pair of tetrahedra. One can split it into three tetrahedra, however, with both poles and two equator points each.

There is an alternative way to find a Delaunay tessellation. It is to do a "lifting map", to project the points onto a paraboloid with an additional dimension: (x, k*(x.x)) for point x where k is a scaling factor. One then finds the convex hull, projects it back to the points' space, and then selects out the simplexes in that hull with the appropriate orientation. This can be done in any number of dimensions of the points' space.


I now turn to algorithms for generating Voronoi diagrams. There are some algorithms for generating them directly, as opposed to taking the dual of a Delaunay tessellation: incremental construction, divide-and-conquer, and Fortune's sweepline algorithm. In Fortune's algorithm, as the sweepline progresses it passes points, and as it does so, the algorithm makes a focus-directrix construction with each point and the sweepline. This construction gives a parabola, and the algorithm constructs the diagram from intersections of these parabolas. It seems rather complicated to implement, I must note.
 
Converting text to a number. Returns a non-negative integer for a digit character, and -1 if there is no match.

int text2int(char text_digit){
    switch(text_digit){
    case '0': return(0);break;
    case '1': return(1);break;
    case '2': return(2);break;
    case '3': return(3);break;
    case '4': return(4);break;
    case '5': return(5);break;
    case '6': return(6);break;
    case '7': return(7);break;
    case '8': return(8);break;
    case '9': return(9);break;
    default: return(-1);
    } // end switch
} // text2int() ends

A way with fewer lines.

ASCII and Unicode use the same codes for the digit characters, and the low four bits of each code are the digit's numerical value. Force the upper four bits to 0 and return the result; the compiler will promote the char to an int.

int text2int(char text_digit){
    if(text_digit >= '0' && text_digit <= '9') return(text_digit & 0x0f);
    return(-1);
}

ASCII '1' in hex is 0x31, or binary 00110001.

'1': 0x31 & 0x0f = 0x01, the numerical value 1.
 
Converting text to a number. Returns a non-negative integer for a digit character, and -1 if there is no match.

int text2int(char text_digit){
    switch(text_digit){
    case '0': return(0);break;
    case '1': return(1);break;
    case '2': return(2);break;
    case '3': return(3);break;
    case '4': return(4);break;
    case '5': return(5);break;
    case '6': return(6);break;
    case '7': return(7);break;
    case '8': return(8);break;
    case '9': return(9);break;
    default: return(-1);
    } // end switch
} // text2int() ends

A way with fewer lines.

ASCII and Unicode use the same codes for the digit characters, and the low four bits of each code are the digit's numerical value. Force the upper four bits to 0 and return the result; the compiler will promote the char to an int.

int text2int(char text_digit){
    if(text_digit >= '0' && text_digit <= '9') return(text_digit & 0x0f);
    return(-1);
}

ASCII '1' in hex is 0x31, or binary 00110001.

'1': 0x31 & 0x0f = 0x01, the numerical value 1.

The first version would get you tarred and feathered in a code review. Never thought of doing the latter with an AND; it's normally done with subtraction.
 
The way I'd do it is something like this:

Code:
const int BAD_DIGIT = -1;

int digit_value(char digit) {
  int digval = (int)digit;
  if (digval < '0') return BAD_DIGIT;
  else if (digval > '9') return BAD_DIGIT;
  else return (digval - '0');
}
I'm using C++ instead of plain C, because C++ has some nice type-safe alternatives to the C preprocessor, alternatives that are much less bug-prone.

C++:
const int BAD_DIGIT = -1;
plain C:
#define BAD_DIGIT (-1)

C++:
template <class N> N square(const N &x) {return x*x;}
plain C:
#define square(x) (x*x)
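
To see why the macro form is the bug-prone one, here is a small self-contained example (the names are just for illustration):

Code:
#include <cstdio>

#define square_macro(x) (x*x)

template <class N> N square(const N &x) { return x * x; }

int main()
{
    int a = 2, b = 3;
    // The macro expands to (a + b*a + b) = 11, not 25, because the argument is
    // pasted in textually without parentheses; the template version gives 25.
    std::printf("%d %d\n", square_macro(a + b), square(a + b));
    return 0;
}

Even with extra parentheses in the macro, an argument with side effects, like square(i++), would still be evaluated twice; the template has neither problem.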
 
I'll do an integer-power algorithm that implicitly uses a number's binary representation.

Code:
// Recursively goes up the binary representation of n; declared here, defined below
template<class T, class N> T posint_power(T x, N n);

// Handles non-positive n
template<class T, class N> T int_power(T x, N n)
{
  if (n > 0) return posint_power(x, n);
  else if (n < 0) return posint_power(1/x, -n);
  else return 1;
}

// Recursively goes up the binary representation of n (n > 0)
template<class T, class N> T posint_power(T x, N n)
{
  // Quotient and remainder by 2
  N nhalf = n >> 1;
  N lastdigit = n - (nhalf << 1);
  // Default value of the output
  T val = 1;
  // Decide whether or not to go to the next digit up
  if (nhalf > 0)
  {
    val = posint_power(x*x, nhalf);
    if (lastdigit != 0)
      val *= x;
  }
  else
  {
    if (lastdigit != 0)
      val = x;
  }
  return val;
}
I haven't tested any of this code, but you can try doing so if you wish.
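
A quick way to exercise it, if anyone wants to (a made-up driver, not part of the algorithm):

Code:
#include <iostream>

// Assumes the int_power() and posint_power() templates above are in scope.
int main()
{
    std::cout << int_power(2.0, 10) << "\n";   // expect 1024
    std::cout << int_power(2.0, -2) << "\n";   // expect 0.25
    std::cout << int_power(3.0, 0)  << "\n";   // expect 1
    return 0;
}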
 
Converting text to a number. Returns a non-negative integer for a digit character, and -1 if there is no match.

int text2int(char text_digit){
    switch(text_digit){
    case '0': return(0);break;
    case '1': return(1);break;
    case '2': return(2);break;
    case '3': return(3);break;
    case '4': return(4);break;
    case '5': return(5);break;
    case '6': return(6);break;
    case '7': return(7);break;
    case '8': return(8);break;
    case '9': return(9);break;
    default: return(-1);
    } // end switch
} // text2int() ends

A way with fewer lines.

ASCII and Unicode use the same codes for the digit characters, and the low four bits of each code are the digit's numerical value. Force the upper four bits to 0 and return the result; the compiler will promote the char to an int.

int text2int(char text_digit){
    if(text_digit >= '0' && text_digit <= '9') return(text_digit & 0x0f);
    return(-1);
}

ASCII '1' in hex is 0x31, or binary 00110001.

'1': 0x31 & 0x0f = 0x01, the numerical value 1.

The first version would get you tarred and feathered in a code review. Never thought of doing the latter with an AND; it's normally done with subtraction.

The first method is bulletproof and portable. It should work on any compiler. The second method should work, but I am not set up to run it.

My largest project was around 10,000 lines of code. Issues of maintainability and readability matter.

For someone getting into the code later on, the first method would be obvious to read and understand. I knew of situations where, after people left a company, code had to be completely rewritten. I had someone under me on a project who left the company, and I had to rewrite his work from scratch.

The only difference between the two methods would be the number of assembly-language instructions the compiler generates, and hence possibly the speed of execution.

I took FORTRAN in the 70s. The teacher walked into the class and wrote on the board KISS, and said 'Keep It Simple Stupid'.

Being clever can be counterproductive. Trying to always minimize the number of lines of code adds time.

Ipetrich has the right idea: explore. It improves your skill even if the result is not the best.
 
Next: converting a real-number text string to a number. After that, a simple calculator that will parse a string with two numbers and add, subtract, multiply, or divide them.
 
The first version would get you tarred and feathered in a code review. Never thought of doing the latter with an AND; it's normally done with subtraction.

The first method is bulletproof and portable. It should work on any compiler. The second method should work, but I am not set up to run it.

My largest project was around 10,000 lines of code. Issues of maintainability and readability matter.

For someone getting into the code later on, the first method would be obvious to read and understand. I knew of situations where, after people left a company, code had to be completely rewritten. I had someone under me on a project who left the company, and I had to rewrite his work from scratch.

If the problem is readability, then surely the solution is comments?

For example:

Code:
/**
 * Converts a digit character (0 to 9) to an integer. Assumes ASCII or Unicode encoding.
 *
 * @return int the integer represented by the given character. Returns -1 if the character is not a digit
 */
int text2int(char text_digit)
{
    // If the given char is actually a digit...
    if(text_digit >= '0' && text_digit <= '9') 
        // ...then use a bitmask to get its value as an integer and return it as an int.
        // @see http://www.asciitable.com
        return text_digit & 0x0f;

    return -1;
}

Disclaimer: I'm not competent in C.

I rely on open-source software to build websites, so I have to read a lot of other people's code. I actually do far more code-reading than code-writing. If the comments tell me what an algorithm is doing at each statement or block, I can understand what it does without immediately understanding the code's logic.

Comments also help people find bugs, because they make it easy to see when an algorithm doesn't do what the author says it does. I've had to bug-hunt in code I wrote years ago, when I no longer remembered what I was thinking when I wrote it.
 
I'm using C++ instead of plain C, because C++ has some nice type-safe alternatives to the C preprocessor, alternatives that are much less bug-prone.

Yup. I've long said that programs, like ships, sink in the C.
 
Converting text to a number. Returns a non-negative integer for a digit character, and -1 if there is no match.

int text2int(char text_digit){
    switch(text_digit){
    case '0': return(0);break;
    case '1': return(1);break;
    case '2': return(2);break;
    case '3': return(3);break;
    case '4': return(4);break;
    case '5': return(5);break;
    case '6': return(6);break;
    case '7': return(7);break;
    case '8': return(8);break;
    case '9': return(9);break;
    default: return(-1);
    } // end switch
} // text2int() ends

A way with fewer lines.

ASCII and Unicode use the same codes for the digit characters, and the low four bits of each code are the digit's numerical value. Force the upper four bits to 0 and return the result; the compiler will promote the char to an int.

int text2int(char text_digit){
    if(text_digit >= '0' && text_digit <= '9') return(text_digit & 0x0f);
    return(-1);
}

ASCII '1' in hex is 0x31, or binary 00110001.

'1': 0x31 & 0x0f = 0x01, the numerical value 1.

The first version would get you tarred and feathered in a code review. Never thought of doing the latter with an AND; it's normally done with subtraction.

The first method is bulletproof and portable. It should work on any compiler. The second method should work, but I am not set up to run it.

It's usually done by subtracting an ASCII '0'; that makes it very clear what is going on.

My largest project was around 10,000 lines of code. Issues of maintainability and readability matter.

What's open in Visual Studio right now is 33k lines of code. My largest I would guess was upwards of 200k.

The only difference between the two methods would be the number of assembly-language instructions the compiler generates, and hence possibly the speed of execution.

And speed of loading the program.

I took FORTRAN in the 70s. The teacher walked into the class and wrote on the board KISS, and said 'Keep It Simple Stupid'.

Being clever can be counterproductive. Trying to always minimize the number of lines of code adds time.

Ipetrich has the right idea: explore. It improves your skill even if the result is not the best.

I do agree that being too clever can be counterproductive. I find myself avoiding some of the fancy C# features for this reason. I just don't think the one-line version is being too clever if it's done by subtraction. The AND version is unnecessarily clever.
 