
/sci/ - Science & Math



File: 9 KB, 312x375, circles.png
No.4340373

Help me, /sci/, you're my only hope.

I have a set S of 1.7 million points on a 2D plane. I want to reduce that to a more manageable set, S', which is a subset of S such that, for a chosen fixed radius r, every point in the plane [i.e. any point, not just those in S] that was within radius r of a point in S remains within radius r of a point in S'.

The specific case is to do with postcodes in the UK; I want a minimal* list of postcodes so that every point in the UK is within a given distance of one.

*: I don't really care about it being provably minimal, but smaller is far better than bigger.

Any suggestions on how I should approach this?

>> No.4340388

Provably minimal would be impossible anyway unless given some futuristic magic computing machine; this is essentially geometric set cover, which is NP-hard.

Let me think about this for a few minutes and I'll get back to you.

>> No.4340424

>>4340388
That'll be really appreciated, cheers.

>> No.4340426

Okay, so I'd first create an array of length 1.7M that contains, for each node, the list of every node within 2r of it. In 2D, I'm pretty sure this takes less than O(n^2) time and is actually doable. It also fits within your RAM, assuming you don't have too many postcodes too close to each other.
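One way that neighbor array can be built in well under O(n^2) is to bucket points into a uniform grid with cell size equal to the search radius, so each point is only compared against the 3x3 block of cells around it. A sketch, assuming points come as a list of (x, y) tuples (pass radius=2r to get the 2r neighborhoods described above):

```python
from collections import defaultdict

def build_neighbors(points, radius):
    """For each point, list the indices of all other points within `radius`.

    Buckets points into a uniform grid with cell size `radius`, so each
    point only needs to be checked against the 3x3 block of cells around
    it -- roughly O(n * k) where k is the local density, not O(n^2).
    """
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // radius), int(y // radius))].append(i)

    r2 = radius * radius
    neighbors = [[] for _ in points]
    for i, (x, y) in enumerate(points):
        cx, cy = int(x // radius), int(y // radius)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), []):
                    if j != i:
                        px, py = points[j]
                        if (px - x) ** 2 + (py - y) ** 2 <= r2:
                            neighbors[i].append(j)
    return neighbors
```

Distant cells are never touched, which is what keeps it fast when the 1.7M points are spread over a whole country.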

I'd create an empty list L which will store the nodes I want in the end.
Starting from there, I'd:
1) run through all the centers in my array, adding to L the nodes I go through, unless I've already added one of their neighbors to L: O(nd) (where d is the degree of the neighborhood graph),
2) run through all the centers, computing the maximum new area that a new center can cover if added to L: O(nd) or O(nd^2) or something like that, linear in n and maybe more than linear in d; let's say O(nf(d)),
3) run through all the centers, adding to L those that add almost the maximum new area (tune this threshold by trial and error; the higher the better, as long as the algorithm completes in a reasonable time), under the condition that no neighbor has been added yet: O(nd) if you stored the new area per center, O(nf(d)) otherwise,
4) go back to step 2 until the maximum new area is 0.
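The steps above can be sketched as a plain greedy set cover, simplified so that the targets are the points of S themselves rather than the continuous area (the stricter plane-coverage requirement would need the new-area bookkeeping of steps 2-4, or a fine sample grid as the target set). The O(n^2) pairwise pass here would also want to be replaced by a bucketed neighbor search at 1.7M points:

```python
def greedy_cover(points, r):
    """Greedy set cover: pick centers from `points` until every point of
    the input lies within r of a chosen center.  Classic greedy gives an
    O(log n)-approximation to the minimum cover size."""
    r2 = r * r
    n = len(points)
    # covers[i] = set of point indices within r of candidate center i
    covers = [{j for j, (px, py) in enumerate(points)
               if (px - x) ** 2 + (py - y) ** 2 <= r2}
              for x, y in points]
    uncovered = set(range(n))
    chosen = []
    while uncovered:
        # pick the center whose disk covers the most uncovered points
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen
```

Steps 1 and 3 above are essentially a batched version of this pick-the-best loop, trading a little optimality for fewer passes over the data.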

>> No.4340436

>>4340426
I think this is a realistic approach if f(d) isn't too big for your value of d. The only hard part is coding the procedure that figures out the new area covered by a new disk, but this can be estimated instead of being done in an exact manner. You can use squares instead of circles, for instance: squares circumscribing the circle for the first passes of the algo, then squares inscribed in the circle for the last passes, so that there is indeed no point left that should have been covered and isn't covered because of the approximation.
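That square-based estimate could look like this on a coarse occupancy grid (a sketch; representing coverage as a dict from grid-cell index to boolean, and the cell size, are my assumptions):

```python
import math

def new_area_estimate(covered, cell, cx, cy, r, conservative=False):
    """Estimate the fresh area a disk at (cx, cy) of radius r would add.

    `covered` maps (ix, iy) grid-cell indices to booleans; `cell` is the
    grid spacing.  With conservative=False the disk is approximated by
    its circumscribed square (side 2r), an over-estimate good for early
    passes; with conservative=True by its inscribed square (side
    r*sqrt(2)), an under-estimate safe for the final passes.
    """
    half = r / math.sqrt(2) if conservative else r
    ix0, ix1 = int((cx - half) // cell), int((cx + half) // cell)
    iy0, iy1 = int((cy - half) // cell), int((cy + half) // cell)
    fresh = 0
    for ix in range(ix0, ix1 + 1):
        for iy in range(iy0, iy1 + 1):
            # count cells whose center is inside the square and not yet covered
            mx, my = (ix + 0.5) * cell, (iy + 0.5) * cell
            if (abs(mx - cx) <= half and abs(my - cy) <= half
                    and not covered.get((ix, iy), False)):
                fresh += 1
    return fresh * cell * cell
```

Marking the same cells as covered once a center is accepted keeps steps 2-4 consistent with the estimate.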

>> No.4340443

Also if you don't mind, I think I'll use that problem in tests. It's kinda cool.

>> No.4340510

> use that problem in tests.
Go for it. If people post code for other people on the internet, even better :)

The way I was thinking of attacking it was to create a grid of some kind (rectilinear/triangular overlaps) and then try to cover the gaps in between by adding new points; I'm still undecided on how to tackle it, but I'll certainly consider your approach.

I'm expecting a LOT of neighbours. There are 1.7 million postcode centroids and the UK has a surface area of 94,000 square miles; so for a 5 km radius that's around a thousand or so postcodes within range on average, and some regions (London) are going to have a lot more.

And yet in some regions (e.g. Scotland) they're probably of the order of miles apart from each other...
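As a quick sanity check on those figures, assuming uniform density (and taking 1 sq mi as about 2.59 km^2):

```python
import math

points = 1_700_000
area_sq_mi = 94_000
r_km = 5.0
km2_per_sq_mi = 2.59

density = points / (area_sq_mi * km2_per_sq_mi)  # ~7 postcodes per km^2
disk = math.pi * r_km ** 2                       # ~78.5 km^2 per 5 km disk
avg_neighbours = density * disk                  # ~550 postcodes in range
```

So the uniform-density average is closer to 550 than 1000, with London far above it and rural Scotland far below.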

>> No.4340527

>>4340510
Yeah. I didn't think of it that way. If the array of neighbors still fits in the RAM, I think the idea can still work, even if it might be less efficient than I thought. Otherwise I don't really see how it can be cleverly adapted, besides running the algorithm area by area with a limited number of points, and then, once dense areas have been pruned, running it again on the result (losing a bit more optimality on the borders of the areas, but whatever...).

>> No.4340571

Oooh.
Sort list by distance from a single point.
Start at that point, our distance so far d=0
Find all postcodes at distances between d and d+2r.
Sort those by the angle between the north pole, the origin and the position of the postcode.
Select points until the annulus d..d+2r is completely covered.
Increase d and iterate.
Any holes in that plan?
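The sweep order in that plan can be sketched like this (assuming points as (x, y) tuples and bearing measured clockwise from north; the "annulus is complete" test still needs a proper coverage check, which is the hard part):

```python
import math

def annulus_sweep(points, origin, r):
    """Partition points into annuli of width 2r around `origin` and,
    within each annulus, sort by bearing from north.  This only fixes
    the order in which candidates are considered; selecting enough of
    them to cover each ring is left to the caller."""
    ox, oy = origin
    rings = {}
    for p in points:
        d = math.hypot(p[0] - ox, p[1] - oy)
        rings.setdefault(int(d // (2 * r)), []).append(p)
    for ring in rings.values():
        # atan2(dx, dy) is the clockwise angle from north, wrapped to [0, 2pi)
        ring.sort(key=lambda p: math.atan2(p[0] - ox, p[1] - oy) % (2 * math.pi))
    return rings
```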

>> No.4340609

>>4340571
If I understood you well: you pick the centers in an annulus and want some disks centered in it that cover the whole annulus. You might not be able to, since part of the coverage might come from the outer or inner annulus. However, this can probably be fixed, and it is a way of reducing the problem to smaller subproblems (filling the annuli). I'm not sure whether it's actually better than dividing into regions with square shapes, for instance; I think that using squares reduces the amount of border (for subproblems of roughly the same size), which reduces the border effects. Though, if the subproblems are much simpler to treat because of the easy way in which you can sort the centers by angle, it would probably be a good pruning step before running a more complex and closer-to-optimal algorithm.