
@etienneJr
Contributor

Hi,
I have adapted the osmosis backend for osm2pgsql.

  • Warning n°1: I haven't extensively tested whether the query results are correct, but at least it works!
  • Warning n°2: I haven't tested with Docker at all, only with bundle.

If anyone would like to try it, that would be great!
Comments welcome! Thanks!

@etienneJr
Contributor Author

I fixed 3 bugs:

  • a typo in config.ru
  • the ids of areas derived from relations must use the Overpass format 3600000000 + id
  • some type conversions: it seems the types of the tags and members columns in the planet_osm_rels table changed (from text to jsonb) between osm2pgsql 1.6 (Ubuntu 22) and 1.11 (Ubuntu 24)
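A minimal Ruby sketch of the relation-to-area id rule from the second fix, assuming only the 3600000000 + id convention for relations stated there; the method names are hypothetical and not taken from the code base.

```ruby
# Overpass convention: an area built from a relation is addressed with
# 3_600_000_000 + the relation id.
RELATION_AREA_OFFSET = 3_600_000_000

# Relation id -> Overpass area id (hypothetical helper name).
def area_id_for_relation(relation_id)
  RELATION_AREA_OFFSET + relation_id
end

# Overpass area id -> relation id, or nil when the id is below the
# offset and therefore does not come from a relation.
def relation_id_for_area(area_id)
  area_id >= RELATION_AREA_OFFSET ? area_id - RELATION_AREA_OFFSET : nil
end

puts area_id_for_relation(7465)
```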

@etienneJr
Contributor Author

I've improved view.sql to provide different versions depending on whether the osm2pgsql database has been created with the --slim option or not. And I've improved the metadata commands so that they remain valid even if the database was created without the --extra-attributes option (in which case all the metadata are NULL).

@etienneJr
Contributor Author

Hi @frodrigo, I've made good progress. Most queries work (selecting a single element, filtering by tag, filtering by area), but not filtering on a bbox: it doesn't give an error, but it always returns 0 elements...

When I copy the complete SQL query from the terminal into pgAdmin to test it, I can see that the culprit is indeed this line, the one that tests the intersection of the bbox with the objects:
AND ST_Intersects(ST_Envelope('SRID=4326;LINESTRING(-1.75 48.05, -1.6 48.15)'::geometry), geom)

Whatever changes I make to this line, I can't get it to work.

Can you confirm that it works in the osmosis version? Any idea where the problem could come from?
Thanks!

@frodrigo
Member

Nice.

Could you please move the non-Docker part of the README to another PR, and put it in the main README instead, since it isn't specific to any backend?

@frodrigo
Member

AND ST_Intersects(ST_Envelope('SRID=4326;LINESTRING(-1.75 48.05, -1.6 48.15)'::geometry), geom)

Isn't it a projection mismatch? Overpass QL coordinates are in 4326, but osm2pgsql data is probably in another projection (3857?).

@etienneJr
Contributor Author

Isn't it a projection mismatch? Overpass QL coordinates are in 4326, but osm2pgsql data is probably in another projection (3857?).

You were right, thanks! I modified my view.sql with ST_Transform(p.way,4326) AS geom to get consistent SRID, and bbox filtering is now working perfectly. Do I need to test other request types (other than filtering by id, tags, area and bbox)?
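To illustrate the mismatch, a minimal Ruby sketch of the spherical Mercator forward projection (EPSG:3857, standard formulas with a sphere radius of 6378137 m); the method name is hypothetical. The same Rennes bbox is a few tenths of a degree in 4326 but hundreds of thousands to millions of metres in 3857, so a 4326 envelope compared with 3857 geometries cannot match anything meaningful.

```ruby
# Spherical Mercator (EPSG:3857) forward projection, the default
# projection of osm2pgsql imports. Input in degrees (EPSG:4326),
# output in metres.
EARTH_RADIUS = 6_378_137.0

def lonlat_to_web_mercator(lon, lat)
  x = EARTH_RADIUS * lon * Math::PI / 180.0
  # lat * PI / 360 is half the latitude in radians
  y = EARTH_RADIUS * Math.log(Math.tan(Math::PI / 4.0 + lat * Math::PI / 360.0))
  [x, y]
end

# Corners of the bbox used in the failing query.
[[-1.75, 48.05], [-1.6, 48.15]].each do |lon, lat|
  x, y = lonlat_to_web_mercator(lon, lat)
  puts format("%.2f %.2f -> %.0f %.0f", lon, lat, x, y)
end
```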

I force-pushed a single commit squashing all the previous ones, based on #4. Maybe I will have to rebase this PR after you merge #4?

localhost is relative to the Docker container, not to your host. Use e.g. your host name.

Warning: I only tested with bundle. I did not manage to test with Docker. Is it possible to set the DATABASE_URL value in the YAML file so as to connect to my local database?

@etienneJr
Contributor Author

@frodrigo I managed to run the server using docker! I documented in the readme the changes I made in docker-compose.yaml to allow connection to an "external" database.

The instance is currently deployed at http://51.159.100.169:9292/interpreter and covers the whole of France; you can try it.

In my first tests, tag and bbox filtering work well. But area filtering times out... This does not happen on my local instance with a Bretagne extract. Any idea how to solve it?

@etienneJr
Contributor Author

You were right, thanks! I modified my view.sql with ST_Transform(p.way,4326) AS geom to get consistent SRID, and bbox filtering is now working perfectly.

But area filtering times out...

@frodrigo I spoke too soon, but I've understood where the timeouts come from: requests using geom in SRID 4326 are 50x slower than exactly the same requests using geom in SRID 3857 (whether the geometry is transformed on the fly or precomputed in a separate column).

For example, this simple request (2 playgrounds in a park) takes 44 ms using SRID 3857 and 2.2 s using SRID 4326:

[out:json][timeout:25];
area(23255048)->.a;
nwr["leisure"="playground"](area.a);
out tags;

Is this normal and known? Or is there a bug somewhere? What can I do to fix it? Thanks!

@frodrigo
Member

frodrigo commented Apr 1, 2025

The issue is that the ST_Transform here prevents the spatial index from being used.

The right way to do this is to transform the bbox (or other spatial filter) to the database projection, so that the spatial index is used, then transform the result to 4326.

Maybe the transformation should be done at the engine level.
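A minimal sketch of that idea, assuming a hypothetical bbox_filter_sql helper (not the overpass_parser-rb code): the constant envelope is projected once into the database SRID and the geom column is left untouched, so the spatial index on geom stays usable. The bbox arguments follow the Overpass QL (south, west, north, east) order.

```ruby
# Build a bbox filter in the database projection. Only the small,
# constant envelope goes through ST_Transform; the indexed column is
# compared as-is. db_srid is the SRID of the geom column.
def bbox_filter_sql(south, west, north, east, db_srid, column: "geom")
  envelope = "ST_Envelope('SRID=4326;LINESTRING(#{west} #{south}, #{east} #{north})'::geometry)"
  # Integer() rejects anything that is not a plain number (e.g. from config)
  "ST_Intersects(ST_Transform(#{envelope}, #{Integer(db_srid)}), #{column})"
end

puts bbox_filter_sql(48.05, -1.75, 48.15, -1.6, 3857)
```

By contrast, wrapping the column, as in ST_Intersects(bbox, ST_Transform(geom, 4326)), makes PostGIS transform every row before it can compare, which is what made the 4326 view slow.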

@etienneJr
Contributor Author

The issue is that the ST_Transform here prevents the spatial index from being used.

Yes! I'd thought of that myself in the meantime, thanks for confirming!

Maybe the transformation should be done at the engine level.

You mean here in overpass_parser-rb ?

@frodrigo
Member

frodrigo commented Apr 1, 2025

@etienneJr
Contributor Author

More specifically here: to_sql should know the projection and transform the filter
https://github.com/teritorio/overpass_parser-rb/blob/0937495e2eba50ff1a3b4c1f42541fbb41814ec7/lib/overpass_parser/nodes/filters.rb#L46

OK, understood: I need to transform the filter from 4326 (used for the bbox coordinates) to the SRID of the DB geom column (3857 here) before comparing it with geom. For example, for a bbox filter, this gives:
ST_Intersects(ST_Transform(ST_Envelope('SRID=4326;LINESTRING(-1.75 48.05, -1.6 48.15)'::geometry),3857), geom)

But I don't know how best to detect the SRID of the geom column in the DB, nor where to put this detection.
I tried replacing 3857 with ST_SRID(geom), but the request is 10x slower since ST_SRID is recomputed for each geom value in the column.

I thought about using find_SRID(db/schema, table, column), but the table and column differ depending on whether we use osmosis or osm2pgsql... Should I put it in view.sql? (That seems strange to me, since this file is outside overpass_parser-rb, but view.sql does know which column to read the SRID from.)

@frodrigo
Member

frodrigo commented Apr 1, 2025

I was just thinking about passing it to the program, without any kind of detection, e.g. with an env var, like the DB connection.
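A minimal sketch of that suggestion, assuming a hypothetical DATABASE_SRID environment variable read alongside DATABASE_URL; the variable name, the 4326 default and the database_srid method are illustrative choices, not part of the project.

```ruby
# Hypothetical configuration: DATABASE_SRID is an illustrative name,
# read from the environment like DATABASE_URL.
DEFAULT_SRID = 4326 # arbitrary default for this sketch: no transform needed

# Return the SRID of the database geometries as an Integer.
# `env` defaults to the process environment; a Hash works for testing.
def database_srid(env = ENV)
  value = env.fetch("DATABASE_SRID", DEFAULT_SRID.to_s)
  srid = Integer(value, 10) # raises ArgumentError on non-numeric input
  raise ArgumentError, "invalid SRID: #{value}" unless srid.positive?

  srid
end

puts database_srid
```

Reading it once at startup keeps the per-query SQL free of any SRID lookup, which avoids the per-row ST_SRID cost mentioned above.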

@etienneJr
Contributor Author

etienneJr commented Apr 1, 2025

OK, clear. I tried, but there are in fact 9 different definitions of def to_sql(sql_dialect, ... with different parameter lists, and 30 calls to to_sql in overpass_parser-rb, so it's too complicated for me to work out how to pass an srid parameter from the top down to nodes/filters.rb...

@etienneJr
Contributor Author

I've just discovered that osm2pgsql has a --proj=SRID option for choosing the projection used when creating the database, instead of the default spherical Mercator (3857)! I tried it with 4326 and it worked well. Using this option when creating the database resolves all the SRID issues in overpass_parser-rb: there is no longer any need to modify it to handle different SRIDs.

But it won't work for people who already have an osm2pgsql database (created with the default 3857) before deploying an Underpass instance (typically if OSM-FR wants to deploy an instance on the database used for raster tile rendering...).

@frodrigo What do you think about it?

@frodrigo
Member

frodrigo commented Apr 5, 2025

If your osm2pgsql import is also used for tile rendering, you need to stay in 3857, because standard tiles are based on it.

OK, clear. I tried, but there are in fact 9 different definitions

Filters are used from query_object
https://github.com/teritorio/overpass_parser-rb/blob/master/Overpass.g4#L64

It is called from https://github.com/teritorio/overpass_parser-rb/blob/master/lib/overpass_parser/nodes/query_objects.rb#L56
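Since every to_sql already receives sql_dialect, one possible approach is to carry the SRID inside that object: intermediate nodes forward it unchanged and only the spatial filter reads it, so none of the existing signatures needs a new parameter. This is a sketch only; SqlDialect, BboxFilter, TagFilter and QueryObject here are hypothetical stand-ins, not the overpass_parser-rb classes.

```ruby
# Hypothetical stand-in for the dialect object passed to every to_sql.
SqlDialect = Struct.new(:name, :srid, keyword_init: true)

class BboxFilter
  def initialize(south, west, north, east)
    @south, @west, @north, @east = south, west, north, east
  end

  # Only this leaf node reads the SRID carried by sql_dialect.
  def to_sql(sql_dialect)
    envelope = "ST_Envelope('SRID=4326;LINESTRING(#{@west} #{@south}, #{@east} #{@north})'::geometry)"
    "ST_Intersects(ST_Transform(#{envelope}, #{sql_dialect.srid}), geom)"
  end
end

class TagFilter
  def initialize(key, value)
    @key, @value = key, value
  end

  # Does not care about the SRID at all.
  def to_sql(_sql_dialect)
    "tags->>#{quote(@key)} = #{quote(@value)}"
  end

  private

  # Minimal SQL string-literal quoting, enough for this illustration.
  def quote(text)
    "'#{text.gsub("'", "''")}'"
  end
end

class QueryObject
  def initialize(table, filters)
    @table, @filters = table, filters
  end

  # Intermediate nodes forward sql_dialect unchanged; no extra parameter.
  def to_sql(sql_dialect)
    where = @filters.map { |f| f.to_sql(sql_dialect) }.join(" AND ")
    "SELECT * FROM #{@table} WHERE #{where}"
  end
end

dialect = SqlDialect.new(name: "postgres", srid: 3857)
query = QueryObject.new("nwr", [TagFilter.new("leisure", "playground"),
                                BboxFilter.new(48.05, -1.75, 48.15, -1.6)])
puts query.to_sql(dialect)
```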

@joto

joto commented Apr 6, 2025

May I suggest that you have a look at the flex output of osm2pgsql. That allows much more flexible definition of the output of osm2pgsql which would probably allow you to define the tables in the way you need them without extra views and such. The old pgsql output is deprecated and will be removed at some point, so if you are starting with a new project, it definitely makes more sense to base it on current osm2pgsql.

@etienneJr
Contributor Author

etienneJr commented Apr 6, 2025

May I suggest that you have a look at the flex output of osm2pgsql.

Yes indeed! When I started this PR, I was a beginner with osm2pgsql, so I thought I could write SQL queries adapted to existing osm2pgsql DBs (I only knew the pgsql output*). I've since realised that this makes no sense, as there are as many possible schemas as there are users, thanks to the power of the flex output. So I'm thinking of rewriting this PR for the case of a new osm2pgsql DB, built specifically for a new Underpass instance to mimic Overpass with all OSM elements (at least in a given geographic area). My strategy:

  • osm2pgsql:
    • define output tables with only the geometry: 3 tables for points, lines and polygons (I don't think I need to split long ways or multipolygons since I won't use the DB for tile rendering, but maybe it's still necessary?)
    • create an index for the tags in the middle tables (they are already there, identical to those in OSM, so there is no need to duplicate or filter them in the output tables)
  • underpass:
    • create views to join middle tables and output tables to get tags and geometries together.

* My experience suggests that the osm2pgsql manual gives a misleading impression: I didn't even realise that an output other than pgsql was possible. I think I will create an issue. [edit] Done.

@etienneJr force-pushed the osm2pgsql-backend branch 3 times, most recently from aa159f8 to ddbf255, on April 7, 2025 22:21

@joto left a comment


Some quick comments from glancing over the code.

-- Returns true if there are no tags left.

-- modified: returns true if there are no tags
local function clean_tags(tags)

You don't need this if you don't change the tags. The process_* functions will not be called for objects without tags.

Contributor Author


Yes! I've kept it in case the user (including me) changes their mind and wants to filter tags, for example to remove all objects that only have the building key, since they're almost useless for an Underpass instance. But this will only reduce the size of the database slightly, as the middle tables are the heaviest.

@@ -0,0 +1,8 @@
CREATE EXTENSION IF NOT EXISTS htsore;

Do you need hstore? If yes, htsore will not work. :-)

Contributor Author


Oops, a typo 😬 Anyway, I don't know whether it's worth keeping the postgres Docker setup or not...

Member


Typo fixed on original postgres backend.


tables.points:insert({
-- tags = object.tags,
geom = geom -- the point will be automatically be projected to 3857

Comment is wrong, because you set 4326 above.

-- modified: store the geometry directly as a multilinestring
-- merging the lines as much as possible
tables.lines:insert({
geom = object:as_multilinestring():line_merge()

geom = object:as_linestring() is sufficient. A way cannot contain a multilinestring that needs to be merged.

Contributor Author


Indeed, I had mixed it up with relations.

local relation_type = object:grab_tag('type')

-- Store route relations as multilinestrings
if relation_type == 'route' or relation_type == 'associatedStreet' or relation_type == 'public_transport' or relation_type == 'waterway' then

Why are you representing an associatedStreet relation as a linestring? It is supposed to connect streets with houses. The "geometry" of that is much more complicated.

Contributor Author


You are right. I wanted to include associatedStreet relations since the French community loves them, but this lines table isn't appropriate. Neither is the polygons table, since I wanted it to contain only the geometries corresponding to Overpass areas...

@@ -0,0 +1,156 @@
/************** ABOUT TABLES *****************/

Lots of these comments are out of date now.

@etienneJr
Contributor Author

Some quick comments from glancing over the code.

Thanks a lot for the review and the constructive comments!

@etienneJr
Contributor Author

@joto By the way, what do you think about the strategy (using middle tables for tags and flex output tables for geometries)? Do you have any advice or warnings?

I made this choice since I want a synchronised DB. But in the end, I think I'll also push another backend (or another view.sql) for people who want a very light instance with no middle tables.

@joto

joto commented Apr 8, 2025

@etienneJr It is certainly a valid approach. Queries will be a bit slower, but it uses less disk space. If you are using a flat node file, though, you won't have all nodes in the database, which might or might not be what you want.

@frodrigo
Member

frodrigo commented Apr 8, 2025

Note: I just added support for SRID and ST_Transform.

@etienneJr
Contributor Author

Note: I just added support for SRID and ST_Transform.

Thanks, that's perfect for existing osm2pgsql databases which use SRID 3857.
For DBs using 4326, I wonder whether the useless ST_Transform(geom, srid) will slow down the requests?
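Whether a same-SRID ST_Transform costs anything measurable is best checked with EXPLAIN ANALYZE on both variants. Alternatively, the SQL generator can simply not emit it. A sketch with a hypothetical filter_geometry_in_db_srid helper (not the overpass_parser-rb code):

```ruby
# Overpass QL coordinates are always in EPSG:4326.
OVERPASS_SRID = 4326

# Hypothetical helper: wrap a filter geometry (SQL expression in 4326)
# in ST_Transform only when the database uses another projection, so a
# 4326 database gets a plain comparison with no transformation at all.
def filter_geometry_in_db_srid(geometry_sql, db_srid)
  return geometry_sql if db_srid == OVERPASS_SRID

  "ST_Transform(#{geometry_sql}, #{db_srid})"
end

bbox = "ST_Envelope('SRID=4326;LINESTRING(-1.75 48.05, -1.6 48.15)'::geometry)"
puts "ST_Intersects(#{filter_geometry_in_db_srid(bbox, 4326)}, geom)"
puts "ST_Intersects(#{filter_geometry_in_db_srid(bbox, 3857)}, geom)"
```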

@etienneJr
Contributor Author

etienneJr commented Apr 11, 2025

I pushed a new version yesterday to fix an issue with negative relation ids in the osm2pgsql polygons table (area type): the index was not used when joining the middle table (which uses positive ids) with an output table (which uses negative ids), so execution was very slow. So I chose to store geometries in one table per element type (nodes_geom, ways_geom, rels_geom).
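For reference, a minimal Ruby sketch of the id convention involved, assuming the osm2pgsql area-type id column layout (ways keep their positive id, relations are stored negated); the helper names are hypothetical. Any join against the middle tables has to undo that sign flip, which is the step the per-type tables avoid.

```ruby
# Decode an osm2pgsql area-type id into the element type and the
# positive id used by the middle tables.
def decode_area_id(area_id)
  area_id.negative? ? [:relation, -area_id] : [:way, area_id]
end

# Inverse mapping: element type and positive OSM id -> area-type id.
def encode_area_id(type, id)
  case type
  when :way then id
  when :relation then -id
  else raise ArgumentError, "unsupported type for an area id: #{type.inspect}"
  end
end

p decode_area_id(-7465)
```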

But when I try this new version, I wonder whether I still have indexing issues. For example, this simple SQL request (to get 419 playgrounds near Rennes):

SELECT * FROM nwr  WHERE
    (tags->>'leisure' = 'playground') AND
    ST_Intersects(ST_Envelope('SRID=4326;LINESTRING(-1.75 48.05, -1.6 48.15)'::geometry), geom);

takes :

  • 50 ms on a database covering Bretagne (7 GB, 380k OSM elements)
  • 900 ms on a database covering France (60 GB, 100M OSM elements)

(geom has been automatically indexed by osm2pgsql, tags(json) has been indexed manually after DB creation)

Two different people told me I may have an indexing issue, since with correct indexes the execution time should be almost the same regardless of the DB size.
What do you think?

EXPLAIN ANALYZE on FR database

Gather (cost=1812.38..1097873.60 rows=774 width=419) (actual time=560.387..926.899 rows=419 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 -> Parallel Append (cost=812.38..1096796.20 rows=322 width=419) (actual time=509.517..866.942 rows=140 loops=3)
  -> Nested Loop Left Join (cost=1164.87..624385.96 rows=188 width=266) (actual time=162.982..294.217 rows=29 loops=3)
    -> Nested Loop (cost=1164.57..624218.11 rows=188 width=156) (actual time=162.961..294.016 rows=29 loops=3)
     -> Parallel Bitmap Heap Scan on nodes_geom g (cost=1164.14..549515.56 rows=37834 width=40) (actual time=160.483..186.684 rows=42757 loops=3)
       Filter: st_intersects('0103000020E61000000100000005000000000000000000FCBF6666666666064840000000000000FCBF33333333331348409A9999999999F9BF33333333331348409A9999999999F9BF6666666666064840000000000000FCBF6666666666064840'::geometry, geom)
       Rows Removed by Filter: 3
       Heap Blocks: exact=60
       -> Bitmap Index Scan on nodes_geom_geom_idx (cost=0.00..1141.43 rows=90802 width=0) (actual time=15.998..15.998 rows=128444 loops=1)
        Index Cond: (geom && '0103000020E61000000100000005000000000000000000FCBF6666666666064840000000000000FCBF33333333331348409A9999999999F9BF33333333331348409A9999999999F9BF6666666666064840000000000000FCBF6666666666064840'::geometry)
     -> Index Scan using planet_osm_nodes_pkey on planet_osm_nodes n (cost=0.44..1.97 rows=1 width=124) (actual time=0.002..0.002 rows=0 loops=128271)
       Index Cond: (id = g.id)
       Filter: ((tags ->> 'leisure'::text) = 'playground'::text)
       Rows Removed by Filter: 1
    -> Index Scan using planet_osm_users_pkey on planet_osm_users u (cost=0.29..0.89 rows=1 width=14) (actual time=0.006..0.006 rows=1 loops=87)
     Index Cond: (id = n.user_id)
  -> Nested Loop Left Join (cost=812.38..449214.73 rows=129 width=501) (actual time=243.775..517.529 rows=164 loops=2)
    -> Nested Loop (cost=812.09..448947.35 rows=129 width=427) (actual time=243.751..516.863 rows=164 loops=2)
     -> Parallel Bitmap Heap Scan on ways_geom g_1 (cost=811.52..384733.47 rows=25812 width=197) (actual time=235.777..305.550 rows=64469 loops=2)
       Filter: st_intersects('0103000020E61000000100000005000000000000000000FCBF6666666666064840000000000000FCBF33333333331348409A9999999999F9BF33333333331348409A9999999999F9BF6666666666064840000000000000FCBF6666666666064840'::geometry, geom)
       Rows Removed by Filter: 6
       Heap Blocks: exact=1196
       -> Bitmap Index Scan on ways_geom_geom_idx (cost=0.00..796.03 rows=61948 width=0) (actual time=21.138..21.139 rows=129389 loops=1)
        Index Cond: (geom && '0103000020E61000000100000005000000000000000000FCBF6666666666064840000000000000FCBF33333333331348409A9999999999F9BF33333333331348409A9999999999F9BF6666666666064840000000000000FCBF6666666666064840'::geometry)
     -> Index Scan using planet_osm_ways_pkey on planet_osm_ways w (cost=0.57..2.49 rows=1 width=238) (actual time=0.003..0.003 rows=0 loops=128938)
       Index Cond: (id = g_1.id)
       Filter: ((tags ->> 'leisure'::text) = 'playground'::text)
       Rows Removed by Filter: 1
    -> Index Scan using planet_osm_users_pkey on planet_osm_users u_1 (cost=0.29..2.07 rows=1 width=14) (actual time=0.003..0.003 rows=1 loops=329)
     Index Cond: (id = w.user_id)
  -> Nested Loop Left Join (cost=33.67..23193.90 rows=7 width=4093) (actual time=559.139..683.024 rows=3 loops=1)
    -> Nested Loop (cost=33.38..23190.79 rows=7 width=4019) (actual time=559.123..682.996 rows=3 loops=1)
     -> Parallel Bitmap Heap Scan on rels_geom g_2 (cost=32.95..20147.36 rows=1417 width=3426) (actual time=539.848..670.406 rows=3474 loops=1)
       Filter: st_intersects('0103000020E61000000100000005000000000000000000FCBF6666666666064840000000000000FCBF33333333331348409A9999999999F9BF33333333331348409A9999999999F9BF6666666666064840000000000000FCBF6666666666064840'::geometry, geom)
       Rows Removed by Filter: 56
       Heap Blocks: exact=973
       -> Bitmap Index Scan on rels_geom_geom_idx (cost=0.00..32.35 rows=2409 width=0) (actual time=6.465..6.465 rows=4249 loops=1)
        Index Cond: (geom && '0103000020E61000000100000005000000000000000000FCBF6666666666064840000000000000FCBF33333333331348409A9999999999F9BF33333333331348409A9999999999F9BF6666666666064840000000000000FCBF6666666666064840'::geometry)
     -> Index Scan using planet_osm_rels_pkey on planet_osm_rels r (cost=0.43..2.15 rows=1 width=601) (actual time=0.003..0.003 rows=0 loops=3474)
       Index Cond: (id = g_2.id)
       Filter: ((tags ->> 'leisure'::text) = 'playground'::text)
       Rows Removed by Filter: 1
    -> Index Scan using planet_osm_users_pkey on planet_osm_users u_2 (cost=0.29..0.44 rows=1 width=14) (actual time=0.005..0.005 rows=1 loops=3)
     Index Cond: (id = r.user_id)
Planning Time: 3.772 ms
JIT:
 Functions: 135
 Options: Inlining true, Optimization true, Expressions true, Deforming true
 Timing: Generation 11.064 ms, Inlining 243.900 ms, Optimization 740.414 ms, Emission 459.115 ms, Total 1454.493 ms
Execution Time: 932.413 ms
