Skip to content

Conversation

@markmaker
Copy link
Contributor

@markmaker markmaker commented Nov 21, 2025

Problem

We get spurious heap corruption.

Diagnosis

After a steep learning curve in such matters, we finally found a way to pinpoint it using valgrind (the key was using the redzone, so it would not immediately corrupt everything):

valgrind --tool=memcheck --leak-check=summary --redzone-size=8 <executable>

It outputs these sections (snippets).
EDIT: installed Debug symbol, now we can even better pinpoint it:

==2666205== Invalid write of size 1
==2666205==    at 0x154AEDA7: UnknownInlinedFun (convert.c:6414)
==2666205==    by 0x154AEDA7: convert_from_pgbinary.isra.0 (convert.c:6235)
==2666205==    by 0x154ABE8E: UnknownInlinedFun (convert.c:1115)
==2666205==    by 0x154ABE8E: UnknownInlinedFun (convert.c:1189)
==2666205==    by 0x154ABE8E: copy_and_convert_field.isra.0 (convert.c:1726)
==2666205==    by 0x154754FA: UnknownInlinedFun (convert.c:810)
==2666205==    by 0x154754FA: SC_fetch (statement.c:1839)
==2666205==    by 0x15469B14: PGAPI_ExtendedFetch (results.c:1902)
==2666205==    by 0x15485922: SQLFetchScroll (odbcapi30.c:245)
==2666205==    by 0x556BF4D: SQLFetchScroll (in /usr/lib/x86_64-linux-gnu/libodbc.so.2.0.0)
...

==2666205==  Address 0x180015b8 is 0 bytes after a block of size 3,576 alloc'd
==2666205==    at 0x4851E10: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2666205==    by 0x154ACDDB: UnknownInlinedFun (convert.c:1071)
==2666205==    by 0x154ACDDB: UnknownInlinedFun (convert.c:1189)
==2666205==    by 0x154ACDDB: copy_and_convert_field.isra.0 (convert.c:1726)
==2666205==    by 0x154754FA: UnknownInlinedFun (convert.c:810)
==2666205==    by 0x154754FA: SC_fetch (statement.c:1839)
==2666205==    by 0x15469B14: PGAPI_ExtendedFetch (results.c:1902)
==2666205==    by 0x15485922: SQLFetchScroll (odbcapi30.c:245)
==2666205==    by 0x556BF4D: SQLFetchScroll (in /usr/lib/x86_64-linux-gnu/libodbc.so.2.0.0)
...

Due to the sequence of ODBC calls, and the exact size of the buffer (3576), we knew it was the BYTEA column in our row:

v_base_pg=# select length(prstntpacked) from v.gusrstns where usrstgusr_userv = 46035;
 length 
--------
   3576
(1 row)

Because we use bound buffers that are smaller than that, and we haven't yet come to the point of calling SQLGetData(), we know this must be an internal buffer that already knows the size of the BYTEA data. Therefore it couldn't be one of our buffers being too small.

It follows that this looks like the extra NULL terminator byte that so plagues the C world. And sure, the code says:

psqlodbc/convert.c

Lines 1058 to 1066 in 6e99e75

if (SQL_C_BINARY == fCType)
{
/*
* Though Binary doesn't have NULL terminator,
* bindcol_localize_exec() needs output buffer
* for NULL terminator.
*/
len_for_wcs_term = 1;
}

Proposed fix

It seems that immediately after this, the len_for_wcs_term is not included in the comparison to assess the need of realloc(). If by chance the buffer was already exactly needbuflen bytes large, it would not be realloc'd.

psqlodbc/convert.c

Lines 1067 to 1073 in 6e99e75

if (changed || needbuflen > cbValueMax)
{
if (needbuflen > (SQLLEN) pgdc->ttlbuflen)
{
pgdc->ttlbuf = realloc(pgdc->ttlbuf, needbuflen + len_for_wcs_term);
pgdc->ttlbuflen = needbuflen;
}

Therefore this PR intends to fix that.

We were unable to test this due to time constraints (this is currently impeding our rollout, we have our hands full 🤕!)

Help in setting up a psqlodbc compilation and local installation and deployment is welcome.

@markmaker
Copy link
Contributor Author

I'm trying to compile the driver myself. Any feedback very welcome. So far I managed to scratch this together:

# this is on a Kubuntu 25.04 with installation according
# to https://wiki.postgresql.org/wiki/Apt (modernized).

sudo apt-get install unixodbc odbcinst unixodbc-dev libpq-dev

# my PR version
git clone https://github.com/markmaker/psqlodbc.git

./bootstrap

# if somebody knows a better/automatic/more correct way to configure these paths, please speak up 🤠
./configure \
  --with-unixodbc=/usr/include/x86_64-linux-gnu/unixODBC/ \
  --with-libpq=/usr/include/postgresql \
  --enable-pthreads \
  CPPFLAGS="-DSQLCOLATTRIBUTE_SQLLEN"

make

sudo make install

Then I copied the regular [PostgreSQL Unicode] section in the /etc/odbcinst.ini to have the /usr/local/lib/ path prefix where make install reported it installed:

...
[PostgreSQL Unicode Fixed]
Description=PostgreSQL ODBC driver (Unicode version)
Driver=/usr/local/lib/psqlodbcw.so
Setup=/usr/local/lib/libodbcpsqlS.so
Debug=0
CommLog=1
UsageCount=2

Then I changed my driver connect to point to the new one...

"DRIVER={PostgreSQL Unicode Fixed};SERVER=..."

... tested again and it seems the problem is gone! 🎈🎉

@davecramer davecramer merged commit 3c33055 into postgresql-interfaces:main Nov 24, 2025
1 of 2 checks passed
@markmaker
Copy link
Contributor Author

@davecramer, thank you for the fast merge. 🚀

How do we know when this went into the apt repo? How long do you suppose this takes?

Sorry, I'm not very familiar with how these repos work 😅

_Mark

@davecramer
Copy link
Contributor

probably not yet, let me send an email to the guys who do that stuff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants