Menu

#330 checking for _FILE_OFFSET_BITS is wrong on platforms where it does nothing

svn
closed-fixed
nobody
None
5
2023-01-15
2021-12-30
manx
No

When having _FILE_OFFSET_BITS=64 defined accidentally on platforms where libc does not care whatsoever at all about that macro, mpg123.h should not change symbol names depending on that macro.

i.e. #if (!defined MPG123_NO_LARGENAME) && ((defined _FILE_OFFSET_BITS) || (defined MPG123_LARGESUFFIX)) is wrong on e.g. OpenBSD, and causes client code to fail to link if it accidentally has _FILE_OFFSET_BITS=64 defined.

Checks for _FILE_OFFSET_BITS should be conditional on having a libc that cares about this macro. Maybe #ifdef __GLIBC__, maybe also others.
Another solution might be to provide a pkg-config file that has -DMPG123_NO_LARGENAME on such platforms.

The situation is complicated by having no way whatsoever to do a pre-processor check for GLIBC before including any standard library header, thus having no way to set _FILE_OFFSET_BITS conditionally via only the pre-processor. This induces a check for libc into the build system, which results in a maintenance nightmare. Glibc is to blame here. Client code has to implement build system logic (or just guess) to even know whether it should define _FILE_OFFSET_BITS=64.

I again suggest revising the API by getting rid of off_t completely and using a neutral API definition with int64_t.

Discussion

  • Thomas Orgis

    Thomas Orgis - 2021-12-31

    Oh, really. Does this never end? I really took pains to get the LFS stuff right.

    But you do need to get out of your way to define _FILE_OFFSET_BITS on a build that does not need it.

    But anyhow … we do have configure code to check if the platform is largefile sensitive.

    dnl Detect large file support, enable switches if needed.
    AC_SYS_LARGEFILE
    dnl If we do have a switch for large files, rename off_t-aware API calls.
    dnl Using the file_offset_bits variable here is fine for linux (possibly Solaris),
    dnl Others... we'll have to see.
    dnl Note: I started writing this with with multiline replacements.
    dnl Does not work. Automake insists on putting these into Makefiles where they break things.
    dnl It is also assumed that a system that does not set file offset bits is not
    dnl sensitive to largefile changes, i.e. FreeBSD always using 64 bit off_t.
    if test "x$ac_cv_sys_file_offset_bits" = x || echo "$ac_cv_sys_file_offset_bits" | $GREP  '@<:@^0-9@:>@' > /dev/null; then
            dnl if it has non-numeric chars or is empty... ignore...
            largefile_sensitive=no
    else
            largefile_sensitive=yes
    fi
    

    So we could use that to preprocess mpg123.h to define MPG123_NO_LARGENAME, no? Or just use HAVE_LFS_WRAP, which is already defined/or not. The only issue I see is that one needs to be careful about any changes to the header that might affect multiarch systems. A 32 bit and a 64 bit build on the same machine should have identical header files from mpg123.

    I presume 32 bit and 64 bit builds on OpenBSD systems have the same off_t rules? Or, well, each a fixed definition of off_t independent of any switchery.

     
  • Thomas Orgis

    Thomas Orgis - 2021-12-31

    Eh, HAVE_LFS_WRAP doesn't matter, of course. We're only talking about using largefile_sensitive at configure time to process mpg123.h.in. And at that time, don't forget the ports/cmake stuff that should not get broken by any changes.

     
  • Thomas Orgis

    Thomas Orgis - 2022-06-26

    So this is still lingering … I don't really intend to change the API for this nuisance. I do understand the argument for creating new API. I also repeat my anger at folks screwing up the idea of a system-specific off_t by making it build-specific. You cannot win. Fixing int64_t all around also sucks. I'm getting angy again … But I also wonder: Would my suggestion make you happy? Make the installed headers robust against ill-defined _FILE_OFFSET_BITS? I am still not sure where that might backfire, though, with one header being used by differing toolchains, things like that.

     
    • manx

      manx - 2022-06-26

      Well, it's a difficult problem, which has been made more difficult by various C libraries implementing not even POSIX, but only something that remotely resembles POSIX. I am basically equally annoyed as you about this situation, and in my library, libopenmpt, I have actually given up to try to solve this at all in our public headers (see https://github.com/OpenMPT/openmpt/commit/8fedeca885445ab1702cae8be92a5a8b81e7fb0d), and defer to client code for choosing a fitting implementation. However unlike libmpg123, libopenmpt had from the start chosen to only rely on ABI-safe types and avoid any type that could change size depending on some macro definition in its core public API. The core file API always used int64_t in libopenmpt, thus, this change was arguably easier for us.

      Frankly, I do not think anything that still relies at all on any such macro or macro-dependent type will be able to fix all corner cases. Any change in this area again risks breaking some other obscure case.
      I did ask on IRC in ##posix on Libera.Chat, and people agreed that this is basically unsolvable in practice.

      So, if you do not want to rework the API and ABI and remove off_t completely from the API and ABI surface, I would argue to not change anything at all in this area (and thus keep things as stable as possible, so that problems that are already work-arounded some way or another can stay that way). Feel free to close as wont-fix for now in that case.

       
      • manx

        manx - 2022-09-14

        Thinking even more about the issue and also looking into mpg123 code some more, I now do think that revising the API and ABI is the only solution. The current API uses off_t, which is a type not even specified by C - it comes from POSIX. Windows is not POSIX, and the only reason for this type even existing there at all is probably some crude "get things to compile" effort by Microsoft presumably ~30 years ago. off_t on Windows got never updated to 64bit, not even for 64bit Windows.
        So, if you value the whole "we want 64bit offsets" (which is the main reason for all the largefile trickery existing), why is it at all ok to have non of that work on Windows?

        I am actually considering to spend some time working on that issue, but the result will highly likely (otherwise, development and maintenance effort would be way higher) be a libmp123 that is ABI-incompatible with the current one (an additional wrapper implementing old ABI is - in theory - of course always possible), thus this could only ever be libmpg123 version 2.0.0.

         
  • Thomas Orgis

    Thomas Orgis - 2022-09-14

    Well … Windows was a bit POSIX for some time … https://brianreiter.org/2010/08/24/the-sad-history-of-the-microsoft-posix-subsystem/ … it also has a bit of C99 support in MSVC … What I can imagine for the API to make the wrapper names more explicit. After all the work put into keeping ABI stable with 32 bit and 64 bit offsets in client code, I feel like cheating my earlier self when now just dropping it.
    You're also missing a bit: When building with MinGW, off_t is properly handled, apparently. What client applications wanting to use the resulting DLL do, though … I don't have the complete picture right now. I think we checked things about the wrappers some time ago. With the cmake-based MSVC build, we only get 32 bits, apparently. Or I just forgot some hack we did there. Part of the picture is: It only ever matters for many-hour long streams. 64 bit for WAVs took some time to be relevant, or it is still common to work around the 32 bit limit there, even. MP3 files over 1G size are probably very rare.
    I do not really think that changing the ABI is necessary. You could explicitly define

    int32_t mpg123_tell_32(mpg123_handle *mh);
    int64_t mpg123_tell_64(mpg123_handle *mh);
    

    and use them from client code to avoid an messing with off_t. On Unix, those symbols already exist in the current ABI. Any user not wanting to deal with off_t can use the _64 names directly. I'm not sure right now if our Windows DLL builds contain them, but I suspect so. The build needs some change if we'd decide to get rid of off_t and use the 32 or 64 bit types explicitly, but it shouldn't change the library interfaces at all.

     
  • Thomas Orgis

    Thomas Orgis - 2022-09-14

    (Oh, and about it being OK that something doesn't work right on Windows: mpg123 started as the UNIX MPEG audio decoder. A Windows port came later. For the library API I looked at the definition of lseek() and figured that off_t is part of the system API for the systems I cared for. Later I realized that off_t is a cruel joke.)

     
  • manx

    manx - 2022-09-17

    Well … Windows was a bit POSIX for some time … https://brianreiter.org/2010/08/24/the-sad-history-of-the-microsoft-posix-subsystem/

    That is just not native Windows, it never was. That's about as Windows being POSIX as the FreeBSD kernel can run Linux binaries. It is/was a completely separate kernel binary interface. You could never mix a library of the POSIX subsystem with an application of the native Win32 subsystem. Thus, it is completely irrelevant to this whole discussion.

    it also has a bit of C99 support in MSVC …

    MSVC with UCRT has C11 / C17 support.

    What I can imagine for the API to make the wrapper names more explicit. After all the work put into keeping ABI stable with 32 bit and 64 bit offsets in client code, I feel like cheating my earlier self when now just dropping it.

    Frankly, none of that work was ever necessary at all, and it never even worked portably. Cheating yourself would be to invest even more time maintaining all the trickery. IMHO, all this off_t mangeling is the wrong thing to do, and always was. It makes using your library in portable (and portable does not mean just POSIX) applications very difficult. I have already wasted multiple days of my life dealing with this nonsense alone. libmpg123 is 1 of 2 libraries I have ever seen that does such things (zlib is the other one, luckily zlib can be used without touching any z_off_t interface at all, so the situation is not nearly as catastrophic as with libmpg123). libmpg123 really is multiple orders of magnitude worse in that regard than any other library I have ever used.

    You should take a step back and really question your design choice here: It never did work on all platforms, and it very fundamentally cannot.

    You're also missing a bit: When building with MinGW, off_t is properly handled, apparently.

    MSVC and Windows do ship with multiple native C libraries, the older MSVCRT and the newer UCRT (and some ancient thing from the Win95 days which we do not care about). These all define off_t as long, which is 32bit on all Windows architectures. Additionally, MinGW uses a completely different C++ ABI and C++ standard library than MSVC C++. MinGW/MSYS2 also still do not fully support ARM/ARM64. I cannot at all consider MinGW usable on Windows. Using anything MinGW-built from a MSVC application is by construction not ABI compatible if one uses weird 64bit off_t, redefining all kinds of MSVCRT functionality, and the other uses 32bit off_t. Claiming to have Windows support because of MinGW is somewhat similar to claiming to have Linux support because of Winelib. Even if what you said would make sense from a practical perspective (building any software with 2 completely separate and incompatible toolchains and multiple build systems does not make much sense in my book), your header files would still be just wrong. Using mpg123.h from MSVC with 32bit off_t with a libmpg123.dll built with MinGW 64bit off_t cannot work.

    MinGW is frankly just irrelevant in the context of proper "Windows support". It's an additional toy gimmick at best. You could not claim "Linux support" when your thing only builds with Winelib or with libc++ instead of libstdc++ either. Supporting a platform requires supporting the platform's native toolchain.

    Part of the picture is: It only ever matters for many-hour long streams. 64 bit for WAVs took some time to be relevant, or it is still common to work around the 32 bit limit there, even. MP3 files over 1G size are probably very rare.

    As libmpg123 also uses off_t for sample offsets (which does not make much sense), when dealing with stream recordings, the ~12h limit of a 48kHz file is actually a real problem.

    I do not really think that changing the ABI is necessary. You could explicitly define
    int32_t mpg123_tell_32(mpg123_handle mh);
    int64_t mpg123_tell_64(mpg123_handle
    mh);
    and use them from client code to avoid an messing with off_t.

    No. That does not help at all with sample offsets in MSVC builds.

    On Unix, those symbols already exist in the current ABI. Any user not wanting to deal with off_t can use the _64 names directly. I'm not sure right now if our Windows DLL builds contain them, but I suspect so.

    No. None of the wrappers/aliases use a type that is 64bit on MSVC. Even then, it would not solve the problem as libmpg123 uses off_t internally as well as in the reader API and ABI.

    The build needs some change if we'd decide to get rid of off_t and use the 32 or 64 bit types explicitly, but it shouldn't change the library interfaces at all.

    No. That may only partially work if you continue all the unneccessary 32bit/64bit wrappers around 64bit/32bit. But ultimately, MSVC builds currently are 32bit internally, and that cannot stay. Keeping ABI compatibility here would mean that on different platforms, the unmangled name AND the reader API and ABI would continue to be 32bit or 64bit - contributing even more confusion. It would make maintaining this mess about equally as time consuming as the current situation. I personally do not want to work in that direction at all. It makes absolutely no sense to me. I would maybe consider to write a (header-only) API wrapper around a new ABI/implementation, anything else, especially maintaining ABI compatibility, would be an absolutely huge waste of development time in my opinion.

    An ABI-incompatible version of libmpg123 would also change all internal and external symbol names: mpg123_ would become mpg123v2_, The API header would be named mpg123v2.h (or rather moved into mpg123v2/mpg123v2.h). INT123_ would become mpg123v2_int_ (or similar). It would change the library soname, it would be installable and linkable side-by-side with current libmpg123 1.x (this is neccessary to make incremental migration possible for distributions). A (possible) compatibility wrapper for the old API would allow rebuilding current client code with NO CHANGE. Again, I do offer to implement ALL OF THAT (as well as further cleanups, related and unrelated). The current situation (or anything based on it) is just unbearable for me. I would prefer to do this development as part of the mpg123 project. I have no intention to break or remove the command line utility mpg123 itself, or to do anything substantial about libsyn123, libnet123, libout123, out123 (even though the latter 3 could probably be just killed and replaced by libcurl and PortAudio/libsoundio/SDL2/SDL/libao).

    (Oh, and about it being OK that something doesn't work right on Windows: mpg123 started as the UNIX MPEG audio decoder. A Windows port came later. For the library API I looked at the definition of lseek() and figured that off_t is part of the system API for the systems I cared for.

    If it's broken, it needs fixing. The historic reasons why it came to be the way it is today do not matter. The longer you wait to fix it, the longer you will continue to waste time on adding further broken hacks on top of other hacks.

    Later I realized that off_t is a cruel joke.)

    An MPEG Audio decoding library like libmpg123 should not even require ANYTHING (that includes off_t) from POSIX. Not now, not when it was originally developed.

     
    • Jonathan Yong

      Jonathan Yong - 2022-09-17

      I never liked the idea of sharing or distributing binaries, mingw and msvc is different enough it should be thought of different platforms.

      While all of them produce binaries that run on Windows, they are different enough under the hood that it should not be assumed mingw conveniences would carry over completely to msvc.

      Perhaps it is also time to stop shipping shared libraries completely so users don't simply assume they are compatible.

       
      • Thomas Orgis

        Thomas Orgis - 2022-09-19

        So this is a tangent: Can we query the actual use of the DLL, at least via the -users list? I remember the libmpg123 DLL being the whole point of a windows port, initially. I am not sure anymore if that was in MSVC (some preview/student edition I used back then?).

        I'd be fine if we didn't ship binaries at all, but there are people who actually use the mpg123 console player and who regularily complain if the binaries are missing. But if actual users of the decoder lib are able to build it easily in MSVC via the CMake build, for example, there might be less need to pre-build it with our toolchain that might not match the end user's/developer's. In practice, toolchain differences don't matter that much. Sizeof off_t is one isse, safety of using mpg123_open_fd() is another. And you can cover lots of applications that don't bother seeking with already safe API.

        Anyhow: Jonathan: Would you care to start a discussion about this on the user list (I guess user is appropriate)?

         
    • Thomas Orgis

      Thomas Orgis - 2022-09-19

      This is about some detail remarks, I'm also writing a top-level comment soon on how I'd like us to proceed. I'm trying to separate constructive discussion about the future from complaining about the past.

      Frankly, none of that work was ever necessary at all, and it never even worked portably.

      It did work for all platforms that implement off_t and had support in the autoconf macros for that. Largefile support does not work in MSVC. It depends to what platforms full portability is claimed. The whole of mpg123 runs on current incarnations of OS/2, for that matter. However relevant you consider the Windows platform with MS toolchain as opposed to mingw-w64, which works fine with large files, even (correct me if I'm wrong, but I take it that that toolchain does handle large file support with 64 bit offsets and off_t switchery), there are lots of other platforms to fulfill a definition of portable.

      An MPEG Audio decoding library like libmpg123 should not even require ANYTHING (that includes off_t) from POSIX. Not now, not when it was originally developed.

      It is a library that implements access to files in a certain format. I think it is reasonable to offer API that mimicks the base libc file access, with the same concepts at work, just with transparent MPEG decompression. The old precursor mpglib just did stream decoding and didn't know of any I/O methods, with no support for sample-accurate seeking, for example.

      To better abstract the stream handling and isolate the user from MPEG details, the file access abstraction serves a purpose. Using off_t to count samples as a position in the extracted file keeps the semantics of the offset associated with an appropriate type. It's just not a byte offset, directly. It's a number in a similar range.

      As libmpg123 also uses off_t for sample offsets (which does not make much sense),

      I disagree. Sample offsets correspond to file byte offsets when you store the decoded PCM data, just with a small factor. If a platform can handle 96 bit file offsets, it would make sense to use this type also for sample offsets. Same as using size_t for counting things that fit into memory, not only bytes, but also things that are multiples of a single byte. We might have a basic disagreement here which still should not prevent us from reaching a solution for a largefile build of libmpg123 with MSVC.

      You can use a subset of libmpg123, the feed API, without proper seeking, and thus get away without I/O specifics. Wouldn't there be the file access API, mpg123_feedseek() probably would take a long argument, and at some point we'd have added mpg123_feedseek64(int64_t) or mpg123_feedseekll(long long).

      We had the decision to not just ship the MPEG decoding algorithm, but also stream parsing, including extraction of ICY metadata from web radio (the latter I am more willing to accept as a historic mistake/unnecessary feature creep). When you do that I/O stuff, you end up dealing with POSIX interfaces.

      MinGW is frankly just irrelevant in the context of proper "Windows support". It's an additional toy gimmick at best.

      No. It provide the working console application binaries to people who, for whatever reason, want to run a console mp3 player on Windows. I do not know how many people this entails, but they are present and complain when something doesn't work. Even after fixing up the decoder library build in msvc, mingw helps building the whole application. It is the tool to port the whole of mpg123 to be able to run natively on Windows, even if it is written only with bits of win32 API (or how a modern incarnation would be called) to yield sensible application behaviour.

      MinGW-W64 is the toolchain for the shipped binary application.

      Now, people wanting to use the decoder library in their projects, likely prefer embedding it into their MSVC build setup. That is why we currently offer ports/cmake for parts of the codebase (libmpg123) on WIN32.

      So, whatever you think about other parts of the mpg123 family (output and synth libraries) should be irrelevant to giving people a proper libmpg123 DLL build in MSVC. I'll continue on that route on the toplevel comment.

       
      • manx

        manx - 2022-09-20

        Part 1/3

        I'll reply in 3 parts:
        1. kind-of off-topic details
        2. reply to your proposal
        3. my proposal which I had already mostly finished

        This is about some detail remarks, I'm also writing a top-level comment soon on how I'd like us to proceed. I'm trying to separate constructive discussion about the future from complaining about the past.

        Agreed. I also intend to post a proposal on how to proceed (probably as a separate new issue).

        Frankly, none of that work was ever necessary at all, and it never even worked portably.

        It did work for all platforms that implement off_t and had support in the autoconf macros for that. Largefile support does not work in MSVC. It depends to what platforms full portability is claimed. The whole of mpg123 runs on current incarnations of OS/2, for that matter.

        I think there may be a kind of fundamental misunderstanding about the word "portable" between you (and POSIX) on the one side, and me (or most other Windows developers) on the other side. POSIX advertises and touts itself as a way/tool/standard/whatever to write "portable software". However, that is just not true in practice in the general case, as POSIX only ever cares about portability across the platforms that agreed to follow POSIX. Windows (and to some degree also macOS (in places not relevant to the discussion here), and also some of the more esoteric/niche platforms) however does not. Thus, if you claim "portability" (in the POSIX sense), you are explicitly excluding the Windows platform (and in particular MSVC). This a an attitude that Windows developers do experience quite often, and it frankly is massively annoying and insulting.

        off_t is a type invented and standardized by POSIX, and by POSIX only. It does not exist in ISO C. The sole aspect of "basing a public API on that type" alone will thus inevitable cause trouble when dealing with non-POSIX platforms.

        POSIX is not at all helpful for portability. Assuming POSIX semantics actively works against portability.

        However relevant you consider the Windows platform with MS toolchain as opposed to mingw-w64, which works fine with large files, even (correct me if I'm wrong, but I take it that that toolchain does handle large file support with 64 bit offsets and off_t switchery), there are lots of other platforms to fulfill a definition of portable.

        There are some caveats: In general, C libraries built with a MinGW MSVCRT toolchain are not ABI compatible with C client code built with the MSVC toolchain, as they are using different C libraries (VS6 used the same MSVCRT as MinGW MSVCRT uses, newer version up until including VS2013 used each their own C library, VS2015 and later use the UCRT). Now, that can only work as long as no C library objects are shared between the library and client code. Also, once you rely on MinGW-specific functionality that got bolted on top of MSVCRT (like _FILE_OFFSET_BITS=64), ABI compatibility breaks for sure.

        As libmpg123 also uses off_t for sample offsets (which does not make much sense),

        I disagree. Sample offsets correspond to file byte offsets when you store the decoded PCM data, just with a small factor. If a platform can handle 96 bit file offsets, it would make sense to use this type also for sample offsets.

        No. Sample offsets are not file offsets or even related in any way to them. It is perfectly reasonable to work with >32bit sample offsets on a platform that does not support >32bit file offsets. The situation is pretty simple: you work with data stored on a network. Local file offsets bear no relation whatsoever at all to the maximum size of a stream accessible via HTTP GET Range. Using off_t for sample offsets is very fundamentally the wrong type.

        To better abstract the stream handling and isolate the user from MPEG details, the file access abstraction serves a purpose. Using off_t to count samples as a position in the extracted file keeps the semantics of the offset associated with an appropriate type. It's just not a byte offset, directly. It's a number in a similar range.

        Same as using size_t for counting things that fit into memory, not only bytes, but also things that are multiples of a single byte.

        The situation for size_t is different, in that counting bytes in memory is an upper bound for counting bigger-than-bytes objects in memory.

        We might have a basic disagreement here which still should not prevent us from reaching a solution for a largefile build of libmpg123 with MSVC.

        Agreed!

        An MPEG Audio decoding library like libmpg123 should not even require ANYTHING (that includes off_t) from POSIX. Not now, not when it was originally developed.

        It is a library that implements access to files in a certain format. I think it is reasonable to offer API that mimicks the base libc file access, with the same concepts at work, just with transparent MPEG decompression. The old precursor mpglib just did stream decoding and didn't know of any I/O methods, with no support for sample-accurate seeking, for example.

        You can use a subset of libmpg123, the feed API, without proper seeking, and thus get away without I/O specifics. Wouldn't there be the file access API, mpg123_feedseek() probably would take a long argument, and at some point we'd have added mpg123_feedseek64(int64_t) or mpg123_feedseekll(long long).

        That might work, but I would prefer if the non-feed API also works.

        We had the decision to not just ship the MPEG decoding algorithm, but also stream parsing, including extraction of ICY metadata from web radio (the latter I am more willing to accept as a historic mistake/unnecessary feature creep). When you do that I/O stuff, you end up dealing with POSIX interfaces.

        Frankly, no, you do not have to use anything from POSIX here. Please do look at how libvorbisfile or libopenmpt handle that exact problem: They provide a 64bit read handler interface in the API/ABI (very similar to libmpg123's reader API) which does not rely on POSIX types, and let optionally/only (optionally for libvorbisfile (like libmpg123), only for libopenmpt) client code handle the file io.

        Calling fopen/_wfopen internally has further portability problems concerning filename encoding. See my to-be-posted proposal for more details.

        MinGW is frankly just irrelevant in the context of proper "Windows support". It's an additional toy gimmick at best.

        No. It provide the working console application binaries to people who, for whatever reason, want to run a console mp3 player on Windows. I do not know how many people this entails, but they are present and complain when something doesn't work. Even after fixing up the decoder library build in msvc, mingw helps building the whole application. It is the tool to port the whole of mpg123 to be able to run natively on Windows, even if it is written only with bits of win32 API (or how a modern incarnation would be called) to yield sensible application behaviour.

        MinGW-W64 is the toolchain for the shipped binary application.

        Fair enough. I did not mean to argue against using MinGW in that context. It's just that for people developing on Windows using MSVC, anything MinGW-related is just not very relevant or useful.

        Now, people wanting to use the decoder library in their projects, likely prefer embedding it into their MSVC build setup. That is why we currently offer ports/cmake for parts of the codebase (libmpg123) on WIN32.

        CMake has its own set of problems IMHO, but it certainly is a very useful addition for some people.

        So, whatever you think about other parts of the mpg123 family (output and synth libraries) should be irrelevant to giving people a proper libmpg123 DLL build in MSVC. I'll continue on that route on the toplevel comment.

        Totally agreed!

         
  • Thomas Orgis

    Thomas Orgis - 2022-09-19

    So, let's revisit the actual issues:

    1. The header assumes a largefile-sensitive system and reacts to _FILE_OFFSET_BITS being defined even where it should be a no-op. Can we agree that this is fixed by working this fact into mpg123.h.in, using the result stored in largefile_sensitive in configure?
    2. One specific platform, Windows with MSVC, never had largefile support and always builds 32 bit offsets into the ABI. We shall address that.
    3. Using mingw-generated DLL from MSVC may have funny behaviour … but this is a hypothesis. My assumption is that people did use it happily in the past, just being limited to 32 bit offsets, then. No definition of _FILE_OFFSET_BITS, no mappings. Default symbols use 32 bit off_t. Things are suboptimal, as stream offsets are limited, but not totally broken (i.e. crashing, not being able to decode).

    Is this list correct? Especially for item 3:

    /dev/shm/mpg123-1.30.1-x86-64$ winedump -j export libmpg123-0.dll |grep mpg123_seek
      00035530   126 mpg123_seek
      000350F0   127 mpg123_seek_32
      00017980   128 mpg123_seek_64
      000355F0   129 mpg123_seek_frame
      00035110   130 mpg123_seek_frame_32
      00017400   131 mpg123_seek_frame_64
    /dev/shm/mpg123-1.30.1-x86-64$ grep mpg123_seek libmpg123-0.def 
    mpg123_seek
    mpg123_seek_32
    mpg123_seek_64
    mpg123_seek_frame
    mpg123_seek_frame_32
    mpg123_seek_frame_64
    

    The DLL from mingw-w64 does contain 64 bit offset variants with suffix in the ABI. When explicitly defining

    MPG123_EXPORT int64_t mpg123_seek_64( mpg123_handle *mh
    ,       int64_t sampleoff, int whence );
    

    in the header, shouldn't you be able to use the DLL from MSVC just fine, with large files? Apart from mpg123_open_fd(), which might use incompatible file descriptors (?), this should work, right? Can someone here test this?

    What is still missing, is a proper build inside MSVC. But if the above even works, I'd work towards fixing the internal code to use a 64 bit type, always, but keep off_t in the API with the mappings only as a shortcut for people on platforms where it works already.

    Keeping ABI compatibility means that client code that does not rely on a POSIX platform with proper off_t cannot use off_t mpg123_seek(), but needs to switch to int64_t mpg123_seek_64() use. As long as the whole of mpg123 is built with autoconf as primary platform, the detail of largefile support is handled there and the off_t mappings work, also in mingw-w64, and in OpenBSD (and even OS/2, I have fun repeating that from time to time;-).

    Truly portable applications, as you define it, would have to switch to explicit mpg123_seek_64() use. I consider this a lesser nuisance than making a big ABI jump with parallel installs of the new and old library, especially since this is not an issue at all on Linux, for example, with the current handling. The soft fallback for applications failing to call the _64 API is that they just get a limit on sample and file position offsets

    I can happily change the internals to explicit 64 bits. This eases the wrapper code, too. We have a requirement then for platforms to define int64_t and offer lseek64() or _lseeki64() (did I get that right?).

    What I need help on is ensuring that ports/cmake works properly in MSVC, with all DLL export stuff set up right. I'm no CMake expert and I don't use MSVC (and have no inclination to do so). Manx, can you support us there? Would you be fine, if grudgingly, with adding _64 functions as official explicit API, without breaking ABI for old programs? Did I miss something, still?

    Edit: Of course, the mpg123 header as produced by the CMake MSVC build could map the plain names to the _64 functions, unconditionally or based on some other macro than _FILE_OFFSET_BITS. When you use that in your project/build environment, I guess one could have some more freedom with such than for a header installed system-wide to be used by many differing applications.

     

    Last edit: Thomas Orgis 2022-09-19
  • manx

    manx - 2022-09-20

    Part 2/3

    I'll reply in 3 parts:
    1. kind-of off-topic details
    2. reply to your proposal
    3. my proposal which I had already mostly finished

    1. The header assumes a largefile-sensitive system and reacts to _FILE_OFFSET_BITS being defined even where it should be a no-op. Can we agree that this is fixed by working this fact into mpg123.h.in, using the result stored in largefile_sensitive in configure?

    I must admit, I have some trouble wrapping my head around all the renaming and symbol availability. Would that result in _64 symbols being available on not-largefile_sensitive systems like OpenBSD? If it does not, would it remap _64 names to unmangled symbols? Because otherwise, an application that uses _64 names explicitly in order to stay portable to MSVC (like you suggest below) would then not compile on OpenBSD. libopenmpt would catch that immediately once we switch to _64 names.

    Another point here is the universality of the public header itself. I really dislike the idea of public header files depending on build system decisions. When cross-compiling from a single shared source tree to various platforms, this requires possibly different generated header files. Now, I am fully aware that Autotools can handle that situation just fine, however, it sill is an obstacle that users of other build systems face and have to deal with or work around. A universal header implemented in standard ISO C, that is equally valid on any platform, completely avoids issues in that area.

    1. One specific platform, Windows with MSVC, never had largefile support and always builds 32 bit offsets into the ABI. We shall address that.

    It's actually not just MSVC. You keep mentioning OS/2, so I'll mention DJGPP/DOS here :). In DJGPP/DOS there are no 64bit file offsets, off_t is just 32bit. However, the libmpg123 ABI is still limited to 32bit sample offsets (which is a problem and an inconsistency with other platforms).

    1. Using mingw-generated DLL from MSVC may have funny behaviour … but this is a hypothesis. My assumption is that people did use it happily in the past, just being limited to 32 bit offsets, then. No definition of _FILE_OFFSET_BITS, no mappings. Default symbols use 32 bit off_t. Things are suboptimal, as stream offsets are limited, but not totally broken (i.e. crashing, not being able to decode).

    I think most people stopped using your pre-built DLLs roughly when the last patents expired. The main reason, I think, was people avoiding any (maybe theoretical) issues with distributing patented code in binaries themselves. Once that problem went away, I think most people switched to building their own. That at least was the reasoning and what happened for us with OpenMPT und libopenmpt.

    I do not remember exact details, but we actually had some trouble with 32bit/64bit inconsistency between header file and your pre-built DLL back then. If desired, I can dig back into history to find the details, but I do not think that point should be that important going forward. Being able to mix toolchains is a somewhat nice-to-have feature, but really should not be the primary design driver here, IMHO.

    Is this list correct? Especially for item 3:

    /dev/shm/mpg123-1.30.1-x86-64$ winedump -j export libmpg123-0.dll |grep mpg123_seek
      00035530   126 mpg123_seek
      000350F0   127 mpg123_seek_32
      00017980   128 mpg123_seek_64
      000355F0   129 mpg123_seek_frame
      00035110   130 mpg123_seek_frame_32
      00017400   131 mpg123_seek_frame_64
    /dev/shm/mpg123-1.30.1-x86-64$ grep mpg123_seek libmpg123-0.def 
    mpg123_seek
    mpg123_seek_32
    mpg123_seek_64
    mpg123_seek_frame
    mpg123_seek_frame_32
    mpg123_seek_frame_64
    

    The DLL from mingw-w64 does contain 64 bit offset variants with suffix in the ABI. When explicitly defining

    MPG123_EXPORT int64_t mpg123_seek_64( mpg123_handle *mh
    ,       int64_t sampleoff, int whence );
    

    in the header, shouldn't you be able to use the DLL from MSVC just fine, with large files?

    I am actually not 100% sure if MinGW defaults to the same calling convention as MSVC for int64_t. If it does, this would work. If it doesn't, things would just crash.

    Apart from mpg123_open_fd(), which might use incompatible file descriptors (?), this should work, right?

    As far as I am aware, file descriptors in the POSIX sense do not exist on Windows. They are emulated by the C library (or better the part of the C library that tries to rudimentary emulate POSIX) on top of a Win32 file HANDLE. So, the fd compatibility situation should be the same as any other C library level compatibility situation, like e.g. FILE* compatibility: If the C library type (and instance) matches (i.e. both use UCRT) (and MinGW does not re-define any fd functionality in its headers, which I did not check), it should work. However, I do not think mixing a fd from a MinGW MSVCRT (which is still the default for most MinGW toolchains) with MSVC UCRT (which is the only option for VS2019 or later) can work.

    The Windows situation is kind of special with regards to the system's C library: There are MSVCRT (VS6), newer version specific MSVCR, and UCRT, there are debug and release builds of each, and one can use the system installed one or a statically linked one (MSVC only). Sharing any C library object (that also includes memory obtained with malloc/free, or FILE*, or fd) does only work if client code and library use the exact same C library type and instance. Windows apliatiopns thus typically build everything consistently in that regard (and use a static C library only ever when linking everything statically into a single binary) in order to avoid most such issues. Library APIs that do not design around this issue very carefully (that implies e.g. that a library calling malloc should never assume that the client code would be able to call free on that memory) are not usable otherwise.

    Now, as far as I am aware, libmpg123 is mostly fine in that regard, short of mpg123_open_fd, which may not (or may, I frankly never used file descriptors on Windows) work across mismatched C libraries.

    Can someone here test this?

    I could in theory, but that would take some amount of time, which I would rather not spend on a use case that should IMO not be the primary concern here.

    Keeping ABI compatibility means that client code that does not rely on a POSIX platform with proper off_t cannot use off_t mpg123_seek(), but needs to switch to int64_t mpg123_seek_64() use. As long as the whole of mpg123 is built with autoconf as primary platform, the detail of largefile support is handled there and the off_t mappings work, also in mingw-w64, and in OpenBSD (and even OS/2, I have fun repeating that from time to time;-).

    I would be fine with having to use explicit new _64 names in order to achieve portability. After all, my suggestions of completely breaking ABI and API would also imply having to change client code to get the full benefits.
    (see last paragraph for why I think reusing potentially existing _64 symbols from the old API a bad idea).

    In this case, may I suggest nudging client code into migrating to new explicit _64 names?

    #ifndef MPG123_NO_DEPRECATE
    #if defined(__clang__)
    #define MPT123_DEPREACTED __attribute__((deprecated))
    #elif defined(_MSC_VER)
    #define MPT123_DEPREACTED __declspec(deprecated)
    #elif defined(__GNUC__)
    #define MPT123_DEPREACTED __attribute__((deprecated))
    #endif
    #endif
    #ifndef MPT123_DEPREACTED
    #define MPT123_DEPREACTED
    #endif
    

    And tagging the old non-_64 and old _64 interfaces with MPG123_DEPRECATED.
    That way, client code would get warnings, and could disable the warnings via #define MPG123_NO_DEPRECATE.
    That would allow preparing for a future where the old interfaces could be removed from the ABI.

    What is still missing, is a proper build inside MSVC. But if the above even works, I'd work towards fixing the internal code to use a 64 bit type, always, but keep off_t in the API with the mappings only as a shortcut for people on platforms where it works already.

    The hardest problem here is the reader API. The current unmangled API uses a 32bit off_t on affected platforms.

    Truly portable applications, as you define it, would have to switch to explicit mpg123_seek_64() use. I consider this a lesser nuisance than making a big ABI jump with parallel installs of the new and old library, especially since this is not an issue at all on Linux, for example, with the current handling. The soft fallback for applications failing to call the _64 API is that they just get a limit on sample and file position offsets

    I somewhat fear converting the reader API to unconditionally 64bit internally would result in a significant amount of work to provide the proper backwards-ABI-compatible unmangled implementation (which uses 32bit off_t currently on affected platforms) via something like lfs_wrap, while keeping all other current directions of wrapping and renaming intact. I already now have trouble wrapping my head around the current multiple layers of wrapping and renaming.

    I can happily change the internals to explicit 64 bits. This eases the wrapper code, too. We have a requirement then for platforms to define int64_t and offer lseek64() or _lseeki64() (did I get that right?).

    lseek/lseek64 for POSIX fd, _lseeki64 for Windows fd, fseeko64/fseeko for POSIX FILE, _fseeki64 for Windows FILE. I am not sure if largefile insensitive platforms like OpenBSD even provide the explicit lseek64/fseeko64 at all. I could check on OpenBSD if desired.
    DJGPP cannot offer any _FILE_OFFSET_BITS or 64bit off_t file io of course, it would just use regular fseek/lseek. It looks like it does provide off64_t and fseeko64 though, but there just are no >32bit files in DOS at all. 64bit sample offsets will still be useful, even in DOS, even if only for consistency across all supported platforms.

    The int64_t requirement is not a problem for any platform I care/know about. The last MSVC version to not provide stdint.h was VS2008, and even then, people commonly used a 3rd party project (msinttypes) that provided stdint.h and inttypes.h. I have no idea if libmpg123 still supports VS2008. Other projects, like libsndfile, have removed support for that compiler ~9 years ago. libopenmpt has still supported VS2008 (only for x86 32bit) up until 2020, but only in a long outdated stable branch; in development trunk, we had removed all support in 2016. Microsoft does not support VS2008 any more either (https://learn.microsoft.com/en-us/visualstudio/productinfo/vs-servicing).

    What I need help on is ensuring that ports/cmake works properly in MSVC, with all DLL export stuff set up right. I'm no CMake expert and I don't use MSVC (and have no inclination to do so). Manx, can you support us there?

    I cannot help much with CMake. I do not use it and I actively dislike most of its design decisions. Maybe Evgeni Poberezhnikov who did work on the mpg123 vcpkg port using CMake and who I think (I may be wrong) is the same person as https://github.com/evpobr working on libsndfile would be able to help on CMake issues. Testing CMake builds with come CI could at least help catching obvious build issues (see my proposal).

    I can certainly help and give advice for anything related to MSVC and Windows in general, as well as considerations for Emscripten platforms (WebAssembly and Javascript) and DJGPP/DOS.
    For Emscripten, there are currently no major portability issues as far as I know. When targeting Javascript, libmpg123 could benefit from defining real as double though, as Javascript's only number type is IEEE754 64bit double precision. Emscripten will emulate 32bit float on top of Javascript's 64bit double, which does imply some performance overhead. For WebAssembly, float is fine.

    Would you be fine, if grudgingly, with adding _64 functions as official explicit API, without breaking ABI for old programs?

    Very grudgingly, somewhat yes :)

    There is an additional thing to consider: Changing the type of _64 functions from off_t to int64t may imply that the exact underlying type changes between long and long long (on platforms where both are 64bit) or __int64 (which is a separate type on MSVC). For C, this difference does not matter. However, if some C++ exposes or expects this via function pointers or types in any way, this can cause type mismatch in C++ code which causes comilation failure. The C++ name mangling does also change, which could be an ABI break for such code. This problem is not Windows-related and applies equally to POSIX platforms. Changing some arguments from unsigned char to void (which was a good change) actually broke libopenmpt compile in the past, so this is not just a theoretical issue. Therefore, I still suggest using completely NEW names when changing argument types for explicit int64_t functions.

     
  • Thomas Orgis

    Thomas Orgis - 2022-09-20

    Thanks for the elaborate answer. I'll spend some time digesting that. Your observation about OpenBSD possibly not offering 64 bit offset API could be frightening, but is connected to the DOS case. So one would check if the size of off_t is 64 bit without switchery, and just use plain lseek() in that case. As for DOS … no largefile stuff, off_t is 32 bit … but int64 is available? So we could offer the i64 API while internally using 32 bit offsets. You talk about separating sample offsets from file offset size. So far I didn't really care for a platform that does offer 64 bit types, but does not offer 64 bit file offsets. That's weird. But, well … didn't anyone port ZFS to DOS yet? ;-)

    I need to chew a bit on the C++ ABI issues. So the function pointers you'd give the reader replacement would be of differing type for C++, and similar issues. That's lame. I do wonder about all the differing names for the same types being a good thing or not. Defining yet another set of names? Meh.

    From:

    mpg123_seek (either 32 or 64 bit, default off_t)
    mpg123_seek_32 (32 bit, wrapper or alias)
    mpg123_seek_64 (64 bit, wrapper or alias)
    

    To:

    mpg123_seek64 (int64_t, always)
    mpg123_seek (default off_t wrapper/alias over mpg123_seek64)
    mpg123_seek_32 (wrapper over mpg123_seek64)
    mpg123_seek_64 (alias to mpg123_seek64)
    

    Reader replacement callbacks accordingly. Internally, I can make use of the C compiler not bothering with the differing names for int and longs.

    The mpg123_seek64 function would be available on all platforms, regardless of it in the end triggering lseek or lseek64 inside (actually, a compat_lseek64 would cover that), depending on availability. Sample offsets would be 64 bit always. While I like extending the metaphor of file offsets to the virtual PCM files from the decoder, I realized that using more off_t than absolutely necessary doesn't help. But at the same time … this did not stop me from using off_t in syn123 API (you don't need to bother with that, just I do — got my reasons to re-use the code elsewhere;-). I just thought of off_t as something meaning 'type for the maximum size of things this machine can handle.' Well … a DOS box with 4 G RAM … is that a thing? size_t of 64 bits and off_t of 32 bits?!

    Regarding supporting platforms: We started using C99 for real quite recently only, as MSVC has a spotted history of really supporting it. I try not to cut off platforms for imperceptible gains. There was one person recently who insists on using Windows XP with our mingw binaries. That should still work. I don't think we need to support older stuff than that. There are older releases of mpg123 to use on early IRIX releases.

    I am not keen on deprecating API. That's a separate discussion, of course, but as long as I am the one with the burden to maintain the old interface, I'll refrain from removing it. I don't intend big changes in libmpg123, anyway. It's not like the MPEG1/2 audio standard is getting enhanced each day, or PCM encodings, even.

    Anyhow … I think I got a picture how things could work. Ensuring no renaming in the header for largefile-insensitive-setups. Adding 64-suffixed functions with int64_t to all platforms, distinct from the _64 ones. Simplifying the wrapper and alias stuff, as the insecurities of off_t wander to the surface only. Promise to never introduce off_t into API again, ever.

    Edit: To be clear, with the separate 64-suffixed functions, the idea of making the _32 and _64 functions explicit in the header without _FILE_OFFSET_BITS is moot. You get the current machinery when you use the plain name and care only for POSIX platforms, otherwise you get the stable new symbol with the new name.

     

    Last edit: Thomas Orgis 2022-09-20
  • Thomas Orgis

    Thomas Orgis - 2022-09-26

    Regarding the main topic of this bug report, we got this in 1.28.0:

    -- Added BUILD_NO_LARGENAME define to be used by MSVC builds. Note that
       an MSVC build of libmpg123 does not support 64 bit file offsets.
       That would need more morting to the explicit API. Thanks to MS for
       making off_t even more messy and less useful.
    

    This is a hack for MSVC only right now, enabled in the cmake build. I'll extend that to lfs-insensitive systems. The missing part was this:

     if test "x$ac_cv_sys_file_offset_bits" = x || echo "$ac_cv_sys_file_offset_bits" | $GREP  '@<:@^0-9@:>@' > /dev/null; then
        dnl if it has non-numeric chars or is empty... ignore...
        largefile_sensitive=no
    +   BUILD_NO_LARGENAME=1
     else
        largefile_sensitive=yes
    +   BUILD_NO_LARGENAME=0
     fi
    

    That avoids renames where there is no switchery. But please note that on OpenBSD, the aliases are still built (just verified in a VM). So -D_FILE_OFFSET_BITS=64 just works fine. Just -D_FILE_OFFSET_BITS=32 will fail, which is the proper thing to do.

     
  • manx

    manx - 2022-10-09

    I have to admit I was somewhat puzzled by your claim that the aliases would be available on OpenBSD, because if they were, there would not have been anything to report for me in the first place.

    Turns out the situation is more complex:
    While you are correct that a default build of libmpg123 on OpenBSD in fact provides the _64 aliases, that is not true for the libmpg123 that OpenBSD provides. Why? They are building with --disable-lfs-alias (see https://cvsweb.openbsd.org/cgi-bin/cvsweb/ports/audio/mpg123/Makefile?rev=1.114&content-type=text/x-cvsweb-markup), which means no HAVE_LFS_ALIAS and because it's correctly detected as largefile-insensitive also no HAVE_LFS_WRAP.

    I do not think it is likely that OpenBSD will change what they are doing (BSDs tend to prefer "recompile world" over "any backwards compatibility", so from their perspective anything non-64-bit-off_t just does not exist).

    I do agree though that your change in r5162 does improve the situation for this case.

    However, for the bigger picture, I think this does support my case that a clean truly portable API is needed. Even today, and when only looking at POSIX systems, one cannot rely on explicit _64 aliases being available.

     
  • Thomas Orgis

    Thomas Orgis - 2022-10-09

    Of course, if there are options, people use them. Good point not only to test my default build, but the build the OS people chose. Practically, I consider the situation on OpenBSD as fixed with the updated header.

    About the non-off_t-API, I think we already agree that those will be new symbols, anyway (which possibly wrap over _64 or are wrapped over by those, depending on implementation stage/detail).

     
  • manx

    manx - 2022-10-09

    I agree, this issue can be closed now.

    Further discussion is a better fit for issue https://sourceforge.net/p/mpg123/bugs/344/.

     
  • Thomas Orgis

    Thomas Orgis - 2023-01-15
    • status: open --> closed-fixed
     

Log in to post a comment.