Discussion:
[boost] [nowide] request for clarification
Andrzej Krzemienski via Boost
2017-06-16 15:12:27 UTC
Hi Everyone,
I admit I am not quite familiar with the problem, but I understand that as
one of the features, nowide offers a replacement for std::fstream that can
be constructed with its string types. At the same time we have
boost::filesystem that offers its own replacement for std::fstream that can
be constructed with filesystem::path. Now, if I want to use
`filesystem::path`s in my program (to be able to tell just any string from
a filesystem path), can I still use the benefits of `nowide` library?

Also, in the docs for nowide::ifstream, we read, "Same as
std::basic_ifstream<char> but accepts UTF-8 strings under Windows." What
about other systems? What does it accept on Linux? ASCII?

In documentation for `nowide::args`, we read, "args is a class that fixes
standard main() function arguments and changes them to UTF-8 under
Microsoft Windows."
Does it write to the input strings in place? Is that even legal in C++?
It "fixes", which implies that otherwise the args are "broken". How are
args in function main() broken? (other than not being UTF-8)?

Regards,
&rzej;

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Artyom Beilis via Boost
2017-06-16 20:10:46 UTC
On Fri, Jun 16, 2017 at 6:12 PM, Andrzej Krzemienski via Boost
Post by Andrzej Krzemienski via Boost
Hi Everyone,
I admit I am not quite familiar with the problem, but I understand that as
one of the features, nowide offers a replacement for std::fstream that can
be constructed with its string types. At the same time we have
boost::filesystem that offers its own replacement for std::fstream that can
be constructed with filesystem::path. Now, if I want to use
`filesystem::path`s in my program (to be able to tell just any string from
a filesystem path), can I still use the benefits of `nowide` library?
Yes, of course. There is an integration between nowide and filesystem
that makes filesystem treat the narrow API as UTF-8.

Also note that nowide::fstream works on MinGW as well, whereas
filesystem::fstream calls std::fstream, and only the MSVC version
has open(wchar_t const *).
Post by Andrzej Krzemienski via Boost
Also, in the docs for nowide::ifstream, we read, "Same as
std::basic_ifstream<char> but accepts UTF-8 strings under Windows." What
about other systems? What does it accept on Linux? ASCII?
Under Linux it accepts "char *" in whatever encoding the system considers it to be.
See: http://cppcms.com/files/nowide/html/index.html#qna
Post by Andrzej Krzemienski via Boost
In documentation for `nowide::args`, we read, "args is a class that fixes
standard main() function arguments and changes them to UTF-8 under
Microsoft Windows."
Does it write to the input strings in place? Is that even legal in C++?
It replaces the values of argc and argv, pointing them to another location
without modifying the original strings.
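
As an illustration of the approach described in this reply, here is a minimal sketch of a guard object that redirects argc and argv to its own storage and leaves the original strings untouched. The class name `replaced_args`, its constructor taking a ready-made replacement vector, and the restore-on-destruction behaviour are all assumptions made for this sketch; it is not the Boost.Nowide implementation.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only, not the Boost.Nowide code.
// Redirects argc/argv to storage owned by this object; the strings the
// program originally received are never written to.
class replaced_args {
public:
    replaced_args(int& argc, char**& argv, std::vector<std::string> replacement)
        : argc_ref_(argc), argv_ref_(argv),
          old_argc_(argc), old_argv_(argv),
          storage_(std::move(replacement))
    {
        // Build a pointer table into our own copies of the strings.
        for (std::string& s : storage_)
            pointers_.push_back(&s[0]);
        pointers_.push_back(nullptr); // argv[argc] is expected to be null

        argc_ref_ = static_cast<int>(storage_.size());
        argv_ref_ = pointers_.data();
    }

    // Assumption of this sketch: put the original values back on scope exit.
    ~replaced_args()
    {
        argc_ref_ = old_argc_;
        argv_ref_ = old_argv_;
    }

    replaced_args(const replaced_args&) = delete;
    replaced_args& operator=(const replaced_args&) = delete;

private:
    int& argc_ref_;
    char**& argv_ref_;
    int old_argc_;
    char** old_argv_;
    std::vector<std::string> storage_;
    std::vector<char*> pointers_;
};
```

Because the constructor takes argc and argv by reference, the caller's variables change for the lifetime of the guard, while the memory the runtime handed to main() stays as it was.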
Post by Andrzej Krzemienski via Boost
It "fixes", which implies that otherwise the args are "broken". How are
args in function main() broken? (other than not being UTF-8)?
main(argc, argv) receives parameters converted from the native UTF-16
internal API to the current locale's code page, which generally cannot
represent all the required characters (since Windows does not support
UTF-8 as a native locale).
Post by Andrzej Krzemienski via Boost
Regards,
&rzej;
Best,
Artyom

Andrzej Krzemienski via Boost
2017-06-16 21:37:11 UTC
Post by Artyom Beilis via Boost
On Fri, Jun 16, 2017 at 6:12 PM, Andrzej Krzemienski via Boost
Post by Andrzej Krzemienski via Boost
Hi Everyone,
I admit I am not quite familiar with the problem, but I understand that as
one of the features, nowide offers a replacement for std::fstream that can
be constructed with its string types. At the same time we have
boost::filesystem that offers its own replacement for std::fstream that can
be constructed with filesystem::path. Now, if I want to use
`filesystem::path`s in my program (to be able to tell just any string from
a filesystem path), can I still use the benefits of `nowide` library?
Yes, of course. There is an integration between nowide and filesystem
that makes filesystem treat the narrow API as UTF-8.
Also note that nowide::fstream works on MinGW as well, whereas
filesystem::fstream calls std::fstream, and only the MSVC version
has open(wchar_t const *).
Post by Andrzej Krzemienski via Boost
Also, in the docs for nowide::ifstream, we read, "Same as
std::basic_ifstream<char> but accepts UTF-8 strings under Windows." What
about other systems? What does it accept on Linux? ASCII?
Under Linux it accepts "char *" in whatever encoding the system considers it to be.
See: http://cppcms.com/files/nowide/html/index.html#qna
Post by Andrzej Krzemienski via Boost
In documentation for `nowide::args`, we read, "args is a class that fixes
standard main() function arguments and changes them to UTF-8 under
Microsoft Windows."
Does it write to the input strings in place? Is that even legal in C++?
It replaces the values of argc and argv, pointing them to another location
without modifying the original strings.
Ok. It makes sense :)
Post by Artyom Beilis via Boost
Post by Andrzej Krzemienski via Boost
It "fixes", which implies that otherwise the args are "broken". How are
args in function main() broken? (other than not being UTF-8)?
main(argc, argv) receives parameters converted from the native UTF-16
internal API to the current locale's code page, which generally cannot
represent all the required characters (since Windows does not support
UTF-8 as a native locale).
But given that what main() receives is already broken (Windows already could
not handle a name containing letters from two code pages), how can you
recover from this loss of information?

Regards,
&rzej;

Artyom Beilis via Boost
2017-06-17 08:37:25 UTC
Post by Andrzej Krzemienski via Boost
But given that what main() receives is already broken (Windows already could
not handle a name containing letters from two code pages), how can you
recover from this loss of information?
Take a look at the code :-)

I use WinAPI to retrieve the original UTF-16 args. I don't rely on the
original strings.
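
A minimal sketch of that idea, for readers who want to see the shape of it: on Windows the process can ask the OS for its command line in UTF-16 and convert each argument to UTF-8 itself, ignoring the narrow argv entirely. The function names `utf8_args` and `narrow_utf8` are hypothetical and this is not the Nowide source; the WinAPI calls used (GetCommandLineW, CommandLineToArgvW, WideCharToMultiByte, LocalFree) are the standard ones. On other systems the sketch simply copies argv unchanged.

```cpp
#include <cstddef>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h> // CommandLineToArgvW (link with shell32)

// Convert one null-terminated UTF-16 string to UTF-8.
static std::string narrow_utf8(const wchar_t* w)
{
    // First call: ask for the required size in bytes, terminator included.
    int bytes = WideCharToMultiByte(CP_UTF8, 0, w, -1, nullptr, 0, nullptr, nullptr);
    if (bytes <= 0)
        return std::string();
    std::string out(static_cast<std::size_t>(bytes), '\0');
    WideCharToMultiByte(CP_UTF8, 0, w, -1, &out[0], bytes, nullptr, nullptr);
    out.resize(static_cast<std::size_t>(bytes) - 1); // drop the terminator
    return out;
}
#endif

// Hypothetical helper: returns the program arguments as UTF-8 strings.
std::vector<std::string> utf8_args(int argc, char** argv)
{
    std::vector<std::string> result;
#ifdef _WIN32
    // The narrow argv may already have lost characters, so it is not used.
    (void)argc;
    (void)argv;
    int wargc = 0;
    wchar_t** wargv = CommandLineToArgvW(GetCommandLineW(), &wargc);
    if (wargv) {
        for (int i = 0; i < wargc; ++i)
            result.push_back(narrow_utf8(wargv[i]));
        LocalFree(wargv); // CommandLineToArgvW allocates a single block
    }
#else
    // Elsewhere the bytes are passed through as they were received.
    result.assign(argv, argv + argc);
#endif
    return result;
}
```

The point of the sketch is that nothing is recovered from the lossy narrow strings; the wide command line is a separate, intact copy kept by the OS.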

Artyom

Frédéric Bron via Boost
2017-06-17 09:17:17 UTC
Post by Artyom Beilis via Boost
Take a look at the code :-)
I use WinAPI to retrieve the original UTF-16 args. I don't rely on the
original strings.
This is interesting. You can retrieve more than what was given!
I think you should document this, not only in the code itself.

Frédéric

Artyom Beilis via Boost
2017-06-17 09:23:57 UTC
Post by Frédéric Bron via Boost
Post by Artyom Beilis via Boost
Take a look at the code :-)
I use WinAPI to retrieve the original UTF-16 args. I don't rely on the
original strings.
This is interesting. You can retrieve more than what was given!
I think you should document this, not only in the code itself.
Actually, it is documented:

cppcms.com/files/nowide/html/classboost_1_1nowide_1_1args.html

Artyom

Frédéric Bron via Boost
2017-06-17 09:45:50 UTC
Post by Artyom Beilis via Boost
Actually it is
cppcms.com/files/nowide/html/classboost_1_1nowide_1_1args.html
Sorry, yes, everything is there.

Frédéric

Andrzej Krzemienski via Boost
2017-06-17 12:10:28 UTC
Post by Artyom Beilis via Boost
Post by Andrzej Krzemienski via Boost
But given that what main() receives is already broken (Windows already could
not handle a name containing letters from two code pages), how can you
recover from this loss of information?
Take a look at the code :-)
I use WinAPI to retrieve the original UTF-16 args. I don't rely on the
original strings.
This is impressive, and simple.

Thanks,
&rzej;

Niall Douglas via Boost
2017-06-16 21:44:06 UTC
Post by Artyom Beilis via Boost
Post by Andrzej Krzemienski via Boost
It "fixes", which implies that otherwise the args are "broken". How are
args in function main() broken? (other than not being UTF-8)?
main(argc, argv) receives parameters converted from the native UTF-16
internal API to the current locale's code page, which generally cannot
represent all the required characters (since Windows does not support
UTF-8 as a native locale).
This is incorrect. You have been able to set the Windows console to
UTF-8 for many years. Just issue `chcp 65001`, your console is now in
UTF-8 and UTF-8 strings will present to argv.

Indeed it is possible to set UTF-8 consoles globally as the default, but
lots of stuff hard assumes Latin1 input and gets very upset if it sees
UTF-8. In particular, MSVCRT, though maybe the VS2015 rewrite of MSVCRT
has fixed that. I do remember .NET programs ran great with UTF-8 input
though, as do NT kernel programs. The blocker is MSVCRT.

Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/


Peter Dimov via Boost
2017-06-16 22:06:55 UTC
Post by Niall Douglas via Boost
Post by Artyom Beilis via Boost
main(argc, argv) receives parameters converted from the native UTF-16
internal API to the current locale's code page, which generally cannot
represent all the required characters (since Windows does not support
UTF-8 as a native locale).
This is incorrect. You have been able to set the Windows console to UTF-8
for many years. Just issue `chcp 65001`, your console is now in UTF-8 and
UTF-8 strings will present to argv.
You can set the console to UTF-8 and it will display UTF-8 correctly, but
will UTF-8 strings come in (the narrow) argv? I think not.

#include <iostream>

int main( int argc, char const* argv[] )
{
    std::cout << argv[1] << std::endl;
}

C:\Users\Peter Dimov>chcp 65001
Active code page: 65001

C:\Projects\testbed2017>debug\testbed2017.exe проба
?????

Whereas:

#include <boost/nowide/args.hpp>
#include <iostream>

int main( int argc, char const* argv[] )
{
    boost::nowide::args args( argc, argv );
    std::cout << argv[1] << std::endl;
}

Oops, a compile error, Nowide doesn't take char const*. All right,

#include <boost/nowide/args.hpp>
#include <iostream>

int main( int argc, char * argv[] )
{
    boost::nowide::args args( argc, argv );
    std::cout << argv[1] << std::endl;
}

And now:

C:\Projects\testbed2017>debug\testbed2017.exe проба
����������

See? Much better. :-) Although there's still room for improvement:

#include <boost/nowide/args.hpp>
#include <boost/nowide/iostream.hpp>

int main( int argc, char* argv[] )
{
    boost::nowide::args args( argc, argv );
    boost::nowide::cout << argv[1] << std::endl;
}

C:\Projects\testbed2017>debug\testbed2017.exe проба
проба

Or alternatively,

#include <boost/nowide/args.hpp>
#include <cstdio>

int main( int argc, char* argv[] )
{
    boost::nowide::args args( argc, argv );
    std::puts( argv[1] );
}

with the same result.