can I use some kind of binary string?

Hi,

I would like to use curl to retrieve an image from a web server which I 
want to store in a table in a mariadb database without downloading the 
image to a file.  For this application, I do not want to store 
references to files stored in some file system instead.

So I would want to use something like


my $binary_data = `curl -k "https://www.example.com/some.jpg"`;


The image then needs to be inserted into a LONGBLOB field via DBI in 
such a way that the image can be restored as it was.

Will string conversions or something prevent this from working?

What is the usual way to do this?  It is certainly not ideal to have no 
check on the amount of data that might be retrieved from the web server, 
regardless whether I save it to an intermediate file or not.

To make things more difficult, the web server is using a self-signed 
certificate.
hw
5/10/2019 5:09:50 PM
perl.beginners

Hi hwilmer,

On Fri, 10 May 2019 19:09:50 +0200
hwilmer <hw@gc-24.de> wrote:

> Hi,
>
> I would like to use curl to retrieve an image from a web server which I
> want to store in a table in a mariadb database without downloading the
> image to a file.  For this application, I do not want to store
> references to files stored in some file system instead.
>
> So I would want to use something like
>
>
> my $binary_data = `curl -k "https://www.example.com/some.jpg"`;
>

Perl distinguishes between 8-bit/binary strings and unicode ones. See
https://perldoc.perl.org/perlunitut.html .
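
To illustrate the distinction, here is a minimal sketch using only core modules. The sample bytes and variable names are made up for the example; nothing in it comes from the original post.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# A byte string: three octets that happen to be the UTF-8 encoding of U+20AC.
my $bytes = "\xE2\x82\xAC";

# Decoding turns the octets into a character string holding one code point.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d\n", length $bytes;
printf "chars: length %d\n", length $chars;

# Encoding again yields the original octets, so the round trip is lossless.
my $again = encode('UTF-8', $chars);
print "round trip ok\n" if $again eq $bytes;
```

Image data should stay in the first form: never decoded, and only read or written through handles that have no encoding layer applied.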

Note however that you should see https://perl-begin.org/uses/web-automation/
and use a module instead of trapping curl.exe's output. Perl has bindings to
libcurl too if that is what you want.
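
One module-based route, as a rough sketch: HTTP::Tiny ships with Perl, although for https it also needs IO::Socket::SSL and Net::SSLeay installed. It is not a module named in this thread, and the 10 MiB cap, the timeout, and the helper name fetch_bytes are assumptions for illustration only.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Assumed upper bound on the response size; adjust to the application.
my $MAX_BYTES = 10 * 1024 * 1024;

my $http = HTTP::Tiny->new(
    verify_SSL => 0,           # accept a self-signed certificate (no verification)
    max_size   => $MAX_BYTES,  # larger responses are reported as a failure
    timeout    => 30,
);

# Hypothetical helper: returns the response body as a byte string or dies.
sub fetch_bytes {
    my ($url) = @_;
    my $res = $http->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

printf "size cap: %d bytes\n", $http->max_size;
```

A call such as fetch_bytes("https://www.example.com/some.jpg") would then replace the backticks. Turning verification off removes protection against man-in-the-middle attacks, so it is only reasonable for a server you control.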

>
> The image then needs to be inserted into a LONGBLOB field via DBI in
> such a way that the image can be restored as it was.
>
> Will string conversions or something prevent this from working?
>
> What is the usual way to do this?  It is certainly not ideal to have no
> check on the amount of data that might be retrieved from the web server,
> regardless whether I save it to an intermediate file or not.
>
> To make things more difficult, the web server is using a self-signed
> certificate.
>



--
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
http://www.shlomifish.org/humour/bits/New-versions-of-the-GPL/

He who reinvents the wheel will likely design a square wheel and spend a year
trying to figure out why it doesn’t work properly.
    -- Nadav Har’El, http://www.shlomifish.org/humour.html

Please reply to list if it's a mailing list post - http://shlom.in/reply .
shlomif
5/11/2019 9:07:18 AM
On 5/11/19 11:07 AM, Shlomi Fish wrote:
> Hi hwilmer,
> 
> On Fri, 10 May 2019 19:09:50 +0200
> hwilmer <hw@gc-24.de> wrote:
> 
>> Hi,
>>
>> I would like to use curl to retrieve an image from a web server which I
>> want to store in a table in a mariadb database without downloading the
>> image to a file.  For this application, I do not want to store
>> references to files stored in some file system instead.
>>
>> So I would want to use something like
>>
>>
>> my $binary_data = `curl -k "https://www.example.com/some.jpg"`;
>>
> 
> Perl distinguishes between 8-bit/binary strings and unicode ones. See
> https://perldoc.perl.org/perlunitut.html .

What kind of string do I get when using backticks like in the above 
example?  One that perl considers as a text string I could use stuff 
like uc or lc on, or as a binary string I could use pack or unpack on? 
Variables are without types, so there is no way to tell.  If I was using 
curl to receive a text string, how would I know which encoding is being 
used?  For all I know that could depend on the machine my program is 
running on after lots of factors I would never know about.  And this 
same encoding could happen to the image data.

Why would I use pack or unpack on the image data curl puts into the 
string?  Do I need to worry that somewhere --- like in my program or in 
some method DBI provides or somewhere else --- some kind of string 
transformation might take place that damages the image data?  Is there a 
way to tell perl that this is actually not a string but some binary data 
that must not be transformed or encoded?

So far, it's working, but that could be just luck ...

> Note however that you should see https://perl-begin.org/uses/web-automation/
> and use a module instead of trapping curl.exe's output. Perl has bindings to
> libcurl too if that is what you want.

First I tried to use WWW::Mechanize, and that failed because it can't
deal with the self-signed certificates the web server is using.  I
couldn't find anywhere in the documentation how to allow such 
certificates.  Otherwise it seemed to be able to do what I wanted.

Using curl via the library bindings is somewhat going to lengths I would 
rather avoid.
hw
5/16/2019 11:13:16 AM
On Thu, 16 May 2019 13:13:16 +0200
hwilmer <hw@gc-24.de> wrote:

> On 5/11/19 11:07 AM, Shlomi Fish wrote:
> > Hi hwilmer,
> >
> > On Fri, 10 May 2019 19:09:50 +0200
> > hwilmer <hw@gc-24.de> wrote:
> >
> >> Hi,
> >>
> >> I would like to use curl to retrieve an image from a web server which I
> >> want to store in a table in a mariadb database without downloading the
> >> image to a file.  For this application, I do not want to store
> >> references to files stored in some file system instead.
> >>
> >> So I would want to use something like
> >>
> >>
> >> my $binary_data = `curl -k "https://www.example.com/some.jpg"`;
> >>
> >
> > Perl distinguishes between 8-bit/binary strings and unicode ones. See
> > https://perldoc.perl.org/perlunitut.html .
>
> What kind of string do I get when using backticks like in the above
> example?  One that perl considers as a text string I could use stuff
> like uc or lc on, or as a binary string I could use pack or unpack on?
> Variables are without types, so there is no way to tell.  If I was using
> curl to receive a text string, how would I know which encoding is being
> used?  For all I know that could depend on the machine my program is
> running on after lots of factors I would never know about.  And this
> same encoding could happen to the image data.
>
> Why would I use pack or unpack on the image data curl puts into the
> string?  Do I need to worry that somewhere --- like in my program or in
> some method DBI provides or somewhere else --- some kind of string
> transformation might take place that damages the image data?  Is there a
> way to tell perl that this is actually not a string but some binary data
> that must not be transformed or encoded?
>
> So far, it's working, but that could be just luck ...
>

Perhaps use open "-|" with an encoding - see
https://perldoc.perl.org/functions/binmode.html .
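
A minimal sketch of that approach, assuming the goal is untouched bytes: for image data the layer to request is ":raw" rather than an encoding. The helper name read_command_bytes and the size limit are hypothetical, and the running perl ($^X) stands in for curl so that the example runs without a network; the real command list would be something like ('curl', '-k', '-s', $url).

```perl
use strict;
use warnings;

# Hypothetical helper: run a command, return its output as raw bytes,
# and give up once more than $max_bytes have been received.
sub read_command_bytes {
    my ($max_bytes, @cmd) = @_;

    # List form of the pipe open: no shell is involved.
    open(my $fh, '-|', @cmd) or die "cannot run $cmd[0]: $!\n";
    binmode($fh, ':raw');    # no encoding or newline translation

    my $data = '';
    while (1) {
        my $n = read($fh, my $chunk, 8192);
        die "read failed: $!\n" unless defined $n;
        last if $n == 0;     # end of output
        $data .= $chunk;
        die "output exceeds $max_bytes bytes\n" if length($data) > $max_bytes;
    }
    close($fh) or die "command failed (exit status $?)\n";
    return $data;
}

# Stand-in for curl: a child perl that prints every byte value once.
my @cmd = ($^X, '-e', 'binmode STDOUT; print pack("C*", 0 .. 255)');

my $image = read_command_bytes(1_000_000, @cmd);
printf "read %d bytes\n", length $image;
```

Reading in chunks is what makes the size check possible before the whole response is in memory, which backticks do not allow.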

> > Note however that you should see https://perl-begin.org/uses/web-automation/
> > and use a module instead of trapping curl.exe's output. Perl has bindings to
> > libcurl too if that is what you want.
>=20
> First I tried to use WWW::Mechanize, and that failed because it can't
> deal with the self-signed certificates the web server is using.  I
> couldn't find anywhere in the documentation how to allow such
> certificates.  Otherwise it seemed to be able to do what I wanted.
>

See
https://stackoverflow.com/questions/47662461/how-to-accept-self-signed-certificates-with-lwpuseragent

> Using curl via the library bindings is somewhat going to lengths I would
> rather avoid.
>



shlomif
5/16/2019 7:56:30 PM
On 5/16/19 9:56 PM, Shlomi Fish wrote:
> On Thu, 16 May 2019 13:13:16 +0200
> hwilmer <hw@gc-24.de> wrote:
> 
>> On 5/11/19 11:07 AM, Shlomi Fish wrote:
[...]
>>>> So I would want to use something like
>>>>
>>>>
>>>> my $binary_data = `curl -k "https://www.example.com/some.jpg"`;
>>>>   
>>>
>>> Perl distinguishes between 8-bit/binary strings and unicode ones. See
>>> https://perldoc.perl.org/perlunitut.html .
>>
>> What kind of string do I get when using backticks like in the above
>> example?
 > [...]
> Perhaps use open "-|" with an encoding - see
> https://perldoc.perl.org/functions/binmode.html .

I didn't know I could do that ...  I tried it and it works, too.

>>> Note however that you should see https://perl-begin.org/uses/web-automation/
>>> and use a module instead of trapping curl.exe's output. Perl has bindings to
>>> libcurl too if that is what you want.
>>
>> First I tried to use WWW::Mechanize, and that failed because it can't
>> deal with the self-signed certificates the web server is using.  I
>> couldn't find anywhere in the documentation how to allow such
>> certificates.  Otherwise it seemed to be able to do what I wanted.
>>
> 
> See
> https://stackoverflow.com/questions/47662461/how-to-accept-self-signed-certificates-with-lwpuseragent

That gives an error: 'Bareword "IO::Socket::SSL::SSL_VERIFY_NONE" not 
allowed while "strict subs" in use ...'.  But this works:

       my $ua = LWP::UserAgent->new(
           max_size => $MAX_DOWNLOAD_SIZE,
           ssl_opts => {
               ssl_verify => 0,
               verify_hostname => 0
           }
       );
hw
5/18/2019 12:02:46 PM
Hi,

On Sat, 18 May 2019 14:02:46 +0200
hwilmer <hw@gc-24.de> wrote:

> On 5/16/19 9:56 PM, Shlomi Fish wrote:
> > On Thu, 16 May 2019 13:13:16 +0200
> > hwilmer <hw@gc-24.de> wrote:
> >
> >> On 5/11/19 11:07 AM, Shlomi Fish wrote:
> [...]
> >>>> So I would want to use something like
> >>>>
> >>>>
> >>>> my $binary_data = `curl -k "https://www.example.com/some.jpg"`;
> >>>>
> >>>
> >>> Perl distinguishes between 8-bit/binary strings and unicode ones. See
> >>> https://perldoc.perl.org/perlunitut.html .
> >>
> >> What kind of string do I get when using backticks like in the above
> >> example?
>  > [...]
> > Perhaps use open "-|" with an encoding - see
> > https://perldoc.perl.org/functions/binmode.html .
>
> I didn't know I could do that ...  I tried it and it works, too.
>

Nice.

> >>> Note however that you should see
> >>> https://perl-begin.org/uses/web-automation/ and use a module instead of
> >>> trapping curl.exe's output. Perl has bindings to libcurl too if that is
> >>> what you want.
> >>
> >> First I tried to use WWW::Mechanize, and that failed because it can't
> >> deal with the self-signed certificates the web server is using.  I
> >> couldn't find anywhere in the documentation how to allow such
> >> certificates.  Otherwise it seemed to be able to do what I wanted.
> >>
> >
> > See
> > https://stackoverflow.com/questions/47662461/how-to-accept-self-signed-certificates-with-lwpuseragent
>
> That gives an error: 'Bareword "IO::Socket::SSL::SSL_VERIFY_NONE" not
> allowed while "strict subs" in use ...'.  But this works:
>
>        my $ua = LWP::UserAgent->new(
>            max_size => $MAX_DOWNLOAD_SIZE,
>            ssl_opts => {
>                ssl_verify => 0,
>                verify_hostname => 0
>            }
>        );
>

Great! Thanks for the tip.

shlomif
5/18/2019 12:18:00 PM