Introduction
Serialization gathers up the data from objects and converts it into a stream of bytes that can be written to disk or sent over a network. Later, the data can be deserialized and the original objects recreated. Many programming languages offer a way to do this, including PHP, Java, Ruby and Python (common backend languages on the web).
Let's talk about serialization in Python. In Python, when we use the pickle module, serialization is called “pickling.”
Table of contents
● Serialization in Python
● Serialization in Web Applications
● Over Pickling
● Python YAML vs Python Pickle
● Mitigation
● Demonstration
● Conclusion
Serialization in Python
While using Python, pickle.dumps() is used to serialize some data and pickle.loads() is used to deserialize it (pickling and unpickling). For example, here is a list being pickled.
python3
>>> import pickle
>>> variable = pickle.dumps([1,2,3])
>>> print(variable)
b'\x80\x04\x95\x0b\x00\x00\x00\x00\x00\x00\x00]\x94(K\x01K\x02K\x03e.'
>>> pickle.loads(variable)
[1, 2, 3]
>>>
As we can see above, when we print the
variable, we see a byte string. This is serialization. Later, with
pickle.loads(variable) we are deserializing the object.
This is helpful in many cases, including when we want to save some variables from a program to the drive as a binary file that other programs can load later. For example, let's create a list and save it as a binary file.
import pickle

variable = pickle.dumps([1, 2, 3])
with open("myarray.pkl", "wb") as f:
    f.write(variable)
As we can see, a pickle binary is now stored
on the drive. Let's read it using pickle again.
import pickle

# Read the raw bytes back and turn them into a list again
with open("myarray.pkl", "rb") as f:
    obj = pickle.loads(f.read())
print(obj)  # [1, 2, 3]
As you can see, we can now operate on this deserialized object (obj) just like a list again! At some point in the SDLC, a developer may want to quit the IDE but keep all the data and the state of the variables at that moment; that is where this feature helps.
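As a side note, pickle can also write to and read from file objects directly with pickle.dump() and pickle.load(), so there is no need to handle the raw bytes by hand. A minimal sketch of the same round trip, reusing the myarray.pkl filename from above:

```python
import pickle

# pickle.dump() serializes straight into an open binary file
with open("myarray.pkl", "wb") as f:
    pickle.dump([1, 2, 3], f)

# pickle.load() reads and deserializes from an open binary file
with open("myarray.pkl", "rb") as f:
    restored = pickle.load(f)

restored.append(4)  # it is an ordinary list again
print(restored)
```

Only load pickle files you created yourself or otherwise trust, for the reasons discussed in the rest of this article.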
Serialization in Web Apps
Okay, so we have talked about serialization in software applications. But what is the use of serialization in web apps? HTTP is a stateless protocol: each request is handled independently of the previous ones. But sometimes there is a need to maintain state. That's why we have cookies, which bring a sense of statefulness to the HTTP protocol.
If we want a user's information and some data to be retained the next time they interact with the server, serialization is a wonderful fit. Just serialize the data, put it into a cookie (which takes up the user's storage and not the server's! Wow!) and on the next request deserialize it and use it on the site.
Some Python web apps use pickle to do this. The caveat is that pickle is unsafe to deserialize, and the cookie's content is controlled by the client. Serializing to JSON is much safer! Unlike some other serialization formats, JSON cannot embed executable code in the data, so loading it never runs code. This eliminates a class of code-execution vulnerabilities that malicious actors could otherwise exploit.
It is possible to construct malicious pickle
data which will execute arbitrary code!
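To make the cookie idea concrete, here is a minimal sketch of the JSON approach. The helper names encode_cookie and decode_cookie are hypothetical, and base64 is used only so the JSON fits safely into a cookie value. The client can still tamper with the values (nothing here is signed), so the app must validate them, but decoding can only ever produce plain data such as dicts, lists, strings and numbers.

```python
import base64
import json


def encode_cookie(state: dict) -> str:
    """Serialize state to JSON, then base64 it so it is safe as a cookie value."""
    raw = json.dumps(state).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cookie(value: str) -> dict:
    """Reverse encode_cookie; json.loads only builds plain data types."""
    raw = base64.urlsafe_b64decode(value.encode("ascii"))
    return json.loads(raw)


cookie = encode_cookie({"user": "alice", "theme": "dark"})
print(cookie)
print(decode_cookie(cookie))
```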
Over Pickling
We have talked about pickling well-known data types like a list. But what if we were to pickle our own custom classes? Python can easily understand and deserialize well-known types, but what will it do with custom classes, such as connections to servers and all those fancy networking objects? It often doesn't even make sense to serialize those, but Python's developers added a way to pickle them too. Discrepancies can arise when Python tries to deserialize such objects.
Custom pickling and unpickling code can be used. When you define a class, you can provide a mechanism that states, 'here is what you should do when someone asks to unpickle you!' So when Python goes to unpickle this string of bytes, it might have to run some code to figure out how to properly reconstruct that object. The instructions for this reconstruction (which callable to invoke and with which arguments) are embedded in the pickle file.
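For a benign illustration of such a mechanism, here is a minimal sketch using a hypothetical Counter class. It holds a threading.Lock, which pickle cannot serialize, so the special __reduce__ method tells pickle to rebuild the object by calling Counter(value), which creates a fresh lock on load.

```python
import pickle
import threading


class Counter:
    """Hypothetical class holding an unpicklable lock."""

    def __init__(self, value=0):
        self.value = value
        self.lock = threading.Lock()  # lock objects cannot be pickled

    def __reduce__(self):
        # "To rebuild me, call Counter(self.value)"
        return (Counter, (self.value,))


restored = pickle.loads(pickle.dumps(Counter(5)))
print(restored.value)
```

Note that whatever callable __reduce__ returns is exactly what unpickling will call, which is the property the next example abuses.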
Let's see a small example.
Here is the proof-of-concept code. It creates a class called EvilPickle. To implement support for pickling on your custom object, you define a method called "__reduce__", which returns a callable and a tuple of arguments to call it with. Here, a simple "cat /etc/passwd" command would be run using the os.system function. Finally, the pickled data is written to a binary file called backup.data.
import pickle
import os

class EvilPickle(object):
    def __reduce__(self):
        # On unpickling, pickle will call os.system('cat /etc/passwd')
        return (os.system, ('cat /etc/passwd', ))

pickle_data = pickle.dumps(EvilPickle())
with open("backup.data", "wb") as file:
    file.write(pickle_data)
The idea here is to make whoever deserializes the file run cat /etc/passwd on their system. Let's try it out now! We save the above code in a file named evilpickle.py and run it. Just to check, we'll cat the backup.data file. Here we can clearly see something fishy!
The user deserializes it anyway and ends up printing out their /etc/passwd file.
import pickle

pickle.loads(open("backup.data", "rb").read())
We can get even more nerdy and see what is happening under the hood by disassembling the file with pickletools. Here, the pickling was done on a Unix-like OS, so os.system is recorded under its underlying module name, posix. The strings posix and system are pushed onto the stack with SHORT_BINUNICODE opcodes, and each value is memoized (saved in the memo as 0, 1, 2 and so on). STACK_GLOBAL combines the two strings into the callable posix.system. The argument cat /etc/passwd is then pushed and wrapped into a one-element tuple (TUPLE1). The `REDUCE` opcode calls the callable with that tuple of arguments, and finally the STOP opcode ends the pickle.
As a reminder, the primary difference between tuples and lists is that tuples are immutable while lists are mutable: a list can be changed, but a tuple's contents cannot change once it has been created.
python3 -m pickletools -a backup.data
Note: the -a option annotates each step (opcode) with a short description.
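We can also inspect a pickle programmatically without loading it. Here is a minimal sketch using pickletools.genops(), which walks the opcodes and never executes them; it rebuilds the same EvilPickle payload and collects the opcode names and string arguments. This is only an inspection aid, not a safety check to rely on before unpickling untrusted data.

```python
import os
import pickle
import pickletools


class EvilPickle(object):
    def __reduce__(self):
        return (os.system, ('cat /etc/passwd', ))


# Creating the payload is harmless; only unpickling it would run the command.
data = pickle.dumps(EvilPickle())

# genops() yields (opcode, argument, position) tuples without executing anything.
names = []
strings = []
for opcode, arg, pos in pickletools.genops(data):
    names.append(opcode.name)
    if isinstance(arg, str):
        strings.append(arg)

print(names)    # e.g. SHORT_BINUNICODE, STACK_GLOBAL, TUPLE1, REDUCE, STOP
print(strings)  # module name, function name, and the shell command
```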
Since the pickle data is user-controlled and is unpickled on the server, we can even use this to get a shell on a remote server (by pickling a payload that opens a socket connection back to us and providing it to the server).
PyTorch has long used pickle to serialize ML models (torch.save and torch.load), so loading an untrusted model file could lead to arbitrary code execution. The safetensors format overcomes this issue by storing only tensor data, with no executable content.
Python YAML vs Python Pickle
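YAML is another serialization format that can be used instead of pickle. But PyYAML, the common Python YAML library, can also execute arbitrary code: older versions did so by default with yaml.load(), and current versions still do when an unsafe loader is passed explicitly. Here is another POC: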
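import yaml

document = "!!python/object/apply:os.system ['cat /etc/passwd']"
# PyYAML 6+ requires an explicit Loader; UnsafeLoader allows python/* tags
yaml.load(document, Loader=yaml.UnsafeLoader)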
This would also execute cat /etc/passwd. We can avoid this by using yaml.safe_load() instead of yaml.load().
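Here is a minimal sketch of safe_load() in action, assuming PyYAML is installed: its SafeLoader has no constructor for python/* tags, so the malicious document raises a YAMLError instead of running the command, while plain data still loads normally.

```python
import yaml

document = "!!python/object/apply:os.system ['cat /etc/passwd']"

try:
    yaml.safe_load(document)  # SafeLoader cannot construct python/* tags
    rejected = False
except yaml.YAMLError as err:
    rejected = True
    print("rejected:", type(err).__name__)

# Plain data still loads fine
config = yaml.safe_load("key: value\nitems: [1, 2, 3]")
print(config)
```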
Mitigation
Pickle is just one module in Python. It is a very well-known tool and developers still use it, but if they are a little more mindful, they won't ignore the warning on pickle's documentation page, which states that the pickle module is not secure and that you should only unpickle data you trust.
Alternatives to pickle and brief POCs on them are as follows:
1. JSON
import json

# Serialize
data = {"key": "value"}
json_data = json.dumps(data)

# Deserialize
deserialized_data = json.loads(json_data)
2. msgpack
import msgpack

# Serialize
data = {"key": "value"}
msgpack_data = msgpack.packb(data)

# Deserialize
deserialized_data = msgpack.unpackb(msgpack_data, raw=False)
Some other safe options are Google's Protocol Buffers (protobuf) and CBOR.
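One practical difference from pickle is that JSON will not serialize arbitrary objects; the application has to convert them to plain data and rebuild them itself. A minimal sketch, using a hypothetical Note dataclass whose fields (title, content, image_filename) are only for illustration:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Note:
    # Hypothetical class used only for this illustration
    title: str
    content: str
    image_filename: str


note = Note("hello", "my first note", "cat.png")

# json.dumps refuses arbitrary objects instead of recording how to rebuild them
try:
    json.dumps(note)
except TypeError as err:
    print("refused:", err)

encoded = json.dumps(asdict(note))     # explicit, data-only representation
decoded = Note(**json.loads(encoded))  # the application decides how to rebuild
print(decoded == note)
```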
Demonstration
Okay, so the given website is a note-taking website that uses serialization. Here is what happens when I submit a note with a PNG image.
This is what it looks like when processed by the server. Observe the URL, which is rendering a .pickle file.
The challenge also provided us with the app.py source code, which tells us all about the backend logic. I can’t post the entire code, but here are some relevant snippets.
As we can see, the code accepts the title, content and image as an object, pickles it and stores it in title.pickle.
Here are the key functions of the code:
1. The Note() class builds an object, new_note, with 3 items: title, content and image_filename.
2. save_note() calls pickle.dumps() to pickle new_note. save_note() also stores the uploaded image using image.save(), a method of the uploaded-file object that Flask provides. Similarly, image.filename gives the image's filename.
3. The secure_filename() function converts insecure filenames to safe ones. For example, "note 1" becomes note_1 and ../../../etc/passwd becomes etc_passwd.
4. unpickle_file() loads the pickled file provided to it and unpickles it.
Here are some key takeaways about the functionality of the code:
1. The site accepts 3 key items.
2. It does not check whether the uploaded PNG is safe, i.e., whether it is a valid PNG at all. This is a good attack point.
3. All in all, the PNG file upload is a really strong place to put code because (a) the site isn't validating the PNG and (b) it will unpickle any file we provide.
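As a sketch of the missing check from point 2, the hypothetical helper below tests whether an upload starts with the fixed 8-byte PNG signature. A pickle payload fails it, but this is only a sanity check: the real fix is to never unpickle user-supplied files at all.

```python
import os
import pickle

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # every PNG file starts with these 8 bytes


def looks_like_png(data: bytes) -> bool:
    """Hypothetical helper: cheap check that an upload begins like a PNG."""
    return data.startswith(PNG_SIGNATURE)


class EvilPickle(object):
    def __reduce__(self):
        return (os.system, ('cat /etc/passwd', ))


payload = pickle.dumps(EvilPickle())  # dumping does not execute the command
print(looks_like_png(payload))
print(looks_like_png(PNG_SIGNATURE + b"rest of image data"))
```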
I tried a simple cat /etc/passwd payload on my local machine, and unpickling the evil.png file ran the command as expected!
import pickle
import os

class EvilPickle(object):
    def __reduce__(self):
        # On unpickling, pickle will call os.system('cat /etc/passwd')
        return (os.system, ('cat /etc/passwd', ))

pickle_data = pickle.dumps(EvilPickle())
with open("evil.png", "wb") as file:
    file.write(pickle_data)
Let's take it a step further and use a netcat listener to receive the connection when evil.png is deserialized locally, and have it give us a shell!
By following the same logic, we could exploit the server. First, I create the malicious PNG file and upload it to the server.
The uploaded data is stored on the server as a pickle file, and when it is requested, its data is displayed on the screen (meaning it is unpickled).
Finally we access the uploaded PNG file on the
server.
This way, we get a reverse shell on the netcat listener we set up!
This is how we root the box! Please note that
I hid and altered a few details throughout the CTF section of the article
because the CTF is still an ongoing challenge and I couldn’t obtain permission
to post a complete solution.
Conclusion
Serialization vulnerabilities are easy to exploit and easy for developers to overlook. One can even achieve arbitrary code execution on machines. As we saw, when we deserialize insecurely or use insecure functions, we put our infrastructure at risk of compromise. Developers should carefully read the documentation and not ignore its warnings. And finally, use formats like JSON to serialize and deserialize data, since JSON is a data-only format that cannot contain executable code. Thanks for reading.