STAT 39000: Project 12 — Fall 2021
Motivation: Containers are a modern solution for packaging and shipping code in a reproducible and portable way. When working with R and Python code in industry, it is highly likely that you will eventually need to work with Docker or some other container-based solution. It is best to learn the basics now so the core concepts aren’t completely foreign to you.
Context: This is the first project in a two-project series in which we learn about containers and one of the most popular container-based solutions, Docker.
Scope: Docker, unix, Python
Questions
Question 1
First things first. Please read this fantastic article for a great introduction to containers. Afterwards, please review the content we have available here.
In this project, we have a special challenge. Brown does not have Docker installed, for a variety of reasons. Brown does have a tool called Singularity installed; however, it is different enough from more common containerization tools that it does not make sense to learn it as your first "container" experience.
To solve this issue, we’ve created a virtual machine that runs Ubuntu, and has Docker pre-installed and configured for you to use. To be clear, the majority of this project will revolve around the command line from within Jupyter Lab. We will specifically state the "deliverables" which will mainly be text or images that are copied and pasted in Markdown cells.
Please log in and launch a Jupyter Lab session. Create a new notebook for your solutions, and open up a terminal window beside your notebook.
In your terminal, navigate to /depot/datamine/apps/qemu/scripts/. You should find 4 scripts. They perform the following operations, respectively.
- Copies our VM image from /depot/datamine/apps/qemu/images/ to /scratch/brown/$USER/, so you each get to work on your own (virtual) machine.
- Creates a SLURM job and provides you a shell to that job. The job will last 4 hours, provide you with 4 cores, and will have ~6GB of RAM.
- Runs the virtual machine in the background, in your SLURM job.
- SSH’s into the virtual machine.
Run the scripts in your Terminal, in order, from 1-4.
cd /depot/datamine/apps/qemu/scripts/
./1_copy_vm.sh
./2_grab_a_node.sh
./3_run_a_vm.sh
You may need to press enter to free up the command line.
./4_connect_to_vm.sh
You will eventually be asked for a password. Enter |
Remember, to add an image or screenshot to a markdown cell, you can use the following syntax: ![](/home/kamstut/my_image.png)
- A screenshot of your terminal window after running the 4 scripts.
Question 2
Awesome! Your terminal is now connected to an instance of Ubuntu with Docker already installed and configured for you! Now, let’s get to work.
First things first. Let’s test out pulling an image from the Docker Hub. wernight/funbox is a fun image to do some wacky things on a command line. Pull the image (hub.docker.com/r/wernight/funbox), and verify that the image is available on your system using docker images.
Run the following to get an ASCII aquarium.
docker run -it wernight/funbox asciiquarium
Wow! That is wild! You can run this program on any system where an OCI-compliant runtime exists. Very cool!
To quit the program, press Ctrl + c.
For this question, submit a screenshot of the running asciiquarium program.
- Code used to solve this problem.
- Output from running the code.
Question 3
Okay, that was fun, but let’s do something a little bit more practical. Check out the ~/projects/whin directory in your VM. You should pretty quickly realize that this is our version of the WHIN API that we used earlier in Project 8.
If you recall, we had a lot of "extra" steps we had to take in order to run the API. We had to:
- Install the Python dependencies.
- Activate the appropriate Python environment.
- Set the DATABASE_PATH environment variable.
- Remember some long and complicated command.
This is a fantastic example of when containerizing your app could be a great idea!
Let’s begin by writing our own Dockerfile.
First things first. We want our image to contain the correct version of Python for our app. Our app requires at least Python 3.9. Let’s see if we can find a base image that has Python 3.9 or later. Google "Python docker image" and you will find the following link: hub.docker.com/_/python
Here, we will find a wide variety of different "official" Python docker images. A great place to start. If you click on the "Tags" tab, you will be able to scroll through a wide variety of different versions of Python + operating systems. A great Linux distribution is Debian.
Fun fact: Debian (the Debian project), one of the most popular Linux distributions, if not the most popular, was founded by a Purdue alum, Ian Murdock.
Okay, let’s go for the Python 3.9.9 + Bullseye (Debian) image. The tag for the image is python:3.9.9-bullseye. But wait a second. If you look at the space required for the base image, it is already up to 370 or so MB. That is quite a bit! Maybe there is a lighter-weight option? If you search for "slim" you will find an image with the tag python:3.9.9-slim-bullseye that takes up only 45 MB by default. Much better.
Create a file called Dockerfile in the ~/projects/whin directory. Use vim/emacs/nano to edit the file to look like this:
FROM python:3.9.9-slim-bullseye
Now, let’s build our image.
docker build -t whin:0.0.1 .
Once created, you should be able to view your image by running the following.
docker images
Now, let’s run our image. After running docker images, if you look under the IMAGE ID column, you should see an id for your image, something like 3dk35bdl. To run your image, do the following.
docker run -dit 3dk35bdl
Be sure to replace 3dk35bdl with the id of your image. Great! A container based on your image should now be running. Find out by running the following.
docker ps
Under the NAMES column, you will see the name of your running container. Very cool! How does this test out anything? Don’t we want to see if we have Python 3.9 running like we want it to? Yes! Let’s get a bash shell inside our container. To do so, run the following.
docker exec -it suspicious_lumiere /bin/bash
Replace suspicious_lumiere with the name of your container. You should now be in a bash shell. Awesome! Run the following to see what version of Python we have installed.
python --version
Python 3.9.9
Awesome! So far so good! To exit the container, type and run exit. Take a screenshot of your terminal after following these steps and add it to your notebook in a markdown cell.
To clean up and stop the container, run the following.
docker stop suspicious_lumiere
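Once several containers are running, picking names out of the docker ps table by eye gets tedious. The following is a minimal sketch, not part of the project, of a small helper that prints only the NAMES column. The function name container_names is made up for illustration, and it assumes the default docker ps table layout, where NAMES is the last column.

```shell
# Hypothetical helper (not part of Docker): print the NAMES column,
# assumed to be the last field of each row in the default `docker ps` table.
container_names() {
    # NR > 1 skips the header row; $NF is the last whitespace-separated field.
    awk 'NR > 1 { print $NF }'
}

# Intended usage on a machine with Docker installed:
#   docker ps | container_names
```

Docker can also produce this list on its own with docker ps --format '{{.Names}}', so treat the helper as a way to see what the table contains rather than as the recommended approach.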
- Code used to solve this problem.
- Output from running the code.
Question 4
Okay, great! We have version 0.0.1 of our whin image.
Now let’s make this thing useful. Use vim/emacs/nano to edit the ~/projects/whin/Dockerfile to look like this:
FROM python:3.9.9-slim-bullseye
WORKDIR /app
RUN python -m pip install fastapi[all] pandas aiosql fastapi-responses cyksuid httpie
COPY . .
EXPOSE 21650
CMD ["uvicorn", "app.main:app", "--reload", "--port", "21650", "--host", "0.0.0.0"]
Here, do your best to explain what each line of code does. Build version 0.0.2 of your image, and run it.
Okay, in theory, that last line should run our API — awesome! Let’s check the logs to see if it is working.
docker logs my_container_name
Remember, to get your container name, run |
What you should get is a Python error! Something about NoneType. Whoops! We forgot to include the DATABASE_PATH environment variable so our API knows where our WHIN database is. That is critical to our API.
This command will be very useful to achieve this!
Modify our Dockerfile to include the DATABASE_PATH environment variable with a value of /home/tdm-user/projects/whin/whin.db. Rebuild your image (as version 0.0.2), and run it. Check the logs again. Does it appear to be working?
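The NoneType error is presumably what happens when the app asks the environment for DATABASE_PATH and gets nothing back. If you want to confirm whether the variable is actually visible inside a container, here is a minimal sketch of a check you could paste into a bash shell in that container. The function name check_database_path is made up for illustration and is not part of the WHIN project.

```shell
# Hypothetical helper: report whether DATABASE_PATH is set in the current shell.
check_database_path() {
    # ${DATABASE_PATH:-} expands to an empty string when the variable is unset.
    if [ -z "${DATABASE_PATH:-}" ]; then
        echo "DATABASE_PATH is not set"
        return 1
    fi
    echo "DATABASE_PATH=${DATABASE_PATH}"
}

check_database_path || echo "the API would not know where the database is"
```

Before the fix, this should report that the variable is not set. After rebuilding with the fixed Dockerfile, a shell opened in the new container should print the path instead.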
- The fixed Dockerfile contents in a markdown cell as code (surrounded by 3 backticks).
- A screenshot (or more) of the terminal output from running the various commands.
Question 5
Okay, there is one step left. Let’s see if the API is really fully working by making a request to it. First, get a shell to the running container.
docker exec -it container_name /bin/bash
Remember, to get your |
Once inside the container, let’s make a request to the API that is running. Run the following:
python -m httpie localhost:21650
If all is well you should get:
HTTP/1.1 200 OK
content-length: 25
content-type: application/json
date: Thu, 18 Nov 2021 20:28:47 GMT
server: uvicorn

{
    "message": "Hello World"
}
Awesome! You can see our API is definitely working, cool!
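If you would rather check the result in a script than read it by eye, the status code on the first line of the response is the part that matters. The following is a minimal sketch of pulling it out. The function name http_status is made up for illustration, and the usage line assumes HTTPie's --print=h option, which asks for the response headers (including the status line) even when the output is piped.

```shell
# Hypothetical helper: print the status code from the first line of an HTTP
# response, e.g. "HTTP/1.1 200 OK" gives "200".
http_status() {
    # NR == 1 selects the status line; $2 is its second field, the code.
    awk 'NR == 1 { print $2 }'
}

# Intended usage inside the container (assumes the API is running):
#   python -m httpie --print=h localhost:21650 | http_status
```

A printed 200 means the request succeeded. Anything else is worth a look at the container logs.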
Okay, one final test. Let’s exit the container and make a request to the API again. After all, it wouldn’t be that useful if we had to essentially log in to a container every time we wanted to access an API running in that container, would it?
http localhost:21650
Uh oh! Although our API is running smoothly inside of the container, we have no way of accessing it outside of the container. Remember, EXPOSE only signals that we want to expose that port; it doesn’t actually do that for us. No worries, this can be easily fixed.
docker run -dit -p 21650:21650 --name my_container_name 3kdgj024jn
Here, we named the resulting container |
Where 3kdgj024jn is the id of your image. Now, let’s try to access the API again.
http localhost:21650
Voila! It works! Be careful with the following shorter form, which is not equivalent:
docker run -dit -p 21650 --name my_container_name 3kdgj024jn
When only the container port is given, Docker publishes it on a randomly chosen free port on the host, which you can look up with docker port my_container_name.
If the API internally uses port 21650, but you want to expose it outside of the container on a different port, say port 5555, you could run the following.
docker run -dit -p 5555:21650 --name my_container_name 3kdgj024jn
Then, you could access the API by running the following:
http localhost:5555
While our request goes to port 5555, once the request hits the container, it is routed to port 21650 inside the container, which is where our API is running. This can be confusing and may take some experimentation until you are comfortable with it.
- Screenshot(s) showing the input and output from the terminal.
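To keep the two sides of a -p value straight, it can help to spell the mapping out. The following is a minimal sketch that splits a HOST:CONTAINER pair and describes it in words. The function name describe_port_mapping is made up for illustration, and it only handles the two-part form used in this project.

```shell
# Hypothetical helper: describe a HOST:CONTAINER port pair as given to `-p`.
describe_port_mapping() {
    mapping="$1"
    host_port="${mapping%%:*}"       # text before the first colon
    container_port="${mapping##*:}"  # text after the last colon
    echo "localhost:${host_port} -> container port ${container_port}"
}

describe_port_mapping 5555:21650
describe_port_mapping 21650:21650
```

The left number is always the port you type on the host, and the right number is the port the API listens on inside the container.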
Please make sure to double check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended that you download your submission after submitting it to make sure that what you think you submitted is what you actually submitted.