Thursday, March 15, 2018

Tensorflow 1.6.0 on Mac OS X 10.11 (El Capitan) with macports

We should start with Xcode. If you are doing any coding on a Mac you ultimately need this really horrible software. Get it from the App Store or wherever bad software goes to rest. My version is:


$xcodebuild -version
Xcode 8.0

We continue with macports.


Macports is an easy-to-use package manager for installing open source software on Mac OS; it makes downloading and installing painless, and I like that. It works from the command line and keeps everything it installs under a well-contained local prefix, which is great: it does not mess with other builds on the machine. (Apple's own package manager/repository could have served this purpose, but perhaps they didn't see any value in software they cannot sell, who knows...)


Ok, I am on Mac OS X El Capitan (10.11), so the first thing is downloading the right package for this system. Follow here: https://guide.macports.org/chunked/installing.macports.html

My macports package is : MacPorts-2.4.2-10.11-ElCapitan.pkg

I double-click and install without touching any default locations. An important thing to remember is that macports will base all its installations under "/opt/local" by default. This is not a user-specific location; it is shared by all users, so in order to make modifications here you need to be a superuser. (If you want to see all default locations and the hierarchy: https://guide.macports.org/chunked/porthier.html )


The main macports command is "port":

$port version # tells you which version of macports you have
$port search python # for example, allows you to search the macports repository (aka port tree) for software related to python.


macports holds 'portfiles', which are basically descriptions of software: download locations, dependency lists and other installation instructions. Before you can do the search above, you need to update the macports port tree. To do this, macports tries to connect to an rsync server, which is not allowed on my network, so I need to tell macports to use the daily snapshot of the server instead, by changing the source location:

$sudo vim /opt/local/etc/macports/sources.conf 


At the end of the file, comment out the rsync address:
#rsync://rsync.macports.org/macports/release/tarballs/ports.tar [default]
add the https tarball address instead:
https://distfiles.macports.org/ports.tar.gz [default]

Apparently this is a common issue for many people so they made a FAQ entry for it here: https://trac.macports.org/wiki/howto/PortTreeTarball


Now that macports has the right address, we can update its port tree:



$sudo port -d sync
--->  Updating the ports tree
Synchronizing local ports tree from https://distfiles.macports.org/ports.tar.gz
...

Next, I get what I need for my tensorflow build: python3.6, virtualenv, and tf itself. Why the first two:
Py3.6: My collaborators use py3.6, so that is why I am sticking with it.
Virtualenv: Mac already comes with some python version, as many of its apps apparently use python scripts behind the scenes. But I wanted a build environment where I feel more confident breaking things and rebuilding them, without worrying about messing up anything in the system. So I would like to work in the well-contained world of a virtual environment. Also, I might want different tensorflow versions with different python version dependencies in the future, so I do not yet want a system-wide tensorflow. One that is contained to a limited virtual environment is enough for me.
Ok so here I continue:

$port search python36
$sudo port -v install python36
$sudo port select --set python3 python36
So I install python 3.6 inside macports and set it up such that this py36 is what macports will call when asked for 'python3'. However, note that this is only effective within the macports space; the apple-provided python would still overrule this outside of it. For example, I have py26 and py27 as the system python by default, thanks to apple:

$which python
/usr/bin/python 
$python --version

Python 2.7.10

And macports knows about these versions I have:

$port select --list python
Available versions for python:
        none (active)
        python26-apple
        python27-apple
        python36


But as you see, it does not know which one I prefer within the macports environment, for example when it needs to execute some installation instruction for a port. Which is ok with me, though. I am interested in the python3 command, which I set earlier via port select --set:

$ port select --list python3
Available versions for python3:
        none
        python36 (active)

$ which python3
/opt/local/bin/python3

So, all good. python3 is who it should be. Moving on: virtualenv

$port search virtualenv
$sudo port -v install py36-virtualenv

$sudo port select --set virtualenv virtualenv36
$which virtualenv
/opt/local/bin/virtualenv

$virtualenv --version

15.1.0

All good. Now virtualenv is also installed, and available to all users. 

So far I have been making installations in '/opt' space, hence with sudo. Now I will install tensorflow inside a virtual environment, under my home. 

$cd 
$virtualenv --system-site-packages -p python3 myTFEnv_36

This creates a directory at that location and allows installations within that environment to use the system site-packages. In particular for python, the -p flag says: use the python3 of the system, which in my case is python36.

$cd myTFEnv_36
$source ./bin/activate # this is actually when that virtual environment becomes active. You will see that the terminal prompt changes when you do that:
(myTFEnv_36)

The beauty of this is that I can have many TF versions with different python versions and switch between them with as little effort as changing directories. (Remember to source the activate file to activate the virtual environment.)
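For example (just a sketch; the second environment name and python version below are made up for illustration), switching would look something like this:

$cd
$virtualenv --system-site-packages -p python2.7 myOtherEnv_27  # hypothetical second environment
$source ~/myTFEnv_36/bin/activate    # work inside the py36 tensorflow environment
(myTFEnv_36) $ deactivate            # leave it when done
$source ~/myOtherEnv_27/bin/activate # switch to the other environment
(myOtherEnv_27) $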

Ok, cool, but we still don't have tensorflow. Hurry:
We have a python package manager (pip) in the virtual environment. Let's see its version, because the TF people say it should be > 8.1.

$ pip -V 
pip 9.0.1 from /Users/epsilon/myTFEnv_36/lib/python3.6/site-packages

$which pip
/Users/epsilon/myTFEnv_36/bin/pip
This way, you also see how virtualenv references the python3 we asked it to use from the system site-packages.

Good, we have the right version of pip. Moving on to tf:

$pip install tensorflow
..
Downloading tensorflow-1.6.0-cp36-cp36m-macosx_10_11_x86_64.whl
..
Installing collected packages: termcolor, astor, werkzeug, six, html5lib, bleach, markdown, numpy, protobuf, tensorboard, absl-py, gast, grpcio, tensorflow
..
..
..
Successfully installed absl-py-0.1.11 astor-0.6.2 bleach-1.5.0 gast-0.2.0 grpcio-1.10.0 html5lib-0.9999999 markdown-2.6.11 numpy-1.14.1 protobuf-3.5.2 six-1.11.0 tensorboard-1.6.0 tensorflow-1.6.0 termcolor-1.1.0 werkzeug-0.14.1

Voila, we have tf. Validate it following the TF guideline:
$ python

Python 3.6.4 (default, Dec 21 2017, 20:32:22)
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
b'Hello, TensorFlow!'

For some other tf related work, you will also need python's data analysis toolkit thingy, pandas, so get it now as well:

$pip install pandas
Collecting pandas
Downloading pandas-0.22.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (14.9MB)
..
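A quick sanity check (nothing official, just what I would type) to confirm pandas landed inside the virtual environment:

$python -c "import pandas as pd; print(pd.__version__)"  # should report the 0.22.0 we just downloaded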


Let's say you want to run the getting-started examples from the tensorflow guide; then carry on as:
$git clone https://github.com/tensorflow/models
$cd models/samples/core/get_started/
$python premade_estimator.py

You have run your first tf script on your mac :) 

===
Troubleshooting:

1- At first I tried this whole thing with py3.5, but I quickly noticed that the tensorflow packaged as the py3.5 version on macports was actually built for py36 and named wrongly. And I couldn't find a real py3.5 build on macports instead. Maybe the tf compiled for py3.5 has a bug? I don't really know. I just used the tf for py36 to overcome this.

2-"tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA"
When you run tf.Session() in the above validation example you might get this message. It tells you that your cpu could run tf with higher performance, but the binary was not compiled to use those instructions. I didn't need this optimization, hence decided not to worry about it right now. But as soon as I upgrade to Sierra I will update the tf version to take advantage of such speedups, thanks to the kindly provided precompiled builds here: https://github.com/lakshayg/tensorflow-build (see also related http://www.andrewclegg.org/tech/TensorFlowLaptopCPU.html and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/cpu_feature_guard.cc )
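A small extra, in case the message itself is just noise for you: as far as I know tensorflow reads an environment variable called TF_CPP_MIN_LOG_LEVEL that controls how chatty its C++ core is, and setting it to 2 hides INFO and WARNING messages. It does not make anything faster, it only quiets the log:

$export TF_CPP_MIN_LOG_LEVEL=2  # 0 = everything, 1 = hide INFO, 2 = hide INFO and WARNING
$python premade_estimator.py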

3- This is not tensorflow related, but still: at some point I needed to make some plots to see the tensorflow results and realized I was lacking a few packages for it. My go-to visualizer for this would have been xmgrace or gnuplot, but I decided to give python plotting a chance. I'll just share in case someone finds it useful:
I did "pip install matplotlib" inside the virtualenv but kept getting an import error, "ImportError: no module named ..", when I tried to import matplotlib. Then I installed it via macports and opened a new clean virtualenv, but that alone did not solve the issue either; this time the problem was the backend, though, since the module was found. The default macOS backend kept not working for some reason. So I decided to use another backend, "TkAgg", whose only dependency is the tk framework:
$sudo port install py36-tkinter
did the job to a great extent:
$python
Python 3.6.4 (default, Dec 21 2017, 20:32:22) 
>>> import matplotlib as mpl
>>> mpl.use('TkAgg')   # this is where I change the backend

>>> import matplotlib.pyplot as plt

So that was successfully imported and I thought all was done, but then I tried to make a scatter plot to see my results:

>>> import pandas as pd
>>> import matplotlib
>>> matplotlib.use('TkAgg')
>>> import matplotlib.pyplot as plt
>>> df=pd.read_csv(r'results_1.csv',header=None)
>>> df.columns=['x','y']

>>> plt.scatter(df['x'],df['y'])
_tkinter.TclError: no display name and no $DISPLAY environment variable

Ouch! Just when I thought it was all done, I realized I don't even have an X server! So python was simply not able to open a new window to make a plot there. I installed XQuartz for X11 window management from
https://www.xquartz.org/
I restarted the machine, because relaunching it from the command line did not work:
launchctl load -w /Library/LaunchAgents/org.macosforge.xquartz.startx.plist 

And tried it all out again:
>>> plt.scatter(df['x'],df['y'])
returned:
<matplotlib.collections.PathCollection object at 0x10f846dd8>
which got me worried for a second thinking it was an error, but it looked like just the address of the object in memory. So I moved on:
>>> plt.show()
and I could finally see the results. 
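One more convenience I believe should work, though I have not made it a habit yet: matplotlib reads a per-user config file (~/.matplotlib/matplotlibrc on a Mac), so you can make TkAgg the default backend there and skip the mpl.use('TkAgg') line in every script:

$mkdir -p ~/.matplotlib
$echo "backend: TkAgg" >> ~/.matplotlib/matplotlibrc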

===
Edit:
I have been testing some algorithms with tf on this mac laptop for a few days now.
I like it so far; it works well enough for local tests, no performance complaints from me so far. I like tensorflow too. But to be very honest, I dearly miss my linux workstation and, in general, coding in fortran, where it was all much simpler and more transparent somehow.

Monday, April 17, 2017

Show duplicate files in a directory

diff between two files is powerful, but sometimes you want to see which files are the same among many. In principle diff -s would do this, but in reality you do not really need the detailed scanning that diff does, and it takes a lot of time when there are many files.

In such cases you can use cmp, which returns a silent 0 when the files are the same. In my case there was some manual file copying and re-naming over the days; it took me a while to see that I had created several duplicates.

Here is the one-liner I used to find the duplicates:

for i in ./*; do for j in ./*; do if [ "$i" != "$j" ]; then cmp -s "$i" "$j" && echo " $i and $j are identical!"; fi ; done; done

Simply, what we do is:
i is the first file, j is the second file; the if-statement is there because a file is identical to itself, so we skip the comparison of files with the same name. Then we compare with cmp, and if it returns a zero the echo command is executed thanks to the && chain.
One problem with this one-liner is that it reports each duplicate pair twice, once as (i, j) and once as (j, i), but I did not have time to beautify that part; a possible fix is sketched below. Let me know if you have a nicer one-liner! Cheers!!
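One way to avoid the double reporting, I think, is to compare each pair only once by requiring i to come strictly before j alphabetically (bash's [[ ]] does string comparison with <), which also removes the need for the inequality test:

for i in ./*; do for j in ./*; do if [[ "$i" < "$j" ]]; then cmp -s "$i" "$j" && echo " $i and $j are identical!"; fi; done; done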

Monday, May 6, 2013

Ubuntu remote update

http://blog.ryanrampersad.com/2010/07/update-ubuntu-via-terminal/

no magic, just a tiny reminder if you need to update an ubuntu server remotely:

sudo apt-get update
sudo apt-get upgrade

Thursday, April 18, 2013

Phonon and Irreps in Quantum Espresso

http://math.stackexchange.com/a/38963

The above link contains a simple, understandable answer to the frustrating question: what are irreducible representations?

If you do Quantum Espresso Phonon calculations with ph.x you have heard about them as irreps.

What the code does is this: once you define which phonon wavevector (q) you want the dynamical matrix to be calculated for, it looks for the small group of q.

The small group of q is found by selecting, among the point group symmetries of the crystal, the operations that leave the vector q unchanged or carry it to -q+G, where G is a reciprocal space vector.

And once you know the small group of q, you can write this group in terms of its irreducible representations.

Then the code carries on to understand, for each phonon mode, which irreducible representation it belongs to.

The outcome looks like this:

There are   3 irreducible representations

Representation     1      2 modes -  To be done
Representation     2      2 modes -  To be done
Representation     3      2 modes -  To be done

I like the explanation by Arturo Magidin in the mentioned link so much that I decided to mirror it here so that perhaps it will be better protected from the perils of online existence :)
All credit goes to the original author.

!---------------------------------------
A representation of the group G means a homomorphism from G into the group of automorphisms of a vector space V. Essentially, you are trying to interpret each element of G as an invertible linear transformation V → V, in order to try to understand the group G by how it "acts on V."

If you have an action ρ1 of G on a vector space W (that is, one representation), and you have some other action ρ2 of G on another vector space Z (another representation), then you can use these two actions to construct an action of G on the vector space W ⊕ Z: just let G act on the first coordinate using the old action on W, and let it act on the second coordinate using the old action on Z.

The point to observe, however, is that the action of G on W ⊕ Z defined this way does not give you any new insights into the structure of G: anything you can glean about G from this action, you can learn about G by considering the original actions ρ1 and ρ2. So this new action does not give us anything new.

Conversely, suppose you have one representation ρ, with G acting on V, and that there are proper subspaces W and Z of V that satisfy the following properties:

  1. V = W ⊕ Z; and
  2. The action of every g ∈ G on V maps W to itself; and
  3. The action of every g ∈ G on V maps Z to itself.
Then you can look at the restriction of the action of G on W to get a representation, and the restriction on Z to get another representation; and these two representations will give you all the information from the original representation, the same way we had before. The advantage being that since W and Z are proper subspaces of V, they have smaller dimension and, presumably, it's easier to understand a subgroup of linear automorphisms for them than for V.

So the moral is that we want to find representations that cannot be "broken up" into smaller ones, because there's no point in trying to understand ones that do break up, we can focus our attention on those that don't, because all the other representations can be built up in terms of the ones that cannot be broken up.

The irreducible representations are precisely the ones that cannot be broken up into smaller pieces. There is a theorem that says that if you have a representation ρ of G acting on V, and W is a subspace of V such that for all g ∈ G, the image of W under the action of g is W itself, then you can find a subspace Z of V such that V = W ⊕ Z and every g ∈ G maps Z to itself (that is, in order to break up ρ into two smaller pieces, it is enough to find a single proper piece on which ρ acts; then you can find a complement for it). With this in mind, we say:

Let ρ: G → Aut(V) be a representation of G. We say that ρ is irreducible if and only if V is not the zero vector space, and the only subspaces of V that are mapped to themselves under the action of every g ∈ G are {0} and V itself.
An irreducible representation of SO(3) will be a representation of SO(3) that is irreducible. SO(3) acts naturally on the vector space R^3: it consists of all automorphisms of R^3 that respect the inner product, so this is itself a representation of SO(3) (which is irreducible, because no proper subspace of R^3 is sent to itself by all elements of SO(3)).
!---------------------------------------