Thursday, May 2, 2013

Why Suffix Tree and not Prefix Tree.


To find a pattern we need to find sub string in a text.Since Tries can also find sub string, but there is a catch since they find the sub strings with well defined start and end delimiter.
For ex : Suppose I need to search for a word in a text "bull". Then Trie can do it in O(m) time. But let say we need to find pattern such as "l bid be" now it is a sub-string, but it doesn't have a well defined delimiters starting and ending.
So Basically tries doesn't solve our problem for pattern search.It can find string with well defined boundaries, but not sub string with delimiters.Now this has obvious applications in finding some patterns in Human Gnome.
Like :
"ADCNDJCDNCD CNDJCNDNCDXXCAVDA CDNJCDNCD HBADSCD"
Now if I need to find  "CDNCD HBA" then tries can't do this. Since they stores words and not sub strings.
                                  
Therefore we need some other data structures which can do this.

Firstly our requirement is :
a)We need searching like Tries, means searching of sub string should be done in time proportional to the length of the pattern.
b)Instead of storing individual words in a Trie, we can store either Prefixes or Suffixes of the text(not of the words). Since by this way we can catch the patterns (i.e sub strings) other than individual words.

Now why prefixes or suffixes ?
Since to find sub strings, we must require to search in prefixes or suffixes of text. What it really means is ?
First lets look at Sub string definition.
We can define sub string in two ways:
 -->A prefix of some suffix.
 -->A suffix of some prefix.

For ex :
I have some Text as :
 "abcde h"
 Firstly we must find its prefixes :
  "a"
  "ab"
 "abc"
"abcd"
"abcde"
"abcde h"
Now to find a sub string/pattern let say : "de h"
This sub string must be a suffix of some prefix.
Here in our example it is suffix of last prefix.

Similarly for our other definition of sub string :
We calculate suffixes :
 "h"
" h"
"e h"
"de h"
"cde h"
"bcde h"
"abcde h"
Now to find substring "cde ". It must be  prefix of some suffix.
Here in our case it is prefix of suffix "cde h"

So from this we mean : if we store prefixes or suffixes of the text. We can easily search for a sub string (even without delimiters).

So our next step is :
Create a Trie of prefixes or suffixes of text.

Now our next question is Does Trie of prefixes really do our job?

Logically it seems that we don't require Trie of prefixes.
Since our main concern is the starting delimiter of the sub string.Because if we create a prefix tree, from where we should start our search for pattern.Since our Trie of prefixes always starts with prefixes.Now in order to find intermediate sub string like "cde h", we are not able start the Trie traversal from root of the Trie.
  There are two ways to deal with this problem :
 -->Either create a prefix tree and make some techniques to start traversal from bottom-up.Now this seems to be inappropriate , with some complex traversing from bottom to top,while also to make our Data structure to remain as Trie.
-->Or create a Trie of suffixes.
    Now this will solve our problem, because we will get our starting point from the top of the Trie and match till our sub string length.

So in short we will create a Trie , storing suffixes of entire Text(not individual words).










Monday, June 4, 2012

Shell and Linear Algebra blogs

Some links which are useful to explore for beginners.

Bash Scripting : AboutLinux
Awk One Liner : here
Linear Algebra :here
Unix One Liner :here
Node.js :here

Recovery Mode without grub

 1)After start booting : Ctrl + Alt + F1 --> Recovery Mode
 2)To reboot from terminal : sudo reboot

Tuesday, May 22, 2012

Tomcat - java.lang.OutOfMemoryError: PermGen space problem

When i run tomcat, sometimes i got JVM out of memory. This is due to insufficient tomcat memory allocated.

Solution
Steps :
1)Open catalina.sh in specific tomcat. tomcat/bin/catalina.sh
2)add in JAVA_OPTS value: -server -Xms512m -Xmx1024m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+DisableExplicitGC
3)Restart tomcat again

For more you can refer Here

Monday, April 30, 2012

Octave 3.4.3 Installation Ubuntu

1) Sudo apt-get install build-essentials
2)Download octave3.4.3 from Here
3)Untar it, and then install gfortran using sudo apt-get install gfortran
4)apt-get install octave3.2-headers [For building dependencies]
Or
apt-get install libblas-dev libatlas-dev liblapack-dev
5)sudo mv /usr/bin/gfortran    /usr/bin/gfortran.ORG
6)sudo ln -s /usr/bin/gfortran-x.x   /usr/bin/gfortran [create a soft link]
7)Create enviornment variables :
   - export CC=/usr/bin/gcc-x.x
   - export CXX=/usr/bin/g++-x.x
8)./configure in octave folder
9)make [it will take around 30 mins]
10)./run-octave
11)make check
12)sudo make install
13) make
14)make clean
For more Info check here
 
 
 

Wednesday, January 4, 2012

Useful Links for Algorithms and Puzzles

Coding Contest :

1>Code Chef : Initiative of Directi
2>AI challenge : http://aichallenge.org/problem_description.php
3>ML coding contest : http://www.kaggle.com/ [Best]
4>Heritage Health Prize :  Data analysis contest http://www.heritagehealthprize.com/c/hhp


Algorithms :

1>Carrer Cup [Blog /Pdf]
2>Geek for geeks [best]
3>Crack The Interview [Pdf]
4>Freshers world

Puzzles :
1>Gurmeet singh puzzles :Delightful Puzzles
2>Cracking the code : http://shashank7s.blogspot.com/p/stacks.html


Wednesday, December 7, 2011

Git Simple commands

Like other repository managers CVS, git is a also a repository manager. Where we can commit our code, so that it can be updated in repository.

Branches : Whole repository can be managed using two branches : local  and remote branches. Master is the root of all branches where all builds from different branches are committed and are in production version. Local branches are the branches specific for particular project. Remote branches are the branches specific to particular project but not created by us.

Useful git commands :

#git push origin branch-name : push the code to specified branch
#git branch : will show your current branch
#git branch  -a :show list of all branches including remote and local and master branches
#git branch -r : show list of all remote branches.
#git stash : save your local changes into git pull or git svn (so that whenever you git pull --rebase changes will automatically comes in)
#git checkout <branch_name> : To switch to specified branch
#git pull --rebase : pull the updated build from current branch
#git logs : to check the last updates in repository
#gitk : is an Graphical interface to git.
#git status : How many local changes you have made.
#git diff <filename> : shows the differences of changes you have made in that file after last commit/update.
#git branch <branch_name> : create a new branch with <branch_name>

Note for more details refer link