Friday, October 3, 2014

Index mysql data to Solr


Check video for configuration

In this tutorial I am going to index tabular data from mysql to Apache Solr, The version of mysql I am using is mysql-5.5.20. We are going to create a sample table Employee, The script to create the table in mysql is as given below.

Table : Employee

CREATE TABLE Employee (
EmpID int,
LastName varchar(255),
FirstName varchar(255),
City varchar(255),
PRIMARY KEY (EmpID)
);

 

Query's to insert data into Employee table :

insert into employee (EmpID, LastName, FirstName, City ) values ( 1, 'Pai', 'Gurunath', 'Delhi');
insert into employee (EmpID, LastName, FirstName, City ) values ( 2, 'Sharma', 'Naresh', 'Ahmedabad');
insert into employee (EmpID, LastName, FirstName, City ) values ( 3, 'Shenoy', 'Suresh', 'Mumbai');
insert into employee (EmpID, LastName, FirstName, City ) values ( 4, 'Shetty', 'Rohit', 'Mumbai');


Now lets try to index this data using DataImportHandler of Solr.

 1. First of all, copy the appropriate libraries that are required to use Data Import Handler. So, let's create the dih folder anywhere on your system (I created it in the Solr home directory, so the path for the jar is relatively set.), and place apache-solr-dataimporthandler-4.x.jar and apache solr-dataimporthandler-extras-4.x.jar from the Solr distribution dist directory in the folder. In addition to that, we need the following entry to be added to the solrconfig.xml file:

<lib dir="../../../dih" regex=".*\.jar" />

2. Next we need to modify the solrconfig.xml file. You should add an entry like the following code:

 <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">db-data-config.xml</str>
    </lst>
</requestHandler>


3. Now we will create the db-data-config.xml file that is responsible for the Data Import Handler configuration. It should have contents like the following example:

<dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/test" user="root" password="root" />
    <document>
        <entity name="Employee" query="SELECT EmpID, LastName, FirstName, City from Employee">
            <field column="EmpID" name="id" />
            <field column="LastName" name="LastName" />
            <field column="FirstName" name="FirstName" />
            <field column="City" name="City" />
        </entity>
    </document>
</dataConfig>


If you want to use other database engines, please change the driver, url, and user and password attributes.

4. Now, let's create a sample index structure. To do that we need to modify the fields section of the schema.xml file to something like the following snippet:

<field name="id" type="int" indexed="true" stored="true" required="true"/>
<field name="LastName" type="text_general" indexed="true" stored="true" />
<field name="FirstName" type="
text_general" indexed="true" stored="true"/>
<field name="City" type="
text_general" indexed="true" stored="true"/>

5. One more thing before the indexing – you should copy an appropriate JDBC driver to the lib directory of your Solr installation or the dih directory we created before.

6. Now we can start indexing. Run the following query to Solr:

http://localhost:8983/solr/dataimport?command=full-import


7. Open Solr admin, and check whether the data is indexed.


Please feel free to review and comment, many more tutorials to roll-out.

3 comments:

  1. Hello can u provide step by step tutorial to mysql data import in solr using ubuntu ?

    ReplyDelete
    Replies
    1. Hi Nidhi,

      I need some time to create the blog for the above requirement, But yes i can surely provide you the steps.

      Delete
  2. Thank you so much your code is working fine,
    I have some confusion that how we can implement solr search in php website
    Php website is running on other server and solr is running on other server then how we can restart
    the solr server from php website on time of request a search.?

    I am using solr php client file to connect with solr search then how we can restart solr server automatic.
    Please provide me any solution,
    Help me !!

    ReplyDelete