-
Notifications
You must be signed in to change notification settings - Fork 1
/
help docs.txt
213 lines (179 loc) · 8.13 KB
/
help docs.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
*******************kafka helping commands *****************************
Step 1: Download the code
Download the 0.10.1.0 release and un-tar it.
> tar -xzf kafka_2.11-0.10.1.0.tgz
> cd kafka_2.11-0.10.1.0
Step 2: Start the server
Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don't already have one.
You can use the convenience script packaged with kafka to get a quick-and-dirty single-node ZooKeeper instance.
> bin/zookeeper-server-start.sh config/zookeeper.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...
Now start the Kafka server:
> bin/kafka-server-start.sh config/server.properties
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
...
Step 3: Create a topic
Let's create a topic named "test" with a single partition and only one replica:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
We can now see that topic if we run the list topic command:
> bin/kafka-topics.sh --list --zookeeper localhost:2181
test
Step 4: Send some messages
Kafka comes with a command line client that will take input from a file or from standard input and
send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message.
Run the producer and then type a few messages into the console to send to the server.
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message
Step 5: Start a consumer
Kafka also has a command line consumer that will dump out messages to standard output.
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
This is another message
Now create a new topic with a replication factor of three:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
Okay but now that we have a cluster how can we know which broker is doing what? To see that run the "describe topics" command:
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
Here is an explanation of output. The first line gives a summary of all the partitions, each additional line gives information about one partition. Since we have only one partition for this topic there is only one line.
"leader" is the node responsible for all reads and writes for the given partition.
Each node will be the leader for a randomly selected portion of the partitions.
"replicas" is the list of nodes that replicate the log for this partition regardless of whether they are the leader or
even if they are currently alive.
"isr" is the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive and caught-up to the leader.
We can run the same command on the original topic we created to see where it is:
> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
Topic:test PartitionCount:1 ReplicationFactor:1 Configs:
Topic: test Partition: 0 Leader: 0 Replicas: 0 Isr: 0
So there is no surprise there—the original topic has no replicas and is on server 0, the only server in our cluster when we created it.
Let's publish a few messages to our new topic:
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
...
my test message 1
my test message 2
^C
Now let's consume these messages:
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
...
my test message 1
my test message 2
^C
*******************Spark helping commands *****************************
To Run Producer(KafkaProducer)
java -cp /home/riveriq/Projects/SparkKafkaIntegration/SparkKafkaIntegration-jar-with-dependencies.jar:/etc/kafka_2.10-0.8.2.0/libs/* com.riveriq.kafkaproducer.KafkaProducer JSON
To Run comsumer(SparkKafkaIntegration)
./spark-submit --class "com.riveriq.sparkkafkaintegration.SparkKafkaIntegration" \
--master local[2] \
/home/riveriq/Projects/SparkKafkaIntegration/SparkKafkaIntegration-jar-with-dependencies.jar localhost:2181 test-consumer-group kafkatest 1
******************** Producer(KafkaProducer) output ***********************
java -cp /home/riveriq/Projects/SparkKafkaIntegration/SparkKafkaIntegration-jar-with-dependencies.jar:/etc/kafka_2.10-0.8.2.0/libs/* com.riveriq.kafkaproducer.KafkaProducer JSON
initializing kafka config....
locading input file....
log4j:WARN No appenders could be found for logger (org.apache.kafka.clients.producer.ProducerConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Complete JSON input....
[
{
"id": "AK0001",
"Name": "Ashish",
"gender": "male",
"age": 28,
"address": "580 Veterans "
},
{
"id": "AK0002",
"Name": "Raj",
"gender": "male",
"age": 33,
"address": "560 Veterans"
},
{
"id": "AK0003",
"Name": "Arjun",
"gender": "male",
"age": 28,
"address": "570 Veterans"
}
]
Splitting file to jsons....
{"address":"580 Veterans ","gender":"male","id":"AK0001","age":28,"Name":"Ashish"}
{"address":"560 Veterans","gender":"male","id":"AK0002","age":33,"Name":"Raj"}
{"address":"570 Veterans","gender":"male","id":"AK0003","age":28,"Name":"Arjun"}
converting to JsonDocuments....
number of documents is: 3
sending msg to kafka....
sending msg....0
Message sent successfully
msg sent....0
sending msg....1
Message sent successfully
msg sent....1
sending msg....2
Message sent successfully
msg sent....2
Total of 3 messages sent
********************comsumer(SparkKafkaIntegration) output ***********************
./spark-submit --class "com.riveriq.sparkkafkaintegration.SparkKafkaIntegration" \
--master local[2] \
/home/riveriq/Projects/SparkKafkaIntegration/SparkKafkaIntegration-jar-with-dependencies.jar localhost:2181 test-consumer-group kafkatest 1
Main Started...
creating spark streaming contaxt...
17/02/16 19:25:20 INFO Remoting: Starting remoting
17/02/16 19:25:21 INFO Remoting: Remoting started;
KafkaUtils Streaming started...
Creating JavaDStream...
calling Create Dataframe...
17/02/16 19:25:31 INFO WriteAheadLogManager
.
.
.
17/02/16 19:25:36 INFO ConsumerFetcherManager:
Creating spark java bean...
Creating dataframe from bean class...
+-------------+---+------+------+------+
| address|age|gender| id| name|
+-------------+---+------+------+------+
|580 Veterans | 28| male|AK0001|Ashish|
| 560 Veterans| 33| male|AK0002| Raj|
| 570 Veterans| 28| male|AK0003| Arjun|
|580 Veterans | 28| male|AK0001|Ashish|
|580 Veterans | 28| male|AK0001|Ashish|
|580 Veterans | 28| male|AK0001|Ashish|
| 560 Veterans| 33| male|AK0002| Raj|
| 570 Veterans| 28| male|AK0003| Arjun|
|580 Veterans | 28| male|AK0001|Ashish|
|580 Veterans | 28| male|AK0001|Ashish|
+-------------+---+------+------+------+
registering dataframe as table...
+------+------+------+---+-------------+
| id| name|gender|age| address|
+------+------+------+---+-------------+
|AK0001|Ashish| male| 28|580 Veterans |
|AK0002| Raj| male| 33| 560 Veterans|
|AK0003| Arjun| male| 28| 570 Veterans|
|AK0001|Ashish| male| 28|580 Veterans |
|AK0001|Ashish| male| 28|580 Veterans |
|AK0001|Ashish| male| 28|580 Veterans |
|AK0002| Raj| male| 33| 560 Veterans|
|AK0003| Arjun| male| 28| 570 Veterans|
|AK0001|Ashish| male| 28|580 Veterans |
|AK0001|Ashish| male| 28|580 Veterans |
+------+------+------+---+-------------+
17/02/16 19:26:06 INFO WriteAheadLogManager for Thread:
17/02/16 19:26:06 INFO WriteAheadLogManager for Thread:
Creating spark java bean...
Creating dataframe from bean class...
+-------+---+------+---+----+
|address|age|gender| id|name|
+-------+---+------+---+----+
+-------+---+------+---+----+
registering dataframe as table...
+---+----+------+---+-------+
| id|name|gender|age|address|
+---+----+------+---+-------+
+---+----+------+---+-------+
^Z